A photograph of a road and rail line between Hapuku and Mangamaunu, South Island, New Zealand

Mainstream predictive analytics, at least in the "big data" sense, was once the preserve of business intelligence, marketing and speculative trading on the stock market, but things are rapidly changing. For a lot of us, big data is about to become personal. Very personal indeed.

People analytics, also referred to as workforce analytics, corporate analytics and HR analytics, is the application of predictive analytics to large volumes of data (henceforth referred to as "big data") to create behavioural and social profiles of individual people.

Recruitment firms and some large employers have been using people analytics to qualify prospective candidates for quite some time, and some even have dedicated workforce analytics teams to monitor and analyse the activities of their staff. As information technology marches ever onward, this trend is set to trickle down to all levels of employment.

Nineteen Eighty-Four by George Orwell



People Analytics

These days everyone expects employers to do some online investigation of prospective employees; people analytics, however, is far more sinister. In addition to profiling potential employees using external data (public, paid-for, or otherwise), some employers continually analyse and profile existing staff, both from external data and through analysis of internal social media, email, surveys, and the like. So if you’re filling out questionnaires at work, or using the corporate social networking, crowd-sourcing, ideation or wiki tools, there may be purposes at play beyond engaging with staff, namely analysis and profiling. And everything you do publicly online is bound to be used for profiling and analysis at some point in the future as well.

But really, that’s all just tip-of-the-iceberg stuff. According to an article in “The Atlantic” 1, the Las Vegas casino Harrah’s tracks the smiles of its card dealers and waitstaff, and its analytics team has quantified the impact of smiling on customer satisfaction. According to “Business Insider Australia” 2, Bloomberg logs every keystroke of every employee, along with every time they enter and leave the premises.

The people analytics company Gild 3 use an algorithm that scours the internet for open-source code in order to profile software engineers, and, perhaps more alarmingly, the algorithm purportedly evaluates that code for its simplicity, elegance, documentation, and several other factors, including the frequency with which it’s been adopted by other programmers. Stack Overflow is also mined to analyse the questions and answers individuals post, and the breadth and popularity of that advice. LinkedIn and Twitter are mined to analyse the language individuals use, and Gild have decided that certain phrases and words used in association with one another distinguish expert software engineers from less skilled programmers. According to Gild, it knows “these phrases and words are associated with good coding because it can correlate them with its evaluation of open-source code, and with the language and online behavior of programmers in good positions at prestigious companies” 1. “It’s an algorithmic challenge to correctly match up data from various sources and assemble those into accurate master profiles with high degree of confidence – and that’s part of the secret sauce” 4.
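To make the described approach concrete, here’s a minimal hypothetical sketch in Python. This is not Gild’s actual method; every name and number below is invented. Given a phrase-usage count and an independently obtained code-quality score for each programmer, computing the correlation between the two is the trivial part:

import numpy as np

# Hypothetical data for eight programmers: how often each uses some
# phrase online, and an independently obtained code-quality score.
# All values here are invented for illustration.
phrase_counts = np.array([12, 3, 8, 0, 15, 5, 9, 1])
quality_score = np.array([0.9, 0.4, 0.7, 0.2, 0.95, 0.5, 0.8, 0.3])

# Correlating the two is the easy part.
r = np.corrcoef(phrase_counts, quality_score)[0, 1]
print(f"phrase/quality correlation: {r:.2f}")

# A high r only shows an association; it says nothing about why the
# association exists, or whether the phrase indicates good code at all.

The last comment is the crux: the correlation itself is cheap to produce, which is exactly why the sections below dwell on what it does and does not establish.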

There are many other people-focused analytics companies out there, and it’s unknown what data they collect, how their algorithms analyse that data, and what “score” they end up giving any individual 5. Data collected about us, and the profiles created from that data, are not within our control. If you’re a programmer and have ever touched the internet, chances are that many of these companies have already profiled you.

Causation

In an article in “The New York Times” 6, data scientist Dr. Ming said of the algorithms she is now building for Gild: “Let’s put everything in and let the data speak for itself”.

Big data produces correlates, uncovers trends and probabilities, and finds weak associations within data. “A correlates with B”, however, is not a legitimate form of argument, and data can never “speak for itself” because it is strictly quantitative. Correlation is simply not enough. Without causation there is no evidence to support any correlate (in big data or otherwise), and relying on one can lead us to believe something that’s simply not true. Tyler Vigen has compiled patently ridiculous examples of this 7, including a 99.2% correlation between US spending on science, space and technology and suicides by hanging, strangulation and suffocation.
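To see how easily such correlations arise, here’s a minimal sketch in Python (the data is entirely fabricated): two independent series that merely share an upward trend over the same period correlate almost perfectly.

import numpy as np

rng = np.random.default_rng(42)
years = np.arange(1999, 2010)

# Two unrelated quantities that both happen to trend upward over time.
spending = 18.0 + 0.8 * (years - 1999) + rng.normal(0.0, 0.3, years.size)
suicides = 5000 + 220 * (years - 1999) + rng.normal(0.0, 80.0, years.size)

# Pearson correlation between the two series.
r = np.corrcoef(spending, suicides)[0, 1]
print(f"correlation: {r:.3f}")  # close to 1.0 despite no causal link

Any two series that drift in the same direction will score like this; the number measures co-movement, not connection.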

People analytics does not provide causality. If there’s no critical research to determine the causal mechanism that underlies correlations in the data, then there’s no reason to believe that the result is anything other than mere coincidence. Determining causation from correlations, and the causal nexus between them, is an almost impossible problem mathematically, so without empirical evidence showing a connection between cause and effect, no conclusion can be drawn. Correlates can only direct us as to what questions to ask and which avenues to investigate; they cannot, without extensive investigation, be used in isolation to determine any “score” of any significance.

Systematic Bias

Big data, and people analytics especially, suffers from the “N = All” fallacy. “N = All” simply means that the sample size is equivalent to the population, and with regard to big data, it’s this fallacy that produces the assumption that sampling bias and uncertainty are reduced to zero. Even the most comprehensive data sets pertaining to people and their behaviour will always contain incomplete information.
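Here’s a small sketch (all numbers invented) of why a bigger sample doesn’t cure a biased one: if the chance of being observed depends on the very quantity being measured, the estimate converges to the wrong answer no matter how large the sample grows.

import numpy as np

rng = np.random.default_rng(0)

# A hypothetical population trait we want the average of (true mean 50).
population = rng.normal(50.0, 15.0, 1_000_000)

# Observation is biased: the higher someone's trait value, the more
# likely they are to appear in the data set at all.
p_observed = np.clip((population - 20.0) / 80.0, 0.0, 1.0)
observed = population[rng.random(population.size) < p_observed]

for n in (1_000, 100_000, observed.size):
    print(f"n = {n:>7}: biased mean = {observed[:n].mean():.1f}")
print(f"true mean = {population.mean():.1f}")
# The biased estimate stays wrong no matter how many samples we add.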

The interactive, social and behavioural models imposed by any platform (social, e-commerce, crowd-sourced, whatever) dictate who uses the platform and how they use it. It stands to reason that this taints the data collected and introduces systematic bias. Compounding this is an element of modulation and mediation: almost every data-capturing interaction (services, recommenders, search result ranking, etc.) steers us in particular directions, based both on the interaction model imposed by the service (architecture, environment, and so on) and on the commercial, social or behavioural interests of those providing the service.

An illustrative example of systematic bias, as observed in Boston’s “Street Bump” app, is detailed by Tim Harford in “Big data: are we making a big mistake?” 8:

“As citizens of Boston download the app and drive around, their phones automatically notify City Hall of the need to repair the road surface. The City of Boston proudly proclaims that the “data provides the City with real-time information it uses to fix problems and plan long term investments.”

“Yet what Street Bump really produces, left to its own devices, is a map of potholes that systematically favours young, affluent areas where more people own smartphones. Street Bump offers us “N = All” in the sense that every bump from every enabled phone can be recorded. That is not the same thing as recording every pothole.”
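A toy simulation of that skew (all figures invented): potholes are spread evenly across two neighbourhoods, but reports track smartphone ownership rather than road condition.

import numpy as np

rng = np.random.default_rng(7)

# Two hypothetical neighbourhoods with identical numbers of potholes
# but very different rates of smartphone ownership. Numbers invented.
potholes = {"affluent": 200, "poorer": 200}
smartphone_rate = {"affluent": 0.85, "poorer": 0.35}

for area, count in potholes.items():
    # A pothole is only reported if a passing driver owns a smartphone.
    reported = (rng.random(count) < smartphone_rate[area]).sum()
    print(f"{area:>8}: {count} potholes, {reported} reported")

# Every enabled phone is recorded ("N = All"), yet the resulting map
# still undercounts potholes in the poorer neighbourhood.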

Accuracy

The accuracy of the digital footprint people leave behind in their online activities is highly questionable. Personal data, in particular the “self-produced” variety, cannot be assumed to be objective or accurate. Sarcasm, irony and many forms of wit and commentary (e.g. repeating or quoting something you disagree with) belie the surface meaning of the words used by their creator, and most often relate to some external and undefined context.
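As a toy illustration (the word weights are invented), a naive lexicon-based sentiment scorer happily reads a sarcastic complaint as glowing praise:

# Invented word weights for a naive bag-of-words sentiment scorer.
LEXICON = {"great": 1.0, "love": 1.0, "brilliant": 1.0, "broken": -1.0}

def sentiment(text: str) -> float:
    """Sum the lexicon weight of each word; positive means 'happy'."""
    return sum(LEXICON.get(word.strip(".,!"), 0.0)
               for word in text.lower().split())

sarcastic = "Oh great, the build is broken again. I just love Mondays. Brilliant."
print(sentiment(sarcastic))  # 2.0 -- scored as clearly positive

The surface words are overwhelmingly positive; the meaning is the opposite, and nothing in the data itself signals that.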

Skills and qualifications listed by anyone (on LinkedIn, for example) cannot be assumed to be up-to-date or accurate, or indeed to reflect the intelligence of the person in question; likewise, the general demeanour of people in online communities cannot reliably be used to determine their character in real life.

Privacy and Secrecy

Every day an insane amount of data is collected about people as they go about their offline and online lives, and as time goes on, both the variety and the rate of collection will continue to grow. Do you even know what data third-party companies collect, store, analyse and sell about you?

Privacy is perhaps the most worrying aspect of big data in general, and of people analytics in particular. Government surveillance and data retention laws have been a hot topic of late in countries like Australia, Germany, Sweden and the United States, and there is an increasing chorus of concern about the capacity of government agencies to perform surveillance and compile dossiers on their citizens. Is the same kind of unsolicited surveillance and profiling, done by private companies, any less worrisome?

In the article “The Value of Privacy” 9, Bruce Schneier writes with regards to surveillance and privacy: “…they accept the premise that privacy is about hiding a wrong. It’s not. Privacy is an inherent human right, and a requirement for maintaining the human condition with dignity and respect.” Schneier concludes with “Widespread police surveillance is the very definition of a police state. And that’s why we should champion privacy even when we have nothing to hide.”

Privacy is also important for innovation, as is evident in clandestine research and development projects, sometimes referred to as “skunkworks” 10. Julie Cohen, writing in the “Harvard Law Review” 11, states: “A society that permits the unchecked ascendancy of surveillance infrastructures, which dampen and modulate behavioral variability, cannot hope to maintain a vibrant tradition of cultural and technical innovation. Efforts to repackage pervasive surveillance as innovation — under the moniker “Big Data” — are better understood as efforts to enshrine the methods and values of the modulated society at the heart of our system of knowledge production. The techniques of Big Data have important contributions to make to the scientific enterprise and to social welfare, but as engines of truth production about human subjects they deserve a long, hard second look.”

“Regimes of pervasively distributed surveillance and modulation seek to mold individual preferences and behavior in ways that reduce the serendipity and the freedom to tinker on which innovation thrives.”

In the interests of protecting our right to privacy, legislation needs to address the growing unsolicited collection and analysis of personal data by private corporations.

The Age of Analysis, Profiling and Omniscient Surveillance

With people analytics, correlations are drawn and judgements are made about our behaviour and social skills, and statistics are used to score our intelligence, the quality of our work, and our probabilities of success or failure at particular tasks. However, in any form of real-world analysis, finding correlations in data (big, disparate or otherwise) is only the starting point of research; the process thereafter needs to establish the accuracy and validity of that data, and whether correlations represent a causal relationship or are merely coincidental. People analytics is being sold as a turn-key solution to recruitment and human resources, and is being touted as a substitute for, rather than a supplement to, observation and analysis. What makes people analytics particularly repugnant is the sheer scale of intrusion and invasion of privacy it allows corporations to engage in, and its apparent acceptance as normal business practice.

Even if we can solve the problems of causal mechanism, systematic bias, and accuracy, do we want to live in a world where every aspect of our lives is monitored and our worthiness continually calculated? Will there be serious social and psychological consequences of reducing people to numbers? Will this lead to a lack of employee diversity? Will constant surveillance stifle creativity and innovation, leading to fewer serendipitous discoveries? How can fairness and due process be articulated when the correlations of a complex and secretive algorithm are the ones keeping “score”? Am I negatively affecting my “score” by publishing this article?

What message are we to take away from all this? Keep your head down, do your work, follow the group, work is life.

Although this article is mostly concerned with the programming community, it calls into question the ethics of the commoditisation and trading of data and subjective analyses of citizens, the erosion of privacy, and ultimately the erosion of democracy itself.

Notes




