Big data analytics is the secret sauce of the American polity and economy: widely used but poorly understood. Organizations use various types of big data analytics to make decisions, find correlations, and make predictions about their constituents or stakeholders. The market for data is big and growing rapidly; it’s estimated to hit $100 billion before the end of the decade.
But the recipe for data analytics can at times contain a hidden ingredient: bias. Not surprisingly, there is evidence that reliance on big data analytical processes can lead to divisive, discriminatory, inequitable, and even dangerous outcomes—collective harms—for some of the people sorted into groups. That needs to change.
Big data analytics often requires a huge supply of anonymized personal data. The process unfolds as follows: Researchers clean and anonymize the data, then sort it into groupings based on attributes such as behavior, preferences, income, race, and ethnicity. Depending on the questions firms and governments want to answer, researchers may use sophisticated analytical techniques such as artificial intelligence or machine learning to make assertions about these groups. However, bias can seep into the process when data is collected, when it is prepared, and when researchers choose which attributes to consider or ignore in their model. Unlike Harry Potter’s classmates at Hogwarts, who could challenge the Sorting Hat, individuals in these groups can’t change the decision-making or its outcomes.
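For readers who want to see how the choice of attributes can tilt outcomes, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the records are invented, and the group labels, the `zip` field, and the two approval rules are hypothetical rather than drawn from any real system. It shows only that a rule which never looks at group membership can still produce unequal results when it relies on an attribute that happens to track group membership.

```python
from collections import defaultdict

# Invented records for illustration only; not real data.
# "group" is the attribute the model never looks at directly.
RECORDS = [
    {"group": "A", "zip": "north", "income": 50},
    {"group": "A", "zip": "north", "income": 40},
    {"group": "A", "zip": "north", "income": 60},
    {"group": "A", "zip": "south", "income": 50},
    {"group": "B", "zip": "south", "income": 50},
    {"group": "B", "zip": "south", "income": 40},
    {"group": "B", "zip": "south", "income": 60},
    {"group": "B", "zip": "north", "income": 50},
]


def approve_by_income(record):
    """Hypothetical rule: approve anyone whose income is at least 50."""
    return record["income"] >= 50


def approve_by_zip(record):
    """Hypothetical rule: approve anyone living in the 'north' zip code."""
    return record["zip"] == "north"


def approval_rate_by_group(records, rule):
    """Return the share of each group that the given rule approves."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for record in records:
        totals[record["group"]] += 1
        if rule(record):
            approved[record["group"]] += 1
    return {group: approved[group] / totals[group] for group in totals}


if __name__ == "__main__":
    print("income rule:", approval_rate_by_group(RECORDS, approve_by_income))
    print("zip rule:   ", approval_rate_by_group(RECORDS, approve_by_zip))
```

In this toy data, incomes are distributed identically across the two groups, so the income rule treats them alike. The zip code rule does not, because most members of one group live in the favored zip code. Neither rule mentions group membership, which is why this kind of bias can be hard to spot from the outside.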
Individuals categorized by such techniques often don’t know that they have been “sorted.” For example, some clients of Microsoft 365 used the software to monitor their workers’ “productivity,” scoring employees on their participation in group chats and on the number of emails they sent. Students may not know that many college admissions officers don’t just review essays, test scores, and recommendations, but also rely on predictive analytics to decide whom to admit.
Firms are not the only entities addicted to big data analytics. U.S. policymakers often rely on one type, predictive analytics, to make decisions. In Pasco County, Fla., reporters found that the Sheriff’s Office created “a system to continuously monitor and harass” groups of individuals identified as potential criminals by a machine learning program. Meanwhile, the Internal Revenue Service attempted to identify and track potential tax dodgers by monitoring social media and analyzing cell phone location data.
Recent history is rife with examples of unanticipated negative side effects from this dependence on big data analytics for decision-making. Cambridge Analytica, a data analysis firm, used user data improperly obtained from Facebook to build voter profiles designed to manipulate public opinion. Some 87 million Facebook users were directly affected, as was trust in the political systems of several democracies. In another example, the athletic network Strava released a global heat map of user activity, inadvertently revealing the location of NATO military personnel. A 2019 study in the journal Science found that U.S. hospitals and insurers depend on a specific algorithm to manage care for some 200 million Americans. The algorithm’s design made it less likely to refer Black patients than equally sick white patients to programs that aim to improve care for people with complex medical needs.
We are all to blame for American dependency on these processes. Americans say they are concerned about data collection, yet freely provide their data to online services. Firms have become addicted to personal data and seek to acquire ever more. Many websites secretly track individuals as they move from one site to another online, acquiring even more data. Apps also take data unnecessary to their central function without our informed and direct consent. Since personal data can be easily reused, firms often resell it in an opaque market that subjects and sources cannot fully participate in.
Modern American privacy law encourages companies to extract as much value as possible from personal data in the short term. Instead, firms should be incentivized to protect that data and to build trust among data suppliers that it won’t be misused. Congress should empower netizens to ensure that their rights are protected, both as individuals and as members of groups. But the online-privacy legislation Congress has proposed since 2018 does not seriously address collective harms. The U.S. should follow the example of the European Union and ensure that members of groups can pursue firms that misuse their personal data to make decisions about access to credit or housing.
Stronger privacy laws would help, but they won’t be sufficient. The Securities and Exchange Commission should also ask all publicly traded companies to disclose when and how they use data analytics to make decisions that affect their customers’ human rights, such as access to credit, education, or health care. Like cyber-risk disclosures, such information can help investors better understand a company’s potential risks. Such mandated transparency could also give stakeholders greater understanding of how firms use their data, make the market for personal data less opaque, and incentivize firms to do more to protect personal data. In so doing, it could reduce collective harms from data analysis while also helping to build a market for privacy-protected data.
The secret sauce may be popular, but it is increasingly difficult for many Americans to swallow. If groups of people are given greater transparency and the ability to challenge misuse of their data, big data analytics can remain on the menu of decision-making options.
Susan Ariel Aaronson is research professor, cross-disciplinary fellow, and director of the Digital Trade and Data Governance Hub at George Washington University.