While data analytics is a relatively young field, popular culture has portrayed the intimate details of the science through fiction. Two such works, the television shows Person of Interest and Westworld, capture many nuances of data science and the challenges data scientists face. This post will look at some of the ways data analytics is portrayed in these shows and how they connect to the real-life responsibilities data scientists have.
WARNING: Some minor spoilers below.
Person of Interest (CBS, 2011-2016)
The Machine’s interface, showing a decision tree. Decision trees are one of many tools data scientists can use.
Known as the series that predicted the Snowden revelations, Person of Interest is a cyberpunk thriller centered on the “Machine,” an artificial intelligence built after 9/11 to predict acts of terror using the NSA’s surveillance feeds. However, the Machine also predicts ordinary violent crimes that are irrelevant to national security, so its creator, Harold Finch, forms a team to prevent those crimes and save lives in New York City. Both the U.S. government and Finch receive only the Social Security number of a single person involved in each possible terror plot or crime. They must then work out whether that person is a victim or a perpetrator, gauge the scope of the impending crime, and stop it before catastrophe strikes.
Joining Datasets: Data scientists do not always get their data wrapped in one neat package. Instead, they often have to combine separate datasets using shared identifiers. In Person of Interest, the protagonists regularly combine credit card records, license plate reader logs, and phone network data to understand how a violent crime might unfold. Data are often collected separately, but joining datasets together lets data scientists perform far more powerful analyses.
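To make the idea concrete, here is a minimal Python sketch of an inner join on a shared identifier. The three tiny datasets, the `person_id` field, and the `inner_join` helper are hypothetical illustrations, not anything shown in the series; in practice analysts would more likely use pandas (`DataFrame.merge`) or a SQL `JOIN`.

```python
# A minimal sketch of joining separate datasets on a shared identifier.
# All records, field names, and the inner_join helper are hypothetical.

card_records = [
    {"person_id": "A17", "merchant": "Gas station", "time": "08:05"},
    {"person_id": "B42", "merchant": "Hardware store", "time": "12:30"},
]
plate_reads = [
    {"person_id": "A17", "plate": "XYZ-123", "location": "5th Ave"},
    {"person_id": "C09", "plate": "QRS-987", "location": "Broadway"},
]
phone_records = [
    {"person_id": "A17", "calls_today": 14},
    {"person_id": "B42", "calls_today": 3},
]


def inner_join(left, right, key):
    """Return merged records for key values present in both datasets."""
    # Index the right-hand dataset by its key for fast lookups.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    merged = []
    for row in left:
        for match in index.get(row[key], []):
            # Combine fields from both rows; the shared key appears once.
            merged.append({**row, **match})
    return merged


# Join purchases with plate sightings, then add phone activity.
profile = inner_join(
    inner_join(card_records, plate_reads, "person_id"),
    phone_records,
    "person_id",
)
for row in profile:
    print(row)
```

Only `A17` appears in all three sources, so the joined profile is a single row combining a purchase, a plate sighting, and phone activity. An inner join drops anyone missing from one source; an outer join would keep them, with gaps.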
It Is Not “Only” Metadata: There is a common misconception that one or a few pieces of data are not enough to identify someone or yield any insight. In reality, a sliver of information can open the floodgates to many insights. In one episode, a single gas station receipt, presumably belonging to a man who regularly fills up at that pump, helps reveal that he is selling uranium to Iran. In another, mapping the relationships between an engineer, her CEO, two UN diplomats, and an Iraqi translator allows Finch to connect the dots to the theft of six power generators. This lets his team intervene without knowing the purpose of the theft or the organization masterminding it.
Connections on social media, the time and length of phone calls, GPS tags on smartphone photos, and browsing histories are just a few of the many forms of metadata that can be transformed into new variables. Data scientists can then build models on these “data created from data” to gain astonishing insight into even a single person.
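As a rough illustration, the following Python sketch turns a handful of call records into new variables without touching the content of any call, only who was called, when, and for how long. The call log, the feature names, and the 22:00 to 05:59 "late-night" window are hypothetical choices made for this example.

```python
# A minimal sketch: turning raw phone-call metadata into new variables
# ("features"). The call log and feature names are hypothetical.
from collections import Counter

# Each record: (contact, start hour on a 24-hour clock, duration in seconds)
call_log = [
    ("555-0101", 9, 120),
    ("555-0101", 23, 600),
    ("555-0199", 2, 45),
    ("555-0142", 14, 300),
    ("555-0101", 1, 900),
]


def call_features(log):
    """Summarize call metadata into per-person variables."""
    contacts = Counter(contact for contact, _, _ in log)
    # Treat calls starting between 22:00 and 05:59 as late-night calls.
    late_night = sum(1 for _, hour, _ in log if hour >= 22 or hour < 6)
    return {
        "num_calls": len(log),
        "total_minutes": sum(seconds for _, _, seconds in log) / 60,
        "distinct_contacts": len(contacts),
        "late_night_share": late_night / len(log),
        "top_contact": contacts.most_common(1)[0][0],
    }


features = call_features(call_log)
print(features)
```

Five innocuous-looking records already yield a profile: about 33 minutes on the phone, three contacts, 60% of calls late at night, and one dominant contact. Variables like these are what a model would actually be trained on.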
Human Decisions and AI: Finch designed the Machine to output the identifier of only one person involved in a violent crime or terrorist attack, even though the Machine would certainly have identified every party involved. He intended the NSA to then assign investigators to work out who the victims and perpetrators were, without immediately compromising citizens’ privacy.
Data scientists likewise have to consider how their models are used, and to accept that neither data nor models are inherently clean or unbiased. In high-stakes settings such as hiring new employees or evaluating a school’s performance, keeping a human in the loop may be necessary. In lower-risk settings, such as video or movie recommendations on streaming services, there is less danger in letting an algorithm act more autonomously. Data scientists deal with these trade-offs regularly and must communicate clearly with others about how to handle them.
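One simple way to encode this kind of judgment is a routing rule that decides, per use case, whether a model's output is applied automatically or handed to a person. The Python sketch below assumes hypothetical domain labels and a hypothetical `route_decision` helper; real systems would base such rules on a documented risk assessment rather than a hard-coded list.

```python
# A minimal sketch of a human-in-the-loop policy. The domain names and
# the route_decision helper are hypothetical illustrations.

# High-stakes uses where a person should make the final call.
HIGH_STAKES = {"hiring", "school_evaluation"}
# Low-risk uses where the model's output can be applied automatically.
LOW_STAKES = {"movie_recommendation", "video_recommendation"}


def route_decision(domain):
    """Return how a model's output for this domain should be handled."""
    if domain in LOW_STAKES:
        return "automatic"
    if domain in HIGH_STAKES:
        return "human_review"  # the model only advises a human reviewer
    # Domains nobody has assessed yet default to the cautious option.
    return "human_review"


for domain in ["movie_recommendation", "hiring", "loan_approval"]:
    print(domain, "->", route_decision(domain))
```

Defaulting unknown domains to human review is a deliberate design choice: a new use case has to be explicitly judged low-risk before the algorithm is allowed to act on its own.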
Ethics in the Information Age: Finch and the antagonist hacker Root debate the existential impact of the Machine and the Pandora’s box opened by analyzing all the bits of information people leave in their wake. Because the field is still nascent, many data scientists are beginning to feel it is the profession’s duty to ask for consent when collecting data and to explain how that data is used to the communities affected by their work.
Westworld (HBO, 2016-)
Inside the servers that store Westworld’s guest data, represented as an enormous library called the Forge. The cost of data storage has plummeted over the last 40 years, putting mass storage within reach of almost anyone.
Set in the not-too-distant future, Westworld follows the androids of its titular theme park, where human guests can live out their fantasies in an immersive simulation of the American Frontier. While the story primarily revolves around the androids gaining sentience, the series also explores what a data-driven society might look like.
Customer Feedback: Delos, the company that runs Westworld, collects data on guests’ interactions and behaviors, in part to identify their interests and reactions to the park, which informs the design of new attractions and narratives. Many real-world apps and websites likewise collect user data so analysts can personalize advertisements, develop new features, and improve the overall experience for their users.
Data Security: The data that companies own is highly valuable and often sensitive, so security is essential. In Westworld, various factions fight for access to the Forge, the repository of all guest data, and Delos must protect it against both network intrusions and physical break-ins. Keeping data safe is the responsibility not only of data scientists but of every employee in an organization.
Time Series: When machine learning engineer Engerraund Serac demonstrates algorithmic stock market predictions to businessman Liam Dempsey, Dempsey is in awe of the seemingly magical ability to predict prices anywhere from 15 minutes to 1 day into the future. Time series analysis, which uses observations recorded at successive points in time to model trends and seasonality and forecast future values, is no supernatural spell. Fields such as economics, neuroscience, meteorology, and finance all use time series to predict future events, and these models often have to update and learn as new data arrive, whether every month or every millisecond.
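For a sense of how unmagical the basic mechanics are, here is a minimal Python sketch of simple exponential smoothing, one of the most basic forecasting methods. It tracks only the level of a series (methods such as Holt-Winters extend it with trend and seasonality), and it updates with each new observation, echoing the "learn as data arrive" point above. The price series, the smoothing factor `alpha`, and the `ExponentialSmoother` class are hypothetical; this is not how the show's fictional system works.

```python
# A minimal sketch of simple exponential smoothing. It models only the
# level of a series, not trend or seasonality. The prices, alpha, and
# the ExponentialSmoother class are hypothetical illustrations.


class ExponentialSmoother:
    def __init__(self, alpha):
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # weight given to the newest observation
        self.level = None   # current smoothed estimate

    def update(self, value):
        """Incorporate one new observation as it arrives."""
        if self.level is None:
            self.level = value  # initialize with the first observation
        else:
            self.level = self.alpha * value + (1 - self.alpha) * self.level
        return self.level

    def forecast(self):
        """One-step-ahead forecast: the current smoothed level."""
        return self.level


prices = [100.0, 102.0, 101.0, 105.0, 107.0]
model = ExponentialSmoother(alpha=0.5)
for price in prices:
    model.update(price)  # the model is refreshed as each new price comes in
print(f"Next-step forecast: {model.forecast():.2f}")
```

With `alpha=0.5` the smoothed level moves 100, 101, 101, 103, 105, so the next-step forecast is 105.00. A larger `alpha` reacts faster to recent prices; a smaller one filters out more noise.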
These are only two series in a sea of media, but both portray data science with impressive depth. Fiction should always be taken with a grain of salt, yet these works do connect to the real world, showing the challenges data scientists handle in their jobs and the immense power of analytics.
Columnist: Rohan Patel