Data Science Revealing Mysteries of the Past

I have always been fascinated by learning about history and analyzing trends throughout time. I have also always been interested in data science. Thus, when I started at the Institute for Advanced Analytics, I was drawn to the “Text Mining” class, which focused on qualitative analysis used in fields such as the humanities. In particular, I wanted to learn more about how these techniques are being applied in the field of history.

Text mining and Natural Language Processing (NLP) are computational and statistical techniques that enable researchers to analyze hundreds of thousands of text documents quickly. History is a field that requires studying thousands of documents. Combined, the two allow researchers to accomplish a few lifetimes’ worth of work in a few days. As a result, researchers can track trends and bias throughout history and uncover stories about marginalized groups.

Tracking Trends in Digital History

Natural Language Processing models have helped historians find trends and track sentiments throughout history. In “Historians Use Data Science to Mine the Past,” Carolyn Beans discusses the work of Jo Guldi, a historian who has used NLP models to mine thousands of digitized books. These models allow historians to ask questions that would previously have taken lifetimes to answer. One of their main uses is counting words over time to track changes in politics or ideas. For example, researchers interested in how attitudes toward climate change have shifted could scan relevant documents and use NLP models to compare how prevalent discussions of climate change were in the 1970s and the 1980s.
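The word-counting idea can be sketched in a few lines of Python. This is a minimal illustration, not the method Guldi or any other researcher used: the four dated sentences are invented, the function name phrase_rate_by_decade is hypothetical, and a real project would work from a large digitized corpus. Counts are normalized per 1,000 words so that a decade with more surviving text does not look more interested in a topic simply because it has more pages.

```python
import re
from collections import Counter

# Hypothetical toy corpus: (year, text) pairs invented for illustration only.
documents = [
    (1972, "Scientists warn that climate change may alter rainfall."),
    (1978, "The committee discussed energy prices and inflation."),
    (1983, "Climate change dominated the hearing; climate change policy was debated."),
    (1988, "Senators heard testimony on climate change and drought."),
]

def phrase_rate_by_decade(docs, phrase):
    """Return {decade: occurrences of phrase per 1,000 words} (hypothetical helper)."""
    hits, words = Counter(), Counter()
    pattern = re.compile(r"\b" + re.escape(phrase.lower()) + r"\b")
    for year, text in docs:
        decade = year // 10 * 10           # 1972 -> 1970, 1988 -> 1980
        lowered = text.lower()
        hits[decade] += len(pattern.findall(lowered))
        words[decade] += len(re.findall(r"[a-z']+", lowered))
    # Normalize by decade length so larger archives do not dominate.
    return {d: 1000 * hits[d] / words[d] for d in sorted(words) if words[d]}

rates = phrase_rate_by_decade(documents, "climate change")
for decade, rate in rates.items():
    print(f"{decade}s: {rate:.1f} mentions per 1,000 words")
```

In this toy corpus the 1980s rate comes out higher than the 1970s rate, which is the kind of signal a historian would then check against the underlying documents.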

While Guldi applied NLP to written records, other researchers are experimenting with less traditional sources, such as oral histories. In the study “Text Mining Oral Histories in Historical Archaeology,” Madeline Brown and Paul Shackel use oral histories recorded in the anthracite region of Pennsylvania to see how text mining and NLP models can be applied to oral history. They applied these techniques to interview transcripts. One issue they encountered was that the models had difficulty processing slang, unfamiliar vocabulary, and dialects, so they suggested fine-tuning NLP models for oral histories. Even so, like Guldi, Brown and Shackel were able to use NLP models to track changes in sentiment and topics across oral histories. These trends wouldn’t have been immediately apparent had they gone through the interviews manually.

Finding Historical Bias

Tracking these trends can also help reveal bias. Jo Guldi led a project that analyzed the anti-environmentalist movement in Congress by mining a database of congressional debates. She used computational models to track which words appeared alongside “environmentalist” and how those words changed every five years. She found that the phrase “radical environmentalist” was said in four debates from 1970 to 1974, whereas from 2005 to 2008 it was said 83 times. Guldi also found that six politicians were responsible for 90% of the uses of “radical environmentalist” over this four-decade range. This analysis helped Guldi pinpoint how sentiment changed over time and better understand how cultural shifts occur.
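To make the idea of tracking neighboring words concrete, here is a rough Python sketch. It is an assumption-laden illustration rather than a reconstruction of Guldi’s models: the speeches are invented, the function name preceding_words_by_window is hypothetical, and it only looks at the single word directly before “environmentalist” or “environmentalists,” grouped into five-year windows.

```python
import re
from collections import Counter, defaultdict

# Hypothetical (year, speech) pairs invented for illustration; not real debate text.
speeches = [
    (1971, "The environmentalist community supports this clean water bill."),
    (1973, "A radical environmentalist agenda, some members claimed."),
    (2006, "This radical environmentalist proposal will cost jobs."),
    (2007, "Radical environmentalists oppose the pipeline; "
           "a radical environmentalist lobby blocked it."),
]

WIDTH = 5  # years per window (assumed, to mirror the five-year comparison)

def preceding_words_by_window(docs, target_stem, start_year=1970, width=WIDTH):
    """Count the word right before any word starting with target_stem,
    grouped by the first year of each fixed-width window (hypothetical helper)."""
    counts = defaultdict(Counter)
    for year, text in docs:
        window_start = start_year + (year - start_year) // width * width
        tokens = re.findall(r"[a-z']+", text.lower())
        # Walk adjacent token pairs: (previous word, current word).
        for prev, word in zip(tokens, tokens[1:]):
            if word.startswith(target_stem):
                counts[window_start][prev] += 1
    return counts

counts = preceding_words_by_window(speeches, "environmentalist")
for window in sorted(counts):
    print(f"{window}-{window + WIDTH - 1}:", counts[window].most_common(3))
```

Recording who said each phrase alongside the year would allow the same loop to show how many speakers account for most of the uses.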

Uncovering Stories About Marginalized Groups

Beyond tracking biases over time, NLP models have also enabled historians to uncover stories about marginalized groups. The Smithsonian Discoverability Lab utilizes text mining to find women’s contributions to history. Researcher Margaret Rossiter used text mining on archival records and discovered that women had been contributing to scientific fields throughout history. Smithsonian’s digital strategist, Effie Kapsalis, has taken things further and employed text mining techniques to create a standardized database of women throughout history. Historians mined phone books, annual reports, and newspapers to identify women in science over time. These techniques have given historians a more nuanced understanding of women’s roles in science and helped uncover their contributions.

Text mining is also being used to better understand the lives of African Americans throughout history. Mining the Dispatch, a project by Robert K. Nelson of the Digital Scholarship Lab at the University of Richmond, explores the social and political changes in Richmond, Virginia (the capital of the Confederacy) during the Civil War. Nelson was interested in Richmond because it had long been poorly understood. The project mines the Richmond Daily Dispatch from November 1860 to April 1865. Nelson employed text mining techniques to explore changes and sentiments in topics around slavery, nationalism, soldiers, military, economy, news, and politics. These techniques allowed Nelson to analyze 112,000 articles and better understand cultural changes in Richmond that previously couldn’t be studied.

Conclusion

The “Text Mining” class at the Institute for Advanced Analytics has helped build a solid foundation for conducting similar historical text mining projects. These historical research projects utilized sentiment analysis and topic clustering, which were a major focus of the class. During the text analytics project, my group analyzed the sentiment of Taylor Swift’s lyrics throughout her career, similar to how researchers such as Madeline Brown and Paul Shackel have tracked sentiment in oral histories over time. We also employed topic clustering to track how Taylor Swift’s lyrics evolved over time, similar to how Jo Guldi and Robert K. Nelson used topic clustering to analyze changes in trends throughout history.

Text mining has revolutionized the fields of archaeology and history. Historians are now able to study and uncover pieces of history that they never would have been able to study before. They can also track trends and cultural shifts over time and trace how those shifts came about. Text mining has helped reveal these mysteries of the past, and it should continue to be refined and implemented in the field of history going forward.

Works Cited

Beans, Carolyn. “Historians Use Data Science to Mine the Past.” Proceedings of the National Academy of Sciences, vol. 122, no. 18, 30 Apr. 2025, https://doi.org/10.1073/pnas.2508428122.

Brown, Madeline, and Paul Shackel. “Text Mining Oral Histories in Historical Archaeology.” International Journal of Historical Archaeology, vol. 17, no. 3, 13 Jan. 2023, https://doi.org/10.1007/s10761-022-00680-5.

Harmon, Elizabeth. “Discoverability Lab Offers New Look at Historical Data and Machine Learning.” Smithsonian American Women’s History Museum, 24 Feb. 2025, womenshistory.si.edu/blog/discoverability-lab-offers-new-look-historical-data-and-machine-learning. Accessed 24 Sept. 2025.

Nelson, Robert. “Mining the Dispatch.” Dsl.richmond.edu, 2020, dsl.richmond.edu/dispatch/.

Columnist: Priya Chilukuri