Group TMTA: An Analysis on Immigration through the Topic Modeling Tool

Julissa Villatoro

ENG 149

Alan Liu

12/15/14

The issue of immigration from Latin America is a topic of interest in the humanities, and much research has examined it from as many angles as possible. Writers and scholars have used a range of methods, including close readings of texts written by people who have crossed those borders. However, little research attempts a distant reading of the issue, even though distant reading in general has drawn interest from scholars in the humanities. In their 2011 study, Jean-Baptiste Michel and his research team found that such a “computational analysis of [a] corpus enables us to observe cultural trends and subject them to quantitative research” (p. 176). Thus, the main goal of the TMTA (Topic Modeling Tool Analysis) project was to study what perspectives and new information a distant reading of the issue would provide.

There were several directions the research could take, and finding a specific aspect of the issue to study proved difficult. The team soon learned that attempting a distant reading of the issue as a whole would be a daunting and practically impossible task. Ultimately the team settled on studying the circumstances and other factors that lead women from Latin America to leave their home countries, focusing solely on nonfiction works written mostly in English. Those texts were Amigas: Letters of Friendship and Exile (an account of the narrator’s experiences in Chile), December Sky: Beyond My Undocumented Life (an account of the narrator’s experiences in El Salvador), and I, Rigoberta Menchú: An Indian Woman in Guatemala (an account of the narrator’s experiences in Guatemala). From each text, the project team selected a passage that its members felt best described the social and cultural climate that may have provoked the narrator to emigrate from her country.

The team then had to decide which tools could analyze these texts effectively. After consultation and testing different analysis tools, topic modeling proved to be the most useful approach for such a study. As Graham, Weingart and Milligan explain in their tutorial (2012), topic modeling tools are a family of computer programs that attempt to extract semantic meaning from multiple texts without the user necessarily having to read the entire text (or set of texts) in question. In other words, these programs look for patterns in the texts and summarize the patterns they find as “topics”. They can be set to generate any number of these “topics”, depending on how much detail the user wishes to analyze: the more topics the user asks the system to generate, the more specific the topics become, and the fewer, the broader. Further, the program comes with its own “stop words” setting, which keeps certain words, such as common pronouns and verbs, from appearing in the topics. Although this tutorial was useful for understanding the nature of such a tool, it went on to explain how to use MALLET, a topic modeling tool that proved too advanced for first-time researchers conducting a small-scale experiment. The team therefore turned to the much simpler Topic Modeling Tool program.
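The “stop words” idea described above can be illustrated with a short sketch. This is not the Topic Modeling Tool's own code: the stop-word list, the function name tokenize, and the sample sentence are all invented for illustration, and real tools ship far longer lists.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this
# sketch; it is not the default list used by the Topic Modeling Tool).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in",
              "she", "her", "was", "were", "i", "my", "we"}

def tokenize(text, extra_stop_words=()):
    """Lowercase the text, split it into words, and drop stop words."""
    blocked = STOP_WORDS | set(extra_stop_words)
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in blocked]

# Invented sample sentence, not a quotation from any of the three books.
sample = "She worked in the church and dreamed of her mother."

print(tokenize(sample))
# Adding a word to the stop-word list removes it before any topics are built.
print(tokenize(sample, extra_stop_words=["church"]))
```

Leaving the list at its default, as the team did, corresponds to calling tokenize without any extra stop words.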

After working to understand how the system operated, the TMTA team followed Miriam Posner’s blog post (2012), a simpler tutorial than the one by Graham and his co-authors, and experimented with the program’s settings until the team found a configuration that was easier to work with. The team felt that the more topics it generated, the greater the variety of perspectives it gained on the texts. The team also left the “stop words” setting at its default, adding no vocabulary to the stop-word list, since the researchers were not familiar with all of the selected texts and sought to run the topic analyses as similarly to one another as possible. Even so, the program generated different results across trials run with the same settings.

Posner’s blog post further explained specifically how to use the Topic Modeling Tool, and even offered suggestions on how to interpret the sets of topics it generates. However, according to Posner, drawing conclusions from the resulting topics is itself a challenge. Because of its low barrier to entry, the TMT program generates much less coherent results than the MALLET program. For example, in one set of topics, the first “topic” read:

“1. talk mother group criminal hot worked washed church coming dreamed”

At first glance, such a topic appears to be a jumble of words without any pattern. However, Posner suggests treating these “topics” as discourses when attempting to interpret them as coherent data. The next step would then be to present the topics in a visual form that aids such an interpretation. Posner suggests using the generated data to create a spreadsheet that in turn produces a visualization. However, the team found this step difficult, because working with the numbers proved impossible without fully understanding what each topic was truly about. Therefore, the team decided that word association by sight would suffice, hypothesizing that it would still offer meaningful interpretations. In this way, the multiple possible interpretations of each topic model were also being tested, challenging each member to reconsider the discourses in each topic model from a different perspective.

The team began this categorization process by noting which words and what types of words appeared prevalent in every topic. The team then compiled these words into four lists of words that appeared to be strongly associated with one another. The lists were narrowed down to military/politics, economics, violence, and other sentiments and/or emotions that arose during the narrators’ time there. Each category was then distinguished by color: Political/Military (Red), Economic (Green), Violence (Orange), and Semantics (Yellow). The team ran several trials, starting with a few topics and then adding more. With each new topic model generated, we repeated the same categorization process, identifying new topics and discourses that challenged us to approach the texts in a different way. By the end of the experiment, the team concluded that these topics supported an overarching discourse within the three separate texts: a violent conflict erupted and an economy fell into turmoil within countries in Latin America, ultimately forcing these young women to flee.
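The categorization described above was done by sight, but a rough sketch can show what an automated version of the same word-association step might look like. The four category names and colors come from the project; the word lists assigned to each category are hypothetical choices made only for this example, as is the function name categorize.

```python
from collections import Counter

# The category names follow the essay; the word lists are hypothetical
# and were chosen only to illustrate the idea.
CATEGORIES = {
    "Political/Military (Red)": {"army", "soldiers", "government", "group"},
    "Economic (Green)": {"worked", "washed", "money", "land"},
    "Violence (Orange)": {"criminal", "killed", "fear"},
    "Semantics (Yellow)": {"talk", "mother", "church", "dreamed"},
}

def categorize(topic_line):
    """Count how many words of a topic fall into each category list."""
    counts = Counter()
    for word in topic_line.split():
        for category, lexicon in CATEGORIES.items():
            if word in lexicon:
                counts[category] += 1
    # The category with the most matching words labels the topic.
    label = counts.most_common(1)[0][0] if counts else "Uncategorized"
    return label, dict(counts)

# The sample topic quoted earlier in the essay.
topic = "talk mother group criminal hot worked washed church coming dreamed"
label, counts = categorize(topic)
print(label)
print(counts)
```

Because the word lists encode one reader's associations, a different reader's lists would label the same topic differently, which is the interpretive variation the team observed.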

The team focused on whether the topic modeling tool could identify the main motifs and themes, and specifically whether it could reveal what circumstances prompted the women in their respective testimonials to leave their countries. Additionally, each team member interpreted each set of topics differently, which in turn offered different perspectives and created a new discourse about each text. For example, one team member had not read I, Rigoberta Menchú, and simply read the topic model that the other group member had generated for her to interpret on her own. She came to the conclusion that Rigoberta was an orphan from Nicaragua who had to flee the country for mainly economic reasons. This gave a new perspective to the member who was familiar with the text: while she already understood the violence that forced the narrator to flee, she now saw that the economic situation was also an important obstacle the narrator had to overcome as she battled to survive in her own native home. While it is difficult to draw solid conclusions from patterns identified by sight, the experiment showed the team that topic modeling can yield meaningful interpretations and data even from a work about which the reader has little to no knowledge. For this project, the team concluded that the women in these accounts did not simply emigrate from their homes; they fled from poverty and death.

Combined with close reading, and perhaps with a better tool for analyzing the results, topic modeling proves to be an interesting asset for discovering different ways to read a work of nonfiction. After this introductory experiment with the Topic Modeling Tool, the research team could look into using MALLET and experiment with its more sophisticated settings for sorting texts into topics. Furthermore, drawing on more nonfiction texts about women who faced similar experiences and obstacles when migrating away from their Latin American countries would offer a more representative sample for analysis. The project could also have benefited from more team members to offer more interpretations of the topic models generated. One can also argue that the different results generated despite running the same text with the same settings should alert the researchers that the program may not be reliable after all. If the program is meant to analyze the texts objectively and mathematically, why would there be different results each time? However, one must take into account that the system is making guesses about which words go together in a “topic” based on the way each word is used (Graham, 2012). Therefore, there are multiple plausible ways to group words into the topics of a text. In fact, what the experiment showed was that each researcher on the team had a certain bias and individual ideas about immigration from Latin America, a variable that could have been better controlled if better software had organized the topics in a more sophisticated manner.
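The question of why the same text and settings can produce different topics can be made concrete with a toy example. The sketch below is a minimal collapsed Gibbs sampler for LDA, one common way of fitting topic models; it is not the code inside the Topic Modeling Tool or MALLET. The function name lda_topics, the parameter values, and the four short “documents” are assumptions invented for illustration. The point is that the procedure starts from random guesses, so each random seed can lead to a different grouping of words.

```python
import random

def lda_topics(docs, num_topics, seed, iterations=200,
               alpha=0.1, beta=0.01, top_n=3):
    """Toy collapsed Gibbs sampler for LDA; returns top words per topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    word_id = {w: i for i, w in enumerate(vocab)}
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Step 1: give every word a random topic. This is the initial "guess".
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    # Step 2: repeatedly re-guess each word's topic given all the others.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Step 3: report the most frequent words assigned to each topic.
    topics = []
    for t in range(num_topics):
        ranked = sorted(range(vocab_size),
                        key=lambda v: (-topic_word[t][v], vocab[v]))
        topics.append([vocab[v] for v in ranked[:top_n]])
    return topics

# Invented mini-corpus of pre-tokenized "documents", for illustration only.
docs = [
    ["soldiers", "army", "village", "fear", "soldiers"],
    ["work", "wages", "land", "harvest", "work"],
    ["army", "fear", "village", "soldiers"],
    ["land", "wages", "harvest", "work"],
]

for seed in (1, 2, 3):
    print(seed, lda_topics(docs, num_topics=2, seed=seed))
```

Running the function twice with the same seed reproduces the same topics, while changing the seed may reorder the topics or shift words between them. Under this reading, run-to-run variation reflects the random starting point of the method and is not by itself evidence that the program is invalid.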

Regardless, this experiment further supports the usefulness of topic modeling as a form of distant reading, especially when a research project calls for the analysis of multiple texts. Attempting to interpret a fragmented text certainly offered a new kind of perspective in this experiment, and the approach could be applied to other works. In conclusion, the results of this project offer a new perspective for the study of how people think about immigration, and for the relatively new field of “culturomics”, as termed in the study by Jean-Baptiste Michel and his research team (2011). Perhaps this approach could further trace the cultural trends in works related to immigration, and continue the dialogue for more scholarly research that will build a better understanding of the issue.

Works Cited:

Cortez-Davis, Evelyn. December Sky: Beyond My Undocumented Life. Altadena, CA: In Xochitl in Cuicatl Productions, 2005. Print.

Graham, Shawn, Scott Weingart, and Ian Milligan. "Getting Started with Topic Modeling and MALLET." Web log post. The Programming Historian. N.p., 2 Sept. 2012. Web. 7 Dec. 2014.

Michel, Jean-Baptiste, et al. "Quantitative Analysis of Culture Using Millions of Digitized Books." Science 331 (14 Jan. 2011): 176-82. Web. 15 Dec. 2014.

Menchú, Rigoberta, and Elisabeth Burgos-Debray. I, Rigoberta Menchú: An Indian Woman in Guatemala. London: Verso, 1984. Print.

Posner, Miriam, and Andy Wallace. "Very Basic Strategies for Interpreting Results from the Topic Modeling Tool." Miriam Posner's Blog. N.p., 29 Oct. 2012. Web. 14 Dec. 2014.
