This week's readings were useful in describing the tools available for text mining and topic modeling, as well as some of the important considerations in their use. These tools are closely related to the keyword searches we looked at last week, but a little bit more advanced. As with our readings on keyword searches, some of the most valuable elements of this week's readings were less the nuts and bolts of the tools than the issues raised by their use, and the discussion of the need for a proper understanding of their implications and methodology.
As Ted Underwood points out in his article, “Theorizing Research Practices We Forgot to Theorize Twenty Years Ago,” it is important to consider searches and text mining as a “philosophical discourse” rather than just as tools. This means seriously considering how we approach them and understanding their implications, rather than just treating them as a faster route to the same results we would seek with more traditional research methods. A large element of this, as Underwood points out, is that we need to understand how the algorithm behind the search engine produces its results (i.e., what it defines as relevant). Without this understanding, it is very easy to conduct searches that will never surface alternative theses, or to keep trying new searches until ANY thesis eventually produces enough results to be judged “supported.”
A corollary to this is a methodological approach recommended by Underwood as well as by Frederick Gibbs and Daniel Cohen: rather than using text mining to provide evidence for a predefined thesis, it can be used as an open-ended investigation. Through text mining and “distant reading,” a volume of sources that would be impossible to compare using traditional methods can be studied in a way that reveals otherwise undetectable patterns. In Gibbs and Cohen’s words, this method can provide “signposts toward further explanation rather than conclusive evidence.” According to Robert Nelson, similar results can be achieved with topic modeling, which allows digital historians to “detect patterns within not a sampling but the entirety of an archive.”
Finally, the articles (or, more properly, digital projects) of Cameron Blevins and Micki Kaufman demonstrate the ability of text mining and topic modeling, in combination with other digital tools, to provide a visual demonstration of patterns and coherence drawn from a huge amount of data that would be difficult to research or comprehend using more traditional methods.