The central tension in crowdsourcing history, apparent in almost all of this week’s readings, is the issue of access and authority. This tension is not limited to digital history, as the pre-digital practice of oral history has long noted the importance of anonymity in collecting people’s stories. While historians always prefer to know as much as possible about the contributors of their sources in order to facilitate analysis (an interview with a labor union leader will be read and evaluated much differently than one with a factory manager or an average laborer), many contributors to both oral history and digitally collected history are less willing to participate or be fully honest if their name and identifying information are attached to their interview or contribution. If anything this is even more true for digital history, as most internet users have a higher expectation of privacy and anonymity than the participants in an oral history project. To this is added the convenience factor: the more data or effort required to submit a contribution, the fewer visitors will be willing to contribute. Digital historians, like oral historians, must therefore balance their desire for metadata with their desire for a higher level of contributions.

The question of access goes beyond this debate between data and anonymity, however, as many crowdsourced digital history projects involve the everyday user not just in the contribution but also in the editing and writing of history. The most prominent example of this is Wikipedia, but it can also be found in projects such as the Transcribe Bentham project, which allows all comers to help transcribe and digitize the works of Jeremy Bentham. Wikipedia especially has been contentious for historians, as it values consensus over expertise and suffers from uneven coverage, since its content is driven mainly by what people are interested in rather than by a balanced view of history, and from bias, despite its avowal of a neutral point of view. Despite the issues raised by allowing all comers to contribute and edit articles, it is impossible to deny that it has provided a much more massive corpus of openly available work than would be possible with a more rigorous, expertise-focused approach (see the relative failure of Nupedia detailed by Roy Rosenzweig).

Another issue with Wikipedia, pointed out by Leslie Madsen-Brooks, is the unrepresentative nature of its contributors, who are predominantly male, internet-savvy, and English-speaking. Thus, even while Wikipedia’s format and process arguably allow for more varied viewpoints than traditional history, it is a potential that has been stymied by the biases of the contributing population. This reflects a second issue of access: even when completely open access is allowed, it is shaped largely by the project’s ability to attract contributors (this is true of all projects, not just Wikipedia). This is a critical part of any digital collection project, and requires what Roy Rosenzweig and Dan Cohen have described as “magnet content” to bring contributors to the site and convince them to participate. Perhaps the greatest magnet content is other contributions, but this can be supplemented by direct contact through email, media coverage and marketing, or social networking. As Sheila A. Brennan and T. Mills Kelly have pointed out, this can also involve outreach to relevant groups in the offline “analog world,” and special attention should be paid to providing avenues for individuals without internet access or skills to contribute (the Hurricane Digital Memory Bank used both voicemail and mail-in reply cards).

The issue of access and authority will thus overlay many of the decisions made by digital historians seeking to crowdsource history, and indeed this tension is a central theme in debates over almost all aspects of the “democratizing” influence of digital history. However, as access is one of the central benefits offered by the digital humanities, limiting it in the interest of greater academic authority must be a carefully considered decision.

Building an exhibit with Omeka this week demonstrated some of the same patterns that were apparent last week when working with Google Maps Engine. While I already had data on my computer from a previous research project on the U.S. Army Ordnance Department’s treatment of breech-loading arms in the Civil War, the most time-consuming portion of the entire exercise was going through the hundreds of images and finding the ones that would be useful in telling a story in exhibit form.

The next most time-consuming part was entering the metadata for each item, which combined several decisions about how to categorize it with the drudgery of entering it. I found myself wishing there was a way to add metadata to a group of items rather than doing it individually for each one, as I found that, other than title and description, most entries remained the same for each item. Working with a set of Ordnance Department records and correspondence from the National Archives (no copyright concerns!), I decided to list “U.S. Army Ordnance Department” as the creator and use the date of each individual record for the date. I listed the record group under source to help users locate the originals if they wish, but did not go the extra step of listing the full data I used in my record keeping under “identifier,” as it does not conform to a standardized system. For the title of each item I used the form I would use to cite it in a scholarly work, leaving a more detailed description for the captions.
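The batch entry wished for above could be approximated outside the web form. What follows is only a minimal sketch, assuming the items can be prepared as a spreadsheet and brought in through some bulk import route (for example a CSV import plugin). The column names, item titles, dates, and the record group text are all placeholders invented for illustration, not the actual exhibit data.

```python
import csv
import io

# Fields that stay the same for every item. The creator comes from the post;
# the source text is a placeholder, not the real record group citation.
SHARED = {
    "Creator": "U.S. Army Ordnance Department",
    "Source": "National Archives, record group (placeholder)",
}

# Hypothetical items: only title, description, and date vary per item.
ITEMS = [
    {"Title": "Example letter A", "Description": "Placeholder caption", "Date": "1861-01-01"},
    {"Title": "Example letter B", "Description": "Placeholder caption", "Date": "1864-01-01"},
]

# Assumed column order for the import spreadsheet.
FIELDS = ["Title", "Description", "Creator", "Date", "Source"]


def build_rows(items, shared):
    """Merge the shared metadata into each item's own fields."""
    return [{**shared, **item} for item in items]


def to_csv(rows, fields=FIELDS):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(build_rows(ITEMS, SHARED)))
```

The shared values are typed once and copied into every row, so only the fields that genuinely differ need individual attention.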

I chose gallery as the layout for my exhibit, as it seemed to make the most sense for a simple exhibit like this. I organized the items chronologically, as that seemed the most reasonable approach given the exhibit’s purpose of showing change over time and the contrast between Ripley and Dyer.

Below is the link for my Omeka exhibit on the Ordnance Department and Breech-loaders during the Civil War:

Ordnance Department and the Civil War

From this week’s readings it seems that the digitization of history has been more openly welcomed by public historians than by their colleagues in academia. While Anne Lindsay notes that many public history institutions have been driven to the web out of necessity as visitors increasingly see the web as a first stop for information, it is a necessity that is less powerful in academia, and it has driven a fuller engagement with the digital world by public historians and heritage tourism sites. This has had a significant impact on these heritage sites and organizations, as their expansion into the virtual world has opened up public history to those who, for financial or other reasons, cannot make it to the physical sites. Because some of these visitors are kept away for non-financial reasons, this has also opened up a wider population of potential donors. The creation of virtual tourism, however, is not the only role for the digitization of public history, as Lindsay points out that this digital experience must be harmonized with the narrative of the physical site as well, allowing each to reinforce the other and build a connection with visitors. Additionally, and tying back to many of the articles from last week’s readings, there is the advantage offered to public history by the scale and accessibility of a digital presence. Unlike a physical site, the web is not restricted by space and, as it is easier to revisit than a physical museum, faces much less restriction in time as well.

Another idea, less explicitly stated by Lindsay, is developed more fully by Melissa Terras and Tim Sherratt. This is the role of social media in driving access to specific parts of collections and digitized sources. While some of this is intentional, such as the increasing use of bots to draw attention to random entries from various collections, much more is the result of individual user decisions and links. This can have something of a skewing effect, as Sherratt points out that Trove’s visits spiked due to a link to a specific article from Reddit. Even more worrisome are the users who data mine these digital collections for evidence to support an already held opinion, then share that data without context. This type of visit is also fickle, as Terras points out that these spikes in interest are often short-lived. Visitors navigating to these sites also rarely engage further with the content available, with Sherratt pointing out that only 3% of visitors linked in from Reddit explored Trove’s holdings beyond that one article. However, as Sherratt argues, “3% of a lot is still a lot,” and for at least some people this might have opened up a greater engagement with history. A point left unmade in either of these articles is also perhaps important: surely the original user who shared these articles and images spent significant time engaging with the site, and sharing them on social media or Reddit both expands the exposure of public history and demonstrates the interest it held for that user.

For this week’s practicum I created a map using Google Maps Engine showing the campaigns of the 1st Michigan Cavalry Regiment. The main takeaway, for me, was the sheer level of drudgery involved in a project like this. Using as my base data the service record of the regiment provided on the National Park Service site, I input every service entry for the regiment from January 1864 through their mustering out in 1866. Despite the user-friendly interface of Google Maps Engine, this was a lot of work for two reasons. First, without an importable CSV file there was the pure data entry of putting in each of the events. In addition, the data was not “clean,” and I had to spend a substantial amount of time trying to find locations for each of these events, as not all came up in a simple Google Maps search. I had some success finding locations using Google and Wikipedia (Wikipedia was especially useful, as the entries for several, but not all, of the battles had lat/long data that could be easily pasted into the map engine), but even my end result is still only a partial solution, as finding the exact location of each of these events would require weeks of research.
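Had the service record been available as structured data, the two chores described above (data entry and hunting for locations) might have been separated. The sketch below is only an illustration under assumptions: the column names are my own choice, the events and coordinates are made-up placeholders rather than entries from the NPS record, and it assumes the mapping tool accepts a CSV with latitude and longitude columns.

```python
import csv
import io

# Assumed column layout for an importable file.
FIELDS = ["name", "date", "latitude", "longitude"]

# Hypothetical entries; None marks a location that still has to be researched.
EVENTS = [
    {"name": "Example battle", "date": "1864-05-01", "latitude": 38.0, "longitude": -78.0},
    {"name": "Example expedition", "date": "1864-06-01", "latitude": None, "longitude": None},
]


def split_by_location(events):
    """Separate events that already have coordinates from those that do not."""
    located, missing = [], []
    for event in events:
        has_coords = event["latitude"] is not None and event["longitude"] is not None
        (located if has_coords else missing).append(event)
    return located, missing


def events_to_csv(events, fields=FIELDS):
    """Serialize events to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(events)
    return buffer.getvalue()


if __name__ == "__main__":
    located, missing = split_by_location(EVENTS)
    print(events_to_csv(located))
    print("Still need locations for:", [e["name"] for e in missing])
```

Splitting the list this way makes the "unclean" portion of the data explicit, so the remaining research can be tracked rather than discovered one failed search at a time.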

For my map, I created layers for each year that can be displayed together or independently. Generally speaking, I marked each battle with a flag icon, each non-battle military event (“expedition,” “reconnaissance,” “demonstration,” etc.) with a horse icon, and occupations or encampments with a house symbol. The Grand Review got its own icon of two men walking in step (intended as hikers, but it worked fine for my purpose). For this classification, I assumed that anything not labeled otherwise was a battle. Movements with specified start and end points are shown as lines (such as the “Movement to Fort Leavenworth”), and I included two polygons, for Sheridan’s Shenandoah Campaign and the Expedition into Loudoun and Fauquier Counties, to show the rough area of operations. Each icon is named using the title used by the NPS, and the description contains the date of the operation. This allows viewers to toggle between the two using the label dropdown menu.
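The classification rule described above can be written down explicitly. This is a rough sketch rather than the procedure actually followed (the icons were assigned by hand): the keyword lists are assumptions, the “etc.” in the list of non-battle events means the real list was longer, and the icon labels are invented names.

```python
# Keywords assumed to mark non-battle military events (incomplete by design).
NON_BATTLE = ("expedition", "reconnaissance", "demonstration")

# Keywords assumed to mark occupations or encampments.
ENCAMPMENT = ("occupation", "encampment")


def classify(title):
    """Return an illustrative marker type for a service-record title."""
    text = title.lower()
    if "grand review" in text:
        return "grand_review"  # the one-off marching icon
    if text.startswith("movement"):
        return "line"  # movements with start and end points are drawn as lines
    if any(keyword in text for keyword in NON_BATTLE):
        return "horse"
    if any(keyword in text for keyword in ENCAMPMENT):
        return "house"
    return "flag"  # default assumption: anything not labeled otherwise is a battle


if __name__ == "__main__":
    for title in ["Movement to Fort Leavenworth", "Grand Review", "Example Station"]:
        print(title, "->", classify(title))
```

Writing the rule out also exposes its weak point: the default case silently treats every unlabeled entry as a battle, which is exactly the assumption noted above.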

 

 

Two aspects of this week’s readings that stood out as particularly illuminating for the possibilities provided by digital history were the concept of an “interactive scholarly work” and the importance of scale. The idea of an interactive scholarly work has been inherent in some of the other tools we’ve looked at, but it is especially salient in mapping projects. An interactive scholarly work is more than just a static display of visual information; it allows users to interact with the material and develop their own research agenda. In some cases this can produce citable evidence, but like many other digital tools it is often best used to raise questions for further research or exploration.

The projects we looked at in this week’s readings, “Visualizing Emancipation,” “ORBIS,” and “Digital Harlem,” all allow the user to interact with and display various data and connections, using layered searches to display relationships that would be difficult to pick out through traditional means or the spatialization provided by map images. The better interactive scholarly works also tie their digital presentation closely to rigorous scholarly research. ORBIS provides all of its background data and sources, making it “not just a site, but also an online scholarly presentation” according to Stuart Dunn. Digital Harlem supplements its interactive map with blog posts that explore various connections and ideas that the map reveals. This, combined with several longer articles published in connection to the program, allows Digital Harlem to “bridge the gap between digital and more traditional research” according to Nicholas Grant.

Another thread running through this week’s readings is the ability of these digital mapping projects to convey scale in a way that is impossible in print media. Edward Ayers and Scott Nesbit discuss this in connection with the concept of “deep contingency,” where different aspects of social life interacted in unpredictable ways across various scales (local, regional, national, military, etc.) to affect individual actions and decisions.

Digital Harlem also deals closely with the effects of scale: mapping black life (and white presence) on real estate maps, at a level of detail well beyond that typically described in text, reveals new patterns and changes the way Harlem looks. Working at this finer scale and including ALL available evidence provides a deeper and different picture. This picture is inherently digital, as it occurs at a scale that would be impossible to convey in print, and which can only be fully explored interactively with the ability to zoom in and out.

Interactivity and scale, therefore, are essential to digital mapping projects. The data available, both in amount and complexity, make it impossible to display them statically. Their full potential can only be unlocked through the digital medium and through an interactive user interface. However, tying this digitized and democratized history back to its scholarly background is key to establishing both the credibility of these tools and their usefulness for further research. The best digital mapping projects, including those we looked at this week, therefore represent both interactive websites and online scholarly publications.

 

Working with several of the open-source visualization tools available allowed me to see some of the possibilities visualization provides for mapping data and exposing connections. The data we worked with, units and battles of the Civil War, was relatively simple, but even with this small sample set and uncomplicated relationships the visualizations helped reveal connections faster than studying a table of raw data.

However, one thing I realized even from this limited project is that the compilation and organization of that raw data is the biggest part of any visualization project. Both Palladio and RAW provide fairly user-friendly interfaces for uploading data, but that data has to be properly formatted for the program (and most visualization tools have different formatting requirements). Besides the formatting, which can become fairly obnoxious in and of itself (Gephi especially requires some pretty extensive work to get data properly input), a digital historian first has to compile the data itself, which, unless your professor is kind enough to provide it to you already organized and formatted, can take a significant amount of time and effort. Given the relative accessibility of Palladio and RAW, any visualization project will likely involve far more time spent compiling and organizing data than interfacing with the actual visualization tools.
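To give a sense of what this reformatting involves, here is a small sketch that reshapes a unit-by-battle table into a two-column edge list. It rests on assumptions: the unit and battle names are placeholders rather than the class data set, and the “Source” and “Target” headers follow the convention Gephi’s spreadsheet import looks for in an edge table; other tools may expect a different layout.

```python
import csv
import io

# Hypothetical input: each unit mapped to the battles it fought in.
UNIT_BATTLES = {
    "Unit A": ["Battle X", "Battle Y"],
    "Unit B": ["Battle Y"],
}


def to_edge_rows(unit_battles):
    """Flatten the unit -> battles mapping into (unit, battle) pairs."""
    return [
        (unit, battle)
        for unit, battles in unit_battles.items()
        for battle in battles
    ]


def edges_to_csv(rows):
    """Write the pairs as CSV text under Source/Target headers."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(edges_to_csv(to_edge_rows(UNIT_BATTLES)))
```

The transformation itself is a few lines; the time goes into assembling a clean version of the input table in the first place.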

Uploading the data into Palladio should be quick and easy, but since I’m a Windows user it wasn’t. Originally the data uploaded fine, but wouldn’t display in the graph screen. I was able to get it to work by switching from Internet Explorer to Google Chrome.

Palladio

I thought the Palladio visualization was the most intuitive in showing both the connections between multiple units that saw service in the same battle and the relative amount of combat seen by the various regiments. There aren’t as many options in Palladio as in RAW, at least not for the limited data set we’re working with, although the addition of latitude and longitude would allow us to map the locations of the battles these units fought in.
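The two things the graph makes visible, how much combat each unit saw and which units served in the same battles, can also be computed directly from the table. This is a sketch with placeholder names, not the class data set, and it is not how Palladio itself works internally.

```python
from collections import Counter
from itertools import combinations

# Hypothetical (unit, battle) participation pairs.
PARTICIPATION = [
    ("Unit A", "Battle X"),
    ("Unit A", "Battle Y"),
    ("Unit B", "Battle Y"),
    ("Unit C", "Battle Y"),
]


def battles_per_unit(pairs):
    """Count the distinct battles each unit took part in."""
    return Counter(unit for unit, _ in set(pairs))


def shared_battles(pairs):
    """Count, for each pair of units, the battles they both fought in."""
    units_by_battle = {}
    for unit, battle in set(pairs):
        units_by_battle.setdefault(battle, set()).add(unit)
    shared = Counter()
    for units in units_by_battle.values():
        for first, second in combinations(sorted(units), 2):
            shared[(first, second)] += 1
    return shared


if __name__ == "__main__":
    print(battles_per_unit(PARTICIPATION))
    print(shared_battles(PARTICIPATION))
```

The counts carry the same information as the graph, but the graph lets the eye find the heavily connected units without reading a table.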

RAW allows more options for visualization, but once again, with this limited data set only a few are useful. The ones that worked best were Alluvial Diagram, Circle Packing, Cluster Dendrogram, and Circular Dendrogram.

Alluvial Diagram:

Alluvial

Circle Packing:

Circle Packing

 

Cluster Dendrogram:

Cluster Dendrogram

Circular Dendrogram:

Circular Dendrogram

Of these, the Alluvial Diagram does the best job of illustrating overlapping battles, while the other three are only really useful in highlighting which units participated in more battles. Still, RAW provides more visualization options than Palladio.

Finally, there is Gephi. Gephi may be the exception to the rule I started this post with, that data compilation and organization is the most time-consuming part of visualization. With Gephi, figuring out Gephi is the longest part. I was actually never able to get it to fully work, despite having the data already set up and the very helpful tutorial provided by Elena Friot.

While not made as explicitly as in the readings on text mining last week, an important theme of this week’s articles on visualization and networks is the dual role of these tools. While these tools are usually used to provide a visualization of data, they can also be used in an exploratory mode to reveal connections not normally apparent, and thus serve as a starting point for inquiry rather than merely as evidence for an argument or a conclusion. Like many of the other tools we have learned about, they also have flaws that must be accounted for if they are not to lead the novice digital historian astray.

To start with the traditional use of networks, I think David Staley’s two-element definition of visual images, quoted by John Theibault in his article on “Visualization and Historical Argument,” is an extremely simple and useful way to think about the use of these networked images. Visual images can be a stand-alone organization of meaningful information, much like the digital history projects on the Houston Daily Post and Kissinger’s memcons and telcons, or this week’s “Mapping the Republic of Letters.” Alternatively, and this is perhaps the use we as historians are most familiar with, they can be employed as a supplement to written accounts to further bolster textual arguments or provide additional evidence. Both of these uses are primarily for the purpose of displaying data, but while the stand-alone visualizations in our readings have been finished projects, it is easy to imagine how they could also be used to expose new questions and paths of inquiry.

Theibault does a valuable job of exploring the possibilities for visualization opened up by the increased use of digital tools, but one point he made struck me as especially surprising, and as a good illustration of the radically expansive possibilities and democratization made possible by new media. This was the simple point that you can use color extensively in digital history, while it is prohibitively expensive in print media. This seems like a minor point, but when one considers how important color is in visual images, it clearly illustrates how even such small factors create large changes in the new world of digital media.

This does not mean, however, that visual networks and other visual tools are without their pitfalls. As Johanna Drucker argues persuasively, we have an innate tendency to accept these images as substantiated fact with their own intrinsic proof in a way we would never do for a textual argument. Instead, we must consider visual images as we do interpretive arguments: by evaluating the evidence and methodology that underlie them to make our own determination of whether the end result is supported and convincing. To this danger, Scott Weingart adds several additional criticisms. For Weingart, “network structures are deceitful.” They must be evaluated closely to ensure that what historians are attempting to show by applying their data to a network structure matches the network structure used. Central to this are networks’ lack of memory (i.e., they can only show connections, not how those connections were used) and their difficulty showing multimodality. Weingart’s final call is for historians to ensure these visual networks are employed only when appropriate, going back to Staley’s second definition of using visual images to supplement written arguments.

While these criticisms must be considered, none of these historians is calling for the abandonment of the use of visual networks. Rather, like the other tools we have learned about, they must be approached with a complete understanding of their methodology and implications in order to be properly applied. One component of this, not as explicitly covered in the readings, is the ability to use these visualizations as a research tool rather than simply as evidence or a final product.