Topic Modeling and the Margaret Sanger Papers

Posted on April 3, 2013 by Claire Payton

Cathy Hajo, my former advisor at the Margaret Sanger Papers at NYU, recently wrote a fascinating blog post on topic modeling and its applicability to the humanities. It’s exciting to learn about other scholars’ engagement with digital technologies! Maybe we could bring some of these folks to Duke to participate in whatever Digital History Speakers series we put together next year!

With Cathy’s permission, I have re-blogged her piece:

Margaret Sanger

I recently attended the Women’s History in the Digital World conference, sponsored by Bryn Mawr College’s Albert M. Greenfield Digital Center for the History of Women’s Education. The sessions were packed with great papers and projects, many of which started the wheels turning on different ways that we might use digital research tools to better understand Sanger and her ideas.

In the very first panel I attended, Bridget Baird of Connecticut College and Cameron Blevins of Stanford University talked about topic modeling, the process of using a computer program to mine digital texts and build sets of words that frequently appear together. Their work compared the diaries of Martha Ballard and Elizabeth Drinker. The women lived about a century apart and in very different conditions, so there was an expectation that their diaries would describe very different lives. The sample comparisons shown at the panel demonstrated both similarity in word usage and contrasts that reflected differences in social class, location, and time period.

A visualization of gardening terms by month in the Ballard diary.

What topic modeling can offer a historian is an objective snapshot of the content of the collection. Rather than relying on our own readings of documents to combine them together into subject categories, we look instead to the words that appear together most frequently and then label those words in ways that make sense to us. In the case of Martha Ballard, one cluster of words (birth deld safe morn receivd calld left cleverly pm labour fine reward arivd infant expected recd shee born patient) clearly related to her profession as a midwife. Others, regarding gardening (see image above), fall into predictable seasonal patterns. Still other groupings of words are less easy to label, and some may not at first make any cohesive sense. Yet, we can study the frequencies with which certain groups of words occur.
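
To make the idea of "words that appear together" a little more concrete, here is a minimal sketch in Python. It is not what MALLET does (MALLET fits a probabilistic topic model); it only counts how often pairs of words turn up in the same entry, which is the raw signal a topic model builds on. The diary entries and the short stopword list are invented for illustration and are not taken from the Ballard diary.

```python
from collections import Counter
from itertools import combinations

# Hypothetical diary entries, invented for illustration only.
entries = [
    "calld to a birth infant born safe",
    "birth deld of an infant safe at morn",
    "planted beans and squash in the garden",
    "sett beans in the garden after rain",
]

# A tiny stopword list (an assumption; real projects use much longer ones).
STOPWORDS = {"a", "an", "and", "at", "in", "of", "the", "to", "after"}

def cooccurrence_counts(texts):
    """Count how many entries each unordered word pair appears in together."""
    pairs = Counter()
    for text in texts:
        # Use each distinct word once per entry, sorted so that
        # (birth, infant) and (infant, birth) are counted as the same pair.
        words = sorted(set(text.lower().split()) - STOPWORDS)
        pairs.update(combinations(words, 2))
    return pairs

pair_counts = cooccurrence_counts(entries)

# Show the pairs that share the most entries.
for (w1, w2), n in pair_counts.most_common(5):
    print(w1, w2, n)
```

Even on a toy sample like this, the midwifery words and the gardening words pair up with each other and not across groups. Labeling those groups is still the historian's job.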

We cannot rely only on the computer-driven groups to use in analyzing texts. The next step is to look at the texts that contain repeating word patterns and conduct a close reading to see what we can learn about the topic. Plotting the topic over time enables us to locate trends in how important the topic was to the author, or when we compare them with other authors, we can investigate differences in the ways that two authors valued these topics or the different ways that they expressed themselves.

An example from the Ballard study is instructive, as Cameron Blevins discussed in his blog:

. . . topic modeling allows us a glimpse not only into Martha’s tangible world (such as weather or housework topics), but also into her abstract world. One topic in particular leaped out at me:

feel husband unwel warm feeble felt god great fatagud fatagued thro life time year dear rose famely bu good

The most descriptive label I could assign this topic would be EMOTION – a tricky and elusive concept for humans to analyze, much less computers. Yet MALLET did a largely impressive job in identifying when Ballard was discussing her emotional state. How does this topic appear over the course of the diary?

Like the housework topic, there is a broad increase over time. In this chart, the sharp changes are quite revealing. In particular, we see Martha more than double her use of EMOTION words between 1803 and 1804. What exactly was going on in her life at this time? Quite a bit. Her husband was imprisoned for debt and her son was indicted by a grand jury for fraud, causing a cascade effect on Martha’s own life – all of which Ulrich describes as “the family tumults of 1804-1805.” (285) Little wonder that Ballard increasingly invoked “God” or felt “fatagued” during this period.

Adopting topic modeling tools for the Sanger Papers’ Speeches and Articles project will be interesting as we have already spent a lot of time developing and affixing detailed subject terms to the texts in order to provide additional ways to search and display them. When you have over 600 speeches and articles, the vast majority of which discuss birth control, the trick is uncovering subtle differences between and among them. We create detailed index entries for each text in the edition, narrowing the focus in so that our readers can use the subjects to cut through the documents to find the best ones on a specific issue. Topic modeling can offer us some new groupings of documents that we might have overlooked, and it will give us the capacity to analyze Sanger’s rhetoric over time, looking for key changes.

An example might be the belief among women’s historians that Sanger abandoned her feminist rationales for birth control in the late 1910s and early 1920s as she sought support from experts in the fields of medicine, social work and eugenics. This comes from a qualitative reading of Sanger’s writings, not a strict quantitative one. If we can identify a cluster of words as “feminist,” we can then trace how frequently those words appeared in Sanger’s writings and whether the findings match our assumptions.
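
A rough sketch of that kind of tracing, in Python, might look like the following. Everything in it is an assumption made for illustration: the word cluster, the years, and the sample sentences are invented and are not drawn from the Sanger Papers, and a real project would start from clusters produced by a topic modeling tool instead of a hand-picked list.

```python
from collections import defaultdict

# Hypothetical word cluster and documents, invented for illustration;
# none of this text comes from the Sanger Papers.
CLUSTER = {"freedom", "rights", "motherhood", "voluntary"}

documents = [
    (1914, "voluntary motherhood is the freedom of every woman"),
    (1914, "women must claim their rights"),
    (1925, "clinics and doctors should guide the family"),
    (1925, "the freedom to plan a family rests on medical advice"),
]

def cluster_share_by_year(docs, cluster):
    """Return {year: fraction of that year's words found in the cluster}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in docs:
        words = text.lower().split()
        totals[year] += len(words)
        hits[year] += sum(1 for w in words if w in cluster)
    return {year: hits[year] / totals[year] for year in sorted(totals)}

shares = cluster_share_by_year(documents, CLUSTER)

# Print the share of cluster words for each year, oldest first.
for year, share in shares.items():
    print(year, round(share, 3))
```

A falling share across the years would be consistent with the historians' reading, though it would only be a prompt to go back and read the documents closely, not a conclusion in itself.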

Will we find clusters of words we can describe with terms like “feminism,” “eugenics,” or “reproductive health”? What words will we find clumped with “abortion” or with “birth control”? Will we be able to trace these clusters over time to see how they change over the course of Sanger’s life? Interesting questions, and ones that we hope to be able to ask our digital edition.

Now just to find a programmer to work with!

DEVONThink Bootcamp

Posted on March 29, 2013 by gpw4

At our first digital tools Bootcamp, I shared some of DEVONThink’s functions and how I use it to organize my dissertation research. The audience had a few people who already used Evernote and/or DEVONThink, so we had a great exchange about what these programs are useful for as well as their limitations.

The primary advantage to DEVONThink is that you can compile, view, tag, organize and search any kind of document in one “database.” So, if you have a project with photographs of archival documents, video, articles clipped from the web, audio files, PDFs, or Word documents, any of it can be imported into your database. You can also create annotations or notes that attach to files, or create a new file (a text document) within DEVONThink. Like Evernote (though in my opinion, not as seamlessly or intuitively), you can clip things from the web as you surf. DEVONThink also has some “intelligent” functions that help you find related terms and files.

After a summer research trip to archives in Alaska, I had tens of thousands of photographs and no manageable way to deal with them. I imported them all to DEVONThink, renamed the files, merged files that were of a single source, and began the process of organizing them in a way that makes sense for my project. I take notes on sources within DEVONThink, and even make citations for each source as I go.

In the discussion, we talked about the tension between being able to amass material and being able to meaningfully navigate it. People asked specific questions about how to do things, and we compared the ways we use the program. From what people shared, it seems DEVONThink is pretty adaptable. You can customize it to fit the needs of your sources, discipline, and way of writing and researching.

“An Avalanche is Coming”

Posted on March 27, 2013 by Claire Payton

This is an essay that is floating around right now in the context of Duke’s decision to join a consortium of schools that will accept credit for undergraduate online courses offered by one of the member schools. More on that soon.

The essay, called “An avalanche is coming,” seems a little inflammatory to me. One of the promotional quotes on the website is:

‘Our belief is that deep, radical and urgent transformation is required in higher education as much as it is in school systems. Our fear is that, perhaps as a result of complacency, caution or anxiety, or a combination of all three, the pace of change is too slow and the nature of change too incremental.’

I completely agree that the environment we are working in is changing, but the tone of the piece is trying to bully people. Its basic stance is that even though we haven’t spent much time thinking about the short and long term consequences, everyone should jump onboard with their agenda or else they will be wiped out. What’s the rush? What’s gonna happen if we don’t completely transform academic pedagogy immediately? Have we really been suffering so much before now? Most of the leaders in this debate are themselves products of traditional elite university education. Have they been handicapped because of it? Can we talk about the digital turn without insulting people who still value the engaged pedagogy of a liberal education?

One of the big complaints in the piece is that the student consumer is “king” now and they rule with their money. But students aren’t getting the most for their money in universities that spend a bunch on research and influential scholars, since that supposedly doesn’t influence their learning. The focus needs to be on good teaching. So instead of funding more professors so the class sizes can be smaller, they argue that we should fund fewer professors and have students all over the world learn online from a few good teachers.

Why is Duke rushing to join this online courses consortium without running it through the traditional channels of faculty governance? Are they afraid that the faculty will shut the project down? Someone suggested to me today that Duke is rushing because they want to become one of the dominant schools that can then sell its classes to smaller and poorer universities, whose junior faculty and adjuncts will be out of luck.

I’m looking forward to our meeting this week so I can hear y’all’s thoughts on these issues.

A few thoughts on Evernote

Posted on March 27, 2013 by Claire Payton

The first event in our “Digital Tools Bootcamp” series was a success.
I was really pleased that I was able to share Evernote with people who
hadn’t already been introduced to it. Evernote, for those of you that
don’t know, is a cloud-based data organizing program. For the free
version there is a monthly upload limit, but I don’t think I have ever
come close to reaching it. You have to be connected to the internet to
use it, but you can upgrade to the pay version if you plan on needing
it offline.

The organizing principle of Evernote is “notes” (text, images, PDFs,
audio), which you organize into “notebooks” (Anthro class, Haiti
research, wedding planning). These can be organized further into
“notebook stacks” (school, event planning, teaching and research).
Right now I use it mostly for keeping track of all my class notes and
readings, as well as all the events and projects I am working on. The
whole database as well as individual notebooks are searchable, and
you can also use tags to label individual notes and search them that
way as well.

One of the coolest and most innovative features of Evernote, in my
opinion, is the seamless way a “web clipper” integrates into my
browser (I use Chrome). When I am doing research, looking at recipes,
or whatever, and I find something that I want to save and keep for
later, I can click the Evernote add-on in my browser and a little
window pops up, allowing me to save either a selection, the article,
or the full page of the website, and decide which notebook I want
to store it in. Then the webpage is stored in my Evernote account.
Even if the original website is removed, the clipping in my account
will remain.

I hope you get a chance to explore Evernote and decide if it is a
useful tool for you. I know it has made a big difference in the way I
organize my notes and my research.

Taming the Elephant

Posted on March 22, 2013 by Ashley Young

Our workshop on Evernote and DEVONthink is going really well! I came across this interesting article on how to use the basic elements of Evernote to suit specific goals for the program etc: “Taming the Elephant.”

“Out of the box, Evernote comes with some pretty robust syncing tools for all your note-taking needs. If you haven’t dug in to all Evernote can do, though, you might not be aware of everything on offer or just how well you can integrate Evernote into your workflow. From automation to advanced searches, we’re going to make Evernote start working harder for you.”

Transcriva Review

Posted on March 19, 2013 by Claire Payton

I’ve been getting increasingly interested in oral history over the past few years, and I recently discovered a program that makes working with oral histories SO MUCH EASIER! It’s called Transcriva. I’m amazed that I ever tried to use any other system for listening to and transcribing, annotating, or taking notes on interviews. I was a little wary at first, because it costs $30.00, but they let you play around with the very limited free version and I decided to spring for it because I was so frustrated with the inadequacies of the system I was using before (word processor + iTunes = no fun). I’m very glad I did. Disclaimer: I think it might only be available for Macs.

It’s not a complicated program, so I was able to start using it quickly. To begin with, you link the transcript file to an audio file on your computer or hard drive (or online, but I haven’t tried that yet). Then you can assign different speakers, who each have their own shortcut keys to facilitate switching between speakers. You make new entries in the transcript by pressing “enter” or by using a new speaker’s shortcut key. There are also shortcut keys to move backward or forward or to pause the audio. You can slow down or speed up the listening speed. And then, once you have transcribed or annotated your stuff, you can easily jump to different points in the interview in both the audio and the text. Overall it just streamlines the whole process and makes it much quicker, more intuitive and more manageable. I don’t dread transcription as much as I used to. It’s even kind of fun now.

There are other features I haven’t taken advantage of yet. Apparently you can record audio files directly into Transcriva, and you can link your transcripts to online content as well. You can also export the transcripts as text files, preserving the speakers and the timestamps of your notes.

In sum, if you are working with audio files or interviews for your research I would strongly suggest you check out Transcriva!

A Potential Partnership Down 15-501

Posted on March 19, 2013 by Ashley Young

I recently came across the website for UNC-CH’s Institute for the Arts and Humanities’ Digital Arts and Humanities Initiative. This may be a great organization to become involved with to build a relationship between Duke and UNC in the realm of the Digital Humanities. You can follow the IAH on Twitter.

March Madness, Facebook data mining, and Mapping

Posted on March 19, 2013 by Ashley Young

A good friend of mine sent me this article about a creative use of Facebook data mining and visualization of basketball fan loyalties (a very appropriate project in the midst of March Madness). Arguably, projects like this visualization of “fandom” are a good way to introduce students to the concepts of data mining and mapping, putting into motion the cogs and coils of inspiration for mapping projects related to change over time. First step: Facebook and fandom. Second step: Wikipedia and the historical record. Third step: independent research project. The possibilities are endless!

Check out this image of Duke and North Carolina loyalties:

Visual Complexity

Posted on March 5, 2013 by Claire Payton

I’m at a PhD Lab event and the speaker, Ann Pendleton-Jullian, just mentioned this awesome-looking website, Visual Complexity.

It’s a site about different ways to visualize data. It’s about how you frame what you think is most important in the information. Think about that famous information map that links the temperature to Napoleon’s retreat from Russia.

Digital Tools Boot Camp

Posted on February 25, 2013 by Claire Payton

We’ve organized a lunch series for grad students to help folks learn about digital tools for research and teaching. Hope you’ll be able to join us!

Digital History Working Group

at Duke University