Playing with visual text analysis using Voyant

As I’ve started to dip my toes into the DH current, one thing I’ve been excited to play with is visual presentations of text analysis. Until now I hadn’t had a strong need for it, but with the approaching SCI survey of alt-academics and the analysis it will entail, I finally have a good reason to start exploring what’s out there.

The first tool I’ve checked out is Voyant (developed by Stéfan Sinclair and Geoffrey Rockwell as part of their project), which allows you to upload a document, point to a URL, or copy text; it can analyze a single document or a corpus. I uploaded my dissertation as a sample and, after stripping out articles and such (which the tool makes very easy), I got a nifty word cloud:

Below it, Voyant displays a list of words by frequency. Checking boxes next to one or more words gives a distribution of word appearance in the document or corpus. Here are three commonly appearing words charted through the diss:

I found it interesting to see that while I clearly used the word “trauma” a ton, the places where it appeared the most were in the intro and conclusion–suggesting that I relied on the term when I was pulling my argument together, but much less in the actual analysis. A section below the chart shows the context of the selected words in a table that can be sorted in a variety of ways. All the data in each section can be exported in a number of formats, too, for use in other sites or documents. (More than ever, I’m feeling pinched by my blog’s current host, which doesn’t support things like iframes; I hope to get a more flexible set-up going before too long.)
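For the curious, the three views described above (frequency list, distribution chart, context table) all come down to fairly simple counting. What follows is a rough sketch in Python, not Voyant's own code, and it makes no claim to match how Voyant works internally. The function names, the tiny stopword list, and the idea of slicing the text into equal segments are all assumptions made for illustration.

```python
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption for this sketch;
# real tools ship much longer lists and let you edit them).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that", "was", "i"}


def tokenize(text):
    """Lowercase the text and pull out word tokens, keeping inner apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_frequencies(text, stopwords=STOPWORDS):
    """Count every token that is not in the stopword list."""
    return Counter(w for w in tokenize(text) if w not in stopwords)


def distribution(text, word, segments=10):
    """Count how often `word` occurs in each of `segments` equal slices of the text."""
    tokens = tokenize(text)
    counts = [0] * segments
    for i, token in enumerate(tokens):
        if token == word:
            # Map the token's position to a segment index in 0..segments-1.
            counts[i * segments // len(tokens)] += 1
    return counts


def contexts(text, word, width=3):
    """Return each occurrence of `word` with up to `width` tokens on either side."""
    tokens = tokenize(text)
    return [
        " ".join(tokens[max(0, i - width): i + width + 1])
        for i, token in enumerate(tokens)
        if token == word
    ]
```

Calling `word_frequencies(text).most_common(20)` gives the raw material for a word cloud, `distribution(text, "trauma")` gives the numbers behind a line chart for one word, and `contexts(text, "trauma")` gives a bare-bones version of the context table.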

There’s a lot more that Voyant can do, and I’m looking forward to playing with it (and other tools) a lot more as I get a clearer sense of what kind of analysis I want to do. More soon!

Learning by destruction

In preparation for my first THATCamp, I’ve been breaking things. I’m new to the DH world, and only recently have I been dipping my toe into the “hack” side of the hack/yack divide. Enchanted by why’s (poignant) guide, I explored a bit of Ruby; then, like many others, I tried (and, by the end of February, failed) Codecademy. Most recently, I’ve been learning a little bit about HTML and CSS (I’m mainly using this book by @jcmeloni).

While the first two attempts to gain concrete technical skills didn’t take me very far, this latest effort is yielding some real results. The difference, I think, has to do with motivation and goals. My first attempts at Ruby and JavaScript stemmed from a sense that learning a programming language was something I should do. Though the DH community has taken care to emphasize that coding isn’t everything, my thinking about it hasn’t changed–it’s an increasingly important literacy, and a basic level of knowledge is already important and will become more so, if for no other reason than to understand which problems are hard and which are easy. (My spouse, a programmer, is continually dismayed when he describes some cool new innovation, and I fail to be impressed, not realizing that it solves a very tricky problem.)

I didn’t abandon the lessons because I found them unimportant. I just couldn’t dig into them. This confused me; after all, if I’m good at anything, it’s learning things! Plus, why’s (poignant) guide and Codecademy take such different tacks that if one didn’t work for me pedagogically, it seemed the other should have. But still, I walked away from both of them.

What I’ve come to realize is that without an actual problem that could help me contextualize and apply the new skills, I was having a hard time making the connections I would need to really learn and understand what I was doing. Weeks in, I still didn’t really know what Ruby or JavaScript looked like in the wild, and so while I was enjoying making little snippets of code that did things (enjoying it a lot, actually), my interest in both tapered as other priorities came up.

I cracked open the (e-)book on HTML and CSS for completely different reasons. I had been working on the census of #alt-academics that I’ve written about before, as well as the not-yet-public survey that will be its more rigorous counterpart, and I was hitting some roadblocks. Most of these were stylistic: I wanted the logo to appear here, not there, and I wanted it to link back to the site (well, Wufoo wouldn’t let me get past that hurdle, but it wasn’t for lack of trying–and I succeeded on the survey); I wanted a wider margin around the text. In short, I wanted to have more control than the visual editing interface allowed. I picked up the lessons in HTML and CSS because I had a problem I was trying to solve, and that has made all the difference in the way the instruction clicks for me.

My pre-THATCamp efforts have been similar. I’m at a point where I want to start having more control over my site (I am an Order Muppet, after all), so I want to learn more about what I could do with WordPress beyond its ready-made themes; I also want to start doing more with visual and other multimedia materials in my research, so I want to learn more about Omeka. For both of these things, I need to know about web servers and FTP clients, both of which I’m sure are second nature to a ton of THATCamp participants, but they’re new for me. So I have been tinkering, with the guidance of ProfHacker and my stellar colleagues in the Scholars’ Lab.

And along the way, I’ve been breaking things. I had a single triumphant moment in which everything seemed to be working as it should–and then, I went one step further, and managed to completely lock myself out of MySQL, rendering the whole setup unusable.

Here’s where things got tricky, and a little interesting. I had to find a way to dig myself out, and I had no idea how to do that. Most of the troubleshooting instructions I found involved the command line–which I do not know how to use, much to my chagrin. (Again, following ProfHacker’s lead, I’ve learned how to do the simplest of tasks–but really, knowing how to create a text file wasn’t going to get me out of the trouble I was having!) I went down a time-consuming and frustrating rabbit hole. As I tried to figure out what to do, I realized a lot of things–among them, I didn’t totally know what MySQL was, why I needed it, or what had gone wrong.

That sounds bad, but it has actually been energizing. The risks for me are still low at this point, but the potential reward is high. I’m finally starting to get a sense of what I don’t know–whereas before, all I saw was an abyss of confusion. My questions at this point are still incredibly basic (and, to be honest, I’m not always comfortable asking them), but I feel like certain elements are slowly coming into focus.

Breaking things has given me problems to solve, which is where the opportunity and desire to learn seem greatest. This is not new news to most of the DH community–@samplereality has argued compellingly that DH is about destroying things, and @jessifer tweeted about giving his students a problem without giving them the tools to solve it. As a pedagogical strategy, it makes sense: that’s how we learn when we’re doing things on our own–with a sense of urgency and a problem to solve.

As I continue to think about reforming humanities graduate training, my own experience of trying to learn, failing, and then needing to learn and (at least partially) succeeding, will remain at the front of my mind. I still haven’t really figured out what all went wrong, but I managed to get things working again (that counts as hacking, right?) and have a much better sense of what questions to ask my fellow THATCampers. Had everything gone smoothly, I would have learned so much less.