Writing Centers, data mining, assessment

I see that over at THATCamp-LAC, someone has proposed a session on “Technology in a Writing Center,” asking “What are some tools and best practices that could be helpful to student writers? How can technology help us better understand the needs of students in writing pedagogy?” I’m the Assistant Director of the Writing Center at Emory University, and I’m certainly interested in yakking about these topics, but I’ve also got a specific project using data mining for assessment that I’d like to work on hacking.

After every conference at the Emory Writing Center, the tutor writes a reflective paragraph about the session (e.g., “John Doe came in with an essay on David Foster Wallace’s Infinite Jest for Prof. Smith’s Comp Lit 390 class. The assignment was to … The student argued … The problem he is having is … I suggested that he … He seemed confused at first, but then he seemed to understand what I meant & decided he would …”). I get a compilation of approximately 100 such reports each week, which I use for various sorts of administrative tasks (e.g., realizing there are lots of conferences where students are having similar issues & reminding tutors that we have a resource they might consider using in their next such conference). We’ve been producing reports like these for more than 10 years and using them as assessment tools to gauge how we’re performing as a center.

I look at these reports and see a massive, rich body of data about writing at my university for the last 10-plus years. But there’s so much of it and I’m flooded with it on a week-to-week basis. So I’m starting to put together a project to mine this data to allow for a more distant reading of what it can tell us about larger trends. I believe this data set presents some interesting, specific problems different from those found in, say, the Civil War newspapers that Robert Nelson wrote about in the New York Times last week. For one thing, there are some privacy issues with the Writing Center reports (like specific student names). For another, I’m not so much interested in trying to fit this data into some sort of already existing historical framework as I am in trying to find out what (unexpected?) things they tell us about the sorts of issues students have with writing, how writing has been taught, and how these factors have changed over time, if they have. I don’t even have a really clear list of just what factors I would need to tag as I encode these reports.
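To make the two problems above a bit more concrete, here is one possible first pass, offered only as a rough sketch: scrub known student names from each report, then count which words show up in each year. Everything in it is an assumption for illustration, including the sample reports, the `KNOWN_NAMES` list, the `[STUDENT]` placeholder, and the tiny stopword set. Real reports would need a fuller name list (a roster, say) and a proper stopword list.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample data: (year, report text). Not real reports.
REPORTS = [
    (2009, "John Doe came in with an essay. The problem he is having is thesis clarity."),
    (2010, "Jane Roe struggled with citations. I suggested that she revise the thesis."),
]

# Assumption: a list of student names is available to redact against.
KNOWN_NAMES = ["John Doe", "Jane Roe"]

PLACEHOLDER = "[STUDENT]"

# A deliberately tiny stopword set, for illustration only.
STOPWORDS = {"a", "an", "the", "in", "with", "is", "he", "she", "i", "that", "came"}


def redact(text, names):
    """Replace each known name with a placeholder (case-insensitive)."""
    for name in names:
        text = re.sub(re.escape(name), PLACEHOLDER, text, flags=re.IGNORECASE)
    return text


def tokenize(text):
    """Lowercase the text, drop the placeholder, and return non-stopword tokens."""
    text = text.replace(PLACEHOLDER, " ").lower()
    return [w for w in re.findall(r"[a-z']+", text) if w not in STOPWORDS]


def term_counts_by_year(reports, names):
    """Return a mapping of year -> Counter of terms in that year's redacted reports."""
    counts = defaultdict(Counter)
    for year, text in reports:
        counts[year].update(tokenize(redact(text, names)))
    return counts


if __name__ == "__main__":
    for year, counter in sorted(term_counts_by_year(REPORTS, KNOWN_NAMES).items()):
        print(year, counter.most_common(5))
```

Raw word counts like these don't require deciding on tags up front, which fits the open question of what factors to encode; they would only be a starting point for spotting trends worth a closer look.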

Does this project sound similar in any ways to projects you have going on? How have you handled those projects? I think lots of writing centers gather data in similar ways to what we do at Emory, and my sense is that few of us really know what to do with this data now that all the new DH tools have come along.

I’ve got a small team of grad students I’m paying to work for me this summer. Some of their time will be spent tutoring, but some of their time will be devoted to helping me with this project. I’m looking for help figuring out how to use them most effectively and to get the most out of these next two months working with this data.


  1. Brian Croxall

    Love it. I’m there.

  2. Tad

    I’d also be fascinated to see what datamining would allow one to extract from such a relatively formulaic corpus. There’s going to be a lot of convention guiding narrative and word choice that will vary slightly from case to case and writer to writer.

    The interesting question to me is how much text mining lets us find microtrending we might otherwise miss within boilerplate.

