Yesterday afternoon, Brad Pasanek and I decided to play at text-mining. We started working with MALLET and this GUI tool but were soon lost in the mine, buried in code, with nary a respirating canary, shafted.
Our proposal includes two potential approaches:
(1) a session could look at how a scholar might begin to use topic modeling in the humanities. What do those of us with limited technical nous need to know in order to begin this type of work? We imagine a walk-through, cooking-show-like presentation that goes from A (here are some texts) to B (here is a visualization). Between A and B there are many difficult and perilous interactions with shell scripts, MALLET extrusions, statistics, spread sheets, and graphing tools. While we two are probably not capable of getting from A to B with elegance, flailing about in a group, roughing out a work flow, getting advice from sundry THATCampers, and making time for questions would be generally instructive—or so we submit.
(2) An alternative approach assumes some basic success with topic-modeling, and focuses instead on working with the cooked results. How can my-mine-mein data (we would bring something to the session and invite others to do the same) be interpreted, processed, and visualised? This secondary concern may even be included in the visualization session that has already been proposed.
Both bits assume a willingness to wield the MALLET and do some topic modeling. We aim primarily at a how-to and hack-and-help, and not a discussion of the pros and cons of topic modeling or text-mining in general.