One of the first things I thought of when I took a look at the PEP incident summaries was that it would be interesting to do a word frequency visualization of the summaries themselves. Commonly referred to as “Tag Clouds”, a word frequency visualization presents text data in a qualitative way; the utility of knowing how often a word is used is dubious, but it’s an interesting way to get a sense of the nature of the data.
As I’ve mentioned before, I am in the process of taking the data from all of the SAR incidents in BC since 2003 and putting it online. In the first few passes through the data I’ve brought it into a database format. From there I can iterate through all 8000 or so records, pull each word out of the summary field and add it to a dictionary, incrementing the count each time a word is repeated.
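The counting step can be sketched roughly as follows. This is a minimal illustration, not the code I actually ran: the function name `count_words`, the tokenizing regular expression and the two sample summaries are all made up for the example, and the real records would come out of the database rather than a hard-coded list.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters or apostrophes, compared case-insensitively.
WORD_RE = re.compile(r"[a-z']+")

def count_words(summaries):
    """Return a Counter mapping each word to its number of occurrences."""
    counts = Counter()
    for summary in summaries:
        # Lowercase first so "Search" and "search" share one count.
        counts.update(WORD_RE.findall(summary.lower()))
    return counts

# Hypothetical stand-ins for the summary field of two incident records.
sample_summaries = [
    "SAR members respond to search for overdue hiker.",
    "Search and rescue team respond; hiker located.",
]

word_counts = count_words(sample_summaries)
print(word_counts.most_common(3))
```

A `Counter` is just a dictionary that starts every missing key at zero, which saves the explicit "have I seen this word yet?" check.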
Rather than write my own tag-cloud generator, I took my weighted word list (the list of words with the number of occurrences) and dumped them into the excellent Wordle generator. Results below:
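For anyone wanting to do something similar, here is a rough sketch of turning a word-to-count dictionary into a weighted list as plain text. The one-pair-per-line `word:count` layout is an assumption about what the generator's input accepts, so check that before pasting; the helper name `to_weighted_lines`, the `min_count` cutoff and the example counts are invented for illustration and are not figures from the real data.

```python
def to_weighted_lines(counts, min_count=1):
    """Format a word->count mapping as 'word:count' lines, most frequent first."""
    # Sort by descending count, then alphabetically so ties come out in a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return "\n".join(
        f"{word}:{count}" for word, count in ranked if count >= min_count
    )

# Made-up counts, purely to show the shape of the output.
example_counts = {"search": 3, "rescue": 2, "respond": 2, "deceased": 1}

weighted_text = to_weighted_lines(example_counts, min_count=2)
print(weighted_text)
```

Dropping words below a minimum count keeps the one-off words from cluttering the cloud.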
Strictly speaking, this is a text cloud rather than a tag cloud, since the summaries do not have tags as metadata (yet). Another iteration, with colour:
The verbs and nouns certainly stand out. You get a nice sense of the purpose of SAR right there: SAR members respond, search, rescue. You can see smaller words like “deceased” (fewer than 2% of all subjects in BC are located deceased). If you look closely you can see various rescue modes, agencies and SAR teams mentioned.
I’ll be putting some word frequency functionality into the InfoSAR project; stay tuned for the first release in a few weeks.