Visualizations Updated
This post updates the previous one with data from more recently labeled news event excerpts. This visualization also includes only excerpts of five sentences or fewer, to keep document sizes consistent.
The news events included in this visualization are listed below. Each event can also be analyzed individually by clicking its link.
Text Visualizations Explanation
This file displays visualizations of the text based on the labeled categories, which appear as circles on the distance plot. The chart on the right shows the word distribution associated with the selected category. With lambda = 1, it ranks words by how common they are within the category; with lambda = 0, it ranks the words most specific to the category. Both rankings are computed by the relevance metric, and intermediate values of lambda blend the two.
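To make the effect of lambda concrete, here is a minimal sketch of a relevance ranking. It assumes the relevance metric is the one defined for LDAvis by Sievert and Shirley, relevance = lambda * log p(w|t) + (1 - lambda) * log(p(w|t) / p(w)). The function names and the word counts are made up for illustration and do not come from the data in this post.

```python
import math

def relevance(p_word_given_topic, p_word, lam):
    """Relevance of a word to a topic, assuming the LDAvis definition."""
    lift = p_word_given_topic / p_word  # how much more likely the word is in this topic
    return lam * math.log(p_word_given_topic) + (1 - lam) * math.log(lift)

def rank_words(topic_counts, corpus_counts, lam, top_n=3):
    """Return the top_n words of a topic, ordered by relevance at the given lambda."""
    topic_total = sum(topic_counts.values())
    corpus_total = sum(corpus_counts.values())
    scores = {}
    for word, count in topic_counts.items():
        p_wt = count / topic_total                  # p(word | topic)
        p_w = corpus_counts[word] / corpus_total    # p(word) over all documents
        scores[word] = relevance(p_wt, p_w, lam)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical counts, for illustration only.
topic_counts = {"the": 50, "police": 30, "vigil": 20}
corpus_counts = {"the": 500, "police": 100, "vigil": 25}

most_common = rank_words(topic_counts, corpus_counts, lam=1.0)
most_specific = rank_words(topic_counts, corpus_counts, lam=0.0)
```

With these toy counts, lambda = 1 puts the frequent but uninformative word "the" first, while lambda = 0 puts "vigil" first because it occurs almost only in this topic.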
The categories are numbered on the plot. The corresponding label titles are:
Topic 1: POLICY, number of words: 178427
Topic 2: EVENT, number of words: 185512
Topic 3: VICTIMS, number of words: 132444
Topic 4: ACCOUNT, number of words: 139529
Topic 5: MOURNING, number of words: 69965
Topic 6: SAFETY, number of words: 71556
Topic 7: GRIEF, number of words: 66832
Topic 8: PERPETRATOR, number of words: 68053
Topic 9: INVESTIGATION, number of words: 43818
Topic 10: SOCIALSUPPORT, number of words: 43542
Topic 11: TRAUMA, number of words: 53066
Topic 12: RESOURCES, number of words: 37341
Topic 13: PHOTO, number of words: 31942
Topic 14: RACECULTURE, number of words: 27025
Topic 15: LEGAL, number of words: 24346
Topic 16: MEDIA, number of words: 22273
Topic 17: THREAT, number of words: 16626
Topic 18: JOURNEY, number of words: 24483
Topic 19: MISCELLANEOUS, number of words: 12121
Topic 20: HERO, number of words: 5923
The size of each circle corresponds to the size of its category. Hovering over a word in the chart on the right resizes the circles in proportion to the count of that word in each category. Clicking on a topic displays that topic's word distribution, and clicking on an empty part of the distance plot shows the overall word distribution across all documents.
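The resizing behavior can be sketched as follows. This is only an illustration under stated assumptions: it assumes circle area (not radius) is proportional to the count, as in LDAvis, and the function name, the maximum radius, and the per-word counts are hypothetical. The three category sizes are taken from the list above.

```python
import math

def circle_radii(sizes, max_radius=60.0):
    """Scale radii so that circle AREA is proportional to each count (assumed)."""
    largest = max(sizes.values())
    if largest == 0:
        # The word does not occur in any category: collapse all circles.
        return {name: 0.0 for name in sizes}
    return {name: max_radius * math.sqrt(size / largest)
            for name, size in sizes.items()}

# Default view: circles sized by the number of words in each category.
category_words = {"POLICY": 178427, "EVENT": 185512, "HERO": 5923}
default_radii = circle_radii(category_words)

# Hover view: circles sized by the count of one word in each category.
# These per-word counts are invented for illustration.
hovered_word_counts = {"POLICY": 12, "EVENT": 3, "HERO": 48}
hover_radii = circle_radii(hovered_word_counts)
```

In the default view EVENT gets the largest circle. In the hover view the ordering can change completely, since it depends only on where the hovered word occurs.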