This analysis compares different text representations using simple sklearn linear and ensemble classifiers. All of the text representations tested here are built on the sklearn vectorizer. The performance of these representations was tested in the previous post, so here the best-performing methods from that analysis are examined to determine which features (i.e. tokens or words) contribute to the decision to classify an article as accountability. This gives a better understanding of which text representation is most suitable for the task.

Recap of Performance Results

The most notable effect of the various representations on performance was that character-based representations consistently scored higher than the other methods.

In the comparison of classifiers, the linear methods clearly performed best overall, and the balanced linear classifiers were slightly better in most cases.
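As a rough illustration of what “balanced” means here: assuming the balanced classifiers use sklearn's class_weight="balanced" option (the post does not state this explicitly), each class is weighted inversely to its frequency. The label counts below are invented for the example.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels: 8 non-accountability (0) and 2 accountability (1) texts.
y = np.array([0] * 8 + [1] * 2)

# Formula behind the "balanced" heuristic:
#   n_samples / (n_classes * count_per_class)
manual = len(y) / (2 * np.bincount(y))

# The same weights computed by sklearn's helper.
weights = compute_class_weight(
    class_weight="balanced", classes=np.array([0, 1]), y=y
)

print(dict(zip([0, 1], weights)))
```

With this weighting, an error on the rare class costs more than an error on the common class, which may be part of why the balanced and unbalanced classifiers end up with such different top features.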

Analysis of Features on Performance

The following analysis shows which features contributed most to the classification decision under the different representations. This can indicate why some methods perform better than others, and whether the well-performing methods rely on features that seem related to the topic of accountability.

Unigrams, Balanced

With unigrams and a balanced cost function, the features show some coherence with the topic of accountability, but they also include event-specific terms, such as the name “hornaday”, which is most prevalent in a specific set of articles. This suggests that this representation would not generalize well to new events.

Positive Features

Weight Feature
+11.159 hornaday
+10.813 counsellor
+10.489 bullied
+10.130 rage
+9.911 ugly
+9.760 counselor
+9.615 sprees
+8.979 science

Negative Features

Weight Feature
-8.918 massacred
-9.079 camera
-9.144 marquez
-9.147 stringent
-9.266 animals
-9.279 proposed
-9.763 bills
-9.897 apple
-10.262 undergo
-10.303 peace
-11.353 proposals
-13.895 caption

Unigrams, Unbalanced

This representation appears to capture meaning related to accountability more clearly, with both “blame” and “blamed” among the top terms predicting the positive (accountability) class. This may indicate that, despite the slightly higher f-score of the balanced classifier, the unbalanced classifier is actually giving more meaningful results.

Positive Features

Weight Feature
+5.122 blame
+4.720 culture
+4.605 blamed
+4.453 shall
+4.126 misogyny
+4.097 counselor
+4.040 rage
+3.882 senselessness
+3.868 isolated
+3.841 hornaday
+3.795 void
+3.795 bullied
+3.788 videos
+3.727 none
+3.686 illness
+3.664 failure
+3.622 counsellor

Negative Features

Weight Feature
-3.843 bmw
-3.935 students
-4.184 caption

Balanced, Character Based

The balanced character-based representation was the best performing, which is perplexing given the feature importances for the classifier. The features are not easily interpretable and do not clearly convey meaning.

Positive Features

Weight Feature
+19.469 e? 
+19.191 s? 
+17.221 n? 
+16.696 a? 
+14.978 fame
+13.483 tered.
+13.476 ak
+12.957 ndw
+11.972  be.
+11.777 war,
+11.760 var
+11.499 u.
+11.329 wild
+11.228 se. 
+11.153 fame
+10.631 od. 

Negative Features

Weight Feature
-11.270 spa
-11.304  6 
-12.947  work 
-13.294 ?” 
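To see why character-based features look like the fragments above, it can help to inspect what a character n-gram analyzer emits for a single sentence. The sketch below assumes sklearn's CountVectorizer with analyzer="char"; the ngram_range of (2, 6) and the example sentence are assumptions, since the post does not state the settings used.

```python
from sklearn.feature_extraction.text import CountVectorizer

# ngram_range=(2, 6) is an assumption; the range used in the post is not stated.
char_analyzer = CountVectorizer(
    analyzer="char", ngram_range=(2, 6)
).build_analyzer()

# Invented sentence, for illustration only.
grams = char_analyzer("Who is to blame? He wanted fame.")

print(len(grams), "character n-grams")
# Fragments that span a question mark, similar to the top positive features.
print([g for g in grams if "?" in g][:8])
```

Fragments such as “e? ” cross word and punctuation boundaries, so a high weight may reflect sentence style (for example, rhetorical questions) rather than any particular word.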

Balanced, Tri-grams

For classifiers using n-grams of length 1 to 3, the results are similar to the unbalanced unigram results. This method also achieved a fairly good f-score.

Positive Features

Weight Feature
+10.839 blame
+8.991 in other
+8.746 know this
+8.640 knew he
+8.603 jaylen
+8.571 blamed
+8.456 motives
+8.343 culture
+8.291 believe the
+8.232 fame
+8.226 hornaday
+8.069 he doesn
+8.020 teens
+7.797 women

Negative Features

Weight Feature
-7.655 marquez
-7.663 apple
-7.784 who was
-7.974 jaylen fryberg
-8.451 yes
-10.954 caption
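The 1 to 3 word n-gram features can be reproduced in miniature as follows. This is a sketch assuming sklearn's CountVectorizer with its default tokenizer and ngram_range=(1, 3); the input sentence is invented.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Word-level analyzer producing unigrams, bigrams and trigrams.
word_analyzer = CountVectorizer(ngram_range=(1, 3)).build_analyzer()

# Invented sentence, for illustration only.
grams = word_analyzer("He doesn't know this")

print(grams)
```

With the default token pattern, a token needs at least two word characters, so “doesn't” is split at the apostrophe and the stray “t” is dropped. That is consistent with features such as “he doesn” and “know this” in the table above.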

Conclusions

Methods that achieve the highest f-score on the current datasets do not necessarily generalize best to future unseen articles. To assess whether the classifiers are capturing something meaningful, feature analysis is very useful. From this analysis of feature importance, we can see that the highest-performing methods, character-based and balanced, are not capturing meaningful features specific to accountability. It seems from this analysis that the unigram and tri-gram methods captured the most meaningful features.