Feature Importances of Tfidf Representations Classifiers

This analysis will compare different text representations with simple sklearn linear and ensemble classifiers. The basis of all the text representations tested in this analysis is the sklearn vectorizer. The performance of these representations were tested in the previous post, and so the best performing methods from that analysis will be analyzed to determine which features (i.e. tokens or words) contribute to the decision to classify something as accountability. This will give further understanding into which method of text representation is most suitable for this task.

Recap of Performance Results

The most notable effect on performance from the various representations was that character based consistently performed higher than other methods.

In the comparison of classifiers, it is clear overall the linear methods perform best, and the balanced linear classifiers was slightly better in most cases.

Analysis of Features on performance

The following analysis will demonstrate which features contributed the most to the classification decision with various different representations. This can give an indication why some methods may be performing better that others, and also to see if the good performing methods have features that seem to related to the topic of accountability.

Unigrams, Balanced

In the case of unigram, and balanced cost function, the features seem like they have some coherence with the topic of accountability, though do contain some event specific terms, such as the name “hornaday”, that is most prevalent in a specific set of articles. This indicates that this representation seems like it would not generalize as well to new events.

Positive Features

Weight	Feature
+11.159	hornaday
+10.813	counsellor
+10.489	bullied
+10.130	rage
+9.911	ugly
+9.760	counselor
+9.615	sprees
+8.979	science

Negative Features

Weight	Feature
-8.918	massacred
-9.079	camera
-9.144	marquez
-9.147	stringent
-9.266	animals
-9.279	proposed
-9.763	bills
-9.897	apple
-10.262	undergo
-10.303	peace
-11.353	proposals
-13.895	caption

Unigrams, Unbalanced

This representation seems to clearly demonstrate meaning related to accountability, with both the terms “blame” and “blamed” in the top most relevant terms to predict as positive for accountability. This may be indicating that despite the slightly higher performance in f-score from the balanced classifier, the unbalanced may actually be giving more meaningful results.

Positive Features

Weight	Feature
+5.122	blame
+4.720	culture
+4.605	blamed
+4.453	shall
+4.126	misogyny
+4.097	counselor
+4.040	rage
+3.882	senselessness
+3.868	isolated
+3.841	hornaday
+3.795	void
+3.795	bullied
+3.788	videos
+3.727	none
+3.686	illness
+3.664	failure
+3.622	counsellor

Negative features

Weight	Feature
-3.843	bmw
-3.935	students
-4.184	caption

Balanced, Character Based

The balanced character based was the best performing representation, though this is perplexing when analyzing the feature importances for the classifier. The features are not easily interpretable, and do not seem to clearly convey meaning.

Positive Features

Weight	Feature
+19.469	e?
+19.191	s?
+17.221	n?
+16.696	a?
+14.978	fame
+13.483	tered.
+13.476	ak
+12.957	ndw
+11.972	be.
+11.777	war,
+11.760	var
+11.499	u.
+11.329	wild
+11.228	se.
+11.153	fame
+10.631	od.

Negative Features

Weight	Feature
-11.270	spa
-11.304	6
-12.947	work
-13.294	?”

Balanced, Tri-grams

For classifiers incorporating 1-3 ngrams, the results are similar to the unbalanced unigram results. This method also did have a fairly good performance in f-score.

Positive Features

Weight?	Feature
+10.839	blame
+8.991	in other
+8.746	know this
+8.640	knew he
+8.603	jaylen
+8.571	blamed
+8.456	motives
+8.343	culture
+8.291	believe the
+8.232	fame
+8.226	hornaday
+8.069	he doesn
+8.020	teens
+7.797	women

Negative features

Weight	Feature
-7.655	marquez
-7.663	apple
-7.784	who was
-7.974	jaylen fryberg
-8.451	yes
-10.954	caption

Conclusions

Methods that have the highest performance in f-score on the current datasets, do not necessarily have the best performance for generalizing to future unseen articles. To assess if the classifiers are capturing something meaningful, feature analysis is vary useful. From this analysis of feature importance, we can see that the highest performing methods: character based and balanced, are not capturing meaningful features specific to accountability. It seems from this analysis that unigrams and tri-grams methods were the capturing best meaning in the features.