Analysis of Performance by Event

The objective of this analysis is to compare the performance of a classifier trained to identify accountability specific to a single event versus a classifier trained to identify accountability in general. If there are features common to all events that indicate accountability, then performance should increase when the classifier is trained on all the news events together, since more data typically improves performance.

However, if performance decreases when the classifier is trained on multiple datasets, it is likely that there are no prominent features that capture the meaning of accountability in general. This could indicate either that the annotations of accountability are event-specific, or that there is not enough data from a variety of events to learn a generalized representation of accountability.
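A minimal sketch of this comparison, assuming each event has its own train/test split (the post does not describe how the data was split). The names `compare_event_vs_pooled` and `train_and_score` are hypothetical: `train_and_score` stands in for fitting a vectorizer and classifier on the training examples and returning the F-score on the test examples. Both models are scored on the same held-out examples of each event, so only the training data differs.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical data layout: an example is (text, label), label 1 = accountability.
Example = Tuple[str, int]
Split = Dict[str, List[Example]]  # keys: "train" and "test"


def compare_event_vs_pooled(
    events: Dict[str, Split],
    train_and_score: Callable[[List[Example], List[Example]], float],
) -> Dict[str, Dict[str, float]]:
    """Score an event-specific model and a pooled model on each event's test split."""
    # The pooled training set is the union of every event's training split.
    pooled_train = [ex for split in events.values() for ex in split["train"]]
    results: Dict[str, Dict[str, float]] = {}
    for name, split in events.items():
        results[name] = {
            # Trained only on this event's examples.
            "event_only": train_and_score(split["train"], split["test"]),
            # Trained on all events, evaluated on the same held-out examples.
            "pooled": train_and_score(pooled_train, split["test"]),
        }
    return results
```

If "pooled" consistently beats "event_only", that supports shared features of accountability across events; the opposite pattern points toward event-specific annotations or too little varied data.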

The results shown in this post also compare sentence-level and excerpt-level classifiers, as well as different representation and classification algorithms.
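The vectorizer names in the tables below suggest word n-grams up to length 1 or 3, character n-grams, and custom tokenizers that keep all tokens (`cust_all`), drop tokens containing digits (`cust_no_nums`), or keep only alphabetic tokens (`cust_only_alpha`). That reading is an assumption, as is the character n-gram range of 2 to 4 used below; this is a stdlib illustration of those variants, not the code that produced the results. In scikit-learn the same choices would correspond to the `ngram_range`, `analyzer`, and `tokenizer` parameters of `CountVectorizer` or `TfidfVectorizer`.

```python
import re
from collections import Counter
from typing import Callable, Dict, List

# Assumed token filters behind the "cust_*" vectorizer names (hypothetical).
TOKEN_FILTERS: Dict[str, Callable[[str], bool]] = {
    "all": lambda tok: True,
    "no_nums": lambda tok: not any(ch.isdigit() for ch in tok),
    "only_alpha": lambda tok: tok.isalpha(),
}


def tokenize(text: str, filter_name: str = "all") -> List[str]:
    """Lowercase, split on runs of word characters, then apply a token filter."""
    tokens = re.findall(r"\w+", text.lower())
    keep = TOKEN_FILTERS[filter_name]
    return [t for t in tokens if keep(t)]


def word_ngrams(tokens: List[str], max_n: int) -> Counter:
    """Count word n-grams for 1 <= n <= max_n (assumed: "1gram" -> 1, "3gram" -> 3)."""
    feats: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def char_ngrams(text: str, n_min: int = 2, n_max: int = 4) -> Counter:
    """Count character n-grams; the default 2-4 range is an assumption."""
    text = text.lower()
    feats: Counter = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            feats[text[i:i + n]] += 1
    return feats
```

Character n-grams capture sub-word patterns (stems, inflections, punctuation) that word n-grams miss, which is one plausible reason the `char` representation does comparatively well.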

Summary of Findings

The main observation is that performance varies widely across events, with F-scores above 0.8 for some events and as low as ~0.5 for others. Note also that the inter-annotator agreement for the events ranges from 0.6 to 0.8.
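The post reports an F-score without stating how it is averaged. A minimal sketch, assuming F1 computed on the positive (accountability) class; the function name `f1_score` here is a hypothetical stdlib helper, not the scikit-learn function of the same name.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive class: the harmonic mean of precision and recall."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is 0 (or undefined), so report 0.0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Under this definition, a run with no true positives, for example one that never predicts the accountability class, scores exactly 0, which is one way the 0.000000 minimums for Charleston, Orlando, and Vegas in the per-event tables can arise.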

Moving from excerpt-level to sentence-level classification also lowers the mean F-score for every event except Orlando, but by much less than the variation between events.

An additional finding is that the character-based representation and the SVM classifier performed best among the methods tested in this analysis (the highest-scoring run in both the sentence and excerpt settings combines the two), although the differences between the linear classifiers are almost negligible across all the variations tested.

The results are summarized in the tables in the following sections.
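The table columns (count, mean, std, min, 25%, 50%, 75%, max) match the output of pandas' `DataFrame.describe()`, which suggests the results were grouped by event, classifier, or vectorizer and then summarized. The stdlib sketch below reproduces those statistics, assuming each result is a row with hypothetical keys such as `event`, `classifier`, `vectorizer`, and `fscore`. pandas uses the sample standard deviation (ddof=1) and linearly interpolated percentiles, which correspond to `statistics.stdev` and `statistics.quantiles(..., method="inclusive")` (Python 3.8+).

```python
import statistics
from collections import defaultdict
from typing import Dict, List


def describe(scores: List[float]) -> Dict[str, float]:
    """Same statistics as pandas describe(); needs at least two scores."""
    # "inclusive" quartiles use the same linear interpolation as pandas/numpy.
    q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "count": len(scores),
        "mean": statistics.mean(scores),
        "std": statistics.stdev(scores),  # sample standard deviation (ddof=1)
        "min": min(scores),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(scores),
    }


def describe_by(results: List[dict], key: str, metric: str = "fscore") -> Dict[str, Dict[str, float]]:
    """Group result rows by `key` (e.g. "event" or "classifier") and summarize `metric`."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for row in results:
        groups[row[key]].append(row[metric])
    # Sort group names so the output order matches an alphabetized table.
    return {name: describe(vals) for name, vals in sorted(groups.items())}
```

With pandas, the equivalent one-liner would be `df.groupby("event")["fscore"].describe()`.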

Individual Events

Sentences

event           count  mean      std       min       25%       50%       75%       max
Charleston        108  0.232965  0.141681  0.000000  0.120760  0.245000  0.357264  0.462500
Isla Vista        108  0.738317  0.024722  0.693498  0.721297  0.738237  0.750443  0.802410
Marysville        108  0.710327  0.028182  0.654321  0.687905  0.710819  0.733473  0.761905
Newtown           108  0.363988  0.127713  0.152091  0.245902  0.397473  0.475519  0.560870
Orlando           108  0.270458  0.157829  0.000000  0.138889  0.278532  0.418455  0.476190
San Bernardino    108  0.321567  0.116414  0.096386  0.235294  0.336304  0.421232  0.522293
Vegas             108  0.141236  0.115760  0.000000  0.000000  0.121212  0.250880  0.380952

Excerpts

event           count  mean      std       min       25%       50%       75%       max
Charleston        108  0.323976  0.141580  0.050000  0.215686  0.338462  0.448497  0.568182
Isla Vista        108  0.757786  0.022385  0.722045  0.740443  0.754337  0.777850  0.813754
Marysville        108  0.762653  0.060974  0.649351  0.717634  0.768177  0.810127  0.882353
Newtown           108  0.413744  0.164910  0.067797  0.337558  0.476467  0.522574  0.599156
Orlando           108  0.237244  0.146436  0.000000  0.117647  0.288018  0.354430  0.487805
San Bernardino    108  0.412743  0.130067  0.121212  0.333333  0.448881  0.500000  0.615385
Vegas             108  0.143728  0.152868  0.000000  0.000000  0.080000  0.285714  0.518519

Combined Datasets

Sentences

classifier              count  mean      std       min       25%       50%       75%       max
logregcv                   27  0.538240  0.026181  0.502447  0.521963  0.528771  0.563489  0.583333
logregcv_balanced          27  0.560509  0.032042  0.504496  0.532829  0.565003  0.591144  0.606166
random_forest_balanced     27  0.506861  0.008723  0.489726  0.502209  0.505082  0.509686  0.532418
svm_balanced               27  0.552539  0.026245  0.511447  0.536557  0.550852  0.573312  0.607453

vectorizer             count  mean      std       min       25%       50%       75%       max
1gram                     12  0.523218  0.014292  0.502758  0.512716  0.524475  0.531200  0.546624
3gram                     12  0.551585  0.032038  0.504334  0.528432  0.556689  0.574398  0.593997
char                      12  0.570441  0.031584  0.518699  0.548608  0.579456  0.590890  0.607453
cust_all-1gram            12  0.517531  0.016412  0.496622  0.502655  0.517321  0.532072  0.542048
cust_all-3gram            12  0.548571  0.034563  0.498834  0.518188  0.558592  0.578283  0.590374
cust_no_nums-1gram        12  0.519189  0.017042  0.489726  0.505752  0.520005  0.534048  0.540292
cust_no_nums-3gram        12  0.550599  0.033735  0.505082  0.518338  0.557842  0.578534  0.593817
cust_only_alpha-1gram     12  0.518241  0.013186  0.501186  0.507472  0.517567  0.526971  0.537879
cust_only_alpha-3gram     12  0.556459  0.034341  0.506245  0.526145  0.568348  0.578001  0.602856

Excerpts

classifier              count  mean      std       min       25%       50%       75%       max
logregcv                   27  0.606388  0.029977  0.548485  0.592954  0.600801  0.624813  0.658854
logregcv_balanced          27  0.616874  0.023957  0.576471  0.600716  0.612943  0.629839  0.661818
random_forest_balanced     27  0.470446  0.021622  0.440141  0.454623  0.464883  0.489271  0.507993
svm_balanced               27  0.614373  0.028834  0.566914  0.590567  0.618690  0.629487  0.670190

vectorizer             count  mean      std       min       25%       50%       75%       max
1gram                     12  0.556811  0.060908  0.440141  0.540708  0.577642  0.601289  0.606838
3gram                     12  0.587291  0.075564  0.451049  0.563028  0.620402  0.637051  0.658854
char                      12  0.600079  0.070327  0.469178  0.579143  0.620645  0.648738  0.670190
cust_all-1gram            12  0.560850  0.062405  0.440141  0.547690  0.587992  0.597148  0.623542
cust_all-3gram            12  0.589742  0.072032  0.457539  0.576271  0.618028  0.633470  0.653504
cust_no_nums-1gram        12  0.555949  0.060072  0.445614  0.536471  0.584496  0.596083  0.604692
cust_no_nums-3gram        12  0.589490  0.070813  0.457539  0.577209  0.620488  0.632463  0.648438
cust_only_alpha-1gram     12  0.557290  0.060581  0.440141  0.535464  0.579711  0.595053  0.622642
cust_only_alpha-3gram     12  0.595678  0.071185  0.459413  0.578810  0.623086  0.639356  0.661433