The challenge is now over, but it remains open for post-challenge submissions!
The results are evaluated according to the following performance measures. The validation set is used for ranking during the development period. The test set is used for the final ranking.
The results for a classifier can be represented in a confusion matrix, where a, b, c, and d represent the number of examples falling into each possible outcome:
| | Prediction: Class -1 | Prediction: Class +1 |
|---|---|---|
| Truth: Class -1 | a | b |
| Truth: Class +1 | c | d |
The balanced error rate is the average of the errors on each class: BER = 0.5*(b/(a+b) + c/(c+d)). During the development period, the ranking is performed according to the validation BER.
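As a rough illustration, the BER can be computed directly from the four confusion-matrix counts. The sketch below is not the official scoring code; the function name and the example counts are hypothetical.

```python
def balanced_error_rate(a, b, c, d):
    """Balanced error rate from confusion-matrix counts.

    a: true -1 predicted -1    b: true -1 predicted +1
    c: true +1 predicted -1    d: true +1 predicted +1
    """
    error_neg = b / (a + b)  # fraction of class -1 examples misclassified
    error_pos = c / (c + d)  # fraction of class +1 examples misclassified
    return 0.5 * (error_neg + error_pos)

# Hypothetical counts: 90 of 100 negatives and 70 of 100 positives correct
print(balanced_error_rate(a=90, b=10, c=30, d=70))  # 0.5 * (0.10 + 0.30) = 0.20
```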
The area under curve (AUC) is defined as the area under the ROC curve. This area is equivalent to the area under the curve obtained by plotting a/(a+b) against d/(c+d) for each confidence value, starting at (0,1) and ending at (1,0). The area under this curve is calculated using the trapezoid method. When no confidence values are supplied for the classification, the curve is given by {(0,1), (d/(c+d), a/(a+b)), (1,0)} and AUC = 1 - BER.
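For completeness, here is a small sketch of how such an area could be computed with the trapezoid method from (sensitivity, specificity) points obtained at successive confidence thresholds. The function name and the point values are illustrative assumptions, not the challenge's evaluation code.

```python
import numpy as np

def auc_trapezoid(sensitivity, specificity):
    """Area under the curve obtained by plotting specificity = a/(a+b) (y-axis)
    against sensitivity = d/(c+d) (x-axis), using the trapezoid rule.
    The points are assumed to run from (0, 1) to (1, 0) as the threshold is swept."""
    x = np.asarray(sensitivity)
    y = np.asarray(specificity)
    order = np.argsort(x)              # ensure the x values are increasing
    return np.trapz(y[order], x[order])

# With no confidence values, the curve reduces to three points and AUC = 1 - BER:
se, sp = 0.70, 0.90                    # hypothetical single operating point
print(auc_trapezoid([0.0, se, 1.0], [1.0, sp, 0.0]))  # 0.80
print(1 - 0.5 * ((1 - sp) + (1 - se)))                # 0.80, i.e. 1 - BER
```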