PASCAL Agnostic Learning
Prior Knowledge

The challenge is now over. But it remains open for post-challenge submissions!

IMPORTANT: Entries made since February 1st 2007 might be using validation data, now available for training.

Data Grid (Coclustering)

Submitted by Marc Boullé

Prior submission (prior for ada, gina and sylva, agnostic for hiva and nova)

Data Grids extend the MODL discretization and value grouping methods to the multivariate case.
Here they are applied to learn a coclustering of the instance × variable space,
using all the unlabelled train+valid+test data.
The resulting clusters of instances are then used for prediction, based on the available labels (train+valid). (Paper submitted to IJCNN 2007.)
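
The prediction step described above can be illustrated with a minimal sketch. It assumes the instance clusters have already been computed by some coclustering method (the MODL Data Grid optimization itself is not reproduced here), and it assumes a simple majority vote of the labelled instances in each cluster, which may differ from the rule used in the actual entry. The function name `predict_from_clusters` and its arguments are hypothetical.

```python
from collections import Counter, defaultdict


def predict_from_clusters(cluster_ids, labels):
    """Predict a label for every instance from its cluster.

    cluster_ids: list with one cluster id per instance (train+valid+test).
    labels: dict mapping instance index -> known class label (train+valid).
    Assumption: each cluster predicts the majority label of its labelled
    members; clusters without labelled members fall back to the overall
    majority label.
    """
    # Count the known labels inside each cluster.
    counts = defaultdict(Counter)
    for index, label in labels.items():
        counts[cluster_ids[index]][label] += 1

    # Overall majority label, used when a cluster has no labelled instance.
    fallback = Counter(labels.values()).most_common(1)[0][0]

    predictions = []
    for cluster in cluster_ids:
        if counts[cluster]:
            predictions.append(counts[cluster].most_common(1)[0][0])
        else:
            predictions.append(fallback)
    return predictions


# Six instances in three clusters; only instances 0, 1 and 3 are labelled.
example = predict_from_clusters([0, 0, 0, 1, 1, 2], {0: 1, 1: 1, 3: -1})
```

Because the clusters are learned without labels, the unlabelled test instances contribute to the cluster structure even though they contribute nothing to the per-cluster vote.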

The coclustering technique is used for the sparse datasets (gina, hiva and nova).
The Data Grid (CMA) technique is used for the other datasets (ada and sylva).

BER guess:
ada: 0.192
gina: 0.052
hiva: 0.320 (agnostic)
nova: 0.073 (agnostic)
sylva: 0.008
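
For readers unfamiliar with the metric, BER (balanced error rate) is the average of the error rates on the positive and the negative class. The following is a minimal sketch of that computation for two-class labels; the function name `balanced_error_rate` and the convention that the positive class is labelled `1` are assumptions made for illustration.

```python
def balanced_error_rate(y_true, y_pred, positive=1):
    """Average of the per-class error rates for a two-class problem."""
    # Split the predictions by the true class of each instance.
    pos_preds = [p for t, p in zip(y_true, y_pred) if t == positive]
    neg_preds = [p for t, p in zip(y_true, y_pred) if t != positive]
    if not pos_preds or not neg_preds:
        raise ValueError("both classes must be present in y_true")

    # Fraction of positives predicted negative, and negatives predicted positive.
    pos_error = sum(p != positive for p in pos_preds) / len(pos_preds)
    neg_error = sum(p == positive for p in neg_preds) / len(neg_preds)
    return 0.5 * (pos_error + neg_error)


# One of four positives and one of two negatives are misclassified.
ber = balanced_error_rate([1, 1, 1, 1, -1, -1], [1, 1, 1, -1, -1, 1])
```

Unlike the plain error rate, this measure weights both classes equally, which matters on datasets where one class is much rarer than the other.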

Dataset   Balanced Error             Area Under Curve           Entry
          Train    Valid    Test     Train    Valid    Test
ada       0.1734   0.1994   0.1756   0.8507   0.8045   0.8464   prior
gina      0.043    0.0283   0.0516   0.9933   0.9975   0.9768   prior
hiva      0.0415   0.0297   0.3127   0.9817   0.9907   0.7077   agnostic
nova      0.0394   0.026    0.0488   0.9956   0.9971   0.9813   agnostic
sylva     0.0205   0.009    0.0228   0.981    0.9838   0.9798   prior
Overall   0.0636   0.0585   0.1223   0.9605   0.9547   0.8984   prior
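
The Overall row appears consistent with the unweighted arithmetic mean of the five dataset rows; this is inferred from the numbers, not stated on the page. The short sketch below checks that for the test BER column and also compares each BER guess with the test BER actually obtained. The dictionary and variable names are hypothetical; the values are copied from the BER guess list and the results table.

```python
# Test-set balanced error per dataset, copied from the results table.
test_ber = {"ada": 0.1756, "gina": 0.0516, "hiva": 0.3127,
            "nova": 0.0488, "sylva": 0.0228}

# BER guesses submitted with the entry.
ber_guess = {"ada": 0.192, "gina": 0.052, "hiva": 0.320,
             "nova": 0.073, "sylva": 0.008}

# Unweighted mean over the five datasets, rounded like the table.
overall_test_ber = round(sum(test_ber.values()) / len(test_ber), 4)

# Signed gap per dataset: positive means the guess was pessimistic.
guess_gap = {name: round(ber_guess[name] - test_ber[name], 4)
             for name in test_ber}

print("overall test BER:", overall_test_ber)
for name, gap in guess_gap.items():
    print(f"{name}: guess - test = {gap:+.4f}")
```

A positive gap means the guess overestimated the test error; sylva is the only dataset here where the guess was lower than the measured test BER.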

This entry is a complete prior knowledge entry.