Let , the vapnik-Chervonenkis dimension of , denoted as is the cardinality of the largest set of points from that can be shattered by .
The set of input points is shattered by , if for every labeling , there exists a hypothesis such that
In other words, VC-dimension for hypothesis class is the maximum number of points this class can realize - for every possible labeling, there exists a classifier in set .
Important examples
| Class of models | ||
|---|---|---|
| Linear classifiers from | = | line can always separate 3 points, plane can always separate 4 points, … |
| Threshold classifier | = | Any two points with different labels break the threshold if they lie on “wrong side” of the threshold |
| Finite hypothesis space | There is possible labeling or points. We need at least that many hypothesis to shatter points. | |
| Memorizers | You can memorize any number of points. | |
| Nearest neighbors classifiers | Theoretically same as memorizer, we can have infinitely close points for each |