Multi-label classification

1. Formal definitions

Learning framework

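Multi-label indicators:
- label cardinality $LCard(D)=\frac{1}{|D|}\sum_{(x,Y)\in D}|Y|$ (average number of labels per example)
- label density $LDen(D)=\frac{1}{|\mathcal{Y}|}\cdot LCard(D)$ (label cardinality normalized by the size of the label space $\mathcal{Y}$)
- label diversity $LDiv(D)=|\{Y \mid \exists x:(x,Y)\in D\}|$ (number of distinct label sets appearing in $D$)
- normalized label diversity $PLDiv(D)=\frac{1}{|D|}\cdot LDiv(D)$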
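
The four indicators can be computed directly from a list of (x, Y) pairs. The following is a minimal sketch, not a reference implementation; the function name `label_indicators`, the `num_labels` argument (the size of the label space) and the toy data set are assumptions made for illustration.

```python
def label_indicators(dataset, num_labels):
    """Compute multi-label indicators for a data set.

    dataset: list of (x, Y) pairs, where Y is an iterable of labels.
    num_labels: size of the label space (assumed to be known).
    """
    label_sets = [frozenset(Y) for _, Y in dataset]
    m = len(label_sets)
    lcard = sum(len(Y) for Y in label_sets) / m   # average labels per example
    lden = lcard / num_labels                     # normalized by label-space size
    ldiv = len(set(label_sets))                   # number of distinct label sets
    pldiv = ldiv / m                              # normalized by data-set size
    return {"LCard": lcard, "LDen": lden, "LDiv": ldiv, "PLDiv": pldiv}


# Hypothetical toy data: 4 examples over the label space {a, b, c}.
toy = [("x1", {"a", "b"}), ("x2", {"b"}), ("x3", {"a", "b"}), ("x4", {"c"})]
stats = label_indicators(toy, num_labels=3)
print(stats)  # {'LCard': 1.5, 'LDen': 0.5, 'LDiv': 3, 'PLDiv': 0.75}
```
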
Real-valued function $f:\mathcal{X}\times\mathcal{Y}\rightarrow\mathbb{R}$

where $f(x, y)$ can be regarded as the confidence of $y\in\mathcal{Y}$ being a proper label of $x$. Specifically, given a multi-label example $(x, Y)$, $f(\cdot,\cdot)$ should yield a larger output on a relevant label $y'\in Y$ and a smaller output on an irrelevant label $y''\notin Y$, i.e. $f(x,y')>f(x,y'')$.

Multi-label classifier $h(\cdot)$: $h(x)=\{y \mid f(x,y)>t(x),\ y\in\mathcal{Y}\}$

where $\mathcal{Y}$ is the label space and $t:\mathcal{X}\rightarrow\mathbb{R}$ acts as a thresholding function that dichotomizes the label space into relevant and irrelevant label sets.
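
To make the definition of h(x) concrete, here is a small sketch that builds the predicted label set from a scoring function and a thresholding function. The helper name `predict_labels`, the score table and the constant threshold of 0.5 are assumptions chosen only for illustration.

```python
def predict_labels(f, t, x, label_space):
    """Return h(x) = {y in label_space : f(x, y) > t(x)}."""
    threshold = t(x)
    return {y for y in label_space if f(x, y) > threshold}


# Hypothetical confidences for a single instance.
scores = {"sunset": 0.9, "beach": 0.6, "city": 0.1}

def f(x, y):
    return scores[y]   # stand-in for a learned real-valued function

def t(x):
    return 0.5         # constant thresholding function

predicted = predict_labels(f, t, "img_001", scores.keys())
print(sorted(predicted))  # ['beach', 'sunset']
```
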
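Key challenge: label correlations
- First-order strategy: handle each label independently, ignoring the other labels
- Second-order strategy: consider pairwise relations between labels
- High-order strategy: consider relations among all labels or among subsets of labels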
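Threshold calibration

In order to decide the proper label set for an unseen instance $x$ (i.e. $h(x)$), the real-valued output $f(x, y)$ on each label should be calibrated against the thresholding function output $t(x)$. Options for $t(\cdot)$:
- use a constant function, or induce $t(\cdot)$ from the training examples
- when inducing it, use a linear model for $t(\cdot)$, e.g. $t(x)=\langle w, f^*(x)\rangle+b$ with $f^*(x)=(f(x,y_1),\dots,f(x,y_q))$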
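
One way to induce a linear thresholding function is to pick, for every training example, a target threshold that separates its relevant and irrelevant labels as well as possible, and then fit a linear model on the score vectors by least squares. The sketch below assumes NumPy is available; the function names, the midpoint-based choice of target thresholds and the toy scores are illustrative assumptions, not a prescribed procedure.

```python
import numpy as np


def target_threshold(scores, relevant):
    """For one example, pick a cut that misclassifies the fewest labels.

    scores: shape (q,) array holding f(x, y_j) for every label y_j.
    relevant: shape (q,) boolean array, True where y_j is a relevant label.
    """
    s = np.sort(scores)
    # Candidate cuts: below all scores, between neighbours, above all scores.
    candidates = np.concatenate(([s[0] - 1.0], (s[:-1] + s[1:]) / 2.0, [s[-1] + 1.0]))
    errors = [
        np.sum(relevant & (scores <= a)) + np.sum(~relevant & (scores > a))
        for a in candidates
    ]
    return candidates[int(np.argmin(errors))]


def fit_linear_threshold(F, R):
    """Least-squares fit of t(x) = <w, f*(x)> + b to per-example targets."""
    targets = np.array([target_threshold(f_i, r_i) for f_i, r_i in zip(F, R)])
    A = np.hstack([F, np.ones((F.shape[0], 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(A, targets, rcond=None)
    return coef[:-1], coef[-1]


def predict_mask(w, b, scores):
    """Boolean mask of labels whose score exceeds the learned threshold."""
    return scores > (scores @ w + b)


# Hypothetical scores f(x_i, y_j) for 3 training examples and 3 labels.
F = np.array([[0.9, 0.6, 0.1], [0.8, 0.2, 0.3], [0.4, 0.7, 0.5]])
R = np.array([[True, True, False], [True, False, False], [False, True, False]])
w, b = fit_linear_threshold(F, R)
print(predict_mask(w, b, F[0]))
```

With a constant threshold of 0.5, the third example would get the wrong label set on this toy data only if a score fell on the wrong side of 0.5; the learned $t(x)$ instead adapts the cut to each score vector.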
