SSU | [Support Vector Machines]

Reformulation of the maximum-margin classifier problem into a quadratic program:

$$\min_{\mathbf{w},\, b} \; \frac{1}{2} \lVert \mathbf{w} \rVert^2$$

subject to

$$y_i \left( \mathbf{w}^\top \mathbf{x}_i + b \right) \ge 1 \quad \text{for all } i.$$

Support vectors are the training examples that are closest to the decision hyperplane. These are the only training data we need to keep after training is finished.
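As a minimal sketch, the quadratic program can be handed to a general-purpose solver; here SciPy's SLSQP is used on a made-up toy dataset (the data and solver choice are illustrative, real SVM implementations use dedicated QP solvers). Support vectors fall out as the points whose margin constraint is tight:

```python
import numpy as np
from scipy.optimize import minimize  # general-purpose constrained solver

# Toy linearly separable 2-D data (illustrative values).
X = np.array([[0., 0.], [1., 0.], [0., 1.],
              [3., 3.], [4., 3.], [3., 4.]])
y = np.array([-1., -1., -1., 1., 1., 1.])

def objective(p):
    w = p[:2]                       # p = [w1, w2, b]
    return 0.5 * w @ w              # (1/2) ||w||^2

def margin_constraints(p):
    w, b = p[:2], p[2]
    return y * (X @ w + b) - 1.0    # y_i (w . x_i + b) - 1 >= 0

res = minimize(objective, x0=np.zeros(3), method="SLSQP",
               constraints=[{"type": "ineq", "fun": margin_constraints}])
w, b = res.x[:2], res.x[2]

# Support vectors: points where the margin constraint is (numerically) tight.
sv_mask = np.isclose(y * (X @ w + b), 1.0, atol=1e-3)
print("w =", w, "b =", b)
print("support vectors:\n", X[sv_mask])
```

Only the rows in `X[sv_mask]` are needed to evaluate the classifier after training; the remaining points could be discarded.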

Question

Draw a line and a point. Draw a right triangle and write down what the distance between the line and the point equals. What is the distance from the margin to the other points?

Answer

Question

Why does minimizing the norm of the weight vector maximize the margin? What is left to minimize when the closest point has a distance of 1?

Answer

When the closest point satisfies $y_i(\mathbf{w}^\top \mathbf{x}_i + b) = 1$, the distance from the decision boundary to the support vector is just $1 / \lVert \mathbf{w} \rVert$. Maximizing this distance is therefore equivalent to minimizing $\lVert \mathbf{w} \rVert$.
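A quick numeric check of both formulas (the weights and the point are made-up values, not from the notes):

```python
import numpy as np

w = np.array([3.0, 4.0])   # illustrative weights, ||w|| = 5
b = -2.0
x = np.array([2.0, 1.0])   # an arbitrary query point

# Signed distance from x to the hyperplane w . x + b = 0:
dist = (w @ x + b) / np.linalg.norm(w)   # (6 + 4 - 2) / 5 = 1.6

# A support vector satisfies w . x + b = +/-1, so its distance is 1 / ||w||:
margin = 1.0 / np.linalg.norm(w)         # 1 / 5 = 0.2

print(dist, margin)
```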

Soft-margin SVM

When the data are not linearly separable, we can introduce slack variables $\xi_i$ and reformulate the quadratic program to

$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \; \frac{1}{2} \lVert \mathbf{w} \rVert^2 + C \sum_i \xi_i$$

subject to

$$y_i \left( \mathbf{w}^\top \mathbf{x}_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0 \quad \text{for all } i.$$

The slack variable $\xi_i$ is equal to 0 when the point with index $i$ is classified correctly with margin at least 1, and is positive when the point lies inside the margin or is misclassified. It relaxes the margin constraint for that point, so the quadratic program stays feasible; geometrically, $\xi_i$ measures how far the point would have to be pushed for its constraint to hold.
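A small sketch of how the optimal slack values behave, using assumed (untrained) parameters $\mathbf{w}$, $b$ purely for illustration:

```python
import numpy as np

# Assumed, not trained, parameters -- illustration only.
w, b = np.array([1.0, 1.0]), -3.0

X = np.array([[3.0, 3.0],    # margin satisfied:  y (w . x + b) = 3
              [2.5, 1.0],    # inside the margin: y (w . x + b) = 0.5
              [0.0, 0.0]])   # misclassified:     y (w . x + b) = -3
y = np.array([1.0, 1.0, 1.0])

# Optimal slack: violation of the constraint y_i (w . x_i + b) >= 1.
xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
print(xi)   # slack values: 0.0, 0.5, 4.0
```

The correctly classified point needs no slack, the point inside the margin needs a little, and the misclassified point needs more than 1.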

The regularization constant $C$ is a hyperparameter. Setting it high pushes the slack variables close to zero and tends toward overfitting; setting it low allows a greater margin, a smoother boundary, and better generalization (at the risk of underfitting).

Implicit structural risk minimization

Structural risk minimization is implemented directly inside the algorithm. We can re-define the objective function with the hinge loss as

$$\min_{\mathbf{w},\, b} \; \frac{1}{2} \lVert \mathbf{w} \rVert^2 + C \sum_i \max\left( 0,\; 1 - y_i \left( \mathbf{w}^\top \mathbf{x}_i + b \right) \right)$$

  • At the optimum, the hinge loss of each point is exactly equal to its slack variable $\xi_i$.
  • The hinge formulation also implicitly satisfies the constraints of the quadratic program ($y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$), so the problem becomes unconstrained.

The regularization term $\frac{1}{2} \lVert \mathbf{w} \rVert^2$ becomes the complexity term, which $C$ trades off against the empirical (hinge) risk.
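Because the hinge objective is unconstrained, it can be minimized directly with subgradient descent; a minimal sketch (the dataset, learning rate, and iteration count are arbitrary choices, not from the notes):

```python
import numpy as np

# Toy linearly separable 2-D data (illustrative values).
X = np.array([[0., 0.], [1., 0.], [0., 1.],
              [3., 3.], [4., 3.], [3., 4.]])
y = np.array([-1., -1., -1., 1., 1., 1.])

C, lr = 1.0, 0.01
w, b = np.zeros(2), 0.0

for _ in range(2000):
    margins = y * (X @ w + b)
    active = margins < 1.0              # points with nonzero hinge loss
    # Subgradient of (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i + b))
    grad_w = w - C * (y[active][:, None] * X[active]).sum(axis=0)
    grad_b = -C * y[active].sum()
    w -= lr * grad_w
    b -= lr * grad_b

print("w =", w, "b =", b)
```

Only the points currently violating the margin contribute to the subgradient, which mirrors the fact that the solution depends only on the support vectors.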

Implementation

TODO