- Not all datasets are linearly separable in their original, lower-dimensional space
- Transform such a dataset into a higher-dimensional space where it can be separated linearly by a hyperplane (see the sketch below)
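A minimal sketch of this idea, assuming scikit-learn is available; the concentric-circles dataset and the hand-picked extra feature are illustrative, not the only possible mapping:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import LinearSVC

# Two concentric circles: no straight line separates them in 2-D.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
print(LinearSVC().fit(X, y).score(X, y))                   # poor accuracy (~0.5)

# Lift to 3-D by appending the squared radius x1^2 + x2^2;
# a plane now separates the inner circle from the outer one.
X_lifted = np.hstack([X, (X ** 2).sum(axis=1, keepdims=True)])
print(LinearSVC().fit(X_lifted, y).score(X_lifted, y))     # close to 1.0
```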
- Support vectors:
- examples/data points closest to the hyperplane
- both correctly classified points (on or inside the margin) and misclassified points can be support vectors
- If a datapoint is not a support vector, removing it will not affect the model
- A small number of support vectors keeps kernel SVM prediction fast (see the sketch below)
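A small sketch (assuming scikit-learn; the blob dataset and settings are illustrative) showing how to inspect a fitted SVM's support vectors:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

print(clf.support_vectors_)   # the training points closest to the decision boundary
print(clf.n_support_)         # number of support vectors per class
# Removing any training point whose index is NOT in clf.support_
# and refitting gives the same decision boundary.
```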
- Margin: distance from a support vector to the decision boundary
- The best decision boundary lies at equal distance from the closest points of both classes
- The best separating hyperplane is the one with the largest margin
- How closeness/violations are measured: the hinge loss, combined with an L2 regularization penalty on the weights
- Margin boundary lines: the lines parallel to the decision boundary that touch the support vectors (the points closest to it); the margin itself can be read off the weights, as sketched below
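For a linear SVM the margin follows from the learned weight vector: points on the margin boundaries satisfy |w·x + b| = 1, so the distance from a support vector on the boundary line to the decision boundary is 1 / ||w||. A self-contained sketch (assuming scikit-learn; the dataset is illustrative):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

w = clf.coef_[0]                      # weight vector of the linear SVM
margin = 1.0 / np.linalg.norm(w)      # support vector -> decision boundary
print("margin:", margin, "width between the two margin lines:", 2 * margin)
```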
- Soft margin SVM: used when the classes are not perfectly separable; how many margin violations are tolerated is controlled by the regularization parameter C
- Kernel: explicitly computing the high-dimensional transformation can be difficult and expensive, so a kernel function is used as a shortcut that computes the same inner product directly from the original inputs and is computationally cheaper
- RBF kernel: similarity between two inputs x and x' computed from their difference, K(x, x') = exp(-gamma * ||x - x'||^2) (see the sketch below)
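A sketch of the RBF kernel as a shortcut (assuming scikit-learn; the example points and gamma value are illustrative): the similarity of x and x' comes directly from their squared distance, with no explicit high-dimensional mapping.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[1.0, 2.0]])
x_prime = np.array([[2.0, 0.0]])
gamma = 0.5

# K(x, x') = exp(-gamma * ||x - x'||^2)
manual = np.exp(-gamma * np.sum((x - x_prime) ** 2))
library = rbf_kernel(x, x_prime, gamma=gamma)[0, 0]
print(manual, library)   # the two values match
```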
- C hyperparameter = regularization: small C tolerates more margin violations (softer margin), large C penalizes them heavily
- gamma hyperparameter = smoothness of the boundary: small gamma gives a smoother boundary, large gamma a more complex, wiggly one (see the sketch below)
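A sketch of how C and gamma trade off fit and smoothness (assuming scikit-learn; the moons dataset and grid values are illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for C in (0.1, 1, 100):
    for gamma in (0.1, 1, 10):
        clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_tr, y_tr)
        # Large C and large gamma -> complex boundary, risk of overfitting;
        # small C and small gamma -> smooth boundary, risk of underfitting.
        print(f"C={C:<5} gamma={gamma:<4}",
              f"train={clf.score(X_tr, y_tr):.2f} test={clf.score(X_te, y_te):.2f}")
```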
- Stochastic gradient descent (SGD):
- Similar to a linear SVM, but scales well to large datasets
- How: minimizes the same hinge-loss + L2 objective with stochastic gradient updates, one sample (or mini-batch) at a time, which approximates the maximum-margin solution (see the sketch below)
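A sketch of a linear SVM trained with SGD (assuming scikit-learn; the synthetic dataset is illustrative). `SGDClassifier(loss="hinge")` optimizes the hinge-loss + L2 objective one sample at a time, so it handles datasets too large for kernel `SVC`:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# A dataset large enough that kernel SVC training would be slow.
X, y = make_classification(n_samples=100_000, n_features=20, random_state=0)

clf = SGDClassifier(loss="hinge", penalty="l2", alpha=1e-4, random_state=0)
clf.fit(X, y)                 # weights updated from one sample at a time
print(clf.score(X, y))
```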