- One-vs-Rest (OvR) or One-vs-All:
- One binary classifier is trained per class, treating that class as positive and all remaining classes as negative, so n classes need n classifiers.
- The class whose classifier reports the highest confidence score is chosen as the final prediction.
- One-vs-One (OvO):
- A binary classifier is trained for each pair of classes, using only the samples from those two classes.
- For n classes, n(n-1)/2 classifiers are trained, one per pair.
- The final output is the class that receives the most votes across the pairwise classifiers.
- Multinomial (Softmax):
- Directly trains a single classifier to predict the probability of each class.
- Uses the softmax function, which converts the raw outputs (logits) into class probabilities that sum to 1.
- Computes a linear score for each class from the features and that class's weights (the score is proportional to the signed distance from that class's hyperplane). The scores are then converted to probabilities.
- The class with the highest probability is chosen as the final prediction.
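- Essentially a generalization of binary logistic regression to multiple classes: with two classes, softmax reduces to the ordinary logistic (sigmoid) model, and the prediction is the argmax over the per-class probabilities.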
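
The three strategies above can be sketched in plain Python. This is a minimal illustration rather than a training routine: the weights, biases, and input `x` are made-up values, and the helper names (`softmax`, `linear_scores`, `ovr_predict`, `ovo_pairs`, `ovo_predict`) are hypothetical, chosen only for this example. It assumes the per-class scores and pairwise winners have already been produced by trained classifiers.

```python
import math
from collections import Counter
from itertools import combinations


def softmax(logits):
    """Convert raw per-class scores (logits) into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_scores(x, weights, biases):
    """One linear score w_k . x + b_k per class."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
            for w, b in zip(weights, biases)]


def ovr_predict(scores):
    """One-vs-Rest: pick the class whose binary classifier is most confident."""
    return max(range(len(scores)), key=lambda k: scores[k])


def ovo_pairs(n_classes):
    """One-vs-One: one classifier per unordered pair, n(n-1)/2 in total."""
    return list(combinations(range(n_classes), 2))


def ovo_predict(pair_winners):
    """One-vs-One: majority vote over the winner of each pairwise classifier."""
    return Counter(pair_winners).most_common(1)[0][0]


# Hypothetical 3-class model with 2 features (values invented for illustration).
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]
x = [2.0, 1.0]

scores = linear_scores(x, weights, biases)  # one score per class
probs = softmax(scores)                     # multinomial: scores -> probabilities
predicted = max(range(len(probs)), key=lambda k: probs[k])

print("pairs for 4 classes:", len(ovo_pairs(4)))
print("softmax prediction:", predicted)
```

Calling `softmax([z, 0.0])[0]` gives the same value as the sigmoid `1 / (1 + exp(-z))`, which is the sense in which the multinomial model generalizes binary logistic regression.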