Abstract [ Reference]

In multi-class classification supervised machine learning, the label represents one of multiple possible classes.

Usually, the labels are mutually exclusive.
- In multi-label classification models, there may be more than one valid label for an observation.
Follows the same train > iterate > evaluate process as regression and binary classification.
Examples:
- The specifies of penguin based on its physical measurements.
- The genre of a movie based on its cast, director and budget.

Training Multi-class Classification Models

Use an algorithm to fit the training data to a function that calculates a probability value for each possible class.

A binary classification function is trained for each class:

That function — a sigmoid function — calculates the probability that the observation is an example of that class.

These algorithms create a single function that returns a multi-valued output.

The output is a vector (array of values) that contains the probability distribution for all possible classes with a probability score for each class:
- $f(x)= [P(y=0|x), P(y=1|x), …]$
In the resulting vector (for example, $[0.2, 0.3, 0.5]$) the elements represent the probabilities for class 0, 1, 2 and so on.
An example of such a function is the softmax function.

Multi-class classification models are evaluated calculating binary classification metrics for each individual class.