![Softmax function](uploads/softmax-function-66559bfe7f573.png)
The softmax function is a commonly used activation function in neural networks, especially in the output layer of a classification model. It converts raw scores or logits into probabilities that sum up to 1. This allows the model to output a probability distribution over multiple classes, making it suitable for multi-class classification tasks.
The softmax function takes a vector of K real numbers as input and outputs a vector of the same size, where each element is in the range (0, 1) and the sum of all elements is 1. The softmax function is defined as:
$$\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}}$$
where \(x\) is the input vector of logits or raw scores, \(i\) is the index of the element in the output vector, and \(K\) is the number of classes. The softmax function exponentiates each element of the input vector and then normalizes the values to obtain a probability distribution.
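As a concrete example, applying the formula to the input vector \((1, 2, 3)\) gives

$$\mathrm{softmax}((1, 2, 3)) = \left( \frac{e^{1}}{e^{1}+e^{2}+e^{3}},\ \frac{e^{2}}{e^{1}+e^{2}+e^{3}},\ \frac{e^{3}}{e^{1}+e^{2}+e^{3}} \right) \approx (0.090,\ 0.245,\ 0.665)$$

Note that the largest logit receives the largest probability, and the three outputs sum to 1.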
Here is a simple implementation of the softmax function in Python:
```python
import numpy as np

def softmax(x):
    # Subtract the row-wise maximum before exponentiating; this leaves the
    # result unchanged but prevents overflow for large logits.
    exp_scores = np.exp(x - np.max(x, axis=1, keepdims=True))
    # Normalize each row so that it sums to 1.
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    return probs

# Test the softmax function on a batch of two score vectors
x = np.array([[1.0, 2.0, 3.0],
              [0.5, 1.0, 2.0]])
print(softmax(x))
```
In this implementation, we first subtract the row-wise maximum from the input scores (a standard trick that does not change the result but avoids numerical overflow) and then exponentiate the shifted scores using NumPy's `exp` function. We then compute the softmax probabilities by dividing each exponentiated score by the sum of the exponentiated scores along the appropriate axis (`axis=1` for row-wise normalization of a batch of score vectors).
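As a quick sanity check, a minimal sketch assuming SciPy is installed: `scipy.special.softmax` provides a reference implementation we can compare against, and each row of the output should sum to 1.

```python
import numpy as np
from scipy.special import softmax as scipy_softmax

x = np.array([[1.0, 2.0, 3.0],
              [0.5, 1.0, 2.0]])

probs = softmax(x)  # the function defined above
print(np.sum(probs, axis=1))                         # -> [1. 1.], each row sums to 1
print(np.allclose(probs, scipy_softmax(x, axis=1)))  # -> True
```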
The softmax function is widely used in various machine learning tasks, including:

- **Multi-class classification**, where it converts the final-layer logits of a network into class probabilities (as shown in the sketch below).
- **Language modeling**, where it produces a probability distribution over the vocabulary for the next token.
- **Attention mechanisms**, where it turns similarity scores into attention weights.
- **Reinforcement learning**, where it defines a stochastic policy over discrete actions.
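To make the classification use case concrete, here is a minimal sketch of softmax as the output layer of a linear classifier. The weights, bias, and inputs below are randomly generated placeholders (not trained values), and `softmax` is the function defined earlier:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 4 input features, 3 classes, a batch of 2 examples.
W = rng.normal(size=(4, 3))   # weight matrix of a linear output layer
b = np.zeros(3)               # bias vector
X = rng.normal(size=(2, 4))   # batch of input feature vectors

logits = X @ W + b                # raw scores (logits), shape (2, 3)
probs = softmax(logits)           # class probabilities via the softmax above
preds = np.argmax(probs, axis=1)  # predicted class for each example

print(probs)
print(preds)
```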
While the softmax function is effective for multi-class classification, it has some limitations: