What is the difference between sparse_categorical_crossentropy and categorical_crossentropy? When should one loss be used as opposed to the other? For example, are these losses suitable for linear regression?
Answers
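Simply:
categorical_crossentropy (cce) expects each target as a one-hot array with one entry per category, while sparse_categorical_crossentropy (scce) expects each target as the integer index of the correct category. In both cases the model outputs one probability per category, and the resulting loss value is the same.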
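For reference, the standard cross-entropy definitions make the relationship explicit. In this sketch, $y$ is the one-hot target, $p$ the predicted probabilities over $C$ categories, and $c$ the index of the correct category; these symbols are notation introduced here, not part of the original answer.

```latex
\mathrm{cce}(y, p) = -\sum_{i=1}^{C} y_i \log p_i
\qquad
\mathrm{scce}(c, p) = -\log p_c
```

When $y$ is one-hot with its 1 at position $c$, every term of the sum except the $c$-th vanishes, so the two expressions give the same value.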
Consider a classification problem with 5 categories (or classes).
- In the case of cce, the one-hot target may be [0, 1, 0, 0, 0] and the model may predict [.2, .5, .1, .1, .1] (probably right).
- In the case of scce, the target index may be [1] and the model may still predict [.2, .5, .1, .1, .1]; the loss only looks at the probability assigned to index 1, here .5.
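Using the numbers from the 5-category example, here is a minimal pure-Python sketch of both computations. The functions `cce` and `scce` are hand-written stand-ins for illustration (they are not the Keras API), and they assume the prediction is already a probability array.

```python
import math

def cce(one_hot_target, probs):
    """Cross-entropy against a target given as a one-hot array."""
    # Terms with a zero target contribute nothing, so they are skipped.
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs) if t > 0)

def scce(target_index, probs):
    """Cross-entropy against a target given as an integer class index."""
    return -math.log(probs[target_index])

probs = [0.2, 0.5, 0.1, 0.1, 0.1]  # model prediction from the example

cce_loss = cce([0, 1, 0, 0, 0], probs)  # one-hot target
scce_loss = scce(1, probs)              # same target as an index

print(cce_loss, scce_loss)
```

Both calls evaluate -log(0.5), so the two losses agree; only the format of the target differs.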
Consider now a classification problem with 3 classes.
- In the case of cce, the one-hot target might be [0, 0, 1] and the model may predict [.5, .1, .4] (probably inaccurate, given that it gives more probability to the first class).
- In the case of scce, the same target is the index [2], and the model may predict the same [.5, .1, .4]; the loss only looks at the probability assigned to index 2, here .4.
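To make the 3-class case concrete: the one-hot target [0, 0, 1] corresponds to class index 2, while the prediction [.5, .1, .4] puts its highest probability on index 0. The sketch below converts between the two target formats and checks the prediction; `to_one_hot` and `argmax` are hypothetical helper names written for this illustration.

```python
import math

def to_one_hot(index, num_classes):
    """Expand an integer class index into a one-hot list."""
    return [1 if i == index else 0 for i in range(num_classes)]

def argmax(values):
    """Return the position of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

probs = [0.5, 0.1, 0.4]  # model prediction from the example
target_index = 2         # target in scce form
one_hot_target = to_one_hot(target_index, 3)  # same target in cce form

predicted_class = argmax(probs)                # most likely class per the model
is_correct = predicted_class == target_index   # does it match the target?
loss = -math.log(probs[target_index])          # cross-entropy for this sample

print(one_hot_target, predicted_class, is_correct, loss)
```

The loss here is -log(0.4), which is larger than in the 5-category example because the correct class received less probability.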
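Many categorical models are trained with scce because you save space: each label is stored as a single integer rather than a full one-hot array. This does not lose any information about the prediction, since the model still outputs the full probability array (for example, in the 2nd example you can still see that index 0 received .5 against .4 for index 2). cce is the one to use when the targets are not a single hard class, for example soft or smoothed labels.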
There are a number of situations in which to use scce, including:
- when your classes are mutually exclusive, i.e. each sample belongs to exactly one class,
- when the number of categories is large, so that one-hot targets become unwieldy.
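To put a number on the large-category case, the sketch below counts how many label values have to be stored in each format. The dataset size and class count are made-up figures for illustration only.

```python
def one_hot_label_count(num_samples, num_classes):
    """Values stored when every label is a full one-hot array."""
    return num_samples * num_classes

def sparse_label_count(num_samples):
    """Values stored when every label is a single integer index."""
    return num_samples

# Hypothetical dataset: 10,000 samples, 1,000 categories.
num_samples, num_classes = 10_000, 1_000

dense_values = one_hot_label_count(num_samples, num_classes)
sparse_values = sparse_label_count(num_samples)
ratio = dense_values // sparse_values

print(dense_values, sparse_values, ratio)
```

The one-hot format needs as many times more label values as there are categories, which is why integer labels are attractive when the category count is large.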
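220405: response to "one-hot encoding" comments:
one-hot encoding is often used for a category feature INPUT to select a specific category (e.g. male versus female). This encoding allows the model to train more efficiently: training weight is a product of category, which is 0 for all categories except for the given one. The same encoding is also used for classification TARGETS, which is the form cce expects.
cce and scce are losses applied to the model OUTPUT; they are not the output itself. In both cases the output is a probability array over the categories, totalling 1.0. With cce the target is a one-hot array; with scce the target is the index of the correct category.
An scce target is technically a one-hot array in compact form, just like a hammer used as a door stop is still a hammer: index 1 out of 5 categories stands for [0, 1, 0, 0, 0]. The predicted probability array is NOT one-hot.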