Classifiers' Evaluation Metrics

weixin_0010034 · Published 2022-08-09 20:02:38

Confusion matrix
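A confusion matrix is a table that holds the counts of True Positives and False Positives (TP and FP), as well as True Negatives and False Negatives (TN and FN). Each prediction falls into exactly one cell, according to its predicted class and its actual class.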
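To make the four cells concrete, here is a minimal sketch in plain Python that counts them for a binary problem. The function name `confusion_counts` and the toy labels are made up for illustration and are not from the original post.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN and FN for binary labels (hypothetical helper)."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            # Predicted positive: correct -> TP, wrong -> FP
            counts["TP" if actual == positive else "FP"] += 1
        else:
            # Predicted negative: missed positive -> FN, correct -> TN
            counts["FN" if actual == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "FP", "TN", "FN")}

# Toy data, invented for this example: 1 = precious stone, 0 = ordinary rock
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)  # {'TP': 3, 'FP': 1, 'TN': 3, 'FN': 1}
```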

What is important for the project
Suppose we have an image classifier that decides whether a rock is a precious stone (e.g., a diamond) or not, and we use it for automated mining.
In this context, we may want to collect as many precious stones as possible (many TP, few FN), even if some ordinary rocks are misidentified as diamonds (FP), because an expert can sort those out at a later stage.
Now imagine that we are buying stones based on the same image classifier. We do not want to pay for ordinary rocks (FP), so the model should be very conservative about False Positive predictions.
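One common way to shift a classifier between these two priorities is to move its decision threshold. The post does not say how its classifier is tuned, so the sketch below is only an assumption: the scores, labels, thresholds and the helper `precision_recall_at` are all invented for illustration.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when scores >= threshold are called positive."""
    preds = [score >= threshold for score in scores]
    tp = sum(1 for p, actual in zip(preds, labels) if p and actual)
    fp = sum(1 for p, actual in zip(preds, labels) if p and not actual)
    fn = sum(1 for p, actual in zip(preds, labels) if not p and actual)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical model scores ("probability of diamond") and true labels
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [True, True, False, True, True, False, False, False]

# Mining: a low threshold keeps every diamond but lets some rocks through
mining = precision_recall_at(scores, labels, threshold=0.3)

# Buying: a high threshold avoids paying for rocks but misses some diamonds
buying = precision_recall_at(scores, labels, threshold=0.8)

print("mining precision=%.2f recall=%.2f" % mining)
print("buying precision=%.2f recall=%.2f" % buying)
```

With this toy data, the low threshold finds every diamond at the cost of two False Positives, while the high threshold produces no False Positives but finds only half of the diamonds.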

Common Evaluation Metrics
To evaluate and quantify the performance of a classification model, we can use common evaluation metrics: accuracy, balanced accuracy, precision, recall (a.k.a. sensitivity or True Positive Rate), specificity (= 1 - False Positive Rate), the ROC curve (TPR vs. FPR) and the F1 score.
There are many evaluation metrics to choose from, but all of them can be calculated from the confusion matrix values (TP, FP, TN, and FN); the ROC curve does this at every decision threshold rather than at a single one. So the main idea is to know which metrics matter most for the project and how well balanced the target we are trying to predict (classify) is.
The most general approach is to choose a few metrics to optimize (e.g., accuracy, recall, precision, F1 score, ROC-AUC).
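The threshold-based metrics listed above follow directly from the four confusion matrix counts. The sketch below uses the standard definitions; the function name `metrics_from_counts` and the example counts are hypothetical, chosen to show an imbalanced target where accuracy looks much better than recall.

```python
def metrics_from_counts(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion matrix counts."""
    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a denominator is empty
        return numerator / denominator if denominator else 0.0

    recall = ratio(tp, tp + fn)        # sensitivity, True Positive Rate
    specificity = ratio(tn, tn + fp)   # True Negative Rate
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": ratio(tp, tp + fp),
        "recall": recall,
        "specificity": specificity,
        "false_positive_rate": ratio(fp, fp + tn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }

# Invented counts for an imbalanced target: 13 positives, 87 negatives
metrics = metrics_from_counts(tp=8, fp=2, tn=85, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Here accuracy is 0.93 while recall is only about 0.62, which is why the balance of the target matters when picking which metrics to optimize.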
