Plotting a Decision Tree with Python

BIGdd · Published 2022-08-11 14:48:18

Hi everyone,

To plot a decision tree as output in Python, you can use the following code:

[Alt](https://res.cloudinary.com/practicaldev/image/fetch/s--SvDr0mfd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pn98aq41u282so48wir9.png)

[Alt](https://res.cloudinary.com/practicaldev/image/fetch/s--lrJvsMwf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/c6kn2x44em51a24ien6e.png)

Before running the Python code, download the dataset from the following link:

https://github.com/ruthvikraja/DT.git

# Decision Tree Classifier
import pandas as pd
from sklearn.model_selection import train_test_split
# This is used to split our data into training and testing sets
from sklearn import tree # Here tree is a module
from sklearn.metrics import accuracy_score
# Used to check the goodness of our model
import matplotlib.pyplot as plt
# Used to plot figures

df1 = pd.read_excel("/Users/ruthvikrajam.v/Desktop/heart.xlsx")
# Load the Excel file into df1 (change the path to wherever you saved heart.xlsx)
df1.info()  # Prints column types and non-null counts, which shows whether any values are missing
X=df1.loc[:,df1.columns!="target"]
y=df1["target"]
X_train, X_test, Y_train, Y_test=train_test_split(X, y, test_size=0.2, random_state=0)
# Here test_size = 0.2 means it uses 20% of our input data for testing and 80% for training set
# random_state = 0 means every time it uses the same set of testing and training set for evaluation

clftree1=tree.DecisionTreeClassifier(criterion="entropy")
# Using Entropy for computing the Decision Tree
clftree1.fit(X_train,Y_train)
pred=clftree1.predict(X_test)    # Predicting the values for our test data
accuracy_score1=accuracy_score(Y_test, pred)   # Finding the accuracy score of our model
print(accuracy_score1)

fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10, 10), dpi=300)
# Create a 10x10-inch figure at 300 dots per inch (dpi)
tree.plot_tree(clftree1, feature_names=list(X.columns), class_names=["0", "1"], filled=True, ax=ax)
# plot_tree draws the fitted tree. feature_names is taken from X so that the "target" column is excluded;
# class_names lists the class labels in ascending order; filled=True colours each node by its majority class
fig.savefig("imagename1.jpeg.png")  # Saved as PNG: the final extension decides the format

clftree2=tree.DecisionTreeClassifier(criterion="gini")
# Using Gini Index for computing the Decision Tree
clftree2.fit(X_train,Y_train)
pred=clftree2.predict(X_test)    # Predicting the values for our test data
accuracy_score2=accuracy_score(Y_test, pred)   # Finding the accuracy score of our model
print(accuracy_score2)

fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10, 10), dpi=300)
tree.plot_tree(clftree2, feature_names=list(X.columns),
               class_names=["0", "1"], filled=True, ax=ax)
fig.savefig("imagename2.jpeg.png")
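
The two classifiers above differ only in the `criterion` used to score candidate splits. As a rough illustration of what "entropy" and "gini" measure, here is a minimal pure-Python sketch. The helper names (`entropy`, `gini`, `weighted_impurity`) and the toy labels are hypothetical and are not taken from heart.xlsx or from scikit-learn's internals, which are implemented differently.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def weighted_impurity(impurity, left, right):
    """Impurity after a split: child impurities weighted by child size."""
    total = len(left) + len(right)
    return (len(left) / total) * impurity(left) + (len(right) / total) * impurity(right)


# Hypothetical toy labels for illustration only: 1 = disease, 0 = no disease
parent = [0, 0, 0, 0, 1, 1, 1, 1]
left, right = [0, 0, 0, 1], [0, 1, 1, 1]

for name, impurity in (("entropy", entropy), ("gini", gini)):
    before = impurity(parent)
    after = weighted_impurity(impurity, left, right)
    print(f"{name}: before={before:.3f} after={after:.3f} gain={before - after:.3f}")
```

Under either criterion, a split is scored by how much it lowers the weighted impurity of the child nodes compared with the parent. The two measures often rank splits similarly, so the two trees and their accuracy scores may end up close.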


Done...
