I do not understand the meaning of the colors in the nodes/leaves when plotting decision trees built with sklearn.tree.DecisionTreeClassifier.
Here's my code:
from sklearn import tree
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt

iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target

tree_model = DecisionTreeClassifier(criterion='gini',
                                    max_depth=4,
                                    random_state=1)
tree_model.fit(X, y)

plt.rcParams["figure.figsize"] = (12, 10)
tree.plot_tree(tree_model, filled=True)
plt.show()
Is there any logic in the choice of colors by scikit-learn? It doesn't seem so.
The colours represent the majority class (and therefore the predicted class) in each node. With the iris dataset, plot_tree() uses orange for the first class (setosa), green for the second (versicolor), and purple for the third (virginica).
Also note that transparency is used to communicate the impurity of each node: the purer a node is, the more saturated its colour, while highly mixed nodes are drawn nearly white.
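You can verify this mapping yourself by inspecting the fitted tree's internal value array, which holds the per-node class distribution (counts or weighted fractions, depending on your scikit-learn version). Taking the argmax over classes gives the majority class whose colour plot_tree(filled=True) assigns to that node. A minimal sketch, reusing the model from the question:

```python
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target

tree_model = DecisionTreeClassifier(criterion='gini',
                                    max_depth=4,
                                    random_state=1)
tree_model.fit(X, y)

# tree_.value has shape (n_nodes, 1, n_classes); argmax over the class axis
# gives each node's majority class, which determines its colour hue.
values = tree_model.tree_.value
majority = values.argmax(axis=2).ravel()
for node_id in range(values.shape[0]):
    print(f"node {node_id}: distribution={values[node_id, 0, :]}, "
          f"majority={iris.target_names[majority[node_id]]}")
```

The root node contains all three classes in equal proportion, so its distribution is balanced and its colour in the plot is correspondingly faint.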