What Is 'random_state' in sklearn.model_selection.train_test_split? (With Example)
In this article, we’ll explore what “random_state” is and why it’s important in data science. We’ll also demonstrate how you can use it in your projects to make your results reproducible.
Table of Contents
- Introduction
- What is train_test_split?
- What is random_state?
- Why is random_state important?
- Conclusion
What is train_test_split?
Before we dive into random_state, let’s first understand what train_test_split does. It’s a function in the sklearn.model_selection module that splits a dataset into two subsets: one for training and one for testing. The training set is used to train a machine learning model, while the testing set is used to evaluate its performance.
Here’s an example of how to use train_test_split:
from sklearn.model_selection import train_test_split
# Example of dummy data for X (features) and y (labels)
X = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
y = [0, 1, 0]
# Use train_test_split to split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
# Display training and testing data
print("Training data X:", X_train)
print("Training labels y:", y_train)
print("Testing data X:", X_test)
print("Testing labels y:", y_test)
In the example above, X and y are the features and labels to be split, and test_size is the proportion of the dataset allocated to the testing set. The remaining data is used for training.
Output:
Training data X: [[7, 8, 9], [4, 5, 6]]
Training labels y: [0, 1]
Testing data X: [[1, 2, 3]]
Testing labels y: [0]
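To make the role of test_size more concrete, here is a minimal sketch on a slightly larger toy dataset. The data and the variable names are made up for illustration. It shows that a float test_size is read as a proportion of the dataset, while an integer test_size is read as an absolute number of test samples.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 samples with one feature each
X = [[i] for i in range(20)]
y = [i % 2 for i in range(20)]

# A float test_size is a proportion: 25% of 20 samples go to the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
print("Proportion split:", len(X_train), "train /", len(X_test), "test")

# An integer test_size is an absolute count: exactly 2 test samples
X_train2, X_test2, y_train2, y_test2 = train_test_split(X, y, test_size=2)
print("Count split:", len(X_train2), "train /", len(X_test2), "test")
```

The sizes of the subsets are fixed by test_size, but which samples land in each subset is decided by a random shuffle.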
Once the data is split, you can use the subsets to train and evaluate your model. However, the results you obtain may differ each time you run the code. This is where “random_state” comes in.
What is random_state?
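The random_state parameter controls the shuffling that train_test_split applies to the data before splitting it. Passing the same integer on every call gives the same split every time. The sketch below illustrates this under a few assumptions that are not part of the article: the data is simply the integers 0 to 99, and the seeds 42 and 0 are arbitrary choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: the integers 0..99 serve as both features and labels
X = np.arange(100)
y = np.arange(100)

# Two independent calls with the same seed (42 is an arbitrary choice)
split_a = train_test_split(X, y, test_size=0.3, random_state=42)
split_b = train_test_split(X, y, test_size=0.3, random_state=42)

# Compare X_train, X_test, y_train and y_test pairwise
same_seed_match = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print("Same seed gives the same split:", same_seed_match)

# A different seed (0, also arbitrary) shuffles the data differently
split_c = train_test_split(X, y, test_size=0.3, random_state=0)
print("Different seed gives the same test set:",
      np.array_equal(split_a[1], split_c[1]))
```

The specific integer does not matter. What matters is that it stays fixed between runs whenever you want to compare results.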
Output of a 70/30 split of the integers 0 to 99:
Training set X: [11 47 85 28 93 5 66 65 35 16 49 34 7 95 27 19 81 25 62 13 24 3 17 38 8 78 6 64 36 89 56 99 54 43 50 67 46 68 61 97 79 41 58 48 98 57 75 32 94 59 63 84 37 29 1 52 21 2 23 87 91 74 86 82 20 60 71 14 92 51]
Testing set X: [83 53 70 45 44 39 22 80 10 0 18 30 73 33 90 4 76 77 12 31 55 88 26 42 69 15 40 96 9 72]
Training labels y: [11 47 85 28 93 5 66 65 35 16 49 34 7 95 27 19 81 25 62 13 24 3 17 38 8 78 6 64 36 89 56 99 54 43 50 67 46 68 61 97 79 41 58 48 98 57 75 32 94 59 63 84 37 29 1 52 21 2 23 87 91 74 86 82 20 60 71 14 92 51]