Is it a problem if the test data only has a subset of the features that were used to train the xgboost model? All my predictor variables (except one) are factors, so I one-hot encode them before converting the data into an xgb.DMatrix. The different levels of the factor variables therefore become the features, and my test data doesn't contain all of these features, only a subset of them.
At the moment, when I run my model on the test data in R, I get an error saying "Features names stored in object and newdata are different!".
I'm new to the field, so any help would be much appreciated. Thanks!
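To see where this kind of error comes from, here is a minimal pure-Python sketch that encodes the training and test data separately. It uses Python rather than R, a hypothetical `one_hot_columns` helper and made-up `colour` data, purely for illustration. Because each data set only produces columns for the levels it actually contains, the two feature lists differ, which is the kind of mismatch the error message reports.

```python
# Hypothetical helper: one-hot encode a single categorical column using only
# the levels that actually occur in the rows passed in. Encoding each data
# set on its own like this is what produces mismatched feature names.
def one_hot_columns(rows, column):
    levels = sorted({row[column] for row in rows})
    return [f"{column}_{level}" for level in levels]


# Made-up example data: the test set lacks the "green" level.
train = [{"colour": "red"}, {"colour": "green"}, {"colour": "blue"}]
test = [{"colour": "red"}, {"colour": "blue"}]

train_features = one_hot_columns(train, "colour")
test_features = one_hot_columns(test, "colour")

print(train_features)  # ['colour_blue', 'colour_green', 'colour_red']
print(test_features)   # ['colour_blue', 'colour_red']
print(train_features == test_features)  # False: the feature names differ
```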
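All the variables used to train the model must be present in the test set.
This is because the model was trained with all of these variables and its split rules refer to them, so the same variables are needed to score new data. A factor level that does not occur in the test data can still be present, as a column of zeros.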
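A minimal sketch of that idea, assuming a hypothetical `encode_with_training_columns` helper and made-up data: the test rows are encoded against the training feature list, so levels missing from the test data become all-zero columns and the feature names and their order match the training data. In R the same idea is usually applied by giving the test factors the same levels as the training factors before encoding.

```python
# Hypothetical helper: encode each row against a fixed list of training
# feature names, producing one 0/1 entry per training feature.
def encode_with_training_columns(rows, column, training_features):
    encoded = []
    for row in rows:
        name = f"{column}_{row[column]}"
        # Levels absent from `rows` stay 0 in every vector; levels never seen
        # in training have no column, so they are effectively dropped.
        encoded.append([1 if feature == name else 0 for feature in training_features])
    return encoded


# Feature names as produced from the (made-up) training data.
training_features = ["colour_blue", "colour_green", "colour_red"]
test = [{"colour": "red"}, {"colour": "blue"}]

test_matrix = encode_with_training_columns(test, "colour", training_features)
print(test_matrix)  # [[0, 0, 1], [1, 0, 0]]
```

The "colour_green" column is kept even though no test row uses it, so the test matrix has exactly the columns the model was trained on.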
If you are using Python and do the one-hot encoding with the fit or fit_transform methods in sklearn, use the same fitted object to transform the test set with its transform method.
This ensures that the variables are consistent between train and test.
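A sketch of the sklearn approach, assuming scikit-learn's `OneHotEncoder` and made-up data with two categorical columns. The encoder is fitted on the training data only, and the same fitted object then transforms the test data, so both matrices have the same columns even though the test data lacks some levels.

```python
from sklearn.preprocessing import OneHotEncoder

# Made-up data: two categorical predictors; the test data only contains
# a subset of the levels seen in training ("green" and "large" are missing).
X_train = [["red", "small"], ["green", "large"], ["blue", "small"]]
X_test = [["red", "small"], ["blue", "small"]]

# Fit on the training data only; the encoder remembers the training levels.
# handle_unknown="ignore" encodes a level unseen in training as all zeros
# instead of raising an error.
encoder = OneHotEncoder(handle_unknown="ignore")
train_matrix = encoder.fit_transform(X_train).toarray()

# Reuse the same fitted encoder on the test set: transform, not fit_transform.
test_matrix = encoder.transform(X_test).toarray()

print(encoder.categories_)
print(train_matrix.shape, test_matrix.shape)  # (3, 5) (2, 5)
print(test_matrix)
```

`handle_unknown="ignore"` also covers the opposite case, a level that appears in the test data but not in training, by encoding it as all zeros.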