XGBoost: Can the features in test data be a subset of the features used to train the model?


Is it a problem if the test data has only a subset of the features used to train the XGBoost model? All of my predictor variables except one are factors, so I one-hot encode them before converting the data to an xgb.DMatrix. Each level of a factor variable therefore becomes a feature, and my test data contains only a subset of those features.

At the moment, when I run my model on the test data in R, I get an error saying "Features names stored in object and newdata are different!".

I'm new to the field, so any help would be much appreciated. Thanks!

All the variables used to train the model must be present in the test set.

This is because the model used all of those variables to create its rules, so it needs every one of them to score new data.

If you do the one-hot encoding in Python with sklearn, call fit or fit_transform on the training set, then use the same fitted object to transform the test set with its transform function.

This ensures that the variables are consistent between train and test.
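As a rough illustration of the fit-then-transform idea, here is a minimal sketch in plain Python. It is a hand-written stand-in, not sklearn's encoder: the helper names `fit_levels` and `transform` and the toy `color`/`size` data are made up for this example. The point it tries to show is that the column layout is learned once from the training rows and then reused unchanged for the test rows.

```python
# Hand-rolled one-hot encoder (an illustrative stand-in, not sklearn's API).

def fit_levels(train_rows):
    """Learn every (column, level) pair seen in the training rows, in a fixed order."""
    levels = {}
    for row in train_rows:
        for column, value in row.items():
            levels.setdefault(column, set()).add(value)
    return [(column, level)
            for column in sorted(levels)
            for level in sorted(levels[column])]


def transform(rows, features):
    """Encode rows against the training features; levels missing from a row become 0."""
    return [[1 if row.get(column) == level else 0 for column, level in features]
            for row in rows]


# Hypothetical toy data: the test row contains only some of the training levels.
train = [
    {"color": "red", "size": "S"},
    {"color": "blue", "size": "M"},
    {"color": "green", "size": "L"},
]
test = [{"color": "red", "size": "M"}]

features = fit_levels(train)              # fit on train only
train_matrix = transform(train, features)
test_matrix = transform(test, features)   # reuse the same features for test
feature_names = [f"{column}={level}" for column, level in features]

print(feature_names)
print(test_matrix)  # [[0, 0, 1, 0, 1, 0]]
```

The test matrix ends up with the same six columns as the training matrix even though the test row only contains two of the levels. In this sketch, a level that appears only in the test data is silently encoded as all zeros for that variable; whether that is acceptable depends on the problem.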

FMJ (Jun 14, 2019 at 7:36): When you say variables, does that also mean that all levels of the variables must be present? All my predictor variables are factors, and some levels of these variables are not present in the test set.

mahesh ghanta (Jun 14, 2019 at 7:40): If your train dataset has all the factors and you have done a fit_transform for the one-hot encoding on it, then using the same object to transform your test dataset will create all the factors in the train dataset as columns, with the values of the absent ones set to 0.

mahesh ghanta (Jun 14, 2019 at 7:41): If possible, can you share your code? We can identify the issue and fix it.

FMJ (Jun 14, 2019 at 7:49): Thank you! My code is in R. Would you be able to share how to do the fit and transform in R?

FMJ (Jun 14, 2019 at 12:39): I figured it out in R and fixed the issue by using dummyVars from the caret package and then using this object to transform both train and test. Thanks a lot for clarifying my query and pointing me in the right direction!
