Regression Model Performance Metrics
In this chapter we’ll describe different statistical regression metrics for measuring the performance of a regression model (Chapter @ref(linear-regression)).
Next, we’ll provide practical examples in R for comparing the performance of two models in order to select the best one for our data.
Contents:
Model performance metrics
Loading required R packages
Example of data
Building regression models
Assessing model quality
Comparing regression models performance
Discussion
Model performance metrics
In regression models, the most commonly used evaluation metrics include:
R-squared
(R2), which is the proportion of variation in the outcome that is explained by the predictor variables. In multiple regression models, R2 corresponds to the squared correlation between the observed outcome values and the values predicted by the model. The higher the R-squared, the better the model.
Root Mean Squared Error
(RMSE), which measures the average prediction error made by the model in predicting the outcome for an observation. Mathematically, the RMSE is the square root of the mean squared error (MSE), which is the average squared difference between the observed outcome values and the values predicted by the model. So, MSE = mean((observeds - predicteds)^2) and RMSE = sqrt(MSE). The lower the RMSE, the better the model.
Residual Standard Error
(RSE), also known as the
model sigma
, is a variant of the RMSE adjusted for the number of predictors in the model. The lower the RSE, the better the model. In practice, the difference between RMSE and RSE is very small, particularly for large multivariate data.
Mean Absolute Error
(MAE), like the RMSE, the MAE measures the prediction error. Mathematically, it is the average absolute difference between observed and predicted outcomes,
MAE = mean(abs(observeds - predicteds))
. MAE is less sensitive to outliers compared to RMSE.
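To make these formulas concrete, here is a minimal sketch that computes the MSE, RMSE and MAE directly in R; the observed and predicted values below are made-up numbers used purely for illustration:
# Hypothetical observed and predicted values (illustration only)
observeds  <- c(10, 12, 9, 15, 14)
predicteds <- c(11, 11, 10, 14, 16)
MSE  <- mean((observeds - predicteds)^2)    # mean squared error
RMSE <- sqrt(MSE)                           # root mean squared error
MAE  <- mean(abs(observeds - predicteds))   # mean absolute error
c(MSE = MSE, RMSE = RMSE, MAE = MAE)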
The problem with the above metrics is that they are sensitive to the inclusion of additional variables in the model, even if those variables do not make a significant contribution to explaining the outcome. In other words, including additional variables in the model will always increase the R2 and reduce the RMSE. So, we need a more robust metric to guide the model choice.
Concerning R2, there is an adjusted version, called
Adjusted R-squared
, which adjusts the R2 for having too many variables in the model.
Additionally, there are four other important metrics - AIC, AICc, BIC and Mallows Cp - that are commonly used for model evaluation and selection. These are unbiased estimates of the model prediction error (MSE). The lower these metrics, the better the model.
AIC
stands for Akaike’s Information Criteria, a metric developed by the Japanese statistician Hirotugu Akaike in the 1970s. The basic idea of AIC is to penalize the inclusion of additional variables in a model. It adds a penalty that increases the error when additional terms are included. The lower the AIC, the better the model.
AICc
is a version of AIC corrected for small sample sizes.
BIC
(or
Bayesian information criteria
) is a variant of AIC with a stronger penalty for including additional variables in the model.
Mallows Cp
: A variant of AIC developed by Colin Mallows.
Generally, the most commonly used metrics for measuring regression model quality and for comparing models are: adjusted R2, AIC, BIC and Cp.
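As a concrete illustration of the adjustment described above, the adjusted R2 can be reproduced from the ordinary R2, the number of observations n and the number of predictors p. The sketch below uses the built-in mtcars data and an arbitrary model chosen only for demonstration:
# Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)
fit <- lm(mpg ~ wt + hp, data = mtcars)  # illustrative model
r2  <- summary(fit)$r.squared
n   <- nrow(mtcars)                      # number of observations
p   <- 2                                 # number of predictors
1 - (1 - r2) * (n - 1) / (n - p - 1)     # same value as summary(fit)$adj.r.squared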
In the following sections, we’ll show you how to compute the above-mentioned metrics.
Loading required R packages
tidyverse
for data manipulation and visualization
modelr
provides helper functions for computing regression model performance metrics
broom
easily creates a tidy data frame containing the model’s statistical metrics
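If any of these packages are not yet installed, install them from CRAN first (the caret package, used further below, can be added to the same call):
install.packages(c("tidyverse", "modelr", "broom", "caret"))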
library(tidyverse)
library(modelr)
library(broom)
Example of data
We’ll use the built-in R
swiss
data set, introduced in Chapter @ref(regression-analysis), for predicting the fertility score on the basis of socio-economic indicators.
# Load the data
data("swiss")
# Inspect the data
sample_n(swiss, 3)
Building regression models
We start by creating two models:
Model 1, including all predictors
Model 2, including all predictors except the variable Examination
model1 <- lm(Fertility ~ ., data = swiss)
model2 <- lm(Fertility ~ . - Examination, data = swiss)
Assessing model quality
There are many R functions and packages for assessing model quality, including:
summary()
[stats package], returns the R-squared, adjusted R-squared and the RSE
AIC()
and
BIC()
[stats package], computes the AIC and the BIC, respectively
summary(model1)
AIC(model1)
BIC(model1)
rsquare()
,
rmse()
and
mae()
[modelr package], computes, respectively, the R2, RMSE and the MAE.
library(modelr)
data.frame(
  R2 = rsquare(model1, data = swiss),
  RMSE = rmse(model1, data = swiss),
  MAE = mae(model1, data = swiss)
)
R2()
, RMSE()
and MAE()
[caret package], computes, respectively, the R2, RMSE and the MAE.
library(caret)
predictions <- model1 %>% predict(swiss)
data.frame(
  R2 = R2(predictions, swiss$Fertility),
  RMSE = RMSE(predictions, swiss$Fertility),
  MAE = MAE(predictions, swiss$Fertility)
)
glance()
[broom package], computes the R2, adjusted R2, sigma (RSE), AIC, BIC.
library(broom)
glance(model1)
Manual computation of R2, RMSE and MAE:
# Make predictions and compute the
# R2, RMSE and MAE
swiss %>%
  add_predictions(model1) %>%
  summarise(
    R2 = cor(Fertility, pred)^2,
    MSE = mean((Fertility - pred)^2),
    RMSE = sqrt(MSE),
    MAE = mean(abs(Fertility - pred))
  )
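Similarly, the AIC and the BIC can be reproduced by hand from the model log-likelihood; the minimal sketch below shows the underlying formulas, where k is the number of estimated parameters (coefficients plus the residual standard error) and n the number of observations:
# Reproduce AIC and BIC from the log-likelihood (illustration)
ll <- logLik(model1)
k  <- attr(ll, "df")              # number of estimated parameters
n  <- nobs(model1)                # number of observations
-2 * as.numeric(ll) + 2 * k       # same value as AIC(model1)
-2 * as.numeric(ll) + log(n) * k  # same value as BIC(model1)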
Comparing regression models performance
Here, we’ll use the function glance()
to simply compare the overall quality of our two models:
# Metrics for model 1
glance(model1) %>%
dplyr::select(adj.r.squared, sigma, AIC, BIC, p.value)
## adj.r.squared sigma AIC BIC p.value
## 1 0.671 7.17 326 339 5.59e-10
# Metrics for model 2
glance(model2) %>%
dplyr::select(adj.r.squared, sigma, AIC, BIC, p.value)
## adj.r.squared sigma AIC BIC p.value
## 1 0.671 7.17 325 336 1.72e-10
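The same comparison can also be assembled into a single table, which is convenient when more than two models are involved; here is a minimal sketch using dplyr::bind_rows():
# Stack the glance() results of both models into one table
bind_rows(
  model1 = glance(model1),
  model2 = glance(model2),
  .id = "model"
) %>%
  dplyr::select(model, adj.r.squared, sigma, AIC, BIC, p.value)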
From the output above, it can be seen that:
The two models have exactly the same adjusted R2 (0.67), meaning that they are equivalent in explaining the outcome, here the fertility score. Additionally, they have the same residual standard error (RSE or sigma = 7.17). However, model 2 is simpler than model 1 because it incorporates fewer variables. All else being equal, the simpler model is preferred.
The AIC and the BIC of model 2 are lower than those of model 1. In model comparison strategies, the model with the lowest AIC and BIC scores is preferred.
Finally, the F-statistic p-value of model 2 is lower than that of model 1. This means that model 2 is statistically more significant than model 1, which is consistent with the conclusion above.
Note that the RMSE and the RSE are measured on the same scale as the outcome variable. Dividing the RSE by the average value of the outcome variable gives the prediction error rate, which should be as small as possible:
sigma(model1)/mean(swiss$Fertility)
## [1] 0.102
In our example, the average prediction error rate is about 10%.
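The same calculation can be applied to model 2 to compare the two models on this scale:
sigma(model2)/mean(swiss$Fertility)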
Discussion
This chapter describes several metrics for assessing the overall performance of a regression model.
The most important metrics are the Adjusted R-square, RMSE, AIC and the BIC. These metrics are also used as the basis of model comparison and optimal model selection.
Note that these regression metrics are all internal measures, that is, they are computed on the same data that was used to build the regression model. They tell you how well the model fits the data at hand, called the training data set.
In general, we do not really care how well the method works on the training data. Rather, we are interested in the accuracy of the predictions that we obtain when we apply our method to previously unseen test data.
However, test data is not always available, making the test error difficult to estimate. In this situation, methods such as cross-validation (Chapter @ref(cross-validation)) and the bootstrap (Chapter @ref(bootstrap-resampling)) are applied to estimate the test error (or the prediction error rate) using the training data.
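As a preview of that approach, here is a minimal sketch that estimates the prediction error of the full model by 10-fold cross-validation with the caret package; the seed and the number of folds are arbitrary choices for illustration:
library(caret)
set.seed(123)
# Estimate RMSE, R2 and MAE on held-out folds (10-fold cross-validation)
cv_model <- train(
  Fertility ~ ., data = swiss,
  method = "lm",
  trControl = trainControl(method = "cv", number = 10)
)
cv_model$results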