---
title: "ggplot2 statistical transformation objects"
output: html_notebook
---
```{r message=FALSE}
library(gapminder)
library(tidyverse)
```

# Geoms to display summarizing statistics

For each x value, they compute and plot summarizing y-values: a central value 
and some measure of dispersion, or just one of these.

You have two options to say what values the geom should render:

  - supply individual functions to compute the central value and the lower and 
  upper ends of the dispersion range. Argument slots to fill:
  `fun`, `fun.min`, `fun.max`.  
  - supply one summary function that returns all of them at once. 
  Its output must be a data frame with columns `y`, `ymin`, and `ymax`. 
  Argument slot to fill: `fun.data`. 
  
You can either write these functions yourself or use ready-made ones: ggplot2 
ships its own `mean_se` plus wrappers around summary functions from the 
`Hmisc` package. 
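A minimal sketch of a hand-written `fun.data` function, assuming you want the mean plus or minus one standard deviation. The name `mean_pm_sd` is hypothetical; what matters is that it takes a numeric vector and returns a one-row data frame with columns `y`, `ymin`, and `ymax`.

```{r}
# Hypothetical helper: mean +/- one standard deviation, returned in the
# shape stat_summary() expects from fun.data (columns y, ymin, ymax)
mean_pm_sd <- function(x) {
  x <- x[!is.na(x)]   # drop missing values before summarising
  m <- mean(x)
  s <- sd(x)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

mean_pm_sd(c(1, 1, 1, 10, 10, 10, 10))
```

It could then be used as `stat_summary(fun.data = mean_pm_sd, geom = "pointrange")`.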

_Examples of summary functions_


### `geom_errorbar` and `geom_linerange`
Just the ranges, without the central value.

__Compute the standard error of the mean life expectancy
for each continent in each year. Render it as errorbars 
(i.e. without the mean).__

```{r}
gapminder %>% 
  ggplot(aes(x = factor(year), y = lifeExp, color = continent)) + 
  geom_point(alpha = 0.2, 
             position = position_jitterdodge(jitter.width = 0.7, 
                                             dodge.width = 0.7)) +
  stat_summary(fun.data = "mean_se", geom = "errorbar", linewidth = 1,
               position = position_dodge(width = 0.7))
```
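`mean_se` is implemented in ggplot2 itself, so it does not need Hmisc. A quick check on a small made-up vector that its range is the mean plus or minus one standard error, `sd(x) / sqrt(n)`, and that its `mult` argument scales that half-width:

```{r}
library(ggplot2)

x <- c(1, 1, 1, 10, 10, 10, 10)   # made-up example vector
se <- sd(x) / sqrt(length(x))     # standard error of the mean

mean_se(x)             # y = mean, ymin/ymax = mean -/+ 1 SE
mean_se(x, mult = 2)   # half-width doubled
c(lower = mean(x) - se, upper = mean(x) + se)  # same bounds computed by hand
```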
__Compute standard deviation of the life expectancy for each continent in each 
year using linerange (i.e. without the mean)__

Use the `ggplot2::mean_sdl` function. NB: it is a wrapper around the 
`Hmisc::smean.sdl` function (documented below), so Hmisc must be installed. 
By default, the range is the mean plus or minus two standard deviations. To 
change this, use the `mult` parameter, just as in the original `Hmisc::smean.sdl` 
function. Note how `stat_summary` passes such arguments on: as a list in 
`fun.args`, e.g. `fun.args = list(mult = 1)`. 
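Called outside a plot (Hmisc must be installed), `mean_sdl` shows what `mult` does. The vector below is made up for illustration; inside `stat_summary` the same argument would go into `fun.args = list(mult = 1)`.

```{r}
library(ggplot2)   # mean_sdl() is exported by ggplot2 but calls Hmisc::smean.sdl()

x <- c(1, 1, 1, 10, 10, 10, 10)  # made-up example vector

mean_sdl(x)            # default mult = 2: mean -/+ 2 SD
mean_sdl(x, mult = 1)  # mean -/+ 1 SD, as with fun.args = list(mult = 1)
```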

For `fun.data`, ggplot2 provides these ready-made wrappers (besides its own `mean_se`):

```
Usage

mean_cl_boot(x, ...)

mean_cl_normal(x, ...)

mean_sdl(x, ...)

median_hilow(x, ...)

Arguments
x 	      a numeric vector

... 	    other arguments passed on to the respective Hmisc function.

Value

A data frame with columns y, ymin, and ymax. 
```
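These are wrappers around summary functions from the `Hmisc` package 
(`mean_cl_boot` → `smean.cl.boot`, `mean_cl_normal` → `smean.cl.normal`, 
`mean_sdl` → `smean.sdl`, `median_hilow` → `smedian.hilow`), and they accept 
the parameters of the underlying Hmisc function through `...`. The Hmisc 
documentation follows below. 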

```
Usage

smean.cl.normal(x, mult=qt((1+conf.int)/2,n-1), conf.int=.95, na.rm=TRUE)

smean.sd(x, na.rm=TRUE)

smean.sdl(x, mult=2, na.rm=TRUE)

smean.cl.boot(x, conf.int=.95, B=1000, na.rm=TRUE, reps=FALSE)

smedian.hilow(x, conf.int=.95, na.rm=TRUE)

```

The Hmisc functions themselves cannot be used directly as `fun.data`, since they 
return a named vector rather than a data frame with columns `y`, `ymin`, and `ymax`. Example:

```{r message=FALSE}
library(Hmisc)
library(magrittr)
c(1,1,1,10,10,10,10) %>% Hmisc::smean.cl.normal() %T>% str() #mean and confidence intervals
```
As seen above, the output is a vector named `Mean`, `Lower`, `Upper`, not a data frame with columns `y`, `ymin`, `ymax`.
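One way to make an Hmisc function usable in `fun.data` is to rename its output by position. The helper name `smean_cl_df` below is hypothetical; it does roughly what ggplot2's `mean_cl_normal` wrapper already does, so in practice you would just use the wrapper.

```{r}
library(ggplot2)

# Hypothetical adapter: turn Hmisc's named vector c(Mean, Lower, Upper)
# into the one-row data frame (y, ymin, ymax) that fun.data expects
smean_cl_df <- function(x, ...) {
  r <- Hmisc::smean.cl.normal(x, ...)
  data.frame(y = unname(r[1]), ymin = unname(r[2]), ymax = unname(r[3]))
}

x <- c(1, 1, 1, 10, 10, 10, 10)
smean_cl_df(x)
mean_cl_normal(x)   # the ggplot2 wrapper gives the same numbers
```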

In `stat_summary`, we can supply either `fun.data` or a separate function for each statistic. These arguments are called `fun` (the central value), `fun.min` 
(the lower dispersion value), and `fun.max` (the upper dispersion value).  
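A sketch showing that the two routes give the same summary, assuming ggplot2 ≥ 3.3.0 (where `fun`, `fun.min`, `fun.max` replaced `fun.y`, `fun.ymin`, `fun.ymax`). `layer_data()` returns the numbers each plot would draw, so they can be compared directly.

```{r}
library(gapminder)
library(ggplot2)

# Route 1: one fun.data function returning y, ymin, ymax
p_data <- ggplot(gapminder, aes(x = factor(year), y = lifeExp)) +
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1))

# Route 2: three separate functions, one per statistic
p_sep <- ggplot(gapminder, aes(x = factor(year), y = lifeExp)) +
  stat_summary(fun = mean,
               fun.min = function(x) mean(x) - sd(x),
               fun.max = function(x) mean(x) + sd(x))

# Computed summaries (one row per year)
head(layer_data(p_data)[, c("x", "y", "ymin", "ymax")])
head(layer_data(p_sep)[, c("x", "y", "ymin", "ymax")])
```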


```{r}
gapminder %>% 
  ggplot(aes(x = factor(year), y = lifeExp, color = continent)) + 
  geom_point(alpha = 0.2, 
             position = position_jitterdodge(jitter.width = 0.7, 
                                             dodge.width = 0.7)) +
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1),
               geom = "linerange", linewidth = 0.7,
               position = position_dodge(width = 0.7))
```
### `geom_crossbar` and `geom_pointrange`
These include the central value.

```{r}
gapminder %>% 
  ggplot(aes(x = factor(year), y = lifeExp, color = continent)) + 
  geom_point(alpha = 0.2, 
             position = position_jitterdodge(jitter.width = 0.7, 
                                             dodge.width = 0.7)) +
  stat_summary(fun.data = "mean_se", geom = "crossbar", linewidth = 0.7,
               position = position_dodge(width = 0.7))
```
Mean ± 1 standard deviation (`mean_sdl` with `mult = 1`), drawn with `geom = "pointrange"`:
```{r}
gapminder %>% 
  ggplot(aes(x = factor(year), y = lifeExp, color = continent)) + 
  geom_point(alpha = 0.2, 
             position = position_jitterdodge(jitter.width = 0.7, 
                                             dodge.width = 0.7)) +
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1), 
               geom = "pointrange", position = position_dodge(width = 0.7),
               size = 0.5)
```

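With `fun`, `fun.min`, and `fun.max`, each argument takes a function that returns a single number. A built-in such as `median` works as is, but for quartiles you have to write your own small functions first :-/ .  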

```{r}
# lower and upper quartiles, used below as fun.min and fun.max
low_f <- function(x) quantile(x, probs = 0.25)
hi_f <- function(x) quantile(x, probs = 0.75)
gapminder %>% 
  ggplot(aes(x = factor(year), y = lifeExp, color = continent)) + 
  geom_point(alpha = 0.2, 
             position = position_jitterdodge(jitter.width = 0.7, 
                                             dodge.width = 0.7)) +
  stat_summary(fun = median, fun.min = low_f, fun.max = hi_f,
               geom = "pointrange", position = position_dodge(width = 0.7),
               size = 0.5)
```
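For median and quartiles specifically, a wrapper can replace the helper functions: `median_hilow` passes `conf.int` to `Hmisc::smedian.hilow`, which takes the quantiles at (1 - conf.int)/2 and (1 + conf.int)/2, so `conf.int = 0.5` gives the 25% and 75% quantiles. A sketch checked on a made-up vector (Hmisc must be installed):

```{r}
library(ggplot2)

x <- c(1, 1, 1, 10, 10, 10, 10)   # made-up example vector

median_hilow(x, conf.int = 0.5)          # y = median, ymin/ymax = quartiles
quantile(x, probs = c(0.5, 0.25, 0.75))  # the same values via quantile()
```

In the plot above this would read `stat_summary(fun.data = median_hilow, fun.args = list(conf.int = 0.5), geom = "pointrange", ...)`.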

### `geom_smooth`, `stat_smooth`

```{r}
gapminder %>% ggplot(aes(x = gdpPercap, y = lifeExp, color = continent)) + 
  geom_point() + 
  geom_smooth(method = "lm")
```

### `geom_quantile`, `stat_quantile`

```{r}
gapminder %>% filter(year > 1995) %>% ggplot(aes(x = gdpPercap, y = lifeExp)) + 
  geom_point(alpha = 0.3) + 
  # geom_quantile() needs the quantreg package installed
  geom_quantile(formula = y ~ x, quantiles = c(0.01, 0.25, 0.5, 0.75), 
                aes(color = factor(after_stat(quantile))), linewidth = 2) +
  geom_smooth(formula = y ~ x, method = "lm", color = "black", linetype = 2, se = FALSE)
```

# `stat_function` 


```{r}

europe <- gapminder %>% filter(continent == "Europe", year > 2000)

europe %>% 
  ggplot(aes(x = gdpPercap)) + 
  geom_density() +
  # overlay a normal density with the sample mean and sd of the same data
  stat_function(fun = dnorm, color = "red",
                args = list(mean = mean(europe$gdpPercap),
                            sd = sd(europe$gdpPercap)))
```





