---
title: "ggplot2 statistical transformation objects"
output: html_notebook
---
```{r message=FALSE}
library(gapminder)
library(tidyverse)
```

# Geoms to display summarizing statistics

These geoms take an `x` and compute and plot summary y-values for it: a central 
value, a measure of dispersion, or both.

There are two ways to tell the geom which values to render:

  - supply individual functions to compute the lower and upper ends of the 
  dispersion range, and a function to compute the central value. Argument slots 
  to fill: `fun`, `fun.min`, `fun.max`.  
  - supply a single summary function that returns all of them at once. 
  Its output must be a data frame with columns `y`, `ymin`, and `ymax`. 
  Argument slot to fill: `fun.data`. 
  
You either write these functions yourself, or you use the ready-made ones: 
ggplot2 provides `mean_se` and wrappers around several summary functions from 
the `Hmisc` package. 
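
Writing such a function yourself takes only a few lines. The chunk below is a minimal sketch of a `fun.data`-style summary that uses base R only; `median_iqr` is a hypothetical helper made up for illustration and is not part of ggplot2 or `Hmisc`.

```{r}
# Hypothetical helper (not from ggplot2 or Hmisc): median with the
# interquartile range, returned as a one-row data frame with the
# columns y, ymin and ymax.
median_iqr <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  data.frame(y = q[2], ymin = q[1], ymax = q[3])
}

median_iqr(c(1, 2, 3, 4, 5))  # y = 3, ymin = 2, ymax = 4
```

A function of this shape should be usable as `stat_summary(fun.data = median_iqr, geom = "pointrange")`.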

_Examples of summary functions_


### `geom_errorbar` and `geom_linerange`
Just ranges, without the central value

__Compute the standard error of the mean life expectancy
for each continent in each year. Render it as error bars 
(i.e. without the mean).__

```{r}
gapminder %>% 
  ggplot(aes(x = factor(year), y = lifeExp, color = continent)) + 
  geom_point(alpha = 0.2, 
             position = position_jitterdodge(jitter.width = 0.7, 
                                             dodge.width = 0.7)) +
  stat_summary(fun.data = "mean_se", geom = "errorbar", size = 1)
```

__Compute the standard deviation of the life expectancy for each continent in 
each year and render it with linerange (i.e. without the mean)__

Use the `ggplot2::mean_sdl` function. NB: it is a wrapper around the 
`Hmisc::smean.sdl` function, which is documented below. By default, the range 
extends two standard deviations on either side of the mean (`mult = 2`). To 
alter this, set the `mult` parameter, just like in the original 
`Hmisc::smean.sdl` function. `stat_summary` passes such arguments on through 
`fun.args = list()`. 
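
To see what `mult` changes, the same summary can be reproduced by hand. The chunk below is a sketch in base R only; `sdl_by_hand` is a hypothetical helper written for illustration (it is not a ggplot2 or `Hmisc` function), and it assumes `x` contains no missing values.

```{r}
x <- c(1, 1, 1, 10, 10, 10, 10)

# Hypothetical helper: the mean, and the mean minus/plus `mult`
# standard deviations, in the y / ymin / ymax layout.
sdl_by_hand <- function(x, mult = 2) {
  m <- mean(x)
  s <- sd(x)
  data.frame(y = m, ymin = m - mult * s, ymax = m + mult * s)
}

sdl_by_hand(x)            # default: two standard deviations each way
sdl_by_hand(x, mult = 1)  # one standard deviation each way
```

With `mult = 1` the range is exactly one standard deviation on each side of the mean, which is what `fun.args = list(mult = 1)` requests from `mean_sdl`.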

In `fun.data`, preferably use these functions:

```
Usage

mean_cl_boot(x, ...)

mean_cl_normal(x, ...)

mean_sdl(x, ...)

median_hilow(x, ...)

Arguments
x 	      a numeric vector

... 	    other arguments passed on to the respective Hmisc function.

Value

A data frame with columns y, ymin, and ymax. 
```
These are wrappers around summary functions from the `Hmisc` package, and they
accept the parameters of the underlying `Hmisc` functions. Documentation of these functions follows below. 

```
Usage

smean.cl.normal(x, mult=qt((1+conf.int)/2,n-1), conf.int=.95, na.rm=TRUE)

smean.sd(x, na.rm=TRUE)

smean.sdl(x, mult=2, na.rm=TRUE)

smean.cl.boot(x, conf.int=.95, B=1000, na.rm=TRUE, reps=FALSE)

smedian.hilow(x, conf.int=.95, na.rm=TRUE)

```

These cannot be used directly as `fun.data`, since they return a named vector 
rather than a data frame with the columns `y`, `ymin`, and `ymax`. Example:

```{r message=FALSE}
library(Hmisc)
library(magrittr)
c(1,1,1,10,10,10,10) %>% Hmisc::smean.cl.normal() %T>% str() #mean and confidence intervals
```

As seen above, the resulting output is a named vector, with names (`Mean`, `Lower`, `Upper`) different from `y`, `ymin`, `ymax`.
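
Bridging the two formats is a matter of renaming. The chunk below is a sketch of such an adapter; `as_summary_df` is a hypothetical name chosen for illustration, and the input vector is typed in by hand (values taken from the `smean.cl.normal()` result for the example vector) so that the chunk runs without `Hmisc`.

```{r}
# Hypothetical adapter: turn a named vector with the elements
# Mean / Lower / Upper into a data frame with columns y / ymin / ymax.
as_summary_df <- function(v) {
  data.frame(y = v[["Mean"]], ymin = v[["Lower"]], ymax = v[["Upper"]])
}

# Hand-typed stand-in for the named vector returned by smean.cl.normal()
v <- c(Mean = 6.142857, Lower = 1.693700, Upper = 10.592015)
as_summary_df(v)
```

The ggplot2 wrappers listed earlier (`mean_cl_normal` and friends) already return the data frame layout, so an adapter like this is only needed when calling an `Hmisc` function directly.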

With `stat_summary`, we can supply either `fun.data`, or a separate function for each statistic. These arguments are called `fun` (the central value), `fun.min` 
(the lower dispersion value), and `fun.max` (the upper dispersion value).  


```{r}
gapminder %>% 
  ggplot(aes(x = factor(year), y = lifeExp, color = continent)) + 
  geom_point(alpha = 0.2, 
             position = position_jitterdodge(jitter.width = 0.7, 
                                             dodge.width = 0.7)) +
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1), 
               geom = "linerange", size = 0.7,
               position = position_dodge(width = 0.8))
```

### `geom_crossbar` and `geom_pointrange`
These include the central value

```{r}
gapminder %>% 
  ggplot(aes(x = factor(year), y = lifeExp, color = continent)) + 
  geom_point(alpha = 0.2, 
             position = position_jitterdodge(jitter.width = 0.7, 
                                             dodge.width = 0.7)) +
  stat_summary(fun.data = "mean_se", geom = "crossbar", size = 0.7)

```
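The same one-standard-deviation summary (`mean_sdl` with `mult = 1`), this time rendered as a pointrange:
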
```{r}
gapminder %>% 
  ggplot(aes(x = factor(year), y = lifeExp, color = continent)) + 
  geom_point(alpha = 0.2, 
             position = position_jitterdodge(jitter.width = 0.7, 
                                             dodge.width = 0.7)) +
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1), 
               geom = "pointrange", position = position_dodge(width = 0.8),
               size = 0.5)

```

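With `fun`, `fun.min`, and `fun.max`, each function must return a single number. Built-ins such as `median` work as they are, but for anything else (quartiles, for example) you have to write your own functions first :-/ .  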

```{r}
# Helper functions: lower and upper quartile of a numeric vector
low_f <- function(x) {quantile(x, probs = 0.25)}
hi_f <- function(x) {quantile(x, probs = 0.75)}
gapminder %>% 
  ggplot(aes(x = factor(year), y = lifeExp, color = continent)) + 
  geom_point(alpha = 0.2, 
             position = position_jitterdodge(jitter.width = 0.7, 
                                             dodge.width = 0.7)) +
  # Median as the central value, quartiles as the ends of the range
  stat_summary(fun = "median", fun.min = "low_f", 
               fun.max = "hi_f", 
               geom = "pointrange", position = position_dodge(width = 0.8),
               size = 0.5)

```

### `geom_smooth`, `stat_smooth`

```{r}
gapminder %>% ggplot(aes(x = gdpPercap, y = lifeExp, color = continent)) + 
  geom_point() + 
  geom_smooth(method = "lm")
```

### `geom_quantile`, `stat_quantile`

```{r}
gapminder %>% filter(year > 1995) %>% ggplot(aes(x = gdpPercap, y = lifeExp)) + 
  geom_point(alpha = 0.3) + 
  # after_stat(quantile) replaces the deprecated ..quantile.. notation
  geom_quantile(formula = y ~ x, quantiles = c(0.01, 0.25, 0.5, 0.75), 
                aes(color = factor(after_stat(quantile))), size = 2) +
  geom_smooth(formula = y ~ x, method = "lm", color = "black", linetype = 2, se = FALSE)
  # + facet_wrap(~continent)
```

# `stat_function` 


```{r}
# Filter once and reuse the subset for both the density and the normal curve
europe_recent <- gapminder %>% filter(continent == "Europe", year > 2000)

europe_recent %>% 
  ggplot(aes(x = gdpPercap)) + 
  geom_density() +
  # Overlay a normal density with the sample mean and standard deviation
  stat_function(fun = dnorm, color = "red", 
                args = list(mean = mean(europe_recent$gdpPercap), 
                            sd = sd(europe_recent$gdpPercap)))
```





