Hi all.
I am finding the
tidy
function from
broom
to be very slow when
conf.int=TRUE
compared to "manually" calculating the CIs (as the estimate +/- 1.96* SE).
Looking on the GitHub it appears that it is actually the use of the
confint
function with default method that is slow (compared to simply using the SE as I mention above). When including many exposure (dozens or hundreds) it means
tidy
is very slow but produces almost exactly the same CIs as the simple manual method. Is there a more elegant/quick/
tidy
way to do this?
Examples below for a glm with single exposure, and another with 10.
Thanks,
library(tidyverse)
library(broom)
library(rbenchmark)
set.seed(123)
## create continuous exposure and outcome
x = runif(10000, 1, 5)
y = x + rnorm(10000)
## create "quantiles" version of `x`
x_qt10 = as.factor(findInterval(x, quantile(x, probs = seq(0.1, 1, 0.1), na.rm=TRUE), rightmost.closed=TRUE) + 1)
## combine
df = tibble(x,y,x_qt10)
## model of continuous x on y
fit_lm1 = glm(y ~ x, data = df)
benchmark("tidy CIs" = { fit_lm1_tidy1 = tidy(fit_lm1, conf.int=TRUE) },
"manual CIs" = { fit_lm1_tidy2 = tidy(fit_lm1) %>% mutate(lci=estimate-(1.96*std.error), uci=estimate+(1.96*std.error)) })
#> test replications elapsed relative user.self sys.self user.child sys.child
#> 2 manual CIs 100 0.39 1.000 0.39 0.00 NA NA
#> 1 tidy CIs 100 10.44 26.769 10.41 0.03 NA NA
lm1_cis = data.frame(tidy_lci=fit_lm1_tidy1$conf.low, tidy_uci=fit_lm1_tidy1$conf.high,
manual_lci=fit_lm1_tidy2$lci, manual_uci=fit_lm1_tidy2$uci)
lm1_cis
#> tidy_lci tidy_uci manual_lci manual_uci
#> 1 -0.02244986 0.08722226 -0.02245087 0.08722327
#> 2 0.97271488 1.00696027 0.97271456 1.00696058
## model of 10 quantiles on y
fit_lm2 = glm(y ~ x_qt10, data = df)
benchmark("tidy CIs" = { fit_lm2_tidy1 = tidy(fit_lm2, conf.int=TRUE) },
"manual CIs" = { fit_lm2_tidy2 = tidy(fit_lm2) %>% mutate(lci=estimate-(1.96*std.error), uci=estimate+(1.96*std.error)) })
#> test replications elapsed relative user.self sys.self user.child sys.child
#> 2 manual CIs 100 0.41 1.000 0.41 0.00 NA NA
#> 1 tidy CIs 100 106.22 259.073 105.71 0.25 NA NA
lm2_cis = data.frame(tidy_lci=fit_lm2_tidy1$conf.low, tidy_uci=fit_lm2_tidy1$conf.high,
manual_lci=fit_lm2_tidy2$lci, manual_uci=fit_lm2_tidy2$uci)
lm2_cis
#> tidy_lci tidy_uci manual_lci manual_uci
#> 1 1.1795716 1.304530 1.1795704 1.3045313
#> 2 0.2768399 0.453558 0.2768383 0.4535597
#> 3 0.6776729 0.854391 0.6776713 0.8543926
#> 4 1.0788242 1.255542 1.0788225 1.2555439
#> 5 1.4379012 1.614619 1.4378996 1.6146209
#> 6 1.8634412 2.040159 1.8634396 2.0401610
#> 7 2.2922514 2.468969 2.2922497 2.4689711
#> 8 2.5767003 2.753418 2.5766987 2.7534200
#> 9 3.0297996 3.206518 3.0297980 3.2065193
#> 10 3.4727892 3.649507 3.4727876 3.6495090
My results on an M1 are similarly ugly
test replications elapsed relative user.self
2 manual CIs 100 0.396 1.000 0.394
1 tidy CIs 100 24.716 62.414 24.637
sys.self user.child sys.child
2 0.001 0 0
1 0.083 0 0
test replications elapsed relative user.self
2 manual CIs 100 0.418 1.000 0.41
1 tidy CIs 100 163.035 390.036 162.38
sys.self user.child sys.child
2 0.007 0 0
1 0.701 0 0
For a way to do this in tidy
, I'd ask on the github issues page. If I were doing something similar, however, I'd roll my own display from {base} using {gt} or similar.