broom::tidy() is very slow when estimating confidence intervals - tidyverse

link管理

链接快照平台

输入网页链接，自动生成快照
标签化管理网页链接

相关文章推荐

阳光的酱肘子 · 【上海外国语大学】非遗寻访，丝路探源│上海外 ...· 2 月前 ·

爱旅游的茄子 · Help And Training ...· 4 月前 ·

小眼睛的跑步机 · 喵绅士换地址了 - 百度· 5 月前 ·

好帅的咖啡 · Search isn't working ...· 5 月前 ·

失恋的薯片 · 喜达屋(美国饭店及娱乐休闲企业)_搜狗百科· 6 月前 ·

Hi all.

I am finding the tidy function from broom to be very slow when conf.int=TRUE compared to "manually" calculating the CIs (as the estimate +/- 1.96* SE).

Looking on the GitHub it appears that it is actually the use of the confint function with default method that is slow (compared to simply using the SE as I mention above). When including many exposure (dozens or hundreds) it means tidy is very slow but produces almost exactly the same CIs as the simple manual method. Is there a more elegant/quick/ tidy way to do this?

Examples below for a glm with single exposure, and another with 10.

Thanks,

library(tidyverse)
library(broom)
library(rbenchmark)
set.seed(123)
## create continuous exposure and outcome
x = runif(10000, 1, 5)
y = x + rnorm(10000)
## create "quantiles" version of `x`
x_qt10 = as.factor(findInterval(x, quantile(x, probs = seq(0.1, 1, 0.1), na.rm=TRUE), rightmost.closed=TRUE) + 1)
## combine
df = tibble(x,y,x_qt10)
## model of continuous x on y
fit_lm1 = glm(y ~ x, data = df)
benchmark("tidy CIs"    = { fit_lm1_tidy1 = tidy(fit_lm1, conf.int=TRUE) },
          "manual CIs"  = { fit_lm1_tidy2 = tidy(fit_lm1) %>% mutate(lci=estimate-(1.96*std.error), uci=estimate+(1.96*std.error)) })
#>         test replications elapsed relative user.self sys.self user.child sys.child
#> 2 manual CIs          100    0.39    1.000      0.39     0.00         NA        NA
#> 1   tidy CIs          100   10.44   26.769     10.41     0.03         NA        NA
lm1_cis = data.frame(tidy_lci=fit_lm1_tidy1$conf.low, tidy_uci=fit_lm1_tidy1$conf.high,
                     manual_lci=fit_lm1_tidy2$lci, manual_uci=fit_lm1_tidy2$uci)
lm1_cis
#>      tidy_lci   tidy_uci  manual_lci manual_uci
#> 1 -0.02244986 0.08722226 -0.02245087 0.08722327
#> 2  0.97271488 1.00696027  0.97271456 1.00696058
## model of 10 quantiles on y
fit_lm2 = glm(y ~ x_qt10, data = df)
benchmark("tidy CIs"    = { fit_lm2_tidy1 = tidy(fit_lm2, conf.int=TRUE) },
          "manual CIs"  = { fit_lm2_tidy2 = tidy(fit_lm2) %>% mutate(lci=estimate-(1.96*std.error), uci=estimate+(1.96*std.error)) })
#>         test replications elapsed relative user.self sys.self user.child sys.child
#> 2 manual CIs          100    0.41    1.000      0.41     0.00         NA        NA
#> 1   tidy CIs          100  106.22  259.073    105.71     0.25         NA        NA
lm2_cis = data.frame(tidy_lci=fit_lm2_tidy1$conf.low, tidy_uci=fit_lm2_tidy1$conf.high,
                     manual_lci=fit_lm2_tidy2$lci, manual_uci=fit_lm2_tidy2$uci)
lm2_cis
#>     tidy_lci tidy_uci manual_lci manual_uci
#> 1  1.1795716 1.304530  1.1795704  1.3045313
#> 2  0.2768399 0.453558  0.2768383  0.4535597
#> 3  0.6776729 0.854391  0.6776713  0.8543926
#> 4  1.0788242 1.255542  1.0788225  1.2555439
#> 5  1.4379012 1.614619  1.4378996  1.6146209
#> 6  1.8634412 2.040159  1.8634396  2.0401610
#> 7  2.2922514 2.468969  2.2922497  2.4689711
#> 8  2.5767003 2.753418  2.5766987  2.7534200
#> 9  3.0297996 3.206518  3.0297980  3.2065193
#> 10 3.4727892 3.649507  3.4727876  3.6495090
              My results on an M1 are similarly ugly
       test replications elapsed relative user.self
2 manual CIs          100   0.396    1.000     0.394
1   tidy CIs          100  24.716   62.414    24.637
  sys.self user.child sys.child
2    0.001          0         0
1    0.083          0         0
        test replications elapsed relative user.self
2 manual CIs          100   0.418    1.000      0.41
1   tidy CIs          100 163.035  390.036    162.38
  sys.self user.child sys.child
2    0.007          0         0
1    0.701          0         0
For a way to do this in tidy, I'd ask on the github issues page. If I were doing something similar, however, I'd roll my own display from {base} using {gt} or similar.