November 4, 2017
I had to compute an indicator this week. Its confidence intervals relied on taking 100,000 samples from the indicator’s approximate distribution, and I had to repeat this over multiple GP practices and for twelve different demographic groups.
I decided to use dplyr¹ because I thought it would help me organise all the subgroups involved. I used mutate_at() heavily and thought that dplyr was keeping everything organised. However, when I moved from the 10 samples I’d used for testing to the 100,000 samples required by the indicator’s specification, my code slowed to a crawl.