This post deals with how to generate random numbers in R. It is good to know how to generate random numbers with a particular language or software package for at least one of the following three reasons:
- You want to test something that depends on a particular distribution.
- You’re running a stochastic process of some kind (Branching process, random walk etc) and you need random numbers for deciding whether an event occurs.
- You forgot to pick your lottery numbers this week.
Let’s step through doing each of these with R. Over time I will write this post out again for C++, Java, Python, and Ruby. This post is just a memory aid that I can use later on and is not meant as anything more rigorous than that. As such it is a living document, I will mutate this post in place as and when I need to. Memory aids are useful for when you haven’t used a particular programming language or software package for a while. Who knows, it might save me a couple of searches with DuckDuckGo.
Ideally there would be one central random number generating function and you would pass it the distribution you need along with your parameters. R appears to have separate functions for each distribution, which is a bit annoying, but at least they are (fairly) sensibly named. Here’s a table:
|Distribution you need values from||Sample R command|
Note that for each distribution
dist, the name of that function that generates random numbers from
rdist. There are also functions
ddist will return the height of the pdf,
pdist will return the cdf, and
qdist will give the inverse cdf (the quantiles).
It's annoying that there is no clear way that the names of the distributions become the stem at the heart of those d-, p-, q-, and r- functions. Is it the first four letters? No, it's the first three letters for the exponential distribution and (not the first) five letters for the negative binomial distribution. (And the whole name when you start look at things like the Weibull distribution.) Obviously you learn the names after you use them for a while. And to repeat, it is also strange that you can't pass the name of a distribution (or your own pdf) to a more generic function. Mind you, such generic functions would have to handle the gamut of idiosyncrasies of different probability distributions. Maybe it is better this way.
I will add more distributions later. It will also be useful to work out how to generate samples from my own pdfs. I will add instructions for this later. There are also tests to checking hypotheses that data follow a particular distribution and I will add instructions for this in another post.
Often this is simply a case of generating uniform random numbers in [0,1]. You can use
runif(1) because 0.0 and 1.0 are the default values for
runif. The seed changes each time you start R. If you need to use the same seed to repeat previous work, you can use the
set.seed function before using
runic. Check out the R man page for Random Number Generation for more detail1.
I'll add other techniques that use other distributions later. I think particularly of the negative binomial distribution but I want to write a separate post about that a later date. When I write that post I'll add the relevant R code to this post (and the Java to the Java post, and so on).
The lottery is a mug’s game, but you definitely won’t win if you never play. (Technically if you don’t play you can’t lose, and lotteries set you up to lose: the clue is in the name.) Nevertheless, for special occasions when you might want to quickly generate lottery numbers, or N integers in some other range, you should use
sample(6,1:49) for your lucky ticket, or
sample(numSamples,low:high) for the more general case.
It also goes without saying that it's a great idea to use the amazing help in R.
I generated the hero image with PlotDevice, a cool program that creates generative images using the Python programming language. The documentation is a joy to read. This image was one of the first I created. It adapts one of the PlotDevice demos.
First I generate a transparent slice consisting of two thousand circles. Each circle is between 15 and 200 pixels in diameter. Each circle is a random colour. Each circle is randomly positioned across the canvas. Then I overlay multiple copies of that slice. Each successive slice is then scaled down to 98% of its original size and rotated through a random angle. If you are lucky this creates a pleasing kaleidoscope effect.
This always feels like something that I should know more thoroughly, yet every time I come to random number generation I skim the literature to convince myself it works and then I leave it at that. Generally I tend to stick to the state of the art in what happens to ‘just work’ and then hammer that.↩