Binomial Distributions in R

0.1 Libraries

library(ggplot2)

1 Introduction

Binomial distributions have several qualities

  • There is a fixed number of trials
  • Each trial can have one of two possible outcomes
  • The outcomes of each trial are independent of each other
  • The probability of success is the same for each trial

A good example of this would be the number of heads after flipping a coin 20 times.

2 The Formula

In a binomial experiment the probability of exactly X successes in n trials is:

\[P(X) = \frac{n!}{(n-X)!X!}\cdot p^{X} \cdot q^{n-X}\]

Where:

  • P(X) is the probability of the number of successes in n trials
  • n is the number of trials
  • X is the number of successes in n trials
  • q is the probability of failure
  • p is the probability of success

Note that this formula is fairly similar to the formula for combinations without replacement.


2.1 Example


So if we wanted the probability of 5 heads in 20 flips of a fair coin we’d do this:

\[P(\text{5 heads in 20 flips}) = \frac{20!}{(20-5)!5!} \cdot .5^{5} \cdot .5^{20-5} = 0.0147857666\]

This means that there is a 1.48% chance of getting 5 heads in 20 flips of a fair coin.


in R we would just enter: dbinom(5,20,.5)

See more below

3 Binomial functions in R

There are four binomial functions in R.

  • dbinom(x, size, prob, log = FALSE)
  • pbinom(q, size, prob, lower.tail = TRUE, log.p = FALSE)
  • qbinom(p, size, prob, lower.tail = TRUE, log.p = FALSE)
  • rbinom(n, size, prob)


dbinom will give the probability of a number of positive trials(x), from a number of total trials (size), based on the probability of the positive result (prob). For example we could find the probability of 5 heads (x=5), from a total of 20 total flips (size = 20), given that the coin is fair and the probability of heads is 0.5 (prob=0.5). dbinom(5, 20, 0.5)

pbinom will give the probability of less than or equal to OR greater than or equal to positive trials(q), from a number of total trials(size), and a probability of a positive outcome (prob). Note the lower.tail variable in the command. For example if we can find the probability that 5 or fewer heads (q = 5) will be flipped from a total of 20 flips (size = 20), and the coin is fair (prob = 0.5). In this case it’s 5 or fewer so lower.tail=TRUE. pbinom(5, 20, 0.5, lower.tail=TRUE)

qbinom is the opposite of pbinom. qbinom will give the number of positive trials less than or equal to a probability (p) of a positive outcomes, the size of the total number of trials (size), and the probability of a positive trial (prob). Note the lower.tail variable so we can have greater than or equal to a number of positive trials. So if we want to know the number of heads that we would get less than or equal to 10% of the time (p = .1), from a total number of 20 trials (size = 20), with a fair coin (prob = 0.5). qbinom(.1, 20, .5). If we wanted to get the deciles of 20 fair coin flips we would enter: qbinom(seq(0,1,.1), 20, .5)

rbinom will create a binomial distribution with a certain size (n), that has a trial size (size), and a probability of a positive outcome for each trial. For example we can create a vector of 1000 coin flips of a fair coin with this: rbinom(1000,1,.5). Or 1000 trials of a drug that has a 75% rate of success, and 20 patients per trial with this: rbinom(1000,20,.75)

3.1 Example dbinom

  • dbinom(x, size, prob, log = FALSE)


In our hypothetical scenario there are 20 people randomly selected and nationally 5% of the population is afraid of being home alone at night. Now we want to know what the probability is that exactly 5 of these 20 are afraid of being home alone at night.

We use the dbinom function which works like this: dbinom(number of successes, size of sample, probability of success)

dbinom(5, 20, 0.05)
## [1] 0.002244646

So this means that there is a .22% chance that 5 people will be afraid of being home alone at night from a sample of 20 people with a probability of 5%.
We can also find the probability that someone will be afraid in each possible outcome, from 0 through 20.

dbinom(0:20, 20, 0.05)
##  [1] 3.584859e-01 3.773536e-01 1.886768e-01 5.958215e-02 1.332759e-02
##  [6] 2.244646e-03 2.953482e-04 3.108928e-05 2.658952e-06 1.865931e-07
## [11] 1.080276e-08 5.168784e-10 2.040309e-11 6.608289e-13 1.739024e-14
## [16] 3.661102e-16 6.021550e-18 7.457027e-20 6.541252e-22 3.623962e-24
## [21] 9.536743e-27

This last one we can plot and we can see that the most likely outcomes in this scenario are that 1, 0, and the 2 people will be afraid from a sample of 20 people.

y<-dbinom(0:20, 20, 0.05)
x<-(0:20)
plot(x,y)

And we can also plot this using ggplot2

library(ggplot2)
y<-dbinom(0:20, 20, 0.05)
x<-(0:20)
qplot(x,y)


Now we can make a plot which would show the probability of 0 through 100 people having a condition from a group of 100 randomly selected individuals where the probability of that condition is 35%.

y<-dbinom(0:100,100,.35)
x <-0:100
qplot(x,y)

3.2 Example pbinom

  • pbinom(q, size, prob, lower.tail = TRUE, log.p = FALSE)

So in this example, we have a drug that has a success rate of 75%, we have a group of 20 people, and we want to know the probability that between 0 and 12 people in the group of 20 will respond positively to the drug.

pbinom(12, 20, .75)
## [1] 0.1018119


And now we can find the probability of between 0 and x people will respond positively to this drug from a group of 20.

pbinom(1:20, 20, .75)
##  [1] 5.547918e-11 1.610715e-09 2.960496e-08 3.865316e-07 3.813027e-06
##  [6] 2.951175e-05 1.837041e-04 9.353916e-04 3.942142e-03 1.386442e-02
## [11] 4.092517e-02 1.018119e-01 2.142181e-01 3.828273e-01 5.851585e-01
## [16] 7.748440e-01 9.087396e-01 9.756874e-01 9.968288e-01 1.000000e+00

And we can plot these probabilities

y <- pbinom(1:20, 20, .75)
x <- 1:20
qplot(x,y)

3.3 Example qbinom

  • qbinom(p, size, prob, lower.tail = TRUE, log.p = FALSE)


This will give us a quantile. This will give us the bottom quantile, or the bottom 10% percent of outcomes with a drug trial that has a 75% success rate and each trial has 20 patients. To put it another way 10% of trials will have between 0 and 12 patients respond positively to this treatment.

qbinom(.1, 20, .75)
## [1] 12


If we wanted the number of patients with a positive response in the bottom 20% of trials we would enter this below. Or to put it another way, in 20% of the trials, between 0 and 13 people will respond positively to this drug treatment.

qbinom(.2, 20,.75)
## [1] 13


If we wanted each decile in this drug test we would enter:

qbinom(seq(.1,1,.1), 20, .75)
##  [1] 12 13 14 15 15 16 16 17 17 20
y <- qbinom(seq(.1,1,.1), 20, .75)
x <- seq(.1,1,.1)
qplot(x,y)


If we tossed a fair coin 10 times and wanted to know the median number of successes (5th decile) we would enter this:

qbinom(.5,10 ,.5)
## [1] 5


3.4 Example rbinom

  • rbinom(n, size, prob)


So as an example we have a fictional drug that has a 75% success rate, it’s been tried out on groups of 20 people 1000 times and we want a binomial distribution of the number of success in each trial.

We can enter this:

rbinom(1000, 20, .75)
##    [1] 12 16 16 16 14 18 15 18 16 17 15 14 17 13 16 17 10 14 19 13  9 18 10
##   [24] 13 17 13 15 15 16 12 14 16 18 14 15 12 14 13 16 13 15 14 19 13 14 15
##   [47] 12 16 15 13 10 16 17 18 13 13 16 16 15 13 13 15 17 13 15 17 18 17 10
##   [70] 15 16 14 17 15 14 13 18 13 18 14 12 12 17 14 16 14 15 14 18 14 17 18
##   [93] 15 14 15 12 13 16 14 11 17 11 17 13 15 17 16 15 14 16 10 14 14 16 15
##  [116] 15 16 14 13 16 17 13 14 14 15 13 16 15 14 17 16 16 14 15 14 14 12 15
##  [139] 14 15 15 14 17 18 13 18 14 12 17 16 15 19 14 15 15 16 15 15 18 15 16
##  [162] 15 13 14 15 12 17 16 16 17 16 15 16 17 18 16 15 12 16 12 15 14 17 13
##  [185] 15 16 13 17 14 13 17 18 13 18 13 18 17 14 14 17 17 18 14 15 15 15 15
##  [208] 13 12 17 14 13 18 14 12 15 12 12 14 16 16 15 13 15 14 16 15 14 17 16
##  [231] 14 14 14 18 16 15 15 17 14 12 13 14 15 16 13 17 16 15 16 13 13 16 13
##  [254] 17 15 16 11 12 13 14 12 13 15 14 13 18 14 16 15 15 15 13 15 17 15 15
##  [277] 13 15 12 15 13 17 12 15 13 15 12 17 12 16 14 12 15 17 11 14 16 18 18
##  [300] 15 15 15 14 17 14 11 15 12 13 18 18 14 14 15 19 15 14 19 15 16 13 17
##  [323] 13 15 15 16 17 14 17 15 14 16 14 18 15 14 15 16 11 17 16 17 18 19 15
##  [346] 17 15 15 13 17 17 19 17 17 12 12 12 16 17 14 18 18 15 15 13 14 15 13
##  [369] 16 15 17 16 14 15 18 17 15 16 14 12 14 15 14 18 18 14 11 11 17 16 16
##  [392] 16 17 15 15 17 15 18 16 16 18 14 17 14 16 16 14 13 13 18 16 13 14 17
##  [415] 16 14 16 16 17 18 15 13 13 17 14 14 19 14 13 18 13 15 15 12 16 13 17
##  [438] 14 14 13 17 13 14 16 14 13 13 17 14 16 16 16 18 17 12 13 16 15 16 18
##  [461] 15 15 10 16 16 17 14 15 14 16 16 18 15 15 15 16 13 17 15 17 13 14 15
##  [484] 12 13 18 13 15 13 15 20 13 16 17 17 16 13 14 15 12 16 13 15 14 13 12
##  [507] 17 11 16 15 12 13 16 12  9 14 13 18 16 15 17 12 17 15 14 16 14 13 17
##  [530] 16 12 14 14 15 17 13 15 15 17 16 16 17 17 16 12 17 18 14 16 19 13 20
##  [553] 16 10 15 15 18 12 15 17 17 15 17 13 17 17 14 11 14 16 13 18 12 14 16
##  [576] 15 16 13 14 15 14 16 12 15 17 14 17 11 13 16 16 19 16 13 15 16 15 12
##  [599] 13 15 14 17 17 17 18 12 17 11 16 14 15 13 14 12 16 11 15 17 17 17 12
##  [622] 19 18 12 14 14 15 15 16 17 12 15 18 11 12 15 13 14 14 19 17 16 16 14
##  [645] 15 15 15 10 18 16 15 15 15 12 17 14 14 14 17 18 14 17 13 18 14 15 13
##  [668] 17 16 16 12 16 17 16 13 16 15 18 16 16 14 15 16 14 14 16 13 16 15 18
##  [691] 17 16 14 16 17 15 14 12 16 17 17 15 13 13 15 17 13 15 15 16 15 16 14
##  [714] 17 12 15 20 15 16 14 13 13 13 13 12 11 13 16 16 15 16 15 13 16 13 13
##  [737] 13 18 13 12 12 16 16 17 15 15 17 17 15 13 15 15 19 18 15 13 17 13 15
##  [760] 16 12 19 16 12 13 18 15 14 19 12 17 16 16 15 15 17 11 17 14 17 16 17
##  [783] 16 15 13 13 13 18 17 14 16 20 15 15 14 17 17 14 17 18 15 14 14 19 18
##  [806] 15 17 11 13 14 17 13 18 16 15 14 15 13 16 12 18 14 14 14 14 13 17 17
##  [829] 15 15 13 15 15 18 18 17 19 15 14 14 15 13 15 16 16 17 14 18 16 12 17
##  [852] 15 16 15 13 16 13 17 11 16 16 14 14 14 17 17 10 17 16 14 14 16 14 17
##  [875] 13 12 10 18 14 17 16 15 17 12 16 17 10 15 15 17 16 10 13 16 14 17 13
##  [898] 16 15 13 15 16 17 12 18 11 14 15 15 15 13 14 12 14 15 17 12 17 14 15
##  [921] 15 16 16 17 16 10 14 17 16 15 15 11 12 18 14 14 16 13 13 13 14 11  9
##  [944] 16 17 15 14 17 15 12 14 12 14 12 16 17 16 17 19 15 15 19 16 18 13 14
##  [967] 16 17 17 15 17 13 18 14 19 12 16 16 18 16 15 12 16 19 16 15 16 14 15
##  [990] 13 17 15 17 15 16 14 15 14 16 12


And plot it like this:

hist(rbinom(1000,20,.75))

Or if we wanted to create a binomial distribution of 100 coin flips with a 50% chance of getting either heads or tails:

rbinom(100,1,.5)
##   [1] 1 1 1 1 0 0 1 1 0 1 1 0 0 0 0 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 1 1 0 1 0
##  [36] 1 0 0 1 1 1 0 1 0 1 0 0 0 0 0 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 1
##  [71] 1 0 0 1 0 0 0 0 0 0 1 1 0 0 1 1 0 1 0 0 1 1 1 0 0 1 1 0 1 1