Multinomial Distribution

The multinomial distribution gives the probability of observing a particular combination of counts across a fixed number of independent trials, where each trial has more than two possible outcomes.

Example

A hypothetical survey has found that 50% of people want a dog for a pet, 30% want a cat, and 20% want fish. If 5 people are randomly selected, what is the probability that 3 want a dog, 1 wants a cat, and 1 wants fish?

dmultinom(c(3,1,1),5,c(.5,.3,.2))
## [1] 0.15
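
As a sanity check on that figure, the same probability can be estimated by simulation. This is only a sketch: the seed, the number of simulated groups, and the names `probs`, `draws`, `hits`, and `sim_estimate` are arbitrary choices made for illustration.

```r
set.seed(42)  # arbitrary seed, used only to make the run repeatable

probs <- c(dog = 0.5, cat = 0.3, fish = 0.2)

# Each column of `draws` is one simulated group of 5 randomly selected people;
# the rows hold the dog, cat, and fish counts for that group.
draws <- rmultinom(n = 100000, size = 5, prob = probs)

# Flag the groups that came out as exactly 3 dog, 1 cat, 1 fish.
hits <- draws[1, ] == 3 & draws[2, ] == 1 & draws[3, ] == 1

# The share of matching groups estimates the multinomial probability.
sim_estimate <- mean(hits)
sim_estimate
```

With this many simulated groups the estimate should land close to the exact value returned by dmultinom(), though it will not match it digit for digit.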


The math looks like this:

\[ P(X) = \frac{5!}{3! \cdot 1! \cdot 1!} \cdot (0.50)^3 \cdot (0.30)^1 \cdot (0.20)^1 = 0.15\]

or, in general form:

\[ P(X) = \frac{n!}{X_1! \cdot X_2! \cdot X_3!} \cdot p_1^{X_1} \cdot p_2^{X_2} \cdot p_3^{X_3}\]
where:

  • n = total number of people randomly selected

  • X_1, X_2, X_3 = number of selected people who want each type of pet (dog, cat, fish)

  • p_1, p_2, p_3 = probability that a person wants each type of pet


It’s important to point out that the sum of all p values must equal 1 and the sum of all X values must equal n.
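
The general formula can be written out directly and compared with dmultinom(). The helper `multinom_prob` below is hypothetical (it is not part of base R) and is meant only as a minimal sketch of the formula; it also checks that the p values sum to 1.

```r
# Hypothetical helper: multinomial probability computed straight from the formula.
multinom_prob <- function(x, p) {
  # One probability per category, and the probabilities must sum to 1.
  stopifnot(length(x) == length(p), isTRUE(all.equal(sum(p), 1)))
  n <- sum(x)  # the X values sum to n by construction
  factorial(n) / prod(factorial(x)) * prod(p^x)
}

by_hand  <- multinom_prob(c(3, 1, 1), c(0.5, 0.3, 0.2))
built_in <- dmultinom(c(3, 1, 1), size = 5, prob = c(0.5, 0.3, 0.2))
all.equal(by_hand, built_in)

# dmultinom() stops with an error when size does not equal sum(x).
mismatch <- tryCatch(
  dmultinom(c(3, 1, 1), size = 6, prob = c(0.5, 0.3, 0.2)),
  error = function(e) "error"
)
mismatch
```

Both values should agree with the 0.15 computed in the first example, and the last call shows what happens when the X values do not sum to n.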


Example

This data set (survey, from the MASS package) shows the results of a survey of 237 people about their smoking habits. People rated how much they smoked, and the categories were Heavy, Never, Occas (occasional), and Regul (regular). One person gave no answer, which R records as NA.

library(MASS)
survey$Smoke[1:20]
##  [1] Never Regul Occas Never Never Never Never Never Never Never Never
## [12] Never Never Never Never Never Never Never Never Never
## Levels: Heavy Never Occas Regul
summary(survey$Smoke)
## Heavy Never Occas Regul  NA's 
##    11   189    19    17     1
length(survey$Smoke)
## [1] 237


Assuming that this survey accurately represents the entire population, if you were to ask 30 random people about smoking what is the probability that:

  • 2 are heavy smokers

  • 19 never smoke

  • 5 occasionally smoke

  • 4 regularly smoke

  • 0 give no answer (NA)


library(MASS) #Provides the survey data set
library(plyr) #Provides count()

smoke_counts <- count(survey$Smoke) #One row per category, with NA as the fifth row
n_resp <- length(survey$Smoke) #Number of respondents, NA included

Pheavy <- smoke_counts[1,2]/n_resp #Probability of heavy smoker
Pnever <- smoke_counts[2,2]/n_resp #Probability of never smoking
Poccas <- smoke_counts[3,2]/n_resp #Probability of occasional smoker
Pregul <- smoke_counts[4,2]/n_resp #Probability of regular smoker
Pna <- smoke_counts[5,2]/n_resp #Probability of na

dmultinom(c(2,19,5,4,0), 30, c(Pheavy,Pnever,Poccas, Pregul, Pna))
## [1] 0.0009700987
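
The five probabilities can also be built in one step with base R, without plyr. This is an alternative sketch rather than the method used above; the names `smoke_probs` and `p_table` are illustrative.

```r
library(MASS)  # provides the survey data set

# Proportion of respondents in each category; useNA = "ifany" keeps the
# missing answer as its own category, so the proportions sum to 1.
smoke_probs <- prop.table(table(survey$Smoke, useNA = "ifany"))

# Counts are given in the same order as the table:
# Heavy, Never, Occas, Regul, NA.
p_table <- dmultinom(c(2, 19, 5, 4, 0), size = 30, prob = as.numeric(smoke_probs))
p_table
```

Because the proportions are the same counts divided by the same total, this should reproduce the result above.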


Or to put it another way:

\[ P(X) = \frac{30!}{2! \cdot 19! \cdot 5! \cdot 4! \cdot 0!} \cdot (0.046)^2 \cdot (0.797)^{19} \cdot (0.080)^5 \cdot (0.072)^4 \cdot (0.004)^0 \approx 0.00097\]


The long, drawn-out way to do the same calculation in R looks like this:

factorial(30)/(factorial(2)*factorial(19)*factorial(5)*factorial(4)*factorial(0))*Pheavy^2*Pnever^19*Poccas^5*Pregul^4*Pna^0
## [1] 0.0009700987
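
One caveat with the factorial version: factorial() returns Inf for arguments above 170, so the direct formula stops working for larger samples. A possible workaround, sketched here, is to do the arithmetic on the log scale with lfactorial(). The probabilities are hard-coded from the summary counts so that the block stands alone, and the names `x`, `p`, and `log_prob` are illustrative.

```r
x <- c(2, 19, 5, 4, 0)           # counts: Heavy, Never, Occas, Regul, NA
p <- c(11, 189, 19, 17, 1) / 237 # proportions taken from the survey summary

# log of n! / (X_1! * ... * X_k!) plus log of p_1^X_1 * ... * p_k^X_k
# (assumes every p is greater than 0, as it is here)
log_prob <- lfactorial(sum(x)) - sum(lfactorial(x)) + sum(x * log(p))

exp(log_prob)

# dmultinom() can return the log probability directly.
all.equal(log_prob, dmultinom(x, prob = p, log = TRUE))
```

Back-transforming with exp() should give the same probability as the two calculations above.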