Sunday, 18 May 2008

The Binomial Distribution

This is something that keeps catching me out with my A2 statistics, so I figured that I may as well write a quick listing of notes to help myself, and hopefully, anyone who reads this page.

What is the Binomial Distribution?

The Binomial distribution gives the probabilities for the number of successes in a fixed number of independent events, each of which can only be a success or a failure. A better way of explaining it would be to look at cars. Say, for example, I know that the probability that a car has been manufactured by Vauxhall is 0.3. In other words, I know that they make 30% of all cars. If I then sit outside in my street and wait for 5 cars to pass, I can use my probability to find out the likelihood that a certain number of those cars will have been made by Vauxhall.

Say, for example, that I want to know the odds that all 5 of the cars going past were made by Vauxhall. The laws of probability state that the chance of two independent events both happening is the probability of the first multiplied by the probability of the second. Our probability is 0.3, and we want that to happen 5 times, so we can do:

0.3*0.3*0.3*0.3*0.3 = 0.00243

This means that the probability of Vauxhall having made all five cars is 0.00243, or 0.243%: that's about 1 in 400! However, if we want to know the odds of exactly 3 of the cars being made by Vauxhall, we need to replace some of those 0.3s with 0.7s, the probability that a car is not made by Vauxhall. Two of them, in fact, leaving us with the probability that 3 cars are made by Vauxhall and 2 are not. At first, this sounds simple:

0.3*0.3*0.3*0.7*0.7 = 0.01323

But that's not quite right: there is more than one way to arrange 3 Vauxhalls and 2 other cars, and that probability only covers one of those arrangements. Since there are lots of different orderings, and any of them will do, we need to add them all up...

0.3*0.3*0.3*0.7*0.7
0.7*0.3*0.3*0.3*0.7
0.7*0.7*0.3*0.3*0.3
0.3*0.7*0.7*0.3*0.3
0.3*0.3*0.7*0.7*0.3
........... etc .............

The list goes on – as you can imagine, it would be impractical to calculate all the different combinations. For this reason, we use the Binomial Distribution.
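To see what "adding them all up" involves, here is a rough Python sketch that does it the long way for 5 cars. The function and variable names are my own, made up for illustration. It walks through every possible sequence of Vauxhall / not-Vauxhall, keeps the ones with exactly 3 Vauxhalls, and totals their probabilities, using 0.3 for a Vauxhall and 0.7 for anything else.

```python
from itertools import product

P_VAUXHALL = 0.3          # probability a passing car is a Vauxhall (from the example)
P_OTHER = 1 - P_VAUXHALL  # probability it is not, i.e. 0.7


def brute_force_probability(n_cars, n_vauxhall):
    """Add up the probability of every ordering with exactly n_vauxhall Vauxhalls."""
    total = 0.0
    orderings = 0
    # Each car is either a Vauxhall (True) or not (False).
    for outcome in product([True, False], repeat=n_cars):
        if sum(outcome) != n_vauxhall:
            continue  # wrong number of Vauxhalls, skip this sequence
        probability = 1.0
        for is_vauxhall in outcome:
            probability *= P_VAUXHALL if is_vauxhall else P_OTHER
        total += probability
        orderings += 1
    return orderings, total


orderings, total = brute_force_probability(5, 3)
print(orderings, round(total, 4))  # 10 0.1323
```

With only 5 cars there are 32 sequences to check, but the count doubles with every extra car, which is why doing it this way soon becomes impractical.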

How does it work?

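The binomial distribution is defined as:

P(X = x) = nCx × p^x × (1 − p)^(n − x)

Here "n" is the number of events, "p" is the probability of a success in each one, and nCx is the number of ways of choosing which "x" of the "n" events are the successes.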

Now, that looks pretty nasty the first time you see it, but basically all it does is give you the probability of an event happening exactly "x" times out of "n" (where "X" represents what actually happens), without having to go through and work out each combination. In our car example, with n = 5, p = 0.3 and x = 3, we would work it out as follows:

P(X = 3) = 5C3 × 0.3^3 × 0.7^2
= 10 × 0.027 × 0.49
= 0.1323

That means that there's roughly a 13% chance (about one in 8) that exactly 3 of the cars will be manufactured by everyone's favourite car company!
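If you want to check this sort of calculation away from the exam hall, the formula is only a line of Python. This is just a sketch: the function name binomial_pmf is my own choice, and math.comb needs Python 3.8 or later.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): n events, each a success with probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)


print(comb(5, 3))                         # 10 ways to place 3 Vauxhalls among 5 cars
print(round(binomial_pmf(3, 5, 0.3), 4))  # 0.1323
print(round(binomial_pmf(5, 5, 0.3), 5))  # 0.00243, the same as 0.3 multiplied five times
```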

Ah, but there's more!

That's not even a question.

However, there is more to it, because these binomial distributions can be pretty powerful. In an exam, we're more likely to be asked to work out the probability that 3 OR LESS of the cars are made by Vauxhall. We could do this by simply putting 3, 2, 1 and 0 into the equation and adding the results. That would get very long-winded if you had larger numbers, though, which is why the table called "CUMULATIVE BINOMIAL DISTRIBUTION FUNCTION" is available in the exam.
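The "put in 3, 2, 1 and 0 and add the results" approach is easy to sketch in Python, which is handy for checking a value you have read from the table. The name binomial_cdf is my own, not anything official.

```python
from math import comb


def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p), found by adding P(X = k) for k = 0, 1, ..., x."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))


# Probability that 3 or fewer of the 5 cars are Vauxhalls, with p = 0.3.
print(round(binomial_cdf(3, 5, 0.3), 4))  # 0.9692
```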

This is set out in boxes, for different values of "n". In our case, we want the box headed n=5, because we're looking at 5 cars. We could look at 50, in which case the method would be unchanged but you would look in a different box. We know that our probability is 0.3, so we look for the column marked "0.3" at the top. Now, down the left hand side are the values for "x". These are the values that "X" is less than or equal to, so when we look up a value in the table we actually get P(X ≤ x). Have a look at the row for "3", in the column we found earlier. The value written down should be 0.9692, as in the n=5 table below.


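n=5 (each entry is P(X ≤ x); the columns are the values of p; blank cells are 1.0000 to four decimal places)

x | 0.01   | 0.02   | 0.03   | 0.04   | 0.05   | 0.06   | 0.07   | 0.08   | 0.09   | 0.10   | 0.15   | 0.20   | 0.25   | 0.30   | 0.35   | 0.40   | 0.45   | 0.50
0 | 0.9510 | 0.9039 | 0.8587 | 0.8154 | 0.7738 | 0.7339 | 0.6957 | 0.6591 | 0.6240 | 0.5905 | 0.4437 | 0.3277 | 0.2373 | 0.1681 | 0.1160 | 0.0778 | 0.0503 | 0.0313
1 | 0.9990 | 0.9962 | 0.9915 | 0.9852 | 0.9774 | 0.9681 | 0.9575 | 0.9456 | 0.9326 | 0.9185 | 0.8352 | 0.7373 | 0.6328 | 0.5282 | 0.4284 | 0.3370 | 0.2562 | 0.1875
2 | 1.0000 | 0.9999 | 0.9997 | 0.9994 | 0.9988 | 0.9980 | 0.9969 | 0.9955 | 0.9937 | 0.9914 | 0.9734 | 0.9421 | 0.8965 | 0.8369 | 0.7648 | 0.6826 | 0.5931 | 0.5000
3 |        | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9999 | 0.9999 | 0.9998 | 0.9997 | 0.9995 | 0.9978 | 0.9933 | 0.9844 | 0.9692 | 0.9460 | 0.9130 | 0.8688 | 0.8125
4 |        |        |        |        |        | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9999 | 0.9997 | 0.9990 | 0.9976 | 0.9947 | 0.9898 | 0.9815 | 0.9688
5 |        |        |        |        |        |        |        |        |        |        | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000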
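If you don't have the booklet to hand, one column of the n=5 box can be rebuilt with a short Python sketch. The helper name cumulative_column is made up for this example, and I'm assuming the printed tables are simply the exact values rounded to four decimal places.

```python
from math import comb


def cumulative_column(n, p):
    """Return [P(X <= 0), ..., P(X <= n)] for X ~ B(n, p), rounded to 4 places."""
    column = []
    running_total = 0.0
    for x in range(n + 1):
        # Add P(X = x) to the total so far to get P(X <= x).
        running_total += comb(n, x) * p**x * (1 - p)**(n - x)
        column.append(round(running_total, 4))
    return column


print(cumulative_column(5, 0.3))  # [0.1681, 0.5282, 0.8369, 0.9692, 0.9976, 1.0]
```

The fourth entry (the one for x = 3) is the 0.9692 we looked up.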


If you've been paying attention, rather than dozing off, you might have noticed that the values for p in that table only go up to 0.5. That's fine for us, but what if the Vauxhall Astra suddenly became this season's big thing and Vauxhall became the manufacturers of 60% of cars? We'd be in a right pickle then. But, luckily, the probability of 3 or fewer cars being made by Vauxhall is the same as the probability that at least 2 cars weren't made by them. Think of it like this.

P(X≤3) = P(X=0,1,2 or 3)
= P(X'=5,4,3 or 2)
= P(X'≥2)

Here X' ("not X") is the number of cars that aren't made by Vauxhall, so X' = 5 - X. X and X' must always add up to 5: if I see 3 Vauxhalls, I must see 2 cars that aren't Vauxhalls.

Now, we know that the probability that a car is not made by Vauxhall is 1 - 0.6 = 0.4, since both the probabilities must add up to 1.

X' ~ B(5, 0.4)
P(X'≥2) = ?

However, our tables won't give us a value for "Greater than". We know that:

P(X'≥2) = P(X'=5,4,3 or 2)

But for a full set of possibilities, which will have a probability of 1 (the probability that X' is somewhere between 0 and 5), we also need:

P(X'=0 or 1), which is P(X'≤1)

Giving us:

P(X'≤1) + P(X'≥2) = 1
P(X'≥2) = 1- P(X'≤1)
=> P(X≤3) = 1- P(X'≤1)
= 1 - 0.3370
= 0.663

Which means that there is roughly a 2 in 3 chance that 3 or fewer Vauxhalls will be spotted!
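As a sanity check on the flipping trick, here is a small Python sketch that works out P(X≤3) for X ~ B(5, 0.6) directly, and then again the tables way as 1 - P(X'≤1) with X' ~ B(5, 0.4). The function and variable names are my own.

```python
from math import comb


def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p), adding up P(X = k) for k = 0..x."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))


n = 5
p_vauxhall = 0.6

# Directly: P(X <= 3) where X counts Vauxhalls.
direct = binomial_cdf(3, n, p_vauxhall)

# The tables way: X' counts non-Vauxhalls, so P(X <= 3) = 1 - P(X' <= 1).
via_tables = 1 - binomial_cdf(1, n, 1 - p_vauxhall)

print(round(direct, 4), round(via_tables, 4))  # 0.663 0.663
```

Both routes give the same answer, which is all the "everything must add up to 1" argument is saying.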

In conclusion

As long as you remember that all the possibilities must add up to one, you shouldn't have trouble using these Binomials. It seems that examiners LOVE to ask you to turn the inequalities on their head, but as long as you visualise the possibilities and remember to account for everything, it shouldn't be too much of a problem, even for you :P.