Why all this fuss over a few sigma?

Most experiments at CERN look for evidence for - or against - scientific hypotheses. To pick an example at random, that could be the existence of a new particle - the Higgs boson perhaps. After collecting a lot of data to search for the Higgs boson, the question is how to link the numbers that emerge to the existence of the hypothetical particle. What is needed is a way to give meaning to the result by converting the data into a probability that expresses the strength of the evidence that the particle exists. Physicists talk about probability because the most interesting results in experimental physics are often close to the limit of what is observable with existing technology, so conclusions can seldom be reached with absolute certainty. Moreover, it is important to communicate the results of an experiment in such a way that other researchers can judge how strong the experimental evidence is for - or against - the hypothesis under test.
Sigma, the standard deviation, is the quantity used most often by physicists to express the strength of evidence for a new phenomenon. It is the unit in which they measure the distance between the data observed and the data expected if there were no new phenomenon. A small distance - a small number of sigma - is expected if the result is due to statistical fluctuations. A large number of sigma is unlikely to arise from fluctuations alone, and so is taken as evidence for a new phenomenon.

A recent example is the evidence for a Higgs boson with a mass of about 115 GeV coming from data acquired in the last year of LEP running. Taking, for example, the ALEPH experiment alone, the Higgs signal was a '3.2 sigma effect'. What that means is best understood by taking a more familiar example. Toss a fair coin and you know there is an equal chance of getting heads or tails. Toss several supposedly fair coins many times, and it would be extremely unusual if all gave exactly 50% heads and 50% tails. Instead, the results from all the coins will follow a bell-shaped distribution, like the one shown on the next page. A perfectly fair coin would usually appear near the centre of the curve, near the mean of the distribution. If the coin were biased it would tend to appear away from the mean. For 100 tosses of a fair coin, one sigma corresponds to five heads, so if after 100 tosses you got 55 heads and 45 tails, the result would be just one sigma from the mean. The probability that an unbiased coin lands at least this far from the mean is 32%. Put another way, a fair coin lands within one sigma of the mean 68% of the time. If you observed 60 heads, that would be a two-sigma deviation, and the probability for that with an unbiased coin is about 5%. The larger the difference between heads and tails, the more sigma the result is from the mean, and the stronger the evidence for a biased coin. At a deviation of five sigma - 75 heads - the probability for a perfectly fair coin to produce the result is just 1 in 2 million. A gambler would be quite justified in suspecting that there was something fishy going on.
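The coin arithmetic above can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the function names `n_sigma` and `two_sided_tail` are illustrative, and the bell-shaped (Gaussian) curve is used as an approximation to the true distribution of coin tosses.

```python
import math

def n_sigma(heads: int, tosses: int, p: float = 0.5) -> float:
    """Distance of the observed number of heads from the mean, in units of sigma."""
    mean = tosses * p
    sigma = math.sqrt(tosses * p * (1 - p))  # standard deviation of the head count
    return (heads - mean) / sigma

def two_sided_tail(z: float) -> float:
    """Gaussian probability of landing at least |z| sigma from the mean, on either side."""
    return math.erfc(abs(z) / math.sqrt(2))

# The three results discussed in the text, for 100 tosses of a fair coin.
for heads in (55, 60, 75):
    z = n_sigma(heads, 100)
    print(f"{heads} heads: {z:.1f} sigma, tail probability {two_sided_tail(z):.2g}")
```

For 100 tosses sigma works out to five heads, which is why 55, 60 and 75 heads sit at one, two and five sigma.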

In a 100-throw coin tossing experiment, the evidence that the coin is biased grows the further the result is from the centre of the curve. A 50/50 result would fall right at the centre, that is to say, on the mean. A result of 55/45 is a one sigma effect in statistical parlance: a fair coin lands at least this close to the centre 68% of the time. That fraction increases to 95.5% for two sigma, 99.73% for three sigma and so on.
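The percentages attached to each number of sigma come from the area under the bell curve. A minimal sketch, assuming an ideal Gaussian distribution (the helper name `fraction_within` is illustrative):

```python
import math

def fraction_within(k: float) -> float:
    """Fraction of a Gaussian distribution lying within k sigma of the mean."""
    return math.erf(k / math.sqrt(2))

# Print the area inside and outside the band for a few values of sigma.
for k in (1, 2, 3, 5):
    inside = fraction_within(k)
    print(f"{k} sigma: {100 * inside:.5f}% inside, {100 * (1 - inside):.5f}% outside")
```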

In the case of particle physics the meaning of sigma is the same. ALEPH's 3.2 sigma effect means that, if there were no Higgs boson near 115 GeV, the probability of a statistical fluctuation at least this large would be about 0.2%. To the layman, that sounds like overwhelming odds, but to the physicist it is not. It means that if the experiment were to be repeated one thousand times, and there was no Higgs boson near 115 GeV, about two repetitions would be expected to give a result of 3.2 sigma or more. So it might not happen very often but it could happen. Try it for yourself - if you have a thousand 'identical' coins and a little spare time!
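For readers without a thousand coins, the experiment can be simulated. The sketch below is an illustration only: the function name `count_fluctuations`, the choice of 100 tosses per coin and the random seed are all assumptions, not part of any LEP analysis. It counts how many of 1000 simulated fair coins stray at least 3.2 sigma from a 50/50 result, and compares that with the ideal Gaussian expectation, which comes out a little over one per thousand, in the same ballpark as the rounded figure quoted above.

```python
import math
import random

def count_fluctuations(n_experiments: int, tosses: int,
                       threshold_sigma: float, seed: int = 1) -> int:
    """Count simulated fair-coin experiments at least threshold_sigma from the mean."""
    rng = random.Random(seed)        # fixed seed so the run is repeatable
    mean = tosses * 0.5
    sigma = math.sqrt(tosses * 0.25)
    count = 0
    for _ in range(n_experiments):
        # Each random bit is one fair toss; the number of 1 bits is the number of heads.
        heads = bin(rng.getrandbits(tosses)).count("1")
        if abs(heads - mean) >= threshold_sigma * sigma:
            count += 1
    return count

# Ideal Gaussian expectation for 1000 repetitions (two-sided tail).
expected = 1000 * math.erfc(3.2 / math.sqrt(2))
observed = count_fluctuations(1000, 100, 3.2)
print(f"Gaussian expectation: {expected:.2f} per 1000, simulated: {observed} per 1000")
```

Because a coin can only give whole numbers of heads, the simulated count will not match the Gaussian figure exactly, and it changes from seed to seed.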
Physicists have precise rules for dealing with standard deviations. Three sigma or more is considered evidence for a new phenomenon, but a discovery can only be claimed for an effect of five sigma or more. So taken alone, ALEPH's result looks like fair evidence for the Higgs boson, but when all the LEP experiments are taken together, the number of sigma is substantially lower. Interesting, but not, alas, evidence for the Higgs boson.