Appendix 6 The Concept of Experimental Error and its Statistical Basis

To understand what is meant by the term "error" in the context of experimental measurement, consider the first experiment you did in C10J practicals - the determination of the acetic acid content of a sample of vinegar. Let's say you obtained a value of 4.79%. It is quite reasonable to ask of you (and the instructor who designed the experiment) how sure you both are that this result is the true value. How likely is it that you would obtain this exact value again, if you repeated the experiment on a similar sample of vinegar, using very similar volume-measuring devices, and using the same technique?

The answer of course is that you would very likely not obtain this exact same value, even if you repeated the experiment several times. A sceptical fellow student (especially one who is doing some philosophy) might well ask what good your measurement is if it is not likely to give the same result consistently. The truth is that if you repeated the measurement several times, you are likely to obtain values like those listed below:

 4.89 4.95 4.76 4.69 4.85 4.63 4.78 4.75 4.74 4.86 4.79 4.80 4.66 4.80 4.75

(These values were actually obtained by members of a former C10J class.) One can see that the numbers are not just totally random - they range in value from 4.63 to 4.95, and are really not so very different from the single value you obtained. In fact, if we were to collect all 50 values obtained by your classmates on any one day, and plot a frequency distribution of the values, in groups, we are likely to obtain a plot similar to that shown below.

The bar graph shows a typical distribution of some 50-odd data points, while the dashed line shows the distribution likely with a large number of data points plotted in very narrow-width groups.

(A frequency distribution is a plot of the number of times a value occurs within a group, say 4.75 - 4.85, against the group value.) As you can see, this results in a fairly symmetrical pattern of distribution of values about a central value. The distribution is one which is well known to statisticians: it is called a Gaussian or normal distribution. What is more important for us is that it has properties which tell us quite a bit about the likelihood of obtaining certain values for our measurement, and that is exactly what we want to know.

When the results of repeated similar experiments follow a normal distribution, we know that the most probable value of the quantity we are measuring is the mean (or average) value. We find also that 95% of the values obtained by repeating the experiment will fall between (the mean - 2σ) and (the mean + 2σ), where σ is a measure of the broadness of the distribution. Also, 68% of the values can be expected to fall between (the mean - σ) and (the mean + σ).

The quantity σ is a measure of dispersion used by statisticians, and it is known officially as the standard deviation or the standard error of the set of data. The formula for calculating σ is

σ = [Σ(xi - x̄)² / (n - 1)]^½

The denominator, (n - 1), is used instead of n when the sample is small (n < 30). x̄ is the mean value and (xi - x̄) is the deviation of each data point from the mean.

Applying this formula to the 15 data points listed earlier gives a standard deviation of 0.085, with the mean value being 4.78. The meaning of this result can be stated thus:
"95% of the values obtained by students in the lab group from which the sample is drawn, are likely to fall between 4.61 and 4.95, i.e. (4.78 ± 0.17) and 68% are likely to fall between 4.69 and 4.87, i.e. (4.78 ± 0.09)."
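These figures can be verified with a short calculation. The sketch below uses only Python's standard library; the data are the 15 results listed above.

```python
import statistics

# The 15 acetic acid results (%) listed in the text
values = [4.89, 4.95, 4.76, 4.69, 4.85, 4.63, 4.78, 4.75,
          4.74, 4.86, 4.79, 4.80, 4.66, 4.80, 4.75]

mean = statistics.mean(values)   # the most probable value
s = statistics.stdev(values)     # sample standard deviation, (n - 1) denominator

print(f"mean = {mean:.2f}")                                   # 4.78
print(f"s    = {s:.3f}")                                      # 0.085
print(f"68% range: {mean - s:.2f} to {mean + s:.2f}")         # 4.69 to 4.87
print(f"95% range: {mean - 2*s:.2f} to {mean + 2*s:.2f}")     # 4.61 to 4.95
```

Note that `statistics.stdev` uses the (n - 1) denominator appropriate for a small sample, exactly as described above.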

We can view this another way. For any single determination of x, there is a 95% probability that the mean will fall between (x - 2s) and (x + 2s), and a 68% probability that it will fall between (x - s) and (x + s). When it is viewed this way, it is obvious that s is a measure of the error (or uncertainty) associated with the experimental determination of x. But what should we write as the error, ±s or ±2s? We can write either, as long as we make it clear which one is being written. The latter (±2s) is quite a conservative estimate, and even the former (±s) represents better than 50% certainty that the mean falls within the range specified. (The value which represents 50% certainty is in fact known as the probable error, and it turns out to be ±0.67s.) For reasons which we will see later, it is the probable error that is usually written as the error in measurements. However, it does not make a great deal of difference if we regard the error as ±s.

So, if we had known the value of σ, we could have written the result of our single measurement as:
acid content = (4.79 ±0.09)%

The result could then have been interpreted (to your philosophical fellow student, for example), to mean that if you had repeated the experiment several times using the same equipment and technique, the average of these repeated determinations is very likely to fall within the range 4.70 to 4.88.

One begins to really appreciate the importance of this statement of error when one is asked to make a judgement on the value obtained for the acid content of vinegar. Suppose, for example, that the label on the product, as it is sold on the market, claims that the acid content is 6.5%; could we use your single result as the basis for saying that the producer has labelled this product wrongly? The answer would of course be yes. For if we know the value of s to be 0.085, we are quite confident (95%) that the acid content is somewhere within the range 4.61 to 4.95.

We might wish in fact to improve our confidence in the value, by doing the determination once or twice again, using the mean of these determinations instead of the single value, as our experimental result. We might also wish to check that our reagents were really as pure as we assumed, and that our standard NaOH was indeed the molarity stated, before we made any public accusations. It should be quite clear however that we can make judgements based on our results, provided we know what uncertainty or error we can reasonably associate with these results. Conversely it would be fair to say that your result has no real use or meaning if you have no idea of the uncertainty associated with that result.

Types of Error, Accuracy and Precision
It should be evident from our discussion so far that experimental errors exist and are present (to some extent) in all experimental measurement. Our task as experimental scientists must therefore be to design experiments to produce the best results we can, in spite of such errors, and to assess the uncertainty which these unavoidable errors cause in our measurement.

It is useful to distinguish between two types of experimental error - systematic (or bias) error, and random error, though the distinction can appear to be somewhat artificial in some cases.
Systematic error is experimental error whose numerical value tends to remain constant, or follow a consistent pattern over a number of experimental runs. In the determination of the acid content of vinegar, systematic error would have been introduced if the standard sodium hydroxide was of a lower molarity than it is said to be. Suppose its molarity was determined from the mass of NaOH pellets made up to say 500 cm3. This, apart from being bad practice, would almost certainly introduce systematic error into the determination because NaOH pellets absorb water from the atmosphere very rapidly, when they are being weighed.

The mass of water absorbed would have been interpreted as mass of NaOH and so the true molarity would have been less than that calculated. The molarity of acid determined by titration against this base would turn out to be systematically greater than the true value.

We can generalize a bit and say that systematic error arises from incorrect procedure, incorrect use of instruments, or failure of some value to be what it is assumed to be.

Random error, on the other hand, is experimental error whose numerical value and sign are likely to change from one measurement to the next. The average value of random error is zero, and it is the random error in measurements which causes the frequency distribution of repeated experimental results to follow the normal or Gaussian distribution mentioned earlier.

The error which arises because one does not have one's eyes exactly level with the bottom of the meniscus when reading a burette is usually random error. Sometimes the eyes will be above the level and the reading will be greater than the true value, while sometimes the eyes will be below the level and the reading will be less than the true value (usually by small amounts). Many of you already have a "rule of thumb" for estimating random error in readings such as that of a burette - the error being approximately ± half of the smallest division. This is a good estimate in some cases, but not so good in others, as is illustrated below:

These two types of error usually have very different effects on the results of an experiment. To illustrate this, we took 30 results of students doing the determination of the acid content of vinegar on a different lab day from those already given. These results had a mean of 4.57 and a standard deviation of 0.068 (compared with 4.78 and 0.066). The distributions from which these samples are drawn can be represented by the two curves shown below.

Systematic errors account for the shift in the mean from one day to the next, while random errors account for the spread of the values about the mean each day. The random error can be seen to be associated more with the technique and instrumentation used, from the fact that it is about the same each day the experiment is done.

What do you think is the cause of the systematic difference between the results? We say that systematic error affects the accuracy of the result - i.e. its closeness to the 'true' value - while random error affects the precision or repeatability of the results. The 'true' value is simply the best available experimental value, in our case. For universal constants such as R, the gas constant, and h, Planck's constant, the true or accepted value is the mean of the values obtained by the best experiments done by different workers.

Coping with Errors in Measurement
Systematic errors can be identified and their effect reduced by putting some thought into the design of the experiment - ensuring in particular that experimental conditions are controlled, that the equipment is appropriately calibrated, and that any assumptions which are inherent in the technique are adequately met in the conditions selected.

In the experiment to determine the acid content of vinegar, good practice would require that each batch of "standard" NaOH is standardized against a good reproducible standard such as pure potassium hydrogen phthalate, and that an indicator which changes near the equivalence point (pH ≈ 8.0) is used. This ensures that the assumption that the acid and base are present in equivalent numbers of moles is valid in our experiment. These and other features of the design which are aimed specifically at reducing systematic error are usually not recognized by those who follow the instructions in recipe-like fashion. But how does one decide which of the two lab days had the more accurate result?

If all else fails, a good check on the accuracy of the result would be to have the determination done independently, elsewhere - perhaps by a completely different method in a non-University lab. Agreement between two such independent determinations is very strong support for any claim of accuracy. When it is not possible to have such independent checks done, "internal" monitoring of accuracy can be achieved by determining an accurately known standard of acetic acid with each new batch of reagents used, and by determining this standard as part of the routine on each lab day.

The strategy for dealing with random error is somewhat different, for it relies mainly on repeating the same experiment a number of times. The rationale for this approach is simply the fact that the average of a number of observations is more precise than a single observation. In the acetic acid determination the mean of 33 values, 4.78, is more precise than the single value obtained by any individual. The standard error in the mean is given by statistical considerations as

s(mean) = s/√n

So whereas the single value has a standard error of 0.066, the mean has a standard error of 0.011.
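A minimal check of this arithmetic, using the figures quoted above (s = 0.066, n = 33):

```python
import math

s = 0.066   # standard error of a single determination (from the 33 class results)
n = 33      # number of repeated determinations

s_mean = s / math.sqrt(n)   # standard error of the mean, s/sqrt(n)
print(f"standard error of the mean = {s_mean:.3f}")   # 0.011
```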

Since the value of 1/√n decreases very rapidly for increasing n, when n is small, it is good practice to repeat the determination a couple of times, if you want to be a bit more confident about your results. That is why it was recommended earlier that you repeat the determination of acetic acid in vinegar at least once or twice before making accusations against the manufacturer.

Medical laboratories usually do their determinations in duplicate or triplicate for this reason, and many of them run "internal" standards with each batch of samples, as a means of checking on changes in the accuracy of the procedure from week to week.

Of course it follows that more precisely calibrated instruments will help to improve precision. A thermometer which has its smallest division as 1°C is likely to have reading errors of about ±0.5°C, (i.e. half of the smallest division), while one with ten divisions per 1°C can be read to about ±0.05°C.

Lastly it should be pointed out that you do not need the most accurate and most precise instruments available for every measurement. For example it is usually sufficient to determine the melting point of a substance to ± a tenth of a degree Celsius to ascertain its purity for most practical purposes in Organic labs. But a thermometer is required which is fairly accurate, because it is the absolute value of the temperature on which the assessment of purity is based.

On the other hand, the measurement of the temperature change which accompanies the dissolution of 1 g of potassium nitrate in 100 cm3 of water (about 0.50°C) requires a more precise measurement, perhaps to ± a hundredth of a degree Celsius. The absolute accuracy of the thermometer is not all that critical because the assessment here is based on a temperature difference.

When you put 2 kg of water into the jacket of a bomb calorimeter, you need to know this mass with a precision of only ± 1 g to have 0.05% precision, but when you weigh out 2 g of potassium hydrogen phthalate to make up standard solutions you need to know the mass to ± 1 mg in order to have the same precision.
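The two precisions compared here work out as follows (a trivial check, with the masses taken from the text):

```python
# Relative precision of the two mass measurements compared in the text
water_precision = 1 / 2000   # ±1 g in 2 kg (2000 g) of water
khp_precision = 0.001 / 2    # ±1 mg (0.001 g) in 2 g of potassium hydrogen phthalate

print(f"{water_precision:.2%}")   # 0.05%
print(f"{khp_precision:.2%}")     # 0.05%
```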

Estimating Error in a Single Determination

The random error which one should ascribe to an experimental result has been shown to be ± s. So far we have seen this standard error determined only from statistical analysis of a number of repeated determinations of the experimental result. The question of what to do when you have made only one or two determinations of the experimental quantity, is still unanswered. It is an important question because in most of our experiments we have time to determine the final value only once, even though we may do duplicate determinations of some primary quantities in the procedure.

We do already have some idea of the error associated with the measurement of individual primary quantities like volume, time and mass (± ½ the smallest division), and it seems reasonable to assume that if we combine these errors, using the methods of statistics, we should end up with a good estimate of the error in the final result. In fact, most of you already make some estimate of error in your final result by taking the sum of the percentage errors in the various primary measurements. This is not quite correct from a statistical point of view. It is in fact the square root of the sum of the squares of the errors, or percentage (relative) errors, that really represents the combined error.

The following rules can be used to give a reasonable estimate of the combined error in your final result.

(a) For sums and differences the actual error in the result is the square root of the sum of the squares of the actual errors in the contributing terms:

If,

A(±a) ± B(±b) = C(±c)

where a is the actual error in A, etc., then

c = (a² + b²)^½

EXAMPLE        A titre volume, which is usually obtained by taking the difference of burette readings before and after titration, will have a combined error from the two readings (each ± 0.05 cm3) given by:

Final reading 18.30 ± 0.05 cm3
Initial reading 5.05 ± 0.05 cm3

Titre volume 13.25 ± ?

c = [(0.05)² + (0.05)²]^½ = 0.07 cm3

Titre volume = 13.25 ± 0.07 cm3
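Rule (a) applied to this titre calculation can be sketched in Python as:

```python
import math

# Rule (a): the error in a difference is the root-sum-square of the individual errors
final_reading = 18.30     # cm3, read to ±0.05
initial_reading = 5.05    # cm3, read to ±0.05
a = b = 0.05              # actual error in each burette reading, cm3

titre = final_reading - initial_reading
c = math.sqrt(a**2 + b**2)

print(f"titre = {titre:.2f} ± {c:.2f} cm3")   # titre = 13.25 ± 0.07 cm3
```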

(b) For products and quotients the relative (or percentage) error in the results is the square root of the sum of the squares of the relative (or percentage) errors in the contributing terms.

If,
A(±a) x B(±b) = D(±d)

then,

d/D = [(a/A)² + (b/B)²]^½
EXAMPLE        Consider the case of the molarity of acetic acid determined from titration against standard NaOH.

Macid = (MNaOH. VNaOH) / Vacid

If we represent the error in each quantity as Δ(quantity), we have

ΔMacid/Macid = [(ΔMNaOH/MNaOH)² + (ΔVNaOH/VNaOH)² + (ΔVacid/Vacid)²]^½

Assuming that the error in the 10 cm3 pipette, used to take the sample of acid for titration, is about ± 0.05 cm3, and assuming negligible error in the molarity of NaOH, we have

ΔMacid/Macid = [(0.07/13.25)² + (0.05/10.00)²]^½ = 0.0073

(taking the titre volume and its error from the earlier example).

So the relative error in the acid molarity is 0.0073 or 0.7% (using some actual data taken from a C10 student).
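Rule (b) applied to these titration volumes can be sketched as follows; the titre 13.25 ± 0.07 cm3 is taken from the earlier example and treated here as illustrative.

```python
import math

# Rule (b): the relative error in a quotient is the root-sum-square of the
# relative errors. Volumes below are the titre from the earlier example and
# the 10 cm3 pipette.
v_naoh, dv_naoh = 13.25, 0.07   # cm3
v_acid, dv_acid = 10.00, 0.05   # cm3

rel_err = math.sqrt((dv_naoh / v_naoh)**2 + (dv_acid / v_acid)**2)
print(f"relative error in Macid = {rel_err:.4f}")   # 0.0073
```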
(c) To calculate the error in the final result, arrange the equation (used to get the result) in its simplest form, and combine errors using the rules given above. You may neglect any error (or a relative error) which is less than a tenth of the largest error (or relative error).

EXAMPLE
In the case of the acetic acid determination, we already have established in the above example that ~ 0.7% error exists in the molarity of the diluted acid due to uncertainty of the volumes used in titration. But there is also an uncertainty due to the fact that you took an aliquot of 10.00 ± 0.05 cm3 of the acid and diluted that up to 100.0 ± 0.2 cm3. Both uncertainties contribute to the error in the dilution factor resulting in an uncertainty of about 0.053 in 10.00, the error in the volume of the diluted acid being almost negligible.
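The dilution-factor arithmetic can be checked the same way; the small difference from the quoted figure of 0.053 is just rounding.

```python
import math

# Combined uncertainty in the dilution step: a 10.00 ± 0.05 cm3 aliquot
# made up to 100.0 ± 0.2 cm3
aliquot, d_aliquot = 10.00, 0.05
flask, d_flask = 100.0, 0.2

rel = math.sqrt((d_aliquot / aliquot)**2 + (d_flask / flask)**2)
print(f"uncertainty = {rel * aliquot:.3f} in 10.00")   # about 0.054 in 10.00
```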

The final result which is the mass of acetic acid in 100 g of vinegar is given by mA, where

We have estimated the error in our single value, 4.79, and found it to be approximately ± 0.065. This is really quite a good estimate, considering that the value determined from the statistical spread of 33 repeated values was ± 0.066. This agreement between the two is better than one would normally expect, but it does make the point that good estimates of error can be made for an experimental result by considering the errors involved in the individual primary measurements.

It is this "art" which we hope you will develop. Accordingly we expect you to do an error analysis in every experiment you do in the Physical Chemistry Lab, and will treat it as an integral part of your report on lab-work for purposes of assessment.

Significant Figures

Every measurement should be recorded to as many figures as are significant, and no more. Wherever possible the uncertainty of the measurement should be recorded, but if it is not known, it is assumed to be uncertain by ± 1 in the last digit.

Those zeros in a number which merely locate the decimal point are not significant; 0.00314 and 80200 each have three significant figures. To show that trailing zeros are significant, exponential (scientific) notation should be used: 8.0200 x 10^4 has five significant figures.

The logarithm of a number should have as many significant figures in the mantissa as there are significant figures in the number.

Example        antilog 0.032 = 1.08. The mantissa, 0.032, has three significant figures (in the case of logarithms, the zero of the mantissa is counted), so the result is given to three significant figures.

The number of significant figures of a computed result is limited by the term with the largest uncertainty. In addition and subtraction, the number of significant figures in the result depends on the term with the largest absolute uncertainty.

Example:        1.9546 g + 2.03 g = 3.98 g. The result is expressed only to the nearest 0.01 g because 2.03 is only expressed to the nearest 0.01 g.

Example:        68.7 cm - 68.42 cm = 0.3 cm. This result is expressed only to the nearest 0.1 cm because 68.7 cm is only expressed to the nearest 0.1 cm.

In multiplication and division, the number of significant figures in the result depends on the term with the largest relative uncertainty. The relative uncertainty of a quantity is the absolute uncertainty divided by the value of the quantity.

Example:        0.26 cm x 21.902 cm2 = 5.7 cm3. The result is expressed to two significant figures because the relative uncertainty in 5.7 cm3, 0.1/5.7 or 1.8%, is about the same as that in 0.26 cm, 0.01/0.26 or 3.8%. In multiplication and division the number of significant figures in the result is usually equal to that of the term with the least number of significant figures. In some cases, however, the result may have a greater number of significant figures than the term containing the least number of significant figures.

Example:        1049 g/94 g mol-1 = 11.2 mol and not 11 mol. The result is expressed to three significant figures rather than to two because the relative uncertainty of 11.2 mol, 0.1/11.2 or 0.9%, is about the same as that for 94 g mol-1, 1/94 or 1.1%, while that of 11 mol, 1/11 or 9.1%, is much larger.
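The relative uncertainties quoted in this example can be verified directly (assuming, as the text does, ±1 in the last digit of each quantity):

```python
# Relative uncertainties in the 1049 g / 94 g mol-1 example
rel_result3 = 0.1 / 11.2   # result to three figures, 11.2 mol
rel_divisor = 1 / 94       # 94 g mol-1
rel_result2 = 1 / 11       # result truncated to two figures, 11 mol

print(f"{rel_result3:.1%}")   # 0.9%
print(f"{rel_divisor:.1%}")   # 1.1%
print(f"{rel_result2:.1%}")   # 9.1%
```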

In performing extended calculations, one should carry along one more than the allowed number of significant figures. When the answer is obtained, it can be rounded off to the correct number of significant figures.

In rounding off a number the following rules should be observed:

When the digit to be dropped is smaller than 5, the last remaining digit is left unchanged; when the digit to be dropped is larger than 5, the last remaining digit is increased by one; when the digit to be dropped is a 5 followed by nothing or only zeros, the last remaining digit is rounded to the nearest even digit.

Example:        The following numbers are rounded off to 3 figures: 1.7348 becomes 1.73; 1.735 becomes 1.74.
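Python's decimal module implements exactly this round-half-to-even rule. The sketch below repeats the two cases above and adds a hypothetical third case, 1.745, to show the half that rounds down:

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Round to three figures using the round-half-to-even rule described above
for x in ("1.7348", "1.735", "1.745"):
    rounded = Decimal(x).quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)
    print(f"{x} becomes {rounded}")
# 1.7348 becomes 1.73
# 1.735 becomes 1.74
# 1.745 becomes 1.74
```

Decimal is used rather than the built-in round() because binary floating point cannot represent values like 1.735 exactly, which can make round() appear to break the rule.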