
Appendix I: Notes on the Upper Confidence Limit

The upper confidence limit (UCL) is a statistic that can be calculated from data collected in a statistically designed soil-sampling programme. The method for calculating the UCL will depend on the data distribution. Soil samples collected from a statistically designed programme are taken to be representative of the actual environmental conditions onsite (ie, the samples collected are only a subset of the actual site conditions, but are taken to represent the whole site).

The 95% confidence interval (or error) is the region about the sample mean that is likely to contain the underlying population mean (representing the whole site itself) with a probability of 95%. Confidence intervals of 80%, 90%, 99%, 99.5%, etc. can be similarly defined. In other words, based on the samples collected, there is a probability of only 5% (1 in 20) that the population mean for the entire site will fall outside the boundaries defined by the 95% confidence interval. The width of the confidence interval depends on the sample size, and indicates how much uncertainty there is in the estimate of the true mean. The larger the sample size (recommended n > 30), the narrower the confidence interval.
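The effect of sample size on interval width can be illustrated with a minimal sketch. It assumes a normal approximation for the mean and a fixed sample standard deviation; the function name `ci_half_width` and the standard deviation of 10 mg/kg are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(std_dev, n, confidence=0.95):
    """Half-width of a two-sided confidence interval for the mean.

    Assumes the normal approximation, which is only reasonable for
    larger sample sizes; small samples would use Student's t instead.
    """
    # Two-sided interval: put (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * std_dev / sqrt(n)


# Hypothetical sample standard deviation of 10 mg/kg.
for n in (10, 30, 100):
    print(n, round(ci_half_width(10.0, n), 2))
```

Because the half-width scales with one over the square root of n, quadrupling the number of samples only halves the interval width.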

The 95% confidence limit is simply the mean plus or minus the confidence error, giving an upper and lower confidence limit, respectively. For contaminated sites, the UCL is naturally of more interest than the overall confidence interval, and further discussion focuses on this.
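For approximately normal data, this relationship can be written out as follows. The notation is introduced here for illustration and is not taken from the source: x̄ is the sample mean, s the sample standard deviation, n the number of samples, and t the quantile of Student's t-distribution with n - 1 degrees of freedom.

```latex
\[
\mathrm{LCL}_{95} = \bar{x} - t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}},
\qquad
\mathrm{UCL}_{95} = \bar{x} + t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}
\]
```

This is the two-sided form that matches the "plus or minus" description. Where only the upper limit is of interest, a one-sided limit is sometimes used instead, in which case the 0.95 quantile replaces the 0.975 quantile; which convention applies should be confirmed for the method or software in use.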

For an estimated UCL to be valid, the method selected has to be appropriate for the underlying distribution. Under some circumstances, such as broad-field horticultural soil sampling away from any spray shed or mixing area, the distribution of contaminants is likely to be normal. In these cases, the UCL can be calculated from either the Student's t-distribution (n < 30) or the normal distribution (n > 30), depending on the number of samples. However, at most contaminated sites the distribution of contaminants is not normal. The more frequent pattern is a cluster of low to mid-range results that accounts for most observations, along with a smaller group of very high results representing the most contaminated areas. In these cases, a number of approaches are possible for estimating the UCL.
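For the normally distributed case only, the calculation might be sketched as below. Several choices here are assumptions rather than anything stated in the text: the limit is one-sided, n = 30 is treated with the t-distribution, the t critical values come from a standard table rounded to three decimals, and the function name and sample values are hypothetical. The sketch is not suitable for skewed data of the kind described above.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# One-sided 95% critical values of Student's t, keyed by degrees of
# freedom (standard table values, rounded to 3 decimal places).
T_95_ONE_SIDED = {
    1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015,
    6: 1.943, 7: 1.895, 8: 1.860, 9: 1.833, 10: 1.812,
    11: 1.796, 12: 1.782, 13: 1.771, 14: 1.761, 15: 1.753,
    16: 1.746, 17: 1.740, 18: 1.734, 19: 1.729, 20: 1.725,
    21: 1.721, 22: 1.717, 23: 1.714, 24: 1.711, 25: 1.708,
    26: 1.706, 27: 1.703, 28: 1.701, 29: 1.699,
}


def ucl95_normal_data(results):
    """One-sided 95% UCL of the mean, assuming normally distributed data."""
    n = len(results)
    if n < 2:
        raise ValueError("at least two results are needed")
    df = n - 1
    if df in T_95_ONE_SIDED:
        # Up to 30 samples: Student's t with n - 1 degrees of freedom.
        critical = T_95_ONE_SIDED[df]
    else:
        # More than 30 samples: normal distribution.
        critical = NormalDist().inv_cdf(0.95)
    return mean(results) + critical * stdev(results) / sqrt(n)


# Hypothetical results in mg/kg, for illustration only.
print(round(ucl95_normal_data([10, 12, 14, 16, 18]), 2))
```

A normality check on the data would normally come first; if the data fail it, one of the alternative approaches mentioned above is needed instead.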

A software package designed to assist in calculating the confidence interval and UCL, called ProUCL, is provided free of charge by the US EPA. This provides 10 possible methods for calculating the UCL, and can also be used to check data normality. At the time of writing, it is available from: www.epa.gov/esd/tsc/TSC_form.htm.

However the UCL is estimated, it is worth noting that the value does not represent the 'worst case scenario' for a site, but rather the limit above which the true site average is unlikely to lie. As such, it can form a useful part of a statistical summary, but is not the final word on contamination. Valuable uses for a properly derived UCL are:

  • conservative estimation of long-term (chronic) exposure risk, where the nature of contaminants and the concentrations are such that short-term (acute) exposure is not an issue - UCLs are appropriate for this because long-term risk relates to averaged and aggregated exposure
  • comparing results to a (long-term) guideline value - as a rule of thumb, the site will be acceptable if the 95% UCL is at or below the guideline, provided no result is more than twice the guideline value.
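The rule of thumb in the second point can be expressed as a simple check. This is a minimal sketch: the function name is hypothetical, and the 95% UCL is assumed to have been derived separately by a method appropriate to the data distribution.

```python
def meets_rule_of_thumb(results, ucl95, guideline):
    """Apply the rule of thumb for comparison with a long-term guideline.

    Returns True when the 95% UCL is at or below the guideline and no
    individual result is more than twice the guideline value.
    """
    ucl_ok = ucl95 <= guideline
    no_hotspot = max(results) <= 2 * guideline
    return ucl_ok and no_hotspot


# Hypothetical values in mg/kg, for illustration only.
print(meets_rule_of_thumb([12, 18, 25, 40], ucl95=28.0, guideline=30.0))
```

The second condition guards against a site whose average looks acceptable while one or more individual results still indicate a localised area of high contamination.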