
8 Data Management Protocols

8.1 Data quality assurance

Quality assurance of monitoring data is intimately linked to the entire air quality monitoring process: the choice of site and instrumentation, the proficiency of staff, the calibration and maintenance processes, and the data storage, retrieval and analysis systems. The final product (ambient air quality monitoring data) will only ever be as good and reliable as the systems that produce it.

Figure 8.1 presents a flow chart of the essential elements of the quality assurance process that ultimately aims to provide quality assured data as the end product. It shows that the quality assurance process comprises an organisation’s own quality control procedures and factors that lead to a quality output, such as staff training, standard operating procedures, and the use of standard methods for monitoring. Internal quality control is complemented by an external assessment or audit of systems, procedures and processes to provide an appropriate level of confidence in the data being produced.

Quality control is the overall system of technical activities that measures the attributes and performance of a process, item, or service against defined standards to verify that they meet the stated requirements of the output. For air quality monitoring, quality control is used to ensure measurement uncertainty is maintained within acceptable limits, such as those defined by standard monitoring methods and a monitoring agency’s own data quality objectives. The fundamental objectives of a quality assurance/control programme should be as follows.

  • The data obtained from air quality measurement systems is representative of the spatial scale being investigated.

  • A minimum data capture rate of 95 per cent is achieved (refer to Recommendation 24).

  • A minimum of 75 per cent valid data is collected when calculating averages.

  • Measurements are accurate, precise and traceable.

  • Data is comparable and reproducible. Results from a monitoring network are internally consistent and comparable with national, international and other accepted standards.

Figure 8.1: Process elements that provide data quality assurance


See text description of figure 8.1

This chapter outlines the basic process of data quality assurance to achieve an end product of quality-assured data that is ready for further analysis and reporting.

8.2 Documentation and procedures

There is a basic need to document quality assurance procedures, and this documentation should be developed as part of establishing an air quality monitoring site. A quality assurance procedures manual should incorporate the calibration and maintenance documentation discussed in section 7.4 as well as:

  • data quality acceptance criteria

  • data storage procedures, including file creation and archiving systems

  • data handling and adjustment procedures to correct for calibrations, checks and baseline or span drift

  • documentation of any data adjustments and excluded or missing records.

A monitoring agency should define overarching data quality objectives that match the intended use of the data and the purpose of monitoring, such as compliance, air quality research, or screening monitoring studies.

8.3 Principal elements of data quality assurance

This section provides an overview of and guidance on data handling and adjustment to produce a quality assured data set that is then ready to be used for air pollution studies and compliance assessments.

Figure 8.2: Flow diagram of the acceptance process for routine air quality measurements

Figure 8.2 presents a flow diagram of the basic process for accepting routine data. It shows that the measurement of a sample to provide a response by an instrument needs to be verified (calibrated) against the response of that instrument to a known concentration (reference material). This calibration data is then used to assess the quality of the sample data (are the measured concentrations real?) before the data is accepted (validated). If data is out of specification then this must inform the quality control process so that system improvements can be made and invalid data excluded from the final data set. The following sections describe some of the essential elements of the data acquisition, quality assurance and validation process.

8.3.1 Data acquisition

Most modern continuous air quality monitoring instruments contain their own data acquisition system (DAS) or datalogger, as well as provision for analogue or digital output of data to an external datalogger. As a result, it is possible to utilise an instrument’s DAS followed by transfer to a laptop to collect and store monitoring data. However, data management becomes unwieldy if there are a number of instruments and/or a number of monitoring sites. It is recommended that an external datalogger (purpose-built or PC-based) be used for all instruments, including meteorological instruments. It is preferable to use digital signals because analogue voltages can vary over time. Use of an external datalogger ensures that all parameters have exactly the same date/time stamp (New Zealand Standard Time) for subsequent inter-comparisons and analyses.

Continuous monitors should still be configured so that their own DAS is recording data in parallel to an external datalogger in case of any data loss through a data-logger fault. An instrument’s internal clock should be synchronised as closely as possible with the external datalogger to prevent any time disparities. If need be, the two sets of data can also be used to check that the external datalogger’s programming and averaging algorithms are correct.

It is necessary to have high-resolution raw data files, and the external datalogger should provide and store, at a minimum, 10-minute averages calculated from finer-resolution instantaneous measurements (eg, 10 seconds). Monitoring site dataloggers will require downloading periodically to a central data archive, and the most efficient method for achieving this for permanent monitoring sites is via a telemetry system, usually several times a day. Newer general packet radio service systems allow an almost continuous transfer of data over cellular networks if the sites are within good communication areas. Many monitoring agencies will already operate a telemetry system for other environmental monitoring data such as hydrology networks.
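As an illustration of this averaging step, the sketch below derives 10-minute means from 10-second instantaneous readings using Python and pandas. The file name, column names and the 75 per cent completeness threshold are illustrative assumptions only, not a prescribed method.

    import pandas as pd

    # Hypothetical raw file: one row per 10-second instantaneous reading.
    raw = pd.read_csv("site_raw_10s.csv", parse_dates=["timestamp"])
    raw = raw.set_index("timestamp").sort_index()

    # Resample to 10-minute averages (timestamps assumed to be in
    # New Zealand Standard Time, so no timezone conversion is applied).
    ten_min = raw["concentration"].resample("10min").mean()

    # Count how many 10-second readings contributed to each average, and
    # invalidate intervals with fewer than 75 per cent of the expected 60.
    counts = raw["concentration"].resample("10min").count()
    ten_min[counts < 45] = float("nan")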

Manual downloads of data are an acceptable alternative but involve regular visits to the monitoring site that take into account the datalogger’s data storage capacity. Manual downloads still run the risk of missing instrument faults occurring between site visits.

8.3.2 Data storage, archiving and retrieval

All data should be stored in a central database that is regularly backed up. Each monitoring site and parameter should be assigned a unique identifier that enables easy retrieval. It is preferable to store data in such a way that incoming data is appended to the archive file so it can be viewed as a continuous data set. Two parallel data sets should be maintained: one that preserves raw data in its original form and the other that has been quality assured and is available for further analysis. Keeping a raw data set archived means that the data can be revisited and re-analysed if any problems arise with the original quality assurance process (USEPA, 1994).
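As a minimal sketch of this two-set approach, the layout below keeps the raw archive append-only and entirely separate from the quality-assured copy; the directory structure and naming scheme are illustrative assumptions.

    from pathlib import Path
    import pandas as pd

    ARCHIVE = Path("archive")

    def append_to_archive(site_id: str, parameter: str, new_data: pd.DataFrame) -> None:
        """Append incoming raw data to one file per site/parameter so the
        archive can be viewed as a continuous data set. The raw file is
        never edited; quality assurance works on a separate copy."""
        raw_path = ARCHIVE / "raw" / f"{site_id}_{parameter}.csv"
        raw_path.parent.mkdir(parents=True, exist_ok=True)
        new_data.to_csv(raw_path, mode="a", header=not raw_path.exists(), index=False)

    # Quality-assured data would be written under archive/qa/ with the
    # same site/parameter identifiers, so either set is easy to retrieve.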

A variety of software packages are available that can efficiently store air quality data as time-dependent variables and provide for data manipulation, including graphical analysis, the calculation of fixed and moving averages, and exceedence percentiles. Some software also allows for the incorporation of electronic ‘comment’ files that can be viewed alongside data that has been through the quality assurance process, providing an audit trail that would otherwise have to be recorded separately.

8.3.3 Daily data checks

It is essential to screen air quality data by visual examination for spurious and unusual measurements. The main advantage of regular data transfer by telemetry is that the data can be checked at least once a day, so that instrument faults, systems failures, data spikes, human error, power failures, interference or other disturbances can be readily identified and promptly remedied to minimise instrument down time and data loss. It is recommended that daily data checks be done (and recorded) for each telemetered site (or whenever data is downloaded from untelemetered sites), and that notes be kept of events that may affect results (eg, bushfires, dust storms, roadworks, fireworks).
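As one way of supporting these visual checks with an automated first pass, the sketch below flags missing values, out-of-range values and sudden steps in the most recent 24 hours of a series. The thresholds are hypothetical and would be set per site and contaminant; flagged points still need human review.

    import pandas as pd

    def daily_screen(series: pd.Series, max_value: float, max_step: float) -> pd.Series:
        """Return flagged points from the last 24 hours for manual review.
        Assumes the series has a DatetimeIndex."""
        recent = series[series.index >= series.index.max() - pd.Timedelta(hours=24)]
        flags = pd.Series("", index=recent.index)
        flags[recent.isna()] = "missing"                       # outages, telemetry gaps
        flags[recent > max_value] = "out of expected range"    # possible spike
        flags[recent.diff().abs() > max_step] = "sudden step"  # possible instrument fault
        return flags[flags != ""]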

8.3.4 Instrument checks, calibrations and maintenance

Regular instrument checks, calibrations and maintenance are vital to data quality assurance, along with good site logs and technician notes stating exact times and adjustments made, as these will have to be read alongside archived data when validating or removing suspect data.

Recommendation 21: Data acquisition, storage and data checks

The use of an external datalogger is recommended for all instruments; digital connections are preferred, to eliminate the variation associated with analogue signals.

All data should be stored in a central database that is regularly backed up.

It is recommended that daily data checks be done for each telemetered site (or whenever data is downloaded from untelemetered sites), and events noted that may affect results.

8.4 Data adjustment

Timely quality assurance of data is important to keep on top of the incoming data stream. It is recommended that quality assurance be done at least monthly given that the NES for air quality require public reporting of a breach within 30 days of its occurrence.

8.4.1 Applying manual check or calibration results

The use of the term ‘datalogger response curve’ in this section means the values recorded on an external datalogger or the instrument’s internal datalogger, whichever is used for the data processing.

A datalogger’s response curve relates the response of the datalogger to known concentration units of gas. It can be either linear or non-linear. The response of most analysers and dataloggers tends to drift over time. Both of these conditions (the shape of the curve and its drift) must be addressed by the mechanism used to process the raw analyser readings into final concentration measurements. The theory behind this is discussed below.

The response curve is used to convert the datalogger readings to concentration values and is defined by an equation (if a linear equation is used, the slope and intercept are the important components). The curve is updated at each manual check or calibration, and both the unadjusted and adjusted response readings are required for each point on the curve. Each ambient concentration is then calculated from individual slope and intercept values, determined by linear interpolation between the response curves from the most recent check and the first subsequent check, as shown in Figure 8.3.

Figure 8.3: Response curves used to calculate actual concentrations from recorded instrument response R(x) at time T(x)

For a known concentration of 40 ppm, the datalogger will give a response of R(T–1) on the curve at time T–1 and a response of R(T) from the curve at time T. Therefore, at time T(x) a concentration of 40 ppm will give a response of R(x) where:

R(x) = slope x time(x) + intercept.

A linear equation is required for all ambient concentrations. Many computer programs will automatically calculate the concentrations from two input response curves, taking into account the time between the curves.
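For the linear case, the interpolation can be sketched as below. The function names are illustrative, and times are assumed to be expressed in a common numeric unit (eg, hours since a fixed origin).

    def interpolated_curve(t, t_prev, slope_prev, int_prev, t_next, slope_next, int_next):
        """Linearly interpolate the response-curve slope and intercept
        between the most recent check (t_prev) and the first subsequent
        check (t_next), at measurement time t."""
        frac = (t - t_prev) / (t_next - t_prev)
        slope = slope_prev + frac * (slope_next - slope_prev)
        intercept = int_prev + frac * (int_next - int_prev)
        return slope, intercept

    def to_concentration(response, slope, intercept):
        """Invert response = slope x concentration + intercept."""
        return (response - intercept) / slope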

8.4.2 Changes in zero or span values

Changes to baselines and other concentrations which happen gradually, as in the example shown in Figure 8.4, can be resolved by applying the response curve to the raw data, as described in the previous section. This is usually known as a ‘ramp correction’.

Figure 8.4: Example of baseline drift in CO data

See text description of figure 8.4

Another issue arises when there is a sudden baseline change, as shown in Figure 8.5.

Figure 8.5: Example of a sudden change in data baseline

See text description of figure 8.5

In this instance, applying the ramping method discussed above would not be truly representative of the data. If we just looked at the zero values and assumed the manual check was done at the start and end of the graph (with zero being 0.2 on 1 December and 2.8 on 1 January), then a straight line between the two, as shown in Figure 8.6, would not be representative, as it is obvious something happened on the 15th to cause a change.

Figure 8.6: Incorrectly applying a ramp correction

See text description of figure 8.6

This example demonstrates the benefits of daily zero/span checks or similar, such as daily data checks. These checks would show that something had changed and prompt the technicians to physically visit the site to determine what this was. Some options to resolve this situation if regular checks are done include:

  • removing all the data back to the last good zero/span check

  • if the period is short enough, then extrapolation of the response curves may be possible, as shown in Figure 8.7

  • lowering all the data to match the previous batch if zero and span check results indicate that this would be legitimate.

Figure 8.7: Extrapolating a calibration

See text description of figure 8.7

In any of these situations, having a valid reason for data editing is essential before making any changes. The reasons for any changes made should be recorded with the data.
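As an illustration of how regular zero checks can reveal a step change that a ramp correction would smear out, the sketch below scans daily zero readings for a jump larger than a chosen threshold; the threshold and example values are hypothetical, loosely following Figure 8.5.

    def find_baseline_step(zero_checks, threshold=1.0):
        """Scan time-ordered (day, zero_reading) check results and return
        the first day where the zero jumps by more than threshold,
        suggesting a step change rather than gradual drift."""
        for (d1, z1), (d2, z2) in zip(zero_checks, zero_checks[1:]):
            if abs(z2 - z1) > threshold:
                return d2  # a ramp correction is not valid across this day
        return None

    checks = [(13, 0.2), (14, 0.2), (15, 2.7), (16, 2.8)]
    print(find_baseline_step(checks))  # -> 15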

8.4.3 Data correction

Data correction is not always necessary. If it cannot be avoided, it is recommended that the following be considered before correcting data:

  • the primary objective of the monitoring programme

  • the reason(s) for making any correction

  • the duration of the co-location data set and the strength of that relationship (minimum 12 months co-location)

  • the complexity of the airshed (eg, emission sources)

  • seasonality

  • the co-location method (eg, hi-vol, TEOM, Partisol).

In any event, a copy of the raw data should be archived. Corrected data should also be clearly marked as such to inform data users.3

Recommendation 22: Data adjustment

Data quality assurance should follow multi-point calibrations for gases and be done at least monthly, given that the NES for air quality require public reporting of a breach within 30 days of its occurrence.

Applying the response curve to raw data can correct gradual changes to baselines but is not recommended when there is a sudden baseline change.

A minimum co-location period of one year is recommended before correcting data. A copy of raw data should be archived, and all corrected data should be marked to inform data users.

8.5 Data validation

Data validation must be carried out at regular intervals (eg, three- or six-monthly) to ensure the data is reliable and consistent. The data validation process involves a critical review of all information relating to a particular data set in order to verify, amend or reject the data, and forms the crux of the quality assurance process. A wide range of inputs needs to be considered in the validation process. Once the data has been validated, it represents the final data set to be used in the review and assessment process. It is therefore important that the validation process be undertaken very carefully. Steps in the validation process include:

  • examination of check and calibration records to ensure the correct application of check and calibration factors

  • examination of data for other contaminants, meteorological data and other monitoring sites to highlight any anomalies

  • deletion of data known to be spurious (eg, spikes generated by the analyser)

  • removal of data collected during calibration and maintenance, including sufficient time for instrument stabilisation

  • correction of any analyser / datalogger drift, as indicated by examination of zero and span check records.

Factors that need to be considered during data validation include:

  • instrument history and characteristics – has the equipment malfunctioned in this way before?

  • calibration factors and drift – rapid or excessive response drift can make data questionable

  • negative or out-of-range data – is the data correctly scaled?

  • rapid excursions or ‘spikes’ – are such sudden changes in pollution concentrations likely?

  • the characteristics of the monitoring site – is the station near a local pollution sink or source that could give rise to these results?

  • the effects of meteorology – are such measurements likely under these weather conditions?

  • time of day and year – are such readings likely at this time of day/week/year?

  • the relationship between different contaminants – some contaminant concentrations may rise and fall together (eg, when they come from the same source)

  • results from other sites in a network – these may indicate whether observations made at a particular site are exceptional or questionable

  • occurrence of anomalous events such as bushfires, volcanic eruptions and fireworks displays during Chinese New Year and Guy Fawkes' night.

A robust understanding of air contaminant chemistry, air pollution meteorology, local emission sources and instrument calibration processes is required to provide good data validation.

8.6 Negative data

Every instrument has an uncertainty associated with each measurement. This is normally described as ± a specific value (eg, the FH62 BAM is reported as being ± 9 µg/m3 at a 10‑minute average concentration). This means that at very low ambient concentrations, it is conceivable that the FH62 BAM could report a result of –9 µg/m3 as a 10-minute average. Likewise, most calibration and datalogging systems have their own measurement uncertainty. The uncertainties of all the components in the operation must be combined to determine the overall uncertainty of the data.
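Assuming the component uncertainties are independent, one common way to combine them (a convention, not a requirement of this chapter) is in quadrature, ie, as the root sum of squares; the component values below are illustrative only.

    from math import sqrt

    def combined_uncertainty(*components: float) -> float:
        """Combine independent standard uncertainties in quadrature."""
        return sqrt(sum(c * c for c in components))

    # Illustrative instrument, calibration and datalogging terms (ug/m3).
    print(combined_uncertainty(9.0, 2.0, 0.5))  # -> about 9.2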

Because there is no such thing as negative PM10 (or a negative gas concentration for that matter), it can be very tempting to simply delete any result below zero. Unfortunately, removing all negative data from the data set (or replacing the negative data with zero) will artificially increase the ambient concentration, although the increase when averaged over 24 hours will normally be very minor. Instead of deleting negative data, it is recommended to leave negative data in the data set where such data is within the expected system uncertainty.

Occasionally, large negative spikes may occur due to instrumental error. These negative (and positive) spikes should be reviewed during the data analysis process to evaluate whether they are real or spurious. Unless there is good evidence to remove a value, it should be left in and a comment made in the metadata.

Inadequate or faulty heating of the inlet air on some particulate monitors (most commonly seen on BAMs) can allow moisture to affect the sample, giving rise to large positive spikes, normally followed by large negative spikes. In such cases, care should be taken not to remove the large negative spike and leave the corresponding positive spike, as this would artificially increase the resulting concentration. Instead, it is recommended that both spikes be removed as invalid data, the temperature sensors be checked for faults, and the inlet temperature set to 40°C.

Recommendation 23: Negative data

Negative and positive spikes should be reviewed during the data analysis process to evaluate whether they are real or spurious. Unless there is good evidence to remove a value, it should be left in and a comment made in the metadata.

Where negative values are within the expected error of the instrument, they should be retained within the data set to avoid creating a positive bias in the final result.

Where large negative spikes are observed in the data record from some particulate monitors, check to see whether a large positive spike is also present. If both a large positive and a large negative spike are present, then remove both spikes as invalid data and check the inlet temperature sensors for faults.

8.7 Missing data

No monitoring record is ever complete. There will inevitably be periods of missing data – some deliberate and necessary, such as calibration periods – but most unforeseen, such as equipment failures, power outages, bias and drifts. Even in the most diligently operated monitoring networks it is difficult to reach anything close to 100 per cent valid data for long-term monitoring.

Note that calculation of data capture normally excludes down time for routine calibrations and maintenance while the per cent valid data calculation includes this down time. Slight bias, drifts or calibration shifts can often be dealt with, but complete outages need special consideration.

The following example shows how the per cent valid data and data capture rate are calculated for a 24-hour average (24 one-hour averages).

If:
  power cut = 1 hour (1 one-hour average)
  calibration = 2 hours (2 one-hour averages)
  valid data points = 21 hours (21 one-hour averages)
then:
  per cent valid data for averaging = 21/24 = 88%
  data capture rate = 21/(24 – 2) = 21/22 = 95%
  data loss = 1/24 = 4%

Interpolation or extrapolation to fill in missing data should not be used in the process of producing a basic quality-assured data set, and the missing data should be left as a gap. If a gapless data set is required for a specific purpose (eg, dispersion modelling), then it should be constructed for that purpose alone using whatever interpolation or extrapolation is considered valid.

Recommendation 24: Per cent valid data and data capture rate

Sites used for compliance monitoring should achieve at least:

  • 75% valid data for averaging

  • 95% data capture.

Per cent valid data for averaging = (number of valid data points obtained) ÷ (total number of data points in the averaging period)

Data capture rate = (number of valid data points obtained) ÷ (total number of data points for the period – calibration/maintenance data points)
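The two formulas translate directly into code; the sketch below reproduces the worked example from section 8.7.

    def percent_valid_data(valid_points: int, total_points: int) -> float:
        """Per cent valid data for averaging (Recommendation 24)."""
        return 100.0 * valid_points / total_points

    def data_capture_rate(valid_points: int, total_points: int, cal_points: int) -> float:
        """Data capture rate: calibration/maintenance down time is
        excluded from the denominator."""
        return 100.0 * valid_points / (total_points - cal_points)

    # 24 hourly averages: 1 lost to a power cut, 2 to calibration.
    print(percent_valid_data(21, 24))    # -> 87.5 (88%)
    print(data_capture_rate(21, 24, 2))  # -> about 95.5 (95%)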

8.8 Monitoring site metadata

Documented site metadata is necessary to interpret air quality monitoring results. This is because the interpretation of data from any air quality monitoring site needs to take into account the site situation and its implications. It is also useful to have a general description of the site characteristics and any local sources of air contamination.

Recommendation 25 lists the types of information that should be recorded about a monitoring site that can influence air quality and the interpretation of monitoring results. These lists provide the minimum amount of information that should be recorded. Information should be recorded either in hard copy or in a database.

Recommendation 25: Site metadata

Documented site metadata should be used when interpreting air quality monitoring results.
Metadata should be recorded either in hard copy or in a database.

Monitoring site metadata
  • Indicators/contaminants monitored: list all the contaminants that have been or are being monitored at the site
  • Site code: site code specified by the monitoring agency
  • Site title: common name of the site (eg, Taihape)
  • Location: street address of the site
  • Region: for example, Southland region
  • Co-ordinates: New Zealand Mapping Series/Grid reference preferable (latitude and longitude optional)
  • Equipment owner’s name(s): name of the party or parties who own the equipment at the site
  • Land owner’s details: name(s) and contact information of the land owner(s)
  • Equipment housing: for example, shed, lab, air conditioning
  • Housing environment: for example, air conditioning at 25°C
  • Monitoring objectives: for example, to determine population exposure in high-density areas where air quality is suspected to be poor
  • Site topography: for example, hills 1 km to the southwest; high-rise commercial buildings to the north
  • Location and description of major emission sources: the nature, location and distance to predominant sources (eg, roads, factories, domestic fires)
  • Site category: see section 6.2
  • Scale of representation: see section 6.2
  • Site height above sea level
  • Electrician and air conditioner service person contact details
  • Photographs of the site

Meteorological site metadata
  • Meteorological variables measured: for example, wind speed, wind direction and temperature, and the height at which they are measured
  • Meteorological data operator: the person who operates the met station
  • Location of meteorological site: for example, on site up a 6 m mast, or at the neighbouring airport
  • Meteorological data information: where the met data can be obtained (eg, met service, regional council)
  • Regional and local meteorological characteristics: a brief description of met conditions likely to affect air quality at the site (eg, inversions, prevailing wind direction)

Contaminant metadata
  • Contaminant: for example, PM10, NO2
  • Data owner: name of the organisation that owns the data recorded by the equipment
  • Instrument(s): name and other details of the instrument(s) (make, brand, serial number and model); if more than one instrument is used, include all of them in this section
  • Period of operation: dates when the equipment was operated (eg, 12.3.2001 to 8.5.2003, or 11.3.2008 ongoing)
  • Method: details of the standard method followed to operate the equipment (eg, USEPA or AS/NZS standard), including any deviation from the standard method (eg, conditioning and weighing of filters)
  • Data logging: for example, remote via modem, or not used
  • Data storage: describe how the data is stored by the data owner
  • Sampling period: how often the concentration is sampled and measured
  • Sampling probe height: height of the probe above ground
  • Calibration frequency: summarised details of equipment calibration
  • Per cent valid data: amount of data that has passed quality assurance checks (see Recommendation 24), recorded on a yearly basis (eg, 2005 – 95%; 2006 – 98%; 2007 – 96%)

Records (description, time and date) should also be kept of any unusual events that may affect air quality, such as scrub fires, power shortages (resulting in an increase in the use of domestic fires), weather extremes, volcanic eruptions, factory fires, roadworks, firework displays.

8.9 Monitoring units

It is often necessary to convert measurements between various units, and methods for carrying this out are discussed in the next section. However, such conversions depend on the temperature and pressure used. In some applications, the correct specification of an appropriate temperature and pressure is vital, particularly where compliance with guidelines or standards is being assessed. Ambient air quality guidelines and standards are specified at a wide range of temperatures for various historical reasons, and there is little consistency. Some of these are:

  • NES for air quality – not specified

  • AAQG – use 0°C

  • Australian Environmental Protection Measures – use 0°C

  • USEPA – uses 25°C

  • conventional engineering practice (eg, American Society of Heating, Refrigerating and Air-Conditioning Engineers) – uses 0°C

  • stack testing – often referenced at 0°C

  • WHO – uses 0°C

  • Organisation for Economic Co-operation and Development Indicators guidelines – specify 0°C (Ministry for the Environment, 2000).

For the purposes of monitoring and recording air quality data, it is recommended that the following units be used:

  1. Gases may be recorded and archived as ppm or ppb, with conversion to mg/m3 or µg/m3 at 0°C for reporting purposes against standards or guidelines as necessary (see section 8.10 for a discussion of conversion factors).
  2. PM10 should be recorded and archived as µg/m3 at 0°C.

Care should be taken to understand an instrument’s data reporting software protocols and alter these as necessary. For example, monitoring instruments manufactured in the US are likely to have a default correction setting of 25°C if mass concentration units are used. Instrument output in ambient mass concentration units (mg/m3 or µg/m3) is usually calculated using internal pressure and temperature sensors. This can be avoided by the use of volume units (ppm and ppb) for gaseous contaminants where the volume ratio (volume contaminant/volume of air) is constant at all temperatures and pressures.

Recommendation 26: Monitoring units

The recommended units for recording and archiving the monitoring results of gases are parts per million (ppm) or parts per billion (ppb), with conversion to mg/m3 or µg/m3 at 0°C for reporting purposes.

PM10 results should be recorded and archived as µg/m3 at 0°C.

8.10 Conversion factors

Concentrations of air contaminants may be measured by volume or mass. Most analysers measure by volume. Volume measurements, such as parts per billion (ppb) or parts per million (ppm), are independent of temperature and pressure and are the recommended unit for recording and archiving gaseous air contaminant data. Concentrations by mass, such as mg/m3 or µg/m3, refer to the weight of a gas or particulate contaminant in a cubic metre of dry air, and recorded values are dependent on ambient temperature and pressure at the time.

8.10.1 How mass occupies volume

The Ideal Gas equation is written as:

PV = nRT

where:

  P = pressure (kPa)

  V = volume (m3)

  n = number of moles of gas

  R = universal gas constant (8.3144 J/(mol·K))

  T = temperature in kelvin (K)

and:

n = m/Mr

where:

  m = mass of gas (mg or µg – see later)

  Mr = relative molecular mass (g/mol).

This gives:

PV = mRT/Mr

which can be rearranged to:

m/V (mg/m3 or µg/m3) = (P x Mr)/(R x T) x concentration (ppm or ppb respectively).

This allows temperature, molecular mass and pressure to be taken into account. Pressure is usually taken as 101.325 kPa, since normal variations do not markedly change the conversion factor, although pressure may need to be considered at some elevated locations.

8.10.2 Conversion calculations

Mass per unit volume (mg/m3 or µg/m3) is the unit required by the NES for air quality in reporting contaminant concentrations, and by the ambient air quality guidelines for recommended ambient concentrations.

Volume per unit volume (ppbv) and mass per unit mass (ppbm) are both abbreviated to ppb; if not stated, ppbv is usually intended, and most instruments output ppbv. For gaseous contaminants, the conversion between the ppb and µg/m3 (or ppm and mg/m3) units depends on the molecular weight and the temperature of the gas.

1 ppb (vol) of contaminant

= 1 litre of contaminant / 10^9 litres of air

= (1 litre x MW x 10^9 µg/g x 10^-3) / (22.41 x 10^9 litres x (T/273) x 10^-3 m3/litre)

= (MW x 273) / (22.41 x T)

= 0.0409 x MW μg/m3 (at 25°C)

= 0.0416 x MW μg/m3 (at 20°C)

= 0.0423 x MW μg/m3 (at 15°C)

= 0.0431 x MW μg/m3 (at 10°C)

= 0.0446 x MW μg/m3 (at 0°C)

where MW is the molecular weight of the contaminant, 22.41 is the molar volume (in litres per mole) of an ideal gas at 0°C and 101.325 kPa, and T is the temperature of the gas in kelvin (K). This relationship breaks down if there is significant moisture in the air. Also, only the most common isotopes are assumed.

EXAMPLE – 10 ppb O3

  MW = 3 x O = 3 x 16 = 48

  at 0°C multiply by 0.0446 x 48 = 2.14

  therefore 10 ppb O3 = 21.4 μg/m3.

EXAMPLE – 20 ppb SO2

  MW = 1 x S + 2 x O = 32 + 2 x 16 = 64

  at 0°C multiply by 0.0446 x 64 = 2.85

  therefore 20 ppb SO2 = 57 μg/m3.

Note that these examples use rounded-off figures for the molecular weights. In reality these are not integers, since the elements are made up of different isotopes with different atomic weights. In practice, the difference is very small (typically less than 1 per cent) and can be ignored, given that the measurement uncertainties are almost certainly larger than this.
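The conversion translates directly into code; the sketch below reproduces the two worked examples above (the function name is illustrative).

    def ppb_to_ugm3(ppb: float, mol_weight: float, temp_c: float = 0.0) -> float:
        """Convert a gas concentration from ppb (by volume) to ug/m3 at a
        given reference temperature, using the 22.41 L/mol molar volume of
        an ideal gas at 0 deg C and 101.325 kPa."""
        return ppb * mol_weight * 273.0 / (22.41 * (273.0 + temp_c))

    print(ppb_to_ugm3(10, 48))  # 10 ppb O3  -> about 21.4 ug/m3
    print(ppb_to_ugm3(20, 64))  # 20 ppb SO2 -> about 57 ug/m3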

Appendix G shows the conversion factors for various gases and temperatures.


3 Minutes from the Beta Attenuation Monitor Workshop, Hawke’s Bay Regional Council, Napier, 17 March 2008.