When detecting trends, it is ideal to have an estimate of flow to accompany each water quality measurement. This is because many water quality analytes are subject either to dilution or to increases due to overland runoff during high flows. Many of the water quality measurements we obtained for this study were not accompanied by measurements of flow, so we needed to estimate flows at ungauged sites on particular dates. Flow at ungauged sites can be estimated using deterministic models (e.g. TOPNET). However, deterministic models rely on accurate rainfall time-series as model input, and require flow data sets for calibration. No such deterministic flow models are available with national coverage for New Zealand. We therefore devised and tested three empirical methods for estimating flow at an ungauged site on a particular date.

For the first method (‘*EstimatedMedianFlow*’), we attempted to describe the frequency distribution of flows on each day of the year (Julian day) for many sites with relatively long flow records. For each Julian day, the L-moments and the parameters describing a Generalised Extreme Value distribution were calculated from flows observed at each gauge in the NZ Hydrometric Network with five or more years of data (n = 264). Frequency distributions were then generated for each gauge and each Julian day (n = 100), and the median flow was calculated from each of these frequency distributions. For each water quality site of interest, we located the nearest gauging station in Euclidean space that shared the same REC Climate and *Source-of-flow* class. We estimated the flow for each water quality measurement as the median flow estimated for the appropriate Julian day at this nearest gauging station.
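As an illustration only, the fitting step for a single gauge and a single Julian day might be sketched as follows. The sketch assumes the widely used Hosking approximations for estimating Generalised Extreme Value parameters from sample L-moments (with Hosking's sign convention for the shape parameter), and it takes the median directly from the fitted quantile function rather than from a generated frequency distribution. The function names are hypothetical and this is not the code used in the study.

```python
import math


def sample_l_moments(flows):
    """Return (l1, l2, t3): first two sample L-moments and the L-skewness."""
    x = sorted(flows)
    n = len(x)
    if n < 3:
        raise ValueError("need at least three flows")
    # Unbiased probability-weighted moments b0, b1, b2 (i is the 0-based rank).
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    if l2 == 0:
        raise ValueError("flows are all identical; L-skewness is undefined")
    return l1, l2, l3 / l2


def gev_parameters(l1, l2, t3):
    """Approximate GEV location (xi), scale (alpha) and shape (k)."""
    c = 2.0 / (3.0 + t3) - math.log(2) / math.log(3)
    k = 7.8590 * c + 2.9554 * c * c
    if abs(k) < 1e-6:
        # Gumbel limit of the GEV (shape effectively zero).
        alpha = l2 / math.log(2)
        xi = l1 - 0.5772156649 * alpha
        return xi, alpha, 0.0
    g = math.gamma(1 + k)
    alpha = l2 * k / ((1 - 2 ** (-k)) * g)
    xi = l1 - alpha * (1 - g) / k
    return xi, alpha, k


def gev_quantile(p, xi, alpha, k):
    """Flow with non-exceedance probability p under the fitted GEV."""
    y = -math.log(p)
    if k == 0.0:
        return xi - alpha * math.log(y)
    return xi + alpha / k * (1 - y ** k)


def median_flow_for_day(flows):
    """Median of the GEV fitted to one gauge's flows on one Julian day."""
    xi, alpha, k = gev_parameters(*sample_l_moments(flows))
    return gev_quantile(0.5, xi, alpha, k)
```

In use, `median_flow_for_day` would be called once per gauge and Julian day, with `flows` holding that gauge's observed flows on that day of the year across all years of record.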

For the second method (‘*MeanMonthlyFlow*’), we used information on mean flows and seasonal patterns of flow that had previously been estimated for all rivers in New Zealand. The mean flow for each water quality sampling site was taken from the REC (Woods *et al.*, 2006). For each water quality measurement, the estimated mean flow was then multiplied by the proportion of flow in the appropriate REC *Source-of-flow* class for the month of the year in which the water quality sample was taken (Woods, NIWA unpublished data). Under this method, the estimated flow for a given month of the year is the same in every year.
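The calculation amounts to a single multiplication, sketched below. The class label `example_class` and the twelve monthly factors are invented placeholders, not the unpublished NIWA values, and the sketch assumes each factor is expressed as a ratio of monthly flow to mean flow.

```python
# Hypothetical monthly factors (January to December) for one made-up
# Source-of-flow class; real values would come from the seasonal flow data.
MONTHLY_FACTOR = {
    "example_class": [0.7, 0.6, 0.7, 0.9, 1.1, 1.3,
                      1.4, 1.4, 1.2, 1.1, 0.9, 0.7],
}


def mean_monthly_flow(mean_flow, source_of_flow, month):
    """Scale a site's mean flow by the factor for its class and month (1-12)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return mean_flow * MONTHLY_FACTOR[source_of_flow][month - 1]
```

Because the only date information used is the month, two samples taken in July of different years at the same site receive the same estimate.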

For the third method (‘*StandEstimatedFlowOnDate*’), we identified a substitute gauging station for each water quality site and each date on which water quality had been measured. The substitute gauging station was defined as the nearest gauging station in Euclidean space that shared the same REC Climate and *Source-of-flow* class and that also had a record of flow on the date of interest. The flow recorded on the date of interest at the substitute gauging station was standardised by dividing by the mean flow for the entire gauged period. Standardised flows (i.e. recorded flow divided by mean flow) were sufficient for the purpose of flow adjustment because we were interested in the relative changes in flow between water quality measurement occasions, rather than in absolute flow magnitudes.
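The selection and standardisation steps can be sketched as below. The `Gauge` structure, its field names and the function name are assumptions made for illustration; the sketch also assumes coordinates are in a projected system so that Euclidean distance is meaningful.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Gauge:
    """Hypothetical gauging-station record used only for this sketch."""
    name: str
    x: float
    y: float
    climate: str          # REC Climate class
    source_of_flow: str   # REC Source-of-flow class
    flows: dict           # maps datetime.date -> recorded flow

    def mean_flow(self):
        # Mean over the entire gauged period held in `flows`.
        return sum(self.flows.values()) / len(self.flows)


def standardised_flow_on_date(site_xy, climate, source_of_flow, when, gauges):
    """Standardised flow at the substitute gauge, or None if no gauge qualifies."""
    candidates = [
        g for g in gauges
        if g.climate == climate
        and g.source_of_flow == source_of_flow
        and when in g.flows
    ]
    if not candidates:
        return None
    # Nearest qualifying gauge in Euclidean space.
    nearest = min(candidates, key=lambda g: math.dist(site_xy, (g.x, g.y)))
    return nearest.flows[when] / nearest.mean_flow()
```

Note that the substitute gauge can differ from one sampling date to the next at the same site, because a nearer gauge may have no record on some dates.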


Histograms showing R-squared values for three flow methods: ‘*EstimatedMedianFlow*’, ‘*MeanMonthlyFlow*’ and ‘*StandEstimatedFlowOnDate*’ for 260 sites. The ‘*StandEstimatedFlowOnDate*’ method had the greatest percentage of sites with high R-squared values resulting from the regressions.

Approximately 760 sites with water quality data were located on rivers rather than estuaries or lakes. Of these, 260 also had at least two observations of flow. Figure A.1 shows R-squared values for linear regressions of observed flow against estimated flow in log-log space, for each of the three methods used to estimate flow, at each of these 260 sites. Higher values of R-squared indicate better estimation of the observed flows. The ‘*StandEstimatedFlowOnDate*’ method performed better than the other two methods. Many locations had high R-squared values, indicating that the patterns of flows were well estimated. However, there was a wide range of R-squared values across sites for this method. R-squared is a function of both the number of observations and closeness of fit; therefore lower R-squared values were calculated for locations with fewer flow observations and also for locations where the hydrological regime (flow pattern) was poorly estimated.
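For a single site, the R-squared of a simple linear regression in log-log space equals the squared Pearson correlation between the logged observed and logged estimated flows. A minimal sketch of that calculation is given below; the function name is hypothetical, and all flows are assumed to be strictly positive so that logarithms are defined.

```python
import math


def log_log_r_squared(observed, estimated):
    """R-squared of the regression of log(observed) on log(estimated)."""
    if len(observed) != len(estimated) or len(observed) < 2:
        raise ValueError("need two or more paired flows")
    xs = [math.log(v) for v in estimated]
    ys = [math.log(v) for v in observed]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        # No variation in one of the series: R-squared is undefined.
        return float("nan")
    return sxy * sxy / (sxx * syy)
```

Because the regression is fitted to logged values, a method that reproduces the relative pattern of flows scores highly even if its estimates differ from the observations by a constant multiplicative factor, which is consistent with the use of standardised flows.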

The ‘*StandEstimatedFlowOnDate*’ method of flow estimation was purely empirical. No physical processes controlling hydrology other than those associated with REC Climate and *Source-of-flow* class were used in the analysis. This meant that the method took no explicit account of catchment size, altitude, slope or network configuration. To test the three flow estimation methods, we used daily flows measured on the same day as water quality data. There may have been some bias in testing, as water quality sites with measured flows may have been located nearer to flow gauges than water quality sites without measured flows.