7.1 Classification definition decisions
7.1.1 Pilot classifications
The ranking of the candidate environmental variables derived from the validation analyses was used to subjectively select a reduced set of variables for subsequent development of pilot classifications (Snelder et al. 2004). In order to define pilot classifications, we eliminated variables with low rankings because of their relatively poor ability to discriminate biological patterns. This included two variables for which we had major concerns regarding their reliability: sediment type and freshwater fraction. The three bed-shape variables were also eliminated. However, we added slope, reflecting our subjective judgement as to its likely importance in differentiating variation in at least some marine communities. These subjective decisions were tested by the classification tuning process that is described below.
Weatherhead and Snelder (2003) showed that there were particular problems with the inclusion of sediment type in the prototype classification. The sediment data layer is based on the 1:6,000,000 scale regional sediment chart which is low resolution relative to the other variable layers and based on a categorical subdivision. Although we experimented with different methods of including this variable in the classification, we found that the classification was always dominated by sediment patterns when it was included (Weatherhead and Snelder 2003). This dominance was out of proportion to sediment's actual value as a predictor as shown by the validation analyses. We concluded that the resolution of the existing sediment data layer is too low and that, until there is a better source of data, sediment should be excluded from the classification.
Various pilot classifications of the EEZ were developed based on the following eight variables: depth, wintertime SST, mean annual solar radiation, slope, mean orbital velocity, annual amplitude of SST, spatial gradient annual mean SST, and tidal current (Snelder et al. 2004). In addition, Snelder et al. (2004) suggested transformations and weightings of some variables in the definition of the pilot classifications, based on subjective decisions that were guided by inspection of the mapped classifications.
7.1.2 Tuning the classification
Leathwick et al. (2004) used Mantel tests to test the decisions included in the pilot classifications and made some small changes to tune the classification in accordance with the criteria set out for defining the classification. They found that transformations of some variables and weighting of depth improved the classification's correlation with the available biological data. The Mantel tests also justified the inclusion of slope as a measure of bed shape in place of the three bed-shape variables that were included in the validation analyses. In addition, the Mantel tests indicated that, for some biological datasets but not others, small gains in correlation could be achieved by adding some of the variables that had been omitted from the pilot classification. However, the overall benefit (i.e. averaged across all datasets) of adding any of the omitted variables at either scale of analysis was negligible, and the tests provided little evidence that the classification of the EEZ would be improved by adding further variables.
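The logic of a Mantel test and of the change in correlation (delta-r) can be illustrated with a minimal sketch. It is not the procedure of Leathwick et al. (2004): the site depths, the toy biological dissimilarity, the Pearson form of the statistic, the number of permutations and all function names are assumptions made for illustration only.

```python
import numpy as np

def pairwise_abs_diff(values):
    """Square matrix of absolute differences between all pairs of sites."""
    v = np.asarray(values, dtype=float)
    return np.abs(v[:, None] - v[None, :])

def mantel_r(dist_a, dist_b):
    """Pearson correlation between the upper triangles of two distance matrices."""
    iu = np.triu_indices_from(dist_a, k=1)
    return float(np.corrcoef(dist_a[iu], dist_b[iu])[0, 1])

def mantel_test(dist_a, dist_b, n_perm=999, seed=0):
    """Return (r, p), with p estimated by permuting the site order of dist_a."""
    rng = np.random.default_rng(seed)
    observed = mantel_r(dist_a, dist_b)
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(dist_a.shape[0])
        if mantel_r(dist_a[np.ix_(p, p)], dist_b) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical sites: biological turnover is made to follow sqrt(depth).
depth_m = np.array([10.0, 50.0, 200.0, 1000.0, 3000.0])
bio_dist = pairwise_abs_diff(np.sqrt(depth_m))

r_raw, p_raw = mantel_test(pairwise_abs_diff(depth_m), bio_dist)
r_sqrt, p_sqrt = mantel_test(pairwise_abs_diff(np.sqrt(depth_m)), bio_dist)

# delta-r: gain in correlation from transforming depth
delta_r = r_sqrt - r_raw
print(f"r raw = {r_raw:.3f}, r sqrt = {r_sqrt:.3f}, delta-r = {delta_r:.3f}")
```

A positive delta-r in this toy case simply reflects how the biological distances were constructed; with real data the sign and size of delta-r depend on the dataset and scale of analysis.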
The final decisions for the appropriate transformation and weighting of variables were guided by the results of the Mantel tests (see Table 7). However, because there was a lack of biological data in many parts of the environmental domain, the test information was supplemented with inspection of trial classifications in order to make final decisions (Leathwick et al. 2004).
Table 7: The variables, transformations and weightings used to define the EEZ classification
Mean annual solar radiation
Annual amplitude of SST
Spatial gradient annual mean SST
Mean orbital velocity
The most important set of decisions made by the tuning analysis were those concerning depth. Mantel tests examined the change in correlation (delta-r) for transformations and weighting of depth based on the eight-variable pilot classification. The results of the Mantel tests (Figure 13) showed improvements (i.e. positive delta-r) with various transformations of depth. When the results are averaged over all datasets, a fourth root transformation maximised the correlation. However, this relatively severe compression of depth was not eventually used and a more muted square root transformation was chosen. This decision was based primarily on inspection of mapped trial classifications, which indicated that strong compression of depth resulted in the classification having little discrimination of environmental variability over a large part of the domain where depths were greater than 1000 m (Leathwick et al. 2004). We considered that this subjective decision was justifiable given the relative absence of fish data for depths greater than 1500 m, particularly given that this accounted for well over half of the spatial domain.
Figure 13: Mantel test results showing the change in correlation (delta-r) for various transformations of depth for the three biological data sets and at two scales (for the fish and chlorophyll data)
A bar graph shows the effect on delta-r of four transformations of depth: square root, cube root, fourth root and log10. Six data sets are compared: the full EEZ fish set, the sub EEZ fish set, the Full EEZ chlorophyll set, the sub EEZ chlorophyll set, the shelf set and the average of the five previous sets.
Note: For the first four datasets, whiskers show the standard deviation of delta-r values for the geographic sub samples. For the shelf data, whiskers show the 5% and 95% confidence bounds. Note that variability is higher for the sub-EEZ scale results indicating that there are large geographical differences in the effect of the transformations.
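The effect of the different root transformations on deep water can be made concrete with a small calculation. The sketch below assumes a hypothetical maximum depth of 5000 m (not a value from this report) and asks what fraction of the transformed depth range is occupied by depths greater than 1000 m. The function and variable names are illustrative only.

```python
def deep_share(transform, threshold_m=1000.0, max_depth_m=5000.0):
    """Fraction of the transformed 0..max_depth range lying beyond threshold_m."""
    top = transform(max_depth_m)
    return (top - transform(threshold_m)) / (top - transform(0.0))

# Root transformations compared in the Mantel tests (log10 is left out here
# because it is undefined at zero depth and would need an assumed offset).
transforms = {
    "none": lambda d: d,
    "square root": lambda d: d ** 0.5,
    "cube root": lambda d: d ** (1.0 / 3.0),
    "fourth root": lambda d: d ** 0.25,
}

shares = {name: deep_share(f) for name, f in transforms.items()}
for name, share in shares.items():
    print(f"{name:12s} share of range beyond 1000 m: {share:.2f}")
```

Under these assumptions the share of the depth axis available to separate deep environments shrinks steadily from the untransformed case to the fourth root, which is consistent with the loss of discrimination in deep water observed in the mapped trial classifications.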
Mantel tests also provided some confidence for weighting depth. However, the tests showed that correlations reduced with increasing weighting of depth for the chlorophyll dataset at the sub-EEZ scale (Figure 14). This result indicated that the depth weighting detrimentally mutes the other variables that are important correlates with chlorophyll at the sub-EEZ scale. It was therefore decided that a compromise position would be to apply a double weighting of depth. This acknowledges the consistent importance of depth at the whole EEZ and sub-EEZ scales, but seeks to minimise the muting of the spatially patchy variables that are important at scales smaller than the EEZ. The final decisions for definition of the EEZ classification are shown in Table 7.
A bar graph showing the effect on delta-r of two weightings of depth: 2x and 3x. Six data sets are compared: the full EEZ fish set, the sub EEZ fish set, the Full EEZ chlorophyll set, the sub EEZ chlorophyll set, the shelf set and the average of the five previous sets.
Note: Tests were performed for three biological data sets and at two spatial scales (for the fish and chlorophyll data). For the first four datasets, whiskers show the standard deviation of delta-r values for the geographic sub samples. For the shelf data, whiskers show the 5% and 95% confidence bounds.
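The muting effect of weighting can be seen in a two-variable toy example. The sketch assumes a weighted, range-standardised Manhattan (Gower-style) distance; the distance measure, the two variables and the function name are assumptions for illustration and are not taken from this section.

```python
def weighted_distance(a, b, ranges, weights):
    """Weighted mean of range-standardised absolute differences between two sites."""
    total = sum(w * abs(x - y) / r for x, y, r, w in zip(a, b, ranges, weights))
    return total / sum(weights)

# Two hypothetical variables, already scaled so that each has a range of 1:
# (transformed depth, tidal current)
ranges = (1.0, 1.0)
site = (0.0, 0.0)
differs_in_depth = (1.0, 0.0)   # differs from `site` only in depth
differs_in_tidal = (0.0, 1.0)   # differs from `site` only in tidal current

for label, weights in (("equal", (1.0, 1.0)), ("double depth", (2.0, 1.0))):
    d_depth = weighted_distance(site, differs_in_depth, ranges, weights)
    d_tidal = weighted_distance(site, differs_in_tidal, ranges, weights)
    print(f"{label:12s} depth-only: {d_depth:.3f}  tidal-only: {d_tidal:.3f}")
```

Doubling the weight on depth raises the distance attributable to a depth difference and lowers the distance attributable to an equal difference in the other variable, which is the trade-off the double weighting was chosen to balance.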
Figure 15 shows the EEZ classification at four different hierarchical levels and illustrates the subdivision of environmental variation at successive (hierarchical) levels of classification detail. Each class is labelled by a number, which has no specific meaning but is associated with the order in which groups of cells are agglomerated by the clustering procedure. Table 8 shows the average value of each of the variables used to define the EEZ classification at the 20-class level. Inspection of this table indicates that classes are distinct from one another with respect to at least one variable. Table 8 also shows how the classification has differentiated environmental variation at the 2, 4, 9 and 20-class levels. The division at the two-class level occurs between classes 273 and 12 (bold line on Table 8). This level subdivides the relatively coastal environments from the deeper oceanic environments (see Figure 15). Within the oceanic environments, further divisions occur at the four-class level that are associated with differences in the mean annual solar radiation and SST winter (thin solid line on Table 8). These subdivisions approximately define the subtropical shelf and sub-tropical front, and the sub-Antarctic waters.
The nine-class level further subdivides the subtropical waters into deep and abyssal, the shelf and sub-tropical front waters into the deep sub-tropical front, and central continental shelf and southern continental shelf (dotted lines on Table 8). The nine-class level also subdivides the coastal environment into three classes (northern, central and southern continental shelf) that are associated with differences in the mean annual solar radiation and SST winter (dotted lines on Table 8).
Maps of the EEZ modelling environmental classes (differentiated by colour) at four different levels of classification: 2, 4, 9 and 15 classes.
Table 8: Average values for each of the eight defining environmental variables in each class of the 20-class level of the EEZ classification
The 20-class level (Figure 16) further defines variation in the shallow coastal environments. The following environments are discriminated: class 58 - high tidal current, class 60 - middle mid depths, class 64 - middle shallows, class 124 - high wave energy coastlines, class 130 - Marlborough Sounds, class 169 - Southland current, class 190 - Southland front.
The relationships between classes are described in greater detail by the dendrogram in Appendix 1, Figure A1.1. The dendrogram shows how the classes are progressively amalgamated to form a single large group. Note that the class numbers are assigned during the clustering procedure and are derived from the order in which amalgamation of the groups occurs.
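Progressive amalgamation, and the way coarser levels nest inside finer ones, can be sketched with a small agglomerative procedure. This is a stand-in rather than the clustering software used for the classification: average linkage, Manhattan distance, the six toy centroids and the function names are all assumptions.

```python
def manhattan(p, q):
    """Manhattan distance between two centroids."""
    return sum(abs(x - y) for x, y in zip(p, q))

def group_distance(centroids, g, h):
    """Average distance between all members of group g and group h."""
    total = sum(manhattan(centroids[i], centroids[j]) for i in g for j in h)
    return total / (len(g) * len(h))

def agglomerate(centroids, n_classes):
    """Merge the two closest groups repeatedly until n_classes groups remain."""
    groups = [[i] for i in range(len(centroids))]
    while len(groups) > n_classes:
        candidates = [
            (group_distance(centroids, groups[i], groups[j]), i, j)
            for i in range(len(groups))
            for j in range(i + 1, len(groups))
        ]
        _, i, j = min(candidates)
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return [sorted(g) for g in groups]

# Six hypothetical class centroids described by two standardised variables.
toy_centroids = [
    (0.0, 0.0), (0.1, 0.0),   # two similar shallow classes
    (0.5, 0.1), (0.6, 0.1),   # two similar mid-depth classes
    (5.0, 1.0), (5.2, 1.0),   # two similar deep classes
]

level_3 = agglomerate(toy_centroids, 3)
level_2 = agglomerate(toy_centroids, 2)
print("3-class level:", level_3)
print("2-class level:", level_2)
```

Because the merge sequence is the same whichever level is requested, every group at the two-class level is a union of groups from the three-class level, which is the nesting that a dendrogram displays.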
Although the classification is generally used at a set number of classes (e.g. nine classes), the numerical procedure treats variation in a continuous manner. The proximity of any two classes or even grid cells (i.e. locations) can, therefore, be described as an environmental distance. To illustrate this we generated an alternative continuously varying colour scheme to reflect environmental distances. Details of this approach are set out in Snelder et al. (2004) and briefly described below.
A principal components analysis (PCA) of the environmental variables was performed on the mean values of each environmental variable (the cluster centroids) for each of the 290 classes generated by non-hierarchical classification. Each class was assigned varying levels of red, green or blue colour based on the value of the first three principal components of the PCA. Hence each classification group is assigned a colour based on its position in a three-dimensional configuration, so that the closer the proximity of two groups, the more similar their colours will appear. The colour assigned to each PCA axis was chosen to make intuitive sense. For the EEZ classification, blue was assigned to the first PCA axis, which was correlated with the variables depth, tidal current and mean orbital velocity. Thus the bluer areas are deeper, with higher slopes, and lower tidal current and mean orbital velocity. Red was assigned to the second PCA axis, which was correlated with SST winter and annual mean surface solar radiation. Thus the redder areas have higher values of these two variables. Green was assigned to the third PCA axis, which was most correlated with slope and SST gradient. Thus the greener areas are associated with the higher slopes and areas of high SST gradient. Figure 17 shows the resulting map.
A map of the EEZ modelling 20 environmental classes (differentiated by colour).
A map of the EEZ displays a colour scheme based on principal component analysis on the 290 classes generated by non-hierarchical classification. Each class was assigned varying levels of red, green or blue colour based on the value of the first three components in the analysis. Bluer areas are deeper, with higher slopes and lower tidal current and mean orbital velocity. Redder areas have higher SST winter values and annual mean surface solar radiation. Greener areas are associated with higher slopes and higher SST gradients.
Note: Bluer areas are deeper with lower tidal current and mean orbital velocity. Redder areas have higher values of SST winter and annual mean surface solar radiation. Greener areas are associated with the higher slopes and areas of high SST gradient.
Each cell has been coloured according to the mix of blue, red and green associated with the location of its cluster centroid on the first, second and third axes of the PCA. The map shows sharp colour boundaries where environmental characteristics have abrupt changes and shows the continuous nature of variation in environment across the spatial domain.
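The colour assignment described in this section can be sketched as follows. The 290 centroids here are random placeholders rather than the real class means, and the standardisation step, the rescaling of each axis to 0..255 and the function name are assumptions for illustration. The sign of a principal component is arbitrary, so in practice an axis may need to be reversed to give an intuitive colour direction.

```python
import numpy as np

def pca_colours(centroids):
    """Map each class centroid to an (R, G, B) triple from its first three PC scores."""
    x = np.asarray(centroids, dtype=float)
    # Standardise each variable so that no single unit dominates (assumption).
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    scores = z @ vt[:3].T
    # Rescale each component to 0..1 across the classes.
    lo, hi = scores.min(axis=0), scores.max(axis=0)
    scaled = (scores - lo) / (hi - lo)
    # As in the text: first axis -> blue, second -> red, third -> green.
    blue, red, green = scaled[:, 0], scaled[:, 1], scaled[:, 2]
    return np.rint(np.stack([red, green, blue], axis=1) * 255).astype(int)

# Placeholder data: 290 classes by 8 environmental variables.
rng = np.random.default_rng(1)
placeholder_centroids = rng.normal(size=(290, 8))

rgb = pca_colours(placeholder_centroids)
print(rgb[:5])  # colour triples for the first five classes
```

Each grid cell would then take the colour of the class to which it belongs, so that classes lying close together in the space of the first three components receive similar colours.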
7.3 Classification strength of the EEZ classification
A full description of the biological testing of the EEZ classification is contained within Leathwick et al. (2004). Because large parts of the environmental domain were not represented by the biological datasets, not all the classes that are defined at any given level of the classification could be tested. ANOSIM analyses were performed on classes at each level of the classification, provided classes had at least five biological samples. Thus, with the fish dataset 14 classes could be tested at the 20-class level and 20 classes could be tested at the 50-class level. A much larger proportion of the environmental classes had adequate samples when using the chlorophyll a dataset, i.e. 16 groups had adequate biological data at the 20-class level and around 23 at a 50-class classification level. For the 274 benthic invertebrate sites represented in the shelf dataset, 9 and 16 classes had sufficient biological data for testing at the 20- and 50-class levels respectively.
ANOSIM r-values generally increased for all datasets as the classification detail was increased, indicating that lower levels of classification defined more biologically distinctive environments. However, for the fish and chlorophyll a datasets the increase in classification strength was minimal beyond about 20 classes because the number of testable classes began to plateau.
ANOSIM r-values for the fish dataset were significant at p < 0.01 for all levels of the classification up to 50 classes. This indicates that classes that are distinctive with respect to their fish assemblages are defined at all the tested levels of the classification. Indeed, the individual pair-wise comparisons of fish communities at the 20-class level indicated that 73 of the 78 potential contrasts are significantly different in their biological composition (p < 0.01). For the chlorophyll dataset, ANOSIM r-values increased steadily from the five-class level and stabilised at about the 45-class level. All r-values were significant at p < 0.01. Examination of the 105 possible pair-wise comparisons for the 15 classes with available chlorophyll data (at the 20-class level of the classification) indicated that all but 13 are significant at p < 0.05.
For the shelf dataset (274 benthic invertebrate sites), 9 and 16 classes had sufficient biological data for ANOSIM tests at the 20- and 50-class levels respectively. ANOSIM r-values were low at low classification levels, but increased rapidly up to a 20-class level, and more slowly thereafter. The r-values for classification levels with fewer than 15 classes were not statistically significant, i.e. at the 15-class level of classification no significant differences in benthic invertebrate composition were apparent. Although the overall ANOSIM r-value was significant at the 20-class level, a lower classification strength than for the other biological groups was also evident. Examination of the 36 possible pair-wise comparisons for the nine classes with available benthic invertebrate data (at the 20-class level of the classification) indicated that 16 of the possible comparisons were non-significant at a 5% level. Thus, it can be concluded that the strength of the classification, at any given level, is relatively lower for invertebrates than for fish and chlorophyll.