Nonlinear Dimensionality Reduction: Alternative Ordination Approaches for Extracting and Visualizing Biodiversity Patterns in Tropical Montane Forest Vegetation Data

Abstract

Ecological patterns are difficult to extract directly from vegetation data. Vegetation surveys yield a large number of interrelated species occurrence variables. Since species distributions are often determined by only a limited number of ecological gradients, the data could be represented by far fewer, effectively independent variables. This can be achieved by reducing the dimensionality of the data. Conventional methods are either limited to linear feature extraction (e.g., principal component analysis and Classical Multidimensional Scaling, CMDS) or require a priori assumptions about the intrinsic data dimensionality (e.g., Nonmetric Multidimensional Scaling, NMDS, and self-organizing maps, SOM). In this study we explored the potential of Isometric Feature Mapping (Isomap). This new method of dimensionality reduction is a nonlinear generalization of CMDS and is based on a matrix of geodesic inter-point distances. Estimating geodesic distances requires one free threshold parameter, which defines the local linear geometry to be preserved within the global nonlinear distance structure. We compared Isomap with its linear (CMDS) and nonmetric (NMDS) equivalents. Furthermore, the use of geodesic distances also allowed us to extend NMDS to a variant that we called NMDS-G. In addition, we investigated a supervised Isomap variant (S-Isomap) and showed that all these techniques can be interpreted within a single methodological framework. As an example, we investigated seven plots (subdivided into 456 subplots) in different secondary tropical montane forests containing 773 species of vascular plants. A key problem in the study of tropical vegetation data is heterogeneous small-scale variability, which implies large ranges of $β$-diversity. The CMDS and NMDS methods did not adequately reduce the data dimensionality. In contrast, Isomap explained 95% of the data variance in the first five dimensions and provided ecologically interpretable visualizations; NMDS-G yielded similar results. 
The main shortcomings of NMDS-G were its high computational cost and the requirement to predefine the dimension of the embedding space. The S-Isomap learning scheme did not improve on Isomap at the optimal threshold parameter, but it substantially improved the solutions obtained with nonoptimal thresholds. We conclude that Isomap, as a new ordination method, allows effective representations of high-dimensional vegetation data sets. The method is promising since it does not require a priori assumptions and is computationally efficient.
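The abstract describes the Isomap pipeline only in words: a neighbourhood threshold, geodesic distances, then CMDS. The sketch below illustrates those steps with NumPy under several assumptions that the abstract does not state: the free threshold parameter is taken to be a number of nearest neighbours `k` (an ε-radius is the other common choice), geodesic distances are computed as Floyd-Warshall shortest paths through the neighbourhood graph, and "variance explained" is measured as R² between the geodesic distances and the distances in the embedding (residual variance = 1 − R²). All function names and the toy spiral data are hypothetical and are not taken from the paper; for real vegetation data one would pass a precomputed subplot dissimilarity matrix instead of `pairwise_distances(x)`.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance matrix between the rows of x."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)


def cmds(d, n_dims):
    """Classical multidimensional scaling (CMDS) of a distance matrix d."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    b = -0.5 * j @ (d ** 2) @ j                  # double-centred squared distances
    vals, vecs = np.linalg.eigh(b)               # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_dims]      # keep the n_dims largest
    vals = np.clip(vals[order], 0.0, None)       # drop negative (non-Euclidean) parts
    return vecs[:, order] * np.sqrt(vals)


def geodesic_distances(d, k):
    """Shortest-path distances through a k-nearest-neighbour graph.

    Assumption: k plays the role of the free threshold parameter
    (an epsilon-radius neighbourhood would be the alternative).
    """
    n = d.shape[0]
    d_no_self = d + np.diag(np.full(n, np.inf))  # exclude each point from its own neighbours
    nn = np.argsort(d_no_self, axis=1)[:, :k]
    g = np.full((n, n), np.inf)
    np.fill_diagonal(g, 0.0)
    for i in range(n):
        g[i, nn[i]] = d[i, nn[i]]                # keep only local edges
    g = np.minimum(g, g.T)                       # symmetrise the neighbourhood graph
    for m in range(n):                           # Floyd-Warshall all-pairs shortest paths
        g = np.minimum(g, g[:, m:m + 1] + g[m:m + 1, :])
    if np.isinf(g).any():
        raise ValueError("neighbourhood graph is disconnected; increase k")
    return g


def residual_variance(d, y):
    """1 - R^2 between the input distances d and distances in the embedding y."""
    iu = np.triu_indices_from(d, k=1)
    r = np.corrcoef(d[iu], pairwise_distances(y)[iu])[0, 1]
    return 1.0 - r ** 2


def isomap(d, k, n_dims):
    """Isomap sketch: CMDS applied to geodesic instead of direct distances."""
    dg = geodesic_distances(d, k)
    y = cmds(dg, n_dims)
    return y, residual_variance(dg, y)


# Hypothetical toy data: 300 "sites" along a 2-D spiral, i.e. one curved gradient.
t = np.linspace(np.pi, 4 * np.pi, 300)
x = np.column_stack([t * np.cos(t), t * np.sin(t)])
d = pairwise_distances(x)

y_iso, rv_iso = isomap(d, k=6, n_dims=1)
y_lin = cmds(d, n_dims=1)
rv_lin = residual_variance(d, y_lin)
print(f"Isomap (1 dim) residual variance: {rv_iso:.4f}")
print(f"CMDS   (1 dim) residual variance: {rv_lin:.4f}")
```

On the toy spiral, which contains a single latent gradient, the one-dimensional Isomap embedding should leave much less residual variance than one-dimensional CMDS on straight-line distances, mirroring the linear-versus-nonlinear contrast described in the abstract. Under the same assumptions, an NMDS-G-style variant would replace the final `cmds` call with a nonmetric MDS fitted to the geodesic distance matrix.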

Publication
Ecological Informatics
Miguel D. Mahecha
Professor for Earth System Data Science