Data analysis and model-data comparisons in the environmental sciences require diagnostic measures that quantify time series dynamics and structure, and are robust to noise in observational data. This paper investigates the temporal dynamics of environmental time series using measures quantifying their information content and complexity. The measures are used to classify natural processes on one hand, and to compare models with observations on the other. The present analysis focuses on the global carbon cycle as an area of research in which model-data integration and comparisons are key to improving our understanding of natural phenomena. We investigate the dynamics of observed and simulated time series of Gross Primary Productivity (GPP), a key variable in terrestrial ecosystems that quantifies ecosystem carbon uptake. However, the dynamics, patterns and magnitudes of GPP time series, both observed and simulated, vary substantially on different temporal and spatial scales. We demonstrate here that information content and complexity, or Information Theory Quantifiers (ITQ) for short, serve as robust and efficient data-analytical and model benchmarking tools for evaluating the temporal structure and dynamical properties of simulated or observed time series at various spatial scales. At continental scale, we compare GPP time series simulated with two models and an observations-based product. This analysis reveals qualitative differences between model evaluation based on ITQ compared to traditional model performance metrics, indicating that good model performance in terms of absolute or relative error does not imply that the dynamics of the observations is captured well. Furthermore, we show, using an ensemble of site-scale measurements obtained from the FLUXNET archive in the Mediterranean, that model-data or model-model mismatches as indicated by ITQ can be attributed to and interpreted as differences in the temporal structure of the respective ecological time series. At global scale, our understanding of C fluxes relies on the use of consistently applied land models. Here, we use ITQ to evaluate model structure: The measures are largely insensitive to climatic scenarios, land use and atmospheric gas concentrations used to drive them, but clearly separate the structure of 13 different land models taken from the CMIP5 archive and an observations-based product. In conclusion, diagnostic measures of this kind provide dataanalytical tools that distinguish different types of natural processes based solely on their dynamics, and are thus highly suitable for environmental science applications such as model structural diagnostics. © 2016 Sippel et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.