Climate extremes can trigger exceptional responses in terrestrial ecosystems, for instance by altering growth or mortality rates. Such effects are often manifested in reductions in net primary productivity (NPP). Investigating a Europe-wide network of annual radial tree growth records confirms this pattern: we find that 28% of tree ring width (TRW) indices are below two standard deviations in years in which extremely low precipitation, high temperatures or the combination of both noticeably affect tree growth. Based on these findings, we investigate possibilities for detecting climate-driven patterns in long-term TRW data to evaluate state-of-the-art dynamic vegetation models such as the Lund-Potsdam-Jena dynamic global vegetation model for managed land (LPJmL). The major problem in this context is that LPJmL simulates NPP but does not explicitly simulate radial tree growth, so a generic method is needed to compare simulated and observed response patterns. We propose an analysis scheme that quantifies the coincidence rate of climate extremes with a biotic response (here TRW or simulated NPP). We find a relative reduction of 34% in simulated NPP during precipitation, temperature and combined extremes. This reduction is comparable to the TRW response patterns, but the model responds much more sensitively to drought stress. We identify 10 extreme years in the 20th century in which both model and measurements indicate high coincidence rates across Europe. However, we detect substantial regional differences in simulated and observed responses to climatic extreme events. One explanation for this discrepancy could be the tendency of tree ring data to originate from climatically stressed sites. The difference between model and observed data is amplified by the fact that dynamic vegetation models are designed to simulate mean ecosystem responses on landscape or regional scales.
We find that both simulation results and measurements display carry-over effects from climate anomalies during the previous year. We conclude that radial tree growth chronologies provide a suitable basis for generic model benchmarks. The broad application of coincidence analysis in generic model benchmarks along with an increased availability of representative long-term measurements and improved process-based models will refine projections of the long-term carbon balance in terrestrial ecosystems. © Author(s) 2015.
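The coincidence rate described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that extremes are flagged by a z-score threshold on each annual series and that the rate is the fraction of climate-extreme years matched by a response extreme, optionally shifted by a lag to probe carry-over from the previous year. The function names, the threshold value and the synthetic series are all hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a series to zero mean and unit (population) standard deviation."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("series has zero variance")
    return [(v - m) / s for v in values]


def extreme_years(years, values, threshold, below=True):
    """Return the set of years whose z-score is beyond `threshold`.

    below=True flags low extremes (e.g. low precipitation, low TRW or NPP);
    below=False flags high extremes (e.g. high temperature).
    """
    z = zscores(values)
    if below:
        return {y for y, zi in zip(years, z) if zi <= threshold}
    return {y for y, zi in zip(years, z) if zi >= threshold}


def coincidence_rate(climate_events, response_events, lag=0):
    """Fraction of climate-extreme years y for which y + lag is a response extreme.

    lag=0 tests same-year coincidence; lag=1 tests a response in the year
    after the climate extreme (a simple stand-in for carry-over effects).
    """
    if not climate_events:
        return float("nan")
    hits = sum(1 for y in climate_events if (y + lag) in response_events)
    return hits / len(climate_events)


# Synthetic anomalies for illustration only (not data from the study).
years = list(range(1901, 1911))
precip = [0, 0, 0, -10, 0, 0, 0, -10, 0, 0]    # dry years: 1904, 1908
response = [0, 0, 0, -10, 0, 0, 0, 0, -10, 0]  # low growth: 1904, 1909

# The threshold of -1.5 is an assumption chosen for this short series.
dry = extreme_years(years, precip, threshold=-1.5)
low_growth = extreme_years(years, response, threshold=-1.5)

same_year = coincidence_rate(dry, low_growth, lag=0)
next_year = coincidence_rate(dry, low_growth, lag=1)
```

Because the function only compares sets of event years, the same call applies unchanged to observed TRW indices and to simulated NPP, which is what makes this kind of comparison generic across variables that are not directly commensurable.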