Accurate information on the spatial distribution of plant species and communities is in high demand for various fields of application, such as nature conservation, forestry, and agriculture. A series of studies has shown that Convolutional Neural Networks (CNNs) accurately predict plant species and communities in high-resolution remote sensing data, in particular with data at the centimeter scale acquired with Unoccupied Aerial Vehicles (UAV). However, such tasks often require ample training data, which is commonly generated in the field via geocoded in-situ observations or labeling remote sensing data through visual interpretation. Both approaches are laborious and can present a critical bottleneck for CNN applications. An alternative source of training data is given by using knowledge on the appearance of plants in the form of plant photographs from citizen science projects such as the iNaturalist database. Such crowd-sourced plant photographs typically exhibit very different perspectives and great heterogeneity in various aspects, yet the sheer volume of data could reveal great potential for application to bird’s eye views from remote sensing platforms. Here, we explore the potential of transfer learning from such a crowd-sourced data treasure to the remote sensing context. Therefore, we investigate firstly, if we can use crowd-sourced plant photographs for CNN training and subsequent mapping of plant species in high-resolution remote sensing imagery. Secondly, we test if the predictive performance can be increased by a priori selecting photographs that share a more similar perspective to the remote sensing data. We used two case studies to test our proposed approach with multiple RGB orthoimages acquired from UAV with the target plant species Fallopia japonica and Portulacaria afra respectively. Our results demonstrate that CNN models trained with heterogeneous, crowd-sourced plant photographs can indeed predict the target species in UAV orthoimages with surprising accuracy. Filtering the crowd-sourced photographs used for training by acquisition properties increased the predictive performance. This study demonstrates that citizen science data can effectively anticipate a common bottleneck for vegetation assessments and provides an example on how we can effectively harness the ever-increasing availability of crowd-sourced and big data for remote sensing applications.