Land use and land cover (LULC) mapping is a powerful tool for monitoring large areas. For the Amazon rainforest, automated mapping is critically important because land cover is changing rapidly due to forest degradation and deforestation. Several research groups have addressed this challenge by conducting local surveys and producing maps from freely available remote sensing data. However, automating large-scale land cover mapping remains one of the biggest challenges in the remote sensing community. One obstacle to supervised learning is the scarcity of labeled training data. This can be mitigated by using existing maps produced with (semi-)automated classifiers as labels, an approach known as weakly supervised learning. The present study aims to develop novel methods for automated LULC classification in the cloud-prone Amazon basin (Brazil) based on labels from the MapBiomas project, which comprise twelve classes. We investigate different fusion techniques for multi-spectral Sentinel-2 data and synthetic aperture radar Sentinel-1 time series from 2018. The newly designed deep learning architectures, DeepForest-1 and DeepForest-2, exploit spatiotemporal characteristics as well as multi-scale representations of the data. In several data scenarios, the models are compared to state-of-the-art (SotA) models such as U-Net and DeepLab. The proposed networks reach an overall accuracy of up to 75.0%, similar to the SotA models, but outperform them on underrepresented classes. Forest, savanna and crop were mapped best, with F1 scores of up to 85.0% when combining multi-modal data, compared to 81.6% reached by DeepLab. Furthermore, a qualitative analysis shows that the classifiers sometimes outperform the inaccurate labels.
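The distinction between overall accuracy and per-class F1 matters when classes are imbalanced: two models can reach similar overall accuracy while differing strongly on rare classes. The following minimal sketch illustrates this with a confusion matrix. The three class names and all counts are hypothetical and chosen only for illustration; they are not results from this study, and the helper functions `overall_accuracy` and `per_class_f1` are not part of the described pipeline.

```python
def overall_accuracy(cm):
    """Fraction of all samples on the diagonal (rows: true, columns: predicted)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_f1(cm):
    """F1 = 2*TP / (2*TP + FP + FN), computed separately for each class."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # true class k, predicted otherwise
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, true class differs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Hypothetical counts: two frequent classes and one underrepresented class.
classes = ["forest", "savanna", "rare class"]
cm = [
    [90, 5, 5],    # true forest
    [10, 80, 10],  # true savanna
    [4, 4, 2],     # true rare class (only 10 samples)
]

oa = overall_accuracy(cm)
f1 = per_class_f1(cm)
print(f"overall accuracy: {oa:.3f}")
for name, score in zip(classes, f1):
    print(f"F1 {name}: {score:.3f}")
```

In this toy example the overall accuracy is dominated by the two frequent classes, while the F1 score of the rare class stays low. This is why per-class F1 is reported alongside overall accuracy when assessing underrepresented classes.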