Accurate energy prediction models play a significant role in achieving sustainability in smart cities. However, stakeholders such as municipalities must create individual energy forecasting models for entire building fleets, which increases the computational resources and time needed to prepare each model. This research proposes a method that uses hierarchical clustering with dynamic time warping (DTW) to group similar buildings according to their consumption values and integrates transfer learning (TL) to share model weights from a source building with other target buildings. Several TL models using different portions of the target data were tested against a standard workflow without TL for predicting electricity and district heating consumption for several school buildings using a multivariate LSTM model. The performance metrics show minor differences between the TL and standard models. Results indicate that using 20% to 40% of the target data is sufficient for training. The models achieved average RMSE improvements of 20% and 5% for district heating and electricity, respectively, indicating a potential for reduced data requirements without sacrificing predictive accuracy and demonstrating TL’s efficiency in streamlining the energy forecasting process for building fleets.
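
To make the grouping step concrete, the following is a minimal sketch of DTW-based hierarchical clustering in plain Python. It is an illustration under stated assumptions, not the implementation used in this study: the building names and consumption profiles are hypothetical toy data, the function names `dtw_distance` and `cluster_buildings` are introduced here for illustration, and average linkage is assumed because the linkage criterion is not specified above.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Dynamic time warping distance with an absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # repeat b[j-1]
                cost[i][j - 1],      # repeat a[i-1]
                cost[i - 1][j - 1],  # advance both series
            )
    return cost[n][m]


def cluster_buildings(profiles, n_clusters):
    """Agglomerative clustering (average linkage, assumed) on pairwise DTW distances."""
    names = list(profiles)
    dist = {
        frozenset((p, q)): dtw_distance(profiles[p], profiles[q])
        for p, q in combinations(names, 2)
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Mean DTW distance over all cross-cluster pairs
        total = sum(dist[frozenset((p, q))] for p in c1 for q in c2)
        return total / (len(c1) * len(c2))

    # Repeatedly merge the two closest clusters until n_clusters remain
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: combinations guarantees i < j
    return [sorted(c) for c in clusters]


# Hypothetical consumption profiles (toy values, not data from the study)
profiles = {
    "school_a": [1, 2, 8, 9, 8, 2, 1],
    "school_b": [1, 1, 2, 8, 9, 8, 2],  # same shape as school_a, shifted in time
    "school_c": [5, 5, 5, 5, 5, 5, 5],
    "school_d": [5, 6, 5, 5, 6, 5, 5],
}
groups = cluster_buildings(profiles, n_clusters=2)
print(groups)
```

In this toy example the time-shifted profiles of `school_a` and `school_b` fall into the same group because DTW aligns their peaks before comparing them, which a point-by-point distance would not do. In a TL workflow of the kind described above, one building per group would then serve as the source model for the remaining buildings in that group.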