Advanced Data Analytics Modelling for Air Quality Assessment
2023 (English) Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]
Air quality assessment plays a crucial role in understanding the impact of air pollution on human health and the environment. With the increasing demand for accurate assessment and prediction of air quality, advanced data analytics modelling techniques offer promising solutions. This thesis focuses on leveraging advanced data analytics to assess and analyse air pollution concentration levels in Italy at a 4 km resolution, using the FORAIR_IT dataset simulated at ENEA on the CRESCO6 infrastructure. The aim is to uncover valuable insights and to identify the most appropriate AI models for predicting air pollution levels. The data collection, understanding, and pre-processing procedures are discussed, followed by the application of big data training and forecasting using Apache Spark MLlib. The research also encompasses several phases: descriptive and inferential analysis to understand the air pollution concentration dataset; hypothesis testing to examine the relationship between various pollutants; machine learning prediction using several regression models and an ensemble machine learning approach; and time series analysis on the entire dataset as well as on three major regions in Italy (Lombardy in Northern Italy, Lazio in Central Italy and Campania in Southern Italy). The computation time of these regression models is also evaluated, and a comparative analysis of the results is carried out. The evaluation process and the experimental setup use the ENEAGRID/CRESCO6 HPC infrastructure and Apache Spark. This research provides valuable insights into air pollution patterns and into improving prediction accuracy. The findings of this study have the potential to drive positive change in environmental management and decision-making processes, ultimately leading to healthier and more sustainable communities.
As we continue to explore the vast possibilities offered by advanced data analytics, this research serves as a foundation for future advancements in air quality assessment in Italy. The models are transferable to other regions and provinces in Italy, paving the way for a cleaner and greener future.
Place, publisher, year, edition, pages
2023, p. 156
Keywords [en]
Air quality assessment, Advanced Data Analytics, Artificial Intelligence (AI), Machine Learning (ML), Big Data, Regression Models, Time Series Models, High Performance Computing (HPC), Air Pollution
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:ltu:diva-101490
OAI: oai:DiVA.org:ltu-101490
DiVA, id: diva2:1801320
External cooperation
ENEA Casaccia Research Center, Italy; Leeds Beckett University, United Kingdom
Subject / course
Student thesis, at least 30 credits
Educational program
Master Programme in Green Networking and Cloud Computing
Presentation
2023-06-14, Municipality City Hall, Anacapri, Italy, Anacapri, 15:00 (English)
2023-10-06; 2023-09-29; 2023-10-09. Bibliographically approved