Abstract:
This study focused on predicting rainfall in different climatic zones of South Africa using four machine learning models: linear regression, random forest, support vector machine, and ridge and lasso regression. South Africa was divided into nine climatic zones using the Köppen-Geiger climate classification system, and three cities were selected for each zone. Atmospheric datasets from the South African Weather Service (1991 to 2023) and the National Aeronautics and Space Administration (1983 to 2023) were used. The four models were trained and tested on these datasets, and the resulting monthly rainfall predictions were compared with the observed data to validate the accuracy of the models. The mean absolute error, mean square error, root mean square error, correlation coefficient, and coefficient of determination were used to assess the accuracy of each model. The best-performing model for almost all climatic zones was the support vector machine, followed by random forest. Linear regression and ridge and lasso regression also performed well in various regions. It was, however, difficult to predict rainfall accurately in the warm and dry summer climatic zone, which was attributed to the unpredictable atmospheric variability in this region. In addition, the models performed worse in regions with little rainfall than in climatic zones with rainfall above 5 mm. The study also showed that atmospheric parameters such as dew point, cloud cover, and water vapour are the most essential for good predictive performance. Using the random forest model, monthly rainfall for 2024 was predicted and compared with the 2022 and 2023 rainfall.
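As an illustration of the five evaluation metrics named above, the following minimal sketch computes them for one pair of observed and predicted monthly rainfall series. The function name `evaluate` and the sample values are hypothetical and are not taken from the study's data. The coefficient of determination is computed here as one minus the ratio of residual to total sum of squares, which is a common definition but may differ from the one used in the study.

```python
import math

def evaluate(actual, predicted):
    """Return MAE, MSE, RMSE, Pearson r and R^2 for two equal-length series."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean square error
    rmse = math.sqrt(mse)                   # root mean square error

    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    ss_a = sum((a - mean_a) ** 2 for a in actual)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)

    r = cov / math.sqrt(ss_a * ss_p)        # Pearson correlation coefficient
    # Assumed definition: R^2 = 1 - SS_res / SS_tot
    r2 = 1.0 - sum(e * e for e in errors) / ss_a

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "r": r, "R2": r2}

# Hypothetical monthly rainfall totals in mm (illustrative only, not study data)
observed = [10.0, 20.0, 30.0, 40.0]
modelled = [12.0, 18.0, 33.0, 37.0]
scores = evaluate(observed, modelled)
```

Lower MAE, MSE, and RMSE and higher correlation and coefficient of determination indicate closer agreement between a model's predictions and the observations.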