Forecasting Harmful Algal Blooms for Western Lake Erie Using Data Driven Machine Learning Techniques
Date of Award
Master of Science in Civil Engineering
Washkewicz College of Engineering
Kim, Ung Tae
Harmful algal blooms (HABs) have been documented around the world for more than a century. Western Lake Erie has suffered from cyanobacteria blooms for many decades. There are currently two widely available HAB forecasting models for Lake Erie. The first gives a yearly peak bloom forecast, while the second provides weekly short-term forecasts that include bloom size and location. This study focuses on bridging the gap between these two models and improving HAB forecast accuracy in western Lake Erie by letting historical observations describe the behavior of HABs. It tests two machine learning techniques, artificial neural network (ANN) and classification and regression tree (CART), to forecast monthly HAB indicators in western Lake Erie for July to October. ANN and CART models were created with two methods of selecting input variables and two training periods: 2002 to 2011 and 2002 to 2013. The first selection method is a nutrient loading period method, which considers all nutrient-contributing variables averaged from March to June. The second uses Spearman rank correlation to choose a separate input set for each month, considering 224 different combinations of averaging and lag periods. When the training period was extended, the ANN models' correlation coefficient increased from 0.70 to 0.77 for the loading method and from 0.79 to 0.83 for the Spearman method. The CART models followed a similar trend, with overall precision increasing from 85.5% to 92.9% for the loading method and from 82.1% to 91% for the Spearman method. Both selection methods produced similar variable importance, with river discharge and phosphorus mass showing high importance across all methods. The major limitation of the ANN is the time required to complete each forecast, while CART, which forecasts earlier, can produce only a class forecast. In future work, the ANN model's accuracy can be improved and new sets of variables can be used to allow earlier HAB forecasts.
The final forms of the ANN and CART models will be coded into a user interface system to forecast HABs. The monthly forecasting system developed here allows watershed planners and decision-makers to manage HABs in western Lake Erie in a timely manner.
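The Spearman-based input selection described in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the function names, the synthetic data, and the particular window and lag lists are all hypothetical, and the abstract does not say how the 224 combinations were enumerated. The sketch only shows the general idea of averaging a predictor over a window that ends some months before the target month, then ranking the (window, lag) pairs by the absolute Spearman correlation with the bloom indicator.

```python
from itertools import product


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def windowed_mean(monthly, year, end_month, window):
    """Mean of `window` consecutive months ending at `end_month` of `year`."""
    months = range(end_month - window + 1, end_month + 1)
    return sum(monthly[(year, m)] for m in months) / window


def best_window(monthly, target, target_month, windows, lags):
    """Scan (window, lag) pairs and return (window, lag, rho) with largest |rho|."""
    years = sorted(target)
    y = [target[yr] for yr in years]
    best = None
    for window, lag in product(windows, lags):
        end_month = target_month - lag
        if end_month - window + 1 < 1:
            continue  # window would reach into the previous year; skipped in this sketch
        x = [windowed_mean(monthly, yr, end_month, window) for yr in years]
        rho = spearman(x, y)
        if best is None or abs(rho) > abs(best[2]):
            best = (window, lag, rho)
    return best


# Synthetic, made-up data (not from the thesis): one monthly predictor for
# 2002 to 2011 and a July indicator built from its April-May mean.
years = range(2002, 2012)
monthly = {(yr, m): float((yr * 7 + m * 13) % 11) for yr in years for m in range(1, 13)}
target = {yr: windowed_mean(monthly, yr, 5, 2) for yr in years}

result = best_window(monthly, target, target_month=7, windows=[1, 2, 3], lags=[1, 2, 3])
print(result)
```

In practice the same scan would be repeated for every candidate predictor and every forecast month (July to October), which is how a separate input set per month could be assembled.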
Reinoso, Nicholas L., "Forecasting Harmful Algal Blooms for Western Lake Erie Using Data Driven Machine Learning Techniques" (2017). ETD Archive. 987.