Date of Award


Degree Type


Degree Name

Master of Science in Civil Engineering


Washkewicz College of Engineering

First Advisor

Kim, Ungtae

Subject Headings

Civil Engineering


The Great Lakes are among the most important freshwater bodies, providing water resources and supporting a range of related businesses in the northeastern part of North America. However, harmful algal blooms (HABs) have become more frequent and severe in these lakes, threatening lake environments and economies. Researchers have studied the factors influencing HAB characteristics using different scientific methods. In this study, all candidate predictor and predictand variables were collected from various data sources, and eight final predictors and one predictand were then selected based on the correlation between them. This study tests two machine learning techniques, Stepwise Multiple Regression (SMR) and Genetic Programming (GP), to forecast monthly HAB indicators in Western Lake Erie from July to October. SMR and GP models were built with the selected input variables for two training periods, 2002 to 2011 and 2002 to 2014. The Spearman rank correlation coefficient was used to choose input variable sets for each HAB month, considering 224 different combinations of lag time and averaging period. The SMR models showed an increase in correlation coefficient from 0.71 to 0.78 when the training period was extended. The GP models followed a similar trend, with the overall correlation coefficient increasing from 0.82 to 0.96. Both models selected monthly discharge and phosphorus mass from the Maumee River Basin as significant predictor variables. A major drawback of both models was their data dependency, which is common in data-driven methods. GP detected the highly nonlinear HAB mechanism better than SMR because it can draw on many mathematical functions, whereas SMR uses only a linear combination of variables. This study showed that both SMR and GP can be useful for simulating historical HAB events and predicting future HAB severity.
In future work, to avoid under- or over-prediction of HAB mechanisms that were not observed during a short training period, it is suggested to develop an extrapolation technique that is statistically sound and operable within the model, and to test multi-model ensemble approaches to provide the most likely HAB prediction.
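
The input-selection step summarized above, which screens combinations of lag time and averaging period by Spearman rank correlation, can be illustrated with a minimal sketch. This is not the thesis code. The function names (`average_ranks`, `spearman`, `lagged_mean`, `screen`), the data layout (one monthly predictor series plus one predictand value per year), and the choice of ranking by absolute correlation are all assumptions made for illustration.

```python
from itertools import product


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no rank information
    return cov / (sx * sy)


def lagged_mean(monthly, target_index, lag, window):
    """Mean of `window` months ending `lag` months before `target_index`."""
    end = target_index - lag          # index of the last month included
    start = end - window + 1          # index of the first month included
    if start < 0:
        return None                   # not enough history for this combination
    return sum(monthly[start:end + 1]) / window


def screen(monthly, target_indices, predictand, lags, windows):
    """Rank (lag, window) pairs by absolute Spearman correlation, best first.

    Hypothetical helper: `target_indices` gives the position of the HAB month
    in `monthly` for each year, and `predictand` the HAB indicator that year.
    """
    results = []
    for lag, window in product(lags, windows):
        features = [lagged_mean(monthly, t, lag, window) for t in target_indices]
        if any(f is None for f in features):
            continue
        results.append((abs(spearman(features, predictand)), lag, window))
    return sorted(results, reverse=True)
```

Calling `screen(discharge, hab_month_indices, hab_index, range(0, 8), range(1, 5))` on a hypothetical monthly discharge series would return every usable lag and averaging-period pair with its absolute rank correlation, strongest first. The ranges shown here are placeholders and are not the 224 combinations used in the study.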