Forecasting Stadium Attendance Using Machine Learning Models: A Case of the National Football League
Vol. 18, No. 2 (2024)
The study examines the use of machine learning models to forecast attendance at sports stadiums, analyzing 5,055 National Football League (NFL) regular-season games played from 2000 to 2019. The models (Linear Regression, Classification and Regression Trees (CART), Random Forest, CatBoost, and XGBoost) integrate a diverse set of variables covering team performance, economic indicators, stadium characteristics, and weather conditions. Each model's accuracy and effectiveness are assessed using five statistical metrics. With a Mean Absolute Error (MAE) of 0.02 and a Root Mean Squared Error (RMSE) of 0.04, the models predict stadium attendance with high precision, and the coefficient of determination (R²) reaches 77.27% after optimization. These figures suggest that the models, particularly Random Forest and CatBoost, are effective at forecasting attendance rates for NFL games. The most significant predictors of attendance include 'stadium_name', 'personal_income', 'stadium_age', and 'home_club_age'. This study helps fill a theoretical gap left by the limited research on the NFL and offers insights for strategic planning and decision-making in professional sports management.
machine learning; stadium attendance forecast; Random Forest; CatBoost; XGBoost
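To make the error metrics named in the abstract concrete, here is a minimal sketch of MAE, RMSE, and R² in plain Python. It assumes the target is an attendance rate (attendance divided by stadium capacity, so values near 1.0 mean a near-sellout); that definition, the function names, and the sample numbers are illustrative assumptions, not the study's data or code. The abstract does not name the other two of its five metrics, so they are not shown.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # Both sequences must be non-empty and aligned game by game.
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error: average absolute size of the prediction errors."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Squared Error: squares errors, so large misses weigh more."""
    _check(y_true, y_pred)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the actual values are constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical attendance rates (attendance / capacity) for four games.
actual = [0.95, 1.00, 0.88, 0.97]
predicted = [0.93, 0.99, 0.91, 0.96]

print(f"MAE  = {mae(actual, predicted):.4f}")
print(f"RMSE = {rmse(actual, predicted):.4f}")
print(f"R^2  = {r_squared(actual, predicted):.2%}")
```

Because RMSE squares each error before averaging, it is never smaller than MAE, and the gap between the two widens when a few predictions miss by a large margin. R² is reported as a percentage here only to match the way the abstract states it.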

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright © 2024 Yu Pang, Fengchen Wang