Creating a Systematic ESG Scoring System: Discussion

15 Jun 2024


(1) Aarav Patel, Amity Regional High School;

(2) Peter Gloor, Center for Collective Intelligence, Massachusetts Institute of Technology (corresponding author).

6. Discussion

The Random Forest Regression model likely performed best because it combines the predictions of multiple decision trees. Averaging across trees reduces the overfitting that any single tree is prone to, which improves accuracy. The model achieved a statistically significant R² correlation of 26.1% (p-value < 0.05) and a low MAAE of 13.4%. These results align with similar work done using other sources of data. For example, Krappel et al. (2021) created an ESG prediction system by feeding fundamental data (i.e., financial data and general information about the company) into ensemble machine-learning algorithms. Their most accurate model achieved an R² correlation of 54% and a MAAE of 11.3%. While the proposed algorithm does not correlate as well as Krappel et al.’s model, likely because it relies on qualitative data, it still highlights the viability of using social sentiment as a proxy for ESG.
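As a rough illustration of how the two reported metrics could be computed, the sketch below assumes that the R² correlation is the squared Pearson correlation between predicted and actual ratings, and that the MAAE behaves like a mean absolute error expressed as a percentage of a 0 to 100 rating scale. Both are assumptions made for illustration, since this section does not define the metrics, and the function names and ratings are hypothetical.

```python
def r_squared(actual, predicted):
    """Squared Pearson correlation between actual and predicted scores."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return (cov * cov) / (var_a * var_p)


def mean_absolute_error_pct(actual, predicted, scale=100.0):
    """Mean absolute error as a percentage of the rating scale (assumed 0-100)."""
    n = len(actual)
    total_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    return 100.0 * total_error / (n * scale)


# Hypothetical ratings, not data from the paper.
actual_ratings = [60, 45, 72, 30, 55]
predicted_ratings = [58, 50, 65, 41, 52]
print(r_squared(actual_ratings, predicted_ratings))
print(mean_absolute_error_pct(actual_ratings, predicted_ratings))
```

Under these assumptions, a higher R² means predictions track the direction of the reference ratings more closely, while a lower error percentage means they land closer in absolute terms.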

The proposed algorithm displayed encouraging results, highlighting its viability in ESG rating prediction. Unlike current ESG raters, who determine ESG scores using self-disclosed sustainability reports, the proposed algorithm’s data-driven approach allows for a more holistic and balanced evaluation. Utilizing social sentiment also allows executives to measure which areas people want a company to improve on, helping to focus their actions for change. Additionally, the system’s architecture allows scores to be updated within short timeframes. Finally, executives can test additional keywords by inputting them into the algorithm. These attributes showcase the system’s flexibility as well as its advantages over the conventional methodology.

A limitation of the results, however, is that the algorithm was tested only on S&P 500 companies. Therefore, the results might not carry over to smaller companies outside this index. Another limitation could be misinformation within the social network data. While this should be diluted by other comments, it can potentially alter the algorithm’s ratings. Additionally, the Flair sentiment analysis algorithm sometimes misclassified the sentiment of a post or article, especially if it had a sarcastic tone. Finally, access to certain paid native APIs was not available for this research. As a result, rate limiting might have prevented the collected data from encompassing all data available for a keyword.

While the algorithm has displayed statistically significant results, there is room for improvement in future research. One improvement is gathering more data. This can be done by analyzing companies beyond the S&P 500 or by collecting data for more keywords and ESG sub-topics. It can also be done by using native APIs to collect more datapoints per keyword. Additionally, more data sources could be incorporated into the model, either by adding other social networks (e.g., Reddit, Glassdoor) or by including quantitative data and statistics (e.g., percentage of women on the board, Scope 1 carbon emissions) from company reports and government databases.

Furthermore, to better fit the task at hand, NLP algorithms can be created specifically for ESG. For instance, while the current method filters out much of the irrelevant data, some unrelated data still gets through. To solve this, a new supervised learning algorithm can be trained to identify related bodies of text using TF-IDF vectorization. The algorithm can be trained by hand-labeling the data that has already been collected. In addition, the long-article and short-post NLP algorithms can be further optimized. While Flair already provides satisfactory results, some articles seem to be misclassified, which might be a source of error for the algorithm. A sentiment analysis algorithm tailored specifically to ESG classification could further improve the accuracy of both the long-article and short-post NLP algorithms. This can be done either by creating a custom ESG lexicon with weights or by training a novel NLP algorithm on classified ESG data.
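The relevance filter described above could be prototyped along the following lines. This is a minimal sketch and not the authors’ implementation: it assumes a nearest-centroid classifier over TF-IDF vectors as the supervised learner, and the class name, labels, and training sentences are all hypothetical stand-ins for hand-labeled data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfCentroidClassifier:
    """Hypothetical relevance filter: TF-IDF vectors plus nearest centroid."""

    def fit(self, texts, labels):
        docs = [tokenize(t) for t in texts]
        n_docs = len(docs)
        # Document frequency: number of documents containing each term.
        doc_freq = Counter(term for doc in docs for term in set(doc))
        # Smoothed inverse document frequency.
        self.idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0
                    for t, df in doc_freq.items()}
        sums = {}
        counts = Counter(labels)
        for doc, label in zip(docs, labels):
            centroid = sums.setdefault(label, Counter())
            for term, weight in self._vectorize(doc).items():
                centroid[term] += weight
        # Each centroid is the mean TF-IDF vector of its class.
        self.centroids = {label: {t: w / counts[label] for t, w in vec.items()}
                          for label, vec in sums.items()}
        return self

    def _vectorize(self, tokens):
        """L2-normalized TF-IDF vector; terms unseen in training are dropped."""
        tf = Counter(t for t in tokens if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def predict(self, text):
        vec = self._vectorize(tokenize(text))

        def cosine(centroid):
            dot = sum(w * centroid.get(t, 0.0) for t, w in vec.items())
            norm = math.sqrt(sum(w * w for w in centroid.values()))
            return dot / norm if norm else 0.0

        # Pick the class whose centroid is most similar to the text.
        return max(self.centroids, key=lambda lbl: cosine(self.centroids[lbl]))


# Hypothetical hand-labeled examples, not data from the paper.
train_texts = [
    "Company cuts carbon emissions and invests in renewable energy",
    "Workers report unsafe conditions and low wages at the factory",
    "Board adds independent directors to improve governance",
    "Company launches new phone with a bigger screen",
    "Quarterly sales beat expectations as the stock rallies",
    "New store opens downtown with a weekend discount",
]
train_labels = ["relevant"] * 3 + ["irrelevant"] * 3

model = TfidfCentroidClassifier().fit(train_texts, train_labels)
print(model.predict("Factory emissions rise despite renewable energy pledge"))
```

A nearest-centroid rule is used here only because it keeps the sketch short and dependency-free. In practice, the same TF-IDF features could feed any supervised classifier, and far more labeled examples would be needed.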

Finally, another area to be improved is post credibility. While small amounts of misinformation would not significantly alter results, it is still best to mitigate this risk as much as possible. There is a growing body of literature on fake news identification in social networks, and these approaches can potentially be used to identify fake posts and articles (de Beer et al., 2020). Adding “hard” quantitative data from company filings to the algorithm can also serve as a safeguard. Lastly, the algorithm can prioritize more centralized or credible actors over others to yield safer outputs.
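One simple way to prioritize credible actors is a credibility-weighted average of sentiment scores. The sketch below is illustrative only: the function name, the sentiment range of -1 to 1, and the credibility weights are assumptions, and how credibility would actually be estimated (for example, from network centrality) is left open.

```python
def weighted_sentiment(posts):
    """Credibility-weighted mean of (sentiment, credibility) pairs.

    sentiment is assumed to lie in [-1, 1]; credibility is a
    non-negative weight, with higher values for more trusted sources.
    """
    posts = list(posts)
    total_weight = sum(credibility for _, credibility in posts)
    if total_weight == 0:
        raise ValueError("at least one post needs a positive credibility")
    weighted_sum = sum(sentiment * credibility
                       for sentiment, credibility in posts)
    return weighted_sum / total_weight


# Hypothetical posts: the strongly negative one comes from a low-credibility actor.
example_posts = [(0.8, 1.0), (-0.9, 0.1), (0.6, 0.9)]
print(weighted_sentiment(example_posts))
```

In this example the low-credibility negative post pulls the score down far less than it would in an unweighted mean, which is the dampening effect the paragraph above aims for.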

Overall, this research provides a proof-of-concept framework for a social-network-based ESG evaluation system. This work can serve as the backend logic for a social sentiment ESG product that can eventually be used by executives. While pre-packaged libraries were used for prototyping purposes, these aspects of the project can be optimized in future work. Unlike existing frameworks that rely on self-reported company filings, the proposed models take a more balanced view of a company’s ESG positives and negatives. In general, this can help approach an ESG ground truth that can better influence company practices to be more sustainable.

This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.