In the world of data, asset managers are pampered.
Every financial metric is available in an instant, updated in real time, and fully standardized - nowhere is there a debate over the definition of Gross Profit Margin, or a discussion of whether Return on Assets covers all assets or only the roughly 70% of them that have actually been measured.
These, however, are exactly the issues that plague ESG metrics.
First, there is the lack of available data.
Until recently, reporting was effectively left to the goodwill of companies and whatever pressure shareholder activism could muster. Mandatory disclosures are now beginning to catch up, but it is a long road to match the level of compliance seen in financial disclosures.
But even in the fortunate cases where reporting exists, we have a reliability issue.
Different companies use different, competing methodologies to report the same datapoint, and interpret the definitions of certain metrics differently. They may also only have the capacity to report for a certain segment of their company or operations. And even with reporting in place, without proper auditing or transparency it can be incredibly difficult to understand where these values come from. This can lead to apparently absurd situations where different ESG data providers disagree on a company’s reported impact metrics, even before their own estimations to fill the gaps enter the picture.
Data providers, however, often have the unenviable job of selling a complete dataset - something client investors can act on - and this means having to impute and fill the gaps. This runs directly counter to the needs of other investors, who need to know how reliable each datapoint is in order to make their investment decisions - be it for risk management or to drive impact.
So how do we tackle these challenges at RGS? How do we go from uncleaned raw data to a complete and reliable dataset?
The methodology with which we estimate datapoints is a living one, continuously being improved for reliability and specificity - in this article we give a sneak peek into our data pipeline and the quality control processes within it.
In our data pipeline, the first step is Data Collection - the sourcing of the raw data from which all else is derived.
There are three types of data sources we collect from, and every data source is given significant time with human analysts to ensure that the data is consistent and the source reliable:
Here we still rely, where possible, solely on the raw data reported by a company and perform our own imputations for the estimated datapoints; this enables us to track and maintain solid domain knowledge of each datapoint we utilize.
Significant time is also spent examining and comparing the data against other sources where possible, and engaging with the data provider to ensure that both the methodology and any discrepancies across data sources are understood.
The different data sources mean that, whilst we have significant control over many of the datapoints we collect, we must undergo a process of joining and merging files, and we must also do the work of hunting out the outliers in our datasets. This is our Pre-processing stage. We join the various datasets needed for a given impact or framework pipeline and standardize namings for columns and companies. From this structured form begins our Quality Control Process.
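As a minimal sketch of what this joining and standardization step can look like (the file names, column names, and provider labels below are illustrative assumptions, not our production schema):

```python
import pandas as pd

# Illustrative file and column names only; the production pipeline uses its own schema.
emissions = pd.read_csv("provider_a_emissions.csv")
financials = pd.read_csv("provider_b_financials.csv")

# Standardize column namings so the datasets can be joined consistently.
emissions = emissions.rename(columns={"Company Name": "company_name", "ISIN Code": "isin", "Year": "year"})
financials = financials.rename(columns={"issuer_isin": "isin", "fiscal_year": "year"})

# Join on shared identifiers so every downstream datapoint is keyed by company (ISIN) and reporting year.
merged = emissions.merge(financials, on=["isin", "year"], how="outer")
```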
Within Pre-processing, our first task is Data Type Validation - this ensures that the data type of each column is correct, so numbers are represented as numbers rather than strings, and namings of companies and ISINs are consistent throughout their use.
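A minimal sketch of this kind of check, using pandas (the column names and helper function are assumptions for illustration; the ISIN format itself is standard: a two-letter country code, nine alphanumeric characters, and a check digit):

```python
import pandas as pd

ISIN_PATTERN = r"^[A-Z]{2}[A-Z0-9]{9}[0-9]$"  # standard 12-character ISIN format

def validate_types(df: pd.DataFrame, numeric_cols: list[str]) -> pd.DataFrame:
    """Coerce numeric columns and flag malformed identifiers for review."""
    df = df.copy()
    for col in numeric_cols:
        # Values that cannot be parsed as numbers become NaN rather than silently staying strings.
        df[col] = pd.to_numeric(df[col], errors="coerce")
    # Flag ISINs that do not match the standard format.
    df["isin_valid"] = df["isin"].str.match(ISIN_PATTERN).fillna(False)
    return df
```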
With the data types verified, we then move to our Input Data QC, which utilizes three layers of “defense” against incorrect inputs:
Using the suspicion list we develop from methods 2 and 3, we then manually examine each of these company-year-datapoints to verify the reported value and investigate the outlier behavior. Those that remain unexplained and unexcused are marked as true outliers. This becomes a separate input that we feed back into pre-processing: a manual list of datapoints that should be dropped and estimated instead, or otherwise corrected in the raw data where it is reasonable to do so.
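As an illustration of how such a manually curated list can be fed back in, here is a hedged sketch; the file layout and function name are assumptions made for the example:

```python
import pandas as pd

# Assumed layout: one row per confirmed outlier, keyed by company, year and datapoint.
true_outliers = pd.read_csv("confirmed_outliers.csv")  # columns: isin, year, datapoint

def drop_confirmed_outliers(df: pd.DataFrame, outliers: pd.DataFrame) -> pd.DataFrame:
    """Blank out confirmed outlier values so the imputation stage estimates them instead."""
    df = df.copy()
    for _, row in outliers.iterrows():
        mask = (df["isin"] == row["isin"]) & (df["year"] == row["year"])
        # Setting the value to NA hands it over to the imputation stage downstream.
        df.loc[mask, row["datapoint"]] = pd.NA
    return df
```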
With our now cleaned data, we move on to the task of Imputations. Depending on the datapoint, we may have reporting coverage as high as 85% or as low as 1.8%, and estimating the rest comes with its own challenges. We classify imputations into two types, depending on the statistical strength of the assumptions we can enforce with our estimations.
Information about which imputations we consider to be made with high confidence, and which carry more uncertainty, is preserved throughout this process - both for internal tracking during quality control, and so that we can effectively quantify our confidence in our impact calculations and the uncertainty associated with each value we estimate.
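The imputation models themselves are datapoint-specific and not detailed here, but the following sketch shows the general pattern of preserving a source flag alongside each estimate. The sector-year-median and global-median methods, and the `sector` column, are illustrative placeholders, not our actual estimation logic:

```python
import pandas as pd

def impute_with_flags(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Fill gaps in `col`, recording how each value was obtained."""
    df = df.copy()
    df[f"{col}_source"] = "reported"
    df.loc[df[col].isna(), f"{col}_source"] = "missing"

    # Tier 1 (stronger assumptions, illustrative): median of peers in the same sector and year.
    sector_year_median = df.groupby(["sector", "year"])[col].transform("median")
    tier1 = df[col].isna() & sector_year_median.notna()
    df.loc[tier1, col] = sector_year_median[tier1]
    df.loc[tier1, f"{col}_source"] = "imputed_high_confidence"

    # Tier 2 (relaxed assumptions, illustrative): fall back to the global median.
    tier2 = df[col].isna()
    df.loc[tier2, col] = df[col].median()
    df.loc[tier2, f"{col}_source"] = "imputed_relaxed"
    return df
```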
We are in the process of integrating the same automated data statistics sheets and outlier detection model as in the pre-processing stage to catch potential outliers and mark these suspicious points for investigation. Not only does this act as another filter for catching potentially problematic reported datapoints, it also enables us to manage the quality of our imputations and examine the merit of different estimation methods per datapoint.
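The outlier detection model can take many forms; a simple illustrative version flags values that sit far from their sector-year peer group using a robust z-score (the peer grouping and the 3.5 threshold are assumptions for the sketch, not production parameters):

```python
import pandas as pd

def flag_suspicious(df: pd.DataFrame, col: str, threshold: float = 3.5) -> pd.Series:
    """Flag values whose robust z-score within their sector-year peer group exceeds the threshold."""
    grouped = df.groupby(["sector", "year"])[col]
    median = grouped.transform("median")
    # Median absolute deviation is less sensitive to the very outliers we are hunting than the standard deviation.
    mad = grouped.transform(lambda s: (s - s.median()).abs().median())
    mad = mad.replace(0, float("nan"))  # avoid flagging entire groups that have zero spread
    robust_z = 0.6745 * (df[col] - median) / mad
    return robust_z.abs() > threshold
```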
For each datapoint, we calculate the percentage of values that are reported, imputed with statistical rigor, and imputed under relaxed conditions, together with the relative effectiveness of each imputation method. This is then used to adjust our imputation logic.
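Given source flags like those in the earlier imputation sketch (the `_source` column naming is the assumption carried over from there), this coverage breakdown reduces to a simple tabulation:

```python
import pandas as pd

def coverage_summary(df: pd.DataFrame, col: str) -> dict[str, float]:
    """Share of values in `col` that are reported versus imputed, by imputation tier."""
    shares = df[f"{col}_source"].value_counts(normalize=True)
    return {
        "reported": shares.get("reported", 0.0),
        "imputed_high_confidence": shares.get("imputed_high_confidence", 0.0),
        "imputed_relaxed": shares.get("imputed_relaxed", 0.0),
    }
```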
Now, with a completed dataset of input KPIs, the next stage along the data pipeline is our impact calculations. Each calculation is backed by significant academic research, and we save the bibliography, derivations, and logic justifications for any and all calculations we utilize.
This then undergoes an internal peer review of the research, a viva-like defense of the logic behind the calculation, and a blind rederivation of any mathematical formulae involved to prevent arithmetic errors. This review covers both the impact measurement and monetization calculations themselves, and any social costs or other global assumptions utilized in the calculation.
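To make the shape of a monetization calculation concrete, here is a deliberately simplified, hypothetical example: scaling reported emissions by a social cost of carbon. Both the structure and the cost figure are placeholders for illustration, not the assumptions used in our calculations:

```python
# Placeholder social cost value for illustration only; not the assumption used in production.
SOCIAL_COST_OF_CARBON_USD_PER_TCO2E = 190.0

def monetize_emissions(scope1_tco2e: float) -> float:
    """Convert tonnes of CO2e into a monetized environmental cost in USD."""
    return scope1_tco2e * SOCIAL_COST_OF_CARBON_USD_PER_TCO2E
```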
Once this peer review has been completed and passed, we move on to our Results QC. We begin with an examination of our automatically created Data Statistics Sheets, identical in form to those discussed before but applied to the monetized impacts, as well as to the intermediate drivers and input KPIs we use. This is done both for the individual monetized impacts and for their aggregations (such as Total RIFT™).
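A data statistics sheet, in this sense, is essentially a per-column summary; a minimal version (with an assumed selection of statistics) could be generated as follows:

```python
import pandas as pd

def statistics_sheet(df: pd.DataFrame, impact_cols: list[str]) -> pd.DataFrame:
    """Per-impact summary statistics used as a first pass of Results QC."""
    rows = []
    for col in impact_cols:
        series = df[col]
        rows.append({
            "impact": col,
            "coverage": series.notna().mean(),  # share of companies with a value
            "median": series.median(),
            "p5": series.quantile(0.05),
            "p95": series.quantile(0.95),
            "min": series.min(),
            "max": series.max(),
        })
    return pd.DataFrame(rows)
```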
Manual examination not only augments the calculation review described above, by contextualizing the calculation logic and enabling cross-impact comparisons; it is also the input to our outlier detection, both manual and automated. The outlier detection behaves similarly to the methods described above.
Across all parts of the data pipeline, we have by now collected and investigated a list of suspicious company-year-datapoints. Those that do not have a valid explanation for their unexpected behavior, or that otherwise cannot be verified, are marked in the input that feeds back into pre-processing - closing the self-improvement loop that ensures data quality remains as high as possible for our clients.
Beyond transparency about our own process, we are also developing methods to directly communicate and quantify the transparency (or lack thereof) of the company under assessment.
These Data Transparency & Reliability Scores are important augmenters of investment decisions: understanding and quantifying impact uncertainty is fundamental to developing true impact-risk management techniques.
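The construction of these scores is still being developed, but as a purely illustrative example of the kind of quantity involved, one could weight each of a company's datapoints by how it was obtained. The weights and scoring rule below are invented for the sketch and do not describe the RGS methodology:

```python
# Invented weights for illustration only; the actual score construction is not described here.
SOURCE_WEIGHTS = {"reported": 1.0, "imputed_high_confidence": 0.6, "imputed_relaxed": 0.2}

def transparency_score(source_flags: list[str]) -> float:
    """Average reliability weight across a company's datapoints, between 0 and 1."""
    return sum(SOURCE_WEIGHTS.get(flag, 0.0) for flag in source_flags) / len(source_flags)
```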