According to the Global Reporting Initiative (GRI), more than 9 out of 10 of the world’s largest companies publish sustainability reports every year, yet these disclosures vary enormously in structure, terminology, and data quality.
At the same time, state-of-the-art AI tools are rapidly transforming how organisations collect and process ESG information at scale. But even with all this progress, one fact remains: AI alone cannot tackle the complexity of sustainability reporting. Human judgement and process are still essential.
Every sustainability report tells a story, but that story is rarely easy to piece together. It might be hidden inside a 200-page PDF, spread across annexes, buried within investor webpages, or split into multiple data sheets. Some companies publish polished, consistent ESG reports, while others scatter information across several pages that mix technical disclosures with marketing summaries. No two companies follow the same approach, and many change their approach every year.
To curate a clean, reliable dataset containing the most valuable ESG data, a refined methodology is required: one that combines expert knowledge and judgement with AI-derived intelligence to achieve the best possible results efficiently.
1. The Retrieval Challenge: How to Identify the Correct Information
Corporate ESG information is not presented in a standardised way. AI tools often struggle to distinguish between an outdated report, a summary brochure, a technical annex, an investor deck, or a highlights page. A model might retrieve a document that looks correct but is actually missing key data, or it might overlook the part where the appropriate information lies.
The expert or the data analyst, however, can navigate a corporate website intuitively. They recognise official disclosures, identify the correct reporting year, and determine which datasets contain the information needed.
They understand the structure behind different types of documents and know how to check whether a company has released multiple revisions of a report or updated a file mid-year. In other words, they ensure that the AI tool is given the right source of raw data, since everything that follows depends on this first step.
2. Scaling Up: How AI Accelerates Data Extraction
Once the correct documents are collected, AI tools become an extraordinary accelerator. They can scan lengthy sustainability reports almost instantly, identify where key metrics are located, and extract those values with remarkable speed. What would previously take an expert days or even weeks to compile manually, AI tools can turn into a raw dataset in just a few hours.
This ability to rapidly compile large volumes of structured and concise information is something no analyst could realistically match. AI sets the foundation, compiling raw data into a broad, comprehensive dataset covering many companies, years, and disclosures, which allows the expert to focus on interpretation, context, and quality rather than repetitive data collection. By turning scattered information into a compiled raw dataset, AI enables us to scale up our work to a level that would otherwise be infeasible.
3. Expert Guidance: Teaching AI What Matters
Even the most advanced AI models do not inherently know what to look for. They need context, definitions, and clarity, all of which are dictated by the expert. The interaction of human and artificial intelligence, known as hybrid intelligence, has the potential to deliver higher-quality results with greater efficiency.
The expert decides which metrics are relevant, how different industries present similar concepts, and where specific pieces of information typically sit within a report. A single metric such as direct emissions (Scope 1 emissions), which are emissions from sources that are owned or controlled by the company, can be labelled in many different ways across companies and sectors. An AI tool might not associate these varied descriptions with the correct emissions scope.
Only someone with domain knowledge can interpret these discrepancies, understand the underlying meaning, and guide the AI model to extract the desired values. This collaborative interaction ensures that the model retrieves not just any information, but the correct information: consistent, comparable, and meaningful.
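One way to picture this guidance is an expert-maintained list of label variants that maps differently worded disclosures onto a single metric. The sketch below is only an illustration under assumptions: the metric key, the alias list, and the function name are hypothetical and do not describe any specific production pipeline.

```python
import re
from typing import Optional

# Hypothetical, expert-curated label variants for one metric.
# A real list would be far longer and sector-specific.
METRIC_ALIASES = {
    "scope_1_emissions": [
        "scope 1",
        "scope one",
        "direct emissions",
        "direct ghg emissions",
    ],
}


def canonical_metric(label: str) -> Optional[str]:
    """Map a reported label to a canonical metric name, or None if unknown."""
    # Lowercase and collapse punctuation so "(Scope 1)" becomes "scope 1".
    cleaned = re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()
    padded = f" {cleaned} "
    for metric, aliases in METRIC_ALIASES.items():
        # Match whole words only, so "indirect emissions" is not
        # mistaken for "direct emissions".
        if any(f" {alias} " in padded for alias in aliases):
            return metric
    return None  # unknown labels are left for the expert to classify


print(canonical_metric("Direct GHG emissions (Scope 1)"))
print(canonical_metric("Indirect emissions"))
```

The matching is deliberately simple. A combined label such as "Scope 1 and 2" would need an additional rule, which is exactly the kind of case the expert has to define.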
4. AI Beyond Extraction: Calculating Metrics and Scaling Values
One of the most powerful aspects of hybrid intelligence is that AI tools are not confined to extracting values. They can also calculate and scale metrics when companies do not report them directly. For example, a company might not state the percentage of women in the workforce; instead, the report might only include the number of male and female employees, along with the total number of employees. AI models can analyse these values, understand their relationship, and calculate the percentage in question from the available data, automatically assigning the resulting value in the raw dataset. This turns partial disclosures into meaningful, comparable indicators.
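The workforce example above comes down to simple arithmetic with a few sanity checks. The following is a minimal sketch, assuming headcounts have already been extracted; the function name and the choice to return `None` for underivable cases are illustrative assumptions.

```python
from typing import Optional


def share_of_women(
    female: Optional[int],
    male: Optional[int] = None,
    total: Optional[int] = None,
) -> Optional[float]:
    """Return the percentage of women in the workforce, or None if it
    cannot be derived from the reported headcounts."""
    if female is None:
        return None
    if total is None:
        # No reported total: fall back to female + male if available.
        if male is None:
            return None
        total = female + male
    # Reject inputs that cannot produce a valid percentage.
    if total <= 0 or female < 0 or female > total:
        return None
    return round(100.0 * female / total, 2)


print(share_of_women(female=420, male=580))    # 42.0
print(share_of_women(female=300, total=1200))  # 25.0
```

Returning `None` instead of a guessed value keeps underivable cases visible for later review.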
AI is also extremely effective at identifying units and scales. Sustainability data, such as environmental metrics, may be reported in different units (e.g. tonnes, kilograms, gigajoules). Most AI tools can recognise these differences and convert values to consistent units, so analysts can proceed to process the raw data into clean, usable values for further investigation.
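Unit harmonisation of this kind can be sketched as a lookup of conversion factors. The example below is an assumption-laden illustration: the target units (tonnes and gigajoules), the accepted unit spellings, and the function name are choices made here for demonstration, not a description of a particular tool.

```python
from typing import Tuple

# Conversion factors to the assumed target units.
MASS_TO_TONNES = {
    "t": 1.0, "tonne": 1.0, "tonnes": 1.0,
    "kg": 0.001, "kilograms": 0.001,
    "kt": 1000.0,
}
ENERGY_TO_GJ = {
    "gj": 1.0, "gigajoules": 1.0,
    "tj": 1000.0,
    "mwh": 3.6,     # 1 MWh = 3.6 GJ
    "kwh": 0.0036,  # 1 kWh = 0.0036 GJ
}


def normalise(value: float, unit: str) -> Tuple[float, str]:
    """Convert a reported value to tonnes (mass) or GJ (energy)."""
    key = unit.strip().lower()
    if key in MASS_TO_TONNES:
        return value * MASS_TO_TONNES[key], "t"
    if key in ENERGY_TO_GJ:
        return value * ENERGY_TO_GJ[key], "GJ"
    # Unknown units are surfaced rather than silently passed through.
    raise ValueError(f"Unrecognised unit: {unit!r}")


print(normalise(2500, "kg"))
print(normalise(10, "MWh"))
```

Raising an error on an unrecognised unit is a design choice: it forces the case to be reviewed instead of letting a mis-scaled value enter the dataset.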
Through these capabilities, AI helps us build extensive datasets quickly while keeping values consistent for further analysis.
5. Quality Assurance: The Final Layer of Expert Insight
After extraction and analysis, the data still requires in-depth review. ESG disclosures often include complexities that AI cannot fully interpret, and even small changes in definitions or reporting boundaries can affect the results. A metric reported as zero might indicate a missing disclosure rather than an actual value, and a sudden rise in emissions might result from the company adopting a different estimation methodology rather than from true operational changes.
We encountered a similar issue with companies in the banking sector. Their latest annual reports showed a large increase in Scope 3 emissions (indirect greenhouse gas emissions related to a company's activities). The automated process flagged these values as outliers, but a manual quality check showed that they were correct. The companies had started reporting Scope 3 Category 15 (Investments), which they had not previously included in their calculations. What initially looked like an unusually high Scope 3 value for a financial company was in fact an accurate figure with a wider reporting scope.
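An automated check of this kind can be sketched as a year-over-year comparison that raises flags for review without changing any values. This is a minimal sketch under stated assumptions: the threshold of 3.0, the function name, and the message wording are hypothetical, and the check assumes non-negative metrics such as emissions.

```python
from typing import Dict, List, Optional


def flag_for_review(
    series: Dict[int, Optional[float]],
    max_ratio: float = 3.0,  # hypothetical threshold, tuned by the expert
) -> List[str]:
    """Return review flags for one company's metric, keyed by year.

    Values are never altered: a flag only asks an analyst to look,
    because a jump may reflect a wider reporting scope, not an error.
    """
    flags: List[str] = []
    previous: Optional[float] = None  # last positive reported value
    for year in sorted(series):
        value = series[year]
        if value is None:
            flags.append(f"{year}: not disclosed")
        elif value == 0:
            flags.append(f"{year}: reported zero, confirm it is not a missing disclosure")
        elif previous is not None and (
            value / previous > max_ratio or previous / value > max_ratio
        ):
            flags.append(f"{year}: changed more than {max_ratio}x versus previous reported year")
        if value is not None and value > 0:
            previous = value
    return flags


scope_3 = {2021: 100.0, 2022: 110.0, 2023: 5000.0}
for flag in flag_for_review(scope_3):
    print(flag)
```

In the banking example, a flag like the one raised for 2023 would be resolved by the analyst as correct once the newly reported investment category is identified.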
The data analyst or expert is therefore responsible for bringing context, industry knowledge, and critical reasoning to the final dataset. They identify inconsistencies, account for sector-specific norms, and judge whether discrepancies represent real shifts or artifacts of reporting. This final layer ensures that the dataset produced is not only comprehensive but genuinely reliable.
6. The Human-AI Partnership: A Balanced Approach
The collaboration between AI and the expert is not just a practical choice; it is essential for delivering high-quality sustainability intelligence. AI provides speed, scalability, and the ability to process vast amounts of data. The expert provides interpretation, intuition, and the high-level judgement necessary to identify trends in complex disclosures.
This partnership enables richer insights, faster updates, and more accurate datasets. The future of ESG data is neither fully automated nor fully manual; it is hybrid.
Conclusion
At RGS, we believe that insightful and widely useful results come from combining technological capability with human expertise. We are constantly working towards applying AI technology in the most robust and efficient way by using scalable automated pipelines, rigorous unit and provenance checks, and domain-guided models to process information at scale, calculate derived metrics, interpret units, and build databases with remarkable efficiency.
The expert ensures that the results are accurate, meaningful, and grounded in real-world understanding. In unison, these strengths allow us to create impact datasets that are not only efficient and comprehensive but genuinely insightful, empowering analysts, clients, and investors to make better decisions for a more transparent and sustainable world.


