Reading between the green lines: A framework for scalable greenwashing detection


It's a quiet morning, and somewhere a sustainability officer is putting the finishing touches on their company's annual sustainability report. What's the best tone for them to take? Measured, or celebratory?

"We reduced our emissions by 1% this year"
Or... "We made meaningful progress in our climate journey, achieving a significant reduction in emissions through our ongoing commitment to a more sustainable future."

Same underlying facts, two very different tones. The first sounds like a footnote. The second like a TED talk. What would you do? A or B? Unfortunately, this choice isn't just one of style. For better or worse, the market rewards optimistic sustainability messaging. Investors read these reports. Consumers read these reports. Regulators read these reports. And the difference between sounding conservative and sounding inspiring can translate into real dollars on your balance sheet.

This kind of dilemma is happening at every major corporation, and it raises a question:

How do we tell the difference between a company that's actually doing the work outlined in their reports, and a company that's overselling their efforts?

The gap between what a company says about its sustainability performance and its actual sustainability performance has a name. It's called greenwashing. And it's become a real obstacle in the fight against climate change. When greenwashing goes unchecked, companies are incentivized to do very little but then talk about those tiny deeds a lot, trust in corporate sustainability claims breaks down, capital gets misallocated, and the energy transition slows.

There are already a number of authorities (like the European Commission) that are actively reviewing these kinds of claims and keeping companies in line. But that kind of slow, methodical, manual review isn't necessarily capable of keeping pace with this problem. In recent years greenwashing hasn't just grown, it's surged, and the sheer number of corporate entities and corporate communications that have to be reviewed is growing all the time.

So what can we do about it? How can we review sustainability claims at scale, and label them as what they are: either accurate grounded claims, harmless spin, slight overexaggeration, misleading marketing, or cynical deception?

That's where today's paper comes in. In it, the authors showcase a new metric called the GTS, the Greenwashing Tendency Score, and the framework that allows you to compute it. The idea is that you can feed in a company's sustainability report, pull in its third-party ESG rating, and then have a system output a relative measure of how positively that company is communicating about sustainability relative to its actual measured performance. On today's episode we'll walk through their system, and see how it works. Let's dive in.

Trying to nail down a definition of greenwashing is harder than it might sound. The concept is a bit slippery, and there's no single, universally accepted meaning for the term. Different authors and regulators draw the boundaries in different places.

Some restrict it to environmental claims, others include social performance.
Some require intent to deceive, others say that intent isn't necessary, that even unintentional misrepresentation counts.
Some apply the term only to companies, others apply it more broadly to any organization.
Some focus on the product level (i.e., a single product being marketed as sustainable when it isn't), and others focus on the organizational level (the whole company being branded as sustainable when its overall practices don't support that branding).

The authors of this paper deliberately chose as inclusive a definition as they could, while still trying to ground the idea in objectivity. They landed on the following:

Greenwashing is: any discrepancy between an entity's overly positive communication of its sustainability performance and its actual performance.

Note that they don't require intent, they don't limit it to environmental claims, and they don't restrict it to any particular type of entity. They tie their definition of sustainability to the Brundtland framework. This is a classic definition from the 1987 UN report:

Sustainability is: development that meets the needs of the present without compromising the ability of future generations to meet their own needs.

In practical terms, that means the authors are looking at all three classic ESG dimensions: environmental, social, and governance. And fortunately, they don't need to invent anything in order to gauge a company's efforts in those areas. We already have an international framework for measuring this kind of performance: formalized ESG ratings. They're produced by third-party providers who evaluate companies on baskets of criteria. These ratings are typically built in a hierarchical pillar structure. The overall score is composed of three pillar-scores (environmental, social, and governance), and each of those pillars is built from a series of subscores covering more specific dimensions (resource use, emissions reduction, workforce diversity, community impact, board structure, and so on). And importantly: companies are rated relative to their industry peers, because what counts as good environmental performance for a bank is obviously different from what counts for a steel manufacturer.
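To make that pillar structure concrete, here's a minimal sketch of how such a rating could be represented. The class names, the 0-100 scale, the equal-weight averaging, and every number below are assumptions for illustration; real rating providers use their own criteria and weightings, and the peer-relative step is left out entirely.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Pillar:
    """One ESG pillar and the subscores it is built from (assumed 0-100 scale)."""
    name: str
    subscores: dict[str, float]

    def score(self) -> float:
        # Assumption: equal-weight average; real providers apply their own weights.
        return mean(self.subscores.values())


@dataclass
class EsgRating:
    """Overall rating composed of the three pillar scores."""
    environmental: Pillar
    social: Pillar
    governance: Pillar

    def overall(self) -> float:
        # Assumption: equal-weight average of the three pillar scores.
        pillars = (self.environmental, self.social, self.governance)
        return mean(p.score() for p in pillars)


# Hypothetical company; every number here is made up for illustration.
rating = EsgRating(
    environmental=Pillar("environmental", {"resource_use": 60.0, "emissions_reduction": 40.0}),
    social=Pillar("social", {"workforce_diversity": 70.0, "community_impact": 80.0}),
    governance=Pillar("governance", {"board_structure": 65.0}),
)
print(round(rating.overall(), 2))
```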

With the ESG ratings in hand, the authors just needed a way to compare those ratings against the company's public communications. For this they chose sustainability reports. These are the formal, organization-wide documents that corporations publish annually to describe their sustainability efforts. They're much more substantive than a marketing campaign, so they're well suited to assessing a company's overall greenwashing tendency (if any).

Once they had the reports, the question became a bit more nuanced: how do you actually measure how positively a company is communicating about itself? This is where natural language processing comes in. The authors use two complementary NLP techniques, each one capturing a different dimension of communication.

The first is sentiment analysis. This is essentially the practice of measuring the emotional tone of a text. Is the language being used positive, neutral, or negative? For this they used VADER. It's a rule-based model that scans sentences for emotionally weighted words and phrases, then combines those signals into an overall score. Words associated with optimism, success, improvement, or achievement push the score upward, and more negative or cautionary language pulls it down.
The second is semantic alignment analysis. Here, the authors converted both the sustainability reports and the official UN Sustainable Development Goal descriptions into high-dimensional vector embeddings. Then they used cosine similarity to measure how closely those vectors aligned. This allowed them to estimate how strongly a company's report semantically overlapped with recognized sustainability topics.
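Here's a toy sketch of those two signals, just to show the mechanics. It is not the authors' pipeline: the tiny lexicon stands in for a rule-based sentiment model (it is not VADER's lexicon or its rules), word counts stand in for learned embeddings, and the function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical mini-lexicon: a toy stand-in for a rule-based sentiment model.
# This is NOT VADER's lexicon or its scoring rules.
TOY_LEXICON = {"progress": 1.0, "significant": 0.5, "achieved": 1.0,
               "risk": -1.0, "failed": -1.5, "decline": -1.0}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def toy_sentiment(text: str) -> float:
    """Average lexicon weight per token; 0.0 means neutral or empty."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(TOY_LEXICON.get(t, 0.0) for t in tokens) / len(tokens)


def embed(text: str) -> Counter:
    # Assumption: bag-of-words counts stand in for a learned embedding.
    return Counter(tokenize(text))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative texts only; neither is taken from a real report.
report = "We achieved significant progress on clean energy and climate action."
goal = "Take urgent action to combat climate change and its impacts."

print("sentiment:", toy_sentiment(report))
print("alignment:", cosine_similarity(embed(report), embed(goal)))
```

Positive words pull the sentiment value up, and shared vocabulary with the goal text pulls the similarity toward 1. A real embedding model would also pick up overlap in meaning where the exact words differ, which word counts can't do.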

But these two measurements weren't actually useful on their own. Remember, the authors are trying to operationalize a very specific idea about greenwashing: that it's fundamentally a mismatch between communications and performance. Both the sentiment score and the semantic alignment score were proxies for the "communication" side. To actually estimate greenwashing, the authors needed to pull in the ESG scores too. Then, the final Greenwashing Tendency Score, or GTS, would need to synthesize all three in a way that made sense.

But how? Well, think about it: what is the mathematical equivalent of a comparison between two different values? A fraction! The ESG score is normalized onto a 0-1 scale, then squared, and placed in the denominator. The sentiment value and the alignment value are multiplied together, placed in the numerator and multiplied by 10.
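Written out, that fraction looks roughly like this. The symbols are shorthand chosen here, not necessarily the paper's notation: S is the sentiment value, A is the alignment value, and ESG_norm is the ESG score after rescaling to the 0-1 range. The plugged-in numbers are made up purely to show the arithmetic.

```latex
\mathrm{GTS} = 10 \cdot \frac{S \cdot A}{\mathrm{ESG}_{\mathrm{norm}}^{2}}

% Illustrative values only (not from the paper):
% S = 0.5, \quad A = 0.4, \quad \mathrm{ESG}_{\mathrm{norm}} = 0.5
\mathrm{GTS} = 10 \cdot \frac{0.5 \cdot 0.4}{0.5^{2}} = 10 \cdot \frac{0.2}{0.25} = 8
```

Because the denominator is squared, weak ESG performance is penalized sharply: halving ESG_norm while holding the messaging constant quadruples the score.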

What you're left with is a score where lower is better. The GTS rises when sustainability language becomes more positive, when sustainability-topic alignment becomes broader or stronger, and when ESG performance becomes weaker relative to the intensity of the communication. It gets lower when there's consistency between messaging and performance and higher when there's rhetoric that is disproportionate to actual effort. A company with restrained, cautious reporting should receive a relatively low GTS even if its ESG score is mediocre, while another company with aggressive sustainability messaging and only moderate ESG performance should score substantially higher. Why? Because the system is not attempting to determine whether a company is sustainable in any absolute sense. It is only attempting to estimate whether the communication surrounding that sustainability appears inflated relative to independently measured performance.
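To see that behavior in numbers, here's a small sketch of the score as described in this episode. The function name, the 0-100 ESG scale, the assumption that sentiment and alignment arrive as non-negative values, and both example companies are hypothetical; the paper's exact preprocessing may differ.

```python
def greenwashing_tendency_score(sentiment: float, alignment: float,
                                esg_score: float, esg_max: float = 100.0) -> float:
    """GTS = 10 * sentiment * alignment / (normalized ESG)^2.

    Assumptions: sentiment and alignment are non-negative values on
    comparable scales, and the raw ESG score runs from 0 to esg_max.
    """
    esg_norm = esg_score / esg_max
    if esg_norm <= 0:
        raise ValueError("ESG score must be positive to avoid dividing by zero")
    return 10 * sentiment * alignment / esg_norm ** 2


# Two hypothetical companies with the same middling ESG score (made-up numbers),
# so the only difference is how strongly they communicate.
restrained = greenwashing_tendency_score(sentiment=0.2, alignment=0.3, esg_score=50)
aggressive = greenwashing_tendency_score(sentiment=0.8, alignment=0.6, esg_score=50)

print(f"restrained: {restrained:.2f}, aggressive: {aggressive:.2f}")
```

Holding the ESG score fixed isolates the communication side: the louder report scores higher even though the measured performance is identical.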

If you want to dig deeper, the full paper goes well beyond what we've covered here. It includes the complete rankings for every company in the sample, the year-by-year breakdowns of each underlying component, specific examples of how sentiment scores get assigned to individual sentences, a walkthrough of the authors' validation work against alternative sentiment models, and an extended discussion of how the methodology could be modified for industry-specific applications. The authors have also published their dataset openly, so that any of us can reproduce this analysis or adapt the methodology for ourselves. If you're working in sustainable finance, regulatory policy, or you're just interested in seeing how NLP techniques can be applied to corporate accountability problems, you should definitely give it a read.