
STOXX 3000 Sustainability Reporting Text Measures

  • Johannes van der Waal (Creator)

Dataset

Description

This dataset provides firm-level, text-derived indicators based on sustainability reporting by STOXX 3000 companies. The current version contains a binary variable identifying whether a firm-year sustainability report includes references to the Sustainable Development Goals (SDGs). The SDG indicator is constructed through automated text processing of publicly available sustainability reports. No copyrighted raw text is included in this dataset.
Each observation is uniquely identified by ISIN and year and is accompanied by core firm metadata, including:

- Reporting year
- Country of incorporation
- Industry sector (STOXX classification)
- Index component classification (large, mid, or small cap)

These variables allow users to merge the dataset easily with data from external financial or sustainability databases.
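As a rough illustration of such a merge, the sketch below joins a toy extract of the dataset to a toy external table on the ISIN and year key using pandas. The column names (`isin`, `year`, `sdg_dummy`, `esg_score`) and all values are assumptions made for the example, not the dataset's actual variable names or contents.

```python
import pandas as pd

# Hypothetical extract of the SDG dataset: one row per ISIN and year.
# Column names and values are placeholders, not taken from the real files.
sdg = pd.DataFrame({
    "isin": ["XX0000000001", "XX0000000001", "XX0000000002"],
    "year": [2019, 2020, 2020],
    "sdg_dummy": [0, 1, 1],
})

# Hypothetical external firm-year table (e.g. from a financial database).
external = pd.DataFrame({
    "isin": ["XX0000000001", "XX0000000002", "XX0000000003"],
    "year": [2020, 2020, 2020],
    "esg_score": [61.5, 48.0, 70.2],
})

# Left join on the composite key. validate="one_to_one" raises an error if
# (isin, year) is duplicated on either side; indicator=True adds a "_merge"
# column showing which rows found a match.
merged = sdg.merge(
    external,
    on=["isin", "year"],
    how="left",
    validate="one_to_one",
    indicator=True,
)

print(merged)
```

Inspecting the `_merge` column is a quick way to see how many firm-years have no counterpart in the external source before running any analysis.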

The repository also includes replication code (R) used to generate the SDG-dummy variable and produce the empirical results for the associated publication. The code illustrates the text-processing workflow while ensuring that underlying copyrighted documents are not redistributed.
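To give a sense of what a dictionary-based SDG indicator can look like, here is a minimal sketch in Python. It is not the replication code, which is written in R, and the search pattern, function name, and sample texts are all assumptions for illustration; the actual terms and preprocessing are defined in the repository.

```python
import re

# Hypothetical search pattern: matches "Sustainable Development Goal(s)" and
# the abbreviation "SDG(s)". The dataset's actual search terms may differ.
SDG_PATTERN = re.compile(
    r"\bsustainable\s+development\s+goals?\b|\bSDGs?\b",
    re.IGNORECASE,
)


def sdg_dummy(text: str) -> int:
    """Return 1 if the text contains an SDG reference, else 0."""
    return int(SDG_PATTERN.search(text) is not None)


# Invented report snippets keyed by (ISIN, year); only the derived 0/1
# values would be stored, never the text itself.
reports = {
    ("XX0000000001", 2020): "We align our strategy with the Sustainable\nDevelopment Goals.",
    ("XX0000000002", 2020): "Our emissions fell by ten percent.",
}

dummies = {key: sdg_dummy(text) for key, text in reports.items()}
print(dummies)
```

Using `\s+` between the words lets the pattern match phrases that were split across lines during text extraction, which is common in reports converted from PDF.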

Future versions of this dataset will extend the available text-derived measures. Planned additions include indicators constructed using alternative dictionaries, custom lexicons, or thematic classifications applied to sustainability disclosures. Only derived variables will be released; raw corporate text will not be shared.

Intended use:
Researchers can use the dataset for replication, robustness checks, comparative textual analysis, or as a foundation for expanded sustainability research on STOXX 3000 firms.
Date made available: 24 Nov 2025
Publisher: Mendeley Data
