Description
This dataset contains the frequencies of SDG mentions in the corporate sustainability reports of 300 large enterprises taken from the Stoxx Global 3000, over a two-year period. The sample consists of three equal groups of companies from the USA, Europe and East Asia (Japan, Korea and Taiwan, or "JKT").
The sustainability reports of these 300 companies were collected from a database (corporateregister.com). All texts were analysed for the presence of characteristic SDG words, using a dictionary (SDG-dictionary.txt) that the author compiled from the foundational SDG document (the text of the UN resolution).
The data set can be used to explore the sustainability reporting practices of large stock-listed companies in connection with financial and organizational variables.
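As an illustration of dictionary-based counting, the following minimal Python sketch tallies term occurrences per category in one text. It is not the author's pipeline: the two categories and their terms are hypothetical placeholders, and the actual format and contents of SDG-dictionary.txt are not assumed here.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary for illustration only; the real terms
# are in SDG-dictionary.txt, whose format is not described here.
SDG_TERMS = {
    "SDG1": ["poverty"],
    "SDG13": ["climate change", "greenhouse gas"],
}

def count_sdg_terms(text, dictionary=SDG_TERMS):
    """Count case-insensitive whole-word occurrences of each category's terms."""
    lowered = text.lower()
    counts = Counter()
    for category, terms in dictionary.items():
        for term in terms:
            # Word boundaries keep a term from matching inside a longer word.
            pattern = r"\b" + re.escape(term) + r"\b"
            counts[category] += len(re.findall(pattern, lowered))
    return counts

sample = "We address climate change and poverty. Climate change is a priority."
print(count_sdg_terms(sample))
```

Applying such a function to every report gives one row of counts per document, which is the general shape of a frequency table like the one in this dataset.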
Additionally, the data can be used to explore other features of sustainability reporting, as the original document-feature matrix (dfm) has also been included.
The second version of this data set also contains text fragments of the reports that contain references to the SDGs. They come in two forms, text fragments and sentences, each containing at least one of the phrases "sustainable development goals", "sdgs", "united nations", "2030 agenda", and "global compact". These are zipped text files that can be imported into a CAQDAS programme for manual text analysis (coding). The file names indicate the company's ISIN and the reporting year.
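The sentence selection described above can be sketched in Python as follows. This is only a rough approximation under stated assumptions: the sentence splitter is deliberately naive, matching is a plain case-insensitive substring test, and the function names are hypothetical rather than taken from the data set's own tooling.

```python
import re

# The five key phrases listed in the description above.
KEY_PHRASES = [
    "sustainable development goals",
    "sdgs",
    "united nations",
    "2030 agenda",
    "global compact",
]

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def sdg_sentences(text):
    """Return the sentences that contain at least one key phrase."""
    return [
        sentence
        for sentence in split_sentences(text)
        if any(phrase in sentence.lower() for phrase in KEY_PHRASES)
    ]

report = (
    "Our strategy follows the 2030 Agenda. "
    "Revenue grew by 5%. "
    "We support the SDGs!"
)
for sentence in sdg_sentences(report):
    print(sentence)
```

A production version would need a more robust sentence tokenizer, since abbreviations and decimal numbers will confuse the simple split used here.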
In version 4, a reference is included to the Shiny application - https://jwhwaal.shinyapps.io/Shiny_SDG/ - where you can explore different Structural Topic Models (STM) produced from the SDG sentences extracted from the reports. The topics are characterized by the highest probability keywords. These describe the topics and have been selected automatically by the STM algorithm. For every topic, you can consult the documents (sentences) which have the highest probability of belonging to that topic. This allows you to explore rapidly, with different degrees of detail, how the SDGs are discussed in sustainability and integrated reports.
| Date made available | 2 May 2024 |
|---|---|
| Publisher | Mendeley Data |