15–19 Sept 2024
Leuven, Belgium
Europe/Berlin timezone

Investigation of corporate sustainability reporting through automated text analysis

18 Sept 2024, 14:20
20m
Auditorium

Data Mining and Warehousing PCA and mining

Speaker

Aleš Toman (University of Ljubljana, School of Economics and Business)

Description

To evaluate company performance, attention is shifting from purely quantitative (financial) information to qualitative (textual) information. Corporate annual reports are comprehensive documents designed to inform investors and other stakeholders about a company's performance in the past year and its goals for the coming years. We focus on the corporate sustainability reporting of FTSE 350 companies over the period 2012–2021. The lack of standardization and structure in non-financial reporting makes such an analysis difficult.

We extracted all text from the non-financial sections of the annual reports using the pdf2txt tool and filtered it to retain only structurally correct sentences. We then identified sentences related to sustainability using a sentence classifier pre-trained on manually annotated data. The content of these sentences was analyzed using a RoBERTa model adapted to the financial domain. Using a hierarchical clustering algorithm, we identified 30 interpretable sustainability-related topics and 6–9 higher-level clusters of sustainability concepts.
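
The hierarchical clustering step can be illustrated with a minimal sketch. The abstract does not state which distance measure or linkage criterion was used, so Euclidean distance and average linkage are assumptions here, and the two-dimensional toy vectors merely stand in for sentence embeddings. The function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points, c1, c2):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    # Start with every point in its own cluster (clusters hold point indices).
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(points, clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy vectors standing in for sentence embeddings (assumption, not real data).
embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
topics = agglomerative(embeddings, n_clusters=3)
```

Stopping the merging at different cluster counts yields coarser or finer groupings from the same hierarchy, which is one way a set of fine-grained topics and a smaller set of higher-level clusters could both be read off a single tree.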

For each report and each year, we calculated the proportion of each topic within the report. The development of sustainability topics over time shows that external events and new reporting standards influence the overall content of the annual reports. In addition, we clustered the reports hierarchically based on their topic proportions and identified 6 types of reports. The analysis showed that external events had the greatest influence on the structure of the individual reports.
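
As a rough illustration of the topic-proportion calculation, the sketch below counts topic labels per report and normalizes them to shares. It assumes each sustainability sentence has already been assigned a single integer topic id; the tuple layout, the function name, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def topic_proportions(labelled_sentences, n_topics):
    """Return {(company, year): [share of topic 0, ..., share of topic n_topics-1]}."""
    counts = defaultdict(Counter)
    for company, year, topic in labelled_sentences:
        counts[(company, year)][topic] += 1

    proportions = {}
    for report, counter in counts.items():
        total = sum(counter.values())
        # Topics that never occur in a report get a share of 0.0.
        proportions[report] = [counter[t] / total for t in range(n_topics)]
    return proportions


# Made-up example data: (company, year, topic id) per sustainability sentence.
sentences = [
    ("A", 2012, 0), ("A", 2012, 0), ("A", 2012, 1), ("A", 2012, 2),
    ("B", 2012, 2), ("B", 2012, 2),
]
props = topic_proportions(sentences, n_topics=3)
```

Each report is then represented by a fixed-length vector whose entries sum to 1, a form that can be compared across reports and years or passed to a clustering routine.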

Type of presentation Talk
Classification Mainly application
Keywords sustainability, annual reports, text classification

Primary authors

Urša Ferjančič (University of Ljubljana, School of Economics and Business)
Riste Ichev (University of Ljubljana, School of Economics and Business)
Igor Lončarski (University of Ljubljana, School of Economics and Business)
Syrielle Montariol (Jozef Stefan Institute)
Andraž Pelicon (Jozef Stefan Institute)
Senja Pollak (Jozef Stefan Institute)
Katarina Sitar Šuštar (University of Ljubljana, School of Economics and Business)
Aleš Toman (University of Ljubljana, School of Economics and Business)
Aljoša Valentinčič (University of Ljubljana, School of Economics and Business)
Martin Žnidaršič (Jozef Stefan Institute)
