ECAS-ENBIS Course: Text Mining: from basics to deep learning tools

Description

Part of the ENBIS-22 Trondheim conference.

This half-day course is a joint initiative of ENBIS and ECAS (http://ecas.fenstats.eu/). Since 1987, ECAS has provided courses offering training in specialized areas of statistics to researchers, university teachers, and industry professionals.

Instructors

Adrien Guille (Laboratoire ERIC, Université Lumière Lyon 2, France)

Jairo Cugliari (Laboratoire ERIC, Université Lumière Lyon 2, France)

Overview

Textual data are pervasive and can be leveraged to help solve a wide range of problems. This new source of information, coupled with recent advances in text mining, has had a clear impact on both industry and academic research. While classical approaches yield reasonable performance on diverse text mining tasks, they make restrictive assumptions that are incompatible with some properties of natural language. In the last decade, these assumptions have been partly relaxed thanks to important breakthroughs in representation learning and deep learning, improving performance on several tasks.

This short course is a first introduction to text mining aimed at a broad audience of practitioners. We’ll present the classical way of preprocessing, encoding and leveraging text data. Then, we’ll introduce recent techniques for learning more meaningful text representations and ways to exploit them using deep neural networks. We’ll stress the importance of using modern approaches to represent text through case studies drawn from actual industrial applications, such as electricity demand forecasting or sentiment analysis of online comments.

Some experience in programming with R/Python is a plus. No prior knowledge of any deep learning framework is required. Data and code will be shared to allow reproduction of the experiments.

Outline

Part 1: Representing Text

  • Sparse text representation: bag-of-words, n-grams, tf-idf weighting
  • Dense text representation: word2vec, GloVe
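
As a rough illustration of the sparse representations listed above, the following minimal sketch builds word n-grams and a tf-idf weighted bag-of-words matrix using only the Python standard library. It is not taken from the course material: the function names, the toy documents, and the unsmoothed weighting idf = log(N / df) are assumptions chosen for brevity, and text mining libraries typically use smoothed or normalized variants.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive preprocessing (an assumption): lowercase and split on whitespace.
    return text.lower().split()


def ngrams(tokens, n):
    # Contiguous word n-grams, each joined into a single string.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs):
    # Returns the sorted vocabulary and one weight vector per document.
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    # Unsmoothed inverse document frequency (one common textbook variant).
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)  # raw term counts: the bag-of-words vector
        rows.append([counts[t] * idf[t] for t in vocab])
    return vocab, rows


# Hypothetical toy corpus, for illustration only.
docs = ["power demand rises", "power demand falls", "power comments"]
vocab, weights = tfidf(docs)
bigrams = ngrams(tokenize(docs[0]), 2)
```

With this weighting, a term that occurs in every document receives a weight of zero, while terms specific to a single document receive the largest weights. Most entries of each vector are zero, which is why such representations are called sparse.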


Part 2: Solving Industrial Problems Defined on Text

  • Supervised learning with the linear model, trees and random forests
  • Supervised learning with convolutional, recurrent and Transformer-based neural networks

Short bio

Jairo Cugliari is Associate Professor of Statistics at the Laboratoire ERIC of the Université Lumière Lyon 2 in France. He received his Ph.D. in Statistics in 2011 from Paris-Sud University in Orsay, France. His research focuses on academic and industrial data science problems involving complex data, such as functions, texts, multicriteria data, or time series. He has participated in and led several projects driven by new challenges in electrical power demand forecasting. He is currently working on innovative uses of text data and heterogeneous transfer learning techniques. He is also interested in scientific popularization. He is a member of SFdS and ENBIS.

https://eric.univ-lyon2.fr/jcugliari/

Adrien Guille is Associate Professor of Computer Science at the Université Lumière Lyon 2, France. His research interests lie in the development of machine learning and data mining techniques for the analysis and modeling of text, with the aim of solving real-world problems related to language. While his most recent work deals with (linked) document embedding, he has also contributed to the fields of social media analysis and sociolinguistics. In addition to his participation in academic projects, he has also led or participated in several industrial collaborations that resulted in the creation of innovative text-based tools.

https://adrienguille.github.io/

Registration
ENBIS-22 Trondheim Course. Registration