Latent Variables Multivariate Statistical Methods for Data Analytics in Industry 4.0



Part of the ENBIS-23 Valencia conference.


Alberto Ferrer and Joan Borràs

Multivariate Statistical Engineering Group (GIEM),

Dpt. of Applied Statistics, O.R. & Quality

Universitat Politècnica de València, Spain 


Modern industry is adopting the Industry 4.0 paradigm, fostered by the Industrial Internet of Things (IIoT), which connects intelligent physical entities to each other and allows complex equipment units to carry embedded sensors and special modules (agents) that link them to the monitoring center. This is leading to the so-called Big Data environment, characterized by the 5 V's: volume, variety, veracity, velocity and value (White 2016).

Although process data in industry share many of the characteristics represented by the 5 V's, they may not really be Big Data in comparison with other sectors such as social networks, sales, marketing and finance. However, the questions we are trying to answer with industrial process data are highly complex. Not only do we want to find and interpret patterns in the data and use them for predictive purposes, but we also want to extract meaningful relationships that can be used to improve and optimize a process (García-Muñoz & MacGregor 2016).

Apart from the infrastructure (e.g. data collection, warehousing and integration) needed to manage these Big Data streams, the key point is how to analyze them so as to extract information that gives organizations new insights about their products, customers and services and steers the decision-making process. This can be particularly valuable when it is critical to maintain quality and uptime, as in process monitoring applications that quickly detect and diagnose abnormal activities or predict the time-to-failure of equipment units, or when rapid new product development is critical for company survival (MacGregor 2018).

In this talk we illustrate the power of latent variable-based multivariate statistical methods for Data Analytics to analyze and visualize the extracted information in a way that is easy to interpret and useful for different purposes (e.g. process understanding, real-time process monitoring, fault detection & identification, process improvement and predictive maintenance). We will stress the use of these methods for process optimization using historical data (not necessarily from Design of Experiments) (Palací-López et al. 2019).
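As a rough illustration of the monitoring idea, the following minimal sketch fits a one-latent-variable PCA model to simulated "normal operation" data and computes the two statistics commonly used in latent variable-based monitoring: Hotelling's T² (distance within the model) and the squared prediction error, SPE (distance to the model). It is not the course software. The three simulated sensors, the function names and the use of the largest training value as a crude alarm threshold are all assumptions made for this example.

```python
import math
import random


def fit_pca_one_lv(X, n_iter=100):
    """Fit a one-component PCA model on autoscaled data (power iteration)."""
    n, k = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(k)]
    std = [math.sqrt(sum((row[j] - mean[j]) ** 2 for row in X) / (n - 1))
           for j in range(k)]
    Z = [[(row[j] - mean[j]) / std[j] for j in range(k)] for row in X]
    p = [1.0 / math.sqrt(k)] * k  # starting guess for the loading vector
    for _ in range(n_iter):
        t = [sum(z[j] * p[j] for j in range(k)) for z in Z]            # scores
        w = [sum(t[i] * Z[i][j] for i in range(n)) for j in range(k)]  # Z't
        norm = math.sqrt(sum(v * v for v in w))
        p = [v / norm for v in w]
    t = [sum(z[j] * p[j] for j in range(k)) for z in Z]
    score_var = sum(v * v for v in t) / (n - 1)
    return {"mean": mean, "std": std, "loading": p, "score_var": score_var}


def monitor(model, x):
    """Return (T2, SPE) for one new observation x."""
    z = [(xj - m) / s for xj, m, s in zip(x, model["mean"], model["std"])]
    p = model["loading"]
    t = sum(zj * pj for zj, pj in zip(z, p))                  # projection onto the LV
    t2 = t * t / model["score_var"]                           # Hotelling's T2
    spe = sum((zj - t * pj) ** 2 for zj, pj in zip(z, p))     # residual sum of squares
    return t2, spe


def make_obs(latent, noise=0.1):
    """Hypothetical process: three sensors driven by one underlying event."""
    return [10.0 + 2.0 * latent + random.gauss(0.0, noise),
            5.0 + 1.0 * latent + random.gauss(0.0, noise),
            50.0 + 3.0 * latent + random.gauss(0.0, noise)]


random.seed(0)
train = [make_obs(random.gauss(0.0, 1.0)) for _ in range(200)]
model = fit_pca_one_lv(train)

# Crude thresholds (assumption): largest value seen under normal operation.
train_stats = [monitor(model, x) for x in train]
t2_limit = max(t2 for t2, _ in train_stats)
spe_limit = max(spe for _, spe in train_stats)

# Case 1: unusually large excursion that still respects the correlation structure.
extreme = [22.0, 11.0, 68.0]
# Case 2: moderate values, but sensor 2 moves against the other two.
broken = [12.0, 4.0, 53.0]

t2_extreme, spe_extreme = monitor(model, extreme)
t2_broken, spe_broken = monitor(model, broken)

print("limits  T2=%.2f SPE=%.3f" % (t2_limit, spe_limit))
print("extreme T2=%.2f SPE=%.3f" % (t2_extreme, spe_extreme))
print("broken  T2=%.2f SPE=%.3f" % (t2_broken, spe_broken))
```

In this simulated setting the first case exceeds the T² threshold while staying close to the model, and the second case exceeds the SPE threshold while its T² remains unremarkable. In practice, control limits would be derived from reference distributions or validated on held-out data rather than taken as the training maximum.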

The pros and cons of latent variable-based models versus classical statistical models (e.g. linear regression methods) and machine learning methods (such as deep learning neural networks, support vector machines or random forests) for deriving knowledge and information from massive data will also be discussed.

All participants will get free access to an original Graphical User Interface (GUI) implemented in Python, so they will be able to apply the course contents to industrial case studies. Participants are not required to bring their own Windows or Mac computers.


An outline of the course is given below:

  • Process industry 4.0 and (Big) Data Streams
  • Latent variables (LV)-based multivariate analysis (MVA) for Industry 4.0
  • Optimization by LV model inversion
  • Machine learning vs latent variable models: who is the winner?
  • Case studies: process understanding, real time process monitoring, fault detection & identification, process improvement and predictive maintenance, process optimization.
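
To give a flavour of the "optimization by LV model inversion" item, the sketch below fits a one-latent-variable PLS model to simulated historical data and inverts it to propose process settings for a desired quality value. It is only an illustration under stated assumptions: the two process variables, the single response, the target value and the function names `predict` and `invert` are hypothetical, the data are only mean-centered, and with a single latent variable and a single response the inversion is unique. With more latent variables than responses, a whole null space of solutions exists, which this sketch does not address.

```python
import math
import random

random.seed(1)

# Hypothetical historical data: two process settings that moved together
# during normal operation, and one quality response.
X, y = [], []
for _ in range(100):
    u = random.gauss(0.0, 1.0)  # unobserved driving event
    X.append([100.0 + 5.0 * u + random.gauss(0.0, 0.5),
              20.0 + 2.0 * u + random.gauss(0.0, 0.2)])
    y.append(50.0 + 3.0 * u + random.gauss(0.0, 0.3))

n, k = len(X), len(X[0])
x_mean = [sum(row[j] for row in X) / n for j in range(k)]
y_mean = sum(y) / n
Xc = [[row[j] - x_mean[j] for j in range(k)] for row in X]  # centered X
yc = [v - y_mean for v in y]                                # centered y

# One-component PLS for a single response (NIPALS needs no iteration here).
w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(k)]  # X'y
w_norm = math.sqrt(sum(v * v for v in w))
w = [v / w_norm for v in w]                                      # weights
t = [sum(Xc[i][j] * w[j] for j in range(k)) for i in range(n)]   # scores
tt = sum(v * v for v in t)
p = [sum(t[i] * Xc[i][j] for i in range(n)) / tt for j in range(k)]  # X loadings
q = sum(t[i] * yc[i] for i in range(n)) / tt                         # y loading


def predict(x):
    """Predict the response for process settings x."""
    t_new = sum((x[j] - x_mean[j]) * w[j] for j in range(k))
    return y_mean + q * t_new


def invert(y_target):
    """Direct inversion: score that yields y_target, mapped back to X space."""
    t_des = (y_target - y_mean) / q
    return [x_mean[j] + t_des * p[j] for j in range(k)]


y_target = 56.0  # hypothetical desired quality value
x_new = invert(y_target)
print("proposed settings:", [round(v, 2) for v in x_new])
print("predicted quality:", round(predict(x_new), 6))
```

Because the proposed settings are built from the model loadings, they respect the correlation structure seen in the historical data, which is the main reason for inverting in the latent space rather than in the space of the original variables.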


  1. García-Muñoz, S., MacGregor, J.F.: Big Data. Success Stories in the Process Industries. Chemical Engineering Progress 112 (3), 36-40 (2016)

  2. MacGregor, J.F.: Empirical Models for Analyzing BIG Data. What's the Difference? 2018 Spring AIChE Conference, Orlando, FL, USA (2018)

  3. Palací-López, D., Facco, P., Barolo, M., Ferrer, A.: New tools for the design and manufacturing of new products based on Latent Variable Model Inversion. Chemometrics and Intelligent Laboratory Systems 194, 103848 (2019)

  4. White, D.: Big Data. What is it? Chemical Engineering Progress 112 (3), 32-35 (2016)

Short bio

Alberto Ferrer is Head of the Multivariate Statistical Engineering Research Group and Professor of Statistics at the Department of Applied Statistics, Operations Research and Quality at Universitat Politècnica de València (Spain). His main interests are industrial statistics, (big) data analytics, multivariate Six Sigma, medical and industrial image analysis, and statistical techniques for quality and productivity improvement, especially those related to latent variable-based multivariate statistical methods for both continuous and batch processes (chemical, bio, pharma, food, …).


Joan Borràs holds a bachelor's degree and a master's degree in chemical engineering. He is currently a final-year PhD candidate in the Statistics and Optimization Doctoral Program at Universitat Politècnica de València, working on industrial process optimization through latent variable multivariate statistical techniques.

ENBIS-23 Valencia Course. Registration
    • 9:00 AM to 1:00 PM
      Latent Variables Multivariate Statistical Methods for Data Analytics in Industry 4.0 (4 h)
      Speakers: Alberto J. Ferrer-Riquelme (Universidad Politecnica de Valencia), Joan Borràs-Ferrís (Universitat Politècnica de València)