Machine learning techniques are among the top trends in Industry 4.0. These models have been successfully applied to passive applications such as predictive modelling and maintenance, pattern recognition and classification, and process monitoring, fault detection and diagnosis. However, there is a dangerous tendency to use them indiscriminately, no matter the type of application. For...
We describe a case study for modeling manufacturing data from a chemical process. The goal of the research was to identify optimal settings for the controllable factors in the manufacturing process, such that the quality of the product was kept high while costs were minimized. We used structural equation modeling (SEM) to fit multivariate time series models that captured the complexity of the...
Batch reactors are suitable for the agile manufacturing of high-value-added products such as pharmaceuticals and specialty chemicals, as the same reactors can be used to produce different products or different grades of products. Batch chemical reaction processes are typically highly nonlinear, and batch-to-batch variations commonly exist in practice. Optimisation of batch process operation is...
We investigate the inference and design optimization of a progressively Type-I censored step-stress accelerated life test when the lifetime follows a log-location-scale family. Although simple, the popular exponential distribution lacks model flexibility due to its constant hazard rate. In practice, Weibull or lognormal distributions, which are members of the log-location-scale family,...
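For readers less familiar with this family: a standard way to see why the exponential model is restrictive is the log-location-scale representation (a textbook identity, not specific to this work):

```latex
% Log-location-scale family: \log T = \mu + \sigma\varepsilon, with \varepsilon
% following a fixed standard distribution. In the Weibull case the hazard is
\[
  h(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta - 1},
\]
% which is constant only for \beta = 1 (the exponential special case);
% \beta > 1 gives an increasing hazard and \beta < 1 a decreasing one.
```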
Some batch processes show large variability in batch-to-batch completion time, caused by process conditions and/or external factors. The local batch time is commonly inferred by process experts. However, this may lead to inaccuracies, due to the uncertainty associated with batch-to-batch variations, causing the process to run longer than is really needed. Process engineers could appeal...
We investigate the order-restricted Bayesian estimation and design optimization of a progressively Type-I censored simple step-stress accelerated life test with exponential lifetimes under both continuous and interval inspections. Using the three-parameter gamma distribution as a conditional prior, we ensure that the failure rates increase as the stress level increases. In addition, its...
Currently established maintenance regimes aim at pre-emptive activities to avoid failures during operation. Nevertheless, in many cases a significant amount of unforeseen service effort has to be spent on reactive measures, entailing considerable perturbation of the production and service processes. System supervision and analytics offer the potential to facilitate preventive maintenance....
Comparing the means of several experimental groups is an old and well-known problem in the statistical literature which arises in many application areas. In the past decades, a large body of literature about the design of experiments for treatment comparisons has flourished. However, attention has been almost exclusively devoted to estimation precision, and not to optimal testing. This...
The probability distribution of the response variable is one of the necessary assumptions in the design of an experiment, and uncertainty about it poses a challenge for practitioners. The aim of this work is to analyse four strategies for obtaining robust optimal designs in order to face this uncertainty. The strategies compared in this paper are compound criteria, multi-stage...
The maintenance process of railway tracks was for a long time purely event-driven, i.e., reactive. In the last decade, considerable research and development effort has been made to turn this into proactive work, i.e., to analyse data from the railway network as well as traffic, model position-specific stress in terms of wear, and predict the time for a required maintenance action.
One of...
One of the main criticisms of optimal experimental design theory is that optimal designs tend to require too few points, frequently very extreme ones. In most models with one variable, the number of distinct points reduces to the number of parameters to be estimated. Often, an optimal design serves as a reference against which the efficiency of designs used in practice is measured. In this...
In recent years, rail transportation in Europe has come to be regarded as a viable alternative to other means of transport, and this naturally leads to fierce competition among operators in terms of passenger satisfaction. In this regard, railway passenger thermal comfort is one of the most challenging and relevant aspects, especially for long trips. Indeed, new European standards, such...
The talk will focus on the prediction of a new unobserved functional datum given a set of observed functional data, possibly in the presence of covariates, whether scalar, categorical, or functional. In particular, we will present an approach (i) able to provide prediction regions which can be visualized in the form of bands, (ii) guaranteed to have exact coverage probability for any sample size, (iii)...
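As background to point (ii), a finite-sample coverage guarantee is the hallmark of conformal prediction. Below is a minimal split-conformal sketch for a scalar response; it is only illustrative, since the talk concerns functional data and prediction bands (the data and the polynomial predictor here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: y = f(x) + noise
x = rng.uniform(0, 1, 300); y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=300)

# Split: proper training set / calibration set
x_tr, y_tr, x_cal, y_cal = x[:150], y[:150], x[150:], y[150:]
coef = np.polyfit(x_tr, y_tr, deg=5)              # any point predictor works here
predict = lambda t: np.polyval(coef, t)

# Conformity scores on the calibration set
scores = np.abs(y_cal - predict(x_cal))
n = len(scores)
alpha = 0.1                                       # target miscoverage
q = np.sort(scores)[int(np.ceil((n + 1) * (1 - alpha))) - 1]

# Band [prediction - q, prediction + q] has coverage >= 90% for any sample size
x_new = np.linspace(0, 1, 50)
lower, upper = predict(x_new) - q, predict(x_new) + q
```

The guarantee holds regardless of which predictor is plugged in at the `np.polyfit` step.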
This project was initiated by engineers and scientists who have been exploring the exciting combination of new online data and new methods of data analysis. We will share our characterization of the data they collected and the questions they are asking. After giving a very brief description of the methods tried so far and issues that have arisen, we anticipate hearing your suggestions for...
This work deals with the problem of identifying recurrent patterns in passengers' daily access to trains and/or stations in the railway system of Lombardy. People-counter data, i.e. the number of passengers boarding and alighting from each train at each station, are analysed to identify potential issues in the railway transport system and help decision makers in planning train scheduling...
Safari Njema Project is an interdisciplinary research project aimed at understanding and optimizing the paratransit mobility system in Maputo (Mozambique) by analyzing mobile phone GPS data. In this talk, we give an introduction to the project and its context, describing what paratransit mobility is and how GPS data can help in understanding the complex mobility system in sub-Saharan urban...
According to the World Health Organization, the increase in the concentration of PM10 (particulate matter) in the air, with values greater than 50 μg/m³, is a serious problem that threatens the environmental balance. Several research projects have addressed the detection of pollution peaks in offline and online mode to keep the variation of PM10 under control. While the increase in the...
Fredholm integral equations of the first kind are the prototypical example of an ill-posed inverse problem. They model, among other things, density deconvolution and image reconstruction, and find applications in epidemiology, medical imaging, and nonlinear regression settings. However, their numerical solution remains a challenging problem. Many techniques currently available require a preliminary...
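For reference, the first-kind Fredholm equation has the standard form below (a textbook statement, not a result of this work):

```latex
% Fredholm integral equation of the first kind: recover f from observed g,
% given the kernel K (ill-posed: small perturbations of g can move f a lot).
\[
  g(y) = \int K(y, x)\, f(x)\, \mathrm{d}x .
\]
% Density deconvolution is the special case K(y, x) = \varphi(y - x),
% where \varphi is the density of the measurement error.
```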
Quite often in industrial applications described by statistical models, the problem arises that these models change at unknown times. These changes can be detected in real time during the process, or after all observations have been collected; the two settings are called online change-point detection and a posteriori change-point detection, respectively. Moreover, depending on the theoretical conditions...
Weather forecasts are often expressed as an ensemble of forecasts obtained via multiple runs of deterministic physical models. Ensemble forecasts are affected by systematic errors and biases and have to be corrected via suitable statistical techniques. In this work, we focus on the statistical correction of multivariate weather forecasts based on empirical copulas. We present the most common...
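One widely used empirical-copula technique in this area is the Schaake shuffle, sketched below (named here as a representative example; that it features among the methods presented is an assumption). It reorders marginally recalibrated samples so that each variable inherits the rank-dependence structure of the raw ensemble:

```python
import numpy as np

def schaake_shuffle(raw_ensemble, calibrated_samples):
    """Reorder calibrated samples so each variable inherits the rank
    structure (empirical copula) of the raw ensemble.
    Both arrays have shape (n_members, n_variables)."""
    out = np.empty_like(calibrated_samples)
    for j in range(raw_ensemble.shape[1]):
        ranks = raw_ensemble[:, j].argsort().argsort()   # 0-based ranks
        out[:, j] = np.sort(calibrated_samples[:, j])[ranks]
    return out

rng = np.random.default_rng(1)
raw = rng.normal(size=(20, 3))            # raw (biased) ensemble forecasts
cal = rng.normal(1.0, 0.5, size=(20, 3))  # samples from corrected marginals
shuffled = schaake_shuffle(raw, cal)      # corrected margins, raw dependence
```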
Imagine an experiment that has 5 categorical factors with 3, 4, 4, 8 and 12 levels, respectively. The combination of all of these in a full factorial experiment is 4,608 runs. Would you like to run all of those experiments? While you could if you had no restrictions on time, cost, or sanity, this is not practical (especially if you consider adding levels or factors).
...
In recent years, the possibility of improving data analysis by capturing intrinsic relations has proven to be a flourishing research direction. Graphs and higher-order structures are more and more often associated with data in order to infer qualitative knowledge, possibly independently of the data embedding representation. In this direction, topological data analysis...
Oil production rate forecasting is crucial for reservoir management and well drilling planning. We present a novel approach named Physics-based Residual Kriging, applied here to forecast the production rates, modelled as functional data, of wells operating in a mature conventional reservoir along a given drilling schedule. The presented methodology has wide applicability and it...
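A generic residual-kriging sketch conveys the core idea, with scalar (not functional) outputs and a made-up stand-in physics model; this is not the authors' Physics-based Residual Kriging implementation:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical well coordinates and observed production rates
X = np.random.default_rng(2).uniform(0, 10, size=(30, 2))
physics = lambda X: 100.0 * np.exp(-0.1 * X[:, 0])   # stand-in physics model
y = physics(X) + np.sin(X[:, 1])                      # "observed" rates

# Krige the physics-model residuals, then add them back to the prediction
resid = y - physics(X)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=2.0)).fit(X, resid)

X_new = np.array([[5.0, 3.0]])
y_hat = physics(X_new) + gp.predict(X_new)            # physics + kriged residual
```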
Reinforcement Learning (RL) is one of the three basic machine learning paradigms, alongside supervised and unsupervised learning. RL focuses on training an agent to learn an optimal policy that maximizes the cumulative reward from the environment of interest [1]. Recent developments in RL have achieved remarkable success in various process optimization and control tasks, where multiple applications...
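A minimal tabular Q-learning sketch on a toy chain environment illustrates the core update behind such agents (the environment and parameter values are hypothetical; process-control applications use far richer state and action spaces):

```python
import numpy as np

# Toy chain: states 0..4, actions 0 (left) / 1 (right); reward 1 at state 4.
n_states, n_actions, goal = 5, 2, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1        # step size, discount, exploration
rng = np.random.default_rng(3)

for episode in range(500):
    s = 0
    while s != goal:
        a = rng.integers(n_actions) if rng.random() < eps else Q[s].argmax()
        s_next = max(0, s - 1) if a == 0 else min(goal, s + 1)
        r = 1.0 if s_next == goal else 0.0
        # Q-learning update: move Q(s,a) toward the bootstrapped target
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
```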
A class of multivariate tests for the two-sample location problem with high-dimensional, low-sample-size data and complex dependence structures, which are increasingly common in industrial statistics, is described. The tests can be applied when the number of variables is much larger than the number of objects, and when the underlying population distributions are heavy-tailed or...
The continuously evolving digitalized manufacturing industry is pushing quality engineers to face new and complex challenges. Quality data formats are evolving from simple univariate or multivariate characteristics to big data streams consisting of sequences of images and videos in the visible or infrared range; manufacturing processes are moving from series production to more and more...
Industrial production processes are becoming more and more flexible, allowing the production of geometries with increasing complexity, as well as shapes with mechanical and physical characteristics that were unthinkable only a few years ago: Additive Manufacturing is a striking example. Such growing complexity requires appropriate quality control methods and, in particular, a suitable...
Condition-based maintenance is an effective way to reduce unexpected failures as well as operations and maintenance costs. This work discusses a condition-based maintenance policy with optimal inspection points under a gamma degradation process. A random-effect parameter is used to account for population heterogeneity, and its distribution is continuously updated at each inspection...
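A simulation sketch of gamma-process degradation paths with a unit-specific random effect follows (all parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.linspace(0, 10, 51)            # inspection grid
dt = np.diff(t)
shape_rate, scale = 1.5, 0.8          # gamma-process parameters (illustrative)

paths = []
for unit in range(5):
    u = rng.gamma(10.0, 0.1)          # random effect: unit-specific rate multiplier
    # Independent gamma increments: shape proportional to elapsed time,
    # guaranteeing monotone (non-decreasing) degradation paths
    inc = rng.gamma(shape_rate * u * dt, scale)
    paths.append(np.concatenate([[0.0], inc.cumsum()]))
```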
In applications, paired comparisons involving competing alternatives of product descriptions are often presented to respondents for valuation. For this situation, exact designs are considered which allow efficient estimation of main effects plus two- and three-attribute interactions when all attributes have two levels. These designs allow a significant reduction in the number of alternatives...
This workshop explores the importance of understanding variation within business simulation. The use of one-dimensional Value Stream Mapping (VSM) with a focus on average process times and average stock levels is widespread throughout business. This approach is useful but not without limitations, particularly in complex and highly interactive processes, as often seen in the process industries.
The...
Managers make decisions that may affect the survival of the organisation and the future of many employees. It would be comforting to learn that these managers can get all the help they need to base their decisions on reliable data, but I suspect that this is not the case.
Managers who have ready access to a statistician who can communicate effectively with clients may be well satisfied. ...
Bayes classifiers rest on maximising the joint conditional PDF of the feature vector given the class value. The use of copulae is the most flexible way of fitting joint distributions to data. In recent years, the problem of applying copulae in high dimensions has been approached with vine copulae. Nevertheless, application to very high dimensions, on the order of several thousand, has...
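The underlying decomposition is the following (a standard consequence of Sklar's theorem for continuous margins, which is what makes the copula the natural modelling target):

```latex
% Bayes classifier: assign x to the class maximising prior times class-
% conditional density; the latter factorises into a copula density c_k
% and the marginal densities f_{kj}:
\[
  \hat{y}(x) = \arg\max_{k}\; \pi_k\, f(x \mid k),
  \qquad
  f(x \mid k) = c_k\!\bigl(F_{k1}(x_1), \dots, F_{kd}(x_d)\bigr)
                \prod_{j=1}^{d} f_{kj}(x_j).
\]
```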
Due to the development of sensor devices and ubiquitous computing, we generate an enormous amount of data every second of every day. With access to such a gigantic amount of information, it is imperative to analyze, monitor, and interpret it correctly so that sound business decisions can be made. When it comes to security, finding anomalies is only the first step in data analysis....
Coherent forecasting techniques for count processes generate forecasts that are themselves count values. In practice, forecasting always relies on a fitted model, so the obtained forecast values are affected by estimation uncertainty. Thus, they may differ from the true forecast values that would have been obtained from the true data-generating process. We propose a...
Portfolio optimisation requires insight into the joint distribution of asset returns, in particular the association or dependence between the individual returns. Classical approaches use the covariance matrix for association modelling. However, the use of copulae is the most flexible way of fitting joint distributions to data. In recent years, the problem of applying copulae in high...
Computer-age statistics typically involves large amounts of data and the application of computer-intensive methods. In this talk we focus on bootstrapping, cross-validation, and simulation methods. We discuss their use and limitations and contrast their applications. Specifically, we show how bootstrapping used to test hypotheses differs from cross-validation used to validate predictive...
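A minimal sketch of the contrast on hypothetical data: bootstrapping resamples with replacement to assess a hypothesis about the sample itself, while cross-validation resamples without replacement to estimate out-of-sample prediction error (the trivial mean predictor below is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(0.3, 1.0, size=60)        # hypothetical sample

# Bootstrap test of H0: mean = 0 (resample the null-centred data, WITH replacement)
x0 = x - x.mean()
boot = np.array([rng.choice(x0, size=x0.size).mean() for _ in range(5000)])
p_value = (np.abs(boot) >= abs(x.mean())).mean()

# 5-fold cross-validation of a trivial predictor (resampling WITHOUT replacement):
# each fold is predicted by the mean of the remaining data
folds = np.array_split(rng.permutation(x.size), 5)
cv_mse = np.mean([np.mean((x[f] - np.delete(x, f).mean()) ** 2) for f in folds])
```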
Generalized Likelihood Ratio (GLR)-based control charts for monitoring count processes have been proposed for a variety of underlying distributions; they are known to outperform traditional control charts in effectively detecting a wide range of parameter shifts, while being relatively easy to design. In this study, generalized likelihood ratio tests for monitoring...
In a regression task, the choice of the best machine learning model is a critical step, especially when the main purpose is to offer a reliable tool for predicting future data. A poor choice could result in very poor predictive performance.
Fast-moving consumer goods companies often plan consumer tests to gather consumers’ evaluations of new products and are then interested in analysing...
Statistical Engineering is gaining interest as a rising discipline. While there is near universal agreement as to the necessity of this discipline, there is still much confusion surrounding the particulars. How should Statistical Engineering relate to Statistics? How should it relate to other similar disciplines, such as Data Science or Operations Research? The results of several weeks of...
This work consists of a collection of useful results on the topics of Design of Experiments and Machine Learning applied in the context of product innovation. In many industries the performance of the final product depends on objective indicators that can be measured and that define the quality of the product itself. Some examples are mechanical properties in metallurgy or adhesive...
Five months ago, ISEA (the International Statistical Engineering Association) organized a webinar with the title “What It Is, What It Is Not, and How It Relates to Other Disciplines”. I was invited to participate as a discussant, which forced me to think about the topic. The two excellent presentations and the ensuing discussion gave me new points of view. In this presentation, I summarize my...
Consumer satisfaction, among other feelings, towards products or services is usually captured, both in industry and academia, by means of ordinal scales, such as Likert-type scales. This kind of scale generates information intrinsically affected by uncertainty, imprecision, and vagueness for two reasons: 1) the items of a Likert scale are subjectively interpreted by respondents based on their...
Statistical modeling is, perhaps, the apex of data science activities in the process industries. These models allow analysts to gain a critical understanding of the main drivers of those processes to make key decisions and predict future outcomes. Interest in this area has led to accelerating innovation in model development itself. There is a plethora of different modeling...
Problem/Challenge: The goal of this project was to autonomously control part of a tissue mill’s continuous manufacturing process using artificial intelligence and predictive analytics to reduce raw material consumption while maintaining product quality within the specification limits. The project objective was to overcome the limits of the operator’s ability to act quickly with...
Nowadays, physical experimentation for some complex engineering and technological processes is too costly or, in certain circumstances, impossible to perform. In those cases, computer experiments are conducted, in which a computer code is run to represent the physical system under study. Specific surrogate models are used for the analysis of computer experiments, functioning as...
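A minimal surrogate-modelling sketch, assuming a Latin hypercube design, a cheap stand-in for the expensive code, and a Gaussian-process emulator (a common but not the only choice of surrogate):

```python
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Space-filling design for the computer experiment
design = qmc.LatinHypercube(d=2, seed=6).random(n=25)

# Stand-in for the expensive computer code
simulator = lambda X: np.sin(3 * X[:, 0]) * np.exp(-X[:, 1])
y = simulator(design)

# Gaussian-process emulator trained on the code runs
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(design, y)
mean, sd = gp.predict(np.array([[0.4, 0.7]]), return_std=True)
```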
In the chemical process industry (CPI), it is important to properly manage process and equipment degradation, as it can lead to substantial economic losses. Degradation dynamics are seldom included in modeling frameworks due to their complexity, time resolution, and measurement difficulty. However, tackling this problem can provide new process insights and contribute to better predictive...
This contribution is joint work between academics and a research group of a leading company in semiconductor manufacturing. The problem under investigation concerns a predictive-maintenance manufacturing system. Zonta et al. (2020) present a systematic literature review of predictive-maintenance initiatives in Industry 4.0. According to Mobley (2002), industrial and process plants...
See the PDF abstract.
Performing online monitoring for short-horizon data is a challenging though cost-effective endeavour. Self-starting methods attempt to address this issue by adopting a hybrid scheme that executes calibration and monitoring simultaneously. In this work, we propose a Bayesian alternative that utilizes prior information and possible historical data (via power priors), offering a head start in...
In this study, we combine sensitivity analysis with a machine learning approach. We perform our study on the electrical power output of a combined-cycle power plant using MLP neural networks.
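One plausible reading of this combination is sketched below on synthetic data (the plant data and the choice of permutation importance as the sensitivity measure are assumptions, not necessarily the study's method):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
# Hypothetical stand-ins for plant inputs (e.g. ambient conditions) and power output
X = rng.normal(size=(500, 4))
y = 2 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
mlp = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
mlp.fit(X_tr, y_tr)

# Sensitivity of the fitted network to each input, via permutation importance
imp = permutation_importance(mlp, X_te, y_te, n_repeats=20, random_state=0)
print(imp.importances_mean)   # larger = output more sensitive to that input
```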
Functional data create challenges: they generate a lot of measurements, sometimes with redundant information and/or high autocorrelation; the sampling frequency may not be regular; and it can be difficult to analyse the information or pattern behind the data. One very common practice is to summarize the information through some points of interest in the curves: maximum/minimum value, mean, or...
Does Artificial Intelligence (AI) have to be certified?
AI modeling continues to grow in all industries and thus has a real impact on our day-to-day lives. The explainability or interpretability of AI models is becoming more and more important, in order to understand the black box behind our algorithms.
Engineers and data scientists must understand and explain their models before sharing...
Quantum computing is a new revolutionary computing paradigm, first theorized in 1981. It is based on quantum physics and quantum mechanics, which are fundamentally stochastic in nature with inherent randomness and uncertainty. The power of quantum computing relies on three properties of a quantum bit: superposition, entanglement, and interference. Quantum algorithms are described by the...
The last decade has seen several new developments in the design of experiments. Three of these innovations, available in commercial software, are A-optimal designs, Definitive Screening Designs (DSDs), and Group Orthogonal Supersaturated Designs (GO SSDs). With all these choices the practitioner may wonder which class of designs to use to address a specific study. This presentation will...
Process and condition monitoring seem to be key to today's implementation of Industry 4.0. Both aim to keep process outcome quality high and operating costs low. The presentation will touch on some fundamental concepts of data quality, machine learning, synthetic training, forecasting technologies, statistical thinking, maintenance, etc., as well as basic process categories from...
Most previous studies of Phase II analysis in real-life applications focused on monitoring profiles assuming that the models and control limits estimated in Phase I are correct, with no model misspecification. However, these models may not perfectly fit the relationship between the response variable and the independent variable(s). Thus, this research proposes two new...
We develop Shiryaev-Roberts schemes based on signed sequential ranks to detect a persistent change in the location of a continuous symmetric distribution with known median. The in-control properties of these schemes are distribution-free; hence they do not require a parametric specification of an underlying density function or the existence of any moments. Tables of control limits are provided....
The application of statistical regression models as soft sensors in the (bio)chemical industry is becoming more and more popular with the increasing amounts of data collected. Such sensors can predict, from process variables, the critical properties of the product that determine production quality. As these variables are much quicker and easier to measure than a traditional wet-chemical analysis...
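A minimal soft-sensor sketch using partial least squares, a common choice for collinear process data (the data below are synthetic and the model family is an assumption; the work itself may use other regression models):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(8)
# Hypothetical process variables (temperatures, flows, ...) and lab quality value
X = rng.normal(size=(200, 15))
y = X[:, :3].sum(axis=1) + 0.2 * rng.normal(size=200)

soft_sensor = PLSRegression(n_components=3).fit(X, y)
print(cross_val_score(soft_sensor, X, y, cv=5, scoring="r2"))

y_hat = soft_sensor.predict(X[:5])   # fast surrogate for the wet-chemical analysis
```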
In this talk we discuss new Poisson CUSUM methods for space-time monitoring of geographical disease outbreaks. In particular, we develop likelihood ratio tests and change-point estimators for detecting changes in spatially distributed Poisson count data subject to linear drifts. The effectiveness of the proposed monitoring approach in detecting and identifying trend-type shifts is studied by...
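For reference, the basic one-sided Poisson CUSUM recursion on which such charts build is sketched below (the spatial and linear-drift extensions of the talk go beyond this sketch):

```python
import numpy as np

def poisson_cusum(counts, lam0, lam1, h):
    """One-sided CUSUM for a shift from rate lam0 to lam1 > lam0.
    Signals at the first time the statistic exceeds threshold h."""
    k = (lam1 - lam0) / np.log(lam1 / lam0)   # reference value
    c = 0.0
    for t, x in enumerate(counts, start=1):
        c = max(0.0, c + x - k)
        if c > h:
            return t                          # alarm time
    return None                               # no alarm

rng = np.random.default_rng(9)
counts = np.concatenate([rng.poisson(2.0, 50), rng.poisson(4.0, 50)])  # shift at t=51
print(poisson_cusum(counts, lam0=2.0, lam1=4.0, h=5.0))
```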
This work focuses on the effective utilisation of varying data sources in injection moulding for process improvement, through close collaboration with an industrial partner. The aim is to improve productivity in an injection moulding process consisting of more than 100 injection moulding machines. Predicting quality from machine process data has been identified as the key to...
CUSUMs based on the signed sequential ranks of observations are developed for detecting location and scale changes in symmetric distributions. The CUSUMs are distribution-free and fully self-starting: given a specified in-control median and nominal in-control average run length, no parametric specification of the underlying distribution is required in order to find the correct control limits....
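A sketch of the signed sequential ranks on which both this and the preceding Shiryaev-Roberts schemes are built (the scaling and the downstream CUSUM statistic here are illustrative assumptions; the paper's control limits are tabulated, not reproduced):

```python
import numpy as np

def signed_sequential_ranks(x, median=0.0):
    """Sign of each centred observation times the rank of its absolute
    value among all absolute values seen so far (self-starting: no
    parametric model, only the in-control median is assumed known)."""
    z = np.asarray(x) - median
    out = np.empty(z.size)
    for t in range(z.size):
        abs_so_far = np.abs(z[: t + 1])
        rank_t = abs_so_far.argsort().argsort()[t] + 1   # rank of |z_t|, 1-based
        out[t] = np.sign(z[t]) * rank_t / (t + 1)        # scaled into (-1, 1)
    return out

rng = np.random.default_rng(10)
x = np.concatenate([rng.standard_t(3, 100), rng.standard_t(3, 100) + 1.0])
s = signed_sequential_ranks(x)    # feed into a CUSUM recursion for location
```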
Chemical processes are traditionally simulated using physical computer models to capture the highly nonlinear behaviours exhibited by features such as reaction kinetics and recycle loops. Traditional statistical models have historically been poor predictors of process performance. Here, an alternative Bayesian treatment of a process modelling problem is presented, modelled on an existing...
This study presents a digital-twin-based data processing platform for predictive maintenance in an industrial context. The platform implements a data-driven solution enhanced with a model-driven approach based on a three-tier architecture. The developed platform is aligned with the Big Data Value Reference Model and the Industrial Internet Reference Architecture...
The free multilingual AMADO-online application displays and analyses data matrices (binary data, counts, responses to Likert-type items, measures of heterogeneous variables, etc.) by combining Bertin's visualisation method with Factorial Analysis (to find an approximately diagonal structure, if it exists in the data) and Hierarchical Classification (to find block-models).
AMADO-online is...
Production often has as its main purpose that products fall within a pre-specified range of variation. Thus, there is a lack of variation in the data, making it difficult to work with from a statistical viewpoint. The traditional approach to learning about the input-to-output relationship of a production process is to form a hypothesis and then tailor an experimental design yielding sufficient variation in the data to...
In the hot water tank industry, welding is present in almost all manufacturing steps. The final product quality is highly dependent on the welding quality. Evaluating the latter from welding signals has gained considerable interest in recent years due to the development of data acquisition systems and artificial intelligence methods. Welding defect detection is at the center of...
Failure analysis (FA) is key to a reliable semiconductor industry. Fault analysis, physical analysis, sample preparation, and package construction analysis are arguably the most used analysis activities for determining the root cause of a failure in semiconductor Industry 4.0. As a result, intelligent automation of this analysis decision process using artificial intelligence is the objective of...
With the development of automatic diagnostics based on statistical predictive models from supervised machine learning (ML) algorithms, new issues about model validation have been raised. For example, in the non-destructive testing field, generalized automated inspection (which will allow large gains in terms of efficiency and economy) has to provide high guarantees in terms of...