https://conferences.enbis.org/event/41/
With all the data being collected today, there is a strong focus on how to generate value from that data for various stakeholders and society as a whole. While many analytics tasks can be solved efficiently using local data only, their solutions can typically be improved substantially by using the data of others. Obvious examples would include (i) supply chains where stakeholders can highly...
We propose a novel Bayesian binary classification framework for networks with labeled nodes. Our approach is motivated by applications in brain connectome studies, where the overarching goal is to identify both regions of interest (ROIs) in the brain and connections between ROIs that influence how study subjects are classified. We develop a binary logistic regression framework with the network...
Spatially misaligned data are becoming increasingly common in fields such as epidemiology, ecology and the environment due to advances in data collection and management. Here, we present a Bayesian geostatistical model for the combination of data obtained at different spatial resolutions. The model assumes that underlying all observations, there is a spatially continuous variable that can be...
Applying Machine Learning techniques in our business requires several elements beyond Statistics and Math. The main building blocks that enable real deployment and use of Machine Learning commonly involve data and statistics, but also expert teams, technology, frameworks, tools, governance, regulation and processes, amongst others. Expert data scientists knowing the limits of the...
This article studies the willingness of the citizens of the 27 EU countries to change their travel and tourism habits and adopt more sustainable behavior. The study aims to contribute to the recent literature on the interconnections between tourism and sustainability. The data come from the Flash Eurobarometer survey 499, involving more than 25,000 European citizens. The survey...
We often think of digitalization as the application of complex machine learning algorithms to vast amounts of data. Unfortunately, this raw material is not always available, and, in particular, many traditional businesses with well-established processes accumulate a large technical debt that impedes progress towards more modern paradigms. In this talk, we review a complete case study, from...
A document of the Joint Committee for Guides in Metrology [JCGM 106:2012 - Evaluation of measurement data – The role of measurement uncertainty in conformity assessment] provides a Bayesian approach to perform conformity assessment (CA) of a scalar property of a single item (a product, material, object, etc.). It gives a methodology to calculate specific and global risks of false decisions for...
A mixture of a distribution of responses from untreated patients and a shift of that distribution is a useful model for the responses from a group of treated patients. The mixture model accounts for the fact that not all patients in the treated group will respond to the treatment; the non-responders' responses follow the same distribution as the responses from untreated patients. The treatment effect...
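As a minimal sketch of such a shift-mixture model (with generic notation, not the authors'), the density of treated responses can be written as
$$f_{\text{treated}}(x) = (1-\pi)\, f_0(x) + \pi\, f_0(x - \Delta),$$
where $f_0$ is the density of untreated responses, $\pi$ the proportion of responders, and $\Delta$ the treatment shift; the non-responder component keeps the untreated distribution, as described above.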
Extensive studies have been conducted on how to select efficient designs with respect to a criterion. Most design criteria aim to capture the overall efficiency of the design across all columns. When prior information indicates that a small number of factors and their two-factor interactions (2fi's) are likely to be more significant than other effects, commonly used minimum aberration designs...
Feature selection is one of the most relevant processes in any methodology for creating a statistical learning model. Generally, existing algorithms establish some criterion to select the most influential variables, discarding those that do not contribute any relevant information to the model. This methodology makes sense in a classical static situation where the joint distribution of the data...
The recent pandemic heightened the urgency of quick access to new drugs and vaccines for patients. Stability assessment of the product may represent a bottleneck when it is based on real-time data covering 2 or 3 years. To accelerate decisions and ultimately the time-to-market, accelerated stability studies may be used, with data obtained over 6 months. We show that the kinetic Arrhenius...
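For reference, the kinetic Arrhenius relation underlying such accelerated studies is typically of the form
$$k(T) = A \exp\!\left(-\frac{E_a}{R\,T}\right),$$
where $k(T)$ is the degradation rate at absolute temperature $T$, $A$ a pre-exponential constant, $E_a$ the activation energy and $R$ the gas constant; short-term data at elevated temperatures are extrapolated to the intended storage temperature. The symbols here are the standard ones, not necessarily those used by the authors.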
We argue against the use of generally weighted moving average (GWMA) control charts. Our primary reasons are the following: 1) There is no recursive formula for the GWMA control chart statistic, so all previous data must be stored and used in the calculation of each chart statistic. 2) The Markovian property does not apply to the GWMA statistics, so computer simulation must be used to...
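To illustrate point 1), the sketch below (in Python, using one common parameterization of the GWMA weights rather than any code from the talk) contrasts the one-pass EWMA recursion with the GWMA statistic, which must re-weight the full stored history at every time point:

```python
import numpy as np

def ewma_stats(x, lam, mu0=0.0):
    """EWMA: one-pass recursion, O(1) memory per update."""
    z = mu0
    out = []
    for xt in x:
        z = lam * xt + (1 - lam) * z   # recursive update
        out.append(z)
    return np.array(out)

def gwma_stats(x, q, alpha, mu0=0.0):
    """GWMA: each statistic re-weights the *entire* history, so all
    past observations must be stored (no recursive shortcut)."""
    x = np.asarray(x)
    out = []
    for t in range(1, len(x) + 1):
        j = np.arange(1, t + 1)
        w = q ** ((j - 1) ** alpha) - q ** (j ** alpha)  # weights on x_t, x_{t-1}, ...
        out.append(np.dot(w, x[t - 1::-1]) + q ** (t ** alpha) * mu0)
    return np.array(out)

rng = np.random.default_rng(0)
x = rng.normal(size=20)
# With alpha = 1 and q = 1 - lambda, the GWMA weights collapse to the EWMA weights.
print(np.allclose(ewma_stats(x, lam=0.9), gwma_stats(x, q=0.1, alpha=1.0)))
```

The final check confirms that the GWMA reduces to the EWMA in the special case $\alpha = 1$, which is why the extra storage and simulation burden described above buys relatively little.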
The main contributions of the work (joint with P. Semeraro, Politecnico di Torino) are algorithms to sample from multivariate Bernoulli distributions and to determine the distributions and bounds of a wide class of indices and measures of probability mass functions. Probability mass functions of exchangeable Bernoulli distributions are points in a convex polytope, and we provide an analytical...
Design of Experiments (DOE) is a powerful tool for optimizing industrial processes with a long history and impressive track record. However, despite its success in many industries, most businesses in Denmark still do not use DOE in any form due to a lack of statistical training, preference for intuitive experimentation, and misconceptions about its effectiveness.
To address this issue, the...
Data Science has emerged to deal with the so-called (big) data tsunami. This has led to the Big Data environment, characterized by the four Vs: volume, variety, velocity, and veracity. We live in a new era of digitalization where there is a belief that due to the amount and speed of data production, new technologies coming from artificial intelligence could now solve important scientific and...
Measurement uncertainty is a key quality parameter to express the reliability of measurements. It is the basis for measurements that are trustworthy and traceable to the SI. In addition to scientific research, guidance documents and examples on how to evaluate the uncertainty for measurements, training is an important cornerstone to convey an understanding of uncertainty.
In Europe courses...
This research discusses the effects of large round-off errors on the performance of control charts for means when a process is normally distributed with a known variance and a fixed sample size. Quality control in practice uses control charts for means as a process monitoring tool, even when the data is significantly rounded. The objective of this research is to demonstrate how ignoring the...
In the era of big data, several sampling approaches have been proposed to reduce costs (and time) and to help in informed decision making. Most of these proposals require the specification of a model for the big data. This model assumption, as well as the possible presence of outliers in the big dataset, represents a limitation for the most commonly applied subsampling criteria.
The task...
Regression models have become increasingly important in a range of scientific fields, but accurate parameter estimation is crucial for their use. One issue that has recently emerged in this area is the estimation of parameters in linear or generalized linear models when additional information about the parameters limits their possible values...
We present a summary of recently developed methods for the Statistical Process Control of 3-dimensional data acquired by a non-contact sensor in the form of a mesh. The methods have the property of not requiring ambient coordinate information, and use only the intrinsic coordinates of the points on the meshes, hence not needing the preliminary registration or alignment of the parts. Intrinsic...
Digital twins (DTs) are simulation models that replicate physical systems in a virtual environment, dynamically updating the virtual model according to the observed state of its real counterpart to achieve physical control of the latter. DTs consist of a Physical to Virtual (P2V) and a Virtual to Physical (V2P) connection. DTs require complex modelling, often resorting to data-driven approaches. DTs allow for...
Numerical models have become essential tools to study complex physical systems. The accuracy and robustness of their predictions is generally affected by different sources of uncertainty (numerical, epistemic). In this work, we deal with parameter uncertainty of multiphysics simulation consisting of several numerical models from different physics which are coupled with one another. Our...
A recent study based on data from Microsoft reports that 76–95% of all failed components in data centres are hard disk drives (HDDs). HDDs are the main reason behind server failures. Consequently, the ability to predict HDD failures is a major objective of HDD manufacturers, since avoiding unexpected failures may prevent data loss, improve service reliability, and reduce data...
This paper explores the problem of estimating the contour location of a computationally expensive function using active learning. Active learning has emerged as an efficient solution for exploring the parameter space when minimizing the training set is necessary due to costly simulations or experiments.
The active learning approach involves selecting the next evaluation point sequentially to...
Thermal management is a key issue for the miniaturization of electronic devices due to overheating and local hot spots. To anticipate these failures, manufacturers require knowledge of the thermal properties of the used materials at the nanoscale (defined as the length range from 1 nm to 100 nm), which is a challenging issue because thermal properties of materials at nanoscale can be...
Most experimental design methodology focuses on parameter precision, where the model structure is assumed known and fixed. But arguably, finding the correct model structure is the part of the modelling process that takes the most effort.
Experimental design methodology for model discrimination usually focuses on discriminating between two or more known model structures. But often part of...
The Poisson log-normal (PLN) model is a generic model for the joint distribution of count data, accounting for covariates. It is also an incomplete data model. A classical way to achieve maximum likelihood inference for model parameters $\theta$ is to resort to the EM algorithm, which aims at maximizing, with respect to $\theta$, the conditional expectation, given the observed data $Y$, of the...
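In standard EM notation (not necessarily that of the authors), the quantity maximized at each iteration is
$$Q\!\left(\theta \mid \theta^{(h)}\right) = \mathbb{E}\!\left[\log p_\theta(Y, Z) \,\middle|\, Y;\, \theta^{(h)}\right],$$
where $Z$ denotes the latent Gaussian layer of the PLN model: the E-step evaluates this conditional expectation given the observed counts $Y$, and the M-step maximizes it over $\theta$.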
Statistical process control (SPC), as part of quality control, makes it possible to monitor the quality levels of products and services, detect possible anomalies, their assignable causes and, consequently, facilitate their continuous improvement. This work will present the application of various SPC tools for the control of processes such as transit through the Expanded Panama Canal or the...
MTR, the major Hong Kong public transport provider, has been operating for 40 years with more than 1000 escalators in the railway network. These escalators are installed in various railway stations and differ in age, vertical rise and workload. An escalator's refurbishment is usually linked with its design life as recommended by the manufacturer. However, the actual useful life of an...
Spectroscopy and chromatography data - from methods such as FTIR, NMR, mass spectroscopy, and HPLC - are ubiquitous in chemical, pharmaceutical, biotech and other process industries. Until now, scientists didn't have good ways to use this data as part of designed experiments or machine learning applications. They were required to ‘extract features’ such as the mean, peak height, or a threshold...
Cloud computing has transformed the way businesses handle their data and extract insights from it. In the geospatial domain, the main cloud platforms such as BigQuery, AWS, Snowflake, and Databricks have recently introduced significant developments that allow users to work with geospatial data. Additionally, CARTO is developing a Spatial Extension - a set of products and functionalities built...
When computer codes are used for modeling complex physical systems, their unknown parameters are tuned by calibration techniques. A discrepancy function is added to the computer code in order to capture its discrepancy with the real physical process. This discrepancy is usually modeled by a Gaussian process. In this work, we investigate a Bayesian model selection technique to validate the...
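A typical formulation of such a calibration model with a GP discrepancy, in generic notation, is
$$y(x) = f(x, \theta) + \delta(x) + \varepsilon, \qquad \delta(\cdot) \sim \mathcal{GP}\!\left(0, k(\cdot, \cdot)\right), \quad \varepsilon \sim \mathcal{N}(0, \sigma^2),$$
where $f(x,\theta)$ is the computer code with calibration parameters $\theta$, $\delta$ the model discrepancy and $\varepsilon$ observation noise; the Bayesian model selection mentioned above concerns the need for, and form of, $\delta$.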
Data science is getting closer and closer to the core of business. Statistical analysis is no longer a task confined to data analysts and ending in a results report for decision making. On the one hand, as Data Visualization and Machine Learning models spread throughout all business areas, something more than static reports is needed. The deployment of Data Science products to be...
The advantages of being able to define precisely meaningful multivariate raw material specifications are enormous. They allow increasing the number of potential suppliers, by allowing a wider range of raw material properties, without compromising the Critical Quality Attributes (CQAs) of the final product. Despite their importance, specifications are usually defined in an arbitrary way based...
We explore several statistical learning methods to predict individual electrical load curves using customers’ billing information. We predict the load curves by searching in a catalog of available load curves. We develop three different strategies to achieve our purpose. The first methodology relies on estimating the regression function between the load curves and the predictors (customers’...
Over the last 30 years processing industries such as refining, chemicals and life sciences have been using data driven models to achieve economic and environmental goals through optimization. Some of these applications include advanced process control, real-time optimization and univariate statistical process monitoring. Although these methods are successful for many applications, there are...
The family life cycle is a theoretical model that describes the different stages that a family normally goes through during its life. These stages are associated with changes in the family nucleus composition and with the relations between members. From a banking point of view, it is important to note that the financial needs of the family will also change throughout its life. Therefore, the...
Within the framework of the Mixed Research Center (CEMI) between the company Navantia and the University of A Coruña, one of the research lines consists of using statistical methods for dimensional control of panel production. This paper will present some advances in the use of set estimation for detecting singular elements in panels and determining their geometric characteristics (angles...
The emergence of green targets is driving manufacturing to minimize environmental impact, optimize resource utilization, reduce waste, and achieve zero-net industries. On the other side, the emergence of Industry 4.0 and advancements in process technologies have led to the availability of complex and massive data sets in various industrial settings. This has sparked a new renaissance in...
The exponential integration of technologies in different disciplines, the ease of access to data, the proliferation of publications on the Internet, etc., cause an increase in the number of new beliefs that try to explain the origin of the differences between behaviors with pseudoscientific discourses based on data. People are not using Statistics well.
Statistical professionals can do...
Internet of Things sensors placed in the environment may be subject to a nested structure caused by local data relay devices. We present an algorithm for D-optimal experiment design of the sensor placement under these circumstances. This algorithm is an adaptation of an existing exchange algorithm sometimes called the Fedorov algorithm. The Fedorov exchange algorithm has been shown in the...
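As a rough illustration of the exchange idea only (a simplified greedy point exchange in Python, not the authors' adapted algorithm, and without the nested-relay structure), the sketch below swaps design points with candidate points whenever the swap increases the determinant of the information matrix:

```python
import numpy as np

def exchange_d_optimal(candidates, n_runs, n_iters=50, seed=0):
    """Greedy point-exchange search for a D-optimal n_runs-point design.
    `candidates` is an (N, p) model matrix of allowable design points."""
    rng = np.random.default_rng(seed)
    idx = list(rng.choice(len(candidates), size=n_runs, replace=False))

    def logdet(ix):
        X = candidates[ix]
        sign, val = np.linalg.slogdet(X.T @ X)
        return val if sign > 0 else -np.inf

    best = logdet(idx)
    for _ in range(n_iters):
        improved = False
        for i in range(n_runs):                 # point currently in the design
            for c in range(len(candidates)):    # candidate to swap in
                trial = idx.copy()
                trial[i] = c
                val = logdet(trial)
                if val > best + 1e-10:
                    idx, best, improved = trial, val, True
        if not improved:
            break
    return candidates[idx], best

# Example: 6-run design for an intercept plus two main effects on a 3x3 grid
grid = np.array([(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1)], float)
Xc = np.column_stack([np.ones(len(grid)), grid])     # model matrix [1, x1, x2]
design, ld = exchange_d_optimal(Xc, n_runs=6)
print(design[:, 1:], ld)
```

The classical Fedorov algorithm uses variance-function updates rather than recomputing the determinant from scratch, but the exchange logic it relies on is the one sketched here.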
Environmental consciousness is a complex construct that involves multiple dimensions related to pro-environmental attitudes, beliefs and behaviours. Academic literature has attempted, over the last 20 years, to conceptualize and operationalize environmental consciousness, thus leading to a wide variety of measures. However, the available measures are country-specific and with a predominant...
The large volume of complex data being continuously generated in Industry 4.0 environments, usually coupled with significant restrictions on experimentation in production, tends to hamper the application of the classical Six Sigma methodology for continuous improvement, for which most statistical tools are based on least squares techniques. Multivariate Six Sigma [1], on the other hand,...
Industrial systems are in general subject to deterioration, ultimately leading to failure, and therefore require maintenance. Due to increasing possibilities to monitor, store, and analyze conditions, condition-based maintenance policies are gaining popularity. We consider optimization of imperfect condition-based maintenance for a single unit that deteriorates according to a discrete-time...
Reinforcement learning is a variant on optimization, formulated as a Markov Decision Problem, and is seen as a branch of machine learning. CQM, a consultancy company, has decades of experience in Operations Research in logistics and supply chain projects. CQM performed a study in which reinforcement learning was applied to a logistics case on tank containers. Because of imbalanced flows, these...
We present a state-space model in which failure counts of items produced from the same batch are correlated, so as to be able to characterize the pattern of occurrence of failures of new batches at an early stage, based on those of older batches. The baseline failure rates of consecutive batches are related by a random-walk-type equation, and failures follow a Poisson distribution. The failure...
In recent years, several researchers have published catalogs of experimental plans. First, there are several catalogs of orthogonal arrays, which allow experimenting with two-level factors as well as multi-level factors. The catalogs of orthogonal arrays with two-level factors include alternatives to the well-known Plackett-Burman designs. Second, recently, a catalog of orthogonal minimally...
The traditional Six Sigma statistical toolkit, mainly composed of classical statistical techniques (e.g., scatter plots, correlation coefficients, hypothesis testing, and linear regression models from experimental designs), is seriously handicapped for problem solving in the Industry 4.0 era. The incorporation of latent variable-based multivariate statistical techniques such as Principal...
While previous studies have shown the potential value of predictive modelling for emergency care, few models have been practically implemented for producing near real-time predictions across various demand, utilisation and performance metrics. In this study, 33 independent Random Forest (RF) algorithms were developed to forecast 11 urgent care metrics over a 24-hour period across three...
We address the task of predicting the amount of energy produced over the total duration of a wind-farm project, typically spanning several decades. This is a crucial step to assess the project's return rate and convince potential investors.
To perform such an assessment, onsite mast measures at different heights often provide accurate data over a few years, together with so-called satellite...
In this talk we introduce a multivariate image analysis (MIA)-based quality monitoring system for detecting batches of a fresh vegetable product (Iceberg-type lettuce) that do not meet the established quality requirements. This tool was developed in the Control stage of the DMAIC cycle of a Multivariate Six Sigma project undertaken in a company in the agri-food sector.
An...
Design Risk Analysis is often equated with performing a Design Failure Mode and Effects Analysis (DFMEA). In a DFMEA, a structure is defined where the customer technical requirements are mapped to functions, and the functions are mapped to failure modes that contain a cause and effect description. These are ranked and managed in a qualitative way.
The challenge in a Design Risk Analysis work...
In this talk, we propose to discuss the Smarter Mobility Data Challenge organised by the AI Manifesto, a French business network promoting AI in industry, and TAILOR, a European project aiming to provide the scientific foundations for trustworthy AI. The challenge required participants to test statistical and machine learning prediction models to predict the statuses of a set of electric...
In this study, we propose to use the Local Linear Forest (R. Friedberg et al., 2020) to forecast the best equipment condition from complex and high-dimensional semiconductor production data. In a static context, the analysis performed on real production data shows that Local Linear Forests outperform the traditional Random Forest model and 3 other benchmarks. Each model is finally integrated...
For the two three-way ANOVA models $A \times BB \times CC$ and $(A \succ BB) \times CC$ (doubled letters indicate random factors), an exact $F$-test does not exist for testing the hypothesis that the fixed factor $A$ has no effect. Approximate $F$-tests can be obtained by Satterthwaite's approximation. The approximate $F$-test involves mean squares that must be simulated. To approximate the power of...
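For reference, Satterthwaite's approximation assigns to a linear combination of mean squares $\sum_i a_i\,\mathrm{MS}_i$ the effective degrees of freedom
$$\hat{\nu} = \frac{\left(\sum_i a_i\, \mathrm{MS}_i\right)^2}{\sum_i \frac{\left(a_i\, \mathrm{MS}_i\right)^2}{\nu_i}},$$
where $\nu_i$ are the degrees of freedom of the individual mean squares; the approximate $F$-statistic is then a ratio of two such combinations.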
This is a session in which we discuss two or three open problems proposed by conference participants. As usual, Chris Ritter will lead this session. He is looking for fresh cases for this session:
CALL FOR VOLUNTEERS
We need volunteers who have current open problems and would like to present them at this session.
You will present for about 5-7 minutes to describe the context and the...
In paper & paperboard making, sampling of product properties can only be made at the end of each jumbo reel, which occurs 1-2 times per hour. Product properties can vary significantly faster and do so in both the machine and cross-machine directions. The low sampling rate may have significant consequences, such as rejecting an entire jumbo reel, weighing about 25 tons, by classifying it as...
The use of data for supporting inductive reasoning, operational management, and process improvement, has been a driver for progress in modern industry. Many success stories have been shared on the successful application of data-driven methods to address different open challenges, across different industrial sectors. The recent advances in AI/ML technology in the fields of image & video...
Following on from the Kansei Engineering (KE) special session at ENBIS 2019, we now present new work in this niche area dealing with design, service and product development.
The role of affective engineering in product development.
Affective aspects in products are increasingly important for usability, application and user purchase decision-making. Therefore, considering these aspects is...
Modern digital instruments and software options are standard in various areas, including aviation. Today, the pilot is shown a number of physical parameters of the flight, the state of the propulsion or the aircraft's systems. These instruments also automatically save the recorded data.
Analysis of collected data allows simultaneous surveillance of several aircraft turboprop...
Liquefied natural gas (LNG) is a promising fuel. However, a major component of LNG is Methane, which is a greenhouse gas. Shell aims to reduce methane emissions intensity below 0.2% by 2025.
Existing leak detection techniques have limitations, such as limited coverage area or high cost. We explore a data-science-driven framework using existing process sensor data to localize and estimate leak...
In the context of sensitivity analysis, the main objective is to assess the influence of various input variables on a given output of interest, and if possible to rank the influential inputs according to their relative importance. In many industrial applications, it can occur that the input variables present a certain type of hierarchical dependence structure. For instance, depending on some...
The benefit of predictive maintenance (PdM) as an enterprise strategy for scheduling repairs compared to other maintenance strategies relies heavily on the optimal use of resources, especially for SMEs: Expertise in the production process, Machine Learning Know-How, Data Quality and Sufficiency, and User Acceptance of the AI-Models have been shown to be significant factors in the profit...
We introduce a novel framework for explainable AI time series forecasting based on a local surrogate base model. An explainable forecast, at a given reference point in time, is delivered by comparing the change in the base model fitting before and after the application of the AI-model correction. The notion of explainability used here is local both in the sense of the feature space and the...
In parametric non-linear profile modeling, it is crucial to map the impact of the model parameters to a single metric. According to the profile monitoring literature, using the multivariate $T^2$ statistic to monitor the stability of the parameters simultaneously is a common approach. However, this approach only focuses on the estimated parameters of the non-linear model and treats them as separate...
Novel production paradigms like metal additive manufacturing (AM) have opened many innovative opportunities to enhance and customize product performances in a wide range of industrial applications. In this framework, high-value-added products are more and more characterized by novel physical, mechanical and geometrical properties. Innovative material performances can be enabled by tuning...
Autocorrelated sequences of individual observations arise in many modern-day statistical process monitoring (SPM) applications. Often, interest involves jointly monitoring both process location and scale. To jointly monitor autocorrelated individuals data, it is common to first fit a time series model to the in-control process and subsequently use this model to de-correlate the...
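A minimal Python sketch of the fit-then-monitor-the-residuals approach described above, using an AR(1) model and a plain 3-sigma individuals chart as a stand-in for whatever joint location/scale scheme is actually studied (all settings are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
# Phase I: simulate an in-control AR(1) process (phi = 0.6) of individual observations
n = 300
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + rng.normal()

# Fit the time-series model (here AR(1), with phi estimated by the lag-1 autocorrelation) ...
phi_hat = np.corrcoef(x[:-1], x[1:])[0, 1]
# ... and de-correlate: the one-step-ahead residuals are approximately i.i.d.
resid = x[1:] - phi_hat * x[:-1]

# Monitor the residuals, e.g. with a simple 3-sigma individuals chart
center, sigma = resid.mean(), resid.std(ddof=1)
ucl, lcl = center + 3 * sigma, center - 3 * sigma
signals = np.flatnonzero((resid > ucl) | (resid < lcl))
print(f"phi_hat = {phi_hat:.2f}, {signals.size} Phase-I signals out of {resid.size}")
```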
In today's society, machine learning (ML) algorithms have become fundamental tools that have evolved along with society itself in terms of their level of complexity. The application areas of ML cover all information technologies, many of them being directly related to problems with a high impact on human lives. As a result of these examples, where the effect of an algorithm has implications...
The modelling and projecting of disease incidence and mortality rates is a problem of fundamental importance in epidemiology and population studies generally, and for the insurance and pensions industry in particular. Human mortality has improved substantially over the last century, but this manifest benefit has brought with it additional stress in support systems for the elderly, such as...
We are living in the big data era. The amount of data created is enormous and we are still planning to generate even more data. We should stop and ask ourselves: Are we extracting all the information from the available data? Which data do we really need? The next frontier of climate modelling is not in producing more data, but in producing more information. The objective of this talk is to...
Point cloud data are widely used in manufacturing applications for process inspection, modeling, monitoring and optimization. State-of-the-art tensor regression techniques have effectively been used for the analysis of structured point cloud data, where the measurements on a uniform grid can be formed into a tensor. However, these techniques are not capable of handling unstructured point cloud...
This improvement project was conducted in a warehouse that provides repair services and storage for the equipment, supplies/consumables, and repair parts needed to perform technical cleaning and hygiene services for clients such as in schools, hospitals, airports, etc. While initially organizing materials one section/area at a time using 5S (sort, set-in-order, shine, standardize, and...
Studies have identified a connection between the microtexture regions (MTRs) found in certain titanium alloys and early onset creep fatigue failure of rotating turbomachinery. Microtexture regions are defined by their size and orientation, which can be characterized via scanning electron microscopy (SEM) Electron Backscatter Diffraction (EBSD). However, doing so is impractical at the...
Ordinary citizens rarely think about protecting underground utilities, until a water main has burst or internet service is interrupted by an excavation project. The project might be as small as a fence installation or as large as burying fiber optic cable along large sections of major highways. Many states and countries have a central service provider that distributes notices to utility...
We have created a wildfire-probability estimating system, based on publicly available data (historic wildfires, satellite images, weather data, maps). The mathematical model is rather simple: kriging, logistic regression and the bootstrap are its main tools, but the computational complexity is substantial, and the data analysis is challenging.
It has a wide range of applications. Here we...
My main goals, as the Director of the Center for Governance and the Economy in the Israel Democracy Institute is to initiate, lead and manage applied research and to professionally analyze the key developments within Israeli economy, society and labor market. I work towards achieving these goals on several tracks:
- Recruiting a team of talented professionals, experts in data analysis....
Recently, online-controlled experiments (i.e., A/B tests) have become an extremely valuable tool used by internet and technology companies for purposes of advertising, product development, product improvement, customer acquisition, and customer retention to name a few. The data-driven decisions that result from these experiments have traditionally been informed by null hypothesis significance...
Over 782,000 individuals in the U.S. have end-stage kidney disease with about 72% of patients on dialysis, a life-sustaining treatment. Dialysis patients experience high mortality and frequent hospitalizations, at about twice per year. These poor outcomes are exacerbated at key time periods, such as the fragile period after the transition to dialysis. In order to study the time-varying effects...
Many approaches for solving problems in business and industry are based on analytics and statistical modelling. Analytical problem solving is driven by the modelling of relationships between dependent (Y) and independent (X) variables, and we discuss three frameworks for modelling such relationships: cause-and-effect modelling, popular in applied statistics and beyond, correlational predictive...
In the previous century, statisticians played the most central role in the field of data analysis, which was primarily focused on analyzing structured data, often stored in relational databases. Statistical techniques were commonly employed to extract insights from these data. The last few decades have marked a substantial change in the way data are generated, used and analyzed. The term data...
Two data-driven approaches for interpreting turbulent-flow states are discussed. On the one hand, multidimensional scaling and K-medoids clustering are applied to subdivide a flow domain in smaller regions and learn from the data the dynamics of the transition process. The proposed method is applied to a direct numerical simulation dataset of an incompressible boundary layer flow developing on...
In this work, a practical reliability analysis and engine health prognostic study is performed using a Functional Data Analysis (FDA) approach. Multi-sensor data collected from aircraft engines are processed in order to solve one of the most important reliability analysis problems, which is estimating the health condition and the Remaining Useful Life (RUL) of an aircraft engine. Time-variant...
Hybrid modeling is a class of methods that combines physics-based and data-driven models to achieve improved prediction performance, robustness, and explainability. It has attracted a significant amount of research and interest due to the increasing data availability and more powerful analytics and statistical methodologies (von Stosch et al., 2014; Sansana et al., 2021). In the context of the...
Joint modelling is a modern statistical method that has the potential to reduce biases and uncertainties due to informative participant follow-up in longitudinal studies. Although longitudinal study designs are widely used in medical research, they are often analysed by simple statistical methods, which do not fully exploit the information in the resulting data. In observational studies,...
The International Statistical Engineering Association on its webpage states, “Our discipline provides guidance to develop appropriate strategies to produce sustainable solutions.” Clearly, strategy should be an essential foundation for the proper implementation of statistical engineering. Yet, virtually all of the materials on the website are more tactical than strategic. This talk explores the...
Our work proposes a variance-based measure of importance for coherent systems with dependent and heterogeneous components. The particular cases of independent components and homogeneous components are also considered. We model the dependence structure among the components by the concept of copula. The proposed measure allows us to provide the best estimation of the system lifetime, in terms of...
Studies in life course epidemiology involve different outcomes and exposures being collected on individuals who are followed over time. These include longitudinally measured responses and the time until an event of interest occurs. These outcomes are usually separately analysed, although studying their association while including key exposures may be interesting. It is desirable to employ...
The latent variable framework is the base for the most widespread methods for monitoring large-scale industrial processes. Their prevalence arises from the robustness and stability of their algorithms and a well-established and mature body of knowledge. A critical aspect of these methods lies in the modeling of the dynamics of the system, which can be incorporated in two distinct ways:...
In this talk we share our experience introducing Statistical Engineering as a new discipline in Brazil. We provide an overview of the actions taken and the challenges we face. Our efforts have been mentored by Professor Geoff Vining, an enthusiastic leader in promoting the emerging subject of Statistical Engineering. The initiative is led by the Federal University of Rio Grande do Norte...
In the real world, a product or a system usually loses its function gradually with a degradation process rather than fails abruptly. To meet the demand of safety, productivity, and economy, it is essential to monitor the actual degradation process and predict imminent degradation trends. A degradation process can be affected by many different factors.
Degradation modelling typically involves...
The family of orthogonal minimally aliased response surface designs or OMARS designs bridges the gap between the small definitive screening designs and classical response surface designs. The initial OMARS designs involve three levels per factor and allow large numbers of quantitative factors to be studied efficiently. Many of the OMARS designs possess good projection properties and offer...
Run-to-Run (R2R) control has been used for decades to control wafer quality in semiconductor manufacturing, especially in critical processes. By adjusting controllable variables from one run to another, quality can be kept at desired levels even as the process conditions gradually change, such as equipment degradation. The conventional R2R control scheme calculates the adjustment value for the...
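A minimal sketch of a conventional single-input R2R scheme (an EWMA disturbance estimator with an assumed linear process gain b and an artificial drift; all values are hypothetical and chosen only for illustration, not taken from the talk):

```python
import numpy as np

def r2r_ewma_control(target, b, lam, n_runs, seed=0):
    """Single-input EWMA run-to-run controller.
    Assumed process: y_t = a_t + b * u_t + noise, with the offset a_t drifting
    (e.g., equipment degradation). The controller tracks a_t with an EWMA
    and sets the next recipe u_{t+1} = (target - a_hat) / b."""
    rng = np.random.default_rng(seed)
    a_hat, a_true = 0.0, 0.0
    u, ys = (target - a_hat) / b, []
    for t in range(n_runs):
        a_true += 0.05                      # slow drift in the process offset
        y = a_true + b * u + rng.normal(scale=0.1)
        ys.append(y)
        a_hat = lam * (y - b * u) + (1 - lam) * a_hat   # EWMA disturbance estimate
        u = (target - a_hat) / b            # adjustment for the next run
    return np.array(ys)

y = r2r_ewma_control(target=10.0, b=2.0, lam=0.3, n_runs=50)
print(f"mean of last 20 runs: {y[-20:].mean():.2f} (target 10.0)")
```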
The use of paper helicopters is very common when teaching Six Sigma and in particular DoE (Design of Experiments). During the conference in Turkey, I used the paper helicopter demonstration to spur discussion. Now is the time to revisit this topic and rejuvenate interest.
During this session I will demonstrate how Statistical Process Control (SPC), DoE (Plackett and Burman 8 runs) and...
AI is the key to optimizing the customer experience. But without explicit industry knowledge, empathy, knowledge of currents, values and cultural characteristics of the audience, the cultivation, and expansion of customer relationships falls short of expectations. AI and the segmentation and forecasting possibilities that come with it quickly become a blunt sword. Only in combination with...
Nowadays, die stacking is gaining a lot of attention in the semiconductor industry. Within this assembly technique, two or more dies are vertically stacked and bonded in a single package. Compared to single-die packages, this leads to many benefits, including more efficient use of space, faster signal propagation, reduced power consumption, etc.
Delamination, i.e., the separation of two...
In pharmaceutical manufacturing, the analytical method used to measure the responses of interest is often changed during the lifetime of a product due to a new laboratory being included, new equipment, or a different source of starting material. To evaluate the impact of such a change, a method comparability assessment is needed. Method comparability is traditionally evaluated by comparing summary measures such...
Industry 4.0 opens up a new dimension of potential improvement in productivity, flexibility and control in bioprocessing, with the end goal of creating smart manufacturing plants with a wide web of interconnected devices. Bioprocessing involves living organisms or their components to manufacture a variety of different products and deliver therapies and this organic nature amplifies the...
Self-Validating Ensemble Modeling (S-VEM) is an exciting new approach that combines machine learning model ensembling methods with Design of Experiments (DOE) and has many applications in manufacturing and chemical processes. In most applications, practitioners avoid machine learning methods with designed experiments because often one cannot afford to hold out runs for a validation set without...
Broadly speaking, Bayesian optimisation methods for a single objective function (without constraints) proceed by (i) assuming a prior for the unknown function f (ii) selecting new points x at which to evaluate f according to some infill criterion that maximises an acquisition function; and (iii) updating an estimate of the function optimum, and its location, using the updated posterior for f....
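A compact Python sketch of steps (i)-(iii) for a toy one-dimensional objective, using a GP surrogate from scikit-learn and expected improvement as one common choice of acquisition function (the toy function and all settings are illustrative, not from the talk):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def f(x):                       # toy "expensive" objective (to be minimized)
    return np.sin(3 * x) + 0.5 * x**2

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(4, 1))          # (i) small initial design
y = f(X).ravel()
grid = np.linspace(-2, 2, 400).reshape(-1, 1)

for _ in range(15):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    mu, sd = gp.predict(grid, return_std=True)
    best = y.min()
    imp = best - mu                           # improvement over incumbent (minimization)
    z = imp / np.maximum(sd, 1e-9)
    ei = imp * norm.cdf(z) + sd * norm.pdf(z) # (ii) expected-improvement acquisition
    x_next = grid[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next))               # evaluate f at the new point

print(f"(iii) estimated minimum: f({X[np.argmin(y)][0]:.3f}) = {y.min():.3f}")
```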
Design space construction is a key step in the Quality by Design paradigm in manufacturing process development. Construction typically follows the development of a response surface model (RSM) that relates different process parameters with various product quality attributes and serves the purpose of finding the set of process conditions where acceptance criteria of the objectives are met with...
One of the last major steps in the development of complex technical systems is reliability growth (RG) testing. According to [1], RG is defined as […] improvement of the reliability of an item with time, through successful correction of design or product weaknesses. This means that besides testing, a qualified monitoring and inspection as well as an effective corrective action mechanism is...
Omics data, derived from high-throughput technologies, is crucial in research, driving biomarker discovery, drug development, precision medicine, and systems biology. Its size and complexity require advanced computational techniques for analysis. Omics significantly contributes to our understanding of biological systems.
This project aims to construct models for Human Embryonic Kidney cells...
We present a novel deep neural network-based approach for the parameter estimation of the fractional Ornstein-Uhlenbeck (fOU) process. The accurate estimation of the parameters is of paramount importance in various scientific fields, including finance, physics, and engineering. We utilize a new, efficient, and general Python package for generating fractional Ornstein-Uhlenbeck processes in...
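For context, the fractional Ornstein-Uhlenbeck process is commonly written as the solution of
$$dX_t = -\lambda\,(X_t - \mu)\,dt + \sigma\, dB^H_t,$$
where $B^H$ is fractional Brownian motion with Hurst index $H$; the parameters $(\lambda, \mu, \sigma, H)$ are the targets of the neural estimator described above (symbol names here are generic, not necessarily the authors').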
Machine learning (ML) algorithms, fitted on learning datasets, are often considered as black-box models, linking features (called inputs) to variables of interest (called outputs). Indeed, they provide predictions which turn out to be difficult to explain or interpret. To circumvent this issue, importance measures (also called sensitivity indices) are computed to provide a better...
Monitoring the stability of manufacturing processes in Industry 4.0 applications is crucial for ensuring product quality. However, the presence of anomalous observations can significantly impact the performance of control charting procedures, especially in complex and high-dimensional settings.
In this work, we propose a new robust control chart to address these challenges in monitoring...
The cumulative sum (CUSUM) control chart iterates sequential probability ratio tests (SPRT) until the first SPRT ends with rejecting the null hypothesis. Because the latter exhibits some deficiencies if the true mean is substantially different from the one used in the underlying likelihood ratio, Abbas (2023) proposes to substitute the SPRT with a repeated significance test (RST), cf. Armitage...
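For reference, the standard upper one-sided CUSUM iterates
$$C_t^{+} = \max\!\left(0,\; C_{t-1}^{+} + (x_t - \mu_0) - k\right), \qquad C_0^{+} = 0,$$
signalling when $C_t^{+} > h$; this is exactly the iterated-SPRT structure referred to above, with the reference value $k$ tied to the mean shift assumed in the likelihood ratio.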
Distribution-free control charts have received increasing attention in non-manufacturing fields because they can be used without any assumption on the distribution of the data to be monitored. This feature makes them particularly suitable for monitoring environmental phenomena often characterized by highly skewed distribution. In this work we compare, using two Monte Carlo studies, the...
Our work deals with air quality monitoring, by combining different types of data. More precisely, our aim is to produce (typically at the scale of a large given city), nitrogen dioxide or fine particulate matter concentration maps, at different moments. For this purpose, we have at our disposal, on the one hand, concentration maps produced by deterministic physicochemical models (such as...
The use of Process Analytical Technology (PAT) in dairy industries can enhance manufacturing processes efficiency and improve final product quality by facilitating monitoring and understanding of these processes. Currently, near-infrared spectroscopy (NIR) is one of the most widely used optical technologies in PAT, thanks to its ability to fingerprint materials and simultaneously analyze...
Financial fraud detection is a classification problem where each operation has a different misclassification cost depending on its amount. Thus, it falls within the scope of instance-dependent cost-sensitive classification problems. When modeling the problem with a parametric model, such as logistic regression, using a loss function incorporating the costs has proven to result in a more effective...
Machine learning (ML) algorithms, in credit scoring, are employed to distinguish between borrowers classified as class zero, including borrowers who will fully pay back the loan, and class one, borrowers who will default on their loan. However, in doing so, these algorithms are complex and often introduce discrimination by differentiating between individuals who share a protected attribute...
The advancement in data acquisition technologies has made possible the collection of quality characteristics that are apt to be modeled as functional data or profiles, as well as of collateral process variables, known as covariates, that are possibly influencing the latter and can be in the form of scalar or functional data themselves. In this setting, the functional regression control chart...
Some years ago, the largest bank in our region came to the university and offered projects and master theses on bank-related problems and huge data sets. This was very well received by students, and it became an arena for learning and job-related activity. The students got practice in working with imbalanced data, data pre-processing, longitudinal data, feature creation/selection and...
Robust multivariate control charts are statistical tools used to monitor and control multiple correlated process variables simultaneously. Multivariate control charts are designed to detect and signal when the joint distribution of the process variables deviates from in-control levels, indicating a potential out-of-control case. The main goal of robust multivariate control charts is to provide...
Hotelling’s $T^2$ control chart is probably the most widely used tool for detecting outliers in a multivariate normal distribution setting. Within its classical scheme, the unknown process parameters (i.e., mean vector and variance-covariance matrix) are estimated via a phase I (calibration) stage, before online testing can be initiated in phase II. In this work we develop the self-starting...
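In the classical scheme referred to above, the monitored statistic is (in generic notation)
$$T^2_t = \left(\mathbf{x}_t - \hat{\boldsymbol{\mu}}\right)^{\top} \hat{\boldsymbol{\Sigma}}^{-1} \left(\mathbf{x}_t - \hat{\boldsymbol{\mu}}\right),$$
with $\hat{\boldsymbol{\mu}}$ and $\hat{\boldsymbol{\Sigma}}$ estimated from a phase I sample; a self-starting scheme instead updates these estimates sequentially from the online observations themselves.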
Quality testing in the food industry is usually performed by manual sampling and at/offline laboratory analysis, which is labor intensive, time consuming, and may suffer from sampling bias. For many quality attributes such as fat, water and protein, in-line near-infrared spectroscopy (NIRS) is an alternative to grab sampling and which provides richer information about the process.
In this...
Causal inference based on Directed Acyclic Graphs (DAGs) is an increasingly popular framework for helping researchers design statistical models for estimating causal effects. A causal DAG is a graph consisting of nodes and directed paths (arrows). The nodes represent variables one can measure, and the arrows indicate how the variables are causally connected. The word acyclic means there can be...
Machine Learning is now part of many university curriculums and industrial training programs. However, the examples used are often not relevant or realistic for process engineers in manufacturing.
In this work, we will share a new industrial batch dataset and make it openly available to other practitioners. We will show how batch processes can be challenging to analyze when having sources...
We address the problem of estimating the infection rate of an epidemic from observed counts of the number of susceptible, infected and recovered individuals. In our setup, a classical SIR (susceptible/infected/recovered) process spreads on a two-layer random network, where the first layer consists of small complete graphs representing the households, while the second layer models the contacts...
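As a reminder of the parameters involved, the classical mean-field SIR dynamics (which the two-layer network model refines by spreading infections along household and contact edges) read
$$\frac{dS}{dt} = -\beta\,\frac{S I}{N}, \qquad \frac{dI}{dt} = \beta\,\frac{S I}{N} - \gamma I, \qquad \frac{dR}{dt} = \gamma I,$$
with infection rate $\beta$ (the estimation target) and recovery rate $\gamma$.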
Fold-over designs often have attractive properties. Among these is that the effects can be divided into two orthogonal subspaces. In this talk, we introduce a new method for analyzing fold-over designs called “the decoupling method” that exploits this trait. The idea is to create two new responses, where each of them is only affected by effects in one of the orthogonal subspaces. Thereby the...
Over the last 43 years I have been privileged to work across the UK and overseas as an academic, industrial statistician, quality leader, quality executive, management consultant and external examiner and advisor to various UK Universities.
In that time, I have focussed on systemic improvement of all end-to-end processes in research and development, new product development, manufacturing,...
The partitioning of the data into clusters, carried out by the researcher in accordance with a certain criterion, is a necessary step in the study of a particular phenomenon. Subsequent research should confirm or refute the appropriateness of such a division, and in a positive case, evaluate the discriminating power of the criterion (or, in other words, the influencing power of the factor...
In this talk, the problem of selecting a set of design points for universal kriging, which is a widely used technique for spatial data analysis, is further investigated. The goal is to select the design points in order to make simultaneous predictions of the random variable of interest at a finite number of unsampled locations with maximum precision. Specifically, a correlated...
In the framework of emulation of numerical simulators with Gaussian process (GP) regression [1], we propose in this work a new algorithm for the estimation of GP covariance parameters, referred to as GP hyperparameters. The objective is twofold: to ensure a GP as predictive as possible w.r.t. the output of interest, but also with reliable prediction intervals, i.e. representative of its...
The simultaneous optimization of multiple objectives (or responses) has been a popular research line because processes and products are, in nature, multidimensional. Thus, it is not surprising that the variety and quantity of responses modelling techniques, optimization algorithms, and optimization methods or criteria put forward in the RSM literature for solving multiresponse problems are...
Monitoring COVID-19 infection cases has been a singular focus of many policy makers and communities. However, direct monitoring through testing has become more onerous for a number of reasons, such as costs, delays, and personal choices. Wastewater-based epidemiology (WBE) has emerged as a viable tool for monitoring disease prevalence and dynamics to supplement direct monitoring. In this talk,...
Thanks to wearable technology, it is increasingly common to obtain successive measurements of a variable that changes over time. A key challenge in various fields is understanding the relationship between a time-dependent variable and a scalar response. In this context, we focus on active lenses equipped with electrochromic glass, currently in development. These lenses allow users to adjust...
The emergence of Industry 4.0 has led to a data-rich environment, where most companies accumulate a vast volume of historical data from daily production usually involving some unplanned excitations. The problem is that these data generally exhibit high collinearity and rank deficiency, whereas data-driven models used for process optimization especially perform well in the event of independent...
To demonstrate reliability at consecutive timepoints, a sample at each current timepoint must prove that at least 100$p$% of the devices of a population function until the next timepoint with probability of at least $1-\omega$.
For testing that reliability, we develop a failure time model which is motivated by a Bayesian rolling window approach on the mean time to failure. Based on this...
The active session will explore what topics we find difficult to teach. Common examples include: what are degrees of freedom; when should we divide by n and when by n-1? But moving on from these classics, we want to delve deeper into the things that trip us up when performing in front of an audience of students.
The session will commence with a short introduction and then settle into small...
As part of the Dutch national PrimaVera project (www.primavera-project.com), an extensive case study with a leading high-tech company on predicting and monitoring failure rates of components is being carried out. Following common practice from reliability engineers, the engineers of the high-tech company frequently use the Crow-AMSAA model for age-dependent reliability problems. There are,...
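For reference, the Crow-AMSAA model in its usual form is the power-law non-homogeneous Poisson process with expected cumulative number of failures
$$\mathbb{E}\!\left[N(t)\right] = \lambda\, t^{\beta}, \qquad \text{intensity } \lambda \beta t^{\beta - 1},$$
where $\beta < 1$ indicates reliability growth and $\beta > 1$ deterioration.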
Due to the LED industry's rapid growth and the ease of manufacturing LED lights, the LED market is highly competitive, making good price-quality ratio and being first-to-market crucial for manufacturers. To that end, accurate and fast lifetime testing is one of the key aspects for LED manufacturers. Lifetime testing of LED lighting typically follows experimental and statistical techniques...
Kernel methods are widely used in nonparametric statistics and machine learning. In this talk kernel mean embeddings of distributions will be used for the purpose of uncertainty quantification. The main idea of this framework is to embed distributions in a reproducing kernel Hilbert space, where the Hilbertian structure allows us to compare and manipulate the represented probability measures....
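In generic notation, the kernel mean embedding of a distribution $P$ and the induced maximum mean discrepancy are
$$\mu_P = \mathbb{E}_{X \sim P}\!\left[k(X, \cdot)\right] \in \mathcal{H}_k, \qquad \mathrm{MMD}(P, Q) = \left\lVert \mu_P - \mu_Q \right\rVert_{\mathcal{H}_k},$$
so that probability measures can be compared and manipulated through the Hilbert-space geometry mentioned above.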
It is safe to assume that classifying patients and generating multi-type distributions of service duration, instead of using a general distribution for all patients, would yield a better appointment schedule. One way to generate multi-type distributions is by using data mining. CART, for example, will generate the best tree, from a statistical perspective, nevertheless one could argue that...
The research presented showcases a collaboration with a leading printer manufacturer to facilitate the remote monitoring of their industrial printers installed at customer sites. The objective was to create a statistical model capable of automatically identifying printers experiencing more issues than expected based on their current operating conditions. To minimize the need for extensive data...
The COVID-19 pandemic showed that our mortality models need to be reviewed to adequately model the variability between years.
Our presentation has the following objectives: (1) We determine the time series of mortality changes in the European Union, United States, United Kingdom, Australia and Japan. Based on these time series, we estimate proximity measures between each pair of countries in...
In the past, screening (which process parameters are impactful) and optimisation (optimise the response variable or the critical quality attribute, CQA) were two distinct phases performed with two designs of experiments (DoE). Then, the definitive screening designs (DSDs) published approximately 10 years ago attracted a lot of attention from both statisticians and non-statisticians, especially in the...
Phase-I monitoring plays a vital role as it helps to analyse process stability retrospectively using a set of available historical samples and to obtain a benchmark reference sample to facilitate Phase-II monitoring. Since, at the very beginning, the process state and its stability are unknown, trying to assume a parametric model for the available data (which could be well-contaminated) is...
The in vitro diagnostics medical devices (IVDs) market has had exponential growth in recent years; IVDs are a crucial part of today’s healthcare. Around the world, IVDs need to be approved under specific regulations to be marketed in different countries. To do so, manufacturers need to submit the Technical Documentation to ensure safety and performance for approval by the U.S. Food and Drug Administration...
A gearbox is a critical component in a rotating machine; therefore, early detection of a failure or malfunction is indispensable to planning maintenance activities and reducing downtime costs.
The vibration signal is widely used to perform condition monitoring in a gearbox as it reflects the dynamic behavior in a non-invasive way. This work aimed to efficiently classify the severity level of...
Blended learning refers to the combination of online teaching with face-to-face teaching, using the advantages of both forms of teaching. We will discuss task generators developed with R and Python that support students in practising statistical tasks and can be easily extended in the future. The tools automatically generate tasks with new data, check the solutions and give students visual...
The univariate Bayesian approach to Statistical Process Control/Monitoring (BSPC/M) is known to provide control charts that are capable of monitoring efficiently the process parameters, in an online fashion from the start of the process i.e., they can be considered as self-starting since they are free of a phase I calibration. Furthermore, they provide a foundational framework that utilizes...
Hepatocellular carcinoma (HCC) poses significant challenges and risks globally. Liver metabolism assessment, reflected in Indocyanine Green Retention at 15 minutes (ICG15), is crucial for HCC patients. This study aimed to predict ICG15 levels using radiomics-based features and selected hematology test results. A hybrid predictive model combining clustering and stacking models is developed to...
I see final exams as a necessary evil and a poor assessment tool, and their preparation as a daunting, time consuming task, but to my students the final exam is of prime importance. They invest hours in solving exam questions from previous years, so I treat the exam questions as a very important teaching tool, despite a personal preference for projects, case studies and exercises using...
Flow cytometry is used in medicine to diagnose complex disorders using a multiparametric measurement (up to 20 parameters). This measurement is performed in a few seconds on tens of thousands of cells from a blood sample. However, clustering and analysis of this data is still done manually, which can impede the quality of diagnostic discrimination between "disease" and "non-disease" patients....
Analysis of dynamical systems often entails considering lagged states of a system, which can be identified by heuristics or brute force for small systems; for larger and more complex plantwide systems, however, these approaches become infeasible. We present the Python package dPCA for performing dynamic principal component analysis as described by Vanhatalo et al. Autocorrelation and partial...
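The following is a generic illustration of the lag-extension step behind dynamic PCA and not the API of the dPCA package itself; the lag order is chosen arbitrarily for the example:

    # Generic illustration of dynamic PCA: stack lagged copies of the variables
    # and apply ordinary PCA to the lag-extended matrix.
    import numpy as np
    from sklearn.decomposition import PCA

    def lag_extend(X, n_lags):
        # rows: time, columns: [x_t, x_{t-1}, ..., x_{t-n_lags}] for all variables
        n, p = X.shape
        blocks = [X[n_lags - k : n - k, :] for k in range(n_lags + 1)]
        return np.hstack(blocks)

    rng = np.random.default_rng(5)
    X = rng.normal(size=(500, 4))           # placeholder multivariate process data
    X_lagged = lag_extend(X, n_lags=2)      # lag order chosen here for illustration
    pca = PCA(n_components=3).fit(X_lagged)
    print(pca.explained_variance_ratio_)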
The aim of introductory mathematics courses at university level is to provide students with the necessary tools for their studies. In terms of competence levels, the contents are still basic: the students should know and understand the underlying concepts, but mainly should be able to apply the relevant methods correctly in typical situations (even if they have not fully understood the...
Modern data acquisition systems allow for collecting signals that can be suitably modelled as functions over a continuum (e.g., time or space) and are usually referred to as profiles or functional data. Statistical process monitoring applied to these data is accordingly known as profile monitoring. The aim of this research is to introduce a new profile monitoring strategy based on a...
As we are all too aware, organizations accumulate vast amounts of data from a variety of sources nearly continuously. Big data and data science advocates promise the moon and the stars as you harvest the potential of all these data. And now, AI threatens our jobs and perhaps our very existence. There is certainly a lot of hype. There’s no doubt that some savvy organizations are fueling their...
https://conferences.enbis.org/event/42/
https://conferences.enbis.org/event/43/
Linear assets such as roads, pipelines, and railways are crucial components of a society's infrastructure, and their proper maintenance is critical. These assets have defined beginnings and ends but exhibit specific characteristics with branching and heterogeneous segmentation. Their sizes require condition monitoring to be performed using special measurement devices such as measurement cars or...
Statistical process control (SPC) methods are applied across businesses in the monitoring of key performance indicators. These indicators often take the form of multiple univariate time series, each of which measures the 'health' of some aspect of the business. SPC methods monitor these time series by highlighting unusual variation, changes in the mean, or local trends. Initially developed in...
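As a sketch of one standard SPC tool applied to a single univariate KPI series, here is an EWMA chart with conventional settings (lambda = 0.2, L = 3); these choices are illustrative and not taken from the talk:

    # EWMA chart for one KPI series with time-varying control limits.
    import numpy as np

    def ewma_chart(x, mu0, sigma0, lam=0.2, L=3.0):
        z = mu0
        alarms = []
        for t, obs in enumerate(x, start=1):
            z = lam * obs + (1 - lam) * z
            width = L * sigma0 * np.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
            if abs(z - mu0) > width:
                alarms.append(t)
        return alarms

    rng = np.random.default_rng(2)
    kpi = np.concatenate([rng.normal(10, 1, 60), rng.normal(11, 1, 20)])
    print(ewma_chart(kpi, mu0=10.0, sigma0=1.0))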
For the proper conservation of cultural heritage, it is necessary to monitor and control the microclimate conditions where artifacts are located. The European standard EN 15757:2010 establishes a methodology to analyze seasonal patterns and short-term relative humidity (RH) and temperature (T) fluctuations. This standard is designed to analyze data from a single data-logger. However, spaces...
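A sketch of the single-logger analysis as commonly described for EN 15757:2010, stated here as an assumption rather than a quotation of the standard: a 30-day centred moving average estimates the seasonal pattern, and the 7th/93rd percentiles of the short-term fluctuations around it define a target band:

    # Single-logger RH analysis sketch: seasonal baseline via a 30-day centred
    # moving average, target band from the 7th/93rd fluctuation percentiles.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(4)
    idx = pd.date_range("2022-01-01", periods=365 * 24, freq="h")
    rh = pd.Series(50 + 10 * np.sin(2 * np.pi * np.arange(len(idx)) / (365 * 24))
                   + rng.normal(0, 2, len(idx)), index=idx)

    seasonal = rh.rolling(window=30 * 24, center=True, min_periods=1).mean()
    fluct = rh - seasonal                              # short-term fluctuations
    lower, upper = fluct.quantile(0.07), fluct.quantile(0.93)
    out_of_band = (fluct < lower) | (fluct > upper)
    print(f"{out_of_band.mean():.1%} of readings outside the target band")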
For maintaining a competitive edge in the market, companies strive to improve the quality of their products while also minimizing downtime and maintenance costs. One of the ways to achieve this is through Digital Twins (DTs). DTs can serve as a powerful tool for implementing quality assurance (QA) processes. However, current DT-driven QA frameworks do not provide guidelines for addressing the...
The observed traffic at a particular point on a telecommunications network typically has a similar shape from day to day due to customer behaviours, and so it is natural to adopt a functional data paradigm to describe its structure. However, in some instances one can observe a deviation from this typical functional form of the data. Such deviations, which we call anomalies, are potentially of...
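As an illustrative baseline only (not the authors' method): reshape the traffic into daily curves and flag days whose L2 distance from the mean daily curve is unusually large:

    # Functional-data-style anomaly baseline: one curve per day, flag days far
    # from the mean daily curve in L2 distance.
    import numpy as np

    rng = np.random.default_rng(6)
    n_days, n_points = 60, 96                      # placeholder: 15-minute samples
    base = 100 + 40 * np.sin(np.linspace(0, 2 * np.pi, n_points))
    traffic = base + rng.normal(0, 5, size=(n_days, n_points))
    traffic[17] += 60                              # inject one anomalous day

    mean_curve = traffic.mean(axis=0)
    dist = np.linalg.norm(traffic - mean_curve, axis=1)
    threshold = dist.mean() + 3 * dist.std()
    print("anomalous days:", np.where(dist > threshold)[0])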
This work compares two multivariate methods for the classification of tenders (auctions). Outcomes show that both are appropriate and yield good results when the variables are processed as (i) categorical data with Multiple Correspondence Analysis (MCA) or (ii) continuous variables by means of Principal Component Analysis (PCA). The Cronbach alpha coefficient determines a reasonable...
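For reference, Cronbach's alpha can be computed directly from its standard formula; the item scores below are simulated placeholders, not the tender variables of the study:

    # Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).
    import numpy as np

    def cronbach_alpha(items):
        # items: n_observations x k_items matrix of scores
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)
        total_var = items.sum(axis=1).var(ddof=1)
        return k / (k - 1) * (1 - item_vars.sum() / total_var)

    rng = np.random.default_rng(8)
    latent = rng.normal(size=(100, 1))
    scores = latent + rng.normal(0, 0.5, size=(100, 6))   # 6 correlated items
    print(round(cronbach_alpha(scores), 3))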
Errors-in-Variables is a statistical concept to model errors in the input variables, which can be caused, for example, by noise. It is well-known in statistics that not accounting for such errors can cause a bias in the model. However, most existing deep learning approaches have so far not taken Errors-in-Variables into account, which might be due to the increased numerical burden or the...
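A small simulation of the classical attenuation effect in the linear case illustrates the general Errors-in-Variables issue (it is not a sketch of the deep learning approach itself): regressing on noisy inputs shrinks the estimated slope towards zero.

    # Attenuation bias demo: with input-noise variance equal to the signal
    # variance, the naive slope estimate is roughly halved.
    import numpy as np

    rng = np.random.default_rng(9)
    n, beta = 10_000, 2.0
    x_true = rng.normal(0, 1, n)
    y = beta * x_true + rng.normal(0, 0.5, n)
    x_noisy = x_true + rng.normal(0, 1, n)         # measurement error, variance 1

    slope_clean = np.polyfit(x_true, y, 1)[0]
    slope_noisy = np.polyfit(x_noisy, y, 1)[0]
    # theory: slope attenuated by var(x) / (var(x) + var(error)) = 0.5
    print(round(slope_clean, 2), round(slope_noisy, 2))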
The methods of measurement error analysis have long been widely known, used and carefully described, for example, in the Reference Manual of Measurement System Analysis (MSA). Another source of practical advice liked by many practitioners is the book “EMP III: Evaluating the Measurement Process and Using Imperfect Data” by D. Wheeler. We scrutinized these information sources and came to...
Typically, machine learning (ML) and artificial intelligence (AI) applications tend to focus on examples that are not relevant to process engineers.
In this talk, industrial data science fundamentals will be explained and linked with commonly known examples in process engineering, followed by two common industrial applications using state-of-the-art ML techniques.
First, we will discuss what...
Improving the quality of our maps by the early detection of errors that impact end-user experience is key to providing the best map on the market. This talk showcases how statistical learning helps improve the detection of incorrect features in the map and obtain quality indicators to guide the map editing process. It focuses on an application of a machine learning model for spatial data,...
The pandemic of the SARS-CoV-2 virus and COVID-19 disease, still affecting the population worldwide, has demonstrated the need for more accurate methodologies for assessing, monitoring, and controlling an outbreak of such devastating proportions.
Authoritative attempts have been made in traditional fields of medicine (epidemiology, virology, infectiology) to address these shortcomings, mainly by...
Long-term unemployment is a serious social problem with lasting repercussions on society. This issue can be tackled by profiling jobseekers. Thus, the objective of this study is to create a profiling tool for jobseekers in Senegal. In other words, our study aims to profile or identify jobseekers who have a higher risk of being unemployed for at least 12 months. Data from the National...
In recent years, the development of new technologies for the acquisition and processing of data has led to an increase in academic content related to the understanding and processing of data in most areas of knowledge. Moreover, in assessing different training courses in Higher Education, it has been observed that the contents related to data processing are more focused on the...
It is common to use model performance measures, such as AIC and BIC, to evaluate how well the model fits the data. This work illustrates that we need to go beyond these measures to assess a model's capability to represent the data. There are several ways to achieve that. Here we focus on a graphical approach using the probability integral transform (PIT) histogram. We present a situation in...
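A sketch of the PIT computation under an assumed Gaussian predictive model with placeholder data; deviations of the histogram from uniformity indicate that the model does not represent the data well:

    # PIT sketch: evaluate the fitted model's CDF at the observed values and
    # inspect the histogram; a flat histogram suggests adequate calibration.
    import numpy as np
    from scipy import stats
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(10)
    y = stats.t.rvs(df=3, size=2000, random_state=rng)   # heavy-tailed placeholder data

    # fit a (here misspecified) normal model and compute PIT values
    mu_hat, sigma_hat = y.mean(), y.std(ddof=1)
    pit = stats.norm.cdf(y, loc=mu_hat, scale=sigma_hat)

    plt.hist(pit, bins=20, density=True)   # non-uniform shape signals misspecification
    plt.xlabel("PIT value")
    plt.ylabel("Density")
    plt.show()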