The increasing digitalization of manufacturing processes is transforming the relationship between raw material suppliers and customers. Rather than defining rigid specifications that often lead to unnecessary rejection of raw material lots, Industry 4.0 opens the door to more collaborative and knowledge-driven strategies. In this context, multivariate raw material specifications should not be...
Throughout my years of research and university teaching, as well as advising master's and doctoral theses in applied fields such as economics, management, biology, geology, and agriculture, I have noticed that students and researchers often face difficulties in selecting appropriate statistical methods to validate their hypotheses. They may either choose an inappropriate method or fail to...
Recent advances in additive manufacturing enable the fabrication of complex parts with intricate geometries and spatially-varying material composition. Data fusion integrates point cloud data with chromatic attributes, yielding 4D point clouds, a rich representation that jointly encodes shape and material information. We introduce a registration-free framework for jointly monitoring shape and...
Statistical Quality Control (SQC) plays a critical role in modern manufacturing processes by enabling early detection of process deviations, quantification of variability, and consistent delivery of products that meet customer specifications. To keep up with evolving customer demands and market dynamics, manufacturing networks grow in scale and complexity. Classic SQC workflows struggle to...
Industrial systems generate large volumes of operational data, enabling predictive maintenance strategies to reduce unplanned downtime and costs. Over the past decade, machine learning (ML) models have been widely used for predicting equipment degradation. However, their effectiveness is constrained by the scarcity of high-quality labels, as industrial datasets remain largely unlabelled....
Random effects models are widely used in interlaboratory comparisons to estimate between-laboratory variability τ and assess degrees of equivalence of laboratories [1]. In this context, decisions are often based on 95% credible intervals [2], while Bayesian hypothesis testing provides an alternative probabilistic framework based on Bayes factors for assessing laboratory effects [3].
This work...
Classification performance is typically assessed empirically, yet its dependence on intrinsic data characteristics is not fully understood. In this study, we examine classification difficulty through two complementary dimensions: local class ambiguity measured in terms of instance hardness and global class separability captured by Silhouette score.
We generated synthetic datasets based on the...
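As a rough illustration of the global separability measure named above, the sketch below computes the mean silhouette coefficient using class labels as cluster labels. It is a minimal pure-Python version under assumed choices (Euclidean distance, a score of 0 for singleton classes, at least two classes present); the function name and toy data are hypothetical and not taken from the study.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient with Euclidean distance (O(n^2), needs >= 2 classes)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # assumed convention: singleton classes score 0
            continue
        # a: mean distance to the other members of the same class
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other class
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: the same four points with two different labelings.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
separated = silhouette_score(points, [0, 0, 1, 1])  # classes match the two groups
mixed = silhouette_score(points, [0, 1, 0, 1])      # classes cut across the groups
```

Values near 1 indicate well-separated classes, while values near or below 0 indicate overlapping classes.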
In studies of opinion spread, peer pressure is often modeled through interactions of more than two individuals (higher-order interactions). We introduce a two-layer random hypergraph model, in which households and workplaces form the layers and hyperedges represent individual households and workplaces. Within this structure, individuals may react when their opinion is in the minority within...
This paper addresses effect screening from observational data where sampling is constrained by cost, time, or process limitations, with a main focus on manufacturing applications. We propose a novel active learning strategy that introduces principles from optimal experimental design (A- and D-optimality) and combines them with an optimization for multicollinearity using Variance Inflation...
Uncertainty quantification for complex physical systems often relies on computationally expensive numerical simulators. When execution times limit the number of feasible runs, surrogate modeling becomes essential for tasks such as sensitivity analysis, design optimization, and safety assessment. Gaussian process regression (GPR) is a leading surrogate due to its uncertainty...
State-space models have become core tools in industry as the basis of digital twin technology, enabling online system state monitoring. Advanced Bayesian methods, such as Particle Markov Chain Monte Carlo (PMCMC), may be used for state and parameter inference in non-linear state-space models. The approach combines particle filtering to approximate the hidden state posterior distribution and a...
The real-time estimation of key quality variables remains a critical challenge in industrial environments due to the limited availability of direct measurements and the presence of complex, dynamic process behavior. This work proposes an adaptive soft-sensing framework for the estimation of cement quality in clinker production, where quality indicators are traditionally measured through costly...
Adaptive randomization in clinical trials often requires balancing competing goals: improving patient benefit, preserving statistical efficiency, and maintaining adequate randomness in treatment assignment. We propose the Adaptive Design Strategy (ADeS), a flexible group-sequential framework that unifies covariate-adaptive (CA), response-adaptive (RA), covariate-adjusted response-adaptive...
Ensemble methods such as Random Forests achieve strong predictive accuracy but at the cost of interpretability. Explainable Ensemble Trees (E2Tree) address this trade-off by constructing a single interpretable tree that approximates the co-occurrence structure induced by the ensemble. The quality of this approximation matters: when interpretability is invoked for regulatory or scientific...
Profile monitoring is a branch of Statistical Process Monitoring (SPM) that uses statistical methods to identify irregularities in process data. The data is characterized by a profile, or response curve, observed over a given time interval. Profile monitoring consists of two main phases: the first involves defining an in-control (IC) profile, and the second focuses on comparing subsequent...
Measurement System Analysis is a fundamental element in quality improvement initiatives in manufacturing and is commonly conducted by Gauge Repeatability and Reproducibility (Gauge R&R) studies. However, most of the widely used software tools for Gauge R&R analysis confine the analyst to a restricted design in which the analysis is performed by ANOVA. In such analyses, generally part and...
The digitalisation of tourism facilities means that these facilities have access to a wide range of IoT-like data sources. This article presents a conceptual approach that describes how such heterogeneous data streams can be used to systematically improve service offerings, resource planning and operational decisions through targeted short-term, medium-term and long-term forecasts. The...
Innovative Small Modular Reactors (i-SMRs) introduce fundamentally different design characteristics compared to conventional large-scale nuclear power plants, including integral reactor configurations, compact steel containment, multi-module deployment, extended fuel cycles, and flexible load-following capabilities. In particular, i-SMRs are inherently designed for multi-module operation,...
The advent of artificial intelligence (AI) technologies has significantly changed many domains, including applied statistics. This talk explores the evolving role of applied statistics in the AI era, drawing from our experiences in engineering statistics, focusing on how statistics can be employed to study the properties of AI models and enhance AI systems, especially in AI assurance.
We...
Inter-rater reliability, the quantification of agreement between individuals who assign scores to the same phenomenon, is an important consideration in all fields for which data drives decision-making (e.g., business and industry, healthcare, social and behavioural sciences, education, etc.). Traditionally, the raters scoring the phenomenon have been human beings. With the proliferation of AI,...
Uncertainty quantification is essential for assessing the reliability of MRI reconstructions. The Network-based Tikhonov reconstruction method has been shown to produce excellent results in accelerated multi-coil settings. However, in challenging low-field settings, where noise is high and scanners are single-coil, this scheme requires a careful evaluation of its reconstruction...
The classical approach to DoE, as shaped by Fisher, now reaches back almost 100 years. Bayesian optimisation, or "active learning", is now often presented as a more modern alternative. As an iterative method, it selects each new experimental run based on the information currently available. This means that randomisation, which is one of the central aspects in "classical" DoE, is inherently...
Bayesian optimization (BO) and classical design of experiments (DOE) are rarely taught together. DOE courses cover factorial designs, response surface methodology, and optimality criteria like D and I-optimality. BO courses cover Gaussian processes and acquisition functions like expected improvement. The two communities use different notation, different software, and publish in different...
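For readers coming from the DOE side, the expected improvement acquisition function mentioned above has a simple closed form when the surrogate prediction at a candidate point is Gaussian. The sketch below assumes a maximisation problem and takes the posterior mean `mu`, posterior standard deviation `sigma`, and best observed value `best` as given inputs; these names are illustrative and do not come from the abstract.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, best):
    """Closed-form EI for maximisation under a Gaussian predictive distribution."""
    if sigma <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return (mu - best) * normal_cdf(z) + sigma * normal_pdf(z)

# Hypothetical candidates with the same predicted mean but different uncertainty.
ei_certain = expected_improvement(mu=1.0, sigma=0.1, best=1.0)
ei_uncertain = expected_improvement(mu=1.0, sigma=1.0, best=1.0)
```

With equal predicted means, the more uncertain candidate receives the larger expected improvement, which is how the criterion trades off exploitation against exploration.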
Early sequential feature selection is crucial in manufacturing environments with heterogeneous sensors and tightly coupled process variables. In shop-floor applications, predicting End-of-Line test failures using data collected as early as possible is vital to enable timely corrective actions. However, classical feature selection and Explainable Artificial Intelligence methods (e.g., SHAP)...
This work investigates how accounting for multiple sources of input measurement uncertainty affects the estimation of global risk in manufacturing processes. For example, conformity assessment for Non-Automatic Weighing Instruments involves several quantities of interest, one of which is the producer risk—the probability of incorrectly rejecting a conforming instrument—which is strongly...
The GAISE recommendations (2025) are shaping best practices in teaching statistics, shifting the focus toward statistical thinking as a holistic investigative process. Despite related academic advancements, a persistent skills gap remains in preparing graduates for the complex unstructured problem-solving requirements of modern industry. This session proposes a framework where teaching...
In many applied settings involving binary variables, practitioners typically rely on pairwise measures of dependence, such as correlations or agreement indices. However, when more than two variables are involved, these quantities do not uniquely determine the joint distribution. Instead, they define a family of admissible distributions that share the same pairwise structure while potentially...
The presence of careless respondents represents a well-known threat to the quality of survey data. Respondents who provide inattentive or random answers can distort statistical analyses, reduce measurement reliability, and bias substantive conclusions. A variety of indicators have been proposed in the literature to detect such respondents, including response pattern measures such as longstring...
Skillful predictions in climate and environmental science are essential for planning operations, assessing risks, guiding adaptation strategies, and building resilience. This talk synthesizes key concepts and methods for enhancing predictive performance in these domains, with a particular emphasis on predictive uncertainty estimation, extreme event prediction, and the role of big datasets and...
Orthogonal minimally aliased response surface or OMARS designs permit the study of quantitative factors at three levels using an economical number of runs. In these designs, the linear effects of the factors are neither aliased with each other nor with the quadratic effects and the two-factor interactions. Complete catalogs of OMARS designs with up to five factors have been obtained using an...
Over the past few years, we have been interested in answering the question "Can ChatGPT Think Like a Statistician?" The answer is "Yes, but...". Our data analysis prompting experience has prompted us (pun intended!) to create a framework for obtaining appropriate data analyses from artificial intelligence called CROSSVALI (Context, Refined Questions, Options, Specificity, Scrutiny, Verify &...
Optimisation and control of industrial fed-batch bioprocesses remain complex and resource-intensive tasks. Process development from laboratory to manufacturing scale requires extensive experimental trials to characterise system behaviour and identify optimal conditions, incurring significant costs. While advanced process control (APC) technologies have been widely applied in industry, their...
A stability-aware, value-driven framework for uplift policy selection is presented, applied to a telecommunications churn-retention dataset. The central argument is that reliable deployment of an uplift model requires more than optimizing a causal ranking metric: the targeted customer set must remain consistent across repeated training runs, and the selected policy must be economically sound...
In many industrial settings, data are collected over time, which induces serial dependence among the observations. Many chemometric methods, such as Principal Component Analysis (PCA), assume that observations are independent over time. This assumption is violated for most industrial data, creating challenges for both descriptive modelling and fault detection.
Dynamic PCA (DPCA), which...
Machine vision systems are important in Industry 4.0 as they allow fast automated inspection and quality control. Traceable metrology for machine vision systems is critical for the digital transformation of the Industry 4.0 objectives defined by the EU Green Deal. Nevertheless, these systems currently lack well-defined uncertainty frameworks and calibration techniques. For contactless 3D...
Short-term congestion risk assessment in transmission grids is still largely based on deterministic load flow computations from point forecasts. We investigate multivariate probabilistic net load forecasting across substations as a way to better quantify the probability of congestion events, which arise from correlated forecast errors across substations. This is particularly challenging with...
In industrial machine vision, camera positioning is traditionally a manual, iterative, trial-and-error process. Even if sufficient accuracy can be reached, this leads to prolonged downtime during initial installation and maintenance, especially for inspection tasks where the camera must be positioned at a precise location, orientation, and working distance. In addition, the operator-dependent...
At ENBIS-24 in Leuven, we brought an emerging challenge to the ENBIS Active Session: a large industrial laundry operator managing millions of textile items across multiple sites wanted to understand and extend textile lifespans as part of a circular economy strategy using existing operational data on textile discarding events. The key advice we received — to not trust the data from the outset...
Active learning is commonly framed in the artificial intelligence community as a strategy to reduce labeling costs by selectively querying informative samples for model improvement, yet its conceptual roots are closely aligned with classical ideas from statistical design of experiments, particularly sequential design and design augmentation. In this work, we position active learning as a...
Statistical models and machine learning algorithms are often deployed in populations that differ from those on which they were trained, a challenge that is particularly acute in digital health. We discuss domain generalization and adaptation for a large-scale database from multiple countries with intensive care unit (ICU) data. We introduce Distributionally Robust Invariance Learning as an...
Join us for one of the most dynamic and interactive sessions of the ENBIS conference — ENBIS-Live: Open Problem Solving in Action!
This is no ordinary talk. In this fast-paced, high-energy session, statisticians and data scientists roll up their sleeves to tackle real-world open problems — live and on the spot. Think of it as a collaborative brain trust powered by the collective wisdom and...
Efforts to mitigate public health crises have been complicated by unreported cases and the ever-changing trends of those monitored health events across geographic regions and socioeconomic cultures. To resolve both challenges, we propose a Bayesian spatiotemporal susceptible-exposed-infected-recovered-removed (BayST-SEIRD) framework that builds the hidden effects of neighboring communities,...
This work focuses on the estimation of multivariate generalized gamma convolutions (MGGC), a class of distributions widely used in risk modeling for which no closed-form density is available. In practice, only their characteristic functions are known, which makes standard estimation methods such as maximum likelihood inapplicable. To overcome this difficulty, we adopt an RKHS-based approach...
This work develops a statistical framework to estimate stabilization time for core tablet batches produced by continuous roller compaction. Using simulated data from multiple production runs of a representative product, the project focuses on characterizing concentration stability during start-up and identifying an optimal start-up duration to guarantee product quality and consistency. The...
We present a unified perspective on explicit functional ANOVA as a principled decomposition framework for black-box models, bridging explainability, sensitivity analysis, and algorithmic understanding. We derive an exact closed-form functional ANOVA for categorical inputs, valid under arbitrary dependence structures and even on sparse or non-rectangular supports, thereby removing a major...
In statistical and machine learning, efficient data acquisition is pivotal to model performance, particularly when labeled data are costly or time-intensive to obtain. This motivates active learning, in which the learning algorithm selectively queries maximally informative data points to accelerate training and improve predictive efficiency. While many active learning strategies consider query...
Effective predictive modeling in large-scale manufacturing is hampered by the isolated and limited data held by individual organizations, which are collected through costly experiments and various inspections. Collaboration across organizations can address these limitations, but it faces two main challenges: the organizations' privacy concerns and heterogeneous features from varied sensing and inspection...
Antibiotic resistance is one of the greatest health threats facing the world. The University of Oxford are pioneering a novel approach by targeting the RecBCD enzyme — a key regulator of DNA repair in bacteria. Central to this was the development of a robust, high-throughput biochemical assay, which meant navigating a complex 11-factor space.
Initially limited by manual pipetting and...
Sir Ronald Fisher was one of the giants in both the fields of statistics and genetics. His seminal work was done at Rothamsted Experimental Station outside of London.
Fisher was both a statistician and a scientist. He was well trained in mathematics, but in the final analysis, he was a scientist who understood how to perform proper data analysis. Fisher strongly rejected the...
The rapid evolution of Artificial Intelligence is transforming how data is generated, analyzed, and leveraged across business environments, industrial systems, and organizational processes. Moving beyond the traditional Industry 4.0 paradigm centered on automation, new AI technologies are opening the way toward increasingly autonomous data-driven systems capable of supporting complex...
Quality notifications (QNs) often include detailed free-text descriptions that are difficult to analyze at scale. As a result, many lower-priority notifications are resolved operationally without a structured root cause analysis, even though they may represent recurring problems and non-negligible quality costs. This work presents a practical methodology for using large language models (LLMs)...
Statistical jump models have been recently introduced to detect persistent regimes by clustering temporal features while discouraging frequent regime changes. However, they rely on hard clustering and therefore do not account for uncertainty in state assignments.
In this work, we propose a fuzzy extension of the statistical jump model that incorporates uncertainty in cluster membership....
Gastronomic tourism has become an increasingly important component of contemporary tourism demand, particularly in Mediterranean countries such as Italy, where food culture and local traditions are deeply embedded in territorial identity. This study investigates the importance assigned to gastronomy during travel and identifies different profiles of gastronomic tourists in Italy. The analysis...
Vision-based systems in industrial applications involve a wide range of software, including fitting, association, and cloud-to-cloud registration. Software verification is required to guarantee the accuracy of estimated parameters. Verification typically relies on realistic datasets generated using ray casting to sample points on the predefined surface, followed by the addition of random...
This session offers a practical and thought-provoking exploration of how generative AI and large language models are transforming the daily work of statisticians, data scientists, and educators.
We start with a concise “kaleidoscope” of real examples illustrating what modern general-purpose AI tools can achieve in practice, with demonstrations that show how complex tasks can now be...
Low-cost sensors are a new tool for improving air quality maps, which are of major interest in the current era of high-resolution, urban-scale air quality monitoring. These sensors require calibration using reference analyzers. A variety of strategies can be employed, ranging from individual pointwise calibration models to network calibration models. Here, we propose using geographically...
In spatial statistics, optimal designs select sampling locations such that a specific optimality criterion - for instance, maximizing the precision of parameter estimation or prediction accuracy - is satisfied. This task is particularly challenging in high-dimensional design spaces. Frequently, space-filling designs are used as an alternative; however, these perform well only under certain...
Design of Experiments (DOE) is powerful but rarely intuitive. What is wrong with poking around in design space? Why not vary one factor at a time? The mathematics answers clearly, but the classroom often doesn't.
Physical experiments — where participants can see, touch, and interact with a real system — bring DOE concepts to life. They make abstract ideas like screening, response surface...
Medical claim expenses are inherently compositional, as fraud-relevant patterns often emerge from the relative allocation of costs across categories rather than from total expenditure alone. We propose a claim-level fraud screening framework based on compositional profiling, using the Aitchison distance to compare new claims with a historical reference distribution. Statistical significance is...
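The Aitchison distance referred to above can be computed as the Euclidean distance between centred log-ratio (clr) transforms of two compositions. The following is a minimal sketch that assumes strictly positive parts (zero handling is left out); the cost categories and numbers are hypothetical and serve only to show the scale invariance that makes the distance suitable for relative cost allocations.

```python
import math

def clr(parts):
    """Centred log-ratio transform of a strictly positive composition."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr transforms of x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))

# Hypothetical cost shares across three categories (e.g. consultation, drugs, diagnostics).
reference = [0.50, 0.30, 0.20]
same_profile = [5.0, 3.0, 2.0]   # ten times the total, identical allocation
shifted = [0.20, 0.30, 0.50]     # same total, different allocation

d_same = aitchison_distance(reference, same_profile)
d_shifted = aitchison_distance(reference, shifted)
```

Here `d_same` is zero up to rounding because only the relative allocation matters, whereas `d_shifted` is clearly positive even though the total expenditure is unchanged.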
This study examines wage outcomes at the cohort level among STEM graduates in Italy, using data from the AlmaLaurea surveys covering 60 institutions over the period 2008-2023. The unit of analysis is a graduate cohort sharing the same university, degree level, and disciplinary category, observed one, three, or five years after completing their studies. Given the nested structure of the data...
Determination of the decision variables such as the inspection period, number of measurements, and sample size is crucial for planning an efficient degradation test. For widely used stochastic processes, the necessary and sufficient conditions for the explicit expression of optimal decision variables can be derived by minimizing the approximate variance of an estimator of interest under a...
The Zipf-PSS distribution is a Poisson Stopped-Sum with a Zipf distribution as secondary distribution. In this work, we consider two INAR(1) processes: the Zipf-PSS-INAR(1) innovations process, whose innovations follow a Zipf-PSS distribution, and the Zipf-PSS-INAR(1) marginal process, whose stationary marginal distribution is Zipf-PSS. Working with the marginal process is more complex...
In pharmaceutical manufacturing, process optimization and control are critical for ensuring consistent drug product quality and regulatory compliance. In real-world applications, certain study factors are treated as fixed and maintained constant to standardize production conditions. However, it is sometimes inevitable for other factors to exhibit variability, introducing uncertainties that can...
Transport logistics facilities such as less-than-truckload terminals require fast and robust tactical decisions under uncertainty, for example regarding task scheduling, resource allocation, and terminal configuration. Since detailed simulation experiments are computationally expensive, surrogate-assisted optimization methods provide an important basis for decision support.
This...
Quality control methods such as measurement systems analysis, control charts, capability studies, and design of experiments are central to modern manufacturing and increasingly used in service industries. However, many established software solutions (e.g., Minitab, JMP) are costly or require substantial technical expertise (e.g., R, Python). In this presentation, we introduce the Quality...
JMP continues to develop powerful capabilities for statisticians and data scientists in industry. In the session we will demonstrate capabilities in JMP and JMP Pro 19 for Bayesian Optimization of multiple responses, and a new Causal Inference platform that makes establishing causality from observational data easily accessible to non-statistician researchers. We will also give a preview of new...
Accelerated stability studies are a fundamental tool in the development of pharmaceutical and vaccine products, enabling the prediction of long-term stability from short-term experiments conducted under stressed conditions. In this context, kinetic models—such as the Šesták–Berggren formulation combined with Arrhenius-type temperature dependence—are widely used to describe degradation...
Regression is the workhorse of statistics, and is often faced with real data that contain outliers. When these are casewise outliers, that is, cases that are entirely wrong or belong to a different population, the issue can be remedied by existing casewise robust regression methods. It is another matter when cellwise outliers occur, that is, suspicious individual entries in the data matrix...
Linear profile monitoring assesses the stability of a process described by a linear relationship between a scalar response variable and multiple explanatory variables. When both the response and explanatory variables are functions, this translates into tracking the stability of the underlying functional linear model (FLM). However, unlike the scalar setting, where batches of data points are...
Bayesian optimization (BO) has become a cornerstone methodology for the data-efficient tuning of expensive black-box systems encountered throughout business and industrial practice, ranging from chemical process design and structural engineering to controller calibration and the configuration of large-scale machine learning pipelines. In most of these applications, the objective must be...
In rectifying sampling inspection, a lot is subjected to full inspection if the number of defects in a random sample exceeds a predefined acceptance criterion. Traditional models assume a constant probability of being defective p for all items within a lot. In contrast to this, we consider heterogeneous lots in which individual items can have different probabilities of being defective, to...
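For reference, the traditional constant-p model described above can be sketched in a few lines. The example covers only that baseline, not the heterogeneous-lot model of this work, and it assumes that every defective item found during inspection is replaced by a conforming one; the function names and plan parameters (`N`, `n`, `c`) are illustrative.

```python
from math import comb

def acceptance_probability(n, c, p):
    """P(at most c defectives in a sample of n) when each item is defective with probability p."""
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(c + 1))

def average_outgoing_quality(N, n, c, p):
    """Expected outgoing fraction defective of a lot of size N under rectifying inspection.

    Assumption: defectives found in the sample or during full inspection are
    replaced by conforming items, so only accepted lots retain defectives,
    and only among the N - n uninspected items.
    """
    return acceptance_probability(n, c, p) * p * (N - n) / N

# Hypothetical plan: lot of 100 items, sample of 10, accept only if no defective is found.
pa = acceptance_probability(n=10, c=0, p=0.1)
aoq = average_outgoing_quality(N=100, n=10, c=0, p=0.1)
```

Because rejected lots are fully inspected, the outgoing quality `aoq` is always lower than the incoming defect probability `p`.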
Although the CLIC-based model selection approach is widely used to identify spatial extremes models, the complexity of the associated statistical inference limits the reliability of this criterion. In addition, the strong spatial dependence in small or moderate regions may lead to substantial overlap among the spatial extremes models. This potential overlap increases the risk of model...
Model selection in screening experiments is challenging when data arise from split-plot designs, multi-day industrial studies, or other settings that introduce correlation between observations. Standard approaches, including stepwise selection, LASSO, and mixed-integer optimisation (MIO), assume independent errors and may show degraded performance when this assumption is violated, limiting...
A key challenge in the development of mRNA vaccines and medicines is mRNA degradation under normal refrigerated storage conditions (2-8°C). Product evolution over time is primarily driven by loss of mRNA integrity, which occurs mainly through chemical hydrolysis.
mRNA molecules are known to be particularly susceptible to hydrolysis under alkaline conditions, where degradation via...
Surrogate models provide fast approximations of computationally expensive simulations (or experiments) and are trained using a limited set of observations generated by these codes. In the multi-fidelity framework, we assume the availability of two computer models with different levels of cost and accuracy. The high-fidelity model $z_H$ provides the most accurate predictions but is also the...
Kernel-based methods provide a principled alternative to classical numerical solvers for nonlinear partial differential equations (PDEs), especially in mesh-free settings with built-in regularization and uncertainty quantification. Traditional discretization techniques such as finite differences or finite elements can become computationally demanding for nonlinear or multiscale problems. In...
Understanding extreme environmental phenomena is crucial for risk management in a changing climate. In particular, dry spells, defined as consecutive days without precipitation, play a key role in drought dynamics, with direct impacts on agriculture, water resources, and insurance systems. Dry spell lengths are inherently discrete and often exhibit complex dependence structures across...
Maps play a key role in many applications to facilitate decision-making for risk analysis, such as in assessing natural hazards, soil pollution, water quality, and so on. Performing global sensitivity analysis in a spatial context can benefit from multiple approaches adapted to multivariate outputs, namely: 1. the combination of variance-based sensitivity indices and functional principal...
This talk presents a Six Sigma project developed in a ready-to-eat food company aimed at optimizing a meat roasting process while balancing food safety, product appearance, juiciness, and production yield.
Following the DMAIC methodology, historical data analysis, Measurement System Analysis (Gage R&R), and Root Cause Analysis tools were initially applied to understand process variability and...
Berry greenhouses in the Souss-Massa region of Morocco sustain high-value exports of strawberries, blueberries, and raspberries, but their yield and fruit quality are highly sensitive to microclimate deviations. Dense IoT sensor networks generate high-dimensional, autocorrelated, and non-stationary data that violate classical Shewhart, CUSUM, and MEWMA assumptions.
We propose a hybrid...
This study develops a statistical process control framework for monitoring drought as a stochastic process characterized by frequency, duration, and severity. The analysis focuses on the Emilia-Romagna region (Italy) and relies on a spatially weighted SPEI-12 index, ensuring a robust and representative aggregation of regional climatic conditions. The main contribution from a statistical...
Stability studies are commonly conducted to evaluate how product characteristics evolve during storage. In many industrial applications, several quality attributes are measured repeatedly over time for multiple products, generating multivariate longitudinal datasets. A key objective in these studies is to compare products in terms of their stability and identify those exhibiting more stable...
The landscape of industrial data mining is rapidly evolving, opening new opportunities to transform production systems from merely instrumented to truly intelligent. This talk explores innovative AI-driven solutions that go beyond traditional data-driven approaches, encompassing generative AI — including GANs and diffusion models — and transfer learning, to support a new generation of smart...
In many applications of interest, multivariate time series data feature trend behaviors. Yet, trends that may affect multivariate stochastic processes are still largely dealt with in a univariate manner. Calling on differencing and co-integration concepts for univariate time series, we introduce stochastic trends for multivariate data, with particular focus on trends that are constrained by an...
In this work, we consider monitoring continuous data in the unit interval and investigate the statistical design and performance of a two-sided Shewhart chart when the process parameters are unknown. The most common distribution assumed for such data is the Beta distribution. Although control charts based on the Beta distribution have been studied by several authors, the case of estimated...
A major challenge in Additive Manufacturing (AM) is the development of reliable in-situ and online quality monitoring methodologies. Visible and infrared cameras can provide near real-time image data that can be exploited for anomaly detection through Statistical Process Control and Monitoring (SPC/M) methods.
This work investigates image-based monitoring methods for Selective Laser Melting...
Minitab DoE by Effex platform has been expanded to handle random blocks and complex split-plot structures with up to five levels of difficult-to-change factors. In this talk, we will first explain how random factors are considered when generating an optimal design. Then, we will explain how to assess the trade-off between run size and the quality of competing optimal design candidates. Next,...
Personalized medicine aims to improve treatment decisions using patient-specific covariates. In diseases with heterogeneous treatment responses, estimating treatment-covariate interactions is essential for identifying effective therapies across patient subgroups. Multi-arm clinical trials provide an efficient framework for evaluating several treatments simultaneously; however, the design...
The increasing availability and complexity of data are transforming decision-making processes across science, industry, and engineering. Modern datasets are often high-dimensional, heterogeneous, and structured over space and time, and are collected on domains with complex geometries, including environmental domains, biological structures, and engineering systems. In many applications, the...
Plackett-Burman designs are experimental designs presented in 1946 by Robin L. Plackett and J. P. Burman while working in the British Ministry of Supply. Their goal was to find experimental designs for investigating the dependence of some measured quantity on a number of independent variables (factors), each taking L levels, in such a way as to minimize the variance of the estimates of these...
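As an illustration of the two-level case only (not the general L-level setting mentioned in the abstract), a 12-run Plackett-Burman design can be built by cyclically shifting one generator row and appending a run with every factor at its low level. The sketch below is a minimal example; the function names `plackett_burman_12` and `column_dot` are hypothetical helpers introduced here.

```python
# Generator row of the 12-run, two-level Plackett-Burman design (+1 = high, -1 = low).
GENERATOR_12 = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]


def plackett_burman_12():
    """Return the 12 x 11 design matrix as a list of rows (hypothetical helper)."""
    n = len(GENERATOR_12)
    # Rows 1..11: the generator row shifted cyclically by one more position each time.
    rows = [[GENERATOR_12[(j - i) % n] for j in range(n)] for i in range(n)]
    # Row 12: all factors at the low level.
    rows.append([-1] * n)
    return rows


def column_dot(design, a, b):
    """Inner product of columns a and b; zero means the two main effects are orthogonal."""
    return sum(row[a] * row[b] for row in design)


design = plackett_burman_12()
orthogonal = all(
    column_dot(design, a, b) == 0 for a in range(11) for b in range(a + 1, 11)
)
```

Pairwise orthogonality of the columns is what lets up to 11 main effects be estimated independently from only 12 runs.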
Bayesian optimization (BO) methods have recently been advocated as a newly accessible, straightforward, hands-off alternative to design of experiments (DOE) to efficiently optimize the levels of the factors in physical experiments. Physical experiments have random variation between replicated runs.
In the typical hands-on DOE approach to process optimization, an initial choice of factors and their ranges is made...
The foldover technique for screening designs is well known to guarantee zero aliasing of the main effect estimators with respect to two factor interactions and quadratic effects. It is a key feature of many popular response surface designs, including central composite designs, definitive screening designs, and most orthogonal, minimally-aliased response surface designs. In this paper, we show...
Industrial manufacturing processes are complex, often including many known and unknown factors such as various raw materials and their properties, multiple production lines, hundreds of process variables, and different product quality parameters. Using data analysis to understand, optimize, monitor, and control several parts of these processes is a common practice with proven value. However,...
Metamodeling is a fundamental approach for approximating computationally expensive numerical simulations in engineering applications, such as uncertainty propagation and sensitivity analysis. In this work, we address the simultaneous prediction of multiple high-dimensional physical fields governed by linear equality constraints, a setting that arises naturally in problems involving...
Downhole oil and gas tools are used to conduct measurements and acquire samples for oil reserves estimation. Current tools are exhibiting failures in the field. We propose the use of predictive maintenance (PdM) to avoid or mitigate the risk of failures and decrease the total cost of ownership of coring tools. Each tool will go through a surface screening test where data is collected and...
Understanding how technical product characteristics translate into consumer perception remains a key challenge in product development. This study presents a case study in which preference mapping techniques are used to explore the relationship between laboratory-based technical measurements and consumer evaluations.
A set of products was characterized through a series of objective...
As noted in last year’s abstract by the same authors, “the landscape of the pharmaceutical industry is evolving”. This process continues, with more efforts being devoted to developing data-efficient methodologies for exploring operational spaces of increasing dimensionality that may be composed of continuous, categorical, and mixture factors. From what was (and still is) a science-based...
Quantile-oriented sensitivity analysis makes it possible to quantify uncertainty around quantiles at different levels, whereas sensitivity analysis usually focuses on deviation around the mean (as it involves variances). We will consider quantile-oriented sensitivity indices (QOSA) and quantile-oriented Shapley effects (QOSE). We will present their relevance on some analytical examples, show how to estimate...
Incremental design for computer experiments traditionally relies on space-filling or uniformity criteria, but these become impractical in high dimensions. The greedy minimisation of the $L_s$-mean quantisation error (or distortion) offers a valuable alternative, though it is also computationally intractable for large $d$.
This talk focuses on random designs composed of i.i.d. points sampled...
In many applied contexts, organizations need to evaluate and compare sets of explanatory variables in terms of their association with a Key Performance Indicator (KPI) of interest. This problem frequently arises in industrial and marketing applications, where companies seek to identify which groups of product characteristics or drivers are most strongly related to outcomes such as customer...
In this talk we present a real-time change detection method for monitoring large, dynamic networks with community structure. We model the propensity for communication within and between communities to incorporate the structure of the underlying network. Our focus on communities makes our method scalable to large-scale networks and we use a window-based approach to accommodate network dynamics...
Reliability testing is one of the last and most expensive steps in the development process of a complex technical repairable system. It ensures a certain level of reliability prior to market release and provides the basis for estimation of expected maintenance and warranty costs. Depending on system complexity and diversity of available configurations, current industrial practice is to select...
Machine learning (ML) models are used to provide predictive characterization of response functions based on observed or simulated data. Insight on the nature of the approximation to the underlying response function is essential for the model output to be trusted and used by decision makers, a key part of explainable AI. When a response function is well-behaved, methods that identify marginal...
Introduction.
Forest fires are complex phenomena causing significant damage to the environment and human health, habitat destruction, soil erosion, greenhouse gas emissions, and biodiversity loss. They are increasing globally, with extreme events becoming more frequent and destructive. Understanding their root causes and influencing factors is crucial.
Methods.
This work focuses on...
In industrial split-plot experiments, traditional algebraic block generators often force complete aliasing between critical sub-plot interactions and whole-plot blocks, trapping vulnerable effects within the high-variance whole-plot error stratum and severely reducing experimental power. While algorithmic D-optimal designs can mitigate this loss through partial confounding, generating such...
High-dimensional data generated by modern multi-sensor systems call for statistical methods able to capture complex dependency structures. Graphical models are a popular tool for this purpose, as they represent conditional relationships between variables through a network. However, classical estimation techniques can be severely affected by the presence of outliers. Traditional contamination...
A case study illustrates the application of a structured approach and tools to identify a new hydrogel for human cartilage replacement. These materials have multiple properties of interest, so selecting a new material (hydrogel) is a multi-attribute decision-making problem. Ten hydrogels, most of which are new formulations, were evaluated based on three attributes. The weights assigned...
Additive manufacturing processes are increasingly characterized by high customization, small batch sizes, and limited availability of historical data, making traditional statistical process control approaches difficult to apply. This work proposes a self-starting monitoring framework for few-shot additive manufacturing environments, enabling effective process monitoring from the earliest...
Computational phenotyping uses data mining methods to extract clusters of clinical descriptors, known as phenotypes, from electronic health records (EHR). Tensor factorization methods are very effective in extracting meaningful patterns and have become popular in computational phenotyping. Nevertheless, these techniques mainly focus on regular tensors and are used in a fully unsupervised...
We are motivated by the field of air quality control, where one goal is to quantify the impact of uncertain inputs such as meteorological conditions and traffic parameters on pollutant dispersion maps. Sensitivity analysis is one answer, but the majority of sensitivity analysis methods are designed to deal with scalar or vector outputs and are badly suited to an output space of maps. To...
The value of information (VOI) is a decision sensitivity measure that quantifies the expected improvement in decision quality when uncertainty in selected inputs is removed. Unlike many other sensitivity measures, the VOI provides not only a relative ranking of factors but also an absolute metric of decision quality. Despite this, its use has been limited, particularly to decision problems...
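To make the definition concrete, the simplest special case is the expected value of perfect information for a discrete decision problem: the expected payoff when the uncertain state is revealed before acting, minus the best expected payoff under current uncertainty. The sketch below is a toy illustration with made-up probabilities and payoffs. It removes all uncertainty at once rather than uncertainty in a selected subset of inputs, and it is not the method of the abstract.

```python
def expected_value_of_perfect_information(probabilities, payoffs):
    """Toy EVPI for a discrete problem; payoffs[a][s] is the payoff of action a in state s."""
    # Best expected payoff when the action must be chosen under uncertainty.
    prior_value = max(
        sum(p * u for p, u in zip(probabilities, action_payoffs))
        for action_payoffs in payoffs
    )
    # Expected payoff when the state is revealed first and the best action is then chosen.
    informed_value = sum(
        probabilities[s] * max(action_payoffs[s] for action_payoffs in payoffs)
        for s in range(len(probabilities))
    )
    return informed_value - prior_value


# Hypothetical example: two states, a risky action and a safe action.
evpi = expected_value_of_perfect_information(
    [0.6, 0.4],
    [[100.0, -50.0], [20.0, 20.0]],
)
```

Because the result is expressed in the same units as the payoff, it can be read as an upper bound on what one should pay to resolve the uncertainty, which is the "absolute metric" aspect the abstract refers to.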
Time-series classification faces recurring challenges, including high dimensionality, autocorrelation, and the difficulty of identifying features that capture essential dynamics across temporal scales and phase shifts. We address these issues through shapelet decomposition, a technique that extracts shape-based features from time series while preserving both temporal and frequency information....
A complex system is currently under validation as implemented in its initial instantiation. The technical bet is to more than double one key performance measure while keeping the others unchanged. A preliminary estimate was obtained by simulation during the concept exploration phase, with risk reduction through Fault Tree Analysis. The current studies are devoted to allowing the estimation of the...
In this contribution, we focus on small-area compositional data. These data are defined as vectors whose elements are strictly positive and sum to one (e.g., proportions). Compositional data arise in various fields, including medicine, economics, psychology, and environmetrics. They are defined on the D-part simplex (S^D) and require complex techniques for proper analysis.
A traditional...
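The simplex constraint can be illustrated with two standard operations from compositional data analysis: closure, which rescales strictly positive parts so that they sum to one, and the centred log-ratio (clr) transform, which maps a composition to unconstrained real coordinates. This is a minimal sketch of these general tools only; it is an assumption that they are relevant here, and they are not necessarily the approach taken in the abstract.

```python
import math


def closure(parts):
    """Rescale strictly positive parts so that they sum to one."""
    total = sum(parts)
    return [x / total for x in parts]


def clr(composition):
    """Centred log-ratio transform: log of each part minus the mean of the logs."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical three-part example (e.g. raw counts in three categories).
raw = [2.0, 3.0, 5.0]
composition = closure(raw)
coordinates = clr(composition)
```

The clr coordinates always sum to zero and do not depend on the scale of the raw parts, which is why analyses are often carried out on log-ratios rather than on the proportions themselves.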
The popular zero-state average run length (ARL) is just the mean of the random run length, which is the core element of a control chart. However, more appropriate measures for evaluating the detection power make use of the conditional expected delay (CED), which is the mean of the detection delay for a given change point position $\tau = 1, 2, \ldots$ under the condition that no false alarm...
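The difference between the two measures can be explored by simulation. The sketch below assumes the usual definition $D_\tau = E_\tau(N - \tau + 1 \mid N \ge \tau)$, where $N$ is the run length, and uses an upper one-sided CUSUM chart on standard normal data as the example chart. The chart type, the reference value `k`, the threshold `h`, the shift size, and the function names are all illustrative assumptions and are not taken from the abstract.

```python
import random


def cusum_alarm_time(rng, k, h, shift, tau, max_n=100_000):
    """Alarm time N of an upper CUSUM; the mean moves from 0 to `shift` at observation tau."""
    s = 0.0
    for n in range(1, max_n + 1):
        mean = shift if n >= tau else 0.0
        s = max(0.0, s + rng.gauss(mean, 1.0) - k)
        if s > h:
            return n
    return max_n  # truncation guard, practically never reached for these settings


def conditional_expected_delay(k, h, shift, tau, reps=2000, seed=1):
    """Monte Carlo estimate of E(N - tau + 1 | N >= tau)."""
    rng = random.Random(seed)
    delays = []
    while len(delays) < reps:
        n = cusum_alarm_time(rng, k, h, shift, tau)
        if n >= tau:  # keep only runs without a false alarm before the change
            delays.append(n - tau + 1)
    return sum(delays) / len(delays)


# tau = 1 reproduces the zero-state out-of-control ARL; larger tau gives the CED.
ced_1 = conditional_expected_delay(k=0.5, h=4.0, shift=1.0, tau=1)
ced_50 = conditional_expected_delay(k=0.5, h=4.0, shift=1.0, tau=50)
```

Setting `tau=1` starts the chart in its initial state at the moment of the change, which is exactly the zero-state situation. For later change points the chart statistic has already drifted away from zero, so the delay can differ.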
Since 2023, large language models (LLMs) have begun to reshape the landscape of time-series analysis. In this talk, we present our latest research, insights, and perspectives on using LLMs to model time-series data, both as a standalone modality and in combination with other spatial or contextual information or modalities. We will explore three key questions:
(1) What are spatiotemporal LLMs?
(2)...
Complex physical systems are often modeled using high-fidelity simulation codes. However, their inherent complexity makes each simulation computationally expensive, which severely limits their direct use in tasks such as uncertainty quantification. To overcome this, a widely adopted solution is to approximate the simulator using a Gaussian Process Regression (GPR) surrogate model.
However,...
CQM is a consultancy company with over four decades of experience in industrial R&D projects. One of its long-standing customers has developed consumer products for many years and seeks to reduce the test effort and improve decision making in development projects for a certain class of products. In these development projects, different types of tests are performed on prototype designs, from A...
Recent advances in generative AI have enabled powerful language models for industrial applications. However, most solutions rely on cloud-based infrastructures or GPU-accelerated environments, which raise concerns regarding data privacy, latency, and operational cost—particularly in industrial settings dealing with sensitive internal documents.
In this study, we investigate the feasibility...
In modern industrial settings, advanced acquisition systems allow for the collection of data in the form of profiles, that is, functional relationships linking responses to explanatory variables. In this context, statistical process monitoring (SPM) aims to assess the stability of profiles over time in order to detect unexpected behavior. This talk focuses on SPM methods that model profiles as...
Travel time reliability (TTR) is an important issue in transportation systems, significantly influencing individual decision-making and aggregate travel demand. The inherent uncertainty in travel time is crucial for various applications, which requires the estimation of the entire travel time distribution instead of merely the expected travel time. This work proposes a statistical framework for...
A pre-clinical trial for the treatment of cancer, with data analyzed by a traditional statistical method based on a random coefficient model, showed the significance of a treatment based on purified bacterial redox protein. A later data analysis, based on quantile regression, went further: it demonstrated the remarkable specificity of the effectiveness of the experimental treatment.
Product formulation in the specialty chemicals industry requires balancing product quality, cost, and environmental impact. This work proposes a data-driven framework for sustainable formulation design based on latent-variable model inversion combined with multi-objective optimization. Partial Least Squares (PLS) models are built to relate raw-material properties and compositions to product...
Order picking systems are known as complex logistics systems, where uncertainties, e.g., caused by randomness of incoming orders or delay of supplies, are present. We aim to investigate the relationship between input variables and key performance indicators (KPIs) in common types of order picking systems, described by reference models. Input variables are for example system load, batch size...
Industrial statisticians are well acquainted with the temptation to simplify the analysis of process performance through summary statistics. One of the most prominent examples is process capability analysis, in which a process’s ability to produce items within specification limits is expressed as the ratio between the specification range and the natural variability of the process. The latter...
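The ratio described above is commonly written as $C_p = (USL - LSL) / (6\sigma)$, with the companion index $C_{pk} = \min(USL - \mu, \mu - LSL) / (3\sigma)$ accounting for an off-centre process. The sketch below assumes this conventional $6\sigma$ definition of natural variability, which the truncated abstract does not spell out. The function name and the sample values are hypothetical.

```python
import statistics


def process_capability(samples, lsl, usl):
    """Return (Cp, Cpk) using the sample mean and sample standard deviation."""
    mean = statistics.mean(samples)
    sigma = statistics.stdev(samples)  # n - 1 denominator
    cp = (usl - lsl) / (6.0 * sigma)  # specification range over natural variability
    cpk = min(usl - mean, mean - lsl) / (3.0 * sigma)  # penalises an off-centre mean
    return cp, cpk


# Hypothetical measurements with specification limits 9.0 and 11.0.
cp, cpk = process_capability([9.8, 10.0, 10.2, 10.0, 10.0], lsl=9.0, usl=11.0)
```

Both indices compress a whole distribution into one number, which is precisely the simplification the abstract cautions about: they say nothing about distributional shape or about the uncertainty in the estimated $\sigma$.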
The truth is that some error, no matter how hard we try, simply can’t be modelled away: the irreducible error that stubbornly remains, no matter how much time we have spent selecting predictors or agonising over parameter tuning. Accepting that there will always be some randomness in statistics goes a long way to helping manage a technical team.
In this talk, Sophie will draw on her own...
When we began researching the impact of Lean Six Sigma in the textile and apparel industries, we set out to identify success factors, understand differences in implementation strategies, and explore how applications varied by business type. We expected implementation to be strongest in segments with demanding customer requirements (for example aerospace, automotive, and medical textiles),...
Structural Health Monitoring (SHM) of historical heritage is a crucial challenge for preserving humanity’s cultural assets. Increasingly, monuments are equipped with multi-sensor monitoring systems that continuously collect large volumes of data over extended periods. These data require the application of appropriate statistical methods to provide meaningful insights into the structural health...
This study develops a data-driven Markov chain framework to analyse tourist mobility patterns using empirical origin–destination data collected through surveys at a tourism information point. The dataset records both the municipality visited immediately prior to the survey and the subsequent intended destination, enabling the estimation of transition probability matrices that govern the...
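Estimating a transition probability matrix from such origin-destination records amounts to counting observed moves and normalising each origin's counts to sum to one, which is the maximum likelihood estimate for a first-order Markov chain. The sketch below is a minimal illustration under that assumption. The municipality labels and survey records are invented, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transition_matrix(pairs):
    """Row-normalised transition frequencies from (origin, destination) records."""
    counts = defaultdict(Counter)
    for origin, destination in pairs:
        counts[origin][destination] += 1
    matrix = {}
    for origin, row in counts.items():
        total = sum(row.values())
        # Each row holds the estimated probabilities of the next destination.
        matrix[origin] = {dest: n / total for dest, n in row.items()}
    return matrix


# Hypothetical survey records: (municipality visited before, intended next destination).
survey = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("C", "A")]
transition = estimate_transition_matrix(survey)
```

Origins with few survey responses yield noisy rows, so in practice some smoothing or pooling may be needed before interpreting the estimated probabilities.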
Clinical surveillance of cancer patients is necessary to ensure early detection of recurrence after curative treatment and to monitor patient progression. Although clinical guidelines commonly recommend fixed surveillance schedules for all patients, static intervals between follow-up visits may not be compatible with individual disease progression, increasing the risk of delayed recurrence...
We investigate the territorial and structural drivers of tourism arrivals, with the aim of supporting more effective destination planning. Identifying the factors that attract tourists is essential for designing policies that balance development and sustainability across regions.
Our analysis applies a Multivariate Regression Tree (MRT), a flexible and interpretable method that allows to...
Farming is a vital business. Agricultural experiments have long been carried out on crops including sugar, wheat, potatoes and grass. The second oldest grassland experiment in the UK has been in continuous action at Newcastle University’s Cockle Park farm in Northumberland since 1897.
Over the years data on grass (hay) yield, fertiliser treatments, soil structure, grass composition and the...
We consider a stochastic decision-making system with unknown parameters that need to be estimated to make appropriate decisions. We take the standard approach of exploring first and then exploiting. We start with a stylized model but present numerous applications in restaurant bookings, bike-share replenishments, customized order-fulfilment, air traffic control, virtual queueing systems, and...
Identifying predictors associated with specific response categories in multinomial logistic regression is a challenging task. It is even more complex in a high-dimensional setting, where the number of covariates is higher than the number of units. To address variable selection in high-dimensional settings with multinomial models and unordered responses, we propose a...
In various industrial applications, sensors are widely used to collect the signals for predicting the lifetime of product units or systems.
From a modeling perspective, the signal from each sensor can be considered as a time-varying covariate and the lifetime of units can be considered as the response. In the literature, cumulative exposure models are used to link the lifetime response with...
In the framework of the CUSUM procedure, the evolution of a false alarm has a well-understood stochastic behavior. So, if observations preceding an alarm were to exhibit a behavior that is significantly different, there would be reason to reject the hypothesis that the alarm is false.
We develop a test of this difference. The method is applied to detecting a change in a COVID-19 context...
In manufacturing, identifying variables that influence process outcomes is essential for control and optimization. While wrapper-based variable selection methods, such as Conditional Boruta by Rotari and Kulahci (2025), have been proposed, they remain susceptible to rejecting variables that only influence the outcome through interactions with other variables. Therefore, the purpose of the...
Artificial Intelligence (AI) has become very popular as a modelling strategy within statistical process monitoring (SPM), particularly in detecting abnormal process behaviours. However, for existing AI-based SPM methods, diagnosing features associated with a signal remains challenging, as traditional diagnosis methods are not directly applicable. This lack of diagnosis makes it difficult to...