The increasing digitalization of manufacturing processes is transforming the relationship between raw material suppliers and customers. Rather than defining rigid specifications that often lead to unnecessary rejection of raw material lots, Industry 4.0 opens the door to more collaborative and knowledge-driven strategies. In this context, multivariate raw material specifications should not be...
Throughout my years of research and university teaching, as well as advising master's and doctoral theses in applied fields such as economics, management, biology, geology, and agriculture, I have noticed that students and researchers often face difficulties in selecting appropriate statistical methods to validate their hypotheses. They may either choose an inappropriate method or fail to...
Recent advances in additive manufacturing enable the fabrication of complex parts with intricate geometries and spatially-varying material composition. Data fusion integrates point cloud data with chromatic attributes, yielding 4D point clouds, a rich representation that jointly encodes shape and material information. We introduce a registration-free framework for jointly monitoring shape and...
Random effects models are widely used in interlaboratory comparisons to estimate between-laboratory variability τ and assess degrees of equivalence of laboratories [1]. In this context, decisions are often based on 95% credible intervals [2], while Bayesian hypothesis testing provides an alternative probabilistic framework based on Bayes factors for assessing laboratory effects [3].
This work...
Classification performance is typically assessed empirically, yet its dependence on intrinsic data characteristics is not fully understood. In this study, we examine classification difficulty through two complementary dimensions: local class ambiguity measured in terms of instance hardness and global class separability captured by Silhouette score.
We generated synthetic datasets based on the...
In studies of opinion spread, peer pressure is often modeled through interactions of more than two individuals (higher-order interactions). We introduce a two-layer random hypergraph model, in which households and workplaces form the layers and hyperedges represent individual households and workplaces. Within this structure, individuals may react when their opinion is in the minority within...
This paper addresses effect screening from observational data where sampling is constrained by cost, time, or process limitations, with a main focus on manufacturing applications. We propose a novel active learning strategy that introduces principles from optimal experimental design (A- and D-optimality) and combines it with an optimization for multicollinearity using Variance Inflation...
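As a rough illustration of the multicollinearity diagnostic named in the abstract above, the sketch below computes Variance Inflation Factors as the diagonal of the inverse correlation matrix of the predictors. It is not the authors' active learning strategy; the function names and the toy data are assumptions made for this example only.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long predictor columns."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        standardized.append([(x - m) / s for x in col])
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(standardized[i], standardized[j])) / n
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def variance_inflation_factors(columns):
    """VIF_j = 1 / (1 - R_j^2), read off the diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical data: an orthogonal pair and a strongly correlated pair.
orthogonal = variance_inflation_factors([[-1, 1, -1, 1], [-1, -1, 1, 1]])
correlated = variance_inflation_factors([[1, 2, 3, 4], [1, 2, 3, 5]])
```

For the orthogonal pair both factors equal 1, while the correlated pair gives values far above the common rule-of-thumb thresholds.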
Uncertainty quantification for complex physical systems often relies on computationally expensive numerical simulators. When execution times limit the number of feasible runs, surrogate modeling becomes essential for tasks such as sensitivity analysis, design optimization, and safety assessment. Gaussian process regression (GPR) is a leading surrogate due to its uncertainty...
Adaptive randomization in clinical trials often requires balancing competing goals: improving patient benefit, preserving statistical efficiency, and maintaining adequate randomness in treatment assignment. We propose the Adaptive Design Strategy (ADeS), a flexible group-sequential framework that unifies covariate-adaptive (CA), response-adaptive (RA), covariate-adjusted response-adaptive...
Ensemble methods such as Random Forests achieve strong predictive accuracy but at the cost of interpretability. Explainable Ensemble Trees (E2Tree) address this trade-off by constructing a single interpretable tree that approximates the co-occurrence structure induced by the ensemble. The quality of this approximation matters: when interpretability is invoked for regulatory or scientific...
The digitalisation of tourism facilities means that these facilities have access to a wide range of IoT-like data sources. This article presents a conceptual approach that describes how such heterogeneous data streams can be used to systematically improve service offerings, resource planning and operational decisions through targeted short-term, medium-term and long-term forecasts. The...
The advent of artificial intelligence (AI) technologies has significantly changed many domains, including applied statistics. This talk explores the evolving role of applied statistics in the AI era, drawing from our experiences in engineering statistics, focusing on how statistics can be employed to study the properties of AI models and enhance AI systems, especially in AI assurance.
We...
Inter-rater reliability, the quantification of agreement between individuals who assign scores to the same phenomenon, is an important consideration in all fields for which data drives decision-making (e.g., business and industry, healthcare, social and behavioural sciences, education, etc.). Traditionally, the raters scoring the phenomenon have been human beings. With the proliferation of AI,...
Uncertainty quantification is essential for assessing the reliability of MRI reconstructions. The Network-based Tikhonov reconstruction method was demonstrated to produce excellent results in accelerated multi-coil settings. However, in challenging low-field settings, where noise is high and scanners are single-coil, this scheme requires a careful evaluation of its reconstruction...
Bayesian optimization (BO) and classical design of experiments (DOE) are rarely taught together. DOE courses cover factorial designs, response surface methodology, and optimality criteria like D and I-optimality. BO courses cover Gaussian processes and acquisition functions like expected improvement. The two communities use different notation, different software, and publish in different...
Early sequential feature selection is crucial in manufacturing environments with heterogeneous sensors and tightly coupled process variables. In shop-floor applications, predicting End-of-Line test failures using data collected as early as possible is vital to enable timely corrective actions. However, classical feature selection and Explainable Artificial Intelligence methods (e.g., SHAP)...
This work investigates how accounting for multiple sources of input measurement uncertainty affects the estimation of global risk in manufacturing processes. For example, conformity assessment for Non-Automatic Weighing Instruments involves several quantities of interest, one of which is the producer risk—the probability of incorrectly rejecting a conforming instrument—which is strongly...
The GAISE recommendations (2025) are shaping best practices in teaching statistics, shifting the focus toward statistical thinking as a holistic investigative process. Despite related academic advancements, a persistent skills gap remains in preparing graduates for the complex unstructured problem-solving requirements of modern industry. This session proposes a framework where teaching...
In many applied settings involving binary variables, practitioners typically rely on pairwise measures of dependence, such as correlations or agreement indices. However, when more than two variables are involved, these quantities do not uniquely determine the joint distribution. Instead, they define a family of admissible distributions that share the same pairwise structure while potentially...
The presence of careless respondents represents a well-known threat to the quality of survey data. Respondents who provide inattentive or random answers can distort statistical analyses, reduce measurement reliability, and bias substantive conclusions. A variety of indicators have been proposed in the literature to detect such respondents, including response pattern measures such as longstring...
Skillful predictions in climate and environmental science are essential for planning operations, assessing risks, guiding adaptation strategies, and building resilience. This talk synthesizes key concepts and methods for enhancing predictive performance in these domains, with a particular emphasis on predictive uncertainty estimation, extreme event prediction, and the role of big datasets and...
Orthogonal minimally aliased response surface or OMARS designs permit the study of quantitative factors at three levels using an economical number of runs. In these designs, the linear effects of the factors are neither aliased with each other nor with the quadratic effects and the two-factor interactions. Complete catalogs of OMARS designs with up to five factors have been obtained using an...
Over the past few years, we have been interested in answering the question "Can ChatGPT Think Like a Statistician?" The answer is "Yes, but...". Our data analysis prompting experience has prompted us (pun intended!) to create a framework for obtaining appropriate data analyses from artificial intelligence called CROSSVALI (Context, Refined Questions, Options, Specificity, Scrutiny, Verify &...
Optimisation and control of industrial fed-batch bioprocesses remain complex and resource-intensive tasks. Process development from laboratory to manufacturing scale requires extensive experimental trials to characterise system behaviour and identify optimal conditions, incurring significant costs. While advanced process control (APC) technologies have been widely applied in industry, their...
A stability-aware, value-driven framework for uplift policy selection is presented, applied to a telecommunications churn-retention dataset. The central argument is that reliable deployment of an uplift model requires more than optimizing a causal ranking metric: the targeted customer set must remain consistent across repeated training runs, and the selected policy must be economically sound...
Machine vision systems are important in Industry 4.0 as they allow fast automated inspection and quality control. Traceable metrology for machine vision systems is critical for the digital transformation of the Industry 4.0 objectives defined by the EU Green Deal. Nevertheless, these systems currently lack well-defined uncertainty frameworks and calibration techniques. For contactless 3D...
In industrial machine vision, camera positioning is traditionally a manual, iterative, trial-and-error process. Even if sufficient accuracy can be reached, this leads to prolonged downtime during initial installation and maintenance, especially for inspection tasks where the camera must be positioned at a precise location, orientation, and working distance. In addition, the operator-dependent...
At ENBIS-24 in Leuven, we brought an emerging challenge to the ENBIS Active Session: a large industrial laundry operator managing millions of textile items across multiple sites wanted to understand and extend textile lifespans as part of a circular economy strategy using existing operational data on textile discarding events. The key advice we received — to not trust the data from the outset...
Active learning is commonly framed in the artificial intelligence community as a strategy to reduce labeling costs by selectively querying informative samples for model improvement, yet its conceptual roots are closely aligned with classical ideas from statistical design of experiments, particularly sequential design and design augmentation. In this work, we position active learning as a...
Statistical models and machine learning algorithms are often deployed in populations that differ from those on which they were trained, a challenge that is particularly acute in digital health. We discuss domain generalization and adaptation for a large-scale database from multiple countries with intensive care unit (ICU) data. We introduce Distributionally Robust Invariance Learning as an...
Join us for one of the most dynamic and interactive sessions of the ENBIS conference — ENBIS-Live: Open Problem Solving in Action!
This is no ordinary talk. In this fast-paced, high-energy session, statisticians and data scientists roll up their sleeves to tackle real-world open problems — live and on the spot. Think of it as a collaborative brain trust powered by the collective wisdom and...
Efforts to mitigate public health crises have been complicated by unreported cases and the ever-changing trends of those monitored health events across geographic regions and socioeconomic cultures. To resolve both challenges, we propose a Bayesian spatiotemporal susceptible-exposed-infected-recovered-removed (BayST-SEIRD) framework that builds the hidden effects of neighboring communities,...
This work focuses on the estimation of multivariate generalized gamma convolutions (MGGC), a class of distributions widely used in risk modeling for which no closed-form density is available. In practice, only their characteristic functions are known, which makes standard estimation methods such as maximum likelihood inapplicable. To overcome this difficulty, we adopt an RKHS-based approach...
This work develops a statistical framework to estimate stabilization time for core tablet batches produced by continuous roller compaction. Using simulated data from multiple production runs of a representative product, the project focuses on characterizing concentration stability during start-up and identifying an optimal start-up duration to guarantee product quality and consistency. The...
We present a unified perspective on explicit functional ANOVA as a principled decomposition framework for black-box models, bridging explainability, sensitivity analysis, and algorithmic understanding. We derive an exact closed-form functional ANOVA for categorical inputs, valid under arbitrary dependence structures and even on sparse or non-rectangular supports, thereby removing a major...
In statistical and machine learning, efficient data acquisition is pivotal to model performance, particularly when labeled data are costly or time-intensive to obtain. This motivates active learning, in which the learning algorithm selectively queries maximally informative data points to accelerate training and improve predictive efficiency. While many active learning strategies consider query...
Effective predictive modeling in large-scale manufacturing is hampered by the isolated and limited data from individual organizations, collected from costly experiments and various inspections. Collaboration across organizations can address these limitations, but it faces two main challenges: privacy concerns among organizations and heterogeneous features from varied sensing and inspection...
Sir Ronald Fisher was one of the giants in both the fields of statistics and genetics. His seminal work was done at Rothamsted Experimental Station outside of London.
Fisher was both a statistician and a scientist. He was well trained in mathematics, but in the final analysis, he was a scientist who understood how to perform proper data analysis. Fisher strongly rejected the...
The rapid evolution of Artificial Intelligence is transforming how data is generated, analyzed, and leveraged across business environments, industrial systems, and organizational processes. Moving beyond the traditional Industry 4.0 paradigm centered on automation, new AI technologies are opening the way toward increasingly autonomous data-driven systems capable of supporting complex...
Quality notifications (QNs) often include detailed free-text descriptions that are difficult to analyze at scale. As a result, many lower-priority notifications are resolved operationally without a structured root cause analysis, even though they may represent recurring problems and non-negligible quality costs. This work presents a practical methodology for using large language models (LLMs)...
Statistical jump models have been recently introduced to detect persistent regimes by clustering temporal features while discouraging frequent regime changes. However, they rely on hard clustering and therefore do not account for uncertainty in state assignments.
In this work, we propose a fuzzy extension of the statistical jump model that incorporates uncertainty in cluster membership....
Vision-based systems in industrial applications involve a wide range of software, including fitting, association, and cloud-to-cloud registration. Software verification is required to guarantee the accuracy of estimated parameters. Verification typically relies on realistic datasets generated using ray casting to sample points on the predefined surface, followed by the addition of random...
This session offers a practical and thought-provoking exploration of how generative AI and large language models are transforming the daily work of statisticians, data scientists, and educators.
We start with a concise “kaleidoscope” of real examples illustrating what modern general-purpose AI tools can achieve in practice, with demonstrations that show how complex tasks can now be...
Low-cost sensors are a new tool for improving air quality maps, which are of major interest in the current era of high-resolution, urban-scale air quality monitoring. These sensors require calibration using reference analyzers. A variety of strategies can be employed, ranging from individual pointwise calibration models to network calibration models. Here, we propose using geographically...
Design of Experiments (DOE) is powerful but rarely intuitive. What is wrong with poking around in design space? Why not vary one factor at a time? The mathematics answers clearly, but the classroom often doesn't.
Physical experiments — where participants can see, touch, and interact with a real system — bring DOE concepts to life. They make abstract ideas like screening, response surface...
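The question "Why not vary one factor at a time?" posed above can be illustrated with a small numerical sketch. It compares the effects estimated from a full 2x2 factorial design with those from a one-factor-at-a-time (OFAT) plan; the response values are invented for illustration and the function names are assumptions.

```python
def factorial_effects(y):
    """Main effects and interaction from a 2x2 factorial.

    `y` maps coded settings (a, b), each -1 or +1, to the observed response.
    """
    main_a = (y[(1, -1)] + y[(1, 1)] - y[(-1, -1)] - y[(-1, 1)]) / 2
    main_b = (y[(-1, 1)] + y[(1, 1)] - y[(-1, -1)] - y[(1, -1)]) / 2
    interaction = (y[(-1, -1)] + y[(1, 1)] - y[(1, -1)] - y[(-1, 1)]) / 2
    return main_a, main_b, interaction


def ofat_effects(y):
    """Effects seen when each factor is changed alone from the baseline (-1, -1)."""
    baseline = y[(-1, -1)]
    return y[(1, -1)] - baseline, y[(-1, 1)] - baseline


# Hypothetical responses with a strong A x B interaction.
responses = {(-1, -1): 10, (1, -1): 12, (-1, 1): 14, (1, 1): 24}

effect_a, effect_b, effect_ab = factorial_effects(responses)
ofat_a, ofat_b = ofat_effects(responses)

# OFAT never runs (+1, +1), so it can only extrapolate additively.
ofat_prediction = responses[(-1, -1)] + ofat_a + ofat_b
```

With these made-up numbers the additive OFAT extrapolation for the (+1, +1) corner falls well short of the observed response, because the OFAT plan has no way of estimating the interaction.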
This study examines wage outcomes at the cohort level among STEM graduates in Italy, using data from the AlmaLaurea surveys covering 60 institutions over the period 2008-2023. The unit of analysis is a graduate cohort sharing the same university, degree level, and disciplinary category, observed one, three, or five years after completing their studies. Given the nested structure of the data...
In pharmaceutical manufacturing, process optimization and control are critical for ensuring consistent drug product quality and regulatory compliance. In real-world applications, certain study factors are treated as fixed and maintained constant to standardize production conditions. However, it is sometimes inevitable for other factors to exhibit variability, introducing uncertainties that can...
Transport logistics facilities such as less-than-truckload terminals require fast and robust tactical decisions under uncertainty, for example regarding task scheduling, resource allocation, and terminal configuration. Since detailed simulation experiments are computationally expensive, surrogate-assisted optimization methods provide an important basis for decision support.
This...
Quality control methods such as measurement systems analysis, control charts, capability studies, and design of experiments are central to modern manufacturing and increasingly used in service industries. However, many established software solutions (e.g., Minitab, JMP) are costly or require substantial technical expertise (e.g., R, Python). In this presentation, we introduce the Quality...
JMP continues to develop powerful capabilities for statisticians and data scientists in industry. In the session we will demonstrate capabilities in JMP and JMP Pro 19 for Bayesian Optimization of multiple responses, and a new Causal Inference platform that makes establishing causality from observational data easily accessible to non-statistician researchers. We will also give a preview of new...
Accelerated stability studies are a fundamental tool in the development of pharmaceutical and vaccine products, enabling the prediction of long-term stability from short-term experiments conducted under stressed conditions. In this context, kinetic models—such as the Šesták–Berggren formulation combined with Arrhenius-type temperature dependence—are widely used to describe degradation...
Regression is the workhorse of statistics, and is often faced with real data that contain outliers. When these are casewise outliers, that is, cases that are entirely wrong or belong to a different population, the issue can be remedied by existing casewise robust regression methods. It is another matter when cellwise outliers occur, that is, suspicious individual entries in the data matrix...
Bayesian optimization (BO) has become a cornerstone methodology for the data-efficient tuning of expensive black-box systems encountered throughout business and industrial practice, ranging from chemical process design and structural engineering to controller calibration and the configuration of large-scale machine learning pipelines. In most of these applications, the objective must be...
In rectifying sampling inspection, a lot is subjected to full inspection if the number of defects in a random sample exceeds a predefined acceptance criterion. Traditional models assume a constant probability of being defective p for all items within a lot. In contrast to this, we consider heterogeneous lots in which individual items can have different probabilities of being defective, to...
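For reference, the traditional constant-p setting mentioned above can be sketched in a few lines. The example assumes a binomial model for the number of defects in the sample and that all defective items found are replaced by good ones; it does not implement the heterogeneous-lot model of the abstract, and the function names are hypothetical.

```python
from math import comb


def acceptance_probability(n: int, c: int, p: float) -> float:
    """P(defects in a sample of n items <= c) when each item is defective with probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


def average_outgoing_quality(lot_size: int, n: int, c: int, p: float) -> float:
    """Expected outgoing fraction defective under rectifying inspection.

    Accepted lots keep the defects in the uninspected remainder (lot_size - n items);
    rejected lots are fully inspected and leave no defects.
    """
    return acceptance_probability(n, c, p) * p * (lot_size - n) / lot_size


# Hypothetical plan: lots of 1000 items, sample of 50, accept if at most 1 defect.
pa = acceptance_probability(50, 1, 0.02)
aoq = average_outgoing_quality(1000, 50, 1, 0.02)
```

Because rejected lots are screened completely, the average outgoing quality is always below the incoming fraction defective p.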
Although the CLIC-based model selection approach is widely used to identify spatial extreme models, the complexity of the associated statistical inference limits the reliability of this criterion. In addition, the strong spatial dependence in small or moderate regions may lead to substantial overlap among the spatial extremes models. This potential overlap increases the risk of model...
Surrogate models provide fast approximations of computationally expensive simulations (or experiments) and are trained using a limited set of observations generated by these codes. In the multi-fidelity framework, we assume the availability of two computer models with different levels of cost and accuracy. The high-fidelity model $z_H$ provides the most accurate predictions but is also the...
Kernel-based methods provide a principled alternative to classical numerical solvers for nonlinear partial differential equations (PDEs), especially in mesh-free settings with built-in regularization and uncertainty quantification. Traditional discretization techniques such as finite differences or finite elements can become computationally demanding for nonlinear or multiscale problems. In...
Understanding extreme environmental phenomena is crucial for risk management in a changing climate. In particular, dry spells, defined as consecutive days without precipitation, play a key role in drought dynamics, with direct impacts on agriculture, water resources, and insurance systems. Dry spell lengths are inherently discrete and often exhibit complex dependence structures across...
Maps play a key role in many applications to facilitate decision-making for risk analysis, such as in assessing natural hazards, soil pollution, water quality, and so on. Performing global sensitivity analysis in a spatial context can benefit from multiple approaches adapted to multivariate outputs, namely: 1. the combination of variance-based sensitivity indices and functional principal...
This talk presents a Six Sigma project developed in a ready-to-eat food company aimed at optimizing a meat roasting process while balancing food safety, product appearance, juiciness, and production yield.
Following the DMAIC methodology, historical data analysis, Measurement System Analysis (Gage R&R), and Root Cause Analysis tools were initially applied to understand process variability and...
Berry greenhouses in the Souss-Massa region of Morocco sustain high-value exports of strawberries, blueberries, and raspberries, but their yield and fruit quality are highly sensitive to microclimate deviations. Dense IoT sensor networks generate high-dimensional, autocorrelated, and non-stationary data that violate classical Shewhart, CUSUM, and MEWMA assumptions.
We propose a hybrid...
Stability studies are commonly conducted to evaluate how product characteristics evolve during storage. In many industrial applications, several quality attributes are measured repeatedly over time for multiple products, generating multivariate longitudinal datasets. A key objective in these studies is to compare products in terms of their stability and identify those exhibiting more stable...
The landscape of industrial data mining is rapidly evolving, opening new opportunities to transform production systems from merely instrumented to truly intelligent. This talk explores innovative AI-driven solutions that go beyond traditional data-driven approaches, encompassing generative AI — including GANs and diffusion models — and transfer learning, to support a new generation of smart...
In many applications of interest, multivariate time series data feature trend behaviors. Yet, trends that may affect multivariate stochastic processes are still largely dealt with in a univariate manner. Calling on differencing and co-integration concepts for univariate time series, we introduce stochastic trends for multivariate data, with particular focus on trends that are constrained by an...
In this work, we consider monitoring continuous data in the unit interval and investigate the statistical design and performance of a two-sided Shewhart chart when the process parameters are unknown. The most common distribution assumed for such data is the Beta distribution. Although control charts based on the Beta distribution have been studied by several authors, the case of estimated...
A major challenge in Additive Manufacturing (AM) is the development of reliable in-situ and online quality monitoring methodologies. Visible and infrared cameras can provide near real-time image data that can be exploited for anomaly detection through Statistical Process Control and Monitoring (SPC/M) methods.
This work investigates image-based monitoring methods for Selective Laser Melting...
The Minitab DoE by Effex platform has been expanded to handle random blocks and complex split-plot structures with up to five levels of difficult-to-change factors. In this talk, we will first explain how random factors are considered when generating an optimal design. Then, we will explain how to assess the trade-off between run size and the quality of competing optimal design candidates. Next,...
The increasing availability and complexity of data are transforming decision-making processes across science, industry, and engineering. Modern datasets are often high-dimensional, heterogeneous, and structured over space and time, and are collected on domains with complex geometries, including environmental domains, biological structures, and engineering systems. In many applications, the...
Metamodeling is a fundamental approach for approximating computationally expensive numerical simulations in engineering applications, such as uncertainty propagation and sensitivity analysis. In this work, we address the simultaneous prediction of multiple high-dimensional physical fields governed by linear equality constraints, a setting that arises naturally in problems involving...
Understanding how technical product characteristics translate into consumer perception remains a key challenge in product development. This study presents a case study in which preference mapping techniques are used to explore the relationship between laboratory-based technical measurements and consumer evaluations.
A set of products was characterized through a series of objective...
As noted in last year’s abstract by the same authors, “the landscape of the pharmaceutical industry is evolving”. This process continues, with more efforts being devoted to developing data-efficient methodologies for exploring operational spaces of increasing dimensionality that may be composed of continuous, categorical, and mixture factors. From what was (and still is) a science-based...
Quantile-oriented sensitivity analysis makes it possible to quantify uncertainty around quantiles at different levels, whereas sensitivity analysis usually focuses on deviation around the mean (as it involves variances). We will consider quantile-oriented sensitivity indices (QOSA) and quantile-oriented Shapley effects (QOSE). We will present their relevance on some analytical examples, show how to estimate...
Incremental design for computer experiments traditionally relies on space-filling or uniformity criteria, but these become impractical in high dimensions. The greedy minimisation of the $L_s$-mean quantisation error (or distortion) offers a valuable alternative, though it is also computationally intractable for large $d$.
This talk focuses on random designs composed of i.i.d. points sampled...
In many applied contexts, organizations need to evaluate and compare sets of explanatory variables in terms of their association with a Key Performance Indicator (KPI) of interest. This problem frequently arises in industrial and marketing applications, where companies seek to identify which groups of product characteristics or drivers are most strongly related to outcomes such as customer...
In this talk we present a real-time change detection method for monitoring large, dynamic networks with community structure. We model the propensity for communication within and between communities to incorporate the structure of the underlying network. Our focus on communities makes our method scalable to large-scale networks and we use a window-based approach to accommodate network dynamics...
Machine learning (ML) models are used to provide predictive characterization of response functions based on observed or simulated data. Insight into the nature of the approximation to the underlying response function is essential for the model output to be trusted and used by decision makers, a key part of explainable AI. When a response function is well-behaved, methods that identify marginal...
Computational phenotyping uses data mining methods to extract clusters of clinical descriptors, known as phenotypes, from electronic health records (EHR). Tensor factorization methods are very effective in extracting meaningful patterns and have become popular in computational phenotyping. Nevertheless, these techniques mainly focus on regular tensors and are used in a fully unsupervised...
We are motivated by the field of air quality control, where one goal is to quantify the impact of uncertain inputs such as meteorological conditions and traffic parameters on pollutant dispersion maps. Sensitivity analysis is one answer, but the majority of sensitivity analysis methods are designed to deal with scalar or vector outputs and are poorly suited to an output space of maps. To...
The value of information (VOI) is a decision sensitivity measure that quantifies the expected improvement in decision quality when uncertainty in selected inputs is removed. Unlike many other sensitivity measures, the VOI provides not only a relative ranking of factors but also an absolute metric of decision quality. Despite this, its use has been limited, particularly to decision problems...
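The definition above can be made concrete with the simplest special case, the expected value of perfect information (EVPI), in which uncertainty in all inputs is removed at once. The sketch below assumes a finite set of actions and states; the payoff table and the function name are invented for illustration, and the partial VOI for selected inputs discussed in the abstract is not covered.

```python
def expected_value_of_perfect_information(payoffs, probabilities):
    """EVPI for a discrete decision problem.

    payoffs[a][s] is the payoff of action a in state s;
    probabilities[s] is the prior probability of state s.
    """
    # Best expected payoff when the action must be chosen before the state is known.
    value_without_information = max(
        sum(p * row[s] for s, p in enumerate(probabilities)) for row in payoffs
    )
    # Expected payoff when the best action can be chosen separately in each state.
    value_with_information = sum(
        p * max(row[s] for row in payoffs) for s, p in enumerate(probabilities)
    )
    return value_with_information - value_without_information


# Hypothetical problem: "launch" pays 100 or -50, "do nothing" always pays 0.
payoff_table = [[100, -50], [0, 0]]
state_probabilities = [0.5, 0.5]
evpi = expected_value_of_perfect_information(payoff_table, state_probabilities)
```

The result is expressed in the same units as the payoffs, which is what makes the measure an absolute metric of decision quality as well as a ranking device.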
Time-series classification faces recurring challenges, including high dimensionality, autocorrelation, and the difficulty of identifying features that capture essential dynamics across temporal scales and phase shifts. We address these issues through shapelet decomposition, a technique that extracts shape-based features from time series while preserving both temporal and frequency information....
A complex system is currently being validated in its initial implementation. The technical bet is to more than double one key performance measure while keeping the others unchanged. A preliminary estimate was obtained by simulation during the concept exploration phase, with risk reduction supported by Fault Tree Analysis. The current studies aim to enable the estimation of the...
In this contribution, we focus on small-area compositional data. These data are defined as vectors whose elements are strictly positive and sum to one (e.g., proportions). Compositional data arise in various fields, including medicine, economics, psychology, and environmetrics. They are defined on the D-part simplex (S^D) and require complex techniques for proper analysis.
A traditional...
Since 2023, large language models (LLMs) have begun to reshape the landscape of time-series analysis. In this talk, we present our latest research, insights, and perspectives on using LLMs to model time-series data, both as a standalone modality and in combination with other spatial or contextual information or modalities. We will explore three key questions:
(1) What are spatiotemporal LLMs?
(2)...
Complex physical systems are often modeled using high-fidelity simulation codes. However, their inherent complexity makes each simulation computationally expensive, which severely limits their direct use in tasks such as uncertainty quantification. To overcome this, a widely adopted solution is to approximate the simulator using a Gaussian Process Regression (GPR) surrogate model.
However,...
CQM is a consultancy company with over four decades of experience in industrial R&D projects. One of its long-standing customers has developed consumer products for many years and seeks to reduce the test effort and improve decision making in development projects for a certain class of products. In these development projects, different types of tests are performed on prototype designs, from A...
Recent advances in generative AI have enabled powerful language models for industrial applications. However, most solutions rely on cloud-based infrastructures or GPU-accelerated environments, which raise concerns regarding data privacy, latency, and operational cost—particularly in industrial settings dealing with sensitive internal documents.
In this study, we investigate the feasibility...
In modern industrial settings, advanced acquisition systems allow for the collection of data in the form of profiles, that is, functional relationships linking responses to explanatory variables. In this context, statistical process monitoring (SPM) aims to assess the stability of profiles over time in order to detect unexpected behavior. This talk focuses on SPM methods that model profiles as...
Travel time reliability (TTR) is an important issue in transportation systems, significantly influencing individual decision-making and aggregate travel demand. The inherent uncertainty in travel time is crucial for various applications, which require estimating the entire travel time distribution rather than merely the expected travel time. This work proposes a statistical framework for...
Product formulation in the specialty chemicals industry requires balancing product quality, cost, and environmental impact. This work proposes a data-driven framework for sustainable formulation design based on latent-variable model inversion combined with multi-objective optimization. Partial Least Squares (PLS) models are built to relate raw-material properties and compositions to product...
Industrial statisticians are well acquainted with the temptation to simplify the analysis of process performance through summary statistics. One of the most prominent examples is process capability analysis, in which a process’s ability to produce items within specification limits is expressed as the ratio between the specification range and the natural variability of the process. The latter...
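A minimal sketch of the capability ratio described above, assuming the conventional definition in which the natural variability is taken as six sample standard deviations. The measurements, the specification limits, and the function name `cp_index` are hypothetical.

```python
import statistics


def cp_index(samples, lsl, usl):
    """Process capability ratio: specification range over 6 sigma.

    Assumes the conventional 6-sigma natural variability and uses the
    sample standard deviation as the estimate of sigma.
    """
    sigma = statistics.stdev(samples)
    return (usl - lsl) / (6.0 * sigma)


# Illustrative measurements and specification limits.
measurements = [9.8, 10.1, 10.0, 9.9, 10.2]
cp = cp_index(measurements, lsl=9.4, usl=10.6)
```

A single number like this ignores process centring and the shape of the distribution, which is part of why such summaries are tempting but limited.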
The truth is that some error, no matter how hard we try, simply can’t be modelled away: the irreducible error that stubbornly remains, no matter how much time we have spent selecting predictors or agonising over parameter tuning. Accepting that there will always be some randomness in statistics goes a long way to helping manage a technical team.
In this talk, Sophie will draw on her own...
Structural Health Monitoring (SHM) of historical heritage is a crucial challenge for preserving humanity’s cultural assets. Increasingly, monuments are equipped with multi-sensor monitoring systems that continuously collect large volumes of data over extended periods. These data require the application of appropriate statistical methods to provide meaningful insights into the structural health...
Farming is a vital business. Agricultural experiments have long been carried out on crops including sugar, wheat, potatoes and grass. The second-oldest grassland experiment in the UK has been running continuously at Newcastle University’s Cockle Park farm in Northumberland since 1897.
Over the years data on grass (hay) yield, fertiliser treatments, soil structure, grass composition and the...
We consider a stochastic decision-making system with unknown parameters that need to be estimated to make appropriate decisions. We take the standard approach of exploring first and then exploiting. We start with a stylized model but present numerous applications in restaurant bookings, bike-share replenishments, customized order-fulfilment, air traffic control, virtual queueing systems, and...
Identifying predictors associated with specific response categories in multinomial logistic regression is a challenging task. It is even more complex in a high-dimensional setting, where the number of covariates exceeds the number of units. To address variable selection in the high-dimensional domain for multinomial models with unordered responses, we propose a...
In the framework of the Cusum procedure, the evolution of a false alarm has a well-understood stochastic behavior. So, if observations preceding an alarm were to exhibit a behavior that is significantly different, there would be reason to reject the hypothesis that the alarm is false.
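For readers less familiar with the procedure, the following is a minimal sketch of a standard one-sided (upper) CUSUM, assuming a known in-control target, a reference value `k`, and a decision threshold `h`. The data and parameter values are illustrative, and the sketch shows only the alarm rule, not the test of false alarms developed in this work.

```python
def cusum_alarm(xs, target, k, h):
    """Return the index of the first alarm of an upper CUSUM, or None.

    The statistic accumulates deviations above target + k and is reset
    to zero whenever it would become negative.
    """
    s = 0.0
    for t, x in enumerate(xs):
        s = max(0.0, s + (x - target - k))
        if s > h:
            return t
    return None


# Illustrative series: in control at first, then shifted upwards.
observations = [0.1, -0.2, 0.0, 1.5, 1.6, 1.4, 1.7]
alarm = cusum_alarm(observations, target=0.0, k=0.5, h=2.0)
```

The path of the statistic between its last reset to zero and the alarm is the stretch of observations whose behavior can be compared with what a false alarm would typically look like.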
We develop a test of this difference. The method is applied to detecting a change in a Covid-19 context...
In manufacturing, identifying variables that influence process outcomes is essential for control and optimization. While wrapper-based variable selection methods, such as Conditional Boruta by Rotari and Kulahci (2025), have been proposed, they remain susceptible to rejecting variables that influence the outcome only through interactions with other variables. Therefore, the purpose of the...
Artificial Intelligence (AI) has become very popular as a modelling strategy within statistical process monitoring (SPM), particularly for detecting abnormal process behaviours. However, for existing AI-based SPM methods, diagnosing the features associated with a signal remains challenging, as traditional diagnosis methods are not directly applicable. This lack of diagnosis makes it difficult to...