This presentation explores the application of innovative deep learning architectures to enhance electricity demand forecasting in decentralised grid systems, with a focus on the French energy market. Generalised Additive Models (GAMs), which are state-of-the-art methods for electricity load forecasting, struggle with spatial dependencies and high-dimensional interactions inherent in modern...
The EU Digital Decade Policy Programme 2030 strongly depends on safe and reliable cutting-edge technologies, like Micro-Electro-Mechanical Systems (MEMS) sensors, that are widely used in large sensor networks for infrastructural, environmental, healthcare, safety, automotive, energy and industrial monitoring. The massive production of these sensors, often in the order of millions per week,...
The aerospace industry is driven by the need to develop new concepts and methods to handle the constraints of weight and performance efficiency, reliability, regulatory safety compliance, and cost-effectiveness. In parallel to these demands, engineers have to manage increasing design complexity through multidisciplinary models and accelerate the product development cycles to be able to fulfil the...
Data privacy is a growing concern in real-world machine learning (ML) applications, particularly in sensitive domains like healthcare. Federated learning (FL) offers a promising solution by enabling model training across decentralized, private data sources. However, both traditional ML and FL approaches typically assume access to fully labeled datasets, an assumption that rarely holds in...
Classification is the activity of assigning objects to some pre-existing exclusive categories that form a comprehensive spectrum (scale) of the studied property. The classifier can be a person, a machine, an algorithm, etc. Classification accuracy is a combination of trueness and precision. The latter (precision), perceived as the 'closeness of agreement' between results of multiple...
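As a minimal illustration of the trueness/precision decomposition described in this abstract (a sketch in the spirit of ISO 5725; the function name and the use of the sample standard deviation as the precision measure are choices made here, not taken from the abstract):

```python
from statistics import mean, stdev

def trueness_and_precision(results, reference):
    """Illustrative decomposition of accuracy: trueness is the closeness
    of the average result to the reference value; precision is the
    closeness of agreement among the repeated results themselves."""
    return abs(mean(results) - reference), stdev(results)

# biased but perfectly repeatable results: poor trueness, ideal precision
assert trueness_and_precision([2.0, 2.0, 2.0, 2.0], 3.0) == (1.0, 0.0)
```

Biased but perfectly repeatable results thus score poorly on trueness while achieving ideal precision, which is exactly the distinction the abstract draws.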
Timely crop yield estimation is a key component of Smart Agriculture, enabling proactive decision-making and optimized resource allocation under the constraints of climate variability and sustainability goals. Traditional approaches based on manual sampling and empirical models are constrained by labour intensity, limited spatial coverage, and sensitivity to within-block (-site) heterogeneity....
A single Shewhart chart based on a Max-type statistic has been suggested for monitoring a process with one control chart, based on a single plotting statistic, and detecting changes in its parameters. To improve its power, it is suggested to apply one or more supplementary rules based on run statistics, known as runs rules. Supplementary runs rules have been used since the 1950s to improve...
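The idea of a single plotting statistic can be sketched as follows (illustrative only; the variance standardization below uses a crude normal approximation, whereas the literature offers several exact chi-square-to-normal transforms):

```python
import math

def max_stat(sample, mu0, sigma0):
    """Illustrative Max-type plotting statistic: the larger of the
    absolute standardized sample mean and an (approximately)
    standardized sample variance, so one chart with a single upper
    limit reacts to shifts in either parameter."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    z_mean = (xbar - mu0) / (sigma0 / math.sqrt(n))
    # crude normal approximation for the variance component
    z_var = (s2 / sigma0 ** 2 - 1.0) * math.sqrt((n - 1) / 2.0)
    return max(abs(z_mean), abs(z_var))

# a shifted-mean sample yields a clearly larger statistic
assert max_stat([2.1, 1.7, 2.2, 2.0, 1.9], 0.0, 1.0) > \
       max_stat([0.1, -0.3, 0.2, 0.0, -0.1], 0.0, 1.0)
```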
Modern industrial systems generate high-dimensional data streams often used for statistical process monitoring (SPM), i.e., distinguishing between multiple in-control and out-of-control (OC) states. While supervised SPM methods benefit from labeled data in assessing the process state, label acquisition is often expensive and infeasible at large scale. This work proposes a novel stream-based...
Monitoring disease prevalence over time is critical for timely public health response and evidence-based decision-making. In many cases, prevalence estimates are obtained from a sequence of independent studies with varying sample sizes, as commonly encountered in systematic reviews and meta-analyses. Traditional control charts such as the EWMA and CUSUM have been widely used in industrial...
This research investigates the performance of adaptive Exponentially Weighted Moving Averages (EWMA) control charts when monitoring the standard deviation of a process. It is known that when we use a fixed value for the smoothing parameter λ of the EWMA chart we restrict its ability to detect shifts of changing magnitudes. In this paper, we propose nine EWMA charts for the standard deviation...
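All nine charts share the same underlying recursion, sketched here generically (for dispersion monitoring the input would be a variance-based statistic such as ln(S²); the adaptive variants in the abstract differ in how λ is chosen per observation):

```python
def ewma(series, lam, z0):
    """EWMA recursion Z_t = lam * X_t + (1 - lam) * Z_{t-1}.
    A small smoothing parameter lam is sensitive to small shifts,
    a large lam to big ones -- the trade-off that motivates
    adapting lam instead of fixing it."""
    z = z0
    path = []
    for x in series:
        z = lam * x + (1 - lam) * z
        path.append(z)
    return path

# a sustained unit shift is absorbed gradually: 0.2, 0.36, 0.488, 0.5904
assert abs(ewma([1.0, 1.0, 1.0, 1.0], 0.2, 0.0)[-1] - 0.5904) < 1e-9
```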
Modern, flexible, data-driven pricing techniques are required for revenue management in the car rental sector, in order to better meet the ever-changing market conditions and erratic customer demand. Traditional pricing approaches often fall short in capturing the inherent volatility and complexity of this particular industry. However, the growing availability of real-time data has...
Statistical process monitoring (SPM) is used widely to detect changes or faults in industrial processes as quickly as possible. Most of the approaches applied in industry are based on assuming that the data follow some parametric distribution (e.g., normality). However, in industry this assumption does not always hold, which limits the application and usefulness of SPM for fault detection. In...
This study addresses the implications of multiple inputs in estimating overall risk for Non-Automatic Weighing Instruments (NAWI) manufacturers. Conformity assessment of NAWI involves various controls, such as type approval, initial and subsequent verifications, and in-service inspections. Producer risk, i.e., the probability of rejecting a compliant device, is significantly affected by the...
Recent advances in the Internet of Things (IoT) and sensor technologies have provided powerful tools for the continuous, real-time monitoring of highly complex systems characterized by a wide range of features. This is particularly relevant for HVAC systems in buildings, where the objective is to maintain appropriate levels of hygrothermal comfort while minimizing energy consumption. As such,...
This work presents a methodology for condition monitoring of spur gearboxes based on AI-enhanced multivariate statistical process control. Gearboxes are critical components in rotating machinery, and early fault detection is essential to minimize downtime and optimize maintenance strategies. Vibration signals are a non-invasive means to assess gearbox conditions under varying load and...
This research proposes a novel framework for automating the generation of AI personas using Agentic AI systems, designed to be MCP-compliant for robust interoperability and seamless tool integration. Traditionally, creating AI personas involves repetitive prompt engineering, manual calibration, and continuous human intervention, which can be time-consuming and error-prone. In contrast, we...
In this work, we address the problem of binary classification under label uncertainty in settings where both feature-based and relational data are available. Motivated by applications in financial fraud detection, we propose a Bayesian Gaussian Process classification model that leverages covariate similarities and multilayer network structure. Our approach accounts for uncertainty in the...
Bayesian Optimization (BO) has received tremendous attention for optimizing deterministic functions and tuning ML parameters. There is increasing interest in applying BO to physical measurement data in industrial settings as a recommender system for product/process design. In this context, multiple responses of interest are the norm, but "basic" BO is only defined for the minimization/maximization of a...
As more and more new materials (such as raw earth, hemp concrete, etc.) are used in the construction of building walls, their thermal resistance needs to be evaluated, not only at the laboratory scale, but also, and more importantly, in situ, at the building scale, where they are potentially used in conjunction with other materials. A dedicated experimental prototype device limiting the influence of external...
The real estate sector plays a significant role in shaping the urban environment and influencing carbon emissions. As the demand for environmental sustainability grows, it is crucial to understand the factors that drive real estate companies to adopt environmentally friendly policies and improve their environmental performance. This study investigates the impact of board gender diversity on...
Estimating traffic volumes across street networks is a critical step toward enhancing transport planning and implementing effective road safety measures.
Traditional methods for obtaining traffic data rely on manual counts or high-precision automatic sensors (e.g., cameras or inductive loops). While manual counting is labor-intensive and time-consuming, fixed sensors are costly and typically...
Calibrating a simulation model involves estimating the model's parameters by comparing its outputs with experimental data to ensure that simulation results accurately reflect those data. However, when outputs are functions of time, there are multiple ways to define the difference between experimental and simulated outputs. It has recently been proposed to use elastic functional data analysis,...
Cellwise outliers, introduced by Alqallaf et al. (2009), represent a shift from the traditional rowwise approach in robust statistics by focusing on individual anomalous data cells rather than entire observations. This paradigm offers significant advantages, such as pinpointing which variables cause outlying behavior and preserving more usable data, particularly in high-dimensional settings...
The focus of every experimental process is observing, studying and understanding phenomena, which are of a multivariate nature. The rapid growth of computational power, in combination with the availability of different statistical packages, has facilitated data collection and led to the development of statistical techniques for monitoring and surveillance. In real-world settings, the...
Real-world datasets frequently include not only vast numbers of observations but also high-dimensional feature spaces. Exhaustively gathering and examining every variable to uncover meaningful insights can be time-consuming, costly, or even infeasible. To build robust, reliable and efficient regression models, feature selection techniques have therefore become indispensable. Yet many...
Single-cell RNA sequencing (scRNA-seq) enables detailed exploration of cellular heterogeneity, yet its high dimensionality requires efficient feature selection for robust downstream analysis. This study evaluates five feature selection methods—Triku, Scanpy, Seurat, Variance Threshold, and Pearson Residual—on a multi-cancer scRNA-seq dataset comprising 801 cells from breast, colon, kidney, lung,...
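Among the five methods compared, the variance-threshold filter is simple enough to sketch in a few lines (an illustrative pure-Python version, not the implementation used in the study):

```python
def variance_threshold(matrix, threshold):
    """Keep only the columns (genes) whose variance exceeds `threshold`,
    the simplest of the compared feature selection methods."""
    n = len(matrix)
    keep = []
    for j in range(len(matrix[0])):
        col = [row[j] for row in matrix]
        m = sum(col) / n
        var = sum((v - m) ** 2 for v in col) / n
        if var > threshold:
            keep.append(j)
    return keep

X = [[1.0, 5.0], [1.0, 1.0], [1.0, 3.0]]
# the constant first column is dropped, the varying second is kept
assert variance_threshold(X, 0.5) == [1]
```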
Balanced systems are widely employed across various industries and are often exposed to dynamic environments. While most existing research emphasizes degradation dependence, this study focuses on optimizing maintenance strategies for balanced systems by jointly considering dependent competing risks and environmental influences. System failure is defined under three conditions: (1) soft...
During storage, chicken fillets develop unwanted odours caused by volatile compounds produced by spoilage bacteria present on the surface of the products. Spoilage bacteria are not harmful but may lead to rejection by consumers. The poultry industry therefore needs to optimise the shelf life to minimise the risk of rejection by consumers and thereby reduce food waste.
Optimisation of the estimated...
Design of experiments (DoE) is a cornerstone methodology for optimizing industrial processes, yet its application to multistage processes remains underdeveloped, particularly in cost-constrained contexts. We present a methodology for cost-efficient experimental design tailored to such contexts, illustrated through a case study in potato fry production.
Potato fry production involves a...
Missed appointments in diagnostic services contribute significantly to healthcare inefficiencies, increased operational costs, and delayed patient care. This study explores the use of data-driven predictive models to identify patients at high risk of no-show behavior in diagnostic scheduling. The analysis focuses on the MRI department, where the impact of no-shows is particularly severe. By...
Many business process and engineering design scenarios are driven by an underlying inverse problem. Rather than iteratively exercise a computationally expensive system model to find a suitable design (i.e., match a target performance vector), one might instead design an experiment and conduct off-line system model simulations to fit an inverse approximation, then use the approximation to...
The on-line detection of a change in the statistical behavior of an observed random process is a problem that finds numerous applications in diverse scientific fields. The "Sequential Detection of Changes" literature addresses this specific problem theoretically and methodologically, offering a rich collection of theoretical results and a multitude of methods capable of efficiently responding to...
Healthcare fraud is a significant issue that leads to substantial financial losses and compromises the quality of patient care. Traditional fraud detection methods often rely on rule-based systems and manual audits, which are inefficient and lack scalability. Machine learning methods have begun to be incorporated into the fraud detection systems of insurance companies; however, these methods...
Monitoring the occurrence of undesirable events, such as equipment failures, quality issues or extreme natural phenomena, requires tracking both the time between events (T) and their magnitude (X). Time Between Events and Amplitude (TBEA) control charts have been developed to monitor these two aspects simultaneously. Traditional approaches assume known distributions for T and X. However, in...
In this work, we study the behavior of nonparametric Shewhart-type control charts, which employ order statistics and multiple runs-type rules. The proposed class of monitoring schemes includes some existing control charts. In addition, new distribution-free monitoring schemes that pertain to the class are set up and examined extensively. Explicit expressions for determining the variability and...
Last year I gave my quite classical DoE course again to my colleagues, and realised that teaching the design construction is actually not the most useful time spent for practitioners, as modern software takes care of that for the practitioner's needs. I had had the feedback before that the most valuable piece in my course was the section on understanding the problem at hand, identifying the...
Tourism stakeholders increasingly seek data-driven methods for tailoring travel experiences to individual interests. This study investigates whether the preferences that travelers express implicitly on social media, together with operational travel data, can be transformed into high-fidelity digital profiles and, subsequently, into personalized travel packages.
During the first phase, we will...
Early process development in biopharma traditionally relies on small-scale experimentation, e.g. microtiter plates. At this stage, most catalyst candidates (clones) are discarded before process optimisation is conducted in bioreactors at larger scales, which differ significantly in their feeding strategies and process dynamics. This disconnect limits the representativeness of small-scale...
This study focuses on bankruptcy prediction for micro-sized enterprises, a segment often overlooked in credit risk modeling due to the limited reliability of their financial data. Building on prior research that highlights the importance of sector-specific strategies, we construct separate predictive models for selected industries using a dataset of 84,019 Italian micro-enterprises, of which...
Manufacturability is a critical factor in mechanical design, yet many engineers—especially those with limited hands-on manufacturing experience—unknowingly produce components that are difficult or impossible to fabricate. For instance, designs with sharp internal corners in milled pockets often overlook tool geometry constraints, resulting in costly redesigns and delays. This research...
This paper presents a novel framework for designing adaptive testing procedures by leveraging the properties of waiting-time distributions. The proposed approach integrates temporal information - specifically, the time needed for a specific sequence of correct answers to be realized - into the testing process, enabling a more dynamic and individualized assessment of examinee performance. By...
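One classical waiting-time quantity of the kind the framework builds on is the expected number of Bernoulli trials until the first run of r consecutive successes; the closed form below is a standard result (the helper function itself is illustrative):

```python
def expected_wait_for_run(p, r):
    """Expected number of Bernoulli(p) trials until the first run of
    r consecutive successes: (1 - p**r) / ((1 - p) * p**r)."""
    return (1 - p ** r) / ((1 - p) * p ** r)

# a run of two correct answers at p = 0.5 takes 6 attempts on average
assert expected_wait_for_run(0.5, 2) == 6.0
```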
Large language models (LLMs) are increasingly capable of simulating human-like personalities through prompt engineering, presenting novel opportunities and challenges for personality-aware AI applications. However, the consistency and stability of these simulated personalities over time and across contexts remain largely unexplored. In this study, we propose an integrated framework to...
The exponentially weighted moving average (EWMA) control chart was proposed as early as 1959 and became one of the most popular devices in statistical process monitoring (SPM) in the last decade of the previous century. Besides its most popular version for monitoring the mean of a normal distribution, many other statistical parameters have been deployed as targets for setting up an EWMA chart....
Model-based approaches are commonly used in the analysis, control and optimization of biosystems. These models rely on knowledge of physical, chemical and biological laws, such as conservation laws, transport phenomena and reaction kinetics, which are usually described by a system of non-linear differential equations.
Often our knowledge of the laws acting on the system is incomplete....
Nowadays, big data is generated in real time in the majority of industrial production processes. Happenstance data is characterized by high volume, variety, velocity and veracity (the four Vs of big data).
In this study, production data from an industrial purification process is analyzed to assess process performance and its relation to product quality. For this purpose, a comprehensive data...
The Fraction of Design Space (FDS) plot is a graphical display that uses the scaled prediction variance (SPV) measure to assess and compare the prediction capabilities of response surface designs. It has been widely used for single-response surface designs as a more informative display of the distribution of the SPV in an experimental region. However, many experiments in industry require...
Extracting meaningful insights from vast amounts of unstructured textual data presents significant challenges in text mining, particularly when attempting to separate valuable information from noise. This research introduces a novel deep learning framework for text mining that identifies latent structures within comprehensive text corpora. The proposed methodology incorporates an initial...
In recent decades, numerical experimentation has established itself as a valuable and cost-effective alternative to traditional field trials for investigating physical phenomena and evaluating the environmental impact of human activities. Nevertheless, high-fidelity simulations often remain computationally prohibitive due to the detailed modelling required and the complexity of parameter...
The IFP group is a leader in research and training in the energy and environmental sector, particularly in the development and commercialization of catalysts. Building accurate predictive models for these catalysts usually requires expensive and time-consuming experiments. To make this process more efficient, it’s helpful to leverage existing data from previous generations of catalysts. This...
An approach to the construction of Balanced Incomplete Block Designs (BIBD) is described. The exact pairwise balance of treatments within blocks (second-order balancing condition) is required by standard BIBD. This requirement is attainable when $\lambda = b \binom{k}{2} / \binom{t}{2}$ is an integer, where $t$ is the number of treatments, $b$ is the number of blocks and $k$ is the block...
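The integrality condition on λ can be checked directly from the formula in the abstract (a small illustrative helper):

```python
from math import comb

def bibd_lambda(t, b, k):
    """Pairwise concurrence count lambda = b * C(k,2) / C(t,2);
    an exact BIBD can exist only if this value is an integer."""
    return b * comb(k, 2) / comb(t, 2)

# the classic (t, b, k) = (7, 7, 3) design (the Fano plane): lambda = 1
assert bibd_lambda(7, 7, 3) == 1.0
```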
Passenger discomfort during flight is greatly influenced by seat interface pressure, whose effect varies with time of exposure and passenger anthropometric characteristics.
Existing studies have largely explored static relationships between anthropometric features, seat-interface pressure, and discomfort perception without leveraging these findings for building predictive systems for...
This study presents a statistical framework developed within the TouKBaSEED (Tourism Knowledge Base for Socio-Economic and Environmental Data Analysis) research project to support sustainable tourism planning in the port city of Piraeus, Greece. It integrates quantitative and qualitative methods, combining survey data from returning tourists, new arrivals, and residents with sentiment analysis...
On nuclear sites, such as nuclear power plants, instruments for measuring atmospheric radioactivity are deployed to ensure the radiation protection of workers. This type of instrument continuously samples ambient air aerosols on a filter, measures as an energy spectrum the radioactivity accumulated on the filter in real time, and shall notify the operator if transuranic alpha emitters are...
In the planning of order picking systems, which are characterized by an increasing complexity as well as uncertainties, discrete-event simulation is widely used. It enables investigations of systems using experiments based on executable models. However, the execution of simulation experiments with different parameter configurations (simulation runs) is associated with a high level of effort....
Well microplates are used in several application areas, such as biotechnology, disease research, drug discovery and environmental biotechnology. Within these fields, optimizing bioassays such as CAR-T, ELISA and CRISPR-Cas9 is commonplace. Microplates have a fixed size, and the most used ones have 24, 48, 64, 96, 384 or 1,536 wells, with each well representing an individual experiment. When...
Many chemometrics methods like Principal Component Analysis (PCA) operate under the assumption of time-independent observations, which may not be valid in most industrial applications. This is particularly true when PCA is employed for multivariate statistical process control. To handle time-dependent data, Dynamic PCA (DPCA) has been proposed, which expands the feature matrix...
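The feature-matrix expansion at the heart of DPCA can be sketched as appending lagged copies of each observation (an illustrative helper; ordinary PCA is then applied to the widened matrix):

```python
def lagged_matrix(rows, lags):
    """Augment each observation with its `lags` predecessors -- the
    feature expansion DPCA applies before running ordinary PCA, so
    that linear auto- and cross-correlations become visible."""
    out = []
    for i in range(lags, len(rows)):
        widened = []
        for l in range(lags + 1):
            widened.extend(rows[i - l])
        out.append(widened)
    return out

X = [[1, 10], [2, 20], [3, 30], [4, 40]]
# with one lag, row t becomes x_t followed by x_{t-1} (first row dropped)
assert lagged_matrix(X, 1) == [[2, 20, 1, 10], [3, 30, 2, 20], [4, 40, 3, 30]]
```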
Real-time monitoring systems play a crucial role in detecting and responding to changes and anomalies across diverse fields such as industrial automation, finance, healthcare, cybersecurity, and environmental sensing. Central to many of these applications is multivariate statistical process monitoring (MSPM), which enables the concurrent analysis of multiple interrelated data streams to...
Manual welding is an important manufacturing process in several industries, such as marine, automotive and furniture, among others. Despite its widespread use, welding still causes a significant percentage of rework in many companies, especially small to medium-sized companies. The objective of this project is to develop an economical online monitoring method for detecting defective welds using...
Statistical modelling of material fatigue supports the development of technical products to achieve a design which reliably withstands field loads but avoids over-engineering and, with it, unnecessary weight, energy consumption and, consequently, life cycle costs. In this context, the process of statistical modelling comprises test planning, model selection and parameter estimation. Several...
An important parameter in pharmacological research is the half-maximal inhibitory concentration (IC50/EC50), which quantifies the potency of a drug by measuring the concentration required to inhibit a biological process by 50%. The 4-parameter logistic (4PL) model is widely employed for estimating IC50/EC50 values, as it provides a flexible sigmoidal fit. Meta-analysis, on the other hand, has...
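The 4PL curve itself has a compact closed form, sketched here with one common parameterization (illustrative; sign conventions for the Hill slope and asymptote labels vary across software):

```python
def four_pl(x, lower, upper, ic50, hill):
    """4-parameter logistic: the response decreases from `upper`
    towards `lower` as concentration x grows, with half-maximal
    inhibition at x = ic50 and steepness governed by `hill`."""
    return lower + (upper - lower) / (1.0 + (x / ic50) ** hill)

# at x = ic50 the curve sits exactly halfway between the two asymptotes
assert four_pl(5.0, 0.0, 100.0, 5.0, 1.2) == 50.0
```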
Mixed effect regression models are statistical models that not only contain fixed effects but also random effects. Fixed effects are non-random quantities, while random effects are random variables. Both of these effects must be estimated from data. A popular method for estimating mixed models is restricted maximum likelihood (REML).
The Julia programming language already has a...
The power curve of a wind turbine describes the generated power as a function of wind speed, and typically exhibits an increasing, S-shaped profile. We suggest utilizing this functional relation to monitor wind energy systems for faults, sub-optimal controls, or unreported curtailment. The problem is formulated as a regression changepoint model with isotonic shape constraints on the model...
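The isotonic shape constraint can be imposed with the pool-adjacent-violators algorithm; the sketch below shows the non-decreasing least-squares fit only (the changepoint formulation in the abstract adds more on top of this):

```python
def pava(y):
    """Pool-adjacent-violators: the least-squares fit subject to a
    non-decreasing (isotonic) constraint, matching the increasing,
    S-shaped profile assumed for a power curve."""
    blocks = [[v, 1] for v in y]              # [block mean, block size]
    i = 0
    while i < len(blocks) - 1:
        if blocks[i][0] > blocks[i + 1][0]:   # violator: merge blocks
            total = blocks[i][0] * blocks[i][1] + blocks[i + 1][0] * blocks[i + 1][1]
            size = blocks[i][1] + blocks[i + 1][1]
            blocks[i] = [total / size, size]
            del blocks[i + 1]
            i = max(i - 1, 0)                 # re-check the previous pair
        else:
            i += 1
    fit = []
    for mean, size in blocks:
        fit.extend([mean] * size)
    return fit

# the local decrease 3 -> 2 is pooled into a flat stretch at 2.5
assert pava([1.0, 3.0, 2.0, 4.0]) == [1.0, 2.5, 2.5, 4.0]
```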
A major challenge in the chemicals industry is coordinating decisions across different levels, such as individual equipment, entire plants, and supply chains, to enable more sustainable, autonomous operations. Multi-agent systems, based on large language models (LLMs), have shown potential for managing complex, multi-step problems in software development (Qian et al., 2023). This work...
Innovation diffusion phenomena have long attracted researchers due to their interdisciplinary nature, which allows for integrating theories and concepts from various fields, including natural sciences, mathematics, physics, statistics, social sciences, marketing, economics, and technological forecasting. The formal representation of diffusion processes has historically relied on epidemic...
JMP Pro 19 and JMP Student Edition 19, coming in October 2025, represent a major advancement of JMP’s statistical modeling capabilities. In this presentation we highlight three of the most important and impactful new developments. The first is the new Bayesian Optimization platform, which combines model optimization via the Profiler with Gaussian Process (GaSP) based active learning methods. Essentially any...
The increased use of random forest (RF) methods as a supervised statistical learning technique is primarily attributed to their simplicity and ability to handle complex datasets. An RF consists of multiple decision trees, which can be categorized into two types based on how they perform node splitting: axis-parallel and oblique. Axis-parallel decision trees split the feature space using a single...
In this work we consider one-sided EWMA and CUSUM charts with one Shewhart-type control limit, and study their performance in the detection of shifts, of different magnitude, in the parameters of a two-parameter exponential distribution. Using Monte Carlo simulation, we calculate the run length distribution of the considered charts and evaluate their performance, focusing on the average run...
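The Monte Carlo run-length evaluation can be sketched for a one-sided upper CUSUM on exponential data (illustrative parameter values; the abstract's charts also carry a Shewhart-type limit and a two-parameter exponential model, omitted here for brevity):

```python
import random

def arl_upper_cusum(rate, k, h, reps=2000, seed=1):
    """Monte Carlo estimate of the average run length (ARL) of a
    one-sided upper CUSUM C_t = max(0, C_{t-1} + X_t - k), signalling
    when C_t > h, for exponential observations with the given rate."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        c, t = 0.0, 0
        while c <= h:
            t += 1
            c = max(0.0, c + rng.expovariate(rate) - k)
        total += t
    return total / reps

# a decrease in the rate (i.e. a larger mean) is signalled much sooner
assert arl_upper_cusum(0.5, 1.5, 5.0) < arl_upper_cusum(1.0, 1.5, 5.0)
```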
In statistical process monitoring, control charts typically depend on a set of tuning parameters besides their control limit(s). Proper selection of these tuning parameters is crucial to their performance. In a specific application, a control chart is often designed for detecting a target process distributional shift. In such cases, the tuning parameters should be chosen such that some...
Mixture experiments are commonplace in the chemical industry, where some or all the factors are components of a mixture expressed as percentages. These components are subject to a linear equality constraint, which forces the sum of the proportions to equal one. In most cases, the components are box-constrained, meaning there are constraints on the minimum and maximum concentrations of each...
Bayesian Optimization has emerged as a useful addition to the DOE toolbox, well-suited for industrial R&D where resource constraints incentivize spending a minimal number of experiments on complex optimization problems.
While Bayesian Optimization is quite simple to use in principle, the experimenter still has to make choices regarding their strategy and algorithm setup. The question is,...
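One of the setup choices mentioned above is the acquisition function; a common default, expected improvement under a Gaussian posterior, can be sketched as follows (illustrative; `xi` is an exploration offset chosen here, not prescribed by the abstract):

```python
from statistics import NormalDist

def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement (for maximization) at a candidate whose GP
    posterior has mean mu and standard deviation sigma; `best` is the
    incumbent observed optimum and xi a small exploration offset."""
    if sigma == 0.0:
        return 0.0
    nd = NormalDist()
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * nd.cdf(z) + sigma * nd.pdf(z)

# more posterior uncertainty means more expected improvement at equal mean
assert expected_improvement(0.0, 2.0, 0.0) > expected_improvement(0.0, 1.0, 0.0) > 0.0
```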
Dynamic pricing has emerged as a powerful mechanism for adapting product and service prices in real time, based on fluctuating market conditions, customer behavior, and operational constraints. In this work, we explore a novel approach to dynamic pricing that leverages techniques from the statistical process monitoring and probability modelling toolboxes. Through a series of simulations as well as...
In the realm of machine learning, effective modeling of heterogeneous subpopulations presents a significant challenge due to variations in individual characteristics and behaviors. This paper proposes a novel approach for addressing this issue through Multi-Task Learning (MTL) and low-rank decomposition techniques. Our MTL approach aims to enhance personalized modeling by leveraging shared...
Statistical consulting plays a crucial role in bridging theory and practice across industries and research. This session brings together professionals with diverse consulting backgrounds—including a freelance consultant, a consultant from a small firm, a consultant from a large organization, and an internal statistical consultant—to explore what it takes to succeed in the field today and in...
Monitoring the time between events, such as operational delays or the time to respond to a customer call, is essential for maintaining and enhancing service quality. Several aspects of these processes, including location (such as the median time), variability and shape, are pivotal. This paper introduces a Phase-II distribution-free cumulative sum (CUSUM) procedure based on a combination of three orthogonal rank...
Explainable AI (XAI) approaches, most notably Shapley values, have become increasingly popular because they reveal how individual features contribute to a model’s predictions. At the same time, global sensitivity analysis (GSA) techniques, especially Sobol indices, have long been used to quantify how uncertainty in each input (and combinations of inputs) propagates to uncertainty in the...
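A first-order Sobol index can be estimated with a Saltelli-style pick-freeze scheme; the sketch below is a minimal Monte Carlo version under independent uniform inputs (estimator variants with better properties exist):

```python
import random

def sobol_first_order(f, dim, i, n=20000, seed=0):
    """Saltelli-style pick-freeze estimate of the first-order Sobol
    index S_i of input i, for f with independent U(0,1) inputs."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(dim)] for _ in range(n)]
    B = [[rng.random() for _ in range(dim)] for _ in range(n)]
    yA = [f(x) for x in A]
    yB = [f(x) for x in B]
    mean = sum(yA) / n
    var = sum((y - mean) ** 2 for y in yA) / n
    # A with column i swapped in from B ("pick-freeze")
    yABi = [f([B[j][c] if c == i else A[j][c] for c in range(dim)])
            for j in range(n)]
    return sum(yB[j] * (yABi[j] - yA[j]) for j in range(n)) / n / var

# for f(x) = x0, all output variance comes from the first input
assert sobol_first_order(lambda x: x[0], 2, 0) > 0.9
assert abs(sobol_first_order(lambda x: x[0], 2, 1)) < 1e-9
```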
The paper examines the factors that may signal the existence of potential fraud in companies’ financial statements. Using a sample of the Russell 3000 firms from 2000 to 2023, we explore the relationship between various accounting, audit, internal control and market variables and the presence of fraud indicators. Two dependent variables are employed as proxies for potential fraud: the...
In today’s industrial landscape, effective decision-making increasingly relies on the ability to assess target ordinal variables - such as the degree of deterioration, quality level, or risk stage of a process - based on high-dimensional sensor data. In this regard, we tackle the problem of predicting an ordinal variable based on observable features consisting of functional profiles, by...
São Paulo, one of the largest cities in the world, is implementing one of the most extensive primary healthcare accreditation projects ever conducted, covering 465 Basic Health Units (UBS) and reaching approximately seven million users of Brazil’s Unified Health System (SUS). This initiative is part of the municipal program called “Avança Saúde” and follows the methodology of the National...
Statistical Process Control (SPC) and its numerous extensions/generalisations focus primarily on process monitoring. This permits identification of out-of-control signals, which might be isolated out-of-control observations or a more persistent process aberration, but says nothing about remedying or controlling them. While isolated out-of-control signals require isolated interventions, a more...
The landscape of the pharmaceutical industry is evolving. From what was (and still is) a science-centered discipline, more awareness exists nowadays of the opportunities arising from exploring data-driven methodologies to conduct various key activities. In this regard, Chemometrics has been an old-standing ally of the pharmaceutical industry, allowing for real-time assessment of raw materials,...
Over the past decade, ongoing collaboration in research and technology transfer between faculty and professionals from the Ecuadorian Escuela Politécnica Nacional and Universidad Nacional de Chimborazo, and the Spanish Universidade da Coruña, has led to multiple developments in computational statistics aimed at solving real-world problems in industry and engineering.
Specifically, our work has...
Modern industrial systems generate real-time multichannel profile data for process monitoring and fault diagnosis. While most methods focus on detecting process mean shifts, identifying changes in the covariance structure is equally important, as process behavior often depends on the interdependence among multiple variables. However, monitoring covariance in multichannel profiles is complicated by...
We consider a framework which addresses the search for an optimal maintenance policy of a system by, on the one hand, using observed system state data to learn the degradation model and, on the other, using simulation from the learned model to obtain future states and rewards with which to update the value function and improve the current policy. We apply this framework to the maintenance of...
Predicting the Remaining Useful Life (RUL) of equipment is critical for enabling proactive maintenance and mitigating unexpected failures. Traditional RUL prediction methods often rely on direct regression from sensor data to failure time, resembling Monte Carlo (MC) approaches in reinforcement learning, which require full run-to-failure trajectories and can exhibit high variance. This paper...
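The MC-versus-TD contrast drawn above can be illustrated with the two classic value-update rules from reinforcement learning (a generic sketch, not the paper's actual estimator; the state labels, step size alpha, and discount gamma are illustrative):

```python
def mc_update(V, s, G, alpha=0.1):
    """Monte Carlo: update toward the full observed return G, which is
    only available once a complete run-to-failure trajectory has ended."""
    V[s] += alpha * (G - V[s])
    return V

def td0_update(V, s, s_next, r, alpha=0.1, gamma=1.0):
    """TD(0): bootstrap from the current estimate V[s_next], so values
    can be updated online, before a failure is actually observed."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V
```

The bootstrapping in `td0_update` is what removes the need for full run-to-failure trajectories, at the cost of bias from the current estimate; MC is unbiased but high-variance, as the abstract notes.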
There are typically three approaches to using statistical software in teaching. The first is to teach the statistical topic at hand without any use of software, then show how to apply methods using statistical software. This separation approach follows the idea that instruction should be software neutral. While a statistical topic is broader than a particular method, including ideas and...
In a client project focused on asset registration, we transitioned from a publicly available LLM (Gemini) to a locally hosted LLM to prevent the potential leakage of sensitive manufacturing and customer data. Due to hardware constraints (GPUs with a maximum of 12 GB VRAM), the performance of local LLMs was initially inferior to hosted models. Therefore, prompt optimization became crucial to...
Shiryaev’s change-point methodology is a powerful Bayesian tool for detecting persistent parameter shifts. It has certain optimality properties when the pre- and post-change parameters are known. In this work we introduce a self-starting version of the Shiryaev framework that can be employed for online change-point detection in short production runs. Our proposal will...
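For the known pre/post-change setup mentioned above, the Shiryaev posterior probability that a change has already occurred admits a simple one-pass recursion. A minimal Gaussian mean-shift sketch (background only; the self-starting version in the talk goes beyond this, and mu0, mu1, sigma and the geometric prior p are illustrative):

```python
import math

def shiryaev_posterior(xs, mu0=0.0, mu1=1.0, sigma=1.0, p=0.01):
    """Recursive Shiryaev update of the posterior probability that a
    change from N(mu0, sigma^2) to N(mu1, sigma^2) has occurred, with a
    geometric prior (parameter p) on the change point."""
    pi = 0.0
    out = []
    for x in xs:
        # Gaussian likelihood ratio f1(x)/f0(x) for a mean shift
        lr = math.exp((x - (mu0 + mu1) / 2) * (mu1 - mu0) / sigma ** 2)
        num = (pi + (1 - pi) * p) * lr
        pi = num / (num + (1 - pi) * (1 - p))
        out.append(pi)
    return out
```

A change is signalled when the posterior crosses a threshold chosen from the desired false-alarm tolerance.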
In pharmaceutical statistics, traditional outlier detection often focuses on univariate methods. However, a multivariate approach is essential for analysing complex datasets representing critical quality attributes, such as assay, dissolution, and disintegration time.
The Shiny-for-Python application described here employs advanced machine learning techniques, specifically Principal...
I will present examples of how I have analysed data, displayed the outcomes on LinkedIn, and the comments those posts received.
Examples include:
• UK Gas Price rip off
• Small Boats and Slogans
• Death in England and Wales and the winter fuel allowance
I also wish to discuss two other examples including:
• Global Warming, mobile phones, the use of AI and the law of energy.
...
The advent of Industry 5.0—characterized by its emphasis on resilient and sustainable technology integration—aims to reorient industrial production toward a more competitive model with a positive societal impact. Within this framework, the Joint Research Unit (CEMI) formed by the shipbuilding company Navantia and the Universidade da Coruña is focused on developing and validating advanced...
In manufacturing, the output of a measurement system is often used to classify products as conforming or nonconforming. Therefore, to ensure product quality, it is essential to utilize a suitable measurement system. In this regard, practitioners frequently employ various performance metrics to assess measurement systems, which are typically obtained through off-line studies involving experimental...
Electric batteries are often connected in parallel to ensure a wider power supply range to external electrical loads. Their condition is routinely monitored through the current measured when the batteries supply power. When the condition is adequate, the current is balanced throughout the system, with each battery contributing equally to the electrical load.
To ensure that monitoring focuses...
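The balance condition described above can be checked directly from the measured currents: each battery's share of the total load should stay near the equal share 1/n. A minimal sketch (the tolerance `tol` is an illustrative assumption, not part of the talk):

```python
def current_shares(currents):
    """Fraction of the total load carried by each battery."""
    total = sum(currents)
    return [i / total for i in currents]

def imbalance_flags(currents, tol=0.05):
    """Flag batteries whose load share deviates from the equal share
    1/n by more than tol (hypothetical tolerance)."""
    n = len(currents)
    return [abs(s - 1 / n) > tol for s in current_shares(currents)]
```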
In reservation-based services with volatile demand and competitive pricing pressures, dynamically optimizing prices is essential for revenue maximization. This paper introduces a data-driven pricing framework that integrates demand forecasting with stochastic optimization. We model customer arrivals using a non-homogeneous Poisson process, where expected demand is estimated through a Poisson...
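As a minimal stand-in for the intensity estimation step above (a piecewise-constant maximum-likelihood sketch rather than the Poisson regression the abstract describes; the horizon and bin count are illustrative), the time-varying arrival rate of a non-homogeneous Poisson process can be estimated from observed arrival times:

```python
def piecewise_intensity(arrival_times, horizon, n_bins):
    """MLE of a piecewise-constant intensity lambda(t) for a
    non-homogeneous Poisson process: arrivals per bin / bin width."""
    width = horizon / n_bins
    counts = [0] * n_bins
    for t in arrival_times:
        b = min(int(t / width), n_bins - 1)  # clamp t == horizon to last bin
        counts[b] += 1
    return [c / width for c in counts]
```

The fitted intensity then feeds the expected-demand term of the downstream pricing optimization.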
I will tell the story of a social media influencing mission to empower every scientist and engineer in the world with the tools of Statistical Design and Analysis of Experiments. I'll share compelling examples, talk about why I started this, and how I did it. Using visual explorations of impressions data you will see what we can learn about using online channels to promote the value of statistics.
Turboprop engines undergo regular inspections, yet continuous analysis of in-flight sensor data provides an opportunity for earlier detection of wear and degradation—well before scheduled maintenance. The choice of statistical method plays a crucial role in ensuring diagnostic accuracy and interpretability. In this study, we compare the performance of traditional parametric...
Nuclear fusion holds the promise of clean, virtually limitless energy. For fusion reactions to occur in the plasma, extreme temperatures must be reached – up to 150 million degrees Celsius. This constitutes an immense engineering challenge, demanding a high degree of accuracy. The Neutral Beam Injection (NBI) is one key technology enabling the auxiliary heating of the plasma. Fast neutral...
Purpose: Industrial applications increasingly rely on complex predictive models for process optimization and quality improvement. However, the relationship between statistical model performance and actual operational benefits remains insufficiently characterized. This research investigates when model complexity provides genuine business value versus statistical...
In this study, we investigate the economic impact of COVID-19 on employment within Italian firms. In particular, we analyse how employment levels were affected across different types of firms and assess the extent of the impact. We also examine the role of public subsidies provided during the COVID-19 period and evaluate the occupational mix between ‘flexible’ and ‘non-flexible’ employees....
In a world saturated with data and methodological research, the ability to communicate statistical insights clearly and compellingly is as critical as the models we build and the methods we develop. This talk explores the art and science of data storytelling through the three rhetorical lenses of Logos (logic and reason), Ethos (credibility and character), and Pathos (emotion and connection)...
Project monitoring practices have significantly evolved over the past decades. Initially grounded in traditional methodologies such as Earned Value Management (EVM), these practices have advanced to incorporate control charts and sophisticated techniques utilizing Artificial Intelligence (AI) and Machine Learning (ML) algorithms to predict final project costs and durations. Despite these...
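For reference, the core EVM efficiency indices the abstract builds on follow standard definitions (the numbers in the test are illustrative):

```python
def evm_indices(pv, ev, ac):
    """Standard Earned Value Management indices:
    CPI = EV / AC (cost performance), SPI = EV / PV (schedule performance).
    Values below 1 indicate over budget / behind schedule, respectively."""
    return ev / ac, ev / pv
```

Control charts and ML-based forecasters, as described above, extend these point indices with monitoring limits and predictions of final cost and duration.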
PRIM is a bump-hunting algorithm traditionally used in a supervised learning setting to find, guided by the data analyst, regions of the input-variable subspace that are associated with the highest or lowest occurrence of a target label of a class variable.
We present in this work a non-parametric PRIM-based algorithm that involves all the relevant attributes for rule generation...
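A one-dimensional caricature of PRIM's peeling step (for illustration only; the talk's algorithm is multivariate and non-parametric, and alpha and the support floor are illustrative): repeatedly remove a small fraction of the data from whichever end of the interval most increases the mean of the binary target.

```python
def peel_1d(points, alpha=0.2, min_support=0.25):
    """PRIM-style peeling on (x, y) pairs with binary y: shrink the
    x-interval from the end that most raises the target mean, until no
    peel helps or the support floor is reached. Returns (lo, hi)."""
    def mean_y(ps):
        return sum(y for _, y in ps) / len(ps)
    pts = sorted(points)
    n = len(pts)
    while len(pts) > max(2, min_support * n):
        k = max(1, int(alpha * len(pts)))
        candidates = [pts[k:], pts[:-k]]  # peel low end / peel high end
        best = max(candidates, key=mean_y)
        if mean_y(best) <= mean_y(pts):
            break
        pts = best
    return pts[0][0], pts[-1][0]
```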
This study presents analytical findings from the GreCO (Green Cultural Oases) project, funded under the European Urban Initiative (EUI). GreCO promotes sustainable cultural tourism in urban environments by leveraging digital innovation, local stakeholder collaboration, and intercultural engagement. Within this framework, a multilingual survey was conducted with over 350 respondents to examine...
Continuous manufacturing (CM) in the pharmaceutical sector integrates the various discrete stages of traditional batch production into a continuous process, significantly decreasing drug product manufacturing time. In CM, where all process units are directly linked, it is crucial to continuously monitor the current process state and maintain consistent product quality throughout...
As a reference frame, balanced factorial designs are used in this presentation because these designs are orthogonal for all linear models they can be used for. Orthogonality means that the experimental factors are mutually orthogonal (angles of 90°) and as such are independent and uncorrelated. As a consequence, the parameters of the fitted linear models are also independent, leading to...
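The orthogonality claim can be verified directly on the smallest case, a 2² full factorial in ±1 coding: the cross-product matrix X′X of the model matrix is diagonal, so the least-squares estimates of the effects are uncorrelated. A minimal sketch:

```python
# Model matrix for a 2^2 full factorial in +/-1 coding:
# columns = intercept, factor A, factor B, AB interaction.
X = [[1, a, b, a * b] for a in (-1, 1) for b in (-1, 1)]

def xtx(X):
    """Cross-product matrix X'X."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)]
            for i in range(p)]

G = xtx(X)
# All off-diagonal entries of G are zero: the columns (effects) are
# mutually orthogonal, so the fitted parameters are independent.
```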
Deep learning (DL) models are significantly impacted by label and measurement noise, which can degrade performance. Label noise refers to wrong labels (Y) attached to samples, whereas measurement noise refers to the samples (X) that are corrupted due to issues during their acquisition. We present a generic approach for efficient learning in the presence of such noise, without relying on ground...
ESBELT, a manufacturer of conveyor belts, was preparing to replace a critical machine in its production line and aimed to ensure a robust technology transfer. The machine fused multiple textile layers using a specific combination of temperature, air flow, tension and speed. Product quality was primarily evaluated by layer adherence, a critical-to-quality characteristic assessed destructively...
We study the problem of transforming a multi-way contingency table into an equivalent table with uniform margins and the same dependence structure. This is an old question which relates to recent advances in copula modeling for discrete random vectors. In this work, we focus on multi-way binary tables and develop novel theory to show how the zero patterns affect the existence of the transformation...
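For strictly positive two-way tables, the transformation to uniform margins can be computed by iterative proportional fitting: rescaling rows and columns leaves the odds ratios (the dependence structure) unchanged, and the zero patterns studied in the talk are precisely what can make the procedure fail. A minimal 2×2 sketch (function name and iteration count are illustrative):

```python
def ipf_uniform_margins(table, iters=100):
    """Rescale rows then columns so every row sums to 1/n_rows and every
    column to 1/n_cols; row/column scaling preserves all odds ratios."""
    n_r, n_c = len(table), len(table[0])
    T = [list(row) for row in table]
    for _ in range(iters):
        for i in range(n_r):
            s = sum(T[i])
            T[i] = [v / (s * n_r) for v in T[i]]
        for j in range(n_c):
            s = sum(T[i][j] for i in range(n_r))
            for i in range(n_r):
                T[i][j] /= s * n_c
    return T
```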