Statistics came of age when manufacturing was king. But today's industries are focused on information technology. Remarkably, a lot of our expertise transfers directly. This talk will discuss statistics and AI in the context of computational advertising, autonomous vehicles, large language models, and process optimization.
Supervised learning under measurement constraints presents a challenge in the field of machine learning. In this scenario, while predictor observations are available, obtaining response observations is arduous or cost-prohibitive. Consequently, the optimal approach involves selecting a subset of predictor observations, acquiring the corresponding responses, and subsequently training a...
We propose a generalized linear model for distributed multimodal data, where each sample contains multiple data modalities, each collected by an instrument. Unlike centralized methods that require access to all samples, our approach assumes that samples are distributed across several sites and that pooling the data is not allowed due to data-sharing constraints. Our approach constructs a set...
In their simplest form, orthogonal arrays (OAs) are experimental designs where all level-combinations of any two factors occur equally often. As a result, the main effects of the factors are orthogonal to each other. There are also more involved OAs for which the level-combinations of any three factors occur equally often. In such OAs, the main effects are orthogonal to each other as well as...
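To make the defining property concrete, here is a minimal Python sketch (the toy array and checker are ours, for illustration only, not part of the talk): it verifies that a small two-level array has strength two, i.e., that every pair of columns contains each level combination equally often.

```python
from itertools import combinations
from collections import Counter

# A toy strength-2 orthogonal array: 4 runs, 3 two-level factors.
oa = [
    [0, 0, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]

def is_strength_2(array, levels=2):
    """True if every pair of columns contains all level combinations equally often."""
    n_cols = len(array[0])
    for i, j in combinations(range(n_cols), 2):
        counts = Counter((row[i], row[j]) for row in array)
        if len(counts) != levels ** 2 or len(set(counts.values())) != 1:
            return False
    return True

print(is_strength_2(oa))  # True: the main effects are orthogonal to each other
```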
SVM Regression Oblique Trees: A Novel Approach to Regression Tasks. This technique combines feature selection based on predictor correlation and a weighted support vector machine classifier with a linear kernel. Evaluation on simulated and real datasets reveals the superior performance of the proposed method compared to other oblique decision tree models, with the added advantage of enhanced...
Orthogonal minimally aliased response surface (OMARS) designs permit the screening of quantitative factors at three levels using an economical number of runs. In these designs, the main effects are orthogonal to each other and to the quadratic effects and two-factor interactions of the factors, and these second-order effects are never fully aliased. Complete catalogs of OMARS designs with up...
After a rich history in medicine, randomized controlled trials (RCTs), both simple and complex, are in increasing use in other areas, such as web-based A/B testing and the planning and design of decisions. A main objective of RCTs is to estimate parameters, and contrasts in particular, while guarding against bias from hidden confounders. After careful definitions of classical entities...
In the era of Industry 4.0, ensuring the quality of Printed Circuit Boards (PCBs) is essential for maintaining high product quality, reliability, and reducing manufacturing costs. Anomaly detection in PCB production lines plays a critical role in this process. However, imbalanced datasets and the complexities of diverse data types pose significant challenges. This study explores the impact of...
The focus is on the homogeneity test that evaluates whether two multivariate samples come from the same distribution. The problem arises naturally in various applications, and many methods are available in the literature. Based on data depth, several tests have been proposed for this problem, but they may not be very powerful. In light of the recent development of data depth as an important...
The use of a statistical classifier can be limited by its conditional misclassification rates (i.e., false positive rate and false negative rate) even when the overall misclassification rate is satisfactory. When one or both conditional misclassification rates are high, a neutral zone can be introduced to lower and possibly balance these rates. In this talk the need for neutral zones will be...
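As a rough illustration of the idea (a minimal sketch with hypothetical thresholds, not the procedure developed in the talk), a score-based classifier with a neutral zone simply withholds a call when the score is ambiguous:

```python
import numpy as np

def classify_with_neutral_zone(scores, t_low, t_high):
    """Assign class 0 below t_low, class 1 above t_high, and no call in between.
    Widening the neutral zone lowers (and can help balance) the false positive
    and false negative rates, at the price of more withheld decisions."""
    scores = np.asarray(scores, dtype=float)
    labels = np.full(scores.shape, "neutral", dtype=object)
    labels[scores <= t_low] = "class 0"
    labels[scores >= t_high] = "class 1"
    return labels

print(classify_with_neutral_zone([0.05, 0.40, 0.60, 0.95], t_low=0.3, t_high=0.7))
# ['class 0' 'neutral' 'neutral' 'class 1']
```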
Stratification on important variables is a common practice in clinical trials, since ensuring cosmetic balance on known baseline covariates is often deemed a crucial requirement for the credibility of the experimental results. However, the actual benefits of stratification are still debated in the literature. Some authors have shown that it does not improve efficiency in large samples...
In advanced manufacturing processes, high-dimensional (HD) streaming data (e.g., sequential images or videos) are commonly used to provide online measurements of product quality. Although there exist numerous research studies for monitoring and anomaly detection using HD streaming data, little research has been conducted on feedback control based on HD streaming data to improve product quality,...
Much has been written about augmenting preliminary designs for first-order regression with additional runs to support quadratic models. This is a reasonable approach to practical sequential experimentation, allowing an early stop if the preliminary first-order result does not look promising. Central composite designs are especially well-suited to this (Box and Wilson, 1951), as all or part of...
Dental practices are small businesses. Like any other business, they need cash flow management and financial planning to be viable, if not highly profitable. What a lot of practices may not realize is that they are sitting on a treasure trove of data that can be used in more ways than plain accounting and financial forecasting. Here we focus on longitudinal data, such as the timing of each patient’s...
Drought is a major natural hazard which can cause severe consequences on agricultural production, the environment, the economy and social stability. Consequently, increasing attention has been paid to drought monitoring methods that can assist governments in implementing preparedness plans and mitigation measures to reduce the economic, environmental, and social impacts of drought. The...
In recent decades, machine learning and industrial statistics have moved closer to each other. CQM, a consultancy company, performs projects in supply chains, logistics, and industrial R&D that often involve building prediction models using techniques from machine learning. For these models, challenges persist, e.g. if the dataset is small, has a group structure, or is a time series. At the...
A tool for analysis of variation of qualitative (nominal) or semi-quantitative (ordinal) data obtained according to a cross-balanced design is developed based on one-way and two-way CATANOVA and ORDANOVA. The tool calculates the frequencies and relative frequencies of the variables, and creates the empirical distributions for the data. Then the tool evaluates the total data variation and its...
Gliomas are the most common form of primary brain tumors. Diffuse Low-Grade Gliomas (DLGG) are slow-growing tumors and are often asymptomatic during a long period. Eventually they progress to a higher grade, leading to the patient’s death. Treatments are surgery, chemotherapy and radiotherapy, with the aim of controlling tumor evolution. Neuro-oncologists estimate the tumor size evolution by delineating tumor...
Conditional Average Treatment Effect (CATE) is widely studied in medical contexts. It is one tool used to analyze causality. In the banking sector, interest in causal methods is increasing. As an example, one may be interested in estimating the average effect of a financial crisis on credit risk, conditionally on macroeconomic as well as internal indicators. On the other hand, transfer...
Classification precision is particularly crucial in scenarios where the cost of erroneous output is high, e.g. medical diagnosis, search engine results, product quality control, etc. A statistical model for analyzing classification precision from collaborative studies will be presented. Classification (categorical measurement) means that the object’s property under study is presented by each...
Despite the success of machine learning in the past several years, there has been an ongoing debate regarding the superiority of machine learning algorithms over simpler methods, particularly when working with structured, tabular data. To highlight this issue, we outline a concrete example by revisiting a case study on predictive monitoring in an educational context. In their work, the authors...
In most discrete choice experiments (DCEs), respondents are asked to choose their preferred alternative. But it is also possible to ask them to indicate the worst, or the best and worst alternative among the provided alternatives or to rank all or part of the alternatives in decreasing preference. In all these situations, it is commonly assumed that respondents only have strict preferences...
In spreading processes such as opinion spread in a social network, interactions within groups often play a key role. For example, we can assume that three members of the same family have a higher chance of persuading a fourth member to change their opinion than three friends of the same person who do not know each other, and hence do not belong to the same community. Conversely, in a...
Structural Equation Models (SEMs) are primarily employed as a confirmatory approach to validate research theories. SEMs operate on the premise that a theoretical model, defined by structural relationships among unobserved constructs, can be tested against empirical data by comparing the observed covariance matrix with the implied covariance matrix derived from the model parameters....
Lightning is a chaotic atmospheric phenomenon that is incredibly challenging to forecast accurately and poses a significant threat to life and property. Complex numerical weather prediction models are often used to predict lightning occurrences but fail to provide adequate short-term forecasts, or nowcasts, due to their design and computational cost. In the past decade, researchers have...
Existing control charts for Poisson counts are tailor-made for detecting changes in the process mean as long as the Poisson assumption is not violated. But if the mean changes together with the distribution family, the performance of these charts may deviate considerably from the expected out-of-control behavior. In this research, omnibus control charts for Poisson counts are developed, which are...
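For context, the classical Shewhart c-chart for Poisson counts with known in-control mean $\lambda_0$ uses the limits below; charts of this kind are exactly the ones whose behavior degrades when the distribution family changes along with the mean:

\[
\mathrm{UCL} = \lambda_0 + 3\sqrt{\lambda_0}, \qquad \mathrm{CL} = \lambda_0, \qquad \mathrm{LCL} = \max\bigl(0,\ \lambda_0 - 3\sqrt{\lambda_0}\bigr).
\]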
The aim of pattern matching is to identify specific patterns in historical time series data in order to predict future values. Many pattern matching methods are non-parametric and based on finding nearest neighbors. This type of method is founded on the assumption that past patterns can repeat and provide information about future trends. Most of the methods proposed in the literature are...
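A minimal Python sketch of the nearest-neighbor idea (our own toy implementation, not one of the specific methods reviewed here): match the most recent window against all historical windows and average the continuations of the k closest ones.

```python
import numpy as np

def nearest_pattern_forecast(series, window, horizon, k=3):
    """Forecast the next `horizon` values by averaging what followed the k
    historical windows closest (in Euclidean distance) to the latest window."""
    series = np.asarray(series, dtype=float)
    query = series[-window:]
    dists, futures = [], []
    for start in range(len(series) - window - horizon + 1):
        dists.append(np.linalg.norm(series[start:start + window] - query))
        futures.append(series[start + window:start + window + horizon])
    nearest = np.argsort(dists)[:k]
    return np.mean([futures[i] for i in nearest], axis=0)

# Toy usage on a noisy periodic series
rng = np.random.default_rng(0)
y = np.sin(0.3 * np.arange(200)) + 0.1 * rng.standard_normal(200)
print(nearest_pattern_forecast(y, window=10, horizon=5))
```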
The shifted (or two-parameter) exponential distribution is a well-known model for lifetime data with a warranty period. Apart from that, it is useful in modelling survival data with some flexibility due to its two-parameter representation. Control charts for monitoring a process that is modeled by a shifted exponential distribution have been studied quite extensively in the recent...
The problem of measuring the size distribution of ultrafine (nano- and submicron-sized) particles is important for determining the physical and chemical properties of aerosols, including their toxicity. We give a quick review of some statistical methods used in the literature to solve this problem, for instance an EM algorithm for the reconstruction of particle size distributions from diffusion battery...
Multivariate Singular Spectrum Analysis (MSSA) is a nonparametric tool for time series analysis widely used across finance, healthcare, ecology, and engineering. Traditional MSSA depends on singular value decomposition that is highly susceptible to outliers. We introduce a robust version of MSSA, named Robust Diagonalwise Estimation of SSA (RODESSA), that is able to resist both cellwise and...
This study examines the relationship between foreign affiliates and labour productivity in the construction and manufacturing sectors. Labour productivity is calculated using the EUKLEMS & INTANProd database of the Luiss Lab of European Economics, while data on foreign affiliates abroad are taken from Eurostat. Using data from 19 EU countries between 2010 and 2019, we demonstrate...
Our previous contribution to ENBIS included an introduction of BAPC ('Before and After correction Parameter Comparison'), a framework for explainable AI time series forecasting, which has previously been applied to logistic regression. An initially non-interpretable predictive model (such as a neural network) is used to improve the forecast of a classical time series 'base model'. Explainability...
This article constructs a control chart for monitoring the ratio of two variances within a bivariate-distributed population. For an in-control process, we assume the two in-control variances and the covariance of the bivariate-distributed population are known. Monitoring the ratio of the two variances is then equivalent to monitoring a difference of the two variances, because the ratio equals its target value exactly when the corresponding weighted difference of the variances equals zero. An unbiased estimator of the difference between the two...
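The equivalence can be written out explicitly (generic symbols; $c$ denotes the target value of the ratio):

\[
\frac{\sigma_1^2}{\sigma_2^2} = c \quad\Longleftrightarrow\quad \sigma_1^2 - c\,\sigma_2^2 = 0,
\]

so a chart for the ratio at target $c$ can be based on an estimator of the weighted difference $\sigma_1^2 - c\,\sigma_2^2$.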
Causality is a fundamental concept in the scientific learning paradigm. For this purpose, deterministic models are always desirable, but they are often infeasible due to a lack of knowledge. In such cases, empirical models fitted on process data can be used instead. Moreover, the advent of Industry 4.0 and the growing popularity of the Big Data movement have caused a recent shift in process...
In the pharmaceutical industry, the use of statistics has been largely driven by clinical development, an area where frequentist statistics have been and remain dominant. This approach has led to numerous successes when considering the various effective treatments available to patients today.
However, over time, Null Hypothesis Significance Testing (NHST) and related Type-I error thinking...
Linear model trees are regression trees that incorporate linear models in the leaf nodes. This preserves the intuitive interpretation of decision trees and at the same time enables them to better capture linear relationships, which is hard for standard decision trees. But most existing methods for fitting linear model trees are time consuming and therefore not scalable to large data sets. In...
In modern manufacturing processes, one may encounter processes composed of two or more critical input blocks having an impact on the Y-space. If these blocks follow a sequential order, any cause of variation in a particular block may be propagated to subsequent blocks. This is frequently observed when a first block of raw material properties entering a production process influences the performance...
Hyperspectral imaging is an instrumental method that yields images where each pixel contains information in a specific range of the electromagnetic spectrum. Initially used for military and satellite applications, hyperspectral imaging has expanded to agriculture, pharmaceuticals, and the food industry. In recent decades, there has been an increasing focus on such analytical...
In today’s fast-paced industrial landscape, the need for faster and more cost-effective research and development cycles is paramount. As experiments grow increasingly complex, with more factors to optimize, tighter budgetary and time constraints, and limited resources, the challenges faced by industry professionals are more pressing than ever before.
Although the optimal design of...
This work formulates model selection as an infinite-armed bandit problem, namely, a problem in which a decision maker iteratively selects one of an infinite number of fixed choices (i.e., arms) when the properties of each choice are only partially known at the time of allocation and may become better understood over time, via the attainment of rewards.
Here, the arms are machine learning...
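As a rough sketch of the framing (an epsilon-greedy toy under our own simplifying assumptions, not the paper's allocation policy; `sample_new_arm` and `evaluate` are hypothetical stand-ins for drawing and scoring a model configuration):

```python
import random

def infinite_armed_search(sample_new_arm, evaluate, budget, explore_prob=0.3):
    """With probability explore_prob draw a brand-new arm (model configuration);
    otherwise re-evaluate the current best arm to refine its estimated reward."""
    arms = []  # each arm: [configuration, total_reward, n_pulls]
    for _ in range(budget):
        if not arms or random.random() < explore_prob:
            arms.append([sample_new_arm(), 0.0, 0])
            arm = arms[-1]
        else:
            arm = max(arms, key=lambda a: a[1] / a[2])
        arm[1] += evaluate(arm[0])
        arm[2] += 1
    return max(arms, key=lambda a: a[1] / a[2])[0]

# Hypothetical usage: arms are regularization strengths, reward is noisy accuracy.
best = infinite_armed_search(
    sample_new_arm=lambda: random.uniform(0.01, 10.0),
    evaluate=lambda alpha: 1.0 / (1.0 + abs(alpha - 1.0)) + random.gauss(0.0, 0.05),
    budget=200,
)
print(best)
```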
Online outlier detection in multivariate settings is a topic of high interest in several scientific fields, with Hotelling's T² control chart being probably the most widely used method in practice. The problem becomes challenging, though, when we lack the ability to perform a proper phase I calibration, as in short runs or in cases where online inference is requested from the...
With the routine collection of energy management data at the organisational level comes a growing interest in using data to identify opportunities to improve energy use. However, changing organisational priorities can result in data streams which are typically very messy, with missing periods, poor resolution, and structures that are challenging to contextualise. Using operational...
Two-level designs are widely used for screening experiments where the goal is to identify a few active factors which have major effects. We apply the model-robust Q_B criterion for the selection of optimal two-level designs without the usual requirements of level balance and pairwise orthogonality. We provide a coordinate exchange algorithm for the construction of Q_B-optimal designs for the...
Process Analytical Technologies have been the key technology for quality maintenance and improvement in the process industry. Quality is, however, only one indicator of process excellence: Safety, Cost, Delivery, Maintenance and specifically Environment are strongly complementary determinants of process value. The rising societal demands on the sustainability of the contemporary process industry have made...
Industrial experiments often have a budget which directly translates into an upper limit on the number of tests that can be performed. However, in situations where the cost of the experimental tests is unequal, there is no one-to-one relation between the budget and the number of tests. In this presentation, we propose a design construction method to generate optimal experimental designs for...
AdDownloader is a Python package for downloading advertisements and their media content from the Meta Online Ad Library. With a valid Meta developer access token, AdDownloader automates the process of downloading relevant ads data and storing it in a user-friendly format. Additionally, AdDownloader uses individual ad links from the downloaded data to access each ad's media content (i.e. images...
Industry 4.0 contexts generate large amounts of data holding potential value for advancing product quality and process performance. Current research already uses data-driven models to refine theoretical models, but integrating mechanistic understanding into data-driven models is still overlooked. This represents an opportunity to harness extensive data alongside fundamental principles.
We...
Squats are localized material failures of railway tracks which can lead to critical effects when not detected or removed in time. Investigations in recent years (cf., e.g., [1]–[4]) pointed out the severity of this problem, although relevant questions about root causes remain open. A main reason for this situation may be the challenging detectability of squat genesis as well as...
We discuss the problem of active learning in regression scenarios. In active learning, the goal is to provide criteria that the learning algorithm can employ to improve its performance by actively selecting data that are most informative.
Active learning is usually thought of as being a sequential process where the training set is augmented one data point at a time. Additionally, it is...
Process stability is usually defined via an iid assumption about the data. A violation of stability, however, requires some concrete model, such as a changepoint, a linear trend, outliers, distributional models, or positive or negative autocorrelation. These violations are often tested separately, and not all of the possible modes of instability can always be taken into account. We suggest a likelihood-based...
Different data difficulty factors (e.g., class imbalance, class overlap, the presence of outliers and noisy observations, and difficult border decisions) make classification tasks challenging in many practical applications and are hot topics in the domains of pattern recognition, machine learning and deep learning. Data complexity factors have been widely discussed in the specialized literature from...
Anomaly detection identifies cases that deviate from a common behavior or pattern in data streams. It is of great interest in a variety of fields, e.g., from biology recognizing uncommon observations in genetic data, to financial sectors identifying frauds through unusual economic activities. Detection of anomalies can be formulated as a binary classification problem, distinguishing between...
In various global regions, In Vitro Diagnostic Medical Devices (IVDs) must adhere to specific regulations in order to be marketed. To obtain approval from entities such as the U.S. Food and Drug Administration (FDA), the In Vitro Diagnostic Medical Devices Regulation (IVDR) in Europe, Health Canada, or Japanese regulatory bodies, manufacturers are required to submit Technical Documentation to...
Bioprinting is an innovative set of technologies derived from additive manufacturing, with significant applications in tissue engineering and regenerative medicine. The quality of printed constructs is commonly measured in terms of shape fidelity through a procedure known as printability assessment. However, the cost of experimental sampling and the complexity of various combinations of...
The concepts of null space (NS) and orthogonal space (OS) have been developed in independent contexts and with different purposes.
The former arises in the inversion of Partial Least Squares (PLS) regression models, as first proposed by Jaeckle & MacGregor [1], and represents a subspace in the latent space within which variations in the inputs do not affect the prediction of the outputs. The...
When analyzing sensor data, it is important to distinguish between environmental effects and actual defects of the structure. Ideally, sensor data behavior can be explained and predicted by environmental effects, for example via regression. However, this is not always the case, and explicit formulas are often unavailable. Then, comparing the behavior of environmental and sensor data can help to...
Forecasting is of the utmost importance to the integration of renewable energy into power systems and electricity markets. Wind power fluctuations at horizons of a few minutes ahead particularly affect the system balance and are most significant offshore. Therefore, we focus on short-term forecasting of offshore wind energy.
Since forecasts characterize but do not eliminate uncertainty,...
Our research addresses the industrial challenge of minimising production costs in an undiscounted, continuing, partially observable setting. We argue that existing state-of-the-art reinforcement learning algorithms are unsuitable for this context. We introduce Clipped Horizon Average Reward (CHAR), a method tailored for undiscounted optimisation. CHAR is an extension applicable to any...
An important axiom in innovation is “Fail early, fail often, but learn from the failures.” This talk discusses an academic-industrial statistical engineering project that initially had good prospects for success but ultimately provided virtually no benefit to the industrial partner although it did produce a nice dissertation for the PhD student assigned to the project. It is crucial to note...
In this presentation, we present a case study that results from a multi-stage project supported by NASA’s Engineering Safety Center (NESC) where the objective was to assess the safety of composite overwrapped pressure vessels (COPVs). The analytical team was tasked with devising a test plan to model stress rupture failure risk in carbon fiber strands that encase the COPVs with the goal of...
The aim of AI based on machine learning is to generalize information about individuals to an entire population. And yet...
- Can an AI leak information about its training data?
- Since the answer to the first question is yes, what kind of information can it leak?
- How can it be attacked to retrieve this information?
To emphasize AI vulnerability issues, Direction Générale de l’Armement...
In data-driven Structural Health Monitoring (SHM), a key challenge is the lack of availability of training data for developing algorithms which can detect, localise and classify the health state of an engineering asset. In many cases, it is additionally not possible to enumerate the number of operational or damage classes prior to operation, so the number of classes/states is unknown. This...
Manufacturing processes are systems composed of multiple stages that transform input materials into final products. Drawing inferences about the behavior of these systems for decision-making requires building statistical models that can define the flow from input to output. In the simplest scenario, we can model the entire process as a single-stage relationship from input to output. In the...
Structural Health Monitoring (SHM) is increasingly applied in civil engineering. One of its primary purposes is detecting and assessing changes in structure conditions to reduce potential maintenance downtime. Recent advancements, especially in sensor technology, facilitate data measurements, collection, and process automation, leading to large data streams. We propose a function-on-function...
In this presentation, we provide an overview of deep learning applications in electricity markets, focusing on several key areas of forecasting. First, we discuss state-of-the-art methods for forecasting electricity demand, including Generalised Additive Models (GAMs), which inspired the work that follows. Second, we look at multi-resolution forecasting, which uses data at high- and...
The International Statistical Engineering Association (ISEA) defines statistical engineering as "the discipline dedicated to the art and science of solving complex problems that require data and data analysis." Statistical Engineering emphasizes the importance of understanding the problem and its context before developing a problem-solving strategy. While this step may appear obvious, it is...
Multi-way data extend two-way matrices to a higher-dimensional tensor. In many fields, it is relevant to pursue the analysis of such data by keeping it in its initial form without unfolding it into a matrix. Often, multi-way data are explored by means of dimensional reduction techniques. Here, we study the Multilinear Principal Component Analysis (MPCA) model, which expresses the multi-way...
It is well-known that real data often contain outliers. The term outlier typically refers to a case, corresponding to a row of the $n \times d$ data matrix. More recently, cellwise outliers have also been considered. These are suspicious cells (entries) that can occur anywhere in the data matrix. Even a relatively small proportion of outlying cells can contaminate over half the rows, which is...
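A back-of-the-envelope calculation shows why: if each cell is outlying independently with probability $\varepsilon$, a row of $d$ cells contains at least one outlying cell with probability

\[
1 - (1 - \varepsilon)^d,
\]

so, for example, $\varepsilon = 0.05$ and $d = 20$ already give $1 - 0.95^{20} \approx 0.64$, i.e., roughly 64% of the rows are contaminated.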
Project and problem-based learning is becoming increasingly important in teaching. In statistics courses in particular, it is important not only to impart statistical knowledge, but also to keep an eye on the entire process of data analysis. This can best be achieved with case studies or data analysis projects. In the IPPOLIS project, we are developing a software learning tool that allows...
In this talk, the problem of selecting a set of design points for universal kriging, which is a widely used technique for spatial data analysis, is further investigated. We are interested in optimal designs for prediction and present a new design criterion that aims at simultaneously minimizing the variation of the prediction errors at various points. This optimality criterion is based on...
Reinforcement learning proposes a flexible framework for tackling problems where data are gathered in a dynamic context: actions have an influence on future states. The classical reinforcement learning paradigm depends on a Markovian hypothesis: the observed states depend on past states only through the last state and action. This condition may be too restrictive for real-world...
The "DOE Marble Tower" is a modular 3D-printed experiment system for teaching Design of Experiments. I designed it to solve one primary weakness of most DOE exercises, namely to prevent the ability of the experimenter to simply look at the system to figure out what each factor does. By hiding the mechanics, the DOE Marble Tower feels much more like real processes where the only way to know the...
Finding an optimal experimental design is computationally challenging, especially in high-dimensional spaces. To tackle this, we introduce the NeuroBayes Design Optimizer (NBDO), which uses neural networks to find optimal designs for high-dimensional models, by reducing the dimensionality of the search space. This approach significantly decreases the computational time...
Reinforcement Learning (RL) has emerged as a pivotal tool in the chemical industry, providing innovative solutions to complex challenges. RL is primarily utilized to enhance chemical processes, improve production outcomes, and minimize waste. By enabling the automation and real-time optimization of control systems, RL aims to achieve optimal efficiency in chemical plant operations, thereby...
Over the years I've seen diverse examples of fun elements in teaching statistics at ENBIS: paper helicopters and catapults, of course, or candle and water-bead projects for hands-on experience with DoE. But I also vividly remember a Lego assembly competition used to explain control charts and process control.
Fun parts boost motivation and serve as anchors to remember...
The integration of multimodal artificial intelligence (AI) in warehouse monitoring offers substantial improvements in efficiency, accuracy, and safety. This approach leverages diverse data sources, including visual and speech sensors, to provide comprehensive monitoring capabilities. Key challenges include the fusion of heterogeneous data streams, which requires sophisticated algorithms to...
Accelerated degradation tests (ADTs) are widely used to assess lifetime information under normal use conditions for highly reliable products. For the accelerated tests, two basic assumptions are that changing stress levels does not affect the underlying distribution family and that there is stochastic ordering for the life distributions at different stress levels. The acceleration invariance...
Storage of spare parts is one of the basic tasks faced by industry. Mathematical models, such as Crow-AMSAA (known in the statistical literature as the power-law nonhomogeneous Poisson process), allow us to estimate demand based on incoming data. Unfortunately, the amount of data is limited in the case of parts with high reliability, which is why the estimation is inaccurate. Bayesian...
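For reference, the power-law NHPP underlying Crow-AMSAA has mean function and intensity

\[
\Lambda(t) = \alpha t^{\beta}, \qquad \lambda(t) = \alpha \beta\, t^{\beta - 1}, \qquad \alpha, \beta > 0,
\]

so that the expected number of events (here, demands) by time $t$ is $\Lambda(t)$; $\beta < 1$ corresponds to a decreasing and $\beta > 1$ to an increasing rate.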
Multispectral imaging, enhanced by artificial intelligence (AI), is increasingly applied in industrial settings for quality control, defect detection, and process optimization. However, several challenges hinder its widespread adoption. The complexity and volume of multispectral data necessitate advanced algorithms for effective analysis, yet developing these algorithms is resource-intensive....
Multivariate EWMA control charts were introduced by Lowry et al. in 1992 and became a popular and effective tool for monitoring multivariate data. Multi-stream data are closely related to this framework: in both cases, the correlation between the components or the respective streams is considered. However, whereas the multivariate EWMA chart deploys a (Mahalanobis) distance in the...
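For reference, the MEWMA scheme of Lowry et al. (1992) smooths the $p$-variate observations and charts a Mahalanobis-type distance:

\[
\mathbf{Z}_t = \lambda \mathbf{X}_t + (1 - \lambda)\,\mathbf{Z}_{t-1}, \qquad 0 < \lambda \le 1, \qquad
T_t^2 = \mathbf{Z}_t^{\top} \boldsymbol{\Sigma}_{\mathbf{Z}_t}^{-1} \mathbf{Z}_t,
\qquad
\boldsymbol{\Sigma}_{\mathbf{Z}_t} = \frac{\lambda\bigl[1 - (1-\lambda)^{2t}\bigr]}{2 - \lambda}\,\boldsymbol{\Sigma},
\]

with a signal whenever $T_t^2$ exceeds a control limit $h$.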
Acceptance sampling plays a vital role in quality control. It is a common technique employed across various industries to assess the quality of products. The decision to accept or reject a lot depends upon the inspection of a random sample from that lot. However, traditional approaches often overlook valuable prior knowledge of product quality. Moreover, the existing Bayesian literature...
In the pharmaceutical industry, there are strict requirements on the presence of contaminants inside single-use syringes (so-called unijects). Quality management systems include various methods such as measuring weight, manual inspection or vision techniques. Automated and accurate techniques for quality inspection are preferred, reducing the costs and increasing the speed of production.
...
In modern industrial settings, the complexity of quality characteristics necessitates advanced statistical methods using functional data. This work extends the traditional Exponentially Weighted Moving Average (EWMA) control chart to address the statistical process monitoring (SPM) of multivariate functional data, introducing the Adaptive Multivariate Functional EWMA (AMFEWMA). The AMFEWMA...
A repairable system can be reused after repairs, and data from such systems often exhibit cyclic patterns. However, as seen in the charge-discharge cycles of a battery, where capacity decreases with each cycle, the system's performance may not fully recover after each repair. To address this issue, the trend renewal process (TRP) transforms periodic data using a trend function to ensure the...
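The TRP construction can be stated compactly (standard formulation; $F$ denotes the renewal distribution): event times $T_1 < T_2 < \dots$ follow a TRP with trend function $\Lambda(\cdot)$ if the transformed interarrival times are iid,

\[
\Lambda(T_1),\; \Lambda(T_2) - \Lambda(T_1),\; \Lambda(T_3) - \Lambda(T_2),\; \dots \;\overset{\text{iid}}{\sim}\; F,
\]

so a suitable $\Lambda$ absorbs the imperfect-repair trend and reduces the cyclic data to a renewal process.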
Counting processes occur very often in scientific and technological problems. The concept of numerosity and, consequently, the counting of a number of items are at the basis of many high-level measurements, in fields such as time and frequency, optics, ionizing radiation, microbiology and chemistry. Also, in conformity assessment and industrial quality control, as well...
At a time when Artificial Intelligence and Machine Learning algorithms are taking over the analysis of our needs in product development, it is still important to remember where humans have to question and control how new product designs handle the aspects of variation and uncertainty.
One part is the mapping of the variation of all aspects of design and production...
Pest insects threaten agriculture, reducing global crop yields by 40% annually and causing economic losses exceeding $70 billion, according to the FAO. Increasing pesticide use not only affects pest species but also beneficial ones. Consequently, precise insect population monitoring is essential to optimize pesticide application and ensure targeted interventions.
In today's AI-driven era,...
Statistical process monitoring is of vital importance in various fields such as biosurveillance, data streams, etc. This work presents a non-parametric monitoring process aimed at detecting changes in multidimensional data streams. The non-parametric monitoring process is based on the use of convex hulls for constructing appropriate control charts. Results from applying the proposed method are...
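A minimal sketch of the core geometric test (our illustration with SciPy and synthetic reference data, not the full control-charting procedure): new points falling outside the convex hull of the Phase I sample are flagged.

```python
import numpy as np
from scipy.spatial import ConvexHull, Delaunay

# Phase I: build the convex hull of in-control reference data (synthetic here).
rng = np.random.default_rng(1)
reference = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=500)
hull = Delaunay(reference[ConvexHull(reference).vertices])

# Phase II: a point is flagged when it lies outside the hull
# (find_simplex returns -1 for points outside the triangulated region).
new_points = np.array([[0.5, -0.2], [4.0, 4.0]])
print(hull.find_simplex(new_points) < 0)  # [False  True]
```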
Quality by Design (QbD) has emerged as a pivotal framework in the pharmaceutical industry, emphasizing proactive approaches to ensure product quality. Central to QbD is the identification of a robust design space, encompassing the range of input variables and process parameters that guarantee pharmaceutical product quality. In this study, we present a comparative analysis of random walk...
The use of composite materials has been increasing in all production industries, including the aviation industry, due to their strength, lightness, and design flexibility. The manufacturing of composite materials concludes with their curing in autoclaves, which are heat and pressure ovens. The autoclave curing cycle, in which a batch of materials is cured in the autoclave, consists of three...
As a collaborative statistician you have been charged with completing a complicated toxicology analysis regarding levels of harmful chemicals in groundwater. At the conclusion of your presentation, an audience member asks, "So, should my cows drink the water?" At least half the audience nods and comments that they, too, would like to know the answer to that question. Clearly, something went...
Online experimentation is a way of life for companies involved in information technology and e-commerce. These experiments allocate visitors to a website to different experimental conditions to identify conditions that optimize important performance metrics. Most online experiments are simple two-group comparisons with complete randomization. However, there is great potential for improvement...
Chemical and physical stability of drug substances and drug products are critical in the development and manufacturing of pharmaceutical products. Classical stability studies, conducted under defined storage conditions of temperature and humidity and in the intended packaging, are resource intensive and are a major contributor to the development timeline of a drug product. To provide support...
Mixture choice experiments investigate people's preferences for products composed of different ingredients. To ensure the quality of the experimental design, many researchers use Bayesian optimal design methods. Efficient search algorithms are essential for obtaining such designs, yet research in the field of mixture choice experiments is still not extensive. Our paper pioneers the use of a...
Waste lubricant oil (WLO) is a hazardous residue that is preferably recovered through a regeneration process, promoting a sustainable circular economy. WLO regeneration is only viable if the WLO does not coagulate in the equipment. Thus, to prevent process shutdowns, the WLO’s coagulation potential is assessed offline in a laboratory through an alkaline treatment. This procedure is...
This talk ties in with the previous two talks in the session: the story and data are from one of the series of cases discussed by Froydis Bjerke, from Animalia, Norway, and the communication focus follows guidelines provided by Jennifer Van Mullekom.
The issues that arise in the case study itself include industrial statistics classics: “is the expensive external laboratory test really better...
In the fishing industry, maintaining the quality of fish such as the Peruvian anchovy (Engraulis ringens), used primarily for fishmeal and oil, is critical. The condition and freshness of the fish directly influence production outcomes and the final product's quality. Traditional methods for assessing fish freshness, though precise, are often too costly and time-consuming for frequent...
Powders are ubiquitous in the chemical industry, from pharmaceutical powders for tablet production to food powders like sugar. In these applications, powders are often stored in silos, where the powder builds up stress under its own weight. The Janssen model describes this build-up, but the model has unknown parameters that must be estimated from experimental data. This parameter estimation...
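For reference, the Janssen model for a cylindrical silo of diameter $D$ gives the vertical stress at depth $z$ in the familiar saturating form (standard formulation; $\mu$ is the wall friction coefficient and $K$ the lateral pressure ratio, typical unknowns in the estimation problem):

\[
\sigma(z) = \frac{\rho g D}{4 \mu K}\left(1 - e^{-4 \mu K z / D}\right),
\]

so the stress saturates at $\rho g D / (4\mu K)$ rather than growing linearly with depth.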
As businesses increasingly rely on machine learning models to make informed decisions, developing accurate and reliable models is critical. Obtaining curated and annotated data is essential for the development of these predictive models. However, in many industrial contexts, data annotation represents a significant bottleneck to the training and deployment of predictive models. Acquiring...
The popular ENBIS LIVE session is again on the program!
ENBIS LIVE 2024 will be hosted by Christian Ritter and Jennifer Van Mullekom.
This is a session in which three volunteers present open problems and the audience discusses them. It's a special occasion where we can all work together, either by providing useful suggestions or by gaining a deeper understanding. In this session,...
Version 18 of JMP and JMP Pro are being released in Spring 2024, bringing a host of new features useful to scientists and engineers in industry and academia. This presentation will focus on some key extensions and improvements: Besides an improved user experience based on a new Columns Manager for easier data management or Platform Presets for creating and reusing customized report templates,...
There is a common perception that bringing statistical innovation into highly regulated industries, such as pharmaceutical companies, is a hard mission. Often, due to legal constraints, the statistical innovation in the nonclinical space is not obvious to the outer world. In our discussion panel we would like to discuss the challenges we face as industrial statisticians working in...
In stratified designs, restricted randomization is often due to budget or time constraints. For example, if a factor is difficult to change and changing its level is expensive, the tests in a design are grouped into blocks so that within each block the level of the difficult factor is kept constant. Another example appears in agriculture, where some factors may need to be applied to larger...
The rapid progress in artificial intelligence models necessitates the development of innovative real-time monitoring techniques with minimal computational overhead. Particularly in machine learning, where artificial neural networks (ANNs) are commonly trained in a supervised manner, it becomes crucial to ensure that the learned relationship between input and output remains valid during the...
The online quality monitoring of a process with low-volume data is a very challenging task, and attention is most often placed on detecting when some of the underlying (unknown) process parameter(s) experience a persistent shift. Self-starting methods, both in the frequentist and the Bayesian domain, aim to offer a solution. Adopting the latter perspective, we propose a general closed-form...
Flow cytometry is a technique used to analyze individual cells or particles contained in a biological sample. The sample passes through a cytometer, where the cells are irradiated by a laser, causing them to scatter and emit fluorescent light. A number of detectors then collect and analyze the scattered and emitted light, producing a wealth of quantitative information about each cell (cell...
Many measurement system capability studies investigate two components of the measurement error, namely repeatability and reproducibility. Repeatability is used to denote the variability of measurements due to gauge, whereas reproducibility is the variability of measurements due to different conditions such as operators, environment, or time. A gauge repeatability and reproducibility (R&R)...
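In equation form, the usual gauge R&R decomposition reads

\[
\sigma^2_{\text{gauge}} = \sigma^2_{\text{repeatability}} + \sigma^2_{\text{reproducibility}}, \qquad
\sigma^2_{\text{total}} = \sigma^2_{\text{part}} + \sigma^2_{\text{gauge}},
\]

and an R&R study estimates these components, typically from a crossed design of parts, operators, and repeated measurements.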
During the past few decades, it has become necessary to develop new tools for exploiting and analysing the ever-increasing volume of data. This is one of the reasons why Functional Data Analysis (FDA) has become very popular in a constantly growing number of industrial, societal and medical applications. FDA is a branch of statistics that deals with data that can be represented as functions....
The Advanced Manufacturing Research Centre has invested heavily in AI for manufacturing and has seen success in many applications, including process monitoring, knowledge capture and defect detection. Despite the success in individual projects, the AMRC still has few experts in data science and AI and currently has no framework in place to enable wider adoption of AI nor to ensure the quality...
In order to evaluate the performance of companies, the focus is shifting from purely quantitative (financial) information to qualitative (textual) information. Corporate annual reports are comprehensive documents designed to inform investors and other stakeholders about a company's performance in the past year and its goals for the coming years. We have focused on the corporate sustainability...
Traffic flow estimation plays a key role in the strategic and operational planning of transport networks. Although the amplitude and peak times of the daily traffic flow profiles change from location to location, some consistent patterns emerge within urban networks. In fact, the traffic volumes of different road segments are correlated with each other from spatial and temporal perspectives....
The management of the COVID-19 pandemic, especially during the years 2020 and 2021, highlighted a serious shortage at all levels and in the majority of countries around the world.
Some countries reacted slightly better, having faced similar epidemics in their recent past, but obviously this was not enough, since the flows of people worldwide are now so huge that it makes little sense to make...
Kernel Principal Component Analysis (KPCA) extends linear PCA from a Euclidean space to data provided in the form of a kernel matrix. Several authors have studied its sensitivity to outlying cases and have proposed robust alternatives, as well as outlier detection diagnostics. We investigate the behavior of kernel ROBPCA, which relies on the Stahel-Donoho outlyingness in feature space...
We formulate a semiparametric regression approach to short-term prediction (48- to 72-hour-ahead horizons) of electricity prices in the Czech Republic. It is based on a complexity-penalized spline implementation of GAM; hence it allows for flexible modeling of the dynamics of the process and of important details of the hourly + weekly periodic components (which are salient for both point prediction and its...
In recent years, significant progress has been made in setting up decision support systems based on machine learning exploiting very large databases. In many research or production environments, the available databases are not very large, and the question arises as to whether it makes sense to rely on machine learning models in this context.
Especially in the industrial sector, designing...
In the semiconductor industry it is required that high-tech equipment has a large uptime due to large costs of production losses. As a consequence, it is important to have accurate reliability predictions of parts of such equipment, so that there are sufficient spare parts available. This is not a trivial task since high-tech equipment may consist of thousands of parts.
It is common in the...
Prognostics of cutting tools health is an important and challenging task in manufacturing industry. The main objective of prognostics is to examine the ability of the cutting tool to perform its function throughout its expected life and determine its remaining useful life (RUL). An accurate estimate of RUL will aid in maximizing the utilization of the cutting tool, improve quality performance,...
In Industry 4.0 factories, innovative prediction tools are adopted so that data can be systematically processed into information that can explain uncertainties and support decisions. Predictive manufacturing systems begin with acquiring data from monitored assets using appropriate sensors to extract various signals. These signals can then be integrated with historical data into extensive...
This paper compares measurements from a regular track measurement car and an onboard measurement system mounted on a regular passenger train car. The measurement systems were compared as an experimental instrument to assess a maintenance action. The experiment involved frequent pre- and post-maintenance measurements from onboard mounted equipment to assess short-term effects, while more...
Tony Greenfield said, ‘That is my challenge: Tell the world, outside your circle, of work you have done, and done successfully because you used statistics.’
Often, outside our circle, if you mention the word 'statistics', the reply is 'There are three kinds of lies: lies, damned lies, and statistics.'
We use social media to communicate; this includes LinkedIn, a network for...
Count data with excess zeros are commonly encountered in various scientific fields such as public health, insurance, economics, and engineering. To handle this issue, zero-inflated count models such as the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models are widely used.
In the context of regression models, it can be beneficial to incorporate uncertain prior...
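For reference, the ZIP model mixes a point mass at zero with a Poisson component (standard formulation; $\pi$ is the zero-inflation probability):

\[
\Pr(Y = 0) = \pi + (1 - \pi)\, e^{-\lambda}, \qquad
\Pr(Y = k) = (1 - \pi)\, \frac{e^{-\lambda} \lambda^{k}}{k!}, \quad k = 1, 2, \dots
\]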