Feature engineering is the process of creating, selecting, and transforming input variables (features) from raw data so that machine learning (ML) models can use them effectively. It translates domain knowledge and raw signals into structured numerical or categorical representations that algorithms can work with.
It typically includes activities such as:
– Selecting which raw data fields or signals to use
– Cleaning and standardizing values (units, ranges, formats)
– Aggregating measurements over time or batches
– Deriving new variables from existing ones (e.g., ratios, deltas, rolling statistics)
– Encoding categorical values into numerical form
– Normalizing or scaling features to appropriate ranges
Feature engineering is usually performed before model training and is often captured in a repeatable pipeline so that the same transformations can be applied consistently to new data.
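As a minimal sketch of such a repeatable pipeline, the Python example below learns its scaling and encoding parameters once from training rows and then applies the same transformations to new data. The class name `FeaturePipeline` and the field names (`temp_c`, `pressure_bar`, `speed_rpm`, `shift`) are hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


class FeaturePipeline:
    """Learns scaling and encoding parameters once, then reuses them."""

    def fit(self, rows):
        temps = [r["temp_c"] for r in rows]
        self.temp_mean = mean(temps)
        self.temp_std = pstdev(temps) or 1.0  # fall back to 1.0 if all values are equal
        self.shifts = sorted({r["shift"] for r in rows})
        return self

    def transform(self, rows):
        out = []
        for r in rows:
            features = {
                # Scaling: z-score using the parameters learned in fit()
                "temp_z": (r["temp_c"] - self.temp_mean) / self.temp_std,
                # Derived variable: ratio of two raw fields
                "pressure_per_speed": r["pressure_bar"] / r["speed_rpm"],
            }
            # One-hot encoding; categories not seen in fit() yield all zeros
            for s in self.shifts:
                features[f"shift_{s}"] = 1 if r["shift"] == s else 0
            out.append(features)
        return out


# Illustrative data only (not real process values)
train = [
    {"temp_c": 180.0, "pressure_bar": 2.0, "speed_rpm": 100.0, "shift": "A"},
    {"temp_c": 200.0, "pressure_bar": 3.0, "speed_rpm": 150.0, "shift": "B"},
]
pipeline = FeaturePipeline().fit(train)

new_rows = [{"temp_c": 190.0, "pressure_bar": 2.5, "speed_rpm": 125.0, "shift": "A"}]
new_features = pipeline.transform(new_rows)
```

Separating `fit` from `transform` is one common way to keep the transformations consistent: the parameters come from training data only, so new data is scaled and encoded exactly as the model saw it during training.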
In manufacturing and other industrial domains, feature engineering commonly operates on:
– Process data from PLCs, historians, and OT systems (temperatures, pressures, speeds)
– MES data (work orders, material genealogy, routing steps, operator IDs)
– Quality data (in-process and final test measurements, SPC statistics)
– Maintenance data (run-time counters, alarms, failure codes)
Examples include:
– Converting second-by-second sensor traces into summary statistics per batch or lot
– Calculating time since last maintenance or time-in-state per machine
– Deriving features such as yield, scrap rate, or rework count per order
– Encoding production route, shift, or product family as model-ready features
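The examples above can be sketched in a few lines of Python. This is an illustration under assumed data shapes, not a reference implementation: the function names, the `(batch_id, value)` reading format, and the sample values and timestamps are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def summarize_by_batch(readings):
    """Collapse (batch_id, value) sensor samples into per-batch statistics."""
    grouped = defaultdict(list)
    for batch_id, value in readings:
        grouped[batch_id].append(value)
    return {
        batch_id: {
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
            "range": max(values) - min(values),
        }
        for batch_id, values in grouped.items()
    }


def hours_since_last_maintenance(event_time, maintenance_times):
    """Hours since the latest maintenance at or before event_time, or None."""
    earlier = [t for t in maintenance_times if t <= event_time]
    if not earlier:
        return None
    return (event_time - max(earlier)).total_seconds() / 3600.0


def scrap_rate(scrapped, produced):
    """Fraction of produced units scrapped, or None if nothing was produced."""
    return scrapped / produced if produced else None


# Illustrative data only
readings = [("B1", 180.0), ("B1", 182.0), ("B1", 184.0), ("B2", 175.0), ("B2", 177.0)]
batch_stats = summarize_by_batch(readings)

maintenance = [datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 3, 8, 0)]
hours = hours_since_last_maintenance(datetime(2024, 1, 3, 20, 0), maintenance)

rate = scrap_rate(scrapped=5, produced=200)
```

Returning `None` when no earlier maintenance exists, or when nothing was produced, keeps missing information explicit instead of hiding it behind a default such as zero.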
In regulated environments, feature engineering steps are often:
– Documented as part of the data pipeline design
– Version-controlled alongside model code
– Subject to change control and impact assessment when modified
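One possible way to make changes to a feature definition detectable is to fingerprint its specification, as in the sketch below. This is an assumed convention for illustration, not a regulatory requirement or an established standard; the function `spec_fingerprint` and the specification fields are hypothetical.

```python
import hashlib
import json


def spec_fingerprint(spec):
    """Return a stable SHA-256 hash of a feature specification.

    Keys are sorted so that the same content always yields the same hash,
    regardless of the order in which the fields were written.
    """
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical feature specification
spec_v1 = {
    "name": "avg_oven_temp_curing",
    "source": "OVEN1.TEMP",
    "aggregation": "mean",
    "window_minutes": 30,
}
# Same content, different key order
spec_v1_reordered = {
    "window_minutes": 30,
    "aggregation": "mean",
    "source": "OVEN1.TEMP",
    "name": "avg_oven_temp_curing",
}
# A modified time window, which should count as a change
spec_v2 = dict(spec_v1, window_minutes=60)

fp_v1 = spec_fingerprint(spec_v1)
fp_v1_reordered = spec_fingerprint(spec_v1_reordered)
fp_v2 = spec_fingerprint(spec_v2)
```

Storing the fingerprint with each trained model would let a reviewer confirm which feature definitions the model was built on, and a mismatch could serve as a trigger for impact assessment.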
When AI models are integrated with MES, feature engineering strongly influences how explainable and trustworthy the models are:
– **Traceability and data lineage:** Each engineered feature should be traceable back to its raw data source (e.g., specific MES fields, historian tags) and transformation logic.
– **Interpretability:** Using domain-meaningful features (e.g., “average oven temperature in curing step” rather than opaque encodings) supports human review of model behavior.
– **Use-case boundaries:** Engineered features often encode assumptions about process conditions, time windows, or product families, which define where the model is appropriate to use.
– **Validation:** The correctness and stability of feature calculations are validated along with the model, since errors in feature engineering can lead to misleading outputs.
In this context, feature engineering is treated as part of the overall AI design, not just a technical preprocessing step.
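The traceability and validation points can be illustrated with a small sketch in which each feature carries its own lineage metadata and is checked against a hand-computed reference value. The `FeatureDefinition` structure, the tag name, and the plausibility bounds are hypothetical, assumed only for this example.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass(frozen=True)
class FeatureDefinition:
    name: str
    sources: tuple          # raw fields or tags the feature is derived from
    description: str        # human-readable transformation logic
    compute: Callable[[Sequence[float]], float]
    valid_range: tuple      # (low, high) plausibility bounds


avg_cure_temp = FeatureDefinition(
    name="avg_oven_temp_curing",
    sources=("historian:OVEN1.TEMP",),  # hypothetical historian tag
    description="Mean of oven temperature samples during the curing step",
    compute=mean,
    valid_range=(100.0, 250.0),  # assumed bounds, for illustration only
)


def validate(feature, samples, expected, tol=1e-9):
    """Recompute a feature and compare it with a reference value and its bounds."""
    value = feature.compute(samples)
    low, high = feature.valid_range
    return {
        "value": value,
        "matches_reference": abs(value - expected) <= tol,
        "in_range": low <= value <= high,
    }


# Reference value 180.0 was computed by hand from the three samples
report = validate(avg_cure_temp, [178.0, 180.0, 182.0], expected=180.0)
```

Keeping the source tags and a plain-language description next to the computation means a reviewer can trace a feature back to its raw data without reading the pipeline code.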
Feature engineering:
– **Includes:** Data cleaning, transformation, and representation steps specifically aimed at preparing inputs for ML models.
– **Excludes:** The training of the ML model itself (model selection, fitting, hyperparameter tuning), even though these depend on the engineered features.
– **Excludes:** Basic ETL or integration work that does not change the informational content of the data beyond formatting, unless it directly defines input variables for a model.
It is related to, but distinct from:
– **Data engineering:** Focused on data storage, transport, and availability at scale.
– **Feature selection:** Choosing a subset of features, which may be part of feature engineering but is sometimes treated as a separate modeling step.
– **Raw data extraction:** Simply pulling tags from a historian or columns from an MES database is not, by itself, feature engineering. The term is reserved for the deliberate design of input variables from that data.
– **Automated feature learning:** Some modern ML methods (e.g., deep learning) can learn internal representations from raw data. Even then, explicit feature engineering is still commonly used in industrial settings to capture domain-specific knowledge, ensure traceability, and support explainability.