Glossary

data lineage

Data lineage is the documented life cycle of data, showing where it originated, how it moved, and how it was transformed over time.

Core meaning

Data lineage is the documented life cycle of data, showing where it originated, how it moved between systems, and how it was transformed or aggregated at each step. It provides an end‑to‑end view of data flows from source to final use.

In industrial and manufacturing environments, data lineage commonly refers to the traceable path of data used for production control, quality records, and KPI reporting across OT and IT systems.

Key elements of data lineage

Data lineage typically captures:

– **Data sources**: originating systems or devices (e.g., PLCs, sensors, MES, LIMS, ERP).
– **Data movements**: interfaces and integrations (batch jobs, message queues, APIs, ETL/ELT tools).
– **Transformations and calculations**: rules applied to data (filtering, unit conversion, KPI formulas, aggregations).
– **Storage locations**: databases, data lakes, historians, and reporting data models.
– **Downstream uses**: dashboards, reports, regulatory submissions, and audit trails.
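One way to hold these elements in code is a directed graph whose nodes are typed by the five element kinds above. The sketch below is a minimal illustration under that assumption. The class names (`NodeKind`, `LineageNode`, `LineageGraph`) and the example node names are hypothetical and do not come from any particular catalog or observability tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeKind(Enum):
    """One kind per lineage element (hypothetical modelling choice)."""
    SOURCE = "source"
    MOVEMENT = "movement"
    TRANSFORMATION = "transformation"
    STORAGE = "storage"
    DOWNSTREAM_USE = "downstream_use"


@dataclass(frozen=True)
class LineageNode:
    name: str
    kind: NodeKind


@dataclass
class LineageGraph:
    nodes: dict[str, LineageNode] = field(default_factory=dict)
    # Directed edges: upstream node name -> names of the nodes it feeds.
    edges: dict[str, set[str]] = field(default_factory=dict)

    def add_node(self, name: str, kind: NodeKind) -> None:
        self.nodes[name] = LineageNode(name, kind)
        self.edges.setdefault(name, set())

    def add_edge(self, upstream: str, downstream: str) -> None:
        # Refuse edges to unregistered nodes so every hop has a typed endpoint.
        for name in (upstream, downstream):
            if name not in self.nodes:
                raise KeyError(f"unknown node: {name!r}")
        self.edges[upstream].add(downstream)

    def downstream_of(self, name: str) -> set[str]:
        """All nodes reachable from `name`, e.g. every report a source affects."""
        reached: set[str] = set()
        stack = [name]
        while stack:
            for nxt in self.edges.get(stack.pop(), set()):
                if nxt not in reached:  # also stops infinite loops on cycles
                    reached.add(nxt)
                    stack.append(nxt)
        return reached


# Hypothetical example: a PLC part counter feeding a shift dashboard.
g = LineageGraph()
g.add_node("PLC part counter", NodeKind.SOURCE)
g.add_node("nightly ETL job", NodeKind.MOVEMENT)
g.add_node("warehouse production table", NodeKind.STORAGE)
g.add_node("scrap-rate formula", NodeKind.TRANSFORMATION)
g.add_node("shift dashboard", NodeKind.DOWNSTREAM_USE)
for up, down in [
    ("PLC part counter", "nightly ETL job"),
    ("nightly ETL job", "warehouse production table"),
    ("warehouse production table", "scrap-rate formula"),
    ("scrap-rate formula", "shift dashboard"),
]:
    g.add_edge(up, down)

print(sorted(g.downstream_of("PLC part counter")))
```

Typing each node by element kind makes it straightforward to ask impact questions, such as which downstream uses depend on a given source.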

The lineage may be represented as diagrams, metadata records, or automatically generated maps in data catalog or observability tools.

Use in industrial and regulated environments

In manufacturing operations, data lineage is commonly used to:

– Show how **shop-floor data** (machine states, production counts, process parameters) flows into MES, historians, and analytics tools.
– Demonstrate how **quality and batch data** are compiled from multiple sources for electronic batch records or deviation investigations.
– Trace the origin of **KPI values** (e.g., OEE, yield, scrap rate), including which tags, transactions, and calculations produced them.
– Support **auditability and investigations**, allowing teams to reconstruct which data and versions of logic were in effect at a given time.
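Tracing a KPI back to its origin amounts to walking the lineage upstream until no further inputs are recorded. The sketch below does this for OEE, using the standard decomposition OEE = availability × performance × quality. The `UPSTREAM` mapping, the tag names such as `PLC:Line1.PartCounter`, and the `trace_to_sources` helper are hypothetical stand-ins for whatever a lineage store actually holds.

```python
from collections import deque

# Hypothetical lineage metadata: each derived item -> the items it is computed from.
UPSTREAM: dict[str, list[str]] = {
    "OEE": ["availability", "performance", "quality"],
    "availability": ["run_time", "planned_production_time"],
    "performance": ["ideal_cycle_time", "total_count", "run_time"],
    "quality": ["good_count", "total_count"],
    "run_time": ["PLC:Line1.MachineState"],
    "total_count": ["PLC:Line1.PartCounter"],
    "good_count": ["MES:QualityInspection"],
    "planned_production_time": ["MES:ShiftSchedule"],
    "ideal_cycle_time": ["ERP:Routing"],
}


def trace_to_sources(item: str, upstream: dict[str, list[str]]) -> set[str]:
    """Walk upstream breadth-first; return root items (no recorded inputs)."""
    sources: set[str] = set()
    seen = {item}
    queue = deque([item])
    while queue:
        current = queue.popleft()
        inputs = upstream.get(current, [])
        if not inputs:
            sources.add(current)  # nothing further upstream is recorded
        for parent in inputs:
            if parent not in seen:  # guards against shared inputs and cycles
                seen.add(parent)
                queue.append(parent)
    return sources


print(sorted(trace_to_sources("OEE", UPSTREAM)))
```

Shared inputs such as `run_time` and `total_count` are visited once, so the result is the set of distinct originating tags and records behind the KPI.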

Lineage information is often maintained as part of a broader metadata, data governance, or validation framework, especially where regulations require evidence that reported data is accurate, complete, and traceable.

Boundaries and what data lineage is not

Data lineage:

– **Is**: a description of the path and transformations of data.
– **Is not**: the business meaning of the data (this is usually handled by data definitions or a data catalog).
– **Is not**: product or material traceability, although both can be related.
– **Is not**: a guarantee of data quality; it supports quality assessment by making sources and transformations visible.

Lineage records may reference related concepts such as data ownership, data quality rules, or system validation, but these are distinct governance elements.

Common confusion and related terms

– **Data lineage vs. data provenance**: In many IT and analytics contexts, these terms are used interchangeably. Some groups use *provenance* more narrowly for proof of origin and custody at the individual record level, while *lineage* emphasizes the overall flow across systems and processes.
– **Data lineage vs. traceability**: In manufacturing, *traceability* usually refers to tracking materials, lots, or products through the physical process. *Data lineage* tracks information objects and transformations, not physical items, though both may be linked in investigations.
– **Data lineage vs. data logging or history**: Historians and logs store time‑series or transactional records. Lineage describes **how** those records got there and how they are later used or transformed.

Being explicit about these distinctions helps avoid assuming that material traceability, or the mere existence of historian or log records, automatically provides full data lineage for reporting or compliance.

Site context: MES data and KPI trust

When discussing trust in MES data for KPI reporting, data lineage commonly refers to:

– Documented mapping from **source tags, events, and records** to MES objects and KPI data models.
– Clear visibility of **all calculations and filters** applied between raw events and final KPIs.
– Traceable **system hops** (e.g., PLC → SCADA → MES → data warehouse → BI tool).
– Awareness of which versions of **integration logic and KPI formulas** were in effect when a report was generated.
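Version awareness can be supported by keeping an effective-dated history of each formula and looking up the version in force at a report's timestamp. The sketch below assumes each version stays in effect until the next one starts. `FormulaVersion`, `version_at`, and the scrap-rate history shown are hypothetical examples, not a description of any specific MES or BI product.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FormulaVersion:
    version: str
    effective_from: datetime
    expression: str  # formula as documented, for display in the lineage record


def version_at(history: list[FormulaVersion], when: datetime) -> FormulaVersion:
    """Return the formula version in effect at `when`.

    Assumes `history` is sorted by `effective_from`; a version stays in
    effect until the next one starts.
    """
    starts = [v.effective_from for v in history]
    idx = bisect_right(starts, when) - 1
    if idx < 0:
        raise LookupError(f"no formula version in effect at {when.isoformat()}")
    return history[idx]


# Hypothetical change history for a scrap-rate KPI.
SCRAP_RATE_HISTORY = [
    FormulaVersion("v1", datetime(2023, 1, 1), "scrap_count / total_count"),
    FormulaVersion("v2", datetime(2024, 3, 1), "(scrap_count + rework_count) / total_count"),
]

report_time = datetime(2024, 2, 15, 6, 0)
used = version_at(SCRAP_RATE_HISTORY, report_time)
print(f"Report at {report_time.isoformat()} used {used.version}: {used.expression}")
```

Using `bisect_right` means a report generated exactly at a version's start time resolves to that new version.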

In this context, data lineage makes it possible to demonstrate how each reported value was produced and to reconcile reported KPIs back to the underlying operational and physical records.
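A basic reconciliation check recomputes a reported KPI from the records its lineage points to and compares the two. The sketch below does this for yield (good units divided by total units) from per-unit pass/fail records. The `reconcile_yield` helper, the `Reconciliation` type, the example counts, and the default tolerance (which assumes the report rounds yield to 0.1 percentage points) are all illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Reconciliation:
    kpi: str
    reported: float
    recomputed: float
    matches: bool


def reconcile_yield(reported: float, unit_passed: list[bool],
                    abs_tol: float = 0.0005) -> Reconciliation:
    """Recompute yield (good units / total units) from per-unit pass/fail
    records and compare it with the value shown in a report.

    abs_tol = 0.0005 assumes the report rounds yield to 0.1 percentage points.
    """
    if not unit_passed:
        raise ValueError("no underlying records to reconcile against")
    recomputed = sum(unit_passed) / len(unit_passed)  # True counts as 1
    return Reconciliation(
        kpi="yield",
        reported=reported,
        recomputed=recomputed,
        matches=math.isclose(reported, recomputed, rel_tol=0.0, abs_tol=abs_tol),
    )


# Hypothetical shift: 200 inspected units, 194 passed.
records = [True] * 194 + [False] * 6
print(reconcile_yield(0.970, records))  # reported value agrees with the records
print(reconcile_yield(0.985, records))  # reported value cannot be reproduced
```

A mismatch does not say which hop is wrong; it flags the KPI for tracing back through the lineage to find where a filter, formula, or integration diverged.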
