Glossary

data lakehouse

A data architecture that combines data lake storage with data warehouse-style management to support analytics and reporting.

Core meaning

A **data lakehouse** is a data architecture that combines characteristics of both a data lake and a data warehouse in a single platform. It typically:

– Stores large volumes of raw and semi-structured data at scale (like a data lake).
– Provides structured schemas, governance, and query performance comparable to those of a data warehouse.
– Supports batch, interactive, and often streaming analytics on the same underlying data.

In practice, a data lakehouse usually consists of low-cost, scalable storage (object storage or distributed file systems) with an added layer that manages tables, schemas, transactions, and access control to make the data reliable for BI and analytics workloads.

Use in industrial and manufacturing environments

In industrial and regulated manufacturing settings, a data lakehouse commonly refers to a central analytics layer that:

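– Consolidates data from operational technology (OT), manufacturing execution systems (MES), enterprise resource planning (ERP), laboratory information management systems (LIMS), maintenance systems, and suppliers into a single logical store.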
– Retains raw time-series and event data (e.g., historian tags, machine logs) alongside curated, structured tables.
– Enables cross-plant or cross-supplier reporting, operations intelligence, and advanced analytics without duplicating data into multiple warehouses.
– Supports data scientists and engineers who need access to both granular shop-floor data and standardized business views.

A lakehouse often acts as the intermediate data layer between plant-level systems (MES, historian, SCADA) and enterprise analytics or dashboard tools.
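As a rough illustration of keeping raw events next to curated tables, the sketch below uses Python's built-in `sqlite3` module as a stand-in for lakehouse storage. The tag naming scheme (`PLANT.EQUIPMENT.SIGNAL`), the table names, and the sample events are assumptions made for this example, not a real historian format.

```python
import json
import sqlite3

# Hypothetical raw historian events, kept verbatim as JSON strings.
RAW_EVENTS = [
    '{"tag": "PLANT_A.PRESS01.TEMP", "ts": "2024-01-01T08:00:00Z", "value": 71.5}',
    '{"tag": "PLANT_A.PRESS01.TEMP", "ts": "2024-01-01T08:01:00Z", "value": 72.5}',
    '{"tag": "PLANT_B.OVEN07.TEMP", "ts": "2024-01-01T08:00:00Z", "value": 180.0}',
]


def build_store(raw_events):
    """Load raw payloads unchanged, then derive a structured table from them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_events (payload TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE curated_readings "
        "(plant TEXT, equipment TEXT, signal TEXT, ts TEXT, value REAL)"
    )
    # Raw layer: store each event exactly as received.
    conn.executemany(
        "INSERT INTO raw_events VALUES (?)", [(e,) for e in raw_events]
    )
    # Curated layer: parse each payload and split the tag into columns.
    for (payload,) in conn.execute("SELECT payload FROM raw_events").fetchall():
        event = json.loads(payload)
        plant, equipment, signal = event["tag"].split(".")
        conn.execute(
            "INSERT INTO curated_readings VALUES (?, ?, ?, ?, ?)",
            (plant, equipment, signal, event["ts"], event["value"]),
        )
    conn.commit()
    return conn


def average_by_equipment(conn):
    """A business-style view: mean value per plant and equipment."""
    rows = conn.execute(
        "SELECT plant, equipment, AVG(value) FROM curated_readings "
        "GROUP BY plant, equipment ORDER BY plant, equipment"
    )
    return [tuple(row) for row in rows]
```

Calling `average_by_equipment(build_store(RAW_EVENTS))` returns one row per plant and equipment, while the original payloads remain available in `raw_events` for engineers who need the granular data.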

Relationship to data lakes and data warehouses

A data lakehouse is intended to bridge the gap between:

– **Data lake**: Focused on storing raw, diverse data with minimal upfront modeling, but often lacking robust governance and performance for BI.
– **Data warehouse**: Focused on highly modeled, structured data with strong governance and high-performance queries, but less flexible for raw or semi-structured data.

The lakehouse adds data management features on top of data-lake-like storage, such as:

– Table formats with schema definitions and evolution.
– Transactional guarantees (e.g., ACID semantics) for reliable updates.
– Metadata catalogs and governance functions.
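To make these features more concrete, here is a deliberately simplified sketch of a table layer over plain files: a schema check on write, plus a commit log so that readers only see data files that were fully committed. The `ToyTable` class and its file layout are hypothetical and assume a single writer; real table formats additionally handle concurrent writers, schema evolution, and much more.

```python
import json
import os
import tempfile


class ToyTable:
    """Toy table: JSON data files plus an ordered commit log in one directory."""

    def __init__(self, root, schema):
        self.root = root
        self.schema = schema  # column name -> expected Python type
        self.log_dir = os.path.join(root, "_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _commits(self):
        # Only finished log entries count; temporary files are ignored.
        return sorted(n for n in os.listdir(self.log_dir) if n.endswith(".json"))

    def append(self, rows):
        # Schema enforcement: reject the whole batch if any row is invalid.
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} do not match schema")
            for column, expected in self.schema.items():
                if not isinstance(row[column], expected):
                    raise TypeError(f"{column!r} must be {expected.__name__}")
        version = len(self._commits())
        data_name = f"part-{version:05d}.json"
        with open(os.path.join(self.root, data_name), "w") as f:
            json.dump(rows, f)
        # Commit step: write the log entry to a temp file, then rename it.
        # The batch becomes visible to readers only after the rename.
        tmp_path = os.path.join(self.log_dir, f"{version:05d}.tmp")
        with open(tmp_path, "w") as f:
            json.dump({"add": data_name}, f)
        os.replace(tmp_path, os.path.join(self.log_dir, f"{version:05d}.json"))

    def read(self):
        # Readers follow the log, so uncommitted data files are never seen.
        rows = []
        for name in self._commits():
            with open(os.path.join(self.log_dir, name)) as f:
                entry = json.load(f)
            with open(os.path.join(self.root, entry["add"])) as f:
                rows.extend(json.load(f))
        return rows


def demo():
    """Append one valid batch to a table in a temporary directory."""
    with tempfile.TemporaryDirectory() as root:
        table = ToyTable(root, {"equipment_id": str, "value": float})
        table.append([{"equipment_id": "PRESS01", "value": 71.5}])
        return table.read()
```

The design point is that the log, not the directory listing, defines the table's contents: a half-written or rejected batch leaves no log entry, so queries never observe it.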

Site-context application: cross-site MES and supplier data

In the context of combining MES dashboards across plants and suppliers, a **data lakehouse** is often used as the shared data foundation that:

– Ingests and stores data from multiple MES instances, historians, ERP systems, and supplier feeds.
– Applies normalization and harmonized data models to align equipment IDs, product codes, quality attributes, and time bases.
– Exposes standardized tables and views that analytics tools and dashboards can query for cross-site or cross-supplier comparisons.

The lakehouse itself does not guarantee alignment; it provides a technical platform where common data models, governance rules, and validation logic can be implemented.
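A minimal sketch of what such harmonization and validation logic might look like is shown below. The mapping table, site names, fixed UTC offsets, and field names are all hypothetical; a real deployment would typically maintain mappings in governed reference tables and use proper time zone rules rather than fixed offsets.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical mapping from (site, local equipment ID) to a shared equipment ID.
EQUIPMENT_MAP = {
    ("plant_a", "PR-01"): "PRESS-0001",
    ("supplier_x", "press_1"): "PRESS-0002",
}

# Hypothetical fixed UTC offsets per site, in hours (daylight saving ignored).
SITE_UTC_OFFSET_HOURS = {"plant_a": 1, "supplier_x": -5}


def harmonize(record):
    """Map one site-specific record onto the shared model, or raise."""
    site = record["site"]
    key = (site, record["equipment_id"])
    if key not in EQUIPMENT_MAP:
        raise KeyError(f"no equipment mapping for {key}")
    # Attach the site's offset to the naive local time, then convert to UTC.
    offset = timezone(timedelta(hours=SITE_UTC_OFFSET_HOURS[site]))
    local_time = datetime.fromisoformat(record["local_time"]).replace(tzinfo=offset)
    return {
        "site": site,
        "equipment_id": EQUIPMENT_MAP[key],
        "ts_utc": local_time.astimezone(timezone.utc).isoformat(),
        "good_count": int(record["good_count"]),
    }


def harmonize_all(records):
    """Split records into harmonized rows and rejects with a reason."""
    accepted, rejected = [], []
    for record in records:
        try:
            accepted.append(harmonize(record))
        except (KeyError, ValueError) as exc:
            rejected.append((record, str(exc)))
    return accepted, rejected
```

Records that cannot be mapped are returned as rejects rather than silently loaded, which reflects the point above: alignment comes from the rules implemented on the platform, not from the platform itself.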

Boundaries and exclusions

– A data lakehouse **is not** a specific product or vendor implementation, although some vendors market platforms under this term.
– It **is not** the same as a traditional on-premises data warehouse, even if it can serve similar reporting use cases.
– It **is not** a replacement for MES, historian, or ERP; instead, it aggregates and structures data produced by those systems for analytics and reporting.

Common confusion and related terms

– **Data lake vs. data lakehouse**: A lakehouse includes management and governance features (e.g., table formats, transactions, catalogs) that a basic data lake may not provide.
– **Data warehouse vs. data lakehouse**: A data warehouse usually stores only modeled, structured data. A lakehouse supports both structured and raw or semi-structured data on the same storage.
– **Data fabric / data mesh vs. data lakehouse**: Data fabric and data mesh are broader architectural or organizational concepts for how data is managed and owned. A lakehouse is more specifically an architectural pattern for storage and analytics.
