FAQ

Which metrics best indicate production system health beyond delivery counts?

Delivery counts and on-time delivery are lagging outcomes, not true indicators of production system health. In regulated, high-mix environments, you need a balanced set of leading and lagging metrics across flow, quality, assets, workforce, and system integrity. The right mix depends on your data maturity, integrations, and validation constraints.

1. Flow and stability metrics

These show whether work moves predictably through the system, independent of short-term expediting.

  • Throughput by constraint / bottleneck resource: Units or standard hours completed at the true constraint, not just shipped. Requires stable routing and time standards.
  • Work-in-process (WIP) by stage: WIP levels at key operations or value-stream segments. Rising WIP at a particular step often signals hidden defects, staffing gaps, or scheduling issues.
  • Queue time vs process time: Ratio of waiting time to actual touch time. A high ratio indicates systemic flow problems, even if deliveries are currently being met via expediting.
  • Schedule adherence: Percentage of orders completed in the planned sequence and time bucket, not just shipped on time. This is a good early-warning metric for firefighting behavior.
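
As a rough illustration of two of the flow metrics above, the following sketch computes a queue-to-process ratio and a simplified schedule adherence figure. The record fields, order IDs, and week labels are hypothetical, and the adherence check only compares time buckets; checking planned sequence as well would need dispatch-list data that is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class OperationRecord:
    # Hypothetical fields; map them to what your MES or traveler actually records.
    order_id: str
    queue_hours: float    # time spent waiting before the operation started
    process_hours: float  # actual touch or run time at the operation


def queue_to_process_ratio(records):
    """Total waiting time divided by total touch time across the records."""
    total_queue = sum(r.queue_hours for r in records)
    total_process = sum(r.process_hours for r in records)
    if total_process <= 0:
        raise ValueError("no process time recorded")
    return total_queue / total_process


def schedule_adherence(planned, actual):
    """Share of planned orders completed in their planned time bucket.

    Both arguments map order_id -> bucket label (for example an ISO week).
    Orders missing from `actual` count as misses.
    """
    if not planned:
        raise ValueError("no planned orders")
    hits = sum(1 for order_id, bucket in planned.items()
               if actual.get(order_id) == bucket)
    return hits / len(planned)


records = [
    OperationRecord("WO-1", queue_hours=30.0, process_hours=2.0),
    OperationRecord("WO-2", queue_hours=18.0, process_hours=4.0),
]
ratio = queue_to_process_ratio(records)  # 48 h waiting / 6 h touch time

planned = {"WO-1": "W10", "WO-2": "W10", "WO-3": "W11"}
actual = {"WO-1": "W10", "WO-2": "W11", "WO-3": "W11"}
adherence = schedule_adherence(planned, actual)  # 2 of 3 orders on plan
```

Summing hours before dividing weights long-running orders more heavily than averaging per-order ratios would; either choice can be reasonable as long as the definition is documented.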

2. Quality and rework metrics

Healthy operations show stable, low variation in quality performance, with visible and acted-on feedback loops.

  • First pass yield (FPY) at key operations: Percentage of units passing a step without rework or deviation. In aerospace and similar environments, include concessions and use-as-is dispositions, not just hard rejects.
  • Final yield: Good units shipped vs total units started for a part number or family. Sensitive to scrap, rework, and test failures.
  • Cost of poor quality (COPQ): Labor, material, and overhead consumed by scrap, rework, MRB activity, and customer returns. Calculation methods vary and should be documented to remain auditable.
  • NCR rate and severity: Nonconformance count per unit or per labor hour, stratified by criticality (e.g., safety/airworthiness related vs minor). Requires consistent coding in your QMS or MES.
  • NCR and rework cycle time: Time from NCR creation to closure, covering disposition and any resulting rework. Long durations indicate systemic bottlenecks in MRB, inspector availability, or engineering decision-making.
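
One possible way to compute the yield and NCR-rate figures above is sketched below. The function names, severity labels, and the per-1,000-hour normalization are assumptions for illustration, and the FPY calculation assumes each unit falls into at most one of the reworked, rejected, or conceded categories.

```python
from collections import Counter


def first_pass_yield(units_in, reworked, rejected, conceded=0):
    """Units that passed the step cleanly divided by units entering it.

    Concessions and use-as-is dispositions count against FPY here.
    """
    if units_in <= 0:
        raise ValueError("units_in must be positive")
    clean = units_in - reworked - rejected - conceded
    return clean / units_in


def final_yield(good_shipped, units_started):
    """Good units shipped divided by total units started."""
    if units_started <= 0:
        raise ValueError("units_started must be positive")
    return good_shipped / units_started


def ncr_rate_per_1000_hours(severities, labor_hours):
    """NCR count per 1,000 labor hours, stratified by severity label."""
    if labor_hours <= 0:
        raise ValueError("labor_hours must be positive")
    counts = Counter(severities)
    return {sev: 1000.0 * n / labor_hours for sev, n in counts.items()}


fpy = first_pass_yield(200, reworked=12, rejected=3, conceded=5)  # 180 / 200
fy = final_yield(good_shipped=188, units_started=200)             # 188 / 200
rates = ncr_rate_per_1000_hours(["minor", "minor", "critical"], labor_hours=1500.0)
```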

3. Asset and equipment performance

The goal is predictable capability and availability, not just high utilization.

  • Overall equipment effectiveness (OEE) for critical assets: The product of availability, performance, and quality rates. In high-mix contexts, OEE is useful mainly when it is normalized carefully and limited to selected constraint resources.
  • Planned vs unplanned downtime: Percentage of machine downtime that occurs as planned maintenance, setups, or changeovers vs unexpected events. A rising unplanned share is an early signal of reliability and maintenance issues.
  • Mean time between failures (MTBF) / Mean time to repair (MTTR): For key machines, especially those with long qualification cycles or tooling lead times.
  • Setup and changeover time: Particularly important in high-mix, low-volume operations. Trends here directly affect your ability to maintain flow without excess WIP.
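
The asset metrics above can be sketched with the commonly used definitions of OEE, MTBF, and MTTR. The parameter names and sample numbers are illustrative assumptions; in a high-mix plant the ideal cycle time would normally vary by part number, which this minimal version does not handle.

```python
def oee(planned_minutes, run_minutes, ideal_cycle_minutes, total_count, good_count):
    """Return (availability, performance, quality, oee) for one asset and period."""
    if planned_minutes <= 0 or run_minutes <= 0 or total_count <= 0:
        raise ValueError("planned time, run time, and total count must be positive")
    availability = run_minutes / planned_minutes
    performance = (ideal_cycle_minutes * total_count) / run_minutes
    quality = good_count / total_count
    return availability, performance, quality, availability * performance * quality


def mtbf(operating_hours, failures):
    """Mean time between failures: operating time divided by failure count."""
    if failures <= 0:
        raise ValueError("at least one failure is needed to compute MTBF")
    return operating_hours / failures


def mttr(repair_hours, failures):
    """Mean time to repair: total repair time divided by failure count."""
    if failures <= 0:
        raise ValueError("at least one failure is needed to compute MTTR")
    return repair_hours / failures


# 400 planned minutes, 320 running, 2.0 min ideal cycle, 144 made, 108 good.
availability, performance, quality, oee_value = oee(400, 320, 2.0, 144, 108)
mean_between = mtbf(operating_hours=1200, failures=4)
mean_repair = mttr(repair_hours=10, failures=4)
```

Returning the three components alongside the product keeps the result diagnosable, since the same OEE value can come from very different loss patterns.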

4. Labor, standard work, and workforce health

Delivery can be maintained short-term by burning people out. System health metrics must expose this.

  • Labor productivity: Value-added hours vs total hours, or earned standard hours vs actual hours. In regulated settings, ensure the standard data and actuals are controlled and traceable.
  • Overtime level and distribution: Percentage of hours worked as overtime, by area. Sustained high overtime often masks capacity, planning, or training issues.
  • Training and certification coverage: Percentage of operations run by properly certified / qualified operators per QMS requirements. Depends on robust training records and controlled work instruction systems.
  • Adherence to standard work: Measured via layered process audits, digital work instruction usage, or similar. Non-adherence is a leading indicator of future quality and safety problems.
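
A minimal sketch of the overtime and certification-coverage calculations follows. The tuple layouts, area names, operator IDs, and certification codes are all hypothetical; real inputs would come from timekeeping and training-record systems.

```python
from collections import defaultdict


def overtime_share_by_area(entries):
    """entries: iterable of (area, regular_hours, overtime_hours).

    Returns area -> overtime hours as a fraction of all hours worked.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for area, regular, overtime in entries:
        totals[area][0] += regular
        totals[area][1] += overtime
    return {area: ot / (reg + ot)
            for area, (reg, ot) in totals.items() if reg + ot > 0}


def certification_coverage(operations, certifications):
    """Share of performed operations run by an operator holding the required cert.

    operations: list of (operator_id, required_cert)
    certifications: operator_id -> set of certs currently held
    """
    if not operations:
        raise ValueError("no operations to evaluate")
    covered = sum(1 for operator, cert in operations
                  if cert in certifications.get(operator, set()))
    return covered / len(operations)


overtime = overtime_share_by_area([
    ("machining", 160, 40),
    ("machining", 160, 40),
    ("assembly", 180, 20),
])
coverage = certification_coverage(
    [("op-1", "NDT-2"), ("op-2", "NDT-2"), ("op-1", "WELD-1"), ("op-3", "WELD-1")],
    {"op-1": {"NDT-2", "WELD-1"}, "op-2": {"WELD-1"}},
)
```

This version ignores certification expiry dates; a production calculation would likely compare each operation's timestamp against the validity window of the certification.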

5. Planning and material health

Production health is fragile when material and planning signals are unstable, even if deliveries look fine right now.

  • Material availability at schedule release: Percentage of work orders that can start on time with all required materials, tooling, and documents available. Requires integration between ERP, MES, and stores.
  • Shortage count and recurrence: Number of active shortages, frequency of repeat shortages on the same parts, and impact on constrained resources.
  • Reschedule churn: Frequency and magnitude of work-order rescheduling and priority changes. High churn indicates weak demand signals or unstable planning parameters.
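
The planning metrics above could be computed along the following lines. The dictionary keys, the change-log layout, and the choice of open-order count as the denominator for churn are assumptions made for this sketch.

```python
from datetime import date


def material_availability_at_release(work_orders):
    """Share of released work orders with materials, tooling, and documents ready."""
    if not work_orders:
        raise ValueError("no work orders to evaluate")
    ready = sum(1 for wo in work_orders
                if wo["materials"] and wo["tooling"] and wo["documents"])
    return ready / len(work_orders)


def reschedule_churn(changes, open_order_count):
    """Return (reschedules per open order, mean absolute date shift in days).

    changes: list of (order_id, old_due_date, new_due_date)
    """
    if open_order_count <= 0:
        raise ValueError("open_order_count must be positive")
    frequency = len(changes) / open_order_count
    if not changes:
        return frequency, 0.0
    mean_shift = sum(abs((new - old).days) for _, old, new in changes) / len(changes)
    return frequency, mean_shift


availability_rate = material_availability_at_release([
    {"materials": True, "tooling": True, "documents": True},
    {"materials": True, "tooling": False, "documents": True},
    {"materials": True, "tooling": True, "documents": True},
    {"materials": True, "tooling": True, "documents": True},
])
frequency, mean_shift_days = reschedule_churn(
    [
        ("WO-1", date(2024, 3, 1), date(2024, 3, 8)),
        ("WO-1", date(2024, 3, 8), date(2024, 3, 5)),
        ("WO-2", date(2024, 3, 10), date(2024, 3, 12)),
    ],
    open_order_count=4,
)
```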

6. System integrity and compliance signals

In regulated environments, system health includes the trustworthiness and stability of the digital backbone.

  • Data integrity incidents: Number of issues such as misaligned revisions, missing signatures, incorrect routings, or broken genealogy links detected in production or audits.
  • Document and revision adherence: Percentage of work performed to the correct, approved revision of drawings, specifications, and work instructions. This generally requires MES or digital traveler controls.
  • Audit and LPA findings: Trends in internal audit and layered process audit findings tied to production processes and documentation control.
  • Rework related to configuration errors: Portion of defects caused by wrong parts, revisions, or routings, which often arise from weak system integration rather than operator skill.
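
Two of the integrity signals above lend themselves to simple ratios, sketched here under stated assumptions: the document IDs, revision letters, and cause codes are hypothetical, and the adherence check compares against a single approved revision per document rather than evaluating effectivity dates.

```python
# Hypothetical cause codes treated as configuration errors.
CONFIGURATION_CAUSES = frozenset({"wrong_part", "wrong_revision", "wrong_routing"})


def revision_adherence(executions, approved_revisions):
    """Share of executions performed to the approved revision.

    executions: list of (document_id, revision_used)
    approved_revisions: document_id -> currently approved revision
    """
    if not executions:
        raise ValueError("no executions to evaluate")
    matches = sum(1 for doc, rev in executions
                  if approved_revisions.get(doc) == rev)
    return matches / len(executions)


def configuration_defect_share(defect_causes, configuration_causes=CONFIGURATION_CAUSES):
    """Share of defects whose coded cause is a configuration error."""
    if not defect_causes:
        raise ValueError("no defects to evaluate")
    hits = sum(1 for cause in defect_causes if cause in configuration_causes)
    return hits / len(defect_causes)


adherence_rate = revision_adherence(
    [("WI-100", "C"), ("WI-100", "B"), ("DWG-7", "A"), ("WI-200", "D")],
    {"WI-100": "C", "DWG-7": "A", "WI-200": "D"},
)
config_share = configuration_defect_share(
    ["wrong_revision", "operator_error", "wrong_part", "tool_wear", "operator_error"]
)
```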

7. Choosing metrics realistically in brownfield environments

The list above is intentionally broad. In most brownfield plants with mixed MES, ERP, PLM, and QMS systems, you cannot measure all of these reliably on day one.

  • Start from critical constraints: Focus on metrics around the few resources, operations, or product families that drive most lead time, risk, or margin.
  • Assess data readiness: Before setting a metric as a KPI, verify that definitions are clear, time stamps align across systems, and manual workarounds are sustainable and auditable.
  • Avoid full replacement as a prerequisite: Waiting for a new monolithic system to replace legacy MES/ERP/QMS to “get perfect data” typically delays improvement and introduces qualification and downtime risk.
  • Validate calculations in regulated contexts: Where metrics may be used in decisions that affect product quality or compliance (e.g., risk-based sampling, staffing decisions), ensure calculations and reports are controlled, versioned, and validated.

8. Putting it together as a health dashboard

A practical production health view usually includes a small, stable set of metrics across categories, not a long list:

  • Flow: WIP by stage, queue vs process time, schedule adherence.
  • Quality: FPY at key steps, NCR rate/severity, COPQ trend.
  • Assets: OEE or availability for a few critical assets, unplanned downtime.
  • Workforce: Overtime level, training coverage, layered process audit adherence.
  • Planning/material: Material availability at release, shortage count, reschedule churn.
  • System integrity: Revision adherence, configuration-related defects.

The exact thresholds and targets will vary by plant, product, and regulatory context. What matters is that the metrics are defined clearly, traceable to their data sources, realistic given existing systems, and stable enough to drive disciplined problem solving rather than short-term firefighting.

Get Started

Built for Speed, Trusted by Experts

Whether you're managing 1 site or 100, Connect 981 adapts to your environment and scales with your needs—without the complexity of traditional systems.
