FAQ

What data do we need to start building predictive quality models from NCRs?

At minimum, you need structured NCR data tied to operational context and eventual outcomes. NCR records by themselves are usually not enough, especially if they are mostly free text, inconsistently coded, or disconnected from MES, ERP, PLM, QMS, inspection, or supplier data.

A practical starting point is not “all plant data.” It is a smaller, traceable dataset where each NCR can be linked to what was built, how it was built, who supplied the material, where in the process it occurred, and what happened next.

Minimum data to start

  • NCR core record: NCR ID, date and time opened, site, line or cell, product or program, part number, revision, serial or lot if applicable, defect category, defect description, severity or priority if used, disposition, and closure status.

  • Process context: operation or routing step, work center, machine or asset ID where relevant, inspection point, shift, operator or team if governance allows, and whether the issue was found in incoming, in-process, final inspection, or field/MRO context.

  • Material and supplier context: supplier ID, purchase order or receipt linkage, batch or lot, material cert reference where applicable, outside processing step, and whether the issue was internal or supplier-originated.

  • Product definition context: part family, assembly relationship, drawing or spec revision, manufacturing plan version, approved work instruction version, and any relevant engineering change state.

  • Inspection and measurement data: characteristic or feature inspected, pass/fail result, measured values where available, gage or method used, sampling context, and whether measurement system variation is understood well enough to trust the signal.

  • Outcome data: scrap, rework, use-as-is, return to supplier, concession or deviation if applicable, time to disposition, time to closure, recurrence, cost estimate if tracked, and downstream effects such as schedule delay or repeated escapes.

  • Corrective action context: containment actions, root cause coding if it exists, CAPA linkage, effectiveness check result, and whether a similar issue had been seen before.
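As a rough illustration, a subset of the fields above could be captured in a single record type. This is a minimal sketch, not a standard schema: every field name, the choice of which keys count as link keys, and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class NcrRecord:
    # Core NCR fields (all names here are illustrative, not a standard schema).
    ncr_id: str
    opened_at: datetime
    site: str
    part_number: str
    revision: str
    defect_category: str
    # Link keys into other systems; None means the link is not recorded.
    work_order_id: Optional[str] = None
    operation: Optional[str] = None
    lot_id: Optional[str] = None
    supplier_id: Optional[str] = None
    # Outcome fields, filled in as the NCR progresses.
    disposition: Optional[str] = None
    closed_at: Optional[datetime] = None

    def days_to_closure(self) -> Optional[float]:
        """Elapsed days from open to close, or None while the NCR is open."""
        if self.closed_at is None:
            return None
        return (self.closed_at - self.opened_at).total_seconds() / 86400.0

    def missing_link_keys(self) -> List[str]:
        """Names of link keys that are empty, in declaration order."""
        keys = ["work_order_id", "operation", "lot_id", "supplier_id"]
        return [k for k in keys if not getattr(self, k)]


# Hypothetical example record.
example = NcrRecord(
    ncr_id="NCR-0001",
    opened_at=datetime(2024, 3, 4, 8, 0),
    site="SITE_A",
    part_number="PN-100",
    revision="C",
    defect_category="dimensional",
    work_order_id="WO-55",
    operation="OP-30",
    closed_at=datetime(2024, 3, 6, 8, 0),
)
print(example.days_to_closure())
print(example.missing_link_keys())
```

Listing the empty link keys per record is a cheap way to see early which joins will be weak. An empty supplier ID may be legitimate for internally originated issues, so treat the result as a prompt for review rather than an error.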

What makes the data usable for modeling

The most important requirement is consistent keys and timestamps. If you cannot reliably join NCRs to work orders, travelers, lots, serials, suppliers, revisions, inspections, and dispositions, you will spend more effort resolving data lineage than building a useful model.
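One way to check join reliability before any modeling is to audit a single link, for example NCR to work order. The sketch below assumes NCRs and work orders are available as lists of dictionaries; the field names (`work_order_id`, `opened_at`, `started_at`) and the sample records are hypothetical.

```python
from datetime import datetime


def audit_ncr_to_work_order_join(ncrs, work_orders):
    """Summarize how reliably NCRs join to work orders on work_order_id."""
    wo_by_id = {wo["work_order_id"]: wo for wo in work_orders}
    unmatched = []        # NCRs with a missing or unknown work order key
    out_of_sequence = []  # NCRs opened before their work order started
    for ncr in ncrs:
        wo = wo_by_id.get(ncr.get("work_order_id"))
        if wo is None:
            unmatched.append(ncr["ncr_id"])
        elif ncr["opened_at"] < wo["started_at"]:
            out_of_sequence.append(ncr["ncr_id"])
    matched = len(ncrs) - len(unmatched)
    return {
        "match_rate": matched / len(ncrs) if ncrs else 0.0,
        "unmatched": unmatched,
        "out_of_sequence": out_of_sequence,
    }


# Hypothetical sample data.
work_orders = [
    {"work_order_id": "WO-1", "started_at": datetime(2024, 1, 10)},
    {"work_order_id": "WO-2", "started_at": datetime(2024, 1, 12)},
]
ncrs = [
    {"ncr_id": "N1", "work_order_id": "WO-1", "opened_at": datetime(2024, 1, 11)},
    {"ncr_id": "N2", "work_order_id": "WO-2", "opened_at": datetime(2024, 1, 11)},
    {"ncr_id": "N3", "work_order_id": None, "opened_at": datetime(2024, 1, 11)},
    {"ncr_id": "N4", "work_order_id": "WO-9", "opened_at": datetime(2024, 1, 11)},
]
report = audit_ncr_to_work_order_join(ncrs, work_orders)
print(report)
```

The same pattern can be repeated for lots, serials, suppliers, and revisions. A low match rate or many out-of-sequence records suggests the lineage problem should be addressed first.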

You also need stable definitions. If one site codes defects by symptom, another by cause, and a third by disposition, the model may learn coding habits instead of process risk. That is common in brownfield environments.
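A simple way to expose inconsistent coding is a crosswalk from each site's local codes to one shared category, with unmapped codes counted rather than silently dropped. The sketch below is illustrative only: the site names, local codes, and categories are invented, and a real crosswalk would be owned and change-controlled by the quality organization.

```python
from collections import Counter

# Hypothetical crosswalk from (site, local code) to one shared, symptom-based category.
CROSSWALK = {
    ("SITE_A", "DIM-01"): "dimensional",
    ("SITE_A", "SURF-02"): "surface_finish",
    ("SITE_B", "OOT"): "dimensional",
    ("SITE_B", "SCRATCH"): "surface_finish",
}


def normalize_defect_codes(records):
    """Attach a shared defect_category to each record and count unmapped codes."""
    normalized = []
    unmapped = Counter()
    for rec in records:
        key = (rec["site"], rec["defect_code"])
        category = CROSSWALK.get(key)
        if category is None:
            unmapped[key] += 1  # keep the record, but flag the code for review
        normalized.append({**rec, "defect_category": category})
    return normalized, unmapped


# Hypothetical sample data: SITE_B uses a cause code where a symptom is expected.
records = [
    {"ncr_id": "N1", "site": "SITE_A", "defect_code": "DIM-01"},
    {"ncr_id": "N2", "site": "SITE_B", "defect_code": "OOT"},
    {"ncr_id": "N3", "site": "SITE_B", "defect_code": "TOOL-WEAR"},
]
normalized, unmapped = normalize_defect_codes(records)
print(unmapped)
```

A growing unmapped count for one site is an early sign that its codes follow a different logic, such as cause instead of symptom.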

For most teams, usable data quality means:

  • Repeatable coding for defect type, location, source, and disposition

  • Enough record volume over time to capture recurrence patterns

  • Known data ownership and change control

  • Traceable joins across QMS, MES, ERP, PLM, and inspection systems

  • Event timestamps accurate enough to reconstruct sequence

  • Documented reasons for missing data, so that gaps are understood rather than guessed at
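To check whether missing data is understood, it can help to see whether gaps are spread evenly or concentrated in one site, shift, or period. The following is a minimal sketch under assumed field names (`site`, `root_cause`); the sample records are hypothetical.

```python
from collections import defaultdict


def missing_rate_by_group(records, field, group_field):
    """Share of records in each group where `field` is absent, None, or empty."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for rec in records:
        group = rec[group_field]
        totals[group] += 1
        if rec.get(field) in (None, ""):
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}


# Hypothetical sample data.
records = [
    {"ncr_id": "N1", "site": "SITE_A", "root_cause": "fixture wear"},
    {"ncr_id": "N2", "site": "SITE_A", "root_cause": "program error"},
    {"ncr_id": "N3", "site": "SITE_B", "root_cause": ""},
    {"ncr_id": "N4", "site": "SITE_B", "root_cause": "supplier process"},
]
rates = missing_rate_by_group(records, "root_cause", "site")
print(rates)
```

If one group shows a much higher rate than the others, the gap is probably tied to local practice, and a model may end up learning that practice instead of process risk.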

What usually matters more than model choice

In practice, feature quality matters more than algorithm complexity. Many programs get better early results from simple, explainable models built on clean operational signals than from advanced machine learning applied to weak NCR data.

Common predictive features include prior defect frequency by part-operation pair, supplier-specific issue history, process step recurrence, inspection failure rates, rework loops, revision changes, shift or handoff patterns, queue time, and material lot clustering. Whether those features are available depends on your integration quality and traceability maturity.
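As an example of the first feature, prior defect frequency by part-operation pair can be computed so that each NCR only sees NCRs opened before it, which avoids leaking future information into the feature. This is a sketch under assumed field names (`part_number`, `operation`, `opened_at`) with invented sample data.

```python
from collections import defaultdict
from datetime import datetime


def add_prior_ncr_counts(ncrs):
    """Add prior_ncr_count: earlier NCRs for the same (part_number, operation).

    Records are processed in opened_at order; ties keep their input order.
    """
    seen = defaultdict(int)
    result = []
    for ncr in sorted(ncrs, key=lambda r: r["opened_at"]):
        key = (ncr["part_number"], ncr["operation"])
        result.append({**ncr, "prior_ncr_count": seen[key]})
        seen[key] += 1
    return result


# Hypothetical sample data, deliberately out of time order.
ncrs = [
    {"ncr_id": "d", "part_number": "PN-1", "operation": "OP-10", "opened_at": datetime(2024, 1, 9)},
    {"ncr_id": "a", "part_number": "PN-1", "operation": "OP-10", "opened_at": datetime(2024, 1, 1)},
    {"ncr_id": "c", "part_number": "PN-2", "operation": "OP-10", "opened_at": datetime(2024, 1, 3)},
    {"ncr_id": "b", "part_number": "PN-1", "operation": "OP-10", "opened_at": datetime(2024, 1, 5)},
]
featured = add_prior_ncr_counts(ncrs)
counts = {r["ncr_id"]: r["prior_ncr_count"] for r in featured}
print(counts)
```

The same running-count pattern applies to supplier history or lot clustering by changing the grouping key. It depends on trustworthy timestamps, which is why event ordering matters so much.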

What not to rely on alone

Free-text NCR narratives alone are usually not sufficient. Text can help, especially for triage or clustering, but it often contains inconsistent terminology, abbreviations, copy-forward habits, and missing context. If text is the only source, your first project is often data standardization, not prediction.

Also be careful with cost fields, root cause fields, and operator identifiers. These are often incomplete, entered late, or influenced by local behavior rather than true process conditions.

How much history do you need?

There is no universal threshold. It depends on product mix, event rates, process stability, and how granular the prediction target is. A high-mix, low-volume plant may have years of NCRs and still not have enough repeatability at the individual part-number level. In that case, you may need to model at the part family, process family, supplier, or defect category level instead.
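The fallback from part number to part family can be sketched as a simple count check. The `min_events` threshold is a placeholder that each team would have to choose for its own event rates; the value used below is arbitrary, and the field names and data are hypothetical.

```python
from collections import Counter


def choose_modeling_level(ncrs, min_events):
    """For each part number, pick the finest level with at least min_events NCRs."""
    by_part = Counter(r["part_number"] for r in ncrs)
    by_family = Counter(r["part_family"] for r in ncrs)
    family_of = {r["part_number"]: r["part_family"] for r in ncrs}
    levels = {}
    for part, family in family_of.items():
        if by_part[part] >= min_events:
            levels[part] = ("part_number", part)
        elif by_family[family] >= min_events:
            levels[part] = ("part_family", family)
        else:
            levels[part] = ("insufficient_history", None)
    return levels


# Hypothetical sample data: PN-1 repeats, PN-2 and PN-3 are one-offs.
ncrs = [
    {"part_number": "PN-1", "part_family": "FAM-A"},
    {"part_number": "PN-1", "part_family": "FAM-A"},
    {"part_number": "PN-1", "part_family": "FAM-A"},
    {"part_number": "PN-2", "part_family": "FAM-A"},
    {"part_number": "PN-3", "part_family": "FAM-B"},
]
levels = choose_modeling_level(ncrs, min_events=3)  # 3 is an arbitrary illustration
print(levels)
```

Parts that fall into the last bucket are candidates for modeling at the supplier, process family, or defect category level, or for exclusion from the first project.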

If the process, routing, coding scheme, or product definition changed materially over time, older data may be only partially useful. More history is not automatically better if the underlying process is no longer comparable.

Brownfield reality

Most plants do not have all of this in one system. NCR data may sit in QMS, execution context in MES, product structure in PLM, receipts and suppliers in ERP, and measurements in separate SPC or inspection tools. That is normal.

You do not need a full platform replacement to begin. In regulated, long-lifecycle environments that strategy often fails because of qualification burden, validation cost, downtime risk, integration complexity, and the need to preserve traceability and controlled change. A narrower approach is usually more realistic: define a specific prediction target, map the required records across existing systems, validate the joins, and prove data reliability before scaling.

Recommended first use case

Start with one constrained question such as:

  • Which incoming lots are most likely to generate an NCR?

  • Which part-operation combinations are most likely to recur as rework?

  • Which open NCRs are most likely to become high-cost scrap or schedule delay?

Those use cases usually need less data than a broad “predict all quality issues” initiative and are easier to validate operationally.
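For the incoming-lot question, the prediction target can be made concrete as a label per received lot: did an NCR open against it within some window after receipt? The sketch below assumes hypothetical field names and a caller-chosen `window_days`. Lots whose window has not yet fully elapsed are left unlabeled instead of being counted as clean.

```python
from datetime import datetime, timedelta


def label_incoming_lots(receipts, ncrs, window_days, as_of):
    """Label each lot 1 (NCR within window), 0 (clean, window elapsed), or None."""
    ncr_times = {}
    for ncr in ncrs:
        ncr_times.setdefault(ncr["lot_id"], []).append(ncr["opened_at"])
    labels = {}
    for receipt in receipts:
        start = receipt["received_at"]
        end = start + timedelta(days=window_days)
        times = ncr_times.get(receipt["lot_id"], [])
        if any(start <= t <= end for t in times):
            labels[receipt["lot_id"]] = 1
        elif as_of >= end:
            labels[receipt["lot_id"]] = 0
        else:
            labels[receipt["lot_id"]] = None  # window still open; outcome unknown
    return labels


# Hypothetical sample data.
receipts = [
    {"lot_id": "L1", "received_at": datetime(2024, 1, 1)},
    {"lot_id": "L2", "received_at": datetime(2024, 2, 1)},
    {"lot_id": "L3", "received_at": datetime(2024, 5, 20)},
    {"lot_id": "L4", "received_at": datetime(2024, 1, 1)},
]
ncrs = [
    {"ncr_id": "N1", "lot_id": "L1", "opened_at": datetime(2024, 1, 10)},
    {"ncr_id": "N2", "lot_id": "L4", "opened_at": datetime(2024, 4, 1)},
]
labels = label_incoming_lots(receipts, ncrs, window_days=30, as_of=datetime(2024, 6, 1))
print(labels)
```

Writing the label down this explicitly tends to surface the data questions early: which receipt timestamp to use, how lots are identified across ERP and QMS, and how long an observation window is operationally meaningful.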

Bottom line

To start building predictive quality models from NCRs, you need structured NCR records plus traceable links to process, product, supplier, inspection, and outcome data. If those links are weak, the limiting factor is data readiness, not analytics. Start with one prediction target, one governed dataset, and one integration path you can validate under change control.

Get Started

Built for Speed, Trusted by Experts

Whether you're managing 1 site or 100, Connect 981 adapts to your environment and scales with your needs—without the complexity of traditional systems.
