FAQ

What level of data quality is acceptable to start AI pilots in aerospace?

There is no single percentage threshold that makes data quality “acceptable” for AI pilots in aerospace.

In practice, the right standard is this: the data must be good enough for the specific pilot objective, and the pilot must be designed so that bad data cannot create uncontrolled operational, quality, or traceability risk.

For low-risk pilots, organizations often start with imperfect data. Advisory use cases such as document search, failure-code clustering, scheduling insights, or nonconformance trend analysis can tolerate some missing fields, inconsistent naming, and historical gaps if the limitations are known and visible. That is very different from using AI to drive acceptance decisions, process parameter changes, release steps, or any action that affects regulated records without review.

What is usually acceptable to start

For an aerospace AI pilot, acceptable data quality usually means:

  • The data lineage is known. You should know which system produced the data, how it was extracted, and where transformations occurred.

  • The key fields for the use case are mostly complete. Not every field matters equally. A pilot predicting part shortages needs different critical fields than a pilot analyzing NCR patterns.

  • Definitions are stable enough to compare records. If defect codes, work center names, revision identifiers, serial numbers, or timestamps are inconsistent across sources, model output may be misleading.

  • The error rate is bounded and understood. Some noise is tolerable. Unknown bias is much more dangerous than known incompleteness.

  • The data reflects current operations closely enough. If routings, equipment states, product structures, or quality workflows changed materially, old data may not represent present conditions.

  • Outputs can be checked by humans. Early pilots should usually remain decision support, not autonomous control.

A useful rule of thumb is that if subject matter experts cannot review a sample dataset and explain its gaps, conflicts, and likely distortions, the organization is probably not ready for even a narrow pilot.
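That SME review can start from a quick mechanical profile of a sample export: per-field missingness and duplicate identifiers are the two numbers experts most need in front of them. A minimal sketch in plain Python; the field names (`serial_number`, `work_order`, `defect_code`) and the sample records are illustrative assumptions, not a standard aerospace schema:

```python
from collections import Counter

# Hypothetical sample of exported quality records; field names are
# illustrative, not a standard schema.
records = [
    {"serial_number": "SN-001", "work_order": "WO-10", "defect_code": "D12"},
    {"serial_number": "SN-002", "work_order": "WO-10", "defect_code": ""},
    {"serial_number": "SN-001", "work_order": "WO-11", "defect_code": "D12"},
    {"serial_number": "",       "work_order": "WO-12", "defect_code": "D07"},
]

KEY_FIELDS = ["serial_number", "work_order", "defect_code"]

def profile(records, key_fields):
    """Return per-field missingness fractions and duplicated serial numbers."""
    n = len(records)
    missing = {f: sum(1 for r in records if not r.get(f)) / n
               for f in key_fields}
    dup_serials = [s for s, c in Counter(
        r["serial_number"] for r in records if r["serial_number"]
    ).items() if c > 1]
    return missing, dup_serials

missing, dups = profile(records, KEY_FIELDS)
print(missing)  # fraction of blank values per key field
print(dups)     # serial numbers that appear more than once
```

A profile like this does not judge acceptability by itself; it gives the SMEs concrete gaps and conflicts to explain, which is the actual readiness test.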

What is not acceptable

Data is usually not acceptable when:

  • record identity is unreliable, such as weak linkage between part, lot, serial, work order, and operation records

  • timestamps are too inconsistent to reconstruct the sequence of events

  • master data changes are uncontrolled or undocumented

  • large portions of relevant process history exist only in paper, email, or operator memory

  • training data contains unresolved duplicates, revision conflicts, or mixed contexts from different processes

  • the pilot would influence regulated decisions without validated controls, review, and evidence retention

In those cases, the pilot often becomes a data-cleanup exercise disguised as an AI project.

How much quality is enough depends on the use case

The required quality level rises with the consequence of being wrong.

  • Lower-risk use cases: search, summarization, anomaly flagging, engineering knowledge retrieval, maintenance trend detection, and queue prioritization can often start with partial data if limitations are explicit.

  • Medium-risk use cases: yield drivers, rework prediction, supplier performance analysis, and schedule-risk forecasting need better historical consistency and stronger cross-system mapping.

  • Higher-risk use cases: process optimization affecting qualified operations, automated quality disposition support, release-related recommendations, or anything tied to regulated records requires much stricter controls, validation, and usually a narrower initial scope.

The common mistake is to ask whether the data is good enough for AI in general. The real question is whether it is good enough for this decision, in this workflow, with these controls.

Brownfield reality matters

In aerospace, data quality is often limited less by one bad system than by coexistence problems across MES, ERP, PLM, QMS, historians, spreadsheets, and manual workarounds. Different plants may use different coding structures, event models, and revision practices. That does not mean AI pilots must wait for a full platform replacement.

In fact, full replacement is often the wrong prerequisite in regulated, long-lifecycle environments. It can fail because of qualification burden, validation cost, downtime risk, integration complexity, and the need to preserve traceability across legacy assets and processes. A narrower pilot that works with existing systems, documents assumptions, and isolates risk is usually more realistic.

But coexistence has a cost. If data mapping and governance are weak, the pilot may appear to perform well in a sandbox while failing in production because interfaces, identifiers, and process context do not hold up outside the test set.

Best way to start

A practical starting point is not “clean all the data first.” It is to choose one narrow, high-friction, low-consequence use case and test whether the available data can support it with controlled review.

Typical gating checks include:

  • Can you identify the source systems and owners for the required data?

  • Can you sample records and quantify missingness, duplicates, and obvious contradictions?

  • Can process, quality, and engineering leaders agree on the meaning of the fields used?

  • Can you retain prompts, model versions, outputs, and review evidence where needed?

  • Can the pilot run without bypassing change control or altering the system of record?
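The evidence-retention check in particular can be satisfied with something as simple as an append-only log capturing prompt, model version, output, and reviewer decision for each AI-assisted suggestion. A minimal sketch, assuming JSON Lines storage and hypothetical field names (production use would write to a controlled file or database, not an in-memory buffer):

```python
import io
import json
from datetime import datetime, timezone

def record_review(log, prompt, model_version, output, reviewer, decision):
    """Append one review-evidence entry as a JSON line (append only)."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "model_version": model_version,
        "output": output,
        "reviewer": reviewer,
        "decision": decision,  # e.g. "accepted", "rejected", "escalated"
    }
    log.write(json.dumps(entry) + "\n")
    return entry

# Illustrative use with an in-memory buffer standing in for the real log.
log = io.StringIO()
record_review(log, "Summarize NCR trends for line 3", "model-2024-06",
              "Top recurring code: D12 ...", "j.smith", "accepted")
entries = [json.loads(line) for line in log.getvalue().splitlines()]
print(len(entries), entries[0]["decision"])
```

The point is not the storage format but that every output a human acted on can later be traced back to the exact prompt and model version that produced it.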

If the answer to those questions is mostly yes, you may be ready to start a pilot even if the data is far from perfect.

If the answer is no, the immediate priority is usually data readiness and workflow discipline, not model selection.

Bottom line

Acceptable data quality for an aerospace AI pilot is not perfection. It is sufficiency, traceability, and controllable risk for a narrowly defined use case.

Start when the data is reliable enough to support bounded decision support, the limitations are measured, and humans can catch errors before they affect product, process, or records. Do not start when the pilot depends on unstable identifiers, unclear lineage, or uncontrolled use of outputs in regulated workflows.
