FAQ

What documentation should I keep to support audits of AI in production?

Keep enough documentation to show what the AI does, where its data came from, how it was validated, who approved changes, how outputs are reviewed, and how issues are handled. In regulated production, audit support usually depends less on a single AI file and more on a traceable evidence set across quality, IT, operations, and change control systems.

You should keep a traceable evidence package that shows how the AI system was selected, configured, validated, monitored, changed, and governed in actual production use. In most regulated manufacturing environments, auditors will not be looking for one document. They will be looking for a coherent record set across quality, engineering, operations, and IT.

The exact package depends on what the AI is doing. A scheduling assistant, a vision model used for inspection support, and a model that proposes process parameter changes do not carry the same risk. The more the system can influence product quality, release decisions, traceability records, or operator actions, the more rigorous the documentation usually needs to be.

In practice, this matters for AS9100 compliance, where teams need to turn these expectations into repeatable, auditable routines.

Core documentation to retain

Intended use and scope
Document the business purpose, process boundaries, users, decision rights, inputs, outputs, and any prohibited uses. Be explicit about whether the AI is advisory, semi-automated, or allowed to trigger actions.
Risk assessment
Keep a documented assessment of failure modes, foreseeable misuse, data quality risks, model drift risks, cybersecurity considerations, and impact on product quality, traceability, and operations. Include the controls you rely on to reduce those risks.
System architecture and data flow
Maintain records showing where data originates, how it is transformed, what systems exchange data, what identifiers are used, and where outputs are stored. In brownfield plants, this often means documenting MES, ERP, PLM, QMS, historian, SCADA, document control, and local spreadsheets or operator stations that still participate in the process.
Data lineage and data readiness evidence
Retain source datasets, data definitions, inclusion and exclusion criteria, labeling or annotation methods if used, preprocessing rules, known gaps, and any data quality checks. If training or tuning used historical plant data, you should be able to show what period, what equipment, and what process conditions were represented.
Model and configuration records
Keep versioned records of the model type, vendor or internal build, prompts or system instructions if relevant, tuning parameters, thresholds, business rules, confidence limits, and any fallback logic. For vendor systems, you may not get full internal model details, so document what is actually available and note the limitation.
Validation and verification evidence
Keep test protocols, acceptance criteria, test results, exception handling, re-test results, and sign-offs. Validation should be tied to intended use, not just generic vendor claims. If performance depends on local data, operators, or integration quality, your records should say that clearly.
Human review and operating procedures
Document who reviews AI outputs, what they are expected to check, when they can override the system, when escalation is required, and how the review is recorded. This matters especially when AI recommendations affect inspection, routing, maintenance, scheduling, or quality events.
Change control
Keep formal records for model updates, prompt changes, rules changes, retraining events, connector changes, interface changes, and master data changes that could alter behavior. In regulated environments, undocumented tuning is a recurring audit problem.
Access control and security records
Retain evidence showing who can configure, approve, run, override, and administer the system, along with authentication, authorization, logging, and incident response practices. If technical data or controlled information is involved, document how access boundaries are enforced.
Audit trails and operational logs
Keep timestamped records of inputs, outputs, user actions, approvals, exceptions, overrides, and downstream actions taken. If the AI contributes to a production record, you should be able to reconstruct what happened for a specific lot, serial number, work order, or event.
Monitoring and periodic review
Retain evidence of ongoing performance review, drift checks where applicable, false positive or false negative trends, complaint signals, NCR or CAPA linkages, and trigger conditions for revalidation.
Deviation, incident, and CAPA records
When the AI behaves unexpectedly or contributes to a process issue, keep the investigation, containment, root cause work, corrective actions, and effectiveness checks linked back to the system version and affected records.
Training records
Document training for operators, engineers, reviewers, and administrators, including what they were trained to trust, what they were trained not to trust, and how exceptions are handled.
Supplier and service documentation
For third-party AI tools, retain contracts, service descriptions, release notes, support commitments, data handling terms, and vendor change notifications. If you cannot obtain needed evidence from the supplier, that gap should be acknowledged and addressed through compensating controls where possible.
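
As an illustration of the audit trail item above, here is a minimal sketch of a timestamped event record that can be replayed for one work order. The class names, field names, and sample identifiers (AuditEvent, AuditTrail, WO-1001, and so on) are hypothetical and not taken from any particular MES or QMS. A real deployment would write to controlled, tamper-evident storage rather than an in-memory list.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditEvent:
    """One timestamped record of an AI output and what was done with it."""
    timestamp: datetime
    work_order: str                 # hypothetical key; could be a lot or serial number
    model_version: str              # version active when the output was produced
    input_ref: str                  # pointer to the stored input, not the input itself
    output: str
    reviewer: Optional[str] = None  # who reviewed the output, if anyone
    action: Optional[str] = None    # downstream action, e.g. "accepted" or "overridden"


class AuditTrail:
    """Append-only in-memory log (illustrative only)."""

    def __init__(self) -> None:
        self._events: list[AuditEvent] = []

    def record(self, event: AuditEvent) -> None:
        self._events.append(event)

    def reconstruct(self, work_order: str) -> list[AuditEvent]:
        """Return every event for one work order, oldest first."""
        matching = [e for e in self._events if e.work_order == work_order]
        return sorted(matching, key=lambda e: e.timestamp)

    def unreviewed(self) -> list[AuditEvent]:
        """Events where an output was logged but no reviewer or action was."""
        return [e for e in self._events if e.reviewer is None or e.action is None]


# Made-up sample data for illustration.
trail = AuditTrail()
trail.record(AuditEvent(datetime(2024, 5, 2, 9, 30, tzinfo=timezone.utc),
                        "WO-1001", "1.1", "img/0002", "possible defect",
                        "j.doe", "overridden"))
trail.record(AuditEvent(datetime(2024, 5, 2, 9, 0, tzinfo=timezone.utc),
                        "WO-1001", "1.1", "img/0001", "pass",
                        "j.doe", "accepted"))
trail.record(AuditEvent(datetime(2024, 5, 2, 10, 0, tzinfo=timezone.utc),
                        "WO-2002", "1.1", "img/0003", "pass"))

for event in trail.reconstruct("WO-1001"):
    print(event.timestamp.isoformat(), event.output, event.reviewer, event.action)
```

Storing the reviewer and the downstream action on the same record as the output is a design choice: it makes missing reviews visible with a simple query such as unreviewed().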

What auditors usually want to see in practice

Most audits are less about the phrase AI and more about whether you can demonstrate control. In practice, that usually means you can answer these questions with records:

What exactly is this system allowed to do?
What data does it rely on, and is that data trustworthy enough for the intended use?
How was it validated in your environment, not just by the vendor?
Who approves changes, and how do you know what version was active at a given time?
How are users prevented from treating suggestions as automatic truth?
How do you detect bad outputs, integration errors, drift, or silent failures?
Can you reconstruct the system’s role in a specific production event or quality decision?
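
The question about which version was active at a given time can be answered from a change history with effective dates. The following is a minimal sketch under that assumption. ChangeRecord, active_version, and the sample change request numbers are hypothetical names used only for illustration.

```python
from __future__ import annotations

from bisect import bisect_right
from datetime import datetime
from typing import NamedTuple, Optional


class ChangeRecord(NamedTuple):
    effective: datetime  # when the change took effect in production
    version: str
    approved_by: str
    change_ref: str      # hypothetical change request number


def active_version(history: list[ChangeRecord], at: datetime) -> Optional[ChangeRecord]:
    """Return the change record in effect at time `at`, or None if none was."""
    ordered = sorted(history, key=lambda r: r.effective)
    # bisect_right counts the records whose effective date is <= `at`.
    idx = bisect_right([r.effective for r in ordered], at)
    return ordered[idx - 1] if idx > 0 else None


# Made-up change history for illustration.
history = [
    ChangeRecord(datetime(2024, 1, 10), "1.0", "qa.lead", "CR-0101"),
    ChangeRecord(datetime(2024, 3, 5), "1.1", "qa.lead", "CR-0142"),
    ChangeRecord(datetime(2024, 6, 20), "2.0", "eng.manager", "CR-0200"),
]

record = active_version(history, datetime(2024, 4, 1))
print(record.version, record.approved_by, record.change_ref)
```

A lookup like this only works if every change that can alter behavior, including prompt and threshold changes, is entered with an approval and an effective date.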

Brownfield reality

In many plants, the documentation is spread across existing systems rather than stored in one AI repository. That is normal. The challenge is not only creating documents, but linking them. Your evidence may live across QMS records, change requests, MES transaction history, ERP master data approvals, document control, validation binders, SIEM logs, and vendor tickets.

That is also why full replacement strategies often fail. Replacing MES, QMS, ERP, and plant integrations just to make AI governance cleaner usually creates more qualification burden, validation cost, downtime risk, and traceability disruption than most regulated operations can absorb. A more realistic approach is to define an evidence map that shows which system is the system of record for each control.
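
An evidence map can be as simple as a lookup from each control to its system of record, plus a check for controls that have no home yet. The sketch below is illustrative only: the control names and system assignments are assumptions, not a prescribed list.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical evidence map: control area -> system of record.
EVIDENCE_MAP: dict[str, Optional[str]] = {
    "intended_use": "document control",
    "validation": "QMS",
    "change_control": "QMS",
    "audit_trail": "MES",
    "access_control": "SIEM",
    "incidents_capa": "QMS",
    "training": None,  # known control, system of record not yet assigned
}

# Controls the site has decided it must be able to evidence (assumed list).
REQUIRED_CONTROLS = {
    "intended_use",
    "validation",
    "change_control",
    "audit_trail",
    "access_control",
    "incidents_capa",
    "training",
    "monitoring",
}


def unmapped_controls(evidence_map: dict[str, Optional[str]],
                      required: set[str]) -> list[str]:
    """Return required controls that have no system of record assigned."""
    return sorted(c for c in required if not evidence_map.get(c))


print(unmapped_controls(EVIDENCE_MAP, REQUIRED_CONTROLS))
```

Keeping the map itself under document control means the linkage between systems is reviewed and approved like any other controlled record.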

Common gaps

Using pilot documentation in production without updating intended use, risks, and approvals.
Keeping model test results but not the production configuration that was actually deployed.
Logging outputs without logging who reviewed them or what downstream action was taken.
Assuming vendor documentation is enough for local validation.
Failing to document prompts, thresholds, business rules, or retrieval sources because they were treated as operational settings rather than validated configuration.
Not linking AI incidents to NCR, deviation, or CAPA processes.

Minimum practical structure

If you need a starting point, keep at least these controlled record groups:

AI inventory with owner, intended use, risk level, interfaces, and current version.
Validation package tied to intended use and site conditions.
Change history with approvals and effective dates.
Operational logs and audit trail retention rules.
Periodic review records with performance and incident trends.
Training and access authorization records.

If those six areas are weak, audit support will usually be weak as well.
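
The six record groups above can be tracked per system in an inventory entry. The following sketch shows one possible shape for such an entry and a check for missing record groups. All names and sample values (AISystemEntry, RECORD_GROUPS, the record locations) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The six record groups from the list above, as short keys (assumed naming).
RECORD_GROUPS = (
    "inventory",
    "validation",
    "change_history",
    "operational_logs",
    "periodic_review",
    "training_and_access",
)


@dataclass
class AISystemEntry:
    """One line of an AI inventory, with pointers to its controlled records."""
    name: str
    owner: str
    intended_use: str
    risk_level: str
    interfaces: list[str]
    current_version: str
    # record group -> where the controlled record lives (document number, system)
    evidence: dict[str, str] = field(default_factory=dict)

    def missing_groups(self) -> list[str]:
        """Record groups with no controlled record referenced yet."""
        return [g for g in RECORD_GROUPS if not self.evidence.get(g)]


# Made-up entry for illustration.
entry = AISystemEntry(
    name="vision-inspection-assist",
    owner="quality engineering",
    intended_use="advisory defect flagging for operator review",
    risk_level="high",
    interfaces=["MES", "QMS"],
    current_version="1.1",
    evidence={
        "inventory": "QMS: AI-INV-007",
        "validation": "QMS: VAL-0231",
        "change_history": "QMS: CR log",
        "operational_logs": "MES transaction history",
    },
)

print(entry.missing_groups())
```

A check like this only shows that a record is referenced. It says nothing about whether the record is current or adequate, which still requires review.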

The main constraint is that documentation quality cannot compensate for poor process control, weak integrations, or unvalidated use. If the AI depends on unstable source data, informal operator workarounds, or undocumented system changes, those weaknesses will surface during an audit even if the document set looks complete.
