IMDRF's Draft N93 Technical Framework: What AI Device Makers Need to Know

The International Medical Device Regulators Forum (IMDRF) has released a draft of IMDRF/AIML WG/N93: Technical Framework for Artificial Intelligence Life Cycle Management, developed by its Artificial Intelligence/Machine Learning-enabled Working Group. The document is open for public consultation until July 10, 2026, and it matters: it is the connective tissue between the Good Machine Learning Practice (GMLP) guiding principles IMDRF finalized in 2025 (N88) and the day-to-day engineering, quality, and post-market decisions manufacturers of AI-enabled medical devices actually make. Below is what the framework says and what it means for your development program.

What N93 Is

N93 is not a regulation and not jurisdiction-specific guidance. It is a harmonized technical framework that describes the life cycle of an AI-enabled medical device, the universal concepts that apply at every step, and the internationally recognized standards to consult along the way. It applies to machine learning-enabled medical devices (MLMD) broadly, and it explicitly reaches devices that incorporate generative AI, adaptive models, and third-party general-purpose models, areas earlier IMDRF documents only gestured at.

The document is careful about a distinction that trips up many teams: the model versus the device. Some life cycle steps operate on the model (data collection, model building and tuning), while others operate on the AI-enabled medical device that incorporates it (clinical evaluation, deployment). N93 also separates explainability (how the model works, and why its outputs are trustworthy in a clinical context) from interpretability (whether users can comprehend how the device arrives at its outputs), and both concepts show up repeatedly in its expectations.

Four Universal Concepts

Before walking the life cycle, N93 establishes four universal concepts that apply at every step:

Quality management system: a QMS adapted to AI, with traceability of training data and model versions, configuration management, and post-market surveillance.
Risk management: AI-specific risk identification and control layered on the established ISO 14971 framework.
Human oversight: clinical and user expertise informing development, validating real-world performance, and guarding the human-AI collaboration.
Cybersecurity: protection of data, models, and update channels across development and deployment.

The risk management section is one of the most useful parts of the draft. It catalogs AI-specific risks in four buckets:

Risks related to information: inaccurate or misleading outputs from black-box models.
Risks related to the human-AI interaction: automation bias, verification fatigue, and de-learning of clinical knowledge.
Risks related to model training and data quality: training data bias, drift, and out-of-distribution inputs.
Risks related to deployment and post-market performance: integration failures, degradation caused by changes in third-party models, and version control gaps.

The draft points to AAMI TIR34971 as the reference for applying ISO 14971 to machine learning. If your AI risk file currently reads like a generic software risk file, this list is the checklist a reviewer will eventually hold it against.
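
One way to use the catalog as a checklist is a simple coverage check against your own risk file. The sketch below is illustrative only: the bucket and risk identifiers are hypothetical labels derived from the summary above, not terms defined by N93, and a real risk file would carry far more than a list of names.

```python
# Hypothetical identifiers for the four risk buckets summarized in this article.
# These labels are assumptions for illustration, not N93 terminology.
AI_RISK_CATALOG = {
    "information": {"misleading_output"},
    "human_ai_interaction": {"automation_bias", "verification_fatigue", "de_learning"},
    "training_and_data_quality": {"training_data_bias", "drift", "out_of_distribution_input"},
    "deployment_and_post_market": {
        "integration_failure",
        "third_party_model_degradation",
        "version_control_gap",
    },
}


def missing_risks(risk_file_entries):
    """Return, per bucket, the catalog risks that the risk file does not mention."""
    covered = set(risk_file_entries)
    gaps = {}
    for bucket, risks in AI_RISK_CATALOG.items():
        absent = sorted(risks - covered)
        if absent:
            gaps[bucket] = absent
    return gaps


# A made-up risk file that reads like a generic software risk file.
example_risk_file = ["misleading_output", "automation_bias", "drift", "integration_failure"]
gaps = missing_risks(example_risk_file)
for bucket, absent in gaps.items():
    print(f"{bucket}: missing {', '.join(absent)}")
```

A check like this only tells you whether a risk is named, not whether its controls are adequate, so treat it as a first-pass gap finder.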

The Eight Life Cycle Steps

The heart of the document is an eight-step life cycle:

Planning and Design
Data Collection and Management
Model Building and Tuning
Verification and Validation, including Clinical Evaluation
Deployment
Operations and Monitoring
Real-World Performance Evaluation
Sunsetting

Planning through model building

Planning and Design asks manufacturers to justify that an AI model (rather than a rules-based approach) is the right tool at all, to assess data availability early, and to plan validation and post-market monitoring from day one.

Data Collection and Management covers representativeness across clinical and non-clinical subtypes, bias mitigation, data cleaning with predefined stop thresholds, and data lineage and provenance documentation. Synthetic and augmented data are addressed directly: they are acceptable tools, subject to jurisdictional acceptance, documented provenance and transformation methods, and a fit-for-purpose justification.
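
To make the lineage and provenance expectation concrete, here is a minimal sketch of a dataset record that carries a content hash, a pointer to its parent dataset, and the transformations applied along the way. The field names, the `derive` helper, and the example site name are all hypothetical; N93 does not prescribe a record format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class DatasetRecord:
    """Hypothetical provenance record for one dataset version."""
    name: str
    source: str
    is_synthetic: bool
    content_sha256: str
    parent_sha256: Optional[str] = None
    transformations: list = field(default_factory=list)
    fit_for_purpose_note: str = ""


def fingerprint(data: bytes) -> str:
    """Content hash used to tie a record to the exact bytes it describes."""
    return hashlib.sha256(data).hexdigest()


def derive(parent, data, name, transformation, is_synthetic=False, note=""):
    """Create a child record that keeps the parent's history and adds one step."""
    return DatasetRecord(
        name=name,
        source=parent.source,
        # Once synthetic data is mixed in, every descendant stays flagged.
        is_synthetic=is_synthetic or parent.is_synthetic,
        content_sha256=fingerprint(data),
        parent_sha256=parent.content_sha256,
        transformations=parent.transformations + [transformation],
        fit_for_purpose_note=note,
    )


raw_bytes = b"patient_id,age,label\n1,54,0\n2,61,1\n"
raw = DatasetRecord("raw_export", "hypothetical_site_A", False, fingerprint(raw_bytes))

augmented_bytes = raw_bytes + b"3,58,1\n"
augmented = derive(
    raw,
    augmented_bytes,
    name="augmented_v1",
    transformation="appended one synthetic row",
    is_synthetic=True,
    note="Illustrative justification text goes here.",
)
print(json.dumps(asdict(augmented), indent=2))
```

Chaining records by hash means any trained model can be traced back through each transformation to the raw export, which is the kind of traceability the QMS concept above also asks for.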

Model Building and Tuning covers architecture selection, explainability trade-offs, feature preprocessing with biological plausibility in mind, and, notably, a full treatment of leveraging third-party, general-purpose, and off-the-shelf models, including LLMs: supplier credibility, model provenance and longevity, hallucination susceptibility, transfer learning to close representativeness gaps, and monitoring for emergent behavior.

V&V and clinical evaluation

N93 distinguishes external validation of the model (does performance generalize beyond the training environment) from clinical evaluation of the device (does the whole product, including the user and workflow, deliver its intended clinical benefit). It calls for robustness work, stress testing, sensitivity analysis, and red teaming, and it expects study designs matched to device outputs and risk, from retrospective studies and silent-mode deployments through prospective studies. Where the device supports human decision-making, evaluation of the human-AI team, not just standalone model performance, is expected.

Deployment through sunsetting

The post-market half of the life cycle is where N93 goes furthest beyond existing documents.

Deployment covers phased rollouts, deployment verification, and site-specific customization and localization, with Predetermined Change Control Plans (PCCPs) named as the mechanism regulators can use to prospectively authorize those changes.

Operations and Monitoring expects logging of model predictions, versions, and explainability elements, drift detection, alerting thresholds, and distinct monitoring strategies for customizable, adaptive, and generative devices (including prompt-response logging and misuse detection for LLM-based devices).

Real-World Performance Evaluation asks for device performance indicators defined not just in aggregate but for clinically relevant subgroups, with pre-specified thresholds that trigger investigation or model updates.

Sunsetting, rarely addressed anywhere else, covers controlled decommissioning, stakeholder communication, and data retention.
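
The subgroup point in Real-World Performance Evaluation is easy to underestimate, so here is a small sketch of why aggregate indicators are not enough. It assumes sensitivity as the indicator, a hypothetical `age_band` subgroup, and a made-up threshold of 0.7; none of these specifics come from N93.

```python
def sensitivity(records):
    """Fraction of true positives that were detected; None if there are no positives."""
    positives = [r for r in records if r["truth"] == 1]
    if not positives:
        return None
    return sum(r["pred"] == 1 for r in positives) / len(positives)


def subgroup_alerts(records, group_key, threshold):
    """Return subgroups whose sensitivity falls below the pre-specified threshold."""
    groups = {}
    for r in records:
        groups.setdefault(r[group_key], []).append(r)
    alerts = {}
    for name, rows in groups.items():
        value = sensitivity(rows)
        if value is not None and value < threshold:
            alerts[name] = value
    return alerts


# Made-up post-market records: ground truth and device prediction per case.
records = (
    [{"age_band": "under_65", "truth": 1, "pred": 1}] * 4
    + [{"age_band": "65_plus", "truth": 1, "pred": 1}] * 2
    + [{"age_band": "65_plus", "truth": 1, "pred": 0}] * 2
    + [{"age_band": "under_65", "truth": 0, "pred": 0}] * 3
)

THRESHOLD = 0.7  # assumed value; a real threshold would be pre-specified and justified
print("aggregate sensitivity:", sensitivity(records))
print("subgroups below threshold:", subgroup_alerts(records, "age_band", THRESHOLD))
```

In this made-up data the aggregate indicator clears the threshold while one subgroup does not, which is exactly the situation subgroup-level indicators exist to catch.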

Transparency and Labelling

A dedicated section ties transparency to the risk categories: information should reach the right user at the right time, potentially through the device UI itself, and labelling should cover subgroup performance, testing conditions and datasets, model characteristics, and known limitations (Appendix C provides the element list). Update-related transparency scales with the device: static models, discretely updated models, and autonomously adapting models each carry different disclosure expectations.

What This Means for Developers

Map your program to the eight steps now. Appendix A traces each life cycle step to the GMLP principles, which makes N93 a ready-made audit skeleton. Expect regulators, notified bodies, and eventually auditors to organize their questions this way.

Treat the AI risk catalog as a minimum. Automation bias, drift, out-of-distribution inputs, and third-party model degradation now have names in a harmonized document. Their absence from a risk file will be increasingly hard to defend.

Plan post-market infrastructure before you build. The monitoring, logging, and real-world performance expectations in Sections 5.5 through 5.7 are difficult to retrofit. Architecture decisions made in planning determine whether you can meet them.
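
As an example of why this is hard to retrofit, a drift check needs both a stored reference distribution and logged production outputs to compare against it. The sketch below uses the population stability index (PSI), one common drift statistic chosen here for illustration; N93 does not mandate it, and the 0.2 alert threshold is a widely quoted rule of thumb, not a value from the draft.

```python
import math


def psi(reference, current, edges):
    """Population stability index between two samples over shared bin edges.

    Assumes both samples are non-empty; values outside the edges are ignored.
    """
    def proportions(values):
        counts = [0] * (len(edges) - 1)
        for v in values:
            for i in range(len(counts)):
                is_last = i == len(counts) - 1
                # The last bin is closed on the right so the top edge is counted.
                if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                    counts[i] += 1
                    break
        total = sum(counts)
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / total, 1e-6) for c in counts]

    ref, cur = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Made-up model output scores: validation-time reference vs. logged production scores.
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
reference = [0.1, 0.2, 0.3, 0.35, 0.4, 0.55, 0.6, 0.7, 0.8, 0.9]
shifted = [0.1, 0.3, 0.55, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 1.0]

ALERT_THRESHOLD = 0.2  # assumed rule-of-thumb value, not taken from N93
drift_score = psi(reference, shifted, edges)
if drift_score > ALERT_THRESHOLD:
    print(f"drift alert: PSI={drift_score:.3f}")
```

Neither input exists unless the logging pipeline and the reference snapshot were designed in from the start, which is the practical meaning of planning post-market infrastructure before you build.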

If you use foundation models, document the dependency. Supplier management, version pinning, update cadence, and emergent behavior monitoring for general-purpose models are all called out. This is the clearest harmonized statement yet that an LLM inside a device is a supply chain and a life cycle problem, not just a design choice.
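
Version pinning can be sketched as a comparison between what your design documentation says you depend on and what the deployed system actually reports. Everything named below (the `PinnedModel` fields, the supplier and model names, the shortened hashes) is hypothetical; how you obtain the observed values depends on your supplier and is outside this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PinnedModel:
    """Hypothetical record of a third-party model dependency."""
    supplier: str
    model_id: str
    version: str
    weights_sha256: str


def check_dependency(pinned, observed):
    """Return human-readable mismatches between the pinned and observed model."""
    problems = []
    for field_name in ("supplier", "model_id", "version", "weights_sha256"):
        expected = getattr(pinned, field_name)
        actual = getattr(observed, field_name)
        if expected != actual:
            problems.append(f"{field_name}: expected {expected!r}, observed {actual!r}")
    return problems


# Made-up values; the hashes are shortened placeholders, not real digests.
pinned = PinnedModel("example_supplier", "general-purpose-llm", "1.2.0", "abc123")
observed = PinnedModel("example_supplier", "general-purpose-llm", "1.3.0", "def456")

problems = check_dependency(pinned, observed)
for problem in problems:
    print("dependency mismatch ->", problem)
```

A non-empty result would be the trigger to run your change assessment before the updated model reaches users, rather than discovering the change through degraded performance.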

Comment before July 10. This is a draft, and IMDRF is accepting comments through its Consultation Hub until July 10, 2026. Where the framework's expectations would be disproportionate for your technology or your company's scale, now is the moment to say so.

The Bigger Picture

N93 continues the trajectory from N67 (terms and definitions) through N88 (GMLP principles) toward a complete harmonized stack for AI device oversight, and regulators from FDA to the bodies enforcing the EU AI Act are likely to draw on it. For related reading, see our playbook on navigating the AI lifecycle management model on the Cosm blog. You can also download the full N93 draft from our resources library at links.cosmhq.com/imdrf-n93-pdf.

How Cosm Can Help

Cosm helps medical device and digital health companies turn frameworks like N93 into working regulatory strategy: gap-assessing AI life cycle processes, building GMLP-aligned quality systems, designing PCCPs, and preparing FDA submissions for AI-enabled devices. If you are developing an AI-enabled medical device and want to understand what this framework means for your program, or want support drafting consultation comments, reach out at info@cosmhq.com or visit cosmhq.com.

Disclaimer - https://www.cosmhq.com/disclaimer
