An AI vendor risk assessment is a structured review of a provider, its service and the organization's intended use, carried out before the relationship begins and repeated throughout it. For an LLM or AI-enabled SaaS product, conventional security due diligence remains necessary, but it is not sufficient. The buyer also needs to understand training and retention practices, model limitations, output risks, human oversight, monitoring and the consequences of provider changes.

The assessment should be proportionate. A writing assistant restricted to public marketing copy does not need the same depth as a system influencing employment, finance, health, security or customer eligibility. Start with the use case and impact; otherwise even a thorough questionnaire may answer the wrong risk question.

Key takeaways
  • Approve a defined use case, not a vendor name in the abstract.
  • Map every data flow, including prompts, files, outputs, logs and subprocessors.
  • Assess security and privacy alongside accuracy, bias, misuse and human reliance.
  • Put notification, audit, deletion, incident and change obligations into the agreement.
  • Monitor the deployed service and maintain a realistic exit plan.

1. Define the AI use case and impact

Document what the service will do, who will use it, who may be affected and which decisions the output can influence. Identify whether the tool generates content, retrieves information, ranks people, recommends actions, writes code, operates an agent or processes regulated or confidential material. State prohibited uses as clearly as approved ones.

Assign an accountable business owner. Classify potential impact if the service is unavailable, produces a harmful output, exposes data or behaves differently after a model update. This classification determines the depth of technical testing, legal review, human oversight and ongoing monitoring.
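One way to make the classification repeatable is to record each use case as structured data and derive a review tier from it. The sketch below is illustrative only: the tier names, the review activities in REVIEW_DEPTH and the fields of UseCase are assumptions, not a standard, and the rules should be replaced with the organization's own risk framework.

```python
from dataclasses import dataclass

# Hypothetical mapping from impact tier to review activities.
REVIEW_DEPTH = {
    "low": ["security questionnaire"],
    "medium": ["security questionnaire", "privacy review", "scenario testing"],
    "high": [
        "security questionnaire",
        "privacy review",
        "scenario testing",
        "legal review",
        "mandatory human approval",
        "ongoing monitoring",
    ],
}

# Domains this checklist singles out as needing deeper assessment.
HIGH_IMPACT_DOMAINS = {"employment", "finance", "health", "security", "customer eligibility"}


@dataclass
class UseCase:
    name: str
    owner: str  # accountable business owner
    domains: set[str]
    handles_confidential_data: bool
    influences_decisions_about_people: bool


def impact_tier(use_case: UseCase) -> str:
    """Return a coarse impact tier; the rules are deliberately simple."""
    if use_case.domains & HIGH_IMPACT_DOMAINS or use_case.influences_decisions_about_people:
        return "high"
    if use_case.handles_confidential_data:
        return "medium"
    return "low"


screening = UseCase("CV screening", "HR director", {"employment"}, True, True)
print(impact_tier(screening), REVIEW_DEPTH[impact_tier(screening)])
```

A rule table like this does not replace judgment, but it makes the reasoning behind a tier visible and easy to challenge.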

2. Map data use, retention and model training

Ask what data enters the service through prompts, uploaded files, connectors, telemetry and support channels. Determine where inputs and outputs are processed and stored, how long they remain, who can access them and whether the customer can configure or verify deletion. Include metadata and logs; they can reveal sensitive context even when a prompt seems harmless.

Establish whether customer content is used to train or improve shared models, including default settings and exceptions. Clarify whether an enterprise opt-out covers all product components, human review and subprocessors. Identify data residency, cross-border transfers and the provider's process for government or third-party requests. Privacy and legal specialists should evaluate applicable obligations for the actual jurisdictions and data involved.
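The questions above can be tracked as a simple data-flow inventory in which every unconfirmed answer stays visible until the vendor resolves it. This is a minimal sketch under stated assumptions: the DataFlow fields and the wording of the follow-up questions are hypothetical, and None is used to mean "not yet confirmed by the vendor".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataFlow:
    channel: str                        # e.g. "prompts", "uploaded files", "telemetry"
    data_categories: list[str]
    storage_region: Optional[str]       # None: location not confirmed
    retention_days: Optional[int]       # None: retention period not confirmed
    used_for_training: Optional[bool]   # None: not confirmed in writing
    deletion_verifiable: bool


def open_questions(flows: list[DataFlow]) -> list[str]:
    """List the follow-up questions still outstanding for each data flow."""
    questions = []
    for flow in flows:
        if flow.storage_region is None:
            questions.append(f"{flow.channel}: where is the data processed and stored?")
        if flow.retention_days is None:
            questions.append(f"{flow.channel}: how long is the data retained?")
        if flow.used_for_training is not False:
            questions.append(f"{flow.channel}: confirm the content is excluded from training")
        if not flow.deletion_verifiable:
            questions.append(f"{flow.channel}: how can deletion be verified?")
    return questions


flows = [
    DataFlow("prompts", ["customer names"], "EU", 30, False, True),
    DataFlow("telemetry", ["usage metadata"], None, None, None, False),
]
for question in open_questions(flows):
    print(question)
```

Treating an unconfirmed training setting the same as a confirmed "yes" is a deliberate choice here: the flow stays flagged until the exclusion is documented.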

3. Review security architecture and access

Apply normal SaaS scrutiny: authentication, single sign-on, multi-factor authentication, role-based access, privileged administration, encryption, vulnerability management, secure development, backup, resilience, logging and incident response. Review independent assurance in context rather than accepting a badge as proof that the exact service and use case are covered.

Then assess AI-specific attack paths. Can untrusted content inject instructions into the model? Can tools or agents take actions outside the user's intention? Are connectors permissioned narrowly? How are secrets prevented from entering prompts or appearing in outputs? What separates customer environments, and what controls govern model and plugin supply chains?

4. Evaluate model behavior and limitations

Request documentation of intended capabilities, known limitations, evaluation methods and performance measures relevant to the use case. Broad benchmark scores may not predict performance on the organization's language, documents or edge cases. Conduct scenario-based testing with representative data that is safe and authorized for the evaluation.

Test for incorrect and fabricated outputs, harmful content, bias, refusal behavior, inconsistent answers and over-reliance risk. For retrieval-augmented systems, examine citation accuracy and whether the model can distinguish sources from instructions. Define an acceptance threshold and record residual risk; no evaluation proves that a generative system will always behave correctly.
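An acceptance threshold is easier to apply when scenario results are scored per risk category. The following sketch assumes reviewers have already graded each scenario as pass or fail; the category names, the threshold values and the evaluate function are illustrative assumptions rather than an established method.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    scenario_id: str
    category: str   # e.g. "fabrication", "bias", "citation"
    passed: bool    # graded by a human reviewer or an agreed rubric


def evaluate(results: list[ScenarioResult], thresholds: dict[str, float]) -> dict[str, dict]:
    """Compare the pass rate in each category with its required threshold."""
    totals: dict[str, tuple[int, int]] = {}
    for result in results:
        passed, total = totals.get(result.category, (0, 0))
        totals[result.category] = (passed + int(result.passed), total + 1)

    report = {}
    for category, (passed, total) in totals.items():
        rate = passed / total
        # Categories without an agreed threshold default to the strictest one.
        required = thresholds.get(category, 1.0)
        report[category] = {"pass_rate": rate, "required": required, "accepted": rate >= required}

    # A category with a threshold but no test results is a coverage gap, not a pass.
    for category, required in thresholds.items():
        if category not in report:
            report[category] = {"pass_rate": None, "required": required, "accepted": False}
    return report


results = [
    ScenarioResult("f1", "fabrication", True),
    ScenarioResult("f2", "fabrication", True),
    ScenarioResult("f3", "fabrication", True),
    ScenarioResult("f4", "fabrication", False),
    ScenarioResult("c1", "citation", True),
    ScenarioResult("c2", "citation", True),
]
thresholds = {"fabrication": 0.9, "citation": 0.95, "bias": 0.9}
for category, outcome in evaluate(results, thresholds).items():
    print(category, outcome)
```

Categories that fall short, or that were never tested, are the residual risks to record in the decision.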

5. Confirm human oversight and accountability

Ask what controls help users review, challenge or override an output. Determine which decisions require mandatory human approval and what competence reviewers need. A nominal human in the loop is not effective if the interface encourages automatic acceptance, the reviewer lacks source information or workload makes meaningful review impossible.

Clarify responsibilities between buyer and provider. The buyer remains accountable for how it deploys the tool, while the vendor must provide enough information for the buyer to manage the risks of the service. Internally, define ownership across business, security, privacy, legal, procurement and risk functions so gaps do not hide between teams.

6. Examine monitoring, incidents and material change

Determine what telemetry, audit logs and administrative reports are available. Can the organization monitor usage, prohibited data entry, harmful outputs, latency, refusals and model-version changes? Define how users report problems and how the vendor receives and investigates them.
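If the vendor's audit log exposes the model version that served each request, version changes can be detected automatically instead of being discovered after behavior shifts. The sketch below assumes a hypothetical log format: each event is a dictionary with a "timestamp" key (ISO 8601 strings in a single time zone, so they sort as text) and a "model_version" key. Real vendor logs will differ, and the sample events are invented.

```python
def version_changes(events: list[dict]) -> list[tuple[str, str, str]]:
    """Return (timestamp, old_version, new_version) for each observed change."""
    changes = []
    previous = None
    for event in sorted(events, key=lambda e: e["timestamp"]):
        current = event["model_version"]
        if previous is not None and current != previous:
            changes.append((event["timestamp"], previous, current))
        previous = current
    return changes


# Invented sample events, deliberately out of order.
events = [
    {"timestamp": "2024-01-03T09:00:00Z", "model_version": "v2"},
    {"timestamp": "2024-01-01T09:00:00Z", "model_version": "v1"},
    {"timestamp": "2024-01-02T09:00:00Z", "model_version": "v1"},
]
for timestamp, old, new in version_changes(events):
    print(f"{timestamp}: model changed from {old} to {new}")
```

A detected change can then be checked against the vendor's change notices and, where material, used to trigger reassessment.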

The agreement should set incident-notification expectations, escalation contacts, investigation cooperation and evidence preservation. Also address non-incident changes: model replacement, new training practices, subprocessor changes, feature activation or altered safety controls. The risk of a service can change without any conventional breach.

7. Put critical protections into the contract

Marketing pages are not durable controls. Contractual terms should address permitted data use, confidentiality, security measures, retention and deletion, subprocessor governance, incident notification, audit or assurance rights, service availability, intellectual-property allocation and support for regulatory or rights requests where applicable.

Require an appropriate change-notification mechanism and identify remedies when a material change is unacceptable. Allocate responsibility for outputs and downstream decisions carefully. Contract language needs qualified legal review; this checklist helps expose questions but is not legal advice.

8. Plan exit, deletion and continuity

Know how to export prompts, configuration, knowledge bases, logs and other customer data in usable formats. Confirm deletion procedures for active systems and backups, account closure, connector revocation and evidence of deletion. Assess dependence on provider-specific agents, embeddings, APIs or workflows that make migration difficult.

For critical processes, define a fallback if the model, API or provider becomes unavailable or unacceptable. A manual alternative may be sufficient for a low-volume workflow. Higher-impact use may require tested continuity arrangements and limits on automation when service quality degrades.

AI vendor due-diligence decision record

Conclude with a recorded decision: approved, approved with conditions, limited pilot, remediation required or rejected. Include the assessed use case, data categories, evidence reviewed, open risks, required controls, owner, review date and triggers for reassessment. Approval should expire or be reviewed when the service, use or risk materially changes.
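The decision record can be kept as structured data so that expiry and reassessment triggers are checked mechanically rather than remembered. This is a minimal sketch: the DecisionRecord class, its field names, the trigger strings and the example dates are assumptions for illustration, while the five decision values are the ones listed above.

```python
from dataclasses import dataclass, field
from datetime import date

DECISIONS = {
    "approved",
    "approved with conditions",
    "limited pilot",
    "remediation required",
    "rejected",
}


@dataclass
class DecisionRecord:
    use_case: str
    decision: str
    owner: str
    review_date: date
    data_categories: list[str] = field(default_factory=list)
    evidence_reviewed: list[str] = field(default_factory=list)
    open_risks: list[str] = field(default_factory=list)
    required_controls: list[str] = field(default_factory=list)
    reassessment_triggers: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.decision not in DECISIONS:
            raise ValueError(f"unknown decision: {self.decision!r}")

    def needs_reassessment(self, today: date, observed_events: set[str]) -> bool:
        """True once the review date is reached or a listed trigger has occurred."""
        triggered = bool(observed_events & set(self.reassessment_triggers))
        return today >= self.review_date or triggered


record = DecisionRecord(
    use_case="support-ticket summarization",
    decision="limited pilot",
    owner="head of customer support",
    review_date=date(2025, 6, 30),
    reassessment_triggers=["model replacement", "new subprocessor"],
)
print(record.needs_reassessment(date(2025, 1, 15), {"new subprocessor"}))
```

Rejecting unknown decision values keeps the register consistent, and checking triggers alongside the review date reflects the point that approval should not outlive a material change.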

FAQ

Is a SOC 2 report enough for an AI vendor?

No. It may provide useful assurance over selected controls, but the buyer still needs to confirm scope and assess data use, model behavior, oversight and the intended use case.

Should employees be allowed to use free public AI tools?

Organizations should define approved services and data-handling rules based on risk. Unmanaged accounts can create retention, training, access and visibility problems.

How often should an AI vendor be reassessed?

Use a risk-based schedule and event triggers such as material model changes, new data, expanded use, incidents, new subprocessors or changed obligations.

Does this checklist guarantee regulatory compliance?

No. Requirements vary by location, sector and use. Use qualified legal, privacy, security and domain expertise for the specific deployment.
