AI Vendor & Tool Evaluation Scorecard
Demos are designed to impress, not to inform. This weighted scorecard compares AI vendors on the things that actually matter: data handling, security, residency, value and exit terms, so you can put two slick pitches next to each other and see which one holds up.
Branded PDF plus editable CSV
Every AI vendor demos beautifully. That’s the job of a demo. The trouble starts later: when you discover the tool trains on your prompts, stores data in a region you can’t name, or triples in price once you’re dependent on it.
This scorecard puts vendors side by side on the criteria that predict those problems, weighted by how much they actually matter. Score each option from 1 to 5, and the sheet gives you a single weighted number per vendor, so the comparison is about substance, not whoever gave the smoothest pitch.
Use this scorecard with our Secure AI Buyer’s Guide. The guide explains the why behind each criterion; this scorecard is the how.
The criteria
Score each vendor 1 (poor) to 5 (excellent) on each line. The weights add to 100%, so the result lands back on a 1–5 scale you can compare directly.
| # | Criterion | A “5” looks like | Weight |
|---|---|---|---|
| 1 | No training on your data | Contractual guarantee they never train on your inputs | 12% |
| 2 | Retention & deletion control | You set retention, can purge to zero, deletion confirmed | 10% |
| 3 | Data residency & sovereignty | Data stays in a region you choose, under laws you accept | 12% |
| 4 | Sub-processor transparency | Full, current list of who else touches your data | 6% |
| 5 | Security assurance | SOC 2 Type II or ISO 27001, recent pen test, encryption in transit and at rest | 12% |
| 6 | Access control & audit logs | SSO, role-based access, exportable audit trail | 8% |
| 7 | Deployment fits your data | A deployment model (SaaS / private / on-prem) that suits your most sensitive data | 10% |
| 8 | Measurable value | A specific, provable outcome, not “boosts productivity” | 12% |
| 9 | Total cost clarity | All-in pricing, no surprise usage cliffs | 8% |
| 10 | Exit & portability | Clean export, confirmed deletion, no lock-in | 10% |
How the score works
The weighted total is a SUMPRODUCT of the weights and your 1–5 ratings:
Weighted score = Σ (criterion weight × your rating)
The downloaded sheet does this automatically for up to three vendors in adjacent columns. Type your ratings; the totals at the bottom update and rank themselves.
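If you want to check the sheet's arithmetic by hand, the same calculation can be sketched in a few lines of Python. The weights are the ones in the criteria table; the two sets of ratings are made-up illustration values, and the `weighted_score` name is an assumption rather than anything in the downloaded sheet.

```python
# Weights from the criteria table, as percentages, in criterion order 1-10.
WEIGHTS = [12, 10, 12, 6, 12, 8, 10, 12, 8, 10]


def weighted_score(ratings):
    """Return the weighted 1-5 score for one vendor's ten ratings."""
    if len(ratings) != len(WEIGHTS):
        raise ValueError("expected one rating per criterion")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be between 1 and 5")
    # SUMPRODUCT of weights and ratings, divided by 100 because the
    # weights are written as percentages.
    return sum(w * r for w, r in zip(WEIGHTS, ratings)) / 100


# Hypothetical ratings for two vendors (illustration only).
vendor_a = [5, 4, 5, 3, 4, 4, 3, 3, 4, 4]
vendor_b = [2, 3, 3, 4, 5, 5, 4, 5, 5, 3]

print(round(weighted_score(vendor_a), 2))  # 3.96
print(round(weighted_score(vendor_b), 2))  # 3.84
```

Because the weights sum to 100, a vendor rated 5 on every line scores exactly 5.0 and a vendor rated 1 on every line scores exactly 1.0.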
The dealbreakers
A weighted average is the right tool for comparing broadly acceptable options. It should never hide a fatal flaw. Some answers are pass/fail no matter how good the rest of the score is:
- It trains on your restricted or regulated data with no opt-out. Stop.
- It can’t tell you where the data is processed. Stop, for anything sensitive.
- You can’t get your data out or have it deleted. Stop. That’s a hostage situation, not a vendor.
Mark these on the sheet. A 4.5 average means nothing if criterion 1 is a hard no for your data.
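One way to picture the pass/fail gate is to treat each dealbreaker as a yes/no flag that is checked separately from the weighted total. The sketch below is illustrative only: the `Vendor` class, its field names and the example numbers are assumptions, not the layout of the downloaded sheet.

```python
from dataclasses import dataclass


@dataclass
class Vendor:
    name: str
    weighted_score: float             # 1-5 result from the scorecard
    trains_on_restricted_data: bool   # trains on it with no opt-out
    processing_location_known: bool   # vendor can say where data is processed
    can_export_and_delete: bool       # data can be exported and deleted


def dealbreakers(v):
    """Return the hard failures for a vendor (an empty list means it passes)."""
    failures = []
    if v.trains_on_restricted_data:
        failures.append("trains on restricted data with no opt-out")
    if not v.processing_location_known:
        failures.append("cannot say where data is processed")
    if not v.can_export_and_delete:
        failures.append("no export or deletion")
    return failures


def best_option(vendors):
    """Highest weighted score among vendors with no dealbreakers, or None."""
    passing = [v for v in vendors if not dealbreakers(v)]
    return max(passing, key=lambda v: v.weighted_score, default=None)


# Hypothetical shortlist: the top scorer fails a hard requirement.
shortlist = [
    Vendor("Vendor A", 4.5, True, True, True),
    Vendor("Vendor B", 3.9, False, True, True),
]
winner = best_option(shortlist)
print(winner.name if winner else "no vendor clears the dealbreakers")
```

Here the 4.5 never gets compared: Vendor A is filtered out before the totals are ranked, so Vendor B wins on 3.9.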
How to use it
- Score from evidence, not the demo. A 5 on security needs the certificate and the pen-test summary, not a reassuring slide.
- Score per use case. The right answer for drafting marketing copy and for processing health records is rarely the same tool. Run the sheet once per workload.
- Get the answers in writing. If a vendor won’t commit an answer to email or the contract, score it low: vagueness is a finding.
- Compare the totals, then sanity-check the dealbreakers. Highest number wins only among options that clear every hard requirement.
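The "score from evidence" and "get it in writing" rules can be made mechanical if you want them applied consistently across reviewers. A minimal sketch, assuming a cap of 2 for any rating that is not backed in writing: the cap value and the `evidence_adjusted` name are placeholders chosen for illustration, since the guidance above only says to score such answers low.

```python
# Assumed cap: a rating with no written evidence counts as at most 2.
# The guidance only says "score it low"; the exact number is a placeholder.
UNVERIFIED_CAP = 2


def evidence_adjusted(rating, has_written_evidence):
    """Keep the rating if it is backed in writing, otherwise cap it."""
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    return rating if has_written_evidence else min(rating, UNVERIFIED_CAP)


# Hypothetical example: a claimed 5 on security with no certificate on file.
print(evidence_adjusted(5, has_written_evidence=False))  # 2
print(evidence_adjusted(4, has_written_evidence=True))   # 4
print(evidence_adjusted(1, has_written_evidence=False))  # 1
```

Pick whatever cap suits your risk appetite; the point is that a reassuring slide and a signed commitment should not earn the same number.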
A scorecard won’t make the decision for you, and it shouldn’t. What it does is make sure you decide on the things that matter rather than the things that demo well. When you want the awkward questions asked for you, that’s what a discovery call is for.
Want a second opinion on the shortlist?
Score your options, then bring the top two to a discovery call. We’ll tell you what the demos glossed over and which one we’d actually trust with your data.