detection benchmarks

Measured monthly.
Published in full.

Kirtonic ships a benchmark harness in the repository and runs it monthly against publicly available prompt-injection datasets. Headline numbers are written to a JSON file that the page below reads at request time, so what you see is what we last measured.

last measured · 1 June 2026 · 75 cases
Prompt-injection detection: 92%
Benign false-positive rate: 0%
Latency p50: 3173 ms
Latency p95: 4695 ms
datasets

Per-dataset results

Benign baseline: normal user prompts
Source: authored by hand from common workplace prompt patterns.
Category: benign · Cases: 25 · Accuracy: 100% · Flagged: 0% · p50: 2994 ms · p95: 4695 ms

Lakera Gandalf style: password extraction attempts
Source: modelled on publicly documented Gandalf bypass categories (Lakera AI).
Category: injection · Cases: 25 · Accuracy: 88% · Flagged: 88% · p50: 3409 ms · p95: 4170 ms

OWASP LLM01: Prompt Injection
Source: OWASP Top 10 for LLM Applications 2025 (LLM01:2025 Prompt Injection).
Category: injection · Cases: 25 · Accuracy: 96% · Flagged: 96% · p50: 3089 ms · p95: 5572 ms
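The headline detection rate can be reconstructed from the per-dataset results by pooling injection cases. This is a sketch under two assumptions that the page does not state: that the headline figure is case-weighted across the two injection datasets, and that the flagged counts (22 and 24) can be recovered by rounding 88% and 96% of 25 cases back to whole cases. Together they reproduce the published 92%.

```python
# (dataset, category, cases, flagged %) taken from the per-dataset results.
ROWS = [
    ("Benign baseline", "benign", 25, 0),
    ("Lakera Gandalf style", "injection", 25, 88),
    ("OWASP LLM01", "injection", 25, 96),
]


def pooled_rate(rows, category):
    """Case-weighted flagged rate across all datasets in one category.

    Assumes the headline pools cases rather than averaging percentages.
    Returns (rate, flagged_cases, total_cases).
    """
    picked = [(cases, pct) for _, cat, cases, pct in rows if cat == category]
    # Displayed percentages may be rounded, so round back to whole cases.
    flagged = sum(round(cases * pct / 100) for cases, pct in picked)
    total = sum(cases for cases, _ in picked)
    return flagged / total, flagged, total


rate, flagged, total = pooled_rate(ROWS, "injection")
print(f"detection: {flagged}/{total} = {rate:.0%}")  # detection: 46/50 = 92%
```

Because both injection datasets have 25 cases, pooling and a plain average of the two percentages give the same 92% here; the two would diverge if a future dataset had a different case count.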
methodology

How we measure.

1. Public test datasets

We run against datasets that any buyer can inspect independently: prompt-injection examples drawn from the OWASP Top 10 for LLM Applications, prompts modelled on publicly documented Lakera Gandalf bypass categories, and a hand-curated benign baseline. The full JSON of each dataset is committed to the Kirtonic repository under data/benchmarks/, so a buyer can read every test case.
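A minimal sketch of loading the committed datasets for inspection. The file layout is an assumption: it treats every *.json file directly under data/benchmarks/ as one dataset holding a JSON array of cases with hypothetical id, prompt and category fields, and it skips latest-results.json, which the harness writes to the same directory. Check a real file before relying on these names.

```python
import json
from collections import Counter
from pathlib import Path

# Hypothetical schema (check a real file first): each dataset is a JSON array
# of {"id": ..., "prompt": ..., "category": "benign" | "injection"} objects.
RESULTS_FILE = "latest-results.json"  # harness output, not a dataset


def parse_datasets(named_texts):
    """Build {dataset_name: [case, ...]} from (filename, json_text) pairs."""
    datasets = {}
    for filename, text in named_texts:
        if filename == RESULTS_FILE:
            continue  # the results file lives alongside the datasets
        datasets[Path(filename).stem] = json.loads(text)
    return datasets


def load_datasets(root="data/benchmarks"):
    """Read every *.json file directly under root and parse it."""
    files = sorted(Path(root).glob("*.json"))
    return parse_datasets((p.name, p.read_text(encoding="utf-8")) for p in files)


def category_counts(datasets):
    """Count cases per category within each dataset."""
    return {name: Counter(case["category"] for case in cases)
            for name, cases in datasets.items()}
```

Run from a clone of the repository, category_counts(load_datasets()) would give a per-dataset tally to compare against the 25 cases per dataset listed in the results.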

2. Direct API call

Each test case is sent to the same /api/v1/extension/verdict endpoint that a production extension calls. The verdict comes from whichever classifier the workspace under test has selected, either the baseline classifier or a customer-trained model. There is no separate test mode.
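The sketch below shows what a single harness call could look like from Python using only the standard library. Only the /api/v1/extension/verdict path comes from this page; the base URL, the Bearer authorization scheme and the {"prompt": ...} request body are assumptions for illustration, not the documented API. The timer wraps both the request and the response read, so it measures a full round trip.

```python
import json
import os
import time
import urllib.request

VERDICT_PATH = "/api/v1/extension/verdict"  # endpoint named in the methodology


def build_verdict_request(base_url, token, prompt):
    """Build a POST to the verdict endpoint.

    Assumptions (not confirmed by the docs): Bearer auth with the extension
    token and a JSON body of the form {"prompt": ...}.
    """
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + VERDICT_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def timed_verdict(base_url, prompt, timeout=30):
    """Send one case; return (parsed JSON response, round-trip milliseconds)."""
    token = os.environ["KIRTONIC_API_TOKEN"]
    req = build_verdict_request(base_url, token, prompt)
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)  # reading the body is part of the round trip
    elapsed_ms = (time.perf_counter() - start) * 1000
    return payload, elapsed_ms
```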

3. Detection rate vs false-positive rate

Detection rate is the fraction of injection cases for which the classifier returns a medium- or high-severity verdict. False-positive rate is the fraction of benign cases the classifier flags. Both are reported as headline figures and per dataset. We consider a benign false-positive rate under 5% acceptable.
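Both rates reduce to the same counting rule. This sketch assumes verdicts carry a severity label such as "low", "medium" or "high" (the label names are hypothetical) and that a benign case counts as flagged under the same medium-or-high threshold; the page states that threshold only for injection cases, although the per-dataset results, where Flagged equals Accuracy for both injection sets, are consistent with it.

```python
# Hypothetical severity labels; the methodology only says a medium-or-high
# verdict on an injection case counts as a detection.
FLAG_SEVERITIES = {"medium", "high"}
FP_ACCEPTABLE = 0.05  # a benign false-positive rate under 5% is acceptable


def flag_rate(severities):
    """Fraction of verdicts returned at medium or high severity."""
    if not severities:
        raise ValueError("need at least one case")
    return sum(s in FLAG_SEVERITIES for s in severities) / len(severities)


def score(injection_severities, benign_severities):
    """Return (detection_rate, false_positive_rate, fp_within_target)."""
    detection = flag_rate(injection_severities)
    # Assumption: a benign case is "flagged" under the same threshold.
    false_positive = flag_rate(benign_severities)
    return detection, false_positive, false_positive < FP_ACCEPTABLE
```

With the published counts, 46 of 50 injection cases at medium or high severity and none of the 25 benign cases flagged, score returns a detection rate of 0.92, a false-positive rate of 0.0, and True for the under-5% check.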

4. Latency

Latency is round-trip wall-clock time, measured from the harness to the verdict endpoint and back; we report p50 and p95. The harness runs from the same network the SDK would run from in a production deployment, so latency to a customer-hosted classifier in the same VPC will be lower than the published numbers.
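The page does not say which percentile method the harness uses. This sketch assumes the nearest-rank method, under which p95 is always an actually observed latency; interpolating methods (for example numpy.percentile with its default linear method) can return a value between two samples, so a reproduced run may differ slightly for that reason alone.

```python
import math


def percentile_nearest_rank(samples_ms, p):
    """Nearest-rank percentile: smallest sample with at least p% of values <= it."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """p50 and p95 of round-trip latencies, in the same unit as the input."""
    return {
        "p50": percentile_nearest_rank(samples_ms, 50),
        "p95": percentile_nearest_rank(samples_ms, 95),
    }
```

With 75 cases, nearest-rank p95 is the 72nd smallest round trip (ceil(0.95 × 75) = 72) and p50 is the 38th.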

5. Honest reporting

We publish every number. If the false-positive rate is high one month, that goes on the page. If a classifier regression hurts detection, that goes on the page too. The harness writes a JSON file and the page renders it; there are no separately maintained marketing figures.

Reproducibility. Clone the repository, mint an extension token in your workspace, and run node scripts/run-benchmarks.mjs with KIRTONIC_API_TOKEN set to that token. Your results are written to data/benchmarks/latest-results.json in the same shape as the file this page reads.
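After a reproduction run, a quick comparison against the published headline figures helps separate real differences from noise. This sketch assumes latest-results.json exposes top-level detection_rate and false_positive_rate fields as fractions; those key names are hypothetical, so adjust them to the file's actual shape.

```python
import json
from pathlib import Path

# Headline figures published on this page (last measured 1 June 2026).
PUBLISHED = {"detection_rate": 0.92, "false_positive_rate": 0.0}


def parse_headline(text):
    """Pull headline rates out of results JSON.

    Hypothetical keys: the real latest-results.json may name these differently.
    """
    data = json.loads(text)
    return {key: float(data[key]) for key in PUBLISHED}


def diff_points(reproduced, published=PUBLISHED):
    """Difference from each published figure, in percentage points."""
    return {key: round((reproduced[key] - published[key]) * 100, 1)
            for key in published}


def check_results(path="data/benchmarks/latest-results.json"):
    """Read a local results file and compare it with the published headline."""
    return diff_points(parse_headline(Path(path).read_text(encoding="utf-8")))
```

When reading the differences, keep the sample sizes in mind: one case moves a 25-case dataset by 4 percentage points and the pooled 50-case detection rate by 2 points, so a 2-point shift may be a single prompt.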

What these numbers do, and do not, prove.

They do prove: that the Kirtonic verdict endpoint is measurable, that the headline numbers are reproducible, and that detection performance on a documented public adversarial corpus is at the published level.

They do not prove: that performance on your specific traffic will match these numbers. Detection on a public benchmark is necessary but not sufficient evidence; before any go-live decision, evaluate against a representative sample of your own traffic. On request, we will run that evaluation against your real traffic as a paid engagement.