detection benchmarks

Measured monthly.
Published in full.

Kirtonic ships a benchmark harness in the repository and runs it monthly against publicly available prompt-injection datasets. The harness writes the headline numbers to a JSON file, and this page reads that file at request time, so what you see is what we last measured.

last measured · 1 June 2026 · 75 cases

Prompt-injection detection

92%

Benign false-positive rate

0%

Latency p50

3173 ms

Latency p95

4695 ms

datasets

Per-dataset results

Dataset	Source	Category	Cases	Accuracy	Flagged	p50	p95
Benign baseline: Normal user prompts	Authored by hand from common workplace prompt patterns	benign	25	100%	0%	2994 ms	4695 ms
Lakera Gandalf style: Password extraction attempts	Modelled on publicly documented Gandalf bypass categories (Lakera AI)	injection	25	88%	88%	3409 ms	4170 ms
OWASP LLM01: Prompt Injection	OWASP Top 10 for LLM Applications 2025 (LLM01:2025 Prompt Injection)	injection	25	96%	96%	3089 ms	5572 ms
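The 92% headline detection figure is consistent with pooling the two injection datasets rather than averaging unrelated numbers. The short sketch below reproduces that arithmetic from the table rows; the pooling rule itself is an assumption about how the headline is derived, and the function and field names are illustrative.

```javascript
// Per-dataset injection rows copied from the table above.
const injectionRows = [
  { dataset: "Lakera Gandalf style", cases: 25, flaggedPct: 88 },
  { dataset: "OWASP LLM01", cases: 25, flaggedPct: 96 },
];

// Convert each percentage back to a whole number of flagged cases, then pool.
// Assumption: the headline is flagged cases divided by total injection cases.
function pooledDetectionPct(rows) {
  const flagged = rows.reduce(
    (sum, r) => sum + Math.round((r.flaggedPct / 100) * r.cases),
    0
  );
  const total = rows.reduce((sum, r) => sum + r.cases, 0);
  return (100 * flagged) / total;
}

console.log(pooledDetectionPct(injectionRows)); // 92
```

That is 22 + 24 = 46 flagged cases out of 50.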

methodology

How we measure.

Public test datasets

We run against datasets any buyer can inspect independently: prompt-injection examples from the OWASP Top 10 for LLM Applications, prompts modelled on publicly documented Lakera Gandalf bypass categories, and a hand-curated benign baseline. The full JSON of each dataset is committed to the Kirtonic repository under data/benchmarks/, so a buyer can read every test case.

Direct API call

Each test case is sent to the same /api/v1/extension/verdict endpoint a production extension calls. The verdict comes from whichever classifier is selected on the workspace under test, either the baseline classifier or a customer-trained model. There is no separate test mode.
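As a rough sketch of what a single harness request might look like, the snippet below posts one test case to the verdict endpoint and times the round trip. Only the /api/v1/extension/verdict path comes from this page. The base URL, the bearer-token header, the { prompt } request body, and the JSON response are assumptions made for illustration, so check the harness source for the real contract.

```javascript
// Sketch only: request and response shapes are assumed, not documented here.
const VERDICT_PATH = "/api/v1/extension/verdict"; // path named on this page

// Sends one test case and returns the parsed verdict plus round-trip time.
// `fetchImpl` is injectable so the function can be exercised without a network.
async function requestVerdict(baseUrl, token, prompt, fetchImpl = fetch) {
  const started = performance.now();
  const response = await fetchImpl(new URL(VERDICT_PATH, baseUrl), {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`, // assumed auth scheme
    },
    body: JSON.stringify({ prompt }), // assumed body shape
  });
  if (!response.ok) {
    throw new Error(`verdict request failed with HTTP ${response.status}`);
  }
  const verdict = await response.json();
  return { verdict, latencyMs: performance.now() - started };
}
```

Because the same function would serve benchmark and production traffic alike, nothing in it marks the request as a test, which matches the "no separate test mode" point above.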

Detection rate vs false-positive rate

Detection rate is the fraction of injection cases for which the classifier returns medium or high severity. False-positive rate is the fraction of benign cases the classifier flags. Both are reported as headline figures on this page and for each dataset. We consider a false-positive rate under 5% on the benign baseline acceptable.
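The two definitions above can be sketched in a few lines. The record shape ({ category, severity }), the "low" and "none" severity labels, and the choice to treat medium-or-high as the flag threshold for benign cases too are assumptions for illustration, not the harness's actual schema.

```javascript
// Severities that count as a flag, per the detection-rate definition above.
const FLAGGED = new Set(["medium", "high"]);

// `results` items are assumed to look like { category, severity }.
function summarize(results) {
  const rate = (category) => {
    const cases = results.filter((r) => r.category === category);
    if (cases.length === 0) return null; // no cases, so no rate to report
    const flagged = cases.filter((r) => FLAGGED.has(r.severity)).length;
    return flagged / cases.length;
  };
  return {
    detectionRate: rate("injection"), // flagged share of injection cases
    falsePositiveRate: rate("benign"), // flagged share of benign cases
  };
}

// Made-up sample: 3 of 4 injections flagged, 0 of 2 benign prompts flagged.
const sample = [
  { category: "injection", severity: "high" },
  { category: "injection", severity: "medium" },
  { category: "injection", severity: "low" },
  { category: "injection", severity: "high" },
  { category: "benign", severity: "none" },
  { category: "benign", severity: "low" },
];
console.log(summarize(sample)); // { detectionRate: 0.75, falsePositiveRate: 0 }
```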

Latency

Latency is round-trip wall-clock time, measured from the harness to the verdict endpoint and back. We report p50 and p95. The harness runs from the same network the SDK would run from in a production deployment; latency to a customer-hosted classifier in the same VPC will be lower than the published numbers.
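For readers who want to check p50 and p95 against their own timings, here is a minimal sketch using the nearest-rank percentile method. This page does not say which percentile method the harness uses, so treat the choice of nearest-rank as an assumption; the sample latencies are made up.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samplesMs, p) {
  if (samplesMs.length === 0) throw new Error("no latency samples");
  const sorted = [...samplesMs].sort((a, b) => a - b); // numeric sort on a copy
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Made-up round-trip timings in milliseconds.
const latenciesMs = [2800, 2950, 3100, 3200, 3300, 3600, 4100, 4400, 4700, 5200];
console.log(percentile(latenciesMs, 50)); // 3300
console.log(percentile(latenciesMs, 95)); // 5200
```

With only 25 cases per dataset, p95 under this method is the second-slowest request, so a single slow call can move it noticeably from month to month.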

Honest reporting

We publish every number. If the false-positive rate is high one month, that goes on the page. If a regression in the classifier hurts detection, that goes on the page. The harness writes a JSON file and the page renders it; there is no marketing claim we maintain separately.

Reproducibility. Clone the repository, mint an extension token in your workspace, and run node scripts/run-benchmarks.mjs with KIRTONIC_API_TOKEN set. Your results will land in data/benchmarks/latest-results.json in the same shape as the file this page reads.

What these numbers do, and do not, prove.

They do prove: that the Kirtonic verdict endpoint is measurable, that the headline numbers are reproducible, and that detection performance on a documented public adversarial corpus is at the published level.

They do not prove: that performance on your specific traffic will match these numbers. A result on a public benchmark is necessary evidence but not sufficient evidence; before any go-live decision, run a deployment evaluation against a representative sample of your own traffic. We will run a paid evaluation against a sample of your real traffic on request.
