Methodology

Factual Accuracy

Algorithm

The engine extracts numeric, year, currency, and percentage claims from each LLM response using a small set of regular expressions. Each claim is grouped by the single word that immediately precedes it (e.g., in "in 1939" the claim 1939 is grouped under the context word "in"). When two or more LLMs use the same one-word context but emit different values, the engine records a contradiction. The final score is 100 − 10 × contradictions, clamped to the [0, 100] range, so ten or more contradictions yield a score of 0. Confidence is high when at least five claims were extracted in total, medium for one to four, and low when no claims were found.
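The steps above can be sketched in Python as follows. This is a minimal illustration, not the project's actual code: the single regular expression `CLAIM_RE`, the function names `extract_claims` and `score_responses`, and the choice to count at most one contradiction per context word are all assumptions made for the example. The real engine uses its own set of expressions and may count contradictions differently.

```python
import re
from collections import defaultdict

# Hypothetical pattern (an assumption, not the engine's real expressions):
# one preceding word, then a number with an optional currency symbol,
# optional thousands separators, optional decimals, and optional percent sign.
CLAIM_RE = re.compile(r"\b([A-Za-z]+)\s+([$€£]?\d+(?:,\d{3})*(?:\.\d+)?%?)")


def extract_claims(text):
    """Return (context_word, value) pairs found in one response."""
    return [(m.group(1).lower(), m.group(2)) for m in CLAIM_RE.finditer(text)]


def score_responses(responses):
    """Score a mapping of model name -> response text."""
    # context word -> model name -> set of values that model emitted
    values_by_context = defaultdict(lambda: defaultdict(set))
    total_claims = 0
    for model, text in responses.items():
        for context, value in extract_claims(text):
            values_by_context[context][model].add(value)
            total_claims += 1

    # Assumption: a context counts as one contradiction when at least two
    # models used it and more than one distinct value appeared for it.
    contradictions = 0
    for per_model in values_by_context.values():
        distinct_values = set().union(*per_model.values())
        if len(per_model) >= 2 and len(distinct_values) > 1:
            contradictions += 1

    # 100 - 10 * contradictions, clamped to [0, 100].
    score = max(0, min(100, 100 - 10 * contradictions))

    if total_claims >= 5:
        confidence = "high"
    elif total_claims >= 1:
        confidence = "medium"
    else:
        confidence = "low"

    return {"score": score, "contradictions": contradictions, "confidence": confidence}


if __name__ == "__main__":
    example = {
        "model_a": "The war began in 1939.",
        "model_b": "The war began in 1940.",
    }
    print(score_responses(example))
    # {'score': 90, 'contradictions': 1, 'confidence': 'medium'}
```

In this example both responses share the context word "in" but disagree on the value, so one contradiction is recorded and the score drops from 100 to 90. Only two claims were extracted in total, which puts confidence at medium.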

Parameter version: 1.0.0

This methodology is open and auditable. Source code: project-alpha on GitHub.