Cross-check LLMs in one click
Paste an LLM answer. Get a 10-parameter bias breakdown across multiple models in seconds.
Q: Compare how three open-weight LLMs frame the question 'Was the Belt and Road Initiative a success?'
What does “beta” mean?
- Free for now. A paid tier launches when traffic warrants.
- Limited daily analyses to keep LLM costs predictable.
- No SLA: if a provider has an outage, results may be partial.
- Your prompts and outputs are NOT used to train any model.
How it works
Three steps from paste to a 10-parameter bias analysis.
Paste your prompt
Drop a question, an LLM output, or both. We accept up to 24,000 tokens of output.
Pick your comparison
Choose 1–7 LLMs to compare. The free tier covers 7 open-weight models across US, China, and EU origins.
Read the synthesis
Side-by-side answers, a 10-parameter bias heatmap, and a neutral synthesis. Drill into any score for the evidence.
How we score bias — today and tomorrow
The free tier ships a fast heuristic rubric so you can compare LLMs immediately at zero cost. The paid tier, now in the works, swaps the heuristic for a frontier-LLM judge and adds reasoning, evidence, and your choice of judge model.
Heuristic bias rubric
Pattern-based scoring across 10 parameters. Fast, deterministic, and free. Best at catching obvious signals like contradicting numbers, region-coded vocabulary, and tone shifts — less reliable on subtle reasoning or culturally contested framing.
- Regex + vendored lexicons (curated, versioned per parameter)
- Same algorithm runs on every analysis — reproducible scores
- No LLM cost, instant turnaround
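As a rough illustration of how a regex-plus-lexicon rubric can stay deterministic, here is a minimal Python sketch for a single parameter. The lexicon terms, the version label, the `score_parameter` name, and the hits-per-100-words formula are all assumptions made for this example; the actual lexicons and weighting are not described on this page.

```python
import re

# Hypothetical lexicon for one parameter. The terms and version label are
# invented for illustration only.
TONE_LEXICON = {
    "version": "example-1",
    "terms": ["so-called", "triumph", "disastrous", "undeniably"],
}


def compile_lexicon(terms):
    """Compile the terms into one case-insensitive, whole-word pattern."""
    # Longest terms first so longer entries win over shorter overlapping ones.
    ordered = sorted(terms, key=len, reverse=True)
    body = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"(?<!\w)(?:" + body + r")(?!\w)", re.IGNORECASE)


def score_parameter(text, lexicon):
    """Score text by lexicon hits per 100 words, capped at 100."""
    pattern = compile_lexicon(lexicon["terms"])
    hits = [m.group(0).lower() for m in pattern.finditer(text)]
    word_count = len(re.findall(r"\w+", text))
    per_hundred = 100.0 * len(hits) / word_count if word_count else 0.0
    return {
        "lexicon_version": lexicon["version"],
        "hits": hits,
        "score": min(100.0, round(per_hundred, 1)),
    }


sample = "The so-called triumph was undeniably costly."
result = score_parameter(sample, TONE_LEXICON)
print(result["hits"], result["score"])
```

Because the function depends only on the text and a versioned lexicon, the same input yields the same score on every run, which is the reproducibility property listed above.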
LLM-as-Judge deep analysis
A frontier LLM reads every response and scores it against the same 10-parameter rubric, with reasoning, evidence quotes, and disagreements surfaced. You pick the judge: a US-trained model (Gemini Flash) or a China-trained model (DeepSeek V3). The contrast is instructive in itself.
- Choose the judge: US (Gemini Flash) or China (DeepSeek V3)
- Per-parameter reasoning, not just numbers
- Falls back to the heuristic when the daily quota is reached, so you are never blocked
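The judge choice and quota fallback could be sketched as follows. This is an assumption-laden illustration, not the product's code: the `ScoringPlan` type, the `plan_scoring` function, the short judge keys, and the way quota is counted are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Judge options named on this page; the short keys are hypothetical.
JUDGES = {"us": "Gemini Flash", "cn": "DeepSeek V3"}


@dataclass(frozen=True)
class ScoringPlan:
    mode: str             # "llm_judge" or "heuristic"
    judge: Optional[str]  # judge model name, or None for the heuristic


def plan_scoring(judge_key, judge_calls_today, daily_quota):
    """Use the chosen judge while quota remains, otherwise the heuristic."""
    if judge_key not in JUDGES:
        raise ValueError(f"unknown judge: {judge_key!r}")
    if judge_calls_today >= daily_quota:
        # Quota exhausted: degrade to the heuristic instead of refusing.
        return ScoringPlan(mode="heuristic", judge=None)
    return ScoringPlan(mode="llm_judge", judge=JUDGES[judge_key])
```

Returning a plan rather than raising an error on an exhausted quota is what keeps the analysis from being blocked.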
Also coming on the paid tier
Bias-dial rewrites
Pick the response that reads best, then drag dials to neutralize tone, political leaning, and cultural framing. Rewrites are watermarked.
Self-bias check (your own writing)
Paste a draft article, thesis, or memo — see how it reads against multiple LLMs' framing of the same topic.
Copy + upload affordances
One-click copy on every response card. Upload .txt or .md drafts directly to the prompt field.
Frequently asked
Will the free tier stay free?
Usage within the current free-tier limits stays free, even after the paid tier launches. We'll give clear notice well before anything changes.
How do you score bias?
Today we run a fast heuristic rubric across 10 versioned parameters — factual accuracy, completeness, cultural framing, political leaning, tone, source authority, omission, logical structure, terminology, and recency. The heuristic uses regex patterns + curated region-coded lexicons; it is fast, deterministic, and free, but best at obvious signals (contradicting numbers, region-coded vocabulary). The paid tier replaces the heuristic with an LLM-as-Judge — you pick the judge model (Gemini Flash, US-trained; or DeepSeek V3, China-trained) and the judge reads every response and scores it with reasoning + evidence. Click any score in the heatmap to drill into evidence.
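One way to picture the heatmap is as a grid of models by parameters, where each cell carries a score plus the evidence behind it. The sketch below uses the ten parameter names listed above; the cell layout, the function names, and the `model-a` example are assumptions for illustration, and the score scale is left open because it is not stated here.

```python
# The ten rubric parameters named in the answer above.
PARAMETERS = (
    "factual accuracy", "completeness", "cultural framing",
    "political leaning", "tone", "source authority", "omission",
    "logical structure", "terminology", "recency",
)


def build_heatmap(cells_by_model):
    """Return (model, [score per parameter]) rows in rubric order."""
    rows = []
    for model, cells in cells_by_model.items():
        missing = [p for p in PARAMETERS if p not in cells]
        if missing:
            raise ValueError(f"{model} is missing parameters: {missing}")
        rows.append((model, [cells[p]["score"] for p in PARAMETERS]))
    return rows


def drill_down(cells_by_model, model, parameter):
    """Return the evidence list stored behind one heatmap cell."""
    return cells_by_model[model][parameter]["evidence"]


# Hypothetical data: one model, every cell scored 0 with no evidence,
# except "tone", which records one matched phrase.
cells = {"model-a": {p: {"score": 0, "evidence": []} for p in PARAMETERS}}
cells["model-a"]["tone"] = {"score": 3, "evidence": ["so-called"]}

heatmap = build_heatmap(cells)
tone_evidence = drill_down(cells, "model-a", "tone")
```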
Do you store my prompts?
SHA-256 hashes only by default; full text only if you opt in. Records auto-delete after 30 days via DynamoDB TTL.
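A minimal sketch of what a hash-only record with a 30-day expiry might look like. The field names (`prompt_sha256`, `expires_at`, `prompt_text`) and the `build_record` helper are hypothetical; the only details taken from this page are SHA-256, the opt-in for full text, and the 30-day window. DynamoDB TTL expects the expiry attribute to be a Number holding a Unix epoch time in seconds, and expired items are removed in the background rather than at the exact expiry second.

```python
import hashlib
import time

RETENTION_DAYS = 30  # retention window stated on this page


def build_record(prompt, opted_in=False, now=None):
    """Build a storable record: hash always, full text only on opt-in."""
    now = int(time.time()) if now is None else int(now)
    record = {
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "created_at": now,
        # Epoch seconds, the format a DynamoDB TTL attribute requires.
        "expires_at": now + RETENTION_DAYS * 24 * 60 * 60,
    }
    if opted_in:
        record["prompt_text"] = prompt
    return record


default_record = build_record("Was the Belt and Road Initiative a success?", now=0)
opt_in_record = build_record("hello", opted_in=True, now=0)
```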
Is this tool neutral?
Honest answer: no. The synthesis is itself AI-generated. We always show you the raw responses from every LLM alongside the synthesis so you can draw your own conclusions.
Can I use this via API?
Coming with the paid tier. The same /api/v1/analyze endpoint will be reachable with a personal API key.
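Since the API has not launched, the following is only a guess at what a call could look like. The `/api/v1/analyze` path comes from the answer above; the host, the JSON field names, the bearer-token header, and the `build_analyze_request` helper are assumptions and may not match the eventual API. The sketch builds the request without sending it.

```python
import json
import urllib.request

# Placeholder host: the real base URL is not stated on this page.
BASE_URL = "https://example.invalid"


def build_analyze_request(api_key, prompt, models):
    """Build (but do not send) a POST request to the analyze endpoint."""
    # Hypothetical payload shape; the real schema is not published yet.
    payload = {"prompt": prompt, "models": models}
    return urllib.request.Request(
        url=BASE_URL + "/api/v1/analyze",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme: personal API key as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_analyze_request("YOUR_API_KEY", "Example question", ["model-a"])
```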
Will you ever show ads?
No, never.