Decision Intelligence
Understand how AskVerdict tracks outcomes, calculates decision scores, and helps you make better decisions over time.
Most AI tools stop at the answer. AskVerdict goes further — it tracks whether the AI's verdict was actually right, measures how well your decisions play out over time, and surfaces patterns in where your judgment is strong or weak.
This is Decision Intelligence: a feedback loop between your AI-powered verdicts and real-world results.
What Is Decision Intelligence?
When you run a debate, AskVerdict does not just generate a verdict. It also:
- Attaches a calibrated probability to the recommended outcome — a number between 0 and 1 expressing how confident the model is in its recommendation
- Creates a pending outcome record tied to that debate, waiting for you to report what actually happened
- Calculates a Brier score across all your resolved outcomes to measure long-run calibration accuracy
- Tracks domain performance so you can see where you rely on AI well and where you may be overconfident
Over time, your Decision Intelligence score becomes a measure of how accurately you use AI verdicts — not just how often you get answers, but how well-grounded those answers are when tested against reality.
Decision Intelligence is most valuable when you commit to recording outcomes consistently. Even a few months of honest outcome tracking creates a meaningful signal about where your AI-assisted decisions are reliable.
How Scoring Works
The Brier Score
AskVerdict uses the Brier score as its core accuracy metric. It is a standard measure of probabilistic prediction accuracy used in fields like meteorology, epidemiology, and finance.
The Brier score compares each predicted probability against the actual binary outcome (correct = 1, wrong = 0) and computes the mean squared error across all predictions. The formula is:
```
Brier Score = (1/N) × Σ (predicted_probability − actual_outcome)²
```

Key properties to understand:
- Range: 0.0 to 1.0
- Lower is better — a score of 0.0 is perfect, 1.0 means every prediction was maximally wrong, and a strategy of always predicting 50% confidence scores 0.25
- New users start at 0.5 until enough resolved outcomes exist to compute a real score
The accuracyPercentage shown in the UI is derived from the Brier score: higher accuracy means a lower Brier score. A brierScore of 0.18, for example, corresponds to approximately 82% accuracy.
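The formula and the accuracy mapping can be sketched in a few lines of Python (illustrative only, not the production implementation; the 1 − Brier mapping matches the 0.18 → ~82% example above):

```python
def brier_score(predictions: list[tuple[float, float]]) -> float:
    """Mean squared error between predicted probabilities and actual outcomes.

    Each item is (predicted_probability, actual_outcome), where actual is
    1.0 for correct, 0.5 for partially correct, and 0.0 for wrong.
    """
    if not predictions:
        return 0.5  # cold-start default for new users
    return sum((p - a) ** 2 for p, a in predictions) / len(predictions)


def accuracy_percentage(brier: float) -> int:
    """Derived display metric: lower Brier score maps to higher accuracy."""
    return round((1.0 - brier) * 100)


# Three resolved outcomes: two correct, one wrong
score = brier_score([(0.8, 1.0), (0.7, 1.0), (0.6, 0.0)])
```

A perfect predictor drives the sum toward zero; confident wrong answers (large `p` with `a = 0`) are penalized most heavily, which is what makes the metric hard to game with overconfidence.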
What Counts as a Resolved Outcome
Only three outcome statuses contribute to your Brier score calculation:
| Status | Counts Toward Score | Brier Input |
|---|---|---|
| `correct` | Yes | actual = 1.0 |
| `partially_correct` | Yes | actual = 0.5 |
| `wrong` | Yes | actual = 0.0 |
| `too_early` | Not yet | Excluded until you update it |
| `skipped` | No | Excluded permanently |
| `pending` | No | Waiting for your input |
If you always mark outcomes as correct, your score will appear artificially high. The score only improves your decision-making if it reflects what actually happened.
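As the table shows, scoring first filters to resolved statuses and maps them to numeric Brier inputs. A hypothetical Python sketch (field names are illustrative):

```python
# Maps each resolved status to its Brier input; all other statuses are excluded.
BRIER_INPUT = {"correct": 1.0, "partially_correct": 0.5, "wrong": 0.0}


def resolved_inputs(outcomes: list[dict]) -> list[float]:
    """Keep only statuses that count toward the score, per the table above."""
    return [BRIER_INPUT[o["status"]] for o in outcomes
            if o["status"] in BRIER_INPUT]


outcomes = [
    {"status": "correct"},
    {"status": "too_early"},          # excluded until updated
    {"status": "partially_correct"},
    {"status": "pending"},            # excluded: waiting for input
]
```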
When Scores Are Calculated
Your decision score is recalculated automatically every time you submit an outcome. The recalculation runs asynchronously — it does not block the API response — and the updated score is available at GET /api/scores/me within a few seconds.
Outcome Tracking Workflow
Step 1: Run a Debate
Every completed debate automatically creates a pending outcome record. You can view all pending outcomes at GET /api/outcomes/pending or in the app under Decision History.
Step 2: Wait for Real-World Results
Good decisions take time to evaluate. You can:
- Set a reminder: use `PATCH /api/debates/:id/outcome/reminder` to schedule a future date when you want to be nudged to record the result
- Check back later: pending outcomes stay in your queue indefinitely
```
# Set a reminder three months out
curl -X PATCH https://api.askverdict.ai/api/debates/dbt_abc123/outcome/reminder \
  -H "Authorization: Bearer vrd_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{ "reminderAt": "2026-05-20T09:00:00Z" }'
```

Step 3: Submit the Outcome
Once you know what happened, submit it:
```
curl -X POST https://api.askverdict.ai/api/debates/dbt_abc123/outcome \
  -H "Authorization: Bearer vrd_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "outcomeStatus": "correct",
    "outcomeNotes": "We migrated in Q2. Deployment time cut from 45 min to 8 min.",
    "satisfactionScore": 4
  }'
```

The `outcomeNotes` field is optional but useful: it creates a personal record of what actually happened that you can refer back to later.
Step 4: Check Your Updated Score
```
curl https://api.askverdict.ai/api/scores/me \
  -H "Authorization: Bearer vrd_your_api_key"
```

Your `brierScore`, `accuracyPercentage`, and per-domain scores update shortly after outcome submission.
Step 5: Skip When Appropriate
If a debate was purely exploratory — a thought experiment with no real decision attached — mark it as skipped so it does not pollute your score:
```
curl -X POST https://api.askverdict.ai/api/debates/dbt_abc123/outcome/skip \
  -H "Authorization: Bearer vrd_your_api_key"
```

Domain Categories
AskVerdict automatically classifies each debate into a domain category based on the keywords in the question. The classification is used to calculate per-domain Brier scores so you can see where your AI-assisted decisions are strongest.
Available Domains
| Domain | Example Question Themes |
|---|---|
| `business` | Revenue, pricing, market strategy, hiring, sales, customer growth |
| `technology` | Databases, APIs, frameworks, architecture, deployments, cloud |
| `health` | Treatments, symptoms, medical decisions, therapy choices |
| `legal` | Contracts, compliance, regulation, liability, patents |
| `personal` | Career, relationships, education, major life decisions |
| `finance` | Investments, budgets, loans, mortgages, retirement planning |
| `education` | Courses, degrees, certifications, training programs |
| `other` | Everything that does not match the above categories |
How Classification Works
Classification uses keyword matching against the debate question. For example, a question containing "deploy", "API", or "database" is classified as technology. A question about "revenue" or "hire" maps to business.
When your question spans multiple domains, the system chooses the strongest signal. Classification is local and does not require an additional API call.
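A minimal sketch of this keyword approach (the keyword sets here are illustrative, not the production vocabulary):

```python
# Illustrative keyword lists; the real classifier's vocabulary is larger.
DOMAIN_KEYWORDS = {
    "technology": {"deploy", "api", "database", "framework", "cloud"},
    "business": {"revenue", "pricing", "hire", "sales", "market"},
    "finance": {"investment", "budget", "loan", "mortgage", "retirement"},
}


def classify(question: str) -> str:
    """Score each domain by keyword overlap and pick the strongest signal."""
    words = set(question.lower().replace("?", "").split())
    scores = {d: len(words & kws) for d, kws in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"
```

Because this runs as a local lookup rather than a model call, classification adds no latency to the debate itself.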
Per-domain scores appear in GET /api/scores/me/domains once you have resolved outcomes in each domain. Domains with no resolved outcomes are not shown.
Calibration
Calibration answers the question: when the AI says it is 80% confident, does it actually get things right 80% of the time?
Reading the Calibration Curve
The calibration endpoint returns your outcomes grouped into probability buckets. Each bucket represents a range of predicted confidences (e.g., 70–80%) and shows what fraction of decisions in that range were actually correct.
| Bucket | Predicted Range | Your Actual Rate | Interpretation |
|---|---|---|---|
| Bucket A | 50–60% | 55% | Well calibrated — your "uncertain" verdicts are appropriately uncertain |
| Bucket B | 70–80% | 52% | Overconfident — the AI is claiming 75% confidence but only hitting ~52% |
| Bucket C | 80–90% | 84% | Well calibrated — high-confidence predictions are reliably accurate |
```
curl https://api.askverdict.ai/api/scores/me/calibration \
  -H "Authorization: Bearer vrd_your_api_key"
```

```
{
  "buckets": [
    { "predictedRange": [0.5, 0.6], "actualRate": 0.55, "count": 8 },
    { "predictedRange": [0.7, 0.8], "actualRate": 0.52, "count": 12 },
    { "predictedRange": [0.8, 0.9], "actualRate": 0.84, "count": 6 }
  ]
}
```

Calibration Quality Levels
When a verdict is generated, AskVerdict also reports the calibration quality of the prediction itself — not just your historical accuracy:
| Quality | Meaning |
|---|---|
| `cold_start` | You have fewer than 5 resolved outcomes, not enough history to calibrate |
| `limited` | Some history exists but confidence intervals are wide |
| `well_calibrated` | Enough history to produce reliable probability estimates |
Calibration data requires at least a handful of resolved outcomes before the buckets have statistical meaning. The more outcomes you resolve, the more reliable the calibration curve becomes.
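These levels can be modeled as simple count thresholds. In this sketch only the 5-outcome cold-start floor comes from the table above; the 20-outcome cutoff for `well_calibrated` is an assumption (chosen to match the rough guidance that calibration improves significantly around 20 resolved outcomes):

```python
def calibration_quality(resolved_count: int, well_calibrated_at: int = 20) -> str:
    """Map a resolved-outcome count to a calibration quality level.

    The 5-outcome floor is documented; well_calibrated_at is a
    hypothetical default, not a confirmed product constant.
    """
    if resolved_count < 5:
        return "cold_start"
    if resolved_count < well_calibrated_at:
        return "limited"
    return "well_calibrated"
```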
What Good Calibration Looks Like
A perfectly calibrated user would see actualRate ≈ predictedRange[midpoint] across all buckets — predictions at 80% confidence land correct 80% of the time. In practice:
- Slight underconfidence (actual rate higher than predicted) is common and harmless
- Consistent overconfidence (actual rate significantly lower than predicted) means you should apply more scrutiny before acting on high-confidence verdicts
- Buckets with very few samples (`count < 5`) are not statistically reliable — treat them as directional only
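The bucket computation itself is straightforward: partition predictions by their confidence range and compare against actual results. A Python sketch that mirrors the response shape shown earlier (simplified; not the server implementation):

```python
def calibration_buckets(predictions: list[tuple[float, float]],
                        edges: list[float]) -> list[dict]:
    """Group (predicted, actual) pairs into probability buckets and report
    the observed correct rate in each non-empty bucket."""
    buckets = []
    for lo, hi in zip(edges, edges[1:]):
        actuals = [a for p, a in predictions if lo <= p < hi]
        if actuals:
            buckets.append({
                "predictedRange": [lo, hi],
                "actualRate": round(sum(actuals) / len(actuals), 2),
                "count": len(actuals),
            })
    return buckets


# Five resolved predictions: (predicted_probability, actual_outcome)
data = [(0.55, 1.0), (0.58, 0.0), (0.75, 1.0), (0.72, 0.0), (0.78, 0.0)]
buckets = calibration_buckets(data, [0.5, 0.6, 0.7, 0.8, 0.9])
```

Here the 0.7–0.8 bucket lands at roughly a third correct despite ~75% predicted confidence, which is exactly the overconfidence pattern described in Bucket B above.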
Streaks
Streaks measure how consistently you are engaging with structured decision-making. Your streak increments once per calendar day (UTC) when you create at least one debate.
How Streaks Work
The streak calculation is based on distinct calendar days, not debate count. Creating three debates in a single day still counts as one streak day.
The streak is considered active if:
- You created at least one debate today, or
- You created at least one debate yesterday (the streak extends into today if you have not debated yet)
If two or more days pass without a debate, the streak resets to zero.
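The active/reset rule can be sketched with UTC calendar dates (a simplified model, not the server implementation):

```python
from datetime import date, timedelta


def streak_is_active(last_debate_day: date, today: date) -> bool:
    """Active if the most recent debate day was today or yesterday (UTC)."""
    return (today - last_debate_day) <= timedelta(days=1)


def current_streak(debate_days: set[date], today: date) -> int:
    """Count consecutive calendar days with at least one debate,
    ending today or yesterday; otherwise the streak has reset to zero."""
    if not debate_days:
        return 0
    anchor = today if today in debate_days else today - timedelta(days=1)
    streak = 0
    while anchor in debate_days:
        streak += 1
        anchor -= timedelta(days=1)
    return streak


today = date(2026, 2, 10)
debate_days = {date(2026, 2, 7), date(2026, 2, 8), date(2026, 2, 9)}
```

Note that using a set of distinct days makes the "three debates in one day still counts as one streak day" rule fall out automatically.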
Streak Milestones
Milestone thresholds are based on your total debate count, not streak length. They are:
| Milestone | Debates Required |
|---|---|
| First ten | 10 debates |
| Getting started | 25 debates |
| Regular practitioner | 50 debates |
| Power user | 100 debates |
| Expert practitioner | 250 debates |
| Master | 500 debates |
Each milestone records the exact timestamp of the debate that crossed the threshold.
```
curl https://api.askverdict.ai/api/streaks \
  -H "Authorization: Bearer vrd_your_api_key"
```

```
{
  "currentStreak": 7,
  "longestStreak": 23,
  "totalDebates": 84,
  "milestones": [
    { "count": 10, "reached": true, "reachedAt": "2025-11-02T09:14:33Z" },
    { "count": 25, "reached": true, "reachedAt": "2025-11-28T16:05:01Z" },
    { "count": 50, "reached": true, "reachedAt": "2026-01-10T11:44:22Z" },
    { "count": 100, "reached": false }
  ]
}
```

Streaks count all non-deleted debates, including those you did not track an outcome for. Streaks measure engagement, not decision quality; that is what the Brier score measures.
Decision Chains
A decision chain is a named group of related debates linked together to capture how a complex decision evolves over time. Chains are especially useful when:
- A first verdict generates immediate follow-up questions
- New information causes you to revisit a prior decision
- A broad question gets broken into narrower sub-decisions
Chain Structure
Each debate in a chain is a node with a parent link and a relationship type:
| Relationship | When to Use |
|---|---|
| `follow_up` | The natural next question after acting on the parent verdict |
| `reversal` | New information led you to reconsider or reverse the parent decision |
| `refinement` | Narrowing the scope or adding constraints to the parent question |
A chain always has a root debate — the first debate in the chain with no parent. All subsequent debates link back through the tree to this root.
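The tree structure makes the root easy to recover: it is the unique node with no parent. A sketch, assuming each node records its parent's debate id (field names here are illustrative):

```python
def chain_root(nodes: dict[str, dict]) -> str:
    """Return the id of the root debate: the one node without a parent link."""
    roots = [node_id for node_id, node in nodes.items()
             if node.get("parentId") is None]
    if len(roots) != 1:
        raise ValueError("a well-formed chain has exactly one root")
    return roots[0]


# A three-node chain: root -> follow_up -> refinement
nodes = {
    "dbt_abc123": {"parentId": None},
    "dbt_ghi789": {"parentId": "dbt_abc123", "relationship": "follow_up"},
    "dbt_jkl012": {"parentId": "dbt_ghi789", "relationship": "refinement"},
}
```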
Creating and Managing Chains
Start a Chain
```
curl -X POST https://api.askverdict.ai/api/chains \
  -H "Authorization: Bearer vrd_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Q1 Infrastructure Decisions",
    "rootDebateId": "dbt_abc123"
  }'
```

Add a Follow-up Debate
After acting on the root verdict and running a new debate, link it to the chain:
```
curl -X POST https://api.askverdict.ai/api/chains/chn_def456/debates \
  -H "Authorization: Bearer vrd_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "debateId": "dbt_ghi789",
    "parentDebateId": "dbt_abc123",
    "relationship": "follow_up"
  }'
```

Use Chain Context in New Debates
Before running the next debate in a sequence, fetch the chain context and inject it as the context field:
```javascript
// 1. Fetch the chain context summary
const { context } = await fetch(`/api/chains/${chainId}/context`, {
  headers: { Authorization: "Bearer vrd_your_api_key" },
}).then(r => r.json());

// 2. Use the summary as context in the new debate
const { debate } = await fetch("/v1/verdicts", {
  method: "POST",
  headers: { Authorization: "Bearer vrd_your_api_key", "Content-Type": "application/json" },
  body: JSON.stringify({
    question: "Should we move the payment service to a dedicated cluster?",
    context: context.summary,
    mode: "balanced",
  }),
}).then(r => r.json());
```

The chain context is a compact summary (under 500 tokens) of the prior decisions and their outcomes, designed to be injected without blowing through context limits.
Chains are a lightweight organizational tool. Deleting a chain does not delete the debates inside it — it only removes the grouping structure.
Finding Similar Past Decisions
Before running a new debate, AskVerdict can search your history for semantically similar decisions using vector embeddings. This helps you avoid re-analyzing decisions you have already made — or use a prior verdict as a starting point.
```
curl "https://api.askverdict.ai/api/similar?q=Should+we+outsource+our+DevOps+team&limit=5" \
  -H "Authorization: Bearer vrd_your_api_key"
```

If a similar debate exists and has a recorded outcome, you can immediately see how that decision played out, without running a new debate at all.
Similarity search requires a vector embedding key to be configured on the server. If none is available, the endpoint returns an empty result set gracefully rather than throwing an error.
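Semantic similarity search typically works by embedding the query and ranking stored vectors by cosine similarity. An illustrative sketch (the embedding model and storage AskVerdict uses are not specified here):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(query_vec: list[float], debates: list[dict], limit: int = 5) -> list[dict]:
    """Rank stored debate embeddings by similarity to the query embedding."""
    ranked = sorted(debates,
                    key=lambda d: cosine_similarity(query_vec, d["embedding"]),
                    reverse=True)
    return ranked[:limit]


# Hypothetical 2-dimensional embeddings for readability; real ones are much larger.
debates = [
    {"id": "dbt_a", "embedding": [1.0, 0.0]},
    {"id": "dbt_b", "embedding": [0.0, 1.0]},
    {"id": "dbt_c", "embedding": [0.7, 0.7]},
]
matches = top_matches([1.0, 0.0], debates, limit=2)
```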
Personalized Recommendations
AskVerdict recommends public debates from other users that are likely to be relevant to you, based on the topics you have debated most.
For users with at least 3 debates, recommendations are generated by extracting the most frequent meaningful keywords from your recent debate questions and matching them against public completed debates. The results are ranked by engagement (view count and fork count).
For new users with fewer than 3 debates, recommendations fall back to trending public debates from the last 30 days.
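The interest-extraction step amounts to a frequency count over your recent questions with stop words removed. A simplified sketch (the stop-word list and length filter are illustrative):

```python
from collections import Counter

# Illustrative stop-word list; the production filter is more thorough.
STOP_WORDS = {"should", "we", "the", "a", "an", "to", "our", "is", "of", "in"}


def top_interests(questions: list[str], k: int = 5) -> list[str]:
    """Most frequent meaningful keywords across recent debate questions."""
    counts = Counter()
    for question in questions:
        for word in question.lower().replace("?", "").split():
            if word not in STOP_WORDS and len(word) > 2:
                counts[word] += 1
    return [word for word, _ in counts.most_common(k)]
```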
```
curl "https://api.askverdict.ai/api/recommendations?limit=10" \
  -H "Authorization: Bearer vrd_your_api_key"
```

Each recommendation includes a `recommendationReason` string, either "Based on your interests" or "Trending on AskVerdict", so you know how it was selected.
Tips for Improving Your Accuracy Score
Record Outcomes Honestly
The Brier score only improves your decision-making if it reflects reality. Marking everything as correct or always skipping difficult outcomes defeats the purpose. Record what actually happened, even when the verdict was wrong.
Use too_early Appropriately
If you genuinely cannot evaluate a decision yet (a multi-year investment, a product launch that is still in market), mark it too_early rather than guessing. You can update it later when the result is clear.
Focus on High-Stakes Decisions
Low-effort exploratory questions are worth skipping for scoring purposes. Reserve tracked outcomes for decisions where you actually acted on the verdict — those are the ones that produce meaningful calibration data.
Read Your Domain Breakdown
If your technology Brier score is 0.12 and your finance Brier score is 0.39, that tells you something important: you are getting good signal from technical AI verdicts but should apply more skepticism to financial recommendations. Check GET /api/scores/me/domains regularly.
Use Chains for Complex Decisions
Multi-step decisions — where each answer raises a new question — benefit most from chains. Linking debates together lets you see the full trajectory of a decision, including where early verdicts led to follow-ups or reversals.
Set Reminders for Long-Horizon Decisions
For decisions that will take months to evaluate, set a reminder immediately after the debate. A reminder at the right moment is the difference between having a resolved outcome and having an outcome that stays pending indefinitely.
```
# Set a 6-month reminder immediately after the debate
curl -X PATCH https://api.askverdict.ai/api/debates/dbt_abc123/outcome/reminder \
  -H "Authorization: Bearer vrd_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{ "reminderAt": "2026-08-20T09:00:00Z" }'
```

Check Similar Debates First
Before running a new debate, search for similar past decisions. If the same question — or a very close variant — has already been debated and resolved, you may already have the answer.
```
curl "https://api.askverdict.ai/api/similar?q=your+question+here" \
  -H "Authorization: Bearer vrd_your_api_key"
```

Calibration quality improves significantly at around 20 resolved outcomes. Before that threshold, treat your score as directional rather than definitive; the trend matters more than the absolute number at this stage.