Decision Intelligence

Understand how AskVerdict tracks outcomes, calculates decision scores, and helps you make better decisions over time.

11 min read

Most AI tools stop at the answer. AskVerdict goes further — it tracks whether the AI's verdict was actually right, measures how well your decisions play out over time, and surfaces patterns in where your judgment is strong or weak.

This is Decision Intelligence: a feedback loop between your AI-powered verdicts and real-world results.


What Is Decision Intelligence?

When you run a debate, AskVerdict does not just generate a verdict. It also:

  1. Attaches a calibrated probability to the recommended outcome — a number between 0 and 1 expressing how confident the model is in its recommendation
  2. Creates a pending outcome record tied to that debate, waiting for you to report what actually happened
  3. Calculates a Brier score across all your resolved outcomes to measure long-run calibration accuracy
  4. Tracks domain performance so you can see where you rely on AI well and where you may be overconfident

Over time, your Decision Intelligence score becomes a measure of how accurately you use AI verdicts — not just how often you get answers, but how well-grounded those answers are when tested against reality.

Decision Intelligence is most valuable when you commit to recording outcomes consistently. Even a few months of honest outcome tracking creates a meaningful signal about where your AI-assisted decisions are reliable.


How Scoring Works

The Brier Score

AskVerdict uses the Brier score as its core accuracy metric. It is a standard measure of probabilistic prediction accuracy used in fields like meteorology, epidemiology, and finance.

The Brier score compares each predicted probability against the actual binary outcome (correct = 1, wrong = 0) and computes the mean squared error across all predictions. The formula is:

plaintext
Brier Score = (1/N) × Σ (predicted_probability - actual_outcome)²

Key properties to understand:

  • Range: 0.0 to 1.0
  • Lower is better — a score of 0.0 is perfect, 0.5 is no better than random guessing, 1.0 means every prediction was maximally wrong
  • New users start at 0.5 until enough resolved outcomes exist to compute a real score

The accuracyPercentage shown in the UI is derived from the Brier score as (1 − brierScore) × 100, so higher accuracy means a lower Brier score. A brierScore of 0.18, for example, corresponds to 82% accuracy.
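
The formula and the accuracy derivation above can be sketched as follows. The function names (brierScore, toAccuracyPercentage) are illustrative, not part of AskVerdict's SDK:

```typescript
// One resolved outcome: the verdict's predicted probability and what
// actually happened (1 = correct, 0.5 = partially correct, 0 = wrong).
type Resolved = { predicted: number; actual: number };

function brierScore(outcomes: Resolved[]): number {
  if (outcomes.length === 0) return 0.5; // cold-start default per the docs
  const sumSquaredError = outcomes.reduce(
    (sum, o) => sum + (o.predicted - o.actual) ** 2,
    0,
  );
  return sumSquaredError / outcomes.length; // mean squared error
}

// Accuracy as shown in the UI: higher accuracy = lower Brier score.
function toAccuracyPercentage(brier: number): number {
  return Math.round((1 - brier) * 100);
}

const score = brierScore([
  { predicted: 0.8, actual: 1 },   // confident and correct
  { predicted: 0.7, actual: 0 },   // confident but wrong
  { predicted: 0.6, actual: 0.5 }, // partially correct
]);
// score ≈ 0.18, i.e. 82% accuracy
```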

What Counts as a Resolved Outcome

Only three outcome statuses contribute to your Brier score calculation:

| Status | Counts Toward Score | Brier Input |
| --- | --- | --- |
| correct | Yes | actual = 1.0 |
| partially_correct | Yes | actual = 0.5 |
| wrong | Yes | actual = 0.0 |
| too_early | No — not yet | Excluded until you update it |
| skipped | No | Excluded permanently |
| pending | No | Waiting for your input |

If you always mark outcomes as correct, your score will appear artificially high. The score only improves your decision-making if it reflects what actually happened.
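
The table's mapping can be sketched as a small helper. The status names come from the API; the function itself is a hypothetical illustration:

```typescript
type OutcomeStatus =
  | "correct"
  | "partially_correct"
  | "wrong"
  | "too_early"
  | "skipped"
  | "pending";

// Returns the Brier "actual" input, or null when the outcome is excluded.
function brierInput(status: OutcomeStatus): number | null {
  switch (status) {
    case "correct":
      return 1.0;
    case "partially_correct":
      return 0.5;
    case "wrong":
      return 0.0;
    default:
      return null; // too_early, skipped, and pending do not count
  }
}
```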

When Scores Are Calculated

Your decision score is recalculated automatically every time you submit an outcome. The recalculation runs asynchronously — it does not block the API response — and the updated score is available at GET /api/scores/me within a few seconds.


Outcome Tracking Workflow

Step 1: Run a Debate

Every completed debate automatically creates a pending outcome record. You can view all pending outcomes at GET /api/outcomes/pending or in the app under Decision History.

Step 2: Wait for Real-World Results

Good decisions take time to evaluate. You can:

  • Set a reminder — use PATCH /api/debates/:id/outcome/reminder to schedule a future date when you want to be nudged to record the result
  • Check back later — pending outcomes stay in your queue indefinitely

bash
# Set a reminder three months out
curl -X PATCH https://api.askverdict.ai/api/debates/dbt_abc123/outcome/reminder \
  -H "Authorization: Bearer vrd_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{ "reminderAt": "2026-05-20T09:00:00Z" }'

Step 3: Submit the Outcome

Once you know what happened, submit it:

bash
curl -X POST https://api.askverdict.ai/api/debates/dbt_abc123/outcome \
  -H "Authorization: Bearer vrd_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "outcomeStatus": "correct",
    "outcomeNotes": "We migrated in Q2. Deployment time cut from 45 min to 8 min.",
    "satisfactionScore": 4
  }'

The outcomeNotes field is optional but useful — it creates a personal record of what actually happened that you can refer back to later.

Step 4: Check Your Updated Score

bash
curl https://api.askverdict.ai/api/scores/me \
  -H "Authorization: Bearer vrd_your_api_key"

Your brierScore, accuracyPercentage, and per-domain scores are recalculated after each outcome submission; since the recalculation runs asynchronously, the updated values appear within a few seconds.

Step 5: Skip When Appropriate

If a debate was purely exploratory — a thought experiment with no real decision attached — mark it as skipped so it does not pollute your score:

bash
curl -X POST https://api.askverdict.ai/api/debates/dbt_abc123/outcome/skip \
  -H "Authorization: Bearer vrd_your_api_key"

Domain Categories

AskVerdict automatically classifies each debate into a domain category based on the keywords in the question. The classification is used to calculate per-domain Brier scores so you can see where your AI-assisted decisions are strongest.

Available Domains

| Domain | Example Question Themes |
| --- | --- |
| business | Revenue, pricing, market strategy, hiring, sales, customer growth |
| technology | Databases, APIs, frameworks, architecture, deployments, cloud |
| health | Treatments, symptoms, medical decisions, therapy choices |
| legal | Contracts, compliance, regulation, liability, patents |
| personal | Career, relationships, education, major life decisions |
| finance | Investments, budgets, loans, mortgages, retirement planning |
| education | Courses, degrees, certifications, training programs |
| other | Everything that does not match the above categories |

How Classification Works

Classification uses keyword matching against the debate question. For example, a question containing "deploy", "API", or "database" is classified as technology. A question about "revenue" or "hire" maps to business.

When your question spans multiple domains, the system chooses the strongest signal. Classification is local and does not require an additional API call.
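
This kind of keyword matching can be sketched as follows. The keyword lists and the tie-breaking rule here are made up for illustration; the real lists are internal to AskVerdict:

```typescript
// Hypothetical keyword lists — a small subset for illustration only.
const DOMAIN_KEYWORDS: Record<string, string[]> = {
  technology: ["deploy", "api", "database", "framework", "cloud"],
  business: ["revenue", "pricing", "hire", "sales", "market"],
  finance: ["investment", "budget", "loan", "mortgage", "retirement"],
};

function classify(question: string): string {
  const q = question.toLowerCase();
  let best = "other";
  let bestHits = 0;
  for (const [domain, keywords] of Object.entries(DOMAIN_KEYWORDS)) {
    // "Strongest signal": the domain with the most keyword hits wins.
    const hits = keywords.filter((k) => q.includes(k)).length;
    if (hits > bestHits) {
      best = domain;
      bestHits = hits;
    }
  }
  return best;
}
```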

Per-domain scores appear in GET /api/scores/me/domains once you have resolved outcomes in each domain. Domains with no resolved outcomes are not shown.


Calibration

Calibration answers the question: when the AI says it is 80% confident, does it actually get things right 80% of the time?

Reading the Calibration Curve

The calibration endpoint returns your outcomes grouped into probability buckets. Each bucket represents a range of predicted confidences (e.g., 70–80%) and shows what fraction of decisions in that range were actually correct.

| Bucket | Predicted Range | Your Actual Rate | Interpretation |
| --- | --- | --- | --- |
| Bucket A | 50–60% | 55% | Well calibrated — your "uncertain" verdicts are appropriately uncertain |
| Bucket B | 70–80% | 52% | Overconfident — the AI is claiming 75% confidence but only hitting ~52% |
| Bucket C | 80–90% | 84% | Well calibrated — high-confidence predictions are reliably accurate |

bash
curl https://api.askverdict.ai/api/scores/me/calibration \
  -H "Authorization: Bearer vrd_your_api_key"

json
{
  "buckets": [
    { "predictedRange": [0.5, 0.6], "actualRate": 0.55, "count": 8 },
    { "predictedRange": [0.7, 0.8], "actualRate": 0.52, "count": 12 },
    { "predictedRange": [0.8, 0.9], "actualRate": 0.84, "count": 6 }
  ]
}
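
Bucket rates like those in the response above could be derived from resolved outcomes as sketched below. The field names mirror the API response; the helper itself is an illustration, not AskVerdict's implementation:

```typescript
type Resolved = { predicted: number; actual: number };
type Bucket = {
  predictedRange: [number, number];
  actualRate: number;
  count: number;
};

function calibrationBuckets(outcomes: Resolved[], width = 0.1): Bucket[] {
  const byBucket = new Map<number, Resolved[]>();
  for (const o of outcomes) {
    // Integer bucket key (e.g. 7 for the 0.7–0.8 range) avoids float keys.
    const key = Math.floor(o.predicted / width);
    const group = byBucket.get(key);
    if (group) group.push(o);
    else byBucket.set(key, [o]);
  }
  return [...byBucket.entries()]
    .sort(([a], [b]) => a - b)
    .map(([key, group]) => ({
      predictedRange: [key * width, (key + 1) * width] as [number, number],
      // Fraction of decisions in this confidence range that were correct.
      actualRate: group.reduce((s, o) => s + o.actual, 0) / group.length,
      count: group.length,
    }));
}
```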

Calibration Quality Levels

When a verdict is generated, AskVerdict also reports the calibration quality of the prediction itself — not just your historical accuracy:

| Quality | Meaning |
| --- | --- |
| cold_start | You have fewer than 5 resolved outcomes — not enough history to calibrate |
| limited | Some history exists but confidence intervals are wide |
| well_calibrated | Enough history to produce reliable probability estimates |

Calibration data requires at least a handful of resolved outcomes before the buckets have statistical meaning. The more outcomes you resolve, the more reliable the calibration curve becomes.

What Good Calibration Looks Like

A perfectly calibrated user would see actualRate ≈ predictedRange[midpoint] across all buckets — predictions at 80% confidence land correct 80% of the time. In practice:

  • Slight underconfidence (actual rate higher than predicted) is common and harmless
  • Consistent overconfidence (actual rate significantly lower than predicted) means you should apply more scrutiny before acting on high-confidence verdicts
  • Buckets with very few samples (count < 5) are not statistically reliable — treat them as directional only

Streaks

Streaks measure how consistently you are engaging with structured decision-making. Your streak increments once per calendar day (UTC) when you create at least one debate.

How Streaks Work

The streak calculation is based on distinct calendar days, not debate count. Creating three debates in a single day still counts as one streak day.

The streak is considered active if:

  • You created at least one debate today, or
  • You created at least one debate yesterday (the streak extends into today if you have not debated yet)

If two or more days pass without a debate, the streak resets to zero.
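
The activity rule above can be sketched in terms of UTC calendar days. This helper is hypothetical; AskVerdict computes streaks server-side:

```typescript
// Days since the Unix epoch, in UTC (86,400,000 ms per day).
function utcDay(date: Date): number {
  return Math.floor(date.getTime() / 86_400_000);
}

// Active if the most recent debate was today or yesterday (UTC);
// a gap of two or more days means the streak has reset.
function isStreakActive(lastDebateAt: Date, now: Date): boolean {
  return utcDay(now) - utcDay(lastDebateAt) <= 1;
}
```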

Streak Milestones

Milestone thresholds are based on your total debate count, not streak length. They are:

| Milestone | Debates Required |
| --- | --- |
| First hundred | 10 debates |
| Getting started | 25 debates |
| Regular practitioner | 50 debates |
| Power user | 100 debates |
| Expert practitioner | 250 debates |
| Master | 500 debates |

Each milestone records the exact timestamp of the debate that crossed the threshold.

bash
curl https://api.askverdict.ai/api/streaks \
  -H "Authorization: Bearer vrd_your_api_key"

json
{
  "currentStreak": 7,
  "longestStreak": 23,
  "totalDebates": 84,
  "milestones": [
    { "count": 10, "reached": true, "reachedAt": "2025-11-02T09:14:33Z" },
    { "count": 25, "reached": true, "reachedAt": "2025-11-28T16:05:01Z" },
    { "count": 50, "reached": true, "reachedAt": "2026-01-10T11:44:22Z" },
    { "count": 100, "reached": false }
  ]
}

Streaks count all non-deleted debates, including those you did not track an outcome for. Streaks measure engagement, not decision quality — that is what the Brier score measures.


Decision Chains

A decision chain is a named group of related debates linked together to capture how a complex decision evolves over time. Chains are especially useful when:

  • A first verdict generates immediate follow-up questions
  • New information causes you to revisit a prior decision
  • A broad question gets broken into narrower sub-decisions

Chain Structure

Each debate in a chain is a node with a parent link and a relationship type:

| Relationship | When to Use |
| --- | --- |
| follow_up | The natural next question after acting on the parent verdict |
| reversal | New information led you to reconsider or reverse the parent decision |
| refinement | Narrowing the scope or adding constraints to the parent question |

A chain always has a root debate — the first debate in the chain with no parent. All subsequent debates link back through the tree to this root.

Creating and Managing Chains

Start a Chain

bash
curl -X POST https://api.askverdict.ai/api/chains \
  -H "Authorization: Bearer vrd_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Q1 Infrastructure Decisions",
    "rootDebateId": "dbt_abc123"
  }'

Add a Follow-up Debate

After acting on the root verdict and running a new debate, link it to the chain:

bash
curl -X POST https://api.askverdict.ai/api/chains/chn_def456/debates \
  -H "Authorization: Bearer vrd_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "debateId": "dbt_ghi789",
    "parentDebateId": "dbt_abc123",
    "relationship": "follow_up"
  }'

Use Chain Context in New Debates

Before running the next debate in a sequence, fetch the chain context and inject it as the context field:

typescript
// 1. Fetch chain context
const { context } = await fetch(`/api/chains/${chainId}/context`, {
  headers: { Authorization: "Bearer vrd_your_api_key" },
}).then(r => r.json());
 
// 2. Use the summary as context in the new debate
const { debate } = await fetch("/v1/verdicts", {
  method: "POST",
  headers: { Authorization: "Bearer vrd_your_api_key", "Content-Type": "application/json" },
  body: JSON.stringify({
    question: "Should we move the payment service to a dedicated cluster?",
    context: context.summary,
    mode: "balanced",
  }),
}).then(r => r.json());

The chain context is a compact summary (under 500 tokens) of the prior decisions and their outcomes, designed to be injected without blowing through context limits.

Chains are a lightweight organizational tool. Deleting a chain does not delete the debates inside it — it only removes the grouping structure.


Finding Similar Past Decisions

Before running a new debate, AskVerdict can search your history for semantically similar decisions using vector embeddings. This helps you avoid re-analyzing decisions you have already made — or use a prior verdict as a starting point.

bash
curl "https://api.askverdict.ai/api/similar?q=Should+we+outsource+our+DevOps+team&limit=5" \
  -H "Authorization: Bearer vrd_your_api_key"

If a similar debate exists and has a recorded outcome, you can immediately see how that decision played out — without running a new debate at all.

Similarity search requires a vector embedding key to be configured on the server. If none is available, the endpoint returns an empty result set gracefully rather than throwing an error.


Personalized Recommendations

AskVerdict recommends public debates from other users that are likely to be relevant to you, based on the topics you have debated most.

For users with at least 3 debates, recommendations are generated by extracting the most frequent meaningful keywords from your recent debate questions and matching them against public completed debates. The results are ranked by engagement (view count and fork count).

For new users with fewer than 3 debates, recommendations fall back to trending public debates from the last 30 days.

bash
curl "https://api.askverdict.ai/api/recommendations?limit=10" \
  -H "Authorization: Bearer vrd_your_api_key"

Each recommendation includes a recommendationReason string — either "Based on your interests" or "Trending on AskVerdict" — so you know how it was selected.
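
The interest-based path can be sketched roughly as follows. The stopword list, minimum word length, and keyword limit here are assumptions for illustration, not AskVerdict's production values:

```typescript
// Hypothetical stopword list — the real filter is internal to AskVerdict.
const STOPWORDS = new Set(["should", "we", "the", "a", "to", "our", "is"]);

// Extract the most frequent meaningful keywords from recent questions.
function topKeywords(questions: string[], limit = 5): string[] {
  const counts = new Map<string, number>();
  for (const q of questions) {
    for (const word of q.toLowerCase().match(/[a-z]+/g) ?? []) {
      if (word.length < 3 || STOPWORDS.has(word)) continue;
      counts.set(word, (counts.get(word) ?? 0) + 1);
    }
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1]) // most frequent first
    .slice(0, limit)
    .map(([word]) => word);
}
```

These keywords would then be matched against public completed debates and ranked by engagement, as described above.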


Tips for Improving Your Accuracy Score

Record Outcomes Honestly

The Brier score only improves your decision-making if it reflects reality. Marking everything as correct or always skipping difficult outcomes defeats the purpose. Record what actually happened, even when the verdict was wrong.

Use too_early Appropriately

If you genuinely cannot evaluate a decision yet (a multi-year investment, a product launch that is still in market), mark it too_early rather than guessing. You can update it later when the result is clear.

Focus on High-Stakes Decisions

Low-effort exploratory questions are worth skipping for scoring purposes. Reserve tracked outcomes for decisions where you actually acted on the verdict — those are the ones that produce meaningful calibration data.

Read Your Domain Breakdown

If your technology Brier score is 0.12 and your finance Brier score is 0.39, that tells you something important: you are getting good signal from technical AI verdicts but should apply more skepticism to financial recommendations. Check GET /api/scores/me/domains regularly.

Use Chains for Complex Decisions

Multi-step decisions — where each answer raises a new question — benefit most from chains. Linking debates together lets you see the full trajectory of a decision, including where early verdicts led to follow-ups or reversals.

Set Reminders for Long-Horizon Decisions

For decisions that will take months to evaluate, set a reminder immediately after the debate. A reminder at the right moment is the difference between having a resolved outcome and having an outcome that stays pending indefinitely.

bash
# Set a 6-month reminder immediately after the debate
curl -X PATCH https://api.askverdict.ai/api/debates/dbt_abc123/outcome/reminder \
  -H "Authorization: Bearer vrd_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{ "reminderAt": "2026-08-20T09:00:00Z" }'

Check Similar Debates First

Before running a new debate, search for similar past decisions. If the same question — or a very close variant — has already been debated and resolved, you may already have the answer.

bash
curl "https://api.askverdict.ai/api/similar?q=your+question+here" \
  -H "Authorization: Bearer vrd_your_api_key"

Calibration quality improves significantly at around 20 resolved outcomes. Before that threshold, treat your score as directional rather than definitive — the trend matters more than the absolute number at this stage.
