Article Analyst — Case Study & Trace Explorer

Interactive trace

Trace: run-019e6027

"AI Agents in 2025: Expectations vs. Reality"

IBM Technology Blog

7 phases · 3.2k words · 2 tool calls · 6 grounded claims

Ingest

Fetched 3,247 words via readability-lxml in 1.8s

System State

Fetch method: readability-lxml

HTTP status: 200

Latency: 1.8s

Article length: 3,247 words

Agent Rationale

This phase is mechanical. The agent hasn't thought yet.

Pure Python: fetch URL, strip boilerplate, measure length. No model call. No decision.
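A minimal sketch of what such a mechanical ingest step could look like. This is an illustration, not the project's actual code: the names ingest, html_to_text, and word_count are hypothetical, and the returned dictionary keys are assumptions. The readability-lxml import is deferred into ingest so the text helpers run on the standard library alone.

```python
import re
import time
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Reduce an HTML fragment to whitespace-normalized plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def word_count(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def ingest(url: str) -> dict:
    """Fetch, strip boilerplate, measure. No model call, no decision."""
    # Third-party dependency (pip install readability-lxml), imported lazily.
    from readability import Document

    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=30) as resp:
        status = resp.status
        html = resp.read().decode("utf-8", errors="replace")
    text = html_to_text(Document(html).summary())
    return {
        "http_status": status,
        "latency_s": round(time.perf_counter() - start, 1),
        "word_count": word_count(text),
        "text": text,
    }
```

Everything here is deterministic given the page contents, which is the point of the phase: nothing in it requires judgment.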

Classify

Technology · Analysis · Neutral-informative

System State

Schema validation: passed

ClassificationResult: written

Agent Rationale

Domain

technology

Content type

analysis

Author stance

neutral-informative

Model's reasoning

"The article analyzes the gap between 2025 AI agent expectations and actual deployment patterns. It's neither promotional nor critical — balanced industry analysis."
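The "schema validation: passed" check can be pictured as a typed record that rejects anything outside a fixed vocabulary. The sketch below uses a standard-library dataclass. The field names mirror the labels shown above, but the allowed-value sets (beyond the three values that appear in this trace) and the parse_classification helper are assumptions made for illustration.

```python
from dataclasses import dataclass

# Only "technology", "analysis", and "neutral-informative" appear in the trace;
# the other allowed values are hypothetical placeholders.
ALLOWED = {
    "domain": {"technology", "business", "science", "other"},
    "content_type": {"analysis", "news", "opinion", "tutorial"},
    "author_stance": {"neutral-informative", "promotional", "critical"},
}


@dataclass(frozen=True)
class ClassificationResult:
    domain: str
    content_type: str
    author_stance: str
    reasoning: str

    def __post_init__(self):
        # Reject any label outside the fixed vocabulary for its field.
        for field_name, allowed in ALLOWED.items():
            value = getattr(self, field_name)
            if value not in allowed:
                raise ValueError(f"{field_name}={value!r} is not one of {sorted(allowed)}")
        if not self.reasoning.strip():
            raise ValueError("reasoning must not be empty")


def parse_classification(raw: dict) -> ClassificationResult:
    """Validate a model's structured output before it is written to state."""
    expected = {"domain", "content_type", "author_stance", "reasoning"}
    if set(raw) != expected:
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(raw)}")
    return ClassificationResult(**raw)
```

A model response that drifts from the vocabulary fails loudly at this boundary instead of leaking an unexpected label into later phases.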

Remember

3 prior briefs retrieved — all from technology domain

System State

Query: domain='technology'

LIMIT: 3

Rows returned: 3

Drawer open time: 4ms

Agent Rationale

The model receives 3 prior briefs as context. It doesn't choose them; Python does.

"AI Agents Will Eat Software" · Mar 2026 · agentic software will replace SaaS workflows
"The Multi-Agent Debate" · Apr 2026 · coordinating agents creates emergent behavior
"Untapped AI UX Opportunities" · May 2026 · interface design for AI is the next frontier
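The retrieval described here is an ordinary filtered query. A minimal sketch with the standard sqlite3 module follows. The briefs table layout, the column names, the newest-first ordering, and the one non-technology row are all assumptions; only the domain filter, the limit of 3, and the three brief titles come from the trace.

```python
import sqlite3

# The three technology briefs are taken from the trace; the last row is a
# hypothetical brief from another domain, added to show the filter at work.
SEED = [
    ("AI Agents Will Eat Software", "technology", "agentic software will replace SaaS workflows", "2026-03"),
    ("The Multi-Agent Debate", "technology", "coordinating agents creates emergent behavior", "2026-04"),
    ("Untapped AI UX Opportunities", "technology", "interface design for AI is the next frontier", "2026-05"),
    ("(hypothetical) A Health Brief", "health", "placeholder thesis", "2026-05"),
]


def open_drawer() -> sqlite3.Connection:
    """Create an in-memory store of prior briefs (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE briefs ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, domain TEXT NOT NULL, "
        "thesis TEXT NOT NULL, created_at TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO briefs (title, domain, thesis, created_at) VALUES (?, ?, ?, ?)",
        SEED,
    )
    conn.commit()
    return conn


def retrieve_prior_briefs(conn: sqlite3.Connection, domain: str, limit: int = 3) -> list:
    """Return up to `limit` (title, thesis) rows for one domain, newest first."""
    return conn.execute(
        "SELECT title, thesis FROM briefs WHERE domain = ? "
        "ORDER BY created_at DESC, id DESC LIMIT ?",
        (domain, limit),
    ).fetchall()
```

The model never sees the query. It only sees the rows that come back, which is why the selection is attributed to Python rather than to the agent.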

Analyze

HIGH credibility · NOT_READY · DEEP_READ

System State

Schema validation: passed

BriefResult fields: 10 / 10

structured_output latency: 4.2s

Agent Rationale

Core thesis

"IBM argues that while 2025 saw massive agent experimentation, most deployments remain narrow single-task automations — the multi-agent coordination layer is still missing."

Key insight

"The gap between expectations and reality isn't technical capability — it's coordination architecture."

Credibility

HIGH

Response readiness

NOT_READY — search warranted

Recommendation

DEEP_READ

Decide

Chose search_web · 3 queries generated

System State

Tool menu size: 2

bind_tools latency: 0.3s

Agent Rationale

search_web(query, rationale) · Chosen

Search the web for current information. Use when the article makes empirical claims that need verification.

stop_here(rationale)

Stop analysis here. Use when the article is self-contained or no external search would add meaningful value.

Model's rationale

"IBM makes specific claims about 2025 deployment patterns. Verifying whether other sources corroborate the narrow single-task finding would strengthen the credibility assessment."

Search queries

multi-agent coordination challenges 2025 deployment
AI agents narrow task automation vs general purpose 2025
enterprise AI agent deployment statistics 2025
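The two-item tool menu can be sketched as plain functions whose docstrings are the descriptions the model reads. The docstrings and signatures below are copied from the trace. The tool_menu helper and the function bodies are hypothetical stand-ins, and no particular framework's binding API is assumed.

```python
import inspect


def search_web(query: str, rationale: str) -> str:
    """Search the web for current information. Use when the article makes empirical claims that need verification."""
    # Placeholder body: a real implementation would call a search backend.
    raise NotImplementedError("search backend not wired up in this sketch")


def stop_here(rationale: str) -> str:
    """Stop analysis here. Use when the article is self-contained or no external search would add meaningful value."""
    return rationale


TOOLS = [search_web, stop_here]


def tool_menu(tools) -> list:
    """Build the name/description/parameters listing a model would be shown."""
    return [
        {
            "name": fn.__name__,
            "description": inspect.getdoc(fn),
            "parameters": list(inspect.signature(fn).parameters),
        }
        for fn in tools
    ]
```

Because the description is derived from the docstring, editing the docstring changes what the model is told about the option without changing the option itself.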

Evaluate ★

RETRY → loop → PROCEED — the agent corrects itself

System State

search_revision_count: 0 → 1

MAX_SEARCH_REVISIONS: 1

Loop guard: active

Agent Rationale

Search results arrived. The evaluator reads them against the mission.

Retry · First search: off-target

"Search results drifted toward general AI news and vendor announcements. The mission was to verify IBM's claim about narrow single-task deployment patterns. None of the current results directly address enterprise deployment statistics."

↓ loops back with refined query ↓

Refined query

enterprise AI agent deployment survey 2025 single task multi agent percentage

Proceed · Second search: on-target

"New results include a McKinsey survey on enterprise AI deployment patterns and an Accenture report on agent automation maturity. Both directly address the deployment landscape IBM describes."

Revision 1 of 1 (at limit — would force proceed on next attempt)
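The loop guard can be expressed as a small routing function. The sketch below assumes the evaluator emits a verdict of "RETRY" or "PROCEED" and that the next phases are named "search" and "ground". Those node names and the function name are illustrative, while search_revision_count and MAX_SEARCH_REVISIONS are the state values shown above.

```python
MAX_SEARCH_REVISIONS = 1


def route_after_evaluation(verdict: str, search_revision_count: int) -> tuple:
    """Return (next_node, updated_revision_count) for an evaluator verdict."""
    if verdict not in ("RETRY", "PROCEED"):
        raise ValueError(f"unknown verdict: {verdict!r}")
    if verdict == "RETRY" and search_revision_count < MAX_SEARCH_REVISIONS:
        # Budget remains: loop back to search with a refined query.
        return "search", search_revision_count + 1
    # Either the evaluator is satisfied or the budget is spent; in both
    # cases the run moves forward, so the loop cannot spin indefinitely.
    return "ground", search_revision_count
```

Replaying this trace: the first verdict is RETRY with a count of 0, so the function returns ("search", 1). The second verdict is PROCEED, so it returns ("ground", 1). Had the second verdict also been RETRY, the count would already be at the limit and the run would proceed anyway.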

Ground

6 claims written · 0 orphans · all sources tagged

System State

claims table rows: 6 written

FK integrity: 0 orphans

SQLite write latency: 12ms

Agent Rationale

article · "most deployments remain narrow single-task automations" · article paragraph 4
past brief · "agentic software will replace SaaS workflows" · brief_142
web search · "82% of enterprise AI deployments are single-task" · McKinsey 2025 · verified URL
web search · "only 12% of organizations have deployed multi-agent systems" · Accenture · verified URL
model · "The coordination layer remains the unsolved problem" · no external source — model knowledge
article · "IBM's framing aligns with broader industry consensus" · article synthesis
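One way to get "0 orphans" as a checkable property is to let SQLite enforce it. The sketch below assumes a claims table whose rows reference a runs table, and it treats an orphan as a claim whose run does not exist. The schema, column names, and helper names are assumptions. The four source types and the run id are taken from the trace.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory claims store with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    # SQLite does not enforce foreign keys unless asked to, per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE runs (run_id TEXT PRIMARY KEY);
        CREATE TABLE claims (
            id INTEGER PRIMARY KEY,
            run_id TEXT NOT NULL REFERENCES runs(run_id),
            text TEXT NOT NULL,
            source_type TEXT NOT NULL
                CHECK (source_type IN ('article', 'past brief', 'web search', 'model')),
            source_ref TEXT NOT NULL
        );
        """
    )
    return conn


def write_claims(conn: sqlite3.Connection, run_id: str, claims: list) -> int:
    """Insert (source_type, text, source_ref) tuples atomically; return the count."""
    with conn:  # commits on success, rolls back if any row is rejected
        conn.executemany(
            "INSERT INTO claims (run_id, text, source_type, source_ref) VALUES (?, ?, ?, ?)",
            [(run_id, text, source_type, ref) for source_type, text, ref in claims],
        )
    return len(claims)


def count_orphans(conn: sqlite3.Connection) -> int:
    """Count claims whose run_id has no matching row in runs."""
    return conn.execute(
        "SELECT COUNT(*) FROM claims c LEFT JOIN runs r ON r.run_id = c.run_id "
        "WHERE r.run_id IS NULL"
    ).fetchone()[0]
```

With the constraints in place, a claim with a missing run or an unrecognized source type is rejected at write time, so the orphan count is a confirmation rather than a cleanup step.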

Three lessons

Tool descriptions are the real prompt

When the agent had only search_web, it searched 100% of the time. Fixing the stop_here docstring flipped behavior to 60/40. Behavior is shaped by how you describe options — not just what options exist.

Retrieval is architecture, not AI

Librarian (Python/SQL) and analyst (model) are separate roles. The retrieval backend was swapped from keyword to semantic without touching the graph. Separate the mechanical from the cognitive — keep them cleanly isolated.
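A sketch of how that separation might look in code, assuming the graph node depends only on a small retriever interface. The Retriever protocol, both backend classes, and the remember function are hypothetical names. The second backend ranks by word overlap and is only a toy stand-in for a semantic search, not an embedding model.

```python
from typing import Protocol


class Retriever(Protocol):
    def retrieve(self, query: str, limit: int) -> list: ...


class KeywordRetriever:
    """Returns briefs whose thesis shares at least one word with the query."""

    def __init__(self, briefs: list):
        self.briefs = briefs

    def retrieve(self, query: str, limit: int) -> list:
        terms = set(query.lower().split())
        hits = [b for b in self.briefs if terms & set(b["thesis"].lower().split())]
        return hits[:limit]


class OverlapRankedRetriever:
    """Toy stand-in for a semantic backend: ranks briefs by Jaccard overlap."""

    def __init__(self, briefs: list):
        self.briefs = briefs

    def retrieve(self, query: str, limit: int) -> list:
        q = set(query.lower().split())

        def score(brief: dict) -> float:
            t = set(brief["thesis"].lower().split())
            return len(q & t) / len(q | t) if (q | t) else 0.0

        return sorted(self.briefs, key=score, reverse=True)[:limit]


def remember(state: dict, retriever: Retriever, limit: int = 3) -> dict:
    """Graph node: attaches prior briefs to state without knowing the backend."""
    return {**state, "prior_briefs": retriever.retrieve(state["query"], limit)}
```

Swapping one backend for the other changes the argument passed to remember and nothing else, which is the property the lesson describes.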

Citations make the agent falsifiable

Every claim is tagged with source. The agent doesn't say "trust me." It says "verify me." A wrong claim with a citation you can check beats a correct claim you can't verify. Falsifiability is the beginning of trust.

Reads with memory. Judges. Proves every claim.