Article Analyst

Reads with memory. Judges. Proves every claim.

An article analyst built to make its thinking visible — every decision traced, every source named.

Shivam Bhatnagar  ·  May 2026

Most AI agents are black boxes. You put something in, something comes out, and you have no idea what happened in between.

This makes them impossible to debug, impossible to trust, and impossible to improve systematically.

The Article Analyst was built to prove a different model is possible: one where every decision leaves a trace, every claim has a source, and the agent's internal thinking is a designed artifact — not a mystery.

Reads articles
Ingests URLs, extracts clean text, classifies by domain, content type, and author stance
Remembers
Queries its own memory before forming judgments. "What have I read about this before?"
Decides
At runtime, the model chooses between searching the web and stopping. Genuine discretion, not a hardcoded rule.
Proves
Every claim is grounded: article source, past brief, web search, or model knowledge. Citations are verifiable.
  1. Ingest

    Article enters — URL is fetched, boilerplate stripped, clean text extracted. No model call. No decision yet.

  2. Classify

    Agent assigns domain, content type, and author stance. The model labels the article before forming any judgment.

  3. Remember

    Prior briefs from the same domain are pulled from memory. The agent enters analysis already knowing what it's read before.

  4. Decide the fork

    With context in hand, the model chooses: verify claims with web search, or stop here. Genuine discretion — not a rule.

  5. Ground

    Every claim in the final brief is tagged: article source, past brief, web search, or model knowledge. Everything is verifiable.
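The five steps can be pictured as a plain sequential pipeline that records a trace entry for every step it runs. This is a minimal sketch, not the project's code: `RunState`, the step functions and `run_pipeline` are hypothetical names, and every step is a stub. In particular, `decide_fork` hardcodes a value only to keep the example runnable; in the real agent that choice is made by the model.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class RunState:
    """Hypothetical state object threaded through every step of one run."""
    url: str
    text: str = ""
    labels: dict = field(default_factory=dict)
    prior_briefs: list = field(default_factory=list)
    fork: str = ""
    claims: list = field(default_factory=list)
    trace: list = field(default_factory=list)  # step names, in execution order

def ingest(s: RunState) -> None:
    s.text = f"(clean text of {s.url})"  # stub for fetch + boilerplate stripping; no model call

def classify(s: RunState) -> None:
    s.labels = {"domain": "technology"}  # stub for the model's labelling call

def remember(s: RunState) -> None:
    s.prior_briefs = ["brief_142"]  # stub for a Python/SQL lookup by domain

def decide_fork(s: RunState) -> None:
    s.fork = "search_web"  # stub: in the real system the model makes this choice

def ground(s: RunState) -> None:
    s.claims = [("article", "paragraph 4")]  # each claim carries a source tag

STEPS: list[tuple[str, Callable[[RunState], None]]] = [
    ("ingest", ingest),
    ("classify", classify),
    ("remember", remember),
    ("decide_fork", decide_fork),
    ("ground", ground),
]

def run_pipeline(s: RunState) -> RunState:
    for name, step in STEPS:
        step(s)
        s.trace.append(name)  # every step leaves a visible trace entry
    return s
```

The point of the shape is that the trace is produced by the runner itself, so no step can execute without being recorded.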

Trace: run-019e6027
"AI Agents in 2025: Expectations vs. Reality"
IBM Technology Blog
7 phases · 3.2k words · 2 tool calls · 6 grounded claims
Phase 1 of 7
System State
Fetch method: readability-lxml
HTTP status: 200
Latency: 1.8 s
Article length: 3,247 words
Agent Rationale

This phase is mechanical. The agent hasn't thought yet.

Pure Python: fetch URL, strip boilerplate, measure length. No model call. No decision.
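A rough stand-in for this mechanical phase, using only the standard library. The trace reports readability-lxml as the fetch method; this sketch deliberately does not use it and instead drops text inside a hard-coded set of tags, which is far cruder than readability's heuristics. `fetch`, `TextExtractor`, `ingest_html` and the `SKIP_TAGS` set are all hypothetical.

```python
import urllib.request
from html.parser import HTMLParser

# Tags treated as boilerplate in this sketch (an assumption, not readability-lxml's rules).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}

def fetch(url: str) -> tuple[int, str]:
    """Download a page and return (HTTP status, decoded HTML)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")

class TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def ingest_html(html: str) -> dict:
    """Strip boilerplate and measure length: no model call, no decision."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return {"text": text, "word_count": len(text.split())}
```

In use, `status, html = fetch(url)` followed by `ingest_html(html)` yields the kind of numbers shown in the System State panel above (status, word count); latency would be measured around the `fetch` call.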

Phase 2 of 7
System State
Schema validation: passed
ClassificationResult: written
Agent Rationale
Domain: technology
Content type: analysis
Author stance: neutral-informative
Model's reasoning

"The article analyzes the gap between 2025 AI agent expectations and actual deployment patterns. It's neither promotional nor critical — balanced industry analysis."
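"Schema validation: passed" means the model's labels were checked against a fixed schema before anything was written. A minimal sketch of such a check with a dataclass, assuming closed label sets: only `technology`, `analysis` and `neutral-informative` come from the trace; every other allowed value, and the field names themselves, are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical label sets; only one value in each appears in the trace.
DOMAINS = {"technology", "business", "science", "policy"}
CONTENT_TYPES = {"analysis", "news", "opinion", "tutorial"}
STANCES = {"neutral-informative", "promotional", "critical"}

@dataclass(frozen=True)
class ClassificationResult:
    domain: str
    content_type: str
    author_stance: str
    reasoning: str

    def __post_init__(self) -> None:
        # Reject any label outside its allowed set so a malformed reply fails loudly.
        checks = (
            ("domain", DOMAINS),
            ("content_type", CONTENT_TYPES),
            ("author_stance", STANCES),
        )
        for name, allowed in checks:
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} not in {sorted(allowed)}")

def validate(raw: dict) -> ClassificationResult:
    """Turn the model's JSON-like reply into a typed result, or raise."""
    return ClassificationResult(**raw)
```

A missing key raises `TypeError` from the constructor and an unknown label raises `ValueError`, so nothing half-formed reaches the memory or analysis phases.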

Phase 3 of 7
System State
Query: domain='technology'
LIMIT: 3
Rows returned: 3
Drawer open time: 4 ms
Agent Rationale

The model receives 3 prior briefs as context. It doesn't choose them; Python does.

  • "AI Agents Will Eat Software" Mar 2026 · agentic software will replace SaaS workflows
  • "The Multi-Agent Debate" Apr 2026 · coordinating agents creates emergent behavior
  • "Untapped AI UX Opportunities" May 2026 · interface design for AI is the next frontier
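The "librarian" side of this phase is a plain parameterized SQL query. A sketch with an in-memory SQLite database: the `briefs` table, its columns and the newest-first ordering are assumptions; the trace itself only shows a filter on `domain` and `LIMIT 3`.

```python
import sqlite3

def open_memory() -> sqlite3.Connection:
    """In-memory stand-in for the brief store; table and column names are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE briefs (id INTEGER PRIMARY KEY, title TEXT, domain TEXT, "
        "created_at TEXT, thesis TEXT)"
    )
    return conn

def recall(conn: sqlite3.Connection, domain: str, limit: int = 3) -> list[tuple]:
    """Python/SQL picks the briefs; the model only receives what comes back."""
    return conn.execute(
        "SELECT title, created_at, thesis FROM briefs "
        "WHERE domain = ? ORDER BY created_at DESC LIMIT ?",  # recency order is assumed
        (domain, limit),
    ).fetchall()
```

Because selection happens here rather than in a prompt, the set of briefs the model saw is fully reproducible from the query and its parameters.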
Phase 4 of 7
System State
Schema validation: passed
BriefResult fields: 10 / 10
structured_output latency: 4.2 s
Agent Rationale
Core thesis

"IBM argues that while 2025 saw massive agent experimentation, most deployments remain narrow single-task automations — the multi-agent coordination layer is still missing."

Key insight

"The gap between expectations and reality isn't technical capability — it's coordination architecture."

Credibility: HIGH
Response readiness: NOT_READY (search warranted)
Recommendation: DEEP_READ
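The enumerated fields above suggest closed vocabularies for credibility, readiness and recommendation. A partial sketch of `BriefResult` using enums: it covers only the five fields visible in this panel (the trace reports ten, and the other five are not shown), and the members `LOW`, `MEDIUM`, `READY` and `SKIM` are assumed counterparts of the values that do appear.

```python
from dataclasses import dataclass, fields
from enum import Enum

class Credibility(Enum):
    LOW = "LOW"        # assumed
    MEDIUM = "MEDIUM"  # assumed
    HIGH = "HIGH"

class Readiness(Enum):
    READY = "READY"    # assumed
    NOT_READY = "NOT_READY"

class Recommendation(Enum):
    SKIM = "SKIM"      # assumed
    DEEP_READ = "DEEP_READ"

@dataclass
class BriefResult:
    """Five of the ten fields reported in the trace."""
    core_thesis: str
    key_insight: str
    credibility: Credibility
    response_readiness: Readiness
    recommendation: Recommendation

def parse_brief(raw: dict) -> BriefResult:
    # Enum(value) raises ValueError for any label outside the allowed set.
    return BriefResult(
        core_thesis=raw["core_thesis"],
        key_insight=raw["key_insight"],
        credibility=Credibility(raw["credibility"]),
        response_readiness=Readiness(raw["response_readiness"]),
        recommendation=Recommendation(raw["recommendation"]),
    )

def filled_fields(brief: BriefResult) -> int:
    """Count non-empty fields, in the spirit of the trace's 'fields 10 / 10' check."""
    return sum(1 for f in fields(brief) if getattr(brief, f.name) not in (None, ""))
```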
Phase 5 of 7
System State
Tool menu size: 2
bind_tools latency: 0.3 s
Agent Rationale
search_web(query, rationale) · chosen
Search the web for current information. Use when the article makes empirical claims that need verification.
stop_here(rationale)
Stop analysis here. Use when the article is self-contained or no external search would add meaningful value.
Model's rationale

"IBM makes specific claims about 2025 deployment patterns. Verifying whether other sources corroborate the narrow single-task finding would strengthen the credibility assessment."

Search queries
  • multi-agent coordination challenges 2025 deployment
  • AI agents narrow task automation vs general purpose 2025
  • enterprise AI agent deployment statistics 2025
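Each option on the tool menu is a signature plus a description, and the description is what the model weighs when it picks. A sketch of the two tools as plain Python functions whose docstrings carry the descriptions quoted above. The function bodies are placeholders, and the `{"name": ..., "args": ...}` shape of a tool call is an assumption; this is not the project's `bind_tools` setup.

```python
import inspect

def search_web(query: str, rationale: str) -> dict:
    """Search the web for current information. Use when the article makes empirical claims that need verification."""
    # Placeholder: a real implementation would call a search API here.
    return {"action": "search", "query": query, "rationale": rationale}

def stop_here(rationale: str) -> dict:
    """Stop analysis here. Use when the article is self-contained or no external search would add meaningful value."""
    return {"action": "stop", "rationale": rationale}

TOOLS = {fn.__name__: fn for fn in (search_web, stop_here)}

def tool_menu() -> list[dict]:
    """What the model reads: each tool's name, signature and docstring."""
    return [
        {
            "name": name,
            "signature": f"{name}{inspect.signature(fn)}",
            "description": inspect.getdoc(fn),
        }
        for name, fn in TOOLS.items()
    ]

def dispatch(tool_call: dict) -> dict:
    """Run whichever tool the model picked; the choice itself is not made here."""
    return TOOLS[tool_call["name"]](**tool_call["args"])
```

Both tools require a `rationale` argument, which is why every fork in the trace comes with the model's own explanation attached.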
Phase 6 of 7
System State
search_revision_count: 0 → 1
MAX_SEARCH_REVISIONS: 1
Loop guard: active
Agent Rationale

Search results arrived. The evaluator reads them against the mission.

Retry · First search: off-target

"Search results drifted toward general AI news and vendor announcements. The mission was to verify IBM's claim about narrow single-task deployment patterns. None of the current results directly address enterprise deployment statistics."

Refinement: Focus specifically on enterprise deployment data and surveys — not vendor product announcements. What percentage of enterprise AI agent deployments are single-task vs multi-agent?

↓ loops back with refined query ↓
Refined query
  • enterprise AI agent deployment survey 2025 single task multi agent percentage
Proceed · Second search: on-target

"New results include a McKinsey survey on enterprise AI deployment patterns and an Accenture report on agent automation maturity. Both directly address the deployment landscape IBM describes."

Revision 1 of 1 (at the limit: on a further retry, the guard would force the agent to proceed)
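The loop guard can be expressed as a bounded retry: search, let an evaluator judge the results, and allow at most `MAX_SEARCH_REVISIONS` refinements before proceeding regardless. A sketch in which `search` and `evaluate` are injected callables (hypothetical stand-ins for the search tool and the model-based evaluator), and the `"PROCEED"` / `"RETRY"` verdict strings are assumed.

```python
from typing import Callable

MAX_SEARCH_REVISIONS = 1  # value shown in the trace

def search_with_revisions(
    queries: list[str],
    search: Callable[[list[str]], list[str]],
    evaluate: Callable[[list[str]], tuple[str, list[str]]],
) -> tuple[list[str], int]:
    """Return (final results, revisions used).

    `evaluate` returns ("PROCEED", []) or ("RETRY", refined_queries).
    """
    revisions = 0
    while True:
        results = search(queries)
        verdict, refined = evaluate(results)
        if verdict == "PROCEED" or revisions >= MAX_SEARCH_REVISIONS:
            # Loop guard: at the limit, proceed even if the evaluator asks for another retry.
            return results, revisions
        revisions += 1
        queries = refined
```

The guard lives in ordinary code rather than in the prompt, so the loop terminates even if the evaluator never says "PROCEED".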
Phase 7 of 7
System State
claims table rows: 6 written
FK integrity: 0 orphans
SQLite write latency: 12 ms
Agent Rationale
  • article
    "most deployments remain narrow single-task automations" · article paragraph 4
  • past brief
    "agentic software will replace SaaS workflows" · brief_142
  • model
    "The coordination layer remains the unsolved problem" · no external source (model knowledge)
  • article
    "IBM's framing aligns with broader industry consensus" · article synthesis
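"FK integrity: 0 orphans" implies claims reference a parent record through a foreign key. A sketch of such a claims table in SQLite, where the database itself rejects a claim with an unknown run or an unrecognized source tag. The schema, the `runs` parent table and the tag spellings (`past_brief`, `web_search`) are hypothetical; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

def open_store() -> sqlite3.Connection:
    """In-memory store with a hypothetical schema: each claim belongs to one run and names its source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless this is set
    conn.execute("CREATE TABLE runs (id TEXT PRIMARY KEY, title TEXT NOT NULL)")
    conn.execute(
        """CREATE TABLE claims (
               id INTEGER PRIMARY KEY,
               run_id TEXT NOT NULL REFERENCES runs(id),
               text TEXT NOT NULL,
               source_type TEXT NOT NULL
                   CHECK (source_type IN ('article', 'past_brief', 'web_search', 'model')),
               locator TEXT NOT NULL
           )"""
    )
    return conn

def add_claim(conn: sqlite3.Connection, run_id: str, text: str, source_type: str, locator: str) -> None:
    # Raises sqlite3.IntegrityError if run_id is unknown or source_type is not an allowed tag.
    conn.execute(
        "INSERT INTO claims (run_id, text, source_type, locator) VALUES (?, ?, ?, ?)",
        (run_id, text, source_type, locator),
    )

def orphan_count(conn: sqlite3.Connection) -> int:
    """Claims that point at no existing run."""
    return conn.execute(
        "SELECT COUNT(*) FROM claims c LEFT JOIN runs r ON c.run_id = r.id WHERE r.id IS NULL"
    ).fetchone()[0]
```

With the constraint enforced at write time, the orphan count is a cheap audit rather than the only line of defense.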
01
Tool descriptions are the real prompt

When the agent had only search_web, it searched 100% of the time. Fixing the stop_here docstring shifted the split to 60/40. Behavior is shaped by how you describe options, not just by which options exist.

02
Retrieval is architecture, not AI

Librarian (Python/SQL) and analyst (model) are separate roles. The retrieval backend was swapped from keyword to semantic without touching the graph. Keep the mechanical and the cognitive cleanly isolated.
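One way to make a backend swap like this possible is to have the graph depend only on a small retrieval interface. A sketch using `typing.Protocol`: `Retriever`, `KeywordRetriever`, `VectorRetriever` and `build_context` are hypothetical names, the keyword scorer is a crude word-overlap count, and `embed` is a placeholder for whatever embedding model a semantic backend would actually use.

```python
import math
from typing import Callable, Protocol

class Retriever(Protocol):
    def recall(self, domain: str, query: str, limit: int) -> list[str]: ...

class KeywordRetriever:
    """Ranks briefs by the number of query words they share."""

    def __init__(self, briefs: dict[str, list[str]]):
        self.briefs = briefs  # domain -> brief texts

    def recall(self, domain: str, query: str, limit: int) -> list[str]:
        words = set(query.lower().split())
        candidates = self.briefs.get(domain, [])
        ranked = sorted(candidates, key=lambda b: len(words & set(b.lower().split())), reverse=True)
        return ranked[:limit]

class VectorRetriever:
    """Ranks briefs by cosine similarity of embeddings produced by `embed`."""

    def __init__(self, briefs: dict[str, list[str]], embed: Callable[[str], list[float]]):
        self.briefs = briefs
        self.embed = embed

    def recall(self, domain: str, query: str, limit: int) -> list[str]:
        q = self.embed(query)

        def cosine(text: str) -> float:
            v = self.embed(text)
            dot = sum(a * b for a, b in zip(q, v))
            norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in v))
            return dot / norm if norm else 0.0

        return sorted(self.briefs.get(domain, []), key=cosine, reverse=True)[:limit]

def build_context(retriever: Retriever, domain: str, query: str) -> str:
    """The graph side: it sees only the Retriever interface, so backends swap freely."""
    return "\n".join(retriever.recall(domain, query, limit=3))
```

Swapping backends then means constructing a different object; `build_context` and everything downstream of it stay unchanged.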

03
Citations make the agent falsifiable

Every claim is tagged with its source. The agent doesn't say "trust me." It says "verify me." A wrong claim with a citation you can check beats a correct claim you can't verify. Falsifiability is the beginning of trust.