An article analyst built to make its thinking visible — every decision traced, every source named.
Most AI agents are black boxes. You put something in, something comes out, and you have no idea what happened in between.
This makes them impossible to debug, impossible to trust, and impossible to improve systematically.
The Article Analyst was built to prove a different model is possible: one where every decision leaves a trace, every claim has a source, and the agent's internal thinking is a designed artifact — not a mystery.
Article enters — URL is fetched, boilerplate stripped, clean text extracted. No model call. No decision yet.
Agent assigns domain, content type, and author stance. The model labels the article before forming any judgment.
Prior briefs from the same domain are pulled from memory. The agent enters analysis already knowing what it's read before.
With context in hand, the model chooses: verify claims with web search, or stop here. Genuine discretion — not a rule.
Every claim in the final brief is tagged with its origin: the article, a past brief, web search, or model knowledge. A reader can trace each claim back to where it came from.
This phase is mechanical. The agent hasn't thought yet.
Pure Python: fetch URL, strip boilerplate, measure length. No model call. No decision.
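A minimal sketch of what this mechanical step could look like, using only the standard library. The tag list in `SKIP_TAGS`, the function names `extract_text` and `ingest`, and the returned fields are assumptions for illustration, not the project's actual code.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Tags whose contents are treated as boilerplate (an assumed list).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Strip markup and boilerplate regions, returning clean text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def ingest(url: str) -> dict:
    """Fetch a URL and return clean text plus a length measurement."""
    with urlopen(url, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    text = extract_text(html)
    return {"url": url, "text": text, "word_count": len(text.split())}
```

Nothing here calls a model, which is the point: the step is deterministic and can be tested like any other function.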
"The article analyzes the gap between 2025 AI agent expectations and actual deployment patterns. It's neither promotional nor critical — balanced industry analysis."
The model receives 3 prior briefs as context. It doesn't choose them; Python does.
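One way the retrieval could work, sketched with `sqlite3`. The `briefs` table, its columns, and the helper names are hypothetical; the only detail taken from the text is that plain code, not the model, picks the prior briefs for a domain.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table layout for stored briefs.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS briefs (
               id INTEGER PRIMARY KEY,
               domain TEXT NOT NULL,
               title TEXT NOT NULL,
               summary TEXT NOT NULL,
               created_at TEXT NOT NULL
           )"""
    )


def save_brief(conn, domain, title, summary, created_at) -> None:
    conn.execute(
        "INSERT INTO briefs (domain, title, summary, created_at) VALUES (?, ?, ?, ?)",
        (domain, title, summary, created_at),
    )


def prior_briefs(conn, domain: str, limit: int = 3) -> list:
    """Return the most recent briefs for a domain, newest first."""
    rows = conn.execute(
        "SELECT title, summary FROM briefs "
        "WHERE domain = ? ORDER BY created_at DESC, id DESC LIMIT ?",
        (domain, limit),
    ).fetchall()
    return [{"title": title, "summary": summary} for title, summary in rows]
```

The model only ever sees the returned list, so the selection rule can be audited or changed without touching any prompt.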
"IBM argues that while 2025 saw massive agent experimentation, most deployments remain narrow single-task automations — the multi-agent coordination layer is still missing."
"The gap between expectations and reality isn't technical capability — it's coordination architecture."
"IBM makes specific claims about 2025 deployment patterns. Verifying whether other sources corroborate the narrow single-task finding would strengthen the credibility assessment."
Search results arrived. The evaluator reads them against the mission.
"Search results drifted toward general AI news and vendor announcements. The mission was to verify IBM's claim about narrow single-task deployment patterns. None of the current results directly address enterprise deployment statistics."
Refinement: Focus specifically on enterprise deployment data and surveys — not vendor product announcements. What percentage of enterprise AI agent deployments are single-task vs multi-agent?
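The evaluate-then-refine step above can be sketched as a small loop. Everything named here is an assumption: `search` and `evaluate` stand in for the real search client and the model-backed evaluator, `Verdict` is a hypothetical return type, and the `max_rounds` cap is not something the source specifies.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    on_mission: bool
    refinement: Optional[str] = None  # sharper query when results drifted


def search_until_relevant(
    mission: str,
    query: str,
    search: Callable[[str], List[str]],
    evaluate: Callable[[str, List[str]], Verdict],
    max_rounds: int = 2,  # assumed cap so the loop always terminates
) -> Tuple[List[str], List[Tuple[str, bool]]]:
    """Search, judge results against the mission, and retry with a refined query."""
    history = []
    results: List[str] = []
    for _ in range(max_rounds):
        results = search(query)
        verdict = evaluate(mission, results)
        history.append((query, verdict.on_mission))
        if verdict.on_mission or not verdict.refinement:
            break
        query = verdict.refinement
    return results, history
```

Returning the history alongside the results keeps each evaluator judgment in the trace rather than discarding it.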
"New results include a McKinsey survey on enterprise AI deployment patterns and an Accenture report on agent automation maturity. Both directly address the deployment landscape IBM describes."
When the agent had only search_web, it searched 100% of the time. Fixing the stop_here docstring shifted behavior to a 60/40 split. Behavior is shaped by how you describe options, not just by which options exist.
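To make the docstring point concrete, here is a sketch of two tools whose descriptions are built directly from their docstrings. The names `search_web` and `stop_here` come from the text; the docstring wording, the signatures, and the `describe_tools` helper are illustrative assumptions, not the project's actual definitions.

```python
import inspect


def search_web(query: str) -> str:
    """Search the web to check a specific, checkable claim from the article.

    Use this when the article states facts or figures that outside sources
    could confirm or contradict.
    """
    raise NotImplementedError("wire up a real search client here")


def stop_here(reason: str) -> str:
    """Finish the analysis without searching.

    Use this when the article is opinion, the claims are already covered by
    prior briefs, or a search would not change the assessment.
    """
    return reason


def describe_tools(tools) -> list:
    """Build the descriptions a model would see, one per tool docstring."""
    return [{"name": t.__name__, "description": inspect.getdoc(t)} for t in tools]
```

Because the description is the docstring, editing one sentence in `stop_here` changes what the model reads when it weighs the two options.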
Librarian (Python/SQL) and analyst (model) are separate roles. The retrieval backend was swapped from keyword to semantic without touching the graph. Separate the mechanical from the cognitive — keep them cleanly isolated.
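A rough sketch of how that isolation can be expressed with a `typing.Protocol`. The class names, the word-overlap scoring, and the injected `embed` function are assumptions; a real semantic backend would call an embedding model where `embed` is passed in.

```python
import math
from typing import Callable, List, Protocol, Sequence


class Librarian(Protocol):
    def retrieve(self, query: str, limit: int) -> List[str]: ...


class KeywordLibrarian:
    """Rank briefs by how many query words they share."""

    def __init__(self, briefs: Sequence[str]):
        self.briefs = list(briefs)

    def retrieve(self, query: str, limit: int) -> List[str]:
        terms = set(query.lower().split())
        scored = [(len(terms & set(b.lower().split())), b) for b in self.briefs]
        hits = [pair for pair in scored if pair[0] > 0]
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [brief for _, brief in hits[:limit]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticLibrarian:
    """Rank briefs by cosine similarity of embeddings from an injected embed()."""

    def __init__(self, briefs: Sequence[str], embed: Callable[[str], Sequence[float]]):
        self.embed = embed
        self.items = [(b, embed(b)) for b in briefs]

    def retrieve(self, query: str, limit: int) -> List[str]:
        q = self.embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [brief for brief, _ in ranked[:limit]]


def analyst_context(librarian: Librarian, query: str, limit: int = 3) -> str:
    """The only thing the analyst side sees: text returned by some librarian."""
    return "\n".join(librarian.retrieve(query, limit))
```

`analyst_context` depends only on the `retrieve` signature, so swapping one librarian for the other requires no change on the analyst side.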
Every claim is tagged with source. The agent doesn't say "trust me." It says "verify me." A wrong claim with a citation you can check beats a correct claim you can't verify. Falsifiability is the beginning of trust.
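The tagging scheme could be represented along these lines. The four source categories come from the text; the `Claim` fields, the `reference` string, and both helper functions are assumptions made for the sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Source(Enum):
    ARTICLE = "article"
    PAST_BRIEF = "past_brief"
    WEB_SEARCH = "web_search"
    MODEL_KNOWLEDGE = "model_knowledge"


@dataclass(frozen=True)
class Claim:
    text: str
    source: Source
    reference: str = ""  # URL, brief id, or quoted passage (hypothetical field)


def render_brief(claims: List[Claim]) -> str:
    """Render each claim with its source tag so a reader can follow it up."""
    lines = []
    for claim in claims:
        suffix = f" ({claim.reference})" if claim.reference else ""
        lines.append(f"[{claim.source.value}] {claim.text}{suffix}")
    return "\n".join(lines)


def checkable(claims: List[Claim]) -> List[Claim]:
    """Claims that point at something outside the model a reader can inspect."""
    return [
        c for c in claims
        if c.source is not Source.MODEL_KNOWLEDGE and c.reference
    ]
```

Making `source` a required field means an untagged claim cannot be constructed at all, which turns the tagging rule into a structural guarantee instead of a convention.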