There's a seductive idea circulating in AI infrastructure right now: if we just compile knowledge better — pre-process documents into structured artifacts, serve them faster, reduce token spend — agents will finally work.
It's wrong. Or more precisely, it's half the problem. And the missing half is why we built Zipf.
## The Knowledge Compilation Thesis
The pitch goes like this: agents spend 85% of their effort on retrieval loops. They pull chunks, analyze them, discover gaps, pull more. The solution is to shift work upstream — compile raw data into task-optimized artifacts at ingestion time, so agents get exactly what they need in a single pass. Better memory for agents. Faster answers. Fewer tokens.
This is a real improvement over naive RAG. Pre-computing structured representations of your Salesforce data, your 10-K filings, your internal wikis — that's genuinely useful. We've watched several companies ship impressive demos of this pattern.
But the entire thesis rests on an assumption that breaks the moment you point it at the real world: the data sits still.
## The Temporal Blindspot
Enterprise knowledge engines are optimized for data at rest. Compile once, serve many. A 10-K filing doesn't change after it's published. Your internal wiki changes slowly. Your CRM records update in predictable ways through known interfaces.
The web doesn't work like this.
A competitor's pricing page changes without notice. A regulatory filing drops at 4pm on a Friday. A news article surfaces that invalidates the analysis your agent compiled last week. A key hire at a target account announces they're leaving — on LinkedIn, not in your CRM.
For any agent operating on external information — competitive intelligence, market monitoring, regulatory compliance, prospect research, supply chain risk — the question isn't "what do I know?" It's "is what I knew still true?"
Knowledge engines can't answer that question. They compile what was true at ingestion time and serve it until someone re-ingests. The gap between reality changing and the compiled artifact reflecting that change is where decisions go wrong.
## Agents Need Perception, Not Just Memory
Here's a frame that makes the distinction clear: knowledge engines give agents memory. What do I know about this company? What did this contract say? What were last quarter's numbers?
Memory is necessary. But agents also need perception — the ability to notice that something in the world has changed and that the change matters.
These are fundamentally different capabilities with fundamentally different architectures:
| Capability | Memory (Knowledge Engines) | Perception (Monitoring Agents) |
|---|---|---|
| Optimizes for | Query latency | Time-to-signal |
| Data model | Static corpus, compiled artifacts | Live sources, change detection |
| Failure mode | Stale answers served confidently | Missed signals in noise |
| Core challenge | Retrieval precision | Breadth under power laws |
| Value proposition | Know faster | Know first |
Most agent architectures today have sophisticated memory and no perception at all. They answer questions about a frozen snapshot of the world. The more confidently they answer, the more dangerous the staleness becomes.
## The Zipfian Problem
Why is perception hard? Power laws.
The distribution of meaningful changes on the web follows a Zipfian curve. The vast majority of web pages that change on any given day are irrelevant to any particular monitoring intent. The changes that matter — a competitor launching a new product, a regulation being proposed, a key account posting a job listing that signals expansion — are rare and unpredictable.
This creates a paradox: the fewer things that matter, the wider you have to look. You can't pre-decide which sources will produce the signal. You can't compile artifacts from a fixed corpus because the corpus itself is undefined and shifting. You need agents that patrol broadly, apply judgment to what they find, and surface only what crosses a threshold of significance.
This is not a retrieval problem. It's a monitoring problem. And it requires infrastructure that treats time as a first-class dimension, not an afterthought.
This paradox is our namesake. George Kingsley Zipf observed that in any natural corpus, a small number of items account for the majority of occurrences — and the long tail stretches on practically forever. The same law governs web signals: a handful of changes will drive your next decision, but they're buried in a long tail of noise that you can't afford to ignore.
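That concentration is easy to see with a toy calculation. Under an idealized 1/rank frequency curve, a tiny head of the distribution carries most of the mass (a sketch with made-up numbers, not measurements of any real corpus):

```python
# Idealized Zipfian rank-frequency curve: f(r) proportional to 1/r.
# Toy numbers, not data about any real corpus of web changes.
N = 10_000
freqs = [1 / r for r in range(1, N + 1)]
total = sum(freqs)

# Share of all occurrences held by the top 1% of ranked items.
top_1pct_share = sum(freqs[:100]) / total
print(f"{top_1pct_share:.0%}")  # prints "53%": 1% of items, over half the mass
```

Flip the perspective and the monitoring problem appears: the remaining 99% of items still hold nearly half the mass, so you can't safely ignore the tail.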
## What We Built
Zipf is persistent web monitoring for AI agents and teams. You describe what to watch — in natural language — and Zipf deploys agents that patrol sources on a schedule, understand what changed, ignore what didn't matter, and deliver signals when something does.
The architecture is built around three ideas that distinguish it from both traditional alerting and one-shot search:
**Describe, don't configure.** You tell Zipf what you care about: "Monitor Series A AI infrastructure companies for product launches, pricing changes, and key hires." Zipf's planning engine decomposes that intent into a multi-step workflow — searches, crawls, extraction schemas, change detection rules — and deploys it. No query syntax. No URL lists. No regex patterns.
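To make the decomposition concrete, here is a hypothetical sketch of what a planned workflow might look like. The field names, step types, and structure are illustrative assumptions, not Zipf's actual schema:

```python
# Hypothetical decomposition of the monitoring intent above.
# Every field name and step type here is an illustrative assumption,
# not Zipf's real workflow format.
spec = {
    "intent": ("Monitor Series A AI infrastructure companies for "
               "product launches, pricing changes, and key hires"),
    "steps": [
        {"type": "search",  "query": "Series A AI infrastructure product launch"},
        {"type": "crawl",   "targets": "urls_from_previous_step"},
        {"type": "extract", "schema": {"company": "str", "event": "str", "date": "str"}},
        {"type": "diff",    "against": "previous_execution_baseline"},
    ],
    "schedule": "daily",
    "delivery": {"channel": "slack", "min_significance": 0.7},
}

step_types = [s["type"] for s in spec["steps"]]
print(step_types)  # ['search', 'crawl', 'extract', 'diff']
```

The point of the shape: the user writes only the `intent` string; everything below it is generated.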
**Agents that patrol with judgment.** Each workflow execution isn't a blind re-run. Agents compare current results against prior executions, score changes by information gain, classify results against the monitoring intent, and filter out noise. A result that's relevant but identical to yesterday's has zero information gain and gets suppressed. A moderately relevant result that's completely new might be the signal that matters.
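A minimal sketch of that suppression logic, with exact-duplicate hashing standing in for a real novelty model (which would use semantic similarity, not byte equality):

```python
import hashlib

def information_gain(text: str, relevance: float, seen: set) -> float:
    """Relevance weighted by novelty: an exact repeat scores zero.

    Simplified sketch -- exact-hash matching is a stand-in for the
    semantic novelty scoring a real system would use.
    """
    digest = hashlib.sha256(text.encode()).hexdigest()
    novelty = 0.0 if digest in seen else 1.0
    seen.add(digest)
    return relevance * novelty

seen = set()
first = information_gain("Acme launches a $99 Pro tier", 0.9, seen)   # relevant and new
repeat = information_gain("Acme launches a $99 Pro tier", 0.9, seen)  # relevant but identical
fresh = information_gain("Acme is hiring enterprise AEs", 0.5, seen)  # moderately relevant, new
print(first, repeat, fresh)  # 0.9 0.0 0.5
```

Note how the moderately relevant but new result (0.5) outranks the highly relevant repeat (0.0), which is exactly the ordering the paragraph above describes.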
**Self-healing under adversarial conditions.** The web fights back. Sites change layouts. Crawlers get blocked. Search results drift. Zipf detects when patrols degrade — zero-result searches, blocked crawls, declining signal quality — and automatically adapts: rewrites queries, adjusts crawl strategies, tries alternate entry points. The monitoring intent stays constant while the tactics evolve.
A typical auto-healing spec edit works like this: the system detects a blocked URL, finds an alternate path, and adds an intent for more targeted extraction.
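A sketch of what such an edit might look like, assuming a hypothetical dict-based spec format (not Zipf's actual schema):

```python
# Hypothetical auto-healing spec edit. The step layout is an
# illustrative assumption, not Zipf's real workflow format.

def heal_blocked_step(step: dict, alternate_url: str) -> dict:
    """Return an edited copy of a crawl step after a block is detected."""
    healed = dict(step)
    healed["url"] = alternate_url                    # alternate entry point
    healed["extract"] = list(step["extract"]) + [
        "pricing_table",                             # narrower extraction intent
    ]
    healed["healed_from"] = step["url"]              # keep an audit trail
    return healed

blocked = {"type": "crawl", "url": "https://example.com/pricing", "extract": []}
patched = heal_blocked_step(blocked, "https://example.com/plans")
print(patched["url"], patched["extract"])
```

The original step is left untouched; the healer emits an edited copy, so the monitoring intent persists while the tactics change underneath it.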
The result: signals delivered to Slack, email, webhooks, or CRM — structured, scored, and cited. Not a dashboard you have to check. Not an inbox of raw alerts. Actionable signals that arrive where you already work, only when the change crosses a threshold you control.
## How It Works in Practice
A competitive intelligence team describes their monitoring intent:
"Track pricing and packaging changes across Pinecone, Weaviate, Qdrant, and ChromaDB. Flag new tier introductions, pricing restructuring, and feature additions to existing tiers."
Zipf's planning engine generates a workflow that:
- Searches for recent pricing announcements, blog posts, and changelog entries across all four competitors
- Crawls each competitor's pricing page with extraction schemas for tier names, prices, and feature lists
- Compares extracted data against the prior execution's baseline
- Scores each change by information gain — a minor copy edit scores low, a new enterprise tier scores high
- Delivers a structured signal to Slack with what changed, at which competitor, and what it means relative to your positioning
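The loop above can be sketched end to end in a few lines. Everything below — the function names, the scoring weights, the canned pricing numbers — is an illustrative placeholder, not Zipf's API or real competitor data:

```python
# Runnable sketch of one patrol execution, with canned data standing in
# for real crawls. All names, weights, and prices are made up.

def diff_pricing(previous: dict, current: dict) -> list:
    """Field-level diff of {tier: price} maps."""
    previous = previous or {}
    changes = []
    for tier, price in current.items():
        if tier not in previous:
            changes.append({"kind": "new_tier", "tier": tier, "price": price})
        elif previous[tier] != price:
            changes.append({"kind": "price_change", "tier": tier,
                            "old": previous[tier], "new": price})
    return changes

def score_information_gain(change: dict) -> float:
    """A new tier is a bigger signal than a tweak to an existing one."""
    return 0.9 if change["kind"] == "new_tier" else 0.6

def run_patrol(crawled: dict, baseline: dict, threshold: float = 0.5) -> list:
    """One execution: diff each competitor against baseline, emit signals."""
    signals = []
    for name, current in crawled.items():
        for change in diff_pricing(baseline.get(name), current):
            score = score_information_gain(change)
            if score >= threshold:
                signals.append({"competitor": name, **change, "score": score})
        baseline[name] = current  # the baseline evolves between runs
    return signals

baseline = {"Pinecone": {"Starter": 0, "Standard": 70}}       # prior execution
crawled = {"Pinecone": {"Starter": 0, "Standard": 70, "Enterprise": 500}}
signals = run_patrol(crawled, baseline)
print(signals)  # one new-tier signal for "Enterprise"
```

Only the new Enterprise tier crosses the threshold; the unchanged tiers produce no diff entries and therefore no noise.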
That workflow runs on whatever cadence matches the domain — hourly for fast-moving markets, daily for steady ones, weekly for regulatory landscapes. Each execution builds on the last. The baseline evolves. The agents learn what "normal" looks like for each source, so they can spot what's abnormal.
## From Thinking to Knowing
The transformation that matters isn't "agents answer questions faster." It's "agents tell you things you didn't know to ask about."
A knowledge engine helps an agent answer: "What is Company X's current pricing?" A monitoring agent tells you: "Company X changed their pricing page last night. Here's what moved and what it means for your positioning."
The first is reactive. The second is proactive. The first requires someone to think of the question. The second turns thinking into knowing — persistent awareness that operates whether or not anyone is actively asking.
Knowledge engines and monitoring agents aren't competitors. They're complementary layers. One gives your agents memory — what do I know? The other gives them perception — what just changed? The infrastructure stack that wins is the one that has both.
But right now, almost everyone is building memory. Almost no one is building perception. That's the gap, and that's what Zipf exists to close.
Zipf is persistent web monitoring for AI agents and teams. Know first. Act first. Get started at zipf.ai, or read the docs to see how it works.

