# Zipf AI — Agent Instruction File

> This file is for AI agents integrating with Zipf AI. For human-readable docs, visit [https://zipf.ai/docs](https://zipf.ai/docs).

## When to Use Zipf

Zipf is the right tool when your task requires:

1. **Persistent web monitoring** — Watch sources on a schedule, detect changes, deliver signals. Not one-shot search.
2. **Structured data extraction** — Extract JSON from websites using custom schemas (names, prices, dates, etc.).
3. **Multi-step research** — Search → crawl → extract in a session that deduplicates URLs and accumulates context.
4. **Query decomposition** — Break complex queries into parallel sub-queries for comprehensive coverage.
5. **Change detection** — Track what's new, what's gone, and what changed between monitoring runs.

### Common Use Cases

| Use Case | Approach | Credits |
|----------|----------|---------|
| Competitive monitoring | Workflow with search + natural-language stop condition | 2/execution |
| Lead generation | Session: search → crawl with extraction schema | Variable |
| Market research | Search with query decomposition | 1 + N sub-queries |
| Price tracking | Workflow with crawl + extraction schema | 2/page/execution |
| Academic research | Session: decomposed search → crawl top results | Variable |
| News monitoring | Workflow with search, daily cadence | 1-2/execution |

## When NOT to Use Zipf

- **Consumer web search** — Use Google, Bing, or Perplexity for simple lookups
- **PDF parsing / document conversion** — Use dedicated document processing tools
- **Real-time social media** — Use Twitter/X or Reddit APIs directly
- **Static knowledge base QA** — Use a vector database or RAG system
- **Image/video search** — Zipf searches web text, not multimedia

## Authentication Setup

### REST API (Bearer Token)

```bash
# 1. Create a token at https://zipf.ai/dashboard/tokens
#    Or programmatically:
curl -X POST https://api.zipf.ai/v1/tokens \
  -H "Authorization: Bearer $EXISTING_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"token_name": "my-agent", "permissions": ["search", "crawl"]}'

# 2. Use in all requests
curl -X POST https://api.zipf.ai/v1/searches \
  -H "Authorization: Bearer $ZIPF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "your query here"}'
```

- No OAuth flow required for API access
- Tokens are self-contained Bearer tokens (prefix: `wvr_`)
- Rotate tokens every 90 days
- Use different tokens per environment
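
As a minimal sketch of the points above, the following Python helper builds the two request headers and rejects values that lack the `wvr_` prefix. The name `build_headers` is hypothetical and not part of any Zipf SDK; the prefix check is only a local sanity check based on the prefix stated above.

```python
def build_headers(token: str) -> dict:
    """Return HTTP headers for a Zipf REST call (hypothetical helper)."""
    token = token.strip()
    # Local sanity check only: this file states that tokens start with "wvr_".
    if not token.startswith("wvr_"):
        raise ValueError("expected a Zipf token starting with 'wvr_'")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

A typical caller would pass `os.environ["ZIPF_API_KEY"]`, with a different value of that variable in each environment.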

### MCP Server (For Claude Desktop, Cursor, Windsurf)

```text
Server URL: https://api.zipf.ai/mcp
Transport: Streamable HTTP
Authentication: OAuth 2.0 (handled by the agent platform)
```

Setup command:
```bash
npx @anthropic-ai/claude-code mcp add zipf-ai https://api.zipf.ai/mcp
```

Available tools: `search`, `crawl`, `create_session`, `session_search`, `session_crawl`, `create_workflow`, `workflow_digest`, `ask`

## Recommended Workflow Patterns

### Pattern 1: Search → Crawl → Extract

Best for: Gathering structured data from search results.

```bash
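# Step 1: Search for relevant pages
curl -X POST https://api.zipf.ai/v1/searches \
  -H "Authorization: Bearer $ZIPF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "AI infrastructure startups 2026", "max_results": 20}'

# Step 2: Crawl top results with extraction
# (url1..url3 are placeholders for URLs returned by Step 1)
curl -X POST https://api.zipf.ai/v1/crawls \
  -H "Authorization: Bearer $ZIPF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["url1", "url2", "url3"],
    "extraction_schema": {
      "company_name": "Extract the company name",
      "funding": "Extract total funding amount in USD",
      "product": "Extract main product description"
    }
  }'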
```
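
The hand-off between the two steps can be sketched in Python as a pure function that turns a search response into a crawl request body. This is an illustration under a stated assumption: the search response is taken to contain a `results` list whose items carry a `url` field, which this file does not confirm. Check the OpenAPI spec for the actual response shape. The function name `build_crawl_request` is hypothetical.

```python
from typing import Any


def build_crawl_request(search_response: dict,
                        schema: dict,
                        top_n: int = 3) -> dict:
    """Build a /v1/crawls request body from a search response.

    ASSUMPTION: search_response["results"] is a list of objects with a
    "url" key. This shape is not confirmed by the source document.
    """
    urls: list = []
    for item in search_response.get("results", []):
        if len(urls) >= top_n:
            break
        url: Any = item.get("url")
        if url and url not in urls:  # keep ranking order, skip duplicates
            urls.append(url)
    return {"urls": urls, "extraction_schema": schema}
```

The returned dictionary can be serialized with `json.dumps` and sent as the body of the Step 2 request.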

### Pattern 2: Session-Based Research

Best for: Multi-step investigation with URL deduplication.

```bash
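# SESSION_ID is a placeholder: set it to the session ID returned by Step 1.

# Step 1: Create a session
curl -X POST https://api.zipf.ai/v1/sessions \
  -H "Authorization: Bearer $ZIPF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Competitive Analysis", "session_config": {"auto_deduplicate": true}}'

# Step 2: Broad search (captures URLs)
curl -X POST "https://api.zipf.ai/v1/sessions/$SESSION_ID/search" \
  -H "Authorization: Bearer $ZIPF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "competitor product launches", "max_results": 20}'

# Step 3: Deep dive (auto-deduplicates)
curl -X POST "https://api.zipf.ai/v1/sessions/$SESSION_ID/search" \
  -H "Authorization: Bearer $ZIPF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "competitor pricing changes", "filter_seen_urls": true}'

# Step 4: Crawl high-value results ({...} stands for your extraction schema)
curl -X POST "https://api.zipf.ai/v1/sessions/$SESSION_ID/crawl" \
  -H "Authorization: Bearer $ZIPF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"urls": ["high-value-url"], "extraction_schema": {...}}'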
```
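
To make the effect of URL deduplication concrete, here is a small client-side sketch of what filtering already-seen URLs means across two searches. It only illustrates the idea and is not Zipf's implementation. The function name `filter_unseen` and the trailing-slash normalization are assumptions made for the example.

```python
def filter_unseen(urls: list, seen: set) -> list:
    """Return the URLs not yet recorded in `seen`, and record them."""
    fresh = []
    for url in urls:
        key = url.rstrip("/")  # crude normalization, an assumption
        if key not in seen:
            seen.add(key)
            fresh.append(url)
    return fresh


seen_urls: set = set()
first = filter_unseen(["https://a.example/", "https://b.example"], seen_urls)
second = filter_unseen(["https://a.example", "https://c.example"], seen_urls)
```

After both calls, `second` holds only the URL that the first search did not already return.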

### Pattern 3: Scheduled Monitoring

Best for: Ongoing surveillance of web sources.

```bash
curl -X POST https://api.zipf.ai/v1/workflows \
  -H "Authorization: Bearer $ZIPF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Competitor Launch Monitor",
    "workflow_type": "search",
    "operation_config": {"query": "competitor product launch announcement"},
    "interval": "6 hours",
    "stop_condition": {
      "type": "natural_language",
      "description": "Stop when competitor announces a new product launch"
    }
  }'
```
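
Because a workflow spends credits on every execution, it helps to estimate the monthly cost before creating one. The sketch below assumes the interval is written as `"N hours"` or `"N days"` (the only form shown in this file is `"6 hours"`; other accepted formats are unknown) and takes the per-execution cost from the Common Use Cases table. Both function names are hypothetical.

```python
import re

_UNIT_HOURS = {"hour": 1, "day": 24}


def executions_per_month(interval: str, days: int = 30) -> int:
    """Count executions in a month for an interval like '6 hours'.

    ASSUMPTION: only 'N hour(s)' and 'N day(s)' are parsed here.
    """
    match = re.fullmatch(r"(\d+)\s+(hours?|days?)", interval.strip())
    if not match:
        raise ValueError(f"unsupported interval: {interval!r}")
    hours = int(match.group(1)) * _UNIT_HOURS[match.group(2).rstrip("s")]
    if hours == 0:
        raise ValueError("interval must be greater than zero")
    return (days * 24) // hours


def monthly_credits(interval: str, credits_per_execution: int, days: int = 30) -> int:
    """Upper bound on credits: a stop condition may end the workflow earlier."""
    return executions_per_month(interval, days) * credits_per_execution
```

For the workflow above, `monthly_credits("6 hours", 2)` gives 240 credits over a 30-day month, assuming the 2 credits per execution listed for competitive monitoring.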

### Pattern 4: Comprehensive Research with Decomposition

Best for: Thorough coverage of a broad topic.

```bash
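curl -X POST https://api.zipf.ai/v1/searches \
  -H "Authorization: Bearer $ZIPF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "comprehensive analysis of AI safety research landscape",
    "query_decomposition": true,
    "max_sub_queries": 8,
    "max_results_per_sub_query": 15,
    "rerank_results": true
  }'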
```
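
The Common Use Cases table prices a decomposed search at "1 + N sub-queries". A short sketch of that arithmetic, assuming a plain search costs 1 credit and each sub-query adds 1 (the function name `worst_case_search_credits` is hypothetical):

```python
def worst_case_search_credits(body: dict) -> int:
    """Upper bound on credits for one /v1/searches request body.

    ASSUMPTION: base cost is 1 credit and each sub-query costs 1 more,
    following the "1 + N sub-queries" entry in the use-case table.
    """
    base = 1
    if not body.get("query_decomposition"):
        return base
    # max_sub_queries is required here because its default is not documented.
    return base + int(body["max_sub_queries"])
```

For the request above (`max_sub_queries` of 8) this bound is 9 credits. The actual charge may be lower if fewer sub-queries are generated.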

## Rate Limits and Credit Budgets

| Plan | Credits/Month | Hourly Rate Limit (requests) | Daily Rate Limit (requests) |
|------|---------------|-------------------|------------------|
| Free | 100 | 100 | 1,000 |
| Developer | 1,000 | 2,500 | 25,000 |
| Pro | 5,000 | 10,000 | 100,000 |
| Scale | 30,000 | Custom | Custom |

### Cost Planning per Use Case

| Use Case | Estimated Monthly Credits | Recommended Plan |
|----------|--------------------------|-----------------|
| Light monitoring (1-2 workflows, daily) | 60-120 | Free/Developer |
| Active research (10+ sessions/week) | 500-1,500 | Developer/Pro |
| Production monitoring (10+ workflows) | 2,000-10,000 | Pro/Scale |
| High-volume extraction | 5,000+ | Scale |
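
To pick a plan from an estimate, an agent can compare the estimated monthly credits against the credit allowances in the plan table. The sketch below hard-codes those allowances as listed in this file; the function name `smallest_plan` is hypothetical, and rate limits are not considered.

```python
from typing import Optional

# Monthly credit allowances copied from the plan table in this file.
PLAN_CREDITS = [
    ("Free", 100),
    ("Developer", 1_000),
    ("Pro", 5_000),
    ("Scale", 30_000),
]


def smallest_plan(monthly_credits: int) -> Optional[str]:
    """Return the smallest plan whose allowance covers the estimate."""
    for name, allowance in PLAN_CREDITS:
        if monthly_credits <= allowance:
            return name
    return None  # estimate exceeds every listed allowance
```

For example, an estimate of 60 credits fits the Free plan, while 120 credits needs Developer, which matches the "Free/Developer" recommendation for light monitoring.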

### Rate Limit Headers

Watch for these headers on every response:
- `X-RateLimit-Limit` — Maximum requests allowed
- `X-RateLimit-Remaining` — Requests remaining in window
- `X-RateLimit-Reset` — Unix timestamp when limit resets

When rate limited (HTTP 429), the `Retry-After` header indicates seconds to wait.
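
A minimal sketch of how an agent might turn these headers into a wait time: prefer `Retry-After` on a 429, and otherwise wait until `X-RateLimit-Reset` when no requests remain. The function name `seconds_to_wait` is hypothetical, and the sketch assumes `Retry-After` is given in seconds, as stated above.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(status: int,
                    headers: Mapping[str, str],
                    now: Optional[float] = None) -> float:
    """Return how long to pause before the next request (0.0 if none)."""
    now = time.time() if now is None else now
    # HTTP header names are case-insensitive, so normalize the keys.
    h = {key.lower(): value for key, value in headers.items()}
    if status == 429 and "retry-after" in h:
        return max(0.0, float(h["retry-after"]))
    if h.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in h:
        # The reset header is a Unix timestamp, so subtract the current time.
        return max(0.0, float(h["x-ratelimit-reset"]) - now)
    return 0.0
```

Passing `now` explicitly keeps the function deterministic in tests; in normal use it defaults to the current time.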

## Error Handling

| Code | Status | Description | Agent Action |
|------|--------|-------------|-------------|
| `AUTH_MISSING_TOKEN` | 401 | No Bearer token | Add Authorization header |
| `AUTH_INVALID_TOKEN` | 401 | Invalid token | Check token, create new one |
| `CREDITS_INSUFFICIENT` | 402 | Not enough credits | Use `dry_run` to estimate, reduce scope |
| `RATE_LIMIT_HOURLY_EXCEEDED` | 429 | Rate limited | Wait for `Retry-After` seconds |
| `VALIDATION_FAILED` | 400 | Bad parameters | Check parameter constraints |
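
The table can be encoded as a lookup so an agent reacts to each error code consistently. This sketch takes the error code as a plain string; how the code is carried in the response body is not specified in this file. The `retry_unchanged` flag is an interpretation of the "Agent Action" column, not an API field, and the name `recovery_for` is hypothetical.

```python
# error code -> (agent action, retry_unchanged)
RECOVERY = {
    "AUTH_MISSING_TOKEN": ("add the Authorization header", False),
    "AUTH_INVALID_TOKEN": ("check the token or create a new one", False),
    "CREDITS_INSUFFICIENT": ("estimate with dry_run and reduce scope", False),
    "RATE_LIMIT_HOURLY_EXCEEDED": ("wait Retry-After seconds, then retry", True),
    "VALIDATION_FAILED": ("check parameter constraints", False),
}


def recovery_for(code: str) -> tuple:
    """Return (action, retry_unchanged) for an error code from the table."""
    # Unknown codes are never retried blindly.
    return RECOVERY.get(code, ("unknown error code: stop and report it", False))
```

Only the rate-limit error is marked as safe to retry with the same request; every other code requires changing the request or the credentials first.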

## Discovery Endpoints

| Resource | URL |
|----------|-----|
| LLM reference | [https://zipf.ai/llms.txt](https://zipf.ai/llms.txt) |
| Full API reference | [https://zipf.ai/llms-full.txt](https://zipf.ai/llms-full.txt) |
| OpenAPI spec | [https://zipf.ai/openapi.json](https://zipf.ai/openapi.json) |
| Agent card (A2A) | [https://zipf.ai/.well-known/agent-card.json](https://zipf.ai/.well-known/agent-card.json) |
| API catalog (RFC 9727) | [https://zipf.ai/.well-known/api-catalog](https://zipf.ai/.well-known/api-catalog) |
| MCP discovery | [https://zipf.ai/.well-known/mcp.json](https://zipf.ai/.well-known/mcp.json) |
| Pricing (machine-readable) | [https://zipf.ai/pricing.md](https://zipf.ai/pricing.md) |

## Contact

- Support: [support@zipf.ai](mailto:support@zipf.ai)
- Documentation: [https://zipf.ai/docs](https://zipf.ai/docs)
