Model Context Protocol: giving AI agents real-time web access
MCP has become the standard interface for connecting LLMs to external tools. Here's how it works and why it matters.
What is MCP?
The Model Context Protocol (MCP) is an open standard introduced by Anthropic in November 2024 that defines how AI models interact with external tools and data sources. Built on JSON-RPC 2.0, it gives any compatible AI agent a single, uniform interface to call external systems — whether that's a database, an API, or a web scraper.
Think of it as a universal adapter for LLMs. Instead of building custom integrations for every model and every tool, you implement MCP once and any compatible client can connect.
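Because MCP is built on JSON-RPC 2.0, every tool invocation is just a standard JSON-RPC request with the MCP method name `tools/call`. A minimal sketch of that envelope (the tool name and arguments here are illustrative, borrowed from the example later in this post):

```python
import json

def jsonrpc_request(method: str, params: dict, req_id: int) -> str:
    """Build a JSON-RPC 2.0 request envelope, the wire format MCP rides on."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req_id,
        "method": method,
        "params": params,
    })

# An MCP client invokes a tool with the standard "tools/call" method.
msg = jsonrpc_request("tools/call", {
    "name": "tentacrawl_extract",  # hypothetical tool name used in this post
    "arguments": {"url": "https://example.com", "format": "markdown"},
}, req_id=1)
print(msg)
```

Any client that can produce this envelope, over any supported transport, can talk to any MCP server; that uniformity is the whole point.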
In December 2025, Anthropic donated MCP to the Agentic AI Foundation (a Linux Foundation project co-founded with OpenAI and Block), cementing its status as a vendor-neutral industry standard. Today it has over 97 million monthly SDK downloads, 10,000+ active servers, and first-class support in Claude, ChatGPT, Cursor, Gemini, VS Code, and Microsoft Copilot.
Why agents need live web access
AI agents are only as useful as the information they can act on. Training data goes stale — product prices change, news breaks, documentation gets updated. Without a way to reach the live web, agents are limited to what they already know.
Practical use cases that require real-time data:
- Researching competitors and market trends
- Monitoring websites for changes and triggering workflows
- Gathering structured data for analysis pipelines
- Verifying facts against live sources before acting
MCP solves the plumbing problem: it gives agents a standardized way to request this data without the agent needing to know anything about HTTP clients, browser automation, or HTML parsing.
What's new in the November 2025 spec
The latest MCP specification (2025-11-25) shipped several major additions:
Async tasks — Servers can now create long-running tasks, return a handle immediately, and stream progress updates until the result is ready. This is key for workloads like page rendering, document indexing, or crawling multi-page sites.
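The create-then-poll pattern behind async tasks can be sketched with a toy in-memory server. This is a stand-in to show the flow (handle returned immediately, progress polled until completion), not the actual MCP task API:

```python
import itertools

class TaskServer:
    """Toy stand-in for an MCP server that runs long tasks asynchronously."""

    def __init__(self):
        self._tasks = {}
        self._ids = itertools.count(1)

    def create_task(self, pages_to_crawl: int) -> str:
        # The handle comes back immediately; work happens "in the background".
        task_id = f"task-{next(self._ids)}"
        self._tasks[task_id] = {"remaining": pages_to_crawl}
        return task_id

    def poll(self, task_id: str) -> dict:
        task = self._tasks[task_id]
        if task["remaining"] > 0:
            task["remaining"] -= 1  # pretend one more page finished
            return {"status": "working", "remaining": task["remaining"]}
        return {"status": "completed", "result": "crawl finished"}

server = TaskServer()
handle = server.create_task(pages_to_crawl=3)
while (update := server.poll(handle))["status"] != "completed":
    print("progress:", update)
print("done:", update["result"])
```

The agent stays responsive between polls instead of blocking on a multi-minute crawl.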
Streamable HTTP transport — Replaces server-sent events with a bi-directional HTTP transport that supports chunked transfer encoding and progressive delivery over a single connection. More scalable and easier to deploy behind standard infrastructure.
Tool output schemas — Servers can now declare the shape of what a tool returns. Clients and models know the output structure ahead of time, which enables better reasoning and reduces hallucinations about response format.
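What a declared output schema buys the client is the ability to check a result's shape before handing it to the model. The schema and validator below are a minimal illustration (the schema is invented, not Tentacrawl's actual one, and real clients would use a full JSON Schema validator):

```python
# Illustrative JSON-Schema-style declaration of a tool's output shape.
output_schema = {
    "type": "object",
    "required": ["url", "content"],
    "properties": {
        "url": {"type": "string"},
        "content": {"type": "string"},
        "fetched_at": {"type": "string"},
    },
}

def conforms(result: dict, schema: dict) -> bool:
    """Minimal structural check: required keys present, string fields are strings."""
    if schema["type"] == "object" and not isinstance(result, dict):
        return False
    for key in schema.get("required", []):
        if key not in result:
            return False
    for key, spec in schema.get("properties", {}).items():
        if key in result and spec["type"] == "string" and not isinstance(result[key], str):
            return False
    return True

print(conforms({"url": "https://example.com", "content": "# Title"}, output_schema))  # True
```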
OAuth 2.1 authorization — A formal auth framework based on OAuth 2.1 with Protected Resource Metadata discovery and OpenID Connect support. Tokens are tightly scoped per server, reducing over-privileged access.
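Protected Resource Metadata discovery (RFC 9728) tells a client where to find a server's authorization details: a well-known path inserted between the origin and the resource path. A small sketch of building that discovery URL (the server URL is hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit

def protected_resource_metadata_url(server_url: str) -> str:
    """Build the RFC 9728 well-known URL for a protected resource."""
    parts = urlsplit(server_url)
    path = "/.well-known/oauth-protected-resource"
    # RFC 9728 inserts the well-known segment before any resource path.
    if parts.path and parts.path != "/":
        path += parts.path
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))

print(protected_resource_metadata_url("https://mcp.example.com/crawl"))
# https://mcp.example.com/.well-known/oauth-protected-resource/crawl
```

The metadata fetched from that URL points the client at the right authorization server, so tokens can be scoped to exactly one MCP server.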
How Tentacrawl implements MCP
Tentacrawl exposes a standard MCP server that any compatible agent framework can connect to. The agent describes what it needs; Tentacrawl handles browser orchestration, JavaScript rendering, and content extraction.
```json
{
  "tool": "tentacrawl_extract",
  "parameters": {
    "url": "https://news.ycombinator.com",
    "format": "markdown",
    "selector": ".titleline"
  }
}
```
The response comes back as clean, structured content the agent can reason about immediately — no HTML noise, no pagination logic to write.
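In MCP, a tool-call response carries its payload as a list of typed content parts. A sketch of pulling the text out of one (the markdown payload here is invented for illustration):

```python
import json

# Shape of an MCP tool-call result: a "content" list of typed parts.
raw = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [
            {"type": "text", "text": "# Hacker News\n- Show HN: ..."}
        ],
        "isError": False,
    },
})

def text_parts(response: str) -> list[str]:
    """Collect the text content from a tool-call response."""
    result = json.loads(response)["result"]
    return [p["text"] for p in result["content"] if p["type"] == "text"]

print(text_parts(raw)[0].splitlines()[0])  # prints: # Hacker News
```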
Request lifecycle
1. Validation — parameters are checked and the job is queued
2. Browser orchestration — a headless Chromium instance navigates to the URL
3. Rendering — JavaScript executes, dynamic content loads
4. Extraction — the relevant content is identified and isolated
5. Formatting — raw HTML is converted to Markdown or JSON
6. Response — clean data flows back through the MCP channel
Most pages complete the full round-trip in under 3 seconds.
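The lifecycle above can be sketched as a pipeline of stage functions. Everything here is a stand-in (canned HTML instead of a real browser, naive string slicing instead of a selector engine) to show how the stages compose:

```python
def validate(params: dict) -> dict:
    # Stage 1: reject obviously bad input before queueing.
    assert params["url"].startswith("http"), "only http(s) URLs accepted"
    return params

def render(params: dict) -> str:
    # Stages 2-3: stand-in for browser orchestration + JS rendering.
    return "<html><body><h1>Example Domain</h1></body></html>"

def extract(html: str) -> str:
    # Stage 4: real extraction would use a selector engine, not slicing.
    start = html.index("<h1>") + len("<h1>")
    return html[start:html.index("</h1>")]

def to_markdown(text: str) -> str:
    # Stage 5: convert the isolated content to Markdown.
    return f"# {text}"

def run(params: dict) -> str:
    # Stage 6: the formatted result is what flows back over MCP.
    return to_markdown(extract(render(validate(params))))

print(run({"url": "https://example.com"}))  # prints: # Example Domain
```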
Setting it up
Connecting Tentacrawl to an MCP-compatible agent takes one configuration block:
```yaml
# mcp-config.yaml
servers:
  tentacrawl:
    endpoint: http://localhost:8080
    tools:
      - tentacrawl_extract
      - tentacrawl_search
      - tentacrawl_monitor
```
Point your agent framework at this config and the tools are immediately available.
What this unlocks
With MCP + Tentacrawl, agents can:
- Browse any public website and get back structured data
- Monitor pages for changes and trigger downstream workflows
- Search across multiple sites and synthesize results
- Verify claims against live sources before taking action
All without writing scraping code, managing browser infrastructure, or dealing with anti-bot measures. The agent focuses on reasoning; Tentacrawl handles the web.