Model Context Protocol: giving AI agents real-time web access
MCP has become the standard interface for connecting LLMs to external tools. Here's how it works and why it matters.
What is MCP?
The Model Context Protocol (MCP) is an open standard introduced by Anthropic in November 2024 that defines how AI models interact with external tools and data sources. Built on JSON-RPC 2.0, it gives any compatible AI agent a single, uniform interface to call external systems — whether that's a database, an API, or a web scraper.
Think of it as a universal adapter for LLMs. Instead of building custom integrations for every model and every tool, you implement MCP once and any compatible client can connect.
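Because MCP is built on JSON-RPC 2.0, every tool invocation is just a standard JSON-RPC request with the MCP method name `tools/call`. A minimal sketch of that envelope (the tool name and arguments here are illustrative, borrowed from the example later in this post):

```python
import json

def jsonrpc_request(method: str, params: dict, req_id: int) -> str:
    """Build a JSON-RPC 2.0 request envelope, the wire format MCP rides on."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req_id,
        "method": method,
        "params": params,
    })

# An MCP client invokes a tool with the standard "tools/call" method.
msg = jsonrpc_request("tools/call", {
    "name": "tentacrawl_extract",  # hypothetical tool name used in this post
    "arguments": {"url": "https://example.com", "format": "markdown"},
}, req_id=1)
print(msg)
```

Any client that can produce this envelope, over any supported transport, can talk to any MCP server; that uniformity is the whole point.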
In December 2025, Anthropic donated MCP to the Agentic AI Foundation (a Linux Foundation project co-founded with OpenAI and Block), cementing its status as a vendor-neutral industry standard. Today it has over 97 million monthly SDK downloads, 10,000+ active servers, and first-class support in Claude, ChatGPT, Cursor, Gemini, VS Code, and Microsoft Copilot.
Why agents need live web access
AI agents are only as useful as the information they can act on. Training data goes stale — product prices change, news breaks, documentation gets updated. Without a way to reach the live web, agents are limited to what they already know.
Practical use cases that require real-time data:
- Researching competitors and market trends
- Monitoring websites for changes and triggering workflows
- Gathering structured data for analysis pipelines
- Verifying facts against live sources before acting
MCP solves the plumbing problem: it gives agents a standardized way to request this data without the agent needing to know anything about HTTP clients, browser automation, or HTML parsing.
What's new in the November 2025 spec
The latest MCP specification (2025-11-25) shipped several major additions:
Async tasks — Servers can now create long-running tasks, return a handle immediately, and stream progress updates until the result is ready. This is key for workloads like page rendering, document indexing, or crawling multi-page sites.
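The create-then-poll pattern behind async tasks can be sketched with a toy in-memory server. This is a stand-in to show the flow (handle returned immediately, progress polled until completion), not the actual MCP task API:

```python
import itertools

class TaskServer:
    """Toy stand-in for an MCP server that runs long tasks asynchronously."""

    def __init__(self):
        self._tasks = {}
        self._ids = itertools.count(1)

    def create_task(self, pages_to_crawl: int) -> str:
        # The handle comes back immediately; work happens "in the background".
        task_id = f"task-{next(self._ids)}"
        self._tasks[task_id] = {"remaining": pages_to_crawl}
        return task_id

    def poll(self, task_id: str) -> dict:
        task = self._tasks[task_id]
        if task["remaining"] > 0:
            task["remaining"] -= 1  # pretend one more page finished
            return {"status": "working", "remaining": task["remaining"]}
        return {"status": "completed", "result": "crawl finished"}

server = TaskServer()
handle = server.create_task(pages_to_crawl=3)
while (update := server.poll(handle))["status"] != "completed":
    print("progress:", update)
print("done:", update["result"])
```

The agent stays responsive between polls instead of blocking on a multi-minute crawl.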
Streamable HTTP transport — Replaces server-sent events with a bi-directional HTTP transport that supports chunked transfer encoding and progressive delivery over a single connection. More scalable and easier to deploy behind standard infrastructure.
Tool output schemas — Servers can now declare the shape of what a tool returns. Clients and models know the output structure ahead of time, which enables better reasoning and reduces hallucinations about response format.
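What a declared output schema buys the client is the ability to check a result's shape before handing it to the model. The schema and validator below are a minimal illustration (the schema is invented, not Tentacrawl's actual one, and real clients would use a full JSON Schema validator):

```python
# Illustrative JSON-Schema-style declaration of a tool's output shape.
output_schema = {
    "type": "object",
    "required": ["url", "content"],
    "properties": {
        "url": {"type": "string"},
        "content": {"type": "string"},
        "fetched_at": {"type": "string"},
    },
}

def conforms(result: dict, schema: dict) -> bool:
    """Minimal structural check: required keys present, string fields are strings."""
    if schema["type"] == "object" and not isinstance(result, dict):
        return False
    for key in schema.get("required", []):
        if key not in result:
            return False
    for key, spec in schema.get("properties", {}).items():
        if key in result and spec["type"] == "string" and not isinstance(result[key], str):
            return False
    return True

print(conforms({"url": "https://example.com", "content": "# Title"}, output_schema))  # True
```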
OAuth 2.1 authorization — A formal auth framework based on OAuth 2.1 with Protected Resource Metadata discovery and OpenID Connect support. Tokens are tightly scoped per server, reducing over-privileged access.
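Protected Resource Metadata discovery (RFC 9728) tells a client where to find a server's authorization details: a well-known path inserted between the origin and the resource path. A small sketch of building that discovery URL (the server URL is hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit

def protected_resource_metadata_url(server_url: str) -> str:
    """Build the RFC 9728 well-known URL for a protected resource."""
    parts = urlsplit(server_url)
    path = "/.well-known/oauth-protected-resource"
    # RFC 9728 inserts the well-known segment before any resource path.
    if parts.path and parts.path != "/":
        path += parts.path
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))

print(protected_resource_metadata_url("https://mcp.example.com/crawl"))
# https://mcp.example.com/.well-known/oauth-protected-resource/crawl
```

The metadata fetched from that URL points the client at the right authorization server, so tokens can be scoped to exactly one MCP server.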
How Tentacrawl implements MCP
Tentacrawl exposes a standard MCP server that any compatible agent framework can connect to. The agent describes what it needs; Tentacrawl handles browser orchestration, JavaScript rendering, and content extraction.
```json
{
  "tool": "tentacrawl_extract",
  "parameters": {
    "url": "https://news.ycombinator.com",
    "format": "markdown",
    "selector": ".titleline"
  }
}
```
The response comes back as clean, structured content the agent can reason about immediately — no HTML noise, no pagination logic to write.
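In MCP, a tool-call response carries its payload as a list of typed content parts. A sketch of pulling the text out of one (the markdown payload here is invented for illustration):

```python
import json

# Shape of an MCP tool-call result: a "content" list of typed parts.
raw = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [
            {"type": "text", "text": "# Hacker News\n- Show HN: ..."}
        ],
        "isError": False,
    },
})

def text_parts(response: str) -> list[str]:
    """Collect the text content from a tool-call response."""
    result = json.loads(response)["result"]
    return [p["text"] for p in result["content"] if p["type"] == "text"]

print(text_parts(raw)[0].splitlines()[0])  # prints: # Hacker News
```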
Request lifecycle
1. Validation — parameters are checked and the job is queued
2. Browser orchestration — a headless Chromium instance navigates to the URL
3. Rendering — JavaScript executes, dynamic content loads
4. Extraction — the relevant content is identified and isolated
5. Formatting — raw HTML is converted to Markdown or JSON
6. Response — clean data flows back through the MCP channel
Most pages complete the full round-trip in under 3 seconds.
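The lifecycle above can be sketched as a pipeline of stage functions. Everything here is a stand-in (canned HTML instead of a real browser, naive string slicing instead of a selector engine) to show how the stages compose:

```python
def validate(params: dict) -> dict:
    # Stage 1: reject obviously bad input before queueing.
    assert params["url"].startswith("http"), "only http(s) URLs accepted"
    return params

def render(params: dict) -> str:
    # Stages 2-3: stand-in for browser orchestration + JS rendering.
    return "<html><body><h1>Example Domain</h1></body></html>"

def extract(html: str) -> str:
    # Stage 4: real extraction would use a selector engine, not slicing.
    start = html.index("<h1>") + len("<h1>")
    return html[start:html.index("</h1>")]

def to_markdown(text: str) -> str:
    # Stage 5: convert the isolated content to Markdown.
    return f"# {text}"

def run(params: dict) -> str:
    # Stage 6: the formatted result is what flows back over MCP.
    return to_markdown(extract(render(validate(params))))

print(run({"url": "https://example.com"}))  # prints: # Example Domain
```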
Setting it up
Connecting Tentacrawl to an MCP-compatible agent takes one configuration block:
```yaml
# mcp-config.yaml
servers:
  tentacrawl:
    endpoint: http://localhost:8080
    tools:
      - tentacrawl_extract
      - tentacrawl_search
      - tentacrawl_monitor
```
Point your agent framework at this config and the tools are immediately available.
What this unlocks
With MCP + Tentacrawl, agents can:
- Browse any public website and get back structured data
- Monitor pages for changes and trigger downstream workflows
- Search across multiple sites and synthesize results
- Verify claims against live sources before taking action
All without writing scraping code, managing browser infrastructure, or dealing with anti-bot measures. The agent focuses on reasoning; Tentacrawl handles the web.