MCP tools

How LLMs invoke your plugin through the Plugin Gateway.

MCP (Model Context Protocol) is how LLMs in Adapstory discover and invoke capabilities your plugin exposes. The Plugin Gateway (BC-01) sits between the LLM and your code.

Anatomy of a tool call

LLM ──▶ Plugin Gateway (BC-01) ──▶ Your plugin (/mcp/<tool>)

           ├─ auth: service-account JWT scoped to (tenant, plugin)
           ├─ guardrails: content filter, prompt-injection check
           ├─ budget:     token + per-call cost check
           └─ telemetry:  OpenTelemetry span starts here

Your plugin sees a normal HTTP request (sketched below) with:

  • Authorization: Bearer <jwt> — the calling user's identity
  • X-Tenant-Id: <tenant> — the tenant context
  • X-Correlation-Id: <uuid> — joins the trace
  • X-Llm-Session: <id> — the conversation this call is part of
  • JSON body matching your schema
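
For reference, here is a minimal sketch of that request as a raw handler would see it, assuming a FastAPI app. The route path and manual wiring are illustrative; the SDK normally parses these headers and validates the body for you:

from fastapi import FastAPI, Header, Request

app = FastAPI()

@app.post("/mcp/hello_learner")
async def hello_learner_raw(
    request: Request,
    authorization: str = Header(...),      # Bearer <jwt>
    x_tenant_id: str = Header(...),        # tenant context
    x_correlation_id: str = Header(...),   # joins the trace
    x_llm_session: str = Header(...),      # conversation id
):
    args = await request.json()  # JSON body matching your declared schema
    ...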

Writing the handler

Keep handlers small. The pattern is:

  1. Validate args against your JSON-Schema (the SDK does this automatically).
  2. Call core APIs or the LLM Gateway for the actual work.
  3. Return a JSON result matching your returnSchema (if declared).

# src/hello_learner/tools.py
from adapstory_plugin_sdk import mcp_tool, LlmGateway, Context

@mcp_tool(name="hello_learner")
async def hello_learner(
    learner_name: str,
    *,
    ctx: Context,
    gateway: LlmGateway,
) -> dict:
    ctx.logger.info("greeting", learner=learner_name)
    response = await gateway.chat(
        model=ctx.plugin.defaultModel,
        messages=[
            {"role": "system", "content": "Be brief and encouraging."},
            {"role": "user",   "content": f"Say hi to {learner_name}."},
        ],
    )
    return {"message": response.text}

Streaming

For tools that emit partial results (summaries, generations), set streaming: true in the manifest (mirrored by streaming=True on the decorator) and return Server-Sent Events:

@mcp_tool(name="summarize_course", streaming=True)
async def summarize_course(course_id: str, *, gateway: LlmGateway):
    async for chunk in gateway.stream(
        model="claude-sonnet-4-6",
        messages=[...],
    ):
        yield { "delta": chunk.text }

The Gateway forwards the SSE stream to the LLM client, buffering as needed.
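
On the wire, each yielded dict becomes one SSE data event, roughly as below; the exact framing and any envelope fields are Gateway-defined, so this is illustrative only:

data: {"delta": "The course covers"}

data: {"delta": " three modules"}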

Cancellation

The Gateway sends an X-Llm-Cancel: <correlation-id> header when the upstream LLM drops. Respect it:

@mcp_tool(name="summarize_course", streaming=True)
async def summarize_course(course_id: str, *, ctx: Context, gateway: LlmGateway):
    async for chunk in gateway.stream(...):
        if ctx.cancelled:  # set by the SDK when X-Llm-Cancel arrives
            break
        yield {"delta": chunk.text}

The SDK also propagates the cancellation to in-flight Gateway calls.

Budgets and back-pressure

If BC-01 returns 429 Too Many Requests with an X-Llm-Budget-Reset header, do not retry. Return a degraded result (cached, heuristic, or empty) and let the client decide.
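
A sketch of that fallback. The tool name, the exception, and its import are assumptions (check your SDK for the actual error raised on budget exhaustion); everything else mirrors the handler pattern above:

from adapstory_plugin_sdk import mcp_tool, LlmGateway, Context
from adapstory_plugin_sdk import BudgetExceededError  # hypothetical name

@mcp_tool(name="course_overview")
async def course_overview(course_id: str, *, ctx: Context, gateway: LlmGateway) -> dict:
    try:
        response = await gateway.chat(
            model=ctx.plugin.defaultModel,
            messages=[{"role": "user", "content": f"Summarize course {course_id}."}],
        )
        return {"summary": response.text}
    except BudgetExceededError:
        # Do not retry; hand back a degraded result and let the client decide.
        return {"summary": "", "degraded": True}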

Per-tool rate limits (declared in manifest.rateLimit) are enforced by the Gateway. You don't need to implement them.
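
For illustration, a manifest fragment declaring both streaming and a rate limit; apart from the streaming and rateLimit keys named above, the surrounding structure and field names are assumptions about the manifest schema:

"tools": {
  "summarize_course": {
    "streaming": true,
    "rateLimit": { "perMinute": 30 }
  }
}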

Error contracts

Return structured errors so the LLM can recover:

from adapstory_plugin_sdk import ToolError

raise ToolError(
    code="learner_not_found",
    message="No learner with that ID in this tenant.",
    retryable=False,
    suggestion="Check that the learner has been enrolled.",
)

The Gateway converts this to a JSON response the LLM can reason about.
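
That response is roughly of this shape; the envelope key is an assumption (only the inner fields mirror the ToolError above), and the exact contract is Gateway-defined:

{
  "error": {
    "code": "learner_not_found",
    "message": "No learner with that ID in this tenant.",
    "retryable": false,
    "suggestion": "Check that the learner has been enrolled."
  }
}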

Observability contract

Every @mcp_tool-decorated handler automatically:

  • Starts an OpenTelemetry span plugin.{name}.tool.{tool_name} with attributes tenant.id, user.id, llm.session.id.
  • Emits a log line at INFO on entry + exit.
  • Reports duration + outcome to Prometheus via the OTLP collector.

You don't have to instrument any of this. Do add ctx.logger.info(...) with business context though.

Next: Lifecycle.
