Skills vs Dynamic MCP Loadouts
Briefly

"When the agent encounters a tool definition through reinforcement learning or otherwise, it is encouraged to emit tool calls through special tokens when it encounters a situation where that tool call would be appropriate. For all intents and purposes, tool definitions can only appear between special tool definition tokens in a system prompt. Historically this means that you cannot emit tool definitions later in the conversation state. So your only real option is for a tool to be loaded when the conversation starts."
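The constraint described in the quote can be sketched as follows. This assumes an illustrative chat template in which tool definitions are serialized between sentinel tokens inside the system prompt; the token names and helper here are made up, not any real model's template:

```python
import json

# Hypothetical sentinel tokens; real chat templates differ per model.
TOOL_DEF_OPEN = "<|tool_defs|>"
TOOL_DEF_CLOSE = "<|/tool_defs|>"

def render_system_prompt(instructions, tools):
    """Serialize tool definitions into the system prompt.

    Because the model is trained to see definitions only inside this
    block, new tools cannot be introduced in later turns -- changing
    the set means changing the whole prefix.
    """
    block = "\n".join(json.dumps(t) for t in tools)
    return f"{instructions}\n{TOOL_DEF_OPEN}\n{block}\n{TOOL_DEF_CLOSE}"

tools = [{"name": "get_weather", "parameters": {"city": "string"}}]
prompt = render_system_prompt("You are a helpful agent.", tools)
```

The point of the sketch is structural: the tool block is part of the fixed conversation prefix, so the only place to add a tool is before the first turn.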
"In agentic uses, you can of course compress your conversation state or change the tool definitions in the system message at any point. But the consequence is that you will lose the reasoning traces and also the cache. In the case of Anthropic, for instance, this will make your conversation significantly more expensive. You would basically start from scratch and pay full token rates plus cache write cost, compared to cache read."
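The cost impact can be made concrete with a back-of-the-envelope sketch. The per-token rates below are placeholders, not any provider's actual pricing; the relevant shape is that cache reads are roughly an order of magnitude cheaper than fresh input tokens, while a cache write carries a surcharge over the plain input rate:

```python
# Placeholder per-token rates (illustrative, not real pricing).
RATE_CACHE_READ = 0.1    # reading a cached prefix
RATE_CACHE_WRITE = 1.25  # full input rate plus cache-write surcharge

def turn_cost(prefix_tokens, cache_hit):
    """Cost of re-sending a conversation prefix on one turn."""
    if cache_hit:
        return prefix_tokens * RATE_CACHE_READ
    # Changing the tool definitions invalidates the cached prefix:
    # the whole thing is re-billed at the cache-write rate.
    return prefix_tokens * RATE_CACHE_WRITE

prefix = 50_000  # tokens of system prompt + history
stable_cost = turn_cost(prefix, cache_hit=True)
after_tool_swap = turn_cost(prefix, cache_hit=False)
```

Under these placeholder rates a mid-conversation tool swap makes the next turn's prefix 12.5x more expensive, which is the "start from scratch" effect the quote describes.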
Tool definitions must appear between special tool-definition tokens in the system prompt, so new tool definitions cannot be emitted later in a conversation. Loading tools only at conversation start preserves reasoning traces and cache benefits; swapping tool definitions mid-conversation discards reasoning traces and forces cache rewrites, making the conversation significantly more expensive. Deferred tool loading delays injection of predeclared tools until later in the interaction, but the set of possible tools must still remain static for the entire conversation. Migrating MCPs to skills, or wiring MCP calls through code, are possible workarounds, but deferred loading does not enable truly dynamic tool sets and does not eliminate the core limitations and complexity of MCPs. Consequently, deferred loading reduces the initial token load but does not resolve the state and cost tradeoffs inherent to MCPs.
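Deferred loading can be sketched as a small wrapper: the full set of candidate tools is frozen when the conversation starts, and only a stub is exposed until the agent asks for a tool, at which point the predeclared definition is injected into context. All names here are illustrative, not a real MCP API:

```python
class DeferredToolset:
    """Predeclared tools, injected on demand.

    The candidate set is frozen at construction time: deferred
    loading changes *when* a definition enters the context, not
    *which* definitions are possible.
    """

    def __init__(self, candidates):
        self._candidates = dict(candidates)  # fixed at conversation start
        self.active = {}                     # definitions injected so far

    def list_available(self):
        """Cheap stub shown to the model instead of full definitions."""
        return sorted(self._candidates)

    def load(self, name):
        """Inject a predeclared definition into the active set."""
        if name not in self._candidates:
            # A truly dynamic tool cannot be added mid-conversation.
            raise KeyError(f"{name} was not declared at conversation start")
        self.active[name] = self._candidates[name]
        return self.active[name]

toolset = DeferredToolset({
    "search": {"description": "full-text search"},
    "fetch": {"description": "HTTP fetch"},
})
toolset.load("search")
```

The failure mode in `load` is the crux: a tool that was not in the candidate set at startup can never be introduced, which is why deferred loading saves initial tokens without making the toolset genuinely dynamic.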