Overview of an AI Agent System Based on A2A and MCP Architectures
This article explains how to build efficient AI agent systems by leveraging Google’s A2A (Agent2Agent) protocol and Anthropic’s MCP (Model Context Protocol). It discusses the differing data-precision requirements of humans, AI models, and software, and details the internal structure of AI agents and how they interact with external systems such as MCP servers.
1. The Difference Between Imprecise and Precise Data
Imprecise Data
- Exists in forms such as natural language, images, audio, etc. (fuzzy information)
- Humans and some AI models can understand it relatively well, but outputs derived from it may be ambiguous
- Suitable for presentation to humans or for further processing
Precise Data
- Exists in forms such as scalars, JSON, CSV, etc.
- Software can parse such data strictly according to the defined format
- In special cases, AI models or humans may attempt to generate precise data themselves, but with a higher risk of failure
For example:
- Input: “On June 1st of 2015, John Smith joined Microsoft as a full-time employee.”
- Output: after processing, an AI model might produce a JSON object that complies with a defined schema and can be consumed by downstream software.
- By contrast, feeding the fuzzy sentence directly into software would lead to parsing errors or even exceptions.
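To make this concrete, here is a minimal TypeScript sketch of the conversion; the EmploymentRecord schema and its field names are invented for illustration, not taken from any particular spec:

```typescript
// Hypothetical schema for the employment fact in the example sentence.
interface EmploymentRecord {
  person: string;      // full name of the employee
  employer: string;    // company name
  startDate: string;   // ISO 8601 date
  employmentType: "full-time" | "part-time" | "contract";
}

// What a model prompted with this schema might return for the input
// "On June 1st of 2015, John Smith joined Microsoft as a full-time employee."
const extracted: EmploymentRecord = {
  person: "John Smith",
  employer: "Microsoft",
  startDate: "2015-06-01",
  employmentType: "full-time",
};

// Downstream software can now consume the precise form directly:
console.log(new Date(extracted.startDate).getFullYear()); // 2015
```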
2. Building AI Agents with the A2A Protocol
2.1 Core Capabilities of an AI Agent
- Each AI agent specializes in a particular set of skills and generates an execution plan based on user input or other agent tasks.
- AI agents are exposed as HTTP services that accept imprecise inputs and return imprecise outputs.
- Their descriptive attributes (such as name, skills, and functions) are all represented as imprecise information, allowing humans or other AI to decide which agent to invoke.
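As an illustration, the descriptive metadata an A2A agent publishes might look like the sketch below; the exact field names vary by spec version, and everything here (the endpoint URL, the skills) is invented:

```typescript
// Sketch of an agent's self-description. The human-readable fields are
// deliberately imprecise: they exist so that a person or another model
// can judge whether this agent fits a given task.
const agentCard = {
  name: "Travel Planner",
  description: "Plans multi-city trips and builds day-by-day itineraries.",
  url: "https://agents.example.com/travel", // hypothetical HTTP endpoint
  skills: [
    { id: "plan-trip", name: "Plan a trip", description: "Turn a free-form travel request into an itinerary." },
    { id: "find-flights", name: "Find flights", description: "Search flight options for given dates and cities." },
  ],
};
```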
2.2 Multi-Agent Task Collaboration
- When a task is initiated, a conversation unfolds between the client and server agents, with each interaction driven by an internal orchestrator.
- As each AI model is stateless, maintaining the complete conversation history and filtering out irrelevant information becomes critical.
- The orchestrator can also employ Retrieval-Augmented Generation (RAG) techniques to inject relevant knowledge into prompts, thereby assisting the model’s decision-making.
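A minimal sketch of this prompt assembly follows; the Message shape, the last-10-turns filter, and the retrieve() function are all assumptions standing in for a real history store and RAG backend:

```typescript
interface Message { role: "user" | "agent"; content: string }

// Hypothetical retrieval function standing in for a real RAG backend.
declare function retrieve(query: string, topK: number): string[];

// Because the model itself is stateless, the orchestrator must replay the
// relevant history and inject retrieved knowledge on every single turn.
function buildPrompt(history: Message[], userInput: string): string {
  const recent = history.slice(-10); // naive relevance filter: keep the last 10 turns
  const knowledge = retrieve(userInput, 3).join("\n");
  return [
    "Relevant knowledge:\n" + knowledge,
    ...recent.map((m) => `${m.role}: ${m.content}`),
    `user: ${userInput}`,
  ].join("\n\n");
}
```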
2.3 The Workflow of the Orchestrator
- It collects user input along with contextual information to generate a task execution plan.
- Based on the agent’s registered list of available tools, it determines whether additional tools need to be invoked for specific subtasks (such as calendar updates or map queries).
- It continues to loop through interactions until the task is complete, finally generating the agent’s response or output.
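The loop can be sketched roughly as follows; ModelStep, modelStep(), and callTool() are hypothetical stand-ins for a real LLM client and tool registry:

```typescript
// A model turn either requests a tool call or finishes the task.
type ModelStep =
  | { kind: "tool_call"; tool: string; args: unknown }
  | { kind: "final"; answer: string };

declare function modelStep(prompt: string): Promise<ModelStep>;
declare function callTool(name: string, args: unknown): Promise<string>;

async function runTask(initialPrompt: string): Promise<string> {
  let prompt = initialPrompt;
  for (;;) {
    const step = await modelStep(prompt);
    if (step.kind === "final") return step.answer; // task complete
    // The model asked for a tool (calendar update, map query, ...): run it
    // and feed the result back into the next round of the conversation.
    const result = await callTool(step.tool, step.args);
    prompt += `\n\nTool ${step.tool} returned: ${result}`;
  }
}
```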
3. Tools, Resources, and Prompts under the MCP Protocol
Anthropic’s MCP provides standardized interfaces among various components, enabling AI agents to seamlessly invoke external system functions. The main parts include:
3.1 MCP Server Tools (Functions)
- Each MCP server exposes multiple tools, each defined by a name, a description, and a precise JSON input schema.
- For example, there might be a “calculate_sum” tool which requires two numbers as input, or an “execute_command” tool that runs shell commands.
- The orchestrator integrates the tool information into conversation prompts so that the AI model can return the tool’s name and parameters when needed.
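For example, the “calculate_sum” tool mentioned above might be advertised roughly as follows; this mirrors MCP’s name/description/input-schema pattern, but treat the details as a sketch:

```typescript
// Sketch of an MCP tool definition: a name, a human-readable description,
// and a precise JSON Schema describing the expected input.
const calculateSumTool = {
  name: "calculate_sum",
  description: "Add two numbers together.",
  inputSchema: {
    type: "object",
    properties: {
      a: { type: "number", description: "First addend" },
      b: { type: "number", description: "Second addend" },
    },
    required: ["a", "b"],
  },
};

// When the model decides to use the tool, it returns the tool's name plus
// arguments that validate against the schema, e.g.:
const toolCall = { name: "calculate_sum", arguments: { a: 2, b: 3 } };
```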
3.2 Resource Management
- MCP servers list the available resources that can be read, including files, API endpoints, and so on. For each resource, they provide a name, URI, MIME type, and description.
- Some resource URIs use templated formats, where certain variables must be filled in by the user or agent to form a complete link.
- Once an appropriate resource is selected, the agent will request its data and embed that information into the next round of prompts.
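A resource listing might look roughly like the sketch below; the URIs, names, and the weather:// scheme are invented for illustration:

```typescript
// Sketch of resources an MCP server might advertise. The second entry uses
// a URI template: {city} must be filled in by the user or agent before the
// resource can actually be fetched.
const resources = [
  {
    uri: "file:///var/data/report-2024.csv",
    name: "Annual report",
    mimeType: "text/csv",
    description: "Raw figures for the 2024 annual report.",
  },
  {
    uri: "weather://forecast/{city}", // templated URI
    name: "City forecast",
    mimeType: "application/json",
    description: "Seven-day forecast for a given city.",
  },
];
```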
3.3 Prompts and Parameters
- In addition to tools, MCP servers may also provide suggested prompts (e.g., quick actions in a command palette).
- Each prompt includes a name, a description, and parameter requirements so that the agent can determine which prompt to invoke based on context.
- The parameters may require human confirmation before being passed on to ensure that the AI model receives the correct information.
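A prompt entry might be described like this sketch (the prompt name and its parameters are invented):

```typescript
// Sketch of an MCP prompt definition: like a quick action in a command
// palette, with declared parameters that the agent, or a human confirming
// them, must supply before the prompt runs.
const summarizeFilePrompt = {
  name: "summarize_file",
  description: "Summarize the contents of a file in a few sentences.",
  arguments: [
    { name: "path", description: "Path of the file to summarize", required: true },
    { name: "maxSentences", description: "Upper bound on summary length", required: false },
  ],
};
```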
4. UI-Based AI Agents (Copilots)
- In addition to backend AI agents exposed as HTTP services, client applications or websites can implement their own AI assistants with user interfaces, known as “Copilots.”
- A UI agent typically features a chat-like interface where users enter requests and see the agent’s responses.
- Such agents also integrate an orchestrator, contextual knowledge, and MCP capabilities; if necessary, they can even act as an MCP server themselves to enable local function calls.
- The UI follows familiar patterns from modern applications, giving users an intuitive way to collaborate with the agent.
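As a sketch of that last point, a copilot embedded in a web app could expose one of the app’s own functions as a locally callable tool; everything below (the tool name, the navigation logic) is hypothetical:

```typescript
// Hypothetical local tool registry inside a browser-based copilot. The
// orchestrator can drive the app's own UI the same way it drives remote
// MCP tools: by name, with schema-described arguments.
const localTools = {
  open_settings_page: {
    description: "Navigate the app to its settings page.",
    inputSchema: { type: "object", properties: {} },
    run: async (): Promise<string> => {
      window.location.hash = "#/settings"; // simple client-side navigation
      return "navigated to settings";
    },
  },
};

async function runLocalTool(name: keyof typeof localTools): Promise<string> {
  return localTools[name].run();
}
```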
5. Conclusion
This overview outlines the multi-layered ideas for building an AI agent system:
- Converting imprecise inputs into precise, schema-conforming outputs wherever software needs to consume them;
- Using the A2A protocol to build multi-turn conversational agents based on HTTP services;
- Incorporating MCP-driven tools, resources, and prompts to facilitate seamless integration between agents and software functions;
- And finally, the emergence of UI-based agents (Copilots) that provide an intuitive user interface for intelligent interaction.
This architecture ensures that humans, AI models, and software can collaborate in their areas of strength, creating a more robust and efficient AI agent system.