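Pydantic AI Tutorial: Building Type-Safe LLM Agents, from Your First Program to Production
Anyone who has built an LLM application knows the hardest part isn't calling the API; it's that the model's output looks different every time. Pydantic AI brings Pydantic's type validation into Agent development, so outputs are structured and tools can be statically checked. This article takes you from installation through advanced usage, including the pitfalls I ran into myself.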
Introduction: Why Model Outputs Are Inconsistent
If you've worked with LLM APIs, you're probably familiar with this scenario: you ask the model to "return a JSON with name and score," and it works fine for the first nine attempts. However, on the tenth attempt, it adds a sentence like "Here's the result:" before the JSON, causing your json.loads() to fail. You then spend hours writing regular expressions to clean the string and if statements to check for fields, resulting in a significant portion of your "AI application" code being dedicated to battling the model's inconsistent output.
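The failure mode above can be reproduced with the standard library alone. This is a minimal sketch: both reply strings are invented for illustration, and `extract_json` is a hypothetical helper of the kind you end up writing by hand, not part of any library.

```python
import json
import re

# Two replies to the same prompt; both strings are made up for illustration.
clean_reply = '{"name": "Alice", "score": 9}'
chatty_reply = 'Here\'s the result: {"name": "Alice", "score": 9}'


def parse_strict(reply: str) -> dict:
    """Parse the reply as JSON with no cleanup at all."""
    return json.loads(reply)


def extract_json(reply: str) -> dict:
    """Hypothetical defensive helper: grab the first {...} span, then parse it."""
    match = re.search(r'\{.*\}', reply, flags=re.DOTALL)
    if match is None:
        raise ValueError('no JSON object found in reply')
    return json.loads(match.group(0))


print(parse_strict(clean_reply))

try:
    parse_strict(chatty_reply)
except json.JSONDecodeError as exc:
    # The leading sentence makes the whole string invalid JSON.
    print(f'strict parse failed: {exc.msg}')

print(extract_json(chatty_reply))
```

The regex version works for this one reply, but it still says nothing about missing fields or wrong types, which is where the rest of the defensive code comes from.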
I once maintained an internal classification service, and dealing with the model occasionally missing a field took me three days to resolve. After switching to Pydantic AI, I was able to delete most of the defensive code, as the framework handled validation for me. This article will explain how to use it.
What is Pydantic AI
Pydantic AI is a Python Agent framework developed by the Pydantic team. You may have encountered the Pydantic name elsewhere: the official SDKs from OpenAI, Anthropic, and Google, as well as LangChain and LlamaIndex, use it for data validation. In other words, few teams in the ecosystem have more experience with validation.
Its design philosophy is similar to FastAPI: it uses Python's native type hints to define behavior clearly, leaving the rest to the framework. The core concepts are simple: Agent, Tools, Dependencies, and Structured Output. You don't need to memorize a bunch of custom abstract classes; writing code is just like writing regular Python, but with auto-completion and type checking (using tools like Pyright or mypy) that can catch errors before you even run the code.
It's model-agnostic, meaning it's not tied to a specific model vendor. With support for over a dozen vendors, including OpenAI, Anthropic, Google, Groq, Cohere, Mistral, and Ollama, switching models usually only requires changing a single string. To understand the difference between an Agent and a regular API call, you can start by reading our article on what an AI Agent is.
Use Cases
In simple terms, Pydantic AI is suitable for any scenario where you need the model to return reliable results:
- Structured extraction: Throw a customer complaint into the model and ask it to output sentiment, category, and urgency fields, ensuring the types are correct.
- Classification and labeling: Label large volumes of documents with a fixed set of labels, ensuring the model's response conforms to the Enum you defined.
- Tool-based Agents: Enable the model to call your functions, such as querying a database, calling a weather API, or performing mathematical calculations, with the framework handling the conversion of function types to tool descriptions that the model can understand.
- RAG question-answering: Build a question-answering system on top of a vector database; see our RAG implementation guide for details.
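For the extraction and classification cases, the schema can be sketched with Pydantic alone, before any model is involved. The class names, label values, and JSON strings below are assumptions made up for illustration; in an Agent you would pass a model like `Complaint` as `output_type`.

```python
from enum import Enum

from pydantic import BaseModel, ValidationError


class Sentiment(str, Enum):
    positive = 'positive'
    negative = 'negative'
    neutral = 'neutral'


class Urgency(str, Enum):
    low = 'low'
    medium = 'medium'
    high = 'high'


class Complaint(BaseModel):
    """Hypothetical schema for the customer-complaint example."""
    sentiment: Sentiment
    category: str
    urgency: Urgency


# A reply that matches the schema parses into typed fields.
good = Complaint.model_validate_json(
    '{"sentiment": "negative", "category": "delivery", "urgency": "high"}'
)
print(good.urgency)

# A label outside the Enum is rejected instead of slipping through.
try:
    Complaint.model_validate_json(
        '{"sentiment": "furious", "category": "delivery", "urgency": "high"}'
    )
except ValidationError as exc:
    print(f'rejected with {exc.error_count()} error(s)')
```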
Compared to large frameworks like LangChain, Pydantic AI is intentionally lightweight. If you only need to make model outputs more reliable without wanting to adopt an entire ecosystem for a small feature, its learning curve will be much friendlier.
Getting Started
1. Installation
bash
pip install pydantic-ai
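It's recommended to create a virtual environment first. Use Python 3.9 or later; newer releases of the package may require a higher minimum version, so check its requirements on PyPI before installing.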
2. Setting API Keys
For example, with Anthropic, set an environment variable:
bash
export ANTHROPIC_API_KEY=your_key
For OpenAI, set OPENAI_API_KEY, and so on.
3. Writing Your First Agent
python
from pydantic_ai import Agent
agent = Agent('anthropic:claude-sonnet-4-6')
result = agent.run_sync('Explain what a vector database is in one sentence')
print(result.output)
The first parameter is the model name, in the format vendor:model. To switch to OpenAI, change it to 'openai:gpt-4o', and the rest of the code remains the same – this is the benefit of being model-agnostic.
4. Structuring Outputs
This is the key part. Define a Pydantic model as the output format:
python
from pydantic import BaseModel
from pydantic_ai import Agent
class Review(BaseModel):
    sentiment: str  # positive / negative / neutral
    score: int  # 1 to 5
    summary: str
agent = Agent('anthropic:claude-sonnet-4-6', output_type=Review)
result = agent.run_sync('The food is delicious but I waited almost an hour, a bit exaggerated')
print(result.output.score) # Directly access the integer without parsing
print(result.output.sentiment) # Directly access the string
If the model's response doesn't match the Review type, the framework will automatically ask the model to retry. When you access result.output, it's already a validated Python object, and your IDE will provide auto-completion for fields.
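Note that in the `Review` model the allowed sentiments and the 1 to 5 range live only in comments, so a score of 9 would still pass validation. One way to enforce them is sketched below using Pydantic's `Literal` and `Field` constraints; `StrictReview` and `is_valid` are names invented for this example.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class StrictReview(BaseModel):
    """Variant of Review whose constraints are enforced, not just commented."""
    sentiment: Literal['positive', 'negative', 'neutral']
    score: int = Field(ge=1, le=5)  # inclusive range 1..5
    summary: str


def is_valid(payload: dict) -> bool:
    """Return True when the payload satisfies StrictReview."""
    try:
        StrictReview.model_validate(payload)
    except ValidationError:
        return False
    return True


print(is_valid({'sentiment': 'negative', 'score': 2, 'summary': 'Tasty food, long wait'}))
print(is_valid({'sentiment': 'negative', 'score': 9, 'summary': 'Tasty food, long wait'}))
print(is_valid({'sentiment': 'mixed', 'score': 3, 'summary': 'Tasty food, long wait'}))
```

Passing a model like this as `output_type` means an out-of-range score counts as a validation failure rather than as valid data.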
5. Giving the Agent a Tool
python
from pydantic_ai import Agent
agent = Agent('anthropic:claude-sonnet-4-6')
@agent.tool_plain
def get_weather(city: str) -> str:
    """Query the current weather of a specified city"""
    return f'{city} is currently 28 degrees, sunny'
result = agent.run_sync('What is the weather like in Taipei now?')
print(result.output)
The docstring isn't just for readability – it becomes the tool description that the model sees. The function's type hints (city: str) are also converted into parameter specifications that the model understands, with parameters undergoing Pydantic validation.
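To get a feel for how much information a plain function already carries, here is a standard-library sketch that collects a function's name, docstring, and parameter types. It is only an illustration of the idea, not Pydantic AI's actual implementation; `describe_tool` and the extra `unit` parameter are assumptions added for the example.

```python
import inspect
from typing import get_type_hints


def get_weather(city: str, unit: str = 'celsius') -> str:
    """Query the current weather of a specified city."""
    return f'{city} is currently 28 degrees, sunny'


def describe_tool(func) -> dict:
    """Collect the name, docstring, and parameter types a framework could expose to a model."""
    hints = get_type_hints(func)
    params = {}
    for name, param in inspect.signature(func).parameters.items():
        hint = hints.get(name, object)
        params[name] = {
            'type': getattr(hint, '__name__', str(hint)),
            # A parameter without a default must be supplied by the caller.
            'required': param.default is inspect.Parameter.empty,
        }
    return {
        'name': func.__name__,
        'description': inspect.getdoc(func),
        'parameters': params,
    }


print(describe_tool(get_weather))
```

Everything in the resulting dictionary comes from the signature and the docstring, which is why a vague docstring or a missing type hint directly degrades what the model is told about the tool.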
Advanced Tips
Dependency injection is one of its most underestimated features. You can pass data like database connections, user identities, or API clients into the Agent and tools in a type-safe manner using RunContext:
python
from dataclasses import dataclass
from pydantic_ai import Agent, RunContext
@dataclass
class Deps:
    user_id: int
    db: object  # Your database connection
agent = Agent('anthropic:claude-sonnet-4-6', deps_type=Deps)
@agent.tool
def get_orders(ctx: RunContext[Deps]) -> str:
    return f'Querying orders for user {ctx.deps.user_id}'
When writing tests, you can replace db with a mock object, avoiding the need to touch the real database, which is crucial for writing unit tests.
Streaming: For real-time typing effects, use agent.run_stream(), which validates and outputs structured data as it's generated, significantly improving user experience.
Observability: Pydantic AI integrates well with Logfire. Once connected, you can see every model call, every tool invocation, token consumption, and execution time. This makes debugging LLM applications much easier, as you no longer have to guess why the model responded in a certain way. For a more comprehensive Agent development guide, refer to our Agent development guide.
Common Errors and Precautions
- Assuming output_type makes it 100% safe: The framework will retry validation failures, but there's a limit to retries. If the model continues to fail, it will throw an exception, which you still need to handle with try/except. Type validation reduces "dirty data" but doesn't guarantee the model will never make mistakes.
- Writing vague docstrings for tools: The model relies on docstrings to decide when to call tools. Vague descriptions can lead to incorrect or missed tool calls. Treat docstrings as instructions for the model.
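- Putting too much logic in tools without handling exceptions: Argument validation errors and retry requests raised inside a tool are sent back to the model, which can cause it to loop or waste tokens, while other uncaught exceptions can abort the whole run. Catch what you can inside the tool and return a clear message.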
- Ignoring costs: Structured output retries and multiple tool calls can consume tokens quickly. Always monitor costs before going live.
- Using it as a large framework: Pydantic AI is intentionally lightweight. If you need complex multi-step orchestration or a suite of connectors, you might find LlamaIndex or another solution more convenient. Don't force it to be something it's not.
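The first point, that retries are bounded and the final failure is yours to handle, can be sketched without any framework. The code below is a generic illustration of the pattern, not Pydantic AI's implementation; `run_with_retries`, `RetriesExhausted`, and the scripted fake models are all names invented for the example.

```python
import json
from typing import Callable


class RetriesExhausted(Exception):
    """Raised when no attempt produced a valid result."""


def run_with_retries(generate: Callable[[str], str], prompt: str, max_retries: int = 2) -> dict:
    """Call generate, validate the reply, and feed errors back until retries run out."""
    attempts = 0
    current_prompt = prompt
    while attempts <= max_retries:
        attempts += 1
        reply = generate(current_prompt)
        try:
            data = json.loads(reply)
            if not isinstance(data.get('score'), int):
                raise ValueError('score must be an integer')
            return data
        except (ValueError, AttributeError) as exc:
            # Tell the "model" what went wrong, as a validating framework would.
            current_prompt = f'{prompt}\nPrevious reply was invalid: {exc}. Try again.'
    raise RetriesExhausted(f'gave up after {attempts} attempts')


def scripted_model(replies: list[str]) -> Callable[[str], str]:
    """Fake model that returns the given replies one after another."""
    remaining = iter(replies)
    return lambda prompt: next(remaining)


# Recovers on the second attempt.
print(run_with_retries(scripted_model(['not json', '{"score": 4}']), 'Rate this review'))

# Never recovers, so the caller must handle the exception.
try:
    run_with_retries(lambda prompt: 'still not json', 'Rate this review')
except RetriesExhausted as exc:
    print(f'handled: {exc}')
```

Each failed attempt is another model call, which is also why the cost point above matters: a schema the model struggles with multiplies token usage.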
TheAI Academy Review
Honestly, the market is flooded with Agent frameworks, making it hard to choose. However, Pydantic AI addresses a specific pain point that every LLM developer faces: unreliable outputs. It doesn't aim to be the "strongest framework in the universe" but rather brings "type safety," a concept already valued in the Python community, into AI development cleanly. For those familiar with FastAPI and Pydantic, the learning curve is nearly nonexistent.
It won't make your model smarter, but it will make your code more reliable – and that's what truly saves you in the long run, especially when your application goes live and needs maintenance.
If you're just doing demos or casual experimentation, you might not appreciate its value deeply. However, once your project needs to go live, be used by real people, and require long-term maintenance, the importance of type safety and observability will become increasingly apparent.
Frequently Asked Questions
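How does Pydantic AI differ from LangChain, and which should I choose?
The biggest difference is weight. LangChain is a large ecosystem with many connectors, integrations, and abstraction layers; it suits large projects that need complex orchestration, but its learning curve is steep. Pydantic AI is deliberately thin: its core is just a few concepts (Agent, tools, dependency injection, and structured output), with type safety as the main selling point. If your goal is to make model output reliable while writing code that stays close to native Python, Pydantic AI is much quicker to pick up; if you need a lot of ready-made integrations, LangChain saves more effort. The two don't conflict, so choose based on project size.
Do I have to use OpenAI models? Can I connect a local model?
No. Pydantic AI is model-agnostic and supports more than a dozen vendors, including OpenAI, Anthropic, Google, Groq, Mistral, Cohere, and Ollama. Switching models usually means changing only the string you pass when creating the Agent. To run a local model, go through Ollama: point the model string at the local service and leave the rest of the program untouched.
Does structured output really guarantee the model won't return garbage?
It can't guarantee the model itself never makes mistakes, but it can guarantee that data which doesn't match your defined types won't slip into your system. When the model's reply fails Pydantic validation, the framework automatically sends the error message back and asks it to retry. Retries are capped, though, and repeated failures raise an exception, so you still need try/except for the worst case. What it reduces is the risk of dirty data, not the limits of the model's intelligence.
I'm a beginner who has never written Pydantic. Will this be hard to learn?
If you know basic Python and type hints, the barrier is low. The core of Pydantic is "use a class to define what the data looks like," and the syntax is intuitive. Spend ten minutes looking at how Pydantic defines a BaseModel first, then come back to writing Agents, and things will go much more smoothly. The real conceptual hurdle is the design thinking behind Agents and tools, which you can study alongside our Agent development guide.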