The Right Tool for One Job, Part II

Mic
Mar 26

In the previous post, we mostly focused on output schemas of models.

For this post we again look at tools that go after a similarly narrow target: the implementation of agents. These tools pick one layer of the stack, build for that layer very deliberately, and avoid pretending to be the universal answer to all AI application design. At the agent layer, four libraries stand out for very different reasons: Smolagents, PydanticAI, CrewAI, and Semantic Kernel.

All four can build agentic systems, but they are not interchangeable.

Smolagents is the minimal, code-first option. It is small, direct, and happiest when you want an agent that uses tools without dragging in a large orchestration framework.
PydanticAI is what happens when the Pydantic people build an agent framework: types, validation, dependencies, and structured outputs are not bolted on later; they are the architecture.
CrewAI treats agents less like nodes in a graph and more like coworkers on a team. Roles, goals, delegation, and task handoffs are the main abstractions.
Semantic Kernel comes from a different direction again. It is less “Python-first agent toybox” and more “enterprise SDK for adding AI capabilities to existing software.” Its centre of gravity is the kernel, the plugin system, and function calling.

So if the question is “which one should I use?”, the real answer is: What kind of agent system are you looking for?

As usual ChatGPT thinks that my post is about cute little robots

Smolagents

Hugging Face’s smolagents is what you reach for when you want a tool-using agent and do not want five layers of abstractions standing between you and the code.

Its design is unusually direct. A `CodeAgent` writes Python snippets to call tools and solve the task, while a `ToolCallingAgent` sticks to structured tool calls. In practice, the interesting part is that the library stays small while still supporting custom tools, managed sub-agents, local models, and MCP integrations.

A minimal agent with a built-in search tool

For this we set up a very basic web search tool for our system.

This is the main appeal of Smolagents: the distance between “I have an idea” and “the agent is running” is extremely short. Here we use a common model from Hugging Face and read the Hugging Face token from an environment variable. We also make sure that the result is streamed. Depending on when you run this, the agent will retrieve a longer answer first and then decide how to summarise it in a short two-sentence version.

We are leaving out any type of formatting in this case.
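A minimal sketch of such an agent could look like the following. It assumes a recent smolagents release, the search extra that `DuckDuckGoSearchTool` needs, and a token in `HF_TOKEN`; the helper `summary_task` and its prompt wording are hypothetical and only there for illustration.

```python
import os


def summary_task(topic: str) -> str:
    """Build the prompt handed to the agent (hypothetical wording)."""
    return f"Search the web for {topic} and summarise the findings in two sentences."


def build_search_agent():
    # Imported lazily so the prompt helper can be tried without smolagents installed.
    from smolagents import CodeAgent, DuckDuckGoSearchTool, InferenceClientModel

    # Uses the library's default hosted model; pass model_id=... to pick another one.
    model = InferenceClientModel(token=os.environ.get("HF_TOKEN"))
    # stream_outputs=True prints the model output while it is being generated.
    return CodeAgent(tools=[DuckDuckGoSearchTool()], model=model, stream_outputs=True)


def main() -> None:
    agent = build_search_agent()
    print(agent.run(summary_task("the Model Context Protocol")))
```

Calling `main()` needs network access and a valid token; the agent first gathers search results and then condenses them.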

Turning an ordinary Python function into a tool

In addition to already existing tools, we can also upgrade ordinary Python functions to tools. As an example we choose two simple functions that count words or specific characters.

The @tool decorator is the sweet spot for simple tools. You keep plain Python, add type hints and a useful docstring, and the agent gets enough structure to decide when to call it. This of course implies that you have to write docstrings that are exact and useful for the model to understand what the functions do.
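As a rough sketch of the idea, the two counting functions below are plain Python with type hints and an `Args:` docstring section, which is what smolagents reads when it builds the tool description. The function names are assumptions for illustration. Writing `tool(count_words)` is the same as putting `@tool` above the function; it is done inside the builder here only so that the plain functions stay usable without the library.

```python
def count_words(text: str) -> int:
    """Count the whitespace-separated words in a text.

    Args:
        text: The text whose words should be counted.
    """
    return len(text.split())


def count_char(text: str, char: str) -> int:
    """Count how often a single character occurs in a text (case-sensitive).

    Args:
        text: The text to search.
        char: The single character to count.
    """
    return text.count(char)


def build_text_agent():
    # Imported lazily so the two functions above work without smolagents installed.
    from smolagents import CodeAgent, InferenceClientModel, tool

    # tool(f) is equivalent to decorating f with @tool.
    return CodeAgent(tools=[tool(count_words), tool(count_char)], model=InferenceClientModel())
```

With this in place, something like `build_text_agent().run("How many words and how many letters 'e' are in: 'The tree fell'?")` lets the model decide which of the two tools to call.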

A hierarchical multi-agent setup

In most cases we will want to organise agents in a hierarchy and have agents that are able to activate and run other agents.

This is the point where Smolagents stops being a “small demo library” and becomes genuinely useful. You can keep one agent focused on one job, then give a manager agent the option to delegate. Of course, in our example this is a very simple construct, but it can become very useful the moment agents with widely different tools appear.
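A hedged sketch of such a hierarchy: one worker agent with a search tool, and a manager that receives it through `managed_agents`. The worker name and description are made up for this example. Managed agents need a `name` that is a valid Python identifier and a `description`, because the manager calls them much like tools.

```python
# Hypothetical worker specification: name -> description shown to the manager.
WORKERS = {
    "web_searcher": "Searches the web and returns short factual findings.",
}


def build_manager():
    # Imported lazily so the specification above can be inspected without smolagents.
    from smolagents import CodeAgent, DuckDuckGoSearchTool, InferenceClientModel, ToolCallingAgent

    model = InferenceClientModel()
    workers = [
        ToolCallingAgent(
            tools=[DuckDuckGoSearchTool()],
            model=model,
            name=name,
            description=description,
        )
        for name, description in WORKERS.items()
    ]
    # The manager has no tools of its own; it can only delegate to its workers.
    return CodeAgent(tools=[], model=model, managed_agents=workers)
```

Running `build_manager().run("...")` then leaves it to the manager to decide whether the question needs the search worker at all.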

When Smolagents fits best

Use Smolagents when you want the shortest path from Python functions to a working agent. It is especially appealing when your mental model is not “workflow graph” but simply “give the model a few tools and let it solve the problem.” The fact that it is fairly straightforward and simple in its implementation is its greatest strength.

PydanticAI

PydanticAI is an agent framework built around the assumption that validation, typed outputs, and dependency injection should be first-class concerns.

That sounds dry and a bit overdone... until you start building real systems. Then it becomes very attractive.

The core design is simple: define an Agent, specify dependencies if needed, specify an output_type if you want structured output, add tools, and run it. Because it is grounded in Pydantic-style typing, the framework feels much closer to ordinary application code than many agent libraries do.

A basic agent

We start again with a simple definition of an agent.

Here we use Google Gemini, so a corresponding GOOGLE_API_KEY has to be set as usual. One also has to be a bit careful with syntax changes: in this basic example the final value lives on result.output, not on result.data as it did in previous versions.
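A minimal sketch of this basic case, assuming a recent PydanticAI release and the model string `google-gla:gemini-2.0-flash` (the concrete model name and the system prompt are assumptions):

```python
def ask(agent, question: str) -> str:
    """Run the agent once and return the final value."""
    result = agent.run_sync(question)
    # Recent releases expose the final value as .output (formerly .data).
    return result.output


def build_agent():
    # Imported lazily so ask() can be exercised with a stand-in agent.
    from pydantic_ai import Agent

    return Agent(
        "google-gla:gemini-2.0-flash",  # assumed model name
        system_prompt="Answer in one short paragraph.",
    )
```

A call such as `ask(build_agent(), "What is dependency injection?")` then returns a plain string.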

Structured output with output_type

As already mentioned, a strong point of PydanticAI is structured outputs, which is not really a surprise.

This is where PydanticAI starts to feel very different from a plain wrapper around an LLM API.

The CodeReview model defines the exact shape of the response you want back: a summary, a list of issues, a list of suggestions, and a numeric score. By passing that model as output_type, you are telling the agent that free-form prose is not good enough. The run only counts as successful if the response can be turned into that structured type. PydanticAI’s output docs explicitly describe output_type as the way you specify the agent’s expected result type, including Pydantic models and other typed structures.

At runtime, the framework asks the model to produce an answer that matches that schema, then validates the returned data. If everything works, result.output is not a blob of text that you still need to parse. It is an actual CodeReview object, which means result.output.score is an integer and result.output.issues is already a Python list. That removes a surprising amount of downstream mess.

This matters in practice because structured outputs shift the burden of correctness earlier in the pipeline. Instead of getting “something kind of JSON-ish” and hoping your parser survives, you define the contract once and let the framework enforce it.

As usual, this is of course no guarantee that the model will return information that matches the schema, especially if the schema is not well defined.
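To make the mechanics concrete, here is a sketch of a `CodeReview` output type with the four fields described above. The 1 to 10 score range, the model name, and the helper `format_review` are assumptions for illustration. The model class is defined inside the builder only so that the formatting helper can be tried without pydantic installed; in application code it would normally live at module level.

```python
def format_review(review) -> str:
    """Render a validated review; works with any object that has the four fields."""
    lines = [f"Score: {review.score}/10", review.summary]
    lines += [f"- issue: {issue}" for issue in review.issues]
    lines += [f"- suggestion: {hint}" for hint in review.suggestions]
    return "\n".join(lines)


def build_review_agent():
    # Imported lazily so format_review() can be tried on its own.
    from pydantic import BaseModel, Field
    from pydantic_ai import Agent

    class CodeReview(BaseModel):
        summary: str
        issues: list[str]
        suggestions: list[str]
        score: int = Field(ge=1, le=10)  # assumed scale

    return Agent(
        "google-gla:gemini-2.0-flash",  # assumed model name
        output_type=CodeReview,
        system_prompt="You review Python code.",
    )
```

After `result = build_review_agent().run_sync("Review this: def f(x): return x+1")`, the value `result.output` is a validated `CodeReview` instance and can be passed straight to `format_review`.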

Tools with dependency injection

We now add two further ingredients to an agent: tools and dependencies.

The tool itself is the function decorated with @agent.tool. PydanticAI’s tools documentation describes function tools as a mechanism for models to perform actions or retrieve extra information that helps them answer. In this case, the tool is not querying a web API or a database; it is reading from a provided exchange-rate mapping. But the pattern is the same as it would be for a real service.

The more interesting piece here is the line deps_type=AppContext as well as the argument RunContext[AppContext]. That is PydanticAI’s dependency injection model. Instead of hiding external state in globals, closures, or singleton objects, you define the dependency shape explicitly and pass an instance of it at run time. Inside the tool, ctx.deps gives you access to that run-specific state.

So what happens when the run executes? The agent receives the user’s question, recognises that currency conversion is needed, calls the convert_currency tool with arguments inferred from the prompt, and passes along the injected context. The tool computes the conversion using the provided rates, returns the result, and the model can then fold that tool result into its final answer. The big architectural win is that the tool code stays testable and deterministic, while the model handles the language layer. Function tools are specifically designed for this kind of action-and-information handoff.

This is one of the strongest reasons to like PydanticAI: it treats “agent code” more like regular application code than many agent libraries do.
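A sketch of this pattern under a few assumptions: `AppContext` holds a rate table expressed as units of each currency per 1 USD, the conversion arithmetic lives in a plain function so it can be tested without a model, and the model name is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class AppContext:
    # Assumed convention: units of each currency per 1 USD.
    rates: dict[str, float]


def convert(rates: dict[str, float], amount: float, source: str, target: str) -> float:
    """Convert via USD using the supplied rate table."""
    return round(amount / rates[source] * rates[target], 2)


def build_agent():
    # Imported lazily so convert() can be tested without PydanticAI installed.
    from pydantic_ai import Agent, RunContext

    agent = Agent(
        "google-gla:gemini-2.0-flash",  # assumed model name
        deps_type=AppContext,
        system_prompt="Use the convert_currency tool for any currency arithmetic.",
    )

    @agent.tool
    def convert_currency(ctx: RunContext[AppContext], amount: float, source: str, target: str) -> float:
        """Convert an amount from the source currency code to the target currency code."""
        return convert(ctx.deps.rates, amount, source, target)

    return agent
```

At run time the dependency is passed explicitly, for example `build_agent().run_sync("How much is 100 USD in EUR?", deps=AppContext(rates={"USD": 1.0, "EUR": 0.5}))`. The rate values here are invented purely to keep the arithmetic easy to follow.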

Continuing a conversation with message history

This example shows how PydanticAI handles multi-turn conversations without hiding the mechanism from you.

The first run is just a normal call. But after that run completes, the result object contains the messages generated during that run. The method new_messages() returns the messages produced in the current run only, excluding any earlier history that may already have been supplied.

That is why the second run passes message_history=result1.new_messages(). You are explicitly telling the agent, “continue from what just happened.” The model then sees the earlier exchange as context, so when the user asks, “What language did I just mention?”, the answer can refer back to Python. This approach is much cleaner than manually concatenating previous turns into one giant prompt string. It also gives you precise control over what conversational state you keep and what you discard.

Architecturally, this is a nice middle ground. PydanticAI preserves conversational context, but it does not force you into an opaque chat-session object that quietly accumulates state in the background. You can see the history, pass it forward, trim it, store it, or replace it.
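The two-step flow can be sketched as a small helper. It only relies on `run_sync`, `message_history`, `new_messages()` and `.output`; the function name `two_turns` is made up for this example.

```python
def two_turns(agent, first: str, second: str) -> tuple[str, str]:
    """Ask two questions, handing the first run's messages to the second run."""
    result1 = agent.run_sync(first)
    # Only the messages of the first run are passed on; nothing is stored implicitly.
    result2 = agent.run_sync(second, message_history=result1.new_messages())
    return result1.output, result2.output
```

With a real agent, `two_turns(agent, "My favourite language is Python.", "What language did I just mention?")` should produce a second answer that refers back to Python. Because the history is just a list you pass along, trimming or storing it is ordinary list handling.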

When PydanticAI fits best

Choose PydanticAI when you want an agent framework that behaves like a serious Python library rather than a bag of prompt tricks.

It is especially strong when your application needs typed outputs, explicit dependencies, predictable tool wiring, and conversation state that you can control directly. In other words, it is a very good fit when the agent is not just a demo, but a component that has to live inside real software. The official docs consistently frame agents, typed outputs, tools, and message history as first-class parts of the framework, which is exactly why it feels unusually production-minded.

CrewAI

CrewAI approaches agent systems as coordinated specialist work.

That is the central idea behind the framework. Instead of treating an agent workflow mainly as a graph of nodes and edges, CrewAI encourages you to think in terms of roles, goals, tasks, and processes. It describes an Agent as an autonomous unit that can perform tasks, use tools, collaborate, and make decisions based on its assigned role and goal, while a Task is the unit of work assigned to an agent.

For the examples below, we will use Google Gemini and assume that your API key is in an environment variable as usual.

Using a single agent directly with kickoff()

We start with the simplest useful CrewAI pattern: create an agent and run it directly.

What happens here is that the Agent object bundles together an identity and a working style. The role defines what kind of specialist this is supposed to be, the goal defines what success looks like, and the backstory gives the model behavioural context that nudges its tone and priorities. The model is supplied through an LLM(...) object, which is the standard CrewAI configuration layer for provider, model, temperature, and authentication settings.

Then kickoff() executes the agent directly on the user’s request. This matters because it means you can use CrewAI’s role-based agent abstraction even when you do not yet need a full multi-agent system. The output is returned as a structured CrewAI result object, and .raw gives you the plain text answer.

So the mental model here is: define a specialised worker once, then reuse it whenever that kind of task appears. This is the lightest-weight way to use CrewAI without giving up its core abstractions.
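A minimal sketch of this pattern, assuming a CrewAI version in which `Agent.kickoff()` is available. The role, goal, backstory, and model name are invented for illustration, and the key is passed explicitly from `GOOGLE_API_KEY`.

```python
import os


def ask(agent, question: str) -> str:
    """Run a single agent directly and return the plain-text answer."""
    return agent.kickoff(question).raw


def build_analyst():
    # Imported lazily so ask() can be exercised with a stand-in agent.
    from crewai import LLM, Agent

    llm = LLM(
        model="gemini/gemini-2.0-flash",  # assumed model name
        temperature=0.2,
        api_key=os.environ.get("GOOGLE_API_KEY"),
    )
    return Agent(
        role="Python Code Analyst",
        goal="Explain what a piece of code does in plain language",
        backstory="A patient senior engineer who favours short, concrete answers.",
        llm=llm,
    )
```

The agent is defined once and can then be reused, for example `ask(build_analyst(), "What does functools.lru_cache do?")`.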

A basic two-agent crew

This is where CrewAI starts to look distinct from a simple model wrapper.

The two Task objects define separate units of work and assign them to different agents. The researcher does not write the final prose, and the writer does not invent facts from scratch. Instead, the framework allows one task’s output to feed into another through the context parameter. In sequential mode, CrewAI runs tasks in order, so the writer receives the research task’s output as usable context.

That is the key architectural shift: instead of one long prompt doing everything badly, you divide the work into specialist stages. The researcher produces an intermediate artefact, and the writer transforms it into the final form. This is exactly the sort of workflow where CrewAI feels natural. This is of course especially useful when your tasks would need different 'behaviour' types.

We also enabled the verbose flags on both agents and the crew, so you will get a bit of extra information about the workflow.
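A sketch of such a two-agent crew follows. All roles, descriptions, and the model name are assumptions; the part that matters is `context=[research]`, which hands the researcher's output to the writer's task.

```python
import os


def research_brief(topic: str) -> str:
    """Task description for the researcher (hypothetical wording)."""
    return f"Collect five verifiable facts about {topic}, each with a one-line explanation."


def build_crew(topic: str):
    # Imported lazily so the helper above can be tried without CrewAI installed.
    from crewai import LLM, Agent, Crew, Process, Task

    llm = LLM(model="gemini/gemini-2.0-flash", api_key=os.environ.get("GOOGLE_API_KEY"))
    researcher = Agent(
        role="Researcher",
        goal="Find accurate facts",
        backstory="A careful analyst who double-checks claims.",
        llm=llm,
        verbose=True,
    )
    writer = Agent(
        role="Writer",
        goal="Turn research notes into clear prose",
        backstory="A technical writer who values brevity.",
        llm=llm,
        verbose=True,
    )
    research = Task(
        description=research_brief(topic),
        expected_output="A bullet list of five facts.",
        agent=researcher,
    )
    article = Task(
        description="Write one short paragraph based on the research notes.",
        expected_output="One paragraph of plain prose.",
        agent=writer,
        context=[research],  # the writer sees the research output
    )
    return Crew(
        agents=[researcher, writer],
        tasks=[research, article],
        process=Process.sequential,
        verbose=True,
    )
```

Running it is a single call: `print(build_crew("vector databases").kickoff().raw)`.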

Adding custom tools with BaseTool

Tools are where CrewAI stops looking like a gimmick of “agents with roles” and becomes operationally useful.

CrewAI describes tools as capabilities agents can use while completing tasks, and supports custom tools either via decorators or by subclassing BaseTool. The BaseTool approach is more explicit: you define a name, a description, and a _run method. The description matters because the agent uses it to decide when the tool is relevant. The tools are simply added to the agent.

At runtime, the agent reads the task, decides that deterministic computation would help, and calls the tools as needed. That means the model does not need to estimate word count or readability from intuition alone. Instead, it can delegate the measurable part of the job to ordinary Python code and then use the tool results when composing the final answer. This division of labour is one of the strongest patterns in agent design: use the model for judgment and language, and use code for exact operations.
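The sketch below shows the `BaseTool` route with two deterministic helpers. The tool names and the very crude readability proxy (average words per sentence) are assumptions chosen for illustration, not a recommended readability metric.

```python
import re


def word_count(text: str) -> int:
    """Number of whitespace-separated words."""
    return len(text.split())


def average_sentence_length(text: str) -> float:
    """Average words per sentence, splitting on '.', '!' and '?'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return word_count(text) / len(sentences)


def build_tools():
    # Imported lazily so the two helpers above work without CrewAI installed.
    from crewai.tools import BaseTool

    class WordCountTool(BaseTool):
        name: str = "word_counter"
        description: str = "Counts the words in a text. Input: the full text."

        def _run(self, text: str) -> str:
            return str(word_count(text))

    class SentenceLengthTool(BaseTool):
        name: str = "sentence_length"
        description: str = "Returns the average number of words per sentence of a text."

        def _run(self, text: str) -> str:
            return f"{average_sentence_length(text):.1f}"

    return [WordCountTool(), SentenceLengthTool()]
```

The resulting list is handed to an agent through its `tools` argument, for example `Agent(..., tools=build_tools())`.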

Hierarchical execution with a manager model

Hierarchical mode is where CrewAI becomes most recognisably CrewAI.

It distinguishes sequential and hierarchical processes. We have already seen sequential mode above. In a hierarchical process, you must supply either manager_llm or manager_agent. The manager layer oversees planning, delegation, and validation, rather than simply letting tasks run in a fixed order with no supervisory logic.

That changes the execution model in an important way. In sequential mode, you already know the order of work ahead of time. In hierarchical mode, you are adding a coordinating intelligence that decides how work should be allocated and checked. This is a better fit for more open-ended or ambiguous jobs, where the path from question to answer is not fully obvious in advance.
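In code, the switch to hierarchical mode is small. The sketch below assumes the same kind of agents as in the sequential example and lets a manager model, supplied through `manager_llm`, decide who handles the single task; leaving the task without an assigned agent is a deliberate choice in this sketch so that delegation is up to the manager. Roles, wording, and the model name are again invented.

```python
import os


def run_crew(crew) -> str:
    """Kick off a crew and return the final plain-text result."""
    return crew.kickoff().raw


def build_hierarchical_crew(topic: str):
    # Imported lazily so run_crew() can be exercised with a stand-in crew.
    from crewai import LLM, Agent, Crew, Process, Task

    llm = LLM(model="gemini/gemini-2.0-flash", api_key=os.environ.get("GOOGLE_API_KEY"))
    researcher = Agent(
        role="Researcher",
        goal="Find accurate facts",
        backstory="A careful analyst.",
        llm=llm,
    )
    writer = Agent(
        role="Writer",
        goal="Write clear prose",
        backstory="A concise technical writer.",
        llm=llm,
    )
    report = Task(
        description=f"Produce a short, fact-checked briefing about {topic}.",
        expected_output="Three paragraphs of plain prose.",
    )
    return Crew(
        agents=[researcher, writer],
        tasks=[report],
        process=Process.hierarchical,
        manager_llm=llm,  # alternatively, pass manager_agent=...
    )
```

A run then looks like `print(run_crew(build_hierarchical_crew("WebAssembly")))`.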

When CrewAI fits best

Choose CrewAI when the problem naturally looks like coordinated specialist work.

It is especially strong when you want to define distinct roles, attach tools to the right agents, split the work into explicit tasks, and pick a process model that matches the difficulty of the job. CrewAI frames its architecture around agents, tasks, crews, and processes, which makes it rather intuitive to use.

Semantic Kernel

Semantic Kernel is Microsoft’s model-agnostic SDK for building AI features that sit inside real software systems.

A lot of Python LLM libraries feel like they were designed mainly for greenfield AI experiments: start a workflow, define a chain, maybe add a tool, and see what happens. Semantic Kernel feels different. Its centre of gravity is the kernel itself: a runtime object that holds AI services, plugins, prompt functions, and execution settings. The official repo describes Semantic Kernel as a model-agnostic SDK for building and orchestrating AI agents and multi-agent systems, and the docs consistently treat plugins and function calling as the main extension mechanism.

As in most of this post, we use it with Google Gemini, with the API key in the environment variable `GOOGLE_API_KEY`.

A basic kernel with chat completion

What happens in this example is the core Semantic Kernel pattern in its simplest form.

First, you create a Kernel. That kernel is not just a convenience wrapper around one model call. It is the container for the AI runtime: services, plugins, prompt functions, and related settings all live there. This is why Semantic Kernel tends to fit enterprise-style application design better than many lighter agent libraries. The kernel is meant to be part of your application architecture, not just a helper object you throw away after one response.

Next, the Gemini model is attached through GoogleAIChatCompletion. This is quite different from the other libraries above.

Then a ChatHistory object is created and given a user message. This is important because Semantic Kernel treats a chat conversation as a structured history object rather than as one manually concatenated string. This object in itself has further abilities.

Finally, get_chat_message_content(...) sends the history plus the execution settings to the model and returns its answer. This is the essential runtime shape that Semantic Kernel builds on.
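The four steps can be sketched as follows. This assumes the Google extra of the `semantic-kernel` package is installed; the model name is a placeholder, and the key is passed explicitly because the connector's own default environment variable names differ from `GOOGLE_API_KEY`.

```python
import asyncio
import os


async def ask(service, history, settings) -> str:
    """Send the chat history to the chat service and return the reply text."""
    response = await service.get_chat_message_content(chat_history=history, settings=settings)
    return str(response)


def build_kernel():
    # Imported lazily so ask() can be exercised with a stand-in service.
    from semantic_kernel import Kernel
    from semantic_kernel.connectors.ai.google.google_ai import (
        GoogleAIChatCompletion,
        GoogleAIChatPromptExecutionSettings,
    )

    kernel = Kernel()
    service = GoogleAIChatCompletion(
        gemini_model_id="gemini-2.0-flash",  # assumed model name
        api_key=os.environ.get("GOOGLE_API_KEY"),
    )
    kernel.add_service(service)
    return kernel, service, GoogleAIChatPromptExecutionSettings()


async def main() -> None:
    from semantic_kernel.contents import ChatHistory

    _kernel, service, settings = build_kernel()
    history = ChatHistory()
    history.add_user_message("Explain what a kernel is in Semantic Kernel, in two sentences.")
    print(await ask(service, history, settings))

# To run against the real service: asyncio.run(main())
```

Note that the Python API is asynchronous, so the call has to run inside an event loop.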

Prompt functions with KernelFunctionFromPrompt

This example shows the use of prompt functions.

A KernelFunctionFromPrompt is exactly what the name suggests: a callable kernel function created from a prompt template rather than from native Python code.

Why does this matter? Because Semantic Kernel does not treat prompts as anonymous one-off strings. It turns them into named, reusable functions. That means your summariser and translator are not just two snippets of prompt text floating around your codebase; they are explicit callable units that can be invoked, stored, composed, or exposed to other parts of the system. This is one of the cleaner aspects of the SDK. Prompt engineering is not hidden, but it is given structure.

The plugin_name="WritingPlugin" part is also doing conceptual work. In Semantic Kernel, a plugin is a group of functions that can be exposed to AI applications and orchestrated together.

Then kernel.invoke(...) runs each prompt function. In the first call, the model generates a one-sentence summary. In the second, the translation function consumes that result. Notice what Semantic Kernel is buying you here: prompt-based transformations become addressable units with names and semantics, not just manual API calls chained together by hand.
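A sketch of the summarise-then-translate chain described above. The prompt wording, the function names, and German as the target language are assumptions; `{{$input}}` is the template variable that receives the `input` argument. The kernel passed in is expected to already have a chat service registered.

```python
import asyncio

# Hypothetical prompt templates; {{$input}} is filled from the `input` argument.
SUMMARIZE_PROMPT = "Summarize the following text in one sentence:\n{{$input}}"
TRANSLATE_PROMPT = "Translate the following text into German:\n{{$input}}"


def build_writing_functions():
    # Imported lazily so the chaining helper can be exercised with stand-ins.
    from semantic_kernel.functions import KernelFunctionFromPrompt

    summarize = KernelFunctionFromPrompt(
        function_name="summarize",
        plugin_name="WritingPlugin",
        prompt=SUMMARIZE_PROMPT,
    )
    translate = KernelFunctionFromPrompt(
        function_name="translate",
        plugin_name="WritingPlugin",
        prompt=TRANSLATE_PROMPT,
    )
    return summarize, translate


async def summarize_then_translate(kernel, summarize, translate, text: str) -> str:
    """Run the summary function, then feed its result into the translation function."""
    summary = await kernel.invoke(summarize, input=text)
    translation = await kernel.invoke(translate, input=str(summary))
    return str(translation)

# Usage with a configured kernel:
# asyncio.run(summarize_then_translate(kernel, *build_writing_functions(), "long text"))
```

The second call consumes `str(summary)`, which is how one prompt function's result becomes the next one's input.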

Native Python functions exposed as a plugin

In addition to the prompt functions, we can also add code as plugins.

Instead of turning prompts into functions, you turn ordinary Python methods into kernel functions using @kernel_function. Here, descriptions and annotations are not decorative; they are part of the contract between your code and the model.

The MathPlugin class groups two related capabilities: add and percentage. Once you register it with kernel.add_plugin(...), those functions become part of the kernel’s available capability set. In this example we invoke them directly with kernel.invoke(...), which is useful because it shows that plugins are not only for AI-selected function calling. They are also a structured way to organise application logic that can be called explicitly when needed.

The broader architectural point is this: Semantic Kernel does not force a hard split between “AI functions” and “regular code.” It gives both a common abstraction. Prompt functions and native functions can live in the same plugin-oriented system. That is a large part of why the framework feels suited to real applications rather than demos.
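Here is a hedged sketch of such a plugin. The exact meaning of `percentage` is an assumption (here: what percentage `part` is of `whole`), and the plugin class is built inside a function only so that the arithmetic can be tried without the SDK installed; in application code the class would normally sit at module level.

```python
def add_numbers(a: float, b: float) -> float:
    return a + b


def percentage_of(part: float, whole: float) -> float:
    """What percentage `part` is of `whole` (assumed meaning of 'percentage')."""
    return part / whole * 100


def build_math_plugin():
    # Imported lazily so the arithmetic above can be tested on its own.
    from typing import Annotated

    from semantic_kernel.functions import kernel_function

    class MathPlugin:
        @kernel_function(name="add", description="Add two numbers.")
        def add(
            self,
            a: Annotated[float, "First number"],
            b: Annotated[float, "Second number"],
        ) -> float:
            return add_numbers(a, b)

        @kernel_function(name="percentage", description="Percentage that part is of whole.")
        def percentage(
            self,
            part: Annotated[float, "The part"],
            whole: Annotated[float, "The whole"],
        ) -> float:
            return percentage_of(part, whole)

    return MathPlugin()


async def demo(kernel) -> None:
    kernel.add_plugin(build_math_plugin(), plugin_name="MathPlugin")
    result = await kernel.invoke(plugin_name="MathPlugin", function_name="add", a=2, b=3)
    print(result.value)
```

`demo` takes an existing `Kernel` and calls the function by plugin and function name, with no model involved.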

Automatic function calls

In the previous example we invoked the plugin functions explicitly. Of course, you can also have the kernel decide when to use functions. For this we simply add the following line to our first example, just after defining the settings:

settings.function_choice_behavior = FunctionChoiceBehavior.Auto()

This lets the kernel decide on its own which of the functions added to it as plugins to use.
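In context, the change might look like the sketch below. It assumes a kernel, chat service, and settings object built as in the basic example, with at least one plugin registered; the helper names are made up. One detail that is easy to miss is that the kernel has to be passed to the chat call, since that is where the available functions are looked up.

```python
import asyncio


def enable_auto_functions(settings):
    # Imported lazily so ask_with_functions() can be exercised with stand-ins.
    from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior

    settings.function_choice_behavior = FunctionChoiceBehavior.Auto()
    return settings


async def ask_with_functions(service, kernel, history, settings) -> str:
    """Chat call in which the model may invoke functions registered on the kernel."""
    response = await service.get_chat_message_content(
        chat_history=history,
        settings=settings,
        kernel=kernel,  # needed so registered plugin functions can be found and called
    )
    return str(response)

# Usage with the objects from the basic example:
# asyncio.run(ask_with_functions(service, kernel, history, enable_auto_functions(settings)))
```

With a math plugin registered, a question such as "What is 15 percent of 240?" gives the model the option to call the plugin instead of estimating the result.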

When Semantic Kernel fits best

Choose Semantic Kernel when AI is not a standalone experiment but a capability you want to embed into a larger software system.

It is especially strong when you want a central runtime object that manages model services, prompt-based functions, native code plugins, and function-calling behaviour in a consistent way. That makes it a good fit for applications where AI needs to interact with existing business logic, internal services, or reusable application components rather than living in an isolated notebook or script.

In practice, Semantic Kernel works best when structure matters: explicit service registration, well-defined plugins, reusable functions, and orchestration that can sit comfortably inside an established architecture. It feels less like a lightweight prototyping tool and more like infrastructure for production software.

Final Thoughts

After going through these four libraries, the main thing that jumps out is that they are all trying to solve the “agent” problem, but they clearly do not mean the same thing by it.

Smolagents feels like the one that just wants to get on with it. Give the model some tools, let it work, keep the abstractions light, and do not build a cathedral around the whole thing. There is something very pleasant about that. It does not try to turn every agent into a grand architectural statement.

PydanticAI comes at the same space from almost the opposite direction. It feels like it was built by people who have already been burned by messy LLM code and decided they would prefer types, validation, and explicit dependencies. If Smolagents is the fast, scrappy option, PydanticAI is the one that says: yes, but could this still look like real software by the time we are done?

CrewAI is the most willing to lean into the whole “team of specialists” idea. Sometimes that sounds a bit theatrical on paper, but in practice it can be a very natural way to structure a problem. Researcher does research, writer writes, manager manages. If your workflow already sounds like that in your head, CrewAI probably feels much more intuitive than trying to force everything into a graph or a giant prompt.

And then there is Semantic Kernel, which feels like it grew up in a different neighbourhood from the others. It is less about spinning up an agent quickly and more about plugging AI into a larger software system without everything turning into spaghetti. Plugins, services, functions, orchestration — it is the most “this needs to live inside an actual application” of the bunch.

So the big takeaway is not that one of these is secretly the winner and the others are there for decoration. It is that they are useful in different ways.

If you want something lean and tool-first, Smolagents makes sense. If you want typed outputs and cleaner engineering discipline, PydanticAI starts looking very attractive. If the problem really is a bunch of specialist roles handing work to one another, CrewAI fits that shape nicely. And if you are wiring AI into a bigger system and want structure from the start, Semantic Kernel is playing a different, more architectural game.