The Right Tool for One Job, Part I

Mic
Mar 24
8 min read

In previous posts (this one and this one), the focus was on the big LLM frameworks: the libraries that want to be the centre of gravity for your entire application. They handle orchestration, retrieval, memory, routing, evaluation, and increasingly enough abstractions to make you wonder whether you are building an AI system or accidentally founding a small software republic.

But not every problem calls for a full framework.

Sometimes you do not need a grand architecture. Sometimes you just need the model to return valid structured data. Or you want generation constrained tightly enough that it cannot wander off into formatting nonsense. Or you want a cleaner, more Pythonic way to make LLM calls without wrapping the whole thing in a chain, graph, pipeline, and three layers of terminology.

That is where this set of libraries becomes genuinely useful.

Instructor, Outlines, Mirascope, and Guidance all sit close to the model interface itself. They do not try to run your whole application. Instead, they improve the layer where you actually ask the model for something and then have to deal with whatever came back. And in practice, that layer is where many real headaches live.

If the large frameworks are trying to manage the whole factory, these libraries are better thought of as specialised tools for the workbench. They are narrower, sharper, and often much better behaved.

I think ChatGPT thinks these 4 libraries are cute little robots...

Instructor

Instructor wraps LLM calls with schema validation and retries. Its core idea is wonderfully narrow: you tell the model what shape the answer must have, and Instructor keeps trying until the response validates or you hit the retry limit. The project positions itself as a structured-output layer with type safety, validation, and automatic retries across multiple providers, including Gemini support with both JSON and tool-based modes.

Your code never receives invalid data: the result has to be an instance of the User class. If validation fails, Instructor retries automatically until the output validates or it runs out of attempts. The important caveat is that it can simply run out of tries. Instructor does not change how the model behaves, so if a model will not produce the right kind of answer, retrying will not make it do so.
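
To make the retry-until-valid idea concrete, here is a minimal standard-library sketch. It is not Instructor's API: `parse_user`, `extract_with_retries`, and `make_fake_model` are hypothetical names, and the stand-in model just returns a chatty answer first and a valid one second.

```python
import json
from dataclasses import dataclass


@dataclass
class User:
    name: str
    age: int


def parse_user(raw: str) -> User:
    """Parse and validate raw model text; raise ValueError if it does not fit."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("name"), str):
        raise ValueError("field 'name' must be a string")
    age = data.get("age")
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("field 'age' must be an integer")
    return User(name=data["name"], age=age)


def extract_with_retries(model, prompt: str, max_retries: int = 3) -> User:
    """Call the model until the output validates or the attempts run out."""
    last_error = None
    for _ in range(max_retries):
        # Feed the previous validation error back so the model can correct itself.
        full_prompt = prompt if last_error is None else f"{prompt}\nPrevious error: {last_error}"
        try:
            return parse_user(model(full_prompt))
        except ValueError as exc:
            last_error = exc
    raise RuntimeError(f"no valid output after {max_retries} attempts: {last_error}")


def make_fake_model(replies):
    """Hypothetical stand-in for an LLM: returns canned replies in order."""
    remaining = iter(replies)
    return lambda prompt: next(remaining)


model = make_fake_model(["Sure! The user is Ada, 36.", '{"name": "Ada", "age": 36}'])
user = extract_with_retries(model, "Extract the user from: Ada is 36.")
print(user)
```

The loop either returns a validated object or raises once the attempts are used up, which is exactly the caveat above: retries add persistence, not capability.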

Nested Models and Validation

Instructor handles arbitrary Pydantic model complexity, including nested objects and field-level validation.

The nice part is that your validation logic is not bolted on as an afterthought. It is part of the contract. If the model tries to sneak through an invalid age, the validator rejects it, Instructor feeds the error back, and the model gets another chance to behave.
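
The "validation as part of the contract" point can be sketched with plain dataclasses standing in for Pydantic models. Everything here is an assumption for illustration: the `Address` and `User` shapes, the age range, and the `feedback_for` helper that builds the message a retry would send back to the model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Address:
    city: str
    country: str

    def __post_init__(self):
        if not self.city.strip():
            raise ValueError("city must not be empty")


@dataclass
class User:
    name: str
    age: int
    address: Address  # nested object, validated by its own rules

    def __post_init__(self):
        # Field-level rule: the kind of check a Pydantic validator would carry.
        if not 0 <= self.age <= 130:
            raise ValueError(f"age must be between 0 and 130, got {self.age}")


def build_user(data: dict) -> User:
    """Build the nested object; invalid input raises before anything is returned."""
    return User(
        name=data["name"],
        age=data["age"],
        address=Address(**data["address"]),
    )


def feedback_for(data: dict) -> Optional[str]:
    """Return the message to send back to the model, or None if the data is valid."""
    try:
        build_user(data)
    except (ValueError, KeyError, TypeError) as exc:
        return f"Validation failed: {exc}. Please correct the output."
    return None


bad = {"name": "Ada", "age": -5, "address": {"city": "London", "country": "UK"}}
good = {"name": "Ada", "age": 36, "address": {"city": "London", "country": "UK"}}
print(feedback_for(bad))
print(feedback_for(good))
```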

Streaming Partial Objects

For larger structured outputs, Instructor supports streaming — you get a partially populated model object back as soon as each field is ready.

The UI can display results progressively rather than making the user stare at a spinner for three seconds. To really see the fields appear one by one, you may want a more demanding prompt than the one used here.
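
As a rough picture of what "partially populated" means, the sketch below yields a new snapshot of the object each time a field arrives. It uses only the standard library; `Report`, `stream_partial`, and `fake_field_stream` are hypothetical names, and a real stream would deliver fields token by token rather than whole.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class Report:
    # Every field is optional so an incomplete object is still a valid value.
    title: Optional[str] = None
    summary: Optional[str] = None
    score: Optional[int] = None


def stream_partial(fields: Iterable[Tuple[str, object]]) -> Iterator[Report]:
    """Yield a fresh snapshot each time another field has been filled in."""
    partial = Report()
    for name, value in fields:
        partial = replace(partial, **{name: value})
        yield partial


def fake_field_stream():
    """Hypothetical stand-in for a model streaming one finished field at a time."""
    yield "title", "Q3 review"
    yield "summary", "Revenue up, churn flat."
    yield "score", 8


snapshots = list(stream_partial(fake_field_stream()))
for snapshot in snapshots:
    print(snapshot)  # a UI would re-render here instead of printing
```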

Works with Any Provider

Instructor is not tied to a single API. The same pattern works with Anthropic, Gemini, and other supported backends, which is a large part of its appeal if your application needs to switch model vendors without rewriting the rest of the extraction layer.

Main Features

What makes Instructor especially practical is that it fits naturally into a codebase that already uses Pydantic seriously. You are not introducing a second schema language, a second validation philosophy, or a second place where business rules mysteriously go to live. You just reuse the same models and validators you already trust elsewhere.

It also supports multiple modes for some providers, including Gemini-specific approaches such as JSON-schema-style enforcement and tool-based interaction patterns. That gives you some flexibility in how strict or native you want the structured-output path to be.

In practice, Instructor often feels like a better json.loads() for LLM output, except that it is opinionated, persistent, and much less willing to accept nonsense with a straight face.

Outlines

Where Instructor adds validation on top of an existing API, Outlines does something more fundamental: it constrains generation itself. The project’s current docs describe it as guaranteeing structured outputs during generation, with support for JSON Schema, regexes, and context-free grammars, and with integrations across different model backends rather than tying itself to a single provider. Note that, at the time of writing, there is no support for Gemini.

Constrained JSON Generation

We give two 'skeleton' examples of forcing a model to generate only a specific type of JSON-compatible answer. First comes an OpenAI version, which is structurally much closer to how we call the model with other libraries.

Then comes a version using a local model. Note that this model will be downloaded from Hugging Face, so you might prefer to run it on Google Colab or a similar service.

The guarantee here is stronger than Instructor's retry loop. With a backend that supports constrained decoding, tokens that would break the schema are never sampled, so the model cannot emit a malformed response. No validation, no retries, no fallback logic needed.
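
The mechanism behind that guarantee can be shown with a toy decoder. This is a sketch of token masking in general, not Outlines' implementation: the eight-token vocabulary, the hand-written state table for the shape {"age": digits}, and the `fake_scores` model (which would love to start with "Sure") are all assumptions made up for the illustration.

```python
import json

# Toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ["Sure", "!", "{", "}", '"age"', ":", "2", "7"]

# Hand-written grammar: which tokens are legal in each state.
ALLOWED = {
    "start": {"{"},
    "key": {'"age"'},
    "colon": {":"},
    "first_digit": {"2", "7"},
    "more_digits": {"2", "7", "}"},
}


def next_state(state: str, token: str) -> str:
    if state == "start":
        return "key"
    if state == "key":
        return "colon"
    if state == "colon":
        return "first_digit"
    return "done" if token == "}" else "more_digits"


def fake_scores(generated: list) -> dict:
    """Hypothetical model scores; chatty tokens always rank highest."""
    scores = {token: 0.1 for token in VOCAB}
    scores["Sure"] = 5.0
    scores["!"] = 4.0
    scores["7" if generated and generated[-1] == "2" else "2"] = 1.0
    scores["}"] = 2.0 if len(generated) >= 5 else 0.5
    return scores


def constrained_decode(max_steps: int = 20) -> str:
    state, generated = "start", []
    while state != "done":
        if len(generated) >= max_steps:
            raise RuntimeError("ran out of steps before the object was closed")
        scores = fake_scores(generated)
        # Mask step: only tokens the grammar allows are candidates at all.
        token = max(sorted(ALLOWED[state]), key=scores.__getitem__)
        generated.append(token)
        state = next_state(state, token)
    return "".join(generated)


unconstrained_first = max(VOCAB, key=fake_scores([]).__getitem__)
result = constrained_decode()
print(unconstrained_first)  # what the model would have said on its own
print(result, json.loads(result))
```

The model's preferences still decide between legal tokens; the grammar only removes the illegal ones before the choice is made.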

Regex-Constrained Generation

Outlines can constrain output to match any regular expression. This is useful for generating structured codes, identifiers, or formatted values.

The model cannot output prose, hallucinated formats, or anything that does not match the pattern. For extracting dates, amounts, identifiers, or codes from unstructured text, this is extremely precise.
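
A character-level sketch shows why prose cannot slip through. The pattern, the hand-expanded `POSITIONS` list, and the `fake_preferences` model are assumptions for illustration; a real implementation compiles the regex into an automaton and works on tokens rather than single characters.

```python
import re
import string

PATTERN = r"[A-Z]{2}-[0-9]{4}"

# Hand-expanded form of PATTERN: one allowed-character set per output position.
POSITIONS = [set(string.ascii_uppercase)] * 2 + [{"-"}] + [set(string.digits)] * 4


def fake_preferences(generated: str) -> list:
    """Hypothetical model ranking, best first: it likes filler characters most."""
    target = "KX-2024"
    wanted = target[len(generated)]
    return [" ", "e", wanted, "A", "0", "-"]


def constrained_decode() -> str:
    out = ""
    for allowed in POSITIONS:
        # Take the highest-ranked character that the pattern permits here.
        out += next(ch for ch in fake_preferences(out) if ch in allowed)
    return out


code = constrained_decode()
print(code, bool(re.fullmatch(PATTERN, code)))
```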

Note that we are also arriving at a capability boundary here. Even though Outlines claims compatibility with different models, not every backend supports every type of constrained generation. For example, at the time of writing, the OpenAI backend does not support regex-constrained generation, so you would probably need a local model, as in the local-model version shown earlier, to run this.

Choice Constraints

Sometimes you want the model to pick from a fixed list of options and nothing else.

No parsing, no normalisation, no edge cases. Three possible outputs, all guaranteed to be from your list.
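
One way to picture a choice constraint is prefix filtering: at every step, only characters that keep the output a prefix of some allowed option remain available. The sketch below is a standard-library illustration, not Outlines' code. `CHOICES`, `RANKING`, and both helpers are hypothetical, and it assumes no option is a prefix of another.

```python
import string

CHOICES = ["positive", "negative", "neutral"]

# Hypothetical fixed model ranking, best first: filler characters, then the alphabet.
RANKING = list(dict.fromkeys(" Ithinks" + string.ascii_lowercase))


def allowed_next_chars(prefix: str, choices: list) -> set:
    """Characters that extend the prefix towards at least one option."""
    return {c[len(prefix)] for c in choices if c.startswith(prefix) and len(c) > len(prefix)}


def constrained_choice(choices: list, ranking: list) -> str:
    prefix = ""
    while prefix not in choices:
        allowed = allowed_next_chars(prefix, choices)
        # The model's favourite characters are skipped unless an option allows them.
        prefix += next(ch for ch in ranking if ch in allowed)
    return prefix


label = constrained_choice(CHOICES, RANKING)
print(label)
```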

Main Features

The bigger story with Outlines is that it increasingly positions itself as a general structured-generation layer rather than merely a JSON helper. The current docs emphasise support for JSON Schema, regular expressions, and context-free grammars, plus portability across OpenAI-style APIs, local runtimes, Ollama, vLLM, and more. That combination matters: it means Outlines is as much about provider independence as it is about syntax guarantees. If Instructor is “retry until valid,” Outlines is “make invalid generations impossible in the first place.” The trade-off is that, depending on the kind of constraint you need, your choice of model may be limited.

Mirascope

Mirascope sits in the same broad space as LangChain’s application layer, but it makes a stronger stylistic bet: LLM calls should look and feel like ordinary Python functions. Its docs describe the call decorator as a provider-agnostic wrapper around typed functions, and its API pages expose features such as tools, response models, output parsers, streaming, and multiple provider backends.

A Basic LLM Call

We start with the basic LLM call, which in this case looks quite a bit different from your average framework.

The function signature is the interface. There is no chain to build and no pipeline to configure. You call a Python function; Mirascope handles everything else.
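
The "function as interface" idea is easy to sketch with an ordinary decorator. This is not Mirascope's API: `llm_call`, `echo_backend`, and `recommend_book` are hypothetical names, and the backend is a stand-in that echoes the prompt instead of calling a model.

```python
import functools


def echo_backend(prompt: str) -> str:
    """Hypothetical stand-in for a provider call."""
    return f"[fake model reply to: {prompt}]"


def llm_call(backend):
    """Turn a prompt-building function into a function that returns the model's reply."""
    def decorator(prompt_fn):
        @functools.wraps(prompt_fn)
        def wrapper(*args, **kwargs):
            prompt = prompt_fn(*args, **kwargs)  # the function body only builds the prompt
            return backend(prompt)               # the decorator owns the transport
        return wrapper
    return decorator


@llm_call(echo_backend)
def recommend_book(genre: str) -> str:
    return f"Recommend a {genre} book."


reply = recommend_book("fantasy")
print(reply)
```

Callers see a typed Python function with a normal signature; nothing about chains or pipelines leaks into the call site.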

Structured Output with a Response Model

Now we combine the call with a structured return model.

We add a format to the wrapper and, after obtaining the response, invoke its parse() method. This gives us a Recommendation instance as a result.

Streaming

Streaming works a bit differently from the previous wrapper construction. At the time of writing, the documentation claims that the wrapper with a stream=True parameter should return an iterable object, but it currently does not.

It is a typical example of how changes between Python versions and package versions can have a huge impact on syntax and capabilities.

Provider Switching

Mirascope treats providers as swappable decorators. That is one of its cleanest ideas: the prompt and the Python function stay central, while the transport layer changes around them.
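
A small sketch of the swapping idea, assuming a registry of stand-in backends: the prompt function is written once and wrapped per provider. `BACKENDS`, `call`, and the provider names are invented for this illustration and are not Mirascope identifiers.

```python
from typing import Callable, Dict

# Hypothetical backends; real ones would call different vendor APIs.
BACKENDS: Dict[str, Callable[[str], str]] = {
    "provider_a": lambda prompt: f"A says: {prompt.upper()}",
    "provider_b": lambda prompt: f"B says: {prompt.lower()}",
}


def call(provider: str):
    """Pick the transport by name; the prompt function itself never changes."""
    backend = BACKENDS[provider]

    def decorator(prompt_fn):
        def wrapper(*args, **kwargs):
            return backend(prompt_fn(*args, **kwargs))
        return wrapper
    return decorator


def summarize_prompt(text: str) -> str:
    return f"Summarize: {text}"


summarize_a = call("provider_a")(summarize_prompt)
summarize_b = call("provider_b")(summarize_prompt)
print(summarize_a("Hello World"))
print(summarize_b("Hello World"))
```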

Main Features

The extra features that make Mirascope more than “decorators, but nicer” are its support for tool definitions, output parsers, JSON mode, and evaluation-oriented workflows in the docs. It is one of the few libraries that really leans into the idea that prompting, parsing, validation, and even testing should all live close together in normal Python code. But it is still clearly in flux, so expect the syntax and the package structure to change.

Guidance

Guidance takes a very different path. Rather than treating generation as one big opaque response, it lets you interleave fixed text and constrained generation spans. The project’s README highlights named captures, regex-constrained generation, fixed-option selection, and even context-free grammar support, along with local debugging using a mock model.

An example of using Guidance through an OpenRouter call looks as follows.

Interleaved Generation

Guidance uses that phrase for mixing generation with normal Python control flow such as conditionals, loops, and tool logic. The official tutorial describes Guidance as letting you “interleave (constrained) generation commands with traditional python control structures,” and the repo summary similarly says it can “interleave control (conditionals, loops, tool use) and generation.”
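
The interleaving pattern can be mimicked in a few lines of plain Python. This sketch is not Guidance's API: the `Program` class, its `add` and `gen` methods, and `fake_model` are hypothetical, and the canned replies stand in for real generations. It shows fixed text, two named captures, and an ordinary `if` that branches on a captured value.

```python
def fake_model(prompt: str) -> str:
    """Hypothetical stand-in: canned continuations keyed on how the prompt ends."""
    if prompt.endswith("Capital: "):
        return "Paris\nIt is a lovely city."
    if prompt.endswith("Population (millions): "):
        return "68\n"
    return ""


class Program:
    def __init__(self, model):
        self.model = model
        self.text = ""
        self.captures = {}

    def add(self, fixed: str) -> "Program":
        """Append text the program controls; the model never gets to change it."""
        self.text += fixed
        return self

    def gen(self, name: str, stop: str = "\n") -> "Program":
        """Generate one span, cut it at the stop string, and store it under a name."""
        value = self.model(self.text).split(stop)[0]
        self.captures[name] = value
        self.text += value
        return self


program = Program(fake_model)
program.add("Country: France\nCapital: ").gen("capital")
program.add("\nPopulation (millions): ").gen("population")

# Ordinary Python control flow decides what fixed text comes next.
if int(program.captures["population"]) > 50:
    program.add("\nSize: large")
else:
    program.add("\nSize: small")

print(program.text)
print(program.captures)
```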

Selecting from a Fixed Set

Using the select() method constrains the model to pick exactly one option from the list. No parsing required — the variable is already populated with the chosen value.

Generating Structured Multi-Field Output

Guidance can fill multiple named variables in a single pass, building up a structured result field by field in the order you specify. This can already be seen in the example above, since capital_raw and population_raw were both part of the same extended generated answer.

Local Example

Since some of the above-mentioned features are only supported with local models, we give a small example of that. The easiest way to run it is to start up Google Colab or a similar system. To keep it reasonably short, the example is incomplete and only contains the logic for one of three cases.

Main Features

What makes Guidance interesting is not only constrained generation but its grammar-like worldview. The README frames gen, select, and concatenation as building a constraint system powerful enough to enforce regexes and, with supported backends, full context-free grammars. It also exposes practical features such as local grammar debugging with a mock model, hidden generations, list appends, saved prompts, and log probs. In other words, Guidance is not “templating for LLMs.” It is closer to a small programming language for controlled generation. As already mentioned, some of these features require a local model and may therefore be of no use if you are stuck with a fixed backend.

Closing Thoughts

These four libraries all live near the point where your application actually touches the model, but they solve different versions of the same problem: how do you make that interaction less fragile?

Instructor makes outputs validate cleanly. Outlines prevents invalid outputs from being generated at all. Mirascope makes model calls feel like normal Python again. Guidance gives you direct, fine-grained control over the generation process itself.

None of them wants to be your whole application stack. That is not a weakness. It is the point. But this comes with restrictions. For Outlines, Mirascope, and Guidance in particular, most of the main features are restricted to local models, which can be accessed and controlled far more directly.

In practice, these are the libraries you reach for when you already know the real problem is not orchestration or retrieval or agent state. The real problem is that the output must fit a schema, the model keeps drifting out of format, the API layer is getting messy, or the prompt itself needs firmer boundaries than “please do the right thing.”

And that is one of the clearer lessons of the current ecosystem. As the big frameworks become more ambitious, the focused tools become more valuable too. Sometimes the right tool for the job is not the one that does everything. Sometimes it is the one that very firmly refuses to do anything else.
