Structured Output from LLMs: The Complete Guide

Structured output generation has become the cornerstone of reliable LLM applications. Instead of parsing unpredictable text responses, modern developers demand consistent, type-safe data structures that integrate seamlessly with their applications.

This comprehensive guide covers everything you need to know about generating structured outputs from Large Language Models, from basic concepts to advanced implementation patterns across all major providers.

What is Structured Output?

Structured output refers to LLM responses that conform to predefined data schemas instead of returning free-form text. Think JSON objects, validated data models, or typed responses that your application can reliably process.

The Problem with Unstructured Text

# Traditional approach - fragile and unreliable
import openai

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Extract name and age from: John is 25"}]
)

# You get: "The person's name is John and they are 25 years old."
# Now what? String parsing? Regex? Hope and pray?

The Structured Output Solution

import instructor
from pydantic import BaseModel
from openai import OpenAI

class Person(BaseModel):
    name: str
    age: int

client = instructor.from_openai(OpenAI())

person = client.chat.completions.create(
    model="gpt-4",
    response_model=Person,
    messages=[{"role": "user", "content": "Extract: John is 25"}]
)

print(person.name)  # "John"
print(person.age)   # 25
print(type(person)) # <class '__main__.Person'>

Why Structured Outputs Matter

1. Type Safety and Reliability

Structured outputs eliminate the guesswork. Your IDE provides autocomplete, your tests can validate schemas, and malformed responses are caught at validation time instead of surfacing as runtime surprises deep in your pipeline.

2. Integration with Existing Systems

APIs expect JSON. Databases need structured data. UI components require predictable objects. Structured outputs make LLMs compatible with your entire stack.
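
Because the result is a plain Pydantic model, handing it to the rest of your stack takes one call. A quick sketch using Pydantic v2's serialization helpers on the Person instance extracted above:

# person is the Person instance from the earlier example
payload = person.model_dump()       # plain dict, ready for a database insert or UI component
body = person.model_dump_json()     # JSON string, ready for an API response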

3. Validation and Error Handling

With schemas, you can validate outputs, retry on failures, and ensure data quality automatically.
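
As a minimal sketch of what the schema buys you, Pydantic rejects non-conforming data before it ever reaches your application:

from pydantic import BaseModel, ValidationError

class Person(BaseModel):
    name: str
    age: int

try:
    Person(name="John", age="twenty-five")  # wrong type, fails fast
except ValidationError as e:
    print(e)  # structured error you can log, surface, or feed back into a retry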

4. Performance and Latency

Many providers optimize structured output generation, leading to faster response times and lower token usage.

Core Technologies Behind Structured Outputs

Function Calling / Tool Use

The foundation of structured outputs is function calling (also called "tool use" by some providers). Instead of generating free text, the LLM "calls" a predefined function with structured parameters.

# The LLM sees this function signature
def extract_person(name: str, age: int) -> Person:
    """Extract person information from text"""
    return Person(name=name, age=age)

# And generates a structured call:
# extract_person(name="John", age=25)
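
For intuition, here is roughly what a library like Instructor sends to the provider when using tool calling with the plain OpenAI SDK. This is a sketch, not Instructor's exact payload, and extract_person is just the illustrative name from above:

import json
from openai import OpenAI

raw_client = OpenAI()
response = raw_client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Extract: John is 25"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "extract_person",
            "description": "Extract person information from text",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"}
                },
                "required": ["name", "age"]
            }
        }
    }],
    # Force the model to call our function instead of replying with free text
    tool_choice={"type": "function", "function": {"name": "extract_person"}}
)

# The arguments come back as a JSON string that matches the schema
args = json.loads(response.choices[0].message.tool_calls[0].function.arguments)
print(args)  # {'name': 'John', 'age': 25}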

JSON Schema

Under the hood, your Pydantic models are converted to JSON Schema that constrains the LLM's output format:

{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "age": {"type": "integer"}
  },
  "required": ["name", "age"]
}
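
You can inspect this conversion yourself: Pydantic v2 exposes the generated schema directly (the real output also includes titles and other metadata):

from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

print(Person.model_json_schema())
# {'properties': {'name': {'title': 'Name', 'type': 'string'},
#                 'age': {'title': 'Age', 'type': 'integer'}},
#  'required': ['name', 'age'], 'title': 'Person', 'type': 'object'}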

Native Structured Output Support

Some models (such as OpenAI's GPT-4o and GPT-4o-mini) support a native structured output mode that guarantees valid JSON conforming to your schema.
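
If you want to try that native mode without Instructor, the OpenAI SDK ships a parse helper. A minimal sketch, assuming a recent openai package and a structured-output-capable model:

from openai import OpenAI
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Extract: John is 25"}],
    response_format=Person,
)
person = completion.choices[0].message.parsed  # a validated Person instance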

Provider-Specific Implementation

OpenAI

OpenAI offers multiple approaches for structured outputs:

1. Tool / Function Calling (default)

import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI())

2. JSON Mode

client = instructor.from_openai(
    OpenAI(),
    mode=instructor.Mode.JSON
)

Key Features:

  • Guaranteed valid JSON
  • 100% schema adherence with the native structured output mode
  • Optimized performance
  • Native support in GPT-4o and GPT-4o-mini

Anthropic Claude

Anthropic uses tool calling for structured outputs:

import instructor
from anthropic import Anthropic

client = instructor.from_anthropic(Anthropic())

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    response_model=Person,
    messages=[{"role": "user", "content": "Extract: Sarah is 30"}]
)

Key Features:

  • Advanced reasoning capabilities
  • Large context windows
  • Strong performance on complex schemas

Google Gemini

Google's Gemini supports both REST API and SDK implementations:

import instructor
import google.generativeai as genai

client = instructor.from_gemini(
    genai.GenerativeModel("gemini-1.5-flash-latest")
)

Key Features:

  • Free tier with generous limits
  • Multimodal capabilities
  • Fast inference times

Other Providers

Instructor supports 20+ providers, including:

  • Cohere: Strong multilingual support
  • Mistral: European alternative with good performance
  • Groq: Ultra-fast inference speeds
  • Ollama: Local model deployment
  • Together AI: Open source model hosting

For complete provider documentation, see our integrations guide.
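
As one example of the local route, Ollama exposes an OpenAI-compatible endpoint, so the same Instructor client works against it. A sketch, assuming Ollama is running locally on its default port with a model such as llama3 pulled:

import instructor
from openai import OpenAI

client = instructor.from_openai(
    OpenAI(base_url="http://localhost:11434/v1", api_key="ollama"),
    mode=instructor.Mode.JSON,
)

person = client.chat.completions.create(
    model="llama3",
    response_model=Person,
    messages=[{"role": "user", "content": "Extract: John is 25"}],
)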

Essential Patterns and Best Practices

1. Define Clear, Descriptive Models

from typing import List, Optional
from enum import Enum

class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

class Task(BaseModel):
    """A task extracted from text with priority and metadata."""
    title: str
    description: Optional[str] = None
    priority: Priority = Priority.MEDIUM
    estimated_hours: Optional[float] = None
    tags: List[str] = []

2. Use Validation for Data Quality

from pydantic import Field, field_validator

class Email(BaseModel):
    address: str = Field(..., description="Valid email address")
    subject: str
    confidence: float = Field(..., ge=0.0, le=1.0)

    @field_validator('address')
    @classmethod
    def validate_email(cls, v):
        if '@' not in v:
            raise ValueError('Invalid email format')
        return v.lower()

3. Handle Lists and Complex Structures

class Contact(BaseModel):
    name: str
    phone: Optional[str] = None
    email: Optional[str] = None

class ExtractedContacts(BaseModel):
    """Multiple contacts extracted from a document."""
    contacts: List[Contact]
    source_confidence: float
    extraction_notes: Optional[str] = None

# Extract multiple items at once
contacts = client.chat.completions.create(
    model="gpt-4",
    response_model=ExtractedContacts,
    messages=[{
        "role": "user", 
        "content": "Extract all contacts from this business card: ..."
    }]
)

4. Implement Chain of Thought

class ReasonedAnalysis(BaseModel):
    """Analysis with explicit reasoning chain."""
    reasoning: str = Field(..., description="Step-by-step reasoning")
    conclusion: str
    confidence: float = Field(..., ge=0.0, le=1.0)

# The model will show its work
analysis = client.chat.completions.create(
    model="gpt-4",
    response_model=ReasonedAnalysis,
    messages=[{
        "role": "user",
        "content": "Analyze this financial data and explain your reasoning: ..."
    }]
)

print(f"Reasoning: {analysis.reasoning}")
print(f"Conclusion: {analysis.conclusion}")

5. Error Handling and Retries

import instructor
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=10)
)
def extract_with_retry(text: str) -> Person:
    return client.chat.completions.create(
        model="gpt-4",
        response_model=Person,
        messages=[{"role": "user", "content": f"Extract: {text}"}],
        max_retries=2
    )

Advanced Use Cases

Multimodal Structured Extraction

Extract structured data from images:

class TableData(BaseModel):
    headers: List[str]
    rows: List[List[str]]
    table_title: Optional[str] = None

# Extract table from image
table = client.chat.completions.create(
    model="gpt-4-vision-preview",
    response_model=TableData,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract the table data"},
            {"type": "image_url", "image_url": {"url": image_url}}
        ]
    }]
)

Streaming Structured Outputs

Get partial results as they're generated:

# create_partial streams validated partial objects as fields are filled in
for partial_person in client.chat.completions.create_partial(
    model="gpt-4",
    response_model=Person,
    messages=[{"role": "user", "content": "Extract: John is 25"}]
):
    print(f"Name so far: {partial_person.name}")
    print(f"Age so far: {partial_person.age}")

Parallel Processing

Extract multiple entities simultaneously:

from typing import Iterable, List

class Analysis(BaseModel):
    sentiment: str
    topics: List[str]
    summary: str

# Iterable[...] tells Instructor to extract one Analysis per review in a single call
analyses = client.chat.completions.create(
    model="gpt-4",
    response_model=Iterable[Analysis],
    messages=[{"role": "user", "content": "Analyze these reviews: ..."}]
)

Common Pitfalls and Solutions

1. Over-Complex Schemas

Problem: Deeply nested objects that confuse the model.

Solution: Keep schemas flat and use composition:

# Instead of deeply nested
class BadAddress(BaseModel):
    location: dict  # Avoid this

# Use clear structure
class Address(BaseModel):
    street: str
    city: str
    country: str

class Person(BaseModel):
    name: str
    address: Address

2. Missing Field Descriptions

Problem: Models fail because the LLM doesn't understand field purpose.

Solution: Add descriptive field documentation:

class BetterTask(BaseModel):
    title: str = Field(..., description="Brief, actionable task title")
    priority: Priority = Field(
        default=Priority.MEDIUM,
        description="Task urgency: low, medium, or high"
    )

3. Ignoring Validation Errors

Problem: Silent failures lead to poor data quality.

Solution: Implement proper error handling:

from pydantic import ValidationError

try:
    result = client.chat.completions.create(
        model="gpt-4",
        response_model=Person,
        messages=[{"role": "user", "content": text}]
    )
except ValidationError as e:
    print(f"Validation failed: {e}")
    # Handle gracefully or retry

Performance Optimization

1. Choose the Right Model

  • GPT-4o: Best balance of speed and accuracy
  • GPT-3.5-turbo: Fastest for simple schemas
  • Claude-3.5-Sonnet: Best for complex reasoning
  • Gemini-1.5-Flash: Free tier, good performance

2. Optimize Schema Design

# Efficient schema
class EfficientPerson(BaseModel):
    name: str
    age: int

# Inefficient schema (too many optional fields)
class InefficientPerson(BaseModel):
    full_name: Optional[str]
    first_name: Optional[str]
    last_name: Optional[str]
    age_in_years: Optional[int]
    birth_year: Optional[int]
    # ... 20 more optional fields

3. Use Appropriate Modes

# Tools mode (the default): native function calling, the most reliable choice for most models
client = instructor.from_openai(OpenAI(), mode=instructor.Mode.TOOLS)

# JSON mode: asks the model for a single JSON object, useful when tool calling is unavailable
client = instructor.from_openai(OpenAI(), mode=instructor.Mode.JSON)

Migration Guide

From LangChain

# LangChain approach
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate

parser = PydanticOutputParser(pydantic_object=Person)
prompt = PromptTemplate(
    template="Extract person info: {text}\n{format_instructions}",
    input_variables=["text"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

# Instructor approach (simpler)
person = client.chat.completions.create(
    model="gpt-4",
    response_model=Person,
    messages=[{"role": "user", "content": f"Extract: {text}"}]
)

From Manual JSON Parsing

# Manual approach (fragile)
import json
import openai
from pydantic import ValidationError

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Return JSON: ..."}]
)
try:
    data = json.loads(response.choices[0].message.content)
    person = Person(**data)  # Hope it works
except (json.JSONDecodeError, ValidationError):
    # Handle errors manually
    pass

# Instructor approach (reliable)
person = client.chat.completions.create(
    model="gpt-4",
    response_model=Person,
    messages=[{"role": "user", "content": "Extract: ..."}]
)
# Guaranteed to be a valid Person object

Future of Structured Outputs

  1. Native Provider Support: More providers implementing built-in structured output modes
  2. Multi-Modal Integration: Structured extraction from video, audio, and documents
  3. Real-time Streaming: Partial structured outputs for UI updates
  4. Agent Integration: Structured outputs as the foundation for reliable AI agents

What's Next for Instructor

  • Enhanced streaming capabilities
  • Multi-provider parallel processing
  • Advanced validation frameworks
  • Visual schema builders

Getting Started Today

1. Install Instructor

pip install instructor

2. Choose Your Provider

# OpenAI (recommended)
import instructor
from openai import OpenAI
client = instructor.from_openai(OpenAI())

# Anthropic
from anthropic import Anthropic
client = instructor.from_anthropic(Anthropic())

# Gemini (free tier)
import google.generativeai as genai
client = instructor.from_gemini(genai.GenerativeModel("gemini-1.5-flash"))

3. Define Your First Model

from pydantic import BaseModel

class ProductReview(BaseModel):
    product_name: str
    rating: int  # 1-5 stars
    review_text: str
    would_recommend: bool

4. Extract Structured Data

review = client.chat.completions.create(
    model="gpt-4",
    response_model=ProductReview,
    messages=[{
        "role": "user",
        "content": "Extract review: 'This phone is amazing! 5 stars, definitely recommend'"
    }]
)

print(review.product_name)      # "phone"
print(review.rating)           # 5
print(review.would_recommend)  # True

Conclusion

Structured outputs represent a fundamental shift in how we build with LLMs. By moving beyond text parsing to type-safe, validated data structures, we can build more reliable, maintainable, and powerful applications.

Whether you're building RAG systems, AI agents, or data extraction pipelines, structured outputs provide the foundation for production-ready LLM applications.

The future of AI development is structured, typed, and reliable. Start building with structured outputs today.

See Also

If you enjoy this content or want to try out instructor, please check out the GitHub repository and give us a star!