Structured Output from LLMs: The Complete Guide

Structured output generation has become the cornerstone of reliable LLM applications. Instead of parsing unpredictable text responses, modern developers demand consistent, type-safe data structures that integrate seamlessly with their applications.

This comprehensive guide covers everything you need to know about generating structured outputs from Large Language Models, from basic concepts to advanced implementation patterns across all major providers.

What is Structured Output?

Structured output refers to LLM responses that conform to predefined data schemas instead of returning free-form text. Think JSON objects, validated data models, or typed responses that your application can reliably process.

The Problem with Unstructured Text

# Traditional approach - fragile and unreliable
import openai

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Extract name and age from: John is 25"}]
)

# You get: "The person's name is John and they are 25 years old."
# Now what? String parsing? Regex? Hope and pray?

The Structured Output Solution

import instructor
from pydantic import BaseModel
from openai import OpenAI

class Person(BaseModel):
    name: str
    age: int

client = instructor.from_openai(OpenAI())

person = client.chat.completions.create(
    model="gpt-4",
    response_model=Person,
    messages=[{"role": "user", "content": "Extract: John is 25"}]
)

print(person.name)  # "John"
print(person.age)   # 25
print(type(person)) # <class '__main__.Person'>

Why Structured Outputs Matter

1. Type Safety and Reliability

Structured outputs eliminate the guesswork. Your IDE provides autocomplete, your tests can validate schemas, and malformed responses are caught at validation time instead of surfacing as runtime surprises deep in your pipeline.

2. Integration with Existing Systems

APIs expect JSON. Databases need structured data. UI components require predictable objects. Structured outputs make LLMs compatible with your entire stack.
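
Because the result is a plain Pydantic model, handing it to the rest of your stack takes one call. A quick sketch using Pydantic v2's serialization helpers on the Person instance extracted above:

# person is the Person instance from the earlier example
payload = person.model_dump()       # plain dict, ready for a database insert or UI component
body = person.model_dump_json()     # JSON string, ready for an API response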

3. Validation and Error Handling

With schemas, you can validate outputs, retry on failures, and ensure data quality automatically.
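
As a minimal sketch of what the schema buys you, Pydantic rejects non-conforming data before it ever reaches your application:

from pydantic import BaseModel, ValidationError

class Person(BaseModel):
    name: str
    age: int

try:
    Person(name="John", age="twenty-five")  # wrong type, fails fast
except ValidationError as e:
    print(e)  # structured error you can log, surface, or feed back into a retry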

4. Performance and Latency

Many providers optimize structured output generation, leading to faster response times and lower token usage.

Core Technologies Behind Structured Outputs

Function Calling / Tool Use

The foundation of structured outputs is function calling (also called "tool use" by some providers). Instead of generating free text, the LLM "calls" a predefined function with structured parameters.

# The LLM sees this function signature
def extract_person(name: str, age: int) -> Person:
    """Extract person information from text"""
    return Person(name=name, age=age)

# And generates a structured call:
# extract_person(name="John", age=25)
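
For intuition, here is roughly what a library like Instructor sends to the provider when using tool calling with the plain OpenAI SDK. This is a sketch, not Instructor's exact payload, and extract_person is just the illustrative name from above:

import json
from openai import OpenAI

raw_client = OpenAI()
response = raw_client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Extract: John is 25"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "extract_person",
            "description": "Extract person information from text",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"}
                },
                "required": ["name", "age"]
            }
        }
    }],
    # Force the model to call our function instead of replying with free text
    tool_choice={"type": "function", "function": {"name": "extract_person"}}
)

# The arguments come back as a JSON string that matches the schema
args = json.loads(response.choices[0].message.tool_calls[0].function.arguments)
print(args)  # {'name': 'John', 'age': 25}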

JSON Schema

Under the hood, your Pydantic models are converted to JSON Schema that constrains the LLM's output format:

{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "age": {"type": "integer"}
  },
  "required": ["name", "age"]
}
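
You can inspect this conversion yourself: Pydantic v2 exposes the generated schema directly (the real output also includes titles and other metadata):

from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

print(Person.model_json_schema())
# {'properties': {'name': {'title': 'Name', 'type': 'string'},
#                 'age': {'title': 'Age', 'type': 'integer'}},
#  'required': ['name', 'age'], 'title': 'Person', 'type': 'object'}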

Native Structured Output Support

Some models (such as OpenAI's GPT-4o and GPT-4o-mini) support a native structured output mode that guarantees valid JSON conforming to your schema.
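
If you want to try that native mode without Instructor, the OpenAI SDK ships a parse helper. A minimal sketch, assuming a recent openai package and a structured-output-capable model:

from openai import OpenAI
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Extract: John is 25"}],
    response_format=Person,
)
person = completion.choices[0].message.parsed  # a validated Person instance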

Provider-Specific Implementation

OpenAI

OpenAI offers multiple approaches for structured outputs:

1. Tool / Function Calling (default)

import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI())

2. JSON Mode

client = instructor.from_openai(
    OpenAI(),
    mode=instructor.Mode.JSON
)

Key Features:

  • Guaranteed valid JSON
  • 100% schema adherence with the native structured output mode
  • Optimized performance
  • Native support in GPT-4o and GPT-4o-mini

Anthropic Claude

Anthropic uses tool calling for structured outputs:

import instructor
from anthropic import Anthropic

client = instructor.from_anthropic(Anthropic())

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    response_model=Person,
    messages=[{"role": "user", "content": "Extract: Sarah is 30"}]
)

Key Features:

  • Advanced reasoning capabilities
  • Large context windows
  • Strong performance on complex schemas

Google Gemini

Google's Gemini supports both REST API and SDK implementations:

import instructor
import google.generativeai as genai

client = instructor.from_gemini(
    genai.GenerativeModel("gemini-1.5-flash-latest")
)

Key Features:

  • Free tier with generous limits
  • Multimodal capabilities
  • Fast inference times

Other Providers

Instructor supports 20+ providers, including:

  • Cohere: Strong multilingual support
  • Mistral: European alternative with good performance
  • Groq: Ultra-fast inference speeds
  • Ollama: Local model deployment
  • Together AI: Open source model hosting

For complete provider documentation, see our integrations guide.
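
As one example of the local route, Ollama exposes an OpenAI-compatible endpoint, so the same Instructor client works against it. A sketch, assuming Ollama is running locally on its default port with a model such as llama3 pulled:

import instructor
from openai import OpenAI

client = instructor.from_openai(
    OpenAI(base_url="http://localhost:11434/v1", api_key="ollama"),
    mode=instructor.Mode.JSON,
)

person = client.chat.completions.create(
    model="llama3",
    response_model=Person,
    messages=[{"role": "user", "content": "Extract: John is 25"}],
)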

Essential Patterns and Best Practices

1. Define Clear, Descriptive Models

from typing import List, Optional
from enum import Enum

class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

class Task(BaseModel):
    """A task extracted from text with priority and metadata."""
    title: str
    description: Optional[str] = None
    priority: Priority = Priority.MEDIUM
    estimated_hours: Optional[float] = None
    tags: List[str] = []

2. Use Validation for Data Quality

from pydantic import Field, field_validator

class Email(BaseModel):
    address: str = Field(..., description="Valid email address")
    subject: str
    confidence: float = Field(..., ge=0.0, le=1.0)

    @field_validator('address')
    @classmethod
    def validate_email(cls, v):
        if '@' not in v:
            raise ValueError('Invalid email format')
        return v.lower()

3. Handle Lists and Complex Structures

class Contact(BaseModel):
    name: str
    phone: Optional[str] = None
    email: Optional[str] = None

class ExtractedContacts(BaseModel):
    """Multiple contacts extracted from a document."""
    contacts: List[Contact]
    source_confidence: float
    extraction_notes: Optional[str] = None

# Extract multiple items at once
contacts = client.chat.completions.create(
    model="gpt-4",
    response_model=ExtractedContacts,
    messages=[{
        "role": "user", 
        "content": "Extract all contacts from this business card: ..."
    }]
)

4. Implement Chain of Thought

class ReasonedAnalysis(BaseModel):
    """Analysis with explicit reasoning chain."""
    reasoning: str = Field(..., description="Step-by-step reasoning")
    conclusion: str
    confidence: float = Field(..., ge=0.0, le=1.0)

# The model will show its work
analysis = client.chat.completions.create(
    model="gpt-4",
    response_model=ReasonedAnalysis,
    messages=[{
        "role": "user",
        "content": "Analyze this financial data and explain your reasoning: ..."
    }]
)

print(f"Reasoning: {analysis.reasoning}")
print(f"Conclusion: {analysis.conclusion}")

5. Error Handling and Retries

import instructor
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=10)
)
def extract_with_retry(text: str) -> Person:
    return client.chat.completions.create(
        model="gpt-4",
        response_model=Person,
        messages=[{"role": "user", "content": f"Extract: {text}"}],
        max_retries=2
    )

Advanced Use Cases

Multimodal Structured Extraction

Extract structured data from images:

class TableData(BaseModel):
    headers: List[str]
    rows: List[List[str]]
    table_title: Optional[str] = None

# Extract table from image
table = client.chat.completions.create(
    model="gpt-4-vision-preview",
    response_model=TableData,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract the table data"},
            {"type": "image_url", "image_url": {"url": image_url}}
        ]
    }]
)

Streaming Structured Outputs

Get partial results as they're generated:

# create_partial streams validated partial objects as fields are filled in
for partial_person in client.chat.completions.create_partial(
    model="gpt-4",
    response_model=Person,
    messages=[{"role": "user", "content": "Extract: John is 25"}]
):
    print(f"Name so far: {partial_person.name}")
    print(f"Age so far: {partial_person.age}")

Parallel Processing

Extract multiple entities simultaneously:

from typing import Iterable, List

class Analysis(BaseModel):
    sentiment: str
    topics: List[str]
    summary: str

# Iterable[...] tells Instructor to extract one Analysis per review in a single call
analyses = client.chat.completions.create(
    model="gpt-4",
    response_model=Iterable[Analysis],
    messages=[{"role": "user", "content": "Analyze these reviews: ..."}]
)

Common Pitfalls and Solutions

1. Over-Complex Schemas

Problem: Deeply nested objects that confuse the model.

Solution: Keep schemas flat and use composition:

# Instead of deeply nested
class BadAddress(BaseModel):
    location: dict  # Avoid this

# Use clear structure
class Address(BaseModel):
    street: str
    city: str
    country: str

class Person(BaseModel):
    name: str
    address: Address

2. Missing Field Descriptions

Problem: Models fail because the LLM doesn't understand field purpose.

Solution: Add descriptive field documentation:

class BetterTask(BaseModel):
    title: str = Field(..., description="Brief, actionable task title")
    priority: Priority = Field(
        default=Priority.MEDIUM,
        description="Task urgency: low, medium, or high"
    )

3. Ignoring Validation Errors

Problem: Silent failures lead to poor data quality.

Solution: Implement proper error handling:

from pydantic import ValidationError

try:
    result = client.chat.completions.create(
        model="gpt-4",
        response_model=Person,
        messages=[{"role": "user", "content": text}]
    )
except ValidationError as e:
    print(f"Validation failed: {e}")
    # Handle gracefully or retry

Performance Optimization

1. Choose the Right Model

  • GPT-4o: Best balance of speed and accuracy
  • GPT-3.5-turbo: Fastest for simple schemas
  • Claude-3.5-Sonnet: Best for complex reasoning
  • Gemini-1.5-Flash: Free tier, good performance

2. Optimize Schema Design

# Efficient schema
class EfficientPerson(BaseModel):
    name: str
    age: int

# Inefficient schema (too many optional fields)
class InefficientPerson(BaseModel):
    full_name: Optional[str]
    first_name: Optional[str]
    last_name: Optional[str]
    age_in_years: Optional[int]
    birth_year: Optional[int]
    # ... 20 more optional fields

3. Use Appropriate Modes

# Tools mode (the default): native function calling, the most reliable choice for most models
client = instructor.from_openai(OpenAI(), mode=instructor.Mode.TOOLS)

# JSON mode: asks the model for a single JSON object, useful when tool calling is unavailable
client = instructor.from_openai(OpenAI(), mode=instructor.Mode.JSON)

Migration Guide

From LangChain

# LangChain approach
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate

parser = PydanticOutputParser(pydantic_object=Person)
prompt = PromptTemplate(
    template="Extract person info: {text}\n{format_instructions}",
    input_variables=["text"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

# Instructor approach (simpler)
person = client.chat.completions.create(
    model="gpt-4",
    response_model=Person,
    messages=[{"role": "user", "content": f"Extract: {text}"}]
)

From Manual JSON Parsing

# Manual approach (fragile)
import json
import openai
from pydantic import ValidationError

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Return JSON: ..."}]
)
try:
    data = json.loads(response.choices[0].message.content)
    person = Person(**data)  # Hope it works
except (json.JSONDecodeError, ValidationError):
    # Handle errors manually
    pass

# Instructor approach (reliable)
person = client.chat.completions.create(
    model="gpt-4",
    response_model=Person,
    messages=[{"role": "user", "content": "Extract: ..."}]
)
# Guaranteed to be a valid Person object

Future of Structured Outputs

  1. Native Provider Support: More providers implementing built-in structured output modes
  2. Multi-Modal Integration: Structured extraction from video, audio, and documents
  3. Real-time Streaming: Partial structured outputs for UI updates
  4. Agent Integration: Structured outputs as the foundation for reliable AI agents

What's Next for Instructor

  • Enhanced streaming capabilities
  • Multi-provider parallel processing
  • Advanced validation frameworks
  • Visual schema builders

Getting Started Today

1. Install Instructor

pip install instructor

2. Choose Your Provider

# OpenAI (recommended)
import instructor
from openai import OpenAI
client = instructor.from_openai(OpenAI())

# Anthropic
from anthropic import Anthropic
client = instructor.from_anthropic(Anthropic())

# Gemini (free tier)
import google.generativeai as genai
client = instructor.from_gemini(genai.GenerativeModel("gemini-1.5-flash"))

3. Define Your First Model

from pydantic import BaseModel

class ProductReview(BaseModel):
    product_name: str
    rating: int  # 1-5 stars
    review_text: str
    would_recommend: bool

4. Extract Structured Data

review = client.chat.completions.create(
    model="gpt-4",
    response_model=ProductReview,
    messages=[{
        "role": "user",
        "content": "Extract review: 'This phone is amazing! 5 stars, definitely recommend'"
    }]
)

print(review.product_name)      # "phone"
print(review.rating)           # 5
print(review.would_recommend)  # True

Conclusion

Structured outputs represent a fundamental shift in how we build with LLMs. By moving beyond text parsing to type-safe, validated data structures, we can build more reliable, maintainable, and powerful applications.

Whether you're building RAG systems, AI agents, or data extraction pipelines, structured outputs provide the foundation for production-ready LLM applications.

The future of AI development is structured, typed, and reliable. Start building with structured outputs today.

See Also

If you enjoy this content or want to try out instructor, please check out the GitHub repository and give us a star!