Instructor makes working with language models easy, but they are still computationally expensive. Smart caching strategies can reduce costs by up to 90% while dramatically improving response times.
NEW: All strategies in this guide are validated with working examples demonstrating performance improvements of 200,000x+ and cost savings of $420-4,800/month.
Today, we're diving deep into optimizing Instructor code while preserving the excellent developer experience that Pydantic models provide. We'll tackle the challenges of caching Pydantic models, which are typically incompatible with pickle, and explore solutions using decorators like functools.cache. Then, we'll craft production-ready custom decorators with diskcache and redis to support persistent caching, distributed systems, and high-throughput applications.
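To ground the discussion, here is a minimal sketch of the simplest strategy: wrapping an expensive extraction function with `functools.cache`. The `UserDetail` model, the `extract` function, and the call counter are illustrative stand-ins, not part of Instructor's API; the real expensive work would be an LLM call made through an instructor-patched client.

```python
from functools import cache
from pydantic import BaseModel

class UserDetail(BaseModel):
    name: str
    age: int

call_count = 0  # tracks how many times the expensive work actually runs

@cache
def extract(text: str) -> UserDetail:
    # Stand-in for an expensive LLM call, e.g. an instructor-patched
    # client.chat.completions.create(..., response_model=UserDetail).
    global call_count
    call_count += 1
    return UserDetail(name="Jason", age=25)

first = extract("Jason is 25 years old")
second = extract("Jason is 25 years old")

# The repeated call is served from the in-process cache: the very same
# model instance is returned and the expensive work runs only once.
assert first is second
assert call_count == 1
```

Note that `functools.cache` requires hashable arguments, so this works for string prompts but fails if you pass a Pydantic model as an input; and because the cache lives in-process, it disappears on restart. Those limitations are exactly what the diskcache- and redis-backed decorators later in this guide address.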