Data Processing

From Messy JSON to Clean Data Models

Real-world data is messy. APIs return inconsistent formats, user inputs contain typos, and legacy systems produce malformed JSON. Traditional data processing involves brittle parsing, manual cleaning, and endless edge-case handling.

This comprehensive guide shows you how to transform chaotic data into clean, validated data models using LLMs and structured outputs. Learn battle-tested patterns for handling inconsistent formats, missing fields, and data quality issues.

Flashcard generator with Instructor + Burr

Flashcards help you break down complex topics and learn anything from biology to a new language or lines for a play. This post shows how to use LLMs to generate flashcards and kickstart your learning!

Instructor lets us get structured outputs from LLMs reliably, and Burr helps create an LLM application that's easy to understand and debug. It comes with Burr UI, a free, open-source, and local-first tool for observability, annotations, and more!

Analyzing YouTube Transcripts with Instructor

Extracting Chapter Information

Code Snippets

As always, the code is available for reference in the run.py file in the examples/youtube folder of our repo.

In this post, we'll show you how to summarise YouTube video transcripts into distinct chapters using instructor, then explore some ways you can adapt the code to different applications.
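As a rough sketch of what "distinct chapters" could look like as structured data, the block below models a chapter using only the standard library. The field names (`start_ts`, `end_ts`, `title`, `summary`) and the helper `chapters_in_order` are illustrative assumptions, not the code from run.py. With Instructor, an equivalent Pydantic model would be passed as the `response_model`.

```python
from dataclasses import dataclass


@dataclass
class Chapter:
    """One chapter of a video (hypothetical field names)."""

    start_ts: float  # chapter start, in seconds
    end_ts: float    # chapter end, in seconds
    title: str
    summary: str

    def __post_init__(self) -> None:
        # Reject chapters that end at or before their start time.
        if self.end_ts <= self.start_ts:
            raise ValueError("end_ts must be greater than start_ts")


def chapters_in_order(chapters: list[Chapter]) -> bool:
    """Return True if no chapter starts before the previous one ends."""
    return all(a.end_ts <= b.start_ts for a, b in zip(chapters, chapters[1:]))


# Made-up sample data, for illustration only.
chapters = [
    Chapter(0.0, 90.0, "Introduction", "What the video covers."),
    Chapter(90.0, 300.0, "Setup", "Installing the dependencies."),
]
```

Putting the checks in the model means a malformed chapter fails at construction time rather than later in the application.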

By the end of this article, you'll be able to build an application like the one shown in the video below.