Cookbook: Data Extraction
Turn unstructured text (PDFs, emails) into structured JSON data.
The Problem
You have 1,000 invoices in PDF format and need to load them into a SQL database. Regular expressions are too brittle for this, since invoice layouts vary from vendor to vendor. Agents are a good fit for the job.
The Pattern: Schema-Enforced Tool
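The trick is to create a "No-Op" tool whose only purpose is to define the output structure. To call the tool, the model has to produce arguments that match its schema, so the extracted data arrives already structured.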
extract-invoice.ts
import { Agent, Tool } from '@akios/sdk'
import { z } from 'zod'
// 1. Define the Schema
const InvoiceSchema = z.object({
  invoice_number: z.string(),
  date: z.string().describe("ISO 8601 format"),
  vendor: z.string(),
  line_items: z.array(z.object({
    description: z.string(),
    amount: z.number(),
    quantity: z.number()
  })),
  total: z.number()
})
// 2. Create a "Save" tool
const saveInvoice = new Tool({
  name: 'save_invoice',
  description: 'Call this tool to save the extracted invoice data.',
  schema: InvoiceSchema,
  execute: async (data) => {
    // Save to DB (replace the log call with your own insert logic)
    console.log("Saving:", data)
    return "Success"
  }
})
// 3. The Agent
const extractor = new Agent({
  name: 'Extractor',
  model: 'gpt-4o', // Smart models work best for complex extraction
  // The schema has no nullable fields, so ask for empty strings and 0
  // rather than null when a value is missing.
  systemPrompt: `You are a data entry clerk.
Extract info from the text and save it using the tool.
If a field is missing, use an empty string for text fields and 0 for numbers.`,
  tools: [saveInvoice]
})
// 4. Run
const rawText = `
INVOICE #INV-2024-001
Date: Jan 15, 2024
From: Acme Corp
Services:
- Consulting: $500 (2 hrs)
- Hosting: $50
Total Due: $550
`
await extractor.run(rawText)
Cost Optimization
For high volume, switch to a cheaper model such as `gpt-3.5-turbo` or `mistral-small` once you have verified that the prompt works reliably on a representative sample of your invoices.
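One way to check reliability before switching is to run both models over the same sample of invoices and measure how often the cheaper model's output agrees with the stronger model's. The sketch below is a minimal comparison that does not depend on the SDK. The `Invoice` type mirrors `InvoiceSchema`, and `closeEnough`, `sameInvoice`, and `agreementRate` are hypothetical helper names introduced for illustration; they are not part of `@akios/sdk`. Which fields to compare, and how strictly, is an assumption you should adapt to your data.

```typescript
// Shape mirroring InvoiceSchema (redeclared so this sketch is self-contained).
interface LineItem {
  description: string
  amount: number
  quantity: number
}

interface Invoice {
  invoice_number: string
  date: string
  vendor: string
  line_items: LineItem[]
  total: number
}

// Compare money values with a small tolerance to avoid floating-point noise.
function closeEnough(a: number, b: number, tolerance = 0.005): boolean {
  return Math.abs(a - b) <= tolerance
}

// Hypothetical helper: true when two extractions agree on the key fields.
// Vendor names are compared case-insensitively and ignoring outer whitespace.
function sameInvoice(a: Invoice, b: Invoice): boolean {
  return (
    a.invoice_number === b.invoice_number &&
    a.date === b.date &&
    a.vendor.trim().toLowerCase() === b.vendor.trim().toLowerCase() &&
    a.line_items.length === b.line_items.length &&
    closeEnough(a.total, b.total)
  )
}

// Hypothetical helper: fraction of sample invoices where the candidate
// model's output matches the reference model's output (same order in both).
function agreementRate(reference: Invoice[], candidate: Invoice[]): number {
  if (reference.length === 0 || reference.length !== candidate.length) {
    throw new Error('Both runs must cover the same non-empty sample')
  }
  const matches = reference.filter((inv, i) => sameInvoice(inv, candidate[i])).length
  return matches / reference.length
}
```

To use it, collect the arguments each model passes to `save_invoice` for the same inputs, then call `agreementRate(referenceResults, candidateResults)`. What counts as an acceptable rate depends on how costly a wrong row in your database is, so treat any threshold as a judgment call rather than a fixed rule.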