Battle-tested prompt patterns from running LLM features in production: structured output, chain-of-thought, and graceful failure handling.
After running LLM-powered features for 8 months in production, these are the patterns that survived contact with real users and messy data.
Asking an LLM to "return JSON" works 90% of the time. The other 10% crashes your parser at 2 AM.
What we do:
```python
import json

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class ExtractedEntity(BaseModel):
    name: str
    category: str
    confidence: float

SYSTEM_PROMPT = """Extract entities from the text.
Return ONLY valid JSON matching this schema:
{"name": string, "category": string, "confidence": number 0-1}
Return an array. No explanation, no markdown fences."""

def extract_entities(text: str) -> list[ExtractedEntity]:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
        temperature=0.1,
    )
    raw = response.choices[0].message.content.strip()
    # Strip markdown fences if the model adds them anyway
    if raw.startswith("```"):
        raw = raw.split("\n", 1)[1].rsplit("```", 1)[0]
    data = json.loads(raw)
    # Pydantic validates each item against the schema
    return [ExtractedEntity(**item) for item in data]
```
Why it works: Low temperature, explicit schema in the prompt, and a defensive parser that handles the most common failure mode (markdown fences).
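The fence-stripping logic is worth pulling out and unit-testing on its own, since it is the part that saves you at 2 AM. A minimal sketch of that defensive step as a standalone helper (the name `parse_llm_json` is ours, not part of the snippet above):

```python
import json

def parse_llm_json(raw: str):
    """Defensively parse JSON from an LLM response.

    Handles the most common failure mode: the model wrapping its
    output in markdown code fences despite being told not to.
    """
    raw = raw.strip()
    if raw.startswith("```"):
        # Drop the opening fence line, then the closing fence
        raw = raw.split("\n", 1)[1].rsplit("```", 1)[0]
    return json.loads(raw)
```

Because it takes a plain string, you can test it against every malformed response you have seen in your logs, without making a single API call.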
For classification tasks with nuance, asking the model to think step-by-step improved accuracy from 78% to 91%.
```text
Classify this support ticket. Think step by step:

1. What product area does this relate to?
2. Is this a bug report, feature request, or question?
3. What is the urgency (low/medium/high)?

Then return your answer as JSON: {"area": ..., "type": ..., "urgency": ...}
```
Key insight: The reasoning steps aren't just for the model; they're also an audit trail when a human reviews the classification.
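One practical consequence: the response now contains prose before the JSON, so the parser has to split the two. A minimal sketch, assuming the prompt puts a single flat JSON object at the very end of the reply (`split_reasoning_and_answer` is a hypothetical helper):

```python
import json

def split_reasoning_and_answer(response: str) -> tuple[str, dict]:
    """Separate free-form reasoning from the trailing JSON answer.

    Assumes a flat JSON object sits at the end of the response,
    as the step-by-step prompt requests.
    """
    start = response.rindex("{")
    end = response.rindex("}") + 1
    answer = json.loads(response[start:end])
    # Everything before the JSON is the audit trail
    reasoning = response[:start].strip()
    return reasoning, answer
```

Store the reasoning alongside the structured answer so reviewers can see why the model chose a label, not just which label it chose.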
LLM calls fail. Rate limits hit. Latency spikes. Your feature needs a fallback.
```python
import json
import logging

from openai import RateLimitError

logger = logging.getLogger(__name__)

async def summarize_with_fallback(text: str) -> str:
    try:
        return await call_llm(text, timeout=5.0)
    except (TimeoutError, RateLimitError):
        # Fallback: first 200 chars, cut at a word boundary
        return text[:200].rsplit(" ", 1)[0] + "..."
    except json.JSONDecodeError:
        logger.warning("LLM returned unparseable response")
        return "Summary unavailable"
```
Best practice: Every LLM call should have a timeout, a retry budget, and a non-LLM fallback.
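A retry budget can be as simple as a wrapper with jittered exponential backoff around the call. A sketch, assuming `fn` is the async LLM call (the wrapper name and defaults here are ours):

```python
import asyncio
import random

async def call_with_retries(fn, *args, retries: int = 2, base_delay: float = 0.5):
    """Retry an async call with jittered exponential backoff.

    Gives up and re-raises after `retries` failed retry attempts,
    so the caller's fallback path still runs.
    """
    for attempt in range(retries + 1):
        try:
            return await fn(*args)
        except Exception:
            if attempt == retries:
                raise  # budget exhausted; let the fallback handle it
            # Jitter spreads retries out so clients don't stampede together
            delay = base_delay * (2 ** attempt) * (0.5 + random.random())
            await asyncio.sleep(delay)
```

Keep the budget small: two or three attempts is usually the right trade-off between recovering from transient errors and adding latency on top of an already slow call.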
Instead of a 500-word system prompt explaining the format, give 2-3 examples:
```text
Convert the user message to a database query.

Example: "orders from last week" -> SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '7 days'
Example: "top customers by revenue" -> SELECT customer_id, SUM(amount) AS revenue FROM orders GROUP BY customer_id ORDER BY revenue DESC LIMIT 10

User: {user_message}
```
This is more reliable than describing the syntax rules in prose.
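Keeping the examples as data rather than prose baked into a template also makes them easy to grow as you find new failure cases. A sketch of assembling the few-shot prompt above programmatically (`build_prompt` and `FEW_SHOT_EXAMPLES` are hypothetical names):

```python
# Each entry pairs a user message with the query we want the model to emit
FEW_SHOT_EXAMPLES = [
    ("orders from last week",
     "SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '7 days'"),
    ("top customers by revenue",
     "SELECT customer_id, SUM(amount) AS revenue FROM orders "
     "GROUP BY customer_id ORDER BY revenue DESC LIMIT 10"),
]

def build_prompt(user_message: str) -> str:
    """Assemble the few-shot prompt from the example list."""
    lines = ["Convert the user message to a database query.", ""]
    for question, query in FEW_SHOT_EXAMPLES:
        lines.append(f'Example: "{question}" -> {query}')
    lines.append("")
    lines.append(f"User: {user_message}")
    return "\n".join(lines)
```

When a new phrasing confuses the model, adding one more pair to the list is usually cheaper and more reliable than lengthening the prose instructions.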
The models are impressive, but production reliability comes from everything around the model call.