✅ Link copied to clipboard!
Updated March 2025

Prompt Engineering
Cheat Sheet

The complete reference for crafting effective prompts across GPT-4o, Claude, and Gemini. 40+ techniques with examples. Bookmark this page.

🎯 Role Prompting 🔗 Chain-of-Thought 🌳 Tree of Thought 📐 Structured Output 🚫 Anti-Patterns 🌡️ Temperature Guide
🎯

Basic Techniques

Foundation patterns every prompt engineer should know

Role Prompting

Essential

Assign a persona or expertise to the model to shape tone, depth, and domain focus. The model adapts vocabulary, reasoning style, and assumptions to match.

You are a senior backend engineer with 15 years of experience in distributed systems. Review this architecture diagram and identify potential failure modes.

When to use: Whenever you need domain-specific expertise, consistent tone, or a specific perspective. Works especially well for code review, writing, and analysis.

Few-Shot Prompting

Essential

Provide 2–5 input→output examples before your actual request. The model learns the pattern and applies it consistently. More reliable than describing the format in words.

Classify the sentiment:

Text: "Love this product!" → Positive
Text: "Worst purchase ever" → Negative
Text: "It works I guess" → Neutral

Text: "Shipping was fast but the item broke" →

When to use: Classification, formatting, style matching, data transformation. Essential when zero-shot results are inconsistent.

Chain-of-Thought (CoT)

High Impact

Ask the model to reason step-by-step before giving a final answer. Dramatically improves accuracy on math, logic, and multi-step reasoning tasks.

Solve this step by step, showing your reasoning at each stage before giving the final answer:

A store offers 20% off, then an additional 15% off the sale price. What is the total discount on a $200 item?

When to use: Math problems, logical deductions, complex analysis, debugging. Add "Let's think step by step" as a simple trigger.

Step-by-Step Instructions

Essential

Break your request into numbered steps to control execution order and ensure nothing is skipped. Works like pseudocode for the model.

Follow these steps exactly:
1. Read the customer email below
2. Identify the core complaint
3. Determine urgency (low/med/high)
4. Draft a response using our brand voice
5. Flag if escalation is needed

When to use: Multi-part tasks, workflows, any time order matters. Prevents the model from skipping steps or combining them incorrectly.

Zero-Shot with Context

Essential

Provide rich context and constraints without examples. Works well when you clearly define what you want, including format, length, tone, and audience.

Write a 280-character tweet thread (5 tweets) explaining quantum computing to a 12-year-old. Use analogies. No jargon. Each tweet should stand alone.

When to use: When the task is clear enough that examples aren't needed. Good for creative tasks, simple transformations, and when you have specific constraints.

Delimiter Separation

Essential

Use clear delimiters (triple quotes, XML tags, markdown headers) to separate instructions from data. Prevents prompt injection and improves clarity.

Summarize the text between the tags.

<document>
[paste long article here]
</document>

Provide a 3-sentence summary.

When to use: Whenever your prompt includes user-supplied data, long documents, or multiple distinct sections. Critical for security.

🧠

Advanced Techniques

Power-user patterns for complex reasoning and reliability

Tree of Thought (ToT)

Advanced

The model explores multiple reasoning paths, evaluates each, and selects the best. Like CoT but branching — better for problems with multiple valid approaches.

Consider 3 different approaches to solving this:

Approach A: [describe]
Approach B: [describe]
Approach C: [describe]

For each, evaluate pros, cons, and likelihood of success. Then pick the best one and execute it.

When to use: Strategy decisions, architecture design, complex debugging where multiple hypotheses exist. Higher token cost but better outcomes.

Self-Consistency

Advanced

Generate multiple independent answers, then pick the most common result. Reduces variance and catches errors through majority voting.

Solve this problem 3 times independently, using a different method each time. Then compare your answers. If they agree, that's the answer. If they disagree, analyze why and determine the correct one.

When to use: High-stakes calculations, factual questions, when you need confidence in the answer. Best with temperature > 0 for diversity.

Self-Critique / Reflection

Advanced

Ask the model to generate an answer, critique it against specific criteria, then revise. Implements a feedback loop within a single prompt.

Write your answer. Then review it against these criteria:
- Is it factually accurate?
- Is it concise (<200 words)?
- Does it address the user's actual question?

If any criterion fails, revise and output only the final version.

When to use: Quality-critical content, customer-facing text, anywhere you'd normally review output manually.

Meta-Prompting

Advanced

Use the model to generate or improve prompts. Feed it your goal and let it craft the optimal prompt — then use that prompt. Recursion for prompts.

I want to build a prompt that extracts action items from meeting transcripts. The output should be JSON with assignee, task, and deadline.

Write the best possible prompt for this task. Include role, format spec, examples, and edge case handling.

When to use: When you're stuck crafting a prompt, building prompt templates for production, or optimizing existing prompts for better results.

ReAct (Reason + Act)

Advanced

Interleave reasoning and tool-use actions. The model thinks about what to do, executes an action, observes the result, then reasons about the next step.

Use this pattern for each step:
Thought: [what you need to figure out]
Action: [tool to use and input]
Observation: [what the tool returned]
... repeat until solved ...
Final Answer: [conclusion]

When to use: Agent workflows, tool-augmented tasks, research questions requiring multiple lookups.

Prompt Chaining

Advanced

Break complex tasks into a pipeline of simpler prompts, where each step's output feeds the next. More controllable and debuggable than one mega-prompt.

Step 1: Extract key claims from article → list
Step 2: For each claim, assess if verifiable → filtered list
Step 3: Fact-check each verifiable claim → results
Step 4: Compile into final report

When to use: Complex workflows, content pipelines, anywhere a single prompt would be unreliably long. Each step can use a different model or temperature.

📐

Output Formatting

Control the shape and structure of model outputs

JSON Mode

Structured

Force the model to output valid JSON. Use the API's response_format parameter when available, or specify the schema in your prompt.

Extract entities from this text. Return ONLY valid JSON, no markdown:

{"people": ["name"], "places": ["name"], "dates": ["ISO 8601"]}

Text: "John met Sarah in Paris on March 5, 2025."

Pro tip: Provide the exact JSON schema with example values. Use response_format: {type: "json_object"} in the API for guaranteed valid JSON.

XML Tags

Structured

Use XML-style tags to structure both input and output. Especially effective with Claude, which was trained to respect XML tag boundaries.

<instructions>Analyze the code below</instructions>

<code>
def fib(n): return fib(n-1) + fib(n-2)
</code>

Return your analysis in:
<bugs>...</bugs>
<fix>...</fix>
<explanation>...</explanation>

Pro tip: Claude particularly excels with XML. Use for separating instructions from data and for multi-part outputs. Also helps prevent prompt injection.

Markdown Formatting

Structured

Request output in markdown for human-readable structured content. Great for documentation, reports, and content that will be rendered in a UI.

Write a technical comparison in markdown with:
- H2 headers for each option
- A pros/cons bullet list under each
- A summary comparison table at the end
- Bold the recommended choice

Pro tip: Specify which markdown elements to use. Models default to markdown, but explicit instructions prevent inconsistent heading levels or missing tables.

Structured Output / Tool Use

Structured

Use the API's function calling / tool use feature to get perfectly typed responses. The model fills in a predefined schema — no parsing needed.

// Define a tool schema:
{
"name": "extract_contact",
"parameters": {
"name": "string",
"email": "string",
"phone": "string | null",
"company": "string | null"
}
}

Pro tip: Most reliable way to get structured data. Use OpenAI's tools, Anthropic's tool_use, or Google's function_calling.

🤖

Model-Specific Tips

What works best for each major model family

🟢

GPT-4o / OpenAI

  • System message is king — put persona & constraints there
  • Excellent at response_format: json_object
  • Function calling is best-in-class
  • Follows numbered instructions very well
  • Can be verbose — add "be concise" explicitly
  • May refuse edge cases — rephrase as hypothetical
  • o1/o3 models: skip CoT prompting — they do it internally
🟠

Claude / Anthropic

  • XML tags are a superpower — use <tags> everywhere
  • Excels at long documents (200K context)
  • Best at following nuanced, detailed instructions
  • Strong at code — give full file context
  • Extended thinking for hard reasoning tasks
  • Can be overly cautious — frame requests clearly
  • Prefill assistant message to steer output format
🔵

Gemini / Google

  • Excellent multimodal — images, video, audio natively
  • Massive context (1M+ tokens) — dump everything in
  • Great at grounded search with Google integration
  • Strong at structured data and tables
  • Less reliable at strict format compliance
  • May hallucinate on niche topics — provide sources
  • Use system instructions for consistent persona
🔄

Common Patterns

Copy-paste templates for everyday tasks

📝 Summarization

Summarize this in [N] bullet points for a [audience]. Focus on [what matters]. Skip [what doesn't].

<text>...</text>

Always specify: length, audience, focus area, and what to omit.

🔍 Extraction

Extract from the text below:
- Company names (exact match)
- Dollar amounts (as numbers)
- Dates (ISO 8601)

Return as JSON array. If a field is ambiguous, use null.

Define exact fields, types, and how to handle missing/ambiguous data.

🏷️ Classification

Classify this support ticket into exactly ONE category:
[bug, feature_request, billing, account, other]

Return: {"category": "...", "confidence": 0.0-1.0, "reasoning": "..."}

Enumerate all valid categories. Add confidence scores for filtering.

Generation

Write a [type] about [topic].
Tone: [casual/formal/technical]
Length: [words/paragraphs]
Audience: [who]
Must include: [key points]
Avoid: [what not to include]

The more constraints you give, the better the output. Underconstrained = generic.

📊 Analysis

Analyze this [data/code/text] and provide:
1. Key findings (top 3-5)
2. Potential issues or risks
3. Recommendations with priority
4. What you'd need to investigate further

Structure the output. Ask for prioritized findings and unknowns.

💻 Code

Write a [language] function that [does what].
Requirements:
- Input: [types and constraints]
- Output: [type and format]
- Handle: [edge cases]
- Style: [conventions]
Include tests.

Specify language, I/O types, edge cases, and conventions. Always ask for tests.

🚫

Anti-Patterns

Common mistakes that degrade output quality

Vague Instructions

❌ Bad

Make this better.

✅ Good

Improve clarity by: shortening sentences to <20 words, removing jargon, adding a concrete example.

The model can't read your mind. Define "better" explicitly.

Negative Instructions Only

❌ Bad

Don't be formal. Don't use bullet points. Don't be too long.

✅ Good

Use a casual, conversational tone. Write in flowing paragraphs. Keep under 150 words.

Tell the model what TO do, not just what NOT to do. Positive instructions are more reliable.

Overloading a Single Prompt

❌ Bad

Analyze this data, create charts, write a report, email it to the team, and schedule a meeting.

✅ Good

Step 1: Analyze this dataset and identify the top 3 trends. Present as a table.

Break complex tasks into a chain of focused prompts. Each prompt = one clear job.

No Examples or Format Spec

❌ Bad

Convert this data to a usable format.

✅ Good

Convert to CSV with columns: name, email, role. Example row: "John,john@co.com,admin"

"Usable format" is ambiguous. Show one example of the exact output you want.

Trusting Without Verification

❌ Bad

List all Supreme Court cases about X. [accepts at face value]

✅ Good

List cases about X. For each, include the citation so I can verify. Flag any you're uncertain about.

Models hallucinate. Ask for citations, confidence levels, and explicit uncertainty markers.

Ignoring Context Window

❌ Bad

[Dumps 100-page doc] Summarize everything in detail.

✅ Good

Summarize sections 3-5, focusing on methodology and key findings.

Even with large context windows, quality degrades with irrelevant content. Feed only what's needed.

🌡️

Temperature & Parameters Guide

Dial in the right settings for your task

Temperature Scale

0.0

Deterministic

Extraction, classification, math, code that must be correct

0.3

Focused

Summarization, translation, Q&A, technical writing

0.7

Balanced

General writing, explanations, most tasks (default)

1.0

Creative

Brainstorming, creative fiction, marketing copy, poetry

1.5+

Wild

Experimental only. Incoherent at high values. Rarely useful.

Top-P (Nucleus Sampling)

Controls the cumulative probability pool. Lower = more focused. Usually set one of temp OR top-p, not both.

Factual / precise0.1 – 0.3
General purpose0.8 – 0.95
Creative0.95 – 1.0

Max Tokens

Set output length limits. Prevents runaway generation and controls cost. 1 token ≈ ¾ of a word in English.

Short answers50 – 150
Paragraphs300 – 800
Full articles2000 – 4000
Long-form / code4000+

Other Parameters

Frequency Penalty 0–2

Reduces repetition. 0.3–0.8 for most tasks. Higher = more varied vocabulary.

Presence Penalty 0–2

Encourages new topics. 0.1–0.6 to keep things fresh without losing coherence.

Stop Sequences

Define tokens that halt generation. Useful for structured outputs or preventing over-generation.

💰

Token Optimization Tips

Reduce cost without losing quality

🎯 Prompt Compression

  • Remove filler words: "I would like you to please" → just state the task
  • Use abbreviations in system prompts (models understand shorthand)
  • Replace verbose instructions with 1 example (few-shot > explaining)
  • Use structured data (JSON/YAML) instead of natural language for inputs

⚡ Context Management

  • Summarize conversation history instead of keeping full messages
  • Use RAG to inject only relevant chunks, not entire documents
  • Trim system prompts — most can be cut 50% without quality loss
  • Cache frequent system prompts (OpenAI prompt caching saves 50%)

🔀 Model Routing

  • Use smaller models for simple tasks (classification, extraction)
  • Route to larger models only for reasoning/creative tasks
  • GPT-4o-mini, Claude Haiku, Gemini Flash — 90% of tasks, 10% of cost
  • Build a classifier that picks the model per request

📊 Cost Math Cheat Sheet

1,000 tokens ≈750 words
1 page of text ≈~500 tokens
Average API call ≈1K–4K tokens

💡 Rule of thumb: Input tokens are 2-10× cheaper than output tokens. Keep prompts tight, but focus optimization on reducing output length first.

⚡ Quick Reference: Technique Selector

Task Type Best Technique Temperature Key Tip
Math / LogicChain-of-Thought0.0Add "show your work"
ClassificationFew-Shot + JSON0.0Enumerate all valid labels
Code GenerationRole + Step-by-Step0.0–0.3Specify language, tests, edge cases
Creative WritingRole + Constraints0.8–1.0Give tone, audience, length
Data ExtractionStructured Output0.0Use tool/function calling
SummarizationDelimiter + Constraints0.3Specify length and focus
Strategy / PlanningTree of Thought0.5–0.7Explore 3+ approaches first
BrainstormingRole + High Temp1.0–1.2Ask for quantity, filter later