
Overview

LLM API costs are directly tied to token count. These optimization tools help you reduce token usage while maintaining accuracy, enabling cost-effective AI applications at scale.

TOON Format

63.9% average token reduction for structured data

Headroom

47-92% token savings through intelligent compression

Why Optimize?

Impact on API Costs

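Based on GPT-4 pricing ($0.03 per 1K input tokens) and roughly 85 input tokens per call, which is the size of the JSON product example below:

| Usage volume | Standard cost | With optimization (60% reduction) | Savings |
|---|---|---|---|
| 1,000 calls | $2.55 | $1.02 | $1.53 |
| 100,000 calls | $255.00 | $102.00 | $153.00 |
| 1M calls | $2,550.00 | $1,020.00 | $1,530.00 |
| 10M calls | $25,500.00 | $10,200.00 | $15,300.00 |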
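The table is plain per-call arithmetic: tokens per call / 1000 × price per 1K tokens × number of calls. A minimal sketch that reproduces it, assuming 85 input tokens per call (what $2.55 per 1,000 calls implies at $0.03/1K) and ignoring output-token charges; `api_cost` and `optimized_cost` are hypothetical helper names:

```python
def api_cost(tokens_per_call: float, calls: int, price_per_1k: float = 0.03) -> float:
    """Input-token cost in dollars: (tokens / 1000) * price per 1K tokens * calls."""
    return tokens_per_call / 1000 * price_per_1k * calls


def optimized_cost(tokens_per_call: float, calls: int, reduction: float,
                   price_per_1k: float = 0.03) -> float:
    """Cost after removing a fraction `reduction` (0 to 1) of the input tokens."""
    return api_cost(tokens_per_call * (1 - reduction), calls, price_per_1k)


# Reproduce the table rows: 85 tokens per call, 60% reduction
for calls in (1_000, 100_000, 1_000_000, 10_000_000):
    base = api_cost(85, calls)
    opt = optimized_cost(85, calls, reduction=0.60)
    print(f"{calls:>10,} calls  ${base:>10,.2f}  ${opt:>10,.2f}  saves ${base - opt:>10,.2f}")
```

Substitute your own average token count and model price. Output tokens are billed separately and are not included here.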

Toonify Token Optimization

Reduce token usage by 30-73% using TOON (Token-Oriented Object Notation) format

What is TOON?

TOON is a compact serialization format designed specifically for LLM token efficiency. It achieves CSV-like compression while maintaining structure and readability.

Key Benefits

63.9% Average Reduction

Verified across 50 real-world datasets

73.4% for Tabular Data

Optimal for structured, uniform data

Human Readable

Still easy to understand and debug

<1ms Overhead

Negligible conversion time

Format Comparison

{
  "products": [
    {"id": 101, "name": "Laptop Pro", "price": 1299},
    {"id": 102, "name": "Magic Mouse", "price": 79},
    {"id": 103, "name": "USB-C Cable", "price": 19}
  ]
}
Token savings: 85 → 39 tokens (54.1% reduction). Cost impact: $2.55 → $1.17 per 1K requests.

Implementation

Step 1: Install Toonify

pip install toonify

Step 2: Convert Data to TOON

from toon import encode, decode
import json

# Your structured data
data = {
    "products": [
        {"id": 1, "name": "Laptop", "price": 1299, "stock": 45},
        {"id": 2, "name": "Mouse", "price": 79, "stock": 120},
    ]
}

# Convert to TOON format
toon_str = encode(data)
print(f"JSON: {len(json.dumps(data))} bytes")
print(f"TOON: {len(toon_str)} bytes")

Step 3: Send to LLM

from openai import OpenAI

client = OpenAI()

# Use TOON format in prompt
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": f"Analyze this product data:\n\n{toon_str}"
    }]
)

print(response.choices[0].message.content)

Step 4: Decode if Needed

# Convert back to Python objects
original_data = decode(toon_str)
assert original_data == data  # Roundtrip verification

Real-World Example

from toon import encode
from openai import OpenAI
import json

client = OpenAI()

# Sample product catalog (could be 100s of products)
products = [
    {"id": 1, "name": "Laptop Pro", "price": 1299, "stock": 45, "category": "Electronics"},
    {"id": 2, "name": "Magic Mouse", "price": 79, "stock": 120, "category": "Accessories"},
    {"id": 3, "name": "USB-C Hub", "price": 49, "stock": 200, "category": "Accessories"},
    {"id": 4, "name": "4K Monitor", "price": 599, "stock": 30, "category": "Electronics"},
    {"id": 5, "name": "Keyboard", "price": 129, "stock": 85, "category": "Accessories"},
    # ... potentially hundreds more
]

# Measure token reduction
json_str = json.dumps(products)
toon_str = encode(products)

print(f"JSON size: {len(json_str)} bytes")
print(f"TOON size: {len(toon_str)} bytes")
print(f"Reduction: {((len(json_str) - len(toon_str)) / len(json_str) * 100):.1f}%")

# Send optimized data to LLM
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": f"""
        Analyze this product catalog and provide:
        1. Total inventory value
        2. Low stock items (< 50 units)
        3. Average price by category
        
        Data:
        {toon_str}
        """
    }]
)

print(response.choices[0].message.content)
Results:
  • JSON: 487 bytes, ~165 tokens
  • TOON: 186 bytes, ~68 tokens
  • Reduction: 58.8% tokens, 61.8% bytes

Best Use Cases

Optimal for:
  • Product catalogs
  • CSV exports
  • Database query results
  • API response data
  • Survey results
  • Analytics data
Token savings: 60-73%
Good for:
  • Configuration files
  • Uniform object arrays
  • API payloads
  • Log data
Token savings: 50-65%
Avoid TOON for:
  • Highly nested data (greater than 3 levels)
  • Irregular/heterogeneous structures
  • Small payloads (less than 100 bytes)
  • Binary data
  • When JSON compatibility is critical
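The guidance above can be turned into a rough pre-check before encoding. A minimal sketch, assuming the thresholds listed above (uniform objects, at most 3 levels of nesting, payloads of at least 100 bytes); `looks_toon_friendly`, `nesting_depth`, and `is_uniform_array` are hypothetical helpers, and how "levels" are counted here is an assumption:

```python
import json
from typing import Any


def nesting_depth(value: Any) -> int:
    """Depth of nested dicts/lists; a primitive has depth 0."""
    if isinstance(value, dict):
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((nesting_depth(v) for v in value), default=0)
    return 0


def is_uniform_array(value: Any) -> bool:
    """True for a non-empty list of flat dicts that all share the same keys."""
    if not isinstance(value, list) or not value or not all(isinstance(v, dict) for v in value):
        return False
    keys = set(value[0])
    return all(
        set(row) == keys and not any(isinstance(x, (dict, list)) for x in row.values())
        for row in value
    )


def looks_toon_friendly(data: Any) -> bool:
    """Heuristic fit check mirroring the 'optimal / avoid' lists (illustrative only)."""
    if len(json.dumps(data).encode("utf-8")) < 100:  # small payloads: little to gain
        return False
    if nesting_depth(data) > 3:                       # deeply nested data
        return False
    # Look for a uniform array at the top level or one level down
    candidates = [data] + (list(data.values()) if isinstance(data, dict) else [])
    return any(is_uniform_array(c) for c in candidates)


catalog = [{"id": i, "name": f"Product {i}", "price": 10 * i, "stock": 5 * i} for i in range(1, 6)]
print(looks_toon_friendly({"products": catalog}))
```

A check like this only approximates the criteria; comparing actual token counts on your own data remains the reliable test.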

Interactive Demo

import streamlit as st
import json
from toon import encode, decode
import tiktoken

st.title("🎯 Toonify Token Optimizer")

# Token counter
enc = tiktoken.encoding_for_model("gpt-4")

def count_tokens(text):
    return len(enc.encode(text))

# Input data
data_input = st.text_area(
    "Paste your JSON data",
    value=json.dumps({
        "products": [
            {"id": 1, "name": "Laptop", "price": 1299},
            {"id": 2, "name": "Mouse", "price": 79},
        ]
    }, indent=2),
    height=200
)

if data_input:
    try:
        # Parse JSON
        data = json.loads(data_input)
        
        # Convert to TOON
        toon_str = encode(data)
        
        # Calculate metrics
        json_tokens = count_tokens(data_input)
        toon_tokens = count_tokens(toon_str)
        reduction = ((json_tokens - toon_tokens) / json_tokens) * 100
        
        # Display results
        col1, col2 = st.columns(2)
        
        with col1:
            st.subheader("JSON Format")
            st.code(data_input, language="json")
            st.metric("Tokens", json_tokens)
            st.metric("Bytes", len(data_input))
        
        with col2:
            st.subheader("TOON Format")
            st.code(toon_str, language="text")
            st.metric("Tokens", toon_tokens)
            st.metric("Bytes", len(toon_str))
        
        # Savings
        st.success(f"Token Reduction: {reduction:.1f}%")
        
        # Cost calculator
        st.subheader("Cost Savings Calculator")
        requests = st.number_input("Number of API requests", value=1000, step=1000)
        
        gpt4_cost_per_1k = 0.03
        json_cost = (json_tokens / 1000) * gpt4_cost_per_1k * requests
        toon_cost = (toon_tokens / 1000) * gpt4_cost_per_1k * requests
        savings = json_cost - toon_cost
        
        st.write(f"**JSON cost**: ${json_cost:.2f}")
        st.write(f"**TOON cost**: ${toon_cost:.2f}")
        st.write(f"**💰 Savings**: ${savings:.2f}")
        
    except Exception as e:
        st.error(f"Error: {e}")
Run with: streamlit run toonify_app.py

Performance Benchmarks

| Dataset type | Avg reduction | Best case | Worst case |
|---|---|---|---|
| Tabular | 68.5% | 73.4% | 62.1% |
| Structured JSON | 61.2% | 67.8% | 54.3% |
| Nested JSON | 48.7% | 56.2% | 41.5% |
| Mixed | 55.4% | 63.9% | 47.8% |

Headroom Context Optimization

Reduce token usage by 47-92% through intelligent context compression for AI agents

What is Headroom?

Headroom is a context optimization layer that compresses tool outputs and conversation history while preserving accuracy. Unlike simple truncation, it uses statistical analysis to keep what matters.

Key Benefits

47-92% Token Reduction

Verified across production workloads

Zero Code Changes

Transparent proxy integration

Reversible Compression

LLM can retrieve original data via CCR

Provider Caching

Optimizes for OpenAI/Anthropic caching

Core Features

Statistical Compression

Keeps:
  • First N items (context)
  • Last N items (recency)
  • Anomalies (statistical outliers)
  • Query-relevant matches
Removes:
  • Repetitive boilerplate
  • Redundant middle sections
  • Low-information content
from headroom import SmartCrusher

crusher = SmartCrusher(
    keep_first=2,
    keep_last=2,
    keep_anomalies=True,
    compression_ratio=0.3
)

# Compress a tool's output (tool_output holds the raw result returned by the tool)
compressed = crusher.compress(tool_output)
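To make the keep/remove rules concrete, here is a standalone sketch of the same selection idea in plain Python. It is not Headroom's `SmartCrusher` and does not use its API; the function `crush`, the `metric` field, and the z-score outlier test are assumptions chosen for illustration. Anomalies are values of one numeric field at least `z_threshold` standard deviations from the mean, and query relevance is a case-insensitive substring match on any query word.

```python
import statistics
from typing import Any, Optional


def crush(
    items: list[dict[str, Any]],
    query: str = "",
    keep_first: int = 2,
    keep_last: int = 2,
    metric: Optional[str] = None,  # hypothetical: numeric field scanned for outliers
    z_threshold: float = 3.0,
) -> list[int]:
    """Return the indices of items to keep, in original order (illustrative only)."""
    n = len(items)
    keep = set(range(min(keep_first, n)))         # first N: starting context
    keep.update(range(max(n - keep_last, 0), n))  # last N: most recent state

    # Anomalies: statistical outliers on one numeric field
    if metric is not None and n >= 2:
        values = [float(item[metric]) for item in items]
        mean = statistics.fmean(values)
        stdev = statistics.pstdev(values)
        if stdev > 0:
            keep.update(i for i, v in enumerate(values) if abs(v - mean) / stdev >= z_threshold)

    # Query-relevant matches: any query word appears in the serialized item
    terms = [t.lower() for t in query.split()]
    keep.update(i for i, item in enumerate(items) if any(t in str(item).lower() for t in terms))

    return sorted(keep)


# 100 log entries: a connection spike at index 44, a FATAL error at index 66
logs = [{"level": "INFO", "conns": 10} for _ in range(100)]
logs[44]["conns"] = 500
logs[66]["level"] = "FATAL"

kept = crush(logs, query="fatal", metric="conns")
print(kept)  # [0, 1, 44, 66, 98, 99]
```

Here 6 of the 100 entries survive: the first two, the last two, the connection spike, and the FATAL line.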

Installation & Setup

Step 1: Install Headroom

pip install headroom-ai

Step 2: Choose Integration Method

# Start proxy server
headroom proxy --port 8787

# Point existing tools at proxy
export OPENAI_BASE_URL=http://localhost:8787/v1
export ANTHROPIC_BASE_URL=http://localhost:8787

# Use tools normally - compression is automatic

Real-World Performance

These are actual results from production API calls, not estimates.

Needle in Haystack Test

Setup:
  • 100 production log entries
  • 1 critical FATAL error at position 67
  • Question: “What caused the outage? Error code? Fix?”
Baseline (no compression):
Tokens: 10,144
Cost: $0.30
Response time: 4.8s
Answer: ✅ Correct (payment-gateway, PG-5523, increase max_connections)
With Headroom:
Tokens: 1,260 (87.6% reduction)
Cost: $0.04 (86.7% savings)
Response time: 1.2s (75% faster)
Answer: ✅ Correct (same details)
What Headroom kept:
  • Position 67: FATAL error (the needle)
  • Positions 1-2: Context (timeline start)
  • Positions 99-100: Most recent state
  • Position 45: Anomaly (connection spike)
What Headroom removed:
  • 96 INFO/DEBUG entries
  • Repetitive health checks
  • Standard operational logs
Result: Same accuracy, 87.6% fewer tokens

Configuration

HeadroomChatModel (class): LangChain integration with compression

Best Use Cases

Optimal for:
  • Multi-tool workflows
  • Code search agents
  • Database query agents
  • API integration agents
  • Log analysis agents
Average savings: 75-90%
Ideal for:
  • Code search results (100+ files)
  • Database query results (1000+ rows)
  • API responses (large JSON)
  • Log files (10K+ lines)
  • Documentation searches
Average savings: 80-92%
Useful for:
  • Long chat sessions
  • Multi-turn debugging
  • Context-heavy conversations
  • Memory-intensive agents
Average savings: 50-70%

Safety Guarantees

Never Removes Human Content

User and assistant messages are always preserved in full

Never Breaks Tool Pairing

Tool calls and responses stay together

Parse Failures = No-op

Malformed content passes through unchanged

Reversible Compression

LLM can retrieve original data via CCR
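The first three guarantees above amount to rules about which messages a compressor may touch. A minimal sketch, assuming OpenAI-style message dicts (`role`, `content`, `tool_call_id`) and a hypothetical `compress` callback that raises `ValueError` on content it cannot parse; `compress_tool_outputs` and `first_and_last` are illustrative names, not Headroom's code:

```python
import json
from typing import Callable


def compress_tool_outputs(messages: list[dict], compress: Callable[[str], str]) -> list[dict]:
    """Compress tool results only; all other messages pass through untouched."""
    result = []
    for msg in messages:
        if msg.get("role") != "tool":
            result.append(msg)  # user, assistant and system messages are never modified
            continue
        try:
            content = compress(msg["content"])
        except ValueError:
            content = msg["content"]  # parse failure: leave the content unchanged
        # The tool message itself (and its tool_call_id) is always kept, so every
        # assistant tool call still has its matching tool response
        result.append({**msg, "content": content})
    return result


def first_and_last(content: str) -> str:
    """Hypothetical compressor: keep only the first and last items of a JSON array."""
    items = json.loads(content)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(items, list):
        raise ValueError("expected a JSON array")
    return json.dumps(items if len(items) <= 2 else [items[0], items[-1]])


messages = [
    {"role": "user", "content": "Which jobs failed?"},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "type": "function", "function": {"name": "list_jobs", "arguments": "{}"}},
        {"id": "call_2", "type": "function", "function": {"name": "read_log", "arguments": "{}"}},
    ]},
    {"role": "tool", "tool_call_id": "call_1", "content": json.dumps(list(range(50)))},
    {"role": "tool", "tool_call_id": "call_2", "content": "plain text, not JSON"},
]

for msg in compress_tool_outputs(messages, first_and_last):
    print(msg["role"], msg.get("content"))
```

The JSON tool result shrinks to its first and last items, while the plain-text result fails to parse and passes through unchanged.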

Best Practices

| Use case | Tool | Expected savings |
|---|---|---|
| Structured data (JSON, CSV) | TOON | 60-73% |
| AI agent tool outputs | Headroom | 75-92% |
| Large API responses | Both | 80-95% |
| Conversation history | Headroom | 50-70% |
| Mixed/nested JSON | TOON | 45-60% |

Measure the token and cost impact of an optimization:

import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")

def measure_optimization(original, optimized):
    original_tokens = len(enc.encode(original))
    optimized_tokens = len(enc.encode(optimized))
    
    reduction = ((original_tokens - optimized_tokens) / original_tokens) * 100
    
    # Calculate cost savings (GPT-4 pricing)
    cost_per_1k = 0.03
    original_cost = (original_tokens / 1000) * cost_per_1k
    optimized_cost = (optimized_tokens / 1000) * cost_per_1k
    
    return {
        "original_tokens": original_tokens,
        "optimized_tokens": optimized_tokens,
        "reduction_percent": reduction,
        "cost_savings": original_cost - optimized_cost
    }
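
Monitor savings over time:

from headroom.integrations import HeadroomChatModel
import logging

logging.basicConfig(level=logging.INFO)  # logging.info() output is hidden at the default WARNING level

# Set up monitoring (base_model is the LangChain chat model you already use)
llm = HeadroomChatModel(base_model)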

# Log metrics
def log_metrics():
    logging.info(f"Total tokens saved: {llm.total_tokens_saved}")
    logging.info(f"Total cost saved: ${llm.total_cost_saved:.2f}")
    logging.info(f"Compression ratio: {llm.avg_compression_ratio:.1%}")

# Call after batch of requests
log_metrics()

Combine both tools:

from toon import encode
from headroom.integrations import HeadroomChatModel

# Use both TOON and Headroom (base_model is the LangChain chat model being wrapped)
def optimized_agent_call(structured_data, tools):
    # 1. Convert structured data to TOON
    toon_data = encode(structured_data)
    
    # 2. Use Headroom for tool outputs
    llm = HeadroomChatModel(base_model)
    
    # 3. Combine for maximum savings
    response = llm.invoke(
        f"Analyze this data and search for patterns:\n{toon_data}"
    )
    
    return response

# Result: 80-95% total token reduction

Cost Calculator

import streamlit as st

st.title("LLM Optimization Cost Calculator")

# Inputs
col1, col2 = st.columns(2)

with col1:
    avg_tokens = st.number_input("Average tokens per request", value=5000, step=100)
    requests_per_day = st.number_input("Requests per day", value=1000, step=100)
    
with col2:
    model = st.selectbox("Model", ["GPT-4", "GPT-4o", "Claude 3.5 Sonnet"])
    optimization = st.slider("Token reduction %", 0, 95, 60)

# Pricing
pricing = {
    "GPT-4": 0.03,
    "GPT-4o": 0.0025,
    "Claude 3.5 Sonnet": 0.003
}

cost_per_1k = pricing[model]

# Calculate
monthly_requests = requests_per_day * 30
yearly_requests = requests_per_day * 365

# Baseline
baseline_monthly = (avg_tokens / 1000) * cost_per_1k * monthly_requests
baseline_yearly = (avg_tokens / 1000) * cost_per_1k * yearly_requests

# Optimized
optimized_tokens = avg_tokens * (1 - optimization / 100)
optimized_monthly = (optimized_tokens / 1000) * cost_per_1k * monthly_requests
optimized_yearly = (optimized_tokens / 1000) * cost_per_1k * yearly_requests

# Display
st.subheader("Cost Analysis")

col1, col2, col3 = st.columns(3)

with col1:
    st.metric("Monthly Baseline", f"${baseline_monthly:.2f}")
    st.metric("Monthly Optimized", f"${optimized_monthly:.2f}")
    st.metric("Monthly Savings", f"${baseline_monthly - optimized_monthly:.2f}", delta=f"-{optimization}%")

with col2:
    st.metric("Yearly Baseline", f"${baseline_yearly:.2f}")
    st.metric("Yearly Optimized", f"${optimized_yearly:.2f}")
    st.metric("Yearly Savings", f"${baseline_yearly - optimized_yearly:.2f}", delta=f"-{optimization}%")

with col3:
    st.metric("3-Year Baseline", f"${baseline_yearly * 3:.2f}")
    st.metric("3-Year Optimized", f"${optimized_yearly * 3:.2f}")
    st.metric("3-Year Savings", f"${(baseline_yearly - optimized_yearly) * 3:.2f}", delta=f"-{optimization}%")

Resources

Toonify GitHub

TOON format library and examples

Headroom GitHub

Context optimization framework

Example Apps

Complete optimization demos

OpenAI Tokenizer

Test token counting