Are these prices current?

Provider pricing changes regularly. Re-check the official documentation before making capacity decisions. Pricing on this calculator reflects published rates at the time of the last review.

Why do output tokens cost more?

Output generation is more computationally expensive because it is autoregressive: the model produces one token at a time, and each new token requires another pass through the model. Input is processed once, in parallel.

How do I count tokens?

Use the provider's tokenizer (tiktoken for OpenAI; other providers offer equivalents). Rough rule of thumb: 1 token ≈ 0.75 English words. Specialized content such as code and JSON usually needs more tokens per word than prose.

Should I use a smaller model?

Smaller models are dramatically cheaper and often sufficient. Test on your specific use case; quality often plateaus before cost does.

How do caching discounts work?

Anthropic and OpenAI both offer prompt caching: tokens in a cached prompt prefix are reused at a lower cost. This is useful when many requests share a long initial context (system prompts, RAG context). Discounts on the cached portion range from 50% to 90%.
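As a rough illustration of the arithmetic, the sketch below computes input cost when part of a prompt is served from cache. The function name, the per-million price, and the flat-discount model are assumptions for illustration; real providers may also charge for cache writes or apply minimum prefix lengths.

```python
def effective_input_cost(total_tokens, cached_tokens, price_per_million, cache_discount):
    """Input cost in dollars when part of the prompt is read from cache.

    cache_discount is the fractional discount on cached tokens
    (0.5 to 0.9 matches the 50-90% range quoted above).
    price_per_million is a placeholder rate, not a published price.
    """
    if not 0 <= cached_tokens <= total_tokens:
        raise ValueError("cached_tokens must be between 0 and total_tokens")
    if not 0 <= cache_discount <= 1:
        raise ValueError("cache_discount must be between 0 and 1")
    uncached_tokens = total_tokens - cached_tokens
    # Cached tokens are billed at the discounted rate, the rest at full price.
    billable_tokens = uncached_tokens + cached_tokens * (1 - cache_discount)
    return billable_tokens * price_per_million / 1_000_000


# Hypothetical request: 10,000 input tokens, 8,000 of them a shared cached prefix.
with_cache = effective_input_cost(10_000, 8_000, price_per_million=3.0, cache_discount=0.9)
without_cache = effective_input_cost(10_000, 0, price_per_million=3.0, cache_discount=0.9)
print(f"with cache: ${with_cache:.4f}, without: ${without_cache:.4f}")
```

The larger the shared prefix relative to the whole prompt, the closer the overall saving gets to the discount rate itself.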

AI Tokens Per Word Calculator

Estimate the number of LLM tokens for a given word count.

Inputs: Words, Lang (1=English, 2=European, 3=Asian)

AI Insight: Token counts vary by language: English averages 0.75 words per token, while Chinese, Japanese, and Korean often run 1-2 tokens per character. Code and JSON tokenize less efficiently than prose, so always estimate using a sample of your actual content, not a generic ratio.

Reviewed by the CalcNest Editorial Team · Last reviewed: May 2026 · Methodology

Formula

Tokens ≈ Words × Lang Factor

Example

1000 English words → 1300 tokens.

Tokens are not words. They're chunks of text that LLMs use as their basic unit, and the ratio between tokens and words varies more than most developers realize. A 1,000-word document in English is roughly 1,300 tokens. The same content translated to Chinese might be 1,800 tokens. As JSON, it might be 2,500. As code, it could hit 3,000. Knowing your token ratio matters for cost projection and context window planning.

What tokenizers actually do

LLMs don't read text directly. Before the model sees anything, a tokenizer splits the text into pieces — usually subword chunks based on statistical frequency. Common words become single tokens. Rare words get split into multiple tokens. Whitespace and punctuation sometimes get their own tokens, sometimes get merged with adjacent words.

Different models use different tokenizers. OpenAI's tiktoken library uses the cl100k_base vocabulary for GPT-4 and o200k_base for GPT-4o. Claude uses its own tokenizer. Llama 1 and 2 use SentencePiece. These produce different token counts for the same input text, sometimes by 20-30%.

Tokens-per-word by language

English is unusually efficient for tokenizers — most common English words map to single tokens, and the tokenizer vocabularies were primarily trained on English text. Other languages can be substantially less efficient.

Language	Tokens per word (avg)	Tokens per 1,000 words
English	1.3-1.4	~1,350
Spanish	1.5-1.7	~1,600
French	1.5-1.7	~1,600
German	1.7-2.0	~1,850
Russian (Cyrillic)	2.5-3.0	~2,750
Arabic	3.0-4.0	~3,500
Chinese (per character)	1.0-2.0	varies by script
Japanese (per character)	1.5-2.5	varies by script
Hindi (Devanagari)	3.0-4.5	~3,800

This matters for two reasons. First, multilingual applications cost more per "equivalent meaning" when operating in non-English languages. A French user message and an English user message conveying the same idea use different token counts. Second, context windows fill faster in non-English languages. A "128K token" context window holds about 95K English words but only 35-40K Hindi words.
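The table can be turned into a small estimator. The sketch below uses the midpoint of each "Tokens per 1,000 words" entry as a per-word factor; the function names are illustrative, and Chinese and Japanese are left out because the table gives per-character figures for them.

```python
# Per-word factors taken from the midpoints in the language table above.
TOKENS_PER_WORD = {
    "english": 1.35,
    "spanish": 1.6,
    "french": 1.6,
    "german": 1.85,
    "russian": 2.75,
    "arabic": 3.5,
    "hindi": 3.8,
}


def estimate_tokens(words, language="english"):
    """Estimate token count as Words x Lang Factor."""
    return round(words * TOKENS_PER_WORD[language.lower()])


def words_that_fit(context_tokens, language="english"):
    """Estimate how many words of a language fit in a token budget."""
    return int(context_tokens / TOKENS_PER_WORD[language.lower()])


for lang in ("english", "german", "hindi"):
    print(lang, estimate_tokens(1000, lang), words_that_fit(128_000, lang))
```

These are planning estimates only; the actual ratio depends on the tokenizer and on the content itself.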

Special cases that surprise developers

Content type	Tokens per word	Why
Standard English prose	1.3-1.4	Baseline; tokenizer optimized for this
Technical jargon	1.5-1.8	Specialized terms often split into multiple tokens
Code (Python, JS)	2.0-3.0	Symbols, snake_case, camelCase all add tokens
JSON / XML	2.5-3.5	Brackets, quotes, structural symbols
URLs and long IDs	2-5 tokens per "word"	Random strings can't be tokenized efficiently
Emojis	1-3 tokens each	Complex emoji sequences use multiple tokens
Mathematical notation	2-4 tokens per term	LaTeX symbols, equations, formulas
Repeated content / common phrases	0.8-1.0	Some phrases get compressed to single tokens

How "Hello, world!" tokenizes across models

Why context windows matter

Every model has a maximum context window — the total tokens (input + output combined) it can process in one request. Going beyond the limit produces an error. The numbers sound large but fill up faster than you'd expect.

Model	Context window	Approximate English words
GPT-3.5 Turbo	16K tokens	~12,000 words
GPT-4o	128K tokens	~95,000 words
Claude 3.5 Sonnet	200K tokens	~150,000 words
Gemini 1.5 Pro	2M tokens	~1.5 million words
Gemini 1.5 Flash	1M tokens	~750,000 words
Llama 3.1 (open)	128K tokens	~95,000 words

For perspective: a typical novel is 70,000-100,000 words. Claude can hold an entire novel in context. Gemini Pro can hold an entire textbook. These large context windows enable use cases (analyzing entire codebases, long legal documents, full book Q&A) that weren't possible with 8K or 32K windows.
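Because the window covers input and output combined, a quick pre-flight check is to subtract both from the limit. The sketch below assumes English prose at 1.35 tokens per word; `fits_in_context` and `max_output_tokens` are illustrative names, not part of any provider's API.

```python
def fits_in_context(input_words, context_window, max_output_tokens, tokens_per_word=1.35):
    """Return (fits, estimated_input_tokens, headroom_tokens).

    headroom_tokens is what remains after reserving space for the reply;
    a negative value means the request would exceed the window.
    """
    input_tokens = round(input_words * tokens_per_word)
    headroom = context_window - input_tokens - max_output_tokens
    return headroom >= 0, input_tokens, headroom


# A 90,000-word novel against a 128K window, reserving 4,000 tokens for output.
print(fits_in_context(90_000, 128_000, 4_000))
# A 100,000-word novel against the same window.
print(fits_in_context(100_000, 128_000, 4_000))
```

Reserving output space up front avoids the case where the input fits but the model has no room left to answer.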

Counting tokens before sending

For production cost projection, count tokens before sending. Each provider offers tools:

OpenAI: tiktoken library (Python, JS). Returns exact token counts for OpenAI models.
Anthropic: count_tokens API endpoint. Free, returns Claude-specific token count.
Google: count_tokens method on the Gemini API. Free.
Hugging Face Transformers: tokenizer libraries for all major open models.

For rough estimation without an API call, use 0.75 words per token as a default for English. So 4,000 tokens ≈ 3,000 words. This estimate is good enough for planning but not for billing-accurate calculations.

A brief history of tokenization

Modern LLM tokenization is built on Byte-Pair Encoding (BPE), an algorithm Philip Gage published in 1994 for data compression. Researchers at the University of Edinburgh adapted it for neural machine translation in 2015-2016 (Sennrich et al., 2016), and it became the standard for transformer models, including OpenAI's GPT series. The basic idea: start with single characters, then iteratively merge the most common pairs into single tokens until you reach a target vocabulary size.
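The merge loop is short enough to sketch. This is a toy character-level version for illustration only: production tokenizers work on bytes, pre-split text with regular expressions, and learn tens of thousands of merges. The function names and the sample corpus are made up for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe_merges(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    # Each distinct word starts as a tuple of single characters, with its frequency.
    corpus = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            new_corpus[merge_pair(symbols, best)] += freq
        corpus = new_corpus
    return merges


def tokenize(word, merges):
    """Apply learned merges, in the order they were learned, to a new word."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


sample = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
merges = learn_bpe_merges(sample, num_merges=4)
print(merges)
print(tokenize("lowest", merges))
```

With this sample, "lowest" never appears in the training words, yet it splits into two learned pieces. That is the property that lets BPE handle rare words without an unknown-word token.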

GPT-2 used a 50,257-token vocabulary, and GPT-3 kept the same one. GPT-4 grew to 100,277 tokens (cl100k_base). GPT-4o pushed to 200,019 (o200k_base). Each expansion improved efficiency for non-English text and code: what took 5 tokens in GPT-3 might take 3 tokens in GPT-4o.

Different tokenization algorithms have different strengths. BPE (used by OpenAI and many others) excels at general text. WordPiece (BERT, ELECTRA) handles morphology better — useful for languages with complex word forms. SentencePiece (Llama, T5) treats whitespace as just another character, which works better across languages with different word-boundary conventions.

Prompt compression: getting more from fewer tokens

For applications hitting context window limits or paying high token costs, prompt compression is the technical frontier. Several approaches work:

Technique	How it works	Compression ratio
Removing redundancy manually	Audit your prompts; eliminate repeated instructions	10-30% savings
Summarization preprocessing	Use a cheap model to summarize before sending to expensive model	3-10× compression possible
LLMLingua / Selective Context	Algorithmically remove low-information tokens	2-5× compression with quality preservation
Soft prompts / embedding tokens	Replace text instructions with learned embedding vectors	20-50× for repeated tasks
Conversation summarization	Summarize old turns instead of keeping full history	Linear-to-constant memory

The newest approaches use "soft prompts" — embedding-level representations that aren't human-readable but encode instructions more densely than text. For applications doing the same task repeatedly (a SaaS feature with fixed system instructions), soft prompts can replace 500-2,000 text tokens with 10-20 embedding tokens.
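Of the techniques in the table, conversation summarization is the easiest to sketch. The version below keeps the newest messages that fit a token budget and folds everything older into one summary message. Both helpers are stand-ins: `rough_count` is a word-based estimate rather than a real tokenizer, and `placeholder_summary` marks the spot where a call to a cheap model would go.

```python
def rough_count(text):
    """Stand-in token counter: English words x 1.35 (not a real tokenizer)."""
    return round(len(text.split()) * 1.35)


def placeholder_summary(messages):
    """Stand-in summarizer; a real system would call a cheap model here."""
    return f"[summary of {len(messages)} earlier messages]"


def compact_history(messages, budget, count_tokens=rough_count, summarize=placeholder_summary):
    """Keep the newest messages within `budget` tokens; summarize the rest.

    The summary message itself is not counted against the budget,
    so leave some slack when choosing the budget.
    """
    kept = []
    used = 0
    # Walk backwards so the most recent messages are kept first.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    older = messages[: len(messages) - len(kept)]
    if older:
        return [summarize(older)] + kept
    return kept


history = [f"message number {i} with some words" for i in range(1, 6)]
print(compact_history(history, budget=20))
```

The history sent with each request then stays roughly constant in size instead of growing with every turn.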

Edge cases that produce surprising token counts

Trailing whitespace. Some tokenizers count trailing spaces as separate tokens. Two text strings that look identical can have different token counts.
Unicode characters with combining marks. Characters like emoji with skin-tone modifiers can use 5-10 tokens each.
Repeated punctuation. "!!!!" is usually 1-2 tokens but "?????" might be 4. Inconsistent across tokenizers.
URLs. A URL like "https://example.com/very/long/path/with/many/segments?query=string" can use 15-25 tokens for what looks like a single "word."
UUIDs and hashes. Random strings can't be tokenized efficiently. A UUID is typically 12-18 tokens despite being a single identifier.
Code with single-letter variables. Compact code (i, j, x, y) uses fewer tokens than verbose code with full names.
JSON keys. JSON keys get tokenized separately from values. Verbose key names ("customerFirstName" vs "name") substantially increase token count.

Practical token budgeting

For production applications, build a token budget per request type. Don't just track "average tokens per request" — track p50, p95, and p99. Long-tail requests (rare cases with much longer context) often drive surprise costs.

A typical chatbot might have: p50 = 1,800 tokens, p95 = 4,500 tokens, p99 = 12,000 tokens. Budgeting on p50 alone underestimates actual costs because it ignores the long tail. For realistic projections, use a weighted average: (p50 × 0.7) + (p95 × 0.25) + (p99 × 0.05). For the figures above that gives 2,985 tokens per request, about 66% more than the p50.
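A minimal sketch of that calculation, assuming you have logged per-request token counts. The nearest-rank percentile method and the function names are choices made for this example; the 0.7 / 0.25 / 0.05 weights are the ones given above.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def weighted_budget(p50, p95, p99):
    """Weighted per-request token budget using the weights from the text."""
    return p50 * 0.7 + p95 * 0.25 + p99 * 0.05


def budget_from_samples(token_counts):
    """Compute p50, p95, p99 and the weighted budget from logged token counts."""
    p50 = percentile(token_counts, 50)
    p95 = percentile(token_counts, 95)
    p99 = percentile(token_counts, 99)
    return {"p50": p50, "p95": p95, "p99": p99, "weighted": weighted_budget(p50, p95, p99)}


# The chatbot figures quoted above.
print(round(weighted_budget(1_800, 4_500, 12_000)))
```

For the chatbot figures this prints 2985. Running `budget_from_samples` separately for each request type gives one budget per type.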

Tokenizer playground tools

To see exactly how text gets tokenized, several free playground tools let you paste text and see the token breakdown visually. OpenAI's tokenizer page (platform.openai.com/tokenizer) shows GPT tokenization. Hugging Face hosts tokenizer demos for most open models. These are invaluable for debugging surprising token counts in production prompts.

One quick exercise that teaches more than reading about tokenization: paste a typical user prompt from your app into a tokenizer playground. The visual breakdown often reveals 10-20% of tokens you didn't realize were there (extra whitespace, hidden Unicode, structural overhead).
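Part of that audit can be automated before reaching for a playground. The sketch below flags trailing whitespace, runs of repeated spaces, and invisible Unicode format characters (category "Cf", such as zero-width spaces). It does not count tokens, and `audit_prompt` is an illustrative name. Zero-width joiners inside emoji sequences are also category "Cf" and are legitimate, so treat the output as a list of things to inspect, not to delete.

```python
import re
import unicodedata


def audit_prompt(text):
    """Report characters that can add tokens without being visible."""
    return {
        # Lines that end in spaces or tabs.
        "trailing_whitespace_lines": sum(
            1 for line in text.splitlines() if line != line.rstrip()
        ),
        # Runs of two or more consecutive spaces.
        "repeated_space_runs": len(re.findall(r" {2,}", text)),
        # (index, name) for invisible format characters such as U+200B.
        "invisible_chars": [
            (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
            for i, ch in enumerate(text)
            if unicodedata.category(ch) == "Cf"
        ],
    }


prompt = "Hello  world \nline two\u200b"
print(audit_prompt(prompt))
```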

Common mistakes

Assuming "tokens = words" for cost math. English prose runs 30-40% more tokens than words, so the estimate comes out too low. Always multiply word count by at least 1.3-1.4.
Ignoring system prompts in token count. The system prompt (often 200-2000 tokens) is counted in every request. A long system prompt at high volume is a real cost driver.
Forgetting conversation history. Multi-turn chats resend all previous messages with each new request, so by the 20th message each turn costs far more than the first did.
Using one tokenizer for all models. tiktoken counts are accurate for OpenAI but wrong for Claude or Gemini by 5-15%.
Not accounting for tool calls. Function calling adds tokens for the function schema, tool calls, and tool results. Tool-heavy applications can have 50%+ overhead.
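The system prompt and history effects compound, which a short simulation makes concrete. The sketch below assumes every message has the same length and that each turn adds one user message and one assistant reply; the function name and the sample numbers are illustrative.

```python
def conversation_input_tokens(system_tokens, tokens_per_message, turns):
    """Input tokens billed on each turn when the full history is resent."""
    per_turn = []
    history = 0
    for _ in range(turns):
        history += tokens_per_message  # the new user message
        per_turn.append(system_tokens + history)
        history += tokens_per_message  # the assistant reply joins the history
    return per_turn


# Hypothetical chat: 500-token system prompt, 100 tokens per message, 10 turns.
per_turn = conversation_input_tokens(500, 100, 10)
print("first turn:", per_turn[0])
print("last turn:", per_turn[-1])
print("total input tokens:", sum(per_turn))
```

Per-turn input grows linearly with the number of turns, so the total for the whole conversation grows roughly quadratically.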

Questions and answers

Why doesn't 1 word = 1 token?

Because tokenizers optimize for compression across the entire training corpus. Common words become single tokens; rare words split into multiple tokens. "The" is 1 token. "Antidisestablishmentarianism" is probably 6.

Can I write more efficiently for tokens?

Slightly. Shorter common words, avoiding unnecessary punctuation, and removing redundant whitespace help marginally. For real cost reduction, focus on system prompt length and conversation history management — those have much bigger impacts.

Do images and audio count as tokens?

Yes, in multimodal models. Images convert to "tokens" representing visual patches, and a typical image is 1,000-3,000 tokens depending on resolution. Audio input in models like GPT-4o is tokenized in a similar way. Multimodal inputs aren't "free."

Sources

OpenAI: tiktoken library documentation (github.com/openai/tiktoken)
Anthropic Claude tokenizer reference
Google Gemini count_tokens documentation
Sennrich et al. (2016): "Neural Machine Translation of Rare Words with Subword Units"