Token Counter

Estimate token counts for GPT-4, Claude, and other LLMs. Plan your prompts and calculate costs. All processing happens in your browser.

Token Estimates

Char-based (~4 chars/token)

Word-based (~0.75 words/token, or about 1.33 tokens/word)

Average Estimate

Token counts are estimates. Actual counts depend on the model's tokenizer (BPE for GPT, SentencePiece for others). Estimates are most accurate for plain English text.
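The two rules of thumb above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the function name `estimate_tokens`, the whitespace-based word split, and the choice to round up are all assumptions.

```python
import math


def estimate_tokens(text: str) -> dict:
    """Estimate tokens with the char-based and word-based rules of thumb.

    Hypothetical helper: rounding up and splitting on whitespace are
    assumptions, not documented behavior of this page.
    """
    chars = len(text)
    words = len(text.split())
    char_based = math.ceil(chars / 4)      # ~4 characters per token
    word_based = math.ceil(words / 0.75)   # ~0.75 words per token
    return {
        "chars": chars,
        "words": words,
        "char_based": char_based,
        "word_based": word_based,
        "average": (char_based + word_based) / 2,
    }


estimate = estimate_tokens("The quick brown fox jumps over the lazy dog")
print(estimate)
```

For the 43-character, 9-word pangram, the character rule gives 11 and the word rule gives 12, which brackets the real tokenizer counts for short English sentences reasonably well.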

Cost Calculator
GPT-4o (OpenAI): $2.50 per 1M input tokens
Claude Sonnet 4 (Anthropic): $3.00 per 1M input tokens
Claude Opus 4 (Anthropic): $15.00 per 1M input tokens
Gemini 2.5 Pro (Google): $1.25 per 1M input tokens
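The arithmetic behind the calculator is a single multiplication. The sketch below uses the input prices listed above, which are approximate and subject to change; the table name and the function `input_cost` are illustrative, not part of any provider's API.

```python
# Approximate input prices in USD per 1M tokens, copied from the list above.
PRICE_PER_MILLION_INPUT = {
    "GPT-4o": 2.50,
    "Claude Sonnet 4": 3.00,
    "Claude Opus 4": 15.00,
    "Gemini 2.5 Pro": 1.25,
}


def input_cost(tokens: int, model: str) -> float:
    """Return the estimated input cost in USD for a token count."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_INPUT[model]


for model in PRICE_PER_MILLION_INPUT:
    print(f"{model}: ${input_cost(2_000, model):.4f} for 2,000 input tokens")
```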

About Token Estimation

Tokens are the basic units that language models process. A token can be as short as one character or as long as a word. For English text, a rough rule of thumb is 1 token per ~4 characters, or ~0.75 words per token (about 1.33 tokens per word). Code, special characters, and non-English text often use more tokens. Prices shown are approximate and subject to change.

About This Tool

Tokens are the units LLMs use to process text. A token is roughly 0.75 English words on average, but varies — common words may be a single token while rare words split into multiple subword pieces. Pricing, context windows, and rate limits are denominated in tokens, not characters or words.

The counter estimates the token count of a given text against the tiktoken family of encodings (cl100k_base for GPT-4-era models, newer encodings such as o200k_base for later ones). Estimates are typically within a few percent of the exact count for plain English; exact counts require running the target model's own tokenizer.

Most tokenizers are built on byte-pair encoding (BPE) or a related subword algorithm such as WordPiece or unigram, usually through libraries like SentencePiece or tiktoken. Training scans a large text corpus and merges the most frequent adjacent character pairs into tokens, then the most frequent token pairs, repeating until a target vocabulary size (typically 50K to 200K tokens) is reached. Common English words like 'the' or 'because' end up as single tokens; rare words split into pieces ('antidisestablishmentarianism' might tokenize as 'anti', 'dis', 'establishment', 'arian', 'ism'). The vocabulary is fixed once the tokenizer is trained, and different model families use different tokenizers: GPT-4o, Claude, Gemini, Llama, and Mistral all use distinct schemes, so token counts for the same text can vary by 10 to 30 percent. The counter here uses a tiktoken-class approximation; for precise counts, run the actual model's tokenizer.
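The merge loop described above can be illustrated with a toy trainer. This is a character-level sketch under simplifying assumptions: production tokenizers usually work on bytes, pre-split text with regular expressions, and train on far larger corpora. The function name `train_bpe` and the tiny corpus are made up for illustration.

```python
from collections import Counter


def train_bpe(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to num_merges merge rules from a list of words (toy BPE)."""
    # Each word starts as a tuple of single-character symbols.
    words = Counter(tuple(word) for word in corpus)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # nothing left to merge
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite every word, replacing the best pair with one merged symbol.
        merged_words = Counter()
        for symbols, freq in words.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged_words[tuple(out)] += freq
        words = merged_words
    return merges


merges = train_bpe(["low", "low", "lower", "lowest"], num_merges=2)
print(merges)
```

On this corpus the frequent stem 'low' is assembled within two merges, which mirrors how common words end up as single tokens while rarer suffixes stay split.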

A worked example. The sentence 'The quick brown fox jumps over the lazy dog' is 9 words and 43 characters. In the cl100k_base tokenizer (GPT-4): roughly 10 tokens. In Claude's tokenizer: about 11 tokens. In Llama 3's tokenizer: about 12 tokens. The differences come from how each tokenizer handles common words, the leading-space convention (most modern tokenizers attach the preceding space to each word after the first), and vocabulary size. A more illustrative example: the JSON string '{"key":"value","count":42}' is roughly 12 tokens despite being only 26 characters. Code, structured data, and non-English text typically tokenize less efficiently than natural English prose. A 1,000-word English document is around 1,300 to 1,400 tokens; the same content translated to Chinese is often 2,500 to 4,000 tokens because CJK characters tokenize less efficiently in models trained primarily on English.

Limitations and practical pricing implications. Most LLM pricing is quoted per million tokens, with separate rates for input and output, and a single chat exchange easily includes 1K to 5K tokens of context, history, and system prompts. Long documents, code samples, and multi-turn conversations balloon quickly. Watching the token-to-cost ratio is the biggest single lever on production LLM operating cost. Spanish, Chinese, and other non-English languages cost more per equivalent message because most major tokenizers were trained on English-heavy corpora and assign efficient single-token representations to common English words.

Context window limits (e.g., 200K for Claude 3, 128K for GPT-4 Turbo, up to 1M for some Gemini models) cap input + output combined per request; hitting the limit either truncates input or rejects the request entirely. Counting before sending is essential for production reliability, because silent truncation produces subtly wrong outputs. Tools that need exact token counts for billing or routing should use the model's actual tokenizer rather than approximations; the counter is for capacity planning and rough cost estimation.
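A pre-send check against the context window might look like the sketch below. The function name `fits_context` and the 10 percent safety margin are assumptions chosen for illustration; the margin exists because an estimated count can undershoot the real tokenizer.

```python
import math


def fits_context(
    estimated_input_tokens: int,
    max_output_tokens: int,
    context_window: int,
    safety_margin: float = 0.10,
) -> bool:
    """Return True if padded input plus reserved output fits in the window."""
    # Pad the estimate so an undercount does not push the request over the cap.
    padded_input = math.ceil(estimated_input_tokens * (1 + safety_margin))
    return padded_input + max_output_tokens <= context_window


# 128K window, 4K tokens reserved for the reply.
print(fits_context(100_000, 4_000, 128_000))
print(fits_context(120_000, 4_000, 128_000))
```

When the input count comes from the model's real tokenizer rather than an estimate, passing `safety_margin=0.0` makes the check exact.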

The about text and FAQ on this page were drafted with AI assistance and reviewed by a member of the Coherence Daddy team before publishing. See our Content Policy for editorial standards.

Frequently Asked Questions

Why do tokens cost more than I expected?
Most pricing is quoted per million tokens, with separate rates for input and output, and a single chat exchange easily includes 1K to 5K tokens of context, history, and system prompts. Long documents, code, and conversations balloon quickly. Watching the token-to-cost ratio is the biggest single lever on LLM operating cost.
Are tokenizers the same across models?
No. GPT-4o, Claude, Gemini, Llama, and Mistral all use different tokenizer schemes. Token counts can vary by 10 to 30% for the same text. Code, JSON, and non-English text show the widest variation. Always count with the tokenizer for the model you're actually using.
Why does Spanish or Chinese cost more?
Most major tokenizers were trained on English-heavy corpora and assign efficient single-token representations to common English words. Other languages, especially CJK and morphologically rich ones, fragment into more tokens per word. A Chinese sentence can run 2 to 4x more tokens than the same meaning in English.
How does the context window relate to tokens?
The window is a hard cap on input + output tokens combined per request. A 128K window holds about 96,000 English words. Hitting the limit either truncates input or rejects the request entirely. Counting before sending is essential for production reliability — silent truncation produces subtly wrong outputs.
What's the difference between input and output tokens for pricing?
Output tokens are typically 2 to 5 times more expensive than input tokens across most providers. The reason is that output is generated one token at a time, which is computationally heavier than input processing, which can run in parallel. Per token, requests with long inputs and short outputs are the cheapest pattern; conversational systems that generate long replies are the most expensive.
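To see how the input/output split moves the bill, here is a small sketch. The $3.00 input price and the 5x output multiplier are hypothetical values picked for illustration, not a quote from any provider, and `request_cost` is a made-up helper.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_multiplier: float,
) -> float:
    """Return the USD cost of one request under a simple two-rate model."""
    output_price_per_million = input_price_per_million * output_multiplier
    total = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return total / 1_000_000


# Same 10,500 total tokens, opposite input/output split (hypothetical rates).
long_input = request_cost(10_000, 500, input_price_per_million=3.0, output_multiplier=5)
long_output = request_cost(500, 10_000, input_price_per_million=3.0, output_multiplier=5)
print(f"long input, short output: ${long_input:.4f}")
print(f"short input, long output: ${long_output:.4f}")
```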
How can I reduce token usage in production?
Trim system prompts ruthlessly — every redundant word costs across every request. Compress conversation history with summaries instead of including every prior turn. Use cheaper models for routine subtasks and reserve flagship models for hard ones. Move static instructions to fine-tuned models or cached prompts where supported.
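One way to keep history inside a budget is sketched below. Note that it uses a simpler stand-in for summarization: it drops the oldest turns rather than summarizing them, and a summarizing version would replace the dropped turns with a short summary. The names `rough_tokens` and `trim_history` and the 4-characters-per-token estimate are assumptions.

```python
import math


def rough_tokens(text: str) -> int:
    """Char-based estimate (~4 characters per token), rounded up."""
    return math.ceil(len(text) / 4)


def trim_history(system_prompt: str, turns: list[str], budget: int) -> list[str]:
    """Keep the most recent turns that fit the budget after the system prompt."""
    remaining = budget - rough_tokens(system_prompt)
    kept = []
    # Walk from newest to oldest and stop at the first turn that does not fit.
    for turn in reversed(turns):
        cost = rough_tokens(turn)
        if cost > remaining:
            break
        kept.append(turn)
        remaining -= cost
    kept.reverse()  # restore chronological order
    return kept


history = ["a" * 40, "b" * 40, "c" * 40]  # about 10 tokens each
trimmed = trim_history("s" * 20, history, budget=25)
print(len(trimmed), "of", len(history), "turns kept")
```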
Are there ways to make tokenizers more efficient?
Larger vocabularies (200K+) compress text more but cost more memory and training compute. Specialized tokenizers (code-only, language-specific) achieve much better compression for their domain. Most general-purpose models trade compression for broad applicability. Domain-specific applications sometimes benefit from custom tokenizers trained on representative data.
Why do my counts differ between this tool and the actual model?
Because the counter approximates. It uses a tiktoken-class tokenizer that matches GPT-4 closely but is not identical to every model's tokenizer. For exact counts, use the model provider's official tokenizer (tiktoken for OpenAI, anthropic-tokenizer-typescript for Claude, etc.). The estimate here is within a few percent for most English text.
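For OpenAI-style encodings, an exact count can be obtained with the tiktoken package. The sketch below assumes tiktoken is installed and can load the cl100k_base vocabulary; if either fails, it falls back to the ~4 characters per token estimate and reports that the result is not exact. The wrapper `count_tokens` is a hypothetical helper, not part of tiktoken.

```python
import math


def count_tokens(text: str, encoding_name: str = "cl100k_base") -> tuple[int, bool]:
    """Return (token_count, is_exact) for the given text."""
    try:
        import tiktoken

        encoding = tiktoken.get_encoding(encoding_name)
        return len(encoding.encode(text)), True
    except Exception:
        # tiktoken is missing or its vocabulary could not be loaded
        # (for example when offline): fall back to the rough estimate.
        return math.ceil(len(text) / 4), False


count, is_exact = count_tokens("The quick brown fox jumps over the lazy dog")
print(count, "tokens", "(exact)" if is_exact else "(estimated)")
```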