Question 1

Why do tokens cost more than I expected?

Accepted Answer

Most providers price by the token, with rates quoted per 1,000 (or per million) tokens and charged on both input and output. A single chat exchange can easily include 1,000 to 5,000 tokens of context, history, and system prompts, and long documents, code, and conversations balloon quickly. Watching the token-to-cost ratio is the single biggest lever on LLM operating cost.
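To see why history balloons, here is a minimal sketch that adds up the input tokens billed over a chat, assuming every request resends the system prompt plus all earlier turns. The function name and the token figures are illustrative assumptions, not measurements from any provider.

```python
from typing import List, Tuple


def billed_input_tokens(system_tokens: int,
                        exchanges: List[Tuple[int, int]]) -> int:
    """Total input tokens billed across a chat.

    Each exchange is (user_tokens, reply_tokens). Assumes every request
    resends the system prompt and the full prior history.
    """
    history = 0
    total = 0
    for user_tokens, reply_tokens in exchanges:
        # This request carries the system prompt, all prior turns,
        # and the new user message.
        total += system_tokens + history + user_tokens
        # Both the user message and the reply join the history.
        history += user_tokens + reply_tokens
    return total


# Hypothetical chat: 500-token system prompt, three exchanges of
# 100 user tokens and 300 reply tokens each.
total = billed_input_tokens(500, [(100, 300)] * 3)
print(total)  # 3000, although the user only typed 300 tokens
```

The per-request input grows from 600 to 1,000 to 1,400 tokens in this example, which is why long conversations cost more than the visible messages suggest.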

Question 2

Are tokenizers the same across models?

Accepted Answer

No. GPT-4o, Claude, Gemini, Llama, and Mistral all use different tokenizers, so token counts for the same text can vary by 10 to 30%. Code, JSON, and non-English text show the widest variation. Always count with the tokenizer for the model you're actually using.
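One way to enforce "count with the right tokenizer" in code is a per-model registry that refuses to guess. The sketch below is only an illustration: the model names are hypothetical, and the two counters are crude stand-ins, not real tokenizers. In practice each entry would wrap the provider's own tokenizer or counting endpoint.

```python
from typing import Callable, Dict


def count_whitespace(text: str) -> int:
    # Stand-in counter (NOT a real tokenizer): one token per word.
    return len(text.split())


def count_chars_div_4(text: str) -> int:
    # Stand-in counter (NOT a real tokenizer): about 4 characters per token.
    return max(1, len(text) // 4)


# Hypothetical model names mapped to their counting functions.
COUNTERS: Dict[str, Callable[[str], int]] = {
    "model-a": count_whitespace,
    "model-b": count_chars_div_4,
}


def count_tokens(model: str, text: str) -> int:
    """Count tokens with the counter registered for `model`.

    Raises instead of silently falling back to another model's counter.
    """
    try:
        counter = COUNTERS[model]
    except KeyError:
        raise ValueError(f"no tokenizer registered for {model!r}") from None
    return counter(text)


print(count_tokens("model-a", "tokenizers differ"))  # 2
print(count_tokens("model-b", "tokenizers differ"))  # 4
```

Failing loudly on an unknown model is a deliberate choice here: a silent fallback would reintroduce the 10 to 30% drift the answer warns about.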

Question 3

Why does Spanish or Chinese cost more?

Accepted Answer

Most major tokenizers were trained on English-heavy corpora and assign efficient single-token representations to common English words. Other languages, especially CJK and morphologically rich ones, fragment into more tokens per word. A Chinese sentence can take 2 to 4 times as many tokens as the same meaning in English.
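Part of the gap shows up before any tokenizer training happens. Assuming a byte-level tokenizer (one that starts from UTF-8 bytes and learns merges on top), non-ASCII text begins with more base units per character, and an English-heavy corpus yields fewer merges to shrink them. The sketch below only measures UTF-8 bytes per character; it does not compute real token counts.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per character in `text`."""
    if not text:
        return 0.0
    return len(text.encode("utf-8")) / len(text)


print(utf8_bytes_per_char("hello world"))  # 1.0 (ASCII: one byte each)
print(utf8_bytes_per_char("你好世界"))      # 3.0 (three bytes each)
```

How many of those bytes end up merged into a single token depends entirely on the specific tokenizer's learned vocabulary.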

Question 4

How does the context window relate to tokens?

Accepted Answer

The window is a hard cap on input and output tokens combined per request. A 128K window holds about 96,000 English words, at roughly 0.75 words per token. Hitting the limit either truncates the input or rejects the request entirely, depending on the provider. Counting before sending is essential for production reliability, because silent truncation produces subtly wrong outputs.
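A pre-flight check along these lines can catch oversized requests before they are sent. This is a minimal sketch: the function names are hypothetical, and the 0.75 words-per-token ratio is the rough English average implied by the 128K-to-96,000-words figure, not an exact constant.

```python
def fits_in_window(input_tokens: int, max_output_tokens: int,
                   window: int = 128_000) -> bool:
    """True if the input plus the reserved output fits in the window."""
    return input_tokens + max_output_tokens <= window


def approx_words(tokens: int, words_per_token: float = 0.75) -> int:
    """Rough English word capacity for a given token count."""
    return int(tokens * words_per_token)


print(approx_words(128_000))            # 96000
print(fits_in_window(120_000, 4_000))   # True
print(fits_in_window(126_000, 4_000))   # False: 130,000 > 128,000
```

Reserving the output budget up front matters because the cap covers both directions: an input that fits on its own can still leave too little room for the reply.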

Question 5

What's the difference between input and output tokens for pricing?

Accepted Answer

Output tokens are typically 2 to 5 times more expensive than input tokens across most providers. The reason is that output is generated sequentially, one token at a time, which is computationally heavier than input processing, which can be parallelized. Per token, requests with long inputs and short outputs are the cheapest pattern; conversational systems that generate long replies are the most expensive.
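The effect of the price gap is easy to check with arithmetic. The rates below are made-up placeholders (output priced at 4 times input, inside the 2 to 5 times range above), so substitute your provider's published prices.

```python
# Hypothetical prices in dollars per 1,000 tokens, NOT real provider rates.
INPUT_PRICE_PER_1K = 0.001
OUTPUT_PRICE_PER_1K = 0.004  # 4x the input rate


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the placeholder rates above."""
    return (input_tokens / 1000 * INPUT_PRICE_PER_1K
            + output_tokens / 1000 * OUTPUT_PRICE_PER_1K)


# Same 10,200 total tokens, opposite shapes.
read_heavy = request_cost(input_tokens=10_000, output_tokens=200)
write_heavy = request_cost(input_tokens=200, output_tokens=10_000)

print(f"read-heavy:  ${read_heavy:.4f}")
print(f"write-heavy: ${write_heavy:.4f}")
```

At these placeholder rates the write-heavy request costs close to four times as much as the read-heavy one, even though both move the same number of tokens.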

Question 6

How can I reduce token usage in production?

Accepted Answer

Trim system prompts ruthlessly: every redundant word is paid for again on every request. Compress conversation history with summaries instead of including every prior turn. Use cheaper models for routine subtasks and reserve flagship models for hard ones. Move static instructions to fine-tuned models or cached prompts where supported.
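History compression can be sketched as: keep the newest turns that fit a token budget and replace everything older with one summary. Everything here is an assumption for illustration: `rough_tokens` is a crude 4-characters-per-token estimator, and `summarize` is a caller-supplied function (for example, a call to a cheaper model).

```python
from typing import Callable, List


def rough_tokens(text: str) -> int:
    # Hypothetical estimator: about 4 characters per token.
    return max(1, len(text) // 4)


def compress_history(turns: List[str], budget: int,
                     summarize: Callable[[List[str]], str],
                     count: Callable[[str], int] = rough_tokens) -> List[str]:
    """Keep the newest turns within `budget`; summarize the older ones.

    The summary itself is not counted against the budget in this sketch.
    """
    kept: List[str] = []
    used = 0
    # Walk backwards from the newest turn until the budget is exhausted.
    for turn in reversed(turns):
        cost = count(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()
    older = turns[:len(turns) - len(kept)]
    if older:
        return [summarize(older)] + kept
    return kept


# Usage with a placeholder summarizer.
turns = ["a" * 40, "b" * 40, "c" * 40]  # about 10 tokens each
result = compress_history(
    turns, budget=20,
    summarize=lambda old: f"[summary of {len(old)} earlier turns]")
print(result[0])  # [summary of 1 earlier turns]
```

A production version would also count the summary's tokens and cap its length, since an unbounded summary defeats the purpose.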

Question 7

Are there ways to make tokenizers more efficient?

Accepted Answer

Larger vocabularies (200K+) compress text more but cost more memory and training compute. Specialized tokenizers (code-only, language-specific) achieve much better compression for their domain. Most general-purpose models trade compression for broad applicability. Domain-specific applications sometimes benefit from custom tokenizers trained on representative data.
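The vocabulary-size trade-off can be seen in a toy byte-pair-encoding loop, where each merge adds one vocabulary entry and shortens the token sequence. This is a deliberately simplified sketch: it starts from characters rather than bytes and learns its merges on the same text it encodes, which real tokenizers do not do.

```python
from collections import Counter
from typing import List, Optional, Tuple


def most_frequent_pair(tokens: List[str]) -> Optional[Tuple[str, str]]:
    """Most common adjacent token pair, or None if there are no pairs."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(tokens: List[str], pair: Tuple[str, str]) -> List[str]:
    """Replace each non-overlapping occurrence of `pair` with one token."""
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def bpe_token_count(text: str, num_merges: int) -> int:
    """Token count after `num_merges` merges, starting from characters."""
    tokens = list(text)
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        tokens = merge_pair(tokens, pair)
    return len(tokens)


text = "low lower lowest low low"
for merges in (0, 1, 5):
    print(merges, bpe_token_count(text, merges))
```

More merges mean fewer tokens for this text, at the price of a larger vocabulary table, which mirrors the memory and training-compute cost mentioned above.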

Question 8

Why do my counts differ between this tool and the actual model?

Accepted Answer

The counter is an approximation. It uses a tiktoken-class tokenizer that matches GPT-4 closely but isn't identical to every model's tokenizer. For exact counts, use the model provider's official tokenizer or token-counting endpoint (tiktoken for OpenAI; for Claude, Anthropic's token-counting API, since the older anthropic-tokenizer-typescript package is not accurate for Claude 3 and later models). The estimate here is within a few percent for most English text.
