🔤 Tokenizer Playground
Tiktoken (OpenAI)
HuggingFace Transformers
cl100k_base (GPT-4/ChatGPT)
p50k_base (Codex/text-davinci-003)
p50k_edit (Codex Edit)
r50k_base (GPT-3/GPT-2)
Tokenize
Hey Jude! The tokenizer breaks text into tokens.
Tokens: 0
Characters: 0
Chars/Token: 0
How it works:
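Tokenizers convert text into sequences of integer token IDs, which are what LLMs take as input. Different models use different vocabularies, so the same text can produce a different number of tokens depending on the encoding. As a rule of thumb, one token is roughly 4 characters of English text.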
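The three figures shown above (Tokens, Characters, Chars/Token) can be reproduced with a few lines of Python. This is a minimal sketch, not the playground's actual code: `token_stats`, `make_toy_encoder`, and `tiktoken_encoder` are hypothetical helper names, and the toy encoder simply splits on whitespace so the example runs without any downloads. The `tiktoken_encoder` helper assumes the `tiktoken` package is installed and uses its `get_encoding` and `encode` calls.

```python
from typing import Callable, Dict, List


def token_stats(text: str, encode: Callable[[str], List[int]]) -> Dict[str, float]:
    """Return token count, character count, and characters per token."""
    ids = encode(text)
    n_tokens = len(ids)
    n_chars = len(text)
    # Avoid dividing by zero when the input is empty.
    ratio = n_chars / n_tokens if n_tokens else 0.0
    return {"tokens": n_tokens, "characters": n_chars, "chars_per_token": ratio}


def make_toy_encoder() -> Callable[[str], List[int]]:
    """Hypothetical stand-in tokenizer: one ID per whitespace-separated word.

    Real tokenizers use subword vocabularies; this only illustrates the
    text -> integer IDs mapping.
    """
    vocab: Dict[str, int] = {}

    def encode(text: str) -> List[int]:
        # Each new word gets the next unused integer ID.
        return [vocab.setdefault(word, len(vocab)) for word in text.split()]

    return encode


def tiktoken_encoder(name: str = "cl100k_base") -> Callable[[str], List[int]]:
    """Wrap a tiktoken encoding (assumes `pip install tiktoken`).

    The vocabulary is fetched on first use, so this may need network access.
    """
    import tiktoken  # imported lazily so the rest of the sketch has no dependency

    return tiktoken.get_encoding(name).encode


sample = "Hey Jude! The tokenizer breaks text into tokens."
stats = token_stats(sample, make_toy_encoder())
print(stats)
```

To compare against a real vocabulary, pass `tiktoken_encoder("cl100k_base")` or `tiktoken_encoder("r50k_base")` as the `encode` argument; the token count, and therefore the Chars/Token ratio, will generally differ between encodings.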