Zipf's law: the brutal truth about vocabulary
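The linguist George Kingsley Zipf observed in the 1930s and 1940s that word frequency follows a steep pattern in every language studied: a word's frequency is roughly inversely proportional to its rank. The figures below are approximate and vary by language and corpus:
- The top 100 words make up ~50 % of a typical text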
- The top 1,000 words cover ~75 %
- The top 3,000 words get you to ~85 %
- The final 5 % requires another 20,000+ words
The brutal truth: the first 1,000 words cover ~75 %, but the next 2,000 add only about 10 percentage points.
Translation: learn the right 1,000 words and you recognize three-quarters of the words you hear. Learn the wrong 1,000 and you recognize maybe 30 %.
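To see where numbers like these come from, here is a minimal sketch that computes coverage under an idealized Zipf distribution, where the word at rank r has weight 1/r. The vocabulary size of 50,000 and the exponent of 1.0 are assumptions chosen for illustration, not measurements, so the output shows the same diminishing-returns shape without matching the percentages above exactly.

```python
def zipf_coverage(top_k, vocab_size=50_000, exponent=1.0):
    """Share of running text covered by the top_k most frequent words,
    assuming an ideal Zipf distribution (weight of rank r is 1 / r**exponent).

    vocab_size and exponent are illustrative assumptions, not corpus data.
    """
    if not 0 <= top_k <= vocab_size:
        raise ValueError("top_k must be between 0 and vocab_size")
    weights = [1.0 / rank**exponent for rank in range(1, vocab_size + 1)]
    return sum(weights[:top_k]) / sum(weights)


# Each step up the list buys less additional coverage than the one before.
for k in (100, 1_000, 3_000, 20_000):
    print(f"top {k:>6,} words: {zipf_coverage(k):.0%} of the text")
```

Real corpora deviate from the ideal curve, especially among the most frequent function words, so treat this as a model of the trend rather than a prediction for any one language.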
Which vocabulary to learn first?
High-frequency words aren't glamorous – but they're the skeleton of every language:
- Pronouns, articles, conjunctions (every single one)
- Common verbs: be, have, do, go, say, see, give, come…
- Common adverbs: already, still, then, here, always, never, a lot, a little
- Common adjectives: big, small, good, bad, new, old, important
Only after that: topic vocabulary (food, travel, work) – and again ordered by frequency, not alphabetically as textbooks tend to do.
Frequency lists: where to find them
- Spanish: Wiktionary Top 1000 Spanish, Routledge "A Frequency Dictionary of Spanish"
- German: Leipzig Wortschatz, Routledge "Frequency Dictionary of German"
- English: NGSL (New General Service List) – about 2,800 words for roughly 92 % coverage of general English text
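If none of the published lists fits your material, a rough frequency list can be built from any text you care about. The sketch below is one simple way to do it with Python's standard library. The function names `frequency_list` and `coverage` and the sample sentence are made up for illustration, and the tokenizer is deliberately naive: it splits on anything that is not a letter and does not group inflected forms.

```python
import re
from collections import Counter


def frequency_list(text):
    """Return (word, count) pairs, most frequent first.

    Naive tokenization: lowercases the text and keeps runs of letters only.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words).most_common()


def coverage(ranked, top_n):
    """Share of all word occurrences accounted for by the top_n entries."""
    total = sum(count for _, count in ranked)
    covered = sum(count for _, count in ranked[:top_n])
    return covered / total if total else 0.0


# Hypothetical sample; replace with a subtitle file, article, or book chapter.
sample = "the cat sat on the mat and the dog sat on the cat"
ranked = frequency_list(sample)
print(ranked[:3])
print(f"top 3 words cover {coverage(ranked, 3):.0%} of the sample")
```

On a real text of a few thousand words, the top of the resulting list will be dominated by the same pronouns, articles, and common verbs listed earlier.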
The limits of frequency lists
With 1,000–2,000 words you have the skeleton. Beyond that, frequency matters less: you might meet the 5,000th word only once every couple of weeks. From there, comprehensible input (podcasts, books, series) pays off more – you'll pick up exactly the words that occur in your life.
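The "every couple of weeks" figure can be sanity-checked with a back-of-the-envelope estimate. This sketch again assumes an ideal Zipf distribution over 50,000 words and a hypothetical exposure of 4,000 words of input per day. Both numbers are assumptions, so the result is an order-of-magnitude estimate only.

```python
def words_between_occurrences(rank, vocab_size=50_000, exponent=1.0):
    """Average number of running words between two occurrences of the word
    at the given rank, assuming an ideal Zipf distribution.

    vocab_size and exponent are illustrative assumptions, not corpus data.
    """
    normalizer = sum(1.0 / r**exponent for r in range(1, vocab_size + 1))
    probability = (1.0 / rank**exponent) / normalizer
    return 1.0 / probability


daily_words = 4_000  # hypothetical amount of listening and reading per day

for rank in (100, 1_000, 5_000):
    gap = words_between_occurrences(rank)
    print(f"rank {rank:>5,}: once per {gap:,.0f} words, "
          f"about every {gap / daily_words:.1f} days")
```

Under these assumptions the word at rank 5,000 turns up roughly once every two weeks, which is too rarely for list-based drilling to beat simply meeting it in context.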