Are German words longer than English ones?

There are many jokes about the Germans, but one of the few I can post here is that oft-told adage that at the present rate of increase, eventually the German language will consist of a single, extremely long word.

OK, so that’s a bit mad, but a lot of people think that words in German are on average longer than those in English. But is it really true? I have wondered about this for a long time, but it’s actually quite hard to get sensible data on this sort of subject. I mean, who wants to search through a dictionary counting all the letters?

Well, actually, now it’s possible to do just that, and really easily. I just got the latest version of Mathematica – Version 7 – which includes a whole host of curated data not available in previous versions. One of the new datasets is a German dictionary to go with the English one we’ve had since Version 6. To see what languages are available, we just type:

In: DictionaryLookup[All]

which gives:

Out: {"Arabic", "BrazilianPortuguese", "Breton", "BritishEnglish",
"Catalan", "Croatian", "Danish", "Dutch", "English", "Esperanto",
"Faroese", "Finnish", "French", "Galician", "German", "Hebrew",
"Hindi", "Hungarian", "IrishGaelic", "Italian", "Latin", "Polish",
"Portuguese", "Russian", "ScottishGaelic", "Spanish", "Swedish"}

It’s also easy to look up how many words are in each dictionary. For example, to see how many words are in the German dictionary, we just type:

In: DictionaryLookup[{"German", "*"}] // Length

Out: 76155

which looks like a sensible number of words for a comparison. The English dictionary has 92,518 words, by the way. Breton (yes, really!) has only 32,733 words… I’d never thought of Breton people as being particularly concise.
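
For anyone who wants to check several dictionaries in one go, something along these lines should work. It is only a sketch that reuses the DictionaryLookup call from above: the iterator name lang and the choice of three languages are just for illustration, and the counts may differ between Mathematica versions.

In: Table[{lang, Length[DictionaryLookup[{lang, "*"}]]}, {lang, {"English", "German", "Breton"}}] (* lang is a local iterator; the language list is illustrative *)

Each entry in the result pairs a language name with the number of words in its dictionary.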

Anyway, the built-in functions in Mathematica mean that the question that has bugged me for ages can now be answered in two lines of code:

In: Mean[StringLength[DictionaryLookup[{"English", "*"}]]] // N

Out: 8.39372

In: Mean[StringLength[DictionaryLookup[{"German", "*"}]]] // N

Out: 11.6281

So there you have it: quantitative evidence that German words are longer than English ones, by about 3.2 letters on average, which is quite a lot if you ask me! Some of the words are much longer, as you can see from the accompanying plot. If you think my methodology is flawed, please let me know with your quantitative results!
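
A plot of this kind could be made along the following lines. This is a sketch rather than the exact code behind the figure: it assumes the Histogram function that arrived with Version 7, and simply draws both sets of word lengths on the same axes.

In: Histogram[{StringLength[DictionaryLookup[{"English", "*"}]], StringLength[DictionaryLookup[{"German", "*"}]]}] (* one histogram per language, overlaid *)

To see the counts behind the bars, or to pick out one of the longest entries, the same lists can be tallied or sorted:

In: Sort[Tally[StringLength[DictionaryLookup[{"German", "*"}]]]] (* {length, count} pairs in order of length *)

In: Last[SortBy[DictionaryLookup[{"German", "*"}], StringLength]] (* one of the longest German entries *)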

English versus German

Posted Monday, November 24th, 2008 under Computing, Life, Posts, Science.

2 comments

  1. Ziqiao Feng says:

    I think you overlook the frequency distribution of words in real articles.

    You should do a statistical study of the word-length distribution in English and German translations of the same articles.

    Just to give you three example sentences:

    Ich liebe dich.
    I love you.

    Woher kommen Sie?
    Where do you come from?

    Ich komme aus China, aber jetzt wohne in München.
    I come from China, but now live in Munich.

  2. Anderer Gregor says:

    Is this the distribution per vocabulary entry, or per running word? (see also: Zipf’s law)
