January 5, 2022

Wordle Letter Counts

In response to some discussion on Twitter, I dug into how common different letters are at different positions in a word. Since the discussion was prompted by the popular word game Wordle, the focus was on five-letter words.

My approach was to download a word list from the first Google hit for “word list download”, and then throw Python’s Counter class (from the collections module) at it.

from collections import Counter

# Download a wordlist such as the one available here:
# https://github.com/dwyl/english-words

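# Read one word per line, skipping blank lines so that x[0] below is safe.
with open("../resources/words/words_alpha.txt") as d:
    wl = [x.strip() for x in d if x.strip()]

# First letter of every word, and the subset of five-letter words.
firsts = [x[0] for x in wl]
fives = [x for x in wl if len(x) == 5]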


def top_letters(ctr, n=10):
    """Return the n most common keys of a Counter, upper-cased and comma-separated."""
    return ",".join(x[0].upper() for x in ctr.most_common(n))


print("For all words in the list")
print("Most common first letters")
print(top_letters(Counter(firsts)))
print("Most common all positions")
print(top_letters(Counter("".join(wl))))
print("\n")
print("For five letter words:")
print("Top letters any position")
print(top_letters(Counter("".join(fives))))
for i in range(5):
    print(f"Top letters in position {i}")
    letters = [x[i] for x in fives]
    print(top_letters(Counter(letters)))

And here’s the result:

For all words in the list
Most common first letters
S,P,C,A,U,M,T,D,B,R
Most common all positions
E,I,A,O,N,S,R,T,L,C


For five letter words:
Top letters any position
A,E,S,O,R,I,L,T,N,U
Top letters in position 0
S,C,A,B,T,P,M,D,G,F
Top letters in position 1
A,O,E,I,U,R,L,H,N,T
Top letters in position 2
R,A,I,N,O,L,E,U,T,S
Top letters in position 3
E,A,I,T,N,L,O,R,S,U
Top letters in position 4
S,E,Y,A,T,N,R,D,L,O

So CARES is a pretty good first word, since it contains the most common letter in four out of five positions and the second most common letter in the other position. It also contains four of the five most common letters across all positions.
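One way to make the “good first word” idea concrete is to score each five-letter word by how common its letters are in their own positions. What follows is a minimal sketch rather than the method used above: the `positional_counts` and `score` helpers and the tiny `sample` list are made up for illustration, and in practice you would pass in the `fives` list from the script.

```python
from collections import Counter


def positional_counts(words):
    """Build one Counter per letter position for equal-length words."""
    length = len(words[0])
    return [Counter(w[i] for w in words) for i in range(length)]


def score(word, counts):
    """Sum how often each letter of `word` occurs in its own position."""
    return sum(counts[i][ch] for i, ch in enumerate(word))


# Hypothetical stand-in for the `fives` list built earlier.
sample = ["cares", "bares", "cores", "canes", "pious", "jumpy"]
counts = positional_counts(sample)

# Rank the sample words from highest to lowest positional score.
ranked = sorted(sample, key=lambda w: score(w, counts), reverse=True)
print(ranked[0])
```

A plain sum like this is only one possible choice. It says nothing about how much a guess narrows down the answer, so treat the ranking as a rough guide.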

If you’re thinking “Wait a minute, the most common letters are ETAOINSHRDL, or something like that”, you’re right, but that ordering takes word frequency into account, which this word list does not: every word appears exactly once, however often it is actually used. So let’s find a corpus of English prose whose letters we can count. What about, for example, a plain text version of the complete works of Shakespeare?

with open("../resources/words/shakespeare.txt") as d:
    swl = [x.strip() for x in d.readlines()]

print("For the complete works of Shakespeare")
print("Most common letters")
print(top_letters(Counter("".join(swl)), n=11))

This yields E,T,O,A,H,S,N,R,I,L. I used n=11 because the most common character is in fact the space, which I have left out of the list. I could strip out spaces, split the text into words and repeat the exercise of finding the most common letters in five-letter words, etc., but I couldn’t be bothered.

And, in honour of its new status as a work in the public domain, running the same process on Winnie-the-Pooh yields E,T,O,A,I,N,H,S,R,D.
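For anyone who can be bothered, the skipped step (dropping spaces, splitting the text into words and repeating the positional count on five-letter words) could look roughly like the sketch below. It is a guess at how one might do it, not code from the analysis above: `letter_counts` and `five_letter_words` are hypothetical helper names, and the short `text` string stands in for the contents of a file such as shakespeare.txt.

```python
import re
from collections import Counter


def letter_counts(text):
    """Count letters only, ignoring case, spaces, digits and punctuation."""
    return Counter(ch for ch in text.lower() if "a" <= ch <= "z")


def five_letter_words(text):
    """Return every five-letter word in the text, repeats included."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) == 5]


# Stand-in for the contents of a prose file.
text = "These three trees stand there. Their green leaves never leave."

print(letter_counts(text).most_common(3))

prose_fives = five_letter_words(text)
for i in range(5):
    letter, count = Counter(w[i] for w in prose_fives).most_common(1)[0]
    print(f"Top letter in position {i}: {letter.upper()} ({count})")
```

Lower-casing before counting stops `E` and `e` from being tallied as separate letters, and keeping repeated words means common words weigh more, unlike in the word list. One known rough edge: the regular expression splits on apostrophes, so a word like “don’t” is counted as “don” and “t”.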

© Seamus Bradley 2021
