Over-The-Top Wordle: Data Analysis
Preface: If you're not acquainted with Wordle, some basic familiarity with the game will help you follow this article. Definitely check it out here: https://www.powerlanguage.co.uk/wordle/.
Wordle has been a fun game to talk about with my friends, sharing scores and talking trash as many college friend groups do. About a week into the Wordle craze, it dawned on me that I could collect and analyze data to prove, once and for all, that I am better than my friends (spoiler alert: I'm actually empirically the worst). And so this project was born. Each day, my friends and I share our Wordle results, and I track how well each of us performs, measured by how correctly the letters in each guess are placed. At the time of writing, this has yielded a rich dataset of 150+ observations and 30+ variables covering Wordles 215-226. What follows is a completely unnecessary (but fun!) deep dive into the data as a means to demonstrate common data description and analysis techniques. By including the baseline code, I hope this article will be valuable to someone learning R in an engaging way. This is also intended to be part of a series; future posts will focus on regression analysis of the collected data.
I would be remiss if I didn't thank my friend group--affectionately nicknamed "The Dollhouse" after our college house's name--for allowing me to take their data and write this article. While others have also submitted their data to me, I'll restrict the analysis to the 9 people from our friend group who have given me the most detailed data. Of course, it wouldn't be fun to read about performance and statistics without the context of who they are. Their nicknames are included in parentheses for graphical references.
Table 1: Player Descriptions
Matt Reilly-Crank (Crank)
|Brewed beer for his own wedding, so that's something. Manages a bad back without complaining about it often and super consistent (seriously, check out performance through time below).|
Ryan Dixon (Dixon)
|A larger-than-life friend with a heart of gold and arguably a better photo taker than me (but only when in Mexico, and other caveats here).|
Daniel Ellis (Ellis)
|A mild-mannered defensive maestro in basketball, with the jump shot equivalent of a Shaq free throw except it goes in.|
Julian Johnson (Jules)
|A foil to Pooch (described below) who manages to retain a soul despite working for Meta. Speaking of: Meta Knight player in Smash #iykyk.|
Adrian Pearce (Pearce)
|Recently married, so I guess I can't make fun of him? Just kidding, that's no fun. He kept spreadsheets of dates he had with his (now) wife... relatable and nerdy. #neverchange|
Chris Puccia (Pooch)
|Hyper competitive, but generally loses. He's a Bengals fan so I guess he can have that one. Clearly the real life manifestation of Frasier.|
Seth Sommerfield (Seth)
|Probably cheats at Ticket to Ride given how often he wins, except for that time I got the overpowered cross-continental route on the UK map. A far more compelling writer than I, but that's only because his subject matter is more accessible, obviously.|
Brenna Valentine (Brenna)
|My partner-in-crime, who I constantly have to monitor so she doesn't bring stray animals home. Avid Pokémon player and never been wrong in her life (according to her).|
Chris Blake (Blake)
|Just me, I suppose.|
Basic Data Information
Each of these players has provided screenshots each day that allow me to construct the dataset shown below:
Figure 1: Screenshot of collected data
I collect the Wordle number (wordle), indicate whether each player is a Dollhouse member with a binary variable (dollhouse), and then record top-level information on whether they successfully completed the Wordle (correct) and the number of guesses to completion (correctRound). The remaining variables are generated using Wordle's own color scheme. For each guess and letter position, I assign a 0 if the letter does not exist in the word of the day, a 1 if the letter exists in the word but is in the wrong spot, and a 2 if the letter has been correctly placed. This yields five observations for each guess up to the round of completion, and I refer to their sum for each guess as letter-points. Taking the top line of the dataset, Pooch received 0-1-0-0-0 for his first guess (i.e. 1 letter-point) on Wordle 215. For reference, Wordle 215 was 'ROBOT', so a possible guess to earn this score would have been 'PRIED', as only the R is a correct letter, but in the wrong spot. For clarity, I use the terms 'round' and 'guess' synonymously to indicate one word entered by a player.
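The letter-point coding above can be sketched as a small R function. This is a hypothetical helper (score_guess is my own name; the scores in this project were recorded by hand from screenshots), and it assumes Wordle's greens-first treatment of repeated letters: exact matches are marked first, and each remaining answer letter can earn at most one yellow.

```r
# Hypothetical helper: score one five-letter guess against the answer on the 0/1/2 scale.
# 0 = letter not in the word, 1 = right letter in the wrong spot, 2 = right letter, right spot.
# Repeated letters follow an assumed greens-first rule: each answer letter is matched at most once.
score_guess <- function(guess, answer) {
  g <- strsplit(toupper(guess), "")[[1]]
  a <- strsplit(toupper(answer), "")[[1]]
  stopifnot(length(g) == 5, length(a) == 5)
  score <- integer(5)
  green <- g == a                 # exact position matches
  score[green] <- 2L
  remaining <- a[!green]          # answer letters still available for yellows
  for (i in which(!green)) {
    j <- match(g[i], remaining)
    if (!is.na(j)) {
      score[i] <- 1L
      remaining <- remaining[-j]  # consume the letter so it cannot be counted twice
    }
  }
  score
}

score_guess("PRIED", "ROBOT")       # 0 1 0 0 0
sum(score_guess("PRIED", "ROBOT"))  # 1 letter-point
```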
As usual, I import the data using the read_excel function from the readxl R package. Each column is imported as numeric except for name and correctRound (the latter is kept as text for a distribution plot later). After import, we usually want to look at the data at a summary level. The following table provides some basic summary statistics using the summarytools package in R.[1]
Table 2: Summary Statistics for Wordle Dataset (generated by summarytools 1.0.0, R version 4.1.1)
Of note, average letter-points tend to go up with each guess. This isn't surprising, as players receive more information with each guess. 97% of Wordle attempts were completed successfully, taking approximately 4 guesses to completion on average. The quickest Wordle completion was 2 guesses. Congratulations to Crank, Ellis, Jules, and Pooch for being lucky.
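Part of that upward drift may be mechanical: a player's final, correct guess always scores the maximum 10 letter-points, and later rounds are averaged only over players who are still guessing. A minimal sketch with made-up numbers (the round_1 to round_4 column names follow the personSummary code at the end; NA marks a round the player never reached because they had already finished):

```r
# Made-up letter-points per guess for four Wordle attempts (one row per attempt).
# A solved round scores 10; NA means the player had already finished.
letterPts <- data.frame(
  round_1 = c(1, 3, 2, 4),
  round_2 = c(4, 5, 6, 6),
  round_3 = c(10, 7, 8, 10),
  round_4 = c(NA, 10, 10, NA)
)

# Average letter-points per round, ignoring rounds that were never played
colMeans(letterPts, na.rm = TRUE)
```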
Across all players, the distribution of rounds to completion looks roughly normal, as shown below:
Figure 2: Distribution of Rounds to Completion [2]
Who is the best at Wordle?
With some basics of the data, we need to begin asking the true question of interest: Who is the best at Wordle? Let's start with a distribution of the primary players:
Figure 3: Distribution of Correct Rounds by Player [3]
Here, I saw the first indications that I would lose any argument with my friends about being better at Wordle. I consistently finish in 4 or 5 guesses and have a failure to my name (along with Brenna, Pooch, and Seth). Very visibly, Jules and Crank have distributions shifted toward earlier rounds, with a significant proportion of their Wordles completed in 3 to 4 rounds. Dixon also has a high proportion of Wordles finished in 3 guesses.
Alright, so maybe I'm not that great in aggregate, but maybe I'm improving over time? Another way to view the data is to plot rounds to completion Wordle by Wordle, as shown in Figure 4.
Figure 4: Correct Round Through Time [4]
What do we see here? LOESS trendlines indicate the trajectory of correct rounds across successive Wordles. There's a sign of life for me with a downward trend! With only a few integer values possible on the y-axis, this is interesting but perhaps not the most informative. Beyond several slight downward trends that indicate improvement (Blake, Brenna, Dixon, Jules) and upward trends that indicate later-round completions (Pearce, Pooch), there's not much to glean. Much like the distribution above, most of us finish Wordles in Round 4, with some spread primarily around Rounds 3 and 5. Aside from some indication of dynamics, this won't help us answer who is the best at Wordle.
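For readers curious about what the trendlines are, here is a minimal sketch of a LOESS fit on made-up rounds for a single player. It calls stats::loess() with its defaults (span = 0.75, local quadratic fits), which, as far as I know, matches what geom_smooth() uses by default for datasets under 1,000 rows; in a faceted plot, geom_smooth() fits one such curve per panel.

```r
# Made-up rounds to completion for one player across Wordles 215-226
wordleNo <- 215:226
rounds   <- c(5, 4, 5, 4, 4, 5, 4, 3, 4, 4, 3, 4)

# Local quadratic regression with loess() defaults (span = 0.75, degree = 2)
fit   <- loess(rounds ~ wordleNo)
trend <- predict(fit)  # smoothed trend value at each Wordle
round(trend, 2)
```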
Average Rounds to Completion
A common way to evaluate each player is to check measures of central tendency and other aggregate measures. Using R's pipe operator and the summarize function, I grouped observations by person and created average measures, as well as indicators of spread and skew, before storing the output in a new dataset called personSummary (for those checking the code at the end).[5] Let's start with basics like the mean and median, presented in the table below.
Table 3: Summary Statistics by Player
|Player||Wins||Win Percentage||Mean Round to Completion||Median Round to Completion||Lower Bound of Mean (95% Confidence)||Upper Bound of Mean (95% Confidence)|
Notes: Players sorted by Mean Round to Completion.
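The grouped-summary step can be sketched with dplyr on a made-up dataset. The names toy and toySummary are hypothetical, and only the name and correctRoundNum columns mirror the real data; this is an illustration of the mechanics, not the author's exact code.

```r
library(dplyr)

# Made-up stand-in for the real wordle dataset;
# correctRoundNum is rounds to completion for correct Wordles
toy <- tibble(
  name            = c("Jules", "Jules", "Jules", "Blake", "Blake", "Blake"),
  correctRoundNum = c(3, 4, 2, 5, 4, 5)
)

toySummary <- toy %>%
  group_by(name) %>%
  summarize(
    meanRound   = mean(correctRoundNum, na.rm = TRUE),
    medianRound = median(correctRoundNum, na.rm = TRUE),
    sdRound     = sd(correctRoundNum, na.rm = TRUE)
  ) %>%
  arrange(meanRound)  # sort as in Table 3: lowest mean first

toySummary
```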
Jules and Crank are in a dead heat for the top spot, with mean rounds to completion of 3.68 and 3.73, respectively. As expected, the median values don't vary much, given so few possible integer values and so many completions in rounds 3, 4, or 5. By central tendency, then, Jules and Crank are the best Wordlers, with third place going to Pearce. I also constructed 95% confidence intervals for each player's mean round to completion in the last two columns. As usual, these are computed as Mean ± (t-value) × (Standard Error), where the standard error is the player's standard deviation divided by the square root of the number of Wordles, and the t-value comes from a t-distribution with n - 1 degrees of freedom. Intuitively, the interval accounts for each player's spread and gives a plausible range for their true mean round to completion. Taking Jules as an example, his interval runs from 3.17 to 4.19 rounds: if we could repeatedly sample his Wordle scores and build an interval this way each time, about 95% of those intervals would contain his true mean, assuming the t-distribution is a reasonable model here. This gives us a potential first-pass answer to who the best among us is. Because the upper bounds of the intervals for Jules and Crank are below the lower bounds of those for Blake and Seth, it can reasonably be argued that Jules and Crank have statistically lower means and are therefore comparatively better at Wordle. We cannot say the same for Pearce, Brenna, Ellis, Dixon, or Pooch, as their confidence intervals overlap substantially with those of Jules and Crank.
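The interval formula can be checked in a few lines of base R. This sketch uses made-up rounds for one hypothetical player, builds the 95% interval by hand with n - 1 degrees of freedom, and cross-checks it against t.test(), whose default interval uses the same formula.

```r
# Made-up rounds to completion for one player (correct Wordles only)
x <- c(3, 4, 4, 3, 5, 4, 3, 4, 2, 4)

n      <- length(x)
se     <- sd(x) / sqrt(n)            # standard error of the mean
t_crit <- qt(0.975, df = n - 1)      # two-sided 95% critical value
ci     <- mean(x) + c(-1, 1) * t_crit * se
ci

# Cross-check against base R's one-sample t-test
t.test(x)$conf.int
```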
Blake and Seth eliminated.
Consistency of Rounds to Completion
Looking at averages is not the only way we might answer who is better at Wordle. It might also be the case that a few very quick Wordle solves skew the mean, since means are highly sensitive to outliers. What if Jules and Crank merely have some outlier scores driving their results? To account for this, I calculate a new variable called points. For each Wordle played, individuals get points for the overall outcome depending on how quickly the Wordle is solved. To penalize those who fail to complete a Wordle, points takes on a value of -1. Otherwise, points are assigned as 7 minus rounds to completion (e.g. finishing in 3 rounds earns a player 4 points), with 7 chosen so that players still get a pity point for solving it on the last guess.[6] I then calculate relScore as the number of points scored in a Wordle minus the current overall mean, a way to center each person's distribution relative to the average. Visually, consistency would look like noticeably taller peaks if we plotted an individual's distribution of rounds to completion. The next figure gives us a look at the spread for each player:
Figure 5: Distribution of Rounds to Completion (When Correct) [7]
Notes: Vertical line plotted at current overall mean.
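The points and relScore definitions translate directly into R. A minimal sketch on made-up data, assuming (for illustration only) that a failed Wordle is stored as NA in a numeric correctRoundNum column and that the overall mean is taken over all attempts in the data:

```r
# Made-up data: rounds to completion, NA for a failed Wordle (assumed coding)
toy <- data.frame(
  name            = c("Pooch", "Pooch", "Crank", "Crank"),
  correctRoundNum = c(3, NA, 4, 6)
)

# points: 7 minus rounds to completion, or -1 for a failed Wordle
toy$points <- ifelse(is.na(toy$correctRoundNum), -1, 7 - toy$correctRoundNum)

# relScore: points centered on the overall mean across all attempts
toy$relScore <- toy$points - mean(toy$points)
toy
```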
These density plots suggest that Blake, Dixon, Ellis, Jules, Pooch, and Seth have a fairly wide spread in rounds to completion. Brenna, Crank, and Pearce are fairly consistent, with higher probability spikes on Round 4 in particular. Of course, interpretation here is in the eye of the beholder, so let's calculate some statistics! Using the new variables calculated above, we can measure the spread of each player's distribution to see how often they perform close to their mean. These numbers are presented below:
Table 4: Points Per Wordle (PPW) and Measures of Variance
|Name||PPW||Skew to Overall Mean||Variance of Points|
Table 4 is sorted by variance, with smaller variances indicating a more consistent player. By this measure, Pearce, Crank, and Dixon round out the top 3 as the most consistent among us (with Ellis a close 4th). Pooch, Seth, and Blake have the highest variance and could be considered "microwave players," to borrow from professional basketball terminology: when they're good, they're good; when they're not, they're not. Going by mean, Blake and Seth have already been eliminated as candidates for the mantle of best at Wordle, but one microwave player has not.
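Points per Wordle and the variance of points can be computed per player with base R's tapply(). The points below are made up; the sketch only shows the mechanics of ranking players by variance so the most consistent player comes first.

```r
# Made-up points per Wordle for three players (7 - rounds, or -1 for a failure)
pts <- data.frame(
  name   = rep(c("Pearce", "Seth", "Jules"), each = 4),
  points = c(3, 3, 3, 4,   5, 1, -1, 4,   4, 3, 5, 3)
)

ppw  <- tapply(pts$points, pts$name, mean)  # points per Wordle (PPW)
vari <- tapply(pts$points, pts$name, var)   # sample variance of points

ranked <- data.frame(name = names(ppw), PPW = as.vector(ppw), variance = as.vector(vari))
ranked <- ranked[order(ranked$variance), ]  # lowest variance (most consistent) first
ranked
```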
Skewness is based on the third central moment of a distribution, scaled by the cube of the standard deviation. As a refresher, skewness indicates whether a distribution's tail stretches further to the right or to the left of where most observations sit. A positively (or right) skewed distribution, for example, typically has a mean that exceeds the median, so its peak sits relatively to the left. Seth's distribution in Figure 5 is a decent example. For the purposes of this analysis, a mean that exceeds the median would indicate the presence of a high outlier that affects the mean significantly. Likewise, a negatively (or left) skewed distribution typically has a median that exceeds the mean, indicating that a bad score is significantly dragging down a player's average. This too can tell us how consistent a player is, because consistent players would have skew measures close to zero. On the negative side, the data suggest that negative outliers have really hurt Ellis, Crank, and Pooch, while positive outliers have really benefited Dixon, Jules (congrats on guessing in 2 rounds twice!), and Seth. Let's argue that good players don't have bad Wordles.
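One common sample skewness estimator (often called g1) is the third central moment divided by the second central moment raised to the 3/2 power. The sketch below implements it by hand; skew_g1 is a hypothetical name, and the article doesn't say which package's skewness() was used (moments and e1071 both provide one, and to my knowledge e1071 defaults to a slightly different small-sample variant). It also shows that computing skewness on relScore rather than points changes nothing, because subtracting a constant leaves skewness unchanged.

```r
# Moment-based sample skewness (g1): m3 / m2^(3/2)
skew_g1 <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

# Made-up points for a player with one failed Wordle (-1) among solid results
oneBlowup <- c(3, 4, 3, 4, 3, 4, -1)

skew_g1(oneBlowup)         # negative: the bad outlier pulls the mean below the median
skew_g1(oneBlowup - 1.75)  # same value: centering (as relScore does) does not change skewness
```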
Ellis and Crank eliminated.
Blake, Seth, Pooch, Ellis, and Crank have all been eliminated, leaving our final round of players: Brenna, Dixon, Jules, and Pearce. This is where things get tricky: each of these players has relatively high points per Wordle and consistent outcomes. Perhaps we can look at how effectively they parlay their round-by-round scoring into completion. The following table presents the mean letter-points a player earned in each round of play. Recall that 0 letter-points are assigned for each letter that does not exist in the word, 1 letter-point for a correct letter in the wrong spot, and 2 letter-points for a correct letter in the correct spot. Effectively, this means that a player earns a minimum of 0 and a maximum of 10 letter-points per guess. I used these letter-points to estimate correlation coefficients with the number of points scored overall for the Wordle. For example, how does scoring 3 letter-points in Round 1 versus 2 letter-points correlate with finishing the Wordle sooner? Positive correlation coefficients would indicate that strong starts help a player along, while negative coefficients indicate the opposite. Below is a table showing the mean letter-points per round for the first three rounds, as well as the corresponding correlation coefficients.
Table 5: Round-by-Round Scoring and Correlation Coefficients
|Name||Round 1 Scoring||Round 2 Scoring||Round 3 Scoring||Correlation-Round 1||Correlation-Round 2||Correlation-Round 3|
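Per-player correlations like those in Table 5 can be sketched with split() and cor(). The data here are made up; use = "complete.obs" drops rows with missing values, which matters for Round 2 and Round 3 because players who finish early have no later guesses.

```r
# Made-up Round 1 letter-points and overall points for two players
toy <- data.frame(
  name    = rep(c("Dixon", "Ellis"), each = 5),
  round_1 = c(1, 2, 3, 4, 5,   1, 2, 3, 4, 5),
  points  = c(2, 3, 3, 4, 5,   4, 4, 3, 3, 2)
)

# Pearson correlation between Round 1 letter-points and points, one value per player
corRound1 <- sapply(split(toy, toy$name), function(d) {
  cor(d$round_1, d$points, use = "complete.obs")
})
round(corRound1, 2)
```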
This is where the story gets really interesting. Crank is the best first-round guesser, with Jules a close second and Pooch third. The order remains the same in Round 2, but by Round 3, Pearce vaults to the top. While Crank and Pooch have already been eliminated, this seems to be where their Wordle strategy becomes slightly less effective (that or they just keep guessing 'pluot' over and over). Turning to correlation coefficients (indicators of how variables move together), scoring higher in Round 1 actually seems to hurt Ellis. Conversely, more letter-points in Round 1 help Dixon, Pearce, and Brenna the most. Of course, scoring more letter-points in Round 1 largely reflects knowing common letters and optimizing first-word choice. Therefore, Round 2 and beyond is really where the skill comes in. Scoring higher in Round 2 hurts Crank (just going to keep eliminating him), while it is highly correlated with more PPW for Dixon, Jules, and Seth. Round 3 performance is most correlated with points for Dixon, Brenna, and Pooch. Every one of the remaining top players appears here at least once except Pearce. Sorry, buddy.
Three players left and only a few things left to consider. Of the remaining three, Brenna is the only one with a loss. It pains me to do this, but I suppose I can get accustomed to sleeping on the couch. Despite her relative consistency and high scoring in the first several rounds, the loss is tough to overcome at this stage.
And then there were two: Dixon and Jules. Jules has the highest PPW, more impressive 2-round solves, and better scoring in Rounds 1-3. One could easily make the case that this means Dixon is able to do more with less information. While that is certainly a skill, it would seem that Jules has to be considered the best Wordle player in my friend group.
What a journey it's been! The final rankings, in order of elimination are:
- Brenna/Pearce (this was a tough call)
And look at that! With some pairings, we all made the top 5, which is pretty neat. Hopefully this is useful to someone outside my friend group as a way to think about describing and visualizing new data. Code is below for anyone who wants to adapt it for their own Wordle analysis in R.
Code to create Table 2:
library(dplyr)        # %>% pipe
library(summarytools) # descr()
wordle %>% descr(stats = "common", transpose = TRUE) %>% print(file = "wordleSummary.html", report.title = "Summary Statistics for Wordle Dataset")
Code to create Figure 2:
library(ggplot2)
ggplot(data = wordle) + geom_bar(aes(x = correctRound), fill = "seagreen", alpha = 0.7) +
  labs(x = "Correct Round", y = "Count")
Code to create Figure 3:
library(ggplot2)
library(scales) # percent()
ggplot(data = graphicalNames) + geom_bar(aes(x = correctRound, y = after_stat(prop), group = name, fill = name), position = position_dodge(preserve = "single"), width = 0.7) +
  scale_fill_brewer(palette = "Paired") +
  scale_y_continuous(labels = percent, limits = c(0, 0.75)) +
  theme(axis.title.y = element_blank(), legend.title = element_blank(), legend.position = "bottom")
Code to create Figure 4:
library(ggplot2)
library(ggthemes) # theme_economist_white()
ggplot(data = graphicalNames, aes(x = wordle, y = correctRoundNum)) + geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, color = "seagreen") + # LOESS trendline per player
  facet_wrap(~name, ncol = 3) + theme_economist_white() + ylim(0, 6) + labs(x = "Wordle", y = "Correct Round")
Code to create personSummary:
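library(dplyr)   # %>%, group_by(), summarize()
library(moments) # skewness(); package assumed, e1071 also exports a skewness()
personSummary <- wordle %>%
  group_by(name) %>% summarize(
    meanRound = mean(correctRoundNum, na.rm = TRUE), # reconstructed: referenced by the bounds below
    nRounds = sum(!is.na(correctRoundNum)),           # observations behind the mean and sd
    roundScoring = sum(round_1, round_2, round_3, round_4, round_5, round_6, na.rm = TRUE) / sum(as.numeric(correctRound), na.rm = TRUE),
    skewOverall = skewness(relScore, na.rm = TRUE),
    lowerBound = meanRound - qt(p = 0.05/2, df = nRounds - 1, lower.tail = FALSE) * sd(correctRoundNum, na.rm = TRUE) / sqrt(nRounds),
    upperBound = meanRound + qt(p = 0.05/2, df = nRounds - 1, lower.tail = FALSE) * sd(correctRoundNum, na.rm = TRUE) / sqrt(nRounds))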
[6] Of course, any points system could be used in these calculations, so I'll note that this one is completely arbitrary.
Code for Figure 5:
facet_wrap(~name,ncol = 3) + theme_economist_white() +
labs(x="Correct Round (if correct)",y="")