Hey! So it’s been a long time since I’ve posted here (college, now work, etc.). I’ve decided to start volunteering at events, though, so I’ll probably be a bit more active again.
Today I was thinking about match scores over the season. Using data from VEXdb (and later a private dataset from Jordan), I looked at how many matches each team competes in over the season, as well as how average match scores change over time.
The most obvious question to answer is: how many teams have competed this year? The datasets I have show that 7508 unique teams have played in at least one match at a regular tournament (not counting leagues, scrimmages, or skills-only events). The most common number of matches for a team is 8, and the frequency drops off sharply on either side:
(This might remind you of Zipf’s Law or a power-law distribution: in many datasets, the frequency of an item is inversely proportional to its “rank” when items are sorted by how often they occur. Interestingly enough, this dataset appears to follow that pattern!)
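For anyone who wants to reproduce these counts, here is a minimal Python sketch. The record format (each match is a tuple whose first two fields are the red and blue alliance team tuples) is an assumption, not VEXdb’s actual schema, and it assumes leagues, scrimmages, and skills-only events have already been filtered out. The function names are hypothetical. `zipf_exponent` is only a rough check of the Zipf-like pattern: it fits a straight line to log(frequency) versus log(rank) by least squares, where a slope near -1 corresponds to Zipf’s Law. Log-log least squares is a quick diagnostic; maximum-likelihood estimators are generally preferred for actually fitting power laws.

```python
import math
from collections import Counter


def matches_per_team(matches):
    """Count how many matches each team appears in.

    Hypothetical record format: (red_teams, blue_teams, *other_fields).
    """
    counts = Counter()
    for red, blue, *_ in matches:
        for team in (*red, *blue):
            counts[team] += 1
    return counts


def most_common_match_count(counts):
    """Return the match count k shared by the largest number of teams (the mode)."""
    histogram = Counter(counts.values())  # k -> number of teams with exactly k matches
    return histogram.most_common(1)[0][0]


def zipf_exponent(frequencies):
    """Least-squares fit of log(frequency) = c - s * log(rank); returns s.

    Ranks come from sorting frequencies in descending order; s close to 1 is Zipf-like.
    """
    freqs = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


# Toy data (made-up team numbers).
matches = [
    (("1A", "2B"), ("3C", "4D")),
    (("1A", "3C"), ("2B", "5E")),
    (("1A", "4D"), ("5E", "2B")),
]
counts = matches_per_team(matches)
print(len(counts), most_common_match_count(counts))  # prints "5 2"
print(round(zipf_exponent([1000 / r for r in range(1, 101)]), 3))  # prints "1.0"
```

On real data you would pass `counts.values()` to `zipf_exponent`; the synthetic `1000 / r` list is just a sanity check that a perfect Zipf sequence gives an exponent of 1.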
Just for fun, here are the top ten teams by number of official tournament matches played this season:
86868R with 139 matches
8675A with 122 matches
3671B with 119 matches
6741A with 118 matches
6842Z with 117 matches
1045A with 111 matches
1970K with 108 matches
10300X with 108 matches
1375A with 107 matches
1375C with 105 matches
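A ranking like the one above can be produced from per-team match counts with a simple sort. This is a minimal sketch; the `top_teams` helper is hypothetical, and the alphabetical tiebreak is my own assumption (the post doesn’t say how tied teams such as 1970K and 10300X were ordered, so this may list them differently). `collections.Counter.most_common(10)` would also work, but it leaves tied teams in insertion order.

```python
def top_teams(counts, n=10):
    """Return the n teams with the most matches as (team, matches) pairs.

    Ties are broken alphabetically by team number (an assumption) so the
    ranking is deterministic.
    """
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


# A few entries from the list above, used as toy input.
sample = {"1970K": 108, "86868R": 139, "10300X": 108, "8675A": 122}
for team, played in top_teams(sample, n=3):
    print(f"{team} with {played} matches")
# 86868R with 139 matches
# 8675A with 122 matches
# 10300X with 108 matches
```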
These high match counts, though, are very rare. To show just how uncommon they are, I produced the following two plots (the second is a zoomed-in version of the first). They show the percentage of teams that have played at most x matches: by the time a team has completed 40 matches in a season, it’s already in the top 10% of teams!
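The curve described above is an empirical cumulative distribution function (CDF). Here is a minimal sketch of computing it with the standard library’s `bisect` module; `empirical_cdf` is a hypothetical helper, and the input is just the list of per-team match counts.

```python
from bisect import bisect_right


def empirical_cdf(match_counts):
    """Return F where F(x) is the fraction of teams that played at most x matches."""
    values = sorted(match_counts)
    n = len(values)

    def cdf(x):
        # bisect_right returns how many sorted values are <= x
        return bisect_right(values, x) / n

    return cdf


F = empirical_cdf([1, 2, 2, 3, 10])  # toy data
print(F(2))  # 0.6
```

With the real season data, the “top 10% at 40 matches” observation corresponds to `F(40)` being at least 0.9, and `1 - F(40)` is the share of teams that played more than 40 matches.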
Finally, I plotted the average match score over time. The two lines show the average winning and losing scores (red and blue, respectively) across all tournaments on a given date, smoothed with a rolling average over three adjacent tournament days. The shaded region is a band two standard deviations wide (±σ around the mean), so it’s pretty clear that there’s a ton of noise in these scores.
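Here is a minimal sketch of how such a smoothed line and band could be computed. Several details are my assumptions rather than what the plot necessarily used: the record format `(iso_date, red_score, blue_score)`, a centered three-day window that shrinks at the ends of the season, population standard deviation, and smoothing σ with the same window as the mean. A tied match counts its score as both the winning and the losing score. All function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def daily_stats(matches):
    """For each tournament date: (mean, std) of winning and of losing scores.

    Hypothetical record format: (iso_date, red_score, blue_score).
    """
    by_day = defaultdict(lambda: ([], []))
    for day, red, blue in matches:
        winners, losers = by_day[day]
        winners.append(max(red, blue))  # a tie puts the same score on both sides
        losers.append(min(red, blue))
    days = sorted(by_day)  # ISO date strings sort chronologically
    win = [(mean(by_day[d][0]), pstdev(by_day[d][0])) for d in days]
    lose = [(mean(by_day[d][1]), pstdev(by_day[d][1])) for d in days]
    return days, win, lose


def rolling_mean(values, window=3):
    """Centered rolling mean over adjacent tournament days; shrinks at the ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def smoothed_band(stats, window=3):
    """Smoothed mean line plus a band of +/- one smoothed standard deviation."""
    means = rolling_mean([m for m, _ in stats], window)
    sigmas = rolling_mean([s for _, s in stats], window)
    lower = [m - s for m, s in zip(means, sigmas)]
    upper = [m + s for m, s in zip(means, sigmas)]
    return means, lower, upper


# Toy data (made-up scores).
matches = [("2019-01-05", 10, 20), ("2019-01-05", 30, 10), ("2019-01-12", 5, 5)]
days, win, lose = daily_stats(matches)
win_line, win_low, win_high = smoothed_band(win)
```

Note that the window counts tournament days that actually appear in the data, not calendar days, which matches “three adjacent tournament days”.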
I shared this on the VEX Discord, and someone pointed out that a lot of teams start competing in January, which might explain the sudden drop in average score. So I decided to consider only matches in which every team is “hardcore”, which I’ve defined as a team that has played in more than 40 matches. (As noted above, this is the top 10% of teams.)
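A minimal sketch of the “hardcore-only” filter. The record format (first two fields are the red and blue team tuples, anything after is ignored) and the `hardcore_matches` name are assumptions; the threshold of more than 40 matches comes from the definition above. One subtlety: match counts must come from the whole season’s matches, not from an already-filtered subset, or the definition shifts.

```python
from collections import Counter

HARDCORE_THRESHOLD = 40  # "more than 40 matches", per the definition above


def hardcore_matches(matches, threshold=HARDCORE_THRESHOLD):
    """Keep matches in which every team played more than `threshold` matches.

    Hypothetical record format: (red_teams, blue_teams, *other_fields).
    Match counts are taken over the whole input, so pass the full season.
    """
    counts = Counter(team for red, blue, *_ in matches for team in (*red, *blue))
    hardcore = {team for team, n in counts.items() if n > threshold}
    return [m for m in matches if all(t in hardcore for t in (*m[0], *m[1]))]
```

The filtered list can then go through the same per-date averaging used for the full dataset.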
Considering only “hardcore” teams, the average match score continues to rise through late January. Curiously, the margin between the average winning and losing scores stays nearly constant at about 35 points over time. One possibility is that this is due, in part, to the autonomous bonus.
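One way to put a number on “nearly constant” is to compute the margin per tournament date and look at its spread. This is a minimal sketch under the same assumed `(iso_date, red_score, blue_score)` record format, with hypothetical function names. Because the mean of the winning scores minus the mean of the losing scores equals the mean per-match margin over the same matches, the per-match margins can also be inspected directly.

```python
from collections import defaultdict
from statistics import mean, pstdev


def daily_margins(matches):
    """Average winning-minus-losing margin for each tournament date.

    Hypothetical record format: (iso_date, red_score, blue_score).
    Over the same set of matches, mean(win) - mean(loss) == mean(win - loss),
    so this equals the gap between the daily winning and losing averages.
    """
    by_day = defaultdict(list)
    for day, red, blue in matches:
        by_day[day].append(abs(red - blue))
    return {day: mean(margins) for day, margins in sorted(by_day.items())}


def flatness(margins):
    """Mean and population std of the daily margins; a small std means a flat gap."""
    values = list(margins.values())
    return mean(values), pstdev(values)


# Toy data (made-up scores).
matches = [
    ("2019-01-05", 100, 60),
    ("2019-01-05", 50, 90),
    ("2019-01-12", 80, 45),
]
margins = daily_margins(matches)
print(flatness(margins))  # (37.5, 2.5)
```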
Data I’d love to have: more detailed breakdowns like those available in prior years (e.g. Gateway/RoundUp), with per-goal scores. These would allow for some “deep dive” analysis of specific strategies, and might even open the door to predicting match outcomes from prior performance.
Anyway, this was kind of a brain dump of some stuff I was thinking about all day. Any thoughts on the data I’m looking at? Do you have requests for other plots, or other metrics you’d like me to investigate?