In larger tournaments, where each team plays fewer qualifying matches, the impact of the random pairings can be huge.
If you get paired with 6 great teams, you’re golden. If you get paired with 6 bad teams, there’s no way you’re making it into the finals.
The average score mechanism is fair, but it’s noisy because of these random pairings.
The result: Sometimes, very bad teams make it into the finals (our team got lucky in Tournament 1 and ended up in the finals, even though our robot barely contributed any points in our Q matches—we were carried by other teams). Other times, excellent teams don’t make it into the finals due to bad Q pairings.
To see this most clearly, look at Tournament 3 here and the fate of 5125Y (not my team). They got the highest score in Qualifying matches (107), and they won the Solo Skills with 82 driver points, but they didn’t make it to the finals due to bad pairings:
https://www.robotevents.com/robot-competitions/vex-iq-challenge/RE-VIQC-19-9862.html
So here’s a team that can contribute 82 points on their own, and they’re not in the finals? If the tournament is supposed to help us identify the very best teams, this is a huge problem.
How can we fix this?
What about other metrics for picking finalists? What about the teams with the best best scores? Looking at Tournament 3, out of the 10 teams with the best single Q match scores, only 5 made it into the finals. So here are a bunch of top-scoring alliances that showed us they were capable of high scores, and they weren’t given a shot at the finals.
In fact, one of the alliances in the finals only got 40 points. How did that make team 5125Y feel? The team that could get 82 points as a solo robot?
Or what about another metric, the teams with the highest worst scores? Or maybe, to avoid the impact of one very unlucky match, the highest second-worst score?
Both of these metrics (best worst and best best) also involve pairing luck, but maybe a bit less. I guess it’s always easier to detect the real dogs than the real winners (real dogs will have low worst scores and low best scores… but we’re not sure about real winners).
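Here’s a minimal sketch of how those alternate rankings could be computed, assuming we already have each team’s list of qualifying-match scores (the team numbers and scores below are made up for illustration):

```python
# Rank teams by alternate metrics: best single Q score, highest worst score,
# and highest second-worst score. In a real tournament, q_scores would hold
# each team's actual qualifying-match scores; these numbers are invented.
q_scores = {
    "5125Y": [107, 95, 88, 40, 72, 101],
    "1234A": [60, 55, 70, 65, 58, 62],
    "9876B": [30, 90, 45, 50, 85, 40],
}

def rank_by(metric):
    """Return teams sorted best-to-worst under the given metric."""
    return sorted(q_scores, key=lambda team: metric(q_scores[team]), reverse=True)

best_best = rank_by(max)                              # highest single score
best_worst = rank_by(min)                             # highest worst score
best_second_worst = rank_by(lambda s: sorted(s)[1])   # forgive one unlucky match

print("best best:        ", best_best)
print("best worst:       ", best_worst)
print("best second-worst:", best_second_worst)
```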
Some kind of meta score is possible too, where you try to assess each team’s impact on the matches they participated in. Each team can have a rolling average, and then we try to see whether the other team pushed you above your own average or below it. Did they help you or hurt you? Did you help them or hurt them? This would be an Elo-like way for scores to move up or down. The moving average could keep being adjusted throughout the Q matches, and the rankings recomputed as more information comes in (so we keep adjusting the impact of past matches based on this new information).
The best teams would help everyone they played with to score above their own average. The worst teams would push everyone they played with to score below their own average.
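As a rough sketch (this particular update rule is my own guess, not anything official), one way to compute that impact score: take each team’s plain average as its expected contribution, credit every team with how far it pushed its partner above or below the partner’s own average, and re-score earlier matches once the averages have settled:

```python
from collections import defaultdict

matches = [
    # (team_a, team_b, alliance_score) -- hypothetical Q results
    ("5125Y", "1234A", 70),
    ("9876B", "5125Y", 95),
    ("1234A", "9876B", 40),
]

# Pass 1: each team's plain average over the matches it played.
scores = defaultdict(list)
for a, b, score in matches:
    scores[a].append(score)
    scores[b].append(score)
avg = {team: sum(s) / len(s) for team, s in scores.items()}

# Pass 2: credit each team with how far it pushed its partner above or
# below that partner's own average, summed over all of its matches.
impact = defaultdict(float)
for a, b, score in matches:
    impact[a] += score - avg[b]   # did A lift B above B's average?
    impact[b] += score - avg[a]   # did B lift A above A's average?

for team in sorted(impact, key=impact.get, reverse=True):
    print(team, round(impact[team], 1))
```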
The downside of more complicated formulas is that the ranking becomes more of a “black box”, but I’m not sure people are really digging into the averages on the tournament floor anyway. Elo for chess is more complex than this, and no one seems to care that they don’t understand the underlying math.
Essentially, for each match, we can ask, “How surprising is this result?” If the result is surprising in either direction, the team’s score should change in that direction.
Currently, we’re doing that with the average, but we’re only factoring in how surprising the result is given this team’s past performance, not the other team’s. Did you get a bad score because you did poorly (big negative surprise, so your score should go down a lot), or because you were paired with a team that always gets bad scores (no real surprise, so your score should barely go down)?
A simple example:
Your average is 90 points.
You get paired with a team that has an average of 30 points.
Together, you score 70 points.
Wow, you raised them up 40 above their own average. +40 for you.
Ouch, they pulled you down 20 below your own average. -20 for them.
This of course makes playing with bad teams really valuable, and two good teams playing together mostly worthless (if we both have an average of 90 and we score 90 together, we each get 0 points).
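In code, that single-match update might look like this (the same assumed rule as above, nothing official):

```python
def surprise_update(my_avg, partner_avg, alliance_score):
    """Return (my_delta, partner_delta): each team is credited with how far
    it moved the other team away from that team's own average."""
    my_delta = alliance_score - partner_avg      # how far I moved my partner
    partner_delta = alliance_score - my_avg      # how far they moved me
    return my_delta, partner_delta

print(surprise_update(90, 30, 70))   # (40, -20): +40 for me, -20 for them
print(surprise_update(90, 90, 90))   # (0, 0): two 90-point teams scoring 90
```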
It seems like any fixed, predetermined pairing might be susceptible to this, which leads us to the next idea: dynamic pairing during qualification. I haven’t done the math, but my hunch is that valuable information would come out of always pairing the best teams (so far) with the worst teams. Maybe even a regular average would work if you did this. Pair up randomly, and each pair plays one match. For the second match, pair up best with worst. Play another match. Repeat.
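For what it’s worth, here’s a sketch of that pairing scheme: random pairs in the first round, then best-so-far matched with worst-so-far in every later round (the helper, team numbers, and averages are all hypothetical):

```python
import random

def next_round_pairs(teams, avg, first_round=False):
    """Pair teams for the next Q round: random pairs in round one, then
    best-with-worst (1st vs. last, 2nd vs. 2nd-to-last, ...) afterwards.
    Assumes an even number of teams; avg maps team -> current average."""
    if first_round:
        shuffled = random.sample(teams, len(teams))
        return list(zip(shuffled[::2], shuffled[1::2]))
    ranked = sorted(teams, key=avg.get, reverse=True)
    return [(ranked[i], ranked[-(i + 1)]) for i in range(len(ranked) // 2)]

teams = ["5125Y", "1234A", "9876B", "4321C"]                # hypothetical field
averages = {"5125Y": 85, "1234A": 55, "9876B": 40, "4321C": 70}
print(next_round_pairs(teams, averages, first_round=True))  # random pairs
print(next_round_pairs(teams, averages))                    # best with worst
```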