Most Competitive Region

What region/regions do you guys think are the most competitive?

This year IMO the most competitive regions were Hawaii, BC, Texas, South Cali, and Wisconsin. Also New Zealand has some great teams again this year. The rest of their teams seemed down. I forgot Utah and Puerto Rico. Utah has the top two in robot skills.

I think Utah is objectively the most competitive this season.

Based on videos and scores I have seen this year:
-Utah
-Ontario
-British Columbia
-Texas
-Puerto Rico

IMO Utah and NZ are the most competitive regions this year.

No one has mentioned Singapore?

For middle school, I think it’s China, Singapore, China, Singapore, China, China, and Hawaii. Must be something about the plate tectonics. Oh, and did I remember to mention China and Singapore and China? Singapore should probably be included in that list, too. Along with China.

Why hello random stranger.

I too think that this year’s most competitive regions are Utah and NZ based on footage. Of course there are always the classics, Ontario, China, and California. :slight_smile:

You might want to add China to that list ;).

I agree with New Zealand, Texas, Utah, Puerto Rico, and British Columbia, since I’ve seen videos on YouTube and results on Robot Events, and … yeah, they’re pretty good :slight_smile:

My weekend project - analyze the data out of Nate’s API. I may throw all this in a Couchbase instance I have on my PC. It’s been a problem I have been looking at for a few months. JSON to JSON should make it easy. Maybe I’ll install MongoDB and try that.

http://vex.us.nallen.me/the_data

The data’s all there guys. Time to moneyball this question and use statistics and data analytics. Posture all you want guys, but data does not lie.

The question is - what stats to use??? Region with most awards, region with most championships (Hai Sing and the Shanghai teenage center will show through there I think), region with the most division finalists, or the region with the best OPR? This is where you should be debating.

To keep the data set small, I am going to use world championship data that is on the API and deal with awards first, then champs. We’ll see how long that takes.

If we care about last year’s results, they’re ranked by qualification match wins/losses here:
http://vex.us.nallen.me/extras/worlds_regions?RE-VRC-14-0110=1

The best High School region was British Columbia, with 10 teams going 67-28-5 overall.

The best Vex U regions were New York and Veracruz, each with two teams going 18-2-0 overall.

Middle School didn’t have a clear leader.

I think what people want to know though is what the strongest regions will be this year, which is a trickier question. For different regions within the US it should be possible (at least in principle, with enough data) to come up with an answer based on tournament wins and losses, because there’s some mixing between the regions. For regions that are disconnected all the information we have is skills scores, OPR and videos.

How was your trip to Bahrain? My kids and I must have had a really good time there because none of us can remember a thing about it. :smiley:

This is where stats can be tricky. Looks to me like France was the strongest region at a 60% win rate!

Low sample sizes throw big old monkey wrenches into normal distribution type statistics. That’s why I want to go longer across other periods as well as look at other elements like just semi-finals and above per division, award winners, skills views, etc.

The trouble with using over the year data is how to match up teams improving over the year or redesigning their robot? I bet if we moved this tournament to Las Vegas we could get some good odds makers to help decide how to rank teams. :smiley:

Vex seems kind of like the NCAA tournament, where you play hard to get in, so you have accomplished a lot already just by getting there. But there are “Cinderella teams” that knock off a Goliath, taking them out of the mix. We are going to Louisville, so we will see plenty of dejected Cardinal fans who just got knocked out of the NCAA tourney. Earlier rounds had upsets too, like my local college Villanova getting knocked out as the number 1 seed by the 8 seed in their bracket.

You’re right that there’s no single correct method of ranking teams based on win-loss-tie record if they have different numbers of games, but I think by any reasonable method Canada’s 142-97-11 beats France’s 6-4. The observed win rates are close, and there isn’t enough evidence to say that one country was definitely better than the other, but there is enough evidence to say that Canada’s expected win rate was above 50% while the same can’t be said of France.

I don’t think it’s very interesting to track which country had the highest recorded win%, because it will nearly always be a country with very few teams that had good luck on the day(s). I think it’s much more interesting to treat the results as measurements of each country’s probability of winning a random qualification match, and under that approach BC was the strongest region in High School in 2014.
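
To make the "measurement of a win probability" idea concrete, here is a minimal sketch using a 95% Wilson score interval. That interval is my choice for illustration and is not necessarily how the rankings linked above were computed. It assumes ties are simply dropped, and `wilsonInterval` is a made-up helper name.

```javascript
// Wilson score interval for a win rate treated as a binomial proportion.
// Assumption: ties are ignored, so n = wins + losses.
function wilsonInterval(wins, losses, z = 1.96) {
  const n = wins + losses;
  const p = wins / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const half = (z * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n))) / denom;
  return { rate: p, lower: center - half, upper: center + half };
}

const canada = wilsonInterval(142, 97); // 142-97-11 with the ties dropped
const france = wilsonInterval(6, 4);    // 6-4

console.log("Canada", canada);
console.log("France", france);
```

With these records the observed rates are close, but Canada's lower bound lands above 0.5 while France's interval straddles 0.5, which is the distinction being argued here.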

I agree, Utah and NZ are probably the most competitive this year.

I don’t think there is a statistically proper method. I’ve looked into trying to get a good method between regions (or even events within a region), but the issue is that you can’t really determine whether, for example, an event had a lot of weak teams, or good teams that were equally offset by defensive teams that made them appear “weaker”.

Because of this things like OPR and DPR don’t really transfer across events… There might be some correlation across events but it likely won’t be completely reliable.

It’s totally Texas guys, check out vex db and see which region has the most different teams that have broken the 100 point barrier. Also Texas has the highest combined match score by a “huge” margin;).

BTW I am totally unbiased in this opinion.:smiley:

Your highest combined only bested ours by 3 points, that’s not that huge of a margin lol.

Part 1 of the moneyball saga…

I have made some progress on loading this into MongoDB, but I am at a tough part: Mongo is not a relational DB, so I need to figure out how to denormalize this data.

I created a MongoDB database for all this, as the source data was in JSON so I thought that would be easy (relatively easy, but not drag-and-drop easy). It’s version 3, since that is the latest on the site, and is running on Windows 8.1 on my PC here (with no external access as of yet since it is a local instance).

I then downloaded the team lists from 2012-2014 of the world championship events, the awards, and rankings. These are all in JSON format from Nate’s site. Matches will require some homework on my part to get the full sets from all three years.

In Mongo I created collections for teams, awards, and rankings for now.

Then comes the loading part. The JSON coming out of Nate’s API is apparently not directly importable into Mongo for two reasons. The first is the collection of the results, which is easy enough to trim off in a text editor. The second is that mongoimport wants each JSON document/object on its own line.

So I had to do a mass replace of the comma at the end of each record with a CR/LF to put each record on a new line. A sed script would have been my tool of choice, but editors come to mind. I thought I’ll finally get Eclipse running here. That was a fail. So I ran home to mama and downloaded Emacs like I do in every situation like this over the years. You always revert back to the tools you know and love.

A quick query-replace on the files and I was in business. Do a save-as and I have these nice JSON files ready for import.
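
For anyone who would rather script that step than do it in an editor, here is a minimal sketch of the same conversion. It assumes the documents sit in an array under a wrapper key; `result` is a hypothetical name I have not checked against the real API, and the sample response is made up.

```javascript
// Turn an API response into one JSON document per line for mongoimport.
// Assumption: the documents live in an array under a wrapper key; "result"
// is a hypothetical name, not confirmed against the real API.
function toJsonLines(text, wrapperKey = "result") {
  const parsed = JSON.parse(text);
  const docs = Array.isArray(parsed) ? parsed : parsed[wrapperKey];
  if (!Array.isArray(docs)) {
    throw new Error(`no array found under "${wrapperKey}"`);
  }
  // One compact document per line, newline-terminated.
  return docs.map((doc) => JSON.stringify(doc)).join("\n") + "\n";
}

// Made-up sample response, only to show the shape of the output.
const sample = '{"status": 1, "result": [{"number": "44"}, {"number": "1234A"}]}';
const jsonLines = toJsonLines(sample);
console.log(jsonLines);
```

mongoimport also has a `--jsonArray` option for input that is a single JSON array, which may avoid the line splitting once the wrapper is trimmed off. I have not tried it on these files.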

So on the teams, I wanted one record per team no matter what year you were in. So I did two things. First was a unique index on the element number in the teams collection. Second was the method of import to the list so I don’t get duplicates.

I used mongoimport as the means of insertion into the collection. It worked!


R:\data>..\bin\mongoimport.exe --db local --upsertFields number --collection teams < 2012vrcms.json
2015-04-05T08:38:26.164-0400    connected to: localhost
2015-04-05T08:38:26.219-0400    imported 150 documents

Now to find some aggregation on the awards data. Simple sums of awards records will do for now.


db.awards.aggregate([
{$group:{_id: "$team",total_awards:{$sum: 1}}},
{ $sort: { total_awards: -1 } }
])

Here’s the output for 2014 awards (notice that top result, we’ll have to deal with that later):

So that leads to stumper number one - how to get rid of the blank and names in the team field? This is what we call data quality. Teams has a few records as well with country missing or state being not nice for teams in China, New Zealand, and other places.

Next stumper is adding sub-documents of awards data and match data to the teams document. Mongo is not relational so I can’t simply say where the team.number is equal to awards.team. I now have to add the awards as sub-documents to team (or the other way round to add team region info to awards but I want more things back to team) There are some fancy ways of doing this but it essentially is going to make one a sub of the other in a new data set.

OK, so where am I going with this? We can look at success metrics of teams via awards and progressing in matches to later rounds (match.round values of 3, 4, or 5 with instance 1 will get you those teams that have moved on and how far per SKU). We’ll call these accolades and that is what I think we will measure.

We can then look at aggregate numbers of accolades and rank order the regions. That will tell me the regions that are historically successful on the world stage. That’s a very nice metric we can debate about.

But does it tell me powerhouse relative to all? Some regions have larger supply sets than others so they will more likely produce more accolades in theory. So the percentage of teams in a region with an accolade seems like another metric to try out. But that will fall down for small regions. If a one team region gets an award and makes it through, they will be the max accolades in relation to their peers (like France when you measure just win percentage)
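
Here is a small sketch of the two candidate metrics side by side: total accolades per region, and the share of a region's teams with at least one accolade. The data is made up ("Region X" and "Region Y" are placeholders), the field names follow the team and award documents in this thread, and `regionStats` is a hypothetical helper.

```javascript
// Hypothetical sample data; values are invented for illustration only.
const teams = [
  { number: "1A", region: "Region X" },
  { number: "1B", region: "Region X" },
  { number: "1C", region: "Region X" },
  { number: "1D", region: "Region X" },
  { number: "2A", region: "Region Y" },
];
const accolades = [
  { team: "1A", name: "Award 1" },
  { team: "1A", name: "Award 2" },
  { team: "1B", name: "Award 3" },
  { team: "2A", name: "Award 4" },
];

function regionStats(teams, accolades) {
  const regionOf = new Map(teams.map((t) => [t.number, t.region]));
  const stats = new Map();
  for (const t of teams) {
    if (!stats.has(t.region)) {
      stats.set(t.region, { teams: 0, accolades: 0, winners: new Set() });
    }
    stats.get(t.region).teams += 1;
  }
  for (const a of accolades) {
    const region = regionOf.get(a.team);
    if (region === undefined) continue; // accolade not tied to a known team
    const s = stats.get(region);
    s.accolades += 1;
    s.winners.add(a.team);
  }
  return [...stats].map(([region, s]) => ({
    region,
    teams: s.teams,
    accolades: s.accolades,
    shareWithAccolade: s.winners.size / s.teams,
  }));
}

const byRegion = regionStats(teams, accolades);
console.log(byRegion);
```

In this toy data the larger region wins on raw count (3 accolades to 1), while the one-team region wins on share (1.0 against 0.5), which is exactly the small-region problem described above.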

Also, do we weight the accolades and do a weighted score? Does an award with a red trophy count as more than a black banded trophy?

We can also count the teams with multiple accolades over the years but teams get reused and it’s not the same team members per team number year to year.

Another option is grouping by the team number’s numeric part to get to club level (again, data quality makes the organization field hard to aggregate on).
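
A rough sketch of that club-level key, assuming a club's teams share the leading digits of the team number and differ only by a letter suffix. `clubKey` is a made-up helper and the team numbers other than "44" are invented.

```javascript
// Club key = leading digits of the team number ("1234A" -> "1234").
// Assumption: numbers with no leading digits have no club key (null).
function clubKey(teamNumber) {
  const match = /^\d+/.exec(String(teamNumber));
  return match ? match[0] : null;
}

// Invented sample numbers, apart from "44" which appears in this thread.
const sampleNumbers = ["44", "1234A", "1234B", "1234C", "ABCD"];
const clubSizes = {};
for (const number of sampleNumbers) {
  const key = clubKey(number);
  if (key === null) continue; // all-letter identifiers are skipped here
  clubSizes[key] = (clubSizes[key] || 0) + 1;
}
console.log(clubSizes);
```

Whether all-letter identifiers should be skipped or treated as their own club is a judgment call this sketch does not settle.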

Any thoughts and contributions would be helpful. I am currently pondering how to take the awards collection data to make sub-documents on the teams collection. Next will be match and rankings.

But boy this is taking a lot longer than I thought…

Money ball part 2 - still stuck but working through it…

The awards data has some data quality issues to deal with. Not all awards are given to teams and it seems teams have some missing elements like country or region at times.

For the awards, we can select the subset of awards given to a team by making a new collection, awardout, which we will use from now on. The teams’ missing data will have to be manually dealt with later.


db.awards.aggregate([{$match: {team: {$regex: /^[1-9]/}}}, {$out: "awardout"}])
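
To show what that regex keeps and drops, here is the same test applied to a few in-memory records. Only team "44" comes from the real data; the other records and the `keepAward` helper are made up.

```javascript
// Same pattern as the $match stage: team must start with a digit 1-9.
const keepAward = (award) =>
  typeof award.team === "string" && /^[1-9]/.test(award.team);

// Invented sample records, apart from team "44".
const sampleAwards = [
  { team: "44", name: "Robot Skills Winner (VRC/VEXU)" },
  { team: "", name: "Hypothetical award with a blank team" },
  { team: "Jane Doe", name: "Hypothetical award given to a person" },
  { team: "ABCD", name: "Hypothetical award for an all-letter team id" },
];

const kept = sampleAwards.filter(keepAward);
console.log(kept.map((a) => a.team));
```

Note the side effect: any team identifier that does not begin with 1-9, such as an all-letter one, is dropped along with the blanks and the names.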

Now we need to loop through all the awards and add them as sub documents to the teams as accolades of various sorts. We’ll do the same with getting to a qualification round at worlds and possibly skills scores too.

But this is where I get stuck. I can’t seem to add more than one award into the teams as subdocs. Anyone know MongoDB?


db.awardout.find().forEach(function(award_team){
  if(award_team != null)
  {
    db.teams.findAndModify(
    {
      query: {number: award_team.team},
      update: { $set: {accolades: [ {sku: award_team.sku, acc_category: "Award", acc_name: award_team.name }] } },
      upsert: true,
    });
  }
});

Here are the awards records…


/* 18 */
{
  "_id" : ObjectId("552077289ad9c19e809965de"),
  "sku" : "RE-VRC-14-0110",
  "name" : "Tournament Finalists (VRC/VEXU)",
  "team" : "44",
  "qualifies" : [ ],
  "order" : "1"
}
/* 35 */
{
  "_id" : ObjectId("552077289ad9c19e809965ef"),
  "sku" : "RE-VRC-14-0110",
  "name" : "Robot Skills Winner (VRC/VEXU)",
  "team" : "44",
  "qualifies" : [ ],
  "order" : "4"
}

But it only inserts one subdocument into teams.


{
  "_id" : ObjectId("552071f69ad9c19e80996432"),
  "program" : "VRC",
  "robot_name" : "Fred VII",
  "organisation" : "Green Egg Robotics",
  "region" : "Massachusetts",
  "country" : "United States",
  "number" : "44",
  "team_name" : "Green Egg Robotics",
  "city" : "Oakham",
  "grade" : "High School",
  "is_registered" : "0",
  "accolades" : [{
      "sku" : "RE-VRC-14-0110",
      "acc_category" : "Award",
      "acc_name" : "Think Award (VRC/VEXU)"
    }]
}

Where am I going with this? We create accolades into a team and then we can aggregate and score different accolades across regions and countries. Count teams with total teams per region and get percentages of teams with accolades of different types filtered in here.

Now if I can only get multiple accolades ETL’d into the teams document.
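
A likely cause, offered as a guess and not tested against this database: `$set` replaces the whole `accolades` field on every pass through the loop, so each award overwrites the one before it. The plain JavaScript below only imitates the two behaviours in memory (it does not talk to MongoDB), and `applySet`, `applyPush` and `toAccolade` are made-up helpers.

```javascript
// In-memory imitation of two update styles; this is not MongoDB itself.
function applySet(team, accolade) {
  team.accolades = [accolade]; // replaces whatever was there before
}
function applyPush(team, accolade) {
  team.accolades = team.accolades || [];
  team.accolades.push(accolade); // appends to the existing array
}

// The two award records for team 44 shown earlier in this post.
const awards = [
  { sku: "RE-VRC-14-0110", team: "44", name: "Tournament Finalists (VRC/VEXU)" },
  { sku: "RE-VRC-14-0110", team: "44", name: "Robot Skills Winner (VRC/VEXU)" },
];
const toAccolade = (a) => ({ sku: a.sku, acc_category: "Award", acc_name: a.name });

const withSet = { number: "44" };
const withPush = { number: "44" };
for (const award of awards) {
  applySet(withSet, toAccolade(award));
  applyPush(withPush, toAccolade(award));
}
console.log(withSet.accolades.length, withPush.accolades.length);

// In the mongo shell the analogous change would be to swap $set for $push:
//   update: { $push: { accolades: { sku: award_team.sku,
//                                   acc_category: "Award",
//                                   acc_name: award_team.name } } }
```

If the load gets re-run, `$addToSet` in place of `$push` should avoid appending the same accolade twice, though I have not tried either on this data.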