[API] An alternative way for RobotEvents data!


#1

Hey everyone,

So I’m sure we all know how slow Robot Events can be sometimes if you’re attempting to get lots of data, as well as the fact that it’s often hard to get the information you’re looking for when doing statistics / analysis.

Anyway, without any further explaining, I’m pleased to announce my latest creation for people to use, an (almost) Robot Events API. :smiley:

It works by scraping data from Robot Events periodically (through HTML and the CSV files), keeping all of it in a database, and creating an API that accesses that database. Note that this isn’t directly interfacing with Robot Events, but it does make a lot more useful information available.

Documentation (with examples) is available through the link posted above, and feel free to ask any questions in the thread. :slight_smile:

There are a couple of things to note:

  • At the moment the database is only updated every 2 minutes, but I might bring it down to 1 minute depending on how things go.
  • The API is currently completely public but if demand is crazy I may implement a more closed system for those who want it (I’ll make an announcement if I do).

Also, other things that I’ll look at implementing soon:

  • “Last changed” times for events.
  • Push Notifications for changes.

Any other suggestions are completely welcome, whether you have another kind of ranking method you want implemented other than TRSP, or another way of filtering results, all are welcome! :slight_smile:

Also, on a kind of related note, the rest of the site that the API is hosted on (http://vex.us.nallen.me/) uses the same data as an example of how extra links (events by team, etc.) can be created and used. This site is still being worked on a bit (i.e. the Team Info page is a bit lacking), and was started before the other (better) scouting databases were created :stuck_out_tongue: so it might not be terribly useful, but oh well!

I hope that at least someone finds this API useful and that it enables a lot more great websites that do statistics or analysis on teams / games / events / whatever!

Have fun!


#2

This is very nice. Thank you.


#3

You beat me to it! :slight_smile:
This is very well done, and I can testify how incredibly useful it is to have the data so well organized. If I had had access to this when I started my site, that would have saved me A LOT of time. I will still try to finish my API for people that want my estimated contributions, but it is great that this data is available now as well. Thanks again for making this public!


#4

Is there something in the database that shows middle school vs high school?


#5

In VRC, there is no real distinction between HS and MS, so I assume not.

Nathan’s work is incredible. The update times are even scarier xD


#6

Wow, this is sweet. It is so well laid out, beautifully designed, and EXTREMELY fast.

Will you be adding: Average Score, Average Points Won By, etc. to your site? That would be extremely useful.

Also, for Worlds. Will you be creating 5 separate divisions so we can just see the teams in our division? It would then be nice to sort that division by “Highest Scorer” etc. Is that possible?

Again, awesome job. This is breathtaking (for about 5 sec, then I breathe again).


#7

What’s the data store housing this? Is it Apache’s Couch DB?

Nice JSON on the way out, just wondering if you stored it that way too.


#8

I assume it’s just MySQL. JSON would probably be the only standard way to store it. I know that I do for my database.


#9

The API part was on my list of things to do from about 4 months ago but uni, work, etc. delayed that and I only got around to starting it a couple of days ago :(. The amount of extra information you have on your site is impressive so it will be interesting to see what you come up with. :slight_smile:

As Ruiqi said, Robot Events has no (at least public) distinction as they’re all just VRC teams. I assume it’s up to the event itself to check people are / are not HS or MS? (I may be wrong :p)

Are you referring to the website alongside it? Like I said in the original post that part’s still not completely done, but basically yes :stuck_out_tongue: Those are things I’ll look at adding.

Again, Ruiqi is correct. :stuck_out_tongue: All of the data is stored in a MySQL database after scraping from Robot Events, and JSON is only used as a means to transfer the requested data from the server to the user.


#10

Thank you! This is exactly what I needed. Before this, the best thing I could think of was manually importing Robot Events data into my MySQL database haha


#11

Thanks for this awesome resource! :smiley:

My only suggestion is that, if possible, include the team’s actual skills scores on their page rather than/in addition to their ranking. The less intelligent of the population, such as myself, can find themselves staring at the number in confusion for ten minutes before finally reading it and catching on. :stuck_out_tongue:
(If unclear, I totally did this for like ten minutes :rolleyes:)

Thanks!


#12

When you register a team you have to indicate whether it is a high school or middle school team. As an event partner I can access that information for my state and the Tournament Manager software used to run competitions knows as well. If you look at the skills runs list during an event it will indicate HS or MS. I cannot find any place on RobotEvents.com, however, that indicates HS vs. MS. Some teams could be determined by looking at teams registered for middle school only events, but I don’t think there are that many of those throughout the season.

Thanks for putting together the API. I had abandoned my OPR calculations that I made available a year ago because I didn’t want to deal with scraping the data or processing the download files. With the data easily accessible now I may get back to the OPR calculations.

Jay


#13

Like I said in the original post, the website that goes alongside the API is not yet really finished :p. Things like displaying skills scores per-team (you’ll notice the tab is already there) are coming, when I get time to add them.

Thanks for the information! I’m in contact with someone from IFI so this might be added soon. :slight_smile:

No problem, this was the whole point of the API, so that people could make projects that they thought were a bit too daunting or would take a bit too much time.


#14

This is now active! :slight_smile: When getting data about teams there is a new field called “grade” which tells you whether a team is High School or Middle School. The API docs have been updated for this change. It uses the grade of the team as of the most recent event they attended (or are going to attend). So for teams that have not attended events this will default to High School.
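For anyone wanting to use the new field, here is a minimal sketch of splitting a team list by grade. It assumes each team record has already been parsed from the JSON into a dict with a "grade" key (the field named above) and a "number" key; "number" and the exact grade strings are assumptions for illustration, so check the API docs for the real names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_grade(teams: Iterable[dict]) -> Dict[str, List[str]]:
    """Group team numbers by the value of their "grade" field."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for team in teams:
        # "number" is a hypothetical key; records without a grade go under "Unknown".
        groups[team.get("grade", "Unknown")].append(team["number"])
    return dict(groups)


# Made-up sample records, not real API output.
sample = [
    {"number": "1A", "grade": "High School"},
    {"number": "2B", "grade": "Middle School"},
    {"number": "3C", "grade": "High School"},
]
print(group_by_grade(sample))
```

Keep in mind that, as described above, a team with no events will show up as High School by default.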

In other news, I’ve been talking to someone from IFI (he’s been REALLY helpful with extra information! :D) and the timings of scraping data have been worked on so as to not put crazy loads / requests on their server, whilst still keeping the database up to date:

  • Data of Current Events will update every 2 minutes while the event is running (this isn’t going to become more frequent; IFI already made this possible, down from 5 minutes).
  • Team Lists will update every 6 Hours from when they are available until the event ends.
  • Event Information will be updated 24-hourly until the event is over.
  • New events / teams will be checked for and added 6-hourly.
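If you are polling the API from your own project, there is little point in asking more often than the data behind it changes. Below is a rough client-side sketch (not the scraper itself) that records the intervals listed above and checks whether a re-fetch is due. The dictionary keys and the `is_due` helper are names made up for this example.

```python
from datetime import datetime, timedelta

# Intervals taken from the list above; the key names are hypothetical.
REFRESH_INTERVALS = {
    "current_event_data": timedelta(minutes=2),
    "team_lists": timedelta(hours=6),
    "event_information": timedelta(hours=24),
    "new_events_and_teams": timedelta(hours=6),
}


def is_due(kind: str, last_fetch: datetime, now: datetime) -> bool:
    """Return True once at least one full interval has passed since last_fetch."""
    return now - last_fetch >= REFRESH_INTERVALS[kind]
```

For example, a project caching team lists could call `is_due("team_lists", last_fetch, datetime.now())` before sending another request.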

I already know of a couple of projects using this information, hopefully more to come! :slight_smile:


#15

Any clue what is up with this?

http://vex.us.nallen.me/skills/Round_Up/Robot

This doesn’t seem legit to me. Can you remove that competition from your database?


#16

Can you share how your system is able to more efficiently check data? My server is also tasked with constantly accessing Robot Events data, so if there is a way to check what events have updated data, that would be very helpful.


#17

Thanks for the fantastic resource! I will be sure to use this in preparation for the Open and Worlds :smiley:


#18

Done.

There’s no way to check what events have updated data. It’s just more efficient in terms of not sending a huge number of requests to RobotEvents, only updating events that it’s reasonable to assume will have changed, etc.

You’re welcome!


#19

First of all I think this is great. If you can implement update dates/times, that would be really helpful for me. And push notifications would be awesome to have sometime in the future. One problem, I notice that when I do a “get_teams” query, it limits the results to 5000. Similarly, “get_matches” and “get_rankings” are limited to 10000. Those two aren’t as much of a problem, as I plan on only updating those per event, separately. However, teams would be updated every day/week or so, all at once. Right now I plan on using a link I found awhile back that directly accesses the Robot Events database.

Just wanted to thank you for putting time into this, I think it will help a lot of people.

~Jordan


#20

So this is one of the many things that I’ve added / worked on over the last little while and I’ll explain below. :stuck_out_tongue:

Things that have been added / changed:

  • Ability to limit teams search by region / city has been added.
  • All requests that have a parameter for season now have an extra valid option called “current” which will limit to the current season. :smiley:
  • The accompanying web page has been converted to use the API for its information (as proof it can be used :wink: ).
  • Graphs have been added to show some information about how the API is being used (the graphs might look a bit weird while the data set is small). :stuck_out_tongue:
  • Ability to get total number of results (without the data) has been added for all types of requests through the “nodata” parameter.
  • Results are now limited in size by default. This had to happen at some point, otherwise the arrays being generated and encoded become very large, contributing to excessive RAM usage (and often then errors). I felt this was the best option as very large data pulls should be relatively rare. The API Docs have been updated to reflect this and have an example about how to do a very large request on them.
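As a rough illustration of combining these parameters, here is a small sketch that builds a request URL. The "season" value "current" and the "nodata" parameter come from the list above, and "get_matches" is one of the request names mentioned earlier in the thread. The base URL is a placeholder, and the URL layout and the "true" value for "nodata" are assumptions; the API docs have the real forms.

```python
from typing import Optional
from urllib.parse import urlencode

# Placeholder address for illustration only; use the one from the API docs.
BASE_URL = "https://api.example.invalid"


def build_url(endpoint: str, **params: Optional[str]) -> str:
    """Build a query URL, leaving out any parameter set to None."""
    query = {key: value for key, value in params.items() if value is not None}
    return f"{BASE_URL}/{endpoint}?{urlencode(query)}"


# Count-only request for the current season (assumes get_matches accepts "season").
print(build_url("get_matches", season="current", nodata="true"))
```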

Basically, if you still want to do a massive grab of information you’ll have to do it bit by bit (amount of iterations will vary based on the number of results):

  1. Do a request with parameter “nodata” set to true to get the total results.
  2. Do a request for the results including data, recording all the results you get.
  3. If the size of your results is less than the total expected, repeat Step 2 with parameter “limit_start” set to the size of your results.
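The three steps above could be sketched roughly like this. The "nodata" and "limit_start" parameters are the ones described in this post, but the response shape (a dict with a "size" count and a "result" list) is an assumption, and `fetch` stands in for whatever function you use to send a request and decode the JSON.

```python
from typing import Any, Callable, Dict, List

# Assumed response shape: {"size": <total>, "result": [...]}. Key names are hypothetical.
Response = Dict[str, Any]
Fetch = Callable[[Dict[str, Any]], Response]


def fetch_all(fetch: Fetch, params: Dict[str, Any]) -> List[Any]:
    """Collect every result for a query by paging with limit_start."""
    # Step 1: ask only for the total number of results.
    total = int(fetch({**params, "nodata": "true"})["size"])
    results: List[Any] = []
    # Steps 2 and 3: keep requesting until we hold the expected total.
    while len(results) < total:
        query = dict(params)
        if results:
            query["limit_start"] = len(results)
        page = fetch(query)["result"]
        if not page:
            # Stop rather than loop forever if the total shrank mid-way.
            break
        results.extend(page)
    return results
```

Passing the request function in as an argument keeps the paging logic separate from the HTTP code, so the loop can be tried against a fake `fetch` before pointing it at the real API.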

Sorry for any inconvenience this causes (even though it is only a few more lines of code) to anyone but it’s the only way I can reliably guarantee that things won’t break as more and more events are known. :frowning: