Platinum Document | Knowing Where The Robot and Other Objects Are

I am currently developing what I call the “Platinum Document.” It will be posted here later, once it is polished enough.

In this screenshot, the camera knows where it is in 3D space, and also where the Turning Point game elements are in 3D space, by integrating machine learning with matrix mathematics. Because I am lazy, the machine learning side currently uses a pre-trained YOLOv5 model for the sake of testing. The blue text in the image shows the X, Y, and Z of each tracked ball in world space. That is, wherever the camera is in 3D space, as long as it can see the balls, their X, Y, and Z remain static unless the balls themselves move.

As for how this is accomplished, I may release snippets of code; however, I am not willing to release the GitHub repository itself (explained later). It is highly likely that this kind of system will become a staple of VEXU position tracking because it offers permanence and relies on the environment rather than the GPS field strips (we do not wish to flip a coin and hope that the field was set up correctly for the GPS system). We therefore plan to provide steps that guide teams through how we approached building a similar system.

The use case is to mount the camera on the V5 robot so that the robot knows where it is, and where the objects are, without any additional idler odometry wheels.

When complete, the document will include:

  1. Pose estimation with permanence (X, Y, Z, pitch, yaw, roll in space), at a rate of ~30-60 Hz depending on the computational device.
  2. X, Y, Z object tracking relative to the camera, in meters (see the sketch after this list)
  3. Object detection
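
As an illustration of items 2 and 3 above, here is a minimal sketch under assumptions, not the document's code: with an Intel RealSense D435i, a YOLOv5 detection in the color image can be combined with the aligned depth frame and the camera intrinsics to get the object's X, Y, Z relative to the camera in meters. The stream resolution and model choice below are placeholders.

```python
import numpy as np
import pyrealsense2 as rs
import torch

# Pre-trained YOLOv5n via torch.hub, as mentioned in the post.
model = torch.hub.load("ultralytics/yolov5", "yolov5n")

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)
align = rs.align(rs.stream.color)  # align depth pixels to the color image

try:
    frames = align.process(pipeline.wait_for_frames())
    depth_frame = frames.get_depth_frame()
    color = np.asanyarray(frames.get_color_frame().get_data())
    intrin = depth_frame.profile.as_video_stream_profile().get_intrinsics()

    # Run detection, then deproject each box center into 3D camera coordinates.
    rgb = np.ascontiguousarray(color[:, :, ::-1])  # BGR -> RGB for YOLOv5
    for *box, conf, cls in model(rgb).xyxy[0].tolist():
        u = int((box[0] + box[2]) / 2)
        v = int((box[1] + box[3]) / 2)
        depth_m = depth_frame.get_distance(u, v)  # depth at that pixel, meters
        if depth_m > 0:
            x, y, z = rs.rs2_deproject_pixel_to_point(intrin, [u, v], depth_m)
            print(model.names[int(cls)], round(x, 3), round(y, 3), round(z, 3))
finally:
    pipeline.stop()
```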

When complete, the document will not include:

  1. The X, Y, Z of tracked objects in world space (as shown in the picture). This is done in my own project with a large amount of matrix math, so it is not something I can simply add, and I am not comfortable sharing the raw code over GitHub and flat-out handing code to individuals who do not know how it works. (A rough sketch of the general idea follows below.)
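
For readers curious about the general idea, the following is only a minimal sketch, not the author's code: once a visual-odometry system reports the camera's pose in the world frame, a detection expressed in the camera frame can be moved into the world frame with a single rigid transform. The pose values and the detected point below are made-up placeholders.

```python
import numpy as np

def camera_to_world(point_cam, R_wc, t_wc):
    """Transform a 3D point from the camera frame into the world frame.

    point_cam : (3,) point in the camera frame, meters
    R_wc      : (3, 3) camera orientation expressed in the world frame
    t_wc      : (3,) camera position in the world frame, meters
    """
    return R_wc @ np.asarray(point_cam) + np.asarray(t_wc)

# Placeholder pose: camera at (1.2, 0.8, 0.5) m, rotated 90 degrees about the vertical axis.
yaw = np.pi / 2
R_wc = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
                 [np.sin(yaw),  np.cos(yaw), 0.0],
                 [0.0,          0.0,         1.0]])
t_wc = np.array([1.2, 0.8, 0.5])

# A ball detected at (0, 0, 2) m in the camera frame (axis conventions vary by library).
print(camera_to_world([0.0, 0.0, 2.0], R_wc, t_wc))
```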

In development:

  1. Object permanence. This is the capability of identifying an object and giving it a unique identifier based on where it is in 3D space. If the camera turns away and no longer sees the object, it still keeps note that the object exists in 3D space despite not actually seeing it. This allows the robot to drive around, keep track of objects across the field, and drive to them without needing to see them all the time. If the camera looks back and the object has disappeared, it removes the object from its index. (A minimal sketch of this bookkeeping follows after this list.)
  2. An adequate communication library. I have already established a line of communication with the V5 Brain; however, I need to make that communication modular so that additional lines can be established over a single port. (A hypothetical framing scheme is also sketched after this list.)
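
To make the object-permanence idea concrete, here is a minimal sketch (not the author's implementation): keep an index of world-space positions keyed by ID, match new detections to existing entries by distance, and only delete an entry when the camera is actually looking at that position and no detection shows up there. The matching radius and the visibility test are made-up placeholders.

```python
import numpy as np

class ObjectIndex:
    """Toy index of tracked objects, keyed by ID, storing world-space positions."""

    def __init__(self, match_radius=0.15):
        self.match_radius = match_radius   # meters; placeholder value
        self.objects = {}                  # id -> np.array([x, y, z]) in world space
        self._next_id = 0

    def update(self, detections_world, visible_fn):
        """detections_world: world-frame points detected this frame.
        visible_fn(p): True if world point p is inside the camera's current view."""
        unmatched = [np.asarray(d) for d in detections_world]
        for obj_id, pos in list(self.objects.items()):
            # Match this remembered object to the nearest new detection, if close enough.
            dists = [np.linalg.norm(pos - d) for d in unmatched]
            if dists and min(dists) < self.match_radius:
                self.objects[obj_id] = unmatched.pop(int(np.argmin(dists)))
            elif visible_fn(pos):
                # The camera is looking right at the remembered spot and sees nothing,
                # so the object is gone: drop it from the index.
                del self.objects[obj_id]
            # Otherwise the object is simply out of view; keep remembering it.
        for d in unmatched:
            # Brand-new object: give it a unique identifier.
            self.objects[self._next_id] = d
            self._next_id += 1
```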
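
For the communication side, the sketch below shows one way (purely hypothetical, not the author's protocol) to multiplex several logical channels over one serial port by prefixing each payload with a channel ID and a length, using pyserial on the coprocessor side. The port name, baud rate, and channel numbering are placeholders.

```python
import struct
import serial  # pyserial

class ChannelLink:
    """Send tagged messages for multiple logical channels over one serial port."""

    def __init__(self, port="/dev/ttyACM0", baud=115200):  # placeholder port and baud
        self.ser = serial.Serial(port, baud, timeout=0.1)

    def send(self, channel_id, payload: bytes):
        # Frame: 1-byte channel ID, 2-byte payload length, then the payload itself.
        header = struct.pack("<BH", channel_id, len(payload))
        self.ser.write(header + payload)

    def send_pose(self, x, y, z, yaw):
        # Example channel 1: robot pose as four little-endian floats.
        self.send(1, struct.pack("<4f", x, y, z, yaw))

# link = ChannelLink()
# link.send_pose(1.2, 0.8, 0.0, 1.57)
```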

The document will be released as a reply below once it is polished enough. It ultimately adds many new features and also removes software dependencies (such as Docker) that are no longer required.


This seems very interesting and I just wanted to clarify a couple of the technical aspects of how this all works. It seems from your post that you already have the technical aspects working and that you just need to finish the documentation, but please correct me if I’m wrong.

First of all, how does your camera know where it is in 3D space? Like, how does it determine where on a VEX field it is? Does it just track the X, Y, and Z coordinates of objects on the field and assume they are not moving and that the camera is moving relative to those objects? Or is it some other method of determining where the camera is?

What device is running the AI? Is it a laptop/desktop or is it an edge device like a Jetson Nano or Raspberry Pi? I did a research project last semester on making models smaller and deploying them onto smaller devices like Jetsons, and it was harder than I expected the first time.

What camera are you using, and what is its frame rate and image size? I know YOLO works well for live feeds, so how often are you sampling in your current model?


Correct.
Currently the system is tested on a laptop running Ubuntu 20.04 (the same version I plan to install on the Jetson Nano, which officially supports 18.04 but has community support for 20.04); however, I still need to verify that it works on the Jetson Nano and add installation instructions for the 20.04 version.

It does vector-based visual odometry. A good example of vector-based visual odometry is ORB_SLAM3.

I plan to allow you to set the X, Y, Z, and yaw of the robot, and it is up to the user to line the robot up to match the field, similar to wheel odometry. However, it would be a good idea to use the D435i to identify the locations of the game elements at the start of the match and determine the starting coordinates from them.
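
As a sketch of what setting the starting pose could look like (assumed, not the actual interface): compose a user-supplied field-frame starting pose (x, y, z, yaw) with the pose reported by the visual odometry, so that everything downstream is expressed in field coordinates. The numeric values below are placeholders.

```python
import numpy as np

def pose_to_matrix(x, y, z, yaw):
    """4x4 homogeneous transform from x, y, z (meters) and yaw (radians)."""
    T = np.eye(4)
    T[:3, :3] = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
                          [np.sin(yaw),  np.cos(yaw), 0.0],
                          [0.0,          0.0,         1.0]])
    T[:3, 3] = [x, y, z]
    return T

# User-supplied starting pose of the robot on the field (placeholder values).
T_field_start = pose_to_matrix(0.3, 0.3, 0.0, np.pi / 2)

# Pose reported by the visual odometry, relative to where it started.
T_start_robot = pose_to_matrix(1.0, 0.2, 0.0, 0.1)

# Robot pose in field coordinates.
T_field_robot = T_field_start @ T_start_robot
print(T_field_robot[:3, 3])
```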

Tested and works properly on a laptop with a GPU; I am currently testing on the Jetson Nano.

Honestly, with my laptop the frame rate is bottlenecked by the D435i’s 30 frames/second cap; it would probably be higher if the D435i allowed it. YOLOv5n is the most lightweight version of YOLOv5 available, and it runs really quickly.


Additional details will be answered in the document.