Full AprilTag pose?

Is there any plan on the development roadmap to extract full pose (i.e., 3DOF position, 3DOF orientation) information from AprilTags recognized by the AI Vision Sensor? At present it only gives the boresight (roll) angle, which is arguably the least useful orientation quantity, and it does not provide range.

Failing that, is there any way to programmatically access the whole image coming out of the AI Vision Sensor, in order to implement a ‘standardized’ AprilTag decoder?

I had left that as an exercise for user code. There’s some thought it may get added to the SDK at some point in the future, but there is no timeline.

Nope. We do use the standard apriltag C code inside the AiVision sensor, but there’s not enough bandwidth to the V5 to send the full-res image. What we display in the AiVision dashboard is sub-sampled and at a very low frame rate.

Thanks for the reply. That said, it’s not really clear how you would reconstruct the rest of the pose. There’s just very little information in the returns from the AVS. You could probably reconstruct depth with some calibration and the size fields, but there’s nothing there you can use for the orientation.
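As a rough illustration of the depth idea above, here is a minimal sketch of the pinhole relation (range ≈ focal length × real size / apparent size). The function name is hypothetical and not part of the VEX SDK, and the focal length in pixels is assumed to come from your own calibration.

```cpp
#include <cassert>

// Hypothetical helper (not a VEX SDK call): pinhole-camera range estimate.
//   focal_px    : focal length in pixels, from your own calibration (assumed known)
//   tag_size_mm : physical edge length of the printed tag
//   width_px    : apparent tag width in pixels, e.g. the object's width field
// Returns the approximate distance along the optical axis in millimetres.
double estimate_range_mm(double focal_px, double tag_size_mm, double width_px) {
    assert(width_px > 0.0);  // a zero-width object carries no range information
    return focal_px * tag_size_mm / width_px;
}
```

This only holds when the tag roughly faces the camera and is not rolled: a tag turned away from the camera appears narrower and reads as farther away, while a rolled tag has a wider bounding box and reads as closer.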

Using the data for the tag corners, it’s possible to reconstruct the homography matrix that the apriltag pose estimation code needs. In addition to this you need the focal length of the lens and the physical size of the tag; then pose can be estimated. There will be some error because the coordinate system is only 320x240 and we have rounded corner coordinates down to integers.
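To make the homography step concrete, here is a minimal sketch that solves for the 3x3 matrix from four corner correspondences, with the bottom-right entry fixed to 1 and plain Gauss-Jordan elimination. It is not the apriltag library's own routine. The names `Quad`, `Mat3` and `homography_from_corners` are made up for illustration, and the order in which the sensor reports the four corners is an assumption you would need to check.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <utility>

using Quad = std::array<std::array<double, 2>, 4>;  // four (x, y) points
using Mat3 = std::array<std::array<double, 3>, 3>;

// Solve for H (with H[2][2] fixed to 1) such that, for each correspondence,
//   u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1)
//   v = (h3*x + h4*y + h5) / (h6*x + h7*y + 1)
// src holds tag-frame corners, dst the matching pixel corners.
// Returns false if the four points are degenerate (e.g. three in a line).
bool homography_from_corners(const Quad &src, const Quad &dst, Mat3 &H) {
    double A[8][9] = {};  // augmented 8x8 linear system
    for (int i = 0; i < 4; ++i) {
        const double x = src[i][0], y = src[i][1];
        const double u = dst[i][0], v = dst[i][1];
        const double r0[9] = {x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y, u};
        const double r1[9] = {0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y, v};
        for (int j = 0; j < 9; ++j) {
            A[2 * i][j] = r0[j];
            A[2 * i + 1][j] = r1[j];
        }
    }
    // Gauss-Jordan elimination with partial pivoting.
    for (int col = 0; col < 8; ++col) {
        int piv = col;
        for (int r = col + 1; r < 8; ++r)
            if (std::fabs(A[r][col]) > std::fabs(A[piv][col])) piv = r;
        if (std::fabs(A[piv][col]) < 1e-12) return false;
        for (int j = 0; j < 9; ++j) std::swap(A[col][j], A[piv][j]);
        for (int r = 0; r < 8; ++r) {
            if (r == col) continue;
            const double f = A[r][col] / A[col][col];
            for (int j = col; j < 9; ++j) A[r][j] -= f * A[col][j];
        }
    }
    for (int k = 0; k < 8; ++k) H[k / 3][k % 3] = A[k][8] / A[k][k];
    H[2][2] = 1.0;
    return true;
}
```

A typical call would pass the ideal tag corners (-1,-1), (1,-1), (1,1), (-1,1) as `src` and the four reported corner coordinates as `dst`. Because the reported corners are integers on a 320x240 grid, the result will be noisy for small or distant tags.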

What’s your application? Competition is not using AprilTags.

Unless there are some undocumented properties, I don’t think the corners are available. It looks like the object reports Origin (x,y), Height, Width, Center (x,y), ID, and angle. If we had all the corners, you could reconstruct the homography (as you say), but there’s no skew information in the returns as far as I can see. If I’m misunderstanding the properties of the object returns, please let me know.

I teach an undergrad class in robotics (in the coming semester). We’ve adopted Vex EXP this year after a long time using LEGO Mindstorms. We’ll be doing some mobile robot navigation and wanted to understand capabilities and limitations of the AVS.

The education team may not have documented this for api.vex.com, but the tag corners are available: there is a “tag” property.
For C++:

          /**
           * @brief The raw coordinates of an apriltag.
           */
          const tagcoords &tag;

where

          class tagcoords {
            public:
              int16_t   x[4];
              int16_t   y[4];
          };

Something similar is available if you are using Python.

It might be used in this way:

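// Sketch: copy the AI Vision tag data into an apriltag detection struct.
void
calc_pose( aivision::object &obj ) {
    apriltag_detection_t  mydet;

    // tag center, in sensor pixel coordinates
    mydet.c[0]    = obj.centerX;
    mydet.c[1]    = obj.centerY;

    // the four tag corners: p[i][0] is x, p[i][1] is y
    mydet.p[0][0] = obj.tag.x[0];
    mydet.p[1][0] = obj.tag.x[1];
    mydet.p[2][0] = obj.tag.x[2];
    mydet.p[3][0] = obj.tag.x[3];

    mydet.p[0][1] = obj.tag.y[0];
    mydet.p[1][1] = obj.tag.y[1];
    mydet.p[2][1] = obj.tag.y[2];
    mydet.p[3][1] = obj.tag.y[3];

    // 3x3 homography, filled in by the helper below
    mydet.H = matd_create(3, 3);

    // calc_homography_matric() is a helper not shown here
    if( ! calc_homography_matric(&mydet) ) {
        matd_destroy(mydet.H);
        return;
    }

    // more code ...

    // release H once the pose has been computed
    matd_destroy(mydet.H);
}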

I had some proof of concept code for EXP, can share at some point if needed. We have still not completely decided the final approach for EXP, whether to add to the aivision sensor firmware or add to the EXP SDK, each has pros and cons. VEX’s primary interest in this area at the moment is possible addition for the CTE workcell system using aivision sensor in conjunction with the 6-axis arm.

Thanks for the details. I will investigate.

So picking up on this thread…

I’ve verified that the tag property exists (and is not documented in the API). Thanks for identifying that information.

Part of my (self-inflicted) pain is that I’m trying to get a solution that will work in Python. This was judged to be most accessible for the students in the course. The extant UMich AprilTag pose code is implemented in C and relies on an SVD solver (from matd.c). If I were writing a C project, I think I could just compile everything together. But I don’t think the MicroPython implementation on the brain supports making calls to C libraries.

The Python AprilTag code that I’ve found tends to just provide wrappers for the underlying C functions, not a direct re-implementation of the code itself.

Perhaps I’m overlooking something.

Yeah, that’s a problem. Even if the C code could be ported to native Python, I’m not sure the EXP would have enough resources to run it. It would also be slow.

If we implement for EXP, there would be a simple get_pose() call or something similar, but there is no plan to release anything in the near future.

Hi, I am working on pose estimation with AprilTags fused with odometry, and I’m looking for a way to either measure the latency of a detection or timestamp it. What would you suggest?

That’s an interesting question. I’m not sure that there is anything in the API that would let you do that directly. I don’t see any functions to register callbacks with the AIVS, and I don’t see anything in the class itself that would give you timestamps.

You could poll the device in a separate thread running on its own timer. That would at least let you bound the latency, but frankly I don’t even know the frame rate of the camera. Maybe @jpearman can offer some insight.
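One way to sketch the polling idea: stamp each poll with the brain's own clock and note when the reported data changes, so a new result is known to have arrived between the previous poll and the current one. Everything here is hypothetical (the `Sample` fields and the `ChangeStamper` class are stand-ins, not SDK types), and it only bounds when data reached the brain, not when the image was exposed.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for whichever fields you read from a detection (hypothetical).
struct Sample { int id; int cx; int cy; };

// Call update() once per poll from a timer-driven loop or thread.
class ChangeStamper {
public:
    // now_ms is the local clock at poll time. Returns true when the data
    // differs from the previous poll, meaning a new result arrived at some
    // point in the window [earliest_ms(), latest_ms()].
    bool update(uint32_t now_ms, const Sample &s) {
        assert(!has_last_ || now_ms >= last_poll_ms_);  // clock must not run backwards
        const bool changed = !has_last_ || s.id != last_.id ||
                             s.cx != last_.cx || s.cy != last_.cy;
        if (changed) {
            earliest_ms_ = has_last_ ? last_poll_ms_ : now_ms;
            latest_ms_ = now_ms;
            last_ = s;
            has_last_ = true;
        }
        last_poll_ms_ = now_ms;
        return changed;
    }
    uint32_t earliest_ms() const { return earliest_ms_; }
    uint32_t latest_ms() const { return latest_ms_; }

private:
    Sample last_{};
    bool has_last_ = false;
    uint32_t last_poll_ms_ = 0;
    uint32_t earliest_ms_ = 0;
    uint32_t latest_ms_ = 0;
};
```

The width of the window equals the poll period, so polling faster tightens the bound. A stationary scene produces identical data on every frame and will not register as a change, so this works best while the robot is moving.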

FWIW, I’ve had success stubbing out part of the apriltag library to get the pose calculation functionality without the tag detection logic. If you’re interested, I can probably point you in the right direction. We calibrate the camera intrinsics external to the VEX program and then automatically generate a C++ declaration that we can add to the build.

I’ve also already done camera calibration. I am currently planning on just forwarding the data through serial to my laptop and implementing something in OpenCV first.

If I am reading this right, I am doing a similar thing. I am just going to have a dictionary with the key being the AprilTag ID and the value being an x and a z for positional data. Unfortunately, I am just going to have to hard-code the x and z when I make the dictionary, but that will be as simple as identifying where the middles of the goals are and going from there.
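For reference, the same lookup table written in C++ (the language of the other snippets in this thread) could look like the sketch below; a Python dict would have the same shape. The tag IDs and coordinates are placeholders, not real field positions.

```cpp
#include <cassert>
#include <unordered_map>

// Known position of a tag on the field (units are your choice; mm here).
struct FieldPoint { double x_mm; double z_mm; };

// Hard-coded table keyed by AprilTag ID. IDs and values are placeholders.
const std::unordered_map<int, FieldPoint> &tag_map() {
    static const std::unordered_map<int, FieldPoint> m = {
        {0, {0.0, 1800.0}},
        {1, {1200.0, 0.0}},
    };
    return m;
}

// Returns the stored position for a tag ID, or nullptr if the ID is unknown.
const FieldPoint *find_tag(int id) {
    const auto it = tag_map().find(id);
    return it == tag_map().end() ? nullptr : &it->second;
}
```

Returning a null pointer for unknown IDs makes it easy to ignore stray detections of tags that are not in the table.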

There will be updates to V5 and EXP vexos and the VEXcode SDKs (both C++ and Python) this summer that will include the ability to calculate pose based on an apriltag object and known tag size. There will also be an update to the aivision sensor that somewhat improves apriltag detection frame rate.

Any plan to add timestamps to frames?

There’s no easy way to add anything like that with the current way the aivision sensor and V5 communicate. What you really want to know is the absolute time of the image that the apriltag objects were calculated from (all the object data is created on the aivision sensor and sent over to the V5). The V5 and aivision sensor do not have any coordinated clocks, and the packet structure for sending the calculated apriltag object data has no room for any additional information, so it’s not really possible without a significant amount of redesign.

My students will be very excited to hear this!

They’ve been working on understanding the math behind this process (“Perspective-n-point” has been said in our room more times this week than I can count) and have been running into both the challenge of yaw recovery and the way that frame rate limits some of their desired uses.
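For anyone following the same math, here is a simplified sketch of recovering pose and yaw from a tag homography and pinhole intrinsics. It is not the forthcoming SDK implementation and not the apriltag library's routine, which also orthogonalizes the rotation. The names, the axis conventions, and the assumption that the homography maps tag corners at ±1 to pixels are all illustrative.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

struct TagPose {
    Mat3 R;                   // tag-to-camera rotation, columns r1, r2, r3
    std::array<double, 3> t;  // tag center in the camera frame, units of tag_size
    double yaw;               // rotation about the camera's vertical axis, radians
};

// H maps tag-frame points (corners at +/-1) to pixels, up to scale.
// fx, fy, cx, cy are pinhole intrinsics from your own calibration.
TagPose pose_from_homography(const Mat3 &H, double fx, double fy,
                             double cx, double cy, double tag_size) {
    // M = K^-1 * H, whose columns are [r1 r2 t] up to a common scale.
    Mat3 M{};
    for (int j = 0; j < 3; ++j) {
        M[0][j] = (H[0][j] - cx * H[2][j]) / fx;
        M[1][j] = (H[1][j] - cy * H[2][j]) / fy;
        M[2][j] = H[2][j];
    }
    const double n1 = std::sqrt(M[0][0] * M[0][0] + M[1][0] * M[1][0] + M[2][0] * M[2][0]);
    const double n2 = std::sqrt(M[0][1] * M[0][1] + M[1][1] * M[1][1] + M[2][1] * M[2][1]);
    assert(n1 > 0.0 && n2 > 0.0);
    // Rotation columns have unit length; use the geometric mean of both norms.
    double lambda = 1.0 / std::sqrt(n1 * n2);
    if (M[2][2] < 0.0) lambda = -lambda;  // keep the tag in front of the camera

    TagPose p{};
    for (int i = 0; i < 3; ++i) {
        p.R[i][0] = lambda * M[i][0];
        p.R[i][1] = lambda * M[i][1];
        p.t[i] = lambda * M[i][2] * (tag_size / 2.0);  // corners at +/-1 span tag_size
    }
    // Third column is the cross product r1 x r2.
    p.R[0][2] = p.R[1][0] * p.R[2][1] - p.R[2][0] * p.R[1][1];
    p.R[1][2] = p.R[2][0] * p.R[0][1] - p.R[0][0] * p.R[2][1];
    p.R[2][2] = p.R[0][0] * p.R[1][1] - p.R[1][0] * p.R[0][1];
    // Yaw: heading of the tag's z axis projected onto the camera x-z plane.
    p.yaw = std::atan2(p.R[0][2], p.R[2][2]);
    return p;
}
```

With noisy integer corners the two rotation columns will not come out exactly orthogonal, which is one reason yaw is hard to recover reliably when the tag is nearly face-on to the camera.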

Overall, though, this season has started with much more excitement than other recent years because of the heavy coding interests of our students, so thanks for all you’re doing to push this through.

Great to hear about the incoming updates. I have a few questions:

  1. What is the current frame rate from the AIVS?
  2. Is the plan to put the pose estimation on the AIVS? Or build that into the Brain?
  3. The API call to the AIVS is takeSnapshot(). Does that actually trigger an exposure, or does that just return the most recent result from the data streaming from the AIVS?
  4. If timestamps are unlikely, is it possible to trigger callbacks from incoming AIVS data?

I am also curious: What is the calibration for the AI Vision Sensor camera? I’m not an expert, but I read that the camera matters for this kind of calculation (focal width/length, lens warp)?

The AIVS will appear as a general webcam to Windows when you plug it in. So you can probably use any convenient data acquisition and camera calibration workflow. We use MATLAB, but I’m sure there are many free tools out there as well. Put together some calibration targets (typically checkerboards or grids of AprilTags), gather a bunch of images from different poses, and then calculate the intrinsics. We’ve had reasonable results using a relatively simple model (principal point, x/y focal distance, r^2 radial distortion). You need to do a little scaling because the AIVS reads 640x480 but, when connected to the brain, effectively uses 320x240 when reporting the detected tag coordinates.
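The scaling step and the simple distortion model described above might be sketched as follows. The struct and function names are hypothetical, and the sketch assumes the 320x240 coordinates are a plain factor-of-two reduction of the 640x480 image, ignoring any half-pixel offset.

```cpp
#include <cassert>
#include <cmath>

// Simple camera model: focal lengths and principal point in pixels, plus a
// single r^2 radial distortion coefficient acting on normalized coordinates.
struct Intrinsics { double fx, fy, cx, cy, k1; };

// Rescale intrinsics calibrated at one resolution to another, e.g. s = 0.5
// for 640x480 -> 320x240. k1 is unchanged because it applies to normalized
// (resolution-independent) coordinates.
Intrinsics scale_intrinsics(const Intrinsics &in, double s) {
    assert(s > 0.0);
    return {in.fx * s, in.fy * s, in.cx * s, in.cy * s, in.k1};
}

// Convert a distorted pixel (u, v) to undistorted normalized coordinates by
// fixed-point iteration on x_d = x_u * (1 + k1 * r_u^2).
void undistort_pixel(const Intrinsics &K, double u, double v,
                     double &xn, double &yn) {
    const double xd = (u - K.cx) / K.fx;
    const double yd = (v - K.cy) / K.fy;
    xn = xd;
    yn = yd;
    for (int i = 0; i < 10; ++i) {
        const double r2 = xn * xn + yn * yn;
        const double f = 1.0 + K.k1 * r2;
        xn = xd / f;
        yn = yd / f;
    }
}
```

In use, you would calibrate at 640x480, call `scale_intrinsics(full, 0.5)` once, and then undistort each reported corner before computing the homography. The ten-iteration loop is a convenience that converges quickly for mild distortion; strong distortion may need a different solver.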