Random Joystick Detection in PROS

Hey all,

I’ve been playing around with PROS for the past few weeks, and it seems that the Cortex has been randomly detecting phantom button presses, with all hands off the joystick and LCD screen. The following code is used for the LCD buttons. Which button triggers, the variable names, and the result change from button to button, but this is what I’ve written for the LCD:

if (lcdReadButtons(uart1) == LCD_BTN_RIGHT && !lcdRightPressed)
{
	output = 127;
	lcdRightPressed = true;
}
else if (lcdReadButtons(uart1) != LCD_BTN_RIGHT) lcdRightPressed = false;

The following code is for the joystick. Again, the variable names, the buttons to be detected, and the effect on output change from button to button, but otherwise the code is the same:

if (joystickGetDigital(1, 8, JOY_LEFT) && !joy8LPressed)
{
	output = -127;
	joy8LPressed = true;
}
else if (!joystickGetDigital(1, 8, JOY_LEFT)) joy8LPressed = false;
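For anyone reading along, the latch pattern in both snippets can be factored into one small helper. This is only a sketch: `risingEdge` is a hypothetical name of mine, not part of the PROS API, and it assumes the button is sampled once per loop iteration instead of being read twice as in the code above.

```c
#include <stdbool.h>

/* Hypothetical helper (not part of PROS): returns true only on the loop
 * iteration where the button goes from released to pressed. */
static bool risingEdge(bool pressedNow, bool *wasPressed)
{
    bool fired = pressedNow && !*wasPressed;
    *wasPressed = pressedNow;   /* the latch simply follows the sampled state */
    return fired;
}
```

In the loop this would be called as `if (risingEdge(joystickGetDigital(1, 8, JOY_LEFT), &joy8LPressed)) output = -127;`. Note that a latch like this fires on any single-sample glitch, which is why this style of code makes a phantom press visible.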

So the issue is that the code seems to be randomly detecting button presses. I ran this code in ROBOTC many times last year and never ran into any issues.

Any suggestions would be greatly appreciated!

*As I know somebody’s going to ask for the full code, here are links to the init.c and opcontrol.c (no other files have been changed)

They (PROS) are pretty much saturating the comms link to the LCD display, not something I would have done (well, I didn’t). Perhaps the poor little LCD is sending back corrupted data. I will run your code later and see if I can duplicate.

My best suggestion would be “use ConVEX” :slight_smile:

Haha, I just might have to check it out now. You can’t imagine my dilemma… Support Purdue (being from Indiana) or support jpearman (who, just by being jpearman, deserves a worthy consideration)

Thanks in advance for testing it

Even if it were sending corrupted data, doesn’t it have that verification code or something? I remember reading that somewhere.

Also, I love the shameless advertising for your own product :slight_smile:

Do you think it’s randomly detecting joystick or LCD button pushes? I thought you meant LCD in my first reply, but now I see you are using the joystick as well. How often are you seeing this?

Edit:

Well, so far I see no issues. I left it on “Test pneumatics”, waiting for a digital output to change and trigger the scope; so far nothing.

What’s your setup? Are you tethered or using VEXnet?

Also, which version of PROS? I’m still using 2b06; did you upgrade?

Edit2:

So I ran with Wifi for a while, still cannot find anything wrong. Guess I won’t have a new customer after all :frowning:

I suggest you try the same code with ROBOTC and exactly the same hardware so as to eliminate the hardware from being the problem, or if it does the same then maybe the joystick (or LCD) is broken (or intermittent).

Edit3:

Ok, so it just detected something under the “motor” control mode, either right LCD or 8R joystick, don’t know which. It took a long time.

We have never had this problem, after nearly 12 hours of practice. The buttons don’t trigger without us pushing them.

We keep the LCD screen running (returning IME values), and used one joystick, if that means anything for your debugging. That’s across three robots, three different Cortexes and four or five controllers. I don’t think it’s a PROS problem.

EDIT: Do you mean the LCD buttons? I haven’t had a problem with those either, but we don’t really use them outside of Initialize.

I’ve now seen a second error, same button, joystick 8 Right (I changed the code so as to detect which case was causing the error). It took at least 30 minutes for this to happen, my “guess” at this time is a bad (corrupted) SPI message as I have seen that before in ConVEX. I’m going to port the code to ConVEX and see if I can reproduce there.

Ok, last post for tonight. I ported the code to ConVEX, very easy, almost line for line identical except for the difference in API. I ran this code for 60 minutes with no errors (not surprised but somewhat relieved about this). Setup was identical, joystick tethered to cortex and powered through the USB cable, cortex powered from my bench supply, the same setup I use for 95% of my testing.

I then went back to the PROS version, two more errors within 15 minutes of running, always joystick button 8 right.
(which now I think about it is not unexpected, you test for joystick right after everything else and that will always take precedence. What probably happens is that all bits are set to ‘1’ or something like that).
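To illustrate why a corrupted sample with every bit set would always show up as button 8 right, here is a minimal sketch of a chain of independent tests. The bit positions and the value assigned for 8 down are made up for illustration; the real packet layout and the rest of the original code are not shown in this thread.

```c
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
enum { BTN_8L = 1u << 0, BTN_8D = 1u << 1, BTN_8R = 1u << 2 };

/* Mirrors a chain of independent `if` tests: every match overwrites
 * output, so the last test that matches decides the result. */
static int resolveOutput(uint8_t buttons, int current)
{
    int output = current;
    if (buttons & BTN_8L) output = -127;
    if (buttons & BTN_8D) output = 0;      /* assumed value for 8 down */
    if (buttons & BTN_8R) output = 127;    /* tested last, so it wins */
    return output;
}
```

With `buttons` equal to `0xFF` all three tests match and the function returns 127, the same result as a genuine press of 8 right.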

So my conclusion is that using this example code PROS has a bug when reading the joystick.

Other software may not experience this bug. In this code the buttons are latched; most student code does not do this, and a brief 20 ms blip in a joystick button may go unnoticed. The exact timing and delays in this code may be the cause. Only Purdue can debug this further; it’s not part of my job description.

Your code is different: you may not use button 8R, or you may check it more or less often. Bugs like this are very hard to find. It may be the 20 ms delay used, or it may be the order in which the joystick is read (left, down, right). I have some theories, but PROS is a black-box library that would take me too long to debug.

FYI, comparison of the PROS and ConVEX code (just a small part, with some extra stuff I added for debug)

PROS


<< snip >>

        switch (digitalRead(12))
        {
            //running motors mode
        case true:
            if (lcdReadButtons(uart1) == LCD_BTN_LEFT && !lcdLeftPressed)
            {
                output = -127;
                lcdLeftPressed = true;
                lcdSetText(uart1, 1, "LL");
            }
            else if (lcdReadButtons(uart1) != LCD_BTN_LEFT) lcdLeftPressed = false;

            if (lcdReadButtons(uart1) == LCD_BTN_CENTER && !lcdCenterPressed)
            {
                output = 0;
                lcdCenterPressed = true;
                lcdSetText(uart1, 1, "LC");
            }
            else if (lcdReadButtons(uart1) != LCD_BTN_CENTER) lcdCenterPressed = false;

            if (lcdReadButtons(uart1) == LCD_BTN_RIGHT && !lcdRightPressed)
            {
                output = 127;
                lcdRightPressed = true;
                lcdSetText(uart1, 1, "LR");
            }
            else if (lcdReadButtons(uart1) != LCD_BTN_RIGHT) lcdRightPressed = false;



            if (joystickGetDigital(1, 8, JOY_LEFT) && !joy8LPressed)
            {
                output = -127;
                joy8LPressed = true;
                lcdSetText(uart1, 1, "JL");
            }
            else if (!joystickGetDigital(1, 8, JOY_LEFT)) joy8LPressed = false;

<< snip >>

ConVEX


<< snip >>

        switch (vexDigitalPinGet(kVexDigital_12))
        {
            //running motors mode
        case true:
            if (vexLcdButtonGet(VEX_LCD_DISPLAY_1) == kLcdButtonLeft && !lcdLeftPressed)
            {
                output = -127;
                lcdLeftPressed = true;
                vexLcdSet( VEX_LCD_DISPLAY_1, VEX_LCD_LINE_1, "LL");
            }
            else if (vexLcdButtonGet(VEX_LCD_DISPLAY_1) != kLcdButtonLeft) lcdLeftPressed = false;

            if (vexLcdButtonGet(VEX_LCD_DISPLAY_1) == kLcdButtonCenter && !lcdCenterPressed)
            {
                output = 0;
                lcdCenterPressed = true;
                vexLcdSet( VEX_LCD_DISPLAY_1, VEX_LCD_LINE_1, "LC");
            }
            else if (vexLcdButtonGet(VEX_LCD_DISPLAY_1) != kLcdButtonCenter) lcdCenterPressed = false;

            if (vexLcdButtonGet(VEX_LCD_DISPLAY_1) == kLcdButtonRight && !lcdRightPressed)
            {
                output = 127;
                lcdRightPressed = true;
                vexLcdSet( VEX_LCD_DISPLAY_1, VEX_LCD_LINE_1, "LR");
            }
            else if (vexLcdButtonGet(VEX_LCD_DISPLAY_1) != kLcdButtonRight) lcdRightPressed = false;



            if (vexControllerGet( Btn8L ) && !joy8LPressed)
            {
                output = -127;
                joy8LPressed = true;
                vexLcdSet( VEX_LCD_DISPLAY_1, VEX_LCD_LINE_1, "JL");
            }
            else if (!vexControllerGet( Btn8L )) joy8LPressed = false;

<< snip >>

I was somewhat confident that it was the Joystick, as I don’t remember seeing the issue when the Cortex was tethered to the computer.

I have tested the code on the same hardware in ROBOTC, and on a different Cortex and joysticks in PROS. I’ll double check to make sure that I’m updated to 2b07.

As far as frequency, I’ve been seeing this every few minutes. I talked to my teammate, and she says that it’s more likely to happen when we move the Cortex and joystick around… perhaps it’s an issue with our USB connections. Coming from our club, that wouldn’t surprise me at all. It was happening with all the hardware we tested, so I was a little skeptical that all of those joysticks and Cortexes could be broken.

I’ll try some extra testing tonight, reviewing everything you’ve said so far.

Thanks!

We have noticed a similar issue with incorrect joystick information at random when we were testing PROS, caused by single bit or byte errors in communication. However, it has proved difficult to impossible to debug due to its rare occurrence, about once every hour on average. On some code structures, it happens more often than others, and it also occurs more often on some VEX Cortex units than others. Adding almost any kind of debugging code or using external debugging tools changes or masks the issue entirely, pointing to a timing problem. As some of the critical communication timing is undocumented, we are completely unsure if we are handling it differently than other environments.

We have been investigating other, somewhat related issues over the last week for another reason and may already have developed a solution for the next release of PROS.

So you see errors on the master/user SPI communications?

Compared to some problems, once an hour is luxury. I saw four occurrences in about an hour of running, that’s quite often.

That sounds more like a firmware issue, then. If it’s a hardware timing problem, the code shouldn’t affect it.

but that suggests hardware timing.

So putting a scope on the SPI data and clock causes the issue to go away? That’s what I would do, then capture and trigger on the bad packets to try and isolate why. Many current scopes can analyze SPI data on the fly; they produce output a bit like this. (This is the IFI default code SPI startup sequence.)

[Attached image: spi_packets.jpg]

What about JTAG debugging, do you use that?

Sure, this is an issue, but I assume you did what I did and analyzed what everyone else was doing before deciding on the best approach for your own firmware.

Perhaps you could share that with the community so we know what’s going on.

There are obviously workarounds for this type of problem, if you implement something traditional please let us know, it would be better to figure out the root cause (and hopefully share that information with everyone).

Edit:

I don’t have a cortex here at work but I did take a quick look at data I collected a few weeks ago.

As you know, the SPI communication consists of four groups of four words with a small delay between each group. It looks like you may have 40 µs to 50 µs less delay than everyone else (meaning ConVEX, ROBOTC, EasyC and the IFI default code). I do need to verify this on the scope, as I don’t have all the information here. How are you doing the delays between words and between groups of four words?

Are you saying that adding a line of debug code, like


 printf("I'm debugging ");

will fix the issue?

Besides what jpearman said, it would be great to have a microblog (read: Twitter) with any sort of update news about PROS. Jpearman, you should do the same with ConVEX.

That’s sort of what he means.

Basically, with problems like these the issue is caused by timing being slightly off and things aligning perfectly to make the problem happen. This makes it extremely difficult to debug because you are often unsure if a “fix” worked because it’s difficult to reproduce the error consistently in the first place.

What he means is that adding a line of debug code typically changes the timing slightly (because printing something takes some time), so an error which was present before adding the line might not be experienced after adding the line. This makes it even more difficult to debug because you can’t observe the problem without affecting it (sort of like the observer effect in physics, which is often confused with the uncertainty principle).

So, adding a debug statement doesn’t “fix” the issue, but it can make it so the issue is experienced less often or not at all. “Fixing” a problem like this by adding debug code is a kludge.

I doubt adding a line of code at the user level will change much, it may, but if the firmware is that sensitive to a change by the user then it’s a serious issue.

He probably means that adding debug code to their low level communication function changes the timing. I don’t know how PROS handles the various delays needed by the SPI code, EasyC (and I think the default IFI firmware) use delays created by the C code, just about the simplest (and worst but that’s debatable) way to do it. I use a hardware timer that wakes the SPI task (which needs to have high priority). It looks like perhaps PROS does more in the SPI interrupt code than perhaps they should but my guess at this point is that they have attempted to reduce the communications time a little too far to try and improve performance.

Normally you would use external equipment (i.e. an oscilloscope or SPI bus monitor) to help debug these sorts of problems.

This all assumes that this is a bug in the SPI comms, it may be something else but I don’t see why that would be hard to fix. If you get good data from the master CPU, all the processing that follows is easy.

There are workarounds that you can implement in the user code but Purdue need to fix this at the low level otherwise, as Daniel said, it’s just a kludge.

Having never heard the term kludge until today, that’s pretty much what I was thinking. Was trying to portray a surprised tone; I heard there was some issue with transferring sarcasm over the Internet.

I had a look at the SPI timing for all the development options I have as well as the IFI default code. I’m not going to explain too much of this; it’s really for the Purdue guys. The communication consists of four groups, each of four 16-bit words, with a small delay between each word and also between each group.
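As a rough way to reason about this structure, the total time for one exchange can be sketched as below. The function name is my own and every duration is a caller-supplied assumption; nothing here is a measured value from any of the firmwares.

```c
/* Total time for one SPI exchange of 4 groups of 4 words.
 * All durations are in microseconds and are assumptions supplied
 * by the caller, not measurements. */
static unsigned frameTimeUs(unsigned wordUs, unsigned wordGapUs, unsigned groupGapUs)
{
    const unsigned groups = 4, wordsPerGroup = 4;
    unsigned words = groups * wordsPerGroup;            /* 16 words */
    unsigned wordGaps = groups * (wordsPerGroup - 1);   /* 12 gaps inside groups */
    unsigned groupGaps = groups - 1;                    /* 3 gaps between groups */
    return words * wordUs + wordGaps * wordGapUs + groupGaps * groupGapUs;
}
```

Because there are only three gaps between groups, shortening each of them by 45 µs shortens the whole exchange by just 135 µs, whatever the other values are.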

This first composite scope picture shows the timing for the four groups, it’s a capture of the slave select signal, the IFI default code is shown at the top as a reference and then each flavor of firmware.

[Attached image: Comp_1.jpg]

The second image is the same thing, just zoomed in a little. I show the delta between the end of the first group and the start of the second, I think PROS may be pushing the envelope a little.

[Attached image: Comp_2.jpg]

I will leave it to the PROS team to decide what to make of this information.

We did attach a scope to the SPI lines on our STM32F103 test board and checked the timings. The results were similar to yours. However, leaving the scope running for hours triggering on bad packets was not an efficient use of tester time, due to the low frequency of occurrence on the Cortex units we have for testing. JTAG debugging revealed good information about the Cortex memory state to track the issue down to single bit/byte errors, but any kind of invasive debugging involving breakpoints, single stepping, or watches obviously disrupted the Cortex enough to hide the issue.

We recently had an issue with a faulty custom sensor driver with its interrupts mistakenly programmed at maximum priority, driving the SPI timings out of spec and causing a run away robot. While we were debugging this issue, we developed an accelerated test program involving two Cortex units running with a computer controlled competition switch rapidly swapping between teleop and autonomous, to expose the issue much more frequently than our previous test model. After the sensor driver was fixed, this model caused other SPI errors frequently enough to expose a race condition on an index counter that was likely causing the problem.

The latest revision of PROS, 2b08, incorporates extra SPI timing margin and the fix mentioned above. We believe that if the SPI timing was indeed too short, then the issue should come up more often than once every few minutes or hours. On the accelerated test model, we had no occurrences of the issue on the latest version; this does not mean that it is totally fixed, but it should be an improvement. edjubuh, could you try upgrading PROS (and updating the project you are testing) and see if the issue disappears?

I tried to prove this, but so far the results are inconclusive to the point where I doubt the reduced timing on the SPI packet is the problem. I changed ConVEX to match your timing, but no luck in reproducing errors. I also had trouble in getting PROS to create errors tonight, there’s a remote possibility that temperature is a factor. I had been running the cortex opened up to allow access to the SPI test points. In that condition I could not get PROS to fail, once I closed the cortex up again then I managed two failures within 30 minutes.

FYI (PROS team): I don’t use interrupts on the SPI communications, as an SPI word only takes 7 µs to transmit. I was using DMA to do the transfer at one point but decided it wasn’t worth the time to context switch only to have to switch back to the SPI task almost straight away. I just poll the SPI status. EasyC does the same thing; I don’t know what ROBOTC does.

Anyway, I will try 2b08 over the weekend and see if it has improved things.

Perhaps it would be best to debounce this data. Do the bit errors last for just one message?
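A minimal sketch of what such a debounce could look like, assuming the bad data really does last for only one message. The `Debounce` type and `debounceUpdate` function are hypothetical names, not taken from PROS or ConVEX, and this is a workaround at the user level, not a fix for the root cause.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool stable;     /* last accepted button state */
    uint8_t count;   /* consecutive samples that disagree with stable */
} Debounce;

/* Accept a new state only after `needed` consecutive samples agree on it. */
static bool debounceUpdate(Debounce *d, bool raw, uint8_t needed)
{
    if (raw == d->stable) {
        d->count = 0;                 /* glitch ended, or nothing changed */
    } else if (++d->count >= needed) {
        d->stable = raw;              /* change persisted long enough */
        d->count = 0;
    }
    return d->stable;
}
```

With `needed` set to 2, a single corrupted sample is ignored, at the cost of a real press being accepted one sample later.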

I ran further tests on 2b08 and, so far, the bug has not resurfaced.

I would like to hear from edjubuh, if he has had a good experience with the new version I think we can all move on and assume it’s fixed.

For anyone that’s interested, here is the revised SPI timing PROS is using in 2b08 as compared to 2b06. At this point I’m in agreement with the PROS team and suspect this has nothing to do with the original bug. We (PROS and ConVEX) are guessing about how the master processor reacts to the SPI timing; we have no information that indicates that it’s even necessary to leave these short gaps between groups. It does not follow common practice for most embedded SPI, but I’m sure there was a good reason originally. I tend to use ROBOTC as the reference implementation for this type of thing; I have great respect for the ROBOTC developers and trust that they know what they are doing. In this case, however, I also decided to tighten up the timing a little and also allow other tasks to run between the SPI groups. I will say that EasyC (in version 4.1.0.5 that I tested) is extremely wasteful of CPU resources: it is using almost 1.5 ms to transmit the data in this version, with a period of around 20 ms. I looked at some old data I had and this did not use to be the case. I have an idea of what happened, but that’s for them to sort out if they want to. Just be aware that almost 8% of CPU time in EasyC seems to be dedicated to this process.

[Attached image: pros_spi_timing_change.jpg]
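The EasyC figure above is just the busy time divided by the period. As a quick sketch (the function name is my own):

```c
/* Share of CPU time, in percent, used by a task that is busy for
 * busyUs microseconds out of every periodUs microseconds. */
static double cpuSharePercent(double busyUs, double periodUs)
{
    return 100.0 * busyUs / periodUs;
}
```

Calling `cpuSharePercent(1500.0, 20000.0)` gives 7.5, which is where the "almost 8%" comes from if the transmission takes nearly 1.5 ms of every 20 ms.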

Sorry for the late response, was visiting colleges over the weekend and hadn’t had time to get on the forums.

Anyway, I must be doing something ridiculous, because I can’t seem to get 2b08 to download. I’m getting the notification, but can’t get it to download… Perhaps I’m forgetting something?