Potentiometer woes

So my students are still struggling with potentiometers, and before suggesting some extreme measures, I thought I’d ask teams here for their experience and suggestions.
They have an RD4B with one potentiometer on each side. Left goes to analog2, right to analog3. They use these to drive the RD4B lift, including compensation for uneven lift. This approach worked well for them; their code is well tested and reliable.
But they get frequent issues with the right pot: the readout oscillates between the expected value and a nonsensical reading, typically around 250 (the pots are mounted so that the full working range of the RD4B is 1200-3200 on both).
This only happens on the right pot.
We have tried replacing the pot. That worked better (not 100%) for some time, but we’re on our 5th new pot.
We have tried replacing the cable (there is just one 36" extension between the brain and the pot).
They now have 2 layers of filtering in place (1st layer - replace recognized bad value with the value from the other side, 2nd layer - average 4 readings), but still observe really bad jitter on the lift (I have yet to verify their filtering).
(Note: This is all with a brand new pot that was never pushed out of its working range. Not even close to the edge of the range)

We have a datalog (unfortunately I don’t have a png, just the CSV this time) from the last matchload practice, where the left pot was showing perfectly smooth curves, while the right looked literally like a comb. Thanks to the filtering, both the driver and their automation are still mostly able to use the lift, but it’s frustrating. We can’t replace the pot for every match, can we?

Possible extreme measures:
a) Abandon balancing, only use the left pot for PID
b) Add an extra pot or two to the right side, pick the more reasonable value
c) Keep replacing the pots with new ones (we’ve got a few new packs for worlds, at least pots are cheap)

So, are there any known issues with pots, analog ports or something (like there is this interrupt thing with digital4 vs. digital10)?
They have most of the analog ports in use: 5 pots, gyro, voltage feedback from the expander.

Any suggestions how to avoid this class of problems?

Datalog sample (5th column is the right pot, 6th is the left, lift was moving down):
1945, 20.110, 73     , 127    , 1728   , 1797...
1946, 20.120, -44    , 10     , 1771   , 1785...
1947, 20.130, -44    , 10     , 1886   , 1776...
1948, 20.140, -54    , 0      , 270    , 1769...
1949, 20.150, -54    , 0      , 250    , 1763...
1950, 20.160, -54    , 0      , 1862   , 1760...
1951, 20.170, -40    , -34    , 702    , 1758...
1952, 20.180, -40    , -34    , 648    , 1756...
1953, 20.190, -10    , -4     , 1839   , 1752...
1954, 20.200, -10    , -4     , 1771   , 1746...
1955, 20.210, -10    , -4     , 1063   , 1740...
1956, 20.220, 73     , 127    , 249    , 1730...
1957, 20.230, 73     , 127    , 1846   , 1726...
1958, 20.240, -62    , -52    , 342    , 1720...
1959, 20.250, -62    , -52    , 586    , 1718...
1960, 20.260, 55     , 109    , 1392   , 1711...
1961, 20.270, 55     , 109    , 1812   , 1700...
1962, 20.280, 55     , 109    , 1563   , 1699...
1963, 20.290, 73     , 127    , 776    , 1694...

I have had issues like this with potentiometers, even brand new ones. I think they are just poor quality, and I would recommend using quadrature encoders if you can. If that is not possible, filters should work quite well, along with replacing the potentiometer when the issue becomes too bad.

Looks like it’s typically 2, sometimes 3 bad readings in a row, and that these bad values are always small. So a simple filter could do it. When you take in a new value:

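int valueList[4];
int largestPotValue;
int i;

// shift the stored readings down one slot, dropping the oldest
for (i = 0; i < 3; i++) {
  valueList[i] = valueList[i + 1];
}
valueList[3] = newPotValue; // newPotValue is a placeholder: read the appropriate port here
largestPotValue = valueList[0];
// pick the largest of the last 4 readings
for (i = 1; i < 4; i++) {
  if (valueList[i] > largestPotValue) {
    largestPotValue = valueList[i];
  }
}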

Then use largestPotValue where you would have used the potentiometer value. You could do more complex filtering; it just looks like this would work easily. This does assume you’re checking the value fairly rapidly, though, since trying to lower it will require several readings to purge the earlier higher value.
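
As a rough check of the idea, here is a minimal sketch in plain C (nothing ROBOTC-specific) that replays the right-pot column from the datalog sample through a largest-of-the-last-4 filter. The names MaxFilter, maxFilterInit, maxFilterUpdate and lowestFilteredValue are made up for this illustration.

```c
#define HISTORY_LEN 4
#define LOG_LEN 19

/* Hypothetical helper type, not from the thread: the last 4 raw readings. */
typedef struct {
    int values[HISTORY_LEN];
} MaxFilter;

/* Start from an all-zero history; zero is below every reading in the log. */
void maxFilterInit(MaxFilter *f) {
    int i;
    for (i = 0; i < HISTORY_LEN; i++) {
        f->values[i] = 0;
    }
}

/* Push one reading and return the largest of the last 4. */
int maxFilterUpdate(MaxFilter *f, int reading) {
    int i;
    int largest;
    /* shift the history down one slot, dropping the oldest reading */
    for (i = 0; i < HISTORY_LEN - 1; i++) {
        f->values[i] = f->values[i + 1];
    }
    f->values[HISTORY_LEN - 1] = reading;
    largest = f->values[0];
    for (i = 1; i < HISTORY_LEN; i++) {
        if (f->values[i] > largest) {
            largest = f->values[i];
        }
    }
    return largest;
}

/* Right-pot column copied from the datalog sample in the first post. */
static const int rightPotLog[LOG_LEN] = {
    1728, 1771, 1886, 270, 250, 1862, 702, 648, 1839, 1771,
    1063, 249, 1846, 342, 586, 1392, 1812, 1563, 776
};

/* Replay the log; return the lowest filtered value seen once the
   history holds 4 real readings. */
int lowestFilteredValue(void) {
    MaxFilter f;
    int i;
    int out;
    int lowest = 4095; /* larger than any reading in the log */
    maxFilterInit(&f);
    for (i = 0; i < LOG_LEN; i++) {
        out = maxFilterUpdate(&f, rightPotLog[i]);
        if (i >= HISTORY_LEN - 1 && out < lowest) {
            lowest = out;
        }
    }
    return lowest;
}
```

On this short sample the filtered value never drops below 1812, while the raw column goes as low as 249. A run of 4 or more bad readings in a row would still get through.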

250 is the spike value for a broken pot. Replace it first and see what happens.

Sounds like the pot wiper is bouncing.
Does it only do this when it is in motion, or also when it is standing still?
Does the pot have any cantilever force on it?

Did you try swapping the two pot analog ports?
If you did, did the problem follow the pot or the analog port?

36" is a long way (although certainly doable), does the wire run past any motors?

What rate are the data points sampled in your example?

I’m with @TriDragon that the issue may not be the pot.

I recommend you go through every point of potential failure: every piece of the circuit from the pot to the Cortex. Do your tests one at a time and record your findings. This should help you narrow down the cause. Make sure that you test with the Cortex powered by a battery (and not just running on USB power); running on USB power alone can cause bad results.

  • Try replacing the (left) known working pot with some of the ones that failed and see if they still fail.
  • Switch the pot ports on the Cortex (2 for 3, and 3 for 2).
  • Check that your cable colors are aligned (black, red, white) and you don’t have a plug flipped.
  • Swap out the extension cables (before you mount them) and test if they cause issues.
  • Try using a different port on the Cortex.
  • Test your pots on a different Cortex if you have one available.
  • Unplug the other analog devices except for the pots and see if the right pot still fails. If not then one-by-one introduce the other devices until you see a failure.

Thanks for all the advice, I’ll get them to read this thread.

What we have and what we tested so far:

  • 36" is necessary - these are the pots on the mid-section of the RD4B. At least we only have one connection on the cable.
  • It’s a single cable bundle, 3 pots, 6 motors, so the pot cables go next to high-power motor cables, but:
    • so does the left pot and that one is totally clean
    • we tested connecting the pot with a new, out-of-the-bundle 36" cable. Similar results.
  • The pot sits on free end of a shaft. The stack is: Pot | Flat Bearing | C-chan | gear with arm | C-chan | FB, it is fixed by one screw and shouldn’t be under lateral stress. Left side is exact mirror and works.
  • Wiring is correct - it works >80% of the time at all lift heights.
  • SW filtering: The datalog was sampled at 10ms. The signal has a lot of spikes, but also some >200ms (or even stable) stretches of bad values. It also sometimes reads an off value that falls into the valid range (1200 when it should have been 1800).
    They have already introduced filtering. For PID at a 20ms loop time, they have this (pretty complicated layered code by now, comments are mine):

// maps the valid calibrated range (1294 .. 3176) to 0..100, just returning negative below 1294, >100 for over 3176
float getLiftRight(){
  return(valMap(1294, SensorValue(Rpot_lift), 3176));
}
float getLiftLeft(){
  return(valMap(1141, SensorValue(Lpot_lift), 3064));
}

// difference is a "return argument", a way to return more than 1 result in C
float getLift (float* difference = NULL){
    float left = getLiftLeft();
    float right = getLiftRight();
    if(left < -15) left = right;
    if(right < -15) right = left;
    float avg = (left+right)/2;
    float delta = left-right;
    if(difference != NULL) {
      *difference = delta;
    }
    return(avg);
}

// gets and averages 4 samples over the last 10ms of the 20ms loop time
// I now see a small bug here: it should return float, not int. Will tell them.
int liftFilter(float *delta) {
  float avg, diff, tmp;
  wait1Msec(1);
  avg = getLift(&diff);
  wait1Msec(3);
  avg += getLift(&tmp);
  diff += tmp;
  wait1Msec(3);
  avg += getLift(&tmp);
  diff += tmp;
  wait1Msec(3);
  avg += getLift(&tmp);
  diff += tmp;

  *delta =diff/4;
  return(avg/4);
}


task LIFT_PID() {
    repeat(forever) {
        wait10Msec(1);
        float delta;
        float avg = liftFilter(&delta);

         // rest of PID code below. Worked well with reliable pots
    }
}

We have a meeting today, so we’ll try some more experiments (swapping cables, ports, pots). For example, I so far fail to see why the bad value elimination in getLift fails to help. More logging necessary I think.
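
One possible reason, sketched in plain C below. It assumes valMap is a plain linear map of the calibrated range onto 0..100; its source is not shown here, so valMapAssumed and rejectedByGetLift are my guesses for illustration. Under that assumption the `< -15` test in getLift only rejects raw right-pot readings below roughly 1012, so bad readings like the 1063 and 1392 in the datalog sample pass straight through.

```c
/* ASSUMPTION: valMap is linear. The real valMap is not shown in the thread. */
float valMapAssumed(float lo, float x, float hi) {
    return (x - lo) * 100.0f / (hi - lo);
}

/* Mirrors the right-side check in getLift: returns 1 when the raw reading
   would be replaced by the left-side value, 0 when it is kept. */
int rejectedByGetLift(float rawRight) {
    return valMapAssumed(1294, rawRight, 3176) < -15;
}

/* Raw right-pot value at which the mapped value equals -15:
   1294 - 0.15 * (3176 - 1294), about 1011.7 */
float rawCutoffRight(void) {
    return 1294.0f - 0.15f * (3176.0f - 1294.0f);
}
```

With these numbers, 250 and 702 are rejected, but 1063 maps to about -12 and is kept. Off values that fall inside the valid range are kept as well, so comparing against the left side (or against the previous good reading) may catch more than a fixed lower bound does.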

Try swapping the ports with the two pots. That way you still have a pot on each of those ports. If the errors switch to the other pot, then it’s pretty clearly the port or the port in conjunction with a pot in general. Even then, that port might work fine for something else. You’d just need to find a new port that works well for the pot.

An analog port reading of around 250 is indicative of a disconnected pot. You can confirm this by loading code with a motor/sensor setup that has a pot on analog1 and running it without anything plugged into the port. The reading should be around 250.

Using the average of the previous 4 values seems like it will not help the problem. We just wrote a small filtering code for an ultrasonic sensor, and when the incoming value was more than, say 10mm off from the previous value (choose your own threshold), we instead used the median of the previous 5 values. And then put that median in the latest-5-values array as the “new” value (in lieu of the actual [anomalous] sensor reading).
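
A minimal sketch of that threshold-plus-median idea in plain C, in case it helps. MedianGate, medianGateInit, medianOf5 and medianGateUpdate are names invented for this example, and the threshold is something you would have to pick yourself.

```c
#define MEDIAN_LEN 5

/* Hypothetical helper type: the last 5 accepted values, oldest first. */
typedef struct {
    int history[MEDIAN_LEN];
    int threshold; /* largest jump between samples that is still trusted */
} MedianGate;

/* Fill the history with a known-good starting value. */
void medianGateInit(MedianGate *g, int initial, int threshold) {
    int i;
    for (i = 0; i < MEDIAN_LEN; i++) {
        g->history[i] = initial;
    }
    g->threshold = threshold;
}

/* Median of 5 values: insertion sort on a copy, then take the middle one. */
static int medianOf5(const int *src) {
    int tmp[MEDIAN_LEN];
    int i, j, key;
    for (i = 0; i < MEDIAN_LEN; i++) {
        tmp[i] = src[i];
    }
    for (i = 1; i < MEDIAN_LEN; i++) {
        key = tmp[i];
        j = i - 1;
        while (j >= 0 && tmp[j] > key) {
            tmp[j + 1] = tmp[j];
            j--;
        }
        tmp[j + 1] = key;
    }
    return tmp[MEDIAN_LEN / 2];
}

/* If the reading jumps more than the threshold away from the previous
   accepted value, substitute the median of the last 5 accepted values.
   Whatever is accepted is stored in the history as the newest value. */
int medianGateUpdate(MedianGate *g, int reading) {
    int i;
    int prev = g->history[MEDIAN_LEN - 1];
    int diff = reading - prev;
    int accepted;
    if (diff < 0) {
        diff = -diff;
    }
    accepted = (diff > g->threshold) ? medianOf5(g->history) : reading;
    for (i = 0; i < MEDIAN_LEN - 1; i++) {
        g->history[i] = g->history[i + 1];
    }
    g->history[MEDIAN_LEN - 1] = accepted;
    return accepted;
}
```

The threshold has to be larger than the biggest real change between two samples, otherwise the filter will reject genuine movement and lag behind.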

In your case (with the pot on a lift), the median of the previous 5 might not be entirely useful because you expect that the value is increasing or decreasing (in our case, we were using the ultrasonic to ping off the field perimeter and keep a fixed distance from the wall, and hence the previous-5 median was a good substitute). In your case, it gets a little complicated if you want to get the slope of the previous 5 values and extrapolate to what the actual value likely is, but these intervals are very short, so maybe extrapolation is not 100% required.
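
If extrapolation does turn out to be needed, one way to do it (my choice, there are others) is a least-squares line through the previous 5 equally spaced values, evaluated one sample ahead. A small plain-C sketch, with extrapolateNext being a made-up name:

```c
/* Fit a least-squares line through 5 equally spaced samples (oldest first)
   and return its value one sample step after the newest one.
   With sample positions 0..4 the mean position is 2 and the sum of squared
   offsets from it is 4 + 1 + 0 + 1 + 4 = 10. */
float extrapolateNext(const float *prev5) {
    float mean = 0.0f;
    float slope = 0.0f;
    int i;
    for (i = 0; i < 5; i++) {
        mean += prev5[i];
        slope += (float)(i - 2) * prev5[i];
    }
    mean /= 5.0f;
    slope /= 10.0f;
    /* the next sample sits 3 steps past the mean position */
    return mean + 3.0f * slope;
}
```

For a lift dropping steadily by 10 counts per sample (1800, 1790, 1780, 1770, 1760) this predicts 1750, which could then stand in for a rejected reading instead of the median.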