Tracking objects from camera; PID controlling; Parrot AR.Drone 2.0


Question:

I am working on a project where I need to implement an object-tracking technique using the camera of the Parrot AR.Drone 2.0. The main idea is that the drone should be able to identify a specified colour and then follow it while keeping some distance.

I am using the OpenCV API to establish communication with the drone. This API provides the function:

ARDrone::move3D(double vx, double vy, double vz, double vr)

which moves the AR.Drone in 3D space, where:

  • vx: X velocity [m/s]
  • vy: Y velocity [m/s]
  • vz: Z velocity [m/s]
  • vr: Rotational speed [rad/s]
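
For illustration, a call might look like the sketch below. Only move3D's signature is given above; the surrounding ARDrone methods (open, takeoff, landing, close) follow the cvdrone-style API and are assumptions on my part.

// Minimal usage sketch. Assumes a cvdrone-style ARDrone class;
// only move3D's signature comes from the question itself.
#include "ardrone/ardrone.h"

int main() {
    ARDrone ardrone;
    if (!ardrone.open()) return -1;  // connect to the drone

    ardrone.takeoff();
    // Fly forward at 0.2 m/s; hold altitude, no lateral motion, no yaw.
    ardrone.move3D(0.2, 0.0, 0.0, 0.0);
    ardrone.landing();
    ardrone.close();
    return 0;
}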

I have written an application which does simple image processing on the images obtained from the drone's camera using OpenCV, and finds the contours of the object to be tracked.

The part I am struggling with now is finding a technique for computing the velocities to send to the move3D function. I have read that a common way of doing this kind of control is with a PID controller, but I could not work out how it relates to this problem.

To summarise, my question is: how do I move a robot towards an object detected in its camera, and how do I find the coordinates of a given object from the camera?

Answer 1:

EDIT:
So, I just realized you are using a drone, and your coordinate system with respect to the drone is likely x forward (into the image), y to the left of the image (image columns), and z up vertically (image rows). My answer uses coordinates with respect to the camera: x = columns, y = rows, z = depth (into the image). Keep that in mind when you read my outline. Also, everything I wrote is pseudo-code; it won't run without many modifications.

Original Post:
A PID controller is a proportional–integral–derivative controller. It decides on an action sequence based on your specific error.

For your problem, let's assume that optimal tracking means the rectangle is in the center of the image and takes up ~30% of the pixel space. This means that you move your camera/bot until these conditions are met. We will call these the goal parameters:

x_ideal = image_width / 2
y_ideal = image_height / 2
area_ideal = image_width * image_height * 0.3

Now let's say your bounding box is characterized by 4 parameters:

(x_bounding, y_bounding, width_bounding_box, height_bounding_box)

Taking (x_bounding, y_bounding) to be the center of the box, so it can be compared directly with the image center, your error would be something along the lines of:

x_err = x_bounding - x_ideal;
y_err = y_bounding - y_ideal;
z_err = area_ideal - (width_bounding_box * height_bounding_box);

Notice I have tied the z distance (depth) to the size of the object. This assumes that the object being tracked is rigid and doesn't change size, so any change in apparent size is due to the object's distance from the camera (a bigger bounding box means the object is close; a smaller one means it is far). This is a bit of an estimation, but without any parameters of the camera or the object itself, we can only make these general statements.
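
As a concrete illustration, here is a minimal OpenCV/C++ sketch of this error computation, assuming you already have the object's contour from your colour segmentation (the function and variable names are mine, not from the question):

#include <opencv2/imgproc.hpp>
#include <vector>

// Compute the three error terms from one detected contour.
void computeErrors(const std::vector<cv::Point>& contour,
                   int image_width, int image_height,
                   double& x_err, double& y_err, double& z_err)
{
    cv::Rect box = cv::boundingRect(contour);

    // Use the *center* of the bounding box so it compares directly
    // against the image center.
    double x_center = box.x + box.width / 2.0;
    double y_center = box.y + box.height / 2.0;

    double x_ideal    = image_width / 2.0;
    double y_ideal    = image_height / 2.0;
    double area_ideal = image_width * image_height * 0.3;

    x_err = x_center - x_ideal;
    y_err = y_center - y_ideal;
    z_err = area_ideal - static_cast<double>(box.area());
}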

We need to keep the sign in mind when creating our control sequence; this is why the order matters when doing the subtraction. Let's think about this logically: x_err measures how far off the bounding box is horizontally from the desired position, and its sign tells us which side of center the object is on. Likewise, the sign of z_err tells us whether the box is too small (the object is too far away) or too large (it is too close).

z_err < 0 : the bot is too close and needs to slow down; Vz should be reduced
z_err = 0 : keep the speed command the same, no change
z_err > 0 : we need to get closer; Vz should increase

x_err < 0 : the bot is too far to the right and needs to turn left (decreasing x); Vx should be reduced
x_err = 0 : keep the speed in X the same, no change to Vx
x_err > 0 : the bot is too far to the left and needs to turn right (increasing x); Vx should be increased

We can do the same for the y axis. Now we use these errors to create a command sequence for the bot.

That description sounds a lot like a PID controller: observe a state, compute an error, create a control sequence to reduce the error, then repeat the process over and over. In your case the velocities would be the actions output by your algorithm. You will essentially have 3 PIDs running (a minimal sketch follows the list below):

  1. PID for X
  2. PID for Y
  3. PID for Z
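
A minimal discrete PID controller could look like the sketch below. This is my own illustration, not code from the original answer; the gains kp, ki, kd are placeholders you would have to tune.

// Minimal discrete PID controller (illustrative sketch).
struct PID {
    double kp, ki, kd;       // proportional, integral, derivative gains
    double integral = 0.0;   // accumulated error over time
    double prev_err = 0.0;   // error from the previous step

    // err: current error; dt: time step in seconds.
    double update(double err, double dt) {
        integral += err * dt;
        double derivative = (err - prev_err) / dt;
        prev_err = err;
        return kp * err + ki * integral + kd * derivative;
    }
};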

Because these are orthogonal by nature, we can say each system is independent (and ideally it is): moving in the x direction shouldn't affect the y direction. This example also completely ignores the bearing information (Vr), but it's meant to be a thought exercise, not a complete solution.
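
Putting it together, one loop iteration could look roughly like the sketch below, reusing the PID struct and computeErrors from above. Note the remapping from the camera frame (x = columns, y = rows, z = depth) to the drone frame (x forward, y left, z up) described in the edit; all gain values and signs here are assumptions for illustration and would need checking against your actual drone.

// Illustrative control loop; gains and frame remapping are assumptions.
PID pid_x{0.001, 0.0, 0.0};    // columns -> lateral correction
PID pid_y{0.001, 0.0, 0.0};    // rows    -> vertical correction
PID pid_z{0.00001, 0.0, 0.0};  // area    -> forward/backward correction

const double dt = 0.033;  // ~30 fps camera loop

while (tracking) {  // 'tracking', 'contour', 'ardrone' assumed in scope
    double x_err, y_err, z_err;
    computeErrors(contour, image_width, image_height, x_err, y_err, z_err);

    double ux = pid_x.update(x_err, dt);  // camera x (columns)
    double uy = pid_y.update(y_err, dt);  // camera y (rows)
    double uz = pid_z.update(z_err, dt);  // camera z (depth, from area)

    // Camera frame -> drone frame:
    //   camera z (depth)   -> drone vx (forward)
    //   camera x (columns) -> drone vy (positive = left, hence the minus)
    //   camera y (rows)    -> drone vz (positive = up; rows grow downward)
    ardrone.move3D(uz, -ux, -uy, 0.0);
}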

The exact velocity of the corrections is determined by your PID coefficients, and this is where things get a little tricky; there are easy-to-read (almost no math) overviews of PID control worth consulting. You will have to play with your system (aka "tune" your parameters) through a bit of experimentation. This is made even more difficult because the camera is not a full 3D sensor, so we can't extract true measurements from the environment. It is hard to convert an error of ~30 pixels into m/s without knowing more about your sensor/environment, but I hope this gave you a general idea of how to proceed.