We are trying to develop the HawkEye system used in cricket as our college project.
The process used in HawkEye System is as follows:
- capturing images of the ball at different instants of time (i.e. at different points along its flight) from the bowler's hand to the batsman's end
- determining the (x,y) coordinates of the ball at different instances of time during the entire flight of the ball
- converting the (x,y) coordinates into the corresponding 3D coordinates (x,y,z)
- modelling the trajectory of the ball during its entire flight, along with its surroundings, which include the field, pitch, wickets, and stadium
- extending the trajectory of the ball to see whether it would have hit the wickets or not (a rough sketch of this step follows the list)
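Once we have 3D samples, the last step could look something like the sketch below: fit each coordinate as a quadratic in time by least squares and extrapolate to the stump plane. The axis conventions, stump dimensions, and the coarse time scan are illustrative assumptions on our part, not how the real HawkEye works:

```cpp
// Sketch of the extrapolation step. Assumes (t, x, y, z) samples have
// already been recovered, with y = height above the pitch, x = lateral
// offset from middle stump, z increasing towards the stumps.
#include <opencv2/core.hpp>
#include <cmath>
#include <vector>

// Fit c0 + c1*t + c2*t^2 to samples by least squares.
static cv::Vec3d fitQuadratic(const std::vector<double>& t,
                              const std::vector<double>& v)
{
    cv::Mat A((int)t.size(), 3, CV_64F), b((int)t.size(), 1, CV_64F);
    for (int i = 0; i < (int)t.size(); ++i) {
        A.at<double>(i, 0) = 1.0;
        A.at<double>(i, 1) = t[i];
        A.at<double>(i, 2) = t[i] * t[i];
        b.at<double>(i, 0) = v[i];
    }
    cv::Mat c;
    cv::solve(A, b, c, cv::DECOMP_SVD);   // least-squares solution
    return { c.at<double>(0), c.at<double>(1), c.at<double>(2) };
}

// Would the extrapolated ball have hit the stumps?
bool hitsWickets(const std::vector<double>& t,
                 const std::vector<double>& x,
                 const std::vector<double>& y,
                 const std::vector<double>& z,
                 double zStumps)
{
    cv::Vec3d cx = fitQuadratic(t, x), cy = fitQuadratic(t, y),
              cz = fitQuadratic(t, z);
    // Find when z(t) reaches the stump plane (a coarse scan is enough here).
    double tHit = -1;
    for (double s = t.back(); s < t.back() + 1.0; s += 1e-3) {
        if (cz[0] + cz[1]*s + cz[2]*s*s >= zStumps) { tHit = s; break; }
    }
    if (tHit < 0) return false;
    double xHit = cx[0] + cx[1]*tHit + cx[2]*tHit*tHit;
    double yHit = cy[0] + cy[1]*tHit + cy[2]*tHit*tHit;
    // Stumps are ~22.86 cm wide and ~71 cm tall.
    return std::abs(xHit) < 0.1143 && yHit > 0.0 && yHit < 0.71;
}
```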
So far, this is how we've planned to accomplish the project:
We'll shoot a video of the batsman from the leg umpire's position, play it back in slow motion in VLC player, and take multiple screenshots of the ball in flight as it plays. I guess this takes care of step 1.
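As an alternative to manual screenshots, we might grab timestamped frames programmatically; a rough sketch using OpenCV's VideoCapture (the file name is just a placeholder):

```cpp
// Grab every frame of the recorded delivery, with its timestamp,
// instead of hand-taking VLC screenshots.
#include <opencv2/videoio.hpp>
#include <opencv2/imgcodecs.hpp>
#include <cstdio>

int main()
{
    cv::VideoCapture cap("delivery.avi");        // hypothetical file name
    if (!cap.isOpened()) return 1;
    double fps = cap.get(cv::CAP_PROP_FPS);
    cv::Mat frame;
    for (int i = 0; cap.read(frame); ++i) {
        char name[64];
        std::snprintf(name, sizeof(name), "frame_%04d.png", i);
        cv::imwrite(name, frame);
        std::printf("%s at t = %.4f s\n", name, i / fps);
    }
    return 0;
}
```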
But right now we are stuck at step 2. The problem we're facing is how to recognise and find the (x, y) coordinates of the ball at a particular instant, from an image taken from the leg side. If we can find the (x, y) of the ball, and the distance of the camera from some reference point is known, then we can find the depth of the image, i.e. the z-coordinate. That gives us the corresponding (x, y, z) coordinates, which we can then model in 3D using OpenGL.
We're trying to implement it in C++.
Any help is appreciated :)
A quick edit:
I came to know that the real HawkEye system uses six cameras placed around the circumference of the cricket field, each separated by an angle of 60 degrees. HawkEye can work using only four cameras, but the two extra cameras are used for better precision.
Since we don't have that many cameras, I think we'll use three cameras placed on the circumference of the field, separated by 120 degrees, and to reduce complexity we'll choose a small field, maybe one with a radius of 5 m. We're not sure where to place the cameras to get the most accurate results; the positions could be one on the leg side, one on the off side, and the third straight in front, but I'm still not sure which positions to choose.
This approach is called multi-camera calibration, and for ball recognition I think we should choose OpenCV over MATLAB because OpenCV's image processing is faster.
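For the calibration itself, the usual OpenCV route is to photograph a chessboard with each camera; a minimal sketch (the pattern size, square size, and the idea of using pitch landmarks for extrinsics are assumptions on our part):

```cpp
// Per-camera intrinsic calibration with a chessboard. Repeating this for
// each of the three cameras gives the intrinsics; extrinsics relative to
// the pitch could then come from cv::solvePnP against known pitch
// landmarks (crease corners, stumps).
#include <opencv2/calib3d.hpp>
#include <vector>

void calibrateOneCamera(const std::vector<cv::Mat>& boardImages,
                        cv::Size patternSize,     // inner corners, e.g. {9,6}
                        float squareSize,         // square edge in metres
                        cv::Mat& K, cv::Mat& distCoeffs)
{
    // 3D layout of the board corners (z = 0 plane).
    std::vector<cv::Point3f> board;
    for (int r = 0; r < patternSize.height; ++r)
        for (int c = 0; c < patternSize.width; ++c)
            board.emplace_back(c * squareSize, r * squareSize, 0.f);

    std::vector<std::vector<cv::Point3f>> objectPoints;
    std::vector<std::vector<cv::Point2f>> imagePoints;
    for (const cv::Mat& img : boardImages) {
        std::vector<cv::Point2f> corners;
        if (cv::findChessboardCorners(img, patternSize, corners)) {
            imagePoints.push_back(corners);
            objectPoints.push_back(board);
        }
    }
    std::vector<cv::Mat> rvecs, tvecs;
    cv::calibrateCamera(objectPoints, imagePoints, boardImages[0].size(),
                        K, distCoeffs, rvecs, tvecs);
}
```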
What do you all have to say?
One camera is enough if you've got the projection/view matrix to get from image space to world space (there are plenty of documents out there on camera calibration and coordinate transformation). This gives you a vector that points from the camera through the ball; the ball's apparent size can then be used to determine its distance from the camera.
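A minimal sketch of that idea, assuming the intrinsics K come from calibration and a detector already gives the ball's pixel centre and apparent diameter (7.2 cm is the regulation ball diameter; treating the projected ball as a circle of fx·D/Z pixels is an approximation):

```cpp
// Back-project the ball's pixel position through K to get a viewing ray,
// then use the ball's apparent size to place it along that ray.
#include <opencv2/core.hpp>

cv::Point3d ballInCameraFrame(const cv::Mat& K,      // 3x3 intrinsics, CV_64F
                              cv::Point2d centre,    // ball centre in pixels
                              double pixelDiameter)  // ball diameter in pixels
{
    const double BALL_DIAMETER = 0.072;              // cricket ball, ~7.2 cm

    // Pinhole model: pixelDiameter ~= fx * BALL_DIAMETER / Z
    double fx = K.at<double>(0, 0);
    double Z = fx * BALL_DIAMETER / pixelDiameter;

    // Ray through the pixel: dir = K^-1 * (u, v, 1)^T, scaled so z = Z.
    cv::Mat p = (cv::Mat_<double>(3, 1) << centre.x, centre.y, 1.0);
    cv::Mat dir = K.inv() * p;
    dir = dir * (Z / dir.at<double>(2));
    return { dir.at<double>(0), dir.at<double>(1), dir.at<double>(2) };
}
```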
I guess the easiest way to find the ball would be to introduce a threshold that "cuts" the ball from the rest of the image, or to use motion detection to extract the ball, or to combine both approaches.
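A rough sketch of the motion-detection variant in OpenCV, differencing consecutive grayscale frames and keeping the largest blob as the ball candidate (the threshold value is a guess to tune per video):

```cpp
#include <opencv2/imgproc.hpp>
#include <vector>

bool findBallByMotion(const cv::Mat& prevGray, const cv::Mat& currGray,
                      cv::Point2f& ballCentre)
{
    cv::Mat diff, mask;
    cv::absdiff(currGray, prevGray, diff);            // what moved?
    cv::threshold(diff, mask, 25, 255, cv::THRESH_BINARY);  // tune 25

    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(mask, contours, cv::RETR_EXTERNAL,
                     cv::CHAIN_APPROX_SIMPLE);

    // Keep the largest moving blob and report its centroid.
    double bestArea = 0;
    for (const auto& c : contours) {
        double area = cv::contourArea(c);
        if (area > bestArea) {
            bestArea = area;
            cv::Moments m = cv::moments(c);
            ballCentre = { float(m.m10 / m.m00), float(m.m01 / m.m00) };
        }
    }
    return bestArea > 0;
}
```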
Give your ball a distinctive colour that you are unlikely to find elsewhere in the image, then look for pixels of that colour in each image; this is the easiest option. Given the speed at which a ball moves in cricket, and that you are using only 30 fps, most other options are much more difficult. Just finding a white ball is quite difficult (as you have probably found), so your best bet is to use information about the movement of the ball in previous frames to help find it in new frames. However, the low frame rate and high ball speed mean the ball will move quite a bit between frames: at 142 km/h for a fast delivery, you are looking at more than a metre of movement per frame, which leaves a large gap between the images of the ball in subsequent frames and makes the temporal information more difficult to use.
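A sketch of what that colour lookup could look like in OpenCV, segmenting in HSV (more robust to lighting than RGB); the hue range below assumes a bright orange ball and is only a starting point:

```cpp
#include <opencv2/imgproc.hpp>

bool findBallByColour(const cv::Mat& frameBGR, cv::Point2f& ballCentre)
{
    cv::Mat hsv, mask;
    cv::cvtColor(frameBGR, hsv, cv::COLOR_BGR2HSV);
    // Keep pixels in the assumed ball-colour range (tune per ball/lighting).
    cv::inRange(hsv, cv::Scalar(5, 120, 120), cv::Scalar(20, 255, 255), mask);

    // Clean up speckle before computing the centroid.
    cv::morphologyEx(mask, mask, cv::MORPH_OPEN,
                     cv::getStructuringElement(cv::MORPH_ELLIPSE, {5, 5}));

    cv::Moments m = cv::moments(mask, /*binaryImage=*/true);
    if (m.m00 < 1) return false;                    // nothing matched
    ballCentre = { float(m.m10 / m.m00), float(m.m01 / m.m00) };
    return true;
}
```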
As an alternative to a weird colour, you could also paint your ball with a layer that is highly reflective in the IR domain, use IR lights (which humans can't see), and use IR-sensitive cameras (you could remove the IR filter from the cameras you have for this).
I think you'll need two cameras to determine the distance from the camera to the ball. Either that, or you'll have to use a workaround such as looking at the size of the ball in each frame, or at the distance of the ball from its shadow. But I doubt those two workarounds would be accurate enough...
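If you do go with two calibrated cameras, OpenCV's triangulatePoints does the heavy lifting; a sketch, assuming the 3x4 projection matrices P1 and P2 are already known from calibration:

```cpp
// Recover the 3D ball position from a matched pair of pixel observations
// in two calibrated cameras.
#include <opencv2/calib3d.hpp>

cv::Point3d triangulateBall(const cv::Mat& P1, const cv::Mat& P2, // 3x4, CV_64F
                            cv::Point2d px1, cv::Point2d px2)
{
    cv::Mat x1 = (cv::Mat_<double>(2, 1) << px1.x, px1.y);
    cv::Mat x2 = (cv::Mat_<double>(2, 1) << px2.x, px2.y);
    cv::Mat X;                                     // 4x1 homogeneous result
    cv::triangulatePoints(P1, P2, x1, x2, X);
    X.convertTo(X, CV_64F);
    double w = X.at<double>(3);                    // de-homogenise
    return { X.at<double>(0) / w, X.at<double>(1) / w, X.at<double>(2) / w };
}
```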
With respect to step 2, extracting the location of the ball, there are a multitude of possible approaches and sources of literature. I would strongly recommend looking into the work on robot soccer (RoboCup), which contains many examples of similar problems.
In an ideal world (say, a black disk on a white background), the starting point would probably be to use something like a Hough transform or contour tracing, then extract the position using statistical moments of the resulting contour.
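For that ideal-world case, a sketch of the Hough-circle route in OpenCV (the parameter values are starting points to tune, not known-good settings):

```cpp
// Blur to suppress noise, then look for circles of roughly ball-sized radius.
#include <opencv2/imgproc.hpp>
#include <vector>

std::vector<cv::Vec3f> findCandidateCircles(const cv::Mat& gray)
{
    cv::Mat blurred;
    cv::GaussianBlur(gray, blurred, cv::Size(9, 9), 2.0);
    std::vector<cv::Vec3f> circles;                 // (x, y, radius) per hit
    cv::HoughCircles(blurred, circles, cv::HOUGH_GRADIENT,
                     /*dp=*/1, /*minDist=*/gray.rows / 8.0,
                     /*param1=*/100, /*param2=*/30,
                     /*minRadius=*/3, /*maxRadius=*/30);
    return circles;
}
```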
The challenge of this approach is that a cricket field is most definitely going to have background features that are challenging to remove. With some trial and error you may be able to use common image processing techniques such as background subtraction, morphological operators, edge detectors, color filtering and thresholding to improve your capability to consistently find the ball. From past experience, I strongly recommend using a set of tools that allows you to rapidly prototype image processing pipelines and techniques, probably MATLAB.
Perhaps a more robust way to phrase this problem, leading into the following sections, is that if you have some idea of where the ball was previously, you can make a reasonable estimate of where it should be after some small amount of time. This is the field of optimal estimation and Kalman filtering. A good introductory text, albeit from a very different problem space, is Probabilistic Robotics by Thrun et al.
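A minimal sketch of such a filter with OpenCV's cv::KalmanFilter, using a constant-velocity model over image coordinates (the noise values are illustrative and would need tuning):

```cpp
// State is (x, y, vx, vy); only (x, y) is measured by the detector.
#include <opencv2/video/tracking.hpp>

cv::KalmanFilter makeBallTracker(float dt)          // dt = 1 / frame rate
{
    cv::KalmanFilter kf(4, 2, 0);                   // 4 states, 2 measurements
    kf.transitionMatrix = (cv::Mat_<float>(4, 4) <<
        1, 0, dt, 0,
        0, 1, 0, dt,
        0, 0, 1,  0,
        0, 0, 0,  1);
    cv::setIdentity(kf.measurementMatrix);          // we observe x and y
    cv::setIdentity(kf.processNoiseCov, cv::Scalar::all(1e-2));
    cv::setIdentity(kf.measurementNoiseCov, cv::Scalar::all(1.0));
    cv::setIdentity(kf.errorCovPost, cv::Scalar::all(1.0));
    return kf;
}

// Per frame: predict first, then correct with the detector's (x, y) when
// the ball is found; the prediction also narrows the search window.
//   cv::Mat pred = kf.predict();
//   kf.correct((cv::Mat_<float>(2, 1) << detectedX, detectedY));
```

The prediction step is what makes the low-frame-rate problem above tractable: even when the ball jumps a metre between frames, the filter tells you roughly where to look.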