I am just being adventurous and taking my first baby step toward computer vision. I tried to implement the Hough transform on my own, but I just don't get the whole picture. I read the Wikipedia entry, and even the original paper, "Use of the Hough Transformation to Detect Lines and Curves in Pictures" by Richard Duda and Peter Hart, but it didn't help.
Can someone help explain it to me in friendlier language?
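It's more common to think of a line in rectangular coordinates, i.e. y = mx + b. As the Wikipedia article states, a line can also be expressed in polar form, r = x cos(theta) + y sin(theta), where theta is the angle of the line's normal and r is the line's perpendicular distance from the origin. The Hough transform exploits this change of representation: unlike the slope m, which becomes infinite for vertical lines, theta and r stay bounded for every possible line. (That is for lines, anyway; the same idea can also be applied to circles, ellipses, etc.)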
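To make the change of representation concrete, here is a minimal sketch (the function name `slope_intercept_to_polar` is hypothetical) that converts y = mx + b into the polar parameters. It rewrites the line as -m·x + y = b and divides by sqrt(1 + m²) so the coefficients become a unit normal (cos(theta), sin(theta)). It assumes the convention theta in [0, pi) with a signed r, which is one common choice among several.

```python
import math

def slope_intercept_to_polar(m, b):
    """Convert y = m*x + b to normal form x*cos(theta) + y*sin(theta) = r.

    -m*x + y = b, divided by sqrt(1 + m^2), has the unit normal
    (cos(theta), sin(theta)) = (-m, 1) / sqrt(1 + m^2).
    theta lands in (0, pi); r is signed (negative when b < 0).
    """
    norm = math.hypot(m, 1.0)       # sqrt(m^2 + 1)
    theta = math.atan2(1.0, -m)     # angle of the normal vector (-m, 1)
    r = b / norm                    # signed distance from the origin
    return theta, r

theta, r = slope_intercept_to_polar(2.0, 3.0)
# Every point on y = 2x + 3 satisfies the polar equation (up to rounding).
for x in (-5.0, 0.0, 1.5, 10.0):
    y = 2.0 * x + 3.0
    assert math.isclose(x * math.cos(theta) + y * math.sin(theta), r)
```

Note that a vertical line such as x = 4 has no slope-intercept form at all, yet in polar form it is simply theta = 0, r = 4.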
The first step in the Hough transform is to reduce the image to a set of edges. The Canny edge detector is a frequent choice. The resulting edge image serves as the input to the Hough process.
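Canny itself is more involved than fits here (in practice you would call a library routine such as OpenCV's `cv2.Canny`). The sketch below swaps in a much cruder technique, a plain Sobel gradient-magnitude threshold, only to show what "reduce the image to a set of edges" produces: a list of (x, y) pixel coordinates. The function name and the threshold value are hypothetical, and the image is assumed to be a list of rows of grayscale numbers.

```python
import math

def simple_edge_pixels(image, threshold):
    """Return (x, y) coordinates whose Sobel gradient magnitude exceeds threshold.

    NOT Canny: there is no smoothing, non-maximum suppression, or hysteresis.
    image is a list of rows of grayscale values; border pixels are skipped.
    """
    h, w = len(image), len(image[0])
    p = image
    edges = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column (Sobel weights 1, 2, 1).
            gx = (p[y-1][x+1] + 2 * p[y][x+1] + p[y+1][x+1]) - (p[y-1][x-1] + 2 * p[y][x-1] + p[y+1][x-1])
            # Vertical gradient: bottom row minus top row.
            gy = (p[y+1][x-1] + 2 * p[y+1][x] + p[y+1][x+1]) - (p[y-1][x-1] + 2 * p[y-1][x] + p[y-1][x+1])
            if math.hypot(gx, gy) > threshold:
                edges.append((x, y))
    return edges

# A dark-to-bright vertical boundary between columns 1 and 2.
image = [[0, 0, 255, 255, 255] for _ in range(5)]
print(simple_edge_pixels(image, 100))  # [(1, 1), (2, 1), (1, 2), (2, 2), (1, 3), (2, 3)]
```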
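To summarize, each "lit" pixel in the edge image is mapped into polar parameter space: for every direction theta there is exactly one distance r, namely r = x cos(theta) + y sin(theta), such that the line with those parameters passes through the pixel. So instead of being described by a single x and y, each edge pixel traces out a sinusoidal curve of candidate (theta, r) pairs. (The center of the image is commonly used as the reference point for this change of coordinates.)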
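A minimal sketch of this mapping for a single pixel, assuming (x, y) are already measured relative to the chosen reference point and that theta is sampled uniformly over [0, pi). The function name `candidate_lines` and the default of 180 samples are hypothetical choices for illustration.

```python
import math

def candidate_lines(x, y, num_thetas=180):
    """List every (theta, r) line through pixel (x, y), one per sampled theta.

    theta is the angle of the line's normal, sampled over [0, pi);
    r = x*cos(theta) + y*sin(theta) is the signed distance from the
    reference point to that line. Plotted, the r values form a sinusoid.
    """
    result = []
    for i in range(num_thetas):
        theta = math.pi * i / num_thetas
        r = x * math.cos(theta) + y * math.sin(theta)
        result.append((theta, r))
    return result

# Two pixels on the horizontal line y = 5 share the candidate theta = pi/2, r = 5.
a = candidate_lines(0, 5)[90]
b = candidate_lines(7, 5)[90]
```

That shared (theta, r) pair is exactly where the two pixels' curves intersect, which is what the counting step below looks for.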
The Hough transform is essentially a histogram. Edge pixels that vote for the same theta and r are assumed to lie on a common line in the image. To count these occurrences, theta and r are discretized (partitioned into a number of bins). Once every edge pixel has cast its votes, the bins are analyzed to determine the lines in the original image.
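A minimal sketch of the voting histogram (often called the accumulator), using a `Counter` keyed by bin indices rather than a dense 2D array. The bin layout (`num_thetas` equal theta bins over [0, pi), r rounded to multiples of `r_step`) and the function name are assumptions for illustration; pixel coordinates are again assumed to be relative to the reference point.

```python
import math
from collections import Counter

def hough_accumulator(edge_pixels, num_thetas=180, r_step=1.0):
    """Vote in a discretized (theta, r) histogram.

    Each edge pixel (x, y) casts one vote per theta bin, into the r bin
    round(r / r_step). Returns a Counter keyed by (theta_index, r_index).
    """
    votes = Counter()
    for x, y in edge_pixels:
        for t in range(num_thetas):
            theta = math.pi * t / num_thetas
            r = x * math.cos(theta) + y * math.sin(theta)
            votes[(t, round(r / r_step))] += 1
    return votes

# Ten edge pixels on the horizontal line y = 5.
pixels = [(x, 5) for x in range(10)]
votes = hough_accumulator(pixels)
print(votes[(90, 5)])  # 10: theta bin 90 is pi/2, i.e. a horizontal line at distance 5
```

With r rounded to whole units, several theta bins neighboring 90 also collect all 10 votes here, which is one reason the choice of bin size matters when picking peaks.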
It is common to keep the N most frequent parameter bins, or to threshold the counts so that bins with fewer than some n votes are ignored.
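Both selection strategies can be sketched over any mapping of bin to vote count. The function name and parameters (`n`, `min_votes`) are hypothetical; this does no non-maximum suppression, so adjacent bins belonging to the same line can all be returned.

```python
from collections import Counter

def strongest_lines(votes, n=None, min_votes=0):
    """Drop bins with fewer than min_votes votes, then keep the n largest (all if n is None)."""
    kept = [(params, count) for params, count in votes.items() if count >= min_votes]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept if n is None else kept[:n]

example = Counter({(0, 1): 5, (1, 2): 9, (2, 3): 2})
print(strongest_lines(example, n=1))         # [((1, 2), 9)]
print(strongest_lines(example, min_votes=5)) # [((1, 2), 9), ((0, 1), 5)]
```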
I'm not sure this answer is any better than the sources you originally presented - is there a particular point that you are stuck on?
Here's a very basic, visual explanation of how a Hough Transform works for detecting lines in an image:
The Hough transform is a way of finding the most likely values which represent a line (or a circle, or many other things).
You give the Hough transform a picture of a line as input. This picture will contain two types of pixels: ones which are part of the line, and ones which are part of the background.
For each pixel that is part of the line, all possible combinations of parameters are calculated. For example, if the pixel at coordinate (1, 100) is part of the line, it could lie on a line with gradient m = 0 and y-intercept c = 100. It could equally lie on m = 1, c = 99; or m = 2, c = 98; or m = 3, c = 97; and so on. Rearranging the line equation y = mx + c into c = y - mx gives the matching c for every choice of m.
Each pixel gives one vote to every (m, c) combination that could explain it. So if your line has 1000 pixels in it, the correct combination of m and c will receive 1000 votes.
The combination of m and c which has the most votes is what is returned as the parameters for the line.
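A minimal sketch of this (m, c) voting, assuming only integer slopes are tried (a real implementation would discretize m more finely); `vote_slope_intercept` is a hypothetical name. Because m grows without bound as lines approach vertical, this parameterization cannot cover every line, which is why the polar (theta, r) form is normally used instead.

```python
from collections import Counter

def vote_slope_intercept(pixels, slopes):
    """Each pixel votes once for every (m, c) with c = y - m*x, for m in slopes."""
    votes = Counter()
    for x, y in pixels:
        for m in slopes:
            votes[(m, y - m * x)] += 1
    return votes

# The pixel (1, 100) from the text votes for (0, 100), (1, 99), (2, 98), (3, 97).
single = vote_slope_intercept([(1, 100)], range(0, 4))

# Ten pixels on y = 2x + 1: only (m, c) = (2, 1) collects a vote from every pixel.
line_pixels = [(x, 2 * x + 1) for x in range(10)]
print(vote_slope_intercept(line_pixels, range(-5, 6)).most_common(1))  # [((2, 1), 10)]
```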
Here's another perspective (one used in the pilot episode of the TV show Numb3rs): imagine a fountain-like lawn sprinkler was somewhere on a lawn earlier, casting out water droplets around itself. Now the sprinkler is gone, but the drops remain. Imagine turning each drop into its own sprinkler, casting out droplets in all directions, because the drop doesn't know which direction it came from. This scatters a lot of water thinly over the ground, except at one spot where a whole lot of water lands from all the drops at once. That spot is where the original sprinkler was.
The application to (e.g.) line detection is similar. Each point in the image is one of the original droplets; when it acts as a sprinkler, it sends out its own droplets marking all of the lines that could pass through that point. Places where a whole lot of secondary droplets land represent the parameters of a line that passes through many image points. Voilà! Line detected!