Its just a basic question. I am fitting lines to scatter points using polyfit
.
I have some cases where my scatter points have same X values and polyfit
cant fit a line to it. There has to be something that can handle this situation. After all, its just a line fit.
I can try swapping X and Y and then fir a line. Any easier method because I have lots of sets of scatter points and want a general method to check lines.
Main goal is to find good-fit lines and drop non-linear features.
First of all, this happens due to the method of fitting that you are using. When doing polyfit
, you are using the least-squares method on Y
distance from the line.
(source: une.edu.au)
Obviously, it will not work for vertical lines. By the way, even when you have something close to vertical lines, you might get numerically unstable results.
There are 2 solutions:
- Swap x and y, as you said, if you know that the line is almost vertical. Afterwards, compute the inverse linear function.
- Use least-squares on perpendicular distance from the line, instead of vertical (See image below) (more explanation in here)
(from MathWorld - A Wolfram Web Resource: wolfram.com)
Polyfit uses linear ordinary least-squares approximation and will not allow repeated abscissa as the resulting Vandermonde matrix will be rank deficient. I would suggest trying to find something of a more statistical nature.
If you wish to research Andreys method it usually goes by the names Total least squares or Orthogonal distance regression http://en.wikipedia.org/wiki/Total_least_squares
I would tentatively also put forward the possibility of detecting when you have simultaneous x values, then rotating your data about the origin, fitting the line and then transform the line back. I could not say how poorly this would perform and only you could decide if it was an option based on your accuracy requirements.