Haskell ranges and floats

Posted 2019-01-02 16:33

Question:

Why is the behavior of the Haskell range notation different for floats than for integers and chars?

Prelude> [1, 3 .. 10] :: [Int]
[1,3,5,7,9] 
Prelude> [1, 3 .. 10] :: [Float]
[1.0,3.0,5.0,7.0,9.0,11.0]
Prelude> ['a', 'c' .. 'f']
"ace"

I could understand it if the last element were merely close to the upper bound, but this is obviously not a rounding issue.

Answer 1:

The syntax [e1, e2 .. e3] is really syntactic sugar for enumFromThenTo e1 e2 e3, which is a method of the Enum type class.
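A quick way to check the desugaring is to compare both spellings directly. This is only a small sketch; the binding names (intViaSugar, floatViaCall, and so on) are illustrative and not part of any library.

```haskell
-- The range literal and the explicit Enum method call should build the same list.
intViaSugar, intViaCall :: [Int]
intViaSugar = [1, 3 .. 10]
intViaCall  = enumFromThenTo 1 3 10

floatViaSugar, floatViaCall :: [Float]
floatViaSugar = [1, 3 .. 10]
floatViaCall  = enumFromThenTo 1 3 10

main :: IO ()
main = do
  print (intViaSugar == intViaCall)     -- True
  print (floatViaSugar == floatViaCall) -- True
  print floatViaCall                    -- [1.0,3.0,5.0,7.0,9.0,11.0]
```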

The Haskell standard defines its semantics as follows:

For the types Int and Integer, the enumeration functions have the following meaning:

  • The sequence enumFrom e1 is the list [e1,e1 + 1,e1 + 2,…].
  • The sequence enumFromThen e1 e2 is the list [e1,e1 + i,e1 + 2i,…], where the increment, i, is e2 − e1. The increment may be zero or negative. If the increment is zero, all the list elements are the same.
  • The sequence enumFromTo e1 e3 is the list [e1,e1 + 1,e1 + 2,…e3]. The list is empty if e1 > e3.
  • The sequence enumFromThenTo e1 e2 e3 is the list [e1,e1 + i,e1 + 2i,…e3], where the increment, i, is e2 − e1. If the increment is positive or zero, the list terminates when the next element would be greater than e3; the list is empty if e1 > e3. If the increment is negative, the list terminates when the next element would be less than e3; the list is empty if e1 < e3.
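The enumFromThenTo rules above can be sketched directly. This is an illustration of the Report's wording, not the Prelude's actual instance; intEnumFromThenTo is a hypothetical name.

```haskell
-- Sketch of the Report's enumFromThenTo semantics for integral types.
-- intEnumFromThenTo is a hypothetical name used only for illustration.
intEnumFromThenTo :: Integer -> Integer -> Integer -> [Integer]
intEnumFromThenTo e1 e2 e3
  | i >= 0    = takeWhile (<= e3) steps  -- positive or zero increment: stop once past e3
  | otherwise = takeWhile (>= e3) steps  -- negative increment: stop once below e3
  where
    i     = e2 - e1           -- the increment
    steps = iterate (+ i) e1  -- e1, e1 + i, e1 + 2i, ...

main :: IO ()
main = do
  print (intEnumFromThenTo 1 3 10)  -- [1,3,5,7,9]
  print (intEnumFromThenTo 10 7 0)  -- [10,7,4,1]
  print (intEnumFromThenTo 5 6 1)   -- []
```

With a zero increment and e1 <= e3 this yields an infinite list of e1, which matches the "all the list elements are the same" clause.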

This is pretty much what you'd expect, but the Float and Double instances are defined differently:

For Float and Double, the semantics of the enumFrom family is given by the rules for Int above, except that the list terminates when the elements become greater than e3 + i∕2 for positive increment i, or when they become less than e3 + i∕2 for negative i.
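Here is a sketch of that Float/Double rule, assuming the elements are e1 + k*i as in the Int rules and only the cut-off moves to e3 + i/2. floatEnumFromThenTo is a hypothetical name; GHC's real instance may compute the elements differently.

```haskell
-- Sketch of the Report's rule for Float and Double: same elements as the
-- integral version, but the cut-off is moved out to e3 + i/2.
-- floatEnumFromThenTo is a hypothetical name used only for illustration.
floatEnumFromThenTo :: (Ord a, Fractional a) => a -> a -> a -> [a]
floatEnumFromThenTo e1 e2 e3
  | i >= 0    = takeWhile (<= limit) steps
  | otherwise = takeWhile (>= limit) steps
  where
    i     = e2 - e1
    limit = e3 + i / 2
    -- Each element is e1 + k*i, computed from k rather than by repeated addition.
    steps = [e1 + fromIntegral k * i | k <- [0 :: Integer ..]]

main :: IO ()
main = do
  print (floatEnumFromThenTo 1 3 (10 :: Float))  -- [1.0,3.0,5.0,7.0,9.0,11.0]
  print (floatEnumFromThenTo 10 7 (2 :: Double)) -- [10.0,7.0,4.0,1.0]
```

Both calls overshoot the nominal bound (11 > 10, 1 < 2) by less than half a step, which is exactly the behavior the question observed.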

I'm not really sure what the justification for this is, so the only answer I can give is that it behaves this way because the standard defines it that way.

You can work around this by enumerating using integers and converting to Float afterward.

Prelude> map fromIntegral [1, 3 .. 10] :: [Float]
[1.0,3.0,5.0,7.0,9.0]
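The same workaround extends to non-integer steps: enumerate an integer index and map it to the floating-point value, so the element count comes from integer arithmetic rather than from a floating-point comparison. linspace below is a hypothetical helper, a sketch rather than a library function.

```haskell
-- Hypothetical helper: n evenly spaced points from a to b, inclusive.
-- The count comes from an Int range, so it never depends on rounding.
linspace :: Double -> Double -> Int -> [Double]
linspace a b n
  | n <= 0    = []
  | n == 1    = [a]
  | otherwise =
      [ (1 - t) * a + t * b  -- t = 0 gives exactly a, t = 1 gives exactly b
      | k <- [0 .. n - 1]
      , let t = fromIntegral k / fromIntegral (n - 1) ]

main :: IO ()
main = do
  print (length (linspace 0 1 11))  -- 11
  print (linspace 0 1 5)            -- [0.0,0.25,0.5,0.75,1.0]
```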


Answer 2:

OK, @Henning Makholm already said this in his comment, but he didn't explain why this is actually a better solution.

First thing to say: when dealing with floating point, we must always be aware of possible rounding errors. When we write [0.0, 0.1 .. 1.0], we must be aware that most of these numbers, apart from the first, will not lie exactly on tenths, because 0.1 has no exact binary representation. Where we need that kind of certainty, we must not use floats at all.
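To see this in practice, the sketch below compares each element of [0.0, 0.1 .. 1.0] with k/10 computed by a single division. Exactly which elements differ can depend on how the compiler implements enumFromThenTo for Double, so no particular output is promised here; the binding names are illustrative.

```haskell
-- Elements of a Double range versus the "intended" tenths k/10.
tenthsRange :: [Double]
tenthsRange = [0.0, 0.1 .. 1.0]

intended :: [Double]
intended = [fromIntegral k / 10 | k <- [0 .. 10 :: Int]]

-- Index, range element, and intended value wherever the two disagree.
mismatches :: [(Int, Double, Double)]
mismatches = [ (k, x, y) | (k, x, y) <- zip3 [0 ..] tenthsRange intended, x /= y ]

main :: IO ()
main = mapM_ print mismatches
```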

But of course there are many applications where we're content with reasonable certainty but need high speed. That's where floats are great. One possible application of such a list is a simple numerical integration using the trapezoidal rule:

trIntegrate :: (Enum a, Fractional a) => (a -> a) -> a -> a -> a -> a
trIntegrate f l r s = sum [ f x | x <- [l, l + s .. r] ] * s - (f l + f r) * s / 2

Let's test this:

trIntegrate (\x -> exp (x + cos (sqrt x - x*x))) 1.0 3.0 0.1
=> 25.797334337026466

Compared to the reference value of 25.9144, that is an error of less than one percent. Not exact, of course, but that's inherent to the integration method.
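For comparison, here is a sketch of the same composite trapezoidal rule in the spirit of Answer 1's workaround: the number of subintervals is an Int, so the sample points never depend on where a floating-point range stops. trapezoidN is a hypothetical name; with l = 1.0, r = 3.0 and s = 0.1, the corresponding n is 20.

```haskell
-- Sketch: composite trapezoidal rule with n subintervals (n must be positive).
-- trapezoidN is a hypothetical name, not part of any library.
trapezoidN :: (Double -> Double) -> Double -> Double -> Int -> Double
trapezoidN f a b n = h * (f a / 2 + sum interior + f b / 2)
  where
    h        = (b - a) / fromIntegral n
    -- Interior points a + k*h for k = 1 .. n-1, indexed by Int.
    interior = [f (a + fromIntegral k * h) | k <- [1 .. n - 1]]

main :: IO ()
main =
  -- Same integrand and interval as above, with 20 subintervals (step 0.1).
  print (trapezoidN (\x -> exp (x + cos (sqrt x - x * x))) 1.0 3.0 20)
```

The trapezoidal rule is exact for linear functions, which makes a handy sanity check: trapezoidN id 0 1 10 should be 0.5 up to rounding.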

Now suppose float ranges were defined to terminate as soon as an element crosses the right bound. Then it would be possible (though we can't be certain of it!) that only 20 values rather than 21 are summed, because the last value of x happens to come out as 3.000000something. We can simulate this:

bad_trIntegrate :: (Enum a, Fractional a) => (a -> a) -> a -> a -> a -> a
bad_trIntegrate f l r s = sum [ f x | x <- [l, l + s .. r - s] ] * s - (f l + f r) * s / 2

Then we get:

bad_trIntegrate (\x -> exp (x + cos (sqrt x - x*x))) 1.0 3.0 0.1
=> 21.27550564546988

Urgh!
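The fragility can also be observed directly by comparing the built-in range that trIntegrate builds for l = 1.0, s = 0.1, r = 3.0 with a hypothetical "strict" rule that stops as soon as an element exceeds the bound. strictRange is illustrative only, and whether it keeps the final point depends on how that element rounds on a given compiler, so no counts are promised for it.

```haskell
-- Hypothetical strict cut-off (positive step only): stop once an element exceeds e3.
strictRange :: Double -> Double -> Double -> [Double]
strictRange e1 e2 e3 = takeWhile (<= e3) [e1, e2 ..]

builtinPoints, strictPoints :: [Double]
builtinPoints = [1.0, 1.1 .. 3.0]        -- Report rule: cut-off at 3.0 + 0.05
strictPoints  = strictRange 1.0 1.1 3.0  -- may drop a last point that lands just above 3.0

main :: IO ()
main = do
  print (length builtinPoints, last builtinPoints)
  print (length strictPoints, last strictPoints)
```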

This has nothing to do with hiding the problems of floating point. It's just a way to help the programmer get around these problems more easily. In fact, the counterintuitive result of [1, 3 .. 10] :: [Float] helps us remember these problems!
