New state of the art in unlimited generation of Ha

2019-01-12 01:06发布

问题:

(this is exciting!) I know, the subject matter is well known. The state of the art (in Haskell as well as other languages) for efficient generation of unbounded increasing sequence of Hamming numbers, without duplicates and without omissions, has long been the following (AFAIK - and btw it is equivalent to the original Edsger Dijkstra's code too):

hamm :: [Integer]
hamm = 1 : map (2*) hamm `union` map (3*) hamm `union` map (5*) hamm
  where
    union a@(x:xs) b@(y:ys) = case compare x y of
        LT -> x : union  xs  b
        EQ -> x : union  xs  ys
        GT -> y : union  a   ys

The question I'm asking is, can you find the way to make it more efficient in any significant measure? Is it still the state of the art or is it in fact possible to improve this to run twice faster and with better empirical orders of growth to boot?

If your answer is yes, please show the code and discuss its speed and empirical orders of growth in comparison to the above (it runs at about ~ n^1.05 .. n^1.10 for first few hundreds of thousands of numbers produced). Also, if it exists, can this efficient algorithm be extended to producing a sequence of smooth numbers with any given set of primes?

回答1:

If a constant factor(1) speedup counts as significant, then I can offer a significantly more efficient version:

hamm :: [Integer]
hamm = mrg1 hamm3 (map (2*) hamm)
  where
    hamm5 = iterate (5*) 1
    hamm3 = mrg1 hamm5 (map (3*) hamm3)
    merge a@(x:xs) b@(y:ys)
        | x < y     = x : merge xs b
        | otherwise = y : merge a ys
    mrg1 (x:xs) ys = x : merge xs ys

You can easily generalise it to smooth numbers for a given set of primes:

hamm :: [Integer] -> [Integer]
hamm [] = [1]
hamm [p] = iterate (p*) 1
hamm ps = foldl' next (iterate (q*) 1) qs
  where
    (q:qs) = sortBy (flip compare) ps
    next prev m = let res = mrg1 prev (map (m*) res) in res
    merge a@(x:xs) b@(y:ys)
        | x < y     = x : merge xs b
        | otherwise = y : merge a ys
    mrg1 (x:xs) ys = x : merge xs ys

It's more efficient because that algorithm doesn't produce any duplicates and it uses less memory. In your version, when a Hamming number near h is produced, the part of the list between h/5 and h has to be in memory. In my version, only the part between h/2 and h of the full list, and the part between h/3 and h of the 3-5-list needs to be in memory. Since the 3-5-list is much sparser, and the density of k-smooth numbers decreases, those two list parts need much less memory that the larger part of the full list.

Some timings for the two algorithms to produce the kth Hamming number, with empirical complexity of each target relative to the previous, excluding and including GC time:

  k            Yours (MUT/GC)               Mine (MUT/GC)
 10^5           0.03/0.01                    0.01/0.01      -- too short to say much, really
2*10^5          0.07/0.02                    0.02/0.01
5*10^5          0.17/0.06  0.968  1.024      0.06/0.04      1.199    1.314
 10^6           0.36/0.13  1.082  1.091      0.11/0.10      0.874    1.070
2*10^6          0.77/0.27  1.097  1.086      0.21/0.21      0.933    1.000
5*10^6          1.96/0.71  1.020  1.029      0.55/0.59      1.051    1.090
 10^7           4.05/1.45  1.047  1.043      1.14/1.25      1.052    1.068
2*10^7          8.73/2.99  1.108  1.091      2.31/2.65      1.019    1.053
5*10^7         21.53/7.83  0.985  1.002      6.01/7.05      1.044    1.057
 10^8          45.83/16.79 1.090  1.093     12.42/15.26     1.047    1.084

As you can see, the factor between the MUT times is about 3.5, but the GC time is not much different.

(1) Well, it looks constant, and I think both variants have the same computational complexity, but I haven't pulled out pencil and paper to prove it, nor do I intend to.



回答2:

So basically, now that Daniel Fischer gave his answer, I can say that I came across this recently, and I think this is an exciting development, since the classical code was known for ages, since Dijkstra.

Daniel correctly identified the redundancy of the duplicates generation which must then be removed, in the classical version.

The credit for the original discovery (AFAIK) goes to Rosettacode.org's contributor Ledrug, as of 2012-08-26. And of course the independent discovery by Daniel Fischer, here (2012-09-18).

Re-written slightly, that code is:

import Data.Function (fix)

hamm = 1 : foldr (\n s -> fix (merge s . (n:) . map (n*))) [] [2,3,5]

with the usual implementation of merge,

merge a@(x:xs) b@(y:ys) | x < y     = x : merge xs b
                        | otherwise = y : merge a ys
merge [] b = b
merge a [] = a

It gives about 2.0x - 2.5x a speedup vs. the classical version.



回答3:

A question here How do you find the list of all numbers that are multiples of only powers of 2, 3, and 5? the-list-of-all-numbers-that-are-multiples-of-only-powers-of-2 Then, I am now not sure weather the question marked duplicate wants powers or multiples. I confuse way to easily.

Taking the numbers directly from the integers precludes sorting and merging. Taking the deltas of the list of multiples of 2,3 and 5, for the first few hundred, you notice that the pattern of deltas recurs every (22 out of) 30. You can use this pattern in a scanl and a cycle of the pattern.

take 77 $ scanl (\b a -> a+b) 2 $ cycle  [1,1,1,1,2,1,1,2,2,1,1,2,2,1,1,2,1,1,1,1,2,2]

This generates this list [2,3,4,5,6,8,9,10,12,14,15,16,18,20,21,22,24,25,26,27,28,30,32,33,34,35,36,38,39,40,42,44,45,46,48,50,51,52,54,55,56,57,58,60,62,63,64,65,66,68,69,70,72,74,75,76,78,80,81,82,84,85,86,87,88,90,92,93,94,95,96,98,99,100,102,104,105]

I use the inverse of this list that uses a very much shorter pattern because there are very fewer primes.



回答4:

@Will Ness I found that every new candidate number for Hamming status that can can be dived evenly by any of [2,3,5] has a quotient in the list of Hamming numbers being constructed. This does preclude merge-ing and unioning. It does not use any multiples or other factors. It only uses the Hamming list generated. But, it is still not fast.

I posted it on CodeReview and max taldykin showed me how Data.Set functions are way faster than elem but still no where near linear like the above.

I analyzed the 2,3,5 multiples of the Hamming list. If you take all the 2 multiples and all the odd multiples of the 3s, the only thing remaining of the 5s that are not duplicate are, for example, is the ratio of 15 in 6,103,515,625 which can be recursively calculated with base case of 1 and multiply the previous value by 5 for the next value.

The first parameter is a list of Hamming numbers less than 10, the second parameter is a base list from which to draw candidate numbers. The logic will run in a foldl which is also slow. A reverse list construction is faster search times because the candidate Hamming is at the end of the list to be added if its quotient is in the list.

-- foldl (\b a -> let d = divm a [2,3,5] in if elem d b then b++[a] else b) [1,2,3,4,5,6,8,9] [10..n]

hamx ls (x:xs) 
   | even x        && elem (div x 2) ls = hamx (x:ls) xs
   | mod  x 3 == 0 && elem (div x 3) ls = hamx (x:ls) xs
   | mod  x 5 == 0 && elem (div x 5) ls = hamx (x:ls) xs
   | null xs = ls
   | otherwise = hamx ls xs 


回答5:

Well I was unable to post my last answer to a similar question because is was tagged as duplicate because of this question.

So now, on to this question. I still think it best to traverse across an ordered set so as not to have to compensate for not.

I did find my no2s3s5s function useful for this, too. It is funny that no2s3s5s starts from 11. no2s3s5s is just counting in a particular way.

This is no2s3s5s

no2s3s5s = scanl (\b a -> a + b) 11 $ cycle [2,4,2,4,6,2,6,4]

It produces mostly primes, from 11 to 97 there is only 49, 77 and 91 that are not prime, the dreaded 7s. It is used in hamseq in lieu of primes. My bad.

hamseq n = [ b | b <- [1..n], all (>0) [mod b f | f <- take (div b 2) (7:no2s3s5s)]]

hamseq 729

produces [1,2,3,4,5,6,8,9,10,12,15,16,18,20,24,25,27,30,32,36,40,45,48,50,54,60,64,72,75,80,81,90,96,100,108,120,125,128,135,144,150,160,162,180,192,200,216,225,240,243,250,256,270,288,300,320,324,360,375,384,400,405,432,450,480,486,500,512,540,576,600,625,640,648,675,720,729]

@Will Ness, I am literal minded. I never thought about producing this list with parameter 77. My bad.

But, here

hamseq n = take n $ [ b | b <- [1..], all (>0) [mod b f | f <- take (div b 2) (7:no2s3s5s)]]

12/2/2018 Well again because of the Fundamental Theorem of Arithmetic (my original motivation for this function) I stopped using no2s3s5 in hamseq. There are many fewer values in a primes list. I have a really fast prime function but I have to dig it up. I did this one just earlier today. The hamseq function is way faster, now, not linear but faster.

prs n =takeWhile (<n)$[p|p <-no2s3s5s, all (>0) [mod p f|f <- take (ceiling.sqrt.fromIntegral$p) (2:3:5:7:no2s3s5s)]]
hamseq' n ls = [ b | b <- [1..n], all (>0) [mod b f | f <- take (div b 3) ls]]
hamseq n = hamseq' n (7:(prs n))

The (div b 3) is approximate. The optimum number is between 3 and 4. I'll find my way faster prime generator and replace this clunker.

I forgot this and it cuts the last time in half.

base n = 1:[ d | d <- [2..n], any (==0) $ [mod d f | f <- [2,3,5]]]
so
hamseq' n ls = [ b | b <- (base n), all (>0) [mod b f | f <- take (div b 3) ls]]

12/4/2018

Well, i gave up. It is just not possible to use a prime list to factor each x because the length of the list becomes to very large and contributes to lethargy.

The alternative is to successively factor each number by one of [2,3,5]. This means the last value calculated is used in the next calculations, that is recursive. Simple recursion in Haskell means fold/scan.

First is the divuntil (divide until) for the scanl in facs. It repeatedly divided a value by one of [2,3,5], whichever matches before and after each divide. This was split into two functions but now is merged and is faster.
Second is the facs function which uses scanl to generate quotients to re-divide. The irritating nature of div causes scanl to drill down to 1 for Hamming numbers. If the quotient does not result in 1 then the number is not Hamming. If the quotient does not divide by 2,3 or 5 the result is the null list. Finally an unnamed list comprehension to run facs repeatedly on a list of numbers. A list comprehension because it would otherwise require an if with no else, that is, no easy way to ignore or discard values.

divuntil = \b a -> let ff=[f | f <- [2,3,5], mod b f == 0] in
                       if null ff then 0 else div b (ff !! 0)

facs= \n -> last.takeWhile (>0) $ scanl divuntil n [1..]

take 200 $ [ b | b <- [1..], facs b == 1]

This code is ugly, I know. It can most certainly be improved upon. Probably because of the way I write Haskell. One line-at-a-time. My bad. I could probable break down and put this into a file of one function which would probably speed it up, too.

The speed up over my last function is well over 12 times I estimate. It's still not close to linear but faster.

facs can be used with any value to identify it as Hamming or not.

I start this with what I thought was an answer to another question, that is, all the multiples of [2,3,5]. I answered with this.

base = scanl (\b a -> a+b) 2 $ cycle [1,1,1,1,2,1,1,2,2,1,1,2,2,1,1,2,1,1,1,1,2,2]

It produces the multiples a sample of which is above. The recursive addition is again, counting. It is faster than recursive division of a single value. It makes the following faster.

take 600 $ [ b | b <- base, facs b == 1]

A take 600 is 18.84 seconds with [1..] on my office PC and 16.28 seconds with base.

Faster is more fun. Try

take 430 $ [ floor.log.fromIntegral $ b | b <- scbase, facs b == 1]

You'll can get counts of how many of each of [0..12] which are [1,4,8,10,14,21,27,34,44,51,59,72,82] and the deltas [1,3,4, 2,4,7, 6,7,10, 7,8,13, 10]

12/10/2018 As I suspected, one function is faster than many. The scanl is not so controllable. The div function can also be controlled. This function will analyze one number and issue a True (Hamming) or False (not Hamming). It is run in a list comprehension as the list comprehension can produce ans x based on a Boolean.

fac n
      | n < 7 = True
      | even n       = fac (div n 2)
      | mod n 3 == 0 = fac (div n 3)
      | mod n 5 == 0 = fac (div n 5)
      | otherwise = False

take 600 $ [ d | d <- base, fac d ] 9.57 seconds is the best I can do by checking each value. Next, if I have time, a more analytical approach.

12/21/2018

The speed-up is 10 fold from fac just above this but this is way more the test of a concept.

This is three functions. The first is b25 which replaces base from above. b25 produces all evens and 5s (mod 10).

b25 = scanl' (\b a -> (b+a)) 2 $ cycle [2,1,1,2,2,2]

What is interesting to me is the main function produces no 3-multiples.

Multiples of 3 are provided by the second function.

the3s = \n -> scanl' (\b a -> a*b) 1 $ replicate n 3

The fac2 function changes to process only evens or odds (5s).

In it n > 7 is superfluous. Maybe a case statement.

fac2 n
    | n <= 7 = True
    | n >  7 = if even n then fac (div n 2) else fac (div n 5)
    | otherwise = False

Run this like,

sort $ the3s 14 ++ [ d | d <- (take 150000 b25), fac2 d]

The speedup is about 10 fold. 600 Hammings took .92 of a second in GHCi.

I've gotten much use from fac above because you can add things to the list comprehension.

Indeed, processing 150,000 numbers to get 600 is crazy and the best way to get anywhere close to linear is to not process from a list.



回答6:

Well this was easier than I thought. This will do 1000 Hammings in 0.05 seconds on my slow PC at home. This afternoon at work and a faster PC times of less than 600 were coming out as zero seconds.

This take Hammings from Hammings. It's based on doing it fastest in Excel.

I was getting wrong numbers after 250000, with Int. The numbers grow very big very fast, so Integer must be used to be sure, because Int is bounded.

mkHamm :: [Integer] -> [Integer] -> [Integer] -> [Integer] 
       -> Int -> (Integer, [Int])
mkHamm ml (x:xs) (y:ys) (z:zs) n =  
   if n <= 1
       then (last ml, map length [(x:xs), (y:ys), (z:zs)])
       else mkHamm (ml++[m]) as bs cs (n-1)
     where
         m = minimum [x,y,z]
         as = if x == m then xs ++ [m*2] else (x:xs) ++ [m*2]
         bs = if y == m then ys ++ [m*3] else (y:ys) ++ [m*3]
         cs = if z == m then zs ++ [m*5] else (z:zs) ++ [m*5]

Testing,

> mkHamm  [1] [2] [3] [5] 5000
(50837316566580,[306,479,692])        -- (0.41 secs)

> mkHamm  [1] [2] [3] [5] 10000
(288325195312500000,[488,767,1109])   -- (1.79 secs)

> logBase 2 (1.79/0.41)     -- log of times ratio = 
2.1262637726461726          --   empirical order of growth

> map (logBase 2) [488/306, 767/479, 1109/692] :: [Float]
[0.6733495, 0.6792009, 0.68041545]     -- leftovers sizes ratios

This means that this code's run time's empirical order of growth is above quadratic (~n^2.13 as measured, interpreted, at GHCi prompt).

Also, the sizes of the three dangling overproduced segments of the sequence are each ~n^0.67 i.e. ~n^(2/3).

Additionally, this code is non-lazy: the resulting sequence's first element can only be accessed only after the very last one is calculated.

The state of the art code in the question is linear, overproduces exactly 0 elements past the point of interest, and is properly lazy: it starts producing its numbers immediately.

So, though an immense improvement over the previous answers by this poster, it is still significantly worse than the original, let alone its improvement as appearing in the top two answers.

12.31.2018

Only the very best people educate. @Will Ness also has authored or co-authored 19 chapters in GoalKicker.com “Haskell for Professionals”. The free book is a treasure.

I had carried around the idea of a function that would do this, like this. I was apprehensive because I thought it would be convoluted and involved logic like in some modern languages. I decided to start writing and was amazed how easy Haskell makes the realization of even bad ideas.

I had used a subsidiary list in fac2, here, with a significant performance increase. The problem was I could not include the 5 (mod 10)’s with the 3’s and 5's. Of course, such lists of mine are of unique multiples. I was playing with the 5 (mod 10)’s this last Saturday in Excel and found that when my threes list was used to start recursive 5 multiples, I would get the 5’s (mod 10), and the threes and all of them in one list. As a bonus from starting with 1 the few unique 5 multiples are also included.

Even the "unique 5 multiple columns" in Excel use values faster from the first column and none from later columns until needed. That is, uses them diagonally. I’ve not incorporated this into my functions but it will be easy.

Now, with all unique 3’s and 5’s all that remains is the 2’s

This function lent itself to doing it, though with some revision. There are three functions to generate the 3’s and 5’s. One is the threes because they are the initial value of the second function list which compiles the lists. The multiple lists are flattened with the list comprehension comp then sorted.

This is the file of functions.

import Data.List (sort)
threes n = scanl (\a b -> a*b) 1 $ replicate n 3
list n =   scanl (\a b -> a*b) n $ replicate 16 5
comp = sort $ [e*d|d<-threes 26, e <- list d]

ham2 = ham2' (tail comp) [1]

ham2' (f:fo) (h:hx) = if h == min h f
                      then h:ham2' (f:fo) (  hx ++ [h*2])
                      else f:ham2'    fo  (h:hx ++ [f*2])

To half the size of the mixed list of unique 3's and 5's, a diagonal calc. These replacement functions reduce each successive list by one. I took sort out of the newcomb function so it can be run with a small parameter to see the reductions of the lists. Each list starts with a 3^e which is multiplied by [5^e|e<-[0..]]

newcomb n = [ e*d |  (e,t) <- zip [3^e|e <- [0..div n 2]] [1..], 
                        d  <- take ((t+n)-(t*2)) [5^e|e <- [0..]] ]
ham2 = ham2' (tail.sort.newcomb $ 48) [1]

There does seem to be a small gain in performance with these last functions but it is not for me to say.

With the diagonal, I'm more comfortable with a large factor of 48 which will generate past 30,000 Hammings but not beyond 40,000.