Number crunching in Ruby (optimisation needed)

2019-07-22 05:22发布

Ruby may not be the optimal language for this but I'm sort of comfortable working with this in my terminal so that's what I'm going with.

I need to process the numbers from 1 to 666666 so I pin out all the numbers that contain 6 but doesn't contain 7, 8 or 9. The first number will be 6, the next 16, then 26 and so forth. Then I needed it printed like this (6=6) (16=6) (26=6) and when I have ranges like 60 to 66 I need it printed like (60 THRU 66=6) (SPSS syntax).

I have this code and it works but it's neither beautiful nor very efficient so how could I optimize it?

(silly code may follow)

class Array
  def to_ranges
    array = self.compact.uniq.sort
    ranges = []
    if !array.empty?
      # Initialize the left and right endpoints of the range
      left, right = array.first, nil
      array.each do |obj|
        # If the right endpoint is set and obj is not equal to right's successor 
        # then we need to create a range.
        if right && obj != right.succ
          ranges << Range.new(left,right)
          left = obj
        end
        right = obj
      end
      ranges << Range.new(left,right) unless left == right
    end
    ranges
  end
end

write = ""
numbers = (1..666666).to_a

# split each number in an array containing it's ciphers
numbers = numbers.map { |i| i.to_s.split(//) }

# delete the arrays that doesn't contain 6 and the ones that contains 6 but also 8, 7 and 9
numbers = numbers.delete_if { |i| !i.include?('6') }
numbers = numbers.delete_if { |i| i.include?('7') }
numbers = numbers.delete_if { |i| i.include?('8') }
numbers = numbers.delete_if { |i| i.include?('9') }

# join the ciphers back into the original numbers
numbers = numbers.map { |i| i.join }

numbers = numbers.map { |i| i = Integer(i) }

# rangify consecutive numbers
numbers = numbers.to_ranges

# edit the ranges that go from 1..1 into just 1
numbers = numbers.map do |i|
  if i.first == i.last
    i = i.first
  else
    i = i
  end
end

# string stuff
numbers = numbers.map { |i| i.to_s.gsub(".."," thru ") }

numbers = numbers.map { |i| "(" + i.to_s + "=6)"}

numbers.each { |i| write << " " + i }

File.open('numbers.txt','w') { |f| f.write(write) }

As I said it works for numbers even in the millions - but I'd like some advice on how to make prettier and more efficient.

10条回答
▲ chillily
2楼-- · 2019-07-22 05:47

Exploiting patterns in the numbers, you can short-circuit lots of the loops, like this:

If you define a prefix as the 100s place and everything before it, and define the suffix as everything in the 10s and 1s place, then, looping through each possible prefix:

  • If the prefix is blank (i.e. you're testing 0-99), then there are 13 possible matches
  • elsif the prefix contains a 7, 8, or 9, there are no possible matches.
  • elsif the prefix contains a 6, there are 49 possible matches (a 7x7 grid)
  • else, there are 13 possible matches. (see the image below)

enter image description here

(the code doesn't yet exclude numbers that aren't specifically in the range, but it's pretty close)

number_range = (1..666_666)
prefix_range = ((number_range.first / 100)..(number_range.last / 100))

for p in prefix_range
  ps = p.to_s

  # TODO: if p == prefix_range.last or p == prefix_range.first,
  # TODO: test to see if number_range.include?("#{ps}6".to_i), etc...

  if ps == '0'
    puts "(6=6) (16=6) (26=6) (36=6) (46=6) (56=6) (60 thru 66) "

  elsif ps =~ /7|8|9/
    # there are no candidate suffixes if the prefix contains 7, 8, or 9.

  elsif ps =~ /6/
    # If the prefix contains a 6, then there are 49 candidate suffixes
    for i in (0..6)
      print "(#{ps}#{i}0 thru #{ps}#{i}6) "
    end
    puts

  else
    # If the prefix doesn't contain 6, 7, 8, or 9, then there are only 13 candidate suffixes.
    puts "(#{ps}06=6) (#{ps}16=6) (#{ps}26=6) (#{ps}36=6) (#{ps}46=6) (#{ps}56=6) (#{ps}60 thru #{ps}66) "

  end
end

Which prints out the following:

(6=6) (16=6) (26=6) (36=6) (46=6) (56=6) (60 thru 66) 
(106=6) (116=6) (126=6) (136=6) (146=6) (156=6) (160 thru 166) 
(206=6) (216=6) (226=6) (236=6) (246=6) (256=6) (260 thru 266) 
(306=6) (316=6) (326=6) (336=6) (346=6) (356=6) (360 thru 366) 
(406=6) (416=6) (426=6) (436=6) (446=6) (456=6) (460 thru 466) 
(506=6) (516=6) (526=6) (536=6) (546=6) (556=6) (560 thru 566) 
(600 thru 606) (610 thru 616) (620 thru 626) (630 thru 636) (640 thru 646) (650 thru 656) (660 thru 666) 
(1006=6) (1016=6) (1026=6) (1036=6) (1046=6) (1056=6) (1060 thru 1066) 
(1106=6) (1116=6) (1126=6) (1136=6) (1146=6) (1156=6) (1160 thru 1166) 
(1206=6) (1216=6) (1226=6) (1236=6) (1246=6) (1256=6) (1260 thru 1266) 
(1306=6) (1316=6) (1326=6) (1336=6) (1346=6) (1356=6) (1360 thru 1366) 
(1406=6) (1416=6) (1426=6) (1436=6) (1446=6) (1456=6) (1460 thru 1466) 
(1506=6) (1516=6) (1526=6) (1536=6) (1546=6) (1556=6) (1560 thru 1566) 
(1600 thru 1606) (1610 thru 1616) (1620 thru 1626) (1630 thru 1636) (1640 thru 1646) (1650 thru 1656) (1660 thru 1666) 

etc...

查看更多
你好瞎i
3楼-- · 2019-07-22 05:49

My first answer was trying to be too clever. Here is a much simpler version

class MutablePrintingCandidateRange < Struct.new(:first, :last)
  def to_s
    if self.first == nil and self.last == nil
      ''
    elsif self.first == self.last
      "(#{self.first}=6)"
    else
      "(#{self.first} thru #{self.last})"
    end
  end

  def <<(x)
    if self.first == nil and self.last == nil
      self.first = self.last = x
    elsif self.last == x - 1
      self.last = x
    else
      puts(self) # print the candidates
      self.first = self.last = x # reset the range
    end
  end

end

and how to use it:

numer_range = (1..666_666)
current_range = MutablePrintingCandidateRange.new

for i in numer_range
  candidate = i.to_s

  if candidate =~ /6/ and candidate !~ /7|8|9/
    # number contains a 6, but not a 7, 8, or 9
    current_range << i
  end
end
puts current_range
查看更多
Ridiculous、
4楼-- · 2019-07-22 05:57

Basic observation: If the current number is (say) 1900 you know that you can safely skip up to at least 2000...

查看更多
贼婆χ
5楼-- · 2019-07-22 06:00

I came up with this piece of code, which I tried to keep more or less in FP-styling. Probably not much more efficient (as it has been said, with basic number logic you will be able to increase performance, for example by skipping from 19xx to 2000 directly, but that I will leave up to you :)

def check(n)
    n = n.to_s
    n.include?('6') and
    not n.include?('7') and
    not n.include?('8') and
    not n.include?('9')
end

def spss(ranges)
  ranges.each do |range|
    if range.first === range.last
      puts "(" + range.first.to_s + "=6)"
    else
      puts "(" + range.first.to_s + " THRU " + range.last.to_s + "=6)"
    end
  end
end

range = (1..666666)

range = range.select { |n| check(n) }

range = range.inject([0..0]) do |ranges, n|
  temp = ranges.last
  if temp.last + 1 === n
    ranges.pop
    ranges.push(temp.first..n)
  else
    ranges.push(n..n)
  end
end

spss(range)
查看更多
做自己的国王
6楼-- · 2019-07-22 06:01

My answer below is not complete, but just to show a path (I might come back and continue the answer):

There are only two cases:

1) All the digits besides the lowest one is either absent or not 6

6, 16, ...

2) At least one digit besides the lowest one includes 6

60--66, 160--166, 600--606, ...

Cases in (1) do not include any continuous numbers because they all have 6 in the lowest digit, and are different from one another. Cases in (2) all appear as continuous ranges where the lowest digit continues from 0 to 6. Any single continuation in (2) is not continuous with another one in (2) or with anything from (1) because a number one less than xxxxx0 will be xxxxy9, and a number one more than xxxxxx6 will be xxxxxx7, and hence be excluded.

Therefore, the question reduces to the following:

3)

Get all strings between "" to "66666" that do not include "6"
For each of them ("xxx"), output the string "(xxx6=6)"

4)

Get all strings between "" to "66666" that include at least one "6"
For each of them ("xxx"), output the string "(xxx0 THRU xxx6=6)"

查看更多
叛逆
7楼-- · 2019-07-22 06:06

Note I don't speak ruby, but I intend to dohave done a ruby version later just for speed comparison :)


If you just iterate all numbers from 0 to 117648 (ruby <<< 'print "666666".to_i(7)') and print them in base-7 notation, you'll at least have discarded any numbers containing 7,8,9. This includes the optimization suggestion by MrE, apart from lifting the problem to simple int arithmetic instead of char-sequence manipulations.

All that remains, is to check for the presence of at least one 6. This would make the algorithm skip at most 6 items in a row, so I deem it less unimportant (the average number of skippable items on the total range is 40%).

Simple benchmark to 6666666666

(Note that this means outputting 222,009,073 (222M) lines of 6-y numbers)

Staying close to this idea, I wrote this quite highly optimized C code (I don't speak ruby) to demonstrate the idea. I ran it to 282475248 (congruent to 6666666666 (mod 7)) so it was more of a benchmark to measure: 0m26.5s

#include <stdio.h>

static char buf[11]; 
char* const bufend = buf+10;

char* genbase7(int n)
{
    char* it = bufend; int has6 = 0;
    do
    { 
        has6 |= 6 == (*--it = n%7); 
        n/=7; 
    } while(n);

    return has6? it : 0;
}

void asciify(char* rawdigits)
{
    do { *rawdigits += '0'; } 
    while (++rawdigits != bufend);
}

int main()
{
    *bufend = 0; // init

    long i;
    for (i=6; i<=282475248; i++)
    {
        char* b7 = genbase7(i);
        if (b7)
        {
            asciify(b7);
            puts(b7);
        }
    }
}

I also benchmarked another approach, which unsurprisingly ran in less than half the time because

  • this version directly manipulates the results in ascii string form, ready for display
  • this version shortcuts the has6 flag for deeper recursion levels
  • this version also optimizes the 'twiddling' of the last digit when it is required to be '6'
  • the code is simply shorter...

Running time: 0m12.8s

#include <stdio.h>
#include <string.h>

inline void recursive_permute2(char* const b, char* const m, char* const e, int has6)
{
    if (m<e)
        for (*m = '0'; *m<'7'; (*m)++)
            recursive_permute2(b, m+1, e, has6 || (*m=='6'));
    else
        if (has6)
            for (*e = '0'; *e<'7'; (*e)++)
                puts(b);
        else /* optimize for last digit must be 6 */
            puts((*e='6', b));
}

inline void recursive_permute(char* const b, char* const e)
{
    recursive_permute2(b, b, e-1, 0);
}

int main()
{
    char buf[] = "0000000000"; 
    recursive_permute(buf, buf+sizeof(buf)/sizeof(*buf)-1);
}

Benchmarks measured with:

gcc -O4 t6.c -o t6
time ./t6 > /dev/null
查看更多
登录 后发表回答