What's the fastest/most efficient way to count

Posted 2019-02-09 09:28

Question:

Given a string (string), what is the fastest/most efficient way to count the lines in it? I will accept the best answers for any flavour of Rebol. I've been working under the assumption that the parse [some [thru]] combination is the fastest way to traverse a string, but I don't know that for certain, hence turning to SO:

count-lines: func [string [string!] /local count][
    parse/all string [
        (count: 1) some [thru newline (count: count + 1)]
    ]
    count
]

Or:

count-lines: func [string [string!] /local count][
    count: 0
    until [
        count: count + 1
        not string: find/tail string newline
    ]
    count
]

And what about counters? How efficient is repeat?

count-lines: func [string [string!]][
    repeat count length? string [
        unless string: find/tail string newline [
            break/return count
        ]
    ]
]

Update: the line count follows the text editor principle: an empty document still has a line count of one. So:

>> count-lines ""
== 1
>> count-lines "^/"
== 2

Answer 1:

count-lines: func [
    str
    /local sort-str ][
sort-str: sort join str "^/"
1 + subtract index? find/last sort-str "^/" index? find sort-str "^/"
]


Answer 2:

Enhanced PARSE version, as suggested by BrianH:

i: 1 ; start at one, as TextMate does
parse text [any [thru newline (++ i)]]
print i


Answer 3:

Here's the best simple non-parse version I can think of:

count-lines: function [text [string!]] [
    i: 1
    find-all text newline [++ i]
    i
]

It uses function and ++ from more recent versions of Rebol, and find-all from either R3 or R2/Forward. You could look at the source of find-all and inline what you find and optimize, but situations like this are exactly what we wrote find-all for, so why not use it?



Answer 4:

Here is what works best for me:

temp: read/lines %mytext.txt
length? temp


Answer 5:

remove-each can be fast, as it is native:

s: "1^/2^/3"
a: length? s
print a - length? remove-each v s [v = #"^/"]
; >> 2

Or as a function:

>> f: func [s] [print [(length? s) - (length? remove-each v s [v = #"^/"])]]
>> f "1^/2^/3"
== 2


Answer 6:

I wonder why no one came up with the simplest solution :)

t: "abc^/de^/f^/ghi"
i: 0 until [i: i + 1 not t: find/tail t newline] i
== 4

I'm not sure about the performance, but I think it's quite fast, as UNTIL and FIND are natives. WHILE could be used as well:

i: 1 while [t: find/tail t newline] [i: i + 1] i
== 4

You just need to check for an empty string. And if this were a function, the argument series would need to be HEADed.
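To make the last point concrete, here is a minimal sketch of the WHILE version wrapped in a function. The name count-lines and the argument name text are taken from the question and other answers, not from this one, and the call to head assumes you always want to count from the start of the series, regardless of the position passed in.

```rebol
count-lines: func [text [string!] /local i] [
    text: head text      ; assumption: always count from the head of the series
    i: 1                 ; an empty string still counts as one line
    ; find/tail returns the position just past each newline, or none when
    ; there are no more, which ends the loop
    while [text: find/tail text newline] [i: i + 1]
    i
]

print count-lines ""                    ; prints 1
print count-lines "^/"                  ; prints 2
print count-lines "abc^/de^/f^/ghi"     ; prints 4
```

Assigning to text inside the function only moves the local reference along the series; the caller's string is not modified.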



Answer 7:

Not the most efficient, but probably one of the fastest solutions (in any case, if a benchmark is run, I would like to see how this one performs):

>> s: "1^/2^/ ^/^/3"
>> (length? s) - length? trim/with copy s newline
== 4
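For anyone who wants to run that benchmark, a rough timing sketch could look like the following. The helper time-it is hypothetical (it is not part of Rebol or of this thread), the iteration count is arbitrary, and the test string is far too short to say anything about large inputs, so treat the numbers as indicative only.

```rebol
; time-it is a hypothetical helper: it evaluates a block n times and
; returns the elapsed wall-clock time as a time! value.
time-it: func [n [integer!] code [block!] /local start] [
    start: now/precise
    loop n code
    difference now/precise start
]

s: "1^/2^/ ^/^/3"

; trim/with on a copy, as in this answer
print ["trim/with:" time-it 100000 [(length? s) - length? trim/with copy s newline]]

; find/tail loop, as in the previous answer
print ["find/tail:" time-it 100000 [t: s i: 1 while [t: find/tail t newline] [i: i + 1]]]
```

Swapping in a longer string (for example, one read from a real file) would give a more meaningful comparison than this five-line sample.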


Answer 8:

I don't know about the performance, or how it handles the last-line rule (R3).

>> length? parse "1^/2^/3" "^/"
== 3


Answer 9:

Hehe, the read/lines and length? temp approach is a great one. I had thought about read/lines followed by foreach line temp [count: count + 1].

Another way to do it would be:

temp: "line 1 ^M line2 ^M  line3 ^M "
length? parse temp newline
; parse cuts the string into a block of strings, each representing
; a line: ["line 1" "line2" "line3"]
; then length? counts how many strings are in the block

I like coding in Rebol; it is so much fun.

Edit: I didn't read the whole post, so my solution had already been proposed in a different form.

OK, to atone for the sin of posting an already-posted solution, I will add a comment on an unexpected behavior of that solution: multiple chained carriage returns are not counted (using Rebol 3 on Linux):

>> a: "line1 ^M line2 ^M line3 ^M^M"
== "line1 ^M line2 ^M line3 ^M^M"

>> length? parse a newline 
== 3