Erlang sublist function performance

2019-05-25 07:04发布

I'm reading http://learnyousomeerlang.com/ which includes a tail recursive sublist function that reverses the list to preserve order. I have written an alternative that does not require the reverse call. Is mine more efficient (it is certainly more verbose, but I'm not concerned with that), or have I overlooked something?

-module(sublist).

-export([sublist/2,sublistR/2]).

-include_lib("eunit/include/eunit.hrl").

sublist(_,0) ->
    [];
sublist([],_) ->
    [];
sublist(List,Length) ->
    sublist([hd(List)], tl(List), Length-1).

sublist(Acc,[],_) ->
    Acc;
sublist(Acc,_,0) ->
    Acc;
sublist(Acc,Tail,Length) ->
    sublist(Acc++[hd(Tail)], tl(Tail), Length-1).

sublistR(L, N) -> lists:reverse(sublistR(L, N, [])).

sublistR(_, 0, SubList) -> SubList;
sublistR([], _, SubList) -> SubList;
sublistR([H|T], N, SubList) when N > 0 ->
    sublistR(T, N-1, [H|SubList]).

sublist_test() ->
    sublisttest(fun sublist:sublist/2),
    sublisttest(fun sublist:sublistR/2).

sublisttest(SublistFunc) ->
    [] = SublistFunc([],10),
    [] = SublistFunc([1,2,3], 0),
    [1,2,3] = SublistFunc([1,2,3],3),
    [1,2,3] = SublistFunc([1,2,3],4),
    [1,2] = SublistFunc([1,2,3],2).

标签: erlang
1条回答
Root(大扎)
2楼-- · 2019-05-25 07:22

No. But don't feel bad, this is a common thing everyone has to come to terms with early on.

Its all about this statement:

Acc ++ [hd(Tail)]

Let's say Acc = [1,2,3] is true. Then Acc ++ [hd(Tail)] is directly equivalent to [1,2,3 | [Head]]. What does this mean?

In this particular case it means it is exactly the same as writing out every element in Acc as the car and using the result of [hd(Tail)] as the cdr in a new cons. This means to join a single element to the end of an existing list, the entire list must be traversed (for deconstruction, to expose the final cdr). On the other hand, adding a single element to the beginning of a list is a simple operation.

The magic in the use of lists:reverse/1 (and the entire reason this is idiomatic in languages that use cons style lists) is that it is an array operation implemented in C, not an expansion in the object language you're writing right then.

This issue of access to the last element being expensive VS access to the first element is the reason hd/1 (or head() in other languages) returns the first element of a list, but tl/1 (or tail() in other languages) returns everything but the first element of a list. Uses of a function guaranteed to access the last element of a list are usually surrounded by warnings that "this is slow!".

This is also why you will very rarely see anyone use a foldr in almost any language. If they really need to fold from the right (say, because of a non-commutative operation) they will usually do lists:reverse/1 first, then call foldl. foldl can be tail-recursive, but a true foldr cannot.

Actually, the Erlang documentation has a mention of this: http://www.erlang.org/doc/man/lists.html#foldl-3

The cost of folding from the right increases as the length of the list increases. That is why this happens:

1> AMillionThings = lists:seq(1, 1000000).
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
 23,24,25,26,27,28,29|...]
2> OneThing = [0].
[0]
3> timer:tc(fun() -> OneThing ++ AMillionThings end).
{13,
 [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
  23,24,25,26|...]}
4> timer:tc(fun() -> AMillionThings ++ OneThing end).
{31291,
 [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
  23,24,25,26,27|...]}

13μs vs 31291μs. Ouch.

Also see:

查看更多
登录 后发表回答