When pattern matching maps in Erlang, why is this

2019-02-17 15:44发布

问题:

-module(count).
-export([count/1]).

count(L) when is_list(L) ->
  do_count(L, #{});
count(_) ->
  error(badarg).

do_count([], Acc) -> Acc;
do_count([H|T], #{}) -> do_count(T, #{ H => 1 });
do_count([H|T], Acc = #{ H := C }) -> do_count(T, Acc#{ H := C + 1});
do_count([H|T], Acc) -> do_count(T, Acc#{ H => 1 }).

In this example, the third clause where the map key "H" exists and has a count associated with it, will not compile. The compiler complains:

count.erl:11: variable 'H' is unbound    

Why is H unbound?

This works by the way:

do_count([], Acc) -> Acc;
do_count([H|T], Acc) -> do_count(T, maps:update_with(H, fun(C) -> C + 1 end, 1, Acc)).

But it seems like the pattern match ought to work and it doesn't.

回答1:

The answer is pretty much the same as the one I recently gave here: https://stackoverflow.com/a/46268109/240949.

When you use the same variable multiple times in a pattern, as with H in this case:

do_count([H|T], Acc = #{ H := C }) -> ...

the semantics of pattern matching in Erlang say that this is as if you had written

do_count([H|T], Acc = #{ H1 := C }) when H1 =:= H -> ...

that is, they are first bound separately, then compared for equality. But a key in a map pattern needs to be known - it can't be a variable like H1, hence the error (exactly as for field size specifiers in binary patterns, in the answer I linked to).

The main difference in this question is that you have a function head with two separate arguments, and you might think that the pattern [H|T] should be matched first, binding H before the second pattern is tried, but there is no such ordering guarantee; it's just as if you had used a single argument with a tuple pattern {[H|T], #{ H := C }}.



回答2:

Because that kind of match occurs out of context for unification. In fact though it doesn't explicitly forbid this in the docs, the docs do explicitly state only that matches with literals will work in function heads. I believe that there is an effort under way to make this construction work, but not yet.

The issues surrounding unification VS assignment in different contexts within function heads is related to another question about matching internal size values within binaries in function heads that came up the other day.

(Remember, the function head is not just doing assignment, it is also trying to efficiently pick a path of execution. So this isn't actually a straightforward issue.)

All that said, a more Erlangish (and simpler) version of your count/1 function could be:

count(Items) ->
    count(Items, #{}).

count([], A) ->
    A;
count([H | T], A) ->
   NewA = maps:update_with(H, fun(V) -> V + 1 end, 1, A),
   count(T, NewA).

The case you are writing against was forseen by the stdlib, and we have a nifty solution in the maps module called maps:update_with/4.

Note that we didn't name count/2 a new name. Unless necessary in the program, it is usually easier to name a helper function with a different arity the same thing when doing explicit recursion. A function's identity is Name/Arity, so these are two totally separate functions whether or not the label is the same. Also, notice that we didn't check the argument type because we have an explicit match in count/2 that can only ever match a list and so will throw a bad_arg exception anyway.

Sometimes you will want polymorphic arguments in Erlang, and typechecking is appropriate. You almost never want defensive code in Erlang, though.

Session with a module called foo:

1> c(foo).
{ok,foo}
2> foo:count([1,3,2,4,4,2,2,2,4,4,1,2]).
#{1 => 2,2 => 5,3 => 1,4 => 4}

BUT

We want to avoid explicit recursion unless there is a call for it, as we have all these nifty listy functional abstractions laying about in the stdlib. What you are really doing is trying to condense a list of values into an arbitrarily aggregated single value and that is by definition a fold. So we could rewrite the above perhaps more idiomatically as:

count2(Items) ->
    Count = fun(I, A) -> maps:update_with(I, fun(V) -> V + 1 end, 1, A) end,
    lists:foldl(Count, #{}, Items).

And we get:

3> foo:count2([1,3,2,4,4,2,2,2,4,4,1,2]).
#{1 => 2,2 => 5,3 => 1,4 => 4}

Regarding case...

What I wrote about unification in a function head holds -- for function heads because they are a completely blank unification context. Richard's answer provides just the best shorthand for remembering why this is crazy:

f(A, #{A := _})

is equivalent to

f(A, #{B := _}) when B =:= A

And that's just not going to fly. His comparison to tuple matching is spot on.

...but...

In a case where the primary objects have already been assigned this all works just fine. Because, as Richard helpfully mentioned in a comment, there is only one A in the case below.

1> M = #{1 => "one", 2 => "two"}.
#{1 => "one",2 => "two"}
2> F = 
2>   fun(A) ->
2>     case M of
2>       #{A := B} -> B;
2>       _         -> "Oh noes! Not a key!"
2>     end
2>   end.
#Fun<erl_eval.6.87737649>
3> F(1).
"one"
4> F(2).
"two"
5> F(3).
"Oh noes! Not a key!"

So that may feel a bit idiosyncratic, but it makes sense based on the rules of matching/unification. And means you can write your do_count/2 the way you did above using a case inside of a function, but not as a set of function heads.



回答3:

I made up this rule for myself: when using maps in the head of a function clause, the order of matching is not guaranteed. As a result, in your example you can't count on a [H|T] match to provide a value for H.

Several features of maps look like they should work, and Joe Armstrong says they should work, but they don't. It's a dumb part of erlang. Witness my incredulity here: https://bugs.erlang.org/browse/ERL-88

Simpler examples:

do_stuff(X, [X|Y]) ->
    io:format("~w~n", [Y]).

test() ->
    do_stuff(a, [a,b,c]).

4> c(x).
{ok,x}

5> x:test().
[b,c]
ok

But:

-module(x).
-compile(export_all).

do_stuff(X, #{X := Y}) ->
    io:format("~w~n", [Y]).

test() ->
    do_stuff(a, #{a => 3}).


8> c(x).
x.erl:4: variable 'X' is unbound