GC performance in Erlang

Posted 2020-07-23 03:10

Question:

I've started programming in Erlang recently and there are a few things I want to understand regarding GC. As far as I understand, there is a generational GC for the private heap of each process and a reference-counting GC for the global shared heap. What I would like to know is whether there is any way to find out:

  1. How many collection cycles have run?
  2. How many bytes are allocated and deallocated, on a global level or process level?
  3. What are the private heap and shared heap sizes? And can these be set as GC parameters?
  4. How long does it take to collect garbage? The % of time needed?
  5. Is there a way to run a program without GC?

Is there a way to get this kind of information, either with code or using some command when I run an Erlang program?

Thanks.

Answer 1:

  1. To get information for a single process, you can call erlang:process_info(Pid). This will yield (as of Erlang 18.0) the following fields:

    > erlang:process_info(self()).
    [{current_function,{erl_eval,do_apply,6}},
     {initial_call,{erlang,apply,2}},
     {status,running},
     {message_queue_len,0},
     {messages,[]},
     {links,[<0.27.0>]},
     {dictionary,[]},
     {trap_exit,false},
     {error_handler,error_handler},
     {priority,normal},
     {group_leader,<0.26.0>},
     {total_heap_size,4184},
     {heap_size,2586},
     {stack_size,24},
     {reductions,3707},
     {garbage_collection,[{min_bin_vheap_size,46422},
                          {min_heap_size,233},
                          {fullsweep_after,65535},
                          {minor_gcs,7}]},
     {suspending,[]}]
    

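    The number of collection cycles for the process is available in the minor_gcs field of the garbage_collection section. Note that this counter appears to cover only the minor collections since the last full sweep, so it should not be read as a lifetime total.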
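
    To read just that counter rather than the whole process info list, a small helper along these lines could be used. This is a minimal sketch: the module name gc_info and the function minor_gcs/1 are made up for illustration, and only erlang:process_info/2 and proplists:get_value/2 are standard calls.

    ```erlang
    -module(gc_info).
    -export([minor_gcs/1]).

    %% Returns the minor_gcs counter of a local process, or undefined
    %% if the process is no longer alive or the field is missing.
    %% (gc_info and minor_gcs/1 are hypothetical names.)
    minor_gcs(Pid) ->
        case erlang:process_info(Pid, garbage_collection) of
            {garbage_collection, Info} ->
                proplists:get_value(minor_gcs, Info);
            undefined ->
                undefined
        end.
    ```

    For example, gc_info:minor_gcs(self()) in the shell returns the counter for the shell process.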

  2. Per Process

    The current heap size for the process is available in the heap_size field of the results above. It is given in words; a word is 4 bytes on a 32-bit VM and 8 bytes on a 64-bit VM. The total memory consumption of the process, in bytes, can be obtained by calling erlang:process_info(Pid, memory), which returns for example {memory,34312} for the above process. This includes the call stack, the heap and internal structures.

    Deallocations (and allocations) can be traced using erlang:trace/3. If the trace flag is garbage_collection, you will receive messages of the form {trace, Pid, gc_start, Info} and {trace, Pid, gc_end, Info} (as of Erlang 18.0; later releases use gc_minor_start, gc_minor_end, gc_major_start and gc_major_end instead). The Info field of the gc_start message contains such things as heap_size and old_heap_size.
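
    As a rough illustration of that tracing, the sketch below runs a fun in a fresh process with the garbage_collection flag set and returns the tags of the GC trace messages it produced. The module name gc_trace and the go/done/stop handshake are assumptions made for this example; erlang:trace/3 and erlang:trace_delivered/1 are the standard BIFs. If the fun crashes, run/1 will block, so treat it as a starting point only.

    ```erlang
    -module(gc_trace).
    -export([run/1]).

    %% Runs Fun in a new process traced for garbage collection and
    %% returns the GC trace tags in the order they were received.
    %% (gc_trace and the go/done/stop protocol are hypothetical.)
    run(Fun) ->
        Parent = self(),
        Pid = spawn(fun() ->
                        receive go -> ok end,
                        Fun(),
                        Parent ! {done, self()},
                        receive stop -> ok end
                    end),
        %% The calling process becomes the tracer by default.
        1 = erlang:trace(Pid, true, [garbage_collection]),
        Pid ! go,
        receive {done, Pid} -> ok end,
        %% Wait until the trace messages generated so far have arrived.
        Ref = erlang:trace_delivered(Pid),
        receive {trace_delivered, Pid, Ref} -> ok end,
        Pid ! stop,
        collect(Pid, []).

    %% Drains the GC trace messages for Pid from the mailbox.
    collect(Pid, Acc) ->
        receive
            {trace, Pid, Tag, _Info} -> collect(Pid, [Tag | Acc])
        after 0 ->
            lists:reverse(Acc)
        end.
    ```

    Calling gc_trace:run(fun() -> erlang:garbage_collect() end) should return at least one start tag and one end tag; the exact tag names depend on the Erlang release.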

    Per System

    Top level statistics of the system can be obtained by erlang:memory/0:

    > erlang:memory().
    [{total,15023008},
     {processes,4215272},
     {processes_used,4215048},
     {system,10807736},
     {atom,202481},
     {atom_used,187597},
     {binary,325816},
     {code,4575293},
     {ets,234816}]
    

    Garbage collection statistics can be obtained via erlang:statistics(garbage_collection) which yields:

    > statistics(garbage_collection).
    {85,23961,0}
    

    Here (as of Erlang 18.0) the first field is the total number of garbage collections performed by the VM and the second field is the total number of words reclaimed. The third field is always 0.
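
    Because these counters are cumulative and VM-wide, one way to use them is to take a reading before and after a piece of work and subtract. The sketch below does that; the module name gc_stats and the function measure/1 are invented for this example, and the result includes collections done by every process during that window, not only the caller.

    ```erlang
    -module(gc_stats).
    -export([measure/1]).

    %% Runs Fun and returns {Result, GCs, WordsReclaimed}, where the
    %% last two are VM-wide deltas observed while Fun was running.
    %% (gc_stats and measure/1 are hypothetical names.)
    measure(Fun) ->
        {GCs0, Words0, _} = erlang:statistics(garbage_collection),
        Result = Fun(),
        {GCs1, Words1, _} = erlang:statistics(garbage_collection),
        {Result, GCs1 - GCs0, Words1 - Words0}.
    ```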

  3. The heap sizes for a process are available under the fields total_heap_size (all heap fragments and stack) and heap_size (the size of the youngest heap generation) from the process info above.

    They can be controlled via spawn options, specifically min_heap_size which sets the initial heap size for a process.

    To set the default for all processes, erlang:system_flag(min_heap_size, MinHeapSize) can be called. The new value only affects processes spawned after the call.

    You can also control global VM memory allocation via the +M... options to the Erlang VM. The flags are described in the erts_alloc documentation. However, this requires extensive knowledge of the internals of the Erlang VM and its allocators, and using them should not be taken lightly.

  4. This can be obtained via the tracing described in point 2. If you add the timestamp option when tracing, each trace message carries a timestamp, and the total GC time is the sum of the differences between each GC start message and its matching end message.
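
    A possible way to add up that time is sketched below. With the timestamp flag the messages arrive as {trace_ts, Pid, Tag, Info, Timestamp} tuples, and timer:now_diff/2 gives the difference in microseconds. The module name gc_time, the helper kind/1 and the go/done/stop handshake are assumptions for this example, and both the older and the newer tag names are accepted.

    ```erlang
    -module(gc_time).
    -export([run/1]).

    %% Runs Fun in a traced process and returns the summed time, in
    %% microseconds, between each GC start message and its end message.
    %% (gc_time, kind/1 and the handshake protocol are hypothetical.)
    run(Fun) ->
        Parent = self(),
        Pid = spawn(fun() ->
                        receive go -> ok end,
                        Fun(),
                        Parent ! {done, self()},
                        receive stop -> ok end
                    end),
        1 = erlang:trace(Pid, true, [garbage_collection, timestamp]),
        Pid ! go,
        receive {done, Pid} -> ok end,
        %% Wait until the trace messages generated so far have arrived.
        Ref = erlang:trace_delivered(Pid),
        receive {trace_delivered, Pid, Ref} -> ok end,
        Pid ! stop,
        total(Pid, undefined, 0).

    %% Pairs each start timestamp with the next end timestamp.
    total(Pid, Start, Acc) ->
        receive
            {trace_ts, Pid, Tag, _Info, Ts} ->
                case {kind(Tag), Start} of
                    {start, _} ->
                        total(Pid, Ts, Acc);
                    {stop, undefined} ->
                        total(Pid, undefined, Acc);
                    {stop, _} ->
                        total(Pid, undefined, Acc + timer:now_diff(Ts, Start));
                    {other, _} ->
                        total(Pid, Start, Acc)
                end
        after 0 ->
            Acc
        end.

    %% Classifies GC trace tags from Erlang 18 and from later releases.
    kind(gc_start) -> start;
    kind(gc_minor_start) -> start;
    kind(gc_major_start) -> start;
    kind(gc_end) -> stop;
    kind(gc_minor_end) -> stop;
    kind(gc_major_end) -> stop;
    kind(_) -> other.
    ```

    Dividing the result by the wall-clock run time of the fun gives an approximate percentage of time spent in GC for that process.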

  5. Short answer: no.

    Long answer: Maybe. You can control the initial heap size (via min_heap_size) which will affect when garbage collection will occur the first time. You can also control when a full sweep will be performed with the fullsweep_after option.
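
    As a sketch of the long answer, both options can be passed to erlang:spawn_opt/2 and read back with process_info. The module name gc_defer and its two functions are made up for illustration. The VM may round min_heap_size up to the next heap size it supports, so the value read back can be larger than the one requested.

    ```erlang
    -module(gc_defer).
    -export([spawn_deferred/3, settings/1]).

    %% Spawns Fun with a large initial heap, which postpones the first
    %% collection, and an explicit fullsweep_after threshold.
    %% (gc_defer and both function names are hypothetical.)
    spawn_deferred(Fun, HeapWords, FullsweepAfter) ->
        erlang:spawn_opt(Fun, [{min_heap_size, HeapWords},
                               {fullsweep_after, FullsweepAfter}]).

    %% Returns {MinHeapSize, FullsweepAfter} as reported for Pid.
    settings(Pid) ->
        {garbage_collection, Info} =
            erlang:process_info(Pid, garbage_collection),
        {proplists:get_value(min_heap_size, Info),
         proplists:get_value(fullsweep_after, Info)}.
    ```

    A process whose heap is sized so that it never fills up before the process exits will in practice never be collected, which is as close to running without GC as these options get.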

More information can be found in the Academic and Historical Questions section of the Erlang FAQ and in the Processes section of the Efficiency Guide.

The most practical way of introspecting Erlang memory usage at runtime is via the Recon library, as Steve Vinoski mentioned.