I am trying to start using erlang:trace/3 and the dbg module to trace the behaviour of a live production system without taking the server down.
The documentation is opaque (to put it mildly) and there don't appear to be any useful tutorials online.
What I spent all day trying to do was capture what was happening in a particular function by trying to apply a trace to module:function using dbg:c and dbg:p but with no success at all...
Does anyone have a succinct explanation of how to use trace in a live Erlang system?
On live systems we rarely trace to shell. If the system is well configured then it is already collecting your Erlang logs that were printed to the shell. I need not emphasize why this is crucial in any live node...
Let me elaborate on tracing to files:
It is possible to trace to a file, which produces binary output that can be converted and parsed later (for further analysis, an automated control system, etc.).
An example could be:
Trace to multiple wrapped files (12 x 50 Mbytes). Please always check the available disk space before using such a big trace!
dbg:p(all,[call,timestamp,return_to]).
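The tracer setup that goes with that dbg:p call could look roughly like the sketch below. It assumes dbg:trace_port/2 with a wrap specification; the "/tmp/trace_" prefix and ".trc" suffix are made-up names, so substitute a location with enough free space.

```erlang
%% Hypothetical file prefix; choose a directory with enough free disk space.
Prefix = "/tmp/trace_".
%% Wrap set: at most 12 files of 50,000,000 bytes each; the oldest file is reused.
PortFun = dbg:trace_port(file, {Prefix, wrap, ".trc", 50000000, 12}).
dbg:tracer(port, PortFun).
dbg:p(all, [call, timestamp, return_to]).
%% ... set trace patterns with dbg:tp / dbg:tpl, let the system run, then:
dbg:stop_clear().
%% Read the binary files back and print the trace messages in the shell.
dbg:trace_client(file, {Prefix, wrap, ".trc"}).
```

Because the port writes binary data directly, nothing is printed while tracing; the output only becomes readable once dbg:trace_client/2 replays the files.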
That said let's have a look at a basic tracing command sequence:
<1> dbg:stop_clear().
<2> dbg:tracer().
<3> dbg:p(all,[call, timestamp]).
<4> dbg:tp( ... ).
<5> dbg:tpl( ... ).
<42> dbg:stop_clear().
You can:
- Add triggers by defining some fun()-s in the shell to stop the trace at a given time or event. Recursive fun()-s are the best to achieve this, but be very careful when applying those.
- Apply a vast variety of pattern matching to ensure that you only trace for the specific process with the specific function call with the specific type of arguments...
I had an issue a while back, when we had to check the content of an ETS table and on appearance of a certain entry we had to stop the trace within 2-3 minutes.
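A trigger of that kind could be sketched as a recursive fun that polls the table and stops the trace either when the entry shows up or when a time limit runs out. Tab, Key, the one-second poll interval and the 180-poll limit are all illustrative assumptions, not the values we used back then.

```erlang
%% Hypothetical table and key; on a real node Tab would be an existing table.
Tab = ets:new(demo_tab, [public]).
Key = suspicious_entry.
%% Recursive (named) fun: poll once per second, give up after N polls.
Watch = fun Loop(0) ->
                dbg:stop_clear(),
                timeout;
            Loop(N) ->
                case ets:member(Tab, Key) of
                    true  -> dbg:stop_clear(), found;
                    false -> timer:sleep(1000), Loop(N - 1)
                end
        end.
%% Run the watcher in its own process so the shell stays usable.
spawn(fun() -> Watch(180) end).
```

Polling is deliberately cheap here (one ets:member/2 call per second), which is the "be very careful" part: a trigger that itself loads the node defeats the purpose.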
I also suggest the book Erlang Programming written by Francesco Cesarini.
The basic steps for tracing function calls on a non-live node are:
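A minimal shell session along those lines might look like this. lists:seq/2 is only a stand-in for whatever module, function and arity you are interested in.

```erlang
dbg:tracer().                % start a simple tracer process that prints to the shell
dbg:tp(lists, seq, 2, []).   % pick the MFA you are interested in (example target)
dbg:p(all, c).               % trace calls (c) to that MFA in all processes
lists:seq(1, 3).             % ... exercise the code while the trace is active ...
dbg:stop_clear().            % stop the tracer and clear the effects of tp and p
```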
You can trace multiple functions at the same time. Add functions by calling tp for each function. If you want to trace non-exported functions, you need to call tpl. To remove functions, call ctp or ctpl in a similar manner. The last argument of a tp or tpl call is a match specification. You can play around with that by using dbg:fun2ms.

You can select the processes to trace with the call to p(). The items are described under erlang:trace.
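Some typical tp/tpl and p() calls, as a sketch meant to be typed into the shell. The queue module and queue:in/2 are arbitrary stand-ins for your own code, and the broad patterns are cleared again before any process is selected so that the example stays quiet.

```erlang
dbg:tracer().

%% Trace patterns (queue and queue:in are stand-ins for your own module and function):
dbg:tpl(queue, '_', []).          % every function in the module
dbg:tpl(queue, in, '_', []).      % queue:in with any arity
dbg:ctpl(queue).                  % remove all patterns for the module again

%% Match specification built with fun2ms: also report the return value.
MS = dbg:fun2ms(fun(_) -> return_trace() end).
dbg:tpl(queue, in, 2, MS).        % queue:in/2 only

%% Process selection (pick one; nothing is traced until p() is called):
dbg:p(all, c).                    % calls made by any process
%% dbg:p(new, c).                 % calls made by processes spawned from now on
%% dbg:p(Pid, [c, m]).            % calls and messages of one process (Pid is its pid)

queue:in(1, queue:new()).         % exercise the traced function
dbg:stop_clear().
```

Note that dbg:fun2ms only works with a literal fun typed in the shell (or in a module that includes ms_transform.hrl); it cannot convert a fun held in a variable.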
I guess you will never need to call erlang:trace directly, as dbg does pretty much everything for you.

A golden rule for a live node is to generate only as much trace output to the shell as still lets you type in dbg:stop_clear(). :)

I often use a tracer that will auto-stop itself after a number of events. For example:
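One way to sketch such a tracer is a handler fun passed to dbg:tracer(process, {HandlerFun, InitialState}) that keeps a counter as its state. The limit of 100 and the stopped atom are arbitrary choices for this illustration.

```erlang
%% The handler gets each trace message plus its own state (here: a counter).
Handler = fun(_Msg, stopped) -> stopped;                       % ignore late events
             (_Msg, 100)     -> dbg:stop_clear(), stopped;     % limit reached
             (Msg, N)        -> io:format("~p~n", [Msg]), N + 1
          end.
dbg:tracer(process, {Handler, 0}).
```

After this, set patterns and processes with tp/tpl and p() as usual; the tracer prints the first 100 events and then shuts the trace down on its own.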
If you are looking for debugging on remote nodes (or multiple nodes), search for pan, eper, inviso or onviso.

The 'dbg' module is quite low-level stuff. There are two hacks that I use very frequently for the tasks that I commonly need.
Use the Erlang CLI/shell expansion code at http://www.snookles.com/erlang/user_default.erl. It was originally written (as far as I know) by Serge Aleynikov and has been a useful "so that's how I add custom functions to the shell" example. Compile the module and edit your ~/.erlang file to point to its path (see comment at the top of the file).
Use the "redbug" utility that's bundled in the EPER collection of utilities. It's very easy to use 'dbg' to create millions of trace events in a few seconds. Doing so in a production environment can be disastrous. For development or production use, redbug makes it nearly impossible to kill a running system with a trace-induced overload.
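As a rough idea of what a redbug session looks like, assuming the redbug module is on the code path of the node (it is not part of OTP): lists:sort/1 is just an example target, and the time and msgs limits below are illustrative values.

```erlang
%% Assumes redbug (from EPER) is installed; it is a third-party tool.
%% Trace calls to lists:sort/1 including return values.
%% The trace ends by itself after 10 seconds or 20 trace messages.
redbug:start("lists:sort/1 -> return", [{time, 10000}, {msgs, 20}]).
lists:sort([3, 1, 2]).
redbug:stop().   % stop early, if the limits have not ended the trace already
```

The built-in time and message limits are what make it hard to overload a node by accident.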
If you would prefer a graphical tracer then try erlyberly. It allows you to select the functions you would like to trace (on all processes at the moment) and deals with the dbg API.
However, it does not protect against overload, so it is not suitable for production systems.