I have googled for three hours but to no avail.
I have an ejabberd installation which was not installed using apt. It was built from source, and there is no program called ejabberd in it. Starting, stopping and everything else goes through ejabberdctl.
It was running perfectly for a month, and then one day it suddenly stopped with the infamous "kernel pid terminated" error.
Whenever I run
sudo ejabberdctl start --node ejabberd@MasterService
an erl_crash.dump file gets generated, and when I then try
ejabberdctl
I get
Failed to connect to RPC at node ejabberd@MasterService
Here is what I have tried so far:
- Killed all running ejabberd, beam and epmd processes and started fresh. DID NOT WORK.
- Checked /etc/hosts and the hostname, and all is well: the hostname is listed in the hosts file with its IP.
- Checked the ejabberdctl.conf file to make sure the host name and the node name are both right.
- Checked that the .erlang.cookie file is being created and has content in it.
Everything I found on the web led, one way or another, to one of the steps above.
I have nowhere else to go and don't know where else to look. Any help would be much appreciated.
You'll have to analyze the crash dump to try to guess why it failed.
To carry out this task, Erlang has a special web tool (called, uh, webtool) from which a special application, Crash Dump Viewer, can be used to load a dump and inspect the stack traces of the Erlang processes at the time of the crash.
You have to:
Install the necessary packages (the ones that provide webtool and the crash dump viewer for your Erlang installation).
Start an Erlang interpreter by running erl.
(Further actions are taken there.)
Run the webtool. In the simplest case, it will listen on the local host:
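A minimal sketch of that call, typed at the Erlang shell prompt. It assumes an Erlang/OTP release that still ships the webtool application (it was removed in OTP 18):

```erlang
%% Start webtool with its default settings (local host only).
%% The trailing period ends the expression in the Erlang shell.
webtool:start().
```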
(Notice the period.) It will print back a URL to open in your browser to reach the running tool.
If this happens on a server, and you would rather have the webtool listen on some non-local-host interface, the call incantation would be trickier:
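As a sketch, the two-argument form described in the webtool manual takes a path and a list of options. The port number and the host name below are placeholders chosen for illustration, not values from the original setup:

```erlang
%% standard_path makes webtool use its default root directory.
%% bind_address {0, 0, 0, 0} listens on every interface.
%% Port 8888 and "server.example.com" are assumed placeholder values.
webtool:start(standard_path,
              [{port, 8888},
               {bind_address, {0, 0, 0, 0}},
               {server_name, "server.example.com"}]).
```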
The {0, 0, 0, 0} IP spec will make it listen everywhere, and you might as well specify some more sensible octets, like {192, 168, 0, 1}. The server_name clause might use an arbitrary name; this is what will be printed in the generated URL as the server's hostname.
Now connect to the tool with your browser, navigate to the "Start tools" menu entry, start the crash dump viewer and make a link to it appear in the tool's top menu. Proceed there and find a link to load the crash dump.
After loading a crash dump, explore the tool's interface to look at the stack traces of the active Erlang processes. At least one of them should contain something fishy, including an error message. That is what you're looking for to refine your question (or to ask another one on the ejabberd mailing list).
To quit the tool, run webtool:stop(). in the running Erlang interpreter. Then quit the interpreter itself, either by running q(). and waiting a bit, or by pressing Ctrl-g, entering the letter q and pressing the Return key.
The relevant links are: the crash dump viewer manual and the webtool manual.