Monday, August 10, 2009

Daemons in Erlang

I've written Linux daemons in Python & C/C++ before, and I must say that enough documentation is available online to make the experience about as smooth as it can be. Granted, there are some interesting cases to take care of (e.g. zombies), but overall the procedure is fairly straightforward. Now my newfound love for Erlang brings me once again to writing daemons.

First attempt
For my first go, I drew on the procedure I had used with both Python & C/C++:
  • Use /var/run as a registry to hold the running state of the daemon
  • For starting the daemon, check /var/run for a running PID; if found, assume it is that of the daemon and abort; if not found, assume no daemon instance is running and thus spawn one
  • For stopping the daemon, check /var/run for a running PID; if found, assume it is that of the daemon and kill it
It turned out to work OK, but it seemed overly complex and inflexible given that Erlang provides such excellent communication capabilities. On my next project, I had to try something else.
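The /var/run procedure above can be sketched as follows, in Python since that is one of the languages the procedure came from. This is only an illustration: the helper names are hypothetical, and the liveness probe with signal 0 is an addition that the procedure as described does not include (it simply trusts the PID file).

```python
import os

def read_pid(pid_file):
    """Return the PID stored in pid_file, or None if absent or unreadable."""
    try:
        with open(pid_file) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

def is_running(pid):
    """Probe the process with signal 0: checks existence, sends nothing."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True

def daemon_pid(pid_file):
    """Return the PID of a live daemon, or None (a stale file counts as none)."""
    pid = read_pid(pid_file)
    if pid is not None and pid > 0 and is_running(pid):
        return pid
    return None

def write_pid(pid_file, pid):
    """Record the daemon's PID in the registry file."""
    with open(pid_file, "w") as f:
        f.write(str(pid))
```

A start command would abort when daemon_pid() returns a PID and otherwise spawn the daemon and call write_pid(); a stop command would signal the returned PID and remove the file.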
My second attempt
This time around, I relied on Erlang's intrinsic distributed programming capabilities to help manage the life-cycle of the daemon.
  1. A Python script serves as the top-level manager (it could just as well be written in bash or Perl, but I happen to like Python a lot, and given its near ubiquity on Linux distros it is a safe operational choice). The script exposes the two familiar management commands: start and stop.
  2. Start procedure: the manager script spawns a controller escript (Erlang script) with "status" as its command-line parameter. The manager script inspects the return code (set through Erlang's erlang:halt/1 function) to determine whether a daemon is already running. If not, it spawns the actual Erlang-based daemon with the -detached option.
  3. Stop procedure: the manager script spawns the controller escript with "stop" as its command-line parameter. The controller script uses Erlang's rpc module to ask a potentially running daemon for its PID. This is the operating-system PID of the Erlang emulator (retrievable through os:getpid/0), not the pid of the daemon's process inside the emulator. If the RPC call succeeds, the controller script issues the system command "kill -9 PID".
The communication between the controller script and the actual daemon is done through RPC and requires that both ends register with EPMD (Erlang's Port Mapper Daemon). I used the "short naming" convention to achieve this (the -sname option for erl).
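A minimal sketch of such a manager script, under stated assumptions: the controller path ./etrx_control, the daemon node name "transmission", the ebin code path and the transmission_daemon:start entry point are all hypothetical placeholders. Only the exit codes come from the controller script in this post. The runner is passed in as a parameter so the logic can be exercised without Erlang installed.

```python
import subprocess

# Exit codes as defined in the controller escript.
CODE_OK = 0
CODE_DAEMON_FOUND = 2

# Assumed location of the controller escript; -q keeps it quiet.
CONTROL = ["./etrx_control", "-q"]

# Assumed daemon command line: node name, code path and entry
# point are placeholders, not taken from the original project.
DAEMON = ["erl", "-detached", "-sname", "transmission",
          "-pa", "ebin", "-s", "transmission_daemon", "start"]

def control(command, run=subprocess.call):
    """Run the controller escript and return its exit code."""
    return run(CONTROL + [command])

def start(run=subprocess.call):
    """Spawn the daemon unless the controller reports one is running."""
    if control("status", run) == CODE_DAEMON_FOUND:
        return False
    run(DAEMON)  # erl -detached returns immediately
    return True

def stop(run=subprocess.call):
    """Ask the controller to stop the daemon; True if the kill was sent."""
    return control("stop", run) == CODE_OK
```

Wiring start() and stop() to the script's command-line arguments gives the two management commands described above.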

Example of controller script written in escript:



#!/usr/bin/env escript
%% -*- erlang -*-
%%! -sname etrx_control
%%
%% @author Jean-Lou Dupont
%%

code_ok() -> 0.
code_error() -> 1.
code_daemon_found() -> 2.
code_daemon_not_found() -> 3.
code_lib_not_found() -> 4.
code_node_not_found() -> 5.


err_lib() -> "erlang-transmission not found".
err_daemon() -> "daemon not found".
err_node() -> "transmission node not found".

msg_pid() -> "daemon found, pid: ".
msg_usage() -> "usage: etrx_control [-q] [status|stop]".
msg_kill() -> "stop command sent".


main(["-q", "stop"]) -> run(quiet, stop);
main(["-q", "status"]) -> run(quiet, status);
main(["stop"]) -> run(verbose, stop);
main(["status"]) -> run(verbose, status);

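%% msg/3 prints the message and then halts with the given code,
%% so no explicit halt/1 is needed after it.
main([]) ->
    msg(verbose, code_ok(), msg_usage());

%% Any other argument list is an error: show the usage, exit non-zero.
main(_Args) ->
    msg(verbose, code_error(), msg_usage()).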


run(Feedback, stop) ->

    %%for development
    add_cwd(),

    case getstatus() of
        daemon_not_found ->
            msg(Feedback, code_daemon_not_found(), err_daemon());

        {pid, Pid} ->
            %% Pid is the OS PID of the emulator, as a string
            os:cmd("kill -9 "++Pid),
            msg(Feedback, code_ok(), msg_kill());

        {error, lib_not_found} ->
            msg(Feedback, code_lib_not_found(), err_lib());

        {error, node_not_found} ->
            msg(Feedback, code_node_not_found(), err_node())
    end;


run(Feedback, status) ->

    %%for development
    add_cwd(),

    case getstatus() of
        daemon_not_found ->
            msg(Feedback, code_daemon_not_found(), err_daemon());

        {pid, Pid} ->
            msg(Feedback, code_daemon_found(), msg_pid(), Pid);

        {error, lib_not_found} ->
            msg(Feedback, code_lib_not_found(), err_lib());

        {error, node_not_found} ->
            msg(Feedback, code_node_not_found(), err_node())

    end.


add_cwd() ->
    {ok,Cwd}=file:get_cwd(),
    Cp=Cwd++"/ebin",
    code:add_pathsa([Cp]).


getstatus() ->
    try
        case rpc(status) of
            %% rpc/1 already maps a failed call to daemon_not_found
            daemon_not_found ->
                daemon_not_found;

            {pid, Pid} ->
                {pid, Pid}
        end
    catch
        error:undef ->
            {error, lib_not_found};

        _X:_Y ->
            {error, node_not_found}
    end.


%% Print Msg (unless quiet), then exit the escript with Code.
msg(Feedback, Code, Msg) ->
    case Feedback of
        verbose ->
            io:format("etrx_control: ~s~n", [Msg]);
        _ ->
            ok
    end,
    halt(Code).

msg(Feedback, Code, Msg1, Msg2) ->
    msg(Feedback, Code, Msg1++Msg2).


%%%%%%%%%%%%%%
%% RPC related
%%%%%%%%%%%%%%

rpc(Command) ->
    case dorpc(Command) of
        rpcerror ->
            daemon_not_found;

        Response ->
            Response
    end.


dorpc(Message) ->
    Node=tools:make_node(transmission),

    %% 2000 ms timeout on the remote call
    case rpc:call(Node, transmission_daemon, api, [Message], 2000) of
        {badrpc, _Reason} ->
            rpcerror;

        Other ->
            Other
    end.