Erlang/OTP Forums

Erlang ~ How reliable can we be when monitoring/linking processes

darkolee
Posted: Fri Dec 04, 2009 3:33 pm
User Joined: 02 Dec 2009 Posts: 12
Erlang provides the facility to monitor/link processes and be notified when a process dies. How are nodes monitored? How can a process decide that another process really died and is not just slow to respond? (I am trying to determine the behaviour of Erlang's failure detector in relation to the FLP proof for failure detectors.)

Can a process be suspected to have failed but in reality be up and running?

Is this signal actually sent by the local Erlang runtime system to our process, or is it triggered by the remote Erlang runtime system upon detecting that a process crashed? If the latter, what happens when the whole Erlang runtime system crashes or is killed?

And finally is there a performance degradation associated with monitoring/linking processes?

Thanks a lot.
uwiger
Posted: Sun Dec 06, 2009 2:49 pm
User Joined: 03 Jul 2006 Posts: 604 Location: Sweden
darkolee wrote:
Erlang provides the facility to monitor/link processes and be notified when a process dies. How are nodes monitored? How can a process decide that another process really died and is not just slow to respond?


Nodes are monitored using a heartbeat mechanism on the shared communication channel between the nodes. If that heartbeat mechanism times out, the other node will be disconnected, causing process links and monitors to trigger 'EXIT' signals and 'DOWN' messages. This can happen due to severe overload or network problems, in which case the other node may still be running.
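A minimal sketch of how a watcher could tell these cases apart, assuming the standard erlang:monitor/2 behaviour where a lost node connection is reported with the reason noconnection. The module name watch_sketch and the return tuples ({suspected, ...}, {dead, ...}) are made up for illustration. The heartbeat timeout itself is, to my knowledge, governed by the kernel parameter net_ticktime.

```erlang
-module(watch_sketch).
-export([watch/1, classify/1]).

%% Monitor Pid (local or remote) and block until its 'DOWN' message arrives.
watch(Pid) ->
    Ref = erlang:monitor(process, Pid),
    receive
        {'DOWN', Ref, process, Pid, Reason} ->
            classify(Reason)
    end.

%% noconnection: the connection to the remote node was lost, so the
%%               process may in fact still be running over there.
%% noproc:       no such process existed when the monitor was set up.
%% anything else is the actual exit reason of the process.
classify(noconnection) -> {suspected, node_unreachable};
classify(noproc)       -> {dead, noproc};
classify(Reason)       -> {dead, Reason}.
```

A caller that gets {suspected, node_unreachable} only knows that the node could not be reached, not that the process is gone, so any recovery logic has to tolerate the "dead" process still doing work.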

In the local case (monitoring a local process), the runtime system sends the notifications itself when it terminates the process, so there can be no such confusion. It will of course also notify remote processes that had links or monitors to the dead process.
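The local case can be sketched as follows. The module name local_down_sketch is hypothetical, and the one-second timeout is only a safety net for the example; spawn_monitor/1 creates the process and the monitor atomically, so the 'DOWN' message carries the real exit reason.

```erlang
-module(local_down_sketch).
-export([run/0]).

%% Spawn a local process that exits with reason boom and return the
%% reason reported in the 'DOWN' message sent by the runtime system.
run() ->
    {Pid, Ref} = spawn_monitor(fun() -> exit(boom) end),
    receive
        {'DOWN', Ref, process, Pid, Reason} -> Reason
    after 1000 ->
        timeout
    end.
```

Calling local_down_sketch:run() should return boom, the reason the process exited with.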


darkolee wrote:
And finally is there a performance degradation associated with monitoring/linking processes?


Links are kept in a hash table, so it is possible to be linked to a very large number of processes without noticeable performance degradation. There are always corner cases, though. I once managed to kill a node by spawn-linking 100,000 processes from the shell and then crashing the shell process by mistyping length(processes()) as lenth(processes()). This caused an EXIT message containing 100,000 pids to be replicated to 100,000 linked processes...
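A small-scale sketch of that corner case, not the original shell session: a parent links to N children and then exits with a reason that contains all N pids, so each child receives its own copy of the list. The module name link_fanout_sketch, the {crashed, Kids} reason, and the helper messages are invented for illustration, and the children trap exits only so that they can report what they received.

```erlang
-module(link_fanout_sketch).
-export([run/1]).

%% Returns the total number of pids delivered inside exit reasons,
%% which is N * N when every child receives the parent's exit signal.
run(N) ->
    Collector = self(),
    spawn(fun() -> parent(N, Collector) end),
    collect(N, 0).

parent(N, Collector) ->
    Parent = self(),
    Kids = [spawn_link(fun() -> child(Parent, Collector) end)
            || _ <- lists:seq(1, N)],
    %% Wait until every child is trapping exits before crashing.
    [receive {ready, K} -> ok end || K <- Kids],
    %% The exit reason is copied to every linked process.
    exit({crashed, Kids}).

child(Parent, Collector) ->
    process_flag(trap_exit, true),
    Parent ! {ready, self()},
    receive
        {'EXIT', Parent, {crashed, Kids}} ->
            Collector ! {got_exit, length(Kids)}
    end.

collect(0, Acc) ->
    Acc;
collect(Left, Acc) ->
    receive
        {got_exit, Len} -> collect(Left - 1, Acc + Len)
    after 5000 ->
        {timeout, Acc}
    end.
```

With N = 3 this delivers 9 pids in total; the quadratic growth is what makes the 100,000-process version so expensive.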

BR,
Ulf W

_________________
http://www.erlang-consulting.com

All times are GMT