Erlang Mailing Lists

Erlang questions mailing list ~ Supervisor shutdown

HEINRICH.VENTER at tebaba
Posted: Tue Nov 25, 2003 8:49 am
Hello List

I have a bit of an odd question regarding the shutdown of applications. I have the following sequence:

application -> supervisor -> gen_server

The gen_server does all the weird and wonderful stuff and needs to be shut down in an orderly fashion. To do that I have a terminate/2 function.
BUT, if I stop the application with application:stop(myapp), what happens is that the supervisor process gets terminated and thus the gen_server gets terminated too (without calling the terminate/2 function).
Then I thought I'd be clever and use the exported application stop/1 function to call the supervisor:terminate_child/2 function, but this also just kills the child process without calling the terminate/2 function.
Next I thought I'd kill the gen_server directly in the stop/1 function with my stop/0 function in the gen_server. But then the supervisor jumped in and restarted the gen_server as soon as it realised that it was dead.

So how do I stop the application and ensure that my gen_server terminate code gets executed??

-]-[einrich

#####################################################################################
The information contained in this message and or attachments is intended
only for the person or entity to which it is addressed and may contain
confidential and/or privileged material. Any review, retransmission,
dissemination or other use of, or taking of any action in reliance upon,
this information by persons or entities other than the intended recipient
is prohibited. If you received this in error, please contact the sender and
delete the material from any system and destroy all copies.
#####################################################################################


Post generated using Mail2Forum (http://m2f.sourceforge.net)
gunilla at erix.ericsson.
Posted: Tue Nov 25, 2003 9:24 am
Snip from the gen_server man page:

"If the gen_server is part of a supervision tree and is
ordered by its supervisor to terminate, this function will
be called with Reason=shutdown if the following
conditions apply:
* the gen_server has been set to trap exit signals, and
* the shutdown strategy as defined in the
supervisor's child specification is an integer
timeout value, not brutal_kill. "
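To make the first condition concrete, here is a minimal sketch, assuming a hypothetical module `trap_exit_server` whose `start_link/1` takes a pid to notify (added only so the shutdown can be observed). It traps exits in `init/1`, so an exit signal with reason `shutdown` from the process that called `start_link/1` (normally the supervisor) ends up in `terminate/2`. The second condition, an integer shutdown timeout, belongs in the supervisor's child spec, not in this module.

```erlang
%% trap_exit_server: hypothetical module name, for illustration only.
-module(trap_exit_server).
-behaviour(gen_server).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% Notify is a pid that is told when terminate/2 runs, so that the
%% shutdown can be observed from outside (demonstration only).
start_link(Notify) ->
    gen_server:start_link(?MODULE, Notify, []).

init(Notify) ->
    %% Without this flag the parent's shutdown signal kills the process
    %% outright and terminate/2 is never called.
    process_flag(trap_exit, true),
    {ok, Notify}.

handle_call(_Request, _From, Notify) ->
    {reply, ok, Notify}.

handle_cast(_Msg, Notify) ->
    {noreply, Notify}.

handle_info(_Info, Notify) ->
    {noreply, Notify}.

%% Reached with Reason = shutdown when the parent orders termination.
terminate(Reason, Notify) ->
    %% The orderly cleanup would go here; this sketch just reports it.
    Notify ! {terminated, self(), Reason},
    ok.

code_change(_OldVsn, Notify, _Extra) ->
    {ok, Notify}.
```

In a shell one can play the supervisor's part: call `start_link/1`, unlink, then `exit(Pid, shutdown)`; a `{terminated, Pid, shutdown}` message should arrive.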

Best regards, Gunilla

HEINRICH VENTER wrote:
>
> Hello List
>
> I have a bit of an odd question regarding the shut down of applications. I have the following sequence
>
> application -> supervisor -> gen_server
>
> The gen_server does all the weird and wonderful stuff and needs to be shut down in an orderly fashion. To do that I have a terminate/2 function.
> BUT, If I stop the application with application:stop(myapp). what happens is that the supervisor process gets terminated and thus the gen_server gets terminated too (without calling the terminate/2 function).
> Then I though I'd be clever and use the exported application stop/1 function to call the supervisor:terminate_child/2 function, but this also just kills the child process without calling the termiante/2 function.
> Next I thought I'd kill the gen_server directly in the stop/1 function with my stop/0 function in the gen_server. But then the supervisor jumped in and restarted the gen_server as soon as it realised that it is dead.
>
> So how do I stop the application and ensure that my gen_server terminate code gets executed??
>
> -]-[einrich
>
> #####################################################################################
> The information contained in this message and or attachments is intended
> only for the person or entity to which it is addressed and may contain
> confidential and/or privileged material. Any review, retransmission,
> dissemination or other use of, or taking of any action in reliance upon,
> this information by persons or entities other than the intended recipient
> is prohibited. If you received this in error, please contact the sender and
> delete the material from any system and destroy all copies.
> #####################################################################################

--
_____Gunilla Arendt______________________________________________
EAB/UKH/KD Erlang/OTP Product Development
Gunilla.Arendt_at_ericsson.com +46-8-7275730 ecn 851 5730


vlad_dumitrescu at hotmai
Posted: Tue Nov 25, 2003 9:32 am
From: "HEINRICH VENTER" <HEINRICH.VENTER_at_tebabank.com>
> So how do I stop the application and ensure that my gen_server terminate code
> gets executed??

Hi,

It should work without any tweaks. From documentation of gen_server, the
terminate callback:

If the gen_server is part of a supervision tree and is ordered by its supervisor
to terminate, this function will be called with Reason=shutdown if the following
conditions apply:
-the gen_server has been set to trap exit signals, and
-the shutdown strategy as defined in the supervisor's child specification is
an integer timeout value, not brutal_kill.

Maybe these conditions aren't met?

regards,
Vlad


ulf.wiger at ericsson.com
Posted: Tue Nov 25, 2003 10:04 am
On Tue, 25 Nov 2003 10:32:35 +0100, Vlad Dumitrescu
<vlad_dumitrescu_at_hotmail.com> wrote:

> From: "HEINRICH VENTER" <HEINRICH.VENTER_at_tebabank.com>
>> So how do I stop the application and ensure that my gen_server
>> terminate code gets executed??
>
> Hi,
>
> It should work without any tweaks. From documentation of gen_server, the
> terminate callback:
>
> If the gen_server is part of a supervision tree and is ordered by its
> supervisor to terminate, this function will be called with
> Reason=shutdown if the following conditions apply:
> -the gen_server has been set to trap exit signals, and
> -the shutdown strategy as defined in the supervisor's
> child specification is an integer timeout value,
> not brutal_kill.

...and the gen_server has been started with gen_server:start_link(...),
not gen_server:start(...).
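A child spec meeting all three conditions might look like the following sketch. `my_sup` and `my_server` are hypothetical names; `my_server` is assumed to export `start_link/0` and to trap exits in its `init/1`. The tuple form `{Id, StartFunc, Restart, Shutdown, Type, Modules}` is the classic child spec format, which current OTP releases still accept alongside maps.

```erlang
%% my_sup: hypothetical supervisor. my_server is a hypothetical gen_server
%% module assumed to export start_link/0 and to trap exits in init/1.
-module(my_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    ChildSpec = {my_server,                    % Id
                 {my_server, start_link, []},  % start_link, not start
                 permanent,                    % Restart
                 %% Shutdown: after exit(Child, shutdown) the supervisor
                 %% waits up to 5000 ms before killing the child.
                 %% brutal_kill would kill at once and skip terminate/2.
                 5000,
                 worker,                       % Type
                 [my_server]},                 % Modules
    %% At most 5 restarts within 10 seconds before the supervisor gives up.
    {ok, {{one_for_one, 5, 10}, [ChildSpec]}}.
```

`supervisor:check_childspecs/1` can be used to sanity-check the format of such a spec; it does not check that `my_server` actually exists.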

/Uffe
--
Ulf Wiger, Senior System Architect
EAB/UPD/S


vlad_dumitrescu at hotmai
Posted: Tue Nov 25, 2003 10:34 am
From: "Ulf Wiger" <ulf.wiger_at_ericsson.com>
> >> So how do I stop the application and ensure that my gen_server
> >> terminate code gets executed??
> >
> > If the gen_server is part of a supervision tree and is ordered by its
> > supervisor to terminate, this function will be called with
> > Reason=shutdown if the following conditions apply:
> > -the gen_server has been set to trap exit signals, and
> > -the shutdown strategy as defined in the supervisor's
> > child specification is an integer timeout value,
> > not brutal_kill.
>
> ...and the gen_server has been started with gen_server:start_link(...),
> not gen_server:start(...).

Mmmm, not necessarily with start_link. The child can use any function to start,
but it is *required* to create the link in some way.
However, I haven't yet seen any child using anything other than start_link :-)

BTW, wouldn't it be better if the supervisor would check that the link was
created, or maybe better, create it itself? Since the specs make the link
mandatory, why not impose the requirement in the code too?

regards,
Vlad


gunilla at erix.ericsson.
Posted: Tue Nov 25, 2003 10:52 am
No, the supervisor *is* required to call start_link!
Simply linking is not enough, as the gen_server can be linked to any
process but it must also be able to separate EXIT signals from its
supervisor from other EXIT signals! That is what start_link ensures.
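The distinction can be seen with a sketch like the one below (module name `exit_sorter` and its API are hypothetical). With `trap_exit` set, an `'EXIT'` from an ordinary linked process, here a helper that crashes immediately, is delivered to `handle_info/2` and the server keeps running, whereas an `'EXIT'` from the parent that called `start_link/1` is intercepted by gen_server and leads to `terminate/2`.

```erlang
%% exit_sorter: hypothetical module name, for illustration only.
-module(exit_sorter).
-behaviour(gen_server).

-export([start_link/1, spawn_helper/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link(Notify) ->
    gen_server:start_link(?MODULE, Notify, []).

%% Make the server spawn_link a helper that crashes straight away.
spawn_helper(Server) ->
    gen_server:call(Server, spawn_helper).

init(Notify) ->
    process_flag(trap_exit, true),
    {ok, Notify}.

handle_call(spawn_helper, _From, Notify) ->
    Helper = spawn_link(fun() -> exit(helper_failed) end),
    {reply, Helper, Notify}.

handle_cast(_Msg, Notify) ->
    {noreply, Notify}.

%% An 'EXIT' from a linked process other than the parent arrives here
%% as an ordinary message, and the server keeps running.
handle_info({'EXIT', Pid, Reason}, Notify) ->
    Notify ! {linked_exit, Pid, Reason},
    {noreply, Notify};
handle_info(_Info, Notify) ->
    {noreply, Notify}.

%% An 'EXIT' from the parent is recognised by gen_server itself,
%% which calls terminate/2 and then exits with the same reason.
terminate(Reason, Notify) ->
    Notify ! {terminated, Reason},
    ok.

code_change(_OldVsn, Notify, _Extra) ->
    {ok, Notify}.
```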

And yes, there are alternative start strategies that in retrospect
might have been better choices. For example, simply providing
the supervisor with the name of the callback module(s) and making it
responsible for starting its child processes the correct way.

Best regards, Gunilla

--
_____Gunilla Arendt______________________________________________
EAB/UKH/KD Erlang/OTP Product Development
Gunilla.Arendt_at_ericsson.com +46-8-7275730 ecn 851 5730


HEINRICH.VENTER at tebaba
Posted: Tue Nov 25, 2003 11:15 am
> -the gen_server has been set to trap exit signals, and
I seem to have missed this bit. Now that it traps the exit signals with

process_flag(trap_exit, true)

everything works out smoothly, or rather more smoothly.

I have a spanner to throw in the works now, however :)

The gen_server spawns some worker processes that need to be terminated. These processes might not terminate immediately when the stop message is sent to them. (I have discovered a flaw in the current design but will post to a separate thread about this.) The gen_server cannot terminate before all its workers have terminated in an orderly fashion, BUT the stop function executes in a handle_call, and the workers notify the gen_server of their termination through a different handle_call.
This brings me to the {noreply, State} return value mentioned in another thread.
How do I know which PID to send the response to later, when I am ready to quit? The process dictionary perhaps? Then I could use the function that notifies the gen_server of the process termination to check whether all the processes are down, and then reply with {stop, normal, ok, State} instead of {reply, Reply, State}.
Does this make sense?

-]-[einrich

raimo at erix.ericsson.se
Posted: Tue Nov 25, 2003 11:42 am
The idea of tossing around a State variable in the gen_server is to
store such things as a pid() for a reply. State is commonly a record, so
just add a field. Using the process dictionary is more "ugly", since it
is used by e.g. the gen_server code itself.
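A sketch of that suggestion, assuming a hypothetical `pool_server` that owns a few plain worker processes: the `From` of the stop call is kept in a `#state{}` record field, `handle_call/3` returns `{noreply, State}`, and `gen_server:reply/2` is used once the last worker reports back. The workers report with `gen_server:cast/2` rather than a call, and the `timer:sleep(50)` merely stands in for a slow orderly shutdown.

```erlang
%% pool_server: hypothetical module name, for illustration only.
-module(pool_server).
-behaviour(gen_server).

-export([start_link/1, stop/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-record(state, {workers = [],          % pids of workers still running
                stopper = undefined}). % From of the pending stop call

start_link(NumWorkers) ->
    gen_server:start_link(?MODULE, NumWorkers, []).

stop(Server) ->
    gen_server:call(Server, stop, infinity).

init(NumWorkers) ->
    %% So that terminate/2 also runs on a supervisor shutdown.
    process_flag(trap_exit, true),
    Server = self(),
    Workers = [spawn(fun() -> worker(Server) end)
               || _ <- lists:seq(1, NumWorkers)],
    {ok, #state{workers = Workers}}.

%% No workers left: reply and stop in one go.
handle_call(stop, _From, #state{workers = []} = State) ->
    {stop, normal, ok, State};
%% Otherwise remember From, ask the workers to finish, and reply later.
handle_call(stop, From, #state{workers = Workers} = State) ->
    lists:foreach(fun(W) -> W ! {self(), shutdown} end, Workers),
    {noreply, State#state{stopper = From}}.

%% Workers report with a cast, so there is no call in both directions.
handle_cast({worker_done, W}, #state{workers = Workers, stopper = From} = State) ->
    case lists:delete(W, Workers) of
        [] when From =/= undefined ->
            gen_server:reply(From, ok),
            {stop, normal, State#state{workers = []}};
        Rest ->
            {noreply, State#state{workers = Rest}}
    end.

handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.

%% A worker that needs some time for its orderly shutdown.
worker(Server) ->
    receive
        {Server, shutdown} ->
            timer:sleep(50),
            gen_server:cast(Server, {worker_done, self()})
    end.
```

The `{stop, normal, ok, State}` form only exists as a return from `handle_call/3`; from `handle_cast/2` the equivalent is an explicit `gen_server:reply/2` followed by `{stop, normal, State}`.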

If your worker processes have such an intricate protocol with the
gen_server, you might consider whether they should form a supervision tree
of their own, with your current gen_server as a supervisor. It is
a common tactic to not let the worker processes trap exits, so that they just
get instantly killed when the gen_server exits, provided you can make them
stupid enough. Making a new supervision tree is probably too heavy, though.
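The "stupid worker" tactic can be sketched as follows (hypothetical module `linked_workers`). The workers are started with `spawn_link/1` and never trap exits, so when the server is shut down with reason `shutdown` the exit signal travels over the links and kills them. This relies on a non-`normal` exit reason: if the server stopped with reason `normal`, linked workers that don't trap exits would keep running.

```erlang
%% linked_workers: hypothetical module name, for illustration only.
-module(linked_workers).
-behaviour(gen_server).

-export([start_link/1, workers/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link(NumWorkers) ->
    gen_server:start_link(?MODULE, NumWorkers, []).

workers(Server) ->
    gen_server:call(Server, workers).

init(NumWorkers) ->
    %% The server itself traps exits so that its own terminate/2 runs.
    process_flag(trap_exit, true),
    Workers = [spawn_link(fun worker/0) || _ <- lists:seq(1, NumWorkers)],
    {ok, Workers}.

handle_call(workers, _From, Workers) ->
    {reply, Workers, Workers}.

handle_cast(_Msg, Workers) ->
    {noreply, Workers}.

%% A worker crash arrives as a message because the server traps exits.
handle_info({'EXIT', Pid, _Reason}, Workers) ->
    {noreply, lists:delete(Pid, Workers)};
handle_info(_Info, Workers) ->
    {noreply, Workers}.

%% After this returns, gen_server exits with the same Reason; a
%% non-normal reason such as shutdown then kills the linked workers.
terminate(_Reason, _Workers) ->
    ok.

code_change(_OldVsn, Workers, _Extra) ->
    {ok, Workers}.

%% A deliberately "stupid" worker: it never traps exits.
worker() ->
    receive
        _Job -> worker()
    end.
```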

--
/ Raimo Niskanen, Erlang/OTP, Ericsson AB

ulf.wiger at ericsson.com
Posted: Tue Nov 25, 2003 11:57 am
On Tue, 25 Nov 2003 13:09:46 +0200, HEINRICH VENTER
<HEINRICH.VENTER_at_tebabank.com> wrote:

> The gen_server spawns some worker processes that need to be terminated.
> These processes might not terminate immediately when the stop message is
> sent to them. (I have discovered a flaw in the current design but will
> post to a separate thread about this.) The gen_server cannot terminate
> before all its workers have terminated in an orderly fashion, BUT the
> stop function executes in a handle_call and the workers notify the
> gen_server of their termination through a different handle_call.

You can use monitors to make sure that your server doesn't
hang while waiting for a child process that may have died trying
to do the orderly shutdown:

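terminate_children(Children) ->
    %% Monitor each child, then ask it to shut down.
    MRefs = lists:foldl(
              fun(Child, Acc) ->
                      MRef = erlang:monitor(process, Child),
                      Child ! {self(), shutdown},
                      [MRef | Acc]
              end, [], Children),
    collect_mrefs(MRefs).

%% Check the outstanding monitors every 100 ms until all children are down.
collect_mrefs(MRefs) ->
    timer:sleep(100),
    collect_mrefs(MRefs, []).

%% First argument: monitors not yet checked in this round.
%% Left: monitors whose child was still alive when checked.
collect_mrefs([MRef | Rest], Left) ->
    receive
        {'DOWN', MRef, process, _, _} ->
            collect_mrefs(Rest, Left)
    after 0 ->
            collect_mrefs(Rest, [MRef | Left])
    end;
collect_mrefs([], [_|_] = Left) ->
    collect_mrefs(Left);
collect_mrefs([], []) ->
    done.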

If you want to be really paranoid, you could start
a timer using erlang:send_after() before entering the
collect_mrefs loop. This way (if you keep the child pids
as well), you can do your own version of brutal_kill
if the children never terminate of their own accord.
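One way to follow that advice is sketched below. The module name `child_reaper` is hypothetical, and the loop deliberately swaps the 100 ms polling for a blocking selective receive on each monitor reference. A timer started with `erlang:send_after/3` bounds the wait; when it fires, the children still being waited on are killed (the "own version of brutal_kill"). The children are assumed to understand the same `{Pid, shutdown}` message as in terminate_children/1.

```erlang
%% child_reaper: hypothetical module name, for illustration only.
-module(child_reaper).
-export([terminate_children/2]).

%% Ask every child to shut down and wait for each 'DOWN' message.
%% Returns done, or {killed, Pids} if some children were killed after
%% Timeout milliseconds.
terminate_children(Children, Timeout) ->
    Monitored = [{erlang:monitor(process, C), C} || C <- Children],
    lists:foreach(fun(C) -> C ! {self(), shutdown} end, Children),
    Tag = make_ref(),
    TRef = erlang:send_after(Timeout, self(), {shutdown_timeout, Tag}),
    Result = collect(Monitored, Tag),
    erlang:cancel_timer(TRef),
    %% Flush the timeout message in case it fired just before cancelling.
    receive {shutdown_timeout, Tag} -> ok after 0 -> ok end,
    Result.

collect([], _Tag) ->
    done;
collect([{MRef, Pid} | Rest] = Waiting, Tag) ->
    receive
        {'DOWN', MRef, process, Pid, _Reason} ->
            collect(Rest, Tag);
        {shutdown_timeout, Tag} ->
            %% Kill every child whose 'DOWN' has not been collected yet
            %% (some of them may in fact already be dead).
            Stuck = [P || {_, P} <- Waiting],
            lists:foreach(fun(P) -> exit(P, kill) end, Stuck),
            %% 'kill' cannot be trapped, so each remaining 'DOWN' arrives.
            lists:foreach(fun({R, _}) ->
                                  receive {'DOWN', R, process, _, _} -> ok end
                          end, Waiting),
            {killed, Stuck}
    end.
```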


It would then be good if the child doesn't call the parent
synchronously in order to notify it of their own death
(in general, it's a bad idea to have gen_server:call()
in both directions between two processes.)


> This brings me to the {noreply, State} mentioned in another thread.
> How do I know the PID where to send the response at the later time when
> I am ready to quit? The process dictionary perhaps?

You can keep a 'pending' list in your State variable.

> Then use the function that notifies the gen_server of the process
> termination to check if all the processes are down and then reply with
> the {stop, normal, ok, State} message instead of {reply, Reply, State}
> Does this make sense?

As mentioned above, it's dangerous to only rely on the children
telling the parent that they are about to die. Monitors are quite
handy for this sort of thing.
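Combining the 'pending' list with monitors might look like the sketch below (hypothetical module `monitored_pool`). Each worker is started with `spawn_monitor/1`, callers of `stop/1` are parked in `pending`, and the replies go out when the last `'DOWN'` message arrives, whether a worker finished cleanly or crashed on the way. The monitor table is a map, which assumes OTP 17 or later.

```erlang
%% monitored_pool: hypothetical module name, for illustration only.
-module(monitored_pool).
-behaviour(gen_server).

-export([start_link/1, stop/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-record(state, {workers = #{},   % MRef => Pid of each live worker
                pending = []}).  % From of every caller waiting on stop

start_link(NumWorkers) ->
    gen_server:start_link(?MODULE, NumWorkers, []).

stop(Server) ->
    gen_server:call(Server, stop, infinity).

init(NumWorkers) ->
    process_flag(trap_exit, true),
    Workers = maps:from_list(
                [begin {Pid, MRef} = spawn_monitor(fun worker/0), {MRef, Pid} end
                 || _ <- lists:seq(1, NumWorkers)]),
    {ok, #state{workers = Workers}}.

handle_call(stop, _From, #state{workers = W} = State) when map_size(W) =:= 0 ->
    {stop, normal, ok, State};
handle_call(stop, From, #state{workers = W, pending = Pending} = State) ->
    lists:foreach(fun(Pid) -> Pid ! {self(), shutdown} end, maps:values(W)),
    {noreply, State#state{pending = [From | Pending]}}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% A worker is gone, cleanly or not; the monitor tells us either way.
handle_info({'DOWN', MRef, process, _Pid, _Reason},
            #state{workers = W0, pending = Pending} = State) ->
    W = maps:remove(MRef, W0),
    case map_size(W) =:= 0 andalso Pending =/= [] of
        true ->
            lists:foreach(fun(From) -> gen_server:reply(From, ok) end, Pending),
            {stop, normal, State#state{workers = W, pending = []}};
        false ->
            {noreply, State#state{workers = W}}
    end;
handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.

worker() ->
    receive
        {_Server, shutdown} -> ok   % orderly cleanup would go here
    end.
```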

/Uffe
--
Ulf Wiger, Senior System Architect
EAB/UPD/S


vlad_dumitrescu at hotmai
Posted: Tue Nov 25, 2003 12:03 pm
From: "Gunilla Arendt" <gunilla_at_erix.ericsson.se>
> No, the supervisor *is* required to call start_link!

I don't mean to argue with the OTP team :-) but the supervisor docs say, when
describing child specifications:

|StartFunc defines the function call used to start the child process. It should be a
|module-function-arguments tuple {M,F,A} used as apply(M,F,A).
|
|The start function must create and link to the child process, and should return
|{ok,Child} or {ok,Child,Info} where Child is the pid of the child process and Info an
|arbitrary term which is ignored by the supervisor.


Vlad


gunilla at erix.ericsson.
Posted: Tue Nov 25, 2003 12:15 pm
Then the documentation is not correct, very sorry about that.

Best regards, Gunilla

--
_____Gunilla Arendt______________________________________________
EAB/UKH/KD Erlang/OTP Product Development
Gunilla.Arendt_at_ericsson.com +46-8-7275730 ecn 851 5730


ulf.wiger at ericsson.com
Posted: Tue Nov 25, 2003 12:27 pm
On Tue, 25 Nov 2003 11:52:20 +0100, Gunilla Arendt
<gunilla_at_erix.ericsson.se> wrote:

> And yes, there are alternative start strategies that in retrospect
> might have been better choices. For example, simply providing
> the supervisor with the name of the callback module(s) and make it
> responsible for starting its child processes the correct way.

...perhaps a good time for me to remind the adventurous that I
once wrote an alternative supervisor (super_hack.erl) that
(a) offered a different child_spec() syntax, where one specified
the wanted behaviour instead of an {M,F,A}
(b) made it possible for a child to find out how many restarts
had been carried out, including the reason for the first
crash, and how far the restart had escalated.
(c) ...while being fully backward compatible.

Some code to illustrate the type of information made
available to the child during restart (non-blocking,
so it can be called during the init phase):

restart_info() ->
    #restarts{total = Total,
              recent = Recent,
              ordered = Ordered,
              escalation = Escalation,
              mode = Mode,
              intensity = Intensity,
              period = Period,
              restarts = Restarts} = get_restart_info(),
    [{total, Total},
     {recent, Recent},
     {ordered, Ordered},
     {escalation, Escalation},
     {mode, Mode},
     {maxR, Intensity},
     {maxT, Period},
     {restarts, Restarts}].

'ordered' as in 'ordered restarts', i.e. when someone calls
restart_child().

This hack has been mentioned before. Perhaps I should take the hint,
but I still think it was a good idea. (:

http://www.erlang.org/ml-archive/erlang-questions/200307/msg00154.html
http://www.erlang.org/ml-archive/erlang-questions/200102/msg00176.html


Here's an extract from the README file, explaining what hacks were
made to the supervisor module:

supervisor.erl
--------------
1) Support for a new ChildSpecification:

{Name, Behaviour, BehaviourOptions, SupervisionOptions}

SupervisionOptions ::= [Option]
Option ::= {child_type, worker | supervisor} |
           {restart_type, permanent | transient | temporary} |
           {shutdown, Shutdown : integer() | infinity} |
           {modules, [module()]} |
           {maxR, MaxR : integer() | undefined} |
           {maxT, MaxT : integer() | undefined}

This ChildSpec doesn't just specify a function that the supervisor should
call; it specifies the behaviour of the process. This raises the level
of abstraction of the specification, making an important modelling detail
more visible.

If maxR and maxT are specified for a child, the supervisor options
MaxR and MaxT must be specified as 'per_child'. To this end, it is also
legal for the CallbackModule:init/1 function to return:

{ok, {Strategy, per_child}, ChildSpecs}

which is the same as:

{ok, {Strategy, per_child, per_child}, ChildSpecs}

The reason for allowing per_child restart frequency is that different
children under a supervisor may (probably do) have different "weights",
and it is easier to specify different frequencies than creating multiple
supervisors.

Whether the supervisor uses the old (aggregated) restart frequency or
per_child restart frequencies can be checked by the child, using the
function

supervisor:restart_info(mode) -> aggregated | per_child.



Examples are found in testsup.erl and testsup2.erl

2) Added init_it/6, which is used to wrap Behaviour:init_it/6
3) This also allowed inserting restart information into the process
dictionary of the child, and
4) adding the API functions restart_info/0 and restart_info/1.

restart_info() ->
    [{total, NoOfTotalRestartsThisChild},
     {recent, NoOfRestartsWithinMaxT},
     {ordered, NoOfOrderedRestarts},
     {escalation, Escalation},
     {mode, Mode},
     {maxR, MaxR},
     {maxT, Period},
     {restarts, [Timestamp]}]

If Mode == aggregated, then MaxR and MaxT will be the aggregated
values; otherwise, they will be the values for this particular child.

Escalation is [] if there has been no escalation; otherwise, it will be
[RestartsAtLowestLevel | EscalationAtNextHigherLevel]


****

And finally some example code illustrating the above

init([]) ->
    {ok, {{one_for_one, per_child, per_child},
          [
           %% example 1: a generic server
           {testserver, gen_server,
            [{module, testserver},
             {args, []},
             {regname, {local,testserver}}],
            [{child_type, worker},
             {restart_type, permanent},
             {maxR, 3},
             {maxT, 30},
             {shutdown, 2000},
             {modules, [testserver]}]},

           %% example 2: a gen_fsm
           {testfsm, gen_fsm,
            [{module, testfsm},
             {args, []}],
            [{child_type, worker},
             {restart_type, permanent},
             {maxR, 10},
             {maxT, 30},
             {shutdown, 2000},
             {modules, [testfsm]}]},

           %% example 3: a gen_event
           {testevent, gen_event,
            [{regname, {global, testevent}}],
            [{child_type, worker},
             {restart_type, permanent},
             {maxR, 20},
             {maxT, 30},
             {shutdown, 2000},
             {modules, dynamic}]}
          ]}}.

/Uffe

--
Ulf Wiger, Senior System Architect
EAB/UPD/S


ulf.wiger at ericsson.com
Posted: Tue Nov 25, 2003 12:31 pm
On Tue, 25 Nov 2003 13:03:11 +0100, Vlad Dumitrescu
<vlad_dumitrescu_at_hotmail.com> wrote:

> From: "Gunilla Arendt" <gunilla_at_erix.ericsson.se>
>> No, the supervisor *is* required to call start_link!
>
> I don't mean to argue with the OTP team :-) but the supervisor docs say,
> when describing child specifications:

Yes, but there is a small quirk in the gen_server.erl module that
causes it not to heed the shutdown protocol if started with
gen_server:start(), even if you _do_ create the link in some other
way. From the supervisor side, everything is fine, but the gen_server
doesn't recognise the supervisor as its parent.

So, when the child is a gen_server, the start function must be
{gen_server, start_link, [...]} in order for everything to
work correctly.

The same goes for gen_fsm and gen_event.
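The quirk can be observed with a small sketch (hypothetical module `parent_quirk`). As far as I can tell from gen.erl, a server started with `gen_server:start/3` records itself as its own parent, so an `'EXIT'` from a process that linked to it afterwards is handed to `handle_info/2` and the server keeps running; started with `gen_server:start_link/3`, the same signal from the caller leads to `terminate/2`.

```erlang
%% parent_quirk: hypothetical module name, for illustration only.
-module(parent_quirk).
-behaviour(gen_server).

-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

init(Notify) ->
    process_flag(trap_exit, true),
    {ok, Notify}.

handle_call(_Request, _From, Notify) ->
    {reply, ok, Notify}.

handle_cast(_Msg, Notify) ->
    {noreply, Notify}.

%% After gen_server:start/3, an 'EXIT' from a process that linked later
%% is not from the parent, so it is just an ordinary message.
handle_info({'EXIT', From, Reason}, Notify) ->
    Notify ! {info_exit, From, Reason},
    {noreply, Notify};
handle_info(_Info, Notify) ->
    {noreply, Notify}.

%% After gen_server:start_link/3, the same 'EXIT' comes from the parent,
%% so gen_server calls terminate/2 and then exits.
terminate(Reason, Notify) ->
    Notify ! {terminated, Reason},
    ok.

code_change(_OldVsn, Notify, _Extra) ->
    {ok, Notify}.
```

Starting it with `gen_server:start(parent_quirk, self(), [])`, linking, and calling `exit(Pid, shutdown)` should produce an `{info_exit, ...}` message with the server still alive; doing the same after `gen_server:start_link/3` should produce `{terminated, shutdown}`.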

/Uffe

--
Ulf Wiger, Senior System Architect
EAB/UPD/S


vlad_dumitrescu at hotmai
Posted: Tue Nov 25, 2003 12:45 pm
From: "Ulf Wiger" <ulf.wiger_at_ericsson.com>
> Yes, but there is a small quirk in the gen_server.erl module that
> causes it not to heed the shutdown protocol if started with
> gen_server:start(), even if you _do_ create the link in some other
> way. From the supervisor side, everything is fine, but the gen_server
> doesn't recognise the supervisor as its parent.
>
> So, when the child is a gen_server, the start function must be
> {gen_server, start_link, [...]} in order for everything to
> work correctly.
>
> The same goes for gen_fsm and gen_event.

Okay, now I see... This should be documented, I think, because using supervisors
may be confusing enough even without hidden inconsistencies.

In any case it's probably safer to always use start_link (and perhaps unlink
afterwards) than to use start and link later.

regards,
Vlad


vlad_dumitrescu at hotmai
Posted: Tue Nov 25, 2003 1:17 pm
From: "Ulf Wiger" <ulf.wiger_at_ericsson.com>
> ...perhaps a good time for me to remind the adventurous that I
> once wrote an alternative supervisor (super_hack.erl)

Cool!

I'm interested. Do you have a link to the archive?

/Vlad


All times are GMT