Erlang/OTP Forums

<  Erlang  ~  The null pointer problem

jz87
Posted: Fri Jan 11, 2008 7:50 pm
Joined: 10 Jan 2008 Posts: 3
So I run into this problem a lot. I'm working with a bunch of processes, then some bug in my code crashes a process. The other processes that work with it don't know that it crashed and keep sending messages to it. If they are using synchronous rpc calls, then this locks up the other processes that depend on this dead process. This is basically the null pointer problem you get in languages like Java. What makes this a nasty problem is that you can't even check whether a process is alive with is_process_alive in guard expressions.

I can restart a crashed process with a supervisor, but it won't have the same Pid. So I need some way of notifying every other process that depends on it that this process crashed, and of distributing the new Pid. This basically turns a small problem into a big problem. It creates a whole slew of dependencies between processes. To make a simple rpc reliable, you have to build a whole Pid distribution network. Isn't there some way of intercepting messages sent to the old Pid and forwarding them or notifying the callers to update their address books?
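
For the lock-up described above, one caller-side option is to monitor the target for the duration of the call and add a timeout, so a dead process produces an error instead of a hang. This is only a minimal sketch: the module name safe_call and the {call, From, Ref, Request} / {Ref, Reply} message protocol are made up for illustration, and the server is assumed to reply in that shape.

```erlang
-module(safe_call).
-export([call/3]).

%% Send Request to Pid and wait for a reply, without hanging if Pid is
%% dead or dies while we wait.
%% Assumption: the server answers with From ! {Ref, Reply}.
call(Pid, Request, Timeout) ->
    %% If Pid is already dead, the 'DOWN' message arrives immediately
    %% with reason noproc.
    Ref = erlang:monitor(process, Pid),
    Pid ! {call, self(), Ref, Request},
    receive
        {Ref, Reply} ->
            erlang:demonitor(Ref, [flush]),
            {ok, Reply};
        {'DOWN', Ref, process, Pid, Reason} ->
            {error, {down, Reason}}
    after Timeout ->
        erlang:demonitor(Ref, [flush]),
        {error, timeout}
    end.
```

The caller still has to decide what to do with {error, ...}, for example look the process up again and retry.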
bluefly
Posted: Fri Jan 11, 2008 8:30 pm
User Joined: 06 Jan 2008 Posts: 10
I am looking at this problem in two ways: one is a spiffy trick for registering process names, and the other is an architecture adjustment.

With this first trick, I am assuming you are spawning anonymous processes that do not deserve a global registered name. Why not give it a registered name that is unique but memorable? You can do that with this kind of call:
Code:
CPPid = spawn(...),  % spawn your crashing process

UniqueIdentifier = ...,  % must be a string (list), since it is appended with ++

register(
    erlang:list_to_atom(
        "crashing_process_" ++ UniqueIdentifier),
    CPPid   % the crashing process pid
).
Then, when you restart your process, you just give it the same generated name. The generated name could be, for example, pid_to_list(PidOfSomeMasterProcess).
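
A slightly fuller sketch of the same trick, in case it helps. The module name named_worker, the ping/pong protocol, and the helper functions are hypothetical; the point is only that callers address the worker by its generated name and never hold on to a Pid.

```erlang
-module(named_worker).
-export([start/1, send/2, loop/0]).

%% Build the generated name. Id is assumed to be a string, e.g. "42".
name(Id) ->
    list_to_atom("crashing_process_" ++ Id).

%% Start (or restart) the worker and register it under the generated name.
%% register/2 raises badarg if the name is still taken.
start(Id) ->
    Pid = spawn(?MODULE, loop, []),
    true = register(name(Id), Pid),
    Pid.

%% Look the worker up by name on every send, so a restarted worker is
%% found without anyone distributing the new Pid.
send(Id, Msg) ->
    case whereis(name(Id)) of
        undefined -> {error, noproc};
        Pid -> Pid ! Msg, ok
    end.

%% Trivial worker body, for illustration only.
loop() ->
    receive
        {ping, From} -> From ! pong, loop();
        stop -> ok
    end.
```

Keep in mind that atoms are never garbage collected, so generating names from an unbounded set of identifiers will eventually fill the atom table.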

The second way is that you should always use spawn_link() and link() to associate the web of processes to each other so that they can detect when there is an odd situation and handle it appropriately. Because the processes are link()ed, they can be immediately made aware that the process has gone down. Restarting a process via some controller process is only one piece of the robustness puzzle for a given app; the other processes that are aware of that process need to safely handle the oddball situations, too.
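
A minimal sketch of what "being made aware" looks like in practice. Note that a linked process only receives a message if it traps exits; without process_flag(trap_exit, true), an abnormal exit of the linked process takes the linking process down with it. The module name link_demo and the exit reason simulated_bug are made up for this example.

```erlang
-module(link_demo).
-export([watch/0]).

%% Spawn a linked worker that crashes, and report how the caller sees it.
watch() ->
    %% Turn incoming exit signals into {'EXIT', Pid, Reason} messages.
    Old = process_flag(trap_exit, true),
    Pid = spawn_link(fun() -> exit(simulated_bug) end),
    Result =
        receive
            {'EXIT', Pid, Reason} -> {worker_down, Reason}
        after 1000 ->
            timeout
        end,
    %% Restore the caller's previous trap_exit setting.
    process_flag(trap_exit, Old),
    Result.
```

Calling link_demo:watch() returns {worker_down, simulated_bug}; what the caller does next (restart, look up a new Pid, give up) is the application's decision.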

I do not recommend the first way, as I think the entire system of interacting processes needs to be well-understood, robust, and not given temporary adjustments/controls that might produce code maintenance problems or cascading failures as the project evolves. The link() BIF is probably really what you are looking for.
Mazen
Posted: Sat Jan 12, 2008 8:33 am
User Joined: 20 Jul 2006 Posts: 164 Location: London
I agree with bluefly, you probably want to use spawn_link (or spawn and then link).

I think the point of asynchronous messages is that they let you create an architecture where processes have few dependencies and are allowed to crash without impacting too much on their environment. Having been involved in a few "largish" projects I can safely say that I love this.

Basically, either you care about a process dying/crashing or you don't, and 95% of the time you don't. When you do care, stick to Supervisor and think hierarchy; don't try to create a flat structure like a web where everyone knows everyone, unless you really have to of course. If you have created a web, have a look at your architecture; perhaps you can "unweb" it. Otherwise you can often use a mapping process where processes register and calling processes get the pid based on a more static id.
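
A rough sketch of such a mapping process, under a few assumptions: the module and registered name pid_map, the function names, and the message shapes are all invented for this example. The mapper monitors every registered process and drops the entry when that process dies, so callers never get a stale Pid from it.

```erlang
-module(pid_map).
-export([start/0, register_id/2, lookup/1, init/0]).

%% Start the mapping process under the (hypothetical) name pid_map.
start() ->
    Pid = spawn(?MODULE, init, []),
    true = register(pid_map, Pid),
    Pid.

%% Associate a static Id with the current Pid of a process.
register_id(Id, Pid) ->
    request({register, Id, Pid}).

%% Resolve a static Id to the Pid currently registered for it.
lookup(Id) ->
    request({lookup, Id}).

%% Raises badarg if pid_map has not been started.
request(Msg) ->
    Ref = make_ref(),
    pid_map ! {self(), Ref, Msg},
    receive
        {Ref, Reply} -> Reply
    after 1000 ->
        {error, timeout}
    end.

init() ->
    loop([]).

%% Entries is a list of {Id, Pid, MonitorRef}.
loop(Entries) ->
    receive
        {From, Ref, {register, Id, Pid}} ->
            MRef = erlang:monitor(process, Pid),
            From ! {Ref, ok},
            loop(lists:keystore(Id, 1, Entries, {Id, Pid, MRef}));
        {From, Ref, {lookup, Id}} ->
            Reply = case lists:keyfind(Id, 1, Entries) of
                        {Id, Pid, _MRef} -> {ok, Pid};
                        false -> {error, not_found}
                    end,
            From ! {Ref, Reply},
            loop(Entries);
        {'DOWN', MRef, process, _Pid, _Reason} ->
            %% The registered process died: forget its entry.
            loop(lists:keydelete(MRef, 3, Entries))
    end.
```

A restarted process simply calls pid_map:register_id/2 again with the same Id, and callers do a lookup before each request instead of caching the Pid.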

So in the end I think it is more of an architectural issue tbh. You will normally have 2 types of processes (at least), servers and workers; servers serve the workers with information and need to stay operational, but if a worker dies the death should have no impact on anything.
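
To make the server side concrete, here is a small sketch of a supervisor keeping one named server alive. Everything specific in it is an assumption for illustration: the module name srv_sup, the registered name info_server, and the {get, From, Ref} protocol. In a real system the server would more likely be a gen_server registered with {local, Name}.

```erlang
-module(srv_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_server/0]).

start_link() ->
    supervisor:start_link({local, srv_sup}, ?MODULE, []).

%% Restart the server whenever it dies, up to 5 times in 10 seconds.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Child = #{id => info_server,
              start => {?MODULE, start_server, []},
              restart => permanent},
    {ok, {SupFlags, [Child]}}.

%% Called by the supervisor on every (re)start. The new Pid is
%% registered under the same name, so workers never see it change.
start_server() ->
    Pid = proc_lib:spawn_link(fun server_loop/0),
    true = register(info_server, Pid),
    {ok, Pid}.

%% Hypothetical server body: answers {get, From, Ref} requests.
server_loop() ->
    receive
        {get, From, Ref} ->
            From ! {Ref, some_info},
            server_loop()
    end.
```

Workers then talk to info_server by name; if a worker dies, nothing here notices or cares, and if the server dies the supervisor brings it back under the same name.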
All times are GMT