Erlang/OTP Forums

Erlang patches mailing list ~ Running mnesia across a firewall

asergey
Posted: Wed Mar 05, 2008 3:58 am
Most of my mnesia deployments have had a fairly straightforward network
setup, but recently I needed to use mnesia's remote interface in the
presence of a firewall. I would like to share my experience, along with
a net_kernel patch that adds a useful feature.

The setup was as follows: two nodes (NodeA & NodeB) are inside the
firewall and NodeC is outside the firewall. NodeA & NodeB can connect
to NodeC, but all inbound access from NodeC is blocked. NodeA & NodeB
are running mnesia with a few tables with disc_copies. NodeC uses
mnesia's remote interface to access data tables on nodes inside the
firewall.

The diagram below shows the network topology with the corresponding
Erlang kernel options.

  NodeA
    ^  {dist_auto_connect, once}
    |
    |
    |  INSIDE  |  OUTSIDE
    +----------|firewall------> NodeC
    |          |                {dist_auto_connect, never},
    |                           {inet_dist_listen_min, X},
    v                           {inet_dist_listen_max, Y},
  NodeB                         {global_groups, [{outside, [NodeC]}]}
    {dist_auto_connect, once}

Notes about NodeC kernel options:
  - {dist_auto_connect, never} prevents the node from
    attempting connections to nodes inside the firewall.
  - The min/max listen options restrict the distribution listen
    port to a small reserved range, which keeps the firewall
    rules minimal.
  - global_groups prevents global from connecting to and syncing
    with the other nodes inside the firewall after a connection
    is made to one of them.


All three nodes subscribe to node events via net_kernel:monitor_nodes/1.
When NodeA disconnects from NodeB (or vice versa), the two nodes enable
a UDP heartbeat and, upon detecting that the peer node is responding
again, restart one of the two nodes, which re-syncs mnesia.

If network access between NodeA (or NodeB) and NodeC is lost, NodeC
shuts down the mnesia application and waits for a {nodeup, NodeA} event,
after which it starts mnesia again. Without this approach (i.e. if
mnesia were left running on NodeC during the outage), once the network
heals and NodeC reconnects to NodeA/B, mnesia would detect a
partitioned network and stop replicating data between NodeA and NodeB.
That is quite odd, because NodeC has no local tables, and it is
undesirable for its visibility to affect the nodes inside the firewall.
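
As a rough illustration of the NodeC behaviour described above, here is
a minimal sketch, not the code actually used in this setup. The module
name mnesia_guard and the functions start/1 and decide/2 are
hypothetical. The sketch assumes that a nodedown from any inside node
should stop mnesia and a nodeup from any inside node should start it
again.

```erlang
-module(mnesia_guard).
-export([start/1, decide/2]).

%% Spawn a watcher process for the given inside nodes (hypothetical helper).
start(InsideNodes) ->
    spawn(fun() ->
                  %% Subscribe to {nodeup, Node} / {nodedown, Node} messages.
                  _ = net_kernel:monitor_nodes(true),
                  loop(InsideNodes)
          end).

loop(InsideNodes) ->
    receive
        Event ->
            case decide(Event, InsideNodes) of
                stop_mnesia  -> _ = application:stop(mnesia);
                start_mnesia -> _ = application:start(mnesia);
                ignore       -> ok
            end,
            loop(InsideNodes)
    end.

%% Pure policy: map a node event to an action (an assumption of this sketch).
decide({nodedown, Node}, InsideNodes) ->
    action_if_inside(Node, InsideNodes, stop_mnesia);
decide({nodeup, Node}, InsideNodes) ->
    action_if_inside(Node, InsideNodes, start_mnesia);
decide(_Other, _InsideNodes) ->
    ignore.

%% Only events about nodes inside the firewall trigger an action.
action_if_inside(Node, InsideNodes, Action) ->
    case lists:member(Node, InsideNodes) of
        true  -> Action;
        false -> ignore
    end.
```

On NodeC this could be started with
mnesia_guard:start([a@hostA, b@hostB]). Keeping decide/2 free of side
effects makes the policy easy to change, for example to stop mnesia only
when both inside nodes are gone.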

The main problem with this setup is that when NodeC loses its connection
to NodeA/NodeB, one of these two nodes needs to periodically attempt to
reconnect to NodeC. However, because the {dist_auto_connect, once}
option is used on NodeA/NodeB, net_kernel will not allow the connection
to NodeC to be re-established unless *both* NodeA and NodeB are
bounced!

The main culprit is net_kernel's dist_auto_connect option, which is an
all-or-nothing setting that cannot vary with the node being connected
to. The attached patch (for R12B-1) solves this issue by introducing an
additional kernel option:

{dist_auto_connect, {callback, M, F}}

This option registers a callback function M:F/2 with the signature:

    (Action, Node) -> Mode
        Action = connect | disconnect
        Node = node()
        Mode = once | never | true

The callback is called when Node tries to connect (or loses its
connection). The modes once and never are documented in kernel(3), and
'true' means to proceed with the connection action.

This patch makes it possible to define one connection behavior between
a@hostA and b@hostB, and a different one between a@hostA (or b@hostB)
and c@hostC.

If others find this option as useful as I do, perhaps we can persuade
the OTP team to merge this patch into the distribution.

Regards,

Serge.

P.S. Here's a sample implementation of this custom function:

nodeA&B.config:
===============
[
  {kernel,
    [
      {dist_auto_connect,
        {callback, net_kernel_connector, dist_auto_connect}}
    ]},

  {mnesia,
    [
      {extra_db_nodes, [a@hostA, b@hostB]}
    ]}
].

nodeC.config:
=============
[
  {kernel,
    [
      {dist_auto_connect, never},
      {global_groups, [{outside, [c@hostC]}]},
      {inet_dist_listen_min, 8111},
      {inet_dist_listen_max, 8119}
    ]},

  {mnesia,
    [
      {extra_db_nodes, [a@hostA, b@hostB]}
    ]}
].


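-module(net_kernel_connector).
-export([dist_auto_connect/2]).

%% Called by the patched net_kernel when Node connects or disconnects.
%% Returns the dist_auto_connect mode to apply to that node.
dist_auto_connect(_Action, Node) ->
    case application:get_env(mnesia, extra_db_nodes) of
        {ok, Masters} ->
            IamMaster    = lists:member(node(), Masters),
            NodeIsMaster = lists:member(Node, Masters),

            case {IamMaster, NodeIsMaster} of
                {true, true} ->
                    %% Master-to-master links keep the 'once' behavior.
                    once;
                {_, _} ->
                    has_access(node(), Node)
            end;
        _ ->
            has_access(node(), Node)
    end.

%% When the local node runs on hostC the mode is 'never';
%% otherwise the connection action proceeds.
has_access(From, To) -> has_access2(host(From), host(To)).

has_access2('hostC', _) -> never;
has_access2(_, _)       -> true.

%% Extract the host part of a node name, e.g. a@hostA -> hostA.
host(Node) ->
    L = atom_to_list(Node),
    [_, H] = string:tokens(L, "@"),
    list_to_atom(H).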



uwiger
Posted: Wed Mar 05, 2008 8:14 am
Serge Aleynikov wrote:
>
> The main culprit is the net_kernel's dist_auto_connect option that is an
> all or none setting that cannot vary depending on connecting attempt to
> a given node. The attached patch (for R12B-1) solves this issue by
> introducing an additional kernel option:
>
> {dist_auto_connect, {callback, M, F}}

Have you thought about solving it with an application that
periodically tries calling net_kernel:connect(Node)?

BR,
Ulf W
_______________________________________________
erlang-patches mailing list
erlang-patches@erlang.org
http://www.erlang.org/mailman/listinfo/erlang-patches
asergey
Posted: Wed Mar 05, 2008 12:09 pm
Ulf Wiger (TN/EAB) wrote:
> Serge Aleynikov wrote:
>>
>> The main culprit is the net_kernel's dist_auto_connect option that is
>> an all or none setting that cannot vary depending on connecting
>> attempt to a given node. The attached patch (for R12B-1) solves this
>> issue by introducing an additional kernel option:
>>
>> {dist_auto_connect, {callback, M, F}}
>
> Have you thought about solving it with an application that
> periodically tries calling net_kernel:connect(Node)?

On which node, though? If this is one of the "master" nodes, A, holding
a disk copy of a table X, then the node must have {dist_auto_connect,
once} set. If the firewall prohibits inbound access to node A from
some other node C that uses the remote mnesia interface to access table
X, then the only way to establish a connection to node C is to call
net_kernel:connect(C) on node A. However, if that connection drops,
there is no way to re-establish it without restarting node A. Remember
that with {dist_auto_connect, once} net_kernel checks whether a
connection is barred, and if it is, it will not allow connecting to a
node that was previously connected.

Serge
uwiger
Posted: Wed Mar 05, 2008 12:28 pm
Serge Aleynikov wrote:
> Ulf Wiger (TN/EAB) wrote:
>> Serge Aleynikov wrote:
>>>
>>> The main culprit is the net_kernel's dist_auto_connect option that is
>>> an all or none setting that cannot vary depending on connecting
>>> attempt to a given node. The attached patch (for R12B-1) solves this
>>> issue by introducing an additional kernel option:
>>>
>>> {dist_auto_connect, {callback, M, F}}
>>
>> Have you thought about solving it with an application that
>> periodically tries calling net_kernel:connect(Node)?
>
> On which node, though? If this is one of the "master" nodes A holding a
> disk copy of a table X, then the node must have {dist_auto_connect,
> once} set. If the firewall prohibits inbound access to this node A from
> some other node C that uses remote mnesia interface to access table X,
> then the only way to establish connection to node C is to do on node A
> net_kernel:connect(C). However if that connection drops, there's no way
> to reestablish that connection without restarting node A. Remember that
> in case of {dist_auto_connect, once} net_kernel checks if a connection
> is barred and if it is it won't allow to connect to a node that
> previously was connected.

Correction 1: It's net_kernel:connect_node(Node). My bad.

Correction 2: net_kernel:connect_node/1 ignores the value of
dist_auto_connect
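
The periodic reconnect discussed here could be sketched roughly as
follows. net_kernel:connect_node/1 is the real kernel function; the
module name reconnector and the functions start/2 and missing/2 are
hypothetical names introduced only for illustration. The sketch assumes
it runs on a node inside the firewall (NodeA or NodeB), since only those
can open connections to NodeC.

```erlang
-module(reconnector).
-export([start/2, missing/2]).

%% Spawn a process that retries every IntervalMs milliseconds.
start(Targets, IntervalMs) ->
    spawn(fun() -> loop(Targets, IntervalMs) end).

loop(Targets, IntervalMs) ->
    %% connect_node/1 returns true, false or ignored; failures are simply
    %% retried on the next round.
    _ = [net_kernel:connect_node(N) || N <- missing(Targets, nodes())],
    timer:sleep(IntervalMs),
    loop(Targets, IntervalMs).

%% Targets that are not in the list of currently connected nodes.
missing(Targets, Connected) ->
    [N || N <- Targets, not lists:member(N, Connected)].
```

For example, reconnector:start([c@hostC], 5000) would retry the
connection to c@hostC every five seconds while it is down.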

What we've done is to keep a "maintenance channel" (not distributed
Erlang), over which we can negotiate which node should restart.

BR,
Ulf W
asergey
Posted: Thu Mar 06, 2008 12:50 pm
Ulf Wiger (TN/EAB) wrote:
> Serge Aleynikov wrote:
>> On which node, though? If this is one of the "master" nodes A holding
>> a disk copy of a table X, then the node must have {dist_auto_connect,
>> once} set. If the firewall prohibits inbound access to this node A
>> from some other node C that uses remote mnesia interface to access
>> table X, then the only way to establish connection to node C is to do
>> on node A net_kernel:connect(C). However if that connection drops,
>> there's no way to reestablish that connection without restarting node
>> A. Remember that in case of {dist_auto_connect, once} net_kernel
>> checks if a connection is barred and if it is it won't allow to
>> connect to a node that previously was connected.
>
> Correction 1: It's net_kernel:connect_node(Node). My bad.
>
> Correction 2: net_kernel:connect_node/1 ignores the value of
> dist_auto_connect
>
> What we've done is to keep a "maintenance channel" (not distr Erlang),
> over which we can negotiate which node should restart.

It turned out that I was making the same mistake by using
net_kernel:connect(Node) rather than net_kernel:connect_node(Node).
It is quite easy to get confused, as the two functions have the same
signature. :-(

Thanks for pointing this out!

So, to make mnesia work across a firewall, a combination of kernel
options (including global_groups) and user-level pinging, starting and
stopping of remote mnesia is sufficient.

Serge