Erlang Mailing Lists

Author Message

<  Erlang bugs mailing list  ~  Mnesia deadlocks (?) while loading table

Guest
Posted: Sun Jan 24, 2010 7:52 am Reply with quote
Guest
I have been able to reproduce this some times on a 64-bit Ubuntu, a
64-bit CentOS and a 32-bit Ubuntu, running R13B02.

Open 2 terminals and start one node on each:
erl -sname test1
erl -sname test2
Create the schema from the first node (substitute ijaba2 with your
server's name):
(test1@ijaba2)1> ok = mnesia:create_schema([node(), test2@ijaba2]),
mnesia:start().
Start Mnesia also on the second node:
(test2@ijaba2)1> mnesia:start().
Create a test table from the first node and start writing to it:
(test1@ijaba2)2> mnesia:create_table(test, [{disc_only_copies,
mnesia:system_info(running_db_nodes)}]).
(test1@ijaba2)3> W = fun(F, N) -> mnesia:sync_transaction(fun
mnesia:write/1, [{test, N, N}]), F(F, N + 1) end, W(W, 1).
Restart Mnesia on the second node and wait for the table to be loaded:
(test2@ijaba2)2> mnesia:stop(), ok = mnesia:start().
(test2@ijaba2)3> mnesia:wait_for_tables([test], infinity).


For some runs of this experiment, the table will never load.
It seems the current writer transaction on the first node is waiting
for a commit, while holding a write lock:
(test3@ijaba2)1> rpc:call(test1@ijaba2, mnesia, system_info, [held_locks]).
[{{test,14691},write,{tid,14696,<6217.41.0>}}]
(test3@ijaba2)3> rpc:call(test1@ijaba2, erlang, process_info,
[list_to_pid("<6217.41.0>"), current_function]).
{current_function,{mnesia_tm,rec_all,4}}

The second node is waiting for the table to be received
(test3@ijaba2)4> rpc:call(test2@ijaba2, mnesia, system_info,
[held_locks]).
[{{schema,test},read,{tid,14009,<6358.199.0>}}]
(test3@ijaba2)5> rpc:call(test2@ijaba2, erlang, process_info,
[list_to_pid("<6358.199.0>"), current_function]).
{current_function,{mnesia_loader,wait_on_load_complete,1}}

And there's a third transaction going on (the table sender?):
(test3@ijaba2)7> rpc:call(test1@ijaba2, mnesia, system_info, [transactions]).
[{14696,<6217.41.0>,coordinator},
{14697,<6217.176.0>,coordinator}]
(test3@ijaba2)8> rpc:call(test1@ijaba2, erlang, process_info,
[list_to_pid("<6217.176.0>"), current_function]).
{current_function,{timer,sleep,1}}

None of the 3 transactions makes any progress, so maybe there's a
circular waiting here (deadlock).

I hope you can reproduce it easily. It seems to depend on whether the
first node notices test2's restart before or after test2 starts
loading the table. But maybe I'm wrong and there's another race
condition.

Igor.

--
"The secret of joy in work is contained in one word - excellence. To
know how to do something well is to enjoy it." - Pearl S. Buck.

________________________________________________________________
erlang-bugs mailing list. See http://www.erlang.org/faq.html
erlang-bugs (at) erlang.org

Post received from mailinglist
Guest
Posted: Sun Jan 24, 2010 4:49 pm Reply with quote
Guest
It's not so easy as I thought to reproduce the problem with only one table.
I am attaching code that creates more tables and executes transactions
with two of them. This is not a necessary condition to reproduce the
problem, but it helps a lot.

Open 2 terminals and start one node on each:
erl -sname test1
erl -sname test2
Create everything from the first node (substitute igorrs with your
server's name):
(test1@igorrs)1> load_dl:start(test2@igorrs).
Run this restart loop on second node:
(test2@igorrs)1> load_dl:restart_forever().

At some point, the second node will stop printing and the last message
will be "Started. Waiting for tables." It will be hung forever, not
being able to load the tables. At the same time, the current
transaction on the first node will also be hung forever (I got some
info - see below - by running some RPCs from a third node).

Igor.

On Sun, Jan 24, 2010 at 5:51 AM, Igor Ribeiro Sucupira <igorrs@gmail.com> wrote:
> I have been able to reproduce this some times on a 64-bit Ubuntu, a
> 64-bit CentOS and a 32-bit Ubuntu, running R13B02.
>
> Open 2 terminals and start one node on each:
> erl -sname test1
> erl -sname test2
> Create the schema from the first node (substitute ijaba2 with your
> server's name):
> (test1@ijaba2)1> ok = mnesia:create_schema([node(), test2@ijaba2]),
> mnesia:start().
> Start Mnesia also on the second node:
> (test2@ijaba2)1> mnesia:start().
> Create a test table from the first node and start writing to it:
> (test1@ijaba2)2> mnesia:create_table(test, [{disc_only_copies,
> mnesia:system_info(running_db_nodes)}]).
> (test1@ijaba2)3> W = fun(F, N) -> mnesia:sync_transaction(fun
> mnesia:write/1, [{test, N, N}]), F(F, N + 1) end, W(W, 1).
> Restart Mnesia on the second node and wait for the table to be loaded:
> (test2@ijaba2)2> mnesia:stop(), ok = mnesia:start().
> (test2@ijaba2)3> mnesia:wait_for_tables([test], infinity).
>
>
> For some runs of this experiment, the table will never load.
> It seems the current writer transaction on the first node is waiting
> for a commit, while holding a write lock:
> (test3@ijaba2)1> rpc:call(test1@ijaba2, mnesia, system_info, [held_locks]).
> [{{test,14691},write,{tid,14696,<6217.41.0>}}]
> (test3@ijaba2)3> rpc:call(test1@ijaba2, erlang, process_info,
> [list_to_pid("<6217.41.0>"), current_function]).
> {current_function,{mnesia_tm,rec_all,4}}
>
> The second node is waiting for the table to be received
> (test3@ijaba2)4> rpc:call(test2@ijaba2, mnesia, system_info,
> [held_locks]).
> [{{schema,test},read,{tid,14009,<6358.199.0>}}]
> (test3@ijaba2)5> rpc:call(test2@ijaba2, erlang, process_info,
> [list_to_pid("<6358.199.0>"), current_function]).
> {current_function,{mnesia_loader,wait_on_load_complete,1}}
>
> And there's a third transaction going on (the table sender?):
> (test3@ijaba2)7> rpc:call(test1@ijaba2, mnesia, system_info, [transactions]).
> [{14696,<6217.41.0>,coordinator},
>
dgud
Posted: Mon Jan 25, 2010 7:40 am Reply with quote
User Joined: 01 Mar 2005 Posts: 29 Location: Stockholm
Thanks, I'll have a look at it.
I'm on a hunt for a deadlock I can't reproduce...

/Dan

On Sun, Jan 24, 2010 at 5:48 PM, Igor Ribeiro Sucupira <igorrs@gmail.com> wrote:
> It's not so easy as I thought to reproduce the problem with only one table.
> I am attaching code that creates more tables and executes transactions
> with two of them. This is not a necessary condition to reproduce the
> problem, but it helps a lot.
>
> Open 2 terminals and start one node on each:
> erl -sname test1
> erl -sname test2
> Create everything from the first node (substitute igorrs with your
> server's name):
> (test1@igorrs)1> load_dl:start(test2@igorrs).
> Run this restart loop on second node:
> (test2@igorrs)1> load_dl:restart_forever().
>
> At some point, the second node will stop printing and the last message
> will be "Started. Waiting for tables." It will be hung forever, not
> being able to load the tables. At the same time, the current
> transaction on the first node will also be hung forever (I got some
> info - see below - by running some RPCs from a third node).
>
> Igor.
>
> On Sun, Jan 24, 2010 at 5:51 AM, Igor Ribeiro Sucupira <igorrs@gmail.com> wrote:
>> I have been able to reproduce this some times on a 64-bit Ubuntu, a
>> 64-bit CentOS and a 32-bit Ubuntu, running R13B02.
>>
>> Open 2 terminals and start one node on each:
>> erl -sname test1
>> erl -sname test2
>> Create the schema from the first node (substitute ijaba2 with your
>> server's name):
>> (test1@ijaba2)1> ok = mnesia:create_schema([node(), test2@ijaba2]),
>> mnesia:start().
>> Start Mnesia also on the second node:
>> (test2@ijaba2)1> mnesia:start().
>> Create a test table from the first node and start writing to it:
>> (test1@ijaba2)2> mnesia:create_table(test, [{disc_only_copies,
>> mnesia:system_info(running_db_nodes)}]).
>> (test1@ijaba2)3> W = fun(F, N) -> mnesia:sync_transaction(fun
>> mnesia:write/1, [{test, N, N}]), F(F, N + 1) end, W(W, 1).
>> Restart Mnesia on the second node and wait for the table to be loaded:
>> (test2@ijaba2)2> mnesia:stop(), ok = mnesia:start().
>> (test2@ijaba2)3> mnesia:wait_for_tables([test], infinity).
>>
>>
>> For some runs of this experiment, the table will never load.
>> It seems the current writer transaction on the first node is waiting
>> for a commit, while holding a write lock:
>> (test3@ijaba2)1> rpc:call(test1@ijaba2, mnesia, system_info, [held_locks]).
>> [{{test,14691},write,{tid,14696,<6217.41.0>}}]
>> (test3@ijaba2)3> rpc:call(test1@ijaba2, erlang, process_info,
>> [list_to_pid("<6217.41.0>"), current_function]).
>> {current_function,{mnesia_tm,rec_all,4}}
>>
>> The second node is waiting for the table to be received
>> (test3@ijaba2)4> rpc:call(test2@ijaba2, mnesia, system_info,
>> [held_locks]).
>> [{{schema,test},read,{tid,14009,<6358.199.0>}}]
>> (test3@ijaba2)5> rpc:call(test2@ijaba2, erlang, process_info,
>> [list_to_pid("<6358.199.0>"), current_function]).
>> {current_function,{mnesia_loader,wait_on_load_complete,1}}
>>
>> And there's a third transaction going on (the table sender?):
>> (test3@ijaba2)7> rpc:call(test1@ijaba2, mnesia, system_info, [transactions]).
>> [{14696,<6217.41.0>,coordinator},
>>
View user's profile Send private message MSN Messenger
Guest
Posted: Mon Jan 25, 2010 1:01 pm Reply with quote
Guest
Here is a patch:

diff --git a/lib/mnesia/src/mnesia_tm.erl b/lib/mnesia/src/mnesia_tm.erl
index 3f3a10a..5a2407d 100644
--- a/lib/mnesia/src/mnesia_tm.erl
+++ b/lib/mnesia/src/mnesia_tm.erl
@@ -1388,7 +1388,9 @@ multi_commit(sync_sym_trans, Tid, CR, Store) ->
{WaitFor, Local} = ask_commit(sync_sym_trans, Tid, CR, DiscNs, RamNs),
{Outcome, []} = rec_all(WaitFor, Tid, do_commit, []),
?eval_debug_fun({?MODULE, multi_commit_sym_sync},
- [{tid, Tid}, {outcome, Outcome}]),
+ [{tid, Tid}, {outcome, Outcome}]),
+ [?ets_insert(Store, {waiting_for_commit_ack, Node}) ||
+ Node <- WaitFor],
rpc:abcast(DiscNs -- [node()], ?MODULE, {Tid, Outcome}),
rpc:abcast(RamNs -- [node()], ?MODULE, {Tid, Outcome}),
case Outcome of

/Dan

On Mon, Jan 25, 2010 at 8:39 AM, Dan Gudmundsson <dangud@gmail.com> wrote:
> Thanks, I'll have a look at it.
> I'm on a hunt for a deadlock I can't reproduce...
>
> /Dan
>
> On Sun, Jan 24, 2010 at 5:48 PM, Igor Ribeiro Sucupira <igorrs@gmail.com> wrote:
>> It's not so easy as I thought to reproduce the problem with only one table.
>> I am attaching code that creates more tables and executes transactions
>> with two of them. This is not a necessary condition to reproduce the
>> problem, but it helps a lot.
>>
>> Open 2 terminals and start one node on each:
>> erl -sname test1
>> erl -sname test2
>> Create everything from the first node (substitute igorrs with your
>> server's name):
>> (test1@igorrs)1> load_dl:start(test2@igorrs).
>> Run this restart loop on second node:
>> (test2@igorrs)1> load_dl:restart_forever().
>>
>> At some point, the second node will stop printing and the last message
>> will be "Started. Waiting for tables." It will be hung forever, not
>> being able to load the tables. At the same time, the current
>> transaction on the first node will also be hung forever (I got some
>> info - see below - by running some RPCs from a third node).
>>
>> Igor.
>>
>> On Sun, Jan 24, 2010 at 5:51 AM, Igor Ribeiro Sucupira <igorrs@gmail.com> wrote:
>>> I have been able to reproduce this some times on a 64-bit Ubuntu, a
>>> 64-bit CentOS and a 32-bit Ubuntu, running R13B02.
>>>
>>> Open 2 terminals and start one node on each:
>>> erl -sname test1
>>> erl -sname test2
>>> Create the schema from the first node (substitute ijaba2 with your
>>> server's name):
>>> (test1@ijaba2)1> ok = mnesia:create_schema([node(), test2@ijaba2]),
>>> mnesia:start().
>>> Start Mnesia also on the second node:
>>> (test2@ijaba2)1> mnesia:start().
>>> Create a test table from the first node and start writing to it:
>>> (test1@ijaba2)2> mnesia:create_table(test, [{disc_only_copies,
>>> mnesia:system_info(running_db_nodes)}]).
>>> (test1@ijaba2)3> W = fun(F, N) -> mnesia:sync_transaction(fun
>>> mnesia:write/1, [{test, N, N}]), F(F, N + 1) end, W(W, 1).
>>> Restart Mnesia on the second node and wait for the table to be loaded:
>>> (test2@ijaba2)2> mnesia:stop(), ok = mnesia:start().
>>> (test2@ijaba2)3> mnesia:wait_for_tables([test], infinity).
>>>
>>>
>>> For some runs of this experiment, the table will never load.
>>> It seems the current writer transaction on the first node is waiting
>>> for a commit, while holding a write lock:
>>> (test3@ijaba2)1> rpc:call(test1@ijaba2, mnesia, system_info, [held_locks]).
>>> [{{test,14691},write,{tid,14696,<6217.41.0>}}]
>>> (test3@ijaba2)3> rpc:call(test1@ijaba2, erlang, process_info,
>>> [list_to_pid("<6217.41.0>"), current_function]).
>>> {current_function,{mnesia_tm,rec_all,4}}
>>>
>>> The second node is waiting for the table to be received
>>> (test3@ijaba2)4> rpc:call(test2@ijaba2, mnesia, system_info,
>>> [held_locks]).
>>> [{{schema,test},read,{tid,14009,<6358.199.0>}}]
>>> (test3@ijaba2)5> rpc:call(test2@ijaba2, erlang, process_info,
>>> [list_to_pid("<6358.199.0>"), current_function]).
>>> {current_function,{mnesia_loader,wait_on_load_complete,1}}
>>>
>>> And there's a third transaction going on (the table sender?):
>>> (test3@ijaba2)7> rpc:call(test1@ijaba2, mnesia, system_info, [transactions]).
>>> [{14696,<6217.41.0>,coordinator},
>>>
Guest
Posted: Mon Jan 25, 2010 11:55 pm Reply with quote
Guest
Thank you. The problem seems to be fixed with the patch (and I've also
verified on the original system where I detected the problem).

So, is this patch going to be in the next release?

Best regards.
Igor.

On Mon, Jan 25, 2010 at 11:00 AM, Dan Gudmundsson <dgud@erlang.org> wrote:
> Here is a patch:
>
> diff --git a/lib/mnesia/src/mnesia_tm.erl b/lib/mnesia/src/mnesia_tm.erl
> index 3f3a10a..5a2407d 100644
> --- a/lib/mnesia/src/mnesia_tm.erl
> +++ b/lib/mnesia/src/mnesia_tm.erl
> @@ -1388,7 +1388,9 @@ multi_commit(sync_sym_trans, Tid, CR, Store) ->
>
Guest
Posted: Tue Jan 26, 2010 7:38 am Reply with quote
Guest
Yes it will be applied.

/Dan

On Tue, Jan 26, 2010 at 12:55 AM, Igor Ribeiro Sucupira
<igorrs@gmail.com> wrote:
> Thank you. The problem seems to be fixed with the patch (and I've also
> verified on the original system where I detected the problem).
>
> So, is this patch going to be in the next release?
>
> Best regards.
> Igor.
>
> On Mon, Jan 25, 2010 at 11:00 AM, Dan Gudmundsson <dgud@erlang.org> wrote:
>> Here is a patch:
>>
>> diff --git a/lib/mnesia/src/mnesia_tm.erl b/lib/mnesia/src/mnesia_tm.erl
>> index 3f3a10a..5a2407d 100644
>> --- a/lib/mnesia/src/mnesia_tm.erl
>> +++ b/lib/mnesia/src/mnesia_tm.erl
>> @@ -1388,7 +1388,9 @@ multi_commit(sync_sym_trans, Tid, CR, Store) ->
>>

Display posts from previous:  

All times are GMT
Page 1 of 1
This forum is locked: you cannot post, reply to, or edit topics.

Jump to:  

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You cannot download files in this forum