Erlang/OTP Forums

<  Erlang bugs mailing list  ~  infinite loop when beam.smp compiled with -O2 on debian len

Guest
Posted: Mon May 03, 2010 11:43 pm
Mikael,

Thanks a lot for that catch. I think that's it. Just did recompiles with
your patch (with -O2) and the body of the loop now shows up in the generated
code and the trivial spin loop is gone.

I got blindsided by the optimizer completely eliminating the body of the
loop, due to which I couldn't even see urqbp on the stack at all. This
led me to assume that the surrounding macro
(ERTS_POLL_USE_UPDATE_REQUESTS_QUEUE) was perhaps undefined and that the
loop wasn't even compiled in. Yet another strike against coding C in
preprocessor macros.

Overall, it's a big relief to know that our standard install of gcc is
not generating such obviously buggy code. I look forward to seeing the
erts_poll_info fix in an upcoming git version.

Thanks a lot once again
Chetan



On Mon, May 3, 2010 at 2:54 PM, Mikael Pettersson <mikpe@it.uu.se> wrote:

> Chetan Ahuja writes:
> > Hi,
> >
> > We hit a bug while running rabbitmq where the beam.smp process was stuck
> > in a tight loop in the erts_poll_info function. The process was eating up
> > 100% of exactly one core (on a multi-core box) and rabbitmq was
> > dysfunctional. Unfortunately, I could not create a small test case to
> > reproduce this condition, but it would happen quite frequently while
> > rabbitmq was in operation.
> >
> > The C code for the function didn't provide any hints about what could
> > have been spinning in it (this was my first time looking at this
> > codebase, though). Finally, looking through the disassembly in gdb at the
> > point where our process was spinning, I saw the following lines in the
> > erts_poll_info_kp function:
> >
> >
> > 0x00000000004f0fe9 <erts_poll_info_kp+185>: nopl 0x0(%rax)
> > 0x00000000004f0ff0 <erts_poll_info_kp+192>: jmp 0x4f0fe9 <erts_poll_info_kp+185>
> >
> > (Similar assembly code can be seen when the KERNEL_POLL option is
> > disabled.)
> >
> > Clearly the above will trivially spin forever any time we get into that
> > code path. It looks suspiciously like some code got optimized out by the
> > compiler, leaving only the crazy loop code.
> >
> > So I compiled with -O1 and then with no optimization at all. With -O1, I
> > saw a weird jmp instruction jumping to its own address:
> >
> > 0x0000000000517102 <erts_poll_info_kp+60>: jmp 0x517102 <erts_poll_info_kp+60>
> >
> > With no optimization, none of those trivial spins existed, but I didn't
> > analyze the unoptimized code enough to say whether it can be proven to
> > have an infinite loop (i.e., whether the optimizing compiler is simply
> > doing its job vs. this being a compiler bug).
> >
> > Anyway, this problem has existed at least since the
> > erlang-base_12.b.3-dfsg Debian package version and has been verified to
> > exist in the GitHub version as of today.
> >
> >
> > Here's the gcc and Debian version info:
> > $ gcc --version
> > gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2
> > Copyright (C) 2008 Free Software Foundation, Inc.
>
> I looked at the procedure in question (not so easy to locate due to
> some "creative" C preprocessor abuse), and noticed an obvious bug:
> there's a loop over a linked list that forgets to actually advance
> the node pointer to the next element. When optimizing, gcc will notice
> that the loop doesn't terminate and omit the body of the loop (the
> calculations are dead), which results in the type of object code
> shown above. Thus, it's an Erlang VM bug, not a gcc miscompilation.
>
> Try the patch below and let us know if it solves your problem.
>
> /Mikael
>
> --- otp_src_R13B03/erts/emulator/sys/common/erl_poll.c.~1~ 2009-03-12 13:16:29.000000000 +0100
> +++ otp_src_R13B03/erts/emulator/sys/common/erl_poll.c 2010-05-03 23:41:32.000000000 +0200
> @@ -2404,6 +2404,7 @@ ERTS_POLL_EXPORT(erts_poll_info)(ErtsPol
>          while (urqbp) {
>              size += sizeof(ErtsPollSetUpdateRequestsBlock);
>              pending_updates += urqbp->len;
> +            urqbp = urqbp->next;
>          }
>      }
>  #endif
>
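
For anyone who wants to see the pattern from the patch in isolation, here is a minimal sketch in C. The struct and function names are hypothetical stand-ins (they are not the real erts definitions), and only the two fields the loop touches are modelled. The buggy variant has the same shape as the unpatched loop: its condition depends only on a pointer that the body never changes, so for a non-empty list it can never exit, which is the situation in which an optimizer may drop the body and leave a bare jump.

```c
#include <stddef.h>

/* Hypothetical stand-in for ErtsPollSetUpdateRequestsBlock. */
struct update_block {
    int len;                    /* pending requests stored in this block */
    struct update_block *next;  /* next block in the list, or NULL */
};

/* Buggy shape: `b` is never advanced, so the loop condition never
 * changes. Only safe to call with an empty (NULL) list; with any
 * other argument it does not return. */
int count_pending_buggy(const struct update_block *b)
{
    int pending = 0;
    while (b) {
        pending += b->len;
        /* missing: b = b->next; */
    }
    return pending;
}

/* Fixed shape: advance to the next block on every iteration, so the
 * loop ends when the end of the list (NULL) is reached. */
int count_pending(const struct update_block *b)
{
    int pending = 0;
    while (b) {
        pending += b->len;
        b = b->next;
    }
    return pending;
}
```

Calling count_pending on a three-block list with lengths 1, 2 and 3 should return 6, while count_pending_buggy on the same list would spin forever.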

