Erlang/OTP Forums


RabbitMQ mailing list ~ RabbitMQ internal error

Guest
Posted: Wed Feb 27, 2008 9:02 am
Hi,

Yesterday we experienced another problem with RabbitMQ. It may still be
our own fault, but this time it was a bit more severe. Suddenly, out of
the blue, it became impossible to send a single message through Rabbit.
Even restarting the components connecting to Rabbit didn't help. The
Erlang process stayed up but didn't seem to work. Killing the beam
process helped, and everything returned to normal.

In a log file we had:

ERROR 2008-02-26 16:17:32,857 --call got Closed:
Method(name=close, id=60) (541, 'INTERNAL_ERROR', 0, 0) content = None

The part of the message after "Closed:" comes from a Qpid Python
exception. I'm still prepared to suspect something to do with threading
and am investigating. However, the other day, when we had the problem
with threads sharing a channel, the message in the same log was:

ERROR 2008-02-21 15:20:09,962 --send got Closed:
Method(name=close, id=60) (502, 'SYNTAX_ERROR', 60, 40) content = None
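For readers comparing the two log lines: the four values after the method look like the arguments of an AMQP close (reply code, reply text, class id and method id of the offending method). The minimal Python sketch below assumes that field order and only knows the codes seen in this thread; the lookup tables are illustrative, not the full AMQP specification. Under that reading, the SYNTAX_ERROR was blamed on class 60, method 40 (basic.publish), while the INTERNAL_ERROR was not tied to any method.

```python
import re

# Only the reply codes that appear in this thread (AMQP close reply codes).
REPLY_CODES = {502: "SYNTAX_ERROR", 541: "INTERNAL_ERROR"}

# Only the class/method pair seen above: AMQP class 60 is "basic", method 40 is "publish".
METHODS = {(60, 40): "basic.publish"}

# Matches "(code, 'TEXT', class_id, method_id)"; "Method(name=..." does not match.
CLOSE_ARGS = re.compile(r"\((\d+), '([A-Z_]+)', (\d+), (\d+)\)")


def decode_close(line):
    """Extract the close arguments from a logged 'Closed:' line."""
    match = CLOSE_ARGS.search(line)
    if match is None:
        raise ValueError("no close arguments found in: %r" % line)
    code, text, class_id, method_id = match.groups()
    code, key = int(code), (int(class_id), int(method_id))
    if key == (0, 0):
        # Treated here as "not attributed to a particular method".
        failing = "none"
    else:
        failing = METHODS.get(key, "unknown")
    return {
        "reply_code": code,
        "reply_text": text,
        "known_code": REPLY_CODES.get(code, "unknown"),
        "failing_method": failing,
    }
```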

Unfortunately, we don't have a rabbit.log from the incident. It seems
Rabbit overwrites the old log when restarting, without doing any form
of log rotation (hint hint :)

Any suggestions, or anything I can do to find the problem, would be very welcome!!!

Regards,

Michael Arnoldus

Post received from mailing list
tonyg
Posted: Thu Feb 28, 2008 7:52 am
Hi Michael,

Michael Arnoldus wrote:
> Yesterday we experienced another problem with RabbitMQ. Possibly still
> our own fault, but this time a bit more severe. Suddenly from out of the
> blue it was impossible to send a single message through Rabbit. Even
> restart of the components connecting to rabbit didn't help. The erlang
> process stayed but didn't seem to work. Killing the beam process helped
> and everything returned to normal.

This is extremely interesting.

- What architecture are you running on? Is it a Mac?
- Was the CPU pinned to 100%?
- Were you able to issue commands at the Erlang prompt in the server?

We are tracking down what we suspect to be a Mac-specific bug in the
Erlang runtime that manifests in some corner cases of socket shutdown -
it would be interesting to know whether you have run into the same thing
we're chasing. (We are still in the early stages of our investigation -
we can't say for sure yet whether it's really a runtime problem.)

> In a log file we had:
> ERROR 2008-02-26 16:17:32,857 --call got Closed: Method(name=close,
> id=60) (541, 'INTERNAL_ERROR', 0, 0) content = None

If only the other log files hadn't been stomped on by the broker
startup! Your message has prompted us to fix this bad behaviour - we
have changed the startup scripts to move existing log files out of the
way, keeping the most recent few files.
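The change described here, moving existing logs aside at startup and keeping only the most recent few, could look roughly like the Python sketch below. It is not the actual RabbitMQ startup script; the function name, the numbered-suffix scheme and the default of three kept files are assumptions for illustration.

```python
import os


def rotate_logs(path, keep=3):
    """Hypothetical: move an existing log aside before startup, keeping `keep` old copies.

    rabbit.log -> rabbit.log.1 -> rabbit.log.2 ...; anything beyond `keep` is dropped.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    oldest = "%s.%d" % (path, keep)
    if os.path.exists(oldest):
        os.remove(oldest)
    # Shift numbered copies up by one, starting from the oldest, so nothing is overwritten.
    for n in range(keep - 1, 0, -1):
        src = "%s.%d" % (path, n)
        if os.path.exists(src):
            os.replace(src, "%s.%d" % (path, n + 1))
    # Finally move the current log out of the way.
    if os.path.exists(path):
        os.replace(path, path + ".1")
```

Calling it once per broker start, before the new log is opened, means the log from a crashed run survives at least a few restarts.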

The INTERNAL_ERROR message is very interesting, because it indicates a
real bug in the broker. We don't see it in the case of the Mac bug I
mentioned earlier, so you might have found something different.

This is probably the code that ran:

    lookup_amqp_exception(Other) ->
        rabbit_log:warning("Non-AMQP exit reason '~p'~n", [Other]),
        {true, ?INTERNAL_ERROR, <<"INTERNAL_ERROR">>, none}.

... which produces a "Non-AMQP exit reason" message in the log. I'm
afraid without that message, we'll have a tough time diagnosing this one.
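The clause above is a catch-all: any exit reason that is not a recognised AMQP error is logged with its real cause and reported to the client only as 541 INTERNAL_ERROR, so the broker log is the only place the underlying reason survives. A rough Python analogue of that pattern follows; `AmqpError` and the two-field return value are hypothetical simplifications, and the `true`/`none` fields of the Erlang tuple are left out because the thread doesn't explain them.

```python
import logging

INTERNAL_ERROR = 541  # AMQP reply code for internal-error

log = logging.getLogger("broker")


class AmqpError(Exception):
    """Hypothetical: an error that already carries an AMQP reply code and text."""

    def __init__(self, code, text):
        super().__init__(text)
        self.code = code
        self.text = text


def lookup_amqp_exception(reason):
    """Map an exit reason to (code, text), logging anything unrecognised."""
    if isinstance(reason, AmqpError):
        return reason.code, reason.text
    # Catch-all: the real cause is written to the broker log only;
    # the client just sees a generic INTERNAL_ERROR.
    log.warning("Non-AMQP exit reason %r", reason)
    return INTERNAL_ERROR, "INTERNAL_ERROR"
```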

Regards,
Tony

_______________________________________________
rabbitmq-discuss mailing list
rabbitmq-discuss@lists.rabbitmq.com
http://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
Guest
Posted: Thu Feb 28, 2008 9:18 am
Hi Tony,

Yes, we're running on a Mac. I don't know about the CPU, as I
unfortunately wasn't present when it happened. I'll check and see if
somebody else knows. Good to hear you have changed the startup scripts;
we'll be careful to save the log file if it happens again, and to check
CPU utilization and whether we can get an Erlang shell.

Thank you for taking this so seriously. I will send you anything we
discover, and please let me know if you find anything that you think
might be relevant to this, or any experiments you would like us to try.

Regards,

Michael


On Feb 28, 2008, at 8:52, Tony Garnock-Jones wrote:
> [...]

Guest
Posted: Tue Mar 04, 2008 11:02 am
Found the problem.

Among other things, we use AMQP/RabbitMQ as a transport for RPC-style
calls. A quick way to implement this was to create a new anonymous
queue for each expected reply and then send the queue name in the
'reply to' field. We did, and it worked; however, we forgot two things:
1. delete the queue after it had been used, and 2. declare the queue as
auto-delete, so that if the module actually crashes, the queue gets
deleted anyway. We also have a watchdog that pings all our AMQP modules
and, if one doesn't reply within some period, kills the module and
makes it restart.

So when we ran out of queues, RabbitMQ simply stopped responding,
which caused the watchdog to kill the AMQP modules; they would restart
and try again, and so on.

The result was a heap of clients and a heap of queues.
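To make the feedback loop concrete, here is a toy Python model, not RabbitMQ or Qpid code: `ToyBroker`, its `max_queues` budget, and the exception standing in for "stopped responding" are all hypothetical. Every watchdog ping is itself an RPC that leaves its reply queue behind, so once the budget is used up every further ping fails and triggers another restart, while the leaked queues never go away.

```python
class ToyBroker:
    """Hypothetical in-memory stand-in for a broker with a finite queue budget."""

    def __init__(self, max_queues):
        self.max_queues = max_queues
        self.queues = set()
        self.counter = 0

    def declare_anonymous_queue(self):
        if len(self.queues) >= self.max_queues:
            # The real broker simply hung; the toy raises so the loop can continue.
            raise TimeoutError("broker not responding")
        self.counter += 1
        name = "reply-%d" % self.counter
        self.queues.add(name)
        return name


def leaky_rpc_call(broker, request):
    """Original pattern: a fresh reply queue per call, never deleted, not auto-delete."""
    reply_to = broker.declare_anonymous_queue()
    # (publishing `request` with reply_to and waiting for the reply is elided)
    return reply_to


def watchdog_run(broker, pings):
    """Each watchdog ping is an RPC; a failed ping triggers a module restart."""
    restarts = 0
    for _ in range(pings):
        try:
            leaky_rpc_call(broker, "ping")
        except TimeoutError:
            restarts += 1  # module killed and restarted; its old queues stay behind
    return restarts
```

With `ToyBroker(max_queues=10)` and 25 pings, the first 10 succeed and each of the remaining 15 counts as a restart, with the queue count stuck at the limit.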

The fix had three parts: declare RPC reply queues as auto-delete,
actively delete them after use or on timeout, and modify the watchdog so
it won't kill anything unless it's actually able to ping itself through
AMQP.
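The first two parts of that fix can be sketched against a hypothetical in-memory toy broker; this is not the Qpid Python or RabbitMQ API. `wait_for_reply` stands in for publishing the request and consuming the reply, and `consumer_gone` models the broker dropping an auto-delete queue once its last consumer disappears, which covers a client that crashes before its cleanup code runs.

```python
class ToyBroker:
    """Hypothetical in-memory broker that tracks each queue's auto-delete flag."""

    def __init__(self):
        self.queues = {}  # queue name -> auto_delete flag
        self.counter = 0

    def declare_anonymous_queue(self, auto_delete):
        self.counter += 1
        name = "reply-%d" % self.counter
        self.queues[name] = auto_delete
        return name

    def delete_queue(self, name):
        self.queues.pop(name, None)

    def consumer_gone(self, name):
        """Model auto-delete: drop the queue once its last consumer disappears."""
        if self.queues.get(name):
            del self.queues[name]


def rpc_call(broker, request, wait_for_reply, timeout=5.0):
    """Fixed pattern: auto-delete reply queue, explicitly deleted after use or timeout.

    `wait_for_reply(reply_to, request, timeout)` is a hypothetical callable that
    publishes the request and returns the reply, or raises TimeoutError.
    """
    reply_to = broker.declare_anonymous_queue(auto_delete=True)
    try:
        return wait_for_reply(reply_to, request, timeout)
    finally:
        broker.delete_queue(reply_to)  # runs on success and on timeout alike
```

The `finally` clause handles normal completion and timeouts; the auto-delete flag is the fallback for a process that is killed outright.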

Now everything works with a steady queue count.
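The third part of the fix, making the watchdog confirm its own round trip through AMQP before killing anything, might be sketched as follows. `self_ping`, `ping_module` and `restart_module` are hypothetical callables standing in for whatever the real watchdog does.

```python
def watchdog_tick(self_ping, modules, ping_module, restart_module):
    """Restart unresponsive modules only if the watchdog can reach itself via AMQP.

    Returns the list of modules restarted on this tick.
    """
    if not self_ping():
        # The broker (or our own connection) is the problem: killing clients
        # would only add reconnects and new queues to an already stuck broker.
        return []
    restarted = []
    for module in modules:
        if not ping_module(module):
            restart_module(module)
            restarted.append(module)
    return restarted
```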

Thanks to Tony for all the help in finding this bug. Your support is
awesome!!!

Regards,

Michael Arnoldus

alexis
Posted: Tue Mar 04, 2008 11:42 am
Michael

Some of your client code might be useful for building part of a
management tool. Could it be separated from your proprietary code
and shared in any form?

alexis


--
Alexis Richardson
+44 20 7617 7339 (UK)
+44 77 9865 2911 (cell)
+1 650 206 2517 (US)

Guest
Posted: Tue Mar 04, 2008 12:21 pm
Alexis,

Yes, that might be possible, though I'm not sure what you think would
be useful. Could you elaborate a bit?

Everything we have done has been done in Python (so far, at least) and
not in Erlang - just so you know :)

Regards,

Michael

alexis
Posted: Tue Mar 04, 2008 1:41 pm
Michael

Python is perfect. I was thinking of client use cases such as setup,
destroy after timeout, etc. This would be useful for folks like John,
I think...

alexis





All times are GMT