Erlang Mailing Lists

RabbitMQ mailing list: RabbitMQ running at 100% CPU

Posted: Thu Mar 13, 2008 9:38 am
Hi,

We've had a rabbit server running for quite some time (not sure
precisely how long but between 1 and 2 weeks). This morning we had
some difficulty connecting to it from the outside and on further
investigation we found it was constantly using between 99.8 and 100.3%
CPU. We are using Erlang 5.5.5 (R11B-5, as far as I remember), not compiled
with SMP, running on Mac OS X 10.5.2. Going into Erlang, it looks like it
isn't doing anything: no messages waiting in queues, and at most 30,000
reductions on any process in 10 seconds. It has been some time since I
debugged Erlang programs, so I'm not sure here, but as far as I
remember 30,000 reductions is very little and certainly doesn't explain
100% CPU utilization. It's not using a lot of memory, and in short
everything looks quite OK (to me), apart from the fact that it's hogging
the CPU.

I've kept it running. I'm sure if I restart erlang everything will
just run perfectly, but I'd rather use this incident to try to find
the problem if it's possible. However, I'm running out of things to
try, so any suggestions will be welcome.

BTW - nothing beyond the ordinary in rabbit.log and nothing in
rabbit-sasl.log. Also, we are using the Qpid Python client, so we're
connecting in lax mode.

If nobody has any other suggestions I'd be inclined to upgrade to the
latest version of Erlang. Does anybody have any experience using this
with Rabbit?

Regards,

Michael Arnoldus


_______________________________________________
rabbitmq-discuss mailing list
rabbitmq-discuss@lists.rabbitmq.com
http://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss

Posted: Thu Mar 13, 2008 9:49 am
Michael,

Michael Arnoldus wrote:
> I've kept it running. I'm sure if I restart erlang everything will
> just run perfectly, but I'd rather use this incident to try to find
> the problem if it's possible. However, I'm running out of things to
> try, so any suggestions will be welcome.

Try running strace on the Erlang process, to see whether it's stuck in
some busy loop making system calls.

> If nobody has any other suggestions I'd be inclined to upgrade to the
> latest version of Erlang. Does anybody have any experience using this
> with Rabbit?

Several people have been running with R12B in their development
environments without any problems. It should be fine, but we haven't
done any thorough testing with it.


Matthias.

Posted: Thu Mar 13, 2008 2:17 pm
Matthias,

Thank you for your suggestion. There is no strace on Mac OS X, but I
did find a way to see what the C-program was doing (see below). Most
of the threads are just waiting (as expected) but a single one seems
always to be in some kind of tcp_send_error. I haven't yet found a way
to see if this infinite loop is in the erlang runtime system or the
error actually reaches rabbit. Anyway, just a status. I'll try to work
out what I can - Rabbit is still running :-)

Michael


1002 Thread_2503
  340 writev$UNIX2003
    340 writev$UNIX2003
  299 kevent
    299 kevent
  257 tcp_inet_drv_output
    197 getpeername$UNIX2003
      197 getpeername$UNIX2003
    42 tcp_send_error
      32 inet_reply_error
        16 error_atom
          8 __tolower
            8 __tolower
          4 pthread_getspecific
            4 pthread_getspecific
          3 error_atom
          1 dyld_stub_pthread_getspecific
            1 dyld_stub_pthread_getspecific
        3 hash_put
          3 hash_put
        3 strlen
          3 strlen
        2 atom_hash
          2 atom_hash
        2 driver_mk_atom
          2 driver_mk_atom
        2 dyld_stub___tolower
          2 dyld_stub___tolower
        1 am_atom_put
          1 am_atom_put
        1 erl_errno_id
          1 erl_errno_id
        1 index_put
          1 index_put
        1 inet_reply_error
      4 inet_reply_error_am
        4 inet_reply_error_am
      4 tcp_send_error
      1 __error
        1 __error
      1 driver_send_term
        1 driver_send_term
    8 tcp_inet_drv_output
    4 cerror
      3 cthread_set_errno_self
        3 __error
          3 __error
      1 cerror
    2 _sysenter_trap
      2 _sysenter_trap
    2 inet_reply_error
      2 inet_reply_error
    1 __error
      1 __error
    1 dyld_stub___error
      1 dyld_stub___error
  18 check_async_ready
    6 __spin_lock
      6 __spin_lock
    5 pthread_mutex_unlock
      5 pthread_mutex_unlock
    4 pthread_mutex_lock
      4 pthread_mutex_lock
    2 check_async_ready
    1 spin_lock
      1 spin_lock
  14 erts_deliver_time
    9 gettimeofday
      9 __gettimeofday
        5 __gettimeofday
        4 __nanotime
          4 __nanotime
    4 erts_deliver_time
    1 __commpage_gettimeofday
      1 __commpage_gettimeofday
  13 erts_time_remaining
    9 gettimeofday
      8 __gettimeofday
        7 __nanotime
          7 __nanotime
        1 __gettimeofday
      1 gettimeofday
    4 erts_time_remaining
  9 erts_poll_wait_kp
    7 erts_poll_wait_kp
    2 kevent
      2 kevent
  8 0x38830014
    3 bcmp
      3 bcmp
    3 hash_put
      3 hash_put
    2 atom_cmp
      2 atom_cmp
  8 cerror
    4 cthread_set_errno_self
      3 cthread_set_errno_self
      1 __error
        1 __error
    3 cerror
    1 dyld_stub___error
      1 dyld_stub___error
  7 sweep_proc_bins
    7 sweep_proc_bins
  6 erl_sys_schedule
    6 erl_sys_schedule
  4 schedule
    4 schedule
  3 erts_check_io_kp
    3 erts_check_io_kp
  2 __error
    2 __error
  2 dyld_stub___error
    2 dyld_stub___error
  2 process_main
    2 process_main
  1 _sysenter_trap
    1 _sysenter_trap
  1 copy_shallow
    1 copy_shallow
  1 copy_struct
    1 copy_struct
  1 db_get_hash
    1 db_get_hash
  1 driver_peekq
    1 driver_peekq
  1 dyld_stub_gettimeofday
    1 dyld_stub_gettimeofday
  1 erts_check_io_interrupt_kp
    1 erts_check_io_interrupt_kp
  1 erts_poll_interrupt_kp
    1 erts_poll_interrupt_kp
  1 free_message_buffer
    1 erts_cleanup_mso
      1 erts_cleanup_mso
  1 tcp_inet_drv_input
    1 tcp_recv
      1 tcp_deliver
        1 deq_async
          1 deq_async_w_tmo
            1 deq_async_w_tmo

Posted: Thu Mar 13, 2008 2:36 pm
Matthias,

And more info - it seems the error returned is 32 which in errno.h is
EPIPE.

In the manual page for writev it says:

[EPIPE] An attempt is made to write to a socket of type
SOCK_STREAM that is not connected to a peer socket.
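
The EPIPE condition quoted above can be reproduced outside Erlang. The following is a minimal Python sketch, not taken from the thread: it assumes a Unix-like system and uses a local socket pair in place of a real TCP connection, and the helper name write_to_closed_peer is made up for illustration. It closes one end of a stream socket and then calls writev on the other end.

```python
import errno
import os
import socket


def write_to_closed_peer():
    """Call writev on a stream socket whose peer is gone; return the errno."""
    # A connected pair of local stream sockets stands in for a TCP connection.
    ours, peer = socket.socketpair()
    peer.close()  # the other end disappears, e.g. a client dropping its connection
    try:
        # Python ignores SIGPIPE by default, so the failure surfaces as OSError.
        os.writev(ours.fileno(), [b"hello"])
    except OSError as exc:
        return exc.errno
    finally:
        ours.close()
    return None


err = write_to_closed_peer()
# Print the numeric errno and its symbolic name from the errno module.
print(err, errno.errorcode.get(err))
```

The point of the sketch is only that every further write to such a socket fails in the same way; the error is persistent, so code that retries the write without closing the socket will see it again each time.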

Does this make any sense to you at all? Do you have an idea if I
should continue investigating the erlang runtime system or do you
think it is something inside Rabbit?

Michael

Posted: Thu Mar 13, 2008 3:11 pm
Michael,

Michael Arnoldus wrote:
> Thank you for your suggestion. There is no strace on Mac OS X, but I did
> find a way to see what the C-program was doing (see below).

I gather there's ktrace and/or dtrace.

> Most of the
> threads are just waiting (as expected) but a single one seems always to
> be in some kind of tcp_send_error.

That is interesting. Tony has seen similar symptoms on his MacBook here,
but so far we have failed to reproduce the problem anywhere else.

Please let us know as much about your system as possible. Exact OS
version, Erlang/OTP version, RabbitMQ version, number and exact type of
processors, memory configuration, and anything else you can think of
that might be relevant.


Matthias.

Posted: Thu Mar 13, 2008 3:36 pm
Matthias,

Seen on several Macs here, but the one where we have the problem
right now is a quad-core 2.8 GHz Mac Pro. Also seen on a dual-core. 2
GB of memory. Running Mac OS X 10.5.2 (Leopard), Erlang R11B-5, Rabbit 1.2.0.

I actually have a system running with the problem, so if I can help in
any way tracing the problem down, let me know.

Yes, Mac OS X has dtrace.

It seems to me that it should be possible to find a possible infinite
loop on a writev error when writing to a socket, either in the Erlang code
or the Rabbit code (but then again, what do I know :-) ). Let me know if
you have any suggestions on where I should start looking.
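
One way such a loop could arise, shown here purely as a hypothetical Python sketch and not as a description of what the Erlang runtime or Rabbit actually does: an event loop keeps a socket registered for write readiness after the peer has closed, the poll call reports it as ready every time, and the write fails every time. The function count_wakeups and its parameters are invented for this illustration, and a local socket pair stands in for a TCP connection.

```python
import selectors
import socket


def count_wakeups(drop_on_error, max_rounds=5):
    """Count event-loop wakeups for a socket whose peer has already closed."""
    ours, peer = socket.socketpair()
    peer.close()
    ours.setblocking(False)
    sel = selectors.DefaultSelector()
    sel.register(ours, selectors.EVENT_WRITE)
    wakeups = 0
    try:
        # A real loop would have no round limit; the cap keeps the sketch finite.
        for _ in range(max_rounds):
            if not sel.get_map():
                break  # nothing left to watch, so the loop can go idle
            for key, _events in sel.select(timeout=0.1):
                wakeups += 1
                try:
                    key.fileobj.send(b"x")
                except OSError:
                    # The write fails on every attempt. Unless the socket is
                    # dropped here, the next select() returns at once again.
                    if drop_on_error:
                        sel.unregister(key.fileobj)
    finally:
        sel.close()
        ours.close()
    return wakeups


spinning = count_wakeups(drop_on_error=False)
handled = count_wakeups(drop_on_error=True)
print(spinning, handled)
```

With drop_on_error=False the loop wakes up on every round, which without the round limit would show up as one thread alternating between the poll call and the failing write. With drop_on_error=True the socket is removed after the first failure and the loop stops waking up.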

Michael



Posted: Tue Mar 25, 2008 2:55 pm
Matthias (and others)

Could this http://www.nabble.com/Unix-spawn-driver-does-not-close-all-file-descriptors-in-child-process-td15796733.html
explain some of the weirdness we have seen on OS X?

Michael

Posted: Tue Mar 25, 2008 6:32 pm
Michael,

Michael Arnoldus wrote:
> Could this
> http://www.nabble.com/Unix-spawn-driver-does-not-close-all-file-descriptors-in-child-process-td15796733.html explain
> some of the weirdness we have seen on OS X?

Possibly, but a) we have only seen the problem on Leopard, and not on
any earlier versions of Mac OS X, and b) AFAIK rabbit doesn't do any
spawning.


Matthias.

Posted: Tue Mar 25, 2008 8:45 pm
Oh well - I'll keep searching as it seems to reappear every week.

Thanks for your reply!

Michael


All times are GMT