Erlang/OTP Forums

Yaws mailing list: 'noproc' error is returned, at random, from ssl_accept

Guest
Posted: Sat May 17, 2008 7:09 pm
I have been working on a problem that may lie in yaws, in Erlang, or
(horrors) in my application. While doing the research I have made some
discoveries. But first, a description of the problem:

PROBLEM: while my YAWS application is running and processing
transactions, from time to time it stops accepting incoming socket
connections. The client program receives "connection refused"
responses to its connection requests. I have inspected the client and
the server systems and confirmed that the server is running but not
responding. I have attempted telnet connections from the localhost and
from a remote client, with the same results. When I looked into the
report.log file I found:


=ERROR REPORT==== 14-May-2008::02:16:56 ===
SSL accept failed: normal

... snip ...

=ERROR REPORT==== 14-May-2008::02:16:56 ===
** Generic server <0.894.2> terminating
** Last message in was {transport_accept,<0.888.2>,
{sslsocket,6,<0.77.0>},
10000}
** When Server state ==
{st,acceptor,<0.893.2>,<0.888.2>,<0.888.2>,nil,true,
[],nil,nil,nil,nil,false,false}
** Reason for termination ==
** {noproc,{gen_server,call,
[<0.77.0>,
{getopts,<0.894.2>,
[nodelay,active,packet,mode,header,ip,
backlog]},
10000]}}

=ERROR REPORT==== 14-May-2008::02:16:56 ===
** Generic server <0.895.2> terminating
** Last message in was {transport_accept,<0.889.2>,
{sslsocket,5,<0.72.0>},
10000}
** When Server state ==
{st,acceptor,<0.893.2>,<0.889.2>,<0.889.2>,nil,true,
[],nil,nil,nil,nil,false,false}
** Reason for termination ==
** {noproc,{gen_server,call,
[<0.72.0>,
{getopts,<0.895.2>,
[nodelay,active,packet,mode,header,ip,
backlog]},
10000]}}
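For readers puzzled by the "Reason for termination" lines: in OTP, gen_server:call/3 exits with {noproc, {gen_server, call, [Pid, Request, Timeout]}} when no process is alive at the target pid. The sketch below (the module name noproc_demo is hypothetical) reproduces that shape by calling a process that has already exited. It only illustrates the error format; it does not explain why <0.77.0> and <0.72.0>, the pids carried in the sslsocket tuples above, had gone away.

```erlang
-module(noproc_demo).
-export([call_dead_server/0]).

%% Hypothetical demo: make a gen_server:call/3 to a pid whose process has
%% already exited, and return the pid together with the caught exit.
call_dead_server() ->
    %% Start a process that exits immediately, and wait for its 'DOWN'
    %% message so we know it is really gone before calling it.
    {Pid, Ref} = spawn_monitor(fun() -> ok end),
    receive
        {'DOWN', Ref, process, Pid, _Reason} -> ok
    end,
    %% The request term only mimics the getopts call in the report; any
    %% term would do, since nothing is there to receive it.
    Request = {getopts, self(), [nodelay, active, packet]},
    %% catch turns the exit into {'EXIT', Reason}, where Reason is
    %% {noproc, {gen_server, call, [Pid, Request, 10000]}}.
    Result = (catch gen_server:call(Pid, Request, 10000)),
    {Pid, Result}.
```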



HARDWARE & OS: The application is installed on two complete OpenBSD
4.2 systems, each with its own primary IP address, plus a common CARP
IP. The hardware is a 1 GHz VIA CPU with 1 GB of RAM and an 80 GB disk.

APPLICATION: The application is deployed in SSL mode only, with one
application running on port 444 and one on port 443. Only the
application on 443 runs on both the exclusive IP and the shared CARP IP.

THE TRANSACTION: It is a REST-like transaction with GET/POST parameters
that performs some processing and then returns a plaintext body to the
caller (over https). The transactions themselves are all about the
same in time and complexity. In fact, the error condition occurs while
Nagios is polling the application. (The application is not in
production yet, and Nagios runs three test transactions about every 10
to 30 seconds: one to each of the private IP addresses and one to the
CARP IP.) My transaction also makes use of Mnesia and replication.

(Getting CARP to detect this error and take corrective action is on
the agenda, but for now I want to find the root cause of the issue.)


STEPS I HAVE TAKEN: Klacke recommended that I run c:i() and
inet:i(). I do not understand the output, but if there is interest
from the list I will post it. I have performed several reviews of my
code and of YAWS. I found a non-tail-recursion problem in my code that
I have since repaired. I'm hoping that it was the only one; however,
the tail-recursion optimization is not fully documented or clear to
me, so I cannot be sure that I have found them all. I have suggested
that someone in the Erlang family might implement an erl-lint tool to
find these things.
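A long-running Erlang process is usually a receive loop that calls itself after handling each message. If that call is not the last thing the clause does, every message leaves a frame on the stack, and the stack keeps growing for as long as the process runs. The sketch below contrasts a loop whose recursive call is in tail position with one where wrapping the call in catch prevents last-call optimization. The module and function names are hypothetical and are not taken from the poster's code or from YAWS.

```erlang
-module(loop_demo).
-export([server_loop/1, leaky_loop/1]).

%% Tail-recursive receive loop: each recursive call is the last expression
%% of its clause, so the current stack frame is reused and the stack stays
%% small no matter how many messages are processed.
server_loop(State) ->
    receive
        {add, N} ->
            server_loop(State + N);
        {get, From} ->
            From ! {state, State},
            server_loop(State);
        stop ->
            State
    end.

%% NOT tail-recursive in the {add, N} clause: the call sits inside a
%% catch expression, so a catch frame must stay on the stack until the
%% inner call returns. Every {add, N} message therefore adds stack that
%% is only released when the loop finally receives stop.
leaky_loop(State) ->
    receive
        {add, N} ->
            catch leaky_loop(State + N);
        {get, From} ->
            From ! {state, State},
            leaky_loop(State);
        stop ->
            State
    end.
```

Running both loops side by side and comparing process_info(Pid, stack_size) after a few thousand messages is one cheap way to check whether a suspect loop really is tail-recursive.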

CONTINUED:

There has been mention that the YAWS server periodically goes off the
air for no particular reason, and LSOF (fstat on OpenBSD) seemed to
indicate that the system was running out of file handles. That was not
the case in my experience. The real issue is capturing the event that
is causing the symptoms.

I think I have mentioned everything, except that the development tree
of the YAWS code has some new features that I hope to try. They
include escalation of some exceptions to force "heart" to restart
YAWS, and a tool for dumping the state of yaws that might provide the
key to what's going wrong (I hope).

If anyone has any advice, I'm open to suggestions. I have a test
environment that can run millions of transactions a day in order to
test any hypothesis. Thanks to everyone who discussed this subject
with me privately before I wrote this email.

/r


-------------------------------------------------------------------------
_______________________________________________
Erlyaws-list mailing list
Erlyaws-list@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/erlyaws-list
Guest
Posted: Sat May 17, 2008 7:53 pm
Richard Bucker wrote:

>
> STEPS I HAVE TAKEN: Klacke has recommended that I run; c:i() and
> inet:i(). I do not understand the output,

Richard and I have been communicating privately, and I think that
maybe I was barking up the wrong tree. I thought that maybe he had
problems similar to those yaws.hyber.org has had, i.e. an fd leak
somewhere (not necessarily in yaws, maybe outside it; still unknown).

To help track down this kind of problem I've introduced (in trunk) a
really nice debugging facility:

# yaws --debug-dump

That produces a lot of good info on stdout. For sites that appear to
die out of the blue, a good idea is to run this debug-dump regularly
from a cron script.



> There was mention that the YAWS server periodically goes off the air
> for no particular reason. LSOF(fstat on OpenBSD) seems to indicate
> that the system is running out of file handles. This was not the case
> in my experience. The real issue is trying to capture the event that
> is causing the symptoms.
>
> I think I have mentioned everything... except that the DEV tree of the
> YAWS code has some new features that I hope to try. They include
> escalation of some exceptions to force the "heart" to restart YAWS.


Actually, I saw this the other day: yaws_sup.erl had a really odd
restart strategy. I changed it radically from

- {ok,{{one_for_all,10,30}, [YawsLog, YawsRSS, YawsServ, Sess,

to

+ {ok,{{one_for_all, 0, 1}, [YawsLog, YawsRSS, YawsServ, Sess,

I think that if yaws_server dies, we have the following choices:

1. Embedded mode: some other supervisor restarts yaws.
2. Normal mode: die and let heart restart.

Big Change - few characters !!
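In an OTP supervisor's flags {Strategy, MaxR, MaxT}, the supervisor tolerates at most MaxR restarts within MaxT seconds; if more occur, it terminates all its children and then itself with reason shutdown. So {one_for_all, 10, 30} allows up to ten restarts in 30 seconds, while {one_for_all, 0, 1} allows none: the first child exit takes the supervisor down and pushes the failure up to whatever is above it (an embedding application's supervisor, or, for the top supervisor of an application started as permanent, the node itself, which heart, if enabled, can then restart). The sketch below is a hypothetical demo_sup with one dummy worker, not the actual yaws_sup.erl; it only shows the zero-intensity behavior.

```erlang
-module(demo_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0]).

%% Hypothetical supervisor illustrating the {one_for_all, 0, 1} flags
%% discussed above. It is not the real yaws_sup.erl.
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% {Strategy, MaxR, MaxT}: at most MaxR restarts within MaxT seconds.
    %% With MaxR = 0 no restart is allowed, so the first child exit makes
    %% the supervisor terminate its children and then itself (reason
    %% shutdown), handing the failure to whoever supervises it.
    SupFlags = {one_for_all, 0, 1},
    %% Tuple child spec: {Id, StartMFA, Restart, Shutdown, Type, Modules}.
    Worker = {demo_worker, {?MODULE, start_worker, []},
              permanent, 5000, worker, [?MODULE]},
    {ok, {SupFlags, [Worker]}}.

%% Dummy worker: a process linked to the supervisor that just waits for
%% a stop message.
start_worker() ->
    {ok, spawn_link(fun() -> receive stop -> ok end end)}.
```

With the original {one_for_all, 10, 30} flags, killing the worker would instead get it restarted, and the supervisor would give up only if more than ten restarts happened within 30 seconds.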


As for Richard's problems: try running the debug-dump from a cron
script, and in particular run debug-dump once the system is dead.



/klacke

All times are GMT