| Author |
Message |
|
| Guest |
Posted: Thu May 13, 2010 8:16 pm |
|
|
|
Guest
|
We've recently experienced a 100% repeatable case of R13B02-1 and
R13B04 beam core dumps on a Montavista Linux system running on a
Cavium Octeon processor. This is technically not a problem with the
Erlang VM but rather is a case where certain socket-related calls in
this version of Montavista Linux, based on a 2.6.21 Linux kernel, are
indicating failure by returning negative numbers with large absolute
values, such as negative 0x40000, rather than returning -1 to indicate
failure as they should. The Erlang TCP code expects system calls to
return -1 to indicate errors and compares directly against that value,
so it ends up treating these negative return values as successful and
adds them to internal pointers as bytes written, offsets, etc. This in
turn corrupts these pointers and causes beam to dump core.
This patch introduces a portability macro to test for system call
failure within inet_drv.c:
git fetch git://github.com/vinoski/otp.git socket_error_portability
--steve
________________________________________________________________
erlang-patches (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-patches-unsubscribe@erlang.org
Post received from mailinglist |
|
|
| Back to top |
|
| Guest |
Posted: Thu May 13, 2010 8:16 pm |
|
|
|
Guest
|
We've recently experienced a 100% repeatable case of R13B02-1 and
R13B04 beam core dumps on a Montavista Linux system running on a
Cavium Octeon processor. This is technically not a problem with the
Erlang VM but rather is a case where certain socket-related calls in
this version of Montavista Linux, based on a 2.6.21 Linux kernel, are
indicating failure by returning negative numbers with large absolute
values, such as negative 0x40000, rather than returning -1 to indicate
failure as they should. The Erlang TCP code expects system calls to
return -1 to indicate errors and compares directly against that value,
so it ends up treating these negative return values as successful and
adds them to internal pointers as bytes written, offsets, etc. This in
turn corrupts these pointers and causes beam to dump core.
This patch introduces a portability macro to test for system call
failure within inet_drv.c:
git fetch git://github.com/vinoski/otp.git socket_error_portability
--steve
________________________________________________________________
erlang-patches (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-patches-unsubscribe@erlang.org
Post received from mailinglist |
|
|
| Back to top |
|
| Guest |
Posted: Thu May 13, 2010 10:12 pm |
|
|
|
Guest
|
Steve Vinoski writes:
> We've recently experienced a 100% repeatable case of R13B02-1 and
> R13B04 beam core dumps on a Montavista Linux system running on a
> Cavium Octeon processor. This is technically not a problem with the
> Erlang VM but rather is a case where certain socket-related calls in
> this version of Montavista Linux, based on a 2.6.21 Linux kernel, are
> indicating failure by returning negative numbers with large absolute
> values, such as negative 0x40000, rather than returning -1 to indicate
> failure as they should. The Erlang TCP code expects system calls to
> return -1 to indicate errors and compares directly against that value,
> so it ends up treating these negative return values as successful and
> adds them to internal pointers as bytes written, offsets, etc. This in
> turn corrupts these pointers and causes beam to dump core.
>
> This patch introduces a portability macro to test for system call
> failure within inet_drv.c:
>
> git fetch git://github.com/vinoski/otp.git socket_error_portability
I'm not happy with this patch.
First, you're changing (weakening) error checking on all non-Win32
platforms, not just the apparently horribly broken MVL. At a minimum
the new IS_SOCKET_ERROR() macro should be written as
#if <whatever the test for Montavista's butchered Linux is>
#define IS_SOCKET_ERROR(val) ((val) < 0)
#else
#define IS_SOCKET_ERROR(val) ((val) == SOCKET_ERROR)
#endif
Second, the change needs a big comment. Otherwise someone who's ever
read the POSIX or SuS specs will want to clean it up and turn it back
to the original, not realising the reason for the out-of-spec error check.
(Personally I'd rather wrap the broken socket calls with functions that
convert them to sane APIs, in this case just fix up error returns, than
to force all users to adapt to the breakage. But lots of code in erts
could do with that sort of cleanup, so I'm not going to complain if you
don't want to do it that way.)
/Mikael
________________________________________________________________
erlang-patches (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-patches-unsubscribe@erlang.org
Post received from mailinglist |
|
|
| Back to top |
|
| Guest |
Posted: Thu May 13, 2010 10:12 pm |
|
|
|
Guest
|
Steve Vinoski writes:
> We've recently experienced a 100% repeatable case of R13B02-1 and
> R13B04 beam core dumps on a Montavista Linux system running on a
> Cavium Octeon processor. This is technically not a problem with the
> Erlang VM but rather is a case where certain socket-related calls in
> this version of Montavista Linux, based on a 2.6.21 Linux kernel, are
> indicating failure by returning negative numbers with large absolute
> values, such as negative 0x40000, rather than returning -1 to indicate
> failure as they should. The Erlang TCP code expects system calls to
> return -1 to indicate errors and compares directly against that value,
> so it ends up treating these negative return values as successful and
> adds them to internal pointers as bytes written, offsets, etc. This in
> turn corrupts these pointers and causes beam to dump core.
>
> This patch introduces a portability macro to test for system call
> failure within inet_drv.c:
>
> git fetch git://github.com/vinoski/otp.git socket_error_portability
I'm not happy with this patch.
First, you're changing (weakening) error checking on all non-Win32
platforms, not just the apparently horribly broken MVL. At a minimum
the new IS_SOCKET_ERROR() macro should be written as
#if <whatever the test for Montavista's butchered Linux is>
#define IS_SOCKET_ERROR(val) ((val) < 0)
#else
#define IS_SOCKET_ERROR(val) ((val) == SOCKET_ERROR)
#endif
Second, the change needs a big comment. Otherwise someone who's ever
read the POSIX or SuS specs will want to clean it up and turn it back
to the original, not realising the reason for the out-of-spec error check.
(Personally I'd rather wrap the broken socket calls with functions that
convert them to sane APIs, in this case just fix up error returns, than
to force all users to adapt to the breakage. But lots of code in erts
could do with that sort of cleanup, so I'm not going to complain if you
don't want to do it that way.)
/Mikael
________________________________________________________________
erlang-patches (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-patches-unsubscribe@erlang.org
Post received from mailinglist |
|
|
| Back to top |
|
| Guest |
Posted: Thu May 13, 2010 11:55 pm |
|
|
|
Guest
|
On Thu, May 13, 2010 at 6:12 PM, Mikael Pettersson <mikpe@it.uu.se> wrote:
> Steve Vinoski writes:
> |
|
|
| Back to top |
|
| Guest |
Posted: Thu May 13, 2010 11:56 pm |
|
|
|
Guest
|
On Thu, May 13, 2010 at 6:12 PM, Mikael Pettersson <mikpe@it.uu.se> wrote:
> Steve Vinoski writes:
> |
|
|
| Back to top |
|
| Guest |
Posted: Fri May 14, 2010 8:27 am |
|
|
|
Guest
|
I agree with Steve's points.
In the Erlang/OTP team, we usually don't use the
term "broken platform". A platform may be broken
according to some definition of "broken", but if a
customer has chosen to use a certain platform
and Erlang/OTP does not work well on it, the
customer will blame us, not the platform. Pointing
out to the customer that their chosen platform
is "broken" will not do any good.
The patch looks fine to me. (Except that the
definition of SOCKET_ERROR should be removed
since it is presumably no longer used.)
Before accepting it, we will need to consider whether
a successful return value could be interpreted as
an error. That is, will any of the socket functions
return a negative error that does not indicate an
error?
Looking at a few of the calls:
The connect() call return 0 if successful, otherwise
-1. It seems to me that if an implementation would
return for instance -2 to mean success, it would be
more horribly broken than Montavista Linux, so that
should be safe. (Alternatively, in this case, it would
be possible to test for 0, using another macro.)
listen() has the same return values as connect().
The man page for accept() says that it returns
a non-negative integer if it succeeds, so that should
be safe.
It seems that the send() family of system calls can
return a negative value that is a success, but only
if passed a sufficiently large buffer size so that
it will become negative when casted to a signed
integer. These functions could need a little
investigation to either show that it cannot happen
in practice, or do something about it so that it
cannot happen in practice (perhaps the error
should be checked in the old way for those
calls if those calls are not known for returning
strange error codes).
Steve, I suggest that you revise your commit
message to summarize the arguments in your
email and also a justification why the change
is safe (similar to what I have done in this
email, but expanding on it).
Putting that information in the commit message
has two advantages: It will be easier for us to
review the patch since all relevant information
can be found in one place and if anyone in the
future stumbles upon this commit and wonders
why it was made, (s)he will immediately find all
information in one place and will not have to
waste time searching through the mailing
list archives.
--
Björn Gustavsson, Erlang/OTP, Ericsson AB
________________________________________________________________
erlang-patches (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-patches-unsubscribe@erlang.org
Post received from mailinglist |
|
|
| Back to top |
|
| Guest |
Posted: Fri May 14, 2010 8:28 am |
|
|
|
Guest
|
I agree with Steve's points.
In the Erlang/OTP team, we usually don't use the
term "broken platform". A platform may be broken
according to some definition of "broken", but if a
customer has chosen to use a certain platform
and Erlang/OTP does not work well on it, the
customer will blame us, not the platform. Pointing
out to the customer that their chosen platform
is "broken" will not do any good.
The patch looks fine to me. (Except that the
definition of SOCKET_ERROR should be removed
since it is presumably no longer used.)
Before accepting it, we will need to consider whether
a successful return value could be interpreted as
an error. That is, will any of the socket functions
return a negative error that does not indicate an
error?
Looking at a few of the calls:
The connect() call return 0 if successful, otherwise
-1. It seems to me that if an implementation would
return for instance -2 to mean success, it would be
more horribly broken than Montavista Linux, so that
should be safe. (Alternatively, in this case, it would
be possible to test for 0, using another macro.)
listen() has the same return values as connect().
The man page for accept() says that it returns
a non-negative integer if it succeeds, so that should
be safe.
It seems that the send() family of system calls can
return a negative value that is a success, but only
if passed a sufficiently large buffer size so that
it will become negative when casted to a signed
integer. These functions could need a little
investigation to either show that it cannot happen
in practice, or do something about it so that it
cannot happen in practice (perhaps the error
should be checked in the old way for those
calls if those calls are not known for returning
strange error codes).
Steve, I suggest that you revise your commit
message to summarize the arguments in your
email and also a justification why the change
is safe (similar to what I have done in this
email, but expanding on it).
Putting that information in the commit message
has two advantages: It will be easier for us to
review the patch since all relevant information
can be found in one place and if anyone in the
future stumbles upon this commit and wonders
why it was made, (s)he will immediately find all
information in one place and will not have to
waste time searching through the mailing
list archives.
--
Björn Gustavsson, Erlang/OTP, Ericsson AB
________________________________________________________________
erlang-patches (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-patches-unsubscribe@erlang.org
Post received from mailinglist |
|
|
| Back to top |
|
| Guest |
Posted: Fri May 14, 2010 8:37 pm |
|
|
|
Guest
|
|
| Back to top |
|
| Guest |
Posted: Fri May 14, 2010 8:38 pm |
|
|
|
Guest
|
|
| Back to top |
|
| Guest |
Posted: Sat May 15, 2010 8:27 am |
|
|
|
Guest
|
2010/5/14 Steve Vinoski <vinoski@ieee.org>:
>
> Good point. I've amended my commit to include this information.
Excellent! Included in 'pu'.
--
Björn Gustavsson, Erlang/OTP, Ericsson AB
________________________________________________________________
erlang-patches (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-patches-unsubscribe@erlang.org
Post received from mailinglist |
|
|
| Back to top |
|
| Guest |
Posted: Sat May 15, 2010 8:27 am |
|
|
|
Guest
|
2010/5/14 Steve Vinoski <vinoski@ieee.org>:
>
> Good point. I've amended my commit to include this information.
Excellent! Included in 'pu'.
--
Björn Gustavsson, Erlang/OTP, Ericsson AB
________________________________________________________________
erlang-patches (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-patches-unsubscribe@erlang.org
Post received from mailinglist |
|
|
| Back to top |
|
| Guest |
Posted: Tue May 18, 2010 1:26 pm |
|
|
|
Guest
|
2010/5/15 Björn Gustavsson <bgustavsson@gmail.com>:
> 2010/5/14 Steve Vinoski <vinoski@ieee.org>:
>>
>> Good point. I've amended my commit to include this information.
>
> Excellent! Included in 'pu'.
Unfortunately this new version does not build:
drivers/common/inet_drv.c: In function ‘inet_fill_opts’:
drivers/common/inet_drv.c:5646: error: ‘SOCKET_ERROR’ undeclared
(first use in this function)
drivers/common/inet_drv.c:5646: error: (Each undeclared identifier is
reported only once
drivers/common/inet_drv.c:5646: error: for each function it appears in.)
The build problem exposes a place where the new IS_SOCKET_ERROR()
macro is not used.
Another thing is that the Windows version of the macro should still
use SOCKET_ERROR:
#define IS_SOCKET_ERROR(val) ((val) == SOCKET_ERROR)
since SOCKET_ERROR is defined in a header file on Windows.
Because of the build error, Raimo has taken out the branch
from the pu branch. He will re-instate it as soon as you have
fixed the build problem.
--
Björn Gustavsson, Erlang/OTP, Ericsson AB
________________________________________________________________
erlang-patches (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-patches-unsubscribe@erlang.org
Post received from mailinglist |
|
|
| Back to top |
|
| Guest |
Posted: Tue May 18, 2010 1:26 pm |
|
|
|
Guest
|
2010/5/15 Björn Gustavsson <bgustavsson@gmail.com>:
> 2010/5/14 Steve Vinoski <vinoski@ieee.org>:
>>
>> Good point. I've amended my commit to include this information.
>
> Excellent! Included in 'pu'.
Unfortunately this new version does not build:
drivers/common/inet_drv.c: In function ‘inet_fill_opts’:
drivers/common/inet_drv.c:5646: error: ‘SOCKET_ERROR’ undeclared
(first use in this function)
drivers/common/inet_drv.c:5646: error: (Each undeclared identifier is
reported only once
drivers/common/inet_drv.c:5646: error: for each function it appears in.)
The build problem exposes a place where the new IS_SOCKET_ERROR()
macro is not used.
Another thing is that the Windows version of the macro should still
use SOCKET_ERROR:
#define IS_SOCKET_ERROR(val) ((val) == SOCKET_ERROR)
since SOCKET_ERROR is defined in a header file on Windows.
Because of the build error, Raimo has taken out the branch
from the pu branch. He will re-instate it as soon as you have
fixed the build problem.
--
Björn Gustavsson, Erlang/OTP, Ericsson AB
________________________________________________________________
erlang-patches (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-patches-unsubscribe@erlang.org
Post received from mailinglist |
|
|
| Back to top |
|
| Guest |
Posted: Tue May 18, 2010 7:52 pm |
|
|
|
Guest
|
|
| Back to top |
|
|
|
All times are GMT
Page 1 of 2
Goto page 1, 2 Next
|
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum You cannot attach files in this forum You cannot download files in this forum
|
|
|