Erlang/OTP Forums

Erlang patches mailing list ~ Sendfile in erlang

enano at fi.udc.es
Posted: Fri Nov 14, 2003 11:34 am
Guest
Hi,

I'm not sure whether the proper list is e-questions or e-patches, but this
is a small patch anyway.

This draft patch adds a sendfile() interface to Erlang. Sendfile(2) is a
system call present in Linux 2.2, AIX 5 and later kernels, and similar
interfaces are present in recent versions of Solaris and possibly other
Unices. The usual loop of read()ing from a file and then send()ing to a
socket or write()ing to another file has an unnecessarily large overhead:
copying data from kernel space to user space on read, and then back to
kernel space again on write() or send(). Besides, if we are reading from
Erlang, that means getting all those data chunks into the erlang runtime
memory management system only to get them out again immediately and then
GC them sometime in the future. Very often (think of a web or file server)
our program has no use for that read data except sending it out again.

Sendfile(f,t,o,c) simply instructs the kernel (the OS kernel, not
$ROOTDIR/lib/kernel) to read c bytes at offset o of file descriptor f and
write them again to file descriptor t. No data is moved to/from user
space.


ObPerfData: a cycle of file:read() and gen_tcp:send() moving 4KB chunks
over 1000Base-T between 1GHz Pentium3 machines sustains a throughput of
about 55Mbps. A cycle of file:sendfile() calls sustains over 410Mbps down
the pipe. Make sure you have a well supported network card before trying.
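
For reference, the read-and-send cycle being compared could look roughly
like the sketch below. This is not the benchmark program itself (it is not
part of this post); the module name copy_loop and the function send_file/3
are made up for illustration, and the chunk size is left as a parameter so
that 4096 can be passed to match the figures above.

```erlang
-module(copy_loop).
-export([send_file/3]).

%% Hypothetical baseline: every chunk is read into the Erlang VM as a
%% binary and then handed back to the kernel through the socket.
%% Returns {ok, TotalBytes} or {error, Reason}.
send_file(Filename, Sock, ChunkSize) ->
    case file:open(Filename, [read, raw, binary]) of
        {ok, Fd} ->
            try
                loop(Fd, Sock, ChunkSize, 0)
            after
                file:close(Fd)
            end;
        {error, _} = Err ->
            Err
    end.

loop(Fd, Sock, ChunkSize, Total) ->
    case file:read(Fd, ChunkSize) of
        {ok, Data} ->
            %% This is the copy that sendfile avoids: Data now lives in
            %% user space and is pushed straight back down.
            case gen_tcp:send(Sock, Data) of
                ok -> loop(Fd, Sock, ChunkSize, Total + byte_size(Data));
                {error, _} = Err -> Err
            end;
        eof ->
            {ok, Total};
        {error, _} = Err ->
            Err
    end.
```

Called as copy_loop:send_file(Filename, Sock, 4096) on a connected
gen_tcp socket, it sends the whole file or stops at the first error.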

The patch is for testing purposes - I'd be glad to hear comments. I have
kept the kernel sendfile semantics: it may write fewer bytes than
requested, just like the send(2) syscall; return value is {ok, SentBytes}
or {error, Reason}. Maybe it would be more polite to behave like
gen_tcp:send instead and make sure all data is sent, or else return an
error. More ugly details: it needs the socket descriptor *number*, so for
now you have to call the undocumented function getfd in prim_inet. An
example:

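%% Open the source file in raw mode and connect the destination socket.
{ok, From} = file:open(Filename, [read, raw]),
{ok, Sock} = gen_tcp:connect(Host, Port, [binary, {packet, 0}]),
%% The patch needs the OS-level descriptor number, not the Erlang socket.
{ok, SockFD} = prim_inet:getfd(Sock),
%% Send Block bytes starting at offset Pos; Sent may be smaller than Block.
{ok, Sent} = file:sendfile(From, SockFD, Pos, Block),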
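
If the gen_tcp:send-like behaviour is wanted, a retry loop around the call
is one way to get it. The following is only a sketch under assumptions:
the module name sendfile_all and send_all/3 are hypothetical, and SendFun
stands in for a call to the patched file:sendfile/4, passed as a fun so
the loop does not depend on the patch being installed. Treating a
{ok, 0} result as an error is also an assumption, made so that the loop
cannot spin forever.

```erlang
-module(sendfile_all).
-export([send_all/3]).

%% SendFun(Pos, Count) -> {ok, Sent} | {error, Reason}
%% With the patch applied this could be, for example:
%%   fun(P, C) -> file:sendfile(From, SockFD, P, C) end
%%
%% Keeps calling SendFun until Count bytes starting at Pos have been
%% sent. Returns ok, or the first {error, Reason} encountered.
send_all(_SendFun, _Pos, 0) ->
    ok;
send_all(SendFun, Pos, Count) when is_integer(Count), Count > 0 ->
    case SendFun(Pos, Count) of
        {ok, Sent} when Sent > 0, Sent =< Count ->
            %% Partial write: advance the offset and retry the rest.
            send_all(SendFun, Pos + Sent, Count - Sent);
        {ok, 0} ->
            %% Assumption: no progress means end of file or a stalled peer.
            {error, no_progress};
        {error, _} = Err ->
            Err
    end.
```

The caller then sees the same ok or {error, Reason} shape as gen_tcp:send,
at the cost of not learning how many bytes went out before a failure.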


No guarantees, backup first, parachute not included, etc.

Regards,

Miguel


Post generated using Mail2Forum (http://m2f.sourceforge.net)
sean.hinde at mac.com
Posted: Fri Nov 14, 2003 1:49 pm
Guest
On Friday, November 14, 2003, at 11:27 am, Miguel Barreiro wrote:

> This draft patch adds a sendfile() interface to Erlang. Sendfile(2) is a
> system call present in Linux 2.2, AIX 5 and later kernels, and similar
> interfaces are present in recent versions of Solaris and possibly other
> Unices. The usual loop of read()ing from a file and then send()ing to a
> socket or write()ing to another file has an unnecessarily large overhead:
> copying data from kernel space to user space on read, and then back to
> kernel space again on write() or send(). Besides, if we are reading from
> Erlang, that means getting all those data chunks into the erlang runtime
> memory management system only to get them out again immediately and then
> GC them sometime in the future. Very often (think of a web or file server)
> our program has no use for that read data except sending it out again.
>
> Sendfile(f,t,o,c) simply instructs the kernel (the OS kernel, not
> $ROOTDIR/lib/kernel) to read c bytes at offset o of file descriptor f and
> write them again to file descriptor t. No data is moved to/from user
> space.

This looks like an excellent addition to Erlang. I'd fully support this
being adopted by the OTP team.

> ObPerfData: a cycle of file:read() and gen_tcp:send() moving 4KB chunks
> over 1000Base-T between 1GHz Pentium3 machines sustains a throughput of
> about 55Mbps. A cycle of file:sendfile() calls sustains over 410Mbps down
> the pipe. Make sure you have a well supported network card before trying.

The 55Mbps matches well with my measurements on a 1GHz PPC. We would
use this tomorrow if it were beefed up with some of your suggestions
and the normal OTP extra safe semantics.

Brilliant!

Sean



luke at bluetail.com
Posted: Fri Nov 14, 2003 3:37 pm
Guest
Miguel Barreiro <enano_at_fi.udc.es> writes:

> ObPerfData: a cycle of file:read() and gen_tcp:send() moving 4KB chunks
> over 1000Base-T between 1GHz Pentium3 machines sustains a throughput of
> about 55Mbps. A cycle of file:sendfile() calls sustains over 410Mbps down
> the pipe. Make sure you have a well supported network card before trying.

Great stuff!

Could you please post your benchmark programs too? I'm curious to try
the unoptimised version in Oprofile (best program of the year) and see
what kills the performance - user/kernel copies, context switches,
erlang GC, select(), etc. If I remember correctly, Per Bergqvist was
sending 10Mbps through Erlang on Celerons with only a fraction of the
CPU with the kpoll'ified emulator.

I've long suspected that one could move pretty much the whole network
stack into userspace without much performance loss, if you just chose
the interface well. I'm interested to find out if this is bollocks :-)

P.S., Oprofile is at http://oprofile.sourceforge.net/. It is a
whole-system profiler for Linux that will simultaneously profile
*everything* on the whole system, including all applications, kernel
interrupt handlers, etc. Completely amazing.

Cheers,
Luke



ulf.wiger at ericsson.com
Posted: Fri Nov 14, 2003 3:43 pm
Guest
On 14 Nov 2003 16:36:58 +0100, Luke Gorrie <luke_at_bluetail.com> wrote:

> I've long suspected that one could move pretty much the whole network
> stack into userspace without much performance loss, if you just chose
> the interface well. I'm interested to find out if this is bollocks :-)

My wet dream is that one should always start by developing a reference
implementation of any given protocol in Erlang. Then -- only if performance
is not good enough -- implement (or buy) one in C. The Erlang-based
implementation will help you understand the protocol fully, can serve
as an education and testing tool, and should eventually (this should be
a goal for the development of Erlang) be the preferred implementation
to use in your commercial product.

/Uffe

--
Ulf Wiger, Senior System Architect
EAB/UPD/S


luke at bluetail.com
Posted: Fri Nov 14, 2003 4:14 pm
Guest
Ulf Wiger <ulf.wiger_at_ericsson.com> writes:

> My wet dream is that one should always start by developing a reference
> implementation of any given protocol in Erlang. Then -- only if performance
> is not good enough -- implement (or buy) one in C. The Erlang-based
> implementation will help you understand the protocol fully, can serve
> as an education and testing tool, and should eventually (this should be
> a goal for the development of Erlang) be the preferred implementation
> to use in your commercial product.

Did this just the other month when building a "distributed ethernet
switch" out of Linux boxes. There's already a switch in Linux
('bridge' module), we just needed the "distributed" part. No worries -
wrote a virtual network device in Erlang with the 'tuntap' application
from Jungerl. To Linux it looks like a network card, but frames
sent/received just go to Erlang - which tunnels them over UDP between
other nodes.

Ultimately we did want more performance - the bottleneck seemed to be
the user/kernel interface. But by then it was all very well
understood, and took one day to port the traffic code into a kernel
module.

Amazing every now and then when things go as they should. :-)

Dream-wise though, I would prefer to use shared-memory between user
and kernel space for packet buffers to avoid the copies and keep the
logic in userspace. Linux seems to already have features in this
direction.

-Luke



luke at bluetail.com
Posted: Fri Nov 14, 2003 4:25 pm
Guest
Luke Gorrie <lgorrie_at_nortelnetworks.com> writes:

> Dream-wise though, I would prefer to use shared-memory between user
> and kernel space for packet buffers to avoid the copies and keep the
> logic in userspace. Linux seems to already have features in this
> direction.

(Isn't performance speculation a dangerous business? Here I assume
it's _copying_ that's the bottleneck, which I have no measurements
whatsoever to back up.)



sean.hinde at mac.com
Posted: Fri Nov 14, 2003 4:47 pm
Guest
On Friday, November 14, 2003, at 03:36 pm, Luke Gorrie wrote:

> Miguel Barreiro <enano_at_fi.udc.es> writes:
>
>> ObPerfData: a cycle of file:read() and gen_tcp:send() moving 4KB chunks
>> over 1000Base-T between 1GHz Pentium3 machines sustains a throughput of
>> about 55Mbps. A cycle of file:sendfile() calls sustains over 410Mbps down
>> the pipe. Make sure you have a well supported network card before trying.
>
> Great stuff!
>
> Could you please post your benchmark programs too? I'm curious to try
> the unoptimised version in Oprofile (best program of the year) and see
> what kills the performance - user/kernel copies, context switches,
> erlang GC, select(), etc. If I remember correctly, Per Bergqvist was
> sending 10Mbps through Erlang on Celerons with only a fraction of the
> CPU with the kpoll'ified emulator.

That would be superb. I was at a loss in my testing to see what was
making things slow. Klacke mentioned to me sometime that he was getting
much greater throughput once upon a time so I just put this down to LAN
congestion.

Please share any results you get.

Thanks,
Sean



cpressey at catseye.mine.
Posted: Sat Nov 15, 2003 7:57 pm
Guest
On Fri, 14 Nov 2003 12:27:08 +0100 (CET)
Miguel Barreiro <enano_at_fi.udc.es> wrote:

>
> Hi,
>
> I'm not sure whether the proper list is e-questions or e-patches, but
> this is a small patch anyway.
>
> This draft patch adds a sendfile() interface to Erlang.

Yowza! :)

> Sendfile(2) is a
> system call present in Linux 2.2, AIX 5 and later kernels, and similar
> interfaces are present in recent versions of Solaris and possibly
> other Unices.

Yes, it's in FreeBSD as well. I've adapted the patch for FreeBSD; it
builds alright, but I haven't had time to test it yet. I've included it
in an experimental & unofficial port skeleton for Erlang, which can be
found at:

http://catseye.webhop.net/freebsd/ports/lang/erlang/

-Chris

