Erlang Mailing Lists

Author Message

<  Erlang questions mailing list  ~  Is it possible to align binary's byte array to cache line bo

Guest
Posted: Thu Sep 15, 2011 7:22 am Reply with quote
Guest
It seems not supported in current VM, right?

Post received from mailinglist
Guest
Posted: Sat Sep 17, 2011 1:08 pm Reply with quote
Guest
On Thu, Sep 15, 2011 at 09:21, max tan <maxtqm@gmail.com> wrote:
> It seems not supported in current VM, right?

If it is supported, it will be automatic and done for all binaries.
Take a look at the allocator used for allocation of binaries, since it
is a different one than the one used for allocation of process heaps.

Why do you need this guarantee?


--
J.
_______________________________________________
erlang-questions mailing list
erlang-questions@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions
Post received from mailinglist
Guest
Posted: Sat Sep 17, 2011 4:30 pm Reply with quote
Guest
>
> On Thu, Sep 15, 2011 at 09:21, max tan <maxtqm@gmail.com> wrote:
> > It seems not supported in current VM, right?
>
> If it is supported, it will be automatic and done for all binaries.
> Take a look at the allocator used for allocation of binaries, since it
> is a different one than the one used for allocation of process heaps.
I saw in BEAM's sources, that when the binary's size is 64 bytes or less,
BEAM will allocate a ErlHeapBin on local heap,

in "erl_binary.h":

typedef struct erl_heap_bin {
Eterm thing_word; /* Subtag HEAP_BINARY_SUBTAG. */
Uint size; /* Binary size in bytes. */
Eterm data[1]; /* The data in the binary. */
} ErlHeapBin;



in "binary.c":

/*
* Create a brand new binary from scratch.
*/

Eterm
new_binary(Process *p, byte *buf, Uint len)
{
ProcBin* pb;
Binary* bptr;

if (len <= ERL_ONHEAP_BIN_LIMIT) {
ErlHeapBin* hb = (ErlHeapBin *) HAlloc(p, heap_bin_size(len));
hb->thing_word = header_heap_bin(len);
hb->size = len;
if (buf != NULL) {
sys_memcpy(hb->data, buf, len);
}
return make_binary(hb);
}

................


So, I can be sure that BEAM will align ProcBin instead of binary "data".
When len <= ( cache_line_size - sizeof(Eterm) - sizeof(Uint) ), it's OK,
the whole ProcBin can be fit in 1 cache line. Otherwise, for example when
len = cache_line_size, the whole ProcBin will surely span 2 or more cache
lines.

>
> Why do you need this guarantee?
>
For example, my application reads/creates/stores/sends binaries of
64 bytes repeatedly, if I can be sure that the binaries is allocated along
cache line boundary ( which is normally 64/128 byte multiplied ), each
processing will need just 1 memory fetch, so cache-friendly, so I can
maximize the processing throughput.

>
> --
> J.
_______________________________________________
erlang-questions mailing list
erlang-questions@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions
Post received from mailinglist
Guest
Posted: Sat Sep 17, 2011 8:52 pm Reply with quote
Guest
On Sat, Sep 17, 2011 at 18:30, max tan <maxtqm@gmail.com> wrote:

> I saw in BEAM's sources, that when the binary's size is 64 bytes or less,
> BEAM will allocate a ErlHeapBin on local heap,

If we limit ourselves to small binaries only, everything will be heap
allocated. The heap is garbage collected, and the size of the heap
will govern what kind of garbage collector there is in play here. If
the heap is small enough, all of it will be in cache and there will be
little penalty for pulling it in from the first level cache, one or
two cache lines needed.

Garbage collectors can some times outperform a manual memory
management routine, especially if you think about how much garbage you
create in your application. Unfortunately, I only have fairly old
papers which study the impact of garbage collectors quo
malloc()/free() style implementations. I dare not think they still
hold these days with multiple cores, cache coherency algorithms,
"shared memory" and new cache sizes. But for small heaps with few
objects living and most objects dead, all GCs will be in cache and run
fast as a result. Also note that messaging your binary is a 64 byte
copy (+ its header and other things - chances are that it can't fit
into a single cache line in the first place!).

So if your binaries are small, I'd rather worry about other things
than cahce alignment, since it is then in the hands of the GC. I think
that the mantra of Erlang is "mod out" such details in the program and
let the underlying VM handle it. If it proves to be too slow in
practice, you can always outsource the heavy work to a NIF, in which
you can get more explicit control over memory layout and use. In my
experience though, the binaries of Erlang have a performance that is
not too shabby to put it mildly. Depending on the problem you can
often frame it in ways that will lead to less binary allocation and
work - that is optimize algorithmically what is going on.

What I'd suggest is that you quantify how fast you need your
operations to be and then go measure and optimize if you can reach
that goal - and how many cores and nodes you need to reach it at that
speed. This will guide you if you need to worry about cache alignment
performance in the GC. Erlang is focused on being robust first,
correct second and speedy third. My experience is that it is often
_fast enough_ but it can't compete in raw CPU-bound kernel performance
besides Ocaml, Java, C or C++. But then you throw robustness
overboard.


--
J.
_______________________________________________
erlang-questions mailing list
erlang-questions@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions
Post received from mailinglist
Guest
Posted: Thu Sep 22, 2011 3:49 pm Reply with quote
Guest
> I can maximize the processing throughput.
>
> I don't know of a dynamic language that gives you this control; be it PHP,
> JavaScript, Haskell or Erlang.

I just found Haskell has "Data.ByteString.Lazy", according to a paper titled
"Rewriting Haskell Strings":

A library for ByteStrings is implemented, providing a purely functional
interface, which approaches the speed of low-level mutable arrays in C.
_______________________________________________
erlang-questions mailing list
erlang-questions@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions
Post received from mailinglist
Guest
Posted: Thu Sep 22, 2011 8:46 pm Reply with quote
Guest
On Thu, Sep 22, 2011 at 17:49, max tan <maxtqm@gmail.com> wrote:

>
> I just found Haskell has "Data.ByteString.Lazy", according to a paper titled
> "Rewriting Haskell Strings":
>
>   A library for ByteStrings is implemented,   providing a purely functional
> interface, which approaches the speed of low-level mutable arrays in C.
> _______________________________________________

This is true. In many ways, a Lazy ByteString in Haskell corresponds
to a list of binaries in Erlang, which is a subset of the iolist()
type in Erlang. If Haskells Lazy Bytestring implementation approaches
the speed of low-level mutable arrays, so does binaries in Erlang.

My experience is they perform at about the same speed. I don't know
what happened in my brain when I called Haskell a dynamic language
though. It is statically typed, and very much so.


--
J.
_______________________________________________
erlang-questions mailing list
erlang-questions@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions

Post received from mailinglist
Guest
Posted: Sat Sep 24, 2011 5:36 pm Reply with quote
Guest
Maybe LazyByteStrings and Erlang binaries approach the performance of randomly aligned heap blocks in C.
 
But, if I do data streaming processing (say, transforming vertices for 3D graphcs, transcoding media in multimedia, detecting intersections in rigid body dynamics, etc) then the specific alignment and stride of my data matters. Neither Erlang nor Haskell allows me to specify that "the first byte of this byte string is always aligned on a multiple of 16 bytes" (for architecture-specific SIMD exploitation) or "the first byte of this byte string is always aligned on a multiple of 64 bytes" (for architecture-specific cache line exploitation) or "no word within this binary will share a L3 cache line with any other word allocated in the same manner" (for architecture-specific cache contention reduction for spinlocks).
 
C allows you to control these things. That's why it's a systems programming language, and Erlang isn't (although it's a language for programming systems Smile
 
Sincerely,
 
jw

--
Americans might object: there is no way we would sacrifice our living standards for the benefit of people in the rest of the world. Nevertheless, whether we get there willingly or not, we shall soon have lower consumption rates, because our present rates are unsustainable.




On Thu, Sep 22, 2011 at 1:46 PM, Jesper Louis Andersen <jesper.louis.andersen@gmail.com (jesper.louis.andersen@gmail.com)> wrote:
Quote:
On Thu, Sep 22, 2011 at 17:49, max tan <maxtqm@gmail.com (maxtqm@gmail.com)> wrote:

>
> I just found Haskell has "Data.ByteString.Lazy", according to a paper titled
> "Rewriting Haskell Strings":
>
>   A library for ByteStrings is implemented,   providing a purely functional
> interface, which approaches the speed of low-level mutable arrays in C.

> _______________________________________________

This is true. In many ways, a Lazy ByteString in Haskell corresponds
to a list of binaries in Erlang, which is a subset of the iolist()
type in Erlang. If Haskells Lazy Bytestring implementation approaches
the speed of low-level mutable arrays, so does binaries in Erlang.

My experience is they perform at about the same speed. I don't know
what happened in my brain when I called Haskell a dynamic language
though. It is statically typed, and very much so.


--
J.



Post received from mailinglist

Display posts from previous:  

All times are GMT
Page 1 of 1
This forum is locked: you cannot post, reply to, or edit topics.

Jump to:  

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You cannot download files in this forum