Erlang Mailing Lists

Author Message

<  Erlang questions mailing list  ~  Fwd: data sharing is outside the semantics of Erlang, but

tonyrog
Posted: Tue Sep 15, 2009 9:41 pm Reply with quote
User Joined: 12 Sep 2009 Posts: 43
Forgot the "Reply All" ....

Begin forwarded message:

> From: Tony Rogvall <tony@rogvall.se>
> Date: ti 15 sep 2009 22.36.32 GMT+02:00
> To: Witold Baryluk <baryluk@smp.if.uj.edu.pl>
> Subject: Re: [erlang-questions] Re: data sharing is outside the
> semantics of Erlang, but it sure is useful
>
> Here is nice term encoder/decoder (below) that will solve the size
> problem.
> The algorithm is not efficient (in Erlang) but if/when implemented
> in C it could be used as a bif.
> Still it will produce decent results on (all) shared data.
>
> The encoding of Y9 (from previous example) is then:
>
> te:encode(Y9).
>
> {2,
> {1,1,1,1},
> {2,2,2,2},
> {3,3,3,3},
> {4,4,4,4},
> {5,5,5,5},
> {6,6,6,6},
> {7,7,7,7},
> {8,8,8,8},
> {9,9,9,9},
> {10,10,10,10},
> 11}
>
> That when encoded with term_to_binary lead to a binary of size 107.
> If coded straight it was 2796203 bytes!!!
>
> Other examples that reveal some properties of the algorithm:
>
> te:encode({[a,b],[a,b,c],[a,b,c,d]}).
>
> {a,b,c,d,[],
> [4|5],
> [3|6],
> [2|7],
> [1|8],
> [3|5],
> [2|10],
> [1|11],
> [2|5],
> [1|13],
> {14,12,9},
> 15}
>
> te:encode({[b,a],[c,b,a],[d,c,b,a]}).
>
> {d,c,b,a,[],[4|5],[3|6],[2|7],[1|8],{7,8,9},10}
>
>
> applied to code (parse trees) you will be able to detect all common
> sub expressions
> (renaming bindings and removing line numbers helps a lot Wink
>
> /Tony
>
> -------------
> -module(te).
>
> -export([encode/1, decode/1]).
> -export([binary_to_term/1, term_to_binary/1]).
>
> %%
> %% Decode a term given the pointer map
> %%
> binary_to_term(Bin) ->
> decode(erlang:binary_to_term(Bin)).
>
> decode(Map) ->
> decode(1,Map).
>
> %% When implement this in C the setelement is of course destructive
> decode(I,Map) ->
> E = element(I,Map),
> if I == size(Map) ->
> element(E,Map);
> is_list(E), E =/= [] ->
> H = element(hd(E),Map),
> T = element(tl(E),Map),
> decode(I+1, setelement(I, Map, [H|T]));
> is_tuple(E), size(E) > 0 ->
> T = decode_tuple(E,size(E),Map,[]),
> decode(I+1, setelement(I, Map, T));
> true ->
> decode(I+1, Map)
> end.
>
>
> decode_tuple(_T,0,_Map,Acc) ->
> list_to_tuple(Acc);
> decode_tuple(T,I,Map,Acc) ->
> Ei = element(I,T),
> E = element(Ei,Map),
> decode_tuple(T,I-1,Map,[E|Acc]).
>
> %%
> %% Do term compression/encoding
> %% Encode bottom up while assinging pointer index
> %%
> term_to_binary(Term) ->
> erlang:term_to_binary(encode(Term)).
>
> encode(Term) ->
> {E,_K,_D, S} = encode(Term, 1, dict:new(), []),
> list_to_tuple(lists:reverse([E|S])).
>
> encode(T, K, D, S) ->
> if is_list(T), T =/= [] ->
> {He,K1,D1,S1} = encode(hd(T), K, D, S),
> {Te,K2,D2,S2} = encode(tl(T), K1, D1, S1),
> insert([He|Te], K2, D2, S2);
> is_tuple(T), size(T) > 0 ->
> {Te,K1,D1,S1} = encode_tuple(T,size(T),K,D,S,[]),
> insert(Te, K1, D1, S1);
> true ->
> insert(T, K, D, S)
> end.
>
> encode_tuple(_T, 0, K, D, S, Acc) ->
> {list_to_tuple(Acc), K, D, S};
> encode_tuple(T, I, K, D, S, Acc) ->
> {E,K1,D1,S1} = encode(element(I,T), K, D, S),
> encode_tuple(T,I-1, K1, D1,S1,[E|Acc]).
>
>
> %% lookup and maybe insert
> insert(Encoded, K, D, S) ->
> try dict:fetch(Encoded, D) of
> Ke -> {Ke,K,D,S}
> catch
> error:_ ->
> De = dict:store(Encoded, K, D),
> {K,K+1,De,[Encoded|S]}
> end.
> -------------------
>
>
> On 15 sep 2009, at 21.40, Witold Baryluk wrote:
>
>> Dnia 2009-09-14, pon o godzinie 15:36 -0500, James Hague pisze:
>>>> I am missing something here. gb_sets (nor sets, ordsets, rbsets)
>>>> does not
>>>> make a copy of the data which is put into the set. All that is
>>>> copied is
>>>> enough of the *tree* to insert the new element. There is no need
>>>> to copy the
>>>> new data as it is kept within the same process. Only ets makes a
>>>> copy of the
>>>> data.
>>>
>>> Let's say you've got a long list of strings. Many of them
>>> duplicates.
>>> You don't just want to remove the duplicates because that will
>>> change
>>> the length of the list. The goals is to ensure that identical
>>> strings
>>> are shared, so there's only one copy in memory. What's a practical
>>> way of doing that?
>>>
>>> This is irrelevant most of the time, but there are some situations
>>> where it's a huge win.
>>>
>>> (My solution was to build a new list by adding each element to a
>>> binary tree. If a string is already in the tree, return the version
>>> that's already there (which is not something that gb_sets does). In
>>> the resulting list, elements are shared as much as possible. I'm
>>> clearly taking advantage of how the runtime works, but it shrunk the
>>> heap size by tens of megabytes.)
>>
>> It looks for me as quite good solution, you depend on memory
>> saving, so
>> explicitly manages to have shared strings:
>>
>> Any other way i can see is to have this add/retrivial from binary
>> tree,
>> to be performed implicitly on every step (via some sort of hashing)
>> of
>> computation to ensure that identical elements are only once in memory
>> (of one process). But this have really big performance problems i
>> think,
>> and most of times you really don't need perfect sharing.
>>
>> You can limit this "hashing" to only terms which are complex enaugh,
>> like long lists, or nested tuples. But still this is very costly.
>>
>> If you want to have some term (or list of many terms) and shrink
>> them,
>> by some magic, you can using following pseudo code:
>>
>> maximize_sharing(Term) ->
>> Tree = tree:new(),
>> NewTree = iterate_over_every_subterm_and_add_first_to_tree(Term),
>> NewTerm = iterate_over_every_subterm_and_find_in_tree(Term, NewTree).
>>
>>
>>
>> Lack of sharing is also problematic when sending term_to_binary data,
>> because term_to_binary essentially flattens term be repeatedly
>> copying
>> data. It would be nice to have term_to_binary which doesn't copies
>> data
>> more than once, which already is in the same term.
>>
>> Ie.
>>> erlang:process_info(self(), heap_size).
>> {heap_size,1597}
>>
>>> X = {2,2,2,2},
>> Y1 = {X,X,X,X},
>> Y2 = {Y1,Y1,Y1,Y1},
>> Y3 = {Y2,Y2,Y2,Y2},
>> Y4 = {Y3,Y3,Y3,Y3},
>> Y5 = {Y4,Y4,Y4,Y4},
>> Y6 = {Y5,Y5,Y5,Y5},
>> Y7 = {Y6,Y6,Y6,Y6},
>> Y8 = {Y7,Y7,Y7,Y7},
>> Y9 = {Y8,Y8,Y8,Y8},
>> ok.
>>
>>> erlang:process_info(self(), heap_size).
>> {heap_size,4181}
>>
>>>
>>
>> In memory Y9 will be pretty small, and whole process memory
>> consumption
>> will be small, but term_to_binary(Y9) (binary with 2MB of data, but
>> not
>> counting to the heap_size), or just displaying using
>> io_lib:format("~w",
>> [Y9]) (2MB of data after flattening resulting list) or sending it to
>> different node will be disaster.
>>
>>> erlang:process_info(self(), heap_size). % after io_lib:format(), ok.
>> {heap_size,9707190}
>>
>>
>> I don't know how it affects sending Y9 to process on the same node.
>> As we know Y9 need to be copied to heap of the destination process
>> (because sharing heaps between process destroy soft real-time
>> semantic
>> of garbage collection, but improves other performance metrics for big
>> messages). But is it copied smart enough?
>>
>>
>>> P = spawn(fun() -> P = receive P2 -> P2 end, io:format("~p ~n ~p
>>> ~n",
>> [erlang:process_info(self(), heap_size), size(P)]), ok end), P !
>> Y9, ok.
>>
>> {heap_size,2103540}
>>>
>>
>> Unfortunately no.
>>
>> First good step will be to have version of term_to_binary which
>> preserves sharing of (already shared) subterms (term_to_binary have
>> second argument for options, like minorversion or gzip
>> compression), and
>> binary_to_term which understand how to unpack them preserving sharing
>> (also across
>> multiple process in the same node, or compact storage in file/db).
>>
>>
>> --
>> Witold Baryluk
>



Post received from mailinglist
View user's profile Send private message
wuji
Posted: Mon Aug 27, 2012 4:28 am Reply with quote
User Joined: 10 Aug 2012 Posts: 654
into the air and one into the ground.But the gunfire gunfire Cheap Ralph Lauren Shirts gunfire did not not scare the chimps, and when they
Cussons, he got back in his car, he said. Nikki Nikki cheap polo shirts Nikki jumped on the hood of the car and started
the windshield.Cussons said he fired one round through the glass glass Cheap Ralph Lauren Shirts glass and injured the chimp. The other chimpanzee immediately became
and the chimps moved away, allowing Cussons and first responders responders cheap Ralph Lauren responders to get to Oberle, he said.First responder Lloyd Krause
when he got to Oberle, the University of Texas San San cheap Ralph Lauren Polo San Antonio student was stripped down and the only way
knew he was alive was that his chest was moving."The moving."The [h4]cheap designer *beep*[/h4] moving."The chimps were still out there. ... He was curled
View user's profile Send private message

Display posts from previous:  

All times are GMT
Page 1 of 1
This forum is locked: you cannot post, reply to, or edit topics.

Jump to:  

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You cannot download files in this forum