Erlang/OTP Forums

<  Erlang bugs mailing list  ~  hipe crash with compiler modules

Posted: Tue Nov 03, 2009 9:43 pm
Hello,

I have been experiencing a random crash with HiPE on 32-bit FreeBSD
(R13B) and 64-bit Mac OS X 10.6 (R13B01) when the compiler modules
have been recompiled to native code.

The compiler modules were recompiled with code that goes like this:

{_, Beam, Path} = code:get_object_code(Module),
{ok, _, Chunks} = beam_lib:all_chunks(Beam),
{ok, {Target, HipeBinary}} = hipe:compile(Module),
ChunkName = hipe_unified_loader:chunk_name(Target),
{ok, NewBeam} = beam_lib:build_module(Chunks ++ [{ChunkName, HipeBinary}]),

The crash happens when I compile several files (a dozen) at once with
rpc:pmap. I believe rpc:pmap is the reason why the crash happens
randomly. This is with an internal tool called erl_make. If I run
erl_make clean && erl_make install, I get a crash, but if I run
erl_make install; erl_make install, the second operation (almost
always) succeeds. Sometimes I need to run erl_make clean before
erl_make install compiles successfully.

The stack trace (on Mac OS X) looks like this:

Thread 4 Crashed:
0  beam.smp           0x000000000055dc0f  gensweep_nstack + 623
1  beam.smp           0x00000000004e5591  do_minor + 313
2  beam.smp           0x00000000004e4ef9  minor_collection + 547
3  beam.smp           0x00000000004e34f4  erts_garbage_collect + 590
4  beam.smp           0x00000000004e31de  erts_gc_after_bif_call + 153
5  beam.smp           0x000000000051acee  process_main + 42816
6  beam.smp           0x000000000047a833  sched_thread_func + 357
7  beam.smp           0x000000000059ca27  thr_wrapper + 103
8  libSystem.B.dylib  0x00007fff86da4f66  _pthread_start + 331
9  libSystem.B.dylib  0x00007fff86da4e19  thread_start + 13

If all compiler beam files are replaced with the original ones (i.e.
without the HiPE chunk), there is no crash. I couldn't single out one
compiler module that causes the crash. It looks like the crash happens
when several of them are native.

I found a reference to a crash in gensweep_nstack in the archives:
http://erlang.org/pipermail/erlang-bugs/2008-December/001131.html

In this case, the code that gets compiled natively is just part of
OTP. Do you have any hints on how to track down the bug?

Paul
--
Semiocast http://titema.com/
+33.175000290 - 62 bis rue Gay-Lussac, 75005 Paris


________________________________________________________________
erlang-bugs mailing list. See http://www.erlang.org/faq.html
erlang-bugs (at) erlang.org

Post received from mailinglist
Posted: Tue Nov 03, 2009 10:58 pm
Paul Guyot writes:
> Hello,
>
> I have been experiencing a random crash with hipe on FreeBSD 32bits
> (R13B) and MacOS X 10.6 64bits (R13B01) when compiler modules have
> been recompiled with native code.

64-bit native code on OSX has not been validated by the HiPE group,
so it is unsupported. 32-bit native code on OSX 10.5 seems to work,
but has been only very lightly tested by us.

> The compiler modules have been recompiled with some code that goes
> like this :
>
> {_, Beam, Path} = code:get_object_code(Module),
> {ok, _, Chunks} = beam_lib:all_chunks(Beam),
> {ok, {Target, HipeBinary}} = hipe:compile(Module),
> ChunkName = hipe_unified_loader:chunk_name(Target),
> {ok, NewBeam} = beam_lib:build_module(Chunks ++ [{ChunkName, HipeBinary}]),

The proper way to compile modules is to pass 'native' as
an option to the BEAM compiler. I do not consider hipe:compile
or hipe_unified_loader:chunk_name to be public APIs.

So why do you do it in this awkward way?
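
As an illustration of the supported route mentioned above (this is not code from the thread), the 'native' option can be passed to the BEAM compiler from the Erlang shell. The module name my_module is hypothetical, and the sketch assumes an OTP release that still ships HiPE, which was removed in OTP 24.

```erlang
%% Typed at the Erlang shell. my_module is a hypothetical module name
%% used only for illustration.

%% Compile my_module.erl to a BEAM file that also carries native code.
{ok, my_module} = compile:file("my_module", [native, report]).

%% The shell helper c/2 accepts the same option and also loads the result.
{ok, my_module} = c(my_module, [native]).
```

From the command line, the equivalent is erlc +native my_module.erl.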

> The crash happens when I compile several files (a dozen) at once with
> a rpc:pmap. I believe the rpc:pmap is the reason why the crash happens
> randomly. This is with an internal tool called erl_make. If I run
> erl_make clean && erl_make install, I get a crash, but if I do
> erl_make install; erl_make install, the second operation (almost
> always) succeeds. Or sometimes, I need to run erl_make clean to
> successfully compile with erl_make install.
>
> The stack trace (on MacOS X) looks like this :
>
> Thread 4 Crashed:
> 0  beam.smp           0x000000000055dc0f  gensweep_nstack + 623
> 1  beam.smp           0x00000000004e5591  do_minor + 313
> 2  beam.smp           0x00000000004e4ef9  minor_collection + 547
> 3  beam.smp           0x00000000004e34f4  erts_garbage_collect + 590
> 4  beam.smp           0x00000000004e31de  erts_gc_after_bif_call + 153
> 5  beam.smp           0x000000000051acee  process_main + 42816
> 6  beam.smp           0x000000000047a833  sched_thread_func + 357
> 7  beam.smp           0x000000000059ca27  thr_wrapper + 103
> 8  libSystem.B.dylib  0x00007fff86da4f66  _pthread_start + 331
> 9  libSystem.B.dylib  0x00007fff86da4e19  thread_start + 13
>
> If all compiler beam files are replaced with the original ones (i.e.
> without the hipe chunk), there is no crash. I couldn't single out a
> compiler module that causes the crash. It looks like that if several
> of them are native, the crash does happen.
>
> I found a reference to a crash in gensweep_nstack in the archives :
> http://erlang.org/pipermail/erlang-bugs/2008-December/001131.html
>
> In this case, the code that gets compiled natively is just part of
> OTP. Do you have any hint about what can be done to track down the bug ?

There is a known problem with concurrent invocations of the HiPE compiler.
It looks like the serialization of code loading that the BEAM loader is
supposed to do isn't happening, or it is bypassed. This corrupts certain
runtime system data structures causing crashes during GC. I'm currently
trying to debug this problem.
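
One possible way to reduce exposure to this while the problem is being debugged, offered only as an untested assumption and not as a fix from this thread, is to load the compiler modules sequentially before any parallel compilation starts, so that their code is not loaded from several processes at once. The module name preload_compiler is hypothetical.

```erlang
-module(preload_compiler).
-export([preload/0]).

%% Load every module of the compiler application, one at a time, in the
%% calling process. Returns a list of {Module, Result} pairs, where Result
%% is whatever code:ensure_loaded/1 returned for that module.
preload() ->
    %% application:load/1 returns ok, or {error, {already_loaded, compiler}}
    %% when the application description was loaded earlier; both are fine.
    case application:load(compiler) of
        ok -> ok;
        {error, {already_loaded, compiler}} -> ok
    end,
    {ok, Modules} = application:get_key(compiler, modules),
    [{M, code:ensure_loaded(M)} || M <- Modules].
```

Calling preload_compiler:preload() once before the parallel step would be the intended use. Whether this actually avoids the crash has not been verified.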

Posted: Wed Nov 04, 2009 8:18 am
Hello Mikael,

Thank you for your reply.

> 64-bit native code on OSX has not been validated by the HiPE group,
> so it is unsupported. 32-bit native code on OSX 10.5 seems to work,
> but has been only very lightly tested by us.

I have been using the patches from MacPorts
(http://trac.macports.org/browser/trunk/dports/lang/erlang/files/),
which I authored, so I realize they're not supported :)

>> The compiler modules have been recompiled with some code that goes
>> like this :
>>
>> {_, Beam, Path} = code:get_object_code(Module),
>> {ok, _, Chunks} = beam_lib:all_chunks(Beam),
>> {ok, {Target, HipeBinary}} = hipe:compile(Module),
>> ChunkName = hipe_unified_loader:chunk_name(Target),
>> {ok, NewBeam} = beam_lib:build_module(Chunks ++ [{ChunkName, HipeBinary}]),
>
> The proper way to compile modules is to pass 'native' as
> an option to the BEAM compiler. I do not consider hipe:compile
> or hipe_unified_loader:chunk_name to be public APIs.
>
> So why do you do it in this awkward way?

These lines were inspired by what dialyzer does. My first goal was to
save the 1 or 2 minutes dialyzer spends when it has to process more
than 20 modules and decides to natively recompile "key modules" (by
calling hipe:compile/1). It seems such a waste to recompile those
modules over and over, so I wrote some code that recompiles them once
and for all and saves the altered beam files. These are the 5 lines
above, and indeed, I call hipe_unified_loader:chunk_name/1 to avoid
hard-coding a constant there, so the code works on all development and
continuous integration machines. I did it this way because it seemed
easier than recompiling OTP modules in an OTP binary deployment. Of
course, I realize this doesn't use a public API.
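
Put together as a function, the approach described above might look like the following sketch. The module name native_rebuild, the lists:keydelete/3 step and the file:write_file/2 call are assumptions added for illustration; only the five quoted lines come from the original tool. It requires an OTP release that includes HiPE, and hipe:compile/1 and hipe_unified_loader:chunk_name/1 are not public APIs.

```erlang
-module(native_rebuild).
-export([add_native_chunk/1]).

%% Rebuild the BEAM file of Module with an extra HiPE chunk and write it
%% back to the path the object code was found at.
add_native_chunk(Module) ->
    {Module, Beam, Path} = code:get_object_code(Module),
    {ok, Module, Chunks} = beam_lib:all_chunks(Beam),
    {ok, {Target, HipeBinary}} = hipe:compile(Module),
    ChunkName = hipe_unified_loader:chunk_name(Target),
    %% Assumption: drop any chunk of the same name left by an earlier run,
    %% so that running the function twice does not duplicate the chunk.
    Chunks1 = lists:keydelete(ChunkName, 1, Chunks),
    {ok, NewBeam} = beam_lib:build_module(Chunks1 ++ [{ChunkName, HipeBinary}]),
    %% Assumption: overwrite the original file in place.
    ok = file:write_file(Path, NewBeam).
```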

I thought I could natively recompile more modules than those selected
by dialyzer. This is how I ended up recompiling all compiler modules.
It seems useless to recompile several key OTP modules (e.g. lists)
because they are loaded before HiPE is actually loaded, but compiler
modules are a good target.

Everything went fine as long as the process consisted of running erlc
for each of our modules and then dialyzer. Then we moved to a new
toolchain that calls compile:file/2 and dialyzer from a single VM,
with all calls to compile:file/2 going through rpc:pmap, and this is
when we started to observe those crashes.
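
For readers unfamiliar with the pattern, the parallel step is roughly of the following shape. This is a minimal sketch and not the actual erl_make code; the module name pmap_compile and the [report] option list are assumptions.

```erlang
-module(pmap_compile).
-export([compile_all/1]).

%% Compile every file in Files in parallel. rpc:pmap/3 evaluates
%% compile:file(File, [report]) for each element File of the list and
%% returns the results in the same order as Files.
compile_all(Files) ->
    rpc:pmap({compile, file}, [[report]], Files).
```

Each element of the returned list is whatever compile:file/2 returned for the corresponding file.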

>> In this case, the code that gets compiled natively is just part of
>> OTP. Do you have any hint about what can be done to track down the
>> bug ?
>
> There is a known problem with concurrent invocations of the HiPE
> compiler. It looks like the serialization of code loading that the
> BEAM loader is supposed to do isn't happening, or it is bypassed.
> This corrupts certain runtime system data structures causing crashes
> during GC. I'm currently trying to debug this problem.


Great. I was just asking how we could help fix this bug. I realize
a VM crash is high priority. We're not observing this crash in
production (since it's purely related to compiling), and we definitely
don't use unsupported HiPE patches such as 64-bit Mac OS X 10.6 on
production servers.

Thanks again,

Paul
--
Semiocast http://titema.com/
+33.175000290 - 62 bis rue Gay-Lussac, 75005 Paris


Posted: Wed Nov 04, 2009 10:31 am
On 3 Nov 2009

All times are GMT