Erlang/OTP Forums

<  Erlang questions mailing list  ~  Wrapping C libraries in pure Erlang

denik
Posted: Wed Apr 04, 2007 10:25 am
Hello,

Python has a very nice package in its stdlib -- ctypes. It allows
wrapping C libraries in pure Python. ctypes' implementation is based
on libffi, a C library for calling functions whose signatures are only
known at run time.

I wonder if Erlang could have such a library, implemented as a driver on
top of libffi, or in any other way that serves the goal. Eliminating the
intermediate IDL and its associated compilation phase
simplifies interop a lot. Admittedly, one could crash the interpreter more
easily, but that can be mitigated by isolating the unsafe code in another
node.

A couple examples from ctypes documentation translated into (wishful) Erlang:

%% Functions return int by default
1> cee:call(Libc, time, [null]).
1150640792

%% Parameters' types deduced when there exists a well-defined mapping
2> cee:call(Libc, printf, ["%d bottles of beer\n", 42]).

%% Ambiguity must be resolved by user
3> cee:call(Libc, printf, ["%d bottles of beer\n", 42.5]). % float or double?
** exited: {{nocatch,{argument_error, 2}},
[{erl_eval,do_apply,5},{shell,exprs,6},{shell,eval_loop,3}]} **

4> cee:call(Libc, printf, ["int %d, double %f\n", 1234, cee:double(3.14)]).
int 1234, double 3.140000
26

%% where cee:double is something like
double(N) when is_float(N) -> {c_double, N}.

%% This example is somewhat different from Python's,
%% since Erlang disallows mutable data. Although, that
%% doesn't seem a big problem, as we can make more copies.
5> cee:call(Libc, sscanf, ["1 3.14 Hello", "%d %f %s",
                           output(c_int), output(c_float),
                           output(char_array(100))]).
{3, [1, 3.1400001049, "Hello"]}
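
To make the type-deduction rule in examples 2-4 concrete, here is a minimal sketch of how the Erlang side might tag arguments before handing them to a driver. The module name cee_types, the function tag_args/1 and the c_int / c_char_p / c_void_p tags are all assumptions made for illustration; nothing like this exists yet.

```erlang
-module(cee_types).
-export([double/1, tag_args/1]).

%% Explicit tag for the ambiguous case (float vs. double), as in example 4.
double(N) when is_float(N) -> {c_double, N}.

%% Tag every argument with a C type; positions are 1-based so that the
%% error matches {argument_error, 2} from example 3.
tag_args(Args) -> tag_args(Args, 1).

tag_args([], _Pos) -> [];
tag_args([Arg | Rest], Pos) -> [tag(Arg, Pos) | tag_args(Rest, Pos + 1)].

%% Already tagged by the caller: pass through unchanged.
tag({Type, _Value} = Tagged, _Pos) when is_atom(Type) -> Tagged;
%% Well-defined mappings (hypothetical tag names).
tag(null, _Pos) -> {c_void_p, null};
tag(N, _Pos) when is_integer(N) -> {c_int, N};
tag(B, _Pos) when is_binary(B) -> {c_char_p, B};
tag(S, _Pos) when is_list(S) -> {c_char_p, S};
%% Everything else (e.g. a bare float) is ambiguous and must be tagged by hand.
tag(_Other, Pos) -> throw({argument_error, Pos}).
```

With this, tag_args(["%d bottles of beer\n", 42]) succeeds, while a bare 42.5 in second position throws {argument_error, 2} until it is wrapped with double/1.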

(ctypes has much more than that, including passing python
functions as callbacks)

One useful application would be accessing system calls not
covered by existing BIFs/drivers (native GUI goes in this
category)

I would like to hear any comments, especially from people who
know something about Erlang internals (I don't):
Would it be hard to implement?
Has anyone already thought of something like that?

Denis.
_______________________________________________
erlang-questions mailing list
erlang-questions@erlang.org
http://www.erlang.org/mailman/listinfo/erlang-questions
Post received from mailing list
mats
Posted: Wed Apr 04, 2007 2:08 pm
Denis Bilenko wrote:

[...]
> One useful application would be accessing system calls not
> covered by existing BIFs/drivers (native GUI goes in this
> category)
>
> I would like to hear any comments, especially from people who
> know something about Erlang internals (I don't):
> Would it be hard to implement?
> Has anyone already thought of something like that?

i think the "proper" way to wrap external libraries (esp native GUI) is to
create a c-node (i.e. a server process that implements the erlang distribution
protocol). the call to time() is wrapped in (say) Time(), accessible thus;

libc ! {'Time',[]},
receive
{libc,{ok,Ans}} -> Ans
end
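
as a rough sketch, the send/receive pattern above could be wrapped in a small client function with a timeout. the module name libc_client, the {self(), ...} envelope and the fake server are assumptions for illustration only (a real c-node learns the sender from the distribution protocol, so it would not need the envelope); this is not gtknode's actual API.

```erlang
-module(libc_client).
-export([call/3, call/4, start_fake/0]).

%% Call a wrapped C function on the server with a default 5 s timeout.
call(Server, Fun, Args) ->
    call(Server, Fun, Args, 5000).

call(Server, Fun, Args, Timeout) ->
    %% Assumption: the request carries the caller's pid so the reply can be routed.
    Server ! {self(), {Fun, Args}},
    receive
        {libc, {ok, Ans}} -> {ok, Ans};
        {libc, {error, Reason}} -> {error, Reason}
    after Timeout ->
        {error, timeout}
    end.

%% Stand-in for the real C node, so the client can be tried in a plain shell.
start_fake() ->
    spawn(fun fake_loop/0).

fake_loop() ->
    receive
        {From, {'Time', []}} ->
            From ! {libc, {ok, erlang:system_time(second)}},
            fake_loop();
        {From, {_Other, _Args}} ->
            From ! {libc, {error, unknown_function}},
            fake_loop()
    end.
```

usage: Pid = libc_client:start_fake(), then libc_client:call(Pid, 'Time', []) returns {ok, Seconds}.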

many/most wrappers can be auto generated from the C header files.

this is verifiably possible (and even easy), since I've done exactly this
with GTK (http://code.google.com/p/gtknode).

a c-node is safe and easy to work with. in my experience, linked-in drivers
are more trouble than they're worth.

mats
asergey
Posted: Wed Apr 04, 2007 2:16 pm
Part of the issue is that this approach would make it possible for a
function call inside a 3rd party library to block for a while. Since
Erlang's concurrency depends on function calls being
very short, and the emulator uses reduction-based counting for giving
CPU slices to each light-weight process, a blocking call would
significantly inhibit concurrency.

Drivers solve this problem by allowing asynchronous invocation of
blocking functions in the context of threads different from the
emulator(s) (*). In case of a driver or a port, once you are on the
C-side of coding interface, you can make calls to any 3rd party library
directly without needing FFI (except for cases of functions with
variable arguments), which may bring you to the question of why bother
with such an Erlang library if you already have ports and drivers that
can be written in C or even ei interface for writing C-nodes that can
make any C-calls safely with respect to the emulator?

There were attempts at somewhat automating the generation of a driver using
the EDTK toolkit (**) or Dryverl (***) that some people found useful.
Others prefer having more control and coding drivers by hand.

Regards,

Serge

(*) Emulators can run in multiple threads if SMP support is enabled.
(**) http://www.snookles.com/erlang/edtk/
(***) http://forge.objectweb.org/forum/forum.php?forum_id=1018


denik
Posted: Wed Apr 04, 2007 6:34 pm
Mats Cronqvist wrote:

> i think the "proper" way to wrap external libraries (esp native GUI) is to
> create a c-node (i.e. a server process that implements the erlang distribution
> protocol). the call to time() is wrapped in (say) Time(), accessible thus;
>
> libc ! {'Time',[]},
> receive
> {libc,{ok,Ans}} -> Ans
> end
>
> many/most wrappers can be auto generated from the C header files.
>
> this is verifiably possible (and even easy), since I've done exactly this
> with GTK (http://code.google.com/p/gtknode).
>
> a c-node is safe and easy to work with. in my experience, linked in drivers
> are more trouble than their worth.

That's why I want an ultimate linked-in driver -- to abstract them out, and
never have to work with them again (except, maybe, for performance reasons).

A c-node is safe to work with, but an Erlang node is just as safe. And it must be
easier to work with -- after all, it is high-level dynamic Erlang versus C.
A bridge between a dynamic language interpreter and C can be built from either
end. ctypes has demonstrated that building Python wrappers to C libraries
in Python itself is a good thing (or at least a very easy thing:).
I think that would be true for Erlang too.

Thank you for the pointer.
A c-node around libffi can be useful, but it is limited compared
to a driver (and I can always turn a driver into a separate node).
What if I want to make just one system call?
It doesn't seem natural to start a node for it.
What if this call takes a file descriptor?
Now it doesn't even seem possible.

Denis.
denik
Posted: Wed Apr 04, 2007 6:45 pm
Serge Aleynikov wrote:

> Part of the issue is that this approach would make it possible for a
> function call inside a 3rd party library to block for a while. Since
> Erlang's concurrency is dependent on the fact that function calls are
> very short, and the emulator uses reduction-based counting for giving
> CPU slices to each light-weight process a blocking call would
> significantly inhibit concurrency.

For that case, the imaginary library 'cee' has a function 'cast' that instructs
the driver to call the function asynchronously, as you mention below.

> Drivers solve this problem by allowing asynchronous invocation of
> blocking functions in the context of threads different from the
> emulator(s) (*). In case of a driver or a port, once you are on the
> C-side of coding interface, you can make calls to any 3rd party library
> directly without needing FFI (except for cases of functions with
> variable arguments), which may bring you to the question of why bother
> with such an Erlang library if you already have ports and drivers that
> can be written in C or even ei interface for writing C-nodes that can
> make any C-calls safely with respect to the emulator?

Because, if I had such a library, I would not have to bother
with C, ei, or the driver interface.
The point is, why manipulate Erlang data structures from C instead of
manipulating C data structures from Erlang?
The latter approach has to be more productive for the (Erlang) programmer.

Denis.
Guest
Posted: Thu Apr 05, 2007 12:57 am
Denis Bilenko wrote:
> Serge Aleynikov wrote:
>
>> Part of the issue is that this approach would make it possible for a
>> function call inside a 3rd party library to block for a while. Since
>> Erlang's concurrency is dependent on the fact that function calls are
>> very short, and the emulator uses reduction-based counting for giving
>> CPU slices to each light-weight process a blocking call would
>> significantly inhibit concurrency.
>
> For that case, imaginary library 'cee' has function 'cast' that instructs
> driver to call function asynchronously, as you have mentioned below.

That does not work, because while an emulator scheduler's OS thread is
calling C code, it cannot execute Erlang code. Even if your process uses
cee or cast, the C code will still be executed using one of the
scheduler's OS threads that are also used to execute Erlang code. There
is no separate OS thread pool for Erlang and C code. Therefore, this
doesn't solve the problem.

There exists a separate OS thread pool for exclusive use by linked-in
drivers, but your drivers must be explicitly programmed to use it, and
its threads have severe drawbacks, e.g., you cannot communicate with Erlang
(i.e., send a message to an Erlang process) from within such an I/O thread.

That is what Serge Aleynikov mentioned just below:

>> Drivers solve this problem by allowing asynchronous invocation of
>> blocking functions in the context of threads different from the
>> emulator(s) (*). In case of a driver or a port, once you are on the
>> C-side of coding interface, you can make calls to any 3rd party library
>> directly without needing FFI (except for cases of functions with
>> variable arguments), which may bring you to the question of why bother
>> with such an Erlang library if you already have ports and drivers that
>> can be written in C or even ei interface for writing C-nodes that can
>> make any C-calls safely with respect to the emulator?
>
> Because, if I had such a library I would not have to bother
> with C, ei, driver interface.
> The point is, why manipulate Erlang data structures from C instead of
> manipulating C data structures from Erlang.
> Latter approach has to be more productive for (Erlang) programmer.

To understand why this cannot be trivial using the current linked-in
driver mechanism implementation, please read about EDTK and Dryverl:
http://www.erlang.se/workshop/2002/Fritchie.pdf
http://www.csg.is.titech.ac.jp/paper/lenglet2006dryverl.pdf

There has also been IG around for some time, but it does not support
linked-in drivers:
http://www.bluetail.com/tobbe/ig/doc.new/

If you use anything other than BIFs, then one important drawback is that
you *must* pass all data as serialized terms between Erlang and C code.
(Except that in some cases, you can send binaries to a linked-in driver
by reference, but that is quite tricky, cf. the paper on Dryverl.)
The linked-in driver mechanism was not designed to interface to
arbitrary C code. It was designed to implement I/O drivers. Period.
Using this mechanism for anything else is possible, but is painful,
which motivated tools like EDTK and Dryverl.
If you really want to "manipulate C data structures from Erlang", then
you must use BIFs, which are undocumented and unsupported, i.e., your
BIFs' code will be dependent on one specific version of the emulator,
and you will have to distribute your own modified emulator to users.
Impractical.

If it were that easy to do, please believe that such an automatic
integration of C and Erlang would have been done a long time ago. (^_^)

Therefore, I agree with Mats and Serge: if you need simple concurrency
and flexibility, use C ports. Linked-in drivers potentially give better
performance, but may not be worth the effort for your problems.

--
Romain Lenglet
Guest
Posted: Thu Apr 05, 2007 3:06 am
Romain Lenglet wrote:
> Denis Bilenko wrote:
>> Serge Aleynikov wrote:
>>
>>> Part of the issue is that this approach would make it possible for a
>>> function call inside a 3rd party library to block for a while. Since
>>> Erlang's concurrency is dependent on the fact that function calls are
>>> very short, and the emulator uses reduction-based counting for giving
>>> CPU slices to each light-weight process a blocking call would
>>> significantly inhibit concurrency.
>> For that case, imaginary library 'cee' has function 'cast' that instructs
>> driver to call function asynchronously, as you have mentioned below.
>
> That does not work, because while an emulator scheduler's OS thread is
> calling C code, it cannot execute Erlang code. Even if your process uses
> cee or cast, the C code will still be executed using one of the
> scheduler's OS threads that are also used to execute Erlang code. There
> is no separate OS thread pool for Erlang and C code. Therefore, this
> doesn't solve the problem.

OK... now that my cups of coffee start having effect, I can clarify my
paragraph above.

Your hypothetical 'cee' library cannot be implemented in Erlang, since
we have no control over the scheduler or the creation / allocation of
scheduling OS threads from Erlang code, and any time Erlang code calls a
BIF or interacts with a linked-in driver, this is done in a scheduler's
OS thread. If you have many simultaneous calls to C code from Erlang,
which becomes more likely when your calls to C code block or
take a long time, then you have fewer OS threads available from the pool
to execute Erlang code. This could even stall or deadlock your application.

If your library were in C, that would be OK, and the existing APIs are
probably sufficient, but you have to do a lot to circumvent the limits
of the asynchronous I/O threads, as I mentioned below:

> There exists a separate OS thread pool for exclusive use by linked-in
> drivers, but your drivers must explicitly programmed to use them, and
> they have severe drawbacks, e.g., you cannot communicate with Erlang
> (i.e., send a message to an Erlang process) from within such an I/O thread.

That is the kind of problem that EDTK and Dryverl try to solve. No need
for yet another 'cee' library, IMO. In particular, Chris Newcombe seems
to have done a great job dealing with heavy multithreading in linked-in
drivers using EDTK.

--
Romain Lenglet
denik
Posted: Thu Apr 12, 2007 9:20 am
Romain Lenglet wrote:
> There exists a separate OS thread pool for exclusive use by linked-in
> drivers, but your drivers must explicitly programmed to use them, and
> they have severe drawbacks, e.g., you cannot communicate with Erlang
> (i.e., send a message to an Erlang process) from within such an I/O thread.
Do you mean that it cannot be done in a thread-safe way?
From the doc on erl_driver:
  driver_output_term:
    Note that this function is not thread-safe,
    not even when the emulator with SMP support is used.
  driver_send_term:
    This function is only thread-safe when the emulator
    with SMP support is used.
(http://www.erlang.org/doc/doc-5.5.4/erts-5.5.4/doc/html/erl_driver.html)

I assume this means that these functions must be called only from
driver_entry callbacks, never from worker threads.
EDTK uses the ready_async callback to send the term (with driver_output_term()).

> To understand why this cannot be trivial using the current linked-in
> driver mechanism implementation, please read about EDTK and Dryverl:
> http://www.erlang.se/workshop/2002/Fritchie.pdf
> http://www.csg.is.titech.ac.jp/paper/lenglet2006dryverl.pdf

Thanks, these are useful.
Other docs in EDTK are also very readable.
(found in http://www.snookles.com/erlang/edtk/edtk-1.5-candidate-2.tar.gz)

> If you use anything else than BIFs, then one important drawback is that
> you *must* pass all data as serialized terms between Erlang and C code.
> (Except that in some cases, you can send binaries to a linked-in drivers
> by reference, but that is quite tricky, cf. the paper on Dryverl)
> The linked-in drivers mechanism was not designed to interface to
> arbitrary C code. It was designed to implement I/O drivers. Period.
> Using this mechanism for anything else is possible, but is painful,
> which motivated tools like EDTK and Dryverl.
> If you really want to "manipulate C data structures from Erlang", then
> you must use BIFs, which are undocumented and unsupported, i.e., your
> BIFs' code will be dependent on one specific version of the emulator,
> and you will have to distribute your own modified emulator to users.
> Impractical.
OK, then BIFs are out.
Serialization is a slowdown, but not a fatal problem and is
faced by every driver writer.

> Your hypothetical 'cee' library cannot be implemented in Erlang, since
> we have no control over the scheduler and the creation / allocation of
> scheduling OS threads from Erlang code, and any time Erlang code calls a
> BIF or interact with a linked-in driver, this is done in a scheduler's
> OS thread. If you have many simultaneous calls to C code from Erlang,
> which probability is increased when your calls to C code are blocking or
> take a long time, then you have less OS threads available from the pool
> to execute Erlang code. This could even stall or deadlock your application.

> If your library were in C, that would be OK, and the existing APIs are
> probably sufficient, but you have to do a lot to circumvent the limits
> of the asynchronous I/O threads, as I mentioned below:
Perhaps I wasn't clear -- I didn't expect the cee library itself to be implemented
in pure Erlang. Its implementation will definitely involve some C code, in the
form of a driver (or port, or c-node). Once that is done, the user gets a pure-Erlang
interface for calling functions in an arbitrary dll/so.

> That is the kind of problems that EDTK and Dryverl try to solve. No need
> for yet another 'cee' library, IMO.

Let's compare the driver generated by EDTK with the one I'd like to have.

An EDTK-generated driver, when called by port_command, receives
  * the function's id
  * packed arguments
It then performs a switch on the received function id, thus selecting
  1) a pointer to the (statically linked) function
  2) a procedure for decoding the arguments (and encoding the result)
  3) an async parameter -- whether to use driver_async or call the function directly
It then performs the call, either right away or through driver_async,
constructs a term and sends it to the port.
It also does the following:
  a) checks the result against an expected value, and obtains additional error info
     example: if (!res) error = errno; // error goes into output term
  b) maintains a list of resources to free in case of a process/port crash (valmaps)
  c) arbitrary hacks -- snippets of code inserted as is from the XML
     definition in predefined places.

The cee library driver would receive
  * the function's name and a handle to a dll,
  * arguments with the necessary type information
    example: [{double, 3.14}, {char_p, <<"Hello">>, inout}]
  * the type of the return value
  * an async parameter -- whether to use driver_async or call the function directly.
It would then pack the arguments into a libffi structure and call the function
in the dll.
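
To illustrate, the request described above could be built and serialized on the Erlang side roughly as follows. This is only a sketch under my own assumptions: the module name cee_request, the {call, ...} tuple layout and the in/out/inout direction atoms are made up for the example, and the libffi side is not shown.

```erlang
-module(cee_request).
-export([build/5, encode/1]).

%% Build a request term: library handle, function name, typed arguments,
%% return type, and whether the driver should use driver_async.
build(LibHandle, FunName, Args, RetType, Async)
  when is_atom(FunName), is_list(Args), is_atom(RetType), is_boolean(Async) ->
    ok = lists:foreach(fun check_arg/1, Args),
    {call, LibHandle, FunName, Args, RetType, Async}.

%% Every argument must carry its C type; an optional third element gives
%% the direction (hypothetical atoms: in | out | inout).
check_arg({Type, _Value}) when is_atom(Type) ->
    ok;
check_arg({Type, _Value, Dir})
  when is_atom(Type), (Dir =:= in orelse Dir =:= out orelse Dir =:= inout) ->
    ok;
check_arg(Other) ->
    erlang:error({argument_error, Other}).

%% Serialize with the external term format; the C side would decode it with ei.
encode(Request) ->
    term_to_binary(Request).
```

For example, cee_request:build(Libc, printf, [{char_p, <<"Hello">>, inout}, {double, 3.14}], int, false) yields a term that encode/1 turns into a binary suitable for port_command.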

So it is possible to do and not too far away from what EDTK already does.

What about a, b and c?
Well,
  a) would definitely be necessary, since errno won't hang around too long.
     This could be done with predefined options:
     read errno when the result is NULL / not NULL / negative / is not X ...
     Not as flexible as EDTK and Dryverl, but it would suffice for many cases.
  b) valmap is a tricky thing. Why implement it in C?
     In Erlang we can limit the scope of a resource using try/after,
     or by monitoring (or linking to) a process.
  c) cannot be done.
     If some additional work needs to be done at the driver level, it is either
     1. implemented for everybody and made available via an option to port_call
     2. not available
     Again, not as flexible, but simpler.
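
For point b), a minimal sketch of scoping a C-side resource with try/after might look like this. The module name cee_resource and the three-fun interface are assumptions for illustration; the funs would wrap whatever allocate/free calls the loaded library provides.

```erlang
-module(cee_resource).
-export([with_resource/3]).

%% Acquire a resource, run Use on it, and always run Release afterwards,
%% even if Use throws, exits or errors.
with_resource(Acquire, Release, Use)
  when is_function(Acquire, 0), is_function(Release, 1), is_function(Use, 1) ->
    Resource = Acquire(),
    try
        Use(Resource)
    after
        Release(Resource)
    end.
```

This only covers failures inside the calling process while it is still running; if the process is killed from outside, a monitoring (or linked) process would still be needed to free the resource.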

Loaded C libraries won't be a pleasure to use as is.
An additional layer of Erlang code will be required to handle
resources and do whatever else is needed to make the library behave Erlangish.
But that seems much more natural than code generation.

So it can hardly be called 'yet another'; it is different
from EDTK/Dryverl (no XML, no generated drivers, simple to
use for simple cases).

> Particularly, Chris Newcombe seems to have done a great job dealing
> with heavy multithreading in linked-in drivers using EDTK.

Right, it seems that driver_async is not good enough for all purposes.
Chris Newcombe in edtk-1.5/README-cnewcom wrote:
> Of course it is critical that the Erlang VM's scheduler threads are
> never blocked by Berkley DB. The standard, supported way to achieve this is
> using the Erlang 'async thread pool' ('erl +A' and the driver_async()
> API). Unfortunately that mechanism is not flexible enough for the
> Berkeley DB driver (and using it would risk interfering with other
> important drivers like efile_drv). So EDTK now implements private
> threadpools, and multiplexes commands across those pools. Each driver
> instance (port) has it's own set of threads, and the pools are
> resizeable at runtime.
This can be done for the cee library too.

Denis.

All times are GMT