Erlang/OTP Forums

Erlang questions mailing list: UTF8 and EDoc

ngocdaothanh
Posted: Wed Sep 30, 2009 8:03 am
User Joined: 06 Dec 2008 Posts: 43
Greetings,

1.

When I use the EDoc library in Erlang R13B02-1 to generate documentation with
Japanese characters in the doc comments, I get this error:

edoc: error in doclet 'edoc_doclet':
{'EXIT',{no_translation,[{io,put_chars,[<0.54.0>,unicode,<<60,33,68,79,67,84,89,80,...>>]},{edoc_lib,write_file,4},{edoc_doclet,source,9},{lists,foldl,3},{edoc_doclet,sources,5},{edoc_doclet,gen,6},{edoc_lib,run_plugin,5},{lists,foreach,2}]}}.
** exception exit: error
in function edoc_lib:run_plugin/5
in call from lists:foreach/2

And no documentation is generated.

2.

The cause of the problem is the io:put_chars call in write_file in
edoc_lib.erl. My dirty hack:

write_file(Text, Dir, Name, Package) ->
    Dir1 = filename:join([Dir | packages:split(Package)]),
    File = filename:join(Dir1, Name),
    ok = filelib:ensure_dir(File),
    case file:open(File, [write]) of
        {ok, FD} ->
            %% io:put_chars(FD, Text),   <-- ERROR (no_translation)
            ok = file:close(FD),
            %% HACK: encode Text to UTF-8 ourselves and write the raw bytes
            file:write_file(File, unicode:characters_to_binary(Text));
        {error, R} ->
            R1 = file:format_error(R),
            report("could not write file '~s': ~s.", [File, R1]),
            exit(error)
    end.

Could someone who takes care of EDoc look into the problem?

Thank you,
Ngoc

________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org

Post received from mailinglist
ngocdaothanh
Posted: Mon Oct 05, 2009 9:33 am
User Joined: 06 Dec 2008 Posts: 43
This is my fix to make EDoc work with Japanese (R13B02-1; the .erl and
overview.edoc files are saved in UTF-8). I think it will work for
other languages too:

1. At edoc_lib:write_file/4

Change

file:open(File, [write])

to

file:open(File, [write, {encoding, utf8}])

This is better than my previous dirty hack.
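A minimal, self-contained sketch of the same idea outside EDoc. The module utf8_write_sketch and its function write_utf8/2 are hypothetical names, not part of EDoc or OTP. The sketch opens the file with the {encoding, utf8} option from step 1 and leaves the encoding to io:put_chars/2:

```erlang
-module(utf8_write_sketch).
-export([write_utf8/2]).

%% Hypothetical helper (not part of EDoc): write a Unicode string
%% (a list of code points, possibly above 255) to File as UTF-8.
write_utf8(File, Text) ->
    ok = filelib:ensure_dir(File),
    %% {encoding, utf8} makes the io server encode code points as UTF-8.
    {ok, FD} = file:open(File, [write, {encoding, utf8}]),
    try
        io:put_chars(FD, Text)
    after
        ok = file:close(FD)
    end.
```

If {encoding, utf8} is left out, the device stays in Latin-1 mode. Writing code points above 255 to such a device appears to be what causes the no_translation exit in the first post.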

2. At edoc_tags:parse_tags/5

Change

case dict:fetch(Name, How) of
    text ->
        parse_tags(Ts, How, Env, Where, [T | Ts1]);

to

case dict:fetch(Name, How) of
    text ->
        %% The text was read as Latin-1 bytes; decode those bytes as UTF-8
        Data = unicode:characters_to_list(list_to_binary(T#tag.data)),
        T2 = T#tag{data = Data},
        parse_tags(Ts, How, Env, Where, [T2 | Ts1]);
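The change above depends on the scanner reading the UTF-8 file as Latin-1. As a result, each byte of a multi-byte character arrives as a separate integer in the range 0..255. Below is a sketch of that reinterpretation on its own. The names utf8_reinterpret_sketch and latin1_bytes_to_chars/1 are hypothetical. The fallback for invalid UTF-8 is an added assumption and is not part of the fix above:

```erlang
-module(utf8_reinterpret_sketch).
-export([latin1_bytes_to_chars/1]).

%% Bytes: a list of integers 0..255, i.e. UTF-8 text that was read as Latin-1.
%% Returns the decoded code points, or Bytes unchanged if they are not
%% valid UTF-8 (this fallback is an assumption of the sketch).
latin1_bytes_to_chars(Bytes) when is_list(Bytes) ->
    case unicode:characters_to_list(list_to_binary(Bytes), utf8) of
        Chars when is_list(Chars) ->
            Chars;
        {error, _Decoded, _Rest} ->
            Bytes;
        {incomplete, _Decoded, _Rest} ->
            Bytes
    end.
```

list_to_binary/1 fails with badarg if the data already contains code points above 255. This trick therefore only makes sense while the input is still being read as Latin-1.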

Regards,
Ngoc


On Wed, Sep 30, 2009 at 6:37 PM, Richard Carlsson
<carlsson.richard@gmail.com> wrote:
> Ngoc Dao wrote:
>> When I use EDoc library in Erlang R13B02-1 to create document with
>> Japanese characters in the doc comments, there is error:
>
> Yes, this is a known problem. The short answer is that the input
> encoding for Erlang source code is defined to be Latin-1. That is,
> if you put things like Japanese or Russian characters in the
> source files, you are breaking the rules to begin with. (If it's only
> in comments, and using UTF-8, it will not prevent the compiler from
> skipping the comments and compiling the program, but you can't
> expect anything else to work.)
>
> What would be needed is something like a \u-escaping preprocessing
> stage, as specified for Java. But then, the tools must also know
> about \u escape sequences and turn them back into the proper code
> point in UTF-8 or whatever.
>
>    /Richard
>

Guest
Posted: Mon Oct 05, 2009 7:40 pm
Guest
> Ngoc Dao wrote:
>> When I use EDoc library in Erlang R13B02-1 to create document with
>> Japanese characters in the doc comments, there is error:

Richard Carlsson wrote:
> Yes, this is a known problem. The short answer is that the input
> encoding for Erlang source code is defined to be Latin-1. [...]
> What would be needed is something like a \u-escaping preprocessing
> stage, as specified for Java. But then, the tools must also know
> about \u escape sequences and turn them back into the proper code
> point in UTF-8 or whatever.

An option could be to adopt the way it is done in Python:
it (re)uses the editor's encoding declaration. If it finds the text
-*- coding: utf-8 -*- or vim: set fileencoding=utf-8 :
on the first or second line of the source file, then it sets
the encoding for the entire source file accordingly. (It also
understands unicode byte-order marks at the beginning
of the file, which apparently makes life easier in editors
on Windows.)

See http://www.python.org/peps/pep-0263.html for details.

An advantage with this scheme seems to be that it fits nicely
with editors. They already know how to handle this.

It would probably require the Erlang compiler, edoc, and other tools
to be modified to know about source file encodings, though.
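Here is a rough sketch of what that tool-side check could look like in Erlang. It assumes an Erlang comment (%) takes the place of Python's #, and the names coding_decl_sketch and detect/1 are hypothetical. As PEP 263 does, it looks only at the first two lines and returns either the declared encoding name or none:

```erlang
-module(coding_decl_sketch).
-export([detect/1]).

%% Look for an editor-style declaration such as
%%   %% -*- coding: utf-8 -*-      or      %% vim: set fileencoding=utf-8 :
%% on the first or second line of the source text (a binary).
detect(Bin) when is_binary(Bin) ->
    Lines = lists:sublist(binary:split(Bin, <<"\n">>, [global]), 2),
    find(Lines).

find([]) ->
    none;
find([Line | Rest]) ->
    %% The line must be a comment; capture the name after "coding:" or "coding=".
    case re:run(Line, "^\\s*%.*coding[:=]\\s*([-\\w.]+)",
                [{capture, all_but_first, list}]) of
        {match, [Name]} -> Name;
        nomatch -> find(Rest)
    end.
```

For comparison, later Erlang/OTP releases (R16B onward) recognise a similar %% coding: utf-8 comment on the first or second line, and epp:read_encoding/1 exposes that check.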

I suppose that with the \u-escaping, existing tools would continue
to work without modification, but it would be more work for the
programmer to type the text in as \u-sequences, unless editors
already know how to do such a transformation on the fly?
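The \u-escaping alternative can be sketched in the same style. This is a hypothetical preprocessing helper; u_escape_sketch:unescape/1 is not an existing OTP function. It converts Java-style \uXXXX escapes of exactly four hex digits into code points. It does not implement Java's finer rules, such as repeated u characters or escaped backslashes:

```erlang
-module(u_escape_sketch).
-export([unescape/1]).

%% Replace each \uXXXX (four hex digits) in an ASCII/Latin-1 string
%% with the code point it denotes; everything else passes through.
unescape([$\\, $u, A, B, C, D | Rest]) ->
    case lists:all(fun is_hex/1, [A, B, C, D]) of
        true ->
            [list_to_integer([A, B, C, D], 16) | unescape(Rest)];
        false ->
            [$\\ | unescape([$u, A, B, C, D | Rest])]
    end;
unescape([Ch | Rest]) ->
    [Ch | unescape(Rest)];
unescape([]) ->
    [].

is_hex(C) ->
    (C >= $0 andalso C =< $9) orelse
    (C >= $a andalso C =< $f) orelse
    (C >= $A andalso C =< $F).
```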

If no such encoding declaration is found, Python assumes ASCII,
but Erlang could maybe assume Latin-1. If Python finds non-ASCII
characters in a file with no encoding declaration, then it spits
out an error like this (wrapped for readability):

prompt# python /tmp/x.py
File "/tmp/x.py", line 3
SyntaxError: Non-ASCII character '\xe5' in file /tmp/x.py on line 3,
but no encoding declared; see http://www.python.org/peps/pep-0263.html
for details
prompt# cat /tmp/
#! /usr/bin/env python

print '
Guest
Posted: Mon Oct 05, 2009 7:43 pm
Guest
How can I convert Erlang files to C source code?
Thanks
Sumit

Guest
Posted: Mon Oct 05, 2009 10:32 pm
Guest
Tomas Abrahamsson wrote:
> Richard Carlsson wrote:
>> Yes, this is a known problem. The short answer is that the input
>> encoding for Erlang source code is defined to be Latin-1. [...]
>> What would be needed is something like a \u-escaping preprocessing
>> stage, as specified for Java. But then, the tools must also know
>> about \u escape sequences and turn them back into the proper code
>> point in UTF-8 or whatever.
>>
>
> An option could be to adopt the way it is done in Python:
> it (re)uses the editor's encoding declaration. If it finds the text
> -*- coding: utf-8 -*- or vim: set fileencoding=utf-8 :
>
There is already a way to indicate whether something is UTF-8 (or
UTF-16BE or UTF-16LE for that matter), and that is a byte-order mark;
although the BOM serves no useful byte-ordering semantic for UTF-8, it
does also have the function of saying "hey, I'm UTF-8!", a message which
numerous programs understand.
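Here is a sketch of the BOM check described above. The names bom_sketch, detect/1 and decode/1 are hypothetical. It recognises only the UTF-8 and UTF-16 marks mentioned in this message, and the encodings it returns are the same terms the unicode module accepts:

```erlang
-module(bom_sketch).
-export([detect/1, decode/1]).

%% Classify the start of a file by its byte-order mark.
%% Returns {Encoding, BomLength}, or {none, 0} when no BOM is present.
detect(<<16#EF, 16#BB, 16#BF, _/binary>>) -> {utf8, 3};
detect(<<16#FE, 16#FF, _/binary>>)        -> {{utf16, big}, 2};
detect(<<16#FF, 16#FE, _/binary>>)        -> {{utf16, little}, 2};
detect(_)                                 -> {none, 0}.

%% Strip the BOM and decode the rest; without a BOM, return the bytes as-is.
decode(Bin) ->
    case detect(Bin) of
        {none, 0} ->
            {none, Bin};
        {Enc, Len} ->
            <<_:Len/binary, Rest/binary>> = Bin,
            {Enc, unicode:characters_to_list(Rest, Enc)}
    end.
```

This sketch would misread a UTF-32LE BOM (FF FE 00 00) as UTF-16LE. Newer OTP releases also provide unicode:bom_to_encoding/1, which covers UTF-32 as well.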


Guest
Posted: Tue Oct 06, 2009 4:31 am
Guest
2009/10/5 Tomas Abrahamsson <tomas.abrahamsson@gmail.com>:

> An option could be to adopt the way it is done in Python:
> it (re)uses the editor's encoding declaration. If it finds the text
>   -*- coding: utf-8 -*-  or  vim: set fileencoding=utf-8 :
> on the first or second line of the source file, then it sets
> the encoding for the entire source file accordingly. (It also
> understands unicode byte-order marks at the beginning
> of the file, which apparently makes life easier in editors
> on Windows.)


Yuck! Surely not every editor has this information?
If a text file needs to inform an app of its encoding, then
either
a) Enclose the encoding in the file
(xml example encoding='utf-8')
b) Be explicit when calling up the application.
I also think a default encoding as a fallback is essential,
utf-8 being the obvious one.
The BOM (byte order mark) as the first character of a file
has not been successful.
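Below is one reading of the precedence proposed above, written as a hypothetical helper. encoding_choice_sketch:choose/2 is illustrative only. An explicit option from the caller wins, a declaration found inside the file comes next, and UTF-8 is the fallback:

```erlang
-module(encoding_choice_sketch).
-export([choose/2]).

%% Opts: a proplist passed by the caller, e.g. [{encoding, latin1}].
%% Declared: the encoding found by scanning the file (an atom), or none.
choose(Opts, Declared) ->
    case proplists:get_value(encoding, Opts) of
        undefined when Declared =:= none -> utf8;     % fallback
        undefined -> Declared;                        % declared in the file
        Explicit -> Explicit                          % explicit option wins
    end.
```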

>
> See http://www.python.org/peps/pep-0263.html for details.
>
> An advantage with this scheme seems to be that it fits nicely
> with editors. They already know how to handle this.

Only if you use the 'right' editor, surely?


>
> It would probably require the Erlang compiler, edoc, and other tools
> to be modified to know about source file encodings, though.

What of programmatically generated files?


>
> I suppose that with the \u-escaping, existing tools would continue
> to work without modification, but it would be more work for the
> programmer to type the text in as \u-sequences, unless editors
> already know how to do such a transformation on the fly?

Or mimic Python even more, with
u"A utf-8 encoded string"
and unicode('another unicode string'),
i.e. a string prefix operator and an encoding function.



>
> If no such encoding declaration is found, Python assumes ASCII,
> but Erlang could maybe assume Latin-1.

Please move on to UTF-8. Latin-1 is so restrictive.



regards



--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk


All times are GMT