Erlang Mailing Lists

Author Message

<  Erlang bugs mailing list  ~  [bug & patch] xmerl_scan doesn't decode &#x refs properly

Guest
Posted: Mon Jun 07, 2010 4:18 pm Reply with quote
Guest
Hello,

There is a bug in xmerl_scan. It doesn't decode &#x refs properly.

Considering the following code :

{UTF8Output, []} = xmerl_scan:string("<?xml version=\"1\" ?>\n<text>" ++ [229, 145, 156] ++ "</text>"),
#xmlElement{content = [#xmlText{value = UTF8Text}]} = UTF8Output,
{DecEntityOutput, []} = xmerl_scan:string("<?xml version=\"1\" ?>\n<text>呜</text>"),
#xmlElement{content = [#xmlText{value = DecEntityText}]} = DecEntityOutput,
{HexEntityOutput, []} = xmerl_scan:string("<?xml version=\"1\" ?>\n<text>&#x545C;</text>"),
#xmlElement{content = [#xmlText{value = HexEntityText}]} = HexEntityOutput,

UTF8Text and DecEntityText are equal and as expected ([16#545C]).
HexEntityText is (incorrectly) a list composed of the three UTF8 bytes [229, 145, 156] while it should be equal to [16#545C].

A patch with a test case can be found here:

git fetch git://github.com/pguyot/otp.git pg/xmerl_scan_hex_entities

Regards,

Paul
--
Semiocast http://semiocast.com/
+33.175000290 - 62 bis rue Gay-Lussac, 75005 Paris


________________________________________________________________
erlang-bugs (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-bugs-unsubscribe@erlang.org

Post received from mailinglist

Display posts from previous:  

All times are GMT
Page 1 of 1
This forum is locked: you cannot post, reply to, or edit topics.

Jump to:  

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You cannot download files in this forum