Erlang patches mailing list ~ [bug & patch] xmerl_scan doesn't decode refs properly
Guest
Posted: Mon Jun 07, 2010 4:18 pm
Hello,
There is a bug in xmerl_scan. It doesn't decode &#x refs properly.
Consider the following code:
{UTF8Output, []} = xmerl_scan:string("<?xml version=\"1\" ?>\n<text>" ++ [229, 145, 156] ++ "</text>"),
#xmlElement{content = [#xmlText{value = UTF8Text}]} = UTF8Output,
{DecEntityOutput, []} = xmerl_scan:string("<?xml version=\"1\" ?>\n<text>&#21596;</text>"),
#xmlElement{content = [#xmlText{value = DecEntityText}]} = DecEntityOutput,
{HexEntityOutput, []} = xmerl_scan:string("<?xml version=\"1\" ?>\n<text>&#x545C;</text>"),
#xmlElement{content = [#xmlText{value = HexEntityText}]} = HexEntityOutput,
UTF8Text and DecEntityText are equal and as expected ([16#545C]).
HexEntityText is (incorrectly) a list composed of the three UTF8 bytes [229, 145, 156] while it should be equal to [16#545C].
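For illustration only (this is not the patch, and not xmerl's code), the expected equivalence can be sketched in a few lines of Erlang. The module name charref_demo and both helper functions are hypothetical; the sketch relies only on unicode:characters_to_list/2 and list_to_integer/2 from the standard library. It shows that the three UTF-8 bytes and both forms of the character reference should all yield the same single code point.

```erlang
-module(charref_demo).
-export([utf8_to_codepoints/1, parse_charref/1]).

%% Decode a list of UTF-8 bytes into a list of Unicode code points.
utf8_to_codepoints(Bytes) ->
    unicode:characters_to_list(list_to_binary(Bytes), utf8).

%% Hypothetical helper, not taken from xmerl: convert the body of a
%% character reference (the text between "&#" and ";") into a
%% one-element list holding the code point.
%% A leading "x" marks a hexadecimal reference; otherwise it is decimal.
parse_charref([$x | Hex]) -> [list_to_integer(Hex, 16)];
parse_charref(Dec) -> [list_to_integer(Dec, 10)].
```

With this sketch, charref_demo:utf8_to_codepoints([229, 145, 156]), charref_demo:parse_charref("21596") and charref_demo:parse_charref("x545C") all evaluate to [16#545C], which is the value HexEntityText should have.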
A patch with a test case can be found here:
git fetch git://github.com/pguyot/otp.git pg/xmerl_scan_hex_entities
Regards,
Paul
--
Semiocast http://semiocast.com/
+33.175000290 - 62 bis rue Gay-Lussac, 75005 Paris
________________________________________________________________
erlang-patches (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-patches-unsubscribe@erlang.org
Guest
Posted: Tue Jun 08, 2010 8:02 am
On Mon, Jun 07, 2010 at 06:17:47PM +0200, Paul Guyot wrote:
> Hello,
>
> There is a bug in xmerl_scan. It doesn't decode &#x refs properly.
>
> Consider the following code:
>
> {UTF8Output, []} = xmerl_scan:string("<?xml version=\"1\" ?>\n<text>" ++ [229, 145, 156] ++ "</text>"),
> #xmlElement{content = [#xmlText{value = UTF8Text}]} = UTF8Output,
> {DecEntityOutput, []} = xmerl_scan:string("<?xml version=\"1\" ?>\n<text>&#21596;</text>"),
> #xmlElement{content = [#xmlText{value = DecEntityText}]} = DecEntityOutput,
> {HexEntityOutput, []} = xmerl_scan:string("<?xml version=\"1\" ?>\n<text>&#x545C;</text>"),
> #xmlElement{content = [#xmlText{value = HexEntityText}]} = HexEntityOutput,
>
> UTF8Text and DecEntityText are equal and as expected ([16#545C]).
> HexEntityText is (incorrectly) a list composed of the three UTF8 bytes [229, 145, 156] while it should be equal to [16#545C].
>
> A patch with a test case can be found here:
>
> git fetch git://github.com/pguyot/otp.git pg/xmerl_scan_hex_entities
Thank you! It will be included in 'pu' after I reformat the commit
message and cherry-pick it onto 'dev', since it was based not on 'dev'
but on a merge result containing 'dev'.
>
> Regards,
>
> Paul
> --
> Semiocast http://semiocast.com/
> +33.175000290 - 62 bis rue Gay-Lussac, 75005 Paris
--
/ Raimo Niskanen, Erlang/OTP, Ericsson AB
All times are GMT