Erlang/OTP Forums

User Contributions ~ re.erl - a new regexp package

rvirding
Posted: Sun Jun 03, 2007 10:03 pm
User Joined: 30 Aug 2006 Posts: 452 Location: Stockholm, Sweden
This is a new implementation of regular expressions which is largely compatible with regexp.erl, with two major improvements:

1. It now works directly on binaries: all the functions accept a binary as the input to search, although the regexp itself cannot be given as a binary.

2. There are two new functions which extract and return sub-expressions: smatch/2 and first_smatch/2. These are similar to match/2 and first_match/2, but they also return the sub-expressions. For example:

2> re:smatch("-axxxb--", "a((x+)|(y+))b").
{match,2,5,"axxxb",{{3,3,"xxx"},{3,3,"xxx"},undefined}}

A sub-expression is 'undefined' if it did not take part in the match.

It supports POSIX regexps, as the old one did, and now adds POSIX character classes, though only for Latin-1. So we can write "[[:digit:]]" or "[[:alnum:]]". The functions are the same as before.

The regexp engine should never explode, irrespective of the regexp, which many engines do. It is about as fast as the old one, though that depends on the regexp.

I would like some feedback on the speed and the interface.

N.B. It is not really possible to have both POSIX and Perl regexps in the same module because, apart from the difference in features, they have different semantics. If all goes well, a Perl module might follow.



Attachment: re.erl (44.19 KB)
Description: A new regular expression module. (3)

Attachment: re.erl (43.97 KB)
Description: A new regular expression module. (2)

Mazen
Posted: Wed Jun 06, 2007 8:01 am
User Joined: 20 Jul 2006 Posts: 164 Location: London
Seems to work ;)

Tested it with a few expressions, nothing too fancy. Maybe someday I will have time to write a good test, but that's when I have time.

Good job! :D
nem
Posted: Mon Jan 14, 2008 5:00 am
User Joined: 29 Nov 2007 Posts: 25
I've found a small bug where the start position of a match group returned by first_smatch is negative.

This patch adds the fix_subs call that is present in smatch/2 but missing from first_smatch/2.

Code:

--- ../racer/src/re.erl 2007-09-25 10:53:53.000000000 +1200
+++ src/re.erl  2008-01-14 17:49:31.000000000 +1300
@@ -658,13 +658,13 @@
 
 first_smatch_str(Cs, P, Nfa) ->
     case next_smatch_str(Cs, P, Nfa) of
-       {match,St,Len,_,Subs,_} -> {match,St,Len,Subs};
+       {match,St,Len,_,Subs,_} -> {match,St,Len,fix_subs_str(Subs, St, Cs)};
        nomatch -> nomatch
     end.
 
 first_smatch_bin(Bin, P, Nfa) ->
     case next_smatch_bin(Bin, P, Nfa) of
-       {match,St,Len,Subs} -> {match,St,Len,Subs};
+       {match,St,Len,Subs} -> {match,St,Len,fix_subs_bin(Subs, St)};
        nomatch -> nomatch
     end.




Attachment: re.erl.diff.txt (598 Bytes)

daniello
Posted: Wed Feb 13, 2008 11:59 am
User Joined: 03 Apr 2007 Posts: 15
Eshell V5.6 (abort with ^G)
1> re:match("user@host.com","[a-zA-Z_0-9]{1,}[@][a-zA-Z_0-9-]{1,}([.]([a-zA-Z_0-9-]{1,}))$").
{match,1,13}
2> re:match("user@host.com.pl","[a-zA-Z_0-9]{1,}[@][a-zA-Z_0-9-]{1,}([.]([a-zA-Z_0-9-]{1,}))$").
nomatch
3> re:match("user@host.com.pl","[a-zA-Z_0-9]{1,}[@][a-zA-Z_0-9-]{1,}([.]([a-zA-Z_0-9-]{1,})){1,3}$").

BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
(v)ersion (k)ill (D)b-tables (d)istribution

The third call did not return, and the CPU was at 100% until I interrupted it.

--
Regards,
Daniel
nem
Posted: Mon May 19, 2008 3:40 am
User Joined: 29 Nov 2007 Posts: 25
Hi all, just found a little bug in first_smatch for binaries. New version of re.erl attached.

I should really put this up on GitHub and add EUnit tests (and bug the OTP people about accepting it into the standard distribution).



Attachment: re.erl (44.23 KB)
Description: first_smatch patch for binaries too

Mazen
Posted: Thu Jun 12, 2008 9:14 am
User Joined: 20 Jul 2006 Posts: 164 Location: London
Please note that as of R12B-3 there is now a module named "re" that ships with the distribution, i.e. beware of name clashes.

http://www.erlang.org/doc/man/re.html
http://www.erlang.org/download/otp_src_R12B-3.readme


Quote:

OTP-7181 An experimental module "re" is added to the emulator which
interfaces a publicly available regular expression library
for Perl-like regular expressions (PCRE). The interface is
purely experimental and *will* be subject to change.

The implementation is for reference and testing in connection
to the relevant EEP.
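
For anyone comparing the two modules, here is a minimal sketch of how the sub-expression example from the first post might look with the OTP module. It assumes the re:run/3 interface and its capture option as documented in later OTP releases (the R12B-3 interface was experimental and may differ). The module name re_capture_demo and the helper submatches/3 are made up for illustration and are not part of either "re" module.

```erlang
-module(re_capture_demo).
-export([submatches/3]).

%% Hypothetical helper, not part of OTP or of the re.erl in this thread.
%% Runs OTP's PCRE-based re:run/3 and reports the whole match plus
%% NGroups sub-expressions as 1-based {Start, Length} pairs, using
%% 'undefined' for a group that did not take part in the match.
submatches(Subject, Pattern, NGroups) ->
    %% Requesting groups 0..NGroups by number yields one entry per group.
    Wanted = lists:seq(0, NGroups),
    case re:run(Subject, Pattern, [{capture, Wanted, index}]) of
        {match, [Whole | Groups]} ->
            {match, to_one_based(Whole), [to_one_based(G) || G <- Groups]};
        nomatch ->
            nomatch
    end.

%% re:run/3 reports zero-based offsets; {-1,0} marks an unset group.
to_one_based({-1, 0}) -> undefined;
to_one_based({Offset, Len}) -> {Offset + 1, Len}.
```

Under those assumptions, re_capture_demo:submatches("-axxxb--", "a((x+)|(y+))b", 3) should give {match,{2,5},[{3,3},{3,3},undefined]}, which lines up with the positions in the smatch/2 output from the first post. Keep in mind the OTP module uses Perl semantics, so results can differ from the POSIX ones on other patterns.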
rvirding
Posted: Thu Jun 12, 2008 7:55 pm
User Joined: 30 Aug 2006 Posts: 452 Location: Stockholm, Sweden
Yes, I know! I will have to start a name war, or change the name of my package. :)

All times are GMT