| re2c |
| ---- |
| |
| Version 0.9.1 |
| Originally written by Peter Bumbulis (peterr@csg.uwaterloo.ca) |
| Currently maintained by Brian Young (bayoung@acm.org) |
| |
| The re2c distribution can be found at: |
| |
| http://www.tildeslash.org/re2c/index.html |
| |
| The source distribution is available from: |
| |
| http://www.tildeslash.org/re2c/re2c-0.9.1.tar.gz |
| |
| This distribution is a cleaned-up version of the 0.5 release, |
| maintained by me (Brian Young). Several bugs were fixed, and the code |
| was cleaned up to compile without warnings. It has been developed |
| and tested with egcs 1.0.2 and gcc 2.7.2.3 on Linux x86. Peter |
| Bumbulis' original release can be found at: |
| |
| ftp://csg.uwaterloo.ca/pub/peterr/re2c.0.5.tar.gz |
| |
| re2c is a great tool for writing fast and flexible lexers. It has |
| served many people well for many years and it deserves to be |
| maintained more actively. A re2c-generated scanner is on the order of |
| 2-3 times faster than a flex-based scanner, and its input model is |
| much more flexible. |
| |
| Patches and requests for features will be entertained. Areas of |
| particular interest to me are porting (a Solaris and an NT |
| version will be forthcoming) and wide character support. Note |
| that the code is already quite portable and should be buildable |
| on any platform with minor makefile changes. |
| |
| Peter's original version 0.5 ANNOUNCE and README follow. |
| |
| Brian |
| |
| -- |
| |
| re2c is a tool for generating C-based recognizers from regular |
| expressions. re2c-based scanners are efficient: for programming |
| languages, given similar specifications, an re2c-based scanner is |
| typically almost twice as fast as a flex-based scanner with little or no |
| increase in size (possibly a decrease on CISC architectures). Indeed, |
| re2c-based scanners are quite competitive with hand-crafted ones. |
| |
| Unlike flex, re2c does not generate complete scanners: the user must |
| supply some interface code. While this code is not bulky (about 50-100 |
| lines for a flex-like scanner; see the man page and examples in the |
| distribution), careful coding is required for efficiency (and |
| correctness). One advantage of this arrangement is that the generated |
| code is not tied to any particular input model. For example, |
| re2c-generated code can be used to scan data from a null-byte-terminated |
| buffer, as illustrated below. |
| |
| Given the following source: |
| |
| #define NULL ((char*) 0) |
| char *scan(char *p){ |
| char *q; |
| #define YYCTYPE char |
| #define YYCURSOR p |
| #define YYLIMIT p |
| #define YYMARKER q |
| #define YYFILL(n) |
| /*!re2c |
| [0-9]+ {return YYCURSOR;} |
| [\000-\377] {return NULL;} |
| */ |
| } |
| |
| re2c will generate: |
| |
| /* Generated by re2c on Sat Apr 16 11:40:58 1994 */ |
| #line 1 "simple.re" |
| #define NULL ((char*) 0) |
| char *scan(char *p){ |
| char *q; |
| #define YYCTYPE char |
| #define YYCURSOR p |
| #define YYLIMIT p |
| #define YYMARKER q |
| #define YYFILL(n) |
| { |
| YYCTYPE yych; |
| unsigned int yyaccept; |
| goto yy0; |
| yy1: ++YYCURSOR; |
| yy0: |
| if((YYLIMIT - YYCURSOR) < 2) YYFILL(2); |
| yych = *YYCURSOR; |
| if(yych <= '/') goto yy4; |
| if(yych >= ':') goto yy4; |
| yy2: yych = *++YYCURSOR; |
| goto yy7; |
| yy3: |
| #line 10 |
| {return YYCURSOR;} |
| yy4: yych = *++YYCURSOR; |
| yy5: |
| #line 11 |
| {return NULL;} |
| yy6: ++YYCURSOR; |
| if(YYLIMIT == YYCURSOR) YYFILL(1); |
| yych = *YYCURSOR; |
| yy7: if(yych <= '/') goto yy3; |
| if(yych <= '9') goto yy6; |
| goto yy3; |
| } |
| #line 12 |
| |
| } |
| |
| Note that because YYCURSOR and YYLIMIT both expand to p and YYFILL(n) |
| is empty, most compilers will perform dead-code elimination to remove |
| all of the YYCURSOR/YYLIMIT comparisons. |
| |
| re2c was developed for a particular project (constructing a fast REXX |
| scanner of all things!) and so while it has some rough edges, it should |
| be quite usable. More information about re2c can be found in the |
| (admittedly skimpy) man page; the algorithms and heuristics used are |
| described in an upcoming LOPLAS article (included in the distribution). |
| Probably the best way to find out more about re2c is to try the supplied |
| examples. re2c is written in C++, and is currently being developed |
| under Linux using gcc 2.5.8. |
| |
| Peter |
| |
| -- |
| |
| re2c is distributed with no warranty whatever. The code is certain to |
| contain errors. Neither the author nor any contributor takes |
| responsibility for any consequences of its use. |
| |
| re2c is in the public domain. The data structures and algorithms used |
| in re2c are all either taken from documents available to the general |
| public or are inventions of the author. Programs generated by re2c may |
| be distributed freely. re2c itself may be distributed freely, in source |
| or binary, unchanged or modified. Distributors may charge whatever fees |
| they can obtain for re2c. |
| |
| If you do make use of re2c, or incorporate it into a larger project, an |
| acknowledgement somewhere (documentation, research report, etc.) would |
| be appreciated. |
| |
| Please send bug reports and feedback (including suggestions for |
| improving the distribution) to |
| |
| peterr@csg.uwaterloo.ca |
| |
| Include a small example and the banner from parser.y with bug reports. |
| |
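
As a rough sketch of how a caller might use the scan() routine from
Peter's example above: that routine returns a pointer just past a leading
run of digits, or NULL when the first byte is not a digit. The code below
is not re2c output. scan_digits() is a hand-written stand-in with the same
behaviour, so that the sketch compiles on its own, and
leading_number_length() is a hypothetical helper name introduced only for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written stand-in for the re2c-generated scan() (an assumption made
   for illustration, not generated code). Returns a pointer just past a
   leading run of decimal digits, or NULL if the first byte, including the
   terminating NUL, is not a digit. */
static const char *scan_digits(const char *p)
{
    if (*p < '0' || *p > '9')
        return NULL;                /* second rule: any other byte */
    do {
        ++p;                        /* consume one digit */
    } while (*p >= '0' && *p <= '9');
    return p;                       /* first rule: [0-9]+ */
}

/* Hypothetical helper: length of the number at the start of a
   null-terminated buffer, or 0 if the buffer does not start with a digit. */
static size_t leading_number_length(const char *buf)
{
    const char *end;

    assert(buf != NULL);
    end = scan_digits(buf);
    return end ? (size_t)(end - buf) : 0;
}
```

With these definitions, leading_number_length("123abc") yields 3 and
leading_number_length("abc") yields 0. The null byte that terminates the
buffer is never consumed, because it is not a digit.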