| The following changes (change numbers refer to perforce) were |
| made from version 3.1.1 to 3.1.2 |
| |
| Runtime |
| ------- |
| |
| Change 5641 on 2009/02/20 by [email protected] |
| |
| Release version 3.1.2 of the ANTLR C runtime. |
| |
| Updated documents and release notes will have to follow later. |
| |
| Change 5639 on 2009/02/20 by [email protected] |
| |
| Fixed: ANTLR-356 |
| |
| Ensure that code generation for C++ does not require casts |
| |
| Change 5577 on 2009/02/12 by [email protected] |
| |
| C Runtime - Bug fixes. |
| |
| o Moving to extract tokens directly from a vector when returning |
| them exposed a bug whereby the EOF boundary calculation in tokLT |
| was incorrectly checking > rather than >=. |
| o The change to API initialization of tokens rather than memcmp() |
| forgot to set the input stream pointer for the manufactured |
| tokens in the token factory. |
| o Rewrite streams for rewriting tree parsers did not check whether the |
| rewrite stream was ever assigned before trying to free it; this is |
| now in line with the ordinary parser code. |
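
The first fix above is an off-by-one at the end of the token buffer. The sketch below is an illustration only, not the runtime's actual tokLT: `TokenBuffer`, `tok_lt` and `TOKEN_EOF` are hypothetical names, and the buffer is assumed to be a plain array of token types.

```c
#include <assert.h>
#include <stddef.h>

#define TOKEN_EOF (-1)   /* hypothetical EOF token type for this sketch */

/* Hypothetical stand-in for a token stream backed by a vector. */
typedef struct {
    const int *types;   /* token types held in the buffer */
    size_t     count;   /* number of tokens in the buffer  */
    size_t     p;       /* index of the current token      */
} TokenBuffer;

/* Return the type of the k-th lookahead token (k >= 1), or TOKEN_EOF. */
static int tok_lt(const TokenBuffer *tb, size_t k)
{
    size_t i;

    assert(k >= 1);
    i = tb->p + k - 1;

    /* Valid indexes are 0..count-1, so i == count is already past the end.
     * Testing with > instead of >= would read one element beyond the buffer. */
    if (i >= tb->count) {
        return TOKEN_EOF;
    }
    return tb->types[i];
}
```

With three tokens buffered, asking for the fourth lookahead token returns the EOF type rather than reading past the array.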
| |
| Change 5576 on 2009/02/11 by [email protected] |
| |
| C Runtime: Ensure that when we manufacture a new token for a missing |
| token, the user supplied custom information (if any) is copied |
| from the current token. |
| |
| Change 5575 on 2009/02/08 by [email protected] |
| |
| C Runtime - Vastly improve the reuse of allocated memory for nodes in |
| tree rewriting. |
| |
| A problem for all targets at the moment is that the rewrite logic |
| generated by ANTLR makes no attempt to reuse any resources; it merely |
| guarantees that the tree shape at the end is correct. To some extent |
| this is mitigated by the garbage collection systems of Java and .Net, |
| even though it is still an overhead to keep creating so many nodes. |
| |
| This change implements the first of two C runtime changes that make |
| best efforts to track when a node has become orphaned and will never |
| be reused, based on inherent knowledge of the rewrite logic (which in |
| the long term is not a great solution). |
| |
| Much of the rewrite logic consists of creating a nilNode into which |
| child nodes are appended. At rulePost processing time, when a rewrite |
| stream is closed, and when becomeRoot is called, there are many |
| situations where the root of the tree that will be manipulated, or is |
| finished with (in the case of rewrite streams), is a nilNode that was |
| just a temporary creation for the sake of the rewrite itself. |
| |
| In these cases we can see that the nilNode would just be left to rot in |
| the node factory that tracks all the tree nodes. |
| Rather than leave these in the factory to rot, we now keep a reuse |
| stack and always reuse any node on this stack before claiming a new |
| node from the factory pool. |
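
The reuse stack idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the runtime's tree node factory: `Node`, `NodeFactory` and the three functions are hypothetical, and plain malloc stands in for the factory pool.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tree node; next_free is only meaningful on the reuse stack. */
typedef struct Node {
    int          type;
    struct Node *next_free;
} Node;

typedef struct {
    Node  *reuse;       /* stack of orphaned nodes ready to be reused   */
    size_t allocated;   /* nodes claimed from the underlying allocator  */
} NodeFactory;

static void factory_init(NodeFactory *f)
{
    f->reuse = NULL;
    f->allocated = 0;
}

/* Hand out a node, preferring the reuse stack over a fresh allocation. */
static Node *factory_new_node(NodeFactory *f, int type)
{
    Node *n = f->reuse;

    if (n != NULL) {
        f->reuse = n->next_free;        /* pop an orphaned node */
    } else {
        n = malloc(sizeof *n);          /* stand-in for the factory pool */
        if (n == NULL) {
            return NULL;
        }
        f->allocated++;
    }
    n->type = type;
    n->next_free = NULL;
    return n;
}

/* Called when a temporary node (e.g. a nil root) is known to be orphaned. */
static void factory_release(NodeFactory *f, Node *n)
{
    assert(n != NULL);
    n->next_free = f->reuse;            /* push onto the reuse stack */
    f->reuse = n;
}
```

Releasing a temporary node and then asking for a new one returns the same memory, so the allocation count does not grow. In this sketch the caller still owns and frees the nodes; the real factory tracks them itself.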
| |
| This single change alone reduces memory usage in the test case (20,604 |
| line C program and a GNU C parser) from nearly a GB to 276MB. This is |
| still way more memory than we should need to do this operation, even |
| on such a large input file, but the reduction results in a huge |
| performance increase and greatly reduced system time spent on |
| allocations. |
| |
| After this optimization, comparison with gcc yields: |
| |
| time gcc -S a.c |
| a.c:1026: warning: conflicting types for built-in function ‘vsprintf’ |
| a.c:1030: warning: conflicting types for built-in function ‘vsnprintf’ |
| a.c:1041: warning: conflicting types for built-in function ‘vsscanf’ |
| 0.21user 0.01system 0:00.22elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k |
| 0inputs+240outputs (0major+8345minor)pagefaults 0swaps |
| |
| and |
| |
| time ./jimi |
| Reading a.c |
| 0.28user 0.11system 0:00.39elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k |
| 0inputs+0outputs (0major+66609minor)pagefaults 0swaps |
| |
| And we can now infer that the only major difference is now the huge |
| disparity in memory allocations. A future optimization of vector |
| pooling, to separate node reuse from vector reuse, currently looks |
| promising for further reuse of memory. |
| |
| Finally, a static analysis of the rewrite code, plus a realtime analysis |
| of the heap at runtime, may well give us a reasonable memory usage |
| pattern. In reality though, it is the generated rewrite logic |
| that must become optimal at not continuously rewriting things that it |
| need not, as it ascends the rule chain. |
| |
| Change 5563 on 2009/01/28 by [email protected] |
| |
| Allow rewrite streams to use the base adaptor's vector factory and not |
| try to malloc new vectors themselves. |
| |
| Change 5562 on 2009/01/28 by [email protected] |
| |
| Don't use CALLOC to allocate tree pools, use malloc as there is no need |
| for calloc. |
| |
| Change 5561 on 2009/01/28 by [email protected] |
| |
| Prevent warnings about retval.stop not being initialized when a rule |
| returns early because it is in backtracking mode. |
| |
| Change 5558 on 2009/01/28 by [email protected] |
| |
| Lots of optimizations (though the next one to be checked in is the huge |
| win) for AST building and vector factories. |
| |
| A large part of tree rewriting was the creation of vectors to hold AST |
| nodes. Although I had created a vector factory, for some reason I never got |
| around to creating a proper one, that pre-allocated the vectors in chunks and |
| so on. I guess I just forgot to. Hence a big win here is prevention of calling |
| malloc lots and lots of times to create vectors. |
| |
| A second improvement was to change the vector definition such that it |
| holds a certain number of elements within the vector structure itself, rather |
| than mallocing and freeing these. Currently this is set to 8, but may increase. |
| For AST construction, this is generally a big win because AST nodes don't often |
| have many individual children unless there has not been any shaping going on in |
| the parser. But if you are not shaping, then you don't really need a tree. |
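
A vector that keeps its first few elements inside the structure can be sketched like this. The names (`SmallVector`, `vec_add` and so on) are hypothetical and the growth policy (doubling) is an assumption; only the internal size of 8 comes from the note above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define VEC_INTERNAL_SIZE 8   /* elements stored inside the structure itself */

typedef struct {
    void  **elements;                      /* points at internal[] or heap */
    size_t  count;
    size_t  capacity;
    void   *internal[VEC_INTERNAL_SIZE];   /* no malloc needed up to here  */
} SmallVector;

static void vec_init(SmallVector *v)
{
    v->elements = v->internal;
    v->count = 0;
    v->capacity = VEC_INTERNAL_SIZE;
}

/* Append an item; returns 0 on success, -1 if memory could not be obtained. */
static int vec_add(SmallVector *v, void *item)
{
    if (v->count == v->capacity) {
        size_t new_cap = v->capacity * 2;
        void **grown;

        if (v->elements == v->internal) {
            /* First spill: move the internal elements out to the heap. */
            grown = malloc(new_cap * sizeof *grown);
            if (grown == NULL) {
                return -1;
            }
            memcpy(grown, v->internal, v->count * sizeof *grown);
        } else {
            grown = realloc(v->elements, new_cap * sizeof *grown);
            if (grown == NULL) {
                return -1;
            }
        }
        v->elements = grown;
        v->capacity = new_cap;
    }
    v->elements[v->count++] = item;
    return 0;
}

static void vec_free(SmallVector *v)
{
    if (v->elements != v->internal) {
        free(v->elements);
    }
    vec_init(v);
}
```

An AST node with eight or fewer children never touches the allocator in this scheme. Note that a structure like this must not be copied by value, because `elements` may point into the structure itself.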
| |
| Other performance improvements here include not calling functions |
| indirectly within token stream and common token stream. Hence tokens are |
| claimed directly from the vectors. Users can override these functions of course |
| and all this means is that if you override tokenstreams then you pretty much |
| have to provide all the methods, but then I think you would have to anyway (and |
| I don't know of anyone that has wanted to do this as you can carry your own |
| structure around with the tokens anyway and that is much easier). |
| |
| Change 5555 on 2009/01/26 by [email protected] |
| |
| Fixed: ANTLR-288 |
| Correct the interpretation of the skip token such that channel, start |
| index, char pos in line, start line and text are correctly reset to the start of |
| the new token when the one that we just traversed was marked as being skipped. |
| |
| This correctly excludes the text that was matched as part of the |
| SKIP()ed token from the next token in the token stream and so has the side |
| effect that asking for $text of a rule no longer includes the text that should |
| be skipped, but DOES include the text of tokens that were merely placed off the |
| default channel. |
| |
| Change 5551 on 2009/01/25 by [email protected] |
| |
| Fixed: ANTLR-287 |
| Most of the source files did not include the BSD license. This might |
| not be that big a deal given that I don't care what people do with it |
| other than take my name off it, but having the license reproduced |
| everywhere at least makes things perfectly clear. Hence this mass |
| change of sources and templates to include the license. |
| |
| Change 5550 on 2009/01/25 by [email protected] |
| |
| Fixed: ANTLR-365 |
| Ensure that as soon as we know about an input stream on the lexer, |
| we borrow its string factory and use it in our EOF token in case |
| anyone tries to make it a string, such as in error messages for |
| instance. |
| |
| Change 5548 on 2009/01/25 by [email protected] |
| |
| Fixed: ANTLR-363 |
| At some point the Java runtime default changed from discarding |
| off-channel tokens to preserving them. The fix is to make the C |
| runtime also default to preserving off-channel tokens. |
| |
| Change 5544 on 2009/01/24 by [email protected] |
| |
| Fixed: ANTLR-360 |
| Ensure that the fillBuffer function does not call any methods |
| that require the cached buffer size to be recorded before we |
| have actually recorded it. |
| |
| Change 5543 on 2009/01/24 by [email protected] |
| |
| Fixed: ANTLR-362 |
| Some users have started using string factories themselves and |
| exposed a flaw in the destroy method, which is intended to remove |
| a string that was created by the factory and is no longer needed. |
| The string was correctly removed from the vector that tracks them |
| but after the first one, all the remaining strings were then numbered |
| incorrectly. Hence the destroy method has been recoded to reindex |
| the strings in the factory after one is removed and everything is once |
| more hunky dory. |
| User suggested fix rejected. |
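
The reindexing described above can be illustrated with a small sketch. It assumes each string records its own position in the factory's tracking table; `FString`, `StringFactory` and both functions are hypothetical names, not the runtime's API, and a fixed-size array stands in for the tracking vector.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STRINGS 16

/* Hypothetical factory-owned string that remembers where it is tracked. */
typedef struct {
    size_t      index;   /* position of this string in the factory table */
    const char *chars;
} FString;

typedef struct {
    FString *strings[MAX_STRINGS];   /* stand-in for the tracking vector */
    size_t   count;
} StringFactory;

/* Start tracking a string; returns 0 on success, -1 if the table is full. */
static int factory_track(StringFactory *f, FString *s)
{
    if (f->count == MAX_STRINGS) {
        return -1;
    }
    s->index = f->count;
    f->strings[f->count++] = s;
    return 0;
}

/* Stop tracking a string and renumber everything that followed it. */
static void factory_destroy_string(StringFactory *f, FString *s)
{
    size_t i;

    assert(s->index < f->count && f->strings[s->index] == s);

    for (i = s->index; i + 1 < f->count; i++) {
        f->strings[i] = f->strings[i + 1];
        f->strings[i]->index = i;    /* the reindex step that was missing */
    }
    f->count--;
}
```

Without the reindex step, every string after the removed one would still carry its old position, so a second destroy would target the wrong slot.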
| |
| Change 5542 on 2009/01/24 by [email protected] |
| |
| Fixed ANTLR-366 |
| The recognizer state now ensures that all fields are set to NULL upon |
| creation and the reset does not overwrite the tokenname array. |
| |
| Change 5527 on 2009/01/15 by [email protected] |
| |
| Add the C runtime for 3.1.2 beta2 to perforce |
| |
| Change 5526 on 2009/01/15 by [email protected] |
| |
| Correctly define the MEMMOVE macro which was inadvertently left to be |
| memcpy. |
| |
| Change 5503 on 2008/12/12 by [email protected] |
| |
| Change C runtime release number to 3.1.2 beta |
| |
| Change 5473 on 2008/12/01 by [email protected] |
| |
| Fixed: ANTLR-350 - C runtime use of memcpy |
| Prior change to use memcpy instead of memmove in all cases missed the |
| fact that the string factory can be in a situation where overlaps occur. We now |
| have ANTLR3_MEMCPY and ANTLR3_MEMMOVE and use the two appropriately. |
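
The distinction matters whenever text is shifted within one buffer. The sketch below uses the standard memmove and memcpy directly rather than the ANTLR3_ macros (whose definitions are not shown here); `insert_text` is a hypothetical helper and assumes `text` does not point into `buf`.

```c
#include <assert.h>
#include <string.h>

/* Insert text at offset `at` in the NUL-terminated string held in buf.
 * Returns 0 on success, -1 if the offset is invalid or buf is too small. */
static int insert_text(char *buf, size_t cap, size_t at, const char *text)
{
    size_t len = strlen(buf);
    size_t add = strlen(text);

    if (at > len || len + add + 1 > cap) {
        return -1;
    }

    /* Source and destination are both inside buf and overlap, so this
     * shift needs memmove; memcpy has undefined behavior on overlap. */
    memmove(buf + at + add, buf + at, len - at + 1);

    /* text is assumed to be a separate object, so memcpy is safe here. */
    memcpy(buf + at, text, add);
    return 0;
}
```

The tail of the string (including its terminator) is moved up first, then the new text is copied into the gap.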
| |
| Change 5471 on 2008/12/01 by [email protected] |
| |
| Fixed ANTLR-361 |
| - Ensure that ANTLR3_BOOLEAN is typedef'ed correctly when building for |
| MingW |
| |
| Templates |
| --------- |
| |
| Change 5637 on 2009/02/20 by [email protected] |
| |
| C runtime - make sure that ADAPTOR results are cast to the tree type on |
| a rewrite. |
| |
| Change 5620 on 2009/02/18 by [email protected] |
| |
| Rename/Move: |
| From: //depot/code/antlr/main/src/org/antlr/codegen/templates/... |
| To: //depot/code/antlr/main/src/main/resources/org/antlr/codegen/templates/... |
| |
| Relocate the code generating templates to exist in the directory set |
| that maven expects. |
| |
| When checking in your templates, you may find it easiest to make a copy |
| of what you have, revert the change in perforce, then just check out the |
| template in the new location, and copy the changes back over. Nobody has more |
| than two files open at the moment. |
| |
| Change 5578 on 2009/02/12 by [email protected] |
| |
| Correct the string template escape sequences for generating scope |
| code in the C templates. |
| |
| Change 5577 on 2009/02/12 by [email protected] |
| |
| C Runtime - Bug fixes. |
| |
| o Moving to extract tokens directly from a vector when returning |
| them exposed a bug whereby the EOF boundary calculation in tokLT |
| was incorrectly checking > rather than >=. |
| o The change to API initialization of tokens rather than memcmp() |
| forgot to set the input stream pointer for the manufactured |
| tokens in the token factory. |
| o Rewrite streams for rewriting tree parsers did not check whether the |
| rewrite stream was ever assigned before trying to free it; this is |
| now in line with the ordinary parser code. |
| |
| Change 5567 on 2009/01/29 by [email protected] |
| |
| C Runtime - Further Optimizations |
| |
| Within grammars that used scopes and were intended to parse large |
| inputs with many rule nests, the creation and deletion of the scopes |
| themselves became significant. Careful analysis shows that for most |
| grammars, while a parse could create and delete 20,000 scopes, the |
| maximum depth of any scope was only 8. |
| |
| This change therefore changes the scope implementation so that it does |
| not free scope memory when it is popped but just tracks it in a C |
| runtime stack, eventually freeing it when the stack is freed. This |
| change caused the allocation of only 12 scope structures instead of |
| 20,000 for the extreme example case. |
| |
| This change means that scope users must be careful (as ever in C) to |
| initialize their scope elements correctly, as: |
| |
| 1) If not, you may inherit values from a prior use of the scope |
| structure; |
| 2) Scope structures are now allocated with malloc and not calloc. |
| |
| Also, when using a custom free function to clean a scope when it is |
| popped, it is probably a good idea to set any free'd pointers to NULL |
| (this is generally good C programming practice in any case). |
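
A rough sketch of a scope stack that keeps popped structures for reuse is shown below. All names (`Scope`, `ScopeStack`, `scope_push` and so on) and the fixed `MAX_DEPTH` are assumptions made for the illustration; the explicit initialization inside `scope_push` stands in for what a grammar author would do in their own scope setup code.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_DEPTH 64   /* arbitrary limit for this sketch */

/* Hypothetical user scope with one value and one owned pointer. */
typedef struct {
    int   counter;
    char *name;
} Scope;

typedef struct {
    Scope *slots[MAX_DEPTH];  /* scope structures, allocated on demand */
    size_t allocated;         /* slots that hold a malloc'd structure  */
    size_t depth;             /* scopes currently pushed               */
} ScopeStack;

static Scope *scope_push(ScopeStack *s)
{
    Scope *sc;

    if (s->depth == MAX_DEPTH) {
        return NULL;
    }
    if (s->depth == s->allocated) {
        sc = malloc(sizeof *sc);   /* malloc, not calloc: contents undefined */
        if (sc == NULL) {
            return NULL;
        }
        s->slots[s->allocated++] = sc;
    }
    sc = s->slots[s->depth++];

    /* A reused slot still holds values from its previous use, so every
     * field must be set explicitly before the scope is used. */
    sc->counter = 0;
    sc->name = NULL;
    return sc;
}

/* Popping keeps the structure around for the next push at this depth. */
static void scope_pop(ScopeStack *s)
{
    assert(s->depth > 0);
    s->depth--;
}

static void scope_stack_free(ScopeStack *s)
{
    size_t i;

    for (i = 0; i < s->allocated; i++) {
        free(s->slots[i]);
        s->slots[i] = NULL;   /* avoid leaving dangling pointers behind */
    }
    s->allocated = 0;
    s->depth = 0;
}
```

However many times scopes are pushed and popped, the number of allocations in this sketch equals the deepest nesting reached, which is the effect the change describes.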
| |
| Change 5566 on 2009/01/29 by [email protected] |
| |
| Remove redundant BACKTRACK checking so that MSVC9 does not get confused |
| about possibly uninitialized variables |
| |
| Change 5565 on 2009/01/28 by [email protected] |
| |
| Use malloc rather than calloc to allocate memory for new scopes. Note |
| that this means users will have to be careful to initialize any values in their |
| scopes that they expect to be 0 or NULL and I must document this. |
| |
| Change 5564 on 2009/01/28 by [email protected] |
| |
| Use malloc rather than calloc for copying list label tokens for |
| rewrites. |
| |
| Change 5561 on 2009/01/28 by [email protected] |
| |
| Prevent warnings about retval.stop not being initialized when a rule |
| returns early because it is in backtracking mode. |
| |
| Change 5560 on 2009/01/28 by [email protected] |
| |
| Add a NULL check before freeing rewrite streams used in AST rewrites |
| rather than auto-rewrites. |
| |
| While the NULL check is redundant as the free cannot be called unless |
| it is assigned, Visual Studio C 2008 gets it wrong and thinks that |
| there is a path that can arrive at the free without it being assigned, |
| and that is too annoying to ignore. |
| |
| Change 5559 on 2009/01/28 by [email protected] |
| |
| C target Tree rewrite optimization |
| |
| There is only one optimization in this change, but it is a huge one. |
| |
| The code generation templates were set up so that at the start of a rule, |
| any rewrite streams mentioned in the rule were pre-created. However, this |
| is a massive overhead for rules where only one or two of the streams are |
| actually used, as we create them then free them without ever using them. |
| This was copied from the Java templates basically. |
| This caused literally millions of extra calls and vector allocations |
| in the case of the GNU C parser given to me for testing with a 20,000 |
| line program. |
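
The effect of creating streams on first use rather than at rule entry can be sketched as below. This is not the generated code: `RewriteStream`, `stream_add`, `stream_free` and `rule` are hypothetical, and the global counter exists only to make the saving visible.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a rewrite stream. */
typedef struct {
    int count;   /* number of elements added */
    int last;    /* most recent element      */
} RewriteStream;

static size_t streams_created = 0;   /* instrumentation for this sketch */

static RewriteStream *stream_new(void)
{
    RewriteStream *s = malloc(sizeof *s);

    if (s != NULL) {
        s->count = 0;
        s->last = 0;
        streams_created++;
    }
    return s;
}

/* Add an element, creating the stream only when it is first needed. */
static int stream_add(RewriteStream **slot, int element)
{
    if (*slot == NULL) {
        *slot = stream_new();
        if (*slot == NULL) {
            return -1;
        }
    }
    (*slot)->last = element;
    (*slot)->count++;
    return 0;
}

/* Free a stream only if it was ever assigned. */
static void stream_free(RewriteStream **slot)
{
    if (*slot != NULL) {
        free(*slot);
        *slot = NULL;
    }
}

/* A rule that mentions three streams but uses at most one per call. */
static int rule(int alt)
{
    RewriteStream *stream_ID = NULL;
    RewriteStream *stream_INT = NULL;
    RewriteStream *stream_expr = NULL;
    int rc = 0;

    if (alt == 1) {
        rc = stream_add(&stream_ID, 1);
    } else if (alt == 2) {
        rc = stream_add(&stream_INT, 2);
    }

    stream_free(&stream_ID);
    stream_free(&stream_INT);
    stream_free(&stream_expr);
    return rc;
}
```

Each call to `rule` creates at most one stream instead of three, and a call that takes neither alternative creates none.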
| |
| After this change, the following comparison is available against the gcc |
| compiler: |
| |
| Before (different machines here so use the relative difference for |
| comparison): |
| |
| gcc: |
| |
| real 0m0.425s |
| user 0m0.384s |
| sys 0m0.036s |
| |
| ANTLR C |
| real 0m1.958s |
| user 0m1.284s |
| sys 0m0.656s |
| |
| After the previous optimizations for vector pooling via a factory, |
| plus this huge win in removing redundant code, we have the following |
| (different machine to the one above): |
| |
| gcc: |
| 0.21user 0.01system 0:00.23elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k |
| 0inputs+328outputs (0major+9922minor)pagefaults 0swaps |
| |
| ANTLR C: |
| |
| 0.37user 0.26system 0:00.64elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k |
| 0inputs+0outputs (0major+130944minor)pagefaults 0swaps |
| |
| The extra system time comes from the fact that although the tree |
| rewriting is now optimal in terms of not allocating things it does |
| not need, there is still a lot more overhead in a parser that is generated |
| for generic use, including much more use of structures for tokens and extra |
| copying and so on. I will continue to work on improving things where I |
| can, but the next big improvement will come from Ter's optimization of |
| the actual code structures we generate, including not doing things with |
| rewrite streams that we do not need to do at all. |
| |
| The second machine I used is about twice as fast CPU wise as the system |
| that was used originally by the user that asked about this performance. |
| |
| Change 5558 on 2009/01/28 by [email protected] |
| |
| Lots of optimizations (though the next one to be checked in is the huge |
| win) for AST building and vector factories. |
| |
| A large part of tree rewriting was the creation of vectors to hold AST |
| nodes. Although I had created a vector factory, for some reason I never got |
| around to creating a proper one, that pre-allocated the vectors in chunks and |
| so on. I guess I just forgot to. Hence a big win here is prevention of calling |
| malloc lots and lots of times to create vectors. |
| |
| A second improvement was to change the vector definition such that it |
| holds a certain number of elements within the vector structure itself, rather |
| than mallocing and freeing these. Currently this is set to 8, but may increase. |
| For AST construction, this is generally a big win because AST nodes don't often |
| have many individual children unless there has not been any shaping going on in |
| the parser. But if you are not shaping, then you don't really need a tree. |
| |
| Other performance improvements here include not calling functions |
| indirectly within token stream and common token stream. Hence tokens are |
| claimed directly from the vectors. Users can override these functions of course |
| and all this means is that if you override tokenstreams then you pretty much |
| have to provide all the methods, but then I think you would have to anyway (and |
| I don't know of anyone that has wanted to do this as you can carry your own |
| structure around with the tokens anyway and that is much easier). |
| |
| Change 5554 on 2009/01/26 by [email protected] |
| |
| Fixed: ANTLR-379 |
| For some reason in the past, the ruleMemoization() template had required |
| that the name parameter be set to the rule name. This does not seem to be a |
| requirement any more. The name=xxx override when invoking the template was |
| causing all the scope names derived when cleaning up in memoization to be |
| called after the rule name, which was not correct. However, this only affected |
| the output when in output=AST mode. |
| |
| This template invocation is now corrected. |
| |
| Change 5553 on 2009/01/26 by [email protected] |
| |
| Fixed: ANTLR-330 |
| Managed to get the one rule that could not see the ASTLabelType to call |
| back in to the super template C.stg and ask it to construct the name. I am not |
| 100% sure that this fixes all cases, but I cannot find any that fail. Please |
| let me know if you find any examples of being unable to default the |
| ASTLabelType option in the C target. |
| |
| Change 5552 on 2009/01/25 by [email protected] |
| |
| Progress: ANTLR-327 |
| Fix debug code generation templates when output=AST such that code |
| can at least be generated and I can debug the output code correctly. |
| Note that this checkin does not implement the debugging requirements |
| for tree generating parsers. |
| |
| Change 5551 on 2009/01/25 by [email protected] |
| |
| Fixed: ANTLR-287 |
| Most of the source files did not include the BSD license. This might |
| not be that big a deal given that I don't care what people do with it |
| other than take my name off it, but having the license reproduced |
| everywhere at least makes things perfectly clear. Hence this mass change of |
| sources and templates to include the license. |
| |
| Change 5549 on 2009/01/25 by [email protected] |
| |
| Fixed: ANTLR-354 |
| Using 0.0D as the default initialize value for a double caused |
| VS 2003 C compiler to bomb out. There seems to be no reason other |
| than force of habit to set this to 0.0D so I have dropped the D so |
| that older compilers do not complain. |
| |
| Change 5547 on 2009/01/25 by [email protected] |
| |
| Fixed: ANTLR-282 |
| All references are now unadorned with any type of NULL check for the |
| following reasons: |
| |
| 1) A NULL reference means that there is a problem with the |
| grammar and we need the program to fail immediately so |
| that the programmer can work out where the problem occurred; |
| 2) Most of the time, the only sensible value that can be |
| returned is NULL or 0, which obviates the NULL check in the |
| first place; |
| 3) If we replace a NULL reference with some value such as 0, |
| then the program may blithely continue but just do something |
| logically wrong, which will be very difficult for the |
| grammar programmer to detect and correct. |
| |
| Change 5545 on 2009/01/24 by [email protected] |
| |
| Fixed: ANTLR-357 |
| The bug report was correct in that the types of references to things |
| like $start were being incorrectly cast as they were not changed from |
| Java style casts (and the casts are unnecessary). This is now fixed |
| and references are referencing the correct, uncast, types. |
| However, the bug report was wrong in that the reference in the book to |
| $start.pos will only work for Java and really, it is incorrect in the |
| book because it should not access the .pos member directly but should |
| be using $start.getCharPositionInLine(). |
| Because there is no access qualification in C, one could use |
| $start.charPosition; however, really this should be |
| $start->getCharPositionInLine($start); |
| |
| Change 5541 on 2009/01/24 by [email protected] |
| |
| Fixed - ANTLR-367 |
| The code generation for the free method of a recognizer was not |
| distinguishing tree parsers from parsers when it came to calling delegate free |
| functions. |
| This is now corrected. |
| |
| Change 5540 on 2009/01/24 by [email protected] |
| |
| Fixed ANTLR-355 |
| Ensure that we do not attempt to free any memory that we did not |
| actually allocate because the parser rule was being executed in |
| backtracking mode. |
| |
| Change 5539 on 2009/01/24 by [email protected] |
| |
| Fixed: ANTLR-355 |
| When a C targeted parser is producing in backtracking mode, then the |
| creation of new stream rewrite structures should not happen if the rule is |
| currently backtracking. |
| |
| Change 5502 on 2008/12/11 by [email protected] |
| |
| Fixed: ANTLR-349 Ensure that all marker labels in the lexer are 64 bit |
| compatible |
| |
| Change 5473 on 2008/12/01 by [email protected] |
| |
| Fixed: ANTLR-350 - C runtime use of memcpy |
| Prior change to use memcpy instead of memmove in all cases missed the |
| fact that the string factory can be in a situation where overlaps occur. We now |
| have ANTLR3_MEMCPY and ANTLR3_MEMMOVE and use the two appropriately. |
| |
| Change 5387 on 2008/11/05 by [email protected] |
| |
| Fixed x+=. issue with tree grammars; added unit test |
| |
| Change 5325 on 2008/10/23 by [email protected] |
| |
| We were all ref'ing backtracking==0 hardcoded instead of checking the |
| @synpredgate action. |
| |
| |