:mod:`parser` --- Access Python parse trees
===========================================

.. module:: parser
   :synopsis: Access parse trees for Python source code.
.. moduleauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


.. Copyright 1995 Virginia Polytechnic Institute and State University and Fred
   L. Drake, Jr. This copyright notice must be distributed on all copies, but
   this document otherwise may be distributed as part of the Python
   distribution. No fee may be charged for this document in any representation,
   either on paper or electronically. This restriction does not affect other
   elements in a distributed package in any way.

.. index:: single: parsing; Python source code

The :mod:`parser` module provides an interface to Python's internal parser and
byte-code compiler. The primary purpose for this interface is to allow Python
code to edit the parse tree of a Python expression and create executable code
from it. This is better than trying to parse and modify an arbitrary Python
code fragment as a string because parsing is performed in a manner identical to
the code forming the application. It is also faster.

.. note::

   From Python 2.5 onward, it's much more convenient to cut in at the Abstract
   Syntax Tree (AST) generation and compilation stage, using the :mod:`ast`
   module.

   The :mod:`parser` module exports the names documented here also with "st"
   replaced by "ast"; this is a legacy from the time when there was no other
   AST and has nothing to do with the AST found in Python 2.5. This is also the
   reason for the functions' keyword arguments being called *ast*, not *st*.
   The "ast" functions have been removed in Python 3.

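As a rough illustration of the :mod:`ast` route mentioned in the note, the
sketch below parses an expression, edits the tree, and compiles the result.
The ``AddToSub`` transformer and the ``compile_edited`` helper are hypothetical
names invented for this example; only :func:`ast.parse`,
:class:`ast.NodeTransformer`, :func:`ast.fix_missing_locations` and the
built-in :func:`compile` are real library calls.

```python
import ast


class AddToSub(ast.NodeTransformer):
    """Hypothetical transformer: rewrite every ``x + y`` as ``x - y``."""

    def visit_BinOp(self, node):
        # Transform nested operands first, then this node.
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node


def compile_edited(source, filename="file.py"):
    """Parse *source* as an expression, edit its AST, and compile it."""
    tree = ast.parse(source, filename, "eval")
    tree = AddToSub().visit(tree)
    # Fill in any location attributes missing after the edit.
    ast.fix_missing_locations(tree)
    return compile(tree, filename, "eval")


code = compile_edited("a + 5")
result = eval(code, {"a": 5})  # evaluates "a - 5" with a == 5
```

The same three steps (parse, edit, compile) are what the :mod:`parser` module
offers at the lower level of concrete parse trees.
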
There are a few things to note about this module which are important to making
use of the data structures created. This is not a tutorial on editing the parse
trees for Python code, but some examples of using the :mod:`parser` module are
presented.

Most importantly, a good understanding of the Python grammar processed by the
internal parser is required. For full information on the language syntax, refer
to :ref:`reference-index`. The parser
itself is created from a grammar specification defined in the file
:file:`Grammar/Grammar` in the standard Python distribution. The parse trees
stored in the ST objects created by this module are the actual output from the
internal parser when created by the :func:`expr` or :func:`suite` functions,
described below. The ST objects created by :func:`sequence2st` faithfully
simulate those structures. Be aware that the values of the sequences which are
considered "correct" will vary from one version of Python to another as the
formal grammar for the language is revised. However, transporting code from one
Python version to another as source text will always allow correct parse trees
to be created in the target version, with the only restriction being that
migrating to an older version of the interpreter will not support more recent
language constructs. The parse trees are not typically compatible from one
version to another, whereas source code has always been forward-compatible.

Each element of the sequences returned by :func:`st2list` or :func:`st2tuple`
has a simple form. Sequences representing non-terminal elements in the grammar
always have a length greater than one. The first element is an integer which
identifies a production in the grammar. These integers are given symbolic names
in the C header file :file:`Include/graminit.h` and the Python module
:mod:`symbol`. Each additional element of the sequence represents a component
of the production as recognized in the input string: these are always sequences
which have the same form as the parent. An important aspect of this structure
which should be noted is that keywords used to identify the parent node type,
such as the keyword :keyword:`if` in an :const:`if_stmt`, are included in the
node tree without any special treatment. For example, the :keyword:`if` keyword
is represented by the tuple ``(1, 'if')``, where ``1`` is the numeric value
associated with all :const:`NAME` tokens, including variable and function names
defined by the user. In an alternate form returned when line number information
is requested, the same token might be represented as ``(1, 'if', 12)``, where
the ``12`` represents the line number at which the terminal symbol was found.

Terminal elements are represented in much the same way, but without any child
elements and the addition of the source text which was identified. The example
of the :keyword:`if` keyword above is representative. The various types of
terminal symbols are defined in the C header file :file:`Include/token.h` and
the Python module :mod:`token`.

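The node layout described above can be explored with a small recursive walk.
The following is only a sketch: the tree is written by hand, and the
non-terminal numbers ``ARITH_EXPR`` and ``TERM`` are made-up placeholders
(real values come from the :mod:`symbol` module and differ between Python
versions). The terminal numbers and the ``ISTERMINAL`` test are taken from the
:mod:`token` module.

```python
import token

# Hypothetical non-terminal numbers, chosen only for illustration.
ARITH_EXPR = 300
TERM = 301

# Hand-written tuple-form tree for "a + 5", with line numbers on terminals.
tree = (ARITH_EXPR,
        (TERM, (token.NAME, 'a', 1)),
        (token.PLUS, '+', 1),
        (TERM, (token.NUMBER, '5', 1)))


def iter_terminals(node):
    """Yield (token_name, text, line) for each leaf; line is None if absent."""
    kind = node[0]
    if token.ISTERMINAL(kind):
        # Terminal: (type, text) or (type, text, line_number).
        line = node[2] if len(node) > 2 else None
        yield token.tok_name[kind], node[1], line
    else:
        # Non-terminal: every remaining element is a child node.
        for child in node[1:]:
            for leaf in iter_terminals(child):
                yield leaf


terminals = list(iter_terminals(tree))
```

The walk relies only on the first element of each sequence to tell terminals
from non-terminals, so it works for both the list and the tuple form.
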
The ST objects are not required to support the functionality of this module,
but are provided for three purposes: to allow an application to amortize the
cost of processing complex parse trees, to provide a parse tree representation
which conserves memory space when compared to the Python list or tuple
representation, and to ease the creation of additional modules in C which
manipulate parse trees. A simple "wrapper" class may be created in Python to
hide the use of ST objects.

The :mod:`parser` module defines functions for a few distinct purposes. The
most important purposes are to create ST objects and to convert ST objects to
other representations such as parse trees and compiled code objects, but there
are also functions which serve to query the type of parse tree represented by an
ST object.


.. seealso::

   Module :mod:`symbol`
      Useful constants representing internal nodes of the parse tree.

   Module :mod:`token`
      Useful constants representing leaf nodes of the parse tree and functions for
      testing node values.


.. _creating-sts:

Creating ST Objects
-------------------

ST objects may be created from source code or from a parse tree. When creating
an ST object from source, different functions are used to create the ``'eval'``
and ``'exec'`` forms.


.. function:: expr(source)

   The :func:`expr` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'eval')``. If the parse succeeds, an ST object
   is created to hold the internal parse tree representation, otherwise an
   appropriate exception is raised.


.. function:: suite(source)

   The :func:`suite` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'exec')``. If the parse succeeds, an ST object
   is created to hold the internal parse tree representation, otherwise an
   appropriate exception is raised.


.. function:: sequence2st(sequence)

   This function accepts a parse tree represented as a sequence and builds an
   internal representation if possible. If it can validate that the tree conforms
   to the Python grammar and all nodes are valid node types in the host version of
   Python, an ST object is created from the internal representation and returned
   to the caller. If there is a problem creating the internal representation, or
   if the tree cannot be validated, a :exc:`ParserError` exception is raised. An
   ST object created this way should not be assumed to compile correctly; normal
   exceptions raised by compilation may still be raised when the ST object is
   passed to :func:`compilest`. This may indicate problems not related to syntax
   (such as a :exc:`MemoryError` exception), but may also be due to constructs such
   as the result of parsing ``del f(0)``, which escapes the Python parser but is
   checked by the bytecode compiler.

   Sequences representing terminal tokens may be represented as either two-element
   sequences of the form ``(1, 'name')`` or as three-element sequences of the form
   ``(1, 'name', 56)``. If the third element is present, it is assumed to be a valid
   line number. The line number may be specified for any subset of the terminal
   symbols in the input tree.


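A round trip through the sequence form can be sketched as follows. The helper
name ``roundtrip_compile`` is hypothetical, and the fallback to the built-in
:func:`compile` is an assumption added here so that the sketch still runs on
interpreters that do not ship the :mod:`parser` module.

```python
try:
    import parser  # not available in every Python release
except ImportError:
    parser = None


def roundtrip_compile(source, filename='file.py'):
    """Compile *source* as an expression, via a tuple-form tree if possible."""
    if parser is None:
        # Assumption of this sketch: fall back to the built-in compiler.
        return compile(source, filename, 'eval')
    tree = parser.st2tuple(parser.expr(source))  # ST object -> nested tuples
    st = parser.sequence2st(tree)                # nested tuples -> new ST object
    return parser.compilest(st, filename)


code = roundtrip_compile('a + 5')
value = eval(code, {'a': 5})
```

An application that edits the tree would do so between the ``st2tuple`` and
``sequence2st`` calls, typically on the list form returned by
:func:`st2list`.
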
.. function:: tuple2st(sequence)

   This is the same function as :func:`sequence2st`. This entry point is
   maintained for backward compatibility.


.. _converting-sts:

Converting ST Objects
---------------------

ST objects, regardless of the input used to create them, may be converted to
parse trees represented as list- or tuple-trees, or may be compiled into
executable code objects. Parse trees may be extracted with or without line
numbering information.


.. function:: st2list(ast[, line_info])

   This function accepts an ST object from the caller in *ast* and returns a
   Python list representing the equivalent parse tree. The resulting list
   representation can be used for inspection or the creation of a new parse tree in
   list form. This function does not fail so long as memory is available to build
   the list representation. If the parse tree will only be used for inspection,
   :func:`st2tuple` should be used instead to reduce memory consumption and
   fragmentation. When the list representation is required, this function is
   significantly faster than retrieving a tuple representation and converting that
   to nested lists.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the list representing the token. Note
   that the line number provided specifies the line on which the token *ends*.
   This information is omitted if the flag is false or omitted.


.. function:: st2tuple(ast[, line_info])

   This function accepts an ST object from the caller in *ast* and returns a
   Python tuple representing the equivalent parse tree. Other than returning a
   tuple instead of a list, this function is identical to :func:`st2list`.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the tuple representing the token. This
   information is omitted if the flag is false or omitted.


.. function:: compilest(ast, filename='<syntax-tree>')

   .. index:: builtin: eval

   The Python byte compiler can be invoked on an ST object to produce code objects
   which can be used as part of an :keyword:`exec` statement or a call to the
   built-in :func:`eval` function. This function provides the interface to the
   compiler: it passes the internal parse tree from *ast* to the byte compiler,
   using the source file name specified by the *filename* parameter. The default
   value supplied for *filename* indicates that the source was an ST object.

   Compiling an ST object may result in exceptions related to compilation; an
   example would be a :exc:`SyntaxError` caused by the parse tree for ``del f(0)``:
   this statement is considered legal within the formal grammar for Python but is
   not a legal language construct. The :exc:`SyntaxError` raised for this
   condition is actually generated by the Python byte-compiler normally, which is
   why it can be raised at this point by the :mod:`parser` module. Most causes of
   compilation failure can be diagnosed programmatically by inspection of the parse
   tree.


.. _querying-sts:

Queries on ST Objects
---------------------

Two functions are provided which allow an application to determine if an ST was
created as an expression or a suite. Neither of these functions can be used to
determine if an ST was created from source code via :func:`expr` or
:func:`suite` or from a parse tree via :func:`sequence2st`.


.. function:: isexpr(ast)

   .. index:: builtin: compile

   When *ast* represents an ``'eval'`` form, this function returns true, otherwise
   it returns false. This is useful, since code objects normally cannot be queried
   for this information using existing built-in functions. Note that the code
   objects created by :func:`compilest` cannot be queried like this either, and
   are identical to those created by the built-in :func:`compile` function.


.. function:: issuite(ast)

   This function mirrors :func:`isexpr` in that it reports whether an ST object
   represents an ``'exec'`` form, commonly known as a "suite." It is not safe to
   assume that this function is equivalent to ``not isexpr(ast)``, as additional
   syntactic fragments may be supported in the future.


.. _st-errors:

Exceptions and Error Handling
-----------------------------

The parser module defines a single exception, but may also pass other built-in
exceptions from other portions of the Python runtime environment. See each
function for information about the exceptions it can raise.


.. exception:: ParserError

   Exception raised when a failure occurs within the parser module. This is
   generally produced for validation failures rather than the built-in
   :exc:`SyntaxError` raised during normal parsing. The exception argument is
   either a string describing the reason for the failure or a tuple containing a
   sequence causing the failure from a parse tree passed to :func:`sequence2st`
   and an explanatory string. Calls to :func:`sequence2st` need to be able to
   handle either type of exception argument, while calls to other functions in
   the module will only need to be aware of the simple string values.

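A caller of :func:`sequence2st` might normalize the two argument forms with a
helper along these lines. This is a sketch under stated assumptions:
``describe_parser_error`` and the stand-in class ``FakeParserError`` are
hypothetical names, and the helper accepts both a ``(sequence, message)`` pair
spread across ``args`` and one packed into a single tuple argument, since the
text above does not say which layout to expect.

```python
def describe_parser_error(exc):
    """Return (message, offending_sequence_or_None) from an exception's args."""
    args = exc.args
    # Assumed layout 1: args == (sequence, message)
    if len(args) == 2 and isinstance(args[1], str):
        sequence, message = args
        return message, sequence
    # Assumed layout 2: args == ((sequence, message),)
    if len(args) == 1 and isinstance(args[0], tuple) and len(args[0]) == 2:
        sequence, message = args[0]
        return message, sequence
    # Plain string argument (or no argument at all).
    return (str(args[0]) if args else ''), None


class FakeParserError(Exception):
    """Stand-in for parser.ParserError, used only to exercise the helper."""


simple = describe_parser_error(FakeParserError('parse failed'))
detailed = describe_parser_error(FakeParserError((1, 'if'), 'illegal node'))
```

In real use the helper would be called from an ``except parser.ParserError``
clause wrapped around the :func:`sequence2st` call.
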
Note that the functions :func:`compilest`, :func:`expr`, and :func:`suite` may
raise exceptions which are normally raised by the parsing and compilation
process. These include the built-in exceptions :exc:`MemoryError`,
:exc:`OverflowError`, :exc:`SyntaxError`, and :exc:`SystemError`. In these
cases, these exceptions carry all the meaning normally associated with them.
Refer to the descriptions of each function for detailed information.


.. _st-objects:

ST Objects
----------

Ordered and equality comparisons are supported between ST objects. Pickling of
ST objects (using the :mod:`pickle` module) is also supported.


.. data:: STType

   The type of the objects returned by :func:`expr`, :func:`suite` and
   :func:`sequence2st`.

ST objects have the following methods:


.. method:: ST.compile([filename])

   Same as ``compilest(st, filename)``.


.. method:: ST.isexpr()

   Same as ``isexpr(st)``.


.. method:: ST.issuite()

   Same as ``issuite(st)``.


.. method:: ST.tolist([line_info])

   Same as ``st2list(st, line_info)``.


.. method:: ST.totuple([line_info])

   Same as ``st2tuple(st, line_info)``.


324
325Example: Emulation of :func:`compile`
326-------------------------------------
327
328While many useful operations may take place between parsing and bytecode
329generation, the simplest operation is to do nothing. For this purpose, using
330the :mod:`parser` module to produce an intermediate data structure is equivalent
331to the code ::
332
333 >>> code = compile('a + 5', 'file.py', 'eval')
334 >>> a = 5
335 >>> eval(code)
336 10
337
338The equivalent operation using the :mod:`parser` module is somewhat longer, and
339allows the intermediate internal parse tree to be retained as an ST object::
340
341 >>> import parser
342 >>> st = parser.expr('a + 5')
343 >>> code = st.compile('file.py')
344 >>> a = 5
345 >>> eval(code)
346 10
347
348An application which needs both ST and code objects can package this code into
349readily available functions::
350
351 import parser
352
353 def load_suite(source_string):
354 st = parser.suite(source_string)
355 return st, st.compile()
356
357 def load_expression(source_string):
358 st = parser.expr(source_string)
359 return st, st.compile()