Usage of basic components
=========================

This document explains how to use the parser, the pretty printer and the node traverser.

Bootstrapping
-------------

To bootstrap the library, include the autoloader generated by Composer:

```php
require 'path/to/vendor/autoload.php';
```

Additionally, you may want to set the `xdebug.max_nesting_level` ini option to a higher value:

```php
ini_set('xdebug.max_nesting_level', 3000);
```

This ensures that there will be no errors when traversing highly nested node trees. However, it is
preferable to disable Xdebug completely, as it can easily make this library more than five times
slower.
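
As a minimal sketch of the advice above, a script could check whether Xdebug is loaded before raising
the nesting limit. The helper name `xdebugIsLoaded()` and the log message are made up for this
illustration; only `extension_loaded()`, `ini_set()` and `error_log()` are standard PHP functions.

```php
<?php
// Hypothetical helper: reports whether the Xdebug extension is active.
function xdebugIsLoaded(): bool {
    return extension_loaded('xdebug');
}

if (xdebugIsLoaded()) {
    // Xdebug cannot be unloaded at runtime, so only warn and raise the limit.
    error_log('Xdebug is loaded; parsing and traversal may be considerably slower.');
    ini_set('xdebug.max_nesting_level', '3000');
}
```

To actually disable Xdebug, it has to be turned off in the PHP configuration before the script starts.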

Parsing
-------

In order to parse code, you first have to create a parser instance:

```php
use PhpParser\ParserFactory;
use PhpParser\PhpVersion;

// Parser for the version you are running on.
$parser = (new ParserFactory())->createForHostVersion();

// Parser for the newest PHP version supported by the PHP-Parser library.
$parser = (new ParserFactory())->createForNewestSupportedVersion();

// Parser for a specific PHP version.
$parser = (new ParserFactory())->createForVersion(PhpVersion::fromString('8.1'));
```

Which version you should target depends on your use case. In many cases you will want to use the
host version, as people typically analyze code for the version they are running on. However, when
analyzing arbitrary code you are usually best off using the newest supported version, which tends
to accept the widest range of code (unless there are breaking changes in PHP).

The `createXYZ()` methods optionally accept an array of lexer options. Some use cases that require
customized lexer options are discussed in the [lexer documentation](component/Lexer.markdown).

Subsequently, you can pass PHP code (including the opening `<?php` tag) to the `parse()` method in
order to create a syntax tree. If a syntax error is encountered, a `PhpParser\Error` exception will
be thrown by default:

```php
<?php
use PhpParser\Error;
use PhpParser\ParserFactory;

$code = <<<'CODE'
<?php
function printLine($msg) {
    echo $msg, "\n";
}
printLine('Hello World!!!');
CODE;

$parser = (new ParserFactory())->createForHostVersion();

try {
    $stmts = $parser->parse($code);
    // $stmts is an array of statement nodes
} catch (Error $e) {
    echo 'Parse Error: ', $e->getMessage(), "\n";
}
```

A parser instance can be reused to parse multiple files.
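
The following sketch illustrates reusing one parser for several inputs. The file names and code
strings in `$sources` are made up for the example; in practice the code would typically come from
`file_get_contents()`. The autoloader path is an assumption about the project layout.

```php
<?php
require 'vendor/autoload.php'; // adjust to your project layout

use PhpParser\Error;
use PhpParser\ParserFactory;

$parser = (new ParserFactory())->createForHostVersion();

// Hypothetical inputs: one valid snippet and one with a syntax error.
$sources = [
    'valid.php'   => '<?php echo 1;',
    'invalid.php' => '<?php echo ;',
];

$results = [];
foreach ($sources as $name => $code) {
    try {
        // The same parser instance handles every input.
        $stmts = $parser->parse($code);
        $results[$name] = count($stmts) . ' statement(s)';
    } catch (Error $e) {
        // getStartLine() reports the line on which the error was detected.
        $results[$name] = 'error on line ' . $e->getStartLine();
    }
}

print_r($results);
```

Catching the error inside the loop means that one broken file does not prevent the remaining files
from being parsed.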

Node dumping
------------

To dump the abstract syntax tree in human-readable form, a `NodeDumper` can be used:

```php
<?php
use PhpParser\NodeDumper;

$nodeDumper = new NodeDumper;
echo $nodeDumper->dump($stmts), "\n";
```

For the sample code from the previous section, this will produce the following output:

```
array(
    0: Stmt_Function(
        attrGroups: array(
        )
        byRef: false
        name: Identifier(
            name: printLine
        )
        params: array(
            0: Param(
                attrGroups: array(
                )
                flags: 0
                type: null
                byRef: false
                variadic: false
                var: Expr_Variable(
                    name: msg
                )
                default: null
            )
        )
        returnType: null
        stmts: array(
            0: Stmt_Echo(
                exprs: array(
                    0: Expr_Variable(
                        name: msg
                    )
                    1: Scalar_String(
                        value:

                    )
                )
            )
        )
    )
    1: Stmt_Expression(
        expr: Expr_FuncCall(
            name: Name(
                name: printLine
            )
            args: array(
                0: Arg(
                    name: null
                    value: Scalar_String(
                        value: Hello World!!!
                    )
                    byRef: false
                    unpack: false
                )
            )
        )
    )
)
```

You can also use the `php-parse` script to obtain such a node dump by calling it either with a file
name or code string:

```sh
vendor/bin/php-parse file.php
vendor/bin/php-parse "<?php foo();"
```

This can be very helpful if you want to quickly check how certain syntax is represented in the AST.

Node tree structure
-------------------

Looking at the node dump above, you can see that `$stmts` for this example code is an array of two
nodes, a `Stmt_Function` and a `Stmt_Expression`. The corresponding class names are:

 * `Stmt_Function -> PhpParser\Node\Stmt\Function_`
 * `Stmt_Expression -> PhpParser\Node\Stmt\Expression`

The additional `_` at the end of the first class name is necessary because `Function` is a
reserved keyword. Many node class names in this library have a trailing `_` to avoid clashing with
a keyword.

As PHP is a large language there are approximately 140 different nodes. In order to make working
with them easier they are grouped into three categories:

 * `PhpParser\Node\Stmt`s are statement nodes, i.e. language constructs that do not return
   a value and cannot occur in an expression. For example a class definition is a statement.
   It doesn't return a value, and you can't write something like `func(class A {});`.
 * `PhpParser\Node\Expr`s are expression nodes, i.e. language constructs that return a value
   and thus can occur in other expressions. Examples of expressions are `$var`
   (`PhpParser\Node\Expr\Variable`) and `func()` (`PhpParser\Node\Expr\FuncCall`).
 * `PhpParser\Node\Scalar`s are nodes representing scalar values, like `'string'`
   (`PhpParser\Node\Scalar\String_`), `0` (`PhpParser\Node\Scalar\Int_`, named `LNumber` before
   PHP-Parser 5) or magic constants like `__FILE__` (`PhpParser\Node\Scalar\MagicConst\File`).
   All `PhpParser\Node\Scalar`s extend `PhpParser\Node\Expr`, as scalars are expressions, too.
 * There are some nodes not in any of these groups, for example names (`PhpParser\Node\Name`)
   and call arguments (`PhpParser\Node\Arg`).

The `Node\Stmt\Expression` node is somewhat confusing in that it contains both the terms "statement"
and "expression". This node distinguishes `expr`, which is a `Node\Expr`, from `expr;`, which is
an "expression statement" represented by `Node\Stmt\Expression` and containing `expr` as a sub-node.

Every node has a (possibly zero) number of subnodes. You can access subnodes by writing
`$node->subNodeName`. The `Stmt\Echo_` node has only one subnode `exprs`. In the above example the
echo statement sits inside the function body, so in order to access it you would write
`$stmts[0]->stmts[0]->exprs`. If you wanted to access the name of the function call, you would
write `$stmts[1]->expr->name`.
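
As an illustration of subnode access, the sketch below walks the tree of the sample code from the
Parsing section by hand. The property names follow the node dump shown earlier; the local variable
names and the autoloader path are our own choices.

```php
<?php
require 'vendor/autoload.php'; // adjust to your project layout

use PhpParser\ParserFactory;

$code = <<<'CODE'
<?php
function printLine($msg) {
    echo $msg, "\n";
}
printLine('Hello World!!!');
CODE;

$stmts = (new ParserFactory())->createForHostVersion()->parse($code);

// $stmts[0] is the Stmt\Function_ node; its body lives in the `stmts` subnode.
$echo  = $stmts[0]->stmts[0]; // Stmt\Echo_
$exprs = $echo->exprs;        // [Expr\Variable, Scalar\String_]

// $stmts[1] is the Stmt\Expression node wrapping the Expr\FuncCall.
$call = $stmts[1]->expr;

echo $stmts[0]->name->toString(), "\n";   // printLine (declared function)
echo count($exprs), "\n";                 // 2
echo $call->name->toString(), "\n";       // printLine (called function)
echo $call->args[0]->value->value, "\n";  // Hello World!!!
```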

All nodes also define a `getType()` method that returns the node type. The type is the class name
without the `PhpParser\Node\` prefix and `\` replaced with `_`. It also does not contain a trailing
`_` for reserved-keyword class names.

It is possible to associate custom metadata with a node using the `setAttribute()` method. This data
can then be retrieved using `hasAttribute()`, `getAttribute()` and `getAttributes()`.

By default, the parser adds the `startLine`, `endLine`, `startTokenPos`, `endTokenPos`,
`startFilePos`, `endFilePos` and `comments` attributes. `comments` is an array of
`PhpParser\Comment[\Doc]` instances.

The pre-defined attributes can also be accessed using `getStartLine()` instead of
`getAttribute('startLine')`, and so on. The last doc comment from the `comments` attribute can be
obtained using `getDocComment()`.
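
A short sketch of the type and attribute accessors described above. The sample function and the
attribute name `reviewed` are invented for this example; the second argument to `getAttribute()` is
the default returned when the attribute is not set.

```php
<?php
require 'vendor/autoload.php'; // adjust to your project layout

use PhpParser\ParserFactory;

$code = <<<'CODE'
<?php
/** Prints a greeting. */
function greet() {
    echo 'hi';
}
CODE;

$stmts = (new ParserFactory())->createForHostVersion()->parse($code);
$function = $stmts[0];

echo $function->getType(), "\n";                   // Stmt_Function
echo $function->getStartLine(), "\n";              // 3 (line of the `function` keyword)
echo $function->getEndLine(), "\n";                // 5
echo $function->getDocComment()->getText(), "\n";  // /** Prints a greeting. */

// Custom metadata; "reviewed" is a made-up attribute name.
$function->setAttribute('reviewed', true);
var_dump($function->hasAttribute('reviewed'));       // bool(true)
var_dump($function->getAttribute('missing', 'n/a')); // string(3) "n/a"
```

Note that `getDocComment()` returns `null` when a node has no doc comment, so real code should check
the result before calling `getText()`.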

Pretty printer
--------------

The pretty printer component compiles the AST back to PHP code according to a specified scheme.
Currently, there is only one scheme available, namely `PhpParser\PrettyPrinter\Standard`.

```php
use PhpParser\Error;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;

$code = "<?php echo 'Hi ', hi\\getTarget();";

$parser = (new ParserFactory())->createForHostVersion();
$prettyPrinter = new PrettyPrinter\Standard();

try {
    // parse
    $stmts = $parser->parse($code);

    // change
    $stmts[0]         // the echo statement
          ->exprs     // sub expressions
          [0]         // the first of them (the string node)
          ->value     // its value, i.e. 'Hi '
          = 'Hello '; // change to 'Hello '

    // pretty print
    $code = $prettyPrinter->prettyPrint($stmts);

    echo $code;
} catch (Error $e) {
    echo 'Parse Error: ', $e->getMessage(), "\n";
}
```

The above code will output:

    echo 'Hello ', hi\getTarget();

As you can see, the source code was first parsed using `PhpParser\Parser->parse()`, then changed and then
again converted to code using `PhpParser\PrettyPrinter\Standard->prettyPrint()`.

The `prettyPrint()` method pretty prints a statements array. It is also possible to pretty print only a
single expression using `prettyPrintExpr()`.

The `prettyPrintFile()` method can be used to print an entire file. This will include the opening `<?php` tag
and handle inline HTML as the first/last statement more gracefully.
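
The difference between the three printing methods can be seen in a small sketch. The input snippet
and variable names are chosen for illustration only.

```php
<?php
require 'vendor/autoload.php'; // adjust to your project layout

use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;

$parser  = (new ParserFactory())->createForHostVersion();
$printer = new PrettyPrinter\Standard();

$stmts = $parser->parse('<?php $total = 1 + 2;');

// A statements array, printed without the opening tag.
$asStatements = $printer->prettyPrint($stmts);           // $total = 1 + 2;

// Only the expression inside the expression statement (no trailing semicolon).
$asExpr = $printer->prettyPrintExpr($stmts[0]->expr);    // $total = 1 + 2

// A whole file, which starts with the opening <?php tag.
$asFile = $printer->prettyPrintFile($stmts);

echo $asStatements, "\n", $asExpr, "\n", $asFile, "\n";
```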

There is also a pretty-printing mode which retains formatting for parts of the AST that have not
been changed, which requires additional setup.

> Read more: [Pretty printing documentation](component/Pretty_printing.markdown)

Node traversal
--------------

The above pretty printing example used the fact that the source code was known and thus it was easy to
write code that accesses a certain part of a node tree and changes it. Normally this is not the case.
Usually you want to change / analyze code in a generic way, where you don't know what the node tree is
going to look like.

For this purpose the parser provides a component for traversing and visiting the node tree. The basic
structure of a program using this `PhpParser\NodeTraverser` looks like this:

```php
use PhpParser\NodeTraverser;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;

$parser        = (new ParserFactory())->createForHostVersion();
$traverser     = new NodeTraverser;
$prettyPrinter = new PrettyPrinter\Standard;

// add your visitor
$traverser->addVisitor(new MyNodeVisitor);

try {
    $code = file_get_contents($fileName);

    // parse
    $stmts = $parser->parse($code);

    // traverse
    $stmts = $traverser->traverse($stmts);

    // pretty print
    $code = $prettyPrinter->prettyPrintFile($stmts);

    echo $code;
} catch (PhpParser\Error $e) {
    echo 'Parse Error: ', $e->getMessage();
}
```

The corresponding node visitor might look like this:

```php
use PhpParser\Node;
use PhpParser\NodeVisitorAbstract;

class MyNodeVisitor extends NodeVisitorAbstract {
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Scalar\String_) {
            $node->value = 'foo';
        }
    }
}
```

The above node visitor would change all string literals in the program to `'foo'`.

All visitors must implement the `PhpParser\NodeVisitor` interface, which defines the following four
methods:

```php
public function beforeTraverse(array $nodes);
public function enterNode(\PhpParser\Node $node);
public function leaveNode(\PhpParser\Node $node);
public function afterTraverse(array $nodes);
```

The `beforeTraverse()` method is called once before the traversal begins and is passed the nodes the
traverser was called with. This method can be used for resetting values before traversal or
preparing the tree for traversal.

The `afterTraverse()` method is similar to the `beforeTraverse()` method, with the only difference that
it is called once after the traversal.

The `enterNode()` and `leaveNode()` methods are called on every node, the former when it is entered,
i.e. before its subnodes are traversed, the latter when it is left.
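
To make the calling order concrete, here is a small sketch of a visitor that records every
`enterNode()` and `leaveNode()` call. The class name `OrderLogger` and its `$log` property are
invented for this illustration.

```php
<?php
require 'vendor/autoload.php'; // adjust to your project layout

use PhpParser\Node;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;

// Hypothetical visitor that only records the order of the callbacks.
class OrderLogger extends NodeVisitorAbstract {
    /** @var string[] */
    public array $log = [];

    public function enterNode(Node $node) {
        $this->log[] = 'enter ' . $node->getType();
        return null; // leave the node unchanged
    }

    public function leaveNode(Node $node) {
        $this->log[] = 'leave ' . $node->getType();
        return null; // leave the node unchanged
    }
}

$stmts = (new ParserFactory())->createForHostVersion()->parse('<?php echo $a;');

$logger = new OrderLogger();
$traverser = new NodeTraverser();
$traverser->addVisitor($logger);
$traverser->traverse($stmts);

echo implode("\n", $logger->log), "\n";
// enter Stmt_Echo
// enter Expr_Variable
// leave Expr_Variable
// leave Stmt_Echo
```

The echo statement is left only after its child variable node has been both entered and left, which
is why `leaveNode()` is a convenient place for changes that depend on already processed subnodes.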

All four methods can either return the changed node (for `beforeTraverse()` and `afterTraverse()`,
the changed array of nodes) or not return at all (i.e. `null`), in which case the current node is
not changed.

The `enterNode()` method can additionally return the value `NodeVisitor::DONT_TRAVERSE_CHILDREN`,
which instructs the traverser to skip all children of the current node. To furthermore prevent subsequent
visitors from visiting the current node, `NodeVisitor::DONT_TRAVERSE_CURRENT_AND_CHILDREN` can be used instead.
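
As a sketch of how `NodeVisitor::DONT_TRAVERSE_CHILDREN` might be used, the visitor below collects
the names of function calls in top-level code and skips the bodies of functions, methods and
closures. The class name `TopLevelCallCollector` and the sample code are made up for this example.

```php
<?php
require 'vendor/autoload.php'; // adjust to your project layout

use PhpParser\Node;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitor;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;

// Hypothetical visitor: records calls made outside of any function-like node.
class TopLevelCallCollector extends NodeVisitorAbstract {
    /** @var string[] */
    public array $calls = [];

    public function enterNode(Node $node) {
        if ($node instanceof Node\FunctionLike) {
            // Do not descend into function, method or closure bodies.
            return NodeVisitor::DONT_TRAVERSE_CHILDREN;
        }
        if ($node instanceof Node\Expr\FuncCall && $node->name instanceof Node\Name) {
            $this->calls[] = $node->name->toString();
        }
        return null;
    }
}

$code = <<<'CODE'
<?php
function helper() { inner(); }
outer();
CODE;

$stmts = (new ParserFactory())->createForHostVersion()->parse($code);

$collector = new TopLevelCallCollector();
$traverser = new NodeTraverser();
$traverser->addVisitor($collector);
$traverser->traverse($stmts);

print_r($collector->calls); // contains only "outer"
```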

Both `enterNode()` and `leaveNode()` can additionally return the following values:

 * `NodeVisitor::STOP_TRAVERSAL`, in which case no further nodes will be visited.
 * `NodeVisitor::REMOVE_NODE`, in which case the current node will be removed from the parent array.
 * `NodeVisitor::REPLACE_WITH_NULL`, in which case the current node will be replaced with `null`.
 * An array of nodes, which will be merged into the parent array at the offset of the current node.
   I.e. if in `array(A, B, C)` the node `B` should be replaced with `array(X, Y, Z)` the result will
   be `array(A, X, Y, Z, C)`.
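
The array return value can be illustrated with a sketch that splits an `echo` with several arguments
into one `echo` statement per argument. The visitor name `EchoSplitter` and the sample input are
invented for this example.

```php
<?php
require 'vendor/autoload.php'; // adjust to your project layout

use PhpParser\Node;
use PhpParser\Node\Stmt;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;

// Hypothetical visitor: replaces `echo a, b;` with `echo a; echo b;`.
class EchoSplitter extends NodeVisitorAbstract {
    public function leaveNode(Node $node) {
        if ($node instanceof Stmt\Echo_ && count($node->exprs) > 1) {
            $replacement = [];
            foreach ($node->exprs as $expr) {
                $replacement[] = new Stmt\Echo_([$expr]);
            }
            // The returned array is merged into the parent statement list.
            return $replacement;
        }
        return null;
    }
}

$stmts = (new ParserFactory())->createForHostVersion()->parse("<?php echo 'a', 'b'; foo();");

$traverser = new NodeTraverser();
$traverser->addVisitor(new EchoSplitter());
$stmts = $traverser->traverse($stmts);

$output = (new PrettyPrinter\Standard())->prettyPrint($stmts);
echo $output, "\n";
```

After the traversal `$stmts` holds three statements instead of two: the two new `echo` statements
take the place of the original one, followed by the untouched call to `foo()`.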

Instead of manually implementing the `NodeVisitor` interface you can also extend the `NodeVisitorAbstract`
class, which will define empty default implementations for all the above methods.

> Read more: [Walking the AST](component/Walking_the_AST.markdown)

The NameResolver node visitor
-----------------------------

One visitor that is already bundled with the package is `PhpParser\NodeVisitor\NameResolver`. This visitor
helps you work with namespaced code by trying to resolve most names to fully qualified ones.

For example, consider the following code:

    use A as B;
    new B\C();

In order to know that `B\C` really is `A\C` you would need to track aliases and namespaces yourself.
The `NameResolver` takes care of that and resolves names as far as possible.

After running it, most names will be fully qualified. The only names that will stay unqualified are
unqualified function and constant names. These are resolved at runtime and thus the visitor can't
know which function they are referring to. In most cases this is a non-issue as the global functions
are meant.

Additionally, the `NameResolver` adds a `namespacedName` subnode to class, function and constant
declarations that contains the namespaced name instead of only the short name that is available via
`name`.
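
A small sketch of the resolver in action, using the alias example from above inside a namespace.
The namespace `N` and the class `Foo` are made up for illustration.

```php
<?php
require 'vendor/autoload.php'; // adjust to your project layout

use PhpParser\NodeTraverser;
use PhpParser\NodeVisitor\NameResolver;
use PhpParser\ParserFactory;

$code = <<<'CODE'
<?php
namespace N;
use A as B;
new B\C();
class Foo {}
CODE;

$stmts = (new ParserFactory())->createForHostVersion()->parse($code);

$traverser = new NodeTraverser();
$traverser->addVisitor(new NameResolver());
$stmts = $traverser->traverse($stmts);

$namespace = $stmts[0];                  // Stmt\Namespace_
$new       = $namespace->stmts[1]->expr; // Expr\New_ (stmts[0] is the use statement)
$class     = $namespace->stmts[2];       // Stmt\Class_

echo $new->class->toString(), "\n";            // A\C
echo $class->name->toString(), "\n";           // Foo
echo $class->namespacedName->toString(), "\n"; // N\Foo
```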

> Read more: [Name resolution documentation](component/Name_resolution.markdown)

Example: Converting namespaced code to pseudo namespaces
--------------------------------------------------------

A small example to understand the concept: We want to convert namespaced code to pseudo namespaces,
so it works on PHP 5.2, i.e. names like `A\B` should be converted to `A_B`. Note that such conversions
are fairly complicated if you take PHP's dynamic features into account, so our conversion will
assume that no dynamic features are used.

We start off with the following base code:

```php
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitor\NameResolver;

$inDir  = '/some/path';
$outDir = '/some/other/path';

$parser        = (new ParserFactory())->createForNewestSupportedVersion();
$traverser     = new NodeTraverser;
$prettyPrinter = new PrettyPrinter\Standard;

$traverser->addVisitor(new NameResolver); // we will need resolved names
$traverser->addVisitor(new NamespaceConverter); // our own node visitor

// iterate over all .php files in the directory
$files = new \RecursiveIteratorIterator(new \RecursiveDirectoryIterator($inDir));
$files = new \RegexIterator($files, '/\.php$/');

foreach ($files as $file) {
    try {
        // read the file that should be converted
        $code = file_get_contents($file->getPathname());

        // parse
        $stmts = $parser->parse($code);

        // traverse
        $stmts = $traverser->traverse($stmts);

        // pretty print
        $code = $prettyPrinter->prettyPrintFile($stmts);

        // write the converted file to the target directory
        file_put_contents(
            substr_replace($file->getPathname(), $outDir, 0, strlen($inDir)),
            $code
        );
    } catch (PhpParser\Error $e) {
        echo 'Parse Error: ', $e->getMessage();
    }
}
```

Now let's start with the main code, the `NamespaceConverter`. One thing it needs to do
is convert `A\B` style names to `A_B` style ones.

```php
use PhpParser\Node;

class NamespaceConverter extends \PhpParser\NodeVisitorAbstract
{
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Name) {
            return new Node\Name(str_replace('\\', '_', $node->toString()));
        }
    }
}
```

The above code profits from the fact that the `NameResolver` already resolved all names as far as
possible, so we don't need to do that. We only need to create a string with the name parts separated
by underscores instead of backslashes. This is what `str_replace('\\', '_', $node->toString())`
does. (If you want to create a name with backslashes either write `$node->toString()` or
`(string) $node`.) Then we create a new name from the string and return it. Returning a new node
replaces the old node.
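
The string conversions mentioned above can be tried in isolation. This sketch builds a `Name` node
directly instead of parsing code; the name `A\B\C` is an arbitrary example.

```php
<?php
require 'vendor/autoload.php'; // adjust to your project layout

use PhpParser\Node\Name;

$name = new Name('A\B\C');

$withBackslashes = $name->toString();                            // A\B\C
$viaCast         = (string) $name;                               // A\B\C
$withUnderscores = str_replace('\\', '_', $name->toString());    // A_B_C

echo $withBackslashes, "\n", $viaCast, "\n", $withUnderscores, "\n";

// Wrapping the converted string in a new Name node, as the visitor does.
$converted = new Name($withUnderscores);
echo $converted->toString(), "\n"; // A_B_C
```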

Another thing we need to do is change the class/function/const declarations. Currently they contain
only the short name (i.e. the last part of the name), but they need to contain the complete name including
the namespace prefix:

```php
use PhpParser\Node;
use PhpParser\Node\Stmt;

class NamespaceConverter extends \PhpParser\NodeVisitorAbstract
{
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Name) {
            return new Node\Name(str_replace('\\', '_', $node->toString()));
        } elseif ($node instanceof Stmt\Class_
                  || $node instanceof Stmt\Interface_
                  || $node instanceof Stmt\Function_) {
            $node->name = str_replace('\\', '_', $node->namespacedName->toString());
        } elseif ($node instanceof Stmt\Const_) {
            foreach ($node->consts as $const) {
                $const->name = str_replace('\\', '_', $const->namespacedName->toString());
            }
        }
    }
}
```

There is not much more to it than converting the namespaced name to string with `_` as separator.

The last thing we need to do is remove the `namespace` and `use` statements:

```php
use PhpParser\Node;
use PhpParser\Node\Stmt;
use PhpParser\NodeVisitor;

class NamespaceConverter extends \PhpParser\NodeVisitorAbstract
{
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Name) {
            return new Node\Name(str_replace('\\', '_', $node->toString()));
        } elseif ($node instanceof Stmt\Class_
                  || $node instanceof Stmt\Interface_
                  || $node instanceof Stmt\Function_) {
            $node->name = str_replace('\\', '_', $node->namespacedName->toString());
        } elseif ($node instanceof Stmt\Const_) {
            foreach ($node->consts as $const) {
                $const->name = str_replace('\\', '_', $const->namespacedName->toString());
            }
        } elseif ($node instanceof Stmt\Namespace_) {
            // returning an array merges it into the parent array
            return $node->stmts;
        } elseif ($node instanceof Stmt\Use_) {
            // remove use nodes altogether
            return NodeVisitor::REMOVE_NODE;
        }
    }
}
```

That's all.
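
To see the whole pipeline on a single string rather than a directory, consider the following
sketch. It uses a condensed copy of the visitor under the made-up name `MiniNamespaceConverter`
that handles classes only, so that the block runs on its own. As a deviation from the code above,
it wraps the new class name in a `Node\Identifier` rather than assigning a plain string. The sample
input is invented.

```php
<?php
require 'vendor/autoload.php'; // adjust to your project layout

use PhpParser\Node;
use PhpParser\Node\Stmt;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitor;
use PhpParser\NodeVisitor\NameResolver;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;

// Condensed, classes-only variant of the converter (hypothetical name).
class MiniNamespaceConverter extends NodeVisitorAbstract {
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Name) {
            return new Node\Name(str_replace('\\', '_', $node->toString()));
        } elseif ($node instanceof Stmt\Class_) {
            $node->name = new Node\Identifier(
                str_replace('\\', '_', $node->namespacedName->toString())
            );
        } elseif ($node instanceof Stmt\Namespace_) {
            return $node->stmts; // merge the namespace body into the parent array
        } elseif ($node instanceof Stmt\Use_) {
            return NodeVisitor::REMOVE_NODE;
        }
        return null;
    }
}

$code = <<<'CODE'
<?php
namespace A;
use B\C;
class Foo extends C {}
CODE;

$traverser = new NodeTraverser();
$traverser->addVisitor(new NameResolver());           // resolve names first
$traverser->addVisitor(new MiniNamespaceConverter()); // then convert them

$stmts = (new ParserFactory())->createForNewestSupportedVersion()->parse($code);
$stmts = $traverser->traverse($stmts);

$output = (new PrettyPrinter\Standard())->prettyPrint($stmts);
echo $output, "\n"; // a declaration of class A_Foo extending B_C
```

The order of the two visitors matters: the converter relies on the `namespacedName` and the resolved
names that the `NameResolver` provides.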