Usage of basic components
=========================

This document explains how to use the parser, the pretty printer and the node traverser.

Bootstrapping
-------------

To bootstrap the library, include the autoloader generated by composer:

```php
require 'path/to/vendor/autoload.php';
```

Additionally, you may want to set the `xdebug.max_nesting_level` ini option to a higher value:

```php
ini_set('xdebug.max_nesting_level', 3000);
```

This ensures that there will be no errors when traversing highly nested node trees. However, it is
preferable to disable Xdebug completely, as it can easily make this library more than five times
slower.

Parsing
-------

In order to parse code, you first have to create a parser instance:

```php
use PhpParser\ParserFactory;
use PhpParser\PhpVersion;

// Parser for the version you are running on.
$parser = (new ParserFactory())->createForHostVersion();

// Parser for the newest PHP version supported by the PHP-Parser library.
$parser = (new ParserFactory())->createForNewestSupportedVersion();

// Parser for a specific PHP version.
$parser = (new ParserFactory())->createForVersion(PhpVersion::fromString('8.1'));
```

Which version you should target depends on your use case. In many cases you will want to use the
host version, as people typically analyze code for the version they are running on. However, when
analyzing arbitrary code you are usually best off using the newest supported version, which tends
to accept the widest range of code (unless there are breaking changes in PHP).

The `createXYZ()` methods optionally accept an array of lexer options. Some use cases that require
customized lexer options are discussed in the [lexer documentation](component/Lexer.markdown).

Subsequently, you can pass PHP code (including the opening `<?php` tag) to the `parse()` method in
order to create a syntax tree.
If a syntax error is encountered, a `PhpParser\Error` exception will
be thrown by default:

```php
<?php
use PhpParser\Error;
use PhpParser\ParserFactory;

$code = <<<'CODE'
<?php
function printLine($msg) {
    echo $msg, "\n";
}
printLine('Hello World!!!');
CODE;

$parser = (new ParserFactory())->createForHostVersion();

try {
    $stmts = $parser->parse($code);
    // $stmts is an array of statement nodes
} catch (Error $e) {
    echo 'Parse Error: ', $e->getMessage(), "\n";
}
```

A parser instance can be reused to parse multiple files.

Node dumping
------------

To dump the abstract syntax tree in human-readable form, a `NodeDumper` can be used:

```php
<?php
use PhpParser\NodeDumper;

$nodeDumper = new NodeDumper;
echo $nodeDumper->dump($stmts), "\n";
```

For the sample code from the previous section, this will produce the following output:

```
array(
    0: Stmt_Function(
        attrGroups: array(
        )
        byRef: false
        name: Identifier(
            name: printLine
        )
        params: array(
            0: Param(
                attrGroups: array(
                )
                flags: 0
                type: null
                byRef: false
                variadic: false
                var: Expr_Variable(
                    name: msg
                )
                default: null
            )
        )
        returnType: null
        stmts: array(
            0: Stmt_Echo(
                exprs: array(
                    0: Expr_Variable(
                        name: msg
                    )
                    1: Scalar_String(
                        value:

                    )
                )
            )
        )
    )
    1: Stmt_Expression(
        expr: Expr_FuncCall(
            name: Name(
                name: printLine
            )
            args: array(
                0: Arg(
                    name: null
                    value: Scalar_String(
                        value: Hello World!!!
                    )
                    byRef: false
                    unpack: false
                )
            )
        )
    )
)
```

You can also use the `php-parse` script to obtain such a node dump by calling it either with a file
name or code string:

```sh
vendor/bin/php-parse file.php
vendor/bin/php-parse "<?php foo();"
```

This can be very helpful if you want to quickly check how certain syntax is represented in the AST.

Node tree structure
-------------------

Looking at the node dump above, you can see that `$stmts` for this example code is an array of two
nodes, a `Stmt_Function` and a `Stmt_Expression`. The corresponding class names are:

 * `Stmt_Function -> PhpParser\Node\Stmt\Function_`
 * `Stmt_Expression -> PhpParser\Node\Stmt\Expression`

The additional `_` at the end of the first class name is necessary, because `Function` is a
reserved keyword. Many node class names in this library have a trailing `_` to avoid clashing with
a keyword.

As PHP is a large language there are approximately 140 different nodes. In order to make working
with them easier they are grouped into three categories:

 * `PhpParser\Node\Stmt`s are statement nodes, i.e. language constructs that do not return
   a value and cannot occur in an expression. For example a class definition is a statement.
   It doesn't return a value, and you can't write something like `func(class A {});`.
 * `PhpParser\Node\Expr`s are expression nodes, i.e. language constructs that return a value
   and thus can occur in other expressions. Examples of expressions are `$var`
   (`PhpParser\Node\Expr\Variable`) and `func()` (`PhpParser\Node\Expr\FuncCall`).
 * `PhpParser\Node\Scalar`s are nodes representing scalar values, like `'string'`
   (`PhpParser\Node\Scalar\String_`), `0` (`PhpParser\Node\Scalar\LNumber`) or magic constants
   like `__FILE__` (`PhpParser\Node\Scalar\MagicConst\File`).
   All `PhpParser\Node\Scalar`s extend
   `PhpParser\Node\Expr`, as scalars are expressions, too.
 * There are some nodes that belong to none of these groups, for example names (`PhpParser\Node\Name`)
   and call arguments (`PhpParser\Node\Arg`).

The `Node\Stmt\Expression` node is somewhat confusing in that it contains both the terms "statement"
and "expression". This node distinguishes `expr`, which is a `Node\Expr`, from `expr;`, which is
an "expression statement" represented by `Node\Stmt\Expression` and containing `expr` as a sub-node.

Every node has a (possibly zero) number of subnodes. You can access subnodes by writing
`$node->subNodeName`. The `Stmt\Echo_` node has only one subnode `exprs`. In the above example the
echo statement is the first statement of the function body, so you would access it by writing
`$stmts[0]->stmts[0]->exprs`. If you wanted to access the name of the function
call, you would write `$stmts[1]->expr->name`.

All nodes also define a `getType()` method that returns the node type. The type is the class name
without the `PhpParser\Node\` prefix and `\` replaced with `_`. It also does not contain a trailing
`_` for reserved-keyword class names.

It is possible to associate custom metadata with a node using the `setAttribute()` method. This data
can then be retrieved using `hasAttribute()`, `getAttribute()` and `getAttributes()`.

By default, the parser adds the `startLine`, `endLine`, `startTokenPos`, `endTokenPos`,
`startFilePos`, `endFilePos` and `comments` attributes. `comments` is an array of
`PhpParser\Comment[\Doc]` instances.

The pre-defined attributes can also be accessed using `getStartLine()` instead of
`getAttribute('startLine')`, and so on. The last doc comment from the `comments` attribute can be
obtained using `getDocComment()`.

Pretty printer
--------------

The pretty printer component compiles the AST back to PHP code according to a specified scheme.
Currently, there is only one scheme available, namely `PhpParser\PrettyPrinter\Standard`.

```php
use PhpParser\Error;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;

$code = "<?php echo 'Hi ', hi\\getTarget();";

$parser = (new ParserFactory())->createForHostVersion();
$prettyPrinter = new PrettyPrinter\Standard();

try {
    // parse
    $stmts = $parser->parse($code);

    // change
    $stmts[0]         // the echo statement
          ->exprs     // sub expressions
          [0]         // the first of them (the string node)
          ->value     // its value, i.e. 'Hi '
          = 'Hello '; // change to 'Hello '

    // pretty print
    $code = $prettyPrinter->prettyPrint($stmts);

    echo $code;
} catch (Error $e) {
    echo 'Parse Error: ', $e->getMessage(), "\n";
}
```

The above code will output:

    echo 'Hello ', hi\getTarget();

As you can see, the source code was first parsed using `PhpParser\Parser->parse()`, then changed, and
finally converted back to code using `PhpParser\PrettyPrinter\Standard->prettyPrint()`.

The `prettyPrint()` method pretty prints a statements array. It is also possible to pretty print only a
single expression using `prettyPrintExpr()`.

The `prettyPrintFile()` method can be used to print an entire file. This will include the opening `<?php` tag
and handle inline HTML as the first/last statement more gracefully.

There is also a pretty-printing mode which retains formatting for parts of the AST that have not
been changed, which requires additional setup.

> Read more: [Pretty printing documentation](component/Pretty_printing.markdown)

Node traversal
--------------

The above pretty printing example used the fact that the source code was known and thus it was easy to
write code that accesses a certain part of a node tree and changes it. Normally this is not the case.
Usually you want to change or analyze code in a generic way, where you don't know what the node tree is
going to look like.

For this purpose the parser provides a component for traversing and visiting the node tree. The basic
structure of a program using this `PhpParser\NodeTraverser` looks like this:

```php
use PhpParser\NodeTraverser;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;

$parser        = (new ParserFactory())->createForHostVersion();
$traverser     = new NodeTraverser;
$prettyPrinter = new PrettyPrinter\Standard;

// add your visitor
$traverser->addVisitor(new MyNodeVisitor);

try {
    $code = file_get_contents($fileName);

    // parse
    $stmts = $parser->parse($code);

    // traverse
    $stmts = $traverser->traverse($stmts);

    // pretty print
    $code = $prettyPrinter->prettyPrintFile($stmts);

    echo $code;
} catch (PhpParser\Error $e) {
    echo 'Parse Error: ', $e->getMessage();
}
```

The corresponding node visitor might look like this:

```php
use PhpParser\Node;
use PhpParser\NodeVisitorAbstract;

class MyNodeVisitor extends NodeVisitorAbstract {
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Scalar\String_) {
            $node->value = 'foo';
        }
    }
}
```

The above node visitor would change all string literals in the program to `'foo'`.

All visitors must implement the `PhpParser\NodeVisitor` interface, which defines the following four
methods:

```php
public function beforeTraverse(array $nodes);
public function enterNode(\PhpParser\Node $node);
public function leaveNode(\PhpParser\Node $node);
public function afterTraverse(array $nodes);
```

The `beforeTraverse()` method is called once before the traversal begins and is passed the nodes the
traverser was called with.
This method can be used for resetting values before traversal or
preparing the tree for traversal.

The `afterTraverse()` method is similar to the `beforeTraverse()` method, with the only difference that
it is called once after the traversal.

The `enterNode()` and `leaveNode()` methods are called on every node, the former when it is entered,
i.e. before its subnodes are traversed, the latter when it is left.

All four methods can either return the changed node or not return at all (i.e. `null`), in which
case the current node is not changed.

The `enterNode()` method can additionally return the value `NodeVisitor::DONT_TRAVERSE_CHILDREN`,
which instructs the traverser to skip all children of the current node. To furthermore prevent subsequent
visitors from visiting the current node, `NodeVisitor::DONT_TRAVERSE_CURRENT_AND_CHILDREN` can be used instead.

Both `enterNode()` and `leaveNode()` can additionally return the following values:

 * `NodeVisitor::STOP_TRAVERSAL`, in which case no further nodes will be visited.
 * `NodeVisitor::REMOVE_NODE`, in which case the current node will be removed from the parent array.
 * `NodeVisitor::REPLACE_WITH_NULL`, in which case the current node will be replaced with `null`.
 * An array of nodes, which will be merged into the parent array at the offset of the current node.
   I.e. if in `array(A, B, C)` the node `B` should be replaced with `array(X, Y, Z)` the result will
   be `array(A, X, Y, Z, C)`.

Instead of manually implementing the `NodeVisitor` interface you can also extend the `NodeVisitorAbstract`
class, which will define empty default implementations for all the above methods.

> Read more: [Walking the AST](component/Walking_the_AST.markdown)

The NameResolver node visitor
-----------------------------

One visitor that is already bundled with the package is `PhpParser\NodeVisitor\NameResolver`.
This visitor
helps you work with namespaced code by trying to resolve most names to fully qualified ones.

For example, consider the following code:

    use A as B;
    new B\C();

In order to know that `B\C` really is `A\C` you would need to track aliases and namespaces yourself.
The `NameResolver` takes care of that and resolves names as far as possible.

After running it, most names will be fully qualified. The only names that will stay unqualified are
unqualified function and constant names. These are resolved at runtime and thus the visitor can't
know which function or constant they are referring to. In most cases this is a non-issue, as the
global functions are meant.

Additionally, the `NameResolver` adds a `namespacedName` subnode to class, function and constant
declarations that contains the namespaced name instead of only the shortname that is available via
`name`.

> Read more: [Name resolution documentation](component/Name_resolution.markdown)

Example: Converting namespaced code to pseudo namespaces
--------------------------------------------------------

A small example to understand the concept: We want to convert namespaced code to pseudo namespaces,
so it works on PHP 5.2, i.e. names like `A\B` should be converted to `A_B`. Note that such conversions
are fairly complicated if you take PHP's dynamic features into account, so our conversion will
assume that no dynamic features are used.
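
Before involving the parser at all, the name mapping itself can be sketched in plain PHP. The helper below is hypothetical (`toPseudoNamespace()` is not part of PHP-Parser) and only illustrates the intended string transformation, including the assumption that a leading backslash of a fully qualified name is dropped:

```php
<?php
// Hypothetical helper, not part of PHP-Parser: maps a namespaced name
// to its pseudo-namespace form.
function toPseudoNamespace(string $name): string
{
    // Drop a leading backslash (fully qualified form), then swap separators.
    return str_replace('\\', '_', ltrim($name, '\\'));
}

echo toPseudoNamespace('A\\B'), "\n";            // A_B
echo toPseudoNamespace('\\Foo\\Bar\\Baz'), "\n"; // Foo_Bar_Baz
```

The visitor developed in the rest of this section applies the same replacement, but to the names stored in the AST rather than to raw strings.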

We start off with the following base code:

```php
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitor\NameResolver;

$inDir  = '/some/path';
$outDir = '/some/other/path';

$parser        = (new ParserFactory())->createForNewestSupportedVersion();
$traverser     = new NodeTraverser;
$prettyPrinter = new PrettyPrinter\Standard;

$traverser->addVisitor(new NameResolver); // we will need resolved names
$traverser->addVisitor(new NamespaceConverter); // our own node visitor

// iterate over all .php files in the directory
$files = new \RecursiveIteratorIterator(new \RecursiveDirectoryIterator($inDir));
$files = new \RegexIterator($files, '/\.php$/');

foreach ($files as $file) {
    try {
        // read the file that should be converted
        $code = file_get_contents($file->getPathname());

        // parse
        $stmts = $parser->parse($code);

        // traverse
        $stmts = $traverser->traverse($stmts);

        // pretty print
        $code = $prettyPrinter->prettyPrintFile($stmts);

        // write the converted file to the target directory
        file_put_contents(
            substr_replace($file->getPathname(), $outDir, 0, strlen($inDir)),
            $code
        );
    } catch (PhpParser\Error $e) {
        echo 'Parse Error: ', $e->getMessage();
    }
}
```

Now let's start with the main code, the `NamespaceConverter`. One thing it needs to do
is convert `A\B` style names to `A_B` style ones.

```php
use PhpParser\Node;

class NamespaceConverter extends \PhpParser\NodeVisitorAbstract
{
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Name) {
            return new Node\Name(str_replace('\\', '_', $node->toString()));
        }
    }
}
```

The above code profits from the fact that the `NameResolver` already resolved all names as far as
possible, so we don't need to do that.
We only need to create a string with the name parts separated
by underscores instead of backslashes. This is what `str_replace('\\', '_', $node->toString())` does. (If you want to
create a name with backslashes either write `$node->toString()` or `(string) $node`.) Then we create
a new name from the string and return it. Returning a new node replaces the old node.

Another thing we need to do is change the class/function/const declarations. Currently they contain
only the shortname (i.e. the last part of the name), but they need to contain the complete name including
the namespace prefix:

```php
use PhpParser\Node;
use PhpParser\Node\Stmt;

class NamespaceConverter extends \PhpParser\NodeVisitorAbstract
{
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Name) {
            return new Node\Name(str_replace('\\', '_', $node->toString()));
        } elseif ($node instanceof Stmt\Class_
                  || $node instanceof Stmt\Interface_
                  || $node instanceof Stmt\Function_) {
            $node->name = str_replace('\\', '_', $node->namespacedName->toString());
        } elseif ($node instanceof Stmt\Const_) {
            foreach ($node->consts as $const) {
                $const->name = str_replace('\\', '_', $const->namespacedName->toString());
            }
        }
    }
}
```

There is not much more to it than converting the namespaced name to a string with `_` as separator.
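
To see where `namespacedName` comes from, the following sketch runs only the `NameResolver` over a one-line input and flattens the resulting name. The input snippet and the variable names `$class`, `$qualified` and `$flat` are illustrative assumptions, and the Composer autoloader is assumed to be loaded already:

```php
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitor\NameResolver;
use PhpParser\ParserFactory;

// Illustrative input: a class B declared inside namespace A.
$parser = (new ParserFactory())->createForNewestSupportedVersion();
$stmts = $parser->parse('<?php namespace A; class B {}');

$traverser = new NodeTraverser;
$traverser->addVisitor(new NameResolver);
$stmts = $traverser->traverse($stmts);

// $stmts[0] is the namespace node; its first statement is the class.
$class = $stmts[0]->stmts[0];

// Namespaced form of the class name, as added by the NameResolver.
$qualified = $class->namespacedName->toString();
// The same name with `_` instead of `\` as separator.
$flat = str_replace('\\', '_', $qualified);

echo $qualified, ' -> ', $flat, "\n";
```

The visitor above performs the same two steps for every class, interface, function and constant declaration it leaves.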

The last thing we need to do is remove the `namespace` and `use` statements:

```php
use PhpParser\Node;
use PhpParser\Node\Stmt;
use PhpParser\NodeVisitor;

class NamespaceConverter extends \PhpParser\NodeVisitorAbstract
{
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Name) {
            return new Node\Name(str_replace('\\', '_', $node->toString()));
        } elseif ($node instanceof Stmt\Class_
                  || $node instanceof Stmt\Interface_
                  || $node instanceof Stmt\Function_) {
            $node->name = str_replace('\\', '_', $node->namespacedName->toString());
        } elseif ($node instanceof Stmt\Const_) {
            foreach ($node->consts as $const) {
                $const->name = str_replace('\\', '_', $const->namespacedName->toString());
            }
        } elseif ($node instanceof Stmt\Namespace_) {
            // returning an array merges it into the parent array
            return $node->stmts;
        } elseif ($node instanceof Stmt\Use_) {
            // remove use nodes altogether
            return NodeVisitor::REMOVE_NODE;
        }
    }
}
```

That's all.