Upgrading from PHP-Parser 4.x to 5.0
====================================

### PHP version requirements

PHP-Parser now requires PHP 7.4 or newer to run. It is, however, still possible to *parse* code for older versions while running on a newer version.

### PHP 5 parsing support

The dedicated parser for PHP 5 has been removed. The PHP 7 parser now accepts a `PhpVersion` argument, which can be used to improve compatibility with older PHP versions.

In particular, if an older `PhpVersion` is specified, then:

 * For versions before PHP 7.0, `$foo =& new Bar()` assignments are allowed without error.
 * For versions before PHP 7.0, invalid octal literals `089` are allowed without error.
 * For versions before PHP 7.0, unicode escape sequences `\u{123}` in strings are not parsed.
 * Type hints are interpreted as a class `Name` or as a built-in `Identifier` depending on the PHP
   version. For example, `int` is treated as a class name on PHP 5.6 and as a built-in type on PHP 7.0.

However, some aspects of PHP 5 parsing are no longer supported:

 * Some variables like `$$foo[0]` are valid in both PHP 5 and PHP 7, but have different interpretations. In that case, the PHP 7 AST will always be constructed (`($$foo)[0]` rather than `${$foo[0]}`).
 * Declarations of the form `global $$var[0]` are not supported in PHP 7 and will cause a parse error. In error recovery mode, it is possible to continue parsing after such declarations.
 * The PHP 7 parser will accept many constructs that are not valid in PHP 5. However, this was also true of the dedicated PHP 5 parser.

The following symbols are affected by this removal:

 * The `PhpParser\Parser\Php5` class has been removed.
 * The `PhpParser\Parser\Multiple` class has been removed. While not strictly related to PHP 5 support, this functionality is no longer useful without it.
 * The `PhpParser\ParserFactory::ONLY_PHP5` and `PREFER_PHP5` options have been removed.

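The version-dependent behaviors listed above can be checked directly. The sketch below is a minimal illustration. It assumes PHP-Parser 5.x is installed through Composer, so that `vendor/autoload.php` exists next to the script. It only exercises two behaviors stated in this section: the `int` type hint, and `=& new` assignments.

```php
<?php
// Sketch: assumes PHP-Parser 5.x is installed via Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use PhpParser\Error;
use PhpParser\ParserFactory;
use PhpParser\PhpVersion;

$factory = new ParserFactory();
$php56 = $factory->createForVersion(PhpVersion::fromString('5.6'));
$php70 = $factory->createForVersion(PhpVersion::fromString('7.0'));

// The same type hint is a class name on 5.6 and a built-in type on 7.0.
$code = '<?php function f(int $x) {}';
$type56 = $php56->parse($code)[0]->params[0]->type;
$type70 = $php70->parse($code)[0]->params[0]->type;
echo get_class($type56), "\n"; // PhpParser\Node\Name
echo get_class($type70), "\n"; // PhpParser\Node\Identifier

// `=& new` is accepted for 5.6, while the 7.0 parser reports an error
// (the default error handler throws PhpParser\Error).
$refNew = '<?php $foo =& new Bar();';
$php56->parse($refNew);
$rejectedOn70 = false;
try {
    $php70->parse($refNew);
} catch (Error $e) {
    $rejectedOn70 = true;
}
var_dump($rejectedOn70); // bool(true)
```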
### Changes to the parser factory

The `ParserFactory::create()` method has been removed in favor of three new methods that provide more fine-grained control over the PHP version being targeted:

 * `createForNewestSupportedVersion()`: Use this if you don't know the PHP version of the code you're parsing. It is better to assume a version that is too new than one that is too old.
 * `createForHostVersion()`: Use this if you're parsing code for the PHP version you're running on.
 * `createForVersion()`: Use this if you know the PHP version of the code you want to parse.

The `createForNewestSupportedVersion()` and `createForHostVersion()` methods have been available since PHP-Parser 4.18.0, to allow libraries to support PHP-Parser 4 and 5 at the same time more easily.

In all cases, the PHP version is a fairly weak hint that is only used on a best-effort basis. The parser will usually accept code for newer versions if doing so has no backwards-compatibility implications.

For example, if you specify version `"8.0"`, then `class ReadOnly {}` is treated as a valid class declaration, while using `public readonly int $prop` will lead to a parse error. However, `final public const X = Y;` will be accepted in both cases.

```php
use PhpParser\ParserFactory;
use PhpParser\PhpVersion;

$factory = new ParserFactory();

# Before
$parser = $factory->create(ParserFactory::PREFER_PHP7);

# After (this is roughly equivalent to PREFER_PHP7 behavior)
$parser = $factory->createForNewestSupportedVersion();
# Or
$parser = $factory->createForHostVersion();

# Before
$parser = $factory->create(ParserFactory::ONLY_PHP5);
# After (supported on a best-effort basis)
$parser = $factory->createForVersion(PhpVersion::fromString("5.6"));
```

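To show how weak the version hint is in practice, the sketch below replays the three snippets from the paragraph above against a parser created for PHP 8.0. It assumes PHP-Parser 5.x is loaded through Composer's autoloader. `tryParse()` is a hypothetical helper introduced here and is not part of the library.

```php
<?php
// Sketch: assumes PHP-Parser 5.x is installed via Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use PhpParser\Error;
use PhpParser\Parser;
use PhpParser\ParserFactory;
use PhpParser\PhpVersion;

// Hypothetical helper: true if the code parses, false if a parse error is thrown.
function tryParse(Parser $parser, string $code): bool
{
    try {
        $parser->parse($code);
        return true;
    } catch (Error $e) {
        return false;
    }
}

$parser80 = (new ParserFactory())->createForVersion(PhpVersion::fromString('8.0'));

// `readonly` is not a keyword in PHP 8.0, so it is a valid class name.
$readonlyClass = tryParse($parser80, '<?php class ReadOnly {}');
// Readonly properties are a PHP 8.1 feature and are rejected for 8.0.
$readonlyProp = tryParse($parser80, '<?php class A { public readonly int $prop; }');
// Final class constants have no BC implications, so they are still accepted.
$finalConst = tryParse($parser80, '<?php class A { final public const X = Y; }');

var_dump($readonlyClass, $readonlyProp, $finalConst);
```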
### Changes to the throw representation

Previously, `throw` statements like `throw $e;` were represented using the `Stmt\Throw_` class,
while uses inside other expressions (such as `$x ?? throw $e`) used the `Expr\Throw_` class.

Now, `throw $e;` is represented as a `Stmt\Expression` that contains an `Expr\Throw_`. The
`Stmt\Throw_` class has been removed.

```php
# Code
throw $e;

# Before
Stmt_Throw(
    expr: Expr_Variable(
        name: e
    )
)

# After
Stmt_Expression(
    expr: Expr_Throw(
        expr: Expr_Variable(
            name: e
        )
    )
)
```

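Code that previously matched `Stmt\Throw_` can look for `Expr\Throw_` instead. That class now covers both the statement form and the expression form. The sketch below is minimal. It assumes PHP-Parser 5.x is loaded through Composer's autoloader and uses `NodeFinder` to collect the nodes.

```php
<?php
// Sketch: assumes PHP-Parser 5.x is installed via Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use PhpParser\Node\Expr;
use PhpParser\Node\Stmt;
use PhpParser\NodeFinder;
use PhpParser\ParserFactory;

$parser = (new ParserFactory())->createForNewestSupportedVersion();
$stmts = $parser->parse('<?php throw $e; $x = $y ?? throw $e;');

// A `throw` statement is now an expression statement wrapping Expr\Throw_.
$first = $stmts[0];
$isWrapped = $first instanceof Stmt\Expression && $first->expr instanceof Expr\Throw_;

// A single search finds both the statement form and the `?? throw` form.
$throws = (new NodeFinder())->findInstanceOf($stmts, Expr\Throw_::class);

var_dump($isWrapped);     // bool(true)
var_dump(count($throws)); // int(2)
```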
### Changes to the array destructuring representation

Previously, the `list($x) = $y` destructuring syntax was represented using a `Node\Expr\List_`
node, while `[$x] = $y` used a `Node\Expr\Array_` node, the same node used for the creation (rather than
destructuring) of arrays.

Now, destructuring is always represented using `Node\Expr\List_`. The `kind` attribute with value
`Node\Expr\List_::KIND_LIST` or `Node\Expr\List_::KIND_ARRAY` specifies which syntax was actually
used.

```php
# Code
[$x] = $y;

# Before
Expr_Assign(
   var: Expr_Array(
       items: array(
           0: Expr_ArrayItem(
               key: null
               value: Expr_Variable(
                   name: x
               )
               byRef: false
               unpack: false
           )
       )
   )
   expr: Expr_Variable(
       name: y
   )
)

# After
Expr_Assign(
   var: Expr_List(
       items: array(
           0: ArrayItem(
               key: null
               value: Expr_Variable(
                   name: x
               )
               byRef: false
               unpack: false
           )
       )
   )
   expr: Expr_Variable(
       name: y
   )
)
```

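Reading the `kind` attribute with `getAttribute()` tells the two syntaxes apart. The short sketch below assumes PHP-Parser 5.x is available via Composer. It parses one statement of each form and inspects the resulting `List_` nodes.

```php
<?php
// Sketch: assumes PHP-Parser 5.x is installed via Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use PhpParser\Node\ArrayItem;
use PhpParser\Node\Expr\List_;
use PhpParser\ParserFactory;

$parser = (new ParserFactory())->createForNewestSupportedVersion();
$stmts = $parser->parse('<?php [$a] = $x; list($b) = $y;');

// Each statement is Stmt\Expression(Expr\Assign) with the destructuring on the left.
$short = $stmts[0]->expr->var;
$long = $stmts[1]->expr->var;

// Both are List_ nodes; the `kind` attribute records which syntax was written.
var_dump($short instanceof List_, $long instanceof List_);
var_dump($short->getAttribute('kind') === List_::KIND_ARRAY);
var_dump($long->getAttribute('kind') === List_::KIND_LIST);

// The items are Node\ArrayItem nodes (formerly Expr\ArrayItem).
var_dump($short->items[0] instanceof ArrayItem);
```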
### Changes to the name representation

Previously, `Name` nodes had a `parts` subnode, which stored an array of name parts, split by
namespace separators. Now, `Name` nodes instead have a `name` subnode, which stores a plain string.

For example, the name `Foo\Bar` was previously represented by `Name(parts: ['Foo', 'Bar'])` and is
now represented by `Name(name: 'Foo\Bar')` instead.

It is possible to convert the name to the previous representation using `$name->getParts()`. The
`Name` constructor continues to accept both the string and the array representation.

The `Name::getParts()` method has been available since PHP-Parser 4.16.0, to allow libraries to support
PHP-Parser 4 and 5 at the same time more easily.

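This small sketch shows both constructor forms and the compatibility accessor side by side. It assumes PHP-Parser 5.x is installed via Composer.

```php
<?php
// Sketch: assumes PHP-Parser 5.x is installed via Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use PhpParser\Node\Name;

// The constructor accepts both the string and the array representation.
$fromString = new Name('Foo\Bar');
$fromParts = new Name(['Foo', 'Bar']);

// In both cases the `name` subnode holds a plain string.
var_dump($fromString->name); // string(7) "Foo\Bar"
var_dump($fromParts->name);  // string(7) "Foo\Bar"

// getParts() restores the old array form.
var_dump($fromString->getParts() === ['Foo', 'Bar']); // bool(true)
```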
### Changes to the block representation

Previously, code blocks `{ ... }` were always flattened into their parent statement list. For
example, `while ($x) { $a; { $b; } $c; }` would produce the same node structure as
`while ($x) { $a; $b; $c; }`, namely a `Stmt\While_` node whose `stmts` subnode is an array of three
statements.

Now, the nested `{ $b; }` block is represented using an explicit `Stmt\Block` node. However, the
outer `{ $a; { $b; } $c; }` block is still represented using a simple array in the `stmts` subnode.

```php
# Code
while ($x) { $a; { $b; } $c; }

# Before
Stmt_While(
    cond: Expr_Variable(
        name: x
    )
    stmts: array(
        0: Stmt_Expression(
            expr: Expr_Variable(
                name: a
            )
        )
        1: Stmt_Expression(
            expr: Expr_Variable(
                name: b
            )
        )
        2: Stmt_Expression(
            expr: Expr_Variable(
                name: c
            )
        )
    )
)

# After
Stmt_While(
    cond: Expr_Variable(
        name: x
    )
    stmts: array(
        0: Stmt_Expression(
            expr: Expr_Variable(
                name: a
            )
        )
        1: Stmt_Block(
            stmts: array(
                0: Stmt_Expression(
                    expr: Expr_Variable(
                        name: b
                    )
                )
            )
        )
        2: Stmt_Expression(
            expr: Expr_Variable(
                name: c
            )
        )
    )
)
```

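If downstream code relied on the old flattened shape, one option is a small visitor that splices each `Stmt\Block` back into the surrounding statement list. This is a sketch, not a library feature. `BlockFlattener` is a hypothetical name, and PHP-Parser 5.x is assumed to be loaded via Composer. The approach relies on `leaveNode()` being allowed to return an array of nodes, which then replaces the current node in its parent array.

```php
<?php
// Sketch: assumes PHP-Parser 5.x is installed via Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use PhpParser\Node;
use PhpParser\Node\Stmt;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;

// Hypothetical visitor: replaces every Stmt\Block with the statements it contains,
// recreating the 4.x shape. Children are left before their parent, so nested
// blocks are already flattened by the time an outer block is reached.
final class BlockFlattener extends NodeVisitorAbstract
{
    public function leaveNode(Node $node)
    {
        if ($node instanceof Stmt\Block) {
            return $node->stmts; // merged into the parent statement array
        }
        return null;
    }
}

$parser = (new ParserFactory())->createForNewestSupportedVersion();
$stmts = $parser->parse('<?php while ($x) { $a; { $b; } $c; }');

$hadBlock = $stmts[0]->stmts[1] instanceof Stmt\Block;
$stmts = (new NodeTraverser(new BlockFlattener()))->traverse($stmts);
$classes = array_map(fn (Node $n) => get_class($n), $stmts[0]->stmts);

var_dump($hadBlock); // bool(true)
print_r($classes);   // three PhpParser\Node\Stmt\Expression entries
```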
### Changes to comment assignment

Previously, comments were assigned to all nodes starting at the same position. Now they will be
assigned to the outermost node only.

```php
# Code
// Comment
$a + $b;

# Before
Stmt_Expression(
    expr: Expr_BinaryOp_Plus(
        left: Expr_Variable(
            name: a
            comments: array(
                0: // Comment
            )
        )
        right: Expr_Variable(
            name: b
        )
        comments: array(
            0: // Comment
        )
    )
    comments: array(
        0: // Comment
    )
)

# After
Stmt_Expression(
    expr: Expr_BinaryOp_Plus(
        left: Expr_Variable(
            name: a
        )
        right: Expr_Variable(
            name: b
        )
    )
    comments: array(
        0: // Comment
    )
)
```

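Visitors that used to read comments from an inner node, such as the left operand above, now have to look at the outermost node instead. The following check assumes PHP-Parser 5.x via Composer. It parses the example and counts the comments attached at each level.

```php
<?php
// Sketch: assumes PHP-Parser 5.x is installed via Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use PhpParser\ParserFactory;

$parser = (new ParserFactory())->createForNewestSupportedVersion();
$stmts = $parser->parse("<?php\n// Comment\n\$a + \$b;\n");

$stmt = $stmts[0];   // Stmt\Expression
$plus = $stmt->expr; // Expr\BinaryOp\Plus
$left = $plus->left; // Expr\Variable for $a

// Only the outermost node starting at that position carries the comment.
var_dump(count($stmt->getComments())); // int(1)
var_dump(count($plus->getComments())); // int(0)
var_dump(count($left->getComments())); // int(0)
var_dump(trim($stmt->getComments()[0]->getText())); // string(10) "// Comment"
```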
### Renamed nodes

A number of AST nodes have been renamed or moved in the AST hierarchy:

 * `Node\Scalar\LNumber` is now `Node\Scalar\Int_`.
 * `Node\Scalar\DNumber` is now `Node\Scalar\Float_`.
 * `Node\Scalar\Encapsed` is now `Node\Scalar\InterpolatedString`.
 * `Node\Scalar\EncapsedStringPart` is now `Node\InterpolatedStringPart` and no longer extends
   `Node\Scalar` or `Node\Expr`.
 * `Node\Expr\ArrayItem` is now `Node\ArrayItem` and no longer extends `Node\Expr`.
 * `Node\Expr\ClosureUse` is now `Node\ClosureUse` and no longer extends `Node\Expr`.
 * `Node\Stmt\DeclareDeclare` is now `Node\DeclareItem` and no longer extends `Node\Stmt`.
 * `Node\Stmt\PropertyProperty` is now `Node\PropertyItem` and no longer extends `Node\Stmt`.
 * `Node\Stmt\StaticVar` is now `Node\StaticVar` and no longer extends `Node\Stmt`.
 * `Node\Stmt\UseUse` is now `Node\UseItem` and no longer extends `Node\Stmt`.

The old class names have been retained as aliases for backwards compatibility. However, the `Node::getType()` method will now always return the new name (e.g. `ClosureUse` instead of `Expr_ClosureUse`).

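Because the old names are class aliases, `instanceof` checks keep working, while string comparisons against `getType()` must be updated. The minimal sketch below assumes PHP-Parser 5.x via Composer. Note that `instanceof` never triggers autoloading, so the sketch loads the alias explicitly with `class_exists()` first.

```php
<?php
// Sketch: assumes PHP-Parser 5.x is installed via Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use PhpParser\Node\ArrayItem;
use PhpParser\Node\Expr;
use PhpParser\Node\Scalar\Int_;
use PhpParser\Node\Scalar\LNumber;
use PhpParser\NodeFinder;
use PhpParser\ParserFactory;

$parser = (new ParserFactory())->createForNewestSupportedVersion();
$stmts = $parser->parse('<?php $x = [42];');
$finder = new NodeFinder();

$int = $finder->findFirstInstanceOf($stmts, Int_::class);
$item = $finder->findFirstInstanceOf($stmts, ArrayItem::class);

// Make sure the old alias is defined; instanceof alone does not autoload.
class_exists(LNumber::class);
var_dump($int instanceof LNumber); // bool(true)

// getType() reports the new names, so string comparisons need updating.
var_dump($int->getType());  // string(10) "Scalar_Int"
var_dump($item->getType()); // string(9) "ArrayItem"

// ArrayItem is no longer an expression node.
var_dump($item instanceof Expr); // bool(false)
```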
### Modifiers

Modifier flags (as used by the `$flags` subnode of `Class_`, `ClassMethod`, `Property`, etc.) are now available as class constants on a separate `PhpParser\Modifiers` class, instead of being part of `PhpParser\Node\Stmt\Class_`, to make it clearer that these are used by many different nodes. The old constants are deprecated, but are still available.

```php
PhpParser\Node\Stmt\Class_::MODIFIER_PUBLIC    -> PhpParser\Modifiers::PUBLIC
PhpParser\Node\Stmt\Class_::MODIFIER_PROTECTED -> PhpParser\Modifiers::PROTECTED
PhpParser\Node\Stmt\Class_::MODIFIER_PRIVATE   -> PhpParser\Modifiers::PRIVATE
PhpParser\Node\Stmt\Class_::MODIFIER_STATIC    -> PhpParser\Modifiers::STATIC
PhpParser\Node\Stmt\Class_::MODIFIER_ABSTRACT  -> PhpParser\Modifiers::ABSTRACT
PhpParser\Node\Stmt\Class_::MODIFIER_FINAL     -> PhpParser\Modifiers::FINAL
PhpParser\Node\Stmt\Class_::MODIFIER_READONLY  -> PhpParser\Modifiers::READONLY
PhpParser\Node\Stmt\Class_::VISIBILITY_MODIFIER_MASK -> PhpParser\Modifiers::VISIBILITY_MASK
```

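Apart from the constant names, checking modifiers works as before: the `$flags` subnode is still a bitmask. The sketch below assumes PHP-Parser 5.x via Composer and parses a single method to inspect its flags.

```php
<?php
// Sketch: assumes PHP-Parser 5.x is installed via Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use PhpParser\Modifiers;
use PhpParser\Node\Stmt\Class_;
use PhpParser\ParserFactory;

$parser = (new ParserFactory())->createForNewestSupportedVersion();
$stmts = $parser->parse('<?php abstract class A { abstract protected static function f(); }');
$method = $stmts[0]->stmts[0]; // Stmt\ClassMethod

// Mask out the visibility bits, then test individual flags.
$visibility = $method->flags & Modifiers::VISIBILITY_MASK;
var_dump($visibility === Modifiers::PROTECTED);         // bool(true)
var_dump(($method->flags & Modifiers::STATIC) !== 0);   // bool(true)
var_dump(($method->flags & Modifiers::ABSTRACT) !== 0); // bool(true)

// The deprecated constants still exist and carry the same values.
var_dump(Class_::MODIFIER_PROTECTED === Modifiers::PROTECTED); // bool(true)
```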
### Changes to node constructors

Node constructor arguments accepting types no longer accept plain strings. Either an `Identifier` or `Name` (or `ComplexType`) should be passed instead. This affects the following constructor arguments:

* The `'returnType'` key of the `$subNodes` argument of `Node\Expr\ArrowFunction`.
* The `'returnType'` key of the `$subNodes` argument of `Node\Expr\Closure`.
* The `'returnType'` key of the `$subNodes` argument of `Node\Stmt\ClassMethod`.
* The `'returnType'` key of the `$subNodes` argument of `Node\Stmt\Function_`.
* The `$type` argument of `Node\NullableType`.
* The `$type` argument of `Node\Param`.
* The `$type` argument of `Node\Stmt\Property`.
* The `$type` argument of `Node\Stmt\ClassConst`.

To follow the previous behavior, an `Identifier` should be passed, which indicates a built-in type.

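The following sketch builds a function signature by hand and passes `Identifier` objects wherever a plain string used to be accepted. It assumes PHP-Parser 5.x via Composer. The standard pretty printer is used only to display the result.

```php
<?php
// Sketch: assumes PHP-Parser 5.x is installed via Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use PhpParser\Node;
use PhpParser\PrettyPrinter\Standard;

// Built-in types are wrapped in Identifier instead of being passed as strings.
$param = new Node\Param(
    new Node\Expr\Variable('x'),
    null,                                             // no default value
    new Node\NullableType(new Node\Identifier('int')) // ?int
);
$function = new Node\Stmt\Function_('f', [
    'params' => [$param],
    'returnType' => new Node\Identifier('void'),
]);

$code = (new Standard())->prettyPrint([$function]);
echo $code, "\n";
```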
### Changes to the pretty printer

A number of changes to the standard pretty printer have been made, to make it match contemporary coding style conventions (and in particular PSR-12). Options to restore the previous behavior are not provided, but it is possible to override the formatting methods (such as `pStmt_ClassMethod`) with your preferred formatting.

Return types are now formatted without a space before the `:`:

```php
# Before
function test() : Type
{
}

# After
function test(): Type
{
}
```

`abstract` and `final` are now printed before visibility modifiers:

```php
# Before
public abstract function test();

# After
abstract public function test();
```

A space is now printed between `use` and the following `(` for closures:

```php
# Before
function () use($var) {
};

# After
function () use ($var) {
};
```

Backslashes in single-quoted strings are now only printed if they are necessary:

```php
# Before
'Foo\\Bar';
'\\\\';

# After
'Foo\Bar';
'\\\\';
```

`else if` structures will now omit redundant braces:

```php
# Before
else {
    if ($x) {
        // ...
    }
}

# After
else if ($x) {
    // ...
}
```

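These rules apply whenever code is printed from scratch, so reprinting old-style input is a quick way to see them in action. The sketch below assumes PHP-Parser 5.x via Composer. It feeds code written in the previous style through the standard pretty printer.

```php
<?php
// Sketch: assumes PHP-Parser 5.x is installed via Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter\Standard;

// Input deliberately written in the old style.
$code = <<<'CODE'
<?php
abstract class A {
    public abstract function test() : int;
}
$f = function () use($var) { return $var; };
CODE;

$parser = (new ParserFactory())->createForNewestSupportedVersion();
$printed = (new Standard())->prettyPrint($parser->parse($code));

// Expect "abstract public", no space before ":", and "use (".
echo $printed, "\n";
```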
The pretty printer now accepts a `phpVersion` option, which takes a `PhpVersion` object and defaults to PHP 7.4. The pretty printer will make formatting choices to make the code valid for that version. It currently controls the following behavior:

* For PHP >= 7.0 (default), short array syntax `[]` will be used by default. This does not affect nodes that specify an explicit array syntax using the `kind` attribute.
* For PHP >= 7.0 (default), parentheses around `yield` expressions will only be printed when necessary. Previously, parentheses were always printed, even if `yield` was used as a statement.
* For PHP >= 7.1 (default), the short array syntax `[]` will be used for destructuring by default (instead of `list()`). This does not affect nodes that specify an explicit syntax using the `kind` attribute.
* For PHP >= 7.3 (default), a newline is no longer forced after heredoc/nowdoc strings, as the requirement for this has been removed with the introduction of flexible heredoc/nowdoc strings.
* For PHP >= 7.3 (default), heredoc/nowdoc strings are now indented just like regular code. This was allowed with the introduction of flexible heredoc/nowdoc strings.

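The option only applies to nodes without an explicit `kind` attribute, so its effect is easiest to observe on a hand-built node. This sketch assumes PHP-Parser 5.x via Composer. PHP 5.3 is chosen as the older target because it predates short array syntax entirely.

```php
<?php
// Sketch: assumes PHP-Parser 5.x is installed via Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use PhpParser\Node\ArrayItem;
use PhpParser\Node\Expr\Array_;
use PhpParser\Node\Scalar\Int_;
use PhpParser\PhpVersion;
use PhpParser\PrettyPrinter\Standard;

// A hand-built array has no `kind` attribute, so the printer chooses the syntax.
$array = new Array_([new ArrayItem(new Int_(1)), new ArrayItem(new Int_(2))]);

$default = new Standard(); // targets PHP 7.4 unless told otherwise
$legacy = new Standard(['phpVersion' => PhpVersion::fromComponents(5, 3)]);

echo $default->prettyPrintExpr($array), "\n"; // [1, 2]
echo $legacy->prettyPrintExpr($array), "\n";  // array(1, 2)
```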
### Changes to precedence handling in the pretty printer

The pretty printer now more accurately models operator precedence. Especially for unary operators, fewer unnecessary parentheses will be printed. Conversely, many bugs where semantically meaningful parentheses were omitted have been fixed.

To support these changes, precedence is now handled differently in the pretty printer. The internal `p()` method, which is used to recursively print nodes, now has the following signature:

```php
protected function p(
    Node $node, int $precedence = self::MAX_PRECEDENCE, int $lhsPrecedence = self::MAX_PRECEDENCE,
    bool $parentFormatPreserved = false
): string;
```

The `$precedence` is the precedence of the direct parent operator (if any), while `$lhsPrecedence` is the precedence of the nearest binary operator on whose left-hand side the node occurs. For unary operators, only the `$lhsPrecedence` is relevant.

Recursive calls in pretty-printer methods should generally continue calling `p()` without additional parameters. However, pretty-printer methods for operators that participate in precedence resolution need to be adjusted. For example, typical implementations for operators now look as follows:

```php
protected function pExpr_BinaryOp_Plus(
    BinaryOp\Plus $node, int $precedence, int $lhsPrecedence
): string {
    return $this->pInfixOp(
        BinaryOp\Plus::class, $node->left, ' + ', $node->right, $precedence, $lhsPrecedence);
}

protected function pExpr_UnaryPlus(
    Expr\UnaryPlus $node, int $precedence, int $lhsPrecedence
): string {
    return $this->pPrefixOp(Expr\UnaryPlus::class, '+', $node->expr, $precedence, $lhsPrecedence);
}
```

The new `$precedence` and `$lhsPrecedence` arguments need to be passed down to the `pInfixOp()`, `pPrefixOp()` and `pPostfixOp()` methods.

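As an example of forwarding the new arguments, the sketch below overrides the `+` printer to drop the surrounding spaces. `CompactPlusPrinter` is a hypothetical subclass, and PHP-Parser 5.x is assumed to be loaded via Composer. Because `$precedence` and `$lhsPrecedence` are passed through to `pInfixOp()` unchanged, parentheses should still appear only where they are required.

```php
<?php
// Sketch: assumes PHP-Parser 5.x is installed via Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use PhpParser\Node\Expr\BinaryOp;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter\Standard;

// Hypothetical printer that omits the spaces around `+`, forwarding the
// precedence arguments so parenthesization still follows operator precedence.
final class CompactPlusPrinter extends Standard
{
    protected function pExpr_BinaryOp_Plus(
        BinaryOp\Plus $node, int $precedence, int $lhsPrecedence
    ): string {
        return $this->pInfixOp(
            BinaryOp\Plus::class, $node->left, '+', $node->right, $precedence, $lhsPrecedence);
    }
}

$parser = (new ParserFactory())->createForNewestSupportedVersion();
$printer = new CompactPlusPrinter();

$out = [];
foreach (['($a + $b) * $c;', '$a + ($b + $c);', '$a + $b + $c;'] as $snippet) {
    $expr = $parser->parse('<?php ' . $snippet)[0]->expr;
    $out[] = $printer->prettyPrintExpr($expr);
}
echo implode("\n", $out), "\n";
```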
### Changes to the node traverser

If there are multiple visitors, the node traverser will now call the `leaveNode()` and `afterTraverse()` methods in the reverse order of the corresponding `enterNode()` and `beforeTraverse()` calls:

```php
# Before
$visitor1->enterNode($node);
$visitor2->enterNode($node);
$visitor1->leaveNode($node);
$visitor2->leaveNode($node);

# After
$visitor1->enterNode($node);
$visitor2->enterNode($node);
$visitor2->leaveNode($node);
$visitor1->leaveNode($node);
```

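The new ordering can be observed with two visitors that log each callback. In this sketch, PHP-Parser 5.x is assumed to be available via Composer and `RecordingVisitor` is a hypothetical helper. A single `Variable` node with no child nodes is traversed, so each hook fires exactly once per visitor.

```php
<?php
// Sketch: assumes PHP-Parser 5.x is installed via Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use PhpParser\Node;
use PhpParser\Node\Expr\Variable;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;

// Hypothetical visitor that appends "<name>:<hook>" to a shared log.
final class RecordingVisitor extends NodeVisitorAbstract
{
    private string $name;
    private ArrayObject $log;

    public function __construct(string $name, ArrayObject $log)
    {
        $this->name = $name;
        $this->log = $log;
    }

    public function beforeTraverse(array $nodes) { $this->log[] = "{$this->name}:before"; return null; }
    public function enterNode(Node $node) { $this->log[] = "{$this->name}:enter"; return null; }
    public function leaveNode(Node $node) { $this->log[] = "{$this->name}:leave"; return null; }
    public function afterTraverse(array $nodes) { $this->log[] = "{$this->name}:after"; return null; }
}

$log = new ArrayObject();
$traverser = new NodeTraverser(new RecordingVisitor('v1', $log), new RecordingVisitor('v2', $log));
$traverser->traverse([new Variable('x')]);

echo implode(' ', $log->getArrayCopy()), "\n";
// v1:before v2:before v1:enter v2:enter v2:leave v1:leave v2:after v1:after
```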
Additionally, the special `NodeVisitor` return values have been moved from `NodeTraverser` to `NodeVisitor`. The old names are deprecated, but still available.

```php
PhpParser\NodeTraverser::REMOVE_NODE -> PhpParser\NodeVisitor::REMOVE_NODE
PhpParser\NodeTraverser::DONT_TRAVERSE_CHILDREN -> PhpParser\NodeVisitor::DONT_TRAVERSE_CHILDREN
PhpParser\NodeTraverser::DONT_TRAVERSE_CURRENT_AND_CHILDREN -> PhpParser\NodeVisitor::DONT_TRAVERSE_CURRENT_AND_CHILDREN
PhpParser\NodeTraverser::STOP_TRAVERSAL -> PhpParser\NodeVisitor::STOP_TRAVERSAL
```

Visitors can now also be passed directly to the `NodeTraverser` constructor:

```php
# Before (and still supported)
$traverser = new NodeTraverser();
$traverser->addVisitor(new NameResolver());

# After
$traverser = new NodeTraverser(new NameResolver());
```

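The sketch below combines both changes. It passes an anonymous visitor straight to the constructor and returns `NodeVisitor::REMOVE_NODE` to drop `var_dump()` calls. It assumes PHP-Parser 5.x via Composer. The visitor is a hypothetical example and is not shipped with the library.

```php
<?php
// Sketch: assumes PHP-Parser 5.x is installed via Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use PhpParser\Node;
use PhpParser\Node\Expr\FuncCall;
use PhpParser\Node\Name;
use PhpParser\Node\Stmt;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitor;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;

// Hypothetical visitor: removes `var_dump(...);` statements using the
// special return value that now lives on NodeVisitor.
$dropVarDump = new class extends NodeVisitorAbstract {
    public function leaveNode(Node $node)
    {
        if ($node instanceof Stmt\Expression
            && $node->expr instanceof FuncCall
            && $node->expr->name instanceof Name
            && $node->expr->name->toLowerString() === 'var_dump'
        ) {
            return NodeVisitor::REMOVE_NODE;
        }
        return null;
    }
};

$parser = (new ParserFactory())->createForNewestSupportedVersion();
$stmts = $parser->parse('<?php var_dump($x); echo $x;');

// The visitor is passed directly to the constructor.
$stmts = (new NodeTraverser($dropVarDump))->traverse($stmts);

echo count($stmts), ' ', get_class($stmts[0]), "\n"; // 1 PhpParser\Node\Stmt\Echo_
```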
### Changes to token representation

Tokens are now internally represented using the `PhpParser\Token` class, which exposes the same base interface as
the `PhpToken` class introduced in PHP 8.0. On PHP 8.0 or newer, `PhpParser\Token` extends from `PhpToken`, otherwise
it extends from a polyfill implementation. The most important parts of the interface may be summarized as follows:

```php
class Token {
    public int $id;
    public string $text;
    public int $line;
    public int $pos;

    public function is(int|string|array $kind): bool;
}
```

The token array is now an array of `Token`s, rather than an array of arrays and strings.
Additionally, the token array is now terminated by a sentinel token with ID 0.

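A quick way to inspect the new representation is to call `Lexer::tokenize()` directly, even though end users rarely need to do so. The sketch assumes PHP-Parser 5.x via Composer. It dumps each token's `id`, `line`, `pos` and `text`, then checks the variable token and the sentinel.

```php
<?php
// Sketch: assumes PHP-Parser 5.x is installed via Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use PhpParser\Lexer;

$tokens = (new Lexer())->tokenize('<?php echo $x;');

foreach ($tokens as $token) {
    // Each entry is a PhpParser\Token object, not an array or a string.
    printf("%-4d line %d, pos %2d: %s\n", $token->id, $token->line, $token->pos, json_encode($token->text));
}

// Tokens: "<?php ", "echo", " ", "$x", ";", then the sentinel.
$variable = $tokens[3];
var_dump($variable->is(T_VARIABLE), $variable->text, $variable->pos);
var_dump(end($tokens)->id); // int(0)
```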
### Changes to the lexer

The lexer API is reduced to a single `Lexer::tokenize()` method, which returns an array of tokens. The `startLexing()` and `getNextToken()` methods have been removed.

Responsibility for determining start and end attributes for nodes has been moved from the lexer to the parser. The lexer no longer accepts an options array. The `usedAttributes` option has been removed without replacement, and the parser will now unconditionally add the `comments`, `startLine`, `endLine`, `startFilePos`, `endFilePos`, `startTokenPos` and `endTokenPos` attributes.

There should no longer be a need to directly interact with the `Lexer` for end users, as the `ParserFactory` will create an appropriate instance, and no additional configuration of the lexer is necessary. To use formatting-preserving pretty printing, the setup boilerplate changes as follows:

```php
# Before

$lexer = new Lexer\Emulative([
    'usedAttributes' => [
        'comments',
        'startLine', 'endLine',
        'startTokenPos', 'endTokenPos',
    ],
]);

$parser = new Parser\Php7($lexer);
$oldStmts = $parser->parse($code);
$oldTokens = $lexer->getTokens();

$traverser = new NodeTraverser();
$traverser->addVisitor(new NodeVisitor\CloningVisitor());
$newStmts = $traverser->traverse($oldStmts);

# After

$parser = (new ParserFactory())->createForNewestSupportedVersion();
$oldStmts = $parser->parse($code);
$oldTokens = $parser->getTokens();

$traverser = new NodeTraverser(new NodeVisitor\CloningVisitor());
$newStmts = $traverser->traverse($oldStmts);
```

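The "After" setup can be extended into a full round trip. The sketch below renames a variable and prints the result with `printFormatPreserving()`. It assumes PHP-Parser 5.x via Composer, and the input code and renaming step are made up for illustration. Only the changed variable should be reprinted, so the comment and the unusual spacing around `=` are expected to survive.

```php
<?php
// Sketch: assumes PHP-Parser 5.x is installed via Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use PhpParser\NodeTraverser;
use PhpParser\NodeVisitor;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;

$code = "<?php\n// keep me\n\$x   =   1;\n";

$parser = (new ParserFactory())->createForNewestSupportedVersion();
$oldStmts = $parser->parse($code);
$oldTokens = $parser->getTokens();

// Work on clones so the original AST stays available for comparison.
$traverser = new NodeTraverser(new NodeVisitor\CloningVisitor());
$newStmts = $traverser->traverse($oldStmts);

// Rename $x to $y in the cloned tree only.
$newStmts[0]->expr->var->name = 'y';

$printer = new PrettyPrinter\Standard();
$newCode = $printer->printFormatPreserving($newStmts, $oldStmts, $oldTokens);
echo $newCode;
```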
### Miscellaneous changes

 * The deprecated `Builder\Param::setTypeHint()` method has been removed in favor of `Builder\Param::setType()`.
 * The deprecated `Error` constructor taking a start line has been removed. Pass `['startLine' => $startLine]` attributes instead.
 * The deprecated `Comment::getLine()`, `Comment::getTokenPos()` and `Comment::getFilePos()` methods have been removed. Use `Comment::getStartLine()`, `Comment::getStartTokenPos()` and `Comment::getStartFilePos()` instead.
 * `Comment::getReformattedText()` now normalizes CRLF newlines to LF newlines.
 * The `Node::getLine()` method has been deprecated. Use `Node::getStartLine()` instead.
535