Introduction
============

This project is a PHP parser **written in PHP itself**.

What is this for?
-----------------

A parser is useful for [static analysis][0], manipulation of code and basically any other
application dealing with code programmatically. A parser constructs an [Abstract Syntax Tree][1]
(AST) of the code and thus allows dealing with it in an abstract and robust way.

There are other ways of processing source code. One that PHP supports natively is using the
token stream generated by [`token_get_all`][2]. The token stream is much lower-level than
the AST and thus has different applications: It also allows analyzing the exact formatting of
a file. On the other hand, the token stream is much harder to deal with for more complex analysis.
For example, an AST abstracts away the fact that, in PHP, variables can be written as `$foo`, but also
as `$$bar`, `${'foobar'}` or even `${!${''}=barfoo()}`. You don't have to worry about recognizing
all the different syntaxes from a stream of tokens.
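
To make the difference concrete, PHP's native tokenizer can be asked for the token sequence of each variable spelling. This is a minimal sketch that relies only on `token_get_all()` and `token_name()` from the tokenizer extension (enabled by default in most PHP builds); the helper name `tokenNames()` is hypothetical and introduced only for illustration.

```php
<?php
// Hypothetical helper: map each token returned by token_get_all() to a readable name.
// Single-character tokens come back as plain strings; all other tokens are
// [token id, token text, line number] arrays.
function tokenNames(string $code): array
{
    $names = [];
    foreach (token_get_all($code) as $token) {
        $names[] = is_array($token) ? token_name($token[0]) : $token;
    }
    return $names;
}

// Three spellings of a variable access; each is lexed into a different token sequence.
foreach (['$foo', '$$bar', '${\'foobar\'}'] as $variable) {
    echo $variable, ' => ', implode(' ', tokenNames("<?php $variable;")), PHP_EOL;
}
```

A plain `$foo` arrives as a single `T_VARIABLE`, while `$$bar` arrives as a `$` followed by a `T_VARIABLE`, so a tool working directly on tokens has to recognize each such pattern itself.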

Another question is: Why would I want to have a PHP parser *written in PHP*? Well, PHP might not be
a language especially suited for fast parsing, but processing the AST is much easier in PHP than it
would be in other, faster languages like C. Furthermore, the people most likely to want to do
programmatic PHP code analysis are, incidentally, PHP developers, not C developers.

What can it parse?
------------------

The parser supports parsing PHP 7 and PHP 8 code, with the following exception:

 * Namespaced names containing whitespace (e.g. `Foo \ Bar` instead of `Foo\Bar`) are not supported.
   These are illegal in PHP 8, but are legal in earlier versions. However, PHP-Parser does not
   support them for any version.

PHP-Parser 4.x had full support for parsing PHP 5. PHP-Parser 5.x has only limited support, with the
following caveats:

 * Some variable expressions like `$$foo[0]` are valid in both PHP 5 and PHP 7, but are interpreted
   differently. In such cases, the PHP 7 AST will always be constructed (using `($$foo)[0]`
   rather than `${$foo[0]}`).
 * Declarations of the form `global $$var[0]` are not supported in PHP 7 and will cause a parse
   error. In error recovery mode, it is possible to continue parsing after such declarations.
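
The two readings of `$$foo[0]` can be observed directly when running PHP 7 or later, which always applies the `($$foo)[0]` interpretation. This is a minimal sketch; the variable names and values are made up for illustration.

```php
<?php
// Illustrative values: $foo holds the name of the variable $bar, which is an array.
$foo = 'bar';
$bar = ['first', 'second'];
$b = 'value of $b';

// PHP 7+ reads $$foo[0] as ($$foo)[0]: resolve $$foo to $bar, then index into it.
$php7Reading = $$foo[0];
echo $php7Reading, PHP_EOL;

// PHP 5 read the same expression as ${$foo[0]}: index the string 'bar' first
// (giving 'b'), then look up the variable named by the result, i.e. $b.
// Written out explicitly, this form behaves the same on any version.
$php5Reading = ${$foo[0]};
echo $php5Reading, PHP_EOL;
```

Since the PHP 7 AST is always built for such expressions, PHP 5 code that relied on the old meaning needs the explicit `${$foo[0]}` form to keep it.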

As the parser is based on the tokens returned by `token_get_all` (which is only able to lex the PHP
version it runs on), a wrapper for emulating tokens from newer versions is also provided.
This makes it possible to parse PHP 8.3 source code while running on PHP 7.4, for example. This
emulation is not perfect, but works well in practice.

Finally, it should be noted that the parser aims to accept all valid code, not reject all invalid
code. It will generally accept code that is only valid in newer versions (even when targeting an
older one), as well as code that is syntactically correct but would result in a compile-time error.

What output does it produce?
----------------------------

The parser produces an [Abstract Syntax Tree][1] (AST), also known as a node tree. What this looks
like is best seen in an example. The program `<?php echo 'Hi', 'World';` will give you a node tree
roughly looking like this:

```
array(
    0: Stmt_Echo(
        exprs: array(
            0: Scalar_String(
                value: Hi
            )
            1: Scalar_String(
                value: World
            )
        )
    )
)
```

This matches the structure of the code: An echo statement, which takes two strings as expressions,
with the values `Hi` and `World`.
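
A dump like the one above can be produced with the bundled `NodeDumper`. This sketch assumes PHP-Parser 5.x has been installed through Composer (package `nikic/php-parser`), so that `vendor/autoload.php` exists next to the script; the exact layout of the dump may differ slightly between versions.

```php
<?php
// Assumption: PHP-Parser 5.x installed via Composer in the script's directory.
require __DIR__ . '/vendor/autoload.php';

use PhpParser\Error;
use PhpParser\NodeDumper;
use PhpParser\ParserFactory;

$code = "<?php echo 'Hi', 'World';";

// Create a parser for the newest PHP version this PHP-Parser release supports.
$parser = (new ParserFactory())->createForNewestSupportedVersion();

try {
    $ast = $parser->parse($code);
} catch (Error $error) {
    echo 'Parse error: ', $error->getMessage(), PHP_EOL;
    exit(1);
}

// Print the node tree in a human-readable form.
echo (new NodeDumper())->dump($ast), PHP_EOL;
```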

You can also see that the AST does not contain any whitespace information (but most comments are saved).
However, it does retain accurate position information, which can be used to inspect precise formatting.

What else can it do?
--------------------

Apart from the parser itself, this package also bundles support for some other, related features:

 * Support for pretty printing, which is the act of converting an AST into PHP code. Please note
   that "pretty printing" does not imply that the output is especially pretty. It's just what it's
   called ;)
 * Support for serializing and unserializing the node tree to JSON.
 * Support for dumping the node tree in a human-readable form (see the section above for an
   example of what the output looks like).
 * Infrastructure for traversing and changing the AST (node traverser and node visitors).
 * A node visitor for resolving namespaced names.

 [0]: http://en.wikipedia.org/wiki/Static_program_analysis
 [1]: http://en.wikipedia.org/wiki/Abstract_syntax_tree
 [2]: http://php.net/token_get_all
98