Introduction
============

This project is a PHP parser **written in PHP itself**.

What is this for?
-----------------

A parser is useful for [static analysis][0], manipulation of code and basically any other
application dealing with code programmatically. A parser constructs an [Abstract Syntax Tree][1]
(AST) of the code and thus allows dealing with it in an abstract and robust way.

There are other ways of processing source code. One that PHP supports natively is using the
token stream generated by [`token_get_all`][2]. The token stream is much lower level than
the AST and thus has different applications: it also allows analyzing the exact formatting of
a file. On the other hand, the token stream is much harder to work with for more complex analysis.
For example, an AST abstracts away the fact that, in PHP, variables can be written as `$foo`, but also
as `$$bar`, `${'foobar'}` or even `${!${''}=barfoo()}`. You don't have to worry about recognizing
all the different syntaxes from a stream of tokens.

Another question is: Why would I want to have a PHP parser *written in PHP*? Well, PHP might not be
a language especially suited for fast parsing, but processing the AST is much easier in PHP than it
would be in other, faster languages like C. Furthermore, the people most likely to want
programmatic PHP code analysis are PHP developers, not C developers.

What can it parse?
------------------

The parser supports parsing PHP 7 and PHP 8 code, with the following exceptions:

 * Namespaced names containing whitespace (e.g. `Foo \ Bar` instead of `Foo\Bar`) are not supported.
   These are illegal in PHP 8, but are legal in earlier versions. However, PHP-Parser does not
   support them for any version.

PHP-Parser 4.x had full support for parsing PHP 5. PHP-Parser 5.x has only limited support, with the
following caveats:

 * Some variable expressions like `$$foo[0]` are valid in both PHP 5 and PHP 7, but are interpreted
   differently.
   In such cases, the PHP 7 AST will always be constructed (using `($$foo)[0]`
   rather than `${$foo[0]}`).
 * Declarations of the form `global $$var[0]` are not supported in PHP 7 and will cause a parse
   error. In error recovery mode, it is possible to continue parsing after such declarations.

As the parser is based on the tokens returned by `token_get_all` (which is only able to lex the PHP
version it runs on), a wrapper for emulating tokens from newer versions is additionally provided.
This makes it possible to parse PHP 8.4 source code while running on PHP 7.4, for example. This
emulation is not perfect, but works well in practice.

Finally, it should be noted that the parser aims to accept all valid code, not to reject all invalid
code. It will generally accept code that is only valid in newer versions (even when targeting an
older one), and accept code that is syntactically correct, but would result in a compiler error.

What output does it produce?
----------------------------

The parser produces an [Abstract Syntax Tree][1] (AST), also known as a node tree. What this looks
like can best be seen in an example. The program `