www.aleph1.co.uk Git - yaffs-website/blob - vendor/nikic/php-parser/doc/component/Walking_the_AST.markdown

   1 Walking the AST
   2 ===============
   3
   4 The most common way to work with the AST is by using a node traverser and one or more node visitors.
   5 As a basic example, the following code changes all literal integers in the AST into strings (e.g.,
   6 `42` becomes `'42'`.)
   7
   8 ```php
   9 use PhpParser\{Node, NodeTraverser, NodeVisitorAbstract};
  10
  11 $traverser = new NodeTraverser;
  12 $traverser->addVisitor(new class extends NodeVisitorAbstract {
  13     public function leaveNode(Node $node) {
  14         if ($node instanceof Node\Scalar\LNumber) {
  15             return new Node\Scalar\String_((string) $node->value);
  16         }
  17     }
  18 });
  19
  20 $stmts = ...;
  21 $modifiedStmts = $traverser->traverse($stmts);
  22 ```
  23
  24 Node visitors
  25 -------------
  26
  27 Each node visitor implements an interface with following four methods:
  28
  29 ```php
  30 interface NodeVisitor {
  31     public function beforeTraverse(array $nodes);
  32     public function enterNode(Node $node);
  33     public function leaveNode(Node $node);
  34     public function afterTraverse(array $nodes);
  35 }
  36 ```
  37
  38 The `beforeTraverse()` and `afterTraverse()` methods are called before and after the traversal
  39 respectively, and are passed the entire AST. They can be used to perform any necessary state
  40 setup or cleanup.
  41
  42 The `enterNode()` method is called when a node is first encountered, before its children are
  43 processed ("preorder"). The `leaveNode()` method is called after all children have been visited
  44 ("postorder").
  45
  46 For example, if we have the following excerpt of an AST
  47
  48 ```
  49 Expr_FuncCall(
  50     name: Name(
  51         parts: array(
  52             0: printLine
  53         )
  54     )
  55     args: array(
  56         0: Arg(
  57             value: Scalar_String(
  58                 value: Hello World!!!
  59             )
  60             byRef: false
  61             unpack: false
  62         )
  63     )
  64 )
  65 ```
  66
  67 then the enter/leave methods will be called in the following order:
  68
  69 ```
  70 enterNode(Expr_FuncCall)
  71 enterNode(Name)
  72 leaveNode(Name)
  73 enterNode(Arg)
  74 enterNode(Scalar_String)
  75 leaveNode(Scalar_String)
  76 leaveNode(Arg)
  77 leaveNode(Expr_FuncCall)
  78 ```
  79
  80 A common pattern is that `enterNode` is used to collect some information and then `leaveNode`
  81 performs modifications based on that. At the time when `leaveNode` is called, all the code inside
  82 the node will have already been visited and necessary information collected.
  83
  84 As you usually do not want to implement all four methods, it is recommended that you extend
  85 `NodeVisitorAbstract` instead of implementing the interface directly. The abstract class provides
  86 empty default implementations.
  87
  88 Modifying the AST
  89 -----------------
  90
  91 There are a number of ways in which the AST can be modified from inside a node visitor. The first
  92 and simplest is to simply change AST properties inside the visitor:
  93
  94 ```php
  95 public function leaveNode(Node $node) {
  96     if ($node instanceof Node\Scalar\LNumber) {
  97         // increment all integer literals
  98         $node->value++;
  99     }
 100 }
 101 ```
 102
 103 The second is to replace a node entirely by returning a new node:
 104
 105 ```php
 106 public function leaveNode(Node $node) {
 107     if ($node instanceof Node\Expr\BinaryOp\BooleanAnd) {
 108         // Convert all $a && $b expressions into !($a && $b)
 109         return new Node\Expr\BooleanNot($node);
 110     }
 111 }
 112 ```
 113
 114 Doing this is supported both inside enterNode and leaveNode. However, you have to be mindful about
 115 where you perform the replacement: If a node is replaced in enterNode, then the recursive traversal
 116 will also consider the children of the new node. If you aren't careful, this can lead to infinite
 117 recursion. For example, let's take the previous code sample and use enterNode instead:
 118
 119 ```php
 120 public function enterNode(Node $node) {
 121     if ($node instanceof Node\Expr\BinaryOp\BooleanAnd) {
 122         // Convert all $a && $b expressions into !($a && $b)
 123         return new Node\Expr\BooleanNot($node);
 124     }
 125 }
 126 ```
 127
 128 Now `$a && $b` will be replaced by `!($a && $b)`. Then the traverser will go into the first (and
 129 only) child of `!($a && $b)`, which is `$a && $b`. The transformation applies again and we end up
 130 with `!!($a && $b)`. This will continue until PHP hits the memory limit.
 131
 132 Finally, two special replacement types are supported only by leaveNode. The first is removal of a
 133 node:
 134
 135 ```php
 136 public function leaveNode(Node $node) {
 137     if ($node instanceof Node\Stmt\Return_) {
 138         // Remove all return statements
 139         return NodeTraverser::REMOVE_NODE;
 140     }
 141 }
 142 ```
 143
 144 Node removal only works if the parent structure is an array. This means that usually it only makes
 145 sense to remove nodes of type `Node\Stmt`, as they always occur inside statement lists (and a few
 146 more node types like `Arg` or `Expr\ArrayItem`, which are also always part of lists).
 147
 148 On the other hand, removing a `Node\Expr` does not make sense: If you have `$a * $b`, there is no
 149 meaningful way in which the `$a` part could be removed. If you want to remove an expression, you
 150 generally want to remove it together with a surrounding expression statement:
 151
 152 ```php
 153 public function leaveNode(Node $node) {
 154     if ($node instanceof Node\Stmt\Expression
 155         && $node->expr instanceof Node\Expr\FuncCall
 156         && $node->expr->name instanceof Node\Name
 157         && $node->expr->name->toString() === 'var_dump'
 158     ) {
 159         return NodeTraverser::REMOVE_NODE;
 160     }
 161 }
 162 ```
 163
 164 This example will remove all calls to `var_dump()` which occur as expression statements. This means
 165 that `var_dump($a);` will be removed, but `if (var_dump($a))` will not be removed (and there is no
 166 obvious way in which it can be removed).
 167
 168 Next to removing nodes, it is also possible to replace one node with multiple nodes. Again, this
 169 only works inside leaveNode and only if the parent structure is an array.
 170
 171 ```php
 172 public function leaveNode(Node $node) {
 173     if ($node instanceof Node\Stmt\Return_ && $node->expr !== null) {
 174         // Convert "return foo();" into "$retval = foo(); return $retval;"
 175         $var = new Node\Expr\Variable('retval');
 176         return [
 177             new Node\Stmt\Expression(new Node\Expr\Assign($var, $node->expr)),
 178             new Node\Stmt\Return_($var),
 179         ];
 180     }
 181 }
 182 ```
 183
 184 Short-circuiting traversal
 185 --------------------------
 186
 187 An AST can easily contain thousands of nodes, and traversing over all of them may be slow,
 188 especially if you have more than one visitor. In some cases, it is possible to avoid a full
 189 traversal.
 190
 191 If you are looking for all class declarations in a file (and assuming you're not interested in
 192 anonymous classes), you know that once you've seen a class declaration, there is no point in also
 193 checking all it's child nodes, because PHP does not allow nesting classes. In this case, you can
 194 instruct the traverser to not recurse into the class node:
 195
 196 ```
 197 private $classes = [];
 198 public function enterNode(Node $node) {
 199     if ($node instanceof Node\Stmt\Class_) {
 200         $this->classes[] = $node;
 201         return NodeTraverser::DONT_TRAVERSE_CHILDREN;
 202     }
 203 }
 204 ```
 205
 206 Of course, this option is only available in enterNode, because it's already too late by the time
 207 leaveNode is reached.
 208
 209 If you are only looking for one specific node, it is also possible to abort the traversal entirely
 210 after finding it. For example, if you are looking for the node of a class with a certain name (and
 211 discounting exotic cases like conditionally defining a class two times), you can stop traversal
 212 once you found it:
 213
 214 ```
 215 private $class = null;
 216 public function enterNode(Node $node) {
 217     if ($node instanceof Node\Stmt\Class_ &&
 218         $node->namespaceName->toString() === 'Foo\Bar\Baz'
 219     ) {
 220         $this->class = $node;
 221         return NodeTraverser::STOP_TRAVERSAL;
 222     }
 223 }
 224 ```
 225
 226 This works both in enterNode and leaveNode. Note that this particular case can also be more easily
 227 handled using a NodeFinder, which will be introduced below.
 228
 229 Multiple visitors
 230 -----------------
 231
 232 A single traverser can be used with multiple visitors:
 233
 234 ```php
 235 $traverser = new NodeTraverser;
 236 $traverser->addVisitor($visitorA);
 237 $traverser->addVisitor($visitorB);
 238 $stmts = $traverser->traverser($stmts);
 239 ```
 240
 241 It is important to understand that if a traverser is run with multiple visitors, the visitors will
 242 be interleaved. Given the following AST excerpt
 243
 244 ```
 245 Stmt_Return(
 246     expr: Expr_Variable(
 247         name: foobar
 248     )
 249 )
 250 ```
 251
 252 the following method calls will be performed:
 253
 254 ```
 255 $visitorA->enterNode(Stmt_Return)
 256 $visitorB->enterNode(Stmt_Return)
 257 $visitorA->enterNode(Expr_Variable)
 258 $visitorB->enterNode(Expr_Variable)
 259 $visitorA->leaveNode(Expr_Variable)
 260 $visitorB->leaveNode(Expr_Variable)
 261 $visitorA->leaveNode(Stmt_Return)
 262 $visitorB->leaveNode(Stmt_Return)
 263 ```
 264
 265 That is, when visiting a node, enterNode and leaveNode will always be called for all visitors.
 266 Running multiple visitors in parallel improves performance, as the AST only has to be traversed
 267 once. However, it is not always possible to write visitors in a way that allows interleaved
 268 execution. In this case, you can always fall back to performing multiple traversals:
 269
 270 ```php
 271 $traverserA = new NodeTraverser;
 272 $traverserA->addVisitor($visitorA);
 273 $traverserB = new NodeTraverser;
 274 $traverserB->addVisitor($visitorB);
 275 $stmts = $traverserA->traverser($stmts);
 276 $stmts = $traverserB->traverser($stmts);
 277 ```
 278
 279 When using multiple visitors, it is important to understand how they interact with the various
 280 special enterNode/leaveNode return values:
 281
 282  * If *any* visitor returns `DONT_TRAVERSE_CHILDREN`, the children will be skipped for *all*
 283    visitors.
 284  * If *any* visitor returns `STOP_TRAVERSAL`, traversal is stopped for *all* visitors.
 285  * If a visitor returns a replacement node, subsequent visitors will be passed the replacement node,
 286    not the original one.
 287  * If a visitor returns `REMOVE_NODE`, subsequent visitors will not see this node.
 288  * If a visitor returns an array of replacement nodes, subsequent visitors will see neither the node
 289    that was replaced, nor the replacement nodes.
 290
 291 Simple node finding
 292 -------------------
 293
 294 While the node visitor mechanism is very flexible, creating a node visitor can be overly cumbersome
 295 for minor tasks. For this reason a `NodeFinder` is provided, which can find AST nodes that either
 296 satisfy a certain callback, or which are instanced of a certain node type. A couple of examples are
 297 shown in the following:
 298
 299 ```php
 300 use PhpParser\{Node, NodeFinder};
 301
 302 $nodeFinder = new NodeFinder;
 303
 304 // Find all class nodes.
 305 $classes = $nodeFinder->findInstanceOf($stmts, Node\Stmt\Class_::class);
 306
 307 // Find all classes that extend another class
 308 $extendingClasses = $nodeFinder->findInstanceOf($stmts, function(Node $node) {
 309     return $node instanceof Node\Stmt\Class_
 310         && $node->extends !== null;
 311 });
 312
 313 // Find first class occuring in the AST. Returns null if no class exists.
 314 $class = $nodeFinder->findFirstInstanceOf($stmts, Node\Stmt\Class_::class);
 315
 316 // Find first class that has name $name
 317 $class = $nodeFinder->findFirst($stmts, function(Node $node) use ($name) {
 318     return $node instanceof Node\Stmt\Class_
 319         && $node->resolvedName->toString() === $name;
 320 });
 321 ```
 322
 323 Internally, the `NodeFinder` also uses a node traverser. It only simplifies the interface for a
 324 common use case.
 325
 326 Parent and sibling references
 327 -----------------------------
 328
 329 The node visitor mechanism is somewhat rigid, in that it prescribes an order in which nodes should
 330 be accessed: From parents to children. However, it can often be convenient to operate in the
 331 reverse direction: When working on a node, you might want to check if the parent node satisfies a
 332 certain property.
 333
 334 PHP-Parser does not add parent (or sibling) references to nodes by itself, but you can easily
 335 emulate this with a visitor. See the [FAQ](FAQ.markdown) for more information.