The AST, Abstract Syntax Tree, is a tree representation of *something*, in our case of the regular expression we are parsing. To use TDRDP, once all these functions are created, we call the top-level production, serving as an entry point, which in our case is the one corresponding to the non-terminal RE and the recursive calls will take care of the entire process of parsing by themselves.Īt the end, the function we called will return us the result of the parsing, whether it is a boolean, or something more complex like an AST as it is in our case. In this case, the function corresponding to the non-terminals RANGE_EL and RE_SEQ itself can be called, hence the recursion.
Each function can call the functions of the non-terminals that appears in its productions.įor example, our grammar has the production rule RE_SEQ ::= '^'? GROUP ‘$'? ('|' RE_SEQ)?in which two non-terminals appear on the right side, RANGE_EL and RE_SEQ itself.Well, it turns out the recursion come up in the functions calls:
Now you’re probably curious about where the recursion comes up. So, in our project, a function to parse the RE non-terminal, one for the GROUP, one for the RANGE_EL and so on.Īs you noticed, in the techniques’ acronym appear the infamous word ‘recursive’ (and now you’re all scared). To implement a recursive-descent parser, we have to create a function for each non-terminal symbol of the grammar. I won’t get into the details of when and how you can apply this technique, there’s already plenty of good articles about it, explaining it way better than I could.Īnyway, luckily (?) the grammar we built in the first article supports it. TDRDP stands for Top-Down Recursive Descent Parsing - a useful technique to create a parser. In this article, we will finally begin the parser, the second-hardest part of the project (the first is the engine). In the previous articles we spoke about how to define our grammar, how to build the lexer, and a bit about Python strings.