Semantic Code Validation

Ivaylo Fiziev
Feb 27, 2023

Next I would like to talk about semantic code validation. It is used in practically every development environment, and it happens as you type. The goal is to pinpoint problematic areas in the code (usually by underlining them with a wavy red line). However, these are not syntax errors. The code is grammatically correct, but its meaning is unclear or invalid. This is also functionality that we take for granted, yet hardly anyone thinks about its inner workings.

A typical example of a semantic check is making sure a variable actually exists when it is used in the code. In this particular case the 'jointName' array is missing.

Of course, the validation should also provide a hint explaining the problem. Usually a tooltip is used for that. Unfortunately, in v2304 we are still missing the tooltip. I hope we'll have time to implement it at some point … you see … the text editor that we use is not that perfect and needs to be replaced.

Anyway, here again we have support from ANTLR4.

The parser rules that we define in the grammar file are used by ANTLR to generate code. Each parser rule is usually represented by a class in the target language. In our case these are C++ classes. When the parser does its job (recognizing the syntax), it actually builds a tree.

They call it the "Program Tree". The tree is the data model representing your program. This data model is what makes it possible for a computer to understand programming languages. The tree is used for code validation, interpretation, compilation, debugging etc.

Each node in the tree is an instance of a class generated by ANTLR. Child nodes can be accessed using pointers stored in the parent node. So eventually the grammar that you define determines what the program tree will look like. The better you define the grammar, the easier it will be to process the tree later.
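To make the parent/child idea a bit more concrete, here is a minimal sketch in plain C++. It does not use the ANTLR runtime at all: the generic `Node` struct, the `add` and `toStringTree` helpers, `buildSample`, and the rule names `assignment` and `expression` are all assumptions made for illustration. Real generated classes are specific to the grammar rules.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a generated rule class. A generic node
// keeps the sketch self-contained; it is not an ANTLR type.
struct Node {
    std::string rule;                             // rule name or token text
    std::vector<std::unique_ptr<Node>> children;  // the parent owns its children

    explicit Node(std::string r) : rule(std::move(r)) {}

    // Append a child and return a pointer to it so it can be extended.
    Node* add(std::string r) {
        children.push_back(std::make_unique<Node>(std::move(r)));
        return children.back().get();
    }
};

// Render the tree in a LISP-like form: (parent child child ...).
std::string toStringTree(const Node& n) {
    if (n.children.empty()) return n.rule;
    std::string out = "(" + n.rule;
    for (const auto& c : n.children) out += " " + toStringTree(*c);
    return out + ")";
}

// Build the tree for the statement: x := y + 1
std::unique_ptr<Node> buildSample() {
    auto root = std::make_unique<Node>("assignment");
    root->add("x");
    root->add(":=");
    Node* expr = root->add("expression");
    expr->add("y");
    expr->add("+");
    expr->add("1");
    return root;
}
```

Calling `toStringTree(*buildSample())` gives `(assignment x := (expression y + 1))`. The nesting mirrors the grammar: because the hypothetical `assignment` rule refers to an `expression` rule, the expression shows up as a child node.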

Here is what a typical ANTLR program tree looks like.

So what's next? How do I use this tree to validate the code? Well, ANTLR4 provides you with the tools (a visitor or a listener) to traverse the tree. While traversing, you do some language-specific checks (like checking whether the variables exist). This is exactly what we do in SCL: we have a semantic visitor.
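The idea can be sketched without the ANTLR runtime. The example below is not our actual SCL visitor. It is a hand-written visitor over three hypothetical node types (`VarDecl`, `VarRef`, `Block`) that records an error whenever a variable is used before it has been declared. All type and function names, and the `jointValue` variable, are assumptions for illustration. With ANTLR you would derive from the generated base visitor instead.

```cpp
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct VarDecl;
struct VarRef;
struct Block;

// Visitor interface: one method per node type, similar in spirit to
// the visitXxx methods that ANTLR generates per grammar rule.
struct Visitor {
    virtual ~Visitor() = default;
    virtual void visitVarDecl(const VarDecl&) = 0;
    virtual void visitVarRef(const VarRef&) = 0;
    virtual void visitBlock(const Block&) = 0;
};

struct Node {
    virtual ~Node() = default;
    virtual void accept(Visitor& v) const = 0;
};

struct VarDecl : Node {
    std::string name;
    explicit VarDecl(std::string n) : name(std::move(n)) {}
    void accept(Visitor& v) const override { v.visitVarDecl(*this); }
};

struct VarRef : Node {
    std::string name;
    int line;
    VarRef(std::string n, int l) : name(std::move(n)), line(l) {}
    void accept(Visitor& v) const override { v.visitVarRef(*this); }
};

struct Block : Node {
    std::vector<std::unique_ptr<Node>> children;
    void accept(Visitor& v) const override { v.visitBlock(*this); }
};

// The semantic check: remember declarations, flag unknown references.
struct SemanticVisitor : Visitor {
    std::set<std::string> declared;
    std::vector<std::string> errors;

    void visitVarDecl(const VarDecl& d) override { declared.insert(d.name); }

    void visitVarRef(const VarRef& r) override {
        if (declared.count(r.name) == 0) {
            errors.push_back("line " + std::to_string(r.line) +
                             ": undefined variable '" + r.name + "'");
        }
    }

    void visitBlock(const Block& b) override {
        for (const auto& child : b.children) child->accept(*this);
    }
};

// Declares jointValue, then uses jointValue and the undeclared jointName.
std::vector<std::string> checkSample() {
    Block program;
    program.children.push_back(std::make_unique<VarDecl>("jointValue"));
    program.children.push_back(std::make_unique<VarRef>("jointValue", 2));
    program.children.push_back(std::make_unique<VarRef>("jointName", 3));
    SemanticVisitor visitor;
    program.accept(visitor);
    return visitor.errors;
}
```

Here `checkSample()` returns a single message, `line 3: undefined variable 'jointName'`. The line number stored with each reference is what an editor would use to place the wavy red line.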

This all sounds pretty impressive, but what happens when the syntax is wrong? In this case ANTLR tries to recover from the syntax error and does its best to continue parsing. You can imagine that this will lead to an inadequate program tree at the end. Surely you cannot rely on that tree. If you use it for validation, it will pinpoint a number of fake errors in the code. This is why a grammar check should be done prior to the semantic check. And the grammar check is something ANTLR provides for us out of the box. It reports the problematic tokens in the text based on the grammar.
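The ordering can be sketched as a small gate: run the semantic pass only when the syntax pass came back clean. The `Diagnostic` struct and the `validate` function are hypothetical names used for illustration, not ANTLR types. In a real setup the syntax errors would come from ANTLR's error reporting, and the semantic pass would be the tree traversal.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical diagnostic record; not an ANTLR type.
struct Diagnostic {
    int line;
    std::string message;
};

using Diagnostics = std::vector<Diagnostic>;

// Report syntax errors if there are any. Otherwise the tree can be
// trusted, so run the semantic pass and report its findings.
Diagnostics validate(const Diagnostics& syntaxErrors,
                     const std::function<Diagnostics()>& semanticCheck) {
    if (!syntaxErrors.empty()) {
        // The tree was built through error recovery, so semantic
        // results would likely be misleading. Skip that pass.
        return syntaxErrors;
    }
    return semanticCheck();
}
```

Passing the semantic pass as a callable means the tree is not even traversed when the syntax is broken, so the user only sees the errors that are real.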