Code highlighting based on the lexer output

Ivaylo Fiziev
Feb 20
2 min read

How does text highlighting work in an integrated development environment? Tough question... The straightforward approach is to introduce language-specific highlighting rules in the text editor. Usually this means roughly describing the language syntax (keywords, identifiers, operators, literals, etc.) in an XML file that provides the regular expressions and their related colors. This is what we do in Process Simulate in the SCL editor and the Robot Program Viewer. However, with this approach we have to update the highlighting rules from time to time (whenever we enrich the language syntax). Often we forget to do it and have to come back to it after the development has already finished. Leaving this aside, updating these rules is also a painful process when the regular expressions have pattern collisions (for example, BYTE vs. BYTE#10). Sometimes it is not possible to come up with the right rule, and this inevitably leads to a bug. The struggle is real. We have to introduce the syntax in the language parser and then do it again just for the highlighting to work. But is this really necessary? Isn't there a better way to do it?
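To make the BYTE vs. BYTE#10 collision concrete, here is a minimal Python sketch. The two rules and their patterns are made up for illustration (they are not the actual rules from our XML files); the point is only that two independent regular expressions can both claim the same piece of text.

```python
import re

# Hypothetical highlighting rules as (category, pattern) pairs.
# These are illustrative only, not the real rule file.
RULES = [
    ("keyword", re.compile(r"\bBYTE\b")),
    ("literal", re.compile(r"\bBYTE#\d+")),
]

def match_rules(text):
    """Return every (category, start, end) span claimed by any rule."""
    spans = []
    for category, pattern in RULES:
        for m in pattern.finditer(text):
            spans.append((category, m.start(), m.end()))
    # Sort by position so overlapping claims end up next to each other.
    return sorted(spans, key=lambda s: (s[1], s[2]))

spans = match_rules("x := BYTE#10;")
print(spans)
```

Both rules match starting at the same offset: the keyword rule claims `BYTE` and the literal rule claims `BYTE#10`. The editor now has to decide which one wins, and that decision lives in the rule file rather than in the language definition.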

Of course there is: use the output from the lexer as the feed for the highlighting logic. This way we only have to update the language syntax. The colors come for free. Now... how is this possible?

If you think about it, the lexer is responsible for splitting the program text into known lexemes. Lexemes are the supported primitives of the language. For each of them the lexer allocates a token object. The token object carries the start/end offset of the lexeme (within the program text), its type id (basically an integer), and its text. So what is left for us is to collect all the tokens that the lexer can recognize and then use the data inside them to build a language-neutral representation of the text attributes. The text editor then uses this representation to apply the colors.
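Here is a rough Python sketch of that idea. The toy lexer, the `Token` class, and the type id values are all assumptions made for this example; the real SCL lexer is a different piece of code. What matters is the shape of the data: each token carries offsets, a type id, and its text.

```python
import re
from dataclasses import dataclass

# Hypothetical token type ids; a real lexer defines its own.
KEYWORD, IDENTIFIER, LITERAL, OPERATOR = 1, 2, 3, 4

@dataclass(frozen=True)
class Token:
    start: int    # offset of the first character of the lexeme
    end: int      # offset one past the last character
    type_id: int  # which kind of lexeme this is
    text: str     # the lexeme itself

# Toy stand-in for a real lexer. Alternatives are tried in order at each
# position, so BYTE#10 is recognized as one literal before BYTE is tried.
TOKEN_RE = re.compile(r"""
    (?P<LITERAL>BYTE\#\d+|\d+)
  | (?P<KEYWORD>\b(?:BYTE|IF|THEN|END_IF)\b)
  | (?P<IDENTIFIER>[A-Za-z_]\w*)
  | (?P<OPERATOR>:=|[-+*/=<>;])
""", re.VERBOSE)

TYPE_IDS = {"LITERAL": LITERAL, "KEYWORD": KEYWORD,
            "IDENTIFIER": IDENTIFIER, "OPERATOR": OPERATOR}

def tokenize(text):
    """Collect one Token per recognized lexeme; whitespace is skipped."""
    return [Token(m.start(), m.end(), TYPE_IDS[m.lastgroup], m.group())
            for m in TOKEN_RE.finditer(text)]

tokens = tokenize("x := BYTE#10;")
for t in tokens:
    print(t)
```

Every character belongs to at most one token, so `BYTE#10` comes out as a single literal and the collision from the regex rules never shows up. The list of tokens is everything the highlighter needs.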

Language-neutral? Yes. This means that another language can use the same mechanism. It just needs the right abstraction. But how do we abstract a language? Well, every high-level language is built around the same concepts. It has keywords, identifiers, operators, literals, etc., and these are the primitives that we need to highlight. The lexer gives us this separation for free. An additional benefit is the lack of ambiguity among the tokens: each one is always uniquely recognized.
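One way to picture the abstraction, again as a Python sketch with made-up names: each language supplies a small table from its own token type ids to a shared set of categories, and the editor only knows how to turn categories into colors. The `Category` enum, the id values, and the color names below are assumptions for illustration only.

```python
from enum import Enum

class Category(Enum):
    """Language-neutral kinds of text that the editor can color."""
    KEYWORD = "keyword"
    IDENTIFIER = "identifier"
    OPERATOR = "operator"
    LITERAL = "literal"
    OTHER = "other"

# Language side: token type id -> category (hypothetical ids).
SCL_CATEGORIES = {
    1: Category.KEYWORD,
    2: Category.IDENTIFIER,
    3: Category.LITERAL,
    4: Category.OPERATOR,
}

# Editor side: category -> color. Nothing here depends on the language.
THEME = {
    Category.KEYWORD: "blue",
    Category.IDENTIFIER: "black",
    Category.OPERATOR: "gray",
    Category.LITERAL: "brown",
    Category.OTHER: "gray",
}

def colorize(tokens, categories, theme):
    """Turn (start, end, type_id) triples into (start, end, color) spans."""
    spans = []
    for start, end, type_id in tokens:
        # Unknown type ids fall back to a neutral category.
        category = categories.get(type_id, Category.OTHER)
        spans.append((start, end, theme[category]))
    return spans

colored = colorize([(0, 1, 2), (5, 12, 3), (20, 21, 99)], SCL_CATEGORIES, THEME)
print(colored)
```

Supporting another language in this scheme would mean supplying a different table in place of `SCL_CATEGORIES`; `colorize` and the theme stay untouched.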

In v2509 the SCL debugger will be the first one to rely on the SCL lexer for the colors.

I must say it works really well. The SCL editor and the Robot Program Viewer will need to migrate to a new text editor control first.

Hope you like it!
