C# Code Analysis With Roslyn's Syntax Trees

07 April, 2021

Introduction

In the previous article, we compared the Visual Studio templates you can use to build your own static analyzer. We also walked through writing a first simple analyzer and discussed how Roslyn represents source code as syntax trees. Now we will look at the structure of syntax trees in more detail. You will also learn how to use the CSharpSyntaxWalker class to traverse syntax tree nodes.

Syntax Trees

We already touched upon the concept of syntax trees in the previous article. Let's delve deeper into the topic. As you already know, syntax trees are built from source code, and each .cs source file has its own syntax tree. One way to look at a syntax tree is the Syntax Visualizer window. If you have installed the .NET Compiler Platform SDK for VS, open this window via View -> Other Windows -> Syntax Visualizer.

[Screenshot: the Syntax Visualizer window]


This is a very useful tool, especially for those who are just starting to learn about syntax trees and the element types they contain. When you move through the code in the VS editor, the Syntax Visualizer jumps to and highlights the tree element that corresponds to your position in the code. It also shows some properties of the currently selected element. For example, the screenshot above shows that the highlighted MethodDeclaration element has the MethodDeclarationSyntax type. For an even better visualization of the tree, open the context menu for the selected item in the Syntax Visualizer and click "View Directed Syntax Graph". If this item is missing from your context menu, install the DGML editor: open the Visual Studio Installer, locate the required VS installation, and click More -> Modify. Then go to Individual components -> Code tools -> DGML editor.

Take a look at the code below:
if (number > 0)
{
 
}
Here's the syntax tree Roslyn builds for this code in the DGML editor:

[Image: the syntax graph (flowchart) of the code above, as shown in the DGML editor]


In the image above, the tree's elements are highlighted in four colors, but all of them fall into three groups:
  • Syntax nodes (blue) - declarations, statements, expressions, and other syntax constructs;
  • Syntax tokens (green) - keywords, identifiers, literals, and punctuation;
  • Syntax trivia (white and gray) - whitespace, comments, and other additional syntax information.
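To see the three groups without the DGML editor, you can enumerate them in code. The following is a minimal sketch, assuming a project that references the Microsoft.CodeAnalysis.CSharp NuGet package; the SyntaxElementsDemo class and its Describe method are hypothetical names used only for illustration. It parses the same if statement and lists its nodes, its tokens, and the leading and trailing trivia attached to each token.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical demo class; requires the Microsoft.CodeAnalysis.CSharp NuGet package.
public static class SyntaxElementsDemo
{
    // Returns one line per node, token and trivia item in the parsed statement.
    public static List<string> Describe(string statementText)
    {
        var lines = new List<string>();
        var statement = SyntaxFactory.ParseStatement(statementText);

        // Nodes: the statement itself and every node below it.
        foreach (SyntaxNode node in statement.DescendantNodesAndSelf())
            lines.Add($"Node: {node.Kind()}");

        // Tokens: grammar terminals (keywords, identifiers, punctuation, literals).
        foreach (SyntaxToken token in statement.DescendantTokens())
        {
            lines.Add($"Token: {token.Kind()} '{token.Text}'");

            // Trivia is attached to tokens: leading trivia precedes the token,
            // trailing trivia follows it.
            foreach (SyntaxTrivia trivia in token.LeadingTrivia)
                lines.Add($"  Leading trivia: {trivia.Kind()}");
            foreach (SyntaxTrivia trivia in token.TrailingTrivia)
                lines.Add($"  Trailing trivia: {trivia.Kind()}");
        }

        return lines;
    }

    public static void Main()
    {
        foreach (string line in Describe("if (number > 0)\n{\n}\n"))
            Console.WriteLine(line);
    }
}
```

Note that whitespace and line breaks never show up as nodes; they only appear as trivia hanging off the tokens.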
Let's look at each tree element type in more detail.

Syntax Nodes


Syntax nodes (further referred to as nodes) are syntax constructs such as declarations, statements, and expressions. Most code analysis involves processing nodes: analyzers traverse them, and diagnostic rules are written for specific node types. The base type for all nodes is the abstract SyntaxNode class. Each node that represents a language construct has its own type derived from SyntaxNode, and this type defines properties that make working with the tree easier. Here are some node types along with the language constructs they represent:
  • IfStatementSyntax – the if statement;
  • InvocationExpressionSyntax – a method call;
  • ReturnStatementSyntax – the return statement;
  • MemberAccessExpressionSyntax – access to structure/class members.
For example, the IfStatementSyntax class derives from SyntaxNode and adds several useful properties: Condition (the statement's condition), Statement (the body executed when the condition is true), and Else (the else clause, or null if there is none).
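Here is a minimal sketch of reading these three properties, assuming the Microsoft.CodeAnalysis.CSharp NuGet package is referenced; IfPartsDemo and Split are hypothetical names. Keep in mind that Else is an ElseClauseSyntax, so the else body itself is reached through Else.Statement.

```csharp
using System;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical demo: read the three parts of an if statement.
public static class IfPartsDemo
{
    public static (string Condition, string Then, string Else) Split(string statementText)
    {
        // ParseStatement returns StatementSyntax; the cast assumes the text is an if statement.
        var ifStatement = (IfStatementSyntax)SyntaxFactory.ParseStatement(statementText);

        return (ifStatement.Condition.ToString(),
                ifStatement.Statement.ToString(),
                // Else is null when there is no else clause; its body is Else.Statement.
                ifStatement.Else?.Statement.ToString());
    }

    public static void Main()
    {
        var parts = Split("if (count > 100) Console.WriteLine(\"big\"); else Console.WriteLine(\"small\");");
        Console.WriteLine(parts.Condition); // count > 100
        Console.WriteLine(parts.Then);      // Console.WriteLine("big");
        Console.WriteLine(parts.Else);      // Console.WriteLine("small");
    }
}
```

SyntaxNode.ToString() returns the node's text without its surrounding trivia, which is why the spaces around the branches do not appear in the output.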

The abstract SyntaxNode class provides methods that apply to all nodes. Here are some of them:
  • ChildNodes - returns the current node's direct child nodes;
  • DescendantNodes - returns all nodes that are descendants of the current node;
  • Contains - checks whether the node passed as an argument is the current node itself or one of its descendants;
  • IsKind - takes a SyntaxKind enumeration value and returns true if the current node is of that kind.
The class also defines a number of properties. For example, the Parent property is frequently used. It returns a reference to the parent node.
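The members listed above can be tried on a small class. This is a minimal sketch, assuming the Microsoft.CodeAnalysis.CSharp NuGet package is referenced; NodeMethodsDemo and Describe are hypothetical names. It finds the if statement and the method call inside it, then queries ChildNodes, Contains, IsKind and Parent.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical demo class exercising a few SyntaxNode members.
public static class NodeMethodsDemo
{
    public static List<string> Describe(string code)
    {
        SyntaxNode root = CSharpSyntaxTree.ParseText(code).GetRoot();

        // DescendantNodes walks the whole subtree; OfType filters by node class.
        var ifStatement = root.DescendantNodes().OfType<IfStatementSyntax>().First();
        var call = root.DescendantNodes().OfType<InvocationExpressionSyntax>().First();

        return new List<string>
        {
            // ChildNodes: only the direct children (here the condition and the body).
            string.Join(", ", ifStatement.ChildNodes().Select(n => n.Kind())),
            // Contains: is the call located inside the if statement's subtree?
            ifStatement.Contains(call).ToString(),
            // IsKind: does the node's own kind equal the SyntaxKind passed in?
            ifStatement.IsKind(SyntaxKind.IfStatement).ToString(),
            // Parent: one level up, which is the method body block.
            ifStatement.Parent?.Kind().ToString() ?? "none",
        };
    }

    public static void Main()
    {
        const string code =
            "class C { void M(int n) { if (n > 0) { System.Console.WriteLine(n); } } }";
        foreach (string line in Describe(code))
            Console.WriteLine(line);
    }
}
```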

Syntax Tokens


Syntax tokens (further referred to as tokens) are terminals of the language grammar: elements that cannot be broken down any further. They include identifiers, keywords, literals, and special characters. Code analysis uses tokens much less often than nodes. Analyzers typically access tokens to get their text or to check their kind.
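Both uses mentioned above, reading a token's text and checking its kind, fit in a few lines. This is a sketch assuming the Microsoft.CodeAnalysis.CSharp NuGet package; TokenDemo, Identifiers and Keywords are hypothetical helper names.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical helpers showing how an analyzer might inspect tokens.
public static class TokenDemo
{
    // Text of every identifier token, selected by checking the token kind.
    public static string[] Identifiers(string statementText) =>
        SyntaxFactory.ParseStatement(statementText)
            .DescendantTokens()
            .Where(t => t.IsKind(SyntaxKind.IdentifierToken))
            .Select(t => t.Text)
            .ToArray();

    // Text of every keyword token; SyntaxFacts.IsKeywordKind classifies the kind.
    public static string[] Keywords(string statementText) =>
        SyntaxFactory.ParseStatement(statementText)
            .DescendantTokens()
            .Where(t => SyntaxFacts.IsKeywordKind(t.Kind()))
            .Select(t => t.Text)
            .ToArray();

    public static void Main()
    {
        const string code = "if (number > 0) { return number; }";
        Console.WriteLine(string.Join(" ", Identifiers(code))); // number number
        Console.WriteLine(string.Join(" ", Keywords(code)));    // if return
    }
}
```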

Syntax Trivia


Syntax trivia (additional syntax information) covers the tree elements that are not compiled into IL code: formatting elements (spaces, new line characters), comments, and preprocessor directives.

Trivia is attached to tokens. There are two kinds: leading trivia and trailing trivia. Leading trivia is the additional syntax information that precedes a token (marked white in the image above). Trailing trivia (marked gray) is the additional syntax information that follows a token.
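The following sketch shows where comments end up, assuming the Microsoft.CodeAnalysis.CSharp NuGet package; TriviaDemo, Code and Comments are hypothetical names. A comment before the first token becomes leading trivia of that token, a comment after ";" on the same line becomes trailing trivia of ";", and because comments are trivia rather than nodes, they are collected with DescendantTrivia.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical helpers showing where trivia lives in the tree.
public static class TriviaDemo
{
    public const string Code = "/* header */\nclass C\n{\n    int x = 1; // one\n}\n";

    // Comments are trivia, so they are found via DescendantTrivia, not DescendantNodes.
    public static string[] Comments(string code) =>
        CSharpSyntaxTree.ParseText(code).GetRoot()
            .DescendantTrivia()
            .Where(t => t.IsKind(SyntaxKind.SingleLineCommentTrivia)
                     || t.IsKind(SyntaxKind.MultiLineCommentTrivia))
            .Select(t => t.ToString())
            .ToArray();

    public static void Main()
    {
        SyntaxNode root = CSharpSyntaxTree.ParseText(Code).GetRoot();
        SyntaxToken classKeyword = root.DescendantTokens().First(t => t.IsKind(SyntaxKind.ClassKeyword));
        SyntaxToken semicolon = root.DescendantTokens().First(t => t.IsKind(SyntaxKind.SemicolonToken));

        // "/* header */" precedes the first token, so it is leading trivia of "class".
        Console.WriteLine(classKeyword.LeadingTrivia.Any(t => t.IsKind(SyntaxKind.MultiLineCommentTrivia)));
        // "// one" follows ";" on the same line, so it is trailing trivia of ";".
        Console.WriteLine(semicolon.TrailingTrivia.Any(t => t.IsKind(SyntaxKind.SingleLineCommentTrivia)));

        Console.WriteLine(string.Join(" | ", Comments(Code)));
    }
}
```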

How to Create a Diagnostic Rule: Tree Nodes


Now let's create a diagnostic rule based on tree node analysis. Here is a useful one: the diagnostic detects code fragments where an if statement's then and else branches are identical. You might say no one makes such mistakes. However, this error is surprisingly common; just take a look at the errors of this kind found in open-source projects.

We'll use the "Standalone Code Analysis Tool" project template as the base for our analyzer. When creating the analyzer project, specify "MyAnalyzer" in the "Project name" field. Then remove all methods except Main from the project's Program.cs file, and remove all code from inside the Main method. Next, create a new solution with a console application project named "MyTest". This is the project that the "MyAnalyzer" static analyzer will check. Add the following method to the "MyTest" project's Program.cs file:
public static void MyFunc1(int count)
{
    if (count > 100)
    {
        Console.WriteLine("Hello world!");
    }
    else
    {
        Console.WriteLine("Hello world!");
    }
}
This code should trigger our diagnostic. Now add the GetProjectFromSolution method to the "MyAnalyzer" project. This method opens the test solution and returns a reference to the "MyTest" project:
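public static Project GetProjectFromSolution(string solutionPath)
{
    // Locate an MSBuild instance before MSBuildWorkspace loads any MSBuild
    // assemblies. RegisterDefaults may only be called once per process.
    MSBuildLocator.RegisterDefaults();
    MSBuildWorkspace workspace = MSBuildWorkspace.Create();
    Solution currSolution = workspace.OpenSolutionAsync(solutionPath)
                                     .Result;

    // The test solution contains exactly one project (MyTest);
    // Single() throws if there are more.
    return currSolution.Projects.Single();
}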
Add a static field named warnings of the StringBuilder type. It will store the error messages:
public static StringBuilder warnings = new StringBuilder();
The analyzer's Main method will read like this:
static void Main(string[] args)
{
    string solutionPath = @"D:\Test\TestApp.sln";
    string logPath = @"D:\Test\warnings.txt";
    Project project = GetProjectFromSolution(solutionPath);
 
    foreach (var document in project.Documents)
    {
        var tree = document.GetSyntaxTreeAsync().Result;
        var ifStatementNodes = tree.GetRoot()
                                   .DescendantNodesAndSelf()
                                   .OfType<IfStatementSyntax>();
 
        foreach (var ifStatement in ifStatementNodes)
        {
 
            if (ApplyRule(ifStatement))
            {
                int lineNumber = ifStatement.GetLocation()
                                            .GetLineSpan()
                                            .StartLinePosition.Line + 1;
 
                warnings.AppendLine($"'if' with equal 'then' and " +
                                    $"'else' blocks is found in file" +
                                    $" {document.FilePath} at line" +
                                    $" {lineNumber}");
            }
        }
    }
 
    if (warnings.Length != 0)
        File.AppendAllText(logPath, warnings.ToString());
 
}
If you read the previous article on creating a static analyzer from a Visual Studio project template, the code of the Main method above should look familiar. Still, let's go through the necessary steps. In the Main method, we get a reference to a Project object and iterate over its Documents collection with a foreach loop. For each Document, we get its syntax tree and the tree's root. Then we use a LINQ query to get all IfStatementSyntax nodes under that root. Finally, we iterate through this list and run our diagnostic (the ApplyRule method) on each node. If the diagnostic detects that the node's then block is identical to its else block, we add a warning with the file path and line number to warnings, which is written to the log file at the end.

Here's what the ApplyRule method looks like in our case:
public static bool ApplyRule(IfStatementSyntax ifStatement)
{
    if (ifStatement.Else == null)
        return false;
 
    StatementSyntax thenBody = ifStatement.Statement;
    StatementSyntax elseBody = ifStatement.Else.Statement;
 
    return SyntaxFactory.AreEquivalent(thenBody, elseBody);
}
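Note how the ApplyRule method works. It returns false for an if statement without an else branch; otherwise, it compares the two branches by calling the SyntaxFactory.AreEquivalent method. The SyntaxFactory class belongs to the Microsoft.CodeAnalysis.CSharp.dll assembly. The AreEquivalent method checks whether two nodes are the same while disregarding trivia differences, such as whitespace and comments.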
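You can check the rule without opening an MSBuild solution by parsing source text directly. The following is a minimal sketch, assuming the Microsoft.CodeAnalysis.CSharp NuGet package; ApplyRuleDemo and Check are hypothetical names, and ApplyRule is the article's rule condensed into one expression. The first if statement is flagged even though its branches are formatted differently and one contains a comment, because AreEquivalent ignores trivia; the second is not flagged because its branches differ.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical harness for running the rule on in-memory source.
public static class ApplyRuleDemo
{
    // The article's rule: an else branch that is equivalent to the then branch.
    public static bool ApplyRule(IfStatementSyntax ifStatement) =>
        ifStatement.Else != null &&
        SyntaxFactory.AreEquivalent(ifStatement.Statement, ifStatement.Else.Statement);

    // Returns the 1-based line numbers of suspicious if statements.
    public static int[] Check(string source) =>
        CSharpSyntaxTree.ParseText(source).GetRoot()
            .DescendantNodes()
            .OfType<IfStatementSyntax>()
            .Where(ApplyRule)
            .Select(s => s.GetLocation().GetLineSpan().StartLinePosition.Line + 1)
            .ToArray();

    public static void Main()
    {
        const string source = @"class C
{
    void M(int count)
    {
        if (count > 100) { Console.WriteLine(""Hello world!""); }
        else
        {
            // Different formatting and a comment: still equivalent.
            Console.WriteLine(""Hello world!"");
        }

        if (count > 0) { Console.WriteLine(1); }
        else { Console.WriteLine(2); }
    }
}";
        Console.WriteLine(string.Join(", ", Check(source))); // 5
    }
}
```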

For a complete picture, here's "MyAnalyzer's" complete code:
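The listing itself is not preserved in this copy of the article. The following is a reconstruction that simply assembles the fragments shown above into one Program.cs, with the using directives they need. It assumes the NuGet packages the "Standalone Code Analysis Tool" template references (Microsoft.Build.Locator, Microsoft.CodeAnalysis.Workspaces.MSBuild and the C# Roslyn packages) and the MyAnalyzer namespace; it is a sketch, not the author's original file.

```csharp
using System.IO;
using System.Linq;
using System.Text;
using Microsoft.Build.Locator;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.MSBuild;

namespace MyAnalyzer
{
    class Program
    {
        // Collected warning messages.
        public static StringBuilder warnings = new StringBuilder();

        static void Main(string[] args)
        {
            string solutionPath = @"D:\Test\TestApp.sln";
            string logPath = @"D:\Test\warnings.txt";
            Project project = GetProjectFromSolution(solutionPath);

            foreach (var document in project.Documents)
            {
                var tree = document.GetSyntaxTreeAsync().Result;
                var ifStatementNodes = tree.GetRoot()
                                           .DescendantNodesAndSelf()
                                           .OfType<IfStatementSyntax>();

                foreach (var ifStatement in ifStatementNodes)
                {
                    if (ApplyRule(ifStatement))
                    {
                        int lineNumber = ifStatement.GetLocation()
                                                    .GetLineSpan()
                                                    .StartLinePosition.Line + 1;

                        warnings.AppendLine("'if' with equal 'then' and 'else' blocks is " +
                                            $"found in file {document.FilePath} at line {lineNumber}");
                    }
                }
            }

            if (warnings.Length != 0)
                File.AppendAllText(logPath, warnings.ToString());
        }

        public static Project GetProjectFromSolution(string solutionPath)
        {
            // Must run before MSBuildWorkspace loads any MSBuild assemblies.
            MSBuildLocator.RegisterDefaults();
            MSBuildWorkspace workspace = MSBuildWorkspace.Create();
            Solution currSolution = workspace.OpenSolutionAsync(solutionPath).Result;
            return currSolution.Projects.Single();
        }

        public static bool ApplyRule(IfStatementSyntax ifStatement)
        {
            if (ifStatement.Else == null)
                return false;

            StatementSyntax thenBody = ifStatement.Statement;
            StatementSyntax elseBody = ifStatement.Else.Statement;

            return SyntaxFactory.AreEquivalent(thenBody, elseBody);
        }
    }
}
```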

How to Create a Diagnostic Rule: CSharpSyntaxWalker


In the example above, we accessed the tree root and used a LINQ query to retrieve the IfStatementSyntax nodes required for analysis. But there is an alternative, and more elegant, solution: the CSharpSyntaxWalker class. CSharpSyntaxWalker is an abstract class. Its Visit method takes a node (usually the root of a syntax tree) and traverses that node's whole subtree. The CSharpSyntaxWalker class defines many methods that are called when a node of a particular type is visited. For example, when the walker visits an IfStatementSyntax node, it calls the VisitIfStatement method.

Our task, then, is to create a class derived from CSharpSyntaxWalker and override the method that is called when the walker visits the language construct we want to analyze:
public class IfWalker : CSharpSyntaxWalker
{
    public override void VisitIfStatement(IfStatementSyntax node)
    {
        // ApplyRule and warnings are the static members declared in Program.
        if (Program.ApplyRule(node))
        {
            int lineNumber = node.GetLocation()
                                 .GetLineSpan()
                                 .StartLinePosition.Line + 1;

            Program.warnings.AppendLine($"'if' with equal 'then' and " +
                                        $"'else' blocks is found in file" +
                                        $" {node.SyntaxTree.FilePath} at line" +
                                        $" {lineNumber}");
        }

        // Keep walking so that nested if statements are visited too.
        base.VisitIfStatement(node);
    }
}
Note that the overridden VisitIfStatement method calls the base.VisitIfStatement method. This is necessary so that our IfWalker type object visits all of the current node's children.
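The effect of that base call is easy to observe with a walker that just counts if statements. This is a sketch assuming the Microsoft.CodeAnalysis.CSharp NuGet package; IfCounter, BaseCallDemo and the callBase switch are hypothetical and exist only for illustration. With the base call, the outer if, the nested if, and the "else if" are all counted; without it, only the outer one is.

```csharp
using System;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical walker that counts if statements; callBase switches the base call on/off.
public class IfCounter : CSharpSyntaxWalker
{
    private readonly bool callBase;

    public IfCounter(bool callBase) => this.callBase = callBase;

    public int Count { get; private set; }

    public override void VisitIfStatement(IfStatementSyntax node)
    {
        Count++;

        // Without this call the walker does not descend into the node's children,
        // so nested if statements (including "else if") are never visited.
        if (callBase)
            base.VisitIfStatement(node);
    }
}

public static class BaseCallDemo
{
    public static int CountIfs(string code, bool callBase)
    {
        var counter = new IfCounter(callBase);
        counter.Visit(CSharpSyntaxTree.ParseText(code).GetRoot());
        return counter.Count;
    }

    public static void Main()
    {
        const string code =
            "class C { void M(int a) { if (a > 0) { if (a > 10) { } } else if (a < -10) { } } }";
        Console.WriteLine(CountIfs(code, callBase: true));  // 3
        Console.WriteLine(CountIfs(code, callBase: false)); // 1
    }
}
```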

Let's create a method that uses an IfWalker type object to traverse a tree:
public static void StartWalker(SyntaxNode syntaxNode)
{
    var walker = new IfWalker();
    walker.Visit(syntaxNode);
}
Here's what our Main method will look like then:
static void Main(string[] args)
{
    string solutionPath = @"D:\Test\TestApp.sln";
    string logPath = @"D:\Test\warnings.txt";
    Project project = GetProjectFromSolution(solutionPath);
 
    foreach (var document in project.Documents)
    {
        warnings.Clear();
        var tree = document.GetSyntaxTreeAsync().Result;
        StartWalker(tree.GetRoot());
 
        if (warnings.Length != 0)
            File.AppendAllText(logPath, warnings.ToString());
    }
}
It is up to you to decide how to get tree nodes for analysis. You can use a LINQ query or override the CSharpSyntaxWalker class methods; choose whichever suits your current task best. I recommend overriding the CSharpSyntaxWalker methods if your analyzer contains many diagnostic rules. If your analysis tool is very simple and processes nodes of only one type, a LINQ query is enough.
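To illustrate why a walker scales better to many rules, here is a sketch of one walker that checks two rules in a single traversal, assuming the Microsoft.CodeAnalysis.CSharp NuGet package. The first rule is the article's if/else check; the second, identical branches of the ?: operator, is a hypothetical rule added only to show a second override. MultiRuleWalker and MultiRuleDemo are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical walker: each rule lives in the Visit method for its node type.
public class MultiRuleWalker : CSharpSyntaxWalker
{
    public List<string> Warnings { get; } = new List<string>();

    // Rule 1 (from the article): identical then/else blocks.
    public override void VisitIfStatement(IfStatementSyntax node)
    {
        if (node.Else != null && SyntaxFactory.AreEquivalent(node.Statement, node.Else.Statement))
            Warnings.Add($"identical if/else branches at line {Line(node)}");
        base.VisitIfStatement(node);
    }

    // Rule 2 (hypothetical, for illustration): identical branches of the ?: operator.
    public override void VisitConditionalExpression(ConditionalExpressionSyntax node)
    {
        if (SyntaxFactory.AreEquivalent(node.WhenTrue, node.WhenFalse))
            Warnings.Add($"identical ?: branches at line {Line(node)}");
        base.VisitConditionalExpression(node);
    }

    private static int Line(SyntaxNode node) =>
        node.GetLocation().GetLineSpan().StartLinePosition.Line + 1;
}

public static class MultiRuleDemo
{
    public static List<string> Run(string code)
    {
        var walker = new MultiRuleWalker();
        walker.Visit(CSharpSyntaxTree.ParseText(code).GetRoot());
        return walker.Warnings;
    }

    public static void Main()
    {
        const string code = "class C\n{\n    int M(bool b)\n    {\n        if (b) { return 1; } else { return 1; }\n    }\n\n    int N(bool b) => b ? 2 : 2;\n}\n";
        foreach (string warning in Run(code))
            Console.WriteLine(warning);
    }
}
```

Each new rule becomes one more override, while the tree is still traversed only once.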

Summary


In this article, we learned more about Roslyn's syntax trees and their elements, saw how to traverse a syntax tree using two different techniques, and wrote a diagnostic that detects if statements whose then block is identical to the else block.

Credits

Article Type : Guest Article
Author : Ilya Gainulin
Tags : CSharp, Knowledge
Article Date : 26-02-2021
Article Publish Date : 07-04-2021
Note : All content of this article is copyright of its author.

Codingvila provides articles and blogs on web and software development for beginners as well as free Academic projects for final year students in Asp.Net, MVC, C#, Vb.Net, SQL Server, Angular Js, Android, PHP, Java, Python, Desktop Software Application and etc.
