Introduction

Learning a new programming language is always hard. You have to read the documentation, stop, go to the IDE of your choice, start a new project, compile and run the program and see the results. Stop. Restart. After some time, you have no more patience for the process and start to skip some steps (maybe I don’t need to run this sample here…) until the point that you try to find an easier way, but most of the time there is no easier way.

Well, now there is an easier way. The dotnet team has created a new tool, called dotnet try that allows you to have documentation mixed with a code sample window, where you can try the code while reading the documentation. Cool, no?

Starting with dotnet try

To use this tool, you must install .NET Core on your machine and then install dotnet try with:

dotnet tool install -g dotnet-try

Once you do that, you can use the tool with dotnet try. But the best way to start is to use the tool’s own demo with dotnet try demo. This will load the tutorial, start a local server and open the browser with it:

There you can see the documentation side-by-side with the code. You can change the code, click on the arrow button and look at the results. Nice! You can follow the tutorial and learn how to create your documentation easily. Or you can continue reading and I will tell you how to use it (sorry, no dotnet try here :-)).

If you want to learn something about .NET, you can run the samples. Just clone the dotnet/try-samples repository, change to the cloned folder and type dotnet try to open it.
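
If you want the exact commands, the sequence looks like this (assuming git and the dotnet-try tool are already installed):

git clone https://github.com/dotnet/try-samples
cd try-samples
dotnet try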

There you can see some tutorials:

  • Beginner’s guide
  • 101 Linq Samples
  • C# 7
  • C# 8

With that, you can start playing with the tool. I recommend checking the Linq samples (there is always something to learn about Linq) and the new features in C# 7 and 8.

Once you’ve played with the tool, it’s time to create your own documentation.

Creating documentation for your code

All the documentation is based on the Markdown language. It’s a very simple way to create your documentation; if you don’t know Markdown yet, you can learn it here.

Create a new folder and, in this folder, create a readme.md file and add this code:

# Documenting your code with **dotnet try**
## Introduction
To document your code with dotnet try, you must mix your markdown text with some fences

Once you save it and run dotnet try, you can see something like this in the browser:

Now you can start adding code. To add the code window, we must add a code fence, a block of code delimited by triple backticks. You can add code directly to the code fence with something like this:

using System;

namespace dotnettrypost
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Hello World!");
        }
    }
}

This is great, but it can pose a problem: if you need to change something in your code, you must also change the markdown file, and this is less than optimal, as at some point you will forget to update one of the two places. But there is a better way: you can use the source code as the only source of truth – when you change the source code, the code in the page changes accordingly. To do this, you can create a console app with

dotnet new console

This creates a project that writes Hello World to the console. It’s pretty simple, but it fits our needs. Now, we will document it. You can add the following code at the end of the readme.md file to show the contents of the program in an editable form:

Below, you should see the code in the program.cs file:
```cs --source-file ./Program.cs --project ./dotnettrypost.csproj
```

In this code fence, you must add the name of the file and the project. If you run dotnet try again, you will see something like this:

You can click the arrow to run the code. The first time, it will take a little while to compile, but subsequent runs will be faster. You can change the code and see the results:

Here you have a safe environment where you can try changes in the code. If you make a mistake, the window will point it out, so you can fix it.

As you can see, the code for the entire file is shown, but sometimes that’s not what you want: you may just want to show a snippet from the file. To do that, you must create regions in your code. In the command prompt, type code . to open Visual Studio Code (if you don’t have it installed, get it at https://code.visualstudio.com/download). Then add a region to your code:

using System;

namespace dotnettrypost
{
    class Program
    {
        static void Main(string[] args)
        {
            #region HelloWorld
            Console.WriteLine("Hello World!");
            #endregion
        }
    }
}

 

Save the file and then edit the Readme.md file to change the code fence:

```cs --source-file ./Program.cs --project ./dotnettrypost.csproj --region HelloWorld
```

When you save the file and refresh the page, you will see something like this:

Creating regions in your code and referencing them in the code fence allows you to separate the code you want in the code window. That way, a single file can provide code for many code windows.
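
For example, assuming Program.cs also contains a ByeBye region (as in the version shown later in this post), a single readme.md could expose two independent code windows:

Below, the HelloWorld snippet:
```cs --source-file ./Program.cs --project ./dotnettrypost.csproj --region HelloWorld
```

And the ByeBye snippet:
```cs --source-file ./Program.cs --project ./dotnettrypost.csproj --region ByeBye
```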

If you don’t want a runnable window, but still want it synchronized with the code, you can use the editable parameter:

```cs --source-file ./Program.cs --project ./dotnettrypost.csproj --region HelloWorld --editable false
```

When you are running the code, even if you aren’t showing the full code, everything in the program is executed. For example, if you change the source code to:

using System;
namespace dotnettrypost
{
    class Program
    {
        static void Main(string[] args)
        {
            #region HelloWorld
            Console.WriteLine("Hello World!");
            #endregion
            Console.WriteLine("This won't be shown in the snippet but will be shown when you run");
        }
    }
}

You will see the same code snippet but when you run it you will see something like this:

If you don’t want this, you need to do some special treatment in your code: you must make each snippet of code run alone. You can do that by processing the parameters passed to the program:

using System;

namespace dotnettrypost
{
    class Program
    {
        static void Main(string[] args)
        {
            for (var i = 0; i < args.Length; i++)
                if (args[i] == "--region")
                {
                    if (args[i + 1] == "HelloWorld")
                        HelloWorld();
                    else if (args[i + 1] == "ByeBye")
                        ByeBye();
                }
        }

        private static void HelloWorld()
        {
            #region HelloWorld
            Console.WriteLine("Hello World!");
            #endregion
        }

        private static void ByeBye()
        {
            #region ByeBye
            Console.WriteLine("ByeBye!");
            #endregion
        }
    }
}

Now, when you run the code, it will run only the code for the HelloWorld snippet. As you can see, there are a lot of options to show and run code in the web page. I see many ways to improve documentation and experimentation with new APIs. And you, doesn’t this give you new ideas?

All the source code for this project is at https://github.com/bsonnino/dotnettry

One of the perks of being an MVP is receiving some tools for my own use, so I can evaluate them and, if I find them valuable, use them on a daily basis. In my work as an architect/consultant, a task I often face is to analyze an application and suggest changes to make it more robust and maintainable. One tool that can help me in this task is NDepend (https://www.ndepend.com/). With this tool, you can analyze your code, verify its dependencies, set coding rules and check how code changes are affecting the quality of your application. For this article, we will be analyzing eShopOnWeb (https://github.com/dotnet-architecture/eShopOnWeb), a sample application created by Microsoft to demonstrate architectural patterns on .NET Core. It’s an ASP.NET Core MVC 2.2 app that shows a sample shopping site, which can run on premises, in Azure or in containers, using a microservices architecture. It has a companion book that describes the application architecture and can be downloaded at https://aka.ms/webappebook.

When you download, compile and run the app, you will see something like this:

You have a full-featured shopping app, with everything you would expect in this kind of app. The next step is to start analyzing it.

Download NDepend (there is a 14-day trial to evaluate it) and install it on your machine; it will install as an add-in to Visual Studio. Then, in Visual Studio, select Extensions/NDepend/Attach new NDepend Project to current VS Solution. A window like this opens:

It already has the projects in the solution selected, so you can click on Analyze 3 .NET Assemblies. After it runs, it opens a web page with a report of its findings:

This report has an analysis of the project, where you can verify the problems NDepend found, the project dependencies, and drill down in the issues. At the same time, a window like this opens in Visual Studio:

If you want a more dynamic view, you can view the dashboard:

 

In the dashboard, you have a full analysis of your application: lines of code, methods, assemblies and so on. One interesting metric there is the technical debt, where you can see how much technical debt there is in the app and how long it will take to fix it (in this case, we have 3.72% of technical debt and 2 days to fix it). We also have the code metrics and the violated coding rules. If you click on an item in the dashboard, like the # of Lines, you will see the details in the properties window:

If we take a look at the dashboard, we’ll see some issues that should be addressed. In the Quality Gates, we have two gates that fail. By clicking on the number 2 there, we see this in the Quality Gates evolution window:

If we hover the mouse on one of the failed issues, we get a tooltip that explains the failure:

If we double-click on the failure, we drill down to what caused it:

If we click on the second issue, we see that there are two names used by different types: Basket and IEmailSender:

Basket is the name of a class both in Microsoft.eShopWeb.Web.Pages.Shared.Components.BasketComponent and in Microsoft.eShopWeb.ApplicationCore.Entities.BasketAggregate.

One other thing that you can see is the dependency graph:

With it, you can see how the assemblies relate to each other and get a first look at the architecture. If you filter the graph to show only the application assemblies, you have a good overview of what’s happening:

The largest assembly is Web, followed by Infrastructure and ApplicationCore. The application is well layered: there are no cyclic calls between assemblies (assembly A calling assembly B, which calls A back), there is a weak relation between Web and Infrastructure (given by the width of the line that joins them) and a strong one between Web and ApplicationCore. If we have a large solution with many assemblies, just with that diagram we can get a picture of what’s happening and whether we are doing things right. The next step is to go into the details and look at the assembly dependencies. You can hover over an assembly and get some info about it:

For example, the Web assembly has 58 issues detected, with an estimated time of 1 day to fix them. This is an estimate calculated by NDepend using the complexity of the methods that must be fixed, but you can set your own formula to calculate the technical debt if this doesn’t suit your team.
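
NDepend rules and debt estimates are written in CQLinq, LINQ-style queries over the code model. Purely as an illustration of the syntax (this is not one of the rules used here, and the threshold is arbitrary), a custom rule could look like this:

// <Name>Methods too complex (illustrative)</Name>
warnif count > 0
from m in JustMyCode.Methods
where m.CyclomaticComplexity > 20
orderby m.CyclomaticComplexity descending
select new { m, m.CyclomaticComplexity }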

Now that we got an overview of the project, we can start fixing the issues. Let’s start with the easiest ones :-).  The Infrastructure assembly has only two issues and a debt of 20 minutes. In the diagram, we can right-click on the Infrastructure assembly and select Select Issues/On Me and on Child Code elements. This will open the issues in the Queries and Rules Edit window, at the right:

We can then open the tree and double-click on the second issue. It will point you to the rule "Non-static classes should be instantiated or turned to static", flagging the SpecificationEvaluator<T> class. This class has only one static method and is referenced in only one place, so there’s no harm in making it static. Once you make it static, build the app and run the analysis again, and you will see this:

Oh-oh. We fixed an issue and introduced another one, an API breaking change: when we made that class static, we removed the constructor. In this case, it wasn’t really an API change, because nobody would instantiate a class with only static methods, so we should reset the baseline here. Go to the dashboard and, in Choose Baseline, select define:

Then select the most recent analysis and click OK.  That will open the NDepend settings, where you will see the new baseline. Save the settings and rerun the analysis and the error is gone.

We can then open the tree again and double-click on another issue that remains in Infrastructure. That will open the source code, pointing to a readonly declaration of a DbContext. This is not a big issue; it’s only telling us that we are declaring the field as readonly, but the object it stores is mutable, so it can still be changed. This issue is mentioned in the Framework Design Guidelines by Microsoft – https://docs.microsoft.com/en-us/dotnet/standard/design-guidelines/. If you hover the mouse over the issue, there is a tooltip on how to fix it:

We have three ways to fix this issue:

  • Remove the readonly from the field
  • Make the field private and not protected
  • Use an attribute to say “ok, I am aware of this, but I don’t mind”

The first option would suppress the error, but would also remove what we want to express – that this field should not be entirely replaced with another DbContext. The second option would remove the possibility of using the DbContext in derived classes, so I’ll choose the third option and add the attribute. If I right-click on the issue in the Rules and Queries window and select Suppress Issue, a window opens:

All I have to do is to copy the attribute to the clipboard and paste it into the source code. I also have to declare the symbol CODE_ANALYSIS in the project (Project/Properties/Build).
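
For reference, the suppression uses the standard SuppressMessage attribute, which only takes effect when the CODE_ANALYSIS symbol is defined; the category and rule id below are placeholders, the real text must be copied from the Suppress Issue window:

using System.Diagnostics.CodeAnalysis;

public class Example
{
    // Placeholder category/id: copy the exact attribute from the NDepend window
    [SuppressMessage("NDepend", "RuleId:RuleName",
        Justification = "The field stays readonly; the mutable DbContext is intentional")]
    protected readonly object _dbContext = new object();
}

That was easy! Let’s go to the next one.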

The next issue is the use of an obsolete method. Fortunately, the description tells us to use the UseHiLo method instead. We change the call, run the app to make sure nothing is broken, and we’re good. We can run the analysis again and see what happened:

We had a slight decrease in the technical debt: we solved one high issue and one violated rule. As you can see, NDepend not only analyzes your code, it also shows you how your code changes affect its quality. This is very well-architected code (as it should be – it’s an architecture sample), so the issues are minor, but you can see what can be done with NDepend. When you have a messy project, this will surely be an invaluable tool!

 

One thing that happens to me sometimes when I am debugging a program is to see a variable that has changed value without knowing where and when it was changed. That has led me to long and boring debugging sessions, checking every place where the variable could have been changed.

I just wanted some kind of breakpoint that I could set that told me “break when this variable changes value”, but Visual Studio never had such a feature. Until now.

With .NET Core 3.0, Microsoft introduced a new feature in Visual Studio that greatly enhances the debugging experience. The only drawback is that it only works with .NET Core 3.0 apps. But, nevertheless, it’s a great, long-awaited feature!

This feature is a little buried in Visual Studio (no, there is no right-click on a variable name with something like “break when this variable value changes”), so how do I set a data breakpoint?

To show this feature, I will use a .NET Core console app that gets the prime numbers:

static void Main(string[] args)
{
    if (args.Length < 1)
    {
        Console.WriteLine("usage: GetPrimesLessThan <number>");
        return;
    }

    if (!int.TryParse(args[0], out int number))
    {
        Console.WriteLine("parameter should be a valid integer number");
        return;
    }
    var primes = GetPrimesLessThan(number);
    Console.WriteLine($"Found {primes.Length} primes less than {number}");
    Console.WriteLine($"Last prime last than 10000 is {primes.Last()}");
}
private static int[] GetPrimesLessThan(int maxValue)
{
    if (maxValue <= 1)
        return new int[0];
    var primeArray = Enumerable.Range(0, maxValue).ToArray();
    var sizeOfArray = primeArray.Length;

    primeArray[0] = primeArray[1] = 0;

    for (int i = 2; i < Math.Sqrt(sizeOfArray - 1) + 1; i++)
    {
        if (primeArray[i] <= 0) continue;
        for (var j = 2 * i; j < sizeOfArray; j += i)
            primeArray[j] = 0;
    }

    return primeArray.Where(n => n > 0).ToArray();
}

This algorithm uses an array of integers in which an entry is set to zero if the number is not prime. At the end, all non-prime entries in the array are zero, so I can extract the primes with this Linq line:

return primeArray.Where(n => n > 0).ToArray();

Let’s say that I want to know when the number 10 changes to 0. If I didn’t have this feature, I would have to follow the code, set breakpoints and analyze the data for every hit. Not anymore.

Here’s what must be done:

  • Set a breakpoint on the line just after the primeArray declaration
  • Run the code and let it break
  • Open the Locals window and expand the primeArray node
  • Right-click on the node with the value 10
  • Select Break when Value Changes

That’s it! You have set a data breakpoint on the value 10 in the array. Now let the code run; when the value changes, the debugger will break:

There you can analyze your code and see what has changed the variable. Nice, no?

This is a great feature that will greatly enhance my debugging, and I’m glad the Visual Studio team added it. I hope you’ll enjoy it as much as I do.

The source code for the used project is at https://github.com/bsonnino/DataBreakpoint

One book that I recommend reading is Clean Code, by Robert Martin. It is a well-written book with wonderful techniques to create better code and improve your current programs, so they become easier to read, maintain and understand.

While going through it again, I found an excellent opportunity to improve my skills by doing some refactoring: in listing 4.7 there is a prime generator function that he uses to show some refactoring concepts, turning it into listing 4.8. I then thought of doing the same and showing my results here.

We can start with the listing converted to C#. This is a very easy task: the original program is written in Java, but converting it to C# is just a matter of one or two small fixes:

using System;

namespace PrimeNumbers
{
/**
* This class Generates prime numbers up to a user specified
* maximum. The algorithm used is the Sieve of Eratosthenes.
* <p>
* Eratosthenes of Cyrene, b. c. 276 BC, Cyrene, Libya --
* d. c. 194, Alexandria. The first man to calculate the
* circumference of the Earth. Also known for working on
* calendars with leap years and ran the library at Alexandria.
* <p>
* The algorithm is quite simple. Given an array of integers
* starting at 2. Cross out all multiples of 2. Find the next
* uncrossed integer, and cross out all of its multiples.
* Repeat until you have passed the square root of the maximum
* value.
*
* @author Alphonse
* @version 13 Feb 2002 atp
*/
    public class GeneratePrimes
    {
        /**
        * @param maxValue is the generation limit.
*/
        public static int[] generatePrimes(int maxValue)
        {
            if (maxValue >= 2) // the only valid case
            {
                // declarations
                int s = maxValue + 1; // size of array
                bool[] f = new bool[s];
                int i;

                // initialize array to true.
                for (i = 0; i < s; i++)
                    f[i] = true;
                // get rid of known non-primes
                f[0] = f[1] = false;
                // sieve
                int j;
                for (i = 2; i < Math.Sqrt(s) + 1; i++)
                {
                    if (f[i]) // if i is uncrossed, cross its multiples.
                    {
                        for (j = 2 * i; j < s; j += i)
                            f[j] = false; // multiple is not prime
                    }
                }
                // how many primes are there?
                int count = 0;
                for (i = 0; i < s; i++)
                {
                    if (f[i])
                        count++; // bump count.
                }
                int[] primes = new int[count];
                // move the primes into the result
                for (i = 0, j = 0; i < s; i++)
                {
                    if (f[i]) // if prime
                        primes[j++] = i;
                }
                return primes; // return the primes
            }
            else // maxValue < 2
                return new int[0]; // return null array if bad input.
        }
    }
}

The first step is to put in place some tests, so we can be sure that we are not breaking anything while refactoring the code. In the solution, I added a new Class Library project, named it GeneratePrimes.Tests and added the packages NUnit, NUnit3TestAdapter and FluentAssertions to get fluent assertions in a NUnit test project. Then I added these tests:

using NUnit.Framework;
using FluentAssertions;

namespace PrimeNumbers.Tests
{
    [TestFixture]
    public class GeneratePrimesTests
    {
        [Test]
        public void GeneratePrimes0ReturnsEmptyArray()
        {
            var actual = GeneratePrimes.generatePrimes(0);
            actual.Should().BeEmpty();
        }

        [Test]
        public void GeneratePrimes1ReturnsEmptyArray()
        {
            var actual = GeneratePrimes.generatePrimes(1);
            actual.Should().BeEmpty();
        }

        [Test]
        public void GeneratePrimes2ReturnsArrayWith2()
        {
            var actual = GeneratePrimes.generatePrimes(2);
            actual.Should().BeEquivalentTo(new[] { 2 });
        }

        [Test]
        public void GeneratePrimes10ReturnsArray()
        {
            var actual = GeneratePrimes.generatePrimes(10);
            actual.Should().BeEquivalentTo(new[] { 2,3,5,7 });
        }

        [Test]
        public void GeneratePrimes10000ReturnsArray()
        {
            var actual = GeneratePrimes.generatePrimes(10000);
            actual.Should().HaveCount(1229).And.EndWith(9973);
        }
    }
}

These tests check that there are no primes for 0 and 1, that the only prime for 2 is 2, that the primes up to 10 are 2, 3, 5 and 7, and that there are 1229 primes less than 10,000, the largest one being 9973. Once we run the tests, we can see that they pass, and we can start making our changes.

The easiest fix we can do is to revise the comments at the beginning. We don’t need the history of Eratosthenes (you can go to Wikipedia for that), and we don’t need the author and version, thanks to source control :-). The initial comment can be reduced to:

/**
    * This class Generates prime numbers up to a user specified
    * maximum. The algorithm used is the Sieve of Eratosthenes.
    *  https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes   
*/
public class GeneratePrimes
{
    public static int[] generatePrimes(int maxValue)

Then we can invert the initial test to reduce nesting. If we hover the mouse over the line of the first if, an arrow appears at the border, indicating a quick fix:

We can do the quick fix, then eliminate the else clause (don’t forget to remove the extra comments that are not needed):

public static int[] generatePrimes(int maxValue)
{
    if (maxValue < 2) 
        return new int[0]; 

    // declarations
    int s = maxValue + 1; // size of array
    bool[] f = new bool[s];
    int i;

Save the code and check that all tests pass. The next step is to rename the variables:

  • s can be renamed to sizeOfArray
  • f can be renamed as isPrimeArray

Go to the declaration of s, press Ctrl-R-R and rename it to sizeOfArray. Do the same with the f variable. Don’t forget to remove the comments (and to run the tests):

int sizeOfArray = maxValue + 1; 
bool[] isPrimeArray = new bool[sizeOfArray];
int i;

For the next refactorings, we can use the comments as indicators of methods to extract. We can extract the InitializeArray method:

The extracted code isn’t what I expected, so I change it to:

private static bool[] InitializeArray(int sizeOfArray)
{
    bool[] isPrimeArray = new bool[sizeOfArray];
    // initialize array to true.
    for (var i = 0; i < sizeOfArray; i++)
        isPrimeArray[i] = true;
    return isPrimeArray;
}

I can use the code like this:

var isPrimeArray = InitializeArray(sizeOfArray);

After passing the tests, I can refactor the code of InitializeArray to:

private static bool[] InitializeArray(int sizeOfArray)
{
    return Enumerable
        .Range(0, sizeOfArray)
        .Select(n => true)
        .ToArray();
}

The next step is the sieve:

The code for the sieve is really bad:

private static void Sieve(int sizeOfArray, bool[] isPrimeArray, 
    out int i, out int j)
{
    // get rid of known non-primes
    isPrimeArray[0] = isPrimeArray[1] = false;
    for (i = 2; i < Math.Sqrt(sizeOfArray) + 1; i++)
    {
        if (isPrimeArray[i]) // if i is uncrossed, cross its multiples.
        {
            for (j = 2 * i; j < sizeOfArray; j += i)
                isPrimeArray[j] = false; // multiple is not prime
        }
    }
}

It has two out parameters (which, for me, is a code smell) and has an error (the out parameter j must be assigned before the method exits). So we can change it to remove the out parameters and the sizeOfArray parameter:

private static void Sieve(bool[] isPrimeArray)
{
    var sizeOfArray = isPrimeArray.Length;

    isPrimeArray[0] = isPrimeArray[1] = false;

    for (int i = 2; i < Math.Sqrt(sizeOfArray) + 1; i++)
    {
        if (isPrimeArray[i]) // if i is uncrossed, cross its multiples.
        {
            for (int j = 2 * i; j < sizeOfArray; j += i)
                isPrimeArray[j] = false; 
        }
    }
}

Then, we can extract the method to count primes:

CountPrimes has the same flaws as Sieve, so we change it to:

private static int CountPrimes(bool[] isPrimeArray)
{
    var sizeOfArray = isPrimeArray.Length;
    var count = 0;
    for (var i = 0; i < sizeOfArray; i++)
    {
        if (isPrimeArray[i])
            count++; 
    }
    return count;
}

We can refactor it to:

private static int CountPrimes(bool[] isPrimeArray) => 
    isPrimeArray.Count(i => i);

The next step is MovePrimes:

After we tweak the MovePrimes code, we get:

private static int[] MovePrimes(bool[] isPrimeArray, int count)
{
    var sizeOfArray = isPrimeArray.Length;
    var primes = new int[count];
    for (int i = 0, j = 0; i < sizeOfArray; i++)
    {
        if (isPrimeArray[i]) // if prime
            primes[j++] = i;
    }
    return primes;
}

Then we can refactor MovePrimes:

 private static int[] MovePrimes(bool[] isPrimeArray, int count) =>
     isPrimeArray
         .Select((p, i) => new { Index = i, IsPrime = p })
         .Where(v => v.IsPrime)
         .Select(v => v.Index)
         .ToArray();

Notice that we aren’t using the primes count in this case, so we can remove the calculation of the count and the parameter. After some cleaning and name changing, we get:

public static int[] GetPrimes(int maxValue)
{
    if (maxValue < 2)
        return new int[0];

    bool[] isPrimeArray = InitializeArray(maxValue);
    Sieve(isPrimeArray);
    return MovePrimes(isPrimeArray);
}

Much cleaner, no? Now, it’s easier to read the method, the details are hidden, but the code still runs the same way. We have a more maintainable method, and it shows clearly what it does.

But there is one more change we can make here: since we are using only static methods, we can turn them into extension methods by adding the this keyword to their first parameter. For example, if we change MovePrimes and Sieve to:

private static int[] MovePrimes(this bool[] isPrimeArray) =>
    isPrimeArray
        .Select((p, i) => new { Index = i, IsPrime = p })
        .Where(v => v.IsPrime)
        .Select(v => v.Index)
        .ToArray();

private static bool[] Sieve(this bool[] isPrimeArray)
{
    var sizeOfArray = isPrimeArray.Length;

    isPrimeArray[0] = isPrimeArray[1] = false;

    for (int i = 2; i < Math.Sqrt(sizeOfArray) + 1; i++)
    {
        if (isPrimeArray[i]) // if i is uncrossed, cross its multiples.
        {
            for (int j = 2 * i; j < sizeOfArray; j += i)
                isPrimeArray[j] = false;
        }
    }
    return isPrimeArray;
}

We can then change the GetPrimes method to:

public static int[] PrimesSmallerOrEqual(this int maxValue)
{
    if (maxValue < 2)
        return new int[0];

    return maxValue.InitializeArray()
        .Sieve()
        .MovePrimes();
}
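
For this chain to compile, InitializeArray also has to become an extension method on int; the post doesn’t show that step, but a version consistent with the original code (an array of maxValue + 1 entries, all set to true) would be:

private static bool[] InitializeArray(this int maxValue) =>
    Enumerable
        .Range(0, maxValue + 1)
        .Select(n => true)
        .ToArray();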

Cool, no? With this change, the tests become:

public class GeneratePrimesTests
{
    [Test]
    public void GeneratePrimes0ReturnsEmptyArray()
    {
        0.PrimesSmallerOrEqual().Should().BeEmpty();
    }

    [Test]
    public void GeneratePrimes1ReturnsEmptyArray()
    {
        1.PrimesSmallerOrEqual().Should().BeEmpty();
    }

    [Test]
    public void GeneratePrimes2ReturnsArrayWith2()
    {
        2.PrimesSmallerOrEqual()
            .Should().BeEquivalentTo(new[] { 2 });
    }

    [Test]
    public void GeneratePrimes10ReturnsArray()
    {
        10.PrimesSmallerOrEqual()
            .Should().BeEquivalentTo(new[] { 2, 3, 5, 7 });
    }

    [Test]
    public void GeneratePrimes10000ReturnsArray()
    {
        10000.PrimesSmallerOrEqual()
            .Should().HaveCount(1229).And.EndWith(9973);
    }
}

The full code is at https://github.com/bsonnino/PrimeNumbers. Each commit there is a phase of the refactoring.

Sometimes, when we open an Explorer window on our computer, we see red bars on some disks, telling us that the disk is almost full and that we need to do some cleanup. We run the system cleanup, which frees some space, but that isn’t enough to make things better.

So, we try to find the duplicate files on the disk to recover some space, but we have a problem: where are the duplicate files? The first answer is to check the files with the same name and size, but that isn’t enough – files can be renamed and still be duplicates.

So, the best thing to do is to find a way to list all the duplicates on the disk. But how can we do this?

The naive approach is to get all files with the same size and compare them pairwise. But this is really cumbersome: if there are 100 files in a group, there will be 100!/(2!*98!) = 100*99/2 = 4950 comparisons, which has a complexity of O(n^2).

Another approach is to compute a checksum of each file and compare checksums. That way, you still have the O(n^2) complexity, but you have less data to compare (although you have to account for the time spent calculating the checksums). A third approach is to use a dictionary to group the files with the same hash. A dictionary lookup has O(1) complexity, so this approach is O(n).

Now, we only have to choose the checksum. Every checksum has a number of bits and, roughly, the larger the number of bits, the longer it takes to compute. But a larger number of bits also makes wrong results less likely: if you use a CRC16 checksum (16 bits), you have only 65,536 possible values, so the probability of two different files having the same checksum is relatively high. CRC32 allows about 4.3 billion values and thus makes collisions much less likely. You can use other algorithms, like MD5 (128 bits), SHA1 (160 bits) or SHA256 (256 bits), but computing these takes much longer than computing a CRC32. As we are seeking speed rather than perfect accuracy, we’ll use the CRC32 algorithm to compute the hashes. A fast implementation of this algorithm can be found here, and you can use it by installing the Crc32C.NET NuGet package.
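
Just to illustrate the package we’ll be using (a minimal sketch; Crc32CAlgorithm also provides the Compute and Append overloads used later in this post):

using System;
using System.Text;
using Crc32C;

class CrcDemo
{
    static void Main()
    {
        // Compute the CRC32-C of a small in-memory buffer
        var bytes = Encoding.UTF8.GetBytes("Hello World");
        uint crc = Crc32CAlgorithm.Compute(bytes);
        Console.WriteLine($"{crc:X8}");
    }
}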

From there, we can create our program to find and list the duplicates on the disk. In Visual Studio, create a new WPF application. In the Solution Explorer, right-click on the References node, select Manage NuGet Packages and add the WpfFolderBrowser and Crc32C.NET packages. Then add this code to MainWindow.xaml:

<Grid>
    <Grid.RowDefinitions>
        <RowDefinition Height="40"/>
        <RowDefinition Height="*"/>
    </Grid.RowDefinitions>
    <Button Width="85" Height="30" Content="Start" Click="StartClick"
                HorizontalAlignment="Right" Margin="5" Grid.Row="0"/>
    <Grid Grid.Row="1">
        <Grid.RowDefinitions>
            <RowDefinition Height="*"/>
            <RowDefinition Height="30"/>
        </Grid.RowDefinitions>
        <ScrollViewer HorizontalScrollBarVisibility="Disabled">
        <ItemsControl x:Name="FilesList" HorizontalContentAlignment="Stretch">
            <ItemsControl.ItemTemplate>
                <DataTemplate>
                    <Grid HorizontalAlignment="Stretch">
                        <Grid.RowDefinitions>
                            <RowDefinition Height="30" />
                            <RowDefinition Height="Auto" />
                        </Grid.RowDefinitions>
                        <TextBlock Text="{Binding Value[0].Length, StringFormat=N0}"
                                   Margin="5" FontWeight="Bold"/>
                        <TextBlock Text="{Binding Key, StringFormat=X}"
                                   Margin="5" FontWeight="Bold" HorizontalAlignment="Right"/>
                        <ItemsControl ItemsSource="{Binding Value}" Grid.Row="1" 
                                      HorizontalAlignment="Stretch"
                                      ScrollViewer.HorizontalScrollBarVisibility="Disabled"
                                      Background="Aquamarine">
                            <ItemsControl.ItemTemplate>
                                <DataTemplate>
                                    <TextBlock Text="{Binding FullName}" Margin="15,0"  />
                                </DataTemplate>
                            </ItemsControl.ItemTemplate>
                        </ItemsControl>
                    </Grid>
                </DataTemplate>
            </ItemsControl.ItemTemplate>
        </ItemsControl>
        </ScrollViewer>
        <StackPanel Grid.Row="1" Orientation="Horizontal">
            <TextBlock x:Name="TotalFilesText" Margin="5,0" VerticalAlignment="Center"/>
            <TextBlock x:Name="LengthFilesText" Margin="5,0" VerticalAlignment="Center"/>
        </StackPanel>
    </Grid>
</Grid>

In the button’s click event handler, we will open a folder browser dialog and, if the user selects a folder, we will process it, enumerating the files and finding the ones that have the same size. Then we calculate the Crc32 for those files and add them to a dictionary, grouped by hash:

private async void StartClick(object sender, RoutedEventArgs e)
{
    var fbd = new WPFFolderBrowserDialog();
    if (fbd.ShowDialog() != true)
        return;
    FilesList.ItemsSource = null;
    var selectedPath = fbd.FileName;

    var files = await GetPossibleDuplicatesAsync(selectedPath);
    FilesList.ItemsSource = await GetRealDuplicatesAsync(files);
}

The GetPossibleDuplicatesAsync method will enumerate the files and group them by size, returning only the groups that have more than one file:

private async Task<List<IGrouping<long, FileInfo>>> GetPossibleDuplicatesAsync(string selectedPath)
{
    List<IGrouping<long, FileInfo>> files = null;
    await Task.Factory.StartNew(() =>
    {
        files = GetFilesInDirectory(selectedPath)
            .OrderByDescending(f => f.Length)
            .GroupBy(f => f.Length)
            .Where(g => g.Count() > 1)
            .ToList();
    });
    return files;
}

GetFilesInDirectory enumerates the files in the selected directory:

private List<FileInfo> GetFilesInDirectory(string directory)
{
    var files = new List<FileInfo>();
    try
    {
        var directories = Directory.GetDirectories(directory);
        try
        {
            var di = new DirectoryInfo(directory);
            files.AddRange(di.GetFiles("*"));
        }
        catch
        {
            // ignore directories whose files we can't read
        }
        foreach (var dir in directories)
        {
            files.AddRange(GetFilesInDirectory(System.IO.Path.Combine(directory, dir)));
        }
    }
    catch
    {
        // ignore directories we don't have access to
    }

    return files;
}

After we have the duplicate files grouped, we can search the real duplicates with GetRealDuplicatesAsync:

private static async Task<Dictionary<uint,List<FileInfo>>> GetRealDuplicatesAsync(
    List<IGrouping<long, FileInfo>> files)
{
    var dictFiles = new Dictionary<uint, List<FileInfo>>();
    await Task.Factory.StartNew(() =>
    {
        foreach (var file in files.SelectMany(g => g))
        {
            var hash = GetCrc32FromFile(file.FullName);
            if (hash == 0)
                continue;
            if (dictFiles.ContainsKey(hash))
                dictFiles[hash].Add(file);
            else
                dictFiles.Add(hash, new List<FileInfo>(new[] { file }));
        }
    });
    return dictFiles.Where(p => p.Value.Count > 1).ToDictionary(p => p.Key, p => p.Value);
}

The GetCrc32FromFile method will use the Crc32C library to compute the Crc32 hash of the file. Note that we can’t compute the hash in one pass by reading the whole file, as this would fail for files larger than 2 GB. So, we read chunks of 10,000 bytes and process them:

public static uint GetCrc32FromFile(string fileName)
{
    try
    {
        using (FileStream file = new FileStream(fileName, FileMode.Open))
        {
            const int NumBytes = 10000;
            var bytes = new byte[NumBytes];
            var numRead = file.Read(bytes, 0, NumBytes);
            if (numRead == 0)
                return 0;
            var crc = Crc32CAlgorithm.Compute(bytes, 0, numRead);
            while (numRead > 0)
            {
                numRead = file.Read(bytes, 0, NumBytes);
                if (numRead > 0)
                    crc = Crc32CAlgorithm.Append(crc, bytes, 0, numRead);
            }
            return crc;
        }
    }
    catch (Exception ex) when (ex is UnauthorizedAccessException || ex is IOException)
    {
        return 0;
    }
}

Now, when you run the app, you will get something like this:

You can then check which files you want to remove and delete them in Explorer. But there is one issue here: the time to compute the hashes is very long, especially if you have a lot of data to process (large files, a large number of files, or both). Could it be improved?

This issue is somewhat complicated to solve. Fortunately, .NET provides us with an excellent tool to improve performance in this case: parallel programming. By making a small change in the code, we can calculate the CRC of the files in parallel, thus improving performance. But there is a catch: we are using classes that are not thread-safe. If you use the common Dictionary and List to store the data, you will end up with wrong results. But, once again, .NET comes to the rescue: it provides ConcurrentDictionary and ConcurrentBag as replacements for the common classes, so we can store the data in a thread-safe way. We can then change the code to this:

private static async Task<Dictionary<uint, List<FileInfo>>> GetRealDuplicatesAsync(
    List<IGrouping<long, FileInfo>> files)
{
    var dictFiles = new ConcurrentDictionary<uint, ConcurrentBag<FileInfo>>();
    await Task.Factory.StartNew(() =>
    {
        Parallel.ForEach(files.SelectMany(g => g), file =>
        {
            var hash = GetCrc32FromFile(file.FullName);
            if (hash != 0)
            {
                // GetOrAdd avoids a race between checking for the key and adding it
                dictFiles.GetOrAdd(hash, _ => new ConcurrentBag<FileInfo>()).Add(file);
            }
        });
    });
    return dictFiles.Where(p => p.Value.Count > 1)
        .OrderByDescending(p => p.Value.First().Length)
        .ToDictionary(p => p.Key, p => p.Value.ToList());
}

When we do that and run our program again, we will see that more CPU is used for the processing and the time to get the list drops from 78 seconds to 46 seconds (for 18 GB of duplicate files).

Conclusions

With this program, we can show the largest duplicates in a folder and see what can be safely deleted from our disk, thus reclaiming some space (in our case, potentially 9 GB). We’ve also optimized the code by parallelizing the calculations using the parallel extensions in .NET.

The source code for this article is in https://github.com/bsonnino/FindDuplicates

Sometimes you need to parse some HTML data, process it and present it to the user. That may be a daunting task, as some pages can be very complex and difficult to parse.

For that, you can use an excellent tool named HTML Agility Pack. With it, you can parse HTML from a string, a file, a web site or even from a WebBrowser control: you can add a WebBrowser to your app, navigate to a URL and parse the data from there.
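
To give an idea of the API before we start, parsing HTML from a plain string and querying it with XPath takes only a few lines (a minimal sketch using the HtmlAgilityPack types):

using System;
using HtmlAgilityPack;

class HtmlDemo
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><body><a href='https://example.com'>a link</a></body></html>");
        // SelectNodes returns null when nothing matches, so guard against it
        var links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null)
            return;
        foreach (var link in links)
            Console.WriteLine(link.GetAttributeValue("href", ""));
    }
}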

In this article, I’ll show how to make a query in Bing, retrieve and parse the response. For that, we need to create the query url and pass it to Bing. You may ask why I’m querying Bing and not Google – I’m doing that because Google makes it difficult to get its data, and I want to show you how to use HTML Agility Pack, and not how to retrieve data from Google :-). The query should be something like this:

https://www.bing.com/search?q=html+agility+pack&count=100

We will use the Query (q) and the number of results (count) parameters. With them, we can create our program. We will create a WPF program that gets the query text, parses it and presents the results in a Listbox.

Create a new WPF program and name it BingSearch.

The next step is to add the HTML Agility Pack to the project. Right-click the References node in the Solution Explorer and select Manage NuGet Packages. Then add the Html Agility Pack to the project.

Then, in the main window, add this XAML code:

<Grid>
    <Grid.RowDefinitions>
        <RowDefinition Height="40"/>
        <RowDefinition Height="*"/>
    </Grid.RowDefinitions>
    <StackPanel Grid.Row="0" Orientation ="Horizontal" 
                Margin="5,0" VerticalAlignment="Center">
        <TextBlock Text="Search" VerticalAlignment="Center"/>
        <TextBox x:Name="TxtSearch" Width="300" Height="30" 
                 Margin="5,0" VerticalContentAlignment="Center"/>
    </StackPanel>
    <Button Grid.Row="0" HorizontalAlignment="Right" 
            Content="Search" Margin="5,0" VerticalAlignment="Center"
            Width="65" Height="30" Click="SearchClick"/>
    <ListBox Grid.Row="1" x:Name="LbxResults" />
</Grid>

Place the cursor on the button’s Click event handler name in the XAML and press F12 to create the handler in code and go to it. Then, add this code to the handler:

private async void SearchClick(object sender, RoutedEventArgs e)
{
    if (string.IsNullOrWhiteSpace(TxtSearch.Text))
        return;
    var queryString = WebUtility.UrlEncode(TxtSearch.Text);
    var htmlWeb = new HtmlWeb();
    var query = $"https://bing.com/search?q={queryString}&count=100";
    var doc = await htmlWeb.LoadFromWebAsync(query);
    var response = doc.DocumentNode.SelectSingleNode("//ol[@id='b_results']");
    var results = response.SelectNodes("//li[@class='b_algo']");
    if (results == null)
    {
        LbxResults.ItemsSource = null;
        return;
    }
    var searchResults = new List<SearchResult>();
    foreach (var result in results)
    {
        var refNode = result.Element("h2").Element("a");
        var url = refNode.Attributes["href"].Value;
        var text = refNode.InnerText;
        var description = result.Element("div").Element("p").InnerText;
        searchResults.Add(new SearchResult(text, url, description));
    }
    LbxResults.ItemsSource = searchResults;
}

Initially, we encode the search text and use it to create the query string. Then we call the LoadFromWebAsync method to load the HTML data from the query response. When the response comes, we get the response node from the ordered list with id b_results and extract the individual results from it. Finally, we parse each result, add it to a list of SearchResult and assign the list to the items of the ListBox. Note that we can find the nodes using XPath, as in:

var results = response.SelectNodes("//li[@class='b_algo']");

Or we can traverse the elements and get the text of the resulting node with something like:

var refNode = result.Element("h2").Element("a");
var url = refNode.Attributes["href"].Value;
var text = refNode.InnerText;
var description = WebUtility.HtmlDecode(
    result.Element("div").Element("p").InnerText);

SearchResult is declared as:

internal class SearchResult
{
    public string Text { get; }
    public string Url { get; }
    public string Description { get; }

    public SearchResult(string text, string url, string description)
    {
        Text = text;
        Url = url;
        Description = description;
    }
}

If you run the program, you will see something like this:

The data isn’t displayed because we haven’t defined any data template for the list items. You can define an item template like this in the XAML:

<ListBox.ItemTemplate>
    <DataTemplate>
        <StackPanel Margin="0,3">
            <TextBlock Text="{Binding Text}" FontWeight="Bold"/>
            <TextBlock >
              <Hyperlink NavigateUri="{Binding Url}" RequestNavigate="LinkNavigate">
                 <TextBlock Text="{Binding Url}"/>
              </Hyperlink>
            </TextBlock>
            <TextBlock Text="{Binding Description}" TextWrapping="Wrap"/>
        </StackPanel>
    </DataTemplate>
</ListBox.ItemTemplate>

The LinkNavigate event handler is:

private void LinkNavigate(object sender, RequestNavigateEventArgs e)
{
    System.Diagnostics.Process.Start(e.Uri.AbsoluteUri);
}

Now, when you run the program, you will get something like this:

You can click on the hyperlink and it will open a browser window with the selected page. We can even go further and add a WebBrowser to our app that will show the selected page when you click on an item. For that, you have to modify the XAML code with something like this:

<Grid>
    <Grid.RowDefinitions>
        <RowDefinition Height="40"/>
        <RowDefinition Height="*"/>
    </Grid.RowDefinitions>
    <Grid.ColumnDefinitions>
        <ColumnDefinition Width="*"/>
        <ColumnDefinition Width="*"/>
    </Grid.ColumnDefinitions>
    <StackPanel Grid.Row="0" Orientation ="Horizontal" 
                Margin="5,0" VerticalAlignment="Center">
        <TextBlock Text="Search" VerticalAlignment="Center"/>
        <TextBox x:Name="TxtSearch" Width="300" Height="30" 
                 Margin="5,0" VerticalContentAlignment="Center"/>
    </StackPanel>
    <Button Grid.Row="0" HorizontalAlignment="Right" 
            Content="Search" Margin="5,0" VerticalAlignment="Center"
            Width="65" Height="30" Click="SearchClick"/>
    <ListBox Grid.Row="1" x:Name="LbxResults" 
             ScrollViewer.HorizontalScrollBarVisibility="Disabled"
             SelectionChanged="LinkChanged">
        <ListBox.ItemTemplate>
            <DataTemplate>
                <StackPanel Margin="0,3">
                    <TextBlock Text="{Binding Text}" FontWeight="Bold"/>
                    <TextBlock >
                      <Hyperlink NavigateUri="{Binding Url}" RequestNavigate="LinkNavigate">
                         <TextBlock Text="{Binding Url}"/>
                      </Hyperlink>
                    </TextBlock>
                    <TextBlock Text="{Binding Description}" TextWrapping="Wrap"/>
                </StackPanel>
            </DataTemplate>
        </ListBox.ItemTemplate>
    </ListBox>
    <WebBrowser Grid.Column="1" Grid.RowSpan="2" x:Name="WebPage"  />
</Grid>

We’ve added a second column to the window and placed a WebBrowser in it, then added a SelectionChanged event handler to the ListBox, so we can navigate to the selected page.

The SelectionChanged event handler is:

private void LinkChanged(object sender, SelectionChangedEventArgs e)
{
    if (e.AddedItems?.Count > 0)
    {
        WebPage.Navigate(((SearchResult)e.AddedItems[0]).Url);
    }
}

Now, when you run the app and click on a result, it will show the page in the WebBrowser. One thing that happens is that sometimes a JavaScript error pops up. To remove these errors, I used the solution obtained from here:

public MainWindow()
{
    InitializeComponent();
    WebPage.Navigated += (s, e) => SetSilent(WebPage, true);
}

public static void SetSilent(WebBrowser browser, bool silent)
{
    if (browser == null)
        throw new ArgumentNullException("browser");

    // get an IWebBrowser2 from the document
    IOleServiceProvider sp = browser.Document as IOleServiceProvider;
    if (sp != null)
    {
        Guid IID_IWebBrowserApp = new Guid("0002DF05-0000-0000-C000-000000000046");
        Guid IID_IWebBrowser2 = new Guid("D30C1661-CDAF-11d0-8A3E-00C04FC9E26E");

        object webBrowser;
        sp.QueryService(ref IID_IWebBrowserApp, ref IID_IWebBrowser2, out webBrowser);
        if (webBrowser != null)
        {
            webBrowser.GetType().InvokeMember("Silent", 
                BindingFlags.Instance | BindingFlags.Public | 
                BindingFlags.PutDispProperty, null, webBrowser, 
                new object[] { silent });
        }
    }
}


[ComImport, Guid("6D5140C1-7436-11CE-8034-00AA006009FA"), 
    InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
private interface IOleServiceProvider
{
    [PreserveSig]
    int QueryService([In] ref Guid guidService, [In] ref Guid riid, 
        [MarshalAs(UnmanagedType.IDispatch)] out object ppvObject);
}

With this code, the Javascript errors disappear and when you run the app, you will see something like this:

As you can see, the HTML Agility Pack makes it easy to process and parse HTML Pages, allowing you to manipulate them the way you want.

The full source code for this article is in https://github.com/bsonnino/BingSearch

One thing that I use a lot is sample data. Every article I write needs some data to explain the concepts; I need data to see how it fits my designs, or sample data for testing. This is a real hassle, as I must find some reliable data for my programs. Sometimes I go to databases (Northwind and AdventureWorks are my good friends), sometimes I use Json or XML data, and other times I create the sample data myself.

None of these approaches is perfect, and they aren’t consistent: every time I end up with a new way of accessing data (which can be good for learning purposes, but is a nightmare for maintenance). Then, looking around, I found Bogus (https://github.com/bchavez/Bogus). It’s a simple data generator for C#. All you have to do is create rules for your data and generate it. Simple as that! Then you’ve got the data to use in your programs. It can be fixed (every time you run your program, you have the same data) or variable (every run gives you a different set of data), and once you’ve got it, you can serialize it to whichever format you want: Json files, databases, XML or plain text files.

Generating sample data

The first step to generate sample data is to create your classes. Create a new console app and add these two classes:

public class Customer
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public string Address { get; set; }
    public string City { get; set; }
    public string Country { get; set; }
    public string ZipCode { get; set; }
    public string Phone { get; set; }
    public string Email { get; set; }
    public string ContactName { get; set; }
    public IEnumerable<Order> Orders { get; set; }
}
public class Order
{
    public Guid Id { get; set; }
    public DateTime Date { get; set; }
    public Decimal OrderValue { get; set; }
    public bool Shipped { get; set; }
}

Once you’ve got the classes, you can add the repository that produces the sample data. To use the data generator, you must add the Bogus NuGet package to your project, with the command Install-Package Bogus in the Package Manager Console. Then we can add the repository class that retrieves the data. Add a new class to the project, name it SampleCustomerRepository and add this code to it:

public IEnumerable<Customer> GetCustomers()
{
    Randomizer.Seed = new Random(123456);
    var ordergenerator = new Faker<Order>()
        .RuleFor(o => o.Id, Guid.NewGuid)
        .RuleFor(o => o.Date, f => f.Date.Past(3))
        .RuleFor(o => o.OrderValue, f => f.Finance.Amount(0, 10000))
        .RuleFor(o => o.Shipped, f => f.Random.Bool(0.9f));
    var customerGenerator = new Faker<Customer>()
        .RuleFor(c => c.Id, Guid.NewGuid)
        .RuleFor(c => c.Name, f => f.Company.CompanyName())
        .RuleFor(c => c.Address, f => f.Address.FullAddress())
        .RuleFor(c => c.City, f => f.Address.City())
        .RuleFor(c => c.Country, f => f.Address.Country())
        .RuleFor(c => c.ZipCode, f => f.Address.ZipCode())
        .RuleFor(c => c.Phone, f => f.Phone.PhoneNumber())
        .RuleFor(c => c.Email, f => f.Internet.Email())
        .RuleFor(c => c.ContactName, (f, c) => f.Name.FullName())
        .RuleFor(c => c.Orders, f => ordergenerator.Generate(f.Random.Number(10)).ToList());
    return customerGenerator.Generate(100);
}

In line 3, we set Randomizer.Seed to a fixed seed, so the data is the same for all runs. If we don’t want that, we simply don’t set it. Then we set the rules for the order and customer generation and call the Generate method to generate the sample data. Easy as that.

As you can see, the generator has a lot of classes to generate data. For example, the Company class generates company data, like CompanyName. You can use this as sample data for your programs. I can see some uses for it:

  • Test data for unit testing
  • Sample data for design purposes
  • Sample data for prototypes

but I’m sure you can find some more.

To use the data, you can add this code in the main program:

static void Main(string[] args)
{
    var repository = new SampleCustomerRepository();
    var customers = repository.GetCustomers();
    Console.WriteLine(JsonConvert.SerializeObject(customers, 
        Formatting.Indented));
}

We are serializing the data to Json, so you must add the Newtonsoft.Json Nuget package to your project. When you run this project, you will see something like this:

As you can see, it generated a whole set of customers with their orders, so you can use in your programs.

You may say that this is just dummy data, that it will clutter your project, and that it will remain unused when the program goes to production, and you’re right. That’s not the way I would recommend using this kind of data: it is disposable and will clutter the project. A better way would be to create another project, a class library containing the repository. That solves one problem, the maintenance one: when going to production, you replace the sample dll with the real repository and you’re done. Or are you? Not so fast. If you look at the code in the main program, you will see that we are creating an instance of SampleCustomerRepository, which won’t exist anymore. That way, you must also change the code of the main program. Not a good solution.

You can use conditional compilation to instantiate the repository you want, like this:

        static void Main(string[] args)
        {
#if SAMPLEREPO
            var repository = new SampleCustomerRepository();
#else
            var repository = new CustomerRepository();
#endif
            var customers = repository.GetCustomers();
            Console.WriteLine(JsonConvert.SerializeObject(customers, 
                Formatting.Indented));
        }

That’s better, but still not optimal: you need to compile the program twice, once to use the sample data and once to go to production. With automated builds that may be a minor problem, but if you need to deploy by yourself, it can be a nightmare.

The solution? Dependency injection. Just create an interface for the repository and use a dependency injection framework (I like to use Unity or Ninject, but there are many others out there, just check this list). That way, the code is completely decoupled from the data and you won’t need to recompile the project to switch between data sources. Just add the correct dll and you are using the data you want. This is a nice approach, but the topic deserves a post of its own, so stay tuned!
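As a rough sketch of that approach (using Microsoft.Extensions.DependencyInjection here purely as an illustration – the article doesn’t prescribe a specific container – and assuming a hypothetical ICustomerRepository interface implemented by both repositories):

using System;
using System.Collections.Generic;
using Microsoft.Extensions.DependencyInjection;
using Newtonsoft.Json;

// Hypothetical abstraction shared by SampleCustomerRepository and CustomerRepository.
public interface ICustomerRepository
{
    IEnumerable<Customer> GetCustomers();
}

class Program
{
    static void Main(string[] args)
    {
        var services = new ServiceCollection();
        // Swap this single registration (or drive it from configuration)
        // to switch between the sample data and the production repository.
        services.AddSingleton<ICustomerRepository, SampleCustomerRepository>();

        using (var provider = services.BuildServiceProvider())
        {
            var repository = provider.GetRequiredService<ICustomerRepository>();
            Console.WriteLine(JsonConvert.SerializeObject(repository.GetCustomers(),
                Formatting.Indented));
        }
    }
}

The main program only knows about the interface, so swapping implementations never touches this code.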

The code for this article is at https://github.com/bsonnino/SampleData

 

Many times I need to enumerate the files in my disk or in a folder and its subfolders, but that has always been slow. All the file enumeration techniques walk the disk structures, querying file names one by one. Windows file indexing takes this to another level of speed: you can query and filter your data almost instantaneously, with one pitfall: it only works in the indexed parts of the disk (usually the libraries and Windows folders), making it unusable for your data folders unless you add them to the indexer:

Wouldn’t it be nice to have a database that stores all files in the system and is updated as the files change? Some apps, like Copernic Desktop Search or X1 Search, do exactly that: they have a database that indexes your system and can run fast queries for you. The pitfall is that you don’t have an API to integrate with your programs, so the only way to query the files is to use their apps.

At some point, Microsoft considered building something like a database of files, called WinFS – Windows Future Storage – but the project was cancelled. So, we have to stick with our current APIs to query files. Or do we? As a matter of fact, there is something in the Windows system that allows us to query files in a very fast way, and it’s called the NTFS MFT (NT file system master file table).

The NTFS MFT is a file structure used internally by Windows that allows querying files very quickly. It was designed to be fast and safe (there are two copies of the MFT, in case one of them gets corrupted), and we can access it to enumerate our files. But some things should be noted when accessing the MFT:

  • The MFT is only available for NTFS volumes. So, you cannot access FAT drives with this API
  • To access the MFT structures, you must have elevated privileges – a normal user won’t be able to access it (a quick way to check this at runtime is sketched after this list)
  • With great power comes great responsibility (yes, that’s a Spider-Man comic book quote), so you should know that accessing the internal NTFS structures may harm your system irreversibly – use the code with care, and don’t blame me if something goes wrong (but here’s a suggestion to Windows API designers: why not create some APIs that query the NTFS structures safely for normal users? That could even be added to UWP programming).
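A quick way to check at runtime whether the process is elevated (this helper is only a sketch, not part of the original article or of the NtfsReader library):

using System.Security.Principal;

static class ElevationCheck
{
    // Returns true when the current process runs with administrative privileges,
    // which is required to read the raw NTFS/MFT structures.
    public static bool IsElevated()
    {
        var identity = WindowsIdentity.GetCurrent();
        var principal = new WindowsPrincipal(identity);
        return principal.IsInRole(WindowsBuiltInRole.Administrator);
    }
}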

Accessing the MFT structure

There is no formal API to access the MFT structure. You have to decipher the structure (there is a lot of material here) and read the raw disk data yourself (that’s why you need elevated privileges). This is a lot of work and hours of trial and error.

Fortunately, there are some libraries that do that in C#, and I’ve used this one, which is licensed as LGPL. You can use the library in your compiled work, with no restriction; if you include the library’s source code in your own code, it becomes “derived work” and you must distribute all the code as LGPL.

We will create a WPF program that shows disk usage. It will enumerate all files and show them in a list, so you can see what’s taking up space in your disk. You will be able to select any of the NTFS disks in your machine.

Open Visual Studio with administrative rights (this is very important, or you won’t be able to debug your program). Then create a new WPF project and add a new item, choosing Application Manifest File; an app.manifest file will be added to your project. Then, change the requestedExecutionLevel tag of the file to:

<requestedExecutionLevel level="requireAdministrator" uiAccess="false" />

The next step is to detect the NTFS disks in your system. This is done with this code:

var ntfsDrives = DriveInfo.GetDrives()
    .Where(d => d.DriveFormat == "NTFS").ToList();

Then, add the NtfsReader project to the solution and add a reference to it in the WPF project. Then, add the following UI in MainWindow.xaml:

<Grid>
    <Grid.RowDefinitions>
        <RowDefinition Height="40"/>
        <RowDefinition Height="*"/>
        <RowDefinition Height="30"/>
    </Grid.RowDefinitions>
    <StackPanel Orientation="Horizontal" Margin="5">
        <TextBlock Text="Drive" VerticalAlignment="Center"/>
        <ComboBox x:Name="DrvCombo" Margin="5,0" Width="100" 
                  VerticalContentAlignment="Center"/>
    </StackPanel>
    <ListBox x:Name="FilesList" Grid.Row="1" 
             VirtualizingPanel.IsVirtualizing="True"
             VirtualizingPanel.IsVirtualizingWhenGrouping="True"
             >
        <ListBox.ItemTemplate>
            <DataTemplate>
                <StackPanel Orientation="Horizontal">
                    <TextBlock Text="{Binding FullName}" 
                               Margin="5,0" Width="450"/>
                    <TextBlock Text="{Binding Size,StringFormat=N0}" 
                               Margin="5,0" Width="150" TextAlignment="Right"/>
                    <TextBlock Text="{Binding LastChangeTime, StringFormat=g}" 
                               Margin="5,0" Width="200"/>
                </StackPanel>
            </DataTemplate>
        </ListBox.ItemTemplate>
    </ListBox>
    <TextBlock x:Name="StatusTxt" Grid.Row="2" HorizontalAlignment="Center" Margin="5"/>
</Grid>

We will have a combobox with all drives in the first row and a listbox with the files. The listbox has an ItemTemplate that will show the name, size and date of last change of each file. To fill this data, you will have to add this code in MainWindow.xaml.cs:

public MainWindow()
{
    InitializeComponent();
    var ntfsDrives = DriveInfo.GetDrives()
        .Where(d => d.DriveFormat == "NTFS").ToList();
    DrvCombo.ItemsSource = ntfsDrives;
    DrvCombo.SelectionChanged += DrvCombo_SelectionChanged;
}

private void DrvCombo_SelectionChanged(object sender, 
    System.Windows.Controls.SelectionChangedEventArgs e)
{
    if (DrvCombo.SelectedItem != null)
    {
        var driveToAnalyze = (DriveInfo) DrvCombo.SelectedItem;
        var ntfsReader =
            new NtfsReader(driveToAnalyze, RetrieveMode.All);
        var nodes =
            ntfsReader.GetNodes(driveToAnalyze.Name)
                .Where(n => (n.Attributes & 
                             (Attributes.Hidden | Attributes.System | 
                              Attributes.Temporary | Attributes.Device | 
                              Attributes.Directory | Attributes.Offline | 
                              Attributes.ReparsePoint | Attributes.SparseFile)) == 0)
                .OrderByDescending(n => n.Size);
        FilesList.ItemsSource = nodes;
    }
}

It gets all NTFS drives in your system and fills the combobox. In the SelectionChanged event handler, the reader gets all nodes in the drive. These nodes are filtered to remove everything that is not a normal file, ordered descending by size and added to the listbox.

If you run the program you will see some things:

  • If you look at the output window in Visual Studio, you will see these debug messages:
1333.951 MB of volume metadata has been read in 26.814 s at 49.748 MB/s
1324082 nodes have been retrieved in 2593.669 ms

This means that it took 2.6s to read and analyze all files in the disk (pretty fast for 1.3 million files, no?).

  • When you change the drive in the combobox, the program freezes for a while before the list is filled with the files. The freeze happens because the main thread is blocked while the disk is being analyzed. To avoid this, you should run the analysis on a secondary thread, like this:
private async void DrvCombo_SelectionChanged(object sender, 
    System.Windows.Controls.SelectionChangedEventArgs e)
{
    if (DrvCombo.SelectedItem != null)
    {
        var driveToAnalyze = (DriveInfo) DrvCombo.SelectedItem;
        DrvCombo.IsEnabled = false;
        StatusTxt.Text = "Analyzing drive";
        List<INode> nodes = null;
        await Task.Factory.StartNew(() =>
        {
            var ntfsReader =
                new NtfsReader(driveToAnalyze, RetrieveMode.All);
            nodes =
                ntfsReader.GetNodes(driveToAnalyze.Name)
                    .Where(n => (n.Attributes &
                                 (Attributes.Hidden | Attributes.System |
                                  Attributes.Temporary | Attributes.Device |
                                  Attributes.Directory | Attributes.Offline |
                                  Attributes.ReparsePoint | Attributes.SparseFile)) == 0)
                    .OrderByDescending(n => n.Size).ToList();
        });
        FilesList.ItemsSource = nodes;
        DrvCombo.IsEnabled = true;
        StatusTxt.Text = $"{nodes.Count} files listed. " +
                         $"Total size: {nodes.Sum(n => (double)n.Size):N0}";
    }
}

This code creates a task and runs the analysis in it, so it doesn’t freeze the UI. I just took care of disabling the combobox and showing a status message to the user. After the code runs, the nodes list is assigned to the listbox and the UI is re-enabled.

This code can show you the list of the largest files in your disk, but you may want to analyze the data in other ways, like grouping by extension or by folder. WPF has an easy way to group and show data: the CollectionViewSource. With it, you can do grouping and sorting in the ListBox. We will change our UI to add a new ComboBox for the new groupings:

<Grid>
    <Grid.RowDefinitions>
        <RowDefinition Height="40"/>
        <RowDefinition Height="*"/>
        <RowDefinition Height="30"/>
    </Grid.RowDefinitions>
    <StackPanel Orientation="Horizontal" Margin="5">
        <TextBlock Text="Drive" VerticalAlignment="Center"/>
        <ComboBox x:Name="DrvCombo" Margin="5,0" Width="100" 
                  VerticalContentAlignment="Center"/>
    </StackPanel>
    <StackPanel Grid.Row="0" HorizontalAlignment="Right" Orientation="Horizontal" Margin="5">
        <TextBlock Text="Sort" VerticalAlignment="Center"/>
        <ComboBox x:Name="SortCombo" Margin="5,0" Width="100" 
                  VerticalContentAlignment="Center" SelectedIndex="0" 
                  SelectionChanged="SortCombo_OnSelectionChanged">
            <ComboBoxItem>Size</ComboBoxItem>
            <ComboBoxItem>Extension</ComboBoxItem>
            <ComboBoxItem>Folder</ComboBoxItem>
        </ComboBox>
    </StackPanel>
    <ListBox x:Name="FilesList" Grid.Row="1" 
             VirtualizingPanel.IsVirtualizing="True"
             VirtualizingPanel.IsVirtualizingWhenGrouping="True" >
        <ListBox.ItemTemplate>
            <DataTemplate>
                <StackPanel Orientation="Horizontal">
                    <TextBlock Text="{Binding FullName}" 
                               Margin="5,0" Width="450"/>
                    <TextBlock Text="{Binding Size,StringFormat=N0}" 
                               Margin="5,0" Width="150" TextAlignment="Right"/>
                    <TextBlock Text="{Binding LastChangeTime, StringFormat=g}" 
                               Margin="5,0" Width="200"/>
                </StackPanel>
            </DataTemplate>
        </ListBox.ItemTemplate>
        <ListBox.GroupStyle>
            <GroupStyle>
                <GroupStyle.HeaderTemplate>
                    <DataTemplate>
                        <StackPanel Orientation="Horizontal">
                            <TextBlock Text="{Binding Name}" FontSize="15" FontWeight="Bold" 
                                       Margin="5,0"/>
                            <TextBlock Text="(" VerticalAlignment="Center" Margin="5,0,0,0" />
                            <TextBlock Text="{Binding Items.Count}" VerticalAlignment="Center"/>
                            <TextBlock Text=" files - " VerticalAlignment="Center"/>
                            <TextBlock Text="{Binding Items, 
                                Converter={StaticResource ItemsSizeConverter}, StringFormat=N0}"
                                         VerticalAlignment="Center"/>
                            <TextBlock Text=" bytes)" VerticalAlignment="Center"/>
                        </StackPanel>
                    </DataTemplate>
                </GroupStyle.HeaderTemplate>
            </GroupStyle>
        </ListBox.GroupStyle>
    </ListBox>
    <TextBlock x:Name="StatusTxt" Grid.Row="2" HorizontalAlignment="Center" Margin="5"/>
</Grid>

The combobox has three options: Size, Extension and Folder. The first one is the same thing we’ve had until now; the second groups the files by extension and the third groups the files by top folder. We’ve also added a GroupStyle to the listbox. If we don’t do that, the data will be grouped, but the groups won’t be shown. If you look at the GroupStyle, you will see that we show the group name, then the count of items (the number of files in the group), and then a third TextBlock where we pass the Items and a converter. That’s because we want to show the total size in bytes of the group. For that, I’ve created a converter that converts the Items of the group into the sum of the sizes of its files:

public class ItemsSizeConverter : IValueConverter
{
    public object Convert(object value, Type targetType, object parameter, 
        CultureInfo culture)
    {
        var items = value as ReadOnlyObservableCollection<object>;
        return items?.Sum(n => (double) ((INode)n).Size);
    }

    public object ConvertBack(object value, Type targetType, object parameter, 
        CultureInfo culture)
    {
        throw new NotImplementedException();
    }
}

The code for the SelectionChanged for the sort combobox is:

private void SortCombo_OnSelectionChanged(object sender, 
    SelectionChangedEventArgs e)
{
    if (_view == null)
        return;
    _view.GroupDescriptions.Clear();
    _view.SortDescriptions.Clear();
    switch (SortCombo.SelectedIndex)
    {
        case 1:
            _view.GroupDescriptions.Add(new PropertyGroupDescription("FullName", 
                new FileExtConverter()));
            break;
        case 2:
            _view.SortDescriptions.Add(new SortDescription("FullName",
                ListSortDirection.Ascending));
            _view.GroupDescriptions.Add(new PropertyGroupDescription("FullName", 
                new FilePathConverter()));
            break;
    }
}

We add a GroupDescription for each kind of grouping. As the nodes shown in the listbox don’t have extension and top path properties, I’ve created two converters to extract them from the full name. The converter that gets the extension from the name is:

class FileExtConverter : IValueConverter
{
    public object Convert(object value, Type targetType, object parameter, 
        CultureInfo culture)
    {
        var fileName = value as string;
        return string.IsNullOrWhiteSpace(fileName) ? 
            null : 
            Path.GetExtension(fileName).ToLowerInvariant();
    }

    public object ConvertBack(object value, Type targetType, object parameter, 
        CultureInfo culture)
    {
        throw new NotImplementedException();
    }
}

The converter that gets the top path of the file is:

class FilePathConverter :IValueConverter
{
    public object Convert(object value, Type targetType, object parameter, 
        CultureInfo culture)
    {
        var fileName = value as string;
        return string.IsNullOrWhiteSpace(fileName) ?
            null :
            GetTopPath(fileName);
    }

    private string GetTopPath(string fileName)
    {
        var paths = fileName.Split(Path.DirectorySeparatorChar).Take(2);
        return string.Join(Path.DirectorySeparatorChar.ToString(), paths);
    }

    public object ConvertBack(object value, Type targetType, object parameter, 
        CultureInfo culture)
    {
        throw new NotImplementedException();
    }
}

One last thing is to set the _view field when we fill the listbox:

 private async void DrvCombo_SelectionChanged(object sender, 
     System.Windows.Controls.SelectionChangedEventArgs e)
 {
     if (DrvCombo.SelectedItem != null)
     {
         var driveToAnalyze = (DriveInfo) DrvCombo.SelectedItem;
         DrvCombo.IsEnabled = false;
         StatusTxt.Text = "Analyzing drive";
         List<INode> nodes = null;
         await Task.Factory.StartNew(() =>
         {
             var ntfsReader =
                 new NtfsReader(driveToAnalyze, RetrieveMode.All);
             nodes =
                 ntfsReader.GetNodes(driveToAnalyze.Name)
                     .Where(n => (n.Attributes &
                                  (Attributes.Hidden | Attributes.System |
                                   Attributes.Temporary | Attributes.Device |
                                   Attributes.Directory | Attributes.Offline |
                                   Attributes.ReparsePoint | Attributes.SparseFile)) == 0)
                     .OrderByDescending(n => n.Size).ToList();
         });
         FilesList.ItemsSource = nodes;
         _view = (CollectionView)CollectionViewSource.GetDefaultView(FilesList.ItemsSource);

         DrvCombo.IsEnabled = true;
         StatusTxt.Text = $"{nodes.Count} files listed. " +
                          $"Total size: {nodes.Sum(n => (double)n.Size):N0}";
     }
     else
     {
         _view = null;
     }
 }
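The _view field itself is a class-level field of the window; a minimal declaration would be something like this (its exact placement inside MainWindow is an assumption):

// Default collection view over FilesList.ItemsSource, used by
// SortCombo_OnSelectionChanged to apply grouping and sorting.
private CollectionView _view;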

With all these in place, you can run the app and get a result like this:

Conclusions

As you can see, there is a way to get fast file enumeration for your NTFS disks, but you must have admin privileges to use it. We’ve created a WPF program that uses this kind of enumeration and allows you to group the data in different ways, using WPF resources. If you need to enumerate your files very fast, you can consider this approach.

The full source code for this article is at https://github.com/bsonnino/NtfsFileEnum

As an MVP, I sometimes receive software licenses from vendors for my own use. Some of these tools become indispensable to me, and I feel obliged to write a review (yes, it’s a biased review, as I really like the tool and use it on a daily basis :-)) as a way to say thank you!

One of these tools is Linqpad (https://www.linqpad.net/). It’s a simple tool, with a small footprint, but I have used it in so many ways that I find it incredible. There is a free version with plenty of features to start with, but I really recommend the paid version (if you have the $95 to spend, the Premium edition even has a debugger to debug your snippets).

Introduction

Once you open Linqpad, you will see a simple desktop like this:

At first, the name of the tool may indicate that this is a notepad for linq queries, but it’s much more than that! If you take a look at the Samples pane, you can see that there’s even an Interactive Regex Evaluator.

A closer look at that pane shows that you are not tied to C#: you can also use F# there. In fact, there is a full F# tutorial there. If you open the Language combo, you can see that you can also use VB or SQL queries.

My first use of Linqpad was to learn Linq (the name is Linqpad, no?). At the beginning, Linq seems a little daunting, with all those extension methods and lambdas. So, I started trying some Linq queries, making them more difficult as my knowledge improved. In Linqpad, you have three flavors of code: Expressions, where a single expression is evaluated; Statements, where a few statements are evaluated; and Program, where a full program is run in Linqpad (I use this when I want to run a console program and don’t want to open Visual Studio and create a new project).

In the Expression mode, you can enter a single expression, like this:

from i in Enumerable.Range(1,1000)
  where i % 2 == 0
  select i

If you run it, you will see the result in the Results pane:

As you can see, all the results are there; there is no need to open a console window or anything else. And, what’s better, you can export the results to Excel, Word or HTML. You can also use the other Linq format, the functional one:

Enumerable.Range(1,1000).Where(i => i %2 == 0)

After that, you can start tweaking your code, clicking the Run button and observing the results. If you have the paid version, you also have IntelliSense in the code, so you can check the syntax.

For example, to get the sum of the squares of the even numbers, we can do something like this:
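A sketch of such an expression (assuming the same 1..1000 range as the earlier examples):

Enumerable.Range(1, 1000)
    .Where(i => i % 2 == 0)
    .Sum(i => (long)i * i)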

If we have something more complicated than a single expression, we can run it using C# statements. For example, to get all methods of the Directory class and their parameters, we can use these statements:

var methodInfos = typeof(Directory).GetMethods(BindingFlags.Public | 
  BindingFlags.Static);

methodInfos.Select(m => new 
{
  m.Name, 
  Parameters = m.GetParameters() 
}).Dump();

You may have noticed something different in the code above: the Dump method. Linqpad adds this method to dump values to the results pane. It is very powerful: you don’t need to know the type of the object, all the properties are shown there:

And you are not limited to old C#: you can also use C# 7 features and even async programming. For example, this code (based on https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/async/walkthrough-accessing-the-web-by-using-async-and-await) downloads some pages from the web asynchronously and displays their sizes:

async Task Main()
{
	await SumPageSizesAsync().Dump();
}

private async Task<List<string>> SumPageSizesAsync()
{
	var results = new List<string>();
	// Declare an HttpClient object and increase the buffer size. The
	// default buffer size is 65,536.
	HttpClient client =
		new HttpClient() { MaxResponseContentBufferSize = 1000000 };

	// Make a list of web addresses.
	List<string> urlList = SetUpURLList();

	var total = 0;

	foreach (var url in urlList)
	{
		// GetByteArrayAsync returns a task. At completion, the task
		// produces a byte array.
		byte[] urlContents = await client.GetByteArrayAsync(url);

		// The following two lines can replace the previous assignment statement.
		//Task<byte[]> getContentsTask = client.GetByteArrayAsync(url);
		//byte[] urlContents = await getContentsTask;

		results.Add(DisplayResults(url, urlContents));

		// Update the total.
		total += urlContents.Length;
	}

	// Display the total count for all of the websites.
	results.Add(
		$"\r\n\r\nTotal bytes returned:  {total}\r\n");
	return results;
}

private List<string> SetUpURLList()
{
	List<string> urls = new List<string>
			{
				"https://msdn.microsoft.com/library/windows/apps/br211380.aspx",
				"https://msdn.microsoft.com",
				"https://msdn.microsoft.com/library/hh290136.aspx",
				"https://msdn.microsoft.com/library/ee256749.aspx",
				"https://msdn.microsoft.com/library/hh290138.aspx",
				"https://msdn.microsoft.com/library/hh290140.aspx",
				"https://msdn.microsoft.com/library/dd470362.aspx",
				"https://msdn.microsoft.com/library/aa578028.aspx",
				"https://msdn.microsoft.com/library/ms404677.aspx",
				"https://msdn.microsoft.com/library/ff730837.aspx"
			};
	return urls;
}

private string DisplayResults(string url, byte[] content)
{
	// Display the length of each website. The string format
	// is designed to be used with a monospaced font, such as
	// Lucida Console or Global Monospace.
	var bytes = content.Length;
	// Strip off the "https://".
	var displayURL = url.Replace("https://", "");
	return $"\n{displayURL,-58} {bytes,8}";
}

When you run it, you will see something like this:

And you are not tied to the default C# libraries. If you have the Developer or Premium versions, you can download and use NuGet packages in your queries. For example, in this previous article I’ve shown how to use the Microsoft.SqlServer.TransactSql.ScriptDom package to parse your SQL Server code. You don’t even need to open Visual Studio for that. Just put this code in the Linqpad window:

static void Main()
{
	using (var con = new SqlConnection("Server=.;Database=WideWorldImporters;Trusted_Connection=True;"))
	{
		con.Open();
		var procTexts = GetStoredProcedures(con)
		  .Select(n => new { ProcName = n, Tree = ParseSql(GetProcText(con, n)) })
		  .Dump();
	}
}

private static List<string> GetStoredProcedures(SqlConnection con)
{
	using (SqlCommand sqlCommand = new SqlCommand("select s.name+'.'+p.name as name from sys.procedures p " +
	  "inner join sys.schemas s on p.schema_id = s.schema_id order by name", con))
	{
		using (DataTable procs = new DataTable())
		{
			procs.Load(sqlCommand.ExecuteReader());
			return procs.Rows.OfType<DataRow>().Select(r => r.Field<String>("name")).ToList();
		}
	}
}

private static string GetProcText(SqlConnection con, string procName)
{
	using (SqlCommand sqlCommand = new SqlCommand("sys.sp_helpText", con)
	{
		CommandType = CommandType.StoredProcedure
	})
	{
		sqlCommand.Parameters.AddWithValue("@objname", procName);
		using (var proc = new DataTable())
		{
			try
			{
				proc.Load(sqlCommand.ExecuteReader());
				return string.Join("", proc.Rows.OfType<DataRow>().Select(r => r.Field<string>("Text")));
			}
			catch (SqlException)
			{
				return null;
			}
		}
	}
}

private static (TSqlFragment sqlTree, IList<ParseError> errors) ParseSql(string procText)
{
	var parser = new TSql150Parser(true);
	using (var textReader = new StringReader(procText))
	{
		var sqlTree = parser.Parse(textReader, out var errors);

		return (sqlTree, errors);
	}
}

You will see some missing references. Just press F4 and it will open the following screen:

Click the Add NuGet button and add the Microsoft.SqlServer.TransactSql.ScriptDom package, then run the program. You will see something like this:

You can even click on the ScriptTokenStream result, to see the list of tokens in the procedure:

You can also simplify the query by using the connections available in Linqpad. Just go to the connections pane, add a new connection and point it to the WideWorldImporters database. Then select the connection in the connections combo and use this code:

void Main()
{
	ExecuteQuery<string>("select s.name+'.'+p.name as name from sys.procedures p " +
	  "inner join sys.schemas s on p.schema_id = s.schema_id order by name")
		  .Select(n => new 
		    { 
			  ProcName = n, 
			  Tree = ParseSql(ExecuteQuery<string>("exec sys.sp_helpText @objname={0}",n).FirstOrDefault()) 
			})
		  .Dump();
}

private static (TSqlFragment sqlTree, IList<ParseError> errors) ParseSql(string procText)
{
	var parser = new TSql150Parser(true);
	using (var textReader = new StringReader(procText))
	{
		var sqlTree = parser.Parse(textReader, out var errors);

		return (sqlTree, errors);
	}
}

You will see the same results. As you can see, you don’t even need to open the connection and create the command to run it. You can run queries against your databases the same way you would with any other data. And if you are a SQL person, you can run your queries directly in SQL. And, if you are brave and want to learn F#, you have a really nice tool to learn with.

Conclusions

At first, the size and appearance of Linqpad may fool you, but it’s a very nice tool to work with, saving you a lot of time when trying out and debugging your code. If you have a code snippet that you want to test and improve, this is the tool to use. And one feature that I didn’t mention, invaluable when you are optimizing your code, is the timing feature: after each query, Linqpad shows the execution time, so you know how long it took to run.

Some time ago, I wrote this post about sending parametrized emails with C#. After some time, the emails stopped arriving at their destination and I didn’t know why – there was no error message, and the program was out in production, so I didn’t have an easy way to debug it.

Then, I went to the oracle that knows everything and asked “what is the easiest way to debug a C# program that sends emails in production?”. The oracle answered: “go to this page and you will find the answers you are looking for”. Once again, it was right.

Yes, the post pointed me to the easiest way to add debugging to an app that sends emails. There is no need to even change the app: just add a new section to your app.config file and that’s it. Your app will generate a log file with all the data sent and received. The section to add is system.diagnostics, where you declare the traces you want and send them to a listener:

<system.diagnostics>
  <trace autoflush="true" />
  <sources>
    <source name="System.Net" >
      <listeners>
        <add name="MyTraceFile"/>
      </listeners>
    </source>
    <source name="System.Net.Sockets">
      <listeners>
        <add name="MyTraceFile"/>
      </listeners>
    </source>
  </sources>
  <sharedListeners>
    <add  name="MyTraceFile" type="System.Diagnostics.TextWriterTraceListener" 
          initializeData="System.Net.trace.log"   />
  </sharedListeners>
  <switches>
    <add name="System.Net" value="Verbose" />
    <add name="System.Net.Sockets" value="Verbose" />
  </switches>
</system.diagnostics>

The configuration above traces all calls to System.Net and System.Net.Sockets, with a Verbose level, to a file named System.Net.trace.log.

You can change the tracing level to something less verbose, but one note here: if you try to use the Info level, as stated here, you will get an error. You should use the Information level. Using the numeric levels doesn’t show an error, but doesn’t generate a log either. So, stick to the named levels.

After that, I could see the logs and find out what had happened:

System.Net Error: 0 : [22720] Exception in SmtpClient#58328727::Send - The SMTP server requires a secure connection or the client was not authenticated. The server response was: 5.5.1 Authentication Required. Learn more at.
   at System.Net.Mail.MailCommand.CheckResponse(SmtpStatusCode statusCode, String response)
   at System.Net.Mail.MailCommand.Send(SmtpConnection conn, Byte[] command, MailAddress from, Boolean allowUnicode)
   at System.Net.Mail.SmtpTransport.SendMail(MailAddress sender, MailAddressCollection recipients, String deliveryNotify, Boolean allowUnicode, SmtpFailedRecipientException& exception)
   at System.Net.Mail.SmtpClient.Send(MailMessage message)
System.Net.Sockets Verbose: 0 : [22720] Entering Socket#24827179::Dispose()
System.Net Error: 0 : [22720] Exception in AppDomain#41560081::UnhandledExceptionHandler - The SMTP server requires a secure connection or the client was not authenticated. The server response was: 5.5.1 Authentication Required. Learn more at.
   at System.Net.Mail.MailCommand.CheckResponse(SmtpStatusCode statusCode, String response)
   at System.Net.Mail.MailCommand.Send(SmtpConnection conn, Byte[] command, MailAddress from, Boolean allowUnicode)
   at System.Net.Mail.SmtpTransport.SendMail(MailAddress sender, MailAddressCollection recipients, String deliveryNotify, Boolean allowUnicode, SmtpFailedRecipientException& exception)
   at System.Net.Mail.SmtpClient.Send(MailMessage message)
   at HtmlEmail.Program.Main(String[] args) in D:\Documentos\Artigos\Artigos\CSharp\HtmlEmail\HtmlEmail\HtmlEmail\Program.cs:line 41

The fix? Just a change in the password!