Sometimes, when we open an Explorer window in the main computer, we see red bars in some disks, telling us that the disk is almost full and that we need to do some cleanup. We call the system cleanup, that removes some unused space, but this isn’t enough to  make things better.

So, we try to find the duplicate files in the disk to remove some extra space, but we have a problem: where are the duplicate files? The first answer is to check the files with the same name and size, but that isn’t enough – files can be renamed, and still be duplicates.

So, the best thing to do is to find a way to find and list all duplicates in the disk. But how can we do this?

The naive approach is to get all files with the same size and compare them one with the other. But this is really cumbersome, because if there are 100 files in the group, there will be 100!/(2!*98!) = 100*99/2 = 4950 comparisons and has a complexity of O(n^2).

One other approach is to get a checksum of the file and compare checksums. That way, you will still have the O(n^2) complexity, but you’ll have less data to compare (but you will have to compute the time to calculate the checksums). A third approach would be to use a dictionary to group the files with the same hash. The search in a dictionary has a O(1) complexity, so this would do a O(n) complexity.

Now, we only have to choose the checksum. Every checksum has a number of bits and, roughly, the larger the number of bits, the longer it takes to compute it. But the larger number of bits make it more difficult to get wrong results: if you are using CRC16 checksum (16 bits), you will have 65,535 combinations and the probability of two different files have the same checksum is very large. CRC32 allows 2,147,483,647 combinations and, thus, is more difficult to have a wrong result. You can use other algorithms, like MD5 (128 bits), SHA1 (196 bits) or SHA256 (256 bits), but computing these will be way longer than computing the CRC32 bits. As we are not seeking for huge accuracy, but for speed, we’ll use the CRC32 algorithm to compute the hashes. A fast implementation of this algorithm can be found here , and you can use it by installing the CRC32C.NET NuGet package.

From there, we can create our program to find and list the duplicates in the disk. In Visual Studio, create a new WPF application. In the Solution Explorer, right-click on the references node and select the WpfFolderBrowser and Crc32C.NET packages. Then add this code in MainWindow.xaml:

<Grid>
    <Grid.RowDefinitions>
        <RowDefinition Height="40"/>
        <RowDefinition Height="*"/>
    </Grid.RowDefinitions>
    <Button Width="85" Height="30" Content="Start" Click="StartClick"
                HorizontalAlignment="Right" Margin="5" Grid.Row="0"/>
    <Grid Grid.Row="1">
        <Grid.RowDefinitions>
            <RowDefinition Height="*"/>
            <RowDefinition Height="30"/>
        </Grid.RowDefinitions>
        <ScrollViewer HorizontalScrollBarVisibility="Disabled">
        <ItemsControl x:Name="FilesList" HorizontalContentAlignment="Stretch">
            <ItemsControl.ItemTemplate>
                <DataTemplate>
                    <Grid HorizontalAlignment="Stretch">
                        <Grid.RowDefinitions>
                            <RowDefinition Height="30" />
                            <RowDefinition Height="Auto" />
                        </Grid.RowDefinitions>
                        <TextBlock Text="{Binding Value[0].Length, StringFormat=N0}"
                                   Margin="5" FontWeight="Bold"/>
                        <TextBlock Text="{Binding Key, StringFormat=X}"
                                   Margin="5" FontWeight="Bold" HorizontalAlignment="Right"/>
                        <ItemsControl ItemsSource="{Binding Value}" Grid.Row="1" 
                                      HorizontalAlignment="Stretch"
                                      ScrollViewer.HorizontalScrollBarVisibility="Disabled"
                                      Background="Aquamarine">
                            <ItemsControl.ItemTemplate>
                                <DataTemplate>
                                    <TextBlock Text="{Binding FullName}" Margin="15,0"  />
                                </DataTemplate>
                            </ItemsControl.ItemTemplate>
                        </ItemsControl>
                    </Grid>
                </DataTemplate>
            </ItemsControl.ItemTemplate>
        </ItemsControl>
        </ScrollViewer>
        <StackPanel Grid.Row="1" Orientation="Horizontal">
            <TextBlock x:Name="TotalFilesText" Margin="5,0" VerticalAlignment="Center"/>
            <TextBlock x:Name="LengthFilesText" Margin="5,0" VerticalAlignment="Center"/>
        </StackPanel>
    </Grid>
</Grid>

In the button’s click event handler, we will open a Folder browser dialog and, if the user selects a folder, we will process it, enumerating the files and  finding the ones that have the same size. Then, we calculate the Crc32 for these files and add them to a dictionary, grouped by hash:

private async void StartClick(object sender, RoutedEventArgs e)
{
    var fbd = new WPFFolderBrowserDialog();
    if (fbd.ShowDialog() != true)
        return;
    FilesList.ItemsSource = null;
    var selectedPath = fbd.FileName;

    var files = await GetPossibleDuplicatesAsync(selectedPath);
     FilesList.ItemsSource = await GetRealDuplicatesAsync(files);
}

The GetPossibleDuplicatesAsync will enumerate the files and group them by size, returning only the groups that have more than one file:

private async Task<List<IGrouping<long, FileInfo>>> GetPossibleDuplicates(string selectedPath)
{
    List<IGrouping<long, FileInfo>> files = null;
    await Task.Factory.StartNew(() =>
    {
        files = GetFilesInDirectory(selectedPath)
                       .OrderByDescending(f => f.Length)
                         .GroupBy(f => f.Length)
                         .Where(g => g.Count() > 1)
                         .ToList();
    });
    return files;
}

GetFilesInDirectory enumerates the files in the selected directory:

private List<FileInfo> GetFilesInDirectory(string directory)
{
    var files = new List<FileInfo>();
    try
    {
        var directories = Directory.GetDirectories(directory);
        try
        {
            var di = new DirectoryInfo(directory);
            files.AddRange(di.GetFiles("*"));
        }
        catch
        {
        }
        foreach (var dir in directories)
        {
            files.AddRange(GetFilesInDirectory(System.IO.Path.Combine(directory, dir)));
        }
    }
    catch
    {
    }

    return files;
}

After we have the duplicate files grouped, we can search the real duplicates with GetRealDuplicatesAsync:

private static async Task<Dictionary<uint,List<FileInfo>>> GetRealDuplicatesAsync(
    List<IGrouping<long, FileInfo>> files)
{
    var dictFiles = new Dictionary<uint, List<FileInfo>>();
    await Task.Factory.StartNew(() =>
    {
        foreach (var file in files.SelectMany(g => g))
        {
            var hash = GetCrc32FromFile(file.FullName);
            if (hash == 0)
                continue;
            if (dictFiles.ContainsKey(hash))
                dictFiles[hash].Add(file);
            else
                dictFiles.Add(hash, new List<FileInfo>(new[] { file }));
        }
    });
    return dictFiles.Where(p => p.Value.Count > 1).ToDictionary(p => p.Key, p => p.Value);
}

The GetCrc32FromFile method with use the Crc32C library to compute the Crc32 hash from the file. Note that we can’t compute the hash in one pass, by reading the whole file, as this will fail with files with more than 2Gb. So, we read chunks of 10,000 bytes and process them.

public static uint GetCrc32FromFile(string fileName)
{
    try
    {
        using (FileStream file = new FileStream(fileName, FileMode.Open))
        {
            const int NumBytes = 10000;
            var bytes = new byte[NumBytes];
            var numRead = file.Read(bytes, 0, NumBytes);
            if (numRead == 0)
                return 0;
            var crc = Crc32CAlgorithm.Compute(bytes, 0, numRead);
            while (numRead > 0)
            {
                numRead = file.Read(bytes, 0, NumBytes);
                if (numRead > 0)
                    Crc32CAlgorithm.Append(crc, bytes, 0, numRead);
            }
            return crc;
        }
    }
    catch (Exception ex) when (ex is UnauthorizedAccessException || ex is IOException)
    {
        return 0;
    }
}

Now, when you run the app, you will get something like this:

You can then verify the files you want to remove and then go to Explorer and remove them. But there is one thing to do here: the time to compute the hash is very large, especially if you have a lot of data to process (large files, large number of files or both). Could it be improved?

This issue is somewhat complicated to solve. Fortunately, .NET provide us with an excellent tool to improve performance in this case: Parallel programming. By making a small change in the code, you can calculate the CRC of the files in parallel, thus improving the performance. But there is a catch: we are using classes that are not thread safe. If you use the common Dictionary and List to store the data, you will end up with wrong results. But, once again, .NET comes to rescue us: it provides the ConcurrentDictionary and ConcurrentBag to replace the common classes, so we can store the data in a thread safe way. We can then change the code to this:

private static async Task<Dictionary<uint, List<FileInfo>>> GetRealDuplicatesAsync(
    List<IGrouping<long, FileInfo>> files)
{
    var dictFiles = new ConcurrentDictionary<uint, ConcurrentBag<FileInfo>>();
    await Task.Factory.StartNew(() =>
    {
        Parallel.ForEach(files.SelectMany(g => g), file =>
        {
            var hash = GetCrc32FromFile(file.FullName);
            if (hash != 0)
            {
                if (dictFiles.ContainsKey(hash))
                    dictFiles[hash].Add(file);
                else
                    dictFiles.TryAdd(hash, new ConcurrentBag<FileInfo>(new[] { file }));
            }
        });
    });
    return dictFiles.Where(p => p.Value.Count > 1)
        .OrderByDescending(p => p.Value.First().Length)
        .ToDictionary(p => p.Key, p => p.Value.ToList());
}

When we do that and run our program again, we will see that more CPU is used for the processing and the times to get the list come to 46 seconds from 78 seconds (for 18GB of duplicate files).

Conclusions

With this program, we can show the largest duplicates in a folder and see what can be safely deleted in our disk, thus retrieving some space (in our case, we would have potentially got 9Gb extra). We’ve done some optimization in the code by parallelizing the calculations using the parallel extensions in .NET.

The source code for this article is in https://github.com/bsonnino/FindDuplicates

Some time ago I wrote a post about converting a WPF application into .NET Core. One thing that called my attention in this Build 2019 talk was that the performance for file enumerations was enhanced in the .NET core apps. So I decided to check this with my own app and see what happens in my machine.

I added some measuring data in the app, so I could see what happens there:

private async void StartClick(object sender, RoutedEventArgs e)
{
    var fbd = new WPFFolderBrowserDialog();
    if (fbd.ShowDialog() != true)
        return;
    FilesList.ItemsSource = null;
    ExtList.ItemsSource = null;
    ExtSeries.ItemsSource = null;
    AbcList.ItemsSource = null;
    AbcSeries.ItemsSource = null;
    var selectedPath = fbd.FileName;
    Int64 minSize;
    if (!Int64.TryParse(MinSizeBox.Text, out minSize))
        return;
    List<FileInfo> files = null;
    var sw = new Stopwatch();
    var timeStr = "";
    await Task.Factory.StartNew(() =>
    {
       sw.Start();
       files = GetFilesInDirectory(selectedPath).ToList();
       timeStr = $" {sw.ElapsedMilliseconds} for enumeration";
       sw.Restart();
       files = files.Where(f => f.Length >= minSize)
         .OrderByDescending(f => f.Length)
         .ToList();
       timeStr += $" {sw.ElapsedMilliseconds} for ordering and filtering";
    });
    var totalSize = files.Sum(f => f.Length);
    TotalFilesText.Text = $"# Files: {files.Count}";
    LengthFilesText.Text = $"({totalSize:N0} bytes)";
    sw.Restart();
    FilesList.ItemsSource = files;
    var extensions = files.GroupBy(f => f.Extension)
        .Select(g => new { Extension = g.Key, Quantity = g.Count(), Size = g.Sum(f => f.Length) })
        .OrderByDescending(t => t.Size).ToList();
    ExtList.ItemsSource = extensions;
    ExtSeries.ItemsSource = extensions;
    var tmp = 0.0;
    var abcData = files.Select(f =>
    {
        tmp += f.Length;
        return new { f.Name, Percent = tmp / totalSize * 100 };
    }).ToList();
    AbcList.ItemsSource = abcData;
    AbcSeries.ItemsSource = abcData.OrderBy(d => d.Percent).Select((d, i) => new { Item = i, d.Percent });
    timeStr += $"  {sw.ElapsedMilliseconds} to fill data";
    TimesText.Text = timeStr;
}

That way, I could measure two things: the time to enumerate the files and the times to sort, filter and assign the files to the lists. Then, I run the two programs, to see what happened.

The machine I’ve run is a Virtual machine with a Core I5 and 4 virtual processors and a virtualized hard disk, with 12,230 files (93.13 GB of data). The measures may vary on your machine, but the differences should be comparable. To avoid bias, I ran 3 times each program (in Admin mode), then rebooted and run the other one.

Here are the results I’ve got:

Run Enumerate Sort/Filter Assign
.NET
1 137031 96 43
2 58828 56 9
3 59474 55 8
Avg 85111 69 20
.NET Core
1 91105 120 32
2 33422 90 14
3 32907 87 20
Avg 52478 99 22

 

As you can see by the numbers, the .NET Core application improved a lot the times for file enumeration, but still lacks some effort for sorting/filtering and assigning data to the UI lists. But that’s not bad for a platform still in preview!

If you do some testingfor the performance, I’m curious to see what you’ve got, you can put your results and comments in the Comments section.

 

One thing that has been recently announced by Microsoft is the availability of .NET Core 3. With it, you will be able to create WPF and Winforms apps with .NET Core. And one extra bonus is that both WPF and Winforms are being open sourced. You can check these in https://github.com/dotnet/wpf and https://github.com/dotnet/winforms.

The first step to create a .NET Core WPF program is to download the .NET Core 3.0 preview from https://dotnet.microsoft.com/download/dotnet-core/3.0. Once you have it installed, you can check that it was installed correctly by open a Command Line window and typing dotnet –info and seeing the installed version:

:

With that in place, you can change the current folder to a new folder and type

dotnet new wpf
dotnet run

This will create a new .NET Core 3.0 WPF project and will compile and run it. You should get something like this:

If you click on the Exit button, the application exits. If you take a look at the folder, you will see that it generated the WPF project file, App.xaml and App.xaml.cs, MainWindow.xaml and MainWindow.xaml.cs. The easiest way to edit these files is to use Visual Studio Code. Just open Visual Studio Code and go to menu File/Open Folder and open the folder for the project. There you will see the project files and will be able to run and debug your code:

A big difference can be noted in the csproj file. If you open it, you will see something like this:

<Project Sdk="Microsoft.NET.Sdk.WindowsDesktop">

  <PropertyGroup>
    <OutputType>WinExe</OutputType>
    <TargetFramework>netcoreapp3.0</TargetFramework>
    <UseWPF>true</UseWPF>
  </PropertyGroup>

</Project>

That’s very simple and there’s nothing else in the project file. There are some differences between this project and other types of .NET Core, like the console one:

  • The output type is WinExe, and not Exe, in the console app
  • The UseWPF clause is there and it’s set to true

Now, you can modify and run the project inside VS Code. Modify MainWindow.xaml and put this code in it:

<Window x:Class="DotNetCoreWPF.MainWindow" 
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" 
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml" 
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008" 
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" 
    xmlns:local="clr-namespace:DotNetCoreWPF" mc:Ignorable="d" Title="MainWindow" Height="450" Width="800">
    <Grid>
        <Grid.RowDefinitions>
            <RowDefinition Height="*"/>
            <RowDefinition Height="40"/>
        </Grid.RowDefinitions>
        <Grid>
            <Grid.RowDefinitions>
                <RowDefinition Height="40"/>
                <RowDefinition Height="40"/>
                <RowDefinition Height="40"/>
                <RowDefinition Height="40"/>
                <RowDefinition Height="40"/>
                <RowDefinition Height="40"/>
            </Grid.RowDefinitions>
            <Grid.ColumnDefinitions>
                <ColumnDefinition Width="*"/>
                <ColumnDefinition Width="2*"/>
            </Grid.ColumnDefinitions>

            <TextBlock Text="Id"      Grid.Column="0" Grid.Row="0" Margin="5" VerticalAlignment="Center"/>
            <TextBlock Text="Name"    Grid.Column="0" Grid.Row="1" Margin="5" VerticalAlignment="Center"/>
            <TextBlock Text="Address" Grid.Column="0" Grid.Row="2" Margin="5" VerticalAlignment="Center"/>
            <TextBlock Text="City"    Grid.Column="0" Grid.Row="3" Margin="5" VerticalAlignment="Center"/>
            <TextBlock Text="Email"   Grid.Column="0" Grid.Row="4" Margin="5" VerticalAlignment="Center"/>
            <TextBlock Text="Phone"   Grid.Column="0" Grid.Row="5" Margin="5" VerticalAlignment="Center"/>
            <TextBox Grid.Column="1" Grid.Row="0" Margin="5"/>
            <TextBox Grid.Column="1" Grid.Row="1" Margin="5"/>
            <TextBox Grid.Column="1" Grid.Row="2" Margin="5"/>
            <TextBox Grid.Column="1" Grid.Row="3" Margin="5"/>
            <TextBox Grid.Column="1" Grid.Row="4" Margin="5"/>
            <TextBox Grid.Column="1" Grid.Row="5" Margin="5"/>
        </Grid>
        <Button Content="Submit" Width="65" Height="35" Grid.Row="1" HorizontalAlignment="Right" VerticalAlignment="Center" Margin="5,0"/>
    </Grid>
</Window>

Now, you can compile and run the app in VS Code with F5, and you will get something like this:

If you don’t want to use Visual Studio Code, you can edit your project in Visual Studio 2019. The first preview still doesn’t have a visual editor for the XAML file, but you can edit the XAML file in the editor, it will work fine.

Porting a WPF project to .NET Core

To port a WPF project to .NET Core, you should run the Portability Analyzer tool first, to see what problems you will find before porting it to .NET Core. This tool can be found here. You can download it and run on your current application, and check what APIs that are not portable.

I will be porting my DiskAnalisys project. This is a simple project, that uses the File.IO functions to enumerate the files in a folder and uses two NuGet packages to add a Folder Browser and Charts to WPF. The first step is to run the portability analysis on it. Run the PortabilityAnalizer app and point it to the folder where the executable is located:

When you click on the Analyze button, it will analyze the executable and generate an Excel spreadsheet with the results:

As you can see, all the code is compatible with .NET Core 3.0. So, let’s port it to .NET Core 3.0. I will show you three ways to do it: creating a new project, updating the .csproj file and using a tool.

Upgrading by Creating a new project

This way needs the most work, but it’s the simpler to fix. Just create a new folder and name it DiskAnalysisCorePrj. Then open a command line window and change the directory to the folder you’ve created. Then, type these commands:

dotnet new wpf
dotnet add package wpffolderbrowser
dotnet add package dotnetprojects.wpf.toolkit
dotnet run

These commands will create the WPF project, add the two required NuGet packages and run the default app. You may see a warning like this:

D:\Documentos\Artigos\Artigos\CSharp\WPFCore\DiskAnalysisCorePrj\DiskAnalysisCorePrj.csproj : warning NU1701: Package 'DotNetProjects.Wpf.Toolkit 5.0.43' was restored using '.NETFramework,Version=v4.6.1' instead of the project target framework '.NETCoreApp,Version=v3.0'. This package may not be fully compatible with your project.
D:\Documentos\Artigos\Artigos\CSharp\WPFCore\DiskAnalysisCorePrj\DiskAnalysisCorePrj.csproj : warning NU1701: Package 'WPFFolderBrowser 1.0.2' was restored using '.NETFramework,Version=v4.6.1' instead of the project target framework '.NETCoreApp,Version=v3.0'. This package may not be fully compatible with your project.
D:\Documentos\Artigos\Artigos\CSharp\WPFCore\DiskAnalysisCorePrj\DiskAnalysisCorePrj.csproj : warning NU1701: Package 'DotNetProjects.Wpf.Toolkit 5.0.43' was restored using '.NETFramework,Version=v4.6.1' instead of the project target framework '.NETCoreApp,Version=v3.0'. This package may not be fully compatible with your project.
D:\Documentos\Artigos\Artigos\CSharp\WPFCore\DiskAnalysisCorePrj\DiskAnalysisCorePrj.csproj : warning NU1701: Package 'WPFFolderBrowser 1.0.2' was restored using '.NETFramework,Version=v4.6.1' instead of the project target framework '.NETCoreApp,Version=v3.0'. This package may not be fully compatible with your project.

This means that the NuGet packages weren’t converted to .NET Core 3.0, but they are still usable (remember, the compatibility report showed 100% compatibility). Then, copy MainWindow.xaml and MainWindow.xaml.cs from the original folder to the new one. We don’t need to copy any other files, as no other files were changed. Then, type

dotnet run

and the program is executed:

Converting by Changing the .csproj file

This way is very simple, just changing the project file, but can be challenging, especially for very large projects. Just create a new folder and name it DiskAnalysisCoreCsp. Copy all files from the main folder of the original project (there’s no need of copying the Properties folder) and edit the .csproj file, changing it to:

<Project Sdk="Microsoft.NET.Sdk.WindowsDesktop">
  <PropertyGroup>
    <OutputType>WinExe</OutputType>
    <TargetFramework>netcoreapp3.0</TargetFramework>
    <UseWPF>true</UseWPF>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="dotnetprojects.wpf.toolkit" Version="5.0.43" />
    <PackageReference Include="wpffolderbrowser" Version="1.0.2" />
  </ItemGroup>
</Project>

Then, type

dotnet run

and the program is executed.

Converting using a tool

The third way is to use a tool to convert the project. You must install the conversion extension created by Brian Lagunas, from here. Then, open your WPF project in Visual Studio, right-click in the project and select “Convert Project to .NET Core 3”.

That’s all. You now have a NET Core 3 app. If you did that in Visual Studio 2017, you won’t be able to open the project, you will need to compile it with dotnet run, or open it in Visual Studio code.

Conclusions

As you can see, although this is the first preview of WPF .NET Core, it has a lot of work done, and you will be able to port most of your WPF projects to .NET Core.

As an MVP, I sometimes receive licenses to software from the vendors for my usage. Some of them become indispensable for me and I feel in the obligation to write a review (yes, it’s a biased review, as I really like the tool and use it on a daily basis :-)) as a way to say thank you!

One of these tools is Linqpad (https://www.linqpad.net/). It’s a simple tool, with a small footprint, but I have used it in so many ways that I find it incredible. There is a free version that has a lot of features to start, but I really recommend the paid version (if you have the $95 to spend, the Premium edition has even a debugger to debug your snippets).

Introduction

Once you open Linqpad, you will see a simple desktop like this:

At first, the name of the tool may indicate that this is a notepad for linq queries, but it’s much more than that! If you take a look at the Samples pane, you can see that there’s even an Interactive Regex Evaluator.

A closer look at that pane shows that you are not tied to C#: you can also use F# there. In fact, there is a full F# tutorial there. If you open the Language combo, you can see that you can use also VB or SQL queries.

My first usages in Linqpad were to learn Linq (the name is Linqpad, no?). At the beginning, Linq seems a little bit daunting, with all those extension methods and lambdas. So, I started to try some Linq queries, making them more difficult as my knowledge was improving. In Linqpad, you have three flavors of code: Expressions, where you have a single expression evaluated; Statements, where you have some statements evaluated and Program, where you can have a full program run in Linqpad (I use this when I want to run a console program and don’t want to open Visual Studio and create a new project).

In the Expression mode, you can enter a single expression, like this:

from i in Enumerable.Range(1,1000)
  where i % 2 == 0
  select i

If you run it, you will see the result in the Results pane:

As you can see, all the results are there, there is no need to open a console window or anything else. And, what’s better, you can export the results to Excel, Word or HTML. You can also use the other Linq format, the functional one:

Enumerable.Range(1,1000).Where(i => i %2 == 0)

After that, you can start tweaking your code and clicking on the Run button and observing the results. If you have the paid version, you also have Intellisense in the code, so you can check the syntax.

For example, to get the sum of the squares of the even numbers, we can do something like this:

If we have something more complicated than a single expression, we can run it using the C# statements. For example, to get all methods and parameters of the methods in the Directory class, we can use these statements:

var methodInfos = typeof(Directory).GetMethods(BindingFlags.Public | 
  BindingFlags.Static);

methodInfos.Select(m => new 
{
  m.Name, 
  Parameters = m.GetParameters() 
}).Dump();

You may have noticed something different in the code above: the Dump method. Linqpad adds this method to dump the values to the results pane. It is very powerful, you don’t need to know the type of the object, all the properties are shown there:

And you are not limited to old C#, you can also use C#7 features and even async programming. For example, this code (based on https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/async/walkthrough-accessing-the-web-by-using-async-and-await) will download asynchronously some pages from the web and will display their sizes:

async Task Main()
{
	await SumPageSizesAsync().Dump();
}

private async Task<List<string>> SumPageSizesAsync()
{
	var results = new List<string>();
	// Declare an HttpClient object and increase the buffer size. The
	// default buffer size is 65,536.
	HttpClient client =
		new HttpClient() { MaxResponseContentBufferSize = 1000000 };

	// Make a list of web addresses.
	List<string> urlList = SetUpURLList();

	var total = 0;

	foreach (var url in urlList)
	{
		// GetByteArrayAsync returns a task. At completion, the task
		// produces a byte array.
		byte[] urlContents = await client.GetByteArrayAsync(url);

		// The following two lines can replace the previous assignment statement.
		//Task<byte[]> getContentsTask = client.GetByteArrayAsync(url);
		//byte[] urlContents = await getContentsTask;

		results.Add(DisplayResults(url, urlContents));

		// Update the total.
		total += urlContents.Length;
	}

	// Display the total count for all of the websites.
	results.Add(
		$"\r\n\r\nTotal bytes returned:  {total}\r\n");
	return results;
}

private List<string> SetUpURLList()
{
	List<string> urls = new List<string>
			{
				"https://msdn.microsoft.com/library/windows/apps/br211380.aspx",
				"https://msdn.microsoft.com",
				"https://msdn.microsoft.com/library/hh290136.aspx",
				"https://msdn.microsoft.com/library/ee256749.aspx",
				"https://msdn.microsoft.com/library/hh290138.aspx",
				"https://msdn.microsoft.com/library/hh290140.aspx",
				"https://msdn.microsoft.com/library/dd470362.aspx",
				"https://msdn.microsoft.com/library/aa578028.aspx",
				"https://msdn.microsoft.com/library/ms404677.aspx",
				"https://msdn.microsoft.com/library/ff730837.aspx"
			};
	return urls;
}

private string DisplayResults(string url, byte[] content)
{
	// Display the length of each website. The string format
	// is designed to be used with a monospaced font, such as
	// Lucida Console or Global Monospace.
	var bytes = content.Length;
	// Strip off the "https://".
	var displayURL = url.Replace("https://", "");
	return $"\n{displayURL,-58} {bytes,8}";
}

When you run it, you will see something like this:

And you are not tied to the default C# libraries. If you have the Developer or Premium versions, you can download and use NuGet packages in your queries. For example in this previous article, I’ve shown how to use the Microsoft.SqlServer.TransactSql.ScriptDom package to parse your Sql Server code. You don’t even need to open Visual Studio for that. Just put this code in the Linqpad window:

static void Main()
{
	using (var con = new SqlConnection("Server=.;Database=WideWorldImporters;Trusted_Connection=True;"))
	{
		con.Open();
		var procTexts = GetStoredProcedures(con)
		  .Select(n => new { ProcName = n, Tree = ParseSql(GetProcText(con, n)) })
		  .Dump();
	}
}

private static List<string> GetStoredProcedures(SqlConnection con)
{
	using (SqlCommand sqlCommand = new SqlCommand("select s.name+'.'+p.name as name from sys.procedures p " +
	  "inner join sys.schemas s on p.schema_id = s.schema_id order by name", con))
	{
		using (DataTable procs = new DataTable())
		{
			procs.Load(sqlCommand.ExecuteReader());
			return procs.Rows.OfType<DataRow>().Select(r => r.Field<String>("name")).ToList();
		}
	}
}

private static string GetProcText(SqlConnection con, string procName)
{
	using (SqlCommand sqlCommand = new SqlCommand("sys.sp_helpText", con)
	{
		CommandType = CommandType.StoredProcedure
	})
	{
		sqlCommand.Parameters.AddWithValue("@objname", procName);
		using (var proc = new DataTable())
		{
			try
			{
				proc.Load(sqlCommand.ExecuteReader());
				return string.Join("", proc.Rows.OfType<DataRow>().Select(r => r.Field<string>("Text")));
			}
			catch (SqlException)
			{
				return null;
			}
		}
	}
}

private static (TSqlFragment sqlTree, IList<ParseError> errors) ParseSql(string procText)
{
	var parser = new TSql150Parser(true);
	using (var textReader = new StringReader(procText))
	{
		var sqlTree = parser.Parse(textReader, out var errors);

		return (sqlTree, errors);
	}
}

You will see some missing references. Just press F4 and it will open the following screen:

Click the Add NuGet button and add the Microsoft.SqlServer.TransactSql.ScriptDom package, then run the program. You will see something like this:

You can even click on the ScriptTokenStream result, to see the list of tokens in the procedure:

You can also simplify the query by using the connections available in Linqpad. Just go to the connections pane, add a new connection and point it to the WorldWideImporters database. Then select the connection in the connections combo and use this code:

void Main()
{
	ExecuteQuery<string>("select s.name+'.'+p.name as name from sys.procedures p " +
	  "inner join sys.schemas s on p.schema_id = s.schema_id order by name")
		  .Select(n => new 
		    { 
			  ProcName = n, 
			  Tree = ParseSql(ExecuteQuery<string>("exec sys.sp_helpText @objname={0}",n).FirstOrDefault()) 
			})
		  .Dump();
}

private static (TSqlFragment sqlTree, IList<ParseError> errors) ParseSql(string procText)
{
	var parser = new TSql150Parser(true);
	using (var textReader = new StringReader(procText))
	{
		var sqlTree = parser.Parse(textReader, out var errors);

		return (sqlTree, errors);
	}
}

You will see the same results. As you can see, you don’t even need to open the connection and create the command to run it. You can run your queries against your databases the same way you would do with any data. And if you are a SQL guy, you can run your queries directly using the SQL language. And, if you are brave and want to learn F#, you have here a really nice tool to learn.

Conclusions

At first, the size and appearance of Linqpad may fool you, but it’s a very nice tool to work, saving you a lot of time to try and debug your code. If you have some code snipped that you want to test and improve, this is the tool to use. And, one feature that I didn’t mention that’s invaluable when you are optimizing your code is the timing feature. After the execution of each query, Linqpad shows the execution time, so you can know how long did it take to execute it.

Some time ago, I’ve written this post about sending parametrized emails with C#. After some time, the emails weren’t arriving to their destination and I didn’t know why that happened – there was no message and the program was out in production, so I didn’t have an easy way to debug it.

Then, I went to the oracle that knows everything and asked “what is the easiest way to debug a C# program that sends emails in production?”. The oracle answered: “go to this page and you will find the answers you are looking for”. Once again, it was right.

Yes, the post pointed me to the easiest way to add debugging to your app that sends emails. There is not even the need to change the app. Just add a new clause to your app.config file and that’s it. Your app will generate a log file with all the data sent and received. The clause to add is system.diagnostics and you should add the traces you want and send them to a listener:

<system.diagnostics>
  <trace autoflush="true" />
  <sources>
    <source name="System.Net" >
      <listeners>
        <add name="MyTraceFile"/>
      </listeners>
    </source>
    <source name="System.Net.Sockets">
      <listeners>
        <add name="MyTraceFile"/>
      </listeners>
    </source>
  </sources>
  <sharedListeners>
    <add  name="MyTraceFile" type="System.Diagnostics.TextWriterTraceListener" 
          initializeData="System.Net.trace.log"   />
  </sharedListeners>
  <switches>
    <add name="System.Net" value="Verbose" />
    <add name="System.Net.Sockets" value="Verbose" />
  </switches>
</system.diagnostics>

The clause above will trace all calls to System.Net and System.Net.Sockets, with a Verbose level, to a file named System,Net.trace.log.

You can change the level of the tracing to something less verbose, but one note here: if you try to use the Info level, as stated here you will get an error. You should use the Information level. Using the number levels don’t show an error, but don’t generate a log either. So, you should use the named levels.

After that, I could see the logs and notice what has happened:

System.Net Error: 0 : [22720] Exception in SmtpClient#58328727::Send - The SMTP server requires a secure connection or the client was not authenticated. The server response was: 5.5.1 Authentication Required. Learn more at.
   at System.Net.Mail.MailCommand.CheckResponse(SmtpStatusCode statusCode, String response)
   at System.Net.Mail.MailCommand.Send(SmtpConnection conn, Byte[] command, MailAddress from, Boolean allowUnicode)
   at System.Net.Mail.SmtpTransport.SendMail(MailAddress sender, MailAddressCollection recipients, String deliveryNotify, Boolean allowUnicode, SmtpFailedRecipientException& exception)
   at System.Net.Mail.SmtpClient.Send(MailMessage message)
System.Net.Sockets Verbose: 0 : [22720] Entering Socket#24827179::Dispose()
System.Net Error: 0 : [22720] Exception in AppDomain#41560081::UnhandledExceptionHandler - The SMTP server requires a secure connection or the client was not authenticated. The server response was: 5.5.1 Authentication Required. Learn more at.
   at System.Net.Mail.MailCommand.CheckResponse(SmtpStatusCode statusCode, String response)
   at System.Net.Mail.MailCommand.Send(SmtpConnection conn, Byte[] command, MailAddress from, Boolean allowUnicode)
   at System.Net.Mail.SmtpTransport.SendMail(MailAddress sender, MailAddressCollection recipients, String deliveryNotify, Boolean allowUnicode, SmtpFailedRecipientException& exception)
   at System.Net.Mail.SmtpClient.Send(MailMessage message)
   at HtmlEmail.Program.Main(String[] args) in D:\Documentos\Artigos\Artigos\CSharp\HtmlEmail\HtmlEmail\HtmlEmail\Program.cs:line 41

Just a change in the password!

In the last post, I’ve shown how you can use the Microsoft libraries to parse your SQL Server code in C#. You can use the same parser to reformat your code, but that would be a lot of work: parse the code, check all structures and reformat the code according to your needs.

But there is an easier way: the SqlScriptGenerator class. This class makes a breeze to get your unformatted code and reformat it according to your options. In this article, we will develop a WPF program that takes a SQL Code snippet typed in TextBox and reformats it using the options you give.

Create a WPF program and name it SqlReformatter. In the solution explorer, add the NuGet package Microsoft.SqlServer.TransactSql.ScriptDom. In the XAML file for the Main page, add this code:

<Window x:Class="SqlReformatter.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
        xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
        mc:Ignorable="d"
        Title="MainWindow" Height="700" Width="1024">
    <Grid>
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="200"/>
            <ColumnDefinition Width="*"/>
        </Grid.ColumnDefinitions>
        <StackPanel Grid.Column="0" Margin="5" x:Name="stackChecks">
            <CheckBox Content="AlignClauseBodies" Click="OptionClick"/>
            <CheckBox Content="AlignColumnDefinitionFields" Click="OptionClick"/>
            <CheckBox Content="AlignSetClauseItem" Click="OptionClick"/>
            <CheckBox Content="AsKeywordOnOwnLine" Click="OptionClick"/>
            <CheckBox Content="IncludeSemicolons" Click="OptionClick"/>
            <CheckBox Content="IndentSetClause" Click="OptionClick"/>
            <CheckBox Content="IndentViewBody" Click="OptionClick"/>
            <CheckBox Content="MultilineInsertSourcesList" Click="OptionClick"/>
            <CheckBox Content="MultilineInsertTargetsList" Click="OptionClick"/>
            <CheckBox Content="MultilineSelectElementsList" Click="OptionClick"/>
            <CheckBox Content="MultilineSetClauseItems" Click="OptionClick"/>
            <CheckBox Content="MultilineViewColumnsList" Click="OptionClick"/>
            <CheckBox Content="MultilineWherePredicatesList" Click="OptionClick"/>
            <CheckBox Content="NewLineBeforeCloseParenthesisInMultilineList" Click="OptionClick"/>
            <CheckBox Content="NewLineBeforeFromClause" Click="OptionClick"/>
            <CheckBox Content="NewLineBeforeGroupByClause" Click="OptionClick"/>
            <CheckBox Content="NewLineBeforeHavingClause" Click="OptionClick"/>
            <CheckBox Content="NewLineBeforeJoinClause" Click="OptionClick"/>
            <CheckBox Content="NewLineBeforeOffsetClause" Click="OptionClick"/>
            <CheckBox Content="NewLineBeforeOpenParenthesisInMultilineList" Click="OptionClick"/>
            <CheckBox Content="NewLineBeforeOrderByClause" Click="OptionClick"/>
            <CheckBox Content="NewLineBeforeOutputClause" Click="OptionClick"/>
            <CheckBox Content="NewLineBeforeWhereClause" Click="OptionClick"/>
            <StackPanel Margin="0,5">
                <TextBlock Text="KeywordCasing"/>
                <ComboBox SelectionChanged="CaseChanged" x:Name="cbxCase">
                    <ComboBoxItem>LowerCase</ComboBoxItem>
                    <ComboBoxItem>UpperCase</ComboBoxItem>
                    <ComboBoxItem>PascalCase</ComboBoxItem>
                </ComboBox>
            </StackPanel>
            <StackPanel Orientation="Horizontal" Margin="0,5">
                <TextBlock Text="IndentationSize"/>
                <TextBox Width="60" Margin="5,0" TextChanged="IndentChanged" x:Name="txtIdent"/>
            </StackPanel>
        </StackPanel>
        <Button VerticalAlignment="Bottom" HorizontalAlignment="Right" Grid.Column="0"
                Margin="5" Content="Reformat" Click="ReformatSqlClick"/>
        <Grid Grid.Column="1">
            <Grid.RowDefinitions>
                <RowDefinition Height="*"/>
                <RowDefinition Height="*"/>
            </Grid.RowDefinitions>
            <TextBox x:Name="SourceBox" Margin="5" Grid.Row="0"/>
            <TextBox x:Name="DestBox" Margin="5" Grid.Row="1" IsReadOnly="True"/>
        </Grid>
    </Grid>
</Window>

As you can see, we are adding at the right a set of Checkboxes, each one with one property of the SqlScriptOptions class. When we check each box, we will change the reformatting options. To do that, we must use this code:

private void OptionClick(object sender, RoutedEventArgs e)
{
    if (sender is CheckBox checkBox)
    {
        var option = checkBox.Content.ToString();
        PropertyInfo pinfo = typeof(SqlScriptGeneratorOptions).GetProperty(option);
        pinfo?.SetValue(_options, checkBox.IsChecked == true);
    }
}

In this case, we are using reflection to get the property corresponding to the clicked Checkbox and we set its value according to the IsChecked value. We must also add two event handlers, for the case combobox and for the indent textbox:

private void CaseChanged(object sender, SelectionChangedEventArgs e)
{
    var selectedCase = (sender as ComboBox)?.SelectedIndex;
    if (selectedCase != null)
        _options.KeywordCasing = (KeywordCasing) selectedCase;
}

private void IndentChanged(object sender, TextChangedEventArgs e)
{
    if (int.TryParse((sender as TextBox)?.Text, out int size))
        _options.IndentationSize = size;
}

Once we have this set, we need to add the code to the click of the Reformat  button:

private void ReformatSqlClick(object sender, RoutedEventArgs e)
{
    var sqlSrc = SourceBox.Text;
    if (string.IsNullOrWhiteSpace(sqlSrc))
        return;
    var processed = ParseSql(sqlSrc);
    if (processed.errors.Any())
    {
        var sb = new StringBuilder("Errors found:");
        foreach (var error in processed.errors)
        {
            sb.AppendLine($"     Line: {error.Line}  Col: {error.Column}: {error.Message}");
        }
    }
    else
    {
        var scriptGenerator = new Sql150ScriptGenerator(_options);
        scriptGenerator.GenerateScript(processed.sqlTree, out string sqlDst);
        DestBox.Text = sqlDst;
    }
}

We parse the code in the source box, and if there are any errors, we show them in the destination box. If there are no errors, the code is reformatted according to the selected options. The ParseSql method is similar to the one shown in the last article:

private static (TSqlFragment sqlTree, IList<ParseError> errors) ParseSql(string procText)
{
    var parser = new TSql150Parser(true);
    using (var textReader = new StringReader(procText))
    {
        var sqlTree = parser.Parse(textReader, out var errors);

        return (sqlTree, errors);
    }
}

Now, we only need to initialize the UI in the beginning, so all the options are up-to-date with the SqlScriptGeneratorOptions instance:

public MainWindow()
{
    InitializeComponent();
    _options = new SqlScriptGeneratorOptions();
    txtIdent.Text = _options.IndentationSize.ToString();
    cbxCase.SelectedIndex = (int) _options.KeywordCasing;
    foreach (var child in stackChecks.Children)
    {
        if (child is CheckBox check)
        {
            var checkContent = check.Content.ToString();
            PropertyInfo pinfo = typeof(SqlScriptGeneratorOptions).GetProperty(checkContent);
            check.IsChecked = (bool?)pinfo?.GetValue(_options) == true;
        }

    }
}

This code also uses reflection to get the property values and fill the data in the boxes. When you run this program, you have something like this:

You can change any options and click the Reformat button and the code will be reformatted accordingly.

Conclusions

As you can see, with very little code you can create a SQL Reformatter to reformat your code and make sure that it agrees with your standards. This formatter has many options and can be used also to parse Sql source code.

All the source code for this article is at https://github.com/bsonnino/SqlReformatter

In the last post, I’ve shown how to parse SQL Server code with C# and get all tokens in it, showing their types. This is very nice, there is a lot you can do with that, but there is a pitfall: you don’t have the context the token was used. For example, you have an identifier, but you don’t know if it is a parameter or a variable declared in the procedure. You don’t know its type and where it is used.

Sometimes you need more information about the token and you can’t get it unless you analyze the code. Fortunately, Microsoft has already done that for us and provided a wonderful tool, so we can get the context of the tokens: you can create a new class that inherits from TSqlFragmentVisitor and override its methods, to visit the node types you want (if this doesn’t make sense to you right now, keep on reading, you’ll see it’s very simple).

The TSqlFragmentVisitor class is a massive class that uses the Visitor Pattern to analyze the SQL tree and visit all the nodes, so you can take action on the nodes that come to your attention. If you take a look at it, you will see that there are a huge number of Visit and ExplicitVisit overrrides, one for each kind of node you can have in a SQL Server procedure. Using Visit is similar to ExplicitVisit, with the difference that with ExplicitVisit you can control if the children nodes are also visited.

To use it, you must parse the tree in the same way you did in the last article and then create a visitor instance and pass it in the Accept method. This code shows how this is done:

static void Main(string[] args)
{
    var sql = "Select * from customer";
    var parsed = ParseSql(sql);
    if (parsed.errors.Any())
        return;
    var visitor = new SelectVisitor();
    parsed.sqlTree.Accept(visitor);
    Console.ReadLine();
}

private static (TSqlFragment sqlTree, IList<ParseError> errors) ParseSql(string procText)
{
    var parser = new TSql150Parser(true);
    using (var textReader = new StringReader(procText))
    {
        var sqlTree = parser.Parse(textReader, out var errors);

        return (sqlTree, errors);
    }
}

The visitor class is something like this:

internal class SelectVisitor : TSqlFragmentVisitor
{
    public override void Visit(SelectStatement node)
    {
        Console.WriteLine($"Visiting Select: {node}");
    }

    public override void Visit(QueryExpression node)
    {
        Console.WriteLine($"Visiting QueryExpression: {node}");
    }

    public override void Visit(QuerySpecification node)
    {
        Console.WriteLine($"Visiting QuerySpecification: {node}");
    }

    public override void Visit(SelectStarExpression node)
    {
        Console.WriteLine($"Visiting SelectStarExpression: {node}");
    }
}

Running this code, you will get something like this:

As you can see, the node and its children are visited and we have no control on that. If you want to have some control, you must use the ExplicitVisit:

internal class SelectVisitor : TSqlFragmentVisitor
{
    public override void ExplicitVisit(SelectStatement node)
    {
        Console.WriteLine($"Visiting Select: {node}");
    }

    public override void Visit(QueryExpression node)
    {
        Console.WriteLine($"Visiting QueryExpression: {node}");
    }

    public override void Visit(QuerySpecification node)
    {
        Console.WriteLine($"Visiting QuerySpecification: {node}");
    }

    public override void Visit(SelectStarExpression node)
    {
        Console.WriteLine($"Visiting SelectStarExpression: {node}");
    }
}

As you can see, the child nodes are not visited. To visit them, you must call the base method, like this:

internal class SelectVisitor : TSqlFragmentVisitor
{
    public override void ExplicitVisit(SelectStatement node)
    {
        Console.WriteLine($"Visiting Select: {node}");
        base.ExplicitVisit(node);
    }

    public override void ExplicitVisit(QueryExpression node)
    {
        Console.WriteLine($"Visiting QueryExpression: {node}");
    }

    public override void ExplicitVisit(QuerySpecification node)
    {
        Console.WriteLine($"Visiting QuerySpecification: {node}");
    }

    public override void ExplicitVisit(SelectStarExpression node)
    {
        Console.WriteLine($"Visiting SelectStarExpression: {node}");
    }
}

Nice, no? With this simple code, you can write something to check if any procedure in your database has a “select *”, which can be flagged as an error. Just override the Visit for the SelectStarExpression and flag the error if it’s visited.

Generating statistics from the database procedures

With this knowledge, we can write a program that generates statistics from the database procedures. We want to know how many inserts, deletes and updates there are in the procedures and what are the tables used with these commands. We also can know how many tables are created and dropped in the procedures.

For that, create a new console application and name it DataStats Add the NuGet package Microsoft.SqlServer.TransactSql.ScriptDom and add this code to Program.cs:

     static void Main(string[] args)
     {
         using (var con = new SqlConnection("Server=.;Database=WideWorldImporters;Trusted_Connection=True;"))
         {
             con.Open();
             var procTexts = GetStoredProcedures(con)
                 .Select(n => new { ProcName = n, ProcText = GetProcText(con, n) })
                 .ToList();
             var procTrees = procTexts.Select(p =>
             {
                 var processed = ParseSql(p.ProcText);
                 var visitor = new StatsVisitor();
                 if (!processed.errors.Any())
                     processed.sqlTree.Accept(visitor);
                 return new { p.ProcName, processed.sqlTree, processed.errors, visitor };
             }).ToList();
             foreach (var procTree in procTrees)
             {
                 Console.WriteLine(procTree.ProcName);
                 if (procTree.errors.Any())
                 {
                     Console.WriteLine("   Errors found:");
                     foreach (var error in procTree.errors)
                     {
                         Console.WriteLine($"     Line: {error.Line}  Col: {error.Column}: {error.Message}");
                     }
                 }
                 else
                 {
                     var visitor = procTree.visitor;
                     Console.WriteLine($"  Inserts: {visitor.Inserts}");
                     foreach (var table in visitor.InsertTables)
                     {
                         Console.WriteLine($"      {table}");
                     }

                     Console.WriteLine($"  Updates: {visitor.Updates}");
                     foreach (var table in visitor.UpdateTables)
                     {
                         Console.WriteLine($"      {table}");
                     }

                     Console.WriteLine($"  Deletes: {visitor.Deletes}");
                     foreach (var table in visitor.DeleteTables)
                     {
                         Console.WriteLine($"      {table}");
                     }

                     Console.WriteLine($"  Creates: {visitor.Creates}");
                     foreach (var table in visitor.CreateTables)
                     {
                         Console.WriteLine($"      {table}");
                     }

                     Console.WriteLine($"  Drops: {visitor.Drops}");
                     foreach (var table in visitor.DropTables)
                     {
                         Console.WriteLine($"      {table}");
                     }
                 }
             }
         }

         Console.ReadLine();
     }

     private static List<string> GetStoredProcedures(SqlConnection con)
     {
         using (SqlCommand sqlCommand = new SqlCommand(
             "select s.name+'.'+p.name as name from sys.procedures p " + 
             "inner join sys.schemas s on p.schema_id = s.schema_id order by name",
             con))
         {
             using (DataTable procs = new DataTable())
             {
                 procs.Load(sqlCommand.ExecuteReader());
                 return procs.Rows.OfType<DataRow>().Select(r => r.Field<String>("name")).ToList();
             }
         }
     }

     private static string GetProcText(SqlConnection con, string procName)
     {
         using (SqlCommand sqlCommand = new SqlCommand("sys.sp_helpText", con)
         {
             CommandType = CommandType.StoredProcedure
         })
         {
             sqlCommand.Parameters.AddWithValue("@objname", procName);
             using (var proc = new DataTable())
             {
                 try
                 {
                     proc.Load(sqlCommand.ExecuteReader());
                     return string.Join("", proc.Rows.OfType<DataRow>().Select(r => r.Field<string>("Text")));
                 }
                 catch (SqlException)
                 {
                     return null;
                 }
             }
         }
     }

     private static (TSqlFragment sqlTree, IList<ParseError> errors) ParseSql(string procText)
     {
         var parser = new TSql150Parser(true);
         using (var textReader = new StringReader(procText))
         {
             var sqlTree = parser.Parse(textReader, out var errors);

             return (sqlTree, errors);
         }
     }
 }

This code is very similar to the one in the last post, but it will create a visitor and will call the Accept method to visit its nodes. The visitor’s code is:

internal class StatsVisitor : TSqlFragmentVisitor
{
    public int Inserts { get; private set; }
    public int Updates { get; private set; }
    public int Deletes { get; private set; }
    public int Creates { get; private set; }
    public int Drops { get; private set; }
    public List<string> InsertTables { get; }
    public List<string> UpdateTables { get; }
    public List<string> DeleteTables { get; }
    public List<string> CreateTables { get; }
    public List<string> DropTables { get; }

    public StatsVisitor()
    {
        InsertTables = new List<string>();
        UpdateTables = new List<string>();
        DeleteTables = new List<string>();
        CreateTables = new List<string>();
        DropTables = new List<string>();
    }

    public override void Visit(InsertStatement node)
    {
        Inserts++;
        InsertTables.Add((node.InsertSpecification.Target as NamedTableReference)?.
            SchemaObject.BaseIdentifier.Value);
    }

    public override void Visit(UpdateStatement node)
    {
        Updates++;
        UpdateTables.Add((node.UpdateSpecification.Target as NamedTableReference)?.
            SchemaObject.BaseIdentifier.Value);
    }

    public override void Visit(DeleteStatement node)
    {
        Deletes++;
        UpdateTables.Add((node.DeleteSpecification.Target as NamedTableReference)?.
            SchemaObject.BaseIdentifier.Value);
    }

    public override void Visit(CreateTableStatement node)
    {
        Creates++;
        CreateTables.Add(node.SchemaObjectName.BaseIdentifier.Value);
    }

    public override void Visit(DropTableStatement node)
    {
        Drops++;
        DropTables.AddRange(node.Objects.Select(o => o.BaseIdentifier.Value));
    }

}

When there is an Insert, Delete, Update, Create Table or Drop Table, the corresponding method will be called and the properties referring to the node will be updated. At the end you will have the statistics for all the procedures in the database:

There is just a minor glitch in this code: as you can see in the Inserts, there is a blank line. That is due to the fact that the insert target isn’t a table, but a variable. This code solves the issue:

public override void Visit(InsertStatement node)
{
    Inserts++;
    var targetName = (node.InsertSpecification.Target as NamedTableReference)?.
        SchemaObject.BaseIdentifier.Value ??
      (node.InsertSpecification.Target as VariableTableReference)?.Variable.Name;
    InsertTables.Add(targetName);
}

Conclusions

As you can see, Microsoft has put a lot of work in this parser, you can have a lot of power when working with SQL Server code in C# with this parser. The options are endless: check quality of the code, refactor code, make statistics, and so on. In the next post we will see a simple way to reformat your SQL Server code, so you can ensure that the code follows your standards. See you then.

The code for this article is in https://github.com/bsonnino/SqlParsing

When you are using SQL Server stored procedures, you get a lot of flexibility in your code – changing the code doesn’t need a recompile, deploying the new or updated procedures is as easy as sending and running text scripts, and so on. I won’t enter on the pros and cons of keeping code in the database, as this will open a very long discussion, with no winners.

In this article, I will focus on one aspect of the SQL Server code, the lack of rigid rules for the code. It all starts with being case insensitive, something like “SELECT * from customer” is the same of “select * FROM customer”. It’s not a great deal, but when you have a lot of stored procedures and many people maintaining it, you may need to enforce some standards. In this article I will show you how to use C# to parse SQL Server code and enforce some coding standards in your SQL Server procedures.

Retrieving the stored procedures from the database

The first step for processing the stored procedures is to retrieve them from the database. For this article, we will be using the WorldWideImporters sample database, that can be downloaded from https://github.com/Microsoft/sql-server-samples/releases/tag/wide-world-importers-v1.0.

To get the names of all procedures in the database, you must use a command like this one:

select s.name+'.'+p.name as name from sys.procedures p 
inner join sys.schemas s on p.schema_id = s.schema_id 
order by name

Once you have the names of all the procedures in the database, you can get their code with the procedure sys.sp_helpText, passing the procedure name as a parameter. This code will get all procedures from the database and store them in a list:

static void Main()
{
	using (var con = new SqlConnection("Server=.;Database=WideWorldImporters;Trusted_Connection=True;"))
	{
		con.Open();
		var procTexts = GetStoredProcedures(con).Select(n => new {ProcName = n, ProcText = GetProcText(con, n)}).ToList();
	}
}

private static List<string> GetStoredProcedures(SqlConnection con)
{
	using (SqlCommand sqlCommand = new SqlCommand("select s.name+'.'+p.name as name from sys.procedures p " +
	  "inner join sys.schemas s on p.schema_id = s.schema_id order by name", con))
	{
		using (DataTable procs = new DataTable())
		{
			procs.Load(sqlCommand.ExecuteReader());
			return procs.Rows.OfType<DataRow>().Select(r => r.Field<String>("name")).ToList();
		}
	}
}

private static string GetProcText(SqlConnection con, string procName)
{
	using (SqlCommand sqlCommand = new SqlCommand("sys.sp_helpText", con)
	{
		CommandType = CommandType.StoredProcedure
	})
	{
		sqlCommand.Parameters.AddWithValue("@objname", procName);
		using (var proc = new DataTable())
		{
			try
			{
				proc.Load(sqlCommand.ExecuteReader());
				return string.Join("", proc.Rows.OfType<DataRow>().Select(r => r.Field<string>("Text")));
			}
			catch (SqlException)
			{
				return null;
			}
		}
	}
}

This code will open a connection to the database, get all procedure names and then get their text, returning a list of an anonymous class with two members: ProcName and ProcText. You can then process all the procedures in the database.

Parsing the procedures

The next step is to parse the procedures to split into the containing parts. This is not an easy task to make from scratch. The SQL Server language is not easy and evolves with each version of SQL Server. There are many ways to do the same thing, so writing a parser is not something I would recommend. Fortunately, Microsoft has already written a parser for us, and it’s made available via NuGet package. To use it, just add the Microsoft.SqlServer.TransactSql.ScriptDom package to your app. When you add the NuGet package to the app you will see it’s not an official package: this dll is part of a larger package, Microsoft.SqlServer.DacFx.x64 and it’s there only to make the number of dlls added smaller.

Once you add the package, you can use this code to parse the code of the procedure:

private static (TSqlFragment sqlTree, IList<ParseError> errors) ParseSql(string procText)
{
	var parser = new TSql150Parser(true);
	using (var textReader = new StringReader(procText))
	{
		var sqlTree = parser.Parse(textReader, out var errors);

		return (sqlTree, errors);
	}
}

This method uses a new feature of C# 7.0, ValueTuples. If you are using .NET version 4.7 or higher, you don’t need to add anything to use it. If you are using an older version, you must add the System.ValueTuple NuGet package to your project. At the beginning, we instantiate a TSql150Parser class. Depending on the version of SQL server that you want to parse, you can choose a different parser class. Then, we call the Parse method, that will parse the code, return any errors in the errors variable and return a TSqlFragment with the parsed data. Just with this simple code, you already have many benefits: you can parse the procedure for many versions of SQL Server, check if the procedure has errors in it and parse the procedure into its components. The returned tree is of type TSqlScript, that has some interesting properties:

  • Batches – gets the batches in the procedure, so you can process them individually
  • FirstTokenIndex – shows where the fragment begins – for the script, it usually starts at 0, but when you are working with other kinds of fragments, that may not be the case
  • FragmentLength – size of the fragment, in bytes
  • LastTokenIndex – shows where the fragment ends
  • ScriptTokenStream – the list of parsed tokens
  • StartColumn – start column of the fragment
  • StartLine – start line of the fragment
  • Offset – offset in bytes from the start

As you can see, there is already a lot of information there. When you take a look at the ScriptTokenStream, you can see a lot more there:

The SQL code is parsed into tokens and every token is categorized. With this code, you can group the tokens and have brief statistics of your code:

static void Main()
{
	using (var con = new SqlConnection("Server=.;Database=WideWorldImporters;Trusted_Connection=True;"))
	{
		con.Open();
		var procTexts = GetStoredProcedures(con).Select(n => 
		  new { ProcName = n, ProcText = GetProcText(con, n) }).ToList();
		var procTrees = procTexts.Select(p =>
		{
			var processed = ParseSql(p.ProcText);
			return new { p.ProcName, processed.sqlTree, processed.errors };
		});
		var procStats = procTrees
		  .SelectMany(p => p.sqlTree.ScriptTokenStream)
		  .GroupBy(t => t.TokenType)
		  .Select(g => new { Key = g.Key, Count = g.Count()})
		  .OrderByDescending(g => g.Count);
	}
}

With this code, we can see something like this in the database:

WhiteSpace              15693
Identifier               3571
Variable                 1976
Plus                     1667
UnicodeStringLiteral     1192
Dot                      1085
Semicolon                1051
Comma                     957
LeftParenthesis           937
RightParenthesis          937

We can see the variable names used with this code:

static void Main()
{
	using (var con = new SqlConnection("Server=.;Database=WideWorldImporters;Trusted_Connection=True;"))
	{
		con.Open();
		var procTexts = GetStoredProcedures(con).Select(n => 
		  new { ProcName = n, ProcText = GetProcText(con, n) }).ToList();
		var procTrees = procTexts.Select(p =>
		{
			var processed = ParseSql(p.ProcText);
			return new { p.ProcName, processed.sqlTree, processed.errors };
		});
		var procVariables = procTrees
		  .SelectMany(p => p.sqlTree.ScriptTokenStream)
		  .Where(t => t.TokenType == TSqlTokenType.Variable)
		  .GroupBy(t => t.Text)
		  .Select(g => g.Key)
		  .OrderBy(t => t);
	}
}

If you run this code, you will see that, although there are 1976 tokens of type Variable, there are only 76 different names used.One thing that called me my attention is that there are two similar variables used, @Crlf and @CrLf. We can use this parsing to know in which procedures these variables are used. Note that this is different than a text search: a text search could find the variables inside a comment or in a string. In this case, we will only find the real SQL variables:

static void Main()
{
	using (var con = new SqlConnection("Server=.;Database=WideWorldImporters;Trusted_Connection=True;"))
	{
		con.Open();
		var procTexts = GetStoredProcedures(con).Select(n => new { ProcName = n, ProcText = GetProcText(con, n) }).ToList();
		var procTrees = procTexts.Select(p =>
		{
			var processed = ParseSql(p.ProcText);
			return new { p.ProcName, processed.sqlTree, processed.errors };
		});
		Func<TSqlParserToken, bool> queryVariables = t => t.TokenType == TSqlTokenType.Variable
			  && (t.Text == "@CrLf" || t.Text == "@Crlf");
		var variablesFound = procTrees
		  .Where(p => p.sqlTree.ScriptTokenStream.Any(queryVariables))
		   .Select(t => new { t.ProcName, FoundVariables = t.sqlTree.ScriptTokenStream.Where(queryVariables)});
	}
}

If you run this code, you will see that most of the variables is written as @CrLf, while some are written as @Crlf. We will change all variables to @CrLf with this code:

static void Main()
{
	using (var con = new SqlConnection("Server=.;Database=WideWorldImporters;Trusted_Connection=True;"))
	{
		con.Open();
		var procTexts = GetStoredProcedures(con).Select(n => new { ProcName = n, ProcText = GetProcText(con, n) }).ToList();
		var procTrees = procTexts.Select(p =>
		{
			var processed = ParseSql(p.ProcText);
			return new { p.ProcName, processed.sqlTree, processed.errors };
		});
		var variablesFound = procTrees
		  .Where(p => p.sqlTree.ScriptTokenStream.Any(queryVariables))
		   .Select(t => new { t.ProcName, FoundVariables = t.sqlTree.ScriptTokenStream.Where(queryVariables)});
		foreach (var procName in variablesFound.Select(p => p.ProcName))
		{
			var sqlTree = procTrees.First(p => p.ProcName == procName).sqlTree;
			foreach (var token in sqlTree.ScriptTokenStream.Where(t => t.Text == "@Crlf"))
			{
				token.Text = "@CrLf";
			}
			var changedSql = string.Join("", sqlTree.ScriptTokenStream.Select(t => t.Text).ToArray());
		}

	}
}

ChangedSql has the changed SQL, that can be applied to the database, making the variable names uniform.

Conclusions

As you can see, you can do a lot with this parser, like have statistics, change variable names and so on. But still we haven’t scratched the surface of what can be done with the parser. We will see more in the next article. See you then!

Sending emails in your app is a normal thing – after some operation, you must send confirmation data to the user, so he can keep the record of it. All the infrastructure for that is available in  .NET, in the System.Net.Mail namespace, there is not much secret in that.

But sometimes things can be more difficult than that – there may be many templates for the email, depending on the kind of email you want to send, they are HTML based and the data in it must come from a different source, like a database.

Sending HTML emails

Sending an HTML email in .NET is fairly easy: you just have to create a MailMessage object, set its From, To, Subject and Body fields, set the IsBodyHtml property to true and send the message using an SmtpClient object:

static void Main(string[] args)
{
    var mail = new MailMessage
    {
        From = new MailAddress("sender@gmail.com"),
        Subject = "Test Mail",
        Body = @" ... ",
        IsBodyHtml = true
    };
    mail.To.Add("email@server.com");
    var client = new SmtpClient("smtp.gmail.com")
    {
        Port = 587,
        Credentials = new System.Net.NetworkCredential(
             username, password),
                EnableSsl = true
    };
 
    client.Send(mail);
 }

Templated emails

The previous code sends a fixed email, but sometimes we have to send a templated email, where we have a template and must fill it with some variables that come from a different source. The variable parts can come embedded in many ways, like between special characters, but I’ll choose the Mustache (http://mustache.github.io/) way: using double braces before and after every variable, like this {{variable}}.

We now need to read the template, find the variables and replace them with the data, creating an HTML string that will be assigned to the Body property of the email:

static void Main(string[] args)
{
    var template = File.ReadAllText("EmailTemplate.html");
    var data = JsonConvert.DeserializeObject<Dictionary>(
        File.ReadAllText("EmailData.json"));
    data.Add("date",DateTime.Now.ToShortDateString());
    var emailBody = ProcessTemplate(template, data);
    var mail = new MailMessage
    {
        From = new MailAddress("sender@gmail.com"),
        Subject = "Test Mail",
        Body = emailBody,
        IsBodyHtml = true
    };
    mail.To.Add("email@server.com");
    var client = new SmtpClient("smtp.gmail.com")
    {
        Port = 587,
        Credentials = new System.Net.NetworkCredential(
           username, password),         
        EnableSsl = true
    };

    client.Send(mail);
}

As we are using a dictionary with the variables, I’ve added a new variable that doesn’t come from the file: the current date. This method is very flexible and you can add data that comes from many sources.

The ProcessTemplate method uses a simple Regex replace to replace the found variables with the data in the dictionary:

private static string ProcessTemplate(string template, 
    Dictionary data)
{
    return Regex.Replace(template, "\\{\\{(.*?)\\}\\}", m =>
        m.Groups.Count > 1 && data.ContainsKey(m.Groups[1].Value) ? 
            data[m.Groups[1].Value] : m.Value);
}

Improving the processing

The processing we’ve chosen is very simple and we can improve it. We can use a library that processes the Mustache variables, like Nustache (https://github.com/jdiamond/Nustache). To use it, we just have to add the Nustache Nuget package. With it installed, the ProcessTemplate method becomes:

private static string ProcessTemplate(string template, 
    Dictionary data)
{
    return Render.StringToString(template, data);
}

This is easier and doesn’t rely in the Regex processing. You can also use some helper functions in the processing. For example, if your data file has the first and last name and you want the full name in the replaced template, you can define a function like this:

private static void FullName(RenderContext context, 
    IList arguments, IDictionary options, 
    RenderBlock fn, RenderBlock inverse)
{
    if (arguments?.Count >= 2) 
        context.Write($"{arguments[0]} {arguments[1]}");
}

And register it like this:

Helpers.Register("FullName", FullName);

Easy, no? Now we have a simple way to create a templated email with data coming from different sources. We’ve used Nustache to process the template, but you could use something else like Handlebars.net (https://github.com/rexm/Handlebars.Net), which would be very similar to what we did here.

The full source code for the project is in https://github.com/bsonnino/HtmlEmail

On last post, I’ve shown how to publish a .NET app to the store. This is quite easy with Visual Studio, but the app must be a .NET app and it must be ported to .NET 4.6.1 or later. Many times, this is not possible – you have a .NET app that targets an older version of the framework and cannot be updated or you have a Win32 app (have I heard VB6 or Delphi?) that you want to publish to the store. Sometimes, even the source code is lost and all you have is the installer or executable. Even with this case there is no problem: you can still publish it to the store, the only thing is that you won’t be able to use Visual Studio.

To demonstrate it, I will use an old Delphi game sample, Swat!, which source code is in the Delphi samples since the earlier versions of Delphi (I think it’s there since Delphi 1, in 1994).

In this game, you must kill the ants with a hammer, this is an old game and I used it as an example of what you can convert to the Store.

The first step is to download the Desktop app converter from the store

Once you have it installed, you should run it with admin rights: in the start menu, right click in its icon and select “More/Run as administrator”. That will open a console window where you will run the converter app.

This window shows some examples of command line for packaging your app, but the first example must always be run once to setup the environment. You must download a base image for your system and run the setup. The base image can be downloaded from here, it must be compatible with your system version. You can check the system version by pressing Win+R and running winver:

My Windows version is 16299. so I have to donwload the base image for that version. It will be downloaded in my downloads folder, with the name “Windows_InsiderPreview_DAC_16299.wim”. So, the first step is to run the command line:

DesktopAppConverter.exe -Setup -BaseImage "E:\Downloads\Windows_InsiderPreview_DAC_16299.wim"

You may have an error after running this command line. This is because the Containers feature must be enabled in your system. Note: you must have Windows 10 Pro or Enterprise to enable this feature – if you have Windows 10 Home, you won’t be able to use the packager:

The converter will add this feature, but you must restart your machine. When you restart it, the Desktop App Converter will reopen and re-run the setup. You are then ready to package your app. The program we are running doesn’t have an installer, it is a single file. If you have an installer, you can point DesktopAppConverter to it, but in this case, we just need to create a folder and put all the files we need there (in this case, just the executable) and another folder for the output. Note: don’t create an output folder as a subfolder from the install folder – I did that and got an error “Path too long” – this is because the output folder is added as an install folder and this is recursive – newbie mistake. Then we must run this command line:

DesktopAppConverter.exe -Installer D:\Temp\Swat\ -AppExecutable Swat.exe -Destination D:\Temp\SwatUWP -PackageName "SwatUWP" -Publisher "CN=RevolutionSoft" -Version 0.0.0.1 -MakeAppx -Sign -Verbose -Verify

This will add all the files in D:\Temp\Swat and create the output file in D:\Temp\SwatUWP, creating and signing an appx file. You can go to the output folder and double click the Appx file. It will open the installer for the app:

If it’s the first time you are installing the app, you will get an error, because the certificate is not installed. You can install it by going to the output folder and double clicking the cer file. That will open the install certificate window:

You must click on the Install Certificate button and select Local Machine as the store location. Then click on Next and then in Place all certificates in the following store, clicking on Browse and selecting Trusted People. The certificate will be installed and you can click again in the Appx file to install it. Now you have your app running as a UWP app.

If you have an installer (it can be an MSI or an EXE installer) that can be run unattended, you can use it to convert your app. In this case, all the files in the installer will be added to the package.

Now it’s time to customize the package, but this is something for another post.