You need to create customized reports based on Sharepoint data and you don’t have the access to the server to create a new web part to access it directly. You need to resort to other options to generate the report. Fortunately, Sharepoint allows some methods to access its data, so you can get the information you need. If you are using C#, you can use one of these three methods:

  • Client Side Object Model (CSOM) – this is a set of APIs that allow access to the Sharepoint data and allow you to maintain lists and documents in Sharepoint
  • REST API – with this API you can access Sharepoint data, not only using C#, but with any other platform that can perform and process REST requests, like web or mobile apps
  • SOAP Web Services – although this kind of access is being deprecated, there are a lot of programs that depend on it, so it’s being actively used until now. You can use this API with any platform that can process SOAP web services.

This article will show how to use these three APIs to access lists and documents from a Sharepoint server and put them in a WPF program, with an extra touch: the user will be able to export the data to Excel, for further manipulation.

To start, let’s create a WPF project in Visual Studio and name it SharepointAccess. We will use the MVVM pattern to develop it, so right click the References node in the Solution Explorer and select Manage NuGet Packages and add the MVVM Light package. That will add a ViewModel folder, with two files in it to your project. The next step is to make a correction in the ViewModelLocator.cs file. If you open it, you will see an error in the line

using Microsoft.Practices.ServiceLocation;

Just replace this using clause with

using CommonServiceLocator;

The next step is to link the DataContext of MainWindow to the MainViewModel, like it’s described in the header of ViewModelLocator:

<Window x:Class="SharepointAccess.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
        xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
        xmlns:local="clr-namespace:SharepointAccess"
        mc:Ignorable="d"
        Title="Sharepoint Access" Height="700" Width="900"
        DataContext="{Binding Source={StaticResource Locator}, Path=Main}">

Then, let’s add the UI to MainWindow.xaml:

<Grid>
    <Grid.Resources>
        <DataTemplate x:Key="DocumentsListTemplate">
            <StackPanel>
                <TextBlock Text="{Binding Title}" />
            </StackPanel>
        </DataTemplate>
        <DataTemplate x:Key="DocumentTemplate">
            <StackPanel Margin="0,5">
                <TextBlock Text="{Binding Title}" />
                <TextBlock Text="{Binding Url}" />

            </StackPanel>
        </DataTemplate>
        <DataTemplate x:Key="FieldsListTemplate">
            <Grid >
                <Grid.ColumnDefinitions>
                    <ColumnDefinition Width="150" />
                    <ColumnDefinition Width="*" />
                </Grid.ColumnDefinitions>
                <TextBlock Text="{Binding Key}" TextTrimming="CharacterEllipsis"/>
                <TextBlock Text="{Binding Value}" Grid.Column="1" TextTrimming="CharacterEllipsis"/>
            </Grid>
        </DataTemplate>
    </Grid.Resources>
    <Grid.ColumnDefinitions>
        <ColumnDefinition Width="*" />
        <ColumnDefinition Width="*" />
        <ColumnDefinition Width="*" />
    </Grid.ColumnDefinitions>
    <Grid.RowDefinitions>
        <RowDefinition Height="40" />
        <RowDefinition Height="*" />
        <RowDefinition Height="40" />
    </Grid.RowDefinitions>
    <StackPanel Grid.Row="0" Grid.ColumnSpan="3" HorizontalAlignment="Stretch" Orientation="Horizontal">
        <TextBox Text="{Binding Address}" Width="400" Margin="5" HorizontalAlignment="Left"
                 VerticalContentAlignment="Center"/>
        <Button Content="Go" Command="{Binding GoCommand}" Width="65" Margin="5" HorizontalAlignment="Left"/>
    </StackPanel>
    <StackPanel Orientation="Horizontal"  Grid.Row="0" Grid.Column="3" 
                      HorizontalAlignment="Right" Margin="5,5,10,5">
        <RadioButton Content=".NET Api" IsChecked="True" GroupName="ApiGroup" Margin="5,0"
                     Command="{Binding ApiSelectCommand}" CommandParameter="NetApi" />
        <RadioButton Content="REST" GroupName="ApiGroup"
                     Command="{Binding ApiSelectCommand}" CommandParameter="Rest" Margin="5,0"/>
        <RadioButton Content="SOAP"  GroupName="ApiGroup"
                     Command="{Binding ApiSelectCommand}" CommandParameter="Soap" Margin="5,0"/>
    </StackPanel>
    <ListBox Grid.Column="0" Grid.Row="1" ItemsSource="{Binding DocumentsLists}" 
             ItemTemplate="{StaticResource DocumentsListTemplate}"
             SelectedItem="{Binding SelectedList}"/>
    <ListBox Grid.Column="1" Grid.Row="1" ItemsSource="{Binding Documents}"
             ItemTemplate="{StaticResource DocumentTemplate}"
             SelectedItem="{Binding SelectedDocument}"/>
    <ListBox Grid.Column="2" Grid.Row="1" ItemsSource="{Binding Fields}"
             ItemTemplate="{StaticResource FieldsListTemplate}"
             />
    <TextBlock Text="{Binding ListTiming}" VerticalAlignment="Center" Margin="5" Grid.Row="2" Grid.Column="0" />
    <TextBlock Text="{Binding ItemTiming}" VerticalAlignment="Center" Margin="5" Grid.Row="2" Grid.Column="1" />
  </Grid>

The first line in the grid will have a text box to enter the address of the Sharepoint site, a button to go to the address and three radiobuttons to select the kind of access you want.

The main part of the window will have three listboxes to show the lists on the selected site, the documents in each list and the properties of the document.

As you can see, we’ve used data binding to fill the properties of the UI controls. We’re using the MVVM pattern and all these propperties should be bound to properties in the ViewModel. The buttons and radiobuttons have their Command properties bound to a property in the ViewModel, so we don`t have to add code to the code behind file. We’ve also used templates for the items in the listboxes, so the data is properly presented.

The last line will show the timings for getting the data.

If you run the program, it will run without errors and you will get an UI like this, that doesn’t do anything:

 

It’s time to add the properties in the MainViewModel to achieve the functionality we want. Before that, we’ll add two classes to manipulate the data that comes from the Sharepoint server, no matter the access we are using. Create a new folder named Model, and add these two files, Document.cs and DocumentsList.cs:

public class Document
{
    public Document(string id, string title, 
        Dictionary<string, object> fields,
        string url)
    {
        Id = id;
        Title = title;
        Fields = fields;
        Url = url;
    }
    public string Id { get; }
    public string Title { get; }
    public string Url { get; }
    public Dictionary<string, object> Fields { get; }
}<span id="mce_marker" data-mce-type="bookmark" data-mce-fragment="1">​</span>
public class DocumentsList
{
    public DocumentsList(string title, string description)
    {
        Title = title;
        Description = description;
    }

    public string Title { get; }
    public string Description { get; }
}

These are very simple classes, that will store the data that comes from the Sharepoint server. The next step is to get the data from the server. For that we will use a repository that gets the data and returns it using these two classes. Create a new folder, named Repository and add a new interface, named IListRepository:

public interface IListRepository
{
    Task<List<Document>> GetDocumentsFromListAsync(string title);
    Task<List<DocumentsList>> GetListsAsync();
}

This interface declares two methods, GetListsAsync and GetDocumentsFromListAsync. These are asynchronous methods because we don want them to block the UI while they are being called. Now, its time to create the first access to Sharepoint, using the CSOM API. For that, you must add the NuGet package Microsoft.SharePointOnline.CSOM . This package will provide all the APIs to access Sharepoint data. We can now create our first repository. In the Repository folder, add a new class and name it CsomListRepository.cs:

public class CsomListRepository : IListRepository
{
    private string _sharepointSite;

    public CsomListRepository(string sharepointSite)
    {
        _sharepointSite = sharepointSite;
    }

    public Task<List<DocumentsList>> GetListsAsync()
    {
        return Task.Run(() =>
        {
            using (var context = new ClientContext(_sharepointSite))
            {
                var web = context.Web;
                
                var query = web.Lists.Include(l => l.Title, l => l.Description)
                     .Where(l => !l.Hidden && l.ItemCount > 0);

                var lists = context.LoadQuery(query);
                context.ExecuteQuery();

                return lists.Select(l => new DocumentsList(l.Title, l.Description)).ToList();
            }
        });
    }

    public Task<List<Document>> GetDocumentsFromListAsync(string listTitle)
    {
        return Task.Run(() =>
        {
            using (var context = new ClientContext(_sharepointSite))
            {
                var web = context.Web;
                var list = web.Lists.GetByTitle(listTitle);
                var query = new CamlQuery();

                query.ViewXml = "<View />";
                var items = list.GetItems(query);
                context.Load(list,
                    l => l.Title);
                context.Load(items, l => l.IncludeWithDefaultProperties(
                    i => i.Folder, i => i.File, i => i.DisplayName));
                context.ExecuteQuery();

                return items
                    .Where(i => i["Title"] != null)
                    .Select(i => new Document(i["ID"].ToString(), 
                    i["Title"].ToString(), i.FieldValues, i["FileRef"].ToString()))
                    .ToList();
            }
        });
    }
}

For accessing the Sharepoint data, we have to create a ClientContext, passing the Sharepoint site to access. Then, we get a reference to the Web property of the context and then we do a query for the lists that aren’t hidden and that have any items in them. The query should return with the titles and description of the lists in the website. To get the documents from a list we use a similar way: create a context, then a query and load the items of a list.

We can call this code in the creation of the ViewModel:

private IListRepository _listRepository;

public MainViewModel()
{
    Address = ConfigurationManager.AppSettings["WebSite"];
    GoToAddress();
}

This code will get the initial web site url from the configuration file for the app and call GoToAddress:

private async void GoToAddress()
{
    var sw = new Stopwatch();
    sw.Start();
    _listRepository = new CsomListRepository(Address);

    DocumentsLists = await _listRepository.GetListsAsync();
    ListTiming = $"Time to get lists: {sw.ElapsedMilliseconds}";
    ItemTiming = "";
}

The method calls the GetListsAsync method of the repository to get the lists and sets the DocumentsLists property with the result.

We must also declare some properties that will be used to bind to the control properties in the UI:

public List<DocumentsList> DocumentsLists
{
    get => _documentsLists;
    set
    {
        _documentsLists = value;
        RaisePropertyChanged();
    }
}
public string ListTiming
{
    get => _listTiming;
    set
    {
        _listTiming = value;
        RaisePropertyChanged();
    }
}
public string ItemTiming
{
    get => itemTiming;
    set
    {
        itemTiming = value;
        RaisePropertyChanged();
    }
}
public string Address
{
    get => address;
    set
    {
        address = value;
        RaisePropertyChanged();
    }
}

These properties will trigger the PropertyChanged event handler when they are modified, so the UI can be notified of their change.

If you run this code, you will have something like this:

Then, we must get the documents when we select a list. This is done when the SelectedList property changes:

public DocumentsList SelectedList
{
    get => _selectedList;
    set
    {
        if (_selectedList == value)
            return;
        _selectedList = value;
        GetDocumentsForList(_selectedList);
        RaisePropertyChanged();
    }
}

GetDocumentsForList is:

private async void GetDocumentsForList(DocumentsList list)
{
    var sw = new Stopwatch();
    sw.Start();
    if (list != null)
    {
        Documents = await _listRepository.GetDocumentsFromListAsync(list.Title);
        ItemTiming = $"Time to get items: {sw.ElapsedMilliseconds}";
    }
    else
    {
        Documents = null;
        ItemTiming = "";
    }
}

You have to declare the Documents and the Fields properties:

public List<Document> Documents
{
    get => documents;
    set
    {
        documents = value;
        RaisePropertyChanged();
    }
}

public Dictionary<string, object> Fields => _selectedDocument?.Fields;

One other change that must be made is to create a property named SelectedDocument that will be bound to the second list. When the user selects a document, it will fill the third list with the document’s properties:

public Document SelectedDocument
{
    get => _selectedDocument;
    set
    {
        if (_selectedDocument == value)
            return;
        _selectedDocument = value;
        RaisePropertyChanged();
        RaisePropertyChanged("Fields");
    }
}

Now, when you click on a list, you get all documents in it. Clicking on a document, opens its properties:

Everything works with the CSOM access, so it’s time to add the other two ways to access the data: REST and SOAP. We will implement these by creating two new repositories that will be selected at runtime.

To get items using the REST API, we must do HTTP calls to http://<website>/_api/Lists and  http://<website>/_api/Lists/GetByTitle(‘title’)/Items. In the Repository folder, create a new class and name it RestListRepository. This class should implement the IListRepository interface:

public class RestListRepository : IListRepository
{
    public Task<List<Document>> GetDocumentsFromListAsync(string title)
    {
        throw new NotImplementedException();
    }

    public Task<List<DocumentsList>> GetListsAsync()
    {
        throw new NotImplementedException();
    }
}

GetListsAsync will be:

private XNamespace ns = "http://www.w3.org/2005/Atom";
public Task<List<DocumentsList>> GetListsAsync()
{
    var doc = await GetResponseDocumentAsync(_sharepointSite + "Lists");
    if (doc == null)
        return null;

    var entries = doc.Element(ns + "feed").Descendants(ns + "entry");
    return entries.Select(GetDocumentsListFromElement)
        .Where(d => !string.IsNullOrEmpty(d.Title)).ToList();
}

GetResponseDocumentAsync will issue a HTTP Get request, will process the response and will return an XDocument:

public async Task<XDocument> GetResponseDocumentAsync(string url)
{
    var handler = new HttpClientHandler
    {
        UseDefaultCredentials = true
    };
    HttpClient httpClient = new HttpClient(handler);
    var headers = httpClient.DefaultRequestHeaders;
    var header = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
        "(KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36";
    if (!headers.UserAgent.TryParseAdd(header))
    {
        throw new Exception("Invalid header value: " + header);
    }
    Uri requestUri = new Uri(url);

    try
    {
        var httpResponse = await httpClient.GetAsync(requestUri);
        httpResponse.EnsureSuccessStatusCode();
        var httpResponseBody = await httpResponse.Content.ReadAsStringAsync();
        return XDocument.Parse(httpResponseBody);
    }
    catch
    {
        return null;
    }
}

The response will be a XML String. We could get the response as a Json object if we pass the accept header as application/json.  After the document is parsed, we process all entry elements, retrieving the lists, in GetDocumentsListFromElement:

private XNamespace mns = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata";
private XNamespace dns = "http://schemas.microsoft.com/ado/2007/08/dataservices";
private DocumentsList GetDocumentsListFromElement(XElement e)
{
    var element = e.Element(ns + "content")?.Element(mns + "properties");
    if (element == null)
        return new DocumentsList("", "");
    bool.TryParse(element.Element(dns + "Hidden")?.Value ?? "true", out bool isHidden);
    int.TryParse(element.Element(dns + "ItemCount")?.Value ?? "0", out int ItemCount);
    return !isHidden && ItemCount > 0 ?
      new DocumentsList(element.Element(dns + "Title")?.Value ?? "",
        element.Element(dns + "Description")?.Value ?? "") :
      new DocumentsList("", "");
}

Here we filter the list by parsing the Hidden and ItemCount properties and returning an empty document if the document is hidden or has no items. GetDocumentsFromListAsync is:

private async Task<Document> GetDocumentFromElementAsync(XElement e)
{
    var element = e.Element(ns + "content")?.Element(mns + "properties");
    if (element == null)
        return new Document("", "", null, "");
    var id = element.Element(dns + "Id")?.Value ?? "";
    var title = element.Element(dns + "Title")?.Value ?? "";
    var description = element.Element(dns + "Description")?.Value ?? "";
    var fields = element.Descendants().ToDictionary(el => el.Name.LocalName, el => (object)el.Value);
    int.TryParse(element.Element(dns + "FileSystemObjectType")?.Value ?? "-1", out int fileType);
    string docUrl = "";

    var url = GetUrlFromTitle(e, fileType == 0 ? "File" : "Folder");
    if (url != null)
    {
        var fileDoc = await GetResponseDocumentAsync(_sharepointSite + url);
        docUrl = fileDoc.Element(ns + "entry")?.
            Element(ns + "content")?.
            Element(mns + "properties")?.
            Element(dns + "ServerRelativeUrl")?.
            Value;
    }

    return new Document(id, title, fields, docUrl);
}

It parses the XML and extracts a document and its properties. GetUrlFromTitle gets the Url from the Title property and is:

private string GetUrlFromTitle(XElement element, string title)
{
    return element.Descendants(ns + "link")
            ?.FirstOrDefault(e1 => e1.Attribute("title")?.Value == title)
            ?.Attribute("href")?.Value;
}

The third access method is using the Soap service that Sharepoint makes available. This access method is listed as deprecated, but it’s still alive. You have to create a reference to the http://<website>/_vti_bin/Lists.asmx and create a WCF client for it. I preferred to create a .NET 2.0 Client instead of a WCF service, as I found easier to authenticate with this service.

In Visual Studio, right-click the References node and select the Add Service Reference. Then, click on the Advanced button and then, Add Web Reference. Put the url in the box and click the arrow button:

When you click the Add Reference button, the reference will be added to the project and it can be used. Create a new class in the Repository folder and name it SoapListRepository. Make the class implement the IListRepository interface. The GetListsAsync method will be:

XNamespace ns = "http://schemas.microsoft.com/sharepoint/soap/";
private Lists _proxy;

public async Task<List<DocumentsList>> GetListsAsync()
{
    var tcs = new TaskCompletionSource<XmlNode>();
    _proxy = new Lists
    {
        Url = _address,
        UseDefaultCredentials = true
    };
    _proxy.GetListCollectionCompleted += ProxyGetListCollectionCompleted;
    _proxy.GetListCollectionAsync(tcs);
    XmlNode response;
    try
    {
        response = await tcs.Task;
    }
    finally
    {
        _proxy.GetListCollectionCompleted -= ProxyGetListCollectionCompleted;
    }

    var list = XElement.Parse(response.OuterXml);
    var result = list?.Descendants(ns + "List")
        ?.Where(e => e.Attribute("Hidden").Value == "False")
        ?.Select(e => new DocumentsList(e.Attribute("Title").Value,
        e.Attribute("Description").Value)).ToList();
    return result;
}

private void ProxyGetListCollectionCompleted(object sender, GetListCollectionCompletedEventArgs e)
{
    var tcs = (TaskCompletionSource<XmlNode>)e.UserState;
    if (e.Cancelled)
    {
        tcs.TrySetCanceled();
    }
    else if (e.Error != null)
    {
        tcs.TrySetException(e.Error);
    }
    else
    {
        tcs.TrySetResult(e.Result);
    }
}

As we are using the .NET 2.0 web service, in order to convert the call to an asynchronous method, we must use a TaskCompletionSource to detect when the call to the service returns. Then we fire the call to the service. When it returns, the completed event is called and sets the TaskCompletionSource to the desired state: cancelled, if the call was cancelled, exception, if there was an error or set the result if the call succeeds. Then, we remove the event handler for the completed event and process the result (a XmlNode), to transform into a list of DocumentsList.

The call to GetDocumentsFromListAsync is very similar to GetListsAsync:

XNamespace rs = "urn:schemas-microsoft-com:rowset";
XNamespace z = "#RowsetSchema";

public async Task<List<Document>> GetDocumentsFromListAsync(string title)
{
    var tcs = new TaskCompletionSource<XmlNode>();
    _proxy = new Lists
    {
        Url = _address,
        UseDefaultCredentials = true
    };
    _proxy.GetListItemsCompleted += ProxyGetListItemsCompleted;
    _proxy.GetListItemsAsync(title, "", null, null, "", null, "", tcs);
    XmlNode response;
    try
    {
        response = await tcs.Task;
    }
    finally
    {
        _proxy.GetListItemsCompleted -= ProxyGetListItemsCompleted;
    }

    var list = XElement.Parse(response.OuterXml);

    var result = list?.Element(rs + "data").Descendants(z + "row")
        ?.Select(e => new Document(e.Attribute("ows_ID")?.Value,
        e.Attribute("ows_LinkFilename")?.Value, AttributesToDictionary(e),
        e.Attribute("ows_FileRef")?.Value)).ToList();
    return result;
}

private Dictionary<string, object> AttributesToDictionary(XElement e)
{
    return e.Attributes().ToDictionary(a => a.Name.ToString().Replace("ows_", ""), a => (object)a.Value);
}

private void ProxyGetListItemsCompleted(object sender, GetListItemsCompletedEventArgs e)
{
    var tcs = (TaskCompletionSource<XmlNode>)e.UserState;
    if (e.Cancelled)
    {
        tcs.TrySetCanceled();
    }
    else if (e.Error != null)
    {
        tcs.TrySetException(e.Error);
    }
    else
    {
        tcs.TrySetResult(e.Result);
    }
}

The main difference is the processing of the response, to get the documents list. Once you have the two methods in place, the only thing to do is select the correct repository in MainViewModel. For that, we create an enum for the API Selection:

public enum ApiSelection
{
    NetApi,
    Rest,
    Soap
};

Then, we need to declare a command bound to the radiobuttons, that will receive a string with the enum value:

public ICommand ApiSelectCommand =>
    _apiSelectCommand ?? (_apiSelectCommand = new RelayCommand<string>(s => SelectApi(s)));

private void SelectApi(string s)
{
    _selectedApi = (ApiSelection)Enum.Parse(typeof(ApiSelection), s, true);
    GoToAddress();
}

The last step is to select the repository in the GoToAddress method:

private async void GoToAddress()
{
    var sw = new Stopwatch();
    sw.Start();
    _listRepository = _selectedApi == ApiSelection.Rest ?
        (IListRepository)new RestListRepository(Address) :
        _selectedApi == ApiSelection.NetApi ?
        (IListRepository)new CsomListRepository(Address) :
        new SoapListRepository(Address);

    DocumentsLists = await _listRepository.GetListsAsync();
    ListTiming = $"Time to get lists: {sw.ElapsedMilliseconds}";
    ItemTiming = "";
}

With the code in place, you can run the app and see the data shown for each API.

One last change to the program is to add a command bound to the Go button, so you can change the address of the web site and get the lists and documents for the new site:

public ICommand GoCommand =>
            _goCommand ?? (_goCommand = new RelayCommand(GoToAddress, () => !string.IsNullOrEmpty(Address)));

This command has an extra touch: it will only enable the button if there is an address in the address box. If it’s empty, the button will be disabled. Now you can run the program, change the address of the website, and get the lists for the new website.

Conclusions

As you can see, we’ve created a WPF program that uses the MVVM pattern and accesses Sharepoint data using three different methods – it even has a time measuring feature, so you can check the performance difference and choose the right one for your case.

The full source code for this project is at https://github.com/bsonnino/SharepointAccess

One book that I recommend the reading is Clean Code, by Robert Martin. It is a well written book with wonderful techniques to create better code and improve your current programs, so they become easier to read, maintain and understand.

While going through it again, I found an excellent opportunity to improve my skills trying to do some refactoring: in listing 4.7 there is a prime generator function that he uses to show some refactoring concepts and turn int listing 4.8. I then thought do do the same and show my results here.

We can start with the listing converted to C#. This is a very easy task. The original program is written in  Java, but converting it to C# is just a matter of one or two small fixes:

using System;

namespace PrimeNumbers
{
/**
* This class Generates prime numbers up to a user specified
* maximum. The algorithm used is the Sieve of Eratosthenes.
* <p>
* Eratosthenes of Cyrene, b. c. 276 BC, Cyrene, Libya --
* d. c. 194, Alexandria. The first man to calculate the
* circumference of the Earth. Also known for working on
* calendars with leap years and ran the library at Alexandria.
* <p>
* The algorithm is quite simple. Given an array of integers
* starting at 2. Cross out all multiples of 2. Find the next
* uncrossed integer, and cross out all of its multiples.
* Repeat untilyou have passed the square root of the maximum
* value.
*
* @author Alphonse
* @version 13 Feb 2002 atp
*/
    public class GeneratePrimes
    {
        /**
        * @param maxValue is the generation limit.
*/
        public static int[] generatePrimes(int maxValue)
        {
            if (maxValue >= 2) // the only valid case
            {
                // declarations
                int s = maxValue + 1; // size of array
                bool[] f = new bool[s];
                int i;

                // initialize array to true.
                for (i = 0; i < s; i++)
                    f[i] = true;
                // get rid of known non-primes
                f[0] = f[1] = false;
                // sieve
                int j;
                for (i = 2; i < Math.Sqrt(s) + 1; i++)
                {
                    if (f[i]) // if i is uncrossed, cross its multiples.
                    {
                        for (j = 2 * i; j < s; j += i)
                            f[j] = false; // multiple is not prime
                    }
                }
                // how many primes are there?
                int count = 0;
                for (i = 0; i < s; i++)
                {
                    if (f[i])
                        count++; // bump count.
                }
                int[] primes = new int[count];
                // move the primes into the result
                for (i = 0, j = 0; i < s; i++)
                {
                    if (f[i]) // if prime
                        primes[j++] = i;
                }
                return primes; // return the primes
            }
            else // maxValue < 2
                return new int[0]; // return null array if bad input.
        }
    }
}

The first step is to put in place some tests, so we can be sure that we are not breaking anything while refactoring the code. In the solution, I added a new Class Library project, named it GeneratePrimes.Tests and added the packages NUnit, NUnit3TestAdapter and FluentAssertions to get fluent assertions in a NUnit test project. Then I added these tests:

using NUnit.Framework;
using FluentAssertions;

namespace PrimeNumbers.Tests
{
    [TestFixture]
    public class GeneratePrimesTests
    {
        [Test]
        public void GeneratePrimes0ReturnsEmptyArray()
        {
            var actual = GeneratePrimes.generatePrimes(0);
            actual.Should().BeEmpty();
        }

        [Test]
        public void GeneratePrimes1ReturnsEmptyArray()
        {
            var actual = GeneratePrimes.generatePrimes(1);
            actual.Should().BeEmpty();
        }

        [Test]
        public void GeneratePrimes2ReturnsArrayWith2()
        {
            var actual = GeneratePrimes.generatePrimes(2);
            actual.Should().BeEquivalentTo(new[] { 2 });
        }

        [Test]
        public void GeneratePrimes10ReturnsArray()
        {
            var actual = GeneratePrimes.generatePrimes(10);
            actual.Should().BeEquivalentTo(new[] { 2,3,5,7 });
        }

        [Test]
        public void GeneratePrimes10000ReturnsArray()
        {
            var actual = GeneratePrimes.generatePrimes(10000);
            actual.Should().HaveCount(1229).And.EndWith(9973);
        }
    }
}

These tests check that there are no primes for 0 and 1, one prime for 2, the primes for 10 are 2, 3, 5, 7 and that there are 1229 primes less than 10,000 and the largest one is 9973. Once we run the tests, we can see that the pass and we can start doing our changes.

The easiest fix we can do is to revise the comments at the beginning. We don’t need the history of Erasthotenes (you can go to Wikipedia for that). We don’t need the author and version, thanks to source control technology :-). We don’t need either the initial comment:

/**
    * This class Generates prime numbers up to a user specified
    * maximum. The algorithm used is the Sieve of Eratosthenes.
    *  https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes   
*/
public class GeneratePrimes
{
    public static int[] generatePrimes(int maxValue)

Then we can invert the initial test, to reduce nesting. If we hover the mouse in the line of the first if, an arrow appears at the border, indicating a quick fix:

We can do the quick fix, then eliminate the else clause (don’t forget to remove the extra comments that are not needed):

public static int[] generatePrimes(int maxValue)
{
    if (maxValue < 2) 
        return new int[0]; 

    // declarations
    int s = maxValue + 1; // size of array
    bool[] f = new bool[s];
    int i;

Save the code and check that all tests pass. The next step is to rename the variables:

  • s can be renamed to sizeOfArray
  • f can be renamed as isPrimeArray

Go to the declaration of s and press Ctrl-R-R to rename and rename it to sizeOfArray. Do the same with the f variable. Don’t forget to remove the comments (and to run the tests):

int sizeOfArray = maxValue + 1; 
bool[] isPrimeArray = new bool[sizeOfArray];
int i;

To go to the next refactorings, we can use the comments as indicators for extracting methods. We can extract the InitializeArray method:

The extracted code isn’t what I expected, so I change it to:

private static bool[] InitializeArray(int sizeOfArray)
{
    bool[] isPrimeArray = new bool[sizeOfArray];
    // initialize array to true.
    for (var i = 0; i < sizeOfArray; i++)
        isPrimeArray[i] = true;
    return isPrimeArray;
}

I can use the code like this:

var isPrimeArray = InitializeArray(sizeOfArray);

After passing the tests, I can refactor the code of InitializeArray to:

private static bool[] InitializeArray(int sizeOfArray)
{
    return Enumerable
        .Range(0, sizeOfArray)
        .Select(n => true)
        .ToArray();
}

The next step is the sieve:

The code for the sieve is really bad:

private static void Sieve(int sizeOfArray, bool[] isPrimeArray, 
    out int i, out int j)
{
    // get rid of known non-primes
    isPrimeArray[0] = isPrimeArray[1] = false;
    for (i = 2; i < Math.Sqrt(sizeOfArray) + 1; i++)
    {
        if (isPrimeArray[i]) // if i is uncrossed, cross its multiples.
        {
            for (j = 2 * i; j < sizeOfArray; j += i)
                isPrimeArray[j] = false; // multiple is not prime
        }
    }
}

It has two out parameters (which, for me, is a code smell), and has an error (the out parameter j must be assigned) before exiting the method. So we can change it to remove the out parameters and remove the sizeOfArray parameter:

private static void Sieve(bool[] isPrimeArray)
{
    var sizeOfArray = isPrimeArray.Length;

    isPrimeArray[0] = isPrimeArray[1] = false;

    for (int i = 2; i < Math.Sqrt(sizeOfArray) + 1; i++)
    {
        if (isPrimeArray[i]) // if i is uncrossed, cross its multiples.
        {
            for (int j = 2 * i; j < sizeOfArray; j += i)
                isPrimeArray[j] = false; 
        }
    }

Then, we can extract the method to count primes:

CountPrimes has the same flaws as Sieve, so we change it to:

private static int CountPrimes(bool[] isPrimeArray)
{
    var sizeOfArray = isPrimeArray.Length;
    var count = 0;
    for (var i = 0; i < sizeOfArray; i++)
    {
        if (isPrimeArray[i])
            count++; 
    }
    return count;
}

We can refactor it to:

private static int CountPrimes(bool[] isPrimeArray) => 
    isPrimeArray.Count(i => i);

The next step is MovePrimes:

After we tweak the MovePrimes code, we get:

private static int[] MovePrimes(bool[] isPrimeArray, int count)
{
    var sizeOfArray = isPrimeArray.Length;
    var primes = new int[count];
    for (int i = 0, j = 0; i < sizeOfArray; i++)
    {
        if (isPrimeArray[i]) // if prime
            primes[j++] = i;
    }
    return primes;
}

Then we can refactor MovePrimes:

 private static int[] MovePrimes(bool[] isPrimeArray, int count) =>
     isPrimeArray
         .Select((p, i) => new { Index = i, IsPrime = p })
         .Where(v => v.IsPrime)
         .Select(v => v.Index)
         .ToArray();

Notice that we aren’t using the primes count in this case, so we can remove the calculation of the count and the parameter. After some cleaning and name changing, we get:

public static int[] GetPrimes(int maxValue)
{
    if (maxValue < 2)
        return new int[0];

    bool[] isPrimeArray = InitializeArray(maxValue);
    Sieve(isPrimeArray);
    return MovePrimes(isPrimeArray);
}

Much cleaner, no? Now, it’s easier to read the method, the details are hidden, but the code still runs the same way. We have a more maintainable method, and it shows clearly what it does.

But there is a change we can do here: we are using static methods only. We can then use extension methods and add the keyword this to allow the methods to be used as extension methods. For example, if we change MovePrimes and Sieve to:

private static int[] MovePrimes(this bool[] isPrimeArray) =>
    isPrimeArray
        .Select((p, i) => new { Index = i, IsPrime = p })
        .Where(v => v.IsPrime)
        .Select(v => v.Index)
        .ToArray();

private static bool[] Sieve(this bool[] isPrimeArray)
{
    var sizeOfArray = isPrimeArray.Length;

    isPrimeArray[0] = isPrimeArray[1] = false;

    for (int i = 2; i < Math.Sqrt(sizeOfArray) + 1; i++)
    {
        if (isPrimeArray[i]) // if i is uncrossed, cross its multiples.
        {
            for (int j = 2 * i; j < sizeOfArray; j += i)
                isPrimeArray[j] = false;
        }
    }
    return isPrimeArray;

We can have the GetPrimes method to be changed to:

public static int[] PrimesSmallerOrEqual(this int maxValue)
{
    if (maxValue < 2)
        return new int[0];

    return maxValue.InitializeArray()
        .Sieve()
        .MovePrimes();
}

Cool, no? With this change, the tests become:

public class GeneratePrimesTests
{
    [Test]
    public void GeneratePrimes0ReturnsEmptyArray()
    {
        0.PrimesSmallerOrEqual().Should().BeEmpty();
    }

    [Test]
    public void GeneratePrimes1ReturnsEmptyArray()
    {
        1.PrimesSmallerOrEqual().Should().BeEmpty();
    }

    [Test]
    public void GeneratePrimes2ReturnsArrayWith2()
    {
        2.PrimesSmallerOrEqual()
            .Should().BeEquivalentTo(new[] { 2 });
    }

    [Test]
    public void GeneratePrimes10ReturnsArray()
    {
        10.PrimesSmallerOrEqual()
            .Should().BeEquivalentTo(new[] { 2, 3, 5, 7 });
    }

    [Test]
    public void GeneratePrimes10000ReturnsArray()
    {
        10000.PrimesSmallerOrEqual()
            .Should().HaveCount(1229).And.EndWith(9973);
    }
}

The full code is at https://github.com/bsonnino/PrimeNumbers. Each commit there is a phase of the refactoring.

Sometimes, when we open an Explorer window in the main computer, we see red bars in some disks, telling us that the disk is almost full and that we need to do some cleanup. We call the system cleanup, that removes some unused space, but this isn’t enough to  make things better.

So, we try to find the duplicate files in the disk to remove some extra space, but we have a problem: where are the duplicate files? The first answer is to check the files with the same name and size, but that isn’t enough – files can be renamed, and still be duplicates.

So, the best thing to do is to find a way to find and list all duplicates in the disk. But how can we do this?

The naive approach is to get all files with the same size and compare them one with the other. But this is really cumbersome, because if there are 100 files in the group, there will be 100!/(2!*98!) = 100*99/2 = 4950 comparisons and has a complexity of O(n^2).

One other approach is to get a checksum of the file and compare checksums. That way, you will still have the O(n^2) complexity, but you’ll have less data to compare (but you will have to compute the time to calculate the checksums). A third approach would be to use a dictionary to group the files with the same hash. The search in a dictionary has a O(1) complexity, so this would do a O(n) complexity.

Now, we only have to choose the checksum. Every checksum has a number of bits and, roughly, the larger the number of bits, the longer it takes to compute it. But the larger number of bits make it more difficult to get wrong results: if you are using CRC16 checksum (16 bits), you will have 65,535 combinations and the probability of two different files have the same checksum is very large. CRC32 allows 2,147,483,647 combinations and, thus, is more difficult to have a wrong result. You can use other algorithms, like MD5 (128 bits), SHA1 (196 bits) or SHA256 (256 bits), but computing these will be way longer than computing the CRC32 bits. As we are not seeking for huge accuracy, but for speed, we’ll use the CRC32 algorithm to compute the hashes. A fast implementation of this algorithm can be found here , and you can use it by installing the CRC32C.NET NuGet package.

From there, we can create our program to find and list the duplicates in the disk. In Visual Studio, create a new WPF application. In the Solution Explorer, right-click on the references node and select the WpfFolderBrowser and Crc32C.NET packages. Then add this code in MainWindow.xaml:

<Grid>
    <Grid.RowDefinitions>
        <RowDefinition Height="40"/>
        <RowDefinition Height="*"/>
    </Grid.RowDefinitions>
    <Button Width="85" Height="30" Content="Start" Click="StartClick"
                HorizontalAlignment="Right" Margin="5" Grid.Row="0"/>
    <Grid Grid.Row="1">
        <Grid.RowDefinitions>
            <RowDefinition Height="*"/>
            <RowDefinition Height="30"/>
        </Grid.RowDefinitions>
        <ScrollViewer HorizontalScrollBarVisibility="Disabled">
        <ItemsControl x:Name="FilesList" HorizontalContentAlignment="Stretch">
            <ItemsControl.ItemTemplate>
                <DataTemplate>
                    <Grid HorizontalAlignment="Stretch">
                        <Grid.RowDefinitions>
                            <RowDefinition Height="30" />
                            <RowDefinition Height="Auto" />
                        </Grid.RowDefinitions>
                        <TextBlock Text="{Binding Value[0].Length, StringFormat=N0}"
                                   Margin="5" FontWeight="Bold"/>
                        <TextBlock Text="{Binding Key, StringFormat=X}"
                                   Margin="5" FontWeight="Bold" HorizontalAlignment="Right"/>
                        <ItemsControl ItemsSource="{Binding Value}" Grid.Row="1" 
                                      HorizontalAlignment="Stretch"
                                      ScrollViewer.HorizontalScrollBarVisibility="Disabled"
                                      Background="Aquamarine">
                            <ItemsControl.ItemTemplate>
                                <DataTemplate>
                                    <TextBlock Text="{Binding FullName}" Margin="15,0"  />
                                </DataTemplate>
                            </ItemsControl.ItemTemplate>
                        </ItemsControl>
                    </Grid>
                </DataTemplate>
            </ItemsControl.ItemTemplate>
        </ItemsControl>
        </ScrollViewer>
        <StackPanel Grid.Row="1" Orientation="Horizontal">
            <TextBlock x:Name="TotalFilesText" Margin="5,0" VerticalAlignment="Center"/>
            <TextBlock x:Name="LengthFilesText" Margin="5,0" VerticalAlignment="Center"/>
        </StackPanel>
    </Grid>
</Grid>

In the button’s click event handler, we will open a Folder browser dialog and, if the user selects a folder, we will process it, enumerating the files and  finding the ones that have the same size. Then, we calculate the Crc32 for these files and add them to a dictionary, grouped by hash:

private async void StartClick(object sender, RoutedEventArgs e)
{
    var fbd = new WPFFolderBrowserDialog();
    if (fbd.ShowDialog() != true)
        return;
    FilesList.ItemsSource = null;
    var selectedPath = fbd.FileName;

    var files = await GetPossibleDuplicatesAsync(selectedPath);
     FilesList.ItemsSource = await GetRealDuplicatesAsync(files);
}

The GetPossibleDuplicatesAsync will enumerate the files and group them by size, returning only the groups that have more than one file:

private async Task<List<IGrouping<long, FileInfo>>> GetPossibleDuplicates(string selectedPath)
{
    List<IGrouping<long, FileInfo>> files = null;
    await Task.Factory.StartNew(() =>
    {
        files = GetFilesInDirectory(selectedPath)
                       .OrderByDescending(f => f.Length)
                         .GroupBy(f => f.Length)
                         .Where(g => g.Count() > 1)
                         .ToList();
    });
    return files;
}

GetFilesInDirectory enumerates the files in the selected directory:

private List<FileInfo> GetFilesInDirectory(string directory)
{
    var files = new List<FileInfo>();
    try
    {
        var directories = Directory.GetDirectories(directory);
        try
        {
            var di = new DirectoryInfo(directory);
            files.AddRange(di.GetFiles("*"));
        }
        catch
        {
        }
        foreach (var dir in directories)
        {
            files.AddRange(GetFilesInDirectory(System.IO.Path.Combine(directory, dir)));
        }
    }
    catch
    {
    }

    return files;
}

After we have the duplicate files grouped, we can search the real duplicates with GetRealDuplicatesAsync:

private static async Task<Dictionary<uint,List<FileInfo>>> GetRealDuplicatesAsync(
    List<IGrouping<long, FileInfo>> files)
{
    var dictFiles = new Dictionary<uint, List<FileInfo>>();
    await Task.Factory.StartNew(() =>
    {
        foreach (var file in files.SelectMany(g => g))
        {
            var hash = GetCrc32FromFile(file.FullName);
            if (hash == 0)
                continue;
            if (dictFiles.ContainsKey(hash))
                dictFiles[hash].Add(file);
            else
                dictFiles.Add(hash, new List<FileInfo>(new[] { file }));
        }
    });
    return dictFiles.Where(p => p.Value.Count > 1).ToDictionary(p => p.Key, p => p.Value);
}

The GetCrc32FromFile method with use the Crc32C library to compute the Crc32 hash from the file. Note that we can’t compute the hash in one pass, by reading the whole file, as this will fail with files with more than 2Gb. So, we read chunks of 10,000 bytes and process them.

public static uint GetCrc32FromFile(string fileName)
{
    try
    {
        using (FileStream file = new FileStream(fileName, FileMode.Open))
        {
            const int NumBytes = 10000;
            var bytes = new byte[NumBytes];
            var numRead = file.Read(bytes, 0, NumBytes);
            if (numRead == 0)
                return 0;
            var crc = Crc32CAlgorithm.Compute(bytes, 0, numRead);
            while (numRead > 0)
            {
                numRead = file.Read(bytes, 0, NumBytes);
                if (numRead > 0)
                    Crc32CAlgorithm.Append(crc, bytes, 0, numRead);
            }
            return crc;
        }
    }
    catch (Exception ex) when (ex is UnauthorizedAccessException || ex is IOException)
    {
        return 0;
    }
}

Now, when you run the app, you will get something like this:

You can then verify the files you want to remove and then go to Explorer and remove them. But there is one thing to do here: the time to compute the hash is very large, especially if you have a lot of data to process (large files, large number of files or both). Could it be improved?

This issue is somewhat complicated to solve. Fortunately, .NET provide us with an excellent tool to improve performance in this case: Parallel programming. By making a small change in the code, you can calculate the CRC of the files in parallel, thus improving the performance. But there is a catch: we are using classes that are not thread safe. If you use the common Dictionary and List to store the data, you will end up with wrong results. But, once again, .NET comes to rescue us: it provides the ConcurrentDictionary and ConcurrentBag to replace the common classes, so we can store the data in a thread safe way. We can then change the code to this:

private static async Task<Dictionary<uint, List<FileInfo>>> GetRealDuplicatesAsync(
    List<IGrouping<long, FileInfo>> files)
{
    var dictFiles = new ConcurrentDictionary<uint, ConcurrentBag<FileInfo>>();
    await Task.Factory.StartNew(() =>
    {
        Parallel.ForEach(files.SelectMany(g => g), file =>
        {
            var hash = GetCrc32FromFile(file.FullName);
            if (hash != 0)
            {
                if (dictFiles.ContainsKey(hash))
                    dictFiles[hash].Add(file);
                else
                    dictFiles.TryAdd(hash, new ConcurrentBag<FileInfo>(new[] { file }));
            }
        });
    });
    return dictFiles.Where(p => p.Value.Count > 1)
        .OrderByDescending(p => p.Value.First().Length)
        .ToDictionary(p => p.Key, p => p.Value.ToList());
}

When we do that and run our program again, we will see that more CPU is used for the processing and the times to get the list come to 46 seconds from 78 seconds (for 18GB of duplicate files).

Conclusions

With this program, we can show the largest duplicates in a folder and see what can be safely deleted in our disk, thus retrieving some space (in our case, we would have potentially got 9Gb extra). We’ve done some optimization in the code by parallelizing the calculations using the parallel extensions in .NET.

The source code for this article is in https://github.com/bsonnino/FindDuplicates

Some time ago I wrote a post about converting a WPF application into .NET Core. One thing that called my attention in this Build 2019 talk was that the performance for file enumerations was enhanced in the .NET core apps. So I decided to check this with my own app and see what happens in my machine.

I added some measuring data in the app, so I could see what happens there:

private async void StartClick(object sender, RoutedEventArgs e)
{
    var fbd = new WPFFolderBrowserDialog();
    if (fbd.ShowDialog() != true)
        return;
    FilesList.ItemsSource = null;
    ExtList.ItemsSource = null;
    ExtSeries.ItemsSource = null;
    AbcList.ItemsSource = null;
    AbcSeries.ItemsSource = null;
    var selectedPath = fbd.FileName;
    Int64 minSize;
    if (!Int64.TryParse(MinSizeBox.Text, out minSize))
        return;
    List<FileInfo> files = null;
    var sw = new Stopwatch();
    var timeStr = "";
    await Task.Factory.StartNew(() =>
    {
       sw.Start();
       files = GetFilesInDirectory(selectedPath).ToList();
       timeStr = $" {sw.ElapsedMilliseconds} for enumeration";
       sw.Restart();
       files = files.Where(f => f.Length >= minSize)
         .OrderByDescending(f => f.Length)
         .ToList();
       timeStr += $" {sw.ElapsedMilliseconds} for ordering and filtering";
    });
    var totalSize = files.Sum(f => f.Length);
    TotalFilesText.Text = $"# Files: {files.Count}";
    LengthFilesText.Text = $"({totalSize:N0} bytes)";
    sw.Restart();
    FilesList.ItemsSource = files;
    var extensions = files.GroupBy(f => f.Extension)
        .Select(g => new { Extension = g.Key, Quantity = g.Count(), Size = g.Sum(f => f.Length) })
        .OrderByDescending(t => t.Size).ToList();
    ExtList.ItemsSource = extensions;
    ExtSeries.ItemsSource = extensions;
    var tmp = 0.0;
    var abcData = files.Select(f =>
    {
        tmp += f.Length;
        return new { f.Name, Percent = tmp / totalSize * 100 };
    }).ToList();
    AbcList.ItemsSource = abcData;
    AbcSeries.ItemsSource = abcData.OrderBy(d => d.Percent).Select((d, i) => new { Item = i, d.Percent });
    timeStr += $"  {sw.ElapsedMilliseconds} to fill data";
    TimesText.Text = timeStr;
}

That way, I could measure two things: the time to enumerate the files and the times to sort, filter and assign the files to the lists. Then, I run the two programs, to see what happened.

The machine I’ve run is a Virtual machine with a Core I5 and 4 virtual processors and a virtualized hard disk, with 12,230 files (93.13 GB of data). The measures may vary on your machine, but the differences should be comparable. To avoid bias, I ran 3 times each program (in Admin mode), then rebooted and run the other one.

Here are the results I’ve got:

Run Enumerate Sort/Filter Assign
.NET
1 137031 96 43
2 58828 56 9
3 59474 55 8
Avg 85111 69 20
.NET Core
1 91105 120 32
2 33422 90 14
3 32907 87 20
Avg 52478 99 22

 

As you can see by the numbers, the .NET Core application improved a lot the times for file enumeration, but still lacks some effort for sorting/filtering and assigning data to the UI lists. But that’s not bad for a platform still in preview!

If you do some testingfor the performance, I’m curious to see what you’ve got, you can put your results and comments in the Comments section.

 

One thing that has been recently announced by Microsoft is the availability of .NET Core 3. With it, you will be able to create WPF and Winforms apps with .NET Core. And one extra bonus is that both WPF and Winforms are being open sourced. You can check these in https://github.com/dotnet/wpf and https://github.com/dotnet/winforms.

The first step to create a .NET Core WPF program is to download the .NET Core 3.0 preview from https://dotnet.microsoft.com/download/dotnet-core/3.0. Once you have it installed, you can check that it was installed correctly by open a Command Line window and typing dotnet –info and seeing the installed version:

:

With that in place, you can change the current folder to a new folder and type

dotnet new wpf
dotnet run

This will create a new .NET Core 3.0 WPF project and will compile and run it. You should get something like this:

If you click on the Exit button, the application exits. If you take a look at the folder, you will see that it generated the WPF project file, App.xaml and App.xaml.cs, MainWindow.xaml and MainWindow.xaml.cs. The easiest way to edit these files is to use Visual Studio Code. Just open Visual Studio Code and go to menu File/Open Folder and open the folder for the project. There you will see the project files and will be able to run and debug your code:

A big difference can be noted in the csproj file. If you open it, you will see something like this:

<Project Sdk="Microsoft.NET.Sdk.WindowsDesktop">

  <PropertyGroup>
    <OutputType>WinExe</OutputType>
    <TargetFramework>netcoreapp3.0</TargetFramework>
    <UseWPF>true</UseWPF>
  </PropertyGroup>

</Project>

That’s very simple and there’s nothing else in the project file. There are some differences between this project and other types of .NET Core, like the console one:

  • The output type is WinExe, and not Exe, in the console app
  • The UseWPF clause is there and it’s set to true

Now, you can modify and run the project inside VS Code. Modify MainWindow.xaml and put this code in it:

<Window x:Class="DotNetCoreWPF.MainWindow" 
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" 
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml" 
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008" 
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" 
    xmlns:local="clr-namespace:DotNetCoreWPF" mc:Ignorable="d" Title="MainWindow" Height="450" Width="800">
    <Grid>
        <Grid.RowDefinitions>
            <RowDefinition Height="*"/>
            <RowDefinition Height="40"/>
        </Grid.RowDefinitions>
        <Grid>
            <Grid.RowDefinitions>
                <RowDefinition Height="40"/>
                <RowDefinition Height="40"/>
                <RowDefinition Height="40"/>
                <RowDefinition Height="40"/>
                <RowDefinition Height="40"/>
                <RowDefinition Height="40"/>
            </Grid.RowDefinitions>
            <Grid.ColumnDefinitions>
                <ColumnDefinition Width="*"/>
                <ColumnDefinition Width="2*"/>
            </Grid.ColumnDefinitions>

            <TextBlock Text="Id"      Grid.Column="0" Grid.Row="0" Margin="5" VerticalAlignment="Center"/>
            <TextBlock Text="Name"    Grid.Column="0" Grid.Row="1" Margin="5" VerticalAlignment="Center"/>
            <TextBlock Text="Address" Grid.Column="0" Grid.Row="2" Margin="5" VerticalAlignment="Center"/>
            <TextBlock Text="City"    Grid.Column="0" Grid.Row="3" Margin="5" VerticalAlignment="Center"/>
            <TextBlock Text="Email"   Grid.Column="0" Grid.Row="4" Margin="5" VerticalAlignment="Center"/>
            <TextBlock Text="Phone"   Grid.Column="0" Grid.Row="5" Margin="5" VerticalAlignment="Center"/>
            <TextBox Grid.Column="1" Grid.Row="0" Margin="5"/>
            <TextBox Grid.Column="1" Grid.Row="1" Margin="5"/>
            <TextBox Grid.Column="1" Grid.Row="2" Margin="5"/>
            <TextBox Grid.Column="1" Grid.Row="3" Margin="5"/>
            <TextBox Grid.Column="1" Grid.Row="4" Margin="5"/>
            <TextBox Grid.Column="1" Grid.Row="5" Margin="5"/>
        </Grid>
        <Button Content="Submit" Width="65" Height="35" Grid.Row="1" HorizontalAlignment="Right" VerticalAlignment="Center" Margin="5,0"/>
    </Grid>
</Window>

Now, you can compile and run the app in VS Code with F5, and you will get something like this:

If you don’t want to use Visual Studio Code, you can edit your project in Visual Studio 2019. The first preview still doesn’t have a visual editor for the XAML file, but you can edit the XAML file in the editor, it will work fine.

Porting a WPF project to .NET Core

To port a WPF project to .NET Core, you should run the Portability Analyzer tool first, to see what problems you will find before porting it to .NET Core. This tool can be found here. You can download it and run on your current application, and check what APIs that are not portable.

I will be porting my DiskAnalisys project. This is a simple project, that uses the File.IO functions to enumerate the files in a folder and uses two NuGet packages to add a Folder Browser and Charts to WPF. The first step is to run the portability analysis on it. Run the PortabilityAnalizer app and point it to the folder where the executable is located:

When you click on the Analyze button, it will analyze the executable and generate an Excel spreadsheet with the results:

As you can see, all the code is compatible with .NET Core 3.0. So, let’s port it to .NET Core 3.0. I will show you three ways to do it: creating a new project, updating the .csproj file and using a tool.

Upgrading by Creating a new project

This way needs the most work, but it’s the simpler to fix. Just create a new folder and name it DiskAnalysisCorePrj. Then open a command line window and change the directory to the folder you’ve created. Then, type these commands:

dotnet new wpf
dotnet add package wpffolderbrowser
dotnet add package dotnetprojects.wpf.toolkit
dotnet run

These commands will create the WPF project, add the two required NuGet packages and run the default app. You may see a warning like this:

D:\Documentos\Artigos\Artigos\CSharp\WPFCore\DiskAnalysisCorePrj\DiskAnalysisCorePrj.csproj : warning NU1701: Package 'DotNetProjects.Wpf.Toolkit 5.0.43' was restored using '.NETFramework,Version=v4.6.1' instead of the project target framework '.NETCoreApp,Version=v3.0'. This package may not be fully compatible with your project.
D:\Documentos\Artigos\Artigos\CSharp\WPFCore\DiskAnalysisCorePrj\DiskAnalysisCorePrj.csproj : warning NU1701: Package 'WPFFolderBrowser 1.0.2' was restored using '.NETFramework,Version=v4.6.1' instead of the project target framework '.NETCoreApp,Version=v3.0'. This package may not be fully compatible with your project.
D:\Documentos\Artigos\Artigos\CSharp\WPFCore\DiskAnalysisCorePrj\DiskAnalysisCorePrj.csproj : warning NU1701: Package 'DotNetProjects.Wpf.Toolkit 5.0.43' was restored using '.NETFramework,Version=v4.6.1' instead of the project target framework '.NETCoreApp,Version=v3.0'. This package may not be fully compatible with your project.
D:\Documentos\Artigos\Artigos\CSharp\WPFCore\DiskAnalysisCorePrj\DiskAnalysisCorePrj.csproj : warning NU1701: Package 'WPFFolderBrowser 1.0.2' was restored using '.NETFramework,Version=v4.6.1' instead of the project target framework '.NETCoreApp,Version=v3.0'. This package may not be fully compatible with your project.

This means that the NuGet packages weren’t converted to .NET Core 3.0, but they are still usable (remember, the compatibility report showed 100% compatibility). Then, copy MainWindow.xaml and MainWindow.xaml.cs from the original folder to the new one. We don’t need to copy any other files, as no other files were changed. Then, type

dotnet run

and the program is executed:

Converting by Changing the .csproj file

This way is very simple, just changing the project file, but can be challenging, especially for very large projects. Just create a new folder and name it DiskAnalysisCoreCsp. Copy all files from the main folder of the original project (there’s no need of copying the Properties folder) and edit the .csproj file, changing it to:

<Project Sdk="Microsoft.NET.Sdk.WindowsDesktop">
  <PropertyGroup>
    <OutputType>WinExe</OutputType>
    <TargetFramework>netcoreapp3.0</TargetFramework>
    <UseWPF>true</UseWPF>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="dotnetprojects.wpf.toolkit" Version="5.0.43" />
    <PackageReference Include="wpffolderbrowser" Version="1.0.2" />
  </ItemGroup>
</Project>

Then, type

dotnet run

and the program is executed.

Converting using a tool

The third way is to use a tool to convert the project. You must install the conversion extension created by Brian Lagunas, from here. Then, open your WPF project in Visual Studio, right-click in the project and select “Convert Project to .NET Core 3”.

That’s all. You now have a NET Core 3 app. If you did that in Visual Studio 2017, you won’t be able to open the project, you will need to compile it with dotnet run, or open it in Visual Studio code.

Conclusions

As you can see, although this is the first preview of WPF .NET Core, it has a lot of work done, and you will be able to port most of your WPF projects to .NET Core.

As an MVP, I sometimes receive licenses to software from the vendors for my usage. Some of them become indispensable for me and I feel in the obligation to write a review (yes, it’s a biased review, as I really like the tool and use it on a daily basis :-)) as a way to say thank you!

One of these tools is Linqpad (https://www.linqpad.net/). It’s a simple tool, with a small footprint, but I have used it in so many ways that I find it incredible. There is a free version that has a lot of features to start, but I really recommend the paid version (if you have the $95 to spend, the Premium edition has even a debugger to debug your snippets).

Introduction

Once you open Linqpad, you will see a simple desktop like this:

At first, the name of the tool may indicate that this is a notepad for linq queries, but it’s much more than that! If you take a look at the Samples pane, you can see that there’s even an Interactive Regex Evaluator.

A closer look at that pane shows that you are not tied to C#: you can also use F# there. In fact, there is a full F# tutorial there. If you open the Language combo, you can see that you can use also VB or SQL queries.

My first usages in Linqpad were to learn Linq (the name is Linqpad, no?). At the beginning, Linq seems a little bit daunting, with all those extension methods and lambdas. So, I started to try some Linq queries, making them more difficult as my knowledge was improving. In Linqpad, you have three flavors of code: Expressions, where you have a single expression evaluated; Statements, where you have some statements evaluated and Program, where you can have a full program run in Linqpad (I use this when I want to run a console program and don’t want to open Visual Studio and create a new project).

In the Expression mode, you can enter a single expression, like this:

from i in Enumerable.Range(1,1000)
  where i % 2 == 0
  select i

If you run it, you will see the result in the Results pane:

As you can see, all the results are there, there is no need to open a console window or anything else. And, what’s better, you can export the results to Excel, Word or HTML. You can also use the other Linq format, the functional one:

Enumerable.Range(1,1000).Where(i => i %2 == 0)

After that, you can start tweaking your code and clicking on the Run button and observing the results. If you have the paid version, you also have Intellisense in the code, so you can check the syntax.

For example, to get the sum of the squares of the even numbers, we can do something like this:

If we have something more complicated than a single expression, we can run it using the C# statements. For example, to get all methods and parameters of the methods in the Directory class, we can use these statements:

var methodInfos = typeof(Directory).GetMethods(BindingFlags.Public | 
  BindingFlags.Static);

methodInfos.Select(m => new 
{
  m.Name, 
  Parameters = m.GetParameters() 
}).Dump();

You may have noticed something different in the code above: the Dump method. Linqpad adds this method to dump the values to the results pane. It is very powerful, you don’t need to know the type of the object, all the properties are shown there:

And you are not limited to old C#, you can also use C#7 features and even async programming. For example, this code (based on https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/async/walkthrough-accessing-the-web-by-using-async-and-await) will download asynchronously some pages from the web and will display their sizes:

async Task Main()
{
	await SumPageSizesAsync().Dump();
}

private async Task<List<string>> SumPageSizesAsync()
{
	var results = new List<string>();
	// Declare an HttpClient object and increase the buffer size. The
	// default buffer size is 65,536.
	HttpClient client =
		new HttpClient() { MaxResponseContentBufferSize = 1000000 };

	// Make a list of web addresses.
	List<string> urlList = SetUpURLList();

	var total = 0;

	foreach (var url in urlList)
	{
		// GetByteArrayAsync returns a task. At completion, the task
		// produces a byte array.
		byte[] urlContents = await client.GetByteArrayAsync(url);

		// The following two lines can replace the previous assignment statement.
		//Task<byte[]> getContentsTask = client.GetByteArrayAsync(url);
		//byte[] urlContents = await getContentsTask;

		results.Add(DisplayResults(url, urlContents));

		// Update the total.
		total += urlContents.Length;
	}

	// Display the total count for all of the websites.
	results.Add(
		$"\r\n\r\nTotal bytes returned:  {total}\r\n");
	return results;
}

private List<string> SetUpURLList()
{
	List<string> urls = new List<string>
			{
				"https://msdn.microsoft.com/library/windows/apps/br211380.aspx",
				"https://msdn.microsoft.com",
				"https://msdn.microsoft.com/library/hh290136.aspx",
				"https://msdn.microsoft.com/library/ee256749.aspx",
				"https://msdn.microsoft.com/library/hh290138.aspx",
				"https://msdn.microsoft.com/library/hh290140.aspx",
				"https://msdn.microsoft.com/library/dd470362.aspx",
				"https://msdn.microsoft.com/library/aa578028.aspx",
				"https://msdn.microsoft.com/library/ms404677.aspx",
				"https://msdn.microsoft.com/library/ff730837.aspx"
			};
	return urls;
}

private string DisplayResults(string url, byte[] content)
{
	// Display the length of each website. The string format
	// is designed to be used with a monospaced font, such as
	// Lucida Console or Global Monospace.
	var bytes = content.Length;
	// Strip off the "https://".
	var displayURL = url.Replace("https://", "");
	return $"\n{displayURL,-58} {bytes,8}";
}

When you run it, you will see something like this:

And you are not tied to the default C# libraries. If you have the Developer or Premium versions, you can download and use NuGet packages in your queries. For example in this previous article, I’ve shown how to use the Microsoft.SqlServer.TransactSql.ScriptDom package to parse your Sql Server code. You don’t even need to open Visual Studio for that. Just put this code in the Linqpad window:

static void Main()
{
	using (var con = new SqlConnection("Server=.;Database=WideWorldImporters;Trusted_Connection=True;"))
	{
		con.Open();
		var procTexts = GetStoredProcedures(con)
		  .Select(n => new { ProcName = n, Tree = ParseSql(GetProcText(con, n)) })
		  .Dump();
	}
}

private static List<string> GetStoredProcedures(SqlConnection con)
{
	using (SqlCommand sqlCommand = new SqlCommand("select s.name+'.'+p.name as name from sys.procedures p " +
	  "inner join sys.schemas s on p.schema_id = s.schema_id order by name", con))
	{
		using (DataTable procs = new DataTable())
		{
			procs.Load(sqlCommand.ExecuteReader());
			return procs.Rows.OfType<DataRow>().Select(r => r.Field<String>("name")).ToList();
		}
	}
}

private static string GetProcText(SqlConnection con, string procName)
{
	using (SqlCommand sqlCommand = new SqlCommand("sys.sp_helpText", con)
	{
		CommandType = CommandType.StoredProcedure
	})
	{
		sqlCommand.Parameters.AddWithValue("@objname", procName);
		using (var proc = new DataTable())
		{
			try
			{
				proc.Load(sqlCommand.ExecuteReader());
				return string.Join("", proc.Rows.OfType<DataRow>().Select(r => r.Field<string>("Text")));
			}
			catch (SqlException)
			{
				return null;
			}
		}
	}
}

private static (TSqlFragment sqlTree, IList<ParseError> errors) ParseSql(string procText)
{
	var parser = new TSql150Parser(true);
	using (var textReader = new StringReader(procText))
	{
		var sqlTree = parser.Parse(textReader, out var errors);

		return (sqlTree, errors);
	}
}

You will see some missing references. Just press F4 and it will open the following screen:

Click the Add NuGet button and add the Microsoft.SqlServer.TransactSql.ScriptDom package, then run the program. You will see something like this:

You can even click on the ScriptTokenStream result, to see the list of tokens in the procedure:

You can also simplify the query by using the connections available in Linqpad. Just go to the connections pane, add a new connection and point it to the WorldWideImporters database. Then select the connection in the connections combo and use this code:

void Main()
{
	ExecuteQuery<string>("select s.name+'.'+p.name as name from sys.procedures p " +
	  "inner join sys.schemas s on p.schema_id = s.schema_id order by name")
		  .Select(n => new 
		    { 
			  ProcName = n, 
			  Tree = ParseSql(ExecuteQuery<string>("exec sys.sp_helpText @objname={0}",n).FirstOrDefault()) 
			})
		  .Dump();
}

private static (TSqlFragment sqlTree, IList<ParseError> errors) ParseSql(string procText)
{
	var parser = new TSql150Parser(true);
	using (var textReader = new StringReader(procText))
	{
		var sqlTree = parser.Parse(textReader, out var errors);

		return (sqlTree, errors);
	}
}

You will see the same results. As you can see, you don’t even need to open the connection and create the command to run it. You can run your queries against your databases the same way you would do with any data. And if you are a SQL guy, you can run your queries directly using the SQL language. And, if you are brave and want to learn F#, you have here a really nice tool to learn.

Conclusions

At first, the size and appearance of Linqpad may fool you, but it’s a very nice tool to work, saving you a lot of time to try and debug your code. If you have some code snipped that you want to test and improve, this is the tool to use. And, one feature that I didn’t mention that’s invaluable when you are optimizing your code is the timing feature. After the execution of each query, Linqpad shows the execution time, so you can know how long did it take to execute it.

Some time ago, I’ve written this post about sending parametrized emails with C#. After some time, the emails weren’t arriving to their destination and I didn’t know why that happened – there was no message and the program was out in production, so I didn’t have an easy way to debug it.

Then, I went to the oracle that knows everything and asked “what is the easiest way to debug a C# program that sends emails in production?”. The oracle answered: “go to this page and you will find the answers you are looking for”. Once again, it was right.

Yes, the post pointed me to the easiest way to add debugging to your app that sends emails. There is not even the need to change the app. Just add a new clause to your app.config file and that’s it. Your app will generate a log file with all the data sent and received. The clause to add is system.diagnostics and you should add the traces you want and send them to a listener:

<system.diagnostics>
  <trace autoflush="true" />
  <sources>
    <source name="System.Net" >
      <listeners>
        <add name="MyTraceFile"/>
      </listeners>
    </source>
    <source name="System.Net.Sockets">
      <listeners>
        <add name="MyTraceFile"/>
      </listeners>
    </source>
  </sources>
  <sharedListeners>
    <add  name="MyTraceFile" type="System.Diagnostics.TextWriterTraceListener" 
          initializeData="System.Net.trace.log"   />
  </sharedListeners>
  <switches>
    <add name="System.Net" value="Verbose" />
    <add name="System.Net.Sockets" value="Verbose" />
  </switches>
</system.diagnostics>

The clause above will trace all calls to System.Net and System.Net.Sockets, with a Verbose level, to a file named System,Net.trace.log.

You can change the level of the tracing to something less verbose, but one note here: if you try to use the Info level, as stated here you will get an error. You should use the Information level. Using the number levels don’t show an error, but don’t generate a log either. So, you should use the named levels.

After that, I could see the logs and notice what has happened:

System.Net Error: 0 : [22720] Exception in SmtpClient#58328727::Send - The SMTP server requires a secure connection or the client was not authenticated. The server response was: 5.5.1 Authentication Required. Learn more at.
   at System.Net.Mail.MailCommand.CheckResponse(SmtpStatusCode statusCode, String response)
   at System.Net.Mail.MailCommand.Send(SmtpConnection conn, Byte[] command, MailAddress from, Boolean allowUnicode)
   at System.Net.Mail.SmtpTransport.SendMail(MailAddress sender, MailAddressCollection recipients, String deliveryNotify, Boolean allowUnicode, SmtpFailedRecipientException& exception)
   at System.Net.Mail.SmtpClient.Send(MailMessage message)
System.Net.Sockets Verbose: 0 : [22720] Entering Socket#24827179::Dispose()
System.Net Error: 0 : [22720] Exception in AppDomain#41560081::UnhandledExceptionHandler - The SMTP server requires a secure connection or the client was not authenticated. The server response was: 5.5.1 Authentication Required. Learn more at.
   at System.Net.Mail.MailCommand.CheckResponse(SmtpStatusCode statusCode, String response)
   at System.Net.Mail.MailCommand.Send(SmtpConnection conn, Byte[] command, MailAddress from, Boolean allowUnicode)
   at System.Net.Mail.SmtpTransport.SendMail(MailAddress sender, MailAddressCollection recipients, String deliveryNotify, Boolean allowUnicode, SmtpFailedRecipientException& exception)
   at System.Net.Mail.SmtpClient.Send(MailMessage message)
   at HtmlEmail.Program.Main(String[] args) in D:\Documentos\Artigos\Artigos\CSharp\HtmlEmail\HtmlEmail\HtmlEmail\Program.cs:line 41

Just a change in the password!

In the last post, I’ve shown how you can use the Microsoft libraries to parse your SQL Server code in C#. You can use the same parser to reformat your code, but that would be a lot of work: parse the code, check all structures and reformat the code according to your needs.

But there is an easier way: the SqlScriptGenerator class. This class makes a breeze to get your unformatted code and reformat it according to your options. In this article, we will develop a WPF program that takes a SQL Code snippet typed in TextBox and reformats it using the options you give.

Create a WPF program and name it SqlReformatter. In the solution explorer, add the NuGet package Microsoft.SqlServer.TransactSql.ScriptDom. In the XAML file for the Main page, add this code:

<Window x:Class="SqlReformatter.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
        xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
        mc:Ignorable="d"
        Title="MainWindow" Height="700" Width="1024">
    <Grid>
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="200"/>
            <ColumnDefinition Width="*"/>
        </Grid.ColumnDefinitions>
        <StackPanel Grid.Column="0" Margin="5" x:Name="stackChecks">
            <CheckBox Content="AlignClauseBodies" Click="OptionClick"/>
            <CheckBox Content="AlignColumnDefinitionFields" Click="OptionClick"/>
            <CheckBox Content="AlignSetClauseItem" Click="OptionClick"/>
            <CheckBox Content="AsKeywordOnOwnLine" Click="OptionClick"/>
            <CheckBox Content="IncludeSemicolons" Click="OptionClick"/>
            <CheckBox Content="IndentSetClause" Click="OptionClick"/>
            <CheckBox Content="IndentViewBody" Click="OptionClick"/>
            <CheckBox Content="MultilineInsertSourcesList" Click="OptionClick"/>
            <CheckBox Content="MultilineInsertTargetsList" Click="OptionClick"/>
            <CheckBox Content="MultilineSelectElementsList" Click="OptionClick"/>
            <CheckBox Content="MultilineSetClauseItems" Click="OptionClick"/>
            <CheckBox Content="MultilineViewColumnsList" Click="OptionClick"/>
            <CheckBox Content="MultilineWherePredicatesList" Click="OptionClick"/>
            <CheckBox Content="NewLineBeforeCloseParenthesisInMultilineList" Click="OptionClick"/>
            <CheckBox Content="NewLineBeforeFromClause" Click="OptionClick"/>
            <CheckBox Content="NewLineBeforeGroupByClause" Click="OptionClick"/>
            <CheckBox Content="NewLineBeforeHavingClause" Click="OptionClick"/>
            <CheckBox Content="NewLineBeforeJoinClause" Click="OptionClick"/>
            <CheckBox Content="NewLineBeforeOffsetClause" Click="OptionClick"/>
            <CheckBox Content="NewLineBeforeOpenParenthesisInMultilineList" Click="OptionClick"/>
            <CheckBox Content="NewLineBeforeOrderByClause" Click="OptionClick"/>
            <CheckBox Content="NewLineBeforeOutputClause" Click="OptionClick"/>
            <CheckBox Content="NewLineBeforeWhereClause" Click="OptionClick"/>
            <StackPanel Margin="0,5">
                <TextBlock Text="KeywordCasing"/>
                <ComboBox SelectionChanged="CaseChanged" x:Name="cbxCase">
                    <ComboBoxItem>LowerCase</ComboBoxItem>
                    <ComboBoxItem>UpperCase</ComboBoxItem>
                    <ComboBoxItem>PascalCase</ComboBoxItem>
                </ComboBox>
            </StackPanel>
            <StackPanel Orientation="Horizontal" Margin="0,5">
                <TextBlock Text="IndentationSize"/>
                <TextBox Width="60" Margin="5,0" TextChanged="IndentChanged" x:Name="txtIdent"/>
            </StackPanel>
        </StackPanel>
        <Button VerticalAlignment="Bottom" HorizontalAlignment="Right" Grid.Column="0"
                Margin="5" Content="Reformat" Click="ReformatSqlClick"/>
        <Grid Grid.Column="1">
            <Grid.RowDefinitions>
                <RowDefinition Height="*"/>
                <RowDefinition Height="*"/>
            </Grid.RowDefinitions>
            <TextBox x:Name="SourceBox" Margin="5" Grid.Row="0"/>
            <TextBox x:Name="DestBox" Margin="5" Grid.Row="1" IsReadOnly="True"/>
        </Grid>
    </Grid>
</Window>

As you can see, we are adding at the right a set of Checkboxes, each one with one property of the SqlScriptOptions class. When we check each box, we will change the reformatting options. To do that, we must use this code:

private void OptionClick(object sender, RoutedEventArgs e)
{
    if (sender is CheckBox checkBox)
    {
        var option = checkBox.Content.ToString();
        PropertyInfo pinfo = typeof(SqlScriptGeneratorOptions).GetProperty(option);
        pinfo?.SetValue(_options, checkBox.IsChecked == true);
    }
}

In this case, we are using reflection to get the property corresponding to the clicked Checkbox and we set its value according to the IsChecked value. We must also add two event handlers, for the case combobox and for the indent textbox:

private void CaseChanged(object sender, SelectionChangedEventArgs e)
{
    var selectedCase = (sender as ComboBox)?.SelectedIndex;
    if (selectedCase != null)
        _options.KeywordCasing = (KeywordCasing) selectedCase;
}

private void IndentChanged(object sender, TextChangedEventArgs e)
{
    if (int.TryParse((sender as TextBox)?.Text, out int size))
        _options.IndentationSize = size;
}

Once we have this set, we need to add the code to the click of the Reformat  button:

private void ReformatSqlClick(object sender, RoutedEventArgs e)
{
    var sqlSrc = SourceBox.Text;
    if (string.IsNullOrWhiteSpace(sqlSrc))
        return;
    var processed = ParseSql(sqlSrc);
    if (processed.errors.Any())
    {
        var sb = new StringBuilder("Errors found:");
        foreach (var error in processed.errors)
        {
            sb.AppendLine($"     Line: {error.Line}  Col: {error.Column}: {error.Message}");
        }
    }
    else
    {
        var scriptGenerator = new Sql150ScriptGenerator(_options);
        scriptGenerator.GenerateScript(processed.sqlTree, out string sqlDst);
        DestBox.Text = sqlDst;
    }
}

We parse the code in the source box, and if there are any errors, we show them in the destination box. If there are no errors, the code is reformatted according to the selected options. The ParseSql method is similar to the one shown in the last article:

private static (TSqlFragment sqlTree, IList<ParseError> errors) ParseSql(string procText)
{
    var parser = new TSql150Parser(true);
    using (var textReader = new StringReader(procText))
    {
        var sqlTree = parser.Parse(textReader, out var errors);

        return (sqlTree, errors);
    }
}

Now, we only need to initialize the UI in the beginning, so all the options are up-to-date with the SqlScriptGeneratorOptions instance:

public MainWindow()
{
    InitializeComponent();
    _options = new SqlScriptGeneratorOptions();
    txtIdent.Text = _options.IndentationSize.ToString();
    cbxCase.SelectedIndex = (int) _options.KeywordCasing;
    foreach (var child in stackChecks.Children)
    {
        if (child is CheckBox check)
        {
            var checkContent = check.Content.ToString();
            PropertyInfo pinfo = typeof(SqlScriptGeneratorOptions).GetProperty(checkContent);
            check.IsChecked = (bool?)pinfo?.GetValue(_options) == true;
        }

    }
}

This code also uses reflection to get the property values and fill the data in the boxes. When you run this program, you have something like this:

You can change any options and click the Reformat button and the code will be reformatted accordingly.

Conclusions

As you can see, with very little code you can create a SQL Reformatter to reformat your code and make sure that it agrees with your standards. This formatter has many options and can be used also to parse Sql source code.

All the source code for this article is at https://github.com/bsonnino/SqlReformatter

In the last post, I’ve shown how to parse SQL Server code with C# and get all tokens in it, showing their types. This is very nice, there is a lot you can do with that, but there is a pitfall: you don’t have the context the token was used. For example, you have an identifier, but you don’t know if it is a parameter or a variable declared in the procedure. You don’t know its type and where it is used.

Sometimes you need more information about the token and you can’t get it unless you analyze the code. Fortunately, Microsoft has already done that for us and provided a wonderful tool, so we can get the context of the tokens: you can create a new class that inherits from TSqlFragmentVisitor and override its methods, to visit the node types you want (if this doesn’t make sense to you right now, keep on reading, you’ll see it’s very simple).

The TSqlFragmentVisitor class is a massive class that uses the Visitor Pattern to analyze the SQL tree and visit all the nodes, so you can take action on the nodes that come to your attention. If you take a look at it, you will see that there are a huge number of Visit and ExplicitVisit overrrides, one for each kind of node you can have in a SQL Server procedure. Using Visit is similar to ExplicitVisit, with the difference that with ExplicitVisit you can control if the children nodes are also visited.

To use it, you must parse the tree in the same way you did in the last article and then create a visitor instance and pass it in the Accept method. This code shows how this is done:

static void Main(string[] args)
{
    var sql = "Select * from customer";
    var parsed = ParseSql(sql);
    if (parsed.errors.Any())
        return;
    var visitor = new SelectVisitor();
    parsed.sqlTree.Accept(visitor);
    Console.ReadLine();
}

private static (TSqlFragment sqlTree, IList<ParseError> errors) ParseSql(string procText)
{
    var parser = new TSql150Parser(true);
    using (var textReader = new StringReader(procText))
    {
        var sqlTree = parser.Parse(textReader, out var errors);

        return (sqlTree, errors);
    }
}

The visitor class is something like this:

internal class SelectVisitor : TSqlFragmentVisitor
{
    public override void Visit(SelectStatement node)
    {
        Console.WriteLine($"Visiting Select: {node}");
    }

    public override void Visit(QueryExpression node)
    {
        Console.WriteLine($"Visiting QueryExpression: {node}");
    }

    public override void Visit(QuerySpecification node)
    {
        Console.WriteLine($"Visiting QuerySpecification: {node}");
    }

    public override void Visit(SelectStarExpression node)
    {
        Console.WriteLine($"Visiting SelectStarExpression: {node}");
    }
}

Running this code, you will get something like this:

As you can see, the node and its children are visited and we have no control on that. If you want to have some control, you must use the ExplicitVisit:

internal class SelectVisitor : TSqlFragmentVisitor
{
    public override void ExplicitVisit(SelectStatement node)
    {
        Console.WriteLine($"Visiting Select: {node}");
    }

    public override void Visit(QueryExpression node)
    {
        Console.WriteLine($"Visiting QueryExpression: {node}");
    }

    public override void Visit(QuerySpecification node)
    {
        Console.WriteLine($"Visiting QuerySpecification: {node}");
    }

    public override void Visit(SelectStarExpression node)
    {
        Console.WriteLine($"Visiting SelectStarExpression: {node}");
    }
}

As you can see, the child nodes are not visited. To visit them, you must call the base method, like this:

internal class SelectVisitor : TSqlFragmentVisitor
{
    public override void ExplicitVisit(SelectStatement node)
    {
        Console.WriteLine($"Visiting Select: {node}");
        base.ExplicitVisit(node);
    }

    public override void ExplicitVisit(QueryExpression node)
    {
        Console.WriteLine($"Visiting QueryExpression: {node}");
    }

    public override void ExplicitVisit(QuerySpecification node)
    {
        Console.WriteLine($"Visiting QuerySpecification: {node}");
    }

    public override void ExplicitVisit(SelectStarExpression node)
    {
        Console.WriteLine($"Visiting SelectStarExpression: {node}");
    }
}

Nice, no? With this simple code, you can write something to check if any procedure in your database has a “select *”, which can be flagged as an error. Just override the Visit for the SelectStarExpression and flag the error if it’s visited.

Generating statistics from the database procedures

With this knowledge, we can write a program that generates statistics from the database procedures. We want to know how many inserts, deletes and updates there are in the procedures and what are the tables used with these commands. We also can know how many tables are created and dropped in the procedures.

For that, create a new console application and name it DataStats Add the NuGet package Microsoft.SqlServer.TransactSql.ScriptDom and add this code to Program.cs:

     static void Main(string[] args)
     {
         using (var con = new SqlConnection("Server=.;Database=WideWorldImporters;Trusted_Connection=True;"))
         {
             con.Open();
             var procTexts = GetStoredProcedures(con)
                 .Select(n => new { ProcName = n, ProcText = GetProcText(con, n) })
                 .ToList();
             var procTrees = procTexts.Select(p =>
             {
                 var processed = ParseSql(p.ProcText);
                 var visitor = new StatsVisitor();
                 if (!processed.errors.Any())
                     processed.sqlTree.Accept(visitor);
                 return new { p.ProcName, processed.sqlTree, processed.errors, visitor };
             }).ToList();
             foreach (var procTree in procTrees)
             {
                 Console.WriteLine(procTree.ProcName);
                 if (procTree.errors.Any())
                 {
                     Console.WriteLine("   Errors found:");
                     foreach (var error in procTree.errors)
                     {
                         Console.WriteLine($"     Line: {error.Line}  Col: {error.Column}: {error.Message}");
                     }
                 }
                 else
                 {
                     var visitor = procTree.visitor;
                     Console.WriteLine($"  Inserts: {visitor.Inserts}");
                     foreach (var table in visitor.InsertTables)
                     {
                         Console.WriteLine($"      {table}");
                     }

                     Console.WriteLine($"  Updates: {visitor.Updates}");
                     foreach (var table in visitor.UpdateTables)
                     {
                         Console.WriteLine($"      {table}");
                     }

                     Console.WriteLine($"  Deletes: {visitor.Deletes}");
                     foreach (var table in visitor.DeleteTables)
                     {
                         Console.WriteLine($"      {table}");
                     }

                     Console.WriteLine($"  Creates: {visitor.Creates}");
                     foreach (var table in visitor.CreateTables)
                     {
                         Console.WriteLine($"      {table}");
                     }

                     Console.WriteLine($"  Drops: {visitor.Drops}");
                     foreach (var table in visitor.DropTables)
                     {
                         Console.WriteLine($"      {table}");
                     }
                 }
             }
         }

         Console.ReadLine();
     }

     private static List<string> GetStoredProcedures(SqlConnection con)
     {
         using (SqlCommand sqlCommand = new SqlCommand(
             "select s.name+'.'+p.name as name from sys.procedures p " + 
             "inner join sys.schemas s on p.schema_id = s.schema_id order by name",
             con))
         {
             using (DataTable procs = new DataTable())
             {
                 procs.Load(sqlCommand.ExecuteReader());
                 return procs.Rows.OfType<DataRow>().Select(r => r.Field<String>("name")).ToList();
             }
         }
     }

     private static string GetProcText(SqlConnection con, string procName)
     {
         using (SqlCommand sqlCommand = new SqlCommand("sys.sp_helpText", con)
         {
             CommandType = CommandType.StoredProcedure
         })
         {
             sqlCommand.Parameters.AddWithValue("@objname", procName);
             using (var proc = new DataTable())
             {
                 try
                 {
                     proc.Load(sqlCommand.ExecuteReader());
                     return string.Join("", proc.Rows.OfType<DataRow>().Select(r => r.Field<string>("Text")));
                 }
                 catch (SqlException)
                 {
                     return null;
                 }
             }
         }
     }

     private static (TSqlFragment sqlTree, IList<ParseError> errors) ParseSql(string procText)
     {
         var parser = new TSql150Parser(true);
         using (var textReader = new StringReader(procText))
         {
             var sqlTree = parser.Parse(textReader, out var errors);

             return (sqlTree, errors);
         }
     }
 }

This code is very similar to the one in the last post, but it will create a visitor and will call the Accept method to visit its nodes. The visitor’s code is:

internal class StatsVisitor : TSqlFragmentVisitor
{
    public int Inserts { get; private set; }
    public int Updates { get; private set; }
    public int Deletes { get; private set; }
    public int Creates { get; private set; }
    public int Drops { get; private set; }
    public List<string> InsertTables { get; }
    public List<string> UpdateTables { get; }
    public List<string> DeleteTables { get; }
    public List<string> CreateTables { get; }
    public List<string> DropTables { get; }

    public StatsVisitor()
    {
        InsertTables = new List<string>();
        UpdateTables = new List<string>();
        DeleteTables = new List<string>();
        CreateTables = new List<string>();
        DropTables = new List<string>();
    }

    public override void Visit(InsertStatement node)
    {
        Inserts++;
        InsertTables.Add((node.InsertSpecification.Target as NamedTableReference)?.
            SchemaObject.BaseIdentifier.Value);
    }

    public override void Visit(UpdateStatement node)
    {
        Updates++;
        UpdateTables.Add((node.UpdateSpecification.Target as NamedTableReference)?.
            SchemaObject.BaseIdentifier.Value);
    }

    public override void Visit(DeleteStatement node)
    {
        Deletes++;
        UpdateTables.Add((node.DeleteSpecification.Target as NamedTableReference)?.
            SchemaObject.BaseIdentifier.Value);
    }

    public override void Visit(CreateTableStatement node)
    {
        Creates++;
        CreateTables.Add(node.SchemaObjectName.BaseIdentifier.Value);
    }

    public override void Visit(DropTableStatement node)
    {
        Drops++;
        DropTables.AddRange(node.Objects.Select(o => o.BaseIdentifier.Value));
    }

}

When there is an Insert, Delete, Update, Create Table or Drop Table, the corresponding method will be called and the properties referring to the node will be updated. At the end you will have the statistics for all the procedures in the database:

There is just a minor glitch in this code: as you can see in the Inserts, there is a blank line. That is due to the fact that the insert target isn’t a table, but a variable. This code solves the issue:

public override void Visit(InsertStatement node)
{
    Inserts++;
    var targetName = (node.InsertSpecification.Target as NamedTableReference)?.
        SchemaObject.BaseIdentifier.Value ??
      (node.InsertSpecification.Target as VariableTableReference)?.Variable.Name;
    InsertTables.Add(targetName);
}

Conclusions

As you can see, Microsoft has put a lot of work in this parser, you can have a lot of power when working with SQL Server code in C# with this parser. The options are endless: check quality of the code, refactor code, make statistics, and so on. In the next post we will see a simple way to reformat your SQL Server code, so you can ensure that the code follows your standards. See you then.

The code for this article is in https://github.com/bsonnino/SqlParsing

When you are using SQL Server stored procedures, you get a lot of flexibility in your code – changing the code doesn’t need a recompile, deploying the new or updated procedures is as easy as sending and running text scripts, and so on. I won’t enter on the pros and cons of keeping code in the database, as this will open a very long discussion, with no winners.

In this article, I will focus on one aspect of the SQL Server code, the lack of rigid rules for the code. It all starts with being case insensitive, something like “SELECT * from customer” is the same of “select * FROM customer”. It’s not a great deal, but when you have a lot of stored procedures and many people maintaining it, you may need to enforce some standards. In this article I will show you how to use C# to parse SQL Server code and enforce some coding standards in your SQL Server procedures.

Retrieving the stored procedures from the database

The first step for processing the stored procedures is to retrieve them from the database. For this article, we will be using the WorldWideImporters sample database, that can be downloaded from https://github.com/Microsoft/sql-server-samples/releases/tag/wide-world-importers-v1.0.

To get the names of all procedures in the database, you must use a command like this one:

select s.name+'.'+p.name as name from sys.procedures p 
inner join sys.schemas s on p.schema_id = s.schema_id 
order by name

Once you have the names of all the procedures in the database, you can get their code with the procedure sys.sp_helpText, passing the procedure name as a parameter. This code will get all procedures from the database and store them in a list:

static void Main()
{
	using (var con = new SqlConnection("Server=.;Database=WideWorldImporters;Trusted_Connection=True;"))
	{
		con.Open();
		var procTexts = GetStoredProcedures(con).Select(n => new {ProcName = n, ProcText = GetProcText(con, n)}).ToList();
	}
}

private static List<string> GetStoredProcedures(SqlConnection con)
{
	using (SqlCommand sqlCommand = new SqlCommand("select s.name+'.'+p.name as name from sys.procedures p " +
	  "inner join sys.schemas s on p.schema_id = s.schema_id order by name", con))
	{
		using (DataTable procs = new DataTable())
		{
			procs.Load(sqlCommand.ExecuteReader());
			return procs.Rows.OfType<DataRow>().Select(r => r.Field<String>("name")).ToList();
		}
	}
}

private static string GetProcText(SqlConnection con, string procName)
{
	using (SqlCommand sqlCommand = new SqlCommand("sys.sp_helpText", con)
	{
		CommandType = CommandType.StoredProcedure
	})
	{
		sqlCommand.Parameters.AddWithValue("@objname", procName);
		using (var proc = new DataTable())
		{
			try
			{
				proc.Load(sqlCommand.ExecuteReader());
				return string.Join("", proc.Rows.OfType<DataRow>().Select(r => r.Field<string>("Text")));
			}
			catch (SqlException)
			{
				return null;
			}
		}
	}
}

This code will open a connection to the database, get all procedure names and then get their text, returning a list of an anonymous class with two members: ProcName and ProcText. You can then process all the procedures in the database.

Parsing the procedures

The next step is to parse the procedures to split into the containing parts. This is not an easy task to make from scratch. The SQL Server language is not easy and evolves with each version of SQL Server. There are many ways to do the same thing, so writing a parser is not something I would recommend. Fortunately, Microsoft has already written a parser for us, and it’s made available via NuGet package. To use it, just add the Microsoft.SqlServer.TransactSql.ScriptDom package to your app. When you add the NuGet package to the app you will see it’s not an official package: this dll is part of a larger package, Microsoft.SqlServer.DacFx.x64 and it’s there only to make the number of dlls added smaller.

Once you add the package, you can use this code to parse the code of the procedure:

private static (TSqlFragment sqlTree, IList<ParseError> errors) ParseSql(string procText)
{
	var parser = new TSql150Parser(true);
	using (var textReader = new StringReader(procText))
	{
		var sqlTree = parser.Parse(textReader, out var errors);

		return (sqlTree, errors);
	}
}

This method uses a new feature of C# 7.0, ValueTuples. If you are using .NET version 4.7 or higher, you don’t need to add anything to use it. If you are using an older version, you must add the System.ValueTuple NuGet package to your project. At the beginning, we instantiate a TSql150Parser class. Depending on the version of SQL server that you want to parse, you can choose a different parser class. Then, we call the Parse method, that will parse the code, return any errors in the errors variable and return a TSqlFragment with the parsed data. Just with this simple code, you already have many benefits: you can parse the procedure for many versions of SQL Server, check if the procedure has errors in it and parse the procedure into its components. The returned tree is of type TSqlScript, that has some interesting properties:

  • Batches – gets the batches in the procedure, so you can process them individually
  • FirstTokenIndex – shows where the fragment begins – for the script, it usually starts at 0, but when you are working with other kinds of fragments, that may not be the case
  • FragmentLength – size of the fragment, in bytes
  • LastTokenIndex – shows where the fragment ends
  • ScriptTokenStream – the list of parsed tokens
  • StartColumn – start column of the fragment
  • StartLine – start line of the fragment
  • Offset – offset in bytes from the start

As you can see, there is already a lot of information there. When you take a look at the ScriptTokenStream, you can see a lot more there:

The SQL code is parsed into tokens and every token is categorized. With this code, you can group the tokens and have brief statistics of your code:

static void Main()
{
	using (var con = new SqlConnection("Server=.;Database=WideWorldImporters;Trusted_Connection=True;"))
	{
		con.Open();
		var procTexts = GetStoredProcedures(con).Select(n => 
		  new { ProcName = n, ProcText = GetProcText(con, n) }).ToList();
		var procTrees = procTexts.Select(p =>
		{
			var processed = ParseSql(p.ProcText);
			return new { p.ProcName, processed.sqlTree, processed.errors };
		});
		var procStats = procTrees
		  .SelectMany(p => p.sqlTree.ScriptTokenStream)
		  .GroupBy(t => t.TokenType)
		  .Select(g => new { Key = g.Key, Count = g.Count()})
		  .OrderByDescending(g => g.Count);
	}
}

With this code, we can see something like this in the database:

WhiteSpace              15693
Identifier               3571
Variable                 1976
Plus                     1667
UnicodeStringLiteral     1192
Dot                      1085
Semicolon                1051
Comma                     957
LeftParenthesis           937
RightParenthesis          937

We can see the variable names used with this code:

static void Main()
{
	using (var con = new SqlConnection("Server=.;Database=WideWorldImporters;Trusted_Connection=True;"))
	{
		con.Open();
		var procTexts = GetStoredProcedures(con).Select(n => 
		  new { ProcName = n, ProcText = GetProcText(con, n) }).ToList();
		var procTrees = procTexts.Select(p =>
		{
			var processed = ParseSql(p.ProcText);
			return new { p.ProcName, processed.sqlTree, processed.errors };
		});
		var procVariables = procTrees
		  .SelectMany(p => p.sqlTree.ScriptTokenStream)
		  .Where(t => t.TokenType == TSqlTokenType.Variable)
		  .GroupBy(t => t.Text)
		  .Select(g => g.Key)
		  .OrderBy(t => t);
	}
}

If you run this code, you will see that, although there are 1976 tokens of type Variable, there are only 76 different names used.One thing that called me my attention is that there are two similar variables used, @Crlf and @CrLf. We can use this parsing to know in which procedures these variables are used. Note that this is different than a text search: a text search could find the variables inside a comment or in a string. In this case, we will only find the real SQL variables:

static void Main()
{
	using (var con = new SqlConnection("Server=.;Database=WideWorldImporters;Trusted_Connection=True;"))
	{
		con.Open();
		var procTexts = GetStoredProcedures(con).Select(n => new { ProcName = n, ProcText = GetProcText(con, n) }).ToList();
		var procTrees = procTexts.Select(p =>
		{
			var processed = ParseSql(p.ProcText);
			return new { p.ProcName, processed.sqlTree, processed.errors };
		});
		Func<TSqlParserToken, bool> queryVariables = t => t.TokenType == TSqlTokenType.Variable
			  && (t.Text == "@CrLf" || t.Text == "@Crlf");
		var variablesFound = procTrees
		  .Where(p => p.sqlTree.ScriptTokenStream.Any(queryVariables))
		   .Select(t => new { t.ProcName, FoundVariables = t.sqlTree.ScriptTokenStream.Where(queryVariables)});
	}
}

If you run this code, you will see that most of the variables is written as @CrLf, while some are written as @Crlf. We will change all variables to @CrLf with this code:

static void Main()
{
	using (var con = new SqlConnection("Server=.;Database=WideWorldImporters;Trusted_Connection=True;"))
	{
		con.Open();
		var procTexts = GetStoredProcedures(con).Select(n => new { ProcName = n, ProcText = GetProcText(con, n) }).ToList();
		var procTrees = procTexts.Select(p =>
		{
			var processed = ParseSql(p.ProcText);
			return new { p.ProcName, processed.sqlTree, processed.errors };
		});
		var variablesFound = procTrees
		  .Where(p => p.sqlTree.ScriptTokenStream.Any(queryVariables))
		   .Select(t => new { t.ProcName, FoundVariables = t.sqlTree.ScriptTokenStream.Where(queryVariables)});
		foreach (var procName in variablesFound.Select(p => p.ProcName))
		{
			var sqlTree = procTrees.First(p => p.ProcName == procName).sqlTree;
			foreach (var token in sqlTree.ScriptTokenStream.Where(t => t.Text == "@Crlf"))
			{
				token.Text = "@CrLf";
			}
			var changedSql = string.Join("", sqlTree.ScriptTokenStream.Select(t => t.Text).ToArray());
		}

	}
}

ChangedSql has the changed SQL, that can be applied to the database, making the variable names uniform.

Conclusions

As you can see, you can do a lot with this parser, like have statistics, change variable names and so on. But still we haven’t scratched the surface of what can be done with the parser. We will see more in the next article. See you then!