Introduction

Learning a new programming language is always hard. You have to read the documentation, stop, go to the IDE of your choice, start a new project, compile and run the program and see the results. Stop. Restart. After some time, you have no more patience for the process and start to skip some steps (maybe I don’t need to run this sample here…) until the point that you try to find an easier way, but most of the time there is no easier way.

Well, now there is an easier way. The .NET team has created a tool called dotnet try that lets you mix documentation with a code window, so you can try the code while reading the documentation. Cool, no?

Starting with dotnet try

To use this tool, you must install .NET Core on your machine and then install dotnet try with:

dotnet tool install -g dotnet-try

Once you do that, you can run the tool with dotnet try. But the best way to start is to use the tool’s own demo with dotnet try demo. This will load the tutorial, start a local server and open the browser with the tutorial:

There you can see the documentation side by side with the code. You can change the code, click the arrow button and look at the results. Nice! You can follow the tutorial and learn how to create your documentation easily. Or you can continue reading and I will tell you how to use it (sorry, no dotnet try here :-)).

If you want to learn something about .NET, you can run the samples. Just clone the dotnet/try-samples repository, change to the cloned folder and type dotnet try to open it.

There you can see some tutorials:

  • Beginner’s guide
  • 101 Linq Samples
  • C# 7
  • C# 8

With that, you can start playing with the tool. I recommend checking the LINQ samples (there is always something to learn about LINQ) and the new features in C# 7 and 8.

Once you’ve played with the tool, it’s time to create your own documentation.

Creating documentation for your code

All documentation is based on Markdown, a very simple markup language for writing documents. If you don’t know how to use Markdown, you can learn it here.

Create a new folder and, in this folder, create a readme.md file and add this code:

# Documenting your code with **dotnet try**
## Introduction
To document your code with dotnet try, you must mix your markdown text with some fences

Once you save it and run dotnet try, you can see something like this in the browser:

Now you can start adding code. To add the code window, we must add a code fence, a block of code delimited by triple backticks. You can add code directly to the code fence with something like this:

using System;

namespace dotnettrypost
{
    class Program
    {
        static void Main(string[] args)
        {
            #region HelloWorld
            Console.WriteLine("Hello World!");
            #endregion
        }
    }
}

This is great, but it poses a problem: if you need to change something in your code, you must also change the markdown file. That is less than optimal, because at some point you will forget to update one of the two places. There is a better way: you can use the source code as the single source of truth, so that when you change the source code, the code in the page changes accordingly. To do this, you can create a console app with

dotnet new console

This creates a simple project that writes Hello World in the console. It’s pretty simple, but fits our needs. Now, we will document it. You can add the following code at the end of the readme.md file to show the contents of the program in editable form:

Below, you should see the code in the program.cs file:
```cs --source-file ./Program.cs --project ./dotnettrypost.csproj
```

In this code fence, you must add the name of the file and the project. If you run dotnet try again, you will see something like this:

You can click the arrow to run the code. The first run takes a little while because the code must be compiled, but subsequent runs are faster. You can change the code and see the results:

Here you have a safe environment where you can try changes in the code. If you make a mistake, the window will point it out, so you can fix it.

As you can see, the code for the entire file is shown, but sometimes that’s not what you want. Sometimes you just want to show a code snippet from the file. To do that, you must create regions in your code. In the command prompt, type code . to open Visual Studio Code (if you don’t have it installed, you can get it at https://code.visualstudio.com/download). Then add a region to your code:

using System;

namespace dotnettrypost
{
    class Program
    {
        static void Main(string[] args)
        {
            #region HelloWorld
            Console.WriteLine("Hello World!");
            #endregion
        }
    }
}

 

Save the file and then edit the readme.md file to change the code fence:

```cs --source-file ./Program.cs --project ./dotnettrypost.csproj --region HelloWorld
```

When you save the file and refresh the page, you will see something like this:

Creating regions in your code and referencing them in the code fence allows you to separate the code you want in the code window. That way, a single file can provide code for many code windows.

If you don’t want an editable, runnable window, but still want the snippet to stay synchronized with the code, you can use the editable parameter:

```cs --source-file ./Program.cs --project ./dotnettrypost.csproj --region HelloWorld --editable false
```

When you are running the code, even if you aren’t showing the full code, everything in the program is executed. For example, if you change the source code to:

using System;
namespace dotnettrypost
{
    class Program
    {
        static void Main(string[] args)
        {
            #region HelloWorld
            Console.WriteLine("Hello World!");
            #endregion
            Console.WriteLine("This won't be shown in the snippet but will be shown when you run");
        }
    }
}

You will see the same code snippet but when you run it you will see something like this:

If you don’t want this, you need to do some special treatment in your code: you must make each snippet of code run alone. You can do that by processing the parameters passed to the program:

using System;

namespace dotnettrypost
{
    class Program
    {
        static void Main(string[] args)
        {
            // Stop one short of the end so args[i + 1] is always a valid index
            for (var i = 0; i < args.Length - 1; i++)
                if (args[i] == "--region")
                {
                    if (args[i + 1] == "HelloWorld")
                        HelloWorld();
                    else if (args[i + 1] == "ByeBye")
                        ByeBye();
                }
        }

        private static void HelloWorld()
        {
            #region HelloWorld
            Console.WriteLine("Hello World!");
            #endregion
        }

        private static void ByeBye()
        {
            #region ByeBye
            Console.WriteLine("ByeBye!");
            #endregion
        }
    }
}

Now, when you run the code, it will run only the code for the HelloWorld snippet. As you can see, there are a lot of options to show and run code in the web page. I see many ways to improve documentation and experimentation with new APIs. And you, doesn’t this give you new ideas?
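As the number of regions grows, the if/else chain in the program above could be replaced by a lookup table. The following is only a sketch of that idea, not part of the original project: RegionDispatcher, FindRegion and Run are hypothetical names, and the only assumption about dotnet try is the one the article already makes, namely that the region name arrives right after a --region argument.

```csharp
using System;
using System.Collections.Generic;

public static class RegionDispatcher
{
    // Returns the value that follows "--region", or null when the flag
    // is missing or is the last argument.
    public static string FindRegion(string[] args)
    {
        for (var i = 0; i < args.Length - 1; i++)
        {
            if (args[i] == "--region")
                return args[i + 1];
        }
        return null;
    }

    // Runs the action registered for the requested region.
    // Returns false when no region was requested or the name is unknown.
    public static bool Run(string[] args, IReadOnlyDictionary<string, Action> regions)
    {
        var region = FindRegion(args);
        if (region != null && regions.TryGetValue(region, out var action))
        {
            action();
            return true;
        }
        return false;
    }
}

public class Program
{
    public static void Main(string[] args)
    {
        // One entry per region; adding a snippet means adding one line here.
        var regions = new Dictionary<string, Action>
        {
            ["HelloWorld"] = HelloWorld,
            ["ByeBye"] = ByeBye
        };
        RegionDispatcher.Run(args, regions);
    }

    private static void HelloWorld()
    {
        #region HelloWorld
        Console.WriteLine("Hello World!");
        #endregion
    }

    private static void ByeBye()
    {
        #region ByeBye
        Console.WriteLine("ByeBye!");
        #endregion
    }
}
```

With this shape, an unknown or missing region simply runs nothing, which may or may not be what you want; returning a bool leaves that decision to the caller.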

All the source code for this project is at https://github.com/bsonnino/dotnettry

One of the perks of being an MVP is receiving some tools for my own use, so I can evaluate them and, if I find them valuable, use them on a daily basis. In my work as an architect/consultant, one task that I often face is to analyze an application and suggest changes to make it more robust and maintainable. One tool that can help me in this task is NDepend (https://www.ndepend.com/). With this tool, you can analyze your code, verify its dependencies, set coding rules and verify how code changes are affecting the quality of your application.

For this article, we will be analyzing eShopOnWeb (https://github.com/dotnet-architecture/eShopOnWeb), a sample application created by Microsoft to demonstrate architectural patterns on .NET Core. It’s an ASP.NET Core MVC 2.2 app that implements a sample shopping site and can be run on premises, in Azure or in containers, as a single-process (monolithic) application. It has a companion book that describes the application architecture and can be downloaded at https://aka.ms/webappebook.

When you download, compile and run the app, you will see something like this:

You have a full-featured shopping app, with everything you would expect in this kind of app. The next step is to start analyzing it.

Download NDepend (you have a 14-day trial to use and evaluate it) and install it on your machine; it installs as an add-in to Visual Studio. Then, in Visual Studio, select Extensions/NDepend/Attach new NDepend Project to current VS Solution. A window like this opens:

It has the projects in the solution already selected, so you can click on Analyze 3 .NET Assemblies. After it runs, it opens a web page with a report of its findings:

This report has an analysis of the project, where you can verify the problems NDepend found, the project dependencies, and drill down in the issues. At the same time, a window like this opens in Visual Studio:

If you want a more dynamic view, you can view the dashboard:

 

In the dashboard, you have a full analysis of your application: lines of code, methods, assemblies and so on. One interesting metric there is the technical debt, where you can see how much technical debt there is in this app and how long it will take to fix it (in this case, we have 3.72% of technical debt and 2 days to fix it). We also have the code metrics and the violated coding rules. If you click on an item in the dashboard, like the # of Lines, you will see the details in the properties window:

If we take a look at the dashboard, we’ll see some issues that must be improved. In the Quality Gates, we have two issues that fail. By clicking on the number 2 there, we see this in the Quality Gates evolution window:

If we hover the mouse on one of the failed issues, we get a tooltip that explains the failure:

If we double-click on the failure, we drill down to what caused it:

If we click on the second issue, we see that there are two names used in different classes: Basket and IEmailSender:

Basket is the name of the class in Microsoft.eShopWeb.Web.Pages.Shared.Components.BasketComponent and in Microsoft.eShopWeb.ApplicationCore.Entities.BasketAggregate

One other thing that you can see is the dependency graph:

With it, you can see how the assemblies relate to each other and get a first look at the architecture. If you filter the graph to show only the application assemblies, you have a good overview of what’s happening:

The largest assembly is Web, followed by Infrastructure and ApplicationCore. The application is well layered: there are no cyclic calls between assemblies (assembly A calling assembly B, which calls A), there is a weak relation between Web and Infrastructure (shown by the width of the line that joins them) and a strong one between Web and ApplicationCore. In a large solution with many assemblies, this diagram alone gives us an overview of what’s happening and shows whether we are doing things right. The next step is to go into the details and look at the assemblies’ dependencies. You can hover over an assembly and get some info about it:

For example, the Web assembly has 58 issues detected and an estimated time of 1 day to fix them. This is an estimate calculated by NDepend using the complexity of the methods that must be fixed, but you can set your own formula to calculate the technical debt if this isn’t OK for your team.

Now that we got an overview of the project, we can start fixing the issues. Let’s start with the easiest ones :-).  The Infrastructure assembly has only two issues and a debt of 20 minutes. In the diagram, we can right-click on the Infrastructure assembly and select Select Issues/On Me and on Child Code elements. This will open the issues in the Queries and Rules Edit window, at the right:

We can then open the tree and double-click on the second issue. It will point you to the rule “Non-static classes should be instantiated or turned to static”, flagging the SpecificationEvaluator<T> class. This is a class that has only one static method and is referenced only once, so there’s no harm in making it static. Once you make it static, build the app and run the dependency check again, you will see this:

Oh-oh. We fixed an issue and introduced another one, an API Breaking Change: when we made that class static, we removed the constructor. In this case, it wasn’t really an API change, because nobody would instantiate a class with only static methods, so we should reset the baseline here. Go to the Dashboard, click Choose Baseline and select define:

Then select the most recent analysis and click OK.  That will open the NDepend settings, where you will see the new baseline. Save the settings and rerun the analysis and the error is gone.

We can then open the tree again and double-click on the other issue that remains in Infrastructure. That will open the source code, pointing to a readonly declaration for a DbContext. This is not a big issue: it’s only telling us that we are declaring the field as readonly, but the object it stores is mutable, so its state can still be changed. There is a mention of this issue in the Framework Design Guidelines, by Microsoft (https://docs.microsoft.com/en-us/dotnet/standard/design-guidelines/). If you hover the mouse over the issue, there is a tooltip on how to fix it:

We have three ways to fix this issue:

  • Remove the readonly from the field
  • Make the field private and not protected
  • Use an attribute to say “ok, I am aware of this, but I don’t mind”
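Before weighing these options, it may help to see what the rule is warning about. The sketch below is not the eShopOnWeb code: FakeContext, RepositoryBase and DerivedRepository are hypothetical stand-ins, with FakeContext playing the role of the mutable DbContext. It only illustrates that readonly protects the reference, not the object behind it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a mutable object such as a DbContext.
public class FakeContext
{
    public List<string> Items { get; } = new List<string>();
}

public class RepositoryBase
{
    // readonly stops the field from being reassigned after construction...
    protected readonly FakeContext _context;

    public RepositoryBase(FakeContext context)
    {
        _context = context;
    }

    public int Count => _context.Items.Count;
}

public class DerivedRepository : RepositoryBase
{
    public DerivedRepository(FakeContext context) : base(context) { }

    public void Add(string item)
    {
        // ...but the object it points to can still be mutated from a derived class.
        _context.Items.Add(item);

        // Reassigning the field here would not compile:
        // _context = new FakeContext();
    }
}

public static class ReadonlyDemo
{
    public static void Main()
    {
        var repository = new DerivedRepository(new FakeContext());
        repository.Add("first");
        // The state changed even though the field is readonly.
        Console.WriteLine(repository.Count);
    }
}
```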

The first option will suppress the error, but will drop what we want to express: that this field should not be replaced with another DbContext. The second option will remove the possibility of using the DbContext in derived classes, so I’ll choose the third option and add the attribute. If I right-click on the issue in the Rules and Queries window and select Suppress Issue, a window opens:

All I have to do is to copy the attribute to the clipboard and paste it into the source code. I also have to declare the symbol CODE_ANALYSIS in the project (Project/Properties/Build). That was easy! Let’s go to the next one.

This one is the use of an obsolete method. Fortunately, the description tells us to use the UseHiLo method instead. We change the method, run the app to check that nothing is broken, and we’re good. We can run the analysis again and see what happened:

We had a slight decrease in the technical debt, and we solved one high-severity issue and one violated rule. As you can see, NDepend not only analyzes your code, but also shows you how your code changes affect its quality. This is very well-architected code (as it should be, since it’s an architecture sample), so the issues are minor, but you can see what can be done with NDepend. When you have a messy project, it will surely be an invaluable tool!

 

One thing that happens to me sometimes when I am debugging a program is seeing a variable that has changed value without knowing where or when it was changed. That has led me to long and boring debugging sessions, checking every place where the variable could have been changed.

I just wanted some kind of breakpoint that I could set that told me “break when this variable changes value”, but Visual Studio never had such a feature. Until now.

With .NET Core 3.0, Microsoft introduced a new feature in Visual Studio that greatly enhances the debugging experience. The only drawback is that it only works with .NET Core 3.0 apps. Nevertheless, it’s a great, long-awaited feature!

This feature is a little buried in Visual Studio (no, there is no right-click on a variable name with something like “break when this variable value changes”), so how do I set a data breakpoint?

To show this feature, I will use a .NET Core console app that gets the prime numbers:

static void Main(string[] args)
{
    if (args.Length < 1)
    {
        Console.WriteLine("usage: GetPrimesLessThan <number>");
        return;
    }

    if (!int.TryParse(args[0], out int number))
    {
        Console.WriteLine("parameter should be a valid integer number");
        return;
    }
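    var primes = GetPrimesLessThan(number);
    Console.WriteLine($"Found {primes.Length} primes less than {number}");
    // Last() throws on an empty sequence, so only print when a prime was found
    if (primes.Length > 0)
        Console.WriteLine($"Last prime less than {number} is {primes.Last()}");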
}
private static int[] GetPrimesLessThan(int maxValue)
{
    if (maxValue <= 1)
        return new int[0];
    var primeArray = Enumerable.Range(0, maxValue).ToArray();
    var sizeOfArray = primeArray.Length;

    primeArray[0] = primeArray[1] = 0;

    for (int i = 2; i < Math.Sqrt(sizeOfArray - 1) + 1; i++)
    {
        if (primeArray[i] <= 0) continue;
        for (var j = 2 * i; j < sizeOfArray; j += i)
            primeArray[j] = 0;
    }

    return primeArray.Where(n => n > 0).ToArray();
}

This algorithm uses an array of integers in which an entry is set to zero if the number is not prime. At the end, all non-prime numbers in the array are zero, so I can extract the primes with this LINQ line:

return primeArray.Where(n => n > 0).ToArray();

Let’s say that I want to know when the number 10 changes to 0. If I didn’t have this feature, I would have to follow the code, set breakpoints and analyze the data for every hit. Not anymore.

Here’s what must be done:

  • Set a breakpoint on the line just after the primeArray declaration
  • Run the code and let it break
  • Open the Locals window and expand the primeArray node
  • Right-click on the node with the value 10
  • Select Break When Value Changes

That’s it! You have set a data breakpoint for the value 10 in the array. Now let the code run; when the value changes, the code will break:

There you can analyze your code and see what has changed the variable. Nice, no?
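For comparison, here is roughly what you would have to write by hand to get the same information without a data breakpoint. This is only a sketch under stated assumptions: SieveWatch, watchedIndex and onChange are hypothetical additions, while the sieve itself follows the article’s GetPrimesLessThan. The callback fires only when the watched element really changes value, which is also when a data breakpoint would hit.

```csharp
using System;
using System.Linq;

public static class SieveWatch
{
    // Same sieve as in the article, plus a callback that is invoked
    // just before the watched element changes value.
    public static int[] GetPrimesLessThan(int maxValue, int watchedIndex, Action<int, int> onChange)
    {
        if (maxValue <= 1)
            return new int[0];

        var primeArray = Enumerable.Range(0, maxValue).ToArray();
        var sizeOfArray = primeArray.Length;

        primeArray[0] = primeArray[1] = 0;

        for (int i = 2; i < Math.Sqrt(sizeOfArray - 1) + 1; i++)
        {
            if (primeArray[i] <= 0) continue;
            for (var j = 2 * i; j < sizeOfArray; j += i)
            {
                // Manual stand-in for a data breakpoint: report only real
                // value changes, not writes of 0 over an existing 0.
                if (j == watchedIndex && primeArray[j] != 0)
                    onChange(i, j);
                primeArray[j] = 0;
            }
        }

        return primeArray.Where(n => n > 0).ToArray();
    }

    public static void Main()
    {
        GetPrimesLessThan(100, 10,
            (i, j) => Console.WriteLine($"primeArray[{j}] zeroed while i = {i}"));
        // prints "primeArray[10] zeroed while i = 2"
    }
}
```

The element is written again when i reaches 5, but its value is already 0 by then, so nothing is reported for that second write.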

This is a great feature that will greatly improve my debugging, and I’m glad that the Visual Studio team could add it. I hope that you’ll enjoy it as much as I do.

The source code for the project used here is at https://github.com/bsonnino/DataBreakpoint

Recently, I went to a client, where they assigned me a machine to work on, but there was no software installed and I had no admin rights to install any. I had to set up my box to start development before all the red tape could be sorted out: licenses assigned, software installed and so on.

So I decided to set up a box where I could install the software as a standard user, with no need for any special licenses, to get up and running for development in .NET Core and React with TypeScript (the same steps can be used to develop with Angular or any other JavaScript environment).

Initially, I installed Google Chrome (its developer tools are great and you can debug your JavaScript apps in it). I went to the Chrome site (https://www.google.com/chrome/), downloaded the installer and ran it. When it asked me for the admin credentials, I clicked Cancel and the installer offered to install as a standard user.

Then I installed Visual Studio Code. It’s a terrific editor for your programs (in any language) that can even debug your apps. There are extensions for everything; it’s really great. I used a direct link to download the portable version, unzipped the file and added the path to VS Code with this PowerShell script:

Set-ExecutionPolicy Bypass -Scope Process -Force;
$remoteFile = 'https://go.microsoft.com/fwlink/?Linkid=850641';
$downloadFile = $env:Temp+'\vscode.zip';
$vscodePath = $env:LOCALAPPDATA+"\VsCode";

(New-Object Net.WebClient).DownloadFile($remoteFile, $downloadFile);
Expand-Archive $downloadFile -DestinationPath $vscodePath -Force
$env:Path += ";"+$vscodePath
[Environment]::SetEnvironmentVariable("Path", $env:Path, [System.EnvironmentVariableTarget]::User);

The next step is to install Node.js (https://nodejs.org/). Node.js is a JavaScript runtime, where you can run your JavaScript apps. I went to the Node site and saw that there is a zip file for Windows (https://nodejs.org/dist/v12.13.0/node-v12.13.0-win-x64.zip). I created this PowerShell script to download and install Node LTS:

Set-ExecutionPolicy Bypass -Scope Process -Force;
$remoteFile = 'https://nodejs.org/dist/v12.13.0/node-v12.13.0-win-x64.zip';
$downloadFile = $env:Temp+'\node-v12.13.0-win-x64.zip';
$nodePath = $env:LOCALAPPDATA+"\Node";

(New-Object Net.WebClient).DownloadFile($remoteFile, $downloadFile);
Expand-Archive $downloadFile -DestinationPath $nodePath -Force
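# The zip extracts into a versioned subfolder, which is where node.exe and npm live
$env:Path += ";"+$nodePath+"\node-v12.13.0-win-x64"
[Environment]::SetEnvironmentVariable("Path", $env:Path, [System.EnvironmentVariableTarget]::User);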

You can open a PowerShell window and type these commands, or you can download and run the script file from my GitHub (link below).

The next step is to install Yarn (https://yarnpkg.com/lang/en/), a dependency manager, by running this command in the command line:

npm install yarn -g

The last step is to install .NET Core. There is no package to install it as a standard user, but there is a PowerShell script to install it at https://dot.net/v1/dotnet-install.ps1. Open a PowerShell window and run these commands:

Set-ExecutionPolicy Bypass -Scope Process -Force;
$remoteFile = 'https://dot.net/v1/dotnet-install.ps1';
$downloadFile = $env:Temp+'\dotnet-install.ps1';
$dotnetPath = $env:LOCALAPPDATA+"\Microsoft\Dotnet";

(New-Object Net.WebClient).DownloadFile($remoteFile, $downloadFile);
# Run the downloaded installer; -InstallDir keeps the SDK inside the user profile
& $downloadFile -InstallDir $dotnetPath
$env:Path += ";"+$dotnetPath
[Environment]::SetEnvironmentVariable("Path", $env:Path, [System.EnvironmentVariableTarget]::User);

Now you are all set: your development machine is ready to go. To create a React app, just open a command line window and type these commands:

yarn create react-app reactapp --typescript
cd reactapp
yarn start

If you point your browser to http://localhost:3000, you will see:

Running code . will open VS Code with the project folder open:

You can also create an ASP.NET Core MVC app with the commands:

dotnet new mvc -o AspNetApp
cd AspNetApp
dotnet run

And open http://localhost:5000 in the browser to see the app running:

As you can see, with a few steps, you can set up a developer machine with no need for admin rights or licenses. You can now start developing your full apps and debug them in VS Code.

That way, I was up and running from day one, and when the full install came, I was already developing.

After finishing the setup, I noticed that all the scripts are very similar, so I created a single script, Install-FromWeb:

[CmdletBinding()]
Param (
  $RemoteFile,
  $DownloadFile,
  [bool]$DoExtractFile = $False,
  [string]$ExecutePath = $null,
  $AddedPath
)
Write-Host $RemoteFile
Write-Host $DownloadFile

Invoke-WebRequest -Uri $RemoteFile -OutFile $DownloadFile
If ($DoExtractFile){
  Expand-Archive $DownloadFile -DestinationPath $AddedPath -Force
}
If (-Not ([string]::IsNullOrEmpty($ExecutePath))){
  & "$ExecutePath"
}  
$env:Path += ";"+$AddedPath
[Environment]::SetEnvironmentVariable("Path", $env:Path, [System.EnvironmentVariableTarget]::User);

When I was pushing the scripts to my GitHub, I noticed that Git for Windows (https://gitforwindows.org/) wasn’t installed. I checked the Git for Windows site and there is a portable version at https://github.com/git-for-windows/git/releases/download/v2.24.0.windows.2/PortableGit-2.24.0.2-64-bit.7z.exe. With it, you can use Git as a version control system. With that, you can set up your machine with a single set of instructions:

Set-ExecutionPolicy Bypass -Scope Process -Force;
.\Install-FromWeb.ps1 -RemoteFile "https://go.microsoft.com/fwlink/?Linkid=850641" -DownloadFile "$env:Temp\vscode.zip" -DoExtractFile $true -AddedPath "$env:LOCALAPPDATA\VsCode"
.\Install-FromWeb.ps1 -RemoteFile "https://nodejs.org/dist/v12.13.0/node-v12.13.0-win-x64.zip" -DownloadFile "$env:Temp\node-v12.13.0-win-x64.zip" -DoExtractFile $true -AddedPath "$env:LOCALAPPDATA\Node"
# Flatten the versioned subfolder so that node.exe ends up directly in ...\Node
Move-Item -Path "$env:LOCALAPPDATA\Node\node-v12.13.0-win-x64" -Destination "$env:LOCALAPPDATA\Node1"
Remove-Item -Path "$env:LOCALAPPDATA\Node"
Rename-Item -Path "$env:LOCALAPPDATA\Node1" -NewName "Node"
npm install yarn -g
.\Install-FromWeb.ps1 -RemoteFile "https://dot.net/v1/dotnet-install.ps1" -DownloadFile "$env:Temp\dotnet-install.ps1" -DoExtractFile $false -AddedPath "$env:LOCALAPPDATA\Microsoft\Dotnet" -ExecutePath "$env:Temp\dotnet-install.ps1"
.\Install-FromWeb.ps1 -RemoteFile "https://github.com/git-for-windows/git/releases/download/v2.24.0.windows.2/PortableGit-2.24.0.2-64-bit.7z.exe" -DownloadFile "$env:TEMP\PortableGit.exe" -DoExtractFile $false -AddedPath "$env:LOCALAPPDATA\Git" -Debug
& "$env:TEMP\PortableGit.exe" -o "$env:LOCALAPPDATA\Git" -y

All the scripts are on my GitHub: https://github.com/bsonnino/DevMachine

So, happy development!

You need to create customized reports based on SharePoint data and you don’t have access to the server to create a new web part to access it directly. You need to resort to other options to generate the report. Fortunately, SharePoint offers some methods to access its data, so you can get the information you need. If you are using C#, you can use one of these three methods:

  • Client Side Object Model (CSOM) – this is a set of APIs that gives access to the SharePoint data and lets you maintain lists and documents in SharePoint
  • REST API – with this API you can access SharePoint data, not only using C#, but with any other platform that can perform and process REST requests, like web or mobile apps
  • SOAP Web Services – although this kind of access is deprecated, there are a lot of programs that depend on it, so it’s still actively used. You can use this API with any platform that can process SOAP web services.

This article will show how to use these three APIs to access lists and documents from a SharePoint server and put them in a WPF program, with an extra touch: the user will be able to export the data to Excel, for further manipulation.

To start, let’s create a WPF project in Visual Studio and name it SharepointAccess. We will use the MVVM pattern to develop it, so right-click the References node in the Solution Explorer, select Manage NuGet Packages and add the MVVM Light package. That will add a ViewModel folder with two files in it to your project. The next step is to make a correction in the ViewModelLocator.cs file. If you open it, you will see an error in the line

using Microsoft.Practices.ServiceLocation;

Just replace this using directive with

using CommonServiceLocator;

The next step is to link the DataContext of MainWindow to the MainViewModel, as described in the header of ViewModelLocator:

<Window x:Class="SharepointAccess.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
        xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
        xmlns:local="clr-namespace:SharepointAccess"
        mc:Ignorable="d"
        Title="Sharepoint Access" Height="700" Width="900"
        DataContext="{Binding Source={StaticResource Locator}, Path=Main}">

Then, let’s add the UI to MainWindow.xaml:

<Grid>
    <Grid.Resources>
        <DataTemplate x:Key="DocumentsListTemplate">
            <StackPanel>
                <TextBlock Text="{Binding Title}" />
            </StackPanel>
        </DataTemplate>
        <DataTemplate x:Key="DocumentTemplate">
            <StackPanel Margin="0,5">
                <TextBlock Text="{Binding Title}" />
                <TextBlock Text="{Binding Url}" />

            </StackPanel>
        </DataTemplate>
        <DataTemplate x:Key="FieldsListTemplate">
            <Grid >
                <Grid.ColumnDefinitions>
                    <ColumnDefinition Width="150" />
                    <ColumnDefinition Width="*" />
                </Grid.ColumnDefinitions>
                <TextBlock Text="{Binding Key}" TextTrimming="CharacterEllipsis"/>
                <TextBlock Text="{Binding Value}" Grid.Column="1" TextTrimming="CharacterEllipsis"/>
            </Grid>
        </DataTemplate>
    </Grid.Resources>
    <Grid.ColumnDefinitions>
        <ColumnDefinition Width="*" />
        <ColumnDefinition Width="*" />
        <ColumnDefinition Width="*" />
    </Grid.ColumnDefinitions>
    <Grid.RowDefinitions>
        <RowDefinition Height="40" />
        <RowDefinition Height="*" />
        <RowDefinition Height="40" />
    </Grid.RowDefinitions>
    <StackPanel Grid.Row="0" Grid.ColumnSpan="3" HorizontalAlignment="Stretch" Orientation="Horizontal">
        <TextBox Text="{Binding Address}" Width="400" Margin="5" HorizontalAlignment="Left"
                 VerticalContentAlignment="Center"/>
        <Button Content="Go" Command="{Binding GoCommand}" Width="65" Margin="5" HorizontalAlignment="Left"/>
    </StackPanel>
    <StackPanel Orientation="Horizontal"  Grid.Row="0" Grid.Column="3" 
                      HorizontalAlignment="Right" Margin="5,5,10,5">
        <RadioButton Content=".NET Api" IsChecked="True" GroupName="ApiGroup" Margin="5,0"
                     Command="{Binding ApiSelectCommand}" CommandParameter="NetApi" />
        <RadioButton Content="REST" GroupName="ApiGroup"
                     Command="{Binding ApiSelectCommand}" CommandParameter="Rest" Margin="5,0"/>
        <RadioButton Content="SOAP"  GroupName="ApiGroup"
                     Command="{Binding ApiSelectCommand}" CommandParameter="Soap" Margin="5,0"/>
    </StackPanel>
    <ListBox Grid.Column="0" Grid.Row="1" ItemsSource="{Binding DocumentsLists}" 
             ItemTemplate="{StaticResource DocumentsListTemplate}"
             SelectedItem="{Binding SelectedList}"/>
    <ListBox Grid.Column="1" Grid.Row="1" ItemsSource="{Binding Documents}"
             ItemTemplate="{StaticResource DocumentTemplate}"
             SelectedItem="{Binding SelectedDocument}"/>
    <ListBox Grid.Column="2" Grid.Row="1" ItemsSource="{Binding Fields}"
             ItemTemplate="{StaticResource FieldsListTemplate}"
             />
    <TextBlock Text="{Binding ListTiming}" VerticalAlignment="Center" Margin="5" Grid.Row="2" Grid.Column="0" />
    <TextBlock Text="{Binding ItemTiming}" VerticalAlignment="Center" Margin="5" Grid.Row="2" Grid.Column="1" />
  </Grid>

The first row in the grid will have a text box to enter the address of the SharePoint site, a button to go to the address and three radio buttons to select the kind of access you want.

The main part of the window will have three listboxes to show the lists on the selected site, the documents in each list and the properties of the document.

As you can see, we’ve used data binding to fill the properties of the UI controls. We’re using the MVVM pattern and all these properties should be bound to properties in the ViewModel. The buttons and radio buttons have their Command properties bound to a property in the ViewModel, so we don’t have to add code to the code-behind file. We’ve also used templates for the items in the listboxes, so the data is properly presented.

The last row will show the timings for getting the data.

If you run the program, it will run without errors and you will get a UI like this, which doesn’t do anything yet:

 

It’s time to add the properties in the MainViewModel to achieve the functionality we want. Before that, we’ll add two classes to hold the data that comes from the SharePoint server, no matter which kind of access we are using. Create a new folder named Model, and add these two files, Document.cs and DocumentsList.cs:

public class Document
{
    public Document(string id, string title, 
        Dictionary<string, object> fields,
        string url)
    {
        Id = id;
        Title = title;
        Fields = fields;
        Url = url;
    }
    public string Id { get; }
    public string Title { get; }
    public string Url { get; }
    public Dictionary<string, object> Fields { get; }
}

public class DocumentsList
{
    public DocumentsList(string title, string description)
    {
        Title = title;
        Description = description;
    }

    public string Title { get; }
    public string Description { get; }
}

These are very simple classes, that will store the data that comes from the Sharepoint server. The next step is to get the data from the server. For that we will use a repository that gets the data and returns it using these two classes. Create a new folder, named Repository and add a new interface, named IListRepository:

public interface IListRepository
{
    Task<List<Document>> GetDocumentsFromListAsync(string title);
    Task<List<DocumentsList>> GetListsAsync();
}

This interface declares two methods, GetListsAsync and GetDocumentsFromListAsync. These are asynchronous methods because we don’t want them to block the UI while they are being called. Now, it’s time to create the first access to SharePoint, using the CSOM API. For that, you must add the NuGet package Microsoft.SharePointOnline.CSOM. This package will provide all the APIs to access SharePoint data. We can now create our first repository. In the Repository folder, add a new class and name it CsomListRepository.cs:

public class CsomListRepository : IListRepository
{
    private string _sharepointSite;

    public CsomListRepository(string sharepointSite)
    {
        _sharepointSite = sharepointSite;
    }

    public Task<List<DocumentsList>> GetListsAsync()
    {
        return Task.Run(() =>
        {
            using (var context = new ClientContext(_sharepointSite))
            {
                var web = context.Web;
                
                var query = web.Lists.Include(l => l.Title, l => l.Description)
                     .Where(l => !l.Hidden && l.ItemCount > 0);

                var lists = context.LoadQuery(query);
                context.ExecuteQuery();

                return lists.Select(l => new DocumentsList(l.Title, l.Description)).ToList();
            }
        });
    }

    public Task<List<Document>> GetDocumentsFromListAsync(string listTitle)
    {
        return Task.Run(() =>
        {
            using (var context = new ClientContext(_sharepointSite))
            {
                var web = context.Web;
                var list = web.Lists.GetByTitle(listTitle);
                var query = new CamlQuery();

                query.ViewXml = "<View />";
                var items = list.GetItems(query);
                context.Load(list,
                    l => l.Title);
                context.Load(items, l => l.IncludeWithDefaultProperties(
                    i => i.Folder, i => i.File, i => i.DisplayName));
                context.ExecuteQuery();

                return items
                    .Where(i => i["Title"] != null)
                    .Select(i => new Document(i["ID"].ToString(), 
                    i["Title"].ToString(), i.FieldValues, i["FileRef"].ToString()))
                    .ToList();
            }
        });
    }
}

To access the SharePoint data, we have to create a ClientContext, passing the SharePoint site to access. Then, we get a reference to the Web property of the context and query for the lists that aren’t hidden and that have at least one item. The query returns the titles and descriptions of the lists in the website. To get the documents from a list we use a similar approach: create a context, then a query, and load the items of the list.

We can call this code in the creation of the ViewModel:

private IListRepository _listRepository;

public MainViewModel()
{
    Address = ConfigurationManager.AppSettings["WebSite"];
    GoToAddress();
}

This code will get the initial web site url from the configuration file for the app and call GoToAddress:

private async void GoToAddress()
{
    var sw = new Stopwatch();
    sw.Start();
    _listRepository = new CsomListRepository(Address);

    DocumentsLists = await _listRepository.GetListsAsync();
    ListTiming = $"Time to get lists: {sw.ElapsedMilliseconds}";
    ItemTiming = "";
}

The method calls the GetListsAsync method of the repository to get the lists and sets the DocumentsLists property with the result.

We must also declare some properties that will be used to bind to the control properties in the UI:

public List<DocumentsList> DocumentsLists
{
    get => _documentsLists;
    set
    {
        _documentsLists = value;
        RaisePropertyChanged();
    }
}
public string ListTiming
{
    get => _listTiming;
    set
    {
        _listTiming = value;
        RaisePropertyChanged();
    }
}
public string ItemTiming
{
    get => itemTiming;
    set
    {
        itemTiming = value;
        RaisePropertyChanged();
    }
}
public string Address
{
    get => address;
    set
    {
        address = value;
        RaisePropertyChanged();
    }
}

These properties raise the PropertyChanged event when they are modified, so the UI is notified of the change.
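The article doesn’t show where RaisePropertyChanged comes from (it may well be inherited from an MVVM library base class). As a minimal sketch of the idea, assuming a hand-written base class, it could look like this. The names ObservableObject and TimingViewModel are hypothetical and only serve the illustration:

```csharp
using System.ComponentModel;
using System.Runtime.CompilerServices;

// Hypothetical base class: implements INotifyPropertyChanged once for all view models.
public abstract class ObservableObject : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    // CallerMemberName makes the compiler pass the name of the calling property,
    // so setters can simply call RaisePropertyChanged() with no argument.
    protected void RaisePropertyChanged([CallerMemberName] string propertyName = null)
    {
        PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(propertyName));
    }
}

// Hypothetical view model with one property written in the same style as the article's.
public class TimingViewModel : ObservableObject
{
    private string _listTiming;

    public string ListTiming
    {
        get => _listTiming;
        set
        {
            _listTiming = value;
            RaisePropertyChanged(); // raises PropertyChanged for "ListTiming"
        }
    }
}
```

WPF bindings subscribe to PropertyChanged on the bound object, which is why setting the property from code is enough to refresh the bound control.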

If you run this code, you will have something like this:

Then, we must get the documents when we select a list. This is done when the SelectedList property changes:

public DocumentsList SelectedList
{
    get => _selectedList;
    set
    {
        if (_selectedList == value)
            return;
        _selectedList = value;
        GetDocumentsForList(_selectedList);
        RaisePropertyChanged();
    }
}

GetDocumentsForList is:

private async void GetDocumentsForList(DocumentsList list)
{
    var sw = new Stopwatch();
    sw.Start();
    if (list != null)
    {
        Documents = await _listRepository.GetDocumentsFromListAsync(list.Title);
        ItemTiming = $"Time to get items: {sw.ElapsedMilliseconds}";
    }
    else
    {
        Documents = null;
        ItemTiming = "";
    }
}

You have to declare the Documents and the Fields properties:

public List<Document> Documents
{
    get => documents;
    set
    {
        documents = value;
        RaisePropertyChanged();
    }
}

public Dictionary<string, object> Fields => _selectedDocument?.Fields;

One other change that must be made is to create a property named SelectedDocument that will be bound to the second list. When the user selects a document, it will fill the third list with the document’s properties:

public Document SelectedDocument
{
    get => _selectedDocument;
    set
    {
        if (_selectedDocument == value)
            return;
        _selectedDocument = value;
        RaisePropertyChanged();
        RaisePropertyChanged(nameof(Fields));
    }
}

Now, when you click on a list, you get all the documents in it. Clicking on a document opens its properties:

Everything works with the CSOM access, so it’s time to add the other two ways to access the data: REST and SOAP. We will implement these by creating two new repositories that will be selected at runtime.

To get items using the REST API, we must make HTTP calls to http://<website>/_api/Lists and http://<website>/_api/Lists/GetByTitle('title')/Items. In the Repository folder, create a new class and name it RestListRepository. This class should implement the IListRepository interface:

public class RestListRepository : IListRepository
{
    public Task<List<Document>> GetDocumentsFromListAsync(string title)
    {
        throw new NotImplementedException();
    }

    public Task<List<DocumentsList>> GetListsAsync()
    {
        throw new NotImplementedException();
    }
}

GetListsAsync will be:

private XNamespace ns = "http://www.w3.org/2005/Atom";
public async Task<List<DocumentsList>> GetListsAsync()
{
    var doc = await GetResponseDocumentAsync(_sharepointSite + "Lists");
    if (doc == null)
        return null;

    var entries = doc.Element(ns + "feed").Descendants(ns + "entry");
    return entries.Select(GetDocumentsListFromElement)
        .Where(d => !string.IsNullOrEmpty(d.Title)).ToList();
}

GetResponseDocumentAsync issues an HTTP GET request, processes the response and returns an XDocument (or null if the request fails):

public async Task<XDocument> GetResponseDocumentAsync(string url)
{
    var handler = new HttpClientHandler
    {
        UseDefaultCredentials = true
    };
    HttpClient httpClient = new HttpClient(handler);
    var headers = httpClient.DefaultRequestHeaders;
    var header = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
        "(KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36";
    if (!headers.UserAgent.TryParseAdd(header))
    {
        throw new Exception("Invalid header value: " + header);
    }
    Uri requestUri = new Uri(url);

    try
    {
        var httpResponse = await httpClient.GetAsync(requestUri);
        httpResponse.EnsureSuccessStatusCode();
        var httpResponseBody = await httpResponse.Content.ReadAsStringAsync();
        return XDocument.Parse(httpResponseBody);
    }
    catch
    {
        return null;
    }
}

The response will be an XML string. We could get the response as a JSON object instead if we set the Accept header to application/json. After the document is parsed, we process all entry elements, retrieving the lists, in GetDocumentsListFromElement:

private XNamespace mns = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata";
private XNamespace dns = "http://schemas.microsoft.com/ado/2007/08/dataservices";
private DocumentsList GetDocumentsListFromElement(XElement e)
{
    var element = e.Element(ns + "content")?.Element(mns + "properties");
    if (element == null)
        return new DocumentsList("", "");
    bool.TryParse(element.Element(dns + "Hidden")?.Value ?? "true", out bool isHidden);
    int.TryParse(element.Element(dns + "ItemCount")?.Value ?? "0", out int itemCount);
    return !isHidden && itemCount > 0 ?
      new DocumentsList(element.Element(dns + "Title")?.Value ?? "",
        element.Element(dns + "Description")?.Value ?? "") :
      new DocumentsList("", "");
}

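Here we filter the lists by parsing the Hidden and ItemCount properties and returning an empty DocumentsList if the list is hidden or has no items (the empty entries are then dropped by the Title filter in GetListsAsync). For the documents, each entry is converted by GetDocumentFromElementAsync: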

private async Task<Document> GetDocumentFromElementAsync(XElement e)
{
    var element = e.Element(ns + "content")?.Element(mns + "properties");
    if (element == null)
        return new Document("", "", null, "");
    var id = element.Element(dns + "Id")?.Value ?? "";
    var title = element.Element(dns + "Title")?.Value ?? "";
    var description = element.Element(dns + "Description")?.Value ?? "";
    var fields = element.Descendants().ToDictionary(el => el.Name.LocalName, el => (object)el.Value);
    int.TryParse(element.Element(dns + "FileSystemObjectType")?.Value ?? "-1", out int fileType);
    string docUrl = "";

    var url = GetUrlFromTitle(e, fileType == 0 ? "File" : "Folder");
    if (url != null)
    {
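        var fileDoc = await GetResponseDocumentAsync(_sharepointSite + url);
        // GetResponseDocumentAsync returns null when the request fails,
        // so every step of the lookup is null-conditional.
        docUrl = fileDoc?.Element(ns + "entry")?.
            Element(ns + "content")?.
            Element(mns + "properties")?.
            Element(dns + "ServerRelativeUrl")?.
            Value ?? "";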
    }

    return new Document(id, title, fields, docUrl);
}

It parses the XML and extracts a document and its properties. GetUrlFromTitle looks for the link element whose title attribute matches the given name (File or Folder) and returns its href:

private string GetUrlFromTitle(XElement element, string title)
{
    return element.Descendants(ns + "link")
            ?.FirstOrDefault(e1 => e1.Attribute("title")?.Value == title)
            ?.Attribute("href")?.Value;
}
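As mentioned above, the same endpoints can answer in JSON if the request carries a suitable Accept header. Here is a minimal sketch of how such a request could be built. The class and method names are hypothetical, and the odata=verbose parameter is an assumption based on what SharePoint REST endpoints commonly expect, so check it against your own server:

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical helper, not part of the article's repository classes.
public static class RestRequests
{
    // Builds a GET request that asks the server for JSON instead of the default Atom XML.
    public static HttpRequestMessage CreateJsonGet(string url)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, new Uri(url));
        // Assumption: "odata=verbose" is the flavor the server accepts; adjust if needed.
        request.Headers.Accept.Add(
            MediaTypeWithQualityHeaderValue.Parse("application/json;odata=verbose"));
        return request;
    }
}
```

The request would then be sent with httpClient.SendAsync(request) and the body parsed with a JSON parser instead of XDocument.Parse.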

The third access method uses the SOAP service that SharePoint makes available. This access method is listed as deprecated, but it’s still alive. You have to create a reference to http://<website>/_vti_bin/Lists.asmx and create a client for it. I preferred to create a .NET 2.0 client instead of a WCF one, as I found it easier to authenticate with this service.

In Visual Studio, right-click the References node and select Add Service Reference. Then, click the Advanced button and then Add Web Reference. Put the URL in the box and click the arrow button:

When you click the Add Reference button, the reference will be added to the project and it can be used. Create a new class in the Repository folder and name it SoapListRepository. Make the class implement the IListRepository interface. The GetListsAsync method will be:

XNamespace ns = "http://schemas.microsoft.com/sharepoint/soap/";
private Lists _proxy;

public async Task<List<DocumentsList>> GetListsAsync()
{
    var tcs = new TaskCompletionSource<XmlNode>();
    _proxy = new Lists
    {
        Url = _address,
        UseDefaultCredentials = true
    };
    _proxy.GetListCollectionCompleted += ProxyGetListCollectionCompleted;
    _proxy.GetListCollectionAsync(tcs);
    XmlNode response;
    try
    {
        response = await tcs.Task;
    }
    finally
    {
        _proxy.GetListCollectionCompleted -= ProxyGetListCollectionCompleted;
    }

    var list = XElement.Parse(response.OuterXml);
    var result = list?.Descendants(ns + "List")
        ?.Where(e => e.Attribute("Hidden").Value == "False")
        ?.Select(e => new DocumentsList(e.Attribute("Title").Value,
        e.Attribute("Description").Value)).ToList();
    return result;
}

private void ProxyGetListCollectionCompleted(object sender, GetListCollectionCompletedEventArgs e)
{
    var tcs = (TaskCompletionSource<XmlNode>)e.UserState;
    if (e.Cancelled)
    {
        tcs.TrySetCanceled();
    }
    else if (e.Error != null)
    {
        tcs.TrySetException(e.Error);
    }
    else
    {
        tcs.TrySetResult(e.Result);
    }
}

As we are using the .NET 2.0 web service, which reports completion through an event, we must use a TaskCompletionSource to turn the call into an awaitable method. We attach the handler and fire the call to the service. When it returns, the completed event is raised and the handler sets the TaskCompletionSource to the proper state: cancelled if the call was cancelled, faulted with the exception if there was an error, or completed with the result if the call succeeded. Then, we remove the event handler for the completed event and process the result (an XmlNode) to transform it into a list of DocumentsList.
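To see the pattern without a SharePoint server, here is a self-contained sketch of the same technique. LegacyService and its event are hypothetical stand-ins for the generated proxy (they are not the real Lists class); only the wrapping with TaskCompletionSource mirrors the code above:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical event args, shaped like the ones a .NET 2.0 proxy generates.
public class LegacyCompletedEventArgs : EventArgs
{
    public LegacyCompletedEventArgs(string result, Exception error, object userState)
    {
        Result = result;
        Error = error;
        UserState = userState;
    }
    public string Result { get; }
    public Exception Error { get; }
    public object UserState { get; }
}

// Hypothetical event-based service: starts the work and reports back through an event.
public class LegacyService
{
    public event EventHandler<LegacyCompletedEventArgs> GetDataCompleted;

    public void GetDataAsync(string input, object userState)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            var args = string.IsNullOrEmpty(input)
                ? new LegacyCompletedEventArgs(null, new ArgumentException("input is empty"), userState)
                : new LegacyCompletedEventArgs(input.ToUpperInvariant(), null, userState);
            GetDataCompleted?.Invoke(this, args);
        });
    }
}

public static class LegacyServiceExtensions
{
    // Wraps the event-based call in a Task so callers can simply await it.
    public static async Task<string> GetDataTaskAsync(this LegacyService service, string input)
    {
        var tcs = new TaskCompletionSource<string>();
        EventHandler<LegacyCompletedEventArgs> handler = (sender, e) =>
        {
            if (!ReferenceEquals(e.UserState, tcs))
                return; // completion of some other call
            if (e.Error != null)
                tcs.TrySetException(e.Error);
            else
                tcs.TrySetResult(e.Result);
        };

        service.GetDataCompleted += handler;
        try
        {
            service.GetDataAsync(input, tcs); // the tcs travels as the user state
            return await tcs.Task;
        }
        finally
        {
            service.GetDataCompleted -= handler; // always detach, even on failure
        }
    }
}
```

Passing the TaskCompletionSource as the user state, as the article does, lets the handler recognize which call a completion belongs to.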

The call to GetDocumentsFromListAsync is very similar to GetListsAsync:

XNamespace rs = "urn:schemas-microsoft-com:rowset";
XNamespace z = "#RowsetSchema";

public async Task<List<Document>> GetDocumentsFromListAsync(string title)
{
    var tcs = new TaskCompletionSource<XmlNode>();
    _proxy = new Lists
    {
        Url = _address,
        UseDefaultCredentials = true
    };
    _proxy.GetListItemsCompleted += ProxyGetListItemsCompleted;
    _proxy.GetListItemsAsync(title, "", null, null, "", null, "", tcs);
    XmlNode response;
    try
    {
        response = await tcs.Task;
    }
    finally
    {
        _proxy.GetListItemsCompleted -= ProxyGetListItemsCompleted;
    }

    var list = XElement.Parse(response.OuterXml);

    var result = list?.Element(rs + "data").Descendants(z + "row")
        ?.Select(e => new Document(e.Attribute("ows_ID")?.Value,
        e.Attribute("ows_LinkFilename")?.Value, AttributesToDictionary(e),
        e.Attribute("ows_FileRef")?.Value)).ToList();
    return result;
}

private Dictionary<string, object> AttributesToDictionary(XElement e)
{
    return e.Attributes().ToDictionary(a => a.Name.ToString().Replace("ows_", ""), a => (object)a.Value);
}

private void ProxyGetListItemsCompleted(object sender, GetListItemsCompletedEventArgs e)
{
    var tcs = (TaskCompletionSource<XmlNode>)e.UserState;
    if (e.Cancelled)
    {
        tcs.TrySetCanceled();
    }
    else if (e.Error != null)
    {
        tcs.TrySetException(e.Error);
    }
    else
    {
        tcs.TrySetResult(e.Result);
    }
}

The main difference is the processing of the response, to get the documents list. Once you have the two methods in place, the only thing to do is select the correct repository in MainViewModel. For that, we create an enum for the API Selection:

public enum ApiSelection
{
    NetApi,
    Rest,
    Soap
};

Then, we need to declare a command bound to the radiobuttons, that will receive a string with the enum value:

public ICommand ApiSelectCommand =>
    _apiSelectCommand ?? (_apiSelectCommand = new RelayCommand<string>(s => SelectApi(s)));

private void SelectApi(string s)
{
    _selectedApi = (ApiSelection)Enum.Parse(typeof(ApiSelection), s, true);
    GoToAddress();
}

The last step is to select the repository in the GoToAddress method:

private async void GoToAddress()
{
    var sw = new Stopwatch();
    sw.Start();
    _listRepository = _selectedApi == ApiSelection.Rest ?
        (IListRepository)new RestListRepository(Address) :
        _selectedApi == ApiSelection.NetApi ?
        (IListRepository)new CsomListRepository(Address) :
        new SoapListRepository(Address);

    DocumentsLists = await _listRepository.GetListsAsync();
    ListTiming = $"Time to get lists: {sw.ElapsedMilliseconds}";
    ItemTiming = "";
}

With the code in place, you can run the app and see the data shown for each API.

One last change to the program is to add a command bound to the Go button, so you can change the address of the web site and get the lists and documents for the new site:

public ICommand GoCommand =>
            _goCommand ?? (_goCommand = new RelayCommand(GoToAddress, () => !string.IsNullOrEmpty(Address)));

This command has an extra touch: it will only enable the button if there is an address in the address box. If it’s empty, the button will be disabled. Now you can run the program, change the address of the website, and get the lists for the new website.
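The RelayCommand used above comes from a library that the article doesn’t show. To make the enable/disable behavior concrete, here is a hypothetical minimal stand-in (SimpleRelayCommand is my own name, and a real library class does more, such as re-querying automatically):

```csharp
using System;
using System.Windows.Input;

// Hypothetical, minimal ICommand implementation for illustration only.
public class SimpleRelayCommand : ICommand
{
    private readonly Action _execute;
    private readonly Func<bool> _canExecute;

    public SimpleRelayCommand(Action execute, Func<bool> canExecute = null)
    {
        _execute = execute ?? throw new ArgumentNullException(nameof(execute));
        _canExecute = canExecute;
    }

    // Raised to ask the bound control to query CanExecute again.
    public event EventHandler CanExecuteChanged;

    // A button bound to the command is enabled only while this returns true.
    public bool CanExecute(object parameter) => _canExecute == null || _canExecute();

    public void Execute(object parameter)
    {
        if (CanExecute(parameter))
            _execute();
    }

    public void RaiseCanExecuteChanged() => CanExecuteChanged?.Invoke(this, EventArgs.Empty);
}
```

With this stand-in, the view model would have to call RaiseCanExecuteChanged from the Address setter so the button state follows the text box.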

Conclusions

As you can see, we’ve created a WPF program that uses the MVVM pattern and accesses Sharepoint data using three different methods – it even has a time measuring feature, so you can check the performance difference and choose the right one for your case.

The full source code for this project is at https://github.com/bsonnino/SharepointAccess

One book that I recommend reading is Clean Code, by Robert Martin. It is a well-written book with wonderful techniques to create better code and improve your current programs, so they become easier to read, maintain and understand.

While going through it again, I found an excellent opportunity to improve my skills by trying some refactoring: in listing 4.7 there is a prime generator function that he uses to show some refactoring concepts, turning it into listing 4.8. I then thought I would do the same and show my results here.

We can start with the listing converted to C#. This is a very easy task. The original program is written in Java, but converting it to C# is just a matter of one or two small fixes:

using System;

namespace PrimeNumbers
{
/**
* This class Generates prime numbers up to a user specified
* maximum. The algorithm used is the Sieve of Eratosthenes.
* <p>
* Eratosthenes of Cyrene, b. c. 276 BC, Cyrene, Libya --
* d. c. 194, Alexandria. The first man to calculate the
* circumference of the Earth. Also known for working on
* calendars with leap years and ran the library at Alexandria.
* <p>
* The algorithm is quite simple. Given an array of integers
* starting at 2. Cross out all multiples of 2. Find the next
* uncrossed integer, and cross out all of its multiples.
* Repeat until you have passed the square root of the maximum
* value.
*
* @author Alphonse
* @version 13 Feb 2002 atp
*/
    public class GeneratePrimes
    {
        /**
        * @param maxValue is the generation limit.
*/
        public static int[] generatePrimes(int maxValue)
        {
            if (maxValue >= 2) // the only valid case
            {
                // declarations
                int s = maxValue + 1; // size of array
                bool[] f = new bool[s];
                int i;

                // initialize array to true.
                for (i = 0; i < s; i++)
                    f[i] = true;
                // get rid of known non-primes
                f[0] = f[1] = false;
                // sieve
                int j;
                for (i = 2; i < Math.Sqrt(s) + 1; i++)
                {
                    if (f[i]) // if i is uncrossed, cross its multiples.
                    {
                        for (j = 2 * i; j < s; j += i)
                            f[j] = false; // multiple is not prime
                    }
                }
                // how many primes are there?
                int count = 0;
                for (i = 0; i < s; i++)
                {
                    if (f[i])
                        count++; // bump count.
                }
                int[] primes = new int[count];
                // move the primes into the result
                for (i = 0, j = 0; i < s; i++)
                {
                    if (f[i]) // if prime
                        primes[j++] = i;
                }
                return primes; // return the primes
            }
            else // maxValue < 2
                return new int[0]; // return null array if bad input.
        }
    }
}

The first step is to put some tests in place, so we can be sure that we are not breaking anything while refactoring the code. In the solution, I added a new Class Library project, named it GeneratePrimes.Tests and added the packages NUnit, NUnit3TestAdapter and FluentAssertions to get fluent assertions in an NUnit test project. Then I added these tests:

using NUnit.Framework;
using FluentAssertions;

namespace PrimeNumbers.Tests
{
    [TestFixture]
    public class GeneratePrimesTests
    {
        [Test]
        public void GeneratePrimes0ReturnsEmptyArray()
        {
            var actual = GeneratePrimes.generatePrimes(0);
            actual.Should().BeEmpty();
        }

        [Test]
        public void GeneratePrimes1ReturnsEmptyArray()
        {
            var actual = GeneratePrimes.generatePrimes(1);
            actual.Should().BeEmpty();
        }

        [Test]
        public void GeneratePrimes2ReturnsArrayWith2()
        {
            var actual = GeneratePrimes.generatePrimes(2);
            actual.Should().BeEquivalentTo(new[] { 2 });
        }

        [Test]
        public void GeneratePrimes10ReturnsArray()
        {
            var actual = GeneratePrimes.generatePrimes(10);
            actual.Should().BeEquivalentTo(new[] { 2,3,5,7 });
        }

        [Test]
        public void GeneratePrimes10000ReturnsArray()
        {
            var actual = GeneratePrimes.generatePrimes(10000);
            actual.Should().HaveCount(1229).And.EndWith(9973);
        }
    }
}

These tests check that there are no primes for 0 and 1, one prime for 2, that the primes up to 10 are 2, 3, 5 and 7, and that there are 1229 primes less than 10,000, the largest being 9973. Once we run the tests, we can see that they pass and we can start making our changes.

The easiest fix we can do is to revise the comments at the beginning. We don’t need the history of Eratosthenes (you can go to Wikipedia for that). We don’t need the author and version, thanks to source control technology :-). We don’t need the comment on the method’s parameter either:

/**
    * This class Generates prime numbers up to a user specified
    * maximum. The algorithm used is the Sieve of Eratosthenes.
    *  https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes   
*/
public class GeneratePrimes
{
    public static int[] generatePrimes(int maxValue)

Then we can invert the initial test, to reduce nesting. If we hover the mouse over the line of the first if, an arrow appears at the border, indicating a quick fix:

We can do the quick fix, then eliminate the else clause (don’t forget to remove the extra comments that are not needed):

public static int[] generatePrimes(int maxValue)
{
    if (maxValue < 2) 
        return new int[0]; 

    // declarations
    int s = maxValue + 1; // size of array
    bool[] f = new bool[s];
    int i;

Save the code and check that all tests pass. The next step is to rename the variables:

  • s can be renamed to sizeOfArray
  • f can be renamed as isPrimeArray

Go to the declaration of s and press Ctrl+R, Ctrl+R to rename it to sizeOfArray. Do the same with the f variable. Don’t forget to remove the comments (and to run the tests):

int sizeOfArray = maxValue + 1; 
bool[] isPrimeArray = new bool[sizeOfArray];
int i;

To go to the next refactorings, we can use the comments as indicators for extracting methods. We can extract the InitializeArray method:

The extracted code isn’t what I expected, so I change it to:

private static bool[] InitializeArray(int sizeOfArray)
{
    bool[] isPrimeArray = new bool[sizeOfArray];
    // initialize array to true.
    for (var i = 0; i < sizeOfArray; i++)
        isPrimeArray[i] = true;
    return isPrimeArray;
}

I can use the code like this:

var isPrimeArray = InitializeArray(sizeOfArray);

After passing the tests, I can refactor the code of InitializeArray to (this needs a using System.Linq directive):

private static bool[] InitializeArray(int sizeOfArray)
{
    return Enumerable
        .Range(0, sizeOfArray)
        .Select(n => true)
        .ToArray();
}
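As a side note, LINQ also has a method meant exactly for filling a sequence with one value. This is only an alternative sketch, not the version used in the rest of the article, and the class name SieveArrays is hypothetical:

```csharp
using System.Linq;

// Hypothetical container class for the alternative helper.
public static class SieveArrays
{
    // Enumerable.Repeat yields the same value sizeOfArray times,
    // which avoids the Range + Select(n => true) detour.
    public static bool[] InitializeArray(int sizeOfArray) =>
        Enumerable.Repeat(true, sizeOfArray).ToArray();
}
```

Both versions produce an array of sizeOfArray elements, all set to true, so the tests are not affected by the choice.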

The next step is the sieve:

The code for the sieve is really bad:

private static void Sieve(int sizeOfArray, bool[] isPrimeArray, 
    out int i, out int j)
{
    // get rid of known non-primes
    isPrimeArray[0] = isPrimeArray[1] = false;
    for (i = 2; i < Math.Sqrt(sizeOfArray) + 1; i++)
    {
        if (isPrimeArray[i]) // if i is uncrossed, cross its multiples.
        {
            for (j = 2 * i; j < sizeOfArray; j += i)
                isPrimeArray[j] = false; // multiple is not prime
        }
    }
}

It has two out parameters (which, for me, is a code smell), and has an error (the out parameter j must be assigned before exiting the method). So we can change it to remove the out parameters and remove the sizeOfArray parameter:

private static void Sieve(bool[] isPrimeArray)
{
    var sizeOfArray = isPrimeArray.Length;

    isPrimeArray[0] = isPrimeArray[1] = false;

    for (int i = 2; i < Math.Sqrt(sizeOfArray) + 1; i++)
    {
        if (isPrimeArray[i]) // if i is uncrossed, cross its multiples.
        {
            for (int j = 2 * i; j < sizeOfArray; j += i)
                isPrimeArray[j] = false; 
        }
    }
}

Then, we can extract the method to count primes:

CountPrimes has the same flaws as Sieve, so we change it to:

private static int CountPrimes(bool[] isPrimeArray)
{
    var sizeOfArray = isPrimeArray.Length;
    var count = 0;
    for (var i = 0; i < sizeOfArray; i++)
    {
        if (isPrimeArray[i])
            count++; 
    }
    return count;
}

We can refactor it to:

private static int CountPrimes(bool[] isPrimeArray) => 
    isPrimeArray.Count(i => i);

The next step is MovePrimes:

After we tweak the MovePrimes code, we get:

private static int[] MovePrimes(bool[] isPrimeArray, int count)
{
    var sizeOfArray = isPrimeArray.Length;
    var primes = new int[count];
    for (int i = 0, j = 0; i < sizeOfArray; i++)
    {
        if (isPrimeArray[i]) // if prime
            primes[j++] = i;
    }
    return primes;
}

Then we can refactor MovePrimes:

 private static int[] MovePrimes(bool[] isPrimeArray, int count) =>
     isPrimeArray
         .Select((p, i) => new { Index = i, IsPrime = p })
         .Where(v => v.IsPrime)
         .Select(v => v.Index)
         .ToArray();

Notice that we aren’t using the primes count in this case, so we can remove the calculation of the count and the parameter. After some cleaning and name changing, we get:

public static int[] GetPrimes(int maxValue)
{
    if (maxValue < 2)
        return new int[0];

    // One flag per value from 0 to maxValue, so maxValue itself is included.
    bool[] isPrimeArray = InitializeArray(maxValue + 1);
    Sieve(isPrimeArray);
    return MovePrimes(isPrimeArray);
}

Much cleaner, no? Now, it’s easier to read the method, the details are hidden, but the code still runs the same way. We have a more maintainable method, and it shows clearly what it does.

But there is a change we can do here: we are using static methods only. We can then add the keyword this to the first parameter to allow the methods to be used as extension methods (the containing class must also be declared static). For example, if we change MovePrimes and Sieve to:

private static int[] MovePrimes(this bool[] isPrimeArray) =>
    isPrimeArray
        .Select((p, i) => new { Index = i, IsPrime = p })
        .Where(v => v.IsPrime)
        .Select(v => v.Index)
        .ToArray();

private static bool[] Sieve(this bool[] isPrimeArray)
{
    var sizeOfArray = isPrimeArray.Length;

    isPrimeArray[0] = isPrimeArray[1] = false;

    for (int i = 2; i < Math.Sqrt(sizeOfArray) + 1; i++)
    {
        if (isPrimeArray[i]) // if i is uncrossed, cross its multiples.
        {
            for (int j = 2 * i; j < sizeOfArray; j += i)
                isPrimeArray[j] = false;
        }
    }
    return isPrimeArray;
}

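With InitializeArray also turned into an extension method on int (it must allocate maxValue + 1 elements so that maxValue itself is covered), GetPrimes can be changed to: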

public static int[] PrimesSmallerOrEqual(this int maxValue)
{
    if (maxValue < 2)
        return new int[0];

    return maxValue.InitializeArray()
        .Sieve()
        .MovePrimes();
}

Cool, no? With this change, the tests become:

public class GeneratePrimesTests
{
    [Test]
    public void GeneratePrimes0ReturnsEmptyArray()
    {
        0.PrimesSmallerOrEqual().Should().BeEmpty();
    }

    [Test]
    public void GeneratePrimes1ReturnsEmptyArray()
    {
        1.PrimesSmallerOrEqual().Should().BeEmpty();
    }

    [Test]
    public void GeneratePrimes2ReturnsArrayWith2()
    {
        2.PrimesSmallerOrEqual()
            .Should().BeEquivalentTo(new[] { 2 });
    }

    [Test]
    public void GeneratePrimes10ReturnsArray()
    {
        10.PrimesSmallerOrEqual()
            .Should().BeEquivalentTo(new[] { 2, 3, 5, 7 });
    }

    [Test]
    public void GeneratePrimes10000ReturnsArray()
    {
        10000.PrimesSmallerOrEqual()
            .Should().HaveCount(1229).And.EndWith(9973);
    }
}
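Since the extension version of InitializeArray isn’t listed above, here is a self-contained sketch of how the pieces could fit together. The class name PrimeExtensions and the body of InitializeArray are my assumptions (in particular, that it allocates maxValue + 1 flags); the other methods follow the listings above:

```csharp
using System;
using System.Linq;

// Extension methods must live in a static, non-nested class.
public static class PrimeExtensions
{
    public static int[] PrimesSmallerOrEqual(this int maxValue)
    {
        if (maxValue < 2)
            return new int[0];

        return maxValue.InitializeArray()
            .Sieve()
            .MovePrimes();
    }

    // Assumption: one flag per value from 0 to maxValue, all initially true.
    private static bool[] InitializeArray(this int maxValue) =>
        Enumerable
            .Range(0, maxValue + 1)
            .Select(n => true)
            .ToArray();

    private static bool[] Sieve(this bool[] isPrimeArray)
    {
        var sizeOfArray = isPrimeArray.Length;

        isPrimeArray[0] = isPrimeArray[1] = false;

        for (int i = 2; i < Math.Sqrt(sizeOfArray) + 1; i++)
        {
            if (isPrimeArray[i]) // if i is uncrossed, cross its multiples.
            {
                for (int j = 2 * i; j < sizeOfArray; j += i)
                    isPrimeArray[j] = false;
            }
        }
        return isPrimeArray;
    }

    // Keeps the indexes whose flag is still true: those are the primes.
    private static int[] MovePrimes(this bool[] isPrimeArray) =>
        isPrimeArray
            .Select((p, i) => new { Index = i, IsPrime = p })
            .Where(v => v.IsPrime)
            .Select(v => v.Index)
            .ToArray();
}
```

The helper methods stay private, so only PrimesSmallerOrEqual shows up as an extension on int outside the class.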

The full code is at https://github.com/bsonnino/PrimeNumbers. Each commit there is a phase of the refactoring.

Sometimes, when we open an Explorer window on the main computer, we see red bars on some disks, telling us that the disk is almost full and that we need to do some cleanup. We run the system cleanup, which frees some space, but this isn’t enough to make things better.

So, we try to find the duplicate files on the disk to free some extra space, but we have a problem: where are the duplicate files? The first answer is to check for files with the same name and size, but that isn’t enough: files can be renamed and still be duplicates.

So, the best thing to do is to find a way to find and list all duplicates in the disk. But how can we do this?

The naive approach is to get all files with the same size and compare each one with all the others. But this is really cumbersome: if there are 100 files in the group, there will be 100!/(2!*98!) = 100*99/2 = 4950 comparisons, which means a complexity of O(n^2).

Another approach is to get a checksum of each file and compare the checksums. That way, you will still have the O(n^2) complexity, but you’ll have less data to compare (although you have to account for the time spent calculating the checksums). A third approach would be to use a dictionary to group the files with the same hash. A dictionary lookup has O(1) complexity, so the grouping as a whole is O(n).
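The dictionary approach can be sketched with in-memory data, before any file access comes into play. Everything here is illustrative: DuplicateFinder and its methods are hypothetical names, and the bitwise CRC-32 is only a slow, dependency-free stand-in for whatever checksum ends up being used:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper that groups named byte arrays by checksum.
public static class DuplicateFinder
{
    // Plain bitwise CRC-32 (reflected polynomial 0xEDB88320), used as a stand-in checksum.
    public static uint Crc32(byte[] data)
    {
        uint crc = 0xFFFFFFFF;
        foreach (var b in data)
        {
            crc ^= b;
            for (int k = 0; k < 8; k++)
                crc = (crc & 1) != 0 ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
        }
        return ~crc;
    }

    // One dictionary lookup per file: O(n) lookups overall instead of O(n^2) comparisons.
    public static Dictionary<uint, List<string>> GroupByChecksum(
        IEnumerable<(string Name, byte[] Content)> files)
    {
        var groups = new Dictionary<uint, List<string>>();
        foreach (var file in files)
        {
            var key = Crc32(file.Content);
            if (!groups.TryGetValue(key, out var names))
            {
                names = new List<string>();
                groups[key] = names;
            }
            names.Add(file.Name);
        }
        return groups;
    }

    // Only groups with more than one entry are duplicate candidates.
    public static List<List<string>> FindDuplicates(
        IEnumerable<(string Name, byte[] Content)> files) =>
        GroupByChecksum(files).Values.Where(g => g.Count > 1).ToList();
}
```

Keep in mind that equal checksums only make two files candidates for being duplicates; a byte-by-byte comparison is still needed if you want to be certain.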

Now, we only have to choose the checksum. Every checksum has a number of bits and, roughly, the larger the number of bits, the longer it takes to compute it. But a larger number of bits makes wrong results less likely: if you are using a CRC16 checksum (16 bits), you will have 65,536 combinations and the probability that two different files have the same checksum is very large. CRC32 allows 4,294,967,296 combinations and, thus, a wrong result is much less likely. You can use other algorithms, like MD5 (128 bits), SHA1 (160 bits) or SHA256 (256 bits), but computing these will take way longer than computing the 32 bits of CRC32. As we are not seeking huge accuracy, but speed, we’ll use the CRC32 algorithm to compute the hashes. A fast implementation of this algorithm is available, and you can use it by installing the Crc32C.NET NuGet package.
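To get a feeling for how the number of bits affects the chance of a false match, we can estimate it with the usual birthday approximation. This is a rough sketch that assumes checksums are uniformly distributed, which real files and real checksums only approximate; the class and method names are hypothetical:

```csharp
using System;

// Hypothetical helper for a back-of-the-envelope estimate.
public static class ChecksumCollisions
{
    // Approximate probability that at least two of fileCount different files
    // share the same checksum of the given width: 1 - exp(-pairs / 2^bits).
    public static double CollisionProbability(long fileCount, int bits)
    {
        double pairs = (double)fileCount * (fileCount - 1) / 2.0;
        double combinations = Math.Pow(2, bits);
        return 1.0 - Math.Exp(-pairs / combinations);
    }
}
```

For example, CollisionProbability(1000, 16) is very close to 1, while CollisionProbability(1000, 32) is on the order of one in ten thousand, which matches the intuition that 16 bits are far too few for this job.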

From there, we can create our program to find and list the duplicates in the disk. In Visual Studio, create a new WPF application. In the Solution Explorer, right-click the References node, select Manage NuGet Packages and install the WpfFolderBrowser and Crc32C.NET packages. Then add this code in MainWindow.xaml:

<Grid>
    <Grid.RowDefinitions>
        <RowDefinition Height="40"/>
        <RowDefinition Height="*"/>
    </Grid.RowDefinitions>
    <Button Width="85" Height="30" Content="Start" Click="StartClick"
                HorizontalAlignment="Right" Margin="5" Grid.Row="0"/>
    <Grid Grid.Row="1">
        <Grid.RowDefinitions>
            <RowDefinition Height="*"/>
            <RowDefinition Height="30"/>
        </Grid.RowDefinitions>
        <ScrollViewer HorizontalScrollBarVisibility="Disabled">
        <ItemsControl x:Name="FilesList" HorizontalContentAlignment="Stretch">
            <ItemsControl.ItemTemplate>
                <DataTemplate>
                    <Grid HorizontalAlignment="Stretch">
                        <Grid.RowDefinitions>
                            <RowDefinition Height="30" />
                            <RowDefinition Height="Auto" />
                        </Grid.RowDefinitions>
                        <TextBlock Text="{Binding Value[0].Length, StringFormat=N0}"
                                   Margin="5" FontWeight="Bold"/>
                        <TextBlock Text="{Binding Key, StringFormat=X}"
                                   Margin="5" FontWeight="Bold" HorizontalAlignment="Right"/>
                        <ItemsControl ItemsSource="{Binding Value}" Grid.Row="1" 
                                      HorizontalAlignment="Stretch"
                                      ScrollViewer.HorizontalScrollBarVisibility="Disabled"
                                      Background="Aquamarine">
                            <ItemsControl.ItemTemplate>
                                <DataTemplate>
                                    <TextBlock Text="{Binding FullName}" Margin="15,0"  />
                                </DataTemplate>
                            </ItemsControl.ItemTemplate>
                        </ItemsControl>
                    </Grid>
                </DataTemplate>
            </ItemsControl.ItemTemplate>
        </ItemsControl>
        </ScrollViewer>
        <StackPanel Grid.Row="1" Orientation="Horizontal">
            <TextBlock x:Name="TotalFilesText" Margin="5,0" VerticalAlignment="Center"/>
            <TextBlock x:Name="LengthFilesText" Margin="5,0" VerticalAlignment="Center"/>
        </StackPanel>
    </Grid>
</Grid>

In the button’s click event handler, we open a folder browser dialog and, if the user selects a folder, we process it, enumerating the files and finding the ones that have the same size. Then, we calculate the CRC32 for these files and add them to a dictionary, grouped by hash:

private async void StartClick(object sender, RoutedEventArgs e)
{
    var fbd = new WPFFolderBrowserDialog();
    if (fbd.ShowDialog() != true)
        return;
    FilesList.ItemsSource = null;
    var selectedPath = fbd.FileName;

    var files = await GetPossibleDuplicatesAsync(selectedPath);
    FilesList.ItemsSource = await GetRealDuplicatesAsync(files);
}

The GetPossibleDuplicatesAsync will enumerate the files and group them by size, returning only the groups that have more than one file:

private async Task<List<IGrouping<long, FileInfo>>> GetPossibleDuplicatesAsync(string selectedPath)
{
    // Enumerate and group on a background thread so the UI stays responsive
    return await Task.Run(() =>
        GetFilesInDirectory(selectedPath)
            .OrderByDescending(f => f.Length)
            .GroupBy(f => f.Length)
            .Where(g => g.Count() > 1)
            .ToList());
}

GetFilesInDirectory enumerates the files in the selected directory:

private List<FileInfo> GetFilesInDirectory(string directory)
{
    var files = new List<FileInfo>();
    try
    {
        // GetDirectories already returns the full path of each subdirectory
        var directories = Directory.GetDirectories(directory);
        try
        {
            var di = new DirectoryInfo(directory);
            files.AddRange(di.GetFiles("*"));
        }
        catch
        {
            // Skip the files we are not allowed to list
        }
        foreach (var dir in directories)
        {
            files.AddRange(GetFilesInDirectory(dir));
        }
    }
    catch
    {
        // Skip the directories we are not allowed to open
    }

    return files;
}

After we have the possible duplicates grouped by size, we can search for the real duplicates with GetRealDuplicatesAsync:

private static async Task<Dictionary<uint,List<FileInfo>>> GetRealDuplicatesAsync(
    List<IGrouping<long, FileInfo>> files)
{
    var dictFiles = new Dictionary<uint, List<FileInfo>>();
    await Task.Factory.StartNew(() =>
    {
        foreach (var file in files.SelectMany(g => g))
        {
            var hash = GetCrc32FromFile(file.FullName);
            if (hash == 0)
                continue;
            if (dictFiles.ContainsKey(hash))
                dictFiles[hash].Add(file);
            else
                dictFiles.Add(hash, new List<FileInfo>(new[] { file }));
        }
    });
    return dictFiles.Where(p => p.Value.Count > 1).ToDictionary(p => p.Key, p => p.Value);
}

The GetCrc32FromFile method will use the Crc32C library to compute the Crc32 hash of the file. Note that we can’t compute the hash in one pass, by reading the whole file, as this will fail with files larger than 2 GB. So, we read chunks of 10,000 bytes and process them, feeding the result of each chunk into the next one.

public static uint GetCrc32FromFile(string fileName)
{
    try
    {
        // Open for reading only, so read-only files can be processed too
        using (FileStream file = new FileStream(fileName, FileMode.Open, FileAccess.Read))
        {
            const int NumBytes = 10000;
            var bytes = new byte[NumBytes];
            var numRead = file.Read(bytes, 0, NumBytes);
            if (numRead == 0)
                return 0;
            var crc = Crc32CAlgorithm.Compute(bytes, 0, numRead);
            while (numRead > 0)
            {
                numRead = file.Read(bytes, 0, NumBytes);
                if (numRead > 0)
                    crc = Crc32CAlgorithm.Append(crc, bytes, 0, numRead);
            }
            return crc;
        }
    }
    catch (Exception ex) when (ex is UnauthorizedAccessException || ex is IOException)
    {
        return 0;
    }
}
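To see why processing the file in chunks gives the same result as processing it in one pass, here is a small sketch with a hand-written, bit-by-bit CRC-32C. It is not the library’s implementation (which is much faster), and the Crc32CSketch class and its methods are hypothetical names. The important detail is that each call returns a new value that must be passed to the next call.

```csharp
using System;

public static class Crc32CSketch
{
    // Reflected CRC-32C (Castagnoli) polynomial
    private const uint Polynomial = 0x82F63B78u;

    // Continues a CRC from a previous value; pass 0 to start a new one.
    public static uint Append(uint crc, byte[] data, int offset, int length)
    {
        crc = ~crc;   // restore the internal state from the previous result
        for (int i = offset; i < offset + length; i++)
        {
            crc ^= data[i];
            for (int bit = 0; bit < 8; bit++)
                crc = (crc & 1) != 0 ? (crc >> 1) ^ Polynomial : crc >> 1;
        }
        return ~crc;  // final inversion
    }

    // Processes the buffer in chunks, feeding each result into the next call.
    public static uint ComputeInChunks(byte[] data, int chunkSize)
    {
        uint crc = 0;
        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            int length = Math.Min(chunkSize, data.Length - offset);
            crc = Append(crc, data, offset, length);
        }
        return crc;
    }
}
```

For the ASCII bytes of "123456789", both the single call and the chunked version return 0xE3069283, the standard check value for CRC-32C.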

Now, when you run the app, you will get something like this:

You can then verify the files you want to remove and then go to Explorer and remove them. But there is one thing to do here: the time to compute the hash is very large, especially if you have a lot of data to process (large files, large number of files or both). Could it be improved?

This issue is somewhat complicated to solve. Fortunately, .NET provides us with an excellent tool to improve performance in this case: parallel programming. By making a small change in the code, you can calculate the CRC of the files in parallel, thus improving the performance. But there is a catch: we are using classes that are not thread safe. If you use the common Dictionary and List to store the data, you will end up with wrong results. But, once again, .NET comes to the rescue: it provides ConcurrentDictionary and ConcurrentBag to replace the common classes, so we can store the data in a thread-safe way. We can then change the code to this:

private static async Task<Dictionary<uint, List<FileInfo>>> GetRealDuplicatesAsync(
    List<IGrouping<long, FileInfo>> files)
{
    var dictFiles = new ConcurrentDictionary<uint, ConcurrentBag<FileInfo>>();
    await Task.Factory.StartNew(() =>
    {
        Parallel.ForEach(files.SelectMany(g => g), file =>
        {
            var hash = GetCrc32FromFile(file.FullName);
            if (hash != 0)
            {
                // GetOrAdd is atomic, so two threads can't lose a file by
                // racing between the check and the add
                dictFiles.GetOrAdd(hash, _ => new ConcurrentBag<FileInfo>()).Add(file);
            }
        });
    });
    return dictFiles.Where(p => p.Value.Count > 1)
        .OrderByDescending(p => p.Value.First().Length)
        .ToDictionary(p => p.Key, p => p.Value.ToList());
}

When we do that and run our program again, we see that more CPU is used for the processing and the time to get the list drops from 78 seconds to 46 seconds (for 18 GB of duplicate files).
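The thread-safe grouping step can be tried in isolation with plain numbers instead of files. This is a minimal sketch, assuming we group integers by their remainder; the ParallelGrouping class and GroupByRemainder method are hypothetical names. It uses ConcurrentDictionary.GetOrAdd, so getting or creating the bag is a single atomic step.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelGrouping
{
    public static Dictionary<int, List<int>> GroupByRemainder(
        IEnumerable<int> values, int divisor)
    {
        var groups = new ConcurrentDictionary<int, ConcurrentBag<int>>();
        Parallel.ForEach(values, value =>
        {
            // All threads asking for the same key receive the same bag
            groups.GetOrAdd(value % divisor, _ => new ConcurrentBag<int>())
                  .Add(value);
        });
        // ConcurrentBag has no defined order, so sort each group for the result
        return groups.ToDictionary(p => p.Key,
                                   p => p.Value.OrderBy(v => v).ToList());
    }
}
```

Grouping the numbers 0 to 999 by their remainder modulo 10 should always give 10 groups of 100 items, no matter how the work is split between the threads.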

Conclusions

With this program, we can show the largest duplicates in a folder and see what can be safely deleted from our disk, thus recovering some space (in our case, we could potentially recover an extra 9 GB). We’ve done some optimization in the code by parallelizing the calculations using the parallel extensions in .NET.

The source code for this article is in https://github.com/bsonnino/FindDuplicates

Sometimes, you need to parse some HTML data to do some processing and present it to the user. That may be a daunting task, as some pages can become very complex and difficult to parse.

For that, you can use an excellent tool, named HTML Agility Pack. With it, you can parse HTML from a string, a file, a web site or even from a WebBrowser: you can add a WebBrowser to your app, navigate to a URL and parse the data from there.

In this article, I’ll show how to make a query in Bing, retrieve and parse the response. For that, we need to create the query url and pass it to Bing. You may ask why I’m querying Bing and not Google – I’m doing that because Google makes it difficult to get its data, and I want to show you how to use HTML Agility Pack, and not how to retrieve data from Google :-). The query should be something like this:

https://www.bing.com/search?q=html+agility+pack&count=100

We will use the query (q) and the number of results (count) parameters. With them, we can create our program. We will create a WPF program that gets the query text, sends it to Bing, parses the response and presents the results in a ListBox.
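Building the query URL from these two parameters can be sketched like this. The BingQuery class and BuildUrl method are hypothetical names used only for this illustration; WebUtility.UrlEncode takes care of the spaces and of the special characters in the search text.

```csharp
using System.Net;

public static class BingQuery
{
    // Builds the search URL with the q and count parameters
    public static string BuildUrl(string searchText, int count)
    {
        // UrlEncode turns spaces into '+' and escapes characters like '&' and '#'
        var encoded = WebUtility.UrlEncode(searchText);
        return $"https://www.bing.com/search?q={encoded}&count={count}";
    }
}
```

Calling BuildUrl("html agility pack", 100) gives the same URL shown above.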

Create a new WPF program and name it BingSearch.

The next step is to add the HTML Agility Pack to the project. Right-click the References node in the Solution Explorer and select Manage NuGet Packages. Then add the Html Agility Pack to the project.

Then, in the main window, add this XAML code:

<Grid>
    <Grid.RowDefinitions>
        <RowDefinition Height="40"/>
        <RowDefinition Height="*"/>
    </Grid.RowDefinitions>
    <StackPanel Grid.Row="0" Orientation ="Horizontal" 
                Margin="5,0" VerticalAlignment="Center">
        <TextBlock Text="Search" VerticalAlignment="Center"/>
        <TextBox x:Name="TxtSearch" Width="300" Height="30" 
                 Margin="5,0" VerticalContentAlignment="Center"/>
    </StackPanel>
    <Button Grid.Row="0" HorizontalAlignment="Right" 
            Content="Search" Margin="5,0" VerticalAlignment="Center"
            Width="65" Height="30" Click="SearchClick"/>
    <ListBox Grid.Row="1" x:Name="LbxResults" />
</Grid>

Right click in the button’s click event handler in the XAML and press F12 to add the handler in code and go to it. Then, add this code to the handler:

private async void SearchClick(object sender, RoutedEventArgs e)
{
    if (string.IsNullOrWhiteSpace(TxtSearch.Text))
        return;
    var queryString = WebUtility.UrlEncode(TxtSearch.Text);
    var htmlWeb = new HtmlWeb();
    var query = $"https://bing.com/search?q={queryString}&count=100";
    var doc = await htmlWeb.LoadFromWebAsync(query);
    var response = doc.DocumentNode.SelectSingleNode("//ol[@id='b_results']");
    // response is null when the results list is missing from the page
    var results = response?.SelectNodes("//li[@class='b_algo']");
    if (results == null)
    {
        LbxResults.ItemsSource = null;
        return;
    }
    var searchResults = new List<SearchResult>();
    foreach (var result in results)
    {
        var refNode = result.Element("h2").Element("a");
        var url = refNode.Attributes["href"].Value;
        var text = refNode.InnerText;
        var description = WebUtility.HtmlDecode(
            result.Element("div").Element("p").InnerText);
        searchResults.Add(new SearchResult(text, url, description));
    }
    LbxResults.ItemsSource = searchResults;
}

Initially, we encode the search text and use it to create the query string. Then we call the LoadFromWebAsync method to load the HTML data from the query response. When the response comes, we get the response node, the ordered list with id b_results, and extract the individual results from it. Finally, we parse each result, add it to a list of SearchResult and assign the list to the items in the ListBox. Note that we can find the nodes using XPath, like in

var results = response.SelectNodes("//li[@class='b_algo']");

Or we can traverse the elements and get the text of the resulting node with something like:

var refNode = result.Element("h2").Element("a");
var url = refNode.Attributes["href"].Value;
var text = refNode.InnerText;
var description = WebUtility.HtmlDecode(
    result.Element("div").Element("p").InnerText);

SearchResult is declared as:

internal class SearchResult
{
    public string Text { get; }
    public string Url { get; }
    public string Description { get; }

    public SearchResult(string text, string url, string description)
    {
        Text = text;
        Url = url;
        Description = description;
    }
}

If you run the program, you will see something like this:

The data isn’t displayed because we haven’t defined any data template for the list items. You can define an item template like this in the XAML:

<ListBox.ItemTemplate>
    <DataTemplate>
        <StackPanel Margin="0,3">
            <TextBlock Text="{Binding Text}" FontWeight="Bold"/>
            <TextBlock >
              <Hyperlink NavigateUri="{Binding Url}" RequestNavigate="LinkNavigate">
                 <TextBlock Text="{Binding Url}"/>
              </Hyperlink>
            </TextBlock>
            <TextBlock Text="{Binding Description}" TextWrapping="Wrap"/>
        </StackPanel>
    </DataTemplate>
</ListBox.ItemTemplate>

The LinkNavigate event handler is:

private void LinkNavigate(object sender, RequestNavigateEventArgs e)
{
    System.Diagnostics.Process.Start(e.Uri.AbsoluteUri);
}

Now, when you run the program, you will get something like this:

You can click on the hyperlink and it will open a browser window with the selected page. We can even go further and add a WebBrowser to our app that will show the selected page when you click on an item. For that, you have to modify the XAML code with something like this:

<Grid>
    <Grid.RowDefinitions>
        <RowDefinition Height="40"/>
        <RowDefinition Height="*"/>
    </Grid.RowDefinitions>
    <Grid.ColumnDefinitions>
        <ColumnDefinition Width="*"/>
        <ColumnDefinition Width="*"/>
    </Grid.ColumnDefinitions>
    <StackPanel Grid.Row="0" Orientation ="Horizontal" 
                Margin="5,0" VerticalAlignment="Center">
        <TextBlock Text="Search" VerticalAlignment="Center"/>
        <TextBox x:Name="TxtSearch" Width="300" Height="30" 
                 Margin="5,0" VerticalContentAlignment="Center"/>
    </StackPanel>
    <Button Grid.Row="0" HorizontalAlignment="Right" 
            Content="Search" Margin="5,0" VerticalAlignment="Center"
            Width="65" Height="30" Click="SearchClick"/>
    <ListBox Grid.Row="1" x:Name="LbxResults" 
             ScrollViewer.HorizontalScrollBarVisibility="Disabled"
             SelectionChanged="LinkChanged">
        <ListBox.ItemTemplate>
            <DataTemplate>
                <StackPanel Margin="0,3">
                    <TextBlock Text="{Binding Text}" FontWeight="Bold"/>
                    <TextBlock >
                      <Hyperlink NavigateUri="{Binding Url}" RequestNavigate="LinkNavigate">
                         <TextBlock Text="{Binding Url}"/>
                      </Hyperlink>
                    </TextBlock>
                    <TextBlock Text="{Binding Description}" TextWrapping="Wrap"/>
                </StackPanel>
            </DataTemplate>
        </ListBox.ItemTemplate>
    </ListBox>
    <WebBrowser Grid.Column="1" Grid.RowSpan="2" x:Name="WebPage"  />
</Grid>

We’ve added a second column to the window and added a WebBrowser to it, then added a SelectionChanged event to the ListBox, so we can navigate to the selected page.

The SelectionChanged event handler is:

private void LinkChanged(object sender, SelectionChangedEventArgs e)
{
    if (e.AddedItems?.Count > 0)
    {
        WebPage.Navigate(((SearchResult)e.AddedItems[0]).Url);
    }
}

Now, when you run the app and click on a result, it will show the page in the WebBrowser. One thing that happens is that, sometimes, a JavaScript error pops up. To remove these errors, I used the solution obtained from here:

public MainWindow()
{
    InitializeComponent();
    WebPage.Navigated += (s, e) => SetSilent(WebPage, true);
}

public static void SetSilent(WebBrowser browser, bool silent)
{
    if (browser == null)
        throw new ArgumentNullException("browser");

    // get an IWebBrowser2 from the document
    IOleServiceProvider sp = browser.Document as IOleServiceProvider;
    if (sp != null)
    {
        Guid IID_IWebBrowserApp = new Guid("0002DF05-0000-0000-C000-000000000046");
        Guid IID_IWebBrowser2 = new Guid("D30C1661-CDAF-11d0-8A3E-00C04FC9E26E");

        object webBrowser;
        sp.QueryService(ref IID_IWebBrowserApp, ref IID_IWebBrowser2, out webBrowser);
        if (webBrowser != null)
        {
            webBrowser.GetType().InvokeMember("Silent", 
                BindingFlags.Instance | BindingFlags.Public | 
                BindingFlags.PutDispProperty, null, webBrowser, 
                new object[] { silent });
        }
    }
}


[ComImport, Guid("6D5140C1-7436-11CE-8034-00AA006009FA"), 
    InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
private interface IOleServiceProvider
{
    [PreserveSig]
    int QueryService([In] ref Guid guidService, [In] ref Guid riid, 
        [MarshalAs(UnmanagedType.IDispatch)] out object ppvObject);
}

With this code, the JavaScript errors disappear and when you run the app, you will see something like this:

As you can see, the HTML Agility Pack makes it easy to process and parse HTML Pages, allowing you to manipulate them the way you want.

The full source code for this article is in https://github.com/bsonnino/BingSearch

Some time ago I wrote a post about converting a WPF application to .NET Core. One thing that caught my attention in this Build 2019 talk was that the performance of file enumeration was enhanced in .NET Core apps. So I decided to check this with my own app and see what happens on my machine.

I added some measuring data in the app, so I could see what happens there:

private async void StartClick(object sender, RoutedEventArgs e)
{
    var fbd = new WPFFolderBrowserDialog();
    if (fbd.ShowDialog() != true)
        return;
    FilesList.ItemsSource = null;
    ExtList.ItemsSource = null;
    ExtSeries.ItemsSource = null;
    AbcList.ItemsSource = null;
    AbcSeries.ItemsSource = null;
    var selectedPath = fbd.FileName;
    Int64 minSize;
    if (!Int64.TryParse(MinSizeBox.Text, out minSize))
        return;
    List<FileInfo> files = null;
    var sw = new Stopwatch();
    var timeStr = "";
    await Task.Factory.StartNew(() =>
    {
       sw.Start();
       files = GetFilesInDirectory(selectedPath).ToList();
       timeStr = $" {sw.ElapsedMilliseconds} for enumeration";
       sw.Restart();
       files = files.Where(f => f.Length >= minSize)
         .OrderByDescending(f => f.Length)
         .ToList();
       timeStr += $" {sw.ElapsedMilliseconds} for ordering and filtering";
    });
    var totalSize = files.Sum(f => f.Length);
    TotalFilesText.Text = $"# Files: {files.Count}";
    LengthFilesText.Text = $"({totalSize:N0} bytes)";
    sw.Restart();
    FilesList.ItemsSource = files;
    var extensions = files.GroupBy(f => f.Extension)
        .Select(g => new { Extension = g.Key, Quantity = g.Count(), Size = g.Sum(f => f.Length) })
        .OrderByDescending(t => t.Size).ToList();
    ExtList.ItemsSource = extensions;
    ExtSeries.ItemsSource = extensions;
    var tmp = 0.0;
    var abcData = files.Select(f =>
    {
        tmp += f.Length;
        return new { f.Name, Percent = tmp / totalSize * 100 };
    }).ToList();
    AbcList.ItemsSource = abcData;
    AbcSeries.ItemsSource = abcData.OrderBy(d => d.Percent).Select((d, i) => new { Item = i, d.Percent });
    timeStr += $"  {sw.ElapsedMilliseconds} to fill data";
    TimesText.Text = timeStr;
}

That way, I could measure three things: the time to enumerate the files, the time to sort and filter them, and the time to assign the files to the lists. Then, I ran the two programs, to see what happened.

The machine I used is a virtual machine with a Core i5, 4 virtual processors and a virtualized hard disk, with 12,230 files (93.13 GB of data). The measurements may vary on your machine, but the differences should be comparable. To avoid bias, I ran each program 3 times (in Admin mode), then rebooted and ran the other one.

Here are the results I’ve got:

Platform    Run   Enumerate (ms)   Sort/Filter (ms)   Assign (ms)
.NET        1             137031                 96            43
.NET        2              58828                 56             9
.NET        3              59474                 55             8
.NET        Avg            85111                 69            20
.NET Core   1              91105                120            32
.NET Core   2              33422                 90            14
.NET Core   3              32907                 87            20
.NET Core   Avg            52478                 99            22

As you can see from the numbers, the .NET Core application greatly improved the file enumeration times, but it is still slower when sorting/filtering and assigning data to the UI lists. But that’s not bad for a platform still in preview!

If you do some performance testing, I’m curious to see what you get. You can put your results and comments in the Comments section.

One thing that I use a lot is sample data. Every article I write needs some data to explain the concepts, and I also need data to see how it fits in my designs, or sample data for testing. This is a real trouble, as I must find some reliable data for my programs. Sometimes I go to databases (Northwind and AdventureWorks are my good friends), sometimes I use JSON or XML data, and other times I create the sample data myself.

None of these is perfect, and they are not consistent: every time, I end up with a new way of accessing data (yes, it can be good for learning purposes, but it’s a nightmare for maintenance). Then, looking around, I found Bogus (https://github.com/bchavez/Bogus). It’s a simple data generator for C#. All you have to do is create rules for your data and generate it. Simple as that! Then, you’ve got the data to use in your programs. It can be fixed (every time you run your program, you have the same data) or variable (every time you get a different set of data), and once you’ve got it, you can serialize it to whichever data format you want: JSON files, databases, XML or plain text files.

Generating sample data

The first step to generate sample data is to create your classes. Create a new console app and add these two classes:

public class Customer
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public string Address { get; set; }
    public string City { get; set; }
    public string Country { get; set; }
    public string ZipCode { get; set; }
    public string Phone { get; set; }
    public string Email { get; set; }
    public string ContactName { get; set; }
    public IEnumerable<Order> Orders { get; set; }
}
public class Order
{
    public Guid Id { get; set; }
    public DateTime Date { get; set; }
    public Decimal OrderValue { get; set; }
    public bool Shipped { get; set; }
}

Once you’ve got the classes, you can add the repositories to get the sample data. To use the sample data generator, you must add the Bogus NuGet package to your project, with the command Install-Package Bogus, in the package manager console. Then we can add the repository class to retrieve the data. Add a new class to the project and name it SampleCustomerRepository. Then add this code in the class:

public IEnumerable<Customer> GetCustomers()
{
    Randomizer.Seed = new Random(123456);
    var ordergenerator = new Faker<Order>()
        .RuleFor(o => o.Id, Guid.NewGuid)
        .RuleFor(o => o.Date, f => f.Date.Past(3))
        .RuleFor(o => o.OrderValue, f => f.Finance.Amount(0, 10000))
        .RuleFor(o => o.Shipped, f => f.Random.Bool(0.9f));
    var customerGenerator = new Faker<Customer>()
        // Use a lambda so every customer gets its own Id (a plain value is reused for all)
        .RuleFor(c => c.Id, f => Guid.NewGuid())
        .RuleFor(c => c.Name, f => f.Company.CompanyName())
        .RuleFor(c => c.Address, f => f.Address.FullAddress())
        .RuleFor(c => c.City, f => f.Address.City())
        .RuleFor(c => c.Country, f => f.Address.Country())
        .RuleFor(c => c.ZipCode, f => f.Address.ZipCode())
        .RuleFor(c => c.Phone, f => f.Phone.PhoneNumber())
        .RuleFor(c => c.Email, f => f.Internet.Email())
        .RuleFor(c => c.ContactName, (f, c) => f.Name.FullName())
        .RuleFor(c => c.Orders, f => ordergenerator.Generate(f.Random.Number(10)).ToList());
    return customerGenerator.Generate(100);
}

In line 3, we have set Randomizer.Seed to a fixed seed, so the generated data is the same for all runs. If we don’t want that, we simply don’t set the seed. Then we set the rules for the order and customer generation. Finally, we call the Generate method to generate the sample data. Easy as that.
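The effect of a fixed seed can be shown without Bogus at all. The sketch below uses the plain System.Random class instead of the Bogus Randomizer, and the SeedDemo class and Generate method are hypothetical names: two generators created with the same seed produce the same sequence of values.

```csharp
using System;
using System.Linq;

public static class SeedDemo
{
    // Generates `count` pseudo-random order values from a fixed seed
    public static int[] Generate(int seed, int count)
    {
        var random = new Random(seed);
        return Enumerable.Range(0, count)
                         .Select(_ => random.Next(0, 10000))
                         .ToArray();
    }
}
```

Calling Generate(123456, 5) twice returns the same five numbers, which is what makes seeded sample data repeatable between runs.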

As you can see, the generator has a lot of classes to generate data. For example, the Company  class generates data for the company, like CompanyName. You can use this data as sample data for your programs. I can see some uses for it:

  • Test data for unit testing
  • Sample data for design purposes
  • Sample data for prototypes

but I’m sure you can find some more.

To use the data, you can add this code in the main program:

static void Main(string[] args)
{
    var repository = new SampleCustomerRepository();
    var customers = repository.GetCustomers();
    Console.WriteLine(JsonConvert.SerializeObject(customers, 
        Formatting.Indented));
}

We are serializing the data to JSON, so you must add the Newtonsoft.Json NuGet package to your project. When you run this project, you will see something like this:

As you can see, it generated a whole set of customers with their orders, so you can use them in your programs.

You may say that this is just dummy data, that it will clutter your project, and that it will remain unused when the program goes to production, and you’re right. But that’s not the way I would recommend using this kind of data. A better way is to create another project, a class library, with the repository. That solves one problem, the maintenance one. When going to production, you replace the sample dll with the real repository and you’re done. Or are you? Not so fast. If you look at the code in the main program, you will see that we are creating an instance of SampleCustomerRepository, which won’t exist anymore. That way, you must also change the code of the main program. Not a good solution.

You can use conditional compilation to instantiate the repository you want, like this:

static void Main(string[] args)
{
#if SAMPLEREPO
    var repository = new SampleCustomerRepository();
#else
    var repository = new CustomerRepository();
#endif
    var customers = repository.GetCustomers();
    Console.WriteLine(JsonConvert.SerializeObject(customers, 
        Formatting.Indented));
}

That’s better, but still not optimal: you need to compile the program twice, once to use the sample data and once to go to production. With automated builds that may be less of a problem, but if you need to deploy by yourself, it can be a nightmare.

The solution? Dependency injection. Just create an interface for the repository and use a dependency injection framework (I like to use Unity or Ninject, but there are many others out there, just check this list). That way, the code is completely detached from the data and you don’t need to recompile the project to use this or that data. Just add the correct dll and you are using the data you want. This is a nice approach, but the topic deserves another post, just wait for it!
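Just to make the idea a bit more concrete, here is a minimal sketch of constructor injection done by hand, with no framework involved. All the names (ICustomerNameRepository, SampleNameRepository, CustomerReport) are hypothetical and simplified to plain strings; a real project would use the Customer class and let the container choose the implementation.

```csharp
using System;
using System.Collections.Generic;

// The contract shared by the sample and the real repositories
public interface ICustomerNameRepository
{
    IEnumerable<string> GetCustomerNames();
}

// Stand-in for the sample repository (it would live in the sample data library)
public class SampleNameRepository : ICustomerNameRepository
{
    public IEnumerable<string> GetCustomerNames() =>
        new[] { "Sample Co", "Demo Ltd" };
}

// The consumer only knows the interface, never a concrete repository
public class CustomerReport
{
    private readonly ICustomerNameRepository _repository;

    public CustomerReport(ICustomerNameRepository repository)
    {
        _repository = repository ?? throw new ArgumentNullException(nameof(repository));
    }

    public string Build() => string.Join(", ", _repository.GetCustomerNames());
}
```

With this in place, new CustomerReport(new SampleNameRepository()).Build() returns "Sample Co, Demo Ltd", and CustomerReport needs no change when a different repository is passed in.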

The code for this article is at https://github.com/bsonnino/SampleData