Sometimes you need to parse HTML data, process it, and present it to the user. That can be a daunting task, because some pages are very complex.

For that, you can use an excellent tool named HTML Agility Pack. With it, you can parse HTML from a string, a file, a web site, or even a WebBrowser control: you can add a WebBrowser to your app, navigate to a URL, and parse the data from there.
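As a rough sketch of the first three sources (string, file, and web site), the helpers below wrap the corresponding HTML Agility Pack calls. The class name HtmlSources and its method names are made up for this illustration; only LoadHtml, Load, and HtmlWeb.Load come from the library. The WebBrowser case is left out here.

```csharp
using HtmlAgilityPack;

// Hypothetical helper class, used only for illustration.
public static class HtmlSources
{
    // Parse HTML that is already in memory.
    public static HtmlDocument FromString(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }

    // Parse HTML stored in a file on disk.
    public static HtmlDocument FromFile(string path)
    {
        var doc = new HtmlDocument();
        doc.Load(path);
        return doc;
    }

    // Download and parse a page (blocking call, needs network access).
    public static HtmlDocument FromWeb(string url)
    {
        return new HtmlWeb().Load(url);
    }
}
```

In all three cases the result is an HtmlDocument, so the parsing code that follows does not need to know where the HTML came from.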

In this article, I’ll show how to query Bing, then retrieve and parse the response. For that, we need to create the query URL and pass it to Bing. You may ask why I’m querying Bing and not Google. I’m doing that because Google makes it difficult to get its data, and I want to show you how to use HTML Agility Pack, not how to retrieve data from Google :-). The query should be something like this:

https://www.bing.com/search?q=html+agility+pack&count=100

We will use the query (q) and number of results (count) parameters. With them, we can create our program: a WPF app that takes the query text, sends it to Bing, parses the response, and presents the results in a ListBox.
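The query URL described above could be built with a small helper like the one below. This is only a sketch: the class name BingQuery and the method BuildUrl are invented for the example, and the default of 100 results simply mirrors the sample URL.

```csharp
using System.Net;

// Hypothetical helper class, used only for illustration.
public static class BingQuery
{
    // Builds the search URL from the raw user text and a result count.
    public static string BuildUrl(string searchText, int count = 100)
    {
        // UrlEncode turns spaces into '+' and escapes reserved characters.
        var encoded = WebUtility.UrlEncode(searchText);
        return $"https://www.bing.com/search?q={encoded}&count={count}";
    }
}
```

Calling BingQuery.BuildUrl("html agility pack") yields the sample URL shown earlier.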

Create a new WPF program and name it BingSearch.

The next step is to add the HTML Agility Pack to the project. Right-click the References node in the Solution Explorer and select Manage NuGet Packages. Then add the Html Agility Pack to the project.

Then, in the main window, add this XAML code:

<Grid>
    <Grid.RowDefinitions>
        <RowDefinition Height="40"/>
        <RowDefinition Height="*"/>
    </Grid.RowDefinitions>
    <StackPanel Grid.Row="0" Orientation ="Horizontal" 
                Margin="5,0" VerticalAlignment="Center">
        <TextBlock Text="Search" VerticalAlignment="Center"/>
        <TextBox x:Name="TxtSearch" Width="300" Height="30" 
                 Margin="5,0" VerticalContentAlignment="Center"/>
    </StackPanel>
    <Button Grid.Row="0" HorizontalAlignment="Right" 
            Content="Search" Margin="5,0" VerticalAlignment="Center"
            Width="65" Height="30" Click="SearchClick"/>
    <ListBox Grid.Row="1" x:Name="LbxResults" />
</Grid>

Click the button’s Click handler name (SearchClick) in the XAML and press F12 to add the handler in the code-behind and go to it. Then, add this code to the handler:

// Requires: using System.Collections.Generic; using System.Net; using HtmlAgilityPack;
private async void SearchClick(object sender, RoutedEventArgs e)
{
    if (string.IsNullOrWhiteSpace(TxtSearch.Text))
        return;
    var queryString = WebUtility.UrlEncode(TxtSearch.Text);
    var htmlWeb = new HtmlWeb();
    var query = $"https://www.bing.com/search?q={queryString}&count=100";
    var doc = await htmlWeb.LoadFromWebAsync(query);
    // SelectSingleNode and SelectNodes return null when nothing matches
    var response = doc.DocumentNode.SelectSingleNode("//ol[@id='b_results']");
    // ".//" limits the search to the response node; a leading "//" would
    // search the whole document again
    var results = response?.SelectNodes(".//li[@class='b_algo']");
    if (results == null)
    {
        LbxResults.ItemsSource = null;
        return;
    }
    var searchResults = new List<SearchResult>();
    foreach (var result in results)
    {
        var refNode = result.Element("h2")?.Element("a");
        if (refNode == null)
            continue; // skip items that have no title link
        var url = refNode.GetAttributeValue("href", string.Empty);
        var text = WebUtility.HtmlDecode(refNode.InnerText);
        var description = WebUtility.HtmlDecode(
            result.Element("div")?.Element("p")?.InnerText ?? string.Empty);
        searchResults.Add(new SearchResult(text, url, description));
    }
    LbxResults.ItemsSource = searchResults;
}

First, we URL-encode the search text and build the query URL. Then we call the LoadFromWebAsync method to load the HTML of the query response. When the response arrives, we get the response node (the ordered list with id b_results) and extract the individual results from it. Finally, we parse each result, add it to a list of SearchResult, and assign the list as the ItemsSource of the ListBox. Note that we can find the nodes using XPath, like in

var results = response.SelectNodes(".//li[@class='b_algo']");

Or we can traverse the elements and get the text of the resulting node with something like:

var refNode = result.Element("h2").Element("a");
var url = refNode.Attributes["href"].Value;
var text = refNode.InnerText;
var description = WebUtility.HtmlDecode(
    result.Element("div").Element("p").InnerText);
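To try both approaches without hitting the network, the same logic can be run against an HTML string. The sketch below assumes markup shaped like the Bing results described above (an ol with id b_results containing li items with class b_algo). The Hit and ResultParser names are invented for this example.

```csharp
using System.Collections.Generic;
using System.Net;
using HtmlAgilityPack;

// Hypothetical result type, used only for illustration.
public sealed class Hit
{
    public string Title { get; set; }
    public string Url { get; set; }
    public string Description { get; set; }
}

// Hypothetical parser class, used only for illustration.
public static class ResultParser
{
    public static List<Hit> Parse(string html)
    {
        var hits = new List<Hit>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath: find the results list anywhere in the document.
        var list = doc.DocumentNode.SelectSingleNode("//ol[@id='b_results']");
        if (list == null)
            return hits;

        // ".//" keeps the search inside 'list'.
        var nodes = list.SelectNodes(".//li[@class='b_algo']");
        if (nodes == null) // SelectNodes returns null when nothing matches
            return hits;

        foreach (var node in nodes)
        {
            // Traversal: walk direct children instead of using XPath.
            var link = node.Element("h2")?.Element("a");
            if (link == null)
                continue;
            var paragraph = node.Element("div")?.Element("p");
            hits.Add(new Hit
            {
                Title = WebUtility.HtmlDecode(link.InnerText),
                Url = link.GetAttributeValue("href", ""),
                Description = WebUtility.HtmlDecode(paragraph?.InnerText ?? "")
            });
        }
        return hits;
    }
}
```

Element returns only a direct child with the given name, so the traversal breaks if the page nests the link or the description one level deeper. An XPath query such as ".//h2/a" is more tolerant of that kind of change.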

SearchResult is declared as:

internal class SearchResult
{
    public string Text { get; }
    public string Url { get; }
    public string Description { get; }

    public SearchResult(string text, string url, string description)
    {
        Text = text;
        Url = url;
        Description = description;
    }
}

If you run the program, you will see something like this:

The data isn’t displayed because we haven’t defined a data template for the list items, so each item shows only its type name. You can define an item template like this in the XAML:

<ListBox.ItemTemplate>
    <DataTemplate>
        <StackPanel Margin="0,3">
            <TextBlock Text="{Binding Text}" FontWeight="Bold"/>
            <TextBlock >
              <Hyperlink NavigateUri="{Binding Url}" RequestNavigate="LinkNavigate">
                 <TextBlock Text="{Binding Url}"/>
              </Hyperlink>
            </TextBlock>
            <TextBlock Text="{Binding Description}" TextWrapping="Wrap"/>
        </StackPanel>
    </DataTemplate>
</ListBox.ItemTemplate>

The LinkNavigate event handler is:

private void LinkNavigate(object sender, RequestNavigateEventArgs e)
{
    System.Diagnostics.Process.Start(e.Uri.AbsoluteUri);
}

Now, when you run the program, you will get something like this:

You can click on the hyperlink and it will open a browser window with the selected page. We can even go further and add a WebBrowser to our app that will show the selected page when you click on an item. For that, you have to modify the XAML code with something like this:

<Grid>
    <Grid.RowDefinitions>
        <RowDefinition Height="40"/>
        <RowDefinition Height="*"/>
    </Grid.RowDefinitions>
    <Grid.ColumnDefinitions>
        <ColumnDefinition Width="*"/>
        <ColumnDefinition Width="*"/>
    </Grid.ColumnDefinitions>
    <StackPanel Grid.Row="0" Orientation ="Horizontal" 
                Margin="5,0" VerticalAlignment="Center">
        <TextBlock Text="Search" VerticalAlignment="Center"/>
        <TextBox x:Name="TxtSearch" Width="300" Height="30" 
                 Margin="5,0" VerticalContentAlignment="Center"/>
    </StackPanel>
    <Button Grid.Row="0" HorizontalAlignment="Right" 
            Content="Search" Margin="5,0" VerticalAlignment="Center"
            Width="65" Height="30" Click="SearchClick"/>
    <ListBox Grid.Row="1" x:Name="LbxResults" 
             ScrollViewer.HorizontalScrollBarVisibility="Disabled"
             SelectionChanged="LinkChanged">
        <ListBox.ItemTemplate>
            <DataTemplate>
                <StackPanel Margin="0,3">
                    <TextBlock Text="{Binding Text}" FontWeight="Bold"/>
                    <TextBlock >
                      <Hyperlink NavigateUri="{Binding Url}" RequestNavigate="LinkNavigate">
                         <TextBlock Text="{Binding Url}"/>
                      </Hyperlink>
                    </TextBlock>
                    <TextBlock Text="{Binding Description}" TextWrapping="Wrap"/>
                </StackPanel>
            </DataTemplate>
        </ListBox.ItemTemplate>
    </ListBox>
    <WebBrowser Grid.Column="1" Grid.RowSpan="2" x:Name="WebPage"  />
</Grid>

We’ve added a second column to the window and placed a WebBrowser in it, then added a SelectionChanged event handler to the ListBox, so we can navigate to the selected page.

The SelectionChanged event handler is:

private void LinkChanged(object sender, SelectionChangedEventArgs e)
{
    if (e.AddedItems?.Count > 0)
    {
        WebPage.Navigate(((SearchResult)e.AddedItems[0]).Url);
    }
}
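The handler above passes the scraped Url straight to Navigate. Since the value comes from a web page, it may be worth checking it first. The sketch below is one possible guard, not part of the original app; the UrlCheck class and TryGetHttpUri method are names invented for this example.

```csharp
using System;

// Hypothetical helper class, used only for illustration.
public static class UrlCheck
{
    // Returns true (and the parsed Uri) only for absolute http/https addresses.
    public static bool TryGetHttpUri(string url, out Uri uri)
    {
        if (Uri.TryCreate(url, UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps))
        {
            return true;
        }
        uri = null; // do not hand back a Uri with an unexpected scheme
        return false;
    }
}
```

Inside LinkChanged, the call could then become: if (UrlCheck.TryGetHttpUri(item.Url, out var uri)) WebPage.Navigate(uri); where item is the selected SearchResult.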

Now, when you run the app and click on a result, it will show the page in the WebBrowser. One problem is that a JavaScript error sometimes pops up. To suppress these errors, I used the following solution, which I found online:

public MainWindow()
{
    InitializeComponent();
    WebPage.Navigated += (s, e) => SetSilent(WebPage, true);
}

public static void SetSilent(WebBrowser browser, bool silent)
{
    if (browser == null)
        throw new ArgumentNullException("browser");

    // get an IWebBrowser2 from the document
    IOleServiceProvider sp = browser.Document as IOleServiceProvider;
    if (sp != null)
    {
        Guid IID_IWebBrowserApp = new Guid("0002DF05-0000-0000-C000-000000000046");
        Guid IID_IWebBrowser2 = new Guid("D30C1661-CDAF-11d0-8A3E-00C04FC9E26E");

        object webBrowser;
        sp.QueryService(ref IID_IWebBrowserApp, ref IID_IWebBrowser2, out webBrowser);
        if (webBrowser != null)
        {
            webBrowser.GetType().InvokeMember("Silent", 
                BindingFlags.Instance | BindingFlags.Public | 
                BindingFlags.PutDispProperty, null, webBrowser, 
                new object[] { silent });
        }
    }
}


[ComImport, Guid("6D5140C1-7436-11CE-8034-00AA006009FA"), 
    InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
private interface IOleServiceProvider
{
    [PreserveSig]
    int QueryService([In] ref Guid guidService, [In] ref Guid riid, 
        [MarshalAs(UnmanagedType.IDispatch)] out object ppvObject);
}

With this code, the JavaScript errors disappear and when you run the app, you will see something like this:

As you can see, the HTML Agility Pack makes it easy to process and parse HTML pages, allowing you to manipulate them the way you want.

The full source code for this article is in https://github.com/bsonnino/BingSearch

Some time ago I wrote a post about converting a WPF application to .NET Core. One thing that caught my attention in a Build 2019 talk was that file enumeration performance had improved in .NET Core apps. So I decided to check this with my own app and see what happens on my machine.

I added some timing code to the app, so I could see what happens there:

private async void StartClick(object sender, RoutedEventArgs e)
{
    var fbd = new WPFFolderBrowserDialog();
    if (fbd.ShowDialog() != true)
        return;
    FilesList.ItemsSource = null;
    ExtList.ItemsSource = null;
    ExtSeries.ItemsSource = null;
    AbcList.ItemsSource = null;
    AbcSeries.ItemsSource = null;
    var selectedPath = fbd.FileName;
    Int64 minSize;
    if (!Int64.TryParse(MinSizeBox.Text, out minSize))
        return;
    List<FileInfo> files = null;
    var sw = new Stopwatch();
    var timeStr = "";
    await Task.Factory.StartNew(() =>
    {
       sw.Start();
       files = GetFilesInDirectory(selectedPath).ToList();
       timeStr = $" {sw.ElapsedMilliseconds} for enumeration";
       sw.Restart();
       files = files.Where(f => f.Length >= minSize)
         .OrderByDescending(f => f.Length)
         .ToList();
       timeStr += $" {sw.ElapsedMilliseconds} for ordering and filtering";
    });
    var totalSize = files.Sum(f => f.Length);
    TotalFilesText.Text = $"# Files: {files.Count}";
    LengthFilesText.Text = $"({totalSize:N0} bytes)";
    sw.Restart();
    FilesList.ItemsSource = files;
    var extensions = files.GroupBy(f => f.Extension)
        .Select(g => new { Extension = g.Key, Quantity = g.Count(), Size = g.Sum(f => f.Length) })
        .OrderByDescending(t => t.Size).ToList();
    ExtList.ItemsSource = extensions;
    ExtSeries.ItemsSource = extensions;
    var tmp = 0.0;
    var abcData = files.Select(f =>
    {
        tmp += f.Length;
        return new { f.Name, Percent = tmp / totalSize * 100 };
    }).ToList();
    AbcList.ItemsSource = abcData;
    AbcSeries.ItemsSource = abcData.OrderBy(d => d.Percent).Select((d, i) => new { Item = i, d.Percent });
    timeStr += $"  {sw.ElapsedMilliseconds} to fill data";
    TimesText.Text = timeStr;
}

That way, I could measure three things: the time to enumerate the files, the time to sort and filter them, and the time to assign the data to the lists. Then I ran the two programs to see what happened.

I ran the tests on a virtual machine with a Core i5, 4 virtual processors, and a virtualized hard disk, with 12,230 files (93.13 GB of data). The measurements may vary on your machine, but the differences should be comparable. To avoid bias, I ran each program 3 times (in Admin mode), then rebooted and ran the other one.

Here are the results I got:
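The measuring pattern used in the handler (start a Stopwatch, do the work, read ElapsedMilliseconds) can be pulled into a small helper. The sketch below is an illustration only; the Timing class and Measure method are names I made up, and they are not part of the app.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper class, used only for illustration.
public static class Timing
{
    // Runs 'work' once and returns its result plus the elapsed milliseconds.
    public static T Measure<T>(Func<T> work, out long elapsedMs)
    {
        var sw = Stopwatch.StartNew();
        var result = work();
        sw.Stop();
        elapsedMs = sw.ElapsedMilliseconds;
        return result;
    }
}
```

With it, the enumeration step could be written as var files = Timing.Measure(() => GetFilesInDirectory(selectedPath).ToList(), out var enumMs);. The ToList call has to stay inside the measured lambda: if GetFilesInDirectory returns a lazy sequence, the actual disk access happens only when the sequence is consumed.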

Platform    Run   Enumerate (ms)   Sort/Filter (ms)   Assign (ms)
.NET        1             137031                 96            43
.NET        2              58828                 56             9
.NET        3              59474                 55             8
.NET        Avg            85111                 69            20
.NET Core   1              91105                120            32
.NET Core   2              33422                 90            14
.NET Core   3              32907                 87            20
.NET Core   Avg            52478                 99            22

As you can see from the numbers, the .NET Core application cut the file enumeration time considerably, but it still lags behind when sorting/filtering, and it is about the same when assigning data to the UI lists. That’s not bad for a platform still in preview!

If you do some performance testing of your own, I’m curious to see what you get. You can put your results and comments in the Comments section.
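To put a number on the difference, the averages and the relative change can be recomputed from the three runs of each program. The sketch below uses only the enumeration times listed above; the BenchmarkMath class and its methods are names invented for this example.

```csharp
using System.Linq;

// Hypothetical helper class, used only for illustration.
public static class BenchmarkMath
{
    // Arithmetic mean of the run times, in milliseconds.
    public static double Average(params long[] runsMs)
    {
        return runsMs.Average();
    }

    // Percentage by which 'candidate' is lower than 'baseline'.
    public static double PercentReduction(double baseline, double candidate)
    {
        return (baseline - candidate) / baseline * 100.0;
    }
}
```

Average(137031, 58828, 59474) gives 85111 ms for .NET and Average(91105, 33422, 32907) gives 52478 ms for .NET Core, so PercentReduction(85111, 52478) works out to a reduction of about 38% in enumeration time. The first run of each program is much slower than the other two, so the averages are sensitive to that first run.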