Deborah's Developer MindScape

         Tips and Techniques for Web and .NET developers.

August 17, 2009

Counting Words in a String Using Anonymous Types

Filed under: C#,Lambda Expressions,LINQ,VB.NET @ 1:10 pm

This may be more of a homework assignment for a programming class than something you would do in your applications, but it is a good example of using anonymous types, which are new in .NET 3.5 (C# 3.0 and Visual Basic 9) in both VB and C#.

[To begin with an overview of anonymous types, start here.]

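NOTE: Be sure to import the System.Text.RegularExpressions namespace (a using directive in C#, an Imports statement in VB). The Regex class lives in System.dll, which projects reference by default. The C# code also needs using directives for System.Linq and System.Diagnostics.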

In C#:

using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

// Sample string
string sampleText  = @"That that is, is. 
                       That that is not, it not. 
                       Is that it? It is.";

// Convert to lower case and collapse each run of white space
// (spaces, tabs, line breaks) into a single space
sampleText = sampleText.ToLower();
sampleText = Regex.Replace(sampleText, @"\s+", " ");

// Split into words, dropping the empty entries left by adjacent separators
string[] separators  = new string[4] {" ", ".", ",", "?"};
string[] wordArray = sampleText.Split(separators,
                         StringSplitOptions.RemoveEmptyEntries);

// Sort the result
Array.Sort(wordArray);

// Using an anonymous type
var query = from string w in wordArray.Distinct()
            select new {Word = w,
                        Count = wordArray.Count(wordToCount => wordToCount == w)};
foreach (var item in query)
    Debug.WriteLine(item.Word + ": " + item.Count);

In VB:

NOTE: For the VB code, be sure to also set a reference to System.Xml, System.Xml.Linq, and System.Core.

Imports System.Text.RegularExpressions

' Sample string
Dim sampleText As String = <string>That that is, is.
                            That that is not, it not.
                            Is that it? It is.</string>.Value

' Convert to lower case and collapse each run of white space
' (spaces, tabs, line breaks) into a single space
sampleText = sampleText.ToLower
sampleText = Regex.Replace(sampleText, "\s+", " ")

' Split into words, dropping the empty entries left by adjacent separators
Dim separators() As String = {" ", ".", ",", "?"}
Dim wordArray() As String = sampleText.Split(separators, _
                                StringSplitOptions.RemoveEmptyEntries)

' Sort the result
Array.Sort(wordArray)

' Using an anonymous type
Dim query = From w As String In wordArray.Distinct _
            Select New With {.Word = w, _
                             .Count = wordArray.Count(Function(wordToCount) wordToCount = w)}
For Each item In query
    Debug.WriteLine(item.Word & ": " & item.Count)
Next

This code first builds a sample string. (Anyone recognize what movie this string came from?)

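The C# code uses a verbatim string literal (prefixed with @), so the string can span several lines and backslashes are not treated as escape characters. VB in .NET 3.5 has no multi-line string literal. The VB code therefore builds the sample string with the XML literals feature, which is also new in .NET 3.5.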
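
The following sketch shows what the @ prefix changes. It is written as a C# 9+ top-level program, and the file path is only an illustrative value.

```csharp
using System;

// Regular literal: \\ is an escape sequence that produces one backslash.
string regular = "C:\\Temp\\words.txt";

// Verbatim literal: backslashes are kept as-is, and "" produces one quote character.
string verbatim = @"C:\Temp\words.txt";
string quoted = @"He said ""is that it?""";

// A verbatim literal may also span lines; the line break becomes part of the string.
string multiLine = @"That that is, is.
That that is not, it not.";

Console.WriteLine(regular == verbatim);            // True
Console.WriteLine(quoted);                         // He said "is that it?"
Console.WriteLine(multiLine.Contains("\n"));       // True
```

Whether the embedded line break is "\n" or "\r\n" depends on the line endings of the source file. Both contain "\n".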

The code converts the string to lower case so that the word count treats “That” and “that” as the same word. It then replaces each run of white-space characters (spaces, tabs, line breaks) with a single space.
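
Here is a minimal sketch of those two normalization steps on a short made-up string. It is a C# 9+ top-level program.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "That that is,   is.\r\n\tIs that it?";

// Lower-case everything so "That" and "that" count as the same word.
string lowered = text.ToLower();

// \s+ matches any run of white-space characters (spaces, tabs, CR, LF);
// each run is replaced by exactly one space.
string normalized = Regex.Replace(lowered, @"\s+", " ");

Console.WriteLine(normalized);   // that that is, is. is that it?
```

ToLower uses the current culture. ToLowerInvariant() is an alternative if culture-specific casing rules are a concern.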

It uses the string Split method to convert the string to an array of words and then sorts the words. If your string includes other punctuation marks, you will need to add them to the separators array.
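
The Split call depends on StringSplitOptions. This sketch (a C# 9+ top-level program on an already-normalized sample) compares the result with and without RemoveEmptyEntries.

```csharp
using System;

string text = "that that is, is. is that it?";
string[] separators = { " ", ".", ",", "?" };

// Without RemoveEmptyEntries, adjacent separators such as ", " and ". ",
// and the trailing "?", each leave an empty string in the result.
string[] withEmpties = text.Split(separators, StringSplitOptions.None);

// With RemoveEmptyEntries, only the words themselves remain.
string[] words = text.Split(separators, StringSplitOptions.RemoveEmptyEntries);

// Sorting places identical words next to each other.
Array.Sort(words);

Console.WriteLine(withEmpties.Length);          // 10
Console.WriteLine(string.Join("|", words));     // is|is|is|it|that|that|that
```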

The code uses LINQ to find the unique set of words and their counts. The Distinct method returns each word only once, so the query produces one result per unique word instead of one result per occurrence.
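
This small sketch (a C# 9+ top-level program) shows Distinct on a sorted word array. The documentation describes Distinct's result as an unordered sequence. In current .NET implementations it yields first occurrences in source order, so sorted input comes out sorted. If order matters, an explicit OrderBy is the safer choice.

```csharp
using System;
using System.Linq;

string[] wordArray = { "is", "is", "is", "it", "that", "that", "that" };

// Distinct drops repeated values, so each word is processed only once.
string[] unique = wordArray.Distinct().ToArray();

Console.WriteLine(string.Join(", ", unique));   // is, it, that
```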

The select new syntax (Select New With in VB) defines an anonymous type composed of the word itself and its count. You can define any properties you need on an anonymous type by listing them within the { }, separated by commas. In this example, two properties are defined: Word and Count. The Word property is the unique word. The Count property is the number of times that word occurs in the array. It is computed with a lambda expression that counts the array elements equal to the word.
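
Here is a short sketch, written as a C# 9+ top-level program on a hard-coded word array. It shows the same anonymous-type projection and some members the C# compiler generates for anonymous types: read-only properties, a ToString override, and value-based Equals.

```csharp
using System;
using System.Linq;

string[] wordArray = { "is", "is", "is", "it", "that", "that", "that" };

// Each element is an instance of a compiler-generated (anonymous) type
// with two read-only properties: Word (string) and Count (int).
var query = from w in wordArray.Distinct()
            select new
            {
                Word = w,
                // Lambda: count the array elements equal to the current word.
                Count = wordArray.Count(wordToCount => wordToCount == w)
            };

foreach (var item in query)
    Console.WriteLine(item.Word + ": " + item.Count);

// ToString lists the property values; Equals compares them.
var first = query.First();
Console.WriteLine(first);                                          // { Word = is, Count = 3 }
Console.WriteLine(first.Equals(new { Word = "is", Count = 3 }));   // True
```

Two anonymous-type expressions with the same property names, types, and order in the same assembly share one generated type. That is why the Equals comparison above succeeds.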

Each instance of the anonymous type is then written to the Debug window, producing:

is: 5
it: 3
not: 2
that: 5

This lists the word and the number of times it occurs in the string.
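
The Count lambda in the query above scans the whole word array once for every distinct word. For a short sample that does not matter. One possible alternative, sketched here and not the approach used in this post, lets a group clause do the counting in a single pass while still projecting into the same anonymous type. It is a C# 9+ top-level program that prints to the console rather than the Debug window.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string sampleText = @"That that is, is.
                       That that is not, it not.
                       Is that it? It is.";

// Same normalization and splitting as in the post.
sampleText = Regex.Replace(sampleText.ToLower(), @"\s+", " ");
string[] separators = { " ", ".", ",", "?" };
string[] wordArray = sampleText.Split(separators, StringSplitOptions.RemoveEmptyEntries);

// Group identical words: each group's Key is the word, and its size is the count.
var query = from w in wordArray
            group w by w into g
            orderby g.Key
            select new { Word = g.Key, Count = g.Count() };

foreach (var item in query)
    Console.WriteLine(item.Word + ": " + item.Count);
```

This prints the same four lines as the original query. The orderby clause replaces the separate Array.Sort call.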


P.S. (Edited 8/19/09) Though it does not demonstrate anonymous types, Eric Smith provided a *very* concise technique for counting words in a string using regular expressions and lambda expressions (see Comments below). I updated the code slightly to include the OrderBy and I provided the VB version of the code:

In C#:

foreach (var g in Regex.Matches(sampleText.ToLower(), @"\w+")
                    .Cast<Match>()   // MatchCollection only implements the non-generic IEnumerable in .NET 3.5
                    .GroupBy(m => m.Value)
                    .OrderBy(m => m.Key))
    Debug.WriteLine(g.Key + ": " + g.Count());

In VB:

For Each g In Regex.Matches(sampleText.ToLower(), "\w+") _
                .Cast(Of Match)() _
                .GroupBy(Function(m) m.Value) _
                .OrderBy(Function(m) m.Key)
    Debug.WriteLine(g.Key & ": " & g.Count())
Next
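
The regular-expression version behaves differently from the separators array in one way. \w matches letters, digits, and the underscore, so every other character ends a word, including apostrophes. A small sketch with a made-up sentence (a C# 9+ top-level program) shows the effect.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "It's 2009, isn't it?";

// Each \w+ match is a maximal run of letters, digits, or underscores.
var counts = Regex.Matches(text.ToLower(), @"\w+")
                  .Cast<Match>()              // MatchCollection -> IEnumerable<Match>
                  .GroupBy(m => m.Value)
                  .OrderBy(g => g.Key)
                  .ToList();

foreach (var g in counts)
    Console.WriteLine(g.Key + ": " + g.Count());
```

Here "it's" becomes "it" and "s", and "isn't" becomes "isn" and "t". Contractions would need a pattern such as [\w']+ if they should stay whole.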


  1.   Eric Smith – August 18, 2009 @ 1:30 pm

    Or simply:

    foreach (var g in Regex.Matches(sampleText.ToLower(), @"\w+")
        .GroupBy(m => m.Value))
        Debug.WriteLine(g.Key + ": " + g.Count());

    Sorry, no VB.NET.

  2.   Joacim – August 23, 2009 @ 7:07 pm

    The phrase in the sample string comes from the movie “Flowers for Algernon”.

  3.   DeborahK – August 25, 2009 @ 2:19 pm

    Hi Eric –

    Thanks for the suggestion. See the update at the end of my blog. I added the OrderBy and the associated VB code.

    Thanks again!

  4.   DeborahK – August 25, 2009 @ 2:20 pm

    Hi Joacim –

    You are correct!

    Thanks for visiting my blog!

  5.   Bill Hogsett – December 26, 2009 @ 12:18 pm

    I made a couple of refinements to Eric Smith’s concise code.

    This sorts the list by the number of times each word is used (most frequent first) and then alphabetically:

    For Each g In Regex.Matches(ioLines.ToLower(), "\w+") _
                   .Cast(Of Match)() _
                   .GroupBy(Function(m) m.Value) _
                   .OrderByDescending(Function(m) m.Count).ThenBy(Function(m) m.Key)
        aResult = g.Key & ControlChars.Tab & ControlChars.Tab & g.Count & ControlChars.CrLf
        allResults = allResults & ControlChars.CrLf & aResult
    Next

    In my application I want to give the user the option on how the results are sorted.


  6.   Rosie – June 29, 2011 @ 5:38 pm

    What a joy to find such clear thinking. Thanks for posting!


© 2022 Deborah's Developer MindScape   Provided by WPMU DEV -The WordPress Experts   Hosted by Microsoft MVPs