Deborah's Developer MindScape






Tips and Techniques for Web and .NET developers.

August 18, 2009

Defining Lists of Anonymous Types

Filed under: C#,LINQ,VB.NET @ 1:44 am

Like my prior post, which uses anonymous types to display word counts, this post tracks which physical lines contain each word. Again, this specific task is more like a homework assignment than a business issue, but it does demonstrate how to work with lists of anonymous types.

[To begin with an overview of anonymous types, start here.]

First, the code needs an extension method that returns an empty generic List whose element type is a particular anonymous type.

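In C#:

NOTE: This code must reside in a static, non-generic, non-nested class, and it needs a using directive for System.Collections.Generic.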

public static List<T> ListOfType<T>(this T type)
{
    return new List<T>();
}

In VB:

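NOTE: This code must reside in a module, and it needs an Imports statement for System.Runtime.CompilerServices (for the Extension attribute).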

<Extension()> _
Public Function ListOfType(Of T)(ByVal type As T) As List(Of T)
    Return New List(Of T)
End Function

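An anonymous type has no name that you can write in code, so you cannot declare a List<of anonymous type> directly. This extension method works around that by letting the compiler infer the type argument from an instance of the anonymous type. You can use this extension method any time you want to work with a list of your anonymous types.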
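As a minimal sketch of how the extension method might be called, the following repeats ListOfType inside a hypothetical static class (ListOfTypeSketch and CountAdded are illustrative names, not part of the original code) and adds two items to the resulting list.

```csharp
using System;
using System.Collections.Generic;

public static class ListOfTypeSketch
{
    // Same extension method as above, repeated so the sketch compiles on its own.
    public static List<T> ListOfType<T>(this T type)
    {
        return new List<T>();
    }

    // Hypothetical helper: builds a list of an anonymous type and returns its item count.
    public static int CountAdded()
    {
        // The template instance exists only so the compiler can infer T.
        var template = new { Word = string.Empty, Line = 0 };
        var list = template.ListOfType();

        // Items with the same property names and types, in the same order, fit the list.
        list.Add(new { Word = "that", Line = 1 });
        list.Add(new { Word = "is", Line = 1 });
        return list.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountAdded());
    }
}
```

The template's property values are never read. Only its compile-time type matters.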

The following code defines a list of all words in the sentence along with each physical line containing the word.

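NOTE: Be sure to import the System.Text.RegularExpressions namespace (for Regex). The C# code also needs using directives for System.Linq and System.Diagnostics (for Debug).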

In C#:

// Sample string
string sampleText  = @"That that is, is. 
                       That that is not, it not. 
                       Is that it? It is.";

// Convert to lower case
sampleText = sampleText.ToLower();

// Split into lines
char[] lineSeparator = new char[] { '\n' };
string[] lineArray = sampleText.Split(lineSeparator,
                          StringSplitOptions.RemoveEmptyEntries);

// Define the anonymous type and List(of anonymous type)
var wordLines = new {Word = String.Empty, Line = 0};
var listOfWordLines = wordLines.ListOfType();

string lineText;
string[] lineWords;
string[] separators = new string[] { " ", ".", ",", "?" };
for (int i = 0; i < lineArray.Length; i++)
{
    lineText = Regex.Replace(lineArray[i], @"\s+", " ");
    lineWords = lineText.Split(separators,
                          StringSplitOptions.RemoveEmptyEntries);

    // Using an anonymous type
    int lineNumber = i + 1;
    var lineQuery = from string w in lineWords.Distinct()
                select new  {Word = w, Line = lineNumber};
    listOfWordLines.AddRange(lineQuery);
}

var wordGroupQuery = from wordLine in listOfWordLines
                     group wordLine by wordLine.Word into g
                     orderby g.Key
                     select new { word = g.Key, wordGroup = g };

foreach (var g in wordGroupQuery)
{
    Debug.Write(g.word + ": ");
    foreach (var lineNumber in g.wordGroup)
        Debug.Write(lineNumber.Line + " ");
    Debug.WriteLine("");
}

In VB:

NOTE: For the VB code, be sure to also set a reference to System.Xml, System.Xml.Linq, and System.Core.

' Sample string
Dim sampleText As String = <string>That that is, is.
                            That that is not, it not.
                            Is that it? It is.</string>.Value

' Convert to lower case
sampleText = sampleText.ToLower

' Split into lines
Dim lineSeparator() As String = {Chr(10)}
Dim lineArray() As String = sampleText.Split(lineSeparator, _
                                StringSplitOptions.RemoveEmptyEntries)

' Define the anonymous type and List(of anonymous type)
Dim wordLines = New With {.Word = String.Empty, .Line = 0}
Dim listOfWordLines = wordLines.ListOfType()

Dim lineText As String
Dim lineWords() As String
Dim separators() As String = {" ", ".", ",", "?"}
For i As Integer = 0 To lineArray.Count - 1
    lineText = Regex.Replace(lineArray(i), "\s+", " ")
    lineWords = lineText.Split(separators, _
                               StringSplitOptions.RemoveEmptyEntries)

    ' Using an anonymous type
    Dim lineNumber As Integer = i + 1
    Dim lineQuery = From w As String In lineWords.Distinct _
                Select New With {.Word = w, .Line = lineNumber}
    listOfWordLines.AddRange(lineQuery)
Next

Dim wordGroupQuery = From wordLine In listOfWordLines _
          Group wordLine By currentWord = wordLine.Word Into Group _
          Select word = currentWord, wordGroup = Group _
          Order By Word

For Each g In wordGroupQuery
    Debug.Write(g.word & ": ")
    For Each lineNumber In g.wordGroup
        Debug.Write(lineNumber.Line & " ")
    Next
    Debug.WriteLine("")
Next

This code first builds a sample string. The C# code uses a verbatim string literal (@) so that the line breaks in the source become part of the string. In VB, the code uses the XML literals feature, new in VB 9.0 (.NET 3.5), to build the same multi-line string.

The code converts the string to lower case so that the routine treats “That” and “that” as the same word. It then splits the string on the line feed character into its physical lines. The lineArray contains the text of each physical line in the string.
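The lower-casing and line-splitting step could be isolated roughly as follows. This is only a sketch: the LineSplitSketch class and SplitLines method are hypothetical names, and splitting on "\r\n" as well as "\n" is an assumption added here so that Windows line endings do not leave a trailing carriage return (the original code splits on the line feed only and relies on the later white-space cleanup).

```csharp
using System;

public static class LineSplitSketch
{
    // Hypothetical helper: lower-cases the text and returns its non-empty physical lines.
    public static string[] SplitLines(string text)
    {
        // "\r\n" is listed first so a Windows line ending is consumed as one separator.
        string[] lineSeparators = new string[] { "\r\n", "\n" };
        return text.ToLower().Split(lineSeparators,
                                    StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        foreach (string line in SplitLines("That that is, is.\r\nIs that it?\n"))
            Console.WriteLine(line);
    }
}
```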

The next two lines of code define the anonymous type and the List of the anonymous type. You can define any desired properties of an anonymous type by adding them within the { }, separated by commas. In this example, two properties are defined: Word and Line. The Word property is the word from the string. The Line property is the number of the line containing the word. The initial values (String.Empty and 0) are placeholders that only establish each property's type. The ListOfType extension method creates a generic List of this type.

The code then loops through the lineArray. It first uses a regular expression to collapse each run of white-space characters in the line’s text into a single space. It then splits the line of text into an array of words, discarding empty entries. If your string includes other punctuation marks, you will need to add them to the separators array.
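To illustrate the white-space cleanup and the effect of extending the separators array, here is a small sketch. The WordSplitSketch class, the SplitWords method, and the extra "!" and ";" separators are assumptions for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordSplitSketch
{
    // Hypothetical helper: collapses white space, then splits a line into words.
    public static string[] SplitWords(string line)
    {
        // "!" and ";" are extra punctuation marks added beyond the original four.
        string[] separators = new string[] { " ", ".", ",", "?", "!", ";" };

        // Each run of spaces, tabs, or carriage returns becomes one space.
        string collapsed = Regex.Replace(line, @"\s+", " ");
        return collapsed.Split(separators,
                               StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        foreach (string word in SplitWords("is   that\tit? it is!\r"))
            Console.WriteLine(word);
    }
}
```

Because empty entries are removed, adjacent separators such as "? " do not produce blank words.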

The code uses LINQ to process the words in each line. The Distinct method removes duplicates, so a word that appears several times in one line is recorded only once for that line.

The select new syntax creates instances of the same anonymous type that was defined earlier, because the compiler treats anonymous types with the same property names and types, in the same order, as one type. The query produces one word and line number pair for each distinct word in the current line. The AddRange method of the List then appends those pairs to the single combined list.
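The reason the query results can be added to the list is that both anonymous type expressions resolve to the same compiler-generated type. The following sketch checks that at run time. The class and method names are hypothetical, and the comparison with a reordered property list is an added illustration rather than something the original code does.

```csharp
using System;

public static class AnonymousTypeIdentitySketch
{
    // Same property names and types in the same order: expected to be one type.
    public static bool SameShapeSharesType()
    {
        var template = new { Word = string.Empty, Line = 0 };
        var entry = new { Word = "that", Line = 1 };
        return template.GetType() == entry.GetType();
    }

    // Same properties in a different order: the compiler generates a separate type.
    public static bool ReorderedSharesType()
    {
        var template = new { Word = string.Empty, Line = 0 };
        var reordered = new { Line = 1, Word = "that" };
        return template.GetType() == reordered.GetType();
    }

    public static void Main()
    {
        Console.WriteLine(SameShapeSharesType());
        Console.WriteLine(ReorderedSharesType());
    }
}
```

In practice this means the property names, types, and order in the select new clause must match the template exactly, or AddRange will not compile.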

At this point, the list of words and their physical line numbers is complete. The remaining code organizes the list for display. The code uses a Group By query to group the list by word and sorts the groups alphabetically. The final nested for/each loops then display each word followed by the numbers of the lines that contain it.
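The grouping step can also be written with LINQ method syntax. The sketch below is an alternative formulation, not the post's query: the GroupingSketch class, the FormatGroups method, and the four hard-coded sample entries are assumptions, and it returns formatted strings instead of writing to Debug.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupingSketch
{
    // Hypothetical helper: groups word/line pairs by word and formats one string per word.
    public static List<string> FormatGroups()
    {
        // Sample entries standing in for the list built by the loop.
        var wordLines = new[]
        {
            new { Word = "that", Line = 1 },
            new { Word = "is", Line = 1 },
            new { Word = "that", Line = 2 },
            new { Word = "not", Line = 2 }
        };

        return wordLines
            .GroupBy(wl => wl.Word)      // one group per distinct word
            .OrderBy(g => g.Key)         // alphabetical by word
            .Select(g => g.Key + ": " +
                string.Join(" ", g.Select(wl => wl.Line.ToString()).ToArray()))
            .ToList();
    }

    public static void Main()
    {
        foreach (string line in FormatGroups())
            Console.WriteLine(line);
    }
}
```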

The result is:

is: 1 2 3
it: 2 3
not: 2
that: 1 2 3

Enjoy!

1 Comment

  1.   Flávio dos Santos, May 20, 2010 @ 8:11 am

    Great job !!!
    thanks a lot !

© 2020 Deborah's Developer MindScape