Deborah's Developer MindScape

Tips and Techniques for Web and .NET developers.

Archive for Text Files

September 25, 2009

Count Lines Of Code

Filed under: C#,Text Files,VB.NET @ 7:34 pm

The number of lines of code is not a very useful metric for evaluating good code or a good developer. However, it is sometimes useful to know how much code you have.

For example, I recently worked on a project converting a VB 6 application to .NET. When we were finished, it was interesting to see how we had reduced the number of lines of code by 50% and yet added functionality.

There are tools available to help you get counts. But if you just want a quick and dirty technique, here is some code.

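NOTE: Be sure to import System.IO and System.Collections.Generic. The usage example later in this post also needs System.Diagnostics for Debug.WriteLine.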

In C#:

public class LineCount
{
    private Dictionary<string, int> _FilesInProject =
                                      new Dictionary<string, int>();
    public Dictionary<string, int> FilesInProject
    {
        get { return _FilesInProject; }
    }

    public int GetLineCount(string directoryName,
                            string[] fileExtensions)
    {
        int TotalLineCount = 0;

        DirectoryInfo topDirectoryInfo =
                                new DirectoryInfo(directoryName);

        // Loop the file types
        foreach (string fileExtension in fileExtensions)
        {
            // Process the directory and all subdirectories.
            // SearchOption.AllDirectories includes the files in the
            // top directory and in every nested subdirectory.
            foreach (FileInfo file in topDirectoryInfo.GetFiles(
                        fileExtension, SearchOption.AllDirectories))
            {
                int LineCount = 0;

                // open the file; the using block closes the
                // reader even if an exception occurs
                using (StreamReader sr =
                                new StreamReader(file.FullName))
                {
                    // loop until the end
                    while (sr.ReadLine() != null)
                    {
                        LineCount += 1;
                    }
                }

                // add the file name to the list
                FilesInProject.Add(file.FullName, LineCount);

                // accumulate the total line count
                TotalLineCount += LineCount;
            }
        }

        return TotalLineCount;
    }
}

In VB:

Public Class LineCount

    Private _FilesInProject As New Dictionary(Of String, Integer)
    Public ReadOnly Property FilesInProject() _
                                As Dictionary(Of String, Integer)
        Get
            Return _FilesInProject
        End Get
    End Property

    Public Function GetLineCount(ByVal directoryName As String, _
                        ByVal fileExtensions() As String) As Integer
        Dim TotalLineCount As Integer = 0

        Dim topDirectoryInfo As New DirectoryInfo(directoryName)

        ' Loop the file types
        For Each fileExtension As String In fileExtensions

            ' Process the directory and all subdirectories.
            ' SearchOption.AllDirectories includes the files in the
            ' top directory and in every nested subdirectory.
            For Each file As FileInfo In _
                    topDirectoryInfo.GetFiles(fileExtension, _
                                    SearchOption.AllDirectories)
                Dim LineCount As Integer = 0

                ' open the file; the Using block closes the
                ' reader even if an exception occurs
                Using sr As New StreamReader(file.FullName)
                    ' loop until the end
                    While sr.ReadLine() IsNot Nothing
                        LineCount += 1
                    End While
                End Using

                ' add the file name to the list
                FilesInProject.Add(file.FullName, LineCount)

                ' accumulate the total line count
                TotalLineCount += LineCount
            Next
        Next

        Return TotalLineCount
    End Function
End Class

This code counts the number of lines in each file and retains a total count. It stores the names of the files and the line count for each file in a dictionary. It returns the total count as the return value from the function. If you only need the total count, you can remove the Dictionary.

The LineCount class has one read-only property (FilesInProject) that holds the files found within a specified directory along with each file's line count. The file name is stored as the key and the line count as the value in each dictionary entry.

The GetLineCount method takes a directory name and a set of file extensions. All files in all directories and subdirectories with any of the defined extensions will be found and counted.

The code uses the DirectoryInfo class to search the directories. Each file found with one of the defined extensions is returned as a FileInfo object.

For each found file, it uses a StreamReader to read the file and count the lines.

You can make this method fancier by adding additional code before counting the line. For example, you could skip blank lines, or lines with comments, or lines with just { or }. This simple case just counts all of the lines.
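The filtering idea above could be sketched in C# as follows. This is only a minimal sketch: the CodeLineFilter class and its rules are hypothetical and not part of the LineCount class. It assumes that a blank line, a line holding only a brace, or a line starting with // or ' should not be counted. Block comments are not handled.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not part of the original LineCount class.
public static class CodeLineFilter
{
    // Returns true when the line should be counted as code.
    // Assumed rules: skip blank lines, lines holding only a brace,
    // and single-line comments that start with // (C#) or ' (VB).
    public static bool IsCodeLine(string line)
    {
        string text = line.Trim();
        if (text.Length == 0) return false;
        if (text == "{" || text == "}") return false;
        if (text.StartsWith("//", StringComparison.Ordinal)) return false;
        if (text.StartsWith("'", StringComparison.Ordinal)) return false;
        return true;
    }

    // Counts the lines from the reader that pass IsCodeLine.
    public static int CountCodeLines(TextReader reader)
    {
        int count = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (IsCodeLine(line)) count += 1;
        }
        return count;
    }
}
```

To use something like this in GetLineCount, you would keep the line returned by ReadLine in a variable and only increment the count when the filter accepts it.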

The code stores each found file along with its line count into the dictionary. It also accumulates the total line count.

When it is finished, it returns the total line count.

You can use this class like this:

In C#:

string[] fileExtensions = {"*.cs"};
string directoryName = @"C:\Tools\SampleApplication";

LineCount lc = new LineCount();
int totalLines =
    lc.GetLineCount(directoryName, fileExtensions);

foreach (KeyValuePair<string,int> f in lc.FilesInProject)
    Debug.WriteLine(f.Key + ": " + f.Value);
Debug.WriteLine("Total: " + totalLines);

In VB:

Dim fileExtensions() As String = {"*.vb"}
Dim directoryName As String = "C:\Tools\SampleApplication"

Dim lc As New LineCount
Dim totalLines As Integer = _
    lc.GetLineCount(directoryName, fileExtensions)

For Each f In lc.FilesInProject
    Debug.WriteLine(f.Key & ": " & f.Value)
Next
Debug.WriteLine("Total: " & totalLines)

This code sets up an array of file extensions. In this example, only one extension is included in the array. The code then sets up the directory to search.

The code then calls the GetLineCount method and loops through the resulting Dictionary to list the found files and their line counts.

Enjoy!

August 25, 2009

Reading Fixed Length Files: TextFieldParser

Filed under: Text Files,VB.NET @ 5:05 pm

In my prior post, I covered how to read a fixed length file into an in-memory DataTable. You could then work with the DataTable as desired to access the fields from the file. You could even bind the resulting DataTable to a grid or other control.

But sometimes you just need to process the file line-by-line and need a simpler solution. That is where the TextFieldParser class comes in.

The TextFieldParser works with any file extension. By default it reads the file as UTF-8. For an ANSI file that contains characters outside the ASCII range, pass the appropriate Encoding to the constructor.

For this example, the text file is as follows:

000001  Baggins             Bilbo     20090811
000002  Baggins             Frodo     20090801
000003  Gamgee              Samwise   20090820
000004  Cotton              Rosie     20090821

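NOTE: Be sure to import Microsoft.VisualBasic.FileIO. The class lives in the Microsoft.VisualBasic assembly, which VB projects reference by default. A C# project may need to add that reference.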

Since this class is part of the Visual Basic namespace, only the VB code is presented here. However, you can import this namespace in a C# program and use it.

Dim fileName As String = "testFixed.txt"
Dim dirName As String = _
     Path.GetDirectoryName(Application.ExecutablePath)

Using tf As New TextFieldParser _
            (Path.Combine(dirName, fileName))
 
    tf.TextFieldType = FileIO.FieldType.FixedWidth
    tf.SetFieldWidths(6, 22, 10, 8)

    Dim row As String()
    While Not tf.EndOfData
        Try
            row = tf.ReadFields()
            For Each field As String In row
                ' Do whatever to the set of fields
                Debug.WriteLine(field)
            Next

        Catch ex As MalformedLineException
            MessageBox.Show("Line " & ex.LineNumber & _
            " is not valid and will be skipped.")
        End Try
    End While
End Using

This code first declares variables to hold the text file name and the directory name. The file can reside in any directory that the user can access. In this example, the file resides in the same directory where the application is executed. But this is not a requirement.

The Using statement in the example code creates an instance of the TextFieldParser class and ensures that the file is closed at the end of the block. The parameter to the constructor defines the directory and filename of the text file. The Path.Combine method inserts a directory separator between the directory name and the file name when one is needed.

The code then configures the parser. It sets the TextFieldType property; since this is a fixed width file, the FixedWidth type is specified. It then calls the SetFieldWidths method, which defines the width of each column in the text file.

The While loop processes each line of the file. The ReadFields method reads each row of the file into a string array. If there is a problem reading the line, the Try/Catch block catches the error, displays a message and continues. (However, in a real application you would want to log this information instead of displaying it to the user.)

The For/Next statement loops through the fields in the row and displays them. This is where you would add the code that processes the values from the text file.

Notice that on each pass through the loop, the row variable is replaced with the fields from the next line. So after a row is processed, you cannot access the data from that row again. (If you do need more random access to the data, you can store each row in a list or other structure, or you can read the data into a DataTable using the technique detailed here.)
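As mentioned earlier, the TextFieldParser can also be used from C#. The following is a minimal sketch of that, and it also stores each row in a list so the rows stay available after the loop. The FixedWidthReader class and ReadRows method are hypothetical names for illustration. The sketch reads from a TextReader rather than a file path so it can be tried without a file on disk.

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper for illustration; not from the original post.
public static class FixedWidthReader
{
    // Reads every row of fixed width text and keeps the rows in a list.
    public static List<string[]> ReadRows(TextReader reader,
                                          params int[] fieldWidths)
    {
        List<string[]> rows = new List<string[]>();

        // Disposing the parser also closes the reader
        using (TextFieldParser parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.FixedWidth;
            parser.SetFieldWidths(fieldWidths);

            while (!parser.EndOfData)
            {
                try
                {
                    rows.Add(parser.ReadFields());
                }
                catch (MalformedLineException ex)
                {
                    // Skip the bad line; a real application would log it
                    Debug.WriteLine("Line " + ex.LineNumber +
                                    " is not valid and will be skipped.");
                }
            }
        }
        return rows;
    }
}
```

For a file on disk, you could pass a StreamReader opened on the file, for example ReadRows(new StreamReader(path), 6, 22, 10, 8).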

The resulting information in the Debug window is as follows:

000001
Baggins
Bilbo
20090811
000002
Baggins
Frodo
20090801
000003
Gamgee
Samwise
20090820
000004
Cotton
Rosie
20090821

Enjoy!

Reading Comma Delimited Files: TextFieldParser

Filed under: Text Files,VB.NET @ 4:58 pm

In my prior post, I covered how to read a comma delimited file into an in-memory DataTable. You could then work with the DataTable as desired to access the fields from the file. You could even bind the resulting DataTable to a grid or other control.

But sometimes you just need to process the file line-by-line and need a simpler solution. That is where the TextFieldParser class comes in.

The TextFieldParser works with any file extension. By default it reads the file as UTF-8. For an ANSI file that contains characters outside the ASCII range, pass the appropriate Encoding to the constructor.

For this example, the text file is as follows:

1,  Baggins, Bilbo, 20090811 
2,  Baggins, Frodo, 20090801 
3,  Gamgee,  Samwise, 20090820 
4,  Cotton,  Rosie, 20090821

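NOTE: Be sure to import Microsoft.VisualBasic.FileIO. The class lives in the Microsoft.VisualBasic assembly, which VB projects reference by default. A C# project may need to add that reference.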

Since this class is part of the Visual Basic namespace, only the VB code is presented here. However, you can import this namespace in a C# program and use it.

Dim fileName As String = "testCSV.txt"
Dim dirName As String = _
     Path.GetDirectoryName(Application.ExecutablePath)

Using tf As New TextFieldParser _
            (Path.Combine(dirName, fileName))
    tf.TextFieldType = FileIO.FieldType.Delimited
    tf.SetDelimiters(",")

    Dim row As String()
    While Not tf.EndOfData
        Try
            row = tf.ReadFields()
            For Each field As String In row
                ' Do whatever to the set of fields
                Debug.WriteLine(field)
            Next

        Catch ex As MalformedLineException
            MessageBox.Show("Line " & ex.LineNumber & _
            " is not valid and will be skipped.")
        End Try
    End While
End Using

This code first declares variables to hold the text file name and the directory name. The file can reside in any directory that the user can access. In this example, the file resides in the same directory where the application is executed. But this is not a requirement.

The Using statement in the example code creates an instance of the TextFieldParser class and ensures that the file is closed at the end of the block. The parameter to the constructor defines the directory and filename of the text file. The Path.Combine method inserts a directory separator between the directory name and the file name when one is needed.

The code then configures the parser. It sets the TextFieldType property; since this is a delimited file, the Delimited type is specified. It then calls the SetDelimiters method, which defines the delimiter used for the file. This example uses a comma separated value (CSV) file, so a comma is specified.

The While loop processes each line of the file. The ReadFields method reads each row of the file into a string array. If there is a problem reading the line, the Try/Catch block catches the error, displays a message and continues. (However, in a real application you would want to log this information instead of displaying it to the user.)

The For/Next statement loops through the fields in the row and displays them. This is where you would add the code that processes the values from the text file.

Notice that on each pass through the loop, the row variable is replaced with the fields from the next line. So after a row is processed, you cannot access the data from that row again. (If you do need more random access to the data, you can store each row in a list or other structure, or you can read the data into a DataTable using the technique detailed here.)
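As one example of processing the values, the fields of each row could be converted into a typed object. The following C# sketch assumes the four-column layout of the sample file. The Customer class and the CustomerCsvReader helper are hypothetical names for illustration. It does not catch MalformedLineException or parsing errors, which a real application would handle.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical type for illustration; not from the original post.
public class Customer
{
    public int CustomerId;
    public string LastName;
    public string FirstName;
    public DateTime LastUpdateDate;
}

public static class CustomerCsvReader
{
    // Reads comma delimited rows and converts each one to a Customer.
    public static List<Customer> Read(TextReader reader)
    {
        List<Customer> customers = new List<Customer>();

        using (TextFieldParser parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");

            while (!parser.EndOfData)
            {
                // Fields are trimmed because TrimWhiteSpace defaults to true
                string[] row = parser.ReadFields();
                if (row.Length != 4) continue; // skip rows with an unexpected shape

                Customer c = new Customer();
                c.CustomerId = int.Parse(row[0], CultureInfo.InvariantCulture);
                c.LastName = row[1];
                c.FirstName = row[2];
                // The sample file stores dates as yyyyMMdd
                c.LastUpdateDate = DateTime.ParseExact(
                    row[3], "yyyyMMdd", CultureInfo.InvariantCulture);
                customers.Add(c);
            }
        }
        return customers;
    }
}
```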

The resulting information in the Debug window is as follows:

1
Baggins
Bilbo
20090811
2
Baggins
Frodo
20090801
3
Gamgee
Samwise
20090820
4
Cotton
Rosie
20090821

Enjoy!

Reading Fixed Length Files

Filed under: C#,Data,Text Files,VB.NET @ 3:34 pm

There may be times that you need to read fixed length files into your application. For example, you obtain output from a legacy system or other application in a fixed length text file format, and you need to read and use that data in your application.

NOTE: For more information on fixed length files, see this link.

.NET provides several techniques for reading text files. This post focuses on how to read a fixed length text file into a DataTable.

You may find it very useful to read your text file into a DataTable, whether or not you plan to use a database. Reading a text file into a DataTable not only saves you a significant amount of string manipulation coding, it also makes it easy to access the imported data from within your application.

For example, you can use binding to bind the resulting DataTable to a grid or other controls. You can use Linq to DataTables like in this example to manipulate the resulting data. All of the features of the DataTable are then available to you.

BIG NOTE: Many developers have ignored this technique because, after one look at the code, they assumed it is somehow associated with a database. It is NOT. This technique uses in-memory DataTable objects.

For this example, the text file appears as follows:

000001  Baggins             Bilbo     20090811
000002  Baggins             Frodo     20090801
000003  Gamgee              Samwise   20090820
000004  Cotton              Rosie     20090821

Notice several things about this file:

  1. The columns are a fixed width.
  2. There is no header row that provides the column names. You could add column headers here if desired.

The first step in reading the file is to define a schema.ini file that defines the column widths. The file must follow these specifications:

  • The file must be called schema.ini.
  • The file must exist in the same directory as the text file.
  • The file must be in ANSI format. (See the note at the bottom of this post for information on saving a file to ANSI format.)

The contents of the schema.ini file for the example above are shown below:

[testFixed.txt]
ColNameHeader=False
Format=FixedLength
DateTimeFormat=yyyymmdd
Col1=CustomerId Text Width 6
Col2=LastName Text Width 22
Col3=FirstName Text Width 10
Col4=LastUpdateDate DateTime Width 8

The first line of the file is always the name of the associated text file enclosed in square brackets ([ ]).

The next set of lines defines basic attributes of the text file:

  • ColNameHeader: In this case, there is no column header in the text file, so this property is set to false. The system will assume that the first line of the text file is the header unless you specify otherwise.
  • Format: In this case, the format is FixedLength. The system will assume comma delimited unless you specify otherwise.
  • DateTimeFormat: If you have a date in your file, you can specify the format here.

The last set of lines defines each column in the text file. The format of these lines is as follows:

Colx=ColumnName ColumnType Width ColumnWidth

See this link for more information on the contents of the schema.ini file.

You can then read the file using the following code.

In C#:

string fileName = "testFixed.txt";
string dirName = Path.GetDirectoryName(Application.ExecutablePath);
DataTable dt;

using (OleDbConnection cn =
    new OleDbConnection(@"Provider=Microsoft.Jet.OleDb.4.0;" +
            "Data Source=" + dirName + ";" +
            "Extended Properties=\"Text;\""))
{
    // Open the connection
    cn.Open();

    // Set up the adapter
    using (OleDbDataAdapter adapter =
        new OleDbDataAdapter("SELECT * FROM " + fileName, cn))
    {
        dt = new DataTable("Customer");
        adapter.Fill(dt);
    }
}

In VB:

Dim fileName As String = "testFixed.txt"
Dim dirName As String = _
            Path.GetDirectoryName(Application.ExecutablePath)
Dim dt As DataTable

Using cn As New OleDbConnection("Provider=Microsoft.Jet.OleDb.4.0;" & _
            "Data Source=" & dirName & ";" & _
            "Extended Properties=""Text;""")
    ' Open the connection
    cn.Open()

    ' Set up the adapter
    Using adapter As New OleDbDataAdapter( _
            "SELECT * FROM " & fileName, cn)
        dt = New DataTable("Customer")
        adapter.Fill(dt)
    End Using
End Using

This code starts by declaring variables to hold the text file name, directory containing the file and the resulting DataTable.

This technique only works with a standard set of file name extensions (see the NOTE at the end of this post). The file can reside in any directory. In this example, the file resides in the same directory where the application is executed. But this is not a requirement.

The first using statement in the example code sets up the connection string for connecting to the directory. It sets the Provider property to use the Microsoft.Jet.OleDb provider. The Data Source property defines the directory containing the text file. The Extended Properties define that the file will be Text ("Text"). The Extended Properties must be within quotes, so double-quotes (VB) or backslash-quote (C#) are used to escape the quotes.

If a schema.ini file exists in the directory defined as the data source and has a bracketed entry with the text file name, that .ini file is used to determine any other extended properties. So no other extended properties are defined in the connection string itself.

The code then opens the connection, thereby opening the file and the associated schema.ini file. Since this code is in a using statement, the files are automatically closed at the end of the using block.

The second using statement sets up the DataAdapter by defining a Select statement and the open connection. The Select statement selects all of the information from a specific file as defined by the fileName variable.

The code then creates the DataTable, giving the table a name. In this example, the table name is "Customer".

Finally, it uses the Fill method of the DataAdapter to read the data from the text file into the DataTable.

Using the technique detailed here, you can view the resulting DataTable. In this example, the column names come from the Col entries in the schema.ini file, because the text file has no header row.

[Image: the resulting DataTable]

Note how the date in the above screen shot appears as a standard date column.

You can then access the data in the table as you access any other DataTable. For example:

In C#:

foreach (DataRow dr in dt.Rows)
{
    Debug.Print("{0}: {1}, {2} LastUpdated: {3}",
                dr["CustomerId"],
                dr["LastName"],
                dr["FirstName"],
                dr["LastUpdateDate"]);

}

In VB:

For Each dr As DataRow In dt.Rows
    Debug.Print("{0}: {1}, {2} LastUpdated: {3}", _
                dr("CustomerId"), _
                dr("LastName"), _
                dr("FirstName"), _
                dr("LastUpdateDate"))
Next

NOTE:

By default, this technique only works with .txt, .csv, .tab, and .asc file extensions. If your file name has a different extension, you can either change the extension in your code before reading the file, or you can update the Extensions key in the following registry setting:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Text

NOTE:

By default, this technique assumes you are working with ANSI text files. If that is not the case, you can update the CharacterSet key in the same registry setting:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Text

Though this is not recommended.

VERY IMPORTANT NOTE:

If you test this sample code by creating a text file with Visual Studio, the resulting text file will be in UTF-8 format. You need to save the file into ANSI format. The easiest way I found to do this is detailed below.

Adding a Text File to your Project:

  1. Right-click on your project in Visual Studio.
  2. Select Add | New Item from the context menu.
  3. Pick Text File from the available templates and click Add.
  4. Type in the data for the test file or paste in the text from the example at the top of this post.
  5. Save the file within Visual Studio. This creates a UTF-8 formatted file.
  6. If you plan to use the directory of the executing application, set the Copy to Output Directory to Copy always in the properties window for the file.

Converting the resulting UTF-8 file to ANSI format:

  1. Right-click on the file and select Open With
  2. Select Notepad.
  3. Select File | Save As.
  4. Set the Encoding to ANSI and click Save.

Enjoy!

Reading Comma Delimited Files

Filed under: C#,Data,Text Files,VB.NET @ 2:14 pm

There may be times that you need to read comma separated value (CSV) files into your application. For example, you obtain output from a legacy system or other application in a comma delimited text file format, and you need to read and use that data in your application.

NOTE: For more information on delimited files, see this link.

.NET provides several techniques for reading text files. This post focuses on how to read a comma delimited text file into a DataTable.

You may find it very useful to read your text file into a DataTable, whether or not you plan to use a database. Reading a text file into a DataTable not only saves you a significant amount of string manipulation coding, it also makes it easy to access the imported data from within your application.

For example, you can use binding to bind the resulting DataTable to a grid or other controls. You can use Linq to DataTables like in this example to manipulate the resulting data. All of the features of the DataTable are then available to you.

BIG NOTE: Many developers have ignored this technique because, after one look at the code, they assumed it is somehow associated with a database. It is NOT. This technique uses in-memory DataTable objects.

For this example, the text file appears as follows:

CustomerId, LastName, FirstName, LastUpdateDate
1,  Baggins, Bilbo, 20090811 
2,  Baggins, Frodo, 20090801 
3,  Gamgee,  Samwise, 20090820 
4,  Cotton,  Rosie, 20090821

Notice several things about this file:

  1. It is a comma separated value (CSV) file.
  2. The first line provides the column names. This is optional.

You can read this text file into a DataTable using OleDb as follows.

In C#:

string fileName = "testCSV.txt";
string dirName = Path.GetDirectoryName(Application.ExecutablePath);
DataTable dt;

using (OleDbConnection cn =
    new OleDbConnection(@"Provider=Microsoft.Jet.OleDb.4.0;" +
            "Data Source=" + dirName + ";" +
            "Extended Properties=\"Text;HDR=Yes;FMT=Delimited\""))
{
    // Open the connection
    cn.Open();

    // Set up the adapter
    using (OleDbDataAdapter adapter =
        new OleDbDataAdapter("SELECT * FROM " + fileName, cn))
    {
        dt = new DataTable("Customer");
        adapter.Fill(dt);
    }
}

In VB:

Dim fileName As String = "testCSV.txt"
Dim dirName As String = _
            Path.GetDirectoryName(Application.ExecutablePath)
Dim dt As DataTable

Using cn As New OleDbConnection("Provider=Microsoft.Jet.OleDb.4.0;" & _
            "Data Source=" & dirName & ";" & _
            "Extended Properties=""Text;HDR=Yes;FMT=Delimited""")
    ' Open the connection
    cn.Open()

    ' Set up the adapter
    Using adapter As New OleDbDataAdapter( _
            "SELECT * FROM " & fileName, cn)
        dt = New DataTable("Customer")
        adapter.Fill(dt)
    End Using
End Using

This code starts by declaring variables to hold the text file name, directory containing the file and the resulting DataTable.

This technique only works with a standard set of file name extensions (see the NOTE at the end of this post). The file can reside in any directory. In this example, the file resides in the same directory where the application is executed. But this is not a requirement.

The first using statement in the example code sets up the connection to the directory. It sets the Provider property to use the Microsoft.Jet.OleDb provider. The Data Source property defines the directory containing the text file. The Extended Properties define that the file will be Text ("Text"), it has a header (HDR=Yes), and it is in a delimited file format (FMT=Delimited). The Extended Properties must be within quotes, so double-quotes (VB) or backslash-quote (C#) are used to escape the included quotes.

The code then opens the connection, thereby opening the file. Since this code is in a using statement, the file is automatically closed at the end of the using block.

The second using statement sets up the DataAdapter by defining a Select statement and the open connection. The Select statement selects all of the information from a specific file as defined by the fileName variable.

The code then creates the DataTable, giving the table a name. In this example, the table name is "Customer".

Finally, it uses the Fill method of the DataAdapter to read the data from the text file into the DataTable.

Using the technique detailed here, you can view the resulting DataTable. The column headings were defined by the header in the text file. If you don’t have a header, the columns will be given default names.

[Image: the resulting DataTable]

You can then access the data in the table as you access any other DataTable. For example:

In C#:

foreach (DataRow dr in dt.Rows)
{
    Debug.Print("{0}: {1}, {2} LastUpdated: {3}",
                dr["CustomerId"],
                dr["LastName"],
                dr["FirstName"],
                dr["LastUpdateDate"]);

}

In VB:

For Each dr As DataRow In dt.Rows
    Debug.Print("{0}: {1}, {2} LastUpdated: {3}", _
                dr("CustomerId"), _
                dr("LastName"), _
                dr("FirstName"), _
                dr("LastUpdateDate"))
Next
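Because the result is an ordinary in-memory DataTable, it can also be queried with LINQ, as mentioned earlier in this post. The following C# sketch builds a small table by hand so it runs without a text file or the Jet provider. The CustomerQueries class and its column types are assumptions for illustration. A table filled from a text file may use different column types, and Field<T> must match the actual type. On the .NET Framework, AsEnumerable requires a reference to System.Data.DataSetExtensions.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helper for illustration; not from the original post.
public static class CustomerQueries
{
    // Builds a small table by hand (assumed column types).
    public static DataTable BuildSampleTable()
    {
        DataTable dt = new DataTable("Customer");
        dt.Columns.Add("CustomerId", typeof(int));
        dt.Columns.Add("LastName", typeof(string));
        dt.Columns.Add("FirstName", typeof(string));
        dt.Rows.Add(1, "Baggins", "Bilbo");
        dt.Rows.Add(2, "Baggins", "Frodo");
        dt.Rows.Add(3, "Gamgee", "Samwise");
        dt.Rows.Add(4, "Cotton", "Rosie");
        return dt;
    }

    // Returns the first names of the customers with the given
    // last name, sorted alphabetically.
    public static List<string> FirstNamesFor(DataTable dt, string lastName)
    {
        return dt.AsEnumerable()
                 .Where(r => r.Field<string>("LastName") == lastName)
                 .Select(r => r.Field<string>("FirstName"))
                 .OrderBy(name => name)
                 .ToList();
    }
}
```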

NOTE:

By default, this technique only works with .txt, .csv, .tab, and .asc file extensions. If your file name has a different extension, you can either change the extension in your code before reading the file, or you can update the Extensions key in the following registry setting:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Text

NOTE:

By default, this technique assumes you are working with ANSI text files. If that is not the case, you can update the CharacterSet key in the same registry setting:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Text

Though this is not recommended.

VERY IMPORTANT NOTE:

If you test this sample code by creating a text file with Visual Studio, the resulting text file will be in UTF-8 format. You need to save the file into ANSI format. The easiest way I found to do this is detailed below.

Adding a Text File to your Project:

  1. Right-click on your project in Visual Studio.
  2. Select Add | New Item from the context menu.
  3. Pick Text File from the available templates and click Add.
  4. Type in the data for the test file or paste in the text from the example at the top of this post.
  5. Save the file within Visual Studio. This creates a UTF-8 formatted file.
  6. If you plan to use the directory of the executing application, set the Copy to Output Directory to Copy always in the properties window for the file.

Converting the resulting UTF-8 file to ANSI format:

  1. Right-click on the file and select Open With
  2. Select Notepad.
  3. Select File | Save As.
  4. Set the Encoding to ANSI and click Save.

Enjoy!

Text Files

Filed under: C#,Text Files,VB.NET,XML @ 1:09 pm

There are many different types of text files that you may need to process in your applications. Some of the more common types are described in this post.

Delimited files

Delimited files separate the fields of the file with some type of a delimiter. The most common delimited files are comma separated value (CSV) files, as shown below.

1,  Baggins, Bilbo, 20090811
2,  Baggins, Frodo, 20090801
3,  Gamgee,  Samwise, 20090820
4,  Cotton,  Rosie, 20090821

Other field delimiters, such as tabs or semicolons, can also be used.

In most cases, delimited files define the end of each record with a CrLf (carriage return/linefeed). So the example above contains four records, one for each line of data.

If you need to read the data from a delimited file into your application, there are features in .NET to assist you.

You can read the file directly using the file classes within .NET and then process the columns using String functions.
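A minimal C# sketch of this direct approach is shown below. The SimpleCsv class is a hypothetical name for illustration. It assumes that no field contains a comma or is enclosed in quotes. For files where that is not true, a parser such as the TextFieldParser is a better fit.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; not from the original post.
public static class SimpleCsv
{
    // Splits each non-blank line on commas and trims every field.
    public static List<string[]> ReadRecords(TextReader reader)
    {
        List<string[]> records = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Trim().Length == 0) continue; // ignore blank lines

            string[] fields = line.Split(',');
            for (int i = 0; i < fields.Length; i++)
            {
                fields[i] = fields[i].Trim();
            }
            records.Add(fields);
        }
        return records;
    }
}
```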

Alternatively, you can read the file into a DataTable, even if you have no plans to use this data in a database. This allows you to work with the data as a set of rows and columns instead of using a large number of string manipulation functions. See this link for an example of reading a CSV text file into an in-memory DataTable.

Another option is to read the file using VB’s TextFieldParser class. See this link for an example.

Fixed length files

Fixed length files define data within specific columns. In the example below, the customer Id is left-justified in the first 8 columns, the last name is left-justified in the next 20 columns, the first name is left-justified in the next 10 columns, and the last edit date is left-justified in the last 8 columns.

000001  Baggins             Bilbo     20090811
000002  Baggins             Frodo     20090801
000003  Gamgee              Samwise   20090820
000004  Cotton              Rosie     20090821

As with delimited files, the end of each record is normally defined with a CrLf (carriage return/linefeed). So the example above contains four records, one for each line of data.

This style of text file is often used by legacy systems, especially mainframe systems.

The StringBuilder class in the .NET framework provides easy to use formatting features that help you write code to output fixed length files. See this link for an example.
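As a rough C# sketch of that idea, the AppendFormat method accepts an alignment component in each format item. A negative alignment left-justifies the value and pads it with spaces. The FixedLengthWriter class is a hypothetical name for illustration, and the widths of 8, 20, 10, and 8 are taken from the sample data above. Note that alignment pads short values but does not truncate long ones.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper for illustration; not from the original post.
public static class FixedLengthWriter
{
    // Formats one record using column widths of 8, 20, 10, and 8.
    public static string FormatRecord(string customerId, string lastName,
                                      string firstName, string lastUpdateDate)
    {
        StringBuilder sb = new StringBuilder();

        // {0,-8} means: argument 0, left-justified in 8 characters
        sb.AppendFormat(CultureInfo.InvariantCulture,
                        "{0,-8}{1,-20}{2,-10}{3,-8}",
                        customerId, lastName, firstName, lastUpdateDate);
        return sb.ToString();
    }
}
```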

If you need to read the data from a fixed length file into your application, there are features in .NET to assist you.

You can read the file directly using the file classes within .NET and then process the columns using String functions.
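For a fixed length file, the direct approach usually means slicing each line with Substring. The C# sketch below assumes the column widths of 8, 20, 10, and 8 from the sample data. The FixedLengthLine class is a hypothetical name for illustration. It guards against short lines, which would otherwise make Substring throw.

```csharp
using System;

// Hypothetical helper for illustration; not from the original post.
public static class FixedLengthLine
{
    // Assumed column widths matching the sample file
    private static readonly int[] Widths = { 8, 20, 10, 8 };

    // Cuts one line into trimmed fields based on the column widths.
    public static string[] Split(string line)
    {
        string[] fields = new string[Widths.Length];
        int start = 0;
        for (int i = 0; i < Widths.Length; i++)
        {
            // Number of characters actually available for this column
            int available = Math.Max(
                0, Math.Min(Widths[i], line.Length - start));

            fields[i] = available > 0
                ? line.Substring(start, available).Trim()
                : "";
            start += Widths[i];
        }
        return fields;
    }
}
```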

Alternatively, you can read the file into a DataTable, even if you have no plans to use this data in a database. This allows you to work with the data as a set of rows and columns instead of using a large number of string manipulation functions. See this link for an example of reading a fixed length file into an in-memory DataTable.

Another option is to read the file using VB’s TextFieldParser class. See this link for an example.
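
As a minimal sketch of the first option for fixed length files, the C# below slices each line with Substring using the column widths described above (8, 20, 10, and 10). The FixedLengthReader class and its method names are hypothetical. The sketch also assumes that trailing spaces may have been stripped from the last column, so it pads short lines before slicing.

```csharp
using System;
using System.Globalization;

public static class FixedLengthReader
{
    // Column widths from the layout described above.
    private const int IdWidth = 8;
    private const int LastNameWidth = 20;
    private const int FirstNameWidth = 10;
    private const int DateWidth = 10;

    // Hypothetical helper: returns the four trimmed fields in column order.
    public static string[] ParseLine(string line)
    {
        int total = IdWidth + LastNameWidth + FirstNameWidth + DateWidth;
        // Pad short lines so Substring cannot run past the end.
        string padded = line.PadRight(total);

        int lastNameStart = IdWidth;
        int firstNameStart = lastNameStart + LastNameWidth;
        int dateStart = firstNameStart + FirstNameWidth;

        return new string[]
        {
            padded.Substring(0, IdWidth).Trim(),
            padded.Substring(lastNameStart, LastNameWidth).Trim(),
            padded.Substring(firstNameStart, FirstNameWidth).Trim(),
            padded.Substring(dateStart, DateWidth).Trim()
        };
    }

    // Converts a YYYYMMDD string such as "20090811" to a DateTime.
    public static DateTime ParseDate(string text)
    {
        return DateTime.ParseExact(text, "yyyyMMdd", CultureInfo.InvariantCulture);
    }
}
```

Calling ParseLine on each line returned by File.ReadAllLines gives you the Id, last name, first name, and date as separate strings.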

XML files

XML (eXtensible Markup Language) files follow the XML specification to define hierarchical data. You use XML elements, defined with start (< >) and end (</ >) tags, and XML attributes, defined as name/value pairs within a start tag, to define the structure and content of a file.

For example, this XML file defines a set of customers, each delimited within customer tags. Each field for the customer is defined in an element within the customer start and end tags. In the example below, each customer has a CustomerId, LastName, FirstName, and LastUpdateDate.

<customers>
  <customer>
    <CustomerId>1</CustomerId>
    <LastName>Baggins</LastName>
    <FirstName>Bilbo</FirstName>
    <LastUpdateDate>20090811</LastUpdateDate>
  </customer>
  <customer>
    <CustomerId>2</CustomerId>
    <LastName>Baggins</LastName>
    <FirstName>Frodo</FirstName>
    <LastUpdateDate>20090801</LastUpdateDate>
  </customer>
  <customer>
    <CustomerId>3</CustomerId>
    <LastName>Gamgee</LastName>
    <FirstName>Samwise</FirstName>
    <LastUpdateDate>20090820</LastUpdateDate>
  </customer>
  <customer>
    <CustomerId>4</CustomerId>
    <LastName>Cotton</LastName>
    <FirstName>Rosie</FirstName>
    <LastUpdateDate>20090821</LastUpdateDate>
  </customer>
</customers>

If you have any choice on the type of text file to use for your application, an XML file is the best choice. XML has become the standard for defining data structures due to its simplicity, clarity, self-descriptiveness, and consistency. There are great tools for reading and writing XML files available in the .NET framework. Visual Basic has additional features, called XML literals, that make working with XML files a breeze. See this link for other postings on working with XML files.
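
As one example of the .NET tools for reading XML, here is a minimal C# sketch that uses LINQ to XML (System.Xml.Linq) to pull the last names out of a customers document shaped like the one above. The CustomerXmlReader class and GetLastNames method are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class CustomerXmlReader
{
    // Returns the LastName of every customer element under the root element.
    public static List<string> GetLastNames(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        List<string> names = new List<string>();
        foreach (XElement customer in doc.Root.Elements("customer"))
        {
            // The cast returns null if the LastName element is missing.
            names.Add((string)customer.Element("LastName"));
        }
        return names;
    }
}
```

To read from a file instead of a string, XDocument.Load(path) can replace XDocument.Parse(xml).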

Sometimes, however, you have no choice. You have to read a text file that is not in an XML structure. For example, you may need to read files generated by legacy systems or by other software. In that case, you need to use delimited or fixed length files.

July 21, 2009

Formatting Text Files

Filed under: C#,Text Files,VB.NET @ 3:16 pm

You often need to write out text files containing data managed by your application. For example, you may need to write out a file to be read by another system. In many cases, this file needs to have a particular format, with justified or aligned columns like this:

000001  Baggins             Bilbo     20090711 
000002  Baggins             Frodo     20090701 
000003  Gamgee              Samwise   20090720 
000004  Cotton              Rosie     20090721
 

The first column is the Id, padded with 0’s. The remaining columns are the last name, first name, and date of last edit in YYYYMMDD format.

NOTE: If you can format your text file as XML instead, see this.

The StringBuilder class provides a method that makes this formatting a snap.

In C#:

using System;
using System.IO;
using System.Text;

StringBuilder sb = new StringBuilder();

// Append one fixed-width line per customer
foreach (Customer c in Customers.Retrieve())
{
    sb.AppendFormat("{0,-8}{1,-20}{2,-10}{3,-10:yyyyMMdd}{4}",
                    c.CustomerId.ToString().PadLeft(6, '0'),
                    c.LastName,
                    c.FirstName,
                    c.LastEditDate,
                    Environment.NewLine);
}

// Write the formatted text to the file
File.WriteAllText("Test.txt", sb.ToString());

In VB:

Dim sb As New System.Text.StringBuilder

For Each c As Customer In custList
    sb.AppendFormat("{0,-8}{1,-20}{2,-10}{3,-10:yyyyMMdd}{4}", _
                    c.CustomerId.ToString.PadLeft(6, "0"c), _
                    c.LastName, _
                    c.FirstName, _
                    c.LastEditDate, _
                    Environment.NewLine)
Next

My.Computer.FileSystem.WriteAllText("Test.txt", sb.ToString, False)

The code starts by creating an instance of the StringBuilder class. It then loops through each customer in the list of customers. See this for the code that creates the customer list. NOTE: A LastEditDate was added to the Customer class (not shown) in order to demonstrate date formatting.

You do not have to have business objects to use this code. You could instead loop through a DataSet, DataTable, or DataReader to get the data for your file.
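
As a minimal sketch of the DataTable variation, the C# below loops through the rows of a DataTable instead of a list of business objects. The column names (CustomerId, LastName, FirstName, LastEditDate) and the DataTableFormatter class are assumptions made for this example; adjust them to match your own table.

```csharp
using System;
using System.Data;
using System.Text;

public static class DataTableFormatter
{
    // Assumes these columns exist: CustomerId (int), LastName (string),
    // FirstName (string), and LastEditDate (DateTime).
    public static string Format(DataTable customers)
    {
        StringBuilder sb = new StringBuilder();
        foreach (DataRow row in customers.Rows)
        {
            // Same format string as the business object version above.
            sb.AppendFormat("{0,-8}{1,-20}{2,-10}{3,-10:yyyyMMdd}{4}",
                            row["CustomerId"].ToString().PadLeft(6, '0'),
                            row["LastName"],
                            row["FirstName"],
                            row["LastEditDate"],
                            Environment.NewLine);
        }
        return sb.ToString();
    }
}
```

The date format only applies because the LastEditDate column holds DateTime values. If the column holds strings, the yyyyMMdd part of the format item is ignored.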

The key to formatting the data into columns is the AppendFormat method of the StringBuilder. This method uses the composite formatting features of the .NET framework to build the string. In this case, the formatting parameter is a set of indexed placeholders, called format items, that look like this:

{index, length:formatString}

The index identifies which of the parameters following the format string is inserted at that position. For example, index 0 is the first parameter after the format string, which is the CustomerId. Index 1 is the second parameter after the format string, which is the LastName. And so on.

The optional length (called the alignment in the .NET documentation) defines the minimum width of the column. Values shorter than that width are automatically padded with spaces. A positive length value aligns the text to the right. A negative value aligns the text to the left.

The optional formatString defines a standard or custom format string using the .NET formatting syntax.
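
To make the effect of each part of a format item concrete, here is a small C# sketch. The AlignmentDemo class and its method names are hypothetical, and the square brackets are there only to make the padding visible.

```csharp
using System;
using System.Globalization;

public static class AlignmentDemo
{
    // Negative length: left-aligned, padded on the right to 6 characters.
    public static string Left(string value)
    {
        return string.Format("[{0,-6}]", value);
    }

    // Positive length: right-aligned, padded on the left to 6 characters.
    public static string Right(string value)
    {
        return string.Format("[{0,6}]", value);
    }

    // Length plus format string: date as YYYYMMDD, left-aligned in 10 characters.
    public static string Date(DateTime value)
    {
        return string.Format(CultureInfo.InvariantCulture, "[{0,-10:yyyyMMdd}]", value);
    }
}

// AlignmentDemo.Left("ab")                      returns "[ab    ]"
// AlignmentDemo.Right("ab")                     returns "[    ab]"
// AlignmentDemo.Date(new DateTime(2009, 7, 11)) returns "[20090711  ]"
// AlignmentDemo.Left("abcdefgh")                returns "[abcdefgh]"
```

Note the last case: a value longer than the length is not truncated, so an overly long name would push the remaining columns out of position. For a strict fixed length file, you may need to shorten such values yourself before formatting.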

The format items used in this example are as follows:

  • {0,-8}: Defines that the first parameter will fill 8 characters and will be left justified. This is the CustomerId. Notice that the PadLeft method is applied to the CustomerId value to pad the Id field with zeros.
  • {1,-20}: Defines that the second parameter will fill 20 characters and will be left justified. This is the last name.
  • {2,-10}: Defines that the third parameter will fill 10 characters and will be left justified. This is the first name.
  • {3,-10:yyyyMMdd}: Defines that the fourth parameter will fill 10 characters and will be left justified. The value will be formatted as a date in YYYYMMDD format. This is the last edit date.
  • {4}: Defines that the fifth parameter will be used “as is”. This is the new line constant, which is inserted to ensure each customer is on a separate line.

The final part of the code uses the ToString method of the StringBuilder to convert the StringBuilder text to a string. It then writes that string to the defined file.

To read this file back into your application as an in-memory DataTable, see this link. To read the file back in using VB’s TextFieldParser class, see this link.

Enjoy!

© 2014 Deborah's Developer MindScape