Many times I need to enumerate the files in my disk or in a folder and subfolders, but that always has been slow. All the file enumeration techniques go through the disk structures querying the file names and going to the next one. With the Windows file indexing, this has gone to another level of speed: you can query and filter your data almost instantaneously, with one pitfall: it only works in the indexed parts of the disk (usually the libraries and Windows folders), being unusable for your data folders, unless you add them to the indexer:

Wouldn’t it be nice to have a database that stores all files in the system and is updated as the files change? Some apps, like Copernic Desktop Search or X1 Search do exactly that: they have a database that indexes your system and can do fast queries for you. The pitfall is that you don’t have an API to integrate to your programs, so the only way you have is to query the files is to use their apps.

At some time, Microsoft thought of doing something like a database of files, creating what was called WinFS – Windows Future Storage, but the project was cancelled. So, we have to stick with our current APIs to query files. Or no? As a matter of fact there is something in the Windows system that allows us to query the files in a very fast way, and it’s called the NTFS MFT (NT file system master file table).

The NTFS MFT is a file structure use internally by Windows that allows querying files in a very fast way. It was designed to be fast and safe (there are two copies of the MFT, in case one of them gets corrupt), and we can access it to get our files enumerated. But some things should be noted when accessing the MFT:

  • The MFT is only available for NTFS volumes. So, you cannot access FAT drives with this API
  • To access the MFT structures, you must have elevated privileges – a normal user won’t be able to access it
  • With great power comes great responsibility (this is a SpiderMan comic book quote), so you should know that accessing the internal NTFS structures may harm you system irreversively – use the code with care, and don’t blame me if something goes wrong (but here’s a suggestion to Windows API designers: why not create some APIs that query the NTFS structures safely for normal users? That could be even be added to UWP programming).

Acessing the MFT structure

There is no formal API to access the MFT structure. You will have to decipher the structure (there is a lot of material here) and access the data using raw disk data read (that’s why you need elevated privileges). This is a lot of work and hours of trial and error.

Fortunately, there are some libraries that do that in C#, and I’ve used this one, which is licensed as LGPL. You can use the library in your compiled work as a library, with no restriction. If you include the library source code in your code, it will be “derived work” and you must distribute all code as LGPL.

We will create a WPF program that will show the disk usage. It will enumerate all files and show them in the list, so you can see what’s taking space in your disk. You will be able to select any of the NTFS disks in your machine.

Open Visual Studio with administrative rights (this is very important, or you won’t be able to debug your program). Then create a new WPF project and add a new item. Choose Application Manifest File, you will have an app.manifest file added to your project. Then, you must change the requestedExecutionLevel tag of the file to:

<requestedExecutionLevel level="requireAdministrator" uiAccess="false" />

The next step is to detect the NTFS disks in your system. This is done with this code:

var ntfsDrives = DriveInfo.GetDrives()
    .Where(d => d.DriveFormat == "NTFS").ToList();

Then, add the NtfsReader project to the solution and add a reference to it in the WPF project. Then, add the following UI in MainWindow.xaml.cs:

<Grid>
    <Grid.RowDefinitions>
        <RowDefinition Height="40"/>
        <RowDefinition Height="*"/>
        <RowDefinition Height="30"/>
    </Grid.RowDefinitions>
    <StackPanel Orientation="Horizontal" Margin="5">
        <TextBlock Text="Drive" VerticalAlignment="Center"/>
        <ComboBox x:Name="DrvCombo" Margin="5,0" Width="100" 
                  VerticalContentAlignment="Center"/>
    </StackPanel>
    <ListBox x:Name="FilesList" Grid.Row="1" 
             VirtualizingPanel.IsVirtualizing="True"
             VirtualizingPanel.IsVirtualizingWhenGrouping="True"
             >
        <ListBox.ItemTemplate>
            <DataTemplate>
                <StackPanel Orientation="Horizontal">
                    <TextBlock Text="{Binding FullName}" 
                               Margin="5,0" Width="450"/>
                    <TextBlock Text="{Binding Size,StringFormat=N0}" 
                               Margin="5,0" Width="150" TextAlignment="Right"/>
                    <TextBlock Text="{Binding LastChangeTime, StringFormat=g}" 
                               Margin="5,0" Width="200"/>
                </StackPanel>
            </DataTemplate>
        </ListBox.ItemTemplate>
    </ListBox>
    <TextBlock x:Name="StatusTxt" Grid.Row="2" HorizontalAlignment="Center" Margin="5"/>
</Grid>

We will have a combobox with all drives in the first row and a listbox with the files. The listbox has an ItemTemplate that will show the name, size and date of last change of each file. To fill this data, you will have to add this code in MainWindow.xaml.cs:

public MainWindow()
{
    InitializeComponent();
    var ntfsDrives = DriveInfo.GetDrives()
        .Where(d => d.DriveFormat == "NTFS").ToList();
    DrvCombo.ItemsSource = ntfsDrives;
    DrvCombo.SelectionChanged += DrvCombo_SelectionChanged;
}

private void DrvCombo_SelectionChanged(object sender, 
    System.Windows.Controls.SelectionChangedEventArgs e)
{
    if (DrvCombo.SelectedItem != null)
    {
        var driveToAnalyze = (DriveInfo) DrvCombo.SelectedItem;
        var ntfsReader =
            new NtfsReader(driveToAnalyze, RetrieveMode.All);
        var nodes =
            ntfsReader.GetNodes(driveToAnalyze.Name)
                .Where(n => (n.Attributes & 
                             (Attributes.Hidden | Attributes.System | 
                              Attributes.Temporary | Attributes.Device | 
                              Attributes.Directory | Attributes.Offline | 
                              Attributes.ReparsePoint | Attributes.SparseFile)) == 0)
                .OrderByDescending(n => n.Size);
        FilesList.ItemsSource = nodes;
    }
}

It gets all NTFS drives in your system and fills the combobox. In the SelectionChanged event handler, the reader gets all nodes in the drive. These nodes are filtered to remove all that are not normal files and then ordered descending by size and added to the listbox.

If you run the program you will see some things:

  • If you look at the output window in Visual Studio, you will see these debug messages:
1333.951 MB of volume metadata has been read in 26.814 s at 49.748 MB/s
1324082 nodes have been retrieved in 2593.669 ms

This means that it took 2.6s to read and analyze all files in the disk (pretty fast for 1.3 million files, no?).

  • When you change the drive in the combobox, the program will freeze for some time and the list will be filled with the files. The freezing is due to the fact that you are blocking the main thread while you are analyzing the disk. To avoid this, you should run the code in a secondary thread, like this code:
private async void DrvCombo_SelectionChanged(object sender, 
    System.Windows.Controls.SelectionChangedEventArgs e)
{
    if (DrvCombo.SelectedItem != null)
    {
        var driveToAnalyze = (DriveInfo) DrvCombo.SelectedItem;
        DrvCombo.IsEnabled = false;
        StatusTxt.Text = "Analyzing drive";
        List<INode> nodes = null;
        await Task.Factory.StartNew(() =>
        {
            var ntfsReader =
                new NtfsReader(driveToAnalyze, RetrieveMode.All);
            nodes =
                ntfsReader.GetNodes(driveToAnalyze.Name)
                    .Where(n => (n.Attributes &
                                 (Attributes.Hidden | Attributes.System |
                                  Attributes.Temporary | Attributes.Device |
                                  Attributes.Directory | Attributes.Offline |
                                  Attributes.ReparsePoint | Attributes.SparseFile)) == 0)
                    .OrderByDescending(n => n.Size).ToList();
        });
        FilesList.ItemsSource = nodes;
        DrvCombo.IsEnabled = true;
        StatusTxt.Text = $"{nodes.Count} files listed. " +
                         $"Total size: {nodes.Sum(n => (double)n.Size):N0}";
    }
}

This code creates a task and runs the analyzing code in it, and doesn’t freeze the UI. I just took care of disabling the combobox and putting a warning for the user. After the code is run, the nodes list is assigned to the listbox and the UI is re-enabled.

This code can show you the list of the largest files in your disk, but you may want to analyze it by other ways, like grouping by extension or by folder. WPF has an easy way to group and show data: the CollectionViewSource. With it, you can do grouping and sorting in the ListBox. We will change our UI to add a new ComboBox to show the new groupings:

<Grid>
    <Grid.RowDefinitions>
        <RowDefinition Height="40"/>
        <RowDefinition Height="*"/>
        <RowDefinition Height="30"/>
    </Grid.RowDefinitions>
    <StackPanel Orientation="Horizontal" Margin="5">
        <TextBlock Text="Drive" VerticalAlignment="Center"/>
        <ComboBox x:Name="DrvCombo" Margin="5,0" Width="100" 
                  VerticalContentAlignment="Center"/>
    </StackPanel>
    <StackPanel Grid.Row="0" HorizontalAlignment="Right" Orientation="Horizontal" Margin="5">
        <TextBlock Text="Sort" VerticalAlignment="Center"/>
        <ComboBox x:Name="SortCombo" Margin="5,0" Width="100" 
                  VerticalContentAlignment="Center" SelectedIndex="0" 
                  SelectionChanged="SortCombo_OnSelectionChanged">
            <ComboBoxItem>Size</ComboBoxItem>
            <ComboBoxItem>Extension</ComboBoxItem>
            <ComboBoxItem>Folder</ComboBoxItem>
        </ComboBox>
    </StackPanel>
    <ListBox x:Name="FilesList" Grid.Row="1" 
             VirtualizingPanel.IsVirtualizing="True"
             VirtualizingPanel.IsVirtualizingWhenGrouping="True" >
        <ListBox.ItemTemplate>
            <DataTemplate>
                <StackPanel Orientation="Horizontal">
                    <TextBlock Text="{Binding FullName}" 
                               Margin="5,0" Width="450"/>
                    <TextBlock Text="{Binding Size,StringFormat=N0}" 
                               Margin="5,0" Width="150" TextAlignment="Right"/>
                    <TextBlock Text="{Binding LastChangeTime, StringFormat=g}" 
                               Margin="5,0" Width="200"/>
                </StackPanel>
            </DataTemplate>
        </ListBox.ItemTemplate>
        <ListBox.GroupStyle>
            <GroupStyle>
                <GroupStyle.HeaderTemplate>
                    <DataTemplate>
                        <StackPanel Orientation="Horizontal">
                            <TextBlock Text="{Binding Name}" FontSize="15" FontWeight="Bold" 
                                       Margin="5,0"/>
                            <TextBlock Text="(" VerticalAlignment="Center" Margin="5,0,0,0" />
                            <TextBlock Text="{Binding Items.Count}" VerticalAlignment="Center"/>
                            <TextBlock Text=" files - " VerticalAlignment="Center"/>
                            <TextBlock Text="{Binding Items, 
                                Converter={StaticResource ItemsSizeConverter}, StringFormat=N0}"
                                         VerticalAlignment="Center"/>
                            <TextBlock Text=" bytes)" VerticalAlignment="Center"/>
                        </StackPanel>
                    </DataTemplate>
                </GroupStyle.HeaderTemplate>
            </GroupStyle>
        </ListBox.GroupStyle>
    </ListBox>
    <TextBlock x:Name="StatusTxt" Grid.Row="2" HorizontalAlignment="Center" Margin="5"/>
</Grid>

The combobox has three options, Size, Extension and Folder. The first one is the same thing we’ve had until now; the second will group the files by extension and the third will group the files by top folder. We’ve also added a GroupStyle to the listbox. If we don’t do that, the data will be grouped, but the groups won’t be shown. If you notice the GroupStyle, you will see that we’re adding the name, then the count of the items (number of files in the group), then we have a third TextBox where we pass the Items and a converter. That’s because we want to show the total size in bytes of the group. For that, I’ve created a converter that converts the Items in the group to the sum of the bytes of the file:

public class ItemsSizeConverter : IValueConverter
{
    public object Convert(object value, Type targetType, object parameter, 
        CultureInfo culture)
    {
        var items = value as ReadOnlyObservableCollection<object>;
        return items?.Sum(n => (double) ((INode)n).Size);
    }

    public object ConvertBack(object value, Type targetType, object parameter, 
        CultureInfo culture)
    {
        throw new NotImplementedException();
    }
}

The code for the SelectionChanged for the sort combobox is:

private void SortCombo_OnSelectionChanged(object sender, 
    SelectionChangedEventArgs e)
{
    if (_view == null)
        return;
    _view.GroupDescriptions.Clear();
    _view.SortDescriptions.Clear();
    switch (SortCombo.SelectedIndex)
    {
        case 1:
            _view.GroupDescriptions.Add(new PropertyGroupDescription("FullName", 
                new FileExtConverter()));
            break;
        case 2:
            _view.SortDescriptions.Add(new SortDescription("FullName",
                ListSortDirection.Ascending));
            _view.GroupDescriptions.Add(new PropertyGroupDescription("FullName", 
                new FilePathConverter()));
            break;
    }
}

We add GroupDescriptions for each kind of group. As we don’t have the extension and top path properties in the nodes shown in the listbox, I’ve created two converters to get these from the full name. The converter that gets the extension from the name is:

class FileExtConverter : IValueConverter
{
    public object Convert(object value, Type targetType, object parameter, 
        CultureInfo culture)
    {
        var fileName = value as string;
        return string.IsNullOrWhiteSpace(fileName) ? 
            null : 
            Path.GetExtension(fileName).ToLowerInvariant();
    }

    public object ConvertBack(object value, Type targetType, object parameter, 
        CultureInfo culture)
    {
        throw new NotImplementedException();
    }
}

The converter that gets the top path of the file is:

class FilePathConverter :IValueConverter
{
    public object Convert(object value, Type targetType, object parameter, 
        CultureInfo culture)
    {
        var fileName = value as string;
        return string.IsNullOrWhiteSpace(fileName) ?
            null :
            GetTopPath(fileName);
    }

    private string GetTopPath(string fileName)
    {
        var paths = fileName.Split(Path.DirectorySeparatorChar).Take(2);
        return string.Join(Path.DirectorySeparatorChar.ToString(), paths);
    }

    public object ConvertBack(object value, Type targetType, object parameter, 
        CultureInfo culture)
    {
        throw new NotImplementedException();
    }
}

One last thing is to create the _view field, when we are filling the listbox:

 private async void DrvCombo_SelectionChanged(object sender, 
     System.Windows.Controls.SelectionChangedEventArgs e)
 {
     if (DrvCombo.SelectedItem != null)
     {
         var driveToAnalyze = (DriveInfo) DrvCombo.SelectedItem;
         DrvCombo.IsEnabled = false;
         StatusTxt.Text = "Analyzing drive";
         List<INode> nodes = null;
         await Task.Factory.StartNew(() =>
         {
             var ntfsReader =
                 new NtfsReader(driveToAnalyze, RetrieveMode.All);
             nodes =
                 ntfsReader.GetNodes(driveToAnalyze.Name)
                     .Where(n => (n.Attributes &
                                  (Attributes.Hidden | Attributes.System |
                                   Attributes.Temporary | Attributes.Device |
                                   Attributes.Directory | Attributes.Offline |
                                   Attributes.ReparsePoint | Attributes.SparseFile)) == 0)
                     .OrderByDescending(n => n.Size).ToList();
         });
         FilesList.ItemsSource = nodes;
         _view = (CollectionView)CollectionViewSource.GetDefaultView(FilesList.ItemsSource);

         DrvCombo.IsEnabled = true;
         StatusTxt.Text = $"{nodes.Count} files listed. " +
                          $"Total size: {nodes.Sum(n => (double)n.Size):N0}";
     }
     else
     {
         _view = null;
     }
 }

With all these in place, you can run the app and get a result like this:

Conclusions

As you can see, there is a way to get fast file enumeration for your NTFS disks, but you must have admin privileges to use it. We’ve created a WPF program that uses this kind of enumeration and allows you to group the data in different ways, using the WPF resources. If you need to enumerate  your files very fast, you can consider this way to do it.

The full source code for this article is at https://github.com/bsonnino/NtfsFileEnum

I’m sure that you have had this scenario at least once: you download a file from the internet and try to run it and it doesn’t work fine. If it is a CHM file, the help file doesn’t open the content:

image

 

If it’s an executable, it will show a message telling you that you got it from an unreliable source or, even worse, it stays there and does nothing. A client of mine set me a zip with some files to install. I downloaded the zip, unzipped it and tried to run and nothing happened. The only thing I could do is to open Task Manager and kill the Explorer process.

This is caused by a mechanism Windows has to lock files that come from the internet and can harm your computer. If it is a zipped file and you unzip it with Explorer, all extracted files will be protected with this mechanism. If you right click the file in Explorer, select Properties and check on the “Unlock” box, everything runs fine. The file is unlocked.

image

image

But what is this protection mechanism and how can we check if the file is protected and unprotect it?

The mechanism used is called “Alternate Data Streams” (ADS), and it’s a way that NTFS has to add extra data in a file. This data is not shown when you open a file and is size is not added to the total file size, but the data is there, hidden and taking disk space. If you open a command line prompt window and do a Dir /R command, you can see these alternate streams:

image

In the image above, you can see two lines for each file. The first one is the “real” file, and the second one is the “hidden” file. You can see that the “hidden” file has the same name of the “real” file and a complement after the colon. The Zone.Identifier:$DATA is the alternate stream for the file. It may have any name and can have any size (it can even be larger than the “real” file) and it’s a very effective way to hide information – if you open a file, you won’t see any trace of the alternate data stream. But, with great power comes great responsibility, and that can also be used for evil purposes – it can hide things the user is not aware and would not want in his computer.

For our blocked files, we know that they have a ADS, named Zone.Identifier. But what’s in the files? To show the contents of an ADS, you cannot use the normal tools. You can use Powershell to open them. For example, if you open a Powershell command line (se Window+X, then select “Windows Powershell”, you can type

Get-Content -Path .\Web_Servers_Succinctly.pdf -Stream Zone.Identifier

And it will show the contents of the file:

image

We can see that this file was transferred from Zone=3. From here we can see that 3 is URLZONE_INTERNET, the file came from the internet. So that’s what is blocking the file. It we unblock the file, here’s what we get when we query the streams in the file:

image

We have changed the ZoneId to 4, URLZONE_UNTRUSTED. This is enough to unblock the file.

Manipulating ADSs with C#

Now we know the mechanism behind Windows protection and we can create a C# program that unblocks all files in a directory. There is no API to access file streams in .NET, so you must resort to Win32 P/Invoke. The functions NtfsFindFirstStream and NtfsFindNextStream enumerate all streams in a file. DeleteFile will delete a stream, and CreateFile will create and add the stream.

Fortunately, we don’t have to work with this functions, as a brave developer has created a library to handle ADS in .NET (https://github.com/RichardD2/NTFS-Streams) and we can add it from Nuget. We will create a program to list, show contents or delete alternate streams with C#.

In Visual Studio, create a new Console application. Right click in the References node and select Manage Nuget Packages. There, install Trinet.Core.IO.Ntfs.

image

Then, in Program.cs, we will start to create our program:

static void Main(string[] args)
{
    if (args.Length == 0)
    {
        Console.WriteLine("usage: adsviewer -(l|s|d)  [streamname]");
        return;
    }
    var selSwitch = ProcessSwitches(args);
    if (selSwitch == null)
        return;
    var parameters = args.Where(a => !a.StartsWith("-"));
    string fileName = ".";
    if (parameters.Count() > 0)
        fileName = parameters.First();
    var streamName = parameters.Skip(1).FirstOrDefault();
    if (File.Exists(fileName))
        ProcessFile(fileName, streamName, selSwitch);
    else if (Directory.Exists(fileName))
        foreach (var file in Directory.GetFiles(fileName))
            ProcessFile(file, streamName, selSwitch);
    else
    {
        Console.WriteLine("error: file/folder does not exist");
        return;
    }
}

Initially, we’ve checked if the user passed any parameters. If not, we show the usage help. Then, we process the switch (the option with a hiphen), that can be a “l” (to list the stream names), a “s” (to show the content of the streams or a “d”  (to delete the streams.

Then, we get the file or folder name and the stream name. If the user didn’t pass any file name, we admit he wants to process all the streams in all files in the current directory. If he passes a name, we check if it’s a file or a folder. If it’s a folder, we will process all files in that folder.

If the user passes a stream name, we will only process that stream. If he doesn’t pass a name, we will process all streams in the file.

ProcessSwitches checks the passed switch and sees if its valid:

private static string ProcessSwitches(string[] args)
{
    var switches = args.Where(a => a.StartsWith("-"));
    if (switches.Count() > 1)
    {
        Console.WriteLine("error: you can use only one switch at a time (-l, -s, -d)");
        return null;
    }
    var selSwitch = switches.Count() == 0 ? "l" : switches.First().Substring(1);
    if (!"lsd".Contains(selSwitch))
    {
        Console.WriteLine("error: valid switches: -l(ist), -s(how contents), -d(elete)");
        return null;
    }
    return selSwitch;
}

ProcessFile will process one file. If the user passed a directory name, we enumerate all files in the directory and call ProcessFile for each file.

private static void ProcessFile(string fileName, string streamName, string selSwitch)
{
    switch (selSwitch)
    {
        case "l":
            ListStreams(fileName);
            break;
        case "s":
            ShowContents(fileName, streamName);
            break;
        case "d":
            DeleteStream(fileName, streamName);
            break;
    }
}

Then, we have three kind of processing: if the user wants to list the names, we call ListStreams:

private static void ListStreams(string fileName)
{
    FileInfo file = new FileInfo(fileName);
    Console.WriteLine($"{file.Name} - {file.Length:n0}");
    foreach (AlternateDataStreamInfo s in file.ListAlternateDataStreams())
        Console.WriteLine($"    {s.Name} - {s.Size:n0}");
}

It will enumerate all streams and list their names and sizes. The second processing is to show the contents, in ShowContents:

private static void ShowContents(string fileName, string streamName)
{
    FileInfo file = new FileInfo(fileName);
    Console.WriteLine($"{file.Name} - {file.Length:n0}");
    if (!string.IsNullOrWhiteSpace(streamName) &amp;&amp; file.AlternateDataStreamExists(streamName))
    {
        AlternateDataStreamInfo s = file.GetAlternateDataStream(streamName, FileMode.Open);
        ShowContentsOfStream(file, s);
    }
    else if (string.IsNullOrWhiteSpace(streamName))
        foreach (AlternateDataStreamInfo s in file.ListAlternateDataStreams())
            ShowContentsOfStream(file, s);
}

If the user passed a name of a valid stream, it will show its contents. If it has not passed a name, the program will enumerate all streams and show the contents for them all. The ShowContentsOfStream method is:

private static void ShowContentsOfStream(FileInfo file, AlternateDataStreamInfo s)
{
    Console.WriteLine($"    {s.Name}");
    using (TextReader reader = s.OpenText())
    {
        Console.WriteLine(reader.ReadToEnd());
    }
}

Finally, the third kind of processing is to delete the streams. For that, we use the DeleteStreams method:

private static void DeleteStream(string fileName, string streamName)
{
    FileInfo file = new FileInfo(fileName);
    Console.WriteLine($"{file.Name} - {file.Length:n0}");
    if (!string.IsNullOrWhiteSpace(streamName) && file.AlternateDataStreamExists(streamName))
    {
        file.DeleteAlternateDataStream(streamName);
        Console.WriteLine($"    {streamName} deleted");
    }
    else if (string.IsNullOrWhiteSpace(streamName))
        foreach (AlternateDataStreamInfo s in file.ListAlternateDataStreams())
        {
            s.Delete();
            Console.WriteLine($"    {s.Name} deleted");
        }
}

If the user passed a valid stream name, the program will delete that stream. If he hasn’t passed a name, the program will delete all streams in the file.

That way, we arrived at the end of our program.

Conclusions

ADSs are used in many ways, to add extra data to a file. Although there are legitimate ways to use them, sometimes, that can be used to store harmful data. We have developed a tool that can list all ADSs in a file, show their contents or delete them. With it, you can analyze the data that is attached to your file and decide what to do with it. Next time you get something that came from the internet and is blocked, you can easily unblock it with this tool.

The full source code is in http://github.com/bsonnino/ADSViewer