Backup Revisited

February 23, 2017

After using the program developed in the last post, I was thinking about some ways to optimize it. Then I went to the FileFinder class and saw this:

class FileFinder
{
    public async Task<ConcurrentDictionary<string, List>> GetFiles(string[] paths, 
        Regex excludeFilesRegex, Regex excludePathsRegex, bool incremental)
    {
        var files = new ConcurrentDictionary<string, List>();
        var tasks = paths.Select(path =>
            Task.Factory.StartNew(() =>
            {
                var rootDir = "";
                var drive = Path.GetPathRoot(path);
                if (!string.IsNullOrWhiteSpace(drive))
                {
                    rootDir = drive[0] + "_drive";
                    rootDir = rootDir + path.Substring(2);
                }
                else
                    rootDir = path;
                var selectedFiles = GetFilesInDirectory(path, excludeFilesRegex, excludePathsRegex, incremental);
                files.AddOrUpdate(rootDir, selectedFiles.ToList(), (a, b) => b);
            }));
        await Task.WhenAll(tasks);
        return files;
    }

    private List GetFilesInDirectory(string directory, Regex excludeFilesRegex, 
        Regex excludePathsRegex,bool incremental)
    {
        var files = new List();
        try
        {
            var directories = Directory.GetDirectories(directory);
            try
            {
                var selectedFiles = Directory.EnumerateFiles(directory).Where(f => !excludeFilesRegex.IsMatch(f.ToLower()));
                if (incremental)
                    selectedFiles = selectedFiles.Where(f => (File.GetAttributes(f) & FileAttributes.Archive) != 0);
                files.AddRange(selectedFiles);
            }
            catch
            {
            }
            foreach (var dir in directories.Where(d => !excludePathsRegex.IsMatch(d.ToLower())))
            {
                files.AddRange(GetFilesInDirectory(Path.Combine(directory, dir), excludeFilesRegex, excludePathsRegex, incremental));
            }
        }
        catch
        {
        }

        return files;
    }
}

I pass the filters to the GetFilesInDirectory method and do my filter there. That way, the folders I don’t want aren’t enumerated. For that, I had to make a change in the Config class, adding a new property for the path Regex and initializing it:

public class Config
{
    public Config(string fileName)
    {
        if (!File.Exists(fileName))
            return;
        var doc = XDocument.Load(fileName);
        if (doc.Root == null)
            return;
        IncludePaths = doc.Root.Element("IncludePaths")?.Value.Split(';');
        ExcludeFiles = doc.Root.Element("ExcludeFiles")?.Value.Split(';') ?? new string[0] ;
        ExcludePaths = doc.Root.Element("ExcludePaths")?.Value.Split(';') ?? new string[0];
        BackupFile = $"{doc.Root.Element("BackupFile")?.Value}{DateTime.Now:yyyyMMddhhmmss}.zip";
        ExcludeFilesRegex = new Regex(string.Join("|", ExcludeFiles));
        ExcludePathRegex = new Regex(string.Join("|", ExcludePaths));
    }

    public Regex ExcludeFilesRegex { get; }
    public Regex ExcludePathRegex { get; }
    public IEnumerable IncludePaths { get; }
    public IEnumerable ExcludeFiles { get; }
    public IEnumerable ExcludePaths { get; }
    public string BackupFile { get; }
}

With this changes, I could run the program again and measure the differences. That made a great difference. Before the change, the program was taking 160s to enumerate the files and give me 470000 files. After the change, enumerating the files took only 14.5s to give me the same files (I ran the programs three times each to avoid distortions). That’s a huge difference, no?

Then I started to think a little bit more and thought that the Regex could be compiled. So, I made two simple changes in the Config class:

ExcludeFilesRegex = new Regex(string.Join("|", ExcludeFiles),RegexOptions.Compiled);
ExcludePathRegex = new Regex(string.Join("|", ExcludePaths), RegexOptions.Compiled);

When I ran again the program, it took me 11.5s to enumerate the files. It doesn’ t seem much, but it’s a 25% improvement with just a simple change. That was really good. That way, I have a backup program that enumerates files way faster than before.

All the source code for the project is at https://github.com/bsonnino/BackupData

  • 0

    Poor man’s backup

    February 20, 2017

    Introduction When you have something digital, having backups is something fundamental to keep your data safe. There are many threats over there that can destroy your data: hardware failures, viruses, natural disasters are some of the ways to make all your data vanish from one moment to the other. I use to keep my data […]

  • 0

    Série de vídeos sobre sensores em UWP

    January 18, 2017

    Acabo de concluir uma série de vídeos sobre o uso de sensores em UWP, que publiquei no Channel 9. São vídeos curtos, com até 15 minutos cada. Vale a pena dar uma conferida e ver como usar os sensores disponíveis no Windows 10 (os programas funcionam tanto no desktop, em tablets, como no smartphone Windows […]

  • 0

    Multi-Monitor debugging in Visual Studio

    January 12, 2017

    When you have a multi-monitor device, you usually want to write code in one monitor and debug the program in another one. This is especially helpful when you want to debug some visual interface that is rendered by the code (or debug the Paint event, for WinForms apps). But the program you’re debugging insists to […]

  • 0

    Working with the Surface Dial in UWP

    December 31, 2016

    Not long ago, Microsoft released its new device, the Surface Studio, a powerhorse with a 28” touch screen, with characteristics that put it in the wish list for every geek (https://www.microsoft.com/en-us/surface/devices/surface-studio/overview). With it, Microsoft released the Surface Dial, a rotary wheel that redefines the way you work when you are doing some creative design (https://www.microsoft.com/en-us/surface/accessories/surface-dial). […]