Poor man’s backup

February 20, 2017

Introduction

If you have anything digital, backups are fundamental to keeping your data safe. There are many threats out there that can destroy it: hardware failures, viruses, and natural disasters can all make your data vanish from one moment to the next.

I keep my data in several places (you can never be 100% sure), making both cloud and local backups, and they have saved me more than once. For cloud backups, there are several services out there and I won’t discuss them. For local backups, my needs are very specific, and I don’t need the fancy stuff (disk images, copying locked files, and so on). I just need a backup that has these features:

  • Copies data in compressed form, preferably in a standard format like zip, so I can open the backups with normal tools and don’t need the backup tool itself to restore data.
  • Allows copying selected folders from different drives. I don’t want to copy my entire disk (why keep a copy of the Windows installation, or the installed programs, if I can reinstall them when I need to?).
  • Allows excluding files and folders from the copy (I just want to copy my source code; there is no need to copy the executables and DLLs).
  • Allows incremental (only the changed files) or full (all files in a set) backups.
  • Can use backup configurations (I want to back up only my documents or only my source code, and sometimes both).
  • Can be scheduled to run at specified times without being started manually.

With these requirements, I started to look for backup programs. I found some free ones that didn’t do everything I wanted and some paid ones that did everything, but I didn’t want to pay what they were asking for. So, being a developer, I decided to make my own backup with the free tools out there.

The first requirement is a compressed backup in a standard format. For zip files, I need Zip64, as the backup files can be very large and the classic zip format can’t handle archives larger than 4 GB. So, I decided to use the DotNetZip library (https://dotnetzip.codeplex.com/), an open source library that is very simple to use and supports Zip64 files. The remaining requirements can be handled by a normal .NET console program.

Creating the backup program

In Visual Studio, create a new Console Application. In the Solution Explorer, right-click the References node, select “Manage NuGet Packages” and add the DotNetZip package. I don’t want to write specific code for handling the command line options, so I added a second package, CommandLineParser (https://github.com/gsscoder/commandline), that does this for me. I just have to create a new class with the options I want and it does all the parsing for me:

class Options
{
    [Option(DefaultValue = "config.xml", 
      HelpText = "Configuration file for the backup.")]
    public string ConfigFile { get; set; }

    [Option('i', "incremental", DefaultValue = false,
      HelpText = "Does an incremental backup.")]
    public bool Incremental { get; set; }

    [HelpOption]
    public string GetUsage()
    {
        return HelpText.AutoBuild(this,
          (HelpText current) => HelpText.DefaultParsingErrorsHandler(this, current));
    }
}

To use it, I just have to pass the command line arguments and have it parsed:

var options = new Options();
CommandLine.Parser.Default.ParseArguments(args, options);

It even gives me a --help command line option that prints the usage text.

The next step is to process the configuration file. Create a new class and name it Config.cs:

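using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public class Config
{
    public Config(string fileName)
    {
        if (!File.Exists(fileName))
            return;
        var doc = XDocument.Load(fileName);
        if (doc.Root == null)
            return;
        // Each element holds a ';'-separated list.
        IncludePaths = doc.Root.Element("IncludePaths")?.Value.Split(';') ?? new string[0];
        ExcludeFiles = doc.Root.Element("ExcludeFiles")?.Value.Split(';') ?? new string[0];
        ExcludePaths = doc.Root.Element("ExcludePaths")?.Value.Split(';') ?? new string[0];
        // HH is the 24-hour clock; hh would give a morning and an afternoon backup the same name.
        BackupFile = $"{doc.Root.Element("BackupFile")?.Value}{DateTime.Now:yyyyMMddHHmmss}.zip";
        // Skip empty patterns: an empty alternative would match (and exclude) every file.
        var patterns = ExcludeFiles.Concat(ExcludePaths)
            .Where(p => !string.IsNullOrWhiteSpace(p)).ToArray();
        // "(?!)" never matches, so nothing is excluded when no patterns are configured.
        ExcludeFilesRegex = new Regex(patterns.Length > 0 ? string.Join("|", patterns) : "(?!)");
    }

    public Regex ExcludeFilesRegex { get; }
    public IEnumerable<string> IncludePaths { get; }
    public IEnumerable<string> ExcludeFiles { get; }
    public IEnumerable<string> ExcludePaths { get; }
    public string BackupFile { get; }
}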

To make it easy to select the paths and files to be excluded, I decided to use Regex patterns and combine them into a single Regex that matches every file to be excluded. For example, if you want to exclude all mp3 files, you would add something like “\.mp3$” (a “.”, then mp3 and then the end of the string). If you want to exclude mp3 and mp4 files, you can add this: “\.mp[34]$”. For the paths, you do the same thing, but they start and end with a backslash (written as a double backslash in the regex). The file names are converted to lowercase before they are matched, so the patterns should be written in lowercase.
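To see how such patterns behave, here is a minimal sketch that parses a sample configuration and tests a few file names against the combined Regex. The element names come from the Config class above, but the root element name (Backup), the sample paths and the ExcludeDemo helper are assumptions made just for this illustration:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class ExcludeDemo
{
    // Hypothetical sample configuration; the root element name and the paths are assumptions.
    public const string SampleConfig = @"<Backup>
  <IncludePaths>C:\Projects;D:\Documents</IncludePaths>
  <ExcludeFiles>\.mp[34]$;\.dll$</ExcludeFiles>
  <ExcludePaths>\\bin\\;\\obj\\</ExcludePaths>
  <BackupFile>E:\Backups\Backup</BackupFile>
</Backup>";

    // Joins the ';'-separated patterns of both exclusion elements into one regex.
    public static Regex BuildExcludeRegex(string xml)
    {
        var root = XDocument.Parse(xml).Root;
        var patterns = new[] { "ExcludeFiles", "ExcludePaths" }
            .SelectMany(name => (root.Element(name)?.Value ?? "").Split(';'))
            .Where(p => !string.IsNullOrWhiteSpace(p));
        return new Regex(string.Join("|", patterns));
    }

    // File names are lowercased before matching, so the patterns are lowercase too.
    public static bool IsExcluded(Regex regex, string file)
    {
        return regex.IsMatch(file.ToLower());
    }
}
```

With this sample, C:\Projects\App\bin\Debug\App.exe is excluded because its path contains \bin\, C:\Music\Song.MP3 is excluded by the mp3/mp4 pattern, and C:\Projects\App\Program.cs is kept.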

With this in place, we can start our backup. Create a new class and call it Backup.cs. Add this code to it:

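using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Ionic.Zip;

class Backup
{
    private readonly FileFinder _fileFinder = new FileFinder();

    // Returns Task<int> so the caller can wait for the result (0 when the backup finishes).
    public async Task<int> DoBackup(Config config, bool incremental)
    {
        var files = await _fileFinder.GetFiles(config.IncludePaths.ToArray(),
               config.ExcludeFilesRegex, incremental);
        using (ZipFile zip = new ZipFile())
        {
            // Use Zip64 only when the archive needs it.
            zip.UseZip64WhenSaving = Zip64Option.AsNecessary;
            // Key: folder name inside the zip; Value: files found under that search path.
            foreach (var path in files)
                zip.AddFiles(path.Value, false, path.Key);
            zip.Save(config.BackupFile);
        }
        // Clear the Archive bit only after the zip has been saved.
        foreach (var file in files.SelectMany(f => f.Value))
            ResetArchiveAttribute(file);
        return 0;
    }

    public void ResetArchiveAttribute(string fileName)
    {
        var attributes = File.GetAttributes(fileName);
        File.SetAttributes(fileName, attributes & ~FileAttributes.Archive);
    }
}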

This class uses a FileFinder class to find all files that match the pattern we want and creates a zip file. The GetFiles method from FileFinder returns a dictionary structured like this:

  • The key is a path derived from the search path. The search paths can be on any disk of your system and can have the same names (e.g. C:\Temp and D:\Temp), which would clash in the zip file, so each path keeps its structure but the drive is renamed to a folder. That way, if I am searching in C:\Temp and in D:\Temp, the keys for this dictionary are C_drive\Temp and D_drive\Temp, and both paths are stored in the zip without clashing. These keys are used as the folder names when adding the files to the zip.
  • The value is a list of the files found in that path.
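The key naming can be sketched in isolation. This is only an illustration: the ArchivePaths class and the ToArchiveRoot name are hypothetical, and the drive is detected by hand (a letter followed by a colon) instead of with Path.GetPathRoot, so that the sketch behaves the same on any operating system:

```csharp
using System;

public static class ArchivePaths
{
    // Maps a search path to the folder used inside the zip: C:\Temp becomes C_drive\Temp.
    // Paths without a drive letter are returned unchanged.
    public static string ToArchiveRoot(string path)
    {
        if (path.Length >= 2 && char.IsLetter(path[0]) && path[1] == ':')
            return path[0] + "_drive" + path.Substring(2);
        return path;
    }
}
```

Calling ToArchiveRoot with C:\Temp and D:\Temp gives two different folders, so both can live in the same zip file.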

The files are added to the zip and, after that, their Archive bit is reset. This must be done so the next incremental backup can work: incremental backups are based on the Archive bit. If it’s set, the file was modified and should be backed up. If not, the file was untouched. This is not a foolproof method, but it works fine for most cases. A more robust way would be to keep a log file with the last modified dates of the files and compare them with the current file dates, updating the log on every backup. For my case, I think that this is too much and the archive bit is enough.
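Just to show what the log-based alternative could look like, here is a minimal sketch of the comparison step. It is not part of this project: the BackupManifest class and the ChangedFiles method are hypothetical, and the two dictionaries (file name to last modified date) are assumed to come from the saved log and from a fresh scan of the disk:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BackupManifest
{
    // Returns the files that are new, or whose last modified date differs from the
    // date recorded in the previous backup's log.
    public static List<string> ChangedFiles(
        IDictionary<string, DateTime> previous,
        IDictionary<string, DateTime> current)
    {
        return current
            .Where(entry => !previous.TryGetValue(entry.Key, out var saved) || saved != entry.Value)
            .Select(entry => entry.Key)
            .OrderBy(name => name, StringComparer.OrdinalIgnoreCase)
            .ToList();
    }
}
```

The returned list would be the set of files for the incremental backup, and the current dictionary would be saved as the new log after the zip is written.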

The FileFinder class looks like this:

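using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

class FileFinder
{
    public async Task<ConcurrentDictionary<string, List<string>>> GetFiles(string[] paths,
        Regex regex, bool incremental)
    {
        var files = new ConcurrentDictionary<string, List<string>>();
        // One task per search path; each task stores its result under its own key.
        var tasks = paths.Select(path =>
            Task.Factory.StartNew(() =>
            {
                var rootDir = "";
                var drive = Path.GetPathRoot(path);
                if (!string.IsNullOrWhiteSpace(drive))
                {
                    // "C:\Temp" becomes "C_drive\Temp" (assumes a drive-letter path).
                    rootDir = drive[0] + "_drive";
                    rootDir = rootDir + path.Substring(2);
                }
                else
                    rootDir = path;
                // Names are lowercased before matching, so the patterns must be lowercase.
                var selectedFiles = GetFilesInDirectory(path).Where(f =>
                     !regex.IsMatch(f.ToLower()));
                if (incremental)
                    selectedFiles = selectedFiles.Where(f => (File.GetAttributes(f) & FileAttributes.Archive) != 0);
                files.AddOrUpdate(rootDir, selectedFiles.ToList(), (a, b) => b);
            }));
        await Task.WhenAll(tasks);
        return files;
    }

    private List<string> GetFilesInDirectory(string directory)
    {
        var files = new List<string>();
        try
        {
            var directories = Directory.GetDirectories(directory);
            try
            {
                files.AddRange(Directory.EnumerateFiles(directory));
            }
            catch
            {
                // Files that cannot be listed (e.g. access denied) are skipped.
            }
            foreach (var dir in directories)
            {
                // GetDirectories already returns each path prefixed with the parent directory.
                files.AddRange(GetFilesInDirectory(dir));
            }
        }
        catch
        {
            // Directories that cannot be read are skipped.
        }

        return files;
    }
}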

The main method of this class is GetFiles. It is asynchronous and creates a new task for every search path. The result is a ConcurrentDictionary, and it has to be, because several threads update it at once and a plain Dictionary could be corrupted by concurrent writes. The ConcurrentDictionary handles the locking when data is added from different threads.
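The pattern of one task per path feeding a shared ConcurrentDictionary can be reduced to a few lines. In this sketch the ParallelScan class, the ScanAll method and the scan delegate are hypothetical stand-ins for the real directory search, and Task.Run is used in place of Task.Factory.StartNew:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelScan
{
    // Starts one task per root and stores each task's result under its own key.
    public static async Task<ConcurrentDictionary<string, List<string>>> ScanAll(
        IEnumerable<string> roots, Func<string, List<string>> scan)
    {
        var results = new ConcurrentDictionary<string, List<string>>();
        var tasks = roots.Select(root => Task.Run(() =>
        {
            // The indexer of ConcurrentDictionary is safe to call from several threads.
            results[root] = scan(root);
        }));
        // Wait until every task has stored its result.
        await Task.WhenAll(tasks);
        return results;
    }
}
```

Each task writes a different key here, but the dictionary stays consistent even when the writes happen at the same time, which is what a plain Dictionary cannot guarantee.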

GetFilesInDirectory finds all files in one directory and its subdirectories. After all files are found, they are filtered according to the Regex and, if the user asks for an incremental backup, only the files with the archive bit set are kept. With this set of files, I can add them to the zip and have a backup file that can be read with standard programs.

Just one requirement remains: to have a scheduled backup. I could make the program stay in the system tray and fire the backup at the scheduled times, but there is an easier way to do it: use the Windows Task Scheduler. You just need to open a command prompt and type the command:

schtasks /create /sc daily /st "08:15" /tn "Incremental Backup" /tr "D:\Projetos\Utils\BackupData\BackupData\bin\Debug\Backupdata.exe -i"

That will create a scheduled task that will run the incremental backup every day at 8:15. The main program for this backup is very simple:

static void Main(string[] args)
{
    var options = new Options();
    // Stop if the command line arguments could not be parsed.
    if (!CommandLine.Parser.Default.ParseArguments(args, options))
        return;
    if (string.IsNullOrWhiteSpace(options.ConfigFile))
        return;
    // A bare file name is looked up in the folder of the executable.
    if (string.IsNullOrWhiteSpace(Path.GetDirectoryName(options.ConfigFile)))
    {
        var currentDir = Path.GetDirectoryName(System.Reflection.Assembly.GetExecutingAssembly().Location);
        if (!string.IsNullOrWhiteSpace(currentDir))
            options.ConfigFile = Path.Combine(currentDir, options.ConfigFile);
    }
    var config = new Config(options.ConfigFile);
    var backup = new Backup();
    // Result blocks until the backup task has finished.
    var result = backup.DoBackup(config, options.Incremental).Result;
}

I parse the arguments, read and parse the config file, run the backup and exit. As you can see, the last line reads the Result of DoBackup. This is because the Main method cannot be async and, if I just called DoBackup without waiting for it, the program would exit without finishing the backup. Reading Result makes the program wait for the task to complete.
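The blocking call can be tried out on its own with a small sketch. The SyncOverAsync class and its methods are hypothetical, with Task.Delay standing in for the backup work. It uses GetAwaiter().GetResult(), which waits like Result does but rethrows the original exception instead of wrapping it in an AggregateException:

```csharp
using System.Threading.Tasks;

public static class SyncOverAsync
{
    // Stand-in for an asynchronous operation such as the backup.
    public static async Task<int> DoWorkAsync()
    {
        await Task.Delay(50);
        return 0;
    }

    // Blocks the calling thread until the task completes and returns its result.
    public static int RunAndWait()
    {
        return DoWorkAsync().GetAwaiter().GetResult();
    }
}
```

Blocking like this is fine in a console program, where there is no UI thread waiting on the same task.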

Just one issue here: if you wait for the scheduled task to fire, you will see a console window appear, and we don’t want that to happen while we are doing something else. One way to hide it is to go to the project properties and set the output type to Windows Application. That is enough to hide the console window.

Conclusions

As you can see, it’s not too difficult to make a backup program using the open source tools we have available. This program is very flexible, small and not intrusive. It can run any time you want and can use different configuration files. Not bad, huh?

The full source code for the project is at https://github.com/bsonnino/BackupData
