Analyzing disk space with Excel

Introduction

Nowadays you have a lot of disk space available, it is common to find disks with 4 terabytes and we don’t have to worry with what we store on them. That is not entirely true, because:

  • No matter the size of the disk, we will always be able to fill it.
  • Although normal disks are very large, it is being more common to use SSD disks for boot and these ones have a very limited size, due to their cost.
  • With little disk space, the disk access speed slows down, due to fragmentation.
  • Our computers are an extension of our home and we should not leave it a complete mess, we should keep our disks clean.

One way to manage the disk space is to use specialized tools like my utility DiskPiePro (http://www.pcmag.com/article2/0,2817,2354473,00.asp). Although a utility may be a very good alternative, you have another way, very good and flexible to do it: use PowerShell and Excel to do it.

PowerShell is, according to Wikipedia, “Microsoft’s task automation framework, consisting of a command-line shell and associated scripting language built on .NET Framework. PowerShell provides full access to COM and WMI, enabling administrators to perform administrative tasks on both local and remote Windows systems”. With it, you can easily access your system’s data and many tools expose their information to PowerShell. With it, you can get your disk information and export it to a comma separated value file that Excel can read and analyze. This is a very powerful solution, as we will see now.

Creating a PowerShell script

To work with PowerShell, we must open a PowerShell window. You can open the start menu in Windows and type PowerShell. You can open a PowerShell prompt window (similar to a command prompt) or open the PowerShell ISE, an IDE for PowerShell development. I prefer to use the ISE, as it has many resources, including Intellisense that completes commands and shows the options available for every command.

If you are using Windows 8, PowerShell ISE is hidden. You will not see it on the start menu. You must open an Explorer Window, go to c:\windows\system32\WindowsPowerShell\V1.0 and open the file PowerShell_ise.exe.

Using the Get-ChildItem (by convention, PowerShell commands are composed by a verb and a noun) we can get the list of the files of the current folder (just as a side note, the Get-ChildItem command has two aliases, dir and ls – you can get what it does, no?). To get the list of files in all the subfolders, we use the parameter recurse. To avoid the cancellation of the command when you have no access to a folder we use the parameter ErrorAction. That way, the full command to get the list of files in the current folder and subfolders is:

Get-ChildItem -recurse -ErrorAction SilentlyContinue

Until now, we did not do anything that you could do with a simple dir command, but one of the things that make PowerShell powerful is the ability to compose commands. We want to filter our files, eliminating those that are small, so they do not interfere with our analysis. We can do that using piping: we get the output of a command and use it as the input for the next one, using the “|” character. The command to filter data is Where-Object. To show just the files with more than 1MB, we use the command:

Where-Object Length -gt 1MB

This syntax is only valid for PowerShell 3 (if you do not have PowerShell 3 installed, you can filter using this command: Where-Object {$_.Length -gt 1MB}).

The command Get-ChildItem returns a set of objects that correspond to the files, with many properties. We do not want all of them (if you want to see what the available properties are, you can use the command Get-ChildItem | Get-Member), so we will filter the shown properties with Select-Object:

Select-Object -Property Name, Length, DirectoryName, FullName, Extension

At last, we will export the resulting data to a csv file, with Export-Csv:

Export-Csv LargestFiles.csv –NoTypeInformation -UseCulture

By composing all commands, we can create a small script that gets all files with more than 1MB and writes a csv file with the list. The script below does just that. I have modified the script a bit to parameterize the folder to start the search the minimum size and the name of the csv file:

[CmdletBinding()]

param
(
[string]$initialpath =
‘.’,

[int]$minimumsize =
1MB,

[string]$outputfile =
‘LargeFilesReport.csv’

)

Get-ChildItem
$initialpath
-Recurse
-ErrorAction
“SilentlyContinue”
|

Where-Object {$_.Length -gt $minimumsize} |

Select-Object
-Property
Name, Length, DirectoryName, FullName, Extension
|

Export-Csv
$outputfile
-NoTypeInformation -UseCulture

If you save the script as Get-LargestFiles.ps1 (remember, by convention, PowerShell scripts have a verb and a noun), we can call it to get all files larger than 10MB in the current folder and create a file named LargeFiles.csv with the commands:

.\Get-LargestFiles –initialpath . –minimumsize 10MB –outputfile LargeFiles.csv

Or

.\Get-LargestFiles . 10MB LargeFiles.csv

You can enter the parameters in any order if you add the parameter name before them or you must enter them in order, then you can remove the parameter names. You can also omit some of the parameters and the script will use the default values. If you get an access denied error when executing this script, probably is because PowerShell restricts script execution by default. You must open a PowerShell window with admin rights and execute the command Set-ExecutionPolicy RemoteSigned that allows the execution of any local files, but only signed remote files. You must do this procedure only once.

As you can see, PowerShell is very powerful, much more than a simple scripting language (anyone remember of DOS batch files?) and allows obtaining many information about the machine, it is worth knowing it better.

Now we already have the csv file, let us analyze it with Excel. If you have the csv extension associated with Excel, you can open it directly in PowerShell, with

Invoke-Item .\LargeFilesReport.csv

This command opens Excel with the list of largest files open in it.

Analyzing the report file with Excel

Arranging the file

When you open the file, Excel reads, interprets the data and adds each property to one column. We can change the size of each column to view the data better. You can do it selecting the entire worksheet, by clicking on the top border, to the left of the A column and selecting Format/AutoFit Column Width, or with a double click on the top border that divides two columns to fit just one column. Another way to resize a column is to drag and drop the column border for the column we want to resize.

The next step is to sort the worksheet, so that the largest files are in the top. With the worksheet selected, click on Sort & Filter and select Custom Sort. Sort by the Length column, on the Largest to Smallest order.

Next, we will format the lengths of the files. Click on the cell just below the Length title and press [Shift] + [End] + [Down-arrow]. That will select all file sizes. We will also name this range to make it easier to work with it: on the top bar, to the left of the data entry box, where there is the current cell name indicator, type Sizes and press [Enter]. Naming the range in that way allows us to access the range with this name: one you select Size in the cell indicator box, you are selecting the whole range.

We want to show the file sizes the same way Explorer does, using GB for files larger than 1GBm=, MB for files larger than 1MB and KB for files larger than 1KB. The first idea that came to my mind was to create a hidden column and divide the values by 1,000, 1,000,000 or 1,000,000,000,000, depending on the file size. There is a simpler way to do it: using custom formatting. With the file sizes range selected, right click and select Format Cells. Select the Custom format and type this format:

[<1000000]0.00,” KB”;[<1000000000]0.00,,” MB”;0.00,,,” GB”

We are using here the US regional settings. If you are in another region with different settings, like Europe or South America, you should change the “.” With the “,” and vice-versa.

This format is very different from what we are used to, but it is very powerful. Let us explain it step-by-step: the format has three parts, separated by “;”. On the first, [<1000000]0.00,” KB”, Excel verifies if the number is smaller than 1,000,000. If it is, then it applies the 0.00,” KB” format. The “,” at the end of the format makes that the number is divided by 1000. The result is shown with two decimals and the string “KB” at the end. The second part is similar to the first, but we use two commas at the end to divide the number by 1,000,000 and add the “MB” suffix. The third part, with three commas, divides the number by 1,000,000,000 and adds “GB” to the end. It is easy when you know it, no?

The first question that comes to mind when we are managing our disks is how much space the largest files are consuming and how many files have more than 1MB. Add two lines above the title line and, on cell A1 put the label “Total”. On cell B1, add the formula =SUM(Sizes). On cell C1 add the formula =COUNT(Sizes). To format the number of files, right-click cell B1 and select Format Cells and the Custom format. On the formatting, type #,##0 “Files”.

Next, we will make the top rows fixed. Select the cell below the title (A4) and, on the View tab, click in Freeze Panes/Freeze Panes. That way, the first three rows are frozen and are shown even when we scroll the table down.

 

Figure 1 – Worksheet formatted with total space and number of files

Our formatting will be finished with an indication of file size, so we can locate easily the files that interest to us. We will use to do that the conditional formatting. This kind of formatting allows highlighting our data in many ways, with colors, graphs or icons.

We will use the conditional formatting with icons, using a red icon for files with more than 100MB, yellow for files larger than 10MB and green for the smaller ones. Position the cursor in the cell indicator combobox and select the Sizes range. Next, go to the Home tab and select Conditional Formatting/Icon Sets, and then select the three circles icon set. By default, this formatting add green icons to the first 33%, yellow on the next 33% and red on the last 33%. Although we want something similar to that, it is not what we want. We click again in Conditional Formatting and select Manage Rules, with a double click in the rule to change it. We change the first icon to red and the last one to green. Then, we change the Type of the first two icons to Number and, finally, change the value for the first icon to 100000000 and for the second icon to 10000000. That way, files with more than 100MB will be marked with a red icon, the ones with sizes between 10MB and 100MB will be marked with a yellow icon and the smaller ones with a green icon. When we click OK, we can see something like Figure 2.

Figure 2 – Worksheet with conditional formatting

Total by Extension

Another way to analyze our disk is to know which extensions are filling it. We must create a new worksheet, called “Extensions”. There we will add all extensions found and the space occupied by them. In the cells A1 to C1 add the titles: “Extension”, “Size” and “Quantity”.

Then, copy the extension from column E in the first worksheet to the cell A2 on the new one. With the extensions selected, go to the Data tab and select Remove Duplicates. The dialog box asks which columns you want to use for the duplicates. Select column A and click OK. Excel then shows how many duplicates it has removed.

To get the total space occupied by each extension, we must use the SUMIF function. This function uses a criterion to sum the data. If it is true, the data is added to the total. If not, nothing is added. This function has three parameters:

  • The value range – in our case, these are the extensions in the first worksheet.
  • The criterion, that is, the extension that we want to match. For the first row, that is cell A2
  • The sum interval, in our case the Sizes range

The formula that goes in cell B2 is =SUMIF(LargeFilesReport!$E$4:$E$1451;A2;Sizes). Copy this formula for all extensions, so you can see the size occupied by all extensions. To know the number of files in each extension we use the COUNTIF function. It is similar to the SUMIF function, but only has two parameters, the interval and the criterion. The formula to be used is =COUNTIF(LargeFilesReport!E4:E1449;A2). Copying this formula, we can know the number of files for each extension.

We will sort the data by size, so we can know which extensions use more space. Select the three columns and, on then Home tab, select Sort and Filter/Custom Sort. Select the size sort, from the largest to the smallest.

To illustrate the data, we will add a pie chart that shows the used disk space by extension. Select the data of the first two columns (by clicking on cell A1 and pressing [Shift+End] [Down-Arrow] [Right-Arrow] and, on the Insert tab, select a Pie 2D chart, Pie of Pie. Excel inserts a chart similar to Figure 3, allowing you to see which extensions use most of your disk. On my case, we can see that the extensions bak, dll, lib, plg e exe use more than ¾ of the total space!

 

Figure 3- Table of sizes by extension with chart

ABC Curve

We already have our list sorted, but we do not know the files that we must delete – to clean all 11GB, we should delete all 1400 files, and that is a lot of work. Let us optimize the work and focus on the files that really matter. Using the 80-20 rule (http://en.wikipedia.org/wiki/Pareto_principle), 20% of the files should use about 80% of the disk space. Let us see if this really happens here, by creating an ABC curve. Create a new worksheet and rename it to “ABC Curve”. Select the title of the column FullName on the first worksheet and press [Shift] +[End] + [Down-Arrow] to select the file names and press [Ctrl] + [C] to copy. Return to the new worksheet and paste the data clicking on the A1 cell and pressing [Ctrl] + [V]. On the B1 cell, put the title “Accumulated %”.

The formula we will create shows the accumulated space percentage for every file. For example, the first file uses 3.77GB/11.34GB, or 33.25% of the used space. The second one uses 266.67MB/11.34GB, or 2.35% of the used space, that added to the previous one, sum to 35.60% of the used space. We keep summing all accumulated percentages in this column and see the percentage until it reaches 80%. When it reaches 80%, we have the last file we need to focus and leave the rest of the files untouched.

To create this formula, we go to cell B2 of the new worksheet and use the function =SUM(). Go to the first worksheet and select cell B3 and press “:” and select B4 again. We want to fix the first value, adding always from cell B4 to the current cell when we copy, so we go to the first B4 in the formula and press [F4] to make it fixed. Excel puts a “$” in the line and in the column, to indicate that this value is fixed and will not change when it is copied. We type the closing parenthesis in the formula and type “/”. Then, we go to the total value in the first worksheet, select it and press [F4] to make it a fixed value. The resulting formula is:

=SUM(LargeFilesReport!$B$4:LargeFilesReport!B4)/LargeFilesReport!$B$1

Pressing [Enter], the value for the percentage of the first file is shown. We can format if as a percentage right clicking the mouse and selecting Format Cells, then Percentage. When we select a cell, a small square is shown in the right bottom border of the selection. Click on the square and drag it to the front of the last file name. When we release the mouse button, the formula is copied for all selected cells. We can now verify where our limit is. On my file, that is shown on line 278, in the 277th file, from the total of 1446 files. That is 19.16% of the files, very close from 20%, no?

We can finish this worksheet with a chart. Select cell B1 and press [Shift] + [End] + [Down-Arrow] to select the percentages. Excel 2013 shows chart suggestions. Select the Chart suggestion and insert a Line chart. A chart similar to Figure 4 is shown.

Figure 4 – Worksheet with ABC Curve

I did a small change in this chart. When you add it, the percentage axis goes to 120%. As I do not want percentages over 100%, I clicked on the axis to select it, selected Format Axis and put the maximum value to 1.

We can finish the worksheet, making it more flexible, by using the Auto-Filter resource. Select the data from the first worksheet and, in the Data tab, click in Filter. Excel shows a button in every column title, where we can filter or sort the data by any column.

When we use the Auto-Filter, the sum we have put in the first line does not match the filtered values, it always matches the total values. To get the values for the filtered data, we use the SUBTOTAL function. On cell A2, put the title “Filtered Total” and, on cell B2, put the function =SUBTOTAL(9;Sizes). That makes that the cell shows the sum of the filtered records. On cell C2, put the formula =SUBTOTAL(3;Sizes), to show the quantity of filtered records. On Figure 5, I’ve used the Auto-Filter to filter all the files with names starting with “D:\Projetos” (D:\Projects). That way, I can know that in my projects folder I have 558 files (about 1/3 of the total), using 6.18GB (a bit more than half of the used space).

Figure 5 – Worksheet with filtered data with Auto-Filter

Conclusions

We could see in this article a little bit of the power and flexibility of Excel. By combining it with other tools, like PowerShell, we multiply that power. Many tools export their data to the csv format, and can be read and analyzed by Excel.

We saw here that, with the PowerShell command output to csv, we could read the data in Excel and analyze them in many ways. This is very good, allows that the user customize the way she wants to see the data, easily getting reports or charts with no need of any kind of programming (did you notice that we did not use any kind of macro language, just Excel commands?).

 

 

Leave a Reply

Your email address will not be published. Required fields are marked *


*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>