Clear As Mud



Normal
0


false
false
false







MicrosoftInternetExplorer4










A question on one of the forums recently asked about hiding sheets in a workbook to be posted as an example, but where that workbook contained sensitive information. As I replied, hiding the sheets doesn’t really hide the data so he would be just as exposed.


Jimmy Pena has a a great application for generating random data, but sometimes it is just better to scramble the data that you already have. I suggested to the poster that they should really be scrambling the data, and further suggested that it should not be too difficult to write some code to do so.


That got me thinking as to how to do it, in a generic manner. It should cater for names, ids, amounts, currencies and so on.


A Simple Scrarmbling App


My first thought was to insert a row at the top of the target sheet, and add an entry for all columns to be obfuscated. So, for instance, if it had ‘Name’ at the top, each entry in that column would be changed to Name #1, Name #2 etc. If it had Number<7>, you would generate a random number up to 9,999,999. If it had Decimal<5,2> you would generate numbers up 99,999.99, and so on. If the value in row 1 was blank, that column gets ignored.


So you would insert a row with values such as ‘Name’, ‘Number<7>’, and so on.


The code for this was relatively simple to write, just a loop through each column and change the values. To ensure that all like values get changed to the same value, I did a Replace on the whole column once a value to be changed was found. In other words, if Bob appeared 100 times in the column, each instance of Bob would be changed to the same value. To ensure that I didn’t then go and change the second, changed, instance of Bob to some new value, I pre-pended the new value with ‘Obfuscate’, which I stripped off with a Replace at the end.


The main loop was like this


            mpDataType = .Cells(1, j).Value


            mpNextItem = 0


       


            If mpDataType <> “” Then


                mpLastRow = .Cells(.Rows.Count, j).End(xlUp).Row


 


                For i = 2 To mpLastRow




                    If Not .Cells(i, j).Value Like “Obfuscated*” Then


                   


                        mpNextItem = mpNextItem + 1


                        Select Case True


                       


                            Case mpDataType Like “Number<*”


                                ‘number processing


                           


                            Case mpDataType Like “Decimal<*”


                                ‘decimal processing


                       


                            Case mpDataType Like “Currency<*”


                               ‘currency processing


                               


                            Case Else                                


                                .Columns(j).Replace What:=.Cells(i, j).Value, _


                                                    Replacement:=”Obfuscated” & mpDataType & ” #” & mpNextItem, _


                                                    LookAt:=xlWhole, _


                                                    SearchOrder:=xlByRows


                        End Select


                    End If


                Next i


 


            .UsedRange.Replace What:=”Obfuscated”, _


                           Replacement:=””, _


                           LookAt:=xlPart, _


                           SearchOrder:=xlByRows


 


So far, so good, but I could see one major problem. If there is a formula that refers to data somewhere else on the spreadsheet, that table needs to be obfuscated too, but in a smart way. For instance, say that there is a lookup table of names Bob, Simon and Alex. If there is a formula somewhere of =VLOOKUP(A20,lookup_table,2,FALSE), and A20 is one of those values in the table, then that value in the lookup table should change to the same value that A20 switches too. Unfortunately, it isn’t only VLOOKUP, it is LOOKUP, HLOOKUP, COUNTIF, and so on. Tough!


It is actually worse if the values are used elsewhere a simple cell reference or by a code event update, there is no way in my code to recognise that.


Obfuscator


In the end, I decided to avoid this route, far too difficult, and I opted to save all the before values, and all of after values in separate arrays, and after having removed the ‘Obfuscated’ tag I went through each sheet and checked if any of the before values still remained, if so I replaced them.


A bit brute force, but it seems to work okay.


The data is first changed like so


 


                            Case Else


                                mpIdxChange = mpIdxChange + 1


                                mpBefores(mpIdxChange) = .Cells(i, j).Value


                                mpAfters(mpIdxChange) = mpDataType & ” #” & mpNextItem


                                If mpIdxChange Mod 1000 = 0 Then


                                


                                    mpSizeArray = UBound(mpBefores) + 1000


                                    ReDim Preserve mpBefores(1 To mpSizeArray)


                                    ReDim Preserve mpAfters(1 To mpSizeArray)


                                End If


                               


                                .Columns(j).Replace What:=.Cells(i, j).Value, _


                                                    Replacement:=”Obfuscated” & mpAfters(mpIdxChange), _


                                                    LookAt:=xlWhole, _


                                                    SearchOrder:=xlByRows


 And finally ‘corrected’ like so


 


        ‘now make the changes for names, ids, etc.


        ReDim Preserve mpBefores(1 To mpIdxChange)


        ReDim Preserve mpAfters(1 To mpIdxChange)


       


        ‘first remove the tag


        .UsedRange.Replace What:=”Obfuscated”, _


                       Replacement:=””, _


                       LookAt:=xlPart, _


                       SearchOrder:=xlByRows


                      


        For Each mpWS In ActiveWorkbook.Worksheets


        


            ‘then update any associated values


            For i = 1 To mpIdxChange


           


                mpWS.UsedRange.Replace What:=mpBefores(i), _


                                       Replacement:=mpAfters(i), _


                                       LookAt:=xlWhole, _


                                       SearchOrder:=xlByColumns


            Next i


        Next mpWS


       


        .Rows(1).Delete


Of course, it can be taken a lot further, adding further sophistication. For example, it doesn’t explicitly cater for an Excel spreadsheet that is constructed as a pseudo-database, separate, linked tables. And it probably needs a decent progress bar  as it could take some time on large data.


An addin with the code and an example file can be downloaded via the RSS feed. The addin adds an item to the Data menu.


 

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>