Clear As Mud

Normal
0

false
false
false

MicrosoftInternetExplorer4


A question on one of the forums recently asked about
hiding sheets in a workbook to be posted as an example, but where that workbook contained sensitive information. As I replied, hiding the sheets doesn’t really
hide the data so he would be just as exposed.

Jimmy Pena has a a great application for generating
random data
, but sometimes it is just better to scramble the data that you already
have. I suggested to the poster that they should really be scrambling the data,
and further suggested that it should not be too difficult to write some code to
do so.

That got me thinking as to how to do it, in a generic
manner. It should cater for names, ids, amounts, currencies and so on.

A Simple Scrarmbling
App

My first thought was to insert a row at the top of the
target sheet, and add an entry for all columns to be obfuscated. So, for
instance, if it had ‘Name’ at the top, each entry in that column would be
changed to Name #1, Name #2 etc. If it had Number<7>, you would generate
a random number up to 9,999,999. If it had Decimal<5,2> you would
generate numbers up 99,999.99, and so on. If the value in row 1 was blank, that
column gets ignored.

So you would insert a row with values such as ‘Name’, ‘Number<7>’,
and so on.

The code for this was relatively simple to write, just a
loop through each column and change the values. To ensure that all like values
get changed to the same value, I did a Replace on the whole column once a value
to be changed was found. In other words, if Bob appeared 100 times in the
column, each instance of Bob would be changed to the same value. To ensure that
I didn’t then go and change the second, changed, instance of Bob to some new
value, I pre-pended the new value with ‘Obfuscate’, which I stripped off with a
Replace at the end.

The main loop was like this

            mpDataType
= .Cells(1, j).Value

            mpNextItem
= 0

       

            If
mpDataType <> “” Then

               
mpLastRow = .Cells(.Rows.Count, j).End(xlUp).Row

 

                For i
= 2 To mpLastRow


                    If
Not .Cells(i, j).Value Like “Obfuscated*” Then

                   

                       
mpNextItem = mpNextItem + 1

                       
Select Case True

                       

                           
Case mpDataType Like “Number<*”

                                ‘number
processing

                           

                           
Case mpDataType Like “Decimal<*”

                                ‘decimal processing

                       

                           
Case mpDataType Like “Currency<*”

                               ‘currency
processing

                               

                           
Case Else                                

                               
.Columns(j).Replace What:=.Cells(i, j).Value, _

                                                   
Replacement:=”Obfuscated” & mpDataType & ”
#” & mpNextItem, _

                                                   
LookAt:=xlWhole, _

                                                   
SearchOrder:=xlByRows

                       
End Select

                   
End If

                Next i

 

           
.UsedRange.Replace What:=”Obfuscated”, _

                          
Replacement:=””, _

                          
LookAt:=xlPart, _

                          
SearchOrder:=xlByRows

 

So far, so good, but I could see one major problem. If there
is a formula that refers to data somewhere else on the spreadsheet, that table
needs to be obfuscated too, but in a smart way. For instance, say that there is
a lookup table of names Bob, Simon and Alex. If there is a formula somewhere of
=VLOOKUP(A20,lookup_table,2,FALSE), and A20 is one of those values in the table,
then that value in the lookup table should change to the same value that A20
switches too. Unfortunately, it isn’t only VLOOKUP, it is LOOKUP, HLOOKUP,
COUNTIF, and so on. Tough!

It is actually worse if the values are used elsewhere a
simple cell reference or by a code event update, there is no way in my code to
recognise that.

Obfuscator

In the end, I decided to avoid this route, far too
difficult, and I opted to save all the before values, and all of after values in
separate arrays, and after having removed the ‘Obfuscated’ tag I went through
each sheet and checked if any of the before values still remained, if so I
replaced them.

A bit brute force, but it seems to work okay.

The data is first changed like so

 

                           
Case Else

                                mpIdxChange =
mpIdxChange + 1

                               
mpBefores(mpIdxChange) = .Cells(i, j).Value

                               
mpAfters(mpIdxChange) = mpDataType & ” #” & mpNextItem

                                If mpIdxChange
Mod 1000 = 0 Then

                           
    

                                    mpSizeArray
= UBound(mpBefores) + 1000

                                    ReDim
Preserve mpBefores(1 To mpSizeArray)

                                    ReDim
Preserve mpAfters(1 To mpSizeArray)

                                End If

                               

                               
.Columns(j).Replace What:=.Cells(i, j).Value, _

                                                   
Replacement:=”Obfuscated” & mpAfters(mpIdxChange), _

                                                    LookAt:=xlWhole,
_

                                                   
SearchOrder:=xlByRows

 And finally ‘corrected’
like so

 

        ‘now make the
changes for names, ids, etc.

        ReDim Preserve
mpBefores(1 To mpIdxChange)

        ReDim Preserve
mpAfters(1 To mpIdxChange)

       

        ‘first remove
the tag

       
.UsedRange.Replace What:=”Obfuscated”, _

                      
Replacement:=””, _

                      
LookAt:=xlPart, _

                      
SearchOrder:=xlByRows

                      

        For Each mpWS
In ActiveWorkbook.Worksheets

        

            ‘then
update any associated values

            For i = 1
To mpIdxChange

           

               
mpWS.UsedRange.Replace What:=mpBefores(i), _

                                      
Replacement:=mpAfters(i), _

                                      
LookAt:=xlWhole, _

                                      
SearchOrder:=xlByColumns

            Next i

        Next mpWS

       

       
.Rows(1).Delete

Of course, it can be taken a lot further, adding further
sophistication. For example, it doesn’t explicitly cater for an Excel
spreadsheet that is constructed as a pseudo-database, separate, linked tables. And it
probably needs a decent progress bar  as
it could take some time on large data.

An addin with the code and an example file can be downloaded
via the RSS feed. The addin adds an item to the Data menu.

 

Leave a Reply

Your email address will not be published. Required fields are marked *


*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>