Clear As Mud




A question on one of the forums recently asked about
hiding sheets in a workbook to be posted as an example, but where that workbook contained sensitive information. As I replied, hiding the sheets doesn’t really
hide the data so he would be just as exposed.

Jimmy Pena has a a great application for generating
random data
, but sometimes it is just better to scramble the data that you already
have. I suggested to the poster that they should really be scrambling the data,
and further suggested that it should not be too difficult to write some code to
do so.

That got me thinking as to how to do it, in a generic
manner. It should cater for names, ids, amounts, currencies and so on.

A Simple Scrarmbling

My first thought was to insert a row at the top of the
target sheet, and add an entry for all columns to be obfuscated. So, for
instance, if it had ‘Name’ at the top, each entry in that column would be
changed to Name #1, Name #2 etc. If it had Number<7>, you would generate
a random number up to 9,999,999. If it had Decimal<5,2> you would
generate numbers up 99,999.99, and so on. If the value in row 1 was blank, that
column gets ignored.

So you would insert a row with values such as ‘Name’, ‘Number<7>’,
and so on.

The code for this was relatively simple to write, just a
loop through each column and change the values. To ensure that all like values
get changed to the same value, I did a Replace on the whole column once a value
to be changed was found. In other words, if Bob appeared 100 times in the
column, each instance of Bob would be changed to the same value. To ensure that
I didn’t then go and change the second, changed, instance of Bob to some new
value, I pre-pended the new value with ‘Obfuscate’, which I stripped off with a
Replace at the end.

The main loop was like this

= .Cells(1, j).Value

= 0


mpDataType <> “” Then

mpLastRow = .Cells(.Rows.Count, j).End(xlUp).Row


                For i
= 2 To mpLastRow

Not .Cells(i, j).Value Like “Obfuscated*” Then


mpNextItem = mpNextItem + 1

Select Case True


Case mpDataType Like “Number<*”



Case mpDataType Like “Decimal<*”

                                ‘decimal processing


Case mpDataType Like “Currency<*”



Case Else                                

.Columns(j).Replace What:=.Cells(i, j).Value, _

Replacement:=”Obfuscated” & mpDataType & ”
#” & mpNextItem, _

LookAt:=xlWhole, _


End Select

End If

                Next i


.UsedRange.Replace What:=”Obfuscated”, _

Replacement:=””, _

LookAt:=xlPart, _



So far, so good, but I could see one major problem. If there
is a formula that refers to data somewhere else on the spreadsheet, that table
needs to be obfuscated too, but in a smart way. For instance, say that there is
a lookup table of names Bob, Simon and Alex. If there is a formula somewhere of
=VLOOKUP(A20,lookup_table,2,FALSE), and A20 is one of those values in the table,
then that value in the lookup table should change to the same value that A20
switches too. Unfortunately, it isn’t only VLOOKUP, it is LOOKUP, HLOOKUP,
COUNTIF, and so on. Tough!

It is actually worse if the values are used elsewhere a
simple cell reference or by a code event update, there is no way in my code to
recognise that.


In the end, I decided to avoid this route, far too
difficult, and I opted to save all the before values, and all of after values in
separate arrays, and after having removed the ‘Obfuscated’ tag I went through
each sheet and checked if any of the before values still remained, if so I
replaced them.

A bit brute force, but it seems to work okay.

The data is first changed like so


Case Else

                                mpIdxChange =
mpIdxChange + 1

mpBefores(mpIdxChange) = .Cells(i, j).Value

mpAfters(mpIdxChange) = mpDataType & ” #” & mpNextItem

                                If mpIdxChange
Mod 1000 = 0 Then


= UBound(mpBefores) + 1000

Preserve mpBefores(1 To mpSizeArray)

Preserve mpAfters(1 To mpSizeArray)

                                End If


.Columns(j).Replace What:=.Cells(i, j).Value, _

Replacement:=”Obfuscated” & mpAfters(mpIdxChange), _



 And finally ‘corrected’
like so


        ‘now make the
changes for names, ids, etc.

        ReDim Preserve
mpBefores(1 To mpIdxChange)

        ReDim Preserve
mpAfters(1 To mpIdxChange)


        ‘first remove
the tag

.UsedRange.Replace What:=”Obfuscated”, _

Replacement:=””, _

LookAt:=xlPart, _



        For Each mpWS
In ActiveWorkbook.Worksheets


update any associated values

            For i = 1
To mpIdxChange


mpWS.UsedRange.Replace What:=mpBefores(i), _

Replacement:=mpAfters(i), _

LookAt:=xlWhole, _


            Next i

        Next mpWS



Of course, it can be taken a lot further, adding further
sophistication. For example, it doesn’t explicitly cater for an Excel
spreadsheet that is constructed as a pseudo-database, separate, linked tables. And it
probably needs a decent progress bar  as
it could take some time on large data.

An addin with the code and an example file can be downloaded
via the RSS feed. The addin adds an item to the Data menu.


Leave a Reply

Your email address will not be published. Required fields are marked *