Memory mapped files and pointers… or not ???
Last week I was in Seattle for the MVP summit. Whilst there I saw a brief presentation from Joe Kunk on Memory Mapped files in .NET 4. I suggested to Joe that perhaps pointers might give a performance boost. Then someone from Microsoft (naming no names) suggested that if that were the case, it might be a good argument for the inclusion of pointers in a future version. I chatted with Joe afterwards and he suggested I download the code and give it a try, so I did. The following are my findings, and the journey along the way. The end result is a dramatic performance improvement:
First steps: code review
I downloaded Joe’s code and created a 5 GB sample data file. I figured 5 GB is big enough to push it into the 64-bit realm. I ran the code and it took about 25 minutes to create the index file. A quick look at the code identified some areas to tweak immediately. Inside the main loop there were a number of conversions going on, some of them implied; all of which tend to slow code down. The original code was as follows:
For i = 1 To accessor.Capacity
Dim CurrentByte = accessor.ReadByte(ViewPosition) : ViewPosition += 1
Dim CurrentChar = Chr(CurrentByte)
If (Char.IsLetterOrDigit(CurrentChar)) Then
InWord = True
Wordlength += 1
If ((Char.IsUpper(CurrentChar) AndAlso CurrentChar = "R") OrElse
(Char.IsLower(CurrentChar) AndAlso CurrentChar = "r")) Then
HasR = True
End If
There are a number of issues to be aware of here. Comparing a Char with a String causes overhead: the Char is converted to a String and then VB’s string comparison routines are called. Rather than write AndAlso CurrentChar = "R", it’s better to write AndAlso CurrentChar = "R"c. This makes it a simple comparison of Chars.
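As a minimal standalone sketch (my own illustration, not part of Joe’s project) of the difference between the two comparisons:

```vbnet
' Standalone sketch. With Option Strict Off, comparing a Char to a String
' literal widens the Char to a String and calls into the VB runtime's
' string comparison routines; the "c" type character keeps the literal a
' Char, so the second comparison is a plain value comparison.
Module CharCompareDemo
    Sub Main()
        Dim CurrentChar As Char = "R"c

        ' Slower: "R" is a String literal, so CurrentChar is converted to a
        ' String and compared via the VB runtime (needs Option Strict Off).
        If CurrentChar = "R" Then Console.WriteLine("matched via String comparison")

        ' Faster: "R"c is a Char literal, so this compiles to a simple
        ' comparison of two Char values.
        If CurrentChar = "R"c Then Console.WriteLine("matched via Char comparison")
    End Sub
End Module
```

Under Option Strict On the first comparison won’t even compile, which is one more reason the Strict semantics mentioned in the conclusion help keep this class of overhead out of your code.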
The calls to IsUpper and IsLower are completely redundant and are more expensive than the Char comparisons. In fact, it’s questionable whether Char is even needed here. Given this code is designed purely for ASCII, not Unicode, I rewrote this initial part of the loop to use Byte instead of Char. Modifying the code only took a few minutes, the only bump being the need to create a replacement for the IsLetterOrDigit call. I’d argue IsLetterOrDigit might not be what is wanted in the first place, as it means words with hyphens, @ signs or apostrophes are excluded. Given St Pat’s day is next week, I certainly wouldn’t want to be excluding any of the O’Reillys 😉
Nevertheless, I kept the code true to its original implementation. For IsLetterOrDigit, I made a simple True/False lookup array. This was easy to build by writing the values out to the debug window, e.g.:
For i = 0 To 255
Debug.Write(Char.IsLetterOrDigit(Chr(i)) & ", ")
Next
From that output in the debug window, I created a lookup array via cut and paste:
Private IsLetterOrDigit() As Boolean = {False, False, False, False, False, ……………
With those quick changes in place the test time dropped to around 18 to 19 minutes: a significant improvement, and well worth the time it took to make them, but nothing earth-shattering.
Look for bottlenecks
On my desktop machine I have standard mechanical hard drives and newer Solid State Drives (SSDs). The SSDs are blisteringly fast, and give a huge gain in read times (and Windows boot times). Putting the data file on an SSD improved performance by about another 5 minutes (the test was now 14.4 minutes). Although again a significant improvement, it wasn’t the order of magnitude I would expect in what is basically a file-IO-bound operation. This pointed to the bottleneck being the code, not the hardware. So it was time to revisit the entire memory mapped file hypothesis. The first step was to write a benchmark using a standard FileStream.
I set the FileStream to read the data in blocks of &H10000 bytes at a time. It’s an arbitrary figure, but later testing showed it’s around the sweet spot for this particular machine. My first runs came in at about 36 seconds! I was astounded. Some minor tweaks and the time was down to 32 seconds. I checked and verified the output. The locations were correct and I had the same number of matches as Joe did, plus one. I’m guessing the extra one is a boundary case that my code picked up on. (It’d probably be easy enough to verify by looking at the last entries.)
Out of curiosity I put the data file back on the old mechanical drive. The time jumped massively, from about 30 seconds to about 90 seconds. This told me the bottleneck was now the hardware. At this point, investment in hardware rather than software, such as SSDs or even RAID SSD arrays, is worth investigating.
As to pointers?
Well, I never bothered writing the code to test pointers with the memory mapped file. The code I wrote was already 40 to 50 times faster than the memory mapped file implementation. With the FileStream approach I estimate pointers would save about 10%, or 3 seconds in the best case. If you look under the covers of FileStream you’ll see its internal buffer is filled using pointers with a Win32 API call to ReadFile. With a bit of work you could write your own wrapper, call ReadFile yourself, and skip the copying from the internal buffer to the buffer used in the search loop. Rather than go to that length, I simply measured what that extra copying cost by copying into yet another buffer inside the search loop: that added about 3 seconds. It hardly seems worth the effort. Rather than spend time porting the code to use pointers, it would probably be a better use of time to investigate making the code run async (I’ll leave that one for another day 😉 )
But that doesn’t mean there isn’t a case for pointers with memory mapped files. I used them many years ago (even from VB6). All I have shown is that for this scenario, memory mapped files are not a good match. A compelling scenario for pointers and memory mapped files remains to be shown.
Conclusion
Memory mapped files are useful in many scenarios, such as shared data manipulation and inter-process communication. For raw sequential data reads, however, they add overhead with no realizable benefit.
- Create benchmarks using similar techniques. And look under the covers to try to understand the differences and their significance.
- Write well-typed code and use Option Strict On semantics wherever possible. For example, the quick code clean-up I did on the original code shaved off 5 minutes. Although that was only a 20% improvement on the initial code, it was quick and easy to implement. On the final code, that 5 minutes would be a massive overhead on top of 30 seconds (tenfold)! So little bits of detail in your code can go a long, long way towards writing high performance code.
#Region "modifications by Bill McCarthy"
Private Function CreateIndexEntryList(ByVal TextFilePath As String) As List(Of IndexEntry)
Dim index As New List(Of IndexEntry)(&HF0000)
Dim bufferSize As Int32 = &H10000
Dim offset As Int32 = 0
Dim buffer(0 To bufferSize - 1) As Byte
Dim filePosition As Long = 0
Dim inWord = False
Dim wordlength As Int32 = 0
Dim hasR = False
Dim sb As New System.Text.StringBuilder(6)
Try
Using stream = New IO.FileStream(TextFilePath, IO.FileMode.Open, IO.FileAccess.Read, IO.FileShare.Read, bufferSize)
Dim countRead = stream.Read(buffer, offset, bufferSize - offset)
While countRead > 0
Dim upper = countRead + offset
For i = offset To upper - 1
Dim currentByte = buffer(i)
If (IsLetterOrDigit(currentByte)) Then
inWord = True
wordlength += 1
If (currentByte = byteUpper_R) OrElse (currentByte = byteLower_r) Then
hasR = True
End If
Else
If wordlength = 5 AndAlso hasR Then
Dim bufferPosition = i - 5
sb.Clear()
For j = 0 To 4
sb.Append(Chr(buffer(bufferPosition + j)))
Next
index.Add(New IndexEntry(sb.ToString, filePosition + bufferPosition - offset))
End If
inWord = False
wordlength = 0
hasR = False
End If
Next
offset = wordlength
If offset > 0 Then
For i = 0 To offset - 1
buffer(i) = buffer(upper + i - offset)
Next
End If
filePosition += countRead
countRead = stream.Read(buffer, offset, bufferSize - offset)
End While
End Using
Catch ex As Exception
Dim message = "IndexingLargeFiles.Index.CreateIndexEntryList: " + ex.Message
Throw New Exception(message, ex)
End Try
Return index
End Function
Private ReadOnly byteUpper_R As Byte = CByte(Asc("R"c))
Private ReadOnly byteLower_r As Byte = CByte(Asc("r"c))
Private IsLetterOrDigit() As Boolean = {False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, True, True, True, True, True, True, True,
True, True, True, False, False, False, False, False, False, False, True, True, True,
True, True, True, True, True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, False, False, False, False,
False, False, True, True, True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True, True, True, True, True,
False, False, False, False, False, False, False, False, True, False, False, False,
False, True, False, True, False, True, False, True, False, False, False, False, False,
False, False, False, False, False, False, True, False, True, False, True, True, False,
False, False, False, False, False, False, False, False, False, True, False, False,
False, False, False, False, False, False, False, False, True, False, False, False,
False, True, False, False, False, False, False, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True, True, True, True, True,
True, True, True, False, True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, False, True, True, True, True, True, True,
True, True}
#End Region
Well done! Do you think that maybe memory mapped files would be better suited if random access was needed? This app just trudges through the file sequentially, which isn’t a compelling scenario for a memory-mapped file.
Hi Mark. I doubt it. The problem is you’d need to fit the entire file in a view; otherwise random access could potentially be much more expensive with memory mapped files, as you could be filling an entire view for each call (worst case scenario). If the access isn’t entirely random, but rather falls within a view, then it may be better than the sequential test; although it has a long way to go to catch up with ReadFile (FileStream)