### Can you optimize this code ? (Is GoTo evil ?)

yesterday I posted a bit of code that was checkign to see if a string was whitespace only.  Let’s assume the assumption I made was correct, that checkign the middle of the string then spreading out in each direction is the best way of doing that.

So the code I worte yesterday had some branching “issues”.  As I wrote it, I really felt I was missing something obvious… you know that, “this just doesn’t feel right” kind of feeling.  To do the branching I used a boolean flag, hasNonWhitespace.  The only alternative I could think of was to use a GoTo statement.  So here’s the first cut, and also a second version using GoTo.   Is GoTo evil ?  Can you optimize this code some other way ????

Public Function IsWhitespaceOnly(ByVal stringToCheck As String, ByVal ParamArray whitespaceChars() As Char) As Boolean
Dim length As Int32 = stringToCheck.Length
Dim middle As Int32 = length \ 2
Dim first As Int32 = middle
Dim second As Int32 = middle + 1
Dim chr As Char

If whitespaceChars Is Nothing OrElse whitespaceChars.Length = 0 Then
whitespaceChars = New Char() {” “c, ChrW(&H3000)}
End If

Dim whitespaceUbound As Int32 = whitespaceChars.Length – 1
Dim hasNonWhitespace As Boolean = False

For i As Int32 = 0 To middle

If first >= 0 Then
hasNonWhitespace = True
chr = stringToCheck.Chars(first)
For j As Int32 = 0 To whitespaceUbound
If chr = whitespaceChars(j) Then
hasNonWhitespace = False
Exit For
End If
Next
first -= 1
End If

If hasNonWhitespace Then Return False

If second < length Then
hasNonWhitespace = True
chr = stringToCheck.Chars(second)
For j As Int32 = 0 To whitespaceUbound
If chr = whitespaceChars(j) Then
hasNonWhitespace = False
Exit For
End If
Next
second += 1
End If

If hasNonWhitespace Then Return False

Next

Return Not hasNonWhitespace

End Function

‘ using GoTo

Public Function IsWhitespaceOnly(ByVal stringToCheck As String, ByVal ParamArray whitespaceChars() As Char) As Boolean

Dim length As Int32 = stringToCheck.Length
Dim middle As Int32 = length \ 2
Dim first As Int32 = middle
Dim second As Int32 = middle + 1

Dim chr As Char

If whitespaceChars Is Nothing OrElse whitespaceChars.Length = 0 Then
whitespaceChars = New Char() {” “c, ChrW(&H3000)}
End If

Dim whitespaceUbound As Int32 = whitespaceChars.Length – 1

For i As Int32 = 0 To middle

If first >= 0 Then
chr = stringToCheck.Chars(first)
For j As Int32 = 0 To whitespaceUbound
If chr = whitespaceChars(j) Then
first -= 1
GoTo secondpart
End If
Next
Return False
End If

secondpart:
If second < length Then

chr = stringToCheck.Chars(second)
For j As Int32 = 0 To whitespaceUbound
If chr = whitespaceChars(j) Then
second += 1
GoTo nextpart
End If
Next
Return False
End If

nextpart:
Next

Return True

End Function

## 21 Comments so far

1.   Geoff Appleby on June 6th, 2005

I don’t see GoTo’s as evil, so long as you have a damn good reason for using them.

And I’ve tried. Really tried. Really hard.

I can’t come up with anything that runs faster.

I’m still trying tho

2.   Bill McCarthy on June 6th, 2005

Hey Geoff,

I think you’ve hit the nail on the head … it’s not that they are evil, more it’s the way that we use them that can be. In complicated code, or large methods I think GoTo should be avoided. With complicated code they can actually make it hard to restructure, and can easily lend themselves to spaghetti style code. In large methods, they tend to be used instead of breaking the method out into smaller methods. But in this kind of code where the method is pretty simple, unlikely to need major changes so I am thinking that maybe goto is not so bad in this case. What I found interesting though was my instinct was to avoid using GoTo, rather use a flag, or look at even using While True blocks etc.

BTW: I’m glad you haven’t been able to optimize it Makes me feel happier about that choice.

3.   Matthew Douglass on June 7th, 2005

I’m going to explain my thoughts first, then show the code since I wrote it without really knowing VB.NET and without a compiler:

To me the first problem with your function is the duplication of code where you check if the first character is whitespace and where you check if the second character is whitespace. If you pull that out into its own function that simplies the code and removes the need for the goto:

Please excuse any obvious typos/mistakes.

Public Function IsWhitespaceCharacter(ByVal charToCheck As Char, ByVal ParamArray whitespaceChars() As Char) As Boolean

Dim whitespaceUbound As Int32 = whitespaceChars.Length – 1

For j As Int32 = 0 To whitespaceUbound

If charToCheck = whitespaceChars(j) Then

Return True

End If

Next

Return False

End Function

Public Function IsWhitespaceOnly(ByVal stringToCheck As String, ByVal ParamArray whitespaceChars() As Char) As Boolean

Dim length As Int32 = stringToCheck.Length

Dim middle As Int32 = length \ 2

Dim first As Int32 = middle

Dim second As Int32 = middle + 1

If whitespaceChars Is Nothing OrElse whitespaceChars.Length = 0 Then

whitespaceChars = New Char() {" "c, ChrW(&H3000)}

End If

For i As Int32 = 0 To middle

If first >= 0 Then

If Not IsWhitespaceCharacter(stringToCheck.Chars(first), whitespaceChars) Return False;

first -= 1

End If

If second < length Then

If Not IsWhitespaceCharacter(stringToCheck.Chars(second), whitespaceChars) Return False;

second += 1

End If

Next

Return True ‘ I believe a empty string is considered whitespace only, correct as necessary.

End Function

4.   Bill McCarthy on June 7th, 2005

Hey Matthew,

The code you posted looks pretty good to me (I haven’t tested it). The problems I see with it though are increased call stack usage, and the ubbound for the character array loop is recalculated each time the IsWhitespaceCharacter function is called. So my guess is it would result in a performance penalty.

5.   Matthew Douglass on June 7th, 2005

The ubound recalculated bothered me too — I almost passed it in as an extra parameter. But I thought their might be some use for the IsWhitespaceCharacter as a standalone function, so I didn’t. If you knew it was internal only, then I’d add it as a parameter at the small cost of increased coupling between the two.

The call stack usage doesn’t bother me too much — mainly because I think there is a decent chance it would get inlined. I’d love if someone who has a VB.NET compiler could do a quick perf analysis of the two.

6.   Bill McCarthy on June 7th, 2005

I can’t right now, but I’ll run some tests later tonight and post back here. It sounds like Geoff may have run some tests… I actually haven’t run any yet.

BTW: I actually think my base assumption of starting searching in the middle is probably not a good assumption. I tend to find strings generally have leading whitespace rather than trailing whitespace, but it would depend on usage. Take for exampel reading from a text file line by line, if the lines actually have the carriage returns and or line feeds on the and of them, and they are considered whitespace, the optimal may be searching from "near" the end but not at the end.

7.   Geoff Appleby on June 7th, 2005

Bill, no, i think your right in starting from the middle.

Remember, the way the mothod is structured is to find one single non-whitespace char ASAP, and bail out immediately if it does. If whitespace mostly appears on the left or right, it doesn’t matter. most of the time (and we can only talk probabilities here) a non-whitespace char is highly likely to be somewhere near the middle of the string, or not too far away from there.

As for tests, yeah, I have been. I’ll run through Matthew code when I get a chance.

8.   Bill McCarthy on June 7th, 2005

Hey Geoff,

I think you’re wrong about me being right. It’s probably not a bad assumption. I think when I look at strings, there’s a reasonable probability they will start with whitespace. Chances of them having more than one consequtive whitespace char in the middle is low, but only if the ratio of test to whitespace is generally large. One example that comes to mind is say parsing an indented XML file. LEt’s assume they use "space" or similar ,not tabs. There the likelihood of the ratio of whitespace to non-whitespace on any given line is likely to be > 1:1,( depending on the depth of the object model and the type of xml output used) in many cases. So for those lines, the ideal spot to start would be near the end. Definetly not at the start, not so the middle, and maybe not the end. We could optimise this better is we knew the whitespace char list was optimised accordingly. That is if it checks for cr’s and lf’ early in the list, then it’s only going to be one or two comparisons for the ending chars, rather than the 20 ro so comparisons it can take ot declare a character as not being whitespace. Obviously the relative efficeincy depends a lot on the whitespace shar list though, whether cr and lf are considered to be whitespace, whether the whitespace lsit is large etc. But If I take the two standard lsits, VB’s 2 char one, and String’s 21 char one, then I think searching from the end in will genrally be the most efficient, as we can reduce the complexity of your code, and with the VB char set, as cr or lf is considered non whitespace, and with the string char set the cr and lf are pretty early on in the list, so the cost of those likely couple of comparisons shoudl be trivial compared to the cost of settign up to say work from the middle out.

Okay, maybe, maybe not… it’s a theory anyway

9.   Geoff Appleby on June 7th, 2005

binary search perhaps?

10.   Geoff Appleby on June 7th, 2005

Here’s an interesting one for you. based on a few sample strings of random gumpf, this one seemd to deal better with longer strings…

Protected Overrides Function IsWhitespaceOnly(ByVal stringToCheck As String, ByVal ParamArray whitespaceChars() As Char) As Boolean

Dim length As Int32 = stringToCheck.Length

Dim middle As Int32 = length \ 2

Dim quarter As Int32 = middle \ 2

Dim first As Int32 = middle

Dim second As Int32 = middle + 1

Dim chr As Char

If whitespaceChars Is Nothing OrElse whitespaceChars.Length = 0 Then

whitespaceChars = New Char() {" "c, ChrW(&H3000)}

End If

Dim whitespaceUbound As Int32 = whitespaceChars.Length – 1

‘check first char

chr = stringToCheck.Chars(0)

For j As Int32 = 0 To whitespaceUbound

If chr = whitespaceChars(j) Then

GoTo firstpart

End If

Next

Return False

firstpart:

If length = 1 Then Return True

‘check last char

chr = stringToCheck.Chars(length – 1)

For j As Int32 = 0 To whitespaceUbound

If chr = whitespaceChars(j) Then

GoTo secondpart

End If

Next

Return False

secondpart:

If length = 2 Then Return True

‘check middle char

chr = stringToCheck.Chars(middle)

For j As Int32 = 0 To whitespaceUbound

If chr = whitespaceChars(j) Then

GoTo thirdpart

End If

Next

Return False

thirdpart:

If length = 3 Then Return True

‘check first quarter char

chr = stringToCheck.Chars(quarter)

For j As Int32 = 0 To whitespaceUbound

If chr = whitespaceChars(j) Then

GoTo sixthpart

End If

Next

Return False

sixthpart:

‘check third quarter char

chr = stringToCheck.Chars(middle + quarter)

For j As Int32 = 0 To whitespaceUbound

If chr = whitespaceChars(j) Then

GoTo seventhpart

End If

Next

Return False

seventhpart:

‘check first quarter range backwards

For i As Int32 = quarter – 1 To 1 Step -1

chr = stringToCheck.Chars(i)

For j As Int32 = 0 To whitespaceUbound

If chr = whitespaceChars(j) Then

GoTo eightpart

End If

Next

Return False

eightpart:

Next

‘check last quarter range forwards

For i As Int32 = middle + quarter + 1 To length – 2

chr = stringToCheck.Chars(i)

For j As Int32 = 0 To whitespaceUbound

If chr = whitespaceChars(j) Then

GoTo ninepart

End If

Next

Return False

ninepart:

Next

‘check second quarter range forwards

For i As Int32 = quarter + 1 To middle – 1

chr = stringToCheck.Chars(i)

For j As Int32 = 0 To whitespaceUbound

If chr = whitespaceChars(j) Then

GoTo fourthpart

End If

Next

Return False

fourthpart:

Next

‘check third quarter range backwards

For i As Int32 = middle + quarter – 1 To middle + 1 Step -1

chr = stringToCheck.Chars(i)

For j As Int32 = 0 To whitespaceUbound

If chr = whitespaceChars(j) Then

GoTo fifthpart

End If

Next

Return False

fifthpart:

Next

Return True

End Function

11.   Bill McCarthy on June 8th, 2005

Geoff …. uhm… I’m speechless

12.   Jacob Grass on June 9th, 2005

Char.IsWhitespace(myChar) …

This uses a Select Case statement. Returns true if the character is

ChrW(9)

ChrW(10)

ChrW(11)

ChrW(12)

ChrW(13)

" "

Or if the unicode category is 11,12,13 ..

Probably not the most efficient, but it’s in there.

13.   Geoff Appleby on June 9th, 2005

Speechless huh?

So tell me…NOW are gotos evil? *grin* I think all it really proves is that you have to be willing to wear the penalties of anything other than walking the entire string from one end to the other. Statisitical analysis (which is what i was trying to emulate while not really doing it) might allow us to find in advance more _likely_ locations of non-whitespace chars, but it is it really worth the penalty of either slower or harder to maintain code?

14.   Bill McCarthy on June 10th, 2005

hey Jacob,

Actually that is probalby very efficient. I think I might switch to that not only becuase of that, but also it adds some consistency. Although one has to ask why Vb’s Trim, and String.Trim don’t use that ?? I wonder if they did some benchmarks on calling CharacterInfo.IsWhitespace(char) or not ? Then again maybe it’s just because String.Trim is designed to do other things. I think that may be it. I haven’t tested, but I think the CharacterInfo.Iswhitespace(char) is probalby the better choice… well it’s my bet at this moment anyway

15.   Bill McCarthy on June 10th, 2005

Hey Geoff,

uhm… speechless…. totally …

<slap><slap> Ah, no I’m better…. yes, you have convinced me, GoTo is evil. give it an inch and someone will take it for a mile <bg>

i’m going for the easier to maintain code. I know the following is different code, but simple is good, you’ve convinced me. (well unless you tell me the perf suxs that is)

Public Function IsWhitespace(ByVal stringToCheck As String) As Boolean

IsWhitespace = True

Dim i As Int32

If stringToCheck Is Nothing OrElse stringToCheck.Length = 0 Then

Return True

End If

i = stringToCheck.Length

While IsWhitespace = True And i > 0

i -= 1

IsWhitespace = Char.IsWhiteSpace(stringToCheck.Chars(i))

End While

Return IsWhitespace

End Function

16.   Jacob Grass on June 10th, 2005

With respect to the statistical analysis as to where one should start the inquiry, if you’re dealing with a random string, then the odds are highly in your favor that any character you start with will be non-whitespace. Though, no character has any more likelihood than any other.

However, if you have some context about the string and its contents (i.e. it’s not random) then some analysis could be performed to increase the likelihood. Though, I’d be willing to bet that the first or last character would be the most fruitful.

17.   Jacob Grass on June 10th, 2005

Hey Bill, I haven’t done any testing with this, but in my head this is a little more efficient:

Public Function IsWhitespace(ByVal stringToCheck As String) As Boolean

IsWhitespace = True

Dim i As Int32

If Not stringToCheck is Nothing Then

i = stringToCheck.Length -1

Do

IsWhitespace = Char.IsWhiteSpace(stringToCheck.Chars(i))

i-=1

Loop Until Not IsWhitespace OrElse i < 0

End If

Return IsWhitespace

End Function

18.   Bill McCarthy on June 10th, 2005

Hey Jacob,

On the code, you need to test for a zero length string, or set up your loop such that it isn’t entered in that case. Looking at mine, I could drop the explicit test for a zero length string and it should work okay, which would be about the same in code. I would do an early exit rather than nest the hwile loop… I just prefer them…. I doubt there is any performance difference between the two on that count.

The statistical analysis of strings, I think given the predonimance of "code", html and xml, leading whitespace is very common. Given the parsing on a line by line basis is also common, then carriage returns and line feeds can be expected to be very common at the end of a string. Ideally we want to find a non whitespace character as soon as possible, so looking at the ends is actually the worst place to start. My guess is, that two or three characters in from the end is the most likely place, BUT considering we have to test all the characters if we find the string is purely whitespace, then startign from other than the ends adds extra complexity.

anyway, if it’s raining on Sunday and I get bored on Sunday, what I’ll do is recursively parse all plain text files in( my documents or something like that) and cache all the strings on a line by line basis, and then test some code with them

19.   Bill McCarthy on June 10th, 2005

oh btw: did you notice I used And, not AndAlso. Given that during the loop you do have to check both (only on exit of the loop can you short circuit), And is actually more efficient than the conditional branching AndAlso

20.   Jacob Grass on June 11th, 2005

Yup, my fault on the zero length string. I saw that it could be taken out of yours, so I left it out of mine.

re: Stats

Well, what you’re talking about is the context of that which you are parsing. Parsing code/html/xml makes sense on that front to start in the middle and work your way out to the ends. Now, assume that you have some generic text file, you have no knowledge whatsoever of its contents. You randomly take a line from that file and you randomly take a contiguous sequence of characters from that line. Any character in that sequence is as likely to be (non)whitespace as any other. That’s the point I was trying to make.

Oh, and that’s an interesting note on the short-circuit. I’ll have to keep that in mind. I have reams of code that looks like:

Do Until blnFound OrElse i = Count

..

Loop

What’s up with your human proof, btw? It never works the first time for me.

21.   Bill McCarthy on June 12th, 2005

Hey Jacob,

At least on developer’s machines, "text" is more likely to have leading whitespace. As office moves to XML this is going to spread. For use input, leading whitespace is rere (usually only a sinlge space), but trailing whitespace can be common as it is visually not as obvious. Even then the usual I see is people cut and pasting and getitng on extra trailing whitespace.

So i get what you mean about it any character can pobbily be whitespace or not, but I think there is a definetly a probability that the start is less likely than the end for non whitespace. It’s like the alphabet.. although any letter could be any legal char, we usually find a higher percentage of a and e, s and t compared to q or z etc. So if you had to test for each different letter, you’d be better to test for say s and t earlier on, than alphabetically.

Re: short circuited versus non, the simple rule I use is if the second expression is not dependant on the first to be legal (eg testing for null first) then bitwise operators can be more performant than branching ones. The toehr factors are the cost of evaluating the second expression. But if you expect that to be evaluated the majority of the time anyway then that cost is the same for both.

As ot the CAPTCHA, yeh don’t even try first time. Apparently it has a rediculous short time out. (sorry, but not my doing)