What it’s like to live through a Disaster

Less than a week ago, a tornado tore through a small town about 25 miles from my home, leaving it almost completely devastated. I am thankful that no one I know personally was hurt or even lost significant property, but I’ve heard some stories from the experience, and I am very mindful and prayerful for those still living in this community.

Part of this experience has brought me a new understanding of what it means to live through a disaster like this, which I hope I can share with you now. I will list the implications below. Not all of these apply to every family unit, but some family units will be subject to all of them, and some of them may surprise you:
  • No electricity for nearly a week, with no idea when it’s coming back.
  • No refrigeration
  • Personal food reserves destroyed, contaminated, or depleted, with no clear way to get more
  • No running water or sanitation
  • No shelter
  • No cell phone service in the area. While coverage survived the initial disaster, the lack of power eventually overwhelmed providers’ ability to keep the cell towers running. Even if service had survived, there would be no way to charge your phone.
  • No news of the outside world. Help is on the way to this community, but many there have no way to know this, because they have lost the ability to use TV, Radio, and even cellular internet.
  • No way to leave, in the numerous cases where vehicles were destroyed.
  • No way to call for help, or any indication that it’s coming, because of the earlier mentioned isolation from electronic communications

Even in the United States, with all of our resources, it’s scary how quickly you can become isolated and helpless. While people just a few miles away are fine, this small town is back in the stone age. And if you were hit particularly hard (loss of vehicle and food supplies) and don’t know your neighbors well, you could be in a particularly bad spot. Even if you have a strong family or other support network outside of town, you have no way to contact these people, or anyone else who could help. This is real desperation.

Fortunately, help is coming. Tomorrow morning, the church I attend is coordinating with Church of Christ Disaster Relief to open a location that will provide food and supplies to the victims of this disaster. So far, this is the only relief effort to visit this town, though I suspect it’s just the first of many.

As a member of the technical community, I was particularly interested in writing about this, because of attitudes I saw on some technical community web sites the last time a Christian relief organization provided disaster support. Technical folks often have a decidedly secular mindset; a common sentiment was that Christian relief organizations were really only interested in distributing Bibles, and that Bibles would be the bulk of the “supplies” provided.

I can tell you that nothing is further from the truth. Churches of Christ Disaster Relief maintains pre-loaded trucks that are ready to depart as soon as a need is identified. Some of the contents of these trucks are perishable foodstuffs that would need to be rotated if a truck sat too long… which doesn’t really happen, because the organization is so active. There are several categories of box in each truck: food boxes that contain enough material to feed a family of four for a week, infant care boxes with diapers and other necessities, bottled water boxes, cleaning supplies, clothing, and others. All of this is provided at no cost to victims, without discrimination. If more material is needed, more trucks will be sent (later trucks are more selectively loaded). And this is just the first wave. Later efforts will even provide furniture and appliances free of charge to those with real need.

Yes, there are a few Bibles included (one in each food box), but they are not a significant part of the cost or mass/volume of the materials provided. The organization also often makes use of church buildings as convenient pre-existing locations to centralize its distribution efforts, and of members of those congregations to provide volunteer staffing at the distribution points. Yes, we do this in the name of Christ, because He first loved us, and we are not ashamed of this. But this is real relief, meeting real needs.

Posted in non-computer | Leave a comment

Can we stop using AddWithValue() already?

I see code examples posted online all the time that look like this:
cmd.Parameters.AddWithValue("@Parameter", txtTextBox1.Text);

This needs to stop. There is a problem with the AddWithValue() function: it has to infer the database type for your query parameter, and sometimes it gets it wrong. This especially happens with database layers that deal in Object arrays or similar structures for the parameter data, where some of the important information ADO.Net uses to infer the type is missing. However, it can happen even when the .Net type is known: inferring VarChar vs NVarChar or Char from strings is one example; Date vs DateTime is another.

The good news is that most of the time, these type mismatches don’t matter. Unfortunately, that’s not the whole story. Sometimes they do matter, and when it matters, it can matter in a big way.

For example, say you have a varchar database column, but send a string parameter using the AddWithValue() function. ADO.Net will send this to the database as an nvarchar value. The database is not permitted to implicitly convert your nvarchar parameter to a varchar value to match the column type for the query. That would be a narrowing conversion that has the potential to lose information from the original value (because you might have non-Latin characters in the parameter), and if that happened the database might produce the wrong query results. Instead, the database will likely need to convert the varchar column to nvarchar for this query (which is a widening conversion that is guaranteed not to lose information). The problem is that it will need to do this for every row in your table.

This conversion can also happen with other mismatches: for example, date columns may need to be widened to datetime values. And don’t even get me started on what happens if you have a mismatch between a date or number type and a string type. Even with nvarchar or nchar, you may find that the declared lengths don’t match, such that every value in an nvarchar column of one length has to be converted to match a parameter of a different length.

If that kind of operation sounds expensive to you (potential run-time conversions for data in a table containing possibly millions of rows), you’re right. It is. But that’s only the beginning. These newly converted values are technically no longer the same values as those stored in any indexes that use this column, making those indexes useless for completing your query. Now we’re really hitting below the belt. Index use cuts to the core of database performance. Failing to hit an index can be the difference between a query taking hours and taking seconds, between a query taking minutes and returning instantly. And it all began with AddWithValue().

So what should you do instead? The solution is to be aware of the underlying database type you need to end up with, and then create a query parameter that uses this exact type. Here’s an example using a DateTime database type:
cmd.Parameters.Add("@Parameter", SqlDbType.DateTime).Value = MyDateTimeVariable;

Here’s another example using a decimal(11,4):
cmd.Parameters.Add("@Parameter", SqlDbType.Decimal, 11, 4).Value = MyDecimalVariable;

Note that while this is slightly longer, it’s still a single line of code. That’s it. This simple change to how you define parameters can potentially save significant performance penalties.
Posted in .net, c#, sql | Leave a comment

The N Word

No, not that N word. I’m talking about N string literal prefixes in T-SQL. Like this:
SELECT * FROM Foo WHERE Bar = N'Baz'

If you don’t know what that N is for, it tells Sql Server that your string literal is an nvarchar, rather than a varchar… that is, that the string literal may contain Unicode characters, so it can support non-ASCII characters. Things like this: 例子. But I can hear you now: that sample is all ASCII. Why does it matter? I’m glad you asked.

Let’s pretend for a minute that the Bar column from that example is a varchar column, and not an nvarchar column after all. We have a type mismatch on the comparison. Pop Quiz: what happens?

We’d like Sql Server to convert the ‘Baz’ literal to a varchar, because that is obviously more efficient. Unfortunately, it won’t work that way. Converting from nvarchar to varchar is a narrowing conversion. There are some things that can’t be accurately expressed when converting from nvarchar to varchar, which means there is a potential to lose information in the conversion. Sql Server is not smart enough to know that this particular literal will map to the smaller data type without data loss. If it converts the literal to a varchar, it might give you the wrong result, and Sql Server won’t do that.

Instead, it has no choice but to convert your Bar column to an nvarchar. I’ll say that again: it has no choice but to convert the value from every row in your Bar column to an nvarchar, even if you only get one row in the results. It can’t know whether a given row matches your literal until it completes that conversion. Moreover, if you have an index on that column that would have helped, the converted values are no longer the same values as those stored in your index, meaning Sql Server can’t even use the index.

This could easily mean a night and day performance difference. A query that used to return instantly could literally take minutes to complete. A query that used to take a few seconds might now run for an hour.

Just in case you think this scenario seems unlikely, keep in mind that ADO.Net uses nvarchar parameter types by default if you use the AddWithValue() function or it otherwise can’t infer the parameter type. If that query parameter compares to a varchar column, you’ll end up in this exact situation, and I see it all the time.

The good news is that you’re okay going the other direction… at least in this scenario. If Bar is an nvarchar column and you define Baz as a varchar literal, converting the Baz literal would be a widening conversion, which Sql Server will be more than happy to perform. Your Bar column values are unchanged, and so you can still use an index with the Bar column.

I hope your conclusion from this example is not that you should always just omit the N prefix. That’s not the message I want to send at all. In fact, the same Stack Overflow question that prompted this example also included an example that would fail to even execute in the case of type mismatch. Instead, I hope I’ve shown here that it can really matter whether you get your SQL string literals right, and that it pays to keep the exact data types of your columns in mind.
Posted in sql, Sql Server | Leave a comment

The single most broken thing in CSS

Like most web people, I have tasted the Kool-aid, and it was good. I believe in the use of CSS for layout over tables (except, of course, for tabular data, which happens more than people realize). However, CSS is also known for being quirky and difficult to master, largely because of weak browser implementations. If you’re reading this, you probably just thought of Internet Explorer, but IE is not alone here. Even in the context of browser quirks I think CSS, with practice, holds up pretty well and is actually a very nice system… with one major exception.

CSS needs a way that is built-in and intuitive to support positioning an arbitrary number of block level elements visually side-by-side.

Let’s list out the requirements for what this feature should support, keeping in mind the spirit of separating content from presentation:
  1. It should scale to any number of elements, not just two or three.
  2. Multiple options for what to do as content gets wider or the window gets narrower: wrap, scroll, hide, etc.
  3. When style dictates the elements wrap to a new “line”, this should happen such that the elements still follow the same flow, in an order that mimics how text would flow (including awareness of the current browser culture).
  4. When styling dictates that elements wrap to a new “line”, you should be able to intuitively (and optionally) style them so each element on a new line will take positions below an element on the first line, so the result resembles a grid. If elements have varying widths, there should be multiple options for how to account for the space when an element further down is much wider than an element above it.
  5. You should not need to do any special styling for the first or last element that is different from other elements
  6. You should not need to add any extra markup to the document to indicate the first or last element, or to mark the beginning or end of the side-by-side sequence. We want the content separate from the styles, after all.
  7. Since potential side-by-side elements are siblings, in a sense, it is reasonable (and possibly necessary) to expect them to belong to some common parent element, perhaps even as the exclusive direct children of that parent.

I want to point out here that I’d be surprised if everything, or nearly everything, I just described isn’t already possible today. However, it’s not even close to intuitive. It requires hacks and a level of css-fu not easily attainable by the common designer or developer, or over-reliance on frameworks like Bootstrap.

I believe that what CSS needs — what it’s really missing — is for this feature set to be supported in a first-class way that is discoverable for new and self-taught web developers and designers; a way that works because this is the whole point of this specific set of styles, and not because some designer figured out how to shoehorn something else into doing what they wanted. This is the elephant in the CSS room.

I feel pretty comfortable with that requirement outline. Sadly, I no longer do enough web design to take the next step: thinking through how the exact styles and syntax needed to implement this should actually look. I definitely lack the influence to take the step after that: getting a proposal before the committee that could actually cause this to be accepted into the spec. And no one is in a position to take the final step: getting browsers to support this in a reasonably uniform way in a reasonably prompt time frame. All of that makes this post little different from a rant… but a guy can dream, can’t he?
Posted in development, web | Leave a comment

What a hunk of Junk!

I admit it: I’m a Star Wars fan, including the Expanded Universe. I’ve read and reread (recently, even) a number of the books. There’s one thing that bothers me about the whole thing: the Millennium Falcon. I feel like other fans focus more than they should on lines like “She’s the fastest ship in the fleet” and less on lines like “You came in that thing? You’re braver than I thought.”

I think I’ve finally figured out how best to express this frustration. Take space ships from the Star Wars universe and translate them to real-world cars. See, I feel like fans have an image of the Millennium Falcon as something like this:

Chevy Impala Autobot from Transformers Movie

Not the Millennium Falcon

Yes, it’s fast. Yes, it’s heavily modified. Yes, it has weapons. Most of all, it’s cool. But does it fit what I see as the Millennium Falcon’s place in the Star Wars universe? No. Not even close. That would look something more like this:

Old Box Truck

Millennium Falcon

Now that’s more like it. In fact, this may even be too nice. The Millennium Falcon is supposed to already be kind of… old by the time the movies start. Above all else, it’s supposed to be a light freighter, and nothing says light freighter like the ubiquitous white box truck. Han was a smuggler, and as a smuggler he would not have always wanted to draw attention to himself.

This isn’t to say there was nothing special at all about the Millennium Falcon. Picture the truck above after it’s had its engine and transmission replaced with the fastest set that can be made to fit, which would include a turbo and nitrous canisters. Maybe throw in some armor plating on the rear door, and give it an upgraded suspension that can handle the speed and weight. This truck could really fly. But in the end, it’s still a truck.
Posted in non-computer | 1 Comment

Four basic security lessons for undergrad CS Students

Security is a huge problem in the IT industry. It seems like we hear almost weekly about a new systems breach resulting in the leak of millions of user accounts. The recent breaches at Target and Kickstarter come to mind, and those are just the ones that made news. Often this is actually more of a people problem than a technology problem: convincing non-technical employees of the importance of following correct security procedures, and convincing non-technical managers to allow developers to make the proper security investments. But many of these breaches are the result of easily-correctable technical issues.

When looking at student work, I don’t expect them to be hardcore security experts. But I do expect that students have learned four basic lessons by the time they finish their undergrad work. Those lessons are, in no particular order:

1. A general idea of the correct way to store passwords.
Probably this general idea will be slightly wrong, but that’s okay. A recent grad is unlikely to be asked to build a new authentication system from the ground up. However, they should know enough to raise red flags if they see something done horribly wrong, and they should know enough to follow what’s going on if asked to fix a bug in an existing system. Getting down to nuts and bolts, the student should understand the difference between encrypting and hashing, they should know to use bcrypt or scrypt (or at least not to use MD5), and they should know they need a per-user salt.
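As a rough sketch of those nuts and bolts, here is what per-user salts and a memory-hard hash look like in Python, using the standard library’s scrypt binding (the cost parameters below are illustrative only, not tuned recommendations):

```python
import hashlib
import hmac
import os

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Hash a password with a fresh per-user salt using scrypt."""
    salt = os.urandom(16)  # unique random salt for every user
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt,
                            n=2**14, r=8, p=1)  # illustrative cost parameters
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt,
                               n=2**14, r=8, p=1)
    return hmac.compare_digest(candidate, digest)
```

Note that the salt is stored alongside the digest, not kept secret; its job is to make precomputed lookup tables useless.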

2. How to avoid Sql Injection Attacks
Or, put another way, how to use sql query parameters in their platform of choice. This assumes, of course, that students have at least some exposure to databases as part of their degree (they should). Sql Injection should be part of that exposure.

I also take issue with that standard comic on the subject (see here). The problem is that it talks about sanitizing database inputs, and that’s just the wrong approach. If you’re thinking “sanitize”, you’ve already lost; it implies you should write code that examines user input and removes bad things. Real sql injection security lies in quarantining unsafe data. And, yes, I do think undergrad students should know the difference.

Quarantined data does not need to be sanitized. There can be all kinds of attempted bad things in the data, but if the programmer used a mechanism that transmits this data to the database in a completely separate data block from the sql query, the chances of that data being executed as code drop to 0%. On the other hand, the chances of a bug, logical flaw, or ignorance of a potential exploit creeping into sanitizing code? Significantly higher.
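To make the quarantine idea concrete, here is a minimal sketch in Python with the standard library’s sqlite3 module (the table and input are made up for illustration; the same placeholder pattern exists in every mainstream database API):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

# A classic injection attempt arriving as user input.
malicious = "Robert'); DROP TABLE users;--"

# The ? placeholder ships the value to the database separately from
# the SQL text, so it can never be executed as code.
conn.execute("INSERT INTO users (name) VALUES (?)", (malicious,))

# The attempted attack is stored as an ordinary, harmless string,
# and the users table is still intact.
row = conn.execute("SELECT name FROM users").fetchone()
print(row[0])
```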

3. How to avoid Cross-site Scripting / Cross-site Request Forgery Issues
If you work with the web (and today, who doesn’t?) you need to understand XSS/CSRF issues and how to mitigate them. This is definitely an issue for students, because often they’ll go straight from college to working on a web property, and may even do some web work before graduating. Simply put, they’re at risk for this from day one. The solution for XSS is to be diligent about escaping data (CSRF additionally calls for measures like anti-forgery tokens). Better still if your web platform helps you do this in the appropriate ways.
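As a tiny illustration of what escaping buys you (Python’s standard library here; in practice, prefer the escaping helpers your web framework provides, since they know the right rules for each output context):

```python
from html import escape

user_input = '<script>alert("xss")</script>'

# quote=True also escapes quote characters, which matters when the
# value lands inside an HTML attribute.
safe = escape(user_input, quote=True)
print(safe)  # &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```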

4. Don’t write your own security code
Perhaps the most important lesson. Security code is one of those areas that has hidden complexity. It’s easy to write security code that seems to work just fine — it may even pass an exhaustive suite of unit tests — but is still flawed in subtle ways such that you don’t catch it until six months after you get hacked. The solution here is to lean as much as possible on security code provided by the platform of your choice. This code will be written by people who understand the security issues in play. It will be battle-tested and designed in such a way that it helps you get the details right. Most of all, it’s backed and serviced by a vendor such that when (not if) flaws are discovered you are generally able to patch them without having to re-write code yourself.

Bonus Resources: I have two resources that I recommend that students at least be aware of. The idea is not to have a complete grasp of everything they cover, but to know where to look to get more information. The first is OWASP, especially their Top 10 list. The second is this Stack Exchange question: What technical details should a programmer of a web application consider before making the site public?
Posted in development | 2 Comments

What to look for in a bargain Android Tablet

I’ve seen a lot of bargain Android Tablets lately, and I know a lot of people who are interested in getting one, but don’t think they can afford it. I’ve got news for you: they’re cheaper than you might think. Tiger Direct recently had one for $20 after rebate. These tablets are a hot item. The trick is, how do you know that you’re getting something worth having? A lot of those cheap tablets are not going to do what you expect of them.

Here are my tips for finding a worth-while bargain Android tablet (January 2014 edition):

1. Look for at least Android 4.1 or newer out of the box. Android is a free operating system (well, sort of), and so in theory you could update an older tablet yourself, but there’s more going on here than that. Anything older than Android 4.1, and you’re likely looking at last year’s tablet coming off the shelf, and last year’s bargain tablets were, well, just plain bad. There’s a reason I don’t have a 2013 edition of this post. Android is also free (or nearly so) to manufacturers, and so there’s no reason to see anything older than this on a new device.
2. Minimum 1.2 Ghz dual-core processor. Emphasis on the dual core; that is what will keep the operating system feeling responsive, even when running some of the higher-demand apps.
3. Dual (front and rear) cameras. Many of the bargain tablets will cut out one or both cameras to keep costs down, but as someone who’s had a couple different tablets for a while now I can say with confidence that you really will want a camera on both sides. It’s the feature I miss most on my Kindle Fire. The front camera will be used mainly for video chat, and doesn’t need to be great, but the rear camera should be at least 3MP (more would be better, but remember: we’re bargain hunting).
4. A MicroSD card slot. This will let you turn a cheap 4 GB tablet into a generous 36 GB device for less than $30 extra. Take that iPad. You can skip this if you find one that has generous storage out of the box.
5. Capacitive Touch Screen. It’s rare to see a resistive touch screen tablet any more, but if you don’t pay attention you can get caught out here. Even with a capacitive screen, this is the place where the manufacturer is most likely to cut corners, and you may end up with a display that is not sensitive enough, or not responsive enough. At this point, though, it’s hard to suss out the good ones from the bad.
6. Minimum 200ppi (pixels per inch). You’ll have to do the math here, but if you’re looking for bargain tablets that likely means a 7 inch device, and that means at least something around 1280×720. 800×600, or even 1024×768, are not likely to cut it. Anything less than this, and the tablet screen won’t look clear. Small text on the small screen will be harder to read. More is better, but remember: we’re shopping for bargains. Sadly, this item is likely to push your purchase up over $100 at the moment. If you’re willing to fudge on this (I advise against it), you can get some crazy deals that meet all the other points.
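The math for item 6 is simple enough to sketch (a hypothetical helper in Python: pixels per inch is the diagonal resolution in pixels divided by the diagonal screen size in inches):

```python
import math

def ppi(width_px: int, height_px: int, diagonal_in: float) -> float:
    """Pixels per inch for a screen of the given resolution and diagonal."""
    return math.hypot(width_px, height_px) / diagonal_in

print(round(ppi(1280, 720, 7)))  # about 210: clears the 200ppi bar
print(round(ppi(1024, 768, 7)))  # about 183: falls short
print(round(ppi(800, 600, 7)))   # about 143: not even close
```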

I’d like to have a note about the battery, but at this point I don’t have a feel yet for what to look for in that department. Still, follow these six rules, and you should be able to get a decent, off-brand tablet that you’ll be very happy to have, for a price much less than you’d expect.
Posted in IT News | Leave a comment

Secure WiFi is Broken

Today, I’m offering you a challenge. Go out and find a public wifi network that offers encryption without making you log in. I dare you. Unfortunately, this endeavor is doomed to failure. WiFi as it exists today makes it virtually impossible to offer such a thing.

I find myself in the business of managing a reasonably-sized wifi deployment (about 80 access points over 19 buildings). I would really love to offer open encrypted wifi on this network, but I can’t.

What I want to provide is an HTTPS-like experience for my users that just works: an SSL layer that doesn’t care who you are, but still provides meaningful encryption for the last 50 meters where your traffic is moving through the air for anyone nearby to snoop.

I’m annoyed that so many encryption solutions are coupled to both authentication and authorization. These don’t need to be linked. You don’t have to log into an https site to get encrypted traffic, and you shouldn’t have to log into a wifi network to get encryption either.

My ideal scenario is that someday I’ll be able to install the same wildcard ssl certificate that we purchase for our web sites to each access point or at a controller, change a setting for an SSID to use this certificate for encryption, and as long as the certificate is from a well-known/reputable vendor, user devices will just work.

I include guest devices in this category. I want someone — anyone, but especially our visiting admissions candidates — to be able to turn on their device for the first time and have the experience be easy: no captive portal, no guest registration, no prompt to agree to terms of service, just choose the SSID and they’re online.

Sure, I could use a shared key scenario and just publish the key, but that’s not the same thing. If anyone knows the key, anyone can decrypt the traffic. And this still requires an extra step to get online.

I honestly couldn’t care less about the authentication part of this. I don’t need to know right away that it was Jane Smith’s computer committing whatever nefarious deed. The immediate reaction to that kind of thing is the same regardless of the name of the person behind it. As long as I can target a MAC address or have reasonably static IP addresses (I do), I’m happy enough using a captive portal rule on a specific machine after the fact to identify a user for those times when enforcement issues come up. Our college-owned machines here do log user names all the time, so it’s just student-owned devices where this is necessary.

Sadly, I don’t believe this kind of wifi exists today. Certificate-based 802.1x comes close, but the need to install/configure devices with a supplicant breaks it. Even so, I would settle for 1x… if only I could count on it working. The truth is that too many devices, especially things like smart phones and game consoles, fail to have adequate support for this, and will fail to give users any information about why network association fails, or how to fix it. Personally, I place the blame on the WiFi Alliance, for certifying devices that don’t work for this feature as well as they should.

Currently, we’re working to provide two WiFi options: one that’s completely open (and I mean completely), and one that uses 1x and prompts for a user’s Active Directory login. Anyone can walk on campus and get online at a basic level. Really. I don’t care. Guest (and even neighbor) use is a drop in the bucket compared to what our regular students demand. But if a guest needs encryption, they’d better hope the site or service supports https. We encourage students to use the 1x SSID whenever they can, and try to educate them about the importance of encryption. Most don’t care, and choose the open network, but at least the option is open to them.
Posted in networking, Uncategorized, wifi | Leave a comment

Tips for Hot Chocolate

Today I have something that’s not at all computer related. It’s starting to get colder outside, and so I’m going to share my tips for good hot chocolate.

1. Heat the water before adding the cocoa mix. If you heat them together, you’re also applying the strong heat to the powder, and that’s a good way to end up with burned chocolate.
2. Don’t try to drink it too hot. Patience is a virtue.
3. Buy the packets, rather than the large containers. If you look at the servings on each package and compare it with the price, the cost per serving is the same. The packets are much more convenient, and make it easier to get the amount right. With the large containers, the tendency is to not use enough.
4. A little bit of extra creamer or whipped cream goes a long way.
5. Chopsticks and popsicle sticks make good stir sticks.
6. A packet of hot chocolate mix is a great way to flavor your morning coffee. Cream and sugar in one.
Posted in non-computer | Leave a comment

Coding for Branch Prediction

Modern CPUs don’t work the way most programmers think. We’re taught this model that a processor has an instruction pointer that increments through our code, jumping to the right place when we call a function or use a loop or conditional, and a line executes when the instruction pointer reaches that line. That used to be right, but today things are a lot more complicated. Chip manufacturers like Intel and AMD have learned that they can optimize this process to get better performance than the simplistic model would allow, and so we have things like out-of-order execution, branch prediction, and floating point bugs.

The good news is that the chip manufacturers work very hard to make sure that the results, at least, are always what a programmer expects. Not only that, but performance is better on average, too. The trick is that pesky average. It means that, in certain circumstances, performance might be worse. Additionally, even when it’s better, it might not be as much better as it could be.

Take loops, for example. Say you have some data in a collection, and you want to perform an operation on every element in the collection. But maybe the value of each element changes what operation you want to perform. Let’s start with a collection and some Dictionaries to sort the items into, like this:
var items = Enumerable.Range(0, 1000000);
var ints = new Dictionary<int, string>(1000000); 
var doubles = new Dictionary<double, string>(1000000);

The argument in the dictionary constructors is to avoid memory reallocation time during my sample runs, to get better benchmarks. I also have some other code not shown to do things like discount the JITter and make sure I get good benchmarks. But let’s get into it.

You might be tempted to write code something like this:
foreach (int item in items)
{
    //different code branches that still do significant work in the cpu
    if (item % 3 == 0)
    {   //force hash computation and multiplication op (both cpu-bound)
        ints[item] = (item * 2).ToString();
    }
    else
    {
        doubles[(double)item] = (item * 3).ToString();
    }
}

This is sub-optimal, because of a cpu feature known as Branch Prediction. Branching is just a fancy way of saying there is an if block in your code. The code will follow one of two different branches, depending on the result of the conditional. Modern CPUs will try to guess which branch to use, and begin executing code down one branch before the conditional is known. If the CPU guesses right, it’s a performance win. If it guesses wrong, it’s a big performance loss. We can use this feature to improve on our original code:
//doing MORE work: need to evaluate our items two ways, allocate arrays
var intItems = items.Where(i => i % 3 == 0).ToArray();
var doubleItems = items.Where(i => i % 3 != 0).ToArray();

// but now there is no branching... adding all the ints, then adding all the doubles.
foreach (var item in intItems) { ints[item] = (item * 2).ToString(); }
foreach (var item in doubleItems) { doubles[(double)item] = (item * 3).ToString(); }

The result is that, most of the time, my benchmarks show this 2nd option runs faster. A typical result was 1118652 ticks for the naive option and 1005190 for the branch prediction-friendly option. It’s not a huge win, but it’s real, and for most programmers it’s counter-intuitive. If we had a more challenging conditional expression, or were doing more work in the branches, the improvement would be greater. Also, these branches are mixed. If one of the branches were chosen a lot more than the other, or if the selections tended to come in batches, the original code would win more.

But I’m not done yet. I can think of at least two other ways to take advantage of branch prediction. Here’s what produced the best results in my benchmarks:
var deferred = new List<int>(1000000);
foreach (var item in items)
{
    if (item % 3 == 0)
        ints[item] = (item * 2).ToString();
    else
        deferred.Add(item);   // defer the other branch's work until later
}
foreach (var item in deferred) { doubles[(double)item] = (item * 3).ToString(); }

This should make no sense to anyone. At first look it seems like the worst of both worlds, and that memory allocation for the list should be nasty. Yet it produced the best results. Only if you read the code very carefully will you see that we save an entire array allocation (maybe several, thanks to the way .ToArray() works) and one complete iteration through the set with this option over the other branch prediction-friendly option.

So what do we learn here? I hope your answer is not to change the way you write loops. The performance wins in these contrived tests were small, at best. My first attempt to use the feature didn’t get it quite right, and the results were much worse, rather than better. Moreover, when I ran the same (working) code on a different model cpu, the naive code consistently won, because that cpu had very different branch prediction rules. Rather, the lesson here is, once again, that premature optimization is not worth it. What is faster on one cpu may be much worse on another, and it may be for reasons that are way outside what you expect. The best way to improve performance is to profile your code, find out where it’s really slow, and focus your efforts there… and while you’re there, make sure you’re measuring that your changes really have the effect you intend. Sure, keep branch prediction in your back pocket as something you can use… but wait until you know it’s worth it.
Posted in .net, c#, development | Leave a comment