On the Lambda

Programming, Technology, and Systems Administration

Let’s Make a Single Statement Mode for ADO.NET

September 5th, 2022 · No Comments · .net, security, sql, Sql Server, stackoverflow

One of the issues I see often in code on Stack Overflow is SQL injection. I still typically answer multiple questions per week that include, along with the rest of the answer, some form of the text, “This is scary-vulnerable to SQL injection issues.”

This is a “Really Big Deal”™. It’s one of about three things I believe you should not do even for practice, learning, testing, and proof-of-concept projects (another is plain-text or otherwise poorly protected passwords). On the other hand, everyone has to learn about this somewhere. String manipulation comes so naturally to programmers that we can’t expect anyone to just know better.

With that in mind, I think we’re missing an easy security win in ADO.NET (and other libraries): there should be an option for a Single Statement Mode. This mode would only execute up to the first statement received in a batch. Anything afterwards is ignored.

Consider this bad example from a recent Stack Overflow question:

query = ("update items set name='" + txtName.Text + "',category='" + txtCategory.Text + "',price='" + txtPrice.Text + "where iid =" + id + "'");

If someone were to enter, say, ';DROP TABLE Items;-- for the txtName field, bad things are gonna happen (or not: one hopes the user account here would lack the required permissions. Hopes. But more subtle and invasive attacks are possible as well, and not even all that difficult).

Now consider instead if SQL Server and ADO.NET worked together so that only the first statement in a batch is executed, and anything afterwards is ignored. We end up with executable code like this:

update items set name='';

This is still a broken query, but — importantly — it no longer attempts to execute the user-injected SQL command. Alternatively, the presence of multiple statements could cause the batch to fail.

Even more importantly, what if this were the default mode? Users have to discover parameterized queries, but if this is the default we have a measure of protection out of the box. Personally, I’ve found the ability to execute multiple statements very useful, but those cases are far from the majority of the queries I send. I would gladly accept having to explicitly turn that ability on for a command or connection object in order to know most queries have an added layer of injection protection.

Note I am explicitly NOT proposing this as a replacement for parameterized queries, but rather as a layer of depth in addition to them.
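For the record, here is roughly what the parameterized fix looks like. This is a sketch, not the original poster’s code; the column types, lengths, and the Microsoft.Data.SqlClient package are my assumptions (the original’s quoting suggests price may really be a string column, which is its own problem):

```csharp
// Sketch only; column types and sizes are guesses about the hypothetical schema.
using System.Data;
using Microsoft.Data.SqlClient;

static void UpdateItem(string connectionString, int id, string name, string category, decimal price)
{
    const string sql =
        "UPDATE items SET name = @name, category = @category, price = @price WHERE iid = @id;";

    using var conn = new SqlConnection(connectionString);
    using var cmd = new SqlCommand(sql, conn);

    // Values travel as typed data, not as SQL text, so they cannot terminate
    // the statement or start a new one.
    cmd.Parameters.Add("@name", SqlDbType.NVarChar, 100).Value = name;
    cmd.Parameters.Add("@category", SqlDbType.NVarChar, 100).Value = category;
    cmd.Parameters.Add("@price", SqlDbType.Decimal).Value = price;
    cmd.Parameters.Add("@id", SqlDbType.Int).Value = id;

    conn.Open();
    cmd.ExecuteNonQuery();
}
```

With parameters, an input like ';DROP TABLE Items;-- is stored as a literal string value rather than executed.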

Adding this as an option to ADO.NET should be possible (even better if you can flag it on by default for a whole solution or project), but I understand making it the default would be a breaking change for a lot of code out there. It’s not something Microsoft can just do.

However, I also believe it’s past time for an ADO.NET v2. The current ADO.NET pre-dates nullable types and using blocks, both of which could change the API in very positive ways. It also uses separate interfaces for a row in a DataTable and for the current record in a DataReader. There are other improvements available as well.

I believe there’s enough room for improvement on the table to consider a completely updated ADO.NET v2 library which includes a by-default single-statement mode. Such an updated library would make .NET programs invulnerable to the vast majority of SQL injection attacks out of the box, and that’s a security win too big to ignore.

So step 1: update the existing ADO.NET library to have this as an option. Step 2: Create a modernized ADO.NET v2 that uses the option by default.


SQL and Dates

July 13th, 2022 · No Comments · development, sql, Sql Server

I want to clear up a few common issues I often see related to date values in SQL.

Column Types

The first of these is storing dates in a string column type, like varchar or nvarchar. Schemas that do this need to be fixed, because they really are broken. When you do this, you don’t really have a date value at all anymore; you have a string value pretending to be a date. You lose the ability to directly call date functions like DATEPART(), DATEADD(), FORMAT(), etc., and you lose the ability to index the column properly, which can cut to the core of your database performance. Even worse, you may find queries tending to convert these values for ALL of the rows in a table just to find out whether they meet a WHERE condition, even when you only need a few rows from a large table. Thanks to internationalization issues, these conversions are not the quick or simple operations you might think. The string value will also take up more storage and memory. Possibly worst of all, you open up your application to bugs where incorrect data makes its way into the column, because the database won’t be able to validate the inputs.


Next up is formats for literals in the SQL code. I often see people use the preferred format for their own language or culture, whether it’s MM/dd/yyyy, dd/MM/yy, dd-MM-yyyy, or something else. It’s important to understand here that SQL is its own language, and just like any other language or culture it has its own expectations around date format. When writing SQL, it’s WRONG to put your own cultural expectations into this very different language. This applies to sample data and posts to public locations like Stack Overflow, too.

The correct format for SQL when you only have a date, with no time portion, is yyyyMMdd. Note the lack of any separators. This is slightly different from the common ISO 8601 yyyy-MM-dd format, but a careful reading of ISO 8601 allows for an unseparated variant, so this is still compliant. When you have both date and time, the correct format is yyyy-MM-ddTHH:mm:ss[.fff] or yyyyMMdd HH:mm:ss[.fff]. (For clarity on what the capital-letter portions of the format strings mean, I’m using the .NET DateTime format string reference.) I’ll spare you repeating the details of why these are the only acceptable formats, but if you are interested you can find more info here.
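To make this concrete, here’s a small table-free illustration you can run against any SQL Server:

```sql
-- Unambiguous literals: these parse the same regardless of session language or DATEFORMAT.
SELECT CAST('20220713' AS date);                  -- date only: yyyyMMdd
SELECT CAST('2022-07-13T14:30:00' AS datetime2);  -- date and time: yyyy-MM-ddTHH:mm:ss
-- By contrast, a culture-specific literal like '05/07/2022' can mean May 7th or
-- July 5th (or fail to parse at all) depending on the session's DATEFORMAT setting.
```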


The last issue is using string concatenation to substitute a date value into an SQL query. This goes back to the whole SQL injection thing. I know some people believe you don’t need parameters for date and numeric values because the system will handle the formatting. This is wrong, and it’s still possible to manipulate some systems to format with arbitrary text for date and numeric data.

But I’m not even going to use that. I’ll assume you have some special magic escaper function that never makes a mistake. Even then, parameterized queries are still better and you still should not programmatically insert date or datetime strings into SQL code. I’m not even going to argue performance (though SQL Server is generally better at optimizing your query plans if you use a parameter instead of a varying string literal).

I am going to talk about correctness. Thanks to cultural/internationalization issues, converting back and forth between strings and dates (also numbers) is not only much slower than you’d expect, it’s also much more error-prone and much less of a solved problem than we’d like to believe. With a query parameter that is correctly typed to match a DateTime column, and a DateTime object assigned to the parameter value, the ADO.NET library will correctly and efficiently handle these conversion issues for you. If you ever find yourself trying to format a DateTime variable for use in an SQL command string, you’re doing something very wrong. This is somewhat specific to the .NET ecosystem, but if you’re on a different platform you delude yourself if you believe the problem is handled much differently. Even with dynamic/loosely-typed platforms, which often work very hard to smooth over these issues, you will often find you can make significant gains by being more careful about handling dates.
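To illustrate, here’s a minimal sketch of the right way in ADO.NET. The Orders table, its columns, and the connection string are all invented for the example:

```csharp
// Sketch: pass the DateTime value itself, never a formatted string.
using System;
using System.Data;
using Microsoft.Data.SqlClient;

static int CountOrdersSince(string connectionString, DateTime cutoff)
{
    const string sql = "SELECT COUNT(*) FROM Orders WHERE OrderDate >= @cutoff;";

    using var conn = new SqlConnection(connectionString);
    using var cmd = new SqlCommand(sql, conn);

    // ADO.NET and SQL Server handle the conversion between them; no string
    // formatting, and therefore no culture issues, are involved.
    cmd.Parameters.Add("@cutoff", SqlDbType.DateTime2).Value = cutoff;

    conn.Open();
    return (int)cmd.ExecuteScalar();
}
```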


What’s up with MySql

May 4th, 2022 · No Comments · development, servers, sql

Historically, when one considers database server platforms there are four major players: Oracle, MySql, Sql Server, and Postgresql. (One could argue for Sqlite and even Access, but those are in-process engines and really belong in a different category.) Unfortunately, I question whether MySql still belongs with this group. It’s come to the point where I actively advocate avoiding MySql for new development. While I understand moving an existing, functioning app may be more trouble than it’s worth, if you’re starting from scratch I just can’t in good faith recommend MySql anymore.

To understand why, I need to deliver a short history lesson. Let’s go back to 2004. At this time the database market wasn’t exactly young any more, but products were still raw compared to what we have today. Enter MySql. It was a very capable database for the era, and could keep up and even beat the “enterprise” competitors charging into 5 and 6 figures for similar deployments, while at the same time being the only real FOSS option (Postgresql existed, but wasn’t really mature yet). For this reason MySql was widely adopted, especially among low-cost web hosts.

Unfortunately, MySql came to be owned by Sun, and Sun had fallen on hard times. They also had Java, but neither Java nor MySql brought in much revenue. Sun’s cash cow, selling Sparc workstations, was fading fast, and so Sun couldn’t afford to support MySql development beyond basic bug fixes. Thus MySql began to stagnate.

Enter Oracle, who decides to buy out Sun. Unfortunately, Oracle wants Sun mainly for Java, so this doesn’t improve things much for MySql. Worse, they view MySql as a competitor to their existing database product. The result is that while they don’t actually kill off MySql, they also don’t do any more for the product than Sun did. I do want to make clear I don’t blame Oracle for causing the stagnation; only for failing to end it.

But stagnate it did… for more than 10 years, until around 2017, when development of what became MySql 8.0 finally started to accelerate. Even then, it wouldn’t release until 2018.

In the meantime, competing products did not sit still. Features like lateral joins, common table expressions, and window functions, all of which have only recently come to MySql, were built first (and often better, at least so far) in other products. And let’s not even talk about engine improvements for performance and stability, or MySql’s long-standing scary quirks.

There was some attempt to avoid all this via the MariaDB fork, but unfortunately it was never really able to achieve enough momentum to make up for the lost time; a case of too little, too late.

The point is that MySql 5.7 is still rooted in the 2004 state of the platform, even though it was released in 2015. In my opinion it doesn’t really even qualify as a modern database platform.

But that’s 5.7 and earlier. Now we can push things forward to the 2018 state of the art by moving to MySql 8.0; this is at least reasonable (in contrast to running 5.7 or earlier, which is not). Just know you’d still be playing catch-up from about four years behind the rest of the market. Worse, the fundamental issues that caused the original stagnation are not resolved.

Today, while I understand and respect the momentum of an existing product already using MySql, it wouldn’t even be on my list to consider for new development. If FOSS were a priority, Postgresql beats MySql in pretty much every possible category; otherwise Sql Server has a lot going for it as well.


SQL vs NoSql

April 29th, 2022 · No Comments · development, sql, Sql Server

If you’ve heard of NoSql, maybe you wonder what it is and how it compares to a traditional SQL database. Let’s take a look.

SQL databases store well-structured data using known schemas. They require rigid adherence to a set of design principles, but provide amazing power if you can learn those principles and the associated query language well. NoSql databases come in several varieties, but typically store keys and “documents” (often JSON).

Let’s look at an example. Let’s say you have a business and need to record Sales. Here’s a look at how each kind of solution might handle the problem.

An SQL database would need several tables to store the data for each Sale: one for a header, one for each line in the order, and additional tables for things like customer or product. Retrieving data means knowing about those tables and how to JOIN them, and adding new fields into those tables can be exceptionally difficult.

A NoSql database would store one document for each Sale. It might also have separate primary documents for customers and products, but would almost certainly NOT need separate documents for the header and the details. If you want to retrieve the Sale, you only need to know how the order number maps to the document key; the code is very simple from there.

But initial storage and retrieval is only one part of an application. When we move beyond simple CRUD, we find the SQL database tends to make it easier to aggregate data across many orders, especially on the fly and based on ad hoc fields. Want to know your sales total for the month? How much tax was collected? How many new customers you’ve served this year? SQL will let you write a query to find this kind of information with existing data without needing to retrieve each record from the database. NoSql would either need to retrieve every Sales record for processing or have additional application logic to update a designated “SalesAggregates” document. SQL databases also make it easier to do things like get a list of all your Sales records, or Customers. This also isn’t to say NoSql databases never have these capabilities, but it is often more natural or performant (thanks to indexing) in SQL.
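For example, the monthly-totals question becomes a single query the server can answer from its indexes, without the application touching each Sale. The schema here is hypothetical:

```sql
-- Hypothetical schema: Sales(SaleId, SaleDate, CustomerId),
--                      SaleLines(SaleId, LineTotal, TaxAmount)
SELECT SUM(l.LineTotal) AS MonthTotal,
       SUM(l.TaxAmount) AS TaxCollected
FROM Sales s
INNER JOIN SaleLines l ON l.SaleId = s.SaleId
WHERE s.SaleDate >= '20220401' AND s.SaleDate < '20220501';
```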

Recently, developers sometimes find it easier to begin a project using a NoSql solution and then migrate to an SQL database later, as the schema matures or stabilizes, the need for historical analysis grows, or the amount of stored data increases. However, you’ll also find plenty of projects that migrate in the opposite direction, or stick with NoSql (or SQL) for their entire life-cycle. I’ve also heard of hybrid solutions using NoSql for the main/initial application, but with periodic ETL (extract/transform/load) jobs to shift some of the data on a schedule to an SQL database or data warehouse for later analysis and reporting.

Finally, NoSql sometimes has a perceived performance advantage. “It’s web scale!” I would discourage reading too much into this. In practice, well-built SQL databases perform very well and scale very far on similar hardware.

On a personal note and for full disclosure, my own skillset is deeply rooted in SQL. Relational databases have been a sound foundation for more than 50 years, with no signs of slowing down, which is an amazing feat in the fast-changing world of technology.


Using Flash in the Browser post-2021

January 21st, 2021 · No Comments · IT News, networking, security, Windows

Let’s get this out of the way: Flash is bad, okay? If you have another way to do something — any way at all — you should use it.

Some of us don’t have a choice. Right now I need to support no fewer than three different embedded systems using Flash: a Wi-Fi controller, a security camera deployment, and an older HP LJ 40xx series printer. Any one of these would cost hundreds to thousands of dollars to replace, not to mention the man-hours. For all three, it was a better bet for me to find a way to extend my access to Flash for a little longer.

I found two options.

First is in a post by user crazyhammer on the Extreme Networks forums. He provides instructions on preserving the Flash client installed with Windows. And it works! Unfortunately it’s (ugh) Internet Explorer only. That got me most of the way, but it’s not my preferred option. Also, my security cameras absolutely refuse to talk to IE.

So I present you option two: my method to run Flash in a variant of Google Chrome in the Year of Our Lord 2021.

To start, you need a working copy of Flash from the before-times as a donor. I can’t provide this — I don’t have the rights. Over time these will get harder to find, but for the moment they do exist. Especially if you’re in a business environment, your IT service probably has an old PC somewhere that hasn’t started up in a while. For my donor, I pulled the drive and put it in a cradle connected to another computer, to be sure Chrome’s auto-update wouldn’t nuke my Flash plugin as soon as I started the machine.

I found the Flash plugin files at the following location in the guest drive:

C:\Users\<username>\AppData\Local\Google\Chrome\User Data\PepperFlash

Where “C:” is the guest drive letter. Copy the entire contents of this folder. One extra warning: Chrome claimed to include Flash out of the box. What actually happened was it downloaded the plugin the first time you needed it. A virgin machine might not have the Flash plugin you need. Instead, look for something that was used for a while and retired.

Now we need a version of Chrome that will accept this plugin. Thankfully, we have PortableApps to the rescue. Specifically, I used the 32-bit version of Chrome 79. I chose 32-bit because my donor plugin also came from a 32-bit machine. Most people reading this can and should choose a 64-bit version. Newer versions of Chrome probably also work, but I KNOW version 79 was okay and didn’t want to waste time testing.

Once you install your preferred Portable Chrome edition, take the PepperFlash files we found earlier and put them in the Data\Profile\PepperFlash folder in your new installation.
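If it helps, the copy can be done in one command. The drive letter, username, and install folder below are all made up and will differ on your machine:

```bat
:: Donor drive mounted as E:, Chrome Portable installed to C:\PortableApps (illustrative).
robocopy "E:\Users\someuser\AppData\Local\Google\Chrome\User Data\PepperFlash" ^
         "C:\PortableApps\GoogleChromePortable\Data\Profile\PepperFlash" /E
```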

At this point you should be good to go, with three important caveats. One, this version of Chrome will still block Flash by default; the difference is that it also still includes the settings to let you unblock it. Two, the Flash plug-in will likely show a banner that it’s out of date. That’s okay; the banner includes a button to let you run the content anyway. Three, neither the plug-in nor the browser will ever update or patch for security issues; you really shouldn’t use them any longer than you need to finish retiring old systems.


Certificate Authentication in SQL Server

November 23rd, 2020 · No Comments · development, networking, security, sql, Sql Server

I sometimes see questions asking how to do certificate authentication in SQL Server.

If you’re not aware, certificate authentication for databases works by issuing a certificate that can be trusted by the database server. The certificate might be signed by a public certificate authority or by some other entity trusted by the database, and is associated with a specific user in the database. When a certificate is presented, the database knows both that the certificate is valid and which user it’s dealing with.

Certificate authentication has a number of benefits over legacy passwords, including automating credential rotation, support for multiple credentials per user isolated by application or device, revocation, and more.

Contrary to what you may have heard, SQL Server does indeed have support for certificate authentication. It just looks a little different from how some other database products do it: SQL Server’s equivalent is the Integrated Security feature.

It works like this:

When you use integrated security across machines, where the server lives on a different computer from the client, you also rely on an Active Directory domain to mediate trust between the machines. A user logs in to a client computer joined to an Active Directory domain. As part of the authentication process, a domain controller (DC) from the domain will issue an authentication token for the session.

The user then wants to log in to SQL Server using integrated security, where the SQL Server installation is a member of the same domain. To do this, the token is passed to SQL Server, which in turn validates the token against a DC. Now SQL Server can trust the user is who they say they are. From there it consults its own user information to decide what access, if any, to grant.

The integrity of this process is guaranteed using – you guessed it – certificates. Also a clock (thanks Kerberos).

The weakness here is it limits authentication to devices with Active Directory membership. However, it otherwise grants the same security benefits as more-traditional certificate authentication, as well as the additional benefit of automating certificate transport and issuance in a secure way.
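From the client’s perspective, opting in to this whole flow is just a connection-string choice. A typical ADO.NET connection string using integrated security looks something like this (the server and database names are made up):

```
Server=sql01.corp.example.com;Database=Sales;Integrated Security=true;
```

Note there is no password anywhere in the string; the domain token does the work.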


Dealing with a warped or bowed printer bed

January 8th, 2019 · No Comments · 3D Printing

Last time, I talked about a seesaw/lever effect when leveling a square bed with four posts. Today I’ll address dealing with a (slightly) warped printer bed. For simplicity’s sake, I will only examine warping along one axis. You should be able to apply the principles here to the X and Y axes separately, to get the best possible results for your bed.

Bed warping is a common issue for square beds with four adjustable posts. It’s one reason you see so many people mount glass plates over top of their original bed. It’s a natural result of heating a fixed bed, which will cause it to expand, without also heating the carriage for the mount points. This stresses the metal plate, causing it to warp. Generally manufacturers try to address this by having loose screw mounts in the bed. The problem is the pressure from the adjustable springs tends to limit the ability of the bed to expand properly. Warping is also the result of heat spreading into the bed unevenly, where the bed will expand nearer the heating element first. Finally, some warping of the metal is just a natural result over time. At the tolerances we need for good print results, even a brand new bed is unlikely to be perfectly flat.

At this point, I need to make it clear I’m not talking about obvious bending or damage. The kind of warping I’m referring to is only as much as 0.3 mm from one end of the bed to the other… but when we’re talking about 0.1 mm layer heights, that’s enough to cause serious problems for prints. Really, anything more than 0.5 mm and you may want to replace the bed entirely, or look into the glass addition I’ve already referred to. You can sometimes also deal with greater warps by getting the best level you can, and then using an automatic leveling system that can compensate for some surface variance in software: it knows the bed is higher or lower in certain areas and adjusts the print head height to account for it.

Here is an illustration of a print bed from the side, with some warping visible:

The warping is exaggerated. Any bed warped this much would be unusable.

Note how the bed bends up along the right side. Let’s say for a moment the difference between the left edge and the right edge is 0.2 mm. Such a bed would be very difficult to level.

We can improve things some, though, without changing the bed at all. Consider the following illustration:

The same print bed as before.

This is the same print bed from the previous illustration. This time, we have loosened the knobs on the left side, and tightened the knobs along the right, so the entire bed rotated slightly clockwise. Now, instead of rising on the right side, it rises on both sides from the middle. The result is the left side and right side are on the same level, and the middle is now a little lower.

The point is that the difference between either edge and the spot where the warping is worst is likely to be much less than the 0.2 mm we were dealing with before… likely close enough to get good results throughout the entire bed.

If you find yourself with a section of your print bed where you have trouble getting good adhesion, even if you’re sure you’ve leveled the bed, you may want to consider the bed may be warped, and an adjustment like this (in combination with re-setting your Z-stop) may be just the answer.


Leveling a square 3D printer bed with four mounting posts

January 7th, 2019 · No Comments · 3D Printing

3D printers come in a surprising variety of shapes and models, and this extends to the print beds as well. I’ve seen round beds, for printers that have a radial print size. I’ve seen triangle beds, which are supposedly easier to level, because three points determines a plane. I’ve even seen printers without any bed at all, that work by extruding resin from a tank.

By far the most common kind of printer bed I’ve seen is square, and it almost always has four posts for level adjustment, with one post at each corner.

Top down view of a 3D printer bed

This can actually make for tricky leveling where adjustments have unintuitive consequences. As was said, it’s three points that define a plane, not four. Using four posts can put strain on a bed and cause warping. I’ve even heard of people removing an entire post, and even re-mounting the third post, in order to get better levels.

But not all of us want to do that. Here’s a way to help understand what happens when you adjust a knob on the bed. First, view the bed rotated 45 degrees:

Top down view of a 3D printer bed rotated 45 degrees
Now view the bed rotated 90 degrees away from you, so we see it from the side rather than the top. From this view, only three of the knobs are visible. The fourth knob is hidden behind the one in the middle:

3D Printer bed viewed from the side

The two knobs in the center can act as the fulcrum of a lever or seesaw, where raising or lowering the bed at one end can have the opposite result on the far end.

It’s important to understand that a knob does not set a fixed position for its corner. Each corner is secured with a spring, and turning the knob increases or decreases the amount of pressure on the bed at that location relative to the fulcrum created by the two adjacent knobs. Tightening a knob to lower the bed in one corner can also have the effect of raising the bed slightly at the far corner opposite the fulcrum, because now there is more lift pressure against that spring. Because we are dealing with springs, rather than a free-moving lever, this effect is not absolute, but it does happen.

Hopefully this helps explain why you sometimes need more than one time around the bed at each corner to get a good level. And perhaps it will help you make smarter adjustments when leveling your print bed in the future.


Indexing a CSV file with .Net

November 28th, 2018 · No Comments · .net, stackoverflow

I recently answered a Stack Overflow question where I supplied what I believe may be a useful tool for indexing simple CSV data when you might not want to hold the entire data set in memory. I wanted to list it here because I added the CSV support as an afterthought; the question itself does not indicate it uses CSV records. The question uses VB.Net, but the C# translation should be trivial; just watch out for needing to add the “out” keyword in a few places.

Some caveats:

  • The code assumes the records exist on a single line.
  • There’s nothing at this time to skip the header row, though that is trivial to add.
  • You have to supply your own key selector function.
  • The code is untested, entered directly into the Stack Overflow answer box. It almost certainly still has several bugs.

Still, I found it interesting and fun to write. Click the link above to the Stack Overflow question to see the code.
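For those who’d rather not click through, the gist of the approach looks something like this C# sketch. Like the original, consider it untested; it assumes one record per line and a single-byte encoding such as ASCII (a multi-byte encoding would need a real decoder to track byte offsets correctly):

```csharp
// Builds an in-memory map of record key -> byte offset, so lookups can seek
// straight to a record without loading the whole file.
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public class CsvIndexer
{
    private readonly Dictionary<string, long> _index = new Dictionary<string, long>();
    private readonly string _path;

    // keySelector extracts the key from a raw line, e.g. line => line.Split(',')[0]
    public CsvIndexer(string path, Func<string, string> keySelector)
    {
        _path = path;
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            var line = new StringBuilder();
            long offset = 0;    // bytes consumed so far
            long lineStart = 0; // byte offset where the current line began
            int b;
            while ((b = fs.ReadByte()) != -1)
            {
                offset++;
                if (b == '\n')
                {
                    var text = line.ToString().TrimEnd('\r');
                    if (text.Length > 0) _index[keySelector(text)] = lineStart;
                    line.Clear();
                    lineStart = offset;
                }
                else
                {
                    line.Append((char)b);
                }
            }
            if (line.Length > 0) _index[keySelector(line.ToString())] = lineStart;
        }
    }

    // Seek to the record's stored byte offset and read just that one line.
    public string Find(string key)
    {
        long pos;
        if (!_index.TryGetValue(key, out pos)) return null;
        using (var fs = new FileStream(_path, FileMode.Open, FileAccess.Read))
        {
            fs.Seek(pos, SeekOrigin.Begin);
            using (var reader = new StreamReader(fs))
            {
                return reader.ReadLine();
            }
        }
    }
}
```

Usage would be along the lines of `var idx = new CsvIndexer("data.csv", line => line.Split(',')[0]);`, after which `idx.Find(key)` returns the matching line, or null if the key isn’t present.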


Tracking through Lightspeed: Complexity vs Reliability

December 18th, 2017 · No Comments · networking, non-computer, servers

If you haven’t yet seen “The Last Jedi” and don’t want part of it spoiled, you may want to give this one a pass. I’m going to focus on how a scene from the movie relates to current technology.

In “The Last Jedi”, part of the plot revolves around trying to shut down a tracking mechanism on the main First Order ship. There’s a whole scene about how this important, specially-protected “A-class” process could be reset at a single breaker. I don’t remember the exact details, but what sticks out is that such an important process would have such a simple point of failure. When I talk about the movie with non-technical people and bring up this scene, it always bothers them. Even programmers and networking pros are bothered by it.

Here’s the thing: I’m not. This may be the plot point that makes the most sense to me of any. Rather, my personal grudge against the plot (I liked the movie, btw) is that we sit through Finn’s whole side adventure at all if it’s going to (spoiler) come to nothing. (And, yes, I know there are several other reasons for those sequences, but they seem like material for the novelization rather than the movie.) But the easily breakable critical process? That I totally get.

You see, this would not be the first time an outage was caused by the very system put in place to improve or ensure High Availability. Far from it. Complexity and reliability are in direct opposition. Whenever you add complexity in order to improve reliability — and all highly available systems do this — you need to step back and look at the new risk you’re about to introduce, and then make very sure the gains will be worth it. Most often there’s risk you haven’t even begun to identify yet.