Excel and OleDb stupidity

This afternoon I was working on a Windows Forms app for the finance guys, to help them allocate costs to systems, proportional to predefined allocation keys.


The app has to import Excel spreadsheet files and perform all the database actions.


The key columns to uniquely identify a cost are text values. Unfortunately, some of those key values do not contain letters, but only numeric characters. And this is where it sucks to have to work with Excel.


This is because Excel knows no datatypes. You can specify formatting for cells, but Excel ignores that when someone asks for cell data. Instead, it scans the first couple of rows to guess the datatype. By default it checks the first 8 rows. If those contains mixed values, it uses a specified default type for those values that are not numeric.


This means that if you use OleDb to retrieve values, the column type can vary per row.


What can you do about it? Well nothing really. You can tweak some values, and hope that you will never encounter the corner cases that you cannot do anything about. There is plenty of information to be found. One good explanation was this.


However, one of the finance guys gave me an incredibly useful tip, which you can use as a workaround. You have to manually make a couple of easy modifications to the excel file, and Excel will always use the text data type.


  1. Assume that the offending data is in column 1, and that the first row is the header row. 
  2. You insert 2 empty columns to the right of the offending column.
  3. Put an ‘ in the first cell of the first empty column (B2).
  4. Then enter =B2&B1in the cell B3.
  5. Select B2 and B3, and double-click the bottom right corner of the selection box. This will copy the data from there to the last row.
  6. B3 will now hold the data as text, because the ‘ tells Excel that no matter what is in the cell, it is text.
  7. Then you copy column 3, paste it as ‘values only’ over column 1.
  8. Delete columns 2 and 3 again.

Now you can read the column in OleDb and always get text values for that column.

Differences between C++ and C#, part 3

One of the things that C# (and VB.NET) programmers have to live with is non-deterministic destruction of their objects.


The garbage collector was introduced in .NET to take care of memory leaks and remove the need for manual memory management. It does this by keeping track of references, and as soon as an object is no longer referenced, it ‘can’ be collected by the garbage collector.


The ‘can’ is important here, because cleanup happens only when the GC thinks is should happen. On systems with lots of memory, and few memory allocations and releases, it is possible that this takes minutes or hours, or even doesn’t happen at all.


This can be a real problem, because now you don’t have memory leaks anymore, but resource leaks.


Suppose your app opens a file for writing, and does not explicitly dispose of the stream reference.

FileStream stream =
  new FileStream(Path.GetTempFileName(), FileMode.Open);
StreamWriter writer = new StreamWriter(stream);writer.WriteLine(“Hello, world.”);

writer.Close();


The file is written and flushed to disk, but as long as the garbage collector does not collect the ‘stream’ object, it will hold an open file handle that will prevent you or anyone else from accessing that file. To overcome this you have to make sure that the file is released. On way of doing that would be like this:

FileStream stream = new FileStream(Path.GetTempFileName(), FileMode.Open);
try
{
  StreamWriter writer = new StreamWriter(stream);
  writer.WriteLine(“Hello, world.”);
  writer.Close();
}
finally
{
  stream.Dispose();

}


You could also use the ‘using’ keyword, but the same holds true. It is easy to make mistakes, and if you have several of such objects in the same scope, then your code will become a convoluted mess of nested try/finally or using constructs.


In C++/CLI, this problem can be solved differently.


In native C++, they way you create an object determines whether it will live on the stack or in the heap. The location implies the way the object will be cleaned up. If it is on the stack, it will be released automatically when it goes out of scope.


C++/CLI does not have the ability to create managed objects on the stack, but it supports stack like semantics. I.e. even if you use the declaration that would put the object in the stack, it is still created in the managed heap. But the syntax and the semantics are stack based.


This means that in the following example, the stream object will automatically be released when it goes out of scope.

FileStream stream(Path::GetTempFileName(), FileMode::Open);
StreamWriter writer(%stream);
writer.WriteLine(“Hello, world.”);

writer.Close();


And regardless of how many such objects we need to create or use locally, they will all be cleaned up automatically without needing ugly nested structures, and without needing extra lines of code.


Of course, you still have to determine if this is something you want, because sometimes you need objects to live beyond the scope in which they were created. In that case you can use the gcnew operator to use heap semantics.


FileStream^ stream = gcnew FileStream(Path::GetTempFileName(), FileMode::Open);


C++ gives you the ability to choose the best approach, which is nice if you want to write correct code without wanting to increase the number of lines of code, and structural complexity.

gpupdate and psexec, and a moment of D’oh

I had to do a security change today which required each XP machine to first update its policies, and then reboot to make sure that the required changes went into effect immediately.


To do the update I used psexec in conjunction with gpupdate /force to tell each machine to update its policies remotely, and I discovered something funny: if no user is logged on to the remote machine, you’ll get an error that the user settings could not be updated. Which is fair enough I suppose, except that it took me a while to figure out.


To do the reboot I used psshutdown. Mental note to self: ‘If I want to use psshutdown again to reboot all client machines on the network, I should not forget to specify -r’ The maintenance people entered the server room just as I thought ‘Hm, why is it taking so long for the computers to come back online…?’ [^o)]


Of course when manufacturing noticed that they went down and didn’t come back up, they were ‘worried’. Luckily all our client machines are located in the server room, with kvm cables running through the plant so booting all of them back on was done in a minute.


It was really cool to have the diagnostics window open and then ‘BAM’ drop every client node from the network at the same time.