Moving over to blog.senthilthecoder.com

Having become a GNU toolchain developer, I have decided to stop blogging here and continue blogging at http://blog.senthilthecoder.com. Mostly because I don’t think it’s appropriate to blog about non-Microsoft related work under the msmvps.com domain.

If you have subscribed to this blog, please change your RSS feed URL to http://blog.senthilthecoder.com/index.xml instead.

Moved over to the dark side

Yes, I’m now a Linux developer.

About 6 months back, I got an opportunity to work on the AVR Toolchain – a port of GCC and related software for the AVR and AVR32 microcontrollers. Having always been interested in compilers and programming languages, I grabbed the chance with both hands, and have been happily hacking on it since then. While GCC can be developed on Windows, Linux is the natural platform and is what most developers use, so I had to switch to Linux as well.

The switch to Linux was frustrating for a while, as I suddenly had no access to the favorite tools I’d gotten used to over the years. No Visual Studio, no WinDbg, no WinMerge... the list is long. You start working on a problem, and you realize you need to use some tool, say “Find in Files”. Now you have two problems – finding a tool that can do “Find in Files” and knowing enough about it to get the job done, and your original problem. What made things worse was that most of the work was done SSHing into another machine, so practically no GUI applications were available.

So I decided to go cold turkey and stop using Windows for a while, and it paid off handsomely. After a few days of constantly hitting Google and “man <command>” every 5 minutes, I gradually got used to it. One thing that greatly helped ease the transition was that I was already using VsVim, a Visual Studio extension that emulates the Vim editor. If you are a serious software developer, I can’t recommend learning Vim enough – it really changes the way you think about editing code (once you get past the infamous learning curve).

Now, after 6 months, I actually favor the command line over a GUI for most applications – it really lets you get a lot done with a few keystrokes (once you know what to key in, of course!).

PS: Only when I logged in did I realize that my last post was made more than a year ago. Time really does fly as you grow older 🙂

 

Naggy now does lowlighting of excluded code!

Naggy, the AVR Studio extension I wrote that provides live compiler diagnostics, now supports visually showing conditionally excluded code, again courtesy of Clang. You can download the latest version here.

Most of the Clang integration code had to be rewritten for this though, as the “black box” API provided by Clang didn’t expose enough details to make this work. It took some time to get familiar with that, and I still don’t think I’ve understood Clang’s idea of object ownership completely, but boy, was it fun! I basically register a custom PPCallbacks implementation with the Preprocessor and then listen to the entry/exclusion of conditional preprocessor directives. What made life difficult was that the callbacks weren’t fired by Clang in any meaningful order – you’d sometimes get a callback for #endif before the #if callback, based on whether the #if evaluated to true or false. I eventually ended up just recording the source location and entered/not entered information in the callback. When it’s time to show the excluded code, I calculate the actual excluded blocks of code by sorting the source locations and then pairing them.
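The sort-then-pair step can be sketched roughly as follows. This is an illustration in Python, not Naggy's actual code: the event shape `(line, kind, entered)` and the function name `excluded_blocks` are assumptions made up for the example, and source locations are simplified to line numbers.

```python
from typing import List, Tuple

# Hypothetical event record: (line, kind, entered)
#   kind    is "if", "else" or "endif"
#   entered says whether the branch starting at that directive was compiled
#           (ignored for "endif")
Event = Tuple[int, str, bool]


def excluded_blocks(events: List[Event]) -> List[Tuple[int, int]]:
    """Return (first_line, last_line) ranges of code that was skipped."""
    blocks = []
    open_branches = []  # stack of (directive_line, entered)
    # Callbacks may arrive in any order, so sort by location first.
    for line, kind, entered in sorted(events):
        if kind != "if":
            # "else" and "endif" both close the branch opened most recently.
            start, was_entered = open_branches.pop()
            if not was_entered and line - 1 >= start + 1:
                blocks.append((start + 1, line - 1))
        if kind != "endif":
            # "if" and "else" both open a new branch.
            open_branches.append((line, entered))
    return blocks


# Events deliberately listed out of order, as the callbacks might deliver them.
events = [
    (10, "endif", False),
    (1, "if", False),
    (5, "else", True),
]
print(excluded_blocks(events))
```

Using a stack for the pairing means nested conditionals inside an entered branch pair up correctly as well.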

I had to change Clang code to have the callback determine whether a particular conditional preprocessor block was entered or not. I also discovered and fixed a bug in the codebase, and I’m hoping to push these changes back into the Clang codebase.

Now that the lowlighting code uses the low level API, I couldn’t have the diagnostics finding code use the higher level one – that would mean that the source file would get processed twice, once for diag finding and once for lowlighting. So there were some changes there too to unify both of these clients to share the Clang instance and have code processed only once. Hopefully I didn’t break anything that worked earlier 🙂

As usual, the source code is up at https://github.com/saaadhu/naggy/, and you can download the extension itself at https://github.com/downloads/saaadhu/naggy/Naggy%20v0.2.vsix

 

Choreo : Write VS like macros in AVR Studio

Choreo is an AVR Studio extension that lets you write and execute macros within the IDE. You can download it from https://github.com/saaadhu/choreo/downloads. The source code is available from https://github.com/saaadhu/choreo/.

If you’ve used macros in Visual Studio or in any other tool, you probably need no further explanation, but for the others, here’s what MSDN says a macro is:

“A macro is a series of commands and instructions that you group together as a single command to accomplish a task automatically. Macros allow you to automate repetitive actions.”

Choreo macros are just IronPython functions that have access to DTE, the top level automation object exposed by Visual Studio. Choreo discovers the macro functions at start up and creates commands for them inside AVR Studio, and these commands can then be executed through the Command Window, or through shortcut keys (after binding them to keystrokes, of course).

Here’s a macro that writes the current date/time at the current cursor position.

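# IronPython: .NET types are imported like Python modules.
from System import DateTime

def InsertDateTime():
    # dte is the top level automation object that macros have access to.
    # Assigning to Selection.Text inserts text at the cursor position.
    dte.ActiveDocument.Selection.Text = DateTime.Now.ToString()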

To make this macro discoverable by Choreo, save the above code in a .py file (say editor.py), and drop the file inside %localappdata%\Atmel\AvrStudio\5.0\Extensions\Senthil Kumar Selvaraj\Choreo.1\Macros. If you’re running AVR Studio, you can do Tools -> Refresh Choreo to make Choreo pick up new macros. Choreo creates commands in the format Choreo.<python_file_name>.<python_function_name> for whatever functions it discovers.
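To make the naming scheme concrete, here is a small sketch of how command names could be derived from a macro file. It is plain Python using the standard `ast` module and is only an illustration of the `Choreo.<python_file_name>.<python_function_name>` format; it is not how Choreo itself discovers macros, and `command_names` is a made-up helper.

```python
import ast
from pathlib import PurePath
from typing import List


def command_names(file_name: str, source: str) -> List[str]:
    """List the command names that top-level functions in a file would map to."""
    module = PurePath(file_name).stem  # "editor.py" -> "editor"
    tree = ast.parse(source)
    return [
        f"Choreo.{module}.{node.name}"
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    ]


macro_source = '''
from System import DateTime

def InsertDateTime():
    dte.ActiveDocument.Selection.Text = DateTime.Now.ToString()
'''

print(command_names("editor.py", macro_source))
```

The source is only parsed, never executed, so the IronPython-only import does not matter here.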

You can then bind your macro to a keystroke using the Tools -> Options dialog.

Type the keystroke, say Ctrl followed by ., in the Press shortcut keys box, click Assign and then OK. You should now be able to execute your macro with the configured keystroke when typing in the editor.

Choreo automatically picks up code changes in existing commands, so you don’t have to explicitly issue a refresh (Tools -> Refresh Choreo). However, new functions in existing files, as well as new files won’t be picked up automatically, so an explicit refresh is needed there.

Record and Replay support is still missing, and commands can’t take parameters at the moment, but otherwise, it works pretty well.

Enjoy!

 

Naggy : Live compiler diagnostics in AVR Studio!

If you’ve programmed in any managed language in Visual Studio, you’d have definitely seen those nagging red squiggles that appear as you type, telling you just how dumb you are, every time you pause typing. Might not be everyone’s cup of tea, but I’ve personally found them very useful; they save quite a few compile-groan-swear-fix-compile cycles.

So I decided to write a similar “squiggly generator” for AVR Studio 5, the product that I’m working on with a bunch of other guys. Naggy is what I call it, and it is a VSIX extension that installs into AVR Studio 5. You can download it from here. Naggy is open source (MIT License), and the source code is available at https://github.com/saaadhu/naggy/.

Mandatory screenshot:

screenshot

Naggy uses Clang under the hood to do the actual source code analysis, and uses the diagnostic information provided by it to tag appropriate text spans in the editor with the red squiggles. The diagnostic message is shown as a tooltip when you hover the mouse over the squiggle. In a sense, Naggy is little more than a wrapper for Clang – it visually shows what Clang finds. There are a few tricky things to deal with though, like reading toolchain and compiler flags from AVR Studio and passing them on to Clang.

As its version number (0.1) indicates, it is still a very raw product (see issues). In case you run into problems, you can always do Tools -> Extension Manager -> Naggy -> Disable (or Uninstall).

Any feedback is good feedback, so good or bad, do let me know about it. I’m planning to also integrate the static analysis features offered by Clang into AVR Studio, and it all depends on whether people actually find this useful.

Ramble – a Ruby sentence generator

There was an interesting problem at work recently, where I basically wanted to generate a whole lot of C language expressions for testing purposes. Rather than generating them by hand, I wondered if I could somehow feed the C language grammar into a program and get most, if not all, possible valid expressions out of it. Of course, I knew that a syntactically valid program might not necessarily be semantically valid, but I figured that I could eliminate or transform those cases manually.

So off I went, using this problem as an opportunity to write some Ruby code. The end result is Ramble, a Ruby program that given a grammar like

start : paragraphs;
paragraphs
: paragraph
| paragraph LINEBREAK LINEBREAK paragraphs
;
paragraph
: sentence
| sentence paragraph
;
sentence
: subject verb object PERIOD
;
subject
: CAT
| DOG
| PONY
;
verb
: EATS
| GULPS
| SWALLOWS
;
object
: HAY
| FOOD
| MILK
;

and some helper code to “stringify” terminal symbols like CAT, DOG and PONY, will generate text like

Cat swallows hay. Pony swallows milk. Dog eats hay. Dog swallows hay. Pony gulps milk. Pony gulps hay.

Pony gulps food. Dog eats food.

Dog gulps milk. Pony eats food. Cat swallows hay.

Cool, don’t you think?
You can browse the source code online on GitHub here, but the general idea is to parse the grammar file (I’m assuming a simplified yacc format), generate an abstract tree of productions, and then traverse the tree to generate text (with a callback to get text for terminal symbols). Pretty similar to what a compiler does, except that instead of accepting source code as input, it takes a grammar file, and instead of emitting code, it spits out sentences in the language.

I used Treetop to parse the grammar, and wrote some hairy recursive code to filter and operate on the generated tree. Now the grammar could be left recursive, i.e., it could have a rule like

A : A | B

and a generator that naively tries to generate every possible sentence recursively would quickly overflow the stack and die. I chose to randomly pick one of the two (or more) options, hoping that running the program repeatedly will give me a variety of sentences.
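The random-choice idea can be sketched in a few lines. This is a Python illustration rather than Ramble's Ruby code: the grammar is hand-written as a dictionary instead of being parsed from a file, and `expand` and `stringify` are hypothetical helper names.

```python
import random

# Part of the sample grammar above, written by hand as
# nonterminal -> list of alternative productions.
GRAMMAR = {
    "paragraph": [["sentence"], ["sentence", "paragraph"]],
    "sentence": [["subject", "verb", "object", "PERIOD"]],
    "subject": [["CAT"], ["DOG"], ["PONY"]],
    "verb": [["EATS"], ["GULPS"], ["SWALLOWS"]],
    "object": [["HAY"], ["FOOD"], ["MILK"]],
}


def expand(symbol, rng):
    """Expand a symbol into terminals, picking one alternative at random."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal symbol
    production = rng.choice(GRAMMAR[symbol])
    tokens = []
    for part in production:
        tokens.extend(expand(part, rng))
    return tokens


def stringify(tokens):
    """Turn terminals like CAT EATS HAY PERIOD into 'Cat eats hay.' text."""
    sentences, words = [], []
    for token in tokens:
        if token == "PERIOD":
            sentences.append(" ".join(words).capitalize() + ".")
            words = []
        else:
            words.append(token.lower())
    return " ".join(sentences)


print(stringify(expand("paragraph", random.Random())))
```

The recursive `paragraph` rule stops only because the non-recursive alternative gets picked sooner or later; nothing in this sketch bounds the recursion depth.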

Now for the bad part – I tried running this on a subset of the C grammar, and it crashed with a stack overflow error. With rules that were transitively recursive, the random choice option wasn’t enough to control the recursion before it blew the stack.

And boy, was I wrong about filtering out syntactically correct but semantically invalid expressions! The few times the program completed without crashing, the emitted expressions were just total rubbish semantically – there was no way I could have manually massaged those into shape.

It was an interesting exercise though, and it helped scratch two itches at the same time – dabbling with language stuff, and writing serious Ruby code (I’m a Ruby newbie).

 

When gmail.google.com didn’t actually go to GMail

The other day, I logged on to my home network and entered gmail.google.com in Firefox. Imagine my surprise when I got a “landing” page with a bunch of URLs, rather than the usual spartan GMail page.

One terrifying thought followed the other. Was my machine infected? Was my WiFi router compromised? Maybe the ADSL modem? Worse, how long had it gone unnoticed?

Ok, time to take a deep breath and start isolating the problem, I figured.

I logged on from an alternate machine and found it worked fine there. I did an nslookup on gmail.google.com from the command prompt on my machine, and got back an IP address in the Google domain.

DNS request timed out.
timeout was 2 seconds.
Server: UnKnown
Address: 218.248.241.5

Name: www3.l.google.com
Address: 209.85.231.100
Aliases: gmail.google.com

Maybe some browser extension was causing the problem? To isolate that, I tried navigating to the same URL from Chrome and IE. No luck there – I was still getting the wrong page.

Must be some kind of virus then, I figured. I wanted to see the IP address the browser was navigating to, so I fired up Wireshark and watched the traffic. Not surprisingly, I found this

42    11.621942    192.168.1.100    61.1.96.69       DNS    Standard query A gmail.google.com
43    11.659343    61.1.96.69       192.168.1.100    DNS    Standard query response A 64.95.64.197

The IP address in the response (64.95.64.197), when run through a reverse lookup, resolved to a domain name that definitely wasn’t Google’s – the name suggested some kind of ad network. But wait, where did the IP address come from? 61.1.96.69. That was the DNS server the request was sent to, and as the Wireshark log shows, it obviously responded with the wrong IP address for the domain name.
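The "does this answer belong to the owner I expect?" check can also be done offline with Python's `ipaddress` module. The sketch below is only an illustration: the single network listed is an assumption standing in for whatever address ranges you trust for the domain, and `looks_expected` is a made-up helper.

```python
import ipaddress

# Assumption for illustration: one address block treated as "belongs to Google".
# A real check would need a complete and current list of ranges.
EXPECTED_NETWORKS = [ipaddress.ip_network("209.85.128.0/17")]


def looks_expected(address: str) -> bool:
    """Return True if the address falls inside one of the expected networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in EXPECTED_NETWORKS)


# The answer nslookup got, and the answer the browser got.
for answer in ("209.85.231.100", "64.95.64.197"):
    print(answer, looks_expected(answer))
```

A range check like this only flags suspicious answers; it says nothing about who actually operates an address outside the list.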

Ok, so my browser was sending a request to a hacked DNS server, and that was why I was getting the wrong page. But where did 61.1.96.69 (the DNS server) come from? And why does nslookup use a different DNS server (218.248.241.5)? Time to check the wireless router/ADSL modem then, I thought.

And sure enough, this is what I found in the status page.

DNS 1: 218.248.241.5
DNS 2: 61.1.96.69

Was my router compromised to use a spiked DNS server? A reverse DNS lookup on 61.1.96.69 showed that it was in fact in the same domain as my ISP (BSNL/Sancharnet). So no, it wasn’t my router – BSNL was giving me the two IP addresses, and the second one, which my browser had used, was poisoned to return wrong IP addresses for requests for the gmail.google.com domain name.

This was the first time I’d encountered DNS cache poisoning, and it is easy to see how dangerous it can be. SSL of course would save the day for secure websites – the fake website won’t be able to produce a valid certificate claiming to be the original website. But what about the millions of websites without a certificate? And how many internet users actually know about https versus http?

Terrifying, ain’t it?

I fixed the problem by forcing my router to use Google’s DNS servers rather than BSNL’s. Rather ironic, considering the trigger was incorrect lookup of one of Google’s own subdomains.

Why you should always dispose DataGridViews

Because your application can crash otherwise, that’s why.

While it’s always a good idea to explicitly dispose everything that is disposable, we can usually get away without disposing UI controls because

a. Complex controls aren’t created often – a single instance is often reused.

b. The finalizer kicks in and saves the day.

If you’re creating multiple instances of a System.Windows.Forms.DataGridView, however, watch out, because (b) doesn’t happen at all if you don’t call Dispose or otherwise cause the control to be destroyed.

When debugging a software crash dump recently, I found that the crash occurred because there was an exception when attempting to show the exception handler dialog, and that was because the application had run out of window handles. Being a managed application, that meant that the application was holding on to way more UI control instances than is normal.

A quick look at the objects in the heap showed tons of instances of a certain type of control. Looking at the code, it was clear that while the control is created often, there is no code that holds on to the instances indefinitely – each new instance clears all references to the old instance. Sure, the previous instance was not being disposed, but I skipped over that, assuming that the problem must be someone holding on to the instances, or otherwise the finalizer would have cleared things up.

Dumping the gcroots for a random sample of those objects showed a common chain.

   1: DOMAIN(002869F0):HANDLE(Pinned):2013e8:Root:  02ed5250(System.Object[])->
   2:   01efce64(System.Collections.Generic.Dictionary`2[[System.Object, mscorlib],[System.Collections.Generic.List`1[[Microsoft.Win32.SystemEvents+SystemEventInvokeInfo, System]], mscorlib]])->
   3:   01efd3d8(System.Collections.Generic.Dictionary`2+Entry[[System.Object, mscorlib],[System.Collections.Generic.List`1[[Microsoft.Win32.SystemEvents+SystemEventInvokeInfo, System]], mscorlib]][])->
   4:   01efe464(System.Collections.Generic.List`1[[Microsoft.Win32.SystemEvents+SystemEventInvokeInfo, System]])->
   5:   01fed344(System.Object[])->
   6:   020253f4(Microsoft.Win32.SystemEvents+SystemEventInvokeInfo)->
   7:   020253d4(Microsoft.Win32.UserPreferenceChangedEventHandler)->
   8:   020242d0(System.Windows.Forms.DataGridView)

The delegate type in line 7 was a dead giveaway; looking for references to that type using Reflector showed the following code inside DataGridView’s OnHandleCreated method.

protected override void OnHandleCreated(EventArgs e)
{
    // A bunch of other code
    SystemEvents.UserPreferenceChanged += new UserPreferenceChangedEventHandler(this.OnUserPreferenceChanged);
}

That line right there is why the crash happened.

When the DataGridView control’s window handle gets created, it subscribes to an event from a static class (SystemEvents.UserPreferenceChanged). It does unsubscribe from it in the OnHandleDestroyed method, but that gets called only if the control is properly disposed.

Now if you don’t call Dispose, the control remains subscribed to the event; that counts as a strong reference to the control, and it therefore cannot be garbage collected and finalized. Which means that the control and all its associated resources are not going to be released until the application shuts down (or the AppDomain unloads).
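The same effect is easy to reproduce outside .NET. Below is a small Python analogue, not WinForms code: `SystemEvents` and `Grid` are stand-ins invented for the example, and a weak reference is used to observe whether the object could be reclaimed.

```python
import gc
import weakref


class SystemEvents:
    """Stand-in for a static event source that lives as long as the process."""
    user_preference_changed = []  # subscribed handlers


class Grid:
    """Stand-in for a control that subscribes to the static event on creation."""

    def __init__(self):
        # The bound method stored here holds a strong reference to self.
        SystemEvents.user_preference_changed.append(self.on_user_preference_changed)

    def on_user_preference_changed(self):
        pass

    def dispose(self):
        SystemEvents.user_preference_changed.remove(self.on_user_preference_changed)


def is_collected_after_release(dispose_first: bool) -> bool:
    """Create a Grid, drop our reference, and report whether it was reclaimed."""
    grid = Grid()
    probe = weakref.ref(grid)
    if dispose_first:
        grid.dispose()
    del grid
    gc.collect()
    return probe() is None


print("disposed:    ", is_collected_after_release(True))
print("not disposed:", is_collected_after_release(False))
```

Only the disposed object goes away; the other one stays reachable through the handler list for the life of the process, which mirrors the root chain in the gcroot output above.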

In the crashing application, there were other controls subscribed to events from the DataGridView, so it in turn prevented garbage collection and finalization of those controls, and eventually, they used up a really large number of window handles, causing the application to crash when it tried to create a new control.

It’s rather strange if you think about it; the finalizer is supposed to be the mechanism that cleans up if Dispose is not called on an object, but for a DataGridView, the finalizer won’t run unless you call Dispose, as otherwise a strong reference to the object will exist. It won’t run after you call Dispose either – the Dispose implementation calls SuppressFinalize(this).

When a C++ destructor did not run – Part 2

In the previous post, we saw that linking a C++ static library compiled with /EHs to a mixed mode application prevented the destructor from running when an exception is thrown. Here’s the sample project that demonstrates the behavior, in case you aren’t convinced.

This is the code inside the library.

   1: C::C()
   2: {
   3:     cout << "Constructed" << endl;
   4: }
   5:
   6: C::~C()
   7: {
   8:     cout << "Destructed" << endl;
   9: }
  10:
  11: void SomeFunc()
  12: {
  13:     C c;
  14:     throw std::exception("Gone");
  15: }

And this is the code consuming the library.

int main(array<System::String ^> ^args)
{
    try
    {
        SomeFunc();
    }
    catch(Exception ^e)
    {
        Console::WriteLine(e->ToString());
    }
    return 0;
}

OK, so where do we start? First, let’s see if this happens in the normal (no exception) case as well. Commenting out line 14 in the first code snippet and running the code should show us that. Try it out, and you’ll see that the destructor runs now. So the problem must somehow be related to destruction during exception handling. But we have asked the compiler to emit code for C++ exception handling (/EHs), and we are throwing plain C++ exceptions, so why is this happening?

To understand that, we’ll have to dig deeper into how the VC++ compiler and the CRT implement exceptions. Under the hood, the VC++ compiler uses Win32 SEH (Structured Exception Handling) to implement C++ exceptions. Matt Pietrek’s article explains how SEH works – do give it a quick read. The real quick summary of how it works is this – every function, on entry, creates an exception registration record on the stack that contains the address of the current function’s SEH exception handler and a pointer to the previous exception registration record, and writes the address of the record to the current thread’s TIB (Thread Information Block). When an SEH exception occurs, the OS walks through the registered records once to determine the handler for the exception, and again to allow cleanup code to run. The handler knows why it was called by looking at the exception record that is passed to it – it has a flag that specifies that information.
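As a mental model, the registration records form a singly linked list whose head is stored in the TIB. The Python sketch below mimics only that bookkeeping and the first (search) pass; the class and function names are invented for the example, and real handlers receive far more than an exception code.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional


@dataclass
class RegistrationRecord:
    handler: Callable[[int], bool]            # True means "I will handle this code"
    previous: Optional["RegistrationRecord"]  # record of the calling function


class Thread:
    def __init__(self) -> None:
        self.head: Optional[RegistrationRecord] = None  # stands in for the TIB slot

    def enter_function(self, handler: Callable[[int], bool]) -> None:
        # Function entry: link a new record in front of the current head.
        self.head = RegistrationRecord(handler, self.head)

    def leave_function(self) -> None:
        # Function exit: restore the caller's record as the head.
        assert self.head is not None
        self.head = self.head.previous

    def walk(self) -> Iterator[RegistrationRecord]:
        record = self.head
        while record is not None:
            yield record
            record = record.previous


def find_handler(thread: Thread, code: int) -> Optional[RegistrationRecord]:
    """First pass: ask each record, innermost first, whether it handles the code."""
    for record in thread.walk():
        if record.handler(code):
            return record
    return None


def no_catch(code: int) -> bool:
    return False


def catch_all(code: int) -> bool:
    return True


thread = Thread()
thread.enter_function(catch_all)  # outer function
thread.enter_function(no_catch)   # inner function, where the exception is raised
chosen = find_handler(thread, 0xE06D7363)
print(chosen.handler.__name__)
```

The inner record is asked first and declines, so the search ends at the outer function's handler.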

We’ll look at how the C++ compiler uses SEH by firing up WinDbg and disassembling SomeFunc.

 

   1: 0:000> x *!SomeFunc
   2: 011015e0 CliConsoleApp!SomeFunc (void)
   3:
   4: 0:000> u 011015e0 011016ff
   5: 011015e0 6aff            push    0FFFFFFFFh
   6: 011015e2 6815361001      push    offset CliConsoleApp!CorExeMain+0xa5 (01103615)
   7: 011015e7 64a100000000    mov     eax,dword ptr fs:[00000000h]
   8: 011015ed 50              push    eax
   9: 011015ee 83ec10          sub     esp,10h
  10: 011015f1 a118001101      mov     eax,dword ptr [CliConsoleApp!__security_cookie (01110018)]
  11: 011015f6 33c4            xor     eax,esp
  12: 011015f8 50              push    eax
  13: 011015f9 8d442414        lea     eax,[esp+14h]
  14: 011015fd 64a300000000    mov     dword ptr fs:[00000000h],eax
  15: 01101603 a134401001      mov     eax,dword ptr [CliConsoleApp!_imp_?endlstdYAAAV?$basic_ostreamDU?$char_traitsDstd (01104034)]
  16: 01101608 8b0d70401001    mov     ecx,dword ptr [CliConsoleApp!_imp_?coutstd (01104070)]
  17: 0110160e 50              push    eax
  18: 0110160f 68c0451001      push    offset CliConsoleApp!`string' (011045c0)
  19: 01101614 51              push    ecx
  20: 01101615 e8a6010000      call    CliConsoleApp!std::operator<<<std::char_traits<char> > (011017c0)
  21: 0110161a 83c408          add     esp,8
  22: 0110161d 8bc8            mov     ecx,eax
  23: 0110161f ff1540401001    call    dword ptr [CliConsoleApp!_imp_??6?$basic_ostreamDU?$char_traitsDstdstdQAEAAV01P6AAAV01AAV01ZZ (01104040)]
  24: 01101625 8d542404        lea     edx,[esp+4]
  25: 01101629 c744241c00000000 mov     dword ptr [esp+1Ch],0
  26: 01101631 52              push    edx
  27: 01101632 8d4c240c        lea     ecx,[esp+0Ch]
  28: 01101636 c7442408d8451001 mov     dword ptr [esp+8],offset CliConsoleApp!`string' (011045d8)
  29: 0110163e ff15ac401001    call    dword ptr [CliConsoleApp!_imp_??0exceptionstdQAEABQBDZ (011040ac)]
  30: 01101644 6888ea1001      push    offset CliConsoleApp!_TI1?AVexceptionstd (0110ea88)
  31: 01101649 8d44240c        lea     eax,[esp+0Ch]
  32: 0110164d 50              push    eax
  33: 0110164e e8171f0000      call    CliConsoleApp!CxxThrowException (0110356a)

The FS register holds the address of the TIB, so to figure out where our exception handler is, we only need to find out where the FS register is being written into. That’s happening on line 14, and you can see that before that, the compiler emits code to push the address of a function (on line 6) and the previous exception registration record (on lines 7 and 8). That’s the setup we were looking for, so the function address pushed must be our SEH exception handler. Let’s go ahead and disassemble that.

0:000> u 01103615
CliConsoleApp!CorExeMain+0xa5:
01103615 8b542408        mov     edx,dword ptr [esp+8]
01103619 8d42f0          lea     eax,[edx-10h]
0110361c 8b4aec          mov     ecx,dword ptr [edx-14h]
0110361f 33c8            xor     ecx,eax
01103621 e81ee5ffff      call    CliConsoleApp!__security_check_cookie (01101b44)
01103626 b860eb1001      mov     eax,offset CliConsoleApp!_TI1?AVexceptionstd+0xd8 (0110eb60)
0110362b e934ffffff      jmp     CliConsoleApp!_CxxFrameHandler3 (01103564)

Control jumps to a compiler generated function (_CxxFrameHandler3), and following along the jumps takes us to MSVCR90!_CxxFrameHandler3, the CRT exception handler. That function in turn calls another CRT function, which examines the parameters passed to it. One of the parameters is the exception record, and here’s how it looks.

typedef struct _EXCEPTION_RECORD {
    DWORD ExceptionCode;
    DWORD ExceptionFlags;
    struct _EXCEPTION_RECORD *ExceptionRecord;
    PVOID ExceptionAddress;
    DWORD NumberParameters;
    DWORD ExceptionInformation[EXCEPTION_MAXIMUM_PARAMETERS];
} EXCEPTION_RECORD;

 
ExceptionCode contains the SEH exception code – there are predefined codes for access violations, C++ exceptions and CLR exceptions, among other things. ExceptionFlags is the flag I was talking about earlier; it tells the handler why it’s being called. The CRT function does different things based on what ExceptionFlags is, so let’s examine that part of the exception record.
 
0:000> dd 0012ebd4
0012ebd4  e06d7363 00000001 00000000 752f9617

e06d7363 is the exception code for C++ exceptions, and the exception flags value is 1. The CRT function first verifies whether it’s a C++ exception by looking for that code. It then looks at the flags to figure out whether it should run the stack unwinding code. The value 1 is EXCEPTION_NONCONTINUABLE; the unwinding bit (EXCEPTION_UNWINDING, 2) is not set, which is what the handler sees when the OS makes the first pass over the exception registration chain, looking for a handler. We don’t have a catch block in SomeFunc, so the function just returns without doing anything significant.

We’ll continue execution and wait for our handler to be called a second time, hopefully asking it to unwind. Sure enough, control reaches the internal CRT function again. Let’s see what the exception record contains this time.

0:000> dd 0012e5c8
0012e5c8  c0000027 00000002 00000000 68051870

Oops – it’s not a C++ exception anymore – the error code is c0000027 and not e06d7363. So even though the handler is called to ask it to unwind, the CRT function doesn’t run unwinding logic because it’s not a C++ exception.
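The two dumps can be decoded mechanically. The Python sketch below reads the first four DWORDs of a `dd` line in the field order of the EXCEPTION_RECORD structure shown earlier; the `decode` helper and the dictionary keys are made up for the example, while the two constants use the values defined in the Windows headers.

```python
CPP_EXCEPTION_CODE = 0xE06D7363  # the low three bytes spell "msc" in ASCII
EXCEPTION_UNWINDING = 0x2        # set when the handler is called to unwind


def decode(dump_line: str) -> dict:
    """Decode '<address>  <code> <flags> <nested record> <exception address>'."""
    _address, *words = dump_line.split()
    code, flags, nested_record, exception_address = (int(w, 16) for w in words[:4])
    return {
        "code": code,
        "flags": flags,
        "nested_record": nested_record,
        "exception_address": exception_address,
        "is_cpp_exception": code == CPP_EXCEPTION_CODE,
        "unwinding": bool(flags & EXCEPTION_UNWINDING),
    }


first_pass = decode("0012ebd4  e06d7363 00000001 00000000 752f9617")
second_pass = decode("0012e5c8  c0000027 00000002 00000000 68051870")

for name, record in (("first", first_pass), ("second", second_pass)):
    print(name, hex(record["code"]), record["is_cpp_exception"], record["unwinding"])
```

Side by side, the first call is a C++ exception that is not unwinding, and the second is an unwind request that no longer carries the C++ exception code.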

That explains why the destructor did not run – running the destructor code is part of the unwinding logic. OK, but who changed the exception code? Let’s look at the call stack.

0:000> kb
ChildEBP RetAddr  Args to Child
WARNING: Stack unwind information not available. Following frames may be wrong.
0012e51c 70aad82d 0012e5c8 0012ef84 0012e674 MSVCR90!_CxxExceptionFilter+0x707
0012e558 76f665f9 0012e5c8 0012ef84 0012e674 MSVCR90!_CxxFrameHandler3+0x26
0012e57c 76f665cb 0012e5c8 0012ef84 0012e674 ntdll!RtlRaiseStatus+0xb4
0012e944 68051870 0012f04c 6806c600 00000000 ntdll!RtlRaiseStatus+0x86
0012e968 680ccebd 0012f04c 6806c600 00000000 mscorwks+0x1870
0012ea84 680cd4f0 0012ebd4 0012f04c 0012ebf4 mscorwks!GetMetaDataInternalInterface+0x946a
0012eac4 680cd675 0012ebd4 0012f04c 0012eba8 mscorwks!GetMetaDataInternalInterface+0x9a9d
0012eae8 76f665f9 0012ebd4 0012f04c 0012ebf4 mscorwks!GetMetaDataInternalInterface+0x9c22
0012eb0c 76f665cb 0012ebd4 0012f04c 0012ebf4 ntdll!RtlRaiseStatus+0xb4
0012ebbc 76f66457 0012ebd4 0012ebf4 0012ebd4 ntdll!RtlRaiseStatus+0x86
0012ef28 70aadbf9 e06d7363 00000001 00000003 ntdll!KiUserExceptionDispatcher+0xf
0012ef60 01101653 0012ef78 0110ea88 f8a58fa0 MSVCR90!CxxThrowException+0x48

We see the CRT throwing the exception, but see who caught and triggered the second pass of the SEH handlers – mscorwks.dll, the core CLR engine. It apparently modified the SEH exception’s code when it was called as one of the handlers.

So this is what happened – the C++ code raised an SEH exception with the error code for C++ exceptions (e06d7363). When the OS walked the exception handler chain, it asked the C++ exception handler whether it would handle the exception, and the exception handler said no, because there is no catch block in our C++ code. The next handler in the chain is the one installed by the CLR. It says yes, it will handle the exception, and in the process, modifies the exception code from e06d7363 to c0000027. When the OS calls the handlers again to ask them to unwind, the C++ handler doesn’t unwind because it doesn’t recognize it as a C++ exception from the error code. And that’s why the destructor did not run.
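The whole sequence can be condensed into a toy simulation. The Python below is a model of the behavior described in this post, not of the real CRT or CLR code: the handler functions, the `run` helper and the `eha` switch (standing in for a library built with /EHa, which unwinds regardless of the exception code) are all invented for the illustration.

```python
CPP_EXCEPTION_CODE = 0xE06D7363
CODE_SEEN_WHILE_UNWINDING = 0xC0000027


def run(clr_in_chain: bool, eha: bool = False) -> list:
    """Simulate both passes and return the destructors that ran."""
    destructors_run = []
    record = {"code": CPP_EXCEPTION_CODE}

    def cpp_frame_handler(unwinding: bool) -> bool:
        if not unwinding:
            return False  # first pass: SomeFunc has no catch block
        # Second pass: unwind only if this still looks like a C++ exception,
        # unless the handler was built to unwind for any exception (eha).
        if eha or record["code"] == CPP_EXCEPTION_CODE:
            destructors_run.append("C::~C")
        return False

    def clr_handler(unwinding: bool) -> bool:
        if not unwinding:
            # Accepts the exception; from here on the record carries a new code.
            record["code"] = CODE_SEEN_WHILE_UNWINDING
            return True
        return False

    def native_catch_handler(unwinding: bool) -> bool:
        return not unwinding  # a plain C++ catch block further up the stack

    outer = clr_handler if clr_in_chain else native_catch_handler
    chain = [cpp_frame_handler, outer]  # innermost handler first

    # Pass 1: find the first handler that accepts the exception.
    target = next(i for i, handler in enumerate(chain) if handler(False))
    # Pass 2: ask every handler below the accepting one to unwind.
    for handler in chain[:target]:
        handler(True)
    return destructors_run


print("native caller:      ", run(clr_in_chain=False))
print("mixed mode caller:  ", run(clr_in_chain=True))
print("mixed mode with eha:", run(clr_in_chain=True, eha=True))
```

Only the mixed mode case without `eha` loses the destructor call, which matches what the debugger session showed.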

Why does the CLR exception handler modify the exception code? As this blog post says, it’s because managed exceptions also use SEH under the hood, and the CLR doesn’t want unknown exception codes to be passed to its handlers, for whatever reason. Except for SEH exceptions that it knows about, like access violations, it maps all other unmanaged exceptions to the same error code (c0000027) and treats them as general SEH exceptions.

How do we fix it? Based on what we know, simply adding a catch (...) { throw; } inside SomeFunc should fix the problem – the C++ exception handler will now say yes when asked if it can handle the exception. It of course wouldn’t mangle the exception code, so when the same handler is called for unwinding, it will run the destructor properly. The CLR exception handler will be involved only when the exception is re-thrown, but our destructor would have already run by then.

The right fix though is to compile the library with the /EHa option, which tells the compiler to emit code to handle both C++ exceptions and other SEH exceptions. That way, the handler will run the stack unwind code when called during the second pass, C++ exception or not. In fact, the compiler doesn’t allow you to compile mixed mode code with /EHs – it would complain that the /clr and /EHs flags are incompatible. Unfortunately, neither the compiler nor the linker complains when linking against code compiled with /EHs – probably because they don’t know how that code was compiled. There is some cost to compiling with /EHa though: your exception handlers would run in cases where they wouldn’t have run before, and I’d guess it also affects compiler optimizations to some extent.

We actually ran into this problem when using mixed mode code linked with omniORB, an open source native library for CORBA. It’s compiled with /EHs, and as you’d know by now, that caused some serious resource leak issues when there were exceptions involved. It took some serious debugging to narrow down the problem; missing C++ destructor calls don’t happen every day, after all.

When a C++ destructor did not run – Part 1

Consider this piece of C++ code.

#include <exception>
#include <iostream>

using namespace std;

class C
{
public:
    C()
    {
        cout << "Constructed";
    }
    ~C()
    {
        cout << "Destructed";
    }
};

void SomeFunc()
{
    C c;
    // std::exception's constructor taking a string is a Microsoft extension.
    throw std::exception("Gone");
}

If you know any C++ at all, you’ll know that by the time control leaves SomeFunc, both “Constructed” and “Destructed” will be printed to the console. That is because C++ guarantees that the destructor of an object created on the stack will always run when control leaves the scope, no matter what, and RAII depends on that fact.

You put all this code in a static library, say PureCPP.lib, and you compile it with the /EHs option, because you want to use C++ exceptions.

You then write a native application to consume this library, statically link to it, and everything works great.

One day, you wake up and realize you’ll have to try out this .NET stuff that everyone is talking about. You discover that there’s this language called C++/CLI that’s great for interfacing with native code. So you fire up VS, create a CLR console application that calls SomeFunc, and link PureCPP.lib against it.

Just when you’re wondering how easy things turned out to be, you notice something strange. There’s something missing in the console output. When you figure out what’s missing, your jaw hits the ground. Mine did too, when I realized that it was the “Destructed” part that was missing. Which means the impossible just happened – the destructor for class C did not run.

What followed was a long and exciting journey into the world of SEH (Structured Exception Handling), exception codes and exception propagation and handling by the CLR versus C++. All that in the next part – stay tuned.