Global command binding: the less attractive options

While I find the technique outlined in the previous post to be the preferable way to create globally available commands that can be used throughout a program’s user interface, as I mentioned, I did try a number of different approaches to solve the same problem. The fact is, I think none of them are completely terrible. For that matter, it would not have been the end of the world had I wound up having to redeclare command bindings in every window or other UI element where I wanted a global command to apply. In that spirit, here are some of the other techniques I tried out. They provide alternatives, so that one can make an informed choice about which technique to use; as a side effect, they also show techniques that, suitably generalized, could actually be the preferred solution for some other problem.

Completely static code-behind:

I already mentioned the brute force approach, i.e. redeclaring command bindings in each UI element where the global command is needed. So I won’t include that here. The first alternative I want to share here is a completely static affair, without any XAML at all. It looks something like this:

public static readonly RoutedUICommand StaticGlobalCommand = new RoutedUICommand(
    "Execute GlobalCommand",
    "staticGlobalCommand",
    typeof(Window),
    new InputGestureCollection
    {
        new KeyGesture(Key.F, ModifierKeys.Control)
    });

static App()
{
    CommandBinding commandBinding =
        new CommandBinding(StaticGlobalCommand,
        (sender, e) => GlobalCommand(e.Parameter), null);

    // Without this registration the binding would never be found; registering
    // it for the Window class makes it apply to every window in the program.
    CommandManager.RegisterClassCommandBinding(typeof(Window), commandBinding);
}

Then any XAML can reference the command using the markup expression {x:Static App.StaticGlobalCommand}.
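For example, a menu item could reference it directly like this (assuming an “l” XML namespace mapped to the project’s CLR namespace):

<MenuItem Command="{x:Static l:App.StaticGlobalCommand}"/>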

Making the command available via a static field or property is consistent with the philosophy of true “global” data. In practice, a WPF program isn’t going to have more than one instance of the App class. But using static members avoids the question altogether. There’s definitely just one of them.

Completely static, using XAML:

Code-behind isn’t really the best way to declare these command and binding objects. It works, but it may complicate tasks like localization. So here’s an approach that provides the same truly-static access to the command, but which allows the objects themselves to be declared in XAML.

First, we’ll need a custom ResourceDictionary subclass added to the project. Its XAML will look something like this:

<ResourceDictionary x:Class="TestGlobalCommand.AppStaticResources"
                    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
                    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml">
  <RoutedUICommand x:Key="StaticGlobalCommand" Text="Execute GlobalCommand">
    <RoutedUICommand.InputGestures>
      <KeyGesture>Ctrl+F</KeyGesture>
    </RoutedUICommand.InputGestures>
  </RoutedUICommand>
  <CommandBinding x:Key="staticGlobalCommandBinding"
                  Command="{StaticResource StaticGlobalCommand}"
                  Executed="StaticGlobalCommandBinding_Executed"/>
</ResourceDictionary>

This XAML is essentially the same as that used for the “best” approach; it’s just in its own separate XAML file instead of in App.xaml. And similarly, in the code-behind for the above custom ResourceDictionary, the StaticGlobalCommandBinding_Executed() method needs to be declared:

private void StaticGlobalCommandBinding_Executed(object sender, ExecutedRoutedEventArgs e)
{
    App.GlobalCommand(e.Parameter);
}

The method simply calls the static method in the App class that actually implements our command.
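For reference, the rest of that code-behind file is just a partial class matching the x:Class attribute, with the constructor loading the XAML. A minimal shell might look something like this, with the handler above declared inside it:

public partial class AppStaticResources : ResourceDictionary
{
    public AppStaticResources()
    {
        // Loads the XAML shown earlier and populates the dictionary.
        InitializeComponent();
    }

    // ...StaticGlobalCommandBinding_Executed() from above goes here...
}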

Finally, we need a way to load the XAML and publish the command for use by other code in the program. This can be done in the App’s static constructor:

public static readonly RoutedUICommand StaticGlobalCommand;

static App()
{
    AppStaticResources appStaticResources = new AppStaticResources();

    StaticGlobalCommand =
        (RoutedUICommand)appStaticResources["StaticGlobalCommand"];
    CommandManager.RegisterClassCommandBinding(
        typeof(Window),
        (CommandBinding)appStaticResources["staticGlobalCommandBinding"]);
}

Of course, this has the same problem we had in the initial version of the “best” approach: as one adds commands and their bindings, new lines of code need to be added for each, referencing each explicitly. We can address it the same way, with helper methods that enumerate the resources and do the right thing. We’ll use the same _RegisterCommandBindings() method from the previous post. But we also need a way to fill in the static fields:

private static void _PublishCommands(ResourceDictionary resourceDictionary)
{
    foreach (DictionaryEntry entry in resourceDictionary
        .Cast<DictionaryEntry>().Where(entry => entry.Value is RoutedUICommand))
    {
        // Copy the command to the public static App field whose name matches
        // the resource key, if such a field exists.
        FieldInfo fieldInfo = typeof(App).GetField(
            (string)entry.Key, BindingFlags.Static | BindingFlags.Public);

        if (fieldInfo != null)
        {
            fieldInfo.SetValue(null, entry.Value);
        }
    }
}

Yeah, I know. Reflection: yuck! But it only happens once, and should not impose a significant performance cost.

So now the initialization can look like this:

AppStaticResources appStaticResources = new AppStaticResources();

_PublishCommands(appStaticResources);
_RegisterCommandBindings(appStaticResources);

The reference for each RoutedUICommand object in the ResourceDictionary will be copied to the field having the same name as the key of the object in the dictionary, and each CommandBinding in the dictionary will be published via the RegisterClassCommandBinding() method.

After all’s said and done, I’m not sure it’s really so important to use static members for global data in this context. But the above techniques will allow for that constraint if desired.

Dedicated ICommand implementation:

The routed command paradigm can be very useful. Many types of commands are general-purpose and it makes sense to provide a mechanism by which a given command can be implemented by different components depending on context. “Tie your shoe” is a command that most humans in modern society can respond to. Even more specialized commands, like “make change for my twenty dollar bill”, can be handled by a large number of different actors in different situations. Being able to “route” such commands to the nearest and/or most appropriate person is clearly useful. But some commands are so specialized, there’s really only one actor who can handle them. “Sign or veto this bill passed by Congress”: that’s a command that (in the US anyway) can be executed only by the President. You always know who’s going to handle it, so there’s no point in asking anyone else if they’re able to. There’s just the one possible implementor and you can go straight to them.

This is the same for commands in WPF. And especially for the type of “global command” that I’m talking about here. The command is global. By definition, it has exactly one implementation, available to all components of the program, provided by exactly one component. To me, it seems silly to force WPF to go searching through the object tree for a command binding just to find the implementation of the command. Every time it does that search, it’s going to come up with the same answer (per our various examples here and in the previous post). So, can’t we just implement the command in a dedicated class that implements the ICommand interface and expose that?

Well, to me that seems like a perfectly reasonable approach. Yes, there’s a bit of type proliferation here (a new class for each command), but in return for that you get an object with a type name that completely describes the purpose of the type, rather than some general-purpose type that forwards implementation to some hard-to-discover code. You also get more control over the implementation details: in particular, you can implement the ICommand so that the CanExecuteChanged event is in fact raised if and only if the CanExecute() method would return a different value (the alternative for routed commands would be to call CommandManager.InvalidateRequerySuggested(), to signal to WPF that it should go reevaluate all the CanExecute() methods it knows about, a fairly heavy-handed solution).
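As a minimal sketch of that last point (the class and member names here are just illustrative, not from the actual program), such a command might track its own executable state and raise the event only when that state actually changes:

class DedicatedCommand : ICommand
{
    private bool _canExecute = true;

    public event EventHandler CanExecuteChanged;

    public bool CanExecute(object parameter)
    {
        return _canExecute;
    }

    public void Execute(object parameter)
    {
        // ...the one and only implementation of the command goes here...
    }

    // Called by whatever code knows when the command's availability changes.
    public void SetCanExecute(bool canExecute)
    {
        if (canExecute != _canExecute)
        {
            _canExecute = canExecute;

            EventHandler handler = CanExecuteChanged;

            if (handler != null)
            {
                handler(this, EventArgs.Empty);
            }
        }
    }
}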

For the most part, this approach would work great. Except for one little detail: the RoutedUICommand and MenuItem classes are closely coupled (which is ironic, given how so much of WPF’s design is meant to avoid such close coupling), and the MenuItem class will only automatically populate itself (its text and the shortcut key information) if the bound ICommand implementation is in fact a RoutedUICommand. In that case, it will use the object’s Text and InputGestures properties to fill in the item’s header text and gesture text. Without RoutedUICommand, these have to be handled explicitly by one’s own code: the MenuItem’s Header and InputGestureText properties need to be set, and the key or mouse binding for the command has to be declared elsewhere.

So, while using a custom implementation of ICommand can in fact be useful for some “global command” scenarios, it falls flat if the command will be exposed to the user in a menu. But, we can fix that.

The idea here will be to create an attached property, which can be applied to a MenuItem object, binding an ICommand implementation to that MenuItem in a way that allows us to automatically set the necessary properties from the ICommand object rather than having to set them explicitly in the XAML for the MenuItem (they may still be set in XAML, e.g. in a resource declaration where our custom ICommand implementation instance is declared).

For this to work, first we’ll need to extend ICommand so that our attached property knows what to set:

public interface IMenuCommand : ICommand
{
    string Header { get; }
}

This interface didn’t necessarily have to inherit ICommand for this technique to work. For example, it could just have been named something like “IHeadered” and had no base interface. But the attached property code is going to require both the ICommand and IMenuCommand interfaces, and declaring the latter as inheriting the former ensures compile-time validation of this requirement.

Now that we have an interface that provides the data we want, we can implement an attached property to make use of it:

static class MenuCommand
{
    public static readonly DependencyProperty CommandProperty = DependencyProperty.RegisterAttached(
        "Command", typeof(IMenuCommand), typeof(MenuCommand), new PropertyMetadata(_CommandPropertyChanged));

    private static void _CommandPropertyChanged(DependencyObject d, DependencyPropertyChangedEventArgs e)
    {
        MenuItem target = d as MenuItem;

        if (target != null)
        {
            IMenuCommand command = (IMenuCommand)e.NewValue;

            if (command != null)
            {
                target.Command = command;
                target.Header = command.Header;
            }
            else
            {
                target.Command = null;
                target.Header = null;
            }
        }
    }

    public static void SetCommand(MenuItem target, IMenuCommand command)
    {
        target.SetValue(CommandProperty, command);
    }

    public static IMenuCommand GetCommand(MenuItem target)
    {
        return (IMenuCommand)target.GetValue(CommandProperty);
    }
}

Per typical attached property implementation, there are the basic setter and getter methods, and then the property-changed event handler where the real work happens. In this example, that work is fairly simple: the code just assigns the IMenuCommand object (i.e. the value of the property) to the MenuItem.Command property. That part just replaces the basic functionality of binding to the MenuItem.Command property directly. In addition though, we take the Header value from the IMenuCommand object and assign it to the MenuItem.Header property.

Of course, there needs to be the implementation of the IMenuCommand interface:

class GlobalMenuCommand : IMenuCommand
{
    public string Header
    {
        get { return "Execute GlobalCommand"; }
    }

    public bool CanExecute(object parameter)
    {
        return true;
    }

#pragma warning disable 67
    public event EventHandler CanExecuteChanged;
#pragma warning restore 67

    public void Execute(object parameter)
    {
        GlobalCommand(parameter);
    }
}

This is of course a very basic implementation of the interface, for the purpose of the illustration. Naturally, one can implement the interface members as needed, provide constructors with appropriate parameters, etc. as one wants.

Again, for the purpose of illustration, I made the above implementation a private nested class in the App class, and exposed it via a simple readonly field:

public static readonly IMenuCommand GlobalCommandCommand = new GlobalMenuCommand();

And now that command can be bound in XAML:

<MenuItem l:MenuCommand.Command="{x:Static l:App.GlobalCommandCommand}"/>

(With the usual XML namespace declarations for the “l” namespace, of course).
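In this project, that declaration would be something along the lines of xmlns:l="clr-namespace:TestGlobalCommand" on the root element, matching the CLR namespace used in the resource dictionary’s x:Class attribute earlier.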

One nice thing about this technique is that it works for ICommand implementations in any situation. I’ve used it here to allow my App class to expose a command statically. But the same idea works for ICommand objects exposed by view models too, which would normally have the same issue of not automatically populating the menu and input members.

The astute reader will notice that I have completely glossed over the question of the input gesture for the command. And without handling that, no mouse or key input is active for the command (while all of the other approaches I’ve shared here do provide for that).

Correctly generalizing that is a little trickier, because the MenuItem object isn’t where the input gesture is actually handled. It has a string property that can be used to display the gesture being used, but that doesn’t affect the actual input. An obvious way to solve this would be to add an InputGesture property to the IMenuCommand interface; the attached property could then do two things: first, it could convert the gesture to text to be assigned to the MenuItem.InputGestureText property; second, it could walk up the logical tree to find the top-level object that encloses everything and add to that object’s InputBindings collection an appropriate InputBinding for the InputGesture. E.g. if the gesture is a KeyGesture, add a KeyBinding object that ties the gesture to the command.

I’ve left that out for a couple of reasons: including it would significantly complicate the code being shown above, distracting from the core concept I really wanted to focus on; and, it is not clear to me that even the example solution (finding the top-level object in the tree and adding a binding there) is really the best general solution in all scenarios (for example, putting the input binding closer to the MenuItem object can limit when the binding would be active…in some cases, this might actually be desirable). I would encourage anyone considering an approach like this to think about what makes the most sense in their scenario, and make sure that the generalized implementation actually suits their needs well.

Conclusion:

So there it is. Three more alternatives to the basic idea of implementing a globally-available command, i.e. one that can be accessed and used by any component anywhere in the program it makes sense. I hope it was interesting, if not useful. :)

Command binding in WPF: global commands

The design of the WPF API encourages one to keep one’s code decoupled. That is, the goal is for each piece of code to do one thing, do that one thing well, and to not carry dependencies on other pieces of code (particularly with respect to avoiding a provider being dependent on, or even aware of in any specific way, the code using that provider). This philosophy carries through to the way that WPF handles user input. While it’s certainly possible in WPF to simply wire event handlers to control events like Click, to have the associated handler perform the appropriate action for that control, there are higher level mechanisms that abstract the concepts of the source of user input, the action that user input corresponds to, and the piece of code responsible for handling that action.

The “action” is represented by the ICommand interface, and there are two main variations on that theme: the RoutedCommand and especially its subclass RoutedUICommand (both of which implement ICommand), and the directly-implemented ICommand. The former is typically used for commands that are specifically related to how the user interacts with the program itself (i.e. the UI), while the latter is typically used for interactions with the underlying data (i.e. implements the business logic in a view model). While the directly-implemented ICommand is just that — an implementation that is invoked directly through the ICommand object — the RoutedCommand represents the abstraction of a command, with the implementation of the command being bound to the command after the fact. Indeed, through the routing system, the same command can take on different concrete meaning depending on context; the command binding can vary according to the specific needs of the program, allowing for different implementations of the same command according to those needs.

Naturally, for a given command, the implementation should hold to the spirit of the intent behind the command; for example, if the command is named “open”, the implementation should involve opening something, not saving or closing it. WPF has a whole collection of pre-defined commands that a program can use, found in the ApplicationCommands class. For commonly used commands, this is a good place to look. If the action you’re implementing is found there, then you can use that instead of having to implement your own.
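For example, a window can hook up the built-in Open command with something like the following (the handler name is just for illustration); a MenuItem bound to ApplicationCommands.Open will even pick up the standard “Open” text and Ctrl+O gesture automatically:

<Window.CommandBindings>
  <CommandBinding Command="ApplicationCommands.Open"
                  Executed="OpenCommandBinding_Executed"/>
</Window.CommandBindings>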

By now, you might be asking yourself why I’m writing all this. After all, WPF has been around for a long time now and no doubt there are plenty of other tutorials out there that adequately or even skillfully explain how command binding works. Indeed, as a non-expert in the WPF API, I’m unlikely to provide better or more detailed information than the experts who have written such tutorials. My motivation here is two-fold: first, as I mentioned in an earlier post it’s my hope that as a relative newcomer to the WPF API, I’ll be able to remember what was especially hard and present the information in an easy-to-digest manner; and second, as it happens I recently ran into a particular issue involving command binding, which required some time on my part investigating various options available in WPF, to learn what was possible and (I hoped) what was the best solution.

(By the way, I will admit that one of the things I found, and continue to find, so challenging about WPF is that it’s a very broad and complex API, often with several different ways to accomplish the same basic goal. At times, it almost feels like there were multiple people all designing solutions for similar or identical problems, and all of their solutions wound up in the API. Not only is this often disorienting — it’s easy to get confused about what particular paradigm one is actually using at the moment — it makes it very difficult to know whether one is solving a problem the right way. In some cases, a detailed investigation will reveal some truths. At other times, absent expert advice one just has to take on faith that “it works” is good enough :) .)

The particular issue I was dealing with was this: I have an action in my program that allows the user to perform some configuration that affects the program globally. I have a single static method in the App class that when called, shows the UI for this configuration. I want the user to be able to do this in multiple windows, so I will be using the same command in each window. The simplest approach is simply to create a new command binding in each window, and have the code-behind call the static method:

<Application.Resources>
  <RoutedUICommand x:Key="globalConfigCommand" Text="Configure Foo...">
    <RoutedUICommand.InputGestures>
      <KeyGesture>Ctrl+F</KeyGesture>
    </RoutedUICommand.InputGestures>
  </RoutedUICommand>
</Application.Resources>

<Window.CommandBindings>
  <CommandBinding Command="{StaticResource globalConfigCommand}" Executed="GlobalConfigCommand_Executed"/>
</Window.CommandBindings>

I haven’t shown the GlobalConfigCommand_Executed() method. This is just the Executed event handler, which does nothing more than call the actual global command method.
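For the sake of completeness, it would be something like this (the name of the static configuration method is hypothetical; it stands in for whatever method actually shows the configuration UI):

private void GlobalConfigCommand_Executed(object sender, ExecutedRoutedEventArgs e)
{
    App.ShowGlobalConfiguration();
}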

Okay, so that doesn’t seem so hard. What’s so wrong with just binding the command in each window or other XAML element where it applies? Well, I see two issues, each of which is a variation on the same theme:

  1. Extra typing. Well, that’s annoying enough, but what’s really a problem here is the violation of the “DRY” principle: Don’t Repeat Yourself. Having to duplicate the command binding each time is not just time-consuming but also creates a maintenance problem if the binding or, more likely, the attached event handler(s) have to change.
  2. For commands like this, where there’s a static (or possibly singleton) implementation shared by all applicable contexts, it seems silly enough to have to wrap the call to the shared implementation in a bare-bones event handler, but to have to create such a wrapper in each context is even worse.

In my particular example, the cost of repeating oneself is not great. It’s just one command, and I might only have a handful of windows where I want it available. But it’s not hard to see that as the number of globally applicable commands goes up, the cost of repeating oneself increases as well. This is why it’s a good idea to design the code in a non-repetitive, reusable way, even if at the moment it seems okay to repeat oneself. If it was worth doing once, you’re probably going to need to do it again, and making sure you can do so in an efficient, maintainable way is a good goal to have.

Fortunately, WPF provides a mechanism to allow command bindings to be set up globally, rather than just for specific elements via the CommandBindings collection. The CommandManager class includes the RegisterClassCommandBinding() method for this purpose: using it, you can bind Executed and CanExecute event handlers for a command just as you would for an individual element, except that the binding is registered for an entire class. This means that the command routing system, when searching for a handler for a RoutedCommand, will find the handler on any instance of the registered class, without requiring that the binding be declared for each instance.

Here’s an example of how one might use that:

App.xaml:

<Application.Resources>
  <RoutedUICommand x:Key="globalCommand" Text="Execute GlobalCommand">
    <RoutedUICommand.InputGestures>
      <KeyGesture>Ctrl+F</KeyGesture>
    </RoutedUICommand.InputGestures>
  </RoutedUICommand>
  <CommandBinding x:Key="globalCommandBinding"
      Command="{StaticResource globalCommand}"
      Executed="GlobalCommandBinding_Executed"/>
</Application.Resources>

App.xaml.cs:

public App()
{
    this.Startup += (sender, e) =>
    {
        CommandBinding globalCommandBinding = (CommandBinding)FindResource("globalCommandBinding");
        CommandManager
            .RegisterClassCommandBinding(typeof(Window), globalCommandBinding);
    };
}

private void GlobalCommandBinding_Executed(object sender, ExecutedRoutedEventArgs e)
{
    GlobalCommand(e.Parameter);
}

The above includes the following:

  1. The command itself, declared in the App’s resources. It can be referenced by any XAML in the project (e.g. in a MenuItem) through the usual {StaticResource globalCommand} syntax. Note: the command includes a key gesture; this will be available globally (via the same command registration shown here), whether or not any other XAML in the project actually uses the command.
  2. A command binding, which references the command, and binds it to an event handler in the App class, a method named GlobalCommandBinding_Executed().
  3. Code in a handler for the Startup event to register the command binding for the Window class. This has to happen in the Startup event rather than the constructor itself, because the App class’s XAML (and thus the "globalCommandBinding" resource) won’t be loaded and available until then.
  4. Finally, of course, is the Executed event handler bound by the command binding, which does nothing more than call the actual global command method.

With that all set up, now the command is available in every window of the program, without having to actually add anything to the window’s XAML. If the user presses the key gesture Ctrl+F, the command will execute. A MenuItem can reference the command from any XAML (global resource, XAML element in a UserControl, in a Window, etc.), and the item will show the command text and key gesture information, as expected, and will execute the bound handler when clicked.
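For example, a menu item anywhere in the program needs nothing more than this:

<MenuItem Command="{StaticResource globalCommand}"/>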

There is still one thing about the above that’s a little annoying: following that exact pattern, one would need to write two statements in the Startup event handler for each command binding: one to retrieve the resource, and another to register the binding. We can definitely do a little better; one option would be to loop over an array of resource keys and call a helper method for each one. But it’s unlikely a CommandBinding object would be in the App class’s resources for any purpose other than a global binding. So why not just enumerate all of the CommandBinding objects and register them all? That would look like this:

public App()
{
    this.Startup += (sender, e) =>
    {
        _RegisterCommandBindings(Resources);
    };
}

private static void _RegisterCommandBindings(ResourceDictionary resources)
{
    foreach (CommandBinding commandBinding in resources.Values.OfType<CommandBinding>())
    {
        CommandManager.RegisterClassCommandBinding(typeof(Window), commandBinding);
    }
}

There. Now, to add any new global command, we need only add to the App’s XAML the command object itself, an Executed event handler in the code-behind, and of course the CommandBinding object to tie the two together. XAML for specific program elements need not be visited at all, unless one wants the command to be visible in some UI element.

As I mentioned earlier, I tried a number of different techniques. The above is what I feel works best, at least for me (but I hope for you too). But this is WPF and so there are other ways to accomplish the same effect, or at least something similar. It’s my plan to share in my next post the other techniques I tried; knowing the other options will, I think, provide some insight by comparison as to why I like the above the best, and will in any case give the reader alternatives if the above does not in fact seem best to them.

How many XAML-based APIs do we really need?

I admit, this is really just a rant. Readers may want just to move on; this post isn’t going to solve any problems. Heck, it isn’t even based on full knowledge of the historical context. But I’ll share what I know, and explain my suppositions. I just have to get this off my chest:

Microsoft should have used WPF as the basis for the converged Windows UI API.

 

When I think about this issue (which is not all the time, but probably is more often than I should), my thoughts always go back to a conversation I had over lunch in the late 90’s. Some may recall, back then there was Windows CE. And Microsoft put it on mobile phones. I was having lunch with a friend who, at that time, worked as an advisor to a Microsoft executive involved in strategic planning for the company. I was excited about the prospect of a handheld device that could be programmed by the user, and had all these ideas about how important such a device would be.

Admittedly, my ideas didn’t come close to anticipating the phones we have today. I was thinking only of immediate capabilities, and envisioned a sort of “smart day planner” device: something that would handle contact information (addresses, phone numbers, etc.), shopping lists, task management, appointments, that sort of thing. But the thing I was most excited about was the prospect that these devices would be connected, in a way that allowed for individual household members to each have one, but for scheduling of household events, activities, and tasks to be coordinated through a centralized database. A display in the house (e.g. on the fridge) could show the calendar for all household members, shopping list items could be shared, individuals could schedule activities involving multiple household members while taking into account those other members’ schedules. These were just a few of the useful things it seemed to me such a device could do.

But my friend made very clear, I was being foolish. I don’t remember his exact words, but the gist of it was: this is a phone. People don’t want their phones to do all that other stuff, they just want to be able to call someone else and talk to them. He felt that my viewpoint was that of a gadget nerd, and not at all indicative of what the general public would want or support.

So, I guess the first part of my rant is the lack of vision Microsoft had with respect to the utility and popularity of the idea of a smart phone. I don’t blame my friend per se; I’m just using him as an example of the general mentality that apparently existed in upper management at the company at the time. Microsoft had a ten-year lead over Apple on the whole concept of a smart phone, and they blew it. They never really committed to the technology they were working on, so that by the time they revisited the issue, they were in catch-up mode instead of being the leader.

And it seems to me that being in catch-up mode led Microsoft to decisions that while perhaps expedient for short-term goals, failed to take into account long-term realities.

Anyway, around that same time, Microsoft was developing .NET. With that, I was in the role of my dismissive friend: as soon as I heard about .NET’s garbage-collection system, I branded the whole idea as flawed. I based this knee-jerk reaction on my experience with Lisp back in the 80’s, the only other garbage-collecting system I’d ever used. And to be sure, it had been a terrible experience. I was programming in an interpreted environment, with a garbage collector that would in fact freeze your program for up to minutes at a time to do its thing. I’d only barely heard of Java, didn’t really understand what it was, and certainly hadn’t seen what could be done with a managed-code environment running native code and an optimized garbage collector. Suffice to say, I wound up with egg on my face. (I do try to keep more of an open mind these days :) ).

But wisely, Microsoft had committed. They invested heavily in .NET, making it more than just the marketing tool the term “.NET” started out as: they produced great tools, continually advanced the languages (C#, Visual Basic.NET, and F# all provide excellent and very different ways to access the same underlying platform), and proved me completely wrong about the viability of a managed-runtime based programming environment.

Initially, there was just Windows Forms. This was a thin veneer over the native Win32 user32.dll API, not much different from the MFC API that had been available through the 90’s. Where MFC had tables of callbacks, Winforms in .NET used “events” (not invented for .NET, but certainly embraced by it), and more fully used the OOP features available. But it was still basically the same paradigm old-school Windows programmers had been familiar with since the early days: window messages drove immediate-mode rendering and user input. It did evolve features to make database client programming easier, and even had a rudimentary form of data binding, but lots of Winforms programs were written that completely ignored these features, instead sticking to the same basic techniques used in the lower-level APIs.

Winforms worked well, but in the context of the development of Windows Vista, which was to support more advanced video display techniques, including a greater reliance on DirectX API features, it was looking a bit creaky. In addition, design patterns like “Model-View-Controller” (popularized in part by Apple’s OS X), in which user interface logic was better- and more-overtly separated from business logic were becoming better known and more closely adhered to. That’s the context in which WPF came about, and it provided a number of important improvements over the Winforms paradigm: the graphical API was much more closely tied to DirectX, taking advantage of advanced hardware features; the API itself provided a retained-mode paradigm (given the frequency with which Winforms programmers failed to understand how to correctly draw things in their programs, shifting responsibility for the actual rendering of UI objects back to the API was a major improvement); and data binding was a first-class citizen in the API…it is actually harder to write WPF code that doesn’t use data binding than to write code that does.

WPF was pretty heavy-weight though, especially compared to Winforms. Those features came at a price. (I also suspect that the implementation was not necessarily as concise and efficient as it could have been, but no matter what, WPF wasn’t going to be as thin a layer as Winforms.) So around the same time, when Microsoft decided to tackle Adobe’s Flash plug-in for web browsers, Silverlight was conceived as a slimmed-down version of WPF. Something that didn’t take multiple seconds for a process to start up, and which could run on older hardware as well as new.

This is where they got into trouble though. The first iPhone came out in 2007, the same year Silverlight was released. No doubt there was a lot of scrambling at Microsoft to figure out how to respond to this new threat. Phone hardware was not as capable as desktop hardware, to be sure. But unlike the browser environment, new OS versions were only going to go on new phone hardware. And Moore’s Law was still going strong, making it clear that phone hardware was going to be improving in leaps and bounds in the coming years. Yet, Microsoft chose to use Silverlight as the basis for their new smart phone API, presumably on the basis of its “lightweight” nature.

This choice was reiterated in subsequent versions of Microsoft’s smart phone OS, and the problem was compounded by the fact that as the Silverlight/Windows Phone API found itself in need of restoring some of the features that had been stripped from the original .NET/WPF API, they were not replaced by .NET-compatible versions, but rather newly-invented variations. Then, as convergence of the phone and desktop APIs became more of a priority, the OS group (in charge of the native desktop Windows API) chose to model their new “managed Windows Runtime” after Silverlight, rather than using .NET and WPF as their model.

This, in spite of the fact that as a desktop OS, they weren’t even dealing with the same limitations that had motivated the use of Silverlight originally. I guess their thinking was probably that they knew they wanted the Windows API to converge desktop and phone, that phone was the lowest common denominator, and that the Silverlight-based API was already on the phone. But as far as I’m concerned, they bet on the wrong horse.

So that’s where we are today. UWP is based on a technology that was specifically designed to accommodate limitations in what was the least-capable general-purpose computing hardware ten years ago. As a result, it is not directly compatible with what is still the most full-featured and useful API for the desktop version of Windows.  Rather than extend WPF to accommodate new “app store”-related features (most of which don’t really relate to the graphical API anyway and so wouldn’t conflict with WPF), we are now stuck with writing windowed desktop programs in WPF, but “app store” programs in an API just close enough to fool one into thinking that one could easily go from one to the other, but which in fact is missing a lot of what makes WPF so powerful and useful (never mind that many features that are shared, do not actually work the same way in each API).

Microsoft should instead have anticipated that the limitations they were trying to deal with in phone hardware were a) not nearly as constraining as those that existed on the phones currently in their collective pockets, and b) very likely to be removed in the near future. And in fact, by the time the first version of Windows Phone was released, a handheld phone was every bit as powerful as the PCs that WPF had originally been designed to work with in the early 2000’s.

So, maybe it’s foolish to continue investing in learning about WPF, never mind making an effort to share what I’ve learned. But I’ve come to respect and enjoy the API a great deal; I’ve certainly had my share of frustration with it too, but on the whole I like it a lot, and I find myself very much regretting Microsoft’s choice to not make the new UWP API fully compatible with it. Foolish or not, I intend to continue gaining expertise in WPF, with an eye toward applying what I’ve learned to desktop programs that take full advantage of the API, and toward keeping the bigger-picture paradigmatic aspects in mind as I also delve into UWP apps. Certainly there remains a fair amount of cross-over conceptually, and I feel I’ll benefit personally and professionally from having a better awareness of the comparisons and contrasts between the two.

Okay, rant over. I think that’s enough to let off the steam I’ve built up. Next time, I do some real programming. :)

I’m back!

And, I hope, in a good way. :)

For a variety of reasons, I got a decent start on this blog years ago, but then got distracted by other things and was forced to set it aside. Well, I’m finding a need and opportunity to return: better time management (only slightly…I’m still not very good at it); a slight reduction in distraction; but mostly, for a little over a year now I have been investing a significant chunk of my time in catching up on some of the developments in the software world that occurred while I was busy with other things.

My primary focus for the near future will be WPF. Which is ironic I realize, since this API appears to be mostly abandoned by Microsoft. They have instead put their current energies into the Winrt API, now evolved into Universal Windows Platform. Which is a shame. I’ve found that as difficult as it is to really grok WPF well enough to be efficient in it, it is quite powerful and offers a fundamental philosophy that encourages good separation of concerns, code isolation, decoupled programming, whatever you want to call it. Unfortunately, due to the evolutionary path UWP took, it is missing a number of the useful features WPF has, forcing developers to choose between a fully-functional XAML-based API in WPF, and the cross-platform benefits of UWP.

I’ve got a whole rant underlying my disappointment in Winrt, which I’ll share in the next post. For now, I’ll just leave it at that.

That said, I’ve found there’s a lot to like about all of the XAML ecosystem. And as I explore WPF, I will also be digging a little bit into the Winrt/UWP API, looking at what things transfer, what things don’t, and possible work-arounds for dealing with the latter.

So what’s next for this blog? Well, as I mentioned, I’ve got some baggage about Winrt to get off my chest first. After that, I thought I’d share some of what I’ve recently been learning about ICommand, CommandBinding, etc. I also had some fun recently playing with animations, which I thought I’d write about.

Once I get those out of the way though, I’d like to try to describe WPF (and by association, the other XAML APIs) from the perspective of a newbie. Like I said, it’s only been a little over a year since I really started learning the API, and I think I still remember enough of how bewildered I was from the outset to be able to write some reasonable introductory-type material to help others get acclimated.

(And if you’re actually reading this shortly after I posted it, thanks for hanging in there until I came back! :) I admit, I don’t expect there to be any regular readers left, after all these years, especially taking into account that I never had that many to start with. The above is as much a roadmap for myself, and a marker to orient any who might show up later and wonder about the discontinuity of the blog, as anything else.)

MSDN’s canonical technique for using Control.Invoke is lame

While I’m on the subject of using the Control.Invoke method, I’d like to mention a pet peeve of mine: the technique proposed everywhere on MSDN for dealing with that method.

In particular, according to Microsoft, the preferred technique is to write one method that does two completely different things, depending on the value returned by the Control.InvokeRequired property. If the property is true, then Invoke is called using the same method as the target, and if it’s false, then whatever code was really desired to execute is in fact executed. It looks like this:

void DoSomethingMethod()
{
    if (InvokeRequired)
    {
        Invoke((MethodInvoker)DoSomethingMethod);
    }
    else
    {
        // actually do the "something"
    }
}

Now, I’m not a big fan of methods that do more than one thing in any case. It’s my opinion that a method should do one thing and do it well. There are exceptions to every rule of course, but they are few and far between. More importantly, the above pattern does not rise to the requirements of being such an exception.

To understand why, consider what happens if you call Invoke when InvokeRequired returns false. The designers of .NET could have implemented it to throw an exception if you try to call Invoke when not necessary. But there’s no obvious reason that they should have, nor did they. In fact, the Invoke method will “do the right thing” in that case, and simply invoke the target delegate directly rather than trying to marshal it onto the thread that owns the Control instance.

Note that the Invoke method can’t just always do the marshaling. It has to know whether doing so is necessary, because if it tried to marshal the invocation when it wasn’t necessary, it would wind up stuck waiting on the marshaled invocation to happen, which it never would because it’s waiting using the same thread that’s needed for the invocation.

So Invoke simply invokes the target directly, and to decide to do this it checks the same state your own code would be checking when it looks at the InvokeRequired property.

In other words, by using the MSDN-prescribed pattern, you’re duplicating the exact same effort that already is made by the .NET Framework.

My opinion is that it’s better to not have the redundant code, and to take advantage of the fact that .NET is already doing what MSDN proposes you do: just always call Invoke and let .NET sort it out. Now, you may be thinking “but if I always call Invoke using my own method as the target, won’t that cause an infinite recursion?” Yes, it would, so don’t do that. [:)]

Instead, take advantage of C#’s anonymous methods, wrapping all of your invoked logic inside one and invoking that:

void DoSomethingMethod()
{
    Invoke((MethodInvoker)delegate
    {
        // actually do the "something"
    });
}

Note: I prefer anonymous methods for this situation, but some may prefer a lambda expression instead, and especially if what you’re invoking actually returns some value, it might even be more expressive to use that. A lambda expression is a fine alternative to an anonymous method, and winds up compiled to basically the same thing.
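As a quick illustration of the return-value case (the control name and delegate type here are just for the example), Invoke hands back whatever the delegate returns, as an object:

string text = (string)Invoke((Func<string>)(() => textBoxInput.Text));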

One final thought: don’t judge Microsoft too harshly for promoting their inefficient approach so broadly. It’s unfortunate that new examples continue to be created, and that the newer versions of the documentation haven’t been changed to replace the older examples. But prior to C#/.NET 2.0, using an anonymous method in the invocation just wasn’t an option.

I wasn’t using .NET in those early days, and even then I would not have used the “one method, two behaviors” technique. Instead, I would have broken the functionality into two different methods, one that invokes the other. But the MSDN-promoted technique is less inappropriate in that context; in fact, it was as close as they could come to the benefit that anonymous methods offer of keeping all the functionality in a single method. Since I value that feature of anonymous methods so highly, I can hardly fault them for striving for that goal even when they didn’t have anonymous methods to use. [:)]

Form-closing race condition (part 2)

In my previous post, I mentioned that it turns out that there is a way for a form to be closed without the …Closing/…Closed methods/events being called. Any code (including the synchronized flag technique I described earlier) that relies on this notification will fail under that scenario.

So, what to do? Well, as I mentioned before, one solution is simply to not depend on those methods or events. For example, in the “invoking/closing race condition” scenario where I started this whole thing, it’s reasonable to just allow an exception to occur if the worker thread loses the race, catching and ignoring it. But sometimes, that’s not an option.

In those situations, it’s useful to know how to take things into your own hands. The problem is that when you pass a form instance to Application.Run and that form is closed, the Application class interrupts the Run method’s message loop and closes all the windows owned by the thread. Without the message loop running, when the windows associated with other forms are closed, those forms never get to process their close-related methods and events.

To fix this, we simply need to do basically the same work, but change the order of operations so that when the other forms are closed, they still get to do their close-related processing. The basic technique looks like this:

while (Application.OpenForms.Count > 0)
{
    Application.OpenForms[0].Close();
}

Application.ExitThread();

The call to Application.ExitThread is necessary to cause the Application.Run method to exit.

The logic can be put in one of two places: either in the main form’s own OnFormClosed method, or in the Program class as an event handler. Either way is fine, but you might prefer the event handler in the Program class if you’d rather the main form not have to know about its relationship to the rest of the UI. This would be especially useful in scenarios where more than one form class might be used as the “main” form, but it’s a nice abstraction in any case.
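As a concrete sketch, such a handler is really just the loop shown above wrapped in a FormClosed event handler:

static void _CloseHandler(object sender, FormClosedEventArgs e)
{
    // Close the remaining forms first, so each one still gets its normal
    // ...Closing/...Closed processing, then end the message loop.
    while (Application.OpenForms.Count > 0)
    {
        Application.OpenForms[0].Close();
    }

    Application.ExitThread();
}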

Using the latter approach as our example, with that _CloseHandler in place, the Program.Main method changes. Instead of something like this (the default from the IDE):

Application.EnableVisualStyles();
Application.SetCompatibleTextRenderingDefault(false);
Application.Run(new Form1());

You’d have something like this:

Application.EnableVisualStyles();
Application.SetCompatibleTextRenderingDefault(false);

Form formShow = new Form1();

formShow.FormClosed += _CloseHandler;
formShow.Show();

Application.Run();

And that’s all there is to it. [:)]

Form-closing race condition (part 1)

Okay, I know last post I said next I would be writing about more network stuff. But I’ve been away from the blog, and in returning to it I had to revisit an issue that came up when I was doing the GUI stuff. The issue is a race condition between a thread that may need to invoke code on the GUI thread, and the GUI thread itself.

Some brief background: in Windows, GUI objects are tied to the thread on which they were created. There are a variety of ways to deal with this, but in the .NET Forms API, it’s addressed using the Control.Invoke or Control.BeginInvoke method. These methods take a delegate and ensure that it’s executed on the same thread that created the Control instance used to call the method. Invoke is synchronous, not returning until the delegate has been executed, while BeginInvoke returns right away, with the delegate being executed at some later time.

This allows some thread other than the GUI thread to execute code that is only permitted to be executed on the GUI thread. Most commonly, this would be used to allow some worker thread to update the user interface, but another use is to implement an easy form of synchronization.

If you’re reading this, you probably already knew all that. But if not, you’ll want to make sure you understand the above, or review the related concepts on MSDN.

So, what’s the race condition I’m talking about? Consider a form that starts a worker thread, where the form can be closed by the user before the worker thread is done. A natural thing for such a form to do when being closed is to cause the worker thread to interrupt its work and quit. Now consider such a situation, but in addition to all the above the worker thread notifies the form upon completion via an invoked delegate.

Invoking a delegate requires a valid window handle, but a form only has a valid window handle after it’s been displayed and before it’s been closed. If the thread is terminating because the form is being closed, that’s simple enough to deal with: have the form set a flag (checked by the worker thread before it invokes the delegate) before telling the thread to terminate, or simply check the Control.IsDisposed property before trying to invoke the delegate and make sure the thread isn’t told to terminate until the form is safely disposed.

But, what if the thread was already terminating on its own, at the same instant that the form is being closed by the user? Racers, start your engines! [:)]

When that happens, there’s a possibility that the thread that is in the process of closing the form will dispose it, but do so just after the worker thread has checked the flag (special-purpose or IsDisposed property) and just before the worker thread actually tries to call Invoke.

If you’ve dealt with race conditions before, you already know what’s coming. One way to address this is to synchronize access to the flag, so that it’s assured to not be changed by one thread while it’s being used by another. It would be nice if we could do this with the IsDisposed property, but we don’t have control over the code that modifies that, so to use this approach would require a new flag. We can set it in the OnFormClosed method and check it before invoking the delegate of interest:

bool _fClosed = false;
object _objLock = new object();

protected override void OnFormClosed(FormClosedEventArgs e)
{
    lock (_objLock)
    {
        _fClosed = true;
    }

    base.OnFormClosed(e);
}

void SomeMethod()
{
    lock (_objLock)
    {
        if (!_fClosed)
        {
            BeginInvoke((MethodInvoker) delegate { });
        }
    }
}

One thing you might notice is that the above code uses BeginInvoke instead of Invoke. With this technique, using Invoke would be a major no-no. Why? Because Invoke introduces a circular lock dependency, which could lead to a deadlock condition. That is, any code executing on the GUI thread has essentially locked the GUI thread, taking the lock that any code trying to call Invoke will want. At the same time, we’ve introduced an explicit lock (_objLock).

If the worker thread gets the explicit lock while at the same time that the GUI thread starts to execute the OnFormClosed method, then we’ll wind up with two different threads, each holding the lock that the other thread needs in order to continue.

Using BeginInvoke gets around this problem by simply queuing the delegate for execution, avoiding the need for the worker thread to ever need to get the lock that’s implicit in the behavior of the GUI thread.

But wait! There’s still a problem. It turns out that there’s a way for a form to be closed without the OnFormClosing or OnFormClosed methods ever being called. Personally, this seems like a design flaw to me. But that’s the way .NET works and we have to live with it. This means that we could conceivably wind up in the method trying to call Invoke (or BeginInvoke) with a disposed form that hasn’t ever set the special-purpose _fClosed flag.

How can this be? Well, it turns out that by default, a .NET Forms application is set up such that there’s one main form, and when that form closes, it terminates the message pump and closes all of the other windows owned by that thread. Because of the way it does that, the other forms never get a chance to execute their …Closing/…Closed methods and events.

One way to deal with this, as well as the original race condition issue, is to just give up and let .NET do it. Wrap the call to Invoke with a try/catch block and let it fail if the form has been disposed before the code trying to call Invoke gets a chance to.

This is actually a fine way to deal with the race condition, I think. The fact is, the overhead of the exception isn’t going to matter, and it’s actually about the simplest way to implement a fix.

But, there’s another technique that when used in conjunction with the explicit synchronized flag will ensure that the synchronized flag technique is reliable. It’s kind of interesting in its own right, and so in my next post I’ll describe that approach.

Basic network programming in .NET (part 3)

For this post, I’ll be showing network code that is about as simple as it can get. Frankly, when it comes to the i/o itself, it’s my opinion that the code never gets all that complicated. The complexities tend to be with respect to managing all the other stuff that’s hooked to the i/o code. But, this network code is really simple.

So, what does the most basic server look like? Well, first the server needs to make itself available for connections:

Socket sockListen = new Socket(AddressFamily.InterNetwork,
    SocketType.Stream, ProtocolType.Tcp);

sockListen.Bind(epListen);
sockListen.Listen(1);

Socket sockConnect = sockListen.Accept();

All TCP sockets start out the same, allocated as above. AddressFamily.InterNetwork refers to IPv4, while SocketType.Stream and ProtocolType.Tcp go hand in hand (all TCP sockets are also stream sockets). These parameters are passed to the System.Net.Sockets.Socket constructor to instantiate a socket (SocketType.Dgram and ProtocolType.Udp would be used for a UDP socket; other address families have other combinations of socket and protocol types that are valid).

The Socket.Bind method assigns a specific address to the socket. The epListen variable references an instance of IPEndPoint that’s been initialized to the desired server address.

It’s very important that whatever address the server uses, the client knows how to get it. Usually the server will set the IPAddress component of the address to IPAddress.Any (meaning any IP address on the local computer can be used), and the port to one appropriate to the server. Depending on the configuration of the network, the client may find the IP address using a name service (e.g. DNS) that maps a textual name to an IP address, may simply depend on the user to specify the IP address, or may have it hard-coded. In this code sample, it’s hard-coded in the client (see below) as the IPAddress.Loopback value, which simply means that the server is on the same computer as the client. Similarly, the port must either be a constant that the client simply always uses, or one that the user can configure for both the server and the client (so that they match).
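For example, the server-side end point could be initialized like this (the port number here is arbitrary; it just needs to be one that both ends agree on):

IPEndPoint epListen = new IPEndPoint(IPAddress.Any, 5150);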

The call to Socket.Listen makes the socket actually available for connection requests. Until this is called, a client trying to connect will simply get a “connection refused” error. Once Listen is called, the network driver will allow up to the specified number of clients to have pending connection requests. Clients that attempt to connect once this backlog queue has been filled will receive immediate rejections.

The queue is emptied as the server actually accepts connection requests, which it does by calling Socket.Accept. The Accept method returns a Socket instance that represents the actual connection to a client. Obviously, it can’t do that until some client tries to connect, so this method will block until that happens.

Once Accept does return, we can start receiving data from the client. In the most typical scenarios, the network i/o is a back-and-forth affair, but for this sample each end will do all of its sending and all of its receiving, each as a single operation. The server will receive everything and then send it all back, while the client will send everything and then receive the response from the server. For the server, that looks like this:

while ((cb = sockConnect.Receive(rgb)) > 0)
{
    _ReceivedBytes(rgb, cb);
}

foreach (string strMessage in _lstrMessages)
{
    sockConnect.Send(_code.GetBytes(strMessage + '\0'));
}

We simply pass a byte[] (rgb) to the Socket.Receive method, which returns to the caller once there’s any amount of data available on the socket, having copied that data into the byte[] parameter. If more data is available than the length of the byte[], as much as will fit will be copied; by calling Receive repeatedly, the caller can eventually get all of the bytes that are available. Once all of the available data has been received, as long as the sender hasn’t initiated a shutdown of the connection, the Receive method will block until data becomes available again.
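
One consequence of that behavior: code that needs an exact number of bytes (a fixed-size header, for example) has to loop until it has them. Here’s a minimal sketch of such a helper; it isn’t part of this sample, which doesn’t need it because of the string framing described below:

// Sketch only: receives exactly cb bytes into rgb, looping over partial receives
static void ReceiveExact(Socket sock, byte[] rgb, int cb)
{
    int cbReceived = 0;

    while (cbReceived < cb)
    {
        int cbThis = sock.Receive(rgb, cbReceived, cb - cbReceived, SocketFlags.None);

        if (cbThis == 0)
        {
            throw new Exception("connection closed before all of the expected bytes arrived");
        }

        cbReceived += cbThis;
    }
}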

There are a couple of things in the server’s loop above that aren’t specific to the Socket class, including a helper method called _ReceivedBytes that deals with processing the stream of bytes.

As I mentioned before, TCP is strictly a stream of bytes. There is no inherent delimiting of bytes, and one can’t count on receiving bytes grouped the same way in which they were sent. The bytes will be in the same order, but any given call to receive can return any number of bytes from 1 up to the number of bytes that have already been sent but not yet received.

In all of my examples, we will be using null-terminated strings to deal with this. .NET allows null characters in an actual String instance, so rather than scanning the bytes as they come in for bytes of value 0, we’ll go ahead and convert the bytes first, and then look for the nulls.

Speaking of converting the bytes, that’s the other thing in there. The _code variable references an instance of the Encoding class. It’s been initialized from the Encoding.UTF8 property. Generally speaking, the Encoding class is used for converting bytes to and from some specific character encoding. See the MSDN documentation for more details. The important things here are:

  • I’ve chosen UTF-8 as my character encoding for my application protocol.
  • Strings need to be converted to bytes before sending and from bytes after receiving.
  • Because UTF-8 is a character encoding that uses more than one byte to represent some characters, and because TCP may deliver the parts of a byte sequence that represents one of these characters in multiple receives, we need to maintain state between calls to Receive so that one of these broken characters can be reassembled once all the bytes have been received.

Reusing a single Encoding instance for all of the encoding and decoding operations handles the basic conversion, and it’s good practice for performance anyway. One caveat is worth knowing, though: Encoding.GetString converts each buffer independently, so if a multi-byte character does get split across two receives, it will come out as replacement characters rather than being reassembled. The object that actually maintains that state between calls is a Decoder, obtained from the Encoding via GetDecoder(); a robust implementation would keep one Decoder per connection and feed each received buffer through it. For the short messages in this sample, the simpler GetString approach is enough to demonstrate the technique.
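
Here’s a minimal sketch of that Decoder-based accumulation step (this is not the code used in the sample below; the _decoder field and the method name are hypothetical):

// Sketch only: a stateful Decoder correctly reassembles a UTF-8 character
// whose bytes arrive split across two calls to Receive
private Decoder _decoder = Encoding.UTF8.GetDecoder();

private void _ReceivedBytesWithDecoder(byte[] rgb, int cb)
{
    // GetCharCount sizes the buffer; GetChars carries any trailing partial
    // character over to the next call
    char[] rgch = new char[_decoder.GetCharCount(rgb, 0, cb)];
    int cch = _decoder.GetChars(rgb, 0, cb, rgch, 0);

    _strMessage += new string(rgch, 0, cch);

    // Extracting the null-terminated messages would then proceed exactly as
    // in the _ReceivedBytes method shown below
}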

In the sample itself, the _ReceivedBytes method handles the conversion and the message extraction. It uses our single Encoding instance to convert the bytes to a string, and then scans for nulls to break the received text apart into individual String instances:

private void _ReceivedBytes(byte[] rgb, int cb)
{
    int ichNull;

    _strMessage += _code.GetString(rgb, 0, cb);

    while ((ichNull = _strMessage.IndexOf('\0')) >= 0)
    {
        _lstrMessages.Add(_strMessage.Substring(0, ichNull));

        _strMessage = _strMessage.Substring(ichNull + 1);
    }
}

The method accumulates the text that’s received into a single string (_strMessage), and then extracts individual null-terminated strings, adding them to a list of strings (_lstrMessages). Those are then sent back to the client (the second loop in the previous code block), terminating each one with a null character and converting back to bytes so that the Socket.Send method, which deals only with bytes, can actually use the data.

Finally, once we have finished sending all the strings back to the client, we clean things up:

sockConnect.Shutdown(SocketShutdown.Both);
sockConnect.Close();

The call to Socket.Shutdown indicates to the network driver that we’re done with the connection. Different network protocols use this differently, and some not at all (e.g. UDP). At a minimum, the Socket class will disable sending, receiving, or both on the instance, based on the SocketShutdown parameter (an exception will be thrown if a disabled operation is attempted). At the network driver level, for a TCP connection the call to Shutdown signals the end of the stream in the corresponding direction. When an endpoint shuts down with SocketShutdown.Send or SocketShutdown.Both, the other endpoint sees the end of the stream: once all the bytes sent by the endpoint calling Shutdown have been received, a call to the Receive method on the other endpoint will return 0.

That’s the server. What about the other end? Almost the same! The client code is nearly identical:

Socket sockConnect = new Socket(AddressFamily.InterNetwork,
    SocketType.Stream, ProtocolType.Tcp);

sockConnect.Connect(epConnect);

foreach (string strMessage in rgstrMessages)
{
    sockConnect.Send(_code.GetBytes(strMessage + '\0'));
}

sockConnect.Shutdown(SocketShutdown.Send);

while ((cb = sockConnect.Receive(rgb)) > 0)
{
    _ReceivedBytes(rgb, cb);
}

sockConnect.Close();

The Socket instance is created the same way, but instead of binding, listening, and then accepting, the code just calls Socket.Connect. The _ReceivedBytes method is even identical in this case. It decodes the received bytes in exactly the same way, adding each delimited string to a list of strings for later processing. (Obviously a more sophisticated network application would have significantly different handling for the received data between the server and client).

You can also see that the loops are swapped. The sending loop is first for the client, the receiving loop second. The client indicates that it’s done sending by calling Shutdown with just the SocketShutdown.Send value, because it still needs to receive data from the server. Which it does, following the call to Shutdown. Of course, as with the server, the Socket is closed once we’re done.

In this sample, all of the above code is wrapped up in a couple of classes, one for the server and one for the client, called ServerBasicEcho and ClientBasicEcho, respectively. For future samples, I’ll be adding a System.Windows.Forms GUI to allow for testing of the classes. But this sample is so simple, a console application suffices nicely. Here’s the Main method:

static void Main(string[] args)
{
    try
    {
        ServerBasicEcho server = new ServerBasicEcho();
        ClientBasicEcho client = new ClientBasicEcho();
        AutoResetEvent areServerStarted = new AutoResetEvent(false);
        IPEndPoint ep = new IPEndPoint(IPAddress.Any, 5005);

        Thread threadServer = new Thread(delegate() { server.Start(ep, areServerStarted); });

        threadServer.IsBackground = true;
        threadServer.Start();

        areServerStarted.WaitOne();

        client.Start(new IPEndPoint(IPAddress.Loopback, ep.Port), _krgstrTestData);

        threadServer.Join();

        Console.WriteLine("client-server test succeeded!");
    }
    catch (Exception exc)
    {
        Console.WriteLine("client-server test failed. error: \"" + exc.Message + "\"");
    }

    Console.ReadLine();
}

Unlike the snippets above, this really is the entire Main method. The basic steps implemented here are:

  • Create the server and client instances
  • Initialize the desired server IPEndPoint address
  • Start the server on a separate thread
  • Start the client on the current thread
  • Run until the client and server both exit

There’s some additional logic in there to ensure that the client isn’t started until we’re sure the server is, as well as a little bit of output to reassure the user that everything went according to plan.

And that’s it! As you can see, in spite of the length of this post, there are really only about a half-dozen lines of code in each of the server and client that are directly related to doing the network i/o. The rest of the code is all just logic to deal with the actual bytes that were received, and to manage the server and client implementations themselves.

As we’ll see in future posts, things get progressively more involved as we add features. An interactive network connection is more complicated, as is dealing with more than one client. Oddly enough though, as we get nearer the conceptually most complicated implementation, the code actually starts to get simpler again, at least in terms of how many lines of code there are. It will be a good demonstration of how, even though multi-threaded code can be harder to reason about, in some ways it can actually simplify the design of a program.

But that’s for another day. For now, you can find the complete console application for this particular sample by clicking here.

In the next sample, I’ll continue the theme of having a single client and a single server, but add some interactivity, and show some techniques for connecting the network objects to a GUI.

DemoDotNetNetworking.cs

using System;
using System.Collections.Generic;
using System.Text;
using System.Threading;
using System.Net;
using System.Net.Sockets;

namespace ConsoleDemo
{
    class Program
    {
        static string[] _krgstrTestData =
        {
            "a test line",
            "another test line",
            "a very long line of text that will be sent to the server, which will then echo it so that the client can receive it again",
            "the next line is blank",
            "",
            "the previous line was blank",
            "one last line of text"
        };

        static void Main(string[] args)
        {
            try
            {
                ServerBasicEcho server = new ServerBasicEcho();
                ClientBasicEcho client = new ClientBasicEcho();
                AutoResetEvent areServerStarted = new AutoResetEvent(false);
                IPEndPoint ep = new IPEndPoint(IPAddress.Any, 5005);

                Thread threadServer = new Thread(delegate() { server.Start(ep, areServerStarted); });

                // Setting IsBackground to true ensures that even if something
                // goes wrong, the server thread will exit when the main thread
                // does.  In a more sophisticated program, there'd be a mechanism
                // for explicitly shutting the server down, but this suffices
                // for this sample.
                threadServer.IsBackground = true;
                threadServer.Start();

                // Wait here for the server to start.  Otherwise, it's
                // possible the client could start before the server and
                // thus not be able to connect to it.
                areServerStarted.WaitOne();

                client.Start(new IPEndPoint(IPAddress.Loopback, ep.Port), _krgstrTestData);

                // Wait here until the server actually exits.  That way,
                // we can easily tell if the server isn't behaving as
                // expected.
                threadServer.Join();

                Console.WriteLine("client-server test succeeded!");
            }
            catch (Exception exc)
            {
                Console.WriteLine("client-server test failed. error: \"" + exc.Message + "\"");
            }

            Console.ReadLine();
        }
    }

    /// <summary>
    /// A very basic echo server.  It runs in a single thread, and serves
    /// just a single connection before terminating.  It receives null-
    /// terminated strings, and when the last string has been received from
    /// the client, it sends all of the strings back to the client.
    /// </summary>
    public class ServerBasicEcho
    {
        /// <summary>
        /// The text encoding used to convert between bytes and strings
        /// for the network i/o
        /// </summary>
        private Encoding _code = Encoding.UTF8;

        /// <summary>
        /// The string representing the message received so far
        /// </summary>
        private string _strMessage = "";

        /// <summary>
        /// The list of strings representing the complete, null-terminated
        /// messages we've received so far
        /// </summary>
        private List<string> _lstrMessages = new List<string>();

        /// <summary>
        /// Starts the server.  Returns when the server has completed all
        /// processing for a single client.
        /// </summary>
        /// <param name="epListen">The endpoint address on which to host the server</param>
        /// <param name="areStarted">The event for the caller to wait on so that it knows when the server's ready to accept a connection</param>
        public void Start(EndPoint epListen, AutoResetEvent areStarted)
        {
            Socket sockListen = new Socket(AddressFamily.InterNetwork,
                SocketType.Stream, ProtocolType.Tcp);

            // Sets up the listening socket, to which a client will make a connection request
            sockListen.Bind(epListen);
            sockListen.Listen(1);

            areStarted.Set();

            // Waits for a client to make a connection request
            Socket sockConnect = sockListen.Accept();

            // Only serving a single client, so close the listening socket
            // once we have a client to serve
            sockListen.Close();

            // We need a place to put bytes we've received and the count of
            // bytes actually received
            byte[] rgb = new byte[8192];
            int cb;

            // Socket.Receive() will return 0 when the client has finished
            // sending all of its data.  Just keep processing bytes until then.
            while ((cb = sockConnect.Receive(rgb)) > 0)
            {
                _ReceivedBytes(rgb, cb);
            }

            // Now, send each of the strings back to the client.
            foreach (string strMessage in _lstrMessages)
            {
                sockConnect.Send(_code.GetBytes(strMessage + '\0'));
            }

            // Complete the "graceful closure" of the socket
            sockConnect.Shutdown(SocketShutdown.Both);
            sockConnect.Close();
        }

        /// <summary>
        /// Processes any bytes that have been received
        /// </summary>
        /// <param name="rgb">The buffer containing the bytes</param>
        /// <param name="cb">The actual count of bytes received</param>
        private void _ReceivedBytes(byte[] rgb, int cb)
        {
            int ichNull;

            // Convert the current bytes to a string and append the
            // result to string accumulator
            _strMessage += _code.GetString(rgb, 0, cb);

            // Now, extract each null-terminated string from the
            // string accumulator
            while ((ichNull = _strMessage.IndexOf('\0')) >= 0)
            {
                _lstrMessages.Add(_strMessage.Substring(0, ichNull));

                _strMessage = _strMessage.Substring(ichNull + 1);
            }
        }
    }

    /// <summary>
    /// A very basic client.  It connects to the given server, sends
    /// each string in the array of strings passed to it, and then
    /// receives the echoed list of strings, comparing the received
    /// strings with the original list to make sure they are the same.
    /// </summary>
    public class ClientBasicEcho
    {
        /// <summary>
        /// The text encoding used to convert between bytes and strings
        /// for the network i/o
        /// </summary>
        private Encoding _code = Encoding.UTF8;

        /// <summary>
        /// The string representing the message received so far
        /// </summary>
        private string _strMessage = "";

        /// <summary>
        /// The list of strings representing the complete, null-terminated
        /// messages we've received so far
        /// </summary>
        private List<string> _lstrMessages = new List<string>();

        /// <summary>
        /// Starts the client.  Returns when it has successfully sent
        /// all of the strings passed to it, and the server has replied
        /// with an identical list.
        /// </summary>
        /// <param name="epConnect">The endpoint address of the server to connect to</param>
        /// <param name="rgstrMessages">The list of strings to send to the server</param>
        public void Start(EndPoint epConnect, string[] rgstrMessages)
        {
            Socket sockConnect = new Socket(AddressFamily.InterNetwork,
                SocketType.Stream, ProtocolType.Tcp);

            sockConnect.Connect(epConnect);

            foreach (string strMessage in rgstrMessages)
            {
                sockConnect.Send(_code.GetBytes(strMessage + '\0'));
            }

            // Calling Socket.Shutdown() with SocketShutdown.Send causes
            // the network driver to signal to the remote endpoint that
            // we are done _sending_ data.  We are still free to receive
            // any data the remote endpoint might send.
            sockConnect.Shutdown(SocketShutdown.Send);

            byte[] rgb = new byte[8192];
            int cb;

            // Socket.Receive() will return 0 when the server has finished
            // sending all of its data.  Just keep processing bytes until then.
            while ((cb = sockConnect.Receive(rgb)) > 0)
            {
                _ReceivedBytes(rgb, cb);
            }

            // We're done sending, so once the server is also done sending
            // we're done with the socket and can close it.
            sockConnect.Close();

            // Check the echoed strings against our originals
            if (rgstrMessages.Length != _lstrMessages.Count)
            {
                throw new Exception("different number of strings");
            }

            for (int istr = 0; istr < rgstrMessages.Length; istr++)
            {
                if (rgstrMessages[istr] != _lstrMessages[istr])
                {
                    throw new Exception(
                        string.Format("string pair not equal (\"{0}\" and \"{1}\")",
                        rgstrMessages[istr], _lstrMessages[istr]));
                }
            }
        }

        /// <summary>
        /// Processes any bytes that have been received
        /// </summary>
        /// <param name="rgb">The buffer containing the bytes</param>
        /// <param name="cb">The actual count of bytes received</param>
        private void _ReceivedBytes(byte[] rgb, int cb)
        {
            int ichNull;

            // Convert the current bytes to a string and append the
            // result to string accumulator
            _strMessage += _code.GetString(rgb, 0, cb);

            // Now, extract each null-terminated string from the
            // string accumulator
            while ((ichNull = _strMessage.IndexOf('\0')) >= 0)
            {
                _lstrMessages.Add(_strMessage.Substring(0, ichNull));

                _strMessage = _strMessage.Substring(ichNull + 1);
            }
        }
    }
}

Basic network programming in .NET (part 2)

Since you’re reading this, I’ll assume the previous post didn’t scare you off. :)  I know even that abridged version of the details may seem daunting, but really, with some practice writing network code, it’s all stuff that will come naturally. With that in mind, let’s get to know what the basic structure of a network application looks like, and then practice!

In every network connection, there is an endpoint that is waiting (“listening”) for someone to contact it, and an endpoint that does the contacting (“initiates the connection”). Once contact has been established, the roles of “server” and “client” can be less well-defined, but generally speaking the endpoint that’s waiting is considered the “server” and the endpoint that does the contacting is considered the “client”.

With that in mind, here’s the usual sequence of events:

  • Server creates a socket for the purpose of waiting for contacts

    (time passes)
     

  • Client creates a socket for the purpose of contacting a server
  • Client contacts the server
  • Server responds
  • Client and server exchange information by sending and receiving bytes

There are some subtle differences between UDP and TCP in the above. Using UDP, the “contact” is simply the transmission of a datagram to the server. Data exchange is done in exactly the same way as the initial contact, and there may be no well-defined termination of communications. In fact, each endpoint can use a single socket to communicate with an arbitrarily large number of remote endpoints.
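
As a small taste of what that looks like in code (just a sketch, since real UDP examples won’t come until the end of this series; the port number and addresses are placeholders):

// Sketch only: with UDP, "contact" is just a datagram arriving
Socket sockServer = new Socket(AddressFamily.InterNetwork,
    SocketType.Dgram, ProtocolType.Udp);

// The waiting endpoint just binds; there's no Listen or Accept
sockServer.Bind(new IPEndPoint(IPAddress.Any, 5005));

byte[] rgb = new byte[8192];
EndPoint epSender = new IPEndPoint(IPAddress.Any, 0);

// ReceiveFrom fills in epSender with whichever endpoint sent the datagram
int cb = sockServer.ReceiveFrom(rgb, ref epSender);

// The contacting endpoint simply sends a datagram to the server's address
Socket sockClient = new Socket(AddressFamily.InterNetwork,
    SocketType.Dgram, ProtocolType.Udp);

sockClient.SendTo(Encoding.UTF8.GetBytes("hello"), new IPEndPoint(IPAddress.Loopback, 5005));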

With TCP, the initial contact is more explicit, in the form of a “connect” operation. The server’s socket is a special type of socket that is never actually used for exchanging data. Instead, it acts as a kind of “connection factory”, creating a new socket for each connection. For each client that connects to the server, the listening socket generates a socket that’s actually used for exchanging data.

At each end, the socket associated with the actual connection can be used only for communicating with the remote endpoint of that connection. And TCP provides for a well-defined termination of communications, known as “graceful closure” or “graceful shutdown”.

Of course, as mentioned previously, UDP is connectionless, it isn’t reliable, and it always delivers data in the same unit (a datagram) as it was sent. But the basic techniques for moving data and managing the i/o are otherwise very similar to those used in TCP. For the sake of simplicity, I will stick mainly to TCP for code examples. Once I’ve gotten through the main samples, I’ll wrap up with some that illustrate techniques specific to UDP: broadcast and multicast.

For the next post, I’ll describe the most basic code that can implement all of the above.
