Hydrating Objects With Expression Trees – Part II

In my previous post I showed how to hydrate objects by creating instances and setting properties in those instances.

But, if the intent is to hydrate the objects from data, why not have an expression that does just that? That’s what the member initialization expression is for.

To create such an expression we need the constructor expression and the property binding expressions:

var properties = objectType.GetProperties();
var bindings = new MemberBinding[properties.Length];
var valuesArrayExpression = Expression.Parameter(typeof(object[]), "v");

for (int p = 0; p < properties.Length; p++)
{
    var property = properties[p];

    bindings[p] = Expression.Bind(
        property,
        Expression.Convert(
            Expression.ArrayAccess(
                valuesArrayExpression,
                Expression.Constant(p, typeof(int))
            ),
            property.PropertyType
        )
    );
}

var memberInitExpression = Expression.MemberInit(
    Expression.New(objectType),
    bindings
);

var objectHydrationExpression = Expression.Lambda<Func<object[], object>>(memberInitExpression, valuesArrayExpression);

var compiledObjectHydrationExpression = objectHydrationExpression.Compile();

This might seem more complex than the previous solution, but using it is a lot simpler:

for (int o = 0; o < objects.Length; o++)
{
    newObjects[o] = compiledObjectHydrationExpression(objects[o]);
}
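
For context, here’s a hedged sketch of what the loop above assumes about its inputs (the sample values are mine, for illustration): objects is an array of object[] rows, each holding the property values in the same order as objectType.GetProperties(), and newObjects is an object[] of the same length.

// Hypothetical input shape for the hydration loop above (sample values only).
object[][] objects =
{
    new object[] { 1, 2L, "three", new object(), DateTimeOffset.Now },
    new object[] { 4, 5L, "six",   new object(), DateTimeOffset.Now }
};
var newObjects = new object[objects.Length];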

Hydrating Objects With Expression Trees – Part I

After my post about dumping objects using expression trees, I’ve been asked if the same could be done for hydrating objects.

Sure it can, but it might not be that easy.

What we are looking for is a way to set properties on objects of an unknown type. For that, we need to generate methods to set each property of the objects.

Such methods would look like this expression:

Expression<Action<object, object>> expression = (o, v) => ((SomeType)o).Property1 = (PropertyType)v;

Unfortunately, we cannot use the .NET Reflector trick because, if you try to compile this, you’ll get this error:

error CS0832: An expression tree may not contain an assignment operator

Fortunately, that corresponds to a valid .NET expression tree. We just have to build it by hand.

So, for a given type, the set of property setters would be built this way:

var compiledExpressions = (from property in objectType.GetProperties()
                           let objectParameter = Expression.Parameter(typeof(object), "o")
                           let convertedObjectParameter = Expression.ConvertChecked(objectParameter, objectType)
                           let valueParameter = Expression.Parameter(typeof(object), "v")
                           let convertedValueParameter = Expression.ConvertChecked(valueParameter, property.PropertyType)
                           let propertyExpression = Expression.Property(convertedObjectParameter, property)
                           select
                                Expression.Lambda<Action<object, object>>(
                                    Expression.Assign(
                                        propertyExpression,
                                        convertedValueParameter
                                    ),
                                    objectParameter,
                                    valueParameter
                                ).Compile()).ToArray();

And hydrating objects would be like this:

for (int o = 0; o < objects.Length; o++)
{
    var objectProperties = objects[o];

    var newObject = newObjects[o] = Activator.CreateInstance(objectType);

    for (int p = 0; p < compiledExpressions.Length; p++)
    {
        compiledExpressions[p](newObject, objectProperties[p]);
    }
}

Mastering Expression Trees With .NET Reflector

Following my last post, I received lots of enquiries about how I got to master the creation of expression trees.

The answer is: .NET Reflector

In that post I needed to generate an expression tree for this expression:

Expression<Func<object, object>> expression = o => ((object)((SomeType)o).Property1);

I just compiled that code in Visual Studio 2010, loaded the assembly in .NET Reflector, and disassembled it to C# without optimizations (View –> Options –> Disassembler –> Optimization: None).

The disassembled code looked like this:

Expression<Func<object, object>> expression;
ParameterExpression CS$0$0000;
ParameterExpression[] CS$0$0001;
expression = Expression.Lambda<Func<object, object>>(Expression.Convert(Expression.Property(Expression.Convert(CS$0$0000 = Expression.Parameter(typeof(object), "o"), typeof(SomeType)), (MethodInfo) methodof(SomeType.get_Property1)), typeof(object)), new ParameterExpression[] { CS$0$0000 });

After giving valid C# names to the variables and tidying up the code a bit, I came up with this:

ParameterExpression parameter = Expression.Parameter(typeof(object), "o");
Expression<Func<object, object>> expression =
    Expression.Lambda<Func<object, object>>(
        Expression.Convert(
            Expression.Property(
                Expression.Convert(
                    parameter,
                    typeof(SomeType)
                ),
                "Property1"
            ),
            typeof(object)
        ),
        parameter
    );

Easy! Isn’t it?

Dumping Objects Using Expression Trees

No. I’m not proposing to get rid of objects.

A colleague of mine asked me if I knew a way to dump a list of objects of unknown type into a DataTable with better performance than the approach he was using.

The objects being dumped usually have over a dozen properties, but, for the sake of this post, let’s assume they look like this:

class SomeClass
{
    public int Property1 { get; set; }
    public long Property2 { get; set; }
    public string Property3 { get; set; }
    public object Property4 { get; set; }
    public DateTimeOffset Property5 { get; set; }
}

The code he was using was something like this:

var properties = objectType.GetProperties();

foreach (object obj in objects)
{
    foreach (var property in properties)
    {
        property.GetValue(obj, null);
    }
}

For a list of one million objects, this takes a little over 6000 milliseconds on my machine.

I immediately thought: Expression Trees!

If the type of the objects was known at compile time, it would be something like this:

Expression<Func<SomeClass, int>> expression = o => o.Property1;
var compiled = expression.Compile();
var propertyValue = compiled.Invoke(obj);

But, at compile time, the type of the object and, consequently, the type of its properties, is unknown. So, we’ll need, for each property, an expression tree like this:

Expression<Func<object, object>> expression = o => (object)((SomeClass)o).Property1;

The previous expression converts the parameter of type object to the actual type of the object and reads the property from the result of that conversion. The property value must also be converted to object because the return type of the expression is object.

For the same type of objects, the collection of property accessors would be built this way:

var compiledExpressions = (from property in properties
                           let objectParameter = Expression.Parameter(typeof(object), "o")
                           select
                             Expression.Lambda<Func<object, object>>(
                                 Expression.Convert(
                                     Expression.Property(
                                         Expression.Convert(
                                             objectParameter,
                                             objectType
                                         ),
                                         property
                                     ),
                                     typeof(object)
                                 ),
                                 objectParameter
                             ).Compile()).ToArray();

Looks a bit overcomplicated, but reading all properties of all objects of the same object set with this code:

foreach (object obj in objects)
{
    foreach (var compiledExpression in compiledExpressions)
    {
        compiledExpression(obj);
    }
}

takes a little over 150 milliseconds on my machine.

That’s right. 2.5% of the previous time.

C#: More On Array Variance

In a previous post, I went through how arrays are covariant in relation to the type of their elements, but not safely covariant.

In the following example, the second element assignment fails at run time because, although the declared type of the objectArray variable is array of object, the real type of the array is array of string, and an object cannot be stored in an array of string.

object[] objectArray = new string[] { "string 1", "string 2" };
objectArray[0] = "string 3";
objectArray[1] = new object();

On the other hand, because arrays are not contravariant in relation to the type of their elements, in the following code the second line will fail at run time: string[] is a subtype of object[], but object[] is not a subtype of string[], so an array whose real type is object[] cannot be cast to string[].

object[] objectArray = new object[] { "string 1", "string 2" };
string[] stringArray = (string[])objectArray;

The fact that all elements in the object array are strings doesn’t make it convertible to an array of string. To convert this object array of strings into a string array, you’ll need to create a new string array and copy each converted element.

The conversion can be as easy as this:

string[] stringArray = objectArray.Cast<string>().ToArray();

The above code is just shorthand for traversing the whole array, converting each element to string, and creating a string array with all the elements.
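
To see roughly what that shorthand stands for, here’s a hand-written equivalent (a sketch of the semantics, not the actual Enumerable.Cast implementation):

string[] stringArray = new string[objectArray.Length];

for (int i = 0; i < objectArray.Length; i++)
{
    // Throws InvalidCastException if an element is not a string.
    stringArray[i] = (string)objectArray[i];
}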

Arrays are a good storage structure because they are the only structure provided by the runtime to store groups of items of the same type. However, because of limitations like the above, their use in APIs should be very carefully considered.

If all you need is to traverse all elements of some collection, you should use an IEnumerable<T> (IEnumerable<out T> in .NET 4.0). This way, the cost of using Enumerable.Cast<T>() is minimal.
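
For example, if the consumer only needs to enumerate the items, something like this (a minimal sketch; requires using System.Linq) avoids allocating and copying a whole new array, because the elements are cast lazily as they are enumerated:

IEnumerable<string> strings = objectArray.Cast<string>();

foreach (string s in strings)
{
    Console.WriteLine(s);
}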

TechDays 2010: What’s New On C# 4.0

I would like to thank those that attended my session at TechDays 2010, and I hope I was able to get the message across about what’s new in C#.

For those that didn’t attend (or did and want to review it), the presentation can be downloaded from here.

Code samples can be downloaded from here.

Here’s a list of resources mentioned in the session:

C# 4.0: COM Interop Improvements

Dynamic resolution as well as named and optional arguments greatly improve the experience of interoperating with COM APIs such as Office Automation Primary Interop Assemblies (PIAs). But, to ease COM Interop development even further, a few COM-specific features were also added to C# 4.0.

Omitting ref

Because of a different programming model, many COM APIs contain a lot of reference parameters. These parameters are typically not meant to mutate a passed-in argument, but are simply another way of passing value parameters.

Specifically for COM methods, the compiler allows the method call to be written with the arguments passed by value; it automatically generates the necessary temporary variables to hold the values, passes them by reference, and discards their values after the call returns. From the point of view of the programmer, the arguments are being passed by value.

This method call:

object fileName = "Test.docx";
object missing = Missing.Value;

document.SaveAs(ref fileName,
    ref missing, ref missing, ref missing,
    ref missing, ref missing, ref missing,
    ref missing, ref missing, ref missing,
    ref missing, ref missing, ref missing,
    ref missing, ref missing, ref missing);

can now be written like this:

document.SaveAs("Test.docx",
    Missing.Value, Missing.Value, Missing.Value,
    Missing.Value, Missing.Value, Missing.Value,
    Missing.Value, Missing.Value, Missing.Value,
    Missing.Value, Missing.Value, Missing.Value,
    Missing.Value, Missing.Value, Missing.Value);

And because all the parameters receiving Missing.Value have that value as their default value, the method call can even be reduced to this:

document.SaveAs("Test.docx");

Dynamic Import

Many COM methods accept and return variant types, which are represented in the PIAs as object. In the vast majority of cases, a programmer calling these methods already knows the static type of a returned object from the context of the call, but has to explicitly perform a cast on the returned values to make use of that knowledge. These casts are so common that they constitute a major nuisance.

To make the developer’s life easier, it is now possible to import the COM APIs in such a way that variants are instead represented using the type dynamic, which means that COM signatures now have occurrences of dynamic instead of object.

This means that members of a returned object can now be easily accessed or assigned into a strongly typed variable without having to cast.

Instead of this code:

((Excel.Range)(excel.Cells[1, 1])).Value2 = "Hello World!";

this code can now be used:

excel.Cells[1, 1] = "Hello World!";

And instead of this:

Excel.Range range = (Excel.Range)(excel.Cells[1, 1]);

this can be used:

Excel.Range range = excel.Cells[1, 1];

Indexed And Default Properties

A few COM interface features are still not available in C#. At the top of the list are indexed properties and default properties. As mentioned above, these will be possible if the COM interface is accessed dynamically, but will not be recognized by statically typed C# code.

No PIAs – Type Equivalence And Type Embedding

For assemblies identified with PrimaryInteropAssemblyAttribute, the compiler will create equivalent types (interfaces, structs, enumerations and delegates) and embed them in the generated assembly.

To reduce the final size of the generated assembly, only the used types and their used members will be generated and embedded.

Although this makes development and deployment of applications using the COM components easier because there’s no need to deploy the PIAs, COM component developers are still required to build the PIAs.

C# 4.0: Dynamic Programming

The major feature of C# 4.0 is dynamic programming. Not just dynamic typing, but dynamic in a broader sense, which means talking to anything that is not statically typed to be a .NET object.

Dynamic Language Runtime

The Dynamic Language Runtime (DLR) is a piece of technology that unifies dynamic programming on the .NET platform, the same way the Common Language Runtime (CLR) has been a common platform for statically typed languages.

The CLR always had dynamic capabilities. You could always use reflection, but its main goal was never to be a dynamic programming environment and there were some features missing. The DLR is built on top of the CLR and adds those missing features to the .NET platform.

The Dynamic Language Runtime is the core infrastructure that consists of:

  • Expression Trees

    The same expression trees used in LINQ, now improved to support statements.

  • Dynamic Dispatch

    Dispatches invocations to the appropriate binder.

  • Call Site Caching

    For improved efficiency.

Dynamic languages and languages with dynamic capabilities are built on top of the DLR. IronPython and IronRuby were already built on top of the DLR, and now, the support for using the DLR is being added to C# and Visual Basic. Other languages built on top of the CLR are expected to also use the DLR in the future.

Underneath the DLR there are binders that talk to a variety of different technologies:

  • .NET Binder

    Allows talking to .NET objects.

  • JavaScript Binder

    Allows talking to JavaScript in Silverlight.

  • IronPython Binder

    Allows talking to IronPython.

  • IronRuby Binder

    Allows talking to IronRuby.

  • COM Binder

    Allows talking to COM.

With all these binders, it is possible to have a single programming experience for talking to all these environments that are not statically typed .NET objects.

The dynamic Static Type

Let’s take this traditional statically typed code:

Calculator calculator = GetCalculator();
int sum = calculator.Add(10, 20);

Because the variable that receives the return value of the GetCalculator method is statically typed as Calculator and because the Calculator type has an Add method that receives two integers and returns an integer, it is possible to call that Add method and assign its return value to a variable statically typed as integer.

Now let’s suppose the calculator is not a statically typed .NET class but, instead, a COM object or some .NET object we don’t know the type of. All of a sudden it gets very painful to call the Add method:

object calculator = GetCalculator();
Type calculatorType = calculator.GetType();
object res = calculatorType.InvokeMember("Add", BindingFlags.InvokeMethod, null, calculator, new object[] { 10, 20 });
int sum = Convert.ToInt32(res);

And what if the calculator was a JavaScript object?

ScriptObject calculator = GetCalculator();
object res = calculator.Invoke("Add", 10, 20);
int sum = Convert.ToInt32(res);

For each dynamic domain we have a different programming experience and that makes it very hard to unify the code.

With C# 4.0 it becomes possible to write code this way:

dynamic calculator = GetCalculator();
int sum = calculator.Add(10, 20);

You simply declare a variable whose static type is dynamic. dynamic is a pseudo-keyword (like var) that indicates to the compiler that operations on the calculator object will be done dynamically.

The way you should look at dynamic is that it’s just like object (System.Object) with dynamic semantics associated. Anything can be assigned to a dynamic.

dynamic x = 1;
dynamic y = "Hello";
dynamic z = new List<int> { 1, 2, 3 };

At run time, every object will have a type. In the above example, x is of type System.Int32.
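
A quick way to see that (my own snippet, not from the original post):

dynamic x = 1;
Console.WriteLine(x.GetType()); // prints System.Int32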

When one or more operands in an operation are typed dynamic, member selection is deferred to run time instead of compile time. Then the run-time types are substituted for the dynamic variables and normal overload resolution is done, just like it would happen at compile time.

The result of any dynamic operation is always dynamic and, when a dynamic object is assigned to something else, a dynamic conversion will occur.
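
Here’s a small sketch of both rules (my own example):

dynamic d = 10;

var r = d + 1; // r is statically typed as dynamic; the addition is resolved at run time
int i = d;     // a dynamic conversion to int happens at run time and succeeds
string s = d;  // compiles, but fails at run time: int cannot be converted to string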

Code                        Resolution      Method
double x = 1.75;
double y = Math.Abs(x);     compile-time    double Abs(double x)

dynamic x = 1.75;
dynamic y = Math.Abs(x);    run-time        double Abs(double x)

dynamic x = 2;
dynamic y = Math.Abs(x);    run-time        int Abs(int x)

The above code will always be strongly typed. The difference is that, in the first case, the method resolution is done at compile time and, in the others, it’s done at run time.

IDynamicMetaObjectProvider

The DLR is pre-wired to know .NET objects, COM objects and so forth, but any dynamic language can implement its own objects, and you can implement your own objects in C# by implementing the IDynamicMetaObjectProvider interface. When an object implements IDynamicMetaObjectProvider, it can participate in the resolution of how method calls and property accesses are performed.

The .NET Framework already provides two implementations of IDynamicMetaObjectProvider:

  • DynamicObject : IDynamicMetaObjectProvider

    The DynamicObject class enables you to define which operations can be performed on dynamic objects and how to perform those operations. For example, you can define what happens when you try to get or set an object property, call a method, or perform standard mathematical operations such as addition and multiplication.

  • ExpandoObject : IDynamicMetaObjectProvider

    The ExpandoObject class enables you to add and delete members of its instances at run time and also to set and get values of these members. This class supports dynamic binding, which enables you to use standard syntax like sampleObject.sampleMember, instead of more complex syntax like sampleObject.GetAttribute("sampleMember").
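
As a hedged illustration of the DynamicObject approach described in the first bullet above (the DynamicBag type and its members are made up for this example), a minimal dynamic property bag could look like this:

using System;
using System.Collections.Generic;
using System.Dynamic;

class DynamicBag : DynamicObject
{
    private readonly Dictionary<string, object> values = new Dictionary<string, object>();

    // Called when code reads a member on a dynamic instance, e.g. bag.Title.
    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        return this.values.TryGetValue(binder.Name, out result);
    }

    // Called when code assigns a member on a dynamic instance, e.g. bag.Title = "...".
    public override bool TrySetMember(SetMemberBinder binder, object value)
    {
        this.values[binder.Name] = value;
        return true;
    }
}

Using it is just like using ExpandoObject:

dynamic bag = new DynamicBag();
bag.Title = "C# 4.0";         // routed through TrySetMember
Console.WriteLine(bag.Title); // routed through TryGetMember

dynamic expando = new ExpandoObject();
expando.Title = "C# 4.0";     // ExpandoObject adds the member at run time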

C# 4.0: Alternative To Optional Arguments

Like I mentioned in my last post, publicly exposing methods with optional arguments is a bad practice (that’s why C# resisted having them, until now).

You might argue that your method or constructor has too many variants and that having ten or more overloads is a maintenance nightmare, and you’re right. But the solution has been there for ages: have an arguments class.

The arguments class pattern has been used in the .NET Framework by several classes, like XmlReader and XmlWriter, which use it in their Create methods since version 2.0:

XmlReaderSettings settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.Auto;
XmlReader.Create("file.xml", settings);

With this pattern, you don’t have to maintain a long list of overloads, and default values for properties of XmlReaderSettings (or XmlWriterSettings for XmlWriter.Create) can be changed, or new properties added, in future implementations without breaking existing compiled code.
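
The same pattern can be applied to your own APIs. Here’s a rough sketch (the TextFile class, its Open method, and the settings properties are made-up names, not a .NET API):

using System.IO;
using System.Text;

public class OpenTextFileSettings
{
    public OpenTextFileSettings()
    {
        // Defaults live in one place; changing them later doesn't break already compiled callers.
        this.DetectEncoding = true;
        this.BufferSize = 1024;
    }

    public Encoding Encoding { get; set; }
    public bool DetectEncoding { get; set; }
    public int BufferSize { get; set; }
}

public static class TextFile
{
    public static StreamReader Open(string path, OpenTextFileSettings settings)
    {
        settings = settings ?? new OpenTextFileSettings();

        return new StreamReader(
            path,
            settings.Encoding ?? Encoding.UTF8,
            settings.DetectEncoding,
            settings.BufferSize);
    }
}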

You might now argue that it’s too much code to write, but, with object initializers added in C# 3.0, the same code can be written like this:

XmlReader.Create("file.xml", new XmlReaderSettings { ValidationType = ValidationType.Auto });

Looks almost like named and optional arguments, doesn’t it? And, who knows, in a future version of C#, it might even look like this:

XmlReader.Create("file.xml", new { ValidationType = ValidationType.Auto });

C# 4.0: Named And Optional Arguments

As part of the co-evolution effort of C# and Visual Basic, C# 4.0 introduces Named and Optional Arguments.

First of all, let’s clarify what arguments and parameters are:

  • Method definition parameters are the input variables of the method.
  • Method call arguments are the values provided to the method parameters.

In fact, the C# Language Specification states the following in §7.5:

The argument list (§7.5.1) of a function member invocation provides actual values or variable references for the parameters of the function member.

Given the above definitions, we can state that:

  • Parameters have always been named and still are.
  • Parameters have never been optional and still aren’t.

Named Arguments

Until now, the C# compiler matched method call arguments with method parameters by position. The first argument provides the value for the first parameter, the second argument provides the value for the second parameter, and so on, regardless of the names of the parameters. If a parameter was missing a corresponding argument to provide its value, the compiler would emit a compilation error.

For this call:

Greeting("Mr.", "Morgado", 42);

this method:

public void Greeting(string title, string name, int age)

will receive as parameters:

  • title: “Mr.”
  • name: “Morgado”
  • age: 42

What this new feature allows is using the names of the parameters to identify the corresponding arguments, in the form name: value.

Not all arguments in the argument list must be named. However, all named arguments must be at the end of the argument list. The matching between arguments (and the evaluation of their values) and parameters is done first by name for the named arguments and then by position for the unnamed arguments.

This means that, for this method definition:

public void Method(int first, int second, int third)

this call declaration:

int i = 0;
Method(i, third: i++, second: ++i);

will have this code generated by the compiler:

int i = 0;
int CS$0$0000 = i++;
int CS$0$0001 = ++i;
Method(i, CS$0$0001, CS$0$0000);

which will give the method the following parameter values:

  • first: 2
  • second: 2
  • third: 0

Notice the variable names. Although they are invalid C# identifiers, they are valid .NET identifiers, thus avoiding collisions between user-written and compiler-generated code.

Besides allowing the argument list to be reordered, this feature is very useful for self-documenting code, for example, when the argument list is very long or when it’s not clear from the call site what the arguments are.
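
For example, with a hypothetical CopyFile(string source, string destination, bool overwrite) method:

// Without names, the reader has to know the signature to tell what true means:
CopyFile(@"C:\in\report.docx", @"D:\backup\report.docx", true);

// With named arguments, the call site documents itself:
CopyFile(source: @"C:\in\report.docx", destination: @"D:\backup\report.docx", overwrite: true);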

Optional Arguments

Parameters can now have default values:

public void Method(int first, int second = 2, int third = 3)

Parameters with default values must be the last in the parameter list, and the default value is used as the value of the parameter if the corresponding argument is missing from the method call.

This call declaration:

int i = 0;
Method(i, third: ++i);

will have this code generated by the compiler:

int i = 0;
int CS$0$0000 = ++i;
Method(i, 2, CS$0$0000);

which will give the method the following parameter values:

  • first: 1
  • second: 2
  • third: 1

Because, when method parameters have default values, arguments can be omitted from the call declaration, this might seem like method overloading or a good replacement for it, but it isn’t.

Although methods like this:

public StreamReader OpenTextFile(
    string path,
    Encoding encoding = null,
    bool detectEncoding = true,
    int bufferSize = 1024)

allow its calls to be written like this:

OpenTextFile("foo.txt", Encoding.UTF8);
OpenTextFile("foo.txt", Encoding.UTF8, bufferSize: 4096);
OpenTextFile(
    bufferSize: 4096,
    path: "foo.txt",
    detectEncoding: false);

The compiler handles default values like constant fields: it takes the value and uses it instead of a reference to the value. So, like with constant fields, default parameter values are exposed publicly (and remember that internal members might be publicly accessible – InternalsVisibleToAttribute). If such methods are publicly accessible and used by another assembly, those values will be hard-coded in the calling code and, if the called assembly has its default values changed, they won’t be picked up by already compiled code.
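
To make that concrete, here’s a hedged illustration using the OpenTextFile method above:

// Written in another assembly that references the one defining OpenTextFile:
OpenTextFile("foo.txt");

// What actually gets compiled into that assembly (the defaults are baked in):
OpenTextFile("foo.txt", null, true, 1024);

// If a later version of the defining assembly changes bufferSize's default to 4096,
// this already compiled caller keeps passing 1024 until it is recompiled.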

At first glance, I thought that using optional arguments to cope with badly written code was great, but that the ability to write code like that was just pure evil. But then I realized that, since I use private constant fields, it’s also OK to use default parameter values on privately accessed methods.