Hydrating Objects With Expression Trees – Part I

After my post about dumping objects using expression trees, I’ve been asked if the same could be done for hydrating objects.

Sure it can, but it might not be that easy.

What we are looking for is a way to set properties on objects of an unknown type. For that, we need to generate methods to set each property of the objects.

Such methods would look like this expression:

Expression<Action<object, object>> expression = (o, v) => ((SomeType)o).Property1 = (PropertyType)v;

Unfortunately, we cannot use the .NET Reflector trick because, if you try to compile this, you’ll get this error:

error CS0832: An expression tree may not contain an assignment operator

Fortunately, that corresponds to a valid .NET expression tree. We just have to build it by hand.

So, for a given type, the set of property setters would be built this way:

var compiledExpressions = (from property in objectType.GetProperties()
                           let objectParameter = Expression.Parameter(typeof(object), "o")
                           let convertedObjectParameter = Expression.ConvertChecked(objectParameter, objectType)
                           let valueParameter = Expression.Parameter(typeof(object), "v")
                           let convertedValueParameter = Expression.ConvertChecked(valueParameter, property.PropertyType)
                           let propertyExpression = Expression.Property(convertedObjectParameter, property)
                           select
                                Expression.Lambda<Action<object, object>>(
                                    Expression.Assign(
                                        propertyExpression,
                                        convertedValueParameter
                                    ),
                                    objectParameter,
                                    valueParameter
                                ).Compile()).ToArray();

And hydrating objects would be like this:

for (int o = 0; o < objects.Length; o++)
{
    var objectProperties = objects[o];

    var newObject = newObjects[o] = Activator.CreateInstance(objectType);

    for (int p = 0; p < compiledExpressions.Length; p++)
    {
        compiledExpressions[p](newObject, objectProperties[p]);
    }
}
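
For reference, here's a minimal sketch of my own of a single hand-built setter, assuming a SomeType class with a writable int Property1; each element of compiledExpressions is built exactly like this:

// Requires System.Linq.Expressions and System.Reflection.
ParameterExpression objectParameter = Expression.Parameter(typeof(object), "o");
ParameterExpression valueParameter = Expression.Parameter(typeof(object), "v");
PropertyInfo property = typeof(SomeType).GetProperty("Property1");

// (object o, object v) => ((SomeType)o).Property1 = (int)v
Action<object, object> setter =
    Expression.Lambda<Action<object, object>>(
        Expression.Assign(
            Expression.Property(
                Expression.ConvertChecked(objectParameter, typeof(SomeType)),
                property
            ),
            Expression.ConvertChecked(valueParameter, property.PropertyType)
        ),
        objectParameter,
        valueParameter
    ).Compile();

var instance = new SomeType();
setter(instance, 42); // instance.Property1 is now 42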

Mastering Expression Trees With .NET Reflector

Following my last post, I received lots of enquiries about how I got to master the creation of expression trees.

The answer is: .NET Reflector

In that post I needed to generate an expression tree for this expression:

Expression<Func<object, object>> expression = o => ((object)((SomeType)o).Property1);

I just compiled that code in Visual Studio 2010, loaded the assembly in .NET Reflector, and disassembled it to C# without optimizations (View –> Options –> Disassembler –> Optimization: None).

The disassembled code looked like this:

Expression<Func<object, object>> expression;
ParameterExpression CS$0$0000;
ParameterExpression[] CS$0$0001;
expression = Expression.Lambda<Func<object, object>>(Expression.Convert(Expression.Property(Expression.Convert(CS$0$0000 = Expression.Parameter(typeof(object), "o"), typeof(SomeType)), (MethodInfo) methodof(SomeType.get_Property1)), typeof(object)), new ParameterExpression[] { CS$0$0000 });

After giving valid C# names to the variables and tidying up the code a bit, I came up with this:

ParameterExpression parameter = Expression.Parameter(typeof(object), "o");
Expression<Func<object, object>> expression =
    Expression.Lambda<Func<object, object>>(
        Expression.Convert(
            Expression.Property(
                Expression.Convert(
                    parameter,
                    typeof(SomeType)
                ),
                "Property1"
            ),
            typeof(object)
        ),
        parameter
    );
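
To check that the hand-built tree behaves like the original lambda, it can be compiled and invoked; a quick sketch of mine, assuming SomeType has a writable int Property1:

Func<object, object> getter = expression.Compile();
object value = getter(new SomeType { Property1 = 42 }); // returns the boxed value 42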

Easy! Isn’t it?

Dumping Objects Using Expression Trees

No. I’m not proposing to get rid of objects.

A colleague of mine asked me if I knew a way to dump a list of objects of unknown type into a DataTable with better performance than the approach he was using.

The objects being dumped usually have over a dozen properties, but, for the sake of this post, let's assume they look like this:

class SomeClass
{
    public int Property1 { get; set; }
    public long Property2 { get; set; }
    public string Property3 { get; set; }
    public object Property4 { get; set; }
    public DateTimeOffset Property5 { get; set; }
}

The code he was using was something like this:

var properties = objectType.GetProperties();

foreach (object obj in objects)
{
    foreach (var property in properties)
    {
        property.GetValue(obj, null);
    }
}

For a list of one million objects, this takes a little over 6000 milliseconds on my machine.

I immediately thought: Expression Trees!

If the type of the objects were known at compile time, it would be something like this:

Expression<Func<SomeClass, int>> expression = o => o.Property1;
var compiled = expression.Compile();
var propertyValue = compiled.Invoke(obj);

But, at compile time, the type of the object and, consequently, the types of its properties, are unknown. So, we'll need, for each property, an expression tree like this:

Expression<Func<object, object>> expression = o => ((SomeClass)o).Property1;

The previous expression converts the parameter of type object to the type of the object, reads the property, and converts the result back to type object, because the type of the result must match the return type of the expression.

For the same type of objects, the collection of property accessors would be built this way:

var compiledExpressions = (from property in properties
                           let objectParameter = Expression.Parameter(typeof(object), "o")
                           select
                             Expression.Lambda<Func<object, object>>(
                                 Expression.Convert(
                                     Expression.Property(
                                         Expression.Convert(
                                             objectParameter,
                                             objectType
                                         ),
                                         property
                                     ),
                                     typeof(object)
                                 ),
                                 objectParameter
                             ).Compile()).ToArray();

Looks a bit overcomplicated, but reading all properties of all objects for the same object set with this code:

foreach (object obj in objects)
{
    foreach (var compiledExpression in compiledExpressions)
    {
        compiledExpression(obj);
    }
}

takes a little over 150 milliseconds on my machine.

That’s right. 2.5% of the previous time.
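
The timings can be reproduced with a simple Stopwatch around each loop; a rough sketch of mine, assuming the objects array and the compiledExpressions built above:

// Requires System.Diagnostics.
var stopwatch = Stopwatch.StartNew();

foreach (object obj in objects)
{
    foreach (var compiledExpression in compiledExpressions)
    {
        compiledExpression(obj);
    }
}

stopwatch.Stop();
Console.WriteLine("Compiled expressions: {0} ms", stopwatch.ElapsedMilliseconds);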

TechDays 2010: What’s New On C# 4.0

I would like to thank those that attended my session at TechDays 2010 and I hope that I was able to pass the message of what’s new on C#.

For those that didn’t attend (or did and want to review it), the presentation can be downloaded from here.

Code samples can be downloaded from here.

Here’s a list of resources mentioned in the session:

C# 4.0: COM Interop Improvements

Dynamic resolution as well as named and optional arguments greatly improve the experience of interoperating with COM APIs such as the Office Automation Primary Interop Assemblies (PIAs). But, to ease COM interop development even further, a few COM-specific features were also added to C# 4.0.

Omitting ref

Because of a different programming model, many COM APIs contain a lot of reference parameters. These parameters are typically not meant to mutate a passed-in argument, but are simply another way of passing value parameters.

Specifically for COM methods, the compiler allows these arguments to be passed by value: it automatically generates the necessary temporary variables to hold the values, passes them by reference, and discards their values after the call returns. From the point of view of the programmer, the arguments are being passed by value.

This method call:

object fileName = "Test.docx";
object missing = Missing.Value;

document.SaveAs(ref fileName,
    ref missing, ref missing, ref missing,
    ref missing, ref missing, ref missing,
    ref missing, ref missing, ref missing,
    ref missing, ref missing, ref missing,
    ref missing, ref missing, ref missing);

can now be written like this:

document.SaveAs("Test.docx",
    Missing.Value, Missing.Value, Missing.Value,
    Missing.Value, Missing.Value, Missing.Value,
    Missing.Value, Missing.Value, Missing.Value,
    Missing.Value, Missing.Value, Missing.Value,
    Missing.Value, Missing.Value, Missing.Value);

And because all the parameters receiving Missing.Value have it as their default value, the call can even be reduced to this:

document.SaveAs("Test.docx");

Dynamic Import

Many COM methods accept and return variant types, which are represented in the PIAs as object. In the vast majority of cases, a programmer calling these methods already knows the static type of a returned object from the context of the call, but has to explicitly perform a cast on the returned values to make use of that knowledge. These casts are so common that they constitute a major nuisance.

To make the developer’s life easier, it is now possible to import the COM APIs in such a way that variants are instead represented using the type dynamic, which means that COM signatures now have occurrences of dynamic instead of object.

This means that members of a returned object can now be easily accessed or assigned into a strongly typed variable without having to cast.

Instead of this code:

((Excel.Range)(excel.Cells[1, 1])).Value2 = "Hello World!";

this code can now be used:

excel.Cells[1, 1] = "Hello World!";

And instead of this:

Excel.Range range = (Excel.Range)(excel.Cells[1, 1]);

this can be used:

Excel.Range range = excel.Cells[1, 1];

Indexed And Default Properties

A few COM interface features are still not available in C#. At the top of the list are indexed properties and default properties. As mentioned above, these will be possible if the COM interface is accessed dynamically, but will not be recognized by statically typed C# code.

No PIAs – Type Equivalence And Type Embedding

For assemblies identified with PrimaryInteropAssemblyAttribute, the compiler will create equivalent types (interfaces, structs, enumerations and delegates) and embed them in the generated assembly.

To reduce the final size of the generated assembly, only the used types and their used members will be generated and embedded.

Although this makes development and deployment of applications using the COM components easier because there’s no need to deploy the PIAs, COM component developers are still required to build the PIAs.

C# 4.0: Dynamic Programming

The major feature of C# 4.0 is dynamic programming. Not just dynamic typing, but dynamic in a broader sense, which means talking to anything that is not statically typed to be a .NET object.

Dynamic Language Runtime

The Dynamic Language Runtime (DLR) is a piece of technology that unifies dynamic programming on the .NET platform, the same way the Common Language Runtime (CLR) has been a common platform for statically typed languages.

The CLR always had dynamic capabilities. You could always use reflection, but its main goal was never to be a dynamic programming environment and there were some features missing. The DLR is built on top of the CLR and adds those missing features to the .NET platform.

The Dynamic Language Runtime is the core infrastructure that consists of:

  • Expression Trees

    The same expression trees used in LINQ, now improved to support statements (see the sketch after this list).

  • Dynamic Dispatch

    Dispatches invocations to the appropriate binder.

  • Call Site Caching

    For improved efficiency.
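
Here's a small sketch of my own of that statement support, building and running a loop that LINQ-era expression trees couldn't represent (the names countToTen, counter and exit are just for illustration):

ParameterExpression counter = Expression.Variable(typeof(int), "counter");
LabelTarget exit = Expression.Label(typeof(int));

Func<int> countToTen =
    Expression.Lambda<Func<int>>(
        Expression.Block(
            new[] { counter },
            Expression.Assign(counter, Expression.Constant(0)),
            Expression.Loop(
                Expression.IfThenElse(
                    Expression.LessThan(counter, Expression.Constant(10)),
                    Expression.PreIncrementAssign(counter),
                    Expression.Break(exit, counter)
                ),
                exit
            )
        )
    ).Compile();

Console.WriteLine(countToTen()); // 10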

Dynamic languages and languages with dynamic capabilities are built on top of the DLR. IronPython and IronRuby were already built on top of the DLR, and now, the support for using the DLR is being added to C# and Visual Basic. Other languages built on top of the CLR are expected to also use the DLR in the future.

Underneath the DLR there are binders that talk to a variety of different technologies:

  • .NET Binder

    Allows talking to .NET objects.

  • JavaScript Binder

    Allows talking to JavaScript in Silverlight.

  • IronPython Binder

    Allows talking to IronPython.

  • IronRuby Binder

    Allows talking to IronRuby.

  • COM Binder

    Allows talking to COM.

With all these binders it is possible to have a single programming experience to talk to all these environments that are not statically typed .NET objects.

The dynamic Static Type

Let’s take this traditional statically typed code:

Calculator calculator = GetCalculator();
int sum = calculator.Add(10, 20);

Because the variable that receives the return value of the GetCalculator method is statically typed to be of type Calculator and, because the Calculator type has an Add method that receives two integers and returns an integer, it is possible to call that Add method and assign its return value to a variable statically typed as integer.

Now let’s suppose the calculator was not a statically typed .NET class, but, instead, a COM object or some .NET code we don’t know the type of. All of a sudden it gets very painful to call the Add method:

object calculator = GetCalculator();
Type calculatorType = calculator.GetType();
object res = calculatorType.InvokeMember("Add", BindingFlags.InvokeMethod, null, calculator, new object[] { 10, 20 });
int sum = Convert.ToInt32(res);

And what if the calculator was a JavaScript object?

ScriptObject calculator = GetCalculator();
object res = calculator.Invoke("Add", 10, 20);
int sum = Convert.ToInt32(res);

For each dynamic domain we have a different programming experience and that makes it very hard to unify the code.

With C# 4.0 it becomes possible to write code this way:

dynamic calculator = GetCalculator();
int sum = calculator.Add(10, 20);

You simply declare a variable whose static type is dynamic. dynamic is a pseudo-keyword (like var) that indicates to the compiler that operations on the calculator object will be done dynamically.

The way you should look at dynamic is that it’s just like object (System.Object) with dynamic semantics associated. Anything can be assigned to a dynamic.

dynamic x = 1;
dynamic y = "Hello";
dynamic z = new List<int> { 1, 2, 3 };

At run-time, all objects will have a type. In the above example x is of type System.Int32.

When one or more operands in an operation are typed dynamic, member selection is deferred to run-time instead of compile-time. Then the run-time type is substituted in all variables and normal overload resolution is done, just like it would happen at compile-time.

The result of any dynamic operation is always dynamic and, when a dynamic object is assigned to something else, a dynamic conversion will occur.
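
For example (a small sketch of my own):

dynamic d = 10;
int i = d;      // implicit dynamic conversion, succeeds at run-time
string s = d;   // compiles, but throws a RuntimeBinderException at run-time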

  • double x = 1.75;
    double y = Math.Abs(x);

    Resolution: compile-time
    Resolved method: double Abs(double x)

  • dynamic x = 1.75;
    dynamic y = Math.Abs(x);

    Resolution: run-time
    Resolved method: double Abs(double x)

  • dynamic x = 2;
    dynamic y = Math.Abs(x);

    Resolution: run-time
    Resolved method: int Abs(int x)

The above code will always be strongly typed. The difference is that, in the first case, the method resolution is done at compile-time, and in the others it’s done at run-time.

IDynamicMetaObjectProvider

The DLR is pre-wired to know about .NET objects, COM objects and so forth, but any dynamic language can implement its own objects, and you can implement your own objects in C# by implementing the IDynamicMetaObjectProvider interface. When an object implements IDynamicMetaObjectProvider, it can participate in the resolution of how method calls and property accesses are done.

The .NET Framework already provides two implementations of IDynamicMetaObjectProvider:

  • DynamicObject : IDynamicMetaObjectProvider

    The DynamicObject class enables you to define which operations can be performed on dynamic objects and how to perform those operations. For example, you can define what happens when you try to get or set an object property, call a method, or perform standard mathematical operations such as addition and multiplication.

  • ExpandoObject : IDynamicMetaObjectProvider

    The ExpandoObject class enables you to add and delete members of its instances at run time and also to set and get values of these members. This class supports dynamic binding, which enables you to use standard syntax like sampleObject.sampleMember, instead of more complex syntax like sampleObject.GetAttribute("sampleMember").
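
Here's a minimal sketch of my own showing both at work (the Bag class is hypothetical, just for illustration; requires System.Dynamic and System.Collections.Generic):

// A DynamicObject subclass that resolves member reads and writes against a dictionary.
class Bag : DynamicObject
{
    private readonly Dictionary<string, object> values = new Dictionary<string, object>();

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        return values.TryGetValue(binder.Name, out result);
    }

    public override bool TrySetMember(SetMemberBinder binder, object value)
    {
        values[binder.Name] = value;
        return true;
    }
}

// ...

dynamic bag = new Bag();
bag.Title = "C# 4.0";                    // handled by TrySetMember
Console.WriteLine(bag.Title);            // handled by TryGetMember

dynamic expando = new ExpandoObject();
expando.SampleMember = 42;               // member added at run-time
Console.WriteLine(expando.SampleMember); // 42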

C# 4.0: Alternative To Optional Arguments

Like I mentioned in my last post, publicly exposing methods with optional arguments is a bad practice (that’s why C# resisted having them, until now).

You might argue that your method or constructor has too many variants and that having ten or more overloads is a maintenance nightmare, and you’re right. But the solution has been there for ages: have an arguments class.

The arguments class pattern has been used in the .NET Framework by several classes, like XmlReader and XmlWriter, which have used it in their Create methods since version 2.0:

XmlReaderSettings settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.Auto;
XmlReader.Create("file.xml", settings);

With this pattern, you don’t have to maintain a long list of overloads and any default values for properties of XmlReaderSettings (or XmlWriterSettings for XmlWriter.Create) can be changed or new properties added in future implementations that won’t break existing compiled code.

You might now argue that it’s too much code to write, but, with object initializers added in C# 3.0, the same code can be written like this:

XmlReader.Create("file.xml", new XmlReaderSettings { ValidationType = ValidationType.Auto });

Looks almost like named and optional arguments, doesn’t it? And, who knows, in a future version of C#, it might even look like this:

XmlReader.Create("file.xml", new { ValidationType = ValidationType.Auto });

C# 4.0: Named And Optional Arguments

As part of the co-evolution effort of C# and Visual Basic, C# 4.0 introduces Named and Optional Arguments.

First of all, let’s clarify what arguments and parameters are:

  • Method definition parameters are the input variables of the method.
  • Method call arguments are the values provided to the method parameters.

In fact, the C# Language Specification states the following in §7.5:

The argument list (§7.5.1) of a function member invocation provides actual values or variable references for the parameters of the function member.

Given the above definitions, we can state that:

  • Parameters have always been named and still are.
  • Parameters have never been optional and still aren’t.

Named Arguments

Until now, the way the C# compiler matched method call arguments with method parameters was by position. The first argument provides the value for the first parameter, the second argument provides the value for the second parameter, and so on, regardless of the names of the parameters. If a parameter was missing a corresponding argument to provide its value, the compiler would emit a compilation error.

For this call:

Greeting("Mr.", "Morgado", 42);

this method:

public void Greeting(string title, string name, int age)

will receive as parameters:

  • title: “Mr.”
  • name: “Morgado”
  • age: 42

What this new feature allows is using the names of the parameters to identify the corresponding arguments, in the form name:value.

Not all arguments in the argument list must be named. However, all named arguments must be at the end of the argument list. The matching between arguments (and the evaluation of their values) and parameters will be done first by name for the named arguments and then by position for the unnamed arguments.

This means that, for this method definition:

public void Method(int first, int second, int third)

this call declaration:

int i = 0;
Method(i, third: i++, second: ++i);

will have this code generated by the compiler:

int i = 0;
int CS$0$0000 = i++;
int CS$0$0001 = ++i;
Method(i, CS$0$0001, CS$0$0000);

which will give the method the following parameter values:

  • first: 2
  • second: 2
  • third: 0

Notice the variable names. Although they are invalid C# identifiers, they are valid .NET identifiers, thus avoiding collisions between user-written and compiler-generated code.

Besides allowing the argument list to be reordered, this feature is very useful for self-documenting the code, for example, when the argument list is very long or when it’s not clear, from the call site, what the arguments are.

Optional Arguments

Parameters can now have default values:

public void Method(int first, int second = 2, int third = 3)

Parameters with default values must be the last in the parameter list, and their default values are used as the values of the parameters if the corresponding arguments are missing from the method call declaration.

This call declaration:

int i = 0;
Method(i, third: ++i);

will have this code generated by the compiler:

int i = 0;
int CS$0$0000 = ++i;
Method(i, 2, CS$0$0000);

which will give the method the following parameter values:

  • first: 1
  • second: 2
  • third: 1

Because, when method parameters have default values, arguments can be omitted from the call declaration, this might seem like method overloading or a good replacement for it, but it isn’t.

Although methods like this:

public StreamReader OpenTextFile(
    string path,
    Encoding encoding = null,
    bool detectEncoding = true,
    int bufferSize = 1024)

allow their calls to be written like this:

OpenTextFile("foo.txt", Encoding.UTF8);
OpenTextFile("foo.txt", Encoding.UTF8, bufferSize: 4096);
OpenTextFile(
    bufferSize: 4096,
    path: "foo.txt",
    detectEncoding: false);

The compiler handles default values like constant fields: it takes the value and uses it instead of a reference to the value. So, like with constant fields, the default values of publicly exposed methods become part of the public surface (and remember that internal members might be publicly accessible – InternalsVisibleToAttribute). If such methods are publicly accessible and used by another assembly, those values will be hard-coded in the calling code and, if the called assembly has its default values changed, the new values won’t be picked up by already compiled code.
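
To illustrate (my own sketch): for the OpenTextFile method above, a call compiled in another assembly like this:

OpenTextFile("foo.txt", Encoding.UTF8);

is emitted as if it had been written like this, with the default values burned into the call site:

OpenTextFile("foo.txt", Encoding.UTF8, true, 1024);

If a later version of the declaring assembly changes the detectEncoding or bufferSize defaults, this already compiled caller keeps passing true and 1024.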

At first glance, I thought that using optional arguments for “badly” written code was great, but that the ability to write code like that was just pure evil. But then I realized that, since I use private constant fields, it’s OK to use default parameter values on privately accessed methods.

C# 4.0: Covariance And Contravariance In Generics Made Easy

In my last post, I went through what is variance in .NET 4.0 and C# 4.0 in a rather theoretical way.

Now, I’m going to try to make it a bit more down to earth.

Given:

class Base { }

class Derived : Base { }

Such that:

Trace.Assert(typeof(Base).IsClass && typeof(Derived).IsClass && typeof(Base).IsGreaterOrEqualTo(typeof(Derived)));

  • Covariance

    interface ICovariantIn<out T> { }

    Trace.Assert(typeof(ICovariantIn<Base>).IsGreaterOrEqualTo(typeof(ICovariantIn<Derived>)));

  • Contravariance

    interface IContravariantIn<in T> { }

    Trace.Assert(typeof(IContravariantIn<Derived>).IsGreaterOrEqualTo(typeof(IContravariantIn<Base>)));

  • Invariance

    interface IInvariantIn<T> { }

    Trace.Assert(!typeof(IInvariantIn<Base>).IsGreaterOrEqualTo(typeof(IInvariantIn<Derived>))
        && !typeof(IInvariantIn<Derived>).IsGreaterOrEqualTo(typeof(IInvariantIn<Base>)));

Where:

public static class TypeExtensions
{
    public static bool IsGreaterOrEqualTo(this Type self, Type other)
    {
        return self.IsAssignableFrom(other);
    }
}

C# 4.0: Covariance And Contravariance In Generics

C# 4.0 (and .NET 4.0) introduced covariance and contravariance to generic interfaces and delegates. But what is this variance thing?

According to Wikipedia, in multilinear algebra and tensor analysis, covariance and contravariance describe how the quantitative description of certain geometrical or physical entities changes when passing from one coordinate system to another.(*)

But what does this have to do with C# or .NET?

In type theory, a type T is greater (>) than type S if S is a subtype of (derives from) T, which means that there is a quantitative description for types in a type hierarchy.

So, how does covariance and contravariance apply to C# (and .NET) generic types?

In C# (and .NET), variance is a relation between a generic type definition and a particular generic type parameter.

Given two types Base and Derived, such that:

  • There is a reference (or identity) conversion between Base and Derived
  • Base ≥ Derived

A generic type definition Generic<T> is:

  • covariant in T if the ordering of the constructed types follows the ordering of the generic type parameters: Generic<Base> ≥ Generic<Derived>.
  • contravariant in T if the ordering of the constructed types is reversed from the ordering of the generic type parameters: Generic<Base> ≤ Generic<Derived>.
  • invariant in T if neither of the above apply.

If this definition is applied to arrays, we can see that arrays have always been covariant in relation to the type of the elements because this is valid code:

object[] objectArray = new string[] { "string 1", "string 2" };
objectArray[0] = "string 3";
objectArray[1] = new object();

However, when we try to run this code, the second assignment will throw an ArrayTypeMismatchException. Although the compiler was fooled into thinking this was valid code because an object is being assigned to an element of an array of object, at run time, there is always a type check to guarantee that the runtime type of the definition of the elements of the array is greater or equal to the instance being assigned to the element. In the above example, because the runtime type of the array is array of string, the first assignment of array elements is valid because string ≥ string and the second is invalid because string ≤ object.

This leads to the conclusion that, although arrays have always been covariant in relation to the type of the elements, they are not safely covariant – code that compiles is not guaranteed to run without errors.

In C#, variance is enforced in the declaration of the type and not determined by the usage of each generic type parameter.

Covariance in relation to a particular generic type parameter is enforced using the out generic modifier:

public interface IEnumerable<out T>
{
    IEnumerator<T> GetEnumerator();
}

public interface IEnumerator<out T>
{
    T Current { get; }
    bool MoveNext();
}

Notice the convenient use of the pre-existing out keyword. Besides the benefit of not having to remember a new hypothetical covariant keyword, out is easier to remember because it defines that the generic type parameter can only appear in output positions: read-only properties and method return values.
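
This is what makes assignments like the following legal (a small sketch of mine):

IEnumerable<string> strings = new List<string> { "string 1", "string 2" };
IEnumerable<object> objects = strings; // allowed because IEnumerable<out T> is covariant in T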

In a similar way, contravariance in relation to a particular generic type parameter is enforced using the in generic modifier:

public interface IComparer<in T>
{
    int Compare(T x, T y);
}

Once again, the use of the pre-existing in keyword makes it easier to remember that the generic type parameter can only be used in input positions: write-only properties and non-ref, non-out method parameters.

A generic type parameter that is not marked covariant (out) or contravariant (in) is invariant.

Because covariance and contravariance apply to the relation between a generic type definition and a particular generic type parameter, a generic type definition can be covariant, contravariant, and invariant at the same time, depending on the generic type parameter.

public delegate TResult Func<in T, out TResult>(T arg);

In the above delegate definition, Func<T, TResult> is contravariant in T and covariant in TResult.
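
A small sketch of mine showing both directions at work:

Func<object, string> describe = o => o.ToString();
Func<string, object> converted = describe; // contravariant in T (object to string), covariant in TResult (string to object)
Console.WriteLine(converted("variance"));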

All the types in the .NET Framework where variance could be applied to its generic type parameters have been modified to take advantage of this new feature.

In summary, the rules for variance in C# (and .NET) are:

  • Variance in relation to generic type parameters is restricted to generic interface and generic delegate type definitions.
  • A generic interface or generic delegate type definition can be covariant, contravariant or invariant in relation to different generic type parameters.
  • Variance applies only to reference types: an IEnumerable<int> is not an IEnumerable<object>.
  • Variance does not apply to delegate combination. That is, given two delegates of types Action<Derived> and Action<Base>, you cannot combine the second delegate with the first, although the result would be type safe. Variance allows the second delegate to be assigned to a variable of type Action<Derived>, but delegates can combine only if their types match exactly, as the sketch below shows.
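
Here's a sketch of my own of that last rule, using the Base and Derived classes shown earlier:

Action<Base> printBase = b => Console.WriteLine("Base");
Action<Derived> printDerived = d => Console.WriteLine("Derived");

Action<Derived> converted = printBase;                // allowed by contravariance
Action<Derived> combined = printDerived + converted;  // compiles, but throws an ArgumentException at
                                                      // run-time because the runtime delegate types differ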

If you want to learn more about variance in C# (and .NET), you can always read:

Note: Because variance is a feature of .NET 4.0 and not only of C# 4.0, all this also applies to Visual Basic 10.