.NET4.0

C# 4.0: Dynamic Programming

The major feature of C# 4.0 is dynamic programming. Not just dynamic typing, but dynamic in a broader sense: talking to anything that is not statically typed as a .NET object.

Dynamic Language Runtime

The Dynamic Language Runtime (DLR) is a piece of technology that unifies dynamic programming on the .NET platform, the same way the Common Language Runtime (CLR) has been a common platform for statically typed languages.

The CLR has always had dynamic capabilities: you could always use reflection. But being a dynamic programming environment was never its main goal, and some features were missing. The DLR is built on top of the CLR and adds those missing features to the .NET platform.

The Dynamic Language Runtime is the core infrastructure that consists of:

  • Expression Trees

    The same expression trees used in LINQ, now improved to support statements.

  • Dynamic Dispatch

    Dispatches invocations to the appropriate binder.

  • Call Site Caching

    For improved efficiency.

Dynamic languages and languages with dynamic capabilities are built on top of the DLR. IronPython and IronRuby were already built on top of the DLR, and now, the support for using the DLR is being added to C# and Visual Basic. Other languages built on top of the CLR are expected to also use the DLR in the future.

Underneath the DLR there are binders that talk to a variety of different technologies:

  • .NET Binder

    Allows talking to .NET objects.

  • JavaScript Binder

    Allows talking to JavaScript in Silverlight.

  • IronPython Binder

    Allows talking to IronPython.

  • IronRuby Binder

    Allows talking to IronRuby.

  • COM Binder

    Allows talking to COM.

With all these binders it is possible to have a single programming experience for talking to all these environments that are not statically typed .NET objects.

The dynamic Static Type

Let’s take this traditional statically typed code:

Calculator calculator = GetCalculator();
int sum = calculator.Add(10, 20);

Because the variable that receives the return value of the GetCalculator method is statically typed to be of type Calculator and because the Calculator type has an Add method that receives two integers and returns an integer, it is possible to call that Add method and assign its return value to a variable statically typed as integer.

Now let’s suppose the calculator was not a statically typed .NET class but, instead, a COM object or some .NET code we don’t know the type of. All of a sudden it gets very painful to call the Add method:

object calculator = GetCalculator();
Type calculatorType = calculator.GetType();
object res = calculatorType.InvokeMember("Add", BindingFlags.InvokeMethod, null, calculator, new object[] { 10, 20 });
int sum = Convert.ToInt32(res);

And what if the calculator was a JavaScript object?

ScriptObject calculator = GetCalculator();
object res = calculator.Invoke("Add", 10, 20);
int sum = Convert.ToInt32(res);

For each dynamic domain we have a different programming experience and that makes it very hard to unify the code.

With C# 4.0 it becomes possible to write code this way:

dynamic calculator = GetCalculator();
int sum = calculator.Add(10, 20);

You simply declare a variable whose static type is dynamic. dynamic is a contextual keyword (like var) that indicates to the compiler that operations on the calculator object will be done dynamically.
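To make the comparison concrete, here is a minimal sketch of the same late-bound call done through reflection and through dynamic. The Calculator class and the CalculatorCalls helper are hypothetical stand-ins for whatever GetCalculator really returns; they are not part of any library.

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in for the object returned by GetCalculator().
public class Calculator
{
    public int Add(int x, int y) { return x + y; }
}

public static class CalculatorCalls
{
    // Returns object on purpose: the caller has no static type to work with.
    public static object GetCalculator() { return new Calculator(); }

    // Late-bound call through reflection.
    public static int AddWithReflection(int x, int y)
    {
        object calculator = GetCalculator();
        object res = calculator.GetType().InvokeMember(
            "Add",
            BindingFlags.InvokeMethod | BindingFlags.Public | BindingFlags.Instance,
            null,
            calculator,
            new object[] { x, y });
        return Convert.ToInt32(res);
    }

    // The same late-bound call through dynamic.
    public static int AddWithDynamic(int x, int y)
    {
        dynamic calculator = GetCalculator();
        return calculator.Add(x, y); // bound at run-time, converted to int
    }
}
```

Both methods find Add by name at run-time; the dynamic version just lets the compiler and the DLR write the plumbing.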

The way you should look at dynamic is that it’s just like object (System.Object) with dynamic semantics associated. Anything can be assigned to a dynamic.

dynamic x = 1;
dynamic y = "Hello";
dynamic z = new List<int> { 1, 2, 3 };

At run-time, all objects will have a type. In the above example x is of type System.Int32.

When one or more operands in an operation are typed dynamic, member selection is deferred to run-time instead of compile-time. Then the run-time type is substituted in all variables and normal overload resolution is done, just like it would happen at compile-time.

The result of any dynamic operation is always dynamic and, when a dynamic object is assigned to something else, a dynamic conversion will occur.
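A small sketch of what deferring member selection means in practice. DynamicBasics and its two methods are names I made up for illustration; RuntimeBinderException is the exception the C# run-time binder throws when a member cannot be found.

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

public static class DynamicBasics
{
    // The Length member is looked up on the run-time type of value.
    // Assigning the dynamic result to an int triggers a dynamic conversion.
    public static int LengthOf(dynamic value)
    {
        int length = value.Length;
        return length;
    }

    // Accessing a member the run-time type does not have still compiles,
    // but fails when the operation is bound at run-time.
    public static bool HasLength(dynamic value)
    {
        try
        {
            object ignored = value.Length;
            return true;
        }
        catch (RuntimeBinderException)
        {
            return false;
        }
    }
}
```

Calling LengthOf("Hello") binds to String.Length, while HasLength(42) returns false because System.Int32 has no Length member.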

Code                         Resolution      Method

double x = 1.75;             compile-time    double Abs(double x)
double y = Math.Abs(x);

dynamic x = 1.75;            run-time        double Abs(double x)
dynamic y = Math.Abs(x);

dynamic x = 2;               run-time        int Abs(int x)
dynamic y = Math.Abs(x);

The above code will always be strongly typed. The difference is that, in the first case, the method resolution is done at compile-time and, in the others, it’s done at run-time.
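The run-time resolution can be observed by asking the result for its type. This is only a sketch; AbsResolution and ResultType are hypothetical names.

```csharp
using System;

public static class AbsResolution
{
    // The overload of Math.Abs is chosen at run-time from the run-time
    // type of value; the name of the result's type tells which one ran.
    public static string ResultType(dynamic value)
    {
        dynamic result = Math.Abs(value);
        return result.GetType().Name;
    }
}
```

ResultType(1.75) gives "Double" and ResultType(2) gives "Int32", matching the second and third rows above.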

IDynamicMetaObjectProvider

The DLR is pre-wired to know .NET objects, COM objects and so forth, but any dynamic language can implement its own objects, and you can implement your own objects in C# by implementing the IDynamicMetaObjectProvider interface. When an object implements IDynamicMetaObjectProvider, it can participate in the resolution of how method calls and property accesses are done.

The .NET Framework already provides two implementations of IDynamicMetaObjectProvider:

  • DynamicObject : IDynamicMetaObjectProvider

    The DynamicObject class enables you to define which operations can be performed on dynamic objects and how to perform those operations. For example, you can define what happens when you try to get or set an object property, call a method, or perform standard mathematical operations such as addition and multiplication.

  • ExpandoObject : IDynamicMetaObjectProvider

    The ExpandoObject class enables you to add and delete members of its instances at run time and also to set and get values of these members. This class supports dynamic binding, which enables you to use standard syntax like sampleObject.sampleMember, instead of more complex syntax like sampleObject.GetAttribute("sampleMember").
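Here is a short sketch of both classes in use. PropertyBag is a hypothetical DynamicObject subclass written for this example (it only handles getting and setting members); the ExpandoObject part uses the fact that ExpandoObject implements IDictionary<string, object>.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

// Hypothetical DynamicObject subclass: a property bag backed by a dictionary.
public class PropertyBag : DynamicObject
{
    private readonly Dictionary<string, object> values = new Dictionary<string, object>();

    // Called for "bag.Something = value" when no real member matches.
    public override bool TrySetMember(SetMemberBinder binder, object value)
    {
        values[binder.Name] = value;
        return true;
    }

    // Called for "bag.Something"; returning false reports a missing member.
    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        return values.TryGetValue(binder.Name, out result);
    }
}

public static class DynamicObjectsDemo
{
    public static string UsePropertyBag()
    {
        dynamic bag = new PropertyBag();
        bag.Title = "Mr.";
        return bag.Title;
    }

    public static bool UseExpando()
    {
        dynamic person = new ExpandoObject();
        person.Name = "Morgado"; // adds a member at run-time

        var members = (IDictionary<string, object>)person;
        bool added = members.ContainsKey("Name");
        members.Remove("Name"); // deletes the member again
        return added && !members.ContainsKey("Name");
    }
}
```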

C# 4.0: Alternative To Optional Arguments

As I mentioned in my last post, publicly exposing methods with optional arguments is a bad practice (that’s why C# resisted having them until now).

You might argue that your method or constructor has too many variants and that having ten or more overloads is a maintenance nightmare, and you’re right. But the solution has been there for ages: have an arguments class.

The arguments class pattern has been used in the .NET Framework by several classes, like XmlReader and XmlWriter in their Create methods, since version 2.0:

XmlReaderSettings settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.Auto;
XmlReader.Create("file.xml", settings);

With this pattern, you don’t have to maintain a long list of overloads, and default values for properties of XmlReaderSettings (or XmlWriterSettings for XmlWriter.Create) can be changed, or new properties added, in future versions without breaking existing compiled code.

You might now argue that it’s too much code to write, but, with object initializers added in C# 3.0, the same code can be written like this:

XmlReader.Create("file.xml", new XmlReaderSettings { ValidationType = ValidationType.Auto });

Looks almost like named and optional arguments, doesn’t it? And, who knows, in a future version of C#, it might even look like this:

XmlReader.Create("file.xml", new { ValidationType = ValidationType.Auto });
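The same pattern is easy to apply to your own APIs. A minimal sketch, where GreetingSettings and Greeter are hypothetical names invented for this example:

```csharp
using System;

// Hypothetical arguments class: the defaults live here, inside the
// called assembly, and not at the call sites.
public class GreetingSettings
{
    public GreetingSettings()
    {
        Title = "Mr.";
        Punctuation = "!";
    }

    public string Title { get; set; }
    public string Punctuation { get; set; }
}

public static class Greeter
{
    // Convenience overload that uses all the defaults.
    public static string Greet(string name)
    {
        return Greet(name, new GreetingSettings());
    }

    public static string Greet(string name, GreetingSettings settings)
    {
        if (settings == null) throw new ArgumentNullException("settings");
        return "Hello, " + settings.Title + " " + name + settings.Punctuation;
    }
}
```

A call like Greeter.Greet("Morgado", new GreetingSettings { Punctuation = "." }) overrides only what it needs, and a later change to the defaults in GreetingSettings is picked up by already compiled callers.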

C# 4.0: Named And Optional Arguments

As part of the co-evolution effort of C# and Visual Basic, C# 4.0 introduces Named and Optional Arguments.

First of all, let’s clarify what arguments and parameters are:

  • Method definition parameters are the input variables of the method.
  • Method call arguments are the values provided to the method parameters.

In fact, the C# Language Specification states the following in §7.5:

The argument list (§7.5.1) of a function member invocation provides actual values or variable references for the parameters of the function member.

Given the above definitions, we can state that:

  • Parameters have always been named and still are.
  • Parameters have never been optional and still aren’t.

Named Arguments

Until now, the C# compiler matched method call arguments with method parameters by position. The first argument provides the value for the first parameter, the second argument provides the value for the second parameter, and so on, regardless of the names of the parameters. If a parameter was missing a corresponding argument to provide its value, the compiler would emit a compilation error.

For this call:

Greeting("Mr.", "Morgado", 42);

this method:

public void Greeting(string title, string name, int age)

will receive as parameters:

  • title: “Mr.”
  • name: “Morgado”
  • age: 42

What this new feature allows is using the names of the parameters to identify the corresponding arguments, in the form: name:value

Not all arguments in the argument list must be named. However, all named arguments must be at the end of the argument list. The matching between arguments (and the evaluation of their values) and parameters will be done first by name for the named arguments and then by position for the unnamed arguments.

This means that, for this method definition:

public void Method(int first, int second, int third)

this call:

int i = 0;
Method(i, third: i++, second: ++i);

will have this code generated by the compiler:

int i = 0;
int CS$0$0000 = i++;
int CS$0$0001 = ++i;
Method(i, CS$0$0001, CS$0$0000);

which will give the method the following parameter values:

  • first: 2
  • second: 2
  • third: 0

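Notice the variable names. Although they are invalid C# identifiers, they are valid .NET identifiers, which avoids collisions between user-written and compiler-generated code. Also note that the C# language specification says that argument expressions are evaluated in the order they appear in the argument list, so it is safer not to depend on side effects between arguments the way this example does.

Besides allowing the argument list to be re-ordered, this feature is very useful for self-documenting code, for example when the argument list is very long or when it is not clear, from the call site, what the arguments are.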
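A minimal sketch of named arguments without any side effects in the argument expressions, using the Greeting signature from above. NamedArgumentsDemo and the returned string format are just for illustration.

```csharp
using System;

public static class NamedArgumentsDemo
{
    public static string Greeting(string title, string name, int age)
    {
        return title + " " + name + " (" + age + ")";
    }

    // Arguments matched by position.
    public static string Positional()
    {
        return Greeting("Mr.", "Morgado", 42);
    }

    // Same call: one positional argument followed by named arguments
    // written in a different order than the parameters.
    public static string Named()
    {
        return Greeting("Mr.", age: 42, name: "Morgado");
    }
}
```

Both calls pass the same values to the same parameters; the named version just says so at the call site.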

Optional Arguments

Parameters can now have default values:

public void Method(int first, int second = 2, int third = 3)

Parameters with default values must be the last in the parameter list, and the default value is used as the value of the parameter if the corresponding argument is missing from the method call.

This call:

int i = 0;
Method(i, third: ++i);

will have this code generated by the compiler:

int i = 0;
int CS$0$0000 = ++i;
Method(i, 2, CS$0$0000);

which will give the method the following parameter values:

  • first: 1
  • second: 2
  • third: 1

Because arguments can be omitted from the call when method parameters have default values, this might seem like method overloading, or a good replacement for it, but it isn’t.

Although methods like this:

public StreamReader OpenTextFile(
    string path,
    Encoding encoding = null,
    bool detectEncoding = true,
    int bufferSize = 1024)

allow their calls to be written like this:

OpenTextFile("foo.txt", Encoding.UTF8);
OpenTextFile("foo.txt", Encoding.UTF8, bufferSize: 4096);
OpenTextFile(
    bufferSize: 4096,
    path: "foo.txt",
    detectEncoding: false);

the compiler handles default values like constant fields: it takes the value and uses it instead of a reference to the value. So, as with constant fields, care is needed when methods with parameters with default values are exposed publicly (and remember that internal members might be publicly accessible through InternalsVisibleToAttribute). If such methods are publicly accessible and used by another assembly, those values will be hard-coded in the calling code and, if the called assembly has its default values changed, the already compiled calling code won’t pick them up.

At first glance, I thought that using optional arguments to call “badly” written code was great, but that the ability to write code like that was pure evil. But then I realized that, since I use private constant fields, it’s OK to use default parameter values on privately accessed methods.
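One way to apply that conclusion, sketched under my own assumptions: keep explicit overloads on the public surface and confine the default values to a private method. TextFileOptions, Describe and DescribeCore are hypothetical names, and the method only builds a description string instead of opening a file.

```csharp
using System.Text;

public static class TextFileOptions
{
    // Public surface: explicit overloads, so no default value gets
    // copied into the compiled code of calling assemblies.
    public static string Describe(string path)
    {
        return DescribeCore(path);
    }

    public static string Describe(string path, Encoding encoding)
    {
        return DescribeCore(path, encoding);
    }

    public static string Describe(string path, Encoding encoding, int bufferSize)
    {
        return DescribeCore(path, encoding, bufferSize: bufferSize);
    }

    // Private implementation: the default values never leave this
    // assembly and are recompiled together with their only callers.
    private static string DescribeCore(
        string path,
        Encoding encoding = null,
        bool detectEncoding = true,
        int bufferSize = 1024)
    {
        string name = encoding == null ? "(default)" : encoding.WebName;
        return path + ";" + name + ";" + detectEncoding + ";" + bufferSize;
    }
}
```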

C# 4.0: Covariance And Contravariance In Generics Made Easy

In my last post, I went through what variance is in .NET 4.0 and C# 4.0 in a rather theoretical way.

Now, I’m going to try to make it a bit more down to earth.

Given:

class Base { }

class Derived : Base { }

Such that:

Trace.Assert(typeof(Base).IsClass && typeof(Derived).IsClass && typeof(Base).IsGreaterOrEqualTo(typeof(Derived)));

  • Covariance

    interface ICovariantIn<out T> { }

    Trace.Assert(typeof(ICovariantIn<Base>).IsGreaterOrEqualTo(typeof(ICovariantIn<Derived>)));

  • Contravariance

    interface IContravariantIn<in T> { }

    Trace.Assert(typeof(IContravariantIn<Derived>).IsGreaterOrEqualTo(typeof(IContravariantIn<Base>)));

  • Invariance

    interface IInvariantIn<T> { }

    Trace.Assert(!typeof(IInvariantIn<Base>).IsGreaterOrEqualTo(typeof(IInvariantIn<Derived>))
        && !typeof(IInvariantIn<Derived>).IsGreaterOrEqualTo(typeof(IInvariantIn<Base>)));

Where:

public static class TypeExtensions
{
    public static bool IsGreaterOrEqualTo(this Type self, Type other)
    {
        return self.IsAssignableFrom(other);
    }
}

C# 4.0: Covariance And Contravariance In Generics

C# 4.0 (and .NET 4.0) introduced covariance and contravariance to generic interfaces and delegates. But what is this variance thing?

According to Wikipedia, in multilinear algebra and tensor analysis, covariance and contravariance describe how the quantitative description of certain geometrical or physical entities changes when passing from one coordinate system to another.(*)

But what does this have to do with C# or .NET?

In type theory, a type T is greater (>) than a type S if S is a subtype of (derives from) T, which means that there is a quantitative description for types in a type hierarchy.

So, how do covariance and contravariance apply to C# (and .NET) generic types?

In C# (and .NET), variance is a relation between a generic type definition and a particular generic type parameter.

Given two types Base and Derived, such that:

  • There is a reference (or identity) conversion between Base and Derived
  • Base ≥ Derived

A generic type definition Generic<T> is:

  • covariant in T if the ordering of the constructed types follows the ordering of the generic type parameters: Generic<Base> ≥ Generic<Derived>.
  • contravariant in T if the ordering of the constructed types is reversed from the ordering of the generic type parameters: Generic<Base> ≤ Generic<Derived>.
  • invariant in T if neither of the above apply.

If this definition is applied to arrays, we can see that arrays have always been covariant in relation to the type of the elements because this is valid code:

object[] objectArray = new string[] { "string 1", "string 2" };
objectArray[0] = "string 3";
objectArray[1] = new object();

However, when we try to run this code, the second assignment will throw an ArrayTypeMismatchException. Although the compiler was fooled into thinking this was valid code because an object is being assigned to an element of an array of object, at run time there is always a type check to guarantee that the runtime element type of the array is greater than or equal to the type of the instance being assigned. In the above example, because the runtime type of the array is array of string, the first assignment is valid because string ≥ string and the second is invalid because string < object.

This leads to the conclusion that, although arrays have always been covariant in relation to the type of the elements, they are not safely covariant – code that compiles is not guaranteed to run without errors.
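The run-time check can be seen with a small helper. ArrayCovarianceDemo and StoreFails are hypothetical names for this sketch.

```csharp
using System;

public static class ArrayCovarianceDemo
{
    // Returns true when storing value into the array fails the
    // run-time element type check.
    public static bool StoreFails(object[] array, object value)
    {
        try
        {
            array[0] = value;
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }
}
```

With object[] objectArray = new string[] { "string 1" }, storing another string succeeds, but storing new object() fails even though the code compiles.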

In C#, variance is enforced in the declaration of the type and not determined by the usage of each generic type parameter.

Covariance in relation to a particular generic type parameter is enforced using the out generic modifier:

public interface IEnumerable<out T>
{
    IEnumerator<T> GetEnumerator();
}

public interface IEnumerator<out T>
{
    T Current { get; }
    bool MoveNext();
}

Notice the convenient use of the pre-existing out keyword. Besides the benefit of not having to remember a new hypothetical covariant keyword, out is easier to remember because it defines that the generic type parameter can only appear in output positions: read-only properties and method return values.

In a similar way, contravariance in relation to a particular generic type parameter is enforced using the in generic modifier:

public interface IComparer<in T>
{
    int Compare(T x, T y);
}

Once again, the use of the pre-existing in keyword makes it easier to remember that the generic type parameter can only be used in input positions: write-only properties and method parameters that are neither ref nor out.

A generic type parameter that is not marked covariant (out) or contravariant (in) is invariant.

Because covariance and contravariance apply to the relation between a generic type definition and a particular generic type parameter, a generic type definition can be covariant, contravariant and invariant at the same time, depending on the generic type parameter.

public delegate TResult Func<in T, out TResult>(T arg);

In the above delegate definition, Func<T, TResult> is contravariant in T and covariant in TResult.
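A short sketch of both directions at once, using the Base and Derived classes from the definitions above. FuncVarianceDemo and its members are hypothetical names.

```csharp
using System;

public class Base { }
public class Derived : Base { }

public static class FuncVarianceDemo
{
    // Accepts the most general input and returns the most specific output.
    public static Derived Make(Base input) { return new Derived(); }

    public static Func<Derived, Base> AsLessSpecific()
    {
        Func<Base, Derived> specific = Make;

        // Contravariant in T: a function taking Base can stand in for one taking Derived.
        // Covariant in TResult: a function returning Derived can stand in for one returning Base.
        Func<Derived, Base> general = specific;
        return general;
    }
}
```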

All the types in the .NET Framework where variance could be applied to their generic type parameters have been modified to take advantage of this new feature.

In summary, the rules for variance in C# (and .NET) are:

  • Variance in relation to generic type parameters is restricted to generic interface and generic delegate type definitions.
  • A generic interface or generic delegate type definition can be covariant, contravariant or invariant in relation to different generic type parameters.
  • Variance applies only to reference types: an IEnumerable<int> is not an IEnumerable<object>.
  • Variance does not apply to delegate combination. That is, given two delegates of types Action<Derived> and Action<Base>, you cannot combine the second delegate with the first although the result would be type safe. Variance allows the second delegate to be assigned to a variable of type Action<Derived>, but delegates can combine only if their types match exactly.
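The last two rules can be checked with a few lines of code. This is a sketch with hypothetical names (VarianceRulesDemo and its methods); it uses string and object in place of Derived and Base.

```csharp
using System;
using System.Collections.Generic;

public static class VarianceRulesDemo
{
    // Reference type argument: IEnumerable<string> is an IEnumerable<object>.
    public static bool ReferenceTypesVary()
    {
        return typeof(IEnumerable<object>).IsAssignableFrom(typeof(IEnumerable<string>));
    }

    // Value type argument: IEnumerable<int> is not an IEnumerable<object>.
    public static bool ValueTypesVary()
    {
        return typeof(IEnumerable<object>).IsAssignableFrom(typeof(IEnumerable<int>));
    }

    // Combining delegates whose run-time types differ fails, even though
    // the assignment to Action<string> is allowed by contravariance.
    public static bool CombineFails()
    {
        Action<string> first = s => { };
        Action<object> second = o => { };
        Action<string> secondAsString = second; // allowed by variance

        try
        {
            Action<string> both = first + secondAsString;
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```

If the delegates really need to be combined, wrapping the second one in a delegate of the exact type, as in new Action<string>(second), gives both operands the same run-time type.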

Note: Because variance is a feature of .NET 4.0 and not only of C# 4.0, all this also applies to Visual Basic 10.

The Evolution Of C#

The first release of C# (C# 1.0) was all about building a new language for managed code that appealed, mostly, to C++ and Java programmers.

The second release (C# 2.0) was mostly about adding what there wasn’t time to build into the 1.0 release. The main feature of this release was Generics.

The third release (C# 3.0) was all about reducing the impedance mismatch between general purpose programming languages and databases. To achieve this goal, several functional programming features were added to the language and LINQ was born.

Going forward, new trends are showing up in the industry and modern programming languages need to be more:

  • Declarative

    With imperative languages, even with an eye on the what, programs need to focus on the how. This leads to over-specification of the solution to the problem at hand, making it next to impossible for the execution engine to be smart about the execution of the program and optimize it to run more efficiently (given the hardware available, for example).

    Declarative languages, on the other hand, focus only on the what and leave the how to the execution engine. LINQ made C# more declarative by using higher level constructs like orderby and group by that give the execution engine a much better chance of optimizing the execution (by parallelizing it, for example).

  • Concurrent

    Concurrency is hard and needs to be thought about and it’s very hard to shoehorn it into a programming language. Parallel.For (from the parallel extensions) looks like a parallel for because enough expressiveness has been built into C# 3.0 to allow this without having to commit to specific language syntax.

  • Dynamic

    There has been lots of debate on which are the better programming languages: static or dynamic. The fact is that both have good qualities and users of both types of languages want to have it all.

All these trends require a paradigm switch. C# is, in many ways, already a multi-paradigm language. It’s still very object oriented (class oriented as some might say) but it can be argued that C# 3.0 has become a functional programming language because it has all the cornerstones of what a functional programming language needs. Moving forward, it will have even more.

Besides the influence of these trends, there was a decision of co-evolution of the C# and Visual Basic programming languages. Since its inception, there has been some effort to position C# and Visual Basic against each other and to try to explain what should be done with each language or what kind of programmers use one or the other. Each language should be chosen based on the past experience and familiarity of the developer/team/project/company and not by particular features.

In the past, every time a feature was added to one language, the users of the other wanted that feature too. Going forward, when a feature is added to one language, the other will work hard to add the same feature. This doesn’t mean that XML literals will be added to C# (because almost the same can be achieved with LINQ To XML), but Visual Basic will have auto-implemented properties.

Most of these features require or are built on top of features of the .NET Framework, and the focus for C# 4.0 was on dynamic programming. Not just dynamic types but being able to talk with anything that isn’t a .NET class.

Also introduced in C# 4.0 is co-variance and contra-variance for generic interfaces and delegates.

Stay tuned for more on the new C# 4.0 features.

LINQ: Enhancing Distinct With The SelectorEqualityComparer

LINQ With C# (Portuguese)

In my last post, I introduced the PredicateEqualityComparer and a Distinct extension method that receives a predicate to internally create a PredicateEqualityComparer to filter elements.

Using the predicate greatly improves readability, conciseness and expressiveness of the queries, but it can be even better. Most of the time, we don’t want to provide a comparison method but just to extract the comparison key for the elements.

So, I developed a SelectorEqualityComparer that takes a method that extracts the key value for each element. Something like this:

public class SelectorEqualityComparer<TSource, Tkey> : EqualityComparer<TSource>
    where Tkey : IEquatable<Tkey>
{
    private Func<TSource, Tkey> selector;

    public SelectorEqualityComparer(Func<TSource, Tkey> selector)
        : base()
    {
        this.selector = selector;
    }

    public override bool Equals(TSource x, TSource y)
    {
        Tkey xKey = this.GetKey(x);
        Tkey yKey = this.GetKey(y);

        if (xKey != null)
        {
            return ((yKey != null) && xKey.Equals(yKey));
        }

        return (yKey == null);
    }

    public override int GetHashCode(TSource obj)
    {
        Tkey key = this.GetKey(obj);

        return (key == null) ? 0 : key.GetHashCode();
    }

    public override bool Equals(object obj)
    {
        SelectorEqualityComparer<TSource, Tkey> comparer = obj as SelectorEqualityComparer<TSource, Tkey>;
        return (comparer != null);
    }

    public override int GetHashCode()
    {
        return base.GetType().Name.GetHashCode();
    }

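    private Tkey GetKey(TSource obj)
    {
        // default(Tkey) is null for reference-type keys and avoids the
        // NullReferenceException that unboxing null throws for value-type keys.
        return (obj == null) ? default(Tkey) : this.selector(obj);
    }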
}

Now I can write code like this:

.Distinct(new SelectorEqualityComparer<Source, Key>(x => x.Field))

And, for improved readability, conciseness and expressiveness, and for support for anonymous types, here is the corresponding Distinct extension method:

public static IEnumerable<TSource> Distinct<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector)
    where TKey : IEquatable<TKey>
{
    return source.Distinct(new SelectorEqualityComparer<TSource, TKey>(selector));
}

And the query is now written like this:

.Distinct(x => x.Field)

For most usages, it’s simpler than using a predicate.
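To see key-based distinct on some data, here is a compact, self-contained variant. It is not the comparer above: KeyEqualityComparer and KeyDistinctDemo are hypothetical names, the IEquatable constraint is dropped, and key comparison is delegated to EqualityComparer<TKey>.Default. It also assumes the source contains no null elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Compact variant of a selector-based comparer. Keys are compared with
// EqualityComparer<TKey>.Default, which also copes with null keys.
public sealed class KeyEqualityComparer<TSource, TKey> : IEqualityComparer<TSource>
{
    private readonly Func<TSource, TKey> selector;

    public KeyEqualityComparer(Func<TSource, TKey> selector)
    {
        if (selector == null) throw new ArgumentNullException("selector");
        this.selector = selector;
    }

    public bool Equals(TSource x, TSource y)
    {
        return EqualityComparer<TKey>.Default.Equals(selector(x), selector(y));
    }

    public int GetHashCode(TSource obj)
    {
        TKey key = selector(obj);
        return key == null ? 0 : EqualityComparer<TKey>.Default.GetHashCode(key);
    }
}

public static class KeyDistinctDemo
{
    // Keeps one word per distinct length.
    public static List<string> OnePerLength(IEnumerable<string> words)
    {
        return words.Distinct(new KeyEqualityComparer<string, int>(w => w.Length)).ToList();
    }
}
```

For the input "a", "bb", "c", "dd", "eee" this keeps three words, one for each of the lengths 1, 2 and 3. Because the hash code comes from the key, Distinct only has to compare elements whose keys hash alike.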

LINQ: Enhancing Distinct With The PredicateEqualityComparer

Today I was writing a LINQ query and I needed to select distinct values based on a comparison criterion.

Fortunately, LINQ’s Distinct method allows an equality comparer to be supplied but, unfortunately, this sometimes means having to write a custom equality comparer.

Because I was going to need more than one equality comparer for this set of tools I was building, I decided to build a generic equality comparer that would just take a custom predicate. Something like this:

public class PredicateEqualityComparer<T> : EqualityComparer<T>
{
    private Func<T, T, bool> predicate;

    public PredicateEqualityComparer(Func<T, T, bool> predicate)
        : base()
    {
        this.predicate = predicate;
    }

    public override bool Equals(T x, T y)
    {
        if (x != null)
        {
            return ((y != null) && this.predicate(x, y));
        }

        if (y != null)
        {
            return false;
        }

        return true;
    }

    public override int GetHashCode(T obj)
    {
        // Always return the same value to force the call to IEqualityComparer<T>.Equals
        return 0;
    }
}

Now I can write code like this:

.Distinct(new PredicateEqualityComparer<Item>((x, y) => x.Field == y.Field))

But I felt that I’d lost all conciseness and expressiveness of LINQ and it doesn’t support anonymous types. So I came up with another Distinct extension method:

public static IEnumerable<TSource> Distinct<TSource>(this IEnumerable<TSource> source, Func<TSource, TSource, bool> predicate)
{
    return source.Distinct(new PredicateEqualityComparer<TSource>(predicate));
}

And the query is now written like this:

.Distinct((x, y) => x.Field == y.Field)

Looks a lot better, doesn’t it? And it works with anonymous types.
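Here is a self-contained sketch of the anonymous type case. To keep it from clashing with the code above, it uses hypothetical names (LambdaEqualityComparer, DistinctWhere, PredicateDistinctDemo) and sample data I made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same idea as the predicate-based comparer above, under a different name.
public sealed class LambdaEqualityComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> predicate;

    public LambdaEqualityComparer(Func<T, T, bool> predicate)
    {
        this.predicate = predicate;
    }

    public bool Equals(T x, T y)
    {
        if (x == null || y == null) return x == null && y == null;
        return predicate(x, y);
    }

    // Constant hash code: every pair of elements goes through Equals.
    public int GetHashCode(T obj) { return 0; }
}

public static class PredicateDistinctExtensions
{
    public static IEnumerable<T> DistinctWhere<T>(this IEnumerable<T> source, Func<T, T, bool> predicate)
    {
        return Enumerable.Distinct(source, new LambdaEqualityComparer<T>(predicate));
    }
}

public static class PredicateDistinctDemo
{
    public static int CountDistinctCities()
    {
        // The element type has no name, so the comparer type can only be inferred.
        var people = new[]
        {
            new { Name = "Ana", City = "Lisboa" },
            new { Name = "Rui", City = "Porto" },
            new { Name = "Eva", City = "Lisboa" },
        };

        return people.DistinctWhere((x, y) => x.City == y.City).Count();
    }
}
```

The extension method is what makes this possible: the generic type argument is inferred from the sequence, which is the only way to name an anonymous type. The price of the constant hash code is that the predicate runs for every candidate pair, so it is best kept for small sequences.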

Update: I had accidentally published the wrong version of the IEqualityComparer<T>.Equals method.

LINQ: Single vs. SingleOrDefault

Like other LINQ API methods that extract a scalar value from a sequence, Single has a companion SingleOrDefault.

The documentation of SingleOrDefault states that it returns a single, specific element of a sequence of values, or a default value if no such element is found, although, in my opinion, it should state that it returns THE single, specific element of a sequence of ONE value, or a default value if no such element is found. Nevertheless, what this method does is return the default value of the source type if the sequence is empty or, like Single, throws an exception if the sequence has more than one element.

I received several comments on my last post saying that SingleOrDefault could be used to avoid an exception.

Well, it only “solves” half of the “problem”. If the sequence has more than one element, an exception will be thrown anyway.

In the end, it all comes down to semantics and intent. If it is expected that the sequence may have none or one element, then SingleOrDefault should be used. If the sequence is not expected to be empty and it is empty, then it’s an exceptional situation and an exception should be thrown right there. And, in that case, why not use Single instead? In my opinion, when a failure occurs, it’s better to fail fast and early than slow and late.

Other methods in the LINQ API that use the same companion pattern are: ElementAt/ElementAtOrDefault, First/FirstOrDefault and Last/LastOrDefault.
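The three cases of SingleOrDefault can be sketched like this. SingleOrDefaultDemo and Describe are hypothetical names.

```csharp
using System;
using System.Linq;

public static class SingleOrDefaultDemo
{
    public static string Describe(int[] values)
    {
        try
        {
            // Empty sequence: default(int), which is 0.
            // Exactly one element: that element.
            int value = values.SingleOrDefault();
            return "value " + value;
        }
        catch (InvalidOperationException)
        {
            // More than one element still throws, just like Single.
            return "more than one element";
        }
    }
}
```

Note that, for a value type like int, an empty sequence and a sequence holding a single 0 give the same answer, which is one more reason to choose between Single and SingleOrDefault by intent.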

LINQ: Single vs. First

I’ve witnessed and been involved in several discussions around the correctness or usefulness of the Single method in the LINQ API.

The most common argument is that you are querying for the first element on the result set and an exception will be thrown if there’s more than one element. The First method should be used instead, because it doesn’t throw if the result set has more than one item.

Although the documentation for Single states that it returns a single, specific element of a sequence of values, it actually returns THE single, specific element of a sequence of ONE value. When you use the Single method in your code you are asserting that your query will result in a scalar result instead of a result set of arbitrary length.

On the other hand, the documentation for First states that it returns the first element of a sequence of arbitrary length.

Imagine you want to catch a taxi. You go to the taxi line and catch the FIRST one, no matter how many are there.

On the other hand, if you go to the parking lot to get your car, you want the SINGLE one specific car that’s yours. If your “query” “returns” more than one car, it’s an exception. Either because it “returned” not only your car or you happen to have more than one car in that parking lot. In either case, you can only drive one car at once and you’ll need to refine your “query”.
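The taxi and the parking lot, as a sketch. The names and the "owner:car" string format are made up for this example.

```csharp
using System;
using System.Linq;

public static class SingleVersusFirstDemo
{
    // Any taxi will do: take the first one, however many are waiting.
    public static string FirstTaxi(string[] taxis)
    {
        return taxis.First();
    }

    // Exactly one car is expected to belong to the owner. Single throws
    // InvalidOperationException if there is none or more than one.
    public static string MyCar(string[] parkedCars, string owner)
    {
        return parkedCars.Single(car => car.StartsWith(owner + ":", StringComparison.Ordinal));
    }
}
```

Calling MyCar with two cars for the same owner throws, which is exactly the assertion Single is there to make.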