Algorithm: Find All Unique Combinations in a List

My son, Benson Joeris, just showed me a rather cool algorithm that I thought I’d share. I’m really, really lucky to have an algorithms genius on speed dial. I thought this solution was cool because I tend to think of active solutions to problems, not mapping them onto a structure – IOW, this represented a very different approach to the algorithm than I think I would have arrived at.

The goal: Given any list

A, B, C, …

I want to find all combinations, with no duplicates (assuming there are no duplicates in the original list).

<emptySet>

A

B

C

A,B

A,C

A,B,C

I thought of a few rather bizarre approaches before sending Benson an email…

It sounds like you want to enumerate subsets of {1,…,N}. There are 2^N of these, and they can fairly easily be encoded as the integers 1,…,(2^N)-1, by thinking about the numbers in binary. If the least significant bit is 1, then string 1 is included, and if it is 0, string 1 is not included. The next least significant bit encodes whether string 2 is included, and so on. So, with a little bit manipulation, you can decode an integer into the string arrays you are looking for, and then just have to loop over 1,…,(2^N)-1 (or 0,…,(2^N)-1, if you want to include the empty set of strings)

I wrote this up

.csharpcode, .csharpcode pre
{
font-size: small;
color: black;
font-family: consolas, “Courier New”, courier, monospace;
background-color: #ffffff;
/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt
{
background-color: #f4f4f4;
width: 100%;
margin: 0em;
}
.csharpcode .lnum { color: #606060; }in C# as

`public static List<List<T>> GetAllCombos<T>(this List<T> initialList){    var ret = new List<List<T>>();    // The final number of sets will be 2^N (or 2^N - 1 if skipping empty set)    int setCount = Convert.ToInt32(Math.Pow(2, initialList.Count()));    // Start at 1 if you do not want the empty set    for (int mask = 0; mask < setCount; mask++)    {        var nestedList = new List<T>();        for (int j = 0; j < initialList.Count(); j++)        {            // Each position in the initial list maps to a bit here            var pos = 1 << j;            if ((mask & pos) == pos) { nestedList.Add(initialList[j]); }        }        ret.Add(nestedList);    }    return ret;}    `

This will contain exactly the set I was looking for!

If you’d like me to do a post explaining why this works, let me know.

[ Edit: This was requested and appears here. ]

Visual Studio Support for Creating Your RegEx strings

Bells and Whistles for the new feature?

No, this trick works in all versions of Visual Studio. Well, at least all I can remember. But it is much more useful since VS switched to match the RegEx used in .NET in VS 2012.

You’re about to find out that I’m not really a RegEx genius. I can’t remember those details any better than you can – I’m just better at knowing where the easy to access hints are:

• Create some sample data and open it in Visual Studio
• Go to the old Find Dialog and RegEx in Find Options
• Select Document scope from Look In combo
• Push the button next to the Find What textbox
• This popup reminds you of everything you always forget
• Want more? Click RegEx help button for non-concise version
• Push “Find in Files,” even though that feels illogical
• All your matches will be found
• Since the format sucks, fix it like this (both find and replace)

This gets you the line number for your match, which is often enough.

If you want to get the actual value, you need a little trick. My Replace result format, in my registry, is

\$l\t\$c\t\$f\$e:\t\$0\t\$s\r\n

I can get the actual values, all of them on each line, by doing a replace with an exact match of the search. You get that in VS Replace RegEx with \$0. (If you know RegEx, VS captures the whole string as the first capture).)

Here’s a screen shot of the dialog (while that looks like an ugly RegEx, it’s just what you get selecting symbol match from the dropdown, I didn’t type anything).

And if I run this on a file that just contains

public _whatever123? test

The result is

This isn’t very intelligent RegEx for code, you’ll write something clever that makes sense with your data set.

Once you’ve got the RegEx string you want, copy it into your code and the call to RegEx, where it belongs.

It’s a little trickier to avoid messing up your data set if you are testing captures (Stop reading now if you have no idea what a “capture” means!)

Since it’s just sample data, I just do a series of replacement (massively messing the data file) and Ctl-Z for Undo. I delimit the captures, a dash doesn’t require escaping, and end the set of capture outputs with another delimiter (I use a set of asterisks) to differentiate the captures from the result string.

Using

• Input File (one line):        public _whatever123? Test
• Search For (2 captures): (public) (\b(_\w+|[\w-[0-9_]]\w*)\b)
• Replace With:                -\$1- -\$2- ***
• Resulting File (Ctl-Z to undo): -public- -_whatever123- ***? test
• And the resulting output is:

If you do a lot of RegEx, it’s absolutely worth your time to become familiar with third party RegEx tools.

But if you already know Visual Studio and you just want to stick an occasional RegEx string into your .NET code, or your debugging RegEx, or your mystified about why a RegEx statement looks like it does, use the Find and Replace dialogs in Visual Studio to make life much nicer!

A Baker’s Dozen of Dots to Connect

I am so excited about a future release of .NET and Visual Studio!

I went to the MVP Summit and I’m cool with the fact that I’m under NDA until anything new that might be happening is discussed publically by Microsoft.

But, I also really understand the confusion and frustration across our segment of the industry. Microsoft going dark has often meant a product was in trouble. So, are .NET, Visual Studio and C#/VB in trouble?

The simple answer is – no, there are not in trouble. But a future release is a complex and tricky release. Below are the dots – you connect them. We MVPs are so respectful of NDA that we often don’t say what we can say because we aren’t 100% sure that we can say it. These dots aren’t NDA.

If you’re in the “I trust Microsoft camp,” that’s great. But the reason you can trust Microsoft is that there are a few thousand people under NDA with Microsoft or pushing the boundaries in Open Source and other companies that nudge Microsoft back to the straight and narrow. Imagine a gauntlet that the tools you use to make your living have to go through.

I’ve been one of the voices in that gauntlet for a long time, and I’m incredibly passionate about the future of .NET/Visual Studio. I do try to be polite, but, well, I did go off in the hallway on one unsuspecting Softie when he told me that his team (which is not one I watch closely) was doing something incredibly stupid – sorry about that and I hope his boss explained me to him.

Anyway, I said I’d give you dots.

The first dot is that you really do care. It’s your future.

The second dot is that there are thousands of really smart people helping Microsoft and hundreds within Microsoft that have been working on a future version for a couple of years. There’s a reason I used the hash tag #teamsworkinghardbeingsmart.

The third dot is that Microsoft is working on new compilers for C# and VB. The only thing that should terrify you more than new compilers is the state of the current compilers. They were started about 15 years ago – before .NET, before agile, before common unit testing, before modern decoupling strategies, before the current folks were on the teams, before many of them graduated high school… Sure the old compilers work, sure there are millions of tests, but what do you think changing those compilers involves?

The fourth dot is that the new compilers can’t be significantly slower. Who do you guess has some of the largest .NET programs on the planet? And does Microsoft care about that? Increasing compile times significantly isn’t an option.

The fifth dot is the new compilers can’t break legacy code. Or at least not much and only when we’re warned. Fail this, and nothing else matters.

The sixth dot is that compilers now drive editors via compiler services. If customers hate that version Visual Studio, they won’t use the new compilers.

The seventh dot is that Microsoft is doing something brand new with compilers. I know it sounds geeky and like a big yawn to say that the syntactic trees and semantic trees will be visible and probably mutable (although there is no reason they need to be mutable in the first version). But this changes everything. Assuming we get a good way to communicate between your code and the compilers (a different post topic) this opens the door to removing almost every stupid aspect of code you write today. INotifyPropertyChanged is the poster-child, the tip of the iceberg, and I think the least interesting of what you are bound to see.

The eighth dot is that putting new compilers in place and pulling it all together is really hard. The compiler API has to come before the compiler, before Visual Studio and language changes. But surely they will screw up something in the API that they’ll find as they build the compiler and then they have to change the API, which also affects language services. And surely creating languages services for Visual Studio will uncover things that could be done better in the compiler or services API. If something goes wrong the whole house of cards falls down.

The ninth dot is that a good compiler, an open compiler API, and better language services immediately opens new doors. I’ll tell you in another post what I’m doing with Roslyn right now (still deciding whether to discuss with old API or wait for the next API release to talk about my own stuff). On the languages front, see this post on Mads Torgersen’s NDC talk. On other fronts, what can Visual Studio, Open Source and third parties build with compiler services not tacked on as a way to make VB do what it used to do, and C# catch up with VB and Eclipse?

The tenth dot is that other teams are also #teamsworkinghardbeingsmart. I think the right topics are on the table – diagnostics, programmer productivity, security, unification, quality, modern platforms… Beyond languages and Visual Studio, I know most about the .NET platform, which I watch with a telescope (detail level) and ASP.NET, which I watch with a fish-eye lens (overview). RyuJIT and channel support for ETW are both announced, and expect the current flow of improvements from the .NET internals teams to continue. These teams are so fantastic that you rarely think about their existence – but trust me, I hugged Immo at the summit for good reason (the hug was for both the .NET core teams). And on the web side? Well those teams love OSS, so you can watch their interest in things like OWin and lighter weight server side solutions and saner JavaScript. I suspect the teams I don’t watch are also #teamsworkinghardbeingsmart.

The eleventh dot is that Microsoft needs to tell you a comprehensible story. Since Mads has announced some stuff on languages, I’ll stick to that space. Dribbling out monadic null checking and static type using statements would just muddy the water about what’s important. The compilers, while they won’t affect your code, are the big story on the languages front. But the rest of this stuff? That looks like nice little additions to the language. When you see a coherent story, you’ll be able to mentally do your “I care, I don’t care” checklist. Of course I want Microsoft to release information faster so I can talk about it, but I understand why they want to make it sane for all of us.

The twelfth dot you’re looking for is “When will this happen?” I’ve carefully avoided the words “next” and “VNext” because Microsoft has carefully avoided committing to anything I’m talking about here being in the next version. But I was sent this as a verbatim copy of one of Mads’ NDC slides “The whole C# and VB team is working on it, VS itself is compiled with the Roslyn compiler, Everyone on the team uses the Roslyn IDE.”

The thirteenth dot is that I’m happy. I’m excited about a future release of the entire .NET/Visual Studio family. Read into my happiness all you want and I think there will be boatloads to talk about soon.