What Code Comments are Not For.

I deal a lot with other people’s and legacy code.  One of the things I see very often is what I call "misuse of code comments".  Let me provide an exhaustive list of what code comments are for:

  • Describing why code isn’t doing something obvious

There, done.

What code comments are not for (not complete):

The Obvious

// set the value of i
i = value;

It’s obvious that the code is doing that; the comment adds no value here.  The compiler provides no validation that the "i" in the comment applies to any variable in the line it annotates; it doesn’t even know which line the comment applies to.  Comments like this actually introduce technical debt: I can refactor this code and move it around independently of the comment, and the comment would then appear to annotate some other line of code.  Refactoring tools help somewhat with this, but they only really do name matching in comments when you rename a variable.  Do you think renaming "i" to "count" really means replacing all "i"s in all comments with "count"?  Probably not; don’t use refactoring tools as a crutch.

Helping the Reader Learn the Language

You can’t possibly know what the reader of your code does and does not know.  This is especially true of language syntax.  The language syntax is your common language; you can only assume they know it, and you can’t possibly know whether a comment about language syntax is going to help a particular reader.  If they don’t know the syntax, they should look it up.  Comments that describe syntax are a no-win proposition: you can’t write a comment that helps every reader of the code.

An example:
        /// <summary>
        /// Constructor for class.
        /// </summary>
        public MyClass()

If your reader doesn’t know this is a constructor, they probably don’t even know what a c’tor is—this comment isn’t going to help them much. 

Slightly different example:

/// <summary>
/// Initializes a new instance of the MyClass class
/// </summary>
public MyClass()

If the reader doesn’t know what a c’tor does, does the rest of your code include comments that will help that reader?  These comments are a waste of time and add no value.  They carry the same technical debt as the Obvious: it’s not a syntax error to separate the comment from the declaration, so there is a risk they will become disconnected or out of sync.  If the comment has no value, having to manage it also has no value and simply adds work to the project.

Another example verging on Obvious:
public MyClass()
{
  // Empty
}

As this stands, it seems benign. But one, it should be Obvious.  Two, if it’s not, the reader should be brushing up on language syntax.  Three, it’s not verified: I can edit this c’tor to make it do something else and it remains perfectly syntactically correct:

public MyClass()
{
  x = 42;
  // Empty
}

Now the comment is meaningless and potentially confusing.  Reading this for the first time makes you wonder: did the class just have // Empty in it in the past and x = 42 was added later?  Does "empty" mean something different to the author?  Or did the author suffer a stroke and possibly need medical attention?

You should assume the reader of your code doesn’t know anything about the code itself.  If the language can’t express the concepts in the code properly, then comment on why the language isn’t describing those concepts.  (If the language can express them, you should be writing it that way; if you choose not to, comment on why.)

WHY not HOW

Writing comments to aid the reader’s understanding of the language is really describing HOW the language works.  That’s not describing the code; it’s describing the language.  Comments should describe WHY the code was written that way, if it’s not obvious.  Again, the language is the common denominator between the reader and the author.  There are many references the reader can use to learn the language; let them do that.  You may not be the best person to help someone learn the language; at the very least you don’t know the degree to which they don’t know it.  Let the code clearly describe HOW.
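
To make the distinction concrete, here’s a hypothetical example (TrySendRequest and the retry reason are invented purely for illustration).  The first comment restates HOW, which the code already says; the second explains a WHY the code can’t express:

// HOW (no value): try up to 3 times
for (int attempt = 0; attempt < 3; ++attempt)
{
    if (TrySendRequest())
        break;
}

// WHY (valuable): the vendor's gateway silently drops the first request
// after an idle period, so retry a couple of times before failing.
for (int attempt = 0; attempt < 3; ++attempt)
{
    if (TrySendRequest())
        break;
}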

Use of comments is often a form of religion; people are very opinionated about them in one way or another.   Robert Martin pulls no punches in Clean Code by saying:

“The proper use of comments is to compensate for our failure to express ourself in code. Note that I used the word failure. I meant it. Comments are always failures.”

Martin has previously described comments as “apologies” for “making the code unmaintainable”…

If you want to use concepts like XMLDOC or tools like JavaDoc to document members and classes, that’s fine; just make sure your comments are meaningful and can stand alone.

For what it’s worth, these are comments that I have encountered; more than once and on more than one project.

Do you have some code comment pet peeves?  We’d love to hear them!


Avoid the volatile Modifier

[Update: 25-Jan-12 5:45 pm; fixed typo]

I was reminded recently of the misconceptions around the volatile modifier in C#, and I thought I’d pass along the recommendations of others that are tantamount to “avoid the volatile modifier”.  The volatile modifier in C# “indicates that a field might be modified by multiple threads that are executing at the same time” [1].

The first problem is that documentation.  What does that really mean to someone developing code that uses multiple threads?  Does it make the code “thread-safe”?  The answer is “maybe”; the real answer is “rarely”.  Most people just stick volatile on a field because they think that’s what they need to do.

What the volatile modifier actually does to a field in C# is make all reads of the field use “acquire semantics” and all writes use “release semantics”.  Much clearer, right?  Acquire semantics means the access is “guaranteed to occur prior to any references to memory that occur after it in the instruction sequence”, and release semantics means the access is “guaranteed to happen after any memory references prior to the write instruction in the instruction sequence” [6,2].

One of the problems with modern compilers [3], processors, and multithreaded code is optimization.  Within a block of code with no externally visible side effects, the compiler is free to re-order or remove instructions as long as the visible side effects stay the same.  “x+=1;x+=1;” can be freely optimized to “x+=2;”, and “x+=1;y+=2;” can be reordered so that the order of execution is effectively “y+=2;x+=1;”, for example.  The processor doesn’t optimize multiple instructions into one; but it can re-order instructions and make the results of executed instructions visible to other cores/processors well after the instructions were executed (processor caching).  With fields, the compiler [3] has less freedom to optimize because side effects to fields have more visibility—but the processor doesn’t discern between C# fields and any other bit of memory.

So, what does volatile really do for a field?  Realistically it prevents processor re-ordering and caching.  It tells the processor that all accesses to the value of that variable should come from or be made directly to memory, not the cache.

Not having a value cached by the processor so all other processors (and thus all other threads) can see what happens to the value of a variable seems like a really good thing though, doesn’t it?  In that respect, yes, it is a good thing.  But, let’s look at some of the drawbacks of volatile.

The syntax of volatile is that it just annotates a memory location; in reality it modifies the operations that occur on that memory location.  Code that operates on that location looks the same whether or not it’s modified by volatile; i.e. it’s unclear that the code has different side effects in multithreaded scenarios.  Another problem is that “volatile” is a very overloaded word.  It’s used in C++ and Java, but it means something (sometimes slightly) different in each.  Moving back and forth between C++, C# and Java could lead to using volatile incorrectly.  Volatile also assumes that all accesses to the field need to be protected from re-ordering.  Most of the time there are very specific points at which you want to force side effects to a field to be made “visible” and, effectively, flush the processor’s cache to/from memory.  Flushing the cache on every access is not very performant if you don’t need the side effects truly visible until specific times.  Plus, the volatile modifier means nothing if you pass the field by reference somewhere else [4,5].  There are also limitations on what you can apply volatile to (e.g. you can’t have a volatile long).
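
A quick sketch of what this looks like in code (the class and member names here are invented for illustration):

class Worker
{
    private volatile bool m_stopRequested;    // reads get acquire semantics, writes get release semantics
    // private volatile long m_counter;       // compile error: volatile can't be applied to long

    public void Run()
    {
        // Looks like any other field access; nothing at the usage site says "volatile",
        // and every single read pays the cost whether it needs the semantics or not.
        while (!m_stopRequested)
        {
            DoWork();
        }
    }

    public void Stop()
    {
        m_stopRequested = true;               // write made visible to the loop running on another thread
    }

    private void DoWork() { }
}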

What does volatile really mean from a code perspective?  Well, when you read from a volatile field effectively the following code is executed:

Thread.VolatileRead(ref field);

And when you write to a volatile field effectively the following code is executed:

Thread.VolatileWrite(ref field, newValue);

Both of these methods really just translate to a memory access and a call to Thread.MemoryBarrier (before the memory access for a read, and after it for a write).  What MemoryBarrier does is ensure that all cached writes are flushed to memory at the call to MemoryBarrier (i.e. after MemoryBarrier, any cached writes that occurred prior to MemoryBarrier are flushed and any cached reads are abandoned).  As you may have noticed, there’s nothing field-specific about Volatile[Read|Write]; it flushes everything, not just the field in question.  The performance aspect of the volatile modifier comes into play when you have multiple volatile fields within a class: really only the last one needs VolatileWrite when writing and only the first one needs VolatileRead when reading.  To steal some code from Jeffrey Richter:

m_value = 5;
Thread.VolatileWrite(ref m_flag, 1);

is just as thread-safe as:

m_value = 5;
m_flag = 1;

if m_value and m_flag were volatile.

Same holds true for VolatileRead:

if(Thread.VolatileRead(ref m_flag) == 1)
	Display(m_value);

is just as thread-safe as the following with volatile fields:

if(m_flag == 1)
	Display(m_value);

I feel compelled to bring up invariants here.  Sometimes you’re dealing with modifications or accesses of multiple things that can’t be done atomically; i.e. the “validity” of m_value depends on the value of m_flag, making the two fields an “invariant” that cannot be accessed by a single instruction.  While the above example ensures that changes to m_flag and m_value are made visible to other threads “at the same time”, it doesn’t do anything to stop another thread from accessing m_value before m_flag has been updated.  This may or may not be “correct”.  If it’s not correct, using lock or Monitor to model atomic access to such an “invariant” that isn’t natively atomic is a better choice.

On the topic of lock/Monitor: it’s important to note that the end of a lock block or the call to Monitor.Exit has release semantics, and the start of the lock block or Monitor.Enter (and variants) has acquire semantics.  So, if all access to a field is guarded with lock or Monitor, there’s no need for volatile or Thread.Volatile[Read|Write].
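
For example, here’s a minimal sketch of the same m_flag/m_value pair guarded by lock instead of volatile (Publish, Consume and m_lock are invented names, and Display is the same call as in the example above; the point is that every access to both fields goes through the same lock):

private readonly object m_lock = new object();
private int m_value;
private int m_flag;

public void Publish()
{
    lock (m_lock)               // Monitor.Enter: acquire semantics
    {
        m_value = 5;
        m_flag = 1;
    }                           // Monitor.Exit: release semantics
}

public void Consume()
{
    lock (m_lock)               // no thread can observe m_value without the matching m_flag
    {
        if (m_flag == 1)
            Display(m_value);
    }
}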

Using Monitor, lock, volatile, VolatileRead and VolatileWrite correctly shows that you understand what it means to be thread-safe, that you understand what it means for side effects to be externally visible, and when that matters in the context of your fields and invariants.

There have apparently been some discussions about the usefulness and applicability of volatile in C# as well as VB (and, I assume, C++11); but I was not privy to those discussions and haven’t been able to find much in the way of reference to them other than third-hand information…  Needless to say, VB doesn’t have anything similar to “volatile”, apparently for a reason.

I’m not saying there aren’t perfectly valid scenarios for volatile; but look carefully at what you need; you could probably make better use of VolatileRead or VolatileWrite.  Just understand your needs and use what is correct—code on purpose.

[1] volatile (C# Reference)

[2] Acquire and Release Semantics

[3] CSC and JIT compilers.

[4] Sayonara volatile

[5] Aug 2011 Getting Started with Threading webcast with Jeffrey Richter (45MB)

[6] C# Language Specification (ECMA-334)

C#, Async, Limits, oh my!

One of the great sessions at Codemash was a dual-speaker session with Bill Wagner and Jon Skeet—Async from the Inside.

In that session Bill and Jon describe (in great detail) the state machine that the C# compiler generates when it compiles async code involving the await keyword.  When the Async CTP was released this state machine was one of the first things I noticed when I was reflecting through some generated code.  I too noticed the type of the state variable (int) and wondered, at the time, if that would be an issue.  All the information Bill, Jon and I present is what we’ve learned from reflection; none of this is “insider information”.

Some background: a state machine is generated for each method that uses the await keyword.  It manages weaving back-together the asynchronous calls that you so lovingly linearly describe in an async method.  The state machine not only weaves together the various “callbacks” that it generates, but it also (now) manages the state changes of the data being modified between and by asynchronous calls.  Wonderfully complex stuff.
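
As a rough illustration (the method and FetchCountAsync are invented for this sketch; assume the usual System.Threading.Tasks using), a method like this:

public async Task<int> CountWidgetsAsync()
{
    // "state 0": everything before the first await runs synchronously
    int count = await FetchCountAsync();   // resumes here in "state 1"
    await Task.Delay(100);                 // resumes here in "state 2"
    return count * 2;
}

is rewritten by the compiler into a generated type with an int state field and a MoveNext method that switches on that state; roughly speaking, each await adds another state.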

The state variable is basically an enum of the various states between async calls that the generated code can be in.  I won’t get into too much detail about that (you should have gone to Codemash like all the other cool people 🙂) but needless to say having a “limit” on the number of states a method can be in sounds a bit scary.

Of course Jon, at one point, brought the same thing up in the session about the “int” state variable.  This reminded me that I wanted to look into it further—not because I wanted to break the compiler (directly) but to know what the limits are, if any.

A couple of days later I had some time waiting around at airports on my way home, so I thought I’d test my theory.  If you follow me on Twitter you probably saw some of my discoveries in real time.  For those of you who didn’t, here’s an aggregation of what I found.

First of all, the state machine represents the async states a single method can be in.  This is represented by a signed int.  The only negative value that seems [1] to mean anything is –1, leaving 2,147,483,647 states (or, roughly, await invocations) that can occur in an async method.

At first glance, this seems disconcerting.  I quickly wrote some code to generate a method with 2,147,483,648 await invocations in it (that’s Int32.MaxValue + 1 for those without OCD).  Needless to say, that took a few minutes to generate on my laptop, and I have an SSD (which I’ve clocked at increasing my IO by about 7 times on average).  That generated a 45 gigabyte file (one class, one method).
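
The generator doesn’t need to be fancy; something along these lines will do (this is an illustrative sketch, not the exact code I used, and the output path is invented):

using System.IO;

class Generator
{
    static void Main()
    {
        using (var writer = new StreamWriter(@"C:\temp\file.cs"))
        {
            writer.WriteLine("using System.Threading.Tasks;");
            writer.WriteLine("class Generated {");
            writer.WriteLine("  public static async Task RunAsync() {");
            writer.WriteLine("    Task x = Task.FromResult(0);");
            for (long i = 0; i <= int.MaxValue; ++i)   // Int32.MaxValue + 1 awaits
            {
                writer.WriteLine("    await x;");
            }
            writer.WriteLine("  }");
            writer.WriteLine("}");
        }
    }
}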

Problems with the type int for the state variable started to seem ridiculous.  But, I didn’t stop there.

Now I’m doing everything outside of Visual Studio (VS) from this point on.  I’m running the app that generates the code from within VS, but everything else is from the command line.  So, I run CSC on the 45 gig file, honestly expecting an internal compiler error.  But what happens is that I get a CS1504 error: ‘file.cs’ could not be opened (‘Arithmetic result exceeded 32-bits.’).  Now, the state variable is 32 bits, so it sounds like that could be the culprit.  But, if you look at the error message, it can’t even open the file.  I tried opening the file in VS and it told me the file couldn’t be found…

Okay, at this point it’s seeming even more unlikely that a type of int for the state variable is even remotely going to be an issue.  But now I’m curious about what 32-bit value has been exceeded.  My theory is now the number of lines in the file…  The compiler has to keep track of each line in the file in case it has to dump an error about it; maybe that’s the 32-bit value?  I modify my code generation to limit the number of await invocations so the number of lines in the file is 2,147,483,647 (kind of a binary search; if this works then I know it still could be the number of lines in the file).  Same error.

So the error isn’t from the number of lines.  Now my theory is that the overflow is from trying to allocate enough memory to load the file (keep in mind, I’ve got 8 GB of RAM and I’m trying to load a 45 GB file; but I have yet to get an out-of-memory error).  So, I modify my code to generate a file that approaches 2,147,483,647 bytes in size.  Things are much faster now…  I try again.  Now I get the same CS1504 error but the message is ‘file.cs’ could not be opened (‘Not enough storage available to complete this operation’) (I’ve got 100 GB of free space on the drive…).  Interesting.  I’ve lessened the data requirement and only now am I effectively getting “out of memory” errors.

Now I’m just looking for a file the compiler will load; I’ve given up on some super-large number of await statements…  Long story short, I kept halving the size of the file until I reached about 270 megabytes, at which point the compiler finally succeeded (meaning ~540 megabytes failed).

At this point, I’ve successfully proven that a type of int for the state variable is not an issue.  Even if the compiler could load the 540 megabyte file, and even if an await invocation somehow took only 8 bytes (“await x;” for example), I could never reach more than about 70,778,880 await calls in a single method (540 × 1,048,576 bytes ÷ 8 = 70,778,880).  Of course, I’m way off here; the real number is much lower.  But 70,778,880 is only about 3% of 2,147,483,647, and int is clearly the smallest type that could store anything close to 70,778,880 or less…

Of course, I’m completely ignoring the fact that a 540 MB cs file is completely unmanageable in VS or a project in general; but, why get caught up in silly realities like that.

This state machine is almost identical to the ones generated for enumerator methods (yield return).  If we assume that the async state machine generation code is “inherited” (by pragmatic reuse) from the enumerator-method state machine generator, we can assume it has very similar (if not smaller) limits, meaning you’d never get close to overflowing its int state variable either.

HTMYL.

[1] Again, this all observed; I’m surmising the compiler never uses any other negative value.


The TeamCity Database Migration Documentation Could Use Some Work

I have a client who wants all the benefits of ALM but doesn’t really want to spend the money on them.  They’re a start-up; so, I can respect that. As a result, I’ve been getting them up and running with some of the freely-available and open-source tools.  They’re also cross-platform, so getting on some of the great Microsoft programs like BizSpark and WebsiteSpark wasn’t really in the cards.

Anywho…  One of those tools was TeamCity.  It’s a great tool for Continuous Integration and Deployment.  I’ve used it with a couple of clients now and have nothing but good things to say about it.  When I installed it, I probably read some yellow box talking about HSQLDB versus an external database and evaluation versus production; but, honestly, I just followed the steps. :)  I had created several configurations and had artifacts and build logs, so upgrading needed to include backup and restore…  I had a couple of spare cycles to look into migrating to an external database (as well as run through a backup/restore cycle), so I thought I might as well bite the bullet.

Well, the title of the post explains it.  I’m not going to go too much into the existing documentation and how it jumps from page to page before you actually get a migration done; I’m just going to say there wasn’t a single list of steps (and I think a couple of things were missing).  There’s support for multiple platforms and multiple databases, so I get that it’s hard.  It just seemed a bit too hard; so, I’m going to detail the process of migrating a TeamCity database to a SQL Server Express database here in case anyone else finds it useful.

[NOTE: I did this with TeamCity 6.5.4 build 18046, YMMV]

First off, backup.  I did this from the web interface.  I clicked on the Administration link near the top right, clicked the Backup Data link on the right-hand list, selected “Custom” scope and checked everything.  This created a zip file in the backup directory in the TeamCity data directory. For me, this was C:\TeamCity\.BuildServer\backup\TeamCity_Backup_20111130_210020.zip—but, yours will be different.

After doing a TeamCity backup, stop the TeamCity services.  (Start menu, Control Panel, … you know the drill … or net stop TCBuildAgent and net stop TeamCity from a command prompt).

Next, I did a file system backup.  I just copied the TeamCity directory to another folder.  I had installed TeamCity to C:\TeamCity.  I think the default was my user folder; but I’m only at my client for a limited time and when I’m gone that’s going to cause a bit of grief (yeah, it happened :), so that meant copying C:\TeamCity to somewhere else on C:\.  I’ve digressed from the JetBrains instructions already…

Next, I set up the external SQL Server Express database.  I already had SQL Server 2005 Express installed, so, I won’t go through that installation process.  In SQL Server Management Studio I created a “TeamCity” database, and added a “TeamCity” login—you’ll need those later.

Next, I installed some JDBC drivers for SQL Server.  I just grabbed the Microsoft ones—the “Native” driver—I figured they might know a thing or two about writing a SQL Server driver.  I downloaded from here. I unzipped (or ran the EXE) into a directory then copied the sqljdbc4.jar file into the lib\jdbc TeamCity data sub-directory (which was C:\TeamCity\.BuildServer\lib\JDBC for me).

Next, I configured TeamCity to use SQL Server.  I created a database.properties file in the config data folder (C:\TeamCity\.BuildServer\config, for me); you’ll need this file again later.  It was similar to:

connectionUrl=jdbc:sqlserver://localhost\\sqlexpress:1433;databaseName=TeamCity
connectionProperties.user=TeamCity
connectionProperties.password=P@nda$

…names and passwords might have been changed to protect the pandas.  Since I’m using SQL Express I had to throw the \\sqlexpress named-instance stuff in there.  If you’re using a different named instance, the "sqlexpress" part will be different.  If you’re using a “real” install of SQL Server, you don’t need the “\\sqlexpress” part at all, e.g. “//localhost:1433”.  I also had to throw the :1433 stuff in there (long story); you may not need that.

At this point, make sure the TeamCity services are not running, and don’t even think about logging into the TeamCity web interface (well, you can’t if the server isn’t running).  As far as TeamCity is concerned, you’ve got a fresh install at this point.  If you log in now it will ask you to create an Administrator login and will create all the database tables.  The next step requires an empty database and will fail if you do create that login…

Now, restore.  Okay, I lied: this is one of those parts where the documentation misled me.  The restore requires an empty config directory, i.e. you can’t have database.properties in the config directory.  You also need an empty system directory (C:\TeamCity\.BuildServer\system\ for me); a fresh install might mean those directories are empty, I don’t know.  I simply renamed config to config-old and system to system-old and created new, empty config and system directories.

Then I had to add the Java bin directory to the PATH.  TeamCity includes all the Java binaries but doesn’t actually install Java (oh, BTW, TeamCity uses Java).  For me, I just ran path=%path%;c:\TeamCity\jre\bin to let the command prompt know where java.exe is.  If you have Java installed, you might not need to do this.

Then I performed the restore, which is just a matter of running maintainDB with the restore command from the TeamCity bin directory (C:\TeamCity\bin, note no “.BuildServer”, for me):

C:\TeamCity\bin>maintainDB.cmd restore -F c:\TeamCity\.BuildServer\backup\TeamCity_Backup_20111130_210020.zip -A c:\TeamCity\.BuildServer -T C:\TeamCity\.BuildServer\config-old\database.properties

…which tells TeamCity to restore from my backup (zip) file, where the data directory is, and what database configuration to use (note the config-old business).  Either TeamCity or Java doesn’t grok relative directories; so, note that I used “C:\TeamCity” instead of “..\”.

Once the restore is done (and it’s kind enough to remind you :), copy the database.properties file that you used to the config directory (for me, this was just copy ..\.BuildServer\config-old\database.properties ..\.BuildServer\config ).

Then, restart the TeamCity services:

C:\TeamCity\bin>net start TCBuildAgent
C:\TeamCity\bin>net start TeamCity

And you’re done.  You can now log into the TeamCity web interface with all your old logins, and all your old configurations are still there.  I did a bit of housework by deleting config-old and system-old, and went ahead and deleted the copy of C:\TeamCity that I made (I don’t need that, now that I’ve migrated to SQL Server and verified that everything works).

If you ran into a problem along the way, you could simply copy the backed-up TeamCity directory over top of the old one (making sure the services were stopped first) and you should be back to where you were before.  I didn’t have a problem, so I can’t confirm that actually works; but the docs say it does.

Since you have to have an empty database and empty config/system directories, I gather the actual restore process would be identical (minus dropping all the tables in the TeamCity SQL Server database).


If You’re Using “#if DEBUG”, You’re Doing it Wrong

I was going through some legacy code the other day, refactoring it all over the place and I ran into many blocks of code wrapped in “#if DEBUG”.  Of course, after a bit of refactoring in a RELEASE configuration these blocks of code were quickly out of date (and by out of date, I mean no longer compiling).  A huge PITA.

For example, take the following code:

	public class MyCommand
	{
		public DateTime DateAndTimeOfTransaction;
	}
 
	public class Test
	{
		//...
		public void ProcessCommand(MyCommand myCommand)
		{
#if DEBUG
			if (myCommand.DateAndTimeOfTransaction > DateTime.Now)
				throw new InvalidOperationException("DateTime expected to be in the past");
#endif
			// do more stuff with myCommand...
		}
		//...
	}

If, while my active configuration is RELEASE, I rename-refactor DateAndTimeOfTransaction to TransactionDateTime, then the block of code in the #if DEBUG is now invalid—DateAndTimeOfTransaction will not be renamed within that block.  I will now get compile errors if I switch to the DEBUG configuration (or worse still, if I check in and my continuous integration environment does a debug build).

I would have run into the same problem had I been in a DEBUG configuration with “#if RELEASE” blocks.  Yes, I could create a configuration that defines both DEBUG and RELEASE and do my work in there.  Yeah, no, not going down that twisted path… i.e. try doing it without manually adding a conditional compilation symbol.

I got to thinking about a better way.  It dawned on me that #if blocks are effectively comments in other configurations (assuming DEBUG or RELEASE; but true with any symbol) and that comments are apologies, and it quickly became clear to me how to fix this.

Enter the Extract Method refactoring and conditional methods.  If you’ve been living under a rock or have simply forgotten, there exists a type in the BCL: System.Diagnostics.ConditionalAttribute.  You put this attribute on methods whose calls may or may not be included in the resulting binary (IL).  But the methods are still compiled (and thus syntax checked).  So, if I followed the basic tenet for code comment smells and performed an Extract Method refactoring and applied the ConditionalAttribute to the resulting method, I’d end up with something like this:

		//...
		public void ProcessCommand(MyCommand myCommand)
		{
			CheckMyCommandPreconditions(myCommand);
			// do more stuff with myCommand...
		}
 
		[Conditional("DEBUG")]
		private static void CheckMyCommandPreconditions(MyCommand myCommand)
		{
			if (myCommand.DateAndTimeOfTransaction > DateTime.Now)
				throw new InvalidOperationException("DateTime expected to be in the past");
		}

 

Now, if I perform a rename refactoring on MyCommand.DateAndTimeOfTransaction, all usages of DateAndTimeOfTransaction get renamed and I no longer introduce compile errors.

Code Contracts

If you look closely at the name I chose for the extracted method you’ll notice that what this DEBUG code is actually doing is asserting method preconditions.  This sort of thing is directly supported by Code Contracts, which implement a concept called Design by Contract (DbC).  One benefit of DbC is that these checks can effectively be proven at compile time, so there’s no reason to perform the check at runtime or even write unit tests for it: code that violates a contract becomes a “compile error” with Code Contracts.  But for my example I didn’t use DbC, despite it potentially being a better way of implementing this particular code.
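
For completeness, here’s roughly what that precondition would look like with Code Contracts (a sketch assuming the System.Diagnostics.Contracts namespace and the Code Contracts tooling are installed):

using System.Diagnostics.Contracts;

public void ProcessCommand(MyCommand myCommand)
{
    // Precondition enforced/verified by the Code Contracts rewriter and static checker
    Contract.Requires(myCommand.DateAndTimeOfTransaction <= DateTime.Now);

    // do more stuff with myCommand...
}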

Anyone know what we call a refactoring that introduces unwanted side-effects like compile errors?

[UPDATE: small correction to the description of conditional methods and what does/doesn’t get generated to IL based on comment from Motti Shaked]

Working with Subversion Part 2 (or Subversion for the Less Subversive)

In my previous Subversion (SVN) post I detailed some basic commands for using SVN for source code control.  If you’re working alone you could get by with Part 1.  But much of the time you’re working in a team of developers on versioned software that will be deployed to multiple places.  This post details some more advanced commands for systematically using SVN to simultaneously work on multiple editions and versions of a software project (hence the “less subversive”).

Practices

The basic concept of working on two editions or two versions of a software project at the same time is called “branching”.  Each piece of simultaneous work is done on its own “branch” (or “fork”) of the project.  A branch can be thought of as a “copy”, but that’s generally only true at the very start of the branch; with any project you can add and remove files, at which point a branch ceases to be a copy.  Semantics aside, SVN doesn’t really have a concept of a “branch”.  Branching is a practice used by SVN users; SVN really only knows how to “copy” and “merge”.  The practice is to create a standard location to hold branches, copy existing files and directories to a branch location, and merge changes from a branch to another location (like the trunk).  To support branching many repositories (repos) have a “trunk” and a “branches” folder in the project root.  Trunk work is obviously done in the “trunk” folder and branches are created in the “branches” folder.

Branching Strategies

Before getting too much farther talking about things like “trunk”, it’s a good idea to briefly cover branching strategies.  The overwhelmingly most common branching strategy among SVN users has been the “trunk” or “main line” strategy.  This strategy basically assumes there is one “main” software product that evolves over time and that work may spawn off from this main line every so often, proceed independently, and potentially merge back in later.  In this strategy the “current” project is in the “trunk”.

Another strategy is sometimes called Branch per Release, which means there’s a branch that represents the current version of the project, and when that version is “released” most work transitions to a different branch.  Work can always move from one branch to another; but there’s no consistent location where the project files live over time.  This is a perfectly acceptable strategy and almost all Source Code Control (SCC) systems support it.  But the lack of a consistent location for files makes discovery difficult and it really forces the concept of branching onto a team.  I’ve never really found this strategy to be very successful with the teams I’ve worked on, so I prefer the trunk strategy.

Branching

Branching is fairly easy in SVN.  The recommended practice is to perform a copy on the server, then pull down a working copy of that branch to work on.  This can be done with the svn copy command, for example:

svn copy http://svnserver/repos/projectum/trunk http://svnserver/repos/projectum/branches/v1.0 -m "Created v1.0 branch"

…which makes a copy of the trunk into the branches/v1.0 folder.

Now, if you check out the root of the project (e.g. svn checkout http://svnserver/repos/projectum) you’ll have all the files in the trunk and all the branches.  You can now go edit a file in branches/v1.0 in your working copy and svn commit will commit that change to the branch.  If you want to work with just a branch in your working copy, you can check out just the branch.  For example:

svn checkout http://svnserver/repos/projectum/branches/v1.0 .

…which copies all the files/directories in v1.0 to the current local directory.  So, if you had …/branches/v1.0/readme.txt in the repo, you’d now have readme.txt in the current local directory.

The same holds true for the trunk; if you want to work on files in the trunk independently of any branches, check out just the trunk, for example:

svn checkout http://svnserver/repos/projectum/trunk .

It’s useful to work with just the trunk or just a branch because over time you may accumulate many branches, and pulling down the trunk and all the branches will get time-consuming.

While SVN doesn’t really have the concept of a “branch”, it does know about copies of files and tracks changes to those copies.  So, if you show a log of the changes to a file you’ll see the commit comments from all the branches too.  For example, edit readme.txt in the branch directory and commit the change (svn commit -m "changed readme.txt", for example), then go back to the trunk directory and show the log of readme.txt, for example:

svn log -v readme.txt

…you’ll see the commit comments for both trunk/readme.txt and branches/v1.0/readme.txt.  For example:

------------------------------------------------------------------------
r2 | PRitchie | 2011-11-17 10:23:36 -0500 (Thu, 17 Nov 2011) | 1 line
Changed paths:
   A /trunk/readme.txt
initial commit
------------------------------------------------------------------------

Merging

Okay, you’ve been working in v1.0 for a few days now, committing changes.  One of those changes was a fix for a bug someone reported in v1.0.  You know that bug is still in the trunk and it’s time to fix it there too.  Rather than perform the same edits in the trunk, you can merge that change from the v1.0 branch into the trunk.  For example, to merge a typo fixed in readme.txt into the trunk, from a clean trunk working copy (no local modifications):

svn merge http://svnserver/repos/projectum/branches/v1.0

This merges the changes from v1.0 into the working copy of the trunk.  You can now review the merges to make sure they’re what you want, then commit them:

svn commit -m "merged changes from v1.0 into trunk"

Merge Conflicts

Of course, sometimes changes are made in the trunk and in a branch that conflict with each other (the same line was changed in both copies).  If that happens SVN will give you a message saying there’s a conflict:

Conflict discovered in ‘C:/dev/projectum/trunk/readme.txt’.
Select: (p) postpone, (df) diff-full, (e) edit,
        (mc) mine-conflict, (tc) theirs-conflict,
        (s) show all options:

You’re presented with several options.  One is to postpone, which means the conflict is recorded and you’ll resolve it later.  You can see the differences with diff-full.  Or you can accept one side as is with mine-conflict or theirs-conflict; to accept the local file, use mine-conflict.  There’s also edit, which will let you edit the merged file showing the conflicts.  For example:

one
<<<<<<< .working
three
=======
four
>>>>>>> .merge-right.r9

This shows that your working file has a changed second line “three” but the remote version has a changed second line “four”.  The lines wrapped with <<<<<<< .working and ======= are the local change, which is followed by the remote change and >>>>>>> .merge-right.r9.  The “merge-right.r9” is diff syntax telling you which side the change came from and which revision (r9 in this case).  You can edit out all that diff syntax and get the file the way you want it merged, then save it.  SVN will notice the change and present the options again:

Select: (p) postpone, (df) diff-full, (e) edit, (r) resolved,
        (mc) mine-conflict, (tc) theirs-conflict,
        (s) show all options:

Notice you now have a resolved option.  If your edits fixed the conflict you can choose resolved to tell SVN the conflict is gone.  You can then commit the changes and the merges will be committed into the repo.

Merging Without Branches

Of course, branches aren’t the only source of merges.  You might be working on a file in the trunk that a team member is also working on.  If you want to merge any changes they’ve committed into your working copy, you can use the update command.  For example:

svn update

This will merge any changed files with your local files.  Any conflicts will appear the same way they did with svn merge.

It’s important to note that with SVN you can’t commit changes if your working copy is out of date with the repo (e.g. someone committed a change after you performed your checkout).  If this happens you’ll be presented with a message similar to:

Transmitting file data .svn: E155011: Commit failed (details follow):
svn: E155011: File ‘C:\dev\projectum\trunk\readme.txt’ is out of date
svn: E170004: Item ‘/trunk/readme.txt’ is out of date

This basically just means you need to run svn update before you commit to perform a merge and resolve any conflicts.

Mouse Junkies

If you don’t generally work with command-line applications and don’t care for the speed increase of not using the mouse, there are some options you can use to work with SVN.

TortoiseSVN

TortoiseSVN is a Windows Explorer extension that shows directories/files controlled by SVN differently within Explorer.  It shows directories and files that have been modified or are untracked with different icons (well, icon overlays).  It also lets you perform almost all SVN commands from within Explorer (via the context menu).

VisualSVN

VisualSVN provides much of the same functionality as TortoiseSVN but does it within Visual Studio.  It’s not a Source Code Control Visual Studio extension, which is interesting because you can use VisualSVN and another source code control extension (like TFS) at the same time.  VisualSVN requires TortoiseSVN to work correctly, so install that first.

Both TortoiseSVN and VisualSVN make dealing with merge conflicts easier.  I recommend using these for merging instead of the command-line.

Resources

http://svnbook.org/

“Explicit” Tests with Resharper

NUnit introduced a feature called Explicit Tests (a long time ago, I believe) that basically means a test is treated as if it were tagged Ignore unless the test name is explicitly given to the NUnit runner.

This is useful if you have tests that you don’t want run all the time.  Integration tests or tests highly coupled to infrastructure or circumstance come to mind…  But, it’s difficult to automate these types of tests because you always have to maintain a list of test names to give to the runner.

The ability of NUnit to run explicit tests aside, I don’t generally use the NUnit runner directly; I use other tools that run my tests.  I use tools like Resharper to run my tests within Visual Studio, ContinuousTests for continuous testing, and TeamCity to run my tests for continuous integration.

Continuous integration is one thing, I can configure that to run specific test assemblies and with specific configuration and get it to run whatever unit tests I need it to for whatever scenario.

Within Visual Studio is another story.  I sometimes want to run tests in a class or an assembly but not all the tests.  At the same time I want the ability of the runner to run tests it wouldn’t normally run without having to edit and re-compile code.

With Resharper there are several ways you can do this.  One way is to use the Ignore attribute on a test.  This is effectively the same as the NUnit Explicit attribute.  If I run the test specifically (like having the cursor within the test and pressing Ctrl+U+Ctrl+R) Resharper will still run the test.  If I run all tests in solution/project (Ctrl+U+Ctrl+L/right-click project, click Run Unit Tests) the test is ignored.  This is great; but now this test is ignored in all of my continuous integration environments.  Sad Panda.

If you’re using NUnit or MSTest (amongst others that I’m not as familiar with) as your testing framework, you can tag tests with a category attribute (MSTest’s is “TestCategory”, NUnit’s is “Category”).  Once tests are categorized I can then go into Resharper and tell it which categories of tests to “ignore” (Resharper/Options, under Tools, select Unit Testing and change the “Don’t run tests from categories…” section in Resharper 6.x).  Now, when I run tests, tests with that category are effectively ignored.  If I explicitly run a test (cursor somewhere in the test and press Ctrl+U+Ctrl+R) with an “ignored” category, Resharper will still run it.  I now get the same ability as I did with the Ignore attribute but don’t impact my continuous integration environment.  I’ve effectively switched from an opt-in scenario to an opt-out scenario.
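
For example, with NUnit the categorization is just an attribute on the test (the tests themselves are invented for illustration; with MSTest you’d use [TestCategory("Integration")] instead):

using NUnit.Framework;

[TestFixture]
public class OrderTests
{
    [Test]
    public void Total_is_sum_of_line_items()    // plain unit test: always runs
    {
        Assert.AreEqual(3, 1 + 2);
    }

    [Test, Category("Integration")]             // ignored via Resharper's category options,
    public void Saves_order_to_database()       // but still runs when invoked explicitly
    {
        // ...exercise the real database here...
    }
}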

With the advent of ContinuousTests, you might be wondering why bother.  That’s a good question.  With ContinuousTests, only the tests that are affected by the changes you’ve just saved are run—automatically, in the background.  In fact, having any of your tests run whenever you make a change that affects them is one reason why I make some tests “explicit”.  I tend to use test runners as hosts to run experimental code, code that often will become unit tests.  But while I’m fiddling with the code I need to make sure it’s only run when I explicitly run it; having it run in the background because something that affects the test changed isn’t always what I want.  So, I do the same thing with ContinuousTests: have it ignore certain test categories (ContinuousTests/Configuration (Solution), Various tab, Tests categories to ignore).

Test Categorization Recommended Practices

Of course, there’s nothing out there that really conveys any recommendations about test categorization.  It’s more or less “here’s a gun, don’t shoot yourself”…  And for the most part, that’s fine.  But, here’s how I like to approach categorizing tests:

First principle: don’t go overboard.

Testing frameworks are typically about unit testing—that’s what people think of first with automated testing.  So,  I don’t categorize unit tests.  These are highly decoupled tests that are quick to run and I almost always want to run these tests.  If the tests can’t always run or I don’t want them run at any point in time, they’re probably not unit tests.

Next, I categorize non-unit tests by type.  There are several other types of tests: Integration, Performance, System, UI, Infrastructure, etc.  Not all projects need all these types of tests; but these other tests have specific scenarios where you may or may not want them run.  The most common, from what I’ve noticed, is Integration.  If a test has a large amount of setup, requires lots of mocks, is coupled to more than a couple of modules, or takes a long time to run, it’s likely not a unit test.

Do you categorize your tests?  If so, what’s your pattern?

Working with Subversion, Part 1

Working with multiple client projects and keeping abreast of the industry through browsing and committing to open source and other people’s libraries means working with multiple source code control (SCC) systems.  One of the systems I use is Subversion (SVN).  It’s no longer one of the SCCs I use most often, so I tend to come back to it after long pauses and my SVN fu is no longer what it used to be.  I’m sure my brain is damaged from this form of "task switching", not to mention the time I spend trying to figure out the less common actions I need to perform on a repository (repo).  I usually spend more than a few minutes digging up the commands I need for the once-in-a-decade actions.

I don’t foresee getting away from SVN in the near future; so I thought I’d aggregate some of these commands in one place.  My blog is the perfect place to do that (because it’s just for me, right? 🙂

Backing Up

Outside of adding/committing, the most common action to be performed is backing up the repository.  Unfortunately for my brain this is automated and I don’t see it for months at a time.  To back up an SVN repository that you’re hosting yourself or that is hosted by third-party software (like VisualSVN Server), I like dump/load:

svnadmin dump repo-local-path > repo-bkp-path

This lets you restore to a host that already contains all the configuration data like permissions, users, and hooks.

If the repository is completely autonomous (i.e, just a directory on your hard drive and maybe an SVN daemon) then hotcopy is better:

svnadmin hotcopy local-path destination-path

Restoring

If you used the dump method of backing up, you need to use the load command to put the backup into an existing repository.  If you’re not using a hosted repository, you’ll first need to create the repository (svnadmin create repo-local-path) before you can run load (in which case I’d recommend using hotcopy instead).  To load the dump into the existing repository:

svnadmin load repo-local-path < repo-bkp-path

If you’ve used hotcopy then the backup is the fully functional repository; just make it available to users (i.e. put it where you want it :).

Migrating

Migrating is basically just a backup and restore.  If you’re backing up one repository and putting it into an existing repository, use dump/load.  On System A:

svnadmin dump repo-local-path > repo-bkp-path

On System B, after copying repo-bkp-path from System A:

svnadmin load repo-local-path < repo-bkp-path

Even if you weren’t migrating to an existing repo, you could use this method; just run svnadmin create repo-local-path before svnadmin load.  The dump/load method has the added benefit of upgrading the data from one format to another if both systems don’t have the same version of SVN.  The drawback of migrating with dump/load is that you’ll have to configure the new repository manually (or manually copy from the old repo) to get permissions, hooks, etc.

Now that you’ve migrated your repo to another computer, existing working copies will be referencing the old URL.  To switch them to the new URL, perform the following:

svn switch --relocate old-repo-remote-URL new-repo-remote-URL

Creating Working Copy

If you don’t already have a local copy of the repo to work with, the following command will create a working copy:

svn checkout repo-remote-URL working-copy-path

Committing Changes

I added this section because I’ve become used to Git.  Git has a working directory and staging area model; you stage files from the working directory before committing, which allows you to selectively commit modifications/additions.  SVN is different in that the working directory is the staging area, so you effectively have to commit all modifications at once.  (You can stage adds, because you manually tell SVN which files to start controlling.)

svn status will tell you what’s modified (M) and what’s untracked (?).  To commit all modifications:

svn commit -m "description of changes included in the commit"

Undoing Add

Sometimes you schedule something to be added on commit by mistake, or it’s easier to add by wildcard and then remove the files you don’t want in your next commit.  To remove them from the next commit:

svn revert file-path

Be careful with revert because if the file is already controlled this will revert your local modifications.

Undoing Modifications

To revert the modifications you’ve made locally and restore a file to the current revision in the repo:

svn revert file-path

Resources

    http://svnbook.org/

 

What’s your favourite SVN workflow?

Getting a Quick Feel for a New Software Team

I deal with many different teams and many different companies.  I see a lot of teams not get the benefit of known techniques and fail to evaluate and improve.  There are some checklists out there for evaluating a company/team before joining it; but I find the lists to be deeply rooted in the past.  They detail such fundamental things that knowing the criteria on the list really doesn’t tell you much about how your time with the team will be or what your role within it will be.

There are many low-level practices within a software development team that aid in successfully delivering software.  But these low-level practices are part of higher-level initiatives that, if absent, make the lower-level practices almost moot.  "Do you use source code control?", for example.  Sounds like a fundamental part of being able to deliver quality software; and it is.  But on its own it doesn’t really do a whole lot to help the team deliver software.  Is the code put into source code control using accepted design practices?  Is the code put into source code control in a timely fashion?  Is the code put into source code control without impeding other people?  Etc., etc.  A "yes" to "do you use source code control" without any of the follow-on initiatives doesn’t give me a warm and fuzzy feeling about getting on the team and focusing on software value rather than spending a lot of time on, or stuck in, process.

Over the years I’ve been on many teams hired for a specific role that changed drastically almost as I began working with the team.  I’ve observed many things about many teams and have come up with some things I like to find out about a team before I start so I can better gauge the type of work that I’ll be doing and how successful the team will be fulfilling their goals.

Does the team:

  • have a known product owner/sponsor,
  • have a cross-functional team 6-9 people in size,
  • use appropriate tools,
  • foster SOLID design principles,
  • use continuous integration,
  • use continuous deployment,
  • foster communications with team members and stakeholders,
  • have a known and managed process and visualize workflow,
  • evaluate and improve process as a team,
  • have new candidates write code in their interviews,
  • have a plan that limits work in progress,
  • have a plan that orders or prioritizes tasks;

and, how easy is it to change to start doing any of the above items?

“Do you use source code control” is covered by “Does the team use continuous integration” as it’s not just about using source code control, it’s about a process that can’t function properly without source code control.

And, for what it’s worth, this list doesn’t tell you whether you should work on a team or not; it just tells you the type of work you will be doing.  It’s up to you to dig deeper and decide whether you unrealistically want to limit your work to a specific role or a small set of roles.  I would only use the last question as an acid test of whether I would join the team or not.  If they are not willing to improve, there’s not much I’m going to be able to do for them.

What do you look for in a team/project?

DevTeach Comes to Ottawa!

DevTeach is coming to Ottawa November 2-4 2011.

 

DevTeach has had some of the best speakers in the industry and some of the best sessions/tracks that I’ve ever attended.

 

This year is no exception, the speaker line-up is full of experts and MVPs like David Starr, Mitch Garvis, James Kovacs, Laurent Duveau, Mario Cardinal, Guy Barrette, Paul Neilson, Rob Windsor, Benjamin Day, Erik Renaud, Joel Semeniuk, etc. and locals like Garth Jones, Wes MacDonald, Joel Hebert, Colin Smith, Colin Melia, Brad Bird, Rick Claus, etc.

 

Tracks include Agile, Web Dev, Rich Client, SQL Server Dev, SQL Server BI, and for the first time at DevTeach, an IT Pro track.

 

If you think you might have trouble convincing your boss you need to go, have a look at http://www.devteach.com/WhoShouldAttend.aspx.

 

To register: http://www.devteach.com/Register.aspx