Security Awareness

Thoughts on a New Year

It’s about this time of year that I think…

  • Why do reporters talk so much about NSA spying and Advanced Persistent Threats, when half the websites in existence will cough up cookies if you search for "-alert(document.cookie)-" ?
  • How can we expect people to write secure code when:
    • they don’t know what it is?
    • they can’t recognise insecure code?
    • it’s easier (more clicks, more thinks, etc) to write insecure code?
  • What does it take for a developer to get:
    • fired?
    • a bad performance review?
    • just mildly discomforted?
  • What is it about developers that makes us all believe that nobody else has written this piece of code before? (or that we can write it better)
  • Every time a new fad comes along, whether it’s XML, PHP, Ruby, etc, why do we spend so much time recognising that it has the same issues as the old ones? But without fixes.
  • Can we have an article on “the death of passwords” which will explain what the replacement is – and without that replacement turning out to be “a layer in front of a big password”?
  • Should you let your application out (publish it, make it available on the Internet, etc) if it is so fragile that:
    • you can’t patch it?
    • you can’t update the framework or libraries on which it depends (aka patch them)?
    • you don’t want a security penetration test to be performed on it?
  • Is it right to hire developers on the basis that they can:
    • steer a whiteboard to a small function which looks like it might work?
    • understand an obfuscated sample that demonstrates an obscure feature of your favourite framework?
    • tell you how to weigh twelve coins, one of which might be a fake?
    • bamboozle the interviewer with tales of technological wonders the likes of which he/she cannot fathom?
    • sing the old school song?

Ah, who am I kidding, I think those kinds of things all the time.

Why don’t we do that?

Reading a story on the consequences of the theft of Adobe’s source code by hackers, I come across this startling phrase:

The hackers seem to be targeting vulnerabilities they find within the stolen code. The prediction is that they’re sifting through the code, attempting to find widespread weaknesses, intending to exploit them with maximum effect by using zero-day attacks.

What I’d love to know is why we aren’t seeing a flood of developers crying out to be educated in how they, too, can learn to sift through their own code, attempt to find widespread weaknesses, so they can shore them up and prevent their code from being exploited.

An example of the sort of comments we are seeing can be found here, and they are fairly predictable – “does this mean Open Source is flawed, if having access to the source code is a security risk”, schadenfreude at Adobe’s misfortune, all manner of assertions that Adobe weren’t a very secure company anyway, etc.

Something that’s missing is an acknowledgement that we are all subject to the same pool of developers.

And attackers.

So, if you’re in the business of developing software – whether to sell, license, give away, or simply to use in your own endeavours – you’re essentially in the same boat as Adobe prior to the hackers breaching their defences. Possibly the same boat as Adobe after the breach, but prior to the discovery.

Unless you are doing something different to what Adobe did, you are setting yourself up to be the next Adobe.

Obviously, Adobe isn’t giving us entire details of their own security program, and what’s gone right or wrong with it, but previous stories (as early as mid-2009) indicated that they were working closely with Microsoft to create an SDL (Security Development Lifecycle) for Adobe’s development.

So, instead of being all kinds of smug that Adobe got hacked, and you didn’t, maybe you should spend your time wondering if you can improve your processes to even reach the level Adobe was at when they got hacked.

And, to bring the topic back to what started the discussion – are you even doing to your software what these unidentified attackers are doing to Adobe’s code?

Are you poring over your own source code to find flaws?

How long are you spending to do that, and what tools are you using to do so?

Government Shuts Down for Cyber Security

In a classic move, clearly designed to introduce National Cyber Security Awareness Month with quite a bang, the US Government has shut down, making it questionable as to whether National Cyber Security Awareness Month will actually happen.

In case the DHS isn’t able to make things happen without funding, here’s what they originally had planned:

image

I’m sure you’ll find me and a few others keen to engage you on Information Security this month in the absence of any functioning legislators.

Maybe without the government in charge, we can stop using the “C” word to describe it.

UPDATE 1

The “C” word I’m referring to is, of course, “Cyber”. Bad word. Doesn’t mean anything remotely like what people using it think it means.

UPDATE 2

The main page of the DHS.GOV web site actually does carry a small banner indicating that there’s no activity happening at the web site today.

image

So, there may be many NCSAM events, but DHS will not be a part of them.

Training developers to write secure code

I’ve done a fair amount of developer training recently, and there seem to be a number of different kinds of response to my security message.

[You can safely assume that there’s also something that’s wrong with the message and the messenger, but I want to learn about the thing I likely can’t control or change – the supply of developers]

Here are some unfairly broad descriptions of stereotypes I’ve encountered along the way. The truth, as ever, is more nuanced, but I think if I can reach each of these target personas, I should have just about everyone covered.

Is there anyone I’ve missed?

The previous victim

I’m always happy to have one or more of these people in the room – the sort of developer who has some experience, and has been on a project that was attacked successfully at some point or another.

This kind of developer has likely quickly learned the lesson that even his own code is subject to attack, vulnerable and weak to the persistent probes of attackers. Perhaps his experience has also included examples of his own failures in more ordinary ways – mere bugs, with no particular security implications.

Usually, this will be an older developer, because experience is required – and his tales of terror, unrehearsed and true, can sometimes provide the “scared straight” lesson I try to deliver to my students.

The previous attacker

This guy is usually a smart, younger individual. He may have had some previous nefarious activity, or simply researched security issues by attacking systems he owns.

But for my purposes, this guy can be too clever, because he distracts from my talk of ‘least privilege’ and ‘defence in depth’ with questions about race conditions, side-channel attacks, sub-millisecond time deltas across multi-second latency routes, and the like. IF those were the worst problems we see in this industry, I’d focus on them – but sadly, sites are still vulnerable to simple attacks, like my favourite – Reflected XSS in the Search field. [Simple exercise – watch a commercial break, and see how many of the sites advertised there have this vulnerability in them.]

But I like this guy for other reasons – he’s a possible future hire for my team, and a probable future assistant in finding, reporting and addressing vulnerabilities. Keeping this guy interested and engaged is key to making sure that he tells me about his findings, rather than sharing them with friends on the outside, or exploiting them himself.

“I did a security class at college”

Unbelievably to me, there are people who have “done a project on it”, and therefore know all they want to about security. If what I was about to tell them was important, they’d have been told it by their professor at college, because their professor knew everything of any importance.

I personally wonder if this is going to be the kind of SDE who will join us for a short while, and not progress – because the impression they give to me is that they’ve finished learning right before their last final exam.

Salaryman

Related to the previous category is the developer who only does what it takes to get paid and to receive a good performance review.

I think this is the developer I should work the hardest to try and reach, because this attitude lies at the heart of every developer on their worst days at their desk. When the passion wanes, or the task is uninteresting, the desire to keep your job, continue to get paid, and progress through your career while satisfying your boss is the grinding cog that keeps you moving forward like a wind-up toy.

This is why it is important to keep searching to find ways of measuring code quality, and rewarding people who exhibit it – larger rewards for consistent prolonged improvement, smaller but more frequent rewards to keep the attention of the developer who makes a quick improvement to even a small piece of code.

Sadly, this guy is in my class because his boss told him he ought to attend. So I tell him at the end of my class that he needs to report back to his boss the security lesson that he learned – that all of his development-related goals should have the adverb “securely” appended to them. So “develop feature X” becomes “develop feature X securely”. If that is the one change I can make to this developer’s goals, I believe it will make a difference.

Fanboy

I’ve been doing this for long enough that I see the same faces in the crowd over and over again. I know I used to be a fanboy myself, and so I’m aware that sometimes this is because these folks learn something new each time. That’s why I like to deliver a different talk each time, even if it’s on the same subject as a previous lesson.

Or maybe they just didn’t get it all last time, and need to hear it again to get a deeper understanding. Either way, repeat visitors are definitely welcome – but I won’t get anywhere if that’s all I get in my audience.

Vocational

Some developers do the development thing because they can’t NOT write code. If they were independently wealthy and could do whatever they want, they’d be behind a screen coding up some fun little app.

I like the ones with a calling to this job, because I believe I can give them enough passion in security to make it a part of their calling as well. [Yes, I feel I have a calling to do security – I want to save the world from bad code, and would do it if I was independently wealthy.]

Stereotypical / The Surgeon

Sadly, the hardest person to reach – harder even than the Salaryman – is the developer who matches the stereotypical perception of the developer mindset.

Convinced of his own superiority and cleverness, even if he doesn’t express it directly in such conceited terms, this person will see every suggested approach as beneath him, and every example of poor code as yet more proof of his own superiority.

“Sure, you’ve had problems with other developers making stupid security mistakes,” he’ll think to himself, “But I’m not that dumb. I’ve never written code that bad.”

I certainly hope you won’t ever write code as bad as the examples I give in my classes – those are errant samples of code written in haste, and which I wouldn’t include in my class if they didn’t clearly illustrate my point. But my point is that your colleagues – everyone around you – are going to write this bad a piece of code one day, and it is your job to find it. It is also their job to find it in the code you write, so either you had better be truly as good as you think you are, or you had better apply good security practices so they don’t find you at your worst coding moment.

Playing with security blogs

I’ve found a new weekend hobby – it takes only a few minutes, is easily interruptible, and reminds me that the state of web security is such that I will never be out of a job.

I open my favourite search engine (I’m partial to Bing, partly because I get points, but mostly because I’ve met the guys who built it), search for “security blog”, and then pick one at random.

Once I’m at the security blog site – often one I’ve never heard of, despite it being high up in the search results – I find the search box and throw a simple reflected XSS attack at it.

If that doesn’t work, I view the source code for the results page I got back, and use the information I see there to figure out what reflected XSS attack will work. Then I try that.

[Note: I use reflected XSS, because I know I can only hurt myself. I don’t play stored XSS or SQL injection games, which can easily cause actual damage at the server end, unless I have permission and I’m being paid.]

Finally, I try to find who I should contact about the exploitability of the site.

It’s interesting just how many of these sites are exploitable – some of them falling to the simplest of XSS attacks – and even more interesting to see how many sites don’t have a good, responsive contact address (or prefer simply not to engage with vuln discoverers).

So, what do you find?

I clearly wouldn’t dream of disclosing any of the vulnerabilities I’ve found until well after they’re fixed. Of course, after they’re fixed, I’m happy to see a mention that I’ve helped move the world forward a notch on some security scale. [Not sure why I’m not called out on the other version of that changelog.] I might allude to them on my twitter account, but not in any great detail.

From clicking the link to exploit is either under ten minutes or not at all – and reporting generally takes another ten minutes or so, most of which is hunting for the right address. The longer portion of the game is helping some of these guys figure out what action needs to be taken to fix things.

Try using a WAF – NOT!

You can try using a WAF to solve your XSS problem, but then you’ve got two problems – a vulnerable web site, and that you have to manage your WAF settings. If you have a lot of spare time, you can use a WAF to shore up known-vulnerable fields and trap known attack strings. But it really doesn’t ever fix the problem.

Don’t echo my search query

If you can, don’t echo back to me what I sent you, because that’s how these attacks usually start. Don’t even include it in comments, because a good attack will just terminate the comment and start injecting HTML or script.

Remove my strange characters

Unless you’re running a source code site, you probably don’t need me to search for angle brackets, or a number of other characters. So take them out of my search – or just reject my search outright if I include them.
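Here’s a minimal sketch of both options in Python. The allow-list (ASCII letters, digits, space and a little punctuation) and the function names are my own assumptions for illustration, not a recommendation for every site; note that it would also drop accented and other non-ASCII letters.

```python
import re

# Assumed allow-list: ASCII letters, digits, space, and a few harmless marks.
# Anything not listed here counts as a "strange character".
_DISALLOWED = re.compile(r"[^A-Za-z0-9 .,_-]")


def strip_strange(query: str) -> str:
    """Option 1: silently remove every character not on the allow-list."""
    return _DISALLOWED.sub("", query)


def reject_strange(query: str) -> str:
    """Option 2: refuse the whole search if any disallowed character appears."""
    if _DISALLOWED.search(query):
        raise ValueError("search query contains characters that are not allowed")
    return query
```

Rejecting is the easier of the two to reason about, because what you search for is always exactly what the user typed. Neither option replaces encoding the output.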

Encode everything

OK, so you don’t have to encode the basics – what are the basics? I tend to start with alphabetic and numeric characters, maybe also a space. Encode everything else.
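As a rough sketch of “encode everything else” for text that lands in the body of an HTML page, assuming numeric character references are acceptable to you (the function name is hypothetical):

```python
def html_encode_all(text: str) -> str:
    """HTML-encode every character except ASCII letters, digits and the space."""
    out = []
    for ch in text:
        if ch == " " or (ch.isascii() and ch.isalnum()):
            out.append(ch)                 # the "basics" pass through untouched
        else:
            out.append(f"&#{ord(ch)};")    # everything else becomes a numeric entity
    return "".join(out)
```

So a search for `<b>` comes back as `&#60;b&#62;`, which the browser displays as text rather than treating as markup.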

Which encoding?

Yeah, that’s always the hard part. Encode it using the right encoding. That’s the short version. The long version is that you figure out what’s going to decode it, and make sure you encode for every layer that will decode. If you’re putting my text into a web page as a part of the page’s content, HTML encode it. If it’s in an attribute string, quote the characters using HTML attribute encoding – and make sure you quote the entire attribute value! If it’s an attribute string that will be used as a URL, you should URL encode it. Then you can HTML encode it, just to be sure.
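To illustrate the layering, here’s a sketch using only the Python standard library (`html.escape` and `urllib.parse.quote`). The three helper names and the `/search?q=` base are made up for the example, and it assumes the base URL is trusted and only the query text comes from the user.

```python
import html
from urllib.parse import quote


def as_page_content(text: str) -> str:
    """Text placed in the body of the page: HTML-encode it."""
    return html.escape(text, quote=False)


def as_attribute_value(text: str) -> str:
    """Text placed in an attribute: encode quotes too, and quote the whole value."""
    return '"' + html.escape(text, quote=True) + '"'


def as_url_attribute(base: str, text: str) -> str:
    """Text placed in a URL inside an attribute: URL-encode first, then HTML-encode."""
    url = base + quote(text, safe="")      # layer 1: decoded by the URL parser
    return '"' + html.escape(url, quote=True) + '"'   # layer 2: decoded by the HTML parser
```

The order matters: the HTML parser decodes first when the page loads, so the HTML encoding goes on last.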

[Then, of course, check that your encoding hasn’t killed the basic function of the search box!]

Respond to security reports

You should definitely respond to security reports. I understand that not everyone can have a 24/7 response team watching their blog (I certainly don’t), but you should try to respond within a couple of days, and anything under a week is probably going to be alright. Some vuln discoverers are upset if they don’t get a response much sooner, and see that as cause to publish their findings.

Me, I send a message first to ask if I’ve found the right place to send a security vulnerability report to, and only when I receive a positive acknowledgement do I send on the actual details of the exploit.

Be like Billy – Mind your XSS Manners!

I’ve said before that I wish programmers would respond to reports of XSS as if I’d told them I caught them writing a bubble sort implementation in Cobol. Full of embarrassment at being such a beginner.

That old “cyber offence” canard again

I’m putting this post in the “Programmer Hubris” section, but it’s really not the programmers this time, it’s the managers. And the lawyers, apparently.

Something set me off again

Well, yeah, it always does, and this time what set me off is an NPR article by Tom Gjelten in a series they’re currently doing on “cybersecurity”.

This article probably had a bunch of men talking to NPR with expressions such as “hell, yeah!” and “it’s about time!”, or even the more balanced “well, the best defence is a good offence”.

Absolute rubbish. Pure codswallop.

But aren’t we being attacked? Shouldn’t we attack back?

Kind of, and no.

We’re certainly not being “attacked” in the manner described by analogy in the article.

"If you’re just standing up taking blows, the adversary will ultimately hit you hard enough that you fall to the ground and lose the match. You need to hit back." [says Dmitri Alperovitch, CrowdStrike’s co-founder.]

Yeah, except we’re not taking blows, and this isn’t boxing, and they’re not hitting us hard.

"What we need to do is get rid of the attackers and take away their tools and learn where their hideouts are and flush them out," [says Greg Hoglund, co-founder of HBGary, another firm known for being hacked by a bunch of anonymous nerds that he bragged about being all over]

That’s far closer to reality, but the people whose job it is to do that are the duly appointed law enforcement operatives, who are able to enforce the law.

"It’s [like] the government sees a missile heading for your company’s headquarters, and the government just yells, ‘Incoming!’ " Alperovitch says. "It’s doing nothing to prevent it, nothing to stop it [and] nothing to retaliate against the adversary." [says Alperovitch again]

No, it’s not really like that at all.

There is no missile. There is no boxer. There’s a guy sending you postcards.

What? Excuse me? Postcards?

Yep, pretty much exactly that.

Every packet that comes at you from the Internet is much like a postcard. It’s got a from address (of sorts) and a to address, and all the information inside the packet is readable. [That’s why encryption is applied to all your important transactions]

So how am I under attack?

There are a number of ways. You might be receiving far more postcards than you can legitimately handle, making it really difficult to assess which are the good postcards, and which are the bad ones. So, you contact the postman, and let him know this, and he tracks down (with the aid of the postal inspectors) who’s sending them, and stops carrying those postcards to you. In the meantime, you learn how to spot the obvious crappy postcards and throw them away – and when you use a machine to do this, it’s a lot less of a problem. That’s a denial of service attack.

Then there’s an attack against your web site. Pretty much, that equates to the postcard sender learning that there’s someone reading the postcards, whose job it is to do pretty much what the postcards tell them to do. So he sends postcards that say “punch the nearest person to you really hard in the face”. Obviously a few successes of this sort lead you to firing the idiot who’s punching his co-workers, and instead training the next guy as to what jobs he’s supposed to do on behalf of the postcard senders.

I’m sure that my smart readers can think up their own postcard-based analogies of other attacks that go on, now that you’ve seen these two examples.

Well if it’s just postcards, why don’t I send some of my own?

Sure, send postcards, but unless you want the postman to be discarding all your outgoing mail, or the law enforcement types to turn up at your doorstep, those postcards had better not be harassing or inappropriate.

Even if you think you’re limiting your behaviour to that which the postman won’t notice as abusive, there’s the other issue with postcards. There’s no guarantee that they were sent from the address stated, and even if they were sent from there, there is no reason to believe that they were official communications.

All it takes is for some hacker to launch an attack from a hospital’s network space, and you’re now responsible for attacking an innocent target where lives could actually be at risk. [Sure, if that were the case, the hospital has shocking security issues of its own, but can you live with that rationalisation if your response to someone attacking your site winds up killing someone?]

I don’t think that counterattack on the Internet is ethical or appropriate.

UDP and DTLS not a performance improvement.

Saw this update in my Windows Update list recently:

http://support.microsoft.com/kb/2574819

As it stands right now, this is what it says (in part):

image

OK, so I started off feeling good about this – what’s not to like about the idea that DTLS, a security layer for UDP that works roughly akin to TLS / SSL for TCP, now can be made a part of Windows?

Sure, you could say “what about downstream versions”, but then again, there’s a point where a developer should say “upgrading has its privileges”. I don’t support Windows 3.1 any more, and I don’t feel bad about that.

No, the part I dislike is this one:

Note DTLS provides TLS functionalities that are based on the User Datagram Protocol (UDP) protocol. Because TLS is based on the Transmission Control Protocol (TCP) protocol, DTLS performs better than TLS.

Wow.

That’s just plain wrong. Actually, I’m not even sure it qualifies as wrong, and it’s quite frankly the sort of mis-statement and outright guff that made me start responding to networking posts in the first place, and which propelled me in the direction of eventually becoming an MVP.

Nerd

Yes, I was the nerdy guy complaining that there were already too many awful networking applications, and that promulgating stupid myths like “UDP performs better than TCP” or “the Nagle algorithm is slowing your app down, just disable it” causes there to be more of the same.

But I think that’s really the point – you actually do want nerds of that calibre writing your network applications, because network programming is not easy – it’s actually hard. As I have put it on a number of occasions, when you’re writing a program that works over a network, you’re only writing one half of the application (if that). The other half is written by someone else – and that person may have read a different RFC (or a different version of the protocol design), may have had a different interpretation of ambiguous (or even completely clear) sections, or could even be out to destroy your program, your data, your company, and anyone who ever trusted your application.

Surviving in those circumstances requires an understanding of the purity of good network code.

But surely UDP is faster?

Bicycle messengers are faster than the postal service, too. Fast isn’t always what you’re looking for. When comparing UDP and TCP, if it were just a matter of “UDP is faster than TCP”, all the world’s web sites would be running on some protocol other than HTTP, because HTTP is rooted in TCP. Why don’t they?

Because UDP repeats packets, loses packets, repeats packets, and first of all, re-orders packets. And when your web delivery over UDP protocol retransmits those lost packets, correctly orders packets, drops repeated packets, and thereby gives you the full web experience without glitches, it’s re-written large chunks of the TCP stack over UDP – and done so with worse performance.
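To give a feel for how much of TCP you end up rebuilding, here’s a toy sketch of just the receive-side bookkeeping: sequence numbers, duplicate suppression, re-ordering, and working out what to ask for again. It’s my own illustration rather than any real protocol, there are no sockets involved, and the class and method names are hypothetical.

```python
class OrderedReceiver:
    """Toy bookkeeping for in-order, exactly-once delivery on top of datagrams."""

    def __init__(self) -> None:
        self.next_seq = 0      # next sequence number we can hand to the application
        self.pending = {}      # datagrams that arrived early, keyed by sequence number
        self.delivered = []    # payloads handed to the application, in order

    def receive(self, seq: int, payload: bytes) -> None:
        if seq < self.next_seq or seq in self.pending:
            return             # a repeated packet: drop it
        self.pending[seq] = payload
        # Deliver as long a run of consecutive packets as we now hold.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1

    def missing(self) -> list:
        """Sequence numbers we would have to ask the sender to retransmit."""
        if not self.pending:
            return []
        return [s for s in range(self.next_seq, max(self.pending))
                if s not in self.pending]
```

And that’s before acknowledgements, retransmission timers, flow control or congestion control, all of which TCP already does for you.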

Don’t get me wrong – UDP is useful in and of itself, just not for the same tasks TCP is useful for. UDP is great for streaming audio and video, because you’d rather drop frames or snippets of sound than wait for them to arrive later (as they would do with TCP requesting retransmission, say). If you can afford to lose a few packets here and there in the interest of timely delivery of those packets that do get through, your application protocol is ideally suited to UDP. If it’s more important to occasionally wait a little in order to get the whole stream, TCP will outperform UDP every time.

In summary…

Never choose UDP over TCP because you heard it goes faster.

Choose UDP over TCP because you’d rather have packets dropped at random by the network layer than have them arrive any later than the absolute fastest they can get there.

Choose TCP over UDP because you’d rather have all the packets that were sent, in the order that they were sent, than get most / many / some of them earlier.

And whether you use TCP or UDP, you can now add TLS-style security protection.

I await the arrival of encrypted UDP traffic with some interest.

But officer, everyone else is speeding…

I’ve been consistently amazed by human behaviours for many years, and through many employers.

One of the behaviours that always astonishes me is when I let someone know that they’re violating security policy, or simply behaving in an insecure manner, and rather than changing their behaviour or defending their own actions per se, they respond with some variation of “sure, but such-and-such team/person is already doing that and far worse”.

Maybe it’s my grammar school upbringing, in which it was clear that the response of “but Sir, Jenkins minor was also chewing gum” was not only going to get Jenkins into trouble, but also get me into more trouble (if only when Jenkins found out who snitched) – I really can’t see that there’s any appropriate response to such statements other than to say “well, thank you for drawing my attention to that other infraction, which I will decide to address at my convenience – now, back to your case…” – or perhaps less usefully, “that may be, but it’s you that I caught.”

I readily acknowledge that institutional behaviours are largely learned from the actions of one’s peers, so it is important to curb widespread culturally-ingrained wrongness.

But I don’t see what people who use this argument expect will happen – is there really a circumstance in their past in which someone said “really? Oh, well then, that’s alright. Carry on”?

MVP news

My MVP award expires on March 31

So, I’ve submitted my information for re-awarding as an MVP – we’ll see whether I’ve done enough this year to warrant being admitted again into the MVP ranks.

MVP Summit

Next week is the MVP Summit, where I visit Microsoft in Bellevue and Redmond for a week of brainwashing and meet-n-greet. I joke about this being a bit of a junket, but in reality, I get more information out of this than from most of the other conferences I’ve attended – perhaps mostly because the content is so tightly targeted.

That’s not always the case, of course – sometimes you’re scheduled to hear a talk that you’ve already heard three different times this year, but for those occasions, my advice would be to find another one that’s going on at the same time that you do want to hear. Talk to other MVPs not in your speciality, and find out what they’re attending. If you feel like you really want to get approval, ask your MVP lead if it’s OK to switch to the other session.

Very rarely a talk will be so strictly NDA-related that you will be blocked from entering, but not often.

Oh, and trade swag with other MVPs. Very frequently your fellow MVPs will be willing to trade swag that they got for their speciality for yours – or across regions. Make friends and talk to people – and don’t assume that the ‘industry luminaries’ aren’t willing to talk to you.

Featured TechNet Wiki article

Also this week comes news that I’ve been recognised for authoring the TechNet Wiki article of the Week, for my post on Microsoft’s excellent Elevation of Privilege Threat Modeling card game. Since that post was made two years ago, I’ve used the deck in a number of environments and with a few different game styles, but the goal each time has remained the same, and has been successfully met – to make developers think about the threats that their application designs are subject to, without requiring those developers to be security experts or to have any significant experience of security issues.

Changing passwords on a service, part 3

It’s been quite some time since I wrote about changing passwords on a Windows service, and then provided a simple tool written in Visual Basic to propagate a password among several systems sharing the same account.

I hinted at the time that this was a relatively naïve approach, and that the requirement to bring all the services down at the same time is perhaps not what you want to do.

So now it’s finally time for me to provide a couple of notes about how this operation could be done better.

1. If you can’t afford an outage, don’t have a single point of failure

One complaint I have heard at numerous organisations is this one, or words to this effect:

“We can’t afford to cycle the service on a password rotation once every quarter, because the service has to be up twenty-four hours a day, every day.”

That’s the sort of thing that makes novice service owners feel really important, because their service is, of course, the most valuable thing in their world, and sure enough, they may lose a little in the way of business while the service is down.

So how do you update the service when the software or OS needs patching? How do you fix bugs in your service? What happens when you have to take it down because the password has escaped your grasp? [See my previous post on rotating passwords as a kind of “Business Continuity Drill”, so that you know you can rotate the password in an emergency]

All of these activities require stopping and cycling the service.

Modern computer engineering practices have taken this into consideration, and the simplest solution is to have a ‘failover’ service – when the primary instance of the service is taken offline, the secondary instance starts up and takes over providing service. Then when the primary comes back online, the secondary can go back into slumber.

This is often extended to the idea of having a ‘pool’ of services, all running all the time, and only taking out one instance at a time as you need to make changes, bringing the instance back into operation when the change is complete.

Woah – heady stuff, Mr Jones!

Sure, but in the world of enterprise computing, this is basic scaling, and if your systems or applications can’t be managed this way, you will have problems as you reach large scale.

So, a single instance of a service that you can’t afford to take offline is a failure from the start, and an indication that you didn’t think the design through.

2. Old passwords and new passwords are both valid – for a while

OK, so that sounds like heresy – if you’ve changed the password on an account, it shouldn’t be possible for the old password to work any more, should it?

Well, yes and no.

Again, in an enterprise world, you have to consider scale.

Changing the password on an account isn’t an instantaneous operation. That password change has to be distributed among the authentication servers you use (in the Windows world, this means domain controllers replicating new password information).

To account for this, and the prospect that you may have a process running that didn’t yet have a chance to pick up the new password, most authentication schemes allow tokens and/or passwords to be valid for some period after a password change.

By default, NTLM tokens are valid for an hour, and Kerberos tickets are valid for ten hours.

This means that if you have a pool or fleet of services whose passwords need to change, you can generally take the simple process of iteratively stopping them, propagating the new password to them, and then re-starting them, without the prospect of killing the overall service that you’re providing (sure, you’ll kill any connections that are specifically tied to that one service instance, but there are other ways to handle that).

3. Even if you don’t trust that, there’s help – use two accounts

Interesting, but I can’t afford the risk that I change the password just before my token / ticket is going to expire.

Very precious of you, I’m sure.

OK, you might have a valid concern that the service startup might not be as robust as you hoped, and that you want to ensure you test the new startup of the service before allowing it to proceed and provide live service.

That’s very ‘enterprise scale’, too. There’s nothing worse than taking down a dozen servers only to find that they won’t start up again, because the startup code requires that they talk to a remote service which is currently down.

You wouldn’t believe how many systems I’ve seen where the running service is working fine, but no more can be started up because startup conditions for the service cannot be replicated any longer.

So, to allow for the prospect that you may fail on restarting your services, here’s what I want you to do:

  1. Start with a large(-ish) pool of services. [See “no single point of failure” above]
  2. All of your services are running as one user account. Create another user account, with the same rights and privileges. Make sure that access rights are provided to the group that these accounts are a member of, rather than to individual accounts.
  3. Wind down one of the services, and shut it down.
  4. Change the downed service to use the second account you just created.
  5. Start up the downed service.
  6. Monitor this newly started service to make sure it starts up successfully and is providing correct service. (Yes, this means that you have the ability to roll-back if something goes wrong)
  7. Repeat steps 3 – 6 with each of the other services in the pool in turn, until all are using the second account.
  8. De-activate / disable logon to the old user account. Do not delete it.

As you can probably imagine, when you next do this process, you don’t need to create the second user account for the server, because the first account is already there, but disabled. You can use this as the account to switch to.

This way, with the two accounts, every time a password change is required, you can just follow the steps above, and not worry.

You should be able to merge this process into your standard patching process, because the two follow similar routines – bring a service down, make a change, bring it up, check it for continued function, go to the next service, continue until all services are done.
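The shape of that loop can be sketched in a few lines of Python. This is only an in-memory model under my own assumptions: the `Instance` class, `rolling_switch` and the `healthy` callback are hypothetical stand-ins, and the real stop, reconfigure, start and health-check operations are specific to your environment.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Instance:
    name: str
    account: str
    running: bool = True


def rolling_switch(pool: List[Instance], new_account: str,
                   healthy: Callable[[Instance], bool]) -> List[str]:
    """Move instances to new_account one at a time; roll back and stop on failure.

    Returns the names of the instances that were switched successfully.
    """
    switched = []
    for inst in pool:
        old_account = inst.account
        inst.running = False            # wind the instance down
        inst.account = new_account      # change it to the other account
        inst.running = True             # start it up again
        if not healthy(inst):           # monitor before moving on
            inst.running = False        # roll back this one instance...
            inst.account = old_account
            inst.running = True
            break                       # ...and leave the rest of the pool alone
        switched.append(inst.name)
    return switched
```

Only when every instance in the pool comes back in the returned list would you go on to disable the old account.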

No excuses

So, with those techniques under your belt – and the necessary design and deployment practices to put them into place – you should be able to handle all requests to rotate passwords, as well as to handle patching of your service while it is live.

Sorry that this doesn’t come with a script to execute this behaviour, but there are some things I’m hoping you’ll be able to do for yourselves here, and the bulk of the process is specific to your environment – since it’s mostly about testing to ensure that the service is correctly functioning.