The importance of DevOps when doing a data center consolidation or a cloud migration

Introduction

Currently a lot of companies are going through the process of Data Center consolidation and/or Cloud migration of their IT systems. This generally aligns with the goals of cutting costs, becoming more agile in adjusting to business needs, providing faster provisioning, better management, and so on.

One of the things generally considered first is the “low-hanging fruit”, which in this case is usually the infrastructure pieces, leaving all the applications as they are. This is a good way to start, but it can also create other issues, like “fat fingers”: with better provisioning and self-service IT systems everyone now has the ability to create new instances, and suddenly you lose control again over what exactly is in your data center or in your cloud. You start to have questions about the governance of the process, because people tend to remember to provision new instances to test something but also tend to forget to release them when they are done. This actually increases costs and requires a very good governance process.

This means that going only for the infrastructure is not a bad thing, exactly the opposite, but it requires that you perform the correct steps and put in place the correct processes to make sure the end goal you were trying to achieve is protected. For that you need to think beyond the infrastructure and use that time and project to also perform a mindset change from IT Operations to DevOps, removing completely the distance between Development and IT Operations (Monitoring, Management, Support and so on).

Agility and DevOps

DevOps has been an emerging trend which promises to deliver value to the customer more quickly, which at the same time means more agility in the full process.

But what is DevOps?

As with any other new trend, everybody seems to have a different idea of what it is.

Here are some typical ideas I’ve heard:

  • DevOps is solely about development and operations collaborating and bridging that existing gap.
  • DevOps is only about providing automation.
  • DevOps is about doing small and frequent deployments.
  • DevOps is just Kanban for Operations.

All of those have some truth in them, but in my opinion DevOps is based on the following three important things:

  • People

It’s about a different mindset when approaching the whole process, from development to operations. It’s about becoming a real team, breaking down the silos and dropping the typical expressions like “The code is Code Complete, so now that’s a problem for Operations”.

Actually it’s all about providing a shared goal for the whole team involved in a specific project, because now their goal isn’t whether the code is complete or not, nor the number of tickets solved in a given time. It’s really about the business KPIs this project needs to drive, so everyone is in the same boat. Developers are now interested in code quality and the number of defects, because those will affect the business KPIs and so their targets. Operations are really interested in participating in all the steps of the project, and even during development understanding where things are, which technologies are being used, and what telemetry is being provided by the solution, so they can better plan for it.

The bottom line is that it’s about shared business goals for the whole team and a different mindset or culture.

  • Processes

DevOps is all about finding new ways to improve efficiency and better support the business.

All the processes are focused on how to deliver better business value to the customer more quickly. How to use the right set of processes in order to improve the quality and speed of everything that is built, operated and supported. How to actually feed all the feedback and patterns seen during the operations cycle back into the product backlog so that they can be quickly addressed and deliver more value.

This means it’s all about delivering better processes to support the different mindset or culture.

  • Tools

This is a very important part because tools are a way to provide better support to the processes and the people. They allow people to be more productive, to understand more quickly what the best way to achieve the goals is, and to keep measuring how all the processes are being used and how they track against the real goals, which are making the business successful.

 

So what does this mean for the Agility and DevOps relationship?

This means Agility and DevOps are actually related, because having agility is not only about using Scrum/XP or any other agile methodology. It’s also about changing the culture, which in this case should have started already when agile methodologies were adopted. (You would be surprised how many times this isn’t actually the case and you have Scrum being run by a project manager in a waterfall way. An example is when someone has a sprint called “Stabilization sprint” or “Performance Tuning sprint”. Why are they needed? Wasn’t there a Definition of Done (DoD) in the first place?)

Importance of DevOps for DC Consolidation or Cloud Migration

Having said that, it should already be pretty obvious why it is important to think about DevOps when doing a Data Center Consolidation or a Cloud Migration. The reason is actually simple: you don’t want to make the same mistakes again and be in the same position 1-3 years from now.

When doing a Data Center Consolidation or a Cloud Migration it is important to look at the infrastructure, but also at:

  • The applications it is supporting
  • People’s mindset and culture
  • Processes and tools
  • Key business success metrics and goals

It’s also sometimes important to have someone with a different mindset look at the plans, since too often I see people who have been doing the exact same thing for several years being the ones directing these projects. It’s not that they aren’t important, it’s just that over time the view of a specific problem tends to become tunnel vision, and breaking that is very important.

Summary

So in summary, Data Center Consolidation or Cloud Migration isn’t just an operations problem; it’s actually something which is potentially going to impact the whole organization, and for that reason it is a very good time to think about people’s mindset and culture, processes and tools, because you really don’t want to be doing this process over and over again. To achieve this, DevOps is definitely the way to go: breaking the old mentalities, embracing business agility and creating shared goals for the teams, so that what really matters is supporting the business and not the underlying technologies and infrastructure.

In future posts I will be talking about some tools which can help implement DevOps processes better and provide the required visibility so that those goals are achieved. There is something I won’t be able to help with, though, and that is changing the mindset and culture of people, because that is something each person needs to understand and embrace. Having said that, my goal is to show how important this is and how this trend isn’t just another “buzzword” which someone made up, but really something which can make you and your company more successful.

Why do #Cloud Strategies fail? Applicable to #WindowsAzure, #AWS or #PrivateCloud


Introduction


Lately I’ve been doing a lot of Cloud Strategies for companies of different sizes, from small businesses to large enterprises with multi-billion dollar revenue. By doing that I’ve found a lot of common mistakes which make Cloud Strategies fail, and in some cases fail badly. Because of this I decided to write a bit about what I think are some of the important reasons for those failures, so the “ride to the cloud” can become smoother.


Common Cloud Strategy Mistakes


So here’s a small list of some of the things which can destroy any Cloud Strategy:

Overlooking the Business Requirements


When talking about the Cloud you see that almost everyone starts by saying that it will cut IT costs, but they completely forget to look at the business side of things and how the business will be impacted by it. The Cloud is an excellent opportunity to perform a complete Enterprise Portfolio Assessment, since in order to make a really great Cloud Strategy you need to understand the Business, Software, Data and Infrastructure of your company. You should also understand the real issues that the business wants to solve, and not just throw your IT problems onto another platform and expect that everything goes well; normally it doesn’t. One of the most typical problems is that this oversight generates a massive decrease in business continuity and agility and a failure in the strategy costing, and the reason is simple: there wasn’t a good transition plan and roadmap from the “As-Is” state to the “To-Be” state.

Other typical issues are:
  • Building a Cloud Strategy without a business case is a sure way to fail. The Business Case is important because we’re talking about a business transformation, and it’s important to understand and plan every step we take.
  • Selecting the wrong business domain is another one, since people normally think they should take the “lowest hanging fruit” but fail to understand that it is actually bound to a business process that might change with that approach.
  • Selecting mission-critical systems as the first systems to migrate to the Cloud is a great way to get things wrong. We need to understand that the first time we move something it’s not always going to go well; as all the books say, there will be issues to solve, risks to mitigate and so on. Choosing a mission-critical system is generally a huge risk, because if something fails, even as a simple perception, the full Cloud Strategy will fall apart.
  • Selecting legacy systems as the first systems to migrate to the Cloud is another good way to fail. There is a reason why systems generally need to be re-architected when moving to the Cloud; it’s not only in horror stories. Most legacy systems aren’t built for high availability, scalability or reliability, so any outage or brownout that happens will destroy the solution.

 

Lack of expectation management and blindly following magazines or opinion makers


A large number of companies still just follow what magazines or opinion makers write about the cloud, completely forgetting how it needs to be applied in their specific reality. This normally ends really badly because, like anything else, the Cloud isn’t the “silver bullet” that is going to solve all the problems in the world from Business to IT; it’s just another tool in our belt to make our business better. Failing to understand this is generally a very good indication that your Cloud Strategy will fail, since if you place massive unmanaged expectations on top of the strategy you’re also setting it up for failure right from the start.

Other important issues are:
  • Before you choose to follow something, be sure to prove the claims that you’ve read or heard. Do a Proof-of-Concept, search for other opinions, get recommendations from other companies in the industry. Don’t believe everything you hear and read; your company works in a specific way, so the strategy should be done with that in mind.
  • Lack of a Strategic Plan and Architecture definition (Enterprise and Solution Architecture) is a great way to fail, since without the strategic plan defining what is going to be done, when, by whom and what gets affected, you can’t mitigate any risks in the process. Also, not having a clear architecture definition increases the odds of failure, because it is the way the strategy is going to be implemented and the way you’ll understand the interdependencies and which potentially new skill sets you’ll need to have.

Embracing Technology just because it’s new


Technology is really great and cool, but everything needs to be tested, validated and proven in the right context. Technology is there to help the business, not to make it harder. There needs to be a compelling reason for you to move to a new technology. I’m not saying you should stay with old technologies, or even, as some say, that you should stay one version behind in order to be successful. I like technology and love going with the new things, but I also like to prove them and understand how they can help the specific business problems we want to solve. So you’ll need to really look at it to make sure it fits all the business, technical and management requirements. Avoid being a technology fanatic; it never works.

Other important issues:
  • Avoid doing things just based on a book you read. Experience how the technology helps, how it “feels”, which new skills you’ll need, whether it is going to give you the right telemetry information, and so on.
  • Overlooking the full solution requirements is another good way to miss it; you need to understand the big picture before you actually select the technology, because that is one of the major drivers for that choice.

Overlooking the importance of the Enterprise Architecture


I’ve been faced with several situations where companies tend to look only at a specific solution and not at how it will affect the full system. It’s critical to look at the Enterprise Architecture because it is what gives us the complete view of the system, from Business and Solution to Data and Infrastructure. Avoid thinking that the cloud will solve all the issues you’ve had until today without a real strategy, or that “agile” is just a way to avoid looking at the strategy in detail.

Overlooking potential outages and brownouts


The Cloud has outages and brownouts; there’s no reason to hide it. With the Cloud the hardware didn’t stop failing, we just got heavy automation and self-healing systems managing the huge data centers which make up the Cloud. I’ve seen some corporations create a Cloud Strategy looking at Enterprise Architecture, with a Business Case and everything else we discussed so far, but they forgot that even in the cloud the hardware fails, and so they would need to have fault tolerance in mind, business continuity plans, run-books/playbooks, and so on. Without those your business will be hit.

 

Summary


So as you can understand, the journey to the Cloud is something you should do with a real strategy, like any other decision you make in your company. This strategy should cover everything in your company from Business to Technology, since without that it will be doomed to failure, and this is true independently of the Cloud vendor you use, from Microsoft Windows Azure to Amazon AWS or even a Private Cloud. If you don’t have a strategy you won’t achieve your goals. It’s that simple.

#WindowsAzure ExpressRoute and the importance for Enterprises

Windows Azure just released a new set of features and updates as it was published by Scott Guthrie on his blog post “Azure: ExpressRoute Dedicated Networking, Web Site Backup Restore, Mobile Services .NET support, Hadoop 2.2, and more”.

From this announcement I would like to start focusing on the ExpressRoute Dedicated Networking features.

For a lot of years now I’ve been working with enterprises, helping them become more agile, increase revenue, reduce costs, and so on, and of course for about 7-8 years now doing that leveraging the cloud, helping enterprises create the best possible strategy to take advantage of the public cloud.

While doing this a lot of discussions happen, from the ones that say everything needs to go into the cloud, to the ones that say nothing can or should go into the public cloud. I’m in the group that says everything needs to be seen in the specific context of the company you’re talking about.

My experience tells me that currently there is no way an enterprise is going to deploy everything solely on the public cloud, not even solely on a single cloud provider, since that would be “putting all your eggs in one basket”. What we need to understand is that enterprises have a lot of systems that will remain On-Premises for a few more years, because laws need to change or legacy systems need to be rebuilt, so until then Hybrid is the way to go in the enterprise space.

Now that we’ve understood that Hybrid is the right way to go at this stage for enterprises, we also need to understand that one of the common misunderstandings about the public cloud is that it is a complete black box which we can’t control. This isn’t actually true, since providers give us ways to connect On-Premises environments and Public Cloud providers; in Windows Azure’s case this has meant, until now, leveraging Windows Azure Virtual Network and a Site-to-Site VPN. With this possibility Windows Azure provided a way for enterprises to leverage the real power of the public cloud and still be in control and secure, but this was an IPsec tunnel going over the Internet, which can have significant impacts on the performance, quality and cost of the system. Let’s dig deeper into those three:

  1. Performance
    • Since we’re talking about the Internet, we’re talking about a lot of different hops needed to go from On-Premises to the Windows Azure Data Center, which means our latency goes up and our performance goes significantly down.
    • This is actually the reason why, in some cases, we create shadow copies of the On-Premises data used by the solutions deployed in Windows Azure, so we get lower latency and affect performance less, but this isn’t possible every time.
  2. Quality
    • Again, due to the internet-based connection we will have QoS issues, because there is rarely a great SLA around our internet connections and there is no way for us to massively improve their quality.
  3. Cost
    • One thing that usually happens in these situations is that, in order not to affect the company’s regular internet connectivity, we get a new dedicated internet “pipe”, which separates this VPN connection traffic from all other traffic and also allows us to have much better QoS on it. This comes with a huge price bump, without even considering the effort of getting everything connected, all the routing done, and so on.

 

With the announcement of the ExpressRoute functionality in Windows Azure, enterprises will definitely have their life a lot easier and much more secure, since “ExpressRoute enables dedicated, private, high-throughput network connectivity between Azure datacenters and your on-premises IT environments. Using ExpressRoute, you can connect your existing datacenters to Azure without having to flow any traffic over the public Internet, and enable guaranteed network quality-of-service and the ability to use Azure as a natural extension of an existing private network or datacenter.”

Now enterprises will be able to establish dedicated connections either through Equinix datacenters, and in the future also Level 3 since that partnership was also part of the announcement, or by leveraging their MPLS VPN, provided currently in the US by AT&T.

In summary, this brings Windows Azure and Enterprises much closer together, with much better quality and a better strategy for the future.

Windows Azure Storage Performance Best Practices

Windows Azure Storage is a very important part of Windows Azure, and most applications leverage it for several different things, from storing files in blobs, to data in tables, or even messages in queues. Those are all very interesting services provided by Windows Azure, but there are some performance best practices you can use in order to make your solutions even better.

In order to help you do this and speed up your learning process, I decided to share some of the best practices you can use to achieve this.

Here is a list of those best practices:

1. Turn off Nagling and Expect100 on the ServicePoint Manager

By now you might be wondering what Nagling and Expect100 are. Let me help you better understand them.

1.1. Nagling

“The Nagle algorithm is used to reduce network traffic by buffering small packets of data and transmitting them as a single packet. This process is also referred to as "nagling"; it is widely used because it reduces the number of packets transmitted and lowers the overhead per packet.”

So, now that we understand the Nagle algorithm, should we turn it off?

Nagle is great for big messages and when you don’t care about latency but really about optimizing the protocol and what is sent over the wire. For small messages, or when you really want to send something immediately, the Nagle algorithm creates overhead since it delays the sending of the data. Windows Azure Storage requests, especially Table and Queue operations, tend to be small, which is why turning it off usually improves latency.

1.2. Expect100

“When this property is set to true, 100-Continue behavior is used. Client requests that use the PUT and POST methods will add an Expect header to the request if the Expect100Continue property is true and ContentLength property is greater than zero or the SendChunked property is true. The client will expect to receive a 100-Continue response from the server to indicate that the client should send the data to be posted. This mechanism allows clients to avoid sending large amounts of data over the network when the server, based on the request headers, intends to reject the request.” from MSDN

In order to turn these two off there are two ways:

   // Disables Nagle and Expect100 for all the endpoints (Table/Blob/Queue)
   ServicePointManager.Expect100Continue = false;
   ServicePointManager.UseNagleAlgorithm = false;

   // Disables them for only the Table endpoint
   var tableServicePoint = ServicePointManager.FindServicePoint(account.TableEndpoint);
   tableServicePoint.UseNagleAlgorithm = false;
   tableServicePoint.Expect100Continue = false;

Make sure this is done before creating the client, otherwise it won’t have any effect on performance. This means before you call one of these:

    account.CreateCloudTableClient();
    account.CreateCloudQueueClient();
    account.CreateCloudBlobClient();

2. Turn off the Proxy Auto Detection

By default the proxy auto-detection is on, which means that in Windows Azure each connection will take a bit more time, since the runtime still needs to resolve the proxy for every request. For that reason it is important for you to turn it off.

For that you should make the following change in the web.config / app.config file of your solution (the <defaultProxy> element lives inside the <system.net> section):

<system.net>
  <defaultProxy>
    <proxy bypassonlocal="True" usesystemdefault="False" />
  </defaultProxy>
</system.net>
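If you’d rather control this from code instead of configuration, a minimal sketch (my assumption, not part of the original guidance) is to clear the default proxy before the first request goes out:

   using System.Net;

   // Hypothetical code-based alternative: with no default proxy configured,
   // outgoing requests skip proxy detection entirely.
   WebRequest.DefaultWebProxy = null;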




3. Adjust the DefaultConnectionLimit value of the ServicePointManager class



“The DefaultConnectionLimit property sets the default maximum number of concurrent connections that the ServicePointManager object assigns to the ConnectionLimit property when creating ServicePoint objects.” from MSDN



In order to optimize your default connection limit you first need to understand the conditions under which the application actually runs. The best way to do this is by running performance tests with several different values and then analyzing the results.



ServicePointManager.DefaultConnectionLimit = 100;
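Putting the three practices together, here is a minimal sketch of how they might be wired up, assuming the Windows Azure Storage client library and a placeholder connection string (both are illustrative, so adapt them to your own solution):

    using System.Net;
    using Microsoft.WindowsAzure.Storage;

    // Placeholder connection string; replace with your own account credentials.
    var account = CloudStorageAccount.Parse(
        "DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=mykey");

    // Raise the connection limit first so that new service points pick it up.
    ServicePointManager.DefaultConnectionLimit = 100;

    // Turn off Nagle and Expect100 for the Table endpoint before any client exists.
    var tableServicePoint = ServicePointManager.FindServicePoint(account.TableEndpoint);
    tableServicePoint.UseNagleAlgorithm = false;
    tableServicePoint.Expect100Continue = false;

    // Only now create the client, so every request benefits from the settings above.
    var tableClient = account.CreateCloudTableClient();

The important detail is the ordering: these settings only affect connections created after they are applied, which is why they need to run before the first storage client call.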


Hope this helps you the way it helped me.

Lessons Learned Building Secure and Compliant solutions in Windows Azure

In July I decided to create a series of three posts about this topic.

In this post I’ll be focusing on the last part, which is the lessons learned.

Quick Concepts

When we think about compliance and security there are two concepts we need to consider and master. Those concepts are Data in Transit and Data at Rest. But what is this all about?

Data at Rest

This refers to inactive data which is stored physically in any digital form (e.g. databases, data warehouses, spreadsheets, archives, tapes, off-site backups, mobile devices, etc.). In addition, subsets of data can often be found in log files, application files, configuration files, and many other places.

Basically you can think of this as data which is stored in a place where it can be retrieved even after a restart.

Data in Transit

This is commonly delineated into two primary categories:

  • Data that is moving across public or “untrusted” networks such as the Internet
  • Data that is moving within the confines of private networks such as corporate Local Area Networks (LANs)

When working with compliant solutions you always need to take these two into consideration, because they will be the two topics that the compliance requirements focus on.

Lessons Learned

In order to make sure that the solution is “acceptable” from a Data Privacy & Compliance perspective, I normally use the following process, which I would like to share with you.

  1. Perform an assessment of the organizational structure in order to understand where the business is being conducted, and which laws and compliance requirements apply.
    • This is extremely important because if we work in the Betting & Gaming industry we might find that companies are located in one place but have their gateways in a different one, like Malta, Gibraltar and so on. By understanding this we will be able to understand exactly which compliance requirements should be followed and which ones we can ignore.
    • The same thing applies, for example, to the Healthcare industry, where you have HIPAA compliance; it is important to understand where the company that builds the product is located, as well as doing the same for their customers, since different countries will have different compliance requirements.
  2. Understand in which countries both the customer and the software vendor are located. This will help you understand which rules apply to that specific organization and plan for that.
  3. Identify the specific data you need to encrypt or you need to avoid moving into the cloud because of compliance issues.
    • This is an extremely complex exercise because you can’t say at a high level that all the data can or can’t go to the cloud; you need to dig into the compliance requirements and understand exactly which fields can’t go.
    • For example, in the Healthcare industry you have HIPAA compliance to comply with, but you also have to work with both PII (Personally Identifiable Information) and PHI (Protected Health Information), which can’t be in the Cloud at this stage. So normally you hear people saying immediately that such an application cannot move into the cloud. That isn’t actually true. If you analyze the PHI and PII details you will see that the health information can be anywhere as long as it is not possible to match which person that information is related to. If you look at it, this isn’t actually that hard to do. You can anonymize the data and place “Patient A” and the full health history in the cloud, do the necessary processing, and then just send the information back on-premises, where you have a small table that correlates “Patient A” with the real patient information that doctors can work with (see the sketch after this list).
  4. After understanding everything that is required in terms of requirements and the compliance rules applicable to the solution, you need to look at where your Data at Rest is currently being stored inside your customer’s data center.
    • Databases
    • File Servers
    • Email Systems
    • Backup Media
    • NAS
  5. Now you should locate your Data in Transit across the network channels, both internal and external. You should:
    • Assess the data trajectory
    • Assess how data is being transferred between the different elements of the network
  6. Decide how to handle Sensitive Data. There are several options you might take to handle this data.
    • Eradication
    • Obfuscation/Anonymization
    • Encryption
    • Note: Normally we go more with the Encryption option, but anonymization is also really important and in some cases the only way to go. For example, look at PII and PHI: anonymization would be the way to go there.
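To make the anonymization idea from step 3 a bit more concrete, here is a minimal sketch (hypothetical types and names, not taken from any real solution): the cloud only ever receives an opaque token like “Patient A” plus the health history, while the table that maps tokens back to real patients stays on-premises.

    using System;
    using System.Collections.Generic;

    public class OnPremisesPatientMap
    {
        // This mapping table never leaves the on-premises environment.
        private readonly Dictionary<string, string> tokenToPatientId =
            new Dictionary<string, string>();

        // Replace the identifying fields with a random token before the record goes to the cloud.
        public string Anonymize(string patientId)
        {
            string token = Guid.NewGuid().ToString("N");
            tokenToPatientId[token] = patientId;
            return token; // only the token and the anonymized health data leave the building
        }

        // When the processed results come back, resolve the token locally so doctors
        // can see which real patient the information belongs to.
        public string ResolvePatientId(string token)
        {
            return tokenToPatientId[token];
        }
    }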

 

If you follow this simple process you will definitely be successful in identifying what needs to be handled and how it needs to be handled, and you will make your compliant solutions able to move to Windows Azure.

Hope this helps and please share your feedback so I can better target the posts.
