Wednesday, March 18, 2015

8 Tips - Build a Highly Available Service

Working at AWS, Citrix, Register.com, Above.net and CenturyLink has taught me a lot about availability and scale. These are the lessons I've learned over time. From infrastructure as a service to web applications, these themes will apply.


Build for failures

Failures happen all the time. Hard drives fail, systems crash, performance degrades due to congestion, the power goes out, and so on. Build a service that can handle failures at any level in the stack. No one server, app, database or employee should be able to cause your service to go offline. Test your failure modes and see how your system recovers. Better yet, use a chaos monkey to continuously test for failures.
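
As a rough illustration of testing failure modes, here is a minimal Python sketch. It is not a real chaos monkey; the replica names, `serve` and `chaos_test` are all made up for this example. It takes one replica offline at a time and checks that the service still answers.

```python
import random
from typing import Callable, Dict, List, Set


class ReplicaDown(Exception):
    """Raised by a replica that has been taken offline."""


def make_replica(name: str, down: Set[str]) -> Callable[[str], str]:
    """Build a fake replica that fails whenever its name is in `down`."""
    def handle(request: str) -> str:
        if name in down:
            raise ReplicaDown(name)
        return f"{name} served {request}"
    return handle


def serve(request: str, replicas: Dict[str, Callable[[str], str]]) -> str:
    """Try each replica in turn; fail only if every replica is down."""
    for handle in replicas.values():
        try:
            return handle(request)
        except ReplicaDown:
            continue
    raise RuntimeError("all replicas are down")


def chaos_test(replica_names: List[str], rounds: int = 20, seed: int = 0) -> bool:
    """Knock out one random replica per round and check the service survives."""
    rng = random.Random(seed)
    down: Set[str] = set()
    replicas = {name: make_replica(name, down) for name in replica_names}
    for _ in range(rounds):
        victim = rng.choice(replica_names)
        down.add(victim)
        try:
            serve("ping", replicas)
        except RuntimeError:
            return False  # a single failure took the whole service down
        finally:
            down.discard(victim)  # bring the victim back before the next round
    return True
```

With three replicas the test passes; with a single replica it fails, which is exactly the single point of failure you want the test to expose.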


Build across multiple failure domains

A failure domain consists of a set of infrastructure that can all go down due to a single event. Data centers and public cloud availability zones are examples of failure domains, as they can go down due to one event (fire, power, network, etc.). Build your service so that it actively serves customers in multiple failure domains. Test it. A simple example is to use global load balancing to route customers to multiple failure domains.
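
To make the global load balancing idea concrete, here is a small Python sketch under simplifying assumptions. `GlobalBalancer` and the domain names are hypothetical, and a real global load balancer would work at the DNS or network layer with its own health checks. The sketch rotates traffic across failure domains and skips any domain marked unhealthy.

```python
from typing import Dict, List


class GlobalBalancer:
    """Round-robin across failure domains, skipping unhealthy ones."""

    def __init__(self, domains: List[str]) -> None:
        self.health: Dict[str, bool] = {d: True for d in domains}
        self._order = list(domains)
        self._next = 0  # index of the next domain to try

    def mark(self, domain: str, healthy: bool) -> None:
        """Record the result of a health check for one failure domain."""
        self.health[domain] = healthy

    def route(self) -> str:
        """Return the next healthy failure domain for an incoming customer."""
        for _ in range(len(self._order)):
            domain = self._order[self._next]
            self._next = (self._next + 1) % len(self._order)
            if self.health[domain]:
                return domain
        raise RuntimeError("no healthy failure domain available")
```

While both domains are healthy, customers alternate between them, so every domain is actively serving traffic. When one domain is marked down, all customers are routed to the remaining one.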


Don't have real time dependencies across failure domains

Don't build a distributed system that relies on synchronous communication across failure domains to serve your customers. Instead, build systems that can independently service your customers completely within a failure domain, and make any communication between failure domains asynchronous. Having dependencies between failure domains increases the blast radius of any single outage and increases the overall likelihood of service-impacting issues. Also, there are often network instabilities between failure domains that can cause variable performance and periods of slowness for your systems and your customers. One example of this is data replication. Don't require storage writes to be replicated across failure domains before the client considers the data 'stored'. Rather, store it inside the failure domain and consider it committed. Handle any cross failure domain replication requirements asynchronously, i.e., after the fact.
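
The replication example can be sketched in a few lines of Python. This is a toy model, not a real storage system: `FailureDomainStore`, its `outbox` and the `peer_reachable` flag are names invented here, and conflict resolution between domains is left out entirely. The point is only that a write is acknowledged after the local commit, and replication drains later.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class FailureDomainStore:
    """A store that commits locally and replicates to peers after the fact."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        # Writes that are committed locally but not yet sent to a peer.
        self.outbox: Deque[Tuple[str, str]] = deque()

    def write(self, key: str, value: str) -> str:
        """Commit inside this failure domain and return immediately."""
        self.data[key] = value
        self.outbox.append((key, value))
        return "committed"

    def replicate_to(self, peer: "FailureDomainStore", peer_reachable: bool = True) -> int:
        """Drain the outbox into a peer; keep everything queued if it is unreachable."""
        if not peer_reachable:
            return 0
        sent = 0
        while self.outbox:
            key, value = self.outbox.popleft()
            peer.data[key] = value
            sent += 1
        return sent
```

If the link to the other failure domain is down, `write` still succeeds and the queued entries simply wait in the outbox until the peer is reachable again.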


Reduce your blast radius

If a single change or failure can impact 100% of your customers, your blast radius is too large. Break your system up in some way so that any single issue only impacts a portion of your customers. User partitioning, using multiple failure domains (global load balancing), rolling deployments, separate control planes, SOA and A/B testing are a few ways to accomplish this. One example is using partitioning for an email sending service. Assign groups of customers to different groups of email sending servers. If any group of servers has an issue, only a portion of your customers are impacted versus all of them.
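
Here is a minimal Python sketch of the partitioning idea from the email example. The function names and customer IDs are hypothetical, and hashing is just one possible way to assign customers to groups. Each customer is pinned to one group of sending servers, so the fraction of customers hit by a single group failure can be measured.

```python
import hashlib
from typing import List


def assign_group(customer_id: str, num_groups: int) -> int:
    """Map a customer to one server group using a stable hash."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_groups


def impacted_fraction(customers: List[str], num_groups: int, failed_group: int) -> float:
    """Fraction of customers affected when one server group has an issue."""
    impacted = [c for c in customers if assign_group(c, num_groups) == failed_group]
    return len(impacted) / len(customers)
```

With a single group, any failure impacts 100% of customers. With four groups, a failure in one group should impact roughly a quarter of them, assuming the hash spreads customers evenly.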


Reduce the level of impact

Having your service go completely down for a portion of your customers is much worse than having only a part of your service unavailable for a portion of your users. Break apart your system into smaller units. An example is user authentication. Consider having a scalable, read-only, easily replicated system for user logins, but have another system for account changes. If you need to bring down the account change system for whatever reason, your users will still be able to log in to the service.
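
The authentication split could look something like the Python sketch below. `LoginService`, `AccountChangeService` and their methods are invented for this illustration, and the in-memory dictionary stands in for a replicated read-only store. Logins only read from the replica, so they keep working while the account change system is offline.

```python
import hashlib
import hmac
import os
from typing import Dict, List, Tuple


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a password verifier (iteration count chosen for the example)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 10_000)


class LoginService:
    """Read-only login path backed by a replicated credential store."""

    def __init__(self) -> None:
        self._replica: Dict[str, Tuple[bytes, bytes]] = {}

    def load(self, username: str, salt: bytes, key: bytes) -> None:
        """Receive a credential record pushed by the account change system."""
        self._replica[username] = (salt, key)

    def login(self, username: str, password: str) -> bool:
        record = self._replica.get(username)
        if record is None:
            return False
        salt, expected = record
        return hmac.compare_digest(expected, derive_key(password, salt))


class AccountChangeService:
    """Write path for account changes; can be taken down independently."""

    def __init__(self, login_services: List[LoginService]) -> None:
        self.online = True
        self._login_services = login_services

    def set_password(self, username: str, password: str) -> None:
        if not self.online:
            raise RuntimeError("account changes are temporarily unavailable")
        salt = os.urandom(16)
        key = derive_key(password, salt)
        for service in self._login_services:
            service.load(username, salt, key)
```

Setting `online = False` on the account change service blocks password changes, but `login` never touches that service, so existing users are unaffected.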



Humans make mistakes

Humans are the reason for most service impacts. Bad code deployments, network changes with unintended consequences, copy/paste errors, unknown dependencies, typos and skill-set deficiencies are just a few examples. As the owner of a service, it is critical that you apply the appropriate level of checks, balances, tools and frameworks for the people working on your system. Prevent the inevitable lessons learned of one individual from impacting the service. Peer reviews, change reviews, system health checks and tools that reduce the manual inputs required for 'making changes' can all help reduce service impacts due to human error. The important thing here is that the sum of the things you put in place to prevent human error cannot make the overhead of making changes so high that your velocity falls to unacceptable levels. Find the right balance.
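
One way tooling can combine health checks with fewer manual steps is a rolling deployment that stops itself. The Python sketch below is only an illustration: `rolling_deploy`, the `deploy` and `is_healthy` callbacks and the server names are all hypothetical. It updates servers in small batches and halts at the first batch that fails its health check.

```python
from typing import Callable, Dict, List


def rolling_deploy(
    servers: List[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    batch_size: int = 1,
) -> Dict[str, object]:
    """Deploy batch by batch; stop at the first batch that is unhealthy."""
    updated: List[str] = []
    for start in range(0, len(servers), batch_size):
        batch = servers[start:start + batch_size]
        for server in batch:
            deploy(server)
            updated.append(server)
        # Check the batch before touching any more servers.
        if not all(is_healthy(server) for server in batch):
            return {"status": "halted", "updated": updated}
    return {"status": "complete", "updated": updated}
```

A bad change then reaches at most one batch of servers before the tool stops, instead of relying on a person to notice and abort.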


Reduce complexity

I am not a fan of the band Kiss, but I do like to keep it simple stupid. A system that is too complex is hard to maintain. Dependency tracking becomes impossible. The human mind can only grasp so much. Don't require geniuses to make successful changes on your system.


Use the ownership model

In other words, use the DevOps model. If you build it, you should also own it end to end (uptime, operations, performance, etc.). If a person feels the pain of a broken system, that person will do what is needed to stop the pain. This has the result of making uptime and system serviceability a priority rather than an afterthought.


Good luck

--chris




Friday, January 2, 2015

5 Tips - Don't Screw Up the On-Site Interview

Who is the only one that can screw up an on-site interview? You are. 

If you have successfully navigated past the phone screens and find yourself about to go on the on-site interview, chances are you are at least somewhat qualified for the job. Tech companies want to hire you. All you need to do is not screw it up by giving them a reason to not hire you. Pretty easy, eh?

Your interviewers are trying to determine whether you can do the job and whether you can succeed given their environment. Your goal is to convince them that you can by leaving them with a positive impression of you. Enlighten them by following the below tips.

Note: This post is written as if you are the interviewee. If you are in the interviewer seat, the tips below can be thought of as 'What to look for in a candidate'.

Be Passionate

Love what you do. I do. You should too. If you aren't doing something you love, stop what you're doing and do something else. People want to work with passionate people. No one wants a downer. No one wants someone just collecting a pay check. Passion is what drives action, improvement and ownership. It pushes people to do what others might think of as unattainable. Be passionate. It's contagious. It will lead to amazing things (like a job offer at a great tech company).

When I was interviewing for an engineer role many years ago, the interviewer asked me what I was passionate about. Being young and foolish, I answered, "Solving problems and girls". He asked me to elaborate. I told him how I've used technology's infinite source of problems to feed my problem solving addiction. In doing so, I laid out a few of my accomplishments and how the motivation behind them came from my passion within. I also described that I liked girls, like most boys. At the time, I was somewhat shy around women. I realized that in order to get a girlfriend I needed to change that about myself. I described how I thought of it as just another problem that needed to be solved. I then described my approach to overcoming this shyness by forcing myself to talk with girls. The passion I had drove desirable outcomes in both cases and I made that clear to the interviewer. I ended up getting the job, which I ended up loving. More importantly, my passion resulted in me meeting and marrying my wonderful wife.

Be passionate, or go collect a pay check somewhere else.

Be yourself 

Seriously. Be yourself. Proudly be yourself and show your potential new team that you are a human being. No one wants to hire a rock with good coding skills. Note: if you know of, or if you are, a rock with good coding skills, please contact me as we are hiring. Embrace what makes you unique, your strengths and your weaknesses.

Own the message by connecting 'who you are' with 'how you will succeed in this role'. Some examples: your desire to work alone allows you to go deep into complex problems, your interpersonal skills have helped those around you by creating a strong sense of teamwork, etc.

If you are a jerk, or there is something about yourself that you know to be truly bad in some way, my advice is to solve that problem and improve yourself. Your life will be better off for it. And, it will help you nail the on-site interview.

Be yourself so that they know you aren't a rock (with coding skills). Own the message of who you are. Don't screw it up by pretending.

Have Confidence

If you doubt yourself, others will too, leading to no job for you. The hiring team needs to decide, with a limited amount of data, whether or not to offer you the job. Believing in yourself will help them believe in you. Don't give them a reason to doubt you by doubting yourself.

Again, the key is to own the message. Don't let them paint the picture for you. If you have successes in your past, use them to show that your past success will help you succeed in this role. Don't let them assume it, connect the dots for them. Believe in yourself. Others will too.

If you are lacking in some area for the role, be confident that you can succeed, then prove to them why you will succeed. This will earn their trust and make them believe that you can really do it. Be up front about where you are weak, but provide thoughts on how you will overcome the issue. If you haven't thought about this and you don't have a method to overcome your weakness, you are screwing up. This shows them that you can't solve your own problems. A candidate once said to me, "Currently, I don't know how to do this job. But, I believe I can overcome this by ... " We ended up hiring him because he owned the message, demonstrated his ability to learn new things, was passionate and addressed our concerns head on. He went on to become a superstar and one of our best hires.

Believe in yourself and others will too.

Be honest 

Being caught in a lie will raise a red flag quicker than the Road Runner can say, "Beep Beep!" You want to be genuine and come off as trustworthy. This sounds simple, but be honest throughout the interview. If you don't know an answer, simply say "I don't know". Even better, admit you don't know but follow it up with your thoughts on what the answer or solution could be and why. Doing that will show your integrity while also showing off your ability to think through problems.

Very direct and common interview questions that test honesty and trust are questions about your past mistakes or your current areas of improvement. This is them lobbing up a softball pitch for you to knock out of the park. Be honest! Admit to bringing down the site. Admit that you aren't the best speller. This will prove to them that you can openly discuss when things go wrong and that you aren't full of yourself. Going deeper, if you can demonstrate that you have learned from your previous mistakes and that you took action to improve things, you will show them the passion and drive you have to improve yourself and those around you.

Note: I'd like to point out that some things are better left unsaid. Discretion and situational awareness are key everywhere, but especially key during interviews. I sometimes have problems with this myself and I overshare information (like how I answered that one of my passions was girls). Make up your own mind on what to say or not say. However, err on the side of honesty and openness.

Be able to get stuff done

Be able to get stuff done. To do that, you need to have the required skill-sets, both technical and non-technical. Either you are an expert at something or you are on your way to becoming one. There is no other option in which you will get an offer. If you are an expert, awesome. The only way you can screw it up is if you are in a passionless death spiral to the land of irrelevance. If you aren't an expert (few of us are), or if you are a newbie, you should be in learning mode. Most people have the opportunity to learn regardless of where they are in life. Seek out new assignments at work, use your free time, take a class or read a book. Don't solely focus on your primary skill-set (like coding). Soft skills are equally important to be good at and to improve. Always be learning. Be able to articulate what you have learned recently.

Note, ability isn't having knowledge. Rather, it's having the hard and soft skill-sets required to accomplish something. This is what makes or breaks a lot of interviews. Make sure you have, and have demonstrated in the past, the ability to accomplish things that are relevant to your new role. This is super important. I've seen countless college graduates come out of school without the ability to achieve something. Or, I've seen people with years of work experience require someone to hold their hand the entire way. Don't be like these people. Know how to get stuff done.

Let me share with you a story about a candidate who knew nothing, but could do anything. I once had a network engineer candidate bring a ridiculously large notebook to his on-site interview. This thing contained every bit of reference data a network engineer could ever possibly need. During the interview, for almost every question, he referenced the notebook in order to answer the questions that could be solved directly with knowledge. We didn't know if he did this out of habit or necessity. Given this fact, the surprising thing to us was that he was great at applying the technical information he had in order to solve problems and delight customers. We discussed these traits, which on the surface seem contradictory, at great length in the debrief. The outcome was that we gave him a very strong offer, but he unfortunately turned it down to go work for Google. Goes to show, if you can get stuff done, that's all that matters.

Be able to demonstrate that you can 'get things done' given the knowledge that is available to you.

Have Common Sense

Don't be a jerk. Some people have common sense, others don't. I myself sometimes lack in this department.

Before, during and after an interview, be on your best behavior. Do the little things well. Be polite, on time, respectful and show gratitude. Research the company, the people, the product and the market. Show a sense of ownership for yourself and everything around you. Don't be a know-it-all, too aggressive or smelly. The sum of all these little, common sense things, can tip the decision one way or another.

I once saw someone show up 15 minutes late to both of his two on-site interviews. He didn't get the job. Another time, I saw how a candidate's simple follow-up email (which showed gratitude and passion) resulted in pushing a hiring manager's vote from no-hire to hire.

Do the little things well.

In summary

Don't screw it up.


Good luck,
Chris

PS, my company is hiring for lots of roles. Go check out our current openings at http://www.centurylinkcloud.com/careers

PPS, yes, there were 6 tips. Cliche, I know.

Monday, April 2, 2012

Infrastructure Internet Markets - Then, now and next.

The garden of markets

The internet is like a garden. Things grow, as long as you water them. In fact, the internet is really like a continuously expanding cluster of gardens, each with a different environment (aka market). Sales, marketing, finance, communication, and everything in between.

What I'll be talking about here are the infrastructure markets that grew to meet the needs of the internet.


Some new markets (gardens) of the past ...  


Connectivity, Dialup and beyond

First, there was the need for people and companies to be connected to the Internet. An untold number of ISPs (internet service providers) were created to meet this need. They ranged from the size of your local mom and pop shop all the way to the size of AOL. Hundreds, thousands, hundreds of thousands of these companies were created worldwide in order to meet the demand of humans wanting to communicate with one another. The ISP market will always exist and it will continue to grow and evolve. The speed of our connections will increase and, along with it, what we do on the internet will change as well.

Web Hosting. Anyone remember GeoCities?

How many web hosting companies have existed in the last 15 years? A lot. If companies wanted to get on the internet, the most common way to do that at the time was to pay a web hosting company to manage the server and internet connectivity for you. With this method, all you had to do was worry about creating and maintaining the content (which was a whole other market).


Today's new garden is... IaaS and PaaS

Infrastructure as a Service (IaaS)

Remember the ISP and web hosting markets exploding? Well, IaaS and PaaS are next! It is now!


When Amazon Web Services came out, it caused somewhat of a disruption in the industry. Why was this such a big event? Well, it was the first time internet infrastructure could be programmatically controlled (can anyone say API). So simple, yet so game changing. Infrastructure as a service (IaaS) is really just web operations (data center, servers, storage, network) with an API in front of it. The benefits are huge to both internet based companies and traditional enterprise IT organizations.

We are now in a gold rush. Everyone under the sun wants to offer an IaaS solution. No one wants to be left behind. ISPs, data center providers, enterprises, web hosting companies, governments, education, and even individuals are all getting on the boat. All of these people are interested in providing IaaS features to their clients, both publicly and privately.

There are now CMS (cloud management solutions) that allow these organizations to create an IaaS solution without investing in writing their own code. OpenStack, CloudStack, and Eucalyptus seem to be the leading options available today. An organization can have an in-house proof of concept up and running in a matter of hours. A production ready system can be ready in a few months.

Platform as a Service (PaaS)

This is up a layer from IaaS. Instead of offering raw infrastructure, it offers more of an application platform to its users. Someone only needs to upload their application, i.e. source code, and the PaaS will run and scale their solution as required.

Salesforce, Engine Yard, AppEngine, and Azure are a few of the leading public PaaS providers. Now, there are also PaaS software solutions that will allow people to create their own PaaS. OpenShift and Cloud Foundry are a couple in this category.

If I were you

I would get into this scene today. Five years from now, no one will be provisioning infrastructure manually anymore.

 

 

What Next?

Obviously, more growth in IaaS and PaaS is coming. Software and solutions to help secure, monitor, provision and deploy all this stuff are currently being developed. But, what is the next garden? What is the next culture shift?

If I knew the answer to this, I probably wouldn't be talking publicly about it. The point is, some new internet garden will arise, and we will all find ourselves shifting onto it.

Do you have a guess? Do you know?

Thanks
Chris