Wednesday, December 2, 2015

Cloud Services Redone - Part 1 of 2

Disruption in Open Source Delivery, the Open Source Service 

What if we took the positives of open sourced software and merged them with the positives of cloud services like DynamoDB, BigTable, Kinesis, Lambda and S3? We would have a disruptive service delivery model.

In this two-part post, I'll talk about two new approaches to delivering software as a service so that engineers can build better applications. I'll call one model Open Sourced Services (Part 1) and the other Managed Anywhere Services (the to-be-written Part 2).

These posts will mostly describe a vision I have, but will also include some back story and some technical information. The intended audience is anyone building or using applications. 

Where are we today?

Today's Cloud Services 

Today's application-level cloud services, like AWS's Kinesis or Google's Bigtable, provide engineers fully managed services on top of which they can build applications, with no effort required to maintain them. These services are secure, autoscaled, monitored, logged, integrated and updated. Engineers can focus on their particular applications instead of worrying about things like mysql cluster management or file replication. Personally, I love these services. However, they lock you into a single cloud vendor, and you lose control of the functionality of these services. What if we could get the benefit of services without the lock-in?

Today's Open Source 

Open source software has many benefits. The source code is open, which gives you increased control. The software is free, which helps drive adoption and potentially lowers cost. It gives back to the community by enabling the free re-use of software. And it frees you from cloud lock-in, as you can easily move your solution anywhere. However, open source software is normally delivered as a software package that someone needs to install, monitor, configure, scale and upgrade before it can be used in production. Each company often does all this in a different way, and their work isn't reusable by others. Isn't that a shame? What if we could take the increased control, the re-use of code, the freedom from cloud lock-in and the free price tag of open source software, but deliver it as a service?

Well, what do we want? 

The three main items that stick out are:
  • services
  • control 
  • portability (no lock in)
Below, I will talk about one way we can get there. 

Model #1 - Open Sourced Services

An open source service is the packaging and delivery of open source software such that it is delivered as a service. An open source service should have these characteristics:

It is consumed as a service, not as software - It provides a function to the consumer in a clearly consumable way. For example, a NoSQL service provides a way to insert and retrieve data. The service heals itself. It scales when required. It provides metrics, logs and alerts to the consumer. It is highly available. The service installs as easily as running a command or clicking a button.

It is controlled as a service, not as software - The consumer takes action on the service itself rather than on its subcomponents. They can initiate upgrades of the service. They can start, stop or restart the service as needed. The service should expose configuration knobs so that the consumer can control the level of redundancy, user access, scaling parameters (auto scaling, infrastructure thresholds, etc) and capacity.

It can run anywhere - The service should be able to be deployed anywhere: in a public cloud or in your company's data center.

It is open - The source code of the software and the declarative definition of how the software is wrapped as a service should both be open sourced. This gives the consumer of the service the control they desire.

How do we get to Open Sourced Services? 

Until recently, the creators of open source software have not had an easy and unified way to package their software so that it meets the above criteria. Thanks to the disruptive technology of containers, such as Docker, and orchestration systems (such as Kubernetes, Swarm and Mesosphere), I believe we now have a platform on which we can build our open sourced services.

Note: I'll be using Docker and Kubernetes in the rest of this article, but you should know that there are alternatives (Swarm, Mesosphere, rkt, OCI, etc). 

Docker enables us to easily build, ship and run software by packaging it in a way that runs on a wide range of systems. If open source groups take their code and package it inside containers, they will have the building blocks required for turning their software into a service (that can run anywhere).

Once the software is packaged in Docker containers, we can turn it into services using Kubernetes's features. Kubernetes's goal is to take a set of infrastructure (that lives anywhere) and provide an API to create services on top of that infrastructure. It includes features like load balancing, scaling, rolling updates, service discovery, namespaces, APIs, scheduling and configurable infrastructure threshold limits. Using these features, we can create declarative services and give control of the service to the consumer. Kubernetes helps groups deliver on the above characteristics of an open source service.

Creators of open source software now have a way to package their software and deliver it as a service. Sweet! So, how would we consume these new services?

How will we consume open sourced services?

First, consumers will need one or more kubernetes clusters. Getting kubernetes up and running can take as little as 5 minutes or as long as a few days, depending on your situation. Kubernetes supports most public clouds, OpenStack clouds and on-premise data centers. Consumers can also choose whether to run on top of VMs or physical machines. They can run kubernetes themselves, or use a managed option if they are looking for increased support. Notice that the consumer is in complete control of where their applications will run, freed from any lock-in. Thanks to a common API that works in any infrastructure environment, consumers can use these clusters to deliver on a multiple cloud provider strategy, or they can create a true hybrid strategy.

Second, consumers would need to choose and install some open source services. For example, let's say they want a NoSQL service for their application. Previously, their choices were using something like AWS's DynamoDB (with AWS lock-in) or engineering their own Cassandra cluster with a lot of blood and sweat. With open source services, all consumers need to do is install the cassandra service on their kubernetes cluster of choice. They might do this with an easy command-line tool ('kubernetes install service cassandra') or perhaps by logging into an app store and clicking on the cassandra service. Consumers will want to control the service by setting some parameters, like capacity limits and redundancy options, so that it meets their needs. After that, the consumer simply starts using the service and keeps an eye on all the fun metrics and logs it provides.
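To make those control knobs concrete, a declarative definition for such a service could expose redundancy and capacity settings directly to the consumer. This is a hypothetical sketch (the service name, image and field values are invented for illustration), not an existing cassandra package:

```yaml
# Hypothetical open sourced service definition for a cassandra service.
apiVersion: v1
kind: ReplicationController
metadata:
  name: cassandra-service
spec:
  replicas: 3                  # redundancy knob: how many cassandra nodes
  selector:
    app: cassandra-service
  template:
    metadata:
      labels:
        app: cassandra-service
    spec:
      containers:
      - name: cassandra
        image: cassandra
        resources:
          limits:
            cpu: "2"           # capacity knob per node
            memory: 4Gi
        ports:
        - containerPort: 9042  # CQL port consumers connect to
```

The consumer tunes `replicas` and the resource limits to fit their needs, then lets the cluster keep the service at that declared state.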

In this new world, consumers will have more control of their applications and the services they depend on. They can move their applications between clouds, or between private and public. Consumers can run their applications actively across multiple clusters or do active-standby. All this is sweetness.

In summary

With open source services there are lots of winners: Developers get to build their applications the way they want while getting the control they need. Tech execs get freedom from cloud lock-in and a path to a true hybrid solution. Public cloud providers get to continue to sell infrastructure. Open source projects get a way to easily package their software as a service, which increases their users' perceived value. Entrepreneurs get a brand new gold rush, as there are plenty of challenges and opportunities.

Up next - Model #2 - Managed Anywhere Services

One of the downsides to open sourced services is that sometimes there is no one to call when things go wrong. In my next post I'll write about an alternative approach that aims to solve that problem and opens up a new way for companies to make $$$ by delivering these services.





Wednesday, August 26, 2015

Real-time Data Distribution with Apache Kafka

I wrote a blog post about CenturyLink Cloud's usage of Kafka for CenturyLink Cloud's website. That post, in its full content, can be found below. Thanks. Chris

Real Time Data Distribution with Apache Kafka

Like most enterprises and service providers, here at CenturyLink Cloud we aim to please our customers by making data-based decisions. As our business grew, so did the amount of data we collect. The effort required to distill, distribute and analyze that data in meaningful ways became increasingly strenuous. In short, we needed to become faster with our analytics. If your organization is becoming overwhelmed with data, read on; I’ll share with you how we used Apache Kafka to solve the following challenges with collecting and distributing data at scale.

  • Efficient access to data
  • Speed
  • Scalability
  • Distributed, Fault Tolerant and Highly Available

Challenge #1: Efficient access to data

Getting access to data is sometimes the hardest part. Sometimes the data you need lives in some isolated software service and a one-off data export is required. Perhaps your data integration flow is overly complex due to your adoption of a service-oriented architecture. Maybe you had a process that required multiple teams to do work before data moved around. We saw all of this and more. We wanted to make it easier to move data around, which would increase the overall velocity of our organization. Our data distribution flow between services looked something like this:

Previous Data Distribution Model

To fix this, we centralized our data flow with Apache Kafka. Kafka is commonly referred to as a pub/sub (publish and subscribe) messaging system. Simply put, systems publish their data into a Kafka ‘topic’. Other systems subscribe to the desired ‘topic’ within Kafka and extract the data. Kafka acts as a messenger, moving data to where it needs to go. Kafka, or any pub/sub messaging system, decouples the publishing of data from the subscription of data. Publishers only need to worry about sending the message. Subscribers only need to worry about receiving the message. It’s important to know that Kafka is not intended to be used as a permanent data store. Rather, it provides a reliable, fast and scalable method to distribute data from publishers to subscribers. Data storage is left to the consumers. By using Kafka, we had an asynchronous, decoupled, centralized communication mechanism that looked like this:

Data Distribution Model with Kafka
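The decoupling idea can be sketched in a few lines of Python. This toy in-memory bus is only an illustration of the pub/sub pattern (it is not Kafka's API): publishers append to a topic, and any number of subscribers read from it at their own pace, without either side knowing about the other.

```python
from collections import defaultdict

class MiniBus:
    """Toy pub/sub bus: each topic is an append-only list, echoing Kafka's log model."""
    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, message):
        # Publishers only append; they know nothing about subscribers.
        self.topics[topic].append(message)

    def read(self, topic, offset=0):
        # Subscribers read from any offset; the bus does not track them.
        return self.topics[topic][offset:]

bus = MiniBus()
bus.publish("metrics", {"cpu": 0.42})
bus.publish("metrics", {"cpu": 0.55})

# Two independent subscribers read the same topic without coordinating
# with the publisher or with each other.
analytics = bus.read("metrics")            # reads everything from the start
alerting  = bus.read("metrics", offset=1)  # reads only the newest message
```

The publisher made no extra sends for the second subscriber; adding a tenth subscriber would cost the publisher nothing, which is exactly the many-to-many problem the centralized model removes.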

We moved away from a many-to-many data distribution model to a centralized model. All of our services need to send their data only once, regardless of the number of destinations for the data. This also means that our services only need to pull from one source instead of implementing a variety of different data integration technologies. This centralized design reduced the overall effort our engineers spend distributing data.

Note: There are alternative messaging software systems out there. RabbitMQ is similar in capabilities and may suit your needs better, depending on what you need.

Challenge #2: The Need for Speed

Time is money. The quicker we can collect, detect, process, predict, and take action on our data, the faster we can act on behalf of our customers. We needed a data distribution model that enabled both near real time (streaming) analytics as well as the more traditional batch analytics. Enter the Lambda Architecture framework. The Lambda framework calls for all new data to be distributed simultaneously to both stream and batch processing pipelines, as such:

Lambda’s view of data distribution

Using the Lambda framework and Kafka for the messaging queue, we gained the ability to seamlessly add streaming analytics when needed (we use Apache Spark’s Streaming Module). In addition, it helps that Kafka itself is fast. Kafka’s data structure is essentially log files. As data comes into Kafka, Kafka appends the data to a log file. If subscribers are pulling data out in real-time they are also reading from the end of the file. This allows Linux’s page caching to store the needed data in memory for reads while using disk drives for writes. Lambda, Spark and Kafka’s low latency messaging allows us to make near real time decisions for our customers.

Note: An alternative framework to Lambda that is also worth looking into, as described in this article, suggests that users can feed data into a stream processing pipeline which in turn feeds the data into the batch pipeline.

Challenge #3: Scalability

The amount of data we collect and process is large and growing. We needed a data distribution model that would scale horizontally, use cloud infrastructure, support large workloads and auto-scale without impacting the data distribution flow. Kafka was built to scale and has proven to work well under heavy load, as described in this LinkedIn post. Kafka’s ability to scale is achieved through its clustering technology. It relies on data partitioning to distribute load across members of a cluster. If you need more data throughput in a particular location, you simply add more nodes to the Kafka cluster. Many cloud providers, like ours, and infrastructure automation frameworks provide the mechanisms to automate the creation and scaling of Kafka clusters.
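The partitioning idea is simple to sketch. Kafka's real partitioner differs in detail, but the principle is: hash a message key to pick a partition, so one key always lands on the same partition (preserving per-key ordering) while many keys spread across the whole cluster.

```python
import hashlib

def pick_partition(key: str, num_partitions: int) -> int:
    # A stable hash means the same key always maps to the same partition,
    # so messages for one key stay in order on one cluster member.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Messages for one server always land on one partition...
p = pick_partition("server-17", 6)

# ...while traffic from many servers spreads across the partitions,
# and therefore across the nodes of the cluster.
spread = {pick_partition(f"server-{i}", 6) for i in range(100)}
```

Adding nodes (and partitions) to the cluster simply widens the modulus, spreading the same key space over more machines.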

Challenge #4: Distributed, Fault Tolerant and Highly Available

Losing data is bad. Having data stop flowing is bad. Having data available in only one datacenter is bad. We needed a solution that wasn’t bad, and Kafka had us covered. Kafka supports data replication inside a cluster. This means that even if a server in a Kafka cluster crashes, data in the Kafka message bus is not lost. Great. In addition to replication, Kafka uses sequential IDs to enable the graceful restarting of a data flow. If a client stops for any reason, another client can use the sequential IDs to pick up where the first client left off. To process data in aggregate, or to have high availability on the aggregate data, we need to replicate data to multiple data centers. Kafka provides an easy mechanism for a Kafka cluster to pull data from other Kafka clusters in other data centers. We use this to pull data from all our data centers into a single data center in order to perform analytics in aggregate. Kafka is reliable, distributed and fault tolerant. It just works.

Note: The delta in sequential IDs between the publisher and subscriber is referred to as the offset. The offset is often used as a KPI metric on data flow through Kafka.
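A tiny sketch of how the sequential IDs make restarts graceful, and how that offset is computed (illustrative only, not Kafka's API):

```python
# The topic log: messages at sequential positions 0, 1, 2, ...
log = ["m0", "m1", "m2", "m3", "m4"]
committed = 3  # the crashed client had fully processed positions 0-2

def resume(log, committed):
    # A replacement client resumes exactly where the old one left off,
    # using nothing but the last committed sequential ID.
    return log[committed:]

remaining = resume(log, committed)

# The offset KPI: how far the subscriber trails the publisher.
lag = len(log) - committed
```

A steadily growing `lag` is the warning sign that subscribers can't keep up with publishers.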

What’s wrong with Kafka?

So what’s wrong with Kafka? Not much. It solves our challenges and the price is right: free. The one drawback I’d like to call out is the complexity it puts on its clients. For example, each client needs to connect to every Kafka server in a particular cluster instead of making a single connection to the cluster. It needs to do this because the data put into a Kafka cluster is split across the members of the cluster. Thankfully, there are SDKs, APIs and tools available that help engineers overcome these challenges, if a little time is spent researching and testing them. In the end, we gladly accepted the burden on the client in exchange for Kafka’s reliability, performance and scalability.
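That client-side burden can be sketched as follows. The broker names and toy hash here are hypothetical, not Kafka's wire protocol; the point is that the client itself maps each message to the cluster member that owns its partition, which is why it must stay connected to all of them.

```python
# Hypothetical cluster layout the client must keep track of:
# partition number -> the broker that owns it.
cluster_metadata = {
    0: "broker-a:9092",
    1: "broker-b:9092",
    2: "broker-c:9092",
}

def broker_for(key: str) -> str:
    # The client, not the cluster, decides where each message goes
    # (toy hash for illustration), so it needs a connection to every broker.
    partition = sum(key.encode()) % len(cluster_metadata)
    return cluster_metadata[partition]

target = broker_for("server-17.cpu")
```

A simple load balancer in front of the cluster wouldn't work here, because any single broker holds only a slice of each topic.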


Kafka is the heart of our data stack. It pumps our data around so we can quickly and accurately act on behalf of our customers. Before Kafka, our data moved around all willy-nilly. Post-Kafka, we have a simplified, scaled and structured dataflow that enables our business to move faster than was previously possible. Today, we are publishing operational metrics, logs, usage and performance information into Kafka, and we are planning to import additional types of data. We have a variety of software systems consuming data from Kafka: Apache Spark for real-time and batch analytics, Cassandra for data storage, a SQL-based data warehouse for reporting, custom-written applications, and ELK (Elasticsearch, Logstash and Kibana) for operational visualization and search. Kafka is pumping here at CenturyLink Cloud.

Thursday, August 20, 2015

Minecraft on Kubernetes

This is a post about launching a minecraft server, inside docker, on a kubernetes cluster.

Why do this?

  • I have a 5 year old son 
  • we like to play games and build stuff with blocks
  • I'm interested in kubernetes
  • I saw this post on minecraft in docker 
  • I wanted to get to know minecraft and kubernetes better

Note: After I got this working, but before I posted this, I came across Julia Ferraioli's series of blog posts about minecraft, docker and kubernetes. These posts are good and you may want to check them out too.


One of the advantages of Docker is sharing. In the public Docker Hub, there were a few different minecraft builds available. I chose to reuse Geoff Bourne's (@itzg) dockerfile, listed here. It's a great build that lets you specify many server configuration options via environment variables, so you don't have to worry about putting together a minecraft configuration file. With my container already created for me, getting minecraft up in kubernetes was easy.

Game on Kubernetes

First, you will need a kubernetes cluster. It doesn't matter where your cluster is; the beauty of kubernetes is that this will work regardless. Since I work at Centurylink Cloud, I used the scripts I created here to create my kubernetes cluster on CL cloud.

Kubernetes has the concept of pods. A pod in k8s is a logical grouping of one or more containers, zero or more storage volumes, and an IP address. We will launch the above minecraft docker container in a pod in kubernetes. And, since minecraft isn't a clustered server, we only need to run one pod per minecraft server.

Here is the yaml file I used to create this pod:

apiVersion: v1
kind: Pod
metadata:
  name: minecraft-server
  labels:
    app: minecraft-server
spec:
  containers:
  - name: minecraft-server
    image: itzg/minecraft-server
    env:
    - name: OPS
      value: madsurfer79
    - name: EULA
      value: "TRUE"
    ports:
    - containerPort: 25565

You will want to at least change the username in the OPS env variable to your minecraft username. This lets the server know who has admin privileges on the server. You can add more settings, like game type, minecraft version, etc, as you wish. To see a more detailed list of available options, check here. Now, go create the minecraft server:


#Create pod
> kubectl create -f pod.yml

#See it up and running:
> kubectl get pods
NAME                      READY     STATUS    RESTARTS   AGE
minecraft-dfhse           1/1       Running   0          2h

If you want this server to be reachable on the internet, you probably need to do some more things. If you are on a public cloud, like Centurylink, Amazon or Google, this is easy to do. Simply run:

#Expose the pod via a public load balancer
> kubectl expose pod minecraft-server --port=25565 --type=LoadBalancer

You will be given a public IP address, which you can see by running:

> kubectl get services


That is it. You and your friends can now play minecraft, inside docker, on kubernetes. 

A few things to note:
  • you may need to wait a few minutes while you wait for the external IP address to be published. 
  • you may also need to open up a firewall rule to this IP address and port. 
  • this isn't a durable configuration. If the pod dies, it won't come back online, and the saved games and world will be gone forever. These problems are easily solved in kubernetes; we just need to add a replication controller and persistent storage. Perhaps I'll work on that next and post it sometime. 
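The replication-controller half of that fix would look roughly like the sketch below, based on the pod definition above. This is untested; note that the `emptyDir` volume still won't survive the pod moving hosts, so a real persistent volume would be needed for truly durable worlds (the itzg image keeps world data under `/data`, as far as I know):

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: minecraft-server
spec:
  replicas: 1                # keep exactly one server pod alive
  selector:
    app: minecraft-server
  template:
    metadata:
      labels:
        app: minecraft-server
    spec:
      containers:
      - name: minecraft-server
        image: itzg/minecraft-server
        env:
        - name: EULA
          value: "TRUE"
        ports:
        - containerPort: 25565
        volumeMounts:
        - name: world-data
          mountPath: /data   # where the itzg image stores worlds
      volumes:
      - name: world-data
        emptyDir: {}         # swap for a persistent volume to survive node loss
```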


Monday, July 27, 2015

Kubernetes + Boinc for a better world.

Kubernetes, Docker and Boinc make it ridiculously easy to donate cpu cycles to solve earth's scientific challenges.


I figure, if we are deploying example applications to clusters to learn and test, why not deploy workloads that might add some benefit. So that is what I did. Getting this all going was really simple. I reused a docker image that Ozzy Johnson (@ozzydidact) put together, and I created a replication controller yaml file to spin up a group of containers on a cluster. Check out the above github link for the source code and some further instructions.

Here is me using all this:

MacBook-Pro:boinc ChrisKleban$ kubectl create -f ~/GitHub/docker-boinc/boinc-rc.yml 

MacBook-Pro:boinc ChrisKleban$ kubectl get rc,pods
CONTROLLER      CONTAINER(S)    IMAGE(S)              SELECTOR                        REPLICAS
boinc-workers   boinc-workers   ckleban/boinc-on-k8   name=boinc-workers,version=v1   1
NAME                  READY     STATUS    RESTARTS   AGE
boinc-workers-oprmg   1/1       Running   0          24s

MacBook-Pro:boinc ChrisKleban$ kubectl scale rc boinc-workers --replicas=20

MacBook-Pro:boinc ChrisKleban$ kubectl get rc,pods
CONTROLLER      CONTAINER(S)    IMAGE(S)              SELECTOR                        REPLICAS
boinc-workers   boinc-workers   ckleban/boinc-on-k8   name=boinc-workers,version=v1   20
NAME                  READY     STATUS    RESTARTS   AGE
boinc-workers-036t8   1/1       Running   0          17s
boinc-workers-5yjas   1/1       Running   0          17s
boinc-workers-6jogy   1/1       Running   0          17s
boinc-workers-frp3w   1/1       Running   0          17s
boinc-workers-giob9   1/1       Running   0          17s
boinc-workers-h55wg   1/1       Running   0          17s
boinc-workers-hlh9k   1/1       Running   0          17s
boinc-workers-idcds   1/1       Running   0          17s
boinc-workers-j7uln   1/1       Running   0          17s
boinc-workers-kg6nb   1/1       Running   0          17s
boinc-workers-lgzkd   1/1       Running   0          17s
boinc-workers-ngbz5   1/1       Running   0          17s
boinc-workers-nvdi9   1/1       Running   0          17s
boinc-workers-oprmg   1/1       Running   0          1m
boinc-workers-s1m5t   1/1       Running   0          17s
boinc-workers-twnoj   1/1       Running   0          17s
boinc-workers-wixrg   1/1       Running   0          17s
boinc-workers-xge9n   1/1       Running   0          17s
boinc-workers-xgy33   1/1       Running   0          17s
boinc-workers-yo0du   1/1       Running   0          17s

Sometime soon I'll be using this to test out kubernetes's current resource quota features to ensure a group of containers do not use more than a specified amount of CPU resources in a cluster.

In the future, kubernetes will be releasing a priority (QoS) feature that will allow users to specify which groups of containers should take priority when resource starvation occurs. Once this is released, users will be able to run workloads like this without impacting the higher-priority business functions running in the cluster. This is similar to how priority would be used to ensure batch jobs don't starve real-time, customer-facing workloads under resource constraints.

Enjoy and thanks

Thursday, July 9, 2015

Containers and Clusters - Disrupting how cloud services are delivered

The Problem

Cloud computing and the higher-level cloud services that public cloud companies offer have changed our industry. They allow developers to focus on solving customer needs instead of worrying about servers, databases, asynchronous messaging, analytics, storage, media encoding, graphic rendering, content delivery and so on. These cloud services enable engineers to create applications quicker and reduce their costs. However, one of the drawbacks of using the higher-level cloud services is cloud lock-in. The SDKs and APIs developers need to use to interact with these services are, for the most part, not standardized. If you use these services and you want to run the same application on another cloud or on premise, you need to rewrite some code. I don't think this lock-in should stop people from using these services. However, we need a model that allows these companies to provide innovative high-level cloud services while also allowing us the freedom and flexibility of true portability. 

Disruption is needed 

One approach could be to get all the cloud companies to standardize their services. Ha. Anyone see pigs flying? Another approach, and the one I believe in, would be for mainstream adoption of containers and clusters to provide a path for cloud services to run anywhere. While I don't think cloud providers will jump at this notion, I do think there is an ongoing technology change that might force their hands. If they don't jump on board, startups will create similar services and take away their market share. Here's how.


Containers will be the foundation of this revolution. Docker and the rest of the container technologies offer a way for engineers to package their software, deliver it to a server and run the package anywhere. If you want to learn more about containers and docker, you should google it or read this webpage. Docker, Docker, Docker. It's all you hear about these days. In time, containers will be mainstream.


In order to run a production service or application, you will need to build, run and manage many different containers. For redundancy, performance and scalability, you will need to spread these containers across multiple servers and data centers. This is starting to sound complex. To make it easier, people have created clusters. 

Kubernetes, Apache Mesos and Docker Swarm are all clustering and scheduling software frameworks that allow organizations to create a logical collection of compute power called clusters. These clusters are made up of servers or VMs and enable engineers to deploy their containers across the infrastructure. In addition, the cluster software also provides container replication, auto-scaling, load balancing, monitoring, logging, scheduling, resource management and so on. The end result is that we can create clusters on the hardware of our choice: in the cloud or on premise. We package up our code and deploy it wherever we want. True utility compute. 

Operating systems for Clusters

To make things easier to manage, clustering software lets engineers group containers and various cluster attributes, like load balancing and security, into logically defined services. This allows engineers to manage services on their clusters instead of collections of containers. Making this easier still is the concept of cluster operating systems. Two exist today: Mesosphere's data center operating system (DCOS), which offers a web portal and CLI, and kubernetes's CLI tool, which some might consider a cluster operating system as well. These cluster operating systems make it easier for engineers to deploy and manage services on one or more clusters. They let us see the status of the cluster, the services, the containers and so on: all the things you need to run and maintain your service in production. Cluster operating systems will make deploying your services across multiple clusters a breeze. 

The app store for clusters 

It's pretty obvious how this will enable us to do great things. But how will it change how cloud services are delivered? The answer is a cluster service repository, like an app store for clusters. Want to run an HA database with your application? Simply download the DB service you wish to integrate into your application. Want a web server? Go choose a web stack. Need caching? Need messaging? Simply pick your service. Then you write code to utilize it, package it up and deploy it to whatever cluster you have running. 

Disruption in how services are delivered

Folks in the community will create and manage services in the service repo based on open source software packages. I see traditional software companies packaging their software into cluster services and perhaps charging for licensing. I see startups jumping at the chance to be first to offer new services that run on these clusters. And the kicker: I see cloud providers packaging their existing services so that you can run a copy of their service in your cluster, perhaps for a fee. Imagine running services like AWS Kinesis, Google machine learning, Oracle DB, Azure's data warehouse or whatever, anywhere you want. Awesome.  

The end state unicorn

Containers, clusters, cluster operating systems, cluster services and a service repository will change the way we use data centers. They will change the way cloud services and open source software are packaged and delivered. This will have all the benefits of Platform as a Service, with the control of doing things yourself. The dream of utility compute will be realized. Cloud provider lock-in will be a thing of the past. Clustering and the service app store will rock the cloud industry. 

What now

There is still a lot of work to do. Some of the things I described are here today and some are just visions being worked on. These technologies have huge momentum, and I'm personally very interested in all this. I believe in this disruption and I'll be investing in it one way or another. If you are in the cloud business, you should consider this idea and decide if it's worth embracing. If you are an engineer or developer, keep an eye on these developments. 


Friday, May 15, 2015

Global Internet Access with Lightbulbs and Mesh Networking

What if the world was connected together by wireless enabled Lightbulbs and mesh networking software?

A problem worth solving

Many amazing people, groups and companies are working on how to better provide Internet and network access to the masses. Some ideas currently in development by the likes of Google, Facebook and SpaceX (to name a few) are low-orbit satellites, solar-powered planes, hot air balloons and fiber to the home. All of these have merit, and I applaud them for what they are trying to do and ultimately will do. But I think there is another idea to explore.

The internet isn't everywhere

One way to extend the Internet's reach is through distributed mesh networks. According to wikipedia, mesh networking is defined as ".. a network topology in which each node relays data for the network. All mesh nodes cooperate in the distribution of data in the network. Mesh networks can relay messages using either a flooding technique or a routing technique." 

Let me explain how mesh networks can help. Take my home Internet connection, provided by my cable company. Anything within wifi range of my router (100 feet?) has internet access. However, once I leave my house, I no longer have wifi access. If there were hundreds or thousands of devices in the city that formed a mesh network, I would be able to use the mesh network to reach my home's Internet connection no matter where I went in the city. Or, if other people around the city connected their Internet connections to the mesh network, I would be able to use the mesh network to reach the closest or best Internet connection based on where I was. 
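The relaying this depends on is easy to sketch. With a routing (rather than flooding) technique from the definition above, each node relays traffic along a shortest path toward a gateway. The topology below is made up purely for illustration:

```python
from collections import deque

# Hypothetical city mesh: each node lists the neighbors in radio range.
mesh = {
    "my-phone":    ["bulb-1"],
    "bulb-1":      ["my-phone", "bulb-2", "bulb-3"],
    "bulb-2":      ["bulb-1", "home-router"],
    "bulb-3":      ["bulb-1"],
    "home-router": ["bulb-2"],  # the gateway to the Internet
}

def route(mesh, src, gateway):
    """Breadth-first search: the hop-by-hop path a routed mesh relays along."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == gateway:
            return path
        for nxt in mesh[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no gateway reachable from this node

path = route(mesh, "my-phone", "home-router")
```

Every intermediate hop is just a lightbulb relaying packets; add more bulbs and the reachable area grows, with the routing picking the closest gateway automatically.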


Let's look at some simple facts. Lightbulb sockets are everywhere. Some lightbulbs today have wifi. Some have computers in them, creating 'smart' lightbulbs. Some are energy efficient. I've recently read about a lightbulb product with a built-in speaker and bluetooth, which lets people use the lightbulbs in their house as a house-wide speaker system. 

What if we built lightbulbs for the purpose of acting as nodes in mesh networks? What if we put software in these lightbulbs that joins and creates mesh networks automatically, so that all someone needs to do is screw one into a socket? What if people, laptops, phones and IoT devices could freely connect to this lightbulb-enabled mesh network to communicate with each other and the Internet? What if we put these lightbulbs all over the world? What if people, organizations and Internet service providers connected their Internet connections to these mesh networks so that the mesh networks have gateways to the Internet?

We would have a series of mesh networks throughout the world that would together bring the Internet to the masses and to the billions of IoT devices that will be coming in the near future. 

Beyond Lightbulbs

Lightbulbs are just one way to create mesh networks. What if a bunch of other things did the same: cars, drones, consumer devices (phones, watches, laptops), home routers, artificial birds, etc.? Some people are already working on these things, which is amazing to see. We just need more of them, and for all of the efforts to integrate rather than create a series of isolated mesh networks.

Privacy and Security

Beyond basic Internet connectivity, others have privacy and security concerns. Software and protocols exist today for mesh networks that provide encryption, protection and anonymous network access to users. These features can easily be enabled by the mesh provider or by the end user through overlay networks.

Path forward

Communities are popping up that are organizing the creation and expansion of these mesh networks. It would be great to speed up this process with major investment in the hardware rollout and node creation (lightbulbs, cars, etc.). In the ideal world, governments, companies, communities, individuals and organizations would all work together to roll out mesh networks that interact freely with one another.

Wednesday, March 18, 2015

8 Tips - Build a Highly Available Service

Working at AWS, Citrix, and CenturyLink has taught me a lot about availability and scale. These are the lessons I've learned over time. From infrastructure as a service to web applications, these themes apply.

Build for failures

Failures happen all the time. Hard drives fail, systems crash, performance decreases due to congestion, power outages happen, etc. Build a service that can handle failures at any level of the stack. No one server, app, database or employee should be able to cause your service to go offline. Test your failure modes and see how your system recovers. Better yet, use a chaos monkey to continuously test for failures.

Build across multiple failure domains

A failure domain consists of a set of infrastructure that can all go down due to a single event. Data centers and public cloud availability zones are examples of failure domains, as they can go down due to one event (fire, power, network, etc). Build your service so that it actively serves customers in multiple failure domains. Test it. A simple example is to use global load balancing to route customers to multiple failure domains.
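The global load balancing example boils down to "only route customers to healthy failure domains." Here is a minimal sketch; the zone names, endpoints and health flags are all hypothetical, and in a real system the health state would come from continuous monitoring rather than a dict.

```python
# Hypothetical failure domains (think: cloud availability zones), with a
# health flag that a monitoring system would keep up to date.
domains = {
    "us-east-az1": {"endpoint": "10.0.1.10", "healthy": True},
    "us-east-az2": {"endpoint": "10.0.2.10", "healthy": True},
    "us-west-az1": {"endpoint": "10.1.1.10", "healthy": True},
}

def route(customer_id):
    """Global-load-balancer sketch: spread customers across healthy
    failure domains, so losing any one domain just means re-routing."""
    healthy = [d for d in domains.values() if d["healthy"]]
    if not healthy:
        raise RuntimeError("all failure domains are down")
    return healthy[hash(customer_id) % len(healthy)]["endpoint"]

# If az1 burns down, its customers transparently land somewhere else.
domains["us-east-az1"]["healthy"] = False
assert route("customer-42") != "10.0.1.10"
```

The important property is that the routing decision requires no cooperation from the failed domain itself.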

Don't have real time dependencies across failure domains

Don't build a distributed system that relies on synchronous communication across failure domains to serve your customers. Instead, build systems that can independently serve your customers completely within a failure domain, and make any communication between failure domains asynchronous. Having inter-failure-domain dependencies increases the blast radius of any single outage and increases the overall likelihood of service-impacting issues. Also, there are often network instabilities between failure domains that can cause variable performance and periods of slowness for your systems and your customers. One example of this is data replication. Don't require storage writes to be replicated across failure domains before the client considers the data 'stored'. Rather, store it inside the failure domain and consider it committed. Handle any cross-failure-domain replication requirements asynchronously, i.e., after the fact.
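Here is a minimal sketch of that replication pattern, using an in-process queue and two dicts as stand-ins for a real cross-domain replication pipeline and real storage. All names are illustrative.

```python
import queue
import threading

replication_queue = queue.Queue()
local_store, remote_store = {}, {}  # stand-ins for storage in two failure domains

def put(key, value):
    """Commit locally, replicate asynchronously: the client is acked as
    soon as the write lands in its own failure domain."""
    local_store[key] = value             # durable within this failure domain
    replication_queue.put((key, value))  # cross-domain copy happens later
    return "stored"                      # ack without waiting on the remote side

def replicator():
    # Background worker drains the queue into the other failure domain.
    while True:
        key, value = replication_queue.get()
        remote_store[key] = value
        replication_queue.task_done()

threading.Thread(target=replicator, daemon=True).start()
put("user:1", "alice")
replication_queue.join()  # only for this demo; production never blocks on this
assert remote_store["user:1"] == "alice"
```

The trade-off is a window where the remote copy lags behind; that's the price of keeping the client's latency and availability tied only to its own failure domain.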

Reduce your blast radius

If a single change or failure can impact 100% of your customers, your blast radius is too large. Break your system up in some way so that any single issue only impacts a portion of your customers. User partitioning, using multiple failure domains (global load balancing), rolling deployments, separate control planes, SOA and A/B testing are a few ways to accomplish this. One example is using partitioning for an email sending service. Assign groups of customers to different groups of email sending servers. If any group of servers has an issue, only a portion of your customers are impacted versus all of them.
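The email-sending example can be sketched as a stable hash from customer to server group. The partition names and customer IDs below are made up; the point is only that the mapping is deterministic, so a customer always lands on the same group and any one group's outage touches roughly 1/N of customers.

```python
import hashlib

EMAIL_PARTITIONS = ["senders-a", "senders-b", "senders-c", "senders-d"]

def partition_for(customer_id):
    """Stable hash so a customer always maps to the same server group.
    One bad group then impacts ~1/len(EMAIL_PARTITIONS) of customers."""
    digest = hashlib.sha256(customer_id.encode()).hexdigest()
    return EMAIL_PARTITIONS[int(digest, 16) % len(EMAIL_PARTITIONS)]

# Deterministic: the same customer always gets the same partition.
assert partition_for("acme-corp") == partition_for("acme-corp")
```

Rolling deployments get the same benefit for free here: deploy to one partition, watch it, then proceed to the next.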

Reduce the level of impact

Having your service go completely down for a portion of your customers is much worse than having only part of your service unavailable for them. Break your system into smaller units. An example is user authentication: consider having a scalable, read-only, easily replicated system for user logins, and a separate system for account changes. If you need to bring down the account-change system for whatever reason, your users will still be able to log in to the service.
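A toy sketch of that split, with a hard-coded credential dict standing in for a replicated read-only store (all names here are hypothetical):

```python
class AuthService:
    """Sketch: logins are served from replicated, read-only data, while
    account changes go through a separate writable path that can be
    taken down without blocking logins."""

    def __init__(self):
        self.credentials = {"alice": "hash123"}  # read-only replica
        self.writes_enabled = True               # account-change system up?

    def login(self, user, pw_hash):
        # Served from the replica; unaffected by write-path outages.
        return self.credentials.get(user) == pw_hash

    def change_password(self, user, new_hash):
        if not self.writes_enabled:
            raise RuntimeError("account changes temporarily unavailable")
        self.credentials[user] = new_hash

auth = AuthService()
auth.writes_enabled = False       # maintenance on the account-change system
assert auth.login("alice", "hash123")  # logins still work
```

In a real deployment the two paths would be separate services with separate deployments, not two methods on one class; the class just makes the degraded mode easy to see.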

Humans make mistakes

Humans are the reason for most service impacts. Bad code deployments, network changes with unintended consequences, copy/paste errors, unknown dependencies, typos and skill-set deficiencies are just a few examples. As the owner of a service, it is critical that you apply the appropriate level of checks, balances, tools and frameworks for the people working on your system. Prevent the inevitable lessons learned of one individual from impacting the service. Peer reviews, change reviews, system health checks and tools that reduce the manual inputs required for 'making changes' can all help reduce service impacts due to human error. The important thing here is that the sum of the things you put in place to prevent human error cannot make the overhead of making changes so high that your velocity falls to unacceptable levels. Find the right balance.

Reduce complexity

I am not a fan of the band Kiss, but I do like to keep it simple stupid. A system that is too complex is hard to maintain. Dependency tracking becomes impossible. The human mind can only grasp so much. Don't require geniuses to make successful changes on your system.

Use the ownership model

I.e., use the DevOps model. If you build it, you should also own it end to end (uptime, operations, performance, etc.). If a person feels the pain of a broken system, that person will do what's needed to stop the pain. This has the result of making uptime and system serviceability a priority rather than an afterthought.

Good luck


Friday, January 2, 2015

5 Tips - Don't Screw Up the On-Site Interview

Who is the only one that can screw up an on-site interview? You are. 

If you have successfully navigated past the phone screens and find yourself about to go on the on-site interview, chances are you are at least somewhat qualified for the job. Tech companies want to hire you. All you need to do is not screw it up by giving them a reason to not hire you. Pretty easy, eh?

Your interviewers are trying to determine whether you can do the job and whether you can succeed given their environment. Your goal is to convince them that you can by leaving them with a positive impression of you. Enlighten them by following the below tips.

Note: This post is written as you are the interviewee. If you are in the interviewer seat, the below tips can be thought of as 'What to look for in a candidate'. 

Be Passionate

Love what you do. I do. You should too. If you aren't doing something you love, stop what you're doing and do something else. People want to work with passionate people. No one wants a downer. No one wants someone just collecting a paycheck. Passion is what drives action, improvement and ownership. It pushes people to do what others might think is unattainable. Be passionate. It's contagious. It will lead to amazing things (like a job offer at a great tech company).

When I was interviewing for an engineer role many years ago, the interviewer asked me what I was passionate about. Being young and foolish, I answered, "Solving problems and girls". He asked me to elaborate. I told him how I've used technology's infinite source of problems to feed my problem-solving addiction. In doing so, I laid out a few of my accomplishments and how the motivation behind them came from my passion within. I also described that I liked girls, like most boys. At the time, I was somewhat shy around women. I realized that in order to get a girlfriend I needed to change that about myself. I described how I thought of it as just another problem that needed to be solved. I then described my approach to overcoming this shyness by forcing myself to talk with girls. The passion I had drove desirable outcomes in both cases, and I made that clear to the interviewer. I ended up getting the job, which I ended up loving. More importantly, my passion resulted in me meeting and marrying my wonderful wife.

Be passionate, or go collect a paycheck somewhere else.

Be yourself 

Seriously. Be yourself. Proudly be yourself and show your potential new team that you are a human being. No one wants to hire a rock with good coding skills. Note: if you know of, or if you are, a rock with good coding skills, please contact me as we are hiring. Embrace what makes you unique, your strengths and your weaknesses.

Own the message by connecting 'who you are' with 'how you will succeed in this role'. Some examples: your desire to work alone allows you to go deep into complex problems; your interpersonal skills have helped those around you by creating a strong sense of teamwork; etc.

If you are a jerk, or there is something about yourself that you know to be truly bad in some way, my advice is to solve that problem and improve yourself. Your life will be better off for it. And it will help you nail the on-site interview.

Be yourself so that they know you aren't a rock (with coding skills). Own the message of who you are. Don't screw it up by pretending.

Have Confidence

If you doubt yourself, others will too, leading to no job for you. The hiring team needs to decide, with a limited amount of data, whether or not to offer you the job. Believing in yourself will help them believe in you. Don't give them a reason to doubt you by doubting yourself.

Again, the key is to own the message. Don't let them paint the picture for you. If you have successes in your past, use them to show that your past success will help you succeed in this role. Don't let them assume it, connect the dots for them. Believe in yourself. Others will too.

If you are lacking in some area for the role, be confident that you can succeed, then prove to them why you will. This will earn their trust and make them believe that you can really do it. Be up front about where you are weak, but provide thoughts on how you will overcome this issue. If you haven't thought about this and you don't have a method to overcome your weakness, you are screwing up. It shows them that you can't solve your own problems. A candidate once said to me, "Currently, I don't know how to do this job. But, I believe I can overcome this by ... " We ended up hiring him because he owned the message, demonstrated his ability to learn new things, was passionate and addressed our concerns head on. He went on to become a superstar and one of our best hires.

Believe in yourself and others will too.

Be honest 

Being caught in a lie will raise a red flag quicker than the Road Runner can say, "Beep beep!" You want to be genuine and come off as trustworthy. This sounds simple, but be honest throughout the interview. If you don't know an answer, simply say "I don't know". Even better, admit you don't know, but follow it up with your thoughts on what the answer or solution could be and why. Doing that will show your integrity while also showing off your ability to think through problems.

A very direct and common interview question that tests honesty and trust is a question about your past mistakes or your current areas of improvement. This is them lobbing up a softball pitch for you to knock out of the park. Be honest! Admit to bringing down the site. Admit that you aren't the best speller. This will prove to them that you can openly discuss when things go wrong and that you aren't full of yourself. Going deeper, if you can demonstrate that you have learned from your previous mistakes and that you took action to improve things, you will show them the passion and drive you have to improve yourself and those around you.

Note: I'd like to point out that some things are better left unsaid. Discretion and situational awareness are key everywhere, but especially during interviews. I sometimes have problems with this myself and overshare information (like how I answered that one of my passions was girls). Make your own decisions about what to say or not say. However, err on the side of honesty and openness.

Be able to get stuff done

Be able to get stuff done. To do that, you need to have the required skill-sets, both technical and non-technical. Either you are an expert at something or you are on your way to becoming one. There is no other option in which you will get an offer. If you are an expert, awesome. The only way you can screw it up is if you are in a passionless death spiral to the land of irrelevance. If you aren't an expert (few of us are), or if you are a newbie, you should be in learning mode. Most people have the opportunity to learn regardless of where they are in life. Seek out new assignments at work, use your free time, take a class or read a book. Don't solely focus on your primary skill-set (like coding). Soft skills are equally important to be good at and to improve. Always be learning. Be able to articulate what you have learned recently.

Note, ability isn't having knowledge. Rather, it's having the hard and soft skill-sets required to accomplish something. This is what makes or breaks a lot of interviews. Make sure you can, and have demonstrated in the past, the ability to accomplish things that are relevant to your new role. This is super important. I've seen countless college graduates come out of school without the ability to achieve something. Or, I've seen people with years of work experience require someone to hold their hand the entire way. Don't be like these people. Know how to get stuff done.

Let me share with you a story about a candidate who knew nothing, but could do anything. I once had a network engineer candidate bring a ridiculously large notebook to his on-site interview. This thing contained every bit of reference data a network engineer could ever possibly need. During the interview, for almost every question, he referenced the notebook to answer anything that could be solved directly with knowledge. We didn't know if he did this out of habit or necessity. Given this fact, the surprising thing to us was that he was great at applying the technical information he had in order to solve problems and delight customers. We discussed these traits, which at the surface seem contradictory, at great length in the debrief. The outcome was that we gave him a very strong offer, but he unfortunately turned it down to go work for Google. Goes to show, if you can get stuff done, that's all that matters.

Be able to demonstrate that you can 'get things done' given the knowledge that is available to you.

Have Common Sense

Don't be a jerk. Some people have common sense, others don't. I myself lack in this department from time to time.

Before, during and after an interview, be on your best behavior. Do the little things well. Be polite, on time, respectful and show gratitude. Research the company, the people, the product and the market. Show a sense of ownership for yourself and everything around you. Don't be a know-it-all, too aggressive or smelly. The sum of all these little, common sense things, can tip the decision one way or another.

I once saw someone show up 15 minutes late to both of his on-site interviews. He didn't get the job. Another time, I saw how a candidate's simple follow-up email (which showed gratitude and passion) pushed a hiring manager's vote from no-hire to hire.

Do the little things well.

In summary

Don't screw it up.

Good luck,

PS, my company is hiring for lots of roles. Go check out our current openings at

PPS, yes, there were 6 tips. Cliche, I know.