Scott Lowe's Blog

Making AWS re:Invent More Family-Friendly

Mon, 11/13/2017 - 20:00

AWS re:Invent is just around the corner, and Spousetivities will be there to help bring a new level of family friendliness to the event. If you’re thinking of bringing a spouse, partner, or significant other with you to Las Vegas, I’d encourage you to strongly consider getting him or her involved in Spousetivities.

Want a sneak peek at what’s planned? Have a look:

  • Monday’s activity is a full-day trip to Death Valley, including a stop at Bad Water Basin (significant because it is 280 feet below sea level, making it the lowest place in the Western Hemisphere!). Lunch is included, of course.
  • On Tuesday, Spousetivities participants will get to visit a number of locations on the Las Vegas Strip, including Siegfried and Roy’s Secret Garden, the Wildlife Habitat at the Flamingo, and the Shark Reef at Mandalay Bay. Transportation is provided for longer connections, but there will be some walking involved—wear comfortable shoes!
  • Wednesday includes a visit to Red Rock Canyon and Hoover Dam. There will be some opportunities for short sightseeing walks in Red Rock Canyon (plus the 13-mile scenic drive), and the Hoover Dam tour includes access to the generator room (a very cool sight).
  • Wrapping up the week on Thursday is a helicopter tour with views of the Hoover Dam, Lake Mead, Fortification Hill, the Colorado River, and—of course—the Grand Canyon! This includes a landing on the floor of the Grand Canyon for a snack and beverages. This is an amazing experience. (I’ve personally taken this tour and it is fabulous.)

Registration is open right now, so sign up before it’s too late. Prices for these activities are reduced from standard retail rates thanks to sponsorship from VMware NSX.

Categories: Scott Lowe

Technology Short Take 90

Fri, 11/10/2017 - 13:00

Welcome to Technology Short Take 90! This post is a bit shorter than most, as I’ve been on the road quite a bit recently. Nevertheless, there’s hopefully something here you’ll find useful.

Networking
Security
Cloud Computing/Cloud Management
  • Google is rolling out Kubernetes 1.8 to Google Container Engine; this blog post talks about some of the new features and functionality.
  • Lior Kamrat has a multi-part series on an architecture that blends Mesosphere DC/OS, Azure, Docker, and VMware vSphere. Check out the beginning of the series here.
Operating Systems/Applications
Storage
  • I’m clearly behind the times in some of my reading, as this great article by J Metz was just brought to my attention recently. J does a good job of laying out the various competing forces that drive product/technology evolution and selection in the storage space, though one might argue these same forces are at work in other areas besides just storage.
Virtualization
Career/Soft Skills

Thanks for reading! Feel free to hit me up on Twitter if you have feedback or would like to share a link I should include in a future Technology Short Take.

Categories: Scott Lowe

How to Tag Docker Images with Git Commit Information

Wed, 11/08/2017 - 13:00

I’ve recently been working on a very simple Flask application that can be used as a demo application in containerized environments (here’s the GitHub repo). It’s nothing special, but it’s been useful for me as a learning exercise—both from a Docker image creation perspective as well as getting some additional Python knowledge. Along the way, I wanted to be able to track versions of the Docker image (and the Dockerfile used to create those images), and link those versions back to specific Git commits in the source repository. In this article, I’ll share a way I’ve found to tag Docker images with Git commit information.

Before I proceed any further, I’ll provide the disclaimer that this information isn’t unique; I’m building on the work of others. Other articles sharing similar information include this one; no doubt there are countless more I haven’t yet seen. I’m presenting this information here simply to show one way (not the only way) of including Git commit information with a Docker image.

Getting the necessary information from Git is actually far easier than one might think. This variation of the git log command will print only the full hash of the last commit to the repository:

git log -1 --format=%H

If you prefer the shortened commit hash (which is what I use currently), then just change the %H to %h, like this:

git log -1 --format=%h

Getting the information out of Git is only half the puzzle, though; the other half is getting it into the Docker image. The answer lies in some changes to the Dockerfile and the use of an additional command-line flag when building the image.

First, you’ll need to add lines like this to your Dockerfile:

ARG GIT_COMMIT=unspecified
LABEL git_commit=$GIT_COMMIT

The first line defines a build-time argument, and the use of =unspecified means that if the build-time argument is omitted or not supplied, it will default to the value of “unspecified”. The second line takes the information from the argument and adds it as a label on the image.
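
For context, here’s roughly where those two lines might sit in a simple Dockerfile for a Flask app. This is a hypothetical sketch, not the actual Dockerfile from the repository; the base image and commands are placeholders:

FROM python:3.6-alpine

# Build-time argument; defaults to "unspecified" if --build-arg is omitted
ARG GIT_COMMIT=unspecified

# Record the commit hash as a label on the resulting image
LABEL git_commit=$GIT_COMMIT

WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
CMD ["python", "app.py"]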

With the Dockerfile prepared to leverage Git commit information, all that’s necessary is to build the image with the --build-arg flag, like this (here I’m showing the command I’d use to build the “flask-web-svc” image for the Flask application I’ve been building):

docker build -t flask-local-build --build-arg GIT_COMMIT=$(git log -1 --format=%h) .

Here I’m using Bash command substitution to take the output of git log -1 --format=%h and supply it to docker build as the GIT_COMMIT argument (i.e., what the Dockerfile is expecting). This command assumes that you’re building the Docker image from the latest Git commit; if this isn’t the case, then you’ll need to modify your command. As I mentioned earlier, if you omit the --build-arg parameter, then the label will be assigned with the default value of “unspecified”.

When you build the image this way, you can then see the Git commit attached to the image as a label using this command:

docker inspect flask-local-build | jq '.[].ContainerConfig.Labels'

Note I’m using the incredibly useful jq tool here. (If you’re not familiar with jq, check out my introductory post.)
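
If you don’t have jq handy, Docker’s built-in Go templating can pull out the same label; a minimal alternative, assuming the git_commit label key shown earlier (the label typically shows up under both Config.Labels and ContainerConfig.Labels):

docker inspect --format '{{ .Config.Labels.git_commit }}' flask-local-build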

Assuming that the build was successful and the container operates as expected/desired, then you can tag the image and push it to a registry:

docker tag flask-local-build slowe/flask-web-svc:0.3
docker push slowe/flask-web-svc:0.3

I also create a GitHub release corresponding to the Git commit used to build an image, so I can easily correlate a particular version of the Docker image with the appropriate commit in the repository. This makes it easier to quickly jump to the Dockerfile for each version of the Docker image. So, for example, when I release version 0.3 of the Docker image (which I recently did), I also have a matching v0.3 release in GitHub that points to the specific Git commit from which version 0.3 of the Docker image is built. This allows me—and anyone else consuming my Docker image—to have full traceability from a particular version of a Docker image all the way back to the specific Git commit from which that Docker image was built.

I imagine there are probably better/more efficient ways of doing what I’ve done here; feel free to hit me up on Twitter to help me improve. Thanks for reading!

UPDATE: Michael Gasch also pointed out that git rev-parse HEAD will return the full (long) commit hash from the last commit, so this is another way to get the information from Git. Given the nature of Git, no doubt there are countless more!
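
For reference, here’s how the two approaches line up (these are standard Git commands):

# Full (long) commit hash, equivalent to: git log -1 --format=%H
git rev-parse HEAD

# Abbreviated (short) commit hash, equivalent to: git log -1 --format=%h
git rev-parse --short HEAD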

Categories: Scott Lowe

Deep Dive into Container Images in Kolla

Wed, 11/08/2017 - 00:30

This is a liveblog of my last session at the Sydney OpenStack Summit. The session title is “OpenStack images that fit your imagination: deep dive into container images in Kolla.” The presenters are Vikram Hosakote and Rich Wellum, from Cisco and Lenovo, respectively.

Hosakote starts with an overview of Kolla. Kolla is a project to deploy OpenStack services into Docker containers. There are two ways to use Kolla: using Ansible (referred to as Kolla-Ansible) or using Kubernetes (referred to as Kolla-Kubernetes). Hosakote mentions that Kolla-Kubernetes also uses Helm and Helm charts; this makes me question the relationship between Kolla-Kubernetes and OpenStack-Helm.

Why Kolla? Some of the benefits of Kolla, as outlined by Hosakote, include:

  • Fast(er) deployment (Hosakote has deployed in as few as 9 minutes)
  • Easy maintenance, reconfiguration, patching, and upgrades
  • Containerized services are found in a container registry
  • One tool to do multiple things

Hosakote briefly mentions his preference for Kolla over other tools, including Juju, DevStack, PackStack, Fuel, OpenStack-Ansible, TripleO, OpenStack-Puppet, and OpenStack-Chef.

Other benefits of using containers for OpenStack:

  • Reproduce golden state easily
  • No more “Works in DevStack” but doesn’t work in production
  • Production-ready images (this seems specific to Kolla, not just containers for OpenStack control plane)
  • Easy to override the configuration files
  • Dev mode (enables faster dev/test cycles)
  • Portable, tested, replicated images in well-known registries (again, this is specific to Kolla, not just using containers for the OpenStack control plane)

Next, Hosakote goes over the Kolla architecture. The “deploy node” runs Ansible, Docker, the Docker Python libraries, and Jinja. All the Dockerfiles, Ansible playbooks, Jinja templates, Helm charts, etc., are pulled from GitHub. By default, Kolla uses the Docker Hub for container images, but it can also use a custom registry. The deploy node deploys Docker containers with OpenStack services onto target nodes.

When Kubernetes is in play (via Kolla-Kubernetes), the architecture is similar but with the addition of Kubernetes-specific components (such as the kubelet on worker nodes).

Hosakote takes care to point out that Kolla can be used for a variety of “advanced” configurations, like:

  • Jumbo frames
  • SR-IOV (PCI passthrough)
  • Ceph storage
  • EFK (ElasticSearch, Fluentd, Kibana)
  • Prometheus (cloud-native monitoring and alerting)
  • Numerous other customizations and configuration overrides

Next, Hosakote briefly reviews how to use Kolla (Kolla-Ansible, specifically). He went through the content so quickly that I wasn’t able to catch all the details.
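
Since the details went by quickly, here’s a rough sketch of the typical Kolla-Ansible workflow as documented around this release; the inventory file and image options here are placeholders:

# Build (or pull) the OpenStack service images
kolla-build --base centos --type binary

# Generate passwords, then bootstrap, validate, and deploy
kolla-genpwd
kolla-ansible -i ./multinode bootstrap-servers
kolla-ansible -i ./multinode prechecks
kolla-ansible -i ./multinode deploy
kolla-ansible post-deploy   # writes an admin openrc file for CLI access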

Now, Wellum takes over the presentation to discuss Kolla-Kubernetes. Wellum mentions a few resources for using Kolla-Kubernetes, but immediately transitions into a discussion of why you may want to build your own custom Kolla images (as opposed to using the “official” Kolla images available on Docker Hub). Some of these reasons include:

  • You run a “custom” or “proprietary” version of OpenStack
  • You are an OpenStack contributor and would like to test in production-quality environments
  • Your company develops drivers unique to hardware (this allows you to easily stand up OpenStack so you can spend more of your time focusing on hardware support/integration)

This leads Wellum into an example of how you might use Kolla to build custom images for a couple of OpenStack services, but use “vanilla” upstream code for the rest of the OpenStack services. Wellum again mentions the use of Helm, but does not provide any information on interaction between Kolla-Kubernetes and OpenStack-Helm.

Next, Wellum shows a recorded demo that shows everything he just discussed about the example of building custom images for a couple of OpenStack services. The presenters run into some permissions issues with Google Docs, but after a few minutes manage to finally resolve those so they can show the demo. The demo, unfortunately, is limited in size and has poor color contrast that makes it difficult to see from the back of the room.

After watching the demo for a while, I wrapped up the liveblog as the session was running over its time limit.

Categories: Scott Lowe

Carrier-Grade SDN-Based OpenStack Networking Solution

Tue, 11/07/2017 - 22:30

This session was titled “Carrier-Grade SDN Based OpenStack Networking Solution,” led by Daniel Park and Sangho Shin. Both Park and Shin are from SK Telecom (SKT), and (based on the description) this session is a follow-up to a session from the Boston summit where SK Telecom talked about an SDN-based networking solution they’d developed and released for use in their own 5G-based network.

Shin starts the session with some presenter introductions, and sets the stage for the presentation. Shin first provides some background on SKT, and discusses the steps that SKT has been taking to prepare their network for 5G infrastructure/services. This involves more extensive use of virtual network functions (VNFs) and software-defined infrastructure based on open software and open hardware. Shin reinforces that the SKT project (which is apparently called COSMOS?) exclusively leverages open source software.

Diving into a bit more detail, Shin talks about SONA Fabric (which is used to control the leaf/spine fabric used as the network underlay), SONA (which handles virtual network management), and TACO (which is an SKT-specific version of OpenStack). The network monitoring solution is called TINA, and this feeds into an integrated monitoring system known as 3DV.

TACO (stands for SKT All Container OpenStack) is containerized OpenStack, leveraging Kubernetes and OpenStack-Helm. The container images come from Kolla and Docker is the container engine in use. TACO is built entirely from the upstream source code, which feeds into a CI/CD pipeline that generates working images, packaging, configuration, etc.

Park now takes over to discuss ONOS. ONOS is led by the ONF and is a “carrier-grade” network operating system; the first version was released in 2015. SKT now uses a test version of ONOS. Key features of ONOS are scalability, performance, high availability, a modular architecture that supports pluggable southbound interfaces, and a set of northbound abstractions for greater interoperability.

SONA (which stands for Simplified Network Overlay Architecture) is an SKT-specific solution designed on top of ONOS. All of SONA, according to Park, has been upstreamed into the ONOS repository. With regard to OpenStack, SONA comprises three different ONOS applications:

  • OpenStackNode (initializes nodes)
  • OpenStackNetworking (programs switching/routing rules via OpenFlow to OVS, both on compute nodes and gateway nodes)
  • OpenStackNetworking UI (user interface)

On the data plane side, SONA uses OVS and a vRouter. No details are provided on this vRouter construct, other than it was developed by SKT for SONA.

Key features of SONA include:

  • Implemented as an ML2 plugin for Neutron
  • Uses VXLAN-based overlays
  • No agents and no network node
  • Virtual router scalability
  • Optimized East-West routing (no virtual router needed)
  • Provides a UI-based flow tracer

The text on the slide talks about no need for network nodes, but Park repeatedly mentions gateway nodes and the ability to scale gateway capacity; I’m guessing that gateway nodes are a replacement for network nodes. Gateways are responsible for handling North-South traffic flows.

Park talks for a minute about the inspiration for the flow tracing UI, which comes out of trying to troubleshoot traffic flows using OVS-related CLI commands and grep.

Next, Park shows a brief video demo of the flow tracing user interface.

Following the demo, Park outlines a few use cases for SONA, all of which involve integration between OpenStack networking and physical networking for the purpose of supporting physical network slicing, vEPC (virtual Evolved Packet Core), and VNF deployments.

SONA Fabric is a pure OpenFlow-based leaf/spine fabric solution, supporting ECMP, failure detection, auto-recovery, and physical/virtual network integration. SONA Fabric has a more limited hardware support list due to its pure OpenFlow architecture. I find it interesting that SKT says their solution is completely based on “open hardware,” yet Cisco and Arista are mentioned as being supported by SONA Fabric. (Perhaps there is a disconnect/difference between the SONA Fabric support and what SKT specifically is using.)

Next, Park shows a video demo of TINA (SKT Integrated Network Analyzer), which provides 3-D visualization of network status and network performance information. It’s quite impressive, to be honest.

At this point, Shin takes over to talk about what’s missing for a true carrier-grade solution. The primary concern is the data plane, where performance for VNFs still can’t match that of PNFs. The solution is to more fully embrace data plane acceleration using technologies like DPDK, NIC offloading, and Xeon+FPGA acceleration. Shin mentions that SKT is working to conduct a commercial proof-of-concept with TACO, SONA, and some data plane acceleration solutions early next year, and hopes to share some results next year.

At this point, the presenters wrap up the session.

Categories: Scott Lowe

Can OpenStack Beat AWS in Price

Tue, 11/07/2017 - 22:00

This is a liveblog of the session titled “Can OpenStack Beat AWS in Price: The Trilogy”. The presenters are Rico Lin, Bruno Lago, and Jean-Daniel Bonnetot. The “trilogy” refers to the third iteration of this presentation; each time the comparison has been done in a different geographical region (first in Europe, then in North America, and finally here in Asia-Pacific).

Lago starts the presentation with an explanation of the session, and each of the presenters introduces themselves, their companies, and their backgrounds. In this particular case, the presenters are representing Catalyst (which runs Catalyst Cloud in New Zealand), OVH, and EasyStack—all three operate OpenStack-powered public cloud offerings.

Lago explains that they’ll cover three common OpenStack scenarios:

  • Private cloud
  • Managed private cloud
  • OpenStack-powered public cloud

Lin takes point to talk a bit about price differences in different geographical regions. Focusing on AWS, Lin points out that AWS services are about 8% higher in Europe than in North America. Moving to APAC, AWS services are about 29% higher than in North America. With this 29% price increase, I can see where OpenStack might be much more competitive in APAC than in North America (and this, in turn, may explain why OpenStack seems much more popular in APAC than in other regions).

Lago takes over now to set some assumptions for the comparisons. He provides an overview of the total cost of ownership (TCO) model used; the presenters are also making available a Google Docs spreadsheet that implements the TCO model and their assumptions. This includes things like hardware costs, power costs, number of full-time employees (FTEs) used to support OpenStack, etc. All this information is used to derive the per-vCPU cost when running workloads on OpenStack; this enables (hopefully) fair comparisons with AWS.

This leads Lago to share a slide that compares various EC2 m4 instances to comparable OpenStack instances; in all cases, the OpenStack instance was far less expensive than EC2 (again, this comparison is based on a number of assumptions, such as a quite large cloud).

Bonnetot now takes the lead to talk about the OpenStack public cloud use case (comparing AWS against an OpenStack-powered public cloud). The comparison here was done using the AWS Simple Monthly Calculator and then compared against matching (as close as possible) OVH public cloud resources. Bonnetot recognizes that these comparisons can’t be perfect due to differences between providers, and that every effort was made to try to equalize differences between providers and their offerings. Lin also takes a few minutes to talk about how they tried to account for differences in services and guarantees between providers.

Coming finally to the comparison—based on a pretty standard three-tier application architecture—Lin shows that AWS runs about $11K a year, OVH comes in at about $9K a year, and a private OpenStack cloud comes in just over $3K a year. Lin dives into a few additional details, noting that additional “sysadmin” time is added to the OVH and OpenStack amounts to account for the lack of managed services that are present on AWS. In APAC, the AWS price runs up to almost $15K (representing that 29% premium for AWS services in APAC regions).

In a second comparison—based on a back end for a mobile application—the AWS price is around $27K, compared to $18K for OVH and $4K for an OpenStack private cloud. Lin calls out the cost of bandwidth when using a public cloud provider like AWS, but I don’t recall any of the presenters mentioning that they accounted for network costs and network services in their models. This may represent a gap in the comparison that unfairly skews the benefits in favor of private clouds. Additional details provided by Lin show that the APAC price for AWS is again about 30% higher (up to $36K).

Bonnetot steps forward to cover a third comparison, this time an example for a deep learning use case. This example discusses the use of GPU-equipped instances. In this example, AWS costs $167K annually, compared to $126K annually for OVH (no OpenStack private cloud comparison was given here, I assume because of the hardware needs).

The last use case is an archival use case. In this case, the presenters discuss archiving 60TB of data in an object storage service (S3 versus Swift). Here, OVH comes out first (around $1700), AWS second (just shy of $3K), and an OpenStack private cloud comes in last (just over $3K). Lago theorizes that this result may be due to some incorrect assumptions in the model around hardware platforms and hardware configurations, and invites attendees to collaborate with them in fine-tuning the assumptions behind the model. In this last model, AWS again comes in at about 22% more expensive in APAC than in the US.

In conclusion, Lago steps up to remind attendees that it’s not only about cost; it’s also about risk. At sufficient scale, a private cloud can be far more cost effective than a public cloud, but the risk of using a private cloud is higher (more staff required, more training, getting the company to use the cloud, etc.). Managed private cloud increases cost, but can reduce risk. Public cloud offers a far lower risk, but also typically has a much higher cost.

The presenters wrap up by providing a URL to the TCO spreadsheet (http://bit.ly/2dFGvfQ) and again invite attendees to collaborate with them on fine-tuning the assumptions and the TCO model.

Categories: Scott Lowe

Lessons Learnt from Running a Container-Native Cloud

Tue, 11/07/2017 - 21:30

This is a liveblog of the session titled “Lessons Learnt from Running a Container-Native Cloud,” led by Xu Wang. Wang is the CTO and co-founder of Hyper.sh, a company that has been working on leveraging hypervisor isolation for containers. This session claims to discuss some lessons learned from running a cloud leveraging this sort of technology.

Wang starts with a brief overview of Hyper.sh. The information for this session comes from running a Hypernetes (Hyper.sh plus Kubernetes)-based cloud for a year.

So, what is a “container-native” cloud? Wang provides some criteria:

  • A container is a first-class citizen in the cloud. This means container-level APIs and the ability to launch containers without a VM.
  • The cloud offers container-centric resources (floating IPs, security groups, etc.).
  • The cloud offers container-based services (load balancing, scheduled jobs, functions, etc.).
  • Billing is handled on a per-container level (not on a VM level).

To be honest, I don’t see how any cloud other than Hyper.sh’s own offering could meet these criteria; none of the major public cloud providers (Microsoft Azure, AWS, GCP) currently satisfy Wang’s requirements. A “standard” OpenStack installation doesn’t meet these requirements. This makes the session feel more like a subtle but unmistakable plug for Hyper.sh than a genuine attempt to share lessons learned from running containers on OpenStack at large scale. This impression is reinforced by a few slides in which Wang extolls the benefits of a container-native cloud as opposed to more “traditional” approaches, followed by a few more slides plugging Hyper.sh directly.

The next couple of slides also focus exclusively on Hyper.sh, and it appears that the majority of this session will be an advertisement for Hyper.sh (instead of focusing on helping the OpenStack community understand how OpenStack components and architecture may be affected by the extensive use of containers within an OpenStack deployment). At this point, I wrap up my liveblog.

Categories: Scott Lowe

Make Your Application Serverless

Tue, 11/07/2017 - 21:00

This is a liveblog from the last day of the OpenStack Summit in Sydney, Australia. The title of the session is “Make Your Application Serverless,” and discusses Qinling, a project for serverless (Functions-as-a-Service, or FaaS) architectures/applications on OpenStack. The presenters for the session are Lingxian Kong and Feilong Wang from Catalyst Cloud.

Kong provides a brief background on himself and his co-presenter (Wang), and explains that Catalyst Cloud is an OpenStack-based public cloud based in New Zealand. Both presenters are active technical contributors to OpenStack projects.

Kong quickly transitions into the core content of the presentation, which focuses on serverless computing and Qinling, a project for implementing serverless architectures on OpenStack. Kong points out that serverless computing doesn’t mean there are no servers, only that the servers (typically VMs) are hidden from view. Functions-as-a-Service, or FaaS, is a better term that Kong prefers. He next provides an example of how a FaaS architecture may benefit applications, and contrasts solutions like AutoScaling Groups (or the equivalent in OpenStack) with FaaS.

Some key characteristics of serverless, as summarized by Kong:

  • No need to think about servers
  • Run your code, not the whole application
  • Highly available and horizontally scalable
  • Stateless/ephemeral
  • Lightweight/single-purpose functions
  • Event-driven style
  • Pay only for what you use

Some use cases for serverless/FaaS:

  • Scheduled (cron) jobs
  • Microservices-based applications
  • Data processing tasks
  • IoT
  • Mobile backends
  • Chatbots
  • Side tasks like sending a follow-up email in response to an order, etc.

Some existing serverless/FaaS solutions include AWS Lambda or Azure Functions; in the open source space, there are projects like Apache OpenWhisk, Fission, or Kubeless. Most of these solutions/projects are leveraging container technologies.

This leads Kong to introduce Qinling, which is an OpenStack project intended to provide FaaS. Qinling came out of some of Kong’s work in Mistral, and the name has no special meaning other than it is from Kong’s home province in China.

Qinling uses Kubernetes on the backend to orchestrate containers that will, in turn, be used to execute functions. Architecturally, Qinling has only two components: the qinling-api service and the qinling-engine service. Some of the features present in Qinling include:

  • RESTful API (API objects include runtimes, functions, executions, and jobs)
  • Integrates with other OpenStack services
  • Supports code/Docker images/Swift object as a function
  • Sync/Async/Periodic execution
  • Scale up/scale down
  • Supports OpenStack CLI

Qinling can also consume/integrate with Aodh, Zaqar, and Swift (as mentioned above).

Kong now moves into a pair of demos. The first demo models a common Lambda use case surrounding an action executing after an object is uploaded to S3. In this demo, a Qinling function consumes an alarm from Aodh generated from an object being uploaded to Swift. The demo works, but is a bit slow; Kong explains that this is due to cold start considerations (first-time invocation takes longer than subsequent invocations).

Kong now hands it over to Wang to show the second demo. The second demo shows how to use Zaqar with Qinling. (Zaqar, for those who aren’t aware, is multi-tenant “messaging-as-a-service” for OpenStack, similar in nature to Amazon SQS.) This example shows how you might accomplish communications between various functions. Although it takes a few minutes, the demo is successful in sending an SMS to the presenters’ phones in response to a service being down in an OpenStack cloud.

Some additional resources on Qinling are shared by the presenters:

At this point, the presenters end the session.

Categories: Scott Lowe

How to Deploy 800 Servers in 8 Hours

Mon, 11/06/2017 - 23:30

This is a liveblog of the session titled “How to deploy 800 nodes in 8 hours automatically”, presented by Tao Chen with T2Cloud (Tencent).

Chen takes a few minutes, as is customary, to provide an overview of his employer, T2cloud, before getting into the core of the session’s content. Chen explains that the drive to deploy such a large number of servers was driven in large part by a surge in travel due to the Spring Festival travel rush, an annual event that creates high traffic load for about 40 days.

The “800 servers” count included 3 controller nodes, 117 storage nodes, and 601 compute nodes, along with some additional bare metal nodes supporting Big Data workloads. All these nodes needed to be deployed in 8 hours or less in order to allow enough time for T2cloud’s customer, China Railway Corporation, to test and deploy applications to handle the Spring Festival travel rush.

To help with the deployment, T2cloud developed a “DevOps” platform consisting of six subsystems: CMDB, OS installation, OpenStack deployment, task management, automation testing, and health check/monitoring. Chen doesn’t go into great detail about any of these subsystems, but the slide he shows does give away some information:

  • The OS installation subsystem appeared to leverage IPMI and Cobbler.
  • The OpenStack deployment subsystem used Ansible (it’s not clear if it was using OpenStack-Ansible or just Ansible with custom playbooks).

Going into a bit more detail, Chen explains that the OS installation process uses PXE plus Kickstart, all glued together using Cobbler. For the 800-node deployment, T2cloud used a total of 3 Cobbler servers, running about 20 servers concurrently in each group. Each server took about 10 minutes to deploy. Chen mentions that they used a Cobbler snippet to dynamically locate the target disk when deploying the operating system; this is a feature I hadn’t heard of before, and it sounds very useful (especially in large-scale environments). Chen also explains how to use Cobbler to revert to the “old style” of interface naming (i.e., eth0 instead of enoXXXXXX or similar).

Coming to the OpenStack deployment portion, Chen discusses the benefits that led T2cloud to choose Ansible for deployment. T2cloud used Ansible both for ad hoc commands as well as with playbooks. Chen reviews some of the key components they needed in order to use playbooks: inventory, roles, modules, group variables, and host variables. (These are all pretty standard Ansible constructs.) Chen does not mention anything about the modules that T2cloud used, other than to say they are idempotent and explain what that means.

Chen does review some performance optimizations for Ansible:

  • Disable the gather_facts function
  • Enable pipelining
  • Increase the number of forks running
  • Upgrade SSH (as necessary) and use ControlPersist
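
Those optimizations map onto a handful of standard Ansible settings; here’s a minimal ansible.cfg sketch (the values are illustrative, not the ones T2cloud used):

[defaults]
gathering = explicit      # skip automatic fact gathering (or set gather_facts: false per play)
forks = 50                # run tasks against more hosts in parallel (default is 5)

[ssh_connection]
pipelining = True         # fewer SSH operations per task
ssh_args = -o ControlMaster=auto -o ControlPersist=60s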

Next, Chen reviews some issues and their respective solutions discovered during this deployment:

  • The Neutron agents were down, even though the services were running; this was later determined to be an insufficient number of file descriptors for the system, because RabbitMQ was using too many descriptors
  • The Cinder services failed to start, due to a maximum connection limit on the backend MySQL database(s)
  • There were a number of other limits or maximum connections that also had to be increased; Chen shows a slide listing quite a few of them

This brings Chen to a discussion of the health and monitoring system, which was used before the deployment, during the deployment, and after the deployment (for different purposes in each phase, of course).

In closing, Chen provides some suggestions/recommendations to attendees:

  • Do more tests to find bottlenecks.
  • Always do health checks.

At this point, Chen ends the session and opens up for questions from the audience.

Some things I would’ve liked to hear from Chen:

  • How did they get inventory from Cobbler to Ansible? Was it manual, or was there an automated process?
  • Which Ansible modules did they use? If they built their own, why?
  • What tools were used in the health check/monitoring system?

Without more detailed information, the session was good but not as helpful as it could have been.

Categories: Scott Lowe

IPv6 Primer for Deployments

Mon, 11/06/2017 - 22:00

This is a liveblog of the OpenStack Summit Sydney session titled “IPv6 Primer for Deployments”, led by Trent Lloyd from Canonical. IPv6 is a topic with which I know I need to get more familiar, so attending this session seemed like a reasonable approach.

Lloyd starts with some history. IPv4 was released in 1980, and uses 32-bit addresses (with a total address space of around 4 billion). IPv4, as most people know, is still used for the majority of Internet traffic. IPv6 was released in 1998, and uses 128-bit addresses (for a theoretical total address space of 3.4 x 10 to the 38th power). IPv5 was an experimental protocol, which is why the IETF used IPv6 as the version number for the next production version of the IP protocol.

Lloyd shares a graph depicting the depletion of IPv4 address space, to help attendees better understand the situation with IPv4 address allocation. The next graph Lloyd shows illustrates IPv6 adoption, which—according to Google—is now running around 20% or so. (Lloyd shared that he naively estimated IPv4 would be deprecated in 2010.) In Australia it’s still pretty difficult to get IPv6 support, according to Lloyd.

Next, Lloyd reviews decimal and binary number conversion as a precursor to better understand how both IPv4 and IPv6 numbers are represented for human consumption, and how subnetting is handled by masking out some number of bits. Whereas IPv4 uses dotted decimal format, IPv6 uses hexadecimal numbering. Lloyd reminds attendees that MAC addresses also use hexadecimal, though only with 48 bits (instead of 128 bits with IPv6).

Bringing up an IPv6 address, Lloyd shares a few tricks to make the address easier to handle. For example, you can remove leading zeroes. You can also use a double colon (::) to collapse a consecutive series of all-zero groups (this can only be done once per address). Because the address itself contains colons (for example, 2001:db8::8a2e:0:4), we have to enclose the address in square brackets before appending a colon and a port number.
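
As a quick worked example of those rules (using the IPv6 documentation prefix):

2001:0db8:0000:0000:0000:8a2e:0000:0004   # full form
2001:db8::8a2e:0:4                        # leading zeroes dropped, one :: for the longest zero run
http://[2001:db8::8a2e:0:4]:8080/         # square brackets required when appending a port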

Subnetting in IPv6 is handled a bit differently, naturally, given the expanded address space. Large ISPs would generally get a /32 prefix (resulting in 65,536 /48 prefixes). Large organizations would generally get a /48 prefix (resulting in 65,536 /64 prefixes, or 256 /56 prefixes). A /56 prefix is for a typical end-user site, which gives an end-user 256 /64 prefixes. And a /64 prefix is considered a single local network.

IPv6 also has some special addresses. The first is the link local address (similar to the “169.254” subnet in IPv4). The link local address is derived from the MAC hardware address (using EUI-64, which pads out the 48-bit MAC address to 64 bits and flips a bit to note that it was automatically generated). The link local address is always kept, even if the interface is assigned a global address. The IPv6 equivalent of RFC 1918 private addressing is called ULA (Unique Local Addressing); it uses fd00::/8 followed by a randomly generated section (giving about a trillion unique combinations, making address space collisions far less likely).
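
A worked example of the EUI-64 derivation, using a made-up MAC address:

MAC address:          52:54:00:12:34:56
Insert ff:fe:         52:54:00:ff:fe:12:34:56
Flip the U/L bit:     50:54:00:ff:fe:12:34:56   (0x52 becomes 0x50)
Link local address:   fe80::5054:ff:fe12:3456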

The next topic that Lloyd addresses is Neighbour Discovery (ND), the IPv6 equivalent of ARP in IPv4. ND uses the link local address as the source, and a multicast address as the target. Address configuration—specifically, automatic address configuration—is handled via one of two methods. The first is SLAAC, which combines EUI-64 with the link prefix from a Router Advertisement (RA) to assign addresses. Early versions of SLAAC didn’t support providing DNS server addresses, but this has been addressed in more recent versions of SLAAC. Users can also use DHCPv6, which operates much in the same way as DHCPv4. DHCPv6 has the option of including additional information that isn’t supported when using RAs. RAs can point hosts to use DHCPv6. There’s also a method of getting the address via SLAAC but other configuration via DHCPv6. Finally, there’s also something called DHCPv6-PD (Prefix Delegation), which allows networks to request a prefix from an ISP to then handle themselves.

Lloyd talked briefly about MTU, but I missed the information. NAT isn’t generally used with IPv6, although preliminary support for it is found in very recent Linux kernels. Another consideration Lloyd mentions is single stack vs. dual stack (only IPv6 or both IPv6+IPv4); most initial deployments will use dual stack.

At this point, Lloyd shifts his focus to talk specifically about IPv6 in OpenStack. Generally speaking, OpenStack has relatively strong IPv6 support throughout. The exception to this is Open vSwitch, which currently can’t do GRE tunneling using IPv6 in the underlay. The fix for this is to replace GRE with VXLAN or Geneve, both of which can use IPv6 in the underlay. Lloyd also reminds attendees that they’ll need to check with vendors to verify IPv6 support in their plugins or commercial solutions.

With regards to IPv6 in tenant networks, tenants should use globally unique addresses. This is done via Neutron’s address scopes. Then, out of that address scope, set up a subnet pool (this allows Neutron to hand out a smaller section of the overall address scope). Then, individual tenants can just create a subnet; that subnet can be pulled from a default subnet pool.

When creating a Neutron subnet, you can specify SLAAC or DHCPv6 for address configuration.
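
A rough sketch of what that workflow might look like with the OpenStack CLI; the names and the documentation prefix are placeholders, and exact flags can vary by release:

# Admin: create a shared IPv6 address scope and a subnet pool within it
openstack address scope create --share --ip-version 6 v6-scope
openstack subnet pool create --share --address-scope v6-scope \
  --pool-prefix 2001:db8::/48 --default-prefix-length 64 v6-pool

# Tenant: carve a SLAAC subnet out of the pool for an existing network
openstack subnet create --ip-version 6 --subnet-pool v6-pool \
  --ipv6-ra-mode slaac --ipv6-address-mode slaac \
  --network tenant-net tenant-v6-subnet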

IPv6 support is weaker when it comes to DVR (Distributed Virtual Router) or L3 HA. IPv6 does work with DVR, but the traffic isn’t fully distributed; there’s no IPv6 support in L3 HA.

When it comes to integrating Neutron IPv6 networks with external IPv6 networks, you can do static routing, use BGP, or you can use DHCPv6-PD (Neutron can act as a DHCPv6-PD client).

Lloyd next discusses some concerns around cloud-init and VM metadata. I didn’t catch all the considerations, but it sounded like the workaround was using config-drive to address the considerations.

Reverse DNS is also a consideration around IPv6 and OpenStack. This is not generally a problem, though it may be a problem if you’re sending email messages directly from instances (reverse DNS will typically be necessary in that instance).

At this point, Lloyd opens the session up to questions from the audience.

Categories: Scott Lowe

Battle Scars from OpenStack Deployments

Mon, 11/06/2017 - 21:00

This is the first liveblog from day 2 of the OpenStack Summit in Sydney, Australia. The title of the session is “Battle Scars from OpenStack Deployments.” The speakers are Anupriya Ramraj, Rick Mathot, and Farhad Sayeed (two vendors and an end-user, respectively, if my information is correct). I’m hoping for some useful, practical, real-world information out of this session.

Ramraj starts the session, introducing the speakers and setting some context for the presentation. Ramraj and Mathot are with DXC, a managed services provider. Ramraj starts with a quick review of some of the tough battles in OpenStack deployments:

  • Months to deploy OpenStack at scale
  • Chaos during incidents due to lack of OpenStack skills and knowledge
  • Customers spend lengthy periods with support for troubleshooting basic issues
  • Applications do not get onboarded to OpenStack
  • Marooned on earlier version of OpenStack
  • OpenStack skills are hard to recruit and retain

Ramraj recommends using an OpenStack distribution versus “pure” upstream OpenStack, and recommends using new-ish hardware as opposed to older hardware. Given the last bullet (the difficulty of recruiting and retaining OpenStack skills), rolling out OpenStack and resolving OpenStack issues becomes even more complicated. A lack of DevOps skills and a lack of understanding around OpenStack APIs can impede the process of porting applications to OpenStack, which in turn makes it harder to justify the cloud project.

Ramraj then spends a few minutes talking about how and why a managed services provider might be able to help address some of the challenges around an OpenStack deployment. Ramraj and the other presenters do not, unfortunately, provide any suggestions for resolving the challenges described above other than recommending the use of a managed services provider, making this session far less useful than it might otherwise have been.

Ramraj closes out her portion of the session with a quote from Sun Tzu’s Art of War: “Every battle is won before it is fought.”

At this point, Ramraj brings up Farhad Sayeed to tell the story of deploying OpenStack at American Airlines. Sayeed spends a couple of minutes talking about the goals American Airlines hoped to achieve with an OpenStack-based private cloud deployment. Then Sayeed describes a couple “road bumps” they experienced:

  • Had to work closely with the networking group
  • Tested a number of different distributions

I’m not clear how either of these are actual problems; rather, they seem like perfectly natural parts of deploying something like OpenStack (which affects lots of different aspects of IT).

Next Sayeed describes some of their target workloads; American Airlines is pushing a “PaaS”-first strategy leveraging Cloud Foundry. For IaaS, American is supporting RHEL and currently testing Ubuntu. American Airlines is still in a very exploratory stage with containers.

Looking back, Sayeed shares a few lessons learned:

  • Need a collaborative culture
  • Lots of training and retooling required
  • Less is more (less customization makes it easier)
  • Use a good service provider

I can absolutely agree with the first three lessons learned shared by Sayeed; the fourth might be the right approach for some organizations, but not necessarily all organizations.

Sayeed closes out his portion of the session by encouraging attendees to embrace automation and orchestration, and hands it over to Rick Mathot (also of DXC). Mathot spends a few minutes talking about managing the “toughest” battle: managing client expectations. Mathot talks about topics like focusing on business value (asking “is it useful?” and being able to quantify business value), leading from the front (but keeping an eye on the rearview mirror), and making sure what you’re doing is relevant.

At this point, Mathot wraps up the content and opens the session up for questions from the audience.

Categories: Scott Lowe

Kubernetes on OpenStack: The Technical Details

Mon, 11/06/2017 - 00:45

This is a liveblog of the OpenStack Summit session titled “Kubernetes on OpenStack: The Technical Details”. The speaker is Angus Lees from Bitnami. This is listed as an Advanced session, so I’m hoping we’ll get into some real depth in the session.

Lees starts out with a quick review of Bitnami, and briefly clarifies that this is not a talk about OpenStack on Kubernetes (i.e., using Kubernetes to host the OpenStack control plane); instead, this is about Kubernetes on OpenStack (OpenStack as IaaS, Kubernetes to do container orchestration on said IaaS).

Lees jumps quickly into the content, providing a “compare-and-contrast” of Kubernetes versus OpenStack. One of the key points is that Kubernetes is more application-focused, whereas OpenStack is more machine-focused. Kubernetes’ multi-tenancy story is shaky/immature, and the nature of containers means there is a larger attack surface (VMs provide a smaller attack surface than containers). Lees also points out that Kubernetes is implemented mostly in Golang (versus Python for OpenStack), although I’m not really sure why this matters (unless you are planning to contribute to one of these projects).

Lees next provides an overview of the Kubernetes architecture (Kubernetes master node containing API server talking to controller manager and scheduler; kubelet, cAdvisor, and kube-proxy on the worker nodes; etcd as a distributed key-value store for storing state in the Kubernetes master; pods running on worker nodes and having one or more containers in each pod).

Next, Lees shows another Kubernetes diagram, but this time the diagram illustrates the “connection points” between Kubernetes and the underlying cloud (OpenStack, in this particular case).

Lees spends some time reviewing the basics of Kubernetes networking, reviewing the core constructs leveraged by Kubernetes. In the process of reviewing Kubernetes networking, Lees points out that there are lots of solutions for pod-to-pod (east-west) traffic flows. Traffic flows for internet-to-pod (north-south) are handled a bit differently; Kubernetes assumes each pod has outbound connectivity to the Internet. For inbound connectivity, this is where Kubernetes Services come into play; you could have a Service of type NodePort (unique port forwarded by kube-proxy on every node in the Kubernetes cluster) or a Service of type LoadBalancer (which uses a cloud load balancer with nodes & NodePorts as registered backends).
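
As a small, hypothetical illustration of the difference (assuming a Deployment named web already exists):

# Expose via a unique port forwarded by kube-proxy on every node
kubectl expose deployment web --type=NodePort --port=80 --target-port=8080 --name=web-nodeport

# Ask the underlying cloud for a load balancer with the nodes/NodePorts as backends
kubectl expose deployment web --type=LoadBalancer --port=80 --target-port=8080 --name=web-lb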

Having now covered the Kubernetes concepts, Lees shifts his focus to concentrate on the “connection points” between Kubernetes and OpenStack (the underlying cloud provider in this particular case). These connection points are provided by the OpenStack cloud provider in Kubernetes (enabled via the --cloud-provider and --cloud-config flags). Lees shares the story of how his early experimentation with OpenStack and Kubernetes led to him implementing the OpenStack provider for Kubernetes back in 2014.

The first connection point Lees discusses in detail regards instances (compute). This requires the Nova Compute v2 API. One challenge in the Kubernetes provider is that the instance ID isn’t necessarily unique and isn’t resolvable via DNS (generically). To help address this, the Kubernetes provider requires the node name to be the same as the OpenStack instance name (which is not the same as the hostname or the instance ID). This is due, in part, to how the Kubernetes provider determines IP addresses for the node. (Side note: Kubernetes isn’t yet very IPv6-friendly, so Lees recommends avoiding putting IPv6 addresses on Kubernetes nodes.)

The next connection point is zones, where Kubernetes looks up the region and availability zone. This information is copied into a label (Lees doesn’t say but I assume these labels are assigned to the nodes).

Load balancing is the next connection point that Lees reviews. This integration is based on Service objects specified as “type=LoadBalancer” (as described earlier). LBaaS v1 and v2 are supported, but Kubernetes 1.9 removes LBaaS v1 support. Lees points out that this portion of the code is quite complex.

Next up, Lees talks about the “routes” portion of the provider. This requires the Neutron “extraroute” extension, and implements “kubenet” networking using Neutron routers. This adds routes to the Neutron router for each node’s pod subnet, and adds entries into the node’s allowed-address pairs. (It’s worth noting that kubenet is deprecated in favor of CNI.)

The Cinder volume plugin isn’t technically part of the Kubernetes provider, but it uses the Kubernetes provider to gather information and support OpenStack integration. The plugin doesn’t yet support Cinder v3; for v1 and v2 implementations, it attaches/detaches volumes from a VM as required for scheduled pods. This plugin does support dynamic provisioning (creating/deleting volumes on the fly).

The last connection point that Lees discusses is the Keystone password authentication plugin that sets up an authenticator to work against Keystone. Lees points out that this is pretty much a terrible idea, and recommends not using this approach. Lees is also careful to point out that this integration point does not bring multi-tenancy to Kubernetes.

OK, so what does all this mean? Lees shifts focus now to try to pull all this information together. First, Lees recommends using Magnum if it’s available in your OpenStack cloud. If Magnum isn’t available, Lees says that the OpenStack Heat kube-up script is unmaintained and probably should be avoided.

Next, Lees provides some recommendations for deployment:

  • A dedicated Kubernetes cluster for each (hostile) tenant (don’t mix tenants in a single Kubernetes cluster)
  • Use three controller VMs (ideally spread across availability zones; minimum of three in order to form a majority for etcd)
  • Spread worker VMs across availability zones, and put them all into one Neutron private network
  • Use fewer, larger VMs for worker nodes (instead of many smaller VMs)
  • Set up an LBaaS load balancer to handle the API access (since there are multiple master VMs)—this is necessary for the worker nodes to come up properly
  • Set up a separate LBaaS and a floating IP network for service access
  • Try to avoid overlay-mode Flannel (VXLAN) or Weave when running on Neutron; instead, shoot for Kubenet (deprecated), Calico, or Flannel with the host-gw backend
  • Lees seems less enthusiastic about Calico as opposed to Flannel with the host gateway backend

Lees provides a sample cloud.conf configuration (used to configure the OpenStack provider for Kubernetes), and mentions the Cinder API support and how to work around it (this won’t be necessary in a year).
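
I didn’t capture the exact sample, but here’s a hedged sketch of what cloud.conf for the in-tree OpenStack provider looked like around this time; the option names come from the legacy provider, and every value here is a placeholder:

[Global]
auth-url=https://keystone.example.com:5000/v3
username=k8s-service-user
password=CHANGE_ME
tenant-name=k8s-project
domain-name=Default
region=RegionOne

[LoadBalancer]
lb-version=v2
subnet-id=PLACEHOLDER_SUBNET_UUID
floating-network-id=PLACEHOLDER_NETWORK_UUID

[BlockStorage]
bs-version=v2

[Route]
router-id=PLACEHOLDER_ROUTER_UUID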

Looking ahead to future work, Lees talks about efforts to make things more automatic with smart defaults. Development efforts are also working to move stuff out of cloud.conf into per-object annotations. Within the Kubernetes community, a lot of work is happening around moving cloud providers out of the core Kubernetes code base, and obviously the OpenStack provider would be affected by this effort.

Lees points out some ways to provide feedback and contact the team that is working on the OpenStack provider for Kubernetes. He then wraps up the session.

Categories: Scott Lowe

Issues with OpenStack That Are Not OpenStack Issues

Mon, 11/06/2017 - 00:30

This is a liveblog of OpenStack Summit session on Monday afternoon titled “Issues with OpenStack that are not OpenStack Issues”. The speaker for the session is Sven Michels. The premise of the session, as I understand it, is to discuss issues that arise during OpenStack deployments that aren’t actually issues with OpenStack (but instead may be issues with process or culture).

Michels starts with a brief overview of his background, then proceeds to position today’s talk as a follow-up (of sorts) to a presentation he did in Boston. At the Boston Summit, Michels discussed choosing an OpenStack distribution for your particular needs; in this talk, Michels will talk about some of the challenges around “DIY” (Do It Yourself) OpenStack—that is, OpenStack that is not based on some commercial distribution/bundle.

Michels discusses that there are typically two approaches to DIY OpenStack:

  • The “Donald” approach leverages whatever is around, including older hardware.
  • The “Scrooge” approach is one in which money is available, which typically means newer hardware.

Each of these approaches has its own challenges. With older hardware, it’s possible you’ll run into older firmware that may not be supported by Linux, or hardware that no longer works as expected. With new hardware, you may run into issues where Linux doesn’t support the new hardware, drivers aren’t stable, firmware revisions aren’t supported, etc. So, neither approach is perfect.

Continuing the discussion around hardware, Michels points out that older hardware may not have as many CPU cores or offer support for as much RAM in a single box. The ratios of CPUs to RAM for the default flavors may not play well with CPU and RAM support in older hardware platforms, and the default overcommitment settings in OpenStack may also cause some issues with optimizing utilization of the OpenStack cloud. (I’m not sure how this applies to DIY OpenStack, as this would also affect commercial distributions of OpenStack.)

Next, Michels talks about BIOS and BIOS updates, firmware, and IPMI. A show of hands reveals that only a very small portion of attendees actually do BIOS and firmware updates on a regular basis. Michels points out that it may sometimes be necessary to run BIOS, firmware, or IPMI updates in order to optimize performance and manageability. (Again, I’m not sure how this is specific to DIY OpenStack.)

The topic of storage is the next topic that Michels tackles, discussing some design considerations around the use of traditional hard disk drives (HDDs) versus solid state drives (SSDs). He also mentions some considerations around SSD encryption.

Networking support is another area to consider; Michels reminds attendees that support for newer networking technologies like 25G Ethernet may require firmware updates, BIOS updates, kernel updates, and occasional reboots to fix networking outages. Michels points out that you can also run into issues with newer hardware switches and how those switches may (or may not) integrate with or support OpenStack.

This brings Michels to the topic of operating system updates and some of the considerations around handling them. Running your own continuous integration (CI) system is a related topic, and Michels shares some pain points from his organization’s experience trying to use a CI pipeline for OpenStack code in their own deployment.

Related to discussions about operating system updates, Michels spends a few minutes talking about some of the kernel-related issues he’s seen. These issues include not only issues with functionality (i.e., something just plain not working) but also issues that result in reduced performance (such as forgetting to tune kernel parameters to better support 25G/40G NICs).
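
Michels doesn’t share specific values, but the sort of tuning he’s alluding to typically lives in sysctl; the following is a rough sketch with illustrative values only (not recommendations from the session). Changes like these are loaded with sysctl --system or at boot.

# /etc/sysctl.d/90-fast-nics.conf -- illustrative values only
net.core.rmem_max = 67108864           # raise the maximum socket receive buffer
net.core.wmem_max = 67108864           # raise the maximum socket send buffer
net.core.netdev_max_backlog = 250000   # deeper backlog for high packet rates
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432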

There’s also the matter of software issues, although I’m not clear whether Michels is referring to software running on OpenStack or software that OpenStack leverages but that isn’t technically part of OpenStack (I lean toward the latter, since he mentioned some issues with Libvirt). Michels points out that this may be a benefit of DIY OpenStack, as many vendor-provided or vendor-packaged OpenStack distributions may not offer you the ability to update specific packages or components.

Michels spends a few minutes talking about the Puppet manifests/modules provided by the OpenStack Foundation; he ran into some issues with these manifests/modules that may require fixes.

Of course, there’s also the matter of customers putting your OpenStack cloud to work and uncovering potential issues or problems with the implementation. Modifying OpenStack to better accommodate customer demands may, in turn, result in issues/errors elsewhere in OpenStack. Michels also talks about performance issues with the OpenStack CLI (issues that are apparently still not resolved).

CPU masking by hypervisors running under OpenStack may also affect performance, particularly if features like hardware encryption are masked out. This, in turn, may result in complaints from customers who are using OpenStack instances for use cases where hardware acceleration/offloading would be beneficial.

Michels talks about the need to read the OpenStack documentation so that you can avoid documented issues with certain configurations (the example Michels provides is running the Nova API in Apache with WSGI, which works [mostly] but isn’t supported).

Wrapping up, Michels ends the session and opens the floor for questions from the audience. As I mentioned a couple of times earlier in the liveblog, I don’t understand how many of the topics Michels covered are specific to DIY OpenStack; many of these issues could equally affect commercial packages of OpenStack. I had also hoped the discussion would focus more on process and culture, but instead it focused strongly on fairly obscure technical issues. The session was very technical and may have had value for a number of attendees, but I didn’t find it as useful as I’d hoped it would be.

Categories: Scott Lowe

To K8s or Not to K8s Your OpenStack Control Plane

Mon, 11/06/2017 - 00:15

This is a liveblog of the Monday afternoon OpenStack Summit session titled “To K8s or Not to K8s Your OpenStack Control Plane”. The speaker is Robert Starmer of Kumulus Technologies. This session is listed as a Beginner-level session, so I’m hoping it’s not too basic for me (and that readers will still get some value from the liveblog).

Starmer begins with a quick review of his background and expertise, and then proceeds to provide—as a baseline—an overview of containers and Kubernetes for container orchestration. Starmer covers terminology and concepts like Pods, Deployments (and Replica Sets), Services, StatefulSets, and Persistent Volumes. Starmer points out that StatefulSets and Persistent Volumes are particularly applicable to the discussion about using Kubernetes to handle the OpenStack control plane. Following the discussion of Kubernetes components, Starmer points out that the Kubernetes architecture is designed to be resilient, talking about the use of etcd as a distributed state storage system, multiple API servers, separate controller managers, etc.

Next, Starmer spends a few minutes talking about Kubernetes networking and some of the components involved, followed by a high-level discussion around persistent volumes and storage requirements, particularly for StatefulSets.
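
To make the StatefulSet/Persistent Volume connection concrete, here’s a minimal sketch (mine, not Starmer’s) of a StatefulSet for a database-style control plane service; the names, image, and storage size are illustrative, and older clusters may require an earlier apiVersion such as apps/v1beta1:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: control-db
spec:
  serviceName: control-db
  replicas: 3
  selector:
    matchLabels:
      app: control-db
  template:
    metadata:
      labels:
        app: control-db
    spec:
      containers:
      - name: mariadb
        image: mariadb:10.1          # illustrative image/tag
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:              # one PersistentVolumeClaim is created per replica
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi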

Having covered Kubernetes, Starmer now starts talking about the requirements that OpenStack has for its control plane, mapping these requirements to the functionality provided by Kubernetes.

At this point, Starmer transitions into talking about whether Kubernetes is appropriate for the OpenStack control plane. If you’re building a multi-tenant, multi-site production grade service, Starmer believes Kubernetes adds a great deal of value. For single-tenant production implementations or development environments, Starmer isn’t convinced that the functionality of Kubernetes outweighs the additional complexity.

If you do decide to proceed with using Kubernetes for the OpenStack control plane, Starmer reviews a couple of options for doing exactly that. One option is OpenStack-Helm, which leverages Helm (as the name would imply). Another option is Kolla-Kubernetes, which leverages work done by the Kolla project along with some fine-grained Helm charts.
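
For a sense of what the OpenStack-Helm workflow looks like in practice, deploying an individual chart boils down to standard Helm (v2) commands along these lines; the chart path and release name are illustrative, not taken from the session:

helm install ./mariadb --name mariadb --namespace openstack
helm status mariadb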

Starmer next reminds attendees that leveraging Kubernetes for the OpenStack control plane means some additional skills are needed: Kubernetes itself, containers (Docker or LXC, typically Docker), Helm, monitoring (often Prometheus), and new languages (like Golang). Ansible is probably something else you’ll need, since most of the container generation tools leverage Ansible to build the container images.

Returning to the question of whether someone should use Kubernetes for the OpenStack control plane, Starmer summarizes his findings: organizations that won’t have someone owning the Kubernetes side of the house should probably just stick with containerized OpenStack control plane components. However, if the expertise and skills are there to support Kubernetes, then it is a perfectly valid approach.

At this point, Starmer wraps up the session.

Categories: Scott Lowe

OpenStack Summit Sydney Day 1 Keynote

Mon, 11/06/2017 - 00:00

This is a liveblog of the day 1 keynote here at the OpenStack Summit in Sydney, Australia. I think this is my third or fourth trip to Sydney, and this is the first time I’ve run into inclement weather; it’s cloudy, rainy, and wet here, and forecasted to remain that way for most of the Summit.

At 9:02am, the keynote (actually a set of separate keynote presentations this morning) kicks off with a video featuring Technical Committee members, community members, and others talking about the OpenStack community, the OpenStack projects, and the Summit itself. At 9:05am, the founders of the Australian OpenStack User Group—Tristan Goode and Tom Fifield—take the stage to kick off the general session. Goode and Fifield take a few minutes to talk about the history of the Australian OpenStack User Group and the evolution of the OpenStack community in Australia. Goode also takes a few moments to talk about his company, Aptira.

After a few minutes, Goode and Fifield turn the stage over to Mark Collier and Lauren Sell from the OpenStack Foundation. Collier and Sell set the stage for the upcoming presentations, do some housekeeping announcements, and talk about sponsors and support partners. Sell also takes a moment to provide a quick introduction to OpenStack, since about 40% of the attendees are here at the Summit for the very first time. Collier mentions that there are over 60 different locations/regions where an OpenStack-powered public cloud has a presence. This leads Collier to talk about free trials of OpenStack-powered public clouds, called the OpenStack Passport (visit https://openstack.org/passport/ for more information).

Collier brings out Monty Taylor, who talks about OpenStack Zuul (the integration testing system behind OpenStack, which is itself hosted on OpenStack). The latest release of Zuul, version 3, is now better designed to be run by other organizations (it was more closely tied to OpenStack in previous versions). Version 3 of Zuul leverages Ansible (instead of Jenkins Job Builder) and now has support for GitHub.

At this point, Collier brings out Jonathan Bryce, Executive Director of the OpenStack Foundation, to talk for a few minutes about the future of OpenStack. Bryce takes a few minutes to reflect on the history of OpenStack, looking back at the previous fifteen summits (this summit is the sixteenth). Bryce also reflects on the nature of OpenStack, looking at the four “opens” (open source, open design, open development, and open community). This discussion leads Bryce to talk about the challenges facing open source projects; in particular, making sure open source innovation can be put to work by users. Bryce brings out Alison Randall to talk about how integration is a key effort to help put open source innovation to work for users and customers.

Randall talks for a few moments about the integration efforts happening within the OpenStack community, not just within the OpenStack projects but also between OpenStack projects and “external” open source projects. Randall revisits the “integration engine” theme from the Austin Summit, noting that integration isn’t just about technology; it’s also about knowledge and information.

Bryce and Randall now bring up four steps on which the OpenStack Foundation will be focusing around integration:

  1. Find the common use cases. (Putting logos on a slide doesn’t make technologies work together.)
  2. Collaborate across communities, building trust and establishing standardized interfaces. (The OpenDev event is one example.)
  3. Build the required new technology, addressing the gaps identified in the previous steps.
  4. Test everything end-to-end, making sure that the integration is real, stable, and reliable. (OPNFV and XCI are examples of this in action.)

Bryce’s focus now shifts to organizing the OpenStack community, and how this is necessary in order to help support the four integration steps described above. Bryce encourages the OpenStack community to step up and focus on building an integration engine that helps users address their challenges.

At this point, Bryce hands the stage over to Sorabh Saxena from AT&T. Saxena leads off his presentation by talking about OpenStack’s role in FirstNet, a network focused on supporting first responders. Saxena encourages the attendees and the community to focus on building a “next-generation OpenStack” that embodies the principles Bryce mentioned in his presentation; in particular, Saxena calls out the edge computing use case. Some AT&T services built on OpenStack are DirecTV Now and Cricket Wireless, according to Saxena. This leads Saxena to provide an update on AIC (AT&T Integrated Cloud), a solution built on OpenStack. Next, Saxena talks about key efforts that “next-generation” OpenStack should address: security, simplified operations, upgrades and installation, and culture and process. At the end of his talk, Saxena brings out a couple of presenters (I couldn’t catch their names) to show a demo of some of the work AT&T is doing around the “next-generation” efforts Saxena described.

Saxena wraps up and Sell returns to the stage, highlighting a few other telecom-related sessions happening at the Summit this week. Sell then introduces Commonwealth Bank and Quinton Anderson to talk about their use of OpenStack.

Anderson talks for a moment about Commonwealth Bank, then focuses on some of the challenges users encounter when composing open source cloud technologies to address business needs. Anderson outlines Commonwealth Bank’s use of Ironic to help with their Hadoop installations, and talks about using Docker/Vault/Calico under Mesos+Marathon to support Yarn, Spark, etc. Anderson goes into some detail on the CI/CD pipeline his organization is using to manage “infrastructure as code”, and then talks about the collection of open source projects the bank has had to compose together. This list includes Jenkins, Notary, Vault, Nginx, Mesos, Kubernetes, Calico, Docker, Spark, Hadoop, Cassandra, and OpenStack.

Sell returns to the stage now, highlighting some financial services-related sessions happening at the summit this week.

Continuing the keynote presentations, Sell introduces Anni Lai from Huawei. Lai talks about the “three Internet giants” in China: Baidu, Alibaba, and Tencent. After a moment on Tencent’s business, Lai introduces Bowyer Liu, who does a live demo of mobile access to the OpenStack Summit web site (I’m not sure what Tencent’s unique value was in the demo). Liu then takes center stage to talk for a few minutes about Tencent, fairly quickly transitioning to Tencent’s use of OpenStack, called TStack. As an example, Liu talks about how WeChat Procurement—a payment system—runs on OpenStack and handles 200,000 orders daily. Tencent also offers an OpenStack-powered public cloud called Tencent Cloud (TStack, by comparison, is an internal private cloud running OpenStack). Liu discusses other initiatives stemming from Tencent’s efforts, including various government efforts and industry-specific cloud projects.

Collier now returns to the stage, recapping the Tencent presentation and transitioning to the Superuser program. To that end, Collier brings out Allison Price and Nicole Martinelli. Price and Martinelli talk about the Superuser program, review the recent awards, and introduce Thomas Andrew from Paddy Power Betfair. Andrew announces the candidates for the next Superuser Award: China Railway Corporation, China UnionPay, City Network, and the Tencent TStack team. The winner is the Tencent TStack team. Liu returns to the stage, along with Collier and the rest of the Tencent TStack team.

After some photos, Collier shifts the focus to savings realized via OpenStack, and brings out Joseph Sandoval and Nicolai (?) Brousse from Adobe Advertising Cloud. Cost efficiency was a big driver for Adobe, leading them to move from public cloud (where they’d started) to a private cloud using OpenStack. According to their internal measurements, Adobe realized a 30% savings in moving to an OpenStack-powered private cloud. However, the speakers reiterate the need for close alignment with the business, perseverance with the users who consume the private cloud/services, and a continued focus on delivering value to the business.

Collier returns to the stage to provide the Adobe team with “100K cores” stickers, and introduces the next speaker. The next speaker is Lew Tucker, VP and CTO for cloud computing from Cisco. Tucker’s focus, based on the opening slide, is on multi-cloud architectures. Tucker reviews a few statistics that show the trend is clearly toward the use of multiple clouds, and shows a slide outlining all of the various efforts within Cisco that address multi-cloud architectures. However, challenges clearly remain that have not yet been addressed, and this leads Tucker to review Cisco’s recent announcement with Google Cloud. Complexity around microservices architectures leads Tucker to talk about leveraging a service mesh—such as that provided by Istio—to help simplify some of the complexity around microservices-based architectures, and Tucker talks about Cisco’s joint efforts with Google, Lyft, and others in building out Istio and Istio-based service meshes.

As Tucker wraps up his presentation, Sell returns to the stage to introduce the next set of speakers. Carrying on a tradition of hearing about OpenStack in science and research, Sell introduces Dr. Steven Manos and Dr. Steve Quenette to talk about Nectar, a national “research cloud” that spans seven locations across Australia. Nectar supports over 27,000 cores and 12,000 registered users. These speakers, in turn, introduce Brendan Mackey to talk about his research on climate change and biodiversity (his research uses a “virtual laboratory” powered by the Nectar Cloud). After Mackey’s talk, the next speaker is introduced: Gary Eagan, who steps up to talk about brain research powered by OpenStack.

Collier and Sell return to the stage to wrap up the presentations, providing a “shout out” to the Science Working Group and their efforts in supporting OpenStack in science.

Categories: Scott Lowe

A Sublime Text Keymap for Bracketeer

Thu, 11/02/2017 - 22:00

I’ve made no secret of the fact that I’m a fan of Sublime Text (ST). I’ve evaluated other editors, like Atom, but still find that ST offers the right blend of performance, functionality, customizability, and cross-platform support. One nice thing about ST (other editors have this too) is the ability to extend it via packages. Bracketeer is one of many packages that can be used to customize ST’s behavior; in this post, I’d like to share a keymap I’m using with Bracketeer that I’ve found very helpful.

Bracketeer is a package that modifies ST’s default bracketing behavior. I first started using Bracketeer to help with writing Markdown documents, as it makes adding brackets (or parentheses) around existing text easier (it automatically advances the insertion point after the closing bracket). After using Bracketeer for a little while, I realized I could extend the keymap for Bracketeer to have it also help me with “wrapping” text in backticks and a few other characters. I did this by adding this line to the default keymap:

{ "keys": [ "`" ], "command": "bracketeer", "args": { "braces": "``", "pressed": "`" } }

With this line in the keymap, I could select some text, press the backtick character, and ST would surround the selected text in backticks and position the insertion point after the closing backtick. Handy! Without Bracketeer, doing this would have just replaced the selected text with a single backtick.

Unfortunately, there was a drawback: if I wanted just a single backtick, I’d still get a pair of them. In digging around for a solution/workaround, I found that ST’s keymap format allows you to specify a “context,” or condition, that determines whether a keymap entry is applied.

To put this into action, I added a context to the keymap entry above to make it look like this:

{ "keys": ["`"], "command": "bracketeer", "args": { "braces": "``" }, "context": [ { "key": "selection_empty", "operator": "equal", "operand": false, "match_all": true } ] }

Configured this way, Bracketeer only activates when text is selected; when text isn’t selected, Bracketeer doesn’t modify the default behavior. This allows me to use Bracketeer to easily surround selected text with just about any character: backticks, asterisks, pipe symbols, or underscores.
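
Extending the same pattern to other characters is straightforward; for example, an entry like the following (modeled on the backtick entry above, rather than copied verbatim from my keymap) wraps selected text in asterisks:

{ "keys": ["*"], "command": "bracketeer", "args": { "braces": "**" }, "context": [ { "key": "selection_empty", "operator": "equal", "operand": false, "match_all": true } ] }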

For a full copy of my modified Bracketeer keymap, see this gist.

Categories: Scott Lowe

Strange Error with the Azure CLI

Tue, 10/31/2017 - 14:00

Over the last week or so, I’ve been trying to spend more time with Microsoft Azure; specifically, around some of the interesting things that Azure is doing with containers and Kubernetes. Inspired by articles such as this one, I thought it would be a pretty straightforward process to use the Azure CLI to spin up a Kubernetes cluster and mess around a bit. Simple, right?

Alas, it turned out not to be so simple (if it had been simple, this blog post wouldn’t exist). The first problem I ran into was upgrading the Azure CLI from version 2.0.13 to version 2.0.20 (which is, to my understanding, the minimum version needed to do what I was trying to do). I’d installed the Azure CLI using this process, so pip install azure-cli --upgrade should have taken care of it. Unfortunately, on two out of three systems on which I attempted this, the Azure CLI failed to work after the upgrade. I was only able to fix the problem by completely removing the Azure CLI (which I’d installed into a virtualenv) and then re-installing it. First hurdle cleared!
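
For anyone hitting the same wall, the remove-and-reinstall dance is pretty painless when the CLI lives in its own virtualenv; here’s a rough sketch (the path is illustrative):

rm -rf ~/envs/azure-cli              # remove the broken environment entirely
virtualenv ~/envs/azure-cli          # recreate it
source ~/envs/azure-cli/bin/activate
pip install azure-cli                # fresh install of the current CLI
az --version                         # confirm the new version is in place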

With the Azure CLI upgraded, I proceeded to ensure that I was appropriately logged in (via az login), created a resource group in “west us 2” (via az group create), and then tried to launch an ACS cluster:

az acs create --orchestrator-type=kubernetes --resource-group=my-rg \
  --name=my-k8s-cluster --generate-ssh-keys

The Azure CLI created a service principal for me (as expected), but then errored out with a permissions-related error referencing a different service principal.

Using az ad sp list, I looked at the service principal in question; it was not the service principal created automatically by the az acs create command, but a built-in service principal named “AzureContainerService”. Thinking I’d done something wrong, I deleted everything I’d done so far—removed the resource group, deleted the automatically-created service principal—and tried again.

No joy; same error. Dave Strebel (@dave_strebel on Twitter) offered some assistance but couldn’t reproduce the error. OK, let’s try deleting the “AzureContainerService” principal. Nope, that just made things worse. Thinking that perhaps something had gone wrong with my subscription, I deactivated that subscription and created a new one.

That didn’t work either; same error (I had to correct a few subscription-related issues via az account first). At this point, Dave offered to take a deeper look for me (thanks Dave!), so I sent him some information. While I was waiting to hear back from Dave, I tried installing the Azure CLI on Windows 10, just to see if there was some sort of platform issue at play here. I ran into the same error there.

After a while, Dave contacted me and suggested I run a few commands. After some trial and error, the correct set of commands that ultimately enabled az acs create to work as expected were these:

az provider register -n Microsoft.Compute
az provider register -n Microsoft.Network
az provider register -n Microsoft.Storage

After running these commands and giving the registrations some time to fully complete (the az provider show command doesn’t really help much here, to be honest, despite the Azure CLI suggesting otherwise), creating a Kubernetes-powered ACS cluster using az acs create worked as expected. Hurray!
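
For completeness, polling the registration state is a one-liner along these lines (I’m recalling the field name from memory, so treat it as an assumption):

az provider show -n Microsoft.Compute --query registrationState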

So, what are the lessons I learned from this experience?

  • If you’re trying to use az acs create and getting a strange permissions error related to the “AzureContainerService” service principal, try the az provider register commands listed above. (I still don’t know what these commands actually do, or why they were required.)
  • If you installed the Azure CLI using pip, upgrading it via pip may not work as expected. I recommend using a virtualenv to make removing the Azure CLI and re-installing it a simple process. (This doesn’t apply to Windows, naturally, where the typical installation method is an MSI package.)
  • Dave Strebel is a great example of community advocacy.

On to more adventures!

Categories: Scott Lowe

Technology Short Take 89

Fri, 10/27/2017 - 13:00

Welcome to Technology Short Take 89! I have a collection of newer materials and some older materials this time around, but hopefully all of them are still useful. (I needed to do some housekeeping on my Instapaper account, which is where I bookmark stuff that frequently lands here.) Enjoy!

Networking
  • This is a slightly older post providing an overview of container networking, but still quite relevant. Julia has a very conversational style that works well when explaining new topics to readers, I think.
  • Russell Bryant has a post on Open Virtual Network (OVN), a project within the Open vSwitch (OVS) community. If you’re not familiar with OVN, this is a good post with which to start.
Servers/Hardware

Hmm…I didn’t find anything again this time around. Perhaps I should remove this section?

Security
  • This blog post discusses some of the new network security functionality available in vSphere Integrated Containers (VIC) version 1.2; specifically, the new container network firewall functionality.
  • The NIST and DHS have teamed up on some efforts to secure BGP; more information is available in this article.
  • When I was using Fedora, I needed some useful information on firewall-cmd, and found this article to be helpful.
  • Much wailing and gnashing of teeth occurred as a result of the discovery of the KRACK attack.
Cloud Computing/Cloud Management
  • Here’s a handy tutorial on using Docker for persisting state across AWS Spot Instances.
  • I like this article on using Couchbase on AWS from Kubernetes because it addresses an often-overlooked (in my opinion) aspect of containerized/microservices architectures: they still need to communicate to external services.
  • I wonder how many more Kubernetes provisioning tools will emerge before tool consolidation starts happening? Here’s another one.
  • The 1.8 release of Kubernetes has integration with the 1.0 beta version of containerD (see this post by Docker, or visit the GitHub page for the cri-containerd plugin). If you’re not familiar with containerD, you may find this post helpful.
  • Paul Johnston tackles some myths regarding vendor lock-in and serverless.
  • Mark Brookfield shares a bad experience he had with running NetBSD on Amazon Web Services. I can certainly see Mark’s perspective regarding some perceived failings of AWS; at the same time, I can also understand the need for AWS to limit their support of community-provided AMIs. (At their scale—millions of customers—I can see why they’d need to carefully limit how far they push the support boundary.) For what it’s worth, I’ve never tried NetBSD, but I have yet to run into any similar issues with any distribution of Linux I’ve tried.
Operating Systems/Applications

Storage
  • Tom Scanlan shares how to use VIC volumes as a way of helping address persistent storage challenges with containers.
Virtualization

Career/Soft Skills
  • Roman Dodin shared this post with me about using GitLab and CloudFlare to host a Hugo-powered blog. I do like the use of GitLab CI to help automate the build of the site; that’s pretty handy.
  • I can’t tell you just how much I agree with this statement from this post: “User groups should not be an avenue for sales.” Amen! If you’re a partner/vendor/reseller/whatever and you’re participating in user group meetings, don’t try to turn it into a sales presentation. Make it a conversation, an opportunity to build a rapport with customers and potential customers.

That’s all for now. Check back again in about 2 weeks for the next Technology Short Take. In the meantime, feel free to hit me up on Twitter if you have a link you think I should include in a future post. Thanks for reading!

Categories: Scott Lowe

Posts from the Past, October 2017

Thu, 10/26/2017 - 13:00

After over 12 years of writing here, I’ve accumulated a pretty fair amount of content. To help folks discover older content, I thought it might be fun to revisit what I’ve published in October in years past. Here are some “posts from the past,” all from October of previous years. Enjoy!

October 2005

Protecting Against OpenSSL SSLv2 Flaw

October 2006

I was spending a great deal of time with Active Directory back then:

Finding Recently Created Active Directory Accounts
Refined Solaris 10-AD Integration Instructions

October 2007

Storage was the name of the game a decade back:

Sanrad Configuration Basics
VM File-Level Recovery with NetApp Snapshots

October 2008

Quick Note on ESX and ESXi Storage Multipathing
Is Power the Key to Controlling the Cloud?

October 2009

Fibre Channel, FCoE, and InfiniBand, oh my!

New User’s Guide to Managing Cisco MDS Zones via CLI
I/O Virtualization and the Double-Edged Sword
Setting up FCoE on a Nexus 5000

October 2010

Shortening URLs via bit.ly from the CLI
Shortening URLs via bit.ly the Apple Way

October 2011

Content Creation and Mind Mapping

October 2012

In October 2012 I was neck-deep in learning all I could about Open vSwitch, and a pretty fair amount of the content I produced then is still applicable even today:

Some Insight into Open vSwitch Configuration
Link Aggregation and LACP with Open vSwitch

October 2013

In 2013 I began looking at configuration management tools, starting with Puppet.

Installing Open vSwitch on Ubuntu with Puppet
Managing SSH Authorized Keys with Puppet

October 2014

Multi-Machine Vagrant with YAML

October 2015

Adding an Interface to an OpenStack Instance After Creation

October 2016

Last year I was sharpening skills with various tools I’d learned, like Vagrant and Ansible:

A Triple-Provider Vagrant Environment
Managing AWS Infrastructure with Ansible

Categories: Scott Lowe