I recently gave a presentations at the OpenStack Summit in Boston with the same title. You can find a video of the talk here: https://www.openstack.org/videos/boston-2017/dirty-clouds-done-dirt-cheap. This blog post will cover the same project, but it will go into a lot more detail, which I couldn’t cover during the presentation.
Just a heads up this post is long! I try to cover every step of the project with all the details I could remember. It probably would have made sense to split things up into multiple posts, but I wrote it in a single sitting and doing that felt weird. If you’re just looking for a quicker overview, I recommend watching the video of my talk instead.
The Project Scope
When I was in college I had a part time job as a sysadmin at a HPC research lab in the aerospace engineering department. In that role I was responsible for all aspects of the IT in the lab, from the workstations and servers to the HPC clusters. In that role I often had to deploy new software with no prior knowledge about it. I managed to muddle through most of the time by reading the official docs and frantically google searching when I encountered issues.
Since I started working on OpenStack I often think back to my work in college and wonder if I had been tasked with deploying an OpenStack cloud back then would I have been able to? As a naive college student who had no knowledge of OpenStack would I have been successful in trying to deploy OpenStack by myself? Since I had no knowledge of configuration management (like puppet or chef) back then I would have gone about it by installing everything by hand. Basically the open question from that idea is how hard is it actually to install OpenStack by hand using the documentation and google searches?
Aside from the interesting thought exercise I also have wanted a small cloud at home for a couple of reasons. I maintain a number of servers at home that run a bunch of critical infrastructure. For some time I’ve wanted to virtualize my home infrastructure mainly just for the increased flexibility and potential reliability improvements. Running things off a residential ISP and power isn’t the best way to run a server with a decent uptime. Besides virtualizing some of my servers it would be nice to have the extra resources for my upstream OpenStack development, I often do not have the resources available to me for running devstack or integration tests locally and have to rely on upstream testing.
So after the Ocata release I decided to combine these 2 ideas and build myself a small cloud at home. I would do it by hand (ie no automation or config management) to test out how hard it would be. I set myself a strict budget of $1500 USD (the rough cost of my first desktop computer in middle school, an IBM Netvista A30p that I bought with my Bar Mitzvah money) to acquire hardware. This was mostly just a fun project for me so I didn’t want to spend an obscene amount of money. $1500 USD is still a lot of money, but it seemed like a reasonable amount for the project.
However, I decided to take things a step further than I originally planned and build the cloud using the release tarballs from http://tarballs.openstack.org/. My reasoning behind this was to test out how hard it would be to take the raw code we release as a community and turn that into a working cloud. It basically invalidated the project as a test for my thought exercise of deploying the cloud if I was back in college (since I definitely would have just used my Linux distro’s packages back then) but it made the exercise more relevant for me personally as an upstream OpenStack developer. It would give me insight as to where what we’re there are gaps in our released code and how we could start to fix them.
Continue reading Dirty Clouds Done Cheap
A few months ago at the Liberty QA code sprint in Ft. Collins, CO we started work on the OpenStack-Health dashboard. I also recently announced the dashboard to the openstack-dev ML to try and raise it to the attention of the broader community and to try and get more users and feedback on how to improve things. I figured it’d be good to write up a more detailed post on the basics of how the dashboard is constructed, it’s current capabilities and limitations, and where we’d like to see it move in the future. Especially given that the number of contributors to the project is still quite small. For the project to really grow and be useful for everyone in the community we need more people helping out on it.
Continue reading Exploring the OpenStack-Health dashboard
A little over a year ago I started writing subunit2sql, which is a project to collect test results into a SQL DB. The theory behind the project is that you can extract a great deal of information about what’s under test from the results of tests when you look at the trend data from it over a period of time. Over the past year the project has grown and matured quite a bit so I figured this was a good point to share what you can do with the project today and where I’d like to head with it over the next year. Continue reading Using subunit2sql with the gate
Last week we had a 3 day code sprint for the QA program in NYC: https://wiki.openstack.org/wiki/QA/CodeSprintKiloNYC HP hosted the event at the office in Chelsea. Overall it was a very productive week were we accomplished a great deal. The goal of the sprint was to make a good push on a list of priority items which had seemed to be a bit stagnant and get some code landed to try and push forward on them. It was also a valuable opportunity for a bunch of us to get together and get to know the people we work with daily basis a bit better. Something which is often hard to do over IRC, gerrit, or the ML. Continue reading OpenStack QA Code Sprint in NYC
A few months ago I made the post about debugging a gate failure. It has been linked around and copied to quite a few places and seems to be a very popular post. (definitely the most popular so far on this blog) I figured since the bug I opened from that was closed as invalid a while ago that I should write an update about the conclusion to the triage efforts for the OOM failures on neutron jobs. It turns out that my suppositions in the earlier post were only partially correct. The cause of the failures was running out of memory but what was leading to the OOM failures wasn’t just limited to neutron. It was just that the neutron jobs ran with more services which used more memory which made failures there more common.
Continue reading Gate Bug Triage Conclusion
Recently I was helping someone debug a gate failure they had hit on one of their patches. After going through the logs with them and finding the cause of the failures, I was asked to go through how I debug gate failures. To help people understand how to get from a failure to a fix.
I figured I would go through what I did and why on that particular failure and use it as an example to explain my initial debug process. I do want to preface this by saying that I wasted a great deal of time debugging this failure, because I missed certain details at first, but I’m leaving all of those steps in here on purpose.
The log url for this failure is:
Continue reading Triaging and classifying a gate failure
Based on some of the comments that were posted on the recent “Which Program for Rally” ML thread I feel that there’s been some continued confusion around exactly how all the projects work together in the QA program. So after discussing it with a wise council of my elders, I decided to start a blog so that I had a place to post more details and I could give a high level overview and clarify how everything works. I’m not really sure how much I’ll be using this blog in the future, as having one is something I’ve resisted for quite some time. But, I felt that making this post warranted me giving in to peer pressure.
Today’s the QA program :
So today in the QA program we have 3 projects, here is a high level over:
- Tempest: The OpenStack Integrated test suite, it’s concerned with just with having tests and running them
- devstack: A documented shell script to build complete OpenStack development environments.
- grenade: Upgrade testing using 2 versions of devstack to test an offline upgrade. Tempest can optionally be used after each version is deployed to verify the cloud.
Each of these projects is independent and is useful by itself. They have defined scope (which admittedly gets blurred and constantly evolves) and when used together along with external tools they can be used in different pipelines for certain goals.
Continue reading QA Program: From Juno into the Future