John Ewart

Kenshō, a PostgreSQL dashboard for ops folks

I’ve been thinking about building a nice PostgreSQL dashboard for some time now. Recently I ran across pgHero and I decided that, while pgHero looks awesome, installing Ruby and then having to deploy all the extra bits that go with it is a real drag. I’m a huge fan of the story around deployment of JVM applications; gripe all you want about the language but being able to deploy a fully-functional piece of software, including all of its dependencies, as a single JAR file to a host is a godsend. Deploying new versions of the app and all of its dependencies is as simple as rolling out a new JAR and then restarting the JVM. And the JVM is fast.

The other weekend I decided to take a stab at writing one using Dropwizard and D3.js, and the result is shaping up nicely as Kenshō, which is available under the Apache 2.0 license on GitHub. In its current form, Kenshō is designed to collect metrics every few seconds and record them using in-memory time-series data structures from another project of mine, Shuzai, which aims to provide pandas-like functionality on the JVM using Java. This means that Kenshō can store millions of data points with very little memory usage and can then time-box and re-scale them, consolidating points into coarser time boxes using techniques such as summation or mean as needed.
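To make the collection model concrete, here is a minimal, hypothetical sketch of the idea in plain Java and JDBC. The class name, method names, and connection details are mine, not Kenshō's, and the real implementation stores points in Shuzai rather than a map; the point is simply that a scheduled task samples a statistic every few seconds, and the accumulated points can later be consolidated (here, by taking the mean over a window).

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class MetricPoller {
    // timestamp (ms) -> database size in bytes
    private final ConcurrentNavigableMap<Long, Long> dbSize = new ConcurrentSkipListMap<>();

    public void start(String jdbcUrl) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try (Connection conn = DriverManager.getConnection(jdbcUrl);
                 Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(
                     "SELECT pg_database_size(current_database())")) {
                if (rs.next()) {
                    dbSize.put(System.currentTimeMillis(), rs.getLong(1));
                }
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }, 0, 5, TimeUnit.SECONDS); // sample every five seconds
    }

    // Consolidate raw points into a coarser time box by averaging a window.
    public double meanOver(long fromMillis, long toMillis) {
        return dbSize.subMap(fromMillis, toMillis).values().stream()
                .mapToLong(Long::longValue).average().orElse(Double.NaN);
    }
}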

Dashboard

Dashboard image

Functionality

Currently, Kenshō records data about the database you are connected to, including:

  • Index usage data
  • Slow query data
  • Query data
    • Execution count
    • Average query times
    • Aggregate query times
    • SQL query strings
  • Database-level information
    • Transaction count
    • Open connection count
    • Concurrent query count
    • Database size

With Kenshō you can view the size of your database over time; see which tables are using that space; learn which tables are taking advantage of indexes and which are not; identify queries with poor performance profiles; track the performance of a specific query over time; and see the overall health and activity of your database.
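For the curious, most of these figures can be read straight out of PostgreSQL's statistics views. Assuming a stock PostgreSQL install and the PostgreSQL JDBC driver on the classpath, queries along these lines (not necessarily the exact SQL Kenshō runs, and with placeholder connection details) produce the raw numbers behind the index-usage, connection, and size panels:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PgStatsProbe {
    public static void main(String[] args) throws Exception {
        try (Connection c = DriverManager.getConnection(
                 "jdbc:postgresql://localhost:5432/mydb", "postgres", "secret");
             Statement st = c.createStatement()) {

            // Tables that rely on sequential scans rather than their indexes.
            try (ResultSet rs = st.executeQuery(
                     "SELECT relname, seq_scan, idx_scan FROM pg_stat_user_tables ORDER BY seq_scan DESC")) {
                while (rs.next()) {
                    System.out.printf("%s: seq_scan=%d idx_scan=%d%n",
                        rs.getString("relname"), rs.getLong("seq_scan"), rs.getLong("idx_scan"));
                }
            }

            // Open connections to the current database.
            try (ResultSet rs = st.executeQuery(
                     "SELECT count(*) FROM pg_stat_activity WHERE datname = current_database()")) {
                rs.next();
                System.out.println("connections: " + rs.getLong(1));
            }

            // Total on-disk size of the current database, in bytes.
            try (ResultSet rs = st.executeQuery(
                     "SELECT pg_database_size(current_database())")) {
                rs.next();
                System.out.println("size: " + rs.getLong(1) + " bytes");
            }
        }
    }
}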

Future functionality

In the future I am planning to add:

  • An explain analyzer view for queries with some meaningful parsing
  • Index analysis and suggestions
  • A syslog sink so Kenshō can be used without direct access to the database (though this will limit some functionality)
  • More interesting time-series analyses
  • The ability to persist time-series data on disk
  • Options for storing time-series data in an off-box data store
  • Change tracking (e.g., recording the addition of indexes or other events and seeing how they affect performance)
  • A replication status, performance, and health dashboard
  • More dashboards
  • User authentication

Contributing

If you are interested in hacking on Kenshō, feel free to open a GitHub issue or send me an email; the more help the better!

Orchestration of Chef Resources with ZooKeeper

At the Chef community summit there were a number of discussions around provisioning complex systems using Chef. One of the interesting topics that came up was the issue of finer-grained orchestration when converging nodes. As a result, I decided to build chef-orchestrator-zk, an LWRP that implements basic distributed locking using ZooKeeper.
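The LWRP itself is Ruby, but the ZooKeeper recipe underneath is presumably the standard inter-process mutex: each node tries to take a lock on a well-known path before converging a guarded resource, so only one node runs that step at a time. A rough sketch of that primitive, shown here with Apache Curator in Java purely for illustration (the connection string and lock path are made up):

import java.util.concurrent.TimeUnit;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ConvergeLockExample {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
            "zookeeper.example.com:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // One lock path per guarded resource; whoever holds it converges first.
        InterProcessMutex lock = new InterProcessMutex(client, "/locks/schema-migration");
        if (lock.acquire(30, TimeUnit.SECONDS)) {
            try {
                // ... run the guarded step (the wrapped resource) ...
            } finally {
                lock.release();
            }
        } else {
            System.out.println("Could not acquire lock; skipping this run");
        }
        client.close();
    }
}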

Interfaces in Ruby

In Ruby, interfaces are not really part of the programming paradigm. By its very nature, Ruby is designed to be flexible and dynamically typed, and that gives developers a lot of power. Over the years I’ve seen a number of projects written in Ruby become internally inconsistent. What I mean by this is that developers decide their abstractions or structure aren’t working and, instead of reworking things from the bottom up, just start writing modifications to make things work. Making things work is incredibly valuable, but it can lead to piling up technical debt. Some of these problems feel as though they could be solved well by enforcing interfaces on developers who are following your programming model.

Take, for example, Fog.io, which I mentioned in a previous post. Fog has a lot of interesting things going on, but one of my primary frustrations is the lack of consistency between providers, which is exactly the problem that Fog should be solving. And Fog is not alone; I have seen this problem on both open-source and internal projects (even at large organizations with a significant amount of programming discipline and policy).

Just because Ruby does not formally support interfaces the way Java or C# do doesn’t mean it is impossible to maintain a set of interfaces in your code and ensure that developers adhere to them. One idea I have is to use a common set of spec tests that traverse the inheritance tree of the “interface” classes in your project and assert that their descendants follow the specification.

Take, for example, some Ruby code that looks like this:

An example interface in Ruby

class FunctionalInterface
  def do_thing
    raise "This is not implemented!"
  end
end

class Herp < FunctionalInterface
  def do_thing
    puts "herp"
  end
end

class Derp < FunctionalInterface
  def do_thing
    puts "derp"
  end
end

class Blorp < FunctionalInterface
  def do_some_other_thing
    puts "whoops"
  end
end

Here we have an interface class, FunctionalInterface, and three subclasses: Herp, Derp, and Blorp. The interface doesn’t actually implement any behavior; it only raises an exception if you call its methods. By doing this you guarantee that anyone implementing the interface needs to override those methods or else they will see an exception in their code. Now, if you develop some RSpec tests that exercise this, you can guarantee that anything implementing this interface at least implements the proper methods (notice that Blorp, which defines do_some_other_thing instead of do_thing, would fail such a check):

describe FunctionalInterface do
  before :all do
    # Collect every class that inherits from the interface
    @implementations =
      ObjectSpace.each_object(Class)
        .select { |klass| klass < FunctionalInterface }
  end

  it "should enforce that the interface is implemented" do
    @implementations.each do |klass|
      entity = klass.new
      expect { entity.do_thing }.not_to raise_error
    end
  end
end

This is purely a naive example, and fog has many more extensions, but the theory is the same. To keep things sane, enforce that any new extension implements the interface before it is accepted as part of the project, and that it passes the basic tests of taking the proper parameters and returning the proper data. If the interfaces are implemented correctly you can even go beyond unit testing to functional tests that actually talk to the upstream service providers, and they should all behave accordingly.

I do not argue that interfaces are always necessary; I have seen many examples of putting the cart before the horse where someone has designed interfaces for things that only have one implementation. However, there are plenty of times when I long for interfaces in Ruby akin to the ones in Java, C# or other languages.

Book: Chef Essentials

I’m excited to announce that Chef Essentials, my third book on Chef, is now available, both on Amazon and directly from the publisher, Packt Publishing. This book is an update and expansion of my first book about Chef, Instant Chef Starter. Chef Essentials is almost three times the length of the Starter and targets folks who want to get started with Chef and then move on to more advanced usage.

Chef Essentials contains more detailed coverage of the areas the first book touched on, as well as entirely new material. Examples of what Chef Essentials covers include:

  • Information on managing infrastructure and how to model it with Chef
  • A complete example of deploying a web application and dependent infrastructure
  • Testing your work with Test Kitchen and ChefSpec
  • Writing recipes to support multiple platforms
  • Chef integration with Vagrant
  • Advanced uses of data bags including encrypted data
  • Installation of the latest stable Chef server (11.x)

If you are new to Chef, I hope that Chef Essentials provides you with what you need to know in order to get up and running with Chef quickly and effectively!

Thoughts about Fog.io

Fog is billed as “the cloud services library.” This implies, at least to me, that the library provides a consistent interface for interacting with cloud services such as AWS, DigitalOcean, Rackspace Cloud, and so on. By a consistent interface I mean “the STL of cloud libraries”: something that lets me (with very few changes, if any) replace AWS with DigitalOcean with Joyent and keep my code as change-free as possible. I concede that this is not always possible, that building abstractions comes with the implicit cost of losing specificity, and that some technologies do not abstract well. Also note that, while this post focuses on fog.io as an example, it is not the only villain out there; Ruby itself does not lend itself well to enforceable structure in code.

What I expected

I would have expected Fog to provide a common facade on top of the provider-specific libraries. By this I mean that the key pair example below would work by calling the underlying AWS SDK or Joyent libraries, whichever applies, instead of re-inventing the wheel and implementing all of the AWS API calls itself. I would expect a main module named ‘fog’ that provides the interface, and then sub-modules such as fog-aws that require the proper underlying provider library. That being said, it seems as though some providers do operate in that fashion (fog-softlayer and fog-brightbox being among them).

What I found

What I got was a bunch of loosely-related libraries that provide their own implementations, each subtly different from the others (in such a way as to be genuinely confusing about how to implement the right logic). The libraries replace provider-specific terminology with their own terminology for states and various other messaging from the provider, which, by the way, is possibly worse than the weak facade, because it obscures the signals people expect from the provider’s documentation and user interfaces.

I’ve noticed this lack of consistency while working on things like chef-metal-fog, which rely on fog for their underlying API calls. Here we will look at fog’s logic for fetching the public key pairs from both DigitalOcean and Joyent. Note that the code I am putting forth was taken from a pull request and modified slightly for readability; as such I lay no claim to it being the most idiomatic example. (I would argue, however, that even if there is a better way, this shouldn’t even be possible.) Let’s take a look at some examples of where this happens, in this source file.

when 'DigitalOcean'
current_key_pair = compute.ssh_keys.select { |key| 
  key.name == new_resource.name }.first
      
if current_key_pair
  @current_fingerprint = current_key_pair ? 
    compute.ssh_keys.get(current_key_pair.id).ssh_pub_key : nil
else
  current_resource.action :delete
end

Here is the analogous example when using the Joyent cloud. Notice that it looks surprisingly different.

when 'Joyent'
current_key_pair = begin
  compute.keys.get(new_resource.name)
rescue Fog::Compute::Joyent::Errors::NotFound
  nil
end
if current_key_pair
  @current_id = current_key_pair.name
  @current_fingerprint = if current_key_pair.respond_to?(:fingerprint)
    current_key_pair.fingerprint
  elsif current_key_pair.respond_to?(:key)
    public_key, format = Cheffish::KeyFormatter.decode(current_key_pair.key)
    public_key.fingerprint
  else
    nil
  end
end

In order for me to want to use fog, it would need to implement an actual interface and strictly require that all of the adapters adhere to it. For example, this whole block should be replaceable with one simple statement like the following:


@current_fingerprint = begin
                        compute.keys.get(key_name).fingerprint
                      rescue Fog::Compute::Errors::NotFound 
                        nil
                      end
                      

Instead, we have not only two different implementations that achieve the same goal, but also interface objects (compute, in this case) that should expose the exact same set of methods yet offer two completely different ways of accessing the same data across providers. In the case of Joyent we access the public keys via compute.keys, whereas DigitalOcean uses compute.ssh_keys; and to top it all off, the objects returned by Joyent respond to methods like fingerprint and key, while the objects created by DigitalOcean respond to ssh_pub_key to get at the same bits of data.

Using provider-specific libraries

At this point you would be better off using two completely different libraries, one for each specific cloud platform, because you are not getting much benefit out of fog. In fact, I would argue that you are actually reducing code quality in this case. Instead of small, modular, easy-to-read implementations, you now have a huge mess of conditionals all over the place. Since only one branch is ever exercised across all of those conditional checks (i.e., when you are using Joyent as your provider, every one of them takes the Joyent path), you would be better off moving that logic into its own module, keeping the code easier to read and the mental model simpler; each module focuses on a single provider, which leads to shorter methods if nothing else.

Moving away from fog.io

For this reason, I have begun building a pure AWS driver for chef-metal instead of focusing on the fog driver beyond maintenance. The unneeded abstraction from fog has not only made interfacing with the various providers more complicated than it needs to be, it also means that we’re not always using the provided API clients such as the AWS SDK. Additionally, moving to the AWS SDK has allowed us to build AWS-specific primitives for metal, such as SQS, SNS, and so on, that either wouldn’t exist or would be much harder to map to when using fog. We also get the added benefit of using code that comes from the folks at AWS and is therefore likely to be up to date and better supported than the fog.io implementation.

Webinar: Automating your infrastructure with Chef

On August 6th, I gave a webinar for O’Reilly on using Chef to automate your infrastructure. Slides are available here if you would like to download them.

Hello, Chef!

As of yesterday, I have said goodbye to my friends and colleagues at Amazon and begun my work for Chef! I learned a lot from my time at Amazon and met some amazing and bright folks but I’m looking forward to my new role at Chef and being able to spend some more time writing open source software.

I’ve been active in the Chef world for a number of years now as both a user and an author; now I’ll get to work on making it more awesome as part of my day job. Initially I’ll be working on chef-metal, bringing our Docker support up to par with our other drivers, and codifying some tools I’ve been using for years into a profile manager for people who manage multiple Chef environments.

Capping Riak Memory Consumption

In a production environment, we noticed that when storing or reading data in Riak there would be periods where it acted as though it was being throttled by I/O wait. This didn’t seem to make sense, as each machine showed beam.smp ballooning to 100% memory usage while I/O wait was typically quite low (<1%). It would seem that setting the eleveldb cache to a fixed size per partition lets you limit the amount of memory being used. I’m not 100% sure that the documented default of 8MB per partition is accurate; it seems that if you don’t specify a value, the default is “as big as it can be.” In this case Riak was eating up all physical memory, which pushed other processes into swap, causing contention and some pretty slow interactions.

Mounting LVM partitions from Xen Server

After pulling some disks from a Xen Server host that had LVM volume groups on them, I needed to mount them in order to pull the data off. The trick here is that Xen Server exposes each LVM logical volume to its guests as a raw disk, so you need to probe the volume’s partition table and make the partitions available to the system the disks are now in.

Steps

  1. Scan for physical volumes with pvscan
  2. Scan for volume groups with vgscan
  3. Scan for logical volumes with lvscan
  4. List partitions in the correct logical volume with kpartx
  5. Make the partitions available to the system with kpartx
  6. Mount the partition

Using kpartx

List the partitions with kpartx

root@debian:~# kpartx -l /dev/VG_XenStorage-e5c11bff-0232-1f73-db15-14e42830fb1d/LV-6b056523-bb43-48f6-ac0e-f4bc69984355
VG_XenStorage--e5c11bff--0232--1f73--db15--14e42830fb1d-LV--6b056523--bb43--48f6--ac0e--f4bc69984355p1 : 0 6442448863 /dev/VG_XenStorage-e5c11bff-0232-1f73-db15-14e42830fb1d/LV-6b056523-bb43-48f6-ac0e-f4bc69984355 2048

Add the partitions to the device mapper with kpartx so that they can be mounted somewhere:

root@debian:~# kpartx -a /dev/VG_XenStorage-e5c11bff-0232-1f73-db15-14e42830fb1d/LV-6b056523-bb43-48f6-ac0e-f4bc69984355

Now mount the partition. Note that each partition shows up under /dev/mapper using the device-mapper name of the parent logical volume with a “pN” suffix appended, where N is the partition number.

In my case, ‘p1’ was appended:

mount /dev/mapper/VG_XenStorage--e5c11bff--0232--1f73--db15--14e42830fb1d-LV--6b056523--bb43--48f6--ac0e--f4bc69984355p1 /data

Java's date formatter is not thread safe

As it turns out, Java’s date formatter is not thread safe; it uses internal variables to store the various pieces of the date while formatting. My solution was to replace the single instance variable that was being shared with a date formatter factory that generates an appropriate formatter on demand. Note that this is only one of many solutions, and not necessarily the most efficient one.
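For reference, the culprit here is almost certainly java.text.SimpleDateFormat (DateFormat in general is documented as not thread safe), which keeps mutable calendar state inside the instance. Below is a minimal sketch of the two usual fixes, assuming nothing about the original code beyond what is described above: hand each caller a fresh formatter from a factory, or keep one formatter per thread with a ThreadLocal. On Java 8 and later, java.time’s DateTimeFormatter is immutable and thread safe, which sidesteps the problem entirely.

import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;

public class DateFormatters {
    // Fix 1: a factory that returns a new formatter per call, so no state is shared.
    public static DateFormat newIsoFormatter() {
        return new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
    }

    // Fix 2: one formatter per thread, reused across calls on that thread only.
    private static final ThreadLocal<DateFormat> PER_THREAD =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'"));

    public static String format(Date date) {
        return PER_THREAD.get().format(date);
    }
}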