- I work at Chef
- I do a lot of programming in Ruby, Python, Java, Scala, C, C++ and C#
- I spend most of my time developing things behind the scenes and like to make as much of that available as possible
- I like to write books
- I’ve been working on contributing to the Gearman project
- I have taught courses at the University of California, Merced; Columbia Community College in beautiful Columbia, CA; and California State University, Stanislaus
As of yesterday, I have said goodbye to my friends and colleagues at Amazon and begun my work for Chef! I learned a lot from my time at Amazon and met some amazing and bright folks but I'm looking forward to my new role at Chef and being able to spend some more time writing open source software.
I've been active in the Chef world for a number of years now as both a user and an author; now I'll get to work on making it more awesome as part of my day job. Initially I'll be working on chef-metal, bringing our Docker support up to par with our other drivers, and I'm working on codifying some tools I've been using for years into a profile manager for people who manage multiple Chef environments.
After pulling some disks that had LVM volume groups on them from a Xen Server, I needed to mount them in order to pull the data off. The trick here is that Xen Server exposes the LVM logical volumes as raw disks to the guest, so you need to probe the disk label and make the partitions available to the system the disks are now in.
- Scan for physical volumes with pvscan
- Scan for volume groups with vgscan
- Scan for logical volumes with lvscan
- List partitions in the correct logical volume with kpartx
- Make the partitions available to the system with kpartx
- Mount the partition
List the partitions with kpartx
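A sketch of that step; `/dev/VG_guest/lv_guest_disk` is a placeholder for whatever logical volume `lvscan` actually reported:

```shell
# Show the partition mappings kpartx would create, without creating them
kpartx -l /dev/VG_guest/lv_guest_disk
```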
Add the partitions to the device mapper with kpartx so that they can be mounted somewhere:
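Roughly like so, where `/dev/VG_guest/lv_guest_disk` again stands in for the logical volume `lvscan` found:

```shell
# -a adds the partition mappings under /dev/mapper without mounting anything
kpartx -a /dev/VG_guest/lv_guest_disk
```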
Now mount the partition. Note that in this case, each partition has the same path as the parent LVM logical volume but has a “pXXX” appended to the end, where XXX is the partition number.
In my case, ‘p1’ was appended:
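Assuming a logical volume named lv_guest_disk in a volume group named VG_guest (placeholder names), that looks like:

```shell
# device-mapper exposes the first partition as the LV's mapper name plus "p1"
mount /dev/mapper/VG_guest-lv_guest_diskp1 /mnt/recovered
```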
In a production environment, we noticed that when storing or reading data in Riak, there were periods where it acted as though it were being throttled by I/O wait. That didn't seem to make sense: beam.smp on each machine ballooned to 100% memory usage while I/O wait stayed quite low (typically under 1%). It turns out that setting the eleveldb cache to a fixed size per partition lets you limit the amount of memory being used. I'm not 100% sure the documented default of 8MB per partition is accurate; it seems that if you don't specify a value, the default is "as big as it can be". In our case Riak was eating up all physical memory, which pushed other processes into swap, causing contention and some pretty slow interactions.
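For illustration, a fixed per-partition cache goes in the eleveldb section of app.config; the path and the 8MB figure below are just the commonly documented form, not a recommendation:

```erlang
%% /etc/riak/app.config -- excerpt; cache_size is in bytes, per partition
{eleveldb, [
    {data_root, "/var/lib/riak/leveldb"},
    {cache_size, 8388608}  %% 8MB
]},
```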
As it turns out, Java's date formatter is not thread-safe: it uses internal variables to store the various bits of the date being formatted. My solution was to replace the single instance variable that was being shared with a date formatter factory that generates an appropriate formatter on demand. Note that this is only one of many solutions, and not necessarily the most efficient one.
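A minimal sketch of the factory approach, assuming a SimpleDateFormat-based formatter; the class name and pattern here are made up for illustration:

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Instead of sharing one SimpleDateFormat (which keeps mutable internal
// state while formatting), hand every caller a fresh instance.
public class DateFormatterFactory {
    public static DateFormat newIsoFormatter() {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt;
    }

    public static void main(String[] args) {
        // Each call returns a new formatter, so concurrent threads never share state.
        System.out.println(newIsoFormatter().format(new Date(0L)));
    }
}
```

If per-call allocation matters, a ThreadLocal<DateFormat> holding one formatter per thread is a common variation on the same idea.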
Recently I needed to configure a Rackspace Cloud load balancer to support WebSockets. Initially I tried TCP (which seemed like the logical choice), but that resulted in dropped connections. Even though I didn't expect it to work, I tried HTTP, since WebSockets is effectively HTTP with a connection upgrade, but the conversation would stop after the Connection: Upgrade header was sent. Hopefully this will be of some use to someone else as well.
After some digging, it turns out that the trick is to use the TCP_CLIENT_FIRST protocol, which expects the client to be the first to send packets to the server (for example, HTTP GET requests). The documentation on these options is actually stored with the Rackspace API developer docs.
The downside is that you can't use any of the Layer 7 monitoring/verification (e.g. checking HTTP status codes), but it works just fine for mapping requests. I combined this with Nginx's WebSocket proxy support and everything works smoothly.
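For reference, the Nginx side of the WebSocket proxying looks roughly like this; the location path and upstream name are placeholders:

```nginx
location /socket/ {
    proxy_pass http://app_backend;
    # WebSockets require HTTP/1.1 and the Upgrade/Connection hop-by-hop headers
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}
```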
If you want to use Sphinx to build PDF documentation using a TeX toolchain built from MacPorts, you will need to install the following ports (note that this may not be a minimal list, but it is a functional one):
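As a rough sketch, the install amounts to something like the following; the port names are my guess at the usual TeX Live set in MacPorts and may not match the original list exactly:

```shell
# texlive core plus the -recommended and -extra package collections
sudo port install texlive texlive-latex-recommended texlive-latex-extra \
    texlive-fonts-recommended texlive-fonts-extra
```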
I’m not sure if the -extra packages are needed, but they don’t take up much space so I went ahead and installed them.
I learned that the OSGEO Python module has a nice way to convert between WKT representations and Proj4. I have some LiDAR data from the [CA DWR][dwr] of the Central Valley of California and it’s been recorded using a non-standard projection (essentially UTM10 in feet instead of meters). I had created a proj4 string that I’ve been using with Python but needed it in WKT format to use with geotools in a Java program. I found some examples [here][proj4conversionsource] and have reproduced them here for convenience.
[dwr]: http://wwwdwr.water.ca.gov
[proj4conversionsource]: http://spatialnotes.blogspot.com/2010/11/converting-wkt-projection-info-to-proj4.html
I needed to transform some spatial data into a custom CRS (a previously mentioned reference system from the CA DWR) in some Java code I’m writing to compute elevation profiles of river channels. I only had the CRS in proj4 format which, as far as I’m aware, is not supported directly by GeoTools. With a little help from some Python code, I was able to convert my proj4 definition, which looks like this:
into its corresponding WKT format:
Given this WKT format, you can create a custom CRS using the GeoTools CRSFactory class. After some hunting around, I discovered that you can use it like such:
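A sketch of that usage, assuming the converted definition is in a String named wkt; the factory lookup is the GeoTools referencing API, but treat the surrounding wiring as illustrative:

```java
import org.geotools.referencing.ReferencingFactoryFinder;
import org.opengis.referencing.FactoryException;
import org.opengis.referencing.crs.CRSFactory;
import org.opengis.referencing.crs.CoordinateReferenceSystem;

public class CustomCrsExample {
    // Build a CRS directly from WKT rather than registering a custom EPSG code
    static CoordinateReferenceSystem fromWkt(String wkt) throws FactoryException {
        CRSFactory factory = ReferencingFactoryFinder.getCRSFactory(null);
        return factory.createFromWKT(wkt);
    }
}
```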
This creates a custom coordinate reference system without having to replace the EPSG database with a properties file that contains all the default EPSG codes plus your own (which seemed like too much work for this use case). You can then use it anywhere you'd use any other CRS object.
Grabbing elevation data from the USGS webservice is pretty straightforward – make an HTTP request to http://gisdata.usgs.gov/xmlwebservices2/elevation_service.asmx/getElevation with the required parameters (X and Y coordinates, elevation units and the data source), and you get back some XML with the results.
In my case, my area of interest is in the US, so the source layer is NED.CONUS_NED, the National Elevation Dataset contiguous U.S. 1 arc-second elevation data. X and Y are provided as lon/lat, and I want the data back in feet. Another thing to note is the parameter “Elevation_Only”, which is required when querying for elevation data.
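As a sketch, here is how the request URL comes together; the parameter names follow the service's getElevation form as best I recall it, so double-check them against the USGS docs:

```python
from urllib.parse import urlencode

BASE = "http://gisdata.usgs.gov/xmlwebservices2/elevation_service.asmx/getElevation"

def elevation_url(lon, lat):
    """Build a getElevation request URL for a lon/lat point."""
    params = {
        "X_Value": lon,            # longitude
        "Y_Value": lat,            # latitude
        "Elevation_Units": "feet",
        "Source_Layer": "NED.CONUS_NED",
        "Elevation_Only": "true",  # required when asking for elevation data
    }
    return BASE + "?" + urlencode(params)

print(elevation_url(-120.5, 37.5))
```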
Today I had to move a Chef server from an Ubuntu 12.04 machine to a Debian 6 box. The simplest path, which worked for me, was to:
- Set up Chef on the new machine
- Shut down the services on the old host
- Compress and move the contents of /etc/chef and /var/lib/chef on the old server
- Shut down the services on the target system
- Uncompress the tarball on the new server
- Export the contents of the chef CouchDB database on the old system:
    - Use the CouchDB Python package (installable via pip / easy_install as `couchdb`)
    - Dump the data with `couchdb-dump http://127.0.0.1:5984/chef > /tmp/chef.json`
- Copy the JSON file to the new host and import it by:
    - Creating the chef database with `curl -X PUT http://127.0.0.1:5984/chef`
    - Importing the JSON via `couchdb-load --input chef.json http://localhost:5984/chef`
- Restart the services on the new host
- Tell Chef to rebuild its indices (otherwise Solr won't know about the existing data in CouchDB) via `knife index rebuild`
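Pulling the CouchDB portion of that together, the transfer looks roughly like this, using the same hosts and paths as above:

```shell
# On the old server: grab the Python couchdb tools and dump the chef database
pip install couchdb
couchdb-dump http://127.0.0.1:5984/chef > /tmp/chef.json

# On the new server (after copying chef.json over): create the database
# and load the dump into it
curl -X PUT http://127.0.0.1:5984/chef
couchdb-load --input chef.json http://localhost:5984/chef

# Rebuild the search indices so Solr sees the migrated data
knife index rebuild
```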