John Ewart

May 10, 2013 — Mounting LVM partitions from Xen Server on another machine

After pulling some disks from a Xen Server that had LVM volume groups on them, I needed to mount them to pull the data off. The trick is that Xen Server exposes LVM logical volumes as raw disks to the guest, so you need to probe the disk label and make the partitions available on the system the disks are now in.


  1. Scan for physical volumes with pvscan
  2. Scan for volume groups with vgscan
  3. Scan for logical volumes with lvscan
  4. List partitions in the correct logical volume with kpartx
  5. Make the partitions available to the system with kpartx
  6. Mount the partition
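The scanning steps (1–3) look something like this; this is a sketch, assuming the LVM2 tools are installed. The volume group name below is the one from my disks, and activating the group with vgchange may be necessary before its logical volumes appear under /dev:

```shell
# Scan for LVM physical volumes, volume groups, and logical volumes
pvscan
vgscan
lvscan

# If the volume group shows up as inactive, activate it so that its
# logical volumes become visible under /dev
vgchange -ay VG_XenStorage-e5c11bff-0232-1f73-db15-14e42830fb1d
```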

Using kpartx

First, list the partitions with kpartx:

root@debian:~# kpartx -l /dev/VG_XenStorage-e5c11bff-0232-1f73-db15-14e42830fb1d/LV-6b056523-bb43-48f6-ac0e-f4bc69984355
VG_XenStorage--e5c11bff--0232--1f73--db15--14e42830fb1d-LV--6b056523--bb43--48f6--ac0e--f4bc69984355p1 : 0 6442448863 /dev/VG_XenStorage-e5c11bff-0232-1f73-db15-14e42830fb1d/LV-6b056523-bb43-48f6-ac0e-f4bc69984355 2048
List the partitions with kpartx

Add the partitions to the device mapper with kpartx so that they can be mounted somewhere:

root@debian:~# kpartx -a /dev/VG_XenStorage-e5c11bff-0232-1f73-db15-14e42830fb1d/LV-6b056523-bb43-48f6-ac0e-f4bc69984355
Attach partitions with kpartx

Now mount the partition. Note that in this case, each partition has the same path as the parent LVM logical volume but has a “pXXX” appended to the end, where XXX is the partition number.

In my case, ‘p1’ was appended:

mount /dev/mapper/VG_XenStorage--e5c11bff--0232--1f73--db15--14e42830fb1d-LV--6b056523--bb43--48f6--ac0e--f4bc69984355p1 /data
Mount newly attached partition

May 10, 2013 — Capping Riak Memory Consumption

In a production environment, we noticed that when storing or reading data in Riak, there were periods where it behaved as though it was being throttled by I/O wait. This didn’t make sense, as each machine showed beam.smp ballooning to 100% memory usage while I/O wait was typically quite low (<1%). It turns out that setting the eleveldb cache to a fixed size per partition lets you limit the amount of memory being used. I’m not 100% sure that the documented default of 8MB per partition is accurate; it seems that if you don’t specify a value, the default is “as big as it can be”. In this case Riak was eating up all physical memory, which then pushed other processes into swap, causing contention and some pretty slow interactions.
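As a sketch of the kind of change involved, the cache can be pinned in the eleveldb section of Riak’s app.config. The paths and the 64MB figure here are illustrative assumptions, not recommendations from the original post:

```erlang
%% app.config -- cap the per-partition eleveldb block cache
%% (data_root path and the 64MB cache_size are example values)
{eleveldb, [
    {data_root, "/var/lib/riak/leveldb"},
    {cache_size, 67108864}  %% bytes per vnode/partition
]}
```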

Apr 23, 2013 — Java's date formatter is not thread safe

As it turns out, Java’s date formatter is not thread safe: it uses internal variables to store the various bits of the date while formatting. My solution was to replace the single instance variable that was being shared with a date formatter factory that generates an appropriate formatter on demand, so each caller gets its own instance. Note that this is only one of many solutions, and not necessarily the most efficient one.

Apr 22, 2013 — Rackspace Cloud load balancer configuration for WebSockets

Recently I needed to configure a Rackspace Cloud load balancer to support WebSockets. Initially I tried the TCP protocol (which seemed like the logical choice), but that resulted in dropped connections. Even though I didn’t expect it to work, I also tried HTTP, since WebSockets is effectively HTTP with a connection upgrade, but the conversation would stop after the upgrade headers were sent. Hopefully this will be of some use to someone else as well.

After some digging, it turns out that the trick is to use the TCP_CLIENT_FIRST protocol, which expects the client to be the first one to pass packets to the server (as with an HTTP GET request). The documentation on these options lives in the Rackspace API developer docs.

The downside to this is that you can’t use any of the Layer 7 monitoring / verification (e.g. checking HTTP status codes), but it works just fine for mapping requests. I combined this with Nginx’s WebSocket proxy support and everything works smoothly.
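On the Nginx side, the WebSocket proxy support mentioned above boils down to speaking HTTP/1.1 to the backend and forwarding the upgrade headers. A minimal sketch, where the location path and upstream name are placeholders:

```nginx
location /socket/ {
    proxy_pass http://backend;              # upstream name is a placeholder
    proxy_http_version 1.1;                 # upgrades require HTTP/1.1
    proxy_set_header Upgrade $http_upgrade; # pass the Upgrade header through
    proxy_set_header Connection "upgrade";
}
```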

Mar 04, 2013 — Sphinx, MacTeX and Macports

If you want to use Sphinx to build PDF documentation using MacTeX built from Macports, you will need to install the following ports (note that this may not be a minimal list, but it is a functional list):

  • texlive-latex
  • texlive-latex-recommended
  • texlive-fonts-recommended
  • texlive-fonts-extra
  • texlive-latex-extra

I’m not sure if the -extra packages are needed, but they don’t take up much space so I went ahead and installed them.
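Installing the ports listed above is a single command (assuming MacPorts itself is already installed):

```shell
sudo port install texlive-latex texlive-latex-recommended \
    texlive-fonts-recommended texlive-fonts-extra texlive-latex-extra
```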

Jan 20, 2013 — Creating a custom CRS with GeoTools

I needed to transform some spatial data into a custom CRS (a previously mentioned reference system from the CA DWR) in some Java code I’m writing to compute elevation profiles of river channels. I only had the CRS in proj4 format which, as far as I’m aware, is not supported directly by GeoTools. With a little help from some Python code, I was able to convert my proj4 definition, which looks like this:

+proj=utm +zone=10 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=ft +no_defs
Custom Proj4 definition

into its corresponding WKT format:

PROJCS["UTM Zone 10, Northern Hemisphere",
  GEOGCS["GRS 1980(IUGG, 1980)",
  ...
  UNIT["Foot (International)",0.3048]
]
Custom WKT (abridged)

Given this WKT format, you can create a custom CRS using the GeoTools CRSFactory class. After some hunting around, I discovered that you can use it like so:

String customWKT = "PROJCS[ \"UTM Zone 10, Northern Hemisphere\",\n" +
                    "  GEOGCS[\"GRS 1980(IUGG, 1980)\",\n" +
                    "    DATUM[\"unknown\"," +
                    "       SPHEROID[\"GRS80\",6378137,298.257222101]," +
                    "       TOWGS84[0,0,0,0,0,0,0]" +
                    "    ],\n" +
                    "    PRIMEM[\"Greenwich\",0],\n" +
                    "    UNIT[\"degree\",0.0174532925199433]\n" +
                    "  ],\n" +
                    "  PROJECTION[\"Transverse_Mercator\"],\n" +
                    "  PARAMETER[\"latitude_of_origin\",0],\n" +
                    "  PARAMETER[\"central_meridian\",-123],\n" +
                    "  PARAMETER[\"scale_factor\",0.9996],\n" +
                    "  PARAMETER[\"false_easting\",1640419.947506562],\n" +
                    "  PARAMETER[\"false_northing\",0],\n" +
                    "  UNIT[\"Foot (International)\",0.3048]\n" +
                    "]";

try {
  CRSFactory factory = ReferencingFactoryFinder.getCRSFactory(null);
  CoordinateReferenceSystem customCRS = factory.createFromWKT(customWKT);
} catch (FactoryException e) {
  // Handle a malformed WKT definition here
}
GeoTools CRSFactory Example

This creates a custom coordinate reference system without having to replace the EPSG database with a properties file containing all the default EPSG codes plus your own (which seemed like too much work for this use case). You can then use it anywhere you’d use any other CRS object.

Jan 20, 2013 — Converting between Proj4 and WKT with Python

I learned that the OSGEO Python module has a nice way to convert between WKT representations and Proj4. I have some LiDAR data from the CA DWR of the Central Valley of California, and it’s been recorded using a non-standard projection (essentially UTM10 in feet instead of meters). I had created a proj4 string that I’ve been using with Python, but needed it in WKT format to use with GeoTools in a Java program. I found some examples online and have reproduced them here for convenience.

#!/usr/bin/env python

import sys
import osgeo.osr

if len(sys.argv) != 2:
    print 'Usage: [WKT Projection Text]'
    sys.exit(1)

srs = osgeo.osr.SpatialReference()
srs.ImportFromWkt(sys.argv[1])
print srs.ExportToProj4()
Convert WKT to Proj4

#!/usr/bin/env python

import sys
import osgeo.osr

if len(sys.argv) != 2:
    print 'Usage: [Proj4 Projection Text]'
    sys.exit(1)

srs = osgeo.osr.SpatialReference()
srs.ImportFromProj4(sys.argv[1])
print srs.ExportToWkt()
Convert Proj4 to WKT

Dec 30, 2012 — USGS Elevation Data with Java

Grabbing elevation data from the USGS web service is pretty straightforward: make an HTTP request with the required parameters (X and Y coordinates, elevation units, and the data source), and you get back some XML with the results.

In my case, my area of interest is in the US, so the source layer is NED.CONUS_NED, the National Elevation Dataset contiguous U.S. 1 arc-second elevation data. X and Y are provided in lon/lat format and, in my case, I want the data back in feet. Another thing to note is the presence of the “Elevation_Only” parameter, which is required when querying for elevation data.

String dataUrl = "" + // base URL of the elevation service (elided in the original)
                 "/getElevation?X_Value=" + x +
                 "&Y_Value=" + y +
                 "&Elevation_Only=TRUE" +
                 "&Elevation_Units=FEET";

DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document doc = docBuilder.parse(dataUrl);

// The results are contained in a single <double> node
NodeList listOfDoubles = doc.getElementsByTagName("double");

if (listOfDoubles.getLength() > 0) {
    Node elevationNode = listOfDoubles.item(0);
    Element elevationElement = (Element) elevationNode;
    Double elevation = Double.parseDouble(elevationElement.getFirstChild().getNodeValue());
    return elevation;
}
Grabbing USGS Elevation Data
<?xml version="1.0" encoding="utf-8"?>
XML response

Dec 21, 2012 — Migrating your Chef Server

Today I had to move a Chef server from an Ubuntu 12.04 machine to a Debian 6 box. The simplest path, which worked for me, was to:

  • Set up Chef on the new machine
  • Shut down the services on the old host
  • Compress the contents of /etc/chef and /var/lib/chef on the old server and move the tarball to the new one
  • Shut down the services on the target system
  • Uncompress the tarball on the new server
  • Export the contents of the chef CouchDB database on the old system
    • Use the CouchDB Python package (installable via pip / easy_install as 'couchdb')
    • Dump the data with couchdb-dump http://localhost:5984/chef > /tmp/chef.json
  • Copy the JSON file to the new host and import it by:
    • Creating the chef database with curl -X PUT http://localhost:5984/chef
    • Importing the JSON via couchdb-load --input chef.json http://localhost:5984/chef
  • Restart the services on the new host
  • Tell Chef to rebuild its indices (otherwise Solr won't know about the existing data in CouchDB) via knife index rebuild
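The CouchDB export/import portion of the steps above can be sketched as follows. This assumes the chef database lives at the conventional http://localhost:5984/chef on both hosts:

```shell
# On the old server: dump the chef CouchDB database
easy_install couchdb                      # provides couchdb-dump / couchdb-load
couchdb-dump http://localhost:5984/chef > /tmp/chef.json

# Copy /tmp/chef.json to the new server, then on the new server:
curl -X PUT http://localhost:5984/chef    # create the empty database
couchdb-load --input chef.json http://localhost:5984/chef

# Finally, rebuild the search indices
knife index rebuild
```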

Nov 20, 2012 — DAO Testing with Hibernate

I’m currently building some web services using Dropwizard (Jetty, Jersey, Jackson, Hibernate all bundled up), and needed to test the DAO. Dropwizard has some convenient interfaces to load configuration files, bring up Hibernate sessions, and so on when the service boots up; however, this does not translate well to JUnit tests (there’s a lot of plumbing involved in making a Service that doesn’t carry over to tests). Fortunately, since it’s all just sugar coating, you can just as easily set up your own sessions and transactions. I found that creating a parent class for my DAO tests that does all the setup is a convenient way of handling the plumbing.

public class DAOTests {
    SessionFactory sessionFactory;

    public DAOTests() {
        AnnotationConfiguration config = new AnnotationConfiguration();
        config.setProperty("hibernate.current_session_context_class", "thread");
        config.setProperty("hibernate.show_sql", "false");
        // ... add annotated classes and connection properties here ...
        sessionFactory = config.buildSessionFactory();
    }

    public Session getSession() {
        Session session;

        try {
            session = sessionFactory.getCurrentSession();
        } catch (SessionException se) {
            session = sessionFactory.openSession();
        }

        return session;
    }
}
Setting up Hibernate

Then each test suite for the DAO in question can just deal with the things it cares about, and any Hibernate-related mojo sits in the superclass. An example DAO test could look like:

public class TodoDAOTest extends DAOTests {
    TodoDAO todoDAO;

    @Before
    public void initialize() {
        todoDAO = new TodoDAO(sessionFactory);

        // Delete all the old junk...
        Query q = getSession().createQuery("delete from Todo");
        q.executeUpdate();
    }

    @Test
    public void filtersTodos() throws Exception {
        for (int i = 0; i < 10; i++) {
            Todo t = new Todo();
            // ... populate and persist each Todo (elided in the original) ...
        }

        assertEquals(todoDAO.findAllByNoteId(1).size(), 10);
    }
}
Example DAO test

Nov 12, 2012 — Xen DomU clock issues

Xen DomU guests can exhibit strange clock skew issues (I didn’t have this problem until I started running Debian in PV mode). The issue seems to occur when the VM’s clock source is set to ‘jiffies’. In my case, the clock was running anywhere from 1.5-2X the normal speed, causing all kinds of odd behavior, from inaccurate load calculations to problems with SSL certificates and Chef runs. The fix was to set the time source to ‘xen’ rather than ‘jiffies’.

echo xen > /sys/devices/system/clocksource/clocksource0/current_clocksource
Set Xen DomU guest time source

Nov 12, 2012 — Python has a simple HTTP server

Today, I learned that Python’s standard library has a simple HTTP server built in. It will serve up the current directory on port 8000 and is very useful for things such as viewing Sphinx-generated documentation.

python -m SimpleHTTPServer
Simple Python webserver

Nov 11, 2012 — Custom Serialization of Models with Jackson

Today, I learned that you can customize the code that Jackson executes to serialize an object. I’m currently implementing a service using Dropwizard with Hibernate Spatial, and serializing the Point object as JSON led to infinite recursion issues. As a fix, I wrote a Jackson serializer for the Point class:

public class CustomPointSerializer extends JsonSerializer<Point> {
    @Override
    public void serialize(Point value, JsonGenerator jgen, SerializerProvider provider)
            throws IOException, JsonGenerationException {
        jgen.writeStartObject();
        jgen.writeNumberField("x", value.getX());
        jgen.writeNumberField("y", value.getY());
        jgen.writeEndObject();
    }
}
Custom Jackson serializer.

Nov 10, 2012 — PostGIS with Hibernate Spatial and Dropwizard

Update (Jan 16, 2013): I have created an example bootstrap application on GitHub called dropwizard-postgis-example

Today, I learned that Dropwizard allows you to setup arbitrary hibernate properties in the YAML configuration file.

Hibernate 4.x with Spatial extensions doesn’t always seem to infer that it should use the PostGIS dialect when you have PostgreSQL + PostGIS (I’m still investigating whether that should happen automatically). If it does not infer the PostGIS dialect, you can force it to by setting the hibernate.dialect property.

Normally you would use an XML file for the configuration, but dropwizard-hibernate uses a SessionFactory and loads its configuration from a DatabaseConfiguration entity that’s deserialized from the configuration YAML. Thankfully, it is smart enough to allow arbitrary properties via the properties key. This is a working YAML entry for using Hibernate 4.0 with spatial extensions and Dropwizard:

  # the name of your JDBC driver
  driverClass: org.postgresql.Driver
  # the username
  user: username
  # the password
  password: password
  # the JDBC URL
  url: jdbc:postgresql://localhost:5432/gis_database
  # arbitrary Hibernate properties
  properties:
    hibernate.dialect: org.hibernate.spatial.dialect.postgis.PostgisDialect
Database entry for setting Hibernate's dialect

You will know that it has picked the wrong dialect if you get the following output:

HHH000400: Using dialect: org.hibernate.dialect.PostgreSQLDialect
Standard PostgreSQL dialect

After setting the hibernate.dialect setting as above, you should see:

HHH000400: Using dialect: org.hibernate.spatial.dialect.postgis.PostgisDialect
PostGIS dialect

Nov 09, 2012 — Using Sphinx for RESTful API docs

Today, I learned that Sphinx has a really awesome plugin for generating API docs for HTTP-based APIs called sphinxcontrib.httpdomain. It’s extremely simple to use and generates very nice looking documentation (which you’d expect from Sphinx.)

Oct 29, 2012 — Riak MapReduce with Ruby

Today, I learned how to use MapReduce in Riak using the Ruby client.

results = Riak::MapReduce.new(client).
        add("bucket").  # the bucket name was elided in the original
        map("function(v) {
               var obj = JSON.parse(v.values[0].data);
               return [{
                   'date': obj.rank_date,
                   'engine': obj.engine,
                   'count': obj.total_number_of_results
               }];
             }").
        reduce("function(values) {
                 var result = {};
                 values.forEach(function(v) {
                   result[v.date] = result[v.date] || {'google': -1, 'yahoo': -1};
                   result[v.date][v.engine] = v.count;
                 });
                 return result;
               }", :keep => true).run
A MapReduce call using the Ruby Riak client.

Oct 01, 2012 — Ruby Enterprise Edition and Ubuntu 12.04

I recently had to install Ruby Enterprise Edition 2010.02 (yes, it's old...) on a new VM I had set up with Ubuntu 12.04. The unfortunate thing is that 12.04 ships with GCC 4.6, which seems to have made some subtle changes to ptr_diff that make it incompatible with this older version of REE. Fortunately, there is a simple fix.

Aug 18, 2012 — GeoTools, Maven and JAI, Oh My!

In Riversim, I am leveraging GeoTools to generate GeoTIFF overlays of river channels and width imagery that I am generating. However, I encountered an issue when trying to use GeoTools with Maven's assembly plugin. The core of the issue is that various GeoTools component packages ship colliding META-INF filenames, and the assembly plugin clobbers (or ignores) duplicate files rather than merging them.
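One common workaround for this class of problem (not necessarily the one used here) is to build the jar with the maven-shade-plugin instead, using its ServicesResourceTransformer to merge the META-INF/services registries rather than letting one clobber another. A pom.xml sketch:

```xml
<!-- pom.xml fragment: merge META-INF/services entries across dependencies
     instead of keeping only the first one encountered -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```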

Mar 25, 2011 — MySQL, Sub-selects and IN()

I ran across a bug in MySQL that was causing a simple query to take thirty minutes to an hour (yes, an hour) to return results on three million rows in an InnoDB table. About an hour's worth of digging turned up a known issue in the MySQL 5.x series, and it is a most unusual bug. The heart of the issue is the handling of sub-selects inside an IN clause: if you use a select statement inside the IN statement, the query planner refuses to use an index for the outer select (even if you FORCE INDEX). This is apparently a known bug in MySQL 5.x, and is supposed to be fixed in MySQL 6.0 (presently in alpha).

One of the very strange observations about this bug is that the issue does not appear to occur with "NOT IN"; it would seem fairly safe to assume that the two share similar logic, but that appears not to be the case. The query that hit this issue has a mix of IN and NOT IN statements, both using sub-selects, and yet once I removed the IN sub-select, everything worked fine.

In fact, removing the sub-select from the IN clause reduced query time from over an hour to a few seconds at the most; I'd call that a pretty big improvement.

Mar 11, 2011 — Testing asynchronous workers

Testing is an important part of any infrastructure. Knowing that your code works the way it should through iterations is critical for success as well as to increase developer happiness. Writing tests can be a very daunting task, especially for systems where there are complex interactions (lots of asynchronous interactions) or where test coverage is lower than it should be (playing catchup).

One of the largest challenges I've encountered when dealing with 'workers' is testing end-to-end functionality. As asynchronous processes, they are disconnected from the remainder of the system and often rely on external APIs to get and send data. This means that they are not necessarily operating within the framework of your app, so you can't always test them the same way you would test other components. As a result, testing API controllers may be easy using RSpec, but testing external components that use that API can be a little more challenging.

(N.B.: Since I am currently doing a lot of Ruby programming, this is a bit Ruby-based. That doesn't mean there aren't other, similar libraries that could be used with your language of choice.)