In a production environment, we were noticing that when storing or reading data in Riak, there would be periods of time where it would act as though it was being throttled by I/O wait time. This didn’t seem to make sense as each machine showed beam.smp ballooning to 100% memory usage and I/O wait was typically pretty low (<1%). It would seem that setting the eleveldb cache to a fixed size per partition lets you limit the amount of memory being used. I’m not 100% sure that the documented default of 8MB per partition is accurate, it seems that if you don’t specify a value, the default is “as big as it can be”. In this case Riak was eating up all physical memory and then cause other things to eat up swap which caused contention and therefore some pretty slow interactions.
1 min read