Thoughts on Dependencies
Dependency management is something that, oftentimes, nobody on a project thinks about until there is a problem. One of the great things about package managers like Maven, Bundler, pip and their relatives is that adding a new dependency is a snap. It can take less than thirty seconds to add a new Ruby gem dependency to a Gemfile and install it via Bundler. This is a great convenience to developers; in the “old days” you would need to download and build every C / Perl / Ruby / Java library that you needed.
The value of libraries
Third-party libraries often provide large chunks of functionality that decrease your development time when building a product. For example, user-authentication mechanisms or web crawlers can be extremely time-consuming to implement. Even an initial version of a simple implementation could take days or weeks to build, and implementing it correctly takes longer still: often the first 90% of development happens quickly, followed by the second 90% that is required to make sure everything works correctly.
Why would anyone want to re-implement a user-authentication mechanism? Chances are that most projects would not want to. There are lots of edge-cases to consider, and a community-managed project is quite likely to have addressed most, if not all, of those already through the course of its development. As a result, you can increase your development velocity by leveraging the combined development effort of the community at large. Bug fixes and improvements come in through the project’s contributors, and you don’t need to do a lot of maintenance for that specific set of functionality. Everybody wins.
The downside of dependencies
There is a flip-side to this: dependencies are like a relationship. You take from them, and maybe you give some back, but ultimately you are tying up some part of your future with this new dependency. If you depend on the Apache Commons IO package you are implicitly saying “I trust that the Commons IO package will bring me more value than hassle.” More often than not these are reasonable assumptions but it’s a good idea to weigh the pros and cons before consuming any dependency.
Some potential problems that might stem from upstream dependencies:
- A widespread bug is caused when updating a particular library
- Taking a new version of a library requires numerous updates to other dependencies
- System libraries need to be updated for a new version of a dependency
- Taking (or avoiding) a new version of a library leads to a security vulnerability
- Deployed artifacts now consume an astronomical amount of space for every deployment
- Two, or more, dependencies cause irreconcilable conflicts at build- or (even worse) run-time
Consider an imaginary Rails application that we are working on. We need an HTTP
client (of course!) so we add httparty. Later on we find that we need a
parallelizing HTTP client, so we add faraday. Now we want to add a web-crawler
to our app, so we add magic-spider-foo (which just happens to use curb). At
this point we have no fewer than three completely different HTTP clients in
our project, along with all of their dependencies, both pure-Ruby and native
(typhoeus depends on libcurl, as does curb, and each may have wildly different
expectations of underlying versions). Additionally, various parts of the
system now use totally different HTTP clients, so we have completely different
code and behavior in various portions of our application when making HTTP
calls to upstream services. (Which means we need to test all of these
configurations and can’t necessarily share test components between them.)
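One way to notice this kind of overlap early is to scan the lockfile for more than one HTTP client. The following is only a minimal sketch: the `KNOWN_HTTP_CLIENTS` list and both helper names are assumptions made for illustration, and the regular expression relies on resolved gems being indented by exactly four spaces in `Gemfile.lock`.

```ruby
# Hand-maintained list of HTTP client gems to look for (an assumption for
# this sketch; extend it to match your own ecosystem).
KNOWN_HTTP_CLIENTS = %w[curb excon faraday httparty rest-client typhoeus].freeze

# Extracts the names of resolved gems from the text of a Gemfile.lock.
# Resolved gems are indented by four spaces; their own dependencies are
# indented by six and are therefore skipped by this pattern.
def locked_gem_names(lockfile_text)
  lockfile_text.scan(/^ {4}([^\s(]+) \(/).flatten.uniq
end

# Returns the known HTTP clients that appear in the lockfile.
def http_clients_in(lockfile_text)
  locked_gem_names(lockfile_text) & KNOWN_HTTP_CLIENTS
end

# Example usage against a real project:
#   clients = http_clients_in(File.read("Gemfile.lock"))
#   warn "Multiple HTTP clients: #{clients.join(', ')}" if clients.size > 1
```

A check like this could run in continuous integration so that a second or third client is a deliberate decision rather than a side effect of another gem.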
Native dependencies in languages like Ruby, Java or Python can be challenging to manage because they often require very specific underlying library versions that do not always map well to pre-built packages for your platform. As a result they can be difficult to deploy and manage without the aid of an additional configuration management tool such as Puppet or Chef to ensure that the system libraries are available, or without managing packages for your particular deployment platforms.
Ruby is not unique in this; any sufficiently complex programming language (which
is to say all of them) has the potential for a wild garden of dependencies. For
example, Maven offers the <exclusions> tag, which permits you to suppress
transitive dependencies (effectively pruning your dependency graph by hand).
This is used to prevent compile-time or runtime issues from occurring when you
have conflicting APIs (i.e., conflicting versions of a project that are not
compatible). This tells us that this is a hard problem™ and that sometimes a
human being has to intervene on behalf of the dependency manager.
Not all libraries are created equal
Not every project is held to the same quality bar. In particular, many open source projects see many contributors and sometimes even change ownership, which leads to discontinuity in vision and oversight. This is not to say that open source projects are bad; in fact, they are incredibly useful and provide a lot of value. It is important to evaluate the quality of a project when taking a dependency on it; effectively you are counting on that project’s functionality to provide you with enough value to justify not implementing it yourself.
There are times when it makes more sense to copy and paste some code rather than
consume an entire library. If you only need a handful of methods, and they are
independent of the rest of the library, it may make more sense to simply lift
that portion of code (with appropriate credit and licensing compliance) into
your application. One does not always need something the size of an entire
Apache package to implement the isEmpty method on a string.
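In Ruby terms, the equivalent might be a nil-safe emptiness check. The sketch below is a hypothetical stand-in, not code lifted from any particular library; `StringHelpers` and `blank?` are names chosen for illustration, and the whitespace handling is an assumption about what the application needs.

```ruby
# A small, independent helper kept in the application itself instead of
# pulling in a whole utility library for one method.
module StringHelpers
  # Returns true for nil, the empty string, or a whitespace-only string.
  def self.blank?(value)
    value.nil? || value.to_s.strip.empty?
  end
end

puts StringHelpers.blank?(nil)      # nil counts as blank
puts StringHelpers.blank?("  \n")   # whitespace-only counts as blank
puts StringHelpers.blank?("gem")    # real content does not
```

A helper this small is easy to test and review on its own, which is what makes lifting it reasonable; anything with deeper ties to the rest of a library is a poorer candidate.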
Some folks may cry that this is not “DRY” enough or that you “might miss out
on bug fixes upstream”. Certainly this is true for the entirety of such a
library; nobody should re-implement all of it in their application, as these
libraries exist to make your life easier. Still, I have personally encountered
a number of projects where an attempt to maximize the body of shared code
caused more harm than good. In particular, systems that rely on a multitude of
services suffer when you tie them together using too much common code.
Adopting a new pet
So you’ve decided that you need to take on a new dependency - great! There are a number of things to consider when taking on a third party library:
Software licensing is a jungle - there are so many licenses available and a number of them are in direct conflict with what most people would consider business requirements. A lot of businesses do not want to (or simply cannot) release their source code for the things they are working on. That doesn’t mean that these companies are bad, or evil, or that they don’t contribute in other ways or on other projects. What it does mean though, is that it can be very challenging to know exactly what licenses are in the third-party dependencies that you have taken.
Dealing with licenses
Leverage a tool, such as Pivotal’s LicenseFinder, which is capable of analyzing your dependency graph and generating a report of the licenses that are being used. The great thing about LicenseFinder is that it works with a number of package-management systems across a variety of languages including Maven, Bundler, pip and other popular tools. It can even scan licenses in polyglot projects where your Ruby application may have some Node.js dependencies, for example.
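To make the idea concrete, here is a rough sketch of the kind of check such a report enables. It is not how LicenseFinder works internally. The `ALLOWED_LICENSES` policy and the `license_violations` helper are hypothetical, and treating a multi-licensed gem as acceptable when any one of its licenses is allowed is an assumption you should confirm with whoever owns licensing decisions.

```ruby
# Hypothetical license policy for this sketch.
ALLOWED_LICENSES = %w[MIT Apache-2.0 BSD-3-Clause].freeze

# Returns [name, licenses] pairs for every spec that declares no license
# found in the allowed list. Specs with no declared license are flagged
# too, since an unknown license cannot be assumed safe.
def license_violations(specs, allowed: ALLOWED_LICENSES)
  specs
    .select { |spec| (Array(spec.licenses) & allowed).empty? }
    .map { |spec| [spec.name, Array(spec.licenses)] }
end

# Example usage against the gems loaded in the current process:
#   license_violations(Gem.loaded_specs.values).each do |name, licenses|
#     puts "#{name}: #{licenses.empty? ? 'no license declared' : licenses.join(', ')}"
#   end
```

This only sees what each gem declares in its gemspec, so a dedicated tool is still the better choice for anything beyond a quick sanity check.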
Lots of libraries experience performance regressions. Oftentimes they don’t even know it; they don’t use the library the way that you do and, even if they were aware of your usage, likely don’t have a way to test the things you do.
Catching performance regressions
When writing unit and integration tests (you are doing this, aren’t you?) make sure that you are factoring in performance tests that execute your code with timing data. Keep track of how that performance changes and consider setting a baseline for these tests. If the timing exceeds a certain threshold it’s worth investigating; this is good practice even if you don’t have any upstream dependencies.
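A baseline check of this kind could look roughly like the sketch below. The helper names, the number of runs, and the 25% tolerance are all assumptions made for illustration; taking the median of several runs is one simple way to dampen noise on shared build machines.

```ruby
# Runs the block several times and returns the median wall-clock duration
# in seconds, measured with the monotonic clock.
def median_runtime(runs: 5)
  timings = Array.new(runs) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
  timings.sort[runs / 2]
end

# True when the measured time stays within the tolerated margin above the
# recorded baseline (a tolerance of 0.25 allows up to 25% slower).
def within_baseline?(seconds, baseline:, tolerance: 0.25)
  seconds <= baseline * (1 + tolerance)
end

# Example usage inside a test, with a hypothetical baseline of 50 ms:
#   elapsed = median_runtime { MyParser.parse(sample_document) }
#   raise "performance regression" unless within_baseline?(elapsed, baseline: 0.050)
```

Recording the baseline alongside the test makes a slowdown introduced by a dependency upgrade show up in the same place as any other failing test.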
Security issues are particularly hairy; the larger your dependency graph, the larger the surface area for exposure. Every dependency you take has the potential to expose your software to bugs and security flaws. Unless you have a team dedicated to tracking vulnerabilities in the wild, notifying you when they happen, and helping you patch them, you will be on the hook for this yourself.
Working to avoid security holes
Minimize your dependencies where it makes sense; know what sort of functionality each dependency has and where potential holes might affect you. Follow the mailing lists and keep up-to-date with the change logs of your dependencies. Security holes can appear in any library so the smaller your chain of dependencies the fewer things you have to keep track of. Admittedly this can be a lot of work, particularly for complex or rapidly changing dependencies; it helps to have a team that focuses on security for your organization but not everyone has access to such a resource.
Ensuring proper behavior
In a similar vein to performance issues are regressions in system behavior.
Bugs (or “features” as they are sometimes called) can introduce side-effects
in your stack, and these new behaviors can cause localized or systemic issues
in your application. Consider that any Ruby library is able to monkey-patch
core parts of the Ruby standard library. Imagine if one of your dependencies
had taken it upon itself to alter something like Net::HTTP to its liking –
what might happen to the rest of your application’s HTTP requests? (Hint: it’s
not a pretty sight; I’ve been bitten by this once.)
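When a patch like this is suspected, Ruby can report where a method is currently defined. The sketch below uses `UnboundMethod#source_location` for that; the `definition_site` and `defined_in?` helpers are hypothetical names, and the check assumes the unpatched method lives in a file whose path contains `net/http`.

```ruby
require "net/http"

# Returns the [file, line] pair where the instance method is currently
# defined, or nil for methods implemented in C.
def definition_site(klass, method_name)
  klass.instance_method(method_name).source_location
end

# True when the method's current definition lives in a file whose path
# contains the given fragment.
def defined_in?(klass, method_name, path_fragment)
  file, _line = definition_site(klass, method_name)
  !file.nil? && file.include?(path_fragment)
end

# If a gem has redefined Net::HTTP#request, the reported file will point
# into that gem instead of the standard library.
puts definition_site(Net::HTTP, :request).inspect
warn "Net::HTTP#request has been patched" unless defined_in?(Net::HTTP, :request, "net/http")
```

This only finds the culprit once you already suspect a specific method, so it complements a test suite rather than replacing one.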
Minimizing disruption due to poorly behaving libraries
Using continuous integration and writing comprehensive tests that not only perform localized unit tests but that exercise your dependencies at a variety of depths will help to catch these types of issues. In our case, updating a gem caused a cascading failure through our tests. This let us catch the issue before it made its way to production; things would have been less pleasant had we not been able to do so. Another helpful feature of most modern package managers is the ability to pin versions; in this way you can accept what should be minor changes (say, point releases) without much oversight but require manual changes to import new major versions of a dependency.
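In Bundler, this kind of pinning is usually written with the pessimistic `~>` operator, for example `gem "httparty", "~> 0.21"` (the version number here is hypothetical). The sketch below uses RubyGems' own `Gem::Requirement` and `Gem::Version` classes to show which versions a constraint would accept; the `allowed?` helper is a name chosen for illustration.

```ruby
require "rubygems"

# True when the version string satisfies the constraint string, using the
# same requirement matching that RubyGems applies when resolving gems.
def allowed?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

# "~> 1.4" accepts 1.x releases from 1.4 upward but not 2.0.
puts allowed?("~> 1.4", "1.9.0")
puts allowed?("~> 1.4", "2.0.0")

# "~> 1.4.2" is tighter: only 1.4.x releases from 1.4.2 upward.
puts allowed?("~> 1.4.2", "1.4.9")
puts allowed?("~> 1.4.2", "1.5.0")
```

How tightly to pin is a judgment call: a looser constraint picks up fixes with less effort, while a tighter one makes every upgrade an explicit decision.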
Weigh your options
It’s hard to get a good feeling for just how well-behaved or large a dependency is going to be; some things that seem rather innocuous can have far-reaching effects on your application’s stability, security and performance. Even mature, well-established projects with years of experience releasing software will periodically introduce bugs that can affect your code. Through a combination of observation, testing and general awareness you can maximize the value that you get from third-party packages while (hopefully) minimizing the blast radius when things do go wrong.