Jan vs. Machine: Effective Typesafe Config

Typesafe Config is a library that aims to do one thing well: reading configuration files containing settings for your applications and libraries. It has a number of features that makes it a good choice for the job: it's pure Java so works with any JVM based language, it's self contained so pulls in no external dependencies, and offers a flexible feature sets for reading and accessing configuration files. No wonder then that it's widely used in the JVM world, especially in Scala projects (natural given its origin at Typesafe the company, now Lightbend).

Having come across this library at a number of companies, I often find that it's not used in the most efficient way. So here's my attempt at pulling together some recommendations for how to use it effectively, to get the most out of it. The following advice is more opinionated than the official docs but this is what I find works well in practice.

DO put configuration in the right file

Typesafe Config lets you put configs in several files. The two files that are typically used are reference.conf and application.conf.

The docs suggest that libraries provide defaults in reference.conf and applications provide configuration in application.conf. I find this to be insufficient, because it means that if you have multiple application.conf files, one for each environment, you will end up with a lot of duplication between these files if they contain configs that apply across environments.

Instead, I suggest the following rules:

Put defaults in reference.conf
Put environment specific configuration only in application.conf

In other words, applications can provide their defaults in reference.conf as well. This should not be used for environment specific settings, but instead for settings that are configurable but have meaningful defaults, such as message topics, queue sizes, timeouts, and so on.

DON'T include an application.conf file in your source code

This perhaps somewhat surprising suggestion derives from the previous point: If an application.conf file contains environment specific configuration, which environment is the file in your source tree referring to? Dev? Or production? There are various options for how to provide the application configuration instead:

Include several files, e.g. application.conf.dev, application.conf.qa etc in your source tree. Then specify the one to use for example by using a command line argument to your application, or a custom build step for your environment.
Don't provide application configs in source code. This means providing configs from an external system at runtime. Note that in this case, it's still worth providing a dev config file in the source code, to make it easy to run your application locally while developing it. Just make sure this file isn't called 'application.conf', so it doesn't accidentally get included in the build and picked up in real deployments!

Which ones of these approaches you choose depends more on what fits best in with your deployment infrastructure.

DO namespace your configurations

Some developers when first providing configs for their application use simple configuration keys like "eventlistener.port". The problem with this however is that you risk your config parameters clashing with those of libraries you depend on. Remember that when resolving configurations, the library will merge _all_ reference.conf files it seems in its classpath, from all libraries you include. You really want to make sure that your configs don't clash with those of others.

Fortunately, it's simple to do this: simply put your configs in a namespace that's unique to your application or component. In the HOCON files supported by Typesafe Config, that's as simple as wrapping your configs in a block:

com.mycompany.myapp {
<... rest of configs here ...>
}

I suggest you think of your configuration as a global tree of settings that could be provided as a single logical config file - even if you may not choose to do this in practice. I.e. even if you're working on a microservice that's part of a much larger system, think of your configuration as part of a single configuration tree for the system as a whole.

Doing this means you are free to move code between libraries and services at a later date without clashes. And, it means you can easily provide the configuration for your application from a central configuration service, should you choose to. The ability to configure a whole system for an environment from a single file, as opposed to a large number of config files (often containing duplicate information) can really simplify microservice style deployments.

DO take advantage of the HOCON format

Typesafe Config supports multiple formats including Java properties files and JSON. But, there's no reason to go for these formats when it also supports HOCON, a much richer format. Take advantage of this to make your config files as simple, duplication- and noise-free as possible. For example:

Include comments to explain the purpose of settings
Use units for things like durations, to avoid mistakes about milliseconds vs seconds etc.
Use block syntax to avoid repetition
Avoid unnecessary quotes and commas

See the full description of the HOCON format for a comprehensive description of all you can do with it.

DON'T use "." as a word separator

In HOCON, a full stop (".") has a special meaning: it's the equivalent of a block separator. In other words, this configuration:

service.incoming.queue.reconnection.timeout.ms=5000

is the equivalent of:

service {
incoming {
queue {
reconnection {
timeout {
ms = 5000
}
}
}
}
}

...which is probably not what you intended! Instead using snake case for compound words, only using the block structure where you actually want a configuration block or object, for example:

service.incoming-queue.reconnection-timeout=5000 ms

You'll want a block where you may want additional parameters that logically belongs to the same entity, so that these parameters can be accessed as a single object. In the above example we may add another parameter as follows:

service.incoming-queue.ack-timeout=2000 ms

This means that the 'service.incoming-queue' object now has two properties.

DON'T mix spaces and tabs

Need I say more?

DO limit where you access configurations in your code

The Typesafe Config API makes it easy - maybe too easy! - to get a configuration object for the application. You can do just:

Config config = ConfigFactory.load()

This makes it all too tempting to load a configuration object every time you need to get a setting. There are some problems with this though. First of all, there are different ways to produce the Config object for an application. You can for example read it from a non-standard file, from a URL, or compose it by merging several sources. If you use the singleton-style way of getting at the Config object, you will have to potentially change a lot of code when you change the specifics of how you load the configuration.

Instead, I suggest this rule:

Only load the configuration once in any given application.

DON'T depend directly on configuration in application components

On a similar note, I also strongly recommend not passing around Config objects in your application. Instead, extract the parameter that each component needs and pass it in as regular parameters, grouping them into objects if necessary. This has several advantages:

Only your main application has a dependency on the Typesafe Config API. This means it's easier to change the config API at some point should you wish to, and most of your code can be used in a different context where a different config mechanism is used.
It makes testing easier, in that you don't have to construct Config objects to pass in to objects under test.
The API of your components can be more type safe and descriptive.

In other words, instead of doing this:

class MessageHandler(config: Config) {
val hostname = config.getString("hostname")
val portNumber = config.getInt("portnumber")
val connectionTimeout =
config.getDuration("connection-timeout", TimeUnit.MILLISECONDS).millis
//...
}

Do this:

case class ConnectionParameters(hostname: String, portNumber: Int, connectionTimeout: FiniteDuration)

class MessageHandler(brokerParameters: ConnectionParameters) {
// ...
}

You'll probably want to define a factory method that reads objects like ConnectionParameters from Config, calling these when creating and wiring up your application components.

DON'T refer to configs using absolute paths

I sometimes come across code that reads configuration as follows:

def getMessageHandlerConfig(config: Config): ConnectionParameters = {
val hostname = config.getString("com.mycompany.message-handler.hostname")
val portNumber = config.getInt("com.mycompany.message-handler.portnumber")
val connectionTimeout = config.getDuration("com.mycompany.message-handler.connection-timeout", TimeUnit.MILLISECONDS).millis

ConnectionParameters(hostname, portNumber, connectionTimeout)
}

Here we are converting parameters from a Config object into a nicely typed object that we can pass around our application. So what's wrong with this?

The problem is that this code contains knowledge of the overall shape of the configuration tree, using the full parameter path of a configuration item (e.g. "com.mycompany.message-service.hostname"). It only works if you pass in the root Config object, and assumes that these parameters will always live at this point and this point only in the config tree. This means that:

We can never use this method for getting parameters of a component that lives in a different part of the tree.
If we reorganise our config tree, we have to change a lot of code.

Much better instead then, to have this method to only read parameters directly from the Config object it's given:

def getMessageHandlerConfig(config: Config): ConnectionParameters = {
val hostname = config.getString("hostname")
val portNumber = config.getInt("portnumber")
val connectionTimeout = config.getDuration("connection-timeout", TimeUnit.MILLISECONDS).millis

ConnectionParameters(hostname, portNumber, connectionTimeout)
}

Now this can be called with a sub object of the overall configuration, for example:

val incomingMessageHandlerParams = getMessageHandlerConfig(config.getConfig("com.mycompany.incoming-message-handler"))

And you can now use the same method for other parts of your configuration tree:

val outgoingMessageHandlerParams = getMessageHandlerConfig(config.getConfig("com.mycompany.outgoing-message-handler"))

DO use PureConfig in Scala projects

The fact that Typesafe Config is a pure Java library is nice for portability, but it's a bit of a limitation when working in Scala. The library's API follows Java conventions and returns nulls, throws exceptions, and doesn't support Scala types like Option. It would be nice to have a more functional, idiomatic Scala API instead. Luckily, such a thing exists, in the shape of the PureConfig library.

In addition to fixing the aforementioned inconveniences, it also helps you get rid of some boilerplate code, as it can automatically map configs to Scala case classes. It has a number of optional integrations for mapping configs to types for common third party libraries, such as Joda Time, Cats and (the awesome!) Squants. And it can even cope with mapping snake case or hyphenated names to the more idiomatic camel case in case classes.

My experience with this library has been very positive, so if you're working in Scala I would strongly suggest trying this out instead of working with the basic Java API.

In summary

The Typesafe Config API has a lot going for it and is well worth a look if you're writing JVM based applications and services. If you do use it, I suggest having a closer look at the documentation, as well as consider the points I mentioned above, to make the most of it.

If you have further suggestions, or perhaps disagree with my recommendations I would love to hear about it. Get in touch, either in the comments below or ping me on Twitter.

Tuesday 19 July 2016

Effective Typesafe Config