First there was XML, then JSON and YAML. Think you've had enough of configuration languages? Well, now there is TOML (it's been around for a while, but it's getting popular now). The Python community seems to be embracing this format in a big way, to the extent that Python 3.11 includes a new tomllib module in the standard library for working with the format.

TOML stands for Tom's Obvious Minimal Language, developed in 2013 by – you guessed it – Tom! (Full name: Tom Preston-Werner)

It is a format used for configuration files, and it is designed to be simple, obvious and minimal. Here is what it looks like:

[tool.poetry]
name = "article-classifier"
version = "0.1.0"
description = "Project"
authors = ["Playful Python <[email protected]>"]
packages = []

[tool.poetry.dependencies]
python = "^3.8,<3.11"
aiosqlite = "^0.17.0"
fastapi = "^0.73.0"
uvicorn = {extras = ["standard"], version = "^0.17.4"}

[tool.poetry.dev-dependencies]
pytest = "^6.2.5"
pytest-cov = "^3.0.0"
pytest-watch = "^4.2.0"
coverage = "^6.0.1"
pytest-asyncio = "^0.18.1"

[build-system]
requires = ["poetry-core>=1.0.0", "setuptools"]
build-backend = "poetry.core.masonry.api"

We already have XML, JSON and YAML. Do we really need another format for configuration files? The problem is that each of these three formats has significant problems.

XML

XML was the popular choice in the 2000s. The issue with XML is that it is simply too verbose. You have to type so much for simple configurations. You have to match open and close tags. Empty tags with no children have a weird syntax. It's really easy to mess it up. While it was ok for a machine to work with the format, it is simply not easy for a human to work with.

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
     
  <groupId>com.mycompany.app</groupId>
  <artifactId>my-app</artifactId>
  <version>1.0-SNAPSHOT</version>
     
  <properties>
    <maven.compiler.source>11</maven.compiler.source>
    <maven.compiler.target>11</maven.compiler.target>
  </properties>
     
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.12</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>

JSON

When 2010 came around, people were moving from XML to JSON. Unlike XML, JSON had a very clean and minimal structure. It became the de-facto format for APIs to communicate with each other. It is mostly very easy to read a JSON file.

Unfortunately, JSON also has many limitations. You can't add comments, for example. This is a severe limitation if you expect a human to be able to open, read and modify the file. JSON also has very strict parsing requirements, and parsing will fail if you use single quotes instead of double quotes, or leave a trailing comma at the end of an array. Like XML, JSON works fine as a format for interchange between APIs, but isn't great when you expect humans to work with it.

{
  "name": "my_package",
  "description": "",
  "version": "1.0.0",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "repository": {
    "type": "git",
    "url": "https://github.com/monatheoctocat/my_package.git"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "bugs": {
    "url": "https://github.com/monatheoctocat/my_package/issues"
  },
  "homepage": "https://github.com/monatheoctocat/my_package"
}
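The strictness described above is easy to see with Python's own json module. This is a small sketch; the sample strings are made up for illustration and are not taken from any real project.

```python
import json

# Hypothetical snippets: one valid document and three that break JSON's rules
samples = {
    "valid": '{"name": "my_package", "keywords": []}',
    "single quotes": "{'name': 'my_package'}",
    "trailing comma": '{"keywords": ["a", "b",]}',
    "comment": '{"name": "my_package" // not allowed\n}',
}

results = {}
for label, text in samples.items():
    try:
        json.loads(text)
        results[label] = "ok"
    except json.JSONDecodeError as exc:
        # JSONDecodeError records where the parser gave up
        results[label] = f"rejected at line {exc.lineno}, column {exc.colno}"

for label, outcome in results.items():
    print(f"{label}: {outcome}")
```

Only the first sample parses; the other three are rejected outright rather than being read leniently.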

YAML

That brings us to YAML. Where to start with YAML 😶 At first glance it seems to be easy enough to read and write YAML. But YAML is really, really complex. As different situations came up, they all got added into the spec, increasing the bloat and the complexity. Did you know that there are NINE different ways to write multi-line strings in YAML? It's a mess. Avoid.

What about using Python itself?

One approach that is sometimes taken is to put all the configuration into a Python file. That way you can use the full power of the language, including comments, multi-line strings and so on. While it sounds like a good idea at first, it has one serious problem: it also allows one to write and run arbitrary code. Imagine a bug in the config file that, instead of causing a simple syntax error, ends up running some dangerous code. You definitely don't want that happening in a human-editable configuration file.

A Quick Tour of TOML

And that's where TOML comes into the picture. TOML has been designed to be simple to read and write. It supports features that matter when humans edit the file, such as comments and multi-line strings. It is minimal, having learnt the lesson from YAML. At the same time, it can easily represent the complex nested structures that are often used in application configuration files.

For this reason, TOML has been quickly gaining popularity in the Python community, and recently many configuration files have been standardised on TOML. Given that it is going to be used more in core Python, it made sense to introduce a parser into the standard library. And that's what we got in 3.11.

Let's take a look at that pyproject.toml file from the top of the article:

[tool.poetry]
name = "article-classifier"
version = "0.1.0"
description = "Project"
authors = ["Playful Python <[email protected]>"]
packages = []

[tool.poetry.dependencies]
python = "^3.8,<3.11"
aiosqlite = "^0.17.0"
fastapi = "^0.73.0"
uvicorn = {extras = ["standard"], version = "^0.17.4"}

# these packages are just needed for testing
[tool.poetry.dev-dependencies]
pytest = "^6.2.5"
pytest-cov = "^3.0.0"
pytest-watch = "^4.2.0"
coverage = "^6.0.1"
pytest-asyncio = "^0.18.1"

[build-system]
requires = ["poetry-core>=1.0.0", "setuptools"]
build-backend = "poetry.core.masonry.api"

First, we can define various sections (called tables in the TOML spec) in square brackets, like [build-system]. When parsed, they become Python dictionaries. Sections can have subsections, with the names separated by a dot (.). Above we have one section, tool.poetry, and a subsection, tool.poetry.dependencies.

Within each section, you have key = value pairs. Keys are usually bare strings without quotes, though you can put quotes around the key name if you want. Values can be one of many types: TOML supports strings (4 types), numbers, arrays, objects and a few more types. The syntax is quite intuitive: arrays are in square brackets [ ] while objects (inline tables, in the spec's terms) are in curly brackets { }. In the example above, requires has an array value, while the uvicorn key has an object value. TOML supports nested structures; note how the extras key within the object has an array value. Lines starting with # are comments.

One thing we find here is that the configuration file above is easy to read, and easy to modify by hand. There is no verbosity or deep indentation, and everything is clear at first glance.

For more information on writing TOML and all the features, check out the examples in the spec.

The tomllib library

Given that many configuration files in the Python ecosystem will be in TOML going forward, Python 3.11 has a new tomllib module for parsing this format. Having it in the standard library makes it easier for everyone to use this format in their applications.

Using it is straightforward and follows the same interface as the json module. The only thing to note here is that the file should be opened in binary mode ("rb"):

import tomllib

with open("pyproject.toml", "rb") as f:
    data = tomllib.load(f)

print(data)

Python parses TOML files into the types you would expect. TOML arrays become Python lists, while TOML sections and objects become dictionaries. Strings, numbers and booleans map to str, int, float and bool.

For the pyproject.toml above, you will get this output after parsing (formatted here for readability):

{
  'tool': {
    'poetry': {
      'name': 'article-classifier', 
      'version': '0.1.0', 
      'description': 'Project', 
      'authors': ['Playful Python <[email protected]>'], 
      'packages': [], 
      'dependencies': {
        'python': '^3.8,<3.11', 
        'aiosqlite': '^0.17.0', 
        'fastapi': '^0.73.0', 
        'uvicorn': {'extras': ['standard'], 'version': '^0.17.4'}
      }, 
      'dev-dependencies': {
        'pytest': '^6.2.5', 
        'pytest-cov': '^3.0.0', 
        'pytest-watch': '^4.2.0', 
        'coverage': '^6.0.1', 
        'pytest-asyncio': '^0.18.1'
      }
    }
  }, 
  'build-system': {
    'requires': ['poetry-core>=1.0.0', 'setuptools'], 
    'build-backend': 'poetry.core.masonry.api'
  }
}

One interesting thing is that this module only reads and parses configuration files. It does not support writing TOML files.

If you want to write TOML output then the Tomli-W package is recommended.

Sometimes you want a style-preserving library. For example, a configuration file might be edited by a human as well as written by your app. If the human has added comments, or changed a single line to multi-line syntax, then you don't want the app to wipe all that out when it updates the file. The tomlkit package is recommended for this.

Summary

If you've never tried working with TOML before, now's the best time to start. TOML is fairly simple to use, and with tomllib in the standard library, it makes a lot of sense to use TOML as the first choice for complex configuration.
