Jan 1, 2019

Comic Book Archives Suck

If you have ever tried to read a comic book digitally, you have probably stumbled upon comic book archives, usually in the form of .cbz or .cbr files.

These are extremely simple files - merely an archive of images, each representing a page.

Or that’s the idea anyway - there is no actual standard. This means that you don’t know what archive type the file will be. That’s what the different extensions are for: it can be a zip (.cbz), a rar (.cbr), a 7z (.cb7), a tar (.cbt), or an ace (.cba). You also don’t know what image format you will find - they can be anything from JPEGs to animated GIFs, sometimes with several formats in the same archive.
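Since the extension is only a hint, a reader can also look at the first bytes of the file. Here is a minimal Python sketch of that idea, assuming the usual signatures for zip, rar, 7z and tar; the function name sniff_archive_type is made up for illustration, and ace is left out.

```python
# Leading "magic" bytes for some of the archive formats mentioned above.
MAGIC_PREFIXES = {
    b"PK\x03\x04": "zip",
    b"Rar!\x1a\x07": "rar",
    b"7z\xbc\xaf\x27\x1c": "7z",
}


def sniff_archive_type(data: bytes) -> str:
    """Guess the archive type from the file's leading bytes (hypothetical helper)."""
    for prefix, kind in MAGIC_PREFIXES.items():
        if data.startswith(prefix):
            return kind
    # POSIX tar archives carry "ustar" at offset 257 of the first header block.
    if data[257:262] == b"ustar":
        return "tar"
    return "unknown"
```

Reading the first 512 bytes of the file and passing them in is enough for all four checks.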

But all that is relatively easy to handle with the help of libraries. What’s most annoying about these archives is that there is absolutely no metadata beyond the image file names.

So, for example, a comic reader has no way of knowing which page is which beyond simple guesswork based on file names (which thankfully tend to be the page number).
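That guesswork usually boils down to a natural sort of the file names, so that "page10" comes after "page2". A rough sketch in Python, where the extension list and the function names are my own assumptions rather than anything the archives define:

```python
import re

# Assumed set of image extensions; real archives may contain others.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def natural_key(name: str):
    # Split into digit and non-digit runs so numbers compare numerically.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def order_pages(names):
    """Keep only image files and sort them in natural (human) order."""
    pages = [n for n in names if n.lower().endswith(IMAGE_EXTENSIONS)]
    return sorted(pages, key=natural_key)
```

This only works as long as the file names actually encode the page order, which nothing guarantees.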

There is also no way of knowing the title or issue number without guessing from the file name, which is in no way a reliable source.
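To show how fragile this is, here is the kind of guess a reader ends up making. The pattern below is just one heuristic I made up (series name, then the first short number after it); it is not a convention that archives actually follow.

```python
import re
from pathlib import Path

# Heuristic: series name, a separator, an optional "#", then up to 4 digits.
NAME_PATTERN = re.compile(r"^(?P<series>.+?)[\s_]+#?(?P<issue>\d{1,4})(?!\d)")


def guess_series_and_issue(filename: str):
    """Return (series, issue) guessed from a file name; issue is None if unknown."""
    stem = Path(filename).stem  # drop the .cbz/.cbr extension
    match = NAME_PATTERN.match(stem)
    if match is None:
        return stem, None
    return match.group("series"), int(match.group("issue"))
```

A name like "Saga 054 (2018).cbz" gives a sensible answer, but anything without an issue number, or with the year first, falls back to the raw file name or guesses wrong.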

All this frustration has come about because I am making a comic book reader, which you can find here.

I don’t have library management implemented yet, but I am planning on doing it sooner or later. I am thinking of storing the metadata in a simple YAML or JSON file inside the comic book archive, something like this:

series: "Saga"
title: "Saga #54"
issue: 54

This approach has the benefit of being simple and portable, and it should just be ignored by readers that don’t support it.
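Reading such a file back could look roughly like this in Python. It is only a sketch: it assumes a zip-based archive, the JSON variant of the file, and a file name of metadata.json, which is a placeholder I picked and not something any reader recognises today.

```python
import json
import zipfile

METADATA_NAME = "metadata.json"  # hypothetical file name, not a standard


def read_metadata(archive):
    """Return the metadata dict from a zip-based archive, or None if absent.

    `archive` can be a path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        try:
            raw = zf.read(METADATA_NAME)
        except KeyError:
            # Archives without the file behave exactly as they do today.
            return None
    return json.loads(raw.decode("utf-8"))
```

An archive without the file simply yields None, so the reader can fall back to guessing from file names.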

Who knows though, maybe this is actually a horrible idea.