Thursday, July 8, 2010

Vacation and validation

The semester is finally over, so I’ll be heading on vacation tomorrow. I’ll be gone for about two weeks, and in that time no issues will be examined, no code will be written, etc. Well, not really. In general, I won’t have internet access, but I may pop in from time to time to check the Sigil forum on MobileRead. No promises though.

While I’ll be spending most of my time lying on the beach, soaking up the sun with a comfortable book[1] by my side, I’ll also be doing some light development work. By “light” I mean finishing the doxygen comments conversion for Sigil (which has been taking forever) and some code for the new epub validation library I’m writing. Not a lot of code, just a bit.

What’s that last part about? Well, what’s the most requested feature for Sigil on the tracker? It’s integrated epub validation. I’ve planned for months to start working on this as a separate project in August, and I’ve just started doing some architectural design for the library[2]. The dependencies are in place, the build system works across platforms and I have a pretty good idea of how things are going to mesh together.

The project will in fact be composed of the following:

  • the main epub validating library, written in portable C++;
  • a CLI application that uses that library (think epubcheck);
  • a GUI application with pretty buttons and drag-and-drop support.

I know a ridiculous number of people who don’t validate their epub books just because they are too scared of the command line to use epubcheck. While I think that’s absurd and childish, I can’t deny that having a simple GUI would be easier for the average user. And if the GUI supported dragging an epub file from the desktop and dropping it on the app window to initiate the validation, well, that would be very useful. It could certainly speed things up.

The goals of the project are as follows:

  • check for everything epubcheck checks for, and much more;
  • be easily embeddable into native code applications (no frigging Java);
  • be easy to use and easy to understand: “unfinished element” means diddly-squat to people who don’t already know what the element is supposed to contain[3];
  • have developers that are active and responsive (and an active development process in general);
  • optionally warn about valid epub constructs that cause problems for certain high-profile Reading Systems.

That last part is interesting. We all know ADE has many quirks, and so do the iPad and other RS’s. Wouldn’t it be lovely if you could instruct your epub validator to check for at least some of those problems as well?

I think that would be fairly useful.

This library will then be embedded into Sigil, and a simple toolbar button press will validate your epub and report any problems.

Before anyone gets ahead of themselves, bear in mind I won’t hit all those goals for the first release. It will be incremental and it will get better in time. But the scope of this project is thankfully much smaller than Sigil’s, so that’s a big relief.


[1] Currently it’s The Stand by Stephen King. That book has like a billion pages. It’s probably not helping matters that I’m reading the extended version.

[2] A software library, not a building. There was a misunderstanding about this a couple of days ago, and needless to say I laughed my ass off.

[3] This will entail writing some of the internal validators by hand and not resorting to schemas. Painful, yes, but necessary. Not just because of usability, but correctness too. For instance, a schema can’t check that the files listed in the manifest are actually in the epub. It won’t tell you if you’ve included the same file multiple times with different IDs. There are many examples, most of which are checks that are way more important than an ID starting with a number.

I won’t do this for everything (I still intend to use an XML Schema for XHTML validation), but I will do it for the OPF. Most validation problems are there anyway. The OPF is the heart of an epub.
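
To make the two examples from this footnote concrete, here is a minimal sketch of what a hand-written manifest check might look like. Everything in it is hypothetical: the ManifestItem and Problem types and the CheckManifest function are made up for illustration and are not the library’s actual API. It also assumes the OPF has already been parsed and that every href has been resolved to the same normalized path form used for the file names inside the epub.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not the real library's API.
struct ManifestItem {
    std::string id;    // the "id" attribute of an OPF <item>
    std::string href;  // the "href" attribute, assumed already normalized
};

struct Problem {
    enum class Kind { MissingFile, DuplicateHref };
    Kind kind;
    std::string message;  // written for people who have never read the spec
};

// Runs two checks that a schema cannot express:
//   1. every file listed in the manifest actually exists in the epub;
//   2. no file is listed more than once under different IDs.
std::vector<Problem> CheckManifest(const std::vector<ManifestItem>& items,
                                   const std::set<std::string>& files_in_epub)
{
    std::vector<Problem> problems;

    // Remembers the first ID seen for each href.
    std::map<std::string, std::string> first_id_for_href;

    for (const ManifestItem& item : items) {
        if (files_in_epub.count(item.href) == 0) {
            problems.push_back({Problem::Kind::MissingFile,
                "The manifest lists \"" + item.href + "\" (id \"" + item.id +
                "\"), but that file is not in the epub."});
        }

        // insert() leaves the map unchanged if the href is already present.
        auto result = first_id_for_href.insert({item.href, item.id});
        if (!result.second && result.first->second != item.id) {
            problems.push_back({Problem::Kind::DuplicateHref,
                "The file \"" + item.href + "\" is listed twice, with ids \"" +
                result.first->second + "\" and \"" + item.id + "\"."});
        }
    }

    return problems;
}
```

The messages are spelled out in plain sentences on purpose, since the point of writing these validators by hand is that the report should make sense to someone who doesn’t already know what the element is supposed to contain.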