Sunday, July 25, 2010

Vacation time over, get back to work!

So I just returned from vacation.

I didn’t write a single line of code for the never-ending doxygen comment conversion, but I did end up working quite a bit on the epub validating library. It’s progressing quite nicely and is already a couple of KLOC. I’m also doing it TDD-style, which I’m finding to be a rather pleasant workflow once you get used to it. Seeing a test fail and catch a regression I wouldn’t have noticed until it was too late brings a smile to my face. On a related note, gtest and gmock are awesome.

Oh, remember when I said that writing XML validation checkers by hand instead of using schemas would be painful? Guess what, it is. It really, really is. :)

One of the other benefits of working on this library is that I’m solving all sorts of problems with Xerces integration. The library uses Xerces for everything XML, and working with it has given me lots of insights that will be applicable to Sigil when I start replacing QDom with it.

I’ll be mostly working on this library over the coming weeks, so Sigil will see only bug fixes going forward. Hey, the library is the major future feature for Sigil since it will be integrated into it, so transitively I’ll still be working on Sigil all this time.

Qt 4.7 is just around the corner too[1]. When it does arrive, you can expect a version of Sigil integrating it very quickly, provided there are no problems migrating to the newer Qt[2]. The QtWebKit improvements alone make me giddy like a little schoolgirl. 4x faster rendering? Hell yeah!


[1] Actually I expected it to be released while I was on vacation. Or at least QtWebKit 2.0. Nokia devs said that was coming in May, and it’s the very end of July now… grumble

[2] I seriously expect no problems. Nokia has always been adamant about backwards compatibility, and from past experience I can say they usually do a good job on this.

Thursday, July 8, 2010

Vacation and validation

The semester is finally over, so I’ll be heading on vacation tomorrow. I’ll be gone for about two weeks, and in that time no issues will be examined, no code will be written, etc. Well, not really. In general, I won’t have internet access, but I may pop in from time to time to check the Sigil forum on MobileRead. No promises though.

While I’ll be spending most of my time lying on the beach, soaking up the sun with a comfortable book[1] by my side, I’ll also be doing some light development work. By “light” I mean finishing the doxygen comments conversion for Sigil (which has been taking forever) and some code for the new epub validation library I’m writing. Not a lot of code, just a bit.

What’s that last part about? Well, what’s the most requested feature for Sigil on the tracker? It’s integrated epub validation. I’ve planned for months to start working on this as a separate project in August, and I’ve just started doing some architectural design for the library[2]. The dependencies are in place, the build system works across platforms and I have a pretty good idea of how things are going to mesh together.

The project will in fact be composed of the following:

  • the main epub validating library, written in portable C++;
  • a CLI application that uses that library (think epubcheck);
  • a GUI application with pretty buttons and drag-and-drop support.

I know a ridiculous number of people who don’t validate their epub books just because they are too scared of the command line to use epubcheck. While I think that’s absurd and childish, I can’t deny that having a simple GUI would be easier for the average user. And if the GUI supported dragging an epub file from the desktop and dropping it on the app window to initiate the validation, well, that would be very useful. It could certainly speed things up.

The goals of the project are as follows:

  • check for everything epubcheck checks for, and much more;
  • be easily embeddable into native code applications (no frigging Java);
  • be easy to use and easy to understand: “unfinished element” means diddly-squat to people who don’t already know what the element is supposed to contain[3];
  • have developers that are active and responsive (and an active development process in general);
  • optionally warn about valid epub constructs that cause problems for certain high-profile Reading Systems.

That last part is interesting. We all know ADE has many quirks, and so do the iPad and other RSs. Wouldn’t it be lovely if you could instruct your epub validator to check for at least some of those problems as well?

I think that would be fairly useful.

This library will then be embedded into Sigil, and a simple toolbar button press will validate your epub and report any problems.

Before anyone gets ahead of themselves, bear in mind I won’t hit all those goals for the first release. It will be incremental and it will get better in time. But the scope of this project is thankfully much smaller than Sigil’s, so that’s a big relief.


[1] Currently it’s The Stand by Stephen King. That book has like a billion pages. It’s probably not helping matters that I’m reading the extended version.

[2] A software library, not a building. There was a misunderstanding about this a couple of days ago, and needless to say I laughed my ass off.

[3] This will entail writing some of the internal validators by hand and not resorting to schemas. Painful, yes, but necessary. Not just because of usability, but correctness too. For instance, a schema can’t check that the files listed in the manifest are actually in the epub. It won’t tell you if you’ve included the same file multiple times with different IDs. There are many examples, most of which are checks that are way more important than an ID starting with a number.

I won’t do this for everything (I still intend to use an XML Schema for XHTML validation), but I will do it for the OPF. Most validation problems are there anyway. The OPF is the heart of an epub.
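
To make the manifest examples a bit more concrete, here is a rough sketch of what such a hand-written check could look like. This is not the library’s actual code: `ManifestItem`, `Problem` and `check_manifest` are names made up for illustration, the manifest is assumed to be already parsed into (id, href) pairs, and hrefs are compared as plain strings with no path normalization.

```cpp
#include <cassert>   // assert, handy for quick checks when trying this out
#include <map>
#include <set>
#include <string>
#include <vector>

// One <item> from the OPF manifest. Hypothetical type, for illustration only.
struct ManifestItem {
    std::string id;
    std::string href;
};

// A single finding, with a message meant to be readable by non-experts.
struct Problem {
    enum class Kind { MissingFile, DuplicateHref };
    Kind kind;
    std::string message;
};

// Runs two checks a schema cannot express:
//   1. every href in the manifest names a file that is really in the epub;
//   2. no file is listed more than once under different IDs.
// Assumption: hrefs and archive paths are already in the same form.
std::vector<Problem> check_manifest(const std::vector<ManifestItem>& items,
                                    const std::set<std::string>& files_in_epub) {
    std::vector<Problem> problems;
    std::map<std::string, std::string> first_id_for_href;

    for (const ManifestItem& item : items) {
        if (files_in_epub.count(item.href) == 0) {
            problems.push_back({Problem::Kind::MissingFile,
                "The manifest lists \"" + item.href + "\" (id \"" + item.id +
                "\"), but that file is not in the epub."});
        }

        // emplace leaves the map unchanged if the href was already seen.
        auto result = first_id_for_href.emplace(item.href, item.id);
        if (!result.second && result.first->second != item.id) {
            problems.push_back({Problem::Kind::DuplicateHref,
                "The file \"" + item.href + "\" is listed more than once: as \"" +
                result.first->second + "\" and again as \"" + item.id + "\"."});
        }
    }
    return problems;
}
```

Feeding it a manifest that lists `ch1.xhtml` twice under different IDs, plus a `cover.jpg` that isn’t in the archive, should give back one `DuplicateHref` and one `MissingFile` problem, each with a message that names the offending file.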