Friday, October 8, 2010

Sigil 0.3.0RC2

Well that was fast.

A file-corruption-on-save issue was detected in RC1 after it was published. It was taken down until the problem was resolved, and now it has been.

RC2 is now available for download.

Sigil 0.3.0RC1

I’ve just released Sigil 0.3.0RC1. Please note that this release includes some major changes under the hood, and as such may not be very stable. From the tests I’ve run, it works great, and any problems I’ve found I’ve quickly fixed. But since this release brings a new version of Qt and completely replaces the internal XML DOM provider (QDom is out, Xerces is in), there are bound to be some regressions I’ve missed. Bear that in mind.

Here’s the changelog:

  • fixed a validation issue caused by using the American spelling for "acknowledgments" where the OPF spec uses the British "acknowledgements" (issue #611)
  • Sigil now uses "application/x-font-ttf" as the media type in the OPF for TrueType fonts (issue #609)
  • on Mac OS X, the universal build now includes an x64 version of Sigil, and builds now use Cocoa instead of Carbon; support for Mac OS X 10.4 is dropped along with support for PowerPC Macs
  • fixed a problem with opening files from the Ubuntu "Open With" menu (issue #524)
  • made Tidy handle common user errors like writing "&co." in the HTML source instead of "&amp;co."
  • fixed a rare Tidy bug with disappearing spaces when the only whitespace in an element was the newline following a start tag (issue #387)
  • changed the internal DOM engine from Qt's QDom to Xerces; this should also bring numerous bug fixes and performance improvements plus a small (~10%) decrease in memory consumption (issue #367)
  • updated Qt to 4.7: this should bring a 400%+ performance increase in Book View rendering along with countless smaller performance improvements and bug fixes across the board
  • switched to the MSVC 10 compiler for Windows releases; should bring ~5% general performance improvement
  • fixed several crash/error problems relating to opening, saving and modifying epub files which have onerous file permissions set for internal content files (issue #574)
  • added a workaround for broken epubs created by other epub-producing software which caused a crash on certain searches with the Find dialog (issue #548)
  • fixed a problem with Book Browser's "Merge with previous" action if a file was previously deleted from the Book Browser (issue #565)
  • fixed a problem with chapter splits being placed in the wrong reading order if a file was previously deleted from the Book Browser (issue #497)
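The Tidy fix for stray ampersands comes down to escaping any "&" that doesn't start a character or entity reference. Below is a rough Python sketch of that idea. It is not Tidy's actual code (Tidy is written in C and its rules are more involved), and the helper name and regex are hypothetical. It only checks the *shape* of a reference, so something like "&foo;" is left alone even though no such entity exists, and it would also touch ampersands inside scripts or CDATA sections.

```python
import re

# Matches an '&' that does NOT begin a named (&amp;), decimal (&#38;)
# or hexadecimal (&#x26;) reference. Only the syntax is checked, not
# whether a named entity actually exists.
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(html: str) -> str:
    """Hypothetical helper: turn stray '&' characters into '&amp;'."""
    return BARE_AMPERSAND.sub("&amp;", html)


print(escape_bare_ampersands("Smith &co. &amp; sons &#169; &#xA9;"))
# Smith &amp;co. &amp; sons &#169; &#xA9;
```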

Performance improvements all around, plus some major fixes. Mac and Linux users would sometimes get a “permission denied” error when opening an epub with Sigil; this was caused by badly constructed epub files. A workaround has been implemented so you should not see this anymore.

Mac users also get an x64 build in the universal binary, and this should bring another 10% performance improvement to people who have a Mac that can support this architecture. Support for Tiger is dropped along with support for PowerPC Macs.

Linux users get some custom love too: you should now be able to use Ubuntu’s “Open With” menu with Sigil.

If you see any regressions, please report them ASAP. Sigil 0.2.4 stays the “official” version until any major problems with 0.3.0 are resolved.

Saturday, October 2, 2010

Introducing FlightCrew, the epub validator

I’ve been talking about this for a while under the name of “that epub validating library”, and now it has a name. That name is FlightCrew.

It’s a C++, cross-platform, native code epub validator (it’s also open source). The project is composed of three parts:

  1. FlightCrew, the validation library;
  2. FlightCrew-cli, the command-line front-end to the FlightCrew library;
  3. FlightCrew-gui, the GUI front-end to the FlightCrew library.

There are installers and DMGs for download that package FlightCrew-gui, which provides a nice graphical interface to the underlying library. Errors have a reddish background (ok, it’s pink), while warnings have a yellow one. Here’s a screenshot:

I’ve kept the interface to a minimum on purpose. There’s something to be said for simplicity. As Antoine de Saint-Exupéry said: “Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.”

You can also drag files from your desktop or file browser and drop them on the FlightCrew-gui window. This will instantly run validation checks on them. This drag-and-drop interface works on all platforms.

FlightCrew-cli is included in all packages of FlightCrew-gui, and does the same thing as the GUI, only from a command-line interface. It works the way epubcheck works: feed it a file, and it spews out warnings and errors if necessary.

The current version number for all this is 0.7.0, which I’m using as a rough indication of its completeness. I started working on this back in July, but since there was a rather lovely summer between then and now, it has only had about a month and a half’s work put into it. It’s still roughly 20 KLOC with a complete test suite, so it’s no slouch.

Why FlightCrew is better than epubcheck

First off, “better” is a dirty word. Each tool has its pros and cons. Epubcheck’s (EC) advantage is that it checks for a few things FlightCrew (FC) doesn’t (yet). But the reverse is also true: FC checks for a lot of things EC doesn’t. Off the top of my head, FC performs an extensive reachability analysis and will warn you if you have resources listed in the manifest that are not used anywhere. It will also report an error if you have an OPS[1] document that the user can reach (through the <guide> or <tours> element, the NCX or just normal links in the text) but that is not listed in the <spine>. This is one crucial mistake that can now be caught. Reachability analysis also catches files that are used but not present in the manifest.
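To make the reachability idea concrete, here is a small Python sketch of how such an analysis can work: a breadth-first search over the epub's reference graph, starting from the spine and any other entry points (the NCX, <guide> and <tours> targets), followed by a few set differences. This is not FlightCrew's code (FlightCrew is C++). The function name, the input format and the rule of spotting OPS documents by file extension are all assumptions made for illustration; a real validator would build `links` by parsing the OPF, NCX, XHTML and CSS files, and would use manifest media types instead of extensions.

```python
from collections import deque

# Assumption for this sketch: OPS (XHTML) documents are recognized by extension.
OPS_EXTENSIONS = (".xhtml", ".html", ".htm")


def analyze_reachability(manifest, spine, extra_roots, links):
    """Hypothetical helper: find unused, unlisted and off-spine files."""
    manifest, spine = set(manifest), set(spine)

    # Breadth-first search from every place a reader can start:
    # the spine plus extra roots such as the NCX or <guide>/<tours> targets.
    reachable = set()
    queue = deque(spine | set(extra_roots))
    while queue:
        path = queue.popleft()
        if path in reachable:
            continue
        reachable.add(path)
        queue.extend(links.get(path, ()))

    return {
        # Listed in the manifest but never used anywhere (a warning).
        "unused": sorted(manifest - reachable),
        # Used somewhere but missing from the manifest (an error).
        "not_in_manifest": sorted(reachable - manifest),
        # OPS documents a reader can reach that are not in the spine (an error).
        "not_in_spine": sorted(
            p for p in reachable - spine if p.lower().endswith(OPS_EXTENSIONS)
        ),
    }


report = analyze_reachability(
    manifest={"ch1.xhtml", "ch2.xhtml", "notes.xhtml", "toc.ncx", "style.css", "cover.jpg"},
    spine=["ch1.xhtml", "ch2.xhtml"],
    extra_roots=["toc.ncx"],
    links={
        "toc.ncx": {"ch1.xhtml", "ch2.xhtml"},
        "ch1.xhtml": {"style.css", "notes.xhtml"},
        "ch2.xhtml": {"style.css", "figure.png"},
    },
)
print(report)
# {'unused': ['cover.jpg'], 'not_in_manifest': ['figure.png'], 'not_in_spine': ['notes.xhtml']}
```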

There are many other things that FC will check for that EC will not, and most of those you care about deeply and just don’t know it. The things EC checks for that FC doesn’t? Two “big ones”: OPF-listed fallbacks and DTBook syntax verification. If you haven’t heard of either, then you’ve never used them, never will and probably shouldn’t. These are very rarely used features of epub that I have personally never seen used in practice. But they’re big parts of the epub specifications so FC should check for them (and will, fairly soon) for the sake of completeness. There are a few other odds and ends that EC looks for but FC doesn’t.

But here’s where FC blows EC out of the water…

Error reporting done right

Let’s pretend I don’t know most of the epub specs by heart and that I’m a newcomer to epub. I’ve made my first epub book, and I’ve heard that I should validate it. I’ve downloaded both FC and EC and now I’m going to use both. I’m going to use EC first because I’ve heard it’s “what the pros use”. Note that each pair of EC/FC examples refers to the exact same problem in the file, and that the messages usually come with line numbers (unless otherwise specified), which have been omitted here. Commentary has been added for the sake of ridicule.

EC: length of first filename in archive must be 8, but was 19 [no line number, ed.]

Um… what? WTF is that supposed to mean? What filename? And why must it be exactly 8? What the hell are you talking about?

FC: Bytes 30-60 of your epub file are invalid. This means that one or more of the following rules are not satisfied:
  1. There needs to be a "mimetype" file in the root folder.
  2. Its content needs to be *exactly* "application/epub+zip".
  3. It needs to be the first file in the epub zip archive.
  4. It needs to be uncompressed. [no line number, ed.]

Ah… not only does this point out the problem (correctly!), it also tells me how to fix it. Nice.
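For the curious, EC's message refers to the zip format: every file in a zip archive is preceded by a 30-byte local file header, and the two-byte field at offset 26 holds the length of the file's name. "mimetype" is 8 characters long, so "must be 8, but was 19" means the first file in the archive had a 19-character name, i.e. it wasn't the mimetype file. A rough Python sketch that checks FC's rules by reading that first header directly might look like this. It is not FC's implementation; it assumes the sizes in the local header are filled in (no trailing data descriptor), and `check_mimetype_header` and `build_epub` are hypothetical names.

```python
import io
import struct
import zipfile

EXPECTED_NAME = b"mimetype"
EXPECTED_CONTENT = b"application/epub+zip"


def check_mimetype_header(data):
    """Check the first zip local file header of an epub (hypothetical helper)."""
    if len(data) < 30 or data[:4] != b"PK\x03\x04":
        return ["file does not start with a zip local file header"]

    # Fixed part of the local file header (all fields little-endian):
    # offset 8 = compression method, 18 = compressed size,
    # 26 = file name length, 28 = extra field length.
    (method,) = struct.unpack_from("<H", data, 8)
    (size,) = struct.unpack_from("<I", data, 18)
    name_len, extra_len = struct.unpack_from("<HH", data, 26)

    name = data[30:30 + name_len]
    start = 30 + name_len + extra_len  # file data follows name + extra field
    content = data[start:start + size]

    problems = []
    if name != EXPECTED_NAME:
        shown = name.decode("utf-8", "replace")
        problems.append(f"first file is '{shown}', expected 'mimetype'")
    if method != 0:  # 0 = stored (uncompressed)
        problems.append("mimetype must be stored uncompressed")
    if content != EXPECTED_CONTENT:
        problems.append("mimetype content must be exactly 'application/epub+zip'")
    return problems


def build_epub(first_name, first_data, compress_type=zipfile.ZIP_STORED):
    """Build a tiny in-memory zip whose first entry is (first_name, first_data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(first_name, first_data, compress_type=compress_type)
        zf.writestr("META-INF/container.xml", "<container/>")
    return buf.getvalue()


print(check_mimetype_header(build_epub("mimetype", "application/epub+zip")))
# []
print(check_mimetype_header(build_epub("OEBPS/Text/ch1.html", "<html/>")))
```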

EC: required attributes missing

Huh? I understand that you’re trying to tell me that some required attributes are missing (one or more? you haven’t said), but how about telling me which ones, you frigging bastard? Am I supposed to read through the entire XHTML specification, hunting down which attributes this element should declare? Am I even supposed to know that I have to do exactly that?

FC: missing required attribute 'alt'

Thanks! That was awesome. Saved me a ton o’ hassle.
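Giving a specific message is mostly a matter of knowing which rule failed and saying so. As a toy illustration in Python (not how FC works internally; the rule table and function name are hypothetical and cover only `<img>`, whose `src` and `alt` attributes are required in XHTML 1.1), a check that names the missing attribute could look like this:

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical, deliberately tiny rule table: element -> required attributes.
REQUIRED_ATTRIBUTES = {"img": ("src", "alt")}


def missing_attribute_messages(xhtml_text):
    """Return one message per missing required attribute, naming it."""
    root = ET.fromstring(xhtml_text)
    messages = []
    for tag, required in REQUIRED_ATTRIBUTES.items():
        for element in root.iter(XHTML + tag):
            for attr in required:
                if attr not in element.attrib:
                    messages.append(f"<{tag}>: missing required attribute '{attr}'")
    return messages


doc = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Test</title></head>
  <body><p><img src="cover.jpg"/> <img src="map.png" alt="Map"/></p></body>
</html>"""
print(missing_attribute_messages(doc))
# ["<img>: missing required attribute 'alt'"]
```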

EC: unfinished element

Exceedingly useful, that. Mind telling me how it’s unfinished?

FC: The <title> element is missing.

Now that’s more like it. I’d kiss you if I could.

EC: unfinished element

Didn’t you just say this? And on the exact same element? Why am I getting this again, I thought I fixed this…

FC: The <identifier> element is missing.

You just keep getting better and better!

EC: unfinished element

Fuck you, epubcheck. Fuck you.

FC: The <language> element is missing.

Want to pet my cat? ’Cause you’re awesome, and I only let awesome things pet my cat.

Back to reality. I hope you got the point, ’cause if you haven’t, I can pull out tons of other examples.
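The three “unfinished element” complaints above were all about the same thing: the OPF’s <metadata> element lacking one of the Dublin Core elements that OPF 2.0 requires (dc:title, dc:identifier and dc:language). A check that looks for all three at once and names each missing one avoids the fix-one, rerun, get-the-same-message loop. A minimal Python sketch, assuming a well-formed OPF 2.0 package document (the function name is hypothetical, and it checks nothing else about the metadata):

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"
DC = "{http://purl.org/dc/elements/1.1/}"
REQUIRED_DC_ELEMENTS = ("title", "identifier", "language")


def missing_metadata_messages(opf_text):
    """Report every missing required Dublin Core element in one pass."""
    root = ET.fromstring(opf_text)
    metadata = root.find(OPF + "metadata")
    if metadata is None:
        return ["The <metadata> element is missing."]
    # './/' searches all descendants of <metadata>, not only direct children.
    return [
        f"The <{name}> element is missing."
        for name in REQUIRED_DC_ELEMENTS
        if metadata.find(".//" + DC + name) is None
    ]


opf = """<package xmlns="http://www.idpf.org/2007/opf"
         xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <metadata><dc:title>My First Epub</dc:title></metadata>
</package>"""
print(missing_metadata_messages(opf))
# ['The <identifier> element is missing.', 'The <language> element is missing.']
```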

Now, I know that most of the error messages from EC actually come from an internal component called Jing, which has crappy error messages, but as a user I don’t care. Adobe should use something better instead, or fix Jing. And lots of the crappy messages come from the EC core; that little “length of first filename” gem was all theirs.

Have I also mentioned that EC development is pretty much dead? Judging by its public source repository, it has had a whopping one source code commit in the last ten months, and that commit was four days ago.

In short, use FC first, then EC to get some of the checks FC doesn’t (yet) perform. After FC becomes a strict superset of all EC functionality (roughly a couple of months), drag epubcheck down to the cellar and shoot it in the back of the head.

Or just stop using it, if you prefer.

Footnotes

[1] A funny way of saying HTML document. It’s more than that, of course, but for now mentally replace “OPS” with “HTML”.

Friday, October 1, 2010

Mac OS X 10.4 support is going the way of the dodo

Just a quick note to people who may still be using Sigil on Tiger: 0.2.4 was the last version to feature support for that operating system. Why? Many reasons. Let’s start with a few:

  • Sigil 0.3.0 will use Qt 4.7, and Nokia is slowly discontinuing support for Carbon with this release. It’s still there, but it’s second-tier.
  • Until now, Sigil has been offered in 32-bit versions for PPC and Intel architectures on Mac (as a universal binary). I want to provide 64-bit versions, and for that I need to use the Cocoa version of Qt, which doesn’t exist for Tiger.
  • From site statistics, about 4.5% of all Mac users of Sigil use Tiger, while 73% use Snow Leopard. Those Snow Leopard users would like explicit 64-bit support, and could certainly put it to good use: as previously reported, Sigil runs about 10% faster on such architectures.

There are a few other reasons.

In short, this is the right way to go. I know some people will be seriously pissed off at this, but I have to think about the majority of the users. Sigil 0.2.4 will be indefinitely available for download on the project site (as with all versions of Sigil), so you can always just continue using that.