Friday, February 12, 2010

Loading performance and 0.2.0

I’m the first person to acknowledge that loading epub and HTML files in Sigil 0.1.x is slow. Very slow. Abysmally.

Why is that? Well, there are two components in the 0.1.x loading process:

  1. Extracting the epub file, reading the OPF, running Tidy on the source, updating resource references etc. Let’s call this the “file load”.
  2. After the file load creates one large HTML flow, this is then sent to the Book View (integrated QtWebkit) for rendering. Let’s call this the “QtWebkit load”.

You can tell when the file load has finished and the QtWebkit load is starting: the moment you see “File Loaded” in the status bar is the moment all Sigil code stops executing. From then on it’s up to QtWebkit to render the page.[1]

QtWebkit load

QtWebkit load takes longer, by far. Using the Three Men in a Boat epub file[2] from the MobileRead ebook uploads forum as a reference, the whole loading procedure takes 75 seconds in Sigil 0.1.9.[3] Way, way too long.

Of these 75 seconds, only 14.5 are spent in the file load. The rest is all QtWebkit, so not something I can directly influence. The QtWebkit load used to be more than twice as fast in Sigil 0.1.5, but the subsequent versions of Sigil include Qt 4.6 (instead of 4.5), and in that version QtWebkit is much slower. Nokia developers admit they introduced some major performance regressions and are currently working on fixing that.

I have no intention of waiting for that to happen. It was bad before, but now it’s horrible. I considered it far too slow in Qt 4.5, and now?… So how about I come up with a way to work around this problem?

Currently, Sigil takes in all of the XHTML files in an epub, puts them all together and displays them as one. So you get one large “flow” where you can do all the editing. I chose this model because it’s the one used in the popular[4] Book Designer (which doesn’t support epubs).

This is where Sigil 0.2.0 comes in.

Sigil no longer does that. All the original XHTML files are preserved and are edited one by one. Since there is no one huge flow, QtWebkit rendering performance goes up tremendously (since there is less to render).

Now, when Sigil loads your epub file, the first XHTML file by reading order is loaded in the initial tab. Since this first XHTML file is usually a cover page, it takes less than half a second to render. So now instead of roughly 60 seconds for the QtWebkit load, you get 0.3 seconds (for the TMB file).

Great, huh? :)

File load

But then again, there are those 14.5 seconds for the file load. It would be great if I could get that down.

Most of that time is spent doing two things: running Tidy on the large concatenated HTML document, and updating the resource reference paths. Of the two, updating the paths takes much longer.

The resource updating process is necessary since Sigil 0.1.x renames your resource files. Since images, CSS files, HTML files, fonts etc. now have different names, all the HTML tags and style rules referencing them have to be updated. This takes a long time.
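To give a feel for what such an update involves, here is a minimal sketch in Python. It is not Sigil's actual code (Sigil is written in C++), and it only handles double-quoted `src` and `href` attributes, not style rules. The function name `update_references`, the regular expression, and the folder names in `PATH_MAP` are all made up for this example.

```python
import re

# Hypothetical mapping from original resource paths to their new paths.
# The folder and file names are invented for this example.
PATH_MAP = {
    "images/cover.jpg": "../Images/cover.jpg",
    "css/style.css": "../Styles/style.css",
}

# Matches src="..." and href="..." attributes (double-quoted only, for brevity).
ATTR_RE = re.compile(r'\b(src|href)="([^"]*)"')


def update_references(xhtml, path_map):
    """Return xhtml with every known src/href path replaced by its new path."""
    def swap(match):
        attr, old_path = match.group(1), match.group(2)
        new_path = path_map.get(old_path, old_path)  # unknown paths stay as-is
        return f'{attr}="{new_path}"'
    return ATTR_RE.sub(swap, xhtml)


sample = (
    '<link href="css/style.css"/>'
    '<img src="images/cover.jpg"/>'
    '<a href="http://example.com">x</a>'
)
updated = update_references(sample, PATH_MAP)
print(updated)
```

Every attribute in every file has to go through a lookup like this, which is why a book with many images pays for it.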

In Sigil 0.2.0, there are now multiple XHTML files, and they all have to be updated the same way. The original resource filenames are now preserved, but the file structure changes so we still need to update the paths. All the XHTML files have different content, but the file path updates are universal. This means we can now parallelize this:

  1. We create a thread pool with as many threads as there are logical CPUs[5] on the system;
  2. We split the updating process into “tasks”, where each task represents the required update operations to be performed on each XHTML file;
  3. We let the threads munch the tasks as they become ready to process them.

So if you have a dual core system like I do, two different threads execute two tasks at the same time. As they finish the task they have been working on, they arbitrarily pick a new one and work on that. So the more logical CPUs you have, the more threads you can run, the more tasks your computer can work on at the same time, the faster the file loading will be.
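The three steps above can be sketched as follows. This is an illustration in Python using the standard `concurrent.futures` module, not the C++/Qt code Sigil uses. The `update_file` task and the file names are hypothetical stand-ins.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def update_file(item):
    """Stand-in task: update the paths in one XHTML file (hypothetical logic)."""
    name, content = item
    return name, content.replace("images/", "../Images/")


def update_all(files):
    """Run one task per XHTML file on a pool sized to the logical CPU count."""
    workers = os.cpu_count() or 1  # cpu_count() can return None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands tasks to idle threads and yields results in input order.
        return dict(pool.map(update_file, files.items()))


files = {
    "chapter1.xhtml": '<img src="images/a.png"/>',
    "chapter2.xhtml": '<img src="images/b.png"/>',
}
updated = update_all(files)
print(updated)
```

One caveat about the sketch itself: in standard CPython the global interpreter lock keeps pure-Python work like this from truly running in parallel, so it shows the structure of the design rather than the speedup.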

I then plugged the old updating subsystem into this multi-threaded architecture and ran it on my dual core. The file load on TMB dropped to 11 seconds. Not quite the ideal linear behavior, but that’s to be expected since there’s overhead in talking to the threads, managing the task pool etc. And the OS eats your cores too, so your threads can’t stay active all the time. Also, not everything in the file load can be parallelized; lots of things have to stay sequential.
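One rough way to put a number on "not everything can be parallelized" is Amdahl's law. The sketch below is a back-of-the-envelope model, not something measured with a profiler: it assumes the file load splits cleanly into a sequential part and a perfectly parallel part, and it folds the thread overhead into the sequential part. The function names are made up for this example.

```python
def parallel_fraction(t_single, t_multi, cores):
    """Solve Amdahl's law, speedup = 1 / ((1 - p) + p / cores), for p."""
    speedup = t_single / t_multi
    return (1 - 1 / speedup) / (1 - 1 / cores)


def predicted_time(t_single, p, cores):
    """Predicted load time on `cores` cores for a parallel fraction p."""
    return t_single * ((1 - p) + p / cores)


# 14.5 s before, 11 s after, on a dual core (times from the text).
p = parallel_fraction(14.5, 11.0, 2)

# Extrapolation to a quad core under the same model; not a measurement.
quad = predicted_time(14.5, p, 4)

print(f"parallel fraction: {p:.2f}, predicted quad-core load: {quad:.2f} s")
```

Under these assumptions a little under half of the file load was running in parallel at this stage, which fits with the speedup being well short of 2x.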

The other major problem is that TMB has a huge number of images, meaning that many HTML “<img>” elements have to be updated. With a more conservative epub file, the numbers would be even better.

But a 25% improvement on a measly dual core isn’t half bad. It would certainly be faster on a quad. But I can bring that down even further, I know I can.

So I spent about six hours in front of a code profiler and Visual Studio, tracing the bottlenecks and optimizing the “hot” paths. The major bottleneck was—as expected—the large and cumbersome resource updating subsystem. After rewriting it in what must have been ten different ways (each version slightly faster than the previous), I came up with the final design.

For the sake of reference, my profiler says that the old version takes on average 470 milliseconds to run through one XHTML file in TMB. After six hours messing with it, the final version takes 15 milliseconds. That’s 31 times faster.

File load for TMB? It’s now 3.3 seconds. Including the 0.3 for the rendering of the cover page, it’s 3.6.

So from 75 seconds in Sigil 0.1.9 down to 3.6 seconds in the development version of 0.2.0, I think I’ve done a pretty good job improving the loading speed.

For epubs with a “normal” number of images and computers with more logical CPUs, it’s even faster.

epub name                 Time – 0.1.9 (s)[3]   Time – dev0.2.0 (s)[3]
Three Men in a Boat       75                    3.6
Sylvie and Bruno          82.5                  6
Savage Stories of Conan   90.2                  4.5
David Copperfield         98                    3.2

These are all x86 times. For x64, knock off 10%.
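To turn the table into speedup factors, a small Python sketch like this will do. The times are copied from the table above; the variable names are just for this example.

```python
# Load times in seconds, copied from the table: (0.1.9, dev 0.2.0).
times = {
    "Three Men in a Boat": (75.0, 3.6),
    "Sylvie and Bruno": (82.5, 6.0),
    "Savage Stories of Conan": (90.2, 4.5),
    "David Copperfield": (98.0, 3.2),
}

# Speedup factor = old time divided by new time.
speedups = {name: old / new for name, (old, new) in times.items()}

for name, factor in speedups.items():
    print(f"{name}: about {factor:.0f}x faster")
```

So the improvement ranges from roughly 14x to roughly 31x, depending on the book.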

Footnotes

[1]  “Render” means that the colors for the pixels on the screen have to be calculated, i.e. the screen has to be “painted”.

[2] Written by Jerome K. Jerome and painstakingly hand-crafted by MobileRead user zelda_pinwheel. It’s a great book, you should read it. It’s also an amazing epub file; I use it as my main reference during Sigil development.

[3] x86 Windows version of Sigil, on Windows 7 x64. Computer is a Core 2 Duo 6400 with 4GB RAM.

[4] But horrible.

[5] The number of logical CPUs is the number of actual cores on your system plus any “virtual” cores from HyperThreading.

Sunday, February 7, 2010

63 bits plus one

Some of you may have noticed that while you can get precompiled binaries of Sigil for Linux in x86 and x64[1] flavors, you only have an x86 version for Windows. Why? Well, Microsoft did a very good job of ensuring that 32 bit applications still run on 64 bit Windows. So there was no need for x64 Sigil. On Linux, it’s slightly different and nowhere near that easy.[2]

So what are the main benefits (from an application’s point of view) of the newer instruction set/architecture?

  1. The application has access to a larger address space (both physical and virtual),
  2. Registers are 64 bits, not 32; this allows better/faster 64 bit math,
  3. Double the number of general-purpose registers,
  4. Double the number of XMM registers,
  5. SSE and SSE2 instructions can be safely used knowing that all x64 CPUs have to support them.

And several other things. The drawback is that your compiled code is now larger: pointers are all 64 bits, but the caches on the CPUs are the same size. This is a big issue.
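The pointer-size half of that trade-off is easy to see for yourself. Here is a minimal Python sketch (nothing to do with Sigil's code) that reports the pointer size of the interpreter it runs in. Note that this reflects how the Python binary was built, not the operating system: a 32 bit Python on 64 bit Windows will still report 4 bytes.

```python
import struct

# "P" is the struct format code for a native pointer (void *).
pointer_bytes = struct.calcsize("P")

# Memory taken by a table of one million pointers, in bytes.
table_bytes = pointer_bytes * 1_000_000

print(f"{pointer_bytes * 8} bit pointers: "
      f"a million of them take {table_bytes / 1e6:.0f} MB")
```

The same table costs twice as much memory in a 64 bit build, while the CPU cache it has to fit through stays the same size.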

In the end, whether you’ll see direct performance improvements from moving to x64 depends entirely on the application. Some will, but some won’t. The Visual Studio devs have chosen not to make the transition just yet. I know other developers who have stated that their apps also behave worse on x64.

So you have to profile it to know for sure.

I used to think Sigil wouldn’t benefit from x64, performance-wise. But then there was that nagging feeling telling me I should test it and see. The main reason why I didn’t want to do this is because I’d have to set up an entirely new build system, with an entirely new Qt etc., etc. I already have four: Win x86, Lin x86, Lin x64 and Mac Universal. And building Qt (AGAIN) takes about five hours. You basically have to sit in front of the console, watching green text fly by because it likes to flake out in the middle and then you have to start it again. Not my ideal way to spend an afternoon.

But today I came down with a fever so I was too weak to do anything useful anyway. I may as well sit dozing in front of a screen. So I did.

Many hours later, after I got everything compiled and working, I set out to test Sigil 0.1.8 in x86 versus the same in x64. The test would measure the time it takes to load an epub book, from start to finish (this is easily the longest running operation in Sigil 0.1.8). I chose three epub files, did five runs for each on both versions, recorded the times and voilà!

The x64 version was consistently 10% faster.
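For anyone who wants to run a similar comparison, the shape of the test is simple. The Python sketch below is only an illustration of the method (several timed runs, then a percentage comparison), not the way these particular numbers were collected. `fake_load` is a stand-in workload; a real test would load an epub file there.

```python
import statistics
import time


def time_runs(operation, runs=5):
    """Call operation() `runs` times; return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        durations.append(time.perf_counter() - start)
    return durations


def percent_faster(baseline, candidate):
    """How much less time candidate takes than baseline, in percent."""
    return (baseline - candidate) / baseline * 100


def fake_load():
    """Stand-in workload; replace with the real operation being measured."""
    sum(i * i for i in range(100_000))


durations = time_runs(fake_load)
mean_time = statistics.mean(durations)
print(f"mean of {len(durations)} runs: {mean_time:.4f} s")
```

Running the same harness against both builds and feeding the two mean times to `percent_faster` gives the kind of figure quoted above.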

So my assumption was wrong. In light of these results, the next public release of Sigil (which should be 0.2.0) will in all likelihood include an x64 version for Windows.

In other news, I’ll be getting the 2010 version of Visual Studio when it ships in two months, for several reasons. But it will also provide tangible performance improvements for Sigil, since MSVC10 C++ compiler optimizations have improved. That’s another 10%.

And then there’s Sigil 0.2.0 and multithreaded, multi-flow loading. Now if only Nokia could get QtWebKit into respectable shape…

Footnotes

[1] Some say “64 bit”, some “x64” or “x86-64” or “AMD64” or “Intel 64” and they all argue which is the correct one… I don’t care. I’m calling it “x64”, since that’s what people around me seem to be using. You know what I mean: the 64 bit extension to the x86 instruction set that AMD came up with and then Intel licensed.

[2] Also, Linux users tend to yell a lot more when their needs and/or desires are not met. Believe me, you don’t want to know.