eBook File Formats

This is the fourth in a series of essays on electronic publishing. Previous essays can be found by following the links below.

Essay 1
Essay 2
Essay 3

As the electronic publishing world expands, the number of file formats grows, too. We’ll look at three major file formats: PDF, ePub, and .mobi. While I would like to discuss the Kindle format, it is proprietary and I don’t know very much about it. That doesn’t mean that we’ll ignore the Kindle, though, because the Kindle can read .mobi format files.

Let’s start with the original digital document, the PDF file. Adobe created the pdf, which stands for portable document format. The history of pdf goes all the way back to the early nineties as an attempt at a paperless office. We all know how well that worked out. Still, pdf has several advantages and it’s the right solution for some applications.

The major advantage of pdf is that it reproduces documents exactly as they might appear on a printed page. This includes text and images, the latter being much more challenging in other electronic file formats.

Many short fiction markets regularly use pdf to produce their zines. In some cases the pdf file is the only version that is produced. But the strength of pdf is also one of its drawbacks.

The pdf file is a WYSIWYG file. For the younger reader, that means what you see is what you get. If you want the text larger, you zoom the entire pdf page. The reader then has to scroll left and right to see an entire sentence, which makes pdf a less than stellar choice for ebook readers.

The Kindle format being proprietary forced the creation of another, more standard file format that would meet the needs of the eBook user and be transportable from device to device. The internet had already evolved a file format that works very well for resizing text and self-adapting line length: html. The epub file is similar, but it uses a stricter and less flexible version of html called xhtml. The main difference is that a lot of shortcuts you can get away with in html cause errors in xhtml. On the other hand, the typical epub file doesn’t need much in the way of fancy coding.
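To make the html versus xhtml difference concrete, here is a minimal sketch in Python. It assumes that "stricter" can be approximated by asking whether the markup parses as XML, which xhtml has to do. The helper name is_well_formed and both sample strings are my own illustrations, and a real epub validator checks a good deal more than this.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, which xhtml must do."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Shortcuts a web browser tolerates in html: <br> and <p> are never closed.
loose_html = "<body><p>First line<br>Second line</body>"

# The same content with every tag closed, as xhtml requires.
strict_xhtml = "<body><p>First line<br />Second line</p></body>"

print(is_well_formed(loose_html))    # False
print(is_well_formed(strict_xhtml))  # True
```

A browser would happily display both strings, which is exactly the kind of shortcut that stops working once the file has to be xhtml.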

The epub file is actually a collection of compressed files. Different aspects of display are handled by different files. For instance, the table of contents has its own file with hyperlinks to content files. Lines in the file have specific formatting requirements. They are not difficult, but they aren’t always intuitive, either.
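You can see the "collection of compressed files" for yourself, because an epub is a zip archive. The sketch below builds a tiny stand-in archive in memory and lists what is inside. The names under OEBPS and the one-tag file contents are placeholders I made up for illustration, so the result is not a valid, readable book. The mimetype entry and META-INF/container.xml are the two fixed names a real epub carries.

```python
import io
import zipfile


def list_epub_contents(epub_file) -> list[str]:
    """Return the names of the files packed inside an epub (a zip archive)."""
    with zipfile.ZipFile(epub_file) as archive:
        return archive.namelist()


def build_demo_archive() -> io.BytesIO:
    """Build an in-memory stand-in for an epub. Contents are placeholders."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry comes first and is stored uncompressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        # Everything else may be compressed.
        for name, body in [
            ("META-INF/container.xml", "<container/>"),  # points at the book's files
            ("OEBPS/toc.ncx", "<ncx/>"),                 # table of contents (hypothetical name)
            ("OEBPS/chapter1.xhtml", "<html/>"),         # one content file (hypothetical name)
        ]:
            archive.writestr(name, body, compress_type=zipfile.ZIP_DEFLATED)
    buffer.seek(0)
    return buffer


for name in list_epub_contents(build_demo_archive()):
    print(name)
```

Pointing list_epub_contents at the path of an epub on your own disk should show the same general layout, with the table of contents in one file and the chapters in others.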

The major benefit of epub is transportability. Epub files can be read on many different brands of ebook readers, and the individual files inside can even be opened in a web browser. Text can be scaled and lines will wrap on the screen. The drawback comes with printing. Since the document can look different depending on which reader is displaying it and who is using it, there isn’t a single “right way” the page is supposed to look. But then, how many people print out their ebooks? This can also cause page breaks to fall at weird places on the screen, indicated only by the page number appearing in unexpected places. This is not hard to get used to.

The third file we’ll discuss is the mobi file. It’s difficult for the typical author to understand much about these files because they are formatted in a way that can’t be read by people. This is a drawback because the file must be generated by converting from another format (epub works well). The main reason you might use this format is to provide your own material to owners of Kindle readers. Since the Kindle format is proprietary, we can’t produce those files without going through Amazon. Mobi files allow us access to these ebook readers without involving Amazon.
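As a rough sketch of the epub-to-mobi conversion step, the Python below hands the job to ebook-convert, the command-line converter that ships with Calibre. Using Calibre is my assumption here, not the only route, and the file name my-book.epub is a placeholder. The script only runs the conversion if the tool is actually installed.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(epub_path: str) -> list[str]:
    """Build the ebook-convert command that turns an epub into a mobi."""
    source = Path(epub_path)
    target = source.with_suffix(".mobi")  # same name, new extension
    return ["ebook-convert", str(source), str(target)]


def convert_to_mobi(epub_path: str) -> bool:
    """Run the conversion. Returns False if ebook-convert is not installed."""
    if shutil.which("ebook-convert") is None:
        return False
    subprocess.run(build_convert_command(epub_path), check=True)
    return True


# Show the command without running it; my-book.epub is a placeholder name.
print(build_convert_command("my-book.epub"))
```

Keeping the command builder separate from the function that runs it lets you check what will be executed before any file is touched.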

I go into a lot more detail, including how to construct these three file types and make your own ebooks, in my free pdf primer, eBooks for the 21st-Century Author.
