Distribution Everywhere

Hugh McGuire; Andrew Savikas



Andrew Savikas is the CEO of Safari, was formerly VP Digital Initiatives at O’Reilly Media, and is on the Board of Directors of the Book Industry Study Group (BISG). You can find Andrew on Twitter at: @andrewsavikas.

Distributing (in) the Future

Publishers working to understand the changing distribution landscape must formulate a strategy that both responds to the present reality and provides the flexibility to adapt to multiple possible futures. There are two fundamental aspects to the changing nature of publishing: changes to the form of what’s created and consumed, and changes to the format in which content still created in traditional forms is produced and distributed (developed, packaged, and sold). This chapter focuses on the format side of the equation, so let’s first define the form component so we can put it to the side.

By form I mean the character of the content—what length is it, what style is it presented in, and does it use text, animation, video, or some combination of the three? Examples of content forms are articles, movies, plays, games, songs, essays, and of course books. Kevin Kelly defines the form of the book nicely^[1]: “A book is a self-contained story, argument or body of knowledge that takes more than an hour to read. A book is complete in the sense that it contains its own beginning, middle, and end.” Nothing about the physical (or virtual) container for that book, its price, or how it is found by a reader is inherent in that definition—all of those are aspects of the book’s format.

There are many examples of disruption for certain types of books that have been replaced by better forms for doing the same job the book used to do. Atlases, dictionaries, and telephone books will never again be the dominant way that people understand geography, word meaning, or contact information, no matter how efficiently those books can be produced or sold. (Phone books are delivered to your door for free, yet many people now find them nothing more than a nuisance, and have agitated for ways to opt out of delivery^[2].)

Although it is important to ask whether there is an alternate form for doing the same job that a particular book does today, this chapter is about those types of books that will remain viable as self-contained stories, arguments, or bodies of knowledge—and what the profound changes to the format in which those books are created, packaged, distributed, priced, and sold mean for publishers.

The format in which demand for books is met includes things like business model, price, supply chain, and distribution mechanism. Form disruption can erase demand for particular books; format disruption means radical changes in how continued demand for the form is met. (For an example of a form change, think newsreel; for format change, think Netflix.)

Aggregation

For many in publishing, “aggregation” is a dirty word, typically tossed around in conversations that include discussions of Google, especially if the conversation involves newspapers. But aggregation is at the heart of every effective distribution system—aggregators moderate and simplify a complex network of many-to-many relationships^[3], aggregating buyers and sellers at a scale neither side could achieve on its own, creating value for both sides. Supermarkets are aggregators, shopping malls are aggregators, eBay is an aggregator, Craigslist is an aggregator, and cable operators are aggregators. Jonathan A. Knee wrote about the role of aggregators in the media business in a recent issue of The Atlantic^[4]:

In fact, the dirty little secret of the media industry is that content aggregators, not content creators, have long been the overwhelming source of value creation. Well before Netflix was founded in 1997, cable channels that did little more than aggregate old movies, cartoons, or television shows boasted profit margins many times greater than those of the movie studios that had produced the creative content. It is no coincidence that although, say, 90 percent of the public discourse surrounding Comcast’s recent $30 billion acquisition of NBC Universal involved the Conan O’Brien drama or the shifting fortunes of Universal Pictures, in reality, 82 percent of the new company’s profits come in through the cable channels.

The economic structure of the media business is not fundamentally different from that of business in general. The most-prevalent sources of industrial strength are the mutually reinforcing competitive advantages of scale and customer captivity. Content creation simply does not lend itself to either, while aggregation is amenable to both.

That aggregators can capture so much value in the ecosystem reinforces the notion that the value is as much (or more so) in the service provided by the aggregator as in the content itself, something I wrote about in 2009^[5]:

Whether they realize it or not, media companies are in the service business, not the content business. Look at iTunes: if people paid for content, then it would follow that better content would cost more money. But every song costs the same. Why would people pay the same price for goods of (often vastly) different quality? Because they’re not paying for the goods; they’re paying Apple for the service of providing a selection of convenient options easy to pay for and easy to download.

Aggregators provide clear value to buyers and sellers, and while it is possible to build a large and profitable business serving customers directly (either through a direct sales channel or through vertical integration), most publishers depend on aggregators to connect their content with a large enough audience.

But of course aggregators are not merely benevolent actors in the market, and both buyers and sellers understand the importance of ensuring that no single aggregator develops too much control over either side (or especially both sides!) of the transaction. For publishers, that debate was long about whether superstores like B&N or Borders (and later Amazon.com) were gaining too much control. Managing aggregators requires a delicate balance, as bigger aggregators can deliver a larger audience but typically use those same economies of scale to extract more favorable pricing terms. Ask anyone who sells to Walmart.^[6]

Ideally, a content seller will seek to cultivate a rich ecosystem of multiple aggregation points, offering a diversity of services and options to customers. But each unique destination point introduces friction into the supply chain needed to service it. Reducing friction is the reason organizations like the Book Industry Study Group^[7] (BISG) exist—to develop and maintain standards and practices^[8] that reduce those friction points. But the “sameness” of an efficient ecosystem of aggregators leaves suppliers and aggregators alike open to another threat: commoditization and its accompanying downward pricing pressure.

With sufficient choice for customers among substantially similar products, the aggregator with the biggest economies of scale can make up for a reduction in margin from lower prices on individual transactions by selling more widgets. Those economies of scale also offer the largest players the ability to use that volume to spread out the high fixed-cost investment in more efficient operational capabilities and things like algorithmic recommendation engines, reinforcing the quality of the “service” component of what customers pay for.

Injecting variety into the formats available—different business models^[9], different discovery mechanisms^[10], and different packaging options^[11], for example—provides a hedge against consolidation and commoditization, as does any way of differentiating a more profitable sales channel that cannot easily be matched by other aggregators. The downside is that each variation in format typically introduces that channel friction. Anyone who has dealt with the metadata used by today’s burgeoning ebook reseller landscape can attest to the impact of friction.

But how do you efficiently and effectively develop and cultivate the kind of rich distribution ecosystem needed to defend against any individual aggregator gaining too much control, while minimizing the cost of format friction? The rest of this chapter describes how O’Reilly Media has done it in the context of the digital disruption of the last decade, and many of the techniques described can be applied within your own business.

Digital History at O’Reilly

In the late 1990s, readers of computer books began expecting their books to behave like much of the other technical information and documentation they used to do their jobs. That is, they expected their books to be digital, searchable, and hyperlinked, like the Web. In response, O’Reilly launched a series of products called “CD Bookshelves.” These were CD-ROMs packaged in a box sized to fit on a bookstore shelf^[12] that contained HTML versions of multiple books on a related topic. Similar to the Encarta model of encyclopedia distribution, CD Bookshelves put an electronic version of a printed text onto a reader’s PC. It was O’Reilly’s first real foray into “digital distribution.”

At the same time, companies like Microsoft (with their nascent MSN service) and AOL were rushing to fill their walled-garden services with content that would attract subscribers, and several began reaching out to publishers to license book content. Most proposals offered a small percentage in licensing fees, not unlike the translation rights business publishers are familiar with. But if the Web was going to disrupt the book business the way it was disrupting so many other industries (travel, insurance, and investing to name a few), then digital distribution and consumption would inevitably become not just an ancillary channel, but the primary means of distribution and consumption. And single-digit licensing percentages would devastate any publisher’s business model.

In 2001, O’Reilly partnered with Pearson Technology Group to build Safari Books Online,^[13] a joint venture for distributing a library of computer books delivered on the Web for a monthly subscription. The pricing model and sales strategy developed for Safari Books Online were radically different from what its founders were familiar with, which had the side benefit of not competing directly with the existing print business.

At the time, O’Reilly books were made the same way most publishers make books: A manuscript was written in Word, then laid out in a desktop-publishing program like InDesign or FrameMaker, with metadata managed in a title database. And of course the entire production, manufacturing, and distribution workflow was optimized for selling print books at retail. Ebooks for Safari were an afterthought, the responsibility of a separate team isolated from anyone involved with making and distributing the “real” book. This made sense when digital sales were relatively small—manual rework for the needs of a small sales channel was more efficient than investing in systems or processes optimized for that channel. But by 2005, Safari Books Online had grown into the second-largest sales channel for O’Reilly, and there was a clear need to reevaluate the way we met demand for that channel.

On one hand, it was a channel we owned and believed in, and it was growing beyond expectations. Because it is owned by publishers, the terms are friendly, and it contributes to the diversity of aggregation points needed to hedge against dominance by any one aggregator. On the other hand, the publishing and metadata workflow in place at O’Reilly in 2005 meant substantial conversion costs and delays for getting titles into Safari. Customers were understandably frustrated when titles took days or weeks to appear in Safari, often long after they were available in print from retailers. If the new digital channels on the horizon each required the kind of rework then needed for Safari Books Online, we would be in trouble.

Several attempts were made to solve the problem operationally with new production tools, but all of them failed. Each assumed the solution was to graft a new tool or process onto the existing workflow, because disrupting the way books were made would mean disrupting the biggest and most profitable path to market: print books, sold at retail.

The Toolchain

While much of the early Web was static documents connected to each other via hyperlinks, Web publishing quickly evolved to separate content creation from content presentation. The raw materials are captured and stored in a database, then assembled and delivered on demand, often customized for one individual. Very few web pages are presented identically to two different people; ads, navigation headings, and links to new or related information are commonly created in real time with each request for a particular page.

A key benefit of separating content creation and storage from content presentation is that a single source can be repurposed for multiple presentation formats. This concept is pervasive within today’s mobile app ecosystem. As an example, the restaurant reviews found on Yelp^[14] live in a database and are dynamically delivered to desktop and mobile web browsers, as well as iPhone and Android apps, all with different interfaces and affordances, and often customized for an individual user (based on, for example, her location when using the service). If books were to be delivered on the Web, they needed to behave more like dynamic Web content and less like digital representations of printed books. But as long as the assumption persisted that the printed book was the primary goal, meeting digital demand would remain an inefficient and costly afterthought.

Was there a way to separate the content creation and storage for book content from its presentation? The answer for O’Reilly was to standardize around an XML format that we’d helped create, known as DocBook XML^[15]. DocBook is a semantic markup language, meaning it’s intended for describing what a particular piece of text is, rather than how it should look. When reading text, we use visual formatting hints to infer structure: the big text at the top of a page is a heading; text formatted with italics is meant to be emphasized. What appears in its own box on the side of a page is interpreted as a sidebar of brief but related material. Indicating structure using presentation works very well when there’s a single presentation, but quickly breaks down when you want to either manipulate the text for new purposes, or present it in multiple ways. And the biggest downside is that while people are great at pattern matching based on visual cues, computers remain lousy at it. Google can’t effectively “look at” each web page to determine what its title is based on the relative font size or position on the page the way people do, so most web pages tell Google what their title is explicitly. You can see this yourself from your Web browser, by choosing View→Source while looking at any web page. Near the top there’s text that looks like this:

<title>O'Reilly Media - Technology Books, Tech Conferences, IT Courses, News</title>

Note that this is the same text that appears in the top of your browser window. Semantic markup like that uses computer-friendly labels to indicate the structure humans typically infer from presentation. In the case of DocBook, that means that in addition to elements like “title,” there are labels for things like “chapter,” “sidebar,” “index,” and “warning”—all of the common structural building blocks for a technical book. When everything is labeled like that, it becomes easy to apply different formatting rules for different presentation needs. That’s accomplished through the use of stylesheets, with each one designed for a different presentation format. You see this when you view a website such as Yahoo.com^[16] on your laptop and then on your smartphone or tablet. In each case, the content is substantially similar, but it’s presented differently and optimized for the particular screen you are using.
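To make the idea concrete, here is a minimal sketch in Python of one semantically labeled source being presented two ways. It is an illustration only, not O’Reilly’s toolchain: real DocBook is formatted with XSLT stylesheets, whereas this sketch stands in for them with two plain dictionaries. The fragment, the `HTML_STYLE` and `TEXT_STYLE` tables, and the `blocks` and `render` helpers are all hypothetical names invented for the example.

```python
import xml.etree.ElementTree as ET

# A tiny DocBook-style fragment (illustrative, not a complete valid DocBook file).
# The tags say what each piece of text is, not how it should look.
SOURCE = """
<chapter>
  <title>Distribution Everywhere</title>
  <para>Aggregation is at the heart of distribution.</para>
  <warning><para>Each new channel adds friction.</para></warning>
  <sidebar><title>Aside</title><para>Brief related material.</para></sidebar>
</chapter>
"""

# Two hypothetical "stylesheets": each maps a semantic label to a presentation.
HTML_STYLE = {
    "title": "<h1>{}</h1>",
    "para": "<p>{}</p>",
    "warning": '<div class="warning">{}</div>',
    "sidebar": "<aside>{}</aside>",
}
TEXT_STYLE = {
    "title": "== {} ==",
    "para": "{}",
    "warning": "WARNING: {}",
    "sidebar": "[Sidebar] {}",
}


def blocks(chapter):
    """Yield (label, text) pairs for each top-level element in a chapter."""
    for element in chapter:
        if element.tag in ("title", "para"):
            yield element.tag, element.text
        else:
            # Containers such as warning or sidebar: join the text of their children.
            yield element.tag, " ".join(child.text for child in element)


def render(source, style):
    """Apply one style table to the source and return the formatted lines."""
    chapter = ET.fromstring(source.strip())
    return "\n".join(style[label].format(text) for label, text in blocks(chapter))


if __name__ == "__main__":
    print(render(SOURCE, HTML_STYLE))
    print(render(SOURCE, TEXT_STYLE))
```

The source is never edited to change how it looks. Supporting a new presentation means adding another style table, which is the property the rest of this section depends on.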

Adapting that approach for books means reorienting the production process toward the “final” output of that semantically rich XML, rather than a PDF destined for the printer. Capturing the content and all of the semantic information about its structure independently of the particulars of presentation is a powerful capability, one that underpins the ability to rapidly respond to business opportunities for new distribution and presentation formats.

DocBook XML was the logical choice for O’Reilly for two main reasons. First, it is a mature, well-documented open standard, with a large ecosystem of tools and users. Second, a modified version of DocBook was already the format used for delivering content to Safari Books Online. That meant that if we could find a way to get print-friendly PDFs from DocBook, we’d be able to produce books for both of our (radically different) sales channels simultaneously from the same source files.

That large ecosystem of tools and users meant that there was already a very mature and robust set of open-source stylesheets^[17] intended to do exactly what we wanted: to take a set of DocBook source files and create multiple outputs, each with its own formatting rules. We could even create multiple versions of the same output format; for example, a PDF intended for printing (with crop marks and high-resolution images) and a PDF designed for viewing digitally (with color images and hyperlinks). By customizing the stylesheets with our branding, we could deliver three different “final” outputs (print PDF, web PDF, and Safari) from the same source file at the same time, while retaining the flexibility to modify the presentation formatting independently of the content.
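One way to picture that arrangement is as a list of output targets, each pairing a format with its own formatting parameters. The Python sketch below is a rough illustration under that assumption. The `OutputTarget` class, the parameter names such as `crop_marks`, and the output file naming are hypothetical and are not taken from the DocBook stylesheets themselves.

```python
from dataclasses import dataclass, field


@dataclass
class OutputTarget:
    """One deliverable produced from the shared source (hypothetical structure)."""
    name: str
    fmt: str
    params: dict = field(default_factory=dict)


# Two variants of the same format (PDF) plus a third target, all fed by one source.
TARGETS = [
    OutputTarget("print-pdf", "pdf", {"crop_marks": True, "hyperlinks": False}),
    OutputTarget("web-pdf", "pdf", {"crop_marks": False, "hyperlinks": True}),
    OutputTarget("safari", "xml"),
]


def build_plan(source_file, targets):
    """Return one (output filename, params) job per target, all from the same source."""
    stem = source_file.rsplit(".", 1)[0]
    return [(f"{stem}.{target.name}.{target.fmt}", target.params) for target in targets]


if __name__ == "__main__":
    for filename, params in build_plan("book.xml", TARGETS):
        print(filename, params)
```

Because the targets are data rather than separate workflows, changing how the print PDF looks touches only its parameters and leaves both the content and the other outputs alone.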

When EPUB^[18] emerged as the standard for the growing ebook market, we partnered with Adobe to contribute changes to those open source stylesheets^[19] to support output as EPUB (and with some additional processing, in Kindle-compatible Mobi format as well). That meant that as long as our production workflow resulted in a high-quality DocBook XML version of a book, we could deliver multiple print and digital versions at the same time from the same source. That’s a very powerful capability in a rapidly changing market.

The toolchain was not without trade-offs. When print is just one of many output formats, you give up a degree of control over things like page and line breaks, details that many production staff have spent years fretting about. But we decided those trade-offs were worth the substantial improvement in flexibility, especially if the assumption that print would decline relative to digital sales held.

Seizing Market Opportunities

The ability to quickly and efficiently produce multiple digital versions of a book also supported a refocus on driving direct ecommerce sales of O’Reilly ebooks. In 2008, O’Reilly began offering what we called ebook bundles^[20] for new titles. Customers buying direct from oreilly.com received a web-friendly PDF, an EPUB file, and a Kindle-compatible Mobipocket file. This diversity of formats could be assembled with no additional cost or delay. The bundled offer recognized that our customers often wanted different formats for different situations: a PDF for quick searching on a laptop, but an EPUB for use on an iPhone during the morning commute.

Having all new titles automatically output as EPUB files also positioned us to quickly exploit new sales opportunities. Our partnership with Lexcycle, makers of the Stanza ereader app (a relationship that ended not long after Amazon acquired Lexcycle and development stalled), allowed us to generate hundreds of individual ebook apps from those EPUB files for sale in the App Store long before iBooks emerged. While the market for individual ebook apps for iOS has migrated toward resellers like Kindle and iBooks, we continue to sell similar EPUB-based apps in the Android Market, and we continue to sell many iOS apps to customers in countries that don’t yet have access to the iBookstore. The ability to offer a bundle of formats also means that we’ve been able to add Android .apk application files^[21] and accessible DAISY talking book versions^[22] for many titles.

Those multiple output versions are like snapshots of the underlying XML source, generated at a particular point in time. And because creating those snapshots is automated, it can be done on-demand, any time that the source XML changes. There are strong similarities to software development in that approach—programmers write and revise plain-text computer code, then “compile” that code into a program or application. Our DocBook XML is like the source code, and the output formats are like the applications. Software developers spend more time than just about anyone writing, editing, and collaborating on complex long-form text documents, so it makes sense to borrow and adapt some of the tools and techniques they use to make that task more manageable.
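The “recompile when the source changes” idea can be sketched in a few lines of Python. This is an assumption-laden illustration rather than a description of our actual build system: the `Snapshots` class and `fingerprint` helper are hypothetical, and the two renderers are trivial stand-ins for real PDF or EPUB generation.

```python
import hashlib


def fingerprint(source_text):
    """Return a hash that changes whenever the source text changes."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()


class Snapshots:
    """Keep every output format in step with a single source (hypothetical sketch)."""

    def __init__(self, renderers):
        self.renderers = renderers  # format name -> function(source_text) -> output
        self.last_fingerprint = None
        self.outputs = {}

    def refresh(self, source_text):
        """Regenerate all outputs if the source changed; return True if rebuilt."""
        current = fingerprint(source_text)
        if current == self.last_fingerprint:
            return False  # nothing changed, so the existing snapshots are still valid
        self.outputs = {
            name: render(source_text) for name, render in self.renderers.items()
        }
        self.last_fingerprint = current
        return True


if __name__ == "__main__":
    # Stand-in renderers; a real toolchain would call its stylesheets here.
    snapshots = Snapshots({"web": lambda s: f"<p>{s}</p>", "text": str.upper})
    print(snapshots.refresh("Teh book"))   # first build
    print(snapshots.refresh("Teh book"))   # unchanged source, no rebuild
    print(snapshots.refresh("The book"))   # correction made, all formats rebuilt
    print(snapshots.outputs)
```

The point of the design is that a correction is made once, in the source, and every format picks it up on the next refresh. No output is ever edited by hand.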

Most of us are now quite accustomed to getting updates to software applications, which is a reflection of the need to continuously refine and improve software to respond to customer needs, technology changes, and market dynamics. Many (though certainly not all) types of books would benefit from that capability, especially if the book is delivered digitally. If our smartphone apps can tell us when they’re updated, why not our books, too? When a correction or change needs to be made to an O’Reilly book, the XML source files are modified, and the output formats are “recompiled” to include the changes. Free lifetime updates became a powerful differentiator (along with offering multiple formats) for oreilly.com as a direct sales channel.

Extending the toolchain all the way back to the authoring stage means the ability to use the same single-source, multiple-output capability well before a book is “finished.” Other efforts to achieve this include the Pressbooks^[23] system used for this book. Taking inspiration from systems assembled by several authors, O’Reilly built its Open Feedback Publishing System^[24] to support early release and feedback for books in progress. And because we can deliver updates to customers after the purchase, many titles^[25] also go up for sale on oreilly.com well before publication and are included in Safari Books Online as part of their “Rough Cuts” program. This “release early, release often” attitude mirrors the philosophy of software development, and challenges the idea that many books will ever really be “done.” The concept of a “networked book” is covered in this book by Bob Stein. Kevin Kelly summarizes the idea nicely^[26]:

One quirk of networked books is that they are never done, or rather that they become streams of words rather than monuments. Wikipedia is a stream of edits, as anyone who has tried to make a citation to it realizes. Books too are becoming flows, as precursors of the work are written online, earlier versions published, corrections made, updates added, revised versions approved. A book is networked in time as well as space.

As print-on-demand prices have continued to decline, we’ve even been able to extend that update model to print. Through a partnership with Ingram and Lightning Source^[27], as updates are made to books, fresh PDFs are sent to Lightning Source so that the next order received for that title will be fulfilled with a printed book that contains the latest changes.

While the author-facing parts of the O’Reilly toolchain may still be a bit too technical for many authors (though those who do use it are quite happy^[28]), projects like Pressbooks and new writing apps like Scrivener^[29] are bringing the same principles of separation of content from formatting, multiple output formats “compiled” from the same source, and rapid iteration to a more mainstream author audience.

Conclusion

In addition to being able to quickly react to changes in the sales and distribution landscape, our toolchain means we can also actively nudge customers and channel partners in a direction we believe makes the most sense for us and the market at large. We have the luxury of a strong and profitable direct sales channel (though one that was built with years and years of sustained effort!), and we hold that up as the ideal from a customer experience perspective—multiple, DRM-free formats and free lifetime updates in particular. None of the major ebook resellers can match that offer, but we also know that plenty of customers will prefer to buy from Amazon, Apple, Kobo, or someone else, both in print and electronically. So anyone who buys either a print or electronic O’Reilly book anywhere can “register” that purchase with us, and for $5 they get all of those other ebook formats and the free lifetime updates. Nudge, nudge.

When I talk about what we’ve done at O’Reilly, it’s often dismissed because we’re seen as a technology company. But I can’t stress enough that five years ago, even though we were publishing books about many of the technologies we eventually used for our multichannel publishing toolchain, nobody involved with actually producing, distributing, or selling those books knew any more than other publishers about multichannel publishing or a digital-first workflow. A culture that welcomed experimentation was critical, but the change was driven just as much by necessity: nearly a decade of declining print sales, a pressure many other publishers are just now starting to feel. And the tools, standards, techniques, and business opportunities available today are much more developed than they were five years ago.

A distribution strategy focused merely on efficiently filling existing channels with a standard set of products addresses only part of the challenge. A product development capability that can quickly create a variety of print and digital outputs gives you the flexibility to respond to new market opportunities and more effectively encourage a diverse ecosystem of aggregation points, all while hedging against the dominance of any single aggregator.

Give the author feedback & add your comments about this chapter on the web: https://book.pressbooks.com/chapter/distribution-everywhere-andrew-savikas
