Friday, October 24, 2014

How We Build the OpenStax College Books



In case you are not familiar with OpenStax College (OSC), it is a part of our OpenStax family that publishes free, open, peer-reviewed, commercial-quality textbooks. The initial goal is to create textbooks for the top 20 community college courses in the US. We have 9 books completed, 4 in production, and funding for the remaining books.

OSC books are available in print, so we faced the technical challenge of generating a print-quality PDF for each book. Our earlier PDF generation pipeline used LaTeX, which produces nice black-and-white PDFs but did not meet the OSC requirements. To meet those requirements, we use HTML5, CSS3, and a commercial product called PrinceXML. PrinceXML is the only commercial product used by OpenStax CNX. Each book has different design and collation requirements, so CSS has to be created for each unique element. The books also share features, so we have tried to structure the CSS so that it relies heavily on the "Cascading" part of Cascading Style Sheets.

Another requirement is that we must build the PDFs without human intervention; the process must be totally automated. Since users can derive copies of the OSC books and modify them, we must also deal with content that is missing from, or has been added to, the original.

Code Structure


Our CSS code is broken down into two main parts: Slots and Skeletons. Skeletons define the namespaces used in a book. Slots define the styles for those namespaces. Some books are derived from others, so their Slot and Skeleton files can be smaller than those of other books. There are also other CSS files that control numbering (using counters), page formatting, and utilities.
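To make the layering concrete, here is a minimal sketch of how a build script might decide the order in which CSS files are handed to the renderer, so that a derived book only needs to carry the rules that differ from its parent. This is an illustration, not our actual build code: the book names, file names, and the `BOOKS` and `SHARED` structures are all hypothetical.

```python
from typing import Dict, List, Optional, TypedDict


class Book(TypedDict):
    parent: Optional[str]   # book this one is derived from, if any
    css: List[str]          # Skeleton and Slot files owned by this book


# Hypothetical registry; real book and file names will differ.
BOOKS: Dict[str, Book] = {
    "physics": {
        "parent": None,
        "css": ["physics-skeleton.css", "physics-slots.css"],
    },
    "physics-derived": {
        "parent": "physics",
        "css": ["physics-derived-slots.css"],
    },
}

# Hypothetical shared files (utilities, page formatting, counters).
SHARED: List[str] = ["utilities.css", "page.css", "numbering.css"]


def stylesheet_order(book: str) -> List[str]:
    """Return CSS files from most general to most specific.

    Later files win in the cascade when selectors have equal
    specificity, so the derived book's rules override its parent's.
    """
    chain: List[str] = []
    current: Optional[str] = book
    while current is not None:
        if current in chain:
            raise ValueError(f"circular derivation at {current!r}")
        chain.append(current)
        current = BOOKS[current]["parent"]

    ordered = list(SHARED)
    for name in reversed(chain):  # parent first, derived book last
        ordered.extend(BOOKS[name]["css"])
    return ordered
```

Calling `stylesheet_order("physics-derived")` lists the shared files first, then the parent's Skeleton and Slot files, then the derived book's own (smaller) Slot file.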

PDF Generation Workflow




The workflow is:
  • Content is stored as XML (CNXML) in OpenStax CNX.
  • The XML is converted into HTML using a highly modified version of the DocBook transform.
  • The resulting HTML plus CSS is passed to PrinceXML, which generates the PDF.
PDFs are generated when a book is published or the content inside it is updated. All of the front matter is entered by hand into the PDF, but the remainder of the book is auto-generated.
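
The two automated steps can be sketched as a small script. This is a rough outline under stated assumptions, not our production code: it assumes the transform is run with the `xsltproc` command-line tool and that the `prince` executable is on the PATH, and every file name in it is a placeholder.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def transform_command(cnxml: Path, xsl: Path, html: Path) -> List[str]:
    """Build the XSLT step: CNXML in, HTML out (assumes xsltproc)."""
    return ["xsltproc", "-o", str(html), str(xsl), str(cnxml)]


def prince_command(html: Path, stylesheets: Sequence[Path], pdf: Path) -> List[str]:
    """Build the PrinceXML step: HTML plus CSS in, PDF out."""
    cmd = ["prince", str(html)]
    for css in stylesheets:
        cmd += ["-s", str(css)]  # one -s option per stylesheet, in cascade order
    cmd += ["-o", str(pdf)]
    return cmd


def build_pdf(cnxml: Path, xsl: Path, stylesheets: Sequence[Path], pdf: Path) -> None:
    """Run both steps; check=True stops the build on the first failure."""
    html = pdf.with_suffix(".html")
    for cmd in (
        transform_command(cnxml, xsl, html),
        prince_command(html, stylesheets, pdf),
    ):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    build_pdf(
        Path("book.cnxml"),
        Path("cnxml-to-html.xsl"),
        [Path("skeleton.css"), Path("slots.css")],
        Path("book.pdf"),
    )
```

Keeping the command construction separate from the execution makes it easy to log or test the exact commands without running either tool, which helps when the build has to run unattended.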