Friday, October 24, 2014

How We Build the OpenStax College Books



In case you are not familiar with OpenStax College (OSC), it is the part of our OpenStax family that publishes free, open, peer-reviewed, commercial-quality textbooks. The initial goal is to create textbooks for the top 20 community college courses in the US. We have 9 books completed, 4 in production, and funding for the remaining books.

OSC books are available in print, so we faced the technical challenge of generating a print-quality PDF for each book. Our earlier PDF generation pipeline used LaTeX, which produces nice black-and-white PDFs but did not meet the OSC requirements. To meet them, we use HTML5, CSS3 and a commercial product called PrinceXML. PrinceXML is the only commercial product used by OpenStax CNX. Each book has different design and collation requirements, so CSS has to be created for each unique element. The books also share features, so we have tried to structure the CSS so that it relies heavily on the Cascading part of Cascading Style Sheets.

Another requirement is that we must build the PDFs without human intervention: the process must be totally automated. Since users can derive copies of the OSC books and modify them, we must also handle content that is missing from, or has been added to, the original.

Code Structure


Our CSS code is broken down into two main parts: Slots and Skeletons. Skeletons define the namespaces used in a book. Slots define the styles for those namespaces. Some books are derived from others, so their Slot and Skeleton files can be smaller than those of other books. Other CSS files control numbering (using counters), page formatting and utilities.

PDF Generation Workflow




The workflow is:
  • Content is stored as XML (CNXML) in OpenStax CNX.
  • The XML is converted into HTML using a highly modified version of the DocBook transform.
  • The resulting HTML plus CSS is passed to PrinceXML, which generates the PDF.
PDFs are generated when books are published or when the content inside a book is updated. All of the front matter is entered by hand into the PDF, but the remainder of the book is auto-generated.
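
To make the three steps above more concrete, here is a minimal Python sketch of how such a pipeline could be driven from a script. It is not our production code. It assumes that the XSLT step is run with the xsltproc command-line tool and that Prince is installed as the prince command; the file names (book.cnxml, cnxml2html.xsl, skeleton.css, slots.css) and the function names are made up for illustration.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple


def build_commands(
    cnxml: Path, xslt: Path, stylesheets: List[Path], out_dir: Path
) -> Tuple[List[str], List[str]]:
    """Return the two command lines of the pipeline without running them.

    Assumption: xsltproc performs the XML-to-HTML transform and the
    `prince` executable renders the PDF. All paths are hypothetical.
    """
    html = out_dir / (cnxml.stem + ".html")
    pdf = out_dir / (cnxml.stem + ".pdf")

    # Step 2: XML (CNXML) -> HTML via an XSLT stylesheet.
    transform = ["xsltproc", "-o", str(html), str(xslt), str(cnxml)]

    # Step 3: HTML + CSS -> PDF. Stylesheets are passed in order, so
    # later files can override earlier ones through the cascade.
    render = ["prince", str(html)]
    for css in stylesheets:
        render += ["-s", str(css)]
    render += ["-o", str(pdf)]

    return transform, render


def build_pdf(
    cnxml: Path, xslt: Path, stylesheets: List[Path], out_dir: Path
) -> Path:
    """Run both steps with no human intervention; raise on any failure."""
    out_dir.mkdir(parents=True, exist_ok=True)
    transform, render = build_commands(cnxml, xslt, stylesheets, out_dir)
    for cmd in (transform, render):
        # check=True stops the build as soon as one tool fails.
        subprocess.run(cmd, check=True)
    return out_dir / (cnxml.stem + ".pdf")
```

Calling build_pdf(Path("book.cnxml"), Path("cnxml2html.xsl"), [Path("skeleton.css"), Path("slots.css")], Path("out")) would run both tools and return the path of the generated PDF. Keeping the command construction in its own function makes it possible to check the pipeline without having either tool installed.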

8 comments:

  1. Replies
    1. The main reason is the difficulty of finding LaTeX developers to work on the code. HTML and CSS developers are easier to find. Most skilled LaTeX folks do it as part of a non-developer job. Also, styling is much easier in CSS, which is a very important requirement for these books.

  2. What are the top 20 Community College Courses in the US? Are any of them developmental courses? I bet College Composition I & II are on the list.
