Friday, October 24, 2014

How We Build the OpenStax College Books



In case you are not familiar with OpenStax College (OSC), it is a part of our OpenStax family that publishes free, open, peer-reviewed, commercial quality textbooks. The initial goal is to create textbooks for the top 20 Community College courses in the US. We have 9 books completed, 4 in production and funding for the remaining books.

OSC books are available in print, so there was a technical challenge to generate a print-quality PDF for each of the books. Our earlier PDF generation pipeline uses LaTeX, which produces nice black and white PDFs but does not meet the OSC requirements. To meet those requirements, we are using HTML5, CSS3 and a commercial product called PrinceXML. PrinceXML is the only commercial product used by OpenStax CNX. Each of the books has a different design and collation requirement, so CSS has to be created for each unique element. The books also share features, so we have tried to structure the CSS so it relies heavily on the Cascading part of Cascading Style Sheets.

Another requirement is that we must build the PDFs without human intervention. The process must be totally automated. Since users can derive copies of the OSC books and modify them, we must also deal with content that is missing from or added to the original.

Code Structure


Our CSS code is broken down into 2 main parts: Slots and Skeletons. Skeletons define the namespaces used in a book.  Slots are used to define the styles for the namespaces. Some books are derived from others so their Slot and Skeleton files could be smaller than other books. There are also other CSS files that control numbering (using counters), page formatting and utilities.

PDF Generation Workflow




The workflow is
  • Content is stored as XML (CNXML) in OpenStax CNX
  • The XML is converted into HTML using a highly modified version of the Docbook transform
  • The resulting HTML plus CSS is passed to PrinceXML which generates the PDF
PDFs are generated when books are published or the content inside the book is updated.  All of the front matter is entered by hand into the PDF, but the remainder of the book is auto-generated.
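
The last step of the workflow can be sketched in Python roughly as follows. This is only an illustration, not the production pipeline: the helper names and the file names are made up for the example. The only parts taken from PrinceXML itself are the `prince` executable and its `-s` (style sheet) and `-o` (output file) options.

```python
import subprocess


def build_prince_command(html_path, css_paths, pdf_path):
    """Return the argument list for one PrinceXML run (hypothetical helper)."""
    command = ["prince", html_path]
    for css_path in css_paths:
        # Each -s adds one style sheet; later sheets can override earlier ones.
        command += ["-s", css_path]
    command += ["-o", pdf_path]
    return command


def generate_pdf(html_path, css_paths, pdf_path):
    """Run PrinceXML; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_prince_command(html_path, css_paths, pdf_path), check=True)


# Placeholder file names, not real OpenStax files.
command = build_prince_command(
    "book.html", ["skeleton.css", "slots.css"], "book.pdf"
)
print(" ".join(command))
```

Building the command as a list and failing loudly on a non-zero exit code suits a pipeline that has to run without human intervention.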

Thursday, September 18, 2014

Rewrite Technologies

Our team has been hard at work rewriting the OpenStax CNX site. A frequently asked question is "What tech are you using?". This post gives a high-level overview of the new site.

Architecture


Rewrite is a Single-Page App, which means most of the logic lives in a Javascript client. The Client accesses data via REST APIs that are written in Python. All of our data is stored in a PostgreSQL database. The details of the separate components follow.

Webview

Webview is the Javascript Client.  The basis of the Client is Backbone.js and Bootstrap.  We use several other Javascript packages as well.
Most of the Javascript is written in Coffeescript and compiled to Javascript.  CSS is developed using Less.

Webview requests json from the Archive via REST APIs.  The json contains the HTML for the content and any metadata. The Client parses the json and displays it.

Editing is a separate view in Webview. When the user chooses to edit, the views are swapped out. The new editor is based on the open-source HTML5 editor Aloha. We have added several plugins to Aloha that are specific to textbook editing. The development on the editor was done by our team and the OERPub team, with OERPub doing the bulk of the work.

We are using Nginx as our web server.

Archive

Archive stores published content and handles search. Content retrieval and search are handled via APIs written in Python. When content is requested, the json is built by stored procedures in Postgres using the Postgres json functions.

Search is done with optimized queries.  We are caching subject and one word searches long term and all searches short term to improve performance.  When a user pages through search results, the cached result is used to populate the next page.
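
The two cache lifetimes can be pictured with a small Python sketch. It is a simplified stand-in, not the code running on the site: the in-memory `SearchCache` class replaces a real cache server, and the lifetimes (one day and five minutes), the key format and the `is_subject` flag are all assumptions made for the example.

```python
import time

LONG_TTL = 24 * 60 * 60  # assumed lifetime for subject and one-word searches
SHORT_TTL = 5 * 60       # assumed lifetime for all other searches


class SearchCache:
    """Tiny in-memory stand-in for a cache server."""

    def __init__(self, clock=time.time):
        self._entries = {}
        self._clock = clock

    def set(self, key, value, ttl):
        self._entries[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped and treated as a miss.
            del self._entries[key]
            return None
        return value


def ttl_for(query, is_subject=False):
    """Long lifetime for subject or single-word queries, short otherwise."""
    if is_subject or len(query.split()) == 1:
        return LONG_TTL
    return SHORT_TTL


def cached_search(cache, query, run_query, is_subject=False):
    """Return cached results when present, otherwise run the query and store it."""
    key = ("subject:" if is_subject else "text:") + query.lower()
    results = cache.get(key)
    if results is None:
        results = run_query(query)
        cache.set(key, results, ttl_for(query, is_subject))
    return results
```

Here `run_query` is whatever function actually hits the database; a second call with the same query inside the lifetime never reaches it.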

Archive will run on any WSGI-compatible server. We are currently using Waitress as our server.
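
For readers who have not met WSGI: the application is just a Python callable, which is why the server underneath can be swapped. The sketch below is a generic minimal WSGI app, not Archive's code; the JSON body and the example path are invented for illustration.

```python
import json


def application(environ, start_response):
    """Minimal WSGI app that answers every request with a small JSON document."""
    body = json.dumps({"path": environ.get("PATH_INFO", "/")}).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# With Waitress installed, the same callable could be served with:
#   from waitress import serve
#   serve(application, host="127.0.0.1", port=8080)
```

Because the callable only depends on the WSGI interface, it can also be exercised in a test by calling it directly with a dictionary and a stub `start_response`.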

Publishing

The publishing application integrates with the Archive database. It allows users and third-party applications to publish content to the Archive, where it can be read and distributed to the public. Publishing is built similarly to Archive, but differs in many ways. Archive is a read-only content API. Publishing provides an additional set of APIs that handle the publishing workflow, which includes user interactions like license and role (e.g. author or translator) acceptance, as well as the triggering of export files. Users of the OpenStax CNX system will typically never interact directly with this application. Almost all of the backend business logic is handled within the publishing application.

Authoring

Unpublished content is stored in Authoring in Postgres.  Authoring also has APIs used by the Editor and the Workspace. When content is published, an EPUB is generated and passed from Authoring to Publishing.  The EPUB format was selected to pass information between components because it encapsulates all of the info needed.
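
An EPUB is a zip archive with a few fixed rules, which is what makes it handy as a single package to hand between components. The Python sketch below packs some files into an EPUB-style container in memory. It is not the code Authoring uses, and the result is not a complete, valid EPUB (a real one also needs a proper `META-INF/container.xml` and a package document); the file paths are placeholders.

```python
import io
import zipfile


def build_minimal_epub(files):
    """Pack files into an EPUB-style zip held in memory.

    `files` maps archive paths to bytes.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The EPUB container format expects "mimetype" to be the first
        # entry in the archive and to be stored without compression.
        archive.writestr(
            "mimetype", "application/epub+zip",
            compress_type=zipfile.ZIP_STORED,
        )
        for path, data in files.items():
            archive.writestr(path, data, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


# Placeholder content, only to show the shape of the package.
epub_bytes = build_minimal_epub({
    "contents/page.xhtml": b"<html><body><p>Hello</p></body></html>",
})
```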

The Workspace is a listing of all content a user has access to edit. Books and Pages can be created in the Workspace and they can be deleted as well.

OpenStax Accounts

Users are now stored in a shared accounts component.  This was developed so users could have the same account on all OpenStax sites. All CNX users have been migrated to the new Accounts. Accounts uses OAuth so users will also be able to log in using Google, Twitter and Facebook. CNX will no longer create CNX user accounts.  New users will need to use one of their existing OAuth accounts to log in.

Logging

We are currently logging information, errors and user interactions to Syslog.  Our goal is to load the logs into Graphite so we can visually see how our site is being used.

Transformation Services

Transformation Services generates Export files (PDF, EPUB, Zip, etc.) and imports content to the editor. The initial design uses the same import and export code from Legacy CNX inside of a messaging system wrapper.

The messaging uses RabbitMQ and several message queues. The queues will give us persistence of the file generation requests. The requests will be sent by Publishing after content has been added to Archive. Publishing will pass an EPUB to Transformation Services that contains all of the data needed to generate the files.

OpenStax CNX is a deceptively complex site that has many moving parts.  Our goal with this architecture was to design a component based system that can be easily updated and tested without impacting all of the site. All of our code is on Github.

Many thanks to CNX team members Michael Mulich and Derek Kent for reviewing and contributing to this post.

Friday, August 22, 2014

Tech Behind Search Improvements

Over the summer, we have made several improvements to the new version of the site (Rewrite). One of the biggest improvements was with search. Since we released Rewrite, the performance of search has been a concern. We did little to improve the speed before the release because of time constraints. We decided that we needed to revisit it this summer to get things in great shape for the Fall semester.

We implemented the following changes:
  • Cache single word and subject searches - we added Memcached to store some searches.  A cron job is reloading the subject searches into the cache on a regular basis.
  • Pagination on Search Result page - previously we were displaying all of the search results on the same page. This caused a long page render time which made the slow search even slower. We are now displaying 10 items on each page. Books are displayed first since most users are looking for books. When possible, we use the cached search results to return the next page.
  • Improved SQL performance - we tweaked the SQL used for the queries to optimize them.
All of these changes are no-brainers, but have vastly improved search.
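
The pagination change can be sketched in a few lines of Python. This is an illustration under assumptions, not the site's code: the `type` field and its `"book"` value are invented for the example, and the result list stands in for a cached search result.

```python
PAGE_SIZE = 10  # items shown on each result page


def order_results(results):
    """Put books ahead of other items, keeping the original order within each group."""
    # sorted() is stable, and False sorts before True, so books come first.
    return sorted(results, key=lambda item: item["type"] != "book")


def get_page(ordered_results, page_number, page_size=PAGE_SIZE):
    """Return one 1-based page of an already ordered result list."""
    if page_number < 1:
        raise ValueError("page_number starts at 1")
    start = (page_number - 1) * page_size
    return ordered_results[start:start + page_size]
```

Ordering once and slicing per page means that a later page can be served from the stored list without running the search again.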

Monday, July 14, 2014

OpenStax CNX Development Tools

Over the last couple of years, we have changed our internal development tools. We do our own version of Agile development and have found these tools best meet our needs.

For Sprint planning, we use Trello.  Our team members are in many locations, so having a web-based tool to outline our Sprints has been very important. We create cards for User Stories or issues and work from the boards for Sprint planning and working on Sprints.


Our code is stored in Github. We previously ran our own SVN server, but slowly migrated all of our code to Github. It is a great tool. Our workflow for using Github is

  • Each component has a separate Repository.
  • Each Repository has a Master branch.
  • Each Repository has a production branch that contains the code currently in production.  This allows us to continue working and merging to Master, but also be able to fix problems in production easily from the production branch.
  • Developers branch off Master and code the Trello card they are working on. Once the code is completed and unit tested, the developer creates a Pull Request in Github. The Pull Request is to merge the code into Master.
  • A Pull Request triggers a code review by another team member.  Code reviews generally result in a review of the code as well as a manual test of the code.
  • Pull Requests are also unit tested automatically via Travis-CI.
  • Once a Pull Request is approved, it is merged and the branch is deleted from Github.

Most of our meetings are held on Skype.  Skype has the simplicity of making a phone call and is mostly reliable.  We also use Google Hangouts when we need to share code or do other screen sharing. It works really well, but is not as easy to start up as a Skype call.

Our team relies on IM. We have a Jabber server that some of the team uses and others use Google Talk or Hangouts.  IM is our virtual hallway and is a key part of our communication.

Wednesday, June 4, 2014

Publishing Added to OpenStax CNX Demo Site


We have recently added publishing to our Demo site.  If you are not aware, Demo is our alpha testing ground for the new version of our site.  Not everything is working yet and there are bugs remaining, but users can get a feel for what is coming soon to OpenStax CNX.

Users can

  • Create, edit and publish a new Page
  • Create, edit and publish a new Book 
  • Derive a Page or Book when viewing content
  • Derive a Page when editing a Book 
  • Publish a Book with published and unpublished Pages (the unpublished Pages get published with the Book). 
Please take a look and give us some feedback at techsupport at cnx dot org.  We would love to have you help us improve editing and publishing. The site will be updated regularly so check back often to see our progress. 

Tuesday, May 13, 2014

OpenStax College Android App Released


A new Android app that features the OpenStax College books was released on May 2nd.  The features include

  • Viewing released books
  • Saving Books or Pages to your Bookmarks
  • Taking notes while reading
  • Exporting your notes to a text file or sharing them
  • Sharing a Book or Page
The app is open source and the code is in Github. You can install the app from the Google Play store.

The app can be used to access these books
  • College Physics
  • Introduction to Sociology
  • Biology
  • Concepts of Biology
  • Anatomy and Physiology
  • Introductory Statistics
  • Principles of Economics
  • Principles of Microeconomics
  • Principles of Macroeconomics


From a technical point of view, the app is based on the existing OpenStax CNX Android app.  As part of our name change, the mobile version of our site was modified to work better with the Android app.  The result is a simplified look of the content (see screenshot).

If you use the app, we would love to hear feedback either as a rating of the app or by contacting us at android at cnx.org.

Wednesday, April 23, 2014

OpenStax CNX Featured On Floss Weekly Podcast


Two OpenStax team members were featured guests on the Floss Weekly podcast this morning.  Floss Weekly covers open source software for the TWIT podcast network.  Kathi Fletcher and Ross Reedstrom told the history of the project along with a discussion of the work we are currently doing on OpenStax CNX. The podcast is available to view or download from TWIT.