Thursday, September 18, 2014

Rewrite Technologies

Our team has been hard at work rewriting the OpenStax CNX site. A frequently asked question is "What tech are you using?" This post gives a high-level overview of the new site.

Architecture


Rewrite is a single-page app, which means most of the logic lives in a JavaScript client. The Client accesses data via REST APIs written in Python. All of our data is stored in a PostgreSQL database. The details of each component follow.

Webview

Webview is the JavaScript Client. It is built on Backbone.js and Bootstrap, along with several other JavaScript packages. Most of the code is written in CoffeeScript and compiled to JavaScript, and the CSS is developed using Less.

Webview requests JSON from the Archive via REST APIs. The JSON contains the HTML for the content along with its metadata. The Client parses the JSON and displays the content.
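Webview does this in CoffeeScript, but the shape of the exchange is easy to show in Python. This is a minimal sketch, assuming a hypothetical URL layout of `/contents/<id>@<version>.json` and a response whose `content` field holds the HTML while every other top-level field is metadata; the base URL, path, and field names are assumptions for illustration.

```python
import json
from urllib.request import urlopen

# Hypothetical base URL and path layout; not the real Archive deployment.
ARCHIVE_BASE = "http://archive.example.org"


def content_url(ident, version):
    """Build the (assumed) REST URL for one version of a piece of content."""
    return f"{ARCHIVE_BASE}/contents/{ident}@{version}.json"


def split_content(payload):
    """Split an Archive JSON document into (html, metadata).

    Assumes the HTML lives under a "content" key and every other
    top-level key (title, version, authors, ...) is metadata.
    """
    doc = json.loads(payload)
    html = doc.pop("content", "")
    return html, doc


def fetch_content(ident, version):
    """Fetch one piece of content over HTTP and return (html, metadata)."""
    with urlopen(content_url(ident, version)) as resp:
        return split_content(resp.read().decode("utf-8"))
```

In a Backbone client, the equivalent split would naturally live in a model's `parse` method.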

Editing is a separate view in Webview; when the user chooses to edit, the views are swapped out. The new editor is based on Aloha, an open-source HTML5 editor, and we have added several Aloha plugins specific to textbook editing. The editor was developed by our team together with the OERPub team, with OERPub doing the bulk of the work.

We are using Nginx as our web server.

Archive

Archive stores published content and handles search. Content retrieval and search are handled via APIs written in Python. When content is requested, the JSON is built by stored procedures in Postgres using Postgres's built-in JSON functions.
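Because Postgres assembles the JSON, the Python layer can stay thin: it calls the stored procedure and returns the text unchanged. A minimal sketch, assuming a DB-API 2.0 driver with the `%s` paramstyle (psycopg2, for example), a hypothetical table `published_content`, and a hypothetical stored procedure `get_content_json(ident, ver)`; none of these names come from the Archive code.

```python
# Hypothetical stored procedure: Postgres builds the whole JSON document
# with row_to_json, so Python never has to touch individual fields.
CREATE_FUNCTION_SQL = """
CREATE OR REPLACE FUNCTION get_content_json(ident text, ver text)
RETURNS json AS $$
  SELECT row_to_json(c)
  FROM (SELECT p.id, p.version, p.title, p.content
        FROM published_content p
        WHERE p.id = ident AND p.version = ver) AS c;
$$ LANGUAGE sql STABLE;
"""


def get_content_json(conn, ident, version):
    """Return the JSON text for one piece of content, or None if absent.

    `conn` is any DB-API 2.0 connection using the "format" paramstyle.
    The ::text cast matters because psycopg2, for example, decodes json
    values into Python objects by default; as text, the result can be
    sent to the client without re-serializing.
    """
    cur = conn.cursor()
    try:
        cur.execute("SELECT get_content_json(%s, %s)::text", (ident, version))
        row = cur.fetchone()
    finally:
        cur.close()
    # A scalar SQL function with no matching row returns NULL, i.e. (None,).
    return row[0] if row else None
```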

Search is done with optimized queries. To improve performance, we cache subject searches and one-word searches long-term and all searches short-term. When a user pages through search results, the cached result is used to populate the next page.
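The two-tier caching policy, and paging from a cached result, can be sketched with a small in-process cache. This is an illustration under assumptions, not the Archive code: the TTL values, the `is_long_term` rule (a `subject:` prefix or a single search term), and the `run_query` callable are all hypothetical.

```python
import time

# Hypothetical TTLs: long-lived entries for popular, cheap-to-key searches,
# short-lived entries for everything else.
LONG_TTL = 24 * 60 * 60   # one day, in seconds
SHORT_TTL = 10 * 60       # ten minutes, in seconds


def is_long_term(query):
    """Assumed rule: subject searches and one-word searches are kept long-term."""
    return query.startswith("subject:") or len(query.split()) == 1


class SearchCache:
    def __init__(self, run_query, clock=time.monotonic):
        self._run_query = run_query   # callable: query -> full list of result ids
        self._clock = clock
        self._entries = {}            # query -> (expires_at, results)

    def _results(self, query):
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and entry[0] > now:
            return entry[1]           # cache hit: no database work
        results = self._run_query(query)
        ttl = LONG_TTL if is_long_term(query) else SHORT_TTL
        self._entries[query] = (now + ttl, results)
        return results

    def page(self, query, page_number, per_page=20):
        """Return one 1-based page; later pages are sliced from the cached result."""
        results = self._results(query)
        start = (page_number - 1) * per_page
        return results[start:start + per_page]
```

Caching the full result list, rather than individual pages, is what lets "next page" requests skip the database entirely.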

Archive will run on any WSGI-compatible server; we are currently using Waitress.
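Targeting the WSGI interface (PEP 3333) rather than a particular server is what keeps the server swappable. A minimal sketch of a WSGI application that returns JSON, with Waitress used only in an entry point that is not run on import; the `/ping` path, payload, and port are hypothetical.

```python
import json


def app(environ, start_response):
    """A tiny WSGI application; any PEP 3333 server can call it."""
    if environ.get("PATH_INFO") == "/ping":   # hypothetical health-check path
        status, payload = "200 OK", {"status": "ok"}
    else:
        status, payload = "404 Not Found", {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def main(host="127.0.0.1", port=6543):
    """Serve `app` with Waitress (third-party: pip install waitress)."""
    from waitress import serve
    serve(app, host=host, port=port)
```

Calling `main()` serves the app with Waitress; moving to another WSGI server such as gunicorn would change only that entry point, not `app`.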

Publishing

The publishing application integrates with the Archive database. It allows users and third-party applications to publish content to the Archive, where it can be read and distributed to the public. Publishing is built similarly to Archive but differs in several ways: Archive is a read-only content API, while Publishing provides an additional set of APIs that handle the publishing workflow. That workflow includes user interactions such as accepting the license and roles (e.g. author or translator), as well as triggering the generation of export files. Users of the OpenStax CNX system will typically never interact with this application directly, yet almost all of the backend business logic is handled within it.
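The acceptance step of the workflow acts as a gate: content should go to Archive only after everyone listed has accepted both the license and their role. A minimal sketch of that check; the `Acceptance` and `Publication` structures and the `ready_to_publish` method are hypothetical and do not reflect Publishing's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Acceptance:
    user_id: str
    role: str                      # e.g. "author" or "translator"
    accepted_role: bool = False
    accepted_license: bool = False


@dataclass
class Publication:
    ident: str
    acceptances: list = field(default_factory=list)

    def pending(self):
        """Users who still need to accept their role or the license."""
        return sorted({a.user_id for a in self.acceptances
                       if not (a.accepted_role and a.accepted_license)})

    def ready_to_publish(self):
        # Assumption: a publication with no listed people is not ready.
        # Once ready, content can be written to Archive and exports triggered.
        return bool(self.acceptances) and not self.pending()
```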

Authoring

Authoring stores unpublished content in Postgres and provides the APIs used by the Editor and the Workspace. When content is published, an EPUB is generated and passed from Authoring to Publishing. We chose EPUB to pass information between components because it encapsulates all of the information needed.
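EPUB suits this job because it is a single ZIP file carrying the content, its resources, and a package document with the metadata. A minimal sketch of that container layout using only the standard library: the `mimetype` entry is written first and uncompressed, as the EPUB Open Container Format requires, and `META-INF/container.xml` points at the package document. The file names and single-page structure are illustrative; this is not the Authoring export code, and it omits parts a validator would require, such as the navigation document.

```python
import io
import zipfile
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

PACKAGE_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">{ident}</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
    <item id="page1" href="page1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="page1"/>
  </spine>
</package>
"""


def build_epub(ident, title, page_xhtml):
    """Package one XHTML page plus metadata into EPUB-style ZIP bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # OCF rule: "mimetype" must be the first entry and stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        # Escape metadata so characters like "&" stay well-formed XML.
        zf.writestr("OEBPS/content.opf",
                    PACKAGE_OPF.format(ident=escape(ident), title=escape(title)))
        zf.writestr("OEBPS/page1.xhtml", page_xhtml)
    return buf.getvalue()
```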

The Workspace lists all content a user has permission to edit. Books and Pages can be created and deleted in the Workspace.

OpenStax Accounts

Users are now stored in a shared accounts component, developed so that users can have the same account on all OpenStax sites. All CNX users have been migrated to the new Accounts. Accounts uses OAuth, so users will also be able to log in with Google, Twitter, or Facebook. CNX will no longer create its own user accounts; new users will need to log in with one of their existing OAuth accounts.
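The post does not say which OAuth version or flow is used, so the following is only a sketch of the standard first step a site takes with an OAuth 2.0 provider (the authorization-code flow, RFC 6749 section 4.1): redirect the browser to the provider with a random `state`, then check that the callback returns the same `state`. The endpoint URL, client id, and redirect URI are hypothetical.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical values for illustration only.
AUTHORIZE_URL = "https://accounts.example.org/oauth/authorize"
CLIENT_ID = "cnx-webview"
REDIRECT_URI = "https://cnx.example.org/callback"


def authorization_request(state=None):
    """Return (url, state) for redirecting a user to the provider's login page.

    The caller keeps `state` in the session and rejects any callback whose
    `state` does not match, which guards against cross-site request forgery.
    """
    state = state or secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "state": state,
    })
    return f"{AUTHORIZE_URL}?{query}", state


def callback_is_valid(callback_url, expected_state):
    """Check that the provider's redirect carries a code and our state."""
    params = parse_qs(urlparse(callback_url).query)
    return "code" in params and params.get("state") == [expected_state]
```

After a valid callback, the site would exchange the `code` for an access token at the provider's token endpoint.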

Logging

We currently log informational messages, errors, and user interactions to Syslog. Our goal is to load the logs into Graphite so we can visualize how the site is being used.
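Graphite's Carbon daemon accepts a simple plaintext protocol: one `metric.path value timestamp` line per data point, commonly sent over TCP to port 2003. A minimal sketch of turning user-interaction log records into per-minute counts in that format, assuming the records have already been read from Syslog as `(unix_time, message)` pairs; the `event=<name>` log convention, the `cnx.webview.events` prefix, and the Graphite host are hypothetical.

```python
import re
import socket
from collections import Counter

EVENT_RE = re.compile(r"\bevent=([A-Za-z0-9_.-]+)")   # assumed log convention
PREFIX = "cnx.webview.events"                          # hypothetical metric prefix


def graphite_lines(records, bucket_seconds=60):
    """Aggregate (unix_time, message) records into Graphite plaintext lines.

    Each line is "<metric path> <count> <bucket start>\\n", counting how
    many times each event occurred in each time bucket.
    """
    counts = Counter()
    for ts, message in records:
        match = EVENT_RE.search(message)
        if match is None:
            continue                      # not a user-interaction record
        bucket = int(ts) - int(ts) % bucket_seconds
        counts[(match.group(1), bucket)] += 1
    return [f"{PREFIX}.{name} {count} {bucket}\n"
            for (name, bucket), count in sorted(counts.items())]


def send_to_carbon(lines, host="graphite.example.org", port=2003):
    """Send plaintext lines to a Carbon listener over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))
```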

Transformation Services

Transformation Services generates export files (PDF, EPUB, Zip, etc.) and imports content into the editor. The initial design reuses the import and export code from Legacy CNX inside a messaging-system wrapper.
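Wrapping the legacy code means the new pieces only need to translate a queue message into calls to existing functions. A minimal sketch of such a wrapper, written as a consumer callback with the `(channel, method, properties, body)` signature that the pika RabbitMQ client passes to consumers; the `LEGACY_EXPORTERS` table, the comma-separated `formats` header, the `save` callable, and the stand-in exporter functions are all hypothetical.

```python
def legacy_pdf_export(epub_bytes):
    """Stand-in for the legacy PDF exporter."""
    return b"%PDF-placeholder"


def legacy_zip_export(epub_bytes):
    """Stand-in for the legacy complete-zip exporter."""
    return b"PK-placeholder"


LEGACY_EXPORTERS = {"pdf": legacy_pdf_export, "zip": legacy_zip_export}


def make_export_callback(save):
    """Build a consumer callback; `save(fmt, data)` stores each generated file."""

    def on_message(channel, method, properties, body):
        formats = (properties.headers or {}).get("formats", "")
        requested = [f for f in formats.split(",") if f]
        unknown = [f for f in requested if f not in LEGACY_EXPORTERS]
        if unknown:
            # Reject without requeueing so a bad request is not retried forever.
            channel.basic_nack(delivery_tag=method.delivery_tag, requeue=False)
            return
        for fmt in requested:
            save(fmt, LEGACY_EXPORTERS[fmt](body))   # body is the EPUB
        # Ack only after every export ran; RabbitMQ requeues unacknowledged
        # messages if the worker's connection drops mid-job.
        channel.basic_ack(delivery_tag=method.delivery_tag)

    return on_message
```

With pika 1.x, the callback would be registered via `channel.basic_consume(queue=..., on_message_callback=make_export_callback(save))`.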

Messaging uses RabbitMQ with several message queues, which give us persistence for file-generation requests. The requests will be sent by Publishing after content has been added to Archive. Publishing will pass Transformation Services an EPUB that contains all of the data needed to generate the files.
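On the sending side, RabbitMQ persistence comes from two settings working together: the queue is declared durable, and each message is published with delivery mode 2 (persistent). A minimal sketch with the pika client; the queue name, the comma-separated `formats` header, and the function names are hypothetical, and the post does not say which client library Publishing uses.

```python
EXPORT_QUEUE = "cnx.export-requests"   # hypothetical queue name


def export_request_properties(formats):
    """Keyword arguments for pika.BasicProperties describing one request."""
    return {
        "delivery_mode": 2,                      # mark the message persistent
        "content_type": "application/epub+zip",  # the body is the EPUB itself
        "headers": {"formats": ",".join(formats)},
    }


def publish_export_request(channel, epub_bytes, formats):
    """Queue a file-generation request after content has been added to Archive."""
    import pika   # third-party: pip install pika
    # durable=True lets the queue definition survive a broker restart.
    channel.queue_declare(queue=EXPORT_QUEUE, durable=True)
    channel.basic_publish(
        exchange="",                  # default exchange routes by queue name
        routing_key=EXPORT_QUEUE,
        body=epub_bytes,
        properties=pika.BasicProperties(**export_request_properties(formats)),
    )
```

Usage, assuming a broker on localhost: open `pika.BlockingConnection(pika.ConnectionParameters("localhost"))`, then call `publish_export_request(connection.channel(), epub_bytes, ["pdf", "zip"])`.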

OpenStax CNX is a deceptively complex site with many moving parts. Our goal with this architecture was a component-based system in which each part can be updated and tested without affecting the rest of the site. All of our code is on GitHub.

Many thanks to CNX team members Michael Mulich and Derek Kent for reviewing and contributing to this post.