shed - the shared editor

This past January, as part of my tutoring work, I put together a collaborative code editor to use with the people I’m working with. I had been using a combination of Pramp (2 hour time limit), codepad (buggy implementation) and Google Docs (no code highlighting, no code running) for remote sessions, but after running into enough hitches, I decided to self-host my own. A demo is available at https://code.algorithm.city

The project took about 2 weeks: the first few days were spent researching which algorithms are used for collaborative editing, followed by trying them out until I found one that worked well for me.

The short summary is that I used ot.js + CodeMirror + Docker to get a simple editor working. The rest of this post details some of the libraries I tried out and why they didn’t work for me.

Picking a synchronization algorithm

After reading through the answers on Stack Overflow, I started by trying out Automerge (a simple CRDT library) with the Ace editor. There were quite a few issues, but the biggest was that when synchronizing content, the whole editor would get reset and the cursor would jump around. The second major issue was that the CRDT needed the full history of an object to work properly, and storing each CRDT object in SQLite was a pain.

Realizing that I needed to insert content rather than reset the whole editor’s content, I moved on to Google’s diff-match-patch library. The benefit is that changes can be applied incrementally, retaining the client’s cursor position. I spent a while reading about diff-match-patch and watching Neil Fraser’s excellent Google Tech Talk on its implementation.
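To make the cursor point concrete, here is a minimal Python sketch of applying a change incrementally instead of resetting the editor. It is not the diff-match-patch library: it uses `difflib` from the standard library as a stand-in, and the `shift_cursor` helper is a hypothetical name chosen for illustration.

```python
from difflib import SequenceMatcher


def shift_cursor(old: str, new: str, cursor: int) -> int:
    """Map a cursor offset in `old` to the matching offset in `new`."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    offset = cursor
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if i2 <= cursor:
            # The edit ends at or before the cursor: shift by its size change.
            # (An insert exactly at the cursor pushes the cursor right.)
            offset += (j2 - j1) - (i2 - i1)
        elif i1 < cursor:
            # The cursor sat inside text that was replaced or deleted:
            # snap it to the end of the replacement.
            offset = j2
            break
    return offset


old_text = "print(x)"
new_text = "import sys\nprint(x)"   # the other person added a line on top
cursor = 5                          # local cursor sits right after "print"
new_cursor = shift_cursor(old_text, new_text, cursor)
```

After the remote edit lands, the local cursor still sits right after `print`, instead of jumping back to the start of the document.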

However, one of the weaknesses of diff-match-patch is that only one change can be in flight at a time. This made real-time editing a problem for me: you couldn’t see what the other person was typing as they typed, and the approach wouldn’t stand up to synchronizing every few seconds either.

After trying out repl.it’s multiplayer mode with a friend and seeing that it worked really well, I got a bit discouraged with diff-match-patch. I looked into what repl.it was using, saw that the data being sent over the wire looked like Operational Transformation (OT), and decided to try out OT myself.

I looked at ot.js, but found the documentation and examples hard to follow. After some searching, I ran across a good demo that I was able to adapt. The demo uses Socket.IO + CodeMirror to enable collaborative editing, as well as shared cursors and highlighting. Once I had a working demo, it was straightforward to adapt it to what I wanted to do.
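For readers who haven’t seen OT before, the core idea fits in a few lines. The following is a toy Python model that only handles inserts. It does not mirror the ot.js API, and the `Insert`, `apply_op` and `transform` names are made up for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int    # offset in the document where the text goes
    text: str
    site: int   # id of the client that made the edit, used to break ties


def apply_op(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after the concurrent edit `against`."""
    lands_first = against.pos < op.pos or (
        against.pos == op.pos and against.site < op.site
    )
    if lands_first:
        # The other insert sits in front of ours, so our offset moves right.
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


doc = "def f():"
a = Insert(0, "# hi\n", site=1)   # client 1 adds a comment at the top
b = Insert(8, " pass", site=2)    # client 2 types at the end, at the same time

# Each client applies its own edit first, then the transformed remote edit.
client1 = apply_op(apply_op(doc, a), transform(b, a))
client2 = apply_op(apply_op(doc, b), transform(a, b))
```

Both clients end up with the same text even though they applied the edits in a different order, which is the property that lets several changes be in flight at once.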

Running remote code

Aside from collaborative editing, the other main part of a collaborative environment is the ability to run the code in the editor. To do so, I used a simple Docker image based on Alpine Linux that runs the code inside a container.

The nice part of using Docker is that you can add a time limit, CPU quota and memory quota to a given Docker container, so a given process can’t starve the machine or hog RAM.
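As a rough illustration of those limits, here is a Python sketch that builds a `docker run` command with `--memory` and `--cpus`, and enforces the time limit from the calling side. The image name, the limit values and the function names are assumptions for the example, not shed’s actual configuration.

```python
import subprocess
import uuid


def docker_command(image, name, memory="128m", cpus="0.5"):
    """Build a `docker run` invocation with memory and CPU limits."""
    return [
        "docker", "run",
        "--rm",              # delete the container once it exits
        "-i",                # keep stdin open so a payload can be piped in
        "--name", name,
        "--memory", memory,  # cap on RAM
        "--cpus", cpus,      # CPU quota, as a fraction of one core
        image,
    ]


def run_sandboxed(image, payload, timeout_s=10.0):
    """Run the image with `payload` (bytes) on stdin, killing it if it runs too long."""
    name = "run-" + uuid.uuid4().hex[:12]
    try:
        return subprocess.run(
            docker_command(image, name),
            input=payload,
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The timeout only stops the docker client process, so the
        # container itself is killed explicitly by name.
        subprocess.run(["docker", "kill", name], capture_output=True)
        raise
```

A call such as `run_sandboxed("my-runner-image", b"print(1)")` would return a `CompletedProcess` with the container’s stdout and stderr, assuming an image by that (hypothetical) name exists locally.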

The code and input are passed to Docker via stdin, separated by a ^D character (ASCII 0x04). Inside the container, the code and input are split apart and handed to the code runner for that language, and stdout/stderr are then piped back out.
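The framing described above is simple enough to sketch in Python. The `pack` and `unpack` names are hypothetical, and the sketch assumes that neither the code nor the input contains a ^D byte of its own.

```python
EOT = b"\x04"  # the ^D character


def pack(code: str, stdin_text: str) -> bytes:
    """Join code and program input into one payload for the container's stdin."""
    return code.encode("utf-8") + EOT + stdin_text.encode("utf-8")


def unpack(payload: bytes):
    """Split a payload back into (code, input) at the first ^D."""
    code, _, stdin_text = payload.partition(EOT)
    return code.decode("utf-8"), stdin_text.decode("utf-8")


payload = pack("print(input())", "hello\n")
code, program_input = unpack(payload)
```

Using `partition` means a payload with no separator is treated as code with empty input, rather than raising an error.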

All in all, using Docker and VMs makes this task much easier than it used to be: previously, I would have had to figure out a safe way to sandbox code so it wouldn’t mess up the machine. Now I only have to worry about Docker exploits and whether someone can break out of the container (technically, they can if they use zero-day exploits, but I trust most people will not be able to).

Status

Currently, I use shed on a weekly basis for algorithms tutoring, and for the most part it works really well. One person I work with in India tends to have problems with it, though, and we haven’t figured out why: maybe it is latency, maybe it is their browser, maybe it is buggy code. I’m looking forward to ironing those bugs out. If you use shed and spot any bugs, please file them on GitHub. It will help me greatly.
