Version Control for Writers

Update: I’ve added some links to related ideas, applications, etc. I also have a follow-up post with more details about the features of version control software, from a writer’s point of view.

Every so often, when I’m between projects, I start thinking less about what I write and more about how I write. One of my ongoing projects is to digitize a lifetime’s worth of my grandmother’s writing, so I’m also thinking about how she wrote.

Multiple Drafts

For each of her short stories, for example, I have inherited multiple versions: each one typed out (often in duplicate) and many with dates. This makes it very easy for me, in posterity, to follow the development of her work, find the most recent versions and so on. I wonder: how might I organize my work in a similar, computerized way? It seems to me that the old “save as” trick is not much more efficient than my grandmother’s habit of using carbon paper. In 2013, surely there are some more sophisticated tools for storing and comparing multiple drafts, or versions, of a written document. Might I use those tools to study her work, or to keep track of my own? Wouldn’t others want tools like these? Authors, editors, literary scholars, archivists: all eventually have to work with multiple versions of a text.

It’s called “Version Control.”

A recent introduction to version control says:

What is version control, and why should you care? Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later. Even though the examples show software source code as the files under version control, in reality any type of file on a computer can be placed under version control.
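To make “records changes” a little more concrete, here is a minimal sketch of the comparing half of the job, using Python’s standard difflib module. It is only an illustration, not how Git or any other version control system works internally; the function name compare_drafts and the two sample drafts are made up for the example.

```python
import difflib


def compare_drafts(old_text, new_text, old_name="draft-1", new_name="draft-2"):
    """Return a unified diff (a list of lines) between two drafts.

    The draft names are hypothetical labels used only in the diff header.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    # lineterm="" because splitlines() already removed the line endings
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=old_name, tofile=new_name, lineterm=""
        )
    )


# Two invented drafts of the same opening lines
first_draft = "It was a dark night.\nThe rain fell.\n"
second_draft = "It was a dark and stormy night.\nThe rain fell.\n"

for line in compare_drafts(first_draft, second_draft):
    print(line)
```

Lines that begin with a minus sign appear only in the older draft, and lines that begin with a plus sign appear only in the newer one. A version control system adds the storing half: it keeps every draft, along with who changed what and when.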

Not every writer cares about version control. English fantasy writer Terry Pratchett said, “I save about twenty drafts — that’s ten meg of disc space — and the last one contains all the final alterations. Once it has been printed out and received by the publishers, there’s a cry here of ‘Tough shit, literary researchers of the future, try getting a proper job!’ and the rest are wiped.” I believe, to the contrary, that some writers, lawyers for example, should not be so inclined to say “tough shit” to the readers of the future who might wish to know how some documents may have evolved, but I digress. (I could digress further with: Leaves of Grass or Piers Plowman or William Blake.) My point is that if it became easier to manage and store the drafts, then perhaps more writers would be inclined to do so, and there would be less “tough shit” later on, for anyone interested in the writer’s process.

If you’re completely new to the idea of version control, you may benefit from reading Tom Preston-Werner’s fable/introduction to Git, which is a popular kind of version control.

Flashbake

I am absolutely not the first writer to think about computerized version control. (In coming days, I’ll add to this post with links to similar conversations.) In fact, I owe a lot of my thinking on the subject to Cory Doctorow (whose blog’s comments provided the above Pratchett quote) and to Thomas “cmdln” Gideon (host of the fabulously nerdy Command Line podcast). Together, they are the authors of a piece of software called “Flashbake”. Doctorow’s post on Boing Boing about Flashbake provides an excellent introduction to the software, its strengths and weaknesses and the motivations behind its creation.

I was prompted to do this after discussions with several digital archivists who complained that, prior to the computerized era, writers produced a series complete drafts on the way to publications, complete with erasures, annotations, and so on. These are archival gold, since they illuminate the creative process in a way that often reveals the hidden stories behind the books we care about. By contrast, many writers produce only a single (or a few) digital files that are modified right up to publication time, without any real systematic records of the interim states between the first bit of composition and the final draft.

In another article, Doctorow elaborated on the many other benefits he enjoys while writing with version control.

Now, this may be of use to some notional scholar who wants to study my work in a hundred years, but I’m more interested in the immediate uses I’ll be able to put it to — for example, summarizing all the typos I’ve caught and corrected between printings of my books. Flashbake also means that I’m extremely backed up (Git is designed to replicate its database to other servers, in order to allow multiple programmers to work on the same file). And more importantly, I’m keen to see what insights this brings to light for me about my own process. I know that there are days when the prose really flows, and there are days when I have to squeeze out each word. What I don’t know is what external factors may bear on this.

In a year, or two, or three, I’ll be able to use the Flashbake to generate some really interesting charts and stats about how I write: does the weather matter? Do I write more when I’m blogging more? Do “fast” writing days come in a cycle? Do I write faster on the road or at home? I know myself well enough to understand that if I don’t write down these observations and become an empiricist of my own life that all I’ll get are impressionistic memories that are more apt to reflect back my own conclusions to me than to inform me of things I haven’t noticed.

Gideon provided some even more detailed notes about the Flashbake software on his blog and podcast.

So, why don’t I just install this “flashbake” software and simply move on with my writing and my projects? That’s a fair question. The trouble is that version control is pretty complex compared to the average word processor. As a writer whose day job is managing websites, I’m up to the challenge, but while I’m at it, I wonder whether I can do anything to make this easier for others. The Lifehacker article about Flashbake gives a “nerd alert” before listing some substantial knowledge prerequisites:

Flashbake is a command-line system for advanced users, and requires a Linux-like shell like Cygwin for Windows or Mac OS X’s built-in Terminal. It is most definitely not for folks looking for something like Microsoft Word’s versioning. It is, however, for people who make heavy use of plain text files, don’t mind firing up the terminal and running a script or two, and know what cron is. Since Flashbake is an interface to Git written in Python, you’ll need all three installed to get this party started.
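To give a sense of what those prerequisites add up to, here is a rough Python sketch of the general idea: stage everything in a writing folder and commit it to Git with a timestamped message, on a schedule. This is not Flashbake’s actual code. The function names, the script name and the folder path in the comments are my own inventions, and the sketch assumes the folder is already a Git repository.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def snapshot_commands(message):
    """Build the two git commands that make up one snapshot."""
    return [
        ["git", "add", "--all"],          # stage every new, changed or deleted file
        ["git", "commit", "-m", message],  # record the staged state as a new version
    ]


def take_snapshot(folder):
    """Commit the current state of `folder` (assumed to be a Git repository)."""
    message = "Automatic snapshot " + datetime.now().isoformat(timespec="minutes")
    for command in snapshot_commands(message):
        # check=False: "git commit" exits with an error when nothing has changed,
        # which is normal for a scheduled job and not worth crashing over
        subprocess.run(command, cwd=Path(folder), check=False)


# Hypothetical usage, from a script run by cron every 15 minutes:
#   take_snapshot("/home/me/writing")
# with a crontab entry such as:
#   */15 * * * * python3 /home/me/bin/snapshot.py
```

Even this toy version needs Git, Python, a terminal and cron, which is exactly the list in the “nerd alert” above.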

Why don’t I use one of the many workarounds available out there? Because most of them are for linear workflows, or they don’t store enough versions, or they’re 100% cloud-based and I’m afraid they’ll go offline and take my work with them when they go.

Surely there’s a middle way that balances power, usability and (long-term) access to the documents.


I’ve posted this idea in a few other places as well…

Useful links I found along the way…

Writing in Version Control — Andy Taylor
The Limitations of GitHub for Writers – ProfHacker – The Chronicle of Higher Education
Simplifying Writing Workflow » Linux Magazine
AmigoFish: The Command Line Podcast (MP3 Feed): TCLP 2009-02-25 Flashbake (Comment Line 240-949-2638)
OpenOffice.org Document Version Control With Mercurial : David R. Heffelfinger
iPhylo: Setting up a local Wikisource
BioStor
FromThePage
Words and what not
Keeping Your Life in Subversion – O’Reilly Media
msofficehg – Microsoft Office (Excel, Word, PowerPoint) add-ins that assist document version control with TortoiseHg – Google Project Hosting
Version Control for Microsoft Word Collaborative Writing | MacResearch
Inundata – How to ditch Word
Sublime Text 2 and Markdown: Tips, Tricks, and Links
vcs-home
Working with Git on Windows • Beanstalk Guides
Git Magic – Preface
Inside the Leviathan – James Fallows – The Atlantic

13 responses to “Version Control for Writers”

  1. You should ditch the wordprocessor. It is only slowing you down. Take the time to learn git. I bet you can use it very successfully using only git add, git commit, and git revert. Use it with a GUI, if you’d like.

    I’ve been writing for several years using markdown + git and I couldn’t be happier. Plain text is the only future-proof solution.

  2. Mica,

    I completely agree with you that plain text is the only future-proof solution. For my part, I think I will ditch the wordprocessor and learn git. I learned markdown in all of ten minutes, since I’m already very familiar with HTML. (Besides, there are lots of great markdown-friendly apps out there.) My concern, though, is for what is arguably the majority of writers out there, for whom it just isn’t practical to ditch the wordprocessor. What about working with their writings to compare versions, etc.? Is one of the word processor file formats better for Git than the binary formats?

  3. Why do you say that “plain text is the only future-proof solution”? I’m not challenging that; I’m just curious.

    At any rate, Markdown *isn’t* plain text. Despite the punny name, it’s a markup language, just like HTML or wiki markup or what-have-you. It happens to be fashionable and well supported at the moment, but that’s no guarantee that it won’t fall by the wayside as technologies/needs/fashions change.

  4. I meant file format. XML, HTML and many newer word processing formats all boil down to a plain text file.

    As for archival-quality markup, you’re exactly right. I’m not sure there’s a gold standard. There are some candidates, such as TEI XML, but again those prove difficult for average writers.

  5. Anyway, Markdown is pretty limited. ASCII only means no diacritics or non-Latin characters. That rules out most languages other than English—heck, it rules out a lot of English words (if you want to spell them properly, anyway). And en-dashes and em-dashes. And typographical quotes. And even basic mathematical symbols. And lots of other things that make printed language readable and understandable.

    If those aren’t a problem for your writing process, I’m sure it can be useful as a workflow tool. But I don’t see it as an appropriate format for a final edit ready for layout.

    Then again, I’m definitely no Markdown expert, so I would be happy to be proven wrong.

  6. Interesting; I didn’t know that.

    As a writing tool, Markdown certainly seems worth a try—you never know for sure what’ll work until you try it.

    You might want to have a look at this book, which is on GitHub and written in Markdown (which can be “compiled” to HTML, ePub, etc.). It’s a technical book, and collaboratively edited, so probably not a perfect analogue for the application you have in mind—but it was my first real exposure to a book being written that way, and I thought it was pretty cool.

    (It’s also quite a good introduction to Backbone; I’ve been working my way through it.)

  7. Yeah, ASCII-only is a bad idea in 2013. Fortunately, John Gruber has known that for a long time, so current markdown tools can simply use Unicode (UTF-8). Problem solved.

  8. His introduction is enjoyably themed, and the “day” structure he uses for his explanation might be a really great jumping-off place for further exploration about the questions I’m raising here.

  9. So I’m giving Markdown a try and so far, so good… so much so that I thought: what if I wanted to convert years’ worth of .doc, .docx and .rtf files to plaintext or markdown? I definitely want to do that in a batch. Textutil for OS X does a great job of converting to .txt, but what about markdown? I’m learning Pandoc, but it doesn’t like my non-UTF-8 documents 🙁 I’ll keep plugging at it.
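A note on the encoding problem in that last comment: Pandoc expects UTF-8 input, so one possible approach is to re-encode older plain-text files before converting them. The Python sketch below is only a starting point. It assumes the files are already plain text (for example, .txt output from a converter) and that anything that isn’t valid UTF-8 was saved as Windows-1252 or Latin-1, which is a guess that may not hold for every document. The function names and folder layout are hypothetical.

```python
from pathlib import Path

# Encodings to try, in order. This list is an assumption about where the
# files came from; adjust it for your own documents.
FALLBACK_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def decode_with_fallback(raw):
    """Decode bytes using the first encoding in the list that works."""
    for encoding in FALLBACK_ENCODINGS:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Not reached in practice: latin-1 can decode any sequence of bytes
    raise ValueError("could not decode text")


def convert_folder_to_utf8(source, destination):
    """Copy every .txt file in `source` to `destination`, re-encoded as UTF-8.

    Returns the number of files written. The originals are left untouched.
    """
    source, destination = Path(source), Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(source.glob("*.txt")):
        text = decode_with_fallback(path.read_bytes())
        # write bytes directly so line endings are preserved exactly
        (destination / path.name).write_bytes(text.encode("utf-8"))
        count += 1
    return count
```

Writing the converted copies to a separate folder keeps the original files safe in case a guessed encoding turns out to be wrong.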
