Useful Features of Version Control Software

This is a follow-up to my earlier post entitled Version Control for Writers.

An article called “Git Foundations” has helped me a great deal in understanding version control from a writer’s point of view. I’m testing out some different ways to use version control for my writing, but first I’d like to describe some of the basic concepts. (For a description of the basic commands in Git, one common version control system, take a look at “Git for the Lazy.”)

I’m writing this for two audiences. First, writers who want to know what they could gain from using version control software. Second, programmers who want to understand the contexts in which a writer might use the tools they’ve built.

I’m new to this, so if anyone notices mistakes or confusing bits, please point them out. My hope is that once I properly understand the features, I can apply them wisely.

Features

I’ve learned that these are the basic features of most version control systems. I’ll describe each in terms of how I think it would be helpful for writers. In some cases, this might differ a bit from the original purpose of the functionality.

Repository
For a writer, it might be easier to think of this as your “project.” It’s basically a group of files and directories related to something you’re working on. As you work, you might add to your files, add new files, delete files, rename them and so on. Version control can help you keep track of the history of these changes as you go. Your repository will also contain the archived, previous versions of your project, along with information about the changes you made over time. You can also use version control to undo changes, or to merge material from one version of a file into another.
Commit
You may also hear this called a “snapshot.” To commit is to store the current contents of the project, along with a log message from the user describing the changes. I think that for a writer there’s a difference between “hitting save” and “taking a snapshot.” Many of us hit save all the time, but that doesn’t mean we’ve reached an important milestone. The nice thing about a commit’s log message is that, when we’re ready to commit, we can also make a note of what we were working on. Example messages might be: “I fixed all the verb tense agreements,” “I changed a character’s name from Darby to Darcy,” or “Responded to editor’s comments from an e-mail I got yesterday.” These log messages go a long way toward explaining the archive when you come back to it later.
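To make the difference between saving and committing concrete, here is a toy model in Python. It is a simplification, not Git’s actual storage format, and `Snapshot` and `ToyRepository` are hypothetical names made up for illustration. The idea it shows: each commit keeps a full copy of the files, a log message, and a link to the commit before it, so the log messages read back like a history of the project.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Snapshot:
    """Toy commit: a full copy of every file, a log message, and the previous snapshot's id."""
    files: Dict[str, str]
    message: str
    parent: Optional[str] = None

    @property
    def id(self) -> str:
        # Derive a short identifier from everything the snapshot contains.
        h = hashlib.sha1()
        for name in sorted(self.files):
            h.update(name.encode() + b"\0" + self.files[name].encode() + b"\0")
        h.update(self.message.encode() + b"\0" + (self.parent or "").encode())
        return h.hexdigest()[:12]


class ToyRepository:
    """Hypothetical helper that keeps every snapshot and remembers the newest one."""

    def __init__(self) -> None:
        self.snapshots: Dict[str, Snapshot] = {}
        self.head: Optional[str] = None

    def commit(self, working_files: Dict[str, str], message: str) -> str:
        # Copy the working files so later edits cannot rewrite stored history.
        snap = Snapshot(dict(working_files), message, self.head)
        self.snapshots[snap.id] = snap
        self.head = snap.id
        return snap.id

    def log(self) -> List[str]:
        # Walk from the newest snapshot back to the first, newest message first.
        messages = []
        current = self.head
        while current is not None:
            snap = self.snapshots[current]
            messages.append(snap.message)
            current = snap.parent
        return messages


repo = ToyRepository()
working = {"chapter1.txt": "Darby walked into the room.\n"}
first = repo.commit(working, "First draft of chapter 1")

working["chapter1.txt"] = "Darcy walked into the room.\n"
repo.commit(working, "Changed a character's name from Darby to Darcy")

for message in repo.log():
    print(message)
# Changed a character's name from Darby to Darcy
# First draft of chapter 1

print(repo.snapshots[first].files["chapter1.txt"], end="")
# Darby walked into the room.
```

Hitting save would simply overwrite `working`; only a commit puts a dated-in-sequence copy, with its note, into the archive.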
Working Files
These are the files you’re currently working on, as opposed to some previous or alternate version of the files. Those are saved elsewhere in your repository.
Branch
Any copy of your project is called a “branch.” Your working copy of a project is a branch. Let’s say you want to start working on a second edition of your book: that’s a “branch” of your book. Your old first draft: that’s a branch, too. Your editor’s marked-up copy of your text is a branch. Version control lets you move from branch to branch, like the smart monkey that you are.
Tag
Any copy of your project is called a branch, but a “tag” is a special “version.” In software, a tag is often named with a [version number](http://en.wikipedia.org/wiki/Software_versioning#Incrementing_sequences) like 1.0 or 2.2.5. For a writer, it might help to think of this as a “milestone” for an unpublished work or even an “edition” for a published work. For example, perhaps you sent a manuscript off to an editor to be considered for publication. That copy is a special version, or tag. Nevertheless, it may not get published, and you may want to keep working on a new copy, or branch, of it while it is being considered.
Diff
Remember in English class when your teacher gave you two poems and asked you to read them and write an essay comparing and contrasting them? Perhaps in a more advanced class, you were given two drafts, editions or translations of the same text and asked to do the same. In those cases, the differences can be quite granular. Luckily, programmers have already devised tools that make this comparison easy, without having to re-read both versions and without writing a whole essay about it. The differences are clearly marked for your consideration. Some word processors can already compare one document with another, but diff can also compare whole groups of files.
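For anyone curious what such a comparison looks like, Python’s standard `difflib` module produces the same kind of line-by-line report. This is a sketch under stated assumptions: `diff_project` is a hypothetical helper, and each version of the project is assumed to be a dictionary mapping file names to plain text, standing in for a folder of text files.

```python
import difflib
from typing import Dict


def diff_project(old: Dict[str, str], new: Dict[str, str]) -> str:
    """Hypothetical helper: unified diff of every file across two versions of a project."""
    lines = []
    # Visit every file name that appears in either version, in a stable order.
    for name in sorted(set(old) | set(new)):
        before = old.get(name, "").splitlines()  # a missing file counts as empty
        after = new.get(name, "").splitlines()
        lines.extend(
            difflib.unified_diff(
                before, after,
                fromfile="old/" + name, tofile="new/" + name,
                lineterm="",  # lines carry no trailing newline; we join with "\n" below
            )
        )
    return "\n".join(lines)  # unchanged files contribute nothing


draft_1 = {
    "chapter1.txt": "Darby walked into the room.\nShe was tired.\n",
    "chapter2.txt": "The rain had stopped.\n",
}
draft_2 = {
    "chapter1.txt": "Darcy walked into the room.\nShe was tired.\n",
    "chapter2.txt": "The rain had stopped.\n",
}
print(diff_project(draft_1, draft_2))
# --- old/chapter1.txt
# +++ new/chapter1.txt
# @@ -1,2 +1,2 @@
# -Darby walked into the room.
# +Darcy walked into the room.
#  She was tired.
```

Lines starting with `-` were removed, lines starting with `+` were added, and lines starting with a space are unchanged context. The untouched `chapter2.txt` does not appear at all, which is what makes diffing a whole project practical.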
Merge
You’ve given out six copies of your draft to your writers’ group. Each member of the group made different marks on a different copy (or branch). Now you have to combine the edits, or decide not to combine some of them, into the working copy of your project. With the help of diff, this can be done more efficiently: when there’s a change you want to adopt, you can merge it into your working copy.
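Real merge tools work from diffs and cope with added and deleted lines. The sketch below is a much simplified stand-in that assumes reviewers only rewrote existing lines in place, so line 3 means the same spot in every copy. `merge_copies` and the reviewer names are hypothetical. It adopts a change when exactly one distinct edit was made to a line, and flags the line for you to decide when reviewers disagree.

```python
from typing import List, Tuple


def merge_copies(original: List[str], copies: List[List[str]]) -> Tuple[List[str], List[int]]:
    """Hypothetical helper: fold reviewers' line edits into one merged draft.

    Toy assumption: reviewers only rewrote lines in place (none added or
    removed), so line i is the same spot in every copy.
    Returns the merged lines and the 1-based line numbers that need a decision.
    """
    for copy in copies:
        if len(copy) != len(original):
            raise ValueError("this sketch only handles copies with the same number of lines")

    merged: List[str] = []
    conflicts: List[int] = []
    for i, base_line in enumerate(original):
        # Collect the distinct changes reviewers made to this line.
        edits: List[str] = []
        for copy in copies:
            if copy[i] != base_line and copy[i] not in edits:
                edits.append(copy[i])
        if not edits:
            merged.append(base_line)   # nobody touched it
        elif len(edits) == 1:
            merged.append(edits[0])    # one change (possibly made by several people): adopt it
        else:
            merged.append(base_line)   # reviewers disagree: keep the original for now
            conflicts.append(i + 1)    # and flag the line for you to decide
    return merged, conflicts


original = ["Darby walked in.", "She was tired.", "The rain stopped."]
alice = ["Darcy walked in.", "She was tired.", "The rain stopped."]
ben = ["Darby walked in.", "She was exhausted.", "The rain stopped."]
cara = ["Darby walked in.", "She was weary.", "The rain stopped."]

merged, conflicts = merge_copies(original, [alice, ben, cara])
print(merged)     # ['Darcy walked in.', 'She was tired.', 'The rain stopped.']
print(conflicts)  # [2]
```

Flagging rather than guessing mirrors what Git does when two branches change the same lines: it reports a conflict and leaves the choice to you.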

Limitations

Both word processors and version control systems have some limitations.

Track Changes and “Save As”
There’s a lot more information online about the limitations of Microsoft Word and of word processors in general, but for now, suffice it to say that they have only very simple version control abilities: you can track (linear) changes, you can “save as,” and you can compare two versions at a time, but without much information about the relationship between them.
Text-Only
The version control functionality described above works best on files that are text only. (There are exceptions to this, but I don’t understand them yet.) The files your word processor saves by default, although they contain text, are not stored in a text-only format. I venture to guess that 99% of writers aren’t willing to give up their word processor and its format, which is often required of them by their editors and publishers. For more about this limitation, read “The Limitations of GitHub for Writers.”

Next Steps

My goal is to find a workable, user-friendly way to use these powerful tools as part of my writing process.

This week I’ve been experimenting with different tools designed for version control. I’ve tried two of them, Flashbake and Git-Annex Assistant. I’d like to describe my experience with these, from a writer’s point of view.

Once I understand the basic features, I’ll post about the different types of users who work with electronic documents, including writers, editors, publishers and archivists. Then I can try to assess how well these features work for each of them.

5 responses to “Useful Features of Version Control Software”

  1. I stumbled across your posts on Git via your tweet. I’m thrilled to see a writer using Git. I’m a programmer who loves Git and thinks a lot of people miss out on its power.

    You might take a look at Pandoc; it’s an open-source markdown converter that supports converting to more than just HTML. To quote the documentation, “it can write plain text, markdown, reStructuredText, XHTML, HTML 5, LaTeX (including beamer slide shows), ConTeXt, RTF, DocBook XML, OpenDocument XML, ODT, Word docx, GNU Texinfo, MediaWiki markup, EPUB (v2 or v3), FictionBook2, Textile, groff man pages, Emacs Org-Mode, AsciiDoc, and Slidy, Slideous, DZSlides, or S5 HTML slide shows.” That list makes me feel better about lock-in. Oh, and notice that it can convert markdown to Word docs (both kinds!) and RTF. Maybe that would make it more acceptable in place of Word?

    Another tool you might take a look at would be LyX. It’s a GUI-based, cross-platform application that works with LaTeX, which is another document markup format that’s common in mathematics. Handy thing is, LaTeX is text-based, though not as pretty as markdown, and so can be version controlled. I’ve not used LyX much myself (as I tend to either write Pandoc markdown or straight LaTeX), but it might be interesting.

    Your comments on tags didn’t sound exactly like the way that I would have used them. I don’t think that you’d need a branch after you tag a version. You could keep working on the current branch, and only if revisions came in would you need to branch off the tagged version. I’m not sure if you know, but git branch can branch from any commit, not just the most recent.

    Thanks for the posts. It’s good to hear about how other people (non-programmers) use Git.

  2. Andy,

    Thanks for your informative comment! I’m considering LyX as an alternative to Word. I’m glad you pointed it out.

    I do think that some combination of Pandoc and Git would make it very easy for a writer to take full advantage of version control. For this, and also if one were to switch to software like LyX: is there a way to batch-convert lots of old Word docs into one of the Pandoc formats? Do they have to be .docx files or can Pandoc work with .doc files?

  3. Andy, thanks for coming back around!

    I gave up on LyX pretty quickly, actually. It took far too long to start up on my machine compared to a simple text editor. I also started to worry about conversion to other formats, and I thought LaTeX might be overkill for me. I’m already very proficient with HTML, and for now that suits me nicely.

    I did try the Pandoc route (via Textutil). I had some trouble learning to do the conversion in large batches, but after a night or two I think I’ve managed to scrape together a semi-workable solution. It worked for me, anyway. Others in my situation? Maybe they’d want something much friendlier. I added a new post with details here:
    http://nocategories.net/?p=2890

    My (sloppy, novice) code is up on GitHub now, as a way for me to learn that as well: https://github.com/dylan-k/Palabra

    So far, I find Markdown useful, but I’m having trouble choosing among all the “flavors.” I’m fond of the metadata headers provided by MultiMarkdown, but my poetry files would prefer GitHub-style markdown over the default. Adding two extra spaces at the end of every manually broken line for hundreds of poem files? No thanks, but I’ll do it if I have to. I’m unsure: how do you indicate which flavor of markdown you’ve chosen, or does it matter?

    For GUI software, I’m trying Sublime Text, since it is cross-platform and seems like something that can grow along with me thanks to all its packages. I’m also tinkering with Byword, Ulysses and Scrivener, but I seem to be gravitating toward Sublime Text.

    I’m starting to agree with you that my interpretation of tags and branches here may be a bit off the mark. I’ve run into an interesting use case, and now I’m unsure whether to use branches, forks, subrepositories, subtrees, selective checkout or something else entirely!

    I have, for example, a repository that contains 200 .txt files, each holding the text of one poem. Now, I’d like to set aside 20 of those files to create a manuscript. While I’m working with those 20 files and thinking of them as a group, I may make changes to them. I would like, of course, for those changes to be reflected in my “master” repository of all 200 files. Is there a way to manage such a thing with Git?
