Archiving old GIT projects on Beanstalk or Github

EDIT: re-written to make it much clearer what I’m trying to achieve, and why.

Why Archive? Why not Archive?

The pricing model for hosted git has settled down to:

  1. Pay per month
  2. …for an upper limit on the number of active repositories
  3. ……measured by simultaneously limiting the number of “projects” (separate git repositories) and “people” (user accounts that are allowed to access projects)

Their aim seems to be: charge for peak concurrent usage, rather than for total historical usage.

For instance: if you have 20 user accounts allowed, and you use all of them, then delete 10, you can create 10 new ones. The vendor will NOT delete all the history of the deleted accounts – they just won’t allow you to login as those users any more.

This is probably set up that way to make sure:

  1. their revenue scales with their costs – these days, with scalable hardware costs, that’s straightforward.
  2. their prices scale with the budget-size of their customers

Some SAAS vendors selling at the same kind of price level / model allow this same “disabling” function on whole projects, not just on people. That enables you to e.g.:

  1. Work on 1 new project day-to-day
  2. Have 10 “old” projects that are no longer active (previously shipped)
  3. Reserve the right to temporarily activate any ONE of the 10 – e.g. to enact a quick bugfix / maintenance release
  4. …while only paying for “2 simultaneous projects”

This works fine – the resource usage is closer to that of a company with only 2 active projects than it is to a company with 12 active projects, and the price you’re able to pay is too.

Unfortunately, at the moment neither Beanstalk nor Github offers this – although they’re both great git-hosting services.

Archive options

In practice, we need to support these use-cases:

  1. Old project that MIGHT still be around in a local repository needs small tweaks and a quick re-launch: typically only 1-3 files need to be edited.
  2. Very old project that definitely isn’t in a local repository any more: ditto
  3. New project needs to solve a problem that was previously solved in an old project: need read ONLY access to the full project to revise “how we fixed this last time”

Before signing up to a git host, I asked each of them how they coped with these use-cases (in some detail); each company responded with, essentially:

We don’t have any support for this. Best we can suggest: copy all old projects into a single repository

Archiving git projects via copy/paste

What happens when you try to do this?

Well, for a start, you can’t “just copy” the contents of one git project into another.

Git uses hidden directories to manage its source control, and is hugely reliant upon them. This causes a handful of problems relating to “is file X still file X?”, one of which is this problem of copying between repositories.

Naively, you’d try to do this:

  1. Create a global “archive” repository where you will move *all* old projects (this was initially recommended to us by Beanstalk/Github)
  2. PULL the latest copy of the git repos you want to archive
  3. MOVE the root directory into your “archive” repository
  4. PUSH the “archive” repository to the git-host

In practice, what happens is:

  1. Every modern Operating System moves the .git hidden directory too
  2. …so you fail to do a checkin (and at this point: a lot of the current Git GUI clients will break in interesting ways; it’s a good test for a new client if you’re considering buying one)
  3. …so you fail to do the PUSH
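
A quick way to check whether this has happened to you: look for .git directories nested below the top level of the “archive” repository. This is only a minimal sketch; the function name `find_nested_git` is my own invention, and it assumes a `find` that supports `-mindepth` (the GNU and BSD versions both do).

```shell
#!/bin/sh
# Hypothetical helper: print every .git directory nested inside a repository.
# The archive's own top-level .git sits at depth 1, so -mindepth 2 skips it.
find_nested_git() {
    archive_root=$1
    # -print reports the match; -prune stops find descending into it.
    find "$archive_root" -mindepth 2 -type d -name .git -print -prune
}
```

Calling `find_nested_git path/to/archive` prints one line per project that was moved in with its history still attached; no output means nothing was found.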

Copy/pasting by discarding the history

EDIT: if git-archive works for you, I’d use that instead. Everyone I know who’s used it has had at least some problems, so I’m leaving this section for now – but scroll down to the next section and check git-archive too.

On anything unix-based (i.e. linux + OS X) the simple path is to use the command-line (or “terminal” as OS X calls it)

  1. “cd [the root directory of your project you want to archive]”
  2. “cd ..”
  3. “cp -R [the directory of project to archive] [the root directory of the “archive” repository]/[name of the project you want to archive]”
  4. “cd [the root directory of the “archive” repository]/[name of the project you want to archive]”
  5. “chmod -R u+rw .git” (git marks many of its internal files read-only; otherwise rm asks you to say “yes” to every individual file delete)
  6. “alias rm=rm” (replaces any “rm -i” alias your shell has set up, which would also make you say “yes” to every individual file delete)
  7. “rm -R .git”
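
The steps above can be rolled into one small shell function. Treat this as a sketch rather than a tested tool: the name `archive_snapshot` and both arguments are hypothetical, and it uses “rm -rf” in place of the chmod and alias steps, which is a substitution on my part and not the exact sequence listed above.

```shell
#!/bin/sh
# Hypothetical helper: copy a project into the "archive" repository and
# strip its git history from the copy.
archive_snapshot() {
    project_dir=$1      # working copy of the project to archive
    archive_root=$2     # working copy of the shared "archive" repository
    name=$(basename "$project_dir")
    dest="$archive_root/$name"

    # Refuse to overwrite a project that was archived earlier.
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi

    # Copy rather than move, so the original repository stays intact.
    cp -R "$project_dir" "$dest" || return 1

    # Remove the copied history; -f skips the prompts that git's
    # read-only files would otherwise trigger.
    rm -rf "$dest/.git"
}
```

Something like `archive_snapshot ~/work/old-project ~/work/archive` (both paths made up) would leave a history-free copy at `~/work/archive/old-project`, ready to commit.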

…then, in your git-client:

  1. COMMIT the “archive” repository

…then, in your git-host service:

  1. DELETE the old project

The key points here:

  1. You’re copying the repository, not moving it – so the original is unaffected (if you don’t have to delete it yet, you might as well leave it intact)
  2. You’re removing all git status from the files: it becomes a virgin archive
  3. DISADVANTAGE: you’re throwing away (deleting) all history for the old project.

Using Git Archive

Git archive isn’t perfect (archiving is a complex task, and from what I’ve seen git archive doesn’t cover every use-case). I’ve met a couple of people who’ve tried it and given up (e.g. because it didn’t support submodules), but it might work for you: worth a try.
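
If you want to try it, here is a rough sketch of how it could slot into the same “archive” repository workflow: export the files of the latest commit as a tar stream and unpack them straight into the archive. The function name `export_with_git_archive` and its arguments are hypothetical, and I’m assuming a git recent enough to support the `-C` option. Submodule contents are not included in the export, which matches the complaint above.

```shell
#!/bin/sh
# Hypothetical helper: export the tree at HEAD (no .git directory, no history)
# into a sub-directory of the "archive" repository.
export_with_git_archive() {
    project_dir=$1      # working copy of the project to archive
    archive_root=$2     # working copy of the shared "archive" repository
    name=$(basename "$project_dir")
    mkdir -p "$archive_root/$name" || return 1

    # git archive writes a tar stream of the committed files;
    # tar unpacks that stream into the destination directory.
    git -C "$project_dir" archive --format=tar HEAD |
        tar -x -f - -C "$archive_root/$name"
}
```

Unlike the copy approach, this only exports files that are actually committed, so untracked and ignored files in your working copy are left behind.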

Other alternatives

In future, I’m going to try out some of the many other alternatives listed on the SO page linked above.

7 thoughts on “Archiving old GIT projects on Beanstalk or Github”

  1. Hidden files are apparently a very successful idiot proofing device since they appear to have completely baffled you. Seriously, as barriers to entry go this one is barely even there and yet you still manage to stumble over it.

    You want to know how to archive a git repo? `tar czf`. Unarchive? `tar xzf`. Done. This is *nix. It’s not rocket science.

    And… Linus is ignorant or lacking in skill? What… I don’t even… That’s the most painfully, facepalmingly ironic thing I’ve heard in a very long time.

  2. “Hidden files are apparently a very successful idiot proofing device since they appear to have completely baffled you.”

    If that’s how this comes across, then this post needs re-writing.

    “You want to know how to archive a git repo? `tar czf`.”

    That gives you a file list that can’t be looked at, files that can’t be pulled out, and changes that can’t be diff’ed. If you receive a *possible* bug report against an old project, you want to do these things first, quickly.

    Copying the repos preserves all of those, because you still have the complete file structure on your website easily navigable and diffable (and copy/pasteable) direct onto other projects, or into an IDE window.

    I’ll re-write the blog post, make this clearer, since I’ve obviously failed to explain properly.

  3. To give a concrete example, we have a recent project which contains multiple gigabytes of source assets. Having to download, de-archive, and then delete that much data just to e.g. view a single text file … would be an exercise in frustration.

    In similar circumstances, I’ve worked on one which had 20+ gigabytes of source assets, all of which had to be in source-control.

    “Unarchive? `tar xzf`. Done. This is *nix. It’s not rocket science.”

    Theoretically yes, but in a lot of situations it’s not really helpful.

  4. “That gives you a file list that can’t be looked at, files that can’t be pulled out, and changes that can’t be diff’ed.”

    Actually, you can do all of those things with tarballs. You should really make at least a passing attempt to learn about something before you make claims about it. When you don’t, as evidenced here, you just sound completely ignorant. Are you sure you’re a programmer? Why are you using git if you can’t be bothered to obtain a basic competence in it or the linux philosophy on which its design is based?

    If you’re not archiving to save space as well as github private repo slots, just put your entire git directories somewhere. That’s it. One step. If you want, you can even host them yourself with gitolite, gitosis, or a number of other open source tools. This takes about 30 minutes to set up.

    Git repos are entirely self contained. That’s the genius behind the .git directory, a not-so-subtle point that you seem to have completely missed. This makes it completely trivial to archive a git repository with standard POSIX tools (like the “tape archive” tool, the one expressly designed for creating archives). Yet, unbelievably, here you are complaining about how difficult it is. Face. Palm.

    Oh, and by the way, if you’re going to do this incredibly stupid “copy and `rm .git`” thing, at least use the -f flag to rm rather than wasting 4 steps reinventing it yourself.
