EDIT: re-written to make it much clearer what I’m trying to achieve, and why.
Why Archive? Why not Archive?
The pricing model for hosted git has settled down to:
- Pay per month
- …for an upper limit on the number of active repositories
- ……measured by simultaneously limiting the number of “projects” (separate git repositories) and “people” (user accounts that are allowed to access projects)
Their aim seems to be: charge for peak concurrent usage, rather than for total historical usage.
For instance: if you have 20 user accounts allowed, and you use all of them, then delete 10, you can create 10 new ones. The vendor will NOT delete all the history of the deleted accounts – they just won’t allow you to login as those users any more.
This is probably set up that way to make sure:
- their revenue scales with their costs – these days, with scalable hardware costs, that’s straightforward.
- their prices scale with the budget-size of their customers
Some SaaS vendors selling at the same kind of price level / model allow this same “disabling” function on whole projects, not just on people. That enables you to e.g.:
- Work on 1 new project day-to-day
- Have 10 “old” projects that are no longer active (previously shipped)
- Reserve the right to temporarily activate any ONE of the 10 – e.g. to enact a quick bugfix / maintenance release
- …while only paying for “2 simultaneous projects”
This works fine – the resource usage is closer to that of a company with only 2 active projects than it is to a company with 12 active projects, and the price you’re able to pay is too.
In practice, we need to support these use-cases:
- Old project that MIGHT still be around in a local repository needs small tweaks and a quick re-launch: typically only 1-3 files need to be edited.
- Very old project that definitely isn’t in a local repository any more: ditto
- New project needs to solve a problem that was previously solved in an old project: need read ONLY access to the full project to revise “how we fixed this last time”
Before signing up to a git host, I asked each of them how they coped with these use-cases (in some detail); each company responded with, essentially:
We don’t have any support for this. Best we can suggest: copy all old projects into a single repository
Archiving git projects via copy/paste
What happens when you try to do this?
Well, for a start, you can’t “just copy” the contents of one git project into another.
Git keeps its source-control state in a hidden .git directory, and is hugely reliant upon it. This causes a handful of problems relating to “is file X still file X?”, and copying between repositories is one of them.
Naively, you’d try to do this:
- Create a global “archive” repository where you will move *all* old projects (this was initially recommended to us by Beanstalk/Github)
- PULL the latest copy of the git repos you want to archive
- MOVE the root directory into your “archive” repository
- PUSH the “archive” repository to the git-host
In practice, what happens is:
- Every modern Operating System moves the .git hidden directory too
- …so you fail to do a checkin (and at this point: a lot of the current Git GUI clients will break in interesting ways; it’s a good test for a new client if you’re considering buying one)
- …so you fail to do the PUSH
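If you have already moved projects into the “archive” repository this way, a quick check is to look for leftover .git directories inside it: git treats any directory that carries its own .git as a separate repository rather than as ordinary files. The sketch below is only an illustration; find_nested_repos is a made-up helper name, and it assumes a find that supports -mindepth (GNU and BSD/OS X both do).

```shell
# Hypothetical helper: list projects inside an "archive" repository
# that still carry their own .git directory.
find_nested_repos() {
    archive=$1    # root directory of the "archive" repository

    # -mindepth 2 keeps the archive's own top-level .git out of the results;
    # -prune stops find from descending into each nested .git it reports.
    find "$archive" -mindepth 2 -type d -name .git -prune -print
}
```

Anything this prints is a project whose git metadata came along with the move.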
Copy/pasting by discarding the history
EDIT: if git-archive works for you, I’d use that instead. Everyone I know who’s used it has had at least some problems, so I’m leaving this section for now – but scroll down to the next section and check git-archive too.
On anything Unix-based (i.e. Linux + OS X) the simple path is to use the command-line (or “terminal” as OS X calls it):
- “cd [the root directory of your project you want to archive]”
- “cd ..”
- “cp -R [the directory of project to archive] [the root directory of the "archive" repository]/[name of the project you want to archive]”
- “cd [the root directory of the "archive" repository]/[name of the project you want to archive]”
- “chmod -R u+rw .git” (git marks many of its internal files read-only; without this, rm asks you to confirm every write-protected file it deletes)
- “alias rm=rm” (overrides any “rm -i” alias in your shell; otherwise you’ll have to say “yes” to every individual file delete)
- “rm -R .git”
…then, in your git-client:
- COMMIT the “archive” repository
…then, in your git-host service:
- DELETE the old project
The key points here:
- You’re copying the repository, not moving it – so the original is unaffected (if you don’t have to delete it yet, you might as well leave it intact)
- You’re removing all git status from the files: it becomes a virgin archive
- DISADVANTAGE: you’re throwing away (deleting) all history for the old project.
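The copy-and-strip steps above can be rolled into one small shell function. This is a minimal sketch rather than a tested tool: archive_snapshot is a hypothetical name, and it uses “rm -rf” in place of the alias trick (the -f flag suppresses the confirmation prompts).

```shell
# Hypothetical helper: copy a project into the "archive" repository
# and strip the copy's git metadata. The original is left untouched.
archive_snapshot() {
    src=$1        # root directory of the project you want to archive
    archive=$2    # root directory of the "archive" repository
    name=$(basename "$src")
    dest="$archive/$name"

    # Refuse to overwrite a project that was archived earlier.
    if [ -e "$dest" ]; then
        echo "already archived: $dest" >&2
        return 1
    fi

    # Copy (not move), so the original working copy stays intact.
    cp -R "$src" "$dest" || return 1

    # Make the copied metadata writable, then delete it.
    if [ -d "$dest/.git" ]; then
        chmod -R u+rw "$dest/.git" && rm -rf "$dest/.git"
    fi
}
```

Usage would look something like “archive_snapshot ~/projects/old-project ~/projects/archive” (example paths only), followed by the COMMIT and PUSH of the “archive” repository as described above.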
Using Git Archive
Git archive isn’t perfect (archiving is a complex task, and from what I’ve seen git archive doesn’t cover every use-case). I’ve met a couple of people who’ve tried it and given up (e.g. because it didn’t support submodules), but it might work for you: worth a try.
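For reference, here is roughly what the git-archive route looks like. It is a sketch under assumptions: archive_with_git_archive is a hypothetical helper name, and it exports whatever is committed at HEAD, so uncommitted changes are not included. As with the copy approach, the result has no history and no .git directory.

```shell
# Hypothetical helper: export the files committed at HEAD of a project
# into the "archive" repository, under a directory named after the project.
archive_with_git_archive() {
    src=$1        # root directory of the project you want to archive
    archive=$2    # root directory of the "archive" repository
    name=$(basename "$src")

    # --prefix puts every exported path under "<name>/";
    # tar unpacks the stream inside the archive repository.
    git -C "$src" archive --format=tar --prefix="$name/" HEAD |
        tar -x -f - -C "$archive"
}
```

After running it, you still COMMIT and PUSH the “archive” repository yourself.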
In future, I’m going to try out some of the many other alternatives listed on the SO page linked above.