Wednesday, 21 January 2009

Shallow branches or history horizons

There is an idea floating around, and I'm curious whether it has merit and is worth putting effort into. The idea lives in the DVCS space and is called "shallow branches" or "history horizons".

The concept itself is pretty simple. When using a DVCS on a project with a long history, each and every user has a copy of that history. Much of it may be ancient (for some definition of ancient: 6 months, 6 years, whatever). Most developers will never need to go into the ancient history of a project, so a truncated history is fine as long as the branches they create are still mergeable with the main repository.

Here's how it could play out:

  • Bob wants to work on the fooix project to fix a minor bug; this is Bob's first look at the fooix source. The fooix project has been around for eons and has a huge history. Bob doesn't care about the history; he just wants to make his simple fix (think a typo).

  • Bob grabs the fooix trunk branch but only gets enough history to create the working files.

  • Bob makes his fix, and publishes his branch for the fooix developers to grab.



The advantage here is that when Bob grabs his branch, he gets just enough history to work, so his resulting repository is smaller and quicker to fetch and use.

Commands that worked by inspecting the history would stop at the repository's horizon and say something like "and that's all I've got". Obviously there'd need to be a way to say "go and get me another 4 months of history" or even "ok, now I'm really interested, get me the complete history".
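To make the horizon behaviour concrete, here is a minimal Python sketch. It is not how bzr or any other tool stores history; the `Revision` class, the dictionary-based stores, and the `walk_history` and `deepen` helpers are all assumptions made up for illustration. A log walk stops when it hits a parent that is not stored locally, and `deepen` copies a few more revisions from a repository that has the full history.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass
class Revision:
    rev_id: str
    parent_id: Optional[str]  # None for the very first revision
    message: str


def walk_history(store: Dict[str, Revision], tip_id: str) -> Iterator[str]:
    """Yield log lines from the tip backwards, stopping at the horizon."""
    current: Optional[str] = tip_id
    while current is not None:
        rev = store.get(current)
        if rev is None:
            # The parent is named but not stored locally: this is the horizon.
            yield f"(horizon: {current} and earlier are not here, that's all I've got)"
            return
        yield f"{rev.rev_id}: {rev.message}"
        current = rev.parent_id


def deepen(store: Dict[str, Revision], remote: Dict[str, Revision],
           tip_id: str, extra: int) -> int:
    """Copy up to `extra` revisions beyond the horizon from `remote`."""
    current: Optional[str] = tip_id
    # Find the first revision that is missing locally.
    while current is not None and current in store:
        current = store[current].parent_id
    fetched = 0
    while current is not None and fetched < extra:
        rev = remote[current]
        store[current] = rev
        current = rev.parent_id
        fetched += 1
    return fetched


# Hypothetical data: the full repository has four revisions,
# the shallow one holds only the two newest.
full_store = {
    "r1": Revision("r1", None, "initial import"),
    "r2": Revision("r2", "r1", "add README"),
    "r3": Revision("r3", "r2", "add feature"),
    "r4": Revision("r4", "r3", "fix typo"),
}
shallow_store = {rev_id: full_store[rev_id] for rev_id in ("r4", "r3")}

log_lines = list(walk_history(shallow_store, "r4"))
```

Calling `deepen(shallow_store, full_store, "r4", 1)` would move the horizon back by one revision, and a large enough `extra` would fetch the complete history.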

This is conceptually different from a lazy loading or stacked repository as there is an explicit horizon where normal history commands stop.

So lazyweb, the question I have is this: "Is this a worthwhile feature in a DVCS tool?"

14 comments:

ChrisW said...

Yes, worth it, but I can see problems.

Take this:

http://svn.zope.org/Zope/trunk/Extensions/README.txt?rev=24563&view=markup

...it hasn't changed in eons of revisions. What would you do in this case?

Also, what would you do in the case where some lines in a file (and they're usually whitespace or boilerplate!) haven't changed since revision 1?

Surely, in these circumstances, your method ends up dragging down a whole heap of revisions again?

ddaa said...

Fuck yes! History horizons are worthwhile.

As ChrisW noted, it would be *great* to have the ability to download incomplete past history. For example, if I want to do "bzr blame Extensions/README.txt", I just need a bunch of old revisions and inventories, but not all their file texts: I just need the file texts for README.txt.

I am sure there are a lot of ways like that where history downloading can be made "lazy". But plain simple history horizons with shallow branches would already be killer.

Chris Double said...

Git has something similar, whereby you can clone a branch at a specified depth of revisions:

"Create a shallow clone with a history truncated to the specified number of revisions. A shallow repository has a number of limitations (you cannot clone or fetch from it, nor push from nor into it), but is adequate if you are only interested in the recent history of a large project with a long history, and would want to send in fixes as patches."

The limitations there are annoying in that you can't publish your repository, you can only use it to generate patches. Still it's proven useful to me when I don't want to pull the entire codebase down just to do a single patch to be reviewed.

Darcs has 'partial branches' which are also similar. You can clone a repository pulling only patches up to the last snapshot (a tag usually). It also has limitations on what you can do with the resulting partial repository. With Darcs 1.0 I always had problems with partial repositories getting corrupted but I'd hope Darcs 2 has solved those.

Even with the limitations of the functionality in these tools, I've found it useful, so I think it would be a worthwhile feature.

pascal-bach said...

I have been waiting some time for this to be implemented. Most of the time I don't need the whole history if I want to hack on a project. So if I don't use it, why should I download it? It only costs me time to do so!

But I think it is important that a branch with truncated history can either be promoted to a full branch or have some sort of lazy loading for cases where more history is needed to perform a task.

In an ideal case you should be able to do anything you can with a normal branch with your shallow branch too.

Jez Higgins said...

Yes, would love it.

My primary use of bzr at the moment is as a front end for svn. It does a cracking job. Grabbing the initial checkout can take an age though. If you're on an unreliable network (like a crappy VPN, say) you can be in for a load of frustration.

Being able to grab the last X revisions, or Y months, or whatever, would be fantastic. If there was a mechanism to reach back further on-demand that would be even better, but I'd view that as a bonus.

Jamu Kakar said...

I would love this. I often bzr branch lp:$project when I'm curious about $project. I really don't care about the history at this point; I just want to look at the code.

It would be nice if there were a way to make shallow branches the default. I guess you could alias 'bzr branch' to 'bzr branch --horizon-revision -2' or something. It would be cool if the horizon revision could be specified absolutely, as in --h-r 12, or relatively as in --h-r -3.

It would be important for a branch with a horizon to be a first class citizen, like any other. It's fine to ask me to download remote revisions, if that's what's needed to perform an operation, but please don't disable operations like push, pull, etc.
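For illustration, the absolute and relative forms of a horizon revision could be resolved roughly as in the Python sketch below. The `--horizon-revision` option above is a suggestion rather than an existing bzr flag, and `resolve_horizon` is a hypothetical helper. The sketch assumes that negative values count back from the tip with -1 meaning the tip itself, and that a request reaching past the first revision simply means "full history".

```python
def resolve_horizon(spec: int, tip_revno: int) -> int:
    """Turn a horizon spec into the oldest revision number to keep.

    Positive specs are absolute revision numbers; negative specs are
    relative to the tip, where -1 is the tip itself (an assumption).
    """
    if tip_revno < 1:
        raise ValueError("branch has no revisions")
    if spec == 0:
        raise ValueError("0 is not a valid horizon revision")
    revno = spec if spec > 0 else tip_revno + spec + 1
    if revno > tip_revno:
        raise ValueError(f"revision {revno} is newer than the tip ({tip_revno})")
    # Asking for more history than exists just means the full history.
    return max(revno, 1)


absolute = resolve_horizon(12, tip_revno=100)   # keep revisions 12..100
relative = resolve_horizon(-3, tip_revno=100)   # keep the last three revisions
```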

David said...

I strongly prefer a horizon with lazy loading. I see no advantage to a hard limit.

kA said...

yes, surely

Steve McInerney said...

Yes Please!

Would be ideal for the copying of branches to production we do; and for the testing phase.

A history of 1-2 months at most would be more than enough for the rare occasion we need to manually back-out a change.
And not having to maintain all the excess... overhead? would be a GoodThing(tm).

Mackenzie said...

YES!

It takes so long to branch in bzr sometimes, and I almost *never* want to go backward in the history (just want to be able to keep pulling so my changes are mergeable), so this would be fantastic.

Tim Penhey (thumper) said...

@ChrisW - with bzr there is a full text copy of a file saved every now and then, so even though a particular line hasn't been modified since the dawn of time, you don't need to go back to revision 1 to get it.
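As a rough Python sketch of that idea (a toy model, not bzr's actual storage format): each stored entry is either a full text or a delta against the previous entry, and rebuilding a file only needs to reach back to the nearest full text. The entry layout, the delta encoding, and the `reconstruct` helper are assumptions invented for this example.

```python
from typing import List, Tuple


def reconstruct(entries: List[Tuple[str, list]], target: int) -> Tuple[List[str], int]:
    """Rebuild the file at index `target`.

    Each entry is ("full", lines) or ("delta", [(line_index, new_line), ...]),
    where a line_index equal to the current length appends a line.
    Returns the lines and the number of entries that had to be read.
    """
    # Walk back to the nearest full-text snapshot at or before the target.
    start = target
    while start >= 0 and entries[start][0] != "full":
        start -= 1
    if start < 0:
        raise ValueError("no full text available at or before the target")
    lines = list(entries[start][1])
    # Replay only the deltas recorded after that snapshot.
    for _kind, delta in entries[start + 1 : target + 1]:
        for index, text in delta:
            if index == len(lines):
                lines.append(text)
            else:
                lines[index] = text
    return lines, target - start + 1


# Hypothetical file history: the first line never changes after entry 0.
entries = [
    ("full", ["licence header", "a"]),
    ("delta", [(1, "b")]),
    ("delta", [(2, "c")]),
    ("full", ["licence header", "b", "c"]),  # periodic full-text copy
    ("delta", [(1, "B")]),
]

lines, entries_read = reconstruct(entries, 4)
```

Here the unchanged "licence header" line is recovered from the snapshot at entry 3, so entries 0 to 2 are never touched.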

@David - a hard limit means that all history operations will still work even though earlier revisions may not be there. If a lazy load was required then you'd need to be connected to another repository whenever you tried to do a history operation that would need older revisions. A hard limit with a way to get more history is one solution.
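The difference can be sketched in a few lines of Python. This is only an illustration under made-up names: the dictionary store, `get_with_hard_horizon`, `get_with_lazy_loading`, and `offline_fetch` are all hypothetical. With a hard horizon, a lookup past the horizon is answered locally with "not here"; with lazy loading, the same lookup has to reach another repository and fails when that repository is unreachable.

```python
from typing import Callable, Dict, Optional


def get_with_hard_horizon(store: Dict[str, str], rev_id: str) -> Optional[str]:
    """Answer from local data only; None means 'beyond the horizon'."""
    return store.get(rev_id)


def get_with_lazy_loading(store: Dict[str, str], rev_id: str,
                          fetch_remote: Callable[[str], str]) -> str:
    """Fetch missing revisions on demand; needs a reachable remote."""
    if rev_id not in store:
        store[rev_id] = fetch_remote(rev_id)
    return store[rev_id]


def offline_fetch(rev_id: str) -> str:
    """Stand-in for a remote repository that cannot be reached."""
    raise ConnectionError(f"cannot reach remote repository for {rev_id}")


local = {"r4": "fix typo", "r3": "add feature"}

# Hard horizon: the operation completes offline and reports the gap.
hard_result = get_with_hard_horizon(local, "r1")

# Lazy loading: the same lookup fails without a connection.
try:
    get_with_lazy_loading(local, "r1", offline_fetch)
    lazy_result = "fetched"
except ConnectionError:
    lazy_result = "failed: offline"
```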

SamB said...

I want this, too. Is there a launchpad bug or something that I can lean on?

SamB said...

I guess I should have tried googling before asking the question; it was not hard to find first https://blueprints.launchpad.net/bzr/+spec/shallow-checkouts/, and from there the associated bug https://bugs.launchpad.net/bzr/+bug/46561.