How Bazaar: 2009

Friday 13 November 2009

Don't wait for perfection

Way back in July I was thinking of writing a post about the new branch listings in Launchpad. I was working on making branch listings for distributions and distroseries work, for example Package branches for Ubuntu and Packages branches for Ubuntu Lucid. But the code wasn't entirely optimised. Then as things happen, the optimisation got pushed back... and back. And finally when I did get the optimisation in, I didn't feel it was worthy of talking about.

I guess the thing to remember is: don't wait for perfection. Sure it wasn't perfect, but if more people were accessing the pages, the optimisation may have happened sooner.

One thing going on at the moment is more integration of the lazr-js widgets. The main merge proposal page now has in in-page multi-line editor for the commit message. Sure, it needs tweaking, but the main functionality is there. More ajaxy goodness is finding its way into Launchpad.

One of the things that I'm thinking about right now is splitting the concepts of code reviews and merge proposals. At the moment we almost use the term interchangeably, which does cause some confusion. I'd like to have the merge proposal reflect the meta-data and information around the intent to have work from one branch be landed or merged into another branch (normally the trunk branch), and the code review the conversation that goes on around the merge proposal. Merge proposals may have an associated code review, but right now, a code review must be associated with a merge proposal.

Associated with this, I'd like to extract some state information. Currently merge proposals have only one status, which really reflects two things. I'd like to break this out into two: review status; and merge status. Review status would be one of: work in progress; needs review; approved; rejected; superseded; and maybe withdrawn. Merge status would be one of: proposed; merged; queued; and merge failed. Queued relates to the merge queues which are currently partially implemented in the Launchpad codebase, and merge failed is the state that a proposal would be set to when a landing robot like PQM or Tarmac attempt to land the branch but it fails due to either merge conflicts or test failures.

My goal for the next six months it to write more often, talk about ideas more, and not wait for perfection.

Sunday 5 July 2009

Breaking up work for review

It was Friday morning after three days of working on one feature. Last thing Thurday I counted the size of the change and it was over 1100 lines and I wasn't quite finished. I found myself in the situation that turns up periodically that I wanted to break up my work into cohesive reviewable chunks. Now it isn't a matter of taking commits x through y as chunk one, and so on, as the size grew organically as I changed what needed to be changed, and wrote what needed to be written without really stopping to think about the size of the change until it was done. However now it was done, I wanted to break it up.

Last time I did this, I used looms, but Aaron told me we could do it easily using his new Bazaar pipeline plugin. So I spent some time talking through with Aaron on how to do it, promising to write it up if it worked well. I must say that it was good. During the process we identified a number of enhancements to the plug in to make it even easier.

I'm going to show the progression we made, along with our thoughts. I have trimmed some of the output when I've decided that it doesn't add value.

The first thing I had to do was to get the pipeline plugin.

$ bzr branch lp:bzr-pipeline ~/.bazaar/plugins/pipeline

Unfortunately this seemed to clash slightly with the QBzr plugin. The were both trying to redefine merge. Personally I don't use QBzr and had probably just installed it to take a look, so I removed that plugin.

Caution: the pipeline plugin relies on switch so works with lightweight checkouts. This is how I work anyway, so I didn't have anything to do here, but if you work differently, YMMV.

The pipeline plugin is designed around having a set of branches one after (individual pipes) the other that perform a pipeline, clever eh? When you have the pipeline plugin, any branch is also considered a pipeline of one.

$ bzr show-pipeline
*  nice-distribution-code-listing

What I was wanting to do was to break up this work into a number of distinct change sets, each that could be reviewed independently. We decided that the way to do this was to create a pipe before the current one, and bring changes in. This is done with the command add-pipe.

$ bzr add-pipe factory-tweaks --before nice-distribution-code-listing
$ bzr show-pipeline
   factory-tweaks
*  nice-distribution-code-listing

Right here we decided that there should be an easier way to add a pipe before the current pipe, as right now it needs a pipe name. A bug was filed to track this.

You can see from the show-pipeline command that the new pipe is before the current one. The pipeline plugin addes a number of branch aliases:

:first - the first pipe in the pipeline

:prev - the pipe before the current pipe

:next - the pipe after the current pipe

:last - the last pipe in the pipeline

Now to make the switch to the first pipe. Both :prev and :first refer to the same branch here, and I could have used either.

$ bzr switch :prev
  ... changed files shown
All changes applied successfully.
Now on revision 8747.

Now this pipe was added from the pipe after it, so it starts off with the same head revision. Not exactly the starting point I wanted, so we replaced the head of this branch with the last revision of the trunk branch that we had merged in.

$ bzr pull --overwrite -r submit:

The submit: alias refers to the submit branch. This is often trunk, and is in my project layout (specified using submit_branch in .bazaar/locations.conf).

Now the lower pipe was a copy of trunk. A good place to start adding changes I think. The next problem was how to get the changes from the following pipe into this one. Our first attempt was to merge in the following branch, shelve what we didn't want, throw away the actual merge, but keep the changed text, and commit.

$ bzr merge :next
$ bzr shelve
$ bzr stat
modified:
  lib/lp/testing/factory.py
pending merge tips: (use -v to see all merge revisions)
  Tim Penhey 2009-07-02 New view added.
$ bzr revert --forget-merges
$ bzr stat
modified:
  lib/lp/testing/factory.py
$ bzr commit -m "More default args to factory methods and whitespace cleanup."

Now this seemed very convoluted. Why merge and then forget the merge? I seemed kinda icky, but it worked. The next thing to do is to merge these changes down the pipeline. This is done through another command pump.

$ bzr pump

This merges and commits the changes down the pipeline. If there are conflicts, it stops and leaves you in the conflicted pipe. This didn't occur here, nor did it occur for any of my other ones. Here you can see the commit message that pump used:

$ bzr switch :next
$ bzr log --line -r -1
8714: Tim Penhey 2009-07-03 [merge] Merged factory-tweaks into nice-distribution-code-listing.
$ bzr switch :prev

Now it was time to add the next pipe.

$ bzr add-pipe code-test-helpers
$ bzr show-pipeline
*  factory-tweaks
   code-test-helpers
   nice-distribution-code-listing
$ bzr switch :next

This time, instead of merging in the changes, we shelved them in. The shelve command. The shelve command can apply changes from arbitrary revisions, and it also knows about files. The change that I wanted in this branch was a single added file, so I could tell shelve about that file.

$ bzr shelve -r branch::next lib/lp/code/tests/helpers.py
Selected changes:
+   lib/lp/code/tests/helpers.py
Changes shelved with id "2".

However the big problem with this is it all looks backwards. We are shelving from the future not the past. This really did my head in. Shelve would say "remove this file?" and by shelving it, it would add it in. It worked but made my head fuzzy. We filed a bug about this too. By adding a better way to take the changes, the command could do the reversal for you and provide you with a nicer question.

More of the same happened for the next few pipes, and I won't bore you with repeated commands.

On the whole, the pipeline plugin worked really well. I was able to break my work up into five hunks which could be reviewed easily. In the end I kept working on the branch that was my original, so all my original history remained intact. It would have been just as easy to add another pipe and take the remaining changes. This would have left me with five branches, each with one commit. This works well for the way we work as we have reviews based on branches. Each pipe could be pushed to Launchpad and a review initiated for it. With some more UI polish, I think pipelines will be even more awesome than I think they are now.

Thursday 18 June 2009

You're doing it wrong!

Just yesterday I found a missing feature in one of the apps I just started using. My thought processes were something along the lines of “hey, I could add this feature and it would be good”. So I went to the project's website, found their source code repository, and got blown away by the comment that was with it:

Please note that code you get from this repository is not intended for productive use (unless it's tagged as a released version, of course, in which case the usual alpha/beta disclaimers apply ;-)). We like to break our codebase, config files, database schemas and all kinds of stuff. We sometimes commit non-compiling revisions to facilitate collaborative development. Running such an unstable version might trash your settings, your backlog and maybe your computer. You have been warned!

Eh? OK, I get the first sentence. It is even a good disclaimer. Tagged releases are more stable. People regularly commit code that is unpolished. Sometimes even with some known bugs or issues.

The second sentence has me going “NO!?! What are you doing?”

The third sentence just blew my mind. This project is using a DVCS. Not my DVCS of choice, but really that doesn't matter. All DVCSs are made to have good merging and sharing of code between developers. Saying “We sometimes commit non-compiling revisions to facilitate collaborative development” is just a lack of understanding of how to use the tools. You are using a DVCS to facilitate collaborative development! This is centralised version control thinking.

Try this for a code to work by:

Trunk should always at least compile, run, and pass all the tests.

This hasn't stopped me wanting to work on the code, but it has raised my caution levels.

Tuesday 16 June 2009

kiwipycon

Today NZPUG held yet another organisation meeting for the first kiwipycon. Organising conferences takes a lot of effort by many dedicated people. The Christchurch python user group has volunteered to host the first PyCon in New Zealand. Personally I suggest things from time to time, but a big thanks goes out to those guys for the hard work that has gone on even before the call for papers.

The dates have been set as Saturday the 7th and Sunday the 8th of November 2009. A weekend was chosen to allow those working or studying who can't get leave to attend. As I understand it they are still working on pricing. The call for papers will probably go out next month some time.

Should be interesting...

Monday 15 June 2009

launchpadlib updates for branches

Just a quick note to yet you know of a few changes to launchpadlib for branches. Mainly because I've removed a method that I know someone is using.

You used to be able to get to the branches for a project by saying my_project.branches, but I've removed this. It would have been nicer to deprecate it but we don't have a nice deprecation method right now for launchpadlib, and since it is still in beta, I didn't feel too bad.

The branches of a project was an attribute, now we have a getBranches method. The old attribute would give you all the branches of the project, including the merged and abandoned ones. The method defaults to give you the active branches, and allows you to pass in the statuses that you'd like to get.

Also with this change you can now get the branches for a project group, or the branches owned by a person using the same getBranches method call.

Project groups also grew the method getMergeProposals in the same way that the method was already available for people and projects.

Please file any bugs on the launchpad-code project on Launchpad.

Wednesday 11 March 2009

lp:mad

One of the things that I have spent quite a lot of time on recently is the code review stuff in Launchpad. Recently, as of the 2.2.2 release, new merge proposals get a review diff created for them automagically. This review diff is based on the changes that have been done in the branch relative to the least common ancestor (LCA) of the target branch. Since the review diff only has changes that have been added, there is no way for this diff to ever have conflicts.

There is another diff that is useful to see however. This is the diff of what changes would happen if the source branch was merged into the target branch right now. Sometimes this might conflict. Sometimes this might be a smaller diff as some other dependent functionality has landed. This diff isn't generated automatically by Launchpad. However this is something that you can run to add it.

The Merge Analysis Daemon

Alright, it isn't exactly a daemon (yet), but the name was cool.

What this script does, using the launchpadlib API, is to get all of the current merge proposals for a project and works out the diff that would be — what we call the preview diff.

What do you need

Firstly you need branches of wadllib, launchpadlib, and mad.

$ bzr branch lp:wadllib
$ bzr branch lp:launchpadlib
$ bzr branch lp:mad

Inside the mad directory, there is the LICENSE file (GPL v3), and the script.

The script has many parameters.

$ ./mad.py --help
Usage: mad.py [options]

Options:
  -h, --help            show this help message and exit
  -v, --verbose         Display extra information
  -q, --quiet           Display less information
  -p PROJECT, --project=PROJECT
                        The name of the Launchpad project.
  -r DIR, --repo=DIR    The location of a local repository to use.
  --dry-run             Don't upload the diffs
  --force               Force an update of the diff.
  --staging             Update the proposals on staging.launchpad.net.
  -c FILE, --credentials=FILE
                        The credentials file. Defaults to ~/.launchpad/mad
  --cachedir=DIR        The location of the cache directory. Defaults to
                        ~/.launchpadlib/cache.
  --no-op               Don't get the proposals for the project.
  --new-credentials     Get a new OAuth token and save in the credentials
                        file.

I have the following in the server's crontab listing:

20 * * * * PYTHONPATH=/home/tim/launchpadlib:/home/tim/wadllib /home/tim/mad/mad.py -p storm -r /home/tim/sandbox/mad-playground -c /home/tim/.launchpad/mad -v >> /home/tim/mad-storm.log 2>&1

Basically this says:

At 20 minutes past every hour

Run the mad.py script using a PYTHONPATH that knows about wadllib and launchpadlib.

Use the credentials file ~/.launchpad/mad

Use the respository at ~/sandbox/mad-playground

Be verbose

Make all output go to a log file.

If the specified repository directory does not exist, a new shared repository with no working trees is created. If there is an existing repository, it will use that.

Each of the source and target branches are pulled into the repository. MAD won't create branches for them, it just grabs all the necessary revisions. MAD then calculates the diff that would be if the source was merged into the target, and sends that to Launchpad to have it annotate the proposals. As an example, see the storm ones.

You will also need to have permission to edit the proposals that you are wanting to update. If you are the person that is running the project, and are in the team that owns the target branches, then you should be able to update them.

There is a --staging option to test the script against what is in staging.

The script also walks you through the necessary OAuth token acquisition the first time you run the script.

Report bugs on Launchpad.

Wednesday 21 January 2009

Shallow branches or history horizons

There is an idea floating around and I'm curious to see if it is an idea that has merit and worth putting effort into. This idea is in the DVCS space and is called "shallow branches" or "history horizons".

The concept itself is pretty simple. When using a DVCS with a project with a long history, each and every user has a copy of this history. Now much of this history may be ancient (for some definition of ancient, 6 months, 6 years, whatever). Most developers will never have a need to go into the ancient history of a project, and so a truncated history is fine as long as their branches that they create are still mergable with the main repository.

Here's how it could play out:

Bob wants to work on the fooix project to fix a minor bug, this is Bob's first look at the fooix source. The fooix project has been around for eons and has a huge history. Bob doesn't care about the history, he just wants to do his simple fix (think a typo).

Bob grabs the fooix trunk branch but only gets enough history to create the working files.

Bob makes his fix, and publishes his branch for the fooix developers to grab.

The advantage here is that when Bob grabs his branch, he is only getting just enough history to work, and so his resulting repository is smaller and faster.

Commands that worked by inspecting the history would stop at the repository's horizon and say something like "and that's all I've got". Obviously there'd need to be a way to say "go and get me another 4 months of history" or even "ok, now I'm really interested, get me the complete history".

This is conceptually different from a lazy loading or stacked repository as there is an explicit horizon where normal history commands stop.

So lazyweb, the question I have is this: "Is this a worthwhile feature in a DVCS tool?"

How Bazaar