Tuesday 27 March 2012

Unity 5.8 issues and workarounds

Well... with the release of Unity 5.8 and associated dependencies, we got the extra testing we were after in Precise, and with it a number of bugs. The positive side to this is that with the extra information from our wonderful beta-testers we have been able to work out how to reproduce a number of the issues. As any developer would tell you, being able to reproduce your users' problems is often the biggest hurdle.

Over the weekend I noticed a number of issues around the release of Unity 5.8, and this morning, while going through the bug reports, I was happy to see that we had workarounds for most of them.

Unity 5.8: Flickering and corruption on Unity UI elements - a fix for many is "unity --reset". The cause appears to be how compiz is dealing with plug-ins that are no longer around. For some there have been plug-ins that existed with Oneiric that are no longer around in Precise, and the reset caused them to be removed from the list to load.
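To make the suspected cause a bit more concrete, here is a minimal sketch of the idea of dropping plug-ins that no longer exist from the list to load. This is an illustration only, not what "unity --reset" actually does internally; the function name and the plug-in names in the example are made up for the purpose.

```python
def prune_missing_plugins(active, installed):
    """Return the active plug-ins that are still installed, keeping their order.

    Hypothetical helper for illustration; not part of Unity or compiz.
    """
    available = set(installed)
    return [name for name in active if name in available]


# Illustrative data only: "oldplugin" stands in for a plug-in that
# existed in Oneiric but is no longer shipped in Precise.
active_before = ["core", "composite", "oldplugin", "unityshell"]
installed_now = ["core", "composite", "opengl", "unityshell"]
active_after = prune_missing_plugins(active_before, installed_now)
```

With the data above, the stale entry is dropped and the remaining plug-ins keep their original order.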

Unity 5.8: Login to blank screen (all black or just wallpaper) - some have been fixed by "unity --reset", but the underlying cause of this one is still a bit of a mystery.

Unity 5.8: Can't login to Unity since upgrade to 5.8 - some have found that disabling the "Unity MT Grab Handles" compiz plug-in fixes this issue. We still need to work out what the underlying problem is.

white box randomly shows up at top left corner blocking applications from using stuff under it - this one appears to be triggered by Chromium desktop notifications. There have been reports that disabling the animations plug-in in compiz and then re-enabling it fixes this. We are still investigating why.

If you are getting these issues, you can try the workarounds suggested here.

Monday 20 February 2012

Guilt reduction

So it is now Monday morning and I'm sitting next to Thomi.  We are going to pair program on this test stuff.  Partly because I think that pair programming is really cool, and partly due to Thomi knowing the autopilot test infrastructure really well, and that'll make this go much faster.

The bug in question related to the launcher getting into a very confused state where it thought there were multiple active applications.  And clicking on a launcher icon that was in this confused state caused a new application to be started rather than switching to the one that was running.

The first step, then, is to create a branch based on a revision from before the fix.  This way we can write a test that fails first.  A key part of writing tests is making sure they fail first.  Then when they start passing, you know it isn't by mistake, and that you have tested what you think you tested, not just created something that passes.
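As a rough illustration of the fail-first idea, here is a small self-contained sketch. It does not use autopilot or any real Unity code: ToyLauncher, its "fixed" flag, and the helper functions are all hypothetical stand-ins, with the flag playing the role of "before revision 1977" versus "after".

```python
import unittest


class ToyLauncher:
    """Hypothetical stand-in for the launcher; not Unity's real code."""

    def __init__(self, fixed):
        self.fixed = fixed
        self.running = []  # one entry per started application instance

    def click_icon(self, app):
        if app in self.running and self.fixed:
            return  # fixed behaviour: switch to the running instance
        # Simulated pre-fix behaviour: a new instance is always started.
        self.running.append(app)


def make_test(fixed):
    """Build the same regression test against the old or new behaviour."""

    class LauncherRegressionTest(unittest.TestCase):
        def test_second_click_switches_instead_of_launching(self):
            launcher = ToyLauncher(fixed=fixed)
            launcher.click_icon("terminal")
            launcher.click_icon("terminal")
            self.assertEqual(launcher.running.count("terminal"), 1)

    return LauncherRegressionTest


def run(fixed):
    """Run the test quietly and report whether it passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(make_test(fixed))
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()
```

Calling run(fixed=False) reports a failure and run(fixed=True) reports a pass, which is the pattern we want to see from the real test: red on the old revision, green once the fix is merged.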

Firstly, find that revision...

$ bzr log | less

The fix is revision 1977, so let's make a branch of trunk from revision 1976.

$ bzr cbranch trunk -r 1976 hud-ap-test
$ cd hud-ap-test/
$ bzr revno
1976


I use lightweight checkouts for the unity repo, hence cbranch rather than branch.

At this revision, there is a HUD test that really just checks the reveal. Let's make sure it passes...

$ cd tests/autopilot/
$ python -m testtools.run autopilot.tests.test_hud
Tests running...
No handlers could be found for logger "autopilot.emulators.X11"

Ran 1 test in 4.238s
OK


I deleted a bunch of GTK warnings from the output, as they don't add any value for what I'm trying to show here.  Would be great if someone fixed them though :-)

Now I need to actually build and run my local unity (and test the autopilot test again).

Found out that my machine was failing to build for other reasons, so we switched to Thomi's.  The existing test still passed (of course it did), so the next step was to write a test that encapsulated the broken behaviour that we had found during the many hours of analysis.

That can be found at lp:~thomir/unity/autopilot-hud-triple-hit.

The test failed with the old revision.  We then merged trunk, rebuilt, and ran the test again.  Test passed.  Job done.

Saturday 18 February 2012

That guilty feeling

Today had been a frustrating day.  I had been quick to anger and my family bore the brunt of that. It wasn't until I was confronted with this that I actually took a minute to think about why I was feeling this way.  It came back to something I read on IRC this morning: some people I deeply respect were disappointed with the test coverage in Unity 5.4.

I took this disappointment the way people often take it from their parents.  Remember how, as a child, one of the worst things you could feel was the disappointment of your parents?  Well, I guess that is how I felt.

I took over the engineering manager position of the unity team at the end of last year, and I tend to take criticism of the project and team personally.

So... why the guilty feeling?

Well, back around the time I took over managing the team, the general acceptance criteria for getting Canonical projects into Ubuntu changed.  This includes Unity.  There were a number of automated tests for Unity, and a series of distro acceptance tests that were manually executed.  What we needed to do was to really change the team culture to one where tests were not only written, but expected.  New features needed test coverage, and bug fixes needed test coverage.  The idea here, for all those who understand test-driven development and automated testing, was to make sure that fixed bugs and new features didn't get broken accidentally by later changes.

The guilt really came from knowing that I had allowed code reviews through the process without enforcing the need for tests.  And that as a senior person on the team, others took a lead from what I did.  If I was letting things through, so would others.  This is where the feeling really came from.

It is very easy to land fixes to crashes quickly when under pressure.  Especially when you've spent the last eight hours debugging in gdb, and auditing all the recently landed code looking for the change that contributes to the broken behaviour that you have been trying to fix.  When you finally find that one line fix, it is so tempting to just commit the one line.  You know it works, you've just spent the last freaking eight hours looking at the weird behaviour.  What you haven't done, however, is stop it from happening again by encapsulating the behaviour in an automated test.

I plan to spend some of Monday going back and adding an automated test to cover the particular behaviour that we fixed the other day.  I'll also write up what the test covers and how it gets written.  Hopefully by doing this, not only will Unity get better test coverage, but I'll personally feel better knowing that I've done the right thing.