Saturday 29 August 2009

Quotable Quotes no. 368

Progress is impossible without change, and those who cannot change their minds cannot change anything.

–George Bernard Shaw

Sunday 2 August 2009

easy_install (mis)behaviour

Version 0.7 of netaddr, a major milestone release, went out recently. It marked the library's move from beta to production/stable status, as I'm now fairly happy with its interface and implementation.

There was a lot of change involved in this release due to some major refactoring work. I thought it best to put together a series of pre-release snapshot tarballs to give interested users an early look at the upcoming changes and a chance to hopefully squash any bugs before the main release.

The download page on the Google code hosting site seemed like a fairly safe bet for publishing these files. I purposefully didn't post them on PyPI in an attempt to limit the audience to which they would be made available.

BIG mistake!

A kind user raised the following bug report - http://code.google.com/p/netaddr/issues/detail?id=41

They used easy_install to install netaddr and, unbeknown to me, in the background it was going off to the code hosting site, picking up the release candidate packages, and trying to dynamically build eggs based on them to provide to users!

The question is "why would you choose to do this"?! Hey, I'm all for software being clever and helpful, but this choice of behaviour seems a bit too clever for its own good. It wantonly broke a seemingly sane way of releasing code and really seems to try hard to break the Rule of Least Surprise. This kind of thing is actually the opposite of helpful.

OK, so the easy fix would probably be to remove or change the explicit link between PyPI and the code hosting download page by editing the Download URL parameter in the PKG-INFO file, but the fact remains that this functionality seems to be trying a little too hard to be useful.
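For anyone wanting to check what a package advertises, here is a minimal sketch that reads the download link from PKG-INFO style metadata using only the standard library. The sample metadata and the download_url() helper are made up for illustration (this is not netaddr's actual file), and the field name Download-URL is assumed to follow the usual metadata format.

```python
from email.parser import Parser

# Hypothetical metadata in the RFC 822 style used by PKG-INFO files.
# All values here are invented for illustration only.
SAMPLE_PKG_INFO = """\
Metadata-Version: 1.1
Name: example-package
Version: 0.7
Summary: An example package
Download-URL: http://example.com/downloads/
"""


def download_url(pkg_info_text):
    """Return the Download-URL header, or None when it is absent."""
    headers = Parser().parsestr(pkg_info_text, headersonly=True)
    return headers.get("Download-URL")


print(download_url(SAMPLE_PKG_INFO))
```

Removing or repointing that one field is the kind of change described above; whether an installer follows the link is up to the installer.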

It seems that my misgivings on supporting setuptools are not entirely without justification ;-)

Saturday 30 May 2009

The Wave is coming ...

Google Wave that is ;-)

http://wave.google.com/

Here's a direct link to the Tech Preview video :-

http://www.youtube.com/watch?v=v_UyVmITiYQ
(Skip forward to 7:30 if you prefer to leave out some of the marketing and introduction at the start)

The video is an hour-and-20-minute demo of a product that Google has been working on for the last 2 years or so. It is a view of how the way we communicate using computers is likely to change in the very near future.

Seeing this, I am really beginning to believe some of the hype about Google becoming a true Microsoft killer. This doesn't even take into account all the hard work Google is doing to supplant Microsoft's products through software vendors with their own paid for products (an issue for another blog post). With technology like Wave, Google totally outclasses pretty much anything anyone else is doing right now.

Interestingly, Google is putting the protocol behind Wave forward as an open standard so other software vendors can build their own interoperable (and possibly competing) systems and products with it :-

http://www.waveprotocol.org/

This is all very altruistic but it does beg the question, "how do they make any money out of this"? I guess we'll be finding out in the months ahead.

Maybe it's just paranoid old me but aspects of Wave are also a little scary. It opens up a whole Pandora's box of new possible security and privacy issues (much as any new technology does). Some of those bots look really friendly in a demo setting (Spelly, Buggy, Tweety etc) but, in the wrong hands, others could just as easily be built for less savoury purposes.

On the whole though this is really cool technology and opens up some very interesting possibilities for new and better ways of working (in real time) collaboratively in the future.

Nice work Lars and team!

Sunday 24 May 2009

Wingsuit Base Jumping - the new extreme sport

This has to be one of the most extreme sports of all time. It takes base jumping to a whole new level by combining this already dangerous pastime with the latest advances in wingsuit technology developed over the last 10 years.

The most dangerous part has to be the practice known as "proximity flying" where the wingsuit base jumper glides over the mountain side on the way down only a few metres above the ground!

You really have to see it to know what I'm talking about :-

http://www.youtube.com/watch?v=QiXNxlM4BeE
http://www.youtube.com/watch?v=3ycBGkLkEkg

It literally makes your hair stand on end watching it!

See Also

Loic Jean Albert's Homepage - great details on the history of this sport, starting with Icarus :-) and the evolution of wingsuit technology.

Saturday 16 May 2009

Yahoo Pipes - a web service for the customisation of RSS and Atom feeds

In the last couple of years I've started subscribing to more and more RSS and Atom feeds, preferring interesting articles and information to come to me rather than having to go out trawling a list of well known websites on a fairly ad hoc basis for new articles of interest.

Google Reader is my feed reader of choice mainly because it's online and I can access it from anywhere without having to maintain multiple installs of feed reader software and settings.

I'm fairly ruthless with my feed subscriptions, preferring feeds that are reasonably low volume (1 or 2 posts a week) with high quality content. There is usually a single human being on the other end with a limited amount of bandwidth.

Quite a few feeds are aggregates of multiple contributions (e.g. news websites) or are automated to provide information when changes to some underlying system occur (e.g. check ins for a code hosting site). They contain useful information but you sometimes quickly get bogged down by the sheer volume and frequency of updates coming through. I don't stay subscribed to these feeds for long when all I'm doing is clicking on "Mark All As Read" as hundreds of unread entries pile up.

There are certain feeds in this latter category that I would still like to follow, if only I could reduce the noise by either adding an include or exclude filter and possibly generating sub feeds that I could publish and make available to myself and others.

This was a pipe dream until this morning when I came across Yahoo Pipes :-)

You rarely come across technology that does exactly what you want out of the box without requiring a decent amount of customisation first, either through configuration or some coding. Yahoo Pipes allowed me to solve a problem that has been nagging me for ages and it was dead simple to do.

As its name suggests, Yahoo Pipes makes use of concepts familiar to those who know and love the UNIX command line, with I/O processing pipelines, redirection and a set of options similar to tools like grep, sed and awk. These are all provided through a great visual designer and editor that makes setting up and publishing your own custom feeds really easy and, most importantly, very quick to crank out. You can literally build and publish a custom feed in 5 minutes or less without much prior experience of Yahoo Pipes. That is real power (hats off to the designers - this kind of thing is not easy to do).

Here is a sample feed that is already saving me loads of time filtering out stuff I'm not interested in :-

Source feed :-

PyPI Package Update RSS feed

http://pypi.python.org/pypi?%3Aaction=rss

Resulting feed :-

PyPI Package Update RSS feed (without Zope or Plone packages)

http://pipes.yahoo.com/drkjam/pypi_updates_no_zope_or_plone
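For those who prefer code to a visual designer, the exclude filter can be approximated in a few lines of Python. This is only a sketch under assumptions: the sample feed, the filter_feed() and titles() helpers and the rule itself (drop any item whose title mentions Zope or Plone) are made up for illustration and are not the exact configuration of the pipe above.

```python
import re
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 sample standing in for a package updates feed.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Package updates</title>
    <item><title>netaddr 0.6</title></item>
    <item><title>zope.interface 3.5</title></item>
    <item><title>Products.PloneFormGen 1.2</title></item>
    <item><title>hachoir-core 1.2</title></item>
  </channel>
</rss>"""

# Assumed exclusion rule: any item whose title mentions Zope or Plone.
EXCLUDE = re.compile(r"zope|plone", re.IGNORECASE)


def filter_feed(rss_text, exclude=EXCLUDE):
    """Parse an RSS document and remove <item> elements matching exclude."""
    root = ET.fromstring(rss_text)
    channel = root.find("channel")
    # findall() returns a list, so removing items while looping is safe.
    for item in channel.findall("item"):
        title = item.findtext("title", default="")
        if exclude.search(title):
            channel.remove(item)
    return root


def titles(root):
    """Return the titles of the items left in the feed."""
    return [item.findtext("title") for item in root.iter("item")]


print(titles(filter_feed(SAMPLE_RSS)))
```

The filtered tree could be written back out with ET.tostring() and republished, which is roughly the grep-in-a-pipeline idea that Pipes wraps in a visual editor.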

You can organise your new feeds by giving them sensible and meaningful URLs too (another great feature).

So, if you subscribe to a lot of feeds and find that you want to tame them this is definitely the way forward™.

Check out these impressive tech demo videos for some real "wow" moments :-

Sunday 15 February 2009

TestDisk: *the* cross-platform data recovery tool

A few months ago, I lost both disks at the same time in the RAID 1 mirror running on my QNAP TS-209!

I get the feeling hardware manufacturers that extol the virtues of RAID would like you to believe that this sort of event is extremely uncommon. However, judging by how hot the disks were running (I couldn't hold them for more than a few seconds comfortably) the likelihood of both failing was probably quite high. The QNAP's cooling fan obviously wasn't able to provide sufficient airflow over the disks in question (a couple of Hitachi Deskstar 1TB 7K1000s), which probably had a lot to do with the failures and data corruption. In hindsight, I'm amazed they lasted as long as they did.

I had been running an rsync job between the QNAP and my main PC fairly frequently. Just prior to the disk fatalities I'd been running short of disk space on my main machine and hadn't run the sync job for about six weeks. Annoyingly, when it failed there were still some files on the QNAP that I wanted.

At the time I had a brief search around for some free data recovery tools using all of the keyword variations you might expect to pull up something useful via a search engine. I tried a few I came across and was left distinctly unimpressed by what was on offer. So, I decided to shelve the disks until I could find the time and a suitable tool to try and get at least some of the data back.

Then, last week, I came across Victor Stinner's excellent library, hachoir. On that project's wiki I noticed the rather intriguing entry simply titled 'forensics' which in turn led me to the TestDisk homepage.

What a find!


I haven't been more impressed with a bit of software since someone introduced me to VLC several years ago. TestDisk runs on an impressive array of platforms (Linux, BSD, Solaris, Mac OS X, Windows) and supports the drive formats you'd expect to come across - FAT12/16/32, NTFS, EXT2/3, ReiserFS etc. I was running TestDisk on Windows Vista 64-bit retrieving data from an Ext3 partition.

Even better, it's free (GPL licensed). I think you'd be hard pressed to find a better free tool. Even paid-for tools would likely struggle to match TestDisk's impressive list of features. I'm frankly astounded at how hard it is to find such a good tool via a search engine unless you know it by name (which is ridiculous).

This tale does have a happy ending. I managed to retrieve about 80% of the files from what I believe is only some partition corruption. I've yet to ascertain if there is any physical damage to the disks. TestDisk was able to dig up the partition table that the QNAP managed to trash somehow and even allowed me to get back copies of files that had previously been deleted when the disks were still functioning. Truly amazing. I'm also hoping there is some life left in my Hitachi Deskstar disks. Mind you, I intend to give them a thorough soak test before trusting them with any serious data.

I really recommend checking out the TestDisk web page and downloading it if you have a spare 5 minutes.

A big thank you to Christophe Grenier for all his efforts in creating a truly great piece of software!

Wednesday 21 January 2009

netaddr 0.6 released

For those who are interested, netaddr 0.6 has just been released.

A couple of nights ago, I hit a very important personal milestone.

One of the killer features I've wanted since starting on netaddr back in January of 2008 is now in the bag! It ended up being one of the more tricky problems I've had to deal with but has been worth the effort. It will also prove extremely useful in my day-to-day work.

Essentially, you can now take an arbitrary start and end IP address (versions 4 and 6), and formulate a list of intervening CIDRs that exactly bridge the two with no overlaps.
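To give a feel for the problem, here is a minimal sketch of one way to do it with plain integer arithmetic. It is not netaddr's implementation: the range_to_cidrs() function is hypothetical and leans on the standard library's ipaddress module purely for parsing and formatting. At each step it emits the largest block that is both aligned on the current start address and small enough to fit in what remains of the range.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Return the CIDR blocks that exactly cover first..last inclusive."""
    lo = ipaddress.ip_address(first)
    hi = ipaddress.ip_address(last)
    if lo.version != hi.version:
        raise ValueError("addresses must be the same IP version")
    start, end = int(lo), int(hi)
    if start > end:
        raise ValueError("first address must not be greater than last")

    width = lo.max_prefixlen  # 32 for IPv4, 128 for IPv6
    net_cls = ipaddress.IPv4Network if lo.version == 4 else ipaddress.IPv6Network

    cidrs = []
    while start <= end:
        # Trailing zero bits of start limit how large an aligned block can be.
        align = width if start == 0 else (start & -start).bit_length() - 1
        # The block must also fit inside the addresses still to be covered.
        fit = (end - start + 1).bit_length() - 1
        host_bits = min(align, fit)
        cidrs.append(net_cls((start, width - host_bits)))
        start += 1 << host_bits
    return cidrs


for cidr in range_to_cidrs("192.168.0.1", "192.168.0.6"):
    print(cidr)
```

Because everything stays in integers, the same loop works unchanged for 128-bit IPv6 addresses.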

A lot of infrastructure code had to be built before I was at a point where this could be implemented elegantly. The wait is (finally) over. It has also allowed me to kill some really ugly code based on math.frexp() which didn't work for IPv6 as it died somewhere around 2^54 due to rounding issues.
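The rounding problem is easy to reproduce. A double only carries 53 bits of precision, so larger integers get rounded when converted to float, and anything derived from math.frexp() can be off by one. The bits_via_frexp() helper below is hypothetical, written only to show the effect; it is not the code that was removed from netaddr.

```python
import math


def bits_via_frexp(n):
    """Estimate the bit length of n from the exponent math.frexp() reports."""
    _mantissa, exponent = math.frexp(n)
    return exponent


# Compare the exact integer bit length with the float-derived estimate.
for n in (2**32 - 1, 2**53 - 1, 2**54 - 1, 2**128 - 1):
    print(n.bit_length(), bits_via_frexp(n))
```

The two numbers agree up to 2^53 - 1 and drift apart after that, which is fine for 32-bit IPv4 addresses but no use for 128-bit IPv6 ones.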

So, here is the result of the (worst case) scenario in the IPv4 address space (IPv6 would take up too much screen space) :-

In [1]: from netaddr import *

In [2]: cidrs = IPRange('0.0.0.1', '255.255.255.254').cidrs()

In [3]: print "\n".join(["%-18s: %-15s -> %-15s" % (str(c), c[0], c[-1]) for c in cidrs])
0.0.0.1/32        : 0.0.0.1         -> 0.0.0.1
0.0.0.2/31        : 0.0.0.2         -> 0.0.0.3
0.0.0.4/30        : 0.0.0.4         -> 0.0.0.7
0.0.0.8/29        : 0.0.0.8         -> 0.0.0.15
0.0.0.16/28       : 0.0.0.16        -> 0.0.0.31
0.0.0.32/27       : 0.0.0.32        -> 0.0.0.63
0.0.0.64/26       : 0.0.0.64        -> 0.0.0.127
0.0.0.128/25      : 0.0.0.128       -> 0.0.0.255
0.0.1.0/24        : 0.0.1.0         -> 0.0.1.255
0.0.2.0/23        : 0.0.2.0         -> 0.0.3.255
0.0.4.0/22        : 0.0.4.0         -> 0.0.7.255
0.0.8.0/21        : 0.0.8.0         -> 0.0.15.255
0.0.16.0/20       : 0.0.16.0        -> 0.0.31.255
0.0.32.0/19       : 0.0.32.0        -> 0.0.63.255
0.0.64.0/18       : 0.0.64.0        -> 0.0.127.255
0.0.128.0/17      : 0.0.128.0       -> 0.0.255.255
0.1.0.0/16        : 0.1.0.0         -> 0.1.255.255
0.2.0.0/15        : 0.2.0.0         -> 0.3.255.255
0.4.0.0/14        : 0.4.0.0         -> 0.7.255.255
0.8.0.0/13        : 0.8.0.0         -> 0.15.255.255
0.16.0.0/12       : 0.16.0.0        -> 0.31.255.255
0.32.0.0/11       : 0.32.0.0        -> 0.63.255.255
0.64.0.0/10       : 0.64.0.0        -> 0.127.255.255
0.128.0.0/9       : 0.128.0.0       -> 0.255.255.255
1.0.0.0/8         : 1.0.0.0         -> 1.255.255.255
2.0.0.0/7         : 2.0.0.0         -> 3.255.255.255
4.0.0.0/6         : 4.0.0.0         -> 7.255.255.255
8.0.0.0/5         : 8.0.0.0         -> 15.255.255.255
16.0.0.0/4        : 16.0.0.0        -> 31.255.255.255
32.0.0.0/3        : 32.0.0.0        -> 63.255.255.255
64.0.0.0/2        : 64.0.0.0        -> 127.255.255.255
128.0.0.0/2       : 128.0.0.0       -> 191.255.255.255
192.0.0.0/3       : 192.0.0.0       -> 223.255.255.255
224.0.0.0/4       : 224.0.0.0       -> 239.255.255.255
240.0.0.0/5       : 240.0.0.0       -> 247.255.255.255
248.0.0.0/6       : 248.0.0.0       -> 251.255.255.255
252.0.0.0/7       : 252.0.0.0       -> 253.255.255.255
254.0.0.0/8       : 254.0.0.0       -> 254.255.255.255
255.0.0.0/9       : 255.0.0.0       -> 255.127.255.255
255.128.0.0/10    : 255.128.0.0     -> 255.191.255.255
255.192.0.0/11    : 255.192.0.0     -> 255.223.255.255
255.224.0.0/12    : 255.224.0.0     -> 255.239.255.255
255.240.0.0/13    : 255.240.0.0     -> 255.247.255.255
255.248.0.0/14    : 255.248.0.0     -> 255.251.255.255
255.252.0.0/15    : 255.252.0.0     -> 255.253.255.255
255.254.0.0/16    : 255.254.0.0     -> 255.254.255.255
255.255.0.0/17    : 255.255.0.0     -> 255.255.127.255
255.255.128.0/18  : 255.255.128.0   -> 255.255.191.255
255.255.192.0/19  : 255.255.192.0   -> 255.255.223.255
255.255.224.0/20  : 255.255.224.0   -> 255.255.239.255
255.255.240.0/21  : 255.255.240.0   -> 255.255.247.255
255.255.248.0/22  : 255.255.248.0   -> 255.255.251.255
255.255.252.0/23  : 255.255.252.0   -> 255.255.253.255
255.255.254.0/24  : 255.255.254.0   -> 255.255.254.255
255.255.255.0/25  : 255.255.255.0   -> 255.255.255.127
255.255.255.128/26: 255.255.255.128 -> 255.255.255.191
255.255.255.192/27: 255.255.255.192 -> 255.255.255.223
255.255.255.224/28: 255.255.255.224 -> 255.255.255.239
255.255.255.240/29: 255.255.255.240 -> 255.255.255.247
255.255.255.248/30: 255.255.255.248 -> 255.255.255.251
255.255.255.252/31: 255.255.255.252 -> 255.255.255.253
255.255.255.254/32: 255.255.255.254 -> 255.255.255.254

I am now officially proud of this project ;-)

Friday 16 January 2009

"Not Invented Here" is a powerful force

A link to this 2006 blog post came up on my feeds recently :-

http://blog.pythonisito.com/2006/01/three-reasons-why-you-shouldnt-write.html

It's a great article (with excellent commentary) on reasons for (and against) starting your own code project versus joining an existing one. Things are certainly not as clear cut as they might appear at first. It uses the umpteen Python web frameworks spawned (prior to the arrival of Django) as a basis for discussion but is good general advice for those thinking about contributing to the pool of open source software.

Certainly food for thought.