Example mashups – part I

by vingt deux

by vingt deux, licensed under CC BY-NC-ND 2.0

I’ve had a number of questions about how to get some different types of data mashed up, and for some worked examples. Good! We didn’t have as much time as I was hoping to cover that, so I’ll whack it in this blog post so that those of you going down a mashup path have  some better examples. This isn’t a comprehensive overview, since we’ve already had one of those, but simple and detailed worked examples of what to do. I’ll show some of the false starts that I might make in a mashup project, as well as the successes.

Before we get started, be sure you’ve read ProgrammableWeb’s mashup guide and our own previous class notes on mashups. They give more “why” that I will give here. This here is all about “how”.

First decision – am I doing an Australian mashup, or an international one? Australia’s a bit of a backwater in data access, and doesn’t have nearly as many easy datasets, but it’s probably a lot more relevant for a lot of us. International mashups, for example, have lots more handy mapping tools (e.g. google’s charts or mapsgeek) Let’s give Australia a go anyway. I’ll start with some Australian census data.

Say that I want to mashup, er, the average income of individuals in a suburb versus… geez, whatever, i can get away with that looks interesting. Remember, this is an experiment… There’s a few data types listed at at the Australian Bureau of Statistics site. The ABS data descriptions are unspeakably confusing and geeky, so it takes me a bit of poking around to realise that the “CDATA” seems to do what i want. I have to register to use it, but it’s free. Sweet. I can choose a ridiculous number of data types, and aggregate them, however i want, including by suburb, as long as I’m prepared to tolerate their opaque jargon:

CDATA by suburb

And best of all, it lets me export to a CSV file!

CensusDATA csv downloads

CDATA also has some other features: graphs and maps.


But graphs don’t work for spreadsheets this big, and the maps are amazingly ugly and don’t seem to export too well. Feel free to use them in your own mashups, mind… but this is not what I’m going for in my mashup.

nc-2-ugly map

So let’s get out data out of there and into something more useful. You will recall that I can upload CSV  to a google spreadsheet, and google spreadsheets are wicked easy to mashup.So, I’ll do that. (here it is)

spreadsheet uploading

Now, we can start mapping average income. First obstacle: There is no spreadsheet telling me average income, just how many people are in each income bracket, what you call a histogram. Now we have two options: We could change plans and choose a different data set, or if we are feeling a bit more confident we can process this data set to give us what we need.  That won’t be too hard in this case (although it will mean remembering high school maths) so I’m going to run with it.

Fortunately, the internet can help here, even if we’ve forgotten our high school mathematics – we can approximate the mean income from a histogram using instructions online. And we are already using a spreadsheet, so this part is kinda easy. It won’t be completely accurate, but hey, it will be a good start, and if the sites gets enough web traffic, maybe we can get some funding to get more accurate data. So we make up a “central” income for each income category:

So, now I modify the spreadsheet to include a mean-income estimation according to those instructions.

first, make up a "centre" for each income bin

first, make up a "centre" for each income bin

then this is the formula to calculate the mean from the histogram

then this is the formula to calculate the mean from the histogram

Next step, I’ve noticed that there is way to0 much crap in the spreadsheet. I can probably get rid of most of it. let’s delete the columns that are about income by sex – we just want to know about average income over all (unless you are doing a map of sex-based income gaps by suburb, which would be pretty interesting – I’ll leave that one as an exercise.)

spreadsheet tidying

Now, at this point, I want to create some nice interactive coloured map of income levels, a choropleth. But, as I intimated above, that’s a bit tricky in Australia. I know from helping with the AboutNSW ones that it was annoying to get them working, and we had to roll our own tools. Rolling your own tools isn’t impossible – you can get the suburb outlines from the ABS and follow google’s handy instructions for banging them on a map, but that’s going to involve a fair bit of software wrangling… so let’s see what else we can do that might be a bit simpler before we resort to that.

Well, how about importing the spreadsheet into wordpress?
wordpress import menu
(What? Well it turns out that google spreadsheets can export RSS and wordpress can import RSS, and vice versa. Remember me banging in week 3 about the importance of standards like RSS to make things talk to each other?)

Great, so now I import my list of suburbs into my blog… Bam! It works!

import rss in progress

Well, it kind of works. There are a few problems. Firstly, the import process takes so long that eventually dreamhost gives up the ghost, and my import script stops after 1294 suburbs have been imported. I could work around this by only exporting part of the spreadsheet at a time… but then let’s look at the next problem THEN there’s the problem that a lot of these imported articles are frankly, ugly. I should possibly have put more care into which columns of the spreadsheet I exported.

ugly posts

There’s another problem too, which I’ve glossed over – it turns out that when I copy and past the RSS URL into my browser, both firefox and safari refuse to download the RSS properly for me to upload into wordpress. I end up having to install an extension into firefox called DownThemAll just to get the RSS file to download without getting all mangled. watch out for that!

On the plus side, I now have a blog with a reasonable chunk of all the suburbs in New South Wales in it, and each of those even lists the approximate average personal income in the suburb. I could go places with that. Maybe I could have each suburb post show a little map of the suburb, using jQuery. Or maybe I could put extra suburb info up there in the blog.

Did you know the suburb of “Airds” has an average weekly income of approximately $370 dollars, and that its name means… etc

But for now, it’s a mess. I’m going to nuke my wordpress blog and start again, rather than deleting 1295 entries by hand.

And I’m going to do it a bit differently… I think I’m being overambitious. You know what? This is a one-semester project.

Why am i trying to deal with all two-and-a-half-thousand suburbs in New South Wales? Forget it. I’m going to run with a smaller data set. I’m going to go back to the ABS, and I’m going to grab the personal income statistics again, but this time I’m going to grab them, not per suburb, but per Local Government Area. There’s only a few hundred of those, which might fit into wordpress a bit better, and might be easier to manually edit if there are a few mistakes. Also, it looks like LGA is a more common thing to classify data by than suburb (e.g. the NSW crimes map, and stap isi’s fantastic council contact info). And I’m going to use Yahoo pipes to make my job easier, and present the result using jQuery to keep it a bit simpler.

At least, that’s the plan. However, banging out this detailed tutorial with screen shots and all that it taking an amazingly long time (16 hours and counting!) and I know some of you are wanting to get your hands on this info ASAP. So I’ll continue in a blog post part 2 over the weekend, and put this blog post out there, eh?

For now, here are some interesting follow up readings:

  1. the mashup australia contest is open! If your mashup is good enough, you might want to enter it and win $$
  2. mashupaustralia has also graced us with a handy list of data sources that you might want to use
  3. There is a nice blog post here about how to make your life mashing stuff up a lot easier which also walks through the same technologies I’m covering here.

0 Responses to “Example mashups – part I”

  1. Leave a Comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


Except where otherwise attributed, Creative Commons License
Netcultures blog is by Dan Mackinlay and Chris Caines and licensed under a Creative Commons Attribution 2.5 Australia License. Content from external sites remains property of its original creator.

%d bloggers like this: