Now, last time when we left off, the site was in what you’d call a colossal mess. Trashed wordpress install, and still no interesting visualisation. So, how to improve things?
We’ll start by doing what I suggested last time – mapping off with a less ambitious dataset, based on LGAs (Local Government Areas) instead of suburbs. Here it is:
I even made a little chart. And look, since I am no longer quite as bored of CDATA as I was last week, I made another other spreadsheet based on percentage of people doing volunteer labour, by LGA.
Now, I’m keen to graph these data sets against each other. And I can, using google spreadsheets “Chart” function.
But it works.
Now… about that chart… Well, we could do one in google spreadsheets, but that’s basically like doing it in excel and about as much fun. So let’s not.
Instead, let’s upload it to ManyEyes. Have you seen this thing? Fun, collaborative mashup, and you do it by copying and pasting. There are a couple of tricks. Like I can’t have dollar signs in the spreadsheet values, but I have to put the dollar sign in the column header. Then I have to select just the rows I want. Here a screenshot fo the spreadsheet after I’ve done that:
Note that I don’t know that I have to do that just because I’m some kind of web genius, in case you were wondering. In fact, I just read those instructions off the manyeyes upload page: (click through to see it yourself – you’ll have to sign on, mind)
Wham. There it is. I did the same for the volunteers doohickey. And then I made a merged version of those two and uploaded it, because it’s more interesting that way. (“Interest” is also why I spelled it “voolunteerism, ok?)
Now this ain’t a statistics class, but, er, there you go, a possibly informative data visualisation. Well, it might be if you click through to a less microscopic version. So it turns out that rich local government areas tend to volunteer an amount close to average, and poor ones volunteer over a wider range. Hm. Damn those rich people, and their… lack of variability? That could have been more interesting. Messing around with statistics until a more fascinating one pops out is left as an exercise, however. Let’s power on through and work out what we can do with the tools at hand.
We’re gonna map this thing.
Now, first I try that yahoo pipes idea the obvious way, and repeat last time – i just send the output of my spreadsheet into the yahoo pipe, and hope the location extractor will do something useful:
Look at that. Dang, no location information in the output. Lame! I might have to read the manual.
A wide range of location mark-up is recognized, including GML (Geography Markup Language), Abbreviated GML, W3C Basic Geo, Abbreviated W3C Basic Geo, Simple GeoRSS, Yahoo! Local format, and KML LookAt and Point tags.
Right. Well, clearly, our feed doesn’t look like yahoo local or whatever the crap that is. But I reckon getting our data format in google maps link format should be fine, because it’s easy. I know from playing around, that you can get google maps to link to, say, Armidale, by using a URL like this:
Anyway, we could use Google Spreadsheets to turn all those “Armidale”-type cells into “http://maps.google.com/?q=Armidale, New South Wales” type cells, but it’s late in the day, so I’ll show you a little hack to do it without using Yahoo Pipes to save more fiddling around:
Woo! It worked! What does it do? Well, I’m not going into too much detail with this one, since, as I said, you can do it all in google spreadsheets… but the regex tool is a handy thingy for doing translation of strings of letters into other strings of letters (in this case, into strings of letters with “New South Wales” stuck on the end, and the URL builder just shoves stuff into a URL – in this case, the google maps one. Feel free to play around with it. The only trick with this one was I noticed that Yahoo Pipes was occasionally breaking when I tried to use “Local Government Area” as a field name in the CSV import thingy, so I renamed it to “LGA” in the CSV import widget, and then it worked OK. I guess it doesn’t like spaces in the name. Anyway, watch out for that.
Long story short: It works.
But lets get that stuff off the pipes page, by getting the KML export URL:
Now we’ve copied that url, we paste it in to google spreadsheets, right there in the search box.
And the result?
Win. we have a google map of the yahoo pipe of the google spreadsheet of the ABS data.
Basically, it was too much work to get it to geocode properly (it might have been a bit more fun if we controlled the source of data, and say, had a wordpress blog outputting GeoRSS or something).
Fortunately, Pamela Fox has blogged a nice way to geocode your spreadsheets. (She also has some awesome tips for designing maps.) Let’s follow her instructions. First, we add “New South Wales” to the end of all our LGA names, so that google can find ‘em better.
Nice. The rest is all following Pamela’s instructions.
You whack her gadget in there:
set it up:
Then you press the go button (small correction – up above I showed myself geocoding 150 records at once, but it only for 99. If you find it stops half-way through, that’s why. But it’s easy enough to restart.)
And then it goes. Not completely smoothly – it Turn out google has no idea where “Greater Hume Shire” is. And nor do I. I could have fixed that manually (possibly using the plain old paper street directory), but decided to be lazy. I copy and paste and then:
From the same blog post, Pamela also gives links to a handy spreadsheet mapping tool or two. There are, as always, too many alternatives. You might want to reverse geocode, or try this other hand bulk geocoder, or the GeoCommons open source geographic database, and there are semi-commerical services to automate part of it for you, such as umapper. Bah!
The real idea that I’m trying to get across here is that none of this is magical secret powers (well, except regexs), but there is a lot of muddling through. The list of tools out there is really large, and they change. No-one is a ninja at most of them, but be being happy to just try things and see if they work you can get a long way with a lot of them. There are bugs and annoyances. But then you usually get something at the end of the day.
Here are some links for lists of other directions to explore with mashups. I’ve mentioned some before, and some are new.
- Simon Willison’s roundup for the Guardian
- Freebase appears to have a general purpose mashup engine too.
- YQL is hot right now, if you want to get just a shade geekier. It supports a massive number of the services we are looking at here and more I haven’t heard of, and if those aren’t enough, you can turn anything, almost, into a YQL data source with a little bit of code. Very handy.
Now, that’s enough writing from me for now. I’m may have time to do another walk-through of things for this semester… If anyone wants to suggest the topic in the comments, fire away.