Geographic Queries on Google App Engine

May 28th, 2008 by crschmidt

Google App Engine is an interesting platform. It allows you to run Python applications on Google’s platform, using BigTable-based datasources as a backing store. Publishing applications to App Engine is trivial, and if you’ve written your application to support WSGI, getting it running on App Engine is pretty easy.

However, there are some limitations placed on BigTable datasources that make doing geographic queries more difficult than it might otherwise be. Support for querying and indexing two-dimensional/geographic data is typically a database extension: in Postgres, for example, this is the PostGIS extension. BigTable has no inherent understanding of this type of data, so we have to drop to a more naive implementation. A first-pass approximation at bounding box queries might be to create a four-tuple of columns in your datasource (left, bottom, right, top) and store the extent of each feature in those columns. In a typical relational database, this would work reasonably well, though your database wouldn’t truly be indexing both dimensions at once: instead, you’d be using a single-dimensional index, and then filtering the result as a second pass.

App Engine doesn’t allow this type of query, however: an application can’t perform inequality queries on more than one field at once. The naive implementation, of x < right and x > left, then, will fail: left and right are different fields, and you can’t do queries of that type.

So, the key is clearly to encode the bounding box of features in a way that embeds the data into a single string, which represents the bounding box accurately, but is also ‘ordered’, such that a point represented by ‘123’ is near a point represented by ‘1234’.

The default response to this, for points, is to use Geohash (Wikipedia). Geohash embeds location information for a point into a single, ordered string, for which you can drop precision by simply dropping bits off the end of the string. For points, this actually works relatively well: a given point will be between the two geohash values for the corners of a bounding box. So, if you have -71,42, with a Geohash of ‘drmwbmg7sp6yf’, and you do a query for -72,41 -> -70,43, you have limits of ‘drkc1xg4q3uy8’ and ‘drwhx5gt1ubzk’. Clearly, the point’s hash falls between the two corner hashes: you can simply do a database query of (hash < ‘drwhx5gt1ubzk’ AND hash > ‘drkc1xg4q3uy8’), which App Engine has no problem with.
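To make the ordering property concrete, here is a minimal sketch that works on raw bitstrings rather than the base-32 Geohash text. The function name point_bits and the 32-bit depth are my own choices for illustration, not part of any Geohash library; the only thing assumed is the usual scheme of bisecting longitude and latitude alternately, longitude first.

```python
def point_bits(lon, lat, depth=32):
    """Hypothetical helper: interleave bisection bits, longitude first."""
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    bits = []
    for i in range(depth):
        # Even positions refine longitude, odd positions refine latitude.
        value, bounds = (lon, lon_range) if i % 2 == 0 else (lat, lat_range)
        mid = (bounds[0] + bounds[1]) / 2.0
        if value >= mid:
            bits.append("1")
            bounds[0] = mid  # keep the upper half
        else:
            bits.append("0")
            bounds[1] = mid  # keep the lower half
    return "".join(bits)


point = point_bits(-71, 42)
lower_left = point_bits(-72, 41)
upper_right = point_bits(-70, 43)

# Plain string comparison stands in for a single-property range filter.
inside = lower_left < point < upper_right
print(point, inside)
```

Because every bit is monotone in its coordinate, a point inside the box always sorts between the two corners, which is what lets a single inequality pair on one property do the work.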

However, this doesn’t carry over equally well to non-point features. You can find the ‘union’ of two Geohashes by taking the bits that are the same from each and putting them together: in the case above, the bounding box of -72,41,-70,43 is the bitstring ‘01100101111’. However, because bboxes are not infinitely precise (or infinitely small) as points are, there is no way to assign precision to the string. This means that the ‘cutoff’ results in queries that don’t match up:

>>> import geohash
>>> a = str(geohash.Geostring((-72.01,40.99)))
>>> b = str(geohash.Geostring((-69.99,43.01)))
>>> c = str(geohash.Geostring((-72,41)) + geohash.Geostring((-70,43)))
>>> c >= a
False
>>> c <= b
True

Clearly, these should both return True: the bounding box c is contained within the box spanned by the points a and b. But because c is just a short prefix, while the points carry ‘infinite accuracy’, the string comparison doesn’t actually work.
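The same failure can be reproduced without the geohash module. This is only a sketch: point_bits is a hypothetical stand-in for the module’s bitstring encoding, and I am assuming the ‘union’ is the longest common prefix of the two corner bitstrings.

```python
import os


def point_bits(lon, lat, depth=32):
    """Hypothetical helper: interleave bisection bits, longitude first."""
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    bits = []
    for i in range(depth):
        value, bounds = (lon, lon_range) if i % 2 == 0 else (lat, lat_range)
        mid = (bounds[0] + bounds[1]) / 2.0
        if value >= mid:
            bits.append("1")
            bounds[0] = mid
        else:
            bits.append("0")
            bounds[1] = mid
    return "".join(bits)


def union_bits(first, second):
    """Assumed meaning of 'union': the shared leading bits of both strings."""
    return os.path.commonprefix([first, second])


a = point_bits(-72.01, 40.99)
b = point_bits(-69.99, 43.01)
c = union_bits(point_bits(-72, 41), point_bits(-70, 43))

# c is a strict prefix of a, and a prefix sorts before any longer string,
# so the lower-bound test fails even though the box lies inside a..b.
print(c, c >= a, c <= b)
```

The short string sorts as if every missing bit were lower than 0, which is exactly the problem the next step addresses.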

Instead, we need to add an additional state to each of our bits: instead of just storing 0 or 1, we also need an ‘in-between’ value to indicate that there is ‘no known data’. (This fixes the problem that the null string is less than 0.) For sorting to work properly, this value has to sort between 0 and 1, so we write a 0 bit as ‘0’, a 1 bit as ‘2’, and use ‘1’ for in-between. We call this class the Geoindex class. For our ‘c’ above, this turns 01100101111 into:

0220020222211111111111111111111111111111111111111111111111111111

Now, we do the same test as before, with the new Geoindex class:

>>> a = str(geohash.Geoindex((-72.01,40.99)))
>>> b = str(geohash.Geoindex((-69.99,43.01)))
>>> c = str(geohash.Geoindex((-72,41)) + geohash.Geoindex((-70,43)))
>>> c >= a
True
>>> c <= b
True

Hooray! It worked as we expect.
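For anyone who wants to see the trick in isolation, here is a rough sketch of the three-valued encoding in plain Python. It is not the Geoindex class itself: point_bits and to_index are hypothetical helpers, and the 64-character width is an assumption on my part.

```python
import os


def point_bits(lon, lat, depth=64):
    """Hypothetical helper: interleave bisection bits, longitude first."""
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    bits = []
    for i in range(depth):
        value, bounds = (lon, lon_range) if i % 2 == 0 else (lat, lat_range)
        mid = (bounds[0] + bounds[1]) / 2.0
        if value >= mid:
            bits.append("1")
            bounds[0] = mid
        else:
            bits.append("0")
            bounds[1] = mid
    return "".join(bits)


def to_index(bits, width=64):
    """Write known bits as '0'/'2' and pad unknown positions with '1'."""
    digits = bits.replace("1", "2")  # a known 1 bit becomes '2'
    return digits.ljust(width, "1")  # '1' sorts between '0' and '2'


a_index = to_index(point_bits(-72.01, 40.99))
b_index = to_index(point_bits(-69.99, 43.01))
prefix = os.path.commonprefix([point_bits(-72, 41), point_bits(-70, 43)])
c_index = to_index(prefix)

print(c_index)
print(c_index >= a_index, c_index <= b_index)
```

The padding digit is what matters: at the first position where the box has no information, it sorts above a known 0 and below a known 1, so the box lands between its corners.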

Geohash here is a public-domain Python module created by Schuyler Erle, available from Mapping Hacks Code: Geohash. Schuyler was the brains behind all the Geohash work: I just told him what did and didn’t work :)

In this way, I was able to put together a geographic bounding box query, on top of Google App Engine, using a Geohash-like algorithm as a storage format, and use that query to power a FeatureServer Demo App Engine application, doing geographic queries of non-point features on top of App Engine/BigTable. Simply create a Geoindex object of the bounding box of your feature, and then use lower-left/upper-right points as bounds for your Geohash when querying.
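As a rough illustration of that last step, the sketch below stores one index string per feature and answers a bounding box query with a single range scan over sorted keys. Everything here is hypothetical: the helper names, the two sample features, and the in-memory sorted list standing in for the datastore. A range scan over this kind of key can also match keys that lie outside the box, so the sketch re-checks each candidate; that second pass is my addition.

```python
import bisect
import os


def point_bits(lon, lat, depth=64):
    """Hypothetical helper: interleave bisection bits, longitude first."""
    ranges = {0: [-180.0, 180.0], 1: [-90.0, 90.0]}
    bits = []
    for i in range(depth):
        value, bounds = (lon, lat)[i % 2], ranges[i % 2]
        mid = (bounds[0] + bounds[1]) / 2.0
        bits.append("1" if value >= mid else "0")
        bounds[0 if value >= mid else 1] = mid
    return "".join(bits)


def to_index(bits, width=64):
    """Known bits become '0'/'2'; unknown positions are padded with '1'."""
    return bits.replace("1", "2").ljust(width, "1")


def bbox_index(left, bottom, right, top):
    """Index string for a box: shared corner bits, then padding."""
    corners = [point_bits(left, bottom), point_bits(right, top)]
    return to_index(os.path.commonprefix(corners))


# Made-up sample features: name -> (left, bottom, right, top).
features = {
    "boston_area": (-71.2, 42.2, -70.9, 42.5),
    "london_area": (-0.5, 51.3, 0.3, 51.7),
}
rows = sorted((bbox_index(*bbox), name) for name, bbox in features.items())
keys = [key for key, _ in rows]


def query(left, bottom, right, top):
    """Range scan on the single key, then an exact containment check."""
    lo = to_index(point_bits(left, bottom))
    hi = to_index(point_bits(right, top))
    candidates = rows[bisect.bisect_left(keys, lo):bisect.bisect_right(keys, hi)]
    hits = []
    for _, name in candidates:
        f_left, f_bottom, f_right, f_top = features[name]
        if left <= f_left and bottom <= f_bottom and f_right <= right and f_top <= top:
            hits.append(name)
    return hits


print(query(-72, 41, -70, 43))
```

On App Engine the range scan would be the two inequality filters on the one indexed property, with the containment check done in application code afterwards.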

(A later post will detail how to set up your own copy of this App Engine app: I’m still working out a number of kinks with it.)

App Engine provides a number of interesting capabilities; for geographic applications, bounding box queries are very important, and using this solution, you can do queries against this type of data with some success.

OpenLayers 2.5 RC1

September 17th, 2007 by crschmidt

OpenLayers 2.5 RC1 has been released, according to the OpenLayers Blog. Anyone who is able to should test their applications with it in hopes of finding the remaining bugs…

Browser Based AtomPub GIS Client

September 6th, 2007 by crschmidt

Q: What do you get when you combine:

  • Atom
  • GeoRSS
  • FeatureServer
  • OpenLayers

A: The world’s first Open Source Browser-Based Atom Pub GIS Client.

FlickrBrowse: Flickr + FeatureServer in Action

June 5th, 2007 by crschmidt

One of the cool things about FeatureServer is its ability to load data from other APIs, even when those APIs are not really ‘geo’ related. The main example of this in the existing code is the Flickr datasource, which allows you to load data from Flickr as all the formats that FS supports, like WFS.

Last night, I sat down with the intention of using this in some way. I was able to hack together half of what I wanted in about an hour, and I hacked together the other half this morning in another hour.

Announcing…

FlickrBrowse: Browsing the World’s Interesting Flickr Images via OpenLayers

Using Flickr’s ‘interestingness’ sort, I’ve put together a map of recent geotagged Flickr photos, ordered by interestingness. You can limit photos by username/user id, and clicking the photos gives you little infobubbles with a link to the original photo.

The code is quite simple (view source to see it), and it provides a relatively cool service.

I think this is the kind of thing that OpenLayers and FeatureServer are all about: making it easier to integrate disparate datasets into a single interface. In this case, doing the transformation on the client would be difficult, so we do it on the server, and the client sees it as ‘just another WFS’.

Where 2.0: Stamen Design

May 29th, 2007 by crschmidt

Stamen Design just presented a lot of really cool visualization demos. I’m always happy to see people doing something a bit more than points, and even more than lines and polygons: they’re making really fun looking maps with map data.

Hindsight

Where 2.0, Morning 1

May 29th, 2007 by crschmidt

Schuyler, from here at MetaCarta: Mapping the Maximum City. How architectures of participation make it possible to create the geodata that we *all* need. OpenStreetMap is leading the way with tag-style geodata creation, but there is a long way to go for all of us. We need to solve the problems of a structured data geowiki, and there’s lots of work to do, and it will help everyone from the slumdwellers of Mumbai to the residents living under the OSGB reign.

Topix: Creating local news pages by tagging documents on the web. “We don’t do NLP”: describing the different ways to disambiguate, from city mayor names, to names that residents attribute to themselves (Bay Stater, Nutmegger). Creating 32000 zip-code based news pages.

Google: Google Street Map View, Google Mapplets, now indexing and providing search access to GeoRSS feeds (as of this morning).

Cool Map Tools — MetaCarta + OpenLayers Demo

May 27th, 2007 by crschmidt

This week, John presented at the MetaCarta Users Group meeting, showing off the integration between OpenLayers, FeatureServer, and MetaCarta APIs.

MetaCarta + OpenLayers: Cool Map Tools

Of course, this video will also be available from the MetaCarta Labs on a Stick :)

Check it out!

Labs on a Stick: 0.5

May 27th, 2007 by crschmidt

By the time I finished working on the USB drive distribution last night, I realized that there was actually a lot more that I wanted to share (and I still have about 50MB free on the USB drive). The software I put on it had gradually changed from being “FeatureServer demo and associated parts” to being “MetaCarta Labs demos”. The version I published last night took that into account insofar as the actual software included in the distribution, but not in the frontend and supporting materials.

To that end, I spent today rethinking how I wanted to do it, and I think I’ve come up with something I like better. The new labs-on-a-stick is available from the Labs on a Stick homepage, as a .tar.gz and a .zip file. This distribution is much bigger (~9MB) because it includes video demonstrations of OpenLayers and MetaCarta’s geographic search engine. In addition, the homepage (instead of being a FeatureServer demo) is a list of all the things available, including four demos, software downloads, and video presentations.

Although this is probably not ideal for downloads by users on dialup, that’s not the target of the distribution, so for now I’m happy with that.

Now I can go back to concentrating on picking out what URLs I want to demo for my OpenLayers presentation. Hopefully (assuming I get it done in time), I’ll have that made into a video too, adding more material onto this nifty USB distribution.

Power of Geographic Search

May 27th, 2007 by crschmidt

Oftentimes, it’s not entirely clear to users of software like OpenLayers, TileCache, and FeatureServer what MetaCarta does.

Recently, John Frank, MetaCarta’s CTO (and my boss) put together a short (1m30s) screencast, demonstrating the power of what MetaCarta really does: The Power of Geographic Search. Schuyler graciously provided narration to the video.

This video demonstrates MetaCarta’s powerful text + geographic search engine, indexing millions of documents and responding in the blink of an eye. Using this search tool, it’s possible to do a full text and geographic search over millions of documents, finding everything written about an area even if you don’t happen to know the names of all the several hundred thousand places nearby.

As Schuyler’s voiceover says: “MetaCarta provides a powerful filter for finding everything written about any place.”

MetaCarta Labs on a Stick includes this video, and also a geographic search demonstration, showing how to use the GeoSearchCore download available from the MetaCarta Developers Downloads to add geographic text search to any OpenLayers Map. If you find this interesting, the next step is to read more about it on the official GTS Product Information page, and of course, contact us. But even if you don’t, now you know a bit more about what MetaCarta does.

Labs on a Stick

May 26th, 2007 by crschmidt

So, right after FeatureServer was released, I realized I finally had the full stack I had an interest in handing out to users:

However, I know that one of the things that makes the use of things like FeatureServer a bit more difficult is that it depends on the installation of a number of Python modules for full usage. The GeoJSON service, for example, requires simplejson to be installed, and if you’re on Windows, even getting Python up and running in a sane way can be a bit of a challenge.

The solution? MetaCarta Labs on a Stick.

A collection of distributions for the latest in MetaCarta Labs software, including OpenLayers, TileCache, FeatureServer and GeoSearchCore, MetaCarta’s geographic text search addon for OpenLayers. A built in HTTP server that will run on Windows, OS X and Linux with nothing more than double clicking an icon or starting up a shell script. A set of tile data that will allow you to browse your local vectors over vmap0 or blue marble satellite data — without even being connected to the net.

And when I say ‘will run on Windows’, I mean it. I took a freshly formatted Windows machine, plugged the USB stick in, double clicked the ‘run_windows.bat’ icon, and I had a web server running. With no connection to the internet, I was able to draw features, save them in FeatureServer, read them as GeoRSS, and edit them as KML. (Note that this is with the USB drive version, which is different from the ‘preliminary distribution’ on the website.)

I’m going to have 14 of these things with me in California next week. (The 15th is staying on my keyring. ;) ) At least some of them are likely going to be used as prizes for a competition that I’ll announce during my OpenLayers presentation at Where 2.0, on Tuesday at 4:30.

You can try out a less complete version of the Labs on a Stick distribution via the preliminary download, but for a demo of the ‘real thing’, be sure to find me at Where 2.0 or WhereCamp.

I’d like to point out that this is not ‘try before you buy’ software, as was mistakenly reported last night: the software on this distribution is provided free of charge for all eternity. FeatureServer, TileCache and OpenLayers are all open source software released under BSD-like licenses. The GeoSearchCore distribution has slightly more restrictive terms of use, but not ones that limit the use of the software: it just means that to get your search results, you have to come to MetaCarta. (Who else does super-fast geographic text search anyway?)

Everyone can thank James Fee for egging me on to actually get this out the door — I was going to spend the weekend on it, but somebody had to be impatient… but it seems like he’s pretty happy with what he’s got.