RSS GeoTagger
MetaCarta's geotagging software parses content, extracts geographic references, and resolves the geographic meaning intended by the author. RSS is one common way to access regularly updated, user-generated content, and the MetaCarta Labs RSS GeoTagger lets you apply the MetaCarta GeoTag API to your RSS feeds.
Input
- HTTP Method: GET
- Base URL: http://labs.metacarta.com/rss-geotagger/tag/
- Parameters:
  - url (Required): The URL of an RSS feed.
  - min_geoconf (Optional): The minimum geoconfidence cutoff for georss:point elements.
  - include_tagged_content (Optional): Set to 'true' to get an mc:TaggedContent element containing the content that was actually tagged, so that the AnchorStart/AnchorEnd properties can be used to index into it.
Example Input
An Example URL:
http://labs.metacarta.com/rss-geotagger/tag/?url=http://labs.metacarta.com/blog/feed/&min_geoconf=0.6
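A request URL like the one above can also be assembled programmatically. The following is a minimal sketch in Python; the helper name build_tag_url is hypothetical, and percent-encoding the feed URL is an assumption (the example above passes it unencoded, and this page does not say which form the service expects).

```python
from urllib.parse import urlencode

# Base URL as listed in the Input section.
BASE_URL = "http://labs.metacarta.com/rss-geotagger/tag/"


def build_tag_url(feed_url, min_geoconf=None, include_tagged_content=False):
    """Return a GeoTagger request URL for the given RSS feed (hypothetical helper)."""
    params = {"url": feed_url}  # required parameter
    if min_geoconf is not None:
        params["min_geoconf"] = str(min_geoconf)  # optional cutoff
    if include_tagged_content:
        params["include_tagged_content"] = "true"  # optional flag
    # Assumption: the feed URL is percent-encoded like any query value.
    return BASE_URL + "?" + urlencode(params)


request_url = build_tag_url("http://labs.metacarta.com/blog/feed/", min_geoconf=0.6)
print(request_url)
```

The resulting URL can then be fetched with any HTTP client using GET.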
Output
The geo output is in two parts: the raw GeoMarkup returned by the MetaCarta GeoTagger, and a simplified form embedded in GeoRSS point objects, with attributes.
Example Output
The following is example output for the blog entry from MetaCarta Labs about Where 2.0:
<mc:GeoMarkup xmlns:mc="http://labs.metacarta.com/rss-geotagger/ns/1.0/"
mc:CreatedBy="MetaCarta GeoTagger v4.0.0"
mc:CreatedOn="Thu May 28 21:33:02 2009 GMT"
mc:OriginalByteLength="3689" mc:Language="eng"
mc:InputEncoding="ASCII">
<mc:GeoTag mc:TagType="GEO" mc:Confidence="0.133588">
<mc:TextExtent mc:AnchorStart="1702" mc:AnchorEnd="1706" mc:Anchor="ARRA">
<mc:TextComponent mc:AnchorStart="1702" mc:AnchorEnd="1706" mc:Anchor="ARRA"/>
</mc:TextExtent>
<mc:Disjunct mc:Weight="0.937013" mc:FeatureType="DOT"
mc:GazetteerID="MetaCarta Gazetteer v4.0.0">
<mc:Conjunct mc:Class="P" mc:Type="PPL" mc:Population="37432"
mc:Country="IV" mc:CountryConfidence="0.125174">
<mc:Dot mc:Latitude="6.67" mc:Longitude="-3.97" mc:DotWeight="1.0"/>
</mc:Conjunct>
</mc:Disjunct>
</mc:GeoTag>
<mc:GeoTag mc:TagType="GEO" mc:Confidence="0.997204">
<mc:TextExtent mc:AnchorStart="2158" mc:AnchorEnd="2171" mc:Anchor="San Francisco">
<mc:TextComponent mc:AnchorStart="2158" mc:AnchorEnd="2171"
mc:Anchor="San Francisco"/>
</mc:TextExtent>
<mc:Disjunct mc:Weight="0.987091" mc:FeatureType="DOT"
mc:GazetteerID="MetaCarta Gazetteer v4.0.0">
<mc:Conjunct mc:Class="P" mc:Type="PPL" mc:Population="732072"
mc:Country="US" mc:CountryConfidence="0.984331" mc:Province="US06"
mc:ProvinceConfidence="0.984331">
<mc:Dot mc:Latitude="37.77" mc:Longitude="-122.45" mc:DotWeight="1.0"/>
</mc:Conjunct>
</mc:Disjunct>
</mc:GeoTag>
<mc:GeoTag mc:TagType="GEO" mc:Confidence="0.990725">
<mc:TextExtent mc:AnchorStart="2234" mc:AnchorEnd="2241" mc:Anchor="Estonia">
<mc:TextComponent mc:AnchorStart="2234" mc:AnchorEnd="2241" mc:Anchor="Estonia"/>
</mc:TextExtent>
<mc:Disjunct mc:Weight="1.000000" mc:FeatureType="DOT"
mc:GazetteerID="MetaCarta Gazetteer v4.0.0">
<mc:Conjunct mc:Class="A" mc:Type="PCLI" mc:Population="1408556"
mc:Country="EN" mc:CountryConfidence="0.990725">
<mc:Dot mc:Latitude="59" mc:Longitude="26" mc:DotWeight="1.0"/>
</mc:Conjunct>
</mc:Disjunct>
</mc:GeoTag>
</mc:GeoMarkup>
<georss:point
xmlns:mc="http://labs.metacarta.com/rss-geotagger/ns/1.0/"
xmlns:georss="http://www.georss.org/georss"
mc:Anchor="San Francisco" mc:AnchorStart="2158" mc:AnchorEnd="2171"
mc:Country="US" mc:Province="US06"
mc:Class="P" mc:Type="PPL">37.77 -122.45</georss:point>
<georss:point
xmlns:mc="http://labs.metacarta.com/rss-geotagger/ns/1.0/"
xmlns:georss="http://www.georss.org/georss"
mc:Anchor="Estonia" mc:AnchorStart="2234" mc:AnchorEnd="2241"
mc:Country="EN" mc:Class="A"
mc:Type="PCLI">59 26</georss:point>
</item>
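Here, we can see both the GeoMarkup -- placed inside the MetaCarta namespace -- and the high-confidence extents, which are placed in individual georss:point objects. The low-confidence "ARRA" tag (confidence 0.13) appears only in the GeoMarkup and has no georss:point.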
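The georss:point elements are the easier of the two forms to consume. As a rough sketch, they can be read with Python's standard xml.etree.ElementTree; the sample below copies the two points from the example output and wraps them in a bare item element so the fragment is well-formed. The function name extract_points and the dictionary keys are illustrative choices, not part of the service.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the example output.
MC = "http://labs.metacarta.com/rss-geotagger/ns/1.0/"
GEORSS = "http://www.georss.org/georss"

# The two georss:point elements from the example, wrapped in a bare <item>.
SAMPLE_ITEM = """<item>
<georss:point
 xmlns:mc="http://labs.metacarta.com/rss-geotagger/ns/1.0/"
 xmlns:georss="http://www.georss.org/georss"
 mc:Anchor="San Francisco" mc:AnchorStart="2158" mc:AnchorEnd="2171"
 mc:Country="US" mc:Province="US06"
 mc:Class="P" mc:Type="PPL">37.77 -122.45</georss:point>
<georss:point
 xmlns:mc="http://labs.metacarta.com/rss-geotagger/ns/1.0/"
 xmlns:georss="http://www.georss.org/georss"
 mc:Anchor="Estonia" mc:AnchorStart="2234" mc:AnchorEnd="2241"
 mc:Country="EN" mc:Class="A"
 mc:Type="PCLI">59 26</georss:point>
</item>"""


def extract_points(item_xml):
    """Return one dict per georss:point found in an RSS item (hypothetical helper)."""
    item = ET.fromstring(item_xml)
    points = []
    for el in item.iter("{%s}point" % GEORSS):
        # A GeoRSS point holds "latitude longitude", separated by whitespace.
        lat, lon = (float(value) for value in el.text.split())
        points.append({
            "anchor": el.get("{%s}Anchor" % MC),
            "lat": lat,
            "lon": lon,
            "country": el.get("{%s}Country" % MC),
            "type": el.get("{%s}Type" % MC),
        })
    return points


points = extract_points(SAMPLE_ITEM)
for point in points:
    print(point["anchor"], point["lat"], point["lon"])
```

ElementTree addresses namespaced elements and attributes by their full URI in braces, which is why the mc: and georss: prefixes do not appear in the lookups.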
The GeoMarkup format is described in the MetaCarta Appliance GeoTagger API Guide. The AnchorStart/AnchorEnd properties are byte offsets into the tagged content, which is the content of the RSS feed. When parsing, we do our best to find the longest content to tag by searching the content, description, and summary tags.
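Because AnchorStart and AnchorEnd are byte indexes rather than character indexes, slicing has to happen on the encoded bytes. The sketch below illustrates this with a made-up content string; the helper name anchor_text and the UTF-8 default are assumptions (the example output reports mc:InputEncoding="ASCII"). It treats AnchorEnd as exclusive, which is consistent with the example, where 2171 - 2158 equals the 13 bytes of "San Francisco".

```python
def anchor_text(content, anchor_start, anchor_end, encoding="utf-8"):
    """Return the text between two byte offsets of the tagged content.

    Hypothetical helper: the encoding is an assumption, and anchor_end is
    treated as exclusive, matching the offsets in the example output.
    """
    data = content.encode(encoding)  # offsets count bytes, not characters
    return data[anchor_start:anchor_end].decode(encoding)


# Made-up content: "é" occupies two bytes in UTF-8, so byte and character
# offsets diverge after it.
content = "Café in San Francisco"

by_bytes = anchor_text(content, 9, 22)
by_chars = content[9:22]  # naive character slice, shown for contrast

print(by_bytes)
print(by_chars)
```

For pure ASCII content the two slices agree, so the difference only shows up once the feed contains multi-byte characters.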
