Mapnik - The Missing Manual

3.2.1. Core data sources

The core plugins are part of the Mapnik source code itself, and are usually available in all builds of the Mapnik library. (TODO: add link to Mapnik GitHub).

CSV

The CSV plugin reads simple character-separated column data from a file when specified using the file parameter, or directly from the XML style file when using the inline parameter. In the latter case, all lines following the opening inline parameter tag are read as CSV input until the closing parameter tag is reached. If the inline data contains <, > or & characters, you should enclose it in a <![CDATA[…]]> section to prevent the content from being interpreted as XML.

When giving a file path, this is taken as relative to the directory the style file is in, unless a base parameter is given. In that case a relative file path will be interpreted as relative to the directory path given in the <FileSource> of that base name.
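As a minimal sketch of the base mechanism described above: the file source name csvdata, the directory /srv/map/data/ and the file name cities.csv are hypothetical and only serve as illustration.

```xml
<Map>
  <!-- hypothetical named file source; name and path are assumptions -->
  <FileSource name="csvdata">/srv/map/data/</FileSource>

  <Layer name="cities">
    <Datasource>
      <Parameter name="type">csv</Parameter>
      <!-- the file path below is resolved relative to the "csvdata" directory -->
      <Parameter name="base">csvdata</Parameter>
      <Parameter name="file">cities.csv</Parameter>
    </Datasource>
  </Layer>
</Map>
```

With this setup the plugin would be expected to read /srv/map/data/cities.csv, regardless of where the style file itself is stored.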

Processing performance can be improved by creating an additional .index file using the [mapnik-index] tool.

Example 5. CSV data source examples

<!-- read from file path/to/file.csv -->
<Datasource>
  <Parameter name="type">csv</Parameter>
  <Parameter name="file">path/to/file.csv</Parameter>
</Datasource>

<!-- read inline data -->
<Datasource>
  <Parameter name="type">csv</Parameter>
  <Parameter name="inline"><![CDATA[
lat,lon,text
52.0,8.5,"Bielefeld"
  ]]></Parameter>
</Datasource>

By default, the CSV plugin tries to identify the field delimiter by looking at the first line of the file, checking for , ; | and the TAB character. Whichever of these characters occurs most often is considered the separator character, unless you explicitly specify a different one with the separator parameter, e.g.: <Parameter name="separator">;</Parameter>.

In cases where the data does not contain a header line, one can be given as content of the headers parameter.

The default quoting and escape characters are " and \, but can be changed with the quote and escape parameters.

Line endings are auto detected, too, so files with DOS/Windows (\r\n), Linux/Unix (\n) or MacOS (\r) style line endings are read correctly out of the box.

The CSV plugin assumes that the data it reads is UTF-8 encoded. A different encoding can be specified using the encoding parameter.
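The parsing parameters described above can be combined in one data source. The following is only a sketch: the file name is hypothetical, and it assumes that the header names in the headers parameter are separated by the same character as the data itself, which has not been verified here.

```xml
<Datasource>
  <Parameter name="type">csv</Parameter>
  <!-- hypothetical file without a header line, using ; as separator -->
  <Parameter name="file">data/places.csv</Parameter>
  <Parameter name="separator">;</Parameter>
  <!-- assumption: header names use the same separator as the data -->
  <Parameter name="headers">lat;lon;text</Parameter>
  <!-- strings in this file are quoted with ' instead of " -->
  <Parameter name="quote">'</Parameter>
  <!-- the file is not UTF-8 encoded -->
  <Parameter name="encoding">latin1</Parameter>
</Datasource>
```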

Column data can be referred to by the column's header name, using [column_name] placeholders in expressions. The following column names have a special meaning and are used to retrieve actual geometry data for a line:

lat or latitude: Point latitude
lon, lng, long, or longitude: Point longitude
wkt: Geometry data in Well Known Text format
geojson: Geometry data in GeoJSON format

So, to be usable as a Mapnik data source, each input file needs either a lat/lon column pair, or a wkt or geojson column.
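As an illustration of the wkt variant, here is a minimal inline sketch with made-up data. The WKT values are quoted because they may contain commas themselves, which would otherwise be taken as field separators.

```xml
<Datasource>
  <Parameter name="type">csv</Parameter>
  <Parameter name="inline"><![CDATA[
wkt,name
"LINESTRING(8.5 52.0, 8.6 52.1)","Example line"
"POINT(8.5 52.0)","Example point"
  ]]></Parameter>
</Datasource>
```

The name column can then be used as [name] in filters and symbolizers, just like with a lat/lon based file.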

When parsing the header line fails, or no geometry column(s) can be detected in it, the plugin will print a warning by default, and not return any data. When the strict parameter is set to true, style processing will be terminated completely by throwing a Mapnik exception.
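A short sketch of enabling strict mode, reusing the file path from the example above:

```xml
<Datasource>
  <Parameter name="type">csv</Parameter>
  <Parameter name="file">path/to/file.csv</Parameter>
  <!-- raise an exception instead of only printing a warning -->
  <Parameter name="strict">true</Parameter>
</Datasource>
```

This can be useful while developing a style, as a broken input file then fails loudly instead of silently producing an empty layer.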

Table 2. CSV data source parameters
Parameter	Type	Default	Description
`encoding`	string	utf-8	Text encoding used in the CSV data
`row_limit`	int	none	Read only this many data rows, ignore the rest.
`headers`	string	none	Header names if the file contains none on the first line
`strict`	boolean	false	Terminate Mapnik on hitting errors?
`quote`	char	`"`	Quote character used for string columns in the data
`escape`	char	`\`	TODO: does this even really exist?
`separator`	char	auto detected	Field separator, typically `,`, `;`, `\|` or `TAB`
`extent`	4xfloat	none	ignore data that is completely outside this extent bounding box
`inline`	text	none	CSV data to be read directly from the style file
`file`	file path	none	path of CSV file to read
`base`	string	none	name of a `<FileSource>` to find the input file in

TODO

.index file support? See also mapnik-index utility
NULL handling?

Gdal

Table 3. Gdal data source parameters
Parameter	Type	Default	Description
`band`
`base`	string	none	name of a `<FileSource>` to find the input file in
`extent`
`file`
`max_image_area`
`nodata`
`nodata_tolerance`
`shared`

GeoJSON

While the GeoJSON format is also supported by the OGR input plugin, a direct native GeoJSON plugin was added for performance reasons for this more and more common format.

Processing performance can be improved by creating an additional .index file using the [mapnik-index] tool.

Table 4. GeoJSON data source parameters
Parameter	Type	Default	Description
`base`	string	none	name of a `<FileSource>` to find the input file in
`cache_features`	boolean	true
`encoding`	string	utf-8	Encoding used for textual information
`file`	file path	none	Path of a GeoJSON file to read for input.
`inline`	string	none	Inline GeoJSON data as part of the style file itself
`num_features_to_query`	int	5	How many features of a feature set to read up front to determine what property names exist in the data
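A minimal sketch of a file based GeoJSON data source using the parameters from the table above. The file name is hypothetical, and the value 20 is just an arbitrary example for a data set where some properties only show up in later features.

```xml
<Datasource>
  <Parameter name="type">geojson</Parameter>
  <!-- hypothetical input file -->
  <Parameter name="file">data/places.geojson</Parameter>
  <!-- inspect more than the default 5 features to collect property names -->
  <Parameter name="num_features_to_query">20</Parameter>
</Datasource>
```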

Example 6. GeoJSON data source example

<?xml version="1.0" encoding="utf-8"?>
<Map background-color='white'>

  <Style name="style">
    <Rule>
      <PointSymbolizer file="symbols/[file]"/>
    </Rule>
  </Style>

  <Layer name="layer">
    <StyleName>style</StyleName>
    <Datasource>
      <Parameter name="type">geojson</Parameter>
      <Parameter name="inline"><![CDATA[
{
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {
                "file": "dot.svg"
            },
            "geometry": {
                "type": "Point",
                "coordinates": [1, 1]
            }
        },
        {
            "type": "Feature",
            "properties": {
                "file": "bug.svg" 
            },
            "geometry": {
                "type": "Point",
                "coordinates": [2, 1]
            }
        }
    ]
}
      ]]></Parameter>
    </Datasource>
  </Layer>

</Map>

OGR

The OGR input plugin supports a large number of different vector formats via the GDAL/OGR library. For a complete list of supported formats see the Vector Drivers list in the GDAL documentation.

The OGR plugin is typically used for GPX data, for which no special input plugin exists, and for OSM data, for which it replaced the older OSM plugin. That plugin has been moved to the non-core-plugins repository and is usually no longer included in Mapnik binary builds. Only these two data formats are therefore described in detail below.

Table 5. OGR data source parameters
Parameter	Type	Default	Description
`base`	string	none	name of a `<FileSource>` to find the input file in
`driver`	string	auto detect	actual vector format driver to use
`encoding`	string	utf-8
`extent`
`file`	file path	none	path of input file to read
`inline`	string	none	inline vector file data to read directly from style file
`layer`	string	none	name of the input layer to actually process
`layer_by_index`	int	none	number of the input layer to actually process
`layer_by_sql`
`string`	string	none	alias for `inline`

OGR GPX

The GPX backend reads GPX XML files and provides the contained data via the following five layers:

routes: Returns routes from the GPX file's <rte> tags as lines. Each route is given an extra route_id attribute.
tracks: Returns tracks from the GPX file's <trk>/<trkseg> tags as multilines. Each track is given an extra track_id attribute.
route_points: Returns <rtept> route points from all routes, with an extra route_fid field referring to the route_id of the route that a point belongs to.
track_points: Returns <trkpt> track points from all tracks, with extra track_fid and track_seg_id attributes added.
waypoints: Returns the <wpt> waypoints of the GPX file as points.

Any extra tags that a route, track or point may have, like <name> or <ele> (for elevation), can be accessed in filter expressions and symbolizers by name, e.g. as [name] or [ele].

Example 7. OGR GPX data source example

Show a marker for all GPX points with a non-empty <name> tag.

<Style name="named_point">
  <Rule>
    <Filter>not ([name] = null or [name] = '')</Filter>
    <PointSymbolizer file="marker.svg"/>
    <TextSymbolizer face-name="DejaVu Sans Book" size="10" placement="point">[name]</TextSymbolizer>
  </Rule>
</Style>

<Layer>
  <StyleName>named_point</StyleName>
  <Datasource>
    <Parameter name="type">ogr</Parameter>
    <Parameter name="driver">gpx</Parameter>
    <Parameter name="file">file.gpx</Parameter>
    <Parameter name="layer">waypoints</Parameter>
  </Datasource>
</Layer>
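The line based layers can be used in the same way. The following sketch draws all tracks of the same file as red lines. The style and layer names, color and line width are arbitrary choices for illustration.

```xml
<Style name="track_line">
  <Rule>
    <LineSymbolizer stroke="red" stroke-width="2"/>
  </Rule>
</Style>

<Layer name="gpx_tracks">
  <StyleName>track_line</StyleName>
  <Datasource>
    <Parameter name="type">ogr</Parameter>
    <Parameter name="driver">gpx</Parameter>
    <Parameter name="file">file.gpx</Parameter>
    <!-- use the tracks layer instead of waypoints -->
    <Parameter name="layer">tracks</Parameter>
  </Datasource>
</Layer>
```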

For more details, see the original GDAL documentation for the GPX backend.

OGR OSM

The OGR plugin can read uncompressed OSM XML data and the more compact, but not human-readable, PBF format. File formats are auto detected when using the .osm or .pbf file extensions. When using files with other extensions, e.g. .xml for OSM XML, the driver parameter needs to be set to osm explicitly.

The OSM backend provides data in the following five layers:

points: Nodes that have significant tags attached.
lines: Ways that are recognized as non-area.
multilinestrings: Relations that define a multilinestring (type=multilinestring or type=route).
multipolygons: Ways that are recognized as areas and relations that form a polygon or multipolygon (e.g. type=multipolygon or type=boundary)
other_relations: Relations that are not in multilinestrings or multipolygons

Example 8. OGR OSM data source example

<Datasource>
  <Parameter name="type">ogr</Parameter>
  <Parameter name="driver">osm</Parameter>
  <Parameter name="file">ways.osm</Parameter>
  <Parameter name="layer">lines</Parameter>
</Datasource>
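A slightly more complete sketch that only draws ways carrying a highway tag. It assumes that the OSM backend exposes the highway tag as an attribute of the lines layer, which depends on the GDAL OSM driver configuration in use. The style and layer names are made up.

```xml
<Style name="roads">
  <Rule>
    <!-- assumption: highway is available as an attribute of the lines layer -->
    <Filter>not ([highway] = null or [highway] = '')</Filter>
    <LineSymbolizer stroke="grey" stroke-width="1"/>
  </Rule>
</Style>

<Layer name="roads">
  <StyleName>roads</StyleName>
  <Datasource>
    <Parameter name="type">ogr</Parameter>
    <Parameter name="driver">osm</Parameter>
    <Parameter name="file">ways.osm</Parameter>
    <Parameter name="layer">lines</Parameter>
  </Datasource>
</Layer>
```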

While rendering OSM data directly can work well enough for small amounts of data, the usually preferred way to present OSM data is to first import it into PostGIS using either the osm2pgsql or imposm import tool, and then to use the PostGIS data source. This requires some extra effort up front, but performs better on larger data sets, and allows for more sophisticated preprocessing of the OSM input data than the few fixed rules statically built into the OGR OSM backend.
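For comparison, a PostGIS based data source for imported OSM data could look roughly like the following sketch. The database name gis, the table name planet_osm_line and the geometry column way are assumptions based on a typical osm2pgsql setup and will differ for other import tools or configurations.

```xml
<Datasource>
  <Parameter name="type">postgis</Parameter>
  <!-- assumptions: local database "gis" with osm2pgsql style table names -->
  <Parameter name="dbname">gis</Parameter>
  <Parameter name="table">planet_osm_line</Parameter>
  <Parameter name="geometry_field">way</Parameter>
</Datasource>
```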

For more details, see the original GDAL documentation for the OSM backend.

PgRaster

Table 6. PgRaster data source parameters
Parameter	Type	Default	Description
`autodetect_key_field`
`band`
`clip_rasters`
`connect_timeout`
`cursor_size`
`dbname`
`estimate_extent`
`extent`
`extent_from_subquery`
`host`
`initial_size`
`intersect_max_scale`
`intersect_min_scale`
`key_field`
`password`
`persist_connection`
`port`
`prescale_rasters`
`raster_field`
`raster_table`
`row_limit`
`srid`
`table`
`use_overviews`
`user`

PostGIS

Table 7. PostGIS data source parameters
Parameter	Type	Default	Description
`Connection parameters`
`host`	string	none	PostgreSQL server host name or address
`port`	string	none	PostgreSQL server port
`user`	string	none	Database user
`password`	string	none	Database user password
`dbname`	string	none	Name of database to use
`connect_timeout`	int	4	Connect timeout in seconds
`persist_connection`	boolean	true	Reuse connection for subsequent queries
`Other parameters`
`autodetect_key_field`	boolean	false	Whether to auto detect primary key if none is given in `key_field`
`cursor_size`	int	0	Fetch this many features at a time, or all when zero.
`estimate_extent`	boolean	false	Try to estimate the extent from the data retrieved
`extent`	floatx4	none	Extent bounding box
`extent_from_subquery`	boolean	false
`geometry_field`	string	none	The result field that the geometry to process is in. Auto detected when not given.
`geometry_table`	string	none	Name of table geometry is retrieved from. Auto detected when not given, but this may fail for complex queries.
`initial_size`	int	1	initial connection pool size
`intersect_min_scale`	int	0
`intersect_max_scale`	int	0
`key_field`	string	none	Primary key field of table geometry is retrieved from. Auto detected when not given and `autodetect_key_field` is true.
`key_field_as_attribute`	boolean	true
`max_size`	int	10	Max. connection pool size
`max_async_connections`	int	1	Run that many queries in parallel, must be <= `max_size`
`row_limit`	int	0	Only return this many features if > 0
`simplify_dp_preserve`	boolean	false
`simplify_dp_ratio`	float	1/20
`simplify_geometries`	boolean	false
`simplify_prefilter`	float	0.0
`simplify_snap_ratio`	float	1/40
`srid`	int	0	SRID of returned features, auto detected when zero
`table`	string	none	Name of a table, or SQL query text
`twkb_rounding_adjustment`	float	0.0
`twkb_encoding`	boolean	false

Aside from the basic PostgreSQL connection parameters (host, port, user, password, dbname), you can also add additional connection parameter keywords to the host or dbname parameter (probably to the others, too, but I have not tested this yet) for more fine-grained connection control.

You can e.g. set a datasource specific application name with this:

<Parameter name='host'>localhost application_name=my_style</Parameter>

Or set a specific schema search path:

<Parameter name='host'>localhost options='-c search_path=foo,public'</Parameter>

Probably most important, though, this allows for using SSL/TLS. In its most basic form, you would just enforce the use of SSL/TLS encryption:

<Parameter name='host'>localhost sslmode=require</Parameter>
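Put into context, a complete data source using this technique might look like the following sketch. The host name, credentials, database and table names are placeholders, not values taken from a real setup.

```xml
<Datasource>
  <Parameter name="type">postgis</Parameter>
  <!-- extra connection keyword appended to the host parameter -->
  <Parameter name="host">db.example.org sslmode=require</Parameter>
  <Parameter name="port">5432</Parameter>
  <!-- placeholder credentials and names -->
  <Parameter name="user">mapuser</Parameter>
  <Parameter name="password">secret</Parameter>
  <Parameter name="dbname">gis</Parameter>
  <Parameter name="table">planet_osm_line</Parameter>
</Datasource>
```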

The PostGIS datasource supports two different methods to return data to Mapnik: in regular well known binary (WKB) or — with PostGIS v2.2 or later — tiny well known binary (TWKB) format. This is controlled by the twkb_encoding option.

When using TWKB, the twkb_rounding_adjustment parameter controls the resolution the TWKB aims for. A value of 0.5 would lead to a coarseness of about one pixel; the default of 0.0 would usually be in the range of 0.05 to 0.2 pixels. This is done by using the twkb_rounding_adjustment parameter to calculate the tolerance parameter for ST_Simplify() and ST_RemoveRepeatedPoints(), and the decimaldigits_xy parameter for ST_AsTWKB().
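A minimal sketch of a data source using TWKB with a coarseness of about one pixel. The database and table names are placeholders.

```xml
<Datasource>
  <Parameter name="type">postgis</Parameter>
  <!-- placeholder database and table names -->
  <Parameter name="dbname">gis</Parameter>
  <Parameter name="table">planet_osm_polygon</Parameter>
  <!-- requires PostGIS v2.2 or later -->
  <Parameter name="twkb_encoding">true</Parameter>
  <!-- about one pixel coarseness, see description above -->
  <Parameter name="twkb_rounding_adjustment">0.5</Parameter>
</Datasource>
```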

When using WKB (the default), simplification can be controlled via simplify_geometries, simplify_snap_ratio, simplify_dp_preserve, simplify_dp_ratio, simplify_prefilter, simplify_clip_resolution parameters. (TODO: describe in more detail)
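Until these parameters are described in more detail, the following sketch only shows how simplification could be switched on while spelling out the default ratios from the parameter table explicitly (1/20 = 0.05 and 1/40 = 0.025). The database and table names are placeholders.

```xml
<Datasource>
  <Parameter name="type">postgis</Parameter>
  <!-- placeholder database and table names -->
  <Parameter name="dbname">gis</Parameter>
  <Parameter name="table">planet_osm_polygon</Parameter>
  <!-- WKB is the default, so twkb_encoding is not set here -->
  <Parameter name="simplify_geometries">true</Parameter>
  <!-- same values as the documented defaults, given only for illustration -->
  <Parameter name="simplify_dp_ratio">0.05</Parameter>
  <Parameter name="simplify_snap_ratio">0.025</Parameter>
</Datasource>
```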

simplify_clip_resolution is used for both formats, and controls at what map scale geometries start getting clipped to the rendering window when non-zero.

The following special tokens can be used in SQL queries, and will be replaced by the actual Mapnik values for the current render request:

!bbox!: the map bounding box
!scale_denominator!: the current scale denominator
!pixel_width!, !pixel_height!: width and height of pixels (TODO: depends on SRS, is ° with latlon and meters with google mercator?)
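The tokens listed above are typically used in a subquery given as the table parameter. The following is a sketch only: table and column names are placeholders, and the scale threshold is an arbitrary example value. Note that & and < need to be written as XML entities inside the style file.

```xml
<Datasource>
  <Parameter name="type">postgis</Parameter>
  <!-- placeholder database, table and column names -->
  <Parameter name="dbname">gis</Parameter>
  <Parameter name="table">
    (SELECT way, name
       FROM planet_osm_line
      WHERE way &amp;&amp; !bbox!
        AND !scale_denominator! &lt; 100000
    ) AS data
  </Parameter>
  <Parameter name="geometry_field">way</Parameter>
</Datasource>
```

Here the query only returns rows that overlap the current map bounding box, and returns nothing at all when the map is zoomed out beyond the given scale.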

Raster

Table 8. Raster data source parameters
Parameter	Type	Default	Description
`base`	string	none	name of a `<FileSource>` to find the input file in
`extent`
`file`
`format`
`hix`
`hiy`
`lox`
`loy`
`multi`
`tile_size`
`tile_stride`
`x_width`
`y_width`

Shape

The shape input plugin can read the ESRI shapefile format. The OGR plugin also supports shapefiles, but the shape plugin has more direct support for this. It is also better maintained and tested.

Shapefiles are often used instead of databases for data that doesn’t change that often, or where data available in a database requires some preprocessing. Common examples are boundaries, coastlines, and elevation contour lines.

OpenStreetMap, for example, provides land polygons, water polygons, coastlines, and antarctic ice sheet polygons and outlines as regularly updated shapefiles on the OsmData Download Server. Due to the way large bodies of land and water are constructed by grouping individual coast line segments into polygon relations in OSM, there’s always a risk of such lines not really being closed polygons. The OSM shapefiles are generated by extracting and aggregating the line segment data, and are only published when containing no unclosed polygons.

Another often used source of shapefiles is Natural Earth, which provides public domain geo data for lots of physical and cultural features.

Shapefile processing performance can be increased by creating an index file using the [shapeindex] tool that is included in the Mapnik source code, and usually also in binary distribution packages.

Table 9. Shape data source parameters
Parameter	Type	Default	Description
`file`	file path	none	shapefile path, `.shp` extension is optional
`base`	string	none	name of a `<FileSource>` to find the input file in
`encoding`	string	utf-8	encoding used for text fields in the shapefile
`row_limit`	int	none	maximum number of rows to process

Example 9. Shape data source example

<?xml version="1.0" encoding="utf-8"?>
<Map background-color='blue' srs='epsg:4326'>

  <Style name="countries">
    <Rule>
      <PolygonSymbolizer fill="green"/>
    </Rule>
  </Style>

  <Layer name="countries">
    <StyleName>countries</StyleName>
    <Datasource>
      <Parameter name="type">shape</Parameter>
      <Parameter name="file">data/ne_110m_admin_0_countries.shp</Parameter>
    </Datasource>
  </Layer>

  <Style name="gridlines">
    <Rule>
      <LineSymbolizer stroke="black" stroke-width="0.1"/>
    </Rule>
  </Style>

  <Layer name="grid">
    <StyleName>gridlines</StyleName>
    <Datasource>
      <Parameter name="type">shape</Parameter>
      <Parameter name="file">data/ne_50m_graticules_10.shp</Parameter>
    </Datasource>
  </Layer>

</Map>

SQLite

Table 10. SQLite data source parameters
Parameter	Type	Default	Description
`attachdb`
`auto_index`
`base`	string	none	name of a `<FileSource>` to find the input file in
`encoding`
`extent`
`fields`
`file`
`geometry_field`
`geometry_table`
`index_table`
`initdb`
`key_field`
`metadata`
`row_limit`
`row_offset`
`table`
`table_by_index`
`use_spatial_index`
`wkb_format`

TopoJSON

Table 11. TopoJSON data source parameters
Parameter	Type	Default	Description
`base`	string	none	name of a `<FileSource>` to find the input file in
`encoding`
`file`
`inline`