Working with GIS Data in Python: Part II — Projections, Buffers, and Maps You Can Click On

Mic
Jun 16
6 min read

A basic problem you might run into with GIS is different types of measurements. For example if you look at geographical coordinates in terms of degrees, you run into the problem that 0.005 degrees is not 500 metres in most places.

Even worth, this will catch up to you at the worst possible time, after the workflow is done and the final map prepared, since both measurements make sense and so buffer zones will look plausible. Still, they will be wrong.

In the previous post, we covered the basics of vector data: GeoPandas, Shapely, and the geometry-as-a-column data structure. In this part we a bit deeper: coordinate reference systems, projections, meaningful spatial analysis, and interactive maps with Folium and OSMnx.

This does sum up everything in this post rather well, well done ChatGPT.

The Coordinate Problem

We come to the first main issue, coordinates. In geographic coordinates, latitude and longitude, we measure angles. But herein lies the problem. One degree of longitude near the equator is roughly 111 kilometres, but one degree of longitude near the poles is nearly zero. The Earth is a sphere, or at least close enough to one, and coordinates are angular measurements on that sphere.

If you buffer a point by 0.01 degrees and call it a 1km radius, you are roughly right in Sydney. You are wrong in Helsinki. You are very wrong in Svalbard.

The solution is to project your data or in other words, to convert from angular coordinates on a sphere to flat coordinates in metres on a plane. The problem with this is simple, every projection distorts something (area, shape, distance, direction), and the right choice depends on where you are and what you are measuring.

The classic illustration: Greenland and Africa.

This creates two images, the first the 'standard' projection.

Using the 'standard' longitude/latitude coordinate system.

As one can see, Greenland looks pretty big in this projection, between a third and half the size of Africa. But if we look at the second type of projection we get the following image.

This second one is using the 'ESRI:54009' CRS, here CRS stands for coordinate reference system and 'ESRI:54009' is the identifier. The 'standard' option is 'EPSG:4326' and also called 'WGS84 longitude/latitude'. We will not go into details here to specify what the numbers and letter combinations in the identifiers mean.

What is important for us is that the second projection shows the following simple fact much better: Africa is 14 times larger than Greenland. If we use a projection commonly used by web maps, Mercator or 'EPSG:3857' for its identifier, it gets even worse.

Mercator preserves the shape the best, but it distorts size extremely

Now Greenland looks bigger than Africa.

PyProj — Working with Coordinate Systems

GeoPandas does not make coordinate system changes itself, it uses PyProj under the hood for all CRS operations. Most of the time you interact with it through to_crs(). But similarly as with Shapely in the previous part, knowing what PyProj can do directly is useful.

The setup is quite straight forward, you initialize two coordinate systems, create a transformer and then transform one into the other. Of note here is the paramater always_xy=True. In case of a CRS that uses latitude and longitude, the value for latitude often comes before longitude, bascially Y before X, which is geographically the correct usage but counterintuitive. The flag always_xy=True forces longitude-first, matching how most people think about coordinates, even if this is not always correct in a geographical sense.

Buffers Done Right

Now we come back to the opening problem.

This will produce as its first image, an area around each of the sample cities that is a 10 degree radius.

Using degrees for the buffer area. It looks good.

This looks good at first, but then we remember that the projection is not faithful with respect to size, so these should not have the same size. The package will usually give you a warning for this use of a buffer as well, as it is most likely not intended. The correct ones with a buffer given in kilometers can be seen in the second generated image.

The further away from the equator the more distorted this looks.

Now the areas are not even circles anymore if we are far enough away from the equator to see it. This also shows how distorted the projection is, far away from the equator.

The workflow is always: project to a system using meters → operate in metres → re-project back if needed for display.

OSMnx — The City as a Graph

OpenStreetMap is the world’s largest open geographic database. OSMnx is the library that makes it useful from Python.

It downloads street networks, buildings, and points of interest from OpenStreetMap and gives you back NetworkX graphs and GeoDataFrames. The main use case is urban network analysis: routing, walkability, isochrones.

This example takes a point in Amsterdam and asks for the walkable routes in a 2km radius. The printed parts at this point will look something like this

Nodes (intersections): 6536
Edges (street segments): 18934
Total street length: 397.1 km
Average street segment: 42 m

Giving you an idea of how many intersections there are in the area, how many independent street segments and how long they are. For the image we get the following.

All the walkable areas within 2km of Amsterdam Dam Square.

The canal ring structure of Amsterdam shows up immediately. That is not drawn, it is the actual OSM road network rendered as a graph.

Shortest path between two points:

This will produce a path tracing the shortest walking route between two locations on our map from above

A simple red line tracing the shortest walking route.

Note that it is not obvious from the example, but you will need to import scikit-learn for this, as you are searching in a non-embedded graph.

Isochrone — how far can you walk in 10 minutes?

We load the graph as before, then we add the average travel time to the edges, i.e. to the street segments. We use the central station as our origin node. When it comes to the plot, we start by plotting all the edges, i.e. the street network. Then we use Networkx package to get a subgraph that tells us how far we can go in the corresponding time. Then create a polygon out that and plot it. The result looks something along this line, after some additional cleanup.

The shaded areas tells us how far we can walk from Central Station in 5, 10, and 15 minutes

The isochrone is not a circle, it follows the actual street network and is blocked by canals and constrained by bridge locations. That is the real answer to “how far can I walk in 10 minutes from here.”

When not to use OSMnx: Large cities with dense networks take a while to download, in those cases you probably want to cache them with ox.save_graphml(). To be fair, this might be a good idea in general. OSMnx is also specifically a street network tool. For general OSM feature queries (buildings, parks, water bodies), use ox.features_from_place(), the function rather than graph-based API.

Folium — Interactive Maps

Static matplotlib plots are for analysis, the package Folium is for when someone else needs to use the map.

Folium wraps Leaflet.js, the library behind most interactive web maps, with a Python API. The output is an HTML file that opens in any browser. Tooltips on hover, popups on click, zoom and pan built in.

The result is a zoomable, hoverable world map. Mouse over any country and you see its name, GDP per capita, and population. The result would look something like this.

Browser screenshot from a zoomed in view.

There is no need for any additional JavaScript, but of course this does come with some limitations as you are using a pre-defined wrapper and do not have full control over everything.

Adding markers:

This will add markers to the different cities in our list with a radius scaled by the population. A part of it would look as follows.

The markers are proportional to the population of the city.

When not to use Folium: The package is clearly for display, not analysis. Folium does not do spatial joins, projections, or geometry operations. It is really a visualization tool, think of it as an interactive version of matplotlib rather than a different geographic tool.

The Workflow

Together with the previous part we thus arrive at the following pipeline.

gpd.read_file()           # Load vector data
  → .to_crs(epsg=...)     # Project to metric CRS
    → .buffer(500)        # Buffer in metres
      → gpd.sjoin(...)    # Spatial join
        → .to_crs(4326)   # Re-project for display
          → folium.Map()  # Interactive map

The projection step is really important in here, as it makes sure that your usual metric measurements are correctly represented. It is also the one that is easily forgotten, in which case the results tend to look nicer, even though imprecise or flat out wrong.

In the next part we want to look at raster data. The opposite of shapes or polygonal data. Instead of describing how our shapes look like, it describes values on a grid.