## check if those are installed
pip install geoviews
8 GPS Traces and Animations
The Lecture slides can be found here.
This lab’s notebook can be downloaded from here.
import pandas as pd, geopandas as gpd
import movingpandas as mpd
import os
from shapely.geometry import Point, LineString, Polygon
from matplotlib import pyplot as plt
import movingpandas as mpd
import folium
import requests
from io import BytesIO
import warnings
'ignore') warnings.filterwarnings(
8.1 Getting GPS trajectories from OpenStreetMap Data
The OSM API provides access to GPS traces contributed by the OpenStreetMap community. Users can query and retrieve traces based on parameters like geographical bounding box, time range, and user ID. The API supports output formats such as GPX for easy integration into applications or analysis. You can have a look at the last GPS traces uploaded here. Each trace entry likely includes:
- Trace ID (unique identifier for the trace)
- Trace Name (optional name provided by the user)
- Uploader Username
- Upload Date
- Trace Summary (e.g., distance, duration)
It’s not possible to download traces from OSM using OSMnx
. One has to script their own code
Parts of this section of the notebook have been readapted from this notebook created by Anita Graser.
8.1.1 Downaloding GPS traces from the OpenStreetMap
We can try to get OSM traces around the Uni of Liverpool Campus. Feel free to change the area.
import osmnx as ox
# Define the place name
= "University of Liverpool, UK"
place_name
# Get the latitude and longitude
= ox.geocoder.geocode(place_name)
latitude, longitude
# Create the bbox
= ox.utils_geo.bbox_from_point((latitude, longitude), dist = 1500)
bbox bbox
(53.420732655032396,
53.39375304496761,
-2.9432044446907044,
-2.988462877347038)
We need the bbox_to_osm_format
function below to convert our bounding box coordinates into the format required by the OpenStreetMap (OSM) API. The input tuple contains coordinates in the order (north, south, east, west). The function rearranges these values into the string “west,south,east,north”, which is the format expected by OSM for API queries.
def bbox_to_osm_format(bbox):
"""
Convert bounding box coordinates to OSM API format.
Parameters
----------
bbox: tuple
A tuple containing the bounding box coordinates in the order (north, south, east, west).
Returns
-------
bbox_str: str
A string representing the bounding box in the format "west,south,east,north".
"""
= bbox
north, south, east, west = f"{west},{south},{east},{north}"
bbox return bbox
= bbox_to_osm_format(bbox) # needs to be {west},{south},{east},{north} for OSM Api
bbox bbox
'-2.988462877347038,53.39375304496761,-2.9432044446907044,53.420732655032396'
The get_osm_traces
function below retrieves GPS traces from OpenStreetMap within a specified bounding box, processing up to a user-defined maximum number of pages. It uses a while loop to fetch and parse GPS data into GeoDataFrames, querying the OSM API by incrementally updating the page number until no more data is available or the maximum page limit is reached.
Upon fetching the data, the function concatenates these individual GeoDataFrames into a single comprehensive GeoDataFrame. Before returning the final GeoDataFrame, it cleans the dataset by dropping a predefined list of potentially irrelevant or empty columns.
def get_osm_traces(max_pages = 2, bbox='16.18, 48.09, 16.61, 48.32'):
"""
Retrieve OpenStreetMap GPS traces within a specified bounding box.
Parameters
----------
max_pages: int, optional
The maximum number of pages to retrieve. Defaults to 2.
bbox: str, optional
The bounding box coordinates in the format 'west, south, east, north'. Defaults to '16.18, 48.09, 16.61, 48.32'.
Returns
-------
final_gdf: GeoDataFrame
A GeoDataFrame containing the retrieved GPS traces.
"""
= []
all_data = 0
page
while (True) and (page <= max_pages):
# Constructing the URL to query OpenStreetMap API for GPS trackpoints within a specified bounding box and page number
= f'https://api.openstreetmap.org/api/0.6/trackpoints?bbox={bbox}&page={page}'
url
# Sending a GET request to the constructed URL
= requests.get(url)
response
# Checking if the response status code is not 200 (indicating success) or if the response content is empty
# If either condition is met, the loop breaks, indicating no more data to retrieve
if response.status_code != 200 or not response.content:
break
# Reading the content of the response, which contains GPS trackpoints data, into a GeoDataFrame
# The 'layer' parameter specifies the layer within the GeoDataFrame where trackpoints will be stored
= gpd.read_file(BytesIO(response.content), layer='track_points')
gdf
if gdf.empty:
break
all_data.append(gdf)+= 1
page
# Concatenate all GeoDataFrames into one
= gpd.GeoDataFrame(pd.concat(all_data, ignore_index=True))
final_gdf
# dropping empty columns
= ['ele', 'course', 'speed', 'magvar', 'geoidheight', 'name', 'cmt', 'desc',
columns_to_drop 'src', 'url', 'urlname', 'sym', 'type', 'fix', 'sat', 'hdop', 'vdop',
'pdop', 'ageofdgpsdata', 'dgpsid']
= final_gdf.drop(columns=[col for col in columns_to_drop if col in final_gdf.columns])
final_gdf return final_gdf
We initially download 2 pages of GSP traces. We can try with higher numbers to get more data. More recent traces are downloaded first.
= 2 # pages of data,
max_pages = get_osm_traces(max_pages, bbox) gps_track_points
gps_track_points.head()
track_fid | track_seg_id | track_seg_point_id | time | geometry | |
---|---|---|---|---|---|
0 | 0 | 0 | 0 | 2023-09-24 10:41:39+00:00 | POINT (-2.98825 53.40690) |
1 | 0 | 0 | 1 | 2023-09-24 10:41:46+00:00 | POINT (-2.98793 53.40708) |
2 | 0 | 0 | 2 | 2023-09-24 10:41:47+00:00 | POINT (-2.98408 53.40931) |
3 | 0 | 0 | 3 | 2023-09-24 10:41:49+00:00 | POINT (-2.98391 53.40927) |
4 | 0 | 0 | 4 | 2023-09-24 10:41:50+00:00 | POINT (-2.98378 53.40933) |
= 0.5) # still are trackpoints still gps_track_points.plot(markersize
8.1.2 Transform the Point
GeoDataFrame in a LineString
GeoDataFrame with MovingPandas
MovingPandas
is a Python library designed for analyzing movement data using Pandas
. It extends Pandas
’ capabilities to handle temporal and spatial data jointly. It is the ideal library for trajectory data analysis, visualization, and exploration. MovingPandas provides functionalities for processing and analyzing movement data, including trajectory segmentation, trajectory aggregation, and trajectory generalization.
See examples and documentation at https://movingpandas.org
The following code creates a TrajectoryCollection
object by providing the GPS track points data, specifying the column that identifies trajectories, and indicating the column containing timestamps.
= mpd.TrajectoryCollection(gps_track_points, 'track_fid', t='time')
osm_traces print(f'The OSM traces download contains {len(osm_traces)} tracks')
The OSM traces download contains 14 tracks
for track in osm_traces:
print(f'Track {track.id}: length={track.get_length(units="km"):.2f} km')
Track 0: length=9.96 km
Track 1: length=3.02 km
Track 2: length=2.84 km
Track 3: length=2.72 km
Track 4: length=2.92 km
Track 5: length=3.29 km
Track 6: length=4.95 km
Track 7: length=4.12 km
Track 8: length=4.87 km
Track 9: length=1.21 km
Track 10: length=0.03 km
Track 11: length=0.00 km
Track 12: length=0.70 km
Track 13: length=0.36 km
Below we set up some visualization options for holoviews
. Holoviews is a library for creating interactive visualizations of complex datasets with ease. Holoviews provides a declarative approach to building visualizations, allowing you to express your visualization intent concisely. Holoviews combines the usage of Matplotlib, Bokeh, and Plotly and supports creating interactive visualizations. You can easily add interactive elements like hover tooltips, zooming, panning, and selection widgets to your plots. Compared to folium
, however, it does not allow for as much freedom and development outside the box. Folium is primarily designed for creating web-based maps with features like tile layers and GeoJSON overlays, offering a high degree of customization for map-based visualizations. Holoviews, on the other hand, focuses more on general-purpose interactive visualizations beyond just maps, offering a broader range of visualization options.
See examples and documentation at https://holoviews.org.
from holoviews import opts, dim
import hvplot.pandas
= {'linewidth':5, 'capstyle':'round', 'figsize':(9,3), 'legend':True}
plot_defaults =['wheel_zoom'], frame_width=500, frame_height=400))
opts.defaults(opts.Overlay(active_tools= {'tiles':None, 'cmap':'Viridis', 'colorbar':True} hvplot_defaults
Traces generalisation
The MinTimeDeltaGeneralizer
class from MovingPandas is applied to the osm_traces
object. This class is responsible for simplifying or generalizing the GPS traces based on a minimum time delta. In this case, traces are being generalized to a tolerance of one minute.
from datetime import datetime, timedelta
= mpd.MinTimeDeltaGeneralizer(osm_traces).generalize(tolerance=timedelta(minutes=1))
osm_traces ='OSM Traces', line_width=3, width=700, height=400) osm_traces.hvplot(title
Visualising by speed
0).add_speed(overwrite=True, units=("km","h"))
osm_traces.get_trajectory(0).hvplot(
osm_traces.get_trajectory(='Speed (km/h) along track', c='speed', cmap='RdYlBu',
title=3, width=700, height=400, tiles='CartoLight', colorbar=True) line_width
One can also convert the TrajectoryCollection
(MovingPandas’ object) to a LineString GeoDataFrame
. This would allow us to take advantage of what we learnt across the previous sessions. For example, we can plot through GeoPandas
and matplotlib
after having projected the layer.
= 'EPSG:27700'
crs = osm_traces.to_traj_gdf()
gps_tracks = gps_tracks.set_crs("EPSG:4326").to_crs(crs) gps_tracks
As you can see, from the map above, OSM traces may include also odd traces that jump from different locations of the city.
Exercise: Explore the gps_tracks
dataset and try to compute the average speed of each of the tracks. Try to understand, also using the gps_track_points
(do not forget to project it, in case), if you can find a way to filter out traces that follow very odd routes.
8.2 Animations in Folium
8.2.1 Animating GPS tracks
from folium.plugins import TimestampedGeoJson
Download this file in your data folder.
= gpd.read_file('../data/geolife_small.gpkg')
gdf 't'] = pd.to_datetime(gdf['t'])
gdf[= gdf.set_index('t').tz_localize(None)
gdf = None
gdf.index.name = 'trajectory_id', markersize = 1, cmap = 'Set1') gdf.plot(column
gdf['t'] = pd.to_datetime(gdf['t'])
: This converts the values in the ‘t’ column ofgdf
into datetime objects usingpd.to_datetime()
. This is useful for time-series data where ‘t’ likely represents time information.gdf = df.set_index('t').tz_localize(None)
: This sets the index to the columnt
and removes the timezone information usingtz_localize(None)
def geo_df_to_timestamped_geojson(geo_df, colour="blue", marker_type="LineString"):
"""
Convert a GeoDataFrame to a list of timestamped GeoJSON features.
Parameters
----------
geo_df: GeoDataFrame
The GeoDataFrame containing the geographic data.
colour:str
The color of the line or marker. Defaults to "blue".
marker_type: str
The type of marker to use, e.g., "LineString" or "Point". Defaults to "LineString".
Returns
-------
feature_list: list
A list of GeoJSON features, each representing a timestamped marker or line.
"""
= geo_df.copy()
df = df["geometry"].x.to_list()
x_coords = df["geometry"].y.to_list()
y_coords = [[x, y] for x, y in zip(x_coords, y_coords)]
coordinate_list = [i.isoformat() for i in df.index.to_list()]
times_list
= [
feature_list
{"type": "Feature",
"geometry": {
"type": marker_type,
"coordinates": coordinate_list,
},"properties": {
"times": times_list,
"style": {
"color": colour,
"weight": 3,
},
},
}
]
return feature_list
Obtaining features from the GeoDataFrame
so that we can plot them in folium
directly.
= geo_df_to_timestamped_geojson(gdf, colour="blue", marker_type="LineString") features
This TimestampedGeoJson
plugin allows us to draw data on an interactive folium map and animate it depending on a time variable (year, month, day, hour, etc) and supports any kind of geometry (points, line strings, etc). The only thing you have to do is to make sure to “feed” it with a geojson in the right format.
= gdf.unary_union.centroid
xy map = folium.Map(location=[xy.y, xy.x], zoom_start=10)
"type": "FeatureCollection", "features": features},
TimestampedGeoJson({="PT1S",
period=True,
add_last_point=10
transition_timemap)
).add_to(
map
The arguments of the TimestampedGeoJson
plugin are:
- The geojson with the data.
period
: It is the time step that the animation will take starting from the first value. For example: “P1M” 1/month, “P1D” 1/day, “PT1H” 1/hour, and ‘PT1M’ 1/minute.duration
: Period of time which the features will be shown on the map after their time has passed. If None, all previous times will be shown.
8.2.2 Other Animations with Folium
This part of the notebook has been readapted from this post.
We are going to use the data of New York’s Citi Bike-sharing service. This is publicly available and can be downloaded from here. In this notebook, we used the data for the month of March 2024. Place it in your data folder, you can download it from here.
# Import the file as a Pandas DataFrame
= pd.read_csv("../data/NY_march_bikeData.csv")
nyc nyc.head()
ride_id | rideable_type | started_at | ended_at | start_station_name | start_station_id | end_station_name | end_station_id | start_lat | start_lng | end_lat | end_lng | member_casual | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 9F5EEFB7CF78C6EA | electric_bike | 2024-03-21 13:32:42 | 2024-03-21 13:35:42 | Stevens - River Ter & 6 St | HB602 | City Hall - Washington St & 1 St | HB105 | 40.743143 | -74.026907 | 40.737360 | -74.030970 | member |
1 | 01E2E1F218F9D303 | electric_bike | 2024-03-16 08:26:15 | 2024-03-16 08:47:53 | Pershing Field | JC024 | Pershing Field | JC024 | 40.742379 | -74.051996 | 40.742677 | -74.051789 | member |
2 | C2914D662B33AA55 | electric_bike | 2024-03-15 15:17:36 | 2024-03-15 15:32:12 | Pershing Field | JC024 | Pershing Field | JC024 | 40.742480 | -74.051942 | 40.742677 | -74.051789 | member |
3 | 6723F7CE51888493 | electric_bike | 2024-03-13 08:34:26 | 2024-03-13 08:50:08 | Pershing Field | JC024 | Pershing Field | JC024 | 40.742400 | -74.051963 | 40.742677 | -74.051789 | member |
4 | EAB7E29AAB546941 | electric_bike | 2024-03-15 15:37:58 | 2024-03-15 16:19:32 | Pershing Field | JC024 | Pershing Field | JC024 | 40.742459 | -74.051973 | 40.742677 | -74.051789 | member |
For every trip made we have:
started_at
: start date and time of the trip.ended_at
: end date and time the trip.start_station_id
,start_station_name
,start_lat
, andstart_lng
: all the useful info to identify and locate the start station.end_station_id
,end_station_name
,end_lat
, andend_lng
: all the useful info to identify and locate the end station of the trip.
type(nyc.loc[0, 'started_at'])
str
Transformation
The variables started_at
and ended_at
were imported as text (string). we are going to do the following:
- Change the type of these variables to datetime.
- Compute the duration of the trip in seconds.
- Define
started_at
as the data frame’s index. This will allow us to pivot (group) values easier by time of day. - Create a new column called
ty
to help us in the pivoting.
# Setting the right format for started_at and ended_at
'started_at'] = pd.to_datetime(nyc['started_at'])
nyc['ended_at'] = pd.to_datetime(nyc['ended_at'])
nyc['duration'] = (nyc['ended_at'] - nyc['started_at']).dt.total_seconds()
nyc[
# Define the startime as index and create a new column
= nyc.set_index('started_at')
nyc 'ty'] = 'station'
nyc[1) nyc.head(
ride_id | rideable_type | ended_at | start_station_name | start_station_id | end_station_name | end_station_id | start_lat | start_lng | end_lat | end_lng | member_casual | duration | ty | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
started_at | ||||||||||||||
2024-03-21 13:32:42 | 9F5EEFB7CF78C6EA | electric_bike | 2024-03-21 13:35:42 | Stevens - River Ter & 6 St | HB602 | City Hall - Washington St & 1 St | HB105 | 40.743143 | -74.026907 | 40.73736 | -74.03097 | member | 180.0 | station |
Now, we need to feed TimestampedGeoJson plugin with the right data format. To do so, we will transform the data frame. First, we count the number of trips per hour that start at each station
# Aggregate number of trips for each start station by hour of the day
= nyc.pivot_table('duration',
start = ['start_station_id',
index 'start_lat',
'start_lng',
nyc.index.hour],= 'ty',
columns ='count').reset_index()
aggfunc
start.head()
ty | start_station_id | start_lat | start_lng | started_at | station |
---|---|---|---|---|---|
0 | 6786.02 | 40.714002 | -74.093255 | 12 | 1 |
1 | 6839.10 | 40.714002 | -74.093255 | 17 | 1 |
2 | 6948.10 | 40.714002 | -74.093255 | 14 | 1 |
3 | 6955.05 | 40.714002 | -74.093255 | 12 | 1 |
4 | 7141.07 | 40.714002 | -74.093255 | 16 | 1 |
In the code above we use the pandas pivot_table()
to group the data. Note that the last index we use is “nyc.index.hour”. This takes the index of the data frame and, since it is of the datetime format, we can get the hour of that value like shown above. Similarly, we could get the day or month. Thus, to get a daily average we will divide by the numbers of days.
= nyc.index.day.max()
days 'station'] = start['station']/days start[
Now, we will change the name of the columns and define the color we want the points to have on the map.
# Rename the columns
= ['station_id', 'lat', 'lon', 'hour', 'count']
start.columns
# Define the color
'fillColor'] = '#53c688'
start[
# The stops where less than one daily trip
# will have a different color
'count']<1, 'fillColor'] = '#586065'
start.loc[start[1) start.head(
station_id | lat | lon | hour | count | fillColor | |
---|---|---|---|---|---|---|
0 | 6786.02 | 40.714002 | -74.093255 | 12 | 0.032258 | #586065 |
After transforming the data frame to the format we need, we have to define a function that takes the values from it and creates the geojson with the right properties to use in the plugin.
def create_geojson_features(df):
= []
features
for _, row in df.iterrows():
= {
feature 'type': 'Feature',
'geometry': {
'type':'Point',
'coordinates':[row['lon'],row['lat']]
},'properties': {
'time': pd.to_datetime(row['hour'], unit='h').__str__(),
'style': {'color' : ''},
'icon': 'circle',
'iconstyle':{
'fillColor': row['fillColor'],
'fillOpacity': 0.8,
'stroke': 'true',
'radius': row['count'] + 5
}
}
}
features.append(feature)
return features
The function:
- Takes a data frame and iterates through its rows.
- Creates a feature and defines the geometry as a point, taking the lat and lon variables from our data frame.
- Defines the rest of the properties taking the other variables:
- Takes the hour variable to create a time property. ** This one is the most important step to animate our data. **
- Takes
fillColor
to create the fillColor property. - Defines the radius of the point as a function of the count variable.
Once the function is defined, we can use it on our data frame and get the geojson.
= create_geojson_features(start)
start_geojson 0] start_geojson[
{'type': 'Feature',
'geometry': {'type': 'Point',
'coordinates': [-74.0932553, 40.714001716666665]},
'properties': {'time': '1970-01-01 12:00:00',
'style': {'color': ''},
'icon': 'circle',
'iconstyle': {'fillColor': '#586065',
'fillOpacity': 0.8,
'stroke': 'true',
'radius': 5.032258064516129}}}
With this, we can now create the animated map.
= folium.Map(location = [40.71958611647166, -74.0431174635887],
nyc_map = "CartoDB Positron",
tiles = 14)
zoom_start
TimestampedGeoJson(start_geojson,= 'PT1H',
period = 'PT1M',
duration = 1000,
transition_time = True).add_to(nyc_map)
auto_play
nyc_map