11 Data Sets
11.1 Madrid AirBnb
This dataset contains a pre-processed set of properties advertised on the AirBnb website within the region of Madrid (Spain), together with house characteristics.
Availability
The dataset is stored on a Geopackage that can be found, within the structure of this project, under:
path <- "data/assignment_1_madrid/madrid_abb.gpkg"
db <- st_read(path)
Reading layer `madrid_abb' from data source
`/Users/franciscorowe/Dropbox/Francisco/uol/teaching/envs453/202324/san/data/assignment_1_madrid/madrid_abb.gpkg'
using driver `GPKG'
Simple feature collection with 18399 features and 16 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -3.86391 ymin: 40.33243 xmax: -3.556 ymax: 40.56274
Geodetic CRS: WGS 84
Variables
For each of the 17 properties, the following characteristics are available:
-
price
: [string] Price with currency -
price_usd
: [int] Price expressed in USD -
log1pm_price_usd
: [float] Log of the price -
accommodates
: [integer] Number of people the property accommodates -
bathrooms
: [integer] Number of bathrooms -
bedrooms
: [integer] Number of bedrooms -
beds
: [integer] Number of beds -
neighbourhood
: [string] Name of the neighbourhood the property is located in -
room_type
: [string] Type of room offered (shared, private, entire home, hotel room) -
property_type
: [string] Type of property advertised (apartment, house, hut, etc.) -
WiFi
: [binary] Takes1
if the property has WiFi,0
otherwise -
Coffee
: [binary] Takes1
if the property has a coffee maker,0
otherwise -
Gym
: [binary] Takes1
if the property has access to a gym,0
otherwise -
Parking
: [binary] Takes1
if the property offers parking,0
otherwise -
km_to_retiro
: [float] Euclidean distance from the property to the El Retiro park -
geom
: [geometry] Point geometry
Projection
The location of each property is stored as point geometries and expressed in longitude and latitude coordinates:
st_crs(db)
Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
ENSEMBLE["World Geodetic System 1984 ensemble",
MEMBER["World Geodetic System 1984 (Transit)"],
MEMBER["World Geodetic System 1984 (G730)"],
MEMBER["World Geodetic System 1984 (G873)"],
MEMBER["World Geodetic System 1984 (G1150)"],
MEMBER["World Geodetic System 1984 (G1674)"],
MEMBER["World Geodetic System 1984 (G1762)"],
MEMBER["World Geodetic System 1984 (G2139)"],
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ENSEMBLEACCURACY[2.0]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
USAGE[
SCOPE["Horizontal component of 3D system."],
AREA["World."],
BBOX[-90,-180,90,180]],
ID["EPSG",4326]]
Source & Pre-processing
The data are sourced from Inside Airbnb. A Jupyter notebook in Python (available at data/assignment_1_madrid/clean_data.ipynb
) details the process from the original file available from source to the data in madrid_abb.gpkg
.
11.2 England COVID-19
This dataset contains:
daily COVID-19 confirmed cases from 1st January, 2020 to 2nd February, 2021 from the GOV.UK dashboard;
resident population characteristics from the 2011 census, available from the Office of National Statistics; and,
2019 Index of Multiple Deprivation (IMD) data from GOV.UK and published by the Ministry of Housing, Communities & Local Government.
The data are at the Upper Tier Local Authority District (UTLAD) level - also known as Counties and Unitary Authorities.
Availability
The dataset is stored on a Geopackage:
sdf <- st_read("data/assignment_2_covid/covid19_eng.gpkg")
Reading layer `covid19_eng' from data source
`/Users/franciscorowe/Dropbox/Francisco/uol/teaching/envs453/202324/san/data/assignment_2_covid/covid19_eng.gpkg'
using driver `GPKG'
Simple feature collection with 149 features and 507 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 134112.4 ymin: 11429.67 xmax: 655653.8 ymax: 657536
Projected CRS: OSGB36 / British National Grid
Variables
The data set contains 508 variables:
-
objectid
: [integer] unit identifier -
ctyua19cd
: [integer] Upper Tier Local Authority District (or Counties and Unitary Authorities) identifier -
ctyua19nm
: [character] Upper Tier Local Authority District (or Counties and Unitary Authorities) name -
Region
: [character] Region name -
long
: [numeric] longitude -
lat
: [numeric] latitude -
st_areasha
: [numeric] area in hectare -
X2020.01.31
toX2021.02.05
: [numeric] Daily COVID-19 cases from 31st January, 2020 to 5th February, 2021 -
IMD...Average.score
-IMD.2019...Local.concentration
: [numeric] IMD indicators - for details see File 11: upper-tier local authority summaries. -
Residents
: [numeric] Total resident population -
Households
: [numeric] Total households -
Dwellings
: [numeric] Total dwellings -
Household_Spaces
: [numeric] Total household spaces -
Aged_16plus
toOther_industry
: [numeric] comprise 114 variables relating to various population and household attributes of the resident population. A description of all these variables can be found here -
geom
: [geometry] Point geometry
Projection
Details of the coordinate reference system:
st_crs(sdf)
Coordinate Reference System:
User input: OSGB36 / British National Grid
wkt:
PROJCRS["OSGB36 / British National Grid",
BASEGEOGCRS["OSGB36",
DATUM["Ordnance Survey of Great Britain 1936",
ELLIPSOID["Airy 1830",6377563.396,299.3249646,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4277]],
CONVERSION["British National Grid",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",49,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",-2,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.9996012717,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",400000,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",-100000,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Engineering survey, topographic mapping."],
AREA["United Kingdom (UK) - offshore to boundary of UKCS within 49°45'N to 61°N and 9°W to 2°E; onshore Great Britain (England, Wales and Scotland). Isle of Man onshore."],
BBOX[49.75,-9,61.01,2.01]],
ID["EPSG",27700]]