1 Overview

Access to all materials, including lecture notes, computational notebooks and datasets, is centralised through the use of the course website available in the following url:

https://gdsl-ul.github.io/san/

The module handbook, including the assessment description, criteria and module programme, and videos for each teaching week can be accessed via the module Canvas site:

ENS453 Spatial Modelling for Data Scientists

1.1 Aims

This module aims to provides students with a range of techniques for analysing and modelling spatial data:

build upon the more general research training delivered via companion modules on Data Collection and Data Analysis, both of which have an aspatial focus;
highlight a number of key social issues that have a spatial dimension;
explain the specific challenges faced when attempting to analyse spatial data;
introduce a range of analytical techniques and approaches suitable for the analysis of spatial data; and,
enhance practical skills in using R software packages to implement a wide range of spatial analytical tools.

1.2 Learning Outcomes

By the end of the module, students should be able to:

identify some key sources of spatial data and resources of spatial analysis and modelling tools;
explain the advantages of taking spatial structure into account when analysing spatial data;
apply a range of computer-based techniques for the analysis of spatial data, including mapping, correlation, kernel density estimation, regression, multi-level models, geographically-weighted regression, spatial interaction models and spatial econometrics;
apply appropriate analytical strategies to tackle the key methodological challenges facing spatial analysis – spatial autocorrelation, heterogeneity, and ecological fallacy; and,
select appropriate analytical tools for analysing specific spatial data sets to address emerging social issues facing the society.

1.3 Feedback

Formal assessment of two computational essays. Written assignment-specific feedback will be provided within three working weeks of the submission deadline. Comments will offer an understanding of the mark awarded and identify areas which can be considered for improvement in future assignments.
Verbal face-to-face feedback. Immediate face-to-face feedback will be provided during lecture, discussion and clinic sessions in interaction with staff. This will take place in all live sessions during the semester.
Online forum. Asynchronous written feedback will be provided via an online forum maintained by the module lead. Students are encouraged to contribute by asking and answering questions relating to the module content. Staff will monitor the forum Monday to Friday 9am-5pm, but it will be open to students to make contributions at all times.

1.4 Computational Environment

To reproduce the code in the book, you need the following software packages:

R-4.3.1
RStudio 2023.09.0+463
Quarto 1.3.450
the list of libraries in the next section

To check your version of:

R and libraries run sessionInfo()
RStudio click help on the menu bar and then About
Quarto check the version file in the quarto folder on your computer.

To install and update:

R, download the appropriate version from The Comprehensive R Archive Network (CRAN)
RStudio, download the appropriate version from Posit
Quarto, download the appropriate version from the Quarto website

1.4.1 Dependency list

The list of libraries used in this book is provided below:

arm
car
corrplot
devtools
FRK
gghighlight
ggplot2
ggmap
GISTools
gridExtra
gstat
hexbin
jtools
kableExtra
knitr
lme4
lmtest
lubridate
MASS
merTools
plyr
RColorBrewer
rgdal
sf
sjPlot
sp
spgwr
spatialreg
spacetime
stargazer
tidyverse
tmap
tufte
viridis
basemapR

Copy, paste and run the code below in your console. Ensure all packages are installed on your computer.

# package names
packages <- c(
    "arm",
    "car",
    "corrplot",
    "devtools",
    "FRK",
    "gghighlight",
    "ggplot2",
    "ggmap",
    "gridExtra",
    "gstat",
    "hexbin",
    "jtools",
    "kableExtra",
    "knitr",
    "lme4",
    "lmtest",
    "lubridate",
    "MASS",
    "merTools",
    "plyr",
    "RColorBrewer",
    "sf",
    "sjPlot",
    "sp",
    "spgwr",
    "spatialreg",
    "spacetime",
    "stargazer",
    "tidyverse",
    "tmap",
    "tufte",
    "viridis"
)

# install packages not yet installed
installed_packages <- packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) {
  install.packages(packages[!installed_packages])
}

# packages loading
invisible(lapply(packages, library, character.only = TRUE))

Note

To install the library basemapR, you need to install from source by running:

library(devtools) install_github('Chrisjb/basemapR')

1.5 Assessment

The final module mark is composed of the two computational essays. Together they are designed to cover the materials introduced in the entirety of content covered during the semester. A computational essay is an essay whose narrative is supported by code and computational results that are included in the essay itself. Each teaching week, you will be required to address a set of questions relating to the module content covered in that week, and to use the material that you will produce for this purpose to build your computational essay.

Assignment 1 (50%) refer to the set of questions at the end of Chapter 4, Chapter 5 and Chapter 6. You are required to use your responses to build your computational essay. Each chapter provides more specific guidance of the tasks and discussion that you are required to consider in your assignment.

Assignment 2 (50%) refer to the set of questions at the end of Chapter 7, Chapter 8, Chapter 9 and Chapter 10. You are required to use your responses to build your computational essay. Each chapter provides more specific guidance of the tasks and discussion that you are required to consider in your assignment.

1.5.1 Format Requirements

Both assignments will have the same requirements:

Maximum word count: 2,000 words, excluding figures and references.
Up to three maps, plot or figures (a figure may include more than one map and/or plot and will only count as one but needs to be integrated in the figure)
Up to two tables.

Assignments need to be prepared in “Quarto Document” format (i.e. qmd extension) and then converted into a self-contained HTML file that will then be submitted via Turnitin. The document should only display content that will be assessed. Intermediate steps do not need to be displayed. Messages resulting from loading packages, attaching data frames, or similar messages do not need to be included as output code. Useful resources to customise your R notebook can be found on Quarto’s website.

Two Quarto Document templates will be available via the module Canvas site.

Submission is electronic only via Turnitin on Canvas.

1.5.2 Marking criteria

The Standard Environmental Sciences School marking criteria apply, with a stronger emphasis on evidencing the use of regression models, critical analysis of results and presentation standards. In addition to these general criteria, the code and outputs (i.e. tables, maps and plots) contained within the notebook submitted for assessment will be assessed according to the extent of documentation and evidence of expertise in changing and extending the code options illustrated in each chapter. Specifically, the following criteria will be applied:

0-15: no documentation and use of default options.
16-39: little documentation and use of default options.
40-49: some documentation, and use of default options.
50-59: extensive documentation, and edit of some of the options provided in the notebook (e.g. change north arrow location).
60-69: extensive well organised and easy to read documentation, and evidence of understanding of options provided in the code (e.g. tweaking existing options).
70-79: all above, plus clear evidence of code design skills (e.g. customising graphics, combining plots (or tables) into a single output, adding clear axis labels and variable names on graphic outputs, etc.).
80-100: all as above, plus code containing novel contributions that extend/improve the functionality the code was provided with (e.g. comparative model assessments, novel methods to perform the task, etc.).

# Overview {#overview} Access to all materials, including lecture notes, computational notebooks and datasets, is centralised through the use of the course website available in the following url: > <https://gdsl-ul.github.io/san/> The module handbook, including the assessment description, criteria and module programme, and videos for each teaching week can be accessed via the module Canvas site: > [ENS453 Spatial Modelling for Data Scientists](https://liverpool.instructure.com) ## Aims This module aims to provides students with a range of techniques for analysing and modelling spatial data: - build upon the more general research training delivered via companion modules on *Data Collection and Data Analysis*, both of which have an aspatial focus; - highlight a number of key social issues that have a spatial dimension; - explain the specific challenges faced when attempting to analyse spatial data; - introduce a range of analytical techniques and approaches suitable for the analysis of spatial data; and, - enhance practical skills in using *R* software packages to implement a wide range of spatial analytical tools. ## Learning Outcomes By the end of the module, students should be able to: - identify some key sources of spatial data and resources of spatial analysis and modelling tools; - explain the advantages of taking spatial structure into account when analysing spatial data; - apply a range of computer-based techniques for the analysis of spatial data, including mapping, correlation, kernel density estimation, regression, multi-level models, geographically-weighted regression, spatial interaction models and spatial econometrics; - apply appropriate analytical strategies to tackle the key methodological challenges facing spatial analysis -- spatial autocorrelation, heterogeneity, and ecological fallacy; and, - select appropriate analytical tools for analysing specific spatial data sets to address emerging social issues facing the society. ## Feedback - *Formal assessment of two computational essays*. Written assignment-specific feedback will be provided within three working weeks of the submission deadline. Comments will offer an understanding of the mark awarded and identify areas which can be considered for improvement in future assignments. - *Verbal face-to-face feedback*. Immediate face-to-face feedback will be provided during lecture, discussion and clinic sessions in interaction with staff. This will take place in all live sessions during the semester. - *Online forum*. Asynchronous written feedback will be provided via an online forum maintained by the module lead. Students are encouraged to contribute by asking and answering questions relating to the module content. Staff will monitor the forum Monday to Friday 9am-5pm, but it will be open to students to make contributions at all times. ## Computational Environment To reproduce the code in the book, you need the following software packages: - R-4.3.1 - RStudio 2023.09.0+463 - Quarto 1.3.450 - the list of libraries in the next section To check your version of: - R and libraries run `sessionInfo()` - RStudio click `help` on the menu bar and then `About` - Quarto check the `version` file in the quarto folder on your computer. To install and update: - R, download the appropriate version from [The Comprehensive R Archive Network (CRAN)](https://cran.r-project.org) - RStudio, download the appropriate version from [Posit](https://posit.co/download/rstudio-desktop/) - Quarto, download the appropriate version from [the Quarto website](https://quarto.org/docs/get-started/) ### Dependency list {#sec-dependencies} The list of libraries used in this book is provided below: - `arm` - `car` - `corrplot` - `devtools` - `FRK` - `gghighlight` - `ggplot2` - `ggmap` - `GISTools` - `gridExtra` - `gstat` - `hexbin` - `jtools` - `kableExtra` - `knitr` - `lme4` - `lmtest` - `lubridate` - `MASS` - `merTools` - `plyr` - `RColorBrewer` - `rgdal` - `sf` - `sjPlot` - `sp` - `spgwr` - `spatialreg` - `spacetime` - `stargazer` - `tidyverse` - `tmap` - `tufte` - `viridis` - `basemapR` Copy, paste and run the code below in your console. Ensure all packages are installed on your computer. ```{r} #| eval: false # package names packages <- c( "arm", "car", "corrplot", "devtools", "FRK", "gghighlight", "ggplot2", "ggmap", "gridExtra", "gstat", "hexbin", "jtools", "kableExtra", "knitr", "lme4", "lmtest", "lubridate", "MASS", "merTools", "plyr", "RColorBrewer", "sf", "sjPlot", "sp", "spgwr", "spatialreg", "spacetime", "stargazer", "tidyverse", "tmap", "tufte", "viridis" ) # install packages not yet installed installed_packages <- packages %in% rownames(installed.packages()) if (any(installed_packages == FALSE)) { install.packages(packages[!installed_packages]) } # packages loading invisible(lapply(packages, library, character.only = TRUE)) ``` ::: column-margin ::: callout-note To install the library `basemapR`, you need to install from source by running: `library(devtools)` `install_github('Chrisjb/basemapR')` ::: ::: ## Assessment The final module mark is composed of the *two computational essays*. Together they are designed to cover the materials introduced in the entirety of content covered during the semester. A computational essay is an essay whose narrative is supported by code and computational results that are included in the essay itself. Each teaching week, you will be required to address a set of questions relating to the module content covered in that week, and to use the material that you will produce for this purpose to build your computational essay. **Assignment 1 (50%)** refer to the set of questions at the end of @sec-chp4, @sec-chp5 and @sec-chp6. You are required to use your responses to build your computational essay. Each chapter provides more specific guidance of the tasks and discussion that you are required to consider in your assignment. **Assignment 2 (50%)** refer to the set of questions at the end of @sec-chp7, @sec-chp8, @sec-chp9 and @sec-chp10. You are required to use your responses to build your computational essay. Each chapter provides more specific guidance of the tasks and discussion that you are required to consider in your assignment. ### Format Requirements Both assignments will have the same requirements: - Maximum word count: 2,000 words, excluding figures and references. - Up to three maps, plot or figures (a figure may include more than one map and/or plot and will only count as one but needs to be integrated in the figure) - Up to two tables. Assignments need to be prepared in "*Quarto Document*" format (i.e. qmd extension) and then converted into a self-contained HTML file that will then be submitted via Turnitin. The document should only display content that will be assessed. Intermediate steps do not need to be displayed. Messages resulting from loading packages, attaching data frames, or similar messages do not need to be included as output code. Useful resources to customise your R notebook can be found on [Quarto's website](https://quarto.org/docs/guide/). Two Quarto Document templates will be available via [the module Canvas site](https://canvas.liverpool.ac.uk/courses/60454). Submission is electronic only via Turnitin on Canvas. ### Marking criteria The Standard Environmental Sciences School marking criteria apply, with a stronger emphasis on evidencing the use of regression models, critical analysis of results and presentation standards. In addition to these general criteria, the code and outputs (i.e. tables, maps and plots) contained within the notebook submitted for assessment will be assessed according to the extent of documentation and evidence of expertise in changing and extending the code options illustrated in each chapter. Specifically, the following criteria will be applied: - **0-15**: no documentation and use of default options. - **16-39**: little documentation and use of default options. - **40-49**: some documentation, and use of default options. - **50-59**: extensive documentation, and edit of some of the options provided in the notebook (e.g. change north arrow location). - **60-69**: extensive well organised and easy to read documentation, and evidence of understanding of options provided in the code (e.g. tweaking existing options). - **70-79**: all above, plus clear evidence of code design skills (e.g. customising graphics, combining plots (or tables) into a single output, adding clear axis labels and variable names on graphic outputs, etc.). - **80-100**: all as above, plus code containing novel contributions that extend/improve the functionality the code was provided with (e.g. comparative model assessments, novel methods to perform the task, etc.).