Chapter 5 Files

You need to make sure that:

  1. You have access to all of your files;
  2. Your files are backed up so your setup is not entirely reliant on a single device;
  3. Each device on which your files are copied or from which they are accessed is encrypted.

5.1 Backup/sync

The simplest and recommended way to do this at Liverpool is to keep all your files and data in OneDrive on your university account. OneDrive is part of the Office 365 suite available from the university; you can find more information at:

https://www.liverpool.ac.uk/csd/working-from-home/

There are Windows and Mac clients that work relatively well (comparable to the Dropbox client).

Once you are set up, copy all your files to your OneDrive account, which will create a copy of them in Microsoft’s secure cloud. The exception is data that must be managed in a particular way, e.g. accessed only from a single machine or never stored in the cloud.

Please be sure to speak with your PhD supervisor if you work with data that may pose challenges when moved off local machines or outside the university network (remember that OneDrive is in the cloud, not on the university servers!).

5.2 Encryption

Disk encryption helps protect the data on your devices by converting it into an unreadable format. Deciphering the data without access to the required keys is challenging. Therefore, should your devices be lost or stolen, encryption introduces an additional barrier for anyone attempting to access potentially sensitive data. Please note that, as per University of Liverpool guidelines, “the security of confidential information is the responsibility of the individual member of staff or student NOT the University, nor the line manager or Head of Department”. Encryption methods are platform dependent. A list of relevant guides is provided below:

  1. Windows
  2. Mac
  3. Ubuntu
  4. iOS
  5. Android

5.3 File Transfer

Below we discuss two approaches that can be used to transfer files between two machines, e.g. between your local machine and a remote server:

  1. File Transfer Protocol (FTP);
  2. Secure Copy Protocol (scp).

5.3.1 File Transfer Protocol (FTP)

If you need to move large files, or a large number of files, from a local machine to a remote server (e.g. from your laptop to a Linux machine at the lab), you can do so using a drag-and-drop interface with an FTP client (e.g., FileZilla for Windows/Mac/Ubuntu or WinSCP for Windows). To access a remote server you will need to enter the following into the respective fields within your FTP client:

  • Host (Remote Server IP Address);
  • Username (Your username on the remote host);
  • Password (Remote host password).

5.3.2 Secure Copy Protocol (scp)

Alternatively, Mac and Linux users can copy files between servers using the scp command. To copy a local file to a remote server:

scp <filepath> <username>@<server.ip.address>:<target_directory>

This command can also be used to copy a file from the remote server to your local machine:

scp <username>@<server.ip.address>:<filepath> <local_target_directory>/

As with copying files locally, -r can be added to the above commands to recursively copy all files within a directory. However, if a directory contains a large number of files, zip the directory before executing scp, since transferring a single archive is typically faster than transferring many small files. The zip file can then be unzipped using:

unzip <filename>.zip -d <target_directory>

Gzip-compressed tar archives, meanwhile, can be extracted using:

tar -xvzf <filename>.tar.gz -C <target_directory>
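
The archive-then-copy workflow described above can be sketched as follows. This is a minimal sketch, assuming a standard tar is installed on your local machine; the function name pack_dir and the directory name my_data are hypothetical and used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: bundle a directory into one compressed archive so that
# scp transfers a single file instead of many small ones.
pack_dir() {
    local src_dir="$1"   # directory to archive
    local archive="$2"   # output file, e.g. my_data.tar.gz
    # -c create, -z gzip-compress, -f output file.
    # -C switches to the parent directory first, so the archive stores
    # paths relative to it (my_data/...) rather than the full local path.
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Usage (placeholders as in the commands above):
#   pack_dir ./my_data my_data.tar.gz
#   scp my_data.tar.gz <username>@<server.ip.address>:<target_directory>
#   # then, on the remote server:
#   tar -xvzf my_data.tar.gz -C <target_directory>
```

Storing relative paths means the archive unpacks into a my_data directory inside the target directory, regardless of where the directory lived on the local machine.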

5.4 File Download

Large datasets can often be downloaded directly from the web. The wget command can be used to download files from both HTTP(S) servers

wget '<file_url>'

and FTP servers:

wget -r 'ftp://<username>:<password>@<server.ip.address>/<directory>'
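
For large files it can help to make the download resumable and to choose where the file is saved. The following is a minimal sketch rather than a prescribed method: the function name fetch is hypothetical, while -c (continue a partially downloaded file) and -P (directory prefix for saved files) are standard wget options.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around wget for large downloads.
fetch() {
    local url="$1"
    local dest_dir="${2:-.}"   # download directory, defaults to the current one
    # -c resumes a partially downloaded file instead of starting again;
    # -P sets the directory in which the downloaded file is saved.
    wget -c -P "$dest_dir" "$url"
}

# Usage: fetch '<file_url>' <target_directory>
```

If the connection drops part-way through, running the same command again should pick up from where the download stopped, provided the server supports resuming.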

Once the data has been downloaded, we must verify the integrity of the file. Websites from which data can be downloaded typically provide an MD5 checksum. Computing the checksum of the downloaded file and comparing it with the published value allows us to verify that the file has not been changed:

md5sum <filename>
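
Comparing two 32-character strings by eye is error-prone, so the comparison can be scripted. The sketch below assumes the md5sum tool from GNU coreutils (on macOS the equivalent tool is md5, with a different output format); the function name verify_md5 is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a file's MD5 checksum with the published value.
# Assumes GNU coreutils md5sum is installed.
verify_md5() {
    local file="$1"       # downloaded file
    local expected="$2"   # checksum copied from the website (lowercase hex)
    local actual
    # md5sum prints "<checksum>  <filename>"; keep only the first field.
    actual="$(md5sum "$file" | awk '{print $1}')"
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (expected $expected, got $actual)" >&2
        return 1
    fi
}

# Usage: verify_md5 <filename> <checksum_from_website>
```

The function returns a non-zero exit status on a mismatch, so it can be used in a script to stop before a corrupted file is processed.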

5.5 Practical

5.5.1 File Download

For this task we shall download the CIFAR-10 dataset from https://www.cs.toronto.edu/~kriz/cifar.html using the wget command.

  1. First we need to determine the URL from which we can download the dataset. Visit the CIFAR-10 website, right-click on “CIFAR-10 python version”, and choose the “copy link” option, which will copy the link to your clipboard.
  2. Next, ssh into your remote server.
  3. Type “wget” followed by a space, then paste the URL into the terminal by pressing Ctrl + Shift + V.
  4. Press enter to start the file download.
  5. Once the CIFAR-10 dataset has downloaded, verify that its md5 checksum matches the one specified on the website.

5.5.2 File Transfer (Optional)

  1. Create a file named “Test.txt” locally and enter some random text. From your local machine, copy “Test.txt” to your remote server using either scp or an FTP client (e.g., WinSCP or FileZilla).
  2. After transferring the file, ssh into the remote server and verify that the file is within the specified target directory using the ls command.
  3. (Optional) Open the file using a command-line editor from the terminal, e.g., nano or vim:
nano ./Test.txt