On the verge of the Scottish vote for independence, here is a look at all of the scheduled passenger connections between Scotland and destinations outside the UK. As it stands, these are all flights. There are a few ferries between Northern Ireland and Scotland, but NI is still within the UK. No conclusions, just an interesting visualisation.
Loading .csv data into PostGIS was surprisingly difficult, but it can be done in a number of ways. This post will cover doing it using a .vrt (OGR virtual format) file.
In this example we are using Ordnance Survey (and ultimately Royal Mail) Code-Point Open data. It is available from OS OpenData and is a CSV point file of UK postcodes.
First we need to modify the header file provided with Code-Point:
I have deleted the short codes and replaced them with sensible headers. Your header file should contain only one line.
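For illustration, the single header line might look something like this (the exact names are your own choice; I am assuming eastings and northings as the X and Y column names, which the .vrt further down relies on):
postcode,positional_quality,eastings,northings,country_code,nhs_regional_ha,nhs_ha,admin_county,admin_district,admin_ward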
The data is provided split up into a number of separate files, one per postcode area. We can combine all of the .csv files using the cat command in Linux:
cat *.csv > codepoint.csv
This combines all of the individual .csv files into the codepoint.csv file. Check that your header line ends up at the top.
So now we can look at the data using ogrinfo, part of the gdal suite.
We create a virtual format (VRT) definition file with our X and Y columns, and projection defined:
<OGRVRTDataSource>
    <OGRVRTLayer name="codepoint">
        <SrcDataSource>codepoint.csv</SrcDataSource>
        <GeometryType>wkbPoint</GeometryType>
        <LayerSRS>EPSG:27700</LayerSRS>
        <GeometryField encoding="PointFromColumns" x="eastings" y="northings"/>
    </OGRVRTLayer>
</OGRVRTDataSource>
So now we can run “ogrinfo” on codepoint.vrt:
ogrinfo -ro -al -so codepoint.vrt
Result:
We can see that there was an error opening the codepoint.csv file itself, but the .vrt worked fine. This is probably down to a memory issue on my machine, so your mileage may vary. I tried again with just one postcode area and it worked fine (the AB postcode area .csv was renamed to codepoint.csv):
So I need a more memory-friendly way to do this: I will load the postcode files one at a time, appending each to the data already loaded. First I need to add the header row to the top of each file. The cat command worked really well last time, so let's try that.
Rename the header file to .txt, so it is called Code-Point_Open_Column_Headers.txt; it consists of only one row with our desired column headers.
Write a bash script that adds the header file to the beginning of each .csv file:
#!/bin/bash
# vesanto
##################################
echo " ******************************* "
echo " Add a header to codepoint"
echo " ******************************* "
# Write the header line, then the original contents, into a new .csv.csv file
for codepointFiles in *.csv
do
    echo "$codepointFiles"
    cat Code-Point_Open_Column_Headers.txt >> "$codepointFiles.csv"
    cat "$codepointFiles" >> "$codepointFiles.csv"
    echo "Done"
done
Run the file:
bash cat_command.sh
Delete all of the original .csv files (not the .csv.csv files!). Then we rename .csv.csv back to .csv:
for i in *.csv.csv; do mv "$i" "${i/.csv.csv}".csv; done
Then we create a .vrt file for each .csv file, just like the one above. For each file we then want to run the equivalent of an ogr2ogr load command; the important flags are listed here, with a sketch of the full command after the list:
-nlt PROMOTE_TO_MULTI | Creates the geometry as multi-part; not important for points.
-progress | Shows a progress bar
-nln codepoint | The name of the layer in the database. This is important because every file after the first is appended to the layer created by the first load.
-skipfailures -lco PRECISION=no | Personal preference
-f PostgreSQL PG:"dbname='os_open' host='localhost' port='5432' user='os_open' password='os_open'" | The destination: the details of your PostGIS database (see setting up PostGIS for help).
The later commands also have the -append flag, which means their data is appended to the layer created by the first load.
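Putting that together, each load looks something like the following sketch (the connection details mirror the flag list above, and ab.vrt is my stand-in for whichever .vrt is being loaded; the very first file is loaded without -append so the codepoint layer gets created):
ogr2ogr -append -progress -nln codepoint -nlt PROMOTE_TO_MULTI -skipfailures -lco PRECISION=no -f PostgreSQL PG:"dbname='os_open' host='localhost' port='5432' user='os_open' password='os_open'" ab.vrt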
Excellent, though this was a bit involved. Doing this in QGIS would have been a lot easier. However, the process can now be scripted and run automatically just by replacing the original .csv input files, allowing for an easy update whenever a new Code-Point dataset is released.
The Ordnance Survey have begun to release their data as free to use. I think this is great, and despite what some say, I think the selection is good as well. I can see why MasterMap and AddressBase, their flagship products, still cost money: there are significant costs associated with producing them.
There is, however, still a bit of a barrier to entry in using OS OpenData. This guide will hopefully make downloading the data a little easier.
There is no cost, but a valid e-mail address is required.
For my purposes I started with OS Street View® and OS VectorMap™ District for the whole country.
You will receive an e-mail with the download links for the data. Unfortunately these are individual links, since the data is split into tiles (55 in total for the whole of the UK), so we want some way to automate the download.
In comes curl, made by some nice Swedes. curl is a command line tool which will download the link provided.
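For example, with the links from the e-mail pasted into a links.txt file (a file name of my own choosing), a small loop fetches them all:
# Download every URL listed in links.txt, keeping the remote file names
while read -r url; do
    curl -L -O "$url"
done < links.txt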
They are all in the British National Grid projection (EPSG:27700). Watch out for this in QGIS: version 1.8 used the three-parameter datum transformation while 2.0 uses the seven-parameter one. This means that if you created data in QGIS 1.8 and are now using 2.0, your data may not line up if you have "on the fly" CRS transformation turned on. To fix this, just go into the layer properties and re-define the projection as British National Grid (EPSG:27700) for all the layers.
If you are working with the raster data you need to copy the .tfw files into the same directory as the .tif files. The .tfw (TIFF world) files tell each .tif image where it should be in the world.
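Something like the following does the trick, assuming the world files were unzipped into their own folder (both folder names here are hypothetical):
# Copy the world files alongside their .tif images
cp georeferencing_files/*.tfw raster_tiles/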
Natural Earth provides some of the best data for mapping large areas at small cartographic scales. It is clean, accurate, extensive, available at a number of different scales, and best of all free.
To load the data into PostGIS (PostgreSQL) we will use the vector tools provided in GDAL, mainly ogr2ogr.
After downloading the data (I went for all of the vector data in ShapeFile format), I first need to generate a list of datasets and their respective file paths. This will be put into a spreadsheet, the load command will be applied to each line, and finally the whole thing will be run as a shell script. Setting up a PostGIS database is covered in my previous post.
My Natural Earth data consisted of:
28 directories, 1472 files
So a little automation is needed. Interestingly there were also a few .gdb files “ne_10m_admin_1_label_points.gdb”. Those we can look into at a later date.
To begin:
ls > my_contents.txt
Produced a decent result, but not quite what I was looking for.
sudo apt-get install tree
tree > natural_earth.txt
Was much better, although with a bit more tuning I’m sure ls would have achieved a better result.
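For what it is worth, a find one-liner would also produce just the ShapeFile paths, which is really all the spreadsheet needs (the output file name is my own):
find . -name "*.shp" > natural_earth_shapefiles.txt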
After a bit of work in the spreadsheet, I had what I wanted. Perhaps not the most elegant solution, but certainly enough for my purposes.
ogr2ogr is a file converter that also does so much more. In this case we are converting the ShapeFiles into tables in a PostGIS database. Essentially you want to copy the beginning part of the command (sketched in full after the settings below) in front of each file you want to load, changing only the path, e.g. "10m_cultural/ne_10m_admin_0_antarctic_claim_limit_lines.shp".
Our settings:
-nlt PROMOTE_TO_MULTI | Loads all our files as if they were multi-part polygons. This means that a multi-part polygon won't cause the load to fail; this is a PostGIS requirement.
-progress | Shows a progress bar.
-skipfailures | Will not stop for each failure.
-overwrite | Overwrites a table if one with the same name exists. Our tables will be named after their ShapeFiles, since we are not specifying a name.
-lco PRECISION=no | Helps keep numbers manageable, especially with this data where precision is not important.
-f PostgreSQL PG:"dbname='DatabaseName' host='IpAddressOfHost' port='5432' user='Username' password='Password'" | The details of the database we are connecting to.
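Put together, each line of the command list looks something like this sketch (using the placeholder connection details above; only the ShapeFile path changes from line to line):
ogr2ogr -nlt PROMOTE_TO_MULTI -progress -skipfailures -overwrite -lco PRECISION=no -f PostgreSQL PG:"dbname='DatabaseName' host='IpAddressOfHost' port='5432' user='Username' password='Password'" 10m_cultural/ne_10m_admin_0_antarctic_claim_limit_lines.shp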
Now we are ready to run the commands. While ogr2ogr commands can be pasted straight into the terminal, for this task that is not really feasible. So we can create a simple shell script.
Copy the commands into a file and then:
sh your_commands
Finally there was one last error, with ne_10m_populated_places.shp. This was due to encoding: the client encoding for the ogr2ogr load can be changed from UTF8 to LATIN1 using:
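One way to do this (an assumption on my part) is to set libpq's PGCLIENTENCODING environment variable for the run, which the PostgreSQL driver respects:
# Load the problem file with the client encoding forced to LATIN1
PGCLIENTENCODING=LATIN1 ogr2ogr -nlt PROMOTE_TO_MULTI -progress -skipfailures -overwrite -lco PRECISION=no -f PostgreSQL PG:"dbname='DatabaseName' host='IpAddressOfHost' port='5432' user='Username' password='Password'" 10m_cultural/ne_10m_populated_places.shp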
I have just ordered a new home server to store all of my GIS data, so I have put together a quick guide on setting up PostGIS on Linux Mint, setting it up on Ubuntu will be very similar.
PostgreSQL comes pre-installed on most Linux distros, so we don't have to worry about installing it. However, we do need to change the password for the default 'postgres' user. This can be done with the following:
sudo -u postgres psql
postgres=> alter user postgres password 'apassword';
postgres=> create user yourusername createdb createuser password 'somepass';
postgres=> \q
These commands set the password for the root user 'postgres' and create your own user account with privileges to create databases and users.
If you get a response of:
sudo: unknown user: postgres
sudo: unable to initialise policy plug-in
then PostgreSQL is not installed, but that can be fixed with:
sudo apt-get install postgresql
To create a database we then simply run:
createdb nameofdatabase
In my case it was natural_earth. PostGIS is an add-on that makes a PostgreSQL database spatially enabled. To install PostGIS we can launch pgAdmin, which is used to administer PostgreSQL databases. As we can see, we have the default 'postgres' database and the one just created, 'natural_earth'.
Adding PostGIS to the database we created should be as easy as running the following SQL command in the ‘natural_earth’ database:
CREATE EXTENSION postgis;
Unfortunately while PostgreSQL is automatically installed, PostGIS is not. So it needs to be installed through the Software Manager:
If you cannot find PostGIS, it may be due to a missing repository: UbuntuGIS, a PPA (Personal Package Archive). In the Software Manager it can be added from Edit > Software Sources > PPAs > Add a new PPA > ppa:ubuntugis/ppa.
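The same repository and package can also be added from the terminal, which I find quicker than the Software Manager:
sudo add-apt-repository ppa:ubuntugis/ppa
sudo apt-get update
sudo apt-get install postgis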
With PostGIS installed we will try the command again:
CREATE EXTENSION postgis;
With a better result. Now we can see ‘postgis’ in the Extensions for our new Database:
PostGIS topology can be added with the following command:
CREATE EXTENSION postgis_topology;
You are now ready to get started loading in your spatial data.