Loading CSV XY Data into PostGIS, with Code-Point Open

Now that we have downloaded some OS OpenData, we can look into working with it.

Loading .csv data into PostGIS was surprisingly difficult, but can be done in a number of ways. This post will cover doing it using a .vrt (virtual raster) file.

In this example we are using Ordnance Survey and ultimately Royal Mail: Code-Point Open data. This can be found from OpenData and is a CSV point file for UK postcodes.

First we need to modify the header file provided with Code-Point:

Screenshot from 2013-10-09 20:21:32

I have deleted the short codes and replaced them with sensible headers. You should only have one line in your header file.

Firstly the files are provided split up into a number of different files. We can combine all of the .csv files using the cat command in Linux.

So this will combine all of the individual .csv files into the codepoint.csv file. Your header file should be the top line.

So now we can look at the data using ogrinfo, part of the gdal suite.

We create a virtual raster definition file with our X and Y columns, and projection defined:

So now we can run “ogrinfo” on codepoint.vrt:

Result:

Screenshot from 2013-10-09 21:25:42

So we can see that there was an error opening the codepoint.csv file itself but the .vrt worked fine. This is probably down to a memory issue on my part. So your mileage may vary. I tried again with just one postcode area and it worked fine (ab postcode area .csv was named codepoint.csv):

Screenshot from 2013-10-09 21:24:35

So I need to find a more memory friendly way to do this. So I will load the postcode files one at a time, appending them to any data that is already loaded.

So first I need to add the header row to the top of each file. The cat command worked really well last time so lets try that.

Rename the header file to .txt. So it will be called Code-Point_Open_Column_Headers.txt and consists of only one row with our desired column headers.

Write a bash script that adds the header file to the beginning of each .csv file:

Run the file:

Delete all of the .csv file (not the .csv.csv files!)

And we rename .csv.csv to .csv:

Then we can create the .vrt files for each csv file:

So we want to create the equivalent of for each csv file:

So we can do this using another bash script:

And finally we can load the files using a final bash script (this could be done by looping through the files as well):

So the ogr2ogr command is:

ogr2ogr -nlt PROMOTE_TO_MULTI -progress -nln codepoint -skipfailures -lco PRECISION=no -f PostgreSQL PG:”dbname=’os_open’ host=’localhost’ port=’5432′ user=’os_open’ password=’os_open'” source_file

Explained:

-nlt | PROMOTE_TO_MULTI, creates it as a multipart geometry, not important for points

-progress | Shows a progress bar

-nln codepoint | The name of the layer in the database, so because we are appending data to the first csv file loaded this is important.

-skipfailures -lco PRECISION=no | Personal preference

-f PostgreSQL PG:”dbname=’os_open’ host=’localhost’ port=’5432′ user=’os_open’ password=’os_open'” | Destination, so the details of your PostGIS database (see setting up PostGIS for help).

The latter commands also have the -append tag, which means they will be appended to the first one loaded.

Excellent. Though this was a bit intensive. Doing this in QGIS would have been a lot easier. However this can now be scripted and automatically run just by replacing the original .csv input files allowing for an easy update of the data when a new Cope-Point dataset is released.

Sources:

http://www.debian-administration.org/articles/150

Leave a Reply

Your email address will not be published. Required fields are marked *