I found some campsite data in XML format and I wanted to formalise it into a relational database.
So I downloaded the data and started analysing it.
We have the name of the camping with some sort of id and the city/town it belonged. We also have the country. From here on, we have uneven data. Some countries have counties or departments or provinces. Others don’t. So I will make relational connections between countries, places and campings and I’ll dump the zone information in some null-able columns.
First, lets create the tables to import the data and massage it later.
I have created a schema for importing the raw data and the table that will hold the raw data.
We have the data imported now. But all the information appears as a url. Lets break it up. First we will create another temporary table to keep the data in order to facilitate further processing.
Cool, now the data it’s divided and we were able to identify the camping name. Lets try to identify the countries. Note that depending on the country it may appear as zone4, zone3 or zone2.
Lets tackle places now.
Finally, lets import and relate the campsites.
Done! if you have internet connection to the web and Postgresql, you can do it too.