June 19, 2014

How to Move Data Between Databases

Decided to migrate some data in Postgres from home to the cloud. 11,431 rows of data, to be exact. Enter Python... First, to get the data out of the database:


#!/Users/hdiwan/.virtualenvs/gis/bin/python
import logging
import argparse
import cStringIO as StringIO
import csv
import datetime

import psycopg2

GOOD_TYPES = set(['taxi', 'shop', 'social_facility', 'coffee_shop', 'bar', 'pub_brewery', 'pub', 'restaurant', 'nightclub', 'cafe'])
SQL_STATEMENT = u'''select name, st_x(way) as longitude, st_y(way) as latitude, amenity as type from planet_osm_point WHERE amenity = %s ''' + 'OR amenity = %s ' * (len(GOOD_TYPES) - 1)

if __name__ == '__main__':
    argp = argparse.ArgumentParser(description='Filter PostGIS data files to CSV')
    argp.add_argument('-v', '--verbose', action='store_false', help='Make script chatty')
    parsed = argp.parse_args()

    if not parsed.verbose:
        logging.basicConfig(level=logging.DEBUG)
    else:
        logging.basicConfig(level=logging.FATAL)

    _out = StringIO.StringIO()
    out = csv.DictWriter(_out, fieldnames=['name', 'longitude', 'latitude', 'type'])

    conn = psycopg2.connect(u'dbname=gis user=hdiwan')
    cursor = conn.cursor()
    cursor.execute(SQL_STATEMENT, list(GOOD_TYPES))
    for row in cursor.fetchall():
        row = {'name': row[0], 'longitude': unicode(row[1]), 'latitude': unicode(row[2]), 'type': row[3]}
        logging.debug(row)
        out.writerow(row)
    cursor.close()
    conn.close()

    print(_out.getvalue()),

... and the script to get it into the other database is even simpler:
#!/usr/bin/env python
import logging
import argparse
import csv
import datetime

import psycopg2

SQL_STATEMENT = 'INSERT INTO planet_osm_points (name, way, amenity) values (%s, st_point(%s, %s), %s)'

if __name__ == '__main__':
    argp = argparse.ArgumentParser(description='Import CSV with location data, as exported from places.py')
    argp.add_argument('-v', '--verbose', action='store_false', help='Make script chatty')
    argp.add_argument('-f', '--filename', action='store', type=unicode, help='Filename exported from places.py')
    parsed = argp.parse_args()

    if not parsed.verbose:
        logging.basicConfig(level=logging.DEBUG)
    else:
        logging.basicConfig(level=logging.FATAL)

    rowCount = 0
    with open(parsed.filename, 'r') as fin:
        reader = csv.reader(fin)
        conn = psycopg2.connect(u'dbname=gis user=hdiwan')
        cursor = conn.cursor()
        for row in reader:
            rowCount = rowCount + 1
            logging.debug(row)
            cursor.execute(SQL_STATEMENT, (row[0], row[1], row[2], row[3]))
        cursor.close()
        conn.commit()
        conn.close()

    print('{} rows imported'.format(rowCount))
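
To chain the two together, run the exporter on the home machine and redirect its output, e.g. ./places.py > places.csv, copy the CSV over, and run the import script on the cloud box with -f places.csv. (places.py comes from the importer's help text; the CSV filename is just an example.)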

1 comment:

  1. I would recommend you use pg_dump/pg_restore if you have Postgres on both ends, or COPY or its friend \copy if you have CSV on one side or the other. Barring that, Pentaho Data Integration is powerful, flexible and easy, can do much more complex things and worries a lot about error handling. It's actually quite fun, and a table->table transform is pretty trivial compared to the kinds of crazy pipelines it can do.

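The commenter's COPY suggestion is worth sketching out, since it skips the CSV round-trip and the longitude/latitude split entirely. The following is a rough sketch, not the approach used in the post: it streams name, way and amenity between the two databases with psycopg2's copy_expert, assumes the same planet_osm_point table exists on both ends, and the cloud connection string is made up.

#!/usr/bin/env python
import StringIO

import psycopg2

# Home database on one side, cloud database on the other; the host below is a placeholder.
src = psycopg2.connect(u'dbname=gis user=hdiwan')
dst = psycopg2.connect(u'dbname=gis user=hdiwan host=cloud.example.com')

# Dump the three columns as CSV into an in-memory buffer...
buf = StringIO.StringIO()
src_cursor = src.cursor()
src_cursor.copy_expert('COPY (SELECT name, way, amenity FROM planet_osm_point) TO STDOUT WITH CSV', buf)
buf.seek(0)

# ...and feed the same buffer straight back in on the other side.
# Add the amenity filter from places.py to the SELECT above if only that subset is wanted.
dst_cursor = dst.cursor()
dst_cursor.copy_expert('COPY planet_osm_point (name, way, amenity) FROM STDIN WITH CSV', buf)
dst.commit()

src_cursor.close()
dst_cursor.close()
src.close()
dst.close()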