Python Scripts to ingest a shapefile into a Postgr

2019-03-21 17:07发布

问题:

I have a PostgreSQL database hosted on a Windows 2008 Server RT Virtual Machine (Yes I know it should be hosted on a Linux VM but this is what my organization has dictated it be on. Sigh...)

Our GIS guys dump a lot of shapefiles into a repository. We would like to have an autoprocess that walks through folder as a scheduled task. We would like to add these into our Postgres geodatabase for some other processes we are currently developing

I am looking to walk through large amounts of shapefiles and have their geometries and file names loaded into out database.

Here's the gist of the core portions of the ingest function I have working so far

import os, subprocess
base_dir = r"c:\shape_file_repository"
full_dir = os.walk(base_dir)
shapefile_list = []
for source, dirs, files in full_dir:
    for file_ in files:
        if file_[-3:] == 'shp':
            #print "Found Shapefile"
            shapefile_path = base_dir + '/' + file_
            shapefile_list.append(shapefile_path)
for paths in shapefile_list:  
    #This is the part where I keep running into trouble. os.system also didnt work
    temp_bat = open(r"c:\temp\temp_shp.bat", "w")
    temp_bat.write(r'start /D c:\Program Files (x86)\PostgreSQL\8.4\bin\shp2pgsql.exe' + \
                   paths + "new_shp_table | psql -d geometry_database")
    temp_bat.close()
    subprocess.Popen(r"c:\temp\temp_shp.bat")

Once the geometries are loaded into the new database table I already have code setup to pull the geometry out of the temporary table and load that plus the shapefile name into our main database table. My problem is I can do this through command prompt, however running the windows commands through python or outputting them to batch file and then running them does not seem to be working at all.

回答1:

Here are some modifications that should make thing work. Note that it would need further modification if you need to be notified if any of the commands fail. Note that it will fail for more than one shapefile, since a new_shp_table table will already exist until you have further logic to move or rename that table elsewhere, or to load it with a unique name.

Also, note that PostgreSQL 8.4 will reach it's end-of-life later this year, so you might want to plan to upgrade to a more recent release before it's too late.

import os, subprocess

# Choose your PostgreSQL version here
os.environ['PATH'] += r';C:\Program Files (x86)\PostgreSQL\8.4\bin'
# http://www.postgresql.org/docs/current/static/libpq-envars.html
os.environ['PGHOST'] = 'localhost'
os.environ['PGPORT'] = '5432'
os.environ['PGUSER'] = 'someuser'
os.environ['PGPASSWORD'] = 'clever password'
os.environ['PGDATABASE'] = 'geometry_database'

base_dir = r"c:\shape_file_repository"
full_dir = os.walk(base_dir)
shapefile_list = []
for source, dirs, files in full_dir:
    for file_ in files:
        if file_[-3:] == 'shp':
            shapefile_path = os.path.join(base_dir, file_)
            shapefile_list.append(shapefile_path)
for shape_path in shapefile_list:
    cmds = 'shp2pgsql "' + shape_path + '" new_shp_table | psql '
    subprocess.call(cmds, shell=True)