Pandas error with basemap/proj for map plotting

Posted 2019-02-26 03:43

Question:

I ran the Python code below, which is the example from "Plotting Maps: Visualizing Haiti Earthquake Crisis Data" in the book Python for Data Analysis (pages 242-246).

The code is supposed to plot a map of Haiti, but I got the error below:

Traceback (most recent call last):
  File "Haiti.py", line 74, in <module>
    x, y = m(cat_data.LONGITUDE, cat_data.LATITUDE)
  File "/usr/local/lib/python2.7/site-packages/mpl_toolkits/basemap/__init__.py", line 1148, in __call__
    xout,yout = self.projtran(x,y,inverse=inverse)
  File "/usr/local/lib/python2.7/site-packages/mpl_toolkits/basemap/proj.py", line 286, in __call__
    outx,outy = self._proj4(x, y, inverse=inverse)
  File "/usr/local/lib/python2.7/site-packages/mpl_toolkits/basemap/pyproj.py", line 388, in __call__
    _proj.Proj._fwd(self, inx, iny, radians=radians, errcheck=errcheck)
  File "_proj.pyx", line 122, in _proj.Proj._fwd (src/_proj.c:1571)
RuntimeError

I checked that the mpl_toolkits.basemap and proj modules were installed correctly on my machine. Basemap was installed from source as instructed and proj was installed with Homebrew; both look fine to me.

If you have basemap and proj installed, does this code run successfully for you? If not, do you think the problem is the module installation, the code itself, or something else?

Haiti.csv file can be downloaded from https://github.com/pydata/pydata-book/raw/master/ch08/Haiti.csv

import pandas as pd
import numpy as np
from pandas import DataFrame

data = pd.read_csv('Haiti.csv')

data = data[(data.LATITUDE > 18) & (data.LATITUDE < 20) &
        (data.LONGITUDE > -75) & (data.LONGITUDE < -70)
        & data.CATEGORY.notnull()]

def to_cat_list(catstr):
    stripped = (x.strip() for x in catstr.split(','))
    return [x for x in stripped if x]

def get_all_categories(cat_series):
    cat_sets = (set(to_cat_list(x)) for x in cat_series) 
    return sorted(set.union(*cat_sets))

def get_english(cat):
    code, names = cat.split('.') 
    if '|' in names:
        names = names.split(' | ')[1] 
    return code, names.strip()

all_cats = get_all_categories(data.CATEGORY)
english_mapping = dict(get_english(x) for x in all_cats)

def get_code(seq):
    return [x.split('.')[0] for x in seq if x]

all_codes = get_code(all_cats)
code_index = pd.Index(np.unique(all_codes))
dummy_frame = DataFrame(np.zeros((len(data), len(code_index))),
                        index=data.index, columns=code_index)

for row, cat in zip(data.index, data.CATEGORY): 
    codes = get_code(to_cat_list(cat)) 
    dummy_frame.ix[row, codes] = 1

data = data.join(dummy_frame.add_prefix('category_'))

from mpl_toolkits.basemap import Basemap 
import matplotlib.pyplot as plt

def basic_haiti_map(ax=None, lllat=17.25, urlat=20.25, lllon=-75, urlon=-71):
    # create polar stereographic Basemap instance. 
    m = Basemap(ax=ax, projection='stere', 
                lon_0=(urlon + lllon) / 2, 
                lat_0=(urlat + lllat) / 2,
                llcrnrlat=lllat, urcrnrlat=urlat, 
                llcrnrlon=lllon, urcrnrlon=urlon, 
                resolution='f')
    # draw coastlines, state and country boundaries, edge of map.
    m.drawcoastlines()
    m.drawstates()
    m.drawcountries()
    return m

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12, 10)) 
fig.subplots_adjust(hspace=0.05, wspace=0.05)

to_plot = ['2a', '1', '3c', '7a']

lllat=17.25; urlat=20.25; lllon=-75; urlon=-71

for code, ax in zip(to_plot, axes.flat):
    m = basic_haiti_map(ax, lllat=lllat, urlat=urlat,
                        lllon=lllon, urlon=urlon) 

    cat_data = data[data['category_%s' % code] == 1]

    # compute map proj coordinates.
    print cat_data.LONGITUDE, cat_data.LATITUDE
    x, y = m(cat_data.LONGITUDE, cat_data.LATITUDE)

    m.plot(x, y, 'k.', alpha=0.5)
    ax.set_title('%s: %s' % (code, english_mapping[code]))

Answer 1:

This is resolved by changing m(cat_data.LONGITUDE, cat_data.LATITUDE) to m(cat_data.LONGITUDE.values, cat_data.LATITUDE.values), thanks to Alex Messina's finding.

After a little further digging, I found the cause: since pandas v0.13.0 (released on 31 Dec 2013), Series subclasses NDFrame instead of ndarray, so a Series can no longer be passed directly to a Cython function that expects an ndarray, such as the ones in basemap/proj. Its data has to be passed with .values instead, as described below.

Quoted from the pandas commit log on GitHub:

+.. warning::
 +
 +   In 0.13.0 since ``Series`` has internaly been refactored to no longer sub-class ``ndarray``
 +   but instead subclass ``NDFrame``, you can **not pass** a ``Series`` directly as a ``ndarray`` typed parameter
 +   to a cython function. Instead pass the actual ``ndarray`` using the ``.values`` attribute of the Series.
 +
 +   Prior to 0.13.0
 +
 +   .. code-block:: python
 +
 +        apply_integrate_f(df['a'], df['b'], df['N'])
 +
 +   Use ``.values`` to get the underlying ``ndarray``
 +
 +   .. code-block:: python
 +
 +        apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)
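
The same behaviour can be sketched without basemap installed. In the snippet below, forward_project is a hypothetical stand-in for a compiled function with ndarray-typed arguments (it is not part of basemap or pyproj, and its arithmetic is a placeholder, not a real projection); the two-row DataFrame is made-up sample data using the column names from the question. It assumes pandas 0.13.0 or later.

```python
import numpy as np
import pandas as pd


def forward_project(lon, lat):
    """Hypothetical stand-in for a compiled function with ndarray-typed arguments."""
    if not (isinstance(lon, np.ndarray) and isinstance(lat, np.ndarray)):
        raise TypeError("expected numpy.ndarray arguments")
    # Placeholder arithmetic only; a real projection would transform the values.
    return lon * 1.0, lat * 1.0


# Made-up sample rows, using the column names from the question.
cat_data = pd.DataFrame({"LONGITUDE": [-72.3, -72.5],
                         "LATITUDE": [18.5, 18.6]})

# A Series is not an ndarray subclass, so the strict function rejects it.
try:
    forward_project(cat_data.LONGITUDE, cat_data.LATITUDE)
    series_accepted = True
except TypeError:
    series_accepted = False

# Passing the underlying arrays via .values is accepted.
x, y = forward_project(cat_data.LONGITUDE.values, cat_data.LATITUDE.values)
```

Here series_accepted ends up False, while the .values call returns plain arrays, which mirrors the change from m(cat_data.LONGITUDE, cat_data.LATITUDE) to m(cat_data.LONGITUDE.values, cat_data.LATITUDE.values) described above.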

You can find the corrected version of the example code here.