RuntimeError: OperationalError: (2003, "Can't connect to MySQL server")

Published 2019-08-23 04:16

Question:

I'm trying to run a Python (version 2.7.1) script that uses the pymysql package to create a table in a database from a CSV file.

It runs correctly on my local system; however, the problem appears when running the same script as part of a pipeline in Google Cloud Dataflow.

My Python function is the following one:

class charge_to_db(beam.DoFn):
    def process(self, element):
        import pymysql

        with open(element, 'r') as f:
            data = f.read().decode("UTF-8")

        datalist = []
        for line in data.split('\n'):
            datalist.append(line.split(','))

        db = pymysql.connect(host='IPadress', user='root', password='mypassword', database='stack_model')

        cursor = db.cursor()
        cursor.execute("DROP TABLE IF EXISTS stack_convergence")

        # column names for the stack_convergence table
        up = "upper_bnd"
        primal = "primal"
        d = "dualit"
        gap = "gap_rel"
        teta = "teta"
        alpha = "alpha"
        imba = "imba_avg"
        price = "price_avg"
        # create the stack_convergence table; place a comma after each column except the last
        queryCreateConvergenceTable = """CREATE TABLE stack_convergence(
                                {} float not null,
                                {} float not null,
                                {} float not null,
                                {} float not null,
                                {} float not null,
                                {} float not null,
                                {} float not null,
                                {} float not null )""".format(up, primal, d, gap, teta, alpha, imba, price)

        cursor.execute(queryCreateConvergenceTable)
        db.close()
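As a side note, the eight identical placeholders in the CREATE TABLE string can be generated from a list, which keeps the column names and the DDL in one place. A sketch using the column names from the question:

```python
# The same eight column names as in the question's script.
columns = ["upper_bnd", "primal", "dualit", "gap_rel",
           "teta", "alpha", "imba_avg", "price_avg"]

# Build one column definition per name instead of eight literal placeholders.
query_create = "CREATE TABLE stack_convergence ({})".format(
    ", ".join("{} float not null".format(c) for c in columns))
```

This produces the same statement as the original, and adding or renaming a column only requires editing the list.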

When running this function in the cloud, I get the following error:

  RuntimeError: OperationalError: (2003, 'Can\'t connect to MySQL server on \'35.195.1.40\' (110 "Connection timed out")')

I don't know why this error occurs, since the script runs correctly on my local system: from there I have access to my Cloud SQL instance, but not from Dataflow in the cloud.

Why is this error occurring?

Answer 1:

On Dataflow you cannot whitelist an IP to give the workers access to a SQL instance, because worker IPs are ephemeral. If you were using Java, the easiest way would be JdbcIO / the JDBC socket factory.

But since you're using Python, mimicking the implementation of JdbcIO.read() with Python-specific database connectivity is the way to go. There is a related question with a workaround that involves changing some Cloud SQL settings and adding the corresponding Python code.
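One Python-specific option (my addition, not part of the original answer) is the Cloud SQL Python Connector, which tunnels the connection through IAM instead of IP whitelisting. A minimal sketch, assuming the `cloud-sql-python-connector[pymysql]` package is staged on the workers, the Dataflow service account has the Cloud SQL Client role, and the instance connection name below is a placeholder:

```python
def connect_to_cloud_sql():
    # Deferred import, like pymysql in the DoFn above, so the package
    # is only required on the Dataflow workers.
    from google.cloud.sql.connector import Connector

    connector = Connector()
    # "project:region:instance" is a placeholder instance connection name.
    return connector.connect(
        "my-project:europe-west1:my-instance",
        "pymysql",
        user="root",
        password="mypassword",
        db="stack_model",
    )
```

The returned object is a regular pymysql connection, so the rest of the DoFn (cursor, execute) stays unchanged.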

If this seems complex, you can alternatively export the data from Cloud SQL to Cloud Storage and then load it from Cloud Storage.
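For the export route, `gcloud sql export csv` can dump a query result to a bucket, which the pipeline then reads instead of querying MySQL directly. A sketch that builds the command (instance and bucket names are placeholders; the injectable `run` argument is only there so the function can be exercised without the gcloud CLI installed):

```python
import subprocess

def export_table_to_gcs(instance, bucket, run=subprocess.check_call):
    """Shell out to gcloud to export the table as CSV to Cloud Storage.

    Assumes the gcloud CLI is installed and authenticated; `instance`
    and `bucket` are placeholder names.
    """
    cmd = [
        "gcloud", "sql", "export", "csv", instance,
        "gs://{}/stack_convergence.csv".format(bucket),
        "--database=stack_model",
        "--query=SELECT * FROM stack_convergence",
    ]
    run(cmd)
    return cmd
```

The resulting `gs://...` file can then be consumed by the Dataflow pipeline as a plain text source, sidestepping the connectivity problem entirely.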