How to provide user defined function for python bl

2019-06-23 00:55发布

问题:

I connect to sqlite database in Blaze using df = bz.Data("sqlite:///<mydatabase>) everything works fine but I do not know how to provide user-defined functions in my interaction with df. I have a column called IP in df which is text containing IP addresses. I also have a function toSubnet (x, y) which takes in an IP address (x) in text format and return its /y subnet. For example:

out = toSubnet('1.1.1.1',24)
out
1.1.1.0/24

Now if I want to map all IPs to their /14 subnets, I use:

df.IP.map(lambda x:toSubnet(x,14),'string')

Which works when the backend is CSV. But with sqlite backend I get NotImplementedError. What's wrong here?

回答1:

NB: This doesn't tell you how to do exactly what you want, but it provides an explanation of why it doesn't work and a possible next step to get this to work with SQLite.

The problem you're running into is that it is very difficult to efficiently execute arbitrary Python code against an arbitrary SQL database.

Blaze takes user code and translates it to SQL as best it can, using SQLAlchemy, which I don't think has a way to do this.

Since nearly every database has a different way of dealing with user-defined functions (UDFs), it's quite a lot of work to build an API that allows the following:

  1. A user to define a function in Python
  2. Turn that pure Python function into a UDF native to the database.

That said, the Python interface to SQLite has a way to register Python functions that can be executed in a SQL statement:

https://docs.python.org/2/library/sqlite3.html#sqlite3.Connection.create_function

There currently isn't a way to express a UDF with Blaze using the SQL backend, though this could be implemented as new expression type that allows a user to register a function via the underlying database's db API.