I connect to sqlite database in Blaze using
df = bz.Data("sqlite:///<mydatabase>)
everything works fine but I do not know how to provide user-defined functions in my interaction with df.
I have a column called IP in df which is text containing IP addresses. I also have a function toSubnet (x, y) which takes in an IP address (x) in text format and return its /y subnet. For example:
out = toSubnet('1.1.1.1',24)
out
1.1.1.0/24
Now if I want to map all IPs to their /14 subnets, I use:
df.IP.map(lambda x:toSubnet(x,14),'string')
Which works when the backend is CSV. But with sqlite backend I get NotImplementedError
.
What's wrong here?
NB: This doesn't tell you how to do exactly what you want, but it provides an explanation of why it doesn't work and a possible next step to get this to work with SQLite.
The problem you're running into is that it is very difficult to efficiently execute arbitrary Python code against an arbitrary SQL database.
Blaze takes user code and translates it to SQL as best it can, using SQLAlchemy, which I don't think has a way to do this.
Since nearly every database has a different way of dealing with user-defined functions (UDFs), it's quite a lot of work to build an API that allows the following:
- A user to define a function in Python
- Turn that pure Python function into a UDF native to the database.
That said, the Python interface to SQLite has a way to register Python functions that can be executed in a SQL statement:
https://docs.python.org/2/library/sqlite3.html#sqlite3.Connection.create_function
There currently isn't a way to express a UDF with Blaze using the SQL backend, though this could be implemented as new expression type that allows a user to register a function via the underlying database's db API.