Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 5 months ago.
I'm looking for a library that lets me run SQL-like queries on python "object databases". With object database I mean a fairly complex structure of python objects and lists in memory. Basically this would be a "reverse ORM" - instead of providing an object oriented interface to a relational database, it would provide a SQL-ish interface to an object database.
C#'s LINQ is very close. Python's list comprehensions are very nice, but the syntax gets hairy when doing complex things (sorting, joining, etc.). Also, I can't (easily) create queries dynamically with list comprehensions.
The actual syntax could either be string based, or use a object-oriented DSL (a la from(mylist).select(...)). Bonus points if the library would provide some kind of indices to speed up search.
Does this exist or do I have to invent it?
Dee is aiming to be SQL (ish; the author prefers relational calculus) for in-memory structures. There's a GROUP BY equivalent and everything.
If you like list comprehensions, don't forget about their "lazy" counterpart: generator expressions. These should at least to some extent solve the question of dynamically constructing queries. Complement this with itertools, and of course some of the builtin functions that work on iterables, and you could say you have your Python "LINQ to Object" (or at least something very close).
You won't get "pseudo-sql" syntax built in in python as with C# and LINQ, but that is a matter of taste I guess. (personally, I like the fact that it remains Python, uses the same keywords etc. instead of trying to be sql: familiar if
and for
clauses instead of introducing where
and from
and so on).
The object-oriented DSL approach seems feasible (and easier to implement than string-based I think). You will find this sort of thing in ORMs like SqlAlchemy, but I don't know if anyone has done that already for "normal" data structures (I guess people don't really feel it's necessary?)
For indices and so on: I guess you'll have to search for actual "object databases"...
One major difference between what SQL does and what you can do in idiomatic python, in SQL, you tell the evaluator what information you are looking for, and it works out the most efficient way of retrieving that based on the structure of the data it holds. In python, you can only tell the interpreter how you want the data, there's no equivalent to a query planner.
That said, there are a few extra tools above and beyond list comprehensions that help alot.
First, use a structure that closely resembles the declarative nature of SQL. Many of them are builtins. map
, filter
, reduce
, zip
, all
, any
, sorted
, as well as the contents of the operator
, functools
and itertools
packages, all offer a fairly concise way of expressing data queries.
We're launching PythonQL, which does exactly what you mention (its quite similar to C# LINQ). PythonQL demo site
Not quite exactly what you describe, but littletable works with lists of objects in memory, supports joins, pivots, queries. And each result is in turn another littletable, so you can easily perform a succession of filtering steps.
I came across this as I was looking for a good library to support a similar use case. Turns out that Pandas works nicely for cases where join/group by type of queries are needed or for time series processing.
See if sql4csv might be of help.