I am creating a query builder class that will help construct a MongoDB query from URL params. I have never done much object-oriented programming or designed classes for consumption by people other than myself; my experience is limited to basic language constructs and Django's built-in Models.
So I have this QueryBuilder class:
    class QueryHelper():
        """
        Help abstract out the problem of querying over vastly
        different dataschemas.
        """
        def __init__(self, collection_name, field_name, params_dict):
            self.query_dict = {}
            self.params_dict = params_dict
            db = connection.get_db()
            self.collection = db[collection_name]

        def _build_query(self):
            # check params dict and build a mongo query
            pass
Now in _build_query I will be checking the params_dict and populating query_dict so as to pass it to mongo's find() function.
In doing this I was wondering whether there is a single correct approach: should _build_query return a dictionary, or should it just modify self.query_dict? Since it is an internal method I would assume it is OK to just modify self.query_dict. Is there a right (Pythonic) way of approaching this? Or is this just silly and not an important design decision? Any help is appreciated.
Returning a value is preferable, as it allows you to keep all the attribute modification in one place (__init__). It also makes the code easier to extend later: suppose you want to override _build_query in a subclass; the overriding method can then just return a value, without needing to know which attribute to set. Here's an example:
    class QueryHelper(object):
        def __init__(self, param, text):
            self._param = param
            self._query = self._build_query(text)

        def _build_query(self, text):
            return text + " and ham!"

    class RefinedQueryHelper(QueryHelper):
        def _build_query(self, text):
            # no need to know how the query object is going to be used
            q = super(RefinedQueryHelper, self)._build_query(text)
            return q.replace("ham", "spam")
vs. the "setter version":
    class QueryHelper(object):
        def __init__(self, param, text):
            self._param = param
            self._build_query(text)

        def _build_query(self, text):
            self._query = text + " and ham!"

    class RefinedQueryHelper(QueryHelper):
        def _build_query(self, text):
            # what if we want to store the query in __query instead?
            # then we need to modify two classes...
            super(RefinedQueryHelper, self)._build_query(text)
            self._query = self._query.replace("ham", "spam")
If you do choose to set an attribute, you might want to call the method _set_query for clarity.
It's perfectly fine to modify self.query_dict, as the whole idea of object-oriented programming is that methods can modify an object's state. As long as the object is in a consistent state after a method has finished, you're fine. The fact that _build_query is an internal method does not matter. You can also call _build_query from __init__ so that the query is constructed as soon as the object is created.
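As a minimal sketch of that approach (the filtering rule below is a hypothetical placeholder, and the database connection from the question is left out so the example runs on its own), the internal method fills in the attribute and __init__ calls it, so the object is consistent from the moment it exists:

```python
class QueryHelper:
    """Sketch: the internal method mutates state and __init__ triggers it."""

    def __init__(self, params_dict):
        self.params_dict = params_dict
        self.query_dict = {}
        # Build the query right away so the object is never half-initialised.
        self._build_query()

    def _build_query(self):
        # Hypothetical rule: every non-empty parameter becomes an equality match.
        for key, value in self.params_dict.items():
            if value not in (None, ""):
                self.query_dict[key] = value


helper = QueryHelper({"color": "red", "size": ""})
print(helper.query_dict)  # {'color': 'red'}
```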
The decision mostly matters for testing purposes. For testing, it's convenient to test each method individually without necessarily having to test the whole object's state. But that does not apply in this case, because we're talking about an internal method: you alone decide when to call it, not other objects or other code.
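For comparison, here is a rough sketch of what testing a method in isolation looks like when it returns its result. The static method and the filtering rule are assumptions made for illustration, not part of the question's code:

```python
class QueryHelper:
    def __init__(self, params_dict):
        self.query_dict = self._build_query(params_dict)

    @staticmethod
    def _build_query(params_dict):
        # Hypothetical rule: keep only parameters that have a non-empty value.
        return {k: v for k, v in params_dict.items() if v not in (None, "")}


def test_build_query_drops_empty_values():
    # No instance, database, or other object state is needed for this check.
    assert QueryHelper._build_query({"a": 1, "b": ""}) == {"a": 1}


test_build_query_drops_empty_values()
```

With the state-modifying version, the same check would have to create an instance and then inspect its query_dict attribute.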
If you return anything at all, I'd suggest self. Returning self from instance methods is convenient for method chaining, since each return value allows another method call on the same object:
    foo.add_thing(x).add_thing(y).set_goal(42).execute()
This is sometimes referred to as a "fluent" API.
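A small sketch of how such an interface could look for a query builder follows. The class name, method names, and output shape are all hypothetical and are not tied to any MongoDB driver:

```python
class QueryBuilder:
    """Sketch of a fluent builder: every mutating method returns self."""

    def __init__(self):
        self._filters = {}
        self._limit = None

    def where(self, field, value):
        self._filters[field] = value
        return self  # returning self is what makes chaining possible

    def limit(self, n):
        self._limit = n
        return self

    def build(self):
        # Return a copy so callers cannot mutate the builder's internal state.
        return {"filter": dict(self._filters), "limit": self._limit}


query = QueryBuilder().where("color", "red").where("size", "L").limit(10).build()
print(query)  # {'filter': {'color': 'red', 'size': 'L'}, 'limit': 10}
```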
However, while Python allows method chaining for immutable types such as int and str (their methods return new objects), it does not provide it for the mutating methods of mutable containers such as list and set, which return None by design. So it is arguably not "Pythonic" to do it for your own mutable type. Still, lots of Python libraries do have "fluent" APIs.
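The difference is easy to see with the built-in types. This short illustration uses only standard str and list methods:

```python
# str is immutable: each method returns a new string, so calls chain naturally.
text = "  Ham and Eggs  ".strip().lower().replace("ham", "spam")
print(text)  # spam and eggs

# list is mutable: sort() changes the list in place and returns None,
# so nothing can be chained onto it.
items = [3, 1, 2]
result = items.sort()
print(result)  # None
print(items)   # [1, 2, 3]
```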
A downside is that such an API can make debugging harder. Since you execute the whole statement or none of it, you can't easily see the object at intermediate points within the statement. Of course, I usually find print perfectly adequate for debugging Python code, so I'd just throw a print in any method whose return value I was interested in!
While it's common for an object's methods to modify its state directly, it can sometimes be advantageous for an object to be its own "client" and access its own state indirectly through (typically) private access methods.
Doing this provides better encapsulation and the benefits that follow from it, insulation from implementation details being the major one. However, it may be impractical because it requires additional code and is often slower, sometimes unacceptably so. Trade-offs against the ideal therefore have to be made.
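As a rough illustration of the idea (all names below are hypothetical), only two private methods touch the underlying storage, so the rest of the class would be unaffected if that storage changed:

```python
class QueryHelper:
    """Sketch: the class reaches its own state only through private accessors."""

    def __init__(self, params_dict):
        self._query = {}
        for key, value in params_dict.items():
            self._add_condition(key, value)

    def _add_condition(self, key, value):
        # The only place that writes to the underlying storage.
        self._query[key] = value

    def _get_query(self):
        # The only place that reads it; a copy keeps the internals private.
        return dict(self._query)

    def describe(self):
        # Goes through the accessor rather than touching self._query directly.
        parts = [f"{k}={v!r}" for k, v in sorted(self._get_query().items())]
        return " AND ".join(parts)


helper = QueryHelper({"size": "L", "color": "red"})
print(helper.describe())  # color='red' AND size='L'
```

The extra method calls and the defensive copy are the kind of overhead the trade-off above refers to.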