My program needs to do 2 things.
Extract stuff from a webpage.
Do stuff with a webpage.
However, there are many webpages, such as Twitter and Facebook.
should I do this?
def facebookExtract():
code here
def twitterExtract():
code here
def myspaceExtract():
code here
def facebookProcess():
code here
def twitterProcess():
code here
def myspaceProcess():
code here
Or, should I have some sort of class? When is it recommended to use classes, and when is it recommend to just use functions?
"My program needs to do 2 things."
When you start out like that, the objects cannot be seen. You're perspective isn't right.
Change your thinking.
"My program works with stuff"
That's OO thinking. What "stuff" does your program work with? Define the stuff. Those are your basic classes. There's a class for each kind of stuff.
"My program gets the stuff from various sources"
There's a class for each source.
"My program displays the stuff"
This is usually a combination of accessor methods of the stuff plus some "reporting" classes that gather parts of the stuff to display it.
When you start out defining the "stuff" not the "do", you're doing OO programming. OO applies to everything, since every single program involves "doing" and "stuff". You can chose the "doing" POV (which is can be procedural or functional), or you can chose the "stuff" POV (which is object-oriented.)
I regularly define classes for solving problems for a few reasons, I'll model an example of my thinking below. I have no compunctions about mixing OO models and procedural styles, that's often more a reflection of your work society than personal religion. It often works to have a procedural facade for a class hierarchy if that's what other maintainers expect. (Please excuse the PHP syntax.)
Put as much of the common stuff together in a single function. Once you've factored as much out as possible, build a mechanism for branching to the appropriate function for each website.
One possible way to do this is with python's
if/else
clauses, but if you have many such functions, you may want something more elegant such asF = __import__('yourproject.facebookmodule')
This lets you put the code that's specific for facebook in it's own area. Since you pass
__import__()
a string, you can modify that at runtime based on which site you're accessing, and then just call function F in your generic worker code.More on that here: http://effbot.org/zone/import-confusion.htm
You use OOP when it makes sense, when it makes developing the solution quicker and when it makes the end result easier to read, understand and maintain.
In this case it might make sense to create a generic Extractor interface/class and then have subclasses for Twitter, MySpace, Facebook, etc but this really depends on how site-specific the extraction is. The idea of this kind of abstraction is to hide such details. If you can do it, it makes sense. If you can't you probably need a different approach.
It may also be that similar benefits can be obtained from good decomposition of a procedural solution.
Remember at the end of the day that all these things are just tools. Pick the best one for that particular job rather than picking the hammer and then trying to turn everything into a nail.
It's up to you. I personally try to stay away from Java-style classes when programming in python. Instead, I use dicts and/or simple objects.
For instance, after defining these functions (the ones you defined in the question), I'd create a simple dict, maybe like this:
or, better yet, use introspection to get the process/extract function automatically:
Where
module
is the current module (or the module that has these functions).To get this module as an object:
Assuming of course, that the generic main function does something like this:
My favorite rule of thumb: if you're in doubt (unspoken assumption: "and you're a reasonable person rather than a fanatic";-), make and use some classes. I've often found myself refactoring code originally written as simple functions into classes -- for example, any time the simple functions' best way to communicating with each others is with globals, that's a code smell, a strong hint that the system's factoring is not really good -- and often refactoring the OOP way is a reasonable fix for that.
Python is multi-paradigm, but its central paradigm is OOP (much like, say, C++). When a procedural or functional approach (maybe through generators &c) is optimal for some part of the system, that generally stands out -- for example, static functions are also a code smell, and if your classes have any substantial amount of those THAT is a hint to refactor things to avoid that requirement.
So, assuming you have a rich grasp of all the paradigms Python affords -- if you're STILL in doubt, that suggests you probably want to go OOP for that part of your system! Just because Python supports OOP even more wholly than it supports functional programming and the like.
From your very skeletal code, it seems to me that each extract/process pair belongs together and probably needs to communicate state, so a small set of classes with extraction and processing methods seems a natural fit.