How much work is it reasonable for an object constructor to do? Should it simply initialize fields and not actually perform any operations on data, or is it okay to have it perform some analysis?
Background: I was writing a class which is responsible for parsing an HTML page and returning various information based on the parsed information. The design of the class is such that the class' constructor does the parsing, throwing an exception if an error occurs. Once the instance is initialized, the parsed values are available without further processing via accessors. Something like:
public class Parser {
public Parser(final String html) throws ParsingException {
/* Parsing logic that sets private fields */
/* that throws an error if something is erroneous.*/
}
public int getNumOfWhatevers() { return private field; }
public String getOtherValue() { return other private field; }
}
After designing the class I started to wonder if this was correct OO practice. Should the parsing code be placed within a void parseHtml()
method and the accessors only return valid values once this method is called? I feel as though my implementation is correct, but I can't help but feel that some OO purists might find it incorrect for some reason and that an implementation such as the following would be better:
public class Parser {
public Parser(final String html) {
/* Remember html for later parsing. */
}
public void parseHtml() throws ParsingException {
/* Parsing logic that sets private fields */
/* that throws an error if something is erroneous.*/
}
public int getNumOfWhatevers() { return private field; }
public String getOtherValue() { return other private field; }
}
Are there instances where initialization code, such as parsing information, should not occur within the constructor, or am I just being silly and second-guessing myself?
What are the benefits/drawbacks of splitting the parsing from the constructor?
Thoughts? Insights?
It sounds very much as though your object isn't really a parser. Does it just wrap a call to a parser and presents the results in a (presumably) more usable fashion? Because of this, you need to call the parser in the constructor as your object would be in a non-useful state otherwise.
I'm not sure how the "object-oriented" part helps here. If there's only one object and it can only process one specific page then it's not clear why it needs to be an object. You could do this just as easily in procedural (i.e. non-OO) code.
For languages that only have objects (e.g. Java) you could just create a
static
method in a class that had no accessible constructor and then invoke the parser and return all of the parsed values in aMap
or similar collectionI think when you create a class ($obj = new class), the class should not affect the page at all, and should be relatively low processing.
For instance:
If you have a user class, it should be checking for incoming login/logout parameters, along with cookies, and assigning them to class variables.
If you have a database class, it should make the connection to the database so it is ready when you are going to start a query.
If you have a class that deals with a particular form, it should go get the form values.
In a lot of my classes, I check for certain parameters to define an 'action', like add, edit or delete.
All of these things don't really affect the page, so it wouldn't matter too much if you created them or not. They are simply ready for when you are going to call that first method.
I normally follow one easy principle:
Everything that is mandatory for the correct existence and behavior of the class instance should be passed and done into the constructor.
Every other activity is done by other methods.
The constructor should never:
Because I learned the hard way that while you are in the constructor, the object is in a incoherent, intermediate state which is too dangerous to handle. Some of this unexpected behavior could be expected from your code, some could be from the language architecture and compiler decisions. Never guess, stay safe, be minimal.
In your case, I would use a Parser::parseHtml(file) method. The instantiation of the parser and the parsing are two different operations. When you instance a parser, the constructor puts it in the condition to perform its job (parsing). Then you use its method to perform the parsing. You then have two choices:
The second strategy grants you better granularity, as the Parser is now stateless, and the client needs to interact with the methods of the ParsingResult interface. The Parser interface remains sleek and simple. The internals of the Parser class will tend to follow the Builder pattern.
You comment: "I feel as though returning an instance of a parser that hasn't parsed anything (as you suggest), a constructor that's lost its purpose. There's no use in initializing a parser without the intent of actually parsing the information. So if parsing is going to happen for sure, should we parse as early as possible and report and errors early, such as during the construction of the parser? I feel as though initializing a parser with invalid data should result in an error being thrown."
Not really. If you return an instance of a Parser, of course it's going to parse. In Qt, when you instantiate a button, of course it's going to be shown. However, you have the method QWidget::show() to manually call before something is visible to the user.
Any object in OOP has two concerns: initialization, and operation (ignore finalization, it's not on discussion right now). If you keep these two operations together, you both risk trouble (having an incomplete object operating) and you lose flexibility. There are plenty of reasons why you would perform intermediate setup of your object before calling parseHtml(). Example: suppose you want to configure your Parser to be strict (so to fail if a given column in a table contains a string instead of an integer) or permissive. Or to register a listener object which is warned every time a new parsing is performed or ended (think GUI progress bar). These are optional information, and if your architecture puts the constructor as the übermethod that does everything, you end up having a huge list of optional method parameters and conditions to handle into a method which is inherently a minefield.
"Caching should not be the responsibility of a parser. If data is to be cached, a separate cache class should be created to provide that functionality."
On the opposite. If you know that you are going to use the parsing functionality on a lot of files, and there's a significant chance that the files are going to be accessed and parsed again later on, it is internal responsability of the Parser to perform smart caching of what it already saw. From the client perspective, it is totally oblivious if this caching is performed or not. He is still callling the parsing, and still obtaining a result object. but it is getting the answer much faster. I think there's no better demonstration of separation of concerns than this. You boost performance with absolutely no change in the contract interface or the whole software architecture.
However, note that I am not advocating that you should never use a constructor call to perform parsing. I am just claiming that it's potentially dangerous and you lose flexibility. There are plenty of examples out there where the constructor is at the center of the actual activity of the object, but there is also plenty of examples of the opposite. Example (although biased, it arises from C style): in python, I would consider very weird something like this
instead of the actual
But I am sure there are IO access libraries using the first approach (with the second as a sugar-syntax approach).
Edit: finally, remember that while it's easy and compatible to add in the future a constructor "shortcut", it is not possible to remove this functionality if you find it dangerous or problematic. Additions to the interface are much easier than removals, for obvious reasons. Sugary behavior must be weighted against future support you have to provide to that behavior.
I agree with the posters here arguing minimal work in the constructor, really just putting the object into a non-zombie state, then have verb functions like parseHTML();
One point I'd like to make, although I don't want to cause a flame war, is consider the case of a non-exception environment. I know you're talking about C#, but I try to keep my programming models as similar as possible between c++ and c#. For various reasons, I don't use exceptions in C++ (think embedded video game programming), I use return code errors.
In this case, I can't throw exceptions in a constructor, so I tend to not have a constructor do anything which could fail. I leave that to the accessor functions.
"Should the parsing code be placed within a void parseHtml() method and the accessors only return valid values once this method is called?"
Yes.
"The design of the class is such that the class' constructor does the parsing"
This prevents customization, extension, and -- most importantly -- dependency injection.
There will be times when you want to do the following
Construct a parser.
Add Features to the parser: Business Rules, Filters, Better Algorithms, Strategies, Commands, whatever.
Parse.
Generally, it's best to do as little as possible in a constructor so that you are free to extend or modify.
Edit
"Couldn't extensions simply parse the extra information in their constructors?"
Only if they don't have any kind of features that need to be injected. If you want to add features -- say a different strategy for constructing the parse tree -- your subclasses have to also manage this feature addition before they parse. It may not amount to a simple
super()
because the superclass does too much."Also, parsing in the constructor allows me to fail early"
Kind of. Failing during construction is a weird use case. Failing during construction makes it difficult to construct a parser like this...
Usually a construction failure means you're out of memory. There's rarely a good reason to catch construction exceptions because you're doomed anyway.
You're forced to build the parser in a method body because it has too complex arguments.
In short, you've removed options from the clients of your parser.
"It's inadvisable to inherit from this class to replace an algorithm."
That's funny. Seriously. It's an outrageous claim. No algorithm is optimal for all possible use cases. Often a high-performance algorithm uses a lot of memory. A client may want to replace the algorithm with a slower one that uses less memory.
You can claim perfection, but it's rare. Subclasses are the norm, not an exception. Someone will always improve on your "perfection". If you limit their ability to subclass your parser, they'll simply discard it for something more flexible.
"I don't see needing step 2 as described in the answer."
A bold statement. Dependencies, Strategies and related injection design patterns are common requirements. Indeed, they're so essential for unit testing that a design which makes it difficult or complex often turns out to be a bad design.
Limiting the ability to subclass or extend your parser is a bad policy.
Bottom Line.
Assume nothing. Write a class with as few assumptions about it's use cases as possible. Parsing at construction time makes too many assumptions about client use cases.
Why not just pass the parser to the constructor? This would allow you to change the implementation without changing the model:
This should give you some flexibility in changing/improving how your application performs without needing to rewrite this class.