I'm dabbling with the idea of setting up PHP CodeSniffer on our continuous integration server in an effort to improve the quality of our code-base. After reading the documentation I'm very excited about the idea of normalizing and enforcing our coding standards. However, I'm left wondering about the actual improvement to our product. I'm well aware that the sniffer only detects violations to a defined coding-standard but what type of benefits does a clean, consistent, code-base provide? Is it worth the extra work to refactor a project with 100k+ lines of code to conform to the PEAR standard?
For those who are not familiar with PHP CodeSniffer or code-smell in general, here is an example output:
FILE: /path/to/code/myfile.php
FOUND 5 ERROR(S) AFFECTING 2 LINE(S)
--
2 | ERROR | Missing file doc comment
20 | ERROR | PHP keywords must be lowercase; expected "false" but found "FALSE"
47 | ERROR | Line not indented correctly; expected 4 spaces but found 1
51 | ERROR | Missing function doc comment
88 | ERROR | Line not indented correctly; expected 9 spaces but found 6
Strictly speaking, the user/client would not notice any difference in a product that was refactored to be standards-compliant but I'm wondering if there are other hidden benefits
Right now our code is by no means sloppy and we try to follow our own personal standards which, for the most part, are derived from Pear's Coding Standards but a trained eye can spot the differences.
So my question is how much do they improve the quality of a product. What kind of latent benefits resulted from it?
Am I just being obsessive-compulsive with my desire to move our product closer to a set of standards? Would it be worth it? If so, What kind of strategy did you use to implement the code-sniffer and correct the subsequent violations that were detected?
There are lots of cases that require human judgement, and CodeSniffer doesn't have one.
Consistent brackets, indentation improve the code. Lack of space after commas in function call? Probably can be forgiven, but that's ERROR according to CodeSniffer.
IMHO there are way too many errors reported by CS. Even projects that appear to have neat code can easily run into thousands of CS issues. It quickly becomes tiring and nearly impossible to resolve all those issues, especially when it's a mix of real problems and obsesive-compulsive nonsense — both equally often marked as ERRORS.
You may be better off ignoring CS and spending time on actual improvements to the code (in terms of design, algorithms) rather than just completely superficial whitespace and comments changes (does 1-line
isAlpha
function really need 8 lines of comments? Yes, if you ask CS).CS can too easily become turd-polishing tool.
This is definitely a good thing. We run ours from an SVN hook so that all code has to pass the house standard (a modification from PEAR) before it can be committed (this was one of the best decisions I ever made).
Of course, this works best for a new project, where there isn't loads of legacy code to convert to the new standard. One way around this is modify your SVN pre-commit hook to only run new additions to the codesniffer and to ignore modifications. You can do this by adding the line:
This will exit the hook script if there isn't a new PHP code to parse. Hence all new files will need to obey the standard and you can bring the legacy code up to the standard in your own time.
Firstly, I'm the maintainer of PHP_CodeSniffer, so I'm clearly biased in this area. But I've also worked on some big code bases in my 10 years as a PHP dev, so I hope I can bring some concrete reasons to why coding standards are a good thing. I could write a blog series on this topic, but I'll just give you a little story about how PHP_CodeSniffer came about so you can understand the problem that the tool solved for me.
I've worked on a few big CMS projects. The first had a heap of code behind it and a relatively small development team. We had no standards. But we had no real problems. The team was small and stayed together for quite a while. We got used to each other.
Then we built a new CMS. We started fresh with just a couple of devs. I was then part of a team of just two developers. Again, coding standards didn't cause us any issues. Me and the other dev came from the same background and had already established some guidelines that we followed. We didn't need PHPCS back then.
But that team grew a developer at a time and eventually reached 12 full-time devs and quite a few came and went. Some came from the old CMS and some came from outside the company. All had different backgrounds and a different approach to development. It was obvious who wrote what code because the styles were so different. Whenever you worked on something complex, you'd first have to adjust to their style because it was just not the way you were used to seeing code. It's like reading Shakespeare for the first time. You need to get used to it before you can read at your natural pace.
For developers, that extra time to have to stop and figure out another coding style is just pure wasted time. It's a chance for an idea to slip away while you are bogged down with spacing, indentation and bracket position. At the end of the day, these things just don't matter. But let me tell you, they matter a lot if they are causing developers to break their flow. So we needed a way to make them get right out of the way and let developers do what they do best.
At the same time, we were digging into JavaScript a lot more. A new language where style was generally thrown out the window. Code was copy/pasted from example sites and mashed together. When learning to develop complex code in a new language, it made sense to find a way to make our JS look similar to our PHP. We can minimise it later, but we needed to be able to switch between languages quickly, again to keep our flow.
So PHP_CodeSniffer was born to do that. It helps developers work to the same coding style to make formatting and other flame-bait issues move completely out of the way. It allows you to treat your JS like your PHP to an extent. I use it to detect product-specific smells like non-translated strings or developers not using our proper class inclusion code. I also use it for language-specific smells like making sure that JS comma that kills IE isn't left around. You can use it for whatever you want. It comes with heaps of sniffs that are easy to merge together using the XML ruleset file. You can also write your own. You can integrate 3rd-party tools to make it a one-stop-shop for static code analysis. You can be as serious about standards and code smells as you like.
PHP_CodeSniffer, like any dev tool, should work for you. You don't work for it. If it produces too many errors that you don't care about, customise the standard to remove the ones you don't want, or turn the errors into warnings. But if my story sounds like something you're going through or might go through in the future, it's worth taking a close look at PHP_CodeSniffer to see if it can help you.
I hope that helps you, and others, understand why coding standards are really important to some projects and developers. It's not about the detail. It's about removing coding style from the list of things that cause developers to lose focus.
Having coding style conventions is a good idea, because it helps developers not get distracted by code written in a different style when working on code they did not write. It will make your code base superficially cleaner. It's great if you can automate it, but there is usually no need to go through great lengths to comply (unless the current style is terrible). If you already have a good-enough standard, stick to it.
Code smell is something different though: it is (a set of) symptoms that may indicate a deeper problem with the code. Examples are cyclomatic complexity, long method names, large classes, undescriptive names, duplicate code, etc. This is usually much more problematic, as it may thoroughly hurt the maintainability of your code. You should definitely solve these problems.
PHP CodeSniffer appears to be mainly developed for checking style conventions, not code smell. If you can use it to help enforce style conventions, great. But beware that it will not make your code base substantially better. You will want to do manual reviews to accomplish that.
If you want to use it to check if you comply to your current standard, that appears to be possible, see the answer to the question "I don't agree with your coding standards! Can I make PHP_CodeSniffer enforce my own?" in their FAQ.
CodeSniffer is a great thing to implement, but you have to know how to use it. Unless you have to comply with a given coding standard because you are submitting your work to some external project, you are free to completely define your own coding standards.
PHP CodeSniffer should make this very easy for you, because there are already many single code sniffs that you can include in your own coding standard definition and need not write them from scratch. While exploring the possibilities of the existing Codesniffers, you might end up writing an extension to an existing sniff or one sniff on your own, if you feel the need.
If you want to start with CodeSniffer, first step would be to grab a set of sniffs that completely resemble your current coding standards, and check the resulting errors and warnings. Don't apply one of the predefined standards, as this will most likely result in too many errors with too few benefit if fixed. For example, if you do not use PHPDoc to generate a documentation, it would be no use to fulfill all the codesniffer errors about missing PHPDoc tags and comments.
Are you providing PEAR packages for public distribution thru PEAR/PECL? If so then you probably want to stick to PEAR conventions.
Otherwise, I can't see it being worth the big refactor. The biggest thing is agreeing upon a coding standard for your team... doesn't have to be PEAR's standard... just make sure there is some standard convention enforced.
For instance, I'm a fan of the
format vs the PEAR standard..
Bottom line, don't worry too much about conforming to their standard if its going to be a ton of work, especially if you aren't putting the packages on PECL.