I find the term rather pejorative. Hence, I am flabbergasted by these two sentences from Wikipedia:
Imperative programming is known for employing side effects to make programs function. Functional programming in turn is known for its minimization of side effects. [1]
Since I am somewhat math-biased, the latter sounds excellent. What are the arguments for side effects? Do they mean the loss of control or the acceptance of uncertainty? Are they a good thing?
The jury is still out. And the trial has been going on since the dawn of computing, so don't expect a verdict any time soon.
It is true, as some people here mention, that without side effects one cannot make a useful application. But from that it does not follow that using side effects in an uncontrolled way is a good thing.
Consider the following analogy: a processor with an instruction set that had no branch instructions would be absolutely worthless. However, it does not follow that programmers must use gotos all the time. On the contrary, it turned out that structured programming and later OOP languages like Java could do without even having a goto statement, and nobody missed it.
(To be sure, there is still goto in Java - it's now called break, continue and throw.)
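For instance (a throwaway sketch with made-up names), a labeled break gives you a tame, forward-only jump out of nested loops, which is about as close to goto as Java lets you get:

public class FindTarget {
    public static void main(String[] args) {
        int[][] grid = {{1, 2}, {3, 4}, {5, 6}};
        int target = 4;

        search:
        for (int row = 0; row < grid.length; row++) {
            for (int col = 0; col < grid[row].length; col++) {
                if (grid[row][col] == target) {
                    System.out.println("found at " + row + "," + col);
                    break search;   // jump out of both loops at once
                }
            }
        }
    }
}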
In von Neumann machines, side effects are what make the machine work. Essentially, no matter how you write your program, it will need to perform side effects in order to work (viewed at a low level).
Programming without side effects means abstracting side effects away so that you can think about the problem in general (without worrying about the current state of the machine) and reduce dependencies across different modules of a program (be they procedures, classes or whatever else). By doing so, you make your program more reusable (as modules do not depend on a particular state to work).
So yes, side-effect-free programs are a good thing, but side effects are simply inevitable at some level (so they cannot be considered "bad").
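As a rough illustration (the names and numbers are made up), compare a method that depends on hidden machine state with one that takes everything it needs as a parameter:

public class Tax {
    // Hidden, shared state: nothing in the signature below tells callers that
    // the result depends on this field, so the method is hard to reuse or test.
    static double currentRate = 0.2;

    static double taxWithHiddenState(double amount) {
        return amount * currentRate;
    }

    // Pure version: everything the computation needs is a parameter, so it
    // behaves the same in any program state and is trivially reusable.
    static double tax(double amount, double rate) {
        return amount * rate;
    }

    public static void main(String[] args) {
        System.out.println(taxWithHiddenState(100)); // depends on currentRate
        System.out.println(tax(100, 0.2));           // depends only on its arguments
    }
}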
Side effects are just like any other weapon. They are unquestionably useful, and potentially very dangerous when improperly handled.
Like weapons, side effects come in all different kinds and all different degrees of lethality.
In C++, side effects are totally unrestricted, thanks to pointers. If a variable is declared private, you can still access or change it using pointer tricks. You can even alter variables which aren't in scope, such as the parameters and locals of the calling function. With a little help from the OS (mmap), you can even modify your program's machine code at runtime! When you write in a language like C++, you are elevated to the rank of Bit God, master of all memory in your process. All optimizations the compiler makes to your code are made on the assumption that you don't abuse your powers.
In Java, your abilities are more restricted. All variables in scope are under your control, including variables shared by different threads, but you must always adhere to the type system. Still, thanks to a subset of the OS being at your disposal and the existence of static fields, your code may have non-local effects. If a separate thread somehow closes System.out, it will seem like magic. And it will be magic: side-effectful magic.
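To make that "magic" concrete, here is a contrived, single-threaded sketch: System.out is a static field shared by the whole process, so one line of code closing it silently breaks printing everywhere else.

public class SpookyAction {
    public static void main(String[] args) {
        System.out.println("you will see this");

        // Imagine this happening deep inside some library, or on another thread.
        System.out.close();

        // PrintStream swallows the failure instead of throwing; the line
        // simply never appears, and only checkError() reveals that anything went wrong.
        System.out.println("you will never see this");
        System.err.println("System.out broken? " + System.out.checkError());
    }
}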
Haskell (despite the propaganda about being pure) has the IO monad, which requires you to register all your side effects with the type system. Wrapping your code in the IO monad is like the three-day waiting period for handguns: you can still blow your own foot off, but not until you OK it with the government. There's also unsafePerformIO and its ilk, which are Haskell IO's black market, giving you side effects with "no questions asked".
Miranda, the predecessor to Haskell, is a pure functional language created before monads became popular. Miranda (as far as I've learned... if I'm wrong, substitute Lambda Calculus) has no IO primitives at all. The only IO is compiling the program (the input) and running the program and printing the result (the output). Here, you have full purity. The order of execution is completely irrelevant. All "effects" are local to the functions which declare them, meaning two disjoint parts of code can never affect each other. It's a utopia (for mathematicians). Or, equivalently, a dystopia. It's boring. Nothing ever happens. You can't write a server in it. You can't write an OS in it. You can't write SNAKE or Tetris in it. Everyone just kind of sits around looking mathematical.
Well, it's a lot easier and more intuitive to program with side effects, for one thing. Functional programming is difficult for a lot of people to wrap their heads around -- find someone who has taught or TAed a class in OCaml and you'll probably get all kinds of stories about people's abject failure to comprehend it. And what good is having beautifully designed, wonderfully side-effect-free functional code if nobody can actually follow it? It makes hiring people to get your software done rather difficult.
That's one side of the argument, at least. There are any number of reasons lots of people are going to have to learn all about functional-style, side-effect-free code. Multithreading comes to mind.
Since your program has to have side effects to produce any output or interesting effect (apart from heating your CPU), the question is rather where these side effects should be triggered in your program. They only become harmful when they are hidden in methods where you don't expect them.
As a rule of thumb: separate pure methods from methods with side effects. A method that prints something to the console should do only that and not compute some interesting value that you might want to use somewhere else.
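A small sketch of that rule (the report example is invented): keep the computation pure and push the printing out to the edge, where you expect the effect to happen.

public class Report {
    // Pure: same input, same output, no I/O. Easy to test and reuse elsewhere.
    static String summarize(int[] sales) {
        int total = 0;
        for (int s : sales) {
            total += s;
        }
        return "Total sales: " + total;
    }

    // Side effect only: prints and does nothing else.
    static void printReport(String report) {
        System.out.println(report);
    }

    public static void main(String[] args) {
        String report = summarize(new int[] {3, 5, 8});
        printReport(report);   // the effect is triggered here, where you expect it
    }
}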