In the past, I used the Microsoft Web Application Stress Tool and Pylot to stress test web applications. I'd written simple scripts for the home page, a login, and a site walkthrough (in an ecommerce site, adding a few items to a cart and checking out).
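To give an idea of the shape of those walkthrough scripts, here is a simplified sketch in plain Python with the requests library (not the actual WAST/Pylot script; the base URL, endpoints, and form fields are made up):

```python
# Simplified walkthrough sketch: home page -> login -> add to cart -> checkout.
# The base URL, endpoints, and form field names below are hypothetical.
import requests

BASE = "http://test-server.example.com"

session = requests.Session()

# Home page
session.get(f"{BASE}/")

# Login (field names are placeholders)
session.post(f"{BASE}/login", data={"username": "testuser", "password": "secret"})

# Add a few items to the cart
for item_id in (101, 102, 103):
    session.post(f"{BASE}/cart/add", data={"item_id": item_id, "qty": 1})

# Checkout, then report status and timing for this final step
response = session.post(f"{BASE}/checkout")
print(response.status_code, response.elapsed.total_seconds())
```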
Just hitting the homepage hard with a handful of developers would almost always turn up a major problem. More scalability problems would surface at the second stage, and even more after launch.
The tools I used were Microsoft Homer (aka the Microsoft Web Application Stress Tool) and Pylot.
The reports generated by these tools never made much sense to me, and I would spend many hours trying to figure out what kind of concurrent load the site would be able to support. It was always worth it because the stupidest bugs and bottlenecks would always come up (for instance, web server misconfigurations).
What have you done, what tools have you used, and what success have you had with your approach? The part that is most interesting to me is coming up with some kind of a meaningful formula for calculating the number of concurrent users an app can support from the numbers reported by the stress test application.
I've used JMeter. Besides testing the web server, you can also test your database backend, messaging services, and email servers.
Also, there is an awesome open-source, pure-Python, distributed and scalable load testing framework called Locust that uses greenlets. It's great at simulating enormous numbers of simultaneous users.
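For example, a minimal locustfile might look like this (assuming the current Locust API; the host and endpoints are placeholders):

```python
# locustfile.py - run with e.g.: locust -f locustfile.py --host http://test-server.example.com
from locust import HttpUser, task, between

class WebsiteUser(HttpUser):
    # Each simulated user waits 1-5 seconds between tasks
    wait_time = between(1, 5)

    @task(3)
    def index(self):
        # Hit the home page (weighted 3x relative to the cart task)
        self.client.get("/")

    @task
    def add_to_cart(self):
        # Hypothetical cart endpoint, just to show a POST with form data
        self.client.post("/cart/add", data={"item_id": 101, "qty": 1})
```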
Visual Studio Test Edition 2010 (2008 is good too). This is a really easy and powerful tool for creating web/load tests.
The bonus with this tool when using it against Windows servers is that you get integrated access to all the perfmon server stats in your report. Really useful.
The other bonus is that with a Visual Studio project you can integrate a "Performance Session" that will profile the code execution of your website.
If you are serving web pages from a Windows server, this is the best tool out there.
However, a separate and expensive licence is required to use several machines to load test the application.
Here's another vote for JMeter.
JMeter is an open-source load testing tool, written in Java. It's capable of testing a number of different server types (for example, web servers, web services, and databases: basically anything that works on a request/response basis).
It does, however, have a steep learning curve once you get into complicated tests, but it's well worth it. You can get up and running very quickly, and depending on what sort of stress testing you want to do, that might be fine.
BlazeMeter has a Chrome extension for recording sessions and exporting them to JMeter (it currently requires a login). You also have the option of paying them to run the tests on their cluster of JMeter servers (their pricing seems much better than LoadImpact, which I've just stopped using).
I don't have any association with them; I just like the look of their service, although I haven't used the paid version yet.
We have developed a process that treats load and performance measurement as a first-class concern; as you say, leaving it to the end of the project tends to lead to disappointment...
So, during development, we include very basic multi-user testing (using Selenium), which checks for basic craziness like broken session management, obvious concurrency issues, and obvious resource contention problems. Non-trivial projects include this in the continuous integration process, so we get very regular feedback.
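A stripped-down version of that kind of check might look like the sketch below (the URL, element names, and assertion are hypothetical; it assumes the Selenium Python bindings and a chromedriver on the PATH):

```python
# Sketch: two "users" log in concurrently and each verifies it still sees its own session.
import threading

from selenium import webdriver
from selenium.webdriver.common.by import By

BASE = "http://test-server.example.com"  # hypothetical test environment

def login_and_check(username, password):
    driver = webdriver.Chrome()  # assumes chromedriver is on the PATH
    try:
        driver.get(f"{BASE}/login")
        driver.find_element(By.NAME, "username").send_keys(username)
        driver.find_element(By.NAME, "password").send_keys(password)
        driver.find_element(By.NAME, "submit").click()
        # Crude session check: the landing page should greet *this* user,
        # not leak someone else's session.
        assert username in driver.page_source, f"session mixed up for {username}"
    finally:
        driver.quit()

threads = [
    threading.Thread(target=login_and_check, args=(f"user{i}", "secret"))
    for i in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```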
For projects that don't have extreme performance requirements, we include basic performance testing in our testing; usually, we script out the tests using BadBoy and import them into JMeter, replacing the login details and other thread-specific things. We then ramp these up until the server is handling 100 requests per second; if the response time stays under 1 second, that's usually sufficient. We launch and move on with our lives.
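Incidentally, those numbers also give a rough answer to the original question about a formula for concurrent users: Little's Law says concurrency ≈ throughput × time in system. A quick sketch, where the think time is an assumption rather than a measured value:

```python
# Little's Law: concurrent users ~= throughput * (response time + think time).
# Throughput and response time come from the load test; think time is assumed.
throughput_rps = 100      # requests per second the server sustained
response_time_s = 1.0     # response time per request (seconds)
think_time_s = 9.0        # assumed time a user spends reading between requests

concurrent_users = throughput_rps * (response_time_s + think_time_s)
print(f"Rough concurrent-user estimate: {concurrent_users:.0f}")
# => 1000: 100 req/s keeps ~1000 users busy if each issues a request every ~10 s
```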
For projects with extreme performance requirements, we still use BadBoy and JMeter, but put a lot of energy into understanding the bottlenecks on the servers in our test rig (web and database servers, usually). There's a good tool for analyzing the Microsoft performance logs, PAL (Performance Analysis of Logs), which helps a lot with this. We typically find unexpected bottlenecks, which we optimize if possible; that gives us an application that is as fast as it can be on "1 web server, 1 database server". We then usually deploy to our target infrastructure and use one of the "JMeter in the cloud" services to re-run the tests at scale.
Again, the PAL reports help to analyze what happened during the tests; you often see very different bottlenecks in production environments.
The key is to make sure you don't just run your stress tests, but also that you collect the information you need to understand the performance of your application.