I am developing an SMS application in Java. My clients send queries via SMS, which an SMS gateway forwards to my server as HTTP requests. My app processes each request and sends the response back to the client through the same gateway. A response is at most 300 characters. I'm expecting very high traffic (2000 requests/sec). I want to host the application with a web hosting company (I'm considering mochahost). What factors should I consider before hosting (in terms of RAM, CPU, etc.), and what are the major bottlenecks likely to be? Can a dedicated Tomcat server handle such high traffic if tuned properly? What are your suggestions?
There is no database interaction (I'm only using Java heap memory). I ran a test with JMeter (100 requests/sec): heap memory usage was 35MB and the average response time was 532ms. I'm also not using any session variables.
It's difficult to answer your question without knowing what you're doing in your servlet. But the short answer is that it really doesn't have anything to do with Tomcat.
We currently use Dell R410s (dual quad-core, 32GB RAM) for our Tomcat servers. For a REST service that talks to a membase cluster on the back end, we can easily process ~15k req/second on a single server (this is using the Jersey JAX-RS implementation). We currently have 4 of these behind an F5 load balancer. Each of these requests is serviced in about 10ms on average.
What it really comes down to is concurrency: how long does your servlet take to do what it needs to do with a request? You've got a thread going for every concurrent request, so if you're trying to handle 2000 req/sec and a single request takes 500ms to process, you have roughly 1000 requests in flight at any moment, and you're going to need a bit of hardware. The issue isn't Tomcat, but the resources available to your servlet.
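To put rough numbers on that, here is a minimal sketch of the arithmetic (Little's Law: requests in flight = arrival rate x time per request). The class and method names are made up for illustration, and it assumes the latency measured at 100 req/sec still holds at higher load, which it may well not.

```java
// Rough sizing sketch: average requests in flight = arrival rate x time per request.
// ConcurrencyEstimate and inFlight are hypothetical names, not part of Tomcat.
class ConcurrencyEstimate {

    /** Returns the average number of requests in flight, rounded up. */
    static long inFlight(long requestsPerSecond, long avgLatencyMillis) {
        // Integer ceiling of (rate * latency / 1000), avoiding floating-point rounding.
        return (requestsPerSecond * avgLatencyMillis + 999) / 1000;
    }

    public static void main(String[] args) {
        // The JMeter run from the question: 100 req/sec at 532ms.
        System.out.println(inFlight(100, 532));    // prints 54
        // The target load, assuming latency stays at 532ms.
        System.out.println(inFlight(2000, 532));   // prints 1064
        // The REST service mentioned above: 15k req/sec at 10ms.
        System.out.println(inFlight(15000, 10));   // prints 150
    }
}
```

With one thread per concurrent request, the last figure explains why a 10ms service copes with 15k req/sec on a modest thread count, while a 532ms request at 2000 req/sec would need on the order of a thousand threads.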
A single Tomcat server with default settings on modest hardware should easily handle 2k requests/second, assuming it doesn't have too much work to do per request. If processing one request takes 500+ ms, you'll probably need to bump up the number of threads in the thread pool, and you might start pushing the limits. Alternatively, if you can offload some of that work to some other thread(s), that will speed up the response times, and you could keep the default 200 threads. Then it's just a question of whether your worker thread(s) can keep up with incoming requests. That would depend on whether your load is constant or bursty and how much delay you can accept in processing. This doesn't even address HA, DR, and what your acceptable downtime is. It's all a big balancing act, and there are far too many variables to just give a cut-and-dried answer.
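As a sketch of the offloading idea, the request thread could hand the slow work to a fixed pool of workers with a bounded queue and return right away. This assumes the reply can be sent to the SMS gateway later in a separate call, which the question doesn't confirm. `WorkerPool` and its methods are hypothetical names; only the `java.util.concurrent` classes are real.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: a fixed pool of workers fed by a bounded queue.
class WorkerPool {
    private final ThreadPoolExecutor workers;

    WorkerPool(int threads, int queueCapacity) {
        // Fixed-size pool; the bounded queue caps how far a burst can pile up.
        workers = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
    }

    /** Hands a job to the workers; returns false if the queue is full. */
    boolean offer(Runnable job) {
        try {
            workers.execute(job);
            return true;
        } catch (RejectedExecutionException e) {
            return false; // workers can't keep up: the caller decides what to do
        }
    }

    /** Number of jobs waiting for a free worker. */
    int backlog() {
        return workers.getQueue().size();
    }

    /** Stops accepting jobs and waits for queued ones to finish. */
    boolean drain(long timeoutSeconds) {
        workers.shutdown();
        try {
            return workers.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        WorkerPool pool = new WorkerPool(4, 1000); // sizes are placeholders, not recommendations
        boolean accepted = pool.offer(
                () -> System.out.println("processing query off the request thread"));
        System.out.println("accepted=" + accepted);
        pool.drain(5);
    }
}
```

A `false` from `offer` is the signal that the workers are not keeping up with incoming requests. Watching `backlog()` under a bursty load gives a feel for how much processing delay you'd be accepting.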
It looks like you may have to implement a cluster / load balancing approach. Take a look at this for an example.