Can this technology stack scale? [closed]

Posted 2019-09-13 07:25

Question:

My client asked me to build a realtime application that can handle chat, images and videos, all in real time. He asked me to come up with my own technology stack, so I did a lot of research and found that the easiest one to build with would be the tech stack below:

1) Node.js and cluster, to max out the CPU cores for one server instance - language

2) Socket.io - realtime framework

3) Redis - pub/sub across multiple server instances

4) Nginx - to reverse proxy and load balance multiple servers

5) Amazon EC2 - to run the server

6) Amazon S3 and CloudFront - to store the images/videos and deliver them

Correct me if I'm wrong about the above stack. My real question is, can the above tech stack scale to 1,000,000 messages per second (text, images, videos)?

If anyone has experience with node.js and socket.io, could you give me some insights into, or alternatives to, the above stack?

Regards,

SinusGob

Answer 1:

My real question is, can the above tech stack scale to 1,000,000 messages per second (text, images, videos)?

Sure it can, with the right design and enough hardware. The question your client should really be asking is not whether it can be made to go that big, but at what cost and how practically it can be done, and whether these are the best choices.

Let's look at each piece you've mentioned:

node.js - For an I/O-centric app, it's an excellent choice for high scale, and it can scale by deploying many CPUs in a cluster (both multi-process per server and multi-server). How practical this type of scale is depends a lot on what kind of shared data all these server processes need access to. Usually, the data store ends up being the harder bottleneck in scaling, because it's easy to throw more servers at the request processing but not so easy to throw more hardware at a centralized data store. There are ways to do that, but how you do it and how hard it is depend a lot on the demands of the app.
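To make the multi-process half of that concrete, here is a minimal sketch using Node's built-in `cluster` module. The port number, the `workerCount` helper and the restart-on-exit policy are assumptions for illustration, not anything from the question.

```javascript
const cluster = require("node:cluster");
const http = require("node:http");
const os = require("node:os");

const PORT = 3000; // assumed port, pick your own

// Hypothetical helper: one worker per core, optionally leaving some cores free.
function workerCount(cores, reserved = 0) {
  return Math.max(1, cores - reserved);
}

function start() {
  if (cluster.isPrimary) {
    const n = workerCount(os.cpus().length);
    for (let i = 0; i < n; i++) cluster.fork();
    // Replace a worker that dies so capacity stays roughly constant.
    cluster.on("exit", () => cluster.fork());
  } else {
    // Each worker runs its own server on the shared port.
    http
      .createServer((req, res) => res.end(`handled by pid ${process.pid}\n`))
      .listen(PORT);
  }
}

// Call start() from your entry file. It is left uncalled here so the
// snippet can be loaded without opening a port.
```

Each worker is a separate process with its own memory, so anything the workers must agree on (who is connected where, message history) has to live in an external store. That is exactly where the data store becomes the bottleneck described above.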

socket.io - If you need efficient server push of smallish messages, then socket.io is probably the best way to go because it's very efficient at pushing to the client. It is not great at all types of transport though. For example, I wouldn't move large images or video around through socket.io, as there are more purpose-built ways to do that. So, the use of socket.io depends a lot on what exactly the app wants to use it for. If you wanted to push a video to a client, you could instead push just a URL and have the client turn around and request the video via a regular http URL using well-known high-scale technology.
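The "push a URL, not the video" idea could look roughly like this. The CDN host, the `buildMediaMessage` helper and the message fields are all hypothetical; only the general shape matters.

```javascript
// Assumed CDN host; in the question's stack this would be a CloudFront domain.
const CDN_BASE = "https://cdn.example.com";

// Hypothetical helper: build a small JSON-friendly message that references
// the media by URL instead of carrying the bytes.
function buildMediaMessage(senderId, kind, objectKey) {
  return {
    type: "media",
    kind, // "image" or "video"
    from: senderId,
    url: `${CDN_BASE}/${encodeURI(objectKey)}`,
    sentAt: Date.now(),
  };
}

const msg = buildMediaMessage("user-42", "video", "uploads/2019/clip 01.mp4");
// Size of what would actually travel over the realtime connection.
const wireBytes = Buffer.byteLength(JSON.stringify(msg));
console.log(msg.url, wireBytes);

// With a socket.io server instance `io`, the push itself would be roughly:
//   io.to(roomId).emit("media", msg);
// and the client would then fetch msg.url over plain HTTP.
```

The realtime channel then only ever carries a message of a few hundred bytes, whatever the size of the video behind the URL.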

Redis - Again, great for some things, not great at everything. So, it really depends upon what you're trying to do. As I explained earlier, the design of your data store and the number of transactions through it is probably where your real scale problems lie. If I were starting this job, I'd start with an understanding of the data storage needs for a server, transactions per second of various types, caching strategy, redundancy, fail-over, data persistence, etc., and design the high-scale access to data first. I wouldn't be entirely sure redis was the preferred choice. I'd probably suggest you bring in a high-scale database specialist as a consultant early in the project.
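As one example of the kind of "transactions per second" estimate I mean, here is a back-of-envelope sketch for the pub/sub use the question proposes. It assumes every server process subscribes to the same channel, so each published message is delivered once per process. The process count and payload size are made-up inputs; only the 1,000,000 figure comes from the question.

```javascript
// Rough load on a pub/sub broker sitting between server processes.
// Assumption: every process subscribes to the same channel, so each
// published message is delivered once per subscribed process.
function brokerLoad(messagesPerSecond, subscriberProcesses, avgPayloadBytes) {
  const deliveries = messagesPerSecond * subscriberProcesses;
  return {
    publishesPerSecond: messagesPerSecond,
    deliveriesPerSecond: deliveries,
    // Outbound bytes the broker must push to subscribers each second.
    egressBytesPerSecond: deliveries * avgPayloadBytes,
  };
}

// 64 processes and 200-byte payloads are illustrative guesses only.
const load = brokerLoad(1_000_000, 64, 200);
console.log(load);
```

Under those assumptions, one shared channel turns 1,000,000 publishes into 64,000,000 deliveries per second. A different channel layout (per room, per shard) changes the numbers completely, which is why the data design has to come before the tool choice.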

Nginx - Lots of high-scale sites use nginx, so it's certainly a good tool. Whether it's exactly the right tool for you depends upon your design. I'd probably work on this part last because it seems less central to the design, and once the rest of the system is laid out you can consider what you need here.

Amazon EC2 - One of several possible choices. These choices are hard to compare directly, apples to apples. Large-scale systems have been built on EC2, so there is proof of concept there and the general architecture seems an appropriate match. If you wanted to know where the real gremlins are, you'd need a consultant who has done high-scale work on EC2.

Amazon S3 - I personally know some very high storage and bandwidth sites using S3 for both video and images. It works for that.

So ... these are generally good tools if they are used in the right way. Redis would be a question mark depending upon the storage needs of the actual application (you've provided zero requirements, and a database can't be selected with zero requirements). A more reasoned answer would start from a high-level set of requirements that analyze what the system needs to be able to do to serve 1,000,000 whatever. Those requirements could be compared with known capabilities of some of these pieces to get a ballpark on scaling the system. Then, you'd have to put together some benchmarks for certain pieces of the system. Success or failure would depend as much on how the app was built and how the tools were used as on which tools were selected. You can likely scale successfully with many different types of tools. Heck, Facebook runs on PHP (well, a highly modified, customized PHP that is not really typical PHP at all at runtime).
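A benchmark for one piece can start very small. The sketch below is a hypothetical harness, not a real load test: `measureOpsPerSecond`, `processesNeeded`, the 50% headroom and the JSON workload are all assumptions, and serializing a message is only a stand-in for whatever your server actually does per message.

```javascript
// Hypothetical micro-benchmark helper: runs `fn` repeatedly for roughly
// `durationMs` and reports operations per second. The timer call inside
// the loop adds overhead, so treat the result as a conservative figure.
function measureOpsPerSecond(fn, durationMs = 200) {
  const start = performance.now();
  let ops = 0;
  let elapsed = 0;
  do {
    fn();
    ops++;
    elapsed = performance.now() - start;
  } while (elapsed < durationMs);
  return ops / (elapsed / 1000);
}

// Processes needed to hit a target rate while running each process at
// only a fraction (headroom) of its measured capacity.
function processesNeeded(targetPerSecond, perProcessPerSecond, headroom = 0.5) {
  return Math.ceil(targetPerSecond / (perProcessPerSecond * headroom));
}

// Stand-in workload: serialize one small chat message.
const sample = { from: "user-42", room: "lobby", text: "hello" };
const rate = measureOpsPerSecond(() => JSON.stringify(sample));
console.log(Math.round(rate), processesNeeded(1_000_000, rate));
```

The number this prints says nothing about network or data store limits. It only shows the mechanics: measure one piece, then divide the target by what a single process can sustain.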