Tactics for using PHP in a high-load site

2020-01-27 08:58发布

Before you answer this I have never developed anything popular enough to attain high server loads. Treat me as (sigh) an alien that has just landed on the planet, albeit one that knows PHP and a few optimisation techniques.


I'm developing a tool in PHP that could attain quite a lot of users, if it works out right. However while I'm fully capable of developing the program I'm pretty much clueless when it comes to making something that can deal with huge traffic. So here's a few questions on it (feel free to turn this question into a resource thread as well).

Databases

At the moment I plan to use the MySQLi features in PHP5. However how should I setup the databases in relation to users and content? Do I actually need multiple databases? At the moment everything's jumbled into one database - although I've been considering spreading user data to one, actual content to another and finally core site content (template masters etc.) to another. My reasoning behind this is that sending queries to different databases will ease up the load on them as one database = 3 load sources. Also would this still be effective if they were all on the same server?

Caching

I have a template system that is used to build the pages and swap out variables. Master templates are stored in the database and each time a template is called it's cached copy (a html document) is called. At the moment I have two types of variable in these templates - a static var and a dynamic var. Static vars are usually things like page names, the name of the site - things that don't change often; dynamic vars are things that change on each page load.

My question on this:

Say I have comments on different articles. Which is a better solution: store the simple comment template and render comments (from a DB call) each time the page is loaded or store a cached copy of the comments page as a html page - each time a comment is added/edited/deleted the page is recached.

Finally

Does anyone have any tips/pointers for running a high load site on PHP. I'm pretty sure it's a workable language to use - Facebook and Yahoo! give it great precedence - but are there any experiences I should watch out for?

23条回答
趁早两清
2楼-- · 2020-01-27 09:21

@Gary

Don't use MySQLi -- PDO is the 'modern' OO database access layer. The most important feature to use is placeholders in your queries. It's smart enough to use server side prepares and other optimizations for you as well.

I'm loking over PDO at the moment and it looks like you're right - however I know that MySQL are developing the MySQLd extension for PHP - I think to succeed either MySQL or MySQLi - what do you think about that?


@Ryan, Eric, tj9991

Thanks for the advice on PHP's caching extensions - could you explain reasons for using one over another? I've heard great things about memcached through IRC but have never heard of APC - what are your opinions on them? I assume using multiple caching systems is pretty counter-effective.

I will definitely be sorting out some profiling testers - thank you very much for your recommendations on those.

查看更多
地球回转人心会变
3楼-- · 2020-01-27 09:23

Sure pdo is nice, but there has been some controversy about it's performance versus mysql and mysqli, although it seems fixed now.

You should use pdo if you envision portability, but if not, mysqli should be the way. It has an OO interface, prepared statements, and most of what pdo offers (except, well, portability).

Plus, if performance is really needed, prepare for the (native mysql) MysqLnd driver in PHP 5.3, who will be much more tightly integrated with php, with better performance and improved memory usage (and statistics for performance tuning).

Memcache is nice if you have clustered servers (and YouTube-like load), but i'd try out APC first too.

查看更多
smile是对你的礼貌
4楼-- · 2020-01-27 09:23

Look into mod_cache, an output cache for the Apache web server, simillar to the output caching in ASP.NET.

Yes, I can see that it's still experimental but it will be final someday.

查看更多
叛逆
5楼-- · 2020-01-27 09:24

General

  • Do not try to optimize before you start to see real world load. You might guess right, but if you don't, you've wasted your time.
  • Use jmeter, xdebug or another tool to benchmark the site.
  • If load starts to be an issue, either object or data caching will likely be involved, so generally read up on caching options (memcached, MySQL caching options)

Code

  • Profile your code so that you know where the bottleneck is, and whether it's in code or the database

Databases

  • Use MYSQLi if portability to other databases is not vital, PDO otherwise
  • If benchmarks reveal the database is the issue, check the queries before you start caching. Use EXPLAIN to see where your queries are slowing down.
  • After the queries are optimized and the database is cached in some way, you may want to use multiple databases. Either replicating to multiple servers or sharding (splitting the data over multiple databases/servers) may be appropriate, depending on the data, the queries, and the kind of read/write behavior.

Caching

  • Plenty of writing has been done on caching code, objects, and data. Look up articles on APC, Zend Optimizer, memcached, QuickCache, JPCache. Do some of this before you really need to, and you'll be less concerned about starting off unoptimized.
  • APC and Zend Optimizer are opcode caches, they speed up PHP code by avoiding reparsing and recompilation of code. Generally simple to install, worth doing early.
  • Memcached is a generic cache, that you can use to cache queries, PHP functions or objects, or entire pages. Code must be specifically written to use it, which can be an involved process if there are no central points to handle creation, update and deletion of cached objects.
  • QuickCache and JPCache are file caches, otherwise similar to Memcached. The basic concept is simple, but also requires code and is easier with central points of creation, update and deletion.

Miscellaneous

  • Consider alternative web servers for high load. Servers like lighthttp and nginx can handle large amounts of traffic in much less memory than Apache, if you can sacrifice Apache's power and flexibility (or if you just don't need those things, which often, you don't).
  • Remember that hardware is surprisingly cheap these days, so be sure to cost out the effort to optimize a large block of code versus "let's buy a monster server."
  • Consider adding the "MySQL" and "scaling" tags to this question
查看更多
家丑人穷心不美
6楼-- · 2020-01-27 09:26

The points made about cache are spot-on; it is the least complicated and most important part of building an efficient application. I'd like to add that while memcached is great, APC is about five times faster if your application lives on a single server.

The "Cache Performance Comparison" post at the MySQL performance blog has some interesting benchmarks on the subject - http://www.mysqlperformanceblog.com/2006/08/09/cache-performance-comparison/.

查看更多
狗以群分
7楼-- · 2020-01-27 09:27

No two sites are alike. You really need to get a tool like jmeter and benchmark to see where your problem points will be. You can spend a lot of time guessing and improving, but you won't see real results until you measure and compare your changes.

For example, for many years, the MySQL query cache was the solution to all of our performance problems. If your site was slow, MySQL experts suggested turning the query cache on. It turns out that if you have a high write load, the cache is actually crippling. If you turned it on without testing, you'd never know.

And don't forget that you are never done scaling. A site that handles 10req/s will need changes to support 1000req/s. And if you're lucking enough to need to support 10,000req/s, your architecture will probably look completely different as well.

Databases

  • Don't use MySQLi -- PDO is the 'modern' OO database access layer. The most important feature to use is placeholders in your queries. It's smart enough to use server side prepares and other optimizations for you as well.
  • You probably don't want to break your database up at this point. If you do find that one database isn't cutting, there are several techniques to scale up, depending on your app. Replicating to additional servers typically works well if you have more reads than writes. Sharding is a technique to split your data over many machines.

Caching

  • You probably don't want to cache in your database. The database is typically your bottleneck, so adding more IO's to it is typically a bad thing. There are several PHP caches out there that accomplish similar things like APC and Zend.
  • Measure your system with caching on and off. I bet your cache is heavier than serving the pages straight.
  • If it takes a long time to build your comments and article data from the db, integrate memcache into your system. You can cache the query results and store them in a memcached instance. It's important to remember that retrieving the data from memcache must be faster than assembling it from the database to see any benefit.
  • If your articles aren't dynamic, or you have simple dynamic changes after it's generated, consider writing out html or php to the disk. You could have an index.php page that looks on disk for the article, if it's there, it streams it to the client. If it isn't, it generates the article, writes it to the disk and sends it to the client. Deleting files from the disk would cause pages to be re-written. If a comment is added to an article, delete the cached copy -- it would be regenerated.
查看更多
登录 后发表回答