I am writing a banner ad engine in PHP/MySQL. I don't want to use OpenX or another turnkey solution, because there will be a lot of custom functionality and I would rather not depend on an existing system being flexible enough to accommodate it.
Here is my current thinking on the architecture for recording impressions:
- Requests to the banner server come in via a JavaScript snippet on the target site.
- The server keeps a cached list of banners to serve and returns the appropriate image as needed.
- Impressions are recorded to log files in the style of Apache's access log: a rotating text file with one line appended per impression.
- Each log line records the user's IP, the URL, the banner ID, the time, etc.
- Log files are rotated hourly and then summarized (also hourly) into a MySQL database, so advertisers can get (close to) real-time stats on activity.
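To make the logging step concrete, here is a minimal sketch of appending one line per impression to an hourly file. It is written in Python for brevity; the field order, the `impressions-YYYYMMDDHH.log` naming, and the function names are assumptions for illustration, not part of any existing system. In PHP the same idea maps to `file_put_contents()` with `FILE_APPEND | LOCK_EX`.

```python
import os
from datetime import datetime, timezone


def log_path(log_dir, when):
    """Return the log file for the hour containing `when` (hypothetical naming)."""
    return os.path.join(log_dir, when.strftime("impressions-%Y%m%d%H.log"))


def record_impression(log_dir, ip, url, banner_id, user_agent, when=None):
    """Append one tab-separated line describing a single impression."""
    when = when or datetime.now(timezone.utc)
    fields = [when.strftime("%Y-%m-%dT%H:%M:%SZ"), ip, url, str(banner_id), user_agent]
    # Tabs or line breaks inside a field would break the one-line-per-impression format.
    clean = [f.replace("\t", " ").replace("\n", " ").replace("\r", " ") for f in fields]
    line = "\t".join(clean) + "\n"
    # O_APPEND positions every write at the current end of the file,
    # so concurrent writers do not overwrite each other's lines.
    fd = os.open(log_path(log_dir, when), os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, line.encode("utf-8"))
    finally:
        os.close(fd)
    return line
```

Because the hour is part of the file name, "rotation" happens implicitly: once the clock moves to the next hour, the previous file is no longer written to and can be summarized safely.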
My concerns are:
- Is writing to a "log" file an efficient and scalable way to record the impressions? We expect to serve 13-15 million impressions a month.
- Any pitfalls with the log writing approach?
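For a sense of scale, here is a rough back-of-envelope calculation of what that volume means per second and per hourly log file. The 30-day month and the 10x peak-to-average factor are assumptions, not measurements.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month
HOURS_PER_MONTH = 30 * 24
PEAK_FACTOR = 10  # assumed ratio of peak traffic to the monthly average


def per_second(monthly_impressions):
    """Average impressions per second for a given monthly total."""
    return monthly_impressions / SECONDS_PER_MONTH


def per_hour(monthly_impressions):
    """Average number of lines that end up in one hourly log file."""
    return monthly_impressions / HOURS_PER_MONTH


for monthly in (13_000_000, 15_000_000):
    print(
        f"{monthly:,}/month: {per_second(monthly):.1f}/s average, "
        f"{per_second(monthly) * PEAK_FACTOR:.1f}/s at the assumed peak, "
        f"{per_hour(monthly):,.0f} lines per hourly log"
    )
```

That works out to roughly 5 to 6 log writes per second on average and about 18,000 to 21,000 lines in each hourly file.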
Don't forget to record the user agent as well.
I would recommend using a text file for the log and having scripts parse it (building caches, etc.) to display the stats later.
I would suggest using lighttpd with mod_accesslog. lighttpd is a good fit when serving static files is the main objective.
Since you are using JavaScript on the individual websites, include the information you need in the query string of the image request. This is similar to how Google Analytics collects its data, by the way.
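As a rough sketch of the query-string idea: the snippet on the page would build a URL like the one below, and the log parser would later pull the values back out of the logged request. The parameter names `bid` and `url`, the `/banner.gif` path, and both function names are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_banner_url(base, banner_id, page_url):
    """Build the image URL the embedded snippet would request (hypothetical parameters)."""
    return base + "?" + urlencode({"bid": banner_id, "url": page_url})


def parse_banner_request(request_target):
    """Recover the banner ID and page URL from a logged request path."""
    query = parse_qs(urlsplit(request_target).query)
    return {"banner_id": int(query["bid"][0]), "page_url": query["url"][0]}


target = build_banner_url("/banner.gif", 42, "http://example.com/page?x=1")
print(target)
print(parse_banner_request(target))
```

With this approach the web server's own access log already contains everything needed, so no application code has to run per impression.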
Rotate the web server's access log and parse it at each rotation.
Shape your access log format so that it can be imported directly into a temporary MySQL table for further processing.
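One way to get an importable format is to write tab-separated fields, one impression per line, which matches the default field and line terminators of MySQL's `LOAD DATA INFILE`. The sketch below shows the hourly summarizing step under that assumption. The five-column layout (timestamp, IP, URL, banner ID, user agent) and the function name are hypothetical.

```python
import csv
import io
from collections import Counter


def summarize(log_file):
    """Count impressions per (hour, banner ID) from tab-separated log lines."""
    counts = Counter()
    # QUOTE_NONE keeps quote characters in user agents as literal text.
    for row in csv.reader(log_file, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 5 or not row[3].isdigit():
            continue  # skip malformed lines instead of aborting the whole run
        timestamp, _ip, _url, banner_id, _user_agent = row
        hour = timestamp[:13]  # e.g. "2024-01-01T15" from an ISO 8601 timestamp
        counts[(hour, int(banner_id))] += 1
    # Sorted rows are ready to insert into a summary table.
    return sorted((hour, bid, n) for (hour, bid), n in counts.items())


sample = io.StringIO(
    "2024-01-01T15:30:00Z\t203.0.113.7\thttp://example.com/a\t42\tMozilla/5.0\n"
    "2024-01-01T15:45:10Z\t203.0.113.8\thttp://example.com/b\t42\tMozilla/5.0\n"
    "2024-01-01T16:00:01Z\t203.0.113.9\thttp://example.com/a\t7\tcurl/8.0\n"
)
for row in summarize(sample):
    print(row)
```

The same aggregation could instead be done in SQL with a `GROUP BY` after loading the raw lines into the temporary table; which side does the counting is a design choice.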
If you expect a large volume of impressions early on and need to scale at some point, consider using a CDN.