For a personal project, I need to build a forum using PHP and MySQL. It is not possible for me to use an already-built forum package (such as phpBB).
I'm currently working through the logic needed to build such an application, but it's been a long day and I'm struggling with the concept of handling unread posts for users. One solution I had was to have a separate table which essentially holds all post IDs and user IDs, to determine if they've been read:
tbl_userReadPosts: user_id, post_id, read_timestamp
Obviously, if a user's ID appears in this table, we know they've read the post. This is great, except if we have thousdands of posts per day (which is more than possible in the system which is being proposed), and thousdands of users. This table would become huge within a matter of days, if not hours.
Another option would be to track the user's last activity as a timestamp, and then retrieve all posts made after their last activity was updated. This works in theory, but let's say a user is writing an extremely long post, and in the meantime several members also start new threads or reply to posts in other threads. When the user submits his new post, his last activity would be updated, and thus not match those made in the meantime.
Does anyone have experience with this, and how did you tackle it?
I've checked in phpBB and it seems that the system assigns a custom session to each user, and works on that basis, but the documentation is pretty sparse as to how this deals with unread posts.
Thoughts and opinions gratefully received, as always.
Sorry for the quick answer but I only have a second. You definitely do not want to store the read information in the database, as you've already deduced, this table would become gigantic.
Something in between what you've already suggested: Store the users last activity, and in conjunction with storing information of what they've seen in the cookie, to determine which threads/posts they've read already.
This offloads the storage to the client side cookie, which is far more efficient.
This method stores the most recently-accessed
postID
separately for eachforumID
.It's not as fine-grained as a solution that keeps track of each post individually, but it shrinks the amount of data that you need to store per user and still provides a decent way to keep track of a user's view history.
Note: Obviously this example is for demonstration purposes only. Some of the structure and logic may need to be changed when dealing with an actual database.
Phpbb2 has implemented this fairly simple. It just shows you all post since your last login. This way you don’t need to store any information about what the user actually has seen or read.
Something that wasn't suggested was using Big Data to store that kind of information, namely NoSQL. It is effectively made specifically for handling this kind of data.
I use MongoDB but you can find a NoSQL application to suit your needs. http://nosql.findthebest.com/
This will allow you to scale to other applicable uses rather than just what you are working on now. EG, forums, posts, tickets, notes, messages, etc.
Another suggestion is that you can alternatively store the data in as "Metadata", similar to the csv suggestion, but giving it a more flexible and storable structure, using serialize to compress the data for your object to load and unserialize at run time. Thus working like a session that doesn't expire that is associated with the user_id rather than session_id that can be loaded on demand and segregated out however you like. For example when a forum page is loaded for a particular user.
eg:
(dry coded examples - adjust to fit your own schema)
You can swap out file_x_contents with your RDBMS - eg:
You can also do other things like searching via regexp, segregate it out further (topic, category, etc), or change the method to be based on users that read posts in the forum (forum->post->viewedby) instead of forum posts the user read (user->metadata->forums). Especially if you already have a working "Total Views", but this would be more difficult of retrieving posts that have/have not been read by a particular user, while the reverse is true with the other method, or even use both methods in conjunction.
A table holding all user_ids and post_ids is a bad idea, as it grows exponentially. Imagine if your forum solution grew to a million posts and 50,000 users. Now you have 50 billion records. That'll be a problem.
The trick is to use a table as you said, but it only holds posts which have been read since the this login, of posts which were posted between the last login and this login.
All posts made prior to the last login are considered read.
IE, I last logged in on 4/3/2011, and then I log in today. All posts made before 4/3/2011 are considered read (they're not new to me). All posts between 4/3/2011 and now, are unread unless they are seen in the read table. The read table is flushed each time I log in.
This way your read posts table should never have more than a couple hundred records for each member.
Instead of having a new row for every post*user, you can have a field in the user-table that holds a comma-separated string with post-IDs that the user has read.
Obviously the user doesn't need to know that there are unread posts from 2 years ago, so you only display "New post" for posts made in the last 24 hours and is not in the comma-separated string.
You could also solve this with a session variable or a cookie.