I am building a blog aggregator like Techmeme that finds the most popular posts from several blogs. Unlike Techmeme, I first aggregate blog posts from a variety of RSS feeds, then save the headlines and their URLs in a database. After that, I have to find the most popular blog posts.
To pick the top headlines, I track the Facebook and Twitter share counts for every post of every blog and rank the posts by those counts. But that isn't the best solution, because some bloggers can cheat by inflating their share counts with fraudulent shares.
So my question is: what criteria could I use to define the most popular posts?
What would be a better algorithm for ranking blog posts?
Since the term 'popular' is vague in this context, I would define the popularity of posts according to my own criteria: combine all the suggested answers into a reasonable reputation system for the blog posts. For instance, I would do something like this:
- Facebook shares x 2
- Twitter shares x 3
- PageRank of the domain x 2
- 50,000 / global Alexa rank
- and so on
Finally, sum all of these up and compare the totals. You can also develop criteria that take into account the size of the post, the number of images in the post, etc.
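A minimal sketch of that weighted sum in Python, in case it helps. The weights are the ones from the list above; the `PostMetrics` class, its field names, and the sample numbers are made up for illustration, and how you obtain each metric is up to you.

```python
from dataclasses import dataclass


@dataclass
class PostMetrics:
    # Hypothetical container; field names are assumptions for this sketch.
    facebook_shares: int
    twitter_shares: int
    domain_pagerank: float
    alexa_rank: int  # global rank, 1 = most visited


def popularity_score(m: PostMetrics) -> float:
    """Weighted sum of the metrics, using the weights suggested above."""
    return (
        2 * m.facebook_shares
        + 3 * m.twitter_shares
        + 2 * m.domain_pagerank
        + 50_000 / max(m.alexa_rank, 1)  # guard against a zero or missing rank
    )


# Made-up sample data.
posts = {
    "post-a": PostMetrics(120, 40, 5, 10_000),
    "post-b": PostMetrics(30, 90, 7, 2_500),
}

# Highest score first.
ranked = sorted(posts, key=lambda name: popularity_score(posts[name]), reverse=True)
```

Since the raw metrics live on very different scales, you will probably want to tune the weights (or normalize each metric first) so that no single term dominates the total.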
It may be possible to estimate the joint distribution of shares across different sources. It's hard to detect fraud from marginal (i.e. single) metrics, but it's much harder to fake a holistic "organic" profile.
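One simple way to act on this idea (my choice for illustration, not the only option) is to fit a two-dimensional Gaussian to the log share counts and flag posts whose Mahalanobis distance from the bulk is large. The sketch below assumes only two sources, Facebook and Twitter; the function name and the sample data are made up.

```python
import math


def mahalanobis_scores(pairs):
    """pairs: list of (facebook_shares, twitter_shares), one tuple per post.

    Returns one distance per post; a large value means the post's mix of
    shares is unusual compared with the other posts.
    """
    # Log-transform because share counts are heavily skewed.
    xs = [math.log1p(f) for f, _ in pairs]
    ys = [math.log1p(t) for _, t in pairs]
    n = len(pairs)
    mx, my = sum(xs) / n, sum(ys) / n

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular; need more varied data")

    scores = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # d^2 = v^T S^-1 v, with the 2x2 inverse written out by hand.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        scores.append(math.sqrt(d2))
    return scores


# Made-up data: the last post has many Facebook shares but almost no tweets.
shares = [(100, 50), (200, 110), (50, 20), (400, 190),
          (80, 45), (150, 70), (300, 160), (5000, 3)]
scores = mahalanobis_scores(shares)
suspect = scores.index(max(scores))  # index of the most unusual post
```

Note that the outliers themselves inflate the covariance estimate here, so with real data a robust estimator, or fitting on posts you already trust, would likely work better.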
How about using a variation of PageRank?
Here are more details:
http://pr.efactory.de/e-pagerank-algorithm.shtml
http://en.wikipedia.org/wiki/PageRank?PHPSESSID=e371f8cacb91eff0c852a0e001893a9a
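To make the suggestion concrete, here is a rough sketch of PageRank by power iteration, using the normalized form where the ranks sum to 1. It assumes you can extract which blogs link to which from the post bodies; the `links` graph below is made up, and the damping factor 0.85 is just the commonly quoted default.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each blog to the list of blogs it links to."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node gets the "random jump" share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly among its outgoing links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node (no outgoing links): spread its rank over everyone.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Made-up link graph between four blogs.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
top_blog = max(ranks, key=ranks.get)
```

You could use the resulting per-blog rank as one more term in a weighted score, or run the same iteration on links between individual posts instead of whole blogs.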