Please bear with my writing, as my English is not very good.
As a programmer, I want to learn about the algorithms, or the machine learning techniques, that are implemented underneath recommendation systems and other relation-based systems. The most obvious example would be Amazon, which has a really good recommendation system. It can tell you "if you like this, you might also like that", or something like "what percentage of people like this and that together".
Of course, I know Amazon is a big website and has invested a lot of brainpower and money into these systems. But at the most basic level, how can we implement something like that within our own database? How can we identify how one object relates to another? How can we build a statistical unit that handles this kind of thing?
I'd appreciate it if someone could point out some algorithms, or some good direct references/books that we can all learn from. Thank you all!
I think you are talking about knowledge-based systems. I don't remember the programming language (maybe LISP), but there are implementations. Also, look at OWL.
There are two different types of recommendation engines.
The simplest is item-based, i.e. "customers who bought product A also bought product B". This is easy to implement: store a sparse, symmetric n×n matrix (where n is the number of items), where each element m[a][b] is the number of times anyone has bought item a together with item b.
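A minimal sketch of the item-based approach in Python, assuming each order is available as a list of item IDs (the item IDs must be sortable, e.g. strings or integers). Nested dictionaries stand in for the sparse n×n matrix, so only pairs that were actually bought together take up memory. The function names `build_cooccurrence` and `also_bought` and the sample orders are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(orders):
    """Count how often each pair of items appears in the same order.

    orders: iterable of iterables of item IDs (one inner collection per order).
    Returns a sparse symmetric matrix as nested dicts: m[a][b] == m[b][a].
    """
    m = defaultdict(lambda: defaultdict(int))
    for order in orders:
        # Deduplicate and sort so each unordered pair is counted once per order.
        for a, b in combinations(sorted(set(order)), 2):
            m[a][b] += 1
            m[b][a] += 1  # keep the matrix symmetric
    return m


def also_bought(m, item, k=3):
    """Items most often bought together with `item`, highest count first."""
    row = m.get(item, {})  # .get() does not create an empty row for unknown items
    return sorted(row.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical sample data: one list of item IDs per order.
orders = [
    ["book", "lamp"],
    ["book", "lamp", "pen"],
    ["book", "pen"],
    ["lamp", "pen"],
    ["book", "lamp"],
]
m = build_cooccurrence(orders)
print(also_bought(m, "book"))  # [('lamp', 3), ('pen', 2)]
```

Storing both m[a][b] and m[b][a] doubles the memory but makes the "also bought" lookup for any item a single dictionary access.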
The other is user-based, i.e. "people like you often like things like this". One possible solution to this problem is k-means clustering: construct a set of clusters in which users with similar taste are placed in the same cluster, and make suggestions based on the other users in the same cluster.
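A small, dependency-free sketch of the clustering idea, assuming each user is represented by a vector of ratings over the same list of items, with 0 meaning "not rated". Treating unrated items as 0 during clustering is a simplification made only for this example. The centroids are initialised farthest-first rather than randomly, so the sketch is deterministic. Items are then scored by their average rating among the user's cluster peers. All names and ratings are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain k-means; returns the cluster index of each vector."""
    # Farthest-first initialisation: start with the first vector, then keep
    # adding the vector that is farthest from all centroids chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each user joins the nearest centroid.
        assignment = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def recommend(user, ratings, assignment, items, top_n=2):
    """Score the user's unrated items by their average rating among cluster peers."""
    peers = [u for u, a in enumerate(assignment)
             if a == assignment[user] and u != user]
    scores = {}
    for j, item in enumerate(items):
        if ratings[user][j] != 0:
            continue  # already rated; 0 means "not rated" in this sketch
        peer_ratings = [ratings[p][j] for p in peers if ratings[p][j] != 0]
        if peer_ratings:
            scores[item] = sum(peer_ratings) / len(peer_ratings)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


items = ["action_1", "action_2", "action_3", "romance_1", "romance_2", "romance_3"]
ratings = [  # one row per user, 0 = not rated
    [5, 4, 0, 1, 0, 1],
    [5, 5, 4, 0, 1, 0],
    [4, 0, 5, 1, 1, 0],
    [0, 1, 1, 5, 4, 0],
    [1, 0, 0, 4, 5, 5],
    [0, 1, 0, 5, 0, 4],
]
assignment = kmeans(ratings, k=2)
print(assignment)                                 # [0, 0, 0, 1, 1, 1]
print(recommend(0, ratings, assignment, items))   # [('action_3', 4.5), ('romance_2', 1.0)]
```

In practice the number of clusters k has to be tuned, and random or k-means++ initialisation with several restarts is more common than the deterministic farthest-first start used here.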
A better, but even more complicated, solution is a technique called Restricted Boltzmann Machines. There's an introduction to them here.
There's also prediction.io if you're looking for an open-source solution, or SaaS solutions like mag3llan.com.
A first attempt could look like this:
First, I calculate how often each pair of products was bought together; then I group the pairs by product and select the top 20 other products bought with each one. The result should be stored in some kind of dictionary keyed by product ID.
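One way to sketch this first attempt against a database is with Python's built-in `sqlite3` module. This assumes a hypothetical table `order_items(order_id, product_id)` with one row per product in an order. A self-join counts how often each pair of products shares an order; because the rows come back sorted by count, the first `top_n` rows per product are the most related ones, and they are collected into a dictionary keyed by product ID.

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema and sample data: one row per (order, product).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (order_id INTEGER, product_id INTEGER)")
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?)",
    [(1, 10), (1, 20), (2, 10), (2, 20), (2, 30), (3, 10),
     (3, 30), (4, 20), (4, 30), (5, 10), (5, 20)],
)

# Count, for every ordered pair (a, b) with a != b, the orders containing both.
PAIR_COUNTS = """
    SELECT a.product_id, b.product_id, COUNT(DISTINCT a.order_id) AS together
    FROM order_items AS a
    JOIN order_items AS b
      ON a.order_id = b.order_id AND a.product_id <> b.product_id
    GROUP BY a.product_id, b.product_id
    ORDER BY a.product_id, together DESC, b.product_id
"""


def top_related(conn, top_n=20):
    """Map each product ID to its top_n (other_product, count) pairs."""
    related = defaultdict(list)
    for product, other, together in conn.execute(PAIR_COUNTS):
        # Rows arrive sorted by count, so the first top_n per product are the best.
        if len(related[product]) < top_n:
            related[product].append((other, together))
    return dict(related)


related = top_related(conn)
print(related[10])  # [(20, 3), (30, 2)]
```

The self-join produces one row per pair of products within each order, which is exactly the cost the next remark warns about.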
This might get too slow or use too much memory for large databases, because the number of product pairs grows quadratically with the number of products per order.