I prototyped a tiny search engine with PageRank that works on my computer. I am interested in building a knowledge graph on top of it that returns only queried webpages within the right context, similar to how Google finds relevant answers to search questions. I have seen a lot of publicity around knowledge graphs but not much literature, and almost no pseudocode-like guidelines for building one. Does anyone know good references on how such a knowledge graph works internally, so that I do not have to invent the models behind it from scratch?
"Knowledge graph" is a buzzword: it is a sum of models and technologies put together to achieve a result. The first stop on your journey is natural language processing, ontologies, and text mining. It is a wide field of artificial intelligence; go here for a research survey of the field.
Before building your own models, I suggest you try the standard algorithms through dedicated toolkits such as gensim. You will learn about tf-idf, LDA, document feature vectors, and so on.
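For a flavor of what that looks like, here is a minimal sketch assuming gensim is installed; the three-document toy corpus and the topic count are made up for illustration:

```python
from gensim import corpora, models

documents = [
    "the quick brown fox jumps over the lazy dog",
    "a knowledge graph links entities and their relations",
    "pagerank ranks web pages by their link structure",
]
texts = [doc.lower().split() for doc in documents]

# Map each token to an integer id, then build bag-of-words vectors.
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# tf-idf down-weights tokens that appear in every document.
tfidf = models.TfidfModel(corpus)
tfidf_corpus = [tfidf[bow] for bow in corpus]

# LDA infers latent topics; two topics is arbitrary for this toy corpus.
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, passes=10)
for topic_id, words in lda.print_topics():
    print(topic_id, words)
```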
I am assuming you want to work with text data; searching images using other images is a different problem, and the same goes for audio.
Building models is only the first step: the most difficult part of Google's knowledge graph is actually scaling to billions of requests each day.
A good processing pipeline can be built "easily" on top of Apache Spark, "the current-gen Hadoop". It provides resilient distributed datasets (RDDs), which are mandatory if you want to scale.
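As a rough illustration, a minimal PySpark pipeline over an RDD could look like the sketch below; it assumes a local Spark installation, and the input path and application name are placeholders:

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "text-pipeline-sketch")

# One RDD element per line of the crawl; the RDD is partitioned across
# workers and recomputed from its lineage on failure ("resilient").
lines = sc.textFile("crawl/pages.txt")

term_counts = (
    lines.flatMap(lambda line: line.lower().split())
         .map(lambda term: (term, 1))
         .reduceByKey(lambda a, b: a + b)
)

# Print the ten most frequent terms across the crawl.
print(term_counts.takeOrdered(10, key=lambda kv: -kv[1]))
sc.stop()
```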
If you want to keep your data as a graph, as in graph theory (like PageRank), for live querying, I suggest you use Bulbs, a framework that is "like an ORM for graphs, but instead of SQL, you use the graph-traversal language Gremlin to query the database". You can switch the backend from Neo4j to OpenRDF (useful if you work with ontologies), for instance.
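Here is a minimal sketch following the Bulbs quickstart; it assumes a Neo4j Server running locally on the default port, and the node names and edge label are made up for illustration:

```python
from bulbs.neo4jserver import Graph

g = Graph()  # connects to a local Neo4j Server by default

# Create two entity nodes and a typed relation between them.
page = g.vertices.create(name="example.com/page")
topic = g.vertices.create(name="knowledge graph")
g.edges.create(page, "mentions", topic)

# Traverse outgoing "mentions" edges from the page node.
for vertex in page.outV("mentions"):
    print(vertex.name)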
For graph analytics you can use Spark's GraphX module or GraphLab.
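GraphX itself exposes a Scala API; to stay in Python, here is a sketch of the classic RDD-based PageRank that mirrors the example shipped with Spark. The four-page link graph and the iteration count are made up for illustration:

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "pagerank-sketch")

# (source, destination) pairs for a tiny, made-up link graph.
edges = sc.parallelize([
    ("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("d", "c"),
])
links = edges.groupByKey().cache()      # page -> iterable of out-links
ranks = links.mapValues(lambda _: 1.0)  # start every page at rank 1.0

for _ in range(10):
    # Each page splits its current rank evenly among its out-links.
    contribs = links.join(ranks).flatMap(
        lambda kv: [(dest, kv[1][1] / len(kv[1][0])) for dest in kv[1][0]]
    )
    # Re-aggregate contributions with the standard 0.85 damping factor.
    ranks = contribs.reduceByKey(lambda a, b: a + b) \
                    .mapValues(lambda rank: 0.15 + 0.85 * rank)

print(sorted(ranks.collect(), key=lambda kv: -kv[1]))
sc.stop()
```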
Hope it helps.