Suppose I have a "journal article" class with fields such as year, author(s), title, journal name, keyword(s), etc.
Fields such as authors and keywords might be declared as String[] authors and String[] keywords.
What is the best data structure for searching a collection of "journal article" objects by one or several keywords, by one of several author names, or by part of the title?
Thanks!
==========================================================================
Following everyone's suggestions, here is my test code, written in the Processing environment. Any advice is greatly appreciated. Thanks!
ArrayList<Paper> papers = new ArrayList<Paper>();
HashMap<String, ArrayList<Paper>> hm = new HashMap<String, ArrayList<Paper>>();
void setup() {
  Paper paperA = new Paper();
  paperA.title = "paperA";
  paperA.keywords.append("cat");
  paperA.keywords.append("dog");
  paperA.keywords.append("egg");
  //println(paperA.keywords);
  papers.add(paperA);

  Paper paperC = new Paper();
  paperC.title = "paperC";
  paperC.keywords.append("egg");
  paperC.keywords.append("cat");
  //println(paperC.keywords);
  papers.add(paperC);

  Paper paperB = new Paper();
  paperB.title = "paperB";
  paperB.keywords.append("dog");
  paperB.keywords.append("egg");
  //println(paperB.keywords);
  papers.add(paperB);

  // build the index: keyword -> list of papers that carry it
  for (Paper p : papers) {
    // get the list of keywords for the current paper
    StringList keywords = p.keywords;
    // go through each keyword of the current paper
    for (int i = 0; i < keywords.size(); i++) {
      String keyword = keywords.get(i);
      // look up the list of papers already stored under this keyword
      // (named "matches" so it does not shadow the global "papers" list)
      ArrayList<Paper> matches = hm.get(keyword);
      if (matches == null) {
        // first time this keyword is seen: create its list and register it
        matches = new ArrayList<Paper>();
        hm.put(keyword, matches);
      }
      // the map stores a reference to the list, so adding to the list is enough
      matches.add(p);
    }
  }

  // look up all papers tagged "egg"; get() returns null for an unknown keyword
  ArrayList<Paper> paperList = hm.get("egg");
  if (paperList != null) {
    for (Paper p : paperList) {
      println(p.title);
    }
  }
}

void draw() {
}
class Paper
{
  //===== variables =====
  int ID;
  int year;
  String title;
  StringList authors = new StringList();
  StringList keywords = new StringList();
  String DOI;
  String typeOfRef;
  String nameOfSource;
  String abs; // abstract
  //===== constructor =====
  //===== update =====
  //===== display =====
}
Use a HashMap<String, JournalArticle> data structure. For example, you can use your keywords as the String keys of this map. However, it only supports exact-match search: you have to search with exactly the keyword that is stored as a key in the HashMap.
If you are looking for a "like" kind of search, I suggest you save your objects in a database that supports "like" queries.
Edit: On second thought, I think you can do a kind of "like" query (similar to the LIKE clause in SQL), but it will not be very efficient, because you iterate through all the keys in the HashMap on every query. If you know regex, you can do all kinds of queries by modifying the following example code (e.g. key.matches(pattern)):
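A minimal sketch of what such a key scan could look like, assuming the index is an ordinary java.util.Map keyed by keyword. The class and method names (KeywordRegexSearch, searchByPattern) are illustrative and not taken from the answer, and the values are plain strings here only to keep the example short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper: scans every key of the map and keeps the values
// whose key matches the regular expression.
class KeywordRegexSearch {

    // Returns the values whose key matches the whole regex
    // (same semantics as key.matches(regex)). Cost is linear in the map size.
    static <V> List<V> searchByPattern(Map<String, V> index, String regex) {
        Pattern pattern = Pattern.compile(regex); // compile once, reuse per key
        List<V> hits = new ArrayList<>();
        for (Map.Entry<String, V> entry : index.entrySet()) {
            if (pattern.matcher(entry.getKey()).matches()) {
                hits.add(entry.getValue());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> index = new HashMap<>();
        index.put("cat", "paperA");
        index.put("catalysis", "paperB");
        index.put("dog", "paperC");
        // ".*cat.*" plays the role of SQL's LIKE '%cat%'
        System.out.println(searchByPattern(index, ".*cat.*"));
    }
}
```

Because a HashMap does not keep its keys in any particular order, the order of the returned hits is unspecified. Every query visits every key, which is the inefficiency mentioned above.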
For simple cases you can use a Multimap<String, Article>. There's one in the Guava library. For larger amounts of data, Apache Lucene will be a better fit.
I would create a map from a keyword (and likewise from an author, a title, etc.) to a set of JournalArticles.
When you create a new JournalArticle, you'd add that article to the appropriate set for each of its keywords.
To do a lookup, you'd do something like:
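One way this could look, as a sketch rather than the answerer's own code: a map from keyword to a set of articles, plus a lookup that intersects the sets when several keywords are given. The KeywordIndex class, its method names, and the cut-down JournalArticle (title and keywords only) are assumptions made for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Cut-down article type for the example; a real one would also carry
// year, authors, journal name, and so on.
class JournalArticle {
    final String title;
    final String[] keywords;

    JournalArticle(String title, String... keywords) {
        this.title = title;
        this.keywords = keywords;
    }
}

// Hypothetical index: keyword -> set of articles carrying that keyword.
class KeywordIndex {
    private final Map<String, Set<JournalArticle>> byKeyword = new HashMap<>();

    // Register the article under each of its keywords.
    void add(JournalArticle article) {
        for (String keyword : article.keywords) {
            byKeyword.computeIfAbsent(keyword, k -> new LinkedHashSet<>()).add(article);
        }
    }

    // Articles carrying the keyword; an empty set if the keyword is unknown.
    Set<JournalArticle> lookup(String keyword) {
        return byKeyword.getOrDefault(keyword, Collections.<JournalArticle>emptySet());
    }

    // Articles carrying every one of the given keywords (set intersection).
    Set<JournalArticle> lookupAll(String... keywords) {
        Set<JournalArticle> result = null;
        for (String keyword : keywords) {
            if (result == null) {
                result = new LinkedHashSet<>(lookup(keyword)); // copy, so the index is not modified
            } else {
                result.retainAll(lookup(keyword));
            }
        }
        return result == null ? Collections.<JournalArticle>emptySet() : result;
    }

    public static void main(String[] args) {
        KeywordIndex index = new KeywordIndex();
        index.add(new JournalArticle("paperA", "cat", "dog", "egg"));
        index.add(new JournalArticle("paperC", "egg", "cat"));
        index.add(new JournalArticle("paperB", "dog", "egg"));
        for (JournalArticle article : index.lookupAll("cat", "dog")) {
            System.out.println(article.title);
        }
    }
}
```

The same structure can be repeated for authors. Matching part of a title still needs a scan or a full-text index, since a hash lookup only finds exact keys.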