How can I use Mathematica and Google scholar to find the number of papers a person published in 2011?
Question:
Answer 1:
Google Scholar is not well suited to this goal, as it doesn't have a formal API AFAIK. It also doesn't provide results in a structured (e.g. XML) format. So we have to resort to a quick (and very, very fragile!) text pattern matching hack like:
searchGoogleScholarAuthor[author_String] :=
 First[StringCases[
   Import["http://scholar.google.com/scholar?start=0&num=1&q=" <>
     StringDrop[
      StringJoin @@ ("author:" <> # <> "+" & /@ StringSplit[author]), -1] <>
     "&hl=en&as_sdt=1,5"],
   ___ ~~ "Results" ~~ ___ ~~ "of about" ~~ Shortest[___] ~~
    p : Longest[(DigitCharacter | ",") ..] ~~ ___ ~~ "." ~~ ___ ~~
    "(" ~~ ___ :> p]]
In[191]:= searchGoogleScholarAuthor["A Einstein"]
Out[191]= "6,400"
In[190]:= searchGoogleScholarAuthor["Einstein"]
Out[190]= "9,400"
In[192]:= searchGoogleScholarAuthor["Wizard"]
Out[192]= "197"
In[193]:= searchGoogleScholarAuthor["Vries"]
Out[193]= "70,700"
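If you don't like the string result, strip the thousands separators and convert it to a number; ToExpression on its own will not turn a string like "6,400" into 6400. If you want to restrict the publication years, you can add &as_ylo=2011&as_yhi=2011& to the search string and change the start and end years appropriately.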
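As a rough sketch of those two tweaks, the helpers below build the year-restricted search URL and convert a hit count to an integer. The names scholarAuthorURL and hitCountToInteger are hypothetical (they are not part of the code above), and placing as_ylo/as_yhi at the end of the query string is an assumption. Nothing here touches the network.

```mathematica
(* Hypothetical helper: build the Scholar query URL for an author,
   restricted to publication years ylo..yhi (assumed parameter placement). *)
scholarAuthorURL[author_String, {ylo_Integer, yhi_Integer}] :=
 "http://scholar.google.com/scholar?start=0&num=1&q=" <>
  StringJoin[Riffle["author:" <> # & /@ StringSplit[author], "+"]] <>
  "&hl=en&as_sdt=1,5" <>
  "&as_ylo=" <> ToString[ylo] <> "&as_yhi=" <> ToString[yhi]

(* Hypothetical helper: turn a count such as "6,400" into an integer
   by removing the thousands separators first. *)
hitCountToInteger[s_String] := FromDigits[StringReplace[s, "," -> ""]]

scholarAuthorURL["A Einstein", {2011, 2011}]
hitCountToInteger["6,400"]   (* 6400 *)
```

To use it with the function above, the Import call would take scholarAuthorURL[author, {2011, 2011}] in place of the hand-built URL string.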
Please note that authors with popular names will generate lots of spurious hits, as there is no way to uniquely identify a single author. Additionally, Scholar returns a wide variety of hits, including citations, books, reprints and more. So, really, this ain't very useful for counting.
A bit of explanation:
Scholar splits the initials and names of authors and co-authors over several author: fields combined with a +. The StringDrop[StringJoin @@ ("author:" <> # <> "+" & /@ StringSplit[author]), -1] part of the code takes care of that. The StringDrop removes the last +.
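To make the query-building step concrete, here it is evaluated piece by piece for the input "A Einstein". The intermediate variable names (parts, fields, joined) are only for illustration and do not appear in the function itself.

```mathematica
(* Split the name into its whitespace-separated pieces. *)
parts = StringSplit["A Einstein"]             (* {"A", "Einstein"} *)

(* Wrap each piece as an author: field followed by a "+". *)
fields = "author:" <> # <> "+" & /@ parts     (* {"author:A+", "author:Einstein+"} *)

(* Concatenate the fields into a single string. *)
joined = StringJoin @@ fields                 (* "author:A+author:Einstein+" *)

(* Drop the trailing "+" left over from the last field. *)
StringDrop[joined, -1]                        (* "author:A+author:Einstein" *)
```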
The StringCases part contains a large text pattern that searches for the text Scholar places at the top of each results page, which contains the number of hits. This number is then isolated and returned.
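The idea can be tried offline with a much simpler pattern than the one in the function. The sample string below is made up for illustration and is only assumed to resemble the header line of a results page; the real page text may differ, which is exactly why this approach is fragile.

```mathematica
(* Made-up sample text, assumed to resemble the results header. *)
sample = "Results 1 - 1 of about 6,400. (0.05 sec)";

(* Simplified pattern: capture the run of digits and commas
   that follows the literal text "of about ". *)
First[StringCases[sample,
  "of about " ~~ p : (DigitCharacter | ",") .. :> p]]
(* "6,400" *)
```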