Searching Wikipedia using API

Posted 2020-05-16 02:11

Question:

I want to search Wikipedia using the query action. I am using this url:

http://en.wikipedia.org/w/api.php?action=query&format=json&list=search&srsearch=apple

That works, but I want to get just the first result of the search. How can I do that?

Note: That URL works fine when there is only one result. I just need the title and a short description.

Answer 1:

I don't think you can do both in one query.

1. To get the first result, use the Opensearch API.

https://en.wikipedia.org/w/api.php?action=opensearch&search=zyz&limit=1&namespace=0&format=jsonfm

https://en.wikipedia.org/w/api.php
?action=opensearch
&search=zyz          # search query
&limit=1             # return only the first result
&namespace=0         # search only articles, ignoring Talk, Mediawiki, etc.
&format=json         # jsonfm prints the JSON in HTML for debugging.

This will return:

[
    "Zyz",
    [
        "Zyzomys"
    ],
    [
        ""
    ],
    [
        "https://en.wikipedia.org/wiki/Zyzomys"
    ]
]
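
As a minimal sketch of step 1, the request URL can be assembled from the parameters listed above and the first hit read out of the four-element response array. The helper names `buildOpenSearchUrl` and `firstOpenSearchHit` are hypothetical and used only for illustration; no network call is made here.

```javascript
// Sketch only: helper names are hypothetical, not part of any Wikipedia library.

// Build an Opensearch URL using the parameters described above.
function buildOpenSearchUrl(term, limit = 1) {
  const params = new URLSearchParams({
    action: 'opensearch',
    search: term,          // search query (URL-encoded by URLSearchParams)
    limit: String(limit),  // 1 = return only the first result
    namespace: '0',        // articles only
    format: 'json',
  });
  return 'https://en.wikipedia.org/w/api.php?' + params.toString();
}

// The response is an array: [query, titles, descriptions, urls].
// Return the first hit as an object, or null when there is no match.
function firstOpenSearchHit(response) {
  const [, titles, descriptions, urls] = response;
  if (!Array.isArray(titles) || titles.length === 0) {
    return null;
  }
  return {
    title: titles[0],
    description: (descriptions || [])[0] || '',
    url: (urls || [])[0] || '',
  };
}
```

Passing the sample response shown above to `firstOpenSearchHit` yields the title `Zyzomys` with an empty description. When calling the API from a browser on another origin, the request also needs the `origin=*` parameter.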

2. You now have the article name of the first search result. To get the article's first paragraph (or description, as you call it), see my answer here: https://stackoverflow.com/a/19781754/908703
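
One common way to fetch an article's opening paragraph is the `prop=extracts` query with `exintro` and `explaintext`. This is a sketch under that assumption and may differ from what the linked answer does. The helper names `buildIntroExtractUrl` and `firstExtract` are hypothetical, and the `redirects` parameter is an optional addition so that redirect titles resolve to their target article.

```javascript
// Sketch only: helper names are hypothetical.

// Build a query URL that asks for the plain-text intro of one article.
function buildIntroExtractUrl(title) {
  const params = new URLSearchParams({
    action: 'query',
    prop: 'extracts',
    exintro: '1',      // only the text before the first section heading
    explaintext: '1',  // plain text instead of HTML
    redirects: '1',    // follow redirects to the target article
    titles: title,
    format: 'json',
  });
  return 'https://en.wikipedia.org/w/api.php?' + params.toString();
}

// "query.pages" is an object keyed by page ID; take its first entry.
// Returns null when the page is missing or carries no extract.
function firstExtract(response) {
  const pages = response && response.query && response.query.pages;
  if (!pages) {
    return null;
  }
  const page = Object.values(pages)[0];
  if (!page || typeof page.extract !== 'string') {
    return null;
  }
  return { title: page.title, extract: page.extract };
}
```

The title returned by step 1 can be passed straight to `buildIntroExtractUrl`; `URLSearchParams` takes care of encoding spaces and other special characters.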



Answer 2:

Actually, the Wikipedia JSON API only returns useful results when the query is exactly right, so I recommend using Wikipedia's own search page instead, then crawling the actual article and parsing it with BeautifulSoup:

https://en.wikipedia.org/w/index.php?search=QUERY&title=Special:Search&fulltext=Search

There is also a Python module called wikipedia that does this for you.



Answer 3:

Here is my solution:

// Created By Pawan Mall | www.pawanmall.net
$(document).ready(function() {
  $('#sTerm').focus();
  $('#resultArea').hide();
  $('#searchArticle').on('click', function() {
    $('#resultArea').show();
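    const searchTerm = $('#sTerm').val();
    // Encode the term so spaces, '&', '#', etc. do not break the query string.
    const surl = 'https://en.wikipedia.org/w/api.php?action=query&prop=extracts&origin=*&format=json&generator=search&gsrnamespace=0&gsrlimit=1&gsrsearch=' + encodeURIComponent(searchTerm);
    $.ajax({
      url: surl,
      // origin=* in the URL enables anonymous CORS, so plain JSON works.
      // Access-Control-Allow-Origin is a response header set by the server,
      // so no custom request headers are needed here.
      method: 'GET',
      dataType: 'json',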
      beforeSend: function(){
        // $("#loader").show();
        $('#resultArea').html('<div class="text-center"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" style="margin: auto; background: #fff; display: block;" width="25%" height="25%" viewBox="0 0 100 100" preserveAspectRatio="xMidYMid"><g transform="translate(50 50)"><g transform="scale(0.7)"><g transform="translate(-50 -50)"><g transform="translate(-3.20642 -20)"><animateTransform attributeName="transform" type="translate" repeatCount="indefinite" dur="1s" values="-20 -20;20 -20;0 20;-20 -20" keyTimes="0;0.33;0.66;1"></animateTransform><path fill="#5699d2" d="M44.19 26.158c-4.817 0-9.345 1.876-12.751 5.282c-3.406 3.406-5.282 7.934-5.282 12.751 c0 4.817 1.876 9.345 5.282 12.751c3.406 3.406 7.934 5.282 12.751 5.282s9.345-1.876 12.751-5.282 c3.406-3.406 5.282-7.934 5.282-12.751c0-4.817-1.876-9.345-5.282-12.751C53.536 28.033 49.007 26.158 44.19 26.158z"></path><path fill="#1d3f72" d="M78.712 72.492L67.593 61.373l-3.475-3.475c1.621-2.352 2.779-4.926 3.475-7.596c1.044-4.008 1.044-8.23 0-12.238 c-1.048-4.022-3.146-7.827-6.297-10.979C56.572 22.362 50.381 20 44.19 20C38 20 31.809 22.362 27.085 27.085 c-9.447 9.447-9.447 24.763 0 34.21C31.809 66.019 38 68.381 44.19 68.381c4.798 0 9.593-1.425 13.708-4.262l9.695 9.695 l4.899 4.899C73.351 79.571 74.476 80 75.602 80s2.251-0.429 3.11-1.288C80.429 76.994 80.429 74.209 78.712 72.492z M56.942 56.942 c-3.406 3.406-7.934 5.282-12.751 5.282s-9.345-1.876-12.751-5.282c-3.406-3.406-5.282-7.934-5.282-12.751 c0-4.817 1.876-9.345 5.282-12.751c3.406-3.406 7.934-5.282 12.751-5.282c4.817 0 9.345 1.876 12.751 5.282 c3.406 3.406 5.282 7.934 5.282 12.751C62.223 49.007 60.347 53.536 56.942 56.942z"></path></g></g></g></g></svg></div>')
       },
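      success: function(data){
        $('#resultArea').empty();
        // The API omits "query" when the search has no hits.
        if (!data.query || !data.query.pages) {
          $('#resultArea').html('<div>No results found.</div>');
          return;
        }
        // "pages" is keyed by page ID; with gsrlimit=1 there is only one key.
        const dataNum = Object.keys(data.query.pages)[0];
        let newTitle = '<h1 class="alert alert-info text-center"><strong>'+data.query.pages[dataNum].title+'</strong></h1>';
        $('#resultArea').html(`${newTitle}<div>${data.query.pages[dataNum].extract}</div>`);
      },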
      complete: function(){
        $('#sTerm').val('');
        $('#sTerm').focus();
      }
    });
  });
});
<link href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/4.4.1/css/bootstrap.min.css" rel="stylesheet"/>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<h1 class="p-5 alert alert-dark text-center">
  <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/80/Wikipedia-logo-v2.svg/225px-Wikipedia-logo-v2.svg.png" class="w-25" />
  <p>Search Article on Wikipedia via  Wikipedia  Search API</p>
</h1>  
<div class="mt-2 p-5">
            <div class="row pb-4">
              <div class="col-md-10">
                <input type="text" id="sTerm" class="form-control" placeholder="Type here to search an article on wikipedia... By Pawan Mall | www.pawanmall.net">
              </div>
              <div class="col-md-2">
                <button type="button" id="searchArticle" class="btn btn-primary btn-block">Search</button>
              </div>
            </div>
            <div id="resultArea" class="form-control border-0" contenteditable="false"></div>
          </div>