How to get information from the Wikipedia movies category

Posted 2020-04-21 00:19

Question:

Is it possible to fetch information from the Wikipedia API by movie category? For example, I have a URL that searches for "avatar", but I don't know how to search for the movie Avatar.

https://en.wikipedia.org/w/api.php?&titles=avatar&format=xml&action=query&prop=extracts|categories|categoryinfo|pageterms|pageprops|pageimages&exintro=&explaintext=&cllimit=max&piprop=original

Answer 1:

Searching by the "movies" category will not be easy because there are many nested categories, but you can use something else: articles about films generally include Template:Infobox film, and you can list all of them with the MediaWiki API:

https://en.wikipedia.org/w/api.php?format=xml&action=query&list=embeddedin&einamespace=0&eilimit=500&eititle=Template:Infobox_film
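A minimal sketch of reading one batch of that response, assuming the layout the API uses for format=xml: one `<ei>` element per page and, when more results remain, a `<continue>` element whose attributes have to be sent back with the next request. The class name `EmbeddedInParser` and the sample XML are illustrative only; the sample is hand-written, not a real reply.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class EmbeddedInParser
{
    // Reads one list=embeddedin reply (format=xml): the page titles, plus the
    // attributes of <continue> (an empty dictionary on the last batch).
    public static (List<string> Titles, Dictionary<string, string> Continue) Parse(string xml)
    {
        var root = XElement.Parse(xml);
        var titles = root.Descendants("ei")
            .Select(e => (string)e.Attribute("title")) // null if the attribute is missing
            .Where(t => t != null)
            .ToList();
        var cont = root.Element("continue");
        var next = cont == null
            ? new Dictionary<string, string>()
            : cont.Attributes().ToDictionary(a => a.Name.LocalName, a => a.Value);
        return (titles, next);
    }

    public static void Main()
    {
        // Hand-written sample in the shape of an API reply; the values are made up
        var sample = @"<api batchcomplete="""">
  <continue eicontinue=""0|12345"" continue=""-||"" />
  <query>
    <embeddedin>
      <ei pageid=""1"" ns=""0"" title=""Avatar (2009 film)"" />
      <ei pageid=""2"" ns=""0"" title=""Titanic (1997 film)"" />
    </embeddedin>
  </query>
</api>";
        var (titles, next) = Parse(sample);
        Console.WriteLine(string.Join("; ", titles));
        Console.WriteLine("eicontinue = " + next["eicontinue"]);
    }
}
```

Returning every `<continue>` attribute, rather than only `eicontinue`, lets the caller echo back whatever the API asks for.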

Then you decide how to search within them: by regex, Contains() or StartsWith(), case-sensitive or not, returning the first match or all matches, and so on.
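The matching choices listed above can be sketched as small filters over the collected titles. This is an illustration, not part of the API: the helper names `StartsWithWord`, `ContainsText` and `FirstStartingWith` are hypothetical, and `Regex.Escape` is used so that a search word such as "Avatar (2009" is matched literally instead of being parsed as a regex.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TitleMatching
{
    // All titles that begin with the whole word: a regex anchored at the start
    // with a word boundary; Regex.Escape makes "(" or "." match literally.
    public static List<string> StartsWithWord(IEnumerable<string> titles, string word, bool ignoreCase)
    {
        var options = ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;
        var regex = new Regex("^" + Regex.Escape(word) + @"\b", options);
        return titles.Where(t => regex.IsMatch(t)).ToList();
    }

    // All titles containing the text anywhere (ordinal comparison).
    public static List<string> ContainsText(IEnumerable<string> titles, string text, bool ignoreCase)
    {
        var cmp = ignoreCase ? StringComparison.OrdinalIgnoreCase : StringComparison.Ordinal;
        return titles.Where(t => t.IndexOf(text, cmp) >= 0).ToList();
    }

    // Only the first title starting with the prefix, or null when none does.
    public static string FirstStartingWith(IEnumerable<string> titles, string prefix, bool ignoreCase)
    {
        var cmp = ignoreCase ? StringComparison.OrdinalIgnoreCase : StringComparison.Ordinal;
        return titles.FirstOrDefault(t => t.StartsWith(prefix, cmp));
    }

    public static void Main()
    {
        var titles = new[] { "Avatar (2009 film)", "Avatars (film)", "Titanic (1997 film)", "avatar (1916 film)" };
        // The word boundary excludes "Avatars (film)"; ignoreCase keeps "avatar (1916 film)"
        Console.WriteLine(string.Join("; ", StartsWithWord(titles, "Avatar", ignoreCase: true)));
        Console.WriteLine(string.Join("; ", ContainsText(titles, "film", ignoreCase: false)));
        Console.WriteLine(FirstStartingWith(titles, "Avatar", ignoreCase: false));
    }
}
```

The regex variant is the strictest of the three; plain StartsWith would also accept "Avatars (film)".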

Here is an example in C# that finds all film articles whose title starts with "Avatar":

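using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;
using System.Xml.Linq;

var articles = GetMovies("Avatar");
...

private static List<string> GetMovies(string word)
{
    // Every main-namespace page (einamespace=0) that includes Template:Infobox film
    var api = "https://en.wikipedia.org/w/api.php?format=xml&action=query&list=embeddedin&" +
        "einamespace=0&eilimit=500&eititle=Template:Infobox_film";
    // Escape the word so characters such as "(" are matched literally
    var pattern = "^" + Regex.Escape(word) + "\\b";
    var articles = new List<string>();
    var next = string.Empty;
    while (true)
    {
        var request = (HttpWebRequest)WebRequest.Create(api + next);
        // Wikimedia asks API clients to send a descriptive User-Agent; replace this placeholder
        request.UserAgent = "MovieListExample/0.1 (contact: you@example.com)";
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            var xElement = XElement.Parse(reader.ReadToEnd());
            // Each <ei> element is one page; keep the titles that start with the word
            articles.AddRange(xElement.Descendants("ei")
                .Select(x => x.Attribute("title").Value)
                .Where(x => Regex.IsMatch(x, pattern)));

            // No <continue> element means this was the last batch
            var cont = xElement.Element("continue");
            if (cont == null) break;

            // Send back every continuation value (eicontinue, continue), URL-encoded
            next = string.Concat(cont.Attributes()
                .Select(a => "&" + a.Name.LocalName + "=" + Uri.EscapeDataString(a.Value)));
        }
    }

    return articles;
}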
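To check the continuation handling without calling Wikipedia, the paging loop above can be separated from the HTTP request. In this sketch `fetch` is a hypothetical delegate that returns the XML body for a URL (in real use it would wrap the web request), and the two in-memory pages stand in for the API; `Paging` and `CollectTitles` are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class Paging
{
    // Follows <continue> elements until a reply arrives without one.
    // fetch: returns the XML body for a URL; keep: decides which titles to collect.
    public static List<string> CollectTitles(string baseUrl, Func<string, string> fetch, Func<string, bool> keep)
    {
        var titles = new List<string>();
        var next = string.Empty;
        while (true)
        {
            var root = XElement.Parse(fetch(baseUrl + next));
            titles.AddRange(root.Descendants("ei")
                .Select(e => (string)e.Attribute("title"))
                .Where(t => t != null && keep(t)));
            var cont = root.Element("continue");
            if (cont == null) return titles;
            // Echo back all continuation attributes, URL-encoded ("|" becomes %7C)
            next = string.Concat(cont.Attributes()
                .Select(a => "&" + a.Name.LocalName + "=" + Uri.EscapeDataString(a.Value)));
        }
    }

    public static void Main()
    {
        // Two fake pages keyed by the exact URL the loop will request
        var pages = new Dictionary<string, string>
        {
            ["base"] = "<api><continue eicontinue=\"0|2\" continue=\"-||\" /><query><embeddedin>" +
                       "<ei title=\"Avatar (2009 film)\" /><ei title=\"Up (2009 film)\" /></embeddedin></query></api>",
            ["base&eicontinue=0%7C2&continue=-%7C%7C"] = "<api><query><embeddedin>" +
                       "<ei title=\"Avatar (1916 film)\" /></embeddedin></query></api>"
        };
        var result = CollectTitles("base", url => pages[url],
            t => t.StartsWith("Avatar", StringComparison.Ordinal));
        Console.WriteLine(string.Join("; ", result));
    }
}
```

Keeping the loop free of network code makes it easy to confirm that the second request carries the encoded continuation values.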

This returns:

Avatar (2009 film)
Avatar (2004 film)
Avatar (1916 film)