Given an IMDB movie id, how do I programmatically

2020-02-02 09:38发布

站内文章 / 移动开发

33 0

疯言疯语

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

movie id tt0438097 can be found at http://www.imdb.com/title/tt0438097/

What's the url for its poster image?

回答1:

As I'm sure you know, the actual url for that image is

http://ia.media-imdb.com/images/M/MV5BMTI0MDcxMzE3OF5BMl5BanBnXkFtZTcwODc3OTYzMQ@@._V1._SX100_SY133_.jpg

You're going to be hard pressed to figure out how it's generated though and they don't seem to have a publicly available API.

Screenscraping is probably your best bet.

The picture seems to generally be inside a div with class=photo and the name of the a tag is poster.

The image itself is just inside the a tag.

回答2:

Check out ~~http://www.imdbapi.com/,~~ It returns Poster url in string.

For example, check ~~http://www.imdbapi.com/?i=&t=inception~~ and you'll get the poster address: Poster":"http://ia.media-imdb.com/images/M/MV5BMjAxMzY3NjcxNF5BMl5BanBnXkFtZTcwNTI5OTM0Mw@@._V1._SX320.jpg"

Update: Seems like the site owner had some arguments with IMDB legal staff. As mentioned in the original site, new site's address is http://www.omdbapi.com/

回答3:

The URL is a random string as far as I can tell.

It can still be easily retrieved. It is the only img inside the anchor named poster.

So, if you are reading the source, simply search for <a name="poster" and it will be the text following the first src=" from there.

However, you will need to keep the screen scraping code updated because that will probably change.

You should also be aware that the images are copyrighted, so be careful to only use the image under a good "fair use" rationale.

回答4:

If a thumb is enough, you can use the Facebook Graph API: http://graph.facebook.com/?ids=http://www.imdb.com/title/tt0438097/

Gets you a thumbnail: http://profile.ak.fbcdn.net/hprofile-ak-ash2/50289_117058658320339_650214_s.jpg

回答5:

I know that it is way too late, but in my project I used this:-

Use omdbapi, Lets take example of Inception, use www.omdbapi.com/?t=inception it will return a json object.
In that json object get the "Poster" object, it contains the poster for the image.

回答6:

omdbapi works, but I found out you cannot really use these images (because of screen scraping and they are blocked anyway if you use them in an img tag)

The best solution is to use tmdb.org :

1 use your imdbid in this api url:

https://api.themoviedb.org/3/find/tt0111161?api_key=___YOURAPIKEY___&external_source=imdb_id

2 Retrieve the json response and select the poster_path attribute:

"poster_path":"/9O7gLzmreU0nGkIB6K3BsJbzvNv.jpg"

3 Prepend this path with "http://image.tmdb.org/t/p/original", and you will have the poster URL that you can use in an img tag :-)

4 You can even change sizes like this:

http://image.tmdb.org/t/p/original/9O7gLzmreU0nGkIB6K3BsJbzvNv.jpg
http://image.tmdb.org/t/p/w150/9O7gLzmreU0nGkIB6K3BsJbzvNv.jpg

回答7:

You can use imdb-cli tool to download movie's poster, e.g.

omdbtool -t "Ice Age: The Meltdown" | wget `sed -n '/^poster/{n;p;}'`

回答8:

Be aware tough, that the terms of service explicitly forbid screenscraping. You can download the IMDB database as a set of text files, but as I understand it, the IMDB movie ID is nowhere to be found in these text files.

回答9:

You can use Trakt API, you have to make a search request with the imdb ID, and the Json result given by Trakt API contains links for two images of that movie (poster and fan art) http://trakt.tv/api-docs/search-movies

回答10:

I've done something similar using phantomjs and wget. This bit of phantomjs accepts a search query and returns the first result's movie poster url. You could easily change it to your needs.

var system = require('system');

if (system.args.length === 1) {
  console.log('Usage: moviePoster.js <movie name>');
  phantom.exit();
}

var formattedTitle = encodeURIComponent(system.args[1]).replace(/%20/g, "+");
var page = require('webpage').create();
page.open('http://m.imdb.com/find?q=' + formattedTitle, function() {
  var url = page.evaluate(function() {
    return 'http://www.imdb.com' + $(".title").first().find('a').attr('href');
  });
  page.close();
  page = require('webpage').create();
  page.open(url, function() {
    var url = page.evaluate(function() {
      return 'http://www.imdb.com' + $("#img_primary").find('a').attr('href');
    });
    page.close();
    page = require('webpage').create();
    page.open(url, function() {
      var url = page.evaluate(function() {
        return $(".photo").first().find('img').attr('src');
      });
      console.log(url);
      page.close();
      phantom.exit();
    });
  });
});

I download the image using wget for many movies in a directory using this bash script. The mp4 files have names that the IMDB likes, and that's why the first search result is nearly guaranteed to be correct. Names like "Love Exposure (2008).mp4".

for file in *.mp4; do
  title="${file%.mp4}"
  if [ ! -f "${title}.jpg" ] 
    then
      wget `phantomjs moviePoster.js "$title"` -O "${title}.jpg"
  fi
done

Then minidlna uses the movie poster when it builds the thumbnail database, because it has the same name as the video file.

回答11:

$Movies = Get-ChildItem -path "Z:\MOVIES\COMEDY" | Where-Object {$_.Extension -eq ".avi" -or $_.Extension -eq ".mp4" -or $_.Extension -eq ".mkv" -or $_.Extension -eq<br>  <br>".flv" -or $_.Extension -eq ".xvid" -or $_.Extension -eq ".divx"} | Select-Object Name, FullName | Sort Name <br>
#Grab all the extension types and filter the ones I ONLY want <br>
<br>
$COMEDY = ForEach($Movie in $Movies) <br>
{<br>
        $Title = $($Movie.Name)<br>
        #Remove the file extension<br>
        $Title = $Title.split('.')[0] <br>       
<br>
        #Changing the case to all lower <br>       
        $Title = $Title.ToLower()<br>
<br>
        #Replace a space w/ %20 for the search structure<br>
        $searchTitle = $Title.Replace(' ','%20')       <br>
<br>
        #Fetching search results<br>
        $moviesearch = Invoke-WebRequest "http://www.imdb.com/search/title?title=$searchTitle&title_type=feature"<br>
         <br>
        #Moving html elements into variable<br>
        $titleclassarray = $moviesearch.AllElements | where Class -eq 'title' | select -First 1<br>
<br>
        #Checking if result contains movies<br>
        try<br><br>
        {
            $titleclass = $titleclassarray[0]<br>
        }<br>
        catch<br>
        {<br>
            Write-Warning "No movie found matching that title http://www.imdb.com/search/title?title=$searchTitle&title_type=feature"<br>
        }      <br>
                   <br>
        #Parcing HTML for movie link<br>
        $regex = "<\s*a\s*[^>]*?href\s*=\s*[`"']*([^`"'>]+)[^>]*?>"<br>
        $linksFound = [Regex]::Matches($titleclass.innerHTML, $regex, "IgnoreCase")<br>
         <br><br>

        #Fetching the first result from <br>
        $titlelink = New-Object System.Collections.ArrayList<br>
        foreach($link in $linksFound)<br>
        {<br>
            $trimmedlink = $link.Groups[1].Value.Trim()<br>
            if ($trimmedlink.Contains('/title/'))<br>
            {<br>
                [void] $titlelink.Add($trimmedlink)<br>
            }<br>
        }<br>
        #Fetching movie page<br>
        $movieURL = "http://www.imdb.com$($titlelink[0])"<br>
        <br>
        #Grabbing the URL for the Movie Poster<br>
        $MoviePoster = ((Invoke-WebRequest –Uri $movieURL).Images | Where-Object {$_.title -like "$Title Poster"} | Where src -like "http:*").src  <br> 
<br>
        $MyVariable = "<a href=" + '"' + $($Movie.FullName) + '"' + " " + "title='$Title'" + ">"<br>
        $ImgLocation = "<img src=" + '"' + "$MoviePoster" + '"' + "width=" + '"' + "225" + '"' + "height=" + '"' + "275" + '"' + "border=" + '"' + "0" + '"' + "alt=" +<br> '"' + $Title + '"' + "></a>" + "&nbsp;" + "&nbsp;" + "&nbsp;"+ "&nbsp;" + "&nbsp;" + "&nbsp;"+ "&nbsp;" + "&nbsp;" + "&nbsp;"<br>
        <br>
        Write-Output $MyVariable, $ImgLocation<br>
       <br>
    }$COMEDY | Out-File z:\db\COMEDY.htm  <br>
<br>
    $after = Get-Content z:\db\COMEDY.htm <br>
<br>
    #adding a back button to the Index <br>
    $before = Get-Content z:\db\before.txt<br>
<br>
    #adding the back button prior to the poster images content<br>
    Set-Content z:\db\COMEDY.htm –value $before, $after<br>

回答12:

Those poster images don't appear to have any correlation to the title page, so you'll have to retrieve the title page first, and then retrieve the img element for the page. The good news is that the img tag is wrapped in an a tag with name="poster". You didn't say what kind of tools you are using, but this basically a screen scraping operation.

回答13:

Here is my program to generate human readable html summary page for movie companies found on imdb page. Change the initial url to your liking and it generates a html file where you can see title, summary, score and thumbnail.

npm install -g phantomjs

Here is the script, save it to imdb.js

var system = require('system');

var page = require('webpage').create();
page.open('http://www.imdb.com/company/co0026841/?ref_=fn_al_co_1', function() {
  console.log('Fetching movies list');
  var movies = page.evaluate(function() {
    var list = $('ol li');
    var json = []
    $.each(list, function(index, listItem) {
      var link = $(listItem).find('a');
      json.push({link: 'http://www.imdb.com' + link.attr('href')});
    });
    return json;
  });
  page.close();

  console.log('Found ' + movies.length + ' movies');

  fetchMovies(movies, 0);
});

function fetchMovies(movies, index) {
  if (index == movies.length) {
    console.log('Done');

    console.log('Generating HTML');
    genHtml(movies);

    phantom.exit();
    return;
  }
  var movie = movies[index];

  console.log('Requesting data for '+ movie.link);

  var page = require('webpage').create();
  page.open(movie.link, function() {
    console.log('Fetching data');
    var data = page.evaluate(function() {
      var title = $('.title_wrapper h1').text().trim();
      var summary = $('.summary_text').text().trim();
      var rating = $('.ratingValue strong').attr('title');
      var thumb = $('.poster img').attr('src');

      if (title == undefined || thumb == undefined) {
        return null;
      }
      return { title: title, summary: summary, rating: rating, thumb: thumb };
    });

    if (data != null) {
      movie.title = data.title;
      movie.summary = data.summary;
      movie.rating = data.rating;
      movie.thumb = data.thumb;
      console.log(movie.title)
      console.log('Request complete');
    } else {
      movies.slice(index, 1);
      index -= 1;
      console.log('No data found');
    }
    page.close();
    fetchMovies(movies, index + 1);
  });
}

function genHtml(movies) {
  var fs = require('fs');

  var path = 'movies.html';
  var content = Array();

  movies.forEach(function(movie) {
    var section = '';

    section += '<div>';
    section += '<h3>'+movie.title+'</h3>';
    section += '<p>'+movie.summary+'</p>';
    section += '<p>'+movie.rating+'</p>';
    section += '<img src="'+movie.thumb+'">';
    section += '</div>';

    content.push(section);
  });

  var html = '<html>'+content.join('\n')+'</html>';

  fs.write(path, html, 'w');
}

And run it like so

phantomjs imdb.js

回答14:

$Title = $($Movie.Name)

$searchTitle = $Title.Replace(' ','%20')  

$moviesearch = Invoke-WebRequest "http://www.imdb.com/search/title?title=$searchTitle&title_type=feature"

$titleclassarray = $moviesearch.AllElements | where Class -eq 'loadlate' | select -First 1

$MoviePoster = $titleclassarray.loadlate

回答15:

After playing around with @Hawk's BASE64 discovery above, I found that everything after the BASE64 code is display info. If you remove everything between the last @ and .jpg it will load the image in the highest res it has.

https://m.media-amazon.com/images/M/MV5BMjAwODg3OTAxMl5BMl5BanBnXkFtZTcwMjg2NjYyMw@@._V1_UX182_CR0,0,182,268_AL_.jpg

becomes

https://m.media-amazon.com/images/M/MV5BMjAwODg3OTAxMl5BMl5BanBnXkFtZTcwMjg2NjYyMw@@.jpg

回答16:

Now a days, all modern browser have "Inspect" section:

100% Correct for Google Chrome only:

Take your cursor on image.
Right click on it, select "Inspect Element".
In the window appear, under Elements tab you will find the highlighted text as
Just click on it.
In the Resource tab, right click on image.
Select "Copy image URL" option.

Try to paste it any where as URL in any browser, you will only get the image.

标签： imdb