Python requests arguments / dealing with API pagination

Posted 2020-02-08 13:38

I'm playing around with the Angel List (AL) API and want to pull all jobs in San Francisco. Since I couldn't find an active Python wrapper for the API (if I make any headway, I think I'd like to make my own), I'm using the requests library.

The AL API's results are paginated, and I can't figure out how to move beyond the first page.

Here is my code:

import requests
r_sanfran = requests.get("https://api.angel.co/1/tags/1664/jobs").json()
r_sanfran.keys()
# returns [u'per_page', u'last_page', u'total', u'jobs', u'page']
r_sanfran['last_page']
# returns 16
r_sanfran['page']
# returns 1

I tried adding arguments to requests.get, but that didn't work. I also tried something really dumb: changing the value of the 'page' key, as if that was magically going to paginate for me.

e.g. r_sanfran['page'] = 2

I'm guessing it's something relatively simple, but I can't seem to figure it out so any help would be awesome.

Thanks as always.

Here's the Angel List API documentation if it's helpful.

3 Answers
干净又极端
#2 · 2020-02-08 13:58

I came across a scenario where the API didn't return page numbers but rather a min/max value acting as a next-page token. I wrote this, and I think it will work for both situations: it automatically advances the token until it reaches the end, and then stops the while loop.

import requests

# 'url' and 'headers' are assumed to be defined elsewhere; the response's
# 'page' field is expected to hold the next page token (None at the end).
max_version = [1]
while len(max_version) > 0:
    r = requests.get(url, headers=headers, params={"page": max_version[0]}).json()
    next_page = r['page']
    if next_page is not None:
        max_version[0] = next_page
        # Process data...
    else:
        max_version.clear()  # Stop the while loop
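
If the one-element list feels like a workaround, a plain variable reads more directly. A minimal sketch of the same loop, assuming (as above) that the response's 'page' field carries the next page token and is None once the last page has been reached:

page_token = 1
while page_token is not None:
    r = requests.get(url, headers=headers, params={"page": page_token}).json()
    # Process r's data here...
    page_token = r['page']  # None when the API signals the end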
劳资没心,怎么记你
#3 · 2020-02-08 14:08

Read last_page and make a GET request for each page in the range:

import requests

r_sanfran = requests.get("https://api.angel.co/1/tags/1664/jobs").json()
num_pages = r_sanfran['last_page']

for page in range(2, num_pages + 1):
    r_sanfran = requests.get("https://api.angel.co/1/tags/1664/jobs", params={'page': page}).json()
    print(r_sanfran['page'])
    # TODO: extract the data
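
To actually collect the listings rather than just print page numbers, you can accumulate each response's 'jobs' list (that key appears in the question's output). A minimal sketch:

import requests

url = "https://api.angel.co/1/tags/1664/jobs"
first = requests.get(url).json()
all_jobs = first['jobs']

# Fetch the remaining pages and append their listings.
for page in range(2, first['last_page'] + 1):
    all_jobs.extend(requests.get(url, params={'page': page}).json()['jobs'])

print(len(all_jobs))  # should roughly match first['total']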
一纸荒年 Trace。
#4 · 2020-02-08 14:11

Improving on @alecxe's answer: if you use a Python generator and a requests HTTP session, you can improve performance and resource usage when querying lots of pages or very large pages.

import requests

session = requests.Session()

def get_jobs():
    url = "https://api.angel.co/1/tags/1664/jobs"
    first_page = session.get(url).json()
    yield first_page
    num_pages = first_page['last_page']

    for page in range(2, num_pages + 1):
        next_page = session.get(url, params={'page': page}).json()
        yield next_page

for page in get_jobs():
    # TODO: process the page
    pass
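
As a usage sketch, you can flatten the pages into individual job records and stop early; because get_jobs() is a generator, only the pages you actually consume are requested (this assumes each page carries a 'jobs' list, as in the question's output):

from itertools import islice

def iter_jobs():
    # Lazily yield individual job records across all pages.
    for page in get_jobs():
        yield from page['jobs']

# Peek at the first five jobs; only the pages needed get fetched.
for job in islice(iter_jobs(), 5):
    print(job)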