Using a Web API

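Requesting data through a URL is known as an API call. The response usually comes back as JSON or CSV.
Here we use GitHub.
GitHub is a website whose name comes from Git, a distributed version control system.
GitHub hosts projects. Everything related to a project, such as its code, collaboration details, and bug reports, is stored in a repository.
The following GitHub API call queries the most-starred Python projects on GitHub: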

https://api.github.com/search/repositories?q=language:python&sort=stars

The beginning of the returned data looks like this:

{
  "total_count": 4735027,
  "incomplete_results": false,
  "items": [
    {
      "id": 83222441,
      "node_id": "MDEwOlJlcG9zaXRvcnk4MzIyMjQ0MQ==",
      "name": "system-design-primer",
      "full_name": "donnemartin/system-design-primer",
      "private": false,
      "owner": {
        "login": "donnemartin",
        "id": 5458997,
        "node_id": "MDQ6VXNlcjU0NTg5OTc=",
        "avatar_url": "https://avatars2.githubusercontent.com/u/5458997?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/donnemartin",
        "html_url": "https://github.com/donnemartin",
        "followers_url": "https://api.github.com/users/donnemartin/followers",
        "following_url": "https://api.github.com/users/donnemartin/following{/other_user}",
        "gists_url": "https://api.github.com/users/donnemartin/gists{/gist_id}",
        "starred_url": "https://api.github.com/users/donnemartin/starred{/owner}{/repo}",
        "subscriptions_url": "https://api.github.com/users/donnemartin/subscriptions",
        "organizations_url": "https://api.github.com/users/donnemartin/orgs",
        "repos_url": "https://api.github.com/users/donnemartin/repos",
        "events_url": "https://api.github.com/users/donnemartin/events{/privacy}",
        "received_events_url": "https://api.github.com/users/donnemartin/received_events",
        "type": "User",
        "site_admin": false
      },
      "html_url": "https://github.com/donnemartin/system-design-primer",
      "description": "Learn how to design large-scale systems. Prep for the system design interview.  Includes Anki flashcards.",
      "fork": false,
...

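Here, total_count is the total number of repositories that match the query, not the number of items contained in this response. incomplete_results is false when GitHub was able to process the query completely (it is true if the search timed out). items holds the detailed information for each returned repository.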
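To make this structure concrete, the sketch below parses a heavily trimmed copy of the response shown above with the standard json module. The `sample` string is hand-shortened for illustration and is an assumption, not the full API output.

```python
import json

# Hand-trimmed copy of the response shown above (most keys omitted).
sample = """
{
  "total_count": 4735027,
  "incomplete_results": false,
  "items": [
    {
      "name": "system-design-primer",
      "full_name": "donnemartin/system-design-primer",
      "owner": {"login": "donnemartin"},
      "html_url": "https://github.com/donnemartin/system-design-primer"
    }
  ]
}
"""

response_dict = json.loads(sample)

# total_count covers every match, not only the items in this response.
total = response_dict["total_count"]
# incomplete_results is False when the query ran to completion.
complete = not response_dict["incomplete_results"]
first = response_dict["items"][0]

print(f"Total matches: {total}")
print(f"Query completed: {complete}")
print(f"First item: {first['full_name']} by {first['owner']['login']}")
```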
The requests package makes it easy for a Python program to request information from a website and process the response. Install it first:

$ python3 -m pip install --user requests
Collecting requests
  Downloading https://files.pythonhosted.org/packages/51/bd/23c926cd341ea6b7dd0b2a00aba99ae0f828be89d72b2190f27c11d4b7fb/requests-2.22.0-py2.py3-none-any.whl (57kB)
    100% |████████████████████████████████| 61kB 67kB/s 
Collecting certifi>=2017.4.17 (from requests)
  Downloading https://files.pythonhosted.org/packages/b9/63/df50cac98ea0d5b006c55a399c3bf1db9da7b5a24de7890bc9cfd5dd9e99/certifi-2019.11.28-py2.py3-none-any.whl (156kB)
    100% |████████████████████████████████| 163kB 9.7kB/s 
Collecting chardet<3.1.0,>=3.0.2 (from requests)
  Downloading https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl (133kB)
    100% |████████████████████████████████| 143kB 14kB/s 
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 (from requests)
  Downloading https://files.pythonhosted.org/packages/e8/74/6e4f91745020f967d09332bb2b8b9b10090957334692eb88ea4afe91b77f/urllib3-1.25.8-py2.py3-none-any.whl (125kB)
    100% |████████████████████████████████| 133kB 9.1kB/s 
Collecting idna<2.9,>=2.5 (from requests)
  Downloading https://files.pythonhosted.org/packages/14/2c/cd551d81dbe15200be1cf41cd03869a46fe7226e7450af7a6545bfc474c9/idna-2.8-py2.py3-none-any.whl (58kB)
    100% |████████████████████████████████| 61kB 14kB/s 
Installing collected packages: certifi, chardet, urllib3, idna, requests
Successfully installed certifi-2019.11.28 chardet-3.0.4 idna-2.8 requests-2.22.0 urllib3-1.25.8

This API returns JSON, which can then be processed as usual. The code in python_repos.py is as follows:

import requests

# Make an API call and store the response.
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
headers = {'Accept': 'application/vnd.github.v3+json'}
r = requests.get(url, headers=headers)
print(f"Status code: {r.status_code}")

# Store API response in a variable.
response_dict = r.json()
print(f"Total repositories: {response_dict['total_count']}")

# Explore information about the repositories.
repo_dicts = response_dict['items']
print(f"Repositories returned: {len(repo_dicts)}")

print("\nSelected information about each repository:")
for repo_dict in repo_dicts:
    print(f"Name: {repo_dict['name']}")
    print(f"Owner: {repo_dict['owner']['login']}")
    print(f"Stars: {repo_dict['stargazers_count']}")
    print(f"Repository: {repo_dict['html_url']}")
    print(f"Created: {repo_dict['created_at']}")
    print(f"Updated: {repo_dict['updated_at']}")
    print(f"Description: {repo_dict['description']}")

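A few points are worth noting:

  • An r.status_code of 200 means the API call succeeded
  • 4735216 repositories matched in total, but details were returned for only 30 of them, because the search results are paginated and the default page size is 30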
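Because the results are paginated, more than 30 repositories can be requested by passing the per_page and page query parameters. The sketch below shows one possible way to do this. The helper names `build_search_params` and `fetch_repos` are hypothetical, and the code assumes the requests package installed above.

```python
SEARCH_URL = "https://api.github.com/search/repositories"
HEADERS = {"Accept": "application/vnd.github.v3+json"}


def build_search_params(language, per_page=30, page=1):
    """Return the query parameters for one page of a stars-sorted search."""
    return {
        "q": f"language:{language}",
        "sort": "stars",
        "per_page": per_page,
        "page": page,
    }


def fetch_repos(language, per_page=30, page=1):
    """Fetch one page of results; raises if the status code signals an error."""
    import requests  # imported lazily so the helper above has no dependency

    params = build_search_params(language, per_page, page)
    r = requests.get(SEARCH_URL, headers=HEADERS, params=params, timeout=10)
    r.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return r.json()["items"]


# Example usage (performs a real network request):
# repos = fetch_repos("python", per_page=50, page=1)
# print(f"Repositories returned: {len(repos)}")
```

Calling raise_for_status() is an alternative to printing the status code by hand: the script stops with an exception instead of continuing with an error response.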

The 30 top-ranked Python projects are listed below for reference; the first few are especially worth a look:

$ p3 python_repos.py |grep Repository
Repository: https://github.com/donnemartin/system-design-primer
Repository: https://github.com/vinta/awesome-python
Repository: https://github.com/public-apis/public-apis
Repository: https://github.com/TheAlgorithms/Python
Repository: https://github.com/tensorflow/models
Repository: https://github.com/ytdl-org/youtube-dl
Repository: https://github.com/nvbn/thefuck
Repository: https://github.com/pallets/flask
Repository: https://github.com/django/django
Repository: https://github.com/keras-team/keras
Repository: https://github.com/jakubroztocil/httpie
Repository: https://github.com/josephmisiti/awesome-machine-learning
Repository: https://github.com/ansible/ansible
Repository: https://github.com/psf/requests
Repository: https://github.com/scikit-learn/scikit-learn
Repository: https://github.com/scrapy/scrapy
Repository: https://github.com/minimaxir/big-list-of-naughty-strings
Repository: https://github.com/shadowsocks/shadowsocks
Repository: https://github.com/ageitgey/face_recognition
Repository: https://github.com/home-assistant/home-assistant
Repository: https://github.com/soimort/you-get
Repository: https://github.com/XX-net/XX-Net
Repository: https://github.com/python/cpython
Repository: https://github.com/deepfakes/faceswap
Repository: https://github.com/testerSunshine/12306
Repository: https://github.com/Avik-Jain/100-Days-Of-ML-Code
Repository: https://github.com/certbot/certbot
Repository: https://github.com/isocpp/CppCoreGuidelines
Repository: https://github.com/floodsung/Deep-Learning-Papers-Reading-Roadmap
Repository: https://github.com/521xueweihan/HelloGitHub

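Most APIs are rate limited. The following output from https://api.github.com/rate_limit shows the limit for each resource. For unauthenticated requests, the core limit applies per hour and the search limit applies per minute: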

{
  "resources": {
    "core": {
      "limit": 60,
      "remaining": 58,
      "reset": 1579964117
    },
    "search": {
      "limit": 10,
      "remaining": 8,
      "reset": 1579962440
    },
    "graphql": {
      "limit": 0,
      "remaining": 0,
      "reset": 1579966020
    },
    "integration_manifest": {
      "limit": 5000,
      "remaining": 5000,
      "reset": 1579966020
    },
    "source_import": {
      "limit": 5,
      "remaining": 5,
      "reset": 1579962480
    }
  },
  "rate": {
    "limit": 60,
    "remaining": 58,
    "reset": 1579964117
  }
}

Here, reset is the time at which the rate limit resets, given as epoch time. The following command converts an epoch time to a readable format:

$ date -d @1579964117
Sat Jan 25 22:55:17 CST 2020
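The same conversion can be done in Python. The sketch below reads the reset values from a trimmed copy of the rate_limit output above and reports them in UTC. The function names `reset_time_utc` and `seconds_until_reset` are hypothetical.

```python
from datetime import datetime, timezone

# Trimmed copy of the rate_limit response shown above.
rate_limit = {
    "resources": {
        "core": {"limit": 60, "remaining": 58, "reset": 1579964117},
        "search": {"limit": 10, "remaining": 8, "reset": 1579962440},
    }
}


def reset_time_utc(reset_epoch):
    """Convert an epoch timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(reset_epoch, tz=timezone.utc)


def seconds_until_reset(reset_epoch, now_epoch):
    """Return how long to wait before the limit resets (never negative)."""
    return max(0, reset_epoch - now_epoch)


for name, info in rate_limit["resources"].items():
    when = reset_time_utc(info["reset"]).isoformat()
    print(f"{name}: {info['remaining']}/{info['limit']} left, resets at {when}")
```

For the core resource this gives 2020-01-25T14:55:17+00:00, which is the same instant as the 22:55:17 CST shown above, since CST here is UTC+8.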

Rate limiting is not only about performance. It is also a security mechanism, and it helps an API scale. Limits are usually enforced per client IP address.

Visualizing Repositories Using Plotly

This part is mainly about building tooltips that display details such as a repository's URL.
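The chart code itself is not reproduced in this post, so the following is only a sketch of how such tooltips might be built from the repo_dicts list in python_repos.py. The function names `build_chart_data` and `plot_repos`, the chart title, and the output filename are assumptions; plotting requires the plotly package.

```python
def build_chart_data(repo_dicts):
    """Build a Plotly-style bar trace with clickable labels and tooltips."""
    repo_links, stars, labels = [], [], []
    for repo in repo_dicts:
        # An HTML anchor turns the x-axis label into a link to the repository.
        repo_links.append(f"<a href='{repo['html_url']}'>{repo['name']}</a>")
        stars.append(repo["stargazers_count"])
        # <br /> inserts a line break inside the tooltip text.
        labels.append(f"{repo['owner']['login']}<br />{repo['description']}")
    return [{"type": "bar", "x": repo_links, "y": stars, "hovertext": labels}]


def plot_repos(repo_dicts, filename="python_repos.html"):
    """Write the chart to an HTML file (requires the plotly package)."""
    from plotly import offline  # lazy import: only needed when plotting

    layout = {"title": "Most-Starred Python Projects on GitHub"}
    figure = {"data": build_chart_data(repo_dicts), "layout": layout}
    offline.plot(figure, filename=filename)
```

The hovertext entry is what Plotly shows when the pointer rests on a bar, so each tooltip here lists the owner and the description.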

Hacker News API

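To see how API calls work on other sites, we take a look at Hacker News.
For example, https://hacker-news.firebaseio.com/v0/item/19155826.json returns the story with ID 19155826, which was the top-ranked article when this example was written: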

{"by":"jimktrains2","descendants":221,"id":19155826,"kids":[19156572,19158857,19156773,19157251,19156415,19159820,19157154,19156385,19156489,19158522,19156755,19156974,19158319,19157034,19156935,19158935,19157531,19158638,19156466,19156758,19156565,19156498,19156335,19156041,19156704,19159047,19159127,19156217,19156375,19157945],"score":728,"time":1550085414,"title":"Nasa’s Mars Rover Opportunity Concludes a 15-Year Mission","type":"story","url":"https://www.nytimes.com/2019/02/13/science/mars-opportunity-rover-dead.html"}
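As an illustration, the sketch below picks a few fields out of a trimmed copy of the item above. The function name `summarize_item` is hypothetical, and the discussion link format is an assumption based on the public Hacker News site URLs.

```python
from datetime import datetime, timezone

# Trimmed copy of the item shown above (the "kids" list and URL are omitted).
item = {
    "by": "jimktrains2",
    "descendants": 221,
    "id": 19155826,
    "score": 728,
    "time": 1550085414,
    "title": "Nasa’s Mars Rover Opportunity Concludes a 15-Year Mission",
    "type": "story",
}


def summarize_item(item):
    """Pick out a few fields; 'descendants' is the total comment count."""
    return {
        "title": item["title"],
        "comments": item.get("descendants", 0),
        "hn_link": f"https://news.ycombinator.com/item?id={item['id']}",
        "posted": datetime.fromtimestamp(item["time"], tz=timezone.utc),
    }


summary = summarize_item(item)
print(f"{summary['title']} ({summary['comments']} comments)")
print(f"Discussion: {summary['hn_link']}")
```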

https://hacker-news.firebaseio.com/v0/topstories.json returns the IDs of the current top-ranked articles:

[22146086,22144210,22137250,22144369,22145826,22145303,22141903,22136430,22144330,22141299,22144411,22135638,22137402,22136905,...,22111385]
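The two endpoints can be combined: fetch the ID list, then request each item in turn. The sketch below is one possible way to do that, assuming the requests package is installed. The function names `item_url`, `rank_by_comments`, and `fetch_top_submissions` are hypothetical.

```python
from operator import itemgetter

TOP_STORIES_URL = "https://hacker-news.firebaseio.com/v0/topstories.json"


def item_url(item_id):
    """Return the API URL for a single Hacker News item."""
    return f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json"


def rank_by_comments(submissions):
    """Sort submission dicts from most to fewest comments."""
    return sorted(submissions, key=itemgetter("comments"), reverse=True)


def fetch_top_submissions(count=5):
    """Fetch the first `count` top stories, one API call per story."""
    import requests  # imported lazily so the helpers above have no dependency

    ids = requests.get(TOP_STORIES_URL, timeout=10).json()
    submissions = []
    for item_id in ids[:count]:
        item = requests.get(item_url(item_id), timeout=10).json()
        submissions.append({
            "title": item.get("title", ""),
            # Some items carry no 'descendants' key, so default to 0.
            "comments": item.get("descendants", 0),
        })
    return rank_by_comments(submissions)


# Example usage (performs real network requests):
# for s in fetch_top_submissions(3):
#     print(f"{s['comments']:>4}  {s['title']}")
```

Keeping count small limits the number of requests, since every story needs its own call.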

For more information about the Hacker News API, see its official documentation.
