会刮什么

图片

先决条件

使用 CSS 选择器抓取基本知识

CSS 选择器声明样式适用于标记的哪一部分,从而允许从匹配的标签和属性中提取数据。

如果你还没有使用 CSS 选择器,我有一篇专门的博客文章,关于如何在网络抓取时使用 CSS 选择器涵盖了它是什么、优点和缺点,以及为什么从网络抓取的角度来看它们很重要.

独立的虚拟环境

简而言之,它创建了一组独立的已安装库,包括可以在同一系统中相互共存的不同 Python 版本,从而防止库或 Python 版本冲突。

如果您之前没有使用过虚拟环境,请查看使用 Virtualenv 和 Poetry](https://serpapi.com/blog/python-virtual-environments-using-virtualenv-and-poetry/)我的博客文章的专用[Python 虚拟环境教程,以便更加熟悉。

📌注意:这不是这篇博文的严格要求。

安装库:

pip install requests parsel

减少被屏蔽的机会

请求可能会被阻止。看看如何减少网页抓取时被阻止的机会,有十一种方法可以绕过大多数网站的阻止。


抓取 Google Finance Ticker 报价数据

def scrape_google_finance(ticker: str):
    params = {
        "hl": "en" # language
        }

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36",
        }

    html = requests.get(f"https://www.google.com/finance/quote/{ticker}", params=params, headers=headers, timeout=30)
    selector = Selector(text=html.text)

    # where all extracted data will be temporary located
    ticker_data = {
        "ticker_data": {},
        "about_panel": {},
        "news": {"items": []},
        "finance_perfomance": {"table": []}, 
        "people_also_search_for": {"items": []},
        "interested_in": {"items": []}
    }

    # current price, quote, title extraction
    ticker_data["ticker_data"]["current_price"] = selector.css(".AHmHk .fxKbKc::text").get()
    ticker_data["ticker_data"]["quote"] = selector.css(".PdOqHc::text").get().replace(" • ",":")
    ticker_data["ticker_data"]["title"] = selector.css(".zzDege::text").get()

    # about panel extraction
    about_panel_keys = selector.css(".gyFHrc .mfs7Fc::text").getall()
    about_panel_values = selector.css(".gyFHrc .P6K39c").xpath("normalize-space()").getall()

    for key, value in zip_longest(about_panel_keys, about_panel_values):
        key_value = key.lower().replace(" ", "_")
        ticker_data["about_panel"][key_value] = value

    # description "about" extraction
    ticker_data["about_panel"]["description"] = selector.css(".bLLb2d::text").get()
    ticker_data["about_panel"]["extensions"] = selector.css(".w2tnNd::text").getall()

    # news extarction
    if selector.css(".yY3Lee").get():
        for index, news in enumerate(selector.css(".yY3Lee"), start=1):
            ticker_data["news"]["items"].append({
                "position": index,
                "title": news.css(".Yfwt5::text").get(),
                "link": news.css(".z4rs2b a::attr(href)").get(),
                "source": news.css(".sfyJob::text").get(),
                "published": news.css(".Adak::text").get(),
                "thumbnail": news.css("img.Z4idke::attr(src)").get()
            })
    else: 
        ticker_data["news"]["error"] = f"No news result from a {ticker}."

    # finance perfomance table
    if selector.css(".slpEwd .roXhBd").get():
        fin_perf_col_2 = selector.css(".PFjsMe+ .yNnsfe::text").get()           # e.g. Dec 2021
        fin_perf_col_3 = selector.css(".PFjsMe~ .yNnsfe+ .yNnsfe::text").get()  # e.g. Year/year change

        for fin_perf in selector.css(".slpEwd .roXhBd"):
            if fin_perf.css(".J9Jhg::text , .jU4VAc::text").get():
                perf_key = fin_perf.css(".J9Jhg::text , .jU4VAc::text").get()   # e.g. Revenue, Net Income, Operating Income..
                perf_value_col_1 = fin_perf.css(".QXDnM::text").get()           # 60.3B, 26.40%..   
                perf_value_col_2 = fin_perf.css(".gEUVJe .JwB6zf::text").get()  # 2.39%, -21.22%..

                ticker_data["finance_perfomance"]["table"].append({
                    perf_key: {
                        fin_perf_col_2: perf_value_col_1,
                        fin_perf_col_3: perf_value_col_2
                    }
                })
    else:
        ticker_data["finance_perfomance"]["error"] = f"No 'finence perfomance table' for {ticker}."

    # "you may be interested in" results
    if selector.css(".HDXgAf .tOzDHb").get():
        for index, other_interests in enumerate(selector.css(".HDXgAf .tOzDHb"), start=1):
            ticker_data["interested_in"]["items"].append(discover_more_tickers(index, other_interests))
    else:
        ticker_data["interested_in"]["error"] = f"No 'you may be interested in` results for {ticker}"


    # "people also search for" results
    if selector.css(".HDXgAf+ div .tOzDHb").get():
        for index, other_tickers in enumerate(selector.css(".HDXgAf+ div .tOzDHb"), start=1):
            ticker_data["people_also_search_for"]["items"].append(discover_more_tickers(index, other_tickers))
    else:
        ticker_data["people_also_search_for"]["error"] = f"No 'people_also_search_for` in results for {ticker}"


    return ticker_data


def discover_more_tickers(index: int, other_data: str):
    """
    if price_change_formatted will start complaining,
    check beforehand for None values with try/except and set it to 0, in this function.

    however, re.search(r"\d{1}%|\d{1,10}\.\d{1,2}%" should make the job done.
    """
    return {
            "position": index,
            "ticker": other_data.css(".COaKTb::text").get(),
            "ticker_link": f'https://www.google.com/finance{other_data.attrib["href"].replace("./", "/")}',
            "title": other_data.css(".RwFyvf::text").get(),
            "price": other_data.css(".YMlKec::text").get(),
            "price_change": other_data.css("[jsname=Fe7oBc]::attr(aria-label)").get(),
            # https://regex101.com/r/BOFBlt/1
            # Up by 100.99% -> 100.99%
            "price_change_formatted": re.search(r"\d{1}%|\d{1,10}\.\d{1,2}%", other_data.css("[jsname=Fe7oBc]::attr(aria-label)").get()).group()
        }


scrape_google_finance(ticker="GOOGL:NASDAQ")

提取Ticker数据的说明

导入库:

import requests, json, re
from parsel import Selector
from itertools import zip_longest # https://docs.python.org/3/library/itertools.html#itertools.zip_longest

图书馆

目的

requests

向网站提出请求。

json

将提取的数据转换为 JSON 对象。

re

通过正则表达式提取部分数据。

parsel

从 HTML/XML 文档中解析数据。类似于BeautifulSoup

zip_longest

并行迭代多个可迭代对象。更多关于下面的内容。

定义一个函数:

def scrape_google_finance(ticker: str): # ticker should be a string
    # further code...

scrape_google_finance(ticker="GOOGL:NASDAQ")

创建请求标头和 URL 参数:

# https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls
params = {
    "hl": "en" # language
}

# https://docs.python-requests.org/en/master/user/quickstart/#custom-headers
# https://www.whatismybrowser.com/detect/what-is-my-user-agent
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36",
}

图书馆

目的

params

将 URL 参数传递给请求的一种更漂亮的方式。

user-agent

通过将其传递给请求标头来充当来自浏览器的“真实”用户请求。检查你的user-agent是什么。

传递请求参数和请求头,发出请求并将响应传递给parsel:

html = requests.get(f"https://www.google.com/finance/quote/{ticker}", params=params, headers=headers, timeout=30)
selector = Selector(text=html.text)

代码

解释

f"https://www.google.com/finance/quote/{ticker}"

是一个f-string其中{ticker}将被实际的股票代码字符串替换,例如"GOOGL:NASDAQ".

timeout=30

在 30 秒后停止等待响应。

Selector(text=html.text)

从响应中传递的 HTML 将由parsel处理。

创建一个空的字典结构,其中所有数据都将被填充:

# where all extracted data will be temporarily located
ticker_data = {
    "ticker_data": {},
    "about_panel": {},
    "news": {"items": []},
    "finance_perfomance": {"table": []}, 
    "people_also_search_for": {"items": []},
    "interested_in": {"items": []}
}

提取当前价格、报价和标题数据:

# current price, quote, title extraction
ticker_data["ticker_data"]["current_price"] = selector.css(".AHmHk .fxKbKc::text").get()
ticker_data["ticker_data"]["quote"] = selector.css(".PdOqHc::text").get().replace(" • ",":")
ticker_data["ticker_data"]["title"] = selector.css(".zzDege::text").get()

代码

解释

ticker_data["ticker_data"]["current_price"]

访问["ticker_data"]键并创建一个新键["current_price"]并将其分配给parsel将提取的任何值。新的["quote"]["title"]密钥也是如此。

::text

是一个parsel自己的伪元素支持它将每个 CSS 查询转换为 XPath。在这种情况下,::text将变为/text()

get()

获取实际数据。

replace(" • ",":")

用新的东西替换旧的东西。

提取右面板数据:

about_panel_keys = selector.css(".gyFHrc .mfs7Fc::text").getall()
about_panel_values = selector.css(".gyFHrc .P6K39c").xpath("normalize-space()").getall()

for key, value in zip_longest(about_panel_keys, about_panel_values):
    key_value = key.lower().replace(" ", "_")
    ticker_data["about_panel"][key_value] = value

代码

解释

getall()

获取所有匹配list的。

祖兹100097

也可以获取空白文本节点。默认情况下,空白文本节点将被跳过,导致不完整的输出。

lower()

将所有字符串字符小写。

zip_longest()

组合多个迭代器。zip()zip_longest()之间的区别在于zip()以最短迭代器结束,而zip_longest()迭代到最长迭代器的长度。

[key_value]

将使用它自己的提取值动态地将键添加到字典中。

从右侧面板中提取描述和扩展数据:

# description "about" and  extensions extraction
ticker_data["about_panel"]["description"] = selector.css(".bLLb2d::text").get()
ticker_data["about_panel"]["extensions"] = selector.css(".w2tnNd::text").getall()

提取新闻结果:

# news extarction
if selector.css(".yY3Lee").get():
    for index, news in enumerate(selector.css(".yY3Lee"), start=1):
        ticker_data["news"]["items"].append({
            "position": index,
            "title": news.css(".Yfwt5::text").get(),
            "link": news.css(".z4rs2b a::attr(href)").get(),
            "source": news.css(".sfyJob::text").get(),
            "published": news.css(".Adak::text").get(),
            "thumbnail": news.css("img.Z4idke::attr(src)").get()
        })
else: 
    ticker_data["news"]["error"] = f"No news result from a {ticker}."

代码

解释

if selector.css(".yY3Lee").get()

检查新闻结果是否存在。无需检查if <element> is not None

swz 100106 swz 100116 swz 100107 swz 100105

向可迭代对象添加计数器并返回。start=1将从 1 开始计数,而不是从默认值 0 开始计数。

ticker_data["news"].append({})

append提取数据到list作为字典。

::attr(src)

也是一个parsel伪元素支持从节点获取src属性。等效于 XPath/@src

ticker_data["news"]["error"]

创建一个新的"error"密钥和错误发生时的消息。

提取财务绩效表数据:

# finance perfomance table
# checks if finance table exists
if selector.css(".slpEwd .roXhBd").get():
    fin_perf_col_2 = selector.css(".PFjsMe+ .yNnsfe::text").get()           # e.g. Dec 2021
    fin_perf_col_3 = selector.css(".PFjsMe~ .yNnsfe+ .yNnsfe::text").get()  # e.g. Year/year change

    for fin_perf in selector.css(".slpEwd .roXhBd"):
        if fin_perf.css(".J9Jhg::text , .jU4VAc::text").get():

            """
            if fin_perf.css().get() statement is needed, otherwise first dict key and sub dict values would be None:

            "finance_perfomance": {
            "table": [
                {
                "null": {
                    "Dec 2021": null,
                    "Year/year change": null
                }
            }
            """             

            perf_key = fin_perf.css(".J9Jhg::text , .jU4VAc::text").get()   # e.g. Revenue, Net Income, Operating Income..
            perf_value_col_1 = fin_perf.css(".QXDnM::text").get()           # 60.3B, 26.40%..   
            perf_value_col_2 = fin_perf.css(".gEUVJe .JwB6zf::text").get()  # 2.39%, -21.22%..

            ticker_data["finance_perfomance"]["table"].append({
                perf_key: {
                    fin_perf_col_2: perf_value_col_1, # dynamically add key and value from the second (2) column
                    fin_perf_col_3: perf_value_col_2  # dynamically add key and value from the third (3) column
                }
            })
else:
    ticker_data["finance_perfomance"]["error"] = f"No 'finence perfomance table' for {ticker}."

提取你可能是"interested in"/"people also search for"的结果:

    # "you may be interested in" results
    if selector.css(".HDXgAf .tOzDHb").get():
        for index, other_interests in enumerate(selector.css(".HDXgAf .tOzDHb"), start=1):
            ticker_data["interested_in"]["items"].append(discover_more_tickers(index, other_interests))
    else:
        ticker_data["interested_in"]["error"] = f"No 'you may be interested in` results for {ticker}"


    # "people also search for" results
    if selector.css(".HDXgAf+ div .tOzDHb").get():
        for index, other_tickers in enumerate(selector.css(".HDXgAf+ div .tOzDHb"), start=1):
            ticker_data["people_also_search_for"]["items"].append(discover_more_tickers(index, other_tickers))
    else:
        ticker_data["people_also_search_for"]["error"] = f"No 'people_also_search_for` in results for {ticker}"

# ....

def discover_more_tickers(index: int, other_data: str):
    """
    if price_change_formatted will start complaining,
    check beforehand for None values with try/except or if statement and set it to 0.

    however, re.search(r"\d{1}%|\d{1,10}\.\d{1,2}%" should get the job done.
    """
    return {
            "position": index,
            "ticker": other_data.css(".COaKTb::text").get(),
            "ticker_link": f'https://www.google.com/finance{other_data.attrib["href"].replace("./", "/")}',
            "title": other_data.css(".RwFyvf::text").get(),
            "price": other_data.css(".YMlKec::text").get(),
            "price_change": other_data.css("[jsname=Fe7oBc]::attr(aria-label)").get(),
            # https://regex101.com/r/BOFBlt/1
            # Up by 100.99% -> 100.99%
            "price_change_formatted": re.search(r"\d{1}%|\d{1,10}\.\d{1,2}%", other_data.css("[jsname=Fe7oBc]::attr(aria-label)").get()).group()
        }

代码

解释

discover_more_tickers()

创建的函数用于将两个相同的代码组合成一个函数。这样,只需要在一个地方更改代码。

zwz 100130 zwz 100136 zwz 100131 zwz 100129

获取节点属性。

[jsname=Fe7oBc]

是一个CSS 选择器,用于选择具有指定属性和值的元素例如[attribute=value].

re.search()

匹配部分字符串并仅获取数字和%值。而group()通过正则表达式返回匹配的字符串。

返回并打印数据:

# def scrape_google_finance(ticker: str):
    # ticker_data = {
    #     "ticker_data": {},
    #     "about_panel": {},
    #     "news": {"items": []},
    #     "finance_perfomance": {"table": []}, 
    #     "people_also_search_for": {"items": []},
    #     "interested_in": {"items": []}
    # }

    # extraction code...

    return ticker_data

print(json.dumps(data_1, indent=2, ensure_ascii=False))

完整输出:

{
  "ticker_data": {
    "current_price": "$2,665.75",
    "quote": "GOOGL:NASDAQ",
    "title": "Alphabet Inc Class A"
  },
  "about_panel": {
    "previous_close": "$2,717.77",
    "day_range": "$2,659.31 - $2,713.40",
    "year_range": "$2,193.62 - $3,030.93",
    "market_cap": "1.80T USD",
    "volume": "1.56M",
    "p/e_ratio": "23.76",
    "dividend_yield": "-",
    "primary_exchange": "NASDAQ",
    "ceo": "Sundar Pichai",
    "founded": "Oct 2, 2015",
    "headquarters": "Mountain View, CaliforniaUnited States",
    "website": "abc.xyz",
    "employees": "156,500",
    "description": "Alphabet Inc. is an American multinational technology conglomerate holding company headquartered in Mountain View, California. It was created through a restructuring of Google on October 2, 2015, and became the parent company of Google and several former Google subsidiaries. The two co-founders of Google remained as controlling shareholders, board members, and employees at Alphabet. Alphabet is the world's third-largest technology company by revenue and one of the world's most valuable companies. It is one of the Big Five American information technology companies, alongside Amazon, Apple, Meta and Microsoft.\nThe establishment of Alphabet Inc. was prompted by a desire to make the core Google business \"cleaner and more accountable\" while allowing greater autonomy to group companies that operate in businesses other than Internet services. Founders Larry Page and Sergey Brin announced their resignation from their executive posts in December 2019, with the CEO role to be filled by Sundar Pichai, also the CEO of Google. Page and Brin remain co-founders, employees, board members, and controlling shareholders of Alphabet Inc. ",
    "extensions": [
      "Stock",
      "US listed security",
      "US headquartered"
    ]
  },
  "news": [
    {
      "position": 1,
      "title": "Amazon Splitting Stock, Alphabet Too. Which Joins the Dow First?",
      "link": "https://www.barrons.com/articles/amazon-stock-split-dow-jones-51646912881?tesla=y",
      "source": "Barron's",
      "published": "1 month ago",
      "thumbnail": "https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcRlf6wb63KP9lMPsOheYDvvANIfevHp17lzZ-Y0d0aQO1-pRCIDX8POXGtZBQk"
    },
    {
      "position": 2,
      "title": "Alphabet's quantum tech group Sandbox spins off into an independent company",
      "link": "https://www.cnbc.com/2022/03/22/alphabets-quantum-tech-group-sandbox-spins-off-into-an-independent-company.html",
      "source": "CNBC",
      "published": "2 weeks ago",
      "thumbnail": "https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcSIyv1WZJgDvwtMW8e3RAs9ImXtTZSmo2rfmCKIASk4B_XofZfZ8AbDLAMolhk"
    },
    {
      "position": 3,
      "title": "Cash-Rich Berkshire Hathaway, Apple, and Alphabet Should Gain From Higher \nRates",
      "link": "https://www.barrons.com/articles/cash-rich-berkshire-hathaway-apple-and-alphabet-should-gain-from-higher-rates-51647614268",
      "source": "Barron's",
      "published": "3 weeks ago",
      "thumbnail": "https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcSZ6dJ9h9vXlKrWlTmHiHxlfYVbViP5DAr9a_xV4LhNUOaNS01RuPmt-5sjh4c"
    },
    {
      "position": 4,
      "title": "Amazon's Stock Split Follows Alphabet's. Here's Who's Next.",
      "link": "https://www.barrons.com/articles/amazon-stock-split-who-next-51646944161",
      "source": "Barron's",
      "published": "1 month ago",
      "thumbnail": "https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcSJGKk2i1kLT_YToKJlJnhWaaj_ujLvhhZ5Obw_suZcu_YyaDD6O_Llsm1aqt8"
    },
    {
      "position": 5,
      "title": "Amazon, Alphabet, and 8 Other Beaten-Up Growth Stocks Set to Soar",
      "link": "https://www.barrons.com/articles/amazon-stock-growth-buy-51647372422",
      "source": "Barron's",
      "published": "3 weeks ago",
      "thumbnail": "https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcTxotkd3p81U7xhmCTJ6IO0tMf_yVKv3Z40bafvtp9XCyosyB4WAuX7Qt-t7Ds"
    },
    {
      "position": 6,
      "title": "Is It Too Late to Buy Alphabet Stock?",
      "link": "https://www.fool.com/investing/2022/03/14/is-it-too-late-to-buy-alphabet-stock/",
      "source": "The Motley Fool",
      "published": "3 weeks ago",
      "thumbnail": "https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcQv5D9GFKMNUPvMd91aRvi83p12y91Oau1mh_4FBPj6LCNK3cH1vEZ3_gFU4kI"
    }
  ],
  "finance_perfomance": [
    {
      "Revenue": {
        "Dec 2021": "75.32B",
        "Year/year change": "32.39%"
      }
    },
    {
      "Net income": {
        "Dec 2021": "20.64B",
        "Year/year change": "35.56%"
      }
    },
    {
      "Diluted EPS": {
        "Dec 2021": "30.69",
        "Year/year change": "37.62%"
      }
    },
    {
      "Net profit margin": {
        "Dec 2021": "27.40%",
        "Year/year change": "2.39%"
      }
    },
    {
      "Operating income": {
        "Dec 2021": "21.88B",
        "Year/year change": "39.83%"
      }
    },
    {
      "Net change in cash": {
        "Dec 2021": "-2.77B",
        "Year/year change": "-143.78%"
      }
    },
    {
      "Cash and equivalents": {
        "Dec 2021": "20.94B",
        "Year/year change": "-20.86%"
      }
    },
    {
      "Cost of revenue": {
        "Dec 2021": "32.99B",
        "Year/year change": "26.49%"
      }
    }
  ],
  "people_also_search_for": [
    {
      "position": 1,
      "ticker": "GOOG",
      "ticker_link": "https://www.google.com/finance/quote/GOOG:NASDAQ",
      "title": "Alphabet Inc Class C",
      "price": "$2,680.21",
      "price_change": "Down by 1.80%",
      "price_change_formatted": "1.80%"
    }, ... other results
    {
      "position": 18,
      "ticker": "SQ",
      "ticker_link": "https://www.google.com/finance/quote/SQ:NYSE",
      "title": "Block Inc",
      "price": "$123.22",
      "price_change": "Down by 2.15%",
      "price_change_formatted": "2.15%"
    }
  ],
  "interested_in": [
    {
      "position": 1,
      "ticker": "Index",
      "ticker_link": "https://www.google.com/finance/quote/.INX:INDEXSP",
      "title": "S&P 500",
      "price": "4,488.28",
      "price_change": "Down by 0.27%",
      "price_change_formatted": "0.27%"
    }, ... other results
    {
      "position": 18,
      "ticker": "NFLX",
      "ticker_link": "https://www.google.com/finance/quote/NFLX:NASDAQ",
      "title": "Netflix Inc",
      "price": "$355.88",
      "price_change": "Down by 1.73%",
      "price_change_formatted": "1.73%"
    }
  ]
}

抓取多个谷歌金融代码行情

for ticker in ["DAX:INDEXDB", "GOOGL:NASDAQ", "MSFT:NASDAQ"]:
    data = scrape_google_finance(ticker=ticker)
    print(json.dumps(data["ticker_data"], indent=2, ensure_ascii=False))

输出:

{
  "current_price": "14,178.23",
  "quote": "DAX:Index",
  "title": "DAX PERFORMANCE-INDEX"
}
{
  "current_price": "$2,665.75",
  "quote": "GOOGL:NASDAQ",
  "title": "Alphabet Inc Class A"
}
{
  "current_price": "$296.97",
  "quote": "MSFT:NASDAQ",
  "title": "Microsoft Corporation"
}

提取谷歌财经图表时间序列数据

抓取时间序列数据并不是一个特别好的主意,因此最好使用专用 API 来完成工作。

如何找到 Google 用于构建时间序列图表的 API?

图片

我们可以通过简单地检查带有报价GOOGL](https://www.nasdaq.com/market-activity/stocks/googl)的[纳斯达克图表来确认 Google 正在使用 NASDAQ API 来获取时间序列数据:

图片

在这种情况下,我使用了Nasdaq Data Link API,它支持Python、R和[Excel(https://docs.data.nasdaq.com/docs/excel-installation)1。我相信其他平台也提供 Python 集成。

我假设你已经安装了一个nasdaq-data-link包,但如果没有,你可以这样做。如果您设置 Python 的默认版本:

# WSL
$ pip install nasdaq-data-link

如果您没有设置 Python 的默认版本:

# WSL
$ python3.9 -m pip install nasdaq-data-link # change python to your version: python3.X

data.nasdaq.com/account/profile处获取您的 API 密钥:

图片

创建一个.env文件来存储您的 API 密钥:

touch .nasdaq_api_key # change the file name to yours 

# paste API key inside the created file

抓取谷歌财经时间序列数据

import nasdaqdatalink

def nasdaq_get_timeseries_data():
    nasdaqdatalink.read_key(filename=".nasdaq_api_key")
    # print(nasdaqdatalink.ApiConfig.api_key) # prints api key from the .nasdaq_api_key file

    timeseries_data = nasdaqdatalink.get("WIKI/GOOGL", collapse="monthly") # not sure what "WIKI" stands for
    print(timeseries_data)

nasdaq_get_timeseries_data()

时间序列提取码说明

代码

解释

祖兹 100193

读取您的 API 密钥。

".nasdaq_api_key"

是您的.env变量与秘密 API 密钥。所有秘密变量(如果我错了,请纠正我)以.符号开头来展示它。

nasdaqdatalink.ApiConfig.api_key

测试您的 API 是否被nasdaq-data-link包识别。示例输出:2adA_avd12CXauv_1zxs

nasdaqdatalink.get()

得到时间序列数据,即数据集结构。

输出一个pandasDataFrame对象:

                Open     High      Low    Close      Volume  Ex-Dividend  Split Ratio    Adj. Open    Adj. High     Adj. Low   Adj. Close  Adj. Volume
Date                                                                                                                                                  
2004-08-31   102.320   103.71   102.16   102.37   4917800.0          0.0          1.0    51.318415    52.015567    51.238167    51.343492    4917800.0
2004-09-30   129.899   132.30   129.00   129.60  13758000.0          0.0          1.0    65.150614    66.354831    64.699722    65.000651   13758000.0
2004-10-31   198.870   199.95   190.60   190.64  42282600.0          0.0          1.0    99.742897   100.284569    95.595093    95.615155   42282600.0
2004-11-30   180.700   183.00   180.25   181.98  15384600.0          0.0          1.0    90.629765    91.783326    90.404069    91.271747   15384600.0
2004-12-31   199.230   199.88   192.56   192.79  15321600.0          0.0          1.0    99.923454   100.249460    96.578127    96.693484   15321600.0
...              ...      ...      ...      ...         ...          ...          ...          ...          ...          ...          ...          ...
2017-11-30  1039.940  1044.14  1030.07  1036.17   2190379.0          0.0          1.0  1039.940000  1044.140000  1030.070000  1036.170000    2190379.0
2017-12-31  1055.490  1058.05  1052.70  1053.40   1156357.0          0.0          1.0  1055.490000  1058.050000  1052.700000  1053.400000    1156357.0
2018-01-31  1183.810  1186.32  1172.10  1182.22   1643877.0          0.0          1.0  1183.810000  1186.320000  1172.100000  1182.220000    1643877.0
2018-02-28  1122.000  1127.65  1103.00  1103.92   2431023.0          0.0          1.0  1122.000000  1127.650000  1103.000000  1103.920000    2431023.0
2018-03-31  1063.900  1064.54   997.62  1006.94   2940957.0          0.0          1.0  1063.900000  1064.540000   997.620000  1006.940000    2940957.0

[164 rows x 12 columns]

如您所见,没有关于 2019-2022 年的数据。这是因为我使用的是免费版,适合实验和探索,正如纳斯达克所说的。

纳斯达克限价

经过身份验证的用户

高级用户

每 10 秒调用 300 次。

-

每 10 分钟 2,000 个电话。

每 10 分钟 5,000 个电话。

每天限制 50,000 个电话。

每天限制 720,000 个电话。

其他纳斯达克 API 资源

资源

解释

时间序列参数

通过向请求添加其他参数来自定义(操作)您的时间序列数据集。时间序列数据的变换

curl组成一个请求

使用curl轻松发出请求。

可用的保存格式

CSVXMLJSON中保存数据

数据格式

将数据转换为可用的格式。

批量下载

在一次调用中下载数据库中的所有数据。

可用方法的详细指南

了解如何更详细地使用data-link-python包。

data-link-pythonon GitHub

阅读完整的文档。

quandl-pythonon GitHub

data-link-python在幕后使用了什么。您可以在此处找到更多文档。


友情链接

  • 在线IDE中的代码

  • GitHub存储库


其他

如果您有任何要分享的内容、任何问题、建议或无法正常工作的内容,请通过 Twitter 与@dimitryzub或@serp_api联系。

您的,Dmitriy 和 SerpApi 团队的其他成员。

加入我们Twitter|YouTube

添加功能请求💫 或错误🐞

Logo

Python社区为您提供最前沿的新闻资讯和知识内容

更多推荐