How to scrape a public tableau dashboard? [closed]
Every day I need to download the data available on a public Tableau dashboard. After defining the parameters of interest (time series frequency, time series interval, etc.), the dashboard allows you to download the series.
My life would be considerably easier if I could automate the download of these series into a database using Python or R. I've already tried to analyze the requests made by the page, but I couldn't get much further. Is there any way to automate this process?
The dashboard: https://tableau.ons.org.br/t/ONS_Publico/views/DemandaMxima/HistricoDemandaMxima?:embed=y&:showAppBanner=false&:showShareOptions=true&:display_count=no&:showVizHome=no
Answers
Edit
I've made a tableau scraper library to extract the data from Tableau worksheets.
You can get the data from worksheets as pandas DataFrames directly. Parameterized values are also supported.
The following example gets the data from the worksheet Simples Demanda Máxima Ano, switches to daily mode, shows the data of the worksheet Simples Demanda Máxima Semana Dia, and then sets the start date to 01/01/2017:
from tableauscraper import TableauScraper as TS
url = "https://tableau.ons.org.br/t/ONS_Publico/views/DemandaMxima/HistricoDemandaMxima"
ts = TS()
ts.loads(url)
wb = ts.getWorkbook()
# show dataframe with yearly data
ws = wb.getWorksheet("Simples Demanda Máxima Ano")
print(ws.data)
# switch to daily
wb = wb.setParameter("Escala de Tempo DM Simp 4", "Dia")
# show dataframe with daily data
ws = wb.getWorksheet("Simples Demanda Máxima Semana Dia")
print(ws.data)
# set the start date to 01/01/2017
wb = wb.setParameter(
    "Início Primeiro Período DM Simp 4", "01/01/2017")
# show dataframe with daily data from 01/01/2017
ws = wb.getWorksheet("Simples Demanda Máxima Semana Dia")
print(ws.data)
Try this on repl.it
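Since the goal is to load these series into a database, the resulting DataFrame can be persisted like any other pandas DataFrame. Below is a minimal sketch assuming a local SQLite file; the file name, table name and worksheet choice are illustrative, not part of the library:
from tableauscraper import TableauScraper as TS
import sqlite3

url = "https://tableau.ons.org.br/t/ONS_Publico/views/DemandaMxima/HistricoDemandaMxima"
ts = TS()
ts.loads(url)
ws = ts.getWorkbook().getWorksheet("Simples Demanda Máxima Ano")

# write the scraped DataFrame to a local SQLite database
# (file and table names below are arbitrary examples)
con = sqlite3.connect("ons_demanda.db")
ws.data.to_sql("demanda_maxima_ano", con, if_exists="replace", index=False)
con.close()
A daily cron job (or Windows Task Scheduler entry) running this script would then keep the table up to date.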
Original post
This answer is similar to this one, but the initial URL page and the Tableau base URL differ. The process/algorithm essentially remains the same, but I will detail the steps.
The graphic is generated in JS from the result of an API call:
POST https://tableau.ons.org.br/ROOT_PATH/bootstrapSession/sessions/SESSION_ID
The SESSION_ID parameter is located (among other things) in the tsConfigContainer textarea of the page used to build the iframe.
Starting from https://tableau.ons.org.br/t/ONS_Publico/views/DemandaMxima/HistricoDemandaMxima?:embed=y&:showAppBanner=false&:showShareOptions=true&:display_count=no&:showVizHome=no:
- there is a textarea with id tsConfigContainer with a bunch of JSON values
- extract the session_id and the root path (vizql_root)
- make a POST on https://tableau.ons.org.br/ROOT_PATH/bootstrapSession/sessions/SESSION_ID with the sheetId as form data
- extract the JSON from the result (the result is not pure JSON)
Code:
import requests
from bs4 import BeautifulSoup
import json
import re

url = "https://tableau.ons.org.br/t/ONS_Publico/views/DemandaMxima/HistricoDemandaMxima"

# load the dashboard page that contains the tsConfigContainer textarea
r = requests.get(
    url,
    params={
        ":embed": "y",
        ":showAppBanner": "false",
        ":showShareOptions": "true",
        ":display_count": "no",
        ":showVizHome": "no",
    },
)
soup = BeautifulSoup(r.text, "html.parser")

# the textarea holds the session id, the vizql root path and the sheet id
tableauData = json.loads(soup.find("textarea", {"id": "tsConfigContainer"}).text)

dataUrl = f'https://tableau.ons.org.br{tableauData["vizql_root"]}/bootstrapSession/sessions/{tableauData["sessionid"]}'

# bootstrap the session: the response contains the worksheet data
r = requests.post(dataUrl, data={
    "sheet_id": tableauData["sheetId"],
})

# the response is two length-prefixed JSON documents concatenated together
dataReg = re.search(r'\d+;({.*})\d+;({.*})', r.text, re.MULTILINE)
info = json.loads(dataReg.group(1))
data = json.loads(dataReg.group(2))

print(data["secondaryInfo"]["presModelMap"]["dataDictionary"]["presModelHolder"]["genDataDictionaryPresModel"]["dataSegments"]["0"]["dataColumns"])
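The final print only dumps the raw dataColumns structure. In the dashboards I have looked at, each entry carries a dataType and a dataValues list, so a quick exploratory summary can be produced as below; the exact layout is an assumption and may vary between dashboards:
# exploratory continuation of the script above: summarize the extracted columns
# (assumes each entry has "dataType" and "dataValues" keys, which may vary)
columns = data["secondaryInfo"]["presModelMap"]["dataDictionary"]["presModelHolder"]["genDataDictionaryPresModel"]["dataSegments"]["0"]["dataColumns"]
for col in columns:
    values = col.get("dataValues", [])
    print(col.get("dataType"), len(values), values[:5])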