Using the zipfile module to unzip a large data file in Python works correctly on Python 2 but produces the following error on Python 3.6.0:
BadZipFile: Bad CRC-32 for file 'myfile.csv'
I traced this to the error-handling code in zipfile that checks the CRC values.
On Python 2, ZipFile.testzip() returns None (all files are fine). On Python 3 it returns 'myfile.csv', indicating a problem with that file.
Code to reproduce on both Python 2 and Python 3 (involves a 300 MB download, sorry):
import sys
import zipfile

url = "https://de.iplantcollaborative.org/anon-files//iplant/home/shared/commons_repo/curated/Vertnet_Amphibia_Sep2016/VertNet_Amphibia_Sept2016.zip"
filename = "vertnet_latest_amphibians.zip"

if sys.version_info >= (3, 0, 0):
    # Python 3: urlretrieve lives in urllib.request, which must be imported explicitly
    import urllib.request
    urllib.request.urlretrieve(url, filename)
else:
    import urllib
    urllib.urlretrieve(url, filename)

archive = zipfile.ZipFile(filename)
# testzip() returns the name of the first member with a bad CRC, or None if all pass
print(archive.testzip())
Does anyone understand why this difference exists, and whether there is a way to get Python 3 to extract the file properly using:
archive.extract("vertnet_latest_amphibians.csv")
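In case it helps with diagnosis, here is a rough sketch (Python 3 only) that lists the metadata recorded for each member and whether it reads back without a CRC error. The helper name describe_members is made up for illustration and is not part of zipfile, and the demo archive is built in memory from made-up data so the snippet runs without the 300 MB download.

```python
import io
import zipfile


def describe_members(source):
    """Return (name, compressed size, recorded size, CRC-32 hex, reads_ok) per member.

    `source` may be a path or a file-like object, as accepted by ZipFile.
    """
    rows = []
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            try:
                with archive.open(info) as member:
                    # Read in 1 MiB chunks so a large member is never held in memory.
                    while member.read(1 << 20):
                        pass
                reads_ok = True
            except zipfile.BadZipFile:
                # Raised once the member is fully read and its CRC-32 does not match.
                reads_ok = False
            rows.append((info.filename, info.compress_size, info.file_size,
                         "%08x" % info.CRC, reads_ok))
    return rows


# Demonstration on a small in-memory archive (made-up data, not the VertNet file).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as demo:
    demo.writestr("demo.csv", "a,b\n1,2\n")
buffer.seek(0)
for row in describe_members(buffer):
    print(row)
```

Calling describe_members("vertnet_latest_amphibians.zip") on the real archive should show which members fail. If a recorded uncompressed size looks implausible, for example smaller than the compressed size for a text file, that might point at the archive's metadata rather than at the data itself, though this is only a guess.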
