I have a Python program which is going to take text files as input. However, some of these files may be gzip compressed.
Is there a cross-platform, usable from Python way to determine if a file is gzip compressed or not?
Is the following reliable or could an ordinary text file 'accidentally' look gzip-like enough for me to get false positives?
try:
gzip.GzipFile(filename, 'r')
# compressed
# ...
except:
# not compressed
# ...
The magic number for gzip compressed files is 1f 8b. Although testing for this is not 100% reliable, it is highly unlikely that "ordinary text files" start with those two bytes—in UTF-8 it's not even legal.
Usually gzip compressed files sport the suffix .gz though. Even gzip(1) itself won't unpack files without it unless you --force it to. You could conceivably use that, but you'd still have to deal with a possible IOError (which you have to in any case).
One problem with your approach is, that gzip.GzipFile() will not throw an exception if you feed it an uncompressed file. Only a later read() will. This means, that you would probably have to implement some of your program logic twice. Ugly.
所有评论(0)