Sometimes, while working with zip files in Python, we may run into the zipfile.BadZipFile: File is not a zip file error. There are several possible reasons for it: the file format may be wrong, the file may be corrupted, or the file you are trying to open may not be a zip archive at all. In this short article, we will look at several ways to solve the zipfile.BadZipFile: File is not a zip file error. We will also use this error as an example of how to read and understand error messages in Python.
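To see what this error looks like in practice, here is a minimal sketch that writes plain text to a file with a misleading .zip name and tries to open it with the zipfile module. The file name not_really.zip and the function reproduce_bad_zip are hypothetical, chosen only for illustration. The exception class (zipfile.BadZipFile) tells us what kind of problem occurred, and the message gives the details.

```python
import os
import tempfile
import zipfile


def reproduce_bad_zip():
    """Create a non-zip file with a .zip name and try to open it as an archive."""
    # Hypothetical file name used only for this demonstration
    path = os.path.join(tempfile.mkdtemp(), "not_really.zip")
    with open(path, "w") as f:
        f.write("just some text, not a zip archive")

    try:
        with zipfile.ZipFile(path) as archive:
            archive.namelist()
    except zipfile.BadZipFile as exc:
        # The class name says *what* went wrong, the message gives the details
        return type(exc).__name__, str(exc)
    return None


print(reproduce_bad_zip())  # → ('BadZipFile', 'File is not a zip file')
```

The extension is irrelevant here: zipfile inspects the file's contents, finds no zip structure, and raises the exception.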
zipfile.BadZipFile: File is not a zip file – Possible solutions
Before going into the possible solutions, let us first look at the reasons for this error. The main causes are listed below:
- Incorrect file format: the file has a .zip name, but its contents are not actually in zip format, so we get this error.
- Corrupted file: we may also see this error if the file has been damaged (for example, by an incomplete download) and its contents can no longer be read.
- Incorrect path: before opening the file, make sure the path points to the archive you actually mean to open.
- Opening a non-zip file: make sure the file really is a zip archive. Keep in mind that zipfile checks the file's contents, not its extension, so renaming a file to .zip does not help.
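A quick way to tell these causes apart is to check them one by one before opening the archive. The following is a minimal sketch; diagnose_zip is a hypothetical helper, and its messages are only rough guesses at the cause, not definitive diagnoses. It uses zipfile.is_zipfile, which looks for the zip end-of-archive structure rather than the extension, and ZipFile.testzip, which CRC-checks every member.

```python
import os
import zipfile


def diagnose_zip(path):
    """Return a short, human-readable guess at why a path cannot be opened as a zip."""
    if not os.path.exists(path):
        # A wrong path normally raises FileNotFoundError rather than BadZipFile
        return "path does not exist"
    if os.path.getsize(path) == 0:
        # Empty files often come from interrupted downloads or unflushed writes
        return "file is empty"
    if not zipfile.is_zipfile(path):
        # is_zipfile inspects the contents, not the file extension
        return "not a zip archive (wrong format or corrupted)"
    try:
        with zipfile.ZipFile(path) as archive:
            bad_member = archive.testzip()  # name of the first corrupted member, or None
    except zipfile.BadZipFile as exc:
        return f"zip structure is damaged: {exc}"
    if bad_member is not None:
        return f"corrupted member: {bad_member}"
    return "looks like a valid zip archive"
```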
Let us now look at several ways to solve the zipfile.BadZipFile: File is not a zip file error:
Solution-1: Using a user-defined function to fix the corrupted zip file
One way to solve the zipfile.BadZipFile: File is not a zip file error, when the archive has extra or damaged data after its end, is to use a small user-defined function that finds the end of the archive and cuts off everything after it:
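```python
def fixBadZipfile(zipFile):
    # Open in binary mode for reading *and* writing so the file can be truncated in place
    with open(zipFile, 'r+b') as f:
        data = f.read()
        # End of central directory signature; it must be a bytes literal in Python 3.
        # rfind picks the last occurrence, most likely the archive's own EOCD record.
        pos = data.rfind(b'\x50\x4b\x05\x06')
        if pos >= 0:
            # The EOCD record is 22 bytes long when the archive has no comment
            print("Truncating file at location " + str(pos + 22) + ".")
            f.seek(pos + 22)
            f.truncate()
        else:
            # No EOCD signature at all: this approach cannot repair the file
            print("Error: end of central directory signature not found")
```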
The code is a Python function named fixBadZipfile that attempts to repair a corrupted zip file by locating the end of central directory (EOCD) signature ('\x50\x4b\x05\x06', that is, the bytes PK\x05\x06) and truncating the file right after that record. The function opens the zip file in binary mode ('r+b') for reading and writing, reads its content into memory, and searches for the signature. Because the file is read in binary mode, the signature must be written as a bytes literal (b'...') in Python 3. If the signature is found, the function seeks 22 bytes past its start (the size of an EOCD record without an archive comment), truncates the file there to remove any extra or corrupted data that follows, and closes the file. If the signature is not found, the file cannot be fixed this way and the function reports an error.
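Both repair approaches in this article rely on the layout of the EOCD record. The sketch below parses that record with the standard struct module to show where the numbers 22 and 20 come from: the fixed part of the record is 22 bytes, and its last field, the two-byte comment length, starts at offset 20. The names EOCD_FORMAT, FIELD_NAMES and read_eocd are hypothetical, introduced only for this illustration.

```python
import io
import struct
import zipfile

# EOCD layout (little-endian): signature, this disk number, disk where the central
# directory starts, entries on this disk, total entries, central directory size,
# central directory offset, and finally the archive comment length
EOCD_FORMAT = "<4s4H2LH"
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)            # 22
COMMENT_LENGTH_OFFSET = struct.calcsize("<4s4H2L")  # 20
FIELD_NAMES = ("signature", "disk", "cd_start_disk", "disk_entries",
               "total_entries", "cd_size", "cd_offset", "comment_length")


def read_eocd(data):
    """Return the fields of the last EOCD record in `data`, or None if there is none."""
    pos = data.rfind(b"PK\x05\x06")
    if pos < 0 or len(data) - pos < EOCD_SIZE:
        return None
    fields = struct.unpack(EOCD_FORMAT, data[pos:pos + EOCD_SIZE])
    return dict(zip(FIELD_NAMES, fields), position=pos)


# Build a one-file archive in memory and inspect its EOCD record
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("hello.txt", "hello")
info = read_eocd(buf.getvalue())
print(EOCD_SIZE, COMMENT_LENGTH_OFFSET, info["total_entries"], info["comment_length"])
# → 22 20 1 0
```

For an archive without a comment, the record sits in the last 22 bytes of the file, which is why truncating at the signature position plus 22 leaves a well-formed end.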
Solution-2: Repair the corrupted zip file
Now let us see how we can repair the corrupted zip file and get rid of the error:
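```python
def main(zipFileContainer):
    # zipFileContainer must be opened in binary read/write mode ('r+b' or an io.BytesIO)
    content = zipFileContainer.read()
    # Search from the end for the end of central directory signature (bytes in Python 3)
    pos = content.rfind(b'\x50\x4b\x05\x06')
    if pos >= 0:
        # Bytes 20-21 of the EOCD record hold the archive comment length
        zipFileContainer.seek(pos + 20)
        zipFileContainer.truncate()
        # Declare an empty comment so the record ends exactly here
        zipFileContainer.write(b'\x00\x00')
        zipFileContainer.seek(0)
    return zipFileContainer
```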
This solution takes a file-like object, zipFileContainer, that is assumed to contain a zip archive and to be opened in binary read/write mode. It reads the contents into memory and searches backwards for the end of central directory signature ('\x50\x4b\x05\x06'). If the signature is found, it seeks 20 bytes past the start of the signature, where the two-byte archive comment length field begins, truncates everything from that point on, writes two zero bytes ('\x00\x00') so the record declares an empty comment, rewinds to the start, and returns zipFileContainer. If the signature is not found, the object is returned unmodified. After this repair, you should no longer get the zipfile.BadZipFile: File is not a zip file error, as long as the rest of the archive is intact.
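The following sketch shows the repair working end to end on an in-memory file. CPython's zipfile searches roughly the last 64 KiB of a file for the EOCD record (room for the largest possible archive comment), so the demo appends 70,000 zero bytes, an arbitrary amount chosen to exceed that window, to make a valid archive unreadable. The helpers make_damaged_archive and repair_container are hypothetical; repair_container restates the same steps as the function above so that the example is self-contained.

```python
import io
import zipfile


def make_damaged_archive():
    """Build a small zip in memory and append junk so zipfile can no longer find its end."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("hello.txt", "hello")
    # More junk than the ~64 KiB window zipfile searches for the EOCD record
    buf.write(b"\x00" * 70_000)
    buf.seek(0)
    return buf


def repair_container(container):
    """Cut the data right after the EOCD record and declare an empty comment."""
    content = container.read()
    pos = content.rfind(b"PK\x05\x06")
    if pos >= 0:
        container.seek(pos + 20)
        container.truncate()
        container.write(b"\x00\x00")
    container.seek(0)
    return container


damaged = make_damaged_archive()
print(zipfile.is_zipfile(damaged))  # → False
damaged.seek(0)
fixed = repair_container(damaged)
print(zipfile.is_zipfile(fixed))    # → True
fixed.seek(0)
with zipfile.ZipFile(fixed) as z:
    print(z.read("hello.txt"))      # → b'hello'
```

The same steps apply to a file on disk opened with open(path, 'r+b').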
Solution-3: Downloading 7-Zip
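Another way to get around the error is to let 7-Zip extract the archive, since it may be able to open files that Python's zipfile rejects. Download and install 7-Zip, then run its command-line program from Python with the following script (it assumes the default Windows install location; adjust ziploc if yours differs):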
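```python
import subprocess

ziploc = "C:/Program Files/7-Zip/7z.exe"   # 7-Zip command-line program
zip_path = "your_Zip_file.zip"             # placeholder: archive to extract
OutputDirectory = "output"                 # placeholder: destination folder

# 'e' extracts files (use 'x' to keep folder paths), '-o' sets the output folder
# with no space after it, '-r' recurses into subdirectories
cmd = [ziploc, 'e', zip_path, '-o' + OutputDirectory, '-r']
sp = subprocess.Popen(cmd, stderr=subprocess.STDOUT, stdout=subprocess.PIPE)
output, _ = sp.communicate()  # wait for 7-Zip to finish and collect its output
```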
Hopefully, now you will not get the error anymore.
Solution-4: Closing the file before unzipping the file
If none of the methods above helped and you are writing the zip file yourself (for example, saving a download to disk), the problem may be that you open the archive before the written file has been closed. Close the file before unzipping it, as shown below:
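```python
import zipfile

# path, data and destination are placeholders for your own values
with open(path, 'wb') as outFile:
    outFile.write(data)
# Leaving the with block closes the file and flushes the data to disk;
# if you must unzip inside the block, call outFile.close() first

with zipfile.ZipFile(path, 'r') as zip:
    zip.extractall(destination)
```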
The code opens the file at path in binary write mode ('wb') as outFile, writes data to it, and closes it, which also flushes any bytes still held in Python's write buffer to disk. Only then does it open the file with the zipfile module's ZipFile class in read mode ('r') and extract all of its contents to the destination directory. If the archive is opened while the file is still open for writing, part or all of the data may not have reached the disk yet, and zipfile reports that the file is not a zip file.
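The sketch below makes the buffering problem visible. It builds a tiny valid archive in memory to stand in for downloaded data, writes it to a temporary file, and tries to open it both inside and after the with block. On CPython with default buffering, a payload this small typically stays in the write buffer, so the first attempt sees an empty file and fails, while the second succeeds. The function name open_before_and_after_close and the file name download.zip are hypothetical.

```python
import io
import os
import tempfile
import zipfile


def open_before_and_after_close():
    """Try to read a freshly written archive before and after the file is closed."""
    # A small, valid archive built in memory stands in for downloaded `data`
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("hello.txt", "hello")
    data = buf.getvalue()

    path = os.path.join(tempfile.mkdtemp(), "download.zip")
    results = []
    with open(path, "wb") as outFile:
        outFile.write(data)
        # The bytes may still sit in Python's write buffer, so the file on disk can be empty
        try:
            zipfile.ZipFile(path).close()
            results.append("ok")
        except zipfile.BadZipFile:
            results.append("BadZipFile")
    # The with block has now closed (and flushed) the file
    with zipfile.ZipFile(path) as archive:
        results.append(archive.read("hello.txt").decode())
    return results


print(open_before_and_after_close())
```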
Solution-5: Confirm the gzip format
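Sometimes a file you expect to be a zip archive is actually gzip-compressed, for example a .tar.gz tarball. The zipfile module cannot read such files, so you need the tarfile module instead. The following simple script downloads and extracts a .tar.gz archive: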
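```python
import requests
import tarfile

url = ".tar.gz link"  # replace with the URL of your .tar.gz archive
response = requests.get(url, stream=True)
response.raise_for_status()  # stop early on HTTP errors instead of parsing an error page

# "r|gz" reads a gzip-compressed tar archive as a stream
with tarfile.open(fileobj=response.raw, mode="r|gz") as file:
    file.extractall(path=".")
```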
In this solution, we import the requests and tarfile modules and make a GET request to the URL of a .tar.gz archive. The stream argument is set to True so that the archive can be processed while it is still being received. The tarfile module's open function then opens the archive from the response object's raw attribute in streaming gzip read mode ("r|gz") as file. Finally, the extractall method extracts the contents of the archive to the current working directory ("."). In this way, a single Python script downloads and extracts a .tar.gz archive using the requests and tarfile modules.
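To confirm which format a file actually has before choosing between zipfile and tarfile, you can look at its first bytes. Gzip data starts with the bytes 1f 8b, while an ordinary zip archive starts with PK\x03\x04 (an empty archive starts directly with its PK\x05\x06 end record). The sketch below uses a hypothetical detect_archive_type helper; note that some valid zip files, such as self-extracting archives, have other data in front, so "unknown" is only a hint, not proof.

```python
# Well-known leading bytes ("magic numbers") of the two formats
ZIP_MAGICS = (b"PK\x03\x04", b"PK\x05\x06")
GZIP_MAGIC = b"\x1f\x8b"


def detect_archive_type(path):
    """Guess whether a file holds zip or gzip data by looking at its first bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(GZIP_MAGIC):
        return "gzip"   # open it with tarfile (for .tar.gz) or gzip
    if head in ZIP_MAGICS:
        return "zip"    # open it with zipfile
    return "unknown"
```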
In this short article, we discussed how to solve the zipfile.BadZipFile: File is not a zip file error in Python. We covered five different solutions and explained each of them.