extracting .7z file using google colab

2 min read 06-10-2024

Extracting .7z Files in Google Colab: A Step-by-Step Guide

Google Colab offers a cloud-based environment for data science and machine learning tasks. Sooner or later you may need to extract a .7z file, a popular compression format, inside a Colab notebook. Python's standard library cannot read .7z archives, but the third-party py7zr library can, and it installs with a single command.

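Scenario: Imagine you're working on a project in Colab and need to access data stored within a .7z archive. You've uploaded the archive to your Google Drive and want to extract its contents directly within your notebook. The examples below assume the archive is reachable at /content/my_data.7z; if it lives in Google Drive, mount the drive first and adjust the path accordingly.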

Original Code (Without py7zr):

# This code will NOT work: zipfile only reads ZIP archives, so opening a
# .7z file with it raises zipfile.BadZipFile
import zipfile

with zipfile.ZipFile('/content/my_data.7z', 'r') as zip_ref:
    zip_ref.extractall('/content/extracted_data')
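
The failure above comes down to file signatures: a 7z archive starts with different bytes than a ZIP archive. As a minimal sketch (the helper name detect_archive_type is hypothetical, not part of any library), the check below reads the first six bytes of a file and compares them with the 7z signature before falling back to the standard library's ZIP test:

```python
import zipfile
from pathlib import Path

# First six bytes of every 7z archive: '7', 'z', 0xBC, 0xAF, 0x27, 0x1C.
SEVEN_ZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"


def detect_archive_type(path):
    """Return '7z', 'zip', or 'unknown' based on the file's contents.

    Hypothetical helper for illustration; it inspects the bytes on disk
    and ignores the file extension.
    """
    path = Path(path)
    with path.open("rb") as fh:
        header = fh.read(len(SEVEN_ZIP_MAGIC))
    if header == SEVEN_ZIP_MAGIC:
        return "7z"
    if zipfile.is_zipfile(path):
        return "zip"
    return "unknown"
```

Calling detect_archive_type('/content/my_data.7z') before extraction tells you which library to reach for. Once py7zr is installed, its own py7zr.is_7zfile() function performs a similar signature check.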

The Solution: py7zr Library

The py7zr library is a Python package for creating and extracting 7z archives. Here's how to use it:

  1. Install py7zr:

    !pip install py7zr
    
  2. Extract the .7z file:

    import py7zr
    
    with py7zr.SevenZipFile('/content/my_data.7z', 'r') as archive:
        archive.extractall('/content/extracted_data')
    

Explanation:

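  • The !pip install command installs the py7zr library into the current Colab runtime; the leading ! tells the notebook to run the line as a shell command.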
  • The SevenZipFile object opens the .7z archive for reading.
  • extractall() extracts all files and folders within the archive to the specified directory.
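
After extractall() returns, it can be worth confirming what actually landed on disk. The following is a small sketch using only the standard library; the function name list_extracted is hypothetical, and the directory passed to it is assumed to be the extraction destination used above:

```python
from pathlib import Path


def list_extracted(dest_dir):
    """Return (relative_path, size_in_bytes) for every file under dest_dir."""
    root = Path(dest_dir)
    return sorted(
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()  # skip directories, report files only
    )
```

In a notebook cell, looping over list_extracted('/content/extracted_data') and printing each pair gives a quick inventory of the extracted files.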

Additional Notes:

  • File Paths: Ensure the file paths for your .7z archive and the extraction destination are correct. You can use Google Drive mounting to access files from your drive.

  • Password Protection: If your .7z file is password-protected, pass the password when opening the SevenZipFile object:

    with py7zr.SevenZipFile('/content/my_data.7z', 'r', password='your_password') as archive:
        archive.extractall('/content/extracted_data')
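
For the Google Drive case mentioned under File Paths, one possible approach is to mount the drive and copy the archive onto the runtime's local disk before extracting, since reading a large archive through the Drive mount is often slower. The sketch below assumes the conventional mount point /content/drive, where files in your drive appear under MyDrive; the function names stage_archive and mount_and_stage are hypothetical:

```python
import shutil
from pathlib import Path


def stage_archive(drive_archive, work_dir="/content"):
    """Copy an archive to local disk and return the path of the copy."""
    src = Path(drive_archive)
    dest = Path(work_dir) / src.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    return dest


def mount_and_stage(relative_path, mount_point="/content/drive"):
    """Mount Google Drive (Colab only), then stage the archive locally.

    relative_path is the archive's location inside "My Drive".
    """
    # google.colab exists only inside a Colab runtime, so import it here.
    from google.colab import drive

    drive.mount(mount_point)
    return stage_archive(Path(mount_point) / "MyDrive" / relative_path)
```

For example, mount_and_stage('datasets/my_data.7z') would prompt for Drive authorization and, under the assumptions above, return /content/my_data.7z, which matches the path used in the extraction snippets.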
    

Benefits of Using py7zr:

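  • Full 7z Support: py7zr reads and writes 7z archives, including ones compressed with LZMA, LZMA2, BZip2, or Deflate, as well as encrypted archives. It does not open ZIP, TAR, or GZIP files; use Python's built-in zipfile, tarfile, and gzip modules for those.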
  • Flexibility: Offers options for extracting specific files or folders.
  • Easy Integration: Seamlessly integrates into your Colab workflow.
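
To illustrate the flexibility point, here is a hedged sketch of extracting only some members of an archive. It relies on the SevenZipFile methods getnames() and extract(path=..., targets=...); the helper names pick_members and extract_selected and the default .csv filter are assumptions made for this example:

```python
def pick_members(names, suffixes=(".csv",)):
    """Return the member names that end with one of `suffixes` (case-insensitive)."""
    return [n for n in names if n.lower().endswith(tuple(suffixes))]


def extract_selected(archive_path, dest_dir, suffixes=(".csv",)):
    """Extract only the archive members whose names match `suffixes`."""
    # Imported here so pick_members stays usable before py7zr is installed.
    import py7zr

    with py7zr.SevenZipFile(archive_path, mode="r") as archive:
        targets = pick_members(archive.getnames(), suffixes)
        if targets:
            archive.extract(path=dest_dir, targets=targets)
    return targets
```

Calling extract_selected('/content/my_data.7z', '/content/extracted_data') would then unpack only the CSV files and return their names, which can save time and disk space on large archives.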

Conclusion:

Extracting .7z files in Google Colab takes two steps with the py7zr library: install it with pip, then open the archive with SevenZipFile and call extractall(). With that in place, you can access data stored within .7z archives for your machine learning and data science projects.