Linux: Compression and Archiving

What is Compression?

Compression is a technique for reducing the size of a data file. Compression algorithms find patterns and redundancy in the data and encode them more compactly.
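To see how much patterns matter, the sketch below compresses two files of equal size with gzip: one made of a single repeated line and one filled with random bytes. The file names and the temporary directory are illustrative assumptions. It uses gzip -c, which writes to standard output and leaves the input file in place.

```shell
#!/bin/sh
# Sketch: compare how well gzip shrinks repetitive versus random data.
# All file names are hypothetical and live in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

# 100,000 identical lines: highly repetitive, so patterns are easy to find.
yes "the same line over and over" | head -n 100000 > repetitive.txt

# The same number of bytes of random data: almost no patterns to exploit.
size=$(wc -c < repetitive.txt)
head -c "$size" /dev/urandom > random.bin

# -c writes the compressed stream to stdout, leaving the input untouched.
gzip -c repetitive.txt > repetitive.txt.gz
gzip -c random.bin > random.bin.gz

repetitive_gz=$(wc -c < repetitive.txt.gz)
random_gz=$(wc -c < random.bin.gz)

echo "original size:        $size bytes"
echo "repetitive, gzipped:  $repetitive_gz bytes"
echo "random, gzipped:      $random_gz bytes"
```

The repetitive file should shrink to a small fraction of its size, while the random file should stay about the same size or grow slightly.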

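Compression is divided into two types:

  • Lossless: the original data can be restored exactly. All of the tools covered here (gzip, bzip2, xz, zip) are lossless.

  • Lossy: some information is discarded to achieve a smaller size, as in JPEG images or MP3 audio.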

What is Archiving?

Before exploring compression, it is important to understand archiving.

Archiving is the process of collecting multiple files or directories into a single file. This archive can then be compressed using compression tools.

tar

tar is a common archiving tool on Linux. Example usage:

tar -cvf archive.tar file1 file2 directory

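In the example above:

  • -c creates a new archive

  • -v prints each file name as it is processed

  • -f archive.tar sets the archive file name

  • file1, file2, and directory are the items to add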

You can add compression options: -z for gzip, -j for bzip2, or -J for xz.
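The difference between archiving and compressing can be checked directly. The following sketch builds the same archive twice, once plain and once with -z, and compares the sizes. The sample directory is a hypothetical stand-in, and the -t flag (list contents) is an addition not covered above.

```shell
#!/bin/sh
# Sketch: archiving alone does not shrink data; adding -z does.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample directory with compressible text.
mkdir -p directory
awk 'BEGIN { for (i = 1; i <= 20000; i++) print "line", i }' > directory/file1
cp directory/file1 directory/file2

# Plain archive: -c create, -f archive name.
tar -cf archive.tar directory
# Same content, compressed with gzip via -z.
tar -czf archive.tar.gz directory

# -t lists the archive members without extracting them.
listing=$(tar -tf archive.tar)
echo "$listing"

plain=$(wc -c < archive.tar)
gzipped=$(wc -c < archive.tar.gz)
echo "archive.tar:    $plain bytes"
echo "archive.tar.gz: $gzipped bytes"
```

The plain archive is at least as large as the files it contains, since tar adds headers and padding; only the compressed variant should come out smaller.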

Compression Tools

Common compression tools on Linux:

Compression Tool    Compression Algorithm
gzip                DEFLATE
bzip2               Burrows-Wheeler
xz                  LZMA
zip                 DEFLATE
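For a rough feel of how these tools compare, the sketch below compresses one sample file with gzip, bzip2, and xz and prints the resulting sizes. The sample data and the output names (out.gzip and so on) are made up for the demonstration, and any tool that is not installed is skipped. Results depend heavily on the input, so treat the numbers as indicative only.

```shell
#!/bin/sh
# Sketch: compress one sample file with each installed tool and compare sizes.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample: 20,000 numbered lines of text.
awk 'BEGIN { for (i = 1; i <= 20000; i++) print "log entry " i ": request served" }' > sample.log

original=$(wc -c < sample.log)
echo "original: $original bytes"

for tool in gzip bzip2 xz; do
    if command -v "$tool" >/dev/null 2>&1; then
        # -c sends output to stdout so sample.log is left in place.
        # The out.<tool> names are demo labels, not standard extensions.
        "$tool" -c sample.log > "out.$tool"
        echo "$tool: $(wc -c < "out.$tool") bytes"
    else
        echo "$tool: not installed, skipped"
    fi
done

# gzip is assumed to be present; it ships with nearly every Linux system.
gzip_size=$(wc -c < out.gzip)
```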

gzip

gzip is a widely used compression utility using the DEFLATE algorithm.

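Compress a file (produces filename.gz and removes the original):

    gzip filename

Decompress:

    gunzip filename.gz
    # or
    gzip -d filename.gz

Show the compressed size, uncompressed size, and compression ratio:

    gzip -l filename.gz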
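Because gzip replaces the input file by default, it is often convenient to write the compressed data elsewhere. The sketch below is one way to do that, using -c together with the level flags -1 and -9 and the integrity check -t. These flags are not covered above, and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: gzip without losing the original, at two compression levels.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample file.
awk 'BEGIN { for (i = 1; i <= 50000; i++) print "row", i, i % 97 }' > data.txt

# -c writes to stdout, so data.txt stays in place.
# -1 favours speed, -9 favours size (the default level is 6).
gzip -1 -c data.txt > data.fast.gz
gzip -9 -c data.txt > data.best.gz

fast=$(wc -c < data.fast.gz)
best=$(wc -c < data.best.gz)
echo "level 1: $fast bytes, level 9: $best bytes"

# -t checks the integrity of a compressed file without writing any output.
gzip -t data.best.gz && echo "data.best.gz passed the integrity check"

# Decompress to stdout and confirm the result matches the original byte for byte.
gzip -dc data.best.gz | cmp - data.txt && roundtrip=ok
echo "round trip: ${roundtrip:-failed}"
```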
 

bzip2

bzip2 uses the Burrows-Wheeler transform. It typically compresses better than gzip, but is slower.

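Compress a file (produces filename.bz2 and removes the original):

    bzip2 filename

Decompress:

    bunzip2 filename.bz2
    # or
    bzip2 -d filename.bz2

Print the uncompressed size in bytes without writing the decompressed file to disk:

    bzcat filename.bz2 | wc -c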
 

xz

xz uses LZMA (the Lempel-Ziv-Markov chain algorithm). It typically achieves higher compression ratios than gzip or bzip2, but compression is slower and uses more memory.

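Compress a file (produces filename.xz and removes the original):

    xz filename

Decompress:

    unxz filename.xz
    # or
    xz -d filename.xz

Show the compressed size, uncompressed size, and compression ratio:

    xz -l filename.xz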
 

zip

zip is commonly used to compress and archive multiple files.

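Add files to an archive (a directory named without -r is stored as an empty entry, without its contents):

    zip archive.zip file1 file2 folder1

Add a directory and everything inside it:

    zip -r archive.zip folder1

Update an existing archive, adding new files and replacing changed ones:

    zip -u archive.zip file3

Create a password-protected archive (zip prompts for the password):

    zip -r -e archive.zip folder1

Each of these commands produces or updates archive.zip.

Extract:

    unzip archive.zip

List the contents without extracting:

    unzip -l archive.zip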
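The commands above can be combined into a small round trip. The following sketch zips a directory recursively, extracts it into a separate location, and compares the result with the original. The folder contents are hypothetical, the -q (quiet) and -d (target directory) options are additions not covered above, and the script skips itself if zip or unzip is not installed.

```shell
#!/bin/sh
# Sketch: recursive zip round trip, verified with diff.
result=skipped
if command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    cd "$workdir"

    # Hypothetical sample folder with a nested subdirectory.
    mkdir -p folder1/sub
    echo "a" > folder1/a.txt
    echo "b" > folder1/sub/b.txt

    # -r descends into folder1; -q suppresses the per-file progress lines.
    zip -r -q archive.zip folder1

    # List the contents without extracting.
    unzip -l archive.zip

    # -d extracts into a target directory instead of the current one.
    unzip -q archive.zip -d restored

    # diff -r exits 0 only if both directory trees match.
    diff -r folder1 restored/folder1 && result=ok
fi
echo "zip round trip: $result"
```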
 

Archiving and Compression with tar

tar can archive and compress in a single command by adding a compression flag:

tar + gzip

# Compress
tar -czvf archive.tar.gz directory/
# Decompress
tar -xzvf archive.tar.gz

tar + bzip2

# Compress
tar -cjvf archive.tar.bz2 directory/
# Decompress
tar -xjvf archive.tar.bz2

tar + xz

# Compress
tar -cJvf archive.tar.xz directory/
# Decompress
tar -xJvf archive.tar.xz

Explanation of flags:

  • -c: Create archive

  • -x: Extract archive

  • -z: Use gzip

  • -j: Use bzip2

  • -J: Use xz

  • -v: Verbose (detailed output)

  • -f: Specify archive file name
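Putting the flags together, the sketch below creates a gzip-compressed tarball, lists it, extracts it into a separate directory, and checks that the restored tree matches the original. The directory contents are hypothetical, and -t (list) and -C (extract into a given directory) are additional tar options that are not in the list above.

```shell
#!/bin/sh
# Sketch: create a gzip-compressed tarball, inspect it, and verify the round trip.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample directory.
mkdir -p directory/logs
echo "alpha" > directory/a.txt
echo "beta" > directory/logs/b.txt

# -c create, -z gzip, -f archive name (add -v to see each file as it is added).
tar -czf archive.tar.gz directory/

# -t lists the members; with -z it reads the gzip-compressed archive directly.
entries=$(tar -tzf archive.tar.gz | wc -l)
echo "entries in archive: $entries"

# Extract into a separate directory so the original is not overwritten.
mkdir restored
tar -xzf archive.tar.gz -C restored

# diff -r exits 0 only when the two trees are identical.
diff -r directory restored/directory && verified=yes
echo "verified: ${verified:-no}"
```

Extracting into a scratch directory first is a cautious habit: it avoids overwriting existing files and makes it easy to inspect an unfamiliar archive before moving its contents into place.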