Linux: Compression and Archiving
What is Compression?
Compression is a technique for reducing the size of a data file. Compression algorithms identify patterns and redundancy in the data, such as repeated strings, and encode them more compactly.
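One way to see why patterns matter is to compress two inputs of the same size, one highly repetitive and one random. The sketch below assumes a Linux system with gzip, yes, head, and /dev/urandom available. The file names are illustrative. The repetitive file should shrink to a small fraction of its size. The random file should stay about the same size, or even grow slightly, because random bytes contain no patterns to exploit.

```shell
#!/bin/sh
# Sketch: compare how well gzip compresses repetitive vs. random data.
# Assumes gzip, yes, head, and /dev/urandom are available (typical on Linux).
set -eu
cd "$(mktemp -d)"   # work in a throwaway directory

# 90,000 bytes of highly repetitive text: one 18-byte line repeated 5000 times
yes "hello compression" | head -n 5000 > repetitive.txt
# 90,000 random bytes: no patterns for the algorithm to exploit
head -c 90000 /dev/urandom > random.bin

# gzip -c writes compressed data to stdout and leaves the input untouched
rep_size=$(gzip -c repetitive.txt | wc -c)
rand_size=$(gzip -c random.bin | wc -c)

echo "repetitive.txt: $(wc -c < repetitive.txt) bytes -> $rep_size bytes gzipped"
echo "random.bin:     $(wc -c < random.bin) bytes -> $rand_size bytes gzipped"
```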
Compression is divided into two types:
- Lossless Compression: Data is compressed without any loss of information. The decompressed data is identical to the original.
- Lossy Compression: Some information is intentionally removed to reduce file size. The decompressed result may not be identical to the original.
The compression tools covered below (gzip, bzip2, xz, and zip) are all lossless.
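You can check lossless behaviour directly. Compress a file, decompress it to a new file, and compare the two byte for byte. This is a minimal sketch using gzip and cmp. The names original.txt and roundtrip.txt are hypothetical.

```shell
#!/bin/sh
# Sketch: verify that gzip is lossless by comparing a round-tripped copy
# byte for byte with the original. File names are illustrative.
set -eu
cd "$(mktemp -d)"

printf 'line one\nline two\nline two\n' > original.txt

gzip -c original.txt > original.txt.gz     # compress, keep original.txt
gzip -dc original.txt.gz > roundtrip.txt   # decompress into a new file

# cmp -s exits 0 only if the two files are byte-for-byte identical
if cmp -s original.txt roundtrip.txt; then
    echo "identical: no information was lost"
else
    echo "files differ"
fi
```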
What is Archiving?
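Archiving is the process of collecting multiple files or directories into a single file. It is a separate step from compression. An archive is not compressed by itself, but it can then be compressed with a compression tool. gzip, bzip2, and xz each compress a single file, so multiple files are usually bundled into one archive first, for example with tar.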
tar
tar is a common archiving tool on Linux. Example usage:
tar -cvf archive.tar file1 file2 directory
In the example above:
- -c: Create a new archive
- -v: Verbose mode (lists each file as it is processed)
- -f: Uses the next argument as the archive file name
You can add a compression option: -z for gzip, -j for bzip2, or -J for xz.
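The following sketch runs the tar command from the example above on a few throwaway files. It then lists the archive with -t, a standard tar option not described above. The file names are illustrative. The sketch shows that a plain .tar file only bundles data: without -z, -j, or -J, nothing is compressed.

```shell
#!/bin/sh
# Sketch: bundle files into an uncompressed tar archive and list its contents.
# file1, file2, and directory/ are created here only for illustration.
set -eu
cd "$(mktemp -d)"

printf 'first file\n'  > file1
printf 'second file\n' > file2
mkdir -p directory
printf 'nested file\n' > directory/file3

tar -cvf archive.tar file1 file2 directory   # -c create, -v verbose, -f name

# -t lists the archive's members without extracting them
tar -tf archive.tar

# Without -z/-j/-J nothing is compressed: the archive holds the file contents
# plus tar headers and padding, so it is larger than the inputs
ls -l archive.tar
```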
Compression Tools
Common compression tools on Linux:
| Compression Tool | Compression Algorithm |
|---|---|
| gzip | DEFLATE |
| bzip2 | Burrows-Wheeler transform + Huffman coding |
| xz | LZMA2 |
| zip | DEFLATE |
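To compare the tools in the table, the sketch below compresses the same generated text file with gzip, bzip2, and xz, then prints the resulting sizes. It assumes all three tools are installed. sample.txt is a hypothetical input. Exact numbers vary with the data and tool versions, so no particular output is claimed here.

```shell
#!/bin/sh
# Sketch: compress one input with gzip, bzip2, and xz and compare sizes.
# sample.txt is generated text; real ratios depend on your data.
set -eu
cd "$(mktemp -d)"

# Build a moderately redundant text file (numbered lines)
seq 1 20000 | sed 's/^/record number /' > sample.txt

# -c sends output to stdout, so sample.txt itself is left untouched
gzip  -c sample.txt > sample.txt.gz
bzip2 -c sample.txt > sample.txt.bz2
xz    -c sample.txt > sample.txt.xz

# Print each file's size in bytes
for f in sample.txt sample.txt.gz sample.txt.bz2 sample.txt.xz; do
    printf '%-16s %8d bytes\n' "$f" "$(wc -c < "$f")"
done
```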
gzip
gzip is a widely used compression utility based on the DEFLATE algorithm (LZ77 plus Huffman coding).
- Compress a file:
gzip filename
→ Produces filename.gz and removes the original filename
- Decompress a file:
gunzip filename.gz # or gzip -d filename.gz
- View compression info:
gzip -l filename.gz
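By default, gzip replaces the input file rather than keeping a copy. This sketch uses a hypothetical notes.txt to show that behaviour. It also shows the -c option (write to standard output) as a way to keep the original.

```shell
#!/bin/sh
# Sketch: gzip replaces the original file unless you write to stdout.
# "notes.txt" is a hypothetical file name used for illustration.
set -eu
cd "$(mktemp -d)"
seq 1 1000 > notes.txt

gzip notes.txt                 # notes.txt is replaced by notes.txt.gz
ls                             # -> notes.txt.gz
gzip -l notes.txt.gz           # compressed size, uncompressed size, ratio

gunzip notes.txt.gz            # notes.txt.gz is replaced by notes.txt

# To keep the original, write the compressed data to stdout instead
gzip -c notes.txt > notes.txt.gz
ls                             # -> notes.txt and notes.txt.gz
```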
bzip2
bzip2 uses the Burrows-Wheeler transform. It usually compresses better than gzip, but compresses and decompresses more slowly.
- Compress a file:
bzip2 filename
→ Produces filename.bz2 and removes the original filename
- Decompress a file:
bunzip2 filename.bz2 # or bzip2 -d filename.bz2
- Show the uncompressed size in bytes (bzip2 has no -l option):
bzcat filename.bz2 | wc -c
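The wc -c pipeline only reports the uncompressed size, so computing a compression ratio needs a second measurement. The sketch below shows one way to do this in plain shell arithmetic. data.txt is a hypothetical file. bzip2 -t is the standard integrity-test option.

```shell
#!/bin/sh
# Sketch: bzip2 has no -l listing, so compute the ratio from two byte counts.
# "data.txt" is a hypothetical input file created here for illustration.
set -eu
cd "$(mktemp -d)"
seq 1 50000 > data.txt

bzip2 data.txt                               # produces data.txt.bz2, removes data.txt

compressed=$(wc -c < data.txt.bz2)           # size on disk
uncompressed=$(bzcat data.txt.bz2 | wc -c)   # size after decompression

# Integer percentage of space saved (shell arithmetic has no decimals)
saved=$(( 100 - 100 * compressed / uncompressed ))
echo "compressed: $compressed  uncompressed: $uncompressed  saved: ${saved}%"

bzip2 -t data.txt.bz2 && echo "integrity check passed"   # -t tests the file
```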
xz
xz uses LZMA2, a variant of LZMA (Lempel-Ziv-Markov chain algorithm). It usually achieves higher compression ratios than gzip and bzip2, but it compresses more slowly and uses more memory.
- Compress a file:
xz filename
→ Produces filename.xz and removes the original filename
- Decompress a file:
unxz filename.xz # or xz -d filename.xz
- View compression info:
xz -l filename.xz
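This sketch shows a few other everyday xz options: -k to keep the original file, -t to test integrity, and -dc to decompress to standard output. log.txt is a hypothetical file name. The final pipeline prints the last line that seq wrote.

```shell
#!/bin/sh
# Sketch: keep the original, check integrity, and read xz data without
# writing a decompressed file. "log.txt" is a hypothetical file name.
set -eu
cd "$(mktemp -d)"
seq 1 50000 > log.txt

xz -k log.txt          # -k keeps log.txt alongside log.txt.xz
xz -l log.txt.xz       # compressed/uncompressed sizes and ratio
xz -t log.txt.xz       # verify integrity; prints nothing on success

# Decompress to stdout (like xzcat) and pipe into another command
xz -dc log.txt.xz | tail -n 1   # -> 50000
```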
zip
zip archives and compresses in one step: it packs multiple files and directories into a single compressed .zip file.
- Compress files (without -r, a directory name adds only an empty directory entry, not its contents):
zip archive.zip file1 file2 folder1
- Recursively compress a folder:
zip -r archive.zip folder1
- Add new files to an existing zip, or update entries that have changed:
zip -u archive.zip file3
- Password-protect a zip:
zip -r -e archive.zip folder1
→ Prompts for a password and produces an encrypted archive.zip
- Decompress:
unzip archive.zip
- List contents and sizes (unzip -v also shows compression ratios):
unzip -l archive.zip
Archiving and Compression with tar
tar can archive and compress, or decompress and extract, in a single command:
tar + gzip
# Compress
tar -czvf archive.tar.gz directory/
# Decompress
tar -xzvf archive.tar.gz
tar + bzip2
# Compress
tar -cjvf archive.tar.bz2 directory/
# Decompress
tar -xjvf archive.tar.bz2
tar + xz
# Compress
tar -cJvf archive.tar.xz directory/
# Decompress
tar -xJvf archive.tar.xz
Explanation of flags:
- -c: Create archive
- -x: Extract archive
- -z: Use gzip
- -j: Use bzip2
- -J: Use xz
- -v: Verbose (detailed output)
- -f: Specify archive file name
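This sketch puts the flags together. It creates a .tar.gz, a .tar.bz2, and a .tar.xz archive of the same directory and lists each one with -t. It then extracts each archive into its own directory with -C and checks the result with diff -r. The -t (list) and -C (change to directory) options are standard tar options that the flag list does not cover. The directory and out-* names are illustrative.

```shell
#!/bin/sh
# Sketch: archive a directory with each compressor, then extract each archive
# into its own directory and confirm the contents survived unchanged.
# -t (list) and -C (extract into a directory) are standard tar options.
set -eu
cd "$(mktemp -d)"
mkdir -p directory
seq 1 1000 > directory/numbers.txt
printf 'hello\n' > directory/hello.txt

for opt in z j J; do
    case $opt in
        z) ext=gz ;;
        j) ext=bz2 ;;
        J) ext=xz ;;
    esac
    tar "-c${opt}f" "archive.tar.$ext" directory/        # create + compress
    tar "-t${opt}f" "archive.tar.$ext"                   # list members
    mkdir -p "out-$ext"
    tar "-x${opt}f" "archive.tar.$ext" -C "out-$ext"     # extract into out-$ext
    diff -r directory "out-$ext/directory" && echo "$ext: OK"
done
```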