====== Linux : Compression and Archiving ======
===== What is Compression? =====
**Compression** is the process of encoding data so that it takes up less space. Compression algorithms do this by finding redundancy in the data, such as repeated patterns, and representing it more compactly.
Compression is divided into two types:
* **Lossless Compression**: Data is compressed without any loss of information. The decompressed data is identical to the original. All of the tools covered on this page (gzip, bzip2, xz, zip) are lossless.
* **Lossy Compression**: Some information is intentionally removed to reduce file size. The decompressed result is not identical to the original. Typical examples are JPEG images and MP3 audio.
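The lossless property can be checked directly. The following is a minimal sketch, assuming a POSIX shell with gzip installed; the file names and sample text are hypothetical. It compresses a file, decompresses it again, and compares the result byte for byte with the original.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so nothing else is touched.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical sample data: 1000 identical lines, which compress very well.
i=0
while [ "$i" -lt 1000 ]; do
    echo "the quick brown fox"
    i=$((i + 1))
done > sample.txt

# -c writes the compressed stream to stdout, leaving sample.txt intact.
gzip -c sample.txt > sample.txt.gz

# Decompress into a new file and compare it with the original.
gzip -dc sample.txt.gz > restored.txt
if cmp -s sample.txt restored.txt; then
    roundtrip="identical"
else
    roundtrip="different"
fi

original_size=$(wc -c < sample.txt)
compressed_size=$(wc -c < sample.txt.gz)
echo "round trip: $roundtrip"
echo "original: $original_size bytes, compressed: $compressed_size bytes"
```

Highly repetitive input like this shrinks dramatically; data that is already compressed or random will barely shrink at all.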
===== What is Archiving? =====
Before exploring the compression tools, it is important to understand **archiving**.
**Archiving** is the process of collecting multiple files or directories into a single file. Archiving by itself does not reduce size; the resulting archive can then be compressed using compression tools.
==== tar ====
**tar** is a common archiving tool on Linux. Example usage:
tar -cvf archive.tar file1 file2 directory
In the example above:
* -c: Create a new archive
* -v: Verbose mode (shows detailed process)
* -f: Specifies the archive file name
You can add compression options: -z for gzip, -j for bzip2, or -J for xz.
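As a rough illustration of the create, list, and extract cycle, the sketch below builds a small archive and unpacks it elsewhere. The file names mirror the example above but are otherwise hypothetical; the -t (list) and -C (change directory) options are standard tar options not described above.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical input: two files and a directory.
echo "one" > file1
echo "two" > file2
mkdir directory
echo "three" > directory/file3

# -c creates the archive, -f names it (-v is omitted to keep the output short).
tar -cf archive.tar file1 file2 directory

# -t lists the archive members without extracting them.
listing=$(tar -tf archive.tar)
echo "$listing"

# -x extracts; -C selects the destination directory.
mkdir restored
tar -xf archive.tar -C restored
restored_content=$(cat restored/directory/file3)
echo "restored directory/file3 contains: $restored_content"
```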
===== Compression Tools =====
Common compression tools on Linux:
^ Compression Tool ^ Compression Algorithm ^
| **gzip** | DEFLATE |
| **bzip2** | Burrows-Wheeler transform with Huffman coding |
| **xz** | LZMA2 |
| **zip** | DEFLATE |
==== gzip ====
**gzip** is a widely used compression utility using the **DEFLATE** algorithm.
* Compress a file:
gzip filename
→ Produces filename.gz and removes the original file (add -k to keep it)
* Decompress a file:
gunzip filename.gz
# or
gzip -d filename.gz
* View compression info:
gzip -l filename.gz
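The commands above can be combined into a short session. This is a sketch under a few assumptions: gzip 1.6 or later (for -k), and a hypothetical sample file named app.log. It also shows the -1 to -9 compression levels and the -t integrity test, which are standard gzip options.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical sample file with repetitive content.
i=0
while [ "$i" -lt 2000 ]; do
    echo "log line with repeated text"
    i=$((i + 1))
done > app.log

# -k keeps app.log next to the new app.log.gz.
gzip -k app.log

# Fastest (-1) and strongest (-9) levels, written to separate files via stdout.
gzip -1 -c app.log > fast.gz
gzip -9 -c app.log > best.gz

# -l reports compressed size, uncompressed size, and ratio for each file.
gzip -l app.log.gz fast.gz best.gz

# -t tests integrity without writing any output.
if gzip -t app.log.gz; then
    integrity="ok"
else
    integrity="corrupt"
fi
echo "integrity: $integrity"
```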
==== bzip2 ====
**bzip2** uses the **Burrows-Wheeler** transform and typically achieves better compression than gzip, at the cost of slower compression and decompression.
* Compress a file:
bzip2 filename
→ Produces filename.bz2 and removes the original file (add -k to keep it)
* Decompress a file:
bunzip2 filename.bz2
# or
bzip2 -d filename.bz2
* Show the uncompressed size in bytes (bzip2 has no listing option like gzip -l):
bzcat filename.bz2 | wc -c
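Because bzip2 does not print a size summary, the ratio has to be computed by hand. A minimal sketch, assuming bzip2 and bzcat are installed and using a hypothetical file data.txt:

```shell
#!/bin/sh
set -eu

# Skip when bzip2 is not installed.
command -v bzip2 >/dev/null 2>&1 || { echo "bzip2 not found; skipping"; exit 0; }

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical sample data: 1000 identical lines.
i=0
while [ "$i" -lt 1000 ]; do
    echo "sensor reading 42"
    i=$((i + 1))
done > data.txt

# -k keeps data.txt alongside data.txt.bz2.
bzip2 -k data.txt

# Size on disk versus size after decompression to stdout.
compressed_size=$(wc -c < data.txt.bz2)
uncompressed_size=$(bzcat data.txt.bz2 | wc -c)

# Integer percentage of space saved.
saved=$(( (uncompressed_size - compressed_size) * 100 / uncompressed_size ))
echo "compressed: $compressed_size bytes"
echo "uncompressed: $uncompressed_size bytes"
echo "saved: ${saved}%"

# -t checks the integrity of the compressed file.
bzip2 -t data.txt.bz2
```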
==== xz ====
**xz** uses **LZMA2**, a variant of **LZMA (Lempel-Ziv-Markov chain algorithm)**. It usually achieves higher compression ratios than gzip or bzip2, but compression is slower and uses more memory.
* Compress a file:
xz filename
→ Produces filename.xz and removes the original file (add -k to keep it)
* Decompress a file:
unxz filename.xz
# or
xz -d filename.xz
* View compression info:
xz -l filename.xz
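The xz commands follow the same pattern as gzip. The sketch below assumes xz is installed and uses a hypothetical file data.txt; it keeps the original, prints the size summary, tests the file, and confirms that decompression reproduces the input.

```shell
#!/bin/sh
set -eu

# Skip when xz is not installed.
command -v xz >/dev/null 2>&1 || { echo "xz not found; skipping"; exit 0; }

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical sample data.
i=0
while [ "$i" -lt 1000 ]; do
    echo "a line that repeats many times"
    i=$((i + 1))
done > data.txt

# -k keeps data.txt alongside data.txt.xz.
xz -k data.txt

# -l prints compressed size, uncompressed size, and ratio.
xz -l data.txt.xz

# -t tests integrity without writing output.
xz -t data.txt.xz

# -dc decompresses to stdout; cmp reads that stream as "-".
if xz -dc data.txt.xz | cmp -s - data.txt; then
    roundtrip="identical"
else
    roundtrip="different"
fi
echo "round trip: $roundtrip"
```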
==== zip ====
**zip** both archives and compresses: it collects multiple files into a single .zip file and compresses each one individually.
* Compress files:
zip archive.zip file1 file2 folder1
* Recursively compress a folder:
zip -r archive.zip folder1
* Add new files to an existing zip (-u also replaces entries whose source file is newer):
zip -u archive.zip file3
* Password-protect a zip (-e prompts for the password; this traditional zip encryption is weak by modern standards):
zip -r -e archive.zip folder1
→ Produces archive.zip
* Decompress:
unzip archive.zip
* List archive contents (use unzip -v instead to also see the compression method and ratio):
unzip -l archive.zip
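Put together, a typical zip workflow might look like the sketch below. It assumes the Info-ZIP zip and unzip commands are installed; the file names are hypothetical, and the -q (quiet), -t (test), and -d (destination directory) options are additions not covered above.

```shell
#!/bin/sh
set -eu

# Skip when zip or unzip is not installed.
command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1 || {
    echo "zip/unzip not found; skipping"
    exit 0
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical input files.
mkdir folder1
echo "alpha" > folder1/a.txt
echo "beta" > file1

# -r recurses into folder1, -q suppresses per-file messages.
zip -rq archive.zip file1 folder1

# Add another file to the existing archive.
echo "gamma" > file3
zip -uq archive.zip file3

# -v lists members with compression method and ratio.
unzip -v archive.zip

# -t tests the archive; -d extracts into a chosen directory.
unzip -tq archive.zip
unzip -q archive.zip -d restored
restored_content=$(cat restored/folder1/a.txt)
echo "restored folder1/a.txt contains: $restored_content"
```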
===== Archiving and Compression with tar =====
**tar** can create an archive and compress it in a single command:
==== tar + gzip ====
# Compress
tar -czvf archive.tar.gz directory/
# Decompress
tar -xzvf archive.tar.gz
==== tar + bzip2 ====
# Compress
tar -cjvf archive.tar.bz2 directory/
# Decompress
tar -xjvf archive.tar.bz2
==== tar + xz ====
# Compress
tar -cJvf archive.tar.xz directory/
# Decompress
tar -xJvf archive.tar.xz
Explanation of flags:
* -c: Create archive
* -x: Extract archive
* -z: Use gzip
* -j: Use bzip2
* -J: Use xz
* -v: Verbose (detailed output)
* -f: Specify archive file name
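To see these flags working together, the sketch below creates a gzip-compressed archive, lists it, extracts it into a separate directory, and verifies that the extracted tree matches the source. The directory layout is hypothetical, and -t (list) and -C (destination directory) are standard tar options beyond those listed above.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical directory tree to archive.
mkdir -p directory/sub
echo "hello" > directory/a.txt
echo "world" > directory/sub/b.txt

# Create and compress in one step (-c create, -z gzip, -f file name).
tar -czf archive.tar.gz directory/

# List the members of the compressed archive.
tar -tzf archive.tar.gz

# Extract into a separate directory.
mkdir restored
tar -xzf archive.tar.gz -C restored

# Compare the extracted tree with the source tree.
if diff -r directory restored/directory >/dev/null; then
    result="match"
else
    result="mismatch"
fi
echo "extracted tree: $result"
```

Swapping -z for -j or -J (and adjusting the file extension) gives the bzip2 and xz variants, provided those tools are installed.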