====== Linux : Compression and Archiving ======

===== What is Compression? =====

**Compression** is a technique for reducing the size of a data file. Compression algorithms find patterns and redundancy in the data and encode them more compactly.

Compression is divided into two types:

  * **Lossless Compression**: Data is compressed without any loss of information. The decompressed data is identical to the original.
  * **Lossy Compression**: Some information is intentionally discarded to reduce file size. The decompressed result is not identical to the original.

All of the tools covered on this page are lossless.

===== What is Archiving? =====

Before exploring compression tools, it is important to understand **archiving**.

**Archiving** is the process of collecting multiple files or directories into a single file. The resulting archive can then be compressed with a compression tool.

==== tar ====

**tar** is a common archiving tool on Linux. Example usage:

  tar -cvf archive.tar file1 file2 directory

In the example above:

  * ''-c'': Create a new archive
  * ''-v'': Verbose mode (lists each file as it is processed)
  * ''-f'': Specifies the archive file name

You can add a compression option: ''-z'' for gzip, ''-j'' for bzip2, or ''-J'' for xz.

===== Compression Tools =====

Common compression tools on Linux:

^ Compression Tool ^ Compression Algorithm ^
| **gzip** | DEFLATE |
| **bzip2** | Burrows-Wheeler transform |
| **xz** | LZMA2 |
| **zip** | DEFLATE |

==== gzip ====

**gzip** is a widely used compression utility based on the **DEFLATE** algorithm.

  * Compress a file: ''gzip filename'' (produces ''filename.gz'' and removes the original)
  * Decompress a file: ''gunzip filename.gz'' or ''gzip -d filename.gz''
  * View compression info: ''gzip -l filename.gz''

==== bzip2 ====

**bzip2** uses the **Burrows-Wheeler** transform and typically achieves better compression ratios than gzip, at the cost of speed.

  * Compress a file: ''bzip2 filename'' (produces ''filename.bz2'')
  * Decompress a file: ''bunzip2 filename.bz2'' or ''bzip2 -d filename.bz2''
  * Check the uncompressed size (bzip2 has no list option like ''gzip -l''): ''bzcat filename.bz2 | wc -c''

==== xz ====

**xz** uses **LZMA2**, a variant of **LZMA (Lempel-Ziv-Markov chain algorithm)**.
It typically offers higher compression ratios than gzip or bzip2, but compression is slower and uses more memory.

  * Compress a file: ''xz filename'' (produces ''filename.xz'')
  * Decompress a file: ''unxz filename.xz'' or ''xz -d filename.xz''
  * View compression info: ''xz -l filename.xz''

==== zip ====

**zip** both archives and compresses, so it is commonly used to bundle multiple files into a single compressed file.

  * Compress files: ''zip archive.zip file1 file2''
  * Recursively compress a folder: ''zip -r archive.zip folder1''
  * Add or update files in an existing zip: ''zip -u archive.zip file3''
  * Password-protect a zip: ''zip -r -e archive.zip folder1'' (prompts for a password)
  * Decompress: ''unzip archive.zip''
  * List contents: ''unzip -l archive.zip'' (use ''unzip -v archive.zip'' to see per-file compression ratios)

===== Archiving and Compression with tar =====

**tar** can archive and compress in a single command:

==== tar + gzip ====

  # Compress
  tar -czvf archive.tar.gz directory/
  # Decompress
  tar -xzvf archive.tar.gz

==== tar + bzip2 ====

  # Compress
  tar -cjvf archive.tar.bz2 directory/
  # Decompress
  tar -xjvf archive.tar.bz2

==== tar + xz ====

  # Compress
  tar -cJvf archive.tar.xz directory/
  # Decompress
  tar -xJvf archive.tar.xz

Explanation of flags:

  * ''-c'': Create archive
  * ''-x'': Extract archive
  * ''-z'': Use gzip
  * ''-j'': Use bzip2
  * ''-J'': Use xz
  * ''-v'': Verbose (lists each file as it is processed)
  * ''-f'': Specify archive file name
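
The tar + gzip commands above can be tried end to end with a small round-trip script. This is only a sketch under stated assumptions: the scratch directory, the sample file names (''sample.txt'', ''readme.txt''), and the variable names are made up for illustration, and it assumes ''mktemp -d'', ''tar'' with gzip support, and ''diff'' are available, as they are on most Linux systems. It leaves out ''-v'' to keep the output short and uses ''-t'' (list contents) and ''-C'' (change directory), which are not covered in the flag list above.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory; it is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Build a small sample tree containing repetitive, easily compressed text.
mkdir -p "$workdir/src/docs"
i=0
while [ "$i" -lt 2000 ]; do
    echo "line $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > "$workdir/src/docs/sample.txt"
echo "hello" > "$workdir/src/readme.txt"

# Archive and compress in one step; -C switches to $workdir before adding src.
tar -czf "$workdir/archive.tar.gz" -C "$workdir" src

# List the archive contents without extracting anything.
tar -tzf "$workdir/archive.tar.gz"

# Extract into a separate directory.
mkdir "$workdir/out"
tar -xzf "$workdir/archive.tar.gz" -C "$workdir/out"

# Lossless check: the extracted tree should match the original exactly.
if diff -r "$workdir/src" "$workdir/out/src" >/dev/null; then
    roundtrip=ok
else
    roundtrip=mismatch
fi
echo "round trip: $roundtrip"

# Compare the size of the sample file with the size of the whole archive.
orig_size=$(( $(wc -c < "$workdir/src/docs/sample.txt") ))
packed_size=$(( $(wc -c < "$workdir/archive.tar.gz") ))
echo "sample.txt: $orig_size bytes, archive.tar.gz: $packed_size bytes"
```

Replacing ''z'' with ''j'' or ''J'' in the three tar calls (and adjusting the file extension) should exercise bzip2 or xz the same way, provided those tools are installed.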