"split" and "cat" - Split and Join Files

Provides a tutorial example on how to use the "split" command to split a large file into chunks and the "cat" command to join the chunks back into a single file.

If you have a large file, generated by an archive tool, a video generator, a memory dump, or a database backup, you may have trouble opening or copying it.

One workaround is to split it into chunks with the "split" command and keep those chunks in a sub-directory. You can join those chunks back into the original file with the "cat" command.

Here is what I did to create a large compressed archive file and split it into chunks.

1. Create the compressed archive file with the "tar -c -z" command:

herong$ cd /var/lib/mysql

herong$ tar -c -z -f /tmp/database-backup.tar.gz data
... wait for the "tar" command to finish

herong$ cd /tmp

herong$ ls -l *.tar.gz
-rwx------. 1 herong herong 9482463744 Oct 31 01:52 database-backup.tar.gz

2. Split the large file into chunks in a sub-directory. Option "-d" tells "split" to use numeric suffixes, like "chunk-00" and "chunk-01", instead of alphabetic ones. Option "-b 1000000000" sets the chunk size to 1000000000 bytes.

herong$ mkdir database-backup
herong$ cd database-backup

herong$ split -d -b 1000000000 ../database-backup.tar.gz chunk-
... wait for the 'split' command to finish

herong$ ls -l chunk*
-rwx------. 1 herong herong 1000000000 Oct 31 02:04 chunk-00
-rwx------. 1 herong herong 1000000000 Oct 31 02:05 chunk-01
-rwx------. 1 herong herong 1000000000 Oct 31 02:05 chunk-02
-rwx------. 1 herong herong 1000000000 Oct 31 02:06 chunk-03
-rwx------. 1 herong herong 1000000000 Oct 31 02:06 chunk-04
-rwx------. 1 herong herong 1000000000 Oct 31 02:06 chunk-05
-rwx------. 1 herong herong 1000000000 Oct 31 02:07 chunk-06
-rwx------. 1 herong herong 1000000000 Oct 31 02:07 chunk-07
-rwx------. 1 herong herong 1000000000 Oct 31 02:07 chunk-08
-rwx------. 1 herong herong  482463744 Oct 31 02:08 chunk-09
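The chunk count in the listing above follows from simple arithmetic: 9482463744 bytes divided by 1000000000 bytes per chunk gives 9 full chunks plus a last chunk of 482463744 bytes. Here is a minimal sketch that rehearses the same step on a tiny file before you try it on a real backup. The scratch directory, the "sample.bin" file name and the scaled-down sizes are assumptions made for illustration, and the "-d" option assumes the GNU version of "split".

```shell
#!/bin/sh
# Small-scale rehearsal of step 2; names and sizes are hypothetical.
set -e
work=$(mktemp -d)              # scratch directory, not part of the tutorial
cd "$work"

# Create a 2500-byte sample file standing in for the large archive.
head -c 2500 /dev/zero > sample.bin

# Split it into 1000-byte chunks with numeric suffixes, as in step 2.
mkdir sample
cd sample
split -d -b 1000 ../sample.bin chunk-

# The expected chunk count is the file size divided by the chunk size, rounded up.
size=2500
chunk=1000
expected=$(( (size + chunk - 1) / chunk ))
actual=$(ls chunk-* | wc -l | tr -d ' ')
last_size=$(wc -c < chunk-02 | tr -d ' ')

echo "expected=$expected actual=$actual last=$last_size"
# prints: expected=3 actual=3 last=500
```

Only the last chunk is shorter than the requested size; all other chunks are exactly "-b" bytes long.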

3. Save the original file name in the sub-directory as an empty placeholder file, and delete the original file.

herong$ touch database-backup.tar.gz

herong$ rm ../database-backup.tar.gz

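4. Copying the chunks to other devices is much easier now.

5. Use the "cat" command to join the chunks back whenever needed. The shell expands "chunk*" in sorted order, so the chunks are joined in the right sequence: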

herong$ cd database-backup

herong$ cat chunk* > database-backup.tar.gz
... wait for the "cat" command to finish

herong$ ls -l *.tar.gz
-rwx------. 1 herong herong 9482463744 Oct 31 02:52 database-backup.tar.gz
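Matching file sizes, as in the listing above, do not prove that the joined file is identical to the original. One possible check, sketched below on a small sample file, is to record a checksum before splitting and compare it after joining. The file names are hypothetical, and "sha256sum" is assumed to be available, as it is on systems with GNU coreutils. On a real backup, the checksum has to be recorded before the original file is deleted in step 3.

```shell
#!/bin/sh
# Split-and-join round trip with a checksum comparison; all names are hypothetical.
set -e
work=$(mktemp -d)              # scratch directory, not part of the tutorial
cd "$work"

# Create a sample file and record its checksum before splitting.
seq 1 5000 > original.txt
sum_before=$(sha256sum original.txt | cut -d ' ' -f 1)

# Split into 4096-byte chunks, then join them back with "cat".
mkdir parts
split -d -b 4096 original.txt parts/chunk-
cat parts/chunk-* > rejoined.txt

# Compare the checksum of the joined file with the recorded one.
sum_after=$(sha256sum rejoined.txt | cut -d ' ' -f 1)
if [ "$sum_before" = "$sum_after" ]; then
    result="match"
else
    result="MISMATCH"
fi
echo "$result"
```

If the original file is still at hand, "cmp original.txt rejoined.txt" gives the same answer without computing checksums.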

6. If re-creating the original large file is a problem, for example because of limited disk space, you can pipe the "cat" command output to the "tar -x -z" command directly:

herong$ cat chunk* | tar -x -v -z -f -
... wait until all files are extracted
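To see the streaming extraction work end to end without touching a real backup, here is a minimal sketch on a small directory tree. The directory and file names are assumptions made for illustration. The "-f -" option tells "tar" to read the archive from standard input, and "-C" selects the directory to extract into.

```shell
#!/bin/sh
# Archive, split, delete the archive, then extract straight from the chunks.
# All directory and file names are hypothetical.
set -e
work=$(mktemp -d)              # scratch directory, not part of the tutorial
cd "$work"

# Build a small directory tree to archive.
mkdir data
seq 1 20000 > data/table1.txt
seq 20001 40000 > data/table2.txt

# Create the compressed archive, split it into chunks, and remove the archive.
tar -c -z -f backup.tar.gz data
mkdir chunks
split -d -b 10000 backup.tar.gz chunks/chunk-
rm backup.tar.gz

# Extract from the stream of chunks; the .tar.gz file is never rebuilt.
mkdir restored
cat chunks/chunk-* | tar -x -z -f - -C restored

# Compare the restored tree with the original one.
if diff -r data restored/data > /dev/null; then
    status="identical"
else
    status="different"
fi
echo "$status"
```

The comparison with "diff -r" is only there to confirm that the round trip preserved every file; it is not needed in normal use.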

Note that the "tar" and "gzip" combination gives you this nice feature of managing a large archive as a stream of sequential chunks, because both formats can be read from start to end without seeking.

If you split a large ZIP file into chunks, you will not be able to pipe the "cat" command output to the "unzip" command. This is because the table of contents (the central directory) is stored at the end of the ZIP file, which is in the last chunk.

Someone on the Internet said that the "jar -xv" command is able to unzip ZIP files as a stream of sequential chunks. Note that "jar" is a ZIP tool provided in the JDK (Java Development Kit) package. If you want to try it, here is the command:

herong$ cd database-backup

herong$ cat chunk* | jar -xv
