Data Encodings - Herong's Tutorial Examples - Version 5.10, by Dr. Herong Yang
UUEncode Algorithm

This section describes the UUEncode algorithm with some simple encoding examples.

UUEncode (Unix-to-Unix Encoding) was designed to address the problem of sending binary data files by email. It converts any data file to a text file that contains only printable characters.

UUEncode was very useful for email users in the early days, before email attachments (the MIME protocol) were available. For example, if I want to send a text message in Chinese GB coding to a friend, I cannot include the GB codes directly in the email body. I need to run uuencode (the UUEncode encoding command) to convert the GB codes into printable characters, then copy those characters into the email body. When my friend receives this email, he or she needs to run uudecode (the UUEncode decoding command) to convert the printable characters back to the original GB codes and read the text message in Chinese.

The encoding process is to:

1. Divide the input byte stream into blocks of 3 bytes.
2. Divide the 24 bits of each 3-byte block into 4 groups of 6 bits.
3. Expand each group of 6 bits to 8 bits and add 32 (\x20), so that the resulting byte represents a printable ASCII character in the range 32 to 95.
4. If the last 3-byte block has only 1 byte of input data, pad it with 2 bytes of value 1 (\x01\x01).
5. If the last 3-byte block has only 2 bytes of input data, pad it with 1 byte of value 1 (\x01).

The printable characters used by UUEncode encoding are listed in the following table (code 32 is the space character):

```
32      33 !    34 "    35 #    36 $    37 %    38 &    39 '
40 (    41 )    42 *    43 +    44 ,    45 -    46 .    47 /
48 0    49 1    50 2    51 3    52 4    53 5    54 6    55 7
56 8    57 9    58 :    59 ;    60 <    61 =    62 >    63 ?
64 @    65 A    66 B    67 C    68 D    69 E    70 F    71 G
72 H    73 I    74 J    75 K    76 L    77 M    78 N    79 O
80 P    81 Q    82 R    83 S    84 T    85 U    86 V    87 W
88 X    89 Y    90 Z    91 [    92 \    93 ]    94 ^    95 _
```

Example 1: Input data, 1 byte, "A". Encoded output, 4 characters, "00$!"
```
Input Data           A
Input Bits    01000001
Padding       01000001 00000001 00000001
                    \       \        \
Bit Groups    010000 010000 000100 000001
Adding 32     100000 100000 100000 100000
              110000 110000 100100 100001
Output        0      0      $      !
```

Example 2: Input data, 2 bytes, "AB". Encoded output, 4 characters, "04(!"

```
Input Data           A        B
Input Bits    01000001 01000010
Padding       01000001 01000010 00000001
                    \       \        \
Bit Groups    010000 010100 001000 000001
Adding 32     100000 100000 100000 100000
              110000 110100 101000 100001
Output        0      4      (      !
```

Example 3: Input data, 3 bytes, "ABC". Encoded output, 4 characters, "04)#"

```
Input Data           A        B        C
Input Bits    01000001 01000010 01000011
                    \       \        \
Bit Groups    010000 010100 001001 000011
Adding 32     100000 100000 100000 100000
              110000 110100 101001 100011
Output        0      4      )      #
```

Encoding output file formatting rules:

- The first line must be "begin ooo filename", where "ooo" is the Unix file access mode code, and "filename" is the file name of the input data file.
- Encoded output characters are grouped into lines of 60 characters each.
- A counter byte is inserted at the beginning of each line. It records the number of input data bytes encoded in that line. A value of 32 (\x20) is added to this byte, so it becomes a printable character.
- For a full line of 60 output characters, the leading counter byte is "M", because the line encodes 45 input bytes, and 45 plus 32 is 77, the ASCII value of "M". So you will see "M" at the start of every output line except the last one, which has a smaller counter value if it encodes fewer than 45 input bytes.
- Two extra lines are used to end the output file. The first line has a single byte of \x20 (a counter of 0 plus 32). The second line has "end".
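The encoding steps and formatting rules described in this section can be sketched in Python as follows. This is only a minimal illustration, not the code of any real uuencode command: the function names `encode_block` and `uuencode` and the default mode "644" are hypothetical choices made for this sketch. It pads short blocks with \x01 bytes and ends with a single \x20 line exactly as described in this tutorial; other uuencode implementations may pad or terminate differently.

```python
def encode_block(block: bytes) -> str:
    """Encode 1 to 3 input bytes into 4 printable characters."""
    # Pad a short final block with \x01 bytes, as this tutorial describes.
    padded = block + b"\x01" * (3 - len(block))
    # Treat the 3 bytes as one 24-bit number.
    bits = int.from_bytes(padded, "big")
    # Cut the 24 bits into 4 groups of 6 bits, most significant group first.
    groups = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    # Add 32 to each group to get a printable ASCII character (32 to 95).
    return "".join(chr(group + 32) for group in groups)


def uuencode(data: bytes, filename: str, mode: str = "644") -> str:
    """Build the full output text: header, data lines, and the 2 ending lines."""
    lines = [f"begin {mode} {filename}"]
    # Each line carries up to 45 input bytes, which is 60 output characters.
    for start in range(0, len(data), 45):
        chunk = data[start:start + 45]
        body = "".join(
            encode_block(chunk[i:i + 3]) for i in range(0, len(chunk), 3)
        )
        # Counter byte: number of input bytes in this line, plus 32.
        lines.append(chr(len(chunk) + 32) + body)
    lines.append(" ")    # counter of 0 plus 32 gives a single \x20
    lines.append("end")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(encode_block(b"A"))    # Example 1
    print(encode_block(b"AB"))   # Example 2
    print(encode_block(b"ABC"))  # Example 3
    print(uuencode(b"ABC", "test.txt"), end="")
```

Running the script prints the encoded output of the three examples, followed by a complete encoded file for the 3-byte input "ABC". A line holding 45 input bytes would start with "M" followed by 60 encoded characters.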
UUEncode Algorithm - Updated in 2010, by Dr. Herong Yang