This section describes the UUEncode algorithm with some simple encoding examples.
UUEncode (Unix-to-Unix Encoding) was designed to address
the problem of sending binary data files by email. It converts any data file
into a text file that contains only printable characters.
UUEncode was very useful for email users in the early days, when email attachments
(the MIME protocol) were not yet available. For example, if I want to send a text message
in Chinese GB encoding to a friend, I cannot include the GB codes directly in the
email body. I need to uuencode (the UUEncode encoding command) the GB codes into
printable characters, then copy those characters into the email body. When my friend
receives this email, he/she needs to uudecode (the UUEncode decoding command)
the printable characters back to the original GB codes to read the text message
in Chinese.
The encoding process is to:
Divide the input byte stream into blocks of 3 bytes.
Divide the 24 bits of each 3-byte block into 4 groups of 6 bits.
Treat each 6-bit group as a value from 0 to 63 and add 32 (\x20), so that the resulting
8-bit value, from 32 to 95, is a printable ASCII character.
If the last 3-byte block has only 1 byte of input data, pad it with 2 bytes of value 1 (\x01\x01).
If the last 3-byte block has only 2 bytes of input data, pad it with 1 byte of value 1 (\x01).
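The steps above can be sketched in Python as follows. This is only a minimal illustration, not the code of the real uuencode command; the function name encode_block is my own choice, and the \x01 padding follows the rule stated in this section.

```python
def encode_block(block: bytes) -> str:
    """Encode one block of 1 to 3 input bytes into 4 printable characters."""
    if not 1 <= len(block) <= 3:
        raise ValueError("a block must hold 1 to 3 bytes")
    # Pad a short final block with \x01 bytes, as the rules above describe.
    padded = block + b"\x01" * (3 - len(block))
    # Join the 3 bytes into one 24-bit integer.
    bits = int.from_bytes(padded, "big")
    chars = []
    for shift in (18, 12, 6, 0):
        group = (bits >> shift) & 0x3F   # one 6-bit group, 0..63
        chars.append(chr(group + 32))    # add 32 to reach a printable character
    return "".join(chars)


print(encode_block(b"A"))    # 00$!
print(encode_block(b"ABC"))  # 04)#
```

Shifting by 18, 12, 6 and 0 bits picks out the 4 groups from left to right, so no bit strings need to be built by hand.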
The printable characters used by UUEncode encoding are listed in the following
table:
32 (space)   33 !   34 "   35 #   36 $   37 %   38 &   39 '
40 (         41 )   42 *   43 +   44 ,   45 -   46 .   47 /
48 0         49 1   50 2   51 3   52 4   53 5   54 6   55 7
56 8         57 9   58 :   59 ;   60 <   61 =   62 >   63 ?
64 @         65 A   66 B   67 C   68 D   69 E   70 F   71 G
72 H         73 I   74 J   75 K   76 L   77 M   78 N   79 O
80 P         81 Q   82 R   83 S   84 T   85 U   86 V   87 W
88 X         89 Y   90 Z   91 [   92 \   93 ]   94 ^   95 _
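The table can also be generated instead of typed by hand. The short Python sketch below (the variable name alphabet is my own) maps each 6-bit value to its output character and checks that all 64 characters are printable.

```python
# Map every 6-bit value (0..63) to its UUEncode output character.
alphabet = [chr(value + 32) for value in range(64)]

print(repr(alphabet[0]))    # ' '  (ASCII 32, the space character)
print(alphabet[16])         # 0    (ASCII 48)
print(alphabet[63])         # _    (ASCII 95)
print(all(ch.isprintable() for ch in alphabet))  # True
```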
Example 1: Input data, 1 byte, "A". Encoded output, 4 characters, "00$!"
Input Data    A
Input Bits    01000001
Padding       01000001 00000001 00000001
Bit Groups    010000 010000 000100 000001
Adding 32     100000 100000 100000 100000
Result        110000 110000 100100 100001
Output        0      0      $      !
Example 2: Input data, 3 bytes, "ABC". Encoded output, 4 characters, "04)#"
Input Data    A        B        C
Input Bits    01000001 01000010 01000011
Bit Groups    010000 010100 001001 000011
Adding 32     100000 100000 100000 100000
Result        110000 110100 101001 100011
Output        0      4      )      #
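Both examples can be checked by running the steps backward. This section does not spell out the decoding process, so the Python sketch below is my own assumption of the inverse: subtract 32 from each character, join the four 6-bit groups into 24 bits, and keep only the real input bytes. The function name decode_block and its count parameter are hypothetical.

```python
def decode_block(chars: str, count: int = 3) -> bytes:
    """Decode 4 encoded characters back into `count` original bytes (1 to 3)."""
    if len(chars) != 4 or not 1 <= count <= 3:
        raise ValueError("expected 4 characters and a count of 1 to 3")
    bits = 0
    for ch in chars:
        # Subtract 32 to recover the 6-bit group, then append it to the 24 bits.
        bits = (bits << 6) | ((ord(ch) - 32) & 0x3F)
    # Split the 24 bits into 3 bytes and drop any padding bytes.
    return bits.to_bytes(3, "big")[:count]


print(decode_block("04)#"))      # b'ABC'
print(decode_block("00$!", 1))   # b'A'
```

The count argument is needed because the 4 characters alone do not say how many of the 3 decoded bytes are padding.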
Encoding output file formatting rules:
The first line must be "begin ooo filename", where "ooo" is the Unix file access mode in octal,
and "filename" is the name of the input data file.
Encoded output characters are grouped into lines of 60 characters each.
A counter byte is inserted at the beginning of each line. It records the number of
input data bytes encoded in that line. A value of 32 (\x20) is added to this byte, so that it
becomes a printable character.
For a full line of 60 output characters, the leading counter
byte is "M", because the line encodes 45 input bytes, and 45 plus 32 gives 77, which is
the ASCII value of "M". So you will see "M" at the start of every output line except the last one,
which has a smaller counter value if it encodes fewer than 45 input bytes.
Two extra lines end the output file. The first contains a single byte of \x20.
The second contains "end".
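The formatting rules above can be put together in a short Python sketch. It is an illustration under the rules of this section only, not a replacement for the real uuencode command; the function name uuencode_text, the default mode of 644 and the file name abc.txt are my own assumptions.

```python
def uuencode_text(data: bytes, filename: str, mode: int = 0o644) -> str:
    """Build the full encoded text: begin line, data lines, and ending lines."""
    lines = [f"begin {mode:03o} {filename}"]
    for start in range(0, len(data), 45):
        chunk = data[start:start + 45]           # up to 45 input bytes per line
        # Pad the last block to a multiple of 3 bytes with \x01 bytes.
        padded = chunk + b"\x01" * (-len(chunk) % 3)
        body = []
        for i in range(0, len(padded), 3):
            bits = int.from_bytes(padded[i:i + 3], "big")
            body.extend(chr(((bits >> s) & 0x3F) + 32) for s in (18, 12, 6, 0))
        # Counter character: number of input bytes on this line, plus 32.
        lines.append(chr(len(chunk) + 32) + "".join(body))
    lines.append(" ")     # a line holding the single byte \x20 (a count of 0)
    lines.append("end")
    return "\n".join(lines) + "\n"


text = uuencode_text(b"ABC" * 16, "abc.txt")
for line in text.splitlines():
    print(repr(line))
```

With 48 input bytes, the first data line encodes 45 bytes and starts with "M", and the second encodes the remaining 3 bytes and starts with "#" (3 plus 32 is 35).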