Correction to Sun Implementation of UUEncode

This section describes a correction to Sun's Java implementation of UUEncode decoding, the sun.misc.UUDecoder class, that fixes a problem with end-of-line delimiters.

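To find out what was wrong with Sun's implementation of the decoding algorithm, I searched the Internet and found the source code of UUDecoder.java at http://www.cs.duke.edu/csed/java/src1.3/sun/misc (no longer available). The listing below already includes my correction in decodeLinePrefix(), marked with a "Herong" comment: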

/*
 * @(#)UUDecoder.java  1.13 00/02/02
 *
 * Copyright 1995-2000 Sun Microsystems, Inc. All Rights Reserved.
 *
 * This software is the proprietary information of Sun Microsystems,
 * Inc. Use is subject to license terms.
 *
 */
package sun.misc;

import java.io.InputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.IOException;

/**
 * This class implements a Berkeley uu character decoder. This
 * decoder was made famous by the uudecode program.
 *
 * The basic character coding is algorithmic, taking 6 bits of binary
 * data and adding it to an ASCII ' ' (space) character. This
 * converts these six bits into a printable representation. Note that
 * it depends on the ASCII character encoding standard for English.
 * Groups of three bytes are converted into 4 characters by treating
 * the three bytes as four 6 bit groups, group 1 is byte 1's most
 * significant six bits, group 2 is byte 1's least significant two
 * bits plus byte 2's four most significant bits. etc.
 *
 * In this encoding, the buffer prefix is:
 *     begin [mode] [filename]
 *
 * This is followed by one or more lines of the form:
 *  (len)(data)(data)(data) ...
 * where (len) is the number of bytes on this line. Note that
 * groupings are always four characters, even if length is not a
 * multiple of three bytes. When less than three characters are
 * encoded, the values of the last remaining bytes are undefined and
 * should be ignored.
 *
 * The last line of data in a uuencoded buffer is represented by a
 * single space character. This is translated by the decoding engine
 * to a line length of zero. This is immediately followed by a line
 * which contains the word 'end[newline]'
 *
 * If an error is encountered during decoding this class throws a
 * CEFormatException. The specific detail messages are:
 *
 *  "UUDecoder: No begin line."
 *  "UUDecoder: Malformed begin line."
 *  "UUDecoder: Short Buffer."
 *  "UUDecoder: Bad Line Length."
 *  "UUDecoder: Missing 'end' line."
 *
 * @version     1.13, 02/02/00
 * @author      Chuck McManis
 * @see    CharacterDecoder
 * @see    UUEncoder
 */
public class UUDecoder extends CharacterDecoder {

    /**
     * This string contains the name that was in the buffer being
     * decoded.
     */
    public String bufferName;

    /**
     * Represents UNIX(tm) mode bits. Generally three octal digits
     * representing read, write, and execute permission of the owner,
     * group owner, and  others. They should be interpreted as the
     * bit groups:
     * (owner) (group) (others)
     *  rwx      rwx     rwx   (r = read, w = write, x = execute)
     *
     */
    public int mode;

    /**
     * UU encoding specifies 3 bytes per atom.
     */
    protected int bytesPerAtom() {
        return (3);
    }

    /**
     * All UU lines have 45 bytes on them, for line length of 15*4+1
     * or 61 characters per line.
     */
    protected int bytesPerLine() {
        return (45);
    }

    /** This is used to decode the atoms */
    private byte decoderBuffer[] = new byte[4];

    /**
     * Decode a UU atom. Note that if l is less than 3 we don't write
     * the extra bits, however the encoder always encodes 4 character
     * groups even when they are not needed.
     */
    protected void decodeAtom(InputStream inStream,
            OutputStream outStream, int l)
            throws IOException {
        int i, c1, c2, c3, c4;
        int a, b, c;
        StringBuffer x = new StringBuffer();

        for (i = 0; i < 4; i++) {
            c1 = inStream.read();
            if (c1 == -1) {
                throw new CEStreamExhausted();
            }
            x.append((char)c1);
            decoderBuffer[i] = (byte) ((c1 - ' ') & 0x3f);
        }
        a = ((decoderBuffer[0] << 2) & 0xfc)
            | ((decoderBuffer[1] >>> 4) & 3);
        b = ((decoderBuffer[1] << 4) & 0xf0)
            | ((decoderBuffer[2] >>> 2) & 0xf);
        c = ((decoderBuffer[2] << 6) & 0xc0)
            | (decoderBuffer[3] & 0x3f);
        outStream.write((byte)(a & 0xff));
        if (l > 1) {
            outStream.write((byte)(b & 0xff));
        }
        if (l > 2) {
            outStream.write((byte)(c & 0xff));
        }
    }

    /**
     * For uuencoded buffers, the data begins with a line of the
     * form:
     *     begin MODE FILENAME
     * This line always starts in column 1.
     */
    protected void decodeBufferPrefix(InputStream inStream,
            OutputStream outStream) throws IOException {
        int c;
        StringBuffer q = new StringBuffer(32);
        String r;
        boolean sawNewLine;

        /*
         * This works by ripping through the buffer until it finds
         * a 'begin' line or the end of the buffer.
         */
        sawNewLine = true;
        while (true) {
            c = inStream.read();
            if (c == -1) {
                throw new CEFormatException(
                    "UUDecoder: No begin line.");
            }
            if ((c == 'b') && sawNewLine) {
                c = inStream.read();
                if (c == 'e') {
                    break;
                }
            }
            sawNewLine = (c == '\n') || (c == '\r');
        }

        /*
         * Now we think it's begin, (we've seen ^be) so verify it
         * here.
         */
        while ((c != '\n') && (c != '\r')) {
            c = inStream.read();
            if (c == -1) {
                throw new CEFormatException(
                    "UUDecoder: No begin line.");
            }
            if ((c != '\n') && (c != '\r')) {
                q.append((char)c);
            }
        }
        r = q.toString();
        if (r.indexOf(' ') != 3) {
            throw new CEFormatException("UUDecoder: Malformed"+
                " begin line.");
        }
        mode = Integer.parseInt(r.substring(4,7));
        bufferName = r.substring(r.indexOf(' ',6)+1);
    }

    /**
     * In uuencoded buffers, encoded lines start with a character
     * that represents the number of bytes encoded in this line. The
     * last line of input is always a line that starts with a single
     * space character, which would be a zero length line.
     */
    protected int decodeLinePrefix(InputStream inStream,
            OutputStream outStream) throws IOException {
        int c;

        c = inStream.read();
        if (c == '\n' || c == '\r') { // Herong - skip the extra byte
            c = inStream.read();
        }
        if (c == ' ') {
            c = inStream.read(); /* discard the trailing <newline> */
            throw new CEStreamExhausted();
        } else if (c == -1) {
            throw new CEFormatException("UUDecoder: Short Buffer.");
        }

        c = (c - ' ') & 0x3f;
        if (c > bytesPerLine()) {
            throw new CEFormatException(
                "UUDecoder: Bad Line Length.");
        }
        return (c);
    }


    /**
     * Find the end of the line for the next operation.
     */
    protected void decodeLineSuffix(InputStream inStream,
            OutputStream outStream) throws IOException {
        int c;
        while (true) {
            c = inStream.read();
            if (c == -1) {
                throw new CEStreamExhausted();
            }
            if (c == '\n') {
                break;
            }
        }
    }

    /**
     * UUencoded files have a buffer suffix which consists of the
     * word end. This line should immediately follow the line with
     * a single space in it.
     */
    protected void decodeBufferSuffix(InputStream inStream,
            OutputStream outStream) throws IOException {
        int c;

        c = inStream.read(decoderBuffer);
        if ((decoderBuffer[0] != 'e') || (decoderBuffer[1] != 'n')
                || (decoderBuffer[2] != 'd')) {
            throw new CEFormatException(
                "UUDecoder: Missing 'end' line.");
        }
    }

}
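The bit arithmetic in decodeAtom() does not depend on anything else in sun.misc, so it can be checked on its own. The following standalone sketch (the class AtomDemo is hypothetical, not part of Sun's package) applies the same shifts and masks to the atom "0V%T", which is the uuencoded form of the three ASCII bytes "Cat".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AtomDemo {

    // Hypothetical helper: decode one 4-character uuencoded atom into its
    // first len bytes, using the same shifts and masks as decodeAtom().
    static byte[] decodeAtom(String atom, int len) {
        int[] d = new int[4];
        for (int i = 0; i < 4; i++) {
            // Each character carries 6 bits, offset from ' ' (0x20).
            d[i] = (atom.charAt(i) - ' ') & 0x3f;
        }
        int a = ((d[0] << 2) & 0xfc) | ((d[1] >>> 4) & 3);
        int b = ((d[1] << 4) & 0xf0) | ((d[2] >>> 2) & 0xf);
        int c = ((d[2] << 6) & 0xc0) | (d[3] & 0x3f);
        byte[] all = { (byte) a, (byte) b, (byte) c };
        // On a short final atom only the first len bytes are meaningful.
        return Arrays.copyOf(all, len);
    }

    public static void main(String[] args) {
        byte[] out = decodeAtom("0V%T", 3);
        System.out.println(new String(out, StandardCharsets.US_ASCII)); // prints Cat
    }
}
```

Worked by hand, "0V%T" gives the 6-bit values 16, 54, 5 and 52, which regroup into the bytes 67, 97 and 116, that is, 'C', 'a' and 't'.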

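If you read the source code carefully, you will see that the cause of the problem is the \r\n line breaks in the encoded file. The source code was designed for Unix-style text files, where only \n is used to break lines. In decodeBufferPrefix(), the loop that reads the rest of the "begin" line stops at the first \n or \r. On a file with \r\n line breaks, it stops at the \r and leaves the \n unread, so the next call to decodeLinePrefix() reads that \n as the length character of the first data line.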
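decodeLinePrefix() converts the first character of each line into a byte count with the expression (c - ' ') & 0x3f. The standalone sketch below (the class LengthCharDemo is hypothetical) evaluates that expression for a normal length character and for the two line-break characters.

```java
public class LengthCharDemo {

    // Same computation as decodeLinePrefix(): map a printable length
    // character back to a 6-bit byte count.
    static int lineLength(int c) {
        return (c - ' ') & 0x3f;
    }

    public static void main(String[] args) {
        System.out.println(lineLength('M'));  // 45: a full 45-byte line
        System.out.println(lineLength('\n')); // 42: a stray newline read as a length
        System.out.println(lineLength('\r')); // 45: a stray carriage return read as a length
    }
}
```

A stray \n therefore decodes as a line length of 42 bytes, and a stray \r as 45. Both values pass the c > bytesPerLine() check, so unless the stray byte is skipped, the decoder raises no "Bad Line Length" error and silently reads the real length character and the following data as atoms.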

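To fix the problem, I added a correction to the decodeLinePrefix() method. If the first byte it reads is a \n or \r, that byte is skipped and the next byte is read as the length character: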

    protected int decodeLinePrefix(InputStream inStream,
            OutputStream outStream) throws IOException {
        int c;

        c = inStream.read();
        if (c == '\n' || c == '\r') { // Herong - skip the extra byte
            c = inStream.read();
        }
        if (c == ' ') {
            c = inStream.read(); /* discard the trailing <newline> */
            throw new CEStreamExhausted();
        } else if (c == -1) {
            throw new CEFormatException("UUDecoder: Short Buffer.");
        }

        c = (c - ' ') & 0x3f;
        if (c > bytesPerLine()) {
            throw new CEFormatException(
                "UUDecoder: Bad Line Length.");
        }
        return (c);
    }

With this correction, the decoder handles encoded files with \r\n line breaks correctly.
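As an alternative to patching Sun's byte-by-byte decoder, the input can be read line by line with BufferedReader.readLine(), which treats \n, \r\n and \r all as line terminators, so no line-break bytes ever reach the decoding logic. The following is a minimal sketch of that different, line-based technique, not Sun's code: the class LenientUUDecode and its decode() method are hypothetical, it assumes the whole encoded text fits in a String, and it skips the mode and file name on the "begin" line instead of parsing them.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

public class LenientUUDecode {

    // Hypothetical helper: decode a complete uuencoded text. Because
    // BufferedReader.readLine() accepts "\n", "\r\n" and "\r" as line
    // terminators, the line-ending style of the input does not matter.
    static byte[] decode(String text) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader(text));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String line;

        // Skip everything up to and including the "begin MODE FILENAME" line.
        do {
            line = in.readLine();
            if (line == null) {
                throw new IOException("No begin line.");
            }
        } while (!line.startsWith("begin "));

        while (true) {
            line = in.readLine();
            if (line == null) {
                throw new IOException("Short buffer.");
            }
            // The length character; an empty line, " " or "`" all give 0.
            int len = line.isEmpty() ? 0 : (line.charAt(0) - ' ') & 0x3f;
            if (len == 0) {
                break; // zero-length line: end of data
            }
            if (len > 45) { // same limit as Sun's bytesPerLine()
                throw new IOException("Bad line length.");
            }
            // Pad with spaces in case trailing spaces were stripped.
            StringBuilder data = new StringBuilder(line.substring(1));
            while (data.length() < ((len + 2) / 3) * 4) {
                data.append(' ');
            }
            // Each 4-character group carries up to 3 bytes.
            for (int i = 0; i < len; i += 3) {
                int p = (i / 3) * 4;
                int d0 = (data.charAt(p) - ' ') & 0x3f;
                int d1 = (data.charAt(p + 1) - ' ') & 0x3f;
                int d2 = (data.charAt(p + 2) - ' ') & 0x3f;
                int d3 = (data.charAt(p + 3) - ' ') & 0x3f;
                out.write((d0 << 2) | (d1 >>> 4));
                if (i + 1 < len) {
                    out.write(((d1 << 4) & 0xf0) | (d2 >>> 2));
                }
                if (i + 2 < len) {
                    out.write(((d2 << 6) & 0xc0) | d3);
                }
            }
        }

        line = in.readLine();
        if (line == null || !line.startsWith("end")) {
            throw new IOException("Missing 'end' line.");
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // "Cat" encoded with Windows-style "\r\n" line breaks.
        String crlf = "begin 644 cat.txt\r\n#0V%T\r\n \r\nend\r\n";
        System.out.println(new String(decode(crlf), StandardCharsets.US_ASCII)); // prints Cat
    }
}
```

The same input with plain \n line breaks decodes to the same three bytes, since readLine() strips either terminator before the length character is examined.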

Interestingly, my sample program, SunUUEncode.java, has been adopted by the KeyWorx project and packaged as a utility class, org.keyworx.common.util.SunUU.java. See http://keyworx.oss.waag.org/docs/java2html (no longer available) for details.
