Writing a GIF Decoder

Writing a GIF Decoder

·

9 min read

In this article, let's write a gif decoder in typescript.

Parser

Say we have a gif file, the first we need to do after loading the file is to parse it. We can find the format of the gif in GIF89a Specification.

It has defined a clear grammar for the file.

The Grammar.

<GIF Data Stream> ::=     Header <Logical Screen> <Data>* Trailer

<Logical Screen> ::=      Logical Screen Descriptor [Global Color Table]

<Data> ::=                <Graphic Block>  |
                          <Special-Purpose Block>

<Graphic Block> ::=       [Graphic Control Extension] <Graphic-Rendering Block>

<Graphic-Rendering Block> ::=  <Table-Based Image>  |
                               Plain Text Extension

<Table-Based Image> ::=   Image Descriptor [Local Color Table] Image Data

<Special-Purpose Block> ::=    Application Extension  |
                               Comment Extension

We can use the recursive descent approach to parse the file with this grammar. The flow the code looks like below. What we do is just follow the grammar, parse each parts and then get what we want in one public method parse.

class Parser {
  private dataArray: Uint8Array;
  private dataView: DataView;
  private byteOffset = 0;

  constructor(arrayBuffer: ArrayBuffer) {
    this.dataArray = new Uint8Array(arrayBuffer);
    this.dataView = new DataView(arrayBuffer);
  }

  private readHeader() {
  }

  private readTrailer() {
  }

  private readLogicalScreenDescriptor() {
  }

  private readColorTable(size: number) {
  }

  private readPlainTextExtension() {
  }

  private readApplicationExtension() {
  }

  private readDataSubblocks() {
  }

  private readCommentExtension() {
  }

  private readGraphicControlExtension() {
  }

  private readImageDescriptor() {
  }

  private readImageData() {
  }

  parse() {
  }

  // <GIF Data Stream> ::= Header <Logical Screen> <Data>* Trailer
  private readGifDataStream() {
  }

  // <Logical Screen> ::= Logical Screen Descriptor [Global Color Table]
  private readLogicalScreen() {
  }

  // <Data> ::= <Graphic Block> | <Special-Purpose Block>
  private readData() {
  }

  // <Graphic Block> ::= [Graphic Control Extension] <Graphic-Rendering Block>
  private readGraphicBlock() {
  }

  // <Graphic-Rendering Block> ::= <Table-Based Image> | Plain Text Extension
  private readGraphicRenderingBlock() {
  }

  // <Table-Based Image> ::= Image Descriptor [Local Color Table] Image Data
  private readTableBasedImage() {
  }

  // <Special-Purpose Block> ::= Application Extension | Comment Extension
  private readSpecialPurposeBlock() {
  }
}

Decoding

The real image data is encoding with The LZW Data Compression Algorithm. I will not talk about the algorithm in here, but a few details need to be noted.

First is about code size. What we get from parser is a Uint8Array. This does not mean that we get 1 byte as 1 valid code value. In gif, we can get a lzwMinimumCodeSize from parser, and the real code size starts as lzwMinimumCodeSize + 1. And as the dictionary size increases, the code size will be increased accordingly. This means that the code size could less than 1 byte, and also could bigger than that. So we need a bit reader first we get each code value one by one.

class BitReader {
  private data: Uint8Array;
  private minCodeSize: number;

  private codeSize!: number;
  private codeBitMask!: number;

  private currentByteIndex = 0;
  private cache = 0;
  private cacheCodeSize = 0;

  constructor(data: Uint8Array, minCodeSize: number) {
    this.data = data;
    this.minCodeSize = minCodeSize;
    this.resetCodeSize();
  }

  resetCodeSize() {
    this.codeSize = this.minCodeSize + 1;
    this.calculateCodeBitMask();
  }

  increaseCodeSize() {
    this.codeSize++;
    this.calculateCodeBitMask();
  }

  getMaxNumberByCurrentCodeSize() {
    return Math.pow(2, this.codeSize) - 1;
  }

  private calculateCodeBitMask() {
    this.codeBitMask = 2 ** this.codeSize - 1;
  }

  // bit layout(5-bit)
  //    +---------------+
  // 0  |               |    bbbaaaaa
  //    +---------------+
  // 1  |               |    dcccccbb
  //    +---------------+
  // 2  |               |    eeeedddd
  //    +---------------+
  // 3  |               |    ggfffffe
  //    +---------------+
  // 4  |               |    hhhhhggg
  //    +---------------+
  //          . . .
  //    +---------------+
  // N  |               |
  //    +---------------+
  // first value: aaaaa
  // second value: bbbbb
  // third value: ccccc
  // ...
  read(): number {
    while (this.cacheCodeSize < this.codeSize) {
      if (this.currentByteIndex >= this.data.byteLength)
        throw new Error("no enough data");
      this.cache |= this.data[this.currentByteIndex++] << this.cacheCodeSize;
      this.cacheCodeSize += 8;
    }
    const v = this.cache & this.codeBitMask;
    this.cache >>= this.codeSize;
    this.cacheCodeSize -= this.codeSize;
    return v;
  }
}

export default BitReader;

The next thing needs to be noted is that we are using 2 arrays for building the dictionary efficiently. In lzw algorithm, each dictionary key is built based on previous key and then one extra data(could be byte or character or number, etc.). So we will use 2 arrays suffix and prefix. With the same dictionary key, the suffix will store the new data, the prefix will store the previous key.

In this way, the current prevCode will be the sum of previous prevCode plus previous lastCode. We can do this recursively until we hit the basic code which is the initial dictionary values.

[prevCode                          ] + lastCode
[prevCode             ] + lastCode
[prevCode] + lastCode
...

Last thing needs to know is that gif defines some special code. For example, if we run into END_CODE, we need to stop, if we run into CLEAR_CODE, we need to reset the dictionary.

With all this in mind, we can implement our own decoding algorithm.

import BitReader from "./BitReader";

export const decode = (
  minCodeSize: number,
  data: Uint8Array,
  pixelCount: number
): number[] => {
  const MAX_CODE = Math.pow(2, 12),
    INIT_DICTIONARY_SIZE = Math.pow(2, minCodeSize),
    CLEAR_CODE = INIT_DICTIONARY_SIZE,
    END_CODE = CLEAR_CODE + 1,
    NULL_CODE = -1;

  const output: number[] = [],
    bitReader = new BitReader(data, minCodeSize),
    prefix: number[] = Array(INIT_DICTIONARY_SIZE).fill(0),
    suffix: number[] = Array(INIT_DICTIONARY_SIZE)
      .fill(0)
      .map((_, index) => index);

  let prevCode = NULL_CODE,
    nextKey = CLEAR_CODE + 2,
    lastCode = 0,
    i = 0;

  while (i < pixelCount) {
    const curCode = bitReader.read();

    if (curCode > nextKey) throw new Error("code bigger than nextKey");

    if (curCode === END_CODE) break;

    if (curCode === CLEAR_CODE) {
      bitReader.resetCodeSize();
      nextKey = CLEAR_CODE + 2;
      prevCode = NULL_CODE;
      continue;
    }

    if (prevCode === NULL_CODE) {
      output.push(curCode);
      prevCode = curCode;
      lastCode = curCode;
      i++;
      continue;
    }

    let code = curCode;
    const stack: number[] = [];

    if (code === nextKey) {
      stack.push(lastCode);
      code = prevCode;
    }

    while (code > CLEAR_CODE) {
      stack.push(suffix[code]);
      code = prefix[code];
    }
    lastCode = suffix[code];
    stack.push(lastCode);

    if (nextKey < MAX_CODE) {
      prefix[nextKey] = prevCode;
      suffix[nextKey] = lastCode;
      nextKey++;

      if (
        nextKey > bitReader.getMaxNumberByCurrentCodeSize() &&
        nextKey < MAX_CODE
      ) {
        bitReader.increaseCodeSize();
      }
    }

    prevCode = curCode;
    output.push(...stack.reverse());
    i++;
  }

  output.push(...Array(pixelCount - output.length).fill(0));

  return output;
};

Interlacing

There may also exist an interlacing process similiar to the one in PNG. The purpose is the same, to enable progressive display when the network is slow.

The data we get from decoding is an array with color indexes. Interlacing is just rearrange the indexes in certain way. In deinterfacing process, we just do it in reverse, restore the original order for the color indexes.

// The rows of an Interlaced images are arranged in the following order:

//       Group 1 : Every 8th. row, starting with row 0.              (Pass 1)
//       Group 2 : Every 8th. row, starting with row 4.              (Pass 2)
//       Group 3 : Every 4th. row, starting with row 2.              (Pass 3)
//       Group 4 : Every 2nd. row, starting with row 1.              (Pass 4)

// The Following example illustrates how the rows of an interlaced image are
// ordered.

//       Row Number                                        Interlace Pass

//  0    -----------------------------------------       1
//  1    -----------------------------------------                         4
//  2    -----------------------------------------                   3
//  3    -----------------------------------------                         4
//  4    -----------------------------------------             2
//  5    -----------------------------------------                         4
//  6    -----------------------------------------                   3
//  7    -----------------------------------------                         4
//  8    -----------------------------------------       1
//  9    -----------------------------------------                         4
//  10   -----------------------------------------                   3
//  11   -----------------------------------------                         4
//  12   -----------------------------------------             2
//  13   -----------------------------------------                         4
//  14   -----------------------------------------                   3
//  15   -----------------------------------------                         4
//  16   -----------------------------------------       1
//  17   -----------------------------------------                         4
//  18   -----------------------------------------                   3
//  19   -----------------------------------------                         4
export function deinterlace(
  data: number[],
  width: number,
  height: number
): number[] {
  const output: number[] = new Array(data.length);
  let dataIndex = 0;

  // Group 1: Every 8th row, starting with row 0
  for (let i = 0; i < height; i += 8) {
    for (let j = 0; j < width; j++) {
      output[i * width + j] = data[dataIndex++];
    }
  }

  // Group 2: Every 8th row, starting with row 4
  for (let i = 4; i < height; i += 8) {
    for (let j = 0; j < width; j++) {
      output[i * width + j] = data[dataIndex++];
    }
  }

  // Group 3: Every 4th row, starting with row 2
  for (let i = 2; i < height; i += 4) {
    for (let j = 0; j < width; j++) {
      output[i * width + j] = data[dataIndex++];
    }
  }

  // Group 4: Every 2nd row, starting with row 1
  for (let i = 1; i < height; i += 2) {
    for (let j = 0; j < width; j++) {
      output[i * width + j] = data[dataIndex++];
    }
  }

  return output;
}

Rendering

After all above steps, we can finally render the gif. Again, a few things need to be noted in the rendering process.

First is about the color table. What we get until now is just color indexes. We need to get the real rgb values by indexing the color table. There are 2 kinds of color table in gif. There is a global one, which can be used for all frames. And there is a local one, which we can only use for the frame.

Second is about transparent color index. If we have transparent color index enabled, then we should treat the pixel as transparent, which means the pixel from previous frame shall display.

The next thing is looping. The gif spec does not define if the image should loop. This is defined in an extension called Netscape Looping Application Extension. So in above parser process, we get this loopCount value from this extension, and will use it in this rending process.

That's probably all. Let's see the render code.

import { DecodedGif } from "./types";

class Player {
  private loopCount: number;
  private canvas: HTMLCanvasElement;
  private isStoped = false;
  private ctx: CanvasRenderingContext2D;
  private imageData: ImageData;

  constructor(private gif: DecodedGif, private container: HTMLElement) {
    this.loopCount =
      gif.loopCount === 0
        ? Infinity
        : gif.loopCount === undefined
        ? 1
        : gif.loopCount;

    this.canvas = document.createElement("canvas");
    this.canvas.width = gif.width;
    this.canvas.height = gif.height;
    this.ctx = this.canvas.getContext("2d")!;
    this.imageData = this.ctx.getImageData(
      0,
      0,
      this.gif.width,
      this.gif.height
    );
    this.container.appendChild(this.canvas);
    console.log(gif);
  }

  play() {
    this.renderFrame(0);
  }

  renderFrame(frameIndex: number) {
    const frame = this.gif.frames[frameIndex];
    const colorTable = frame.localColorTableFlag
      ? frame.localColorTable
      : this.gif.globalColorTable;
    if (colorTable === undefined) throw new Error("no color table");

    let i = 0;
    for (let x = frame.top; x < frame.top + frame.height; x++) {
      for (let y = frame.left; y < frame.left + frame.width; y++) {
        const j = x * this.gif.width * 4 + y * 4;
        const colorIndex = frame.colorIndexes[i++];
        if (
          frame.transparentColorFlag &&
          frame.transparentColorIndex === colorIndex
        ) {
          continue;
        }

        const { r, g, b } = colorTable[colorIndex];
        this.imageData.data[j] = r;
        this.imageData.data[j + 1] = g;
        this.imageData.data[j + 2] = b;
        this.imageData.data[j + 3] = 255;
      }
    }
    this.ctx.putImageData(this.imageData, 0, 0);

    if (this.isStoped || this.loopCount <= 0) return;

    setTimeout(() => {
      const nextFrameIndex =
        frameIndex + 1 < this.gif.frames.length ? frameIndex + 1 : 0;
      if (nextFrameIndex === this.gif.frames.length - 1) {
        this.loopCount--;
      }
      this.renderFrame(nextFrameIndex);
    }, frame.delayTime * 10);
  }
}

export default Player;

The End

This is the end of this article. Even if we have covered a lot about the gif, but there are still some topic not touched, such as background color, disposal method, etc. Feel free to dig deep if you are interested. The full code of this article can be found in my repository gif-decoder. Note that this is not production ready, use it with caution.