Decoding APNG

APNG is an extension of the PNG format that adds support for animated images. Modern browsers support it well, and we can use an APNG file directly in an img element. But if you ever need more control over playback, a deeper understanding is required. In this article, let's explore how APNG works by decoding it manually and playing it on a canvas.

File format

Let's see the file format of APNG first.

APNG is an extension of the PNG format, so it has the same magic header and the same chunk structure. I assume you already have a basic understanding of the PNG file format. If not, please refer to my previous article An Intro to PNG Decoder.

For the APNG spec, we normally refer to the two documents below:

APNG works by introducing a few new chunk types that carry the animation information. Let's look at them one by one.

1. acTL

This chunk tells us how many frames the animation has and how many times it should be played. Its structure is as follows:

byte
  0   num_frames     (unsigned int)    Number of frames
  4   num_plays      (unsigned int)    Number of times to loop this APNG.  0 indicates infinite looping.

2. fcTL

This chunk is used to provide information about each frame. For example, each frame has its own width and height. Let's see its structure.

byte
0    sequence_number       (unsigned int)   Sequence number of the animation chunk, starting from 0
4    width                 (unsigned int)   Width of the following frame
8    height                (unsigned int)   Height of the following frame
12   x_offset              (unsigned int)   X position at which to render the following frame
16   y_offset              (unsigned int)   Y position at which to render the following frame
20   delay_num             (unsigned short) Frame delay fraction numerator
22   delay_den             (unsigned short) Frame delay fraction denominator
24   dispose_op            (byte)           Type of frame area disposal to be done after rendering this frame
25   blend_op              (byte)           Type of frame area rendering for this frame

As you can see, the above information is mainly used to control how to play each frame.

3. fdAT

In PNG files, we have IDAT chunks, which contain the pixel data of the image. The fdAT chunk serves the same function for animation frames. It has the same structure as an IDAT chunk, except that its data is preceded by a 4-byte sequence number.

Those are all the new chunks we need to know. One more thing deserves attention: to stay compatible with plain PNG decoders, an APNG file contains both IDAT chunks and the new fdAT chunks. So we need to know whether the IDAT image is the first frame of the animation.

This is decided by the position of the first fcTL chunk. If an fcTL chunk appears before the IDAT chunks, it describes the animation behavior of the IDAT data, which means the IDAT image is the first frame. If not, the IDAT image is not part of the animation, and the frames come only from the fdAT chunks.
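The rule above can be checked with a short walk over the chunk list. This is only a sketch: defaultImageIsFirstFrame is a hypothetical helper name that does not appear in the original code, and it assumes bytes is a Uint8Array holding a well-formed file.

```javascript
// Hypothetical helper (not part of the original code): reports whether an
// fcTL chunk appears before the first IDAT chunk, i.e. whether the IDAT
// image is the first frame of the animation.
function defaultImageIsFirstFrame(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let offset = 8; // skip the 8-byte PNG signature
  let seenFcTL = false;

  while (offset + 12 <= bytes.length) {
    const dataLength = view.getUint32(offset); // big-endian by default
    const type = String.fromCharCode(
      bytes[offset + 4],
      bytes[offset + 5],
      bytes[offset + 6],
      bytes[offset + 7]
    );

    if (type === "fcTL") seenFcTL = true;
    if (type === "IDAT") return seenFcTL;

    // 4 (length) + 4 (type) + data + 4 (crc)
    offset += dataLength + 12;
  }

  return false; // no IDAT chunk found
}
```

The parser later in this article makes the same decision implicitly: IDAT data is collected only when a frame has already been opened by an fcTL chunk.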

Strategy to play

Now let's talk about the strategy for playing APNG files. PNG has a fairly complicated structure, and we do not want to implement the pixel decoder from scratch. What we want is control over the animation while still relying on the browser's decoder.

The strategy is this: we parse the APNG file, identify the frames and the animation data, extract each frame as an independent PNG image, and finally draw each image on the canvas according to the animation control info we have.

Let's see code

Enough with theory, let's see some code.

Let's look at the main process first. It consists of 4 steps.

async function main() {
  // 1. load APNG image
  const bytes = await loadImg(targetPath);

  // 2. parse the image
  const apng = parseApng(bytes);

  // 3. load it into html img element
  await Promise.all(apng.frames.map(loadImgElement));

  // 4. play it on canvas
  play(apng);
}

Loading the image as a typed array is pretty simple.

async function loadImg(targetPath) {
  const img = await fetch(targetPath);
  const arrayBuffer = await img.arrayBuffer();
  return new Uint8Array(arrayBuffer);
}

Parsing is the key part, so let's break it down.

First, we check the magic number.

let offset = 0;
const magic = new Uint8Array([137, 80, 78, 71, 13, 10, 26, 10]);
if (!bytes.slice(offset, offset + 8).every((v, i) => v === magic[i])) {
  throw new Error("magic number check fail");
}
offset += 8; // the first chunk starts right after the 8-byte signature

Then we iterate over all the chunks and collect the ones we need.

const dv = new DataView(bytes.buffer);
const otherChunks = [];
const frames = [];
let frame;
let ihdrChunk;
let acTLChunk;
let IENDChunk;

while (offset < bytes.length) {
  const chunkLength = dv.getUint32(offset) + 12;
  const chunkType = new TextDecoder().decode(
    bytes.slice(offset + 4, offset + 8)
  );
  const chunk = bytes.slice(offset, offset + chunkLength);

  switch (chunkType) {
    case "IHDR": {
      ihdrChunk = chunk;
      break;
    }
    case "acTL": {
      acTLChunk = chunk;
      break;
    }
    case "fcTL": {
      if (frame) frames.push(frame);

      frame = {
        fcTLChunk: chunk,
        dataBytes: [],
      };
      break;
    }
    case "IDAT": {
      if (frame) {
        // only data-bytes
        // 4(data-length) + 4(type) + data-bytes + 4(crc)
        frame.dataBytes.push(chunk.slice(8, chunkLength - 4));
      }
      break;
    }
    case "fdAT": {
      if (!frame) throw new Error("unexpected fdAT chunk");
      // same structure as IDAT chunk
      // but with an extra sequence field at first
      frame.dataBytes.push(chunk.slice(12, chunkLength - 4));
      break;
    }
    case "IEND": {
      IENDChunk = chunk;
      break;
    }
    default:
      otherChunks.push(chunk);
  }

  offset += chunkLength;
}

// the last frame is still pending when the loop ends
if (frame) frames.push(frame);

Then we get the width and height information from the IHDR chunk.

const apng = {};

if (!ihdrChunk) {
  throw new Error("no ihdr chunk");
}
const ihdrChunkDv = new DataView(ihdrChunk.slice(8).buffer);
apng.width = ihdrChunkDv.getUint32(0);
apng.height = ihdrChunkDv.getUint32(4);

Then we get the number of plays from the acTL chunk.

if (!acTLChunk) {
  throw new Error("no acTL chunk");
}
const acTLChunkDv = new DataView(acTLChunk.slice(8).buffer);
apng.numPlays = acTLChunkDv.getUint32(4);
if (apng.numPlays === 0) {
  apng.numPlays = Infinity;
}

Next, we iterate over the frames and recreate a standalone PNG image from each one.

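// inside a loop over frames
const { fcTLChunk, dataBytes } = frame;

const chunks = [magic];
// set frame width and height: both are full chunks, so data starts at byte 8,
// and in fcTL the width and height follow the 4-byte sequence number
ihdrChunk.set(fcTLChunk.slice(12, 16), 8);
ihdrChunk.set(fcTLChunk.slice(16, 20), 12);
chunks.push(encodeChunk("IHDR", ihdrChunk.slice(8, ihdrChunk.length - 4)));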

chunks.push(...otherChunks);

chunks.push(...dataBytes.map((dataPart) => encodeChunk("IDAT", dataPart)));

chunks.push(IENDChunk);

const image = new Blob(chunks, { type: "image/png" });
frame.image = image; // read later by loadImgElement
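The snippet above calls encodeChunk without showing it. Below is a minimal sketch of what such a helper could look like, assuming it takes a 4-character chunk type and the chunk data as a Uint8Array; the actual implementation in the repo may differ. A PNG chunk is laid out as length, type, data, CRC, where the CRC-32 covers the type and data but not the length.

```javascript
// CRC-32 lookup table (reflected polynomial 0xEDB88320), as used by PNG.
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  }
  CRC_TABLE[n] = c >>> 0;
}

function crc32(bytes) {
  let crc = 0xffffffff;
  for (let i = 0; i < bytes.length; i++) {
    crc = CRC_TABLE[(crc ^ bytes[i]) & 0xff] ^ (crc >>> 8);
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Assumed signature: type is a 4-character string, data is a Uint8Array.
function encodeChunk(type, data) {
  const chunk = new Uint8Array(data.length + 12);
  const view = new DataView(chunk.buffer);

  view.setUint32(0, data.length); // 4-byte big-endian data length
  for (let i = 0; i < 4; i++) {
    chunk[4 + i] = type.charCodeAt(i); // 4-byte chunk type
  }
  chunk.set(data, 8); // chunk data

  // the CRC is computed over type + data, not over the length field
  view.setUint32(8 + data.length, crc32(chunk.subarray(4, 8 + data.length)));
  return chunk;
}
```

Recomputing the CRC is necessary here because the IHDR data has changed (new width and height) and because fdAT payloads are re-wrapped as IDAT chunks, so the original checksums no longer apply.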

And then we read the animation info for each frame.

// control info
const dv = new DataView(fcTLChunk.slice(8).buffer);
const sequenceNum = dv.getUint32(0);
const width = dv.getUint32(4);
const height = dv.getUint32(8);
const xOffset = dv.getUint32(12);
const yOffset = dv.getUint32(16);
const delayNum = dv.getUint16(20);
const delayDen = dv.getUint16(22);
const disposeOp = dv.getUint8(24);
const blendOp = dv.getUint8(25);
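The play function later reads frame.delay, which the snippet above does not compute. Here is a small sketch of how it could be derived from delay_num and delay_den; frameDelayMs is a hypothetical helper name. According to the APNG spec, a denominator of 0 is treated as 100, so the delay is then expressed in hundredths of a second.

```javascript
// Hypothetical helper: converts the fcTL delay fraction (in seconds)
// into milliseconds for use with timers.
function frameDelayMs(delayNum, delayDen) {
  // per the APNG spec, a denominator of 0 means 1/100ths of a second
  const den = delayDen === 0 ? 100 : delayDen;
  return (delayNum * 1000) / den;
}
```

Inside the frame loop, the parsed values could then be stored on the frame, for example with Object.assign(frame, { width, height, xOffset, yOffset, disposeOp, blendOp, delay: frameDelayMs(delayNum, delayDen) }), so that the playback code can read them.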

And that's all for the parsing part.

Next step is to load images into HTML img elements. This is trivial.

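function loadImgElement(frame) {
  return new Promise((resolve, reject) => {
    const url = URL.createObjectURL(frame.image);
    const imageElement = document.createElement("img");
    imageElement.onload = () => {
      frame.imageElement = imageElement;
      resolve();
    };
    // without this, a frame that fails to decode would leave main() waiting forever
    imageElement.onerror = () => reject(new Error("failed to load frame image"));
    imageElement.src = url;
  });
}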

And lastly, we can play all images on canvas.

function play(apng) {
  const canvas = document.createElement("canvas");
  canvas.width = apng.width;
  canvas.height = apng.height;
  document.getElementById("container").appendChild(canvas);

  const ctx = canvas.getContext("2d");

  let frameIndex = 0;
  let numFramePlays = apng.numPlays * apng.frames.length;

  const draw = () => {
    const frame = apng.frames[frameIndex];
    frameIndex = frameIndex >= apng.frames.length - 1 ? 0 : frameIndex + 1;

    if (frame.disposeOp === 1) {
      ctx.clearRect(0, 0, apng.width, apng.height);
    }
    ctx.drawImage(
      frame.imageElement,
      frame.xOffset,
      frame.yOffset,
      frame.width,
      frame.height
    );

    if (--numFramePlays <= 0) return;

    setTimeout(draw, frame.delay);
  };

  draw();
}
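The play function above takes a shortcut: it clears the whole canvas before drawing a frame whose dispose_op is 1, and it ignores blend_op. In the APNG spec, dispose_op describes what happens to the frame's region after the frame has been shown (0 leaves it alone, 1 clears it to transparent, 2 restores the previous contents), and blend_op 0 overwrites the region while 1 composites over it. A sketch closer to those rules could look like the following, where renderFrame is a hypothetical helper that assumes the frame object carries the fields parsed from fcTL.

```javascript
// Hypothetical helper: draws one frame and returns a function that disposes
// of it. Call the returned function right before drawing the next frame.
function renderFrame(ctx, frame) {
  const { xOffset: x, yOffset: y, width: w, height: h } = frame;

  // dispose_op 2 (PREVIOUS): remember the region before drawing over it
  const saved = frame.disposeOp === 2 ? ctx.getImageData(x, y, w, h) : null;

  // blend_op 0 (SOURCE): replace the region instead of compositing onto it
  if (frame.blendOp === 0) ctx.clearRect(x, y, w, h);

  ctx.drawImage(frame.imageElement, x, y, w, h);

  return function dispose() {
    if (frame.disposeOp === 1) {
      ctx.clearRect(x, y, w, h); // BACKGROUND: clear to transparent
    } else if (frame.disposeOp === 2) {
      ctx.putImageData(saved, x, y); // PREVIOUS: restore the old pixels
    }
    // dispose_op 0 (NONE): leave the canvas as it is
  };
}
```

In the draw loop, this would mean keeping the dispose function returned for the previous frame and calling it at the start of the next draw call. The sketch does not cover the spec's special cases, such as clearing the canvas at the start of each play.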

We should probably replace setTimeout with requestAnimationFrame for better timing accuracy. You can see the result here: https://yaox023.github.io/apng/.
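One possible way to drive playback from requestAnimationFrame is to derive the visible frame from the elapsed time instead of chaining timers. The sketch below is an assumption, not code from the repo: frameIndexAt and playWithRaf are hypothetical names, each frame is assumed to have a delay in milliseconds, and numPlays is ignored for brevity.

```javascript
// Hypothetical helper: given the elapsed time and the per-frame delays (ms),
// returns the index of the frame that should be visible.
function frameIndexAt(elapsedMs, delays) {
  const total = delays.reduce((sum, d) => sum + d, 0);
  if (total <= 0) return 0;

  let t = elapsedMs % total; // position inside the current loop
  for (let i = 0; i < delays.length; i++) {
    if (t < delays[i]) return i;
    t -= delays[i];
  }
  return delays.length - 1;
}

// Hypothetical playback loop: drawFrame is a callback that paints one frame.
function playWithRaf(apng, drawFrame) {
  const delays = apng.frames.map((f) => f.delay);
  let start = null;
  let shown = -1;

  const tick = (now) => {
    if (start === null) start = now;
    const index = frameIndexAt(now - start, delays);
    if (index !== shown) {
      drawFrame(apng.frames[index]);
      shown = index;
    }
    requestAnimationFrame(tick);
  };

  requestAnimationFrame(tick);
}
```

Because the frame is picked from the clock, timing errors do not accumulate as they can with chained setTimeout calls. The trade-off is that a slow tab may skip frames, which matters if a frame relies on the pixels left behind by the one before it.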

At last

I got the idea from reading the source code of this repo: apng-js. If you want to check the working code and play with it yourself, see my repo: apng.

yaox023's blog