ThumbHash

A very compact representation of an image placeholder. Store it inline with your data and show it while the real image is loading for a smoother loading experience. It's similar to BlurHash but with the following advantages:

Despite doing all of these additional things, the code for ThumbHash is still similar in complexity to the code for BlurHash. One potential drawback compared to BlurHash is that the parameters of the algorithm are not configurable (everything is automatically configured).

The code for this is available at https://github.com/evanw/thumbhash and contains implementations for JavaScript, Rust, Swift, and Java. You can use npm install thumbhash to install the JavaScript package and cargo add thumbhash to install the Rust package.

#Demo


[Interactive demo: drag and drop your own image to try it. ThumbHash generates an image representation in a few bytes, then renders the ThumbHash as an image placeholder.]

#Comparisons

The table below compares ThumbHash to several other similar approaches:

In addition to these sample images, you can also drag and drop your own images to compare them here.

[Comparison table columns: Original image, ThumbHash, BlurHash, Potato WebP]

#Details

The image is approximated using the Discrete Cosine Transform. Luminance is encoded using up to 7 terms in each dimension while chrominance (i.e. color) is encoded using 3 terms in each dimension. The optional alpha channel is encoded using 5 terms in each dimension if present. If alpha is present, luminance is only encoded using up to 5 terms in each dimension.

Each channel of DCT coefficients comes in three parts: the DC term, the AC terms, and the scale. The DC term is the coefficient for the 0th order cosine and the AC terms are the coefficients of all other cosines (DC and AC are terms from signal processing). All values are quantized to only a few bits each. To maximize the useful numeric range, AC values are normalized by their maximum magnitude, and that maximum is saved separately as the scale. In addition, ThumbHash omits the high-frequency half of the coefficients and only keeps the low-frequency half. If you are familiar with JPEG's zig-zag coefficient order, this roughly corresponds to stopping halfway through that sequence. The rationale is that the low-frequency coefficients carry most of the information, and we also want a smooth image.
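
As a rough illustration of the scale-then-quantize idea, here is a minimal C sketch for 4-bit AC values. The function names, the mapping of [-scale, scale] onto the codes 0 to 15, and the rounding rule are assumptions made for this example; the reference implementations may differ in the details.

```c
#include <assert.h>
#include <stddef.h>

/* Largest absolute AC value in a channel. This is the "scale" that is
   stored separately (hypothetical helper, not from the reference code). */
double ac_scale(const double *ac, size_t n) {
    double scale = 0.0;
    for (size_t i = 0; i < n; i++) {
        double mag = ac[i] < 0.0 ? -ac[i] : ac[i];
        if (mag > scale) scale = mag;
    }
    return scale;
}

/* Map an AC value in [-scale, scale] to a 4-bit code in [0, 15].
   Assumed mapping: -scale -> 0, +scale -> 15, round to nearest. */
unsigned quantize_ac(double ac, double scale) {
    assert(scale >= 0.0);
    if (scale == 0.0) return 8;            /* flat channel: every code decodes to 0 */
    double unit = 0.5 + 0.5 * ac / scale;  /* now in [0, 1] */
    return (unsigned)(15.0 * unit + 0.5);
}

/* Inverse of quantize_ac, up to quantization error. */
double dequantize_ac(unsigned code, double scale) {
    assert(code <= 15);
    return (code / 15.0 * 2.0 - 1.0) * scale;
}
```

Because every value is divided by the largest magnitude first, the full 4-bit range is used no matter how small the coefficients are in absolute terms.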

Luminance and chrominance are represented in a simple color space that's easy to encode and decode. It uses the values L for luminance, P for yellow vs. blue, and Q for red vs. green (inspired by human eyesight). The advantage of LPQ over RGB is that variation in luminance is typically more important than variation in chrominance, so we can make better use of space by spending more bits on luminance and fewer on chrominance. Note that the range of L is 0 to 1 but the range of P and Q is -1 to 1 because they each represent a subtraction.

To convert from RGB to LPQ:

l = (r + g + b) / 3;
p = (r + g) / 2 - b;
q = r - g;

And to convert from LPQ back to RGB:

b = l - 2 / 3 * p;
r = (3 * l - b + q) / 2;
g = r - q;
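
The two conversions above can be wrapped into a pair of C functions to check that they invert each other. This is only a sketch: the struct and function names are made up for the example, and the range assertions simply restate the 0 to 1 input range assumed in the text.

```c
#include <assert.h>

typedef struct { double r, g, b; } Rgb;  /* each component in [0, 1] */
typedef struct { double l, p, q; } Lpq;  /* l in [0, 1]; p and q in [-1, 1] */

/* Forward transform, exactly as written in the formulas above. */
Lpq rgb_to_lpq(Rgb c) {
    assert(c.r >= 0.0 && c.r <= 1.0);
    assert(c.g >= 0.0 && c.g <= 1.0);
    assert(c.b >= 0.0 && c.b <= 1.0);
    Lpq out;
    out.l = (c.r + c.g + c.b) / 3.0;
    out.p = (c.r + c.g) / 2.0 - c.b;
    out.q = c.r - c.g;
    return out;
}

/* Inverse transform: recover b first, then r, then g. */
Rgb lpq_to_rgb(Lpq c) {
    Rgb out;
    out.b = c.l - 2.0 / 3.0 * c.p;
    out.r = (3.0 * c.l - out.b + c.q) / 2.0;
    out.g = out.r - c.q;
    return out;
}
```

For example, the color (r, g, b) = (0.2, 0.4, 0.6) maps to l = 0.4, p = -0.3, q = -0.2, and converting back gives the original color up to floating-point rounding.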

The file format is tightly packed and each number uses fewer than 8 bits. If the ThumbHash file format were to be represented as a C++ struct, it might look something like this:

struct ThumbHash {
  // 3 bytes
  uint8_t l_dc : 6;
  uint8_t p_dc : 6;
  uint8_t q_dc : 6;
  uint8_t l_scale : 5;
  uint8_t has_alpha : 1;

  // 2 bytes
  uint8_t l_count : 3;
  uint8_t p_scale : 6;
  uint8_t q_scale : 6;
  uint8_t is_landscape : 1;

  // Only present if "has_alpha" is 1
  #if has_alpha
    // 1 byte
    uint8_t a_dc : 4;
    uint8_t a_scale : 4;
  #endif

  // Each element is 4 bits
  uint8_t l_ac[] : 4;
  uint8_t p_ac[] : 4;
  uint8_t q_ac[] : 4;

  // Only present if "has_alpha" is 1
  #if has_alpha
    uint8_t a_ac[] : 4;
  #endif
};

The number after the colon in each field is the number of bits used by that field. The length of each AC array is the number of coefficients left after removing the 0th component (i.e. the DC component) and also removing the high-frequency half of the components. Representing that in C code might look something like this for a single channel, where nx and ny are the numbers of coefficients in each dimension:

for (int y = 0; y < ny; y++)
  for (int x = 0; x < nx; x++)
    if ((x != 0 || y != 0) && (x * ny + y * nx < nx * ny))
      readAC();
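
To see how many AC values that loop actually reads, the same condition can be turned into a counting function. The name count_ac is hypothetical; the loop body is the one shown above with readAC() replaced by a counter.

```c
#include <assert.h>

/* Number of AC coefficients stored for one channel that uses
   nx-by-ny DCT terms: everything except the DC term and the
   high-frequency half. */
int count_ac(int nx, int ny) {
    assert(nx > 0 && ny > 0);
    int count = 0;
    for (int y = 0; y < ny; y++)
        for (int x = 0; x < nx; x++)
            if ((x != 0 || y != 0) && (x * ny + y * nx < nx * ny))
                count++;
    return count;
}
```

With this, a 7x7 luminance channel stores 27 AC values and each 3x3 chrominance channel stores 5, so an opaque square image needs 27 + 5 + 5 = 37 four-bit values for its AC terms.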

The number of luminance components is derived as follows:

if (is_landscape) {
  lx = max(3, has_alpha ? 5 : 7);
  ly = max(3, l_count);
} else {
  lx = max(3, l_count);
  ly = max(3, has_alpha ? 5 : 7);
}

Using the is_landscape and has_alpha flags like this to make the number of coefficients in one dimension implicit is a way to save space. Since the number of components is automatically derived from the aspect ratio of the original image, you can also use this information to derive an approximation of the original aspect ratio.
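
A small C sketch of that derivation, together with one plausible way to turn it into an aspect ratio estimate. The type and function names are invented for this example, and taking lx / ly as the approximate width-to-height ratio is an assumption based on the paragraph above rather than a documented formula.

```c
#include <assert.h>

typedef struct { int lx, ly; } LumaDims;

static int max_int(int a, int b) { return a > b ? a : b; }

/* Derive the luminance term counts from the header flags,
   following the rule shown above. */
LumaDims luma_dims(int is_landscape, int has_alpha, int l_count) {
    assert(l_count >= 0 && l_count <= 7);  /* l_count is a 3-bit field */
    int implicit = has_alpha ? 5 : 7;      /* the dimension that is not stored */
    LumaDims d;
    if (is_landscape) {
        d.lx = max_int(3, implicit);
        d.ly = max_int(3, l_count);
    } else {
        d.lx = max_int(3, l_count);
        d.ly = max_int(3, implicit);
    }
    return d;
}

/* Assumed estimate: width / height is roughly lx / ly. */
double approx_aspect_ratio(LumaDims d) {
    return (double)d.lx / (double)d.ly;
}
```

For an opaque landscape image with l_count = 4 this gives lx = 7 and ly = 4, so the estimated aspect ratio is 1.75.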

If you just want the average color of the image (e.g. in a situation where showing a placeholder image is impractical), you can get that by transforming the l_dc, p_dc, and q_dc values from LPQ to RGB. These values are conveniently at the front of the file for this purpose.
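
The following C sketch shows what reading the average color from the first three bytes might look like. It rests on assumptions that the text above does not spell out: that the header is read as a little-endian 24-bit value, that fields are packed from the least significant bit in struct order, and that the 6-bit DC codes map linearly onto 0 to 1 for L and -1 to 1 for P and Q. Check the reference implementations before relying on any of these. The names average_rgb and AvgRgb are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { double r, g, b; } AvgRgb;

static double clamp01(double v) {
    return v < 0.0 ? 0.0 : (v > 1.0 ? 1.0 : v);
}

/* Average color from the first 3 bytes of a hash.
   ASSUMPTIONS: little-endian header, LSB-first field packing,
   linear dequantization of the 6-bit DC codes. */
AvgRgb average_rgb(const uint8_t *hash) {
    assert(hash != 0);
    uint32_t header = (uint32_t)hash[0]
                    | ((uint32_t)hash[1] << 8)
                    | ((uint32_t)hash[2] << 16);
    double l = (header & 63) / 63.0;               /* l_dc: bits 0..5   */
    double p = ((header >> 6) & 63) / 31.5 - 1.0;  /* p_dc: bits 6..11  */
    double q = ((header >> 12) & 63) / 31.5 - 1.0; /* q_dc: bits 12..17 */

    /* LPQ to RGB, as in the conversion formulas earlier in this section. */
    double b = l - 2.0 / 3.0 * p;
    double r = (3.0 * l - b + q) / 2.0;
    double g = r - q;

    AvgRgb out = { clamp01(r), clamp01(g), clamp01(b) };
    return out;
}
```

The result is clamped to 0 to 1 because quantization can push the reconstructed components slightly outside that range.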

Reference implementations for this algorithm can be found at https://github.com/evanw/thumbhash.