A very compact representation of an image placeholder. Store it inline with your data and show it while the real image is loading for a smoother loading experience. It's similar to BlurHash, but it encodes more detail in the same space, encodes the image's aspect ratio, produces more accurate colors, and supports transparency.
Despite doing all of these additional things, the code for ThumbHash is still similar in complexity to the code for BlurHash. One potential drawback compared to BlurHash is that the parameters of the algorithm are not configurable (everything is automatically configured).
The code for this is available at https://github.com/evanw/thumbhash and contains implementations for JavaScript, Rust, Swift, and Java. You can use npm install thumbhash to install the JavaScript package and cargo add thumbhash to install the Rust package.
[Interactive demo: ThumbHash generates an image representation in a few bytes, which is then rendered as an image placeholder; drag/drop to try your own image.]
The table below compares ThumbHash to several other similar approaches:
ThumbHash: ThumbHash encodes a higher-resolution luminance channel, a lower-resolution color channel, and an optional alpha channel. The format is described in detail below. There are no parameters to configure.
BlurHash: Uses BlurHash with 3x3 components for square images, 4x3 components for landscape images, and 3x4 components for portrait images. This is the configuration recommended in the documentation, and is roughly the same size as a ThumbHash encoded using base64.
Potato WebP: This is an experiment of mine to see how Google's WebP image format does at this. The "hash" is just the contents of the "VP8" chunk in a minimal WebP file: 0% quality (i.e. potato quality) and a size of 16x16, since WebP encodes everything in 16x16 blocks. The image is reconstructed by blurring a scaled-up copy of a minimal WebP file with the VP8 chunk reinserted.
In addition to these sample images, you can also drag and drop your own images to compare them here.
[Comparison grid: Original image | ThumbHash | BlurHash | Potato WebP]
The image is approximated using the Discrete Cosine Transform. Luminance is encoded using up to 7 terms in each dimension while chrominance (i.e. color) is encoded using 3 terms in each dimension. The optional alpha channel is encoded using 5 terms in each dimension if present. If alpha is present, luminance is only encoded using up to 5 terms in each dimension.
Each channel of DCT coefficients comes in three parts: the DC term, the AC terms, and the scale. The DC term is the coefficient for the 0th-order cosine and the AC terms are the coefficients of all other cosines (DC and AC are terms from signal processing). All values are quantized to only a few bits each. To maximize the useful numeric range, AC values are normalized by their maximum magnitude, and that scale is saved separately. In addition, ThumbHash omits the high-frequency half of the coefficients and only keeps the low-frequency half. If you are familiar with JPEG's zig-zag coefficient order, this roughly corresponds to stopping halfway through that sequence. The rationale is that the low-frequency coefficients carry most of the information, and we also want a smooth image.
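To make that concrete, encoding a single channel might look something like this in C (a sketch only; the function name, signature, and buffer handling are illustrative rather than the reference API):

#include <math.h>

// Sketch: reduce one w-by-h channel to a DC term, normalized AC terms, and
// a scale, keeping only the low-frequency half of an nx-by-ny grid of DCT
// coefficients. The caller provides an "ac" buffer large enough to hold
// all kept terms. Illustrative only, not the reference API.
int encode_channel(const float *channel, int w, int h, int nx, int ny,
                   float *dc, float *ac, float *scale) {
  int n = 0;
  *dc = 0;
  *scale = 0;
  for (int cy = 0; cy < ny; cy++) {
    for (int cx = 0; cx < nx; cx++) {
      // Drop the high-frequency half of the coefficient grid.
      if (cx * ny + cy * nx >= nx * ny)
        continue;
      // Correlate the channel with the (cx, cy) cosine basis function.
      float f = 0;
      for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
          f += channel[x + y * w]
             * cosf(3.14159265f / w * cx * (x + 0.5f))
             * cosf(3.14159265f / h * cy * (y + 0.5f));
      f /= w * h;
      if (cx == 0 && cy == 0) {
        *dc = f; // the 0th-order term: the channel's average value
      } else {
        ac[n++] = f;
        float m = fabsf(f);
        if (m > *scale)
          *scale = m; // the scale is the maximum AC magnitude
      }
    }
  }
  // Map AC terms from -scale..scale into 0..1 so that quantizing them to a
  // few bits makes full use of the available range.
  if (*scale > 0)
    for (int i = 0; i < n; i++)
      ac[i] = 0.5f + 0.5f * ac[i] / *scale;
  return n; // number of AC terms kept
}

Decoding reverses this: each dequantized AC value is mapped back from 0..1 to -scale..scale and the weighted cosines are summed.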
Luminance and chrominance are represented in a simple color space that's easy to encode and decode. It uses the values L for luminance, P for yellow vs. blue, and Q for red vs. green (inspired by human eyesight). The advantage of LPQ over RGB is that variation in luminance is typically more important than variation in chrominance, so we can make better use of the bits by spending more of them on luminance and fewer on chrominance. Note that the range of L is 0 to 1 but the range of P and Q is -1 to 1 because they each represent a subtraction.
To convert from RGB to LPQ:
l = (r + g + b) / 3
p = (r + g) / 2 - b
q = r - g
And to convert from LPQ back to RGB:
b = l - 2 / 3 * p
r = (3 * l - b + q) / 2
g = r - q
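Written out as runnable C, the round trip might look like this (a sketch; the function names are just for illustration):

#include <stdio.h>

// Sketch of the two conversions above. r, g, and b are in 0..1; l is in
// 0..1 while p and q are in -1..1.
void rgb_to_lpq(float r, float g, float b, float *l, float *p, float *q) {
  *l = (r + g + b) / 3;
  *p = (r + g) / 2 - b;
  *q = r - g;
}

void lpq_to_rgb(float l, float p, float q, float *r, float *g, float *b) {
  *b = l - 2.0f / 3.0f * p;
  *r = (3 * l - *b + q) / 2;
  *g = *r - q;
}

int main(void) {
  float l, p, q, r, g, b;
  rgb_to_lpq(0.8f, 0.4f, 0.2f, &l, &p, &q);
  lpq_to_rgb(l, p, q, &r, &g, &b);
  printf("%.2f %.2f %.2f\n", r, g, b); // prints "0.80 0.40 0.20"
  return 0;
}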
The file format is tightly packed and each number uses fewer than 8 bits. If the ThumbHash file format were to be represented as a C++ struct, it might look something like this:
struct ThumbHash {
  // 3 bytes
  uint8_t l_dc : 6;
  uint8_t p_dc : 6;
  uint8_t q_dc : 6;
  uint8_t l_scale : 5;
  uint8_t has_alpha : 1;

  // 2 bytes
  uint8_t l_count : 3;
  uint8_t p_scale : 6;
  uint8_t q_scale : 6;
  uint8_t is_landscape : 1;

  // Only present if "has_alpha" is 1
#if has_alpha
  // 1 byte
  uint8_t a_dc : 4;
  uint8_t a_scale : 4;
#endif

  // Each element is 4 bits
  uint8_t l_ac[] : 4;
  uint8_t p_ac[] : 4;
  uint8_t q_ac[] : 4;

  // Only present if "has_alpha" is 1
#if has_alpha
  uint8_t a_ac[] : 4;
#endif
};
The colon syntax after each field is the number of bits used by that field. The length of each AC array is the number of coefficients left after removing the 0th component (i.e. the DC component) and also removing the high-frequency half of the components. Representing that in C code might look something like this for a single channel, where nx and ny are the numbers of coefficients in each dimension:
for (int y = 0; y < ny; y++)
  for (int x = 0; x < nx; x++)
    if ((x != 0 || y != 0) && (x * ny + y * nx < nx * ny))
      readAC();
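For example, with nx = ny = 3 (a chrominance channel) this keeps five AC coefficients: (1, 0), (2, 0), (0, 1), (1, 1), and (0, 2). With nx = ny = 7, the luminance maximum for a square opaque image, it keeps the 27 coefficients where x + y < 7 other than the DC term, so such a hash stores 27 + 5 + 5 = 37 four-bit AC values (19 bytes packed two per byte) for a total of 5 + 19 = 24 bytes.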
The number of luminance components is derived as follows:
if (is_landscape) {
  lx = max(3, has_alpha ? 5 : 7);
  ly = max(3, l_count);
} else {
  lx = max(3, l_count);
  ly = max(3, has_alpha ? 5 : 7);
}
Using the is_landscape and has_alpha flags like this to make the number of coefficients in one dimension implicit is a way to save space. Since the number of components is automatically derived from the aspect ratio of the original image, you can also use this information to derive an approximation of the original aspect ratio.
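For example, recovering that approximation might look something like this (a sketch reusing the derivation above; the function name is just for illustration):

// Sketch: approximate the original aspect ratio (width / height) using
// only the flags and l_count stored in the hash.
float approximate_aspect_ratio(int has_alpha, int is_landscape, int l_count) {
  int full = has_alpha ? 5 : 7;
  int lx = is_landscape ? full : l_count;
  int ly = is_landscape ? l_count : full;
  if (lx < 3) lx = 3; // same max(3, ...) clamp as above
  if (ly < 3) ly = 3;
  return (float)lx / (float)ly;
}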
If you just want the average color of the image (e.g. in a situation where showing a placeholder image is impractical), you can get that by transforming the l_dc, p_dc, and q_dc values from LPQ to RGB. These values are conveniently at the front of the file for this purpose.
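In C, reading the average color might look something like this; it assumes the bit fields are packed least-significant-bit first into little-endian bytes, which matches the reference implementations:

#include <stdint.h>

// Sketch: decode the average color from the first three bytes of a hash.
// Outputs are in 0..1.
void thumbhash_average_rgb(const uint8_t *hash, float *r, float *g, float *b) {
  uint32_t header = hash[0] | (hash[1] << 8) | ((uint32_t)hash[2] << 16);
  float l = (header & 63) / 63.0f;             // 6-bit l_dc in 0..1
  float p = ((header >> 6) & 63) / 31.5f - 1;  // 6-bit p_dc in -1..1
  float q = ((header >> 12) & 63) / 31.5f - 1; // 6-bit q_dc in -1..1
  // Convert LPQ to RGB using the formulas above.
  *b = l - 2.0f / 3.0f * p;
  *r = (3 * l - *b + q) / 2;
  *g = *r - q;
}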
Reference implementations for this algorithm can be found at https://github.com/evanw/thumbhash.