We're building a large, modern collection of images for compression research. We need:
- Images in their “original” form — without JPEG compression artifacts that bias benchmarks.
- A diverse set representing all common use cases — from transparent icons to multi-megapixel photos.
- Images using an alpha channel — there's very little research on compressing image transparency.
- A large number of images — for statistically significant results.
Image compression research requires a large number of uncompressed (lossless) images for testing and benchmarking. Unfortunately, for most researchers the options are to either use a few "traditional" images (such as Lena or the Kodak set) or scrape the Web. Both options are problematic: either the set is too small and unrepresentative, or the images already contain compression artifacts and, due to copyright restrictions, can't be shared with others to let them verify the results.
We're collecting a large number of freely licensed, raw quality images. We look for:
- photos from high-end digital cameras (done),
- post-processed/edited photos (especially with transparency),
- icons, logos, buttons,
- screenshots from various OSes,
- CGI images and screenshots from games,
- assets from games and interactive infographics,
- photos from cellphone cameras,
- photos taken in low light (noisy),
- images taking advantage of better-than-sRGB color spaces (done).
We'll collect statistics about “typical” images on the Web — sizes, types, amount of noise, contrast, edges, etc.
We'll select (or edit) images to be statistically similar to the set of the images found on the Web.
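For illustration only, per-image statistics of that kind could be computed along these lines (the metric choices below are illustrative assumptions, not the project's actual tooling):

# Illustration only: crude per-image statistics using Pillow and NumPy.
import sys, json
import numpy as np
from PIL import Image

def image_stats(path):
    img = Image.open(path)
    gray = np.asarray(img.convert("L"), dtype=np.float32)  # grayscale copy for measurement
    dx = np.abs(np.diff(gray, axis=1)).mean()  # mean horizontal pixel difference
    dy = np.abs(np.diff(gray, axis=0)).mean()  # mean vertical pixel difference
    return {
        "width": img.width,
        "height": img.height,
        "mode": img.mode,                # e.g. "RGBA" signals an alpha channel
        "contrast": float(gray.std()),   # crude global contrast estimate
        "edges": float((dx + dy) / 2),   # crude edge-strength estimate
    }

if __name__ == "__main__":
    print(json.dumps(image_stats(sys.argv[1]), indent=2))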
We'll publish the whole set, with image tags/categories.
You can browse images we've collected so far and see the tools we're developing.
In the near term, we plan to use the set for:
- Evaluation of proposed alpha channel support in JPEG XT.
- Tuning of lossy PNG compressors.
- Solid benchmark comparing WebP, JPEG XR, JPEG 2000 and various lossy PNG solutions.
- Test ideas for improvements to the MozJPEG encoder (e.g. algorithms for noise shaping and activity masking).
- Design/evaluation/tuning of algorithms that can quickly guess whether PNG or JPEG will be better for any given image, and what type of chroma subsampling is appropriate (a toy sketch of such a heuristic follows below).
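As a toy illustration of that last item (not an algorithm from this project), a first-cut heuristic might just look at transparency and the number of distinct colors; the threshold below is arbitrary:

# Toy sketch: prefer PNG for images with transparency or few distinct colors.
from PIL import Image

def guess_format(path, max_colors=4096):
    img = Image.open(path)
    if "A" in img.getbands():  # JPEG can't store an alpha channel
        return "PNG"
    # getcolors() returns None when there are more than `maxcolors` distinct colors.
    colors = img.convert("RGB").getcolors(maxcolors=max_colors)
    return "PNG" if colors is not None else "JPEG"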
In the long term we hope the test suite could:
- become a de facto standard for fair benchmarks of image compression algorithms/products,
- aid research and development of image compression algorithms,
- help browser performance tuning,
- help R&D of other image-related algorithms (enhancing filters, tracing, hashing/similarity, visual quality assessment metrics, etc.),
- get reused in other fields (e.g. computer vision).
If you only want to check out what images are available without downloading the full set, then use the sample archives below. These archives contain temporary, random subsets of images.
- Sample 1 (200MB)
- Sample 2 (400MB)
- Sample 3 (800MB)
- Sample 4 (800MB)
- Sample 5 (800MB)
- Sample 6 (800MB)
- Sample 7 (800MB)
- Sample 8 (800MB)
- Sample 9 (800MB)
You can avoid storing large archive files by streaming the decompression like this:
curl -s https://yardstick.pictures/yardstick-images-2.tar.gz | tar xvz
The complete set
The complete collection contains over 34,000 images totalling 120 GB, so we don't provide a single archive. Instead, we provide a list of all images and their metadata, and make each image available for download from its individual URL. Each image's URL is built from the lowercase hex digits of the SHA-1 of the image file and the image's filename extension — both pieces of information are present in the images' .json files in the metadata repository.
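As a rough sketch of how a downloader might use those metadata files (the JSON key names, base URL, and path layout below are hypothetical placeholders; check the metadata repository for the real ones):

# Sketch only: fetch one image using fields from its metadata .json entry.
import json, os
import urllib.request

BASE_URL = "https://example.invalid/images"  # placeholder; substitute the real base URL

def fetch_image(metadata_path, out_dir="."):
    with open(metadata_path) as f:
        meta = json.load(f)
    sha1 = meta["sha1"]  # lowercase hex SHA-1 of the image file (assumed key name)
    ext = meta["ext"]    # original filename extension, e.g. "png" (assumed key name)
    dest = os.path.join(out_dir, f"{sha1}.{ext}")
    urllib.request.urlretrieve(f"{BASE_URL}/{sha1}.{ext}", dest)
    return dest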
Contribute your images.
Help with the development of the site and tools.
Contact: Kornel Lesiński, Yoav Weiss, Ann Robson, Guy Podjarny.