Not wanting to repeat myself, I have written a small bash script to handle the parallel processing of the post images for this site. This involves resizing, cropping and then compressing the images ready for the web. The script currently supports both JPEG and PNG images for all of these operations.
On top of this I wanted to ensure that only recently added or modified images would be processed, rather than processing the entire folder again. There is a handy option for touch that we’ll see later which makes this much easier.
So let’s work through the bash script to slowly build it up into a working example. The first item on the agenda is to declare the hashbang for the script.
#! /usr/bin/env bash
Here we are using env to locate the bash executable. This should help to make the script more portable between systems, rather than hard-coding /usr/bin/bash directly. Some systems have bash at /bin/bash, for example, and using env prevents this from breaking our script.
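As a quick sanity check, you can ask the shell which bash the env lookup would resolve to. It is just a PATH search, so the answer varies by system:

```shell
# command -v performs the same PATH lookup that env does for its argument
env_bash=$(command -v bash)
echo "env will run: $env_bash"   # e.g. /usr/bin/bash or /bin/bash
```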
Now the script can begin in earnest by declaring a few variables to store the width and heights we want the final images to be. A temporary file path is also required to store the last run timestamp to prevent re-processing the same image twice.
TH_WIDTH=720
TH_HEIGHT=70
LG_WIDTH=720
LG_HEIGHT=480
TOUCHFILE="last.run.time"
Throughout the article I will refer to thumbnail, TH and list image interchangeably; the same goes for large, LG and post image.
If the touch file doesn’t exist then we need to create it and specify the timestamp to use as its default. As I am tracking the entire project in git, the last git commit date will do for the default. This will prevent any already-committed images from being processed again.
if [ ! -f "$TOUCHFILE" ]; then
# http://stackoverflow.com/a/19812608/461813
LAST_COMMIT_TIMESTAMP=$(git show -s --format=%ct)
# http://unix.stackexchange.com/a/36765/10219
touch -d "@$LAST_COMMIT_TIMESTAMP" "$TOUCHFILE"
fi
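To convince yourself that touch -d "@timestamp" does what we want, you can seed a throwaway file from an arbitrary epoch value and read the modification time back with date -r (GNU coreutils; the file name here is hypothetical):

```shell
# Seed a file's mtime from an arbitrary epoch timestamp
ts=1500000000
touch -d "@$ts" demo.run.time

# date -r prints a file's modification time; +%s formats it as epoch seconds
date -r demo.run.time +%s   # prints 1500000000
```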
There is one slight caveat here: if you clone the git repository then all the files will have a modification time of the clone date and not their original modification date, so the resizing will be run against every image on the initial clone. This is not an issue for me as I rarely clone the repo; if it is for you, you could use the latest modification time across all the files instead.
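If you did want the newest modification time across the whole tree, a rough sketch of that alternative (assuming GNU find’s -printf) might look like this; the demo directory stands in for the real src:

```shell
# Demo stand-in for the real src directory
mkdir -p demo_src
touch demo_src/a.jpg demo_src/b.png

# %T@ prints each file's mtime as epoch seconds; take the most recent
latest=$(find demo_src -type f -printf '%T@\n' | sort -n | tail -n 1)

# Seed the touch file from that timestamp instead of the git commit date
# (strip the fractional part before handing it to touch)
touch -d "@${latest%.*}" last.run.time
```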
All of the images we wish to resize are stored in a directory called src, so we need to find all the files in there that have a more recent modification time than the touch file. find has a handy switch, -newer, that allows us to easily locate them.
FILES=$(find src -newer "$TOUCHFILE" -iname '*.jpg' -or -newer "$TOUCHFILE" -iname '*.png')
This will find all files that are newer than the touch file and that have either .jpg or .png extensions. If there are any then we want to resize and crop them to the correct dimensions using ImageMagick’s convert utility. To complicate matters we’re also going to use GNU parallel to process the images across processors. If you haven’t used parallel before it is probably worth checking out my other post to get an idea of the syntax and opportunities it provides.
To check that there are some files to process we can simply test the variable with the -n (non-empty string) operator.
if [ -n "$FILES" ]; then
# process the large images
parallel -j8 convert "{}" -strip -filter catrom -resize "${LG_WIDTH}x${LG_HEIGHT}^" -gravity center -crop "${LG_WIDTH}x${LG_HEIGHT}+0+0" "t_post/{/}" ::: $FILES
# process the image slices
parallel -j8 convert "t_post/{/}" -gravity center -crop "${TH_WIDTH}x${TH_HEIGHT}+0+0" -filter catrom -extent "${TH_WIDTH}x${TH_HEIGHT}" +repage "t_list/{/}" ::: $FILES
fi
The cropping and resizing particulars can be researched in the ImageMagick manual so I won’t spend too much time covering them here. Note that parallel uses pretty much the same syntax as xargs, where the file names are passed into convert, as detailed in my previous post. Also note how $FILES is passed into the parallel command as arguments after the special ::: separator.
So in the first call to parallel you can see {} being used; that is the file name/path exactly as it is passed back from find, without modification. You’ll see {/} used elsewhere, which is the same as {} except that it strips the preceding path from the argument before printing it (eg. /var/www/index.html becomes index.html). You can also strip the extension from the argument with {.}, giving /var/www/index when fed /var/www/index.html. Finally you can combine the two: {/.} produces index when given the same.
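These replacement strings map neatly onto bash parameter expansion, which is a handy way to double-check what parallel will substitute. A sketch, with a variable name of my own choosing:

```shell
f=/var/www/index.html

echo "$f"          # {}   -> /var/www/index.html
echo "${f##*/}"    # {/}  -> index.html      (strip leading path)
echo "${f%.*}"     # {.}  -> /var/www/index  (strip extension)
base=${f##*/}
echo "${base%.*}"  # {/.} -> index           (strip both)
```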
As the thumbnail quality is less important than that of the actual large image, I have cheated a little and performed the second crop and resize on the large image rather than re-cutting from src. This has two benefits: it is quicker to process a smaller image, and the image is already at the correct width.
So now we have resized and cropped both our large and thumbnail images; it is time to compress them. Before we get into that, however, now is a good time to go over the required dependencies and how to install them. I have wrapped them all up into an installation bash script, which you can find at the end of this article too.
Handily, some of the requirements can be obtained from Ubuntu/Debian’s repositories.
sudo apt-get install imagemagick optipng advancecomp parallel
This gives you the ImageMagick package to do the resizing and cropping, two PNG optimisation tools and GNU parallel to handle the multi-processor usage.
Compressing JPEGs nicely takes a little more work, as we must manually compile the dependencies here - not at all hard, I promise! To facilitate compilation we need to install some build tools from the repositories.
sudo apt-get install build-essential autoconf pkg-config nasm libtool git
With these in place we can turn our attention to mozjpeg, which sits underneath our final library, jpeg-archive.
git clone https://github.com/mozilla/mozjpeg.git
cd mozjpeg
autoreconf -fiv
./configure --with-jpeg8
make
sudo make install
cd -
Now that has been built and installed, it is possible to get jpeg-archive up and running with another simple build script.
git clone https://github.com/danielgtaylor/jpeg-archive.git
cd jpeg-archive
git checkout 2.1.1
make
sudo make install
cd -
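Once both builds have finished, a quick smoke test confirms whether the binaries landed on your PATH. This sketch only reports; it does not fail (cjpeg comes from mozjpeg, jpeg-recompress from jpeg-archive):

```shell
# Report whether each expected binary is on the PATH
report=""
for tool in cjpeg jpeg-recompress; do
  if command -v "$tool" >/dev/null 2>&1; then
    report="$report$tool: installed"$'\n'
  else
    report="$report$tool: missing"$'\n'
  fi
done
printf '%s' "$report"
```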
After the dependencies are available we can get on with the process of compressing the resized and cropped image files. It is essential that different file types are handled differently here: you cannot compress a PNG with the same tools as a JPEG, and vice versa. Additionally, I want to compress the thumbnail/list images more than the large/post images.
Let’s begin with handling the JPEG results first.
JPOST_FILES=$(find t_post -newer "$TOUCHFILE" -iname '*.jpg')
JLIST_FILES=$(find t_list -newer "$TOUCHFILE" -iname '*.jpg')
The next step is to loop over these results in parallel and apply the compression tools we installed earlier.
if [ -n "$JPOST_FILES" ]; then
parallel -j8 jpeg-recompress --method smallfry --quality medium --min 60 "{}" "{}" ::: $JPOST_FILES
fi
if [ -n "$JLIST_FILES" ]; then
parallel -j8 jpeg-recompress --method smallfry --quality low --min 50 "{}" "{}" ::: $JLIST_FILES
fi
The above code uses jpeg-recompress, from the jpeg-archive suite, to perform the compression using the so-called smallfry algorithm. As you can see, the thumbnail/list and large/post images are handled separately, and the options passed to jpeg-recompress for the list images are far more severe.
PNGs are simpler, because they do not have the same level of compression options. We’re going to use a PNG optimiser followed by a compressor/reducer (essentially DEFLATE underneath).
PNG_FILES=$(find t_post t_list -newer "$TOUCHFILE" -iname '*.png')
if [ -n "$PNG_FILES" ]; then
parallel -j8 optipng -o 3 -fix "{}" -out "{}" ::: $PNG_FILES
parallel -j8 advdef --shrink-extra -z "{}" ::: $PNG_FILES
fi
Together these two utilities shave something like 10% off a PNG image, in my limited experience with a handful of images.
With all the actual operations now complete, it just remains to update the last.run.time file to prevent the same images being processed twice.
touch "$TOUCHFILE"
Simple! So, yes, it took some work to get here, but you now have repeatable and efficient image manipulation from a small and easily modified bash script.
To make it easier to copy and paste and verify your final result the full installation and resize scripts are included below.
resize.sh
#! /usr/bin/env bash
LG_WIDTH=720
LG_HEIGHT=480
TH_WIDTH=720
TH_HEIGHT=70
TOUCHFILE="last.run.time"
if [ ! -f "$TOUCHFILE" ]; then
# http://stackoverflow.com/a/19812608/461813
LAST_COMMIT_TIMESTAMP=$(git show -s --format=%ct)
# http://unix.stackexchange.com/a/36765/10219
touch -d "@$LAST_COMMIT_TIMESTAMP" "$TOUCHFILE"
fi
echo "Resizing in post images"
FILES=$(find src -newer "$TOUCHFILE" -iname '*.jpg' -or -newer "$TOUCHFILE" -iname '*.png')
if [ -n "$FILES" ]; then
# process the large images
parallel -j8 convert "{}" -strip -filter catrom -resize "${LG_WIDTH}x${LG_HEIGHT}^" -gravity center -crop "${LG_WIDTH}x${LG_HEIGHT}+0+0" "t_post/{/}" ::: $FILES
# process the image slices
parallel -j8 convert "t_post/{/}" -gravity center -crop "${TH_WIDTH}x${TH_HEIGHT}+0+0" -filter catrom -extent "${TH_WIDTH}x${TH_HEIGHT}" +repage "t_list/{/}" ::: $FILES
fi
# compress jpg images
JPOST_FILES=$(find t_post -newer "$TOUCHFILE" -iname '*.jpg')
JLIST_FILES=$(find t_list -newer "$TOUCHFILE" -iname '*.jpg')
if [ -n "$JPOST_FILES" ]; then
parallel -j8 jpeg-recompress --method smallfry --quality medium --min 60 "{}" "{}" ::: $JPOST_FILES
fi
if [ -n "$JLIST_FILES" ]; then
parallel -j8 jpeg-recompress --method smallfry --quality low --min 50 "{}" "{}" ::: $JLIST_FILES
fi
# compress png images
PNG_FILES=$(find t_post t_list -newer "$TOUCHFILE" -iname '*.png')
if [ -n "$PNG_FILES" ]; then
parallel -j8 optipng -o 3 -fix "{}" -out "{}" ::: $PNG_FILES
parallel -j8 advdef --shrink-extra -z "{}" ::: $PNG_FILES
fi
echo " "
echo "Completed resize operation"
touch "$TOUCHFILE"
install.sh
#! /usr/bin/env bash
echo "Installing imagemagick"
sudo apt-get install imagemagick
echo " "
echo "Installing optipng and advdef"
sudo apt-get install optipng advancecomp
echo " "
echo "Installing gnu parallel"
sudo apt-get install parallel
echo " "
echo "Installing mozjpeg"
sudo apt-get install build-essential autoconf pkg-config nasm libtool git
git clone https://github.com/mozilla/mozjpeg.git
cd mozjpeg
autoreconf -fiv
./configure --with-jpeg8
make
sudo make install
cd -
echo " "
echo "Installing jpeg-archive"
git clone https://github.com/danielgtaylor/jpeg-archive.git
cd jpeg-archive
git checkout 2.1.1
make
sudo make install