Unadulterated Geekery

Ok, so there’s a bit of geekery below the cut for those of you who like looking through shell script source code. All others should move on…

Still there? Ok, you have been warned.

Here’s the background: during the summer, my new job tends to have a few slack periods when people aren’t bringing in new work. Rather than sit around twiddling our thumbs, we go back through our old work and see what needs improvement. There’s a rather large database of images which includes a “quality control” feedback form. Images that score low on quality get flagged for review and repair.

The trouble is, we’ve got literally tens of thousands of images in hundreds of directories. Gathering up the bad ones takes hours. So I wrote a script to do that for us. It was able to go through about 10,000 images and find 120 or so of the ones that we needed in about seven minutes.

Code is below the break…

#!/bin/bash

##

# This script will page through a list of file names and then

# perform a recursive search of all files and directories in a

# user specified "source" path, copying any matching files to a

# second, user specified "destination" directory in their local

# machine's home directory.

##

## Has the script been invoked correctly?

if [ -z "$3" ]; then

echo "command $0 needs source, destination and matchlist"

exit 1

fi

## Assign command-line arguments to variables

source=$1

destination=~/$2

matchlist=$3

##

# NOTE! The ~/ in the destination assignment forces the script

# to copy files to the user's home directory. This should help

# avoid a potential recursive condition where the destination

# directory is one of the ones being searched...

##

## Check if the source, destination and matchlist actually exist

if [ ! -d "$source" ]; then

echo "$source is not a directory"

exit 1

fi

if [ -e "$destination" ]; then

echo "$destination already exists"

exit 1

fi

if [ ! -e "$matchlist" ]; then

echo "$matchlist does not exist"

exit 1

fi

## Create destination directory

mkdir "$destination"

check=$?

if [ ! $check -eq 0 ]; then

echo error creating destination directory

exit 1

fi

## Create empty log files

touch "$destination/not_found.txt"

touch "$destination/logfile.txt"

## Create lookup file from source directories

echo Building lookup table...

ls -R "$source" > "$destination/files_searched.txt"

## Prepare to loop through files in source

line_count=$(wc -l < "$matchlist")

echo there are $line_count lines in the search file

count=0

##

# Now to go through the specified matchlist file line by line.

# First, 'grep' checks to see if 'find' would be fruitful -

# this saves a lot of time when there are large numbers of

# files to search through. The results of the lookup check are

# logged, and the script continues with the 'find' command to

# get the full directory path, and to make sure that the hit

# isn't for a directory instead of a file. If 'find' fails, it

# logs the query to the not_found.txt file. On success, it

# copies the file and logs the action.

##

## count starts at 0, so -lt gives exactly one pass per line

while [ "$count" -lt $line_count ]

do

read -r query

echo looking for "$query"

## Check the lookup file first - it's faster

grep -i "$query" "$destination/files_searched.txt"

check=$?

if [ ! $check -eq 0 ]; then

echo "$query" not found in lookup file

echo "$query" not found in lookup file >> "$destination/not_found.txt"

else

## If it's in the lookup file, get exact location and copy

result=$(find "$source" -iname "$query" -type f)

if [ -z "$result" ]; then

echo NOTICE - "$query" not found after file search - it may be a directory

echo NOTICE - "$query" not found after file search - it may be a directory >> "$destination/not_found.txt"

else

cp $result "$destination"

echo copied "$source"/"$query" to "$destination"/"$query" >> "$destination/logfile.txt"

fi

fi

let "count += 1"

done < "$matchlist"

exit 0
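
For the curious, here’s a rough sketch of a variation on the same idea, not what I actually ran. It swaps the 'ls -R' lookup plus per-file 'find' for a single 'find' pass that records full paths, then matches each name against that index with 'awk'. It also reads the matchlist with 'while read' instead of counting lines. The function name collect_matches is made up for illustration, and I’m assuming file names don’t contain newlines or backslashes.

```shell
#!/bin/bash
# Hypothetical helper (not part of the script above): copy every regular
# file under a source tree whose name appears in a matchlist, using one
# 'find' pass to build a full-path index instead of one 'find' per query.
collect_matches() {
    local source=$1 destination=$2 matchlist=$3
    local index query hits path

    index=$(mktemp) || return 1
    mkdir -p "$destination" || return 1

    # Single pass over the tree: one full path per line, files only.
    find "$source" -type f > "$index"

    # 'read' returns non-zero at end of file, which ends the loop without
    # a line counter. The '|| [ -n "$query" ]' clause still processes a
    # final line that has no trailing newline.
    while IFS= read -r query || [ -n "$query" ]; do
        [ -z "$query" ] && continue    # skip blank lines

        # Split each indexed path on '/' and compare the last field
        # (the file name) with the query, ignoring case.
        hits=$(awk -F/ -v q="$query" 'tolower($NF) == tolower(q)' "$index")

        if [ -z "$hits" ]; then
            echo "$query" >> "$destination/not_found.txt"
        else
            # A name may exist in several directories; copy each hit.
            while IFS= read -r path; do
                cp "$path" "$destination/" &&
                    echo "copied $path" >> "$destination/logfile.txt"
            done <<< "$hits"
        fi
    done < "$matchlist"

    rm -f "$index"
}
```

You’d call it as collect_matches SOURCE DESTINATION MATCHLIST with full paths. Unlike the script above, it doesn’t force the destination into the home directory, so keep the destination outside the source tree yourself. If the same file name turns up in more than one directory, the later copy overwrites the earlier one.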
