Rewriting a Bash Script in Racket

10/15/2022

I have been looking for a more robust scripting language to replace some of my shell scripts, so I decided to give Racket a try.

Goals

Whatever I replace a Bash/shell script with should improve upon its main shortcomings. I try to avoid using very many Bash-specific features, which limits my normal shell scripting mostly to the features of POSIX shell. This leaves me wanting:

More robust error handling
Scoping of variables
Arrays (or something similar)

Replacing a shell script with an alternative almost certainly means giving up the luxury of having the interpreter installed everywhere I go, but that is a trade-off I’m willing to make for my personal scripting.

I’m a big fan of Go (Golang), but the compilation step is too cumbersome for scripting, and the language itself is too verbose.

Based on the above criteria, Python would be a perfect fit. However, my previous experiences with Python have left me uninterested in it. I’ll still use it at work, where it is a good fit since most people are familiar with it. But for my personal usage, I wanted to try something fun and interesting.

Racket

Racket is a Lisp; specifically, it is derived from Scheme. It is dynamically typed and runs scripts directly from source, with no separate compilation step. Sources I found online said it was good for basic everyday scripting. In some ways, it gave me the impression it was like Python with a Lisp syntax.

Wikipedia Page on Lisp

Racket Homepage

First Impressions

When I tried to get started with Racket, one of the first aspects I found confusing was that they seem to have a variety of sub-languages within Racket, some of which change the syntax entirely. The language is declared at the start of a script, using something like:

#lang racket

It took me a while, but I eventually understood that the “regular” Racket can be chosen using either #lang racket or #lang racket/base, with the base one being a slimmed-down variant that loads fewer libraries by default (recommended for scripts so that they start up faster). It looks like most of the racket/ variations just alter which libraries are loaded by default, whereas other #lang options can be more drastic.
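As a rough illustration of the difference, here is a minimal sketch of a racket/base script that requires the libraries it needs explicitly. The sample line and its URL are placeholders, not taken from the real sources.list.

```racket
#lang racket/base
;; racket/base leaves out several libraries that #lang racket loads by default,
;; so a script has to require the ones it uses.
(require racket/list     ; first, second, ...
         racket/string)  ; string-split, string-prefix?, non-empty-string?

;; Example line in the sources.list format (the URL is a placeholder).
(define fields
  (string-split "example-tool GIT https://example.invalid/example-tool.git master"))

(printf "name: ~a, type: ~a\n" (first fields) (second fields))
```

Switching the first line to #lang racket would make both require lines unnecessary, at the cost of loading more at startup.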

While looking for some resources, I came upon a heavily modified variant built specifically for shell-style scripting. It is called Rash:

Rash

While this would have likely made a lot of my tasks easier, I wanted to stick to using regular Racket. I had three main reasons for this:

I wanted to experience a Lisp-style syntax.
I didn’t want any Racket experience I gained to be limited to a specific niche version of the ecosystem.
I didn’t want to get held up by weird bugs that were specific to this alternate, niche portion of Racket.

Rewriting a Script

I prefer to learn by doing, so I decided to select a shell script that I use frequently and attempt a rewrite of it in Racket. My friend Andrew and I have a Bash script we creatively named provisioning-tool. We use it for updating the software we run on our rented Vultr VPS. I also use it for updating my home server in my basement.

provisioning-tool repository

The rewritten one I named provision.rkt:

provision.rkt repository

The original shell script is a little over 250 lines of Bash, although this includes comments, lots of whitespace, and a block of help text. The rewritten Racket script ended up around 110 lines of very dense Racket, with almost no comments or empty lines.

The purpose of the script is to read in a list of install/update targets and iterate over them. For each one, it fetches the source (either a local folder or a branch on a Git repository) and executes a predetermined binary at a specific path within it (usually another shell script).

The original script is a mix of file-parsing, filesystem manipulation, and executing external binaries (Git and the binaries within the targets). This makes it a good test case, since it touches a lot of the functionality typical of a shell script.

Reading Entries

I first started with the file-ingestion process. The script reads a file called sources.list, which contains a series of entries line-by-line:

# List of sources for provisioning
# Type is either FOLDER, GIT, or WGET
# The respective method will be used to fetch the folder for provisioning.
# Lines starting with # or ; are ignored

# <name> <type> <source> <branch>
provisioning-tool GIT https://gitlab.thorandrew.com/thorandrew-admin/provisioning-tool.git master

In the Bash version, the list is first reduced to the entry names (the first field of each line), skipping blank lines and lines that start with a comment character:

ITEMS=$(cut -d ' ' -f 1 sources.list | grep -v "^[#|;]" | grep -v "^$")

And then is iterated over:

for line in $ITEMS; do
    echo ""
    do_single "$line" "$2"
    echo ""
done

The “parsing” happens as each line is encountered:

# Parse out the elements of the item
COMPO_PARAM=($(grep "^$1" "$SOURCE"))
COMPO_ITEM="${COMPO_PARAM[0]}"
COMPO_TYPE="${COMPO_PARAM[1]}"
COMPO_SOURCE="${COMPO_PARAM[2]}"

This works well enough. It is subject to the usual shell concerns about quoting, weird characters, and edge cases, but from what I can tell it should work in most scenarios.

Since Racket is a Lisp, the most natural way to represent the entries is as a list of lists, which works really well here. It reminds me of the “Unix pipeline” style of doing things, but maybe a bit less pleasing to read (that’s more of a style opinion than anything):

(define (issource? line)
  (and (not (string-prefix? line "#"))
       (non-empty-string? line)))

(define (getSources path)
  (map string-split
       (filter issource? (string-split (port->string (open-input-file path)) "\n"))))

Note: I forgot to add the ; as a supported comment character.
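A possible way to add the missing ; support is sketched below. This is not the code from provision.rkt: the names comment-or-blank?, parse-sources, and get-sources are hypothetical, and trimming each line before checking its prefix is an added assumption. It also reads the file with file->lines, which closes the file once it has been read.

```racket
#lang racket/base
(require racket/file     ; file->lines
         racket/string)  ; string-trim, string-prefix?, string-split

;; True for blank lines and for lines whose first non-space character is # or ;
(define (comment-or-blank? line)
  (define trimmed (string-trim line))
  (or (not (non-empty-string? trimmed))
      (string-prefix? trimmed "#")
      (string-prefix? trimmed ";")))

;; Turn a list of raw lines into a list of whitespace-split entries.
(define (parse-sources lines)
  (for/list ([line (in-list lines)]
             #:unless (comment-or-blank? line))
    (string-split line)))

;; Read and parse a sources.list file at the given path.
(define (get-sources path)
  (parse-sources (file->lines path)))
```

Keeping parse-sources separate from the file read makes it easy to try out in the REPL with a hand-written list of lines.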

The result is very concise, and pulls all of the magic to a single location rather than it being interleaved with the process of executing the targets/tasks/entries. Since I have the list-of-lists at my disposal, I’m able to process the file here and execute it elsewhere. A language like Bash doesn’t really have a way to organize the data so that it can be acted upon elsewhere - it generally has to get split at its point of usage. Bash does have arrays, but as far as I know they cannot be nested, so emulating arrays-of-arrays would be much clunkier and likely more error prone.

Since Racket has an interactive REPL, it also made it easy to prototype and “build up” my processing pipeline in a way that was also similar to using a shell language. Racket’s string manipulation tools were natural to use and work with, so it went together well.

Executing Entries

Each entry in the sources.list is represented by a list of either three or four parameters: the name, type, location, and (sometimes) the branch name.
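One way to handle the three-or-four-field shape is pattern matching. The sketch below is an illustration only: normalize-source is a hypothetical name, and filling in an empty string for a missing branch is an assumption that mirrors how the rewritten script treats it.

```racket
#lang racket/base
(require racket/match)

;; Return a four-element list (name type location branch),
;; using "" as the branch when the entry has only three fields.
(define (normalize-source source)
  (match source
    [(list name type location)        (list name type location "")]
    [(list name type location branch) (list name type location branch)]
    [_ (error 'normalize-source "malformed entry: ~e" source)]))
```

A malformed line then fails loudly with an exception instead of being silently misread.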

In the shell script, a basic if...elif chain switches between the different source types:

if [[ "$COMPO_TYPE" == "GIT" ]] ; then

    # If this is type GIT, figure out the branch:
    COMPO_BRANCH="${COMPO_PARAM[3]}"

    # Check if it exists, so we can pull
    # Otherwise clone
    if [[ -d "$COMPO_ITEM" ]] ; then
        cd "$TMPPATH/$COMPO_ITEM" || return
        git checkout .
        git fetch
        if [[ "$COMPO_BRANCH" != "" ]]; then
            git checkout "$COMPO_BRANCH"
        fi
        git pull
        cd "$TMPPATH" || return
    else
        git clone --branch="$COMPO_BRANCH" "$COMPO_SOURCE" "$COMPO_ITEM"
    fi

elif [[ "$COMPO_TYPE" == "WGET" ]] ; then
    # TODO
    echo "Not implemented"
    return

elif [[ "$COMPO_TYPE" == "FOLDER" ]] ; then
    # If it already exists, delete it
    if [[ -d "$TMPPATH/$COMPO_ITEM" ]] ; then
        rm -rf "${TMPPATH:?}/$COMPO_ITEM"
    fi

    # Copy 
    cp -r "$SOURCEDIR/$COMPO_SOURCE" "$TMPPATH/$COMPO_ITEM"

elif [[ "$COMPO_TYPE" == "BACKUP" ]] ; then
    if [[ "$BACKUP" == false ]] ; then
        echo "Skipping backup with name $COMPO_ITEM."
        return
    fi

    # Copy 
    cp -r "$SOURCEDIR/$COMPO_SOURCE" "$TMPPATH/$COMPO_ITEM"
    
else
    echo "Invalid type."
    help
    return
fi

# Run
"$TMPPATH/$COMPO_ITEM"/.provision/provision_client "$2"

The logic of which type of target is being processed is intermixed with the execution of the targets themselves. They could have probably been broken out into separate functions though, so this isn’t particularly the fault of Bash.

In Racket, I did break them out into separate functions:

(define (announce name action)
  (printf "\n[ ~a ~a ]\n" action name))

(define (execution-exception-handler exn)
  (printf "Failed to execute target: ~e\n" exn)
  #f)

(define (executeSource source)
  (let ([name (first source)]
        [type (second source)]
        [location (third source)]
        [branch (if (equal? (length source) 4) (fourth source) "")])
       (list name
             (cond [( equal? type "GIT" ) (with-handlers ([exn:fail? execution-exception-handler])
                                                         (announce name action)
                                                         (executeGitSrc name location branch action))]
                   [( equal? type "FOLDER" ) (with-handlers ([exn:fail? execution-exception-handler])
                                                            (announce name action)
                                                            (executeFolderSrc name location action))]
                   [( equal? type "BACKUP" ) (with-handlers ([exn:fail? execution-exception-handler])
                                                            (announce name action)
                                                            (executeBackupSrc name location action))]
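                   [( equal? type "WGET" ) (printf "WGET Not implemented.\n") #f ]
                   ; Without an else clause, cond returns void for an unknown type, which counts as true
                   [else (printf "Invalid type: ~a\n" type) #f]))))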

Here, a source is examined and the appropriate function is called. The result is a list pairing the source’s name with true or false, depending on whether the called function succeeded or failed. This is a lot more verbose than the Bash version (and it doesn’t even include the execution itself!). But unlike the Bash version, it also includes error handling! Each execution function is called inside an exception handler that reports a “false” (failure) if an exception is thrown. This is a tremendous improvement over the Bash version, which has virtually no error checking.
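The handler pattern can be seen in isolation in the following sketch. It is not the code from provision.rkt: run-safely, executors, and execute-entry are hypothetical names, and the two executors are stand-ins that simply succeed or raise an error. Looking the type up in a table is one possible way to avoid repeating the with-handlers form for each type.

```racket
#lang racket/base

;; Hypothetical helper: call thunk and return its result,
;; or print the error message and return #f if it raises exn:fail.
(define (run-safely thunk)
  (with-handlers ([exn:fail? (lambda (exn)
                               (printf "Failed to execute target: ~a\n" (exn-message exn))
                               #f)])
    (thunk)))

;; Stand-in executors keyed by source type (illustrative only).
(define executors
  (hash "GIT"    (lambda (name) #t)                        ; pretends to succeed
        "FOLDER" (lambda (name) (error "copy failed"))))   ; pretends to fail

;; Pair the entry name with #t or #f; unknown types count as failures.
(define (execute-entry name type)
  (define executor (hash-ref executors type #f))
  (list name
        (if executor
            (run-safely (lambda () (executor name)))
            (begin (printf "Invalid type: ~a\n" type) #f))))
```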

When running the execution portion for the various sources, I struggled at first to understand how to properly handle an error condition. I initially thought what I wanted was an early-return mechanism. Once I realized exceptions were the right way to do this, things fell into place: most Racket components will throw an exception if something is wrong (like for filesystem manipulation), and for reporting e.g., Git command failures, I can just use error to raise an exception with my desired error message:

(define (executeGitSrc name location branch action)
  (let* ([source-path (build-path root-source-path name)]
         [tmp-path (build-path root-source-path (string->path "tmp"))]
         [dest-path (build-path tmp-path name)]
         [dest-executable (build-path dest-path (string->path ".provision/provision_client"))])
    (unless (directory-exists? tmp-path)
            (make-directory tmp-path))
    (if (directory-exists? dest-path)
        (begin (current-directory dest-path)
               (unless (system* (find-executable-path "git") "checkout" ".")
                       (error "Git checkout failed."))
               (unless (system* (find-executable-path "git") "fetch")
                       (error "Git fetch failed.")))
        (begin (current-directory tmp-path)
               (unless (system* (find-executable-path "git") "clone" location)
                       (error "Git clone failed."))))
    (current-directory dest-path)
    (unless (equal? branch "")
       (unless (system* (find-executable-path "git") "checkout" branch)
               (error "Git checkout failed.")))
    (unless (system* (find-executable-path "git") "pull")
            (error "Git pull failed."))
    (current-directory tmp-path) ; The original implementation would switch to the tmp dir before executing
    (system* dest-executable action)))

It is still a lot denser than the equivalent Bash, but I have far greater confidence that the Racket version will avoid strange pathing issues. More importantly, it will correctly handle failures and errors along the way.
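The repeated "run a command, raise an error if it fails" steps could be folded into a small helper. The sketch below assumes a hypothetical function named run! that is not part of provision.rkt; it also checks that the executable was found before trying to run it.

```racket
#lang racket/base
(require racket/system)  ; system*

;; Hypothetical helper: run a program found on PATH with the given arguments.
;; Raises exn:fail if the program is missing or exits with a nonzero status.
(define (run! program . args)
  (define exe (find-executable-path program))
  (unless exe
    (error 'run! "executable not found: ~a" program))
  (unless (apply system* exe args)
    (error 'run! "command failed: ~a ~a" program args))
  #t)
```

With such a helper, a step like the fetch would read (run! "git" "fetch"), and any failure would still surface as an exception for the surrounding handler to catch.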

Command-Line Arguments

I dislike argument parsing in Bash/shell. Grabbing the first couple of arguments isn’t bad, but flag parsing never feels good. In the original script, I used the following, probably copied from Stack Overflow at some point:

# Parse in options:
while test $# -gt 0; do
    case "$1" in
        -h|--help)
            help
            exit 0
            ;;
        -s|--source)
            shift
            if test $# -gt 0; then
                export SOURCEDIR=$1
            else
                echo "no source specified"
                exit 1
            fi
            shift
            ;;
        -b|--backup)
            shift
            BACKUP=true
            ;;
        *)
            ACTION="$1"
            COMPONENT="$2"
            break
            ;;
    esac
done

The Racket equivalent is very clean and compact. The flag parsing is provided, so the flags and arguments just need to be specified. The program will print out a useful message if the arguments it receives don’t match what was specified:

(define input-path (make-parameter ""))
(define do-backup (make-parameter #f))

(match-define (list action target)
  (command-line
    #:program "provision.rkt"
    #:once-each
    [("-s" "--source") path "Path to a directory containing a 'sources.list'." (input-path path)]
    [("-b" "--backup") "Enable processing of BACKUP-type targets from 'sources.list'." (do-backup #t)]
    #:args (action [ target "all targets" ]) ; The sources.list is whitespace-delimited, so this name has no risk of collisions
    (list action target)))

(define root-source-path (path->complete-path (simplify-path (expand-user-path (string->path (input-path))))))

As an additional bonus, the help output is then provided for free:

$ ./provision.rkt --help
usage: provision.rkt [ <option> ... ] <action> [<target>]

<option> is one of

  -s <path>, --source <path>
     Path to a directory containing a 'sources.list'.
  -b, --backup
     Enable processing of BACKUP-type targets from 'sources.list'.
  --help, -h
     Show this help
  --
     Do not treat any remaining argument as a switch (at this level)

 Multiple single-letter switches can be combined after
 one `-`. For example, `-h-` is the same as `-h --`.

This is another huge improvement over writing it in plain shell, since before I had to manually maintain and print out a big wall of text with all the usage information.
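The same parser can be tried out without touching the real command line by passing the arguments explicitly through #:argv. The sketch below is an illustration under a few assumptions: parse-args is a hypothetical wrapper, the results are collected in local variables instead of parameters, and the optional target is taken from a rest argument.

```racket
#lang racket/base
(require racket/cmdline)

;; Parse a vector of argument strings and return
;; (list source-dir backup? action target).
(define (parse-args argv)
  (define source-dir ".")
  (define backup? #f)
  (command-line
   #:program "provision.rkt"
   #:argv argv
   #:once-each
   [("-s" "--source") path "Path to a directory containing a 'sources.list'."
                      (set! source-dir path)]
   [("-b" "--backup") "Enable processing of BACKUP-type targets."
                      (set! backup? #t)]
   #:args (action . targets)
   (list source-dir
         backup?
         action
         (if (null? targets) "all targets" (car targets)))))
```

Calling (parse-args (vector "-b" "install")) in the REPL shows how a given command line would be interpreted before anything is executed.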

Glue

The final piece of the Racket version is responsible for grabbing all the sources and executing whichever ones are desired, then printing out a summary of the successes and failures:

(define (print-result result)
  (printf "~a: ~a\n" (first result) (if (second result) "Success" "Failure")))

(for-each print-result
  (begin0
    (map executeSource
         (filter (lambda (source) (or (equal? target (first source))
                                      (equal? target "all targets")))
                 (getSources (build-path root-source-path (string->path "sources.list")))))
    (printf "\n[ Provisioning Results ]\n\n")))

This portion is quite compact and very slick. The Bash equivalent was already shown above, since it was more or less intermixed with the rest of the program logic.

Thoughts

Overall, it is safe to say the Racket version is much more robust. This is not a surprise, and would be true for almost any language, given how easily errors go unnoticed in Bash.

The Racket source is much more compact, and I think it does a better job moving related pieces of the code next to each other. It also removes a lot of state-dependent values.

This particular advantage is made possible by Racket having actual data structures (lists), and is encouraged by its functional style (although I probably didn’t adhere to that as strongly as I could have) and local variables (Bash does actually have local variables, but POSIX shell does not, so I don’t usually use them).

One area I had difficulty with was knowing when to return multiple values versus a list of values. In some ways, these felt redundant, and I kept having to change the signatures of my functions to properly accept one or the other as I tried to figure out which was the right way. I think with more practice I would get stuck on this problem less.
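The two styles can be compared side by side in a small sketch. The function names here are hypothetical and chosen only for the comparison: multiple values must be received with a form such as define-values, while a list is an ordinary value that can be passed straight to map or filter.

```racket
#lang racket/base

;; Returns two separate values.
(define (split-entry/values entry)
  (values (car entry) (cadr entry)))

;; Returns one value: a two-element list.
(define (split-entry/list entry)
  (list (car entry) (cadr entry)))

;; Multiple values have to be bound by a form that expects them.
(define-values (name type) (split-entry/values '("web" "GIT")))

;; A list is just data and can be stored or passed along as-is.
(define as-list (split-entry/list '("web" "GIT")))

;; call-with-values converts from one style to the other.
(define from-values
  (call-with-values (lambda () (split-entry/values '("web" "GIT"))) list))
```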

Filesystem and path manipulation was very easy and straightforward in Racket. I felt much more reassured that I wasn’t overlooking edge cases around path names, and knowing an exception would be thrown if an error was encountered makes it much easier to make sure the script fails gracefully if an error does occur.

I was fortunate that in this program I didn’t have to use the output of any of the system binaries I called, just assess whether they had a nonzero exit code. For this reason, Racket worked well. It would have been more involved had I needed to capture stdout from Git and parse it.
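For reference, capturing output might look roughly like the sketch below. It was not needed in provision.rkt, and capture-output is a hypothetical helper; it redirects the command's standard output into a string with with-output-to-string and still raises an error on a nonzero exit code.

```racket
#lang racket/base
(require racket/port     ; with-output-to-string
         racket/string   ; string-trim
         racket/system)  ; system*

;; Hypothetical helper: run a program and return its trimmed standard output.
;; Raises exn:fail if the program is missing or exits with a nonzero status.
(define (capture-output program . args)
  (define exe (find-executable-path program))
  (unless exe
    (error 'capture-output "executable not found: ~a" program))
  (define ok? #f)
  (define out
    (with-output-to-string
      (lambda () (set! ok? (apply system* exe args)))))
  (unless ok?
    (error 'capture-output "command failed: ~a" program))
  (string-trim out))
```

Inside a Git checkout, (capture-output "git" "rev-parse" "--abbrev-ref" "HEAD") would be one way to read the current branch name.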

The argument and flag parsing is a huge win, since it lowers the barrier for adding flags to a script in the first place (something I usually avoid in Bash/shell unless a script really needs it).

My overall experience with Racket was very positive. It was fun and interesting to use, and I liked how chaining functions reminded me of using Unix pipes to pass data and perform manipulations. I will probably consider Racket for personal scripts in the future when I need something more robust than Bash/shell, but less verbose than Go or C.