June 13, 2020

j, a directory navigation tool

In this post, I'm going to discuss a simple command line tool I first wrote a couple of years ago. The tool is called j (short for jump), and it falls into the category of command line directory navigation tools. The code is available here. At this time only zsh is supported; I use zsh and I like zsh, so I haven't had much reason to support bash or other shells.

What it does

A brief note on terminology: given a directory like /path/to/foo, I'll refer to the entire string as the directory's path, and the last component of the path (in this case, foo) as the directory's name.

I won't go into detailed comparisons with other tools, but what I like about j is that it's simple. I wrote j with predictability and determinism as top priorities. I didn't want fuzzy matching or directory ranking based on clever metrics. Generally, I already know where I want to go—j just helps me get there faster.

Usage of j is quite straightforward. On the command line, run

j foo

to change to a previously visited directory named foo (continuing our example from above, this would take me to /path/to/foo if I've been there before). This is convenient because I can usually remember individual directory names, but not full paths (or at least I don't want to type them out). And j supports tab completion, so there's no need to even type out the directory name itself.

What if I've been to multiple directories named foo? Then j opens a command line interface allowing the user to select the desired directory from a list of matches, with the most recently visited listed first. Any directories that no longer exist are automatically pruned. That's about it: simple but helpful.
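To make the lookup rule concrete, here is a minimal sketch in Python. It is not j's actual code: the function name find_matches and the idea of passing the history as a list ordered oldest to newest are assumptions made for illustration. It shows the three behaviors described above: exact name matching, most recent first, and pruning of directories that no longer exist.

```python
import os
import tempfile


def find_matches(history, name):
    """Return existing directories whose last path component equals `name`.

    `history` is a list of paths ordered oldest to newest (an assumption
    about how visits are recorded). The result is newest first.
    """
    matches = []
    for path in reversed(history):
        # Exact name match only: no fuzzy matching, no ranking heuristics.
        if os.path.basename(path.rstrip("/")) != name:
            continue
        # Prune directories that no longer exist, and skip repeats.
        if os.path.isdir(path) and path not in matches:
            matches.append(path)
    return matches


def demo():
    """Build a throwaway directory tree and look up the name 'foo'."""
    with tempfile.TemporaryDirectory() as root:
        first = os.path.join(root, "a", "foo")
        second = os.path.join(root, "b", "foo")
        other = os.path.join(root, "a", "bar")
        gone = os.path.join(root, "c", "foo")  # recorded but never created
        for path in (first, second, other):
            os.makedirs(path)
        history = [first, other, gone, second]
        found = find_matches(history, "foo")
        return [os.path.relpath(p, root) for p in found]


if __name__ == "__main__":
    print(demo())
```

With a single match the tool can change directory right away; with several, the list returned here is what would be shown for selection.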

Now I'm going to spend the remainder of this post describing a bit about the development process and some things I learned along the way.

Version 1.0

As previously mentioned, I developed the original version of j a couple of years ago. It was written in Python, and had almost the same functionality it has now. At first I was just storing data using Python's native pickle format, but I later decided it was time to try some alternative language-agnostic storage formats. I tried two: JSON and msgpack.

My main concern in choosing a storage format was read and write speed. Since j reads and writes data each time the working directory is changed, long read/write times could significantly slow down directory navigation with cd.

However, it turns out that my concern with read/write speed was misguided. The main performance bottleneck was actually importing the modules: the differences in read/write speed between pickle, JSON, and msgpack were minuscule compared to the time it took to import any one of them into Python. And so regardless of which of these storage formats I chose, the performance was always limited by the import.
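One way to check this kind of claim is to time the import in a fresh interpreter and compare it with an in-process round trip through the same module. The sketch below is an illustration, not the measurement I originally made: the helper names and the sample data are made up, and msgpack is left out because it is a third-party package. The numbers will vary by machine and Python version.

```python
import subprocess
import sys
import time


def run_seconds(code, runs=5):
    """Average wall-clock time to run `code` in a fresh Python interpreter."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run([sys.executable, "-c", code], check=True)
    return (time.perf_counter() - start) / runs


def roundtrip_seconds(module_name, data, runs=1000):
    """Average in-process time to serialize and deserialize `data`."""
    module = __import__(module_name)
    start = time.perf_counter()
    for _ in range(runs):
        module.loads(module.dumps(data))
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    sample = ["/path/to/foo", "/path/to/bar"]  # made-up directory list
    baseline = run_seconds("pass")  # interpreter startup with no import
    for name in ("pickle", "json"):
        with_import = run_seconds("import " + name)
        print(name,
              "import overhead: %.2f ms" % ((with_import - baseline) * 1e3),
              "round trip: %.4f ms" % (roundtrip_seconds(name, sample) * 1e3))
```

Subtracting the baseline isolates the cost of the import from the cost of starting Python at all, which matters for the next part of the story.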

Out with fancy storage formats, then. I switched to storing data in simple plain text files (something I should have just started with). The average time to add a directory was now 65ms, compared to 75ms with the original pickle storage format. These times were measured using zsh's built-in time keyword, averaged over 100 runs:

time (repeat 100; CMD)

where CMD was replaced by the relevant j command to add a directory. The added directory was the same for each run and all versions of j that were tested.
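For a sense of how little code plain text storage needs, here is a minimal sketch. The layout (one path per line, most recent last) and the file name j_data.txt are assumptions for illustration; j's real data files may be organized differently.

```python
import os
import tempfile


def read_directories(data_file):
    """Return recorded paths, oldest first; a missing file means no history."""
    if not os.path.exists(data_file):
        return []
    with open(data_file) as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


def add_directory(data_file, path):
    """Record a visit to `path`, keeping one path per line, newest last."""
    entries = read_directories(data_file)
    # Drop any earlier record of the same path so it moves to the end.
    entries = [entry for entry in entries if entry != path]
    entries.append(path)
    with open(data_file, "w") as handle:
        handle.write("".join(entry + "\n" for entry in entries))


def demo():
    """Record three visits, one of them a repeat, and read the file back."""
    with tempfile.TemporaryDirectory() as root:
        data_file = os.path.join(root, "j_data.txt")  # hypothetical file name
        for path in ("/path/to/foo", "/path/to/bar", "/path/to/foo"):
            add_directory(data_file, path)
        return read_directories(data_file)


if __name__ == "__main__":
    print(demo())
```

A line-per-path format does break on paths that contain newlines, which this sketch ignores. In exchange, the data stays readable and editable with any text tool.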

The change to plain text storage did improve the speed of adding directories, but only slightly. After sprinkling some calls to time.time() around the code and printing the results, I was quite confused as to why the script took such a long time to run. The code itself appeared to execute extremely quickly, taking less than 1ms to add a directory. But calling the script took 65ms according to zsh's time!

Where did this discrepancy come from? It turns out that invoking the Python interpreter itself has some overhead, which, in this case, is the vast majority of the total execution time. This realization really soured me on continuing to use Python for the project. In addition to the slow speed, I was also unhappy with the complexity of the code. For instance, the core Python script needed to be wrapped in a zsh script to handle changing the working directory. And so I thought, why not just write the whole thing in zsh?
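The gap is easy to reproduce. The sketch below runs a tiny stand-in script in a fresh interpreter, letting the script time its own body with time.time() while the caller times the whole invocation. The script contents and the measure helper are made up for illustration and are not j's code.

```python
import subprocess
import sys
import time

# Stand-in for the real work: a trivial script that times its own body.
SCRIPT = """
import time
start = time.time()
entries = ["/path/to/foo"]
entries.append("/path/to/bar")
print(time.time() - start)
"""


def measure(runs=10):
    """Return (average inner seconds, average outer seconds) over `runs`."""
    inner_total = 0.0
    outer_total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        result = subprocess.run([sys.executable, "-c", SCRIPT],
                                capture_output=True, text=True, check=True)
        outer_total += time.perf_counter() - start
        # The script prints the time it measured for its own body.
        inner_total += float(result.stdout)
    return inner_total / runs, outer_total / runs


if __name__ == "__main__":
    inner, outer = measure()
    print("inside the script: %.4f ms" % (inner * 1e3))
    print("whole invocation:  %.4f ms" % (outer * 1e3))
```

The outer figure includes process creation and interpreter startup, neither of which the script can see from the inside. The exact numbers depend on the machine.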

Version 2.0

This brings us to the second (and current) version of j, written in zsh. I certainly wouldn't advocate writing anything large or complex in a shell language, but in this case I think the result was simpler and rather elegant.

This version is about an order of magnitude faster than any of the earlier Python versions (about 6ms to add a directory), so shell navigation is now much snappier. Shell languages are also great at quickly processing plain text files like those used for j's data. Python is still used for the interface to list multiple directories with the same name, but here the time spent invoking Python is minimal compared to the time spent waiting for user input.

I'm quite happy with the state of the code now, and j has become an integral part of my muscle memory and workflow. Go ahead and try it if it sounds like it might be useful for you, too.

As a final thought, I'll also mention the zsh profiler, zprof, which I only discovered recently while working on this project. It is activated by running

zmodload zsh/zprof

Then, one just has to run zprof to generate a profile report of all shell functions that have been run since loading the module. I didn't find this terribly helpful for profiling an individual function, but it is great for discovering what your shell is spending time doing.