Here at [my day job], the scientists I support can generate many tens, hundreds, and often thousands of gigabytes of data. We provide them with a few different storage options, each with different performance, redundancy, and (therefore) cost characteristics. Quite a few of the labs here keep data spread across multiple storage tiers. To make it simpler for them to maintain and access their data, they often put symlinks to, for example, archived data in their primary data directories.
A number of issues can arise from this. One is that the scientists often forget where their data actually resides. This is a major problem if they plan to analyze that data on our compute cluster. One of the trade-offs of storing data on our archive tier, besides being slower than our primary tier, is that only a limited set of compute cluster nodes can access it. That tier is not robust enough to handle heavy concurrent traffic, so we only allow a small subset of cluster nodes to access it. Unless one of these nodes is specifically requested when scheduling a cluster job that reads archived files, that job will fail.
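To make the failure mode concrete: a job needs an archive-capable node only if one of its inputs, once every symlink is followed, lands on the archive tier. The Python sketch below checks exactly that. It assumes the archive tier is mounted at a hypothetical `/archive`, and the names `ARCHIVE_ROOTS`, `on_archive`, and `archived_inputs` are illustrative only, not part of any existing tool.

```python
import os
from typing import Iterable, List

# Hypothetical mount point(s) of the archive tier; adjust for your site.
ARCHIVE_ROOTS = ("/archive",)


def on_archive(path: str, roots: Iterable[str] = ARCHIVE_ROOTS) -> bool:
    """True if `path`, with every symlink followed, lies under one of `roots`."""
    resolved = os.path.realpath(path)
    for root in roots:
        root = os.path.realpath(root)
        # commonpath compares whole components, so /archive2 does not match /archive
        if os.path.commonpath([resolved, root]) == root:
            return True
    return False


def archived_inputs(paths: Iterable[str], roots: Iterable[str] = ARCHIVE_ROOTS) -> List[str]:
    """Return the inputs that would require an archive-capable cluster node."""
    roots = tuple(roots)  # allow a one-shot iterable to be reused for every path
    return [p for p in paths if on_archive(p, roots)]
```

Running `archived_inputs` over a job's input list before submission would flag the files that need one of the archive-capable nodes; an empty list means any node should do.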
Of course, when a job fails to run, we’re usually asked to diagnose the issue. The most common culprit is that some or all of the files are on the archive tier. My co-worker, frustrated at diagnosing this same issue over and over, opined,
Wouldn’t it be great if we had a tool that would convert paths into something that made any symlinks in the path obvious?
I took that as a challenge. Thanks to Python, I had a simple command-line tool in fairly short order that does just that, plus a little more. I call it realpath. Here’s the usage screen and some examples of how it works.
Usage: realpath [options] path
-h, --help show this help message and exit
-f, --full show full symlink paths
-a, --actual show the actual, non-interleaved path
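The usage screen above maps onto a small option parser. The sketch below uses the standard library's `optparse` module, whose help output uses the same capitalized `Usage:` layout; whether the original tool is built on it is an assumption, and `build_parser` and `parse_args` are hypothetical names.

```python
import optparse


def build_parser() -> optparse.OptionParser:
    """Parser mirroring realpath's usage screen (a sketch, not the tool's code)."""
    # optparse adds -h/--help ("show this help message and exit") automatically.
    parser = optparse.OptionParser(prog="realpath", usage="%prog [options] path")
    parser.add_option("-f", "--full", action="store_true", default=False,
                      help="show full symlink paths")
    parser.add_option("-a", "--actual", action="store_true", default=False,
                      help="show the actual, non-interleaved path")
    return parser


def parse_args(argv):
    """Return (options, path); exits with status 2 unless exactly one path is given."""
    parser = build_parser()
    options, args = parser.parse_args(argv)
    if len(args) != 1:
        parser.error("expected exactly one path")  # prints usage, raises SystemExit(2)
    return options, args[0]
```

For instance, `parse_args(["--full", "/tmp/pathtest/stuff"])` yields options with `full` set and `actual` unset, plus the path to inspect.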
# realpath /tmp/pathtest/stuff
# realpath --full /tmp/pathtest/stuff
# realpath --actual /tmp/pathtest/stuff
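The three invocations above differ only in how symlinks are reported. Here is a minimal sketch of the underlying walk: it visits the path one component at a time, and whenever a component is a symlink it annotates it and continues inside the link's target. The bracketed `[name -> target]` format, the reading of `--full` as "show each target fully resolved", and the names `interleaved_path` and `actual_path` are all assumptions made for illustration; the real realpath may format its output differently.

```python
import os


def interleaved_path(path: str, full: bool = False) -> str:
    """Annotate each symlinked component of `path` as [name -> target].

    Hypothetical output format. full=False shows link targets as stored
    (possibly relative); full=True shows each target as an absolute,
    fully resolved path.
    """
    if not os.path.isabs(path):
        path = os.path.join(os.getcwd(), path)
    current = os.sep   # physical location reached so far, symlinks resolved
    parts = []         # display pieces, one per path component
    for name in path.split(os.sep):
        if name in ("", "."):
            continue   # empty pieces come from leading, trailing, or doubled slashes
        if name == "..":
            current = os.path.dirname(current)  # physical parent, as the kernel does it
            parts.append(name)
            continue
        candidate = os.path.join(current, name)
        if os.path.islink(candidate):
            resolved = os.path.realpath(candidate)
            target = resolved if full else os.readlink(candidate)
            parts.append(f"[{name} -> {target}]")
            current = resolved  # keep walking inside the link's target
        else:
            parts.append(name)
            current = candidate
    return os.sep + os.sep.join(parts)


def actual_path(path: str) -> str:
    """The --actual view: the fully resolved, non-interleaved path."""
    return os.path.realpath(path)
```

Resolving `..` against the physical location rather than normalizing it textually matters here: after a symlink, `..` refers to the parent of the link's target, not of the link itself.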
Pretty cool, huh? You can find realpath on my GitHub page.