KerbMinder updated to v1.3

km_logo_256x

This release brings some significant changes (besides the slick new logo) thanks to a new collaborator, Francois Levaux. All of the original functionality is there, but he made the code much better (you should care about such things!) while adding a killer new feature.

With this release, KerbMinder no longer requires the Mac to be bound to Active Directory. On an unbound Mac, KerbMinder will prompt users for their username and domain information and use it to retrieve a kerberos ticket from the domain.

You can download v1.3 here.

Advertisements

Monitor Isilon NFS thread counts

Here at [my workplace] we recently noticed that some of the nodes in our Isilon storage cluster were reaching their NFS thread limit. I won’t go into why that’s a bad thing or the reasons it was occurring, but we quickly realized it was something we should be monitoring closely. To see the current NFS thread counts on all nodes in your Isilon cluster, you use the following command:

isi_for_array -s sysctl vfs.nfsrv.rpc.threads_alloc_current

This returns something like the following:

dm11-1: vfs.nfsrv.rpc.threads_alloc_current: 16
dm11-2: vfs.nfsrv.rpc.threads_alloc_current: 16
dm11-3: vfs.nfsrv.rpc.threads_alloc_current: 16
dm11-4: vfs.nfsrv.rpc.threads_alloc_current: 16
dm11-5: vfs.nfsrv.rpc.threads_alloc_current: 16
dm11-6: vfs.nfsrv.rpc.threads_alloc_current: 16
dm11-7: vfs.nfsrv.rpc.threads_alloc_current: 16
dm11-8: vfs.nfsrv.rpc.threads_alloc_current: 16
dm11-9: vfs.nfsrv.rpc.threads_alloc_current: 16
...etc...

The first column gives you the node name and the last column gives you the current thread count. With few connections, the numbers on the left will be low. Our nodes are set with a 16 thread minimum. As more clients connect to a given node, more threads are spawned as needed to service them.

Running this command manually every once in a while is obviously less than ideal. Since Isilon nodes run an OS based on FreeBSD and python is available on them, I wrote a python script called ‘nfs_watcher.py‘ to monitor the thread counts for me. The script lives in /root on one of the nodes in the cluster and runs every 5 minutes via a cron entry in /etc/local/crontab.local on the same node.

When the script runs, it checks to see if any of the nodes is at or exceeding our warning threshold (70% of the max thread count of 256). The script sends an alert email (via smtp/sendmail) if at least one node has hit the warning threshold. Nodes beyond the threshold are identified at the top of the message in a line that starts “WARN” or “CRIT” followed by the node’s name and thread count. The email alert also includes a complete copy of the thread count data at the bottom so you can check to see if is an isolated spike or if the entire cluster is undergoing a heavy load.

You can find nfs_watcher.py on my github page.

Revealing symlinks in arbitrary paths

Here at [my day job], the scientists I support can generate many tens, hundreds, and often thousands of gigabytes of data. We provide them with a few different storage options, each with different performance, redundancy, and (therefore) cost characteristics. Quite a few of the labs here keep data spread across multiple storage tiers. To make it simpler for them to maintain and access their data, they often put symlinks to, for example, archived data in their primary data directories.

A number of issues can arise from this. One is that the scientists will often forget where their data actually resides. This is a major issue if they are planning on using our compute cluster to analyze this data. One of the trade-offs of storing data on our archive tier, besides being slower than our primary tier, is that only a limited set of computer cluster nodes can access the archive tier. That tier is not robust enough to handle a lot of concurrent traffic, so we only allow a small subset of cluster nodes to access it. Unless these nodes are specifically requested when scheduling a cluster job that involves archived files, that job will fail.

Of course, when a job fails to run, we’re usually asked to diagnose the issue. The most common culprit is that these some or all of the files are on the archive storage. My co-worker was getting frustrated constantly diagnosing this issue and opined,

Wouldn’t it be great if we had a tool that would convert paths into something that made any symlinks in the path obvious?

I took that as a challenge. Thanks to python, I came up with a simple command line tool in fairly short order that does just that, plus a little more. I call it realpath. Here’s the usage screen and some examples of how it works.

Usage: realpath [options] path

Options:
-h, --help show this help message and exit
-f, --full show full symlink paths
-a, --actual show the actual, non-interleaved path

Examples:

# realpath /tmp/pathtest/stuff
/tmp[private/tmp]/pathtest/stuff[../../../Users/admindude/Documents/stuff]

# realpath --full /tmp/pathtest/stuff
/tmp[/private/tmp]/pathtest/stuff[/Users/admindude/Documents/stuff]

# realpath --actual /tmp/pathtest/stuff
/Users/admindude/Documents/stuff

Pretty cool, huh? You can find realpath on my github page.