Monitor Isilon NFS thread counts

Here at [my workplace] we recently noticed that some of the nodes in our Isilon storage cluster were reaching their NFS thread limit. I won’t go into why that’s a bad thing or the reasons it was occurring, but we quickly realized it was something we should be monitoring closely. To see the current NFS thread counts on all nodes in your Isilon cluster, you use the following command:

isi_for_array -s sysctl vfs.nfsrv.rpc.threads_alloc_current

This returns something like the following:

dm11-1: vfs.nfsrv.rpc.threads_alloc_current: 16
dm11-2: vfs.nfsrv.rpc.threads_alloc_current: 16
dm11-3: vfs.nfsrv.rpc.threads_alloc_current: 16
dm11-4: vfs.nfsrv.rpc.threads_alloc_current: 16
dm11-5: vfs.nfsrv.rpc.threads_alloc_current: 16
dm11-6: vfs.nfsrv.rpc.threads_alloc_current: 16
dm11-7: vfs.nfsrv.rpc.threads_alloc_current: 16
dm11-8: vfs.nfsrv.rpc.threads_alloc_current: 16
dm11-9: vfs.nfsrv.rpc.threads_alloc_current: 16

The first column gives you the node name and the last column gives you the current thread count. With few connections, the numbers on the left will be low. Our nodes are set with a 16 thread minimum. As more clients connect to a given node, more threads are spawned as needed to service them.

Running this command manually every once in a while is obviously less than ideal. Since Isilon nodes run an OS based on FreeBSD and python is available on them, I wrote a python script called ‘‘ to monitor the thread counts for me. The script lives in /root on one of the nodes in the cluster and runs every 5 minutes via a cron entry in /etc/local/crontab.local on the same node.

When the script runs, it checks to see if any of the nodes is at or exceeding our warning threshold (70% of the max thread count of 256). The script sends an alert email (via smtp/sendmail) if at least one node has hit the warning threshold. Nodes beyond the threshold are identified at the top of the message in a line that starts “WARN” or “CRIT” followed by the node’s name and thread count. The email alert also includes a complete copy of the thread count data at the bottom so you can check to see if is an isolated spike or if the entire cluster is undergoing a heavy load.

You can find on my github page.


One thought on “Monitor Isilon NFS thread counts

Comments are closed.