Monitor Isilon NFS thread counts

Here at [my workplace] we recently noticed that some of the nodes in our Isilon storage cluster were reaching their NFS thread limit. I won’t go into why that’s a bad thing or why it was happening, but we quickly realized it was something we should be monitoring closely. To see the current NFS thread count on every node in your Isilon cluster, run the following command:

isi_for_array -s sysctl vfs.nfsrv.rpc.threads_alloc_current

This returns something like the following:

dm11-1: vfs.nfsrv.rpc.threads_alloc_current: 16
dm11-2: vfs.nfsrv.rpc.threads_alloc_current: 16
dm11-3: vfs.nfsrv.rpc.threads_alloc_current: 16
dm11-4: vfs.nfsrv.rpc.threads_alloc_current: 16
dm11-5: vfs.nfsrv.rpc.threads_alloc_current: 16
dm11-6: vfs.nfsrv.rpc.threads_alloc_current: 16
dm11-7: vfs.nfsrv.rpc.threads_alloc_current: 16
dm11-8: vfs.nfsrv.rpc.threads_alloc_current: 16
dm11-9: vfs.nfsrv.rpc.threads_alloc_current: 16
...etc...

The first column gives you the node name and the last column gives you the current thread count. With few connections, the thread counts will be low; our nodes are set with a 16-thread minimum. As more clients connect to a given node, more threads are spawned as needed to service them.
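If you want to use those numbers in a script, the output is easy to parse. Here's a minimal Python 3 sketch that turns it into a node-to-count mapping. The function names are hypothetical, and it assumes every useful line follows the "node: sysctl: count" format shown above; anything else (such as a blank line) is skipped.

```python
import re
import subprocess

SYSCTL = "vfs.nfsrv.rpc.threads_alloc_current"
# One line per node, e.g. "dm11-1: vfs.nfsrv.rpc.threads_alloc_current: 16"
LINE_RE = re.compile(
    r"^(?P<node>[^:\s]+):\s+" + re.escape(SYSCTL) + r":\s+(?P<count>\d+)$"
)


def parse_thread_counts(output):
    """Map node name -> current NFS thread count; skip lines that don't match."""
    counts = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("node")] = int(match.group("count"))
    return counts


def read_thread_counts():
    """Run the isi_for_array command from above on a cluster node and parse it."""
    result = subprocess.run(
        ["isi_for_array", "-s", "sysctl", SYSCTL],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_thread_counts(result.stdout)
```

read_thread_counts() only works on a node where isi_for_array exists; parse_thread_counts() can be tried anywhere with a pasted sample of the output.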

Running this command manually every once in a while is obviously less than ideal. Since Isilon nodes run an OS based on FreeBSD and Python is available on them, I wrote a Python script called ‘nfs_watcher.py’ to monitor the thread counts for me. The script lives in /root on one of the nodes in the cluster and runs every 5 minutes via a cron entry in /etc/local/crontab.local on the same node.

When the script runs, it checks whether any node is at or above our warning threshold (70% of the maximum thread count of 256, or about 179 threads). If at least one node has hit the threshold, the script sends an alert email (via SMTP/sendmail). Nodes over the threshold are listed at the top of the message, each on a line that starts with “WARN” or “CRIT” followed by the node’s name and thread count. The email also includes a complete copy of the thread count data at the bottom, so you can check whether it is an isolated spike or the entire cluster is under heavy load.
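The check-and-alert logic can be sketched in a few functions. To be clear, this is not nfs_watcher.py; it's a hypothetical Python 3 illustration that assumes the counts are already in a dict mapping node names to thread counts. The 90% CRIT threshold is my assumption (only the 70% warning level is stated), and send_alert() assumes a local MTA accepting SMTP on localhost.

```python
import smtplib
from email.message import EmailMessage

MAX_THREADS = 256
WARN_FRACTION = 0.70  # warning threshold: 70% of the max thread count
CRIT_FRACTION = 0.90  # hypothetical CRIT level; not specified above


def classify(count, max_threads=MAX_THREADS):
    """Return "CRIT", "WARN", or None for a single node's thread count."""
    if count >= CRIT_FRACTION * max_threads:
        return "CRIT"
    if count >= WARN_FRACTION * max_threads:
        return "WARN"
    return None


def build_alert(counts, sender, recipient):
    """Build the alert email, or return None if no node hit the warning level."""
    flagged = []
    for node in sorted(counts):
        level = classify(counts[node])
        if level:
            flagged.append("%s %s %d" % (level, node, counts[node]))
    if not flagged:
        return None
    # Full data goes at the bottom so a single-node spike can be told apart
    # from cluster-wide load.
    full = ["%s: %d" % (node, counts[node]) for node in sorted(counts)]
    msg = EmailMessage()
    msg["Subject"] = "NFS thread count alert (%d node(s))" % len(flagged)
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(flagged + ["", "All nodes:"] + full) + "\n")
    return msg


def send_alert(msg, host="localhost"):
    """Hand the message to an SMTP server (assumed to be a local MTA)."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

With these thresholds a node at 179 threads stays quiet and one at 180 triggers a WARN line, since 70% of 256 is 179.2.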

You can find nfs_watcher.py on my GitHub page.


Update NFS automounts from the terminal

If you’ve used Disk Utility¹ to set up automounts (or you recently upgraded to Mountain Lion and found that the GUI for editing NFS mounts has disappeared) and need to update the records quickly, this tip is for you.

We moved a bunch of NFS shares from one server to another over the weekend and needed to update the mount records on all clients that aren’t using our LDAP-based automount records. A handful of Macs with manually configured NFS shares had lost access to the relocated shares. Disk Utility stores its mount records as (non-binary) plists in /var/db/dslocal/nodes/Default/mounts, and one of the lines in each mount plist contains the server:/path/to/share entry for that automount.

To update the mount record, do the following using root privileges:

  1. Find the plist that contains the path you need to update in /var/db/dslocal/nodes/Default/mounts.
  2. Use your favorite text editing tool to update the path record, or replace the entire plist with one that contains the updated record.
  3. Run automount -vc to flush the cache and read in the updated information.

That’s all there is to it. I leave it as an exercise for the reader to combine all the steps into a deployable, scripted solution.
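One possible take on that exercise, as a Python 3 sketch: it walks the mounts directory, loads each plist with the standard plistlib module, replaces an old server:/path prefix with a new one in every string value, writes back only the files that changed, and then runs automount -vc. The function names are hypothetical. Like a search-and-replace in a text editor, it doesn't rename the plist files themselves, so try it on a copy of the directory first.

```python
import plistlib
import subprocess
from pathlib import Path

MOUNTS_DIR = Path("/var/db/dslocal/nodes/Default/mounts")


def _replace(value, old, new):
    """Recursively replace `old` with `new` in every string inside a plist value."""
    if isinstance(value, str):
        return value.replace(old, new)
    if isinstance(value, list):
        return [_replace(v, old, new) for v in value]
    if isinstance(value, dict):
        return {k: _replace(v, old, new) for k, v in value.items()}
    return value  # dates, numbers, data: leave untouched


def update_mount_records(old, new, mounts_dir=MOUNTS_DIR):
    """Steps 1 and 2: rewrite every mount plist mentioning `old`; return changed paths."""
    changed = []
    for path in sorted(p for p in Path(mounts_dir).iterdir() if p.is_file()):
        try:
            with open(path, "rb") as fp:
                record = plistlib.load(fp)
        except plistlib.InvalidFileException:
            continue  # not a plist (e.g. a stray .DS_Store); skip it
        updated = _replace(record, old, new)
        if updated != record:
            with open(path, "wb") as fp:
                plistlib.dump(updated, fp, fmt=plistlib.FMT_XML)
            changed.append(path)
    return changed


def reload_automount():
    """Step 3: flush the automount cache so the new records are read."""
    subprocess.run(["automount", "-vc"], check=True)
```

Run as root, something like update_mount_records("oldserver.example.com:/ifs/data", "newserver.example.com:/ifs/data") followed by reload_automount() would cover all three steps; both server names here are placeholders.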


1. If you’re using OS X 10.5, it’s in Directory Utility.

Automount NFS in OS X

I work in a mixed Mac/Windows/Linux environment. The majority of our fileshares are located on Isilon gear and are accessible over SMB with AD authentication and over NFS with LDAP authentication. Our Macs bind to AD and therefore use SMB to access fileshares. As the scientific datasets people use grow larger, the (lack of) performance of SMB in Mac OS X becomes more of an issue, especially for people who know they can get far better performance in Windows and Linux. To remedy this, with the help of my colleague and Mac admin Rich Trouton, we’ve started to migrate certain Mac users away from AD and SMB to LDAP and NFS. Because our Mac users have mobile accounts with local home folders, the move requires a bit of finagling, which is why Rich scripted the process.

Once a Mac user’s account has been migrated to LDAP, s/he can use NFS URLs in the Finder’s Connect to Server window and will see vastly better performance than with the previous SMB connections. However, the more Macs we convert to LDAP/NFS, the more active connections we’ll have to our fileservers, and at a certain point this will become a problem. Fortunately, there’s a way around this that all our Linux computers already use: autofs. Autofs automatically mounts fileshares on an as-needed basis and automatically disconnects them after an idle timeout period (one hour by default). Another benefit of autofs is that users no longer have to mount shares manually: simply navigating to where the share is supposed to be mounts it there. Automount is clearly the best solution going forward.

My primary goals for this NFS automounting solution were to make it easy to manage and update (we sometimes add and remove fileshares) and to have the Macs mount fileshares at exactly the same paths as on Linux, inside a root-level directory called ‘groups’. Because of a peculiarity in OS X Lion’s Finder (discussed in the appendix below), the second goal precluded using the automount maps that our Linux hosts get from LDAP. My solution, which works with OS X 10.6 through 10.8, adds entries to /etc/auto_master that reference files in a new /etc/automounts directory.

Example Scenario

Let’s say my Isilon cluster is called shares.example.com and it exports a number of NFS shares with root paths beginning with /ifs/groups/foo, /ifs/groups/bar, and /ifs/groups/baz. Each of these directories contains at least two subfolders, which are the actual shares. I want these shares to mount inside /groups/foo, /groups/bar, and /groups/baz. To do this, I need to create three files, named foo, bar, and baz, in a new /etc/automounts directory, each containing the respective automount map. Because I’m working in system directories, I’ll need root/admin privileges.

Here’s what /etc/automounts/foo looks like:

The asterisk at the beginning of the line is a wildcard key that matches any name looked up inside /groups/foo, and the ampersand at the end is replaced with that name. Together they tell automount to mount any share inside /ifs/groups/foo at a mount point with the same name as the share, which saves me from having to specify each share individually. (The mount options are beyond the scope of this post; the auto_master and mount_nfs man pages document them.) The bar and baz files follow the same pattern. You can, of course, specify each share individually, and these map files can have any number of lines.
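To make the wildcard behavior concrete, here's a small Python 3 sketch that builds map lines in the format described above, shows which NFS location a given lookup resolves to, and writes one map file per group. The function names and the options argument are hypothetical; since the real mount options are out of scope here, the sketch emits none by default.

```python
from pathlib import Path


def map_line(server, export_root, options=""):
    """Build a wildcard map entry: "*  [-options]  server:/export/root/&".

    `options` is a hypothetical comma-separated list such as "nosuid".
    """
    fields = ["*"]
    if options:
        fields.append("-" + options)
    fields.append("%s:%s/&" % (server, export_root.rstrip("/")))
    return "\t".join(fields)


def resolve(entry, key):
    """Return the NFS location automount would mount when `key` is looked up."""
    fields = entry.split()
    if fields[0] != "*":
        raise ValueError("not a wildcard entry: %r" % entry)
    # "*" matched `key`; every "&" in the location is replaced by that key.
    return fields[-1].replace("&", key)


def write_maps(maps_dir, server, groups):
    """Write one map file per group, e.g. <maps_dir>/foo for /ifs/groups/foo."""
    Path(maps_dir).mkdir(parents=True, exist_ok=True)
    for group in groups:
        line = map_line(server, "/ifs/groups/" + group)
        (Path(maps_dir) / group).write_text(line + "\n")
```

For the example scenario, resolve(map_line("shares.example.com", "/ifs/groups/foo"), "images") gives shares.example.com:/ifs/groups/foo/images, which is what ends up mounted at /groups/foo/images.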

When all the mount files are written, I need to add one line per file to my /etc/auto_master file. It ends up looking like this:
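Because shares come and go, it helps if whatever adds these lines can be rerun safely. A minimal Python 3 sketch (the function name is hypothetical) that appends one "mount point, map file" line per group to auto_master, skipping lines that are already present and leaving Apple's default entries alone. It assumes the simple two-column form with no per-entry options.

```python
from pathlib import Path


def ensure_auto_master_entries(auto_master, maps_dir, mount_root, names):
    """Append "<mount_root>/<name><TAB><maps_dir>/<name>" lines that are missing.

    Returns the lines that were added; existing lines are never modified.
    """
    path = Path(auto_master)
    text = path.read_text() if path.exists() else ""
    # First two fields of every non-comment line: (mount point, map).
    existing = {
        tuple(line.split()[:2])
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    added = []
    for name in names:
        mount_point = "%s/%s" % (mount_root.rstrip("/"), name)
        map_file = "%s/%s" % (maps_dir.rstrip("/"), name)
        if (mount_point, map_file) not in existing:
            added.append("%s\t%s" % (mount_point, map_file))
    if added:
        if text and not text.endswith("\n"):
            text += "\n"
        path.write_text(text + "\n".join(added) + "\n")
    return added
```

Called as ensure_auto_master_entries("/etc/auto_master", "/etc/automounts", "/groups", ["foo", "bar", "baz"]), it would add the three lines for the example scenario once and do nothing on later runs.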

Looks pretty straightforward, right? After these modifications are in place, I check that all files are owned by root:wheel, that the automounts directory has rwxr-xr-x (755) permissions, and that /etc/auto_master and all files within /etc/automounts have rw-r--r-- (644) permissions. Now I need to restart the automounter so it sees the new mount maps:
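The permission check and the restart can be scripted too. This Python 3 sketch applies the modes listed above, sets root:wheel ownership (uid 0 and gid 0 on OS X) only when it's run as root, and restarts the automounter with automount -vc. The function names are hypothetical.

```python
import os
import subprocess
from pathlib import Path


def fix_permissions(maps_dir, auto_master):
    """755 on the maps directory, 644 on auto_master and every map file."""
    maps_dir = Path(maps_dir)
    files = [Path(auto_master)] + [p for p in maps_dir.iterdir() if p.is_file()]
    os.chmod(maps_dir, 0o755)
    for f in files:
        os.chmod(f, 0o644)
    # Changing ownership requires root; skip it otherwise.
    if os.geteuid() == 0:
        for p in [maps_dir] + files:
            os.chown(p, 0, 0)  # root:wheel


def reload_automounter():
    """-v prints what gets set up; -c flushes the automounter's cache."""
    subprocess.run(["automount", "-vc"], check=True)
```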

When this command runs, it should output all the new mounts it has created. The first three lines are from Apple’s default mounts:

Now, even though it says “mounted”, nothing has actually been mounted yet. If you look in the Finder, you should see that automount has created the mount points, but nothing else. This is an important concept to understand: a share will not actually mount until you traverse its mount point. This is confusing for anyone who has not wrapped their head around autofs (and I’m staring in the general direction of most Mac users here). For example, let’s say you want to get to /groups/foo/images. If you look inside /groups/foo in the Finder or the Terminal, you will see an empty directory. To get to images, you’ll need to either use “Go to Folder” and specify “/groups/foo/images” or use the Terminal and cd into that directory.
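Scripts see the same behavior: listing /groups/foo won't show the share, but asking for /groups/foo/images by name makes autofs mount it. A tiny Python 3 sketch of that, with a hypothetical helper name:

```python
import os


def trigger_mount(path):
    """Look `path` up by name (which triggers autofs), then report if it's mounted.

    The stat() call is the lookup; with a wildcard map, the share doesn't
    exist in the parent directory's listing until something asks for it.
    """
    os.stat(path)
    return os.path.ismount(path)
```

trigger_mount("/groups/foo/images") should return True once the share is mounted; a name the server doesn't export should fail the lookup with an error instead.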

One last thing to mention: you will probably want to disable the creation of .DS_Store files on network volumes when using automounts. The Finder has a bad habit of leaving these files open, so your automounted shares will not unmount after the idle timeout as they’re supposed to. To keep your Mac from writing .DS_Store files to network drives, run the following defaults command in the Terminal. This is a per-user setting.
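The setting in question is the Finder's DSDontWriteNetworkStores preference in the com.apple.desktopservices domain. If you'd rather apply it from a script (for example, a login script running as the user), here's a Python 3 sketch that shells out to defaults. The function name is hypothetical, and the runner parameter exists only so the command can be checked without touching real preferences.

```python
import subprocess

# Per-user Finder preference: don't write .DS_Store files to network volumes.
DS_STORE_CMD = ["defaults", "write", "com.apple.desktopservices",
                "DSDontWriteNetworkStores", "-bool", "true"]


def disable_network_ds_store(runner=subprocess.check_call):
    """Run the defaults command as the current user (not via sudo)."""
    runner(DS_STORE_CMD)
```

Log out and back in afterwards so the Finder picks up the change.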

Deploying this solution

As I mentioned earlier, because we add and remove shares semi-regularly, I needed this solution to be manageable. If you’re already using configuration management tools in your Mac environment, be it Casper, Puppet, or anything else, you’re probably already familiar with the best way to deploy and manage a small collection of files. My colleague Rich just wrote up how to wrap all this into a package that you can deploy with your tool of choice. Any time we add or remove a share, we can push out a new package with the changes, and the package’s postinstall script will reload automount. The reload process won’t affect any active mounts, so we can push the package out at any time.
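A postinstall along these lines might look like the following. This is a hypothetical sketch, not Rich's package: it assumes the standard installer convention of passing the target volume as the third argument, only reloads when installing onto the running system, and sticks to calls that work in both the Python 2 shipped with 10.6 through 10.8 and Python 3.

```python
#!/usr/bin/python
# Hypothetical postinstall for a package that installs /etc/auto_master and
# /etc/automounts/*. Installer passes: $1 package path, $2 install location,
# $3 target volume.
import subprocess
import sys


def should_reload(target_volume):
    """Only reload automount when installing onto the booted volume ("/")."""
    return target_volume.rstrip("/") == ""


def main(argv):
    if should_reload(argv[3]):
        # Rereads auto_master and the maps; active mounts are left alone.
        subprocess.check_call(["automount", "-vc"])
    return 0


# Installer always supplies the arguments; without them (e.g. when this file
# is loaded for testing) do nothing.
if __name__ == "__main__" and len(sys.argv) > 3:
    sys.exit(main(sys.argv))
```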

Appendix: Why not use LDAP?

Because the Finder (at least since 10.7) renames mount points to match the name of the autofs map that defines them. Our LDAP server’s maps are named with the format auto.groups.foo, so as soon as you go into a share inside /groups/foo (e.g. /groups/foo/images), the Finder renames the foo directory to auto.groups.foo. Directory names viewed from the Terminal are unaffected.