This directory contains the material related to FreeBSD nullfs + NFS panics in 9.x and 10-CURRENT.

What's the problem?
===================

When you export NULLFS filesystems via NFS, you can hit kernel panics if external clients use the readdir+ feature and access the same directories simultaneously. An example backtrace is in this directory, in the file panic-backtrace.txt (the backtrace is from 9.x as of December 2011).

The real problem is that the thread that loses the race in null_nodeget (/sys/fs/nullfs/null_subr.c) puts the native lock (vp->v_vnlock = &vp->v_lock) back on the nullfs vnode that is about to be destroyed (because the thread lost the race). Then null_reclaim (/sys/fs/nullfs/null_vnops.c) tries to lock the vnode's v_lock in exclusive mode. This leads to a panic, because v_vnlock is already locked at the time of VOP_RECLAIM processing and v_vnlock points to v_lock. Bingo!

How to reproduce
================

Set up a nullfs mount on the local filesystem (/etc/fstab):
{{{
/usr/null	/null	nullfs	rw,noauto	0	0
}}}

Export it via NFS (/etc/exports):
{{{
/null -maproot=root 127.0.0.1
}}}

Mount the exported filesystem 3 times under different paths (we need more than one NFS client, since requests within a single client seem to be serialized):
{{{
127.0.0.1:/null	/nfs1	nfs	rw,udp,rdirplus,noauto	0	0
127.0.0.1:/null	/nfs2	nfs	rw,udp,rdirplus,noauto	0	0
127.0.0.1:/null	/nfs3	nfs	rw,udp,rdirplus,noauto	0	0
}}}

Readdir+ must be active, because it locks vnodes with LK_SHARED.

rc.conf should minimally contain the following lines to run NFS:
{{{
## NFS
rpcbind_enable="YES"
nfs_server_enable="YES"
nfs_reserved_port_only="YES"
mountd_enable="YES"
mountd_flags="-p 904 -l"
rpc_lockd_enable="YES"
rpc_lockd_flags="-p 4001"
rpc_statd_enable="YES"
rpc_statd_flags="-p 4002"
nfsd_enable="YES"
nfsd_flags="-h 127.0.0.1 -u -n 8"
}}}

Mount the filesystems:
{{{
mount /null
mount /nfs1
mount /nfs2
mount /nfs3
}}}

Then run the script nullfs-simple-test.sh from this directory (all we need is to make multiple clients do the same thing). Here we use different mountpoints to do everything on a single machine, but if you have external clients (that use readdir+), you can mount the NFS filesystem from them instead.

If your NFS server is too fast to reveal the races, you can add the line
{{{
pause("0pause", hz / 10);
}}}
to the routine null_nodeget, just above the malloc for the new null_node,
{{{
xp = malloc(sizeof(struct null_node),
}}}
This allows much more time for the races.

How to solve?
=============

Two patches,

- 0001-NULLFS-properly-destroy-node-hash.patch
- 0002-NULLFS-fix-panics-when-lowervp-is-locked-with-LK_SHA.patch

will fix this problem. In reality, the second patch does all the work and the first one is just some nitpicking.

Instead of all this, one could simply handle the case inside null_reclaim by comparing vp->v_vnlock against &vp->v_lock (a sketch of that check is at the end of this file), but that is a kludge that can strike again in the future.

My questions
============

Q: I am using VOP_ISLOCKED() to determine the lock status of the lowervp. Isn't it too expensive?
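
Appendix: sketch of the null_reclaim kludge
===========================================

For reference, here is a minimal sketch of the kludge mentioned in "How to solve?". It is NOT one of the patches above; it only illustrates the idea of guarding the exclusive lockmgr() call in null_reclaim() (sys/fs/nullfs/null_vnops.c) with a check of vp->v_vnlock against &vp->v_lock. The lines around the guard are assumed to follow the stock 9.x null_reclaim(); treat them as context only.

{{{
	/* Sketch inside null_reclaim(); vp = ap->a_vp. */

	/*
	 * If the losing thread in null_nodeget() has already switched
	 * v_vnlock back to &vp->v_lock, then v_lock is the very lock
	 * that is held for VOP_RECLAIM, and taking it again here would
	 * panic.  Only take v_lock when v_vnlock still points at the
	 * lower vnode's lock (the normal case).
	 */
	if (vp->v_vnlock != &vp->v_lock)
		lockmgr(&vp->v_lock, LK_EXCLUSIVE, NULL);

	/* Clear v_data under the interlock, as the stock code does. */
	VI_LOCK(vp);
	vp->v_data = NULL;
	vp->v_object = NULL;
	vp->v_vnlock = &vp->v_lock;
	VI_UNLOCK(vp);
}}}

The drawback, as noted above, is that this hides the lost-race vnode's odd lock state inside the reclaim path instead of preventing it in null_nodeget, so it can bite again elsewhere.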