Bug #224
Panic when accessing named pipe in ZFS snapshot
| Status: | Closed | Start: | 08/04/2010 |
|---|---|---|---|
| Priority: | High | Due date: | |
| Assigned to: | - | % Done: | 0% |
| Category: | - | Spent time: | - |
| Target version: | - | | |
Description
Using NCP3 RC3 under VMware.
I discovered this while trying to back up my root volume with tar from a ZFS snapshot. This did not happen with OpenSolaris b134, where I was able to tar root successfully.
I can reproduce this easily by simply using ls on the named pipe:
root@nexenta:~# zfs snapshot syspool/rootfs-nmu-002@mysnap
root@nexenta:~# cd /.zfs/snapshot/mysnap/etc/saf
root@nexenta:/.zfs/snapshot/mysnap/etc/saf# ls -al _sactab
-rw-r--r-- 1 root sys 52 Jun 29 09:19 _sactab
root@nexenta:/.zfs/snapshot/mysnap/etc/saf# ls -al _sysconfig
-rw-r--r-- 1 root sys 45 Jun 29 09:19 _sysconfig
root@nexenta:/.zfs/snapshot/mysnap/etc/saf# ls -al _sacpipe
panic[cpu0]/thread=ffffff00d199d740: BAD TRAP: type=e (#pf Page fault) rp=ffffff000497ca70 addr=96 occurred in module "zfs" due to a NULL pointer dereference
ls: #pf Page fault
Bad kernel fault at addr=0x96
pid=1049, pc=0xfffffffff79c523e, sp=0xffffff000497cb60, eflags=0x10286
cr0: 8005003b cr4: 6b8
cr2: 96 cr3: 5182000 cr8: c
rdi: 0 rsi: 0 rdx: 30d
rcx: 158 r8: ffffff00d360ea00 r9: ffffff00d1625d00
rax: 0 rbx: 10 rbp: ffffff000497cba0
r10: fffffffffb851078 r11: 0 r12: ffffff00d3b33d48
r13: ffffff00d40f8120 r14: 0 r15: 30d
fsb: 0 gsb: fffffffffbc2faa0 ds: 4b
es: 4b fs: 0 gs: 1c3
trp: e err: 0 rip: fffffffff79c523e
cs: 30 rfl: 10286 rsp: ffffff000497cb60
ss: 38
ffffff000497c950 unix:die+dd ()
ffffff000497ca60 unix:trap+177b ()
ffffff000497ca70 unix:cmntrap+e6 ()
ffffff000497cba0 zfs:zil_commit+26 ()
ffffff000497cc10 zfs:zfs_fsync+e2 ()
ffffff000497cc60 genunix:fop_fsync+5a ()
ffffff000497cd40 fifofs:fifo_fsync+fa ()
ffffff000497cd90 fifofs:fifo_inactive+81 ()
ffffff000497cde0 genunix:fop_inactive+af ()
ffffff000497ce00 genunix:vn_rele+5f ()
ffffff000497cea0 genunix:cstatat64_32+b1 ()
ffffff000497cec0 genunix:lstat64_32+31 ()
ffffff000497cf10 unix:brand_sys_sysenter+1e0 ()
I can provide the core files from /var/crash if necessary.
History
Updated by Bryan Leaman about 1 year ago
This appears to be OpenSolaris bug 6957113 - "accessing a fifo special file in .zfs snapshot dir panics kernel". Was fixed in snv_142 but maybe not backported to NCP3?
http://bugs.opensolaris.org/bugdatabase/viewbug.do?bugid=6957113
Updated by Peter Radig about 1 year ago
Bryan Leaman wrote:
This appears to be OpenSolaris bug 6957113 - "accessing a fifo special file in .zfs snapshot dir panics kernel". Was fixed in snv_142 but maybe not backported to NCP3?
http://bugs.opensolaris.org/bugdatabase/viewbug.do?bugid=6957113
The fix for this bug is a one-liner. So that should be easy to integrate.
Updated by Bryan Leaman about 1 year ago
Peter Radig wrote:
The fix for this bug is a one-liner. So that should be easy to integrate.
Yes, that's correct. I rebuilt Nexenta from source after applying the one-line fix, and it eliminated the panic. It would be great if they can integrate this into NCP3 for the release since it's such a simple fix.
Updated by Garrett D'Amore about 1 year ago
- Status changed from New to Closed
I've copied this to our bug reporting tool for NexentaStor, and will be tracking it there. It's bug ID 2611 on hg.nexenta.com.
Closing this bug, as we will track it there. That said, I think we can integrate this change relatively easily. The window of opportunity for 3.0 might have passed, but at least for 3.1 we can do it.
Updated by Jeff Strunk about 1 year ago
- File mdb.thinkmate1 added
Since I don't have access to hg.nexenta.com I am putting this information here.
I applied the fix for this bug to test. It works for local access. I can run 'find .zfs -type p' as many times as I want when logged into my file server where this fix has been applied. I verified that an identical system without the fix crashes immediately if I run that command.
However, running that command twice on a Linux NFS client crashes the fixed NCP file server.
The system in question is running NCP 3.0.1 with only the patch for bug 6957113 applied from upstream.
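For anyone repeating the local check, here is a minimal sketch of it as a shell helper. The function name list_fifos is hypothetical and not part of any Nexenta tooling; it only wraps the same 'find ... -type p' test described above. On a system without the 6957113 fix, pointing it at a .zfs directory that contains a FIFO would be expected to trigger the panic from this report, so only try it on a test machine.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not from the original report):
# print every named pipe (FIFO) found under the given directory.
list_fifos() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "list_fifos: not a directory: $dir" >&2
        return 1
    fi
    # -type p matches FIFO special files only
    find "$dir" -type p -print
}
```

For example, 'list_fifos /thinkmate1/.zfs' would walk the snapshot directory of the test pool used in the reproduction steps.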
Updated by Jeff Strunk about 1 year ago
- File mdb.thinkmate1-delete-after-send added
After working with bslbb on IRC, here are the exact steps to reproduce:
root@thinkmate1:~# zpool destroy thinkmate1
root@thinkmate1:~# zpool create thinkmate1 raidz2 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 c0t7d0 log c0t0d0 cache c0t1d0
root@thinkmate1:~# mkfifo /thinkmate1/testfifo
root@thinkmate1:~# zfs snapshot thinkmate1@foo0
root@thinkmate1:~# zfs destroy -r thinkmate2
root@thinkmate1:~# zfs send thinkmate1@foo0 | zfs receive -F thinkmate2
root@thinkmate1:~# zfs snapshot thinkmate1@foo1
root@thinkmate1:~# zfs send -I foo0 thinkmate1@foo1 | zfs receive -F thinkmate2
root@thinkmate1:~# zfs destroy thinkmate1@foo0
root@thinkmate1:~# find /thinkmate1/.zfs/s
shares/ snapshot/
root@thinkmate1:~# find /thinkmate1/.zfs/
/thinkmate1/.zfs/
/thinkmate1/.zfs/snapshot
/thinkmate1/.zfs/snapshot/foo1
/thinkmate1/.zfs/shares
root@thinkmate1:~# ls /thinkmate1/
testfifo
root@thinkmate1:~# find /thinkmate1/.zfs/snapshot/foo1/
/thinkmate1/.zfs/snapshot/foo1/
/thinkmate1/.zfs/snapshot/foo1/testfifo
root@thinkmate1:~# find /thinkmate1/.zfs/snapshot/foo1/
/thinkmate1/.zfs/snapshot/foo1/
/thinkmate1/.zfs/snapshot/foo1/testfifo
root@thinkmate1:~# zfs snapshot thinkmate1@foo2
root@thinkmate1:~# zfs send -I foo1 thinkmate1@foo2 | zfs receive -F thinkmate2
root@thinkmate1:~# zfs destroy thinkmate1@foo1
At this point it crashes. I verified that it also crashes without a cache or ZIL device in the pool.
At this point, I think the fix for 6957113 only covers up the deeper problem with FIFO special files.