Discussion:
unknown
1970-01-01 00:00:00 UTC
- Tunable name can be dmp_failed_io_threshold, dmp_retry_count,
dmp_pathswitch_blks_shift, dmp_queue_depth, dmp_cache_open,
dmp_daemon_count, dmp_scsi_timeout, dmp_delayq_interval,
dmp_path_age, or dmp_stat_interval
Something is odd, because my version is 5.0 MP3 on Solaris SPARC, and according
to http://seer.entsupport.symantec.com/docs/316981.htm this tunable should be available.
modinfo | grep -i vx
 38 7846a000 3800e 288 1 vxdmp (VxVM 5.0-2006-05-11a: DMP Drive)
 40 784a4000 334c40 289 1 vxio (VxVM 5.0-2006-05-11a I/O driver)
 42 783ec71d df8 290 1 vxspec (VxVM 5.0-2006-05-11a control/st)
296 78cfb0a2 c6b 291 1 vxportal (VxFS 5.0_REV-5.0A55_sol portal )
297 78d6c000 1b9d4f 8 1 vxfs (VxFS 5.0_REV-5.0A55_sol SunOS 5)
298 78f18000 a270 292 1 fdd (VxQIO 5.0_REV-5.0A55_sol Quick )
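Since the question that follows is which release is actually loaded, the driver descriptions can be pulled out of `modinfo` output with a small sketch like this one. It assumes the column layout shown above (module id, load address, size, info, revision, name, parenthesised description); `parse_modinfo` is a hypothetical helper, not part of any Veritas or Solaris tool.

```python
import re

# Hypothetical helper (ASSUMPTION): expects the modinfo layout shown above,
# i.e. id, loadaddr, size, info, rev, module name, "(description)".
MODINFO_LINE = re.compile(
    r"^\s*(\d+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s+(\d+|-)\s+(\d+)\s+(\S+)\s+\((.*)\)\s*$"
)

def parse_modinfo(text):
    """Return a dict mapping module name to its parenthesised description."""
    modules = {}
    for line in text.splitlines():
        match = MODINFO_LINE.match(line)
        if match:
            name, description = match.group(6), match.group(7).strip()
            modules[name] = description
    return modules

# Three lines copied from the output above.
sample = """\
 38 7846a000 3800e 288 1 vxdmp (VxVM 5.0-2006-05-11a: DMP Drive)
 40 784a4000 334c40 289 1 vxio (VxVM 5.0-2006-05-11a I/O driver)
297 78d6c000 1b9d4f 8 1 vxfs (VxFS 5.0_REV-5.0A55_sol SunOS 5)
"""

for name, description in parse_modinfo(sample).items():
    print(f"{name:8} {description}")
```

The build strings printed this way can then be compared with the release notes for the maintenance pack you expect to have installed.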
Which version of Veritas? Versions 4/2MP2 and 5.x introduced a feature
called DMP fast recovery. It was probably supposed to be called DMP fast
fail, but "recovery" sounds better. It is supposed to fail suspect paths
more aggressively to speed up failover. But when you have only one VxVM
DMP path, as is the case with MPxIO, and fast recovery fails that path,
then you're in trouble. In version 5.x, it is possible to disable this
feature.
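The single-path problem described above can be illustrated with a toy model. This is a sketch under stated assumptions, not vxdmp's actual logic: the `DmpNode` class, its `report_error` method and the path names are hypothetical, "fast recovery" is modelled as disabling a path on its first error, and with it switched off the error is simply left to retry logic that the toy does not model.

```python
from dataclasses import dataclass, field

# Toy model only (ASSUMPTION): this is not how vxdmp works internally.
@dataclass
class DmpNode:
    paths: list                        # paths as DMP sees them
    enabled: set = field(init=False)   # paths still in service

    def __post_init__(self):
        self.enabled = set(self.paths)

    def report_error(self, path, fast_fail=True):
        """With fast_fail, a suspect path is taken out of service at once."""
        if fast_fail:
            self.enabled.discard(path)

    @property
    def node_enabled(self):
        """The node stays usable only while at least one path is enabled."""
        return bool(self.enabled)

# DMP doing the multipathing itself: it sees all four physical paths.
native = DmpNode(paths=["p1", "p2", "p3", "p4"])
native.report_error("p1")

# MPxIO underneath: DMP sees a single (hypothetically named) scsi_vhci path.
mpxio = DmpNode(paths=["vhci0"])
mpxio.report_error("vhci0")

# Same single path, with fast recovery switched off.
mpxio_no_ff = DmpNode(paths=["vhci0"])
mpxio_no_ff.report_error("vhci0", fast_fail=False)

print("native DMP:", native.node_enabled)                  # True
print("DMP over MPxIO:", mpxio.node_enabled)               # False
print("DMP over MPxIO, no fast fail:", mpxio_no_ff.node_enabled)  # True
```

With four visible paths, failing one suspect path still leaves three in service; with one visible path, the same rule disables the whole dmpnode.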
Google DMP fast recovery.
http://seer.entsupport.symantec.com/docs/307959.htm
I can imagine there must have been some internal fights at Symantec
between product management and QA to get that feature released.
Vic
On Thu, Sep 16, 2010 at 6:03 AM, Sebastien DAUBIGNE wrote:
Dear Vx-addicts,
- Solaris 9 HW 9/05
- SUN SAN (SFS) 4.4.15
- Emulex with SUN generic driver (emlx)
- VxVM 5.0-2006-05-11a
- Storage on HP SAN (XP 24K).
Multipathing is managed by MPxIO (not VxDMP) because the SAN team and HP
VxVM ==> VxDMP ==> MPxIO ==> FCP ...
We have 2 paths to the switch, linked to 2 paths to the storage, so the
LUNs have 4 paths, with active/active support.
Failover has been tested successfully by taking each port offline in
turn on the SAN.
We regularly have transient I/O errors (SCSI timeouts, I/O error retries
with "Unit attention") due to SAN-side issues. Usually these errors are
handled transparently by MPxIO/VxVM without impact on the applications.
One of the SAN ports was reset; consequently there were some transient
I/O errors.
The other SAN port was OK, so the MPxIO multipathing layer should have
failed the I/O over to the other path without passing the error up to
the VxDMP layer.
For some reason, it did not fail the I/O over before VxVM caught it as
an unrecoverable I/O error, disabling the subdisk and consequently the
filesystem.
VxVM vxdmp V-5-0-112 disabled path 118/0x558 belonging to the dmpnode 288/0x60
VxVM vxdmp V-5-0-111 disabled dmpnode 288/0x60 Sep 1 06:18:54
VxVM vxdmp V-5-0-112 disabled path 118/0x538 belonging to the dmpnode 288/0x20
VxVM vxdmp V-5-0-112 disabled path 118/0x550 belonging to the dmpnode 288/0x18
VxVM vxdmp V-5-0-111 disabled dmpnode 288/0x20 Sep 1 06:18:54
VxVM vxdmp V-5-0-111 disabled dmpnode 288/0x18 Sep 1 06:18:54
Sep 1 06:18:54 myserver SCSI transport failed: reason 'tran_err': retrying command
Sep 1 06:19:05 myserver SCSI transport failed: reason retrying command
Sep 1 06:21:57 myserver SCSI transport failed: reason 'tran_err': retrying command
Sep 1 06:22:45 myserver SCSI transport failed: reason retrying command
Sep 1 06:23:03 myserver SCSI transport failed: reason giving up
VxVM vxio V-5-3-0 voldmp_errbuf_sio_start: Failed to flush the error buffer 300ce41c340 on device 0x1200000003a to DMP Sep 1 06:23:03
VxVM vxio V-5-0-2 Subdisk mydisk_2-02 block 5935: Uncorrectable write error
msgcnt 1 mesg 037: V-2-37: vx_metaioerr - vx_logbuf_clean - /dev/vx/dsk/mydg/vol1 file system meta data write error in dev/block 0/5935
msgcnt 2 mesg 031: V-2-31: vx_disable - /dev/vx/dsk/mydg/vol1 file system disabled
msgcnt 3 mesg 037: V-2-37: vx_metaioerr - vx_inode_iodone - /dev/vx/dsk/mydg/vol1 file system meta data write error in dev/block 0/265984
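To see how the DMP and SCSI messages line up in time, the timestamps from the log excerpt above can be placed on a single timeline. This sketch uses only timestamps present in the excerpt; the vxdmp disable messages are placed at 06:18:54 because that is the timestamp logged next to them, and since syslog omits the year, 2010 is assumed (from the date of the thread) purely to make parsing explicit.

```python
from datetime import datetime

# Timestamps copied from the log excerpt. The year is an ASSUMPTION.
YEAR = 2010
EVENTS = [
    ("Sep 1 06:18:54", "SCSI transport failed (tran_err): retrying"),
    ("Sep 1 06:18:54", "vxdmp disabled paths and dmpnodes"),
    ("Sep 1 06:19:05", "SCSI transport failed: retrying"),
    ("Sep 1 06:21:57", "SCSI transport failed (tran_err): retrying"),
    ("Sep 1 06:22:45", "SCSI transport failed: retrying"),
    ("Sep 1 06:23:03", "SCSI transport failed: giving up"),
]

def parse(stamp):
    """Turn a syslog-style 'Mon D HH:MM:SS' stamp into a datetime."""
    return datetime.strptime(f"{YEAR} {stamp}", "%Y %b %d %H:%M:%S")

start = parse(EVENTS[0][0])
offsets = [(int((parse(s) - start).total_seconds()), what) for s, what in EVENTS]

for seconds, what in offsets:
    print(f"+{seconds:4d}s  {what}")
```

On these timestamps, the DMP disable messages fall in the same second as the first transport failure, while the SCSI layer keeps retrying for 249 seconds before giving up.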
It seems VxDMP gets the I/O error at the same time as MPxIO: I thought
MPxIO would conceal the I/O error until failover had occurred, which is
not the case.
As a workaround, I increased the VxDMP recoveryoption/fixedretry/retrycount
tunable from 5 to 20 to give MPxIO a chance to fail over before VxDMP
fails, but I still don't understand why VxVM catches the SCSI errors.
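Whether raising the retry count from 5 to 20 buys MPxIO enough time depends on how long each failed attempt takes. The back-of-the-envelope sketch below assumes a simple model (DMP fails the I/O after `retry_count` attempts, each taking a fixed `seconds_per_attempt`); the model, the helper name `retry_window` and the latencies are hypothetical and do not come from Symantec documentation.

```python
# Back-of-the-envelope model (ASSUMPTION): with a fixed-retry recovery
# option, DMP gives up after `retry_count` failed attempts, and each attempt
# takes roughly `seconds_per_attempt` to come back with an error.

def retry_window(retry_count, seconds_per_attempt):
    """Approximate time DMP keeps retrying before failing the I/O."""
    return retry_count * seconds_per_attempt

# Hypothetical per-attempt latencies: an error returned almost at once,
# versus attempts that each wait out a multi-second timeout.
for latency in (0.01, 1.0, 30.0):
    before = retry_window(5, latency)
    after = retry_window(20, latency)
    print(f"{latency:6.2f}s/attempt: retrycount 5 -> {before:7.2f}s, 20 -> {after:7.2f}s")
```

Under this model, raising the count only multiplies the per-attempt latency; if errors come back from the layer below almost immediately, even 20 retries may add little time for MPxIO to complete its failover.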
Any advice?
Thanks.
--
Sebastien DAUBIGNE
AtosOrigin Infogerance - AIS/D1/SudOuest/Bordeaux/IS-Unix
_______________________________________________
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-vx
