VMware KB 2014323: PowerCLI to set the qla2xxx option at cluster level

I recently ran into a nasty issue when I updated my vSphere 4.1u2 FCoE hosts to vSphere 5.0u1.

Right after the migration to vSphere 5, major performance issues appeared with the underlying storage. I was seeing read/write latencies of several thousand milliseconds, which led to dead path detection and, of course, I/O problems within the VMs.

The servers were working perfectly with vSphere 4.1, so we decided to quickly roll back from 5.0u1 to 4.1u2.

After restoring a working virtualisation service, I opened an SR with VMware, who asked me to open an SR with the storage array vendor, who in turn asked me to open an SR with the FC/FCoE switch vendor. Three SRs for one problem :(

The servers are Dell PowerEdge R715 with QLogic 8150 CNA cards.

After several diagnostic file uploads, patches at the array level and further research, one of the support teams (not VMware’s) came up with a very interesting KB from the VMware website: I/O activity pauses on virtual machines with QLogic 81xx series CNA cards on ESXi 5.0

This KB sounded promising. The workaround it describes was tested on a single host with the desired result: no more latency and no more dead paths.

So we decided to apply the workaround on every FCoE host.

The KB describes how to apply the workaround with esxcli. That is fine if you have a vMA and not too many hosts, but I wanted a more “industrial” approach, so I used PowerCLI.

My script below is designed to be applied at cluster level, to hosts running vSphere 5 or later whose vmhbas match the model “ISP81xx-based 10 GbE FCoE to PCI Express CNA” and use the “qla2xxx” driver. It only acts on hosts in maintenance mode (it offers to put each host into maintenance mode first) and asks whether to reboot the modified hosts when it completes.
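To make these selection rules concrete, here is a small sketch in plain PowerShell that mirrors the script's two filters: host power state and version, then HBA driver and model. The helper names Test-HostEligible and Select-QlaCnaHba are hypothetical, and the mock objects only imitate the Name, Version, PowerState, Device, Driver and Model properties read from Get-VMHost and Get-VMHostHba, so it runs without PowerCLI or a vCenter. Casting Version to [version] avoids comparing version strings as text (where '10.0.0' would sort below '5').

```powershell
# Sketch only: mock objects imitate the properties the script reads from
# Get-VMHost (Name, Version, PowerState) and Get-VMHostHba (Device, Driver, Model).

function Test-HostEligible {
    # Hypothetical helper: powered on and running vSphere 5.0 or later
    param([Parameter(Mandatory)] $VMHost)
    # [version] compares 5.0.0 and 10.0.0 numerically instead of as text
    ($VMHost.PowerState -eq 'PoweredOn') -and ([version]($VMHost.Version) -ge [version]'5.0')
}

function Select-QlaCnaHba {
    # Hypothetical helper: same driver/model filter as the script
    param([Parameter(Mandatory)] [object[]] $Hba)
    $Hba | Where-Object {
        ($_.Driver -eq 'qla2xxx') -and ($_.Model -like '*10 GbE FCoE to PCI Express CNA*')
    }
}

$mockHosts = @(
    [pscustomobject]@{ Name = 'esx01'; Version = '5.0.0'; PowerState = 'PoweredOn' }
    [pscustomobject]@{ Name = 'esx02'; Version = '4.1.0'; PowerState = 'PoweredOn' }
    [pscustomobject]@{ Name = 'esx03'; Version = '5.0.0'; PowerState = 'PoweredOff' }
)
$mockHbas = @(
    [pscustomobject]@{ Device = 'vmhba2'; Driver = 'qla2xxx'; Model = 'ISP81xx-based 10 GbE FCoE to PCI Express CNA' }
    [pscustomobject]@{ Device = 'vmhba0'; Driver = 'ahci'; Model = 'Example SATA controller' }
)

$eligible = @($mockHosts | Where-Object { Test-HostEligible -VMHost $_ })
$targets = @(Select-QlaCnaHba -Hba $mockHbas)
"Eligible hosts: $($eligible.Name -join ', ')"   # -> Eligible hosts: esx01
"Matching HBAs: $($targets.Device -join ', ')"   # -> Matching HBAs: vmhba2
```

Against a live vCenter you would feed the output of Get-VMHost and Get-VMHostHba into these helpers instead of the mock objects.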

The model name above is the one returned by the Get-VMHostHba cmdlet for QLogic 8150/8152 CNA cards. If you are using another model affected by this issue (as listed on the QLogic website), check whether it reports the same model name; the script contains a commented-out line that prints the name, driver and model of each vmhba.
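If you want to check what your own CNAs report before running the script, a sketch along these lines groups HBAs by driver and model and flags which combinations the script's filter would select. Get-HbaModelSummary is a hypothetical helper, and the mock objects stand in for Get-VMHostHba output (assuming its Device, Driver and Model properties).

```powershell
# Hypothetical helper: list each driver/model pair once, with how many HBAs report it
# and whether the script's filter would select it.
function Get-HbaModelSummary {
    param([Parameter(Mandatory)] [object[]] $Hba)
    $Hba | Group-Object -Property Driver, Model | ForEach-Object {
        $first = $_.Group[0]
        [pscustomobject]@{
            Driver  = $first.Driver
            Model   = $first.Model
            Count   = $_.Count
            Matched = ($first.Driver -eq 'qla2xxx') -and ($first.Model -like '*10 GbE FCoE to PCI Express CNA*')
        }
    }
}

# Mock objects standing in for Get-VMHostHba output (a dual-port CNA plus one other HBA)
$mockHbas = @(
    [pscustomobject]@{ Device = 'vmhba2'; Driver = 'qla2xxx'; Model = 'ISP81xx-based 10 GbE FCoE to PCI Express CNA' }
    [pscustomobject]@{ Device = 'vmhba3'; Driver = 'qla2xxx'; Model = 'ISP81xx-based 10 GbE FCoE to PCI Express CNA' }
    [pscustomobject]@{ Device = 'vmhba0'; Driver = 'ahci'; Model = 'Example SATA controller' }
)

$summary = @(Get-HbaModelSummary -Hba $mockHbas)
$summary | Format-Table -AutoSize
```

With PowerCLI connected, something like Get-HbaModelSummary -Hba (Get-VMHostHba -VMHost (Get-VMHost 'esx01.fqdn')) | Format-Table would give the same view for a real host (the host name is a placeholder). A row with Matched set to False but a QLogic 81xx model tells you the script's filter needs adjusting.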

It helped me a lot.

Eric


Start-Script->

## options dedicated to Qlogic CNAs and vSphere 5 issue -> VMware kb: http://kb.vmware.com/kb/2014323
#
# build 1.00
#
# Eric Krejci
#
# ekrejci.wordpress.com
#
# Twitter – @ekrejci
#
# on error stop the script
$ErrorActionPreference = 'Stop'
$Username = 'DOMAIN\USERNAME'
$Password = Read-Host 'Enter Password' -AsSecureString

### declare your vCenter

$vCenterserver = 'vcenter.fqdn'

### declare the name of the target DataCenter

$TargetDCName = 'DC-NAME'

### declare the name of the target Cluster

$TargetClusterName = 'Cluster-Name'

### create an empty array to store the modified hosts

$AppliedESXs = @()

### connect to the vCenter (the SecureString password is passed inside a PSCredential)
$Credential = New-Object System.Management.Automation.PSCredential($Username, $Password)
Connect-VIServer -Server $vCenterserver -Credential $Credential

## retrieve the hosts in the target cluster that are powered on AND run vSphere 5.0 or later

$TargetESXs = Get-VMHost -Location (Get-Cluster -Name $TargetClusterName -Location (Get-Datacenter -Name $TargetDCName)) |
    Where-Object { ($_.PowerState -eq 'PoweredOn') -and ([version]($_.Version) -ge [version]'5.0') }

foreach ($TargetESX in $TargetESXs) {

    $TargetESX.Name

    ## we check if the host is in maintenance mode to avoid any risks when performing the actions.
    ## if you don't want to set the maintenance mode, the script ends (hosts already modified still need a reboot).

    if ($TargetESX.ConnectionState -eq 'Maintenance') {
        $TargetESX.Name + ' is already in maintenance mode'
    }
    else {
        # -eq is case-insensitive, so both 'y' and 'Y' are accepted
        if ('Y' -eq (Read-Host 'Your ESX is not in maintenance mode! Do you want to set it now? Enter Y or N')) {
            Set-VMHost -VMHost $TargetESX -State Maintenance
        }
        else {
            Disconnect-VIServer -Confirm:$false
            exit
        }
    }

    ## uncomment the following to print the name, driver and model of the current host's vmhbas

    #Get-VMHostHba -VMHost $TargetESX | ForEach-Object { 'Name: ' + $_.Name + '; driver: ' + $_.Driver + '; Model: ' + $_.Model }

    ## retrieve every vmhba of the server with model "*10 GbE FCoE to PCI Express CNA*" and using the driver "qla2xxx"

    $TargetESXHBAs = Get-VMHostHba -VMHost $TargetESX |
        Where-Object { ($_.Driver -eq 'qla2xxx') -and ($_.Model -like '*10 GbE FCoE to PCI Express CNA*') }

    ## if we have matching QLogic CNAs, we apply the option described in the VMware KB

    if ($TargetESXHBAs) {

        ## get the module "qla2xxx" from the current host

        $qla2xxx = Get-VMHostModule -VMHost $TargetESX -Name 'qla2xxx'

        ## set the option on the module "qla2xxx"

        $qla2xxx | Set-VMHostModule -Options 'ql2xenablemsix=0'

        $AppliedESXs += $TargetESX
    }
}

## you must reboot to have the setting applied. if you don't want to do it now, don't forget to reboot the hosts later.

if ('Y' -eq (Read-Host 'Do you want to restart the ESX servers? Enter Y or N')) {
    'Rebooting the ESX servers'

    foreach ($AppliedESX in $AppliedESXs) {

        $AppliedESXView = Get-View -Id $AppliedESX.Id

        ## RebootHost($false) does not force the reboot: the host must be in maintenance mode

        $AppliedESXView.RebootHost($false)

        'End of configuration of ESX server ' + $AppliedESX.Name
    }
}

'Finished'

Disconnect-VIServer -Confirm:$false

<-End-Script
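One thing worth checking before running the script on hosts where qla2xxx already has options set: as far as I understand, Set-VMHostModule -Options sets the module's whole option string rather than appending to it, so any existing option could be lost. The sketch below, built around a hypothetical Merge-ModuleOption helper, adds or updates a single key in a space-separated option string and keeps the rest. It assumes the module object exposes its current options as .Options; verify both that and the replace-versus-append behaviour on a test host before relying on it.

```powershell
# Hypothetical helper: add or update one key=value pair in a space-separated
# module option string, keeping any other options already set.
function Merge-ModuleOption {
    param(
        [string] $Existing = '',
        [Parameter(Mandatory)] [string] $Key,
        [Parameter(Mandatory)] [string] $Value
    )
    # Split on whitespace and drop empty tokens
    $tokens = @($Existing -split '\s+' | Where-Object { $_ })
    # Drop any previous setting of the same key
    $kept = @($tokens | Where-Object { ($_ -split '=', 2)[0] -ne $Key })
    # Append the new setting and rebuild the string
    ($kept + "$Key=$Value") -join ' '
}

Merge-ModuleOption -Existing 'ql2xmaxqdepth=64 ql2xenablemsix=1' -Key 'ql2xenablemsix' -Value '0'
# -> ql2xmaxqdepth=64 ql2xenablemsix=0

# Possible use inside the script's loop (assumption: .Options holds the current string):
# $qla2xxx = Get-VMHostModule -VMHost $TargetESX -Name 'qla2xxx'
# $qla2xxx | Set-VMHostModule -Options (Merge-ModuleOption -Existing $qla2xxx.Options -Key 'ql2xenablemsix' -Value '0')
```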

6 thoughts on “vmware KB 2014323 PowerCLI to set qla2xxx option at cluster level”

  1. Eric, we are experiencing the exact same issue, but we are using iSCSI connections to an EMC VNX 5500 with QLogic QLE4062C cards. Different protocol, different drivers, but also on vSphere 5 and also huge storage problems. We actually have an issue where high I/O completely drops the datastore from the host, and only a rescan of the HBAs in the vSphere Client on the host reconnects the paths. We are also using EMC PowerPath for both multipathing and load balancing. According to EMC support, these events are host-related, not storage-related, and whatever is causing them is at the host level. Could your fix in some way apply to us, using iSCSI and different QLogic cards/drivers?

    • Hello,
      I’m sorry to hear about the terrible problems you are facing.
      To be honest, this driver parameter is specific to the qla2xxx driver.
      You can list all the parameters of a driver on ESXi by running:
      esxcli system module parameters list -m qla2xxx
      Of course, you will have to replace “qla2xxx” with the driver used by your QLE4062C.
      Here is the QLogic article about the issue: https://support.qlogic.com/app/answers/detail/a_id/1877/kw/1877
      Try it on a test server; maybe a parameter similar to ql2xenablemsi24xx exists for your driver.
      I would also strongly suggest opening a critical SR with QLogic for your issue.
      I hope you will find the cause of your problems.
      Eric

  2. We are experiencing the same issue with the qla2xxx driver. Are there performance impacts associated with disabling MSI-X? We are very concerned this will be a real hit to performance.

    • Based on our performance monitoring, we didn’t notice any degradation.
      For your information, I opened an SR with QLogic to ask whether this issue has been resolved with the latest qla2xxx driver “934.5.6.0-1vmw” available from VMware. I received the answer today:
      The issue is caused by the MSI-X capability structure, which is an extension to MSI (message signaled interrupt). This symptom is a known issue on ESXi 5.0 and will be fixed in a future firmware and driver release, but at this moment there is not a newer driver.

      Let’s wait. I will update my post as soon as I receive an update for a driver fixing this issue.

      Eric
