The views and opinions expressed on this blog are my own. They may or may not represent the views and opinions of my current or previous employers or any other organization or individual. Any information that I represent as a fact is believed by me to be true. I do not, however, make any legal claim to this.
With VMware and Equallogic, we are implementing a patch from VMware that caused Volumes to go read-only on one of our two paths in a fixed mode configuration.
Is there any reason not to implement round robin with Equallogic Storage behind VMware?
MPIO in VMware vSphere 4.0 with Round Robin will enable redundancy and performance to your Datastores. This is built-in functionality with VMware vSphere 4.0. Stay tuned in the next couple of months for improvements with EqualLogic and MPIO specifically with VMware vSphere 4.0.
Marc, any news on the MPIO Module? I am about to reconfigure 2 PS6000s with vSphere and would like to use the MPIO module from the get-go. Is it likely to be available sooner rather than later? I appreciate you might not be able to give dates, but in your opinion is it imminent? Thx in advance.
When I install 2008 on my physical server and connect it all up to my PS4000Es I can get up to 220Mbps throughput as recorded on IOMeter and confirmed by SAN HQ.
As soon as I use a virtual machine in the same environment the throughput drops to only 100Mbps, even if I use SQLIO instead, and this is after doing all the little MTU tweaks required by the Dell technical notes etc.
Verify that everything is set up correctly per Tech Report 1049 and that Round Robin is set up inside vSphere. Once you create an EqualLogic Volume and mount it as a Datastore, go into Group Manager and confirm that there are multiple iSCSI Connections to the Volume that match up with the IP Addresses/VMNics in vSphere.
Thanks Marc, I have verified the setup, the LUNs are actually being accessed directly from the Guest OS (Windows Server 2008 R2) and the Dell HIT things have been installed and configured too.
You might not have your pathing and port groups properly configured, and it may be only using one NIC at a time, which would explain the speed difference.
What are your thoughts on hot-updating (i.e. hosts active) Equallogic firmware? I have 4 ESX 4.0 hosts and want to update our firmware to version 5, but was wondering if I could get away with doing the update without taking down all my servers.
Depends what current version you are on. If you are on 4.2 or 4.3, there is little service interruption. We have EqualLogic Users upgrading their Firmware during production hours without any downtime.
I did this last night – updated from 4.3.5 to 5.0.1.
As the upgrade process reboots the SAN, it does cause the I/O to pause. As this is less than 60 seconds, it doesn’t cause Windows / ESX(i) any trouble.
So yes, it is possible to upgrade a SAN without shutting down all your servers; however, I would recommend that it is done out of hours during a time when there is little disk I/O – typically after the users have gone home, but before the backups kick in!
We are using Equallogic in a Hyper-V R2 cluster and are enjoying it, but the performance seems to be a bit under what I expected. Do you have any settings that are recommended on Windows 2008 R2 servers to get the performance it should have? We are using Jumbo Frames today.
Do you have the HIT installed? Is MPIO setup correctly? Jumbo Frames, Flow Control, etc should be setup on your NICs and the Switch connecting the EqualLogic Array. Are you using CSVs?
Thanks for the quick reply. We are using HIT 3.4 and firmware 5.0 on the EQ, with Least Queue Depth for MPIO. We are using the built-in Broadcoms on the R710 for iSCSI traffic, and Dell 6224 switches with trunking. Flow control and Jumbo frames are enabled on the switch, and jumbo frames are also configured on the NICs. And yes, we are using CSV.
To test performance, you can create a small Volume on EqualLogic, 10GB as an example. Mount that to your Hyper-V Server, don’t format it. Then use Iometer (http://www.iometer.org/) to test the performance/IOPs/MBs. This will ensure that the Server/NICs/Switch/EqualLogic are setup correctly. You should stay away from RAID 6 and use RAID50 instead.
Regarding MB/s and Iometer testing. A single Gig Ethernet NIC should provide around 110MB/s to 120MB/s. If you have MPIO enabled and have 2 NICs in your server, then you should see approx 210MB/s to 225MB/s when testing with Iometer.
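As a rough sanity check on those figures, the arithmetic can be sketched in Python. The 0.9 efficiency factor is an assumption chosen for illustration (protocol overhead varies with MTU and workload), not a Dell-published number, and `usable_mb_per_s` is a hypothetical helper name.

```python
def usable_mb_per_s(link_gbps: float, nics: int = 1, efficiency: float = 0.9) -> float:
    """Rough usable iSCSI throughput in MB/s for `nics` links of `link_gbps` each.

    `efficiency` is an assumed fraction of wire speed left after Ethernet,
    IP, TCP and iSCSI overhead; tune it for your own environment.
    """
    wire_mb_per_s = link_gbps * 1000 / 8  # 1 Gb/s is 125 MB/s on the wire
    return wire_mb_per_s * efficiency * nics


if __name__ == "__main__":
    for nics in (1, 2):
        print(f"{nics} x 1GbE: about {usable_mb_per_s(1.0, nics):.1f} MB/s")
```

If an Iometer run with two MPIO NICs lands near the single-NIC estimate, that suggests only one path is carrying traffic.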
Yeah the 16 SATA drive equipped chassis rocks in RAID10.
– I have a few of those deployments out in the wild…
The reason I asked is because it wouldn’t have been the first time I’d come across a RAID6 setup with 8 disks and a customer asking why performance isn’t as it could be.
What you should have done is profiled the servers before putting them on the SAN, then you’d have an idea of what is required. Bit too late for that I guess!
RAID 50 is pretty good for read performance, but starts to suffer with random writes. If you can spare the storage I’d still recommend RAID 10.
RAID 50 will be a major improvement over what you have.
Do you have SAN HQ installed? Is it the SAN that’s causing the bottlenecks?
Have you performed the IOmeter tests as recommended by us above? What are the results?
There is much I should have done beforehand, I have found out. I have had to learn from the ground up with the EQ, but I am getting there. I have SAN HQ installed, yes. I will get time a bit later today to do the IOmeter tests and will post the results here as soon as I have them.
Followed the PDF you sent about IOmeter, and here are the numbers from the two tests:
PS6000E
32K 0% read/0%random
I/O 4600-5300
140-165 MB/s
32K 100% read/0%random
I/O 3000
92-94 MB/s
On the PS6000E
Average latency is about 12 ms for the 32K 0%read/0%random, and 9,5-10 on 32K 100%read/0%random
75%read/25%write/100%random
I/O 7200-8400
230-260 MB
8-8,5 ms
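A quick way to check that reported IOPS and MB/s figures agree with each other is to multiply IOPS by the block size. The sketch below does this for the two 32K tests above, reading "0% read/0% random" as a sequential write test; the function name is hypothetical, and the third test is left out because its block size is not stated.

```python
def throughput_mb_per_s(iops: float, block_kib: int) -> float:
    """Throughput implied by an IOPS figure at a fixed block size (KiB to MiB)."""
    return iops * block_kib / 1024


# (low, high) IOPS as reported in the post above for the PS6000E
reported = {
    "32K sequential write": (4600, 5300),
    "32K sequential read": (3000, 3000),
}

if __name__ == "__main__":
    for name, (low, high) in reported.items():
        lo_mb = throughput_mb_per_s(low, 32)
        hi_mb = throughput_mb_per_s(high, 32)
        print(f"{name}: {lo_mb:.1f} to {hi_mb:.1f} MB/s implied")
```

When the implied numbers sit inside the reported MB/s range, as they do here, the test was at least recorded consistently.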
Marc, if I have an ESX 4.1 host with 4 NICs dedicated to iSCSI, connected via 2 switches to a PS6000, should I see 2 or 4 paths? I saw 4 when using the VMware native driver but only 2 after the MPIO plugin is used.
This depends on the mapping between VMKernel Ports and Physical NICs/Uplinks. If you have multiple VMKernel Ports mapped to a Physical NIC, our EqualLogic MEM will only use a single VMKernel Port. How are the vSwitches/VMKernel Ports/Physical NICs configured?
Marc, thanks for the response. I have 4 NICs on a DVS with 1 kernel port per NIC, so I thought I’d get 4 paths. All hosts are the same. I have 7 volumes created at the moment; with 4.1 using fixed routes I get the following on my iSCSI HBA: Targets: 7, Devices: 7, Paths: 28. I also get 4 paths when I click on a volume and select Manage Paths. As soon as I install the plugin I get 14:7:14, and under Manage Paths I get 2.
After installing the MPIO module and the subsequent reboot, the Dell_PSP_EQL_Routed option is automatically set. I didn’t need to set it up. I followed the process in that link. The only thing I didn’t do was delete and re-set up my NICs/vmks/switches. I kept the previous setup and just installed the module. Thanks again.
I think you’ll find mileage in deleting your iSCSI vSwitch and use the setup script supplied with MEM to setup and configure your vSwitch using Remote CLI.
They do recommend just starting over with the vSwitch in the documentation. I second that recommendation.
Sorted it. By default you only get 2 paths with the EQL plugin. I changed the config file and after a reboot we’re in business with all 4 paths. I should have read the 24-page manual more slowly, as it explains it there. Thx for your pointers, and roll on the 5.0.2 firmware.
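The path counts quoted in this exchange follow from simple multiplication, sketched here in Python. The helper name is hypothetical, and the "2 sessions per volume" default is taken from the comment above rather than from the MEM documentation.

```python
def expected_paths(volumes: int, sessions_per_volume: int) -> int:
    """Total iSCSI paths a host should show for a given session count per volume."""
    return volumes * sessions_per_volume


if __name__ == "__main__":
    volumes = 7
    # Native multipathing with 4 VMkernel ports, one session per port
    print("native, 4 ports:", expected_paths(volumes, 4))
    # MEM plugin at the default of 2 sessions per volume reported above
    print("MEM default:", expected_paths(volumes, 2))
```

Comparing the expected total with the Paths figure on the iSCSI HBA is a quick way to confirm a config change took effect after the reboot.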
No Probs. Got an EQL old boy(b4 dell) with me in 12 days for a health check and sign off on our project so will get him to have a look it’ll be easier once he has it all in front of him. Thanks for your help and the great work you do on this site.
Snapshots and Replication use the same “Scheduler”. Snapshots are local data protection copies within the same Group/Member. Replicated Volumes are remote data protection copies of that Volume in a different Group/Member. Today we do not replicate Snapshots that you have scheduled to happen. You would use the same “Scheduler” for Snapshots and setup Replication from one Group to another Group. The Replicated Volume in the other Group is a sum of all Snapshots, you can set Volume Replication up so you have multiple instances of the same Replicated Volume. Let me know if I can answer any additional questions or dive deeper.
Hi Marc,
I have a new array with 4.3.5 and an existing array with 4.1.4. Will these coexist for a short period (2-3 weeks) to migrate the data to the new array? Thanks.
Happy Monday! I recommend calling EqualLogic Support and confirming with those super smart people first to make sure you can migrate the data during those couple of weeks.
I see a lot of folks here already using firmware 5.0.x
I downloaded it, but got a big DONT USE IT e-mail a week later from Dell.
Today I checked on the EQL downloads site, and 4.3.7 is listed as latest, and 5.0 is relegated to the legacy firmware section …!
What gives?
PS … Is there any sort of cohesive online community/mailing list for Equallogic users? Other than here of course ;P
Marc, it would help if someone from EQL could at least let us end users have a better idea of where we are on the firmware update. I was expecting it (as per the email that was sent out) around the beginning of this week and there is still no news or sign of it yet. I stupidly scheduled some time this weekend to evaluate it. Is there anywhere we can go for an update on the status of 5.0.2? Sorry for the tone but I’m sure you can understand. Many Thanks
I’m employed by Dell as a senior technical consultant in the technical marketing group for the EQL engineering group.
As you noted, the date where we’d originally told our customers we expected to have something for you (August 30th) has come and gone. Our engineers are still working on getting v5.0.2 released, but we promise to keep you in the loop with any relevant updates until it does ship.
We work very hard to maintain the highest possible level of quality and truly appreciate your patience as you wait for us on this. Please let me/us know if you have any questions!
Thomas, thanks for the reply. I totally agree in that I would rather it was a few days late than not ready. That said, as an Enterprise customer it would really help if we had an official update from you guys, even an updated ETA (without putting yourselves under too much pressure) or a statement on the EQL website, so at least we can make some plans.
Thanks Marc, I am contacting Pro-Support in Hong Kong, but I still don’t really understand why our replacement box came with FW5.0 (btw, the manufacturing date is Aug 18, 2010 on the sticker) while Dell strongly oppose loading FW5.0, it’s quite a contradiction to what they have recommended.
Dell US EQL support is so great, one of their Tech Consultant called us 10 mins after we opened a case and suggest a simple method that we simply swap the compact flash card from the old unit to the FW5.0.0 one, case solved.
I need your advice about a little design question with two ps6000XV
To have the best SAN performance with load balancing in a storage pool, Is it better to have two nodes with same RAID5 level or one with RAID 10 and one with RAID 5 ?
Unfortunately I do not have the time to answer this question right now, I will reply later. I wanted to approve your comment now so those that do review my site may reply to your question.
The answer really comes down to your IO profile and what you need out of the setup.
RAID5 is typically not a good performer, so I would never recommend this as a RAID level for non-sequential IO workloads.
From a performance perspective, you can’t get faster resilience than RAID10, so if you can afford the capacity hit having both SANs at RAID10 then this would be best.
If capacity is more important then configuring both SANs with RAID50 may be suitable, but expect around a 30% reduction in overall IOP capacity.
To answer your question, the best of both worlds could be achieved by mixing RAID10 and RAID50 SANs in the same group. Over time IO intensive volumes would be moved to the RAID10 SAN when the SAN logic determines there would be a performance benefit in doing so.
There are some gotchas over EqualLogic’s volume distribution logic in SAN groups that have different RAID levels as the calculation only takes the RAID level into consideration, not spindle speeds (it would favour a RAID10 SATA SAN over a RAID50 SSD SAN from a performance perspective!) but as both of your SANs are 15k disks, it should distribute volumes as described.
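To make the RAID10 versus RAID50 trade-off concrete, here is a minimal Python sketch using the textbook write-penalty rule of thumb. The penalty values are the commonly quoted generic ones, not EqualLogic-specific measurements, and the disk count, per-disk IOPS and read/write mix are hypothetical. The model ignores controller cache and volume placement entirely.

```python
# Generic rule-of-thumb back-end I/Os generated per front-end write
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID50": 4, "RAID6": 6}


def frontend_iops(raw_iops: float, read_fraction: float, raid_level: str) -> float:
    """Host-visible IOPS a disk set can sustain for a given read/write mix.

    Each read costs one back-end I/O; each write costs WRITE_PENALTY I/Os.
    """
    write_fraction = 1.0 - read_fraction
    penalty = WRITE_PENALTY[raid_level]
    return raw_iops / (read_fraction + write_fraction * penalty)


if __name__ == "__main__":
    raw = 14 * 180  # assumption: 14 active 15k disks at about 180 IOPS each
    for level in ("RAID10", "RAID50"):
        print(level, round(frontend_iops(raw, 0.75, level)))
```

The gap between the two levels widens as the write fraction grows, which is consistent with the advice above to profile the IO workload before choosing.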
I’m asking this because with 2 arrays in RAID 5, the IO-intensive volume can be striped across the 2 arrays, can’t it? With one array in RAID10 and one in RAID5, the volume can move to RAID10, but will be striped on only one array…
So the question is finally : which is better ? raid10 striped on ONE array or raid 5 striped on TWO arrays ?
(Graham you are right, 50 is better than 5, but same question between 10 and 50)
I just came across this PDF last week done by Veritest back in 2006, it’s related to R10 vs R50, it’s not directly related, but somehow explained the difference when adding more members to the group.
The 2006 Veritest report was on a 15K PS3000. The real-life results show a difference of about 36% for 1 box (R10 vs R50), gradually dropping to 22% with 5 boxes. Shouldn’t the R10 vs R50 difference get wider with more boxes (i.e., R10 should be way higher than R50)? It seems that with more boxes, say up to 12 in a storage pool, R10 vs R50 may differ by less than 5-10%. Strange, any idea?
I have some problems using a four year old document as a basis for a modern day comparison as I have no doubt that the ongoing hardware and software development EqualLogic have made over this period have improved things – interesting reading though and I do have some comments to make:
No mention of network configuration settings – were they using jumbo frames or flow control? These would make big differences to performance figures, especially as you scale up.
The report doesn’t seem to discuss the way that volume distribution works with EqualLogic groups and again this may affect performance figures. Let me explain:
Regardless of how many SANs you have in a group, a volume is not spread across more than (i think) 3 SANs. Internal performance monitoring moves volumes over time to optimise this placement.
Because you can’t control this placement, did they check volume placement was distributed evenly across the SANs? It is possible that volume placement made some SANs in the group busier than others during the brief testing period, which would make a difference to performance testing.
A further point in the report that adds ammunition to my theory above is that they added additional members to the group instead of creating the whole group at once. This could mean that if they had already created their volumes on 1 SAN and then added an additional 3 SANs to the group, they may not have given the SAN logic time to distribute volumes fully before they performed their testing – this would also cause lop-sided results.
So as you can see, there’s a lot of applicable complexity to the test setup that they don’t adequately explain and without these answers I would question the accuracy of the report.
I still say RAID10 is faster than a 2 SAN RAID5 group.
1. I read somewhere saying PS5000XV is actually the re-branded PS3800XV after Dell acquired EQL, then it came with PS6000XV, and the cache has been doubled (2GB to 4GB in total), as well as there is one more Gbit link in PS6000XV (4 vs 3), also the RISC processors have been increased from dual-core to quad-core.
2. Although I think it’s a biased report (so explained some of your questions) as it indicated as “Test report prepared under contract from EQL right beneath the title”, still I think it gives us a general idea of how different raid group performed (R10 vs R50), I don’t think it’s really that outdated.
3. FYI, test 6 is actually performed on five-members starts from the beginning instead of gradually adding members, but the performance is only a bit behind adding up members gradually.
4. BTW, you are correct a volume (aka lun) won’t spread more than 3 members, I was told the same thing before we purchased the box and we were told even with RAID5, sometimes you can stand with more than 2 disks failure because there are multiple Raid5 within the Raid5, I am not sure if you know what I mean, basically EQL’s raid standard is more advanced and redundant than the normal raid we used in Poweredge server.
5. Actually reading the Key Findings give you many hints already
6. Yes, I second your preference towards R10, we used it on our PS6000XV as well as we can always expand to R50 later.
7. One thing still confused me is
The 2006 Veritest report was on a 15K PS3000. The real-life results show a difference of about 36% for 1 box (R10 vs R50), gradually dropping to 22% with 5 boxes. Shouldn’t the R10 vs R50 difference get wider with more boxes (i.e., R10 should be way higher than R50)? It seems that with more boxes, say up to 12 in a storage pool, R10 vs R50 may differ by less than 5-10%. Strange, any idea? How come and why?
I’ve spent almost 4 hours on-phone from mid-night to 4am in the morning trouble shooting with Dell Equallogic Consultants in US via WebEx today.
We found that EQL I/O testing performance was low: only 1 path was active under 2-path MPIO, and disk latency was particularly high during writes on the newly configured array.
It was finally solved because we had forgotten the most fundamental concept of all: Equallogic takes time to kick in the additional paths under MPIO!!! You need to wait, say, at least 5 mins to see the rest of the paths kick in.
Anyway, I must say EQL’s support is excellent; they gave me many insights to solve this problem from multiple dimensions.
For details, please see my blog:
Equallogic takes time to kick in the additional paths under Windows MPIO (http://www.modelcar.hk/?p=2746)
Jack, thank you very much for taking the time and posting more details on your Blog. I will send updates out on Twitter and Facebook with a link to your Blog.
We are migrating data to a new array. We have created a new pool and will move volumes to the new pool one at a time. What is the fastest way to move the data? We have a short maintenance window and need to move 15 TB across a 1 GB connection. We are trying to avoid running all the host servers across the 1GB connection and will be moving the servers to where the new array is while the data is migrating.
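For planning the maintenance window, a back-of-the-envelope transfer time can be sketched in Python. This assumes the "1 GB connection" is a single 1 Gb/s Ethernet link and that the copy sustains a steady rate; both throughput values below are illustrative assumptions, and the function name is hypothetical.

```python
def transfer_hours(size_tb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to move `size_tb` decimal terabytes at a sustained MB/s rate."""
    size_mb = size_tb * 1_000_000  # decimal TB to MB
    return size_mb / throughput_mb_per_s / 3600


if __name__ == "__main__":
    for rate in (110, 125):  # assumed realistic rate vs. theoretical wire speed
        print(f"15 TB at {rate} MB/s: about {transfer_hours(15, rate):.1f} hours")
```

Even at wire speed the move takes well over a day, so the window has to account for that unless more links are available.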
We are purchasing a new PS600X array. If we decide to gain additional SAN space by not using hot spares and instead keeping a spare (or 2) on the shelf, is it strongly discouraged? Is the slight gain in risk worth the extra space/spindles? Is anyone doing this now?
I know the question may be difficult to answer since the question depends on the specific environment and importance of the data. But with a next day service plan, is it worth to consider?
Hi all, wondering if someone out there can help out and/or shed some light on a problem I am having with a new 6500E in my lab. Starting to build it up for production.
Here’s the scoop:
FW: 4.3.7
HIT 3.4.1
Host: PowerEdge R810 (2008 R2 x64 Hyper-V)
Switches: Cisco 3750-G stacked and configured as per Dell tech docs in relation to spanning-tree, unicast storm control, etc etc. (BTW, if any one has tried, or is trying, you CANNOT set jumbo-frames per VLAN as the Dell documentation states someplace. It is a system wide change on these switches).
VLAN 20 is dedicated for iSCSI, Native VLAN is for production network.
A few other VLANs exist on these switches for other purposes (primarily Live Migration, CSV/Heartbeat, and VOICE!)
MPIO is working correctly with Jumbo Frames from the R810 LOMs (Broadcom). Moving data back and forth from the R810 host to a volume on the SAN seems good. (I will post IOmeter stats here shortly).
So, I create a new VM (2008 Server) on an EQL volume and fire it up:
Problem/issues:
1. On the desktop of the newly created VM, I simply grab a 2GB file and make a copy of it within the VM. This copying process takes > 10 mins to complete.
In this trial, I am assuming the file is simply being duplicated on/within the array?
This same operation is almost instantaneous on the physical host.
2. Create another VM and try copying the same 2GB file b/t VMs. In this case when I copy/pull FROM a VM to a VM it seems good, but if I copy/push TO a VM from a VM, the same bottleneck as #1 seems to occur.
During both tests, on the physical host, I see the LOMs jump to 15-20% for a few seconds, then drop to 0%. Jump to 15-20%, drop to zero. This pattern repeats. I do not see any retransmits or errors on the NICs of the EQL to speak of. I do not see any retransmits/CRC errors etc. on the interfaces from the switch perspective either.
Hi chiat, I do have that software installed, but I am not teaming those LOMS. I will peel it off and see what happens. Will advise. Also, IOMETER will not run on my x64 boxes (due to processor timer issue)…ughs…
I’ve already opened a case with EQL, but just in case someone has experienced this before.
Thanks in advance.
PS6000XV MPIO and Disk Read Performance Problems
A Quick question before even going into the following:
A Single Equallogic Volume IS LIMITED TO 1Gbps bandwidth ONLY at Max? (ie, The volume won’t send/receive more than 125MB/sec even there are MPIO NICs and iSCSI sessions connected to it) Does this apply to a single volume within just one member or it can break the 125MB/sec limit if the volume spans across 2 or more members? (for example 250MB/sec if the volume is spread over 2 members)
Summary (2 Problems Found)
a. PS6000XV MPIO DOESN’T work properly and limited to 1Gbps on 1 Interface ONLY on Server (initiator side)
b. 100% Random Read IOmeter Performance is 1/2 of 100% Random Write
Environment:
a. iSCSI Target Equallogic: PS6000XV 1 array member only, loaded with Latest Firmware 5.0.2 and HIT 3.4.2, configured as RAID10. (16 600GB SAS 15K Disks), HIT Kit and MPIO is installed properly, in MPIO, MSFT2005iSCSI BusType_0x9 is showing besides EQL DSM.
b. iSCSI Initiator Server: Poweredge R610 with latest firmware (BIOS, H700 Raid, Broadcom 5709C Quad, etc)
c. iSCSI Initiators: Using two Broadcom 5709C (one from LOM, one from add-on 5709C Quadcard), using Microsoft
software iSCSI Initiator (not Broadcom hardware iSCSI Initiator mode that is), No Teaming (I didn’t even install Broadcom’s teaming software as I want to make sure the teaming driver doesn’t load into Windows), I’ve also disabled all Offload features, as well as disable RSS and Mode Interruption, I have enabled Flow Control to “TX & RX”, as well as set Jumbo Frame MTU to 9000 (log in EQL group manager event that the initiator is indeed connecting using Jumbo Frame), each NIC has a different IP in the same sub-net as the EQL group IP.
d. Switches: Redundant PowerConnect 5448, set up according to the best practice guide. Enabled Flow Control, Jumbo Frame, STP with Fastports, LAG, separate VLAN for iSCSI, disabled iSCSI Optimization, and tested that redundancy is working fine by unplugging different ports and switching off 1 of the switches, etc.
e. IOMeter Version: 2006.07.27
f. Windows Firewall has been turned off for Internal Network (ie, Those two Broadcom 5709C NICs sub-net)
g. There is no error at all showing after a clean reboot.
h. Created two Thick volume (each 50GB) on EQL and assigned iqn permission to the above two NICs iSCSI name.
Using HIT kit, we define MPIO to “Least Queue Depth”, even with just one member, we want to increase the iSCSI session to volumes on that member, so we also set Max sessions per volume slice to 4 and Max sessions per entire volume to 12. So right away we see the two NICs/iSCSI initiators connects volume as 8 paths (2 paths for each NICs to a volume x 2 NICs x 2 volumes)
IOMeter Test Results:
2 Workers, 1GB test file on each of the iSCSI volume.
a. 100% Random, 100% WRITE, 4K Size
- Least Queue Depth is working correctly as all Interface is showing different MB/sec.
- IOPS is showing impressive number over 4000.
b. 100% Random, 100% READ, 4K Size
- Least Queue Depth DOESN’T SEEM TO work correctly as all Interfaces are showing equal/balanced MB/sec. (Looks like Round Robin to me, but the policy has been set to Least Queue Depth)
- IOPS is showing 2000, which is 1/2 of the random write IOPS of 4000, STRANGE!
c. 100% Sequential, 100% WRITE, 64K Size
- Least Queue Depth is working correctly as all Interface is showing different MB/sec.
d. 100% Sequential, 100% READ, 64K Size
- Least Queue Depth DOESN’T SEEM TO work correctly as all Interfaces are showing equal/balanced MB/sec. (Looks like Round Robin to me, but the policy has been set to Least Queue Depth)
In all of the above tests (a to d), the 4 EQL interfaces reached a total of 120MB/s ONLY. Somehow the traffic is FIXED to one NIC on the R610 and MPIO didn’t kick in even though I waited for 5 mins, so all the time there is only one NIC participating in the test. I was expecting 250MB/s with 2 NICs, as there are 8 iSCSI sessions/paths to two volumes.
I even tried to disable the active iSCSI NIC on the R610. As expected, the other standby NIC kicked in immediately without dropping any packets, but I just can’t get BOTH NICs to load-balance the total throughput, and I am not happy with 120MB/sec with 2 NICs. I thought Equallogic would load balance iSCSI traffic between connected iSCSI initiator NICs.
SAN HQ reports no retransmit errors at all (always below 2.0%), though one error was found saying one of the EQL interfaces is sometimes saturated at 99.8%. (Is this due to least queue depth?)
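For readers unfamiliar with the two policies being compared, here is a toy Python model of how each one chooses a path. It is a generic illustration of the idea, not the EqualLogic DSM's or Microsoft's actual implementation, and the path names are hypothetical.

```python
from itertools import cycle, islice


def round_robin(paths, n):
    """Return the first n path choices when rotating through paths in order."""
    return list(islice(cycle(paths), n))


def least_queue_depth(outstanding):
    """Pick the path with the fewest outstanding I/Os (ties go to the first listed)."""
    return min(outstanding, key=outstanding.get)


if __name__ == "__main__":
    print(round_robin(["nic1", "nic2"], 4))
    print(least_queue_depth({"nic1": 3, "nic2": 1}))
```

Under a steady, uniform load both policies can end up spreading I/O evenly, so balanced per-interface graphs alone do not prove which policy is in effect.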
Findings (again 2 Problems Found)
a. PS6000XV MPIO DOESN’T work properly and limited to 1Gbps on 1 Interface ONLY on Server (initiator side)
b. 100% Random Read IOmeter Performance is 1/2 of 100% Random Write
I read somewhere on Google saying EQL’s limit on each volume is 125MB/s:
“Though the backend can be 4 Gbps (or 20 Gbps on PS6x10 series), each volume on the EqualLogic can only have 1 Gbps capacity. That means, your disk write/read can go no more than 125 MB/s, no matter how much backend capacity you have.”
“It turns out that the issue was related to the switch. When we finally replaced the HP with a new Dell switch we were able to get multi-gigabit speeds as soon as everything was plugged in.”
and I don’t think there is anything wrong with the switch setting as we also connect two other R710s using VMware and we constantly see 200MB+, so there must be some setting problem on the R610.
Could it be:
a. Set MPIO policy back to Round Robin will effectively use the 2nd NIC (path)?
b. Do any settings need to be changed in the Broadcom NIC’s Advanced settings? Enable RSS and Mode Interrupt again?
Try setting the BroadCom LOMs advanced properties Flow control to TX only. I had a HORRIBLE throughput problem when TX/Rx were both enabled.
(As I am not at my environment at the moment, I will not be able to confirm the rest of my BroadCom advanced settings, but will do so on Monday and post them here)
I am VERY interested to hear what EQL/Dell says about your first thought:
A Single Equallogic Volume IS LIMITED TO 1Gbps bandwidth ONLY at Max? (ie, The volume won’t send/receive more than 125MB/sec even there are MPIO NICs and iSCSI sessions connected to it) Does this apply to a single volume within just one member or it can break the 125MB/sec limit if the volume spans across 2 or more members? (for example 250MB/sec if the volume is spread over 2 members)
Question for you: Your Hosts…..are you running x64 OS by chance? I have a similar set up as you and would like to run some IO Meter tests against mine and see what I get compared to you. Trouble is, I have NOT been able to get IOMeter to work on my 810s running x64 OS. (Always shows negative value error)
6500E – 48 TB – RAID 50
Firmware 5.0.2 and HIT 3.4.2
(2) R810′s, each with (4) LOMs (Broadcom) and each with (8) Intel ports (2 quad add-ons per server)
Using all 4 LOMs on each host for iSCSi (no teaming or BACS)
Using 3 of the remaining 8 on each host for
Hyper-V/CSV/Management roles, the other 5 are teamed up 802.3ad for guest VMs
In my own testing with IOmeter, I have seen over 200MB/s with 2 NICs and over 300MB/s with 3 NICs. Since you have a case open with support, you should have this issue resolved soon.
Check NIC, MPIO and Switch settings, plus vswitch and vnic settings in Hyper-V.
Thanks Phil and Marc for your feedback (Phil, please do let me know the advanced settings of your 5709C). I shall try it later tonight (maybe I should leave it until tomorrow morning, as all my late night trials always end in taking my sleep time away).
Btw, there is no Hyper-V; I simply connect my W2K8 R2 64-bit box directly to the EQL SAN and do the testing. This time I waited long enough, but the 2nd link never kicks in. HIT, MPIO, NIC and Switch were all checked and have proper settings.
Btw, I almost encounter a new problem every day recently during the setup period of my project, it’s hard but rewarding.
Btw, regarding VMware vCenter NICs question (sorry Marc, a bit off here),
On my R610 I installed vCenter on top of W2K8 R2, and I want to use two NICs for the VMware Service Console/COS LAN. Previously I could use BACS Teaming, but I no longer have the teaming software installed, so I only run with 1 NIC to the COS network now. Is there any way to give my vCenter TWO uplinks?
Currently in veeam SCP or vsphere client host part, I type in myVC.domainname.com but it resolves to 1 IP (to 1 NIC), how can I add 1 more redundancy here?
I will get the advanced props for you on the Broadcom for sure.
You had mentioned you ran IOMeter in the VM….a few questions about this:
1. Was the VM hosted on the array or right off of the HD on your W2K8 R2 64bits box?
2. Within the running VM I am assuming you made iSCSI connections to the array before running your test?
Yea, sorry I do not have a whole lot of VMware experience. Hyper-V over here.
Final question, what type of processor(s) in your W2K8 R2 64bits box? I really need to get IOMETER going on my box, but it **may** have an issue with the type of processors I have.
FYI, we use VMware ESX 4.1, pls also see details below.
1. The VM is hosted on ESX host R710, which R710 connects to the array, so VM’s datastore is on EQL PS6000XV via R710.
2. I didn’t even bother to directly connect this VM to SAN subnet as I simply want to test how ESX host performs. I didn’t even install HIT or MPIO on this VM. Btw, VM is using VMXNet3 and Version 7.
3. What I did is real simple, assign this VM with two new disks (disk 2 and disk 3, they live in different volumes on EQL), then use Disk Management under Computer Management to initialize them, but do not format it, use it as RAW. (note, I did not directly map these two disk from EQL, but via ESX datastore)
Then I fireup IOmeter 2006 version within VM, if you look at Task Manager, the Network part of course is 0 usage, but in IOmeter bandwidth and IOPS, it is corresponding exactly what SAN HQ says in real time graphs. (something like 300MB/s during seq, IOPS is 4000-4500 during random)
4. So what does it mean? It means the ESX host is doing the actual disk I/O on the backend. Using esxtop, then n to show network, all vnics connecting to EQL also show exactly the same throughput as SAN HQ.
There is 0% retransmit during the intensive IOmeter test.
So it proved there is no mis-configuration problem in the switch and ESX hosts or the array.
But back to that physical world, R610 no MPIO load-balancing problem, only failover works.
R610 is a single E5620 ? 2.4Ghz, HT 4 cores. 12GB Ram. CPU never went over 10% in all tests.
1. Could it be iSCSI is not using Microsoft DSM as default, but instead uses Dell EqualLogic DSM?
Microsoft DSM
=============
No devices controlled by this DSM at this time!
Dell EqualLogic DSM
===============
Finally, EQL support confirmed THERE IS NO SUCH THING AS A SINGLE VOLUME BEING LIMITED TO 1Gbps (or 125MB/sec). Sorry to confuse anyone, but sources from the Internet are sometimes NOT reliable.
Thanks for the detail. I now better understand your setup and testing environment.
Hyper-V works in similar ways when provisioning disks to the guest VMs. There are 3 basic ways to do so:
1. Host connects to EQL via iSCSI.
- Disk(s) are formatted on host, brought on-line, given a drive letter, and VHDs (equivalent to VMDK) are created on those disks via Hyper-V Manager
2. Host connects to EQL via iSCSI.
- Disk(s) are NOT FORMATTED, LEFT OFF-LINE
- Hyper-V Manager then allows this “Offline Disk” to be used as a “passthrough” disk to a VM
3. Hosts connect to EQL via iSCSI
- Disk(s) are formatted on hosts and brought online
- Failover Cluster Manager is used to add these disks as “Available Storage” and/or “Cluster Shared Volumes” (Both of which sound similar to ESX Datastore)
- When a CSV is created the disk shows as “RESERVED” within disk management on the hosts (no drive letters etc).
- VHDs are then created in these CSVs
Of course, there is a last way where HIT/MPIO is installed on an already existing VM and the VM makes direct iSCSI connection to the EQL. I do not like this way b/c it would require the VM to participate in the same VLAN as the iSCSI traffic. (assuming the VM itself is not multi-homed)
Regarding your R610 problem:
1. I will have to see my results with IOMETER (if I can get it working) on my hosts system to see if I can duplicate, or get close to your results.
2. I initially had a problem with the TX setting on Broadcom as I mentioned before
3. I also had a problem where the EQL DSM and the MSOFT DSM were both battling for the same LUNS on the EQL. In San HQ you could see the logs where there was continuous disconnect and reconnect to the LUNS (each one was kicking the other out over and over again).
4. The above problem 3, however, does not seem to be what is going on in your case. That is why I wanted to see if I can do EXACTLY what you are doing in relation to the placement of IOMETER (in host and guest)
Curiously, with these settings the VMs actually get poor Write performance when running IOMETER in the guests.
If I change TCP Connection Offload (IPv4) to Disabled in the Advanced Properties of BroadCom LOMS, then I get better Write performance in the VMs, but not so good Write performance on the HOSTS. The Read remains the same in either case.
I need to do some more experimentation here because something is a bit odd with these results.
For more information about a particular disk, use ‘mpclaim -s -d #’ where # is the MPIO disk number.
MPIO Disk    System Disk  LB Policy    DSM Name
-------------------------------------------------------------------------------
MPIO Disk1   Disk 3       LQD          Dell EqualLogic DSM
MPIO Disk0   Disk 2       FOO          Dell EqualLogic DSM
That’s why! Somehow the testing volume Disk 2 had been set with an LB policy of Fail Over Only (FOO), so no wonder it was using ONE PATH ALL THE TIME. After I changed it to LQD, everything works like a champ!
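For anyone who wants to check this on more than a couple of disks, here is a minimal Python sketch that scans a listing like the one above and flags any disk still on Fail Over Only. The sample text is copied from the listing in this comment; the function names and the regular expression are my own assumptions about the layout, not anything shipped with mpclaim or the HIT kit.

```python
import re

# Sample listing copied from the mpclaim output quoted in the comment above.
SAMPLE = """\
MPIO Disk    System Disk  LB Policy    DSM Name
-------------------------------------------------------------------------------
MPIO Disk1   Disk 3       LQD          Dell EqualLogic DSM
MPIO Disk0   Disk 2       FOO          Dell EqualLogic DSM
"""

# Assumed row layout: "MPIO Disk<N>  Disk <M>  <policy>  <DSM name>".
ROW = re.compile(r"^(MPIO Disk\d+)\s+(Disk \d+)\s+(\S+)\s+(.+?)\s*$")


def parse_mpclaim(text):
    """Return one dict per data row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            mpio_disk, system_disk, policy, dsm = match.groups()
            rows.append({
                "mpio_disk": mpio_disk,
                "system_disk": system_disk,
                "policy": policy,
                "dsm": dsm,
            })
    return rows


def failover_only(rows):
    """Disks whose load-balance policy is FOO, i.e. only one active path."""
    return [row for row in rows if row["policy"] == "FOO"]


if __name__ == "__main__":
    for row in failover_only(parse_mpclaim(SAMPLE)):
        print(row["mpio_disk"], row["system_disk"], "is Fail Over Only")
```

Run against the listing above, it reports only MPIO Disk0 (Disk 2), which is the volume that was stuck on a single path.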
Then I just performed an IOMETER test. Wow! The RealLife-60%Rand-65%Read result is crazy high!!! Almost 7200 IOPS
SERVER TYPE: Physical
CPU TYPE / NUMBER: CPU / 1
HOST TYPE: Dell PER610, 12GB RAM; E5620, 2.4 GHz, 4 Cores Total
STORAGE TYPE / DISK NUMBER / RAID LEVEL: Equallogic PS6000XV x 1 (15K), / 14+2 600GB 15K Disks (Seagate Cheetah 15K.7) / RAID10 / 500GB Volume, 1MB Block Size
SAN TYPE / HBAs : Broadcom 5709C NICs with 2 paths only (ie, 2 physical NICs to SAN)
Worker: Using 2 Workers to push PS6000XV to its IOPS peak!
##################################################################################
TEST NAME--------------------Av. Resp. Time ms------Av. IOs/sek-------Av. MB/sek------
##################################################################################
Max Throughput-100%Read........14.3121..........6639.48.........207.48
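A quick sanity check on a result line like this: throughput should equal IOPS times the transfer size. The small Python sketch below works backwards from the two reported figures to the implied transfer size. The function names are made up for illustration, and I am assuming the MB/sek column is in units of 1024 KB.

```python
def throughput_mb_s(iops, block_kb):
    """Throughput in MB/s for a given IOPS rate and transfer size in KB."""
    return iops * block_kb / 1024.0


def implied_block_kb(iops, mb_s):
    """Transfer size in KB implied by a reported IOPS and MB/s pair."""
    return mb_s * 1024.0 / iops


if __name__ == "__main__":
    # Figures from the Max Throughput-100%Read line above.
    iops, mb_s = 6639.48, 207.48
    print("implied transfer size: %.2f KB" % implied_block_kb(iops, mb_s))
    print("check: %.2f MB/s at 32 KB" % throughput_mb_s(iops, 32))
```

The two figures are consistent with a 32 KB transfer size, so the IOPS and MB/s columns agree with each other.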
Hi Jack. Thanks for your info here too. I am going to run the MPIO claim to see my settings (for validation purposes). As I had mentioned re: poor VM write performance with ToE enabled, I would like to share:
1. I found that TX/RX BOTH need to be set for flow control despite EQL support recommendation: TX only (for Broadcom anyways)
I read your blog regarding testing the redundancy of EQL and switches. Did you also encounter the following?
I wonder if other EQL users could simulate to see if this would also happen in your environment?
Thanks!
Jack
When we power OFF the master PC5448 switch, we can no longer ping the PS6000XV
We are currently testing the final switch and array redundancy. We have performed every possible failure scenario (on switch, switch ports, LAN on ESX, ESX hosts etc), and they all worked perfectly as expected, EXCEPT the following situation.
This is how we performed the test:
1. Putty into the ESX 4.1 host Service Console, then issue “vmkping 10.0.8.8 -c 3000” where 10.0.8.8 is our group IP. It can ping it without problem.
2. Turn OFF the master PowerConnect 5448 switch (we have two PC5448, master and slave, no STP, all LAG/VLAN etc. set up properly according to the guide/best practice, and all the redundant paths connected correctly between switches and ESX hosts). In vCenter the ESX 4.1 host then shows 2 out of 4 ports failed with a red cross in the iSCSI VMKernel vSwitch.
3. The “vmkping 10.0.8.8 -c 3000” stopped working until we turned the master PowerConnect 5448 switch on again.
Please note the following special findings:
a. Even though we cannot ping 10.0.8.8 from the ESX host while the master switch is off, EQL Group Manager still shows that the ESX host CAN STILL ESTABLISH iSCSI connections to it. All the VMs on that ESX host work with no problem, and we can still do VMotion between ESX hosts even with the master switch turned off. So the iSCSI connection is not dead, it just cannot be pinged from the ESX host for some reason.
b. We also performed ANOTHER SIMILAR test BY turning off individual array iSCSI ports on the master switch. We used OpenManage to connect to the master switch and then TURNED OFF the TWO ports connecting to the PS6000XV, so the PS6000XV active controller again shows 2 out of 4 ports failed with a red cross.
Please note that in both cases the EQL PS6000XV active controller sees 2 out of 4 ports failed, BUT we used TWO different methods to reach the same state (the 1st is to turn off the whole switch, the 2nd is to turn off the switch ports connecting to the array). In the 2nd case, “vmkping 10.0.8.8 -c 3000” IS STILL WORKING! How come it doesn’t work in the 1st situation? So the conclusion is that “vmkping 10.0.8.8 -c 3000” ONLY STOPS WORKING when WE TURN OFF the master switch.
Can anyone offer some suggestions please or simulate to see if this could also happen in your environment?
EQL couldn’t find a reason why. I also spent three hours with a local Pro-Support expert via WebEx on Tue, but still nothing firm. Will do an intensive test with him again tomorrow.
However, I googled around and found this guy having a similar problem to mine. Hope this information can help others who are having a similar problem to identify it ASAP.
Oct 5, 2010 3:16 AM
Fix-List from v5.0.2 Firmware:
iSCSI Connections may be redirected to Ethernet ports without valid network links.
His problem is also similar: whatever iSCSI connections are left on the LAG do not get redirected to the slave switch after the master switch is shut down. I have 4 paths and his PS4000 has two, so my iSCSI connections survived because there is an extra path to the slave switch, but somehow vmkping doesn’t work.
and if you look at comment #30 .
Jul 27, 2010
Dell acknowledged that the known issue they report in the manual of the EqualLogic Multipathing Extension Module is the same one I get.
They didn’t open a ticket at vmware for now, but they will, after some more tests.
I think this issue is there since esx 4.0. In VI3 they used only one vmkernel for swiscsi with redundancy on layer1/2, so there it should not be the case.
My case number for this issue at vmware is 1544311161, the case number at dell is 818688246.
If vmware acknowledge this as a bug in 4.1, and don’t have a workaround, we will go with at least 4 logical paths for each volume and hope that at least one path is still connected after switch1 fails, until they fix it.
Finally, it could also be something related to EQL MEM Plugin for ESX which we have installed. (Comment #29 on page 2)
It indicates there is a known issue after a network link fails (which could be caused by shutting down the master switch): if the physical NIC with the network failure is the only uplink for the VMKernel port that is used as the default route for the subnet, kernel network traffic on other NICs can be affected. This affects several types of kernel network traffic, including the ICMP pings which the EqualLogic MEM uses to test for connectivity on the SAN.
Jul 23, 2010
from the dell eql MEM-User_Guide:
4 Known Issues and Limitations
The following are known issues for this release.
Failure On One Physical Network Port Can Prevent iSCSI Session Rebalancing
In some cases, a network failure on a single physical NIC can affect kernel traffic on other NICs. This occurs if the physical NIC with the network failure is the only uplink for the VMKernel port that is used as the default route for the subnet. This affects several types of kernel network traffic, including ICMP pings which the EqualLogic MEM uses to test for connectivity on the SAN. The result is that the iSCSI session management functionality in the plugin will fail to rebuild the iSCSI sessions to respond to failures of SAN changes.
Could it be the same problem I have? So they already know about this problem?
Aside from this, it looks like the Dell MEM only makes sense in setups with more than one array per PS group, because the PSP selects a path to an interface of the array where the data of the volume is stored. And it has a lot of limitations. We only have one array per group for now, so I think I will skip this.
I still don’t understand why there is no way to prevent the connections from going through the LAG in the first place; it should be possible to prefer direct connections…
Here is my result from the newly setup EQL PS6000XV, I noticed the harddisk is Seagate Cheetah 15K.7 (6Gbps) even PS6000XV is a 3Gbps array. (I thought they will ship me Seagate Cheetah 15K.6 originally)
I’ve also spent 1/2 day today to conduct the test on different generation servers both local storage, DAS and SAN.
The results make sense and look reasonable if you look deep into them.
That is, RAID10 > RAID5, SAN > DAS >= Local, and EQL PS6000XV rocks, despite a warning saying all 4 links were 99.9% saturated during the sequential tests.
Also
Extract from VMWare Unofficial Storage Performance Comparing Equallogic and other SAN Vendors
(http://www.modelcar.hk/?p=2824)
It’s not official, but after comparing the results, I would still say Equallogic ROCKS!
Finally, I wonder why there are many results from Lefthand, NetApp, 3PAR and HDS?
We’ve recently implemented firmware 5.0.2 on our PS6000′s. We primarily use them for virtual machine storage on our ESX 4.1 cluster. How do you implement thin clones? My previous attempt included creating a 40 gig volume and installing a VM to it. I then converted it to a template. After creating a thin clone from this volume and adding it as an ESX datastore, it appears empty. Am I missing something? Please let me know your thoughts.
PS – The documents you posted regarding implementing multi-pathing were very helpful. Thank you.
I figured this out myself. In case anyone else was curious, I followed the following procedure. I am running ESX 4.1 with 7 hosts and EQL firmware 5.0.2.
1. Create a thick SAN volume 15% too large for your data
2. Mount it to ESX
3. Create a thin provisioned disk on your volume
4. Configure the VM appropriately (sysprep, etc)
5. Remove it from inventory
6. Do not delete the datastore from ESX. Take it offline on the SAN and rescan for datastores in ESX.
7. Convert the volume to a Template on the SAN
8. Create a thin clone and add them to esx one at a time. Always assign a new disk signature when adding thin datastores to inventory!
I was able to save ~6 GB per Server 2008 R2 VM. Multiply that by 80 and I’m going to save half a terabyte.
We have two locations, connected by an IPsec VPN tunnel. In one location, we have two mapped network drives on the servers where all of our employees store their data. In the other location, there are a few mapped network drives. I want to use SAN storage for this using Dell EqualLogic boxes. Can we have a Dell EqualLogic box at each location so that the data at both locations is replicated, providing high availability and disaster recovery? How should we go about this scenario?
My existing 1TB volume is running out and I extend the storage to 2TB. But vmware is not able to see the changes as some claim that vmware has problems supporting 2TB and more. Is this true?
Also, I am not able to reduce the volume using the GUI manager even when I bring it offline. How do I access the EQL’s CLI to do this? Possible to do this without going offline? I’m on firmware v5.
We rely on any QoS services of the underlying network infrastructure and – like Marc said – use the pipe we’re given with the bandwidth & latency “limitation” (or benefits) they provide.
I’m not sure it would make sense for us to engage in a lot of heroics and use our somewhat scarce resources on the array to implement a robust QoS service for replication, considering that there are a lot of moving parts & knobs in a network that it’d probably be better to leave to the network infrastructure to “own”…? Feel free to set me straight!
Replication speed and control or throttling of the replication is a common complaint I hear with EqualLogic customers. I have brought this up with numerous engineers at EqualLogic, but have gotten the same response as Mr. Sjolshagen.
Our current implementation is relying heavily on replication, with some replicas reaching into the multi-TB per week. Having the ability to burst the bandwidth on the array side for the larger replicas would be huge for us.
I’m not going to deny the fact that we obviously hear the request to control replication speeds & amounts on occasion.
Some of these requests basically represent a request to do in the array what the network infrastructure can already do. Using engineering resources to replicate capabilities that already exist in the network infrastructure is pretty difficult to justify from a business & trade-off perspective (Remember, everything our engineers spend time on comes at the expense of something they can’t spend time on instead).
That said, I’m trying to identify elements of your request that _cannot_ be achieved by using the network infrastructure capabilities, so I’ve got a clarification question;
What level of control are you looking for (that isn’t there) related to the actual replication functionality in the array?
Keep in mind that, in my view, there really isn’t much value we can add to the typical network QoS capabilities/policies, so are there elements of the actual replication process (not the transport of replication data, but maybe the configuration of a replica pair, etc) you think we can modify/add to better meet your needs?
Thank you for the follow up Thomas. We are happy to know that there are EqualLogic engineers out there listening.
We are aware of the abilities the network infrastructure can provide in shaping and classifying traffic types. These are especially useful when contention can occur or where WAN resources are limited. These network mechanisms (MPLS, WAN acceleration, QoS, etc.) are effective, but only to a certain degree. For example, consider the following.
It seems that the EqualLogic arrays already do some network shaping when it comes to replication, albeit very limited. When observing SAN HQ graphs and InMon flow information during replication, it appears that each array involved in sending replica traffic utilizes ~125Mbps of bandwidth. At an individual array level this bandwidth does not seem to fluctuate. It appears to be a hard-coded amount. Would it be possible to allow the customer to adjust this? If there is more bandwidth available, why not give the ability to the customer to adjust the amount?
Even if it was tucked away in the CLI we would be more than happy to have it available. We have some arrays that are essentially idle during replicas with regards to IO type traffic. We are confident that if the allocated bandwidth for a replica were to be increased, it would not cause any contention since no real production I/O is running on the source arrays. Additionally, we are replicating to 6500E’s who essentially serve as a dumping ground for remote site replicas. We would prefer to max out their four network interfaces with replica traffic since there is no I/O ever running against these arrays.
Even if we implemented all the network infrastructure mechanisms mentioned in the outset, they would do little to overcome the ~125Mbps we are observing.
There is an additional issue we are observing with regards to bandwidth utilization during replication. For example, a typical volume’s slices are spanned across three arrays (if available). If that volume is configured for replication a strange condition occurs. When the replica first kicks off, all three source arrays begin transmitting replica data to the destination array(s). But slowly over time, source arrays begin dropping off completely in transmitting of replica data. Eventually only one array is left to send the remainder of the replica. The replication will start out at ~350Mbps, but then drop off to ~250Mbps and then finally ~125Mbps. Is this because the deltas associated with the replica are being sent only by the arrays that contain those deltas? Wouldn’t it be better to have all the arrays participate in the replication for the entire duration? After sending their own portion or slice of the replica, the data can be backhauled from the remaining source arrays to the now idle source array. That way all source replica arrays are active and the full bandwidth can be utilized. We have one weekly replica that is ~2.5TB. If all the arrays were able to send for the entire duration of the replication, that would be a big payoff for us.
These are just a few of our observations on the bandwidth control with replication. We are big fans of our EqualLogic arrays! Thanks for pumping out the new features, especially with the 5.0 release.
Just to chuck in my 10 cents… EqualLogic ‘appear’ to traffic shape as replication traffic only runs at 125MB/sec, but this is not traffic shaping – EqualLogic only replicate data using 1 interface on the controller.
125MB/sec = 1000Mbps
It’s also worth noting that replication is performed at a lower IO priority, so you can be sure that no matter how much data there is to replicate the SAN will always give priority to busy volumes.
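Since the thread mixes Mbps (megabits) and MB/sec (megabytes), here is a small Python sketch of the conversion and of what the difference means for the ~2.5TB weekly replica mentioned earlier. The function names are made up for illustration, and I am assuming decimal units (1 TB = 10^12 bytes) and a steady rate with no protocol overhead.

```python
def mb_per_s_to_mbps(mb_per_s):
    """Megabytes per second to megabits per second (8 bits per byte)."""
    return mb_per_s * 8


def transfer_hours(size_tb, rate_mbps):
    """Hours to move size_tb terabytes at a steady rate_mbps megabits/sec."""
    bits = size_tb * 1e12 * 8
    seconds = bits / (rate_mbps * 1e6)
    return seconds / 3600.0


if __name__ == "__main__":
    print("125 MB/sec = %d Mbps" % mb_per_s_to_mbps(125))
    # The same 2.5TB replica, read with each interpretation of "125".
    print("2.5TB at 125 MB/sec: %.1f hours" % transfer_hours(2.5, 1000))
    print("2.5TB at 125 Mbps:   %.1f hours" % transfer_hours(2.5, 125))
```

At 125 MB/sec (one saturated 1GbE port) the replica takes about 5.6 hours, while at a literal 125 Mbps it would take about 44 hours, so it is worth confirming which unit the graphs are showing.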
I was trying to install MEM plugin for vsphere 4.1. Trying it with Update manager. When I attempt to import the plugin, i get the following error message:
failed to import data.
Quote
failed to import data.
Cannot verify checksum for imported upgrade file. This might be caused by corrupted metadata or binaries in the file. You can try to import a new copy of the upgrade file.
Unquote
I downloaded couple of times, and unpacked with Winrar and 7-zip. No luck.
From my understanding after speaking with a colleague, it seems that the EQ6000 boxes can simplify expansion greatly. But I was wondering if there is a limit to how many EQ boxes you can expand to? It seems that you can simply add more EQ 6000 boxes and get better performance and the added space. Am I correct in this assumption or incorrect?
Thanks,
Daniel
Howdy, Marc!
Easy one for you – I think. I’m currently running firmware v4.3.5 on our PS4000s and want to get them up to v5.0.4
I gather from the documentation I’ll need to get the HIT kit on the servers (currently at v3.3.2) up to v3.5.1 before starting that process. Do I install this right over the existing version of the HIT, or do I need to uninstall the existing version and reconfigure iSCSI etc after installation of the new one? Other considerations?
I’m also having difficulty figuring out if I can update the firmware directly from where I am to v5.0.4 – There were installation problems reported by Dell with v5.0.0 and v5.0.1 I’d like to avoid, but I don’t see anything explicitly stating I can bypass them on the way to v5.0.4
Is there any advantage to using Dell/EQ’s MPIO VMware ESX plugin over just using VMware’s native MPIO in 4.1 Update 1?
I ask because I just manually re-created all my vmkernels and vswitches for jumbo frames to use VMware’s. I then read about Dell’s. If I run the script, will it remove my work or pick up after the switch creation?
Click on the “Customer Blogs” Tag on my site, there are a couple very good Customer Blogs on the performance improvements using our EqualLogic MEM for MPIO.
Marc – I currently have a PS6010XV (with SAS drives in RAID50) and a PS6010E (with SATA drives in RAID5). I have the XV in one pool and the E in another. I am going to start replicating to a remote site with the same setup. The problem I’m running into is I do not have enough space on the SAS enclosure to configure the fast failback @ 100%. I am using 100% + whatever local reserve is set to for each respective virtual disk to calculate the local reserve space I need. How detrimental would it be to configure the two in a single pool (keeping in mind that they have mutually exclusive RAID types) and designating RAID type at the virtual disk level? I am running VMware and the SAS disks are currently being used exclusively for the logs/databases of some SQL VMs.
Whether you run the arrays in the same or different pools, as long as they are different RAID types and you set the volumes to live on a specific RAID type, you are fine. I would recommend reviewing snapshots if you are using those and the possibility of reducing snapshot reserves for your volumes to provide additional space for replication failback.
We are looking into using an Equallogic SAN solution in a cross-site Hyper-V cluster. Can you tell me if Equallogic supports CSV in such a scenario (meaning with a replicated CSV)?
We are looking to upgrade our ESXi infrastructure to vSphere 5. The current Multipathing Extension Module on the EqualLogic site is version 1.0.1 and it indicated that it only support vSphere 4.1. Do you know when a version supporting vSphere 5 will be released?
We are going to join our 2 EQL PS300E arrays, put them in 1 group, with RAID50. Dell says performance will really increase, everything will double: capacity, netw. IO, cache, read IOPS and write IOPS. And according to Dell the new 5.1H2 firmware will make it even better and faster. I understand that capacity and Read IOPS will double, but write IOPS? Doesn’t the group have to write the data to 2x more spindles, requiring more time? Or does the cache deal with that? Anyone have experience with this?
Hi Marc,
I am working for a EQL. reseller in the Netherlands and I have a question regarding a Java script problem using Mac OSX to access the EQL via the web interface!
Due to the Java script problem the user cannot use the web interface to control the EQL and more important he is missing the great features of the latest firmware update.
I know it is in the hands of our friends at Apple but maybe there is a solution for it in the US?
Please let me know when it is the case.
Regards, Paul.
With the latest update of Java for OS X the problem was solved!
So all users with OS X can use the remote update of the great firmware for the EQL!
I have a PS5000 running version 4.1.4 and want to upgrade to the latest version which is 5.1.2. I see I have to update to 4.3.8 first before going to 5.1.2. I have the 3 ESX 4.1 servers connected to the array. Can I upgrade without shutting down the ESX servers? I can upgrade after hours when there isn’t much activity on the SAN but I can’t shut everything down.
We have had a new install of Dell R710 x 4 running VMware 4.1.2 and two EqualLogic PS4100X with RAID 50 and a total of 24x300GB HDD (4.8TB with the RAID 50). We had a DPACK done which shows that we are currently using 3.9TB on our standalone server farm. We have created the groups and then created two volumes which we want to replicate to another site with the same setup, i.e. EqualLogic PS4100X with RAID 50 and 4.8TB. When we go to create the replication set it tells us that we don’t have enough space for replication if we set the reserve to 100%. We have been advised that we should drop this down to 40%. Can you tell me with the limited information whether this is the correct way to set the reserve? I would have thought that if we had 2 volumes each of 1TB and we wanted to replicate them we would need to reserve 100% for each?
I hope this makes sense sorry if not, but I am a newbie to the Equallogic storage and just wanted to make sure that the supplier isn’t trying to sell us an under sized storage solution?
Thanks.
I have done this a number of times with Equallogic and you don’t necessarily need 100% reserve, but I like to set it at that if I can. The real answer unfortunately is “it depends.” If you have a speedy connection and the replication doesn’t take very long you don’t need a large reserve, but if your connection is slow and/or there is a lot of change on the volume during replication you will need a higher reserve. So to reiterate, the volume reserve needed depends on rate of change during replication and how long it takes to replicate the volume. As a best practice I always set it to 100% so that even if the entire volume changes during replication the replication still goes through. To experiment you can drop down the reserve then run your replications and if it fails then you need to bring it up. If not you should be good to go. Don’t take my opinion as gospel but hopefully that helps you out.
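To make the “it depends” a little more concrete, here is a rough Python sketch of the rule of thumb described above: the reserve has to hold whatever changes on the volume while a replication is running. This is only a back-of-the-envelope model with a made-up function name and made-up example numbers; it is not Dell’s sizing formula, and a real reserve should leave headroom on top.

```python
def estimate_reserve_pct(volume_gb, change_gb_per_hour, replication_hours):
    """Rough reserve estimate: data changed during one replication, as a
    percentage of the volume size, capped at 100%. Illustrative model only."""
    changed_gb = change_gb_per_hour * replication_hours
    return min(100.0, 100.0 * changed_gb / volume_gb)


if __name__ == "__main__":
    # Hypothetical 1TB volume; the change rates and duration are invented.
    print(estimate_reserve_pct(1000, 20, 10))   # slow change: 20.0
    print(estimate_reserve_pct(1000, 200, 10))  # heavy change: 100.0
```

With a low change rate a figure like 40% may be plenty, while a busy volume on a slow link pushes the estimate toward the 100% that is the safe default.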
I have a question about storage vMotions with Equallogics and vSphere 5. Our ESXi hosts are connected to the Equallogic groups through iSCSI, and they have the vSphere HITKIT and MEM installed. I transferred a 30GB VM from one datastore to another and it took almost 30 minutes. How can this be sped up? It appears from looking at the iSCSI usage graphs on the ESXi host that very minimal traffic is being used during the transfer, indicating that the Equallogics are handling the transfer and it is not going through the ESXi host. There is some traffic, but just enough to indicate that it is checking that the data is being transferred properly, but not actually performing the transfer itself. Do you have any insights into this?
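For reference, the numbers in this question work out to a fairly low effective copy rate. A minimal Python sketch of the arithmetic follows; the function name is made up, 1 GB is taken as 1024 MB, and the 125 MB/sec comparison is the single 1GbE link figure quoted elsewhere in this thread.

```python
def effective_mb_per_s(size_gb, minutes):
    """Average copy rate in MB/s for size_gb moved in the given minutes."""
    return size_gb * 1024.0 / (minutes * 60.0)


if __name__ == "__main__":
    rate = effective_mb_per_s(30, 30)
    print("effective rate: %.1f MB/s" % rate)
    print("share of one 1GbE link (125 MB/sec): %.0f%%" % (100 * rate / 125))
```

That is roughly 17 MB/s, or about 14% of what a single 1GbE path can carry.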
My company just purchased a Dell PS6100 with two PowerConnect 6224′s to create our first SAN as we were in need of a storage solution. I am tasked with putting it all together and trying to follow best practice but lots of information second guesses itself. The SAN network will be isolated, with the exception of the port for management, and the two 6224′s are going to be stacked. In terms of setting up the ports on the stack, there is conflicting information regarding Portfast and Spanning Tree Protocol. Some documentation says to enable portfast and some says not to. The eight ports from the PS6100 will be split between both 6224′s. Also, each server has 2 ports that will go out to each of the 6224′s as well. When configuring the switch, what is the optimal configuration that I should be working with?
We have a PS4000vx and we are running Vmware 4.1 with 10-15 different servers one of them which is an SQL2008 server. My question is we are running RAID 10 right now but are wanting to change a few things up for extra space. I was thinking about converting it to RAID 50 but am unsure of the performance loss from RAID 10. But then I saw that you can change the RAID based on the volume.
So is it possible to change the entire storage group to RAID 50 giving us more space. But then creating a volume and making it RAID 10 for our more IO intensive servers? Would that give us the space we need and the performance we want?
A single array can only be configured with a single RAID type. In this case your PS4000 can either be configured with R10 or R50, not both.
Hopefully you are running the latest version of SANHQ and EqualLogic Firmware. With SANHQ you can review the RAID Simulator which will show what the performance would be of your current array if you converted from R10 to R50. You can convert your single array from R10 to R50 without any downtime. The conversion will take a while and run in the background. You cannot go from R50 to R10 so please remember this if you convert from R10 to R50.
I need to increase the hard drives on an EqualLogic PS4000, because one of the boxes is out of drive space. It has roughly 3% of free space now. My customer has a second PS4000 set in a Colo that is replicating snapshots and critical data and has roughly 65% of free space.
What is the best process for migrating the data off the box, increasing the hard drive sizes and restoring the volumes?
The drives are 250GB each and all 16 are being used in a RAID 50 configuration.
We bought two Equallogic 300E units from an auction for a few thousand dollars. In our lab, we found out that all four controllers don’t have the firmware flash cards in them. As a result, we could not boot the units, the fan LEDs are red and the controllers do not engage (ACT LED not green).
We had access to working controllers from other 300E units and we did the following tests:
1. Took controllers from existing working units and put them in the just-bought units. Worked! This confirms that both just-bought units are fine except that they don’t have the firmware.
2. Took the compact flash cards from existing working units and put them in the controllers of the just-bought units. Does not work! This confirms that the firmware versions are different.
I would like to know how I can get the proper firmware for these controllers. Do you know anybody that can fix them for a reasonable price.
We contacted Dell and it would cost around 3K to put each unit under warranty. We cannot afford this price tag, as we only intended to use them for testing.
I have a question on consolidation of servers per volume. Is there an approximate rule of thumb to use when using an EQL box? I have a PS5000X and a PS6010XV. The 10GbE unit houses my application servers while the 1GbE unit houses low overhead servers (DC, file server, small apps).
I was using my old world of thinking and never going over 8 servers in any particular lun. Now that luns are gone and everything is a volume, is there any real benefit at all to breaking up load across multiple volumes?
I am going to run some IOmeter tests that you have linked here, as I have never benchmarked my systems.
Environment:
(3) Dell R810s quad proc 6 core
256GB Ram
Dual intel 10GbE network adapters for vm network, guests, vmotion
Dual qlogic 8242 10GbE HBA for ISCSI
Dell 8024f 10GbE switch
VMware 5.0
Per production dell MEM installed.
I have a question on Security with VSS/VDS (or the lack thereof).
I have a Windows 2008 R2 server with the HIT toolkit installed (which included the VDS component) and because I want SQL to have consistent snapshots I have configured a VDS/VSS record with Chap authentication within the group manager.
However what I found out was that if you install the SAN Manager component of Windows 2008 R2, that server (and windows administrator) now has 100% control of the entire SAN Volumes..up to and including creating and deleting volumes at will.
I opened a support case and their comment was “just don’t install the VDS component of the HIT kit”
I cannot accept that in order to allow an application to have VSS capabilities I have to drop my pants and pray that the Windows administrator doesn’t get ahold of the HIT toolkit and then blow away my SAN.
Is this really a correct implementation or have I missed something basic? In my mind, VDS and VSS should be 2 completely separate access lists, not a single access list that gives the junior admin of a company that much potential control.
With VMware and Equallogic, we are implementing a patch from VMware that caused Volumes to go read-only on one of our two paths in a fixed mode configuration.
Is there any reason not to implement round robin with Equallogic Storage behind VMware?
MPIO in VMware vSphere 4.0 with Round Robin will enable redundancy and performance to your Datastores. This is built-in functionality with VMware vSphere 4.0. Stay tuned in the next couple of months for improvements with EqualLogic and MPIO specifically with VMware vSphere 4.0.
Marc, any news on the MPIO Module? I am about to reconfigure 2 PS6000s with vSphere and would like to use the MPIO module from the get-go. Is it likely to be available sooner rather than later? I appreciate you might not be able to give dates, but in your opinion is it imminent? Thx in advance.
As soon as vSphere 4.1 Update is released, our MPIO Module for vSphere should be following shortly after.
When I install 2008 on my physical server and connect it all up to my PS4000Es I can get up to 220Mbps throughput as recorded on IOMeter and confirmed by SAN HQ.
As soon as I use a virtual machine in the same environment the throughput drops to only 100Mbps, even if I use SQLIO instead, and this is after doing all the little MTU tweaks required by the Dell technical notes etc.
Do you have any idea why this might be the case?
Steve
Verify that everything is setup correctly from Tech Report 1049 and that Round Robin is setup inside vSphere. Once you create an EqualLogic Volume, mount as a Datastore, go into Group Manager and confirm that there are multiple iSCSI Connections to the Volume that match up with the IP Addresses/VMNics in vSphere.
Thanks Marc, I have verified the setup, the LUNs are actually being accessed directly from the Guest OS (Windows Server 2008 R2) and the Dell HIT things have been installed and configured too.
Have also tried as RDM and still experience only 100mbps.
There is a thread I’ve seen on guest iSCSI connections in vSphere with EqualLogic. Check it out, it may help. http://www.delltechcenter.com/thread/4022759/Equallogic+and+ESX+4.0+Volumes
You might not have your pathing and port groups properly configured, and it may be only using one NIC at a time, which would explain the speed difference.
Hi Marc
What are your thoughts on hot-updating (i.e. hosts active) EqualLogic firmware? I have 4 ESX 4.0 hosts and want to update our firmware to version 5, but was wondering if I could get away with doing the update without taking down all my servers.
Thanks
Depends what current version you are on. If you are on 4.2 or 4.3, there is little service interruption. We have EqualLogic Users upgrading their Firmware during production hours without any downtime.
I did this last night – updated from 4.3.5 to 5.0.1.
As the upgrade process reboots the SAN, it does cause the I/O to pause. As this is less than 60 seconds, it doesn’t cause Windows / ESX(i) any trouble.
So yes, it is possible to upgrade a SAN without shutting down all your servers. However, I would recommend that it is done out of hours, at a time when there is little disk I/O: typically after the users have gone home, but before the backups kick in!
We are using EqualLogic in a Hyper-V R2 cluster and are enjoying it, but the performance seems to be a bit under what I expected. Do you have any recommended settings for Windows 2008 R2 servers to get the performance where it should be? We are using Jumbo Frames today.
Do you have the HIT installed? Is MPIO setup correctly? Jumbo Frames, Flow Control, etc should be setup on your NICs and the Switch connecting the EqualLogic Array. Are you using CSVs?
Thanks for quick reply
Using HIT 3.4 and firmware 5.0 on the EQ. Using Least Queue Depth for MPIO. Using the built-in Broadcoms on the R710 for iSCSI traffic, and Dell 6224 switches with trunking. Flow control and Jumbo Frames are enabled on the switch, and Jumbo Frames are also configured on the NICs. And yes, we are using CSVs.
Flow control on the NIC is set to Auto
What SAN / RAID level are you using?
Got one PS6000XV with RAID50, and a PS6000E with RAID6
To test performance, you can create a small Volume on EqualLogic, 10GB as an example. Mount that to your Hyper-V Server, don’t format it. Then use Iometer (http://www.iometer.org/) to test the performance/IOPs/MBs. This will ensure that the Server/NICs/Switch/EqualLogic are setup correctly. You should stay away from RAID 6 and use RAID50 instead.
What kind of figures should I see to be sure that our system is configured as it should?
This should help:
http://tinyurl.com/33njh5c
Regarding MB/s and Iometer testing. A single Gig Ethernet NIC should provide around 110MB/s to 120MB/s. If you have MPIO enabled and have 2 NICs in your server, then you should see approx 210MB/s to 225MB/s when testing with Iometer.
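As a rough sanity check on those figures, the arithmetic can be sketched as below. The 90% efficiency factor is an assumption picked for illustration (real protocol overhead varies with frame size and workload), and the function name is hypothetical.

```python
def usable_throughput_mb_s(link_gbps: float, nics: int = 1,
                           efficiency: float = 0.90) -> float:
    """Estimate usable iSCSI throughput in MB/s across `nics` links.

    `efficiency` is an assumed fraction of raw line rate left after
    Ethernet/IP/TCP/iSCSI overhead; it is not a measured value.
    """
    raw_mb_s = link_gbps * 1000 / 8  # 1 Gbps is 125 MB/s of raw line rate
    return raw_mb_s * nics * efficiency


if __name__ == "__main__":
    for nics in (1, 2):
        print(f"{nics} x 1GbE: ~{usable_throughput_mb_s(1, nics):.0f} MB/s")
```

If Iometer reports well under the single-NIC estimate, or no gain from the second NIC, that points at MPIO, NIC or switch configuration rather than at the array.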
If you have VMs and/or other random IO workloads on the PS6000E, performance will suck with RAID 6.
Sounds like I will have to get the VMs moved to the PS6000XV and reconfigure the PS6000E.
Personally I only deploy SATA SANs in RAID 10 these days – anything else and the performance tradeoff is too great IMHO.
Is it an 8 disk array perchance?
It’s 16 x 1TB disks in the PS6000E
Will RAID 50 be sufficient for normal VMs on the PS6000E, with SQL, Exchange and the heavy VMs on the PS6000XV, also in RAID 50?
Yeah the 16 SATA drive equipped chassis rocks in RAID10.
– I have a few of those deployments out in the wild…
The reason I asked is because it wouldn’t have been the first time I’d come across a RAID6 setup with 8 disks and a customer asking why performance isn’t as it could be.
What you should have done is profiled the servers before putting them on the SAN, then you’d have an idea of what is required. Bit too late for that, I guess!
RAID 50 is pretty good for read performance, but starts to suffer with random writes. If you can spare the storage I’d still recommend RAID 10.
RAID 50 will be a major improvement over what you have.
Do you have SAN IQ installed? Is it the SAN that’s causing the bottlenecks?
Have you performed the IOmeter tests as recommended by us above? What are the results?
That’s SAN HQ, not IQ. Doh!
There is much I should have done beforehand, I have found out.
I have had to learn from the ground up with the EQ, but I am getting there. Got SAN HQ installed, yes. Will get time a bit later today to do the Iometer tests, and will post the results here as soon as I have them.
When doing backup with DPM, I get about 80-85 MB/s with the XV, and about 50-55 MB/s with the E
Lots of factors with DPM – I’d like to see Iometer stats.
Do you have SAN HQ installed?
SAN HQ is installed yes
Followed the PDF you sent about IOmeter, and here are the numbers from the two tests:
PS6000E
32K 0% read/0%random
I/O 4600-5300
140-165 MB/s
32K 100% read/0%random
I/O 3000
92-94 MB/s
PS6000XV
32K 0% read/0%random
I/O 5200-5400
165-172 MB/s
32K 100% read/0%random
I/O 5900-6600
198-202 MB/s
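For anyone checking numbers like these, IOPS and throughput are tied together by the block size, so one can be derived from the other. A minimal sketch, assuming Iometer's "32K" means 32 KiB and that the reported MB/s are binary megabytes (the helper name is hypothetical):

```python
def throughput_mib_s(iops: float, block_kib: float) -> float:
    """Convert an IOPS figure at a fixed block size into MiB/s."""
    return iops * block_kib / 1024  # KiB/s divided by 1024 gives MiB/s


if __name__ == "__main__":
    # IOPS ranges quoted above for the PS6000E, 32K blocks
    for label, iops in (("write low", 4600), ("write high", 5300),
                        ("read", 3000)):
        print(f"{label}: {iops} IOPS -> {throughput_mib_s(iops, 32):.1f} MiB/s")
```

When the two columns of a result do not agree under this conversion, the IOPS and MB/s values were probably read at different moments of the run.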
How does the 75% read / 25% write test, set to 100% random, fare?
I’m interested in average latency as well as raw bandwidth.
On the PS6000E
Average latency is about 12 ms for the 32K 0%read/0%random, and 9,5-10 on 32K 100%read/0%random
75%read/25%write/100%random
I/O 7200-8400
230-260 MB
8-8,5 ms
SAN performance figures look pretty good actually!
How many VMs are you running and what kind of workloads?
How is this compared to RAID10/50 on a box like this?
We are running about 30 VMs, ranging from simple servers to SQL 2008 for Axapta and RD farms.
Ran the 75%read/25%write/100%random test again, came out with some other figures this time:
I/O 4800-5100
156-169 MB
12-12,8 ms
Marc, if I have an ESX 4.1 host with 4 NICs dedicated to iSCSI, connected via 2 switches to a PS6000, should I see 2 or 4 paths? I saw 4 when using the VMware native driver, but only 2 once the MPIO plugin is used.
This depends on the mapping between VMKernel Ports and Physical NICs/Uplinks. If you have multiple VMKernel Ports mapped to a Physical NIC, our EqualLogic MEM will only use a single VMKernel Port. How are the vSwitches/VMKernel Ports/Physical NICs configured?
Marc, thanks for the response. I have 4 NICs on a DVS with 1 kernel port per NIC, so I thought I’d get 4 paths. All hosts are the same. I have 7 volumes created at the moment. With 4.1 using fixed routes, I get the following on my iSCSI HBA: Targets: 7, Devices: 7, Paths: 28. I also get 4 paths when I click on a volume and select Manage Paths. As soon as I install the plugin I get 14:7:14, and under Manage Paths I get 2.
Did you select Dell_PSP_EQL_Routed for the Path Selection? I posted a Blog from an EqualLogic Customer that went through this, http://marcmalotke.net/2010/07/26/customer-blog-equallogic-mem-multipathing-plugin-install-and-configuration-part-1/
After installing the MPIO module and the subsequent reboot, the Dell_PSP_EQL_Routed option is set automatically. I didn’t need to set it up. I followed the process in that link. The only thing I didn’t do was delete and re-create my NICs/vmks/switches. I kept the previous setup and just installed the module. Thanks again.
I would suggest that you contact EqualLogic Support for assistance moving forward, they are the super smart people, https://www.equallogic.com/secure/login.aspx?ReturnUrl=/default.aspx
I think you’ll find mileage in deleting your iSCSI vSwitch and use the setup script supplied with MEM to setup and configure your vSwitch using Remote CLI.
They do recommend just starting over with the vSwitch in the documentation. I second that recommendation.
Sorted it. By default you only get 2 paths with the EQL plugin. Changed the config file and after a reboot we’re in business with all 4 paths. I should have read the 24 page manual more slowly, as it explains it there. Thx for your pointers and roll on the 5.0.2 firmware.
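The path counts reported in this exchange follow from a simple product, which makes it easy to tell whether a host is seeing what it should. A small sketch of that arithmetic (the function and parameter names are hypothetical, and it assumes every volume gets the same number of sessions):

```python
def expected_paths(volumes: int, sessions_per_volume: int) -> int:
    """Total iSCSI paths a host should report for a set of volumes."""
    return volumes * sessions_per_volume


if __name__ == "__main__":
    # 7 volumes, 4 NICs with one VMkernel port each (native multipathing)
    print("native, 4 sessions/volume:", expected_paths(7, 4))
    # 7 volumes with the plugin's default of 2 sessions per volume
    print("plugin default, 2 sessions/volume:", expected_paths(7, 2))
```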
No Probs. Got an EQL old boy(b4 dell) with me in 12 days for a health check and sign off on our project so will get him to have a look it’ll be easier once he has it all in front of him. Thanks for your help and the great work you do on this site.
Hi Marc,
Is it possible to replicate snapshots?
Thanks,
Dan
Snapshots and Replication use the same “Scheduler”. Snapshots are local data protection copies within the same Group/Member. Replicated Volumes are remote data protection copies of that Volume in a different Group/Member. Today we do not replicate Snapshots that you have scheduled to happen. You would use the same “Scheduler” for Snapshots and setup Replication from one Group to another Group. The Replicated Volume in the other Group is a sum of all Snapshots, you can set Volume Replication up so you have multiple instances of the same Replicated Volume. Let me know if I can answer any additional questions or dive deeper.
Hi Marc,
I have a new array with 4.3.5 and an existing array with 4.1.4. Will these coexist for a short period (2-3 weeks) to migrate the data to the new array? Thanks.
Happy Monday! I recommend calling EqualLogic Support and confirming with those super smart people first to make sure you can migrate the data during those couple of weeks.
I see a lot of folks here already using firmware 5.0.x
I downloaded it, but got a big DONT USE IT e-mail a week later from Dell.
Today I checked on the EQL downloads site, and 4.3.7 is listed as latest, and 5.0 is relegated to the legacy firmware section …!
What gives?
PS … Is there any sort of cohesive online community/mailing list for Equallogic users? Other than here of course ;P
Happy Friday! Regarding FW 5, stay tuned for an update comment/reply from a Dell EqualLogic Resource on my Blog soon.
As for an Online Community we have the TechCenter EqualLogic Community that you can leverage.
Let me know if you need additional details.
Marc, it would help if someone from EQL could at least give us end users a better idea of where we are on the firmware update. I was expecting it (as per the email that was sent out) around the beginning of this week, and there is still no news or sign of it yet. I stupidly scheduled some time this weekend to evaluate it. Is there anywhere we can go for an update on the status of 5.0.2? Sorry for the tone, but I’m sure you can understand. Many Thanks
Happy Friday! I understand and no offense on tone whatsoever.
Regarding FW 5, stay tuned for an update comment/reply from a Dell EqualLogic Resource on my Blog soon.
Hi Darren (and Marc),
I’m employed by Dell as a senior technical consultant in the technical marketing group for the EQL engineering group.
As you noted, the date where we’d originally told our customers we expected to have something for you (August 30th) has come and gone. Our engineers are still working on getting v5.0.2 released, but we promise to keep you in the loop with any relevant updates until it does ship.
We work very hard to maintain the highest possible level of quality and truly appreciate your patience as you wait for us on this. Please let me/us know if you have any questions!
// Thomas
Thomas, thanks for the reply. I totally agree, in that I would rather it was a few days late than not ready. That said, as an Enterprise customer it would really help if we had an official update from you guys, even an updated ETA (without putting yourselves under too much pressure) or a statement on the EQL website, so at least we can make some plans.
Thanks again,
Darren
I received my PS6000XV box today and Dell loaded FW 5.0 on it. What should I do?
Please contact EqualLogic Support at 1-800-945-3355 and review this http://marcmalotke.net/2009/11/02/equallogic-new-customer-checklist/
Thanks Marc, I am contacting Pro-Support in Hong Kong, but I still don’t really understand why our replacement box came with FW5.0 (btw, the manufacturing date is Aug 18, 2010 on the sticker) while Dell strongly oppose loading FW5.0, it’s quite a contradiction to what they have recommended.
We called Dell Pro-Support and they are sending an engineer to downgrade our FW to 4.3.7, but I thought it was not possible to downgrade EQL FW? Strange.
Dell US EQL support is so great. One of their Tech Consultants called us 10 mins after we opened a case and suggested a simple method: we simply swap the compact flash card from the old unit into the FW 5.0.0 one. Case solved.
Hi Marc, thanks for your very useful blog!
I need your advice about a little design question with two ps6000XV
To have the best SAN performance with load balancing in a storage pool, Is it better to have two nodes with same RAID5 level or one with RAID 10 and one with RAID 5 ?
Thanks for your help
Unfortunately I do not have the time to answer this question right now, I will reply later. I wanted to approve your comment now so those that do review my site may reply to your question.
Hi Ner,
The answer really comes down to your IO profile and what you need out of the setup.
RAID5 is typically not a good performer, so I would never recommend this as a RAID level for non-sequential IO workloads.
From a performance perspective, you can’t get faster resilience than RAID10, so if you can afford the capacity hit having both SANs at RAID10 then this would be best.
If capacity is more important then configuring both SANs with RAID50 may be suitable, but expect around a 30% reduction in overall IOP capacity.
To answer your question, the best of both worlds could be achieved by mixing RAID10 and RAID50 SANs in the same group. Over time IO intensive volumes would be moved to the RAID10 SAN when the SAN logic determines there would be a performance benefit in doing so.
There are some gotchas over EqualLogic’s volume distribution logic in SAN groups that have different RAID levels as the calculation only takes the RAID level into consideration, not spindle speeds (it would favour a RAID10 SATA SAN over a RAID50 SSD SAN from a performance perspective!) but as both of your SANs are 15k disks, it should distribute volumes as described.
Hope this helps!
Graham
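For a feel of where a figure like "around 30%" can come from, here is a sketch using the textbook RAID write-penalty rule of thumb. This is not EqualLogic's internal logic; the penalty values, the 75/25 read/write mix and the raw IOPS number are all assumptions for illustration.

```python
# Textbook back-end I/Os generated per front-end write (rule of thumb only)
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID50": 4, "RAID6": 6}


def effective_iops(raw_iops: float, read_fraction: float, raid_level: str) -> float:
    """Front-end IOPS a disk set can sustain for a given read/write mix."""
    write_fraction = 1.0 - read_fraction
    penalty = WRITE_PENALTY[raid_level]
    return raw_iops / (read_fraction + write_fraction * penalty)


if __name__ == "__main__":
    raw = 2400.0  # hypothetical aggregate back-end IOPS of the spindles
    r10 = effective_iops(raw, 0.75, "RAID10")
    r50 = effective_iops(raw, 0.75, "RAID50")
    print(f"RAID10: {r10:.0f}  RAID50: {r50:.0f}  "
          f"reduction: {100 * (1 - r50 / r10):.0f}%")
```

With a more write-heavy mix the gap widens, and with a read-only workload it disappears, which is why the answer depends so much on the IO profile.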
Ner,
I would post this question on the Dell Techcenter website. They’ve got dedicated storage people there.
http://www.delltechcenter.com/
Hi all
Many thanks for your answers !
I’m asking this because, with 2 arrays in RAID 5, the IO-intensive volume can be striped across the 2 arrays, can’t it? And with one array in RAID 10 and one in RAID 5, the volume can move to RAID 10, but will be striped on only one array...
So the question finally is: which is better? RAID 10 striped on ONE array, or RAID 5 striped on TWO arrays?
(Graham, you are right, 50 is better than 5, but it is the same question between 10 and 50.)
Thanks for your help
Write performance of one RAID10 SAN would be better than 2 RAID5 SANs IMHO.
Please don’t use RAID5
Hi Graham,
I just came across this PDF last week, done by Veritest back in 2006. It compares R10 vs R50. It’s not directly related, but it somewhat explains the difference when adding more members to the group.
http://www.lionbridge.com/competitive_analysis/reports/equallogic/EqualLogic_PS_Series_Test_Final_Report.pdf
The Veritest report in 2006 was on 15K PS3000 arrays. The real-life results show a difference of about 36% for 1 box (R10 vs R50), which gradually dropped to 22% with 5 boxes. Shouldn’t the R10 vs R50 difference get even wider with more boxes (i.e. R10 should be way ahead of R50)? It seems that with more boxes, say up to 12 in a storage pool, R10 vs R50 may differ by less than 5-10%. Strange. Any idea?
Jack
Hi Jack,
I have some problems using a four year old document as a basis for a modern day comparison as I have no doubt that the ongoing hardware and software development EqualLogic have made over this period have improved things – interesting reading though and I do have some comments to make:
No mention of network configuration settings – were they using jumbo frames or flow control? These would make big differences to performance figures, especially as you scale up.
The report doesn’t seem to discuss the way that volume distribution works with EqualLogic groups and again this may affect performance figures. Let me explain:
Regardless of how many SANs you have in a group, a volume is not spread across more than (I think) 3 SANs. Internal performance monitoring moves volumes over time to optimise this placement.
Because you can’t control this placement, did they check volume placement was distributed evenly across the SANs? It is possible that volume placement made some SANs in the group busier than others during the brief testing period, which would make a difference to performance testing.
A further point in the report that adds ammunition to my theory above is that they added additional members to the group instead of creating the whole group at once. This could mean that if they had already created their volumes on 1 SAN and then added an additional 3 SANs to the group, they may not have given the SAN logic time to distribute volumes fully before they performed their testing. This would also cause lop-sided results.
So as you can see, there’s a lot of applicable complexity to the test setup that they don’t adequately explain and without these answers I would question the accuracy of the report.
I still say RAID10 is faster than a 2 SAN RAID5 group.
Graham
Hi Graham,
1. I read somewhere that the PS5000XV is actually the re-branded PS3800XV from after Dell acquired EQL. Then came the PS6000XV, in which the cache was doubled (2GB to 4GB in total), there is one more Gbit link (4 vs 3), and the RISC processors went from dual-core to quad-core.
2. Although I think it’s a biased report (which answers some of your questions), as it states “Test report prepared under contract from EQL” right beneath the title, I still think it gives us a general idea of how the different RAID groups performed (R10 vs R50). I don’t think it’s really that outdated.
3. FYI, test 6 was actually performed on five members from the beginning instead of adding members gradually, but its performance is only a bit behind that of adding members gradually.
4. BTW, you are correct that a volume (aka LUN) won’t spread across more than 3 members. I was told the same thing before we purchased the box. We were also told that even with RAID5 you can sometimes withstand more than 2 disk failures, because there are multiple RAID5 sets within the RAID5. I am not sure if you know what I mean, but basically EQL’s RAID implementation is more advanced and redundant than the normal RAID we use in PowerEdge servers.
5. Actually reading the Key Findings give you many hints already
6. Yes, I second your preference for R10. We used it on our PS6000XV as well, as we can always expand to R50 later.
7. One thing still confuses me:
The Veritest report in 2006 was on 15K PS3000 arrays. The real-life results show a difference of about 36% for 1 box (R10 vs R50), which gradually dropped to 22% with 5 boxes. Shouldn’t the R10 vs R50 difference get even wider with more boxes (i.e. R10 should be way ahead of R50)? It seems that with more boxes, say up to 12 in a storage pool, R10 vs R50 may differ by less than 5-10%. Strange. Any idea how come and why?
Hi Everyone,
I spent almost 4 hours on the phone today, from midnight to 4am, troubleshooting with Dell EqualLogic consultants in the US via WebEx.
We had found that the EQL I/O test performance was low: only 1 path was active under 2-path MPIO, and disk latency was particularly high during writes on the newly configured array.
It was finally solved because we had forgotten the most fundamental concept of all: EqualLogic takes time to kick in the additional paths under MPIO! You need to wait, say, at least 5 minutes to see the remaining paths kick in.
Anyway, I must say EQL’s support is excellent; they gave me many insights for solving this problem from multiple dimensions.
For details, please see my blog:
Equallogic takes time to kick in the additional paths under Windows MPIO (http://www.modelcar.hk/?p=2746)
Jack, thank you very much for taking the time and posting more details on your Blog. I will send updates out on Twitter and Facebook with a link to your Blog.
We are migrating data to a new array. We have created a new pool and will move volumes to the new pool one at a time. What is the fastest way to move the data? We have a short maintenance window and need to move 15 TB across a 1 GB connection. We are trying to avoid running all the host servers across the 1GB connection and will be moving the servers to where the new array is while the data is migrating.
Thanks
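To size a maintenance window like this one, a back-of-the-envelope estimate helps. The sketch below assumes the "1 GB connection" is a single 1 Gbps link, decimal terabytes, and an illustrative 90% link efficiency; none of these come from a measurement, and the function name is hypothetical.

```python
def transfer_hours(data_tb: float, link_gbps: float = 1.0,
                   efficiency: float = 0.90) -> float:
    """Best-case hours needed to push `data_tb` terabytes over one link."""
    data_mb = data_tb * 1_000_000                  # decimal TB to MB
    rate_mb_s = link_gbps * 1000 / 8 * efficiency  # usable MB/s on the link
    return data_mb / rate_mb_s / 3600


if __name__ == "__main__":
    print(f"15 TB over 1 Gbps: ~{transfer_hours(15):.0f} hours")
```

This is a floor rather than a forecast: host traffic sharing the link, or a member that cannot sustain line rate, will stretch it further.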
We are purchasing a new PS6000X array. If we decide to gain additional SAN space by not using hot spares, and instead keep a spare (or 2) on the shelf, is that strongly discouraged? Is the slight gain in risk worth the extra space/spindles? Is anyone doing this now?
I know the question may be difficult to answer since the question depends on the specific environment and importance of the data. But with a next day service plan, is it worth to consider?
Hi all, wondering if someone out there can help out and/or shed some light on a problem I am having with a new 6500E in my lab. Starting to build it up for production.
Here’s the scoop:
FW: 4.3.7
HIT 3.4.1
Host: PowerEdge R810 (2008 R2 x64 Hyper-V)
Switches: Cisco 3750-G stacked and configured as per Dell tech docs in relation to spanning-tree, unicast storm control, etc etc. (BTW, if any one has tried, or is trying, you CANNOT set jumbo-frames per VLAN as the Dell documentation states someplace. It is a system wide change on these switches).
VLAN 20 is dedicated for iSCSI, Native VLAN is for production network.
A few other VLANs exist on these switches for other purposes (primarily Live Migration, CSV/Heartbeat, and VOICE!)
MPIO is working correctly with Jumbo Frames from the R810 LOMs (Broadcom). Moving data back and forth from the R810 host to a volume on the SAN seems good. (I will post IOmeter stats here shortly).
So, I create a new VM (2008 Server) on an EQL volume and fire it up:
Problem/issues:
1. On the desktop of the newly created VM, I simply grab a 2GB file and make a copy of it within the VM. This copying process takes > 10 mins to complete.
In this trial, I am assuming the file is simply being duplicated on/within the array?
This same operation is almost instantaneous on the physical host.
2. Create another VM and try copying the same 2GB file b/t VMs. In this case when I copy/pull FROM a VM to a VM it seems good, but if I copy/push TO a VM from a VM, the same bottleneck as #1 seems to occur.
During both tests, on the physical host, I see the LOMs jump to 15-20% for a few seconds, then drop to 0%. Jump to 15-20%, drop to zero. This pattern repeats. I do not see any retransmits or errors on the NICs of the EQL to speak of. I do not see any retransmits/CRC errors etc. on the interfaces from the switch perspective either.
Any ideas?
Phil,
Are you using any Broadcom Teaming (BACS3) on the LOMs? If yes, please disable it. We had a similar problem with the teaming.
Also, Broadcom Teaming (BACS3) causes a lot of problems on MS Hyper-V; search Google for it.
If you are not using those, then I would suggest you to open (log) a case with EQL, their support is really good.
Hi chiat, I do have that software installed, but I am not teaming those LOMS. I will peel it off and see what happens. Will advise. Also, IOMETER will not run on my x64 boxes (due to processor timer issue)…ughs…
No go on removing bacs….calling dell/eql..
Marc, Any news on the v5.02 firmware update?
Thanks
Stay tuned, you will hear back from me on this soon.
I’ve already opened a case with EQL, but just in case someone has experienced this before.
Thanks in advance.
PS6000XV MPIO and Disk Read Performance Problems
A Quick question before even going into the following:
A Single Equallogic Volume IS LIMITED TO 1Gbps bandwidth ONLY at Max? (ie, The volume won’t send/receive more than 125MB/sec even there are MPIO NICs and iSCSI sessions connected to it) Does this apply to a single volume within just one member or it can break the 125MB/sec limit if the volume spans across 2 or more members? (for example 250MB/sec if the volume is spread over 2 members)
Summary (2 Problems Found)
a. PS6000XV MPIO DOESN’T work properly and limited to 1Gbps on 1 Interface ONLY on Server (initiator side)
b. 100% Random Read IOmeter Performance is 1/2 of 100% Random Write
Environment:
a. iSCSI Target Equallogic: PS6000XV 1 array member only, loaded with Latest Firmware 5.0.2 and HIT 3.4.2, configured as RAID10. (16 600GB SAS 15K Disks), HIT Kit and MPIO is installed properly, in MPIO, MSFT2005iSCSI BusType_0x9 is showing besides EQL DSM.
b. iSCSI Initiator Server: Poweredge R610 with latest firmware (BIOS, H700 Raid, Broadcom 5709C Quad, etc)
c. iSCSI Initiators: Using two Broadcom 5709C (one from the LOM, one from the add-on 5709C quad card), using the Microsoft software iSCSI Initiator (not the Broadcom hardware iSCSI Initiator mode, that is). No Teaming (I didn’t even install Broadcom’s teaming software, as I want to make sure the teaming driver doesn’t load into Windows). I’ve also disabled all Offload features, as well as RSS and Interrupt Moderation. I have set Flow Control to “TX & RX” and Jumbo Frame MTU to 9000 (the EQL Group Manager event log shows that the initiator is indeed connecting using Jumbo Frames). Each NIC has a different IP in the same subnet as the EQL group IP.
d. Switches: Redundant PowerConnect 5448, set up according to the best practice guide: enabled Flow Control, Jumbo Frames, STP with Fastports, LAG, a separate VLAN for iSCSI, and disabled iSCSI Optimization. Tested that redundancy is working fine by unplugging different ports and switching off 1 of the switches, etc.
e. IOMeter Version: 2006.07.27
f. Windows Firewall has been turned off for Internal Network (ie, Those two Broadcom 5709C NICs sub-net)
g. There is no error at all showing after a clean reboot.
h. Created two Thick volume (each 50GB) on EQL and assigned iqn permission to the above two NICs iSCSI name.
Using HIT kit, we define MPIO to “Least Queue Depth”, even with just one member, we want to increase the iSCSI session to volumes on that member, so we also set Max sessions per volume slice to 4 and Max sessions per entire volume to 12. So right away we see the two NICs/iSCSI initiators connects volume as 8 paths (2 paths for each NICs to a volume x 2 NICs x 2 volumes)
IOMeter Test Results:
2 Workers, 1GB test file on each of the iSCSI volume.
a. 100% Random, 100% WRITE, 4K Size
- Least Queue Depth is working correctly as all Interface is showing different MB/sec.
- IOPS is showing impressive number over 4000.
b. 100% Random, 100% READ, 4K Size
- Least Queue Depth DOESN’T SEEM TO work correctly, as all Interfaces are showing equal/balanced MB/sec. (Looks like Round Robin to me, but the policy has been set to Least Queue Depth)
- IOPS is showing 2000, which is 1/2 of Random Write’s 4000 IOPS. STRANGE!
c. 100% Sequential, 100% WRITE, 64K Size
- Least Queue Depth is working correctly as all Interface is showing different MB/sec.
d. 100% Sequential, 100% READ, 64K Size
- Least Queue Depth DOESN’T SEEM TO work correctly, as all Interfaces are showing equal/balanced MB/sec. (Looks like Round Robin to me, but the policy has been set to Least Queue Depth)
All of the above test (a to d), the 4 EQL Interface reached total of 120MB/s ONLY, somehow it’s FIXED to one NIC on R610 only and MPIO didn’t kick in even I waited for 5 mins, so all the time there is only one NIC participating in the test, I was expecting 250MB/s with 2 NICs as there are 8 iSCSI sessions/path to two volumes.
I even tried disabling the active iSCSI NIC on the R610. As expected, the other standby NIC kicked in immediately without dropping any packets, but I just can’t get BOTH NICs to load-balance the total throughput. I am not happy with 120MB/sec from 2 NICs. I thought EqualLogic would load balance iSCSI traffic between the connected iSCSI initiator NICs.
SAN HQ reports no retransmit errors to speak of, always below 2.0%. One error was found, though, saying one of the EQL interfaces is sometimes saturated at 99.8%. (Is this due to Least Queue Depth?)
Findings (again 2 Problems Found)
a. PS6000XV MPIO DOESN’T work properly and limited to 1Gbps on 1 Interface ONLY on Server (initiator side)
b. 100% Random Read IOmeter Performance is 1/2 of 100% Random Write
I read somewhere on Google saying EQL’s limit on each volume is 125MB/s:
“Though the backend can be 4 Gbps (or 20 Gbps on PS6x10 series), each volume on the EqualLogic can only have 1 Gbps capacity. That means, your disk write/read can go no more than 125 MB/s, no matter how much backend capacity you have.”
“It turns out that the issue was related to the switch. When we finally replaced the HP with a new Dell switch we were able to get multi-gigabit speeds as soon as everything was plugged in.”
And I don’t think there is anything wrong with the switch settings, as we also connect two other R710s using VMware and we are constantly seeing 200MB+, so there must be some setting problem on the R610.
Could it be:
a. Set MPIO policy back to Round Robin will effectively use the 2nd NIC (path)?
b. Do any settings need to be changed in the Broadcom NIC’s Advanced settings? Enable RSS and Interrupt Moderation again?
Anyone? Please kindly advise, Thanks!
Jack
Hi Jack,
Try setting the BroadCom LOMs advanced properties Flow control to TX only. I had a HORRIBLE throughput problem when TX/Rx were both enabled.
(As I am not at my environment at the moment, I will not be able to confirm the rest of my BroadCom advanced settings, but will do so on Monday and post them here)
I am VERY interested to hear what EQL/Dell says about your first thought:
A Single Equallogic Volume IS LIMITED TO 1Gbps bandwidth ONLY at Max? (ie, The volume won’t send/receive more than 125MB/sec even there are MPIO NICs and iSCSI sessions connected to it) Does this apply to a single volume within just one member or it can break the 125MB/sec limit if the volume spans across 2 or more members? (for example 250MB/sec if the volume is spread over 2 members)
Question for you: Your Hosts…..are you running x64 OS by chance? I have a similar set up as you and would like to run some IO Meter tests against mine and see what I get compared to you. Trouble is, I have NOT been able to get IOMeter to work on my 810s running x64 OS. (Always shows negative value error)
6500E – 48 TB – RAID 50
Firmware 5.0.2 and HIT 3.4.2
(2) R810′s, each with (4) LOMs (Broadcom) and each with (8) Intel ports (2 quad add-ons per server)
Using all 4 LOMs on each host for iSCSi (no teaming or BACS)
Using 3 of the remaining 8 on each host for
Hyper-V/CSV/Management roles, the other 5 are teamed up 802.3ad for guest VMs
Stacked Cisco 3750-G’s
In my own testing with IOmeter, I have seen over 200MB/s with 2 NICs and over 300MB/s with 3 NICs. Since you have a case open with support, you should have this issue resolved soon.
Check NIC, MPIO and Switch settings, plus vswitch and vnic settings in Hyper-V.
Thank you.
Marc
Marc/Phil,
I checked the MPIO output; somehow it’s using the Dell EQL DSM instead of the Microsoft DSM. Does this look normal to you guys?
See
MPIO Storage Snapshot on Saturday, 02 October 2010, at 23:33:04.307
Registered DSMs: 2
================
+--------------------------------|--------------------|----|----|----|---|-----+
|DSM Name                        |      Version       |PRP | RC | RI |PVP| PVE |
|--------------------------------|--------------------|----|----|----|---|-----|
|Microsoft DSM                   |006.0001.07600.16385|0120|0003|0001|030|False|
|Dell EqualLogic DSM             |003.0004.00000.5282 |0120|0003|0001|030|False|
+--------------------------------|--------------------|----|----|----|---|-----+
Microsoft DSM
=============
No devices controlled by this DSM at this time!
Dell EqualLogic DSM
===================
MPIO Disk4: 04 Paths, Least Queue Depth, ALUA Not Supported
SN: 6090A078E03BB0C762CAE49EE7007042
Supported Load Balance Policies: FOO RR LQD
Path ID State SCSI Address Weight
—————————————————————————
0000000077030005 Active/Optimized 003|000|005|000 0
Adapter: Microsoft iSCSI Initiator… (B|D|F: 000|000|000)
Controller: 1DC4C8F600000000 (State: No Controller)
0000000077030004 Active/Optimized 003|000|004|000 0
Adapter: Microsoft iSCSI Initiator… (B|D|F: 000|000|000)
Controller: DA372AF600000000 (State: No Controller)
0000000077030003 Active/Optimized 003|000|003|000 0
Adapter: Microsoft iSCSI Initiator… (B|D|F: 000|000|000)
Controller: D3918EF500000000 (State: No Controller)
0000000077030001 Active/Optimized 003|000|001|000 0
Adapter: Microsoft iSCSI Initiator… (B|D|F: 000|000|000)
Controller: 6A4759CF00000000 (State: No Controller)
MPIO Disk3: 04 Paths, Least Queue Depth, ALUA Not Supported
SN: 6090A078C06B1219D3C8D49CF188CD5B
Supported Load Balance Policies: FOO RR LQD
Path ID State SCSI Address Weight
—————————————————————————
0000000077030008 Active/Optimized 003|000|008|000 0
Adapter: Microsoft iSCSI Initiator… (B|D|F: 000|000|000)
Controller: DE6DABF800000000 (State: No Controller)
0000000077030007 Active/Optimized 003|000|007|000 0
Adapter: Microsoft iSCSI Initiator… (B|D|F: 000|000|000)
Controller: FFCB0DF800000000 (State: No Controller)
0000000077030006 Active/Optimized 003|000|006|000 0
Adapter: Microsoft iSCSI Initiator… (B|D|F: 000|000|000)
Controller: BD3F6FF700000000 (State: No Controller)
0000000077030000 Active/Optimized 003|000|000|000 0
Adapter: Microsoft iSCSI Initiator… (B|D|F: 000|000|000)
Controller: 770147CD00000000 (State: No Controller)
MSDSM-wide default load balance policy: N\A
No target-level default load balance policies have been set.
Is the HIT installed and setup correctly with MPIO?
Something changed….for good finally!!!
After taking a shower, I decided to run IOmeter in a VM instead of on a physical machine.
FYI, I’ve installed MEM and upgraded the firmware to 5.0.2; I think those helped!
First time in history on my side!!! It’s FINALLY over 250MB/s during 100% Seq. Write, 0% Random and over 300MB/s during 100% Seq. Read, 0% Random.
1. Do the IOPS look good? (100% RANDOM, 4K: Write is about 4,500 IOPS and Read is about 4,000 IOPS, 1 member only, PS6000XV)
2. Does the throughput look fine? I can add more workers to get it to the peak of 400MB/sec; it is currently 300MB/sec for Read and 250MB/sec for Write.
Now if 1 and 2 above are both OK, then we are left with only 1 Big Question.
Why doesn’t this work in physical Windows Server 2008 R2? The MPIO load balancing somehow never kicks in; only failover is working.
Thanks,
Jack
Thanks Phil and Marc for your feedback (Phil, please do let me know the advanced settings of your 5709C). I shall try it later tonight (maybe I should leave it until tomorrow morning, as all my late-night trials always end up taking my sleep time away).
Btw, there is no Hyper-V; I simply connect my W2K8 R2 64-bit box directly to the EQL SAN and do the testing. This time I waited long enough, but the 2nd link never kicks in. HIT, MPIO, NIC and switch were all checked and have proper settings.
Btw, I have encountered a new problem almost every day recently during the setup period of my project; it’s hard but rewarding.
see this…Do not update firmware/BIOS from within ESX console (http://www.modelcar.hk/?p=2780)
Btw, regarding VMware vCenter NICs question (sorry Marc, a bit off here),
On my R610 I installed vCenter on top of W2K8 R2, and I want to use two NICs for the VMware Service Console/COS LAN. Previously I could use BACS teaming, but I no longer have the teaming software installed, so I run with only 1 NIC to the COS network now. Is there any way to give my vCenter TWO uplinks?
Currently, in Veeam SCP or the host field of the vSphere client, I type in myVC.domainname.com but it resolves to 1 IP (to 1 NIC). How can I add redundancy here?
Anyone please?
Hi Jack.,
I will get the advanced props for you on the Broadcom for sure.
You had mentioned you ran IOMeter in the VM….a few questions about this:
1. Was the VM hosted on the array or right off of the HD on your W2K8 R2 64bits box?
2. Within the running VM I am assuming you made iSCSI connections to the array before running your test?
Yea, sorry I do not have a whole lot of VMware experience. Hyper-V over here.
Final question, what type of processor(s) in your W2K8 R2 64bits box? I really need to get IOMETER going on my box, but it **may** have an issue with the type of processors I have.
Mine are: Intel® Xeon® E6540 2.00GHz, 18M cache, 6.40 GT/s QPI, Turbo, HT, 6C
Hi Phil,
FYI, we use VMware ESX 4.1, pls also see details below.
1. The VM is hosted on an ESX host (R710), and the R710 connects to the array, so the VM’s datastore is on the EQL PS6000XV via the R710.
2. I didn’t even bother to connect this VM directly to the SAN subnet, as I simply want to test how the ESX host performs. I didn’t install HIT or MPIO on this VM either. Btw, the VM is using VMXNet3 and Version 7.
3. What I did is really simple: assign this VM two new disks (disk 2 and disk 3, which live in different volumes on the EQL), then use Disk Management under Computer Management to initialize them, but do not format them; use them RAW. (Note: I did not map these two disks directly from the EQL, but via an ESX datastore.)
Then I fire up IOmeter (2006 version) within the VM. If you look at Task Manager, the network usage is of course 0, but the bandwidth and IOPS in IOmeter correspond exactly to what SAN HQ shows in its real-time graphs (something like 300MB/s during sequential tests, and 4000-4500 IOPS during random tests).
4. So what does it mean? It means the ESX host is doing the actual disk I/O on the back end. Using esxtop, then n to show network, all vnics connecting to the EQL also show exactly the same throughput as SAN HQ.
There is 0% retransmit during the intensive IOmeter test.
So this proves there is no misconfiguration in the switch, the ESX hosts or the array.
But back in the physical world, the R610 still has the no-MPIO-load-balancing problem; only failover works.
The R610 has a single E5620, 2.4GHz, HT, 4 cores, and 12GB RAM. The CPU never went over 10% in any test.
1. Could it be that iSCSI is not using the Microsoft DSM as default, but instead uses the Dell EqualLogic DSM?
Microsoft DSM
=============
No devices controlled by this DSM at this time!
Dell EqualLogic DSM
===============
Finally, EQL support confirmed THERE IS NO SUCH THING AS A SINGLE VOLUME BEING LIMITED TO 1Gbps (or 125MB/sec). Sorry to confuse anyone, but sources from the Internet are sometimes NOT reliable.
Hi Jack,
Thanks for the detail. I now better understand your setup and testing environment.
Hyper-V works in similar ways when provisioning disks to the guest VMs. There are 3 basic ways to do so:
1. Host connects to EQL via iSCSI.
- Disk(s) are formatted on host, brought online, given a drive letter, and VHDs (equivalent to VMDK) are created on those disks via Hyper-V Manager
2. Host connects to EQL via iSCSI.
- Disk (s) are NOT FORMATTED, LEFT OFF-LINE
- Hyper-V Manager then allows this “Offline Disk” to be used as a “passthrough” disk to a VM
3. Hosts connect to EQL via iSCSI
- Disk(s) are formatted on hosts and brought online
- Failover Cluster Manager is used to add these disks as “Available Storage” and/or “Cluster Shared Volumes” (Both of which sound similar to ESX Datastore)
-When a CSV is created the disk shows as “RESERVED” within disk management on the hosts (no drive letters etc).
- VHDs are then created in these CSVs
Of course, there is a last way where HIT/MPIO is installed on an already existing VM and the VM makes direct iSCSI connection to the EQL. I do not like this way b/c it would require the VM to participate in the same VLAN as the iSCSI traffic. (assuming the VM itself is not multi-homed)
Regarding your R610 problem:
1. I will have to see my results with IOMETER (if I can get it working) on my hosts system to see if I can duplicate, or get close to your results.
2. I initially had a problem with the TX setting on Broadcom as I mentioned before
3. I also had a problem where the EQL DSM and the MSOFT DSM were both battling for the same LUNS on the EQL. In San HQ you could see the logs where there was continuous disconnect and reconnect to the LUNS (each one was kicking the other out over and over again).
4. The above problem 3, however, does not seem to be what is going on in your case. That is why I wanted to see if I can do EXACTLY what you are doing in relation to the placement of IOMETER (in host and guest)
Hi Jack,
As promised here are my settings in BroadCom advanced properties:
Ethernet@Wirespeed = Enable
Flow Control = Tx Enabled
Interrupt Modification = Enable
IPv4 Checksum Offload = None
IPv4 Large Send Offload = Enable
IPv6 Checksum Offload = None
IPv6 Large Send Offload = Disable
Jumbo MTU = 9000
# of RSS Queues = 8
Priority + VLAN = Priority and VLAN Enabled
Receive Buffers = 750
Receive Side Scaling = Enable
Speed + Duplex = Auto
TCP Connection Offload (IPv4) = Enable
TCP Connection Offload (IPv6) = Disable
Transmit Buffers = 1500
With these settings IO Meter yields the following AT THE HOST:
32K: 0% Read; 0% Random:
I/O sec – 9353
MB sec – 291.92
Avg I/O Resp: 6.86
32K: 100% Read; 0% Random:
I/O sec – 15070
MB sec – 471.03
Avg I/O Resp: 4.25
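As a quick sanity check on IOMeter figures like the ones above, throughput should be roughly IOPS times block size. Here is a minimal Python sketch, assuming "32K" means 32 KiB and that IOMeter's MB is really MiB (both are my assumptions; the function name is only illustrative):

```python
def throughput_mib_per_s(iops: float, block_kib: float) -> float:
    """Return throughput in MiB/s for a given IOPS rate and block size in KiB."""
    return iops * block_kib / 1024.0


if __name__ == "__main__":
    # IOPS values taken from the 32K results posted above.
    for label, iops in (("32K 100% write", 9353), ("32K 100% read", 15070)):
        print(f"{label}: {throughput_mib_per_s(iops, 32):.2f} MiB/s")
```

The computed values land within a fraction of a percent of the posted 291.92 and 471.03, which suggests the IOPS and MB/sec columns are consistent with each other.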
Curiously, with these settings the VMs actually get poor Write performance when running IOMETER in the guests.
If I change TCP Connection Offload (IPv4) to Disabled in the Advanced Properties of BroadCom LOMS, then I get better Write performance in the VMs, but not so good Write performance on the HOSTS. The Read remains the same in either case.
I need to do some more experimentation here because something is a bit odd with these results.
Thanks Phil for the detail feedback.
I found the reason just 5 mins ago!!!
Solution FOUND! (October 8, 2010)
C:\Users\Administrator>mpclaim -s -d
For more information about a particular disk, use 'mpclaim -s -d #' where # is the MPIO disk number.
MPIO Disk    System Disk  LB Policy    DSM Name
-------------------------------------------------------------------------------
MPIO Disk1   Disk 3       LQD          Dell EqualLogic DSM
MPIO Disk0   Disk 2       FOO          Dell EqualLogic DSM
That’s why: somehow the testing volume, Disk 2, had been set with an LB policy of Fail Over Only (FOO). No wonder it was using ONE PATH ALL THE TIME. After I changed it to LQD, everything works like a champ!
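For anyone who wants to check a lot of disks for this, here is a rough Python sketch that scans pasted "mpclaim -s -d" output for disks stuck on Fail Over Only. It assumes the rows look like the ones pasted above (MPIO disk, system disk, policy, DSM name, separated by whitespace); the function name and sample text are only illustrative, and real output may be laid out differently.

```python
import re
from typing import List, Tuple

# Matches data rows such as "MPIO Disk0 Disk 2 FOO Dell EqualLogic DSM".
# The header row has no digit after "MPIO Disk", so it is skipped.
ROW = re.compile(r"^(MPIO Disk\d+)\s+(Disk \d+)\s+(\S+)\s+(.+)$")


def failover_only_disks(mpclaim_output: str) -> List[Tuple[str, str]]:
    """Return (MPIO disk, system disk) pairs whose LB policy is FOO."""
    found = []
    for line in mpclaim_output.splitlines():
        match = ROW.match(line.strip())
        if match and match.group(3) == "FOO":
            found.append((match.group(1), match.group(2)))
    return found


# Sample text copied from the output pasted in this thread.
SAMPLE = """\
MPIO Disk System Disk LB Policy DSM Name
MPIO Disk1 Disk 3 LQD Dell EqualLogic DSM
MPIO Disk0 Disk 2 FOO Dell EqualLogic DSM
"""

if __name__ == "__main__":
    for mpio_disk, system_disk in failover_only_disks(SAMPLE):
        print(f"{mpio_disk} ({system_disk}) is set to Fail Over Only")
```

Any disk it reports would be using a single path, as Disk 2 was here.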
Then I just performed an IOMETER test. Wow! The RealLife-60%Rand-65%Read result is crazy high!!! Almost 7200 IOPS.
SERVER TYPE: Physical
CPU TYPE / NUMBER: CPU / 1
HOST TYPE: Dell PER610, 12GB RAM; E5620, 2.4 GHz, 4 Cores Total
STORAGE TYPE / DISK NUMBER / RAID LEVEL: Equallogic PS6000XV x 1 (15K), / 14+2 600GB 15K Disks (Seagate Cheetah 15K.7) / RAID10 / 500GB Volume, 1MB Block Size
SAN TYPE / HBAs : Broadcom 5709C NICs with 2 paths only (ie, 2 physical NICs to SAN)
Worker: Using 2 Workers to push PS6000XV to its IOPS peak!
##################################################################################
TEST NAME——————-Av. Resp. Time ms——Av. IOs/sek——-Av. MB/sek——
##################################################################################
Max Throughput-100%Read……..14.3121……….6639.48………207.48
RealLife-60%Rand-65%Read……12.8788……….7197.69………150.51
Max Throughput-50%Read……….11.3125……….6837.76………213.68
Random-8k-70%Read……………..13.7343……….6739.38………142.22
EXCEPTIONS: CPU Util. 25.99, 24.10, 28.22, 25.36%;
##################################################################################
Hi Jack. Thanks for your info here too. I am going to run the MPIO claim to see my settings (for validation purposes). As I had mentioned re: poor VM write performance with ToE enabled, I would like to share:
1. I found that TX/RX BOTH need to be set for flow control despite EQL support recommendation: TX only (for Broadcom anyways)
2. I opened an official case with Microsoft
3. I opened an official case with BroadCom
Will post what turns out in the end.
Marc,
I read your blog regarding testing the redundancy of EQL and switches, did you also encounter the followings?
I wonder if other EQL users could simulate to see if this would also happen in your environment?
Thanks!
Jack
When we power OFF the master PC5448 switch, we can no longer ping the PS6000XV
We are currently testing the final switch and array redundancy. We have performed every possible failure scenario (on switch, switch ports, LAN on ESX, ESX hosts etc.); they all worked perfectly as expected, EXCEPT ONE of the following situations.
This is how we performed the test:
1. PuTTY into the ESX 4.1 host Service Console, then issue “vmkping 10.0.8.8 -c 3000”, where 10.0.8.8 is our group IP; it can ping it without problem.
2. Turn OFF the master PowerConnect 5448 switch (we have two PC5448s, master and slave, with no STP; all LAG/VLAN etc. has been set up properly according to the guide/best practice, and we have connected all the redundant paths correctly between switches and ESX hosts). Then in vCenter the ESX 4.1 host shows 2 out of 4 ports failed with a red cross in the iSCSI VMkernel vSwitch.
3. The “vmkping 10.0.8.8 -c 3000” stopped working until we turned ON the master PowerConnect 5448 switch again.
Please note the following special findings:
a. Even though we cannot ping 10.0.8.8 from the ESX host while the master switch is off, EQL Group Manager still shows that the ESX host CAN STILL ESTABLISH iSCSI connections to it, all the VMs on that ESX host work with no problem, and we can still do VMotion between ESX hosts even with the master switch turned off. So the iSCSI connection is not dead; it just cannot be pinged somehow from the ESX host.
b. We also performed ANOTHER SIMILAR test BY turning off individual array iSCSI ports on the master switch. We used OpenManage to connect to the master switch and then TURNED OFF the TWO ports connecting to the PS6000XV, so the PS6000XV active controller again shows 2 out of 4 ports failed with a red cross.
Please note that in both cases the EQL PS6000XV active controller sees 2 out of 4 ports failed, yes, BUT we used TWO different methods to reach the same goal (the 1st is to turn off the whole switch, the 2nd is to turn off the switch ports connecting to the array). In the 2nd case, “vmkping 10.0.8.8 -c 3000” IS STILL WORKING! How come the 1st situation doesn’t work? So the conclusion is that “vmkping 10.0.8.8 -c 3000” ONLY STOPS WORKING when WE TURN OFF the master switch.
Can anyone offer some suggestions please or simulate to see if this could also happen in your environment?
Thanks in advance.
EQL couldn’t find a reason why. I also spent three hours with a local ProSupport expert via WebEx on Tuesday, but still nothing firm. Will do an intensive test with him again tomorrow.
However, I googled around and found this guy having a similar problem to mine. I hope this information helps others with a similar problem identify it ASAP.
This is THE LINK:
http://communities.vmware.com/thread/277156?start=0&tstart=0.
Oct 5, 2010 3:16 AM
Fix-List from v5.0.2 Firmware:
iSCSI Connections may be redirected to Ethernet ports without valid network links.
Also, his problem is similar in that whatever iSCSI connections are left in the LAG don’t get redirected to the slave switch after the master switch is shut down. I have 4 paths and his PS4000 has two, so my iSCSI connections survived because there is an extra path to the slave switch, but somehow vmkping doesn’t work.
and if you look at comment #30 .
Jul 27, 2010
Dell acknowledged that the known issue they reporting in the manual of the EqualLogic Multipathing Extension Module is the same I get.
They didn’t open a ticket at vmware for now, but they will, after some more tests.
I think this issue is there since esx 4.0. In VI3 they used only one vmkernel for swiscsi with redundancy on layer1/2, so there it should not be the case.
My case number for this issue at vmware is 1544311161, the case number at dell is 818688246.
If vmware acknowledge this as a bug in 4.1, and don’t have a workaround, we will go with at least 4 logical paths for each volume and hope that at least one path is still connected after switch1 fails, until they fix it.
Finally, it could also be something related to EQL MEM Plugin for ESX which we have installed. (Comment #29 on page 2)
It indicates there is a known issue once a network link fails (which could be due to shutting down the master switch), if the physical NIC with the network failure is the only uplink for the VMKernel port that is used as the default route for the subnet. This affects several types of kernel network traffic, including ICMP pings which the EqualLogic MEM uses to test for connectivity on the SAN.
Jul 23, 2010
from the dell eql MEM-User_Guide:
4 Known Issues and Limitations
The following are known issues for this release.
Failure On One Physical Network Port Can Prevent iSCSI Session Rebalancing
In some cases, a network failure on a single physical NIC can affect kernel traffic on other NICs. This occurs if the physical NIC with the network failure is the only uplink for the VMKernel port that is used as the default route for the subnet. This affects several types of kernel network traffic, including ICMP pings which the EqualLogic MEM uses to test for connectivity on the SAN. The result is that the iSCSI session management functionality in the plugin will fail to rebuild the iSCSI sessions to respond to failures of SAN changes.
Could it be the same problem I have? So they already know about this problem?
Aside this it looks like the Dell MEM makes only sense in setups with more then one array per psgroup, because the PSP selects a path to a interface of the array where the data of the volume is stored. And it have a lot of limitations. We only have one array per group for now, so I think I skip this.
Still dont understand why there is no way to prevent that the connections go through the LAG in the first place, it should be possible to prefer direct connections…
Hi Marc and EQL fellow,
I would like to share with you my findings:
Equallogic PS6000XV using VMWare Unofficial Storage Performance IOMeter parameters
(http://www.modelcar.hk/?p=2818)
Here is my result from the newly set up EQL PS6000XV. I noticed the hard disks are Seagate Cheetah 15K.7 (6Gbps) even though the PS6000XV is a 3Gbps array. (I originally thought they would ship me Seagate Cheetah 15K.6.)
I’ve also spent half a day today conducting the test on different generations of servers, covering local storage, DAS and SAN.
The results make sense and are reasonable if you look deep into them.
That is, RAID10 > RAID5, SAN > DAS >= Local, and the EQL PS6000XV rocks despite a warning saying all 4 links were 99.9% saturated during the sequential tests.
Also
Extract from VMWare Unofficial Storage Performance Comparing Equallogic and other SAN Vendors
(http://www.modelcar.hk/?p=2824)
It’s not official, but after comparing the results, I would still say Equallogic ROCKS!
Finally, I wonder why there are many results from Lefthand, NetApp, 3PAR and HDS?
We’ve recently implemented firmware 5.0.2 on our PS6000′s. We primarily use them for virtual machine storage on our ESX 4.1 cluster. How do you implement thin clones? My previous attempt included creating a 40 gig volume and installing a VM to it. I then converted it to a template. After creating a thin clone from this volume and adding it as an ESX datastore, it appears empty. Am I missing something? Please let me know your thoughts.
PS – The documents you posted regarding implementing multi-pathing were very helpful. Thank you.
I figured this out myself. In case anyone else was curious, I followed the following procedure. I am running ESX 4.1 with 7 hosts and EQL firmware 5.0.2.
1. Create a thick SAN volume 15% too large for your data
2. Mount it to ESX
3. Create a thin provisioned disk on your volume
4. Configure the VM appropriately (sysprep, etc)
5. Remove it from inventory
6. Do not delete the datastore from ESX. Take it offline on the SAN and rescan for datastores in ESX.
7. Convert the volume to a Template on the SAN
8. Create thin clones and add them to ESX one at a time. Always assign a new disk signature when adding thin datastores to inventory!
I was able to save ~6 GB per Server 2008 R2 VM. Multiply that by 80 and I’m going to save half a terabyte.
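The arithmetic behind that estimate can be sketched in a few lines of Python. The ~6 GB per VM and the 80 VMs come from the comment above; the function name is only illustrative, and the result assumes every clone saves about the same amount.

```python
def total_savings_gb(saved_per_vm_gb: float, vm_count: int) -> float:
    """Return the total space saved, in GB, across all thin clones."""
    return saved_per_vm_gb * vm_count


if __name__ == "__main__":
    saved = total_savings_gb(6, 80)
    # Uses decimal units (1 TB = 1000 GB) for the rough conversion.
    print(f"{saved:.0f} GB saved, about {saved / 1000:.2f} TB")
```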
Hi,
We have two locations, and an IPsec VPN tunnel is currently built between them. In one location, we have two mapped network drives on the servers where all of our employees store their data. In the other location, there are a few mapped network drives. I want to use SAN storage for this using Dell EqualLogic boxes. Can we have a Dell EqualLogic box at each location so that the data at both locations is replicated, thus providing high availability and disaster recovery? How should we go about this scenario?
Thanks
Sanjay
My existing 1TB volume is running out of space, so I extended the storage to 2TB. But VMware is not able to see the change, and some claim that VMware has problems supporting 2TB and more. Is this true?
Also, I am not able to reduce the volume using the GUI manager even when I take it offline. How do I access the EQL’s CLI to do this? Is it possible to do this without going offline? I’m on firmware v5.
Hi ,
Wanted to know if there’s a built-in way to limit the bandwidth between replication partners?
Thanks
EqualLogic simply uses the “pipe” that it has been given.
Thank you.
Marc
We rely on any QoS services of the underlying network infrastructure and – like Marc said – use the pipe we’re given with the bandwidth & latency “limitation” (or benefits) they provide.
I’m not sure it would make sense for us to engage in a lot of heroics and use our somewhat scarce resources on the array to implement a robust QoS service for replication, considering that there are a lot of moving parts & knobs in a network that it’d probably be better to leave to the network infrastructure to “own”…? Feel free to set me straight!
Replication speed and control or throttling of the replication is a common complaint I hear from EqualLogic customers. I have brought this up with numerous engineers at EqualLogic, but have gotten the same response as Mr. Sjolshagen.
Our current implementation is relying heavily on replication, with some replicas reaching into the multi-TB per week. Having the ability to burst the bandwidth on the array side for the larger replicas would be huge for us.
I’m not going to deny the fact that we obviously hear the request to control replication speeds & amounts on occasion.
Some of these requests basically represent a request to do in the array what the network infrastructure can already do. Using engineering resources to replicate capabilities that already exist in the network infrastructure is pretty difficult to justify from a business & trade-off perspective (Remember, everything our engineers spend time on comes at the expense of something they can’t spend time on instead).
That said, I’m trying to identify elements of your request that _cannot_ be achieved by using the network infrastructure capabilities, so I’ve got a clarification question;
What level of control are you looking for (that isn’t there) related to the actual replication functionality in the array?
Keep in mind that, in my view, there really isn’t much value we can add to the typical network QoS capabilities/policies, so are there elements of the actual replication process (not the transport of replication data, but maybe the configuration of a replica pair, etc) you think we can modify/add to better meet your needs?
Thank you for the follow up Thomas. We are happy to know that there are EqualLogic engineers out there listening.
We are aware of the abilities the network infrastructure can provide in shaping and classifying traffic types. These are especially useful when contention can occur or where WAN resources are limited. These network mechanisms (MPLS, WAN acceleration, QoS, etc.) are effective, but only to a certain degree. For example, consider the following.
It seems that the EqualLogic arrays already do some network shaping when it comes to replication, albeit very limited. When observing SAN HQ graphs and InMon flow information during replication, it appears that each array involved in sending replica traffic utilizes ~125Mbps of bandwidth. At an individual array level this bandwidth does not seem to fluctuate. It appears to be a hard-coded amount. Would it be possible to allow the customer to adjust this? If there is more bandwidth available, why not give the ability to the customer to adjust the amount?
Even if it was tucked away in the CLI we would be more than happy to have it available. We have some arrays that are essentially idle during replicas with regards to IO type traffic. We are confident that if the allocated bandwidth for a replica were to be increased, it would not cause any contention since no real production I/O is running on the source arrays. Additionally, we are replicating to 6500E’s who essentially serve as a dumping ground for remote site replicas. We would prefer to max out their four network interfaces with replica traffic since there is no I/O ever running against these arrays.
Even if we implemented all the network infrastructure mechanisms mentioned in the outset, they would do little to overcome the ~125Mbps we are observing.
There is an additional issue we are observing with regards to bandwidth utilization during replication. For example, a typical volume’s slices are spanned across three arrays (if available). If that volume is configured for replication a strange condition occurs. When the replica first kicks off, all three source arrays begin transmitting replica data to the destination array(s). But slowly over time, source arrays begin dropping off completely in transmitting of replica data. Eventually only one array is left to send the remainder of the replica. The replication will start out at ~350Mbps, but then drop off to ~250Mbps and then finally ~125Mbps. Is this because the deltas associated with the replica are being sent only by the arrays that contain those deltas? Wouldn’t it be better to have all the arrays participate in the replication for the entire duration? After sending their own portion or slice of the replica, the data can be backhauled from the remaining source arrays to the now idle source array. That way all source replica arrays are active and the full bandwidth can be utilized. We have one weekly replica that is ~2.5TB. If all the arrays were able to send for the entire duration of the replication, that would be a big payoff for us.
These are just a few of our observations on the bandwidth control with replication. We are big fans of our EqualLogic arrays! Thanks for pumping out the new features, especially with the 5.0 release.
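To put the observed rates in perspective, here is a rough Python sketch of how long a ~2.5TB replica would take at a sustained ~125Mbps versus ~350Mbps. It assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits/s), a constant rate for the whole transfer, and no protocol overhead, so treat the numbers as ballpark only; the function name is illustrative.

```python
def replication_hours(size_tb: float, rate_mbps: float) -> float:
    """Return hours needed to send size_tb terabytes at rate_mbps megabits/s."""
    total_bits = size_tb * 1e12 * 8
    seconds = total_bits / (rate_mbps * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    # Rates are the approximate figures quoted in the comment above.
    for rate in (125, 250, 350):
        print(f"2.5 TB at {rate} Mbps: {replication_hours(2.5, rate):.1f} hours")
```

Under these assumptions the gap between one array sending and all three sending is the difference between roughly two days and well under one.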
Hi Graham,
Just to chuck in my 10 cents… EqualLogic ‘appear’ to traffic shape as replication traffic only runs at 125MB/sec, but this is not traffic shaping – EqualLogic only replicate data using 1 interface on the controller.
125MB/sec = 1000Mbps
It’s also worth noting that replication is performed at a lower IO priority, so you can be sure that no matter how much data there is to replicate the SAN will always give priority to busy volumes.
Graham Gray-
Thanks for the info. We are seeing 125 megabits, not megabytes for replication traffic.
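Since the megabit/megabyte mix-up keeps coming up in this thread, here is a tiny Python sketch of the conversion (8 bits per byte, decimal prefixes; the function names are only illustrative):

```python
def mbytes_to_mbits(mbytes_per_s: float) -> float:
    """Convert megabytes per second to megabits per second."""
    return mbytes_per_s * 8


def mbits_to_mbytes(mbits_per_s: float) -> float:
    """Convert megabits per second to megabytes per second."""
    return mbits_per_s / 8


if __name__ == "__main__":
    print(f"125 MB/s = {mbytes_to_mbits(125):.0f} Mbps")
    print(f"125 Mbps = {mbits_to_mbytes(125):.3f} MB/s")
```

So 125 megabits per second is only about an eighth of a single 1Gbps interface, not a saturated one.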
Has anybody worked with the “support repl-window-size” command to tune TCP window sizes for WAN replication?
Hi Marc,
I was trying to install the MEM plugin for vSphere 4.1 with Update Manager. When I attempt to import the plugin, I get the following error message:
failed to import data.
Quote
failed to import data.
Cannot verify checksum for imported upgrade file. This might be caused by corrupted metadata or binaries in the file. You can try to import a new copy of the upgrade file.
Unquote
I downloaded it a couple of times, and unpacked it with WinRAR and 7-Zip. No luck.
Any idea? Appreciate your help.
Please contact EqualLogic Support for assistance.
https://www.equallogic.com/secure/login.aspx?ReturnUrl=%2fsupport%2fDefault.aspx
Thank you.
Marc
From my understanding after speaking with a colleague, it seems that the EQ6000 boxes can simplify expansion greatly. But I was wondering if there is a limit to how many EQ boxes you can expand to? It seems that you can simply add more EQ 6000 boxes and get better performance and the added space. Am I correct in this assumption?
Thanks,
Daniel
You can have up to 16 Physical PS6000 Arrays in a Group and up to 8 Physical PS6000 Arrays in a Pool.
So in a Group with 16 Physical PS6000 Arrays, you will have at least (2) Pools of 8 Physical PS6000 Arrays.
Thank you.
Marc
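The group and pool limits above imply a minimum number of pools for a given number of arrays. A minimal Python sketch, using only the two limits quoted in the reply (16 arrays per group, 8 per pool); the constant and function names are just illustrative:

```python
import math

# Limits as stated in the reply above.
MAX_ARRAYS_PER_GROUP = 16
MAX_ARRAYS_PER_POOL = 8


def min_pools(array_count: int) -> int:
    """Return the fewest pools needed to hold array_count arrays in one group."""
    if not 1 <= array_count <= MAX_ARRAYS_PER_GROUP:
        raise ValueError(f"a group holds 1 to {MAX_ARRAYS_PER_GROUP} arrays")
    return math.ceil(array_count / MAX_ARRAYS_PER_POOL)


if __name__ == "__main__":
    for count in (4, 8, 9, 16):
        print(f"{count} arrays need at least {min_pools(count)} pool(s)")
```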
Howdy, Marc!
Easy one for you – I think. I’m currently running firmware v4.3.5 on our PS4000s and want to get them up to v5.0.4
I gather from the documentation I’ll need to get the HIT kit on the servers (currently at v3.3.2) up to v3.5.1 before starting that process. Do I install this right over the existing version of the HIT, or do I need to uninstall the existing version and reconfigure iSCSI etc after installation of the new one? Other considerations?
I’m also having difficulty figuring out if I can update the firmware directly from where I am to v5.0.4 – There were installation problems reported by Dell with v5.0.0 and v5.0.1 I’d like to avoid, but I don’t see anything explicitly stating I can bypass them on the way to v5.0.4
Any suggestions? Thanks!
Greetings….
I recommend calling EqualLogic Support since you are in production. Our Support Team can better assist you.
EqualLogic Support: 1-800-945-3355
Thank you.
Marc
Thanks!
Hello Marc, great site.
Is there any advantage to using the Dell/EQL MPIO VMware ESX plugin over just using VMware’s native MPIO in 4.1 Update 1?
If so: I just manually re-created all my vmkernels and vswitches for jumbo frames to use VMware’s. I then read about Dell’s. If I run the script, will it remove my work or pick up after the switch creation?
thanks
Sorry, I should have been clearer: this is in an environment of only 1 Dell SAN.
Click on the “Customer Blogs” Tag on my site, there are a couple very good Customer Blogs on the performance improvements using our EqualLogic MEM for MPIO.
Thank you.
Marc
Marc – I currently have a PS6010XV (with SAS drives in RAID50) and a PS6010E (with SATA drives in RAID5). I have the XV in one pool and the E in another. I am going to start replicating to a remote site with the same setup. The problem I’m running into is I do not have enough space on the SAS enclosure to configure the fast failback @ 100%. I am using 100% + whatever local reserve is set to for each respective virtual disk to calculate the local reserve space I need. How detrimental would it be to configure the two in a single pool (keeping in mind that they have mutually exclusive RAID types) and designating RAID type at the virtual disk level? I am running VMware and the SAS disks are currently being used exclusively for the logs/databases of some SQL VMs.
Whether you run the arrays in the same pool or in different pools, as long as they are different RAID types and you set the volumes to live on a specific RAID type, you are fine. If you are using snapshots, I would recommend reviewing them and considering whether you can reduce the snapshot reserves on your volumes to free up additional space for replication failback.
Thank you.
Marc
We are looking into using an EqualLogic SAN solution in a cross-site Hyper-V cluster. Can you tell me if EqualLogic supports CSV in such a scenario (meaning with a replicated CSV)?
Just thought I would leave my info here since I came here and found some good info.
We just purchased 4 EQL units and I wanted to make sure we had everything set up correctly.
Using the IOmeter PDF from this site, I set up the test.
Hardware:
Dell R610, dual Xeon 5600 CPUs (2.66 GHz), 96 GB of RAM, two dedicated Intel X520-DA2 (10Gb) ports, running the latest HIT Kit 3.51 with MPIO set up.
Cisco Nexus 5548, with unicast off, jumbo frames, flow control on.
EqualLogic PS6010, with 16 × 600 GB 15k drives in RAID 10.
I created a 5 GB volume and ran the test according to the PDF I got off this site.
My results were…
278.65 MB per second.
8916.65 I/Os per second.
7.1 ms average I/O response time.
14 ms max I/O response time.
1.6% CPU utilization.
Task Manager showed both NICs at about 13% utilization.
If I disable a NIC, I get the same basic results, but the single NIC jumps up to about 27%.
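As a rough cross-check of the figures above: throughput divided by IOPS gives the implied transfer size, and throughput converted to bits per second can be compared with the NIC utilization Task Manager shows. This is only a sketch. It assumes IOmeter's "MB" means 2^20 bytes and that utilization is measured against the 10 Gb/s link speed, neither of which is stated in the post, and the function names are made up for illustration.

```python
def implied_block_size_kib(mb_per_sec, iops):
    """Average transfer size in KiB implied by throughput and IOPS.

    Assumes 1 MB = 2**20 bytes (an assumption about how IOmeter reports).
    """
    return mb_per_sec * 1024 / iops


def link_utilization_pct(mb_per_sec, link_gbps, nic_count=1):
    """Percent of each link used by payload alone, spread evenly over nic_count NICs.

    Ignores iSCSI/TCP/IP/Ethernet header overhead, so real utilization is higher.
    """
    bits_per_sec = mb_per_sec * 2**20 * 8
    return 100 * bits_per_sec / (nic_count * link_gbps * 1e9)


# Figures taken from the test results posted above.
block_kib = implied_block_size_kib(278.65, 8916.65)
both_nics = link_utilization_pct(278.65, 10, nic_count=2)
one_nic = link_utilization_pct(278.65, 10, nic_count=1)

print(f"implied transfer size: {block_kib:.1f} KiB")
print(f"payload per NIC with two NICs: {both_nics:.1f}%")
print(f"payload on a single NIC: {one_nic:.1f}%")
```

Under those assumptions the payload-only percentages land a little below the 13% and 27% reported, which would be consistent with protocol overhead, and the fact that one NIC carries the same load at roughly double the utilization suggests the links are not the limiting factor in this test.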
We are looking to upgrade our ESXi infrastructure to vSphere 5. The current Multipathing Extension Module on the EqualLogic site is version 1.0.1, and it indicates that it only supports vSphere 4.1. Do you know when a version supporting vSphere 5 will be released?
The beta is just about to end, if it hasn’t already. Hopefully this means MEM is imminent.
An early release version of the HIT Kit for VMware which is compatible with vSphere 5 is now also available.
We are going to join our 2 EQL PS300E arrays and put them in 1 group, with RAID 50. Dell says performance will really increase and everything will double: capacity, network I/O, cache, read IOPS and write IOPS. And according to Dell, the new 5.1H2 firmware will make it even better and faster. I understand that capacity and read IOPS will double, but write IOPS? Doesn’t the group have to write the data to twice as many spindles, requiring more time? Or does the cache deal with that? Does anyone have experience with this?
Marc,
We are ordering a new vSphere5 cluster (Dell R810s).
What is your recommendation for HBAs in an EqualLogic environment?
Dell offers Broadcom, QLogic, or the Intel ETs.
Hi Marc,
I am working for an EQL reseller in the Netherlands and I have a question regarding a Java script problem when using Mac OS X to access the EQL via the web interface.
Due to the Java script problem, the user cannot use the web interface to control the EQL and, more importantly, he is missing out on the great features of the latest firmware update.
I know it is in the hands of our friends at Apple, but maybe there is a solution for it in the US?
Please let me know if that is the case.
Regards, Paul.
Dear all,
With the latest update of Java for OS X the problem was solved!
So all users with OS X can use the remote update of the great firmware for the EQL!
Regards, Paul.
Marc,
I have a PS5000 running version 4.1.4 and want to upgrade to the latest version, which is 5.1.2. I see I have to update to 4.3.8 first before going to 5.1.2. I have three ESX 4.1 servers connected to the array. Can I upgrade without shutting down the ESX servers? I can upgrade after hours when there isn’t much activity on the SAN, but I can’t shut everything down.
Thanks,
Ken
We have a new install of four Dell R710s running VMware 4.1.2 and two EqualLogic PS4100X arrays with RAID 50 and a total of 24 × 300 GB HDDs (4.8 TB with RAID 50). We had a DPACK done, which shows that we are currently using 3.9 TB on our standalone server farm. We have created the groups and then created two volumes, which we want to replicate to another site with the same setup, i.e. EqualLogic PS4100X with RAID 50 and 4.8 TB. When we go to create the replication set, it tells us that we don’t have enough space for replication if we set the reserve to 100%. We have been advised that we should drop this down to 40%. Can you tell me, with this limited information, whether this is the correct way to set the reserve? I would have thought that if we had two volumes of 1 TB each and wanted to replicate them, we would need to reserve 100% for each.
I hope this makes sense; sorry if not. I am a newbie to EqualLogic storage and just wanted to make sure that the supplier isn’t trying to sell us an undersized storage solution.
Thanks.
Hey Kevin
I have done this a number of times with EqualLogic and you don’t necessarily need 100% reserve, but I like to set it at that if I can. The real answer, unfortunately, is “it depends.” If you have a speedy connection and the replication doesn’t take very long, you don’t need a large reserve, but if your connection is slow and/or there is a lot of change on the volume during replication, you will need a higher reserve. So to reiterate, the volume reserve needed depends on the rate of change during replication and how long it takes to replicate the volume. As a best practice I always set it to 100% so that even if the entire volume changes during replication, the replication still goes through. To experiment, you can drop the reserve down and then run your replications; if they fail, you need to bring it back up. If not, you should be good to go. Don’t take my opinion as gospel, but hopefully that helps you out.
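To put rough numbers on “it depends,” here is a back-of-the-envelope sketch of that reasoning. It is a simplified model, not EqualLogic’s own sizing formula: it assumes the reserve only has to hold the data that changes while a replication is in flight, that the link runs at its full rated speed, and that units are decimal. The function names and the example figures are hypothetical.

```python
def replication_hours(changed_gb, link_mbps):
    """Hours needed to ship changed_gb over a link of link_mbps megabits per second.

    Decimal units (1 GB = 8000 megabits); ignores protocol overhead and contention.
    """
    seconds = changed_gb * 8000 / link_mbps
    return seconds / 3600


def reserve_pct_needed(volume_gb, change_gb_per_hour, hours_replicating):
    """Percent of the volume that may change during a replication, capped at 100."""
    changed = change_gb_per_hour * hours_replicating
    return min(100.0, 100.0 * changed / volume_gb)


# Hypothetical example: 1 TB volume, 50 GB delta to send, 100 Mb/s WAN link,
# and 20 GB/hour of new writes landing on the volume while it replicates.
hours = replication_hours(changed_gb=50, link_mbps=100)
pct = reserve_pct_needed(volume_gb=1000, change_gb_per_hour=20, hours_replicating=hours)
print(f"replication takes about {hours:.2f} h; roughly {pct:.1f}% of the volume changes meanwhile")
```

A slower link or a busier volume pushes the percentage up quickly, which is why 100% is the safe default when the space is available.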
I have a question about Storage vMotion with EqualLogic and vSphere 5. Our ESXi hosts are connected to the EqualLogic groups through iSCSI, and they have the vSphere HIT Kit and MEM installed. I transferred a 30GB VM from one datastore to another and it took almost 30 minutes. How can this be sped up? Looking at the iSCSI usage graphs on the ESXi host, very little traffic is generated during the transfer, which suggests that the EqualLogic arrays are handling the transfer and it is not going through the ESXi host. There is some traffic, but just enough to indicate that the host is checking that the data is being transferred properly, not performing the transfer itself. Do you have any insights into this?
Marc,
My company just purchased a Dell PS6100 with two PowerConnect 6224s to create our first SAN, as we were in need of a storage solution. I am tasked with putting it all together and am trying to follow best practice, but a lot of the information contradicts itself. The SAN network will be isolated, with the exception of the port for management, and the two 6224s are going to be stacked. In terms of setting up the ports on the stack, there is conflicting information regarding PortFast and Spanning Tree Protocol. Some documentation says to enable PortFast and some says not to. The eight ports from the PS6100 will be split between both 6224s. Also, each server has 2 ports that will go out to each of the 6224s as well. When configuring the switch, what is the optimal configuration that I should be working with?
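For reference, the effective rate implied by those numbers can be worked out directly. This is just arithmetic on the figures in the question, treating “almost 30 minutes” as 30 and 1 GB as 1024 MB; the function name is made up.

```python
def effective_mb_per_sec(size_gb, minutes):
    """Average transfer rate in MB/s for size_gb moved in the given number of minutes."""
    return size_gb * 1024 / (minutes * 60)


# 30 GB VM moved in roughly 30 minutes, as described above.
rate = effective_mb_per_sec(size_gb=30, minutes=30)
print(f"effective rate: about {rate:.1f} MB/s")
```

That works out to roughly 17 MB/s, which gives a baseline to compare against after any change.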
Thanks!
Good Day….
Please contact EqualLogic Support. Since you have PowerConnect switches, EqualLogic Support can assist in confirming the correct setup.
Thank you.
Marc
Marc,
We have a PS4000vx and we are running VMware 4.1 with 10-15 different servers, one of which is a SQL 2008 server. We are running RAID 10 right now but want to change a few things to gain extra space. I was thinking about converting it to RAID 50 but am unsure of the performance loss compared with RAID 10. But then I saw that you can change the RAID type based on the volume.
So is it possible to change the entire storage group to RAID 50, giving us more space, and then create a volume and make it RAID 10 for our more I/O-intensive servers? Would that give us the space we need and the performance we want?
Good Day….
A single array can only be configured with a single RAID type. In this case your PS4000 can be configured with either R10 or R50, not both.
Hopefully you are running the latest version of SANHQ and EqualLogic Firmware. With SANHQ you can review the RAID Simulator which will show what the performance would be of your current array if you converted from R10 to R50. You can convert your single array from R10 to R50 without any downtime. The conversion will take a while and run in the background. You cannot go from R50 to R10 so please remember this if you convert from R10 to R50.
Thank you.
Marc
I need to increase the hard drive capacity on an EqualLogic PS4000, because one of the boxes is out of drive space. It has roughly 3% free space now. My customer has a second PS4000 in a colo that is replicating snapshots and critical data and has roughly 65% free space.
What is the best process for migrating the data off the box, increasing the hard drive sizes and restoring the volumes?
The drives are 250GB each and all 16 are being used in a RAID 50 configuration.
Hello,
We bought two Equallogic 300E units from an auction for a few thousand dollars. In our lab, we found out that all four controllers don’t have the firmware flash cards in them. As a result, we could not boot the units, the fan LEDs are red and the controllers do not engage (ACT LED not green).
We had access to working controllers from other 300E units and we did the following tests:
1. Took controllers from existing working units and put them in the just-bought units. Worked! This confirms that both just-bought units are fine except that they don’t have the firmware.
2. Took the compact flash cards from existing working units and put them in the controllers of the just-bought units. Does not work! This confirms that the firmware versions are different.
I would like to know how I can get the proper firmware for these controllers. Do you know anybody who can fix them for a reasonable price?
We contacted Dell and it would cost around 3K to put each unit under warranty. We can afford this price tag and we intended to use them for testing.
Thanks in advance.
Regards,
Joseph
Good Day….
Going through Dell is the only official way that I know of to accomplish what you would like to do.
Thank you.
Marc
Hello Marc,
I have a question on consolidating servers onto a particular volume. Is there an approximate rule of thumb to use with an EQL box? I have a PS5000X and a PS6010XV. The 10GbE unit houses my application servers while the 1GbE unit houses low-overhead servers (DC, file server, small apps).
I was using my old way of thinking and never going over 8 servers on any particular LUN. Now that LUNs are gone and everything is a volume, is there any real benefit at all to breaking up load across multiple volumes?
I am going to run some of the IOmeter tests that you have linked here, as I have never benchmarked my systems.
Environment:
(3) Dell R810s quad proc 6 core
256GB RAM
Dual Intel 10GbE network adapters for VM network, guests, vMotion
Dual QLogic 8242 10GbE HBAs for iSCSI
Dell 8024f 10GbE switch
VMware 5.0
Per production dell MEM installed.
Hello…
Can anyone help me find out the current firmware version of my EqualLogic PS4100?
And what is the procedure for updating the firmware?
Thanks in Advance
Mahfuz
I have a question on Security with VSS/VDS (or the lack thereof).
I have a Windows 2008 R2 server with the HIT kit installed (which included the VDS component), and because I want SQL to have consistent snapshots I have configured a VDS/VSS record with CHAP authentication within Group Manager.
However, what I found out was that if you install the SAN Manager component of Windows 2008 R2, that server (and its Windows administrator) now has 100% control of all the SAN volumes, up to and including creating and deleting volumes at will.
I opened a support case and their comment was “just don’t install the VDS component of the HIT kit.”
I cannot accept that in order to allow an application to have VSS capabilities I have to drop my pants and pray that the Windows administrator doesn’t get hold of the HIT kit and then blow away my SAN.
Is this really a correct implementation, or have I missed something basic? In my mind, VDS and VSS should be two completely separate access lists, not a single access list that gives the junior admin of a company that much potential control.