Why is NFSoRDMA in CentOS 7.6.1810 limited to 10 Gbps?

Issues related to configuring your network
alpha754293
Posts: 69
Joined: 2019/07/29 16:15:14

Why is NFSoRDMA in CentOS 7.6.1810 limited to 10 Gbps?

Post by alpha754293 » 2019/09/08 03:57:51

I have a cluster where all of the nodes (head and slave nodes) are using ConnectX-4 dual port 4X EDR IB (MCX456A-ECAT) connected to an externally managed MSB-7890 36-port 4X EDR IB switch.

All of the nodes are also running CentOS 7.6.1810 with the software group 'Infiniband Support' installed (because this one still supports NFSoRDMA).

On the head node, I have four Samsung 860 EVO 1 TB SATA 6 Gbps SSDs in RAID0 through the Marvell 9230 controller on an Asus P9X79-E WS motherboard.

Testing on the head node itself shows that I can get around 21.9 Gbps of total throughput when running:

Code: Select all

$ time -p dd if=/dev/zero of=10Gfile bs=1024k count=10240

But when I try to do the same thing over IB, I get only about 8.5 Gbps at best.
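
For reference, dd reports MB/s, so a quick back-of-envelope conversion (my numbers above, expressed both ways; Gbps = MB/s × 8 / 1000):

```shell
# Convert the two throughput figures from MB/s to Gbps.
# ~2737 MB/s locally and ~1062 MB/s over IB correspond to the numbers quoted above.
awk 'BEGIN { printf "%.1f Gbps local, %.1f Gbps over IB\n", 2737*8/1000, 1062*8/1000 }'
```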

NFSoRDMA is configured properly.

Here is /etc/exports:

Code: Select all

/home/cluster *(rw,async,no_root_squash,no_all_squash,no_subtree_check)

Here is /etc/rdma/rdma.conf:

Code: Select all

# Load IPoIB
IPOIB_LOAD=yes
# Load SRP (SCSI Remote Protocol initiator support) module
SRP_LOAD=yes
# Load SRPT (SCSI Remote Protocol target support) module
SRPT_LOAD=yes
# Load iSER (iSCSI over RDMA initiator support) module
ISER_LOAD=yes
# Load iSERT (iSCSI over RDMA target support) module
ISERT_LOAD=yes
# Load RDS (Reliable Datagram Service) network protocol
RDS_LOAD=no
# Load NFSoRDMA client transport module
XPRTRDMA_LOAD=yes
# Load NFSoRDMA server transport module
SVCRDMA_LOAD=yes
# Load Tech Preview device driver modules
TECH_PREVIEW_LOAD=no
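
To double-check the server side, the nfsd port list on CentOS 7 should show an RDMA listener once svcrdma is up. The sketch below simulates the file contents so the check is clear; on the real head node you would read /proc/fs/nfsd/portlist directly:

```shell
# On the NFS server, /proc/fs/nfsd/portlist should contain an "rdma 20049"
# entry once the svcrdma module is loaded and the RDMA port is registered.
# Simulated contents here; on a real server use:  cat /proc/fs/nfsd/portlist
portlist='rdma 20049
tcp 2049
udp 2049'
if echo "$portlist" | grep -q '^rdma 20049$'; then
  echo "NFS server is listening on RDMA port 20049"
else
  echo "no RDMA listener registered"
fi
```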

Here is /etc/fstab on the slave nodes:

Code: Select all

aes0:/home/cluster /home/cluster nfs defaults,rdma,port=20049 0 0

And here is confirmation that the NFS share is mounted using RDMA:

Code: Select all

aes0:/home/cluster on /home/cluster type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=rdma,port=20049,timeo=600,retrans=2,sec=sys,clientaddr=xxxxxxx,local_lock=none,addr=xxxxxxx)

The RAID volume is mounted like this:

Code: Select all

$ mount
...
/dev/sdb1 on /home/cluster type xfs (rw,relatime,attr2,inode64,noquota)
...

I don't really understand why the NFSoRDMA mount appears to be capped below 10 Gbps.

Your help is greatly appreciated.

Thank you.

chemal
Posts: 776
Joined: 2013/12/08 19:44:49

Re: Why is NFSoRDMA in CentOS 7.6.1810 limited to 10 Gbps?

Post by chemal » 2019/09/09 19:46:11

Have you already measured the performance of your network directly? With qperf for example?

And your dd line on the host will see a lot of write buffering.

alpha754293
Posts: 69
Joined: 2019/07/29 16:15:14

Re: Why is NFSoRDMA in CentOS 7.6.1810 limited to 10 Gbps?

Post by alpha754293 » 2019/09/10 01:49:41

chemal wrote:
2019/09/09 19:46:11
Have you already measured the performance of your network directly? With qperf for example?

And your dd line on the host will see a lot of write buffering.

I did it with ib_send_bw.

More specifically:

Code: Select all

host# ib_send_bw -a -n 100000 -d mlx5_0 -F --report_gbits
client# ib_send_bw -a -n 10000 -d mlx5_0 -F --report_gbits host

With a message size of 4 KiB, it hits 96.19 Gbps and stays at roughly that rate for the rest of the test.

But here is what I don't understand: there aren't many tools I can use to measure how long it actually takes to write a 10 GiB file to the remote system, to confirm that NFSoRDMA is working properly.

When I use conv=fdatasync, the result is even slower, since the export is currently async.

I thought part of the point of RDMA was that it bypasses all of that, so buffered or not, I should still be able to get higher speeds. Otherwise, I'm not sure how I would test the array itself (short of something like fio, and I don't know whether fio would work over NFSoRDMA).
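
As a back-of-envelope check of my own (using the link speeds measured above): at wire speed a 10 GiB transfer should take well under a second, so any large gap points at the storage or NFS stack rather than the fabric.

```shell
# Time to move a 10 GiB file at a given link speed:
# seconds = bytes * 8 / bits_per_second
awk 'BEGIN {
  bytes = 10 * 1024 * 1024 * 1024               # 10 GiB
  printf "at 96 Gbps:  %.2f s\n", bytes * 8 / 96e9
  printf "at 8.5 Gbps: %.1f s\n", bytes * 8 / 8.5e9
}'
```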

There has to be some way to consistently test the RAID0 array of SSDs, both on the local host and remotely via NFSoRDMA, to confirm that NFSoRDMA can achieve results similar to the local host, no?

Otherwise, what would be the point of NFSoRDMA?

chemal
Posts: 776
Joined: 2013/12/08 19:44:49

Re: Why is NFSoRDMA in CentOS 7.6.1810 limited to 10 Gbps?

Post by chemal » 2019/09/10 02:22:34

Code: Select all

# echo $((16*1024*1024)) > /proc/sys/vm/dirty_background_bytes
# echo $((48*1024*1024)) > /proc/sys/vm/dirty_bytes
# dd if=/dev/zero of=xxx bs=8k count=12500000

That's what I would type on the head node. It makes sure the OS starts writing to the disk once more than 16M of dirty buffers have accumulated and that it never allows more than 48M of dirty buffers. Then write a 100G file. This will tell you the true write performance of your SSD raid.
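
For the record, a quick check of the arithmetic in those commands:

```shell
# 16*1024*1024 and 48*1024*1024 are the dirty-buffer thresholds in bytes;
# bs=8k count=12500000 writes 8192 * 12,500,000 bytes, i.e. roughly 100G.
awk 'BEGIN {
  printf "dirty_background_bytes: %d (16 MiB)\n", 16*1024*1024
  printf "dirty_bytes:            %d (48 MiB)\n", 48*1024*1024
  printf "file size:              %.1f GB\n", 8192 * 12500000 / 1e9
}'
```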

alpha754293
Posts: 69
Joined: 2019/07/29 16:15:14

Re: Why is NFSoRDMA in CentOS 7.6.1810 limited to 10 Gbps?

Post by alpha754293 » 2019/09/10 03:36:08

chemal wrote:
2019/09/10 02:22:34

Code: Select all

# echo $((16*1024*1024)) > /proc/sys/vm/dirty_background_bytes
# echo $((48*1024*1024)) > /proc/sys/vm/dirty_bytes
# dd if=/dev/zero of=xxx bs=8k count=12500000
That's what I would type on the head node. It makes sure the OS starts writing to the disk once more than 16M of dirty buffers have accumulated and that it never allows more than 48M of dirty buffers. Then write a 100G file. This will tell you the true write performance of your SSD raid.

[s]a) does it matter that I am using a 64 KiB stripe size on the RAID0 array (at the RAID controller level, i.e. not md RAID)?[/s]

8k block size, average of three runs: 763.67 MB/s.

64k block size, average of three runs: 768.67 MB/s.

[s]b) would I need to or should I repeat the process when I test NFSoRDMA?[/s]

64k block size, average of three runs: 589 MB/s.

Why would anybody use NFSoRDMA on a 100 Gbps interconnect if it can't even reach half the speed of a 10 Gbps connection?

c) is there a better way to confirm that NFSoRDMA is working as it should?
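
Converting my own results above from MB/s to Gbps for comparison:

```shell
# MB/s to Gbps: multiply by 8, divide by 1000.
awk 'BEGIN {
  printf "local (64k):    %.2f Gbps\n", 768.67 * 8 / 1000
  printf "NFSoRDMA (64k): %.2f Gbps\n", 589 * 8 / 1000
}'
```
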
Last edited by alpha754293 on 2019/09/10 04:08:52, edited 1 time in total.

chemal
Posts: 776
Joined: 2013/12/08 19:44:49

Re: Why is NFSoRDMA in CentOS 7.6.1810 limited to 10 Gbps?

Post by chemal » 2019/09/10 03:57:49

I'm just curious what your SSD raid on the head node can really do (locally). Please post the numbers.

alpha754293
Posts: 69
Joined: 2019/07/29 16:15:14

Re: Why is NFSoRDMA in CentOS 7.6.1810 limited to 10 Gbps?

Post by alpha754293 » 2019/09/10 04:09:31

chemal wrote:
2019/09/10 03:57:49
I'm just curious what your SSD raid on the head node can really do (locally). Please post the numbers.

I edited the post above with the results.

P.S. At 768.67 MB/s (64k block size), that's just above 6 Gbps. Unfortunately, I don't have a way of testing what a single Samsung 860 EVO 1 TB SATA 6 Gbps SSD can do, but my point is that a four-drive RAID0 is only a smidge above the theoretical interface speed of a single drive.

Maybe it's just me, but something doesn't seem to add up.

The published specification for random write IOPS is 42,000 (4 KiB, QD1), which works out to about 172 MB/s.

With four drives in RAID0, that would be 688.13 MB/s, which sits somewhere between what I can write on the local host and what I can write remotely from one of the cluster nodes via NFSoRDMA.

I am somewhat surprised that the sequential write speed using this method is only 11.7% faster than the pure 4 KiB, QD1 random write speed.
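
Here's the arithmetic behind those figures, for reference:

```shell
# 42000 IOPS * 4096 B per op = per-drive random write throughput; x4 for RAID0.
# The last line compares my measured sequential rate (768.67 MB/s) against it.
awk 'BEGIN {
  per = 42000 * 4096 / 1e6                       # MB/s per drive
  printf "per drive: %.2f MB/s\n", per
  printf "4-drive RAID0: %.2f MB/s\n", per * 4
  printf "sequential vs random: +%.1f%%\n", (768.67 / (per * 4) - 1) * 100
}'
```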

Do these numbers seem to make sense to you?

chemal
Posts: 776
Joined: 2013/12/08 19:44:49

Re: Why is NFSoRDMA in CentOS 7.6.1810 limited to 10 Gbps?

Post by chemal » 2019/09/10 04:21:22

Well, what you really have is a RAID that can do at most 800 MB/s. And that's what you get via NFS too.

What brand of RAID is this? :)

alpha754293
Posts: 69
Joined: 2019/07/29 16:15:14

Re: Why is NFSoRDMA in CentOS 7.6.1810 limited to 10 Gbps?

Post by alpha754293 » 2019/09/10 04:27:01

chemal wrote:
2019/09/10 04:21:22
Well, what you really have is a RAID that can do at most 800 MB/s. And that's what you get via NFS too.

What brand of RAID is this? :)

I'm not sure what you mean by "what brand of RAID is this"?

It's the Marvell controller that's on the motherboard.

NFS is notably slower than that, though: about 23.37% slower than the local host.

chemal
Posts: 776
Joined: 2013/12/08 19:44:49

Re: Why is NFSoRDMA in CentOS 7.6.1810 limited to 10 Gbps?

Post by chemal » 2019/09/10 04:49:43

Marvell? That's fake RAID, isn't it? (I see you mentioned Marvell before; I must have overlooked it.)

A single 860 EVO writes at about 500 MB/s (sequentially).

Edit: Google says a Marvell 9230 is h/w RAID, but really low-end. It has two PCIe 2.0 lanes for a theoretical max of 1 GB/s, of which you can get ~800 MB/s in reality.
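
That ceiling checks out arithmetically: a PCIe 2.0 lane runs at 5 GT/s with 8b/10b encoding, i.e. 500 MB/s of payload per lane (the ~80% efficiency factor below is just my estimate of real-world overhead, matching the ~800 MB/s figure):

```shell
# PCIe 2.0: 5 GT/s per lane, 8b/10b encoding -> 500 MB/s payload per lane.
awk 'BEGIN {
  lane = 5e9 * 8 / 10 / 8 / 1e6                  # MB/s per lane
  printf "per lane: %.0f MB/s\n", lane
  printf "x2 lanes: %.0f MB/s theoretical\n", lane * 2
  printf "~80%% efficiency: %.0f MB/s\n", lane * 2 * 0.8
}'
```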

Post Reply