arcserve-KB : Tuning the performance of arcserve Replication and High Availability to increase throughput and efficiency across a WAN.

Last Update: 2017-01-17 07:42:34 UTC

How to improve replication throughput and network efficiency with CA ARCserve Replication and High Availability

 

If you are experiencing slow replication performance, or believe that your WAN should be able to achieve greater throughput, you may want to consider enabling multiple streams or tuning your TCP parameters.

 

Generally, RHA will auto-tune certain parameters to what is best for your network.

 

For example:

 

If your network round trip time (RTT) is greater than 50 ms, by default RHA uses WAN settings (chunkLength = 4K and 5 sockets).

If the RTT is 50 ms or less, RHA uses LAN settings (chunkLength = 64K and 1 socket).

 

In this example ‘sockets’ refers to the number of  ‘streams’.

 

In certain cases, throughput can be improved by increasing the number of streams. If your RTT is under 50 ms, CA ARCserve RHA uses only 1 stream by default; try 5 or 10 streams and compare your results. However, you should never use more than one stream in a LAN environment; multiple streams are only recommended across a WAN or a network with high latency (> 50 ms).
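
To make the default selection rule above concrete, here is a minimal Python sketch of the same logic. It is purely illustrative and not actual CA ARCserve RHA code; the function name is made up for this example.

--------------------------------
# Illustrative sketch only - restates the default selection rule described
# above; this is not CA ARCserve RHA code.

def default_replication_settings(rtt_ms):
    """Return the chunk length and socket (stream) count picked by default."""
    if rtt_ms > 50:
        return {"chunkLength": "4K", "sockets": 5}   # WAN settings
    return {"chunkLength": "64K", "sockets": 1}      # LAN settings

print(default_replication_settings(72))   # {'chunkLength': '4K', 'sockets': 5}
print(default_replication_settings(10))   # {'chunkLength': '64K', 'sockets': 1}
--------------------------------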

 

In addition to increasing the number of replication streams, you may also see an increase in achievable throughput by tuning the TCPSendReceive window size. By default this parameter is set to 256K for RHA scenarios; however, depending on your network's available bandwidth and WAN latency, a different value may perform better.

 

TCP uses a sliding window mechanism to limit the amount of data that the sender needs to buffer while waiting for acknowledgements to return from the receiver. This mechanism works well if the bandwidth is low or the TCP acknowledgements return quickly. However, if acknowledgements do not return from the receiver before the buffer fills, either because the bandwidth is very large or the round trip time is long, TCP cannot transmit additional data until acknowledgements return, even though additional bandwidth is available. TCP's maximum transmission speed is therefore determined by the size of the TCP window and the time it takes for acknowledgements to return.

 

How to determine the Maximum Theoretical Throughput

The maximum theoretical throughput that can be achieved over a WAN is based on the TCPSendReceive Window size divided by the round trip time represented as follows:

 

TCPSendReceive Win Size / RTT = Maximum Theoretical Throughput

 

The RTT can be determined using a simple ping.  It is recommended to ping with a payload, and to send enough packets to get more meaningful and accurate RTT values.  It is suggested to issue the following command:

 

ping -n 50 -l 512 <destination IP address>

 

This will send 50 packets of size 512 bytes to the IP address provided.  The resulting summary information at the end of the ping test will provide an average RTT which can then be used to determine the Maximum Theoretical Throughput.
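
If you prefer to script the calculation, the following Python sketch reproduces the arithmetic used in the examples below (using the same convention, where 1 Mbps = 1,000 Kbps). The function name and the 100 Mbps comparison are only illustrative.

--------------------------------
# Maximum theoretical throughput of a single TCP connection:
# window size (Kbits) / RTT (ms) gives Mbps under the article's convention.

def max_theoretical_throughput_mbps(window_kb, rtt_ms):
    return window_kb * 8 / rtt_ms

single_stream = max_theoretical_throughput_mbps(256, 50)  # RHA default window, 50 ms RTT
print(single_stream)          # 40.96 (Mbps)
print(single_stream < 100)    # True: below a 100 Mbps link, so extra streams may help
--------------------------------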

 

As an example:

Assume that you have a Windows client with a 256KB TCPSendReceive window (RHA Default TCP Send/Recv Window is 256K) and you are replicating over a WAN with 100 Mbps available bandwidth that has a 50ms delay/RTT.

 

Based on those assumptions the maximum throughput for a single TCP connection would be:

 

Max. throughput = 256 KB * (8 bits/byte) / 0.050 sec = 40,960 Kbps = 40.960 Mbps

This is less than your available bandwidth which means multiple streams would improve throughput.

 

If you assume 72ms delay/RTT:

Max. throughput = 256 KB * (8 bits/byte) / 0.072 sec = 28,444 Kbps = 28.44 Mbps

This is less than your available bandwidth which means multiple streams would improve throughput.

 

If you assume a 10ms delay/RTT:

Max. throughput = 256 KB * (8 bits/byte) / 0.010 sec = 204,800 Kbps = 204.80 Mbps

This exceeds the available bandwidth, so additional streams would not improve throughput.

 

Another example, using a smaller TCPSendReceive window size of only 64K:

 

Using a Windows client with a 64 KB TCPSendReceive window size replicating over a 100 Mbps WAN with a 50ms delay, the maximum throughput for a single TCP connection would be:

 

Max. throughput = 64 KB * (8 bits/byte) / 0.050 sec = 10240 Kbps = 10.24 Mbps

 

Therefore, even though the link speed is 100 Mbps, under these particular conditions, TCP is able to achieve a maximum throughput of only 10 Mbps, utilizing only 10% of the available bandwidth.  In addition to the window size limitation, TCP’s algorithm is designed to prevent congestion on the link and will likely reduce performance below this theoretical maximum, particularly over links with packet loss.  A link with only 2% packet loss can see a drop in throughput of over 90%! 

 

Tuning the TCPSendReceive Window can improve throughput

The TCPSendReceive window size is an operating system parameter, usually pre-configured to a value between 8 KB and 64 KB depending on the particular operating system. By default, Windows sets this value to 64 KB.

 

The Bandwidth-Delay Product (BDP) of a link is the bandwidth of the link multiplied by the round trip time (RTT).  In order to fully utilize the available bandwidth, the TCP window size setting specified by the receiver when establishing the TCP connection must be larger than this BDP.

 

For example, on a 10 Mbps link with an RTT of 50 ms, the BDP is calculated as:

BDP = 10 Mbit/sec * 0.050 sec / (8 bits/byte) = 62,500 bytes = 62.5 KB

The resulting value is close to the default of 64K, and therefore there is no need to change it. However, if we assume a latency of 100 ms, we get:

BDP = 10 Mbit/sec * 0.100 sec / (8 bits/byte) = 125,000 bytes = 125 KB

 

In this case increasing the TCPSendReceive window size to 128K will significantly improve the throughput.
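
As a quick check, the following Python sketch computes the BDP for the two examples above and rounds the suggested window up to the next multiple of 64K, since the configuration section below recommends entering only values divisible by 64K. It is a sketch of the arithmetic only, not an RHA tool.

--------------------------------
import math

def bdp_kb(bandwidth_mbps, rtt_ms):
    # Mbit/sec x ms gives Kbits; divide by 8 bits/byte for KB.
    return bandwidth_mbps * rtt_ms / 8

def suggested_window_kb(bandwidth_mbps, rtt_ms, step_kb=64):
    # Round the BDP up to the next multiple of 64K for TCPSendRecvBufferSize.
    return math.ceil(bdp_kb(bandwidth_mbps, rtt_ms) / step_kb) * step_kb

print(bdp_kb(10, 50))                 # 62.5 (KB) - close to the 64K OS default
print(bdp_kb(10, 100))                # 125.0 (KB)
print(suggested_window_kb(10, 100))   # 128 (K) - matches the 128K suggestion above
--------------------------------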

 

NOTE: CA ARCserve RHA will set the TCP window size per scenario; do not modify the window size in Windows since that may adversely affect local clients as well.

 

RHA uses a TCPSendReceive window size (TCPSendRecvBufferSize) of 256K.

If your calculated BDP is higher or lower than the default value, you may want to tune this setting to see how it affects your achieved throughput.

How to configure the CA ARCserve Replication Engine

Edit the ws_rep.cfg file found in the CA ARCserve RHA Engine install folder as shown below. Edit this file using the DOS EDIT tool or WordPad. Make the change on both the Master and Replica servers, then restart the CA ARCserve RHA Engine service. It is also recommended to stop your scenario before making this change.

   *** Note: When changing TCPSendRecvBufferSize, the ChunkLength attribute within the ws_rep.cfg file should be less than TCPSendRecvBufferSize divided by 2 (for example, with TCPSendRecvBufferSize = 256K, ChunkLength should be < 128K). As a recommendation, ChunkLength should not be smaller than 1K.
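
For illustration only, a hypothetical ws_rep.cfg fragment that follows this rule with a larger 512K buffer might look like the example below. The exact ChunkLength parameter spelling is an assumption here; verify it against the comments in your own ws_rep.cfg before editing.

--------------------------------
###################
# XONET parameters

# ChunkLength must stay below TCPSendRecvBufferSize / 2 (and no smaller than 1K)
TCPSendRecvBufferSize = 512K
ChunkLength = 128K
--------------------------------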

 

 

Default:

--------------------------------

###################

# XONET parameters

 

# TCPSendRecvBufferSize = 256K

--------------------------------

 

 

New Configuration:

--------------------------------

###################

# XONET parameters

 

TCPSendRecvBufferSize = Set to your calculated value if it is higher or lower than 256K. Avoid values over 2 MB, and enter only values divisible by 64K.

 

Examples of acceptable values…

TCPSendRecvBufferSize = 128K

TCPSendRecvBufferSize = 384K

TCPSendRecvBufferSize = 512K 

--------------------------------

 

Remember to remove the preceding # symbol from lines you have edited.

 

 

What about enabling Compression?

Compression of the replication stream can also improve your achieved throughput. However, compression should not be enabled unless you have first properly assessed the environment and tuned performance to maximize achievable throughput without compression.

Once you have verified that the environment is properly tuned to achieve the maximum throughput possible, you can consider enabling compression.

Use caution, however: enabling compression from the master to the replica will increase the CPU load on the master due to the additional overhead required for compression. This overhead can be off-loaded from the master by deploying an additional local replica. For example, you can configure your scenario as follows:

Master (Active) ---no compression-->> Replica1 ---compression--->> Replica2

 

 

 

Also be sure to enable compression on the remote replica so that, when a scenario is running in reverse mode, compression will also be enabled:

Master <<---no compression--- Replica1 <<---compression--- Replica2(Active)

The caveat to this approach is that additional hardware, an OS license, and an RHA product license are required for the extra replica, but it offloads compression from the master CPU.

Compression will generally improve throughput by about a 2:1 ratio; however, this ratio varies depending on the type of data being replicated. Your mileage may vary.
