The impact of communication layer latency is typically under estimated when trying to fix application performance problems but correct understanding is critical if you are to direct your efforts toward practical solutions.
First, by communication layer latency I mean the time it takes a packet to get from the local system to the remote system and back again. The biggest factor in communication layer latency is distance – at least for geographically separated hosts. The other major component is the number of devices between the local and remote hosts that process the packets; this includes things like routers and firewalls. The bandwidth of the links plays a role but not as large as most people think. Bandwidth affects the insertion delay – how long it takes to put a packet on the wire but the signal on the wire travels at a speed based on the medium not the bandwidth. For the rest of this discussion I’ll just use latency when I mean communication layer latency.
To demonstrate how latency effects your application let’s say you enter some data into a client application and hit return. The client application sends a message to a server application, waits for a response and then sends another message, waits for a response, etc; for some number N of “turns”. At the end of N turns the client application presents the results back to you.
Assume that 10 turns are required and it takes the server application 1ms to process the client message and send back an answer. With a latency of 1ms you get a response time of 10 * (1ms + 1ms) or 20ms. Now get on a plane and travel to Chicago, the server stays in Boston so you now have a latency of say 50ms and a response time of 10 * (50ms + 1ms) or 550ms. This is enough to be noticeable but not painful. Increase the number of turns to 100 and you now have a painful 5.5 second response time. You may think 100 turns is excessive but some complex database queries or applications that fill complex forms can do just that. Do you know how many turns your applications require?
Doing a copy_file via OSL also exhibits this behavior. OSL will send a file in 4K transactions. Each transaction requires a response so a 1,000,000 byte file will require 1,000,000 / 4,096 = 245 transactions or turns to use the nomenclature from the previous paragraph. Again, assuming 1ms to process the transaction and a 1ms latency the copy_file will take 490ms. If we increase the latency to 50ms it will take 12.495 seconds. If we increase the file to 1,000,000,000 bytes it will require 244,141 transactions; with corresponding times of 488.282 seconds for 1ms latency and 12,451.191 seconds or almost 3.5 hours for 50ms latency.
The simplest way to measure latency is with ping.
ping 192.168.200.197 Pinging host 192.168.200.197 : 192.168.200.197 ICMP Echo Reply:TTL 53 time = 84 ms ICMP Echo Reply:TTL 53 time = 80 ms ICMP Echo Reply:TTL 53 time = 81 ms ICMP Echo Reply:TTL 53 time = 96 ms Host 192.168.200.197 replied to all 4 of the 4 pings |
start_process 'packet_monitor -numeric -time_stamp -filter -host A.B.C.D -port NNN' -privileged
”. Then type the command “telnet A.B.C.D NNN
”. You should make several connections and get an average. Notice that I am connecting to an unused port on the remote host. This reduces the number of packets in the trace but if a firewall is blocking the port you may need to use an active port.
start_process 'packet_monitor -numeric -time_stamp -filter -host 192.168.200.197 + -port 6666' -privileged ready 09:12:42 telnet 192.168.200.197 6666 Trying... telnet: Unable to connect to remote host: The connection was refused. ready 09:12:53 telnet 192.168.200.197 6666 Trying... telnet: Unable to connect to remote host: The connection was refused. ready 09:12:56 telnet 192.168.200.197 6666 Trying... telnet: Unable to connect to remote host: The connection was refused. ready 09:12:58 telnet 192.168.200.197 6666 Trying... telnet: Unable to connect to remote host: The connection was refused. ready 09:12:58 stop_process packet_monitor -no_ask Stopping Noah_Davids.CAC (packet_monitor). ready 09:13:07 d packet_monitor.out %phx_vos#m16_mas>SysAdmin>Noah_Davids>packet_monitor.out 09-11-13 09:13:17 mst Noah_Davids.CAC logged in on %phx_vos#m16 at 09-11-13 09:12:42 mst. packet_monitor -numeric -time_stamp -filter -host 192.168.200.197 -port 6666 *********************************************************** WARNING: failure to specify a specific interface will cause packet_monitor to bind to ALL interfaces on the module, greatly increasing the amount of Streams memory used!!! *********************************************************** dir icmp type + tcp hh:mm:ss.ttt len proto source destination src port dst p +ort type 9:12:52.984 T 0004 TCP 172.16.77.128 192.168.200.197 62515 + 6666 S 9:12:53.067 R 0000 TCP 192.168.200.197 172.16.77.128 6666 + 62515 RA 9:12:56.643 T 0004 TCP 172.16.77.128 192.168.200.197 62516 + 6666 S 9:12:56.724 R 0000 TCP 192.168.200.197 172.16.77.128 6666 + 62516 RA 9:12:58.003 T 0004 TCP 172.16.77.128 192.168.200.197 62517 + 6666 S 9:12:58.086 R 0000 TCP 192.168.200.197 172.16.77.128 6666 + 62517 RA 9:12:58.805 T 0004 TCP 172.16.77.128 192.168.200.197 62518 + 6666 S 9:12:58.887 R 0000 TCP 192.168.200.197 172.16.77.128 6666 + 62518 RA ready 09:13:07 Process finished. Terminating Noah_Davids.CAC (packet_monitor). Process stopped by Noah_Davids.CA +C. |
Latency times: 58.887 - 58.805 = 0.082 == 82ms
58.086 - 58.003 = 0.083 == 83ms
56.724 - 56.643 = 0.081 == 81ms
53.067 - 52.984 = 0.083 == 83ms
You can also use a program I wrote that times the connections without needing to use packet_monitor. See http://members.cox.net/ndav1/self_published/stcp_tping.doc. The stcp_tping command does require that you connect to an active port on the remote host. In this case I am using port 23 (telnet) but any active port will work. The number 1 at the end of the command indicates that a request will be sent once a second.
stcp_tping 192.168.200.197 23 1 tping 192.168.200.197 23 1 Success/Tries=Percent min/average/max success times 1/1=100.000% 83.284/83.284/83.284 Connection in 83.284 ms 2/2=100.000% 81.697/82.490/83.284 Connection in 81.697 ms 3/3=100.000% 80.858/81.946/83.284 Connection in 80.858 ms 4/4=100.000% 80.858/81.895/83.284 Connection in 81.743 ms ^C |