The impact of communication layer latency is typically underestimated when trying to fix application performance problems, but a correct understanding is critical if you are to direct your efforts toward practical solutions.
First, by communication layer latency I mean the time it takes a packet to get from the local system to the remote system and back again. The biggest factor in communication layer latency is distance, at least for geographically separated hosts. The other major component is the number of devices between the local and remote hosts that must process the packets; this includes routers and firewalls. The bandwidth of the links plays a role, but not as large a role as most people think. Bandwidth affects the insertion delay (how long it takes to put a packet on the wire), but the signal on the wire travels at a speed determined by the medium, not the bandwidth. For the rest of this discussion I’ll just use latency when I mean communication layer latency.
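To see why bandwidth matters less than distance, here is a back-of-the-envelope comparison. The numbers are illustrative assumptions, not from this article: a 1,500-byte packet, a 100 Mb/s link, and a one-way path of roughly 1,500 km (about Boston to Chicago) with the signal traveling at about two-thirds the speed of light.

```python
# Rough comparison of insertion (serialization) delay vs. propagation delay.
# All figures below are illustrative assumptions.

PACKET_BITS = 1500 * 8      # packet size in bits
BANDWIDTH = 100e6           # link speed, bits per second
DISTANCE_M = 1.5e6          # one-way distance in meters (~Boston to Chicago)
SIGNAL_SPEED = 2e8          # signal speed in the medium, meters per second

insertion_ms = PACKET_BITS / BANDWIDTH * 1000
propagation_ms = DISTANCE_M / SIGNAL_SPEED * 1000

print(f"insertion delay:   {insertion_ms:.3f} ms")    # 0.120 ms
print(f"propagation delay: {propagation_ms:.3f} ms")  # 7.500 ms
```

Even on a fast link, the one-way propagation delay dwarfs the time needed to put the packet on the wire, and a faster link only shrinks the smaller of the two numbers.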
To demonstrate how latency affects your application, let’s say you enter some data into a client application and hit return. The client application sends a message to a server application, waits for a response, then sends another message, waits for a response, and so on, for some number N of “turns”. At the end of N turns the client application presents the results back to you.
Assume that 10 turns are required and that it takes the server application 1ms to process the client message and send back an answer. With a latency of 1ms you get a response time of 10 * (1ms + 1ms), or 20ms. Now get on a plane and travel to Chicago while the server stays in Boston; you now have a latency of, say, 50ms and a response time of 10 * (50ms + 1ms), or 510ms. This is enough to be noticeable but not painful. Increase the number of turns to 100 and you now have a painful 5.1 second response time. You may think 100 turns is excessive, but some complex database queries or applications that fill complex forms can do just that. Do you know how many turns your applications require?
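The arithmetic above reduces to one formula, response time = turns * (latency + processing time). A small sketch, using the same numbers as the example (the function name is mine, not from the article):

```python
# Response time for a turn-based client/server exchange.
# Each turn costs one round trip (the latency) plus the server's
# processing time.
def response_time_ms(turns, latency_ms, processing_ms=1.0):
    return turns * (latency_ms + processing_ms)

print(response_time_ms(10, 1))    # 20.0   - 10 turns, 1ms latency
print(response_time_ms(10, 50))   # 510.0  - Boston to Chicago
print(response_time_ms(100, 50))  # 5100.0 - 100 turns at 50ms
```

Note that the number of turns multiplies the latency directly, which is why reducing turns usually pays off more than buying bandwidth.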
Doing a copy_file via OSL also exhibits this behavior. OSL sends a file in 4K transactions. Each transaction requires a response, so a 1,000,000 byte file will require 1,000,000 / 4,096 = 244.1, rounded up to 245 transactions, or turns, to use the nomenclature from the previous paragraph. Again assuming 1ms to process the transaction and a 1ms latency, the copy_file will take 490ms. If we increase the latency to 50ms it will take 12.495 seconds. If we increase the file to 1,000,000,000 bytes it will require 244,141 transactions, with corresponding times of 488.282 seconds for 1ms latency and 12,451.191 seconds, or almost 3.5 hours, for 50ms latency.
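The copy_file figures above follow the same pattern: every 4K block is one turn. A minimal sketch of the calculation (function and parameter names are mine):

```python
# Time to copy a file when every block must be acknowledged before the
# next one is sent, as in the copy_file-over-OSL example above.
import math

def copy_time_s(file_bytes, latency_ms, block_bytes=4096, processing_ms=1.0):
    turns = math.ceil(file_bytes / block_bytes)   # one turn per block
    return turns * (latency_ms + processing_ms) / 1000

print(f"{copy_time_s(1_000_000, 1):.3f} s")       # 0.490
print(f"{copy_time_s(1_000_000, 50):.3f} s")      # 12.495
print(f"{copy_time_s(1_000_000_000, 50):.3f} s")  # 12451.191
```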
The simplest way to measure latency is with ping. However, firewalls often block ping, so another approach is to time the setup of a TCP connection. First start a packet trace with the command “start_process 'packet_monitor -numeric -time_stamp -filter -host A.B.C.D -port NNN' -privileged”. Then type the command “telnet A.B.C.D NNN”. You should make several connections and get an average. Notice that I am connecting to an unused port on the remote host. This reduces the number of packets in the trace, but if a firewall is blocking the port you may need to use an active port.
Latency times: 58.887 - 58.805 = 0.082 == 82ms
58.086 - 58.003 = 0.083 == 83ms
56.724 - 56.643 = 0.081 == 81ms
53.067 - 52.984 = 0.083 == 83ms
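The subtraction above can be scripted. A small sketch using the seconds values from the trace lines above, pairing each outgoing packet’s time stamp with its reply:

```python
# Latency from a packet trace: subtract the time stamp of the outgoing
# packet from the time stamp of the reply. The (sent, reply) pairs below
# are the seconds values from the trace above.
pairs = [(58.805, 58.887), (58.003, 58.086), (56.643, 56.724), (52.984, 53.067)]

latencies_ms = [round((reply - sent) * 1000) for sent, reply in pairs]
print(latencies_ms)                           # [82, 83, 81, 83]
print(sum(latencies_ms) / len(latencies_ms))  # 82.25 - the average
```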
You can also use a program I wrote that times the connections without needing to use packet_monitor. See http://members.cox.net/ndav1/self_published/stcp_tping.doc. The stcp_tping command does require that you connect to an active port on the remote host. In this case I am using port 23 (telnet) but any active port will work. The number 1 at the end of the command indicates that a request will be sent once a second.
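For readers without access to stcp_tping, the idea of timing connection setup is easy to sketch in Python. This is not that program, just a minimal illustration of the same technique: measure how long the TCP three-way handshake takes to an active port (the host and port in the commented-out example are placeholders).

```python
# Minimal sketch of timing TCP connection setup (in the spirit of
# stcp_tping, but not that program).
import socket
import time

def time_connect_ms(host, port, timeout=5.0):
    """Return the time in ms to complete a TCP connection to host:port."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # handshake complete; close the connection immediately
    return (time.monotonic() - start) * 1000

# Example usage (placeholder host and port; send one probe per second):
# for _ in range(5):
#     print(f"{time_connect_ms('A.B.C.D', 23):.1f} ms")
#     time.sleep(1)
```

Like stcp_tping, this requires an active (listening) port on the remote host, since a connection must actually be established for the timer to stop.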