The CAC is frequently asked to look into problems with VOS message queues. Here are a couple of interesting ones, along with some solutions and recommendations that I’d like to share with you.
Problem 1: Recently, a customer came to us with a problem. A requester was unable to add a message to a message queue, receiving the error code e$max_file_exceeded. Strangely, the queue was empty, as shown by the list_messages command.
Examination of the queue showed that the number of disk blocks it used was approaching the maximum file size for a non-extent file:
file organization: message queue file
last used at: 11-08-16 14:45:13 edt
last modified at: 11-08-16 14:45:13 edt
last saved at: 10-06-14 21:34:18 edt
time created: 10-06-09 11:03:15 edt
transaction file: yes
log protected: no
safety switch: no
dynamic extents: no
extent size: 1
last message: 51689380
blocks used: 520201
Why was this queue both full and empty?
At some point in the past, the servers responsible for draining the messages out of the queue were off-line. This resulted in a very large backlog of messages. These messages were eventually handled by the servers and deleted from the queue. When a message is deleted from a queue, a key is added to the _record_index index of the queue, and the key value is the number of bytes in the deleted message. When a new message is added to a message queue, the file system attempts to find a previously deleted message of exactly the same size as the new message. If one is not available, the new message is written to the virgin space at the end of the queue.
In this case, there was not enough virgin space in the queue to contain the new message, and there was no pre-existing deleted message of the correct size.
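The behavior described above can be illustrated with a toy model. This is a minimal sketch in Python, not VOS code: the class ToyQueue, its byte-based capacity, and its method names are all hypothetical, and the only rules it models are the two stated above (reuse a deleted slot of exactly the same length, otherwise write to the unused space at the end of the file).

```python
from collections import defaultdict


class QueueFull(Exception):
    """Stands in for e$max_file_exceeded in this toy model."""


class ToyQueue:
    """Hypothetical model of the exact-size reuse rule described in the text."""

    def __init__(self, capacity):
        self.capacity = capacity          # total bytes the file may hold
        self.end = 0                      # start of the never-used space
        self.free = defaultdict(list)     # message length -> freed offsets
        self.live = {}                    # offset -> length of live messages

    def add(self, length):
        if self.free[length]:             # a deleted slot of this exact size exists
            offset = self.free[length].pop()
        elif self.end + length <= self.capacity:
            offset = self.end             # fall back to unused space at the end
            self.end += length
        else:
            raise QueueFull(length)
        self.live[offset] = length
        return offset

    def delete(self, offset):
        length = self.live.pop(offset)
        self.free[length].append(offset)  # remembered for exact-size reuse only


q = ToyQueue(capacity=1000)
offsets = [q.add(100) for _ in range(10)]  # the backlog fills the file
for off in offsets:
    q.delete(off)                          # the servers drain the queue

assert len(q.live) == 0                    # the queue is now empty
q.add(100)                                 # a 100-byte message reuses a freed slot
try:
    q.add(120)                             # a new length has nowhere to go
    rejected = False
except QueueFull:
    rejected = True
```

In this model the drained queue accepts a message whose length matches a deleted one but rejects a message of any other length, which is the "full and empty" state the customer saw.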
The moral of this story is that it is a good idea to limit the number of unique message lengths in any given queue. Rather than have each message use the exact number of bytes it needs, round the value up to some standard size. By using this technique, you increase the chance that a new message can reuse the space from a previously deleted message.
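As a minimal sketch of the rounding idea, the Python helper below pads a message length up to the next multiple of a fixed bucket size. The function name padded_length and the 256-byte bucket are assumptions chosen for illustration; the right bucket size depends on the application's message sizes, since larger buckets waste more space per message.

```python
def padded_length(nbytes, bucket=256):
    """Round nbytes up to the next multiple of bucket (bucket is an assumed value)."""
    if nbytes <= 0 or bucket <= 0:
        raise ValueError("nbytes and bucket must be positive")
    return -(-nbytes // bucket) * bucket   # ceiling division, then scale back up


raw_lengths = [101, 130, 255, 256, 257, 300, 511]
padded = [padded_length(n) for n in raw_lengths]

distinct_before = len(set(raw_lengths))    # 7 distinct lengths
distinct_after = len(set(padded))          # 2 distinct lengths (256 and 512)
```

With seven distinct raw lengths reduced to two padded lengths, a new message is far more likely to match the size of a previously deleted one.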
Problem 2: Recently, another situation arose regarding the performance of message queues. A customer reported that emptying a message queue of 400,000+ messages was taking an inordinate amount of time.
They had recently had a problem with their server processes being unable to process messages in a message queue in a timely manner. Fortunately, the requesters had been kept running so that no data was lost. When the server problem was resolved, it was many hours before the servers had caught up with the backlogged requests and could start processing recent transactions. The customer asked why this occurred and how it could be prevented, or the recovery sped up, in the future.
When a message is deleted from a message queue, a key is added to the system-maintained _record_index, where the value of the key is the message length. If the message being deleted is the same length as a previously deleted message, the unused data position is saved as a duplicate entry on the key containing that message size, at the end of the list of duplicate values. Thus, if there are hundreds of thousands of deleted messages, all the same size (or the set of lengths of deleted messages is small), the list of duplicate keys is very long and the time to delete a single message goes up linearly with the length of that list.
Conversely, when a message is added to the queue and a _record_index key for the message length exists, the space occupied by the newest deleted record is reused to contain the data for the new message. This value must then be deleted from the key value containing the message length. Thus, the time to add a message also goes up linearly with the number of deleted messages: the more deleted messages, the longer it takes to add a new one.
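A back-of-the-envelope estimate shows why a large backlog hurts so much. The Python sketch below is a simplified model, not a measurement: it assumes every deleted message has the same length and that each delete walks every entry already on the duplicate list, so the total work to drain a backlog grows with the square of its size. The function name chain_walk_steps is hypothetical, and the real cost of each step in VOS is not stated here.

```python
def chain_walk_steps(n_deleted):
    """Entries walked while deleting n same-sized messages one by one.

    Assumption: the i-th delete walks the i-1 entries already on the list,
    so the total is 0 + 1 + ... + (n - 1) = n * (n - 1) / 2.
    """
    return n_deleted * (n_deleted - 1) // 2


steps_small = chain_walk_steps(4_000)      # a modest backlog
steps_large = chain_walk_steps(400_000)    # the backlog from this case
ratio = steps_large / steps_small          # roughly 10,000 under this model
```

Under these assumptions, a backlog 100 times larger costs roughly 10,000 times more work to drain, which is consistent with a recovery that takes hours rather than minutes.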
The moral of this story is that the system-maintained data in message queues has memory; the queues remember the locations and sizes of all previously deleted messages. This information persists even after the queue is emptied. Try to avoid allowing your message queues to grow to a huge size (tens or hundreds of thousands of disk blocks). Otherwise, you will find that the cost of adding messages to and deleting messages from a queue can grow over time.
The solutions to both of these situations are the same.
Solution A: A message queue can be truncated while it is open. The routine s$truncate_queue can be used to accomplish this. However, there are four conditions that must be satisfied:
1: there must be no requesters holding the message queue open
2: the message queue must be drained of all messages
3: this routine must be called by a server
4: the queue cannot be a transaction file
If any of the first three conditions is not met, s$truncate_queue will return e$no_truncate_queue. If the last condition is not met, s$truncate_queue will return e$invalid_io_operation.
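The four conditions can be summarized as a small pre-flight check. The Python sketch below is illustrative only and does not call s$truncate_queue: QueueState and truncate_blockers are hypothetical names, and the application would have to gather these facts itself. Because the text does not say which error takes precedence when several conditions fail at once, the sketch reports every error code that could apply rather than picking one.

```python
from dataclasses import dataclass


@dataclass
class QueueState:
    """Hypothetical snapshot of the facts the four conditions depend on."""
    open_requesters: int        # condition 1: must be zero
    pending_messages: int       # condition 2: must be zero
    caller_is_server: bool      # condition 3: must be True
    is_transaction_file: bool   # condition 4: must be False


def truncate_blockers(state):
    """Return the error codes the text associates with each unmet condition."""
    blockers = set()
    if (state.open_requesters > 0
            or state.pending_messages > 0
            or not state.caller_is_server):
        blockers.add("e$no_truncate_queue")
    if state.is_transaction_file:
        blockers.add("e$invalid_io_operation")
    return blockers


ok = truncate_blockers(QueueState(0, 0, True, False))    # nothing blocks truncation
busy = truncate_blockers(QueueState(2, 0, True, False))  # requesters still attached
tx = truncate_blockers(QueueState(0, 0, True, True))     # queue is a transaction file
```

Note that the queue shown in Problem 1 is listed with "transaction file: yes", so under condition 4 it would fall into the last case.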
Solution B: If the application design allows having multiple servers, you can periodically rename the existing message queue, create a new message queue with the correct name, start a new set of servers, and bounce the requesters. When the new servers start up, they will start processing on the new, empty message queue. When the requesters start up, they will add their requests to the new message queue. The original set of servers can remain running, processing the backlog of requests until the old queue is empty. Then the old set of servers can be stopped, and the old message queue can be deleted.
In addition, a solution to Problem 1 may be to use an extent-based message queue. That would allow additional messages to be placed in the queue, as the maximum file size would be larger by a factor of the extent size. However, with an extent-based message queue, performance will be even worse than normal whenever the queue contains a large number of messages at one time.
As mentioned earlier, limiting the number of unique message lengths in any given queue will improve the probability that a new message can reuse the space released by a previously deleted message. This will help solve both problems mentioned in this post.