The OpenVOS operating system provides a high-level Application Programming Interface (API) that, on the whole, makes programming the system easy. Sometimes it becomes too easy, because a simple subroutine call to an s$… routine can hide a lot of complexity.

This is the first of an irregular series of posts to call your attention to such pitfalls so you may avoid them in your application designs.

The Evils of 25th-Line Messages

The ability to write a message to the 25th (or whatever is the bottom) line of a terminal allows a user (or a running program) to notify other users of unusual conditions. From the command line, this is done with the send_message command; both that command and user programs ultimately invoke the s$send_message API subroutine. Arguments to this call provide the user_name of a receiver, the target module_name where that user might be logged in, the notification text, and some flags that modify aspects of the operation. Privileged callers are allowed to specify the name of the sender; otherwise it defaults to the user_name of the sending process.

So far it seems straightforward. Since the operation requires access to a terminal owned by another user, the kernel sends the request on a server queue for TheOverseer process on the target module. Normally, the sending process then waits for the status of message delivery, but it may opt not to wait for this reply.

TheOverseer process for any module keeps track of all login processes and the terminal device used by each of them. When it receives a request to send a message, it searches its list of processes and devices, and for each process whose user_name matches the receiver_name argument, it attaches a port to the terminal device of the process, opens the port, writes the message to the notification line, closes the port, and detaches the port. If multiple processes match the receiver_name argument, this attach/open/write/close/detach cycle is repeated for each.

Okay – this is a little more complicated than it first appeared. Exploring further, we find that any given module in the system may be hosting login processes whose real terminal is a device on another system or module, having logged in across the network via Open StrataLink (OSL) with the login -module command. Now the seemingly simple attach/open/write/close/detach operation on that terminal has to involve OSL processes and network remote procedure calls to the remote module, and it is much more costly than the on-module handling.

The effects are magnified when the receiver user_name is specified as a starname that might match multiple or all login users, and magnified further if the target module is also specified as a starname. The worst case would be sending the message to user “*.*” (all users, all groups) on module “*” (all modules of the current system).
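To get a feel for the multiplication, the following Python sketch counts how many terminal deliveries a single call could trigger. It is only a model: the module names and login table are made up, and Python's fnmatch is used as a rough stand-in for OpenVOS starname matching, which it does not reproduce exactly.

```python
from fnmatch import fnmatchcase

# Hypothetical login table: module name -> user_names of login processes.
# None of these names come from a real system.
LOGINS = {
    "m1": ["John_Doe.SysAdmin", "Jane_Roe.Operations", "Jane_Roe.Operations"],
    "m2": ["John_Doe.SysAdmin", "Sam_Poe.Operations"],
    "m3": ["Ann_Lee.Dev"],
}


def count_deliveries(user_starname, module_starname, logins=LOGINS):
    """Count attach/open/write/close/detach cycles one send would trigger.

    fnmatchcase() only approximates starname matching; it is an assumption
    made for illustration.
    """
    total = 0
    for module, users in logins.items():
        if not fnmatchcase(module, module_starname):
            continue
        # Every matching login process costs one full cycle on its terminal.
        total += sum(1 for user in users if fnmatchcase(user, user_starname))
    return total


if __name__ == "__main__":
    for user, module in [("John_Doe.SysAdmin", "m1"),
                         ("*.Operations", "*"),
                         ("*.*", "*")]:
        print(user, module, count_deliveries(user, module))
```

Even in this tiny table, widening the two starnames takes one call from a single delivery to one per login process on every module, and on a real system some of those deliveries would cross the network.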

Implementation

s$expand_module_starname(target_module) => module_list;

foreach module_name in module_list
   send_overseer_request(module_name, send_message_request_data)
      (The request travels via a server queue and OSL to TheOverseer
      process on that module.)

   TheOverseer process on each receiving module then does:

   foreach terminal_process in TheOverseer’s list of login processes
         and sub-processes
      if (terminal_process user_name matches receiver starname)
         if (messages_queued_for_device < 5)
            queue the message for the login device
         else
            reject the request (e$too_many_terminal_messages)

   Simultaneously, use up to 10 ports at a time to send any queued
   messages to the terminals:

   foreach message queued for each terminal
      attach port
      open port
      s$control(…WRITE_SYSTEM_MESSAGE…)
      close port
      detach port

This processing is done in no_wait_mode for each terminal so that
delays (such as I/O requests across the network) affect only the
message processing for that terminal.   

If any message can’t be delivered in 300 seconds, it is flushed.
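The queue-depth check is what a sender runs into during a burst. The toy model below assumes nothing is delivered while the burst arrives (for example, because the terminal sits behind a slow network link). The limit of 5 comes from the outline above; the class and function names are invented for illustration and do not correspond to anything in OpenVOS.

```python
from collections import deque

MAX_QUEUED_PER_DEVICE = 5  # limit taken from the outline above


class TerminalQueueModel:
    """Toy model of one terminal's pending-message queue (hypothetical)."""

    def __init__(self):
        self.pending = deque()
        self.rejected = 0

    def enqueue(self, text):
        """Queue a message, or reject it once the device queue is full."""
        if len(self.pending) < MAX_QUEUED_PER_DEVICE:
            self.pending.append(text)
            return True
        # Corresponds to e$too_many_terminal_messages in the outline.
        self.rejected += 1
        return False

    def deliver_one(self):
        """Simulate one completed attach/open/write/close/detach cycle."""
        return self.pending.popleft() if self.pending else None


def burst(n):
    """Send n messages with no deliveries in between.

    Returns a (queued, rejected) pair.
    """
    queue = TerminalQueueModel()
    for i in range(n):
        queue.enqueue(f"error {i}")
    return len(queue.pending), queue.rejected


if __name__ == "__main__":
    print(burst(20))
```

In this model a burst of 20 messages leaves 5 queued and 15 rejected, so a sender that waits for delivery status spends most of the burst handling rejections.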

When an application uses this mechanism to report errors, and a flood of errors comes along, the application can easily become mired in reporting, alternating between waiting for other messages to be processed and flooding the user terminals with messages that arrive too fast to be understood.

Avoidance

Now that you know what’s going on behind the scenes, the guidance I can give you is:

  • Carefully craft starnames for user names and module names to ensure that you reach just the users that must get the message and no others.  For example, John_Doe.* or *.Operations or John_Doe.SysAdmin.
  • Consider implementing a non-starname list of users to notify and send each notification separately.
  • Put the detail of error messages elsewhere (an error log, for example) and use s$send_message only to alert users to the presence of errors.
  • Limit the frequency of notifications to something reasonable (perhaps no more than once a minute).
  • Suppress any message that is the same as the last one sent. The new one would just overlay the previous one.
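
As a rough illustration of how several of these points fit together, here is a sketch in Python. The send callable stands in for whatever wrapper your application has around s$send_message, and log for your error-log writer; their signatures, the class name, and the 60-second default are assumptions made for the example, not part of the OpenVOS API.

```python
import time


class ThrottledNotifier:
    """Log every error, but send at most one short alert per interval.

    `send(user_name, text)` is a hypothetical callable wrapping the real
    notification call; `log(text)` is a hypothetical error-log writer.
    """

    def __init__(self, send, log, recipients, min_interval=60.0,
                 clock=time.monotonic):
        self.send = send
        self.log = log
        self.recipients = list(recipients)  # explicit names, no starnames
        self.min_interval = min_interval
        self.clock = clock
        self.last_sent_at = None
        self.last_alert = None

    def report(self, detail, alert):
        """Record `detail` in the log and alert users if allowed.

        Returns True if the alert was sent, False if it was suppressed.
        """
        self.log(detail)  # the detail always goes to the log
        now = self.clock()
        if alert == self.last_alert:
            return False  # same text would only overlay the previous one
        if (self.last_sent_at is not None
                and now - self.last_sent_at < self.min_interval):
            return False  # too soon after the previous alert
        for user in self.recipients:
            self.send(user, alert)  # one separate request per named user
        self.last_sent_at = now
        self.last_alert = alert
        return True


if __name__ == "__main__":
    sent, logged = [], []
    notifier = ThrottledNotifier(lambda user, text: sent.append((user, text)),
                                 logged.append, ["John_Doe.SysAdmin"])
    notifier.report("disk read failed on a data file", "Errors logged")
    notifier.report("disk read failed again", "Errors logged")
    print(len(sent), len(logged))
```

Note that this sketch follows the last bullet literally: a repeated alert stays suppressed until a different one has been sent. Whether a repeat should be let through after a long quiet period is a policy choice for your application.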

If you have any other suggestions, please post your comments for others to see.