SMiC
System Management in Control
Software Product Description (SPD)
© Copyright 2016
Timmers IT Consultancy
Confidential computer software. Valid license from Timmers IT
Consultancy required for possession, use or copying. The information contained
herein is subject to change without notice. Nothing herein should be construed
as constituting an additional warranty. Timmers IT Consultancy shall not be
liable for technical or editorial errors or omissions contained herein.
Content
SMiC is a tool for system management on OpenVMS. It automates the daily work of a system manager by checking processes, batch jobs, queues and so on. SMiC does not only check; it can also perform actions to restore the documented situation.
SMiC can use a master node to keep track of the work that needs to be done. The master node checks the availability of the client nodes and holds a copy of all setup files.
The SMiC setup files are all plain ASCII, so they can be read and modified with any preferred editor. No special tool is needed to maintain them.
SMiC uses the setup files to “know” what the normal situation is. Scripts are available to use the same setup files during startup of the node. The advantage is that setup information is kept in only one place.
Example:
DISKS.DAT has the following disk setup.
DSA1:  DISK11  DISK$DISK11  40%  $1$DKC100:,$1$DKC600:
This information is used by the SMiC control procedure and can also be used by the system startup to mount the disk properly.
It is very simple to add local procedures to SMiC.
The control process uses a special setup file, SMIC_ROOT:[DATA]SYSTEMFILE.DAT
Example:
DiskCheck     | VMS1 | 3 | smic_root:[com]checkdisks.com     |
              | VMS2 | 3 | smic_root:[com]checkdisks.com     |
BatchJobCheck | VMS1 | 3 | smic_root:[com]checkbatchjobs.com |
              | VMS2 | 3 | smic_root:[com]checkbatchjobs.com |
ProcesCheck   | VMS1 | 3 | smic_root:[com]checkprocess.com   |
              | VMS2 | 3 | smic_root:[com]checkprocess.com   |
The SMiC control job executes every 5 minutes and reads systemfile.dat to know what to do. In the example file, a disk check procedure is executed on the nodes VMS1 and VMS2 every 3 runs, that is, every 15 minutes. The disk check executes the command procedure smic_root:[com]checkdisks.com. The same applies to the batch job check and the process check, each with its own command procedure.
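The "every 3 runs" scheduling can be illustrated with a small sketch. It is not SMiC code: it assumes the row layout check name | node | interval in runs | procedure, assumes that a row with an empty first field continues the previous check name, and uses the hypothetical names parse_systemfile and due_checks.

```python
# Illustrative sketch only, not part of SMiC.
# Assumed row layout: check name | node | interval in runs | procedure |

SYSTEMFILE = """\
DiskCheck     | VMS1 | 3 | smic_root:[com]checkdisks.com |
              | VMS2 | 3 | smic_root:[com]checkdisks.com |
BatchJobCheck | VMS1 | 3 | smic_root:[com]checkbatchjobs.com |
"""


def parse_systemfile(text):
    """Return a list of (check, node, interval, procedure) tuples."""
    entries = []
    current_name = ""
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = [field.strip() for field in line.split("|")]
        name, node, interval, procedure = fields[:4]
        # Assumption: an empty first field repeats the previous check name.
        if name:
            current_name = name
        entries.append((current_name, node, int(interval), procedure))
    return entries


def due_checks(entries, node, run_number):
    """Return (check, procedure) pairs that are due for this node on this run."""
    return [
        (name, procedure)
        for name, entry_node, interval, procedure in entries
        if entry_node.upper() == node.upper() and run_number % interval == 0
    ]


if __name__ == "__main__":
    entries = parse_systemfile(SYSTEMFILE)
    for run in range(1, 7):
        print(run, due_checks(entries, "VMS1", run))
```

With a control job that runs every 5 minutes, an interval of 3 means a check is due on runs 3, 6, 9 and so on, which gives the 15 minute cycle described above.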
A monitor job runs on every node, so that each node is monitored independently of the SMiC node.
A web interface can be installed on the SMiC node. Most settings can then be managed using this web interface.
SMiC comes with a large number of check procedures. When check procedures are already in use, it is very easy to bring these procedures into SMiC. In most cases they can also be brought into the web interface.
The following SMiC procedures are available as standard:
The control job on every node must make a connection with the SMiC node every 10 minutes. The control job on the SMiC node uses this to send a notification when a node is not available. The WEB interface can show a graphical view of the availability of the remote nodes.
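The availability rule can be sketched as follows. This is an illustration under assumptions, not SMiC's implementation: it assumes the SMiC node records a last-contact timestamp per client node, and the name unavailable_nodes is hypothetical.

```python
# Illustrative sketch only, not part of SMiC.
from datetime import datetime, timedelta

# The text states that every node must connect every 10 minutes.
CONTACT_INTERVAL = timedelta(minutes=10)


def unavailable_nodes(last_contact, now, limit=CONTACT_INTERVAL):
    """Return the sorted names of nodes whose last contact is older than limit.

    last_contact maps a node name to the time of its latest connection
    (an assumed bookkeeping structure).
    """
    return sorted(node for node, seen in last_contact.items() if now - seen > limit)


if __name__ == "__main__":
    now = datetime(2016, 1, 1, 12, 0)
    contacts = {
        "VMS1": now - timedelta(minutes=4),
        "VMS2": now - timedelta(minutes=25),
    }
    print(unavailable_nodes(contacts, now))
```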
The disk check uses the setup file SMIC_DATA:DISKS.DAT
Example:
DSA0: | DISK10 | DISK$DISK10 | 70% | $1$DKC0:,$1$DKC500 |
A disk DSA0: must be active on the node and its label should be DISK10. A disk logical DISK$DISK10 must be present, and a notification will be sent when the disk occupation is over 70%. This disk is a shadow set consisting of two disks. The disks data file can be edited using a normal text editor, or using the WEB interface.
The disks are checked every 15 minutes.
The occupation is the most important item of the disk check. The system manager sets a threshold above which a notification is sent. When the disk occupation changes by more than 5% between two checks, a warning notification will be sent.
For shadow sets, the shadow configuration is also checked.
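The two occupation rules can be sketched like this. The sketch assumes one pipe-delimited row per disk in the field order of the example (device, label, logical name, threshold, shadow members) and reads "more than 5%" as percentage points. The names parse_disk_row and occupation_messages are hypothetical.

```python
# Illustrative sketch only, not part of SMiC.

ROW = "DSA0: | DISK10 | DISK$DISK10 | 70% | $1$DKC0:,$1$DKC500 |"


def parse_disk_row(row):
    """Split one DISKS.DAT row (assumed layout) into named fields."""
    fields = [field.strip() for field in row.split("|")]
    device, label, logical, limit, members = fields[:5]
    return {
        "device": device,
        "label": label,
        "logical": logical,
        "limit_pct": float(limit.rstrip("%")),
        "members": [m.strip() for m in members.split(",") if m.strip()],
    }


def occupation_messages(disk, previous_pct, current_pct, max_jump=5.0):
    """Return the notifications for one check of one disk.

    previous_pct is the occupation at the previous check, or None on the
    first check. max_jump is taken as percentage points (an assumption).
    """
    messages = []
    if current_pct > disk["limit_pct"]:
        messages.append(
            f"{disk['device']} occupation {current_pct:.0f}% is over {disk['limit_pct']:.0f}%"
        )
    if previous_pct is not None and abs(current_pct - previous_pct) > max_jump:
        messages.append(
            f"{disk['device']} occupation changed from {previous_pct:.0f}% to {current_pct:.0f}%"
        )
    return messages


if __name__ == "__main__":
    disk = parse_disk_row(ROW)
    for message in occupation_messages(disk, previous_pct=60.0, current_pct=72.0):
        print(message)
```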
The batch job check uses the setup file SMIC_DATA:BATCHJOBS.DAT
Example:
backup_status | * | ######################## | * | "smic_root:[bck.com]submit_backup_status.com" |
A job named backup_status must be present in a batch queue (no specific queue is given), and it is checked during all 24 hours of the day, on all days (working days and weekends). When the job is not found, the procedure submit_backup_status.com will be executed to get the job back into the batch queue. The batch job data file can be edited using a normal text editor, or using the WEB interface.
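The 24-character field in the example appears to hold one position per hour of the day. The sketch below shows how such a mask could be evaluated. It rests on assumptions that this description does not confirm: position 0 stands for the hour 00:00 to 00:59, "#" means "check during this hour", and any other character (here "-") means "do not check". The name hour_enabled is hypothetical.

```python
# Illustrative sketch only, not part of SMiC.


def hour_enabled(mask, hour):
    """Return True when the check is enabled for the given hour (0-23).

    Assumption: the mask has 24 positions, one per hour, and '#' marks
    an hour in which the check must run.
    """
    if len(mask) != 24:
        raise ValueError("hour mask must have exactly 24 positions")
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    return mask[hour] == "#"


if __name__ == "__main__":
    always = "#" * 24
    # Hypothetical mask: check only from 08:00 up to 18:00.
    office_hours = "-" * 8 + "#" * 10 + "-" * 6
    print(hour_enabled(always, 3), hour_enabled(office_hours, 3))
```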
The process check uses the setup file SMIC_DATA:INTERACTIVEJOBS.DAT
Example:
WASD:80 | | ######################## | * | "sys$common:[sys$startup]webserver_startup.com" |
A process named WASD:80 must be present, and it is checked during all 24 hours of the day, on all days (working days and weekends). When the process is not found, the procedure sys$common:[sys$startup]webserver_startup.com will be executed to get the process back into the system. The process data file can be edited using a normal text editor, or using the WEB interface.
There are a few network checks:
One check is to PING all the nodes, to get an overview of the network. To make the checks node independent, the SMiC node will PING all the nodes, and in turn the other nodes will PING the SMiC node. This ensures that a failure of the SMiC node will also be reported.
In an environment using DECnet it is possible to perform a DECnet speed test.
The batch queue check uses the setup file SMIC_DATA:BATCHQUEUES.DAT
Example:
beheer_batch | ######################## | * | "/BASE_PRIORITY=2/JOB_LIMIT=20/OWNER=[SYSTEM]/PROTECTION=(SYSTEM=E,OWNER=D,GROUP=R,WORLD=W)" |
A batch queue named beheer_batch must be active on the node, and it is checked during all 24 hours of the day, on all days (working days and weekends). When the batch queue is not found, the queue can be initialized using the given settings. The batch queue data file can be edited using a normal text editor, or using the WEB interface.
The printer queue check uses the setup file SMIC_DATA:PRINTERQUEUES.DAT
Example:
allinone | ######################## | * | /on="192.168.1.39:9100"/PROCESSOR=TCPIP$TELNETSYM) |
A printer queue named allinone must be active on the node, and it is checked during all 24 hours of the day, on all days (working days and weekends). When the printer queue is not found, the queue can be initialized using the given settings. The printer queue data file can be edited using a normal text editor, or using the WEB interface.
The FTP process check job searches for active FTP processes. When a process has been running for over 8 hours, a warning notification will be sent to the System Manager.
The MWAIT check job will send a warning notification when a process is in an MWAIT state and does not use CPU, DIO or BIO.
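One way to read the MWAIT rule is as a comparison between two consecutive samples of the same process. The sketch below follows that reading. It is an assumption, since this description does not say how SMiC measures activity. The Sample fields and the name is_stuck_in_mwait are hypothetical.

```python
# Illustrative sketch only, not part of SMiC.
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One observation of a process (assumed cumulative counters)."""
    state: str
    cpu_ticks: int
    dio: int
    bio: int


def is_stuck_in_mwait(previous, current):
    """True when the process is in MWAIT in both samples and no counter moved."""
    return (
        previous.state == "MWAIT"
        and current.state == "MWAIT"
        and current.cpu_ticks == previous.cpu_ticks
        and current.dio == previous.dio
        and current.bio == previous.bio
    )


if __name__ == "__main__":
    before = Sample("MWAIT", cpu_ticks=1200, dio=45, bio=310)
    after = Sample("MWAIT", cpu_ticks=1200, dio=45, bio=310)
    print(is_stuck_in_mwait(before, after))
```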
The CPU usage check job will send a warning notification when a process has used more than 2.5 hours of CPU time.
The settings for the network ports can be checked for speed and duplex setting.
The network settings data file can be edited using a normal text editor, or using the WEB interface.
The SMiC messaging utility makes it possible to control the notifications of deviations from the desired situation. There are 63 messages available, and for each mail address it is possible to select which messages are received at that address.
OpenVMS OPCOM can also be used to send and receive messages.
SMiC assumes there is a special node that is the linchpin of the monitoring and surveillance.
This SMiC node checks the availability of all SMiC client nodes and keeps a copy of all the data files.
The data files for all nodes in the network can be edited on the SMiC node. A synchronization procedure is available to copy the data files to the correct node and the correct location.
Using the DCL procedures on the SMiC node, most adjustments can be made on the SMiC node itself. To make this more user friendly, a web interface is available.
Using the web interface it is possible to adjust the SMiC setup files for all nodes from a web browser. The web interface also offers more user-friendly views. With the web interface, no in-depth system management knowledge is needed and it is not possible to make mistakes. The web interface keeps track of the changes made through it.