Trailing-Edge - PDP-10 Archives - BB-BT99V-BB_1990 - 10,7/anf10/anf10a.mem
There is 1 other file named anf10a.mem in the archive. Click here to see a list.

     Doc  :  EJW-78-011-00-S

! D I G I T A L !   I N T E R O F F I C E   M E M O R A N D U M
To:  TOPS-10 network group        Date:  20 July 78
                                  From:  Ric Werme
cc:  Jared Aaronson               Dept:  LCG Communication
     George Conant                       Software Engineering
     Ralph Dement                 Ext:   6059  Loc: MR1-2/E89
     Dave Kirschen
     Tony Lauck
     Arnie Miller
     Marty Palmieri
     Rob Ryan
     Retrieval system (2)  ML2-5/B39
     SWE Tech Archive
Subj:  Hierarchiacal routing in the TOPS-10 networks


A serious problem in the TOPS-10 network architecture is its  limitation
of  only  63  nodes  in a network.  To date, the most seriously affected
customer  is  IPC,  the  inhouse  timesharing  group   which   now   has
responsibility  for  more  than  that  many nodes.  The largest customer
network I know of is Western Electric with some 20-30 nodes.

The reason for the limitation is two-fold.  First, the  TOPS-10  network
architecture  demands  that  each node know about the existance of every
other node in the network.  The database  required  per  node  is  large
enough  to become a significant fraction of the memory available to many
of our network systems.  (For example, the DN82 remote station and  DN87
front end have 16K for both program and database.) Second, the lack of a
system filespec parser has forced TOPS-10 to  squeeze  the  node  number
into  device  names  as part of the unit number (e.g.  LPT220 is LPT0 on
node 22.  This inherently implies  a  limit  of  63  nodes  and  why  we
document the limit to be 63.

This paper discusses a mechanism to build an arbitraty size network with
the  database requirements proportional to log(n) instead of n.  It will
not address the device naming changes required to allow TOPS-10 programs
to access other devices on nodes whose addresses are bigger than 63.

                                                                  Page 2


In today's TOPS-10 networks, each node must know about the existance  of
up  to 62 other nodes.  To spread routing information, each node sends a
list of its neighbors to each other node in the  net.   Associated  with
each  neighbor  is  the cost of reaching it.  The routing algorithm uses
that information to determine the best  neighbor  to  send  messages  to
every  node  in  the network.  The optimal path may be determined solely
because of the total knowledge base a node has of the network.

To build large networks, I propose a  hierarchy  build  of  areas  whose
number of nodes grow exponentially.  Let's define a level 0 area to be a
single node.  A level 1 area would be analogous to today's  network  and
would  be  a group of up to x level 0 areas, or x nodes.  A level 2 area
would be a group of up to x level 1 areas, or in general, a level n area
would  be  a  group of up to x level (n-1) areas.  The maximum number of
nodes in a level n area would then be x**n, so we can see that x  has  a
lower  bound  of  2.   Just  what  x  should  be seems to be a matter of
personal choice.  Something like making x equal the highest  area  level
needed  may  be  attractive mathematically, but I will leave that to the
computer scientists.  There are benefits of keeping x small  and  large.
If  it  is  kept  small, the routing database will be smallest, becoming
minimal at x=2, but the larger the routing database is the better we can
route messages.

From an administrative point of view, it would be attractive  to  assign
level  numbers  to individual administrators.  Therefore, at a level one
area, the Marlboro development systems could have their own area and  we
could  assign node numbers within our area as we see fit.  Maynard could
do the same and make PK1 be a level 1 area and could assign node numbers
as  they  wish.   For the two groups to be able to communicate, we would
simply have to agree on different level 1 area numbers.  I suggest  that
something  like  31  subareas  per  area  would be manageable, both from
administrative and memory availability information.  I will not  attempt
to  access  the  routing  efficiency  of various values of x in variable
sized networks.

The following table shows the maximum size of networks  varying  both  x
and  the highest level.  A standard practice in MACRO-10 is to only fill
a listing page of code 2/3s full to allow room  to  grow  but  still  be
reasonably  compact.   The  column for x=20 is interesting because it is
what one would get if he filled his x=31 areas only 2/3s full.

                        X values
           2      10       20       31         63
      0 !  1       1        1        1          1
      1 !  2      10       20       31         63
Level 2 !  4     100      400      961       3969
      3 !  8    1000     8000    29791     250047
      4 ! 16   10000   160000   923521   15752961

Another definition that will  be  useful  later  is  the  term  "routing
level".   This  is  a characteristic of a node and is the smallest level
number needed to encompass a node and its neighbors.  In the text below,
                                                                  Page 3

the  phrase  'level  x  node'  will  be  an abbreviation for 'node whose
routing level is x'.


On the next page is a diagram of a hierarchical network which I will use
to  disucss  the  mechanics  of  hierarchical routing.  Each ring in the
drawing surrounds an area.  These rings are associated with the  various
levels,  the  level  number being equivalent to the number of rings that
must be crossed to get to a particular node number (or level 0  number).
A node address may be formed by summing the numbers associated with each
ring over all the levels.  For example, the node closest  to  the  upper
right  corner  has the address 10223.  It is located in the level 1 area
with the address 10200, and therefore is in level 2 address  10000.   If
any  level 1 area stood alone, like area 10200, what would be left would
be the sort of network we can support  today.   It  is  the  connections
between areas of order greater than 1 that this paper defines.

For expositional purposes, the top level area covers the entire network.
It needs no address since messages are never routed outside.  Therefore,
I will call it area 0 when I need to refer to the  highest  level  area.
Also,  if  I  am  discussing  some  area,  its address is unnecessary to
describe areas below it and will usually be omitted.  For example,  area
4400 in area 10000 is area 14400 and node 4411 is node 14411.

The drawing also demonstrates the routing level concept.   Node  14411's
routing  level  is  1  since  it and its neighbors are within area 4400.
Likewise, node 14422 is level 2 and  nodes  10520  and  70707  level  3.
Incidentally, area 10500 isn't a star network.


Message routing involves moving a message from one area toward  another.
Routing  is done on the level of the lowest order level that encompasses
both the node presently holding the  message  (the  source  node  or  an
intermediate)  and  destination.   A message will never be routed out of
that area.  To be able to route between any two nodes in  a  network,  a
node must be able to route to all areas (except its own) one level below
each level of its hierarchy.  For example, node 14411 must  be  able  to
route  to  areas 60000 and 70000 within area 0, areas 100, 200, 300, and
500 within area 10000, and nodes 22 and 33 within area 4400.

One might ask if a node only needed to know how to route to a level  one
higher  than its routing level.  Could nodes pass messages to nodes with
successively higher routing levels to  approach  the  destination?   The
answer is no.  While this works in a tree structured hierarchy, it fails
                                                                  Page 4

             100                   1000                   200
         1                                              3
                3                                             23
           2                                          10
                                      11     33
                500                                    300
             5                                       1      2
        25      10                    
         20   15                                     3      4
            600                   3000                    700
                 6                 100                      7
           5      7                 3                    7000
                                                                  Page 5

badly in an arbitrary topology network because  messages  between  nodes
with  high  routing  levels  must  pass  through  nodes with low routing
levels.  For example, if we assigned equal cost  to  all  lines  in  the
network  and node 14411 wanted to send a message to 60606, it might send
it to node 33 since it interfaces to a higher level.  However,  node  33
knows  that the minimum cost path to area 60000 is via 500 and since the
minimum cost path to 500 is via 11, it will send the message back.  This
could repeat indefinitely and hence is not a viable design.


5.1  Neighbors

As mentioned before, the NEIGHBORS message is used in NCL to  learn  the
topology  of  today's  level 1 networks.  This message contains data the
network cannot afford to lose since it is sent only to nodes  when  they
come up or when the local topology changes.  To guarantee delivery it is
a numbered message which means that NCL must have gone through  its  end
to  end  startup  sequence  before NEIGHBORS messages are exchanged.  To
extend this mechanism to hierarchical networks, two rules will  have  to
be  changed.   First,  nodes  will have to start NCL with a node in each
area one level below all areas in a node's hierarchy between level 1 and
the  node's routing level.  Therefore, node 10520 will have to start NCL
with nodes 5, 10, 15, 20, 25, 102, 203, 303, 4422, 30103, 60606,  70707.
Note  that for a level 1 node, NCL startup will occur exactly as it does
today.  Second, NEIGHBORS messages will have to be sent between nodes in
contact,  which is the same as today's rules.  However, the content will
become a list generated by the following algorithm:

     1.  Let YS be the routing level of the  node  sending  a  NEIGHBORS
         message and YD be the routing level of the destination.

     2.  If YD is greater  than  or  equal  to  YS  then  the  NEIGHBORS
         messages  will  contain  a  list  of  neighbors  of  level YS-1
         adjacent to the sender's area of level YS-1.

     3.  If YD is less than YS then the NEIGHBORS message  will  contain
         the  list  of a node from areas of level YD-1 that are adjacent
         to the sender's area of level YD-1 and the list of a node  from
         all  areas  one  level below each level higher than YD+1 in the
         sender's hierarchy.

For example, a NEIGHBORS message from node 14411 (YS=1) to 14422  (YD=1)
will contain nodes 22 and 33.  From node 10102 (YS=2) to 10505 (YD=3) it
will contain nodes 505, 525, and 203 and will not contain nodes 1 or  3.
In  the  opposite  direction  it will contain nodes 10102, 14422, 10303,
30103, 60606, and 70707.  The area 10000 nodes enables node 102 to learn
about the level 1 areas within area 10000 and the rest enable it to gain
information  about  routing  to  higher  levels.   It  is  node  10102's
                                                                  Page 6

responsiblity  to pass this information onward to nodes 10101 and 10103.
Note that in today's network (a topology where all nodes have a  routing
level of one) NEIGHBORS messages will contain the same data they do now.

5.2  Line Costs

NEIGHBORS messages include link costs in addition to nodes it interfaces
with.  When doing level Y routing, should a node be concerned with costs
of sending messages between  levels  less  than  Y?   If  it  did,  that
information would allow more accurate routing but would require NCL in a
node of level Y with all nodes of level Y or higher in an area of  level
Y+1.   For  example, node 14422 would have to start with nodes 10101 and
10102 since they may provide different costs about interfacing  to  area
10200.   If  it  only  started  with one node, routing would be based on
inconsistant costs and message  looping  might  occur.   To  maintain  a
database  size proportional to log(n), a node must not have to start NCL
with more than one node in high level areas.  Therefore, I specify  here
that  a node need only start with one node in areas outside of its level
1 area and that routing on level Y  will  be  based  on  the  costs  for
routing between level Y-1 areas.


NCL will require changes in areas other than routing to permit a node to
access  all  others  outside  its  level  1 area.  In level 2 and higher
areas, a node will not know of the existance of all  the  nodes  in  the
network,  but will be able to route a message far enough to reach a node
that knows  about  the  desired  destination.   Before  any  significant
information  between  NCL  nodes  can begin, NCL between the two must be
started.  Therefore, we will need a mechanism to do this  in  multilevel
networks.   If  node  A wishes to communicate with node B but level 2 or
higher routing is required to do that, no new  problems  will  occur  if
node B exists.  If node B does not exist and node A sends a NCL START to
it,  some  intermediate  node  will  discover  that   the   message   is
undeliverable.   Instead of simply discarding the message as the current
NCL  specifies,  I  propose  a  more  responsive  mechanism,  i.e.   the
intermediate  must  return  a  new unnumbered message, which I will call
STOP, back to node A claiming node B as the source.

Once NCL is running, nodes A and B may exchange  any  numbered  messages
including  CONNECTs,  DISCONNECTs,  and DATA.  After all the connections
are broken, the two ends will probably want to terminate NCL  to  reduce
their   database.   This  could  be  done  immediately  after  the  last
connection is broken, but I would prefer to see  it  happen  after  some
time  delay since it is very common for a user to break a connection and
try another immediately.  To terminate NCL, a  node  will  send  a  STOP
message.   Since  STOP  messages are unnumbered, it may be lost, however
                                                                  Page 7

that case will be handled as a side effect of the next section.

Another problem exists if node B should go down since  that  information
will  not be transmitted outside its level 1 area.  I propose that three
mechanisms be used to detect and service this case:

     1.  If a node receives a message that cannot be delivered, it  will
         return  a  STOP  message  to  the  sender  on the behalf of the
         nonexistant destination, but only if the  message  received  is
         not  a  STOP  message.  (Replying to STOP messages loses big if
         both nodes go down while there is traffic in the net  for  them
         ....).  This generalizes the paragraph that introduced the STOP

     2.  If no messages have been sent recently, a node will send a  REP
         (NCL  REP,  not DDCMP REP) and expect an ACK or STOP message in

     3.  To catch nodes that restart NCL faster than the  REP  mechanism
         can  catch,  nodes  that receive messages in states that do not
         allow that  message  (like  receiving  a  REP  when  NCL  isn't
         running) will reply with STOP messages to provide fast response
         to letting the sender know that he has to restart NCL.


To this point I have ignored area 10300 which is broken in  two  pieces.
This  is  an  invalid configuration and offers some of the problems seen
when two nodes in a network claim to have the same node  address.   This
situation  can  be detected by the high level routing nodes in that area
when they receive NEIGHBORS messages from nodes outside  of  area  10300
that claim nodes in area 10300 to be neighbors that the receiver doesn't
know about.  For example, node 10302 will receive  a  NEIGHBORS  message
from area 10500 listing node 10303 as a neighbor.  Node 10302 can notice
that it has never heard of that node but can't  really  do  much  except
notify  the operator.  (Note that if the area was properly connected and
node 10303 went down, node 10302 would probably learn that  before  area
10500  tells it of the event.  This means that if a node tries to detect
fragmented areas it must be able to handle that.)

The effect of a fragmented area is not too severe.  Message routing that
does  not  try  to  route  through  that  area will not be impacted, and
routing at levels lower than the fragmented area will never be impacted.
Of course, the maintainer of that area should never let it fragment, but
I can offer no good cure.

                                                                  Page 8


Ancient Greek philosophers got themselves into all sorts of  trouble  by
thinking about problems without bothering to perform experiments to test
their results.  I have done  no  better.   This  proposal  contains  the
results  of  a  couple  days  active  thought and a year's worth of idle
contemplation.  No simulations have been run, no  algorithms  have  been
proven,  no  experiments  done,  and I may be claiming that a heavy node
falls faster than a light one.  It is, however, a  start.   One  problem
that  I'm  sure  this  proposal  exacerbates  is  network congestion.  I
believe we already have a significant number of problems,  and  reducing
the information used to route messages cannot improve matters.  The goal
of this memo has been merely to record what I know on the subject before
I leave DEC.  It's up to you guys to make it work!