Doc : EJW-78-011-00-S +---------------+ ! D I G I T A L ! I N T E R O F F I C E M E M O R A N D U M +---------------+ To: TOPS-10 network group Date: 20 July 78 From: Ric Werme cc: Jared Aaronson Dept: LCG Communication George Conant Software Engineering Ralph Dement Ext: 6059 Loc: MR1-2/E89 Dave Kirschen Tony Lauck Arnie Miller Marty Palmieri Rob Ryan Retrieval system (2) ML2-5/B39 SWE Tech Archive Subj: Hierarchiacal routing in the TOPS-10 networks ------------------- 1.0 INTRODUCTION A serious problem in the TOPS-10 network architecture is its limitation of only 63 nodes in a network. To date, the most seriously affected customer is IPC, the inhouse timesharing group which now has responsibility for more than that many nodes. The largest customer network I know of is Western Electric with some 20-30 nodes. The reason for the limitation is two-fold. First, the TOPS-10 network architecture demands that each node know about the existance of every other node in the network. The database required per node is large enough to become a significant fraction of the memory available to many of our network systems. (For example, the DN82 remote station and DN87 front end have 16K for both program and database.) Second, the lack of a system filespec parser has forced TOPS-10 to squeeze the node number into device names as part of the unit number (e.g. LPT220 is LPT0 on node 22. This inherently implies a limit of 63 nodes and why we document the limit to be 63. This paper discusses a mechanism to build an arbitraty size network with the database requirements proportional to log(n) instead of n. It will not address the device naming changes required to allow TOPS-10 programs to access other devices on nodes whose addresses are bigger than 63. Page 2 2.0 THE HIERARCHY AND ASSOCIATED JARGON In today's TOPS-10 networks, each node must know about the existance of up to 62 other nodes. To spread routing information, each node sends a list of its neighbors to each other node in the net. Associated with each neighbor is the cost of reaching it. The routing algorithm uses that information to determine the best neighbor to send messages to every node in the network. The optimal path may be determined solely because of the total knowledge base a node has of the network. To build large networks, I propose a hierarchy build of areas whose number of nodes grow exponentially. Let's define a level 0 area to be a single node. A level 1 area would be analogous to today's network and would be a group of up to x level 0 areas, or x nodes. A level 2 area would be a group of up to x level 1 areas, or in general, a level n area would be a group of up to x level (n-1) areas. The maximum number of nodes in a level n area would then be x**n, so we can see that x has a lower bound of 2. Just what x should be seems to be a matter of personal choice. Something like making x equal the highest area level needed may be attractive mathematically, but I will leave that to the computer scientists. There are benefits of keeping x small and large. If it is kept small, the routing database will be smallest, becoming minimal at x=2, but the larger the routing database is the better we can route messages. From an administrative point of view, it would be attractive to assign level numbers to individual administrators. Therefore, at a level one area, the Marlboro development systems could have their own area and we could assign node numbers within our area as we see fit. Maynard could do the same and make PK1 be a level 1 area and could assign node numbers as they wish. For the two groups to be able to communicate, we would simply have to agree on different level 1 area numbers. I suggest that something like 31 subareas per area would be manageable, both from administrative and memory availability information. I will not attempt to access the routing efficiency of various values of x in variable sized networks. The following table shows the maximum size of networks varying both x and the highest level. A standard practice in MACRO-10 is to only fill a listing page of code 2/3s full to allow room to grow but still be reasonably compact. The column for x=20 is interesting because it is what one would get if he filled his x=31 areas only 2/3s full. X values 2 10 20 31 63 ---------------------------------------- 0 ! 1 1 1 1 1 1 ! 2 10 20 31 63 Level 2 ! 4 100 400 961 3969 3 ! 8 1000 8000 29791 250047 4 ! 16 10000 160000 923521 15752961 Another definition that will be useful later is the term "routing level". This is a characteristic of a node and is the smallest level number needed to encompass a node and its neighbors. In the text below, Page 3 the phrase 'level x node' will be an abbreviation for 'node whose routing level is x'. 3.0 SAMPLE NETWORK On the next page is a diagram of a hierarchical network which I will use to disucss the mechanics of hierarchical routing. Each ring in the drawing surrounds an area. These rings are associated with the various levels, the level number being equivalent to the number of rings that must be crossed to get to a particular node number (or level 0 number). A node address may be formed by summing the numbers associated with each ring over all the levels. For example, the node closest to the upper right corner has the address 10223. It is located in the level 1 area with the address 10200, and therefore is in level 2 address 10000. If any level 1 area stood alone, like area 10200, what would be left would be the sort of network we can support today. It is the connections between areas of order greater than 1 that this paper defines. For expositional purposes, the top level area covers the entire network. It needs no address since messages are never routed outside. Therefore, I will call it area 0 when I need to refer to the highest level area. Also, if I am discussing some area, its address is unnecessary to describe areas below it and will usually be omitted. For example, area 4400 in area 10000 is area 14400 and node 4411 is node 14411. The drawing also demonstrates the routing level concept. Node 14411's routing level is 1 since it and its neighbors are within area 4400. Likewise, node 14422 is level 2 and nodes 10520 and 70707 level 3. Incidentally, area 10500 isn't a star network. 4.0 HIGH ORDER ROUTING Message routing involves moving a message from one area toward another. Routing is done on the level of the lowest order level that encompasses both the node presently holding the message (the source node or an intermediate) and destination. A message will never be routed out of that area. To be able to route between any two nodes in a network, a node must be able to route to all areas (except its own) one level below each level of its hierarchy. For example, node 14411 must be able to route to areas 60000 and 70000 within area 0, areas 100, 200, 300, and 500 within area 10000, and nodes 22 and 33 within area 4400. One might ask if a node only needed to know how to route to a level one higher than its routing level. Could nodes pass messages to nodes with successively higher routing levels to approach the destination? The answer is no. While this works in a tree structured hierarchy, it fails Page 4 0 100 1000 200 1 3 3 23 2 10 6 4400 11 33 22 500 300 5 1 2 25 10 20 15 3 4 6000 600 3000 700 6 100 7 5 7 3 7000 Page 5 badly in an arbitrary topology network because messages between nodes with high routing levels must pass through nodes with low routing levels. For example, if we assigned equal cost to all lines in the network and node 14411 wanted to send a message to 60606, it might send it to node 33 since it interfaces to a higher level. However, node 33 knows that the minimum cost path to area 60000 is via 500 and since the minimum cost path to 500 is via 11, it will send the message back. This could repeat indefinitely and hence is not a viable design. 5.0 IMPLEMENTATION 5.1 Neighbors As mentioned before, the NEIGHBORS message is used in NCL to learn the topology of today's level 1 networks. This message contains data the network cannot afford to lose since it is sent only to nodes when they come up or when the local topology changes. To guarantee delivery it is a numbered message which means that NCL must have gone through its end to end startup sequence before NEIGHBORS messages are exchanged. To extend this mechanism to hierarchical networks, two rules will have to be changed. First, nodes will have to start NCL with a node in each area one level below all areas in a node's hierarchy between level 1 and the node's routing level. Therefore, node 10520 will have to start NCL with nodes 5, 10, 15, 20, 25, 102, 203, 303, 4422, 30103, 60606, 70707. Note that for a level 1 node, NCL startup will occur exactly as it does today. Second, NEIGHBORS messages will have to be sent between nodes in contact, which is the same as today's rules. However, the content will become a list generated by the following algorithm: 1. Let YS be the routing level of the node sending a NEIGHBORS message and YD be the routing level of the destination. 2. If YD is greater than or equal to YS then the NEIGHBORS messages will contain a list of neighbors of level YS-1 adjacent to the sender's area of level YS-1. 3. If YD is less than YS then the NEIGHBORS message will contain the list of a node from areas of level YD-1 that are adjacent to the sender's area of level YD-1 and the list of a node from all areas one level below each level higher than YD+1 in the sender's hierarchy. For example, a NEIGHBORS message from node 14411 (YS=1) to 14422 (YD=1) will contain nodes 22 and 33. From node 10102 (YS=2) to 10505 (YD=3) it will contain nodes 505, 525, and 203 and will not contain nodes 1 or 3. In the opposite direction it will contain nodes 10102, 14422, 10303, 30103, 60606, and 70707. The area 10000 nodes enables node 102 to learn about the level 1 areas within area 10000 and the rest enable it to gain information about routing to higher levels. It is node 10102's Page 6 responsiblity to pass this information onward to nodes 10101 and 10103. Note that in today's network (a topology where all nodes have a routing level of one) NEIGHBORS messages will contain the same data they do now. 5.2 Line Costs NEIGHBORS messages include link costs in addition to nodes it interfaces with. When doing level Y routing, should a node be concerned with costs of sending messages between levels less than Y? If it did, that information would allow more accurate routing but would require NCL in a node of level Y with all nodes of level Y or higher in an area of level Y+1. For example, node 14422 would have to start with nodes 10101 and 10102 since they may provide different costs about interfacing to area 10200. If it only started with one node, routing would be based on inconsistant costs and message looping might occur. To maintain a database size proportional to log(n), a node must not have to start NCL with more than one node in high level areas. Therefore, I specify here that a node need only start with one node in areas outside of its level 1 area and that routing on level Y will be based on the costs for routing between level Y-1 areas. 6.0 ACCESSING NODES IN MULTILEVEL NETWORK. NCL will require changes in areas other than routing to permit a node to access all others outside its level 1 area. In level 2 and higher areas, a node will not know of the existance of all the nodes in the network, but will be able to route a message far enough to reach a node that knows about the desired destination. Before any significant information between NCL nodes can begin, NCL between the two must be started. Therefore, we will need a mechanism to do this in multilevel networks. If node A wishes to communicate with node B but level 2 or higher routing is required to do that, no new problems will occur if node B exists. If node B does not exist and node A sends a NCL START to it, some intermediate node will discover that the message is undeliverable. Instead of simply discarding the message as the current NCL specifies, I propose a more responsive mechanism, i.e. the intermediate must return a new unnumbered message, which I will call STOP, back to node A claiming node B as the source. Once NCL is running, nodes A and B may exchange any numbered messages including CONNECTs, DISCONNECTs, and DATA. After all the connections are broken, the two ends will probably want to terminate NCL to reduce their database. This could be done immediately after the last connection is broken, but I would prefer to see it happen after some time delay since it is very common for a user to break a connection and try another immediately. To terminate NCL, a node will send a STOP message. Since STOP messages are unnumbered, it may be lost, however Page 7 that case will be handled as a side effect of the next section. Another problem exists if node B should go down since that information will not be transmitted outside its level 1 area. I propose that three mechanisms be used to detect and service this case: 1. If a node receives a message that cannot be delivered, it will return a STOP message to the sender on the behalf of the nonexistant destination, but only if the message received is not a STOP message. (Replying to STOP messages loses big if both nodes go down while there is traffic in the net for them ....). This generalizes the paragraph that introduced the STOP message. 2. If no messages have been sent recently, a node will send a REP (NCL REP, not DDCMP REP) and expect an ACK or STOP message in return. 3. To catch nodes that restart NCL faster than the REP mechanism can catch, nodes that receive messages in states that do not allow that message (like receiving a REP when NCL isn't running) will reply with STOP messages to provide fast response to letting the sender know that he has to restart NCL. 7.0 FRAGMENTED AREAS To this point I have ignored area 10300 which is broken in two pieces. This is an invalid configuration and offers some of the problems seen when two nodes in a network claim to have the same node address. This situation can be detected by the high level routing nodes in that area when they receive NEIGHBORS messages from nodes outside of area 10300 that claim nodes in area 10300 to be neighbors that the receiver doesn't know about. For example, node 10302 will receive a NEIGHBORS message from area 10500 listing node 10303 as a neighbor. Node 10302 can notice that it has never heard of that node but can't really do much except notify the operator. (Note that if the area was properly connected and node 10303 went down, node 10302 would probably learn that before area 10500 tells it of the event. This means that if a node tries to detect fragmented areas it must be able to handle that.) The effect of a fragmented area is not too severe. Message routing that does not try to route through that area will not be impacted, and routing at levels lower than the fragmented area will never be impacted. Of course, the maintainer of that area should never let it fragment, but I can offer no good cure. Page 8 8.0 CONCLUSIONS Ancient Greek philosophers got themselves into all sorts of trouble by thinking about problems without bothering to perform experiments to test their results. I have done no better. This proposal contains the results of a couple days active thought and a year's worth of idle contemplation. No simulations have been run, no algorithms have been proven, no experiments done, and I may be claiming that a heavy node falls faster than a light one. It is, however, a start. One problem that I'm sure this proposal exacerbates is network congestion. I believe we already have a significant number of problems, and reducing the information used to route messages cannot improve matters. The goal of this memo has been merely to record what I know on the subject before I leave DEC. It's up to you guys to make it work!