T3 Network Nears Full Production

By Ellen Hoffman, Mark Knopper, and Pat Smith

A year of dedication and hard work by the NSFNET partnership has begun to pay dividends for network users as the ANS T3 technology becomes a production tool for NSFNET sites.

As of March 1, a number of midlevel networks were moving their traffic across the T3 network and plans were in place to move the remaining networks over the next few months.

Merit obtains NSFNET backbone services from ANS, which provides a major national network that operates at T3 speeds, using circuits provided by MCI and central networking technology based on the IBM RS/6000.

Fine-tuning over the past year

Activities over the past twelve months have focused on installing T3 technology at all NSFNET sites, re-engineering the early T3 backbone to expand total bandwidth and provide greater redundancy, and refining the initial developmental hardware and software to bring the level of reliability up to production standards. In addition, some organizational changes have been made to ensure a smooth problem-solving path.

The first stage is nearly complete, and a plan is in place to deploy higher-speed T3 interface cards which will increase the throughput on all nodes during the second quarter of 1992.

Strengthening the architecture

When T3 connections were originally implemented at the beginning of 1991, some problems were experienced with the new architecture. This led to a major effort by Merit, ANS, IBM, and MCI to improve the technology before T3 traffic was increased. Debugging activities continued at a fast pace through October 1991, until a reasonable degree of reliability was achieved.

Peeling the onion

The problem analysis and resolution process has been compared to "peeling the skin of an onion", as multiple glitches that at first appeared to have the same symptoms were discovered and fixed. Problems were investigated starting at the circuit level, moving to the DSU or interface hardware, and finally to the router software and hardware.

In recent months, T3 circuit and router deployment has continued at the sixteen NSFNET regional sites with the last of these completed in November. Nevertheless, the phase-in of additional production traffic to the T3 network was delayed because of continuing concerns over reliability and performance of the new technology. These issues have been addressed through a series of actions including:

  1. improving the network software and hardware,
  2. deploying new monitoring tools to better track problems,
  3. providing a fallback system in the event of core node isolation, and
  4. developing new test strategies to ensure that the final technology implemented would meet the standards of reliability to which NSFNET users had become accustomed.

In addition, new organizational structures have been developed to enhance network operation and improve the interaction with regionals on network problems.

Visible improvements

At the top of the stack of visible improvements are performance capabilities which now exceed those of most local networks to which users are attached. The national infrastructure comprises local/campus area networks (local area networks or LANs) connected to the regional and national networks (wide area networks or WANs).

In the past, constraints in performance were generally related to the WANs having slower speeds than the LANs. Many LANs which interconnect to the regional and national networks are Ethernets capable of 10 Mb/sec, much higher than the maximum 1.544 Mb/sec of the T1 networks to which the LANs are attached. In some cases, attachments to campus and other organizational LANs have been at speeds of 56 kb/sec or less.
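To put the quoted speeds in perspective, the following minimal sketch computes ideal transfer times over each link type. The 1 MB file size is a hypothetical figure chosen for illustration, and the calculation ignores protocol overhead and congestion, so real transfers would take longer.

```python
# Link rates in bits per second, as quoted in the text above.
RATES_BPS = {
    "56 kb/sec attachment": 56_000,
    "T1": 1_544_000,
    "Ethernet": 10_000_000,
}

FILE_BYTES = 1_000_000  # hypothetical 1 MB file, for illustration only


def transfer_seconds(size_bytes, rate_bps):
    """Ideal transfer time in seconds, ignoring overhead and congestion."""
    return size_bytes * 8 / rate_bps


for name, rate in RATES_BPS.items():
    print(f"{name}: {transfer_seconds(FILE_BYTES, rate):.1f} s")
```

Under these assumptions the same file that crosses an Ethernet in under a second needs several seconds on a T1 link and more than two minutes on a 56 kb/sec attachment.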

Bandwidth limitation now at the local level

Implementation of the T3 infrastructure has produced a setting in which backbone capacity is higher than much of the interconnecting network bandwidth beneath it. As a result, bandwidth limitation now occurs for most sites at the midlevel/regional level rather than at the national network level. While 100 Mb/sec Fiber Distributed Data Interface (FDDI) is supported at some local and regional networks, most have yet to deploy this technology.
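The shift in the bottleneck can be illustrated with a small sketch that treats the slowest segment on a path as its throughput ceiling. The two example paths and the regional T1 link are hypothetical, and the T3 figure is the nominal 44.736 Mb/sec line rate, which is an assumption not stated in this article.

```python
T1_BPS = 1_544_000
ETHERNET_BPS = 10_000_000
T3_BPS = 44_736_000  # nominal T3 line rate; assumption, not from the article


def bottleneck(path):
    """Return the (segment name, rate) pair with the lowest rate on the path."""
    return min(path, key=lambda segment: segment[1])


# Hypothetical path before the T3 upgrade: the backbone is the slowest hop.
before = [("campus Ethernet", ETHERNET_BPS), ("T1 backbone", T1_BPS)]

# Hypothetical path after the upgrade: the regional link is now the slowest hop.
after = [
    ("campus Ethernet", ETHERNET_BPS),
    ("regional T1 link", T1_BPS),
    ("T3 backbone", T3_BPS),
]

print(bottleneck(before)[0])
print(bottleneck(after)[0])
```

In this simplified model, raising the backbone rate does not raise the end-to-end ceiling until the slower regional or local segments are upgraded as well.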

Stability period used for tests

A two-week hiatus in December was declared a period of "no change" on the T3 network. The time was used to judge the stability and reliability of the network in a steady-state condition, to prepare for the cutover of T1 traffic, to test the T1/T3 interconnect gateway backup, and to perform tests which were scheduled and coordinated with the regional networks. The testing period proved successful and enhanced confidence in the network.

Safety Net

Safety Net has been put in place as a fallback in case all T3 paths to a core node become unusable. It consists of 12 additional T1 links that interconnect the T3 backbone Core Nodal Switching Subsystem (CNSS) nodes. These safety net links are installed between the MCI switching centers and do not connect to the Exterior Nodal Switching Subsystem (ENSS) nodes. The T1 link metrics are designed so that a T1 path is used only if all T3 paths to adjacent CNSS nodes become unusable.
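The metric-based fallback can be sketched with a lowest-cost path search. This is an illustration only, not the routing software used on the backbone: the three-node topology, the node names, and the metric values are all hypothetical. The key assumption is that a single T1 hop costs more than any all-T3 path could.

```python
import heapq

# Hypothetical metrics: one T1 hop outweighs any possible all-T3 path.
T3_METRIC = 1
T1_METRIC = 1000


def best_path(links, src, dst):
    """Return (cost, [(node, link_kind), ...]) for the cheapest usable path,
    or None if dst cannot be reached over links that are up."""
    adjacency = {}
    for a, b, kind, up in links:
        if not up:
            continue  # failed links are ignored entirely
        metric = T3_METRIC if kind == "T3" else T1_METRIC
        adjacency.setdefault(a, []).append((b, kind, metric))
        adjacency.setdefault(b, []).append((a, kind, metric))

    heap = [(0, src, [])]
    visited = set()
    while heap:
        cost, node, hops = heapq.heappop(heap)
        if node == dst:
            return cost, hops
        if node in visited:
            continue
        visited.add(node)
        for nxt, kind, metric in adjacency.get(node, []):
            if nxt not in visited:
                heapq.heappush(heap, (cost + metric, nxt, hops + [(nxt, kind)]))
    return None


def links_with(failed_t3):
    """Build the hypothetical topology, marking the named T3 links as down."""
    t3_pairs = [("A", "B"), ("B", "C"), ("A", "C")]
    links = [(a, b, "T3", (a, b) not in failed_t3) for a, b in t3_pairs]
    links.append(("A", "C", "T1", True))  # safety net link between A and C
    return links


print(best_path(links_with(set()), "A", "C"))
print(best_path(links_with({("A", "C")}), "A", "C"))
print(best_path(links_with({("A", "C"), ("A", "B")}), "A", "C"))
```

With every T3 link up, traffic takes the direct T3 link. When that link fails, the longer T3 route through B is still preferred over the T1 link, and the T1 link carries traffic only once no T3 path remains.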

Restructuring of Merit/ANS NOC and Internet Engineering Group

The NSFNET is a vital communications link which demands a trouble-shooting structure that is as efficient and reliable as possible. In an effort to produce such a framework, the Network Operations Center and Internet Engineering groups have been modified to provide a three-tier problem-resolution setting.

1st Level: Network Operations Technicians. NOC operators will continue to provide first level technical support for network problems. The NOC function will become more formalized with emphasis on reporting, escalation, tracking, and procedure execution.

2nd Level: National Network Attack Force. As part of the ongoing efforts to ensure complete trouble-shooting coverage of the national network, the National Network Attack Force (NNAF) has been formed. This group of six individuals carries out the T3 network problem resolution process 24 hours per day, seven days per week. The team develops NOC diagnostic tools, trains NOC operators, and performs network-related tests. The group reports to the IE manager.

3rd Level: Internet Engineering Group. A number of the tasks indicated above were previously handled by the Internet Engineering staff. As the NNAF members assume those duties, the Internet Engineers will have time to pursue longer-term engineering activities and to attend to the highest priority problems which reach the third level of escalation.

Future T3

The mix of a solid foundation and leading-edge technology, such as the imminent deployment of the new IBM-developed, high-performance interfaces for the RS/6000-based routers, helps ensure that notable network performance will be realized.

As noted on page 10 of this issue, the National Science Board recently approved a request by the National Science Foundation to extend the current Cooperative Agreement with Merit for a period of up to 18 months beyond the November 1992 expiration date of the current contract.

Developers and engineers within the partnership continue to work toward a greater understanding of the interactions of application and network engineering, and to tune the network to maximize efficiency as the user base grows.

Taken from The Link Letter, Vol. 5 No. 1, March/April 1992.