TopoGen: A network topology generation architecture with application to automating simulations of software defined networks

Simulation is an important tool to validate the performance impact of control decisions in Software Defined Networks (SDN). Yet, the manual modeling of complex topologies that may change often during a design process can be a tedious, error-prone task. We present TopoGen, a general purpose architecture and tool for systematic translation and generation of network topologies. TopoGen can be used to generate network simulation models automatically by querying information available at diverse sources, notably SDN controllers. The DEVS modeling and simulation framework facilitates a systematic translation of structured knowledge about a network topology into a formal, modular and hierarchical coupling of preexisting or new models of network entities (physical or logical). TopoGen can be flexibly extended with new parsers and generators to grow its scope of applicability. This makes it possible to design arbitrary workflows of topology transformations. We tested TopoGen in a network engineering project for the ATLAS detector at CERN.


Software Defined Networking
Software Defined Networking (SDN) is an emerging architectural approach for computer networks where control logic is taken away from switching devices and moved up to centralized software running in controller devices (Kreutz et al. 2015).
In this new architecture, switching devices are reduced to much simpler packet-forwarding elements operating at the so-called Data Plane. Controllers decide on and set up forwarding rules for each connected switching device, in an effort to comply with the overall quality of service requirements of the entire network. Controllers operate at the so-called Control Plane and concentrate most of the network service logic (e.g. monitoring, packet forwarding decisions when no rules are set up, network topology discovery, etc.). Popular centralized SDN controller implementations (ONOS, OpenDayLight, Nox, Floodlight) provide different functionalities through APIs.
While the SDN approach promises boosted network flexibility, reconfigurability and scalability, the overall performance of the main SDN elements (the controller and its software components) is not yet clearly understood (Fernandes 2017). SDN performance thus motivates active research to assess system limits (e.g. maximum forwarding rate and latency), recognizing that each controller implementation can perform very differently in different settings (Zhao et al. 2015). Moreover, even when correct at the functional level, the control actions decided by an efficient controller may provoke undesired inefficiencies at the performance level in the underlying controlled network.
In this context, modeling and simulation-driven network engineering stands as a promising strategy: system verification with dynamic simulations can mitigate the risks of deploying functional SDN controllers that may nevertheless cause poor performance. Among the services implemented by SDN controllers, topology discovery keeps the topology information updated. Another service is the exposure of the known information about the network, usually by means of APIs (providing e.g. the number of nodes, hosts, links, etc., and network metrics such as the number of bytes forwarded per link, packet loss rates per port, etc.). TopoGen harnesses these SDN services to systematically create up-to-date simulation models.
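As an illustration, topology information can be pulled from a controller with plain REST calls; the sketch below assumes ONOS's REST API defaults (port 8181, basic authentication) and its /onos/v1/devices and /onos/v1/links endpoints:

require 'net/http'
require 'json'

# Fetch a JSON resource from an SDN controller's REST API (ONOS defaults assumed).
def onos_get(resource, host: 'localhost', port: 8181, user: 'onos', pass: 'rocks')
  uri = URI("http://#{host}:#{port}/onos/v1/#{resource}")
  request = Net::HTTP::Get.new(uri)
  request.basic_auth(user, pass)
  response = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
  JSON.parse(response.body)
end

devices = onos_get('devices')['devices']  # switching devices known to the controller
links   = onos_get('links')['links']      # links discovered by the topology service
puts "Retrieved #{devices.size} devices and #{links.size} links"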

DEVS-Based Network Simulation With PowerDEVS
PowerDEVS (Bergero and Kofman 2011) is a discrete event simulator that implements the DEVS mathematical formalism (Zeigler et al. 2000), which is capable of representing any type of discrete system and of approximating continuous systems with controlled accuracy. PowerDEVS provides a graphical interface to compose DEVS models via hierarchical block diagrams. While PowerDEVS can represent any kind of discrete-event system, it provides a model library specific to computer network simulation (Castro and Kofman 2015).
In PowerDEVS, systems can be built by graphically composing pre-developed units of behaviour (atomic models) and structure (coupled models) from a model library (e.g. routers, switches, links, generators, etc.) and interconnecting them through input/output ports. In DEVS, structure and behaviour are kept under strict separation. The interconnection of several atomic and/or coupled models creates the coupling information, which in our case matches the network topology directly.
As with other simulators, defining large topologies graphically can be a tedious task. Vectorial DEVS (Bergero and Kofman 2014) makes it possible to graphically represent multiple instances, but only for regular topologies. Alternatively, the network topology can be programmed in C++, at the cost of higher code complexity and a detachment between the code and the graphical layout.
We adopt the DEVS formalism and the PowerDEVS tool in our case study as they currently support the TDAQ network engineering team of the ATLAS experiment at CERN, introduced below. Yet, the TopoGen architecture could be directly applied to produce network topologies for any other DEVS-based toolkit (Van Tendeloo and Vangheluwe 2017) that accepts a file-based structured specification of models (e.g. CD++ (Bonaventura et al. 2013) or VLE (Quesnel et al. 2009)).

RELATED WORK
There are several network simulators available for both commercial and academic use (Wehrle et al. 2010). They vary in several aspects: the discrete-event techniques and principles (sequential or parallel, replication- or decomposition-based, CPU- or GPU-based) (Ngangue Ndih and Cherkaoui 2015); the library of reusable models; and the software interfaces to assist the modeling activity (e.g. to define a network topology).
In some simulation packages network model behaviour and model topology are defined intermingled in the code (e.g. NS-3 (Carneiro 2010)). While this allows for great flexibility, the code can soon become too complex to understand, debug and maintain. A number of simulation tools (e.g. OPNET (Chang 1999), OMNET++ (Varga and Hornig 2008)) provide graphical editors which allow for an easy and compact understanding of the network topology, separating topology from model behaviour.
Nevertheless, defining a topology graphically can soon become inflexible for mid- to large-sized topologies (adding thousands of nodes with drag-and-drop methods can be very tedious and time-consuming). To address this issue some tools combine graphical editors with domain-specific languages (e.g. OMNET++), making it possible to parametrize the number of nodes and use programming-like structures to describe regular interconnect structures. This approach is efficient for describing large, mostly regular, topologies, but presents some limitations: 1) the modeler must learn a description language that is specific to a single simulation tool, 2) a new topology always needs to be created from scratch, and 3) when dealing with an existing network, there is no guarantee that a network description accurately represents the real system.
The alternative presented in this paper mitigates these problems by accommodating all the aforementioned methods under a common architecture to define and transform network topologies: either using a graphical editor (when available), programming code (when desired), or automatic data retrieval (e.g. from SDN controllers, if needed).
Meanwhile, network description languages exist beyond the simulation domain. For example, YANG (Bjorklund 2010) relies on an XML-oriented approach. VXDL (Koslovski et al. 2008) makes it possible to specify virtual resources, interconnections, topology, etc. in great detail. NDL (Van der Ham et al. 2007) can describe optical networks and is used by applications to query network capabilities to perform requests. The Internet Topology Zoo (ITZ) (Knight et al. 2011) provides a wide range of real Internet topologies in GML format (Himsolt 2010). Different software toolkits (including simulators) can find some languages more suitable than others for retrieving network information. For example, in (Großmann and Schuberth 2013) a GML parser was implemented to generate Mininet models out of ITZ topologies.
Parsers and generators typically serve a specific application and are usually coupled together: a parser cannot be reused to generate other formats, and a generator cannot be used with other input formats.
In this work we introduce an intermediate network format (a core piece of the architecture) to act as a bridge between network definition languages and software tools willing to consume those descriptions. To the best of our knowledge there exist no equivalent solutions that permit flexible and extensible translation of various sources into various targets.

THE ATLAS TDAQ SYSTEM
The ATLAS experiment at CERN hosts one of the four detectors at the Large Hadron Collider (LHC), where bunches of particles collide every 25 ns. Currently, the ATLAS detector generates information at about 80 TB/s, which needs to be filtered before it can be permanently stored for offline analysis. The layered TDAQ system reduces a 40 MHz collision event rate down to 1 kHz by analyzing events in real time. A first-level trigger (L1) uses custom electronics, filtering events down to roughly 100 kHz. L1-accepted events are temporarily transferred over custom optical point-to-point fibers to 100 Read-Out System (ROS) server nodes. The High Level Trigger (HLT) accesses events stored in the ROS to further filter the data by running selection algorithms on approximately 2000 server nodes interconnected with 1 Gbps and 10 Gbps Ethernet links.

MOTIVATING CASE STUDY: DESIGNING THE FELIX NETWORK AT CERN
For 2025, the ATLAS experiment is planning full deployment of the new Front-End LInk eXchange (FELIX) system (Anderson et al. 2015), shown in Figure 1, which aims at interfacing between the detector electronics and the TDAQ system. FELIX is meant to replace the custom point-to-point connections with a Commercial-Off-The-Shelf (COTS) network technology (e.g. Ethernet, Infiniband, Omnipath). FELIX servers will act as routers between 24-48 detector serial links and 2-4 standard 40 Gbps/100 Gbps links. FELIX servers will communicate with a smaller set of commercial servers, known as Software ReadOut Drivers (SW ROD), used for the collection and processing of physics data. In addition, different components need to connect to the FELIX servers. For example, the Detector Control System (DCS) monitors and controls the detector front-end electronics, while the Control & Configuration system sets up and manages data acquisition applications.
The FELIX project is planned to be implemented in two phases. In 2018-2019, some detector hardware will be moved to this new scheme (approx. 68 FELIX and 44 SW ROD servers will be installed). A complete migration of the remaining hardware is planned for 2025. Part of this effort consists of designing and implementing a network that can meet the demands of the system (high availability, high throughput, low latency, redundancy, etc.). A dataflow modeling and simulation methodology supports the design of the network and aids in the decision process (e.g. for selecting technologies, topologies, node distributions, etc.). Yet, the generation of the many possible simulation scenarios to be evaluated is currently a manual process which is time-consuming, error-prone and does not provide an automated update procedure.

TOPOGEN: A TOPOLOGY GENERATION AND TRANSFORMATION ARCHITECTURE
TopoGen is a flexible architecture for network topology description, generation and translation. Yet, its conception was motivated by technology-specific needs, namely, to obtain a graph model representation of a network topology by querying SDN controllers (such as ONOS and OpenDayLight) and to generate DEVS simulation models for the PowerDEVS tool (according to the goals described in section 4). In this section we describe the overall architecture of TopoGen and some illustrative implementation-related details using the ONOS SDN controller as a sample source for topology information, and PowerDEVS as a sample destination for simulation.

Architecture
Figure 2 (a) depicts the main components of the TopoGen architecture:

• Network: A network description to be loaded, modified or translated. It can be either a real network (e.g. one described by an SDN controller) or a virtual one described by some description language.

• Provider: A component that retrieves a network description from a source Network and loads it into the TIF.

• Topology Intermediate Format (TIF): The internal in-memory representation of a network topology, written by any Provider and read by any Builder. The main goal of the TIF is to serve as an internal abstraction layer that permits orchestration of different Builders and Providers in a flexible way.

• Builder: A component that parses a TIF and serializes it according to a desired output. A Builder component is specialized to the Output component to be generated. Built-in builders are discussed in section 5.2.

• Output: An output format that some Builder must comply with in order to perform a translation from the TIF format. The Output can consist of a single file or a set of files, depending on the requirements of the software tool that will ultimately consume them.

Built-in Providers and Builders
Providers and Builders are decoupled by means of the Topology Intermediate Format: any provider implementation can be used with any builder implementation, and TopoGen can be extended by defining new providers and builders at will. The Provider and Builder classes implement a Strategy Pattern, where each strategy is implemented outside TopoGen. This allows TopoGen to be adapted to different scenarios while implementing each provider and builder only once.

Two builders are currently built in:

• NTM: This builder serializes a topology into an NTM instance. This instance is typically used to programmatically customize a topology retrieved from different sources before generating a final target Output. An example of topology augmentation is presented in section 6.

• PowerDEVS: This builder creates a PowerDEVS simulation model structure. It relies on parameterizable DEVS atomic models that provide basic behavioural building blocks. The builder hierarchically composes DEVS atomic models to create DEVS coupled models with more complex behaviours. Typical DEVS atomic models are queues, links, etc. For instance, to compose a switch, several atomic models of input/output queues are composed together and parameterized as Prioritized Queues with the Quality of Service (QoS) flag activated. Meanwhile, to compose a regular host, only one input/output queue is needed, parameterized as a standard non-prioritized NIC queue.
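As an illustration of this hierarchical composition, the hypothetical helper below assembles a switch coupled model from parameterized queue atomics (CoupledModel, AtomicModel and their methods are assumed names, not the actual PowerDEVS builder API):

# Hypothetical sketch of hierarchical composition in the PowerDEVS builder.
def build_switch(name, num_ports)
  switch = CoupledModel.new(name)
  num_ports.times do |i|
    # One prioritized input/output queue per port, with the QoS flag activated
    queue = AtomicModel.new("#{name}_queue#{i}", type: 'PriorityQueue', qos: true)
    switch.add_component(queue)
    switch.add_input_coupling(i, queue)   # external input port i -> queue
    switch.add_output_coupling(queue, i)  # queue -> external output port i
  end
  switch
end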

Figure 3 presents a class diagram for the TopoGen implementation in the Ruby language.
When TopoGen is used, an instance of the TopologyGenerator class is created and the initialize method is invoked with parameters denoting a provider, a builder, the directory where the output will be stored, and a URI from which to retrieve the topology. The TopologyGenerator class has one TopologyProvider and one OutputBuilder instance. The TopologyProvider class can be mapped to the Provider component shown in Figure 2 (a). This class has four children: the OnosTopologyProvider and OpenDaylightProvider classes encapsulate the logic for retrieving information from SDN controllers' APIs; the ObjectTopologyProvider class retrieves a topology from a Topology instance; finally, the CustomTopologyProvider class retrieves information from an NTM instance.
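The overall orchestration can be sketched as follows (the generate, get_topology and build method names are assumptions consistent with the roles described above):

# Minimal sketch of the Strategy wiring between generator, provider and builder.
class TopologyGenerator
  def initialize(provider, builder, output_dir, uri)
    @provider   = provider    # any TopologyProvider strategy
    @builder    = builder     # any OutputBuilder strategy
    @output_dir = output_dir  # where the Output files will be stored
    @uri        = uri         # where to retrieve the topology from
  end

  # Retrieve the topology into the TIF and serialize it to the target Output.
  def generate
    topology = @provider.get_topology(@uri)  # a Topology (TIF) instance
    @builder.build(topology, @output_dir)
  end
end

# Any provider can be combined with any builder, e.g. ONOS -> PowerDEVS:
TopologyGenerator.new(OnosTopologyProvider.new, PowerDEVSBuilder.new,
                      './output', 'http://localhost:8181/onos/v1').generate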
The Topology class in Figure 3 plays the role of the TIF in the architecture (Figure 2 (a)). It uses the TopologyElements classes to represent the elements in the network. A Topology can have multiple TopologyElements; however, every TopologyElement belongs to a unique Topology. The TopologyElements box contains all the classes that can be used to create elements in a Topology instance. The NetworkElement class represents an abstraction of the physical elements of the network (in this case Host, Link and Router). The Flow and NetworkElement classes implement SerializeBehaviour, which is a module for serializing classes. The Flow class represents a flow of packets between hosts. When creating a Flow instance, a packet rate distribution and a packet size distribution are needed. The distribution classes are shown in the Flow Distribution box. Finally, the OutputBuilder class represents the Builder component in Figure 2 (a).

The Network Topology Model
The Network Topology Model (NTM) is an object-oriented approach to represent data networks in Ruby. NTM makes it possible to describe all the elements in a network: physical elements (e.g. hosts, routers, links, etc.) and logical elements (e.g. data flows, routing paths, etc.). NTM is currently dependent on TopoGen, as it was designed to be used by the CustomTopologyProvider class (see Figure 3). The following NTM Ruby code describes the network shown in Figure 4, including the communication flow between Host1 and Host3:
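(In the sketch below, the methods add_router, add_host, add_full_duplex_link, add_flow and topology_elements follow the description in the text, while get_link, add_path and the distribution class names are assumptions; the line numbers cited in the walkthrough refer to the original listing.)

module NetworkTopology
  def get_topology
    @topology = Topology.new
    # Physical elements: one router and the hosts of Figure 4
    router = @topology.add_router 'Router1'
    h1 = @topology.add_host 'Host1'
    h2 = @topology.add_host 'Host2'
    h3 = @topology.add_host 'Host3'
    # Full-duplex links (1 Gbps): each call creates an up/down link pair
    @topology.add_full_duplex_link 'l1', h1, 1, router, 1, 1_000_000_000
    @topology.add_full_duplex_link 'l2', h2, 1, router, 2, 1_000_000_000
    @topology.add_full_duplex_link 'l3', h3, 1, router, 3, 1_000_000_000
    # Retrieve the links forming the path Host1 -> Router1 -> Host3
    first_hop  = @topology.get_link 'upl1'    # Host1 -> Router1
    second_hop = @topology.get_link 'downl3'  # Router1 -> Host3
    # Create the path and a prioritized flow over it
    path = @topology.add_path 'Host1-Host3', [first_hop, second_hop]
    @topology.add_flow 'f1', 1, [path],
                       ExponentialDistribution.new(1_000),  # packet rate (pps)
                       ConstantDistribution.new(1_500)      # packet size (bytes)
    @topology.topology_elements
  end
end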
Each NTM instance must define a NetworkTopology module (line 1) and a get_topology method (line 2) which returns the elements added to the Topology instance (variable @topology). To create the topology, the router is first added in line 5 (add_router method). The hosts are defined in lines 6-9 (add_host method) with a unique identifier for each host. In lines 11-13, links are added using add_full_duplex_link. This method expects an ID, a source element (an instance of Router or Host), a source port number, a destination element, a destination port number and the bandwidth (in bps), and creates two source/destination links. The first link goes from source to destination (its ID is the concatenation of the string up and the ID received). The second link goes from destination to source (its ID is the concatenation of the string down and the ID received). The method returns the second link only. In lines 14 and 15, the links that define a path between Host1 and Host3 are retrieved. In lines 16 to 18, a new path is created between Host1 and Host3 using the retrieved links. In lines 19 and 20, a new flow is added with the add_flow method (it expects an ID, a flow priority, an array of possible paths for the flow, and stochastic distributions for packet rate and packet size). The NTM description ends by returning the new elements with the topology_elements method.

SUPPORT FOR DESIGNING THE FELIX NETWORK AT THE ATLAS DATA ACQUISITION SYSTEM
In this section we describe TopoGen as applied in a real-world scenario. The case study builds upon a modeling and simulation-driven engineering process developed for the ATLAS TDAQ network at CERN (Pozo Astigarraga et al. 2015). We show how TopoGen can assist the design phase for the network to be implemented in 2019-2020 in the ATLAS FELIX project (Anderson et al. 2015).

The FELIX Network Requirements
The FELIX network will provide connectivity between different components of the FELIX system (see Figure 1) and will handle various types of traffic which differ in their throughput, latency, priority and availability requirements. For example, the Detector Control System (DCS) that monitors and controls the detector's front-end electronics requires the highest priority and low latency to react fast, but is expected to require low throughput. Meanwhile, the detector's data will use most of the network bandwidth, so it can be given a lower priority to avoid saturation. Table 1 summarizes the different traffic types and their requirements. The communication patterns are also different for each type of traffic. While DCS traffic follows a many-to-one pattern (all FELIX servers communicate with a single DCS server), Control and Monitoring traffic requires a many-to-few pattern. Detector data, on the other hand, uses a simple one-to-one or two-to-one pattern from a FELIX server to SW RODs.
To provide confidence about the coexistence of these traffic types while meeting performance requirements, we adopt a modeling and simulation approach to study the expected throughput and latency for each traffic type, and to anticipate possible bottlenecks. Although the high-level requirements are defined, each subsystem's specification is often updated during the design process. Specific subsystem parameters (throughput, processing times, etc.) will not be known until the final system is in place. Yet, simulation can provide guidelines for realistic ranges of candidate parameter values (parameter sweeping).
Technologies rely on different protocols, congestion control algorithms, and routing schemes which also need to be considered in simulation studies. In particular, different restrictions are imposed over candidate topologies depending on the selected technologies (e.g. Ethernet allows for heterogeneous link speeds while Infiniband does not; Infiniband efficiently supports mesh and leaf-spine topologies, while Ethernet supports topologies with cycles only via algorithms that perform poorly). The simulation platform needs to be able to define all these types of topologies in a flexible way to support agile design iterations.
Using TopoGen to Support the Modeling and Simulation Process
Figure 6 shows the workflow used while applying TopoGen to aid the modeling and simulation process for the FELIX network. The workflow consists of three phases: first, the topology under design by the networking team is automatically retrieved and serialized into an NTM instance; second, the NTM topology is augmented programmatically with extra resources (nodes and data flows); and third, the new topology is serialized into a PowerDEVS simulation model. Hence, a simulation model is automatically created from an existing specification originally meant for other purposes. This workflow deals with topology changes at design time; run-time adaptation of simulations to topology changes remains a subject of future work.

In the first phase, the networking team provided a Mininet emulated environment used to test the connectivity of a topology, including only nodes from the FELIX network. The ONOS SDN controller was installed within the emulated environment to provide network discovery services. Then, the TopoGen ONOS Provider was configured to connect to the REST API exposed by ONOS to query the topology.
Once the topology is retrieved, the TopoGen NTM Builder serializes it into an NTM instance for later use. Each time the networking team updates their emulated topology, it can be retrieved again to keep the NTM and simulation models up to date. The fact that the original topology was specified in an emulated environment is transparent to TopoGen.

In the second phase, additional nodes are added to the NTM topology to also represent the HLT network (see Figure 5). To generate a meaningful simulation model, extra information is needed about the traffic generated by different servers. Nodes and data flows, along with their respective parameters, were added programmatically into the original NTM instance, guided by the network engineers. For this case study, only the Detector Data and Monitoring traffic types were considered (see Table 1).

In the third phase, the augmented NTM instance is loaded by the TopoGen NTM Provider and used by the PowerDEVS Builder to generate all the files necessary for simulation. The network actually simulated with PowerDEVS is the one presented in Figure 5: it includes the FELIX network nodes (automatically retrieved from the SDN controller), the HLT network nodes, and the Detector Data and Monitoring traffic flows (added programmatically with NTM). This case study focused on network behaviour under different intensities of Monitoring traffic rates, matching an engineering requirement.
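End to end, the three phases can be scripted as sketched below (the provider classes follow Figure 3, while the builder class names, the generate method and the file locations are illustrative assumptions):

# Sketch of the three-phase workflow.
onos_uri = 'http://onos-host:8181/onos/v1'

# Phase 1: retrieve the FELIX topology from ONOS and serialize it as an NTM instance.
TopologyGenerator.new(OnosTopologyProvider.new, NTMBuilder.new,
                      './ntm_out', onos_uri).generate

# Phase 2: augment the generated NTM description programmatically
# (add HLT nodes and Detector Data / Monitoring flows, as in the NTM listing above).

# Phase 3: load the augmented NTM and generate the PowerDEVS simulation files.
TopologyGenerator.new(CustomTopologyProvider.new, PowerDEVSBuilder.new,
                      './powerdevs_out', './ntm_out/felix.rb').generate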

Simulation Results
We studied the potential effects on the average latency of FELIX Monitoring traffic of an upgrade of the bottleneck links from 1 Gbps to 10 Gbps. Figure 7 (a) shows the average packet latency for FELIX Monitoring flows in different scenarios with increasing monitoring throughput for all servers. As monitoring traffic grows, the latency increases slightly until a transition is observed at the point where each server generates 650 Mbps of monitoring data. Beyond that point the latency increases rapidly, denoting the presence of congestion. The buffer sizes and link utilization at the switches (not included in this report) indicate that the source of congestion is the 1 Gbps links of the monitoring servers. We then updated the topology in the NTM instance, now with a link capacity of 10 Gbps for the monitoring nodes. The PowerDEVS simulation model was regenerated with TopoGen, and new experiments were run. Figure 7 (b) shows how the saturation point moves up to 6500 Mbps of monitoring traffic. The congestion point in the topology remains at the links directly connecting the monitoring servers.
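Such a regeneration loop can be automated around TopoGen; the following hypothetical sweep (helper and class names assumed, consistent with the earlier sketches) rebuilds the simulation model for each candidate monitoring rate:

# Hypothetical parameter sweep: rebuild the PowerDEVS model per monitoring rate.
[100, 200, 400, 650, 800].each do |rate_mbps|
  topology = CustomTopologyProvider.new.get_topology('./ntm_out/felix.rb')
  set_monitoring_rate(topology, rate_mbps)  # assumed helper updating flow distributions
  PowerDEVSBuilder.new.build(topology, "./powerdevs_out/rate_#{rate_mbps}")
end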