IRATI @ RINA Workshop 2014, Dublin
Investigating RINA as an Alternative to TCP/IP
Project overview, use cases, specifications, software development and experimental activities
RINA Workshop, Dublin, January 28th – 29th 2014
Agenda
• Project overview
• Use cases
  – Basic scenarios (Phases 1 and 2)
  – Advanced scenarios (Phases 2 and 3)
• Specifications
  – Shim DIF over 802.1Q
  – PDU Forwarding Table Generator
  – Y2 plans
• Software development
  – High level software architecture
  – User-space
  – Kernel-space
  – Wrap-up
• Experimental activities
  – Intro, goals, Y1 experimentation use case
  – Testbed and results at i2CAT OFELIA island
  – Testbed and results at iMinds OFELIA island
  – Conclusions
Project at a glance
• What? Main goals
  – To advance the state of the art of RINA towards an architecture reference model and specifications that are closer to enabling implementations deployable in production scenarios.
  – The design and implementation of a RINA prototype on top of Ethernet will enable experimentation with, and evaluation of, RINA in comparison to TCP/IP.
• Budget
  – Total cost: €1,126,660
  – EC contribution: €870,000
  – Duration: 2 years
  – Start date: 1st January 2013
• External Advisory Board: Juniper Networks, ATOS, Cisco Systems, Telecom Italia (from 2014)
• 5 activities:
  – WP1: Project management
  – WP2: Architecture, Use cases and Requirements
  – WP3: Software Design and Implementation
  – WP4: Deployment into OFELIA testbed, Experimentation and Validation
  – WP5: Dissemination, Standardisation and Exploitation
• Who? 5 partners
Objectives (I)
• Enhancement of the RINA specifications
  – The specification of a shim DIF over Ethernet
  – The completion of the specifications that enable DIFs providing a level of service similar to the current Internet (low security, best-effort)
  – The project use cases
• RINA open source prototype for the Linux operating system
  – Targeting both user and kernel space, allowing RINA to be used on top of different technologies (Ethernet, TCP, UDP, etc.)
  – It will provide a solid baseline for further RINA work after the project; IRATI will set up an initial open source community around the prototype.
Objectives (II)
• Experimentation with RINA and comparison with TCP/IP
  – IRATI will follow iterative cycles of research, design, implementation and experimentation, with the experimental results feeding back into the research of the next phase
  – Experiments will collect and analyse data to compare RINA and TCP/IP in aspects such as the application API, programmability, the cost of supporting multi-homing, simplicity, etc.
• Interoperability with other RINA prototypes
  – Achieving interoperability between independent implementations is a good sign that a specification is well written and complete.
  – Current RINA prototypes target different programming platforms (middleware vs. OS kernel) and work over different underlying technologies (UDP/IP vs. Ethernet) compared to the IRATI prototype.
Objectives (III)
• Provide feedback to OFELIA
  – Apart from feedback to the OFELIA facility in the form of bug reports and suggested improvements, IRATI will actively contribute to improving the toolset used to run the facility.
  – Moreover, experimentation with a non-IP-based solution is an interesting use case for the OFELIA facility, since IRATI will be the first to conduct this type of experiment in the OFELIA testbed.
Project Outcomes
• Enhanced RINA architecture reference model and specifications, contributed to the Pouzin Society for experimentation. IRATI will focus on advancing the RINA state of the art in the following areas:
  – DIFs over Ethernet
  – DIFs over TCP/UDP
  – DIFs for hypervisors
  – Routing
  – Data transfer
• Linux OS kernel implementation of the RINA prototype over Ethernet
  – By the end of the project an open source community will be set up to allow the research and industrial networking communities to use the prototype and/or contribute to its development
• Experimental results of the RINA prototype, compared to TCP/IP
• DIF over TCP/UDP extensions, interoperable with existing RINA prototypes
Overview of the project structure
BASIC SCENARIOS: PHASES 1 AND 2
Basic use cases: Shim DIF over Ethernet
• Goal: to ensure that the shim DIF over Ethernet provides the required functionality. The purpose of a shim DIF is to provide a RINA interface to the capability of a legacy technology, rather than give the legacy technology the full capability of a RINA DIF.
Basic use cases: Turing machine DIF
• Goal: to provide a testing scenario to check that a normal DIF complies with a minimal set of functionality (the "Turing machine" DIF).
ADVANCED SCENARIOS: PHASES 2 AND 3
Advanced use cases: Introduction
• RINA applied to a hybrid cloud/network provider
  – Mixed offering of connectivity (Ethernet VPN, MPLS IP VPN, Ethernet Private Line, Internet Access) + computing (Virtual Data Center)
(Figure: the scenario spans three segments: access network, wide area network and datacenter design.)
Advanced use cases: Modeling
(Figure: two data centers, each with racks of hypervisors (HV) hosting VMs behind top-of-rack (TOR) switches and a CE router, attach through PE routers to an MPLS backbone; the backbone also connects the CE routers of customer 1's sites A/B/C and customer 2's sites A/B/C, and an Internet gateway reaches end users over the public Internet.)
Advanced use cases: Enterprise VPN over the operator's network
• Wide area network
  – Logical separation of customers through MPLS encapsulation, BGP-based MPLS VPNs and Virtual Routing and Forwarding (VRF)
• Access network
  – Use of Ethernet switching within metro-area networks
  – Logical separation of traffic belonging to multiple customers implemented through IEEE 802.1Q
Advanced use cases: Enterprise VPN over the operator's network, applying RINA
• Backbone DIF: provides the equivalent of the MPLS network. This DIF must be able to provide flows with "virtual circuit" characteristics, equivalent to MPLS LSPs.
• Provider top-level DIF: provides IPC services to the different customers by connecting the CE routers together. The DIF may provide different levels of service, depending on the customer's requirements. There may be one or more of these DIFs (one per customer, one for all the provider's customers, etc.).
• Intra customer-site DIFs: DIFs whose scope is a single customer site. Their characteristics will depend on the size and needs of the customer (e.g. a campus network, an enterprise network, etc.).
• Customer A DIF: can provide connectivity to all the application processes within customer A's organization. More specialized DIFs targeting concrete application types (e.g. voice, file transfer) could be created on top.
Advanced use cases: Hypervisor integration, with TCP/IP
(Figure: in each hypervisor machine, VMs (192.168.1.x) attach through shared-memory-backed virtual interfaces (vif1.0, vif2.0, vif3.0) to software bridges (SW bridge 0, SW bridge 1); the bridges connect via the hypervisor's Ethernet interfaces and VLANs (e.g. VLAN 2, VLAN 5) to a top-of-rack switch and out of the data center.)
Advanced use cases: Hypervisor integration, with RINA
(Figure: each VM reaches its hypervisor over a shim DIF for hypervisors; hypervisors and the TOR switch are connected by a shim DIF over 802.1Q; on top, a green customer DIF extends out of the data center to the customer VPN or Internet gateway.)
Advanced use cases: VDC + Enterprise VPNs over the Internet, with TCP/IP
(Figure: the datacenter border router (NAT, gateway) connects the datacenter premises over the public Internet to the border routers (NAT, gateway) of the green and blue customer premises, each fronting a switch and the customer machines.)
Advanced use cases: VDC + Enterprise VPNs over the Internet, with RINA
(Figure: inside the datacenter, VMs use the shim DIF for hypervisors and hypervisors use a shim DIF over 802.1Q (VLAN 2, shared memory) up to the DC border router; a shim DIF over TCP/UDP crosses the public Internet to the customer border router; inside the green customer premises a shim DIF over 802.1Q (VLAN 10) connects a layer 2 switch and servers; the green customer DIF spans datacenter and customer premises end to end.)
SHIM DIF OVER 802.1Q
Shim DIF over Ethernet: General requirements
• The task of a shim DIF is to put as small a veneer as possible over a legacy protocol, to allow a RINA DIF to use it unchanged.
• The shim DIF should provide no more service or capability than the legacy protocol provides.
Examining the Ethernet header
• Ethernet II: specification released by DEC, Intel and Xerox (hence also called DIX Ethernet)
• Frame layout:
  – Preamble: 7 bytes
  – MAC destination: 6 bytes
  – MAC source: 6 bytes
  – 802.1Q header (optional): 4 bytes
  – Ethertype: 2 bytes
  – Payload: 42–1500 bytes
  – FCS: 4 bytes
  – Interframe gap: 12 bytes
Ethertype
• Identifies the syntax of the encapsulated protocol
• Layers below need to know the syntax of the layer above
• Layer violation!
Consequences of using an Ethertype
• It also means only one flow can be distinguished between an address pair
• The MAC address doubles as the connection endpoint-id
Shim DIF over Ethernet: Environment
Address Resolution Protocol
• Resolves a network address to a hardware address
  – Most ARP implementations do not conform to the standard
  – The shim IPC process assumes an RFC 826-compliant implementation
Usage of ARP
• Maps the application process name to a shim IPC process address (MAC address)
  – The application process name is transformed into a network protocol address
  – Application registration adds an entry in the local ARP cache
• A flow allocation request results in an ARP request/reply
  – Instantiates a MAC protocol machine equivalent of DTP (cf. Flow Allocator)
• Example naming:
  – Process name: My_IPC_Process
  – Process instance: 1
  – Entity name: Management
  – Entity instance: 2
  – Resulting network protocol address: My_IPC_Process/1/Management/2
PDU FORWARDING TABLE GENERATOR
PDU Forwarding Table Generator: Requirements and general choices
• It's all policy! Every DIF can do it its own way
• We start with a link-state routing approach
PDU Forwarding Table Generator: High-level view and relationship to other IPC Process components
(Figure: the PDU Forwarding Table Generator exchanges CDAP messages with neighbor IPC processes through the RIB Daemon over the layer-management N-1 flows ("Neighbor B invoked write operation on object X" in, "Invoke write operation on object X to neighbor A" out); it receives events from the Resource Allocator (N-1 flow allocated/deallocated/up/down) and from the Enrollment Task (enrollment completed successfully); with these it updates and propagates its knowledge of N-1 flow state and recomputes the PDU forwarding table, which the Relaying and Multiplexing Task looks up to select the output N-1 data-transfer flow for each PDU.)
Plans for Year 2
• Shim DIF for hypervisors
  – Enable communication between VMs in the same physical machine without using the networking subsystem
• Updated shim DIF over TCP/UDP
  – The current version requires manual configuration of the mappings of application names to IP addresses and TCP/UDP ports; investigate the use of DNS
• Updated PDU Forwarding Table Generator
  – Based on lessons learned from implementation and experimentation
• Feedback to EFCP
  – Based on implementation and experimentation experience
• Faux sockets API
INTRODUCTION
Project targets and timeline (SW)
• IRATI SW goals: release 3 SW prototypes in 2 years; each prototype provides incremental functionality
  – 1st prototype: basic functionality (unreliable flows), comparable to UDP/IP
  – 2nd prototype: "complete" stack (reliable flows + routing), comparable to TCP/IP
  – 3rd prototype: enhancements (hardened prototype + RINA over IP + …), more product-like than prototype-like
• An eye on extensibility, portability, performance and usability
• The SW components live in both kernel and user space
Problems …
• Problems are mostly SW-engineering related, and time constrained:
  1. Reference specs → high-level architecture
  2. High-level architecture → detailed design
  3. Detailed design → implementation, debug, integration …
• The IRATI stack spans user and kernel space
• User-space problems (as usual):
  – Memory (e.g. corruption, leaks)
  – Bad logic (e.g. faults)
  – Concurrency (e.g. deadlocks, starvation)
  – …
  – Nothing that special (but time consuming for sure)
… and problems
• Kernel-space problems are the user-space ones PLUS a harsher environment:
  – The develop, install and test cycle is (a lot) slower
    • Huge code-base (takes a long time to compile)
    • Faults in kernel code may bring the whole host down
    • Reboots are usually required to test a new "version" (at early stages)
  – C is "the" language → less expressive than the languages available in userland; no "external libraries" …
  – The kernel is "cooperative":
    • Stack and heap handling must be careful: memory corruption can propagate everywhere
    • Different mechanics: mutexes, semaphores, spinlocks, RCUs … coupled with uninterruptible sleeps
    • Syscalls may sleep … but spinlocks can't be held while "sleeping"
    • No recursive locking
    • Memory allocation comes in different flavours: NOWAIT, NOIO, NOFS …
  – …
Outline
• Introduction
• High level software architecture
• Detailed software architecture
  – Kernel space
  – User space
• Wrap-up
Splitting the spaces: user vs kernel
• Fast/slow paths → user vs kernel: we split the design into different "lanes" and placed SW components there, depending on their timing requirements
  – Fast path → stringent timings → kernel space
  – Slow path → loose timings → user space
• … looking for our optimum, trading off time, ease, cost, problems, schedule, final solution, etc.
API & kernel
• OS processes request services from the kernel with syscalls
  – User originated (user → kernel)
  – Unicast
• Modern *NIX systems extend the user/kernel communication mechanisms
  – Netlink, uevent, devfs, procfs, sysfs, etc.
• We wanted a "bus-like" mechanism: 1:1/N:1, user/kernel and user/user
  – User OR kernel originated
  – Multicast/broadcast
• We adopted syscalls and Netlink
  – Syscalls (fast path): bootstrapping(*) and SDU read/write
  – Netlink (mostly slow path): we introduced a RINA "family" and its related messages
(*) Bootstrapping needs: syscalls create kernel components which will be using Netlink functionalities later on
(Figure: applications, the IPC Manager daemon and the IPC Process daemons in user space share the bus-like channel with each other and with the kernel.)
Introducing librina
• Syscalls are "wrapped" by libc (kernel abstraction)
  – i.e. syscall(SYS_write, …) → write(…)
  – glibc on GNU/Linux
• Changes to the syscalls → changes to glibc
  – Breaking glibc could break the whole host
  – Sandboxed environments become necessary
  – Dependency invalidation → time-consuming compilations
  – That sort of change is really hard to get approved upstream
  – etc.
• We introduced librina as the initial way to overcome these problems …
  – … use IRATI on a host without breaking the whole system
librina
• It is more a framework/middleware than a library
  – It has explicit memory allocation (no garbage collection)
  – It's event-based
  – It's threaded
• Completely abstracts the interactions with the kernel (syscalls and Netlink)
• Adds functionality on top of them and provides it to userland (apps and daemons)
  – Static/dynamic linking (i.e. for C/C++ programs)
  – Scripting language extensions (e.g. Java)
librina interface
• librina contains a set of "components":
  – Internal components
  – External components
• And a portable framework to build components on top, e.g.:
  – Patterns: e.g. singletons, observers, factories, reactors
  – Concurrency: e.g. threads, mutexes, semaphores, condition variables
  – High level "objects" in its core: FlowSpecification, QoSCube, RIBObject, etc.
• Only the "external" components are "exported" as classes
librina core (HL) SW architecture
(Figure: the librina core wraps the RINA syscalls (via syscall(SYS_*) wrappers) and RINA Netlink (via libnl/libnl_genl, nl_send()/nl_recv(), the NetlinkManager and its NetlinkSessions); an event queue lets applications eventPoll()/eventWait()/eventPost(); the exported API covers application, cdap, faux-sockets, ipc-process, ipc-manager, sdu-protection and common functionality.)
• Application API: allocate/deallocate flows, read/write SDUs on flows, register/unregister with one or more DIFs
• IPC-manager API: IPC process creation, deletion and configuration
• IPC-process API: configure the PDU forwarding table, create/delete EFCP instances, allocate kernel resources to support a flow
How to RAD, effectively?
• OO was the "natural" way to represent the RINA entities
• We embraced C++ as the "core" language for librina:
  – Careful usage produces binaries comparable to C
  – The STL reduces the dependencies (in the plain C vs plain C++ case)
  – Producing C bindings is possible
  – …
• There was the ALBA prototype already working …
• … and ALBA has RINABand …
• BUT that prototype is Java based …
Interfacing librina to other languages
• We "adopted" SWIG: the Simplified Wrapper and Interface Generator
• SWIG "automatically" generates all the code needed to connect C/C++ programs to scripting languages
  – Such as Python, Java and many, many others …
• Example toolchain (native interface → low-level wrapper → high-level wrapper):

      example.h:
          int fact(int n);

      example.c:
          #include "example.h"
          int fact(int n) { … }

      example.i:
          /* File: example.i */
          %module example
          %{
          #include "example.h"
          %}
          int fact(int n);

  SWIG consumes example.i and generates example_wrap.c (the low-level wrapper) and example.py (the high-level wrapper); GCC builds libexample.so, which Python loads through its native interface.
librina wrapping
• Wrapping "cost":
  – The wrappers (.i files) are small: ~480 LOCs
  – They produce ~13.5 KLOCs of bindings → a ~1:28 ratio …
• The wrappers are the only thing needed to obtain the bindings for a scripting language
  – SWIG support varies with the target language, e.g.
    • Java: so-so (not all data types mapped natively)
    • Python: good
    • …
  – Our wrappers contain only the missing data-type mappings for Java
• Java interface = C++ interface
• Bindings for other languages (e.g. Python) are expected to be straightforward
High level software architecture
(Figure: librina is layered as a C++ core with C++ and C APIs, plus SWIG low-level wrappers (C++) and high-level wrappers (Java, or another language X); rinad (Java) and third-party SW packages (applications such as RINABand HL, ipcpd and ipcmd) link librina statically/dynamically, through JNI and Java "imports", or through language X's native interface and imports; librina itself talks to the kernel through syscalls and Netlink (libnl/libnl-gen).)
DETAILED SOFTWARE ARCHITECTURE: KERNEL SPACE
The Linux object model
• Linux has its "generic" object abstraction: kobject, kref and kset

      struct kref {
              atomic_t refcount;                /* explicit reference counting */
      };

      struct kobject {                          /* naming & sysfs integration  */
              const char           *name;
              struct list_head     entry;
              struct kobject       *parent;     /* dynamic [re-]parenting (loosely typed) */
              struct kset          *kset;
              struct kobj_type     *ktype;
              struct sysfs_dirent  *sd;         /* sysfs integration           */
              struct kref          kref;        /* garbage collection          */
              unsigned int         state_initialized:1;
              unsigned int         state_in_sysfs:1;
              unsigned int         state_add_uevent_sent:1;
              unsigned int         state_remove_uevent_sent:1;
              unsigned int         uevent_suppress:1;
      };

      struct kset {                             /* object grouping             */
              struct list_head             list;
              spinlock_t                   list_lock;
              struct kobject               kobj;
              const struct kset_uevent_ops *uevent_ops;
      };

• Generic enough to be applied "everywhere"
  – E.g. FS, HW subsystems, device drivers
kobjects, ksets and krefs in IRATI
• They are the way to go for embracing OOD/OOP kernel-wide
• But if the design has a limited scope, the code gets bloated by:
  – Ancillary functions and data structures
  – (unnecessary) resource usage
• We don't need/want all these functionalities (everywhere):
  – Reduced (finite) number of classes: we don't have the needs of a "generic kernel"
  – Reduced concurrency (can be missing, depending on the object)
  – Object parenting is "fixed" (object x is always bound to object y)
    • E.g. DTP/DTCP are bound to EFCP …
  – Not all our objects have to be published into sysfs
  – We have different lookup requirements: no need to "look up by name" every object
  – Inter-object bindings shouldn't lose the object's type
  – …
Our OOP/OOD approach
• We adopted a (slightly) different OOD/OOP approach
• (almost) Each "entity" in the stack is an "object"
• All our "objects" provide a basic common interface and behavior
• They have no implicit embedded locking semantics

      struct object_t { … };                            /* opaque API            */

      struct obj_ops_t {                                /* vtable (if needed)    */
              result_x_t (* method_1)(object_t * o, …);
              …
              result_y_t (* method_n)(object_t * o, …);
      };

      int        obj_init(object_t * o, …);             /* static allocation     */
      void       obj_fini(object_t * o);

      object_t * obj_create(…);                         /* dynamic, interruptible ctxt     */
      object_t * obj_create_ni(…);                      /* dynamic, non-interruptible ctxt */
      int        obj_destroy(object_t * o);

      int        obj_<method_1>(object_t * o, …);       /* vtable proxy (if needed) */
      ...
      int        obj_<method_n>(object_t * o, …);
OOD/OOP & the framework
• This approach:
  – Reduces the stack's overall bloat
    • No krefs, spinlocks, sysfs, etc. where unnecessary
    • Only objects requiring sysfs, debugfs and/or uevents embed a kobject
    • (or it is comparable, e.g. the bloat related to _init, _fini, _create and _destroy)
  – Speeds up development
  – Helps debugging
    • (re-)parenting is constrained to specific objects
    • No loose typing → type checking is maintained (no casts)
  – Decouples (mildly) from the underlying kernel
• With these assumptions we built our framework
  – Basic components: robj, rmem, rqueue, rfifo, rref, rtimer, rwq, rmap, rbmp
  – OOP facilities/patterns: factories, singletons, facades, observers, flyweights, publishers/subscribers, smart pointers, etc.
  – Ownership-passing + smart-pointer memory model
The HL software architecture (Y1)
(Figure: in kernel space, the personality mux/demux sits above the KIPCM (KIPCM core, IPCP factories and the KFA), the normal IPC process components (EFCP, RMT, PFT), the shims (shim-eth-vlan with RINA-ARP, and shim-dummy) and RNL, all built on the kernel framework; in user space, librina (C++ core, C/C++ APIs and the SWIG low- and high-level wrappers) connects rinad, ipcpd, ipcmd and third-party SW packages such as RINABand HL to the kernel through syscalls and Netlink (libnl/libnl-gen).)
The API exposed to user-space: KIPCM + RNL
• Kernel interface = syscalls + Netlink messages
• KIPCM: manages the syscalls
  – Syscalls: a small, well-defined set of calls (8):
    • IPCs: ipc_create and ipc_destroy
    • Flows: allocate_port and deallocate_port
    • SDUs: sdu_read, sdu_write, mgmt_sdu_read and mgmt_sdu_write
• RNL: manages the Netlink part
  – Abstracts message reception, sending, parsing and crafting
  – Netlink: 36 message types (with dynamic attributes):
    • assign_to_dif_req, assign_to_dif_resp, dif_reg_notif, dif_unreg_notif …
• Partitioning:
  – Syscalls → KIPCM → "fast path" (read and write SDUs)
  – Netlink → RNL → "slow path" (mostly configuration and management)
KIPCM & KFA
(Figure: syscalls and Netlink enter the kernel at the KIPCM and KFA, which sit above the normal IPCP (EFCP, RMT with its IN/OUT queues, PDU-FWD-T) and the shim IPCPs.)
• The KIPCM:
  – Counterpart of the IPC Manager in user space
  – Manages the lifecycle of the IPC processes and the KFA
  – Abstracts IPC process instances
    • Same API for all the IPC processes regardless of type
    • Maps: ipc-process-id → ipc-process-instance
• The KFA:
  – Manages ports and flows
  – Ports: flow handler and ID, port-ID manager
  – Flows: maps port-id → ipc-process-instance
• Both "bind" the kernel stack:
  – Top: user interface
  – Bottom: IPC processes (maps)
• They are the initial point where "recursion" is transformed into "iteration"
  – When the KIPCM calls the KFA to inject/get SDUs: N-IPCP → EFCP → RMT → PDU-FWD → shim IPC process
The RINA Netlink Layer (RNL)
• Integrates Netlink into the SW framework
  – Hides all the configuration, generation and destruction of Netlink sockets and messages from the user
  – Defines a Generic Netlink family (NETLINK_RINA) and its messages
The IPC Process Factories
• They are used by IPC processes to publish/un-publish their availability
  – Publish: x = kipcm_ipcp_factory_register(…, char * name, …)
  – Unpublish: kipcm_ipcp_factory_unregister(x)
• The factory name is the way the KIPCM can look for a specific IPC process type
  – It's published into sysfs too
• There are two "major" types of IPC processes:
  – Normal
  – Shims
The IPC Process Factories Interface
• Factory operations are the same for both types; upon registration a factory publishes its hooks:
  – .init → x_init
  – .fini → x_fini
  – .create → x_create
  – .destroy → x_destroy
  – .configure → x_configure
• Upon user request (ipc_create), the KIPCM creates a particular IPC process instance:
  1. Looks for the correct factory (by name)
  2. Calls the .create "method"
  3. The factory returns a "compliant" IPC process object
  4. The KIPCM binds that object into its data model
• Upon un-registration, the factory triggers the "destruction" of all the IPC processes it "owns"
IPC Process Instances
• The .create provided by the factories returns an IPC process "object"
• There are two "major" types of IPC processes:
  – Normal
  – Shims
• Regardless of the type, the interface is the same, and each IPC process implements its "core" code:
  – Shim IPC process: each shim IPC process provides its own implementation
  – Normal IPC process: the stack provides an implementation for all of them
IPC Process Instances Interface
• The IPC process "object" consists of instance_data + instance_ops
• The IPC process interface is the same for all types, but each type decides which ops it will support
  – Some are specific to normal or shim IPC processes, a few are common to both
  – They support similar functionality (except the PFT's); how it translates into ops depends on the type
• Normal IPC process ops:
  – .connection_create = normal_connection_create
  – .connection_update = normal_connection_update
  – .connection_destroy = normal_connection_destroy
  – .connection_create_arrived = normal_connection_arrived
  – .pft_add = normal_pft_add
  – .pft_remove = normal_pft_remove
  – .pft_dump = normal_pft_dump
• Common and shim ops:
  – .application_register = x_application_register
  – .application_unregister = x_application_unregister
  – .assign_to_dif = x_assign_to_dif
  – .sdu_write = x_sdu_write
  – .flow_allocate_request = shim_allocate_request
  – .flow_allocate_response = shim_allocate_response
  – .flow_deallocate = shim_deallocate
Write operation
(Figure: an application SDU written on port-id app2 traverses two stacked normal IPCPs (IPCP 2 with EFCP 2i/RMT 2, over IPCP 1 with EFCP 1j/RMT 1) and exits through shim IPCP 0 on port-id 10; the recursion is flattened into iteration through the KFA.)
Call sequence, top to bottom:
  1. Application: sys_sdu_write(sdu, app2)
  2. KIPCM: kipcm_sdu_write(sdu, app2)
  3. KFA: kfa_flow_sdu_write(sdu, app2)
  4. IPCP 2: normal_write(sdu, app2) → efcp_container_write(sdu, 2i) → efcp_write(sdu) → DTP: dtp_write(sdu) → rmt_send(pdu)
  5. KFA: kfa_flow_sdu_write(sdu*, 21)
  6. IPCP 1: normal_write(sdu*, 21) → efcp_container_write(sdu*, 1j) → efcp_write(sdu*) → DTP: dtp_write(sdu*) → rmt_send(pdu*)
  7. KFA: kfa_flow_sdu_write(sdu**, 10)
  8. Shim IPCP 0: shim_write(sdu**, 10)
Read operation
(Figure: the inverse path: an incoming SDU is posted upwards from shim IPCP 0 through IPCP 1 (EFCP 1j) and IPCP 2 (EFCP 2i) to the application on port-id app2.)
Call sequence, bottom to top:
  1. Shim IPCP 0: kfa_sdu_post(sdu**, 10)
  2. IPCP 1: rmt_receive(sdu**, 10) → efcp_container_receive(pdu*, 1j) → efcp_receive(pdu*) → DTP: dtp_receive(pdu*) → kfa_sdu_post(sdu*, 21)
  3. IPCP 2: rmt_receive(sdu*, 21) → efcp_container_receive(pdu, 2i) → efcp_receive(pdu) → DTP: dtp_receive(pdu) → kfa_sdu_post(sdu, app2)
  4. KFA: kfa_flow_sdu_read(app2)
  5. KIPCM: kipcm_sdu_read(app2)
  6. Application: sys_sdu_read(app2)
Shim IPC Processes
• The shims are the "lowest" components in kernel space
• They have two interfaces:
  – Northbound: the same for each shim, represented by hooks published into the KIPCM factories
  – Southbound: depends on the technology
• There are currently 2 shims:
  – shim-dummy:
    • Confined to a single host ("loopback")
    • Used for debugging and testing the stack
  – shim-eth-vlan:
    • As defined in the spec, runs over 802.1Q
Shim-dummy
(Figure: the IPC Manager and IPC Process daemons use the RINA IPC API in user space; in the kernel, the KIPCM/KFA drive the dummy shim IPC process through shim_dummy_create and shim_dummy_destroy.)
Shim-eth-vlan
(Figure: the shim IPC process over 802.1Q is created/destroyed by the KIPCM/KFA through shim_eth_create and shim_eth_destroy; it uses RINARP (rinarp_add, rinarp_remove, rinarp_resolve) for address resolution and reaches the devices layer through dev_queue_xmit for transmission and shim_eth_rcv for reception.)
RINARP
(Figure: shim-eth-vlan uses the RINARP API, which is built on ARP826 (core, maps and tables, ARM/RX and TX paths) over the devices layer.)
DETAILED SOFTWARE ARCHITECTURE: USER SPACE
Introduction to the user space framework
(Figure: in user space, the IPC Manager daemon (main logic, management agent, IDD, RIB & RIB Daemon) and the IPC Process daemons (RIB & RIB Daemon, enrollment, flow allocation, resource allocation, PDU forwarding table generation — the layer management of a normal IPC process) sit on librina, as do applications; all of them reach the kernel through system calls, Netlink sockets and sysfs.)
• IPC Manager Daemon: broker between apps and IPC processes; the central point of management in the system
• IPC Process Daemon: implements the layer-management components of an IPC process
• librina: abstracts away the communication details between the daemons/applications and the kernel
Librina software architecture
[Diagram: librina software architecture — a C++ API layer (proxy, model and event classes; events queue and Event Producer handing events to user space) over a C++ core (Netlink Manager with Netlink message parsers/formatters and message classes, syscall wrappers, a message reader thread, concurrency classes and a logging framework), built on libnl/libnl-gen and libpthread, performing actions against the kernel]
The IPC Process and IPC Manager Daemons
• IPC Manager Daemon
– Manages the IPC Process lifecycle
– Broker between applications and IPC Processes
– Local management agent
– DIF Allocator client (to search for applications not available through local DIFs)
• IPC Process Daemon
– Layer management components of the IPC Process:
• RIB Daemon, RIB • CDAP parsers/generators • CACEP • Enrollment • Flow Allocation • Resource Allocation • PDU Forwarding Table Generation • Security Management
IPC Manager Daemon
[Diagram: IPC Manager Daemon (Java) internals — a Bootstrapper reads the configuration file; the main event loop blocks on EventProducer.eventWait() and dispatches to the IPC Manager core classes (IPC Process Manager, Flow Manager, Application Registration Manager, Application Manager), which call the IPC Process Factory, an IPC Process or the Application Manager; a command line interface server thread accepts CLI sessions over a local TCP connection (console and configuration classes); librina (C++) is reached through JNI and SWIG wrappers (low-level C++, high-level Java) and issues system calls and Netlink messages]
IPC Process Daemon
[Diagram: IPC Process Daemon (Java) internals — the main event loop blocks on EventProducer.eventWait(); a CDAP message reader thread calls KernelIPCProcess.readMgmtSDU() and hands messages to the RIB Daemon via RIBDaemon.cdapMessageReceived(); the layer management function classes (Enrollment Task, Flow Allocator, Resource Allocator, Forwarding Table Generator, Registration Manager) operate on the Resource Information Base (RIB) and send CDAP messages through RIBDaemon.sendCDAPMessage() and KernelIPCProcess.writeMgmtSDU(); supporting classes include the Delimiter, Encoder and CDAP parser; librina (C++) is reached through JNI and SWIG wrappers and talks to the IPC Manager and the kernel IPC Process]
Example workflow: IPC Process creation
[Diagram: IPC Process creation sequence between the IPC Manager Daemon, the IPC Process Daemon and the kernel; triggered from the configuration file or a CLI session over the local TCP connection]
1. Create IPC Process (syscall)
2. Fork (syscall)
3. Initialize librina
4. When completed, notify IPC Manager (NL)
5. IPC Process initialized (NL)
6. Register app request (NL)
7. Register app response (NL)
8. Notify IPC Process registered (NL)
9. Assign to DIF request (NL)
10. Update state and forward to kernel (NL)
11. Assign to DIF request (NL)
12. Assign to DIF response (NL)
13. Assign to DIF response (NL)
• The IPC Manager reads a configuration file with instructions on the IPC Processes it has to create at startup
– Or the system administrator can request creation through the local console
• The configuration file also instructs the IPC Manager to register the IPC Process in one or more N-1 DIFs, and to make it member of a DIF
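The ordering constraints in the sequence above can be captured as a tiny state machine: an IPC Process must be initialized before it is registered in N-1 DIFs, and registered before it is assigned to a DIF. This is an illustrative sketch with invented names, not IRATI code:

```cpp
#include <stdexcept>

// Toy lifecycle for the creation sequence: Created -> Initialized
// -> Registered (in N-1 DIFs) -> AssignedToDif.
enum class IpcpState { Created, Initialized, Registered, AssignedToDif };

class IpcpLifecycle {
    IpcpState state_ = IpcpState::Created;
public:
    IpcpState state() const { return state_; }
    void initialized() {             // covers steps 3-5
        require(IpcpState::Created);
        state_ = IpcpState::Initialized;
    }
    void registered_in_n1_dif() {    // covers steps 6-8
        require(IpcpState::Initialized);
        state_ = IpcpState::Registered;
    }
    void assigned_to_dif() {         // covers steps 9-13
        require(IpcpState::Registered);
        state_ = IpcpState::AssignedToDif;
    }
private:
    // Reject out-of-order steps, mirroring the IPC Manager's state tracking.
    void require(IpcpState expected) const {
        if (state_ != expected) throw std::logic_error("out-of-order step");
    }
};
```

The IPC Manager's "update state and forward" step in the sequence is exactly this kind of bookkeeping: it refuses an assign-to-DIF before registration has completed.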
Example workflow: Flow allocation
[Diagram: flow allocation sequence between Application A, the IPC Manager Daemon, the IPC Process Daemon and the kernel]
1. Allocate flow request (NL)
2. Check app permissions
3. Decide what DIF to use
4. Forward request to the adequate IPC Process Daemon
5. Allocate flow request (NL)
6. Request port-id (syscall)
7. Create connection request (NL)
8. On create connection response (NL), write CDAP message to N-1 port (syscall)
9. On getting an incoming CDAP message response (syscall), update connection (NL)
10. On getting update connection response (NL), reply to IPC Manager (NL)
11. Allocate flow request result (NL)
12. Forward response to app
13. Allocate flow request result (NL)
14. Read data from the flow (syscall) or write data to the flow (syscall)
• An application requests a flow to another application, without specifying what DIF to use
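The broker role the IPC Manager plays in steps 2-3 — check the requester's permissions, then pick a DIF through which the destination is reachable — can be sketched as a pure lookup. All names and the permission/reachability model here are illustrative; the real IPC Manager consults its RIB and configured policies:

```cpp
#include <map>
#include <set>
#include <string>

// Sketch of the IPC Manager's broker decision for a flow request.
class FlowBroker {
    std::set<std::string> allowed_apps_;                 // crude permission model
    std::map<std::string, std::string> reachable_via_;   // dest app -> DIF name
public:
    void allow(const std::string& app) { allowed_apps_.insert(app); }
    void learn(const std::string& dest, const std::string& dif) {
        reachable_via_[dest] = dif;
    }
    // Returns the DIF to forward the request to, or "" if denied/unknown.
    std::string select_dif(const std::string& src, const std::string& dst) const {
        if (!allowed_apps_.count(src)) return "";        // permission check
        auto it = reachable_via_.find(dst);              // DIF selection
        return it == reachable_via_.end() ? "" : it->second;
    }
};
```

The key point the slide makes survives the simplification: the application never names a DIF — the IPC Manager chooses one on its behalf.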
WRAP UP
Y1: Where we are / What we have…
• 9 months, ~3700 commits and ~214 KLOCs later…
– ~27 KLOCs in the kernel
– ~87 KLOCs in librina (hand-written)
– ~35 KLOCs in librina (automatically generated)
– ~65 KLOCs in rinad
• …the project released its 1st prototype (internal release):
– User and kernel space components providing unreliable flow functionalities
– The building/configuration/development frameworks
– A testing framework:
• A testing application (RINABand, compile-time)
• A regression framework (ad-hoc, run-time)
• We’re actively working on the 2nd prototype
Y2: Plans…
• Prototype 2:
– Reliable flows support
– Shim DIF for hypervisors (HV)
• Same schema as shim-dummy/shim-eth-vlan in prototype 1
– Complete routing
– Public release as FOSS (July 2014)
• Prototype 3:
– Shim DIF over TCP/UDP
• Same schema as prototype 2
– Faux sockets API via:
1. FI: function interposition (dynamic linking)
2. SCI: system call interposition (static linking)
IRATI EXPERIMENTATION GOALS
Experimentation goals
[Diagram: the use cases and specifications drive the RINA prototype, which is evaluated against TCP/IP and UDP/IP]
IRATI experimentation in a nutshell
[Diagram: testbeds per phase — iLab.t EXPERIMENTA and OFELIA in Phases I and II; iLab.t EXPERIMENTA and PSOC in Phase III]
PROTOTYPE STATUS AND TOOLS
Available Tools
• RINABand
– Test application for RINA
– Java (user space)
– Requires multiple flows between two application process instances
• Echo server/client
– Test parameters: number and size of SDUs to be sent
– Ping-like operation
– The test completes when either all the SDUs have been sent and received, or when more than a certain interval of time elapses without receiving an SDU
– Client and server report statistics:
• the number of transmitted and received SDUs
• the time the test lasted
– Single flow between two application process instances
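The echo test's termination rule can be stated as a pure predicate: the run ends when every SDU has been sent and echoed back, or when too long passes without receiving an SDU. The function and parameter names below are ours, not taken from the test application:

```cpp
// Termination rule for the echo client/server test described above:
// complete when all expected SDUs were received, or when the gap since
// the last received SDU exceeds the configured timeout.
bool echo_test_complete(int sdus_received, int sdus_expected,
                        double seconds_since_last_sdu, double timeout_s) {
    return sdus_received >= sdus_expected
        || seconds_since_last_sdu > timeout_s;
}
```

Separating the rule from the I/O loop makes it trivial to check that a stalled flow still ends the test instead of hanging it.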
[Diagram: RINABand setup — RINABandClient1 and RINABand1 on the same DIF, each with a Control AE and a Data AE; 1 control flow and N data flows between them]
First phase prototype capabilities
• Capabilities
– Decision to focus on the shim-eth-vlan
– Supports only a single flow between two application process instances
• Impact on experiments
– Could not use RINABand
– Rely on the echo server/client application
Ethernet frame layout (with 802.1Q tag):
Preamble: 7 bytes | MAC dest: 6 bytes | MAC src: 6 bytes | 802.1Q header (optional): 4 bytes | Ethertype: 2 bytes | Payload: 42–1500 bytes | FCS: 4 bytes | Interframe gap: 12 bytes
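From the frame layout above, the fixed per-frame cost on the wire is preamble (7) + dest MAC (6) + src MAC (6) + 802.1Q tag (4) + Ethertype (2) + FCS (4) + interframe gap (12) = 41 bytes. A short sketch of the resulting goodput ceiling per payload size (names are ours):

```cpp
// Theoretical fraction of link capacity available as payload, given the
// tagged Ethernet frame overhead listed above (41 bytes per frame).
double wire_efficiency(int payload_bytes) {
    const int overhead = 7 + 6 + 6 + 4 + 2 + 4 + 12;  // = 41 bytes
    return static_cast<double>(payload_bytes)
         / static_cast<double>(payload_bytes + overhead);
}
```

At 1500-byte payloads the ceiling is about 97%, so framing overhead alone cannot explain goodput figures in the 60% range — any larger gap comes from the stack, not the wire format.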
FIRST PHASE EXPERIMENTS
First phase use case
Single flow echo/bw test
• Validate stack / prototype 1
• Validate Ethernet transparency
• Measure goodput
Multiple flow echo/bw validation
• Validate multiple IPC processes
• Measure goodput
Concurrent RINA and IP
• Validate concurrency of the IP and RINA stacks
• Measure goodput
FIRST PHASE RESULTS @ I2CAT
Presented by Leonardo Bergesio
i2CAT OFELIA Island, EXPERIMENTA
• Experiment == slice
• FlowSpace:
– Arbitrary topology
– Partition of the vectorial space of OF header fields
– Slicing by VLANs
• VMs to be used as end points or controllers
• Perfect match: slice VLAN ↔ Shim DIF over Ethernet
Workflow I
• Access the island using OCF. Create or access your project/slice
Workflow II
• Select FlowSpace topology and slice VLAN/s (DIFs)
Workflow III
• Create VM nodes and OpenFlow controller
Resources Mapping
Slice with two VLAN ids, one per DIF: 300, 301
Single flow
Packets are sent over the Ethernet/VLAN bridge
Goodput roughly 60% of link capacity (compared against iperf)
Project: IRATI basic use case. Slice: multi-VLAN slice
Multiple flows
Flows to a shared server (B & C to D) achieved half the throughput of the single flow (A to B)
Concurrency between IP and RINA stack
UDP
Time interval: 90 s | Nº of datagrams: 554915 | Data sent: 778 MB | BW: 75.5 Mbps
FIRST PHASE RESULTS @ IMINDS
iLab.t “Virtual Wall”: Concept
Virtual Wall: Topology Control
Virtual Wall: Topology Control
Virtual wall @ iMinds
Emulab: architecture
[Diagram: Emulab architecture — users reach the control infrastructure (web/DB/SNMP, switch management) over the Internet; a control switch/router connects the PC nodes, with serial-line and power control access; a programmable “patch panel” interconnects the experiment nodes]
Emulab: programmable patch panel
Workflow
Experiment idea → GUI or ns script → hardware mapping and swap-in → additional scripting (Emulab runs the additional scripts from the ns file)
Basic experiment on iMinds island
• Use a LAN for the VLAN bridge
Single flow
Packets are sent over the Ethernet/VLAN bridge
Goodput roughly 60% of the iperf bandwidth
Multiple flows
Concurrency between IP and RINA stack
UDP
Start Echo Server
CONCLUSIONS
Conclusions from phase I experimentation
• IRATI stack and Shim DIF are running
• ~60% goodput in comparison to iperf
• No major performance problems
• When running concurrently, the IRATI stack takes precedence over the IP stack
– our stack doesn't lose a packet from syscalls to the devices layer
• ARP in the Shim DIF should not reuse the 0x0806 ETHERTYPE, because of incompatibility with existing implementations
• Registration to the Shim DIF over Ethernet should be explicit
Thanks for your attention!
Questions?