GSI, Oct 2005, Hans G. Essel: DAQ Control
H.G. Essel, J. Adamczewski, B. Kolb, M. Stockmeier
TRANSCRIPT
CMS: blueprint for clustered DAQ
DAQ staging
TTC  Timing, Trigger and Control          FU   Filter Unit
TPD  Trigger Primitive Data               FFN  Filter Farm Network
aTTS asynchronous Trigger Throttle System EVM  Event Manager
D2S  Data to Surface                      RCN  Readout Control Network
FRL  Frontend Readout Link                BCN  Builder Control Network
RU   Readout Unit                         DCN  Detector Control Network
BU   Builder Unit                         DSN  DAQ Service Network
CMS DAQ: requirements
• Communication and Interoperability
  – Transmission and reception within and across subsystem boundaries, regardless of the protocols used
  – Addition of protocols without a need for modifications in the applications
• Device Access
  – Access to custom devices for configuration and readout
  – Access to local and remote devices (bus adapters) without the need for modifications in applications
  – Device allocation, sharing and concurrent-access support
• Configuration, control and monitoring
  – Make parameters of built-in or user-defined types visible and allow their modification
  – Allow the coordination of application components (define their states and modes)
  – Allow the inspection of states and modes
  – Provide services for recording structured information
• Logging, error reporting
• Interface to persistent data stores (preferably without the need to adapt the applications)
• Publish all information to interested subscribers
• Maintainability and Portability
  – Allow portability across operating system and hardware platforms
  – Support access to data across multiple bus systems
  – Allow addition of new electronics without changes in user software
  – Provide memory management functionality to
    • improve robustness
    • give room for efficiency improvements
  – Application code shall be invariant with respect to the physical location and the network
  – Possibility to factor out re-usable building blocks
• Scalability
  – Overhead introduced by the software environment must be constant for each transmission operation and small with respect to the underlying communication hardware, in order not to introduce unpredictable behaviour
  – Allow applications to take advantage of additional resource availability
• Flexibility
  – Allow the applications to use multiple communication channels concurrently
  – Addition of components must not decrease the system's capacity
CMS XDAQ
• XDAQ is a framework targeted at data-processing clusters
  – Can be used for general-purpose applications
  – Has its origins in the I2O (Intelligent I/O) specification
• The programming environment is designed as an executive
  – A program that runs on every host
  – User applications are C++ plug-ins
  – Plug-ins are dynamically downloaded into the executives
  – The executive provides functionality for
    • Memory management
    • Systems programming: queues, tasks, semaphores, timers
    • Communication: asynchronous peer-to-peer communication model; incoming events (data, signals, ...) are demultiplexed to callback functions of application components
    • Services for configuration, control and monitoring
    • Direct hardware access and manipulation services
    • Persistency services
XDAQ Availability
Platform (OS)  CPU                 Description
Linux (RH)     x86                 Baseline implementation
Mac OS X       PPC G3, G4          no HAL, no raw Ethernet PT
Solaris        Sparc               no HAL, no raw Ethernet PT
VxWorks        PPC 603, Intel x86  no GM
http://cern.ch/xdaq
Current version: 1.1
Next releases: V 1.2 in October 2002 (Daqlets), V 1.3 in February 2003 (HAL inspector)
Change control: via SourceForge: http://sourceforge.net/projects/xdaq
Version control: CVS at CERN
License: BSD style
XDAQ: References
– J. Gutleber, L. Orsini, "Software Architecture for Processing Clusters based on I2O", Cluster Computing, the Journal of Networks, Software and Applications, Baltzer Science Publishers, 5(1):55-64, 2002 (go to http://cern.ch/gutleber for a draft version, or contact me)
– The CMS collaboration, "CMS, The Trigger/DAQ Project", Chapter 9, "Online software infrastructure", CMS TDR-6.2, in print (contact me for a draft); also available at http://cmsdoc.cern.ch/cms/TDR/DAQ/
– G. Antchev et al., "The CMS Event Builder Demonstrator and Results with Myrinet", Computer Physics Communications 2189, Elsevier Science North-Holland, 2001 (contact [email protected])
– E. Barsotti, A. Booch, M. Bowden, "Effects of various event building techniques on data acquisition architectures", Fermilab note FERMILAB-CONF-90/61, USA, 1990
XDAQ event driven communication
• Dynamically loaded application modules (from URL, from file)
• Inbound/outbound queues (pass frame pointers, zero-copy)
• Homogeneous frame format
[Diagram, one computer: a readout component generates a DMA completion event; a peer transport receives messages from the network; the executive framework demultiplexes incoming events to the callback function foo() of the listening application component.]
XDAQ: I2O peer operation for clusters
• I2O terminology: application component = device, processing node = IOP, controller node = host
• Homogeneous communication
  – frameSend for local and remote destinations, including the host
  – single addressing scheme (Tid)
• Application framework
[Diagram: two executives, each containing an application, a messaging layer and a peer transport agent; numbered steps trace an I2O message frame from the sending application through its messaging layer and peer transport across the network to the peer executive, which delivers it to the receiving application.]
XDAQWin client
• Daqlet window
  – Daqlets are Java applets that can be used to customize the configuration, control and monitoring of all components in the configuration tree
• Configuration tree
  – XML-based configuration of an XDAQ cluster
XDAQ: component properties
Component Properties: allows the inspection and modification of components' exported parameters.
BTeV: a 20 THz real-time system
• Input: 800 GB/s (2.5 MHz)
• Level 1
  – Lvl 1 processing: 190 µs at a crossing rate of 396 ns
  – 528 "8 GHz" G5 CPUs (factor of 50 event reduction)
  – high-performance interconnects
• Level 2/3
  – Lvl 2 processing: 5 ms (factor of 10 event reduction)
  – Lvl 3 processing: 135 ms (factor of 2 event reduction)
  – 1536 "12 GHz" CPUs, commodity networking
• Output: 200 MB/s (4 kHz) = 1-2 Petabytes/year
BTeV: The problem
• Monitoring, Fault Tolerance and Fault Mitigation are crucial
– In a cluster of this size, processes and daemons are constantly hanging/failing without warning or notice
• Software reliability depends on
– Physics detector-machine performance
– Program testing procedures, implementation, and design quality
– Behavior of the electronics (front-end and within the trigger)
• Hardware failures will occur!
– one to a few per week
• Given the very complex nature of this system where thousands of events are simultaneously and asynchronously cooking, issues of data integrity, robustness, and monitoring are critically important and have the capacity to cripple a design if not dealt with at the outset… BTeV [needs to] supply the necessary level of “self-awareness” in the trigger system.
Real Time Embedded System
BTeV: RTES goals
• High availability
  – Fault handling infrastructure capable of:
    • accurately identifying problems (where, what, and why)
    • compensating for problems (shifting the load, changing thresholds)
    • automated recovery procedures (restart / reconfiguration)
    • accurate accounting
    • extensibility (capturing new detection/recovery procedures)
    • policy-driven monitoring and control
• Dynamic reconfiguration
  – adjust to potentially changing resources
• Faults must be detected/corrected ASAP
  – semi-autonomously, with as little human intervention as possible
  – distributed & hierarchical monitoring and control
• Life-cycle maintainability and evolvability
  – to deal with new algorithms, new hardware and new versions of the OS
RTES deliverables
A hierarchical fault management system and toolkit:
– Model Integrated Computing
  • GME (Generic Modeling Environment) system modeling tools, and application-specific "graphic languages" for modeling system configuration, messaging, fault behaviors, user interface, etc.
– ARMORs (Adaptive, Reconfigurable, and Mobile Objects for Reliability)
  • Robust framework for detection of and reaction to faults in processes
– VLAs (Very Lightweight Agents for limited-resource environments)
  • To monitor/mitigate at every level: DSPs, supervisory nodes, the Linux farm, etc.
RTES Development
• The Real Time Embedded Systems group
  – A collaboration of five institutions:
    • University of Illinois
    • University of Pittsburgh
    • Syracuse University
    • Vanderbilt University (PI)
    • Fermilab
• NSF ITR grant ACI-0121658
• Physicists and computer scientists/electrical engineers at BTeV institutions
RTES structure
[Diagram: the design-and-analysis side models algorithms, fault behavior and resources; analyses performance, diagnosability and reliability; and feeds synthesis, feedback and reconfiguration through an experiment control interface. The runtime side is a hierarchy: a Global Fault Manager and a Global Operations Manager at the top; Region Operations Managers and a Region Fault Manager for L2/3 (CISC/RISC) and L1 (DSP); below them, Local Fault Managers and Local Operations Managers supervising trigger algorithms running under ARMOR/Linux and ARMOR/DSP, interconnected by logical data and control networks. The spectrum ranges from soft real time to hard real time.]
GME: data type modeling
• Modeling of Data Types and Structures
• Configure marshalling-demarshalling interfaces for communication
RTES: GME modeling environment
[Screenshots: fault handling, process dataflow, and hardware configuration models]
RTES: GME fault mitigation modeling language (1)
• Configuration of the ARMOR infrastructure (A)
• Modeling of fault mitigation strategies (B)
• Specification of communication flow (C)
[Screenshot regions A, B and C illustrate the three modeling views.]
RTES: GME fault mitigation modeling language (2)
• A model translator generates fault-tolerance strategies and the communication-flow strategy from FMML models
• Strategies are plugged into the ARMOR infrastructure as ARMOR elements
• The ARMOR infrastructure uses these custom elements to provide customized fault-tolerant protection to the application
[Diagram: the translator turns the behavior aspect of an FMML model into a fault-tolerant custom element and a communication custom element sitting on the ARMOR microkernel. The generated behavior code is a state machine of the form
  switch (cur_state) {
    case NOMINAL: if (time < 100) { next_state = FAULT; } break;
    case FAULT:   if (...) { next_state = NOMINAL; } break;
  }
together with a generated C++ callback class that receives fault-injection messages and forwards them to the local ARMOR.]
ARMOR
• Adaptive, Reconfigurable, and Mobile Objects for Reliability:
  – Multithreaded processes composed of replaceable building blocks
  – Provide error detection and recovery services to user applications
• A hierarchy of ARMOR processes forms the runtime environment:
  – System management, error detection, and error recovery services are distributed across ARMOR processes
  – The ARMOR runtime environment is itself self-checking
• 3-tiered ARMOR support of user applications:
  – Completely transparent, external support
  – Enhancement of standard libraries
  – Instrumentation with the ARMOR API
• ARMOR processes are designed to be reconfigurable:
  – Internal architecture structured around event-driven modules called elements
  – Elements provide the functionality of the runtime environment, error-detection capabilities, and recovery policies
  – Deployed ARMOR processes contain only the elements necessary for the required error detection and recovery services
• ARMOR processes are resilient to errors, leveraging multiple detection and recovery mechanisms:
  – Internal self-checking mechanisms to prevent failures from occurring and to limit error propagation
  – State protected through checkpointing
  – Detection of and recovery from errors
• The ARMOR runtime environment is fault-tolerant and scalable:
  – 1-node, 2-node, and N-node configurations
ARMOR system: basic configuration
• Fault Tolerant Manager (FTM): highest-ranking manager in the system
• Heartbeat ARMOR: detects and recovers FTM failures
• Daemons: detect ARMOR crash and hang failures
• ARMOR processes: provide a hierarchy of error detection and recovery; ARMORs are protected through checkpointing and internal self-checking
• Execution ARMOR: oversees an application process (e.g. the various Trigger Supervisors/Monitors)
[Diagram: daemons on each node host the FTM, the Heartbeat ARMOR, and an Exec ARMOR overseeing the application process, all connected by the network.]
EPICS overview
EPICS is a set of software components and tools for developing control systems.
The basic components are:

OPI (clients)
– Operator Interface: a UNIX- or Windows-based workstation which can run various EPICS tools (MEDM, ALH, OracleArchiver).

IOC (server)
– Input/Output Controller: e.g. a VME/VXI-based chassis containing a Motorola 68xxx processor, various I/O modules, and VME modules that provide access to other I/O buses such as GPIB and CANbus.

LAN (communication)
– Local Area Network: the communication network which allows the IOCs and OPIs to communicate. EPICS provides a software component, Channel Access, which provides network-transparent communication between a Channel Access client and an arbitrary number of Channel Access servers.
Hierarchy in a flat system
[Diagram: several IOCs, each running tasks, connected to a client; some IOCs also act as agents for other IOCs.]
• IOCs
  – one IOC per standard CPU (Linux, Lynx, VxWorks)
• Clients
  – on Linux (and Windows)
• Agents
  – segment IOCs that are also clients
Name space architecture!
Local communication (node)
[Diagram: within a node, the IOC's status segment and the tasks exchange commands, messages and shared memory; each task runs a command thread, a working thread and a message thread.]
• Commands are handled by threads
• Execution may take place in the working thread
• The message thread may not be needed
MBS node and monitor IOC
[Diagram: external control communicates asynchronously with the IOC's dispatcher task using text commands; text messages are returned asynchronously through a message server, while a status server delivers the status segment on request.]
Kind of conclusion
• RTES: very big and powerful, but not simply available
  – Big collaboration
  – Fully modelled and simulated using GME
  – ARMORs for maximum fault tolerance and control
• XDAQ: much smaller; installed at GSI
  – Dynamic configurations (XML)
  – Fault tolerance?
• EPICS: from the accelerator controls community; installed at GSI
  – Maybe the best known
  – No fault tolerance
  – Not very dynamic