Control in ATLAS TDAQ Dietrich Liko on behalf of the ATLAS TDAQ Group

Page 1: Control in ATLAS TDAQ

Control in ATLAS TDAQ

Dietrich Liko on behalf of the ATLAS TDAQ Group

Page 2: Control in ATLAS TDAQ

CHEP04 - Interlaken Control of the ATLAS TDAQ system 2

Overview

The ATLAS TDAQ System: Dataflow & HLT

Control: Subsystem of the Online Software Architecture; TDAQ Wide Run Control Group

Technology Choice: CLIPS

Design & Implementation: Expert System Framework; Run Control, Supervision & Verification

Testing & Verification: Test beam; Scalability Tests

Page 3: Control in ATLAS TDAQ

The ATLAS TDAQ System

Dataflow: ROD, ROS

LVL1

HLT: LVL2, Event Filter

Online System: Operation

DCS: Detector control

Test beam: see [331] Event Building

Performance: see [217]

Page 4: Control in ATLAS TDAQ

Control Aspects

Dataflow: Fixed configuration; Synchronization, classical Run Control; Error handling

High Level Triggers: Flexible configuration; Synchronization; Error handling

Page 5: Control in ATLAS TDAQ

ATLAS Online Software

Component Architecture: Object Oriented, C++ and Java; Distributed system (CORBA); XML for Configuration

Specialized services for a TDAQ system: Information sharing, Message Reporting, Configuration

Iterative Development Model: Prototype already in use (laboratories, test beam, scalability tests); Evolution into the system for the initial ATLAS system

Page 6: Control in ATLAS TDAQ

Online Software Architecture

In the context of the iterative development cycle and the Technical Design Review: Reevaluation of requirements and architecture; Several high level packages & corresponding subsystems

Control: Supervision, Verification

Databases (see [130]): Configuration, Conditions

Information Sharing (see [166]): Information Service, Message Service, Monitoring

Page 7: Control in ATLAS TDAQ

Control Subsystem

In the following only the Supervision subsystem is discussed

Page 8: Control in ATLAS TDAQ

Supervision

Initialization and Shutdown is responsible for: initialization of TDAQ hardware and software components; re-initialization of a part of the TDAQ partition when necessary; shutting the TDAQ partition down gracefully; TDAQ process supervision.

Run Control is responsible for: controlling the run by accepting commands from the user and sending commands to TDAQ sub-systems; analyzing the status of controlled sub-systems and presenting the status of the whole TDAQ to the operator.

Error Handling is concerned with: analyzing run-time error messages coming from TDAQ sub-systems; diagnosing problems, proposing recovery actions to the operator, or performing automatic recovery if requested.

Page 9: Control in ATLAS TDAQ

TDAQ Wide Run Control group

Examines the requirements from the subsystem side: Dataflow, HLT

Hierarchical concept: Follows the overall organization of the TDAQ system

Controller as central element: All control functionality in a combined controller; State machine concept for synchronization; Flexibility in error handling; User customization
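The hierarchical, state-machine-based controller concept can be illustrated with a small Python sketch. It is not the ATLAS implementation: the state names (idle, configured, running) appear later in the slides, while the command names, the `Controller` class and the children-first ordering are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Allowed transitions: (current state, command) -> next state.
# State names follow the slides; the command names are assumptions.
TRANSITIONS = {
    ("idle", "configure"): "configured",
    ("configured", "start"): "running",
    ("running", "stop"): "configured",
    ("configured", "unconfigure"): "idle",
}


@dataclass
class Controller:
    """Hypothetical controller node in a run control tree."""
    name: str
    children: list = field(default_factory=list)
    state: str = "idle"

    def command(self, cmd):
        """Apply cmd to the whole subtree, children first."""
        key = (self.state, cmd)
        if key not in TRANSITIONS:
            raise ValueError(
                f"{self.name}: '{cmd}' not allowed in state '{self.state}'"
            )
        for child in self.children:
            child.command(cmd)
        # The parent changes state only after all children succeeded.
        self.state = TRANSITIONS[key]

    def states(self):
        """Return {controller name: state} for the whole subtree."""
        result = {self.name: self.state}
        for child in self.children:
            result.update(child.states())
        return result
```

A tree such as `Controller("root", [Controller("dataflow"), Controller("hlt")])` would then be driven with `root.command("configure")` followed by `root.command("start")`; a command that is not valid in the current state raises an error instead of leaving the tree half-synchronized.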

Page 10: Control in ATLAS TDAQ

Initial Design & Technology Choice

A Run Control implementation is based on a State Machine model and uses the State Machine compiler, CHSM, as underlying technology.
Reference: P.J. Lucas, An Object-Oriented Language System for Implementing Concurrent, Hierarchical, Finite State Machines, MS Thesis, University of Illinois (1993)

A Supervisor is mainly concerned with process management. It has been built using the open source expert system CLIPS.
Reference: CLIPS, A Tool for Building Expert Systems, http://www.ghg.net/clips/CLIPS.html

A Verification system (DVS) performs tests and provides diagnosis. It is also based on CLIPS.

Page 11: Control in ATLAS TDAQ

Experiences

PLUS: Scalability test in 2002 demonstrated that a system of the size of the ATLAS TDAQ system can be controlled

MINUS: Lack of flexibility (CHSM)

Page 12: Control in ATLAS TDAQ

Technologies

CLIPS: Production system, standard open source expert system; the so-called Rete algorithm drives the evaluation of rules on a set of facts; in-house experience; general purpose scripting language with OO features; C language bindings

Alternatives: Jess (Java based, very similar to CLIPS); Eclipse (commercial evolution of CLIPS)

SMI++: State machine; no general purpose scripting language; difficult to integrate in our environment

Python: Excellent scripting language; no expert system
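To make the idea of a production system concrete, the following Python sketch evaluates rules on a set of facts until nothing new can be derived. It is a naive match-and-fire loop, not the Rete algorithm and not CLIPS syntax; the fact strings and the rule set are invented for illustration.

```python
def run_rules(facts, rules):
    """Naive forward chaining: fire rules until no rule adds a new fact.

    Each rule is a pair (conditions, conclusion): when every condition is
    present in the fact set, the conclusion is asserted as a new fact.
    """
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts


# Hypothetical rules for a controller with two children.
RULES = [
    (frozenset({"child-a configured", "child-b configured"}),
     "all-children configured"),
    (frozenset({"all-children configured", "command start"}),
     "forward start"),
]
```

A Rete-based engine such as CLIPS avoids re-testing every rule against every fact on each pass; the loop above trades that efficiency for brevity.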

Page 13: Control in ATLAS TDAQ

Design & Implementation

General Framework: embedding CLIPS in a CORBA server; Periodic evaluation of knowledge base; Extension mechanism

Online Software components embedded as plug-ins

Control functionality fully described by CLIPS rules
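The framework idea (plug-ins feed a knowledge base that is evaluated periodically) might be sketched as follows. This is only an illustration under stated assumptions: the `Framework` class, its method names, and the shape of facts and actions are hypothetical, and plain Python callables stand in for both CLIPS rules and the CORBA server.

```python
import time


class Framework:
    """Hypothetical skeleton: plug-ins supply facts, rules produce actions."""

    def __init__(self):
        self.plugins = []  # callables returning a set of facts
        self.rules = []    # callables mapping a fact set to a list of actions

    def register_plugin(self, plugin):
        self.plugins.append(plugin)

    def add_rule(self, rule):
        self.rules.append(rule)

    def evaluate_once(self):
        """Collect facts from all plug-ins, then evaluate every rule."""
        facts = set()
        for plugin in self.plugins:
            facts |= plugin()
        actions = []
        for rule in self.rules:
            actions.extend(rule(facts))
        return actions

    def run(self, cycles, period=1.0, sleep=time.sleep):
        """Periodic evaluation: one pass per cycle, then wait `period` seconds."""
        log = []
        for _ in range(cycles):
            log.append(self.evaluate_once())
            sleep(period)
        return log
```

A plug-in could, for example, report `("process", "p1", "dead")` and a rule could answer with `("restart", "p1")`; keeping the rules separate from the plug-ins reflects the slide's point that the control functionality lives in the rules.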

Page 14: Control in ATLAS TDAQ

Proxy Objects

Represent external entities: Other controllers, processes etc.; Member attributes exposed to the expert system as facts; Member functions implement functionality in terms of Online Software components

Example: Proxy objects represent child controllers; State of the object corresponds to the state of the child (idle, configured, running); Commands are forwarded to child controllers
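A proxy object of this kind could look roughly like the Python sketch below. The class name, the fact tuple layout, the `send` callable (a stand-in for the remote call to the child) and the sample rule are all assumptions for illustration, not the framework's actual interface.

```python
class ChildProxy:
    """Hypothetical proxy for one child controller."""

    def __init__(self, name, send):
        self.name = name
        self.state = "idle"  # mirrors the state of the child
        self._send = send    # stand-in for the remote call to the child

    def facts(self):
        """Expose member attributes to the rule engine as facts."""
        return {("child", self.name, "state", self.state)}

    def forward(self, command):
        """Forward a command to the child and record the state it reports."""
        self.state = self._send(self.name, command)


def configure_idle_children(proxies):
    """Sample rule: every child whose fact says 'idle' receives 'configure'."""
    fired = []
    for proxy in proxies:
        if ("child", proxy.name, "state", "idle") in proxy.facts():
            proxy.forward("configure")
            fired.append(proxy.name)
    return fired


def fake_send(name, command):
    """Test double for the remote child: returns the state after the command."""
    return {"configure": "configured", "start": "running"}[command]
```

The rule only looks at facts and only acts through `forward`, so the same rule works whether the proxy talks to a real child controller or to a test double such as `fake_send`.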

Page 15: Control in ATLAS TDAQ

Controller

Diagram labels: Proxy Objects; Other Controllers; External processes

Rules drive interactions between objects

Page 16: Control in ATLAS TDAQ

Status

Supervisor: Uses Framework

Run Control: Uses Framework

Verification system: CLIPS based

Choice of a common technology drives the path to a unified control system based on Controllers

Page 17: Control in ATLAS TDAQ

Scalability Test 2004

Test bed: Up to 330 PCs of the CERN IT LXSHARE; 600 to 800 MHz to 2.4 GHz; Dual Pentium III; 256 to 512 MB; Linux RedHat 7.3

Only control aspect verified: No Dataflow network

Various configurations: Servers on standard machines; Servers on dedicated high end machines

Page 18: Control in ATLAS TDAQ

Supervisor – Process Management

Diagram labels: Supervisor; P; P; P

One Supervisor; PMG Agents

Startup limited by initialization of processes

Enhanced recovery procedures

Page 19: Control in ATLAS TDAQ

Startup with 1000 Controllers & 3000 processes in 40 to 100 seconds

Several configurations: mon_standard has two additional processes for a controller

Page 20: Control in ATLAS TDAQ

Run Control

Usual RC tree: Actually 10 controllers on the lowest level; Variation of the number of intermediate nodes

Some central infrastructure: Name Service (IPC); Information Sharing

Page 21: Control in ATLAS TDAQ

Transitions

7 internal phases; With 1000 Controllers: 2 to 6 seconds; No “real life” actions

Again: More flexible error handling

Page 22: Control in ATLAS TDAQ

Combined Testbeam 2004

Stable operation from the start – Advantage of the component model

Page 23: Control in ATLAS TDAQ

Conclusions

New assessment of requirements: Overall Architecture; Controller studied in detail

CLIPS confirmed as technology choice

Design and implementation of a new framework

First test of new systems: Test beam; Scalability test

We can control a system of the size of the ATLAS TDAQ system; Much more flexible system

Common technology in various control components: Unified controllers in the future