adaptive partition scheduling

Upload: sumanth-goud

Post on 06-Apr-2018


  • 8/3/2019 Adaptive Partition Scheduling

    1/15

    January 24, 2012

    Adaptive Partition Scheduling

    Part 1: Why we did it

    Cool Stuff from QNX

    A. Danko


    Evolution of schedulers

    Why?

    Timeline

    priority pre-emptive

    Timeslicing

    Time-varying priority

    Really clever time-varying

    Fair Share scheduling

    Adaptive configuration

    Yes, but:

    System locks up

    Backhoes and Mother's Day

    Untuneable for more than 1 application.

    US Military Satcom

    Hard to manage share interactions.

    Not invented until now.

    SCHED_FIFO

    SCHED_RR

    SCHED_SPORADIC


    Evolution: Lessons learned

    Numerical priorities are chosen by applications, but system scheduling behavior must be designed globally.

    Degradation and overload: priorities are not constants. Importance of work depends on circumstances.
    > Modes: normal operation, restart, emergency maintenance

    Scheduling strategy needs to be based on a unit of work, but what we have is communicating threads.

    Must measure real-time behavior.
    > 0.1% accuracy

    Want to specify shares as global percentages.
    > Applications don't get to pick their importance or shares. System engineers do.

    Need to throttle CPU usage without losing real-time latencies.

    Why?


    What is Partitioning?

    General Answer

    Separation of work

    To isolate:
    > CPU usage
    > memory usage
    > system resource usage
    > failures

    QNX Answer

    POSIX-compatible design which can be applied to existing systems with little or no recoding

    A global hard real-time scheduler with overload protection and CPU guarantees
    > Separation of work based on working for a common purpose

    Runtime typed memory and kernel object guarantees and limits
    > With full inheritance and accounting for all children

    Persistent storage (file system) guarantees and limits

    Process model for fault isolation

    Dynamic configuration

    Design

    Adaptive Partition Scheduling


    Principles

    Scheduler must not trigger an overload
    > Overhead may not increase with number of threads

    Real-time during underload
    > Same behavior as today

    Real-time during overload
    > At least for interrupt handling

    Must also be a fair-share scheduler
    > Global scheduler algorithm
    > Globally configured

    Must mesh with current QNX architecture
    > Preemptive priority, individual thread scheduling
    > Heavy use of message passing
    > Easy to drop onto existing applications
    > Can't be a bag on the side

    Simple enough for customers to use
    > Engineerable
    > Reconfigure on the fly

    [Figure: throughput versus offered load]

    Insert picture of Juggling Watermelons here


    Counting time

    What does 14% CPU mean?
    > CPU usage is calculated over a sliding window.

    Accuracy:
    > Counting ticks is not enough. Micro-billing is used to track actual CPU utilization even when threads don't use their whole timeslice.
    > Micro- and nanosecond resolution
    > Threads are billed based on real usage, not statistics

    windowsize is configurable as an argument to the kernel at boot
    > Trade off maximum READY-state latency against accuracy of CPU budgeting
    > 100 ms window -> 1% accuracy or better
    > Internal arithmetic accurate to 0.5% or better

    Partition usage
    > Nanoseconds of CPU time executed during the last sliding window, expressed as a percentage

    Partition budget
    > Guaranteed percentage of CPU time, balanced over the sliding window

    [Figure: sliding window from T = -100 ms to T = now]
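The sliding-window accounting described on this slide can be sketched roughly as follows. This is an illustration only, not the QNX implementation: the class `PartitionUsage`, its methods, and the interval-list representation are all hypothetical, and the 100 ms window and 14% budget are simply the figures quoted on the slide.

```python
from collections import deque


class PartitionUsage:
    """Hypothetical helper: CPU time billed to one partition over a sliding window."""

    def __init__(self, window_ns=100_000_000, budget_percent=14):
        self.window_ns = window_ns            # 100 ms window, as on the slide
        self.budget_percent = budget_percent  # guaranteed share of the window
        self.runs = deque()                   # (start_ns, end_ns) intervals actually executed

    def bill(self, start_ns, end_ns):
        """Micro-billing: record the exact interval a thread ran, not whole ticks."""
        self.runs.append((start_ns, end_ns))

    def usage_percent(self, now_ns):
        """CPU time executed inside [now - window, now], as a percentage of the window."""
        window_start = now_ns - self.window_ns
        # Forget intervals that ended before the window began.
        while self.runs and self.runs[0][1] <= window_start:
            self.runs.popleft()
        used = 0
        for start, end in self.runs:
            # Count only the part of each interval that overlaps the window.
            used += max(0, min(end, now_ns) - max(start, window_start))
        return 100 * used / self.window_ns

    def has_budget(self, now_ns):
        return self.usage_percent(now_ns) < self.budget_percent


p = PartitionUsage()
p.bill(0, 10_000_000)           # ran for 10 ms
p.bill(50_000_000, 54_000_000)  # ran for 4 ms, less than a full timeslice
```

At `now_ns = 100_000_000` both intervals fall inside the window, so the partition has used exactly its 14%. Five milliseconds later the oldest 5 ms have slid out of the window and the partition has budget again.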


    Who's got time: Partition Inheritance

    [Figure: Adaptive Partition 1 (Multi-media) and Adaptive Partition 2 (Java application), each with CPU budget available, sending messages to the receive threads of the File System process]

    Resource manager threads work on behalf of the sender

    Priority and adaptive partition are inherited on receive
    > Execution time in the server is billed to the client's partition

    This allows proper accounting for shared resources
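A toy model may make the inheritance rule concrete. Everything here is hypothetical (the `Thread` class, the `receive` and `run` helpers, the "System" partition name, and the priorities); it only illustrates the idea that a server thread takes on the sender's priority and partition, so its work is billed to the client.

```python
class Thread:
    def __init__(self, name, priority, partition):
        self.name = name
        self.priority = priority
        self.partition = partition


billed_ns = {}  # partition name -> CPU time billed so far


def receive(server, client):
    """On message receive, the server thread inherits the sender's priority and partition."""
    server.priority = client.priority
    server.partition = client.partition


def run(thread, ns):
    """Bill executed time to whichever partition the thread currently belongs to."""
    billed_ns[thread.partition] = billed_ns.get(thread.partition, 0) + ns


# A file system receive thread serves a request from a multi-media thread.
fs = Thread("file system receive thread", priority=10, partition="System")
player = Thread("player", priority=11, partition="Multi-media")

receive(fs, player)
run(fs, 250_000)  # 250 microseconds of server work on the client's behalf
```

After the receive, the 250 microseconds spent in the file system show up under "Multi-media", not under the server's own partition.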


    Real time: Behavior under normal load

    [Figure: Adaptive Partition 1 (Multi-media) and Adaptive Partition 2 (Java application), both with CPU budget available; threads shown as Blocked, Running, or Ready]

    Hard real-time scheduler under normal load

    Running thread selected as highest priority READY thread

    No delay on scheduling if adaptive partition has budget


    Out of time: Behavior under overload

    [Figure: Adaptive Partition 1 (Multi-media) and Adaptive Partition 2 (Java application), one with CPU budget available and the other with CPU budget exceeded; threads shown as Blocked, Running, or Ready]

    Highest priority READY thread in a partition with budget runs

    No delay on scheduling if adaptive partition has budget
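The difference between normal load and overload can be shown with one small selection function. This is a sketch under assumptions: the dictionary layout, the thread names, and the priorities are made up for illustration, and ties between equal priorities are broken arbitrarily (by name) here.

```python
def select_thread(partitions):
    """Return (priority, thread_name) of the highest-priority READY thread
    among partitions that still have budget, or None if there is none."""
    candidates = [
        (prio, name)
        for part in partitions
        if part["has_budget"]
        for prio, name in part["ready"]
    ]
    return max(candidates) if candidates else None


# Normal load: both partitions have budget, so plain priority decides.
normal = [
    {"name": "Multi-media", "has_budget": True, "ready": [(6, "ui"), (9, "decoder")]},
    {"name": "Java", "has_budget": True, "ready": [(10, "jvm")]},
]

# Overload: the Java partition has used up its budget, so its READY
# thread is skipped even though it has the highest priority.
overload = [
    {"name": "Multi-media", "has_budget": True, "ready": [(6, "ui"), (9, "decoder")]},
    {"name": "Java", "has_budget": False, "ready": [(10, "jvm")]},
]
```

With `normal`, the priority-10 thread runs, exactly as a plain priority scheduler would choose. With `overload`, the priority-9 thread in the partition that still has budget runs instead.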


    Free Time: Behavior with unused CPU

    [Figure: Adaptive Partition 1 (Multi-media) and Adaptive Partition 2 (Java application), both with CPU budget exceeded, and Adaptive Partition 3 with CPU budget available; threads shown as Blocked or Running]

    If no partitions with remaining budget have READY threads, the highest priority READY thread is selected to run from the other partitions

    This allows free time to be given based upon priority
    > Free time is still accounted and may have to be paid back (for example, if partition 3 becomes ready within 1 averaging window)
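The free-time rule adds a fallback to the selection step. The sketch below is an assumption-laden illustration, not QNX code: the data layout, names, and priorities are invented, and the function only reports whether the chosen thread is running on free time so that a caller could account for it.

```python
def select_with_free_time(partitions):
    """Return ((priority, thread_name), is_free_time).

    Partitions with budget are served first. Only if none of them has a
    READY thread does the highest-priority READY thread of an over-budget
    partition run, and that time is flagged as free time.
    """
    def best(parts):
        ready = [(prio, name) for p in parts for prio, name in p["ready"]]
        return max(ready) if ready else None

    in_budget = best(p for p in partitions if p["has_budget"])
    if in_budget is not None:
        return in_budget, False
    over_budget = best(p for p in partitions if not p["has_budget"])
    return over_budget, over_budget is not None


# Partitions 1 and 2 are over budget; partition 3 has budget but nothing READY.
idle_p3 = [
    {"name": "P1", "has_budget": False, "ready": [(6, "ui"), (9, "decoder")]},
    {"name": "P2", "has_budget": False, "ready": [(10, "jvm")]},
    {"name": "P3", "has_budget": True, "ready": []},
]

# The same system once a thread in partition 3 becomes READY.
busy_p3 = [
    {"name": "P1", "has_budget": False, "ready": [(6, "ui"), (9, "decoder")]},
    {"name": "P2", "has_budget": False, "ready": [(10, "jvm")]},
    {"name": "P3", "has_budget": True, "ready": [(8, "logger")]},
]
```

While partition 3 is idle, the priority-10 thread gets the CPU as free time. As soon as partition 3 has a READY thread, that thread wins even at priority 8, because its partition still has budget.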


    Borrowed Time: Critical Threads

    [Figure: Adaptive Partition 1 (Multi-media) and Adaptive Partition 2 (Air Bag Control), one with CPU budget available and the other with CPU budget exceeded; threads shown as Blocked, Running, or Ready, with one marked as the Critical Thread]

    Critical threads still run (based on priority) even if partition has no budget

    Critical threads provide deterministic scheduling even in overload

    Critical threads are given a critical budget and can go into short-term debt
    > Critical time is accounted and has to be repaid
    > Exceeding critical budget is considered an error and causes notification/action
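One way to picture the critical budget is as a second, smaller allowance that is only drawn on once the normal budget is gone. The class below is a hypothetical sketch: its name, fields, and the simple flag standing in for "notification/action" are assumptions, and window sliding is left out to keep it short.

```python
class Partition:
    """Hypothetical model of a partition with a normal and a critical budget."""

    def __init__(self, budget_ns, critical_budget_ns):
        self.budget_ns = budget_ns
        self.critical_budget_ns = critical_budget_ns
        self.used_ns = 0           # all time billed, including critical time
        self.critical_used_ns = 0  # time run while out of normal budget
        self.critical_budget_exceeded = False

    def may_run(self, thread_is_critical):
        """Ordinary threads need budget; critical threads may run without it."""
        if self.used_ns < self.budget_ns:
            return True
        return thread_is_critical

    def bill(self, ns, thread_is_critical):
        out_of_budget = self.used_ns >= self.budget_ns
        # Critical time is still accounted, so it has to be repaid later.
        self.used_ns += ns
        if thread_is_critical and out_of_budget:
            self.critical_used_ns += ns
            if self.critical_used_ns > self.critical_budget_ns:
                # Stand-in for the error notification the slide mentions.
                self.critical_budget_exceeded = True


airbag = Partition(budget_ns=10_000_000, critical_budget_ns=2_000_000)
airbag.bill(10_000_000, thread_is_critical=False)  # normal budget used up
```

From this point an ordinary thread in the partition may not run, but a critical one may. Billing 1.5 ms of critical time stays within the 2 ms critical budget; a further 1 ms pushes it over and sets the error flag.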


    Equal time.

    How to choose between partitions of equal priority
    > Unimportant?
    > Many threads run at default priority, therefore equal priority

    Possible algorithms:
    > Round robin
    > Favor partition with most free time
    > Favor longest waiter

    Requirement:
    > Minimize latencies during underload
    > Would be nice (WBN): divide free time by % CPU share.

    Solution: Interleave partitions by ratio of partition shares

    We found a clever way to do that, so it's in the patent.
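The slides do not describe the patented method, so the following is emphatically not it. It is a generic proportional-interleaving sketch (similar in spirit to stride scheduling) that shows what "interleave by ratio of partition shares" could look like; the function name, the slot abstraction, and the 70/30 shares are all assumptions.

```python
from fractions import Fraction


def interleave(shares, slots):
    """Hand out `slots` turns among equal-priority partitions so that each
    partition's turns are spread out in proportion to its share."""
    given = {name: 0 for name in shares}
    order = []
    for _ in range(slots):
        # Pick the partition whose next turn would leave it least ahead of
        # its share. Fractions keep the comparison exact; ties go by name.
        name = min(shares, key=lambda n: (Fraction(given[n] + 1, shares[n]), n))
        given[name] += 1
        order.append(name)
    return order


order = interleave({"A": 70, "B": 30}, slots=10)
```

Over ten turns partition A gets seven and partition B gets three, and B's turns are spread through the sequence instead of being bunched at the end.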


    How it does it

    [Figure: block diagram with uKernel, libmod_aps.a, process creation, messaging, per-partition ready queues, scheduler, clock interrupt handler, and the calls ready(), block(), and select_thread()]

    select_thread():

    For all partitions p, define the merit
        m(p) -> (bud(p) || crit(p), prio(p), run_t / wsize / bud(p))

    Then schedule partition p_s, where
        rdy(p_s) and m(p_s) < m(p_i) for all i != s
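The merit tuple lends itself to a lexicographic comparison. The sketch below is one reading of the slide, with assumptions stated plainly: the slide writes "<" without defining the ordering, so this version builds a key where larger means "ranks ahead" (has budget or is critical, then higher priority, then smaller fraction of budget used). The dictionary fields mirror the slide's names, and the numbers are invented.

```python
def merit(p):
    """Sort key for a partition; larger ranks ahead (assumed ordering)."""
    return (
        p["has_budget"] or p["critical"],     # bud(p) || crit(p)
        p["prio"],                            # prio(p): best READY thread in p
        -(p["run_t"] / p["wsize"] / p["bud"]),  # less of the budget used is better
    )


def select_partition(partitions):
    """Name of the READY partition with the best merit, or None."""
    ready = [p for p in partitions if p["ready"]]
    return max(ready, key=merit)["name"] if ready else None


def make(name, has_budget, critical, prio, run_t, bud):
    # run_t and wsize are in ms here; bud is the budget as a fraction.
    return {"name": name, "ready": True, "has_budget": has_budget,
            "critical": critical, "prio": prio, "run_t": run_t,
            "wsize": 100, "bud": bud}


parts = [
    make("A", True, False, 10, 30, 0.7),   # used about 43% of its budget
    make("B", True, False, 10, 20, 0.3),   # used about 67% of its budget
    make("C", False, False, 30, 40, 0.2),  # over budget, high priority
]
parts_critical = parts[:2] + [make("C", False, True, 30, 40, 0.2)]
```

In `parts`, C is out of budget and loses despite its priority; A and B tie on priority, and A wins because it has used less of its budget. In `parts_critical`, C's READY thread is critical, so the first tuple element is true and its priority decides.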


    Overhead: Fancy, but is it fast?

    Scheduling overhead increases with:
    > Number of partitions
    > Number of messages/sec
    > Number of clock interrupts/sec, i.e. ClockPeriod()
    > Does not increase with number of threads

    Free or almost free operations:
    > Inheriting partition as part of message receive
    > Joining a thread to a partition
    > Dynamically changing budgets

    Computational requirements
    > 32-bit multiply, 64-bit add
    > No floating point
    > No divides
    > No address space swapping
    > Short-circuit calculation of merit function
    > No inter-CPU messaging on SMP
    > History-less algorithm

    Overhead typically 1% of total CPU
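The "no floating point, no divides" constraint is usually met by cross-multiplying instead of dividing. The two helpers below illustrate that general technique; whether QNX does exactly this is not stated in the slides, and the function names are hypothetical. Python integers are unbounded, so the 32-bit and 64-bit widths are not modelled here.

```python
def within_budget(used_ns, window_ns, budget_percent):
    """used_ns / window_ns < budget_percent / 100, using multiplies only."""
    return used_ns * 100 < budget_percent * window_ns


def uses_less_of_budget(run_a, bud_a, run_b, bud_b):
    """run_a / bud_a < run_b / bud_b, using multiplies only.

    Valid for positive budgets: multiplying both sides of the inequality
    by bud_a * bud_b does not change its direction.
    """
    return run_a * bud_b < run_b * bud_a
```

For example, 13 ms used in a 100 ms window is within a 14% budget and 14 ms is not, and a partition that ran 30 ms on a 70% share has used less of its budget than one that ran 20 ms on a 30% share.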


    Any queries?
