1 Linux vs HPC : Life (and death) of strange features Brice Goglin RMLL – Talence - 2010/07/09


Linux vs HPC: Life (and death) of strange features

Brice Goglin

RMLL – Talence - 2010/07/09


High Performance Computing

Solving very large/complex problems
• Numerical simulations
  ● Weather forecasting, seismology, …

Huge amount of computation
• A single machine cannot do it
• Interconnect many servers and make them work together

Eats computing power like a black hole
• Computing time does not decrease with bigger machines
  ● Problem size and solution accuracy increase


Linux rules the HPC world

http://top500.org


Open-source in the HPC world

Why Linux for high performance computing?
• Long history of manual tuning/tweaking/modifying the system for better performance
  ● Tune the OS to improve hardware usage
• Strong links with academic world
  ● Many researchers involved, from computer science and other sciences

Open source everywhere in HPC?
• No, many applications/libraries/drivers are proprietary


Linux rules HPC but HPC doesn't rule Linux

Linus doesn't like HPC that much
• or doesn't like what HPC people do

HPC needs strange features that seem very specific

HPC tries to avoid the kernel because it's slow
• or not clever enough

Long history of hacks to improve performance
• Performance at any cost


Linux modified a lot for HPC

Performance records drive development
• Roadrunner broke the Petaflop/s barrier
  ● Who will break the 10 Peta- or Exaflop/s barrier?

Breaking records more important than portability
• There are very few huge computing machines in the world
  ● Portability does not appear so important for these people

Ugly hardware and/or software hacks


HPC doesn't like when Linux tries to be clever

Operating systems are full of tradeoffs
• Desktop/server/embedded/… have different workloads
  ● Cannot support all of them optimally at the same time
• Try to support all of them satisfactorily

Operating systems are full of heuristics
• Need to predict the future to anticipate
  ● Load pages from disks in advance, …
• Try to be clever and guess what the user wants to do

HPC doesn't want any of this
• HPC wants best performance, no tradeoff, no heuristics
• HPC wants Linux to be dumb and just do what we want


Example: Accessing files with O_DIRECT

Operating systems try to reduce disk accesses by reading files earlier and writing files later
• Not always efficient
  ● The kernel doesn't know what the application really wants
• Not good for memory consumption
• Not good when the application does it better
  ● Only the application knows what it really wants
  ● and what it will really do in the future


Example: Accessing files with O_DIRECT

[Figure: three I/O paths from the application down to the disks.
 Application → OS trying to do clever things → Disks: likely OK.
 Application doing clever things → OS trying to do clever things → Disks: not OK.
 Application doing clever things → Disks: OK.]


Example: Accessing files with O_DIRECT

Operating systems try to reduce disk accesses by reading files earlier and writing files later
• Not always efficient
  ● The kernel doesn't know what the application really wants
• Not good for memory consumption
• Not good when the application does it better
  ● The application knows what it really wants
  ● and what it will really do in the future

High-performance applications (database, out-of-core computation, ...) want the kernel to be dumb
• Stop applying these heuristics that don't help us!

A new way to open files was added (O_DIRECT)
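
As an illustration of what opening a file this way looks like from an application, here is a minimal Python sketch. The helper name `read_direct` and the 4096-byte `BLOCK` alignment are assumptions made for the example: real code should query the device's actual block size. O_DIRECT requires the buffer, the offset and the length to be suitably aligned, which is why the buffer comes from an anonymous `mmap` (page-aligned). Some filesystems refuse O_DIRECT, so the sketch falls back to ordinary buffered reads.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; query the real logical block size in production code


def _read_fd(fd, length):
    # An anonymous mmap is page-aligned, which satisfies O_DIRECT's buffer alignment rule.
    with mmap.mmap(-1, length) as buf:
        n = os.readv(fd, [buf])
        return buf[:n]


def read_direct(path, length=BLOCK):
    """Read up to `length` bytes from the start of `path`, bypassing the page cache if possible."""
    if length <= 0 or length % BLOCK:
        raise ValueError("length must be a positive multiple of BLOCK")
    direct = getattr(os, "O_DIRECT", 0)  # O_DIRECT is not defined on every platform
    # First try with O_DIRECT, then fall back to a plain buffered open.
    for flags in (os.O_RDONLY | direct, os.O_RDONLY):
        try:
            fd = os.open(path, flags)
        except OSError:
            continue  # e.g. the filesystem rejects O_DIRECT at open time
        try:
            return _read_fd(fd, length)
        except OSError:
            continue  # e.g. alignment rejected by the device; retry buffered
        finally:
            os.close(fd)
    raise OSError("could not read " + path)
```

The application is now in charge of everything the kernel used to do for it: alignment, read-ahead and caching all become its own problem, which is exactly the tradeoff the slide describes.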


What do people actually think of O_DIRECT?

« The thing that has always disturbed me about O_DIRECT is that the whole interface is just stupid, and was probably designed by a deranged monkey on some serious mind-controlling substances »

Linus Torvalds – man 2 open

What's the actual problem?
• Some nice interfaces exist (e.g. fadvise) but they may be a bit less efficient
  ● And people don't want to rewrite/retune their HPC code for this other interface
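
For comparison, the fadvise-style interface keeps the page cache but lets the application describe its access pattern. The sketch below uses Python's `os.posix_fadvise` wrapper; the helper name `read_sequential` and the 1 MiB chunk size are assumptions for the example, and the kernel is free to ignore the hints.

```python
import os


def read_sequential(path, chunk=1 << 20):
    """Read `path` front to back while hinting the access pattern to the kernel."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0 and len=0 mean "the whole file": expect sequential reads.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        chunks = []
        while True:
            data = os.read(fd, chunk)
            if not data:
                break
            chunks.append(data)
        # Hint that the cached pages will not be needed again.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

Unlike O_DIRECT, nothing here constrains buffer alignment, but the data still goes through the page cache, which is where the "a bit less efficient" remark comes from.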


Another Example: Large Receive Offload

Aggregation of incoming packets
• Process a single big packet instead of many small ones

The history of LRO in Linux isn't really clear
• Custom implementation inside Neterion 10Gbit/s driver
  ● Added to Linux 2.6.17
• Other custom implementations rejected later
  ● Myricom (2.6.19) and Chelsio (2.6.20)
• Kernel maintainers want a generic implementation that all drivers would use!
  ● They're right!
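
The aggregation idea can be shown with a toy model. This is not how any driver or the generic kernel code is written: `Segment`, `coalesce` and the 65535-byte limit are hypothetical names and values chosen for the example, and a real implementation tracks several flows at once and checks TCP headers and flags before merging.

```python
from collections import namedtuple

# Hypothetical, simplified view of a received TCP segment.
Segment = namedtuple("Segment", "flow seq payload")


def coalesce(segments, max_size=65535):
    """Merge runs of adjacent, in-order segments of the same flow into larger ones."""
    merged = []
    for seg in segments:
        last = merged[-1] if merged else None
        if (last is not None
                and last.flow == seg.flow
                and last.seq + len(last.payload) == seg.seq  # directly follows `last`
                and len(last.payload) + len(seg.payload) <= max_size):
            merged[-1] = last._replace(payload=last.payload + seg.payload)
        else:
            merged.append(seg)
    return merged
```

The upper layers then pay their per-packet cost once per merged segment instead of once per wire packet, which is where the performance gain comes from.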


Another Example: Large Receive Offload

History of LRO in Linux isn't really clear (continued)
• Another custom implementation in Netxen driver!?
  ● Hidden and undocumented in big commit in 2.6.20
  ● Not really reviewed by kernel maintainers?

« It was an error on my part […] I would gladly accept patches to rip out the code from NetXen. »

• Generic implementation added in 2.6.24
  ● Pushed by drivers that didn't have a custom implementation
    ● Mostly Intel and Myricom
  ● Others still using their custom implementation
    ● They won't convert to the new generic implementation unless the kernel maintainers remove their custom code by force?


Yet Another Example: TCP Offload Engine

Advanced network cards embed the TCP/IP stack
• Nothing to do in the processor and operating system
  ● Supposedly very nice for performance

Rejected by kernel devs because it's a bad idea
• No coordination/compatibility between duplicated components
  ● Firewall, quality of service, …
• Based on closed-source blackbox firmwares
  ● Unclear security, maintenance, updates, …

http://www.linuxfoundation.org/collaborate/workgroups/networking/toe


So what?

Conflict between (sane?) people
• want nice code and features

and (crazy?) people
• want the best performance ever at any cost
  ● Want random stuff in the kernel for performance reasons
  ● For some HPC people, 10ns is worth uglifying the code
    ● Being non-portable, not supporting some corner-cases, …
    ● Could lead to breakage, security risk, ...

Many discussions that may be constructive or not
• The way HPC features are accepted/rejected isn't so clear


Reasons for rejecting high-performance features

• The whole idea is stupid

• The implementation is wrong

• No user

• Not enough users

• Stupid because the kernel can be cleverer than that, or than your application


Don't try to abuse kernel maintainers

« How about we just remove the RDMA stack altogether ? [...] If you guys can't stay in your sand box and need to cause problems for the normal network stack, it's unacceptable. […] It seems an at least bi-monthly event that the RDMA folks need to put their fingers into something else in the normal networking stack. No more. »


Adding things to the Linux kernel

[Figure: the existing stack (Applications, System Calls, Stuff, Drivers, Hardware) and what gets added to it (New Needs, New System Call, Improved Stuff, New Stuff, New Driver, New Hardware).
 Annotations on the additions: "Easy"; "Careful review: Is the application sane? Is the interface safe?"; "What improvement? Who will use it?"]


The easy way

The InfiniBand stack added in 2.6.11
• High performance networking technology
• Supported by many vendors, described in lengthy specifications, …

Many new drivers
• No problem

New application interface to access new hardware
• Many existing applications already ported
• Only minor technical problems were raised

No intrusive internal changes in the kernel
• Advanced features not included (see later in this talk)


I want my cool feature added in Linux!

You need somebody to use it in the official kernel
• Code that's not used isn't tested/maintained
• If only your external module uses it, you need to add your module to the official kernel first

That's even true for some bugs
• If a bug only occurs with a non-official module, it's not an important bug :)
  ● Some work-arounds in external HPC modules

Isn't InfiniBand the HPC user that you needed?
• Having a more widespread user may be better to convince people that your feature is really useful


Another example: Page Attribute Table (PAT)

Since Pentium III, caching may be tuned precisely
• Eases very fast data transfers from the processor to I/O devices (write-combining)
• Critical for networking latency!
  ● Supported on Windows but not on Linux?!
  ● Lots of hacks in HPC network stacks
    ● Custom non-portable PAT implementations within HPC drivers

InfiniBand (in the kernel) wanted PAT support
• But PAT support required a lot of work in Linux
  ● Discussed and rejected periodically since 2006
• And PAT support is buggy on many old processors


Another example: Page Attribute Table (PAT)

Linux finally got PAT support in 2.6.26
• Not because HPC needed it
• Who else needs high-performance transfer to I/O devices?

GPUs!
• Latest Linux graphics stack pushed PAT support for improved performance
• HPC may now benefit from PAT too :)
  ● No need for ugly custom hacks anymore


MMU Notifiers

HPC needs deep knowledge of virtual memory
• Applications use virtual memory, hardware uses physical
• HPC needs to know how they correspond to each other
  ● It eases data transfers without expensive memory copies

HPC stacks have been hacking the kernel for 10 years to extract this knowledge
• No HPC stack in the kernel, no official user, no way to get official support for this feature

What about now?
• InfiniBand would be a user in the kernel


MMU Notifiers

This feature is highly specific
• Some people think the whole idea is wrong...
• Nobody envisions any usage outside of HPC...

What about virtualization?
• KVM needs similar knowledge of virtual/physical memory correspondence
• KVM is in the kernel
  ● And virtualization is widely used, more than HPC

KVM developers pushed MMU Notifiers in 2.6.27
• Should solve what HPC has been wanting for 10 years!


ummunotify

MMU Notifier is the kernel side
• Some HPC software wants it in user applications too

Hard work to design/implement what HPC still needs
• Not accepted (yet?)

« The interface claims to be generic, but is really just a hack for a single use case that very few people care about. I find the design depressingly stupid, even if the code itself is at least small and simple. […] Can't you crazy RDMA people just agree on an RDMA interface, and making it part of that ? It still makes zero sense outside of that small niche as far as I can tell. »


Summary

HPC uses Linux intensively
• But Linux support for HPC is always very late

HPC has special requirements
• Don't want the kernel to be clever
  ● Want the kernel to let HPC applications do what they want
• Specific needs that are rarely used in any other context
• Makes new features hard to merge in the Linux kernel

Things are getting better on the networking side
• But HPC is more than networking
• e.g. Storage still has problems with POSIX API being too restrictive for parallel file systems


Last example: HPC vs. complex modern architectures

HPC wants to know what the hardware is made of
• Try to exploit the cores and memory in the best way
• Very important on modern machines
  ● Many processors, cores, shared caches, …

Needs Linux to show the hardware structure
• Many things shown in /sys/
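
As a rough idea of what reading this structure from /sys/ involves, the sketch below walks the per-CPU `topology` directories that Linux exports under `/sys/devices/system/cpu`. The helper names `parse_cpulist` and `read_topology` are made up for the example, only three attributes are read, and the parser handles just the simple "0-3,8" list form.

```python
import glob
import os


def parse_cpulist(text):
    """Expand a kernel cpulist string such as "0-3,8" into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_topology(root="/sys/devices/system/cpu"):
    """Map each CPU number to its package id, core id and hardware-thread siblings."""
    topo = {}
    for cpu_dir in glob.glob(os.path.join(root, "cpu[0-9]*")):
        tdir = os.path.join(cpu_dir, "topology")
        try:
            with open(os.path.join(tdir, "physical_package_id")) as f:
                package = int(f.read())
            with open(os.path.join(tdir, "core_id")) as f:
                core = int(f.read())
            with open(os.path.join(tdir, "thread_siblings_list")) as f:
                siblings = parse_cpulist(f.read())
        except (OSError, ValueError):
            continue  # offline CPU, or attribute not exported on this system
        cpu = int(os.path.basename(cpu_dir)[3:])  # strip the "cpu" prefix
        topo[cpu] = {"package": package, "core": core, "siblings": siblings}
    return topo
```

Calling `read_topology()` on a Linux machine returns one entry per online CPU. Caches and NUMA nodes live in other /sys/ directories and are not covered by this sketch.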


Kernel developers want to drive things they can't

AMD Magny-Cours processor adds new type of structure

« First I must say it's unclear to me if CPU topology is really generally useful to export to the user. »

« It would be very nice to propagate this info to where it really matters : the sched-domains topology info. »

Exposing this info to the scheduler is enough?
• Assumes the scheduler is clever enough for HPC
  ● Far from true...


Thanks for your attention!

Questions?
