9p overview

Post on 14-Dec-2014

DESCRIPTION

Overview of the 9P Protocol

TRANSCRIPT

Page 1: 9P Overview

© 2010 IBM Corporation

IBM Research

9P Overview

Eric Van Hensbergen
IBM Austin Research Lab ([email protected])

Page 2: 9P Overview

Agenda

• Historical Background (Plan 9 & Inferno)
• 9P Protocol Basics
• Extensions
• Linux Client Code Overview

Page 3: 9P Overview

Historical Background

• Plan 9 from Bell Labs was a distributed operating system developed as a successor to UNIX starting in the mid-1980s.
• The primary motivation for Plan 9 was to rethink operating systems in light of pervasive networking (networking was added as an afterthought to the original UNIX).
• Plan 9 resources were scattered across a cluster of machines, with each machine having a role (Terminal, CPU Server, Auth Server, File Server).
• Inferno was a commercial venture based on Plan 9 which provided Plan 9’s environment tightly coupled with a virtual machine on both native and hosted (Linux, BSD, Windows) platforms.

Page 4: 9P Overview

Plan 9 Trivia

• Supported Multiple Hosts, but only 32-bit
  • x86, MIPS, Alpha, SPARC, PowerPC, ARM
• Native Support for UTF-8 from inception
• Own Tool Set (Ken Thompson’s C compilers)
• Some Kernel Stats
  • 37 syscalls
  • 178,738 lines of code amongst all ports (38k lines portable)
  • optional real-time scheduler
• User development environment primarily C and Alef
  • ANSI/POSIX Emulation environment available
• Open sourced (Lucent Public License 1.02)

Page 5: 9P Overview

Plan 9 Core Design Concepts

• All Resources Represented as File Hierarchies
  • System Resources: processes, devices, networking stack
  • System Services: DNS, Window System, Plumbing
  • Application Services: Editor Interfaces, Plumbing
• Namespaces
  • private, per-process by default
  • user manipulatable
  • bind and union directories
• Standard Communication Protocol
  • a standard protocol, 9P, used to access both local and remote resources

Page 6: 9P Overview

Implication of Design Concepts

• Since all resources were exposed as file hierarchies and remote hierarchies could be accessed via 9P
  • remote resources could be accessed as easily as local ones (audio, graphics, network) without specialized protocols for each
• Since namespaces were private and per-process
  • individual users could compose namespaces of local and remote resources, and subsequent applications could access those resources transparently
  • individual applications could do this as well without affecting other applications (each window in the window manager had its own namespace)

Page 7: 9P Overview

9P Protocol Basics

• Based around core Plan 9 System Call I/O operations
  • Local operations degrade to function calls
  • Remote operations closer to proxy operations
• Pure request/response RPC model
• Transport Independent
  • only requires reliable, in-order delivery mechanism
  • can be secured with authentication, encryption, & digesting
• By default, requests are non-cached, avoiding coherence problems and race conditions
• Design stresses keeping things simple, resulting in small and efficient clients and servers

Page 8: 9P Overview

9P Protocol Terms and Structures

• tag - numeric identifier for multiplexing operations
• fid - numeric identifier for file system entities
  • represent transient position in filesystem (directory or files)
  • also represent open files
  • transient fids can be navigated or queried for metadata; open fids can only be used for I/O operations (read, write, close)
• qids
  • qid.type: type of qid (directory, file, etc.)
  • qid.path: unique per-entity identifier
  • qid.version: monotonically increasing file version
• stat - metadata structure (directories or files)
• strings - always size prefixed
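
The qid and string encodings above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the helper names are hypothetical, and the byte widths (a 2-byte length prefix for strings, and type[1] version[4] path[8] for a qid, all little-endian) are taken from the 9P2000 specification rather than from the slide.

```python
import struct

QTDIR = 0x80  # qid.type bit that marks a directory (per the 9P2000 spec)

def pack_string(text: str) -> bytes:
    """Encode a 9P string: 2-byte little-endian length, then UTF-8 bytes."""
    data = text.encode("utf-8")
    return struct.pack("<H", len(data)) + data

def unpack_string(buf: bytes, offset: int = 0):
    """Decode a 9P string at offset; return (text, offset of next field)."""
    (length,) = struct.unpack_from("<H", buf, offset)
    start = offset + 2
    return buf[start:start + length].decode("utf-8"), start + length

def pack_qid(qtype: int, version: int, path: int) -> bytes:
    """Encode a 13-byte qid: type[1] version[4] path[8]."""
    return struct.pack("<BIQ", qtype, version, path)

def unpack_qid(buf: bytes, offset: int = 0):
    """Decode a qid at offset; return (type, version, path)."""
    return struct.unpack_from("<BIQ", buf, offset)
```

Because every string carries its own length, a decoder can step through a message field by field without needing terminators.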

Page 9: 9P Overview

9P Basics: Protocol Overview

General message layout: size op tag ...

Twrite: size Twrite tag fid offset count data

Rwrite: size Rwrite tag count

fid: numeric pointer to a path element or open file

tag: numeric transaction id for multiplexing

Protocol Specification Available: http://ericvh.github.com/9p-rfc/
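
As a rough sketch of the layout above, the following Python builds a Twrite and checks an Rwrite. The function names are hypothetical; the field widths (size[4] type[1] tag[2] fid[4] offset[8] count[4]) and the type codes 118 and 119 come from the 9P2000 specification, not from the slide.

```python
import struct

TWRITE, RWRITE = 118, 119  # message type codes from the 9P2000 spec

def pack_twrite(tag: int, fid: int, offset: int, data: bytes) -> bytes:
    """Build size[4] Twrite tag[2] fid[4] offset[8] count[4] data[count]."""
    body = struct.pack("<BHIQI", TWRITE, tag, fid, offset, len(data)) + data
    # The size field counts the whole message, including itself.
    return struct.pack("<I", 4 + len(body)) + body

def unpack_rwrite(msg: bytes):
    """Parse size[4] Rwrite tag[2] count[4]; return (tag, count)."""
    size, mtype, tag, count = struct.unpack("<IBHI", msg)
    if size != len(msg) or mtype != RWRITE:
        raise ValueError("not a well-formed Rwrite")
    return tag, count
```

The client matches the reply to its request by tag, and compares the returned count with the number of bytes it sent to detect a short write.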

Page 10: 9P Overview

9P Basics: Operations

Session Management

– Version: protocol version and capabilities negotiation

– Attach: user identification and session option negotiation

– Auth: user authentication enablement

– Walk: hierarchy traversal and transaction management

– Clunk: forget about a fid

Error Management

– Error: a pending request triggered an error

– Flush: cancel a pending request

Metadata Management

– Stat: retrieve file metadata

– Wstat: write file metadata

File I/O

– Create: atomic create/open

– Open, Read, Write, Close

– Directory read packaged w/read operation (Reads stat information with file list)

– Remove

Page 11: 9P Overview

version

size Tversion tag msize version

size Rversion tag msize version

Initial tag is always (ushort)~0.

msize defines the maximum length in bytes of any single 9P message.

The version string (size prefixed) must always begin with 9P. If the server doesn’t recognize the version, it responds with version=unknown and the client retries until it gets a match. The version of 9P is specified by the 4 characters after 9P (e.g. 9P2000).

Optional extensions are specified by dot suffixes (9P2000.u and 9P2000.L).
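
A minimal sketch of the version exchange, assuming the 9P2000 field widths (msize[4], 2-byte string length) and type codes 100 and 101, which come from the specification rather than the slide. The function names are illustrative only.

```python
import struct

TVERSION, RVERSION = 100, 101  # type codes from the 9P2000 spec
NOTAG = 0xFFFF                 # (ushort)~0, the tag used for version

def pack_tversion(msize: int, version: str = "9P2000") -> bytes:
    """Build size[4] Tversion tag[2] msize[4] version[s]."""
    v = version.encode("utf-8")
    body = struct.pack("<BHI", TVERSION, NOTAG, msize)
    body += struct.pack("<H", len(v)) + v
    return struct.pack("<I", 4 + len(body)) + body

def unpack_rversion(msg: bytes):
    """Parse an Rversion; return (msize, version)."""
    size, mtype, tag, msize, vlen = struct.unpack_from("<IBHIH", msg)
    if mtype != RVERSION:
        raise ValueError("not an Rversion")
    return msize, msg[13:13 + vlen].decode("utf-8")

def negotiated(client_msize: int, rversion: bytes):
    """Return (msize, version) to use, or None if the server said 'unknown'."""
    msize, version = unpack_rversion(rversion)
    if version == "unknown":
        return None  # caller may retry with a different version string
    return min(client_msize, msize), version
```

A client that gets `None` back would typically retry with a plainer version string, for example falling back from 9P2000.u to 9P2000.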

Page 12: 9P Overview

auth

size Tauth tag afid uname aname

size Rauth tag aqid

The client selects an afid to represent the authentication channel for a particular user (identified by uname) and attach parameter (aname).

The auth protocol is not defined by 9P. Once it is complete, the afid is presented in a subsequent attach message. The same validated afid may be used for multiple attach messages with the same uname and aname.

Page 13: 9P Overview

attach

size Tattach tag fid afid uname aname

size Rattach tag qid

Serves as an introduction from the user to the server.

fid: chosen initially by client
uname: identifies user to server
aname: identifies an attach parameter (optional)
afid: identifies previously negotiated authentication channel (set to (u32int)~0 if client doesn’t wish to authenticate)
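
For illustration, a Tattach without authentication might be assembled as follows. This is a sketch under assumptions: the helper names are hypothetical, and the type code 104 and the 4-byte fid/afid widths are taken from the 9P2000 specification.

```python
import struct

TATTACH = 104        # type code from the 9P2000 spec
NOFID = 0xFFFFFFFF   # (u32int)~0: no authentication channel

def pstr(text: str) -> bytes:
    """9P string: 2-byte little-endian length prefix plus UTF-8 bytes."""
    data = text.encode("utf-8")
    return struct.pack("<H", len(data)) + data

def pack_tattach(tag: int, fid: int, uname: str,
                 aname: str = "", afid: int = NOFID) -> bytes:
    """Build size[4] Tattach tag[2] fid[4] afid[4] uname[s] aname[s]."""
    body = struct.pack("<BHII", TATTACH, tag, fid, afid)
    body += pstr(uname) + pstr(aname)
    return struct.pack("<I", 4 + len(body)) + body
```

Calling `pack_tattach(0, 2, "ericvh")` leaves afid at its default, which tells the server the client is not presenting an authentication channel.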

Page 14: 9P Overview

flush

size Tflush tag oldtag

size Rflush tag

Flush is sent to the server to cancel an outstanding operation (specified by oldtag).

The server always sends Rflush.

It is permitted for the server to have already sent a response and still send Rflush.

If the client receives a response before Rflush, it must honor the response.

It is also permitted to Flush a Flush; the server must handle flush requests in order.

Tag may not be reused until all Rflush have returned.
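
The tag rules above are easy to get wrong on the client side, so here is a small bookkeeping sketch. The `TagTable` class and its method names are hypothetical and only model the rules listed on this slide; real clients such as v9fs are organized differently.

```python
NOTAG = 0xFFFF  # reserved for version messages, never allocated here

class TagTable:
    """Tracks which tags are busy, honoring the flush rules (illustrative)."""

    def __init__(self):
        self.in_use = {}    # tag -> state: "request", "flush" or "answered"
        self.flush_of = {}  # flush tag -> oldtag it is trying to cancel

    def alloc(self, kind: str = "request") -> int:
        """Reserve the lowest free tag."""
        tag = next(t for t in range(NOTAG) if t not in self.in_use)
        self.in_use[tag] = kind
        return tag

    def send_flush(self, oldtag: int) -> int:
        """Record a Tflush for oldtag and return the flush's own tag."""
        ftag = self.alloc("flush")
        self.flush_of[ftag] = oldtag
        return ftag

    def on_reply(self, tag: int) -> bool:
        """Handle a normal R-message; True means the reply must be honored."""
        if tag not in self.in_use:
            return False
        if tag in self.flush_of.values():
            # Reply raced the flush: honor it, keep the tag reserved.
            self.in_use[tag] = "answered"
        else:
            del self.in_use[tag]
        return True

    def on_rflush(self, ftag: int) -> None:
        """Handle Rflush; free oldtag only when no other flush targets it."""
        oldtag = self.flush_of.pop(ftag)
        del self.in_use[ftag]
        if oldtag not in self.flush_of.values():
            self.in_use.pop(oldtag, None)
```

The key design point is that a reply arriving before Rflush is delivered to the caller, yet the tag stays reserved until every flush aimed at it has been answered.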

Page 15: 9P Overview

error

size Rerror tag ename

Rerror sent in response to report errors on other operations.

Plan 9 errors returned as strings from the server.
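
A client usually turns an Rerror into a native exception. The sketch below assumes the 9P2000 type code 107 and the usual 2-byte string length prefix; `NinePError` and `check_reply` are hypothetical names.

```python
import struct

RERROR = 107  # type code from the 9P2000 spec

class NinePError(Exception):
    """Raised when the server answers a request with Rerror."""

def check_reply(msg: bytes) -> bytes:
    """Return msg unchanged, or raise NinePError if it is an Rerror."""
    size, mtype, tag = struct.unpack_from("<IBH", msg)
    if mtype == RERROR:
        (length,) = struct.unpack_from("<H", msg, 7)
        raise NinePError(msg[9:9 + length].decode("utf-8"))
    return msg
```

Because the error is a string rather than a number, non-Plan 9 clients have to map the text onto their own error codes, which is one of the gaps the Unix extensions address.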

Page 16: 9P Overview

walk - fid creation and navigation

size Twalk tag fid newfid nwname wname ...

size Rwalk tag nwqid qid ...

New fids are created by a walk with no name arguments (nwname=0). This is also known as a ‘clone’ operation for historical reasons.

Walks with fid=newfid move the fid around the fs hierarchy, following the path specified by the nwname wname elements.

Walks can both create and navigate fids (newfid is navigated).

Partial path resolution failures return nwqid < nwname (with qids for the path elements walked successfully).

Dot-dot (..) and dot (.) are treated specially, meaning the parent directory and the current directory respectively.
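
The walk message and the partial-walk check can be sketched as below. Names are hypothetical; the type codes 110 and 111, the 13-byte qid, and the limit of 16 names per Twalk are taken from the 9P2000 specification rather than from the slide.

```python
import struct

TWALK, RWALK = 110, 111  # type codes from the 9P2000 spec
MAXWELEM = 16            # spec limit on names in a single Twalk

def pack_twalk(tag: int, fid: int, newfid: int, names) -> bytes:
    """Build size[4] Twalk tag[2] fid[4] newfid[4] nwname[2] wname[s]..."""
    if len(names) > MAXWELEM:
        raise ValueError("too many path elements for one Twalk")
    body = struct.pack("<BHIIH", TWALK, tag, fid, newfid, len(names))
    for name in names:
        raw = name.encode("utf-8")
        body += struct.pack("<H", len(raw)) + raw
    return struct.pack("<I", 4 + len(body)) + body

def unpack_rwalk(msg: bytes):
    """Parse an Rwalk; return (tag, [(type, version, path), ...])."""
    size, mtype, tag, nwqid = struct.unpack_from("<IBHH", msg)
    if mtype != RWALK:
        raise ValueError("not an Rwalk")
    qids = [struct.unpack_from("<BIQ", msg, 9 + 13 * i) for i in range(nwqid)]
    return tag, qids

def walk_succeeded(names, qids) -> bool:
    """A walk is complete only if one qid came back per requested name."""
    return len(qids) == len(names)
```

For example, `pack_twalk(0, 4, 5, [])` is the clone case: it asks the server to make fid 5 refer to the same place as fid 4 without moving anywhere.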

Page 17: 9P Overview

clunk - fid reclamation

size Tclunk tag fid

size Rclunk tag

sent when a fid is no longer needed, client may reuse fid as a newfid for other operations

even if clunk returns an error, fid is no longer valid

typically invoked on a close, but also invoked when a transient reference is no longer needed

Page 18: 9P Overview

Entity Operations

• Create, Open, Read, Write, Remove, Stat, Wstat
  • basically what you would think
• Create functions as atomic create/open operation
  • Plan 9 has special open modes for exclusive access, append only, and temporary files.
• No special dirread function, just open & read directory
  • returns integral number of stat structures, one for every file in the directory
• Rename within directory accomplished with Wstat
  • renames outside a single directory are non-atomic
• Read/Write include offsets in operation
• Wstat can selectively set attributes by using “don’t touch” values
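
Since a directory read returns a whole number of stat structures, a client can split the Rread payload by following each entry's size prefix. The sketch below assumes the plain 9P2000 stat layout from the specification (size[2] type[2] dev[4] qid[13] mode[4] atime[4] mtime[4] length[8] name[s] uid[s] gid[s] muid[s]); the function names are hypothetical and the 9P2000.u extra fields are not handled.

```python
import struct

FIXED = "<HIBIQIIIQ"  # type dev qid.type qid.vers qid.path mode atime mtime length

def pstr(text: str) -> bytes:
    """9P string: 2-byte little-endian length prefix plus UTF-8 bytes."""
    data = text.encode("utf-8")
    return struct.pack("<H", len(data)) + data

def pack_stat(name, qid, mode, length, uid="", gid="", muid="",
              atime=0, mtime=0) -> bytes:
    """Encode one stat entry, including its leading size[2] field."""
    qtype, qvers, qpath = qid
    body = struct.pack(FIXED, 0, 0, qtype, qvers, qpath,
                       mode, atime, mtime, length)
    body += pstr(name) + pstr(uid) + pstr(gid) + pstr(muid)
    return struct.pack("<H", len(body)) + body

def unpack_dir(data: bytes):
    """Split directory read data into a list of (name, mode, length)."""
    entries, off = [], 0
    while off < len(data):
        (size,) = struct.unpack_from("<H", data, off)
        fields = struct.unpack_from(FIXED, data, off + 2)
        mode, length = fields[5], fields[8]
        (nlen,) = struct.unpack_from("<H", data, off + 41)
        name = data[off + 43:off + 43 + nlen].decode("utf-8")
        entries.append((name, mode, length))
        off += 2 + size  # size does not count its own two bytes
    return entries
```

The size prefix lets a reader skip fields it does not understand, which is what allows extended stat structures to share the same framing.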

Page 19: 9P Overview

9P Packet Trace (from v9fs)

<<< (0x8055650) Tattach tag 0 fid 2 afid -1 uname aname nuname 266594
>>> (0x8055650) Rattach tag 0 qid (0000000000000002 48513969 'd')
<<< (0x8055650) Twalk tag 0 fid 1 newfid 3 nwname 1 'test'
>>> (0x8055650) Rwalk tag 0 nwqid 1 (000000000000401a 48613b9d 'd')
<<< (0x8055650) Tstat tag 0 fid 3
>>> (0x8055650) Rstat tag 0 'test' 'ericvh' 'root' '' q (000000000000401a 48513b9d 'd') m d777 at 1213278479 mt 1213283229 l 0 t 0 d 0 ext ''
<<< (0x8055650) Twalk tag 0 fid 3 newfid 4 nwname 1 'hello.txt'
>>> (0x8055650) Rwalk tag 0 nwqid 1 (000000000000401b 4851379d '')
<<< (0x8055650) Tstat tag 0 fid 4
>>> (0x8055650) Rstat tag 0 'hello.txt' 'ericvh' 'ericvh' '' q (000000000000401b 4851379d '') m 644 at 1213283229 mt 1213283229 l 12 t 0 d 0 ext ''
<<< (0x8055650) Twalk tag 0 fid 4 newfid 5 nwname 0
>>> (0x8055650) Rwalk tag 0 nwqid 0
<<< (0x8055650) Topen tag 0 fid 5 mode 0
>>> (0x8055650) Ropen tag 0 (000000000000401b 4851379d '') iounit 0
<<< (0x8055650) Tstat tag 0 fid 4
>>> (0x8055650) Rstat tag 0 'hello.txt' 'ericvh' 'ericvh' '' q (000000000000401b 4851379d '') m 644 at 1213283229 mt 1213283229 l 12 t 0 d 0 ext ''
<<< (0x8055650) Tread tag 0 fid 5 offset 0 count 8192
>>> (0x8055650) Rread tag 0 count 12 data 68656c6c 6f20776f 726c640a
<<< (0x8055650) Tread tag 0 fid 5 offset 12 count 8192
>>> (0x8055650) Rread tag 0 count 0 data
<<< (0x8055650) Tclunk tag 0 fid 5
>>> (0x8055650) Rclunk tag 0
<<< (0x8055650) Tclunk tag 0 fid 4
>>> (0x8055650) Rclunk tag 0
<<< (0x8055650) Tclunk tag 0 fid 3
>>> (0x8055650) Rclunk tag 0
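
The read portion of the trace follows a simple pattern: read at the current offset, advance by the number of bytes returned, and stop when a read returns a count of 0. The sketch below replays that loop against the 12 data bytes shown in the trace; `fake_tread` is a hypothetical stand-in for a real Tread/Rread round trip.

```python
# Data bytes copied from the Rread line in the trace.
FILE_DATA = bytes.fromhex("68656c6c 6f20776f 726c640a")

requested_offsets = []

def fake_tread(offset: int, count: int) -> bytes:
    """Stand-in for one Tread/Rread exchange against the traced file."""
    requested_offsets.append(offset)
    return FILE_DATA[offset:offset + count]

def read_all(tread, count: int = 8192) -> bytes:
    """Read until the server returns zero bytes, as the trace does."""
    offset, chunks = 0, []
    while True:
        data = tread(offset, count)
        if not data:
            break  # count 0 signals end of file
        chunks.append(data)
        offset += len(data)
    return b"".join(chunks)

contents = read_all(fake_tread)
```

Decoding `contents` as ASCII gives the text of hello.txt, and `requested_offsets` ends up as 0 then 12, matching the two Tread messages in the trace.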

Page 20: 9P Overview

Extension Models

• Extend arguments to existing operations to accommodate non-Plan 9 environments
• Provide a single extension operation which encapsulates any extended protocol operations
• Provide a set of complementary operations which provide any extensions (including extensions which are semantic changes to existing operations)
• Provide synthetic file system interfaces which exist either within the hierarchy or within an alternate aname mount
  • can either be provided by the primary server, or through a secondary server mounted underneath

Page 21: 9P Overview

Unix Extensions (9P2000.u)

• Existing Support:
  • UID/GID support
  • Error ID support
  • Stat mapping
  • Permissions mapping
  • Symbolic and Hard Links
  • Device Files
• All accomplished via optional extended arguments to existing operations and an extended Stat structure

Page 22: 9P Overview

Future Work: .L extension series

• The 9P protocol is a network mapping of the Plan 9 file system API
• Many mismatches with Linux/POSIX
• Existing .U extension model is clunky
• Developing a more direct mapping to Linux VFS
  • New opcodes which match VFS API
  • Linux native data formats (stat, permissions, etc.)
  • Direct support of extended attributes, locking, etc.
• Should be able to co-exist with legacy 9P and 9P2000.u protocols and servers.

Page 23: 9P Overview

9P Client/Server Support

• Comprehensive list: http://9p.cat-v.org/implementations
  • C, C#, Python, Ruby, Java, TCL, Limbo, Lisp, OCAML, Scheme, PHP and Javascript
• FUSE Clients (for Linux, BSD, and Mac)
• Native Kernel Support for OpenBSD
• Windows support via Rangboom proprietary client
• Inferno supports native 9P (aka Styx)
• Simple server library available (libixp) (9P2000 only)
• 9P2000.u available in spfs (single threaded) and npfs (multi-threaded)
• golang client and server now available

Page 24: 9P Overview

9P in the Linux Kernel

• Since 2.6.14
• Small Client Code Base
  • include/net/9p: global definitions and interface files
  • fs/9p: VFS Interface ~1500 lines of code
  • net/9p
    • Core: Protocol Handling ~2500 lines of code
    • FD Transport (sockets, etc.): ~1100 lines of code
    • Virtio Transport: ~300 lines of code
    • RDMA Transport: ~700 lines of code
• Small Server Code Base
  • Spfs (standard userspace server): ~7500 lines of code
  • Current KVM-qemu patch: ~1500 lines

Page 25: 9P Overview

9P Linux Kernel Debug

• Enable debug for client side trace (-o debug=0xffff turns all on)
  • 0x001 - display verbose error messages (via syslog)
  • 0x002 - used for more verbose granular debug
  • 0x004 - 9p trace
  • 0x008 - VFS trace
  • 0x010 - marshalling debug
  • 0x020 - RPC debug
  • 0x040 - transport specific debug
  • 0x080 - allocation debug
  • 0x100 - display protocol message debug
  • 0x200 - display FID debug
  • 0x400 - display packet debug
  • 0x800 - display fscache tracing debug
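
The debug value is a bitmask, so individual traces can be combined by OR-ing the values listed above. The following sketch computes the option string; the flag names in the `Debug` class are made up for illustration and are not the kernel's identifiers, only the numeric values come from the slide.

```python
from enum import IntFlag

class Debug(IntFlag):
    """Debug bits from the slide; the member names are illustrative."""
    ERROR = 0x001
    VERBOSE = 0x002
    TRACE_9P = 0x004
    TRACE_VFS = 0x008
    MARSHAL = 0x010
    RPC = 0x020
    TRANSPORT = 0x040
    ALLOC = 0x080
    PROTO_MSG = 0x100
    FID = 0x200
    PACKET = 0x400
    FSCACHE = 0x800

def debug_option(flags: Debug) -> str:
    """Format a flag combination as a mount option such as debug=0x5."""
    return f"debug={int(flags):#x}"

# Errors plus the 9p trace only.
errors_and_trace = debug_option(Debug.ERROR | Debug.TRACE_9P)

# Every bit listed on the slide.
everything = Debug(0)
for flag in Debug:
    everything |= flag
```

Selecting only the bits of interest keeps the syslog output manageable compared with turning everything on.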

Page 26: 9P Overview

v9fs access modes

• access=user
  • new attach every time a new user tries to access the file system
• access=<uid>
  • single attach and only allows uid=<uid> to access
• access=any
  • single attach and allows all users to access with rights of user who performed initial attach

Page 27: 9P Overview

v9fs transport options

• trans_fd module
  • tcp: normal socket operations
  • unix: mount a named pipe
  • fd: use passed file descriptors for connection (rfdno, wfdno)
• virtio: use virtio channel
• rdma: use infiniband RDMA

Page 28: 9P Overview

v9fs cache modes

• Default is no cache
• cache=loose
  • no attempts are made at consistency, intended for exclusive access, read-only mounts
  • fids aren’t generally clunked in order to hold reference to files
• cache=fscache
  • use FS-Cache for persistent, read-only cache backend
  • EXPERIMENTAL. Hasn’t been fully tested.
• Other options possible in future including path caches (dentry cache) and/or temporal based cache with semantics similar to other distributed file systems.

Page 29: 9P Overview

v9fs other options

• port=<port> - specify TCP port
• uname=<user> - specify user to initially mount as
• aname=<name> - attach argument
• maxdata=<n> - specify maximum single packet size
• noextend - only use vanilla protocol (no .u)
• dfltuid - specify default uid to mount as (.u)
• dfltgid - specify default gid to mount as (.u)
• afid - specify a security channel (only valid for fd transport)
• nodevmap - no special files, make any special files look normal
• cachetag - optional persistent tag signature
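
To show how these options combine with the access, transport, cache and debug settings from the preceding slides, here is a small sketch that assembles a mount command line. The `mount_options` helper is hypothetical, the server address 192.0.2.10 and mount point /mnt/9 are placeholders, and the `trans=` key and the `9p` filesystem type come from the v9fs kernel documentation rather than from the slides.

```python
def mount_options(**opts) -> str:
    """Join options as key=value pairs; True means a bare flag."""
    parts = []
    for key, value in opts.items():
        if value is True:
            parts.append(key)            # flag option such as noextend
        elif value is False or value is None:
            continue                     # option left out entirely
        else:
            parts.append(f"{key}={value}")
    return ",".join(parts)

opts = mount_options(trans="tcp", port=5670, uname="ericvh",
                     access="user", noextend=True, cache=None)

# Placeholder server address and mount point, for illustration only.
command = ["mount", "-t", "9p", "-o", opts, "192.0.2.10", "/mnt/9"]
```

Building the option string in one place makes it easy to toggle a single setting, for example dropping `noextend` to allow 9P2000.u, without retyping the whole command.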

Page 30: 9P Overview

Typical Regressions Process

• Simple mount against spfs file server
• Test with short set of Linux file system benchmarks
  • fsx -N 1000 -R -W testfile
  • echo run | postmark
  • bonnie -s 1
  • dbench -t 60 4

Page 31: 9P Overview

9p server operation

• spfs/npfs: (9P2000.u)
  • ufs -p 5670 -s
    • -p specifies port number
    • -s specifies single user (whoever is running spfs)
    • can also pass -d to see server side trace
    • if using npfs, specify -w to limit number of threads
• patched kvm-qemu (for virtio transport)
  • kvm <other_args> -share /
    • tells kvm to share / over virtio channel to guest

Page 32: 9P Overview

Code Style and Development Goal

• Stick to Linux Coding Style Guidelines (of course)
• Keep It Simple
  • short names
  • limit any use of macro definitions or conditionals (#ifdef)
  • extensions should be kept optional
  • any cache extensions should be kept optional (configurable at mount time)
• send patches for review on:
  • [email protected]
• bug tracking for client on bugzilla.kernel.org
• protocol documentation/updates to
  • http://github.com/ericvh/9p-rfc

Page 33: 9P Overview

Code Review

• http://lxr.linux.no/linux/include/net/9p/
• http://lxr.linux.no/linux/fs/9p/
• http://lxr.linux.no/linux/net/9p/