learn ruby by reading the source

Post on 10-May-2015


DESCRIPTION

An updated version of a talk I've given a few times before; this one explains ruby's object model with a bit more lucidity and more C.

TRANSCRIPT

LEARNING RUBY BY READING THE SOURCE

tw:@burkelibbey / gh:@burke

THESIS: The best way to learn a piece of infrastructure is to learn about how it’s implemented.

So let’s dig in to ruby’s source!

TOPICS

• Basic Object Structure

• Class inheritance

• Singleton classes

• Module inheritance

• MRI Source spelunking

BASIC OBJECT STRUCTURE

Every object has an RBasic

struct RBasic {
    VALUE flags;
    VALUE klass;
};

flags stores information like whether the object is frozen, tainted, etc. It’s mostly internal stuff that you don’t think about very often.

klass is a pointer to the class of the object (or singleton class, which we’ll talk about later).

...but what’s a VALUE?

VALUE is basically used as a void pointer.

typedef uintptr_t VALUE;

It can point to any ruby value.

You should interpret “VALUE” as: “a (pointer to a) ruby object”

This is a Float.

struct RFloat {
    struct RBasic basic;
    double float_value;
};

Every type of object, including Float, has an RBasic. And then, after the RBasic, type-specific info.

Ruby has quite a few types.

Each of them has its own type-specific data fields.

But given a ‘VALUE’, we don’t know which type we have.

How does ruby know?

Every object has an RBasic, and the object type is stored inside its flags.

Given an object of unknown type...

struct αѕgєנqqωσ { struct RBasic basic; ιηт נѕƒкq; // ??? ƒנє σтнנ¢є; // ???}

We can extract the type from ‘basic’, which is guaranteed to be the first struct member.

VALUE a

e.g. if the type is T_STRING,

struct RString {
    struct RBasic basic;
    union {
        struct {
            long len;
            ...

then we know it’s a `struct RString`.
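Here’s a rough Ruby model of that type check, just to make the masking concrete. The tag values and T_MASK are what I remember from ruby.h, so treat them as assumptions and check your own checkout; SKETCH_FROZEN and builtin_type are names made up for this sketch.

```ruby
# Type tags as recalled from ruby.h (an assumption; verify against your version).
T_MASK = 0x1f
TYPE_NAMES = {
  0x01 => :T_OBJECT,
  0x02 => :T_CLASS,
  0x03 => :T_MODULE,
  0x04 => :T_FLOAT,
  0x05 => :T_STRING,
  0x07 => :T_ARRAY,
  0x08 => :T_HASH
}.freeze

# Hypothetical stand-in for a non-type flag bit such as "frozen".
SKETCH_FROZEN = 1 << 11

# Models the idea behind BUILTIN_TYPE: keep only the low bits of basic.flags.
def builtin_type(flags)
  TYPE_NAMES.fetch(flags & T_MASK, :unknown)
end

flags = 0x05 | SKETCH_FROZEN   # flags word of a frozen string, in this model
puts builtin_type(flags)       # prints T_STRING
```

The other flag bits ride along in the same word without disturbing the type, which is why one field can do both jobs.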

Every* type corresponds to a struct type, which ALWAYS has an RBasic as the first struct member.

* exceptions for immediate values

There are custom types for primitives, mostly to make them faster.

The special-case primitive types aren’t particularly surprising or interesting.

T_STRING => RString: RBasic, string data, length.

T_ARRAY => RArray: RBasic, array data, length.

T_HASH => RHash: RBasic, hashtable.

...and so on.

T_OBJECT (struct RObject) is pretty interesting.

It’s what’s used for instances of any classes you define, or most of the standard library.

TL;DR: Instance Variables.

struct RObject {
    struct RBasic basic;
    long numiv;
    VALUE *ivptr;
    struct st_table *iv_index_tbl;
};

This makes sense; an instance of a class has its own data, and nothing else.

numiv stores the number of instance variables.

ivptr is a pointer to an array holding the instance variables’ values.

iv_index_tbl is a hashtable mapping each instance variable name to its slot in that array. It is a shortcut to the table kept on the object’s class; you could get the same result by looking it up on basic.klass (coming up right away).

This definition is actually slightly simplified. I omitted another performance optimization for readability.

Go read the full one after this talk if you’re so inclined!
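A minimal Ruby sketch of that layout, assuming ivptr is a plain array of values and iv_index_tbl is a name-to-slot table shared through the class. SketchClass and SketchObject are hypothetical names; this models the idea, not MRI’s actual code.

```ruby
# Stands in for the class side: one shared table of ivar name => slot index.
class SketchClass
  attr_reader :iv_index_tbl

  def initialize
    @iv_index_tbl = {}
  end
end

# Stands in for RObject: only the values live on the instance.
class SketchObject
  attr_reader :ivptr

  def initialize(klass)
    @klass = klass
    @ivptr = []
  end

  def numiv
    @ivptr.length
  end

  def ivar_set(name, value)
    tbl = @klass.iv_index_tbl
    tbl[name] = tbl.size unless tbl.key?(name)  # first use claims the next slot
    @ivptr[tbl[name]] = value
  end

  def ivar_get(name)
    index = @klass.iv_index_tbl[name]
    index.nil? ? nil : @ivptr[index]
  end
end

point_class = SketchClass.new
a = SketchObject.new(point_class)
b = SketchObject.new(point_class)
a.ivar_set(:@x, 1)
a.ivar_set(:@y, 2)
b.ivar_set(:@y, 20)   # reuses slot 1, leaving slot 0 unset on b
```

Every instance pays only for an array of values; the names are stored once per class.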

Class and Module types

struct RClass {
    struct RBasic basic;
    rb_classext_t *ptr;
    struct st_table *m_tbl;
    struct st_table *iv_index_tbl;
};

Classes have instance variables (ivars), class variables (cvars), methods, and a superclass.

m_tbl is where the methods live. st_table is the hashtable implementation ruby uses internally.

iv_index_tbl maps the instance variable names used by the class’s instances to their slots in each instance’s ivptr array. (Class variables live in ptr->iv_tbl, next to the class’s own instance variables.)

struct rb_classext_struct {
    VALUE super;
    struct st_table *iv_tbl;
    struct st_table *const_tbl;
};
typedef struct rb_classext_struct rb_classext_t;

rb_classext_t holds the superclass, instance variables, and constants defined inside the class.

It ends up looking kinda like:

struct RClass {
    struct RBasic basic;
    VALUE super;
    struct st_table *iv_tbl;
    struct st_table *const_tbl;
    struct st_table *m_tbl;
    struct st_table *iv_index_tbl;
};

...though this isn’t really valid because rb_classext_t is referred to by a pointer.

So classes have:

* RBasic
* superclass
* instance vars. and class vars. (iv_tbl)
* constants
* methods
* instance variable slots for their instances (iv_index_tbl)
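You can poke at most of those pieces from ruby-land with plain reflection. A small sketch, using hypothetical classes SketchBase and SketchShape; each call is standard Ruby, and the comments say which part of the class it roughly corresponds to.

```ruby
class SketchBase; end

class SketchShape < SketchBase
  SIDES = 4          # a constant
  @@count = 0        # a class variable
  @registry = []     # an instance variable of the class object itself

  def area; end      # a method
end

p SketchShape.class                     # the class of the class (basic.klass)
p SketchShape.superclass                # the superclass
p SketchShape.instance_variables        # the class's own ivars
p SketchShape.class_variables(false)    # cvars defined directly on this class
p SketchShape.constants(false)          # constants defined directly on this class
p SketchShape.instance_methods(false)   # public methods defined directly on this class
```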

Modules

#define RCLASS(obj)  (R_CAST(RClass)(obj))
#define RMODULE(obj) RCLASS(obj)

Same underlying type (struct RClass) as a class

...just has different handling in a few code paths.
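The kinship shows up in ruby-land too. A quick sketch with two hypothetical definitions, Walkable and Animal:

```ruby
module Walkable; end
class Animal; end

p Class.superclass              # Module: every class is also a module
p Walkable.class                # Module
p Animal.class                  # Class
p Animal.is_a?(Module)          # true
p Walkable.is_a?(Class)         # false
p Walkable.respond_to?(:new)    # false: one of the differently handled paths
```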

Immediate values

Sort of complicated.

For an integer N, the fixnum representation is:

2N + 1
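The encoding is small enough to model directly. A sketch with made-up helper names (fixnum_encode, fixnum_decode, fixnum?); the object_id line relies on MRI exposing the raw VALUE of small integers.

```ruby
# 2N + 1: shift left one bit and set the low bit as the Fixnum tag.
def fixnum_encode(n)
  (n << 1) | 1
end

# Arithmetic shift right drops the tag and keeps the sign.
def fixnum_decode(value)
  value >> 1
end

def fixnum?(value)
  (value & 1) == 1
end

p fixnum_encode(5)      # 11
p fixnum_encode(-3)     # -5
p fixnum_decode(-5)     # -3
p 5.object_id           # 11 on MRI, the same 2N + 1
```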

enum ruby_special_consts {
    RUBY_Qfalse = 0,
    RUBY_Qtrue  = 2,
    RUBY_Qnil   = 4,
    RUBY_Qundef = 6,

    RUBY_IMMEDIATE_MASK = 0x03,
    RUBY_FIXNUM_FLAG    = 0x01,
    RUBY_SYMBOL_FLAG    = 0x0e,
    RUBY_SPECIAL_SHIFT  = 8
};

A pointer is basically just a big integer, with a number referring to a memory address.

Remember how a VALUE is mostly a pointer? These tiny addresses can never hold a real object: the lowest page of a process’s address space is normally left unmapped, and heap objects are aligned anyway.

So ruby uses them to refer to special values.

Any VALUE equal to 0 is false, 2 is true, 4 is nil, and 6 is a special value only used internally.

Integers and Symbols work on the principle that memory is never allocated without 4-byte alignment, so a real pointer always has its two low bits clear.

Any odd VALUE is a Fixnum.

An even VALUE not divisible by 4 might be a Symbol: it is one when its low byte equals RUBY_SYMBOL_FLAG (0x0e).

Symbols are just integers.

There is a global table mapping Symbol IDs to the strings they represent.

Symbols are immediates because their IDs are stored in VALUE, and looked up in the symbol table for display.
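Putting those rules together, here is a sketch that classifies a raw VALUE. It uses the constants from the slide (builds with flonum support use different numbers, so take these as the talk’s values, not universal ones); classify and symbol_value are hypothetical helpers.

```ruby
QFALSE = 0
QTRUE  = 2
QNIL   = 4
QUNDEF = 6
IMMEDIATE_MASK = 0x03
FIXNUM_FLAG    = 0x01
SYMBOL_FLAG    = 0x0e
SPECIAL_SHIFT  = 8

# Builds the VALUE for a symbol with the given ID: ID in the high bits, flag in the low byte.
def symbol_value(id)
  (id << SPECIAL_SHIFT) | SYMBOL_FLAG
end

# Returns [kind, payload] for a raw VALUE, checking the cheapest tags first.
def classify(value)
  return [:fixnum, value >> 1] if (value & FIXNUM_FLAG) == FIXNUM_FLAG

  case value
  when QFALSE then return [:false, nil]
  when QTRUE  then return [:true, nil]
  when QNIL   then return [:nil, nil]
  when QUNDEF then return [:undef, nil]
  end

  low_byte = value & ((1 << SPECIAL_SHIFT) - 1)
  return [:symbol, value >> SPECIAL_SHIFT] if low_byte == SYMBOL_FLAG
  return [:pointer, value] if (value & IMMEDIATE_MASK) == 0

  [:unknown, value]
end

p classify(7)                  # [:fixnum, 3]
p classify(4)                  # [:nil, nil]
p classify(symbol_value(42))   # [:symbol, 42]
p classify(0x7f001000)         # [:pointer, 2130710528]
```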

CLASS INHERITANCE

We have a pretty good picture of how values are represented; now we’re going to talk about how they interact.

class Language
  @@random_cvar = true
  attr_reader :name
  def initialize(name)
    @name = name
  end
end

The RClass for Language:

basic.klass    Class
ptr->super     Object
iv_tbl         {@@random_cvar: true}
const_tbl      {}
m_tbl          {name: #<M>, initialize: #<M>}
iv_index_tbl   {@name: 0}   # filled in once an instance assigns @name

class Ruby < Language
  CREATOR = :matz
  @origin = :japan
end

The RClass for Ruby:

basic.klass    Class
ptr->super     Language
iv_tbl         {@origin: :japan}
const_tbl      {CREATOR: :matz}
m_tbl          {} # NB. Empty!
iv_index_tbl   {} # NB. Empty!

When you subclass, you create a new RClass with super=(parent) and klass=Class.

When you instantiate a class, you create a new RObject with klass=(the class).
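Both steps are visible from ruby-land. A sketch that builds the subclass with Class.new so the two links are explicit; ruby_class and instance are just local names for this example.

```ruby
class Language
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# Roughly what `class Ruby < Language` does, minus assigning a constant name.
ruby_class = Class.new(Language)
instance = ruby_class.new("ruby")

p ruby_class.superclass               # Language (the super link)
p ruby_class.class                    # Class    (the klass link)
p instance.class == ruby_class        # true     (the instance's klass link)
p ruby_class.instance_methods(false)  # []: the subclass defines no methods of its own
p instance.name                       # "ruby", found on Language
```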

Method lookup

class Foo
  def bar
    :baz
  end
end

Foo.new.bar

We know how this works now.

Class methods

class Foo
  def self.bar
    :baz
  end
end

Foo.bar

But how does this work?
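For the instance-method case, the walk can be sketched in a few lines: start at the object’s class, check that class’s own methods, then follow the superclass link. lookup_owner is a hypothetical helper and a simplification (it ignores included modules, method_missing, and caching). Feeding it a singleton class hints at the answer to the class-method question.

```ruby
class Foo
  def bar
    :baz
  end

  def self.bar
    :baz
  end
end

# Returns the first class in the superclass chain that defines `name` itself.
def lookup_owner(klass, name)
  while klass
    own = klass.instance_methods(false) + klass.private_instance_methods(false)
    return klass if own.include?(name)
    klass = klass.superclass
  end
  nil
end

p lookup_owner(Foo, :bar)                   # Foo
p lookup_owner(Foo, :__id__)                # BasicObject
p lookup_owner(Foo, :no_such_method)        # nil
p lookup_owner(Foo.singleton_class, :bar)   # #<Class:Foo>
```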

SINGLETON CLASSES

class Klass
  def foo; end
end
obj = Klass.new
def obj.bar; end

[Figure, borrowed from Ruby Hacking Guide. Arrow legend: ptr->super (superclass), basic.klass (class).]

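Singleton classes are T_CLASS objects with the FL_SINGLETON flag set. (T_ICLASS is the type of the hidden proxy classes that stand in for included modules.)

Object#class skips over both kinds, and T_ICLASS objects are never* returned to ruby-land methods.

*for sufficiently loose definitions of “never”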
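The same per-object setup can be inspected with Object#singleton_class, which is the main way ruby-land gets to see one of these hidden classes. A sketch reusing the Klass example:

```ruby
class Klass
  def foo; end
end

obj = Klass.new
def obj.bar; end

singleton = obj.singleton_class

p obj.class                           # Klass: the singleton class is skipped
p singleton.superclass                # Klass: it sits between obj and Klass
p singleton.instance_methods(false)   # [:bar]: the method lives only here
p Klass.new.respond_to?(:bar)         # false: other instances are unaffected
```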

class A
  def foo; end
end
class B < A
  def self.bar; end
end

[Two figures, borrowed from Ruby Hacking Guide. Arrow legend: ptr->super (superclass), basic.klass (class).]
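Class methods are the same trick applied to a class object: `def self.bar` lands in B’s singleton class, and singleton classes of classes have their own superclass chain. A sketch checking that from ruby-land; the anonymous subclass at the end is only there to show inheritance of the class method.

```ruby
class A
  def foo; end
end

class B < A
  def self.bar; end
end

p B.singleton_class.instance_methods(false)           # [:bar]
p B.singleton_class.superclass == A.singleton_class   # true
p B.class                                             # Class
p A.respond_to?(:bar)                                 # false
p Class.new(B).respond_to?(:bar)                      # true: subclasses inherit it
```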

MODULE INHERITANCE

MRI SOURCE SPELUNKING

First, check out the source

github.com/ruby/ruby

google “<your editor> ctags”

CASE STUDY: How does Array#cycle work?

brb live demo

Builtin types have a <type>.c (string.c, array.c, proc.c, re.c, etc.)

Interesting methods tend to be in those files.

Method names always appear there inside double quotes (easy to search for).

The next parameter after the string is the C function name.

e.g. Search for “upcase” (with the quotes) in string.c and follow the chain.
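If you’d rather script the search, the quoted-name convention is easy to match. A sketch that pulls the C function name out of rb_define_method lines; SAMPLE_SOURCE holds illustrative lines modelled on that shape (not quoted verbatim from any particular version), and DEFINE_METHOD and c_function_for are made-up names.

```ruby
# Illustrative registration lines in the shape found in MRI's <type>.c files.
SAMPLE_SOURCE = <<~'C'
  rb_define_method(rb_cString, "upcase", rb_str_upcase, -1);
  rb_define_method(rb_cArray, "cycle", rb_ary_cycle, -1);
C

# Captures: 1 = class variable, 2 = ruby method name, 3 = C function name.
DEFINE_METHOD = /rb_define_method\(\s*(\w+),\s*"([^"]+)",\s*(\w+),/

# Returns the C function registered for `ruby_name`, or nil if none matches.
def c_function_for(source, ruby_name)
  source.each_line do |line|
    match = DEFINE_METHOD.match(line)
    return match[3] if match && match[2] == ruby_name
  end
  nil
end

puts c_function_for(SAMPLE_SOURCE, "upcase")   # prints rb_str_upcase
puts c_function_for(SAMPLE_SOURCE, "cycle")    # prints rb_ary_cycle
```

Against a real checkout you would pass in the contents of string.c or array.c instead of the sample text, then jump to the function it names.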

Most of the supporting VM internals are in vm_*.c

Garbage collection is in gc.c

Don’t look at parse.y. Trust me.

Almost all of the stuff we’ve looked at today is in object.c, class.c, or ruby.h

I mostly look up definitions of built-in methods

Further reading:

Ruby under a Microscope: http://patshaughnessy.net/ruby-under-a-microscope

Ruby Hacking Guide: http://ruby-hacking-guide.github.io/

Thanks, questions?
