Dragoncraft architectural overview
DESCRIPTION
Presentation given to the NYC Tech Talks Meetup group on June 26, 2012. More info here: http://www.meetup.com/NYC-Tech-Talks/events/69478562/
TRANSCRIPT
Jesse Sanford, Joshua Kehn
Freeverse / ngmoco:) / DeNA
● Web guys brought in to design RESTful, HTTP-based games for handheld clients.
● Platform concurrently being built by the ngmoco team out in San Francisco.
● First games in the company's history built entirely on:
○ EC2
○ Node.js
○ MongoDB
● There are a lot of firsts here!
Why Node.js?
● Already using JavaScript. Knowledge share!
● Fast-growing ecosystem.
● Reasonable to bring libraries from client to server and vice versa.
● Lots of JavaScript patterns and best practices to follow.
● Growing talent pool.
Why MongoDB?
● Ever-changing schemas make document stores attractive.
● Easy path to horizontal scalability.
● 10gen is very easy to work with.
● Lots of best-practice patterns for running on EC2 infrastructure.
● JavaScript is a friendly interface.
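One concrete way the "ever-changing schemas" point plays out with a document store is migrate-on-read: old and new document shapes coexist in the same collection, and the application normalizes them as they load. A minimal sketch (field names are hypothetical, not Dragoncraft's actual schema):

```javascript
// Migrate-on-read sketch: older player documents predate fields added in
// later releases; normalize them as they are loaded from the collection.
function normalizePlayer(doc) {
  return {
    name:  doc.name,
    level: doc.level || 1,   // very old documents predate levels
    guild: doc.guild || null // field added in a later schema revision
  };
}

var oldDoc = { name: 'sanford' };                      // early schema
var newDoc = { name: 'kehn', level: 7, guild: 'nyc' }; // current schema
var a = normalizePlayer(oldDoc);
var b = normalizePlayer(newDoc);
```

No migration job has to rewrite the whole collection before new code ships; documents upgrade lazily as they are touched.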
Handling the change.
● Lots of patience.
● Many proofs of concept.
● Dedicated PoCs for different puzzle pieces. (Platform services, game libraries)
● Developer training and evangelists.
● Performance testing and open-source library vetting.
● Lots of patience. Seriously.
Building from scratch.
● Lots of testing.
● Pre-flight environment for content.
● Duplicate of production for release staging.
● Full-stack developer sandboxes on every workstation.
○ Individual MongoDB and Node.js instances running.
○ Full client stack available as a browser-based handheld simulator for the client interface.
"Physical" Infrastructure
● EC2 fabric managed by RightScale.
● Extensive library of "RightScripts" and "ServerTemplates".
● Different deployments for each environment.
● Deployments are a mix of single-service machines and arrays of machines.
● Arrays load balanced by HAProxy, not ELBs.
● Mongo clusters are the largest expense.
Mongo Infrastructure
● Mongo cluster per environment.
● 3 config nodes split between 2 availability zones.
● Currently only 1 shard.
● 3 DB nodes split between 2 availability zones.
● mongos processes running directly on app servers.
Mongo Infrastructure cont.
● Config nodes on t1.micros.
● DB nodes on m1.xlarges.
● DB nodes running RAID 10 on EBS.
● XFS with LVM.
● Snapshots taken after forcing fsync and lock on the DB, then an XFS freeze.
● Backups always done on a secondary.
Shrinking Mongo
● Staging and testing environments too costly.
● Logically, the application knows no mongod/mongos differences.
● Still single shard.
● Spinning up instances is quick.
● Only used for smoke testing at the end of every dev cycle.
● Moving to single master -> slave replication.
● Cost savings of 60% in these environments.
Other Services
● HAProxy - 2 m1.smalls
● Memcached - 1 m1.large
● PHP+Apache (CMS), Flume/syslog - 1 m1.large
● Ejabberd - 1 m1.large
● Beanstalkd - 1 m1.large
● Node.js - (currently 3) c1.xlarges
Log4js-syslog, Flume
● Centralized logging from all application servers in the cluster.
● Configurable log levels at both the application layer and filters on the stream after that.
● Flume speaks syslog fluently.
● Flume allows us to point the firehose wherever we want.
● It's trivial to ingest the Flume output from S3 into Hadoop/Elastic MapReduce.
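For context on the syslog plumbing above: every syslog message carries a priority value computed as facility * 8 + severity, which is what lets filters downstream act on log levels. A minimal sketch of building such a line (log4js-syslog and Flume handle this for you in practice; real RFC 3164 lines also carry a timestamp and hostname, omitted here):

```javascript
// Simplified syslog-style line builder. PRI = facility * 8 + severity;
// e.g. local0 (facility 16) at info (severity 6) gives PRI 134.
function syslogLine(facility, severity, tag, msg) {
  var pri = facility * 8 + severity;
  return '<' + pri + '>' + tag + ': ' + msg;
}

var line = syslogLine(16, 6, 'dragoncraft', 'player logged in');
// line is '<134>dragoncraft: player logged in'
```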
Daida, Beanstalkd
● Needed fast worker queue for push messaging and out-of-band computation.
● Considered Redis and Resque.
● Considered RabbitMQ/AMQP.
● Beanstalkd was built for work queues.
● Beanstalkd is very simple.
● No real support for HA.
● Workers needed to be written in JavaScript.
● No upfront knowledge about the runtime activities of workers.
Daida, Beanstalkd cont.
● Developers define jobs. (The payload contains the variables needed for the job to execute.)
● Developers schedule jobs.
● Developers create "strategies" which know how to execute the jobs.
● At runtime, using some functional magic, Daida closes the developer-defined strategy around the payload variables that came with the job.
● This is somewhat similar to the job being run by a worker inside a container with a context.
var handlers = {
  bar: function(data, cb) {
    var callback = cb || function() { /* noOp */ }; // if callback wasn't passed
    console.log('test job passed data: ' + JSON.stringify(data));
    callback(); // always make sure to call back!
  },
  foo: function(data, cb) {
    var callback = cb || function() { /* noOp */ };
    console.log('foo job passed name: ' + data.name);
    callback(); // again, never forget to call back!
  }
};

exports.handlers = handlers;
exports.bar = handlers.bar;
exports.foo = handlers.foo;

// taken from https://github.com/ngmoco/daida.js
Daida handler example.
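A worker-side dispatch for handlers like those might look as follows. This is a sketch of the closure pattern described above, not daida.js's actual internals, and the job shape ({ type, data }) is an assumption:

```javascript
// Sketch of worker-side dispatch: look up the handler named in the job
// payload and close it over the payload's data. Not daida.js's real API.
var handlers = {
  foo: function (data, cb) {
    cb(null, 'hello ' + data.name); // always call back so the job can be deleted
  }
};

function buildStrategy(job) {
  var handler = handlers[job.type];
  if (!handler) throw new Error('no handler for job type: ' + job.type);
  // Close the handler around the payload that arrived with the job.
  return function (cb) { handler(job.data, cb); };
}

// A reserved job's payload supplies both the handler name and its arguments.
var strategy = buildStrategy({ type: 'foo', data: { name: 'sanford' } });
var out;
strategy(function (err, result) { out = result; });
```

The strategy is self-contained once built, which is what makes it resemble "a worker inside a container with a context".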
Ejabberd
● Best multi-user chat solution for the money.
● Considered IRC and other more custom solutions.
● JavaScript handhelds can use JavaScript chat client libraries!
● Capable of being run over plain HTTP. (Comet/long-poll/BOSH)
● Widely used.
● Fine-grained control over users and rooms.
● A little complex for our needs.
● Erlang/OTP is solid.
Megaphone load tester
● Written in Erlang/OTP to make use of its lightweight processes and distributed nature.
● SSL-capable HTTP reverse proxy.
● Records sessions from handhelds.
● The proxy is transparent and the handhelds are stupid.
● Choose which sessions to replay.
● Write small scripts to manipulate req/resp during replay. OAuth handshakes?
● Interact with the replay in a console.
● Record results of the replay.
Megaphone load tester cont.
● Replay in bulk! (Load test.)
● A centralized console can spawn HTTP replay processes on many headless machines. Similar to headless JMeter.
● A single session (some number of individual requests) is sent to the client process when spawned.
● Responses are sent back to the centralized databases as clients receive them.
● The same session can be sent to multiple clients and played back concurrently.
%% This module contains the functions to manipulate req/resp for the
%% dcraft_session1 playback.
-module(dcraft_session1).
-include("blt_otp.hrl").
-export([create_request/1, create_request/2, create_request/3]).

-record(request, {url, verb, body_vars}).
-record(response, {request_number, response_obj}).

create_request(Request) ->
    create_request(Request, []).

create_request(Request, Responses) ->
    create_request(Request, Responses, 0).

create_request(#request{url = "http://127.0.0.1:8080/1.2.1/dragoncraft/player/sanford/mission/" ++ OldMissionId} = Request,
               Responses, _RequestNumber) ->
    ?DEBUG_MSG("~p Request for wall found!~n", [?MODULE]),
    [LastResponseRecord | _RestResponses] = Responses,
    {{_HttpVer, _ResponseCode, _ResponseDesc}, _Headers, ResponseBodyRaw} =
        LastResponseRecord#response.response_obj,
    {ok, ResponseBodyObj} = json:decode(ResponseBodyRaw),
    ResponseKVs = element(1, ResponseBodyObj),
    [_Response_KV1 | [Response_KV2 | _Response_KV_Rest]] = ResponseKVs,
    Response_KV2_Val = element(2, Response_KV2),
    ResponseDataObj = element(1, Response_KV2_Val),
    [ResponseDataKV | _ResponseDataKVRest] = ResponseDataObj,
    _ResponseData_KV_Key = element(1, ResponseDataKV), % <<"identifier">>
    ResponseData_KV_Val = element(2, ResponseDataKV),
    MissionId = binary_to_list(ResponseData_KV_Val),
    Replaced = re:replace(Request#request.url, OldMissionId, MissionId ++ "/wall"),
    [ReHead | ReRest] = Replaced,
    [ReTail] = ReRest,
    ?DEBUG_MSG("~p replaced head is ~p and tail ~p~n", [?MODULE, ReHead, ReTail]),
    NewUrl = binary_to_list(ReHead) ++ binary_to_list(ReTail),
    Request#request{url = NewUrl};
create_request(Request, _Responses, _RequestNumber) ->
    Request.
Example session handler script for manipulating requests at runtime.
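The bulk replay described above can also be sketched in JavaScript (Megaphone itself is Erlang; all names here are hypothetical): the same recorded session is handed to several simulated clients, and every response is reported back to a central collector.

```javascript
// Sketch of concurrent session replay: one recorded session, N clients.
// `sendRequest` stands in for a real HTTP call and is synchronous here
// only to keep the sketch short.
function replaySession(session, clientCount, sendRequest, report) {
  for (var c = 0; c < clientCount; c++) {
    session.forEach(function (req, i) {
      var resp = sendRequest(req);
      report({ client: c, requestNumber: i, response: resp });
    });
  }
}

var results = [];
replaySession(
  [{ url: '/player/sanford' }, { url: '/mission/1' }], // recorded session
  3,                                                   // three clients
  function (req) { return 200; },                      // fake transport
  function (r) { results.push(r); }                    // central collector
);
```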
Other notables
● Recently started using Python's Fabric library for rolling releases.
● Node cluster for multi-process Node.
● Node IPC with Linux signals to raise and lower logging levels and to trigger content updates.
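The signal-driven log-level control can be sketched with Node's built-in process signal events. The level names and the choice of signals are assumptions (note SIGUSR1 is reserved by Node itself for starting the debugger):

```javascript
// Sketch: raise/lower the process log level via POSIX signals, so an
// operator can run `kill -USR2 <pid>` against a live server. No restart.
var LEVELS = ['error', 'warn', 'info', 'debug'];
var current = 2; // start at 'info'

function raiseLevel() {
  if (current < LEVELS.length - 1) current++;
  return LEVELS[current];
}

function lowerLevel() {
  if (current > 0) current--;
  return LEVELS[current];
}

// SIGUSR2 raises verbosity; SIGHUP lowers it back down.
process.on('SIGUSR2', function () { console.log('log level: ' + raiseLevel()); });
process.on('SIGHUP',  function () { console.log('log level: ' + lowerLevel()); });
```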
Links
● http://dragoncraftthegame.com/
● http://freeverse.com/
● http://blog.ngmoco.com/
● https://developer.mobage.com/
● http://dena.jp/intl/