picnic software - developing a flexible and scalable application

Developing a flexible and scalable application

Presenting Today

Andrew Browne, Dave Churchill, Basarat Syed, Nick Josevski, Matt Walkenhorst

Disclaimer

The views expressed here are solely those of the authors in their private capacity and do not in any way represent the views of Picnic Software Pty Ltd, or any other associated entity or shareholder.

Picnic Software Pty Ltd has not approved, endorsed, embraced, friended, liked, tweeted, google-plused, pinterested, dugg, reddited, hacker-newsed, sanctioned or authorized this presentation.

Agenda

• App & Tech

• Infrastructure & Data Flow

• Deployment & Scalability

• Permissions with Neo4j

• Client Side Technologies

• Development & Testing Workflow

What we do

• We are an ISV (Independent Software Vendor)

– Building and running a workflow/collaboration application

• Partnerships with large businesses in the Advertising/Marketing sector

• Our customers are primarily large retailers

Our App

• Media Library

– High resolution files; PSDs, Video

• Collaborative workflows

– Coordinating inputs

• Photography, Illustrations, Graphic Design

– Producing advertising outputs

• Over multiple media channels – Catalogues, Billboards/Print, TV, Radio, Web

Our Tech Stack

• F#, C#, ASP.NET MVC, ServiceStack

• EventStore, Eventful, RavenDB, Neo4j

• Angular, TypeScript, Mocha, Node, Sass

• SignalR, AutoMapper, RabbitMQ, LINQ, NSubstitute, NUnit, FsUnit, FParsec, FsCheck

• AWS, Docker, Riemann, Logstash, PowerShell, PSake, NodaTime

• GitHub, TeamCity, Octopus Deploy, Slack, YouTrack

Flexibility

• Architecture choices to support

– Changes in requirements

– Future customers working in same domain

• When we started building we had

– Known customer workflows

– Known unknown customer workflows

– Unknown unknown customer workflows

Questions

Our Infrastructure

Andrew

[Infrastructure diagrams. Labels shown: AWS Sydney region; CloudFront; S3; Elastic Load Balancing; HAProxy; Availability Zone x 2; Core; Neo4j; RavenDB; RabbitMQ; EventStore (writes); Media Processing; Disposable.]

Questions

Deployment & Scalability

Deployment

• Server Infrastructure – AWS

– Starting new instances largely a manual affair at this point

• Configuration Management – Chef

• Application deployment – TeamCity, Octopus Deploy

“Give me six hours to chop down a tree and I will spend the first four sharpening the axe.”

• Windows / Linux deployment nodes

• Each node is in an Environment

• Each node performs one or more Roles

• Each Role requires the running of one or more Recipes

• Recipes are stored in cookbooks

• Keep configuration in Git

• Keep Chef Server configuration in Git
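The node, role and recipe relationship on this slide can be pictured as a simple lookup. The sketch below is an illustrative model only and does not use Chef's API; the role names, recipe names and node details are hypothetical.

```python
# Illustrative model only: this is not Chef's API.
# Role names, recipe names and the node below are hypothetical.
ROLES = {
    "base": ["monitoring::agent"],
    "web": ["iis::install", "app::website"],
    "worker": ["app::media_processing"],
}


def run_list(node_roles):
    """Expand a node's roles into an ordered, de-duplicated list of recipes."""
    recipes = []
    for role in node_roles:
        for recipe in ROLES[role]:
            if recipe not in recipes:
                recipes.append(recipe)
    return recipes


node = {"name": "web-01", "environment": "staging", "roles": ["base", "web"]}
print(run_list(node["roles"]))
```

Keeping a mapping like this in Git is what makes a node's configuration reviewable and repeatable.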

Octopus Deploy

• Windows code deployments only – Linux coming soon.

• Environments, Roles, Apps, Releases.

• Deployment Process – steps executed on Roles to “Tentacles”

– NuGet packages retrieved from TeamCity

– Store configuration as variables

– Variable snapshot + NuGet packages = release

– IIS, Windows Service, PowerShell steps available

– Partial and Rolling deploys

– Easy to roll back – just re-deploy last working release.
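The idea that a release is a variable snapshot plus a set of package versions, and that rolling back means re-deploying the last working release, can be sketched as follows. This is a minimal model under stated assumptions, not Octopus Deploy's API; the `Release` type, the package id `Picnic.Web` and the variable names are hypothetical.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Release:
    version: str
    packages: tuple               # (package id, package version) pairs
    variables: MappingProxyType   # read-only snapshot taken at creation time


def create_release(version, packages, live_variables):
    # Copy the variables so later edits to live_variables do not leak in.
    snapshot = MappingProxyType(dict(live_variables))
    return Release(version, tuple(sorted(packages.items())), snapshot)


def roll_back(history):
    """Return the most recent release that deployed successfully."""
    for release, succeeded in reversed(history):
        if succeeded:
            return release
    raise LookupError("no successful release to roll back to")


# Hypothetical usage: the second release fails, so we re-deploy the first.
live = {"ConnectionString": "db-v1"}
r1 = create_release("1.0.0", {"Picnic.Web": "1.0.0"}, live)
live["ConnectionString"] = "db-v2"
r2 = create_release("1.1.0", {"Picnic.Web": "1.1.0"}, live)
target = roll_back([(r1, True), (r2, False)])
```

Because the snapshot is frozen when the release is created, re-deploying an old release restores both the old binaries and the old configuration.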

SOON: Blue / Green Deployments

• Asgard brings up a NEW (GREEN) copy of production AWS infrastructure.

• Automatically Bootstrap instances against Chef and Octopus.

• Smoke test GREEN environment

• Add GREEN web servers to load balancer

• Remove OLD (BLUE) web servers from load balancer

• Asgard tears down BLUE production AWS infrastructure.
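The slide describes this as planned rather than in place, so the following is only a sketch of the cutover ordering: smoke test first, add GREEN, then remove BLUE. The load balancer is modelled as a plain set of server names; the function name and the `smoke_test` callback are hypothetical, and the Asgard, Chef and Octopus steps are not modelled.

```python
def blue_green_cutover(pool, blue, green, smoke_test):
    """Swap the load balancer pool from BLUE servers to GREEN servers.

    GREEN servers are added before BLUE ones are removed, so the pool is
    never empty. If any GREEN server fails its smoke test, the pool is
    left untouched and False is returned.
    """
    if not all(smoke_test(server) for server in green):
        return False
    pool.update(green)             # add GREEN web servers
    pool.difference_update(blue)   # remove BLUE web servers
    return True


# Hypothetical usage with a smoke test that always passes.
pool = {"blue-1", "blue-2"}
ok = blue_green_cutover(pool, {"blue-1", "blue-2"}, {"green-1", "green-2"}, lambda s: True)
```

Only once the cutover returns successfully would the BLUE infrastructure be torn down.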

Message delivery between processes

• Requirements

– Reliable.

– Easy to manage.

– Easy to use.

– Low latency.

Things we looked at (2+ years ago)

• NServiceBus

– Tied to MSMQ at the time.

• MassTransit

– Lacked documentation.

– Not ready for prime-time at that point.

• RabbitMQ + EasyNetQ

– Simple, best fit for us.

– Wrote our own client – bad idea.

RabbitMQ

• Written in Erlang, maintained by Pivotal.

• Linux / Windows.

• Easy administration (Web, command line, JSON).

• Supports

– Clustering and failover.

– Durable and HA queues.

– ‘At least once’ delivery guaranteed.

– Direct, Fan-out and Topic exchanges.

– Partitioning (vhosts), Federation & Shovelling.

How Picnic use RabbitMQ

• Setup

– Cluster of RabbitMQ servers behind ELB in multiple AZs.

• You can use HAProxy instead of ELB.

– EasyNetQ library by Mike Hadlow.

• Handles subscription, publish and reconnection logic.

– So solid now we hardly think about it.

Use case: Scaling of long-running CPU and IO intensive operations

• File format conversion, Zip bundling, PDF & InDesign creation etc.

• Uses Topic exchange. Currently just one topic!

• Subscribers are round-robined by the broker.

• Subscribers are isolated – no clustering.

• Scaling - just launch new instances.

• Redundancy – launch in multiple AZs

• This has worked really well for us.
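The round-robin behaviour is why scaling is just a matter of launching more instances: each message goes to exactly one subscriber, so more subscribers means more work done in parallel. The toy in-memory model below illustrates the dispatch pattern only; it does not use RabbitMQ or EasyNetQ, and the class, worker and message names are hypothetical.

```python
from itertools import cycle


class WorkQueue:
    """Toy in-memory stand-in for one queue with competing subscribers."""

    def __init__(self):
        self.workers = []

    def subscribe(self, worker):
        self.workers.append(worker)

    def dispatch(self, messages):
        """Hand each message to exactly one worker, taking turns."""
        if not self.workers:
            raise RuntimeError("no subscribers")
        assignments = {}
        turn = cycle(self.workers)
        for message in messages:
            assignments.setdefault(next(turn), []).append(message)
        return assignments


queue = WorkQueue()
queue.subscribe("worker-a")
queue.subscribe("worker-b")
print(queue.dispatch(["convert-1", "zip-2", "pdf-3"]))
```

A real broker also takes acknowledgements and prefetch settings into account, which this model leaves out.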

Use case: Distribution to SignalR clients

• In-app notifications, long running task progress etc. to the browser.

• Each web server receives all messages (Fan-out exchange)

• Messages delivered to users / groups via SignalR
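In contrast to the work queue case, a fan-out exchange gives every web server its own copy of each message, and each server then forwards only to the users connected to it. The sketch below is a toy in-memory model, not RabbitMQ or SignalR code; the class, the `deliver` helper and the user names are hypothetical.

```python
class FanoutExchange:
    """Toy in-memory stand-in for a fan-out exchange: every bound queue gets a copy."""

    def __init__(self):
        self.queues = []

    def bind(self, queue):
        self.queues.append(queue)

    def publish(self, message):
        for queue in self.queues:
            queue.append(message)


def deliver(queue, connected_users):
    """Keep only the notifications addressed to users connected to this web server."""
    return [(user, text) for user, text in queue if user in connected_users]


exchange = FanoutExchange()
web1, web2 = [], []          # one queue per web server
exchange.bind(web1)
exchange.bind(web2)
exchange.publish(("dave", "Zip bundle ready"))
print(deliver(web1, {"dave"}))
print(deliver(web2, {"nick"}))
```

Presumably the appeal of this design is that the publisher does not need to know which web server holds a given user's connection.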

Questions

Scalable Permissions

Permissions Model

<approval-1-guid> Owner

<footwear-folder-guid> Write

• All entities in the system are identified by GUIDs

• Each permission applies to a specific entity

Permissions Model

• Permissions also have a role

<footwear-folder-guid> Read

Permissions Model

• Each user has a corresponding "Me" permission

<user-dave-guid> Me

Permissions Model

• As events arrive, relationships are built up between permissions

• e.g. a JobCreated event might give Owner permission to the creator

<user-dave-guid> Me

Permissions Model

• Relationships don't have to stem from the Me permissions

• e.g. having Write permission on a folder could mean you can also Read

<user-dave-guid> Me

Permissions Model

• A user has a permission if there's a path from the user's Me node

• This user doesn't have Read on the folder

<user-dave-guid> Me

Permissions Model

• A user has a permission if there's a path from the user's Me node

• Giving Write on the footwear folder also gives the user Read

<user-dave-guid> Me
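The rule "a user has a permission if there's a path from the user's Me node" is a graph reachability check. Here is a minimal sketch using breadth-first search over an in-memory graph. It is not the Neo4j or RavenDB implementation; short strings stand in for the GUIDs, and the `GRANTS` structure and function name are hypothetical.

```python
from collections import deque

# Edges read: "holding the key permission grants each listed permission".
# Short strings stand in for the GUIDs on the slides.
GRANTS = {
    ("Me", "user-dave"): [("Owner", "approval-1"), ("Write", "footwear-folder")],
    ("Write", "footwear-folder"): [("Read", "footwear-folder")],
}


def has_permission(user_id, wanted, grants=GRANTS):
    """Breadth-first search from the user's Me node to the wanted permission."""
    start = ("Me", user_id)
    seen, frontier = {start}, deque([start])
    while frontier:
        current = frontier.popleft()
        if current == wanted:
            return True
        for granted in grants.get(current, []):
            if granted not in seen:
                seen.add(granted)
                frontier.append(granted)
    return False


print(has_permission("user-dave", ("Read", "footwear-folder")))
```

Removing the edge from Me to Write on the footwear folder breaks the path, so the same query would then return False, matching the earlier slide.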

Original Implementation

ApprovalOwner

• RavenDB document for each permission

• Records which permissions directly inherit this permission

Read <footwear-folder-guid>:

[Write <footwear-folder-guid>]

Write <footwear-folder-guid>:

[Me <user-dave-guid>]

Owner <approval-1-guid>:

Me <user-dave-guid>:

ApprovalOwner

• Worker task builds a second state document for each permission

• Records all permissions which inherit this permission

[Write <footwear-folder-guid>,

Me <user-dave-guid>]

Write <footwear-folder-guid>:

Owner <approval-1-guid>:

Me <user-dave-guid>:

ApprovalOwner

• A user has a permission if their Me appears in the permission's state document

[Write <footwear-folder-guid>,

Me <user-dave-guid>]
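The worker task described above effectively precomputes a transitive closure: each state document lists every permission that inherits a given permission, directly or indirectly, so a check becomes a simple membership test. The sketch below models the documents as plain dictionaries rather than RavenDB documents; the key format and function names are hypothetical.

```python
# Direct documents: permission -> permissions that directly inherit it.
# Keys are "<role> <entity>" strings standing in for the documents on the slides.
DIRECT = {
    "Read footwear-folder": ["Write footwear-folder"],
    "Write footwear-folder": ["Me user-dave"],
    "Me user-dave": [],
}


def build_state(direct):
    """Expand each direct list into every permission that inherits it, transitively."""
    state = {}
    for permission in direct:
        inherited, pending = [], list(direct[permission])
        while pending:
            current = pending.pop(0)
            if current not in inherited:
                inherited.append(current)
                pending.extend(direct.get(current, []))
        state[permission] = inherited
    return state


def has_permission(state, user, permission):
    """A user holds a permission if their Me entry is in its state document."""
    return f"Me {user}" in state.get(permission, [])


state = build_state(DIRECT)
print(state["Read footwear-folder"])
```

The trade-off is that every change to a direct document means recomputing the affected state documents, which is where the lag described on the next slide comes from.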

Original Implementation - Issues

• State documents can get large

– Introduce intermediate groups

• Takes time for state documents to be updated

– Cache and update permission graph in process

• Other processes can still sometimes see out-of-date permissions

– Use a graph database!

New Implementation

• In process of switching to Neo4j

• Transactional updates

• No need to calculate intermediate state

• Faster

• Simpler

• Still need to send some state data across to RavenDB for permissions when searching

Questions

Client Side Architecture

http://slides.com/basarat/picnic-frontend

Questions

Development Workflow

• GitHub

– Feature branches

– Pull Requests

Pull Requests

• Just over 2 months now using the PR based workflow

• Approx 120 closed pull requests so far

• How

– Feature branches, GitHub Tagging, TeamCity build process comments against PR

– Asynchronous task for a team member

• “Here’s a PR please review when you can”

TeamCity & Tagging

• Why

– It was recommended to us.

– Supports consistent and frequent code reviews.

• Improves code quality

• Shares knowledge amongst the team

– Lets us catch some bugs much earlier.

• The wins

– Build server is more often in a green state.

• Can push to your PR branch and rely on CI to give feedback

– Knowledge sharing

• “Oh, that’s how you solved that”

• Reducing silo effects

– Offer constructive feedback to others

• “This could be made better by…”

– Bugs / issues caught

• Typos, debug code left in, incomplete/missed features

• Testing

– Each PR builds as if it was already merged

– Unit/Integration tests run against PR in TeamCity

– YouTrack bugs marked with build numbers to track deployment

• Agility

– Tracking feature changes as they evolve alongside code

• As with all documentation - trying our best to keep up to date

• PR feedback can flag “code change not reflected in docs”

– Testing team

• Can review these changes and be more up to date with

• Review PRs for an idea of scope of changes and where to look for issues

Questions

Thanks

Other ALT.NET Presentations by us

Event Sourcing with F# - Andrew Browne

Thinking in a document centric world with RavenDB - Nick Josevski
