julia computing - an alternative to hadoop

Julia Computing Shaurya Shekhar (14BCE0497) Aarushi Thakral (14BCE0499)

Upload: shaurya-shekhar

Post on 12-Jan-2017



TRANSCRIPT

Page 1: Julia Computing - an alternative to Hadoop

Julia Computing

Shaurya Shekhar (14BCE0497), Aarushi Thakral (14BCE0499)

Page 2: Julia Computing - an alternative to Hadoop

• Julia has four main creators, one of whom, Viral B. Shah, is Indian.

• He started working on it the same day he joined UIDAI, the authority behind Aadhaar, the world’s largest biometric collection and collation initiative: Aadhaar by day, Julia by night.

• MIT professors played an instrumental role.

Page 3: Julia Computing - an alternative to Hadoop

The Need For Julia

MATLAB for matrix calculations and linear algebra

R Language for statistics

Ruby & Python for web development

All Languages Serve Different Purposes

However, they aren’t as fast as C or Java.

Julia started off as a network simulation tool.

It was meant to be ‘good at EVERYTHING’.

Page 4: Julia Computing - an alternative to Hadoop

What Is Julia?

• Data science is a very big deal nowadays.

• It involves analysing huge amounts of data to draw meaningful inferences.

• Most of the software used for this, such as MATLAB and Wolfram’s Mathematica, is proprietary.

• Julia is a free, open-source alternative.

Page 5: Julia Computing - an alternative to Hadoop

How Is It Faster?

The Problem

• Programmers use tools to translate languages like Ruby & Python into faster languages like C & Java.

• The translated code then has to be compiled into machine code, the language that the machine understands.

• This extra step makes the process slower, adds complexity and leaves more room for error.

Solution

Julia eliminates the need for the intermediary step.

It uses LLVM, a compiler framework developed at the University of Illinois at Urbana-Champaign (UIUC) and enhanced by Apple & Google.

Page 6: Julia Computing - an alternative to Hadoop

How Does It Compare?

Page 7: Julia Computing - an alternative to Hadoop

Alternative to Hadoop

• Hadoop is a widely used data-crunching system developed largely at Yahoo and used by Facebook.

• Hadoop breaks up a large problem into many smaller problems and spreads them across many systems.

• Julia not only incorporates this fundamental principle of ‘design parallelism’, but also enhances it.
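
The split-and-spread idea above can be sketched in a few lines. This is a minimal illustration only, written in Python as a stand-in: it is neither Hadoop nor Julia code, it uses threads on one machine instead of separate systems, and the names `count_words`, `split_into_chunks` and `parallel_word_count` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Solve one small problem: count the words in a list of lines."""
    return Counter(word for line in chunk for word in line.split())


def split_into_chunks(lines, n_chunks):
    """Break the large problem into at most n_chunks smaller ones."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def parallel_word_count(lines, n_workers=3):
    """Spread the chunks over workers, then merge the partial results."""
    chunks = split_into_chunks(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # adds counts together
    return total


if __name__ == "__main__":
    print(parallel_word_count(["a b a", "b c", "a"]))
```

Each worker sees only its own chunk, so the same structure carries over when the workers are separate machines.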

Page 8: Julia Computing - an alternative to Hadoop

Features

• Multiple Dispatch

• Dynamic Programming Language

• Good Implementation Speed

• Metaprogramming

• Built-in Package Manager

• Designed for Parallelism & Cloud Computing

Page 9: Julia Computing - an alternative to Hadoop

Features (Cont.)

• User-Defined Data Types as fast as built in ones

• Elegant & Extensible Conversions

• MIT-licensed (free & open-source)

Page 10: Julia Computing - an alternative to Hadoop

Multiple Dispatch

• We all know what polymorphism is about.

• The number and types of all the arguments are examined to decide which implementation (method) of a function is executed.

• This differs from the function overloading we learnt in C++: there the choice is made at compile time from the declared types, whereas in Julia it is made at run time from the actual types of the arguments.
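
As a rough illustration of choosing an implementation at run time from the types of all arguments, here is a small sketch in Python (a stand-in, since Python has no built-in multiple dispatch). The registry, the `method` decorator, `dispatch` and the shape classes are all hypothetical names, and the lookup rule is far simpler than Julia's real method-specificity rules.

```python
# Registry mapping a tuple of argument types to an implementation.
_methods = {}


def method(*types):
    """Register the decorated function for the given argument types."""
    def decorator(fn):
        _methods[types] = fn
        return fn
    return decorator


def dispatch(a, b):
    """Pick an implementation from the run-time types of BOTH arguments."""
    for type_a in type(a).__mro__:        # most specific class first
        for type_b in type(b).__mro__:
            fn = _methods.get((type_a, type_b))
            if fn is not None:
                return fn(a, b)
    raise TypeError("no method for these argument types")


class Shape: pass
class Circle(Shape): pass
class Square(Shape): pass


@method(Circle, Square)
def _circle_square(a, b):
    return "circle meets square"


@method(Shape, Shape)
def _shape_shape(a, b):
    return "generic shapes"


if __name__ == "__main__":
    print(dispatch(Circle(), Square()))  # uses the (Circle, Square) method
    print(dispatch(Square(), Circle()))  # falls back to (Shape, Shape)
```

The point of the sketch is that swapping the argument order changes which method runs, something single dispatch on the first argument alone cannot express.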

Page 11: Julia Computing - an alternative to Hadoop

Dynamic Programming Language

• The term describes a class of high-level programming languages that, at run time, execute many common programming behaviours that static programming languages perform during compilation.

• Such behaviour was first native to the Lisp language.

Page 12: Julia Computing - an alternative to Hadoop

Good Implementation Speed

• Performance is close to that of C in most cases (C being the grand-daddy of programming languages)

• The secrets behind Julia’s efficiency are:

1. Just-In-Time (JIT) Compilation using LLVM Compiler Framework

2. Language Design

Page 13: Julia Computing - an alternative to Hadoop

LLVM (Low Level Virtual Machine)

• It is a collection of modular and reusable compiler and toolchain technologies

• Written in C++

• Can generate relocatable machine code at compile-time or link-time or even binary machine code at runtime

• It provides a language-independent instruction set & type system

• Each instruction is in static single assignment (SSA) form, which means that each variable is assigned exactly once and is then frozen
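
To make the "assigned once and frozen" idea concrete, the sketch below renames variables in a few straight-line assignments so that every name is assigned exactly once. It is a toy written in Python under simplifying assumptions (no branches or loops, so no phi nodes), not how LLVM itself is implemented; `to_ssa` is a hypothetical helper.

```python
import re


def to_ssa(lines):
    """Rewrite straight-line 'name = expr' statements into SSA-style names."""
    version = {}  # latest version number of each variable
    result = []
    for line in lines:
        target, expr = [part.strip() for part in line.split("=", 1)]

        def rename(match):
            # A use refers to the latest version of an already-assigned name.
            name = match.group(0)
            return f"{name}{version[name]}" if name in version else name

        expr = re.sub(r"\b[A-Za-z_]\w*", rename, expr)
        # Every assignment creates a fresh, never-reassigned version.
        version[target] = version.get(target, 0) + 1
        result.append(f"{target}{version[target]} = {expr}")
    return result


if __name__ == "__main__":
    for statement in to_ssa(["x = 1", "x = x + 1", "y = x * 2"]):
        print(statement)
```

Because no name is ever reassigned, a compiler can tell at a glance which definition each use refers to.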

Page 14: Julia Computing - an alternative to Hadoop

Metaprogramming

• It is the ability to write programs that treat other programs as their data. Such a program can be designed to read, generate, analyze or transform other programs, and even modify itself while running

• Technically, then, the program operates on code itself: it inspects and modifies code as it runs

• The strongest legacy of Lisp in Julia is its metaprogramming support
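
A small sketch of a program that treats code as data, using Python's standard `ast` module as a stand-in for Julia's expression objects and macros (which this transcript does not show). The `AddToMult` transformer is a hypothetical example: it reads an expression, rewrites every `+` into `*`, and runs the result.

```python
import ast


class AddToMult(ast.NodeTransformer):
    """Rewrite every addition in a syntax tree into a multiplication."""

    def visit_BinOp(self, node):
        self.generic_visit(node)          # transform nested operations first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node


def run_transformed(source):
    """Parse source text, transform the tree, then compile and evaluate it."""
    tree = ast.parse(source, mode="eval")            # code becomes data
    tree = ast.fix_missing_locations(AddToMult().visit(tree))
    return eval(compile(tree, "<transformed>", "eval"))


if __name__ == "__main__":
    print(run_transformed("2 + 3"))  # evaluated as 2 * 3
```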

Page 15: Julia Computing - an alternative to Hadoop

Built-in Package Manager

• Julia has a built-in package manager for installing add-on functionality.

• All package commands are found in the Pkg module & are included with the ‘Base’ install itself.

• Libraries written in other languages can also be used easily from Julia.

• Examples:

1. ‘ccall’ is used to call functions in C shared libraries

2. Unicode is supported, so mathematical symbols can be used as operators

3. Strings can be UTF-8, UTF-16, UTF-32 or ASCII

4. Markup languages like HTML & XML are also supported

Page 16: Julia Computing - an alternative to Hadoop

Parallelism

• Julia provides a multiprocessing environment based on message passing to allow programs to run on multiple processors in shared or distributed memory

• Its implementation of message passing is one-sided:

• The programmer has to manage only one of the two processors involved in an operation

• The operations do not look like message send & receive; instead they resemble high-level function calls

• The two key notions are remote calls & remote references

Page 17: Julia Computing - an alternative to Hadoop

Remote Calls & Remote References

• A remote reference is an object that can be used from any processor to refer to an object stored on a particular processor

• A remote call is a request by a processor to call a certain function on certain arguments on another (possibly the same) processor. A remote call returns a remote reference.

• How remote calls are handled in the program flow:

1. Remote Calls return immediately

2. The calling processor proceeds to its next operation while the remote call runs somewhere else

3. You can wait for it to finish by calling ‘wait’ on its remote reference

4. You can obtain the full value of the result by calling ‘fetch’
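
The four steps above can be mimicked with Python's standard `concurrent.futures` module. This is an analogy only, not Julia's API: a `Future` plays the role of the remote reference, `submit` the remote call, `wait` the ‘wait’ step and `result()` the ‘fetch’ step, and the work runs on a thread in the same process instead of on another processor. `slow_square` and `demo` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def slow_square(x):
    """Stand-in for work that would run on another processor."""
    time.sleep(0.1)
    return x * x


def demo():
    with ThreadPoolExecutor(max_workers=2) as pool:
        # 1. The call returns immediately with a reference to the result.
        ref = pool.submit(slow_square, 7)
        # 2. The caller carries on with its own work in the meantime.
        local = sum(range(10))
        # 3. Block until the remote work has finished.
        wait([ref])
        # 4. Fetch the full value of the result.
        value = ref.result()
    return local, value


if __name__ == "__main__":
    print(demo())
```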

Page 18: Julia Computing - an alternative to Hadoop

Conversions

• Conversion of values to various types is carried out by the ‘convert’ function

• It is a function which accepts two arguments: the first is a type object, the second is a value to convert to that type

• It is also really easy to define our own conversions
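
The shape of such a function, a target type first and a value second, plus user-defined conversions, can be sketched as follows. This is a Python stand-in with a hand-rolled registry, not Julia's ‘convert’ itself; `define_conversion`, `Feet` and `Meters` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

# Registry mapping (target type, source type) to a conversion function.
_conversions = {}


def define_conversion(target, source, fn):
    """Register how to turn a value of type `source` into type `target`."""
    _conversions[(target, source)] = fn


def convert(target, value):
    """Convert `value` to `target`: type first, value second."""
    if isinstance(value, target):
        return value                      # already the requested type
    fn = _conversions.get((target, type(value)))
    if fn is None:
        raise TypeError(f"cannot convert {type(value).__name__} "
                        f"to {target.__name__}")
    return fn(value)


@dataclass
class Feet:
    value: float


@dataclass
class Meters:
    value: float


# A user-defined conversion between two user-defined types.
define_conversion(Meters, Feet, lambda f: Meters(f.value * 0.3048))
# A conversion between built-in types.
define_conversion(float, int, float)

if __name__ == "__main__":
    print(convert(Meters, Feet(10.0)))
    print(convert(float, 3))
```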

Page 19: Julia Computing - an alternative to Hadoop

Licences

• The core of the Julia implementation is licensed under the MIT License.

• Various libraries used by Julia have their own licenses.

• It is an open-source language which gives people the flexibility of modifying the language to better suit their needs.

Page 20: Julia Computing - an alternative to Hadoop

Plotting Capabilities

• Since Julia is used to handle large amounts of data, it is only natural that it can also visualize that data easily.

• It relies on libraries such as the following to plot graphs, flow charts and pie charts:

1. PyPlot, which calls Python’s matplotlib from Julia with little or no overhead (linspace)

2. Gadfly, a plotting package based on the grammar of graphics, a different style of plotting (draw)

Page 21: Julia Computing - an alternative to Hadoop

Similar to MATLAB (using PyPlot)

Page 22: Julia Computing - an alternative to Hadoop

Using Gadfly

Page 23: Julia Computing - an alternative to Hadoop

Just For Fun

Page 24: Julia Computing - an alternative to Hadoop

Let’s Take An Example

We use the N-Queens problem.
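
The code from the slides is not part of this transcript. For readers unfamiliar with the problem, the following is a minimal backtracking sketch in Python (not the presenters' Julia code) that counts the ways to place N queens on an N×N board so that no two attack each other; `count_n_queens` is a hypothetical name.

```python
def count_n_queens(n):
    """Count placements of n mutually non-attacking queens on an n x n board."""

    def place(row, cols, diag1, diag2):
        if row == n:                      # every row holds a queen
            return 1
        total = 0
        for col in range(n):
            # Skip squares attacked along a column or either diagonal.
            if col in cols or (row - col) in diag1 or (row + col) in diag2:
                continue
            total += place(row + 1,
                           cols | {col},
                           diag1 | {row - col},
                           diag2 | {row + col})
        return total

    return place(0, frozenset(), frozenset(), frozenset())


if __name__ == "__main__":
    print(count_n_queens(8))  # the classic 8-queens board has 92 solutions
```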

Page 27: Julia Computing - an alternative to Hadoop

Understanding Time Notations

Page 32: Julia Computing - an alternative to Hadoop

Julia In The Future

Page 33: Julia Computing - an alternative to Hadoop

Thank You