Julia Computing - An Alternative to Hadoop
Julia Computing
Shaurya Shekhar (14BCE0497), Aarushi Thakral (14BCE0499)
• Julia has four main creators, one of whom, Viral B. Shah, is Indian.
• He started working on it the same day he joined UIDAI, the world's largest biometric collection and collation initiative: Aadhaar by day, Julia by night.
• MIT professors played an instrumental role.
The Need For Julia
MATLAB for matrix calculations and linear algebra
R Language for statistics
Ruby & Python for web development
All Languages Serve Different Purposes
However, they aren’t as fast as C or Java.
Julia started off as a network simulation tool.
It was meant to be ‘good at EVERYTHING’.
What Is Julia?
• Data science is a very big deal nowadays.
• It involves tonnes of data and the analysis of this data to draw meaningful inferences.
• Most of the software that helps with this is proprietary, like MATLAB and Wolfram's Mathematica.
• Julia is a free, open-source alternative.
How Is It Faster?
The Problem
• Programmers use tools to translate code written in languages like Ruby & Python into faster languages like C & Java.
• The translated code then needs to be compiled into machine code, the language that the machine understands.
• This makes the process slower, adds complexity and leaves more room for error.
The Solution
• Julia eliminates the need for the intermediary step.
• It uses LLVM, a compiler framework developed at UIUC (University of Illinois at Urbana-Champaign) and enhanced by Apple & Google.
How Does It Compare?
Alternative to Hadoop
• Hadoop is the widely used data crunching system developed by Yahoo and used by Facebook.
• Hadoop breaks up a larger problem into many smaller problems and spreads them across many systems.
• Julia not only incorporates this fundamental principle of ‘design parallelism’, but also enhances it.
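The "break up, spread out, combine" idea can be sketched in a few lines. This is only an illustration in Python, not Hadoop or Julia code: the helper names (split_into_chunks, count_words, word_count) are made up for this example, and everything runs on one machine, with a comment marking where a cluster would hand chunks to different systems.

```python
from collections import Counter
from functools import reduce

def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous pieces of similar size."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def count_words(chunk):
    """The 'smaller problem': count the words in one chunk of lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts

def word_count(lines, n_chunks=3):
    chunks = split_into_chunks(lines, n_chunks)
    # On a cluster, each count_words call would run on a different system.
    partials = map(count_words, chunks)
    # Combine the partial results into one answer.
    return reduce(lambda a, b: a + b, partials, Counter())

lines = ["julia is fast", "hadoop is widely used", "julia is free"]
totals = word_count(lines)
print(totals["julia"], totals["is"])
```

Because each chunk is counted independently, the middle step is the part that can be parallelised; only the final combine needs all the partial results.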
Features
• Multiple Dispatch
• Dynamic Programming Language
• Good Implementation Speed
• Metaprogramming
• Built-in Package Manager
• Designed for Parallelism & Cloud Computing
Features (Cont.)
• User-Defined Data Types as fast as built in ones
• Elegant & Extensible Conversions
• MIT-licensed (free & open-source)
Multiple Dispatch
• We all know what polymorphism is about.
• The number & types of all the arguments are analyzed to decide which implementation of a function is executed.
• This is different from the function overloading we've learnt in C++: there the choice is made at compile time, whereas here it is made at run time.
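To make the run-time choice concrete, here is a minimal sketch in Python. It is a toy, not how Julia implements dispatch: the dispatch decorator and the _methods table are invented for this example, and it matches exact argument types only (no subtypes).

```python
_methods = {}  # (function name, tuple of argument types) -> implementation

def dispatch(*types):
    """Register an implementation for one combination of argument types."""
    def register(func):
        _methods[(func.__name__, types)] = func

        def call(*args):
            # The lookup uses the run-time types of ALL arguments.
            key = (func.__name__, tuple(type(a) for a in args))
            if key not in _methods:
                raise TypeError(f"no method {func.__name__} for {key[1]}")
            return _methods[key](*args)

        return call
    return register

@dispatch(int, int)
def combine(a, b):
    return a + b

@dispatch(str, str)
def combine(a, b):
    return a + " " + b

@dispatch(str, int)
def combine(a, b):
    return a * b

print(combine(2, 3))
print(combine("multiple", "dispatch"))
print(combine("ab", 2))
```

The same name, combine, reaches three different implementations, and the decision is taken only when the call happens and the argument types are known.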
Dynamic Programming Language
• A dynamic programming language is a high-level language that performs, at run time, many common behaviors that static languages perform during compilation.
• These behaviors were first native to the Lisp language.
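Two such run-time behaviors are sketched below in Python, which is also a dynamic language: adding a method to an existing type while the program runs, and building a function from a string. The Matrix class and the double function are made-up names for this illustration.

```python
class Matrix:
    def __init__(self, rows):
        self.rows = rows

# 1. Extend an existing type at run time by attaching a new method.
def transpose(self):
    return Matrix([list(col) for col in zip(*self.rows)])

Matrix.transpose = transpose

# 2. Create a function from source text at run time.
source = "def double(x):\n    return 2 * x\n"
namespace = {}
exec(source, namespace)
double = namespace["double"]

m = Matrix([[1, 2], [3, 4]])
print(m.transpose().rows)
print(double(21))
```

In a static language, both the set of methods on a type and the set of functions in a program are normally fixed when the program is compiled.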
Good Implementation Speed
• Speed is close to that of C in most cases (C being the grand-daddy of programming languages)
• The secrets behind Julia's efficiency are:
1. Just-In-Time (JIT) Compilation using LLVM Compiler Framework
2. Language Design
LLVM (Low Level Virtual Machine)
• It is a collection of modular and reusable compiler and toolchain technologies
• Written in C++
• Can generate relocatable machine code at compile-time or link-time or even binary machine code at runtime
• It supports a language-independent instruction set & type system
• Each instruction is in static single assignment (SSA) form, which means that each variable is assigned exactly once and is then frozen
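The SSA idea can be shown on straight-line code. The sketch below is a simplified illustration in Python, not LLVM's algorithm: to_ssa is a made-up helper, the program format (target plus a list of tokens) is an assumption of this example, and branches, which need extra machinery in real SSA, are left out.

```python
def to_ssa(statements):
    """Rename variables in straight-line code so each name is assigned once.

    statements: list of (target, tokens), where tokens is the right-hand
    side already split into variable names, numbers and operators.
    """
    version = {}  # variable name -> latest version number
    result = []
    for target, tokens in statements:
        # A use refers to the latest version of the variable seen so far.
        rhs = [f"{t}{version[t]}" if t in version else t for t in tokens]
        # Every assignment creates a fresh, never-reassigned version.
        version[target] = version.get(target, 0) + 1
        result.append(f"{target}{version[target]} = {' '.join(rhs)}")
    return result

program = [
    ("x", ["1"]),
    ("x", ["x", "+", "2"]),
    ("y", ["x", "*", "x"]),
    ("x", ["y", "-", "1"]),
]
ssa = to_ssa(program)
for line in ssa:
    print(line)
```

The variable x is assigned three times in the input, but in the output each of x1, x2 and x3 is assigned once, which makes it easy for a compiler to see which value every use refers to.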
Metaprogramming
• It is the ability to write programs that treat programs as their data. Such a program can be designed to read, generate, analyze or transform other programs, & even modify itself while running.
• Hence, technically, the program operates on code itself; this involves inspecting & modifying the code as it runs.
• The strongest legacy of Lisp in Julia is its metaprogramming support.
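A small sketch of "code as data", written in Python with its standard ast module rather than with Julia's own macros and expressions. The SwapAddForMult class is a made-up name: it reads an expression as a tree, rewrites every addition into a multiplication, and then runs the modified code.

```python
import ast

class SwapAddForMult(ast.NodeTransformer):
    """Rewrite every 'a + b' in a syntax tree into 'a * b'."""
    def visit_BinOp(self, node):
        self.generic_visit(node)          # transform nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node

source = "3 + 4"
tree = ast.parse(source, mode="eval")     # the program, held as data

original = eval(compile(tree, "<original>", "eval"))

new_tree = ast.fix_missing_locations(SwapAddForMult().visit(tree))
transformed = eval(compile(new_tree, "<transformed>", "eval"))

print(original, transformed)
```

The program never edits the source string; it inspects and modifies the parsed structure of the code, which is the kind of operation metaprogramming refers to.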
Built-in Package Manager
• It has a built-in manager for installing add-on functionality.
• All package commands are found in the Pkg module, which is included with the 'Base' install itself.
• Libraries written in other languages can be used from Julia easily, and common data formats are supported.
• Examples:
1. 'ccall' is used to access C shared libraries
2. It has Unicode support, which allows math operators to be used as symbols
3. Strings support the UTF-8, UTF-16, UTF-32 & ASCII encodings
4. Markup languages like HTML & XML are also supported
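Calling straight into a C shared library, the job 'ccall' does in Julia, can be illustrated with Python's ctypes module. This is an analogy, not Julia syntax, and it assumes a POSIX system (Linux or macOS) where the C standard library can be loaded this way.

```python
import ctypes
import ctypes.util

# Locate the C standard library. If it cannot be found by name, CDLL(None)
# on POSIX systems exposes the symbols already loaded into this process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"Julia")
print(length)
```

No wrapper code is compiled here: the function is looked up in the shared library by name and called with the declared argument and return types.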
Parallelism
• It provides a multiprocessing environment based on message passing to allow programs to run on multiple processors in shared or distributed memory
• The implementation of message passing is one-sided:
1. The programmer has to explicitly manage only one processor in a two-processor operation
2. The operations do not look like message send & receive; instead they resemble high-level function calls
• The two key notions are remote calls & remote references
Remote Calls & Remote References
• A remote reference is an object that can be used from any processor to refer to an object stored on a particular processor
• A remote call is a request by a processor to call a certain function on certain arguments on another (possibly the same) processor. A remote call returns a remote reference.
• How remote calls are handled in the program flow:
1. Remote Calls return immediately
2. Processor proceeds to next operation while remote call happens somewhere else
3. You can wait for it to finish by calling 'wait' on its remote reference
4. You can obtain the full value of the result by calling 'fetch'
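The four steps above can be mimicked with Python's concurrent.futures as a rough analogy. This is not Julia's API: a Future stands in for the remote reference, worker threads stand in for other processors, and slow_square is a made-up function.

```python
from concurrent.futures import ThreadPoolExecutor, wait

def slow_square(x):
    return x * x

with ThreadPoolExecutor(max_workers=2) as pool:
    # 1. The call returns immediately with a reference to the future result.
    ref = pool.submit(slow_square, 12)
    # 2. The caller proceeds to its next operation in the meantime.
    local = sum(range(10))
    # 3. Block until the call has finished (like 'wait').
    wait([ref])
    # 4. Obtain the full value of the result (like 'fetch').
    value = ref.result()

print(local, value)
```

Only the calling side is written explicitly; nothing in the called function knows that it is being run elsewhere, which is the one-sided style described earlier.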
Conversions
• Conversion of values to various types is carried out by the 'convert' function
• It is a function which accepts two arguments: the first is a type object, the second is a value to convert to that type
• It is also really easy to define our own conversions
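The shape of such a function, type first and value second, with user-defined rules added later, can be sketched in Python. This only imitates the idea and is not Julia's convert: the _conversions table, define_conversion and the Celsius type are all invented for this example.

```python
_conversions = {}  # (target type, source type) -> conversion function

def define_conversion(target_type, source_type, rule):
    """Register a rule for converting source_type values to target_type."""
    _conversions[(target_type, source_type)] = rule

def convert(target_type, value):
    """Convert value to target_type using a registered rule."""
    if isinstance(value, target_type):
        return value                      # already of the requested type
    rule = _conversions.get((target_type, type(value)))
    if rule is None:
        raise TypeError(
            f"cannot convert {type(value).__name__} to {target_type.__name__}"
        )
    return rule(value)

# A conversion between built-in types.
define_conversion(float, int, float)

# Conversions for a user-defined type.
class Celsius:
    def __init__(self, degrees):
        self.degrees = degrees

define_conversion(Celsius, float, Celsius)
define_conversion(float, Celsius, lambda c: c.degrees)

print(convert(float, 3))
print(convert(Celsius, 21.5).degrees)
```

Adding a new conversion is a single registration, and every caller of convert picks it up without being changed.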
Licences
• The core of the Julia implementation is licensed under the MIT License.
• Various libraries used by Julia have their own licenses.
• It is an open-source language, which gives people the flexibility of modifying the language to better suit their needs.
Plotting Capabilities
• Since Julia is used to handle large amounts of data, it is only natural for it to be able to visualize data easily.
• It uses various libraries to plot graphs, flow charts and pie charts, such as:
1. PyPlot, to call Python's matplotlib from Julia with little or no overhead (linspace)
2. Gadfly, a plotting library that follows a different style, the grammar of graphics (draw)
Similar to MATLAB (using PyPlot)
Using Gadfly
Just For Fun
Let's Take An Example
We Use The N-Queens Problem
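For readers who have not met the problem: place N queens on an N x N board so that no two attack each other. The sketch below counts the solutions with plain backtracking in Python. It is not the code shown in the presentation, and count_n_queens is a made-up name.

```python
def count_n_queens(n):
    """Count the ways to place n non-attacking queens on an n x n board."""
    def place(row, cols, diag1, diag2):
        if row == n:
            return 1                      # every row has a queen
        total = 0
        for col in range(n):
            d1, d2 = row - col, row + col
            # Skip squares attacked along a column or either diagonal.
            if col in cols or d1 in diag1 or d2 in diag2:
                continue
            total += place(row + 1, cols | {col}, diag1 | {d1}, diag2 | {d2})
        return total

    return place(0, frozenset(), frozenset(), frozenset())

print(count_n_queens(8))
```

The running time grows very quickly with N, which is what makes the problem a convenient small benchmark for comparing language speed.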
Understanding Time Notations
Julia In The Future
Thank You