pranjal singh€¦ · 1. pranjal singh (speaker). big data and machine learning: two future pillars...

4
PRANJAL SINGH E-mail: [email protected] Senior Data Scientist, ANI Technologies (Ola Cabs/Foodpanda), Bengaluru, INDIA Phone: +91 8953 981 856 LinkedIn: https://www.linkedin.com/in/pranjalsingh ACADEMIC PROFILE Year Degree/Certificate Institute/Board CGPA/% 2015 B. Tech.-M. Tech. (Computer Science & Engineering) IIT Kanpur PG: 8.8/10.0 UG: 8.0/10.0 2010 AISSCE(XII) Sunbeam English School, Varanasi 96.2% 2008 AISSE(X) Sunbeam English School, Varanasi 92.8% AREAS OF INTEREST Machine Learning Natural Language Processing Big Data Mining Information Retrieval Artificial Intelligence ACHIEVEMENTS 2 Patents filed while being an employee at Visa Inc. in US Patent Office and 4 academic publications Secured AIR 226 IIT JEE-2010, AIR 288 & State Rank 17 in AIEEE-2010, Rank 16 in UPTU-2010 and District Rank 1 in AISSCE-2010 Selected for Summer Undergraduate Research Fellowship Award-2012 at IISC Bangalore & NCTU(Taiwan) Fellowship-2012 Selected for InCNTRE Summer of Networking-2013 at Indiana University, USA & Yahoo Summer School-2013 Awarded Merit Scholarship-2010 for Professional Studies by CBSE and UP Govt. Science and Technology Scholarship-2010 Finalist of Jnana Vijnana Pratibha Khoj in 2007 and 2008 (National Level Competition with participation of over 100 schools) Secured Rank 4 for building an “Automated Toll Collection” app among more than 200 Teams in InMobi Hackathon 2016 Awarded 2 Go Beyond Awards for successfully demonstrating key achievements of Bangalore Team at VISA Headquarter, USA TALKS & PUBLICATIONS 1. Pranjal Singh (Speaker). Big Data and Machine Learning: Two Future Pillars of IT Industry. Re-Work Deep Learning Summit Asia, Singapore. Oct 2016 2. Jenny S. Li, Li-Chiou Chen, V. Monaco, Pranjal Singh, Charles C. Tappert. A Comparison of Classifiers and Features for Authorship Authentication of Social Networking Messages.Concurrency & Computation:Practice & Experience.Wiley Online Library.Aug 2016 3. Pranjal Singh, Amitabha Mukerjee. Words are not equal: Graded Weighting Model for building Composite Document Vectors. 12th International Conference on Natural Language Processing. Nov 2015. 4. Pranjal Singh, Amitabha Mukerjee. Word Vector Averaging: Parserless Approach to Sentiment Analysis. regICON: Regional Symposium on Natural Language Processing, IIT BHU. Mar 2015. SUMMER INTERNSHIPS Authorship Authentication (Pace University, New York, USA) (May’14-Jul’14) Studied the effect of various classifiers on Facebook data of 10 users (400 short messages each) Tested various sets of stylometry and ad hoc social networking features for better classification Designed an algorithm that combined top 3 classifiers linearly (SVM, ANN and Decision Tree) Achieved very good authentication results (accuracy of upto 88%) with new algorithm Sponsored Content (Bell Labs, India) (May’13-Jul’13) Proposed a solution for content providers like YouTube, ESPN to sponsor multimedia contents Sponsored Contents on YouTube and Play-Store; these were relaxed from users’ mobile data Set up a VPN server and a nodeJS script that could store user statistics and modify HTTP data Studied its impact on end users’ usage behavior of mobile data with a self-built Android app MultiWay Cut (Max Planck Institute, Germany) (May’12-Jul’12) Studied the concept of Important Separators, MultiWay Cut and various graph algorithms Implemented the problem of Multi Way Cut using C++ and LEDA; applied on various graphs Achieved optimal results (parted pairs of source and sink with minimum cut) on various graphs PROFESSIONAL Ola Cabs/Foodpanda Bengaluru, India Senior Data Scientist Fraud Identification @Foodpanda (July’18-Present) Built a deep Autoencoder Model for detection of anomalous orders and an entropy based fraud syndicate identification model Developed a random forest based fraud prediction model on orders achieving an AUC of 0.90 & precision of 90% at 50% recall Developed an unsupervised deep learning language model for detection of fraud customers who have jittery address and email Developed a tf-idf based NLP model for predicting duplicate vendor listings with similar names to improve customer experience Demand Shaping @Foodpanda (July’18-Present) Developed a customer segmentation algorithm for finding premium Ola users for selective targeting and effective monetary burn foreseeing their retention behavior at Foodpanda. This premium base is being used for food campaigns across India. Built a supervised model for relevant customer prediction to assess retention probability of new users achieving an AUC of 0.93

Upload: others

Post on 26-Sep-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: PRANJAL SINGH€¦ · 1. Pranjal Singh (Speaker). Big Data and Machine Learning: Two Future Pillars of IT Industry. Re-Work Deep Learning Summit Asia, Singapore. Oct 2016 2. Jenny

PRANJAL SINGH E-mail: [email protected] Senior Data Scientist, ANI Technologies (Ola Cabs/Foodpanda), Bengaluru, INDIA Phone: +91 8953 981 856

LinkedIn: https://www.linkedin.com/in/pranjalsingh

ACADEMIC PROFILE

Year Degree/Certificate Institute/Board CGPA/% 2015 B. Tech.-M. Tech. (Computer Science & Engineering) IIT Kanpur PG: 8.8/10.0 UG: 8.0/10.0

2010 AISSCE(XII) Sunbeam English School, Varanasi 96.2%

2008 AISSE(X) Sunbeam English School, Varanasi 92.8%

AREAS OF INTEREST Machine Learning Natural Language Processing Big Data Mining Information Retrieval Artificial Intelligence

ACHIEVEMENTS 2 Patents filed while being an employee at Visa Inc. in US Patent Office and 4 academic publications Secured AIR 226 IIT JEE-2010, AIR 288 & State Rank 17 in AIEEE-2010, Rank 16 in UPTU-2010 and District Rank 1 in AISSCE-2010 Selected for Summer Undergraduate Research Fellowship Award-2012 at IISC Bangalore & NCTU(Taiwan) Fellowship-2012 Selected for InCNTRE Summer of Networking-2013 at Indiana University, USA & Yahoo Summer School-2013 Awarded Merit Scholarship-2010 for Professional Studies by CBSE and UP Govt. Science and Technology Scholarship-2010 Finalist of Jnana Vijnana Pratibha Khoj in 2007 and 2008 (National Level Competition with participation of over 100 schools)

Secured Rank 4 for building an “Automated Toll Collection” app among more than 200 Teams in InMobi Hackathon 2016

Awarded 2 Go Beyond Awards for successfully demonstrating key achievements of Bangalore Team at VISA Headquarter, USA

TALKS & PUBLICATIONS

1. Pranjal Singh (Speaker). Big Data and Machine Learning: Two Future Pillars of IT Industry. Re-Work Deep Learning

Summit Asia, Singapore. Oct 2016

2. Jenny S. Li, Li-Chiou Chen, V. Monaco, Pranjal Singh, Charles C. Tappert. A Comparison of Classifiers and Features for Authorship

Authentication of Social Networking Messages.Concurrency & Computation:Practice & Experience.Wiley Online Library.Aug 2016

3. Pranjal Singh, Amitabha Mukerjee. Words are not equal: Graded Weighting Model for building Composite Document Vectors.

12th International Conference on Natural Language Processing. Nov 2015.

4. Pranjal Singh, Amitabha Mukerjee. Word Vector Averaging: Parserless Approach to Sentiment Analysis. regICON: Regional

Symposium on Natural Language Processing, IIT BHU. Mar 2015.

SUMMER INTERNSHIPS

Authorship Authentication (Pace University, New York, USA)

(May’14-Jul’14)

Studied the effect of various classifiers on Facebook data of 10 users (400 short messages each) Tested various sets of stylometry and ad hoc social networking features for better classification Designed an algorithm that combined top 3 classifiers linearly (SVM, ANN and Decision Tree) Achieved very good authentication results (accuracy of upto 88%) with new algorithm

Sponsored Content (Bell Labs, India) (May’13-Jul’13)

Proposed a solution for content providers like YouTube, ESPN to sponsor multimedia contents Sponsored Contents on YouTube and Play-Store; these were relaxed from users’ mobile data Set up a VPN server and a nodeJS script that could store user statistics and modify HTTP data Studied its impact on end users’ usage behavior of mobile data with a self-built Android app

MultiWay Cut (Max Planck Institute, Germany)

(May’12-Jul’12)

Studied the concept of Important Separators, MultiWay Cut and various graph algorithms Implemented the problem of Multi Way Cut using C++ and LEDA; applied on various graphs Achieved optimal results (parted pairs of source and sink with minimum cut) on various graphs

PROFESSIONAL

Ola Cabs/Foodpanda Bengaluru, India Senior Data Scientist Fraud Identification @Foodpanda (July’18-Present) Built a deep Autoencoder Model for detection of anomalous orders and an entropy based fraud syndicate identification model Developed a random forest based fraud prediction model on orders achieving an AUC of 0.90 & precision of 90% at 50% recall Developed an unsupervised deep learning language model for detection of fraud customers who have jittery address and email Developed a tf-idf based NLP model for predicting duplicate vendor listings with similar names to improve customer experience

Demand Shaping @Foodpanda (July’18-Present) Developed a customer segmentation algorithm for finding premium Ola users for selective targeting and effective monetary

burn foreseeing their retention behavior at Foodpanda. This premium base is being used for food campaigns across India. Built a supervised model for relevant customer prediction to assess retention probability of new users achieving an AUC of 0.93

Page 2: PRANJAL SINGH€¦ · 1. Pranjal Singh (Speaker). Big Data and Machine Learning: Two Future Pillars of IT Industry. Re-Work Deep Learning Summit Asia, Singapore. Oct 2016 2. Jenny

Customer Experience @Foodpanda (July’18-Present) Developed a vendor cancellation prediction model which could predict in advance if the vendor would start cancelling in future Built a supervised model learning from existing cancellations which achieved a precision of 90% at 75% recall; AUC of 0.93

Allocation Layer Optimization and Supply Shaping (Apr’18-July’18) Formulated cab allocation as a multi-objective optimization problem; objectives being D/S, GMV, driver earnings and allocations Used Hungarian optimization for matching customers with driver partners achieving better business metrics than current

method in simulation. Roll-out already planned for Tier-2 cities. Target of ~1% increase in allocations pan India

Data Scientist Share Matching Optimization (Oct’17-Mar’18) Designed a graph based hierarchical optimization algorithm for Ola share customer matching to control growth/customer

experience/profitability, as per time and business requirements. Used genetic algorithm for faster weight identification New matching algorithm is being used pan India now and has shown huge increase of 5% ENTR for share business unit Developed a supervised share occupancy prediction model for enabling a faster decision at allocation layer between partial

matching & idle allotment. This brought another 2% increase in ENTR for share business pan India

Share Community Detection (Feb’18-Mar’18) Built a community model using directionality, demand distribution across time and linear separation of source-destination pairs Used Multidimensional Scaling(MDS) to transform features to Euclidean space and k-means for clustering communities Key win was a 2% improvement in matches within the community over the old model and pan India usage in new share pricing

Ola Rental Cross Sell (Jun’17-Sep’17) Built a prediction model which utilizes customer characteristics, ride characteristics, financial propensity and locational

attributes to predict whether a City Taxi ride or sequence of such rides can be called as rental and with what confidence Trained model achieved significantly high recall and precision with an AUC of ~0.95 Increased Ola Rental confirmed bookings by roughly 2.6% and an additional GMV of 21 Lakhs/month for top-7 Tier-1 cities

Customer Segmentation (Jun’17-Aug’17) Gender prediction model using deep learning based sequence modeling achieving an accuracy of 95% & active usage in Ola Play Customer segmentation for predicting financial propensity on the basis of Ola footprints including ride and conversion behavior Project Guardian launched across India to ensure safety of Ola customers where gender model is a key contributor

VISA Inc. Bengaluru, India Senior Software Engineer, Data

VISE: VISA InHouse Search Engine (Patent Filed) (Nov’15-Feb’16) Developed a model and a framework to identify top-k relevant merchants from transaction data in real-time Built a Boosting Model of 3 classifiers using NLP features(Phonetics & Distance) that achieved an accuracy of ~95.9% Identified merchants from a huge dataset more intelligently and accurately directly impacting Real-Time Offers’ system Technology: Hadoop, Spark, Hive, Apache Parquet, Spark SQL, Machine Learning, NLP, Python

Pattern Identification in Spark with FP-Tree (Mar’16) Designed and developed a scalable FP-Tree model to clean raw merchant names from transactions & identify merchant groups The model had the ability to identify the junk in the raw name and clean it to give richer information Technology: Hadoop, Spark, Hive, Apache Parquet, NLP, Python

Merchant Stamping in Real Time with Spark and Kafka (Mar’16-May’16) Designed and developed a real-time merchant stamping framework in Spark for faster downstream consumption Intelligently reduced lookup size and optimized record matching time with a one-way fingerprint for each transaction Analyzed and designed a hierarchical framework which utilizes k-means clustering to form clusters of redundant records Reduced the master cache by more than 85% and improved performance by 35% achieving a record rate of stamping, directly

impacting performance of loyalty platforms and thus helped in easy migration of existing systems to LTA Technology: Hadoop, Spark, Kafka, Java, Python

Merchant Location Compliance (Sep’16-Present) Designed and developed a predictive model to flag non-compliant transactions which is scalable and high performing Involved fuzzy string matching to validate string entities and regression to assign feature weights for scoring Text mining and OCR for location identification from legal documents using Conditional Random Fields Framework helped the compliance teams to prevent revenue leakage in order of millions due to ambiguities in incoming data Technology: Hadoop, Spark, Hive, Java, Spark SQL, Parquet

Page 3: PRANJAL SINGH€¦ · 1. Pranjal Singh (Speaker). Big Data and Machine Learning: Two Future Pillars of IT Industry. Re-Work Deep Learning Summit Asia, Singapore. Oct 2016 2. Jenny

Software Engineer, Data Merchant Location Credibility (Patent Filed) (Oct’15-Nov’15) Designed a framework which models merchants and deduce their properties using a scalable & robust Machine Learning model Built a clustering (DBSCAN) & boosting model to detect anomaly in real-time achieving an accuracy of 98% on 2M transactions Helped in saving millions of transactions from risk and fraud without using any personal information of cardholders Technology: Hadoop, Python, Hive, Spark

MTECH THESIS Decompositional Semantics for Document Embedding

Advisor: Amitabha Mukerjee, Professor, CSE Department, IIT Kanpur

We developed a simple and robust technique to generate document vector using decompositional semantics We proposed a graded weighting schema for building document vectors from word vectors with distributed semantics We released the largest movie review corpus in Hindi containing 697 movie reviews collected from Dainik Jagran and Navbharat Achieved state-of-the art results for sentiment analysis on Hindi movie and product reviews, 90.30% & 92.89% accuracies resp. Achieved state-of-the art results for sentiment analysis on IMDB movie reviews, 94.19% accuracy, with 1.6% improvement and

on Amazon product reviews, 92.91% accuracy, with 7% improvement

KEY COURSE PROJECTS

Movie Recommendation System (Machine Learning) Aug’13-Nov’13 Advisor: Amitabha Mukerjee, Professor, CSE Department, IIT Kanpur Built a system that predicts whether an unknown user is likely to see a given movie using the MovieLens dataset Predicted movie rating on a scale of 1 to 5 using User Based and Item Based Collaborative Filtering Achieved Root Mean Square Error of 0.947 and Mean Absolute Error of 0.741 in rating predictions

Measuring Similarity between Documents (Natural Language Processing) Aug’13-Nov’13 Advisor: Amitabha Mukerjee, Professor, CSE Department, IIT Kanpur Studied various approaches to compute similarity between two documents such as Pearson Correlation, Cosine Similarity, etc. Classified documents into coherent cluster using k-means with various distance measures; achieved entropy of the order of 0.06

3D Reconstruction from Several Images (Computer Vision) Jan’14-Apr’14 Advisor: Vinay P Namboodiri, Professor, CSE Department, IIT Kanpur Created 3D models from a set of images using Structure from Motion from point correspondences of multiple images Implemented Sift Descriptor for feature matching, and used camera calibration, depth determination, triangulation techniques

Unsupervised Cross Lingual Alignment Jan’14-Apr’14 Advisor: Satyadev Nandakumar, Professor, CSE Department, IIT Kanpur Proposed a framework that could align bilingual text without supervision; used PLSA and ADIOS as baseline system Worked on a manually built corpus on coal scam in 2 languages (English & Hindi) containing around 50,000 tokens

Empirical Study of Cross-Lingual Unsupervised Alignment Based on Syntactic and Distributional Features

Advisor: Amitabha Mukerjee, Professor, CSE Department, IIT Kanpur Aug’13-Nov’13 Studied various algorithms such a k-means, PLSA, SVM which could detect important topics in a single document Isolated aligned tokens in both Hindi and English by using PLSA and used ADIOS to separate syntactic classes

Identification of Emotions in Text (Artificial Intelligence) Jan’12-Apr’12 Advisor: Amitabha Mukerjee, Professor, CSE Department, IIT Kanpur

Studied various NLP Algorithms such as PLSA & LSA which could identify latent topics in the document Modeled the document (ISEAR corpus) into few number of topics which were essentially emotions, using PLSA and PCFG parser

Achieved very good latent clusters corresponding to each of the 7 emotions; achieved good scores for sentiment analysis part

OTHER PROJECTS

Computer Networks: Packet Sniffer for capturing packets in local network by acting as gateway

Compilers: Compiler for a subset of ADA ver2 and code generation for MIPS architecture Mobile Computing: Desktop control using an Android app over WiFi using various types of gestures and sensors Database: Built a web application that supported Hostel Management electronically using PHP and MySQL Operating System: Developed PintOS to support system calls, message queues, file system and virtual memory with paging

HACKATHON ParkIt: Hassle Free Toll Collection (InMobi Hackathon, Bengaluru) Feb’16 Designed and developed a model to promote cashless push payments at the predefined toll booth making them self-serviceable Integrated Zomato API for on-route eating places suggestion and used card to card push payment using Two Way SSL

Authentication for a hassle free payment with Google Cloud Messaging to send push messages of transaction status to toll Business Value: Transparent and hassle free automated solution to the toll problem where operating time as well as cost

becomes significantly less while there is an extra revenue scope for all parties

Page 4: PRANJAL SINGH€¦ · 1. Pranjal Singh (Speaker). Big Data and Machine Learning: Two Future Pillars of IT Industry. Re-Work Deep Learning Summit Asia, Singapore. Oct 2016 2. Jenny

Technology: Android, Google Cloud Messaging, Python, VISA Push Payment, Zomato API

InfiHealth: A Framework to simplify 3-Party Model of Health (Mylan Labs Hackathon, Bengaluru) Mar’16 Designed and developed a framework to simplify the 3-Party Model of Doctor, Patient and Pharmacist in Health-Tech domain Real Time Payment using VISA API enabled instantaneous transfer of money between all the parties Real Time Social Network Feed helped patients and doctors to monitor diseases spread in their respective areas Business Value: Operating time of all parties reduces significantly whereas ease of doing business becomes high Technology: Java Spring, Android, SQL, Python, Image Processing, Natural Language Processing

SKILLS

Languages & Frameworks: C, C++, Java, Python, Unix, Android, Assembly Tools & Technologies: Hadoop, Spark, Kafka, Pig, Hive, WEKA, HTML5, MySQL, PHP, NodeJS, Git, Pandas, Keras, Tensorflow

RELEVANT COURSES

Machine Learning Tools and Techniques Artificial Intelligence Social Network Analysis

Mathematics for Machine Learning Computing for Data Analysis Database Systems

Natural Language Processing Computer Vision and Image Processing Mobile Computing

Data Structure & Algorithms Discrete Mathematics Operating System

POSTIONS OF RESPONSIBILITY Student Guide, Counselling Service Jul’11-Apr’12 Mentored 6 freshmen of 2011 batch, helping them adjust to the campus environment and academics Coordinated with a team of more than 150 people to ensure smooth conduction of Orientation Program

Executive, Show Management, Udghosh’12 (Annual Sports Festival, IIT Kanpur) Jul’12-Sep’12 Led a 2-Tier team of over 60 students and 50 workers for strategic planning and smooth conduction of various events Handled the logistics of all dignitaries and responsible for zero-delay, procurement of infrastructural needs

Coordinator, Informal Events, Udghosh’11 (Annual Sports Festival, IIT Kanpur) Jul’11-Sep’11 Led a 2-Tier team of over 30 students for conduction of fun and informal events including procurement of goods and materials Introduced various new events such as wall climbing leading to larger footfall in informal events adding to more fun in the fest

EXTRA CURRICULAR ACTIVITIES AND ACHIEVEMENTS Finalist of EXLs’ PAN IIT Data Analytics Competition Excellent Quotient-2014 Conducted National Programming League-2012, NIT Trichy programming event at Kanpur-Center with a footfall of 250 people Won Business Quiz in Takneek’10 (Intra-IITK Technical Festival) among 100 participants Member of National Cadet Corps for the period July ‘10 - April ‘11