pranjal singh€¦ · 1. pranjal singh (speaker). big data and machine learning: two future pillars...
TRANSCRIPT
PRANJAL SINGH E-mail: [email protected] Senior Data Scientist, ANI Technologies (Ola Cabs/Foodpanda), Bengaluru, INDIA Phone: +91 8953 981 856
LinkedIn: https://www.linkedin.com/in/pranjalsingh
ACADEMIC PROFILE
Year Degree/Certificate Institute/Board CGPA/% 2015 B. Tech.-M. Tech. (Computer Science & Engineering) IIT Kanpur PG: 8.8/10.0 UG: 8.0/10.0
2010 AISSCE(XII) Sunbeam English School, Varanasi 96.2%
2008 AISSE(X) Sunbeam English School, Varanasi 92.8%
AREAS OF INTEREST Machine Learning Natural Language Processing Big Data Mining Information Retrieval Artificial Intelligence
ACHIEVEMENTS 2 Patents filed while being an employee at Visa Inc. in US Patent Office and 4 academic publications Secured AIR 226 IIT JEE-2010, AIR 288 & State Rank 17 in AIEEE-2010, Rank 16 in UPTU-2010 and District Rank 1 in AISSCE-2010 Selected for Summer Undergraduate Research Fellowship Award-2012 at IISC Bangalore & NCTU(Taiwan) Fellowship-2012 Selected for InCNTRE Summer of Networking-2013 at Indiana University, USA & Yahoo Summer School-2013 Awarded Merit Scholarship-2010 for Professional Studies by CBSE and UP Govt. Science and Technology Scholarship-2010 Finalist of Jnana Vijnana Pratibha Khoj in 2007 and 2008 (National Level Competition with participation of over 100 schools)
Secured Rank 4 for building an “Automated Toll Collection” app among more than 200 Teams in InMobi Hackathon 2016
Awarded 2 Go Beyond Awards for successfully demonstrating key achievements of Bangalore Team at VISA Headquarter, USA
TALKS & PUBLICATIONS
1. Pranjal Singh (Speaker). Big Data and Machine Learning: Two Future Pillars of IT Industry. Re-Work Deep Learning
Summit Asia, Singapore. Oct 2016
2. Jenny S. Li, Li-Chiou Chen, V. Monaco, Pranjal Singh, Charles C. Tappert. A Comparison of Classifiers and Features for Authorship
Authentication of Social Networking Messages.Concurrency & Computation:Practice & Experience.Wiley Online Library.Aug 2016
3. Pranjal Singh, Amitabha Mukerjee. Words are not equal: Graded Weighting Model for building Composite Document Vectors.
12th International Conference on Natural Language Processing. Nov 2015.
4. Pranjal Singh, Amitabha Mukerjee. Word Vector Averaging: Parserless Approach to Sentiment Analysis. regICON: Regional
Symposium on Natural Language Processing, IIT BHU. Mar 2015.
SUMMER INTERNSHIPS
Authorship Authentication (Pace University, New York, USA)
(May’14-Jul’14)
Studied the effect of various classifiers on Facebook data of 10 users (400 short messages each) Tested various sets of stylometry and ad hoc social networking features for better classification Designed an algorithm that combined top 3 classifiers linearly (SVM, ANN and Decision Tree) Achieved very good authentication results (accuracy of upto 88%) with new algorithm
Sponsored Content (Bell Labs, India) (May’13-Jul’13)
Proposed a solution for content providers like YouTube, ESPN to sponsor multimedia contents Sponsored Contents on YouTube and Play-Store; these were relaxed from users’ mobile data Set up a VPN server and a nodeJS script that could store user statistics and modify HTTP data Studied its impact on end users’ usage behavior of mobile data with a self-built Android app
MultiWay Cut (Max Planck Institute, Germany)
(May’12-Jul’12)
Studied the concept of Important Separators, MultiWay Cut and various graph algorithms Implemented the problem of Multi Way Cut using C++ and LEDA; applied on various graphs Achieved optimal results (parted pairs of source and sink with minimum cut) on various graphs
PROFESSIONAL
Ola Cabs/Foodpanda Bengaluru, India Senior Data Scientist Fraud Identification @Foodpanda (July’18-Present) Built a deep Autoencoder Model for detection of anomalous orders and an entropy based fraud syndicate identification model Developed a random forest based fraud prediction model on orders achieving an AUC of 0.90 & precision of 90% at 50% recall Developed an unsupervised deep learning language model for detection of fraud customers who have jittery address and email Developed a tf-idf based NLP model for predicting duplicate vendor listings with similar names to improve customer experience
Demand Shaping @Foodpanda (July’18-Present) Developed a customer segmentation algorithm for finding premium Ola users for selective targeting and effective monetary
burn foreseeing their retention behavior at Foodpanda. This premium base is being used for food campaigns across India. Built a supervised model for relevant customer prediction to assess retention probability of new users achieving an AUC of 0.93
Customer Experience @Foodpanda (July’18-Present) Developed a vendor cancellation prediction model which could predict in advance if the vendor would start cancelling in future Built a supervised model learning from existing cancellations which achieved a precision of 90% at 75% recall; AUC of 0.93
Allocation Layer Optimization and Supply Shaping (Apr’18-July’18) Formulated cab allocation as a multi-objective optimization problem; objectives being D/S, GMV, driver earnings and allocations Used Hungarian optimization for matching customers with driver partners achieving better business metrics than current
method in simulation. Roll-out already planned for Tier-2 cities. Target of ~1% increase in allocations pan India
Data Scientist Share Matching Optimization (Oct’17-Mar’18) Designed a graph based hierarchical optimization algorithm for Ola share customer matching to control growth/customer
experience/profitability, as per time and business requirements. Used genetic algorithm for faster weight identification New matching algorithm is being used pan India now and has shown huge increase of 5% ENTR for share business unit Developed a supervised share occupancy prediction model for enabling a faster decision at allocation layer between partial
matching & idle allotment. This brought another 2% increase in ENTR for share business pan India
Share Community Detection (Feb’18-Mar’18) Built a community model using directionality, demand distribution across time and linear separation of source-destination pairs Used Multidimensional Scaling(MDS) to transform features to Euclidean space and k-means for clustering communities Key win was a 2% improvement in matches within the community over the old model and pan India usage in new share pricing
Ola Rental Cross Sell (Jun’17-Sep’17) Built a prediction model which utilizes customer characteristics, ride characteristics, financial propensity and locational
attributes to predict whether a City Taxi ride or sequence of such rides can be called as rental and with what confidence Trained model achieved significantly high recall and precision with an AUC of ~0.95 Increased Ola Rental confirmed bookings by roughly 2.6% and an additional GMV of 21 Lakhs/month for top-7 Tier-1 cities
Customer Segmentation (Jun’17-Aug’17) Gender prediction model using deep learning based sequence modeling achieving an accuracy of 95% & active usage in Ola Play Customer segmentation for predicting financial propensity on the basis of Ola footprints including ride and conversion behavior Project Guardian launched across India to ensure safety of Ola customers where gender model is a key contributor
VISA Inc. Bengaluru, India Senior Software Engineer, Data
VISE: VISA InHouse Search Engine (Patent Filed) (Nov’15-Feb’16) Developed a model and a framework to identify top-k relevant merchants from transaction data in real-time Built a Boosting Model of 3 classifiers using NLP features(Phonetics & Distance) that achieved an accuracy of ~95.9% Identified merchants from a huge dataset more intelligently and accurately directly impacting Real-Time Offers’ system Technology: Hadoop, Spark, Hive, Apache Parquet, Spark SQL, Machine Learning, NLP, Python
Pattern Identification in Spark with FP-Tree (Mar’16) Designed and developed a scalable FP-Tree model to clean raw merchant names from transactions & identify merchant groups The model had the ability to identify the junk in the raw name and clean it to give richer information Technology: Hadoop, Spark, Hive, Apache Parquet, NLP, Python
Merchant Stamping in Real Time with Spark and Kafka (Mar’16-May’16) Designed and developed a real-time merchant stamping framework in Spark for faster downstream consumption Intelligently reduced lookup size and optimized record matching time with a one-way fingerprint for each transaction Analyzed and designed a hierarchical framework which utilizes k-means clustering to form clusters of redundant records Reduced the master cache by more than 85% and improved performance by 35% achieving a record rate of stamping, directly
impacting performance of loyalty platforms and thus helped in easy migration of existing systems to LTA Technology: Hadoop, Spark, Kafka, Java, Python
Merchant Location Compliance (Sep’16-Present) Designed and developed a predictive model to flag non-compliant transactions which is scalable and high performing Involved fuzzy string matching to validate string entities and regression to assign feature weights for scoring Text mining and OCR for location identification from legal documents using Conditional Random Fields Framework helped the compliance teams to prevent revenue leakage in order of millions due to ambiguities in incoming data Technology: Hadoop, Spark, Hive, Java, Spark SQL, Parquet
Software Engineer, Data Merchant Location Credibility (Patent Filed) (Oct’15-Nov’15) Designed a framework which models merchants and deduce their properties using a scalable & robust Machine Learning model Built a clustering (DBSCAN) & boosting model to detect anomaly in real-time achieving an accuracy of 98% on 2M transactions Helped in saving millions of transactions from risk and fraud without using any personal information of cardholders Technology: Hadoop, Python, Hive, Spark
MTECH THESIS Decompositional Semantics for Document Embedding
Advisor: Amitabha Mukerjee, Professor, CSE Department, IIT Kanpur
We developed a simple and robust technique to generate document vector using decompositional semantics We proposed a graded weighting schema for building document vectors from word vectors with distributed semantics We released the largest movie review corpus in Hindi containing 697 movie reviews collected from Dainik Jagran and Navbharat Achieved state-of-the art results for sentiment analysis on Hindi movie and product reviews, 90.30% & 92.89% accuracies resp. Achieved state-of-the art results for sentiment analysis on IMDB movie reviews, 94.19% accuracy, with 1.6% improvement and
on Amazon product reviews, 92.91% accuracy, with 7% improvement
KEY COURSE PROJECTS
Movie Recommendation System (Machine Learning) Aug’13-Nov’13 Advisor: Amitabha Mukerjee, Professor, CSE Department, IIT Kanpur Built a system that predicts whether an unknown user is likely to see a given movie using the MovieLens dataset Predicted movie rating on a scale of 1 to 5 using User Based and Item Based Collaborative Filtering Achieved Root Mean Square Error of 0.947 and Mean Absolute Error of 0.741 in rating predictions
Measuring Similarity between Documents (Natural Language Processing) Aug’13-Nov’13 Advisor: Amitabha Mukerjee, Professor, CSE Department, IIT Kanpur Studied various approaches to compute similarity between two documents such as Pearson Correlation, Cosine Similarity, etc. Classified documents into coherent cluster using k-means with various distance measures; achieved entropy of the order of 0.06
3D Reconstruction from Several Images (Computer Vision) Jan’14-Apr’14 Advisor: Vinay P Namboodiri, Professor, CSE Department, IIT Kanpur Created 3D models from a set of images using Structure from Motion from point correspondences of multiple images Implemented Sift Descriptor for feature matching, and used camera calibration, depth determination, triangulation techniques
Unsupervised Cross Lingual Alignment Jan’14-Apr’14 Advisor: Satyadev Nandakumar, Professor, CSE Department, IIT Kanpur Proposed a framework that could align bilingual text without supervision; used PLSA and ADIOS as baseline system Worked on a manually built corpus on coal scam in 2 languages (English & Hindi) containing around 50,000 tokens
Empirical Study of Cross-Lingual Unsupervised Alignment Based on Syntactic and Distributional Features
Advisor: Amitabha Mukerjee, Professor, CSE Department, IIT Kanpur Aug’13-Nov’13 Studied various algorithms such a k-means, PLSA, SVM which could detect important topics in a single document Isolated aligned tokens in both Hindi and English by using PLSA and used ADIOS to separate syntactic classes
Identification of Emotions in Text (Artificial Intelligence) Jan’12-Apr’12 Advisor: Amitabha Mukerjee, Professor, CSE Department, IIT Kanpur
Studied various NLP Algorithms such as PLSA & LSA which could identify latent topics in the document Modeled the document (ISEAR corpus) into few number of topics which were essentially emotions, using PLSA and PCFG parser
Achieved very good latent clusters corresponding to each of the 7 emotions; achieved good scores for sentiment analysis part
OTHER PROJECTS
Computer Networks: Packet Sniffer for capturing packets in local network by acting as gateway
Compilers: Compiler for a subset of ADA ver2 and code generation for MIPS architecture Mobile Computing: Desktop control using an Android app over WiFi using various types of gestures and sensors Database: Built a web application that supported Hostel Management electronically using PHP and MySQL Operating System: Developed PintOS to support system calls, message queues, file system and virtual memory with paging
HACKATHON ParkIt: Hassle Free Toll Collection (InMobi Hackathon, Bengaluru) Feb’16 Designed and developed a model to promote cashless push payments at the predefined toll booth making them self-serviceable Integrated Zomato API for on-route eating places suggestion and used card to card push payment using Two Way SSL
Authentication for a hassle free payment with Google Cloud Messaging to send push messages of transaction status to toll Business Value: Transparent and hassle free automated solution to the toll problem where operating time as well as cost
becomes significantly less while there is an extra revenue scope for all parties
Technology: Android, Google Cloud Messaging, Python, VISA Push Payment, Zomato API
InfiHealth: A Framework to simplify 3-Party Model of Health (Mylan Labs Hackathon, Bengaluru) Mar’16 Designed and developed a framework to simplify the 3-Party Model of Doctor, Patient and Pharmacist in Health-Tech domain Real Time Payment using VISA API enabled instantaneous transfer of money between all the parties Real Time Social Network Feed helped patients and doctors to monitor diseases spread in their respective areas Business Value: Operating time of all parties reduces significantly whereas ease of doing business becomes high Technology: Java Spring, Android, SQL, Python, Image Processing, Natural Language Processing
SKILLS
Languages & Frameworks: C, C++, Java, Python, Unix, Android, Assembly Tools & Technologies: Hadoop, Spark, Kafka, Pig, Hive, WEKA, HTML5, MySQL, PHP, NodeJS, Git, Pandas, Keras, Tensorflow
RELEVANT COURSES
Machine Learning Tools and Techniques Artificial Intelligence Social Network Analysis
Mathematics for Machine Learning Computing for Data Analysis Database Systems
Natural Language Processing Computer Vision and Image Processing Mobile Computing
Data Structure & Algorithms Discrete Mathematics Operating System
POSTIONS OF RESPONSIBILITY Student Guide, Counselling Service Jul’11-Apr’12 Mentored 6 freshmen of 2011 batch, helping them adjust to the campus environment and academics Coordinated with a team of more than 150 people to ensure smooth conduction of Orientation Program
Executive, Show Management, Udghosh’12 (Annual Sports Festival, IIT Kanpur) Jul’12-Sep’12 Led a 2-Tier team of over 60 students and 50 workers for strategic planning and smooth conduction of various events Handled the logistics of all dignitaries and responsible for zero-delay, procurement of infrastructural needs
Coordinator, Informal Events, Udghosh’11 (Annual Sports Festival, IIT Kanpur) Jul’11-Sep’11 Led a 2-Tier team of over 30 students for conduction of fun and informal events including procurement of goods and materials Introduced various new events such as wall climbing leading to larger footfall in informal events adding to more fun in the fest
EXTRA CURRICULAR ACTIVITIES AND ACHIEVEMENTS Finalist of EXLs’ PAN IIT Data Analytics Competition Excellent Quotient-2014 Conducted National Programming League-2012, NIT Trichy programming event at Kanpur-Center with a footfall of 250 people Won Business Quiz in Takneek’10 (Intra-IITK Technical Festival) among 100 participants Member of National Cadet Corps for the period July ‘10 - April ‘11