agenda - hpcccdn.hpccsystems.com/pdf/2014_hpcc_summit_ls.pdf · 2014-10-03 · why he feels hpcc...

AGENDA

LIVESTREAM INFORMATION http://bit.ly/2014hpccsummit

Time | Topic | Presenter(s)
8:30am – 8:50am | Kick-off and Conference Overview | Vijay Raghavan, LexisNexis
8:50am – 9:30am | HPCC Progress Update; Roadmap Overview | Flavio Villanustre, LexisNexis
9:30am – 10:00am | HPCC Systems Community Use Case – Because Who Has Time for MapReduce?! | John Andleman, Citrix
10:00am – 10:15am | BREAK |
10:15am – 10:45am | Building an HPCC Systems Community in Silicon Valley | Fujio Turner, Big Data/NoSQL Engineer
10:45am – 11:15am | Do Algorithms beat Instinct in Hiring? | Handan Xiao, Jonathan Zhang and Emma Liu, Comrise
11:15am – 11:45am | U.S. Healthcare Payment Reform: Applying HPCC Systems to Medicare’s Bundled Payments Innovation Program | Tony Cheng and Luc Pezet, Archway Health Advisors
11:45am – 12:15pm | LUNCH BREAK |
12:15pm – 12:45pm | HPCC/ECL Training – What’s New?; Ganglia Monitoring Demo | Richard Taylor & Gleb Aronsky, LexisNexis
12:45pm – 1:15pm | Improving Thor Data Loading using Parallel Format-agnostic Direct Spraying | Mohammad Rashti, RNet
1:15pm – 2:15pm | Collaborative Research with Florida Atlantic University (FAU) and LexisNexis | Borko Furht and Victor Herrera, FAU, and Edin Muharemagic, LexisNexis
2:15pm – 2:30pm | BREAK |
2:30pm – 3:00pm | Leveraging HPCC Systems with VCL (Virtual Computing Lab) | Vincent Freeh, North Carolina State University
3:00pm – 3:40pm | Applications of HPCC Systems at Clemson University | Amy Apon, Linh Ngo and Michael Payne, Clemson University
3:40pm – 4:20pm | KEL Community Version | Eric Blood, LexisNexis
4:20pm – 5:00pm | La-Z-Boy EDA: Building Complex NLP and EDA Processes the Easy Way with Circuits and Dashboard | Drea Lead & Joe Chambers, LexisNexis

Invited Speakers

Presenter: John Andleman, Staff Database Engineer, Citrix

HPCC Systems Community Use Case – Because Who Has Time for MapReduce?!
October 7, 9:30am – 10:00am

In this session, John will share some interesting use cases leveraging the HPCC Systems platform, including those beyond traditional big data uses. John will also share his roadmap of HPCC projects being planned for the next few months and why he feels HPCC Systems is a more suitable solution than Hadoop based on experiences and lessons learned.

Speaker Bio:

John has worked for many years in a number of industries, architecting and building solutions to acquire, manage, and analyze data. John loves creating solutions that use data to solve problems better, faster, and cheaper. The exponential growth in the volume, velocity, and variety of data has created many new challenges and opportunities that require some very creative solutions. These are the types of solutions that John loves to focus on, which ultimately help derive the right data and analysis to make the best possible decisions. John is also a founding member of the HPCC Systems Community Advisory Board. John currently resides in Southern California.

Presenter: Fujio Turner, Big Data/NoSQL Engineer

Building an HPCC Systems Community in Silicon Valley
October 7, 10:15am – 10:45am

This presentation will cover how I discovered HPCC Systems, why I started a Meetup on HPCC in Silicon Valley, the hurdles I see for widespread adoption of HPCC Systems, and the solutions to overcome these hurdles.

Speaker Bio:

Fujio Turner went to school for economics and is currently working as a Big Data / NoSQL Engineer in the Silicon Valley area. Author of the soon-to-be-published book “Couchbase for High Performance”, he specializes in high-speed data platforms. He began his IT career as a LAMP stack developer and soon became a MySQL Developer and DBA. His attention turned to the highly available NoSQL systems CouchDB and Couchbase in 2010. With his philosophy of “In the future there will be more data not less”, HPCC Systems was a perfect fit for him. In his spare time he evangelizes HPCC Systems in the Silicon Valley area with the MeetUp group ‘Exabyte Big Data - HPCC Systems - Silicon Valley’. His list of current and future projects includes 3DJSON and Virtual Reality and Big Data.

Presenters: Handan Xiao, Jonathan Zhang and Emma Liu, Comrise

Do Algorithms beat Instinct in Hiring?
October 7, 10:45am – 11:15am

If you recently received an algorithm-generated mass email that mistakenly identified you as a potential candidate for a job, you may not think so. However, in our presentation we will demonstrate that, just as algorithms can be used successfully to perform credit analysis, algorithms can be used to prioritize candidates for job positions. We will be discussing our patented prioritization process based upon the Random Forest Algorithm.

Speaker Bios:

Handan Xiao, Data Scientist, Comrise

Handan Xiao is currently a Data Scientist at Comrise working on the RT-FIT product to incorporate Personality Surveys and Assessments into the Patented Prioritization Process. She holds a Master’s degree in Statistics from Columbia University. Before attending graduate school, Handan pursued dual majors in Statistics at Beijing Institute of Technology and Economics at Peking University. She has a strong background in building and validating predictive models for large datasets using R and Python. Currently she is especially interested in data mining.

Jonathan Zhang, Data Scientist, Comrise

Jonathan Zhang works at Comrise as a Data Scientist. He is the lead developer of the RT-FIT Prioritization product and is responsible for database design and maintenance for multiple Big Data projects. He has in-depth knowledge and experience in data processing, statistical modeling and analysis. Jonathan graduated from Florida State University with a Master’s degree in Mathematics and holds a Bachelor’s degree in Mathematics from Tongji University. During his undergraduate studies, Jonathan won first prize in the China Undergraduate Mathematics Contest in Modeling. He is very interested in machine learning and data mining.

Emma Liu, Data Scientist, Comrise

Emma Liu joined Comrise as a Data Scientist in June 2014. She currently concentrates on text mining and machine learning in RT-FIT Prioritization Application Development. Emma recently graduated from the University of Michigan with a Master’s degree in Applied Economics. Prior to graduate school, she earned a dual degree in Economics at George Mason University and Chongqing University. Emma has a solid background in econometric and financial modeling. She interned at Huaxia Dun & Bradstreet (Shanghai), focusing on the credit prediction model.

Presenters: Tony Cheng and Luc Pezet, Archway Health Advisors

U.S. Healthcare Payment Reform: Applying HPCC Systems to Medicare’s Bundled Payments Innovation Program
October 7, 11:15am – 11:45am

Medicare’s Bundled Payments for Care Improvement (BPCI) initiative is the largest Medicare payment innovation program. Bundled payments involve paying a “package price” for all the services required to treat an episode of care. Unlike the current “fee for service” model, providers will assume financial risk and have financial incentives to improve efficiency and patient outcomes.

Currently, more than 5,000 providers have applied to this program, representing $47 billion of Medicare spending.

Archway Health Advisors is building a platform that helps providers like hospitals and nursing homes manage their bundled payments program. Analyzing complex Medicare claims payment data is critical to understanding the risks of participating in the bundled payment program.

We are using HPCC Systems to power the data management and analytics required to successfully manage the risks and look for care improvement opportunities. We will review our HPCC Systems implementation and provide some examples of the analysis that we are providing our customers.

Speaker Bios:

Tony Cheng, General Manager, Archway Health Advisors

Tony Cheng is currently General Manager of Archway Health Advisors, a healthcare data and analysis company.

Previously, he was involved in various technology and media startups in New York City. He was co-founder and CEO of IgoUgo (www.igougo.com), one of the first user-generated content travel websites. IgoUgo won a Webby Award for the top travel site and was acquired by Travelocity. He is also co-founder of Tripfilms.com and HotelConfidential.com, the leading online travel video brands.

He has an MBA from Harvard Business School and a B.S. from Cornell University in chemical engineering. He is the holder of 3 U.S. Patents.

Luc Pezet, Information Engineer, Archway Health Advisors

Luc Pezet is a Solution and Software Architect with over 10 years of experience in pioneering web analytic tools and complex data management projects. His expertise includes designing and implementing Big Data solutions that process millions of data inputs on a daily basis to monitor, assess and improve performance.

Luc is a successful entrepreneur and co-founder of Tripfilms, the largest database of travel videos on the web.

He also has served as interim CTO for The Achievement Network (ANET), a non-profit education company that helps schools use assessment data to improve student performance. At ANET, he implemented web tools for staff to help scale their operations and end-user web sites for teachers and principals to access reports and analysis. Within just a few years, this platform has helped ANET grow from 13 schools in the Boston area to over 480 schools and 145,000 students across 10 states. ANET has been recognized as a pioneer in education innovation and was named “New Schools Ventures Organization of the Year” in 2011.

Luc holds a Master’s Degree in Computer Science from Rennes University in France.

Presenter: Mohammad Javad Rashti, Senior Research Engineer, RNET Technologies, Inc.

Improving Thor Data Loading using Parallel Format-agnostic Direct Spraying
October 7, 12:45pm – 1:15pm

Loading data into an HPCC Thor cluster requires an initial copying of the data files onto a Landing Zone (LZ) node for potential decompression, record extraction and spraying onto the Thor nodes’ storage. This process has proven to be cumbersome and slow because the LZ node becomes a procedural and performance bottleneck. The problem is more significant in Cloud environments such as AWS, where the data may originally reside in remote storage such as S3. In this project we are developing a module that extracts the data records directly from compressed files of various formats in their original storage location, without copying them to an LZ node, and sprays them to the Thor cluster, thus eliminating the LZ node. This operation is done in parallel using multiple data loaders that can reside on Thor nodes. As an expansion to the module, we also plan to load the early-stage records directly into the Thor nodes’ memory and start Thor processing while the rest of the data is being loaded.
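
The parallel direct-spraying idea described above can be illustrated with a minimal Python sketch. It is not the RNET module and does not use any HPCC Systems API. The function names (extract_records, direct_spray, make_source), the choice of gzip as the compressed format, the in-memory byte strings standing in for remote objects such as S3 files, and the round-robin placement of records onto nodes are all assumptions made purely for illustration.

```python
import gzip
import io
from concurrent.futures import ThreadPoolExecutor


def make_source(records):
    """Hypothetical helper: build a gzip-compressed, newline-delimited source."""
    return gzip.compress("\n".join(records).encode("utf-8"))


def extract_records(compressed):
    """One 'data loader': decompress a source in place and split it into records.

    No intermediate copy to a landing zone is made; the bytes are read
    straight from wherever they already live (here, memory).
    """
    with gzip.open(io.BytesIO(compressed), "rt", encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in fh if line.strip()]


def direct_spray(sources, num_nodes):
    """Run one loader per source in parallel, then deal records across nodes.

    Round-robin placement is an illustrative assumption, not a description
    of how HPCC Systems partitions sprayed data.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        per_source = list(pool.map(extract_records, sources))  # order preserved

    nodes = [[] for _ in range(num_nodes)]
    index = 0
    for records in per_source:
        for record in records:
            nodes[index % num_nodes].append(record)
            index += 1
    return nodes


if __name__ == "__main__":
    demo_sources = [make_source(["a1", "a2", "a3"]), make_source(["b1", "b2"])]
    for node_id, records in enumerate(direct_spray(demo_sources, num_nodes=2)):
        print(node_id, records)
```

In this sketch each loader decompresses and parses its own source independently, so no single node has to hold or process every file first, which is the bottleneck the talk attributes to the LZ node.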

Speaker Bio:

Dr. Mohammad Rashti is a Computer Research Scientist and Project Manager at RNET Technologies, working on several research projects in High Performance Computing, Big Data Science and Networking. He received his B.Sc. (2000) and M.Sc. (2003) in Computer Engineering from University of Tehran and Sharif University of Technology, respectively. He has a PhD in High Performance Computing from Queen’s University in Canada (2010), where he also worked as a Postdoctoral Fellow. He has several years of industrial experience in computer architecture, system software, high performance computing and networking, web and application software systems. He has also served as a Computer Engineering faculty member at Shahid Chamran University.

Presenters: Borko Furht and Victor Herrera, Florida Atlantic University (FAU) & Edin Muharemagic, LexisNexis

Collaborative Research with FAU and LexisNexis
October 7, 1:15pm – 2:15pm

In 2009, Florida Atlantic University (FAU) received a grant from the National Science Foundation (NSF) to create the site of the Center for Advanced Knowledge Enablement (CAKE) as an Industry/University Cooperative Research Center (I/UCRC) that provides a framework for interaction between university faculty and industry in the areas of information technology, communication, and computing. LexisNexis is currently one of the CAKE industry members. In this session, we will talk about the significance of the membership, joint accomplishments, and contributions back to the HPCC Systems community, and we will share our experience of including a Data Intensive Computing program in the FAU curriculum.

Speaker Bios:

Borko Furht, Professor, Department of Electrical & Computer Engineering and Computer Science, FAU

Borko Furht is a professor in the Department of Electrical & Computer Engineering and Computer Science at Florida Atlantic University (FAU) in Boca Raton, Florida. He is also Director of the NSF-sponsored Industry/University Cooperative Research Center on Advanced Knowledge Enablement. Before joining FAU, he was a vice president of research and a senior director of development at Modcomp (Ft. Lauderdale), a computer company of Daimler Benz, Germany; a professor at the University of Miami in Coral Gables, Florida; and a senior researcher in the Institute Boris Kidric-Vinca, Yugoslavia. Professor Furht received his Ph.D. degree in electrical and computer engineering from the University of Belgrade. His current research is in multimedia systems, video coding and compression, 3D video and image systems, wireless multimedia, the Internet, cloud computing, and social networks.

He is presently Principal Investigator and Co-PI of several multiyear, multimillion-dollar projects. He is the author of numerous books and articles in the areas of multimedia, computer architecture, real-time computing, and operating systems. He is a founder and editor-in-chief of the Journal of Multimedia Tools and Applications (Springer), and he recently co-founded the Journal of Big Data (Springer).

He has received several technical and publishing awards, and has consulted for many high-tech companies including IBM, Hewlett-Packard, Xerox, General Electric, JPL, NASA, Honeywell, and RCA. He has also served as a consultant to various colleges and universities. He has given many invited talks, keynote lectures, seminars, and tutorials. He served as Chairman and Director on the Board of Directors of several high-tech companies and as an expert witness for Cisco, Qualcomm, Adobe, and Bell Canada.

Victor Herrera, PhD Candidate and Research Assistant, Florida Atlantic University

Victor Herrera is a PhD Candidate and Research Assistant at Florida Atlantic University (FAU). His research interests are in data mining and machine learning. Since 2012, Victor has been a part of the collaborative research between FAU and LexisNexis, where his primary contribution is developing Machine Learning algorithms on the HPCC/ECL platform. Before arriving at FAU, he worked as a software developer and database administrator at multiple companies.

Presenter: Vincent W. Freeh, associate professor of computer science, North Carolina State University

Leveraging HPCC Systems with VCL (Virtual Computing Lab)
October 7, 2:30pm – 3:00pm

The Virtual Computing Lab (VCL) is an open-source, cloud-computing platform developed at North Carolina State University. It is a top-level Apache project and is deployed throughout the world. This talk discusses porting HPCC to VCL, as well as course integration.

Speaker Bio:

Vincent W. Freeh is an associate professor of computer science at North Carolina State University. He received his Ph.D. in 1996 from the University of Arizona. His research focus is high-performance system software, with emphasis on filesystems, parallel and distributed systems, power-aware computing, and storage systems.

Prof. Freeh teaches courses in the above research areas as well as in compilers. He has more than 55 refereed publications in numerous computer science conferences and scientific journals. He has received more than six million dollars in sponsored research.

He received an NSF CAREER Award and several IBM Faculty Development Awards. He was a captain in the US Army Corps of Engineers before entering graduate school for his MS. He worked at IBM in the Storage System Division until he returned to school to earn his PhD. Prof. Freeh was on the faculty at the University of Notre Dame prior to coming to NCSU. He lives in Holly Springs, NC with his wife, seven children, and dog.

Presenters: Amy Apon, Linh Ngo and Michael Payne, Clemson University

Applications of HPCC Systems at Clemson University
October 7, 3:00pm – 3:40pm

Big Data research at Clemson University includes the development of computational science applications, evaluation of software and hardware infrastructure systems to support Big Data, and design and development of data analytics platforms. HPCC Systems has been designed to support production quality data storage and analytics in an enterprise environment. In this talk we describe how we are using the HPCC Systems platform in an academic research environment to support research in Big Data.

Like many major research universities, Clemson University hosts a shared high performance supercomputing cluster. The Palmetto cluster at Clemson ranks in the top five among university-owned supercomputers in the U.S. and is a significant research resource for faculty, students, and staff. As a shared resource, computing nodes are allocated to users via a batch scheduler system, and users do not have administrative privileges. In this talk we describe how we use HPCC Systems on the shared campus resource using user privileges only. This work opens doors for use of HPCC Systems to a broad community of academic users who have access to shared computing resources.

HPCC Systems is a key tool in several analytics projects at Clemson University. In this talk we describe how we use the HPCC Systems platform in our study of the research productivity of academic institutions. This research requires the use of detailed data over decades of time from various sources and formats without any consistency or a predefined data structure. LexisNexis and Reed Elsevier are key partners in the acquisition of data for this research. The strengths of HPCC Systems include its ability to support easy ingestion of various data formats, dynamic transformation and integration of ingested data, and a seamless interface between the data processing interface and the analytic tools written in R.

Speaker Bios:

Amy Apon, Ph.D., Professor and Chair, Division of Computer Science, School of Computing, Clemson University

Dr. Amy Apon joined Clemson University in August, 2011, as Chair of the Division of Computer Science. Apon brings a distinguished record of contributions at the University of Arkansas where she held the position of Director of the Arkansas High Performance Computing Center and Professor of Computer Science. Apon was awarded the University of Arkansas Alumni award for Service in 2010, the highest award given by the University of Arkansas Alumni Association each year, and was awarded the Arkansas College of Engineering Imhoff Award for contributions to research in 2009.

Prior to 2004, scientific computation at the University of Arkansas was primarily conducted in individual researchers’ labs. In 2004, 2007, and 2010, Apon led the efforts to win multiple Major Research Instrumentation awards from the National Science Foundation that acquired the first, second, and only supercomputers in Arkansas, as listed by the Top 500 Supercomputer Sites. In 2009, Apon led the effort to win an Academic Research Infrastructure grant from the National Science Foundation to significantly upgrade the power and chilling infrastructure of the campus data center and to establish a new campus research network. University infrastructure for research computing was significantly enhanced as a result of these grants.

Linh B. Ngo, Research Associate, School of Computing, Clemson University

Linh B. Ngo holds a Ph.D. in Computer Science from the University of Arkansas in Fayetteville and is a Research Associate in the Big Data Systems lab of the School of Computing, Clemson University. His research interests include data-intensive computing, high performance computing, computational statistics, and large scale data analysis.

Michael E. Payne, Ph.D. Student, School of Computing, Clemson University

Michael E. Payne is a second-year Ph.D. student in Computer Science at Clemson University under the supervision of Dr. Amy Apon. He is currently serving as a machine learning intern at LexisNexis. His research interests include data-intensive computing, social network analysis, and large-scale data analysis.