from business idea to successful delivery by serhiy haziyev & olha hrytsay, softserve
DESCRIPTION
If you`ve missed SoftServe`s presentation on “Big Data Analytics Projects: From a Business Idea to a Successful Delivery” at the 2014 Data & Analytics Innovation and Entrepreneurship event in London or would like to refresh your memory, please download the full version of the presentation in the PDF format. SoftServe`s renowned experts on BI and Big Data, Serhiy Haziyev and Olha Hrytsay, explored skills and experience required to avoid unpleasant pitfalls as well as practical recommendations on how to properly start a Big Data analytics project with a software development partner.TRANSCRIPT
April, 2014
From Business Idea to Successful Delivery
▪ Leading global Product and
Application Development
partner founded in 1993
▪ 3,200 employees across North
America, Ukraine, Russia and
Western Europe
▪ Thousands of successful
outsourcing projects!
Clients include:
SaaS/Cloud Solutions . Mobility Solutions . UX/UI BI/Analytics/Big Data . Software Architecture . Security
Agenda
Product
Development
Lifecycle
Big Data
Reference
Architectures
Case Studies
Tips for Successful
Delivery
Big Data
Successful Ideas
Top 10 startups with
most funding
Big Data will drive $232 billion in IT spending
through 2016 (Gartner)
BIG DATA INVESTMENTS
2008-2012
$ 4.9B
2013 Jan-Oct
$ 3.6B
Source: www.bigdata-startups.com
Big Data Use Cases Spotify – Changing the music industry
Heineken – Shopperception analyses
From In-house To Open
Innovation Product development lifecycle
1. Envisioning & birth of idea
2. Probing user needs
3. Ideation & design
4. Conceptualization
5. Feasibility study
6. Strategizing
7. Business analysis
8. PoC, Prototyping
9. Final design
10. Technical implementation
11. Pilot production
12. Commercialization
13. Pricing
14. Maintenance & support
15. Extension thru innovation
16. Next version, goto 1
17. R.I.P.
time span
GEN
ERIC
G
ENER
IC
SPEC
IFIC
process agility
1. Envisioning & birth of idea
2. Probing user needs
3. Ideation & design
4. Conceptualization
5. Feasibility study
6. Strategizing
7. Business analysis
8. PoC, Prototyping
9. Final design
10. Technical implementation
11. Pilot production
12. Commercialization
13. Pricing
14. Maintenance & support
15. Extension thru innovation
16. Next version, goto 1
17. R.I.P.
From In-house To Open
Innovation Product development lifecycle
time span
GEN
ERIC
G
ENER
IC
SPEC
IFIC
process agility
Reference Architectures:
▪ Extended Relational
▪ Non-Relational
▪ Hybrid
Big Data Analytics
Reference Architectures
Architecture Drivers: ▪ Volume
▪ Sources
▪ Throughput
▪ Latency
▪ Extensibility
▪ Data Quality
▪ Reliability
▪ Security
▪ Self-Service
▪ Cost
Relational Reference
Architecture
Web Services
Mobile
Devices
Native
Desktop
Web
Browsers
Advanced
Analytics
OLAP Cubes
Query &
Reporting
Staging
Areas
Operational
Data Stores
Data Marts
Data
Warehouses
Replication
API/ODBC
Messaging
ETL
Unstructured
Semi-
Structured
Data Sources Integration Data Storages Analytics Presentation
Structured
Extended Relational
Reference Architecture
Web Services
Mobile
Devices
Native
Desktop
Web
Browsers
Advanced
Analytics
OLAP Cubes
Query &
Reporting
Staging
Areas
Operational
Data Stores
Data Marts
Data
Warehouses
Replication
API/ODBC
Messaging
ETL
Unstructured
Semi-
Structured
Data Sources Integration Data Storages Analytics Presentation
Structured
Non-Relational
Reference Architecture
Web Services
Mobile
Devices
Native
Desktop
Web
Browsers
Advanced
Analytics
Map Reduce
Query &
Reporting
Search Engines
Distributed File
Systems
NoSQL
Databases
API
Messaging
ETL
Unstructured
Semi-
Structured
Data Sources Integration Data Storages Analytics Presentation
Structured
Hybrid Reference
Architecture
Data Refinery Lambda Architecture
Source: Source:
Relational vs. Non-
Relational Architecture
Requirements Relational Non-
Relational Comments
Big Data Scalability Using the MPP techniques, relational data warehouses can run terabytes and
even petabyte+ data. NoSQL solutions can scale further keeping dozens of
petabytes.
Ad-Hoc Reporting There is a variety of mature SQL BI tools in the market. At the same time, BI tools
and connectors for NoSQL databases continue evolving.
Near-Real Time Data Latency Although near-real time is achievable in both relational data warehouse and
NoSQL solutions, it requires extra efforts optimizing data update and processing.
Processing Raw Unstructured Data The benefit of NoSQL solutions (including the Hadoop ecosystem) is easy storing
and processing of unstructured data.
High Data Model Extensibility NoSQL schema-less nature allows easy extending of the data model with the
new data attributes on the fly. Extending relational storages is still possible with
design patterns, but it is quite limited and difficult if compared to NoSQL.
Reliability and Fault-Tolerance Most of the relational and NoSQL technologies offer reliable solutions replicating
data between redundant instances and supporting failover.
High Data Quality and Consistency The RDBMS solutions are about the ACID concept, while NoSQL are about the
BASE concept.
This way, most of NoSQL solutions sacrifice consistency over availability, limiting
their usage in quality critical applications.
High Security and Regulatory
Compliance
Relational storages are traditionally strong in security as they offer role-based
access, row-level security and encryption of data in transit and data in rest. Most
of today’s NoSQL solutions have significant deficit of such features and rely on a
perimeter security model.
Low Cost There are at least two factors that make NoSQL solutions less costly than
relational ones:
1) $0 or considerably low license fee due to open source
2) NoSQL databases typically use clusters of cheap commodity servers
Skills Availability There is still an experience gap with NoSQL technologies on the IT labor market
and the correspondent lack of in-house skills within organizations.
Relational vs. Non-
Relational Architecture
Relational Non-Relational
• Rational
• Predictable
• Traditional
• Flexible
• Agile
• Modern
Business Goals: Provide visual environment for building
custom mobile application Charge customers based on the platform
they are using, number of consumers’
applications etc.
Business Area: Cloud based platform for building, deploying,
hosting and managing of mobile applications
Case Study #1: Usage & Billing Analysis
1. Envisioning & birth of idea
2. Probing user needs
3. Ideation & design
4. Conceptualization
5. Feasibility study
6. Strategizing
7. Business analysis
8. PoC, Prototyping
9. Final design
10. Technical implementation
11. Pilot production
12. Commercialization
13. Pricing
14. Maintenance & support
15. Extension thru innovation
16. Next version, goto 1
17. R.I.P.
SoftServe Involvement Product development lifecycle
From In-house
To Open Innovation with SoftServe
Technologies: Python
Amazon Redshift
Amazon SQS
Amazon S3
Elastic Beanstalk
Jaspersoft BI Professional
Aria Subscription Billing
Platform
▪ Volume (> 10 TB)
▪ Sources (JSON)
▪ Throughput (> 10K/sec)
▪ Latency (2 min)
▪ Extensibility (Custom metrics)
▪ Data Quality (Consistency)
▪ Reliability (24/7)
▪ Security (Multitenancy)
▪ Self-Service (Ad-Hoc reports)
▪ Cost (The less the better )
▪ Constraints (AWS)
Architecture Drivers:
Business Goals: In addition to main service to build in-house Analytics
Platform for ROI measurement and performance analysis of
every product and feature delivered by the platform;
Platform should provide the ability to understand how
end-users are interacting with service content, products, and
features on sites;
Take ownership of analytics event stream;
Perform A/B Testing
Business Area: Retail. A platform for e-commerce and
collecting feedbacks from customers
Case Study #2: Clickstream for retail website
1. Envisioning & birth of idea
2. Probing user needs
3. Ideation & design
4. Conceptualization
5. Feasibility study
6. Strategizing
7. Business analysis
8. PoC, Prototyping
9. Final design
10. Technical implementation
11. Pilot production
12. Commercialization
13. Pricing
14. Maintenance & support
15. Extension thru innovation
16. Next version, goto 1
17. R.I.P.
SoftServe Involvement Product development lifecycle
From In-house
To Open Innovation with SoftServe
Technologies: • Amazon Elastic Load
Balancer, S3
• Tornado, Flume
• Hadoop/HDFS, HBase,
MapReduce, Oozie
(Cloudera distribution)
• Backbone
▪ Volume (45 TB)
▪ Sources (JSON)
▪ Throughput (> 10K/sec)
▪ Latency (1 hour)
▪ Extensibility (Custom tags)
▪ Data Quality (Not critical)
▪ Reliability (24/7)
▪ Security (Multitenancy)
▪ Self-Service (Canned reports)
▪ Cost (The less the better )
▪ Constraints (AWS)
Architecture Drivers:
Understand in-house Big Data capabilities
Determine what steps from Product development lifecycle
can be done with partners
Establish collaboration approach
Do feasibility study
Create architecture and select technology stack
Do prototyping, re-evaluate architecture
Estimate implementation efforts
Align project milestones and success criteria
Align team structure and processes
Set up transparent knowledge management environment
Set up DevOps practices from the very beginning
Advance in Product development through “Small Wins”
Checklist for Successful
Delivery
SoftServe UK Office
Regent's Place,
338 Euston Road,
London, NW1 3BT
Tel: +44 203 519 1216
Contacts Serhiy Haziyev: [email protected]
Olha Hrytsay: [email protected]
Glen Wilson: [email protected]
Thank You!