secure networking in big data environments
Securely Networking Big Data Environments
Stephen Hampton, CTO, Hutchinson Networks
Big Data On The Network – An Example
• HDFS (Hadoop Distributed File System)
  • Name Node (HDFS Brain, List of Blocks & Metadata)
  • Data Node (Independent Nodes, Capable of Executing Workloads)
• Process
  • Write Data (Data Blocks Uploaded)
  • Workload Execution
    • Map Phase (Little or no data)
    • Shuffle Phase (Data on the network)
    • Reduce Phase (Little or no data)
    • Output Replication (Data on the network)
  • Reading Data (Data Read By Application)
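The workload phases above can be illustrated with a toy word count. This is a minimal single-process sketch, not Hadoop code: the function names, the sample blocks and the byte-sum partitioner are all assumptions made for illustration. It shows why the shuffle is the phase that puts data on the network: map and reduce work on data already held by a node, while the shuffle moves every key to the reducer that owns it.

```python
from collections import defaultdict

def map_phase(block):
    """Runs on the node that already holds the block: emit (word, 1) pairs."""
    return [(word, 1) for word in block.split()]

def shuffle_phase(mapped_per_node, num_reducers):
    """Route every key to one reducer. In a real cluster this step crosses the network."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapped_per_node:
        for key, value in pairs:
            # Hypothetical partitioner: a deterministic bucket from the key's bytes.
            index = sum(key.encode()) % num_reducers
            partitions[index][key].append(value)
    return partitions

def reduce_phase(partition):
    """Runs on the reducer that received the partition: sum the values per key."""
    return {key: sum(values) for key, values in partition.items()}

# Two sample "blocks", standing in for data stored on two different data nodes.
blocks = ["big data on the network", "data on the fabric"]
mapped = [map_phase(block) for block in blocks]
partitions = shuffle_phase(mapped, num_reducers=2)

counts = {}
for partition in partitions:
    counts.update(reduce_phase(partition))
```

Because each key lands on exactly one reducer, merging the per-reducer results never overwrites a count.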
Big Data Network Characteristics
• Availability – The loss of a portion of the cluster will impact performance.
• Burst Handling – Queue depth and low over-subscription ratio are very important.
• Latency – Processing delay on the data nodes dominates, so network latency is not usually an issue.
• Jitter (Variation in Delay) – Some workloads are highly synchronous, so deterministic latency is important.
• Scale – The environment should easily scale and retract to fit requirements.
• Security & Multi-Tenancy – A single environment for multiple logical workloads is more efficient.
• Performance – 10 GE server connectivity is commonplace and likely to increase to 25 GE, 40 GE and 50 GE.
Big Data On Traditional Networks
• Availability – Resilient but slow to converge and prone to Layer 2 problems (loops and broadcast storms).
• Burst Handling – Queue depths are good but over-subscription ratios can be high.
• Latency/Jitter – Over-subscription can lead to jitter.
• Scale – Difficult to scale without manual configuration.
• Security – The firewall is a bottleneck and DMZs are difficult to design.
• Performance – High speeds are available but these are not the most efficient or cost-effective.
[Figure: traditional network with Firewall, Core, Distribution and Access Layers above the Big Data Cluster Nodes; links labelled 1 Gbps and 10 Gbps.]
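The over-subscription ratio referred to on this slide is the server-facing bandwidth of a switch divided by its uplink bandwidth. The sketch below works through that arithmetic. The port counts (48 server ports, one or two uplinks) are assumptions chosen for illustration, not figures from the talk; only the 1 Gbps and 10 Gbps link speeds come from the slide.

```python
def oversubscription_ratio(server_ports, server_gbps, uplink_ports, uplink_gbps):
    """Return downstream (server-facing) bandwidth divided by upstream (uplink) bandwidth."""
    upstream = uplink_ports * uplink_gbps
    if upstream <= 0:
        raise ValueError("a switch needs at least one uplink with positive bandwidth")
    downstream = server_ports * server_gbps
    return downstream / upstream

# Assumed access switch: 48 x 1 Gbps server ports, 2 x 10 Gbps uplinks.
two_uplinks = oversubscription_ratio(48, 1, 2, 10)

# Same assumed switch after losing one uplink: the ratio doubles.
one_uplink = oversubscription_ratio(48, 1, 1, 10)

print(f"two uplinks: {two_uplinks:.1f}:1, one uplink: {one_uplink:.1f}:1")
```

A ratio of 1:1 means the uplinks can carry everything the servers can send. Each further tier in the design multiplies the effective ratio seen by traffic that crosses it, which is what makes shuffle-heavy workloads sensitive to it.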
Big Data On Network Fabrics
• Availability – ECMP with sub-second failover and flowlets.
• Burst Handling – Low over-subscription ratios and optimised use of bandwidth.
• Latency/Jitter – Predictable latency throughout.
• Scale – Very easy to scale: add more leaf switches and deploy via a controller.
• Security – Secure micro-segmentation, DMZs everywhere.
• Performance – Multiple 40/100 GE ports in the fabric with 1/10 GE to servers.
[Figure: Ethernet fabric with 40 Gbps links connecting Big Data Cluster Nodes at 1/10 Gbps.]
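ECMP, as listed under Availability, spreads flows across several equal-cost paths by hashing packet header fields. The sketch below is a simplified software illustration: real switches use hardware hash functions, and flowlet switching is not modelled here. The spine names, addresses and port numbers are hypothetical.

```python
import hashlib

def ecmp_next_hop(flow, next_hops):
    """Pick one of the equal-cost next hops from a hash of the flow's 5-tuple."""
    if not next_hops:
        raise ValueError("no next hops available")
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

# Hypothetical fabric with four spine switches reachable at equal cost.
spines = ["spine-1", "spine-2", "spine-3", "spine-4"]

# Hypothetical flow: (source IP, destination IP, protocol, source port, destination port).
flow = ("10.0.1.11", "10.0.2.22", 6, 50010, 41000)

chosen = ecmp_next_hop(flow, spines)

# If the chosen spine fails, the flow is rehashed over the surviving paths.
survivors = [spine for spine in spines if spine != chosen]
rerouted = ecmp_next_hop(flow, survivors)
```

Hashing per flow keeps the packets of one flow on one path, which avoids reordering. The plain modulo used here can also move unaffected flows when the path count changes, which is one reason production implementations are more elaborate.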
What Is Networking Your Big Data?
Whether it’s on-premises in your own data centre, in co-location or in the cloud, check that your network:
• Is a network fabric
• Supports ECMP (Equal Cost Multi-Pathing)
• Provides 40 GE (with future 100 GE support), along with 10 GE edge support
• Is built on Software-Defined Networking
• Uses Network Functions Virtualisation for security
• Implements secure micro-segmentation
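Micro-segmentation in the checklist above can be thought of as a default-deny policy between groups of endpoints, with only listed flows permitted. The sketch below shows that idea in its simplest form. The group names, ports and the `is_allowed` helper are hypothetical and do not represent any vendor's policy model.

```python
# Hypothetical allow-list: (source group, destination group, destination TCP port).
ALLOWED_FLOWS = {
    ("tenant-a-apps", "tenant-a-cluster", 8020),
    ("tenant-b-apps", "tenant-b-cluster", 8020),
    ("monitoring", "tenant-a-cluster", 9100),
    ("monitoring", "tenant-b-cluster", 9100),
}

def is_allowed(src_group, dst_group, dst_port):
    """Default deny: a flow is permitted only if it is explicitly listed."""
    return (src_group, dst_group, dst_port) in ALLOWED_FLOWS

# A tenant's applications can reach that tenant's own cluster.
same_tenant = is_allowed("tenant-a-apps", "tenant-a-cluster", 8020)

# The same request against another tenant's cluster is dropped.
cross_tenant = is_allowed("tenant-a-apps", "tenant-b-cluster", 8020)
```

Because the check is applied per group rather than at a single firewall, several logical workloads can share one physical environment without a central choke point.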
What About Big Data for Networking?
• SIEM
• Infrastructure Analytics
• Monitoring & Troubleshooting
• Intent Based Automation