
Lenovo Big Data Validated Design for Cloudera Enterprise with Local and Decoupled SAS Storage

Dan Kangas (Lenovo)

Weixu Yang (Lenovo)

Ajay Dholakia (Lenovo)

Dwai Lahiri (Cloudera)

Last update: 24 October 2018
Version 1.3
Configuration Reference Number: BDCLDRXX83

Deployment considerations for scalable racks including detailed validated bills of material

Solution based on the ThinkSystem SR650 server, bare-metal and virtualized

Reference architecture for Cloudera Enterprise with Apache Hadoop and Apache Spark

Solution based on ThinkSystem SD530 compute node with D3284 SAS storage expansion enclosure


Table of Contents

1 Introduction ............................................................................................... 5

2 Business problem and business value ................................................... 6

3 Requirements ............................................................................................ 8

Functional Requirements ......................................................................................... 8

Non-functional Requirements................................................................................... 8

4 Architectural Overview ............................................................................. 9

Cloudera Enterprise ................................................................................................. 9

Bare-metal Cluster - Local and External SAS Storage (JBOD) ................................ 9

Virtualized Cluster with VMware vSphere .............................................................. 11

5 Component Model .................................................................................. 12

Cloudera Components ........................................................................................... 13

Apache Spark on Cloudera .................................................................................... 15

6 Operational Model .................................................................................. 17

Hardware Description ............................................................................ 17
6.1.1 Lenovo ThinkSystem SR650 Server ......................................................................................... 17
6.1.2 Lenovo ThinkSystem SR630 Server ......................................................................................... 18
6.1.3 Lenovo ThinkSystem SD530 Compute Server .......................................................................... 19
6.1.4 Lenovo RackSwitch G8052 ....................................................................................................... 19
6.1.5 Lenovo RackSwitch G8272 ....................................................................................................... 20
6.1.6 Lenovo RackSwitch NE2572 ..................................................................................................... 20
6.1.7 Lenovo RackSwitch NE10032 ................................................................................................... 21
6.1.8 Lenovo D3284 SAS Expansion Enclosure ................................................................................ 22

Cluster Node Configurations .................................................................................. 22
6.2.1 Worker Nodes ............................................................................................................................ 23
6.2.2 Master and Utility Nodes ........................................................................................................... 24
6.2.3 System Management and Edge Nodes ..................................................................................... 26
6.2.4 External SAS Storage Node ...................................................................................................... 26

Cluster Software Stack .......................................................................................... 28
6.3.1 Cloudera Enterprise CDH .......................................................................................................... 28


6.3.2 Red Hat Operating System ........................................................................................................ 29

Cloudera Service Role Layouts .............................................................................. 29

System Management ............................................................................................. 31

Networking ............................................................................................................. 32
6.6.1 Data Network ............................................................................................................................. 33
6.6.2 Hardware Management Network ............................................................................................... 33
6.6.3 Multi-rack Network ..................................................................................................................... 34
6.6.4 10Gb and 25Gb Data Network Configurations .......................................................................... 35

Predefined Cluster Configurations ......................................................................... 36
6.7.1 SR650 Configurations ............................................................................................................... 37
6.7.2 SD530 with D3284 Configurations ............................................................................................ 39
6.7.3 Cluster Storage Capacity ........................................................................................................... 40
6.7.4 Storage Tiering with NVMe and SSD Drives ............................................................................. 42
6.7.5 D3284 Storage Tiering ............................................................................................................... 42
6.7.6 SD530 and D3284 Configuration Options ................................................................................. 44

7 Deployment considerations ................................................................... 45

Increasing Cluster Performance ............................................................................. 45

Processor Selection ............................................................................................... 45
7.2.1 SR630/SR650 Processors ......................................................................................................... 46
7.2.2 SD530 Processors ..................................................................................................................... 46

Designing for Storage Capacity and Performance ................................................. 46
7.3.1 Node Capacity ........................................................................................................................... 46
7.3.2 Node Throughput ....................................................................................................................... 46
7.3.3 HDD Controller .......................................................................................................................... 47

Memory Size and Performance .............................................................................. 47

Data Network Considerations ................................................................................ 49

Designing with Hadoop Virtualized Extensions (HVE) ........................................... 50
7.6.1 Enabling Hadoop Virtualization Extensions (HVE) .................................................................... 50

Cloudera VMware Virtualized Configuration .......................................................... 52
7.7.1 Cluster Software Stack .............................................................................................................. 52
7.7.2 ESXi Hypervisor and Guest OS Configuration .......................................................................... 52

Estimating Disk Space ........................................................................................... 53

Scaling Considerations .......................................................................................... 54
7.9.1 Scaling D3284 External SAS JBOD Storage ............................................................................ 54
7.9.2 Scaling D3284 Storage and SD530 Compute Independently ................................................... 55

High Availability Considerations ............................................................................. 55
7.10.1 Network Availability .................................................................................................................... 55


7.10.2 Cluster Node Availability ............................................................................................................ 56
7.10.3 Storage Availability .................................................................................................................... 56
7.10.4 Software Availability ................................................................................................................... 56

Linux OS Configuration Guidelines ........................................................................ 57
7.11.1 OS Configuration for Cloudera CDH .......................................................................................... 57
7.11.2 OS Configuration for SAS Multipath .......................................................................................... 57

Designing for High Ingest Rates ............................................................................ 59

8 Bill of Materials - SR650 Nodes ............................................................. 60

Master Node .......................................................................................................... 60

Worker Node .......................................................................................................... 61

System Management Node.................................................................................... 63

Management Network Switch ................................................................................ 64

Data Network Switch .............................................................................................. 64

Rack ....................................................................................................................... 64

Cables .................................................................................................................... 65

9 Bill of Materials - SD530 with D3284 ..................................................... 66

Master Node .......................................................................................................... 66

Worker Node .......................................................................................................... 67

Systems Management Node .................................................................................. 68

External SAS Storage Enclosure ........................................................................... 69

Management Network Switch ................................................................................ 70

Data Network Switch .............................................................................................. 70

Rack ....................................................................................................................... 71

Cables .................................................................................................................... 71

Software ................................................................................................................. 71

10 Acknowledgements ................................................................................ 72

11 Resources ............................................................................................... 73

Document history ......................................................................................... 75


1 Introduction

This document describes the reference architecture for Cloudera Enterprise on bare-metal with locally attached storage, on bare-metal with decoupled compute and storage, and on a virtualized platform with VMware vSphere. It provides a predefined and optimized hardware infrastructure for Cloudera Enterprise, a distribution of Apache Hadoop and Apache Spark with enterprise-ready capabilities from Cloudera. This reference architecture provides the planning, design considerations, and best practices for implementing Cloudera Enterprise with Lenovo products.

Lenovo and Cloudera worked together on this document, and the reference architecture that is described herein was validated by Lenovo and Cloudera.

With the ever-increasing volume, variety and velocity of data becoming available to an enterprise comes the challenge of deriving the most value from it. This task requires the use of suitable data processing and management software running on a tuned hardware platform. With Apache Hadoop and Apache Spark emerging as popular big data storage and processing frameworks, enterprises are building so-called Data Lakes by employing these components.

Cloudera brings the power of Hadoop to the customer's enterprise. Hadoop is an open source software framework that is used to reliably manage large volumes of structured and unstructured data. Cloudera expands and enhances this technology to withstand the demands of your enterprise, adding management, security, governance, and analytics features. The result is that you get a more enterprise ready solution for complex, large-scale analytics.

VMware vSphere brings virtualization to Hadoop with many benefits that cannot be obtained on physical infrastructure or in the cloud. Virtualization simplifies the management of your big data infrastructure, enables faster time to results and makes it more cost effective. It is a proven software technology that makes it possible to run multiple operating systems and applications on the same server at the same time. Virtualization can increase IT agility, flexibility, and scalability while creating significant cost savings. Workloads get deployed faster, performance and availability increases and operations become automated, resulting in IT that is simpler to manage and less costly to own and operate.

The intended audience for this reference architecture is IT professionals, technical architects, sales engineers, and consultants who plan, design, and implement big data solutions with Lenovo hardware. It is assumed that you are familiar with Hadoop components and capabilities. For more information about Hadoop, see “Resources” on page 73.


2 Business problem and business value

Business Problem

The world is well on its way to generating more than 40 million TB of data by 2020. In all, 90% of the data in the world today was created in the last two years alone. This data comes from everywhere, including sensors that are used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone global positioning system (GPS) signals. This data is big data.

Big data spans the following dimensions:

● Volume: Big data comes in one size: large – in size, quantity and/or scale. Enterprises are awash with data, easily amassing terabytes and even petabytes of information.

● Velocity: Often time-sensitive, big data must be used as it is streaming into the enterprise to maximize its value to the business.

● Variety: Big data extends beyond structured data, including unstructured data of all varieties, such as text, audio, video, click streams, and log files.

Enterprises are incorporating large data lakes into their IT architecture to store all their data. The expectation is that ready access to all the available data can lead to higher quality of insights obtained through the use of analytics, which in turn drive better business decisions. A key challenge faced today by these enterprises is setting up an easy to deploy data storage and processing infrastructure that can start to deliver the promised value in a very short amount of time. Spending months of time and hiring dozens of skilled engineers to piece together a data management environment is very costly and often leads to frustration from unrealized goals. Furthermore, the data processing infrastructure needs to be easily scalable in addition to achieving desired performance and reliability objectives.

Big data is more than a challenge; it is an opportunity to find insight into new and emerging types of data to make your business more agile. Big data also is an opportunity to answer questions that, in the past, were beyond reach. Until now, there was no effective way to harvest this opportunity. Today, Cloudera uses the latest big data technologies such as the in-memory processing capabilities of Spark in addition to the standard MapReduce scale-out capabilities of Hadoop, to open the door to a world of possibilities.

Business Value

Hadoop is an open source software framework that is used to reliably manage and analyze large volumes of structured and unstructured data. Cloudera enhances this technology to withstand the demands of your enterprise, adding management, security, governance, and analytics features. The result is that you get an enterprise-ready solution for complex, large-scale analytics.

How can businesses process tremendous amounts of raw data in an efficient and timely manner to gain actionable insights? Cloudera allows organizations to run large-scale, distributed analytics jobs on clusters of cost-effective server hardware. This infrastructure can be used to tackle large data sets by breaking up the data into “chunks” and coordinating data processing across a massively parallel environment. After the raw data is stored across the nodes of a distributed cluster, queries and analysis of the data can be handled efficiently, with dynamic interpretation of the data formatted at read time. The bottom line: Businesses can finally get their arms around massive amounts of untapped data and mine that data for valuable insights in a more efficient, optimized, and scalable way.


Cloudera deployed on Lenovo ThinkSystem servers with Lenovo networking components provides superior performance, reliability, and scalability. The reference architecture supports entry-level through high-end configurations and the ability to easily scale as the use of big data grows. A choice of infrastructure components provides flexibility in meeting varying big data analytics requirements.

There is growing interest in deploying Hadoop on a virtualized infrastructure, driven by the promise of easier cluster management during initial deployment and when adding nodes as data storage and processing requirements grow. The ability to have a virtualized Hadoop environment look and feel the same as it does on a bare-metal infrastructure allows flexibility in incorporating the solution within an enterprise’s data management architecture.


3 Requirements

The functional and non-functional requirements for this reference architecture are described in this section.

Functional Requirements

A big data solution supports the following key functional requirements:

● Ability to handle various workloads, including batch and real-time analytics
● Industry-standard interfaces so that applications can work with Cloudera
● Ability to handle large volumes of data of various data types
● Various client interfaces

Non-functional Requirements

Customers require their big data solution to be easy, dependable, and fast. The following non-functional requirements are key:

● Easy:
  o Ease of development
  o Easy management at scale
  o Advanced job management
  o Multi-tenancy
  o Easy access to data by various user types

● Dependable:
  o Data protection with snapshot and mirroring
  o Automated self-healing
  o Insight into software/hardware health and issues
  o High availability (HA) and business continuity

● Fast:
  o Superior performance
  o Scalability

● Secure and governed:
  o Strong authentication and authorization
  o Kerberos support
  o Data confidentiality and integrity


4 Architectural Overview

Cloudera Enterprise

Figure 1 shows the main features of the Cloudera reference architecture that uses Lenovo hardware. Users can log into the Cloudera client from outside the firewall by using Secure Shell (SSH) on port 22 to access the Cloudera solution from the corporate network. Cloudera provides several interfaces that allow administrators and users to perform administration and data functions, depending on their roles and access level. Hadoop application programming interfaces (APIs) can be used to access data. Cloudera APIs can be used for cluster management and monitoring. Cloudera data services, management services, and other services run on the nodes in the cluster. Storage is a component of each data node in the cluster. Data can be incorporated into Cloudera Enterprise storage through the Hadoop APIs or a network file system (NFS) interface, depending on the needs of the customer.
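For illustration only, the following minimal Python sketch shows one way to push a file into HDFS through the WebHDFS interface using the open-source `hdfs` client package; the NameNode address, port, user, and paths are placeholder assumptions, not part of this reference architecture.

```python
# Minimal sketch of data ingest through a Hadoop API (WebHDFS).
# Assumptions: the open-source `hdfs` Python package is installed and WebHDFS
# is enabled on the cluster; host, port, user, and paths are placeholders.
from hdfs import InsecureClient

# Connect to the active NameNode's WebHDFS endpoint (placeholder address/port).
client = InsecureClient('http://namenode.example.com:9870', user='etl')

# Upload a local file into an HDFS landing directory.
client.upload('/data/landing/sales.csv', '/tmp/exports/sales.csv')

# List the landing directory to confirm the file arrived.
print(client.list('/data/landing'))
```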

A database is required to store data for Cloudera Manager, the Hive metastore, and other services. Cloudera provides an embedded database for test or proof of concept (POC) environments; an external database is required for a supportable production environment.

Figure 1. Cloudera architecture overview

Bare-metal Cluster - Local and External SAS Storage (JBOD)

The big data cluster solutions described in this document can be deployed on bare-metal infrastructure. This means that both the management nodes and the data nodes are implemented on physical host servers. The number of servers of each type is determined based on requirements for high availability, total data capacity, and desired performance objectives. This reference architecture provides validated solutions for traditional local storage on the Lenovo SR650 as well as external storage using Lenovo SD530 dense compute nodes with the Lenovo D3284 external SAS storage enclosure, configured for non-RAID JBOD (Just a Bunch Of Drives), which gives over 40% more storage capacity per rack and more compute nodes compared to nodes with internal HDD storage.

The cornerstone server for Cloudera big data clusters is the SR650, which offers the highest performance across its selections of processor, memory, and storage. The SD530 with the external D3284 SAS enclosure provides dense, optimized storage with the highest storage capacity per rack and the highest compute node count per rack. The SD530 solution also allows compute nodes to be scaled separately from storage.

With Hadoop external SAS HDD storage, a separate 5U direct-access SAS external storage enclosure is used, with up to 6 dense SD530 compute nodes connected via SAS cabling. This allows compute nodes to be upgraded with new technology without impacting the storage enclosures. The dense form factors also increase both the Cloudera rack storage capacity and the compute node count by over 40% compared to internal storage. Scale-out of the external SAS configuration follows the usual Hadoop pattern: compute nodes are added one at a time, while a new SAS enclosure is added for every 6 compute nodes.

With Hadoop local storage, the SR650 server contains compute and storage in the same physical enclosure. Scale-out is accomplished by adding one or more nodes, which adds both compute and storage simultaneously to the cluster. The Lenovo SR650 2U node provides the highest CPU core count and highest total memory per node for a very high-end analytics solution.

The graphic below gives a high-level view of external SAS storage vs. internal SAS storage.

Figure 2. Hadoop local and external SAS storage topologies: SR650 internal SAS storage and SD530 with external direct-attach SAS storage


Virtualized Cluster with VMware vSphere

When Hadoop is virtualized, all of the components of Hadoop, including the NameNode, ResourceManager, DataNode, and NodeManager, run within purpose-built virtual machines (VMs) rather than on the native OS of the physical machine. However, the Hadoop services or roles of the Cloudera software stack are installed with Cloudera Manager in exactly the same way as on physical machines. With a virtualization infrastructure, two or more VMs can be run on the same physical host server to improve cluster usage efficiency and flexibility.

The VMware-based infrastructure with direct-attached storage for HDFS is used to maintain storage-to-CPU locality on a physical node. VMs are configured for one-to-one mapping of a physical disk to a vSphere VMFS virtual disk; see Figure 3 below.

Figure 3. One-to-one mapping of local storage


5 Component Model

Cloudera Enterprise provides features and capabilities that meet the functional and non-functional requirements of customers. It supports mission-critical and real-time big data analytics across different industries, such as financial services, retail, media, healthcare, manufacturing, telecommunications, government organizations, and leading Fortune 100 and Web 2.0 companies.

Cloudera Enterprise is the world’s most complete, tested, and popular distribution of Apache Hadoop and related projects. All of the packaging and integration work is done for you, and the entire solution is thoroughly tested and fully documented. By taking the guesswork out of building out your Hadoop deployment, Cloudera Enterprise gives you a streamlined path to success in solving real business problems with big data.

The Cloudera platform for big data can be used for various use cases from batch applications that use MapReduce or Spark with data sources, such as click streams, to real-time applications that use sensor data.

Figure 4 shows the Cloudera Enterprise key capabilities that meet the functional requirements of customers.

Figure 4. Cloudera Enterprise key capabilities


Cloudera Components

The Cloudera Enterprise solution contains the following components:

● Analytic SQL: Apache Impala

Impala is the industry’s leading massively parallel processing (MPP) SQL query engine that runs natively in Hadoop. The Apache-licensed, open source Impala project combines modern, scalable parallel database technology with the power of Hadoop, enabling users to directly query data stored in HDFS and Apache HBase without requiring data movement or transformation. Impala is designed from the ground up as part of the Hadoop system and shares the same flexible file and data formats, metadata, security, and resource management frameworks that are used by MapReduce, Apache Hive, Apache Pig, and other components of the Hadoop stack.

● Search Engine: Cloudera Search

Cloudera Search is Apache Solr that is integrated with Cloudera Enterprise, including Apache Lucene, Apache SolrCloud, Apache Flume, Apache Tika, and Hadoop. Cloudera Search also includes valuable integrations that make searching more scalable, easy to use, and optimized for near-real-time and batch-oriented indexing. These integrations include Cloudera Morphlines, which is a customizable transformation chain that simplifies loading any type of data into Cloudera Search.

• NoSQL - HBase

A scalable, distributed column-oriented datastore. HBase provides real-time read/write random access to very large datasets hosted on HDFS.

• Stream Processing: Apache Spark

Apache Spark is an open source, parallel data processing framework that complements Hadoop to make it easy to develop fast, unified big data applications that combine batch, streaming, and interactive analytics on all your data. Cloudera offers commercial support for Spark with Cloudera Enterprise. Spark is 10 – 100 times faster than MapReduce, which delivers faster time to insight, allows inclusion of more data, and results in better business decisions and user outcomes.

● Machine Learning: Spark MLlib

MLlib is the API that implements common machine learning algorithms. MLlib is usable in Java, Scala, Python and R. Leveraging Spark’s excellence in iterative computation, MLlib runs very fast, high-quality algorithms.

● Cloudera Manager

Cloudera Manager is the industry’s first and most sophisticated management application for Hadoop and the enterprise data hub. Cloudera Manager sets the standard for enterprise deployment by delivering granular visibility into and control over every part of the data hub, which empowers operators to improve performance, enhance quality of service, increase compliance, and reduce administrative costs. Cloudera Manager makes administration of your enterprise data hub simple and straightforward, at any scale. With Cloudera Manager, you can easily deploy and centrally operate the complete big data stack.

Cloudera Manager automates the installation process, which reduces deployment time from weeks to minutes; gives you a cluster-wide, real-time view of nodes and services running; provides a single, central console to enact configuration changes across your cluster; and incorporates a full range of reporting and diagnostic tools to help you optimize performance and utilization.

• Cloudera Manager Metrics

Cloudera Manager monitors a number of performance metrics for services and role instances that are running on your clusters. These metrics are monitored against configurable thresholds and can be used to indicate whether a host is functioning as expected. You can view these metrics in the Cloudera Manager Admin Console, which displays metrics about your jobs (such as the number of currently running jobs and their CPU or memory usage), Hadoop services (such as the average HDFS I/O latency and number of concurrent jobs), your clusters (such as average CPU load across all your hosts) and so on.

• Cloudera Manager Backup And Disaster Recovery (BDR)

Cloudera Manager provides an integrated, easy-to-use management solution for enabling data protection in the Hadoop platform. Cloudera Manager provides rich functionality that is aimed towards replicating data that is stored in HDFS and accessed through Hive across data centers for disaster recovery scenarios. When critical data is stored on HDFS, Cloudera Manager provides the necessary capabilities to ensure that the data is available at all times, even in the face of the complete shutdown of a data center. Cloudera Manager also provides the ability to schedule, save, and (if needed) restore snapshots of HDFS directories and HBase tables.

• Cloudera Manager API

The Cloudera Manager API provides configuration and service lifecycle management, service health information and metrics, and allows you to configure Cloudera Manager. The API is served on the same host and port as the Cloudera Manager Admin Console, and does not require an extra process or extra configuration. The API supports HTTP Basic Authentication, accepting the same users and credentials as the Cloudera Manager Admin Console. A brief illustrative example of calling the API appears at the end of this component list.

• Cloudera Navigator

A fully integrated data management and security tool for the Hadoop platform. Cloudera Navigator provides three categories of functionality:

o Auditing data access and verifying access privileges. Cloudera Navigator allows administrators to configure, collect, and view audit events, and generate reports that list the HDFS access permissions granted to groups. Cloudera Navigator tracks access permissions and actual accesses to all entities in HDFS, Hive, HBase, Hue, Impala, Sentry, and Solr.

o Searching metadata and visualizing lineage. Metadata management features allow DBAs, data modelers, business analysts, and data scientists to search for, amend the properties of, and tag data entities. Cloudera Navigator supports tracking the lineage of HDFS files, datasets, and directories, Hive tables and columns, MapReduce and YARN jobs, Hive queries, Impala queries, Pig scripts, Oozie workflows, Spark jobs, and Sqoop jobs.

o Securing data and simplifying storage and management of encryption keys. Data encryption and key management provide protection against potential threats by malicious actors on the network or in the data center. It is also a requirement for meeting key compliance initiatives and ensuring the integrity of enterprise data.

• Cloudera Kafka

Cloudera Distribution of Apache Kafka is a distributed commit log service. Kafka functions much like a publish/subscribe messaging system, but with better throughput, built-in partitioning, replication, and fault tolerance. Kafka is a good solution for large scale message processing applications. It is often used in tandem with Apache Hadoop, Apache Storm and Spark Streaming.

For more information, see this website: cloudera.com/content/cloudera/en/products-and-services/product-comparison.html

The Cloudera solution is operating system independent. Cloudera supports many Linux® operating systems, including Red Hat Linux and SUSE Linux. For more information about the versions of supported operating systems, see this website:

http://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_cm_requirements.html.
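As an illustrative sketch of the Cloudera Manager API described above, the Python snippet below lists managed clusters and service health over the REST interface with HTTP Basic Authentication; the hostname, port, credentials, and API version are placeholder assumptions, so check the API version supported by your Cloudera Manager release.

```python
# Minimal sketch of calling the Cloudera Manager REST API with HTTP Basic
# Authentication. The host, port, credentials, and API version below are
# placeholders; adjust them for your Cloudera Manager deployment.
import requests

CM_API = 'http://cm.example.com:7180/api/v19'   # assumed API version
AUTH = ('admin', 'admin')                       # same users as the Admin Console

# List the clusters managed by this Cloudera Manager instance.
clusters = requests.get(f'{CM_API}/clusters', auth=AUTH).json()
for cluster in clusters.get('items', []):
    print('Cluster:', cluster['name'])

    # Show the health summary of each service in the cluster.
    services = requests.get(f'{CM_API}/clusters/{cluster["name"]}/services',
                            auth=AUTH).json()
    for svc in services.get('items', []):
        print('  ', svc['name'], svc.get('healthSummary'))
```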

Apache Spark on Cloudera

Spark has recently become very popular and is being adopted as a preferred framework for a variety of big data use cases, ranging from batch applications that use MapReduce or Spark with data sources such as click streams, to real-time applications that use sensor data.

The Spark stack is shown in Figure 5. As depicted, the foundational component is the Spark Core. Spark is written in the Scala programming language and offers simple APIs in Python, Java, Scala and SQL.

Figure 5. The Spark stack

In addition to the Spark Core, the framework allows extensions in the form of libraries. The most common extensions are Spark MLlib for machine learning, Spark SQL for queries on structured data, Spark Streaming for real-time stream processing, and Spark GraphX for handling graph databases. Other extensions are also available. Cloudera does not currently support GraphX or SparkR. There are also caveats for Spark SQL support; please refer to http://www.cloudera.com/documentation/enterprise/latest/topics/spark.html.
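As a generic illustration of the Spark SQL extension mentioned above (subject to the Cloudera support caveats linked in the previous paragraph), the following minimal PySpark sketch queries structured data with both the DataFrame API and SQL; the data and column names are invented for the example.

```python
# Minimal Spark SQL sketch: the same structured data queried through the
# DataFrame API and through SQL. Data and column names are illustrative only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-sql-sketch").getOrCreate()

# A tiny in-memory table of (region, amount) rows.
sales = spark.createDataFrame(
    [("east", 120.0), ("west", 80.0), ("east", 45.5), ("north", 64.0)],
    ["region", "amount"])

# DataFrame API: aggregate sales by region.
sales.groupBy("region").sum("amount").show()

# Equivalent SQL over a temporary view.
sales.createOrReplaceTempView("sales")
spark.sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region").show()

spark.stop()
```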


The Spark architecture shown in Figure 5 enables a single framework to be used for multiple projects. Typical big data usage scenarios to date have deployed the Hadoop stack for batch processing separately from another framework for stream processing, and yet another one for advanced analytics such as machine learning. Apache Spark combines these frameworks in a common architecture, thereby allowing easier management of the big data code stack and also enabling reuse of a common data repository.

The Spark stack shown in Figure 5 can run in a variety of environments. It can run alongside the Hadoop stack, leveraging Hadoop YARN for cluster management. Spark applications can run in a distributed mode on a cluster using a master/slave architecture with a central coordinator called the “driver” and a potentially large number of “worker” processes that execute individual tasks in a Spark job. The Spark executor processes also provide reliable in-memory storage of data distributed across the various nodes in a cluster. The components of a distributed Spark application are shown in Figure 6.

Figure 6. Distributed Spark application component model

A key distinguishing feature of Spark is its data model, based on RDDs (Resilient Distributed Datasets). This model enables a compact and reusable organization of data sets that can reside in main memory and be accessed by multiple tasks. Iterative processing algorithms benefit from this feature by not having to store and retrieve data sets from disk between iterations of computation. These capabilities are what deliver the significant performance gains compared to MapReduce.

RDDs support two types of operations: Transformations and Actions. Transformations are operations that return a new RDD, while Actions return a result to the driver program. Spark groups operations together to reduce the number of passes taken over the data. This so-called lazy evaluation technique enables faster data processing. Spark also allows caching data in memory for persistence to enable multiple uses of the same data. This is another technique contributing to faster data processing.
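The following minimal PySpark sketch illustrates the transformation/action distinction and in-memory caching described above; it runs in local mode so it is self-contained, whereas on a Cloudera cluster the application would typically be submitted to YARN.

```python
# Minimal RDD sketch: transformations are lazy, actions trigger execution,
# and cache() keeps an evaluated RDD in executor memory for reuse.
# local[*] keeps the example self-contained; on a cluster this would be YARN.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("rdd-sketch").getOrCreate()
sc = spark.sparkContext

numbers = sc.parallelize(range(1, 1001))

# Transformations: nothing is computed yet (lazy evaluation).
squares = numbers.map(lambda x: x * x)
evens = squares.filter(lambda x: x % 2 == 0)

# Mark the RDD for in-memory persistence across multiple actions.
evens.cache()

# Actions: the first triggers computation, the second reuses the cached data.
print("count:", evens.count())
print("sum:  ", evens.reduce(lambda a, b: a + b))

spark.stop()
```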


6 Operational Model

This section describes the operational model for the Cloudera reference architecture. To show the operational model for different sized customer environments, four different models or cluster designs are provided for supporting different amounts of data. Throughout this document, these models are referred to as the starter rack, half rack, full rack, and multi-rack configuration sizes. The multi-rack is three times larger than the full rack.

A Cloudera deployment consists of cluster nodes, networking equipment, power distribution units, and racks. The predefined configurations can be implemented as-is or modified based on specific customer requirements, such as lower cost, improved performance, and increased reliability. Key workload requirements, such as the data growth rate, sizes of datasets, and data ingest patterns, help in determining the proper configuration for a specific deployment. A best practice when designing a Cloudera cluster infrastructure is to conduct proof-of-concept testing by using representative data and workloads to ensure that the proposed design works.

Hardware Description

This reference architecture uses Lenovo ThinkSystem SR630 (1U), SR650 (2U), and SD530 servers with Lenovo RackSwitch G8052, G8272, NE2572, and NE10032 top-of-rack switches and the Lenovo D3284 SAS expansion enclosure, described in the following subsections.

6.1.1 Lenovo ThinkSystem SR650 Server

The Lenovo ThinkSystem SR650 is an ideal 2-socket 2U rack server for small businesses up to large enterprises that need industry-leading reliability, management, and security, as well as maximum performance and flexibility for future growth. The SR650 server is particularly suited for big data applications due to its rich internal data storage, large internal memory, and selection of high-performance Intel processors. It is also designed to handle general workloads, such as databases, virtualization and cloud computing, virtual desktop infrastructure (VDI), enterprise applications, collaboration/email, and business analytics.

The SR650 server supports:

● Up to two Intel® Xeon® Scalable Processors
● Up to 1.5 TB of 2666 MHz TruDDR4 memory (support for up to 3 TB is planned for the future)
● Up to 24x 2.5-inch or 14x 3.5-inch drive bays with an extensive choice of NVMe PCIe SSDs, SAS/SATA SSDs, and SAS/SATA HDDs
● Flexible I/O network expansion options with the LOM slot, the dedicated storage controller slot, and up to 6x PCIe slots

Figure 7. Lenovo ThinkSystem SR650


Combined with the Intel® Xeon® Scalable Processors (Bronze, Silver, Gold, and Platinum), the Lenovo SR650 server offers an even higher density of workloads and performance that lowers the total cost of ownership (TCO). Its pay-as-you-grow flexible design and great expansion capabilities solidify dependability for any kind of workload with minimal downtime.

The SR650 server provides high internal storage density in a 2U form factor with its impressive array of workload-optimized storage configurations. It also offers easy management and saves floor space and power consumption for most demanding use cases by consolidating storage and server into one system.

This reference architecture recommends the storage-rich ThinkSystem SR650 for the following reasons:

Storage capacity: The nodes are storage-rich. Each of the 14 configured 3.5-inch drives has a raw capacity of up to 10 TB, providing up to 140 TB of raw storage per node and over 2000 TB per rack.

Performance: This hardware supports the latest Intel® Xeon® Scalable processors and TruDDR4 Memory.

Flexibility: Server hardware uses embedded storage, which results in simple scalability (by adding nodes).

PCIe slots: Up to 7 PCIe slots are available if rear disks are not used, and up to 3 PCIe slots if the Rear HDD kit is used. They can be used for network adapter redundancy and increased network throughput.

Higher power efficiency: Titanium and Platinum redundant power supplies that can deliver 96% (Titanium) or 94% (Platinum) efficiency at 50% load.

Reliability: Outstanding reliability, availability, and serviceability (RAS) improve the business environment and help save operational costs.

For more information, see the Lenovo ThinkSystem SR650 Product Guide:

https://lenovopress.com/lp0644-lenovo-thinksystem-sr650-server

6.1.2 Lenovo ThinkSystem SR630 Server

The Lenovo ThinkSystem SR630 server (shown in Figure 8) is a cost- and density-balanced 1U two-socket rack server. The SR630 features a new, innovative, energy-efficient design with up to two Intel® Xeon® Scalable processors (Bronze, Silver, Gold and Platinum), a large capacity of faster, energy-efficient TruDDR4 memory, up to 12x 2.5-inch or 4x 3.5-inch drive bays, and up to three PCI Express (PCIe) 3.0 I/O expansion slots in an impressive selection of sizes and types. The server's improved feature set and exceptional performance make it ideal for scalable cloud environments.

Figure 8. Lenovo ThinkSystem SR630


For more information, see the Lenovo ThinkSystem SR630 Product Guide: https://lenovopress.com/lp0643-lenovo-thinksystem-sr630-server

6.1.3 Lenovo ThinkSystem SD530 Compute Server

The Lenovo ThinkSystem SD530 is an ultra-dense and economical two-socket server in a 0.5U rack form factor. Up to four SD530 servers can be mounted in the ThinkSystem SD530 D2 enclosure with front access, making upgrades and servicing easy. This compute density, combined with the Lenovo D3284 84-drive SAS enclosure, gives a Hadoop cluster decoupled compute and storage, allowing compute resources to be scaled up without impacting storage nodes.

Figure 9. ThinkSystem SD530 D2 enclosure with four front access compute nodes

Each compute node has six 2.5" storage bays for certain combinations of HDD, SSD, or NVMe drives. This gives additional storage for Hadoop hot and warm tiering.

For more information on the SD530 D2 enclosure and compute nodes, visit this link:

https://lenovopress.com/datasheet/ds0003-lenovo-thinksystem-sd530-and-d2-enclosure

6.1.4 Lenovo RackSwitch G8052

The Lenovo networking RackSwitch G8052 (as shown in Figure 10) is an Ethernet switch that is designed for the data center and provides a simple network solution. The Lenovo RackSwitch G8052 offers up to 48x 1 GbE ports and up to 4x 10 GbE ports in a 1U footprint. The G8052 switch is always available for business-critical traffic by using redundant power supplies, fans, and numerous high-availability features.

Figure 10. Lenovo RackSwitch G8052


The Lenovo RackSwitch G8052 has the following characteristics:

• A total of 48x 1 GbE RJ45 ports
• Four 10 GbE SFP+ ports
• Low 130 W power rating and variable speed fans to reduce power consumption

For more information, see the Lenovo RackSwitch G8052 Product Guide: https://lenovopress.com/tips1270-lenovo-rackswitch-g8052

6.1.5 Lenovo RackSwitch G8272

Designed with top performance in mind, the Lenovo RackSwitch G8272 is ideal for today’s big data, cloud, and optimized workloads. The G8272 switch offers up to 72x 10 Gb SFP+ ports in a 1U form factor and is expandable with six 40 Gb QSFP+ ports. It is an enterprise-class, full-featured data center switch that delivers line-rate, high-bandwidth switching, filtering, and traffic queuing without delaying data. Large data center grade buffers keep traffic moving. Redundant power supplies and fans and numerous HA features equip the switch for business-sensitive traffic.

The G8272 switch (as shown in Figure 11) is ideal for latency-sensitive applications. It supports Lenovo Virtual Fabric to help clients reduce the number of I/O adapters to a single dual-port 10Gb adapter, which helps reduce cost and complexity. The G8272 switch supports the newest protocols, including Data Center Bridging/Converged Enhanced Ethernet (DCB/CEE) for support of FCoE and iSCSI and NAS.

Figure 11. Lenovo RackSwitch G8272

The enterprise-level Lenovo RackSwitch G8272 has the following characteristics:

• 48x SFP+ 10 GbE ports plus 6x QSFP+ 40 GbE ports
• Support for up to 72x 10 Gb connections using break-out cables
• 1.44 Tbps non-blocking throughput with low latency (~600 ns)
• OpenFlow support that allows user-controlled virtual networks to be easily created
• Virtual LAG and LACP for dual-switch redundancy

For more information, see the Lenovo RackSwitch G8272 Product Guide:

https://lenovopress.com/tips1267-lenovo-rackswitch-g8272

6.1.6 Lenovo RackSwitch NE2572

The Lenovo ThinkSystem NE2572 RackSwitch is designed for the data center and provides 10 Gb/25 Gb Ethernet connectivity with 40 Gb/100 Gb Ethernet upstream links. It is ideal for big data workload solutions and is an enterprise-class Layer 2 and Layer 3 full-featured switch that delivers line-rate, high-bandwidth switching, filtering, and traffic queuing without delaying data. Large data center-grade buffers help keep traffic moving, while the hot-swap redundant power supplies and fans (along with numerous high-availability software features) help provide high availability for business-sensitive traffic.

Figure 12. Lenovo RackSwitch NE2572

The NE2572 has the following characteristics:

• 48x SFP28/SFP+ ports that support 10 GbE SFP+ and 25 GbE SFP28 with AOC and DAC cabling
• 6x QSFP28/QSFP+ ports that support 40 GbE QSFP+ and 100 GbE QSFP28 optical transceivers with AOC and DAC cabling
• QSFP28/QSFP+ ports can also be split out into two 50 GbE (for 100 GbE QSFP28), or four 10 GbE (for 40 GbE QSFP+) or 25 GbE (for 100 GbE QSFP28) connections by using breakout cables

For more information, see the Lenovo RackSwitch NE2572 Product Guide:

https://lenovopress.com/lp0608-lenovo-thinksystem-ne2572-rackswitch

6.1.7 Lenovo RackSwitch NE10032

The Lenovo ThinkSystem NE10032 RackSwitch, which uses 100 Gb QSFP28 and 40 Gb QSFP+ Ethernet technology, is specifically designed for the data center. It is ideal for today's big data workload solutions and is an enterprise-class Layer 2 and Layer 3 full-featured switch that delivers line-rate, high-bandwidth switching, filtering, and traffic queuing without delaying data. Large data center-grade buffers help keep traffic moving, while the hot-swap redundant power supplies and fans (along with numerous high-availability features) help provide high availability for business-sensitive traffic.

The NE10032 RackSwitch has 32x QSFP+/QSFP28 ports that support 40 GbE and 100 GbE optical transceivers, active optical cables (AOCs), and direct attach copper (DAC) cables. It is an ideal cross-rack aggregation switch for use in a multi-rack big data Cloudera cluster.

Figure 13. Lenovo ThinkSystem NE10032 cross-rack switch

For further information on the NE10032 switch, visit this link:

https://lenovopress.com/lp0609-lenovo-thinksystem-ne10032-rackswitch


6.1.8 Lenovo D3284 SAS Expansion Enclosure

The Lenovo Storage D3284 High Density Expansion Enclosure offers 12 Gbps SAS direct-attached storage expansion capabilities that are designed to provide density, speed, scalability, security, and high availability for medium to large businesses. The D3284 delivers enterprise-class storage technology in a cost-effective, dense solution with flexible drive configurations of up to 84 drives in 5U and RAID or JBOD (non-RAID) host connectivity. This reference architecture uses the JBOD configuration. Figure 14 shows a photo of the enclosure.

Figure 14. D3284 SAS expansion enclosure

For further information on the D3284 SAS Expansion Enclosure, visit this link:

https://lenovopress.com/lp0513-lenovo-storage-d3284-external-high-density-drive-expansion-enclosure

Cluster Node Configurations

The Cloudera reference architecture is implemented on a set of nodes that make up a cluster, which includes two main node types: Worker nodes and Master nodes. Worker nodes use either ThinkSystem SR650 servers with locally attached storage or SD530 dense compute nodes with external SAS storage. Master nodes use ThinkSystem SR630 servers.

Worker nodes run data (worker) services for storing and processing data.

Master nodes run the following types of services:

• Management control services for coordinating and managing the cluster
• Miscellaneous and optional services for file and web serving


6.2.1 Worker Nodes

Table 1 lists the recommended system components for worker nodes demonstrated in this reference architecture.

Table 1. Worker node configuration

Server: ThinkSystem SR650 or SD530
Processor: 2x Intel® Xeon® Gold 6130 processors, 16-core, 2.1 GHz
Memory (base): 384 GB - 12x 32 GB 2666 MHz RDIMM
Disk (OS): Dual M.2 480 GB SSD with RAID1
Disk (data), per worker node: 4 TB drives: 14x 4 TB NL SAS 3.5-inch (56 TB total). Alternate HDD capacities available: 14x 6 TB NL SAS 3.5-inch (84 TB total); 14x 8 TB NL SAS 3.5-inch with 7 TB partitions (98 TB total)*; 12x 8 TB NL SAS 3.5-inch (96 TB total)*; 10x 10 TB NL SAS 3.5-inch (100 TB total)*
HDD controller: OS: M.2 RAID1 mirror enablement kit. HDFS (SR650): ThinkSystem 430-16i 12Gb HBA. HDFS (SD530): ThinkSystem 430-8e 12Gb HBA
Hardware storage protection: OS: RAID1. HDFS: None (JBOD). By default, Cloudera maintains a total of three copies of data stored within the cluster; the copies are distributed across data servers and racks for fault recovery.
Hardware management network adapter: Integrated 1G BaseT XCC management controller - dedicated or shared LAN port
Data network adapter: ThinkSystem 10Gb 4-port SFP+ LOM

* The Cloudera recommended maximum storage per worker node is 100 TB. This can be achieved with the maximum of 14x HDDs per node and up to 7 TB partitions, giving 98 TB per node. Higher-capacity HDDs can be used with fewer than 14 drives per node to maintain the 100 TB node limit, but fewer HDDs yield lower IOPS.

The Intel® Xeon® Scalable processor recommended in Table 1 provides a balance of performance and cost for Cloudera worker nodes. Processors with higher core counts and frequencies are available for compute-intensive workloads. A minimum of 384 GB of memory is recommended for most MapReduce workloads, with 768 GB or more recommended for HBase, Spark and memory-intensive MapReduce workloads, and for VMware virtualized environments.

The OS is loaded on a dual M.2 SSD module with RAID1 mirroring for high availability at the lowest cost. Optional hot-swappable OS storage for SR630 and SD530 nodes is available via dual 480GB 2.5" SSDs (local SFF storage bay) with RAID1. Data disks are configured as JBOD for maximum Hadoop and Spark performance, with data fault tolerance provided by the HDFS file system's 3x replication factor.

Figure 15. SR650 Worker node disk assignment

Figure 16. SD530 Worker node disk assignment

6.2.2 Master and Utility Nodes

The Master node is the nucleus of the Hadoop Distributed File System (HDFS) and supports several other key functions that are needed on a Cloudera cluster. Master nodes primarily run the HDFS NameNode and JournalNode services. A Utility node runs the remaining management services, including an additional instance of ZooKeeper.

The Master node runs the following services:

YARN ResourceManager: Manages and arbitrates resources among all the applications in the system.

Hadoop NameNode: Controls the HDFS file system. The NameNode maintains the HDFS metadata, manages the directory tree of all files in the file system and tracks the location of the file data within the cluster. The NameNode does not store the data of these files.

ZooKeeper: Provides a distributed configuration service, a synchronization service and a name registry for distributed systems.

JournalNode: Collects, maintains and synchronizes updates from the NameNode.

HA ResourceManager: Standby ResourceManager that can be used to provide automated failover.

HA NameNode: Standby NameNode that can be used to provide automated failover.

Utility nodes run other non-master services for Hadoop management such as: Cloudera Manager, HBase master, HiveServer2, and Spark History Server.

Table 2 lists the recommended configuration for Master and Utility nodes; it can be customized according to client needs.

Table 2. Master and Utility node configuration

Server: ThinkSystem SR630
Processor: 2x Intel® Xeon® Silver 4114 Scalable processors, 12-core, 2.1 GHz
Memory (base): 192 GB (12x 16GB 2666MHz RDIMM)
Disk (OS / local storage): OS: Dual M.2 480 GB SSD with RAID1; Data: 8x 2TB 2.5" SAS HDD
HDD controller: ThinkSystem RAID 930-16i 4GB Flash 12Gb controller
Hardware storage protection: OS: RAID1; NameNode/Metastore: RAID1; Database: RAID10; ZooKeeper/QJN: no hardware protection (JBOD HDDs); multiple service instances across Master nodes provide redundancy
Hardware management controller: Integrated XClarity™ Controller (XCC) with 1GBaseT dedicated interface or shared LAN interface
Data network adapter: ThinkSystem 10Gb 4-port SFP+ LOM

The Intel® Xeon® Scalable processors and minimum memory specified in Table 2 are recommended to provide sufficient performance for a Cloudera Master node. The M.2 SSD form factor is intended for operating system storage in this reference architecture.

The Master node uses 10 drives for the following storage pools:

• Two drives (M.2 SSD modules) are configured with RAID 1 for the operating system
• Two drives are configured with RAID 1 for the NameNode metastore
• Four drives are configured with RAID 10 for the database
• One drive is configured with RAID 0 for ZooKeeper
• One drive is configured with RAID 0 for the Quorum Journal Node store

This design separates the data stores for the different services and provides the best performance. SSD drives in the 2.5" and 3.5" SAS/SATA form factors and PCIe card flash storage can be used to provide improved I/O performance for the database.

Figure 17. Cloudera Master and Utility node disk assignment

6.2.3 System Management and Edge Nodes

Known as Edge, System Management or Gateway nodes, these are installed on the cluster data network but do not run Cloudera Enterprise software directly. Their purpose is to connect the Cloudera cluster to an outside network for remote administration access, for ingesting data from an outside source, or for running end-user application software that accesses the Cloudera Enterprise cluster.

A single system management/gateway node is configured in this reference architecture as a minimal node configured for remote administration of the Linux OS and for hardware maintenance. Based on the particular requirements of the cluster for high speed ingesting of data and edge node applications, the CPU, memory, storage, and network capability of this server can be increased.

6.2.4 External SAS Storage Node

The D3284 5U external enclosure is configured for JBOD HDDs with direct-connect SAS cabling to a Host Bus Adapter (HBA) in the SD530 compute nodes. The enclosure holds 84 HDDs in the 3.5" form factor, which are configurable into 6 isolated zones of 14 HDDs each, allowing up to 6 worker nodes to be attached. The SAS links from the D3284 enclosure can be made redundant via dual ESM controller modules and dual SAS cabling with Linux DM-multipath enabled.

Table 3. SAS storage node configuration

Node: Lenovo Storage D3284 5U enclosure
SAS controller: 2x ESM modules, each with 3x 12Gb SAS x4 ports
Zone configuration: 84 HDDs divided into 6 zones of 14 HDDs each, for direct SAS cabling to SD530 compute nodes
Disks (data) per SD530 node: Dual-ported 12Gb SAS HDDs. Zone capacity summary:
  4 TB drives: 14x 4TB NL SAS 3.5 inch (56 TB total)
  Alternate HDD capacities available:
  6 TB drives: 14x 6TB NL SAS 3.5 inch (84 TB total)
  7 TB partition, 8 TB drives: 14x 8TB NL SAS 3.5 inch (98 TB total)*
  8 TB drives: 12x 8TB NL SAS 3.5 inch (96 TB total)*
  10 TB drives: 10x 10TB NL SAS 3.5 inch (100 TB total)*
Hardware storage protection: HDFS: none, as configured for JBOD. By default, Cloudera maintains a total of three copies of data stored within the cluster. The copies are distributed across data servers and racks for fault recovery.
Hardware management network adapter: 10/100M BaseT interface to the integrated management controller with a dedicated RJ-45 LAN port
Enclosure High Availability: Redundant features: dual ESM SAS controllers; dual redundant hot-swap AC power supplies; N+1 redundant cooling with 5 hot-swap fans; dual SAS cabling to the two D3284 ESM controllers with DM-multipath enabled
Hot-swap parts: Drives, ESM controller modules, sideplanes, power supplies, and cooling fans

* The Cloudera recommended maximum storage per worker node is 100 TB. This can be achieved with the maximum of 14 HDDs per node and up to 7TB partitions, giving 98 TB per node. Higher-capacity HDDs can be used with fewer than 14 drives per node to maintain the 100 TB node limit, but fewer HDDs deliver lower IOPS.

D3284 Zoning Configuration

The D3284 is configured for 6 discrete zones with 14 disks each. Host ports A.1 and A.2, for example, refer to Port A on the rear of the enclosure and the use of a Y-cable to split the single host port A into two ports, A.1 and A.2. Figure 18 shows the port and drive assignments.

Figure 18. D3284 Zone Configuration

Figure 19 shows SAS Y-cabling for the maximum of 6 hosts per D3284 SAS enclosure, including the second cabling path to the dual ESM controllers. A single host can be connected with a non-Y cable, leaving room to scale out the cluster by adding 5 more hosts and the associated HDDs to the SAS enclosure.

Figure 19. D3284 Redundant SAS cabling

Cluster Software Stack

6.3.1 Cloudera Enterprise CDH

The following Cloudera CDH software components were installed for this reference architecture:

Component | Version | Release
Supervisord | 3.0-cm5.15.1 | Unavailable
Bigtop-Tomcat (CDH 5 only) | 0.7.0+cdh5.15.1+0 | 1.cdh5.15.1.p0.4
Cloudera Manager Agent | 5.15.1 | 1.cm5151.p0.3.el7
Cloudera Manager Management Daemons | 5.15.1 | 1.cm5151.p0.3
Crunch (CDH 5 only) | 0.11.0+cdh5.15.1+104 | 1.cdh5.15.1.p0.4
Flume NG | 1.6.0+cdh5.15.1+189 | 1.cdh5.15.1.p0.4
Hadoop | 2.6.0+cdh5.15.1+2822 | 1.cdh5.15.1.p0.4
MapReduce 1 | 2.6.0+cdh5.15.1+2822 | 1.cdh5.15.1.p0.4
HDFS | 2.6.0+cdh5.15.1+2822 | 1.cdh5.15.1.p0.4
HttpFS | 2.6.0+cdh5.15.1+2822 | 1.cdh5.15.1.p0.4
hadoop-kms | 2.6.0+cdh5.15.1+2822 | 1.cdh5.15.1.p0.4
MapReduce 2 | 2.6.0+cdh5.15.1+2822 | 1.cdh5.15.1.p0.4
YARN | 2.6.0+cdh5.15.1+2822 | 1.cdh5.15.1.p0.4
HBase | 1.2.0+cdh5.15.1+470 | 1.cdh5.15.1.p0.4
Lily HBase Indexer | 1.5+cdh5.15.1+74 | 1.cdh5.15.1.p0.4
Hive | 1.1.0+cdh5.15.1+1395 | 1.cdh5.15.1.p0.4
HCatalog | 1.1.0+cdh5.15.1+1395 | 1.cdh5.15.1.p0.4
Hue | 3.9.0+cdh5.15.1+8420 | 1.cdh5.15.1.p0.4
Impala | 2.12.0+cdh5.15.1+0 | 1.cdh5.15.1.p0.4
Java 8 | 1.8.0_91 | n/a
Kite (CDH 5 only) | 1.0.0+cdh5.15.1+147 | 1.cdh5.15.1.p0.4
kudu | 1.7.0+cdh5.15.1+0 | 1.cdh5.15.1.p0.4
Llama (CDH 5 only) | 1.0.0+cdh5.15.1+0 | 1.cdh5.15.1.p0.4
Mahout | 0.9+cdh5.15.1+36 | 1.cdh5.15.1.p0.4
Oozie | 4.1.0+cdh5.15.1+492 | 1.cdh5.15.1.p0.4
Parquet | 1.5.0+cdh5.15.1+197 | 1.cdh5.15.1.p0.4
Pig | 0.12.0+cdh5.15.1+114 | 1.cdh5.15.1.p0.4
sentry | 1.5.1+cdh5.15.1+458 | 1.cdh5.15.1.p0.4
Solr | 4.10.3+cdh5.15.1+529 | 1.cdh5.15.1.p0.4
spark | 1.6.0+cdh5.15.1+570 | 1.cdh5.15.1.p0.4
Sqoop | 1.4.6+cdh5.15.1+136 | 1.cdh5.15.1.p0.4
Sqoop2 | 1.99.5+cdh5.15.1+49 | 1.cdh5.15.1.p0.4
Whirr | 0.9.0+cdh5.15.1+25 | 1.cdh5.15.1.p0.4
Zookeeper | 3.4.5+cdh5.15.1+149 | 1.cdh5.15.1.p0.4

6.3.2 Red Hat Operating System

Component | Version
Linux Operating System | Red Hat Enterprise Linux Server 7.5 (Maipo)
Kernel | Linux 3.10.0-862.el7.x86_64
Architecture | x86-64

Cloudera Service Role Layouts

Because the Master node is responsible for many memory-intensive tasks, multiple Master and Utility nodes are needed to split out functions. For most implementations, the size of the Cloudera cluster drives how many Master/Utility nodes are needed. Table 4 and Table 5 provide a high-level guideline for a cluster that provides HA NameNode and ResourceManager failover when configured with multiple Master nodes.

Table 4. Service Layout Matrix for High Availability

Master Node 1: NameNode, JournalNode, ZooKeeper, ResourceManager
Master Node 2: NameNode, JournalNode, ZooKeeper, ResourceManager, JobHistory Server, Spark History Server
Master Node 3: JournalNode, ZooKeeper, Hive MetaStore, WebHCat, HiveServer2, Cloudera Manager and CM management services, Hue, Oozie, Impala Statestore, Impala Catalog Server
Data Nodes: DataNode, NodeManager, impalad

As the number of worker nodes increases in the cluster, more Master and Utility nodes will be needed as shown in Table 5.

Table 5. Cluster size and node types

Number of Worker Nodes | Number of Master Nodes | Number of Utility Nodes
3 - 20 | 3 | --
20 - 80 | 3 | 2
80 - 200 | 3 | 8
200 - 500 | 5 | 8

Note: To ease scale-up of worker nodes, plan ahead by installing the next tier of Master and Utility nodes so they are ready when the Worker node count crosses one of these boundaries.

Installing and managing the Cloudera Stack

The Hadoop ecosystem is complex and constantly changing. Cloudera makes it simple so enterprises can focus on results. Cloudera Manager is the easiest way to administer Hadoop in any environment, with advanced features like intelligent defaults and customizable automation. Combined with the predictive maintenance included in Cloudera's Support Data Hub, Cloudera Enterprise keeps the business up and running.

Reference Cloudera's latest installation documentation for detailed instructions:
https://www.cloudera.com/documentation/enterprise/5-15-x/topics/installation.html

Reference Cloudera's software service layout recommendations:
https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_host_allocations.html#host_role_assignments

System Management

Systems management of a cluster includes the operating system, the Hadoop and Spark applications, and hardware management. Systems management uses Cloudera Manager and is adapted from the standard Hadoop distribution, which places the management services on separate servers from the worker servers. The Master node runs important, high-memory-use functions, so it is important to configure a powerful and fast server for systems management functions. The recommended Master node hardware configuration can be customized according to client needs.

Hardware management uses the Lenovo XClarity™ Administrator, which is a centralized resource management solution that reduces complexity, speeds up response and enhances the availability of Lenovo server systems and solutions. XClarity™ is used to install the OS onto new worker nodes; update firmware across the cluster nodes, record hardware alerts and report when repair actions are needed.

Figure 20 shows the Lenovo XClarity™ Administrator interface in which servers, storage, switches and other rack components are managed and status is shown on the dashboard. Lenovo XClarity™ Administrator is a virtual appliance that is quickly imported into a server virtualized environment.

Figure 20. XClarity™ Administrator interface

In addition, the xCAT open source cluster management tool is available, which provides a unified command-line interface for hardware control, discovery, and operating system deployment. It can be used to facilitate or automate the management of small and large clusters. For more information about xCAT, see the Resources section on page 73.

Networking

The reference architecture specifies two networks: a high-speed data network and a management network. Two types of rack switches are required: one 1Gb switch for out-of-band management and a pair of high-speed data switches for the data network with High Availability. See Figure 21 below.

Figure 21. Cloudera network

6.6.1 Data Network

The data network creates a private cluster among multiple nodes and is used for high-speed data transfer across worker and master nodes, and also for importing data into the Cloudera cluster. The Cloudera cluster typically connects to the customer's corporate data network. This reference architecture demonstrates the Lenovo 10Gb Ethernet System Networking RackSwitch™ G8272, which provides 48 10Gb Ethernet ports and 40Gb uplink ports. Other available data network speeds are 25Gb with the RackSwitch NE2572 or 100Gb with the RackSwitch NE10032, using either copper or fiber links.

The two Ethernet NIC ports of each node are link-aggregated into a single bonded network connection, giving up to double the bandwidth of each individual link. Link redundancy is also provided if one link fails. The two data switches are connected together as a Virtual Link Aggregation Group (vLAG) pair using LACP to provide switch redundancy. Either high-speed data switch can drop out of the network and the other switch continues transferring traffic. The switch pairs are connected with dual 10Gb links called an Inter-Switch Link (ISL), which allows maintaining consistency between the two peer switches.
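To verify that a node's bonded data interface is actually running LACP with both member links up, the status file exposed by the Linux bonding driver can be inspected. The following is a minimal Python sketch; it assumes the bond is named bond0 and that the standard /proc/net/bonding layout provided by the kernel bonding driver is present.

```python
#!/usr/bin/env python3
"""Minimal check of a bonded cluster data interface (assumes bond0)."""

BOND_FILE = "/proc/net/bonding/bond0"   # standard bonding driver status file

def check_bond(path=BOND_FILE):
    mode, slaves, mii = None, [], {}
    current = None
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if line.startswith("Bonding Mode:"):
                mode = line.split(":", 1)[1].strip()
            elif line.startswith("Slave Interface:"):
                current = line.split(":", 1)[1].strip()
                slaves.append(current)
            elif line.startswith("MII Status:") and current:
                mii[current] = line.split(":", 1)[1].strip()

    print(f"Bonding mode: {mode}")
    for slave in slaves:
        print(f"  {slave}: {mii.get(slave, 'unknown')}")

    # The vLAG/LACP design described above expects 802.3ad mode with all links up
    if mode is None or "802.3ad" not in mode:
        print("WARNING: bond is not running LACP (802.3ad)")
    if any(status != "up" for status in mii.values()):
        print("WARNING: one or more member links are down")

if __name__ == "__main__":
    check_bond()
```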

6.6.2 Hardware Management Network

The hardware management network is a 1GbE network for out-of-band hardware management. The recommended 1GbE switch is the Lenovo RackSwitch G8052 with 10Gb SFP+ uplink ports. Through the XClarity™ Controller (XCC) management module within the ThinkSystem SR650 and SR630 servers, the out-of-band network enables hardware-level management of cluster nodes, such as node deployment, UEFI firmware configuration, hardware failure status and remote power control of the nodes.

Hadoop has no dependency on the XCC management function. The Cloudera OS/management network can be shared with the XCC hardware management network, or can be separated via VLANs on the respective switches. The Cloudera cluster and hardware management networks are then typically connected directly to the customer’s existing administrative network to facilitate remote maintenance of the cluster.

6.6.3 Multi-rack Network

The data network in the predefined reference architecture configuration consists of a single network topology. A rack consists of redundant high-speed data switches with two bonded links to each server node. Additional racks can be added as needed for scale-out. Beginning with the third rack, a core switch is used for rack aggregation, and the Lenovo NE10032 core switch with 40Gb and 100Gb uplinks is the best choice for this purpose.

Figure 22 shows a 2-rack configuration. A single rack can be upgraded to this configuration by adding the second rack with the LAG network connection shown.

Figure 22. Cloudera 2-rack network configuration

Figure 23 shows how the network is configured when the Cloudera cluster contains 3 or more racks. The data network is connected across racks by four aggregated uplinks from each rack's high-speed data switch to a core NE10032 switch. The 2-rack configuration can be upgraded to the 3-rack configuration as shown. Additional racks can be added with similar uplink connections to the NE10032 cross-rack switch. Reference Figure 23 and Figure 25.

Figure 23. Cloudera multi-rack network configuration

Within each rack, the G8052 1Gb management switch can be configured to have two uplinks to the data switches for propagating the management VLAN across cluster racks through the NE10032 cross-rack switch. Other cross rack network configurations are possible and may be required to meet the needs of specific deployments and to address clusters larger than three racks.

For multi-rack solutions, the Master nodes can be distributed across racks to maximize fault tolerance.

6.6.4 10Gb and 25Gb Data Network Configurations

Both 10Gb and 25Gb high-speed data network configurations, as well as 100Gb network configurations, are possible with a Cloudera cluster. The 10Gb network speed is cost effective for most big data workloads. Higher network speeds such as 25Gb and 100Gb are becoming pervasive; they increase overall cluster performance and reduce execution time of big data workloads. The table below shows the 10Gb components used in this reference architecture and alternate 25Gb components for a higher-speed data network.

Table 6. 10Gb and 25Gb Network Components

Components | 10Gb Network | 25Gb Network
High-speed data switches | 2x Lenovo RackSwitch G8272, 10GbE | 2x ThinkSystem NE2572 RackSwitch, 25GbE
Node network adapter* | ThinkSystem 10Gb 4-port SFP+ LOM | Mellanox ConnectX-4 Lx 10/25GbE SFP28 2-port Ethernet Adapter
Connector type | SFP+ | SFP28
Cabling/transceivers | Lenovo Active DAC SFP+ | Lenovo 25GBase-SR SFP28 Transceiver

* Reference section 7.5 Data Network Considerations for an expanded network adapter list

Predefined Cluster Configurations

The intent of the predefined configurations is to aid initial sizing for customers and to show example starting points for four production cluster sizes: starter rack, half rack, full rack, and a 3-rack multi-rack configuration. These consist of Worker nodes, Master/Utility nodes, system management/edge nodes, network switches, storage enclosures and rack hardware. Figure 24 and Figure 25 show rack diagrams with a description of each component. Table 7 lists the storage capacity of the predefined configurations.

6.7.1 SR650 Configurations

Figure 24. Half rack and full rack Cloudera predefined configurations

Figure 25. SR650 Multi-rack Cloudera configuration

6.7.2 SD530 with D3284 Configurations

Figure 26. SD530 with D3284 - Half and Full Rack Pre-Defined Configurations

Figure 27. SD530 with D3284 SAS Expansion Enclosure - 3-Rack Pre-Defined Configuration

6.7.3 Cluster Storage Capacity

Table 7 lists the amount of rack storage and the number of nodes in each predefined configuration. A non-production Proof of Concept (POC) size is included for low-cost evaluation. The Starter rack is the minimum production cluster supported by Cloudera. Storage is described in two ways: (1) raw storage, and (2) usable storage available for customer data. Usable storage assumes a Hadoop replication of three data blocks and a 25% reserve working capacity. Table 7 shows uncompressed totals. Software compression rates vary widely based on file contents, so usable space must be calculated with a specific compression method.
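The arithmetic behind the "usable" figures in Table 7 can be expressed directly: divide raw capacity by the three HDFS replicas, then subtract the 25% reserve. The short Python sketch below illustrates this with one of the predefined configurations; the function name and the example values are illustrative only.

```python
def usable_capacity_tb(nodes, drives_per_node, drive_tb,
                       replicas=3, reserve=0.25):
    """Raw and usable (uncompressed) capacity for a set of worker nodes."""
    raw = nodes * drives_per_node * drive_tb
    usable = raw / replicas * (1 - reserve)
    return raw, usable

# Example: SR650 half rack from Table 7 (9 worker nodes, 14x 4TB drives each)
raw, usable = usable_capacity_tb(nodes=9, drives_per_node=14, drive_tb=4)
print(f"raw = {raw} TB, usable = {usable:.0f} TB")   # raw = 504 TB, usable = 126 TB
```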

For maximum storage I/O operations per second (IOPS), the maximum number of parallel drives should be used in each node, such as 14 in this reference architecture. Fewer drives with higher-density HDDs may give an incremental cost reduction for the same storage capacity, but at reduced performance (i.e. longer transaction times). Various storage configurations are shown below for comparison.

Table 7. SR650 and SD530 Cluster Storage Capacity (capacity in TB, 3.5" Large Form Factor (LFF) HDDs)

                               SR650 with local storage              SD530 with D3284 external storage (JBOD)
                               POC   Starter  Half   Full   3x rack  POC   Starter  Half   Full   3x rack
Data node count                3     5        9      17     55       3     5        12     24     83

14 disks per node
4 TB drives      Raw storage   168   280      504    952    3080     168   280      672    1344   4648
                 Usable        42    70       126    238    770      42    70       168    336    1162
6 TB drives      Raw storage   252   420      756    1428   4620     252   420      1008   2016   6972
                 Usable        63    105      189    357    1155     63    105      252    504    1743
7TB partition    Raw storage   294   490      882    1666   5390     294   490      1176   2352   8134
(8TB drives)*    Usable        74    123      221    417    1348     74    123      294    588    2034

12 disks per node
8 TB drives*     Raw storage   288   480      864    1632   5280     288   480      1152   2304   7968
                 Usable        72    120      216    408    1320     72    120      288    576    1992

10 disks per node
10 TB drives*    Raw storage   300   500      900    1700   5500     300   500      1200   2400   8300
                 Usable        75    125      225    425    1375     75    125      300    600    2075

Usable capacity assumes 3x HDFS replication and a 25% reserve, uncompressed.

* Notes: The Cloudera recommended worker node maximum capacity is 100 TB. This can be achieved with various drive sizes and drive counts per node. The highest performance occurs with the full count of 14 drives per node.

6.7.4 Storage Tiering with NVMe and SSD Drives

SSD Read/Write Usage Specification

SSD drives with SAS or SATA interfaces, and U.2 NVMe drives with PCIe interfaces, provide high-speed solid-state data storage. They are useful as a hot or warm Hadoop storage tier for increasing workload performance, especially with in-memory analytics engines such as Apache Spark. SSDs and NVMe drives have a write-endurance specification expressed in Drive Writes Per Day (DWPD). Lenovo Mainstream drives have a 3 to 5 DWPD average limit (equivalent to continuous writes over a 5-year period), while Performance drives give over 10 DWPD. Each drive includes wear-leveling algorithms as a standard feature to spread write operations evenly across every storage byte, which allows the drive to achieve its DWPD specification. The drive firmware handles wear leveling under the covers for ease of use by application software.
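As a rough rule of thumb, a DWPD rating can be translated into total data written over the 5-year averaging period mentioned above. The Python sketch below shows that arithmetic; the capacities and ratings are illustrative examples, not vendor endurance specifications.

```python
def total_writes_tb(capacity_tb, dwpd, years=5):
    """Approximate total data written at the rated DWPD over the given period."""
    return capacity_tb * dwpd * 365 * years

# Illustrative: a 1.6TB drive at Mainstream (3 DWPD) vs. Performance (10 DWPD) ratings
for label, dwpd in [("Mainstream, 3 DWPD", 3), ("Performance, 10 DWPD", 10)]:
    print(f"1.6 TB {label}: ~{total_writes_tb(1.6, dwpd):,.0f} TB written over 5 years")
```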

SSD Drive Options

The following table shows example SSD drive options for the SD530 in 2.5" SATA, SAS and NVMe form factors in the Mainstream and Performance categories. Entry drives with < 3 DWPD are also available and the SD530 product guide on lenovopress.com can be referenced for those drives. The latest Lenovo configurator tools should be used for the latest available drives for the SD530, SR650, and D3284 nodes.

Table 8. SD530 SSD Drive Options, 2.5" form factor

12Gb SAS Hot Swap SSD, 2.5"
  Performance (10+ DWPD): ThinkSystem 2.5" HUSMM32 400GB, 800GB, 1.6TB; ThinkSystem 2.5" SED HUSMM32 400GB, 800GB, 1.6TB SSD FIPS
  Mainstream (3-5 DWPD): ThinkSystem 2.5" PM1645 800GB, 1.6TB, 3.2TB; ThinkSystem 2.5" PM1635a 400GB, 800GB, 1.6TB, 3.2TB

6Gb SATA Hot Swap SSD, 2.5"
  Mainstream (3-5 DWPD): ThinkSystem 2.5" Intel S4610 240GB, 480GB, 960GB, 1.92TB; ThinkSystem 2.5" Intel S4600 240GB, 480GB, 960GB, 1.92TB; ThinkSystem 2.5" 5100 240GB, 480GB, 960GB, 1.92TB, 3.84TB

U.2 NVMe SSD
  Performance (10+ DWPD): ThinkSystem U.2 Intel P4800X 375GB, 750GB; ThinkSystem U.2 PX04PMB 800GB
  Mainstream (3-5 DWPD): ThinkSystem U.2 PX04PMB 960GB; ThinkSystem U.2 Intel P4600 1.6TB, 3.2TB

6.7.5 D3284 Storage Tiering

The D3284 SAS enclosure can mix rotating HDDs and SSD drives, allowing hot/warm/cold tiers in a single storage enclosure. Figure 28 shows HDDs assigned to a zone, which is connected to a specific host via the host port listed.

Figure 28. D3284 HDD assignments per Zone

Figure 29 shows HDD assignments for each row in the two drawers of the D3284 SAS enclosure. SSDs generally fill a full row and are not mixed with HDDs (except for row 1). To facilitate mixing SSDs and HDDs within a single zone (and its attached host), each row contains HDDs assigned to different zones; this creates the warm/hot/cold tiering supported on a Cloudera worker node.

Figure 29. D3284 HDD assignments per drawer and row

Drive population rules exist for the sequence of adding HDDs/SSDs to the enclosure; the complete rules are outlined in the D1212/D1224/D3284 Hardware Installation and Maintenance Guide located on the Lenovo product support website. A link is provided in the References section.

The mix of HDDs and SSDs easily fits big data workloads, with a variety of capacities and performance levels available. Currently available SSDs are listed in Table 9.

Table 9. D3284 SSD drive options

12Gb Dual-port SAS Hot Swap SSD, 2.5" in 3.5" Hybrid Tray
  10 DWPD: 400GB 10DWD SAS SSD (2.5" in 3.5" Hybrid Tray)
  3 DWPD: 400GB 3DWD SAS SSD (2.5" in 3.5" Hybrid Tray); 400GB 3DWD SAS SSD (2.5" in 3.5" Hybrid Tray PM1635a)
  1 DWPD: 3.84TB 1DWD SAS SSD (2.5" in 3.5" Hybrid Tray); 3.84TB 1DWD SAS SSD (2.5" in 3.5" Hybrid Tray PM1633a); 7.68TB 1DWD SAS SSD (2.5" in 3.5" Hybrid Tray); 15.36TB 1DWD SAS SSD (2.5" in 3.5" Hybrid Tray)

6.7.6 SD530 and D3284 Configuration Options

PCIe Slot Usage

Two PCIe low profile slots per SD530 node are available for SAS HBA, Networking, and SSD adapter cards. The base D3284 configuration will use one HBA adapter leaving a 2nd PCIe slot available for a second redundant SAS link, or for a high performance 25Gb or 100Gb NIC adapter with associated network switches.

PCIe slots may also be used for additional hot and warm SSD storage in the PCIe card form factor, which is available for the SD530 and SR650.

Table 10. PCIe Adapter Options

Networking: Wide selection of network adapters (Ethernet, Fibre Channel, and InfiniBand) from Emulex, Mellanox, Broadcom, Intel and QLogic; reference Table 14 for the complete Ethernet NIC listing
Storage: Intel NVMe PCIe SSD storage adapters, which add more storage via the PCIe expansion bus
Storage: External SAS HBA adapters for external SAS storage enclosures
GPU: GPU adapters from Nvidia and AMD (requires the GPU node tray, which uses two SD530 node bays in the 4-bay SD530 chassis)

Local 2.5" HDD Bays

Each SD530 node contains six 2.5" HDD bays, which present an opportunity to further increase storage capacity by adding high-speed solid state drives (SSDs) or NVMe drives as a local high-performance cache for Hadoop hot or warm storage tiers. In-memory workloads that use Spark benefit from local high-speed SSDs because these workloads spill intermediate results to disk in certain cases. SSD drives can therefore be a critical part of the solution for gaining maximum performance from in-memory workloads.

Reference section 6.7.4 Storage Tiering with NVMe and SSD Drives for additional SSD drive usage details.

7 Deployment considerations

This section describes other considerations for deploying the Cloudera solution.

Increasing Cluster Performance

This reference architecture and its predefined configurations provide balanced cluster performance and a starting point for customizing to specific workload types. Various hardware components can be enhanced as needed, such as the processor, system memory, storage components, and network configuration. Reference the sections below for methods of creating a unique cluster configuration relative to:

• CPU selection/core counts

• Memory selection and the 384GB/768GB performance advantage for higher bandwidth

• Storage performance with maximum HDDs

• SSD drives for high speed storage and hot/warm/cold storage tiering

• Network performance

Processor Selection

The minimum Hadoop recommendation is 1 CPU processor core per data disk, plus additional cores dedicated to specific Cloudera software services and data analytics functions. The worker node Intel Gold processors in this reference architecture provide a 2-cores-per-HDD ratio, which gives the maximum HDD throughput plus a full set of cores for additional data analytics.
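A quick check of that ratio with the worker node configuration in Table 1 (2x 16-core Xeon Gold 6130 and 14 data HDDs):

```python
# Core-to-disk ratio for the Table 1 worker node configuration
sockets, cores_per_socket, data_hdds = 2, 16, 14
total_cores = sockets * cores_per_socket

print(f"{total_cores} cores / {data_hdds} HDDs = "
      f"{total_cores / data_hdds:.1f} cores per data disk")
# ~2.3 cores per data disk, above the 1-core-per-disk minimum,
# leaving headroom for Cloudera services and analytics functions
```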

Cloudera workload types may be skewed toward I/O-bound workloads that create heavy network traffic or CPU-bound workloads that stress the CPU cores themselves. Intel processors in the Platinum class provide higher core counts to meet the most demanding CPU-bound workloads.

Below are several examples of IO-bound workloads:

• Sorting
• Indexing
• Grouping
• Data importing and exporting
• Data movement and transformation

Below are several examples of CPU-bound workloads:

• Clustering/Classification
• Complex text mining
• Natural-language processing
• Feature extraction

Moving up through the Intel processor Bronze, Gold and Platinum categories adds higher total core counts, higher operating frequency (in megahertz, MHz), and increased internal memory cache sizes. Processor cost increases incrementally as each of these processor specifications increases. Cloudera provides processor selection guidance for sizing based on the specific types of Cloudera cluster services required. Reference the link below for details:

https://www.cloudera.com/documentation/enterprise/release-notes/topics/hardware_requirements_guide.html

7.2.1 SR630/SR650 Processors

Intel Xeon Scalable processors are available for the SR630 and SR650 servers in a wide range from the Bronze to the Gold to the Platinum class, including the M processors which support 1.5TB of memory per CPU. The highest memory capacity and performance is achieved using M processors and 128GB memory DIMMs in each of the SR650's 24 DIMM slots, giving 3.0 TB of total system memory per SR650 server.

7.2.2 SD530 Processors

Intel Xeon Scalable processors in a wide range from the Bronze to the Gold to the Platinum class are available in the SD530 server. Essentially all big data workloads will obtain the best performance using 12 DIMM sockets and any of the SD530 supported processors. The SD530 supports 16 DIMM slots with a reduced selection of processors (125 watts or less) in order to give the highest memory capacity possible. The 16 DIMM count operates at reduced memory bandwidth, and the Lenovo memory configurator should be referenced for the latest balanced and unbalanced configuration information. When the highest performance node configuration is needed (the highest Platinum processors together with the highest memory capacity and memory bandwidth), the Lenovo SR650 worker node is the appropriate choice. For more details on capacity and performance, reference section 7.4.

Designing for Storage Capacity and Performance

Selection of the HDD form factor, number of drives, and capacity of each drive can skew a worker node towards highest node capacity or highest disk I/O throughput.

7.3.1 Node Capacity

The 3.5" HDD form factor gives the maximum local storage capacity for a node. 10TB and larger HDDs are available and can be used to replace the 4TB HDDs used in this reference architecture to give up to the 100 TB per node recommended as a maximum by Cloudera. The 4TB HDD size provides the best balance of HDD capacity and performance per node. When increasing data disk capacity, some workloads may experience a decrease in disk parallelism, creating a bottleneck at that node which negatively affects performance. To increase total rack capacity while still using the 4TB HDD size recommended in this reference architecture, the number of nodes in the cluster should be increased to maintain good disk I/O performance.

7.3.2 Node Throughput

The 2.5" HDD form factor gives the maximum local storage throughput for a node configuration. In cases where the maximum local storage throughput per node is required, the worker node can be configured with 24x 2.5-inch SAS drives. The 2.5-inch HDD has less total capacity per drive and gives less total capacity per node than the 3.5" form factor, but allows higher parallel access to the drives - more data can be accessed simultaneously. The SR650 configurations using 2.5" and 3.5" HDDs are listed below as examples of maximum node capacity vs. parallel HDD connections for various drive sizes.

HDD Form Factor        HDD size       Max. node storage capacity   Parallel HDD connections
3.5" HDDs, 14x HDDs    10 TB drive    100 TB                       10
                       8 TB drive     96 TB                        12
2.5" HDDs, 24x HDDs    2.4 TB drive   57.6 TB                      24

Solid State Drives (SSDs) are also available in the 2.5" form factor for the SR650 with a higher capacity per drive than spinning HDDs, but at a significantly higher cost per drive.

In the 2.5" HDD configuration of the SR650, it is recommended to use 3 host bus adapters for maximum parallel throughput vs. a single host bus adapter.

7.3.3 HDD Controller

For the type of HDD controller, a host bus adapter driving just-a-bunch-of-disks (JBOD) is the best choice for a worker node in the Cloudera cluster. It provides excellent performance and, when combined with the Hadoop default of 3x data replication, also provides significant protection against data loss. The use of RAID with data disks is discouraged because it reduces performance and the amount of data that can be stored. The Hadoop file system, HDFS, provides data redundancy across the Cloudera cluster via the 3 replicas of each data block, which makes RAID unnecessary.
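As an illustration of how the JBOD data disks are presented to HDFS, each drive is mounted individually and listed in the DataNode data directory property. The sketch below generates such a list; the /data/diskNN mount points are an assumed naming convention rather than a value mandated by this reference architecture, and in practice the property is set through Cloudera Manager rather than by editing hdfs-site.xml directly.

```python
# Build the dfs.datanode.data.dir value for 14 individually mounted JBOD disks.
# Mount point names are assumed for illustration (e.g. /data/disk01 .. /data/disk14).
mounts = [f"/data/disk{i:02d}/dfs/dn" for i in range(1, 15)]
dfs_datanode_data_dir = ",".join(mounts)

print("dfs.datanode.data.dir =")
print(dfs_datanode_data_dir)
```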

As a secondary choice, RAID0 is supported with a single HDD per RAID array for better fault tolerance.

RAID1 and RAID10 are used for certain disks in a Cloudera Master node; therefore, a RAID HDD controller is specified in this configuration.

Memory Size and Performance

Low node memory capacity can negatively impact cluster performance by causing workload thrashing and spilling to slower storage devices. Also, in-memory workloads such as Apache Spark benefit from larger memory capacity. For this reason, Spark workloads are recommended to use higher memory capacities than Hadoop MapReduce.

In addition to capacity, the number of populated memory DIMMs can negatively impact performance, due to the Intel memory interleaving techniques used to maximize memory controller performance.

This reference architecture specifies node memory sizes appropriate to Lenovo server types and with moderate Cloudera workloads, but many memory choices are available. Table 11 shows memory capacity recommendations based on cluster node type.

Table 11. Node memory capacity recommendations

Memory Capacity | Node Type
192 GB | Master and Utility nodes
288 - 384 GB | Worker node minimum
578 - 3,000 GB | Worker node for in-memory Spark and high-performance workloads

Table 12 provides specific SR650 and SD530 memory configurations to maximize memory and overall workload performance. While all memory capacities in the chart are valid and considered balanced (or near-balanced) sizes, the green color coding shows changes in memory bandwidth relative to the highest performance possible using the maximum of 24 DIMMs available with the current generation of Intel 2-socket architecture. If, during the cluster planning phase, a determined node memory capacity is close to the 'best' dark green row, use that higher capacity and gain a bandwidth performance advantage as well. Certain populated DIMM quantities have memory interleaving advantages, giving faster access times than other DIMM quantities.

If one is repurposing an existing cluster and would like to reuse memory, the lighter green, 'better' memory config will show capacities that may fit with memory DIMMs on-hand. The resulting performance relative to the best possible is lower and is shown in the chart.

The relative performance column is aligned with the whitepaper Intel Xeon Scalable Family Balanced Memory Configurations; a link is provided in the References section. The column is relative to the maximum memory bandwidth possible. The Lenovo memory configurator should be used to verify the latest recommended memory capacities for balanced or near-balanced configurations. Other DIMM configurations will show up in the configurator as unbalanced and should be avoided. See this link: http://lesc.lenovo.com/ss/#/memory_configuration

Table 12. SR650 - Recommended memory configurations for 2-socket worker nodes

Capacity | DIMM Description | Relative Performance | Quantity
128 GB | 16GB TruDDR4 Memory (1Rx4, 1.2V) 2666MHz RDIMM | 67% | 8
192 GB | 16GB TruDDR4 Memory (1Rx4, 1.2V) 2666MHz RDIMM | 97% | 12
256 GB | 32GB TruDDR4 Memory (2Rx4, 1.2V) 2666MHz RDIMM | 67% | 8
288 GB | 6x 8GB plus 6x 16GB TruDDR4 Memory (1Rx4, 1.2V) 2666MHz RDIMM | 94% | 12
384 GB | 32GB TruDDR4 Memory (2Rx4, 1.2V) 2666MHz RDIMM | 97% | 12
512 GB | 32GB TruDDR4 Memory (2Rx4, 1.2V) 2666MHz RDIMM | 68% | 16
578 GB | 6x 16GB (2Rx8) plus 6x 32GB TruDDR4 Memory (2Rx4, 1.2V) 2666MHz RDIMM | 94% | 12
768 GB | 64GB TruDDR4 Memory (4Rx4, 1.2V) 2666MHz LRDIMM | 97% | 12
768 GB | 32GB TruDDR4 Memory (2Rx4, 1.2V) 2666MHz RDIMM | 100% | 24
1,536 GB | 64GB TruDDR4 Memory (4Rx4, 1.2V) 2666MHz LRDIMM | 100% | 24
2,048 GB | 128GB TruDDR4 Memory (8Rx4, 1.2V) 2666MHz 3DS RDIMM | 68% | 16
3,072 GB * | 128GB TruDDR4 Memory (8Rx4, 1.2V) 2666MHz 3DS RDIMM | 100% | 24

DIMM counts to avoid: 2, 6, 10, 14, 18, 20, 22
* Requires CPU part numbers that support 1.5TB of memory per CPU.

Color coding: Best (dark green), Better (light green), Avoid.

Notes:
1. DIMM quantity is of the same part number (speed, size, rank, etc.), unless noted.
2. Physical location of memory DIMMs in the numbered DIMM slots is important - follow the install guide for each particular Lenovo server for the correct location of each DIMM.

Table 13 shows the SD530 dense compute node memory capacity and quantity recommendations. The SD530 has a total of 16 DIMM slots (2-1-1 slot wiring) while the SR650 has a total of 24 DIMM slots (2-2-2 slot wiring). To achieve the balanced memory configuration for best DIMM interleaving, not all memory capacities are available with the SD530 compared with SR650.

Table 13. SD530 - Recommended memory configurations for 2-socket worker nodes

Capacity | DIMM Description | Relative Performance | Quantity
128 GB | 16GB TruDDR4 Memory (1Rx4, 1.2V) 2666MHz RDIMM | 67% | 8
192 GB | 16GB TruDDR4 Memory (1Rx4, 1.2V) 2666MHz RDIMM | 97% | 12
256 GB | 32GB TruDDR4 Memory (2Rx4, 1.2V) 2666MHz RDIMM | 67% | 8
384 GB | 32GB TruDDR4 Memory (2Rx4, 1.2V) 2666MHz RDIMM | 97% | 12
768 GB | 64GB TruDDR4 Memory (4Rx4, 1.2V) 2666MHz LRDIMM | 97% | 12
1,536 GB | 128GB TruDDR4 Memory (8Rx4, 1.2V) 2666MHz 3DS RDIMM | 97% | 12

DIMM counts to avoid: 2, 6, 10, 14, 16

Color coding: Best (dark green), Better (light green), Avoid.

Notes:
1. DIMM quantity is of the same part number (speed, size, rank, etc.), unless noted.
2. Physical location of memory DIMMs in the numbered DIMM slots is very important - follow the install guide for each particular Lenovo server for the correct location of each DIMM.

Data Network Considerations

The data network speed for a Cloudera cluster should be 10Gb, 25Gb, or 100Gb, and Lenovo provides a full set of components for these speeds including copper and fiber cabling choices. The cluster data network in this reference architecture uses a 10Gb network with bonded server NIC interfaces and redundant network switches to provide 20Gb of maximum network connectivity between nodes in the cluster. The ThinkSystem 4-port 10Gb LAN on Motherboard (LOM) adapter is used in this reference architecture, while other branded 10Gb NIC adapters are available. Lenovo 25Gb and 100Gb network components are connected in a similar manner. Reference section 6.6.4 for more details on the 25Gb network components.

The currently available NIC adapters are shown in Table 14.

Table 14. Ethernet Network adapters for cluster nodes

Code | Description
AT7S | Emulex VFA5.2 2x10 GbE SFP+ PCIe Adapter
AT7T | Emulex VFA5.2 2x10 GbE SFP+ PCIe Adapter and FCoE/iSCSI SW
ATPX | Intel X550-T2 Dual Port 10GBase-T Adapter
ATRN | Mellanox ConnectX-4 1x40GbE QSFP+ Adapter
AUAJ | Mellanox ConnectX-4 2x25GbE SFP28 Adapter
AUKN | ThinkSystem Emulex OCe1410B-NX PCIe 10Gb 4-port SFP+ Ethernet Adapter
AUKP | ThinkSystem Broadcom NX-E PCIe 10Gb 2-Port Base-T Ethernet Adapter
AUKS | ThinkSystem Broadcom NX-E PCIe 25GbE 1-Port SFP28 Ethernet Adapter
AUKX | ThinkSystem Intel X710-DA2 PCIe 10Gb 2-Port SFP+ Ethernet Adapter
B0WY | ThinkSystem Intel XXV710-DA2 PCIe 25Gb 2-Port SFP28 Ethernet Adapter
B21R | ThinkSystem QLogic QL41262 10/25GbE SFP28 2-Port PCIe Ethernet Adapter
B31C | ThinkSystem Mellanox ConnectX-5 Ex 25/40GbE 2-port Low-Latency Adapter
B31G | ThinkSystem QLogic QL41134 PCIe 10Gb 4-Port Base-T Ethernet Adapter
AUZV | ThinkSystem Broadcom 5719 1Gb 4-port RJ45 Ethernet Adapter
AUZW | ThinkSystem I350-T4 PCIe 1Gb 4-Port RJ45 Ethernet Adapter
AUZX | ThinkSystem Broadcom 5720 1GbE RJ45 2-Port PCIe Ethernet Adapter
AUZY | ThinkSystem I350-T2 PCIe 1Gb 2-Port RJ45 Ethernet Adapter

Designing with Hadoop Virtualized Extensions (HVE)

Hadoop supports rack awareness of replicated data blocks via the concept of a Node Group, restricting the location of the same replicated data blocks within the node group. This helps distribute the 3 copies of data blocks across different failure points, such as an individual rack or a group of servers. When designing for a virtualized environment with VMware, a node group tells Hadoop which VMs are located on the same physical node, so as to avoid locating multiple replica copies on the same physical node. Also, when designing with the external SAS enclosure and SD530 server nodes, HVE helps distribute the replica copies across multiple SAS storage enclosures rather than risking all 3 copies ending up on a single storage enclosure. In a multi-rack environment, HVE is used to give Hadoop knowledge of which nodes are in which rack, to help avoid multiple replica copies being located within the same rack.

7.6.1 Enabling Hadoop Virtualization Extensions (HVE)

HVE is part of the Apache Project and adds physical node awareness to Hadoop and Spark for a virtualized environment. This enables HDFS to maintain all of the data block replicas across physical nodes as it does in the bare-metal environment. Refer to the Apache Project HVE descriptions and user guide at this link: https://issues.apache.org/jira/browse/HADOOP-8468. Following are considerations for HVE:

1. Enable HVE when there is more than one Hadoop VM per physical node in virtualized environments.
2. Use the Node Group definition to group VMs that reside on the same physical node, to enable HDFS to distribute block replication across physical nodes.
3. HVE extensions can be used to create node groups to further specify locality and awareness for spreading HDFS block replicas across physical servers of a common model type, with a certain level of power supply redundancy, or nodes from certain hardware purchase cycles, for example.

The following diagram illustrates the addition of a new level of abstraction (in red) called Node Groups. The NodeGroups represent the physical hypervisor on which the nodes (VMs) reside.

Rx = Server Rack

NGx = Node Group

Nx = Physical Server Node

Figure 30. HVE Node groups

All VMs under the same node group run on the same physical host. With awareness of the node group layer, HVE refines the following policies for Hadoop on virtualization:

Replica Placement Policy
• No duplicated replicas are on the same node or nodes under the same node group.
• First replica is on the local node or local node group of the writer.
• Second replica is on a remote rack of the first replica.
• Third replica is on the same rack as the second replica.
• The remaining replicas are located randomly across rack and node group for minimum restriction.

Replica Choosing Policy
• The HDFS client obtains a list of replicas for a specific block sorted by distance, from nearest to farthest: local node, local node group, local rack, off rack.

Balancer Policy
• At the node level, the target and source for balancing follow this sequence: local node group, local rack, off rack.
• At the block level, a replica block is not a good candidate for balancing between source and target node if another replica is on the target node or on the same node group of the target node.

HVE typically supports failure and locality topologies defined from the perspective of virtualization. However, you can use the new extensions to support other failure and locality changes, such as those relating to power supplies, arbitrary sets of physical servers, or collections of servers from the same hardware purchase cycle.
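Hadoop learns this topology from the script configured in net.topology.script.file.name, which is called with host names or IP addresses and must print one network path per argument. With HVE enabled, the path carries the extra node-group layer. Below is a minimal Python sketch of such a script; the host-to-rack/node-group mapping is hypothetical and would normally be generated from the actual VM placement.

```python
#!/usr/bin/env python3
"""Minimal sketch of a topology script for an HVE-enabled cluster.

Hadoop passes host names/IPs as arguments and reads one path per host
from stdout; with HVE the path includes a node-group layer (/rack/nodegroup).
The mapping below is hypothetical.
"""
import sys

TOPOLOGY = {
    "vm-worker-01": "/rack1/ng-host1",
    "vm-worker-02": "/rack1/ng-host1",   # same physical host as vm-worker-01
    "vm-worker-03": "/rack1/ng-host2",
    "vm-worker-04": "/rack2/ng-host3",
}
DEFAULT = "/default-rack/default-nodegroup"

def main():
    for host in sys.argv[1:]:
        print(TOPOLOGY.get(host, DEFAULT))

if __name__ == "__main__":
    main()
```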

Cloudera VMware Virtualized Configuration

Guidelines for installing the Cloudera stack on a virtualized platform are nearly identical to those for bare-metal, except for enabling the Hadoop Virtualized Extensions (HVE) functionality. HVE is configured in Cloudera Manager for HDFS and YARN via safety valve properties. Cloudera installation documentation is used to specify the node group topology for each specific cluster design. In this reference architecture, the following software and hardware configuration was used.

7.7.1 Cluster Software Stack

Table 15. Virtualized Cloudera Software Stack

Component | Version
vSphere (ESXi + vCenter Server) | 6.5.0
Guest Operating System | Red Hat Enterprise Linux 7.3
Cloudera Hadoop Distribution | CDH 5.12
Cloudera Manager | CDH 5.12
Java | Oracle 1.8.0

7.7.2 ESXi Hypervisor and Guest OS Configuration

Below are the key configuration parameters used in this reference architecture. Many valid configurations exist, and additional information can be obtained from the Cloudera and VMware whitepapers shown in the References section on page 73:

ESXi

• 4 VMs per physical node
• 8 physical nodes x 4 VMs per node = 32 total VMs

Memory

• CPU-to-memory locality: 2 VMs per CPU
• 6% of node physical memory allocated for ESXi; remainder allocated to VMs
• Anonymous paging: vm.swappiness=0

Disks

• DataNode: 3 disks each for the first 2 VMs; 4 disks each for the 3rd and 4th VMs (14 disks total per physical node)
• VMware PVSCSI storage adapter used (all 4 virtual SCSI controllers used) for best I/O performance
• Queue depth in guest OS SCSI driver: 4294967295 (default value)
• Eager-zeroed thick VMDKs (on EXT4 filesystem in guest OS)

Network

• VMXNET3 network driver used with MTU=9000 for jumbo frames on the guest OS and virtual switch
• TCP segmentation offload (TSO) enabled at the ESXi level (should be enabled by default); only VMXNET3 drivers at the guest layer can leverage this

Estimating Disk Space

When you are estimating disk space within a Cloudera Enterprise cluster, consider the following:

For improved fault tolerance and performance, Cloudera Enterprise replicates data blocks across multiple cluster worker nodes. By default, the file system maintains three replicas.

Compression ratio is an important consideration in estimating disk space and can vary greatly based on file contents. If the customer’s data compression ratio is unavailable, assume a compression ratio of 2.5:1.

To ensure efficient file system operation and to allow time to add more storage capacity to the cluster if necessary, reserve 25% of the total capacity of the cluster.

Assuming the default three replicas maintained by Cloudera Enterprise, the raw data disk space and the required number of nodes can be estimated by using the following equations:

Total raw data disk space = (User data, uncompressed) * (4 / compression ratio)

Total required worker nodes = (Total raw data disk space) / (Raw data disk per node)

You should also consider future growth requirements when estimating disk space.

Based on these sizing principles, Table 16 shows an example for a cluster that must store 500 TB of uncompressed user data. The example shows that the Cloudera cluster needs 800 TB of raw disk space to support 500 TB of uncompressed data. The 800 TB is for data storage and does not include operating system disk space. A total of 15 nodes is required to support a deployment of this size.

Total raw data disk space = 500TB * (4 / 2.5) = 500 * 1.6 = 800TB

Total required worker nodes = 800TB / (4TB * 14 drives) = 800TB / 56TB = 14.2 => 15 nodes
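
The same arithmetic can be scripted; the sketch below simply implements the two formulas above with the assumptions used in this example (2.5:1 compression, a multiplication factor of 4, and 14 x 4TB data drives per worker node).

    # Minimal sizing sketch based on the formulas above; adjust the four inputs as needed.
    user_data_tb=500; compression=2.5; drive_tb=4; drives_per_node=14
    awk -v u="$user_data_tb" -v c="$compression" -v d="$drive_tb" -v n="$drives_per_node" 'BEGIN {
        raw = u * 4 / c                       # total raw data disk space (TB)
        per_node = d * n                      # raw data disk per worker node (TB)
        nodes = int(raw / per_node)
        if (nodes < raw / per_node) nodes++   # round up to whole nodes
        printf "Total raw data disk space: %.0f TB\n", raw
        printf "Total required worker nodes: %d\n", nodes
    }'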

Table 16. Example of storage sizing with 4TB drives

Description Value
Data storage size required (uncompressed) 500 TB
Compression ratio 2.5:1
Size of compressed data 200 TB
Storage multiplication factor 4
Raw data disk space needed for Cloudera cluster 800 TB
Storage needed for Cloudera Hadoop 3x replication 600 TB
Reserved storage for headroom (25% of 800TB) 200 TB
Raw data disk per node (with 4TB drives * 14 drives) 56 TB
Minimum number of nodes required (800/56) 15

Scaling Considerations

The Hadoop architecture is linearly scalable by adding individual nodes as needed, but it is important to plan ahead for sufficient floor space, open space in racks, and network switch capacity to support the new nodes.

Typically, identically configured worker nodes are best to maintain the same ratio of storage and compute capabilities. A Cloudera cluster is scaled by adding SR650 or SD530 Worker nodes, Master nodes, and network switches as needed. As the capacity of a rack is reached, new racks can be added to the cluster.

Two key aspects to consider for scale-out expansion are networking and management. Both are critical to cluster operation, and changes become more complex as the cluster infrastructure grows, so they should be sized up front to accommodate future node scale-out.

The cross-rack networking configuration shown in Figure 23 provides robust network interconnection of multiple racks within the cluster. As more racks are added, the predefined networking topology remains balanced and symmetrical. If there are plans to scale the cluster beyond two racks, a best practice is to design the cluster initially for the three-rack (multi-rack) design to enforce the proper network topology and prevent future re-configuration and hardware changes.

Also, as the number of nodes within the cluster increases, so do many of the tasks of managing the Cloudera cluster. Table 5 shows the number of master nodes required for a given worker node count; the next higher number of master nodes can be installed early to prepare for scaling out the worker nodes. Building a cluster management framework as part of the initial design and proactively considering the challenges of managing a large cluster pays off significantly in the long run.

7.9.1 Scaling D3284 External SAS JBOD Storage

With the decoupled solution using SD530 dense compute nodes and D3284 external SAS JBOD enclosures, node scale-out is linear, as is normal for Hadoop clusters. Individual SD530 worker nodes can be added to the 2U SD530 chassis up to 4 nodes; a new 2U chassis is then added for the next SD530 node, and up to 4 nodes can be added to that chassis.

Up to 6 worker nodes are cabled to a D3284 SAS expansion enclosure. A new D3284 SAS enclosure is added to attach the next 1 to 6 worker nodes.

In this way, the SD530 nodes and D3284 SAS enclosures scale linearly within a typical Cloudera cluster.

7.9.2 Scaling D3284 Storage and SD530 Compute Independently

With the decoupled SD530 dense compute node and D3284 SAS expansion enclosure solution, compute nodes can be upgraded by removing them and replacing them with newer technology without impacting the storage enclosures. The 12Gb SAS cabling links are industry standard and provide a robust connection to future compute nodes, which may have faster processors and higher memory capacities, as they become available.

D3284 SAS enclosures can also be configured with fewer attached compute nodes, thereby trading compute rack space for more storage capacity per node. In this reference architecture, 6 nodes attached to a single SAS expansion enclosure is appropriate for a Cloudera cluster, assigning 14 of the 84 disks to each node and staying within the Cloudera 100TB-per-node guideline with a 10Gb data network. An alternative D3284 configuration with 4 attached nodes and 21 disks assigned to each worker node is also possible; this adds more storage per rack for a storage-optimized cluster. With this higher per-node storage capacity, a faster data network (25Gb or higher) should be considered to provide bandwidth for fault conditions, such as a failed node and the associated re-replication of the blocks on the disks attached to that node.

High Availability Considerations

When a Cloudera cluster on Lenovo servers is implemented, consider availability requirements as part of the final hardware and software configuration. Hadoop is typically expected to be a highly reliable solution, which means redundant hardware is specified to minimize single points of failure, down time, and the possibility of data loss. Hadoop, Cloudera, and Lenovo best practices provide significant protection against data loss. High availability designed into this reference architecture includes redundant Master and Worker nodes with their associated Cloudera services, redundant node power supplies and rack PDUs, and redundant network connections, among other items.

High availability of the rack and node redundant power supplies is critical to an operating enterprise data center. The Lenovo configurator tools assist in selecting options that allow redundant power configurations; see the Resources section for links to the Lenovo configurator tools.

7.10.1 Network Availability

To support HA in the data network, redundant switches should be specified for each tier of switches in the Cloudera cluster. Section 6.6 describes the data network topology with the high-speed rack data switches and the redundant 100Gb cross-rack switch pairs. These switch pairs should be configured for Virtual Link Aggregation Groups (vLAG) on Lenovo switches (with LACP), which keeps the pair coherent so that traffic continues to flow if a single switch drops out.

The two server Ethernet ports must likewise be configured for NIC bonding (NIC teaming) with LACP (mode=4, or mode=802.3ad). This way, a single NIC, network cable, or switch can fail and the data network connection continues over the remaining half of the connection. The bonded NIC interface also operates at twice the single-link speed, or 20Gb/s in this configuration.
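
On RHEL 7, one common way to build such a bond is with nmcli; the sketch below is illustrative only, and the interface names and IP address are placeholders that must be replaced with the values for the actual cluster.

    # Create an LACP (802.3ad) bond matching the switch-side vLAG/LACP configuration
    nmcli con add type bond con-name bond0 ifname bond0 \
        bond.options "mode=802.3ad,miimon=100,xmit_hash_policy=layer3+4"
    # Enslave the two 10Gb data ports (placeholder interface names)
    nmcli con add type bond-slave con-name bond0-p1 ifname ens1f0 master bond0
    nmcli con add type bond-slave con-name bond0-p2 ifname ens1f1 master bond0
    # Jumbo frames and a static address (placeholder) on the bonded interface
    nmcli con mod bond0 802-3-ethernet.mtu 9000 ipv4.method manual ipv4.addresses 192.0.2.11/24
    nmcli con up bond0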

A second, redundant 1Gb management network switch can be added to ensure HA of the hardware management network. The hardware management network does not affect the operation of the Cloudera Hadoop file system, but remote access to the cluster for maintenance purposes could be impacted by a single switch outage.

7.10.2 Cluster Node Availability

Redundancy within each individual worker node is not necessary with Hadoop. The HDFS default 3x replication provides built-in redundancy and makes loss of data unlikely even with a single or double node failure. If Hadoop best practices are used, an outage from the loss of a worker node is extremely unlikely because the workload is dynamically re-allocated: the loss of a worker node will not cause a job to fail, as its work is automatically re-assigned to another data node.

Multiple Master nodes are recommended so that, if there is a failure, the function can be moved to an operational Master node. Having multiple Master nodes does not by itself resolve the issue of the NameNode being a single point of failure. For more information, see section 7.10.4, "Software Availability."

Within racks, switches and nodes must have redundant power feeds with each power feed connected from a separate PDU.

7.10.3 Storage Availability

HDFS 3x replication provides more than sufficient protection. Higher levels of replication can be considered if needed.

Cloudera also provides manual or scheduled snapshots of volumes to protect against human error and programming defects. Snapshots are useful for rollback to a known data set.

For the D3284 external SAS enclosure (JBOD) solution, this architecture demonstrates highly available SAS connections between the SD530 worker nodes and the D3284 storage enclosure. The dual redundant SAS link includes two HBA PCIe adapters in each worker node, two SAS cable connections, and two ESM SAS controllers in each D3284. This dual SAS connection is not strictly required, given HDFS 3x replication and the fault tolerance architected into Hadoop, but it provides higher availability for customer enterprises that require it.

7.10.4 Software Availability

Operating system availability is provided by using RAID1 mirrored drives for redundant storage.

NameNode HA is recommended and can be achieved by using three master nodes. The active and standby NameNodes communicate with a group of separate daemons called JournalNodes to keep their state synchronized. When any namespace modification is performed by the active NameNode, it durably logs a record of the modification to a majority of the JournalNodes. The standby NameNode reads the edits from the JournalNodes and constantly watches them for changes to the edit log; as it sees the edits, it applies them to its own namespace.
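
Once HA is enabled through Cloudera Manager, the NameNode roles can be checked from the command line; the NameNode IDs below are placeholders, as Cloudera Manager assigns the actual IDs for the cluster.

    # Confirm one active and one standby NameNode (IDs are placeholders)
    hdfs haadmin -getServiceState namenode1
    hdfs haadmin -getServiceState namenode2
    # Confirm DataNodes are reporting in
    hdfs dfsadmin -report | head -20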

An external database is required for Cloudera Manager, the Hive metastore, and other services, and an HA configuration of this external database is recommended to avoid a single point of failure. The embedded database should be used only for test or POC environments.

Linux OS Configuration Guidelines

Cloudera recommends certain Linux configurations to obtain the best stability and performance from the cluster. See the Cloudera installation link in the Resources section on page 73 for complete details; the following is a summary of the key installation settings used with RHEL 7.

7.11.1 OS Configuration for Cloudera CDH

Linux services installed:

o DNS with forward and reverse FQDN hostname resolution

o NTP for time synchronization across cluster nodes

o nscd for name service caching

o multipathd for D3284 SAS enclosure redundant connections and High Availability

Temporary configuration changes for Installation:

o SELinux should be disabled at least during Cloudera installation and can be re-enabled after the cluster is running with all start-up issues resolved.

o The firewall (firewalld) should be disabled during the install process.

Uninstall unused OS services that may have been installed by default; an example list:

o cups

o iptables

o ip6tables

o postfix - not used by Cloudera, but may be installed as required by other software

o Bluetooth

Configuration settings for best performance (example commands are shown after this list):

o Disable Transparent Huge Pages (THP)

o Disable tuned service (tuned)

o Configure data network NIC bonding interface for MTU = 9000

o Configure data network NIC bonding interface for mode=4 (or mode=802.3ad) to match the 10Gb data switch configuration for vLAG (LACP)

o Set the vm.swappiness Linux kernel parameter
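
The following is a minimal sketch of these settings on RHEL 7, run as root. The swappiness value of 1 and the use of permissive SELinux during installation are assumptions for this sketch; use the values recommended by the Cloudera documentation, and make the THP settings persistent across reboots (for example, via rc.local or a tuning profile) as appropriate.

    # Firewall and SELinux relaxed for the installation window
    systemctl stop firewalld && systemctl disable firewalld
    setenforce 0
    sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config
    # Disable the tuned service
    systemctl stop tuned && systemctl disable tuned
    # Disable Transparent Huge Pages for the running system
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
    echo never > /sys/kernel/mm/transparent_hugepage/defrag
    # Set and persist vm.swappiness (example value)
    echo 'vm.swappiness=1' > /etc/sysctl.d/99-hadoop.conf
    sysctl -p /etc/sysctl.d/99-hadoop.conf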

7.11.2 OS Configuration for SAS Multipath

This reference architecture includes a solution with fully redundant signal paths between the SD530 worker nodes and the D3284 SAS storage enclosure. The Linux DM-Multipath service manages link failover between the multipath SAS links, while multiple SAS controllers and SAS cabling provide the redundant hardware.

DM-Multipath

Device Mapper Multipath (DM-Multipath) is a native Linux service included in Red Hat Enterprise Linux. It aggregates the multiple physical SAS links cabled to a single D3284 zone into a single /dev/mapper/mpathn device for each HDD. The DM-Multipath service manages failover of the links as needed to maintain continuous storage availability if one of the physical links drops out.

During cluster installation, the device-mapper-multipath package is installed on each host attached to a D3284 SAS enclosure, the /etc/multipath.conf file is configured, and the multipathd service is started. Additional cabling is attached to each D3284 to create the second SAS link. The multipathd service automatically recognizes the additional paths, creates the /dev/mapper/mpathn devices, and makes the HDDs available to the operating system.
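
A typical RHEL 7 sequence for these steps is sketched below; verify the exact options against the Red Hat documentation for the release in use.

    # Install and enable DM-Multipath on each host attached to a D3284
    yum install -y device-mapper-multipath
    mpathconf --enable --user_friendly_names y --find_multipaths y   # creates /etc/multipath.conf
    systemctl enable multipathd && systemctl start multipathd
    # List the aggregated /dev/mapper/mpathn devices and their paths
    multipath -ll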

Multipath.conf

Below is a description of the /etc/multipath.conf contents used in this reference architecture.

Table 17. Multipath.conf parameters

Parameter Value
user_friendly_names yes
find_multipaths yes
path_selector "round-robin 0"
path_checker tur
failback immediate
no_path_retry fail

Parameter Descriptions

user_friendly_names - specifies more easily readable device labels, such as /dev/mapper/mpathn, rather than the default long WWID number of each HDD. When mounting HDDs at boot time in /etc/fstab, the UUID of each HDD should be referenced rather than the /dev label, which is not guaranteed to remain unchanged after a reboot. The /dev/mapper labels are more convenient for Linux administration and troubleshooting.

find_multipaths yes - enables the multipathd daemon to automatically detect multiple paths to storage devices, aggregate the links, and assign them to a particular storage device.

path_selector "round-robin 0" - specifies the algorithm used to determine the path for the next I/O operation. The "round-robin 0" value loops through every path in the path group, sending the same amount of I/O to each.

path_checker tur - specifies the method used to determine the state of the paths. tur issues a Test Unit Ready command to the device to determine the state of the path.

failback immediate - specifies that when a link has recovered from a failure and is usable again, the service immediately fails back to the highest-priority path group.

no_path_retry fail - specifies that once a path fails, it is removed from the aggregation. This setting prevents I/O queuing if all paths in the aggregation have failed, and helps prevent software that is using the device from hanging indefinitely.
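
An /etc/multipath.conf built from the values in Table 17 looks like the following; this defaults section is the complete file used in this sketch, and multipathd should be restarted after the file is changed.

    # /etc/multipath.conf
    defaults {
        user_friendly_names yes
        find_multipaths     yes
        path_selector       "round-robin 0"
        path_checker        tur
        failback            immediate
        no_path_retry       fail
    }

    # Apply the configuration:
    systemctl restart multipathd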

Additional conf file entries can override the multipathd default settings; these are described in Linux on-line resources and in the Red Hat subscription documentation.

D3284 SAS Expansion Enclosure Tools

The JBOD Configuration Tool is used for performing firmware updates and retrieving in-band log files. Logs can be obtained from the dedicated management port over the 1Gb Ethernet network, but more complete log dumps for troubleshooting are available with the JBOD Configuration Tool, which can be downloaded, along with other software tools, from the D3284 product support site:

https://datacentersupport.lenovo.com/us/en/products/storage/lenovo-storage/d3284/6413?linkTrack=Caps:Body_BrowseProduct&searchType=0&keyWordSearch=

Designing for High Ingest Rates

Designing for high ingest rates requires some care. It is important to have a full characterization of the ingest patterns and volumes. The following questions provide guidance on the key factors that affect the rates:

● On what days and at what times are the source systems available or not available for ingest?
● When a source system is available for ingest, what is the duration for which the system remains available?
● Do other factors affect the day, time and duration ingest constraints?
● When ingests occur, what is the average and maximum size of ingest that must be completed?
● What factors affect ingest size?
● What is the format of the source data (structured, semi-structured, or unstructured)? Are there any data transformation or cleansing requirements that must be achieved during ingest?

To increase the data ingest rates, consider the following points:

● Ingest data with a MapReduce job, which helps distribute the I/O load to different nodes across the cluster.
● Ingest when the cluster load is not high, if possible.
● Compress data where appropriate; in many cases this reduces the I/O load on disk and network.
● Filter and reduce data at an early stage to save cost downstream.

8 Bill of Materials - SR650 Nodes

This appendix includes the Bills of Materials (BOMs) for individual nodes of the Cloudera big data cluster. Table 5 lists how many of each core component are required for each of the predefined configuration sizes.

The BOM lists in this appendix are not meant to be exhaustive and must always be verified with the on-line Lenovo configuration tools. Any discussion of pricing, support and maintenance options is outside the scope of this document.

This BOM information is for the United States; part numbers and descriptions can vary in other countries. Other sample configurations are available from your Lenovo sales team. Components are subject to change without notice.

Master Node

Table 18 lists the BOM for the Master node.

Table 18. Master node Code Description Qty

7X01CTO1WW -SB- ThinkSystem SR630 - 1yr Warranty 1 AUWC ThinkSystem SR530/SR570/SR630 x8/x16 PCIe LP+LP Riser 1 Kit 1 B0MK Enable TPM 2.0 1 AUPW ThinkSystem XClarity Controller Standard to Enterprise Upgrade 1 AUW9 ThinkSystem SR630/SR570 2.5" AnyBay 10-Bay Backplane 1 AUMV ThinkSystem M.2 with Mirroring Enablement Kit 1 AUM7 ThinkSystem 2.5" 2TB 7.2K SAS 12Gb Hot Swap 512n HDD 8 AVWA ThinkSystem 750W (230/115V) Platinum Hot-Swap Power Supply 2 5978 Select Storage devices - configured RAID 1 AXCA ThinkSystem Toolless Slide Rail 1 AUKK ThinkSystem 10Gb 4-port SFP+ LOM 1 AUNK ThinkSystem RAID 930-16i 4GB Flash PCIe 12Gb Adapter 1 AUWQ Lenovo ThinkSystem 1U LP+LP BF Riser Bracket 1 AUW1 ThinkSystem SR630 2.5" Chassis with 10 Bays 1 AWER Intel Xeon Silver 4116 12C 85W 2.1GHz Processor 2 A2KL Secondary Array - RAID 10 1 AUUV ThinkSystem M.2 CV3 128GB SATA 6Gbps Non-Hot Swap SSD 2 AUNB ThinkSystem 16GB TruDDR4 2666 MHz (1Rx4 1.2V) RDIMM 8 AUWW -SB- Front VGA Cable for 1U 2.5" 1 A2K7 Primary Array - RAID 1 1 6570 2.0m, 13A/100-250V, C13 to C14 Jumper Cord 2 2305 Integration 1U Component 1 AUTQ ThinkSystem small Lenovo Label for 24x2.5"/12x3.5"/10x2.5" 1 AUNP FBU345 SuperCap 1

AURR ThinkSystem M3.5 Screw for Riser 2x2pcs and SR530/550/558/570/590 Planar 5pcs 2

AURN Lenovo ThinkSystem Super Cap Box 1 AULP ThinkSystem 1U CPU Heatsink 2 AVWJ ThinkSystem 750W Platinum RDN PSU Caution Label 1 AUWL Lenovo ThinkSystem 1U LP Riser Dummy 1 AUW7 ThinkSystem SR630 4056 Fan Module 2 AVWK ThinkSystem EIA Plate with Lenovo Logo 1 AWF9 ThinkSystem Response time Service Label LI 1 AUX4 MS 1U Service Label LI 1 AUX3 ThinkSystem SR630 Model Number Label 1 AUWV 10x2.5"Cable Kit (1U) 1 AVKG ThinkSystem SR630 MB to 10x2.5" HDD BP NVME cable 1 AV00 6.8m Super Cap Cable 1

AWGE ThinkSystem SR630 WW Lenovo LPK 1 AUW3 Lenovo ThinkSystem Mainstream MB - 1U 1 B0ML Feature Enable TPM on MB 1

B173 Companion Part for XClarity Controller Standard to Enterprise Upgrade in Factory 1

AVEN ThinkSystem 1x1 2.5" HDD Filler 2 A102 Advanced Grouping 1 8971 Integrate in manufacturing 1 AUTJ Lenovo ThinkSystem Label Kit 1 7008 Primary Array 2 HDDs 1 2302 RAID Configuration 1 7017 Secondary Array 4 HDDs 1 AVJ2 ThinkSystem 4R CPU HS Clip 2 AUTC ThinkSystem SR630 Lenovo Agency Label 1 AUTA XCC Network Access Label 1

Worker Node

Table 19 lists the BOM for the Worker node.

Table 19. Worker node Code Description Qty

7X05CTO1WW -SB- ThinkSystem SR650 - 1yr Warranty 1 AURC ThinkSystem SR550/SR590/SR650 x16/x8(or x16) PCIe FH Riser 2 Kit 1 B0MK Enable TPM 2.0 1 AUPW ThinkSystem XClarity Controller Standard to Enterprise Upgrade 1 AUR9 ThinkSystem SR650/SR550/SR590 3.5" SATA/SAS 12-Bay Backplane 1 AUMV ThinkSystem M.2 with Mirroring Enablement Kit 1 AUU6 ThinkSystem 3.5" 4TB 7.2K SAS 12Gb Hot Swap 512n HDD 14 AVWF ThinkSystem 1100W (230V/115V) Platinum Hot-Swap Power Supply 2 5977 Select Storage devices - no configured RAID required 1 AXCA ThinkSystem Toolless Slide Rail 1

AUKK ThinkSystem 10Gb 4-port SFP+ LOM 1 AUNM ThinkSystem 430-16i SAS/SATA 12Gb HBA 1 A484 Populate Rear Drives 1

AUVW ThinkSystem SR650 3.5" Chassis with 8 or 12 bays 1 AWEN Intel Xeon Gold 6130 16C 125W 2.1GHz Processor 2 AURZ ThinkSystem SR590/SR650 Rear HDD Kit 1 B11V ThinkSystem M.2 5100 480GB SATA 6Gbps Non-Hot Swap SSD 2 AUND ThinkSystem 32GB TruDDR4 2666 MHz (2Rx4 1.2V) RDIMM 12 AUS8 ThinkSystem SR550/SR590/SR650 EIA Latch w/ VGA Upgrade Kit 1 2306 Integration >1U Component 1 AUTQ ThinkSystem small Lenovo Label for 24x2.5"/12x3.5"/10x2.5" 1 AUQB Lenovo ThinkSystem Mainstream MB - 2U 1 AURS Lenovo ThinkSystem Memory Dummy 12 AURP Lenovo ThinkSystem 2U 2FH Riser Bracket 1

AURR ThinkSystem M3.5 Screw for Riser 2x2pcs and SR530/550/558/570/590 Planar 5pcs 2

AUSA Lenovo ThinkSystem M3.5" Screw for EIA 136 AVWK ThinkSystem EIA Plate with Lenovo Logo 1 AWF9 ThinkSystem Response time Service Label LI 1 AWFF ThinkSystem SR650 WW Lenovo LPK 1 AURM ThinkSystem SR550/SR650/SR590 Right EIA Latch with FIO 1 B0ML Feature Enable TPM on MB 1

B173 Companion Part for XClarity Controller Standard to Enterprise Upgrade in Factory 1

B31F ThinkSystem M.2 480GB SSD Thermal Kit 1 A102 Advanced Grouping 1 A2HP Configuration ID 01 1 8971 Integrate in manufacturing 1 AUTJ Lenovo ThinkSystem Label Kit 1 AUSE Lenovo ThinkSystem 2U CPU Entry Heatsink 2 AUSG Lenovo ThinkSystem 2U Cyborg 6038 Fan module 1 AUSS MS 12x3.5" HDD BP Cable Kit 1 AUT8 ThinkSystem 1100W RDN PSU Caution Label 1 AUTS ThinkSystem 2U 12 3.5"HDD Conf HDD sequence Label 1 AVJ2 ThinkSystem 4R CPU HS Clip 2 AUT1 ThinkSystem SR650 Lenovo Agency Label 1 AUSZ ThinkSystem SR650 Service Label LI 1 AUTD ThinkSystem SR650 model number Label 1 AUTA XCC Network Access Label 1

Systems Management Node

Table 20 lists the BOM for the Systems Management Node.

Table 20. Systems Management Node Code Description Qty

7X01CTO1WW -SB- ThinkSystem SR630 - 1yr Warranty 1 6570 2.0m, 13A/125-10A/250V, C13 to IEC 320-C14 Rack Power Cable 2

AVWA ThinkSystem 750W(230/115V) Platinum Hot-Swap Power Supply 2 AUWB ThinkSystem SR530/SR630/SR570 2.5" SATA/SAS 8-Bay Backplane 1 AUWC ThinkSystem SR530/SR570/SR630 x8/x16 PCIe LP+LP Riser 1 Kit 1 5977 Select Storage devices - no configured RAID required 1 B0MK Enable TPM 2.0 1 AXCA ThinkSystem Toolless Slide Rail 1 AUKK ThinkSystem 10Gb 4-port SFP+ LOM 1 AUPW ThinkSystem XClarity™ Controller Standard to Enterprise Upgrade 1 AUNG ThinkSystem RAID 530-8i PCIe 12Gb Adapter 1 AWEH Intel® Xeon® Bronze 3106 8C 85W 1.7GHz Processor 1 AUWQ Lenovo ThinkSystem 1U LP+LP BF Riser BKT 1 AUNB ThinkSystem 16GB TruDDR4 2666 MHz (1Rx4 1.2V) RDIMM 1 AUW0 ThinkSystem SR630 2.5" Chassis with 8 bays 1 AUMV ThinkSystem M.2 with Mirroring Enablement Kit 1 AUUV ThinkSystem M.2 CV3 128GB SATA 6Gbps Non-Hot Swap SSD 2 AUWW -SB- Front VGA Cable for 1U 2.5" 1 2305 Integration 1U component 1 AUS6 Lenovo ThinkSystem 1U height CPU HS Dummy 1 AULP ThinkSystem 1U CPU Heatsink 1 AVWJ ThinkSystem 750W Platinum RDN PSU Caution Label 1 AUWF Lenovo ThinkSystem Super Cap Holder Dummy 1 AVKJ ThinkSystem 2x2 Quad Bay Gen4 2.5" HDD Filler 1

AUWK Lenovo ThinkSystem 4056 Fan Dummy 1 AUWL Lenovo ThinkSystem 1U LP Riser Dummy 1 AVWK ThinkSystem EIA plate with Lenovo logo 1 AWF9 ThinkSystem Response time Service Label LI 1 AUX4 MS 1U Service label LI 1 AUX3 ThinkSystem SR630 Model Number Label 1 AUWX 8x2.5" HDD BP Cable Kit 1 AWGE ThinkSystem SR630 WW Lenovo LPK 1 AUW3 Lenovo ThinkSystem Mainstream MB - 1U 1 B0ML Feature Enable TPM on MB 1 B173 XClarity™ Controller Standard to Enterprise Upgrade in factory 1

Management Network Switch

Table 21 lists the BOM for the Management/Administration network switch.

Table 21. Management/Administration network switch

Code Description Qty
7159HC1 Lenovo RackSwitch G8052 (Rear to Front) 1
ASY2 Lenovo RackSwitch G8052 (Rear to Front) 1
A3KR Air Inlet Duct for 442 mm RackSwitch 1
A3KP Adjustable 19" 4 Post Rail Kit 1
6201 1.5m, 10A/100-250V, C13 to IEC 320-C14 Rack Power Cable 2
2305 Integration 1U component 1

Data Network Switch

Table 22 lists the BOM for the data network switch.

Table 22. Data network switch

Code Description Qty
7159HCW Lenovo RackSwitch G8272 (Rear to Front) 2
ASRD Lenovo RackSwitch G8272 (Rear to Front) 2
ASTN Air Inlet Duct for 487 mm RackSwitch 2
6201 1.5m, 10A/100-250V, C13 to IEC 320-C14 Rack Power Cable 4
A3KP Adjustable 19" 4 Post Rail Kit 2
2305 Integration 1U component 2
3792 1.5m Yellow Cat5e Cable 2

Rack

Table 23 lists the BOM for the rack.

Table 23. Rack

Code Description Qty
9363RC4 -SB- 42U 1100mm Enterprise V2 Dynamic Rack 1
A1RC -SB- 42U 1100mm Enterprise V2 Dynamic Rack 1
5895 1U 12 C13 Switched and Monitored 60A 3 Phase PDU 4
2304 Integration Prep 1
AU8J Integrated Rack Miscellaneous Parts Kit 1
AU8K LeROM Validation 1
91Y9793 Foundation Service - 5Yr Next Business Day Response 1
4271 1U black plastic filler panel 2
4275 5U black plastic filler panel 3

Different cluster sizes leave different amounts of unused rack space; therefore, consider the use of blank plastic filler panels for the rack to better direct cool air flow.

The number of PDUs in the rack depends on the number of servers in the rack. Four PDUs should be used for the half-rack configuration and six PDUs for a full rack.

Cables

Table 24 lists the BOM for the cable types used in this reference architecture.

Table 24. Cables

Code Description
AT2S -SB- Lenovo 3m Active DAC SFP+ Cables
A3RG 0.5m Passive DAC SFP+ Cable
A51N 1.5m Passive DAC SFP+ Cable
3792 1.5m Yellow Cat5e Cable
A51P 2m Passive DAC SFP+ Cable
3793 3m Yellow Cat5e Cable

9 Bill of Materials - SD530 with D3284

This bill of materials summarizes the components used in this reference architecture. Quantities are for individual nodes. Table 5 lists node quantities for several pre-defined configurations.

Master Node

Table 25 lists the BOM for the Master node.

Table 25. Master node Code Description Qty

7X02CTO1WW ThinkSystem SR630 - 3yr Warranty 1 B0MK Enable TPM 2.0 1 AUPW ThinkSystem XClarity Controller Standard to Enterprise Upgrade 1 AUWB ThinkSystem SR530/SR630/SR570 2.5" SATA/SAS 8-Bay Backplane 1 AUMV ThinkSystem M.2 with Mirroring Enablement Kit 1 AUM7 ThinkSystem 2.5" 2TB 7.2K SAS 12Gb Hot Swap 512n HDD 8 AVW8 ThinkSystem 550W (230V/115V) Platinum Hot-Swap Power Supply 1 5977 Select Storage devices - no configured RAID required 1 AXCA ThinkSystem Toolless Slide Rail 1 AUKK ThinkSystem 10Gb 4-port SFP+ LOM 1 AUNG ThinkSystem RAID 530-8i PCIe 12Gb Adapter 1 AUW0 ThinkSystem SR630 2.5" Chassis with 8 Bays 1 AWER Intel Xeon Silver 4116 12C 85W 2.1GHz Processor 2 B11V ThinkSystem M.2 5100 480GB SATA 6Gbps Non-Hot Swap SSD 2 AUNB ThinkSystem 16GB TruDDR4 2666 MHz (1Rx4 1.2V) RDIMM 12 6400 2.8m, 13A/100-250V, C13 to C14 Jumper Cord 1 2305 Integration 1U Component 1 AUS9 Lenovo ThinkSystem CFF V3 PSU Dummy 1 AULP ThinkSystem 1U CPU Heatsink 2 AUWF Lenovo ThinkSystem Super Cap Holder Dummy 1 AUWG Lenovo ThinkSystem 1U VGA Filler 1 AUWL Lenovo ThinkSystem 1U LP Riser Dummy 1 AUWM Lenovo ThinkSystem 1U LP+LP BF Riser Dummy 1 AUW7 ThinkSystem SR630 4056 Fan Module 2 AVWH ThinkSystem 550W RDN PSU Caution Label 1 AVWK ThinkSystem EIA Plate with Lenovo Logo 1 AWF9 ThinkSystem Response time Service Label LI 1 AUX4 MS 1U Service Label LI 1 AUX3 ThinkSystem SR630 Model Number Label 1 AUWX 8x2.5" HDD BP Cable Kit 1 AWGE ThinkSystem SR630 WW Lenovo LPK 1 AUW3 Lenovo ThinkSystem Mainstream MB - 1U 1 B0ML Feature Enable TPM on MB 1

B173 Companion Part for XClarity Controller Standard to Enterprise Upgrade in Factory 1

A102 Advanced Grouping 1 8971 Integrate in manufacturing 1

A193 Integrated Solutions 1

AUTJ Lenovo ThinkSystem Label Kit 1 AVJ2 ThinkSystem 4R CPU HS Clip 2 AUTC ThinkSystem SR630 Lenovo Agency Label 1 AUTV ThinkSystem large Label for non-24x2.5"/12x3.5"/10x2.5" 1 AUTA XCC Network Access Label 1 8034 e1350 Solution Component 1

Worker Node

Table 26 lists the BOM for the Worker node.

Table 26. SD530 Worker node Code Description Qty

7X21CTO1WW Lenovo ThinkSystem SD530 1 B0MK Enable TPM 2.0 1 AUPW ThinkSystem XClarity Controller Standard to Enterprise Upgrade 1 B324 ThinkSystem SD530 2.5" NVMe 4-Bay Backplane Kit 1 AUMV ThinkSystem M.2 with Mirroring Enablement Kit 1 B11K ThinkSystem U.2 Intel P4600 3.2TB Mainstream NVMe PCIe3.0 x4 Hot

Swap SSD 1

AUNR ThinkSystem 430-8e SAS/SATA 12Gb HBA 1 5977 Select Storage devices - no configured RAID required 1 AUXN ThinkSystem SD530 Computing Node 1 AX6D Intel Xeon Gold 6130 16C 125W 2.1GHz Processor 2 B11V ThinkSystem M.2 5100 480GB SATA 6Gbps Non-Hot Swap SSD 2 AUND ThinkSystem 32GB TruDDR4 2666 MHz (2Rx4 1.2V) RDIMM 12 B0ML Feature Enable TPM on MB 1 B173 Companion Part for XClarity Controller Standard to Enterprise Upgrade in

Factory 1

AVEN ThinkSystem 1x1 2.5" HDD Filler 3 AXHD Feature Intel Inside Xeon Label 1 A193 Integrated Solutions 1 B33L NVMe x4 SSL, LI 1 AVD0 System Document 1 AVJ2 ThinkSystem 4R CPU HS Clip 2

AUXW ThinkSystem SD530 85mm CPU2 Heatsink 2 AUXT ThinkSystem SD530 Agency Label 1 AUXR ThinkSystem SD530 HDD Blank Filler 2

AVQY ThinkSystem SD530 SW GBM 1 8034 e1350 Solution Component 1

Table 27. SD530 2U Chassis

7X20CTO1WW Lenovo ThinkSystem D2 Enclosure 1
7X20CTO1WW Lenovo ThinkSystem D2 Enclosure 1

6201 1.5m, 10A/100-250V, C13 to IEC 320-C14 Rack Power Cable 2 AUZ2 ThinkSystem D2 2000W Platinum PSU 2 AUYC ThinkSystem D2 Slide Rail 1 AUY9 ThinkSystem D2 10Gb 8 port EIOM SFP+ 1 AUY7 ThinkSystem D2 8-slot x8 Shuttle ASM 1 AUXM ThinkSystem D2 Enclosure 1 AVR1 ThinkSystem Single Ethernet Port SMM 1 2306 Integration >1U Component 1 AUYE ThinkSystem D2 Rack Shipping Bracket-Front for Rack 1 A102 Advanced Grouping 1 8971 Integrate in manufacturing 1 A193 Integrated Solutions 1 AVQL Lenovo ThinkSystem Warning Label, Power 2000W - Platinum 1 AVD0 System Document 1 AVT5 ThinkSystem D2 Chassis Agency Label 1 AVQM ThinkSystem D2 Chassis Label 1 AVQN ThinkSystem D2 Service Label _LI 1 8034 e1350 Solution Component 1

Systems Management Node

Table 28 lists the BOM for the Systems Management Node.

Table 28. Systems Management Node Code Description Qty

7X02CTO1WW ThinkSystem SR630 - 3yr Warranty 1 AUWC ThinkSystem SR530/SR570/SR630 x8/x16 PCIe LP+LP Riser 1 Kit 1 B0MK Enable TPM 2.0 1 AUPW ThinkSystem XClarity Controller Standard to Enterprise Upgrade 1 B0WJ ThinkSystem SR630 3.5" Anybay 4-Bay Backplane 1 AUMV ThinkSystem M.2 with Mirroring Enablement Kit 1 AVWA ThinkSystem 750W (230/115V) Platinum Hot-Swap Power Supply 2 5977 Select Storage devices - no configured RAID required 1 AXCA ThinkSystem Toolless Slide Rail 1 AUKK ThinkSystem 10Gb 4-port SFP+ LOM 1 AUNL ThinkSystem 430-8i SAS/SATA 12Gb HBA 1 AUWQ Lenovo ThinkSystem 1U LP+LP BF Riser Bracket 1

AUW2 ThinkSystem SR630 3.5" Chassis with 4 Bays 1 AWEH Intel Xeon Bronze 3106 8C 85W 1.7GHz Processor 1 B11V ThinkSystem M.2 5100 480GB SATA 6Gbps Non-Hot Swap SSD 2 AUNB ThinkSystem 16GB TruDDR4 2666 MHz (1Rx4 1.2V) RDIMM 1 AUWU -SB- Front VGA Cable for 1U 3.5" 1 6400 2.8m, 13A/100-250V, C13 to C14 Jumper Cord 2 2305 Integration 1U Component 1 AUS6 Lenovo ThinkSystem 1U height CPU HS Dummy 1

AURR ThinkSystem M3.5 Screw for Riser 2x2pcs and SR530/550/558/570/590 Planar 5pcs 2

AULP ThinkSystem 1U CPU Heatsink 1 AVWJ ThinkSystem 750W Platinum RDN PSU Caution Label 1 AVJ3 ThinkSystem 1x1 3.5" HDD Filler 4

AUWK Lenovo ThinkSystem 4056 Fan Dummy 1 AUWL Lenovo ThinkSystem 1U LP Riser Dummy 1 AVWK ThinkSystem EIA Plate with Lenovo Logo 1 AWF9 ThinkSystem Response time Service Label LI 1 AUX4 MS 1U Service Label LI 1 AUX3 ThinkSystem SR630 Model Number Label 1 AWGE ThinkSystem SR630 WW Lenovo LPK 1 AUW3 Lenovo ThinkSystem Mainstream MB - 1U 1 B0ML Feature Enable TPM on MB 1 B0WF -SB- SR630 4x3.5" NVMe Cable 1

B173 Companion Part for XClarity Controller Standard to Enterprise Upgrade in Factory 1

B2RV -SB- ThinkSystem AnyBay Drive Bays Label 1 8971 Integrate in manufacturing 1 A193 Integrated Solutions 1 AUTJ Lenovo ThinkSystem Label Kit 1 AVJ2 ThinkSystem 4R CPU HS Clip 1 AUTC ThinkSystem SR630 Lenovo Agency Label 1 AUWY ThinkSystem SR650 12x3.5" SATA/SAS/NVME BP Cable Kit 1 AUTV ThinkSystem large Label for non-24x2.5"/12x3.5"/10x2.5" 1 AUTA XCC Network Access Label 1 8034 e1350 Solution Component 1

External SAS Storage Enclosure

Table 29 lists the BOM for the external SAS storage enclosure.

Table 29. External SAS Storage Enclosure

Code Description Qty

6413HC1 Lenovo Storage D3284 High Density Expansion Enclosure 1 AUDV Lenovo Storage D3284 High Density Expansion Enclosure 1 3803 3m Blue Cat5e Cable 1 AUJT Lenovo Storage 12Gb High Density Rack Mount Kit-Rails 25"-33" 1 AUJS Lenovo Storage 12G High Density Exp SAS ESM/IO Module 2 AU1A -SB- 2m External MiniSAS HD 8644/2xMiniSAS HD 8644 Y-Cable 6 AUJV Publications - Statement of Warranty, Safety Std. 1 6292 2m, 16A/100-250V, C19 to IEC 320-C20 Rack Power Cable 2 AUDS Lenovo Storage 3.5" 4TB 7.2K NL-SAS HDD (14 pack) 6 2306 Integration >1U Component 1 AX4V Lenovo Storage D3284 packaging for 84 drives 1 8971 Integrate in manufacturing 1 A193 Integrated Solutions 1 AUJR Lenovo Storage D3284 Label Kit 1 AUJU Lenovo Storage D3284 Packaging 1 8034 e1350 Solution Component 1

Management Network Switch

Table 30 lists the BOM for the Management/Administration network switch.

Table 30. Management/Administration network switch

Code Description Qty
7159HC1 Lenovo RackSwitch G8052 (Rear to Front) 1
ASY2 Lenovo RackSwitch G8052 (Rear to Front) 1
A3KR Air Inlet Duct for 442 mm RackSwitch 1
A3KP Adjustable 19" 4 Post Rail Kit 1
6201 1.5m, 10A/100-250V, C13 to IEC 320-C14 Rack Power Cable 2
2305 Integration 1U Component 1

Data Network Switch

Table 31 lists the BOM for the data network switch.

Table 31. Data network switch

Code Description Qty
7159HCW Lenovo RackSwitch G8272 (Rear to Front) 2
ASRD Lenovo RackSwitch G8272 (Rear to Front) 2
ASTN Air Inlet Duct for 487 mm RackSwitch 2
6201 1.5m, 10A/100-250V, C13 to IEC 320-C14 Rack Power Cable 4
A3KP Adjustable 19" 4 Post Rail Kit 2
2305 Integration 1U Component 2

Rack

Table 32 lists the BOM for the rack.

Table 32. Rack

Code Description Qty
1410HPB Scalable Infrastructure 42U 1100mm Enterprise V2 Dynamic Rack 1
A2M8 Scalable Infrastructure 42U 1100mm Enterprise V2 Dynamic Rack 1
5897 1U 9 C19/3 C13 Switched and Monitored 60A 3 Phase PDU 2
5895 1U 12 C13 Switched and Monitored 60A 3 Phase PDU 2
2202 Cluster 1350 Ship Group 1
2304 Integration Prep 1
2310 Solution Specific Test 1
AU8K LeROM Validation 1
B1EQ Network Verification 1
4273 3U black plastic filler panel 1
B165 X Braces for Railhawk Rack 2

Different cluster sizes leave different amounts of unused rack space; therefore, consider the use of blank plastic filler panels for the rack to better direct cool air flow.

The number of PDUs in the rack depends on the number of servers in the rack. Four PDUs should be used for the half-rack configuration and six PDUs for a full rack.

Cables

Table 33 lists the BOM for the cable types used in this reference architecture.

Table 33. Cables

Code Description
AT2R -SB- Lenovo 1m Active DAC SFP+ Cables
A3RG 0.5m Passive DAC SFP+ Cable
A51N 1.5m Passive DAC SFP+ Cable
3792 1.5m Yellow Cat5e Cable
A51P 2m Passive DAC SFP+ Cable
3793 3m Yellow Cat5e Cable

Software

Table 34 lists available software orderable from Lenovo along with the hardware.

Table 34. Software

Code Description Qty
2020 RHEL Server Physical or Virtual Node, 2 Skt Prem RH Sup 3Yr 1

10 Acknowledgements

This reference architecture document has benefited greatly from the detailed and careful review comments provided by colleagues at Lenovo and Cloudera.

Lenovo business review

• Prasad Venkatachar – Sr. Solutions Product Manager

Cloudera technical review

• Alex Moundalexis – Software Engineer - Partner Engineering

• Calvin Goodrich – Engineering Manager - Partner Engineering

Cloudera business review

• Sean Gilbert – Director - Business Development/ Global Partner Sales

VMware technical review

• Technical staff members at VMware

11 Resources

For more information, see the following resources:

Lenovo ThinkSystem SR650 server:

• Lenovo Press product guide: https://lenovopress.com/lp0644.pdf
• 3D Tour: https://lenovopress.com/lp0673-3d-tour-thinksystem-sr650

Lenovo ThinkSystem SR630 server:
• Lenovo Press product guide: https://lenovopress.com/lp0643-lenovo-thinksystem-sr630-server
• 3D Tour: https://lenovopress.com/lp0672-3d-tour-thinksystem-sr630

Lenovo ThinkSystem SD530 dense compute server:
• Product Guide: https://lenovopress.com/lp0635-thinksystem-sd530-server

Lenovo ThinkSystem SD530 D2 enclosure:
• Product Guide: https://lenovopress.com/datasheet/ds0003-lenovo-thinksystem-sd530-and-d2-enclosure

Lenovo D3284 JBOD SAS Expansion Enclosure:
• Product Guide: https://lenovopress.com/lp0513-lenovo-storage-d3284-external-high-density-drive-expansion-enclosure
• D1212/D1224/D3284 Installation and Maintenance Guide: https://datacentersupport.lenovo.com/us/en/search?query=D3284&Products=STORAGE/LENOVO-STORAGE/D1212&searchLocation=PSPHome_S

Lenovo RackSwitch G8052 (1GbE Switch):
• Product page: https://lenovopress.com/tips1270-lenovo-rackswitch-g8052
• Lenovo Press product guide: https://lenovopress.com/tips1270.pdf

Lenovo RackSwitch G8272 (10GbE Switch):
• Product page: https://lenovopress.com/tips1267-lenovo-rackswitch-g8272
• Lenovo Press product guide: https://lenovopress.com/tips1267.pdf

Lenovo ThinkSystem NE10032 (40GbE/100GbE Switch):
• Product page: https://lenovopress.com/lp0609-lenovo-thinksystem-ne10032-rackswitch
• Lenovo Press product guide: https://lenovopress.com/lp0609.pdf

Intel Xeon Scalable Family Balanced Memory Configurations:
• Whitepaper: https://lenovopress.com/lp0742-intel-xeon-scalable-family-balanced-memory-configurations

Lenovo XClarity Administrator:
• Product page: https://lenovopress.com/tips1200-lenovo-xclarity-administrator
• Lenovo Press product guide: https://lenovopress.com/tips1200.pdf

Cloudera:

• Cloudera Distribution for Hadoop (CDH): http://www.cloudera.com/content/cloudera/en/products-and-services/cdh.html

• Cloudera Installation Guide: https://www.cloudera.com/documentation/enterprise/5-15-x/topics/installation.html

• Cloudera products and services: https://www.cloudera.com/products.html
• Cloudera solutions: http://www.cloudera.com/content/cloudera/en/solutions.html

• Cloudera resources: https://www.cloudera.com/resources.html
• Cloudera RA with VMware and local attached storage: cloudera.com/documentation/other/reference-architecture/PDF/cloudera_ref_arch_vmware_local_storage.pdf

VMware:

• VMware Hadoop Deployment Guide
• Big Data Performance on vSphere
• Virtualized Hadoop Performance with VMware vSphere 6.0 on High-Performance Servers

Red Hat: https://www.redhat.com/en

Open source software:

• Hadoop: hadoop.apache.org
• Spark: spark.apache.org
• Flume: flume.apache.org
• HBase: hbase.apache.org
• Hive: hive.apache.org
• Hue: gethue.com
• Impala: impala.apache.org
• Oozie: oozie.apache.org
• Mahout: mahout.apache.org
• Pig: pig.apache.org
• Sentry: sentry.incubator.apache.org
• Sqoop: sqoop.apache.org
• Whirr: whirr.apache.org
• ZooKeeper: zookeeper.apache.org
• Parquet: parquet.apache.org
• Hadoop Virtualization Extensions (HVE): https://issues.apache.org/jira/browse/HADOOP-8468
• xCAT: xcat.org

Document history

Version 1.0  12 Oct 2017  First version
Version 1.1  28 Nov 2017  Updated storage charts to show Cloudera recommendation of 100 TB per node max.
Version 1.2  14 Dec 2017  Updates to networking sections
Version 1.3  24 Oct 2018  Added decoupled external JBOD SAS storage configuration with dense compute nodes

Trademarks and special notices © Copyright Lenovo 2018.

References in this document to Lenovo products or services do not imply that Lenovo intends to make them available in every country.

Lenovo, the Lenovo logo, ThinkCenter, ThinkVision, ThinkVantage, ThinkPlus and Rescue and Recovery are trademarks of Lenovo.

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Intel, Intel Inside (logos), MMX, and Pentium are trademarks of Intel Corporation in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.

Information is provided "AS IS" without warranty of any kind.

All customer examples described are presented as illustrations of how those customers have used Lenovo products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.

Information concerning non-Lenovo products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by Lenovo. Sources for non-Lenovo list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. Lenovo has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-Lenovo products. Questions on the capability of non-Lenovo products should be addressed to the supplier of those products.

All statements regarding Lenovo future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Contact your local Lenovo office or Lenovo authorized reseller for the full text of the specific Statement of Direction.

Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in Lenovo product announcements. The information is presented here to communicate Lenovo’s current investment and development activities as a good faith effort to help with our customers' future planning.

Performance is based on measurements and projections using standard Lenovo benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here.

Photographs shown are of engineering prototypes. Changes may be incorporated in production models.

Any references in this information to non-Lenovo websites are provided for convenience only and do not in any manner serve as an endorsement of those websites. The materials at those websites are not part of the materials for this Lenovo product and use of those websites is at your own risk.