owncloud enterprise edition on ibm infrastructure; a performance and sizing study for large user...

Upload: owncloud-inc

Post on 12-Jun-2015

DESCRIPTION

This whitepaper, written by IBM, details the performance measured for more than 100,000 ownCloud users on a single storage server and gives an overview of ownCloud sizing considerations and the appropriate assumptions. Within a proof of concept by ownCloud and IBM, measurements were conducted on high-end x86-based IBM storage and compute infrastructure with emphasis on large user number scenarios.

TRANSCRIPT

Page 1: ownCloud Enterprise Edition on IBM Infrastructure; A Performance and Sizing Study for Large User Number Scenarios

© 2014 IBM Corporation Version 5/22/2014

http://w3.ibm.com/support/Techdocs

ownCloud Enterprise on IBM Infrastructure - A Performance and Sizing Study Page 1 of 10

ownCloud Enterprise Edition on IBM Infrastructure

A Performance and Sizing Study for Large User Number Scenarios

Dr. Oliver Oberst – IBM
Frank Karlitschek – ownCloud

Introduction

One aspect of widespread cloud computing is online storage, the storage cloud. Among the different storage cloud technologies, file sync and share is already in use by millions of people and is still growing rapidly, especially inside organizations, leaving those organizations struggling to control and protect sensitive corporate data. ownCloud provides enterprises with a highly scalable on-premises alternative to consumer-grade, cloud-based apps. ownCloud is installed directly on an organization’s servers and is fully integrated into existing identity management, security, governance, backup and disaster recovery tools. Organizations can choose fully on-premises storage, cloud, or a hybrid model.

Successfully designing large-scale IT solutions requires a realistic system-load estimate, based on software usage patterns and user behavior, that leads to a suitably sized system. The purpose of this document is to give an overview of general ownCloud sizing considerations and the appropriate assumptions. Within a proof of concept by ownCloud and IBM, measurements were conducted on high-end x86-based IBM storage and compute infrastructure with emphasis on large user number scenarios. Specifically, the performance of ownCloud 5 Enterprise Edition was measured running on an IBM Flex System as database and application server infrastructure, connected to an IBM General Parallel File System Storage Server (GSS) 24 system as storage backend. Because the vast number of possible authentication and load-balancing setup combinations goes far beyond the scope of this work, the proof of concept described here focused on measuring the performance capabilities of the ownCloud application and database servers and the storage backend.

The first question for system sizing, “what do you expect your users to do with the system?”, may sound trivial, but sizing is directly related to the size and number of files stored in the system, the number of application plug-ins enabled on the system, and the nature of the users, devices and bandwidth used to connect to the system. For example, higher bandwidth connections use less memory, but stress disk performance as more files are uploaded in a given time frame. The key questions we look at are users, devices, behaviors and files. More specifically:

1. How many users are expected to be active on the system each day?
2. How many devices will they be connecting?
3. What type and number of devices (e.g. mobile phone, tablet or desktop / laptop machine) will be used?
4. How many files are they syncing and sharing?

There are many other questions, but these tend to be the most important to get right.

ownCloud Enterprise Edition

Started in January 2010 as an open source project, ownCloud grew quickly, due to its modular and open design, into a fast-growing community project. Written in PHP and JavaScript, ownCloud can use several database management systems such as SQLite, MySQL, MariaDB, PostgreSQL, and Oracle Database. Users interact with ownCloud either via a web browser (http or https) or via sync clients on desktop PCs running Windows, OS X or Linux, as well as via mobile devices based on iOS or Android. Any updates to a user’s files are pushed to all connected client devices automatically. Files can be made publicly available via a unique web link or shared directly between users of an ownCloud instance. Thanks to its large user community the ownCloud project matured, and in 2011 the company ownCloud Inc. was founded to ‘bring secure sync & share to business’. In contrast to the community version, ownCloud Enterprise Edition provides enterprise-level support, additional features, and improved performance due to customer-focused fine tuning.

[Figure 1 diagram: Clients (web browser; desktop sync on Windows, Mac and Linux; mobile on Android and iOS) connect via https / WebDAV to an IBM Flex System or IBM NeXtScale (80% ownCloud application servers, 20% ownCloud database servers), which is attached through a link labeled Infiniband / GPFS to the GSS holding the ownCloud user data and the ownCloud database.]

Figure 1 - Blueprint of IBM Infrastructure running ownCloud Enterprise Edition. Application and Database servers access GSS directly via GPFS.

ownCloud Enterprise IBM Infrastructure – The Blueprint

A sync & share solution capable of scaling up to very large user numbers requires a powerful and reliable storage backend as well as a suitably sized server infrastructure. To meet these requirements, a combination of the latest IBM high-end server and storage products, offering enormous compute and storage capabilities, was selected:

IBM General Parallel File System (GPFS) – GPFS is a high-performance parallel file system used in many of IBM’s large compute clusters within the Top 500 list. It can therefore handle the parallel file access from the application and database servers to the ownCloud files, including the ownCloud database. It is also a file management infrastructure which offers a wide range of file management functionality such as information lifecycle management, hierarchical storage management and many more.

IBM GPFS Storage Server (GSS) – The GPFS Storage Server is one of the latest High Performance Computing (HPC) driven storage products, announced in early 2013. It delivers the reliability and performance required in the scenarios we investigated within this work.

IBM Flex System – The IBM Flex System x86 server nodes offer great flexibility in choosing well-suited server nodes within a 14-node chassis, which offers full Flex node and network management capabilities.

Performance and Sizing Proof of Concept

In the following we estimate the share of the different user access types expected for typical usage scenarios within large organizations. First, the detailed test setup and the usage scenario assumptions are described, followed by a summary of the measurement method. Finally, the results are presented and discussed.

Proof of Concept Setup

The ownCloud environment was installed and set up by ownCloud Inc. on the previously prepared hardware nodes using a standard IBM-supported Linux environment. The setup in detail:

Hardware – Analogously to Figure 1, an IBM Flex System chassis was attached to a GSS 24 system. During our test the 10 GbE connection between the two was used for the GPFS access of the Flex nodes to the storage server. The hardware details of the IBM Flex System nodes used are listed in Table 1. On the storage side, the GSS 24 model was set up with a GPFS declustered RAID array of eight data and two parity stripes (8+2p). The model used was equipped with 2TB HDDs, which summed up to a total of 288TB of net space on the fully configured system.

Table 1 – Application, Database and Client Node Setup

IBM Flex System x240 nodes, Model 8737

CPU: 2x Intel Xeon 8C Processor Model E5-2670 (115W, 2.6GHz, 1600MHz memory, 20MB L3 cache)
RAM: 64 GB
HDD: 2x IBM 300GB 2.5in SFF Slim-HS, 10,000 RPM, SAS HDD, RAID1 setup

To test the performance of ownCloud, a typical n-tier web architecture with several ownCloud application servers and an n-node MySQL database cluster had to be set up together with client nodes on the six IBM Flex System nodes. Therefore, two of the Flex System nodes were set up for each of the three required node types (application, database, client).

Software – The detailed software setup splits into three categories: the server system software, the ownCloud app configuration, and the benchmarking tools:

• System software:
  o Operating system on all servers: Red Hat Enterprise Linux (RHEL) 6.4
  o Web server:
    ▪ Apache 2.2.15
    ▪ PHP 5.3.3
  o Database:
    ▪ MySQL 5.1.66
• ownCloud environment:
  o ownCloud Enterprise Edition 5.0
  o Active ownCloud apps (default setup):
    ▪ Deleted files, First Run Wizard, Image Viewer, Provisioning API, Share Files, Text Editor, User Account Migration, Version, ownCloud Instance Migration
  o Additionally:
    ▪ Server-side encryption is disabled, as this is assumed to be the default in very large installations.
    ▪ Logging is set to “FATAL only”.
• Benchmarking tools:
  o oc-stress – PHP script which uses curl to generate parallel load on an ownCloud instance
    ▪ Example: ./oc-stress.php GET url 100000 200
      • 200 concurrent curl requests
      • 100000 requests in total
  o ab – Apache benchmark tool; can be used for measuring the round-trip speed of single requests if used without concurrency
    ▪ Example: ./ab -n 100000 -c 200 url
      • 200 concurrent requests
      • 100000 requests in total

The Testing Procedure

The testing procedure consists of a performance tuning phase after the installation and a final measurement of the maximum performance. Two individual measurements are used: the average time for one request without concurrent access, and the number of requests per second under concurrent requests.
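The two measurements described above can be illustrated with a small sketch. This is not the oc-stress script or ab, only a minimal Python approximation of the same idea; the function names and the `fetch` helper are hypothetical and introduced here for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url):
    """Issue one HTTP GET and return the response status code."""
    with urlopen(url, timeout=30) as response:
        response.read()
        return response.status


def mean_request_time(request, total):
    """Run `total` requests one after another; return mean seconds per request."""
    start = time.perf_counter()
    for _ in range(total):
        request()
    return (time.perf_counter() - start) / total


def requests_per_second(request, total, concurrency):
    """Run `total` requests on `concurrency` worker threads; return throughput."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(request) for _ in range(total)]
        for future in futures:
            future.result()  # re-raises any error from a failed request
    return total / (time.perf_counter() - start)
```

Passing `lambda: fetch(url)` as `request`, with a total of 100000 and a concurrency of 200, would roughly mirror the example parameters listed for the benchmarking tools. A thread pool in a single Python process will usually saturate earlier than ab, so the absolute numbers are not comparable.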

For the initial fine tuning of the test setup, several different reference measurements are used:

• Server performance tuning:
  o HTTP_GET requests of a single small text file /test.txt
• ownCloud performance including database and storage access:
  o HTTP_GET requests of an ownCloud folder view on /remote.php/webdav

Furthermore, the following test scenarios are of major interest:

• HTTP_PROPFIND requests (sync client heartbeat)
• List of directories via WebDAV
• File upload
• File download

In an iterative process, the measurements are executed after each tuning step, beginning with a clean installation of Apache, MySQL and ownCloud with default settings.
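To make the PROPFIND scenario listed above more concrete, the following sketch builds and sends a single WebDAV PROPFIND request against the /remote.php/webdav endpoint. It is only an illustration: the helper names, the requested properties and the host are assumptions, and authentication is omitted, so a real ownCloud instance would reject the request as written.

```python
import http.client
from urllib.parse import urlsplit

# Minimal WebDAV request body; the two properties are an illustrative choice.
PROPFIND_BODY = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<d:propfind xmlns:d="DAV:"><d:prop>'
    "<d:getlastmodified/><d:getetag/>"
    "</d:prop></d:propfind>"
)


def build_propfind(url, depth="0"):
    """Return (host, path, headers, body) for a WebDAV PROPFIND request."""
    parts = urlsplit(url)
    headers = {
        "Depth": depth,  # "0" = the resource itself, "1" = plus direct children
        "Content-Type": "application/xml; charset=utf-8",
    }
    return parts.netloc, parts.path or "/", headers, PROPFIND_BODY


def send_propfind(url, depth="0"):
    """Send the request over HTTPS and return the HTTP status code."""
    host, path, headers, body = build_propfind(url, depth)
    conn = http.client.HTTPSConnection(host, timeout=30)
    try:
        conn.request("PROPFIND", path, body=body, headers=headers)
        return conn.getresponse().status  # WebDAV answers 207 Multi-Status on success
    finally:
        conn.close()
```

A depth of "1" would correspond more closely to the "list of directories via WebDAV" scenario, since it also returns the direct children of the folder.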

Performance Optimizations

The complexity of the tools used within a high-level ownCloud installation requires detailed tuning of the individual components. The database requires indexes on most of the active tables. The number of live Apache connections needs to be raised to a value of 1000 or more. Analogously, the number of allowed MySQL connections has to be increased. Using a tuned PHP cache (APC) increases the application server performance drastically. Any given ownCloud deployment will have additional environment- and policy-specific configurations, which also have to be revisited to achieve the best performance.

The first set of optimizations mainly targeted the Apache connection settings. The number of alive requests was raised to 4096, PHP APC was installed, and the temporary space of MySQL was placed on a RAM disk. The second tuning iteration consisted purely of Linux system parameter changes: several IPv4-related parameters were tuned, most of them related to the TCP protocol. The results after both tuning iterations and for both benchmarks can be found in Figure 2. The IPv4 system parameter tuning drastically increases the number of requests per second an application server can handle. One should also limit the number of Apache processes per application server to avoid the use of swap. In our case each Apache process consumed about 12MB of RAM. Concerning Apache modules, mod_gzip and mod_deflate are both useful: they speed up data transfer, free server memory and close HTTP connections faster. The Apache prefork MPM has to be used, as mod_php is currently not thread-safe.

Figure 2 – Results of the benchmarks after the first and second tuning iteration. The ab (dark blue) and oc-stress results do not converge after tuning the network system parameters of the application servers. Both, however, show significant improvements. Note that ab scales a bit better than oc-stress with massively parallel requests, so the server is actually faster than measured with oc-stress. ab gives more accurate numbers in these scenarios.
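The advice to cap the number of Apache processes so that the server never swaps comes down to simple arithmetic. The sketch below uses the 64 GB of RAM per node from Table 1 and the roughly 12MB per Apache process reported above; the 8 GB reserved for the operating system and other services is a purely hypothetical figure chosen for illustration.

```python
def max_apache_processes(total_ram_mb, reserved_mb, per_process_mb=12):
    """Largest number of Apache processes that fits in RAM without swapping."""
    if reserved_mb >= total_ram_mb:
        raise ValueError("reserved memory must be smaller than total RAM")
    return (total_ram_mb - reserved_mb) // per_process_mb


TOTAL_RAM_MB = 64 * 1024   # 64 GB per node (Table 1)
RESERVED_MB = 8 * 1024     # hypothetical headroom for OS, PHP cache, etc.

limit = max_apache_processes(TOTAL_RAM_MB, RESERVED_MB)
print(limit)               # 4778 processes under these assumptions
print(4096 * 12)           # 49152 MB needed for 4096 processes
```

Under these assumptions, 4096 processes at 12MB each would need about 48 GB and would therefore still fit into the memory of one node.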

Large User Account Number Systems

For large installations with a large number of user accounts, ownCloud Enterprise Edition features the provisioning API, which offers a convenient way to provision and deprovision accounts via a REST interface. To test the scalability with respect to the number of user accounts, 100000 accounts were created and the performance tests were repeated. No measurable impact on the server performance could be observed.
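As an illustration of account provisioning over a REST interface, the sketch below builds the HTTP request for creating one account. The endpoint path and the `userid` / `password` form fields follow the publicly documented ownCloud provisioning API; whether the version used in this proof of concept exposed exactly this interface is an assumption, and the function name and credentials are hypothetical.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_add_user_request(base_url, admin, admin_password, userid, password):
    """Build (but do not send) a POST request that creates one user account."""
    data = urlencode({"userid": userid, "password": password}).encode("ascii")
    token = base64.b64encode(f"{admin}:{admin_password}".encode("utf-8"))
    request = Request(
        base_url.rstrip("/") + "/ocs/v1.php/cloud/users",  # assumed endpoint
        data=data,
        method="POST",
    )
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


# Hypothetical usage: prepare requests for the first three test accounts.
requests = [
    build_add_user_request(
        "https://cloud.example.com", "admin", "secret", f"user{i:06d}", "changeme"
    )
    for i in range(3)
]
```

Each request could then be sent with `urllib.request.urlopen`; creating 100000 accounts this way would simply repeat the call in a loop or a worker pool.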

Final Measurement

As mentioned before, the web service functions of interest for an ownCloud installation to be benchmarked are:

1. PROPFIND: remote client sync
2. HTTP_GET: list files and directories
3. WebDAV: list files and directories via WebDAV
4. Upload of files
5. Download of files

Figure 3 – Final results of the different application server functions of interest within an ownCloud instance. Each function was measured separately with a concurrency of 1000 parallel requests, except for upload and download, where 100 parallel requests were submitted. Both were sent to one application server.

The results of the final set of benchmarks of the PoC environment are shown in Figure 3. The system handles sync client PROPFIND requests best, followed by the HTTP_GET requests used to list the files and directories within the web portal. The number of requests an application server could handle via the WebDAV interface was about 30% behind PROPFIND, and the upload and download rates were both of the same order, at about 75 requests per second.

Assumptions and Considerations – The 100k User Scenario

In order to estimate the required infrastructure size of a system comparable to the one used here, a general assumption about the average user behavior, and therefore the distribution of access patterns, has to be made. Furthermore, to give a quantitative impression, in the following section we extrapolate the combination of assumed usage patterns and measured performance to 100000 active users and translate the results into hardware sizing numbers.

When ownCloud is used within an enterprise-grade environment, with the Sync & Share system as a central data hub in the daily work of each user, the desktop sync clients are of major importance. They run permanently to keep their files in sync with the server. With the default syncing rate, the desktop clients trigger a PROPFIND request every 30 seconds to check the synchronization status and to take action if needed. 100000 active desktop sync clients using the default setup lead to an average of 3333 PROPFIND requests per second. Using the benchmark results, this translates into slightly less than five reference application server nodes, which should be able to cope with the load generated by all desktop sync clients. Note that the synchronization frequency is adjustable. It is possible to absorb peak load times by automatically extending the synchronization interval.

Assuming that each user additionally uses a mobile device to connect to the ownCloud instance, with an average access rate of one mobile access per hour, this adds up to 28 requests per second for each request type, PROPFIND and HTTP_GET. According to our benchmarks, the performance of about a quarter of a reference application server is needed additionally. The web portal usage is assumed to be of the same order as mobile access. However, we replace the PROPFIND requests with the same number of additional HTTP_GET calls, resulting in 56 (2 * 28) HTTP_GET requests per second on the whole system. This again translates into about 0.25 of a reference application server.

Finally, file upload and download have to be taken into account. Here we anticipate average rates of one upload and two downloads per hour. The benchmark results lead to a required number of about 0.75 nodes for downloads and 0.1 nodes for uploads. We round that up to one node for the file transfers.

Overall, for the 100k user scenario, we count the five hosts for the desktop sync clients, one and a half hosts for web and mobile access and file transfers, and two database servers, plus a buffer of about 20% to absorb peak load times. A rounded number of ten Flex System nodes from Table 1 and the GSS 24 are assumed to cope with the expected load.
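The arithmetic behind this estimate can be retraced in a few lines. The sketch below only reuses figures stated in the text (100000 users, a 30 second sync interval, one mobile access per hour, the per-role node counts and the 20% buffer); the function names are introduced here for illustration, and the per-node throughput figures from the benchmark are not reproduced.

```python
def rate_per_second(users, interval_s):
    """Average request rate when each user issues one request per interval."""
    return users / interval_s


def total_nodes(node_counts, buffer_fraction):
    """Sum the per-role node counts and add a peak-load buffer."""
    return sum(node_counts.values()) * (1 + buffer_fraction)


USERS = 100_000

propfind_rate = rate_per_second(USERS, 30)    # desktop sync, every 30 s
mobile_rate = rate_per_second(USERS, 3600)    # one mobile access per hour

# Node counts as stated in the text above.
nodes = {
    "desktop sync clients": 5.0,
    "web, mobile and file transfers": 1.5,
    "database servers": 2.0,
}

print(int(propfind_rate))                 # 3333 PROPFIND requests per second
print(round(mobile_rate))                 # 28 requests per second
print(round(total_nodes(nodes, 0.20)))    # 10 nodes
```

The sum of 8.5 nodes plus the 20% buffer gives 10.2, which matches the rounded figure of ten Flex System nodes.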

Summary

The proof of concept described in this paper set out to estimate the IBM hardware infrastructure required for a large-scale ownCloud Enterprise setup with six-digit user numbers, and it successfully delivered a basis for the sizing of future joint ownCloud and IBM solutions. The numbers derived for the 100k user scenario lead to the conclusion that with a single GSS, one fully equipped Flex System chassis and the ownCloud Enterprise Edition it is possible to deliver an enterprise-level Sync & Share environment for more than 100000 users. We did not come close to the performance limit of the GSS, whereas the sizing of the application and database servers turned out to be crucial. The successful cooperation of ownCloud Inc. and IBM enables the delivery of customized solutions for our clients.

ownCloud provides enterprises with a highly scalable on-premises alternative to consumer-grade, cloud-based apps. By installing ownCloud on premises, fully integrated into existing identity management, security, governance, backup and disaster recovery tools, organizations can be in full control of their sensitive data. Furthermore, organizations have the flexibility to choose between on-premises storage, cloud, or a hybrid model to find the one which best matches their requirements.

Trademarks

IBM, the IBM logo, and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml. ownCloud and the ownCloud logo are registered trademarks of ownCloud, Inc. in the United States, other countries, or both. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both. Other company, product and service names may be trademarks or service marks of others.