MariaDB and MySQL at Web-Scale: Enabling New Use Cases With MariaDB MaxScale, an Intelligent Proxy
Author: Rich Sands
Last update: January 30, 2015
Contents
1 Abstract: Scaling New Heights: MariaDB and MySQL in Web-scale Applications
2 The Challenge: OSS Databases Must Answer to New Demands
3 Introducing MaxScale, a Modular, Intelligent Database Proxy
3.1 MaxScale Architecture
3.2 Services
3.2.1 Client Protocols
3.2.2 Back-End Protocols
3.2.3 Routing Modules
3.2.4 Authentication Modules
3.2.5 Filtering and Logging Modules
3.2.6 Monitoring Modules
4 Delivering Web-scale Scalability, Availability, Agility, and Performance
4.1 Read Scalability using MariaDB or MySQL Replication and Connection-based Load Balancing
4.2 Read-Write Scalability and HA Fail-Over using Galera Cluster and Connection-based Load Balancing
4.3 Read Scalability With MariaDB or MySQL Replication and Read/Write Splitting Using Statement-based Load Balancing
4.4 Query Logging for Performance Diagnostics Using MaxScale Logging Modules
4.5 Query Transformation for Legacy Application Compatibility Using MaxScale Regex Filter Modules
4.6 The Future: New Uses, New Flexibility for MariaDB and MySQL Through an Intelligent, Modular Proxy
5 MaxScale Flexibility In Action: Implementation of a Replication Relay Using a Binlog Router
6 MaxScale Today
7 Conclusion
1 Abstract: Scaling New Heights: MariaDB and MySQL in Web-scale Applications
The design and implementation of server-based applications has benefitted greatly from the evolution and maturation of open source operating systems, databases, and application frameworks. But there remain some significant challenges in ramping these mature, open source technologies to web scale. In particular, popular relational database technologies can struggle to meet web scale demands for availability, consistency, scalability, and development agility.
Imagine a modular infrastructure component sitting between clients and back-end databases that can hide the complexity of database clustering and dynamically translate requests and responses between disparate database technologies. A layer that can aggregate back-end resources in flexible ways, delivering:
● High availability, including coordination of multi-master synchronous clusters.
● Combinations of read and write scalability clustering that can be tailored to different applications’ requirements, while pushing the complexity into the database layer and keeping application architectures simple.
● Protocol translators, and filtering and transformation capabilities that let database architects leverage the most appropriate technologies and more easily integrate the database layer into today’s most sophisticated application architectures.
MariaDB MaxScale, a new intelligent database proxy, adds to the breadth and polish of the MariaDB and MySQL ecosystem, bringing these and other capabilities to the fore. An open source project on GitHub, MaxScale is available today as an optional add-on component for MariaDB Enterprise and MariaDB Enterprise Cluster, along with comprehensive support and professional services. It is ready for database administrators, architects, DevOps engineers, and developers to explore, deploy, and contribute to.
This report describes the challenges that MariaDB MaxScale addresses, its architecture, some useful clustering configurations available to try in its initial release, and a glimpse of future plans for this new technology. Read on for more information, and be sure to check out the MaxScale code today, and discover new tools to help you deploy MariaDB and MySQL for the new, demanding use cases common in web scale, agile, and advanced enterprise IT applications.
2 The Challenge: OSS Databases Must Answer to New Demands
The past 15 years have seen a revolution in how applications are built, deployed, and delivered, and open source infrastructure has been the principal driver of this revolution. The LAMP stack, which includes an OSS database such as MariaDB or MySQL, has become the de facto application infrastructure powering the global Internet, dramatically lowering costs, increasing flexibility, and eliminating vendor lock-in. But as quickly as these databases have evolved, the demands of today’s web-scale and mobile customer-facing applications are taxing even the latest releases of these core database components to the limit:
● Scale-out configurations often rely on clustering technologies such as MySQL master/slave replication and sharding that are hard to configure and manage. These approaches may force complex architectures on applications.
● Availability requirements demand additional clustering to eliminate single points of failure and minimize planned downtime, but making these clusters work alongside scale-out clustering is even more complex.
● Near-continuous application deployment makes downtime for database migrations, upgrades, and configuration changes too costly and impractical, making online updates mandatory.
● Applications may be deployed to on-premises, cloud, or hybrid infrastructure, and need to be flexibly moved or scaled based upon dynamic demand.
● Administrators, IT operations specialists, DBAs and developers need comprehensive logging and auditing to optimize performance, diagnose problems, manage security, and avoid outages.
● Multiple applications accessing the same data can have very different performance, consistency, and availability requirements, forcing compromises.
Consider a typical scenario for a massive multiplayer online game launching with fanfare and a high level of anticipation. These popular games see their maximum traffic early in their life, soon after launch. Every action that a player performs in game play results in database updates, generating a high volume of writes. The interactive nature of the game demands low latencies to deliver a satisfying experience to thousands of simultaneous players. Availability is critical, since players frustrated by system downtime may never return. Development and deployment agility is essential, because the developers building these games may need to quickly build, test, and deploy updates on the live production system to solve problems.
Such challenges aren’t unique to gaming. How do you handle the scaling challenge when your mobile app is featured in the App Store? Is your customer-facing web application resilient to disasters by maintaining replicated databases hosted in geographically separated data centers? Are you able to elastically provision more storage and processing to adapt to varying workloads, de-provisioning when capacity is no longer needed? Can your eCommerce application maintain customer security and privacy while scaling to meet the demands of peak holiday shopping days? If your start-up public cloud SaaS application takes off, how will you cope?
Applications aren’t getting any simpler. Demands on database technology are pushing the envelope. Even the most skilled application architects and developers reach their limit with the need to simultaneously meet these different requirements.
3 Introducing MaxScale, a Modular, Intelligent Database Proxy
In each of these challenging use cases, if we introduce a server between the application and the database servers to mediate access to database resources, we can simplify or solve these problems. Such proxy servers aren’t a new idea. But in the case of MySQL databases, proxies have typically been limited to very specific problem domains. For example, there are a number of database proxies on the market from companies like ScaleBase(1) and Tesora(2) that can help you implement scale-out replication architectures. HAProxy(3) can be used to distribute database connections over a number of MariaDB or MySQL instances, with queuing and failure detection to manage utilization and increase service availability.
A proxy server can make a collection of back-end servers appear to an application like a single, simple resource, hiding complexity and enabling use cases that otherwise would be hard to implement. For example, a database proxy could:
● Load balance both connections and individual requests.
● Log requests, then forward them to database servers for handling.
● Filter requests according to a set of rules, to implement access policies.
● Allow for more interoperable, federated authentication methods beyond basic database access control to grant application access.
● Scale large tables by implementing transparent sharding outside of the application.
● Translate and mediate between alternative query languages and protocols, letting applications designed to use different database technologies access the same resources or letting applications use different back-end databases without recoding.
● Monitor the availability, utilization, and load on back-end servers to prevent requests from going to overburdened or unavailable resources.
Because each proxy server adds network latency as an extra hop for requests to traverse, it is not practical to layer many different proxies together to implement more complex database architectures. The MariaDB and MySQL world needs a proxy that can be easily adapted using APIs and plug-ins to flexibly implement a wide range of different functions. This is the concept behind MariaDB’s newest tool, called MariaDB MaxScale.
MaxScale is a modular, database-centric proxy. Built around a high-performance networking core, MaxScale includes a set of APIs that allow a wide range of plug-ins to perform different functions. The system is designed to be easily extensible, with a low-latency, high-performance message switch at its center. Multiple MaxScale proxies can be deployed together to increase availability and eliminate single points of failure.
MaxScale sits between clients and back-end database servers, and to the client application appears as an ordinary database resource, but with enhanced scalability, availability, performance, and security characteristics. Applications may be built using popular frameworks and application servers, or may be designed using custom high-performance architectures for specialized purposes. As long as an application uses standard access methods to connect to and utilize database services, it can benefit from MaxScale’s intelligent proxy mechanisms.
Likewise, MaxScale connects to back-end database clusters through pluggable protocol modules that abstract their specific features. MaxScale may be co-located on the same systems as application servers, either in side-by-side virtual machine instances or in separate containers or processes. Co-location can reduce latencies by eliminating network hops, while retaining all the architectural advantages of a true proxy. MaxScale is designed to be flexible in how it is deployed, making it a powerful building block for sophisticated, scalable database architectures while retaining very low latencies.
3.1 MaxScale Architecture
MariaDB MaxScale’s core consists of a small-footprint, efficient event-driven message processor that dispatches incoming events to the plug-in modules linked to the core through APIs. Events in MaxScale are network requests such as:
● An incoming connection request on a listener socket.
● Incoming statements and data from clients.
● Returned data from back-end database servers to be forwarded to clients.
● Socket closures, or errors.
● Availability of additional connections.
Using the Linux epoll system call for event-driven I/O notification, MaxScale can efficiently manage a large number of open connections. As events trigger processing within the proxy, a state engine determines how MaxScale responds. For example, a “close” event on a database connection changes its state from open to closed, flushing buffers, freeing resources, and removing the connected database from MaxScale’s pool of available resources. Plug-ins register functions to be called upon specific state transitions, allowing MaxScale to be dynamically extended to handle a wide range of operations without altering the core, or requiring a system restart in most cases. This state-machine architecture also allows even complex processing workflows to be non-blocking, greatly increasing throughput and concurrency.
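The idea of plug-ins registering functions for state transitions can be illustrated with a small sketch. This is not MaxScale code and does not reflect its actual plug-in API; the class `Core`, the enum `ConnState`, and the method names are hypothetical, chosen only to show how a "close" event might trigger registered callbacks before the connection leaves the pool.

```python
from collections import defaultdict
from enum import Enum, auto


class ConnState(Enum):
    OPEN = auto()
    CLOSED = auto()


class Core:
    """Toy event dispatcher: plug-ins register callbacks for state transitions."""

    def __init__(self):
        # (old_state, new_state) -> list of plug-in callbacks (hypothetical design)
        self._hooks = defaultdict(list)
        self.pool = {}  # connection name -> current state

    def register(self, old, new, callback):
        """Ask to be called whenever a connection moves from old to new."""
        self._hooks[(old, new)].append(callback)

    def add_connection(self, name):
        self.pool[name] = ConnState.OPEN

    def handle_close(self, name):
        """Process a 'close' event: run the hooks, then drop the connection."""
        old = self.pool[name]
        for callback in self._hooks[(old, ConnState.CLOSED)]:
            callback(name)
        del self.pool[name]


core = Core()
closed_log = []  # a stand-in plug-in that just records closed connections
core.register(ConnState.OPEN, ConnState.CLOSED, closed_log.append)
core.add_connection("db1")
core.add_connection("db2")
core.handle_close("db1")
```

Because the core only looks up callbacks by transition, new behavior can be attached without changing the dispatch loop, which is the property the paragraph above describes.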
3.2 Services
MariaDB MaxScale implements a database service on behalf of clients by combining a set of one or more client-facing protocol/port pairs, a routing module, and a set of back-ends. The protocol/port pairs define how clients connect to MaxScale and issue operation requests. The query routing module implements an algorithmic policy that assigns incoming requests to the back-ends best able to process them. The back-ends in turn receive the requests routed through MaxScale, process them, and return results to be passed back to the clients. Optional filtering and logging modules can be added to processing as well, mirroring selected operations, transforming them dynamically, and blocking them or altering routing decisions as appropriate. This modular architecture allows MaxScale to combine multiple technologies and large numbers of back-end clustered servers into services that transparently deliver complex, high-value features without the need to modify client applications.
3.2.1 Client Protocols
Client protocols allow applications built to interface with specific database technologies such as MySQL to connect to alternative databases without application changes. Initially, MaxScale supports only MySQL connectors, which are also compatible with MariaDB. In the future, MaxScale may support additional client protocols such as JSON over HTTP, BSON, or even connectors to commercial databases, mapping such client requests into back-end operations on alternative database clusters transparently to the application.
3.2.2 Back-End Protocols
Back-end protocols abstract different database products and clustering technologies, eliminating the need to make applications cluster-aware. Developers can build applications without regard to clustering, and MaxScale will map simple client operations into cluster-aware operations on multiple back-end database servers, executing these operations in parallel and aggregating the results on behalf of client applications. MaxScale includes a protocol module that supports the following back-end database clusters:
● MySQL
● MariaDB
● Galera Cluster on both MariaDB and MySQL
3.2.3 Routing Modules
Routing modules implement the core proxy algorithms that allow MaxScale to aggregate different back-end clusters into sophisticated database services. Routing modules can implement different load balancing, availability, and security policies tailored to the needs of applications. Initially, there are two classes of routing algorithms available within MaxScale:
Connection-based Routing
MaxScale can establish a route for a particular client connection when that connection is first established. The proxy does not examine individual requests flowing on such a connection, and once the connection between a client and back-end database is established, it remains in place until the connection is terminated. Applications may be designed to split reads and writes within the application to different connections, or may use Galera Cluster to handle both reads and writes on any server. Either way, connection-based routes have low processing overhead, and can efficiently spread a web-scale read load across a large replication hierarchy.
Statement-based Routing
Alternatively, MaxScale can examine each request coming in over a connection and route requests based on the semantic content of each statement. One example included in MaxScale determines whether a request should be sent to a read/write server (master), a read-only server (slave), or to both, by evaluating whether each SQL statement might update the database. An application can be written as though it is communicating with a single instance of a database, and MaxScale will handle scale-out transparently.
The Read/Write Splitter can be used both with MariaDB and MySQL Replication and with Galera Clusters. When used with Galera Clusters, MaxScale’s Galera Monitor module elects one of the database nodes as a master for the purpose of read/write splitting. Because Galera implements fully synchronous replication, failover for a down master server is very quick, making such a configuration highly available. With the Read/Write Splitter coupled with Galera Clusters, applications can be designed and implemented very simply, but take advantage of sophisticated database architectures.
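As a rough illustration of statement-based routing, the sketch below classifies a SQL string as destined for the master, a slave, or both. It is a deliberately naive, keyword-based approximation and not the Read/Write Splitter's actual logic, which this paper does not detail. The function name `route_target`, the keyword lists, and the choice to send `SET` and `USE` to both targets are assumptions made for the example.

```python
import re

# Hypothetical, conservative rules: only statements that begin with a known
# read-only keyword go to a slave; anything unrecognized goes to the master.
_READ_ONLY = re.compile(r"^\s*(SELECT|SHOW|DESCRIBE|EXPLAIN)\b", re.IGNORECASE)
_LOCKING_READ = re.compile(
    r"\bFOR\s+UPDATE\b|\bLOCK\s+IN\s+SHARE\s+MODE\b", re.IGNORECASE
)
# Assumption: session-level statements must reach every server in use.
_SESSION = re.compile(r"^\s*(SET|USE)\b", re.IGNORECASE)


def route_target(sql: str) -> str:
    """Return 'slave', 'master', or 'both' for a single SQL statement."""
    if _SESSION.match(sql):
        return "both"
    if _READ_ONLY.match(sql) and not _LOCKING_READ.search(sql):
        return "slave"
    return "master"
```

A real router needs a proper SQL parser: a `SELECT` that calls a stored function, for instance, can modify data, and this sketch would misroute it. Defaulting unknown statements to the master keeps the naive version safe rather than fast.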
Both of these routing methodologies can be fine-tuned by assigning a weighting to each server, per MaxScale service. For example, you can assign specific slave servers to manage requests from your eCommerce web application, and other slaves to handle reporting, by weighting these servers differently for each service. In the event of a failure, however, MaxScale automatically uses available resources to fill in until the failed system is back online. Other tuning parameters include slave lag, so that you can be assured of a minimum level of consistency in read responses. The Read/Write Splitter can also use load sensing and dynamic sensing of slave consistency lag to route requests to the most lightly loaded server or to the least-lagged slave, and to place an upper bound on allowable lag in consistency for slaves.
These two routing modules are just the start. MaxScale’s message processing foundation and modular plug-in architecture make it possible to build routing modules that implement advanced load-balancing, high-availability failover, sharding, and other sophisticated database technologies to meet a broad range of application requirements.
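The interaction of per-service weights, a lag bound, and fill-in on failure might look like the following sketch. The `Slave` record, the `pick_slave` function, and the rule that zero-weight servers are used only as a fallback are all assumptions for illustration; the paper does not specify how MaxScale combines these parameters.

```python
import random
from dataclasses import dataclass


@dataclass
class Slave:
    name: str
    weight: int          # relative share of this service's traffic (0 = spare)
    lag_seconds: float   # replication lag as reported by a monitor
    up: bool = True


def pick_slave(slaves, max_lag, rng=random):
    """Pick a running slave within the lag bound, in proportion to its weight.

    Zero-weight servers are used only when no weighted server is eligible,
    which models "filling in" while a preferred server is down.
    """
    eligible = [s for s in slaves if s.up and s.lag_seconds <= max_lag]
    preferred = [s for s in eligible if s.weight > 0]
    if preferred:
        weights = [s.weight for s in preferred]
        return rng.choices(preferred, weights=weights, k=1)[0]
    if eligible:
        return rng.choice(eligible)
    return None  # the caller would fall back further, e.g. to the master
```

For example, with a web slave of weight 3, a reporting slave of weight 0, and a second web slave that is 30 seconds behind, a 5-second lag bound sends web traffic to the first server only, and to the reporting server if the first one fails.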
3.2.4 Authentication Modules
In addition to protocol translation and proxy routing modules, MaxScale offers proxy authentication of clients for database access. When a client connects to MaxScale, it authenticates and gains authorization to perform database operations with MaxScale. The proxy, in turn, authenticates with the back-end databases in the clusters it manages. MaxScale can reduce the load on back-end database servers by routing many client connections through a much smaller number of back-end connections, off-loading much of the authentication processing. MaxScale’s authentication modules dynamically update their tables as authorizations change. Stale authorization data doesn’t affect access, and is automatically refreshed as necessary.
Authentication proxying also opens the database layer to alternative authentication mechanisms such as Kerberos, OpenStack Keystone, or other authentication technologies better suited to cloud deployments. Alternative authentication mechanisms are not yet available but may be added in the future.
3.2.5 Filtering and Logging Modules
MariaDB MaxScale can log, audit, and filter individual client requests as they are routed through the proxy, enabling a whole range of management and security capabilities. For example, MaxScale can direct matching statements into log files according to specific criteria, and can also split the stream of requests and send them to multiple back-end services, acting as a “tee” splitter for database operations. Available filters can match statements that:
● Match all criteria (pass-through).
● Match a regular expression.
● Take longer than a specified time to execute.
● Are initiated by a particular user or client IP address.
● Trigger exceptions.
Filtering works both on inbound statements and on returned results from the back-end. Filters can be combined and chained together into a processing flow that allows for complex filtering logic. Regular expression filters can specify replacement strings which MaxScale can substitute for matched strings, a simple but powerful form of query rewriting and transformation that opens up a number of sophisticated use cases. For example, MaxScale could allow applications written for different dialects of SQL or older, incompatible versions of MySQL to successfully interact with back-end databases by dynamically matching the incompatible SQL statement strings and substituting equivalent, compatible language elements.
Beyond logging and auditing, MaxScale can implement a configurable firewall that blocks unauthorized access to records by clients identified either specifically or by regular expression pattern. This database-level firewall capability can prevent sensitive data from leaking to the wrong parties, without forcing applications to manage a complex matrix of roles and access levels, and offers more control than is possible with the database grant mechanism. MaxScale’s logging capabilities allow application developers to troubleshoot issues as they happen, collect statistics, and characterize performance under different load conditions.
3.2.6 Monitoring Modules
MaxScale must maintain detailed state information about the back-end databases in its resource pool in order to correctly route requests, manage failures, and dynamically add and delete resources in cloud environments. MaxScale supports two monitoring modules. These are:
MySQL/MariaDB Replication Monitor
● Can monitor both simple master-slave and also complex replication topologies including tree-structured, hierarchical architectures in which some servers are simultaneously both masters and slaves.
● Keeps track of the degree of replication lag for each slave server, which is useful in statement-based routing modules such as the read/write splitter to place an upper bound on potential read inconsistency.
Galera Monitor
● Can monitor the cluster status of all nodes in a Galera cluster, identifying those which are active, inactive, and not synced, and thus not a part of the active cluster.
● Can nominate one node of a Galera cluster as the master for the read/write splitter routing module, thus allowing a Galera cluster to operate in a simple HA mode with very fast failover in the event of master failure for applications unable to use the multi-master synchronous replication of Galera directly.
In future releases, MaxScale’s monitoring plug-in API, in conjunction with new Protocol modules, will allow the system to support additional back-end databases and access protocols.
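The Galera Monitor's two jobs, identifying synced nodes and nominating one of them as master, can be sketched as follows. The sketch assumes each node's status has already been fetched into a dictionary keyed by the Galera status variables `wsrep_local_state` (where 4 means Synced) and `wsrep_local_index`. Choosing the lowest index as master is an illustrative tie-breaker, not a rule stated in this paper, and the function names are hypothetical.

```python
SYNCED = 4  # Galera reports wsrep_local_state = 4 for a Synced node


def usable_nodes(status_by_node):
    """Return the names of nodes that are synced and can accept traffic."""
    return sorted(
        name
        for name, status in status_by_node.items()
        if status.get("wsrep_local_state") == SYNCED
    )


def nominate_master(status_by_node):
    """Pick one synced node as the write master (lowest wsrep_local_index)."""
    synced = [
        (status["wsrep_local_index"], name)
        for name, status in status_by_node.items()
        if status.get("wsrep_local_state") == SYNCED
    ]
    return min(synced)[1] if synced else None
```

Re-running `nominate_master` on every monitoring pass gives the simple HA behavior described above: when the current master leaves the synced set, the next pass selects another synced node.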
4 Delivering Web-scale Scalability, Availability, Agility, and Performance
MariaDB MaxScale’s ability to proxy, translate, filter, and log client requests through modular plug-ins gives the system great flexibility in implementing specific application-driven use cases. With available, off-the-shelf plug-ins, MaxScale can facilitate some common use cases in MariaDB and MySQL clustering.
4.1 Read Scalability using MariaDB or MySQL Replication and Connection-based Load Balancing
Many web applications have a high ratio of read operations to updates. MariaDB or MySQL replication can help scale out read capacity by asynchronously replicating updates from a master server to a number of slave database servers that in turn can assume a share of the read request load. MaxScale simplifies this configuration by managing both the read and update connections for clients and, using connection-based routing, load-balancing clients across any number of slave database servers with a round-robin balancing mechanism. In such a configuration, client applications are “replication aware”. They are designed to split updates and read requests across different database connections. Update requests must be directed to a master server, and read requests may be directed to any one of many slave servers for processing.
Connection-based load balancing does not examine the individual requests on a connection, and all of the routing computation happens once per connection set-up, making it particularly lightweight and high-performance. In the event of a failure of a slave server, MaxScale detects the problem and automatically removes the failed resource from the round-robin load balancing pool. Should the master server fail, MHA (Master High Availability), a separate tool, provides fail-over protection by promoting one of the slave servers to be the new master. MaxScale tracks these changes as well, adjusting its routing algorithms and resource pools accordingly.
Connection load balancing can be adjusted using weighting parameters to optimize loads across different-capacity servers. MaxScale can also use server weights to direct the traffic for certain applications, such as analytics, to specific back-end slave servers, while maintaining fail-over to alternative slaves in the event of a problem.
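A minimal sketch of round-robin connection-based balancing with failed servers removed from the pool is shown below. The class `RoundRobinRouter` and its methods are hypothetical names for this example; in MaxScale the health information would come from a monitor module rather than explicit `mark_down` calls.

```python
class RoundRobinRouter:
    """Assigns each new client connection to the next healthy slave in turn."""

    def __init__(self, slaves):
        self._slaves = list(slaves)
        self._healthy = set(self._slaves)
        self._next = 0  # index of the next candidate to try

    def mark_down(self, server):
        """Remove a failed server from the balancing pool."""
        self._healthy.discard(server)

    def mark_up(self, server):
        """Return a recovered server to the pool."""
        if server in self._slaves:
            self._healthy.add(server)

    def connect(self):
        """Choose a back-end once, at connection set-up; None if none is healthy."""
        for _ in range(len(self._slaves)):
            candidate = self._slaves[self._next]
            self._next = (self._next + 1) % len(self._slaves)
            if candidate in self._healthy:
                return candidate
        return None
```

The routing decision is made only in `connect`, so the per-request cost after set-up is zero, which is why this style of balancing is described as lightweight.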
4.2 Read-Write Scalability and HA Fail-Over using Galera Cluster and Connection-based Load Balancing
With MariaDB or MySQL Replication, applications must be “replication aware”. This puts an additional burden on developers, and may slow down the integration of new features or dynamic provisioning of resources in the event of demand spikes. In addition, MySQL Replication does not deliver true high-availability fail-over with no single point of failure. Galera cluster, available from MariaDB and also integrated into MariaDB Galera Cluster, solves both of these problems by implementing a multi-master, synchronous replication technology on top of MariaDB or MySQL. But load-balancing a Galera cluster requires the load-balancing method to be tightly integrated with the Galera cluster in order to monitor server status and adjust resource pools as node states change. MaxScale’s support for Galera clustering in its back-end protocol handlers provides this integration.
For applications able to use the “all-master” capabilities of Galera cluster, each client uses a single connection to the database layer through MaxScale, for both read and update operations. Application design is simpler, and MaxScale can load balance across the multiple master servers in the cluster using round-robin connection-based load balancing. MaxScale monitors the node status of every server in the Galera cluster, and only load balances connections from clients to nodes which are joined and fully synchronized with the cluster, eliminating “slave lag”, and allowing for fully redundant, high availability clustering that accommodates nodes dynamically joining and leaving the cluster to perform rolling updates and migrations as well as backups.
4.3 Read Scalability With MariaDB or MySQL Replication and Read/Write Splitting Using Statement-based Load Balancing
For applications that are not designed to split reads and updates across two different connections, and that require read scalability, MaxScale’s statement-based routing capabilities can do the read/write splitting on behalf of the application. Such applications communicate with the database layer through MaxScale, using a simple, single-connection model for all database requests. MaxScale examines each SQL statement request from clients, and dynamically determines whether each request should be routed to the master server or round-robin load balanced to one of the slave servers. MaxScale’s sophisticated proxy capabilities push the complexity of read/write splitting into the database layer, bringing read scalability to a wider range of applications and eliminating the costly and complex work needed to make applications “replication aware”. As with the previous two examples, MaxScale continuously monitors the status of each server in the back-end cluster and routes requests only to servers that are fully operational and ready to process them.
4.4 Query Logging for Performance Diagnostics Using MaxScale Logging Modules
MaxScale filters can be configured to write all or a subset of queries into log files without altering the statements being processed by the proxy. Imagine a DevOps engineer utilizing this capability to diagnose performance issues across a web-scale application deployment. Using several MaxScale filters, the engineer could record all queries handled by MaxScale, as well as the top N longest-running queries, into separate log files. MaxScale would then:
1. Accept a query from a client application.
2. Forward the query to the back-end database, log it into the “All Queries” log file, and log the query along with a timestamp into the “Long Running Queries” log file.
3. Receive the result from the back-end.
4. Forward the result to the client, log the result into the “All Queries” log, and if the timestamp difference between the result and corresponding query shows that the query was one of the top N longest-running queries, then log the result and timestamp into the “Long Running Queries” log file.
Analysis of these logs could provide the engineer with detailed records of queries running in production which impact performance as seen by the client. Such filtering and logging coexists with other modules within MaxScale. A similar approach could allow MaxScale to automatically forward queries to both the operational back-end database for online client transaction processing, and to an analytics data warehouse for reporting and analysis.
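The logging flow in the numbered steps can be approximated in a few lines. The sketch below simplifies it: in-memory lists stand in for the two log files, and durations are measured around a callable `backend` rather than derived from separate query and result timestamps. The class `QueryLogger` and all of its names are hypothetical.

```python
import heapq
import time


class QueryLogger:
    """Records every query and keeps the N slowest seen so far."""

    def __init__(self, top_n, clock=time.monotonic):
        self.top_n = top_n
        self.clock = clock          # injectable so the timing can be tested
        self.all_queries = []       # stands in for the "All Queries" log file
        self._slowest = []          # min-heap of (duration, sequence, sql)
        self._seq = 0               # tie-breaker for equal durations

    def run(self, sql, backend):
        """Forward sql to backend (a callable), log it, and return the result."""
        started = self.clock()
        result = backend(sql)
        duration = self.clock() - started
        self.all_queries.append(sql)
        self._seq += 1
        entry = (duration, self._seq, sql)
        if len(self._slowest) < self.top_n:
            heapq.heappush(self._slowest, entry)
        else:
            # Keep only the top N: push the new entry, drop the fastest one.
            heapq.heappushpop(self._slowest, entry)
        return result

    def long_running(self):
        """Return (sql, duration) pairs, slowest first."""
        return [(sql, d) for d, _, sql in sorted(self._slowest, reverse=True)]
```

A bounded min-heap keeps memory proportional to N regardless of how many queries pass through, which matters for a proxy that handles production traffic.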
4.5 Query Transformation for Legacy Application Compatibility Using MaxScale Regex Filter Modules
Sometimes it is not practical to modify an application’s source code to make it compatible with the latest version of MariaDB or MySQL. For example, a legacy application may have been written using the old, deprecated “CREATE TABLE … TYPE=” syntax from MySQL 5.1. MaxScale could substitute “ENGINE” for “TYPE” in all “CREATE TABLE” statements from that application, using regular expression matching and replacement. In such a scenario, MaxScale would:
1. Accept a query from the client application.
2. If the query matches the regular expression “/CREATE TABLE/”, substitute “ENGINE” for “TYPE” in that statement; otherwise pass the statement through the filter unchanged.
3. Forward the potentially transformed statement to a more recent back-end database.
4. Receive the result from the back-end.
5. Forward the result to the client.
Such query transformations could be used to handle deprecated syntax, or even translate between alternative dialects of SQL supported by different databases. Complex processing pipelines including multiple filters that can split and transform queries are possible.
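As a rough illustration of the match-and-replace step, the following Python sketch rewrites the deprecated table option. It is not the MaxScale regex filter itself: the function name rewrite_statement and both patterns are assumptions made for this example, and the approach is deliberately naive (it does not parse SQL, so a "TYPE=" inside a string literal or comment would also be rewritten).

```python
import re

# Only statements that begin with CREATE TABLE are candidates for rewriting.
CREATE_TABLE = re.compile(r"^\s*CREATE\s+TABLE\b", re.IGNORECASE)
# The deprecated table option, e.g. "TYPE=MyISAM" or "TYPE =InnoDB".
TYPE_OPTION = re.compile(r"\bTYPE\s*=", re.IGNORECASE)


def rewrite_statement(sql):
    """Return sql with TYPE= replaced by ENGINE= in CREATE TABLE statements."""
    if CREATE_TABLE.search(sql):
        return TYPE_OPTION.sub("ENGINE=", sql)
    # Every other statement passes through the filter unchanged.
    return sql
```

For example, rewrite_statement("CREATE TABLE t (id INT) TYPE=MyISAM") yields "CREATE TABLE t (id INT) ENGINE=MyISAM", while a SELECT that happens to compare a column named type is left alone because it fails the first match.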
4.6 The Future: New Uses, New Flexibility for MariaDB and MySQL Through an Intelligent, Modular Proxy
By combining different plug-ins for protocols, filtering or transforms, routing, monitoring, and authentication, MariaDB MaxScale can adapt the cost-effective, open source MariaDB and MySQL database technologies to handle the most challenging web and enterprise IT applications. One planned enhancement will allow multiple MaxScale proxies to communicate and coordinate client requests across even larger database cluster configurations. Such configurations likely will deliver even more scalability and availability, including geographic replication for both caching and disaster resilience. Some additional ideas possible with MaxScale’s modular architecture:
Protocols
● Mapping application requests between NoSQL and relational models and back-end databases, simplifying application development and letting DevOps engineers mix and match technologies to gain scale and cost efficiencies.
● Simplifying database migrations.
● Reducing the complexity of ETL, offering ETL alternatives, or potentially eliminating ETL operations in big-data analytics applications.
Routers
● Combining replication technologies into even more scalable and available clustering configurations – for example combining Galera cluster for updates with MariaDB 10 Replication for read scalability.
● Transparently pushing sharding into the database layer, splitting large tables using sharding technology and allowing for joins that span multiple shards.
● Implementing geographic replication for disaster resilience by coordinating multiple MaxScale proxies and duplicate clusters located in different data centers.
● Facilitating multivariate (A/B) testing methodologies by routing requests to alternative databases and logging results based on statistical test algorithms.
● Simplifying Big Data analytics by delivering real-time operational transactions to a Hadoop-based analytics infrastructure.
● Accelerating replication and offloading master database nodes through implementation of a binlog relay.
Filters and Transforms
● Adding intelligent firewall capabilities to the database layer based on characteristics such as role, location, time, frequency of request, whitelists/blacklists, and other criteria, simplifying PCI or HIPAA compliance.
● Obfuscating private data in the database layer through transforms, implementing a key PCI compliance feature within the database layer. Such obfuscation replaces sensitive data with characters such as asterisks, preventing inadvertent disclosure of financial information such as credit card numbers as a result of application errors, for example.
Authentication Modules
● Mapping user identities authenticated through higher-level mechanisms such as PAM, LDAP, OAuth, and cloud-based authentication protocols such as OpenStack Keystone or AWS IAM to back-end database identities and ACLs for database objects.
Monitoring Modules
● Automatically managing how back-end nodes are included or excluded from routing resource pools based on their monitored state.
● Driving intelligent alerting for database layer issues based on whole-cluster monitoring and decision support.
● Dynamically reconfiguring clusters to support on-line, zero downtime migrations, updates, backups, and server upgrades.
MaxScale’s modularity will enable use cases that we haven’t imagined. We hope and expect that published APIs for dynamically integrating new plug-ins, along with an open source core technology base, will enable an active co-development and user community to form around MaxScale. We think that this exciting architecture has great potential to expand the application of the world’s most popular open source database technologies.
5 MaxScale Flexibility In Action: Implementation of a Replication Relay Using a Binlog Router
Web-scale applications using MariaDB or MySQL replication can run into some interesting issues with this technology. For example, Booking.com, a well-known and successful hotel booking site, often uses very wide replication topologies with 50-100 slaves replicating from a single master database. With MySQL replication, every update is requested by every slave, which can drive so much traffic from the master to the slaves that it saturates even very high-speed network interfaces deployed in parallel. Booking.com asked the MariaDB team to investigate whether MariaDB MaxScale could be deployed between the master and slave servers in a replication architecture, rather than between clients and back-end databases as is ordinarily the case. When used in this way, MaxScale relays binlog records between master and slave databases.
Because MaxScale’s message processing core operates asynchronously by implementing a state machine for each service, the team was able to build a non-blocking router using the standard MaxScale router APIs which treats the slaves as clients, the master as the back-end, and which stores and forwards binlog records to the slaves as they request them. MaxScale offloads replication processing from the master database server and allows standard MariaDB/MySQL replication to scale to much larger deployments and very high “fan-out” ratios of master to slaves.
How does this MaxScale binlog relay work?
1. MaxScale registers as a slave with the master database.
2. The master then streams binlog records to MaxScale as it would to any slave.
3. MaxScale stores these binlog records so that it can forward them to the actual slave databases as necessary.
4. The slave servers register with MaxScale, which behaves exactly as a master would.
5. MaxScale streams stored binlog records to the slaves, either to catch them up to the master or to maintain their consistency, all without imposing additional load on the master database.
A prototype binlog router has been delivered to Booking.com by the MariaDB engineering team, and is already showing promising performance and scalability when used as a replication relay. MaxScale was not originally designed to be used as a replication relay, but its flexible, high performance architecture is up to the task.
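The store-and-forward idea behind the relay steps can be modeled with a small Python class. This is a conceptual sketch under simplifying assumptions, not the binlog router delivered to Booking.com: the class BinlogRelay and its methods are hypothetical, records are opaque values, and a position is a plain list index rather than a real binlog file name and byte offset.

```python
class BinlogRelay:
    """Acts as a slave toward the master and as a master toward the slaves."""

    def __init__(self):
        self._events = []      # stored binlog records, in arrival order
        self._positions = {}   # slave name -> index of the next record to send

    def receive_from_master(self, event):
        # The master streams a record and the relay stores it.
        self._events.append(event)

    def register_slave(self, name, start=0):
        # A slave registers and states where it wants to resume.
        self._positions[name] = start

    def poll(self, name):
        # Send everything this slave has not yet seen, then advance it.
        start = self._positions[name]
        pending = self._events[start:]
        self._positions[name] = len(self._events)
        return pending
```

However many slaves register, the master sends each record exactly once (to the relay); the per-slave fan-out cost is paid by the relay instead.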
Booking.com and MariaDB worked together to create this novel solution to a problem at the bleeding edge of web-scale application architecture. The MaxScale binlog router is a great example of what is possible when a sophisticated customer works with a world-class database engineering team. Do you have very challenging problems to solve in cost-effectively scaling your applications? Could a general-purpose, modular database proxy be a powerful new solution to these problems? Would you like help from the team that created MariaDB and MySQL to solve your complex database challenges? Get in touch with your MariaDB sales professional today to open a conversation and see how the MariaDB team can solve your toughest database problems.
6 MaxScale Today
The MaxScale proxy server is available now, as an open source project on GitHub. The project includes the following capabilities, which also are available in the product’s supported GA release:
● Support for both physical hardware and virtualized environments, and the following operating systems:
  o CentOS 5, 6, 7, 64-bit.
  o RHEL 5, 6, 64-bit.
  o Fedora 19, 20, 64-bit.
  o Debian 6, 7.
  o Ubuntu 12.04 LTS, 13.10, 14.04, 64-bit.
  o OpenSuse 13.1 (coming soon).
● Installation and deployment using standard commands and text-based configuration files.
● A command-line administration tool supporting “shell mode” with command history and editing, individual command-line processing, and standard integration with Linux shells through #! syntax.
● Complete reference and how-to documentation.
● Pacemaker + Heartbeat HA resource management support.
● The MaxScale message processing core, based on the high-performance Linux epoll system call. Non-blocking state machine. Basic APIs for integrating modules.
● Protocol Modules:
  o MariaDB and MySQL client protocols.
  o MariaDB and MySQL back-end protocols.
  o Back-end modules for MariaDB and MySQL replication clusters with MHA fail-over.
  o A back-end module for Galera multi-master, synchronous high availability clusters.
● Authentication Modules:
  o MySQL native authentication module - proxies client access requests.
● Monitor Modules:
  o MariaDB and MySQL replication hierarchy monitor module, supports:
    ▪ Simple replication.
    ▪ Tree-structured hierarchical replication topologies.
    ▪ MHA fail-over.
    ▪ Sampling and monitoring of replication lag status of slaves.
    ▪ Server maintenance mode.
  o Galera cluster monitor module, supports:
    ▪ Cluster membership and synchronization status for each member server.
    ▪ May be configured to elect a single “master” server for use with both client applications and MaxScale routing modules designed for master/slave replication, but with fast, reliable fail-over and no write contention.
    ▪ Server maintenance mode.
● Routing Modules:
  o Connection-based routing module, supports:
    ▪ Round-robin distribution of connection requests to master, slave, or Galera synced cluster nodes.
    ▪ Supports Galera clusters in both master/slave and multi-master, synchronous configurations.
    ▪ Tree-structured complex replication topologies.
    ▪ Weighting parameters for individual servers and for overall services may alter round-robin connection distribution.
    ▪ Server maintenance mode.
  o Statement-based Read/Write Splitter routing module, supports:
    ▪ Parsing, evaluation, and routing of individual SQL statements to read/write (master), read/only (slave), or to all servers simultaneously.
    ▪ Handles routing of prepared statements.
    ▪ Supports Galera clusters in both master/slave and multi-master, synchronous configurations.
    ▪ Round-robin statement distribution, or distribution to the server with the fewest overall or individual service connections, the fewest active operations, or the least replication lag.
    ▪ Tree-structured, complex replication topologies.
    ▪ Slave fault tolerance - resilient to failure of slaves for individual statements.
    ▪ Server and service weighting parameters may alter round-robin distribution.
    ▪ Optional maximum acceptable lag parameter may override routing decisions.
● Filtering and Logging Modules, including the filter and log processing pipeline and:
  o QLA query logging, writing copies of queries to user/connection specific log files, with optional user name, source address, and regular expression filtering.
  o Tee query splitter, copying queries to an additional MaxScale service, with optional user name, source address, and regular expression filtering.
  o Top query logging, retaining the longest running N queries in a buffer, and writing them to a connection-specific log file when the connection is closed. Also may be filtered by user name, source address, and regular expressions.
  o Regex query transforming filter, which matches and replaces content according to regular expressions, allowing SQL statements to be altered as they pass through MaxScale.
  o Statement counting filter, counts the number of SQL statements relayed through MaxScale.
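To make the statement-based Read/Write Splitter entry in the list above more concrete, here is a simplified Python model of the routing decision, including a maximum acceptable lag. It is a sketch under stated assumptions rather than the module's actual logic: the class ReadWriteSplitter, the READ_ONLY pattern, and the lag dictionary are all hypothetical, and classifying statements by their first keyword ignores cases a real parser must handle (transactions, SELECT ... FOR UPDATE, session state).

```python
import itertools
import re

# Assumption: only statements starting with SELECT or SHOW count as reads.
READ_ONLY = re.compile(r"^\s*(SELECT|SHOW)\b", re.IGNORECASE)


class ReadWriteSplitter:
    def __init__(self, master, slaves, max_lag=None):
        self.master = master
        self.slaves = slaves      # dict: slave name -> replication lag in seconds
        self.max_lag = max_lag    # None disables the lag check
        self._counter = itertools.count()

    def route(self, sql):
        """Return the name of the server that should run this statement."""
        if not READ_ONLY.match(sql):
            return self.master    # writes always go to the master
        eligible = [name for name, lag in self.slaves.items()
                    if self.max_lag is None or lag <= self.max_lag]
        if not eligible:
            return self.master    # no slave is fresh enough, fall back
        # Round-robin across the slaves that passed the lag check.
        return eligible[next(self._counter) % len(eligible)]
```

Falling back to the master when every slave exceeds the lag limit trades read scalability for freshness; a deployment that prefers stale reads over master load would choose differently.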
Beyond MaxScale V1.0, the development team is working on extending MaxScale in new directions. If you would like to experiment with some of these technologies with early releases, you will find the code available on GitHub. Some capabilities under development now include:
● Integration with WebYog’s MONyog database monitoring application, included with MariaDB Enterprise.
● A Galera-specific statement-based routing module that works with the cluster’s optimistic locking concurrency control to minimize the likelihood of transaction failures due to deadlocks.
● Integration with MariaDB Enterprise features such as the Notification Service.
● Schema-based Sharding: a router that detects explicit USE DATABASE statements or database qualifiers in connection strings and routes client statements to a particular cluster (customer-requested feature).
● Hierarchical routing architectures: using multiple routing modules in a pipeline - for example a sharding router to determine a shard cluster and a load-balancing router within that cluster.
● HTTPD Protocol module for testing REST-style client interfaces.
● A hinting mechanism allowing applications to dynamically control statement-based routers.
● A monitor for MySQL Cluster configurations.
● Filtering to RabbitMQ mailboxes.
● A multi-master monitor plugin.
● A Binlog router for configuring MaxScale as a high performance replication relay.
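The schema-based sharding idea in the list above can be illustrated with a short Python sketch. Because that router is described only as under development, everything here is an assumption: the class SchemaRouter, the USE_STMT pattern, and the shard map are invented for illustration, and the sketch handles only explicit USE statements, not database qualifiers in connection strings.

```python
import re

# Matches statements such as "USE cust_a;" or "use `cust_a`".
USE_STMT = re.compile(r"^\s*USE\s+`?(\w+)`?\s*;?\s*$", re.IGNORECASE)


class SchemaRouter:
    def __init__(self, shard_map, default):
        self.shard_map = shard_map   # database name -> cluster name
        self.default = default       # cluster used for unknown databases
        self.current = default       # cluster selected for this session

    def route(self, sql):
        """Return the cluster that should receive this statement."""
        match = USE_STMT.match(sql)
        if match:
            # Switch the session to the cluster that owns the named database.
            self.current = self.shard_map.get(match.group(1), self.default)
        return self.current
```

One router instance per client session keeps the selected cluster sticky, so statements that follow a USE go to the same shard until the session switches databases again.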
MariaDB MaxScale development is very much focused on the most critical issues our customers face in deploying web-scale applications. If you have ideas, suggestions for new or improved features, problem statements you’d like us to be aware of and consider, or code you would like to contribute, the team is eager to engage with you. Please contact us as we describe below, and participate in the evolution of this exciting new technology.
The project is licensed under GPLv2, and we encourage interested contributors to use GitHub’s features to fork the project, evaluate it, improve and enhance it, and contribute those enhancements back upstream through pull requests.
● GitHub project: https://github.com/mariadb-corporation/MaxScale
If you have questions about the code, or want to interact with the development team, you can join the conversation:
● Developer forum: https://groups.google.com/forum/#!forum/maxscale
● IRC channel: #maxscale on freenode.net
If you would like to download a pre-built and ready to try binary of MaxScale, you can visit the product download page on the mariadb.com website at:
● https://mariadb.com/my_portal/download or
● https://github.com/mariadb-corporation/MaxScale
The links to download binaries for supported operating systems are available on that page.
We have prepared a collection of technical documents explaining the MaxScale architecture in more detail, and some how-to guides on testing it out in prototype installations. In addition, there are
several blog posts from both MariaDB and early evaluators of MaxScale, explaining the system in more technical detail. These documents can be found at:
Documentation: https://www.mariadb.com/resources/guides-whitepapers
Blog Roll: https://mariadb.com/blog-tags/maxscale
http://markriddoch.blogspot.co.uk/
MariaDB MaxScale is a powerful product with its included plug-in modules and can solve many complex database architecture issues right out of the box. But MaxScale is much more than the pre-packaged modules included with the product. Its modular architecture holds the promise of even more power and flexibility, and the sales engineering and development teams at MariaDB are eager to work with customers interested in prototyping use cases with MaxScale and helping to drive development direction. Please contact your MariaDB sales team if you are interested in collaborating with us on this new technology: https://mariadb.com/about/contact
7 Conclusion
Open source databases like MariaDB and MySQL deliver such compelling advantages to application developers that they have become the most popular database technologies in the world. But no technology can rest on its laurels. The demands placed on database infrastructure in an always-on, mobile, rich media Internet world are much larger today than the problems that these technologies originally were designed to solve. Not only must database technology scale to handle orders of magnitude more information, but the number of simultaneous clients, the complexity of their requests, their performance and availability expectations, and the pace of application development have all skyrocketed in the last decade. MariaDB and MySQL remain on the forefront of database innovation and are able to handle even the biggest problems of web-scale applications, but only with the help of new, innovative approaches to database architecture. MariaDB MaxScale, an intelligent, modular proxy for database infrastructure, is one such technology designed to enhance MariaDB and MySQL to handle these new challenges. Available as an optional add-on subscription for MariaDB Enterprise and MariaDB Enterprise Cluster, MaxScale is supported by the skilled team at MariaDB, and stands ready to tackle these new, web-scale uses and stretch the boundaries of open source RDBMS applicability. We’re optimistic that MaxScale’s plug-in approach can be adapted to solve many of the most difficult challenges in database infrastructure, but we know that these solutions will likely come from a broad community of developers, DBAs, and open source contributors all working together and inventing new use cases and capabilities that can plug in to the MaxScale core. We’re eager to join with you in creating this community together, and seeing just how far we can go in adapting MariaDB and MySQL to web scale.