
Release Notes

Pivotal Greenplum Database AWS Marketplace v2.4.0 Release Notes Updated: September 2018

Overview Pivotal Greenplum is deployed on AWS using a CloudFormation template that has been optimized for efficiency and performance. The offering is available either with a Bring Your Own License (BYOL) approach or at an Hourly rate. BYOL means a license for Pivotal Greenplum must be obtained directly from Pivotal Software before deploying the software on Amazon's resources; the Hourly rate adds a Pivotal Software cost on top of the instance cost.

The AWS resources are automatically created in a single "Stack", a collection of AWS resources that is treated as a single unit. The Stack makes managing your Pivotal Greenplum deployment much simpler than managing each resource manually, and resources are clearly labeled with the Stack Name so you can easily identify which resources on AWS are used for Pivotal Greenplum.

Major features of Pivotal Greenplum on AWS include "gpsnap", which uses AWS EBS Snapshots to provide a complete backup solution; "Self-Healing", which uses an AWS Auto Scaling Group resource and Pivotal-provided scripting; "gprelease", which provides automated database upgrades; and "gpoptional", which simplifies installation of optional packages. More details on these features appear later in these release notes.

CloudFormation template version 2.4 is based on Pivotal Greenplum Database version 5.10.2.

Contents

Overview
Deploying Greenplum on AWS
  AWS CloudFormation Parameters
    Stack Name
    ClusterAvailabilityZone
    ClusterInstanceCount
    ClusterKeyName
    ClusterNodeInstanceType
      D2 Series
      R4 Series
    ClusterSSHLocation
    Install Command Center
    Install Data Science Python
    Install Data Science R
    Install MADlib


    Install PL/R
    Install PostGIS
  AWS CloudFormation
    Create In Progress
    Create Complete
    EC2 Instances
    CloudFormation Output
  Connecting
    SSH Access
    Client Tool
  Additional Resources
    AWS Logs
    Validation
Greenplum on AWS Debugging Problems
  Auto Scaling Group
  Quota
Greenplum on AWS Additional Features
  Self Healing
    Segment Healing
    Standby-Master Healing
    Master Healing
  Snapshots
    gpsnap
    gpcronsnap
  Greenplum Upgrades
    gprelease
    gpcronrelease
  pgBouncer
    bouncer start
    bouncer stop
    bouncer pause
    bouncer resume
  Optional Installs
    gpoptional
  Patching Linux on AWS
    Step 1 - Stop the database
    Step 2 - Yum Update
    Step 3 - Suspend the Auto Scaling Group
    Step 4 - Restart the Instances
    Step 5 - Start the Database and pgBouncer


    Step 6 - Restore the Auto Scaling Group
Greenplum on AWS Technical Details and History
  AWS Resources
    AMI
    Availability Zone
    VPC
    Subnet
    Security Group
    Gateway
    Route
    Placement Group
    Autoscaling Group
    Elastic IP
    Storage
      Root and Swap
      Data Storage
        Ephemeral Storage
        EBS Storage
      Masters Storage
    Systems Manager Parameter Store
    IAM permissions
    Diagram
Version History
  Version 2.4.0
    Enhancements
    Fixes
  Version 2.3.1
    Enhancements
  Version 2.3
    Fixes
    Enhancements
  Version 2.2
    Fixes
    Enhancements
  Version 2.1
    Fixes
    Enhancements
  Version 2.0
    Fixes
    Enhancements
  Version 1.3


    Enhancements
  Version 1.2
    Fixes
    Enhancements
  Version 1.1
    Fixes
    Enhancements
  Version 1.0


Deploying Greenplum on AWS

AWS CloudFormation Parameters

Stack Name This identifies the Pivotal Greenplum Stack. A Stack is the AWS term for the collection of resources created by a deployment; treating them as a single unit makes all of the resources used in the deployment easier to manage.

ClusterAvailabilityZone An Availability Zone is an isolated area within a geographic Region. It is analogous to a physical data center.


ClusterInstanceCount The total number of Virtual Machines in the Stack. When deploying with a Single-Node, database mirroring and the Standby-Master will be disabled. The default is Single-Node.

Description             Equivalent
Single-Node             1 Node
2-Masters-2-Segments    1/8 Rack
2-Masters-4-Segments    1/4 Rack
2-Masters-8-Segments    1/2 Rack
2-Masters-12-Segments   3/4 Rack
2-Masters-16-Segments   1 Rack

ClusterKeyName This is the name of your AWS key pair. It is used so you can ssh to the Master node after the Stack is created.

ClusterNodeInstanceType Amazon supports many different Instance Types, but this offering has been limited to two series with Ephemeral and EBS disk options. Instance series D2 and R4 are supported. The tables below show the details of each instance type option. The default is r4.xlarge-EBS-6TB.

D2 Series Storage Optimized instances with local HDD ephemeral storage that is optimized for throughput. Ephemeral storage is lost if the nodes are stopped.

Instance Type  Storage Type  Storage Size  Memory (GB)  vCPUs  Network Speed  Use
d2.xlarge      Ephemeral     6TB           30.5         4      Moderate       Dev/Test
d2.2xlarge     Ephemeral     12TB          61           8      High           Dev/Test
d2.4xlarge     Ephemeral     24TB          122          16     High           Dev/Test
d2.8xlarge     Ephemeral     48TB          244          36     10GB           Production

R4 Series Memory optimized instances that are EBS storage only. EBS storage has been optimized for throughput.


Instance Type  Storage Type   Storage Size  Memory (GB)  vCPUs  Network Speed  Use
r4.xlarge      EBS            6TB           30.5         4      Up to 10GB     Dev/Test
r4.xlarge      EBS Encrypted  6TB           30.5         4      Up to 10GB     Dev/Test
r4.2xlarge     EBS            12TB          61           8      Up to 10GB     Dev/Test
r4.2xlarge     EBS Encrypted  12TB          61           8      Up to 10GB     Dev/Test
r4.4xlarge     EBS            24TB          122          16     Up to 10GB     Dev/Test
r4.4xlarge     EBS Encrypted  24TB          122          16     Up to 10GB     Dev/Test
r4.8xlarge     EBS            48TB          244          32     10GB           Production
r4.8xlarge     EBS Encrypted  48TB          244          32     10GB           Production
r4.16xlarge    EBS            48TB          488          64     25GB           Production - High Concurrency
r4.16xlarge    EBS Encrypted  48TB          488          64     25GB           Production - High Concurrency

ClusterSSHLocation This is the IP Address range that is allowed to connect to your Stack. You can use "0.0.0.0/0" but that means that every address will be able to ssh to the Stack so long as they have the public key. Consider using a more restrictive mask to prevent unwanted attempts to connect to the Stack.
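For example, a value such as 203.0.113.0/24 (an illustrative range) would limit ssh access to the 256 addresses in that range, while a /32 mask such as 203.0.113.25/32 would allow only a single address.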

Install Command Center Indicates if you would like the optional Command Center package to be installed or not. If you choose to skip this install initially, you can still run the optional install to install this package. Use gpoptional to install this package.

Install Data Science Python Indicates if you would like the optional Data Science Python package to be installed or not. If you choose to skip this install initially, you can still run the optional install to install this package. Use gpoptional to install this package.


Install Data Science R Indicates if you would like the optional Data Science R package to be installed or not. If you choose to skip this install initially, you can still run the optional install to install this package. Use gpoptional to install this package.

Install MADlib Indicates if you would like the optional MADlib package to be installed or not. If you choose to skip this install initially, you can still run the optional install to install this package. Use gpoptional to install this package.

Install PL/R Indicates if you would like the optional PL/R package to be installed or not. If you choose to skip this install initially, you can still run the optional install to install this package. Use gpoptional to install this package.

Install PostGIS Indicates if you would like the optional PostGIS package to be installed or not. If you choose to skip this install initially, you can still run the optional install to install this package. Use gpoptional to install this package.

AWS CloudFormation Deployment is very simple in the AWS Marketplace. Simply provide the parameters in the user interface and then submit the CloudFormation template to create the Stack.

Create In Progress During the Stack deployment, CloudFormation will indicate that the Stack is being created with CREATE_IN_PROGRESS status.

Create Complete After the CloudFormation template has finished, the status will change to CREATE_COMPLETE. Below is an example of two Stacks, one in each of the two statuses CREATE_IN_PROGRESS and CREATE_COMPLETE.
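The Stack status can also be checked from the command line. A minimal sketch with the AWS CLI, assuming it is configured for your account and region and using an illustrative Stack name:

aws cloudformation describe-stacks --stack-name my-greenplum-stack \
  --query 'Stacks[0].StackStatus' --output text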


EC2 Instances Each instance contains a suffix in the name to indicate the role of Master, Standby, or Segment as shown below.

CloudFormation Output The Outputs tab of the CloudFormation Stack will have the connection information to the database once the Stack reaches the CREATE_COMPLETE Status. As shown below, the Output section will contain all of the information needed to start using your Stack. Note that the password shown below is randomly generated and not stored by Pivotal.
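The same connection information can be retrieved with the AWS CLI once the Stack is complete. A minimal sketch, again with an illustrative Stack name:

aws cloudformation describe-stacks --stack-name my-greenplum-stack \
  --query 'Stacks[0].Outputs' --output table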


Connecting Connecting can be done with ssh or with a database client tool like pgAdmin 4. The CloudFormation Output values for MasterHost, Port, AdminUserName, and Password are used to connect to Greenplum.

SSH Access Use the SSH KeyName provided when creating the stack to connect with ssh. The message of the day provides detailed information about the Stack as shown below.
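For reference, a typical ssh command looks like the following; the key file name and address are placeholders, and gpadmin is the operating system account used throughout these release notes:

ssh -i ~/.ssh/<ClusterKeyName>.pem gpadmin@<Master Elastic IP>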


Client Tool Connecting with a remote client tool like pgAdmin 4 is also very easy to do using the Master public IP address and password provided in the CloudFormation Output.
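As an alternative to pgAdmin 4, any PostgreSQL-compatible client such as psql can use the same values; a minimal sketch with placeholders taken from the CloudFormation Output:

psql -h <MasterHost> -p <Port> -U <AdminUserName>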

Additional Resources Installation of Pivotal Greenplum on AWS includes detailed logs plus supplemental installs and validation scripts that can be executed after the initial installation is complete.


AWS Logs Logs for the deployment of the Stack can be found in: /opt/pivotal/greenplum/rollout.log. A log file is on every node but the Master node will have more detailed information regarding the database initialization.

Validation Validation includes scripts to run industry standard benchmarks of TPC-H and TPC-DS. It also includes scripts to validate the disk and network performance of the Stack using the Pivotal Greenplum utility "gpcheckperf".

Greenplum on AWS Debugging Problems

Auto Scaling Group The creation of virtual machines for running Greenplum on AWS is managed by an AWS resource called an Auto Scaling Group (ASG). The ASG keeps its own logs, so a Stack may fail without giving you enough information as to why, while the ASG has much more detailed logs. If the Stack creation fails, delete the Stack and try again. On the second attempt, find the ASG for the Stack and examine its logs. The ASG will be named with the prefix of the Stack.


The Auto Scaling Group may have to try multiple times to deploy all of the nodes requested in your Stack. If you see the status change to Failed, examine the output. As in the error message below, a common cause is that Amazon does not have capacity in the Availability Zone chosen.

The Auto Scaling Group will retry multiple times until either enough capacity is provisioned or the Stack times out, which takes from 10 minutes to 1 hour depending on the number of nodes requested. It may take several Failed attempts to provide the desired capacity, and Amazon may not be able to provide it at all. If the Stack fails because there aren't enough resources in the Availability Zone, delete the Stack and deploy again in a different Availability Zone.
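The same scaling activity, including failure reasons such as insufficient capacity, can also be inspected with the AWS CLI. A minimal sketch, using the ASG name found as described above:

aws autoscaling describe-scaling-activities \
  --auto-scaling-group-name <your Stack's ASG name> --max-records 10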

Quota Another problem you may encounter is a quota error. You may not have enough quota for any one of the resources used in the deployment; the two most common are the number of instances and the amount of disk storage. The ASG logs will indicate if there is a quota problem.


To request a quota increase:
- Go to https://console.aws.amazon.com/support/ to create a support ticket
- Pick Service Limit Increase
- Pick the appropriate Region and Resource Type
- Enter a value for the new Limit Type and Value

Greenplum on AWS Additional Features

Self Healing Greenplum on AWS products utilize an AWS EC2 resource called an Auto Scaling Group (ASG). This resource provisions all of the nodes in the Stack within a single Placement Group for optimal network performance. When a node stops, the ASG automatically provisions a new node to replace the failed node and terminates the old one. Starting with version 2.1, when a node gets replaced by the ASG, the initialization scripts automatically self-heal the cluster.

The roles for nodes are Master, Standby-Master, and Segment, and based on the role, the self healing executes different commands to restore the cluster to 100% operational status. All new nodes interact with the Systems Manager Parameter Store to retrieve the private and public keys for the Stack. This enables the new node to interact with the existing nodes in the cluster and be added without human intervention.

Note that the EBS volumes on the affected nodes will also be replaced during the healing process. Also note that any snapshots you have created for a Stack that experiences a self-healing event will be deleted, because the mapping of snapshot to host is broken. Be sure to take a new snapshot after a self-healing process completes.

Segment Healing In the unlikely event of a node failure on AWS, a Segment node is the most likely to fail, because the number of Segment nodes is generally greater than the number of Masters. Secondly, the load on the Segments is far greater than on the Masters, so a hardware problem may surface more readily on a Segment than on a Master, and the ASG may then decide to replace the node. The Segment Healing process first executes "gprecoverseg", a Greenplum utility, to replace the failed node with the new node. The command is executed a second time to rebalance the data to the new node, but before this is done, pgBouncer is paused. Pausing pgBouncer allows currently running queries to complete but does not allow new queries to start. Once all current queries have stopped, the rebalance starts. Once complete, pgBouncer is resumed.


Database activity may continue during the Segment Healing process and connections can still occur to pgBouncer.
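For reference, the automated sequence is roughly equivalent to the following commands run as gpadmin on the Master; this is a sketch for illustration only, and the exact flags used by the healing scripts are assumptions:

gprecoverseg -a    # recover onto the replacement node
bouncer pause      # let running queries finish; hold new ones
gprecoverseg -ar   # rebalance segments back to their preferred roles
bouncer resume     # allow new connections again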

Standby-Master Healing In the event that the Standby-Master fails, the healing process is straightforward. A Greenplum utility called gpinitstandby is executed, which replaces the failed Standby-Master with the new node. Database activity may continue during the Standby-Master Healing process.

Master Healing In the event that the Master fails, the new node executes a few Greenplum utilities. The process first fails over to the Standby-Master, followed immediately by a database shutdown. The Standby-Master is then returned to a Standby-Master role and the new node is set as the Master. The database is up and operational at this point, but the database statistics were lost in this process, so users are still not allowed to connect. The Master Healing process next executes the analyzedb command on every database in the Greenplum cluster to gather the needed statistics. The pgBouncer load balancer is then configured and started, and the Elastic IP address is re-assigned to the new Master node. Once the Elastic IP address is assigned, normal database activity may resume.

Note that after a Master Healing event, optional installs such as Greenplum Command Center, MADlib, PL/R, etc. will need to be re-installed. Also note that the randomly generated password is recreated. It can be viewed in /opt/pivotal/greenplum/variables.sh on the Master node.

Snapshots Amazon EBS volumes have a snapshot feature which is very useful for quickly creating a database backup. The process is to first stop the database in order to get a consistent view of the database and then execute an aws command to create a snapshot for each disk volume. After the snapshots are created, the database is restarted. Each snapshot must be labeled correctly so that when a restore is desired, the volumes get attached to the right hosts and mounted as the right volumes. All of these requirements and more have been incorporated into AWS-only Greenplum utilities "gpsnap" and "gpcronsnap".


gpsnap This utility manages creating, listing, deleting, and restoring snapshots. Please note that creating or restoring a snapshot will restart the database. Here is the list of parameters used with gpsnap:
- gpsnap list: lists snapshots
- gpsnap create: creates a new snapshot
- gpsnap delete <snapshot_id>: deletes a specific snapshot
- gpsnap delete all: deletes all snapshots
- gpsnap restore <snapshot_id>: restores a specific snapshot

gpcronsnap This utility manages the automatic execution of gpsnap. By default, a cron job runs every 10 minutes and uses the configuration file /usr/local/greenplum-aws/conf/gpcronsnap.conf to determine whether a snapshot is needed or not.

gpcronsnap.conf:

#maximum number of snapshots; delete the oldest when max reached
max_snapshots=4

#snapshot day of week (1..7); 1 is Monday
#to specify daily, use (1 2 3 4 5 6 7)
snapshot_dow=(7)

#time of day to run the snapshot
#do not schedule a time where the snapshot may not finish until the next day
snapshot_time=04:00

As shown above, the default schedule is a weekly snapshot on Sunday at 4:00 AM in the local timezone. Four snapshots will be retained before the oldest snapshot will be automatically deleted.
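As an example of adjusting the schedule, the following gpcronsnap.conf values (chosen purely for illustration) would retain seven snapshots and take one every day at 2:00 AM:

max_snapshots=7
snapshot_dow=(1 2 3 4 5 6 7)
snapshot_time=02:00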

Greenplum Upgrades

gprelease This utility upgrades a Greenplum on AWS cluster to the latest database release available. The tool automatically downloads the binaries, copies them to the hosts in the cluster, stops the cluster, installs the new version, and then starts the cluster again. The tool automatically executes gpoptional so that optionally installed packages are re-installed or upgraded to a compatible version.


gpcronrelease This utility checks to see if a new release is available. By default, this runs in cron weekly on Sunday at 12:00 AM in the local timezone. If a new version is available, the message of the day is updated to indicate a new version is available.

pgBouncer This is a connection pooling and load balancing utility that is included with Greenplum. It allows far more connections to the database with less impact on resources, and it is recommended to use pgBouncer instead of connecting directly to the database. More information on pgBouncer is available in the Greenplum documentation.

pgBouncer is configured to listen on port 5432, which is the default port usually used by Greenplum, while Greenplum itself has been configured to listen on port 6432. Authentication has been configured to use "md5", which uses hashed passwords. Create users and assign passwords in Greenplum as normal and pgBouncer will authenticate users with the database passwords you set. Other authentication schemes such as LDAP can be configured with pgBouncer post-installation.

Pooling has been configured for "transaction" with max client connections of 1000 and max database connections of 10. These settings can be changed, but the defaults provide a good starting point for most installations. Configuration and logs for pgBouncer are located in /data1/master/gpseg-1/pgbouncer on the Master node. Lastly, the "bouncer" utility has been added to make it easier to start and stop pgBouncer.
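For example, with psql and placeholder connection values, the two entry points look like this:

psql -h <MasterHost> -p 5432 -U <AdminUserName>    # through pgBouncer (recommended)
psql -h <MasterHost> -p 6432 -U <AdminUserName>    # directly to Greenplum, bypassing pgBouncer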

bouncer start Starts pgBouncer. Run this on the Master host.

bouncer stop Stops pgBouncer. Run this on the Master host.

bouncer pause Pauses pgBouncer. Run this on the Master host.

bouncer resume Resumes pgBouncer. Run this on the Master host.


Optional Installs Many of the commonly used packages are included as optional installs. These packages can be installed during the initial deployment or after the deployment has completed.

gpoptional This utility simplifies installing optional components during the initial deployment and also after the deployment has been completed. Simply run "gpoptional" to see the optional installation options. This tool is also used in conjunction with gprelease to upgrade or reinstall already installed optional packages.

Patching Linux on AWS Over time, after you have deployed your cluster in AWS, you may need to patch the operating system to address vulnerabilities. In order to do that, you have to take a few extra steps compared to an on-premises installation.

Step 1 - Stop the database Login as gpadmin to the Master node and then execute "bouncer stop" to stop pgBouncer. Next, execute the "gpstop" command to stop the database.

bouncer stop
gpstop -a

Step 2 - Yum Update Still logged into the Master node as gpadmin, use "gpssh -f all_hosts.txt" to connect to all hosts via ssh. The all_hosts.txt file is in the gpadmin home directory on the Master node. Execute the following command to run the yum update:

gpssh -f all_hosts.txt "sudo yum update -y"
exit

Note: Be patient as this executes on all nodes in parallel. You will not see any output until the command completes on all nodes.


Step 3 - Suspend the Auto Scaling Group The nodes in a cluster are deployed with an Auto Scaling Group (ASG), which also aids in the self-healing activities when a node fails. The ASG will automatically replace a node even when you stop it intentionally, so to perform maintenance activities, you must suspend the ASG. First, find the ASG for your deployment. The name will be prefixed with your Stack name.

Next, click Edit and then add all processes to the Suspended Processes text box as shown below and then save.

Alternatively, you can use the AWS CLI from your local computer.

stack_name="ASG-Test-1"
group_name=$(aws autoscaling describe-auto-scaling-groups \
  --query 'AutoScalingGroups[].[AutoScalingGroupName]' --output text | grep $stack_name)
aws autoscaling suspend-processes --auto-scaling-group-name $group_name

Step 4 - Restart the Instances Connect with ssh as gpadmin on the Master and restart the nodes.

gpssh -f all_hosts.txt "sudo shutdown -r now"
exit


Step 5 - Start the Database and pgBouncer Wait a few minutes and then connect to the Master node as gpadmin with ssh. Use gpssh to verify that all hosts are up.

gpssh -f all_hosts.txt "uptime"

After confirming all hosts are up, use "gpstart" and "bouncer start" to start the database and the connection pooler.

gpstart -a
bouncer start

Step 6 - Restore the Auto Scaling Group Remove all suspended activities from your ASG either from the Web console or with the CLI. Execute this from your local computer.

stack_name="ASG-Test-1"
group_name=$(aws autoscaling describe-auto-scaling-groups \
  --query 'AutoScalingGroups[].[AutoScalingGroupName]' --output text | grep $stack_name)
aws autoscaling resume-processes --auto-scaling-group-name $group_name

Greenplum on AWS Technical Details and History

AWS Resources

AMI The Amazon Machine Image or AMI uses CentOS 7.5 with Hardware Virtual Machine (HVM) enabled. An HVM EC2 instance is required to enable enhanced networking. The AMI has all the software packages and prerequisites for installing Pivotal Greenplum and necessary add-ons, including Intel enhanced networking drivers.

Availability Zone AWS Data Center locations are independent of one another. Pivotal Greenplum is deployed within a single AZ to ensure the best possible performance.


VPC A virtual network is used to logically isolate instances on AWS. This CloudFormation template uses a single VPC. While a VPC can span multiple Availability Zones, this CloudFormation template uses only one. It is strongly recommended to use a dedicated VPC for Pivotal Greenplum to make management easier and to make sure the required VPC settings are made. To connect existing AWS resources from a different VPC, use AWS "VPC Peering".

Subnet Specifies the IP address range for nodes deployed in the VPC. This is the IP address range used by the Interconnect traffic for Greenplum. DHCP is used to assign IP addresses to nodes deployed in the Subnet.

Security Group Defines the Protocol, Port Range, and Source for traffic associated with the VPC. All traffic between nodes is allowed and ports 5432, 22, 28080, and 28090 are available to the SSH location chosen.

Gateway Creates a network path to the Internet from your VPC.

Route The route is modified to allow traffic to the SSH Location specified on the Ports defined in the Security Group.

Placement Group Groups virtual machines together to reduce network traffic.

Autoscaling Group Deploys and assigns a public facing IP address to the nodes for the Stack. The minimum, maximum, and desired capacity of the auto-scaling group are set to the same value. Nodes are configured with dedicated tenancy for Production designated instance types while Dev/Test nodes are configured with the default, shared tenancy.

Elastic IP This is a static IP address that is used for connecting to the Master node. In case of a Master node failure, the replacement node is assigned the same Elastic IP address. More detail on this process is provided in the Self Healing section of this document.


Storage

Root and Swap Storage for the root partition is fixed at 8GB and uses a GP2 disk which is the recommendation from Amazon. The swap partition also utilizes a GP2 disk and the size is set using this calculation: round(sqrt(RAM)).
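For example, assuming the calculation operates on memory expressed in GB, an r4.2xlarge with 61 GB of RAM would receive a swap partition of round(sqrt(61)) = 8 GB, and a d2.8xlarge with 244 GB would receive round(sqrt(244)) = 16 GB.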

Data Storage Disks are mounted with "rw,noatime,nobarrier,nodev,inode64,allocsize=16m 0 2" and a blockdev read-ahead of 16385. The I/O scheduler is set to deadline. There are two data storage options: Ephemeral and EBS storage.
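For illustration, these settings correspond roughly to an fstab entry and commands like the following; the device name /dev/xvdf is a placeholder and varies by instance type and storage option:

/dev/xvdf  /data1  xfs  rw,noatime,nobarrier,nodev,inode64,allocsize=16m  0 2
blockdev --setra 16385 /dev/xvdf
echo deadline > /sys/block/xvdf/queue/scheduler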

Ephemeral Storage

Ephemeral storage is available on the D2 series instances. These instances have 3, 6, 12, or 24 disks local to each virtual machine. The disks are configured into RAID 0 groups using the "mdadm" utility. They perform very well but have drawbacks: stopping the instance will result in losing all data on these disks, so use Ephemeral storage with caution. It has been observed that more RAID 0 groups perform better than fewer, but with diminishing returns beyond 4 RAID 0 groups. Therefore, when using ephemeral storage, there are up to four data drives on each node. The instance types d2.xlarge and d2.2xlarge have 1 and 2 RAID 0 groups respectively, while d2.4xlarge and d2.8xlarge have 4 RAID 0 groups.
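A minimal sketch of how one such RAID 0 group could be assembled with mdadm; the device names and disk count are placeholders, and the actual provisioning scripts may differ:

mdadm --create /dev/md0 --level=0 --raid-devices=6 /dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde /dev/xvdf /dev/xvdg
mkfs.xfs -f /dev/md0
mount -o rw,noatime,nobarrier,nodev,inode64,allocsize=16m /dev/md0 /data1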

EBS Storage

The EBS storage option has been configured with ST1 volumes, which are optimized for throughput instead of IOPS. EBS volumes perform better at larger sizes, and more volumes per node perform better than fewer. Therefore, the storage is configured with one to four 12TB volumes based on the instance type chosen. The EBS Encrypted option is configured exactly like the EBS storage option but with Amazon encryption enabled; there is a small performance hit when using EBS Encrypted versus EBS. Pivotal recommends using EBS storage as it has been tuned to perform as well as Ephemeral storage and data won't be lost if you stop the nodes in the Stack. In addition to ST1 being optimized for throughput, and thus best for Greenplum, this disk option is less expensive than disk options that are optimized for IOPS.

Masters Storage Nodes in the Stack are deployed with an Auto Scaling Group, which means all nodes are identical. However, the data storage needs on the Masters are much lower than on the Segment nodes, which would increase the EBS storage costs unnecessarily, so during the provisioning of Master nodes, data volumes 2 through 4 are removed.


Systems Manager Parameter Store This is a secure key value pair repository designed to store configuration and data management information about AWS resources you deploy. During the installation process of Greenplum on AWS, new private and public keys are created and shared on every node in the cluster. The keys are also stored in the parameter store so that new nodes in the Stack can interact with the existing nodes in a secure fashion and without human intervention. Also, when new Stacks are deployed, a reconciliation process runs to remove any keys no longer needed.
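For reference, values in the Parameter Store can be read back with the AWS CLI; the parameter name below is purely illustrative, as the names used by the installation scripts are not documented here:

aws ssm get-parameter --name <parameter-name> --with-decryption \
  --query 'Parameter.Value' --output text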

IAM permissions IAM Instance role permissions are required for creating AWS resources for this deployment. The required permissions in the policy include: "ec2:CreateTags", "ec2:DescribeInstances", "ec2:DescribeInstanceStatus", "ec2:DetachVolume", "ec2:DescribeVolumeStatus", "ec2:DeleteVolume", "ec2:DescribeVolumes", "ec2:CreateSnapshot", "ec2:DeleteSnapshot", "ec2:DescribeSnapshots", "ec2:CopySnapshot", "cloudformation:*", "ec2:AssociateAddress", "s3:Get*", "s3:Put*", "s3:List*", "autoscaling:*", "ssm:*"

Diagram


Version History

Version 2.4.0

Enhancements
- Upgraded Greenplum to 5.10.2.
- Upgraded Greenplum Command Center to 4.3.1.
- Added bouncer pause and resume functions.
- Added Self-Healing.
- Renamed gpupgrade and gpcronupgrade to gprelease and gpcronrelease.
- gprelease and gpcronrelease enhanced for better integration with optionally installed components.
- gpoptional tool created to make it easier to install optional components and also upgrade existing components.

Fixes
- Fixed loop during Message of the Day creation on the Master during a self-healing event.

Version 2.3.1

Enhancements
- Upgraded Greenplum Command Center to version 4.2.0.

Version 2.3

Fixes
- Corrected the number of segments per host for r4.16xlarge and EBS Encrypted drives.
- Updated a script for a slight change in behavior of the "aws ec2 describe-instances" CLI command that initially returns "None" for the Stack name.

Enhancements
- Upgraded Pivotal Greenplum database to version 5.9.
- Upgraded Operating System to CentOS 7.5.
- Increased root volume size from 8GB to 32GB.
- Wrapped message of the day on ssh login to 80 characters wide.
- Revised pgBouncer default settings and added "bouncer" utility.
- Upgraded Greenplum Command Center to version 4.1.1.
- Added Data Science Python and R packages as optional installs.
- Improved gpsnap performance.
- Pause pgBouncer on self-healing segment recovery so that active queries won't be cancelled.


Version 2.2

Fixes
- Patched the operating system for the Meltdown and Spectre vulnerabilities.
- When a single node install had the node replaced because of failure, the auto-healing feature wouldn't re-initialize the database and left the single node in an unusable state.
- Increased the MaxStartups sshd configuration to prevent unwanted failed ssh attempts from preventing the cluster from initializing.

Enhancements
- Manage EBS Snapshots with two new tools: gpsnap and gpcronsnap.
- Manage Greenplum Database upgrades with two new tools: gpupgrade and gpcronupgrade.
- Optional installs are now available as parameters to the CloudFormation template and visible in the Marketplace. Optional installs are still available post-installation.
- Added PostGIS as an optional install.
- Removed csv files used during the installation that contained the public and private keys.
- Updated the temporary gpadmin password to a more secure value. Password authentication is only temporarily enabled during the installation process.
- Store the randomly generated admin password in the AWS Parameter Store so that it can be retrieved in case the Master fails.
- Updated the ip_local_port_range to start at 10,000 so the database interconnect doesn't interfere with gpfdist ports.

Version 2.1

Fixes
- r4.xlarge, r4.2xlarge, and r4.4xlarge encrypted disk options did not properly set encryption on.
- Resolved installation issue for optional Command Center where excessive log files were created in the gpadmin home directory.

Enhancements
- Upgraded to GPDB 5.2.
- Stack is now self-healing. If the Auto Scaling Group replaces a bad node, the node will automatically be replaced and added back into the Greenplum cluster properly. An Elastic IP address keeps the IP address the same if the Master were to fail.
- Configured pgBouncer load balancer to be the default connection mechanism for clients.
- Upgraded optional installs of Command Center, MADlib, and PL/R to the latest versions.
- Dynamically set swap size based on the instance type chosen.
- Removed /data[2-4] volumes from the Master and Standby.
- Snapshot permissions added to the IAM permissions list to make it easier to automate taking EBS snapshots for a backup.
- Yum update performed.


Version 2.0

Fixes
- Fixed MOTD calculation for available disk space for single node installs.
- Fixed issue where /etc/rc.local was restarting the network after the CloudFormation script finished. This sometimes caused status checks to fail in AWS for a single node in a Stack and then the node would get terminated.

Enhancements
- Upgraded to GPDB 5.
- Enhanced Stack Output to have more detailed information including username, password, Master node, ssh location, instance type, instance count, ssh key name, availability zone, additional installs path, validation scripts path, and the Pivotal API token value.
- Modified wait in deploying nodes to make sure both AWS checks pass for each node before assigning roles and installing the database.
- Added d2.xlarge, d2.2xlarge, r4.xlarge, r4.2xlarge, and r4.4xlarge instance types.
- Dynamically set instance tenancy based on instance type. 8xlarge and 16xlarge nodes are deployed with dedicated tenancy while the others use shared tenancy.
- Removed API Token parameter.
- Enhanced optional installer to use local copies of the install files rather than dynamically downloading them.

Version 1.3

Enhancements
- Upgraded to CentOS 6.9.
- Updated Greenplum version to 4.3.14.0.
- Added support for Enhanced Networking that uses the ENA driver. Now both Enhanced Networking drivers are installed and AWS will automatically use the correct driver.
- Added support for r4.8xlarge and r4.16xlarge instance types.
- Added RaiseError condition to make debugging errors easier.
- Changed API Token to be optional per Amazon request.
- Changed root and swap partitions from 500GB and 50GB respectively to 8GB each. This reduces the cost to customers using AWS as it was determined Greenplum doesn't need that much space for root and swap disks.
- Added build_ami directory with scripts used to prepare a minimal install image of CentOS 6.9.

Version 1.2

Fixes
- Determined that the launch index in an Auto Scaling Group isn't always unique. When the Auto Scaling Group has to replace a node or has to retry to launch enough nodes, the launch index gets reset to 0. The launch index was being used to assign the Master, Standby-Master, and Segments. The fix is to wait until all of the nodes have been launched and create an index based on the order of private IP addresses in the Stack.
- Lowered gp_vmem_protect_limit for d2.8xlarge to account for 12 Segments per node.

Enhancements
- Updated Greenplum version to 4.3.13.0.
- Updated tpch.sh and tpcds.sh for slight changes to the scripts.
- Enhanced check_disk.sh performance validation script to dynamically find the data directories and to use all_hosts instead of segment_hosts.
- Enhanced check_network.sh performance validation script to use all_hosts instead of segment_hosts.
- Combined Instance Type, Disk Option, and Disk Size into a single parameter. This resolves the dependent parameter issue with Disk Size when using Ephemeral disks and makes it possible to support EBS-storage-only instance types.
- Added Availability Zone parameter because the Auto Scaling Group isn't smart enough to automatically find the AZ that has the capacity available. This allows a user to try other AZs in order to successfully deploy the Stack.
- Added API Key check to the MOTD generation.

Version 1.1

Fixes
- Created default database.
- Set gpadmin password.
- Added entry to pg_hba.conf file to allow encrypted password connection from remote nodes.
- Removed Command Center reference on Segment nodes.
- Removed ephemeral disks when EBS is chosen.
- Fixed disk sizes (data1 was smaller than data2-4).
- Fixed permissions for /usr/local and /usr/local/greenplum-db/* (world-writable permissions had been set for nearly all files).

Enhancements
- Uses st1 disks instead of gp2 for better performance.
- Allows for 18 nodes, which is equivalent to a full rack.
- Made Command Center an optional install.
- Added API Token as a parameter to be able to track customers and allow them to automatically download optional installs.
- Removed redundant parameter for ClusterName; the Stack Name is used instead.
- Removed i2.8xlarge instance type because it is more expensive than d2.8xlarge and not as fast.


- Added d2.4xlarge for Dev/Test purposes.
- Simplified scripting to a single execution of a script stored in the AMI. This script then calls all scripts to complete the install. Scripts are also numbered 001, 002, 003, etc. to make the execution order obvious.
- Enhanced the xfs mounting options for better performance.
- Changed the number of Segments per node from 8 to 12 for d2.8xlarge while keeping 8 for d2.4xlarge.
- Simplified template for easier maintenance.
- AMI can dynamically download scripts from S3 to make development easier.
- AMI can disable execution of the scripts to make development and debugging easier.
- Added Encrypted EBS option.
- Added Message of the Day on the Master node that explains how to use the database.
- Updated gp_vmem_protect_limit based on instance type.
- Updated sysctl.conf vm.overcommit_ratio from the default of 50 to 95.

Version 1.0 Initial release.
