Azure Large Scale Deployments - Tales from the Trenches

CLD334a: Deploying Complex and Large Scale Azure Environments – Tales from the Trenches

Aaron Saikovski, Specialist Solution Architect – Microsoft Cloud Technologies, Rackspace Australia (T: @RuskyDuck72)

Uploaded by aaron-saikovski on 21-Jan-2018


TRANSCRIPT



Agenda

Quick Intros

Large Scale Deployments

Subscriptions

Tagging

Storage

Networking

Automation

Monitoring

Questions


About me


Subscriptions

One subscription per environment -> Dev, Test, Prod

Microsoft accounts (MSA) and Azure AD accounts -> subscriptions

Enterprise Agreement (EA) -> consolidated billing

Restrict access to Prod (yes, devs, we are looking at you)

TIP#1: Use named accounts (AzureAD) instead of MSA and use MFA!!!

TIP#2: Use billing alerts at the subscription level to manage spend


Subscriptions


Key Subscription Limits

Source: https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits#subscription-limits


Tagging

Key:value pairs -> label and categorize resources

Link resources -> cost centre, business unit etc

Group common resources

Max. 15 tags per resource

Tag name -> max. 512 characters

Tag value -> max. 256 characters
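The limits above are easy to check before a deployment rather than after it fails. The sketch below validates a tag dictionary against the numbers quoted on this slide; `validate_tags` is a hypothetical helper, and Azure has since raised the per-resource tag limit, so treat the constants as assumptions to refresh from the current docs.

```python
from typing import Dict, List

# Limits as quoted on the slide (2017-era ARM tagging limits); check current docs before relying on them.
MAX_TAGS_PER_RESOURCE = 15
MAX_TAG_NAME_LEN = 512
MAX_TAG_VALUE_LEN = 256


def validate_tags(tags: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the tags fit the limits."""
    problems = []
    if len(tags) > MAX_TAGS_PER_RESOURCE:
        problems.append(f"too many tags: {len(tags)} > {MAX_TAGS_PER_RESOURCE}")
    for name, value in tags.items():
        if len(name) > MAX_TAG_NAME_LEN:
            problems.append(f"tag name too long: {len(name)} > {MAX_TAG_NAME_LEN}")
        if len(value) > MAX_TAG_VALUE_LEN:
            problems.append(f"value of '{name}' too long: {len(value)} > {MAX_TAG_VALUE_LEN}")
    return problems


print(validate_tags({"Environment": "Dev", "CostCentre": "1001", "Owner": "web-team"}))  # []
print(validate_tags({f"tag{i}": "x" for i in range(16)}))  # ['too many tags: 16 > 15']
```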


Tagging (cont.)

Examples:

Environment: Dev, Test, Prod

Build date

Cost centre

Owner

Azure “Classic” (Service Management) resources don’t support tagging

TIP#3: Automated shutdown of resources without tags. Save $$$
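TIP#3 can be prototyped as a filter over a resource inventory. This is a minimal sketch assuming a hypothetical inventory (VM name plus tags) and a hypothetical required-tag policy built from the example tag names above; it only selects shutdown candidates, and the stop action itself (for example from an Azure Automation runbook) is left out. Tag names are compared case-insensitively because Azure treats tag names that way.

```python
from typing import Dict, List

# Hypothetical policy: required tag names, taken from the example tags on the slide.
# Stored in lower case because Azure tag names are case-insensitive.
REQUIRED_TAGS = {"environment", "owner", "costcentre"}


def shutdown_candidates(vms: List[Dict]) -> List[str]:
    """Return the names of VMs missing at least one required tag."""
    candidates = []
    for vm in vms:
        tags = vm.get("tags") or {}  # untagged resources may report no tags at all
        present = {name.lower() for name in tags}
        if REQUIRED_TAGS - present:
            candidates.append(vm["name"])
    return candidates


# Hypothetical inventory, e.g. exported by a script.
inventory = [
    {"name": "web01", "tags": {"Environment": "Prod", "Owner": "web-team", "CostCentre": "1001"}},
    {"name": "test-vm", "tags": {"Environment": "Dev"}},
    {"name": "scratch", "tags": None},
]
print(shutdown_candidates(inventory))  # ['test-vm', 'scratch']
```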


Tagging

Source: https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-using-tags


Quick Storage Recap

Source: https://docs.microsoft.com/en-us/azure/storage/storage-redundancy


Storage Accounts

Don’t overload storage accounts

Plan pricing tiers around performance needs

Premium storage -> Production workloads

Avoid single storage accounts

Standard storage -> max. 500 IOPS per disk

Premium storage -> max. 5,000 IOPS per disk (P30)

TIP#4: Enable encryption when provisioning. Not after!
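The per-disk numbers above turn directly into capacity-planning arithmetic. This sketch works out how many disks a workload needs to reach a target IOPS figure, and how many Standard storage accounts keep those disks from overloading a single account. The 20,000 IOPS per Standard account figure is the scalability target documented at the time of the talk (an assumption to re-check), the helper names are hypothetical, and per-VM-size IOPS and disk-count caps are ignored.

```python
import math

# Per-disk IOPS caps as quoted on the slide.
STANDARD_DISK_IOPS = 500
PREMIUM_P30_IOPS = 5000
# Assumption: 2017-era scalability target for a single Standard storage account.
STANDARD_ACCOUNT_IOPS = 20_000


def disks_needed(target_iops: int, per_disk_iops: int) -> int:
    """Disks to stripe together to reach target_iops (ignores VM-size limits)."""
    return math.ceil(target_iops / per_disk_iops)


def standard_accounts_needed(disk_count: int) -> int:
    """Standard accounts needed so the disks' combined cap stays within each account's target."""
    disks_per_account = STANDARD_ACCOUNT_IOPS // STANDARD_DISK_IOPS  # 40 disks at full 500 IOPS
    return math.ceil(disk_count / disks_per_account)


target = 12_000
std = disks_needed(target, STANDARD_DISK_IOPS)   # 24 Standard disks
prem = disks_needed(target, PREMIUM_P30_IOPS)    # 3 P30 disks
print(std, prem, standard_accounts_needed(100))  # 24 3 3
```

The same target needs eight times fewer Premium P30 disks, which is one reason the slide points Production workloads at Premium storage.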


Storage Account Naming

Naming of storage accounts -> storage load balancing

E.g. ‘devstorageacct001’, ‘devstorageacct002’

Traffic bound to a partition server -> Rebalance -> performance hit!

Can have a big performance hit on VM workloads

TIP#5: Prefix storage accounts with a 3 digit hash (Unique)

Source: https://docs.microsoft.com/en-us/azure/storage/storage-performance-checklist
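TIP#5 can be scripted so the prefix is repeatable across redeployments. A minimal sketch, assuming MD5 (used only to spread names, not for security) and its first three hex characters as the prefix; `prefixed_account_name` is a hypothetical helper. It also enforces the storage account naming rules (3 to 24 characters, lowercase letters and digits only). Three hex characters give only 4,096 prefixes, and account names must be globally unique, so the deployment should still handle a name that is already taken.

```python
import hashlib
import re

# Storage account names: 3-24 characters, lowercase letters and digits only.
VALID_NAME = re.compile(r"^[a-z0-9]{3,24}$")


def prefixed_account_name(base_name: str) -> str:
    """Prefix base_name with the first 3 hex characters of its MD5 hash.

    Sequential names such as devstorageacct001/002 sort next to each other;
    a hash prefix scatters them across the key range instead. The hash is
    deterministic, so redeploying the same base name yields the same result.
    """
    prefix = hashlib.md5(base_name.encode("utf-8")).hexdigest()[:3]
    name = prefix + base_name
    if not VALID_NAME.match(name):
        raise ValueError(f"invalid storage account name: {name!r}")
    return name


for base in ("devstorageacct001", "devstorageacct002"):
    print(prefixed_account_name(base))
```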


Storage Account Naming

Same cluster

Unique cluster


Managed Disks GA Announced Feb 8th 2017!

Removes storage account scale management

Easy migration path

Massive scale set support – 1,000 VMs

2000 managed disks per subscription

RBAC roles on disks

Managed Disks -> LRS only

Late Breaking!!!


Networking

Planning!!!

Avoid overlapping IP ranges -> they block ExpressRoute and S2S VPN connectivity

Deploy and Redeploy -> Iterate

Keep it simple

Single VNet vs VNet Peering

GatewaySubnet -> /27 Address Space

TIP#6: Avoid Network Security Groups (NSGs) at the NIC level
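Overlap checks are cheap to automate before anything is deployed. The sketch below uses Python's `ipaddress` module on a hypothetical address plan (names and ranges are made up) to flag overlapping ranges, and shows why a /27 GatewaySubnet leaves 27 usable addresses: Azure reserves five addresses in every subnet.

```python
import ipaddress
from itertools import combinations

# Hypothetical address plan: Azure VNets plus an on-premises range reached over ExpressRoute/S2S VPN.
address_plan = {
    "azure-prod-vnet": "10.10.0.0/16",
    "azure-dev-vnet": "10.20.0.0/16",
    "on-prem-datacentre": "10.0.0.0/12",  # covers 10.0.0.0-10.15.255.255, so it clashes with prod
}


def find_overlaps(plan):
    """Return every pair of named ranges whose address spaces overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]


def usable_azure_addresses(cidr):
    """Azure reserves 5 addresses per subnet: network, default gateway, 2 for DNS, broadcast."""
    return ipaddress.ip_network(cidr).num_addresses - 5


print(find_overlaps(address_plan))                 # [('azure-prod-vnet', 'on-prem-datacentre')]
print(usable_azure_addresses("10.10.255.224/27"))  # 27
```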


Network Security Groups (NSGs)

Recommended!!


Automation

Automate everything -> ARM, PowerShell, CLI

No manual changes

ARM deployments are incremental by default

Tag resources

Resource groups & Tags for cost optimisation

Layer the deployment
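"ARM is incremental" and "No manual changes" are connected: in the default Incremental mode, resources that exist in the resource group but not in the template are left untouched, so hand-made additions survive every redeploy and drift from the code. Complete mode deletes them instead. The toy model below is a simplification that tracks resource names only (not property updates); `deploy` is a hypothetical helper, not an Azure SDK call.

```python
from typing import Set


def deploy(existing: Set[str], template: Set[str], mode: str = "Incremental") -> Set[str]:
    """Model which resources remain in a resource group after a deployment.

    Incremental (the ARM default) creates/updates what the template declares and
    leaves everything else alone; Complete also removes what the template omits.
    """
    if mode == "Incremental":
        return existing | template
    if mode == "Complete":
        return set(template)
    raise ValueError(f"unknown mode: {mode}")


rg = {"vnet", "storage", "vm-manual"}  # 'vm-manual' was created by hand in the portal
template = {"vnet", "storage", "vm-web"}
print(sorted(deploy(rg, template)))              # ['storage', 'vm-manual', 'vm-web', 'vnet']
print(sorted(deploy(rg, template, "Complete")))  # ['storage', 'vm-web', 'vnet']
```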


Automation (cont.)

Store ARM templates in a private repository

Linked templates vs. layered ARM templates

Azure Automation for scheduled tasks

TIP#7: Keep your Azure PowerShell and SDK tools up to date

TIP#8: Lock ResourceGroups with ‘CanNotDelete’ lock level

TIP#9: Don’t store passwords in .param files -> use KeyVault!!

Azure Automation

Bonus Tip: Staggered Automation runbook schedules -> PowerShell
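TIP#8 and TIP#9 both end up as JSON. The sketch below emits, from Python so the structure can be checked, a parameter file whose password is a Key Vault reference rather than a literal, plus a `CanNotDelete` lock resource intended for the top level of a resource group template. The vault ID, secret name, user name and lock name are hypothetical placeholders. For the reference to resolve, the vault must allow template deployment (`enabledForTemplateDeployment`), and the matching template parameter should be of type `securestring`.

```python
import json

# Hypothetical identifiers: replace with your own subscription, resource group and vault.
VAULT_ID = ("/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-shared"
            "/providers/Microsoft.KeyVault/vaults/kv-deploy")

# Parameter file: the password is pulled from Key Vault at deployment time, never stored here.
parameters_file = {
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "adminUsername": {"value": "azureadmin"},
        "adminPassword": {
            "reference": {
                "keyVault": {"id": VAULT_ID},
                "secretName": "vmAdminPassword",
            }
        },
    },
}

# Template resource: a CanNotDelete lock on the resource group being deployed to.
lock_resource = {
    "type": "Microsoft.Authorization/locks",
    "apiVersion": "2016-09-01",
    "name": "rg-cannot-delete",
    "properties": {
        "level": "CanNotDelete",
        "notes": "Prevents accidental deletion of this resource group.",
    },
}

print(json.dumps(parameters_file, indent=2))
print(json.dumps(lock_resource, indent=2))
```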


Automation: Tips and Tricks

Use "location": "[resourceGroup().location]" as the default resource location

Use subscription().id and resourceGroup().id (e.g. with uniqueString()) to build unique identifiers in variables

Use listKeys for dynamic value lookups:

…"[listKeys(resourceId('Microsoft.Cache/Redis', parameters('redisCacheName')), '2014-04-01').primaryKey]"


Automation: Tips and Tricks (cont.)

Use outputs for debugging:

"outputs": {
    "RedisSessionStateHost": {
        "type": "string",
        "value": "[concat(parameters('redisCacheName'), '.redis.cache.windows.net')]"
    }
}
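Putting these tips together, a minimal template skeleton might look like the one below (again emitted from Python so the structure can be checked). The parameter and variable names are illustrative; `uniqueString()` is the ARM template function commonly paired with `resourceGroup().id` to get a stable per-resource-group suffix. Outputs are stored in the deployment history, so avoid echoing `listKeys()` results there.

```python
import json

template = {
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "redisCacheName": {"type": "string"},
        # Default every resource to the resource group's region.
        "location": {"type": "string", "defaultValue": "[resourceGroup().location]"},
    },
    "variables": {
        # Same resource group -> same hash -> same name on every redeploy.
        "storageAccountName": "[concat('stg', uniqueString(resourceGroup().id))]",
    },
    "resources": [],
    "outputs": {
        # Echo computed values so they can be inspected after the deployment.
        "storageAccountName": {"type": "string", "value": "[variables('storageAccountName')]"},
        "RedisSessionStateHost": {
            "type": "string",
            "value": "[concat(parameters('redisCacheName'), '.redis.cache.windows.net')]",
        },
    },
}

print(json.dumps(template, indent=2))
```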


Monitoring

OMS (Log Analytics) -> default used by Rackspace

Support -> subscription level

Lots of metrics are captured

Automated alerting -> Support ticket

Example key VM metrics:

Malware signatures update status

Realtime protection

Average CPU greater than 95 percent over 5 minutes

OS disk (C:) has less than 500 MB free space

Recovery vault backup failures
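The VM alert conditions above are simple threshold rules. The sketch below evaluates two of them on hypothetical samples, assuming one CPU reading per minute so that five samples make up the 5-minute window; the constant and function names are illustrative. In practice OMS evaluates such rules itself and raises the alert (and, in the setup described here, the support ticket).

```python
from statistics import mean
from typing import List

CPU_THRESHOLD_PCT = 95.0
CPU_WINDOW_SAMPLES = 5  # assumption: one CPU sample per minute, so 5 samples = 5 minutes
DISK_FREE_LIMIT_MB = 500.0


def cpu_alert(cpu_samples: List[float]) -> bool:
    """True when the average of the most recent 5-minute window exceeds the threshold."""
    window = cpu_samples[-CPU_WINDOW_SAMPLES:]
    return len(window) == CPU_WINDOW_SAMPLES and mean(window) > CPU_THRESHOLD_PCT


def os_disk_alert(free_mb: float) -> bool:
    """True when the OS disk (C:) has less than 500 MB free."""
    return free_mb < DISK_FREE_LIMIT_MB


print(cpu_alert([90, 96, 97, 98, 99, 99]))  # True  (last 5 average 97.8)
print(cpu_alert([100, 100, 100, 100, 50]))  # False (average 90)
print(os_disk_alert(450))                   # True
```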


Monitoring (cont.)

Include PaaS workloads – App Services, DocumentDB, etc.

AppInsights -> URL monitoring -> multiple test locations

Webhooks -> Azure Functions -> OMS Ingestion

TIP#10: OMS has a 15 minute indexing interval
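The "Webhooks -> Azure Functions -> OMS Ingestion" path typically ends in the Log Analytics HTTP Data Collector API. The sketch below builds (but does not send) such a request using only the standard library: the request is signed with HMAC-SHA256 using the workspace's shared key, and the `Log-Type` header names the custom log, which Log Analytics stores with a `_CL` suffix (as in the `sqlazure_CL` sample query later in this deck). Workspace ID, key and log type are placeholders. Even after a successful POST, records only become searchable after the indexing delay in TIP#10.

```python
import base64
import hashlib
import hmac
import json
import urllib.request
from email.utils import formatdate


def build_signature(workspace_id, shared_key, date, content_length):
    """SharedKey authorization header value for the Data Collector API."""
    string_to_hash = f"POST\n{content_length}\napplication/json\nx-ms-date:{date}\n/api/logs"
    decoded_key = base64.b64decode(shared_key)
    digest = hmac.new(decoded_key, string_to_hash.encode("utf-8"), hashlib.sha256).digest()
    return f"SharedKey {workspace_id}:{base64.b64encode(digest).decode()}"


def build_ingestion_request(workspace_id, shared_key, log_type, records):
    """Build a signed POST request for a list of JSON-serializable records."""
    body = json.dumps(records).encode("utf-8")
    rfc1123_date = formatdate(usegmt=True)  # e.g. 'Mon, 13 Feb 2017 09:00:00 GMT'
    headers = {
        "Content-Type": "application/json",
        "Authorization": build_signature(workspace_id, shared_key, rfc1123_date, len(body)),
        "Log-Type": log_type,  # stored as <log_type>_CL, e.g. sqlazure_CL
        "x-ms-date": rfc1123_date,
    }
    url = f"https://{workspace_id}.ods.opinsights.azure.com/api/logs?api-version=2016-04-01"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Usage (not executed here):
# urllib.request.urlopen(build_ingestion_request(WORKSPACE_ID, PRIMARY_KEY, "sqlazure", rows))
```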


OMS Query Samples

ARM deployments:

Type:AzureActivity AND (OperationName="Microsoft.Resources/deployments/write" OR OperationName="Microsoft.Resources/deployments/validate/action") | measure count () by ResourceId, ResourceGroup

Malware signatures out of date:

Type=ProtectionStatus AND (ProtectionStatusRank=250) AND (TypeofProtection="System Center Endpoint Protection")


OMS Query Samples (cont.)

SQL Azure: average CPU utilization percentage greater than 80% over 10 minutes:

Type=sqlazure_CL MetricName_s=cpu_percent | measure max(Average_d) as DBCPU by DatabaseName_s interval 10minutes | where DBCPU >=80
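To sanity-check a threshold before turning it into an alert, the logic of this query can be replayed in plain Python on hypothetical rows: take the maximum `Average_d` per database per 10-minute interval and keep intervals at or above 80. Field names mirror the custom log; the rows and the helper name are made up, and the legacy OMS search syntax shown above was later replaced by the Kusto-based query language.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Row = Tuple[datetime, str, float]  # (TimeGenerated, DatabaseName_s, Average_d)


def high_cpu_intervals(rows: List[Row], threshold: float = 80.0, minutes: int = 10):
    """Max of Average_d per database per fixed interval, keeping intervals >= threshold.

    Assumes `minutes` divides 60, so intervals never straddle an hour boundary.
    """
    peaks: Dict[Tuple[str, datetime], float] = {}
    for ts, db, avg in rows:
        # Floor the timestamp to its interval start (09:07 -> 09:00 for 10-minute buckets).
        start = ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)
        key = (db, start)
        peaks[key] = max(peaks.get(key, avg), avg)
    return sorted((db, start, peak) for (db, start), peak in peaks.items() if peak >= threshold)


rows = [
    (datetime(2017, 2, 14, 9, 1), "orders", 45.0),
    (datetime(2017, 2, 14, 9, 7), "orders", 83.5),
    (datetime(2017, 2, 14, 9, 12), "orders", 60.0),
    (datetime(2017, 2, 14, 9, 3), "catalog", 20.0),
]
print(high_cpu_intervals(rows))
```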


Key Takeaways

TIP#1: Use named accounts (AzureAD) instead of MSA and use MFA!!!

TIP#2: Use billing alerts at the subscription level to manage spend

TIP#3: Automated shutdown of resources without tags. Save $$$

TIP#4: Enable encryption when provisioning. Not after!

TIP#5: Prefix storage accounts with a 3 digit hash (Unique)

TIP#6: Avoid Network Security Groups (NSGs) at the NIC level

TIP#7: Keep your Azure PowerShell and SDK tools up to date

TIP#8: Lock ResourceGroups with ‘CanNotDelete’ lock level

TIP#9: Don’t store passwords in .param files -> use KeyVault!!

TIP#10: OMS has a 15 minute indexing interval


Microsoft Ignite