Azure Stack Assessment

aka “best practices” if you’d like to call it that

Hello everyone. Often I walk into Azure Stack environments where they have “been there, done that”. That is, the customer has gone beyond the core installation and configuration performed by the original equipment manufacturer (OEM), such as DellEMC, Lenovo or HPE. All of the steps up to this point are well documented in the Overall Timeline for an Azure Stack Datacenter integration.

Once the OEM has left, the customer may have installed some resource providers (RPs), completed Azure Stack registration (which enables marketplace syndication), and created things like offers, plans and quotas.

So the customer turns to me and asks, “Did we do this right?” or “Are we following ‘best practices’?” And I can’t resist the standard consultant’s response: “it depends”. Nonetheless, in this post I want to start collecting and collating things to look at as an Azure Stack Operator of your shiny new object known as an Azure Stack Integrated System.

While I am sure I will not cover everything here, consider this a checklist for what you’ve done, and also a checklist for things that still need to be done. Yes, this is all in the Azure Stack documentation. So what’s different here? I’m trying to put a little more order of precedence and narrative to the information, along with reference links not only to Azure Stack docs, but also to related docs in Azure Docs and elsewhere. Any verbiage within this post is not the word of Microsoft, but a sharing of my experience working with many customers in the field. I hope it’s useful to you.

Vendor Cleanup

It’s just like when the construction workers are all done in your kitchen and have left, and then you notice they didn’t sweep the floor or throw away the boxes. Likewise, these are some things I like to double check once the OEM has left:

  • Have all the passwords they left you with been changed? If not, the OEM still has them.
    • Does the Deployment Virtual Machine (DVM) still exist?
      • When the vendor leaves, this should be destroyed. It holds the keys to the kingdom, since it is what builds out an Azure Stack Integrated System. It is the responsibility of the OEM to delete it. While most do, I have walked into customers to find it still on the Hardware Lifecycle Host (HLH). If it is there, you will find it in Hyper-V Manager as a guest machine, alongside the OEM tools installed on other guest VMs (which are OK).
    • Hardware Lifecycle Host (HLH) password
      • Was the HLH also patched and updated by the vendor before they left? Some vendors do this well; some not so well.
    • CloudAdmin password
      • For a little DR, did you create a secondary CloudAdmin account through the PEP?
    • Passwords for the OEM out-of-band management tools, such as
      • DellEMC iDRAC
      • The equivalents for HPE, Fujitsu, etc.
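
If you want to track these checks over time, or across several stamps, a plain data structure is enough. The sketch below is a minimal, hypothetical example: the `CleanupItem` record and the item wording are my own illustration of the list above, not an Azure Stack or OEM tool, and nothing in it connects to the system.

```python
from dataclasses import dataclass


@dataclass
class CleanupItem:
    """One post-OEM cleanup check (hypothetical record, not an Azure Stack API)."""
    name: str
    done: bool
    note: str = ""


def outstanding(items):
    """Return the names of checks that are still open, in list order."""
    return [item.name for item in items if not item.done]


# Illustrative status for one stamp; fill in from your own review.
checklist = [
    CleanupItem("DVM removed from the HLH", done=True),
    CleanupItem("HLH password changed", done=True),
    CleanupItem("HLH patched and updated", done=False, note="vendor did not patch"),
    CleanupItem("CloudAdmin password changed via the PEP", done=True),
    CleanupItem("Secondary CloudAdmin created via the PEP", done=False),
    CleanupItem("Out-of-band management passwords (e.g. iDRAC) changed", done=True),
]

if __name__ == "__main__":
    for name in outstanding(checklist):
        print("OPEN:", name)
```

Keeping one such list per stamp in your code repository gives you a simple audit trail of when each item was closed.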

Azure Stack Roles and Responsibilities

First things first! Who’s on first? Let’s have some governance in place so that no harm is done, intentionally or accidentally. If you are lucky enough to be the sole administrator and do everything, then you assume all roles and hence all risk! But if you are part of a larger organization with multiple people or departments, you may want separation of duties and least privilege, to the degree possible, in Azure Stack administration. Define these roles first. Then build your administrative workstation to get moving.

Inside the Box versus Outside the Box

In any cloud conversation, I like to mention this theme. You will get people who demand “I need the Role Based Access Control (RBAC) Owner role for my VM!” I’ll ask them what their job role is, and they say they run the domain controllers in the cloud. I politely say NO (to granting “Owner”), because they only need to log in to the server, which is “Inside the Box”. Any Azure / Azure Stack resources in either portal are controlled by RBAC, and those are “Outside the Box”. That is, someone may need some degree of control over those Azure / Azure Stack resources without necessarily needing to log in to the domain controller. But if you need to run a template to deploy it, add disks or storage, etc., those tasks are “Outside the Box” and are done in the portal. Ask people what they need to do their jobs. Then design your RBAC only to the degree needed for least privilege and separation of duties, if at all, for “Outside the Box”.
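
The “ask people what they need, then grant the least privilege” approach above can be sketched as a small lookup. Owner, Contributor and Reader are the basic built-in roles; the task catalogue and the task-to-role mapping below are hypothetical examples, and the function only reasons about roles. It does not assign anything in Azure Stack.

```python
# Hypothetical task catalogue for one team: "inside" tasks happen by logging in to the VM
# (no Azure Stack RBAC needed); "outside" tasks act on Azure / Azure Stack resources.
TASKS = {
    "manage the domain controller OS": ("inside", None),
    "view VM status in the portal": ("outside", "Reader"),
    "deploy a template": ("outside", "Contributor"),
    "add disks or storage": ("outside", "Contributor"),
    "grant access to other users": ("outside", "Owner"),
}

# Built-in roles ordered from least to most privileged.
ROLE_RANK = {"Reader": 0, "Contributor": 1, "Owner": 2}


def least_privilege_role(job_tasks):
    """Lowest built-in role covering every outside-the-box task, or None if the
    person only works inside the box."""
    needed = [TASKS[t][1] for t in job_tasks if TASKS[t][0] == "outside"]
    if not needed:
        return None
    return max(needed, key=ROLE_RANK.__getitem__)


if __name__ == "__main__":
    print(least_privilege_role(["manage the domain controller OS"]))  # None
    print(least_privilege_role(["view VM status in the portal", "add disks or storage"]))  # Contributor
```

The domain controller administrator from the story only has inside-the-box tasks, so the sketch returns `None`: no Owner role is needed.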

Outside the box

  • Azure AD Global Administrator account (connected scenario only)
    • In general, this is not needed for day-to-day administration of Azure Stack, and it should not be an account used by the Azure Stack Operators. Don’t forget your separation of duties! But here are the cases when it will be needed:
      1. During the initial installation of the Azure Stack fabric with the OEM.
      2. During some Resource Provider installations. An example is creating an Azure AD application for Web Apps in a connected environment. This account is also used to set up both the SQL and MySQL resource providers.
      3. For typical Azure AD administration, such as adding other users or groups to Azure Active Directory. These users and groups can then be used in RBAC role assignments.
  • Azure Stack Operator
    • Responsible for operating Azure Stack infrastructure end to end: planning, deployment and integration, and packaging and offering cloud resources and requested services on the infrastructure.
    • Want to get officially certified on this? The current exam, 70-537, is good until the end of 2019. It is my understanding that a replacement exam will be developed later. I also have a blog post of study links to help you prepare: aka.ms/70-537
  • Azure Stack Cloud Administrator
    • Once the system is deployed, this is a sacred account, because it is what is required to access the Privileged Endpoint (PEP).
      RECOMMENDATION
      1. Change the password after the vendor visits, as noted above. This is done from the PEP using Set-CloudAdminUserPassword.
      2. Create a secondary Cloud Admin account from the PEP using New-CloudAdminUser.
  • Cloud Architect – see Azure Stack Architecture Solution Architectures
  • Cloud Administrator
  • DevOps: this is a role and/or team that should own the practices below. NOTE: this is not a specific RBAC role or permission, but a categorization for your Azure Stack operations, as in “someone or some team does DevOps”. In doing so, special RBAC roles and/or Graph permissions may be needed to run scripts, pipelines, etc. This could be an entire blog post in and of itself. If I find one, I’ll link it.
    • Continuous Integration, Continuous Deployment (CI/CD)
    • Infrastructure as Code (IaC): think deploying ARM Templates

Inside the Box i.e. in the Virtual Machine

  • Local Admins
  • Domain Admins
  • Virtual Appliance admins
  • Administrators for 3rd party tools once you login to those virtual machines
  • You get it, right?

 

Administrative Workstation

One thing to note about this section: other than the Azure Stack Quickstart templates, almost everything here is also good advice for administering an Azure environment.

Workstation

Every organization has its own methods and requirements for locking down a workstation. I cannot possibly cover all of the possibilities here, but I will share some thoughts and references to help out.

  • Overall guidance, with many links within, is provided at Securing privileged access. Review this document in depth to determine which features you need to implement to satisfy your regulatory and compliance requirements for your administrative environment.
    • Privileged Access Workstation (PAW): also described in the document above.
    • Operating System
    • Tiered administration model
      • This model is not for the faint of heart. In other words, it is an administrative model designed for the most secure environments, such as governments and militaries, where most administration lives in a disconnected world and more modern approaches cannot yet apply.
      • For most organizations I would not recommend this model, particularly if connected to the cloud. But based on the preceding point, it may apply in your scenario.
      • In environments that have disconnected internal spaces or need to provide a secure barrier to the ingress and egress of things like safe downloads from the internet, the article above also discusses Clean Source for installation media.

Azure Stack Tools

This is definitely well documented, but I will reference the links in order, plus a couple of others scattered throughout the Azure Stack Docs that you will also want installed. Ideally, these would be installed on the PAW described above. If you are following all the principles of a secure workstation, particularly in a highly secure environment, everything would be downloaded through the Clean Source media described above.

  1. Install PowerShell for Azure Stack
  2. Download Azure Stack Tools from Github
  3. I like to have Azure Stack Data Transfer Tools installed also to improve managing and moving files from the admin workstation. This is particularly useful if you are in a limited or disconnected environment. In particular from that list I like to have
  4. Connect to Azure Stack with PowerShell as an operator
  5. Connect to Azure Stack with PowerShell as a user
    • Register Resource Providers: resource providers aren’t automatically registered for new user subscriptions that don’t have any resources deployed through the portal. You can explicitly register a resource provider by running this script. As a provider, you can run this when you provision your tenants’ subscriptions.
  6. Create development environment – see Set up a Development Environment for Azure Stack
  7. See also Develop templates for Azure Stack.
    • Do you have a copy of the Azure Stack Quick Start Templates? They are the fastest way to start with ARM templates that you know will work. You can then customize the samples to build out more comprehensive solutions to import into the Azure Stack Marketplace. Click the green Clone or Download button to get these samples onto your clean workstation.
    • Need some help doing this the first time? See Tutorial: Create a VM using a community template
    • IMPORTANT! ARM Templates are Infrastructure as Code. Treat the templates like code: version them and store them in a code repository.
  8. Many people are using Terraform. If you are too, then get the Terraform Provider for Azure Stack.
  9. Are you also building an Azure Stack Development Kit (ASDK)? Then you definitely will want Matt McSpirit’s most excellent scripts to configure it all for you once you build it!
  10. Azure Stack SCOM Management Pack – if you use System Center Operations Manager (SCOM), then you want this add-on! If you don’t have SCOM, well that is someone else’s blog 😉
    • Note: this tool and the next one are options for managing multiple Azure Stack stamps (one stamp = one rack)!
  11. A new partner that does a really nice job of managing multiple Azure Stack stamps is Cloud Assert.
  12. Don’t have SCOM and don’t want to get it? Nagios is another fine tool to consider.
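
The Register Resource Providers step in the list above boils down to “find the namespaces that are not registered yet and register them”. Below is a minimal sketch of just that selection logic. It assumes you already have a snapshot of a tenant subscription’s providers and their registration states; the snapshot is hypothetical, and the actual registration call is left to the linked script.

```python
# Hypothetical snapshot of one tenant subscription: provider namespace -> registration state.
providers = {
    "Microsoft.Compute": "NotRegistered",
    "Microsoft.Network": "NotRegistered",
    "Microsoft.Storage": "Registered",
    "Microsoft.KeyVault": "Registering",
}


def to_register(states):
    """Namespaces that are neither registered nor already being registered, sorted."""
    return sorted(ns for ns, state in states.items()
                  if state not in ("Registered", "Registering"))


if __name__ == "__main__":
    for ns in to_register(providers):
        print("register:", ns)
```

Skipping namespaces that are already registering keeps the step safe to rerun each time you provision a new tenant subscription.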

Gateway

You have a secured administrative workstation. How do you access it from on-premises? Or how do you access Azure Stack if the admin workstation is in the cloud? Again, you may have this all figured out, but here are some options to consider. NOTE: a virtual network gateway of type “ExpressRoute” is not supported within Azure Stack; only the VPN type is.

Connect Azure Stack to Azure using VPN

Configure IPsec/IKE policy for site-to-site VPN connections
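
Because only the VPN gateway type is supported, it can be worth checking templates brought over from Azure before you deploy them. The sketch below scans an ARM template’s `resources` array for `Microsoft.Network/virtualNetworkGateways` entries whose `gatewayType` is not `Vpn`. The function name and the trimmed sample template are hypothetical, and a real gateway resource needs more properties than shown.

```python
import json

# Per the note above, only the VPN gateway type is supported on Azure Stack.
SUPPORTED_GATEWAY_TYPES = {"Vpn"}


def unsupported_gateways(template_text):
    """Names of virtualNetworkGateways resources in an ARM template whose
    gatewayType is missing or not in SUPPORTED_GATEWAY_TYPES."""
    template = json.loads(template_text)
    bad = []
    for res in template.get("resources", []):
        if res.get("type") == "Microsoft.Network/virtualNetworkGateways":
            gateway_type = res.get("properties", {}).get("gatewayType")
            if gateway_type not in SUPPORTED_GATEWAY_TYPES:
                bad.append(res.get("name"))
    return bad


# Trimmed, illustrative template: real gateway resources need more properties.
SAMPLE = """
{
  "resources": [
    {"type": "Microsoft.Network/virtualNetworkGateways", "name": "gw-vpn",
     "properties": {"gatewayType": "Vpn", "vpnType": "RouteBased"}},
    {"type": "Microsoft.Network/virtualNetworkGateways", "name": "gw-er",
     "properties": {"gatewayType": "ExpressRoute"}}
  ]
}
"""

if __name__ == "__main__":
    print(unsupported_gateways(SAMPLE))  # ['gw-er']
```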

 

Health

Quick Ways to check Health

While there are many more sophisticated tools and techniques for monitoring, SIEM and SCOM integration, Health and Alerts in Azure Stack is designed first to be easy and to highlight just what you need, just in time. When the OEM finished your build, they ran these basic tests. But if it has been a while since they came by, or since Microsoft Support had you run some of these simple checks on a support call, here are a couple of quick things you can do from time to time to assess health. As another proactive step, these are good things to check before and after any update as well 😉

  • Test-AzureStack
    • This requires going into the Privileged Endpoint (PEP).
    • The advantage is that it is not only a very comprehensive test of the health of the Integrated System, but it can also be focused on “Cloud infrastructure tests” if there is a specific scenario you want to validate.
    • Additionally, from the link above, you can dump various logs. WARNING: these can get very large! They can be filtered for use by support, and in many cases support will have tools to parse these files as needed, so generally you don’t need to go there. Even if you could crack a log file open and understand what it says is wrong, most likely it is something “under the hood”, and you would still have to contact support to access the privileged environment.
  • Monitor Health in the Portal
    • In the Region management tile of the Azure Stack Operator Portal, there are quite a few quick and simple KPI-like tiles that give easy-to-understand signals.
    • View Health of the Resource Providers and Infrastructure Roles shows a simple dashboard of those two tiles.
    • How about Quick Insights? Perhaps it is buried in Docs (I can’t find it), but this nifty little button really satisfies. It is in both Azure and Azure Stack. Go to Monitor, then select the Activity log, and you will find a little Quick Insights button.
  • Click the Quick Insights button and a super simple summary window pops up! If only I could pin that to my dashboard... I couldn’t find a way to do that. However, if you have any filters enabled in the activity log, you can “pin current filters”.
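
Since the advice is to run these checks before and after any update, it helps to compare the two runs side by side. The sketch below assumes you have recorded each check’s outcome yourself as a simple name-to-PASS/FAIL mapping. The check names are hypothetical labels, and the code does not parse Test-AzureStack output.

```python
def compare_health(before, after):
    """Compare two health snapshots (check name -> "PASS" or "FAIL").

    Returns (regressed, fixed): checks that passed before but not after,
    and checks that did not pass before but pass after."""
    regressed = sorted(n for n, r in before.items() if r == "PASS" and after.get(n) != "PASS")
    fixed = sorted(n for n, r in before.items() if r != "PASS" and after.get(n) == "PASS")
    return regressed, fixed


# Hypothetical results recorded by hand before and after an update.
before = {"storage": "PASS", "capacity": "FAIL", "backup share": "PASS"}
after = {"storage": "PASS", "capacity": "PASS", "backup share": "FAIL"}

if __name__ == "__main__":
    regressed, fixed = compare_health(before, after)
    print("regressed:", regressed)  # regressed: ['backup share']
    print("fixed:", fixed)          # fixed: ['capacity']
```

A check that disappears from the “after” run also counts as regressed, which is deliberate: a missing result deserves a second look.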

Patch & Update – Azure Stack

  • By the time many vendors complete their validation of major updates released by Microsoft and then schedule their deployments with you, their build will sometimes be a little behind the current update.
  • Why start a version behind and then worry about “what if I update on top of all this new stuff”? Just start fresh with the latest patches and updates before you do anything.
  • If the OEM didn’t do so, then immediately apply all patches and updates to the system.
  • How do you get ready for this properly? Refer to the Azure Stack update activity checklist. During the update process, you can Monitor Updates.
  • Got OEM updates? Depending on your OEM, they may have predefined processes and procedures. It is always best to check their site or knowledge bases.
  • Patch and Update process for Azure Stack Resource Providers (RP)
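
To see how far behind a freshly handed-over stamp is, compare its installed version with the released ones. The sketch below assumes dotted versions in the 1.YYMM.x.y style. The build numbers in `RELEASED` are placeholders, not real release numbers; take the actual list from the Azure Stack release notes.

```python
def parse_version(version):
    """'1.1908.0.20' -> (1, 1908, 0, 20) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def pending_updates(installed, released):
    """Released versions newer than the installed one, oldest first."""
    current = parse_version(installed)
    return sorted((v for v in released if parse_version(v) > current), key=parse_version)


# Placeholder build numbers for illustration only.
RELEASED = ["1.1910.0.58", "1.1907.0.20", "1.1908.0.20", "1.1906.0.30"]

if __name__ == "__main__":
    print(pending_updates("1.1907.0.20", RELEASED))  # ['1.1908.0.20', '1.1910.0.58']
```

The numeric parse matters: as plain strings, "1.1908.0.9" sorts after "1.1908.0.20", which is the wrong order.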

VM update and Management

This is something to pay attention to on both the Azure Stack Operator side of the house and in the tenant spaces. There needs to be a process in place, as well as roles and responsibilities around it. See VM update and management automation in Azure Stack for options to consider for your VMs.

Azure Stack Marketplace

Before you get to any IaaS or PaaS services, you need some Compute resources in place, and likely VM extensions to support the deployment of those resources. While it is possible in a connected scenario to simply use the portal to download marketplace items, backup and DR are reasons to consider getting your marketplace items as you would in a disconnected scenario. When you use the Marketplace Syndication Tool, you can export all images, 3rd party add-ons, templated solutions and VM Extensions to your favorite external drive or network share, accessible to one or more stacks as needed. You then have the added benefit not only of reusability across multiple stacks, but also of having all of those artifacts backed up.

What is available in the Marketplace

If you have not yet explored the possibilities of marketplace items to download from Azure, see Azure Marketplace items available for Azure Stack. Better yet, ask your end users what they need to do their jobs and deploy their applications. Then their requirements will help to drive what you need to grab from the marketplace. Make this something to check regularly as the Azure Stack Operator, as new items can be added at any time. Here is a script you can try out to compare what you have versus what is available.
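
The core of “what you have versus what is available” is a pair of set differences. This is a minimal sketch of that idea only, not the linked script. It assumes you already have two lists of item names (the names below are hypothetical) and reports both directions: items you could add, and items you hold that are no longer offered.

```python
def marketplace_report(downloaded, available):
    """Compare items on this stamp with items offered for download.

    Returns (missing, stale): offered items you have not downloaded, and
    downloaded items that are no longer offered."""
    downloaded, available = set(downloaded), set(available)
    return sorted(available - downloaded), sorted(downloaded - available)


# Hypothetical item names for illustration.
on_stamp = ["Windows Server image", "Linux image v1"]
offered = ["Windows Server image", "Linux image v2", "Custom Script extension"]

if __name__ == "__main__":
    missing, stale = marketplace_report(on_stamp, offered)
    print("missing:", missing)  # missing: ['Custom Script extension', 'Linux image v2']
    print("stale:", stale)      # stale: ['Linux image v1']
```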

Azure Extensions

This deserves its own honorable mention. While extensions are included in the Azure Stack Marketplace, they fall into the “Inside the Box” capabilities mentioned above. Once a virtual machine is deployed, software may need to be installed or scripted via an extension, which acts on behalf of the installer to do things like promote a domain controller or add monitoring tools to the VM.

In the tenant virtual machines, you will need to know which extensions are required for some services (for example, SQL Server) versus those which are nice to have or depend on requirements.
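
One way to keep the “required versus nice to have” distinction explicit is a small catalogue per workload. In the sketch below, the workload names and extension names are hypothetical placeholders, not real publisher or type names. The function reports what a given VM is missing in each category.

```python
# Hypothetical catalogue: for each workload, which extensions are required and which are
# nice to have. The extension names are placeholders, not real publisher/type names.
CATALOGUE = {
    "sql-server": {"required": {"sql-iaas-agent"}, "optional": {"monitoring-agent"}},
    "domain-controller": {"required": {"dsc"}, "optional": {"monitoring-agent", "antimalware"}},
}


def extension_gaps(workload, installed):
    """Return (missing_required, missing_optional) for one VM's workload."""
    entry = CATALOGUE[workload]
    installed = set(installed)
    return sorted(entry["required"] - installed), sorted(entry["optional"] - installed)


if __name__ == "__main__":
    print(extension_gaps("sql-server", ["monitoring-agent"]))  # (['sql-iaas-agent'], [])
    print(extension_gaps("domain-controller", ["dsc"]))        # ([], ['antimalware', 'monitoring-agent'])
```

A missing required extension points to a gap in what you have downloaded to the Marketplace. A missing optional one is just a conversation to have with the tenant.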

 

Publish Marketplace Items

As mentioned above under ARM Templates in the Azure Stack Tools section, one goal as a provider should be to simplify life for your tenants and users in their subscriptions. You can add value by creating custom solution templates to put in the Marketplace. See Create and Publish a Marketplace Item in Azure Stack. Build the Azure environment around the thing they download from the Marketplace, rather than making them download a VM and then configure storage, networks, etc. themselves. By building a complete solution for your users, you add value by simplifying their application deployments while reducing the support calls that may arise when they cannot configure the Azure environment properly.
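
Treating templates as code also means checking them before they are committed or published. The sketch below is a hypothetical pre-publish sanity check you might run in a CI pipeline. It only verifies that the file is valid JSON and has the top-level `$schema`, `contentVersion` and `resources` properties an ARM template requires, expecting the classic array form of `resources`. It is no substitute for a real validation deployment against your stamp.

```python
import json

# Top-level properties every ARM deployment template must have.
REQUIRED_KEYS = ("$schema", "contentVersion", "resources")


def template_problems(text):
    """Return a list of basic structural problems in an ARM template (empty if none)."""
    try:
        template = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(template, dict):
        return ["top level must be a JSON object"]
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in template]
    # This sketch expects the classic form, where resources is an array.
    if "resources" in template and not isinstance(template["resources"], list):
        problems.append("resources must be an array")
    return problems


if __name__ == "__main__":
    print(template_problems('{"resources": {}}'))
```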

 

Backup and Recovery

Now that you have done all of this work to get your Azure Stack Integrated system up and running, you surely don’t want to lose anything now or in the future. Here are some links to make sure you have the pieces of the puzzle in place to protect what you have. I’ll put them in order and then include additional links and resources to help.

Infrastructure Backup

Understand what is covered and what is not covered with this approach. It is also important to review the Infrastructure limits page, which describes other things that will or will not be backed up. Please review Infrastructure Backup Service Best Practices before proceeding with the steps below, or now if you have already set up infrastructure backup. Then proceed to enable or verify the following steps:

  1. Verify the Requirements are in place for the Infrastructure Backup Service
    • These backups help with redeploying the Azure Stack cloud to restore identity, security, and Azure Resource Manager data.
    • NOTE: this does NOT include IaaS/PaaS resources. Make sure you have reviewed the links under Infrastructure Backup above.
  2. Enable Azure Stack Infrastructure Backup from PowerShell
    • Yes, it is true that you CAN do this through the Administrator Portal, but here are two reasons to use PowerShell instead:
      • Having the process scripted is in itself a form of backup! It is faster and easier to run again if needed.
      • If you run more than one stack, it is likewise faster to reproduce this across multiple systems.
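
Once backups are enabled, “verify” means checking regularly that they actually succeed. The sketch below assumes you obtain the time of the last successful infrastructure backup and your configured backup frequency yourself, for example from the Administrator Portal. The function and its one-hour grace period are hypothetical choices, not part of the backup service.

```python
from datetime import datetime, timedelta, timezone


def backup_overdue(last_success, frequency_hours, now=None, grace_hours=1):
    """True if no successful infrastructure backup has completed within the
    configured frequency plus a grace period. All datetimes must be timezone-aware."""
    now = now or datetime.now(timezone.utc)
    return now - last_success > timedelta(hours=frequency_hours + grace_hours)


if __name__ == "__main__":
    now = datetime(2019, 9, 1, 12, 0, tzinfo=timezone.utc)
    last = datetime(2019, 8, 31, 20, 0, tzinfo=timezone.utc)  # 16 hours earlier
    print(backup_overdue(last, frequency_hours=12, now=now))  # True
```

Run the same check against every stamp you operate. Like the scripted setup itself, it scales across systems with no extra effort.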

Protect IaaS and PaaS resources

The methods in the options below will vary depending on whether you are connected or disconnected. So beware as you read through the options: if a solution depends on Azure or is a SaaS offering and you are disconnected, just move on to something else.

Virtual Machines

Applications deployed into IaaS VMs can be protected at the guest OS level using backup agents. Data can be restored to the same IaaS VM, to a new VM on the same system, or a different system in the event of a disaster. Backup agents support multiple data sources in an IaaS VM such as:

  • Disk: This requires block-level backup of one, some, or all disks exposed to the guest OS. It protects the entire disk and captures any changes at the block level.
  • File or folder: This requires file system-level backup of specific files and folders on one, some, or all volumes attached to the guest OS.
  • OS state: This requires backup targeted at the OS state.
  • Application: This requires a backup coordinated with the application installed in the guest OS. Application-aware backups typically include quiescing input and output in the guest for application consistency (for example, Volume Shadow Copy Service (VSS) in the Windows OS).
  1. Use this article as a plan to Protect VMs deployed on Azure Stack
  2. Use Azure Backup in connected scenarios to back up to Azure
  3. Use Azure Backup Server in connected or disconnected scenarios
    • Note: even though you will also see DPM in the installation wizard, it is not supported for Azure Stack.
  4. There are various 3rd party and ISV solutions. Check with your favorite backup provider. Some are included in this blog. Below are some of the other backup provider solutions that you may want to evaluate – in no particular order.

Platform as a Service (PaaS) Resource Providers

For your Web Apps, API apps and Functions, see Back up App Service on Azure Stack

As part of the backup plan noted above, keep in mind that one of the deployment options is to build both the File Servers and the Database servers off the stack, which helps protect against a complete failure of the appliance by keeping the data off stack.

For both SQL and MySQL Resource Providers, keep in mind these two things:

The best way to deploy both is with the PowerShell scripts. If you preserved those well, redeploying will be easier because your scripts are ready to go. This also makes them reusable if they are parameterized nicely.

If you have a SQL Always On availability group, follow the guidance to Configure backups on secondary replicas of an Always On availability group.

Everything else for the SQL and MySQL RPs is virtual machines with databases. So all of the methods above for virtual machines can be used to protect those databases, in addition to clustering technologies. Although not well documented for Azure Stack, the cluster could possibly span multiple stacks, or include database servers that are not located on Azure Stack.

*Although there is no definite date for when this will happen, there is a plan to eventually move away from the various methods of deploying RPs and deploy them through the marketplace instead. Stay tuned for that! In build 1908 you can already see this option in Marketplace management in the Azure Stack Administrator portal, but the functionality is not there yet.

 

3rd Party Templated Solutions

The templates themselves can be backed up as described above when you download marketplace images using the Marketplace Syndication Tool. Beyond that, you have to consider the related backup and DR options for those solutions and/or virtual machines.

Monitoring & Management

Azure Stack has Azure Monitor at its core as a platform capability, so you can Proactively Monitor Azure Stack operations. This provides many, but not all, of its big sister Azure’s capabilities. The differences are explained in How to consume monitoring data from Azure Stack, which notes that “Not all monitoring data that is found in Azure Global will be in Azure Stack…”. It is a good list to review so you will know.

The SCOM Management Pack and Nagios plugin for monitoring are listed above in Azure Stack Tools. Read more about how those work in Integrate external monitoring solution with Azure Stack, plus a User Guide for the SCOM management pack, as well as other options for using PowerShell to monitor health and alerts.

Monitoring Tenant VMs

Since these are just virtual machines, you can install any agent you want, provided you are also willing to build and/or bake in everything else required to support it. Or, if you want to monitor your VMs easily, there is an Azure Monitor extension you can use.

SIEM Integration

The initial setup of this can be done in the deployment worksheet, so you may be all set. But if you didn’t do that and now want to integrate with something like Splunk…

 

Capacity Management

There are two interrelated perspectives on capacity management. On the Azure Stack Operator side, operators are ultimately responsible for the capacity of the overall integrated system, and hence the resources that can be consumed by tenants. On the tenant side, tenants will have quotas set by the Azure Stack operators, so they will have to manage how they use resources so as not to exceed their quotas, in particular for networking, storage and compute.

Azure Stack Operator Capacity Management

There are some handy little KPI tiles that will show you some of this information. Keeping an eye on these and taking action as needed should be someone’s responsibility.
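
Whoever owns this will find it easier with agreed thresholds than by eyeballing the tiles. Below is a minimal sketch that flags any resource at or above a warning or critical utilization ratio. The 70% and 85% thresholds and the sample numbers are hypothetical; read the real values from the capacity tiles and choose thresholds that fit your stamp.

```python
def capacity_alerts(usage, warn=0.70, critical=0.85):
    """usage maps a resource name to (used, total).

    Returns {resource: "warning" | "critical"} for anything at or above a threshold.
    Resources with a non-positive total are skipped rather than divided by zero."""
    alerts = {}
    for resource, (used, total) in usage.items():
        if total <= 0:
            continue
        ratio = used / total
        if ratio >= critical:
            alerts[resource] = "critical"
        elif ratio >= warn:
            alerts[resource] = "warning"
    return alerts


# Illustrative numbers for one stamp.
usage = {
    "memory (GB)": (1200, 1536),
    "storage (TB)": (40, 100),
    "public IPs": (28, 31),
}

if __name__ == "__main__":
    print(capacity_alerts(usage))  # {'memory (GB)': 'warning', 'public IPs': 'critical'}
```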

Resource Providers’ Capacity

Your users may request additional SKUs, or your databases may fill up. Therefore, monitoring and adjusting are needed for these as well.

 

Azure Stack Tenant VMs Capacity

Tenants will have to monitor this for their applications and virtual machines (e.g. with the Azure Monitor extension and alerts). Ideally, this would have been planned up front, based on testing and/or requirements. Nonetheless, it will then be a matter of doing one of the following as needed:

Additional Tenant Considerations

Security

For everything within the stack, or “under the hood”, security is not your concern: it is secure by design. Don’t believe me? Listen to the product group’s talk at Ignite, Discovering the Importance of Security Design Principles and Key Use cases for Azure. This is one of my favorite presentations, because once you hear and understand what was done to harden Azure Stack, it is very satisfying.

So what remains are some other Security related items that have come up from our customers, so I’ll address those here. (Thanks Pat T for the ideas :))

  • Vulnerability Assessment
  • Web Apps over HTTPS

Cost

  • Underutilized VMs
  • Un-needed resources such as
    • Unused Gateways
    • Public IP addresses
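
Both kinds of waste listed above are easy to flag once you have the data. The sketch below assumes you have already collected average CPU per VM from your monitoring, and which public IPs and gateways have anything attached. All names and data are hypothetical, and the 5% CPU threshold is an arbitrary starting point.

```python
def underutilized_vms(avg_cpu_percent, threshold=5.0):
    """VMs whose average CPU over your measurement window is below threshold (percent)."""
    return sorted(vm for vm, cpu in avg_cpu_percent.items() if cpu < threshold)


def unused(resources):
    """Names of resources (public IPs, gateways, ...) with nothing attached to them."""
    return sorted(name for name, attached in resources.items() if not attached)


# Illustrative data; in practice collect it from monitoring and the resource properties.
avg_cpu = {"web01": 42.0, "test07": 1.5, "build02": 3.9}
public_ips = {"pip-web01": "web01-nic", "pip-old": None}
gateways = {"gw-branch": ["conn-branch1"], "gw-legacy": []}

if __name__ == "__main__":
    print(underutilized_vms(avg_cpu))  # ['build02', 'test07']
    print(unused(public_ips))          # ['pip-old']
    print(unused(gateways))            # ['gw-legacy']
```

Treat the output as a list of conversations with resource owners rather than a delete list. A quiet VM may be a standby node.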

Azure Scaffolding

Azure Scaffold is prescriptive guidance written by one of our fine architects, Rob. Initially, it looked at many of the capabilities of Azure Resource Manager for a subscription. Check out the scaffold link, as there is much more in it now and much of it is covered above, but I wanted to address these in particular since they were not mentioned above.

  • Resource Locks
  • Azure Policy
  • Tags
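
Tags only help with cost and ownership if they are applied consistently, so an occasional audit is useful alongside whatever Resource Locks or Azure Policy you put in place. The sketch below checks a list of resources against a required-tag set. The tagging standard and the resource records are hypothetical.

```python
# Hypothetical tagging standard; replace with your own.
REQUIRED_TAGS = {"owner", "costCenter", "environment"}


def missing_tags(resources):
    """Map resource name -> sorted list of required tags it lacks (non-compliant only)."""
    report = {}
    for res in resources:
        missing = REQUIRED_TAGS - set(res.get("tags") or {})
        if missing:
            report[res["name"]] = sorted(missing)
    return report


# Illustrative resources.
resources = [
    {"name": "vm-web01", "tags": {"owner": "app-team", "costCenter": "1234", "environment": "prod"}},
    {"name": "vm-test07", "tags": {"owner": "qa"}},
    {"name": "pip-old"},
]

if __name__ == "__main__":
    for name, tags in missing_tags(resources).items():
        print(name, "is missing", ", ".join(tags))
```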

 
