Automatically rotate the password of a service principal

  • Oct 12, 2022
  • Azure
Service Principal password

When creating a DevOps pipeline for your infrastructure as code (IaC), you often need a service principal to manage resources in your Azure subscription. A service principal is an identity you can give permission to access Azure resources. There are two flavors: application and managed identity. A (system-assigned) managed identity is tied to a resource and eliminates the need for credentials. The identity has the same lifecycle as the resource: when the resource is deleted, the managed identity is deleted as well. An application identity is the representation of a registered application in your AD tenant. It's usually used for user-created apps or DevOps pipelines. Unlike a managed identity, it requires you to maintain credentials. There are three ways to authenticate: passwords, certificates, and federated identity credentials. Federated identity credentials remove the burden of maintaining passwords and certificates: they enable external workloads to authenticate and access Azure resources without holding any secrets. As this is a relatively new feature, it's not widely supported yet. In the rest of this blog post, I'll refer to application identities when mentioning service principals.
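To make the application flavor concrete, here is a minimal Terraform sketch of an application registration and its service principal. It assumes the `azuread` provider 2.x (the major version the `application_id` argument used later in this post implies); the resource names and the display name `sp-emergency-demo` are hypothetical.

```hcl
terraform {
  required_providers {
    azuread = {
      source  = "hashicorp/azuread"
      version = "~> 2.0" # assumption: azuread 2.x argument names
    }
  }
}

# The application registration: the "application" flavor of identity
resource "azuread_application" "emergency" {
  display_name = "sp-emergency-demo" # hypothetical display name
}

# The service principal that represents the registration in this tenant;
# Azure role assignments target this object
resource "azuread_service_principal" "emergency" {
  application_id = azuread_application.emergency.application_id
}
```

The sketch deliberately declares no `azuread_application_password`. A password created by Terraform ends up in the state file, and because the runbook described later removes and recreates passwords outside Terraform, the next `terraform apply` would most likely try to recreate it.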

Maintainability is a significant downside of using passwords and certificates to authenticate service principals. It becomes even more problematic when engineers use service principals to access Azure resources from their local development environment. Even though you want to prevent this as much as possible, it can be helpful when debugging and fixing production problems. Giving engineers access to service principal credentials leads to secrets sprawl. Even though secrets sprawl doesn't happen overnight, it's a significant risk. This blog post explains how to prevent secrets sprawl by rotating the service principal password. The solution I'll present combines two services: Privileged Identity Management (PIM) to obtain access to the service principal password, and Azure Automation to rotate the password.

Secrets sprawl happens when secrets are stored in different places across the organization. In the introduction, I described the situation where engineers use a service principal to access Azure resources from their local environment. For convenience, engineers might store the password in a password manager (like 1Password). You might end up with the password being stored in multiple places.

Some key points to give more context about the scenario:

  • Terraform to manage infrastructure in Azure
  • For local development, engineers use their AD account to authenticate (no access to production)
  • In case of production problems, a service principal with high privileges can be used

In case of an emergency, the workflow is:

  1. The engineer sets the service principal credentials in environment variables to authenticate
  2. The engineer makes the changes in the Terraform code
  3. The engineer runs terraform plan to review the changes
  4. The engineer runs terraform apply to apply the changes in Azure
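Step 1 works because the `azurerm` provider can read its credentials from the environment, so nothing secret has to be written into the code. A minimal sketch, assuming `azurerm` 3.x; the variable names in the comment are the provider's documented `ARM_*` environment variables:

```hcl
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0" # assumption: azurerm 3.x
    }
  }
}

# No credentials in code. For client-secret authentication the provider reads
# ARM_CLIENT_ID, ARM_CLIENT_SECRET, ARM_TENANT_ID and ARM_SUBSCRIPTION_ID from
# the environment (step 1). When they are not set, it falls back to the
# Azure CLI login, i.e. the engineer's own AD account used for local development.
provider "azurerm" {
  features {}
}
```

The same configuration therefore serves both day-to-day development and the emergency flow; only the environment differs.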

 

You want to give engineers all the tools they need to solve production problems as fast as possible. However, using service principals locally is a significant security risk, especially when multiple engineers share the same service principal. Rotating the password after it's used reduces the risk that credentials fall into the wrong hands. A second problem to address is traceability: if an engineer uses the service principal, an audit record must be stored so that any infrastructure change can be traced back to that engineer. PagerDuty is often used to retain this kind of information, especially for production problems. To be clear, in almost all cases you roll out infrastructure with a CI/CD pipeline. However, in case of critical production problems, you want to move fast. Don't reuse the service principal of the CI/CD pipeline; use a second service principal for emergencies.
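The emergency service principal needs its own high-privileged role assignment, separate from the pipeline's identity. A minimal sketch, assuming `azurerm` 3.x and a hypothetical input variable that holds the emergency service principal's object ID. The built-in `Contributor` role at subscription scope is an assumption; pick the role your emergency changes actually require.

```hcl
# Hypothetical input: object ID of the emergency service principal
# (not the one used by the CI/CD pipeline)
variable "emergency_sp_object_id" {
  type        = string
  description = "Object ID of the emergency service principal"
}

# The subscription the current provider configuration points at
data "azurerm_subscription" "current" {}

# High-privileged assignment for emergencies only. Contributor cannot create
# role assignments; use Owner (or add User Access Administrator) if your
# Terraform code manages role assignments itself.
resource "azurerm_role_assignment" "emergency_contributor" {
  scope                = data.azurerm_subscription.current.id
  role_definition_name = "Contributor"
  principal_id         = var.emergency_sp_object_id
}
```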

 

The proposed solution involves the following interactions:

  1. An engineer activates the role assignment to become a member of the group 'Demo group'
  2. The record "Add member to role completed (PIM activation)" is written to the audit log and sent to the Log Analytics workspace
  3. The engineer has access to the Key Vault and gets the service principal password
  4. PIM deactivates the role assignment after X minutes or hours
  5. The record "Remove member from role (PIM activation expired)" is written to the audit log and sent to the Log Analytics workspace
  6. An alert fires based on the previous audit log record
  7. An Azure Automation (PowerShell) runbook is executed, which:
    1. Generates a new password for the service principal
    2. Stores the new password in Key Vault

Use PIM for just-in-time access

With Privileged Identity Management (PIM), you can restrict access to resources using time-based and approval-based activation. PIM requires an Azure Active Directory Premium P2 license. Three eligible assignment types are available: AD roles, privileged access groups (in preview), and Azure resources. With privileged access groups, selected users are eligible to become a member of a group. In my example, I created the group `Demo group`. This group has access to read secrets in a Key Vault. User John Doe is eligible to become a member of this group and can activate the eligible membership in PIM. A second user gets notified and must approve or decline the request. Once the request is approved, John Doe has access to the Key Vault to get the service principal's password.
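The Terraform side of this setup (the group and its read access to the Key Vault) could look roughly like the sketch below. It assumes `azurerm` 3.x, `azuread` 2.x, the `azurerm_resource_group.rg` used elsewhere in this post, and a vault that uses Azure RBAC instead of access policies. The vault name is hypothetical, and John Doe's PIM eligibility still has to be configured outside Terraform.

```hcl
data "azurerm_client_config" "current" {}

# Security group whose membership is handed out just in time through PIM
resource "azuread_group" "demo" {
  display_name     = "Demo group"
  security_enabled = true
}

# Key Vault that holds the emergency service principal password
resource "azurerm_key_vault" "kv" {
  name                      = "kv-spn-demo-12345" # hypothetical, must be globally unique
  location                  = azurerm_resource_group.rg.location
  resource_group_name       = azurerm_resource_group.rg.name
  tenant_id                 = data.azurerm_client_config.current.tenant_id
  sku_name                  = "standard"
  enable_rbac_authorization = true # azurerm 3.x argument name
}

# Group members (only while their PIM activation lasts) may read secrets
resource "azurerm_role_assignment" "demo_group_secrets_user" {
  scope                = azurerm_key_vault.kv.id
  role_definition_name = "Key Vault Secrets User"
  principal_id         = azuread_group.demo.object_id
}
```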

Using PIM is a more compliant way of working because users only obtain access for a short period, and a second user must approve it. Even though PIM is already a significant improvement, it doesn't solve secrets sprawl: once engineers have read the password, they can still store it locally for later use and bypass PIM.

Azure PIM in Terraform?

Unfortunately, it's not (yet) possible to manage eligible assignments in Terraform. You can track the feature request on GitHub. In the meantime, you can configure privileged access for groups in Azure Active Directory: browse to the group and click Privileged access in the left menu.

Store PIM actions in Log Analytics workspace

Azure Log Analytics allows you to collect logs and data from Azure resources. Azure Active Directory logs different kinds of information, e.g., audit, sign-in, provisioning, risky user, and risk detection events. Logs can be sent to multiple destinations: a Log Analytics workspace, a storage account, an event hub, or a partner solution. You can configure this in the Diagnostic settings of Azure Active Directory.

When an engineer activates the eligible assignment, they become a member of the Demo AD group. Active Directory creates the log entry "Add member to role completed (PIM activation)" when the assignment is activated, and "Remove member from role (PIM activation expired)" when the activation expires. These log entries are available under Monitoring > Audit logs in the left menu. I configured Azure Active Directory to send these logs to a Log Analytics workspace. A nice feature of Log Analytics workspaces is the ability to create alerts. An alert is triggered when a specific condition is met, and it then executes one or more actions. I created an alert that triggers when a log search query returns at least one result; when that happens, an Azure Automation runbook is executed.

An alert consists of one or more conditions and one or more actions.

  • A condition is a signal (plus some logic) that causes an alert to be triggered. There are multiple alert types, e.g., metric, log, activity log, and resource health alerts
  • An action is executed when a condition is met. Examples of actions are SMS notifications, Automation runbooks, Azure Functions, etc.

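The log search query is embedded in the query argument of the alert rule in the Terraform code below. It filters the AuditLogs table on PIM group management events with the operation name "Remove member from role (PIM activation expired)".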

Log Analytics workspace in Terraform

The code snippet below shows how you can create a Log Analytics Workspace and ensure that the audit logs are pushed to the workspace. Also, you can see how to configure a query alert rule and connect it to an action.


              resource "azurerm_log_analytics_workspace" "log_analytics_workspace" {
                name                = "log-demo-ad-audit"
                location            = local.location
                resource_group_name = azurerm_resource_group.rg.name
                retention_in_days   = 30
              }

              resource "azurerm_monitor_aad_diagnostic_setting" "aad_diagnostics_setting_audit_logs" {
                name                       = "audit-logs-to-log-analytics"
                log_analytics_workspace_id = azurerm_log_analytics_workspace.log_analytics_workspace.id
                log {
                  category = "AuditLogs"
                  enabled  = true
                  retention_policy {}
                }
              }

              resource "azurerm_monitor_scheduled_query_rules_alert" "monitor_scheduled_query_rules_alert" {
                name                = "sqra-pim-group-expiration-${random_integer.suffix.result}"
                location            = azurerm_resource_group.rg.location
                resource_group_name = azurerm_resource_group.rg.name
              
                action {
                  action_group = [azurerm_monitor_action_group.monitor_action_group.id]
                }
                data_source_id = azurerm_log_analytics_workspace.log_analytics_workspace.id
                description    = "Query audit log for PIM group assignment expiration"
                enabled        = true
                query          = <<-QUERY
                AuditLogs
                  | mv-expand TargetResources
                  | where Category == 'GroupManagement'
                  | where LoggedByService == 'PIM'
                  | where OperationName == 'Remove member from role (PIM activation expired)'
                  | sort by TimeGenerated desc
              QUERY
                severity       = 3
                frequency      = 5
                time_window    = 5
                trigger {
                  operator  = "GreaterThanOrEqual"
                  threshold = 1
                }
                depends_on = [
                  azurerm_log_analytics_workspace.log_analytics_workspace,
                  azurerm_monitor_action_group.monitor_action_group
                ]
              }
              
              resource "azurerm_monitor_action_group" "monitor_action_group" {
                name                = "ag-pim-group-expiration-${random_integer.suffix.result}"
                resource_group_name = azurerm_resource_group.rg.name
                short_name          = "pimgexp"
              
                automation_runbook_receiver {
                  name                    = "action_run_book_receiver"
                  automation_account_id   = azurerm_automation_account.automation_account.id
                  runbook_name            = azurerm_automation_runbook.run_book_change_spn_password.name
                  webhook_resource_id     = azurerm_automation_webhook.web_book_change_spn_password.id
                  is_global_runbook       = false # user-created runbook, not a built-in (global) runbook
                  service_uri             = azurerm_automation_webhook.web_book_change_spn_password.uri
                  use_common_alert_schema = true
                }
              }
          

Rotate the password with Azure Automation

In the previous section, I explained that an alert is triggered based on an audit log entry. The alert executes an Azure Automation runbook. Azure Automation is a service to automate management tasks. In my case, I want to change the service principal's password when the audit log entry “Remove member from role (PIM activation expired)” is created. I created a runbook with PowerShell. A runbook can be started by a schedule or a webhook; the alert executes an action that calls the runbook's webhook.

In the introduction, I explained the two types of service principals: application and managed identity. Managed identities are tied to an Azure resource, and I enabled one for the Azure Automation account. The managed identity gives the runbook access to protected resources. When a managed identity is enabled, a service principal is created automatically. I gave this service principal permission to manage secrets in Key Vault (the Key Vault Secrets Officer built-in role). It also has the Application.ReadWrite.All application permission on the Graph API. Note that it's not possible to set these Graph permissions for a managed identity in the portal. Usually, these permissions are set on the application registration; however, a managed identity doesn't have an app registration. You have to assign them through the Graph API instead, for example with Terraform.
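The Key Vault Secrets Officer assignment mentioned above could be expressed in Terraform as in the sketch below. It assumes `azurerm` 3.x, a vault with Azure RBAC authorization enabled, and a hypothetical vault name; `azurerm_automation_account.automation_account` is the Automation account with the system-assigned identity defined later in this post.

```hcl
# Hypothetical lookup of the Key Vault that stores the emergency password
data "azurerm_key_vault" "spn_password_vault" {
  name                = "kv-spn-demo-12345" # hypothetical name
  resource_group_name = azurerm_resource_group.rg.name
}

# Allow the Automation account's system-assigned identity to write secrets
resource "azurerm_role_assignment" "automation_secrets_officer" {
  scope                = data.azurerm_key_vault.spn_password_vault.id
  role_definition_name = "Key Vault Secrets Officer"
  principal_id         = azurerm_automation_account.automation_account.identity[0].principal_id
}
```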

The PowerShell runbook itself is straightforward. It executes the following actions:

  • Get the emergency service principal's application registration
  • Remove its existing passwords and create a new one
  • Store the new password in Key Vault
          
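            param (
              [Parameter (Mandatory = $false)]
              [object] $WebHookData
            )

            if ($WebHookData)
            {
              # Sign in with the Automation account's system-assigned managed identity
              Connect-AzAccount -Identity

              # Settings stored as Automation variables
              $targetAdAppName    = Get-AutomationVariable -Name 'AdAppName'
              $keyvaultName       = Get-AutomationVariable -Name 'KeyVaultName'
              $keyvaultSecretName = Get-AutomationVariable -Name 'KeyVaultSecretName'

              # Reuse the managed identity's token for Microsoft Graph
              # (Az.Accounts 2.x / Microsoft.Graph 1.x pass the token as a plain string)
              $token = (Get-AzAccessToken -ResourceTypeName MSGraph).Token
              Connect-MgGraph -AccessToken $token

              # Look up the emergency app registration by display name
              $app = Get-MgApplication -Filter "DisplayName eq '$targetAdAppName'"
              if (-not $app -or @($app).Count -gt 1)
              {
                throw "Expected exactly one application named '$targetAdAppName'"
              }
              Write-Output "Application: $($app.DisplayName) (client ID $($app.AppId))"

              # Remove all existing passwords so previously retrieved secrets stop working
              foreach ($passwordCredential in $app.PasswordCredentials) {
                Remove-MgApplicationPassword -ApplicationId $app.Id -KeyId $passwordCredential.KeyId
              }

              # Let Microsoft Graph generate a new password (SecretText is only returned once)
              $newPassword = Add-MgApplicationPassword -ApplicationId $app.Id
              Write-Output "New password created for AD app"
              $secretSecureString = ConvertTo-SecureString -String $newPassword.SecretText -AsPlainText -Force

              # Store the new password in Key Vault for the next PIM-approved engineer
              Set-AzKeyVaultSecret -VaultName $keyvaultName -Name $keyvaultSecretName -SecretValue $secretSecureString -Expires "2099-01-01T00:00:00Z"
              Write-Output "Password stored in Key Vault"
            }
            else
            {
              Write-Output "No webhook request body found"
            }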
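The runbook reads its settings from three Automation variables: `AdAppName`, `KeyVaultName`, and `KeyVaultSecretName`. A sketch of how they could be created next to the Automation account in Terraform, assuming `azurerm` 3.x; the values are hypothetical placeholders that must match your emergency app registration and Key Vault.

```hcl
# Display name of the emergency app registration (hypothetical value)
resource "azurerm_automation_variable_string" "ad_app_name" {
  name                    = "AdAppName"
  resource_group_name     = azurerm_resource_group.rg.name
  automation_account_name = azurerm_automation_account.automation_account.name
  value                   = "sp-emergency-demo"
}

# Key Vault that receives the rotated password (hypothetical value)
resource "azurerm_automation_variable_string" "key_vault_name" {
  name                    = "KeyVaultName"
  resource_group_name     = azurerm_resource_group.rg.name
  automation_account_name = azurerm_automation_account.automation_account.name
  value                   = "kv-spn-demo-12345"
}

# Name of the Key Vault secret that holds the password (hypothetical value)
resource "azurerm_automation_variable_string" "key_vault_secret_name" {
  name                    = "KeyVaultSecretName"
  resource_group_name     = azurerm_resource_group.rg.name
  automation_account_name = azurerm_automation_account.automation_account.name
  value                   = "emergency-spn-password"
}
```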
          
        

Create an Azure Automation account and a PowerShell Runbook in Terraform

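The Microsoft.Graph PowerShell modules aren't available in an Automation account by default, so the code below imports Microsoft.Graph.Authentication and Microsoft.Graph.Applications. The azurerm_monitor_action_group.monitor_action_group block also references a runbook name and a webhook ID; both are created in the code block below.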


          resource "azurerm_automation_account" "automation_account" {
            name                          = "aa-demo"
            location                      = local.location
            resource_group_name           = azurerm_resource_group.rg.name
            sku_name                      = "Basic"
            public_network_access_enabled = true
            identity {
              type = "SystemAssigned"
            }
          }

          resource "azurerm_automation_module" "microsoft-graph-authentication" {
            name                    = "Microsoft.Graph.Authentication"
            resource_group_name     = azurerm_resource_group.rg.name
            automation_account_name = azurerm_automation_account.automation_account.name
          
            module_link {
              uri = "https://www.powershellgallery.com/api/v2/package/Microsoft.Graph.Authentication/1.9.6"
            }
          }
          
          resource "azurerm_automation_module" "microsoft-graph-applications" {
            name                    = "Microsoft.Graph.Applications"
            resource_group_name     = azurerm_resource_group.rg.name
            automation_account_name = azurerm_automation_account.automation_account.name
          
            module_link {
              uri = "https://www.powershellgallery.com/api/v2/package/Microsoft.Graph.Applications/1.9.6"
            }
            depends_on = [
              azurerm_automation_module.microsoft-graph-authentication
            ]
          }

          resource "azurerm_automation_webhook" "web_book_change_spn_password" {
            name                    = "wh-change-spn-password-${random_integer.suffix.result}"
            resource_group_name     = azurerm_resource_group.rg.name
            automation_account_name = azurerm_automation_account.automation_account.name
            expiry_time             = "2032-01-01T00:00:00Z"
            enabled                 = true
            runbook_name            = azurerm_automation_runbook.run_book_change_spn_password.name
          }

          resource "azurerm_automation_runbook" "run_book_change_spn_password" {
            name                    = "rb-change-spn-password-${random_integer.suffix.result}"
            location                = local.location
            resource_group_name     = azurerm_resource_group.rg.name
            automation_account_name = azurerm_automation_account.automation_account.name
            log_verbose             = true
            log_progress            = true
            description             = "Runbook for changing service principal password when PIM group assignment expires"
            runbook_type            = "PowerShell"
          
            # The PowerShell runbook script shown above; the file path is hypothetical
            content = file("${path.module}/change-spn-password.ps1")
          }
        

Give a managed identity access to the Graph API in Terraform

The runbook uses a managed identity to access protected resources. I enabled it with the identity block of azurerm_automation_account.automation_account. As explained earlier, a managed identity is a service principal without an application registration. Usually, Graph API permissions are set on the application registration and inherited by the service principal. Because there is no registration here, the code below assigns the Application.ReadWrite.All app role of Microsoft Graph directly to the managed identity's service principal.


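        data "azuread_service_principal" "automation_account_managed_identity" {
          # Look up the managed identity by object ID rather than display name,
          # because display names are not guaranteed to be unique in a tenant
          object_id = azurerm_automation_account.automation_account.identity[0].principal_id
        }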
        
        data "azuread_application_published_app_ids" "well_known" {}
        
        
        data "azuread_service_principal" "msgraph" {
          application_id = data.azuread_application_published_app_ids.well_known.result.MicrosoftGraph
        }
        
        resource "azuread_app_role_assignment" "app_role_assignment" {
          app_role_id         = data.azuread_service_principal.msgraph.app_role_ids["Application.ReadWrite.All"]
          principal_object_id = data.azuread_service_principal.automation_account_managed_identity.object_id
          resource_object_id  = data.azuread_service_principal.msgraph.object_id
        }
      

The Azure Automation runbook is the last step in the process. The service principal's password is changed when the PIM group assignment expires, and the new password is stored in Key Vault. From this point on, it doesn't matter if an engineer kept the password locally, because it's no longer valid. By combining PIM and Azure Automation, we can build secure solutions in Azure.

You can find the complete code in my GitHub repository.
