Rotate ECR Authorization Token Secret
External Secrets Operator (ESO) is a Kubernetes operator that reads secrets from AWS Secrets Manager (or any other supported provider, such as Azure Key Vault or Google Secret Manager) and injects their values into Kubernetes Secrets.
A common use case is using ESO to inject an Amazon Elastic Container Registry (ECR) authorization token that allows nodes within a Kubernetes cluster to pull images from a private ECR. You'd create a secret containing the ECR login password acquired using the AWS CLI:
$ aws ecr get-login-password --region eu-central-1 | base64 --decode
{
  "payload": "0zcDjBe...",
  "datakey": "AQ..",
  "version": "2",
  "type": "DATA_KEY",
  "expiration": 123456789
}
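The decoded form is shown only to illustrate what the token contains; what gets stored in Secrets Manager is the still-encoded password itself. As a rough sketch (the secret name myApp/ecr-login and its registry/username/password keys match what the manifests below expect; the registry URL is a placeholder), the secret could be created like this:
$ aws secretsmanager create-secret \
    --name myApp/ecr-login \
    --region eu-central-1 \
    --secret-string "{\"registry\": \"<account-id>.dkr.ecr.eu-central-1.amazonaws.com\", \"username\": \"AWS\", \"password\": \"$(aws ecr get-login-password --region eu-central-1)\"}"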
The ECR login token is then synced to a Kubernetes Secret within your cluster using the AWS SecretStore provider. The access key secret awssm-secret referenced in the SecretStore below first needs to be created using kubectl. A classic chicken-and-egg problem: to read further secrets, we first need to create an initial secret.
$ kubectl create secret generic awssm-secret --from-literal=access-key-id=<insert access key id> --from-literal=access-key-secret=<insert access key secret>
The AWS IAM service user that the access key was created for needs appropriate permissions to read all required secrets.
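What "appropriate" means depends on your setup; as a minimal sketch (the user name eso-service-user and the inline policy name are made up, and depending on the ESO features you use, additional actions such as secretsmanager:DescribeSecret may be required), an inline policy could be attached like this:
$ aws iam put-user-policy \
    --user-name eso-service-user \
    --policy-name read-myapp-secrets \
    --policy-document '{
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Action": ["secretsmanager:GetSecretValue", "secretsmanager:DescribeSecret"],
        "Resource": "arn:aws:secretsmanager:eu-central-1:<account-id>:secret:myApp/*"
      }]
    }'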
apiVersion: external-secrets.io/v1alpha1
kind: SecretStore
metadata:
  name: aws-secretsmanager
spec:
  provider:
    aws:
      service: SecretsManager
      region: eu-central-1
      auth:
        secretRef:
          accessKeyIDSecretRef:
            name: awssm-secret
            key: access-key-id
          secretAccessKeySecretRef:
            name: awssm-secret
            key: access-key-secret
In this example, we are using the SecretStore to read a key-value secret myApp/ecr-login containing the keys registry (the ECR's URL), username ("AWS"), and password (obtained using aws ecr get-login-password as shown above). The ExternalSecret below templates these values into a Kubernetes Secret of type kubernetes.io/dockerconfigjson.
apiVersion: external-secrets.io/v1alpha1
kind: ExternalSecret
metadata:
  name: ecr-pull-secret
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secretsmanager
    kind: SecretStore
  target:
    name: ecr-pull-secret
    template:
      type: kubernetes.io/dockerconfigjson
      data:
        .dockerconfigjson: |-
          {{- printf "{\"auths\": {\"%s\": {\"auth\": \"%s\"} } }" .registry (printf "%s:%s" .username .password | b64enc) | b64enc }}
  data:
    - secretKey: registry
      remoteRef:
        key: myApp/ecr-login
        property: registry
    - secretKey: username
      remoteRef:
        key: myApp/ecr-login
        property: username
    - secretKey: password
      remoteRef:
        key: myApp/ecr-login
        property: password
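Once ESO has created the dockerconfigjson secret, pods still need to reference it as an image pull secret. One way (shown here for the default service account in the default namespace) is:
$ kubectl patch serviceaccount default -p '{"imagePullSecrets": [{"name": "ecr-pull-secret"}]}'
Alternatively, list the secret under imagePullSecrets directly in the pod spec.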
ECR tokens expire after 12 hours
The problem is that, for security reasons, ECR authorization tokens expire after 12 hours. So if one of your Kubernetes nodes needs to pull an image after the token has expired, the pull fails, resulting in ErrImagePull and ImagePullBackOff errors.
One solution is to run CronJob pods that use the AWS CLI to refresh the ECR token regularly from within your cluster. This goes against the idea of using ESO to externalize secrets management: it should not be the responsibility of a Kubernetes cluster to keep external secrets fresh. It also adds complexity, as it requires an additional Kubernetes workload that is unrelated to the core compute needs of your application.
The AWS way to keep secrets fresh is to define Lambda functions for regular secret rotation. AWS even offers a Python template for a secret rotation function that is called multiple times to perform the different steps of a rotation (create secret, set secret, test secret, and finish secret). However, as the maximum rotation frequency of Secrets Manager's built-in secret rotation is once per day and ECR tokens expire after only 12 hours (why, AWS?), this is not an option.
Luckily, we can implement our own Lambda function that refreshes the ECR authorization token in Secrets Manager. EventBridge can then trigger the Lambda function at a fixed rate (every 6 hours, for example).
To make this work, the Lambda function needs an execution role that, on top of AWSLambdaBasicExecutionRole, allows read (GetSecretValue) and write (UpdateSecret) access to the secret in question. Since an authorization token's permission scope matches that of the IAM principal used to retrieve it, the Lambda function also needs permission to access ECR (e.g. AmazonEC2ContainerRegistryReadOnly). Otherwise, the Docker login to the ECR would succeed but any pull attempt would fail (HTTP 403, I tried...).
...
{
    "Effect": "Allow",
    "Action": [
        "secretsmanager:GetSecretValue",
        "secretsmanager:UpdateSecret"
    ],
    "Resource": "arn:aws:secretsmanager:eu-central-1:<account-id>:secret:myApp/ecr-login*"
}
...
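The ECR permission can simply come from the managed policy mentioned above; for example (the role name is a placeholder for your Lambda's execution role):
$ aws iam attach-role-policy \
    --role-name <lambda-execution-role> \
    --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly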
The Python code uses boto3 to retrieve a new authorization token (get_authorization_token) for ECR and update the secret.
import json
import base64

import boto3

sm_client = boto3.client('secretsmanager')
ecr_client = boto3.client('ecr')


def lambda_handler(event, context):
    """Updates the ECR login password in the secret myApp/ecr-login"""
    secret_name = 'myApp/ecr-login'

    # read the current secret string (JSON with registry, username, password)
    secret = get_secret_value(secret_name)
    json_secret = json.loads(secret['SecretString'])

    # retrieve a new ECR authorization token
    login_token = ecr_client.get_authorization_token()['authorizationData'][0]['authorizationToken']

    # update secret, format is AWS:PASSWORD in base64, decode base64 and remove 'AWS:'
    json_secret['password'] = base64.b64decode(login_token)[4:].decode('UTF-8')

    update_result = update_secret(secret_name, json.dumps(json_secret))
    print(update_result)

    return {
        "statusCode": 200,
        "headers": {
            "Content-Type": "application/json"
        },
        "body": update_result
    }


def update_secret(secret_name, secret_string):
    """Updates the value of an existing secret"""
    return sm_client.update_secret(SecretId=secret_name, SecretString=secret_string)


def get_secret_value(secret_name):
    """Gets the value of a secret"""
    return sm_client.get_secret_value(SecretId=secret_name)
Note that the response from get_authorization_token differs from what aws ecr get-login-password gives you. In the boto3 response, the (base64-encoded) authorizationToken has the format AWS:<password>, whereas the CLI returns only the <password> part. Depending on how dockerconfigjson is templated in the ExternalSecret definition, you might need to deconstruct the token differently.
{
    'authorizationData': [
        {
            'authorizationToken': 'string',
            'expiresAt': datetime(2021, 1, 1),
            'proxyEndpoint': 'string'
        },
    ]
}
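For completeness, packaging and deploying the function via the AWS CLI might look roughly like the following (function name, runtime, and file name are assumptions; the Lambda console, SAM, or Terraform work just as well):
$ zip function.zip lambda_function.py
$ aws lambda create-function \
    --function-name refresh-ecr-login \
    --runtime python3.12 \
    --handler lambda_function.lambda_handler \
    --zip-file fileb://function.zip \
    --role arn:aws:iam::<account-id>:role/<lambda-execution-role>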
Once the Lambda function is set up, an EventBridge trigger can be added. A rule with the fixed schedule expression rate(6 hours) will refresh the ECR token well before it expires.
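With the AWS CLI, such a rule could be wired up roughly like this (rule and function names as well as the ARNs are placeholders; the same can be done in the EventBridge console):
$ aws events put-rule \
    --name refresh-ecr-login-schedule \
    --schedule-expression "rate(6 hours)"
$ aws lambda add-permission \
    --function-name refresh-ecr-login \
    --statement-id eventbridge-invoke \
    --action lambda:InvokeFunction \
    --principal events.amazonaws.com \
    --source-arn arn:aws:events:eu-central-1:<account-id>:rule/refresh-ecr-login-schedule
$ aws events put-targets \
    --rule refresh-ecr-login-schedule \
    --targets "Id"="1","Arn"="arn:aws:lambda:eu-central-1:<account-id>:function:refresh-ecr-login"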
EKS Nodes
Note that if you are dealing with a Kubernetes cluster within AWS (EKS), maybe even within the same account, and you control the worker node IAM roles, a better option might be to assign a NodeInstanceRole with ECR permissions to your worker nodes, as described here. This way, the kubelet running on the EKS nodes can call AWS APIs with the permissions granted through the instance role, avoiding the need to read ECR login information via ESO, and thus stale authorization tokens, altogether.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:BatchCheckLayerAvailability",
                "ecr:BatchGetImage",
                "ecr:GetDownloadUrlForLayer",
                "ecr:GetAuthorizationToken"
            ],
            "Resource": "*"
        }
    ]
}
You may even grant secondary AWS accounts access to an ECR repository by setting appropriate repository policies.
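For example, a repository policy along these lines (account ID and repository name are placeholders) would let principals from another account pull images; note that those principals still need ecr:GetAuthorizationToken in their own IAM policies:
$ aws ecr set-repository-policy \
    --repository-name my-app \
    --policy-text '{
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowCrossAccountPull",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::<other-account-id>:root"},
            "Action": [
                "ecr:BatchCheckLayerAvailability",
                "ecr:BatchGetImage",
                "ecr:GetDownloadUrlForLayer"
            ]
        }]
    }'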
Instead of ESO, you may also opt for the AWS Secrets and Configuration Provider (ASCP) for the Kubernetes Secrets Store CSI Driver. ASCP exchanges the pod identity for an IAM role to access secrets in Secrets Manager. Any secret with appropriate access policies can then be mounted as a file in EKS pods (synchronization to Kubernetes Secrets can be enabled as well). This way, the service user required for ESO to access Secrets Manager (see the chicken-and-egg problem mentioned above) becomes obsolete, too.