Architecture
- Operational excellence
- Cloud account creation with compliance baselines
- AWS Landing Zone Accelerator and Control Tower
- Network
- Equinix Colocation and Fabric
- DevOps
- Jenkins
- Gitea
- GitHub
- SonarQube
- GitOps
- ArgoCD
- Weaveworks FluxCD TF (Tofu) controller or HCP Terraform operator
- Dapr for service-to-service invocation, asynchronous publish and subscribe events, synchronous orchestration workflows, state management, resource bindings, actors, configuration, and distributed lock
- Infrastructure as code
- Terraform
- Ansible
- REST API (use the Mastercard Terraform REST API provider)
- Jenkins Configuration as Code (JCasC) plugin
- Security
- Each component should use Entra ID, Keycloak, or Okta for authentication and authorization
- Signup and login pages should be provided by Entra ID, Keycloak, or Okta
- The identity provider can offer username/password login, login with social providers, and SAML, LDAP, and Active Directory integrations.
- When a component pod or process starts, it should use an IAM role, the Secrets Store CSI Driver (SSCD), and the AWS Secrets and Configuration Provider (ASCP) to mount the token that the process uses to call all other components.
- This single token will be used to call all other components.
- All other components should check with Entra ID, Keycloak, or Okta whether the token is valid and then approve or deny the request.
- AWS Secrets Manager will use EventBridge and Lambda functions to rotate the tokens.
- SSH deploy keys will be mounted in the pod as volumes by SSCD and ASCP, with a .ssh/config entry that selects the private key: Host = reponame, HostName = github.com, IdentityFile = mounted private key. https://gist.github.com/holmberd/dbeb8789742acfd791747772104160fe
- The token will be used by Terraform as the environment variable GITEA_TOKEN or GITHUB_TOKEN.
- Use Entra ID, Keycloak, or Okta for authentication and authorization of DevOps tools
- Search in google for "jenkins entraid", "gitea entraid", etc.
- When installing a DevOps tool, use Entra ID for authentication and authorization.
- When connecting from one DevOps tool to another, such as Jenkins to Gitea, use Entra ID.
- Jenkins should use only one token from Entra ID to access all other DevOps tools such as GitHub/Gitea and SonarQube, with Entra ID roles mapped to the roles of each DevOps tool. One user/token for nonprod and one user/token for prod.
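The approve/deny step and the role mapping described above could be sketched as a pure function over token claims. This is only a sketch under stated assumptions: it presumes the claims were already signature-verified by the identity provider's library or validation endpoint, the `exp` and `aud` claim names follow the JWT standard, and the `roles` claim, the role names, the tool names and the audience value are all hypothetical.

```python
# Hypothetical mapping from identity-provider roles to tool-local roles.
ROLE_MAP = {
    "devops-nonprod-ci": {"gitea": "write", "sonarqube": "scan"},
    "devops-prod-ci": {"gitea": "write", "sonarqube": "scan"},
}


def authorize(claims, tool, needed, audience, now):
    """Approve (True) or deny (False) a request from already-verified claims.

    This function does NOT verify the token signature; that must happen
    before the claims reach it.
    """
    # Deny expired tokens.
    if claims.get("exp", 0) <= now:
        return False
    # The "aud" claim may be a single string or a list of strings.
    aud = claims.get("aud", [])
    audiences = [aud] if isinstance(aud, str) else list(aud)
    if audience not in audiences:
        return False
    # Approve only if one of the caller's roles maps to the needed tool role.
    for role in claims.get("roles", []):
        if ROLE_MAP.get(role, {}).get(tool) == needed:
            return True
    return False
```

Keeping one entry per environment in the mapping mirrors the "one user/token for nonprod and one for prod" rule.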
- Dapr for Service-to-service invocation, secrets
- Use a service account with IAM roles for service accounts (IRSA) to fetch the secrets meant for the pod or namespace, such as SSH private keys and tokens, from AWS Secrets Manager.
- https://aws.github.io/aws-eks-best-practices/security/docs/data/#secrets-management
- Mount AWS Secrets Manager secrets as volumes directly into the pod.
- https://github.com/kubernetes-sigs/secrets-store-csi-driver with AWS Secrets & Configuration Provider (ASCP)
- Convert AWS Secrets Manager secrets into Kubernetes secrets that pods in the namespace can use.
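A minimal sketch of the two options above (mounting a secret as a volume, and optionally syncing it into a Kubernetes secret) is a SecretProviderClass manifest for the Secrets Store CSI Driver with the AWS provider. The manifest is built here as a Python dictionary and printed as JSON, which Kubernetes accepts. The resource name, namespace, secret name and the `value` key are hypothetical, and the sync into a Kubernetes secret assumes the driver was installed with secret syncing enabled.

```python
import json


def secret_provider_class(name, namespace, secret_name, k8s_secret=None):
    """Build a SecretProviderClass manifest for the AWS provider (ASCP)."""
    # The AWS provider expects "objects" to be a YAML document in a string.
    objects = (
        f'- objectName: "{secret_name}"\n'
        f'  objectType: "secretsmanager"\n'
    )
    spec = {"provider": "aws", "parameters": {"objects": objects}}
    if k8s_secret:
        # Optional: mirror the mounted object into a Kubernetes Secret.
        spec["secretObjects"] = [{
            "secretName": k8s_secret,
            "type": "Opaque",
            "data": [{"objectName": secret_name, "key": "value"}],
        }]
    return {
        "apiVersion": "secrets-store.csi.x-k8s.io/v1",
        "kind": "SecretProviderClass",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


if __name__ == "__main__":
    manifest = secret_provider_class(
        "argocd-deploy-key", "argocd", "orgunit01-deploy-key",
        k8s_secret="orgunit01-deploy-key",
    )
    print(json.dumps(manifest, indent=2))
```

The pod's service account still needs an IAM role (IRSA) that is allowed to read the named secret.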
- Access from ArgoCD, FluxCD, Jenkins and Terraform to the GitHub Enterprise organization or Gitea organization
- Create users in Entra ID for Jenkins (to commit, push and create PRs for GitOps repositories) and for Terraform (to create GitOps repositories, add SSH deploy public keys to them, and add the SSH deploy private keys to a cloud secrets manager such as AWS Secrets Manager). Add them to the appropriate teams in GitHub/Gitea or to a role mapping in Entra ID.
- Use SSH deploy keys for ArgoCD and FluxCD to read GitOps repositories, since each user costs 21 USD per month on GitHub Enterprise and we would need one user per organizational unit. Such a user would not perform any API operation other than clone and checkout. The Terraform user can add the SSH deploy public keys to Gitea/GitHub and add the SSH deploy private keys to a cloud secrets manager such as AWS Secrets Manager for use by ArgoCD and FluxCD.
- SSH deploy keys will be mounted in the pod as volumes by SSCD and ASCP, with a .ssh/config entry that selects the private key: Host = reponame, HostName = github.com, IdentityFile = mounted private key. https://gist.github.com/holmberd/dbeb8789742acfd791747772104160fe
- HTTPS will be used to make changes to GitOps repositories and SSH will be used to read from them.
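The per-repository .ssh/config entry described above could be rendered as follows. This is a sketch: the mount path is hypothetical, and the `User git` and `IdentitiesOnly yes` lines are additions not stated in the notes (both are standard ssh_config options).

```python
def ssh_config_entry(repo, key_path, hostname="github.com"):
    """Render one ssh_config block that binds a host alias to a deploy key."""
    return "\n".join([
        f"Host {repo}",                 # alias used in the clone URL
        f"  HostName {hostname}",       # real server behind the alias
        "  User git",                   # assumption: git-over-SSH user
        f"  IdentityFile {key_path}",   # private key mounted by SSCD/ASCP
        "  IdentitiesOnly yes",         # assumption: offer only this key
        "",
    ])


if __name__ == "__main__":
    # Hypothetical mount path for the deploy key.
    print(ssh_config_entry("orgunit01", "/mnt/secrets-store/orgunit01-deploy-key"))
```

With such an entry, the repository would be cloned through the alias rather than the real host name, for example `git@orgunit01:examplebank-nonprod/orgunit01.git`, so each repository picks up its own deploy key.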
- Key rotation
- The cloud secret management service publishes an event that triggers a serverless function, which rotates the secret and updates the secret management service. A secret can be
- SSH deploy keys should be rotated by the Lambda functions of AWS Secrets Manager. A function could use the terraform taint command to mark the public and private keys as tainted in the Terraform state so that the next sync replaces them. The Terraform state is also stored in AWS Secrets Manager.
- AWS Secrets Manager should rotate the Entra ID tokens using Lambda functions.
- https://docs.aws.amazon.com/secretsmanager/latest/userguide/rotating-secrets.html
- https://learn.microsoft.com/en-us/azure/key-vault/secrets/tutorial-rotation
- https://learn.microsoft.com/en-us/azure/key-vault/secrets/tutorial-rotation-dual?tabs=azure-cli
- https://cloud.google.com/secret-manager/docs/secret-rotation
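For the AWS case, a rotation Lambda function is invoked once per rotation step with an event carrying `Step`, `SecretId` and `ClientRequestToken`. The sketch below only shows the dispatch skeleton; the per-step actions are injected and entirely hypothetical (for a deploy key they might generate a key pair, register the public key, test a clone, and promote the new version).

```python
# The four steps that AWS Secrets Manager passes to a rotation function.
STEPS = ("createSecret", "setSecret", "testSecret", "finishSecret")


def make_handler(actions):
    """Return a Lambda-style handler that dispatches on the rotation step.

    `actions` maps each step name to a callable taking
    (secret_id, client_request_token).
    """
    def handler(event, context=None):
        step = event["Step"]
        if step not in STEPS:
            raise ValueError(f"unknown rotation step: {step}")
        return actions[step](event["SecretId"], event["ClientRequestToken"])
    return handler


if __name__ == "__main__":
    # Hypothetical actions that only log what they would do.
    log = {s: (lambda sid, tok, s=s: print(s, sid, tok)) for s in STEPS}
    handler = make_handler(log)
    for step in STEPS:
        handler({"Step": step, "SecretId": "arn:example", "ClientRequestToken": "v2"})
```

Injecting the actions keeps the skeleton testable without AWS credentials; the real actions would call the Secrets Manager and GitHub/Gitea APIs.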
- Reliability
- EKS, AKS, GKE
- Performance Efficiency
- Cost Optimization
- CNCF projects https://www.cncf.io/projects/ . Click on the VIEW ON CNCF LANDSCAPE button.
- argo
- containerd
- coredns
- etcd
- fluentd
- flux
- harbor
- helm
- jaeger
- keda
- kubernetes
- kustomize
- open policy agent
- prometheus
- artifacthub
- backstage
- cert manager
- chaos mesh
- cni
- dapr
- keycloak
- knative
- open feature
- open telemetry
DevOps steps
- Each organization, organizational unit, library, application, tool, middleware, microservice, data pipeline, dashboard, etc. is a component. We need code repository, CI/CD pipeline, artifact repositories and other DevOps infrastructure resources for each component.
Backstage
- Maintain and operate your deployment of Backstage. This includes customer support, infrastructure, CI/CD and, as your Backstage product grows, on-call support.
- Drive adoption of customers (developers at your company).
- Work with senior tech leadership and architects to ensure your organization's best practices for software development are encoded into a set of Software Templates.
- Evangelize Backstage as a central platform towards other infrastructure/platform teams.
- Security
- The UrlReader facility is of particular interest for a secure Backstage configuration. In particular the backend.reading.allow configuration lists the hosts that you trust the backend to be able to read content from on behalf of users. It is extremely important that this list does not, for example, allow access to instance metadata endpoints of cloud providers, or other endpoints that your Backstage instance may have access to which contain sensitive information. In general it is recommended to keep the list minimal and only allow reading from required endpoints. The same concerns apply to custom implementations of the UrlReader interface, if you need to implement these through code.
- For a high-security deployment, the auth backend should therefore be deployed in a separate service with its own database.
- Operators should configure catalog rules to limit the allowed entity kinds that users can define.
- By default all internal users are allowed to create and delete entities. If this does not fit your organization's needs it is recommended to enable and configure the permission system to restrict these operations.
- By default, Scaffolding jobs execute directly on the host machine, including any actions defined in the template. Because the Scaffolder templates are considered a more sensitive area it is recommended to control access to create and update templates to trusted parties.
- One strategy that allows you to reduce the access that the Scaffolder service has is to rely on user credentials when executing actions. For example, a GitHub App integration could be configured with read-only permissions, with a separate user OAuth token used to create repositories. This requires that your users have access to create repositories in the first place.
- By default all internal users are allowed to execute templates in the scaffolder. If this does not fit your organization's needs it is recommended to enable and configure the permission system to restrict these operations.
- Avoid injecting authentication headers for upstream services in proxy configuration. Restrict access as much as possible, for example by using the allowedMethods option to limit the methods that can be used, and by using tokens with the minimum required authorization scope.
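The advice on keeping the backend.reading.allow list minimal could be backed by a small review check. The sketch below is an assumption-laden helper, not part of Backstage: it flags host entries that name a cloud metadata endpoint or a loopback, link-local or private IPv4 address. It handles host names and IPv4 literals only, and the list of metadata host names is illustrative, not exhaustive.

```python
import ipaddress

# Illustrative, non-exhaustive list of metadata host names.
METADATA_HOSTS = {"metadata.google.internal"}


def entries_to_review(hosts):
    """Return the allow-list hosts that deserve a closer security review."""
    flagged = []
    for host in hosts:
        name = host.split(":")[0].lower()   # drop an optional :port suffix
        if name in METADATA_HOSTS:
            flagged.append(host)
            continue
        try:
            ip = ipaddress.ip_address(name)
        except ValueError:
            continue                        # a regular host name
        # 169.254.169.254 (instance metadata) is a link-local address.
        if ip.is_link_local or ip.is_loopback or ip.is_private:
            flagged.append(host)
    return flagged


if __name__ == "__main__":
    print(entries_to_review(["github.com", "169.254.169.254", "127.0.0.1:7007"]))
```

A check like this could run in CI against the hosts extracted from the Backstage configuration before a change is merged.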
Solution
- examplebank
  - AWS
    - examplebank organization
      - examplebank account
      - nonprod organizational unit
        - devops organizational unit
          - nonprod01
            - devops-comptest01 account
            - devops-inttest01 account
            - devops-e2etest01 account
            - devops-perftest01 account
            - devops-nonprod01 account
          - nonprod02
            - devops-comptest02 account
            - devops-inttest02 account
            - devops-e2etest02 account
            - devops-perftest02 account
            - devops-nonprod02 account
          - nonprod03
            - devops-comptest03 account
            - devops-inttest03 account
            - devops-e2etest03 account
            - devops-perftest03 account
            - devops-nonprod03 account
        - orgunit01 organizational unit
          - nonprod01
            - orgunit01-comptest01 account
            - orgunit01-inttest01 account
            - orgunit01-e2etest01 account
            - orgunit01-perftest01 account
          - nonprod02
            - orgunit01-comptest02 account
            - orgunit01-inttest02 account
            - orgunit01-e2etest02 account
            - orgunit01-perftest02 account
          - nonprod03
            - orgunit01-comptest03 account
            - orgunit01-inttest03 account
            - orgunit01-e2etest03 account
            - orgunit01-perftest03 account
      - prod organizational unit
        - devops organizational unit
          - devops-prod account
          - devops-dr account
        - orgunit01 organizational unit
          - orgunit01-prod account
          - orgunit01-dr account
  - Azure
  - Google cloud
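The account naming convention in the tree above is regular enough to generate. A minimal sketch, assuming the stage lists per organizational unit are exactly those shown for devops and orgunit01 and that there are three nonprod sets; the function names are hypothetical.

```python
# Test stages per organizational unit, as listed in the account tree.
STAGES = {
    "devops": ["comptest", "inttest", "e2etest", "perftest", "nonprod"],
    "orgunit01": ["comptest", "inttest", "e2etest", "perftest"],
}


def nonprod_accounts(ou, sets=3):
    """Map each nonprod set (nonprod01..) to its account names for one OU."""
    return {
        f"nonprod{n:02d}": [f"{ou}-{stage}{n:02d}" for stage in STAGES[ou]]
        for n in range(1, sets + 1)
    }


def prod_accounts(ou):
    """Production and disaster-recovery accounts for one OU."""
    return [f"{ou}-prod", f"{ou}-dr"]


if __name__ == "__main__":
    for group, accounts in nonprod_accounts("orgunit01").items():
        print(group, accounts)
    print(prod_accounts("orgunit01"))
```

Generating the names from one table keeps new organizational units consistent with the existing ones.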
- Create the devops-nonprod01 account in the devops OU of the nonprod OU, and the devops-prod account in the devops OU of the prod OU, with the following:
- The following container images copied to a private registry such as ECR:
- ArgoCD application controller
- ArgoCD repo server
- ArgoCD redis
- FluxCD source controller
- Tofu controller
- EKS/AKS/GKE cluster with the following:
- ArgoCD application controller
- ArgoCD repo server
- ArgoCD redis
- ArgoCD app of apps pointing to ECR OCI repository
- Service account with IAM Role for argocd and tofu controllers.
- Push the Kubernetes YAML and Terraform code to the ECR OCI repository that the app of apps points to, since GitHub or Gitea is not configured yet.
- The ArgoCD application controller will create the ArgoCD applications for the FluxCD source controller, the Tofu controller and others, and repoint them to the GitOps repository, which does not exist yet.
- A Tofu controller Terraform resource pointing to the Terraform code will configure GitHub, or the ArgoCD application controller will install Gitea. Nonprod will configure nonprod and prod will configure prod.
- They will then create the required GitOps repositories using Terraform or ArgoCD.
- They will check in the code from the ECR OCI repository to the appropriate branch of the GitOps repositories.
- Create devops, devops-github, devops-gitea, devops-argocd-application-controller, devops-argocd-redis, devops-argocd-repo-server, devops-fluxcd-source-controller, devops-fluxcd-tf-controller repositories in examplebank, examplebank-nonprod and examplebank-prod organizations.
- If possible, Jenkins can be added to copy files from the source code repository to the GitOps repository instead of checking files in to the GitOps repository directly.
- Create devops-jenkins repositories in examplebank, examplebank-nonprod and examplebank-prod organizations.
- Jenkins pipeline should also update the container images in ECR using CI/CD.
- Install all required devops tools.
- For orgunit01 and other OUs, we just need to point the argocd app of apps to appropriate git repository branch and argocd will do the rest.
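Pointing the app of apps at a git repository branch, as the last step describes, amounts to one ArgoCD Application resource. The sketch below builds it as a Python dictionary and prints JSON. The application name, repository URL, branch, the `argocd` path and the automated sync policy are assumptions for illustration, not settings taken from these notes.

```python
import json


def app_of_apps(name, repo_url, branch, path="argocd"):
    """Build an ArgoCD Application that syncs child Applications from git."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {
                "repoURL": repo_url,
                "targetRevision": branch,   # the per-environment branch
                "path": path,               # directory holding child apps
            },
            "destination": {
                "server": "https://kubernetes.default.svc",  # in-cluster
                "namespace": "argocd",
            },
            # Assumption: keep the cluster in sync automatically.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


if __name__ == "__main__":
    app = app_of_apps(
        "orgunit01-apps",
        "git@github.com:examplebank-nonprod/orgunit01.git",
        "aws/orgunit01/comptest01",
    )
    print(json.dumps(app, indent=2))
```

Only `repoURL` and `targetRevision` differ between organizational units and environments, which is why ArgoCD can "do the rest" once this one resource exists.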
Solution
- The solution below has security and reliability problems because of minikube and k3s. Companies are not interested in creating landing zones using Terraform directly. They will use services like AWS Control Tower, because each account that gets created has some predefined resources which should not be changed by the IAM roles of the account. Landing zones carry many compliance requirements for different industries, and these landing zone solutions already satisfy them - https://aws.amazon.com/solutions/implementations/landing-zone-accelerator-on-aws/. So we should not use the solution below for the landing zone. Instead, start with a devops organizational unit per environment account with Kubernetes (EKS/AKS) and ArgoCD installed in it. Then use ArgoCD to install the other Kubernetes resources, the FluxCD source controller, and either the Tofu controller or the Terraform operator, so that Terraform creates all the resources required by the organizational unit components using GitOps. We can upload all Terraform code to a cloud-native OCI registry like AWS ECR before the git server is created.
- Create google account examplebank.azure@gmail.com
- Create azure cloud account with email examplebank.azure@gmail.com
- Create VM with minikube, fluxcd, tofu controller and opentofu inside this account. It should have all required fluxcd and terraform code inside the image.
- Add a terraform resource which is watching a directory which creates below
  - examplebank organization
    - nonprod organizational unit and vm
      - In this vm, add terraform resource which creates
        - devops organizational unit and vm
          - In this vm, add terraform resource which creates
    - prod organizational unit and vm
      - In this vm, add terraform resource which creates
        - devops organizational unit and vm
          - In this vm, add terraform resource which creates
            - aks cluster with argocd
            - argocd should then install required controllers, jenkins and gitea with examplebank, examplebank-nonprod and examplebank-prod organizations.
            - devops-gitea GitOps repository needs to be created in gitea examplebank-prod organization
            - devops-github GitOps repository needs to be created in github examplebank-prod organization
- Once the resources are created, it should delete the code and point to the appropriate gitops repository.
- Terraform VMs should NOT wait for the devops-github repository to be created and then check in the GitHub repository creation code, nor check in the FluxCD and Terraform code to the GitHub source code repositories for examplebank, nonprod, prod and devops once those are created. This is an anti-pattern, because we would have to give the Terraform VM write permission to the GitOps repository.
- We need to keep the fluxcd and terraform code on the VMs as part of the VM image and then create the source code repository with terraform and fluxcd code checked into devops/infrastructure folder so that it is copied by the Jenkins CI/CD pipeline to the required GitOps repository, branch and directory.
- Terraform VMs should also do the same for jenkins so that CI/CD pipelines are created and triggered which will copy the terraform code to devops-jenkins and devops-github GitOps repositories. We cannot use fluxcd pointing to individual gitops repository since gitops repository does not exist and jenkins CI/CD pipeline does not exist to copy the code to the GitOps repository.
- We can do Jenkins+Gitea and Github Actions+Github.
- VMs need read only access to the appropriate GitOps repositories setup.
- We should be able to recreate the VM during patching activity. The FluxCD configuration pointing to the appropriate GitOps repository should not be lost. Changes to the nonprod and prod VMs should be made by the management account. Changes to the OU VMs need to be made by the nonprod and prod VMs. This is because we cannot restart a VM during patching while Terraform is running inside it. We need to manually update the management account VM. We need to try to implement self-management of the VM by Terraform via GitOps if possible, maybe by creating a new VM, updating the state and then deleting the old VM. Terraform state needs to be stored in the secrets manager with the required IAM access. Does the Tofu controller have a feature to detect resources that were created manually, apart from what it created itself, and delete them?
- Horizontal and vertical scaling changes for FluxCD are required for performance reasons.
- Need to provide workload identity to Terraform runners so that they can plan and apply infrastructure code.
- Due to all these requirements which have been already thought of by fluxcd and tofu controller, it is best to use them instead of trying to move to argocd + tofu controller or removing kubernetes and using single binary controller.
- Well Architected review
- Operational excellence
- Security
- Better to use EKS Anywhere or AKS Edge Essentials instead of k3s for security reasons.
- Reliability
- What happens if the AZ where the Tofu controller VM is running goes down?
- Performance efficiency
- Cost optimization
- Better to use k3s instead of EKS Anywhere or AKS Edge Essentials to lower cost.
GitOps repositories
- For bootstrapping, no need to use dc01. We can use management account of one of the clouds like aws, azure or gcp.
- We do not need to use Jenkins for the bootstrapping CI/CD pipeline, since that pipeline will just check out the infrastructure code and check it in to the GitOps repositories. Other tools such as linters and an artifact repository are not available yet, so Jenkins is not needed. We can just use GitHub Actions or the equivalent CI/CD tool of the code repository provider.
- During bootstrapping, we can use the FluxCD Terraform setup in the management account of one of the clouds to create and maintain the GitHub repositories, and then transfer responsibility for these repositories to the FluxCD Terraform setup in the devops prod account of the same cloud.
- After creating organization, first OU we should create is devops and also create all the components of devops like devops-jenkins, devops-github, devops-nexusrm, devops-sonarqube, etc. so that we can start managing the other components using Jenkins CI/CD pipelines instead of code repository provider cloud CI/CD pipeline tool like Github actions.
- Create nonprod and prod OUs and then create devops, testing, orgunit01 and other OUs inside them.
- We will use app of apps so that ArgoCD manages its own apps. We don't need Terraform to create the apps.
- Terraform should just create the Kubernetes cluster, create the argocd namespace, install ArgoCD, and create an app that points to a git repository containing the application resources for all code deployed to the Kubernetes cluster.
- Code repository directories in the devops/infrastructure directory (no need for a devops/container_deployment directory):
- aws/orgunit01/eks/argocd -> orgunit01 repository aws/orgunit01/env/eks branch application resources argocd directory.
- aws/orgunit01/eks/kubernetes -> orgunit01-app01-transfer repository aws/orgunit01/env/eks branch kubernetes directory.
- aws/orgunit01/fluxcd -> orgunit01 repository aws/orgunit01/env branch terraform resources fluxcd directory.
- aws/orgunit01/terraform -> orgunit01-app01-transfer repository aws/orgunit01/env branch terraform directory.
- aws/devops/fluxcd -> devops repository aws/devops/nonprodX and aws/devops/prod branch fluxcd directory.
- aws/devops/terraform -> orgunit01-app01-transfer repository aws/devops/nonprodX and aws/devops/prod branch terraform directory.
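The branch names used in the mapping above follow a cloud/OU/environment pattern, optionally with a cluster suffix such as eks. A small sketch of that convention, with hypothetical helper names and the assumption that numbered environments come in sets 01 to 03:

```python
def gitops_branch(cloud, ou, env, cluster=None):
    """Join the parts of a GitOps branch name, e.g. aws/orgunit01/comptest01."""
    parts = [cloud, ou, env]
    if cluster:
        parts.append(cluster)   # e.g. "eks" for cluster-level resources
    return "/".join(parts)


def numbered_branches(cloud, ou, stage, sets=(1, 2, 3)):
    """Expand a stage such as "comptest" into comptest01, comptest02, ..."""
    return [gitops_branch(cloud, ou, f"{stage}{n:02d}") for n in sets]


if __name__ == "__main__":
    print(gitops_branch("aws", "orgunit01", "comptest01", "eks"))
    print(numbered_branches("aws", "orgunit01", "comptest"))
```

A CI/CD pipeline that copies devops/infrastructure directories into GitOps repositories could use helpers like these to pick the target branch.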
GitOps repository | Branch | Folders | Example |
---|---|---|---|
https://github.com/examplebank-nonprod/examplebank.git | github | terraform | self terraform vm installed on dc01 with tofu controller terraforms for all repositories with branches github/devops/nonprod |
https://github.com/examplebank-nonprod/examplebank.git | github/devops/nonprod | terraform | self and OUs source and GitOps nonprod Github repositories |
https://github.com/examplebank-prod/examplebank.git | github | terraform | self terraform vm installed on dc01 with tofu controller terraforms for all repositories with branches github/devops/prod |
https://github.com/examplebank-prod/examplebank.git | github/devops/prod | terraform | self and OUs GitOps prod Github repositories |
https://github.com/examplebank-prod/examplebank.git | dc01 | terraform | self terraform vm with https://github.com/examplebank-nonprod/examplebank.git dc01 and https://github.com/examplebank-prod/examplebank.git dc01 tofu controller terraforms; org ; prod OUs, accts, terraform vms, tofu controller terraforms, argocd applications |
https://github.com/examplebank-prod/examplebank.git | aws | terraform | self terraform vm with https://github.com/examplebank-nonprod/examplebank.git aws and https://github.com/examplebank-prod/examplebank.git aws tofu controller terraforms; org ; prod OUs, accts, terraform vms, tofu controller terraforms, argocd applications |
https://github.com/examplebank-prod/examplebank.git | azure | terraform | self terraform vm with https://github.com/examplebank-nonprod/examplebank.git azure and https://github.com/examplebank-prod/examplebank.git azure tofu controller terraforms; org ; prod OUs, accts, terraform vms, tofu controller terraforms, argocd applications |
https://github.com/examplebank-prod/examplebank.git | gcp | terraform | self terraform vm with https://github.com/examplebank-nonprod/examplebank.git gcp and https://github.com/examplebank-prod/examplebank.git gcp tofu controller terraforms; org ; prod OUs, accts, terraform vms, tofu controller terraforms, argocd applications |
https://github.com/examplebank-nonprod/examplebank.git | dc01 | terraform | nonprod OUs, accts, terraform vms, tofu controller terraforms, argocd applications |
https://github.com/examplebank-nonprod/examplebank.git | aws | terraform | nonprod OUs, accts, terraform vms, tofu controller terraforms, argocd applications |
https://github.com/examplebank-nonprod/examplebank.git | azure | terraform | nonprod OUs, accts, terraform vms, tofu controller terraforms, argocd applications |
https://github.com/examplebank-nonprod/examplebank.git | gcp | terraform | nonprod OUs, accts, terraform vms, tofu controller terraforms, argocd applications |
https://github.com/examplebank-nonprod/orgunit01.git | github/devops/nonprod | terraform | Application components source and GitOps nonprod Github repositories |
https://github.com/examplebank-prod/orgunit01.git | github/devops/prod | terraform | Application components prod Github repositories |
https://github.com/examplebank-nonprod/orgunit01.git | dc01/orgunit01/comptest01,02,03 | terraform, argocd | kubernetes cluster, tofu controller terraforms, argocd applications |
https://github.com/examplebank-nonprod/orgunit01.git | dc01/orgunit01/inttest01,02,03 | terraform, argocd | kubernetes cluster, tofu controller terraforms, argocd applications |
https://github.com/examplebank-nonprod/orgunit01.git | dc01/orgunit01/e2etest01,02,03 | terraform, argocd | kubernetes cluster, tofu controller terraforms, argocd applications |
https://github.com/examplebank-nonprod/orgunit01.git | dc01/orgunit01/perftest01,02,03 | terraform, argocd | kubernetes cluster, tofu controller terraforms, argocd applications |
https://github.com/examplebank-nonprod/orgunit01.git | dc01/devops/nonprod01,02,03 | terraform, argocd | jenkins CI/CD pipeline, nexus repositories |
https://github.com/examplebank-nonprod/orgunit01.git | dc01/testing/nonprod01,02,03 | terraform, argocd | sonarqube project |
https://github.com/examplebank-prod/orgunit01.git | dc01/orgunit01/prod | terraform, argocd | kubernetes cluster, tofu controller terraforms, argocd applications |
https://github.com/examplebank-prod/orgunit01.git | dc01/orgunit01/dr | terraform, argocd | kubernetes cluster, tofu controller terraforms, argocd applications |
https://github.com/examplebank-prod/orgunit01.git | dc01/devops/prod | terraform, argocd | jenkins CI/CD pipeline, nexus repositories |
https://github.com/examplebank-nonprod/orgunit01.git | aws/orgunit01/comptest01,02,03 | terraform, argocd | kubernetes cluster, tofu controller terraforms, argocd applications |
https://github.com/examplebank-nonprod/orgunit01.git | aws/orgunit01/inttest01,02,03 | terraform, argocd | kubernetes cluster, tofu controller terraforms, argocd applications |
https://github.com/examplebank-nonprod/orgunit01.git | aws/orgunit01/e2etest01,02,03 | terraform, argocd | kubernetes cluster, tofu controller terraforms, argocd applications |
https://github.com/examplebank-nonprod/orgunit01.git | aws/orgunit01/perftest01,02,03 | terraform, argocd | kubernetes cluster, tofu controller terraforms, argocd applications |
https://github.com/examplebank-nonprod/orgunit01.git | aws/devops/nonprod01,02,03 | terraform, argocd | jenkins CI/CD pipeline, nexus repositories |
https://github.com/examplebank-nonprod/orgunit01.git | aws/testing/nonprod01,02,03 | terraform, argocd | sonarqube project |
https://github.com/examplebank-prod/orgunit01.git | aws/orgunit01/prod | terraform, argocd | kubernetes cluster, tofu controller terraforms, argocd applications |
https://github.com/examplebank-prod/orgunit01.git | aws/orgunit01/dr | terraform, argocd | kubernetes cluster, tofu controller terraforms, argocd applications |
https://github.com/examplebank-prod/orgunit01.git | aws/devops/prod | terraform, argocd | jenkins CI/CD pipeline, nexus repositories |
https://github.com/examplebank-nonprod/orgunit01.git | azure/orgunit01/comptest01,02,03 | terraform, argocd | kubernetes cluster, tofu controller terraforms, argocd applications |
https://github.com/examplebank-nonprod/orgunit01.git | azure/orgunit01/inttest01,02,03 | terraform, argocd | kubernetes cluster, tofu controller terraforms, argocd applications |
https://github.com/examplebank-nonprod/orgunit01.git | azure/orgunit01/e2etest01,02,03 | terraform, argocd | kubernetes cluster, tofu controller terraforms, argocd applications |
https://github.com/examplebank-nonprod/orgunit01.git | azure/orgunit01/perftest01,02,03 | terraform, argocd | kubernetes cluster, tofu controller terraforms, argocd applications |
https://github.com/examplebank-nonprod/orgunit01.git | azure/devops/nonprod01,02,03 | terraform, argocd | jenkins CI/CD pipeline, nexus repositories |
https://github.com/examplebank-nonprod/orgunit01.git | azure/testing/nonprod01,02,03 | terraform, argocd | sonarqube project |
https://github.com/examplebank-prod/orgunit01.git | azure/orgunit01/prod | terraform, argocd | kubernetes cluster, tofu controller terraforms, argocd applications |
https://github.com/examplebank-prod/orgunit01.git | azure/orgunit01/dr | terraform, argocd | kubernetes cluster, tofu controller terraforms, argocd applications |
https://github.com/examplebank-prod/orgunit01.git | azure/devops/prod | terraform, argocd | jenkins CI/CD pipeline, nexus repositories |
https://github.com/examplebank-nonprod/orgunit01.git | gcp/orgunit01/comptest01,02,03 | terraform, argocd | kubernetes cluster, tofu controller terraforms, argocd applications |
https://github.com/examplebank-nonprod/orgunit01.git | gcp/orgunit01/inttest01,02,03 | terraform, argocd | kubernetes cluster, tofu controller terraforms, argocd applications |
https://github.com/examplebank-nonprod/orgunit01.git | gcp/orgunit01/e2etest01,02,03 | terraform, argocd | kubernetes cluster, tofu controller terraforms, argocd applications |
https://github.com/examplebank-nonprod/orgunit01.git | gcp/orgunit01/perftest01,02,03 | terraform, argocd | kubernetes cluster, tofu controller terraforms, argocd applications |
https://github.com/examplebank-nonprod/orgunit01.git | gcp/devops/nonprod01,02,03 | terraform, argocd | jenkins CI/CD pipeline, nexus repositories |
https://github.com/examplebank-nonprod/orgunit01.git | gcp/testing/nonprod01,02,03 | terraform, argocd | sonarqube project |
https://github.com/examplebank-prod/orgunit01.git | gcp/orgunit01/prod | terraform, argocd | kubernetes cluster, tofu controller terraforms, argocd applications |
https://github.com/examplebank-prod/orgunit01.git | gcp/orgunit01/dr | terraform, argocd | kubernetes cluster, tofu controller terraforms, argocd applications |
https://github.com/examplebank-prod/orgunit01.git | gcp/devops/prod | terraform, argocd | jenkins CI/CD pipeline, nexus repositories |
https://github.com/examplebank-nonprod/orgunit01-app01-transfer.git | dc01/orgunit01/comptest01,02,03 | terraform, argocd | kubernetes deployment |
https://github.com/examplebank-nonprod/orgunit01-app01-transfer.git | dc01/orgunit01/inttest01,02,03 | terraform, argocd | kubernetes deployment |
https://github.com/examplebank-nonprod/orgunit01-app01-transfer.git | dc01/orgunit01/e2etest01,02,03 | terraform, argocd | kubernetes deployment |
https://github.com/examplebank-nonprod/orgunit01-app01-transfer.git | dc01/orgunit01/perftest01,02,03 | terraform, argocd | kubernetes deployment |
https://github.com/examplebank-nonprod/orgunit01-app01-transfer.git | dc01/devops/nonprod01,02,03 | terraform, argocd | jenkins CI/CD pipeline, nexus repositories |
https://github.com/examplebank-nonprod/orgunit01-app01-transfer.git | dc01/testing/nonprod01,02,03 | terraform, argocd | sonarqube project |
https://github.com/examplebank-prod/orgunit01-app01-transfer.git | dc01/orgunit01/prod | terraform, argocd | kubernetes deployment |
https://github.com/examplebank-prod/orgunit01-app01-transfer.git | dc01/orgunit01/dr | terraform, argocd | kubernetes deployment |
https://github.com/examplebank-prod/orgunit01-app01-transfer.git | dc01/devops/prod | terraform, argocd | jenkins CI/CD pipeline, nexus repositories |
https://github.com/examplebank-nonprod/orgunit01-app01-transfer.git | aws/orgunit01/comptest01,02,03 | terraform, argocd | kubernetes deployment |
https://github.com/examplebank-nonprod/orgunit01-app01-transfer.git | aws/orgunit01/inttest01,02,03 | terraform, argocd | kubernetes deployment |
https://github.com/examplebank-nonprod/orgunit01-app01-transfer.git | aws/orgunit01/e2etest01,02,03 | terraform, argocd | kubernetes deployment |
https://github.com/examplebank-nonprod/orgunit01-app01-transfer.git | aws/orgunit01/perftest01,02,03 | terraform, argocd | kubernetes deployment |
https://github.com/examplebank-nonprod/orgunit01-app01-transfer.git | aws/devops/nonprod01,02,03 | terraform, argocd | jenkins CI/CD pipeline, nexus repositories |
https://github.com/examplebank-nonprod/orgunit01-app01-transfer.git | aws/testing/nonprod01,02,03 | terraform, argocd | sonarqube project |
https://github.com/examplebank-prod/orgunit01-app01-transfer.git | aws/orgunit01/prod | terraform, argocd | kubernetes deployment |
https://github.com/examplebank-prod/orgunit01-app01-transfer.git | aws/orgunit01/dr | terraform, argocd | kubernetes deployment |
https://github.com/examplebank-prod/orgunit01-app01-transfer.git | aws/devops/prod | terraform, argocd | jenkins CI/CD pipeline, nexus repositories |
https://github.com/examplebank-nonprod/orgunit01-app01-transfer.git | azure/orgunit01/comptest01,02,03 | terraform, argocd | kubernetes deployment |
https://github.com/examplebank-nonprod/orgunit01-app01-transfer.git | azure/orgunit01/inttest01,02,03 | terraform, argocd | kubernetes deployment |
https://github.com/examplebank-nonprod/orgunit01-app01-transfer.git | azure/orgunit01/e2etest01,02,03 | terraform, argocd | kubernetes deployment |
https://github.com/examplebank-nonprod/orgunit01-app01-transfer.git | azure/orgunit01/perftest01,02,03 | terraform, argocd | kubernetes deployment |
https://github.com/examplebank-nonprod/orgunit01-app01-transfer.git | azure/devops/nonprod01,02,03 | terraform, argocd | jenkins CI/CD pipeline, nexus repositories |
https://github.com/examplebank-nonprod/orgunit01-app01-transfer.git | azure/testing/nonprod01,02,03 | terraform, argocd | sonarqube project |
https://github.com/examplebank-prod/orgunit01-app01-transfer.git (opens in a new tab) | azure/orgunit01/prod | terraform, argocd | kubernetes deployment |
https://github.com/examplebank-prod/orgunit01-app01-transfer.git (opens in a new tab) | azure/orgunit01/dr | terraform, argocd | kubernetes deployment |
https://github.com/examplebank-prod/orgunit01-app01-transfer.git (opens in a new tab) | azure/devops/prod | terraform, argocd | jenkins CI/CD pipeline, nexus repositories |
https://github.com/examplebank-nonprod/orgunit01-app01-transfer.git (opens in a new tab) | gcp/orgunit01/comptest01,02,03 | terraform, argocd | kubernetes deployment |
https://github.com/examplebank-nonprod/orgunit01-app01-transfer.git (opens in a new tab) | gcp/orgunit01/inttest01,02,03 | terraform, argocd | kubernetes deployment |
https://github.com/examplebank-nonprod/orgunit01-app01-transfer.git (opens in a new tab) | gcp/orgunit01/e2etest01,02,03 | terraform, argocd | kubernetes deployment |
https://github.com/examplebank-nonprod/orgunit01-app01-transfer.git (opens in a new tab) | gcp/orgunit01/perftest01,02,03 | terraform, argocd | kubernetes deployment |
https://github.com/examplebank-nonprod/orgunit01-app01-transfer.git (opens in a new tab) | gcp/devops/nonprod01,02,03 | terraform, argocd | jenkins CI/CD pipeline, nexus repositories |
https://github.com/examplebank-nonprod/orgunit01-app01-transfer.git (opens in a new tab) | gcp/testing/nonprod01,02,03 | terraform, argocd | sonarqube project |
https://github.com/examplebank-prod/orgunit01-app01-transfer.git (opens in a new tab) | gcp/orgunit01/prod | terraform, argocd | kubernetes deployment |
https://github.com/examplebank-prod/orgunit01-app01-transfer.git (opens in a new tab) | gcp/orgunit01/dr | terraform, argocd | kubernetes deployment |
https://github.com/examplebank-prod/orgunit01-app01-transfer.git (opens in a new tab) | gcp/devops/prod | terraform, argocd | jenkins CI/CD pipeline, nexus repositories |
Kustomize helm charts
- The problem with helm charts is that you cannot override yaml that is not exposed through values. To solve this problem, the CI pipeline should use kustomize with helm charts.
- https://github.com/kubernetes-sigs/kustomize/blob/master/examples/chart.md
- Use the helm template command to convert the helm chart to base yaml. Do not use the helmCharts field of kustomize; the kustomize documentation itself advises rendering the chart and committing the result instead.
- Use kustomize transformers and overlays to generate the final yaml.
- Check in the final yaml to the GitOps repository.
helm template {releaseName} \
  --values {valuesFile} \
  --version {version} \
  --repo {repo} \
  {chartName} > {chartName}.yaml
kustomization.yaml
resources:
  - minecraft_v3.1.3_Chart.yaml
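The render-then-kustomize steps above can be sketched as a small CI helper. This is only an illustration: the function names, the chart repository URL and the file names are assumptions, and a real pipeline would execute the command rather than just print it.

```python
import shlex


def helm_template_command(release, chart, repo, version, values_file):
    """Build the argv that renders a chart to plain Kubernetes YAML."""
    return [
        "helm", "template", release,
        "--values", values_file,
        "--version", version,
        "--repo", repo,
        chart,
    ]


def kustomization_yaml(resources):
    """Return a kustomization.yaml body that lists the rendered files."""
    lines = ["resources:"]
    lines += [f"  - {name}" for name in resources]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values; the repo URL is a placeholder, not a real chart repo.
    cmd = helm_template_command(
        release="minecraft",
        chart="minecraft",
        repo="https://charts.example.invalid",
        version="3.1.3",
        values_file="values-comptest01.yaml",
    )
    print(shlex.join(cmd), "> minecraft.yaml")
    print(kustomization_yaml(["minecraft.yaml"]))
```

Overlays and transformers then patch the rendered file, and only the output of kustomize build is committed to the GitOps repository.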
DAPR
https://docs.dapr.io
Building Block | Description |
---|---|
Service-to-service invocation | Resilient service-to-service invocation enables method calls, including retries, on remote services, wherever they are located in the supported hosting environment. |
Publish and subscribe | Publishing events and subscribing to topics between services enables event-driven architectures to simplify horizontal scalability and make them resilient to failure. Dapr provides at-least-once message delivery guarantee, message TTL, consumer groups and other advanced features. |
Workflows | The workflow API can be combined with other Dapr building blocks to define long running, persistent processes or data flows that span multiple microservices using Dapr workflows or workflow components. |
State management | With state management for storing and querying key/value pairs, long-running, highly available, stateful services can be easily written alongside stateless services in your application. The state store is pluggable and examples include AWS DynamoDB, Azure Cosmos DB, Azure SQL Server, GCP Firebase, PostgreSQL or Redis, among others. |
Resource bindings | Resource bindings with triggers builds further on event-driven architectures for scale and resiliency by receiving and sending events to and from any external source such as databases, queues, file systems, etc. |
Actors | A pattern for stateful and stateless objects that makes concurrency simple, with method and state encapsulation. Dapr provides many capabilities in its actor runtime, including concurrency, state, and life-cycle management for actor activation/deactivation, and timers and reminders to wake up actors. |
Secrets | The secrets management API integrates with public cloud and local secret stores to retrieve the secrets for use in application code. |
Configuration | The configuration API enables you to retrieve and subscribe to application configuration items from configuration stores. |
Distributed lock | The distributed lock API enables your application to acquire a lock for any resource that gives it exclusive access until either the lock is released by the application, or a lease timeout occurs. |
Cryptography | The cryptography API provides an abstraction layer on top of security infrastructure such as key vaults. It contains APIs that allow you to perform cryptographic operations, such as encrypting and decrypting messages, without exposing keys to your applications. |
Jobs | The jobs API enables you to schedule jobs at specific times or intervals. |
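Applications reach these building blocks through the Dapr sidecar's HTTP API on localhost. The sketch below only builds the request URLs and a state payload for three of the building blocks; the app id, component names and keys are hypothetical, and 3500 is the sidecar's default HTTP port.

```python
import json
from urllib.parse import quote


def base_url(port=3500):
    """Base URL of the local Dapr sidecar HTTP API."""
    return f"http://localhost:{port}/v1.0"


def invoke_url(app_id, method, port=3500):
    """Service-to-service invocation of `method` on the app `app_id`."""
    return f"{base_url(port)}/invoke/{quote(app_id)}/method/{method.lstrip('/')}"


def publish_url(pubsub_name, topic, port=3500):
    """Publish an event to `topic` through the pub/sub component."""
    return f"{base_url(port)}/publish/{quote(pubsub_name)}/{quote(topic)}"


def state_url(store_name, key=None, port=3500):
    """POST without a key saves state; GET with a key reads one item."""
    url = f"{base_url(port)}/state/{quote(store_name)}"
    return url if key is None else f"{url}/{quote(key)}"


def save_state_body(items):
    """Request body for a state save: a JSON list of key/value objects."""
    return json.dumps([{"key": k, "value": v} for k, v in items.items()])


if __name__ == "__main__":
    print(invoke_url("transfer", "/accounts/42"))
    print(publish_url("pubsub", "payments"))
    print(state_url("statestore", "order-1"))
    print(save_state_body({"order-1": {"amount": 10}}))
```

Because the application only talks to localhost, the backing state store or message broker can be swapped by changing the Dapr component configuration, not the application code.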
Equinix Colocation
- Install bare metal servers in Equinix colocation
- IBX
- https://www.equinix.com/products/data-center-services/colocation
- https://docs.equinix.com/en-us/Content/Colocation-Products/colo-intro.htm
Equinix fabric
- Use Equinix Fabric for communication between your bare metal servers in Equinix colocation and AWS, Azure, Google and other clouds.
- https://www.equinix.com/products/digital-infrastructure-services/equinix-fabric
- https://docs.equinix.com/en-us/Content/Interconnection/Fabric/Fabric-landing-main.htm
- https://www.equinix.com/products/digital-infrastructure-services/equinix-fabric/provider-availability
DevOps components reliability
- DevOps components like CloudBees Jenkins will initially be installed in VMs or containers on the bare metal servers in the Equinix colocation data center.
- Once they have been used to create the AWS, Azure and Google cloud infrastructure, they will be made highly available in these clouds with active-passive or active-active strategies.
- For example, Jenkins in AWS needs to handle CI/CD pipelines which build and deploy components to AWS infrastructure. Jenkins in Azure needs to handle CI/CD pipelines which build and deploy components to Azure infrastructure.
GitOps for CD - ArgoCD
- Harness and Spinnaker for CI/CD, and Morpheus Data and Cloudify for cloud management, are hype. They don't have proper open source support.
- Use CloudBees Jenkins for CI, ArgoCD for Kubernetes CD and HashiCorp Cloud Platform Terraform, Vault, etc. for Infrastructure CD.
- All Kubernetes (Helm charts) and Infrastructure code along with environment configuration code will be checked into the code repository.
- It is the responsibility of Jenkins to compile this code and translate it, using commands like helm template and terraform plan, into files which can be checked into the GitOps repository used by components like ArgoCD, Terraform, etc. The translated files should contain all the environment configuration variables.
- Helm template -> Kubernetes yaml
- Terraform tf -> Terraform plan. Use read only access to account for drift detection and plan generation.
- Need to test if this is possible. Liquibase files -> sql. Use read only access to database for drift detection and sql generation.
- GitOps (PullOps vs PushOps)
- The difference between Helm and Argo is the difference between Ansible and Puppet. Ansible applies the playbook when you tell it to run, and does nothing to ensure that the state still conforms until you run the playbook again. Puppet applies the manifest you define, perpetually ensures that the state stays applied, and automatically and immediately corrects any change that deviates from the manifest. Puppet is stateful enforcement; Ansible sets a state but does not enforce it. Ansible is good, but Puppet is great because it is resilient and self-correcting.
- Think of helm as push-ops and Argo as pull-ops. With push-ops, you often won't find out about any drift until the next time you try to push the latest state of the IaC repository. Sometimes that drift is important and can mean a lot of work for your team. With pull-ops, you can learn of any drift immediately as it occurs. Your argo app can be configured to automatically try to reconcile any drift back to the IaC state, or it can be configured to wait until manual intervention. It's up to you. The point is that you have the knowledge and the choice now.
- Deployment components like argocd, terraform, sql runner should run in the same account or kubernetes cluster or namespace where the infrastructure needs to be deployed and run when Jenkins updates the GitOps repository relevant branch. We can have one branch per component environment. Only Jenkins should have permission to update this GitOps repository.
- Terraform provider can be used to install argocd operator and create argocd applications in the kubernetes cluster and namespace.
- GitOps is more secure because Jenkins is not given access to do any deployment to any account. We can have a single Jenkins for non production and production environments. It only has access to update the GitOps repository.
- Also, we don't need to add code in Jenkins to run commands like helm install, terraform apply, liquibase deploy, etc.
- ArgoCD runs in the same namespace where kubernetes infrastructure needs to be deployed.
- We need to run terraform binary in the same account where we need to deploy the cloud infrastructure.
- Need to test if this is possible. Liquibase binary is not required since we need to just run sql statements. Use read only access to database for drift detection and sql generation.
- We have separate accounts, clusters and namespaces for different environments of an organizational unit or component.
- We can use the GitOps repository for Static code analysis, FinOps and architecture visualization.
- ArgoCD, Terraform and SQL should also do periodic drift detection. GitOps repository should exactly match the infrastructure in account, cluster, namespace or database.
- Need to test if this is possible. Jenkins may still need to connect to infrastructure to install middleware, configure it and deploy application changes to it. Ansible could be used with GitOps to achieve this. One can use ansible-pull and cron to continuously sync a git repo and, with the --only-if-changed/-o flag, execute the playbook locally only when the repo has changed. The --check flag works as expected for a dry run.
- https://docs.ansible.com/ansible/latest/cli/ansible-pull.html
- This should be on each VM which needs to be managed by ansible.
- AWS batch with fargate can be used to run the terraform binary which performs the apply and updates the state. It can also be a small EC2 instance with state backed up after each run. Appropriate IAM permissions can be assigned to the AWS EC2 or fargate similar to cloudformation iam permissions. State needs to be stored in the same account so that if account is deleted, state is also deleted.
- Liquibase binary daemon can be a process on the database machine.
- The binary daemon process should be as close to the deployment endpoint as possible and traffic should not go over the internet. It should be in the same VPC or same cloud.
- If one reconciliation takes time, any further updates to the GitOps repository will have to wait for the previous reconciliation to finish.
- Every reconciliation will fix the drift if one exists due to manual changes.
- Wait for reconciliation to finish before Jenkins updates the GitOps repository, because contract testing results for the component to be deployed may change after reconciliation of another component.
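The pull-ops behaviour described above (detect drift continuously, then either auto-reconcile or wait for manual intervention) can be illustrated with a toy reconciler. This is a minimal sketch over in-memory dictionaries, not how ArgoCD or Terraform implement it; the resource names are hypothetical.

```python
import copy


def detect_drift(desired, actual):
    """Compare two {resource_name: spec} mappings and classify differences."""
    desired_names, actual_names = set(desired), set(actual)
    return {
        "missing": sorted(desired_names - actual_names),
        "unexpected": sorted(actual_names - desired_names),
        "changed": sorted(
            name for name in desired_names & actual_names
            if desired[name] != actual[name]
        ),
    }


def reconcile(desired, actual, auto_sync=True):
    """Return (new_actual_state, drift_report).

    With auto_sync the live state is replaced by the desired state from Git;
    without it the drift is only reported and the live state is left alone.
    """
    drift = detect_drift(desired, actual)
    if auto_sync and any(drift.values()):
        return copy.deepcopy(desired), drift
    return actual, drift


if __name__ == "__main__":
    git_state = {"deploy/transfer": {"replicas": 3}, "svc/transfer": {"port": 80}}
    live_state = {"deploy/transfer": {"replicas": 1}, "cm/debug": {}}
    live_state, report = reconcile(git_state, live_state)
    print(report)
    print(live_state == git_state)
```

Running such a loop on a schedule is what turns a one-off push into continuous enforcement: a manual change to the live state shows up in the next drift report instead of at the next deployment.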
- GitOps features
- Automated deployment of applications to specified target environments
- Support for multiple config management/templating tools (Kustomize, Helm, Jsonnet, plain-YAML)
- Ability to manage and deploy to multiple clusters
- SSO Integration (OIDC, OAuth2, LDAP, SAML 2.0, GitHub, GitLab, Microsoft, LinkedIn)
- Multi-tenancy and RBAC policies for authorization
- Rollback/Roll-anywhere to any application configuration committed in Git repository
- Health status analysis of application resources
- Automated configuration drift detection and visualization
- Automated or manual syncing of applications to its desired state
- Web UI which provides real-time view of application activity
- CLI for automation and CI integration
- Webhook integration (GitHub, BitBucket, GitLab)
- Access tokens for automation
- PreSync, Sync, PostSync hooks to support complex application rollouts (e.g. blue/green & canary upgrades)
- Audit trails for application events and API calls
- Prometheus metrics
- Parameter overrides for overriding helm parameters in Git
- Better to use Helm template -> Kubernetes yaml, since a Helm version change may change the generated Kubernetes yaml. We track helm charts in the source code repository, and we can track what Helm changed in the Kubernetes yaml in the GitOps repository. Open questions: how will Helm hooks work? Should we use argocd hooks?
- ArgoCD application
- GitOps repository : one per organizational unit
- Target revision :
- HEAD
- branch : one branch per environment like component, integrated, e2e, performance, production and dr of the organizational unit
- tag
- Path : one per component which has source code repository
- Destination : one per kubernetes cluster of each environment account of organizational unit
- Namespace : one per environment of the organizational unit if multiple environments share the same kubernetes cluster.
- We can have 4 accounts :
- component, integrated and e2e combined into one : any change in kubernetes cluster will affect all 3 environments
- performance
- production
- dr
- OR we can have 6 accounts and 6 kubernetes clusters.
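The mapping above (repository per organizational unit, branch per environment, path per component, destination per cluster, namespace per environment) can be sketched as a function that produces an ArgoCD Application manifest. The application name, the namespace naming scheme, the SSH repo URL format and the cluster URL are assumptions for illustration.

```python
import json


def argocd_application(org_unit, component, environment, cluster_url,
                       gitops_org="examplebank-gitops"):
    """Build an ArgoCD Application manifest as a plain dictionary."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {
            # Assumed naming scheme: <org unit>-<component>-<environment>.
            "name": f"{org_unit}-{component}-{environment}",
            "namespace": "argocd",
            "finalizers": ["resources-finalizer.argocd.argoproj.io"],
        },
        "spec": {
            "project": "default",
            "source": {
                # One GitOps repository per organizational unit.
                "repoURL": f"git@github.com:{gitops_org}/{org_unit}.git",
                # One branch per environment.
                "targetRevision": environment,
                # One path per component.
                "path": component,
            },
            "destination": {
                "server": cluster_url,
                # Assumed: one namespace per environment of the org unit.
                "namespace": f"{org_unit}-{environment}",
            },
        },
    }


if __name__ == "__main__":
    app = argocd_application(
        "orgunit01", "app01-transfer", "production",
        "https://kubernetes.default.svc",
    )
    # JSON is valid YAML, so this output can be applied with kubectl as-is.
    print(json.dumps(app, indent=2))
```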
- GitOps should also be used to manage DevOps resources
- When a new component is added, Compass should add the new component's Terraform infrastructure code to the GitOps repository of each organizational unit where required.
- Terraform running in each account like DevOps account, environment account should keep the DevOps infrastructure as code in sync with the GitOps repository.
- So if tomorrow, we need to make some changes to all the DevOps resources, it can be done by just updating the GitOps repository.
- Examples of DevOps resources are CI/CD pipelines, artifact repositories, source code repositories, sonarqube projects, argocd applications, etc.
- CD should merge all the configuration values and create terraform.tfvars.json and download all required terraform modules and create terraform module files for each environment account for each component.
- Use opentofu instead of terraform
- Use https://flux-iac.github.io/tofu-controller/use-tf-controller/ with fluxcd GitOps
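The configuration merge performed by CD before writing terraform.tfvars.json can be sketched as a layered deep merge, where later layers (for example environment or account specific values) override earlier ones. The layer contents and variable names below are hypothetical.

```python
import json


def deep_merge(base, override):
    """Return a new dict where `override` wins; nested dicts merge recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def render_tfvars(*layers):
    """Merge configuration layers in order and return terraform.tfvars.json text."""
    result = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return json.dumps(result, indent=2, sort_keys=True) + "\n"


if __name__ == "__main__":
    # Hypothetical layers: organization defaults, then environment overrides.
    defaults = {"region": "us-east-1", "tags": {"org": "examplebank"}}
    production = {"tags": {"environment": "production"}, "replicas": 3}
    with open("terraform.tfvars.json", "w", encoding="utf-8") as handle:
        handle.write(render_tfvars(defaults, production))
```

Terraform and OpenTofu load a file named terraform.tfvars.json automatically, so the committed file fully determines the variable values for that environment.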
- Steps
- Create management account
- Use terraform to create vpc and ec2 instance in management account. This infrastructure code will be added later to examplebank gitops repository.
- ec2 instance should have all the terraform code to bootstrap the management account like
- create organization examplebank. This infrastructure code will be added later to examplebank gitops repository.
- create organizational unit devops. This infrastructure code will be added later to examplebank gitops repository. Each organizational unit will have multiple accounts for each environment. So we cannot assign the infrastructure code to create organizational unit to any one account. It will be added to organization gitops repository.
- create aws accounts for different environments (component, integration, e2e, performance, production, dr) of the devops organizational unit. This infrastructure code will be added later to examplebank gitops repository. We cannot assign the infrastructure code to create accounts to any one organizational unit account. It will be added to organization gitops repository.
- create vpc and ec2 instance in each account
- devops account ec2 instance should have all the terraform code to bootstrap each account like
- create vpc and eks cluster. This infrastructure code will be added later to devops gitops repository account branch.
- configure github. This infrastructure code will be added later to devops-github repository.
- When eks cluster is created, argocd will be installed to it. It will start looking at devops gitops repository account branch.
- For security, only a CI pipeline tool like Jenkins should be able to write to the GitOps repositories. Better to have a single EC2 instance with both Jenkins and the GitOps repositories, so that only the Jenkins on that instance can write to them. Use GitBucket for the GitOps repository. The Jenkins git-server plugin (https://plugins.jenkins.io/git-server/, https://github.com/jenkinsci/git-server-plugin) is also a good option.
- Check in the code to install Jenkins and configure it with the devops-jenkins job in the devops gitops repository account branch.
- Create devops-jenkins repository which will trigger devops-jenkins CI/CD pipeline and configure examplebank, devops and devops-github jobs.
- Create examplebank, devops and devops-github repositories.
- examplebank organization for storing source code. examplebank-gitops organization for storing gitops code. Avoid a monorepo: it leads to unnecessary commits to the component gitops repository, it is less secure because multiple applications share the same git repo, and there is no need for it since we can have unlimited repositories.
- One repository per component
- One branch per environment like component, integration, e2e, performance, production, dr
- One directory per public cloud account like aws, azure, gcp, oracle containing cloud iac
- One directory per kubernetes cluster containing kubernetes iac.
- One deploy key per repository. Do not use fine-grained personal access tokens since they were in beta at the time of writing. The devops-github Terraform should add the public key to the github repository and add the private key as a secret which should be available to argocd via ~/.ssh/config https://docs.github.com/en/authentication/connecting-to-github-with-ssh/managing-deploy-keys#using-multiple-repositories-on-one-server
- Use below annotation and github webhooks to ensure that application cache is invalidated only if files in the directory in which application resides are changed.
- argocd.argoproj.io/manifest-generate-paths: .
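The deploy-key setup described above (one key per repository, selected through ~/.ssh/config) can be sketched as a generator for the config entries. Using the repository name as the Host alias follows the scheme in this document; the mount path and the IdentitiesOnly line are assumptions.

```python
def ssh_config_entry(repo, key_path, hostname="github.com"):
    """Return one ~/.ssh/config block that maps a host alias to a deploy key."""
    return "\n".join([
        f"Host {repo}",
        f"    Hostname {hostname}",
        f"    IdentityFile {key_path}",
        # Assumption: offer only this key, so the wrong deploy key is never tried.
        "    IdentitiesOnly yes",
        "",
    ])


def clone_url(org, repo):
    """SSH URL whose host part is the alias, so ssh picks the matching key."""
    return f"git@{repo}:{org}/{repo}.git"


if __name__ == "__main__":
    # Hypothetical mount path for the private key.
    repo = "orgunit01-app01-transfer"
    print(ssh_config_entry(repo, f"/mnt/secrets/{repo}"))
    print(clone_url("examplebank-nonprod", repo))
```

The repoURL configured in ArgoCD then has to use the alias form printed by clone_url, because a deploy key is only valid for the one repository it was added to.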
- 3 Jenkins EC2 Master Controller instances with EFS mounted with write permissions. 3 EC2 Instances with EFS mounted with read permission containing git server. OR One gitops repository per organizational unit in Github. Only Jenkins Instances have write permission using short lived tokens. Argocd and terraform have only read permission using short lived tokens.
- Better to use Github for gitops repositories instead of git on a server for SLSA compliance. https://news-web.php.net/php.internals/113838
- We could upload kubernetes manifests to helm oci repositories and use helm as source for ArgoCD. But then, anyone who can push to the helm oci repository will be able to make changes to the container image or helm charts. So it is same problem as git repositories. So better to just create git repository with one branch per environment for terraform iac and kubernetes iac.
- For security reasons, disable the kustomize, helm and jsonnet tools in ArgoCD since developers may refer to dependent charts and hack the system. Better to just use Jenkins CI to convert helm charts to kubernetes yamls and store the final yamls in the GitOps repository.
- Export Kubernetes ArgoCD events to permanent storage for auditing.
- Add Admission controller in Kubernetes for SLSA verification of container images.
- Create the argocd-repo-server-tls secret and add the --repo-server-strict-tls parameter to the argocd-application-controller pods. If a service mesh sidecar container is used, disable TLS.
- https://argo-cd.readthedocs.io/en/stable/operator-manual/tls/#inbound-tls-certificates-used-by-argocd-repo-server
- https://argo-cd.readthedocs.io/en/stable/operator-manual/tls/#configuring-tls-to-argocd-repo-server
- https://argo-cd.readthedocs.io/en/stable/operator-manual/tls/#disabling-tls-to-argocd-repo-server
- App of Apps pattern. Not required: the organizational unit's infrastructure as code terraform modules, deployed through the Terraform GitOps flow created by Jenkins, should find the organizational unit's components from the github repositories and then create argocd application resources (terraform kubernetes yaml) for them in the argocd namespace. Add the resources-finalizer.argocd.argoproj.io finalizer to each argocd application.
├── Chart.yaml
├── templates
│   ├── guestbook.yaml
│   ├── helm-dependency.yaml
│   ├── helm-guestbook.yaml
│   └── kustomize-guestbook.yaml
└── values.yaml
templates/guestbook.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: guestbook
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    namespace: argocd
    server: {{ .Values.spec.destination.server }}
  project: default
  source:
    path: guestbook
    repoURL: https://github.com/argoproj/argocd-example-apps
    targetRevision: HEAD
values.yaml
spec:
  destination:
    server: https://kubernetes.default.svc
argocd app create apps \
  --dest-namespace argocd \
  --dest-server https://kubernetes.default.svc \
  --repo https://github.com/argoproj/argocd-example-apps.git \
  --path apps
argocd app sync apps
- Set up network policies to prevent direct access to Argo CD components (Redis and the repo-server). Make sure your cluster supports those network policies and can actually enforce them.
- Secrets
- Use Cloud provider secrets manager like AWS secrets manager for providing secrets to cloud resources since it can be integrated with Cloud IAM.
- Use DAPR for secret management inside kubernetes
- For secret management for EC2 instances use iam role to get secrets from aws secrets manager.
- The application which needs the secret for connecting to it like DBMS should rotate the secret and update the aws secrets manager. Not the other way around. Cron scripts can be defined for it.
- Both the old and new secret should be valid till the app start using the new secret.
- One secret can be shared by multiple components. Need to add tags on the secret and iam role should use tags to select which secrets should be available to components.
- IAM roles should be used by AWS components to connect to other AWS components like AWS RDS.
- IAM roles should be used to get client id and secret from secret manager and then use that to get short lived OIDC tokens from IDP like Okta to connect to IDP compliant applications.
- No passwords or long living tokens should be used for communication.
- DAPR mTLS and access policies for connecting to other kubernetes services and IAM role service account to connect to aws services outside kubernetes.
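The tag-based selection of shared secrets mentioned above can be sketched as an IAM policy generator. The tagging scheme (one tag key per consuming component, named access-<component> with value "true") is an assumption made here so that one secret can be shared by several components; the condition key is the Secrets Manager resource tag key.

```python
import json


def secrets_read_policy(component, tag_prefix="access-"):
    """IAM policy that lets a role read only secrets tagged for `component`."""
    tag_key = f"{tag_prefix}{component}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadSecretsTaggedForComponent",
                "Effect": "Allow",
                "Action": [
                    "secretsmanager:GetSecretValue",
                    "secretsmanager:DescribeSecret",
                ],
                "Resource": "*",
                # Only secrets carrying the component's tag are readable.
                "Condition": {
                    "StringEquals": {
                        f"secretsmanager:ResourceTag/{tag_key}": "true"
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical component name.
    print(json.dumps(secrets_read_policy("app01-transfer"), indent=2))
```

A secret shared by two components then simply carries both tags, and each component's role sees it without either role being granted access to all secrets.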
- https://www.redhat.com/en/blog/gitops-quality-life-tips
- Don't use webhooks because then we need to install the argocd api server and keep a network connection open from the git provider to the argocd api server. Better to just wait for the configured refresh interval, like 10 seconds. The CI/CD pipeline can wait 10 seconds after making the change to the GitOps repository.
- Even though pod was stuck in pending state due to no nodes available, health of the argocd application was progressing because deployment status had desired number of replicas. CI/CD pipeline should check for ArgoCD events like synced and health status as healthy within specified timeout period. If not, then it should override the folder with previous commit which changed files in the folder and create new commit.
- Use an init container for the argocd-repo-server deployment to get repository credentials and update ~/.ssh/config through volume mounts.
- Set namespace in yaml files. kubectl may not be able to set namespace for crd resources
- Set annotations for both helm hooks and equivalent argocd hooks in helm charts. This way it will work with helm install and also argocd. If only helm hooks are defined in helm charts, use kustomize to set argocd hooks.
- Make your hook idempotent. They should run only when required. Like post-install should only run once during install.
- Annotate crd-install with hook-weight: "-2" to make sure it runs to success before any install or upgrade hooks.
- Annotate pre-install and post-install with hook-weight: "-1". This will make sure it runs to success before any upgrade hooks.
- Annotate pre-upgrade and post-upgrade with hook-delete-policy: before-hook-creation to make sure it runs on every sync.
- No need to use argocd notifications, since the jenkins ci/cd pipeline should look at kubernetes events for argocd status. If argocd reconciliation does not end up synced and healthy, it should
- roll back the commit to the GitOps repository
- fail the CI/CD pipeline
- Otherwise it should run the tests.
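The pipeline check described above (wait for Synced and Healthy within a timeout, otherwise revert and fail) can be sketched as a polling loop. How the status is obtained is left to an injected function, since that depends on the setup; the timeout and interval defaults are assumptions.

```python
import time


def wait_for_argocd(get_status, timeout_s=300, interval_s=10,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until the app is Synced and Healthy, or the timeout expires.

    `get_status` is any callable returning (sync_status, health_status),
    for example by reading the Application resource from the cluster.
    """
    deadline = clock() + timeout_s
    while True:
        sync_status, health_status = get_status()
        if sync_status == "Synced" and health_status == "Healthy":
            return True
        if clock() >= deadline:
            # Covers apps stuck in Progressing, e.g. pods pending on nodes.
            return False
        sleep(interval_s)


def next_step(succeeded):
    """Map the outcome to the pipeline action."""
    return "run-tests" if succeeded else "revert-gitops-commit-and-fail"


if __name__ == "__main__":
    # Simulated status sequence instead of a real cluster.
    statuses = iter([("OutOfSync", "Progressing"), ("Synced", "Healthy")])
    ok = wait_for_argocd(lambda: next(statuses), sleep=lambda seconds: None)
    print(next_step(ok))
```

Treating a timeout as failure matters because a health status of Progressing never turns into an error on its own.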
- No need to use argocd applicationsets custom resource because we will have different GitOps repository for each component and not a monorepo for all components of an organizational unit. Also, we will use terraform to manage argocd application custom resources in argocd namespace. We will run static tests and policy as code in CI/CD pipeline before committing changes to the GitOps repository.
- When running helm template command set argocd application name as the release name. This is because argocd sets app.kubernetes.io/instance label = argocd application name and if we use this label in the selectors, it can be problematic if they are different.
- Set values explicitly for those generated by randAlphaNum function otherwise helm template will generate different values each time.
- Keep the crd chart in a separate subchart folder inside the helm chart folder. Add it as a dependency of the helm chart.
- Give CRD resources a negative wave number using kustomize to have them deployed before any other resource. Try what happens if we add them as pre-sync hook ( We want crds to be deleted if app is deleted so maybe adding them as pre-sync hook will not work).
- Add a pre-sync hook to wait for the CRD apis to be available before deploying the custom resources.
We only install the Prometheus CRDs, AWS ebs csi driver, and the AWS cloud controller provider via kustomize immediately after cluster creation. We use kubeadm on vanilla k8s in AWS.
The way we install Argo and all the other core infra apps is that we create a repo for each app on our hosted Gitlab. Then for each project, we use a script to run helm template command that generates all the manifests, including kustomization, which we commit to the same repo. For Argo CD, these manifests include the read-only deploy keys, projects and self applications.
We use an “apps of apps” approach where we have a “self-argo” app that manages itself and then the “self” app that points to a kustomization overlay for the cluster that uses the above repositories for all the core services. Things like AWS load balancer controller, kyverno, Prometheus, etc. The overlay does any “last mile” patches, stuff like setting cluster name or AWS region in pod containers environment variables. One of the patches we use is a LabelTransformer that targets all CustomResourceDefinitions and applies an Argo CD wave label with a negative number; this ensures all crds are installed first. The last resource we load is the Argo CD Applications for all the teams who will have access to the cluster which point to their own repository that they manage.
Additionally, we are using kube2iam and iam roles to grant access to AWS resources.
In ArgoCD, you can configure sync-waves to install the CRDs first, and put the resources that depend on them in a higher sync-wave with the sync option SkipDryRunOnMissingResource=true, so that the sync doesn't fail before the CRDs are applied.
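The CRD ordering advice above can be sketched as a post-processing step over rendered manifests. ArgoCD reads the sync wave and sync options from annotations; the function below gives CRDs a negative wave and marks resources whose API group is defined by one of those CRDs. The manifests in the example are hypothetical.

```python
import copy

SYNC_WAVE = "argocd.argoproj.io/sync-wave"
SYNC_OPTIONS = "argocd.argoproj.io/sync-options"


def _annotations(manifest):
    """Return the manifest's annotations dict, creating it if needed."""
    return manifest.setdefault("metadata", {}).setdefault("annotations", {})


def order_crds_first(manifests, crd_wave=-1):
    """Return annotated copies so CRDs sync before their custom resources."""
    crd_groups = {
        m["spec"]["group"]
        for m in manifests
        if m.get("kind") == "CustomResourceDefinition"
    }
    result = []
    for original in manifests:
        manifest = copy.deepcopy(original)
        group = manifest.get("apiVersion", "").partition("/")[0]
        if manifest.get("kind") == "CustomResourceDefinition":
            # Waves run in ascending order and default to 0.
            _annotations(manifest)[SYNC_WAVE] = str(crd_wave)
        elif group in crd_groups:
            # The resource type does not exist yet during the dry run.
            _annotations(manifest)[SYNC_OPTIONS] = "SkipDryRunOnMissingResource=true"
        result.append(manifest)
    return result


if __name__ == "__main__":
    crd = {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        "metadata": {"name": "servicemonitors.monitoring.coreos.com"},
        "spec": {"group": "monitoring.coreos.com"},
    }
    monitor = {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": "transfer"},
    }
    for item in order_crds_first([crd, monitor]):
        print(item["kind"], item["metadata"]["annotations"])
```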
GitOps for CD - FluxCD
- CRD
- GitRepository, OCIRepository, HelmRepository and Bucket
- HelmRelease, Kustomization
Google accounts
- Create account in google with examplebank.nonprod@gmail.com email address.
- Create account in google with examplebank.prod@gmail.com email address.
Github
- Create account in Github with email address examplebank.nonprod@gmail.com and username examplebank-nonproduction.
- Create organization in Github https://github.com/account/organizations/new?plan=free&ref_cta=Create%2520a%2520free%2520organization&ref_loc=cards&ref_page=%2Forganizations%2Fplan with following inputs
- Organization name : examplebank
- Contact email : examplebank.nonprod@gmail.com
- My personal account
- Create organization in Github https://github.com/account/organizations/new?plan=free&ref_cta=Create%2520a%2520free%2520organization&ref_loc=cards&ref_page=%2Forganizations%2Fplan with following inputs
- Organization name : examplebank-nonprod
- Contact email : examplebank.nonprod@gmail.com
- My personal account
- Create account in Github with email address examplebank.prod@gmail.com and username examplebank-production.
- Create organization in Github https://github.com/account/organizations/new?plan=free&ref_cta=Create%2520a%2520free%2520organization&ref_loc=cards&ref_page=%2Forganizations%2Fplan with following inputs
- Organization name : examplebank-prod
- Contact email : examplebank.prod@gmail.com
- My personal account
Atlassian accounts for project management, ITSM and component catalog components
- Create account in Atlassian with email address examplebank.nonprod@gmail.com and subscribe to Jira (for project management), Jira Service Management (for ITSM) and Compass (for component catalog) products with url https://examplebanknonprod.atlassian.net
- Create account in Atlassian with email address examplebank.prod@gmail.com and subscribe to Jira (for project management), Jira Service Management (for ITSM) and Compass (for component catalog) products with url https://examplebankprod.atlassian.net
AWS organization management account
- Create AWS account with root user email address examplebank2024@gmail.com and AWS account name examplebank.
- Add MFA to root user.
- Create AWS IAM user with name administrator and permissions AdministratorAccess.
- Add MFA to administrator user.
devops-github code repository
- Use it to create Github repository template with following features.
- devops, docs and tests directories (no need for .github/workflows/docs, since the CI/CD pipeline will deploy docs to the different environments).
- release/1.0 branch.
- devops/configuration folder will have configuration files for component, integrated, e2e, performance, production and dr environments.
CI/CD pipeline
- Create Github repository devops-jenkins in examplebank organization from the Github repository template.
- It will use packer to create Jenkins virtual machine image.
- It should install all plugins and configure using configuration as code.
- It will create Jenkins virtual machine on Data center (macos virtualization).
Component catalog
- Create Github repository devops-compass in examplebank organization from the Github repository template.
- devops-github, devops-jenkins and devops-compass components should be created in Compass. Their Github repositories were created manually; their Jenkins jobs still need to be created.
- Compass should automatically create github repository and jenkins job for each component created in compass.
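Creating a repository from the template can be sketched with the Github REST endpoint for generating a repository from a template. This is an illustration under stated assumptions, not the pipeline's actual implementation: the template repository is assumed to be devops-github in the examplebank organization, the helper name build_generate_request is hypothetical, and the token value is a placeholder. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_generate_request(template_owner: str, template_repo: str,
                           owner: str, name: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a 'create repository from template' request."""
    url = f"{GITHUB_API}/repos/{template_owner}/{template_repo}/generate"
    body = {"owner": owner, "name": name, "private": True}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Assumed names: the template lives in examplebank/devops-github and the new
# component is devops-compass. The token is a placeholder, not a real secret.
request = build_generate_request(
    template_owner="examplebank",
    template_repo="devops-github",
    owner="examplebank",
    name="devops-compass",
    token="<GITHUB_TOKEN>",
)
# urllib.request.urlopen(request) would send it; omitted so the sketch has no side effects.
```

Keeping the request construction separate from the network call makes the naming logic easy to check before anything is created in the organization.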
Cloud management platform
- Hashicorp cloud platform for Terraform, Vault, etc.
- Create component devops-hcp in compass.
- Jenkins CI/CD pipelines will use cloudify cli with terraform module blueprints to deploy infrastructure as code to private clouds, public clouds and kubernetes.
- It will use packer to create Cloudify virtual machine image.
- It will create Cloudify virtual machine on Data center (macos virtualization).
Organization code repository
- Create component examplebank in compass.
- Create terraform module in devops/infrastructure folder to
- create AWS organization with name examplebank.
- enable AWS identity center.
- Create unit tests in tests/unit folder for the terraform module.
- Create temporary shell script in devops/cicd_pipeline folder to
- install terraform and terratest.
- run the unit tests.
- run the terraform module.
- update devops/configuration files
- AWS control tower adds a lot of infrastructure whose code is not stored in terraform so we are not using AWS control tower.
- Temporary shell script will be replaced by Jenkins CI/CD pipeline when Jenkins component is installed in devops organizational unit.
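The test and apply steps of the temporary script can be sketched as follows. This is a rough outline that assumes terraform and Go are already on the PATH and that the Terratest tests in tests/unit are ordinary Go tests; installation and the devops/configuration update are left out. The function names pipeline_commands and run_pipeline are hypothetical, and the default is a dry run that only lists the commands.

```python
import subprocess
from pathlib import Path


def pipeline_commands(repo_root: Path) -> list[tuple[list[str], Path]]:
    """Return (command, working directory) pairs in execution order."""
    infra = repo_root / "devops" / "infrastructure"
    tests = repo_root / "tests" / "unit"
    return [
        # Terratest tests are Go tests, so they run through `go test`.
        (["go", "test", "-v", "-timeout", "30m", "./..."], tests),
        (["terraform", "init", "-input=false"], infra),
        (["terraform", "apply", "-input=false", "-auto-approve"], infra),
    ]


def run_pipeline(repo_root: Path, dry_run: bool = True) -> list[str]:
    """Run each step, stopping at the first failure; return the command lines."""
    executed = []
    for command, cwd in pipeline_commands(repo_root):
        executed.append(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError if a step exits non-zero.
            subprocess.run(command, cwd=cwd, check=True)
    return executed


steps = run_pipeline(Path("."), dry_run=True)
```

Running the unit tests before the apply step means a failing test stops the script before any infrastructure is changed.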
devops organizational unit code repository
- Create component devops in compass.
- Create terraform module in devops/infrastructure folder to
- create AWS organizational unit with name devops in AWS organization examplebank.
- create AWS organization accounts for component, integrated, e2e, performance, production and dr environments of devops organizational unit.
- configure service control policies and IAM Identity Center.
- install infrastructure, such as AWS EKS, that is common to all components deployed to the devops organizational unit.
- Create unit tests in tests/unit folder for the terraform module.
- Create temporary shell script in devops/cicd_pipeline folder to
- install terraform and terratest.
- run the unit tests.
- run the terraform module.
- update devops/configuration files
- AWS control tower adds a lot of infrastructure whose code is not stored in terraform so we are not using AWS control tower.
- Temporary shell script will be replaced by Jenkins CI/CD pipeline when Jenkins component is installed in devops organizational unit.
Configuration
- Configuration should be stored in devops/configuration of examplebank organization, devops organizational unit, jenkins and other components.
- CI/CD pipeline should package these and push to artifact repository
- CI/CD pipeline should deploy these configurations to configuration management component like Hashicorp Vault during the deployment to respective environment.
- When the application starts, it should download this configuration, subject to the authentication and authorization set up by the CI/CD pipeline, and merge it in order of precedence: component configuration overrides organizational unit configuration, which overrides organization configuration.
- YAML files should be used to store the configuration because it needs dictionary and array data structures.
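The precedence of the three configuration levels can be sketched as a recursive merge. This is a minimal illustration, not the actual loader: the dictionaries stand in for already-parsed YAML files, the keys and values are made up, and the choice to replace arrays wholesale rather than concatenate them is an assumption.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where values from override win over values from base."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested dictionaries key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists are replaced wholesale (assumed behavior).
            merged[key] = deepcopy(value)
    return merged


def effective_config(organization: dict, organizational_unit: dict, component: dict) -> dict:
    """Organization is the lowest precedence, component the highest."""
    return deep_merge(deep_merge(organization, organizational_unit), component)


# Hypothetical configuration values, for illustration only.
org = {"region": "us-east-1", "logging": {"level": "INFO", "format": "json"}}
ou = {"logging": {"level": "WARN"}, "cluster": "devops-eks"}
component = {"logging": {"level": "DEBUG"}, "replicas": [1, 2]}

config = effective_config(org, ou, component)
```

Here config keeps the organization's logging format, takes the cluster from the organizational unit, and uses the component's logging level.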
Secrets
- Components should use a keystore and a truststore to authenticate and authorize communication with each other.
- For each environment, they should get their keystore and truststore from a secret management component like Hashicorp Vault, subject to the authentication and authorization set up by the CI/CD pipeline. They use their own CA certificate key from the keystore and the trusted CA certificate keys from the truststore to communicate with other components.
- The secret management component, such as Hashicorp Vault, is responsible for rotating the CA certificate key.
- Middleware like databases are also components.
- Passwords should only be used if keys are not supported, because keys give two-way authentication and authorization: the server checks that the client's CA certificate key is in its truststore, and the client checks that the server's CA certificate key is in its truststore. CA certificate keys are rotated by the same CA, and they are long and difficult to brute-force. After the initial handshake with the CA certificate keys, the mechanism internally uses short-lived session tokens.
- The security organizational unit is responsible for secret management components like Hashicorp Vault.
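The two-way check described above corresponds to mutual TLS. The sketch below uses Python's standard ssl module and makes several assumptions: the keystore and truststore are PEM files rather than Java keystores, the function names and parameters are hypothetical, and the file paths are optional so the contexts can be built without real certificates for illustration.

```python
import ssl
from typing import Optional


def server_context(cert: Optional[str] = None, key: Optional[str] = None,
                   trusted_ca: Optional[str] = None) -> ssl.SSLContext:
    """TLS server context that rejects clients without a trusted certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED  # the server demands a client certificate
    if cert and key:
        ctx.load_cert_chain(certfile=cert, keyfile=key)  # "keystore" (assumed PEM)
    if trusted_ca:
        ctx.load_verify_locations(cafile=trusted_ca)  # "truststore" (assumed PEM)
    return ctx


def client_context(cert: Optional[str] = None, key: Optional[str] = None,
                   trusted_ca: Optional[str] = None) -> ssl.SSLContext:
    """TLS client context that verifies the server and presents its own certificate."""
    # PROTOCOL_TLS_CLIENT enables CERT_REQUIRED and hostname checking by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if cert and key:
        ctx.load_cert_chain(certfile=cert, keyfile=key)
    if trusted_ca:
        ctx.load_verify_locations(cafile=trusted_ca)
    return ctx


# Built without certificate files here; a real component would pass the paths
# of the files it received from the secret management component.
server = server_context()
client = client_context()
```

With both contexts requiring verification, a handshake succeeds only when each side presents a certificate signed by a CA that the other side trusts.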
Code repository component code repository
- Create devops-github component code repository.
Artifact repository component code repository
- Create devops-nexusrm component code repository.
Configuration management and secrets management component code repository
- Create security-vault component code repository.
- The security organizational unit code repository and CI/CD pipeline need to be created first.
Identify code repository application
- github
Identify CI/CD pipeline application
- jenkins
CI/CD pipeline application infrastructure component
- Code repository
- Create devops-jenkins-infrastructure code repository
- This will use shell script or github actions to create Jenkins executables, archives, installers, container images, helm charts and virtual machine images for different operating systems and middlewares. It will install required plugins like configuration as code plugin as part of the image.
- This will use shell script or github actions to create Jenkins infrastructure and configure it using configuration as code
- CI/CD pipeline
- It will also create devops-jenkins-infrastructure CI/CD pipeline
Code repository application infrastructure component
- Code repository
- Create devops-github-infrastructure code repository
- CI/CD pipeline
- Update devops-jenkins-infrastructure code repository to create devops-github-infrastructure CI/CD pipeline
Identify ITSM with ITAM/CMDB/Component catalog application
- Jira service management (jsm) with Atlassian compass (component catalog)
- ITSM has all the features of a helpdesk and a service desk and more, so the organizational unit for the application is itsm.
- Atlassian compass
- Affected services in Jira service management are synced with Atlassian compass components
ITAM/CMDB/Component catalog application infrastructure component
- Code repository
- Update devops-github-infrastructure code repository to create itsm-compass-infrastructure code repository
- CI/CD pipeline
- Update devops-jenkins-infrastructure code repository to create itsm-compass-infrastructure CI/CD pipeline
- Add custom fields, such as organizational unit and application, under Compass settings -> custom fields.
Components code repository and CI/CD pipeline
- For each component added to Atlassian compass, devops-github-infrastructure and devops-jenkins-infrastructure CI/CD pipelines should create corresponding code repository and CI/CD pipeline.
- Events or webhooks