The main goal of this solution is to dramatically reduce the duration of the CI/CD process and make it scalable, so that we don't need to worry about it for a while.
Based on the generated ideas, we are proposing the following process:
- Restrict each deployment to one thing at a time. Never pile up unrelated changes into one deployment. If a feature cannot be deployed (e.g. a bug was found in the pre-prod environment), roll back the code so that others are unblocked and can deploy without the breaking change.
- The CI/CD process includes multiple steps. Reuse as much as possible from previous steps of the build to avoid repetitive work.
- Execute quality-related steps of the CI process in parallel with any activity that requires a human (review, manual QA, etc.). For example, when the tests and checks have already passed and there are no additional code changes, never execute them again for that change.
- Possibly deploy to the QA environment without waiting for the tests to complete. This requires the team to take responsibility for testing things in their local environments, to minimize the chance of a broken QA environment.
- Build the deployable package only once, then use that package to deploy into different environments.
- Introduce feature flags as a safeguard for anything we do.
- Polish the rollback process so that it is done in seconds with the push of a button.
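
Several of the points above (run checks once per change, build once and promote the same package, deploy to QA early, roll back quickly) can be illustrated with a small toy model. This is only a sketch under assumptions, not our actual pipeline: the `Pipeline` class, its method names, the `app:<commit>` image tag format, and the environment names are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    """Toy in-memory model of the proposed process (all names are hypothetical)."""
    passed_checks: set = field(default_factory=set)  # commits whose checks passed
    artifacts: dict = field(default_factory=dict)    # commit -> built image tag
    history: dict = field(default_factory=dict)      # environment -> deployed images
    check_runs: int = 0
    build_runs: int = 0

    def run_checks(self, commit: str) -> None:
        """Run tests and checks once per commit; later calls reuse the result."""
        if commit in self.passed_checks:
            return  # no code changes since the last green run, so skip
        self.check_runs += 1  # a real pipeline would run the test suite here
        self.passed_checks.add(commit)

    def build(self, commit: str) -> str:
        """Build the deployable package only once per commit."""
        if commit not in self.artifacts:
            self.build_runs += 1
            self.artifacts[commit] = f"app:{commit}"
        return self.artifacts[commit]

    def deploy(self, commit: str, environment: str, require_checks: bool = True) -> str:
        """Promote the already-built package to an environment."""
        if require_checks and commit not in self.passed_checks:
            raise RuntimeError(f"checks have not passed for {commit}")
        image = self.build(commit)
        self.history.setdefault(environment, []).append(image)
        return image

    def rollback(self, environment: str) -> str:
        """Return the environment to the previously deployed package."""
        deployed = self.history.get(environment, [])
        if len(deployed) < 2:
            raise ValueError(f"nothing to roll back to in {environment}")
        deployed.pop()       # drop the broken release
        return deployed[-1]  # the previous package is current again


pipeline = Pipeline()
pipeline.deploy("abc123", "qa", require_checks=False)  # QA does not wait for tests
pipeline.run_checks("abc123")
pipeline.run_checks("abc123")                          # skipped: same commit
pipeline.deploy("abc123", "prod")                      # reuses the QA package
```

In this sketch the rollback does not rebuild anything: it only points the environment back at a package that already exists, which is what makes a rollback in seconds plausible.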
Once the process change was implemented, we reduced the release time from 2.5 hours to roughly 12 minutes from the time the change is approved (that is, to exactly the time it takes to deploy an image to the cluster).