CTO’s Guide to Saving Cloud Costs

Jan 19, 2024

Technology is changing, and so is the way we do things. If you had asked me ten years ago how to build a new system to serve dynamic content, I would have advised you to get a hosting package. Five years ago, the same question would have led me to advise installing a virtualisation host and running several guest virtual machines on it. A year ago, I would have advised an EC2 instance with a DynamoDB backend, serving your content through an Application Load Balancer. Six months ago the advice would have been to use Lambda, and last month it would have been Lambda with Lambda layers to optimize execution time.

So why am I writing all this? Technology is clearly advancing much faster than it used to. Ten years ago, IT departments would stick with the same technology for at least 3–5 years; now that gap has shrunk to a couple of months.

The age of making big bang decisions on technologies has already passed.

We are still in the middle of the transformation to the cloud. On-premises workloads are being migrated to the cloud or, in the worst-case scenario, connected to it via Customer Gateways (BGP-capable hardware devices) and Virtual Private Gateways.

So, as executives leading organisations that are expected to cut costs and provide a better service, how do we close the gap between our teams and fast-changing technology? How do we make sure we use the right technology, hardware, software and licenses, and keep replacing these pieces for years to come, without the changes affecting business continuity?

It is time to invent a new name for this, which I call CCO: Continuous Cost Optimisation. It is led by a group of people drawn from different departments whose job is to find, track and replace those technologies while making sure stakeholder management and visibility are handled correctly. I will call this group the CCOS, the Continuous Cost Optimisation Squad.

The group should be made up of the best (yes, the best) of each department:

DevOps or Site Reliability Engineering members: They lead the whole group. Sitting between infrastructure, development and the business, DevOps engineers have, in my view, the best technical visibility for this role.
Development members: They should understand the technical side of all development across the company. They are usually the people with deep knowledge of several programming languages and the ones who follow the latest trends.
QA / Release Management members: Technical folks are often not the best communicators. QA and release managers, by the nature of their jobs, have the necessary contacts across the company and can clearly communicate upcoming changes and service disruptions, getting input from the business side as needed.

So we now have an autonomous team with deep technical knowledge of infrastructure, automation and software development, as well as testing and communication.

Good. What they still lack is the responsibility and the authority to make changes. Giving them responsibility for finding cost savings, making those savings visible to the other teams and actively leading the change will let them move forward.

Making responsibilities VERY CLEAR from the beginning is one of the most important parts of creating autonomous, respected teams that can get things done without being stopped by bureaucracy. Once the team is formed, it is our duty as executives to really stress the importance of cost savings as well as the duties of the cost savings team. Once you have buy-in from the rest of the teams, put everything in writing and send it out. Why? Things change, and so do priorities, but cost savings shouldn’t.

Armed with experts, responsibility and authority, the group can now focus on the following:

Absolutely no bureaucracy: This is a hands-on team that gets things done. No unnecessary meetings with anyone.

Visibility & Stakeholder Management: The group is responsible for managing stakeholders and providing visibility. This is the second most important duty of this team. Sending out emails is not enough: the team should deliberately make sure that every team affected by a change knows the date and time, that all other teams have heard about the change too, and that the technical executive team has full visibility.

Precision planning: This team cannot succeed without planning carefully down to the last detail. Once an optimisation has been identified, the team puts together a plan that takes into account not only the infrastructure being changed but every other application and infrastructure component the change will affect.

The search for cost savings

Well, you’ve created the team and given it the responsibility and authority to act. Time to let it work.

Tag it all

The team should segment your infrastructure by area, department and product. The best way to do this is to start tagging and not stop until every bit of your infrastructure is tagged.
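To make "not stopping until everything is tagged" measurable, the team needs a report of what is still missing. Below is a minimal Python sketch of such a compliance check. The required tag keys (area, department, product) and the function name are hypothetical, and the input is assumed to have the shape of the ResourceTagMappingList returned by the AWS Resource Groups Tagging API; any inventory you can reshape into that form would work.

```python
from __future__ import annotations

# Hypothetical tag policy: every resource must name the area, department
# and product it belongs to. Adjust the keys to your own segmentation.
REQUIRED_TAGS = ("area", "department", "product")


def missing_tags(resources: list[dict],
                 required: tuple[str, ...] = REQUIRED_TAGS) -> dict[str, list[str]]:
    """Map each non-compliant resource ARN to the required tag keys it lacks.

    Each resource is assumed to look like one entry of the Resource Groups
    Tagging API's ResourceTagMappingList:
    {"ResourceARN": "...", "Tags": [{"Key": "...", "Value": "..."}]}.
    """
    report: dict[str, list[str]] = {}
    for resource in resources:
        # A tag with an empty value does not help segment costs, so treat it as missing.
        present = {tag["Key"] for tag in resource.get("Tags", []) if tag.get("Value")}
        missing = [key for key in required if key not in present]
        if missing:
            report[resource["ResourceARN"]] = missing
    return report


# Illustrative inventory (made-up account ID and resource IDs).
inventory = [
    {"ResourceARN": "arn:aws:ec2:eu-west-1:111122223333:instance/i-0abc",
     "Tags": [{"Key": "area", "Value": "checkout"},
              {"Key": "department", "Value": "payments"},
              {"Key": "product", "Value": "web"}]},
    {"ResourceARN": "arn:aws:ec2:eu-west-1:111122223333:volume/vol-0def",
     "Tags": [{"Key": "department", "Value": "payments"}]},
]
report = missing_tags(inventory)
print(report)  # {'arn:aws:ec2:eu-west-1:111122223333:volume/vol-0def': ['area', 'product']}
```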

Cost Aware Development Teams

Use AWS cost reports to build dashboards so that each team manager or director can easily check their team’s costs. This is the first step towards creating cost-aware development teams.
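A per-team dashboard starts with per-team totals. The Python sketch below assumes you query Cost Explorer through boto3 (boto3.client("ce").get_cost_and_usage with Granularity="MONTHLY", Metrics=["UnblendedCost"] and GroupBy=[{"Type": "TAG", "Key": "department"}]) and sums the response per team. The department tag key, the function name and the numbers are illustrative, and the "key$value" group-key format is an assumption worth checking against your own responses.

```python
from __future__ import annotations

from collections import defaultdict


def cost_by_team(response: dict, metric: str = "UnblendedCost") -> dict[str, float]:
    """Sum a GetCostAndUsage response grouped by a team tag, across all periods.

    Assumes each group key looks like "department$payments"; an empty value
    after the "$" is treated as untagged spend.
    """
    totals: dict[str, float] = defaultdict(float)
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            _, _, team = group["Keys"][0].partition("$")
            totals[team or "(untagged)"] += float(group["Metrics"][metric]["Amount"])
    return dict(totals)


# Two monthly periods with illustrative numbers.
sample = {"ResultsByTime": [
    {"Groups": [
        {"Keys": ["department$payments"],
         "Metrics": {"UnblendedCost": {"Amount": "1200.50", "Unit": "USD"}}},
        {"Keys": ["department$"],
         "Metrics": {"UnblendedCost": {"Amount": "300.00", "Unit": "USD"}}},
    ]},
    {"Groups": [
        {"Keys": ["department$payments"],
         "Metrics": {"UnblendedCost": {"Amount": "999.50", "Unit": "USD"}}},
    ]},
]}
print(cost_by_team(sample))  # {'payments': 2200.0, '(untagged)': 300.0}
```

Keeping untagged spend as its own line on the dashboard also shows how far the tagging effort still has to go.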

Reserve all stable workloads

Have the teams plan a year ahead and pass their proposals to the Continuous Cost Optimisation team, which then evaluates whether the resources are worth reserving. A tip: if you are sure the resources will stay in the same instance family (e.g. m4.*) and run for more than 9 months a year, they are worth reserving. I use a standard format to collect this information from my teams; you can create something very similar.
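The nine-month rule of thumb falls out of simple arithmetic: a reservation is paid for every hour of its term, so it only beats on-demand when the instance runs for a larger fraction of the year than the reserved price is of the on-demand price. With a 25% discount, that break-even point is 0.75 of a year, or nine months. Here is a minimal Python sketch with hypothetical prices and function names; real discounts depend on the instance type, term and payment option.

```python
HOURS_PER_YEAR = 365 * 24  # 8760


def break_even_utilisation(on_demand_hourly: float, ri_upfront: float,
                           ri_hourly: float) -> float:
    """Fraction of a year an instance must run for a 1-year reservation to pay off.

    The reservation is billed for every hour of the term whether the instance
    runs or not; on-demand is billed only for the hours it actually runs.
    """
    ri_cost_per_year = ri_upfront + ri_hourly * HOURS_PER_YEAR
    on_demand_cost_per_year = on_demand_hourly * HOURS_PER_YEAR
    return ri_cost_per_year / on_demand_cost_per_year


def worth_reserving(expected_months: float, on_demand_hourly: float,
                    ri_upfront: float = 0.0, ri_hourly: float = 0.0) -> bool:
    """True if the expected months of usage exceed the break-even point."""
    return expected_months / 12 > break_even_utilisation(on_demand_hourly, ri_upfront, ri_hourly)


# Hypothetical prices: $0.10/h on demand vs. $0.075/h for a no-upfront
# 1-year reservation, i.e. a 25% discount.
print(round(break_even_utilisation(0.10, 0.0, 0.075) * 12, 2))  # 9.0
print(worth_reserving(10, 0.10, ri_hourly=0.075))  # True
print(worth_reserving(8, 0.10, ri_hourly=0.075))   # False
```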

Spot instances for all

The Continuous Cost Optimisation team needs to move anything that cannot be reserved, and any EC2 usage that fluctuates, to Spot Instances, provided the workload can tolerate being interrupted at short notice. You can run Spot Instances directly, or run your Docker containers on Spot capacity with Amazon ECS. You can save up to 90% on EC2 costs.
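To estimate what moving part of a fleet to Spot is worth before committing, a back-of-the-envelope model is enough. The Python sketch below assumes flat hourly prices and that a given share of the instances runs on Spot all month; the prices and function names are hypothetical, and real Spot prices vary over time, by instance type and by Availability Zone.

```python
HOURS_PER_MONTH = 730  # the average month used by the AWS Pricing Calculator


def fleet_monthly_cost(instances: int, on_demand_hourly: float,
                       spot_hourly: float, spot_share: float) -> float:
    """Monthly cost of a fleet where `spot_share` (0..1) of the instances run on Spot."""
    spot_instances = instances * spot_share
    on_demand_instances = instances - spot_instances
    return HOURS_PER_MONTH * (spot_instances * spot_hourly
                              + on_demand_instances * on_demand_hourly)


def spot_savings(instances: int, on_demand_hourly: float,
                 spot_hourly: float, spot_share: float) -> float:
    """Fraction saved compared with running the whole fleet on demand."""
    all_on_demand = fleet_monthly_cost(instances, on_demand_hourly, spot_hourly, 0.0)
    mixed = fleet_monthly_cost(instances, on_demand_hourly, spot_hourly, spot_share)
    return 1 - mixed / all_on_demand


# Hypothetical prices: $0.10/h on demand, $0.03/h on Spot (a 70% discount),
# with 8 of 10 instances moved to Spot.
print(f"{spot_savings(10, 0.10, 0.03, 0.8):.0%}")  # 56%
```

Even with a deep Spot discount, the overall saving is capped by how much of the fleet can actually move, which is why the team should push for as large a Spot share as the workloads allow.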

Destroy and Recreate

One of the biggest advantages of the cloud is being able to describe your infrastructure as code with CloudFormation templates. Using these templates, you can delete and re-create your resources at will. If your teams need a staging environment only 15 days a month, why keep it running all month? Have your Cost Optimisation team work on the templates, and give your development directors the power and responsibility to shut down their environments while they are not in use.
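Shutting an environment down on the days it is not needed comes down to two decisions: on which days to create the stack and on which days to delete it. This Python sketch derives those days from the set of days the environment is needed and applies each action through a boto3 CloudFormation client (create_stack and delete_stack). The function names, stack name and template are hypothetical placeholders.

```python
from __future__ import annotations

from typing import Iterator


def transitions(needed_days: set[int], days_in_month: int) -> Iterator[tuple[int, str]]:
    """Yield (day, action) for each day the environment must be created or deleted."""
    up = False
    for day in range(1, days_in_month + 1):
        wanted = day in needed_days
        if wanted and not up:
            yield day, "create"
        elif up and not wanted:
            yield day, "delete"
        up = wanted


def apply_action(cfn, stack_name: str, template_body: str, action: str) -> None:
    """Run one action against a CloudFormation client, e.g. boto3.client("cloudformation").

    Templates that create IAM resources also need Capabilities=["CAPABILITY_IAM"].
    Deleting a stack deletes its resources unless their DeletionPolicy says
    otherwise, so back up anything stateful first.
    """
    if action == "create":
        cfn.create_stack(StackName=stack_name, TemplateBody=template_body)
    elif action == "delete":
        cfn.delete_stack(StackName=stack_name)
    else:
        raise ValueError(f"unknown action: {action}")


# Staging is needed on days 1-15 of a 30-day month: create it on day 1,
# delete it on day 16, and pay for roughly half the month.
print(list(transitions(set(range(1, 16)), 30)))  # [(1, 'create'), (16, 'delete')]
```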

Scheduled resources

Do your teams only use the dev environments between 08.00 and 19.00 every day? Then why not shut those instances down outside working hours? Set up a scheduled Amazon EventBridge (formerly CloudWatch Events) rule that triggers a Lambda function to stop them. Here is what you can save:

Schedule   Start time   Stop time   Hours off per week   Savings
Mon-Sun    8.00 a.m.    8.00 p.m.   84                   50%
Mon-Sun    9.00 a.m.    5.00 p.m.   112                  67%
Mon-Sat    8.00 a.m.    8.00 p.m.   96                   57%
Mon-Sat    9.00 a.m.    5.00 p.m.   120                  71%
Mon-Fri    8.00 a.m.    8.00 p.m.   108                  64%
Mon-Fri    9.00 a.m.    5.00 p.m.   128                  76%
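The percentages in the table are simply the hours an instance is switched off divided by the 168 hours in a week, and the shutdown itself can be a small Lambda function. The Python sketch below shows both. It assumes the instances to stop carry a hypothetical Schedule=office-hours tag and that the function is triggered by a scheduled EventBridge rule; a mirror-image function calling start_instances would bring them back in the morning.

```python
from __future__ import annotations

HOURS_PER_WEEK = 7 * 24  # 168


def weekly_savings(days_on: int, start_hour: int, stop_hour: int) -> tuple[int, float]:
    """Hours switched off per week, and that as a fraction of the week."""
    hours_on = days_on * (stop_hour - start_hour)
    hours_off = HOURS_PER_WEEK - hours_on
    return hours_off, hours_off / HOURS_PER_WEEK


def running_instance_ids(ec2, tag_key: str = "Schedule",
                         tag_value: str = "office-hours") -> list[str]:
    """IDs of running instances carrying the (hypothetical) schedule tag."""
    pages = ec2.get_paginator("describe_instances").paginate(Filters=[
        {"Name": f"tag:{tag_key}", "Values": [tag_value]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ])
    return [instance["InstanceId"]
            for page in pages
            for reservation in page["Reservations"]
            for instance in reservation["Instances"]]


def lambda_handler(event, context, ec2=None):
    """Stop the tagged instances. Meant to be triggered by a scheduled EventBridge
    rule, e.g. cron(0 19 ? * MON-FRI *), which EventBridge rules evaluate in UTC."""
    if ec2 is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        ec2 = boto3.client("ec2")
    ids = running_instance_ids(ec2)
    if ids:
        ec2.stop_instances(InstanceIds=ids)
    return {"stopped": ids}


hours, share = weekly_savings(5, 9, 17)  # Mon-Fri, 9.00 a.m. to 5.00 p.m.
print(hours, f"{share:.0%}")  # 128 76%
```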

Hidden unused resources

When EC2 instances are terminated, any attached EBS volumes that are not set to be deleted on termination, as well as all of their snapshots, are kept in case they hold critical data. This can become a significant expense if not controlled. Have your teams periodically search for these and delete the ones that are no longer needed. The same goes for unused Elastic IPs and load balancers (ELBs).
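A periodic search for leftovers can start from the EC2 describe_volumes and describe_addresses responses. The Python sketch below only builds a review list of unattached EBS volumes and unassociated Elastic IPs; deleting anything remains a deliberate, separate step. The function names are hypothetical, and the sample responses are made up and trimmed to the fields used.

```python
from __future__ import annotations


def unattached_volumes(describe_volumes_response: dict) -> list[tuple[str, int]]:
    """(VolumeId, size in GiB) for EBS volumes not attached to any instance.

    A volume in the "available" state is not attached. With boto3 the same list
    can be requested server-side via
    ec2.describe_volumes(Filters=[{"Name": "status", "Values": ["available"]}]).
    """
    return [(v["VolumeId"], v["Size"])
            for v in describe_volumes_response["Volumes"]
            if v["State"] == "available"]


def idle_elastic_ips(describe_addresses_response: dict) -> list[str]:
    """Public IPs of Elastic IPs that are not associated with anything."""
    return [a["PublicIp"]
            for a in describe_addresses_response["Addresses"]
            if "AssociationId" not in a]


# Illustrative responses trimmed to the fields used above.
volumes = {"Volumes": [
    {"VolumeId": "vol-0aaa", "Size": 100, "State": "in-use"},
    {"VolumeId": "vol-0bbb", "Size": 500, "State": "available"},
]}
addresses = {"Addresses": [
    {"PublicIp": "203.0.113.10", "AllocationId": "eipalloc-1", "AssociationId": "eipassoc-1"},
    {"PublicIp": "203.0.113.20", "AllocationId": "eipalloc-2"},
]}
print(unattached_volumes(volumes))  # [('vol-0bbb', 500)]
print(idle_elastic_ips(addresses))  # ['203.0.113.20']
```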

Cost optimisation is everybody’s responsibility. But it is your duty as an executive to put the necessary processes and teams in place and to foster cost-aware teams.


Written by Özgür Özdemircili
