Fail Culture: The way to progress
It is exciting to be part of a gigantic project. You are ready to handle the tasks: your laptop is clean, with only one folder on your desktop named after the project, and you start creating your Outlook rules to handle incoming requests and tasks.
The day starts. The first email arrives: someone from the platform team is warning that the QA environment is not stable and that it is a blocking issue. You join the call; it is time to put your experience to work and find where the problem may be.
Then all hell breaks loose. You learn that the people on the call are not actually the people who need to be there. They are mere proxies who cannot answer any questions, nor can they make a single click without their bosses' authorization. You have not come here to give up! You put your jacket on and, with a big smile, go to see the manager. Your smile fades when he starts telling you that six other teams are also waiting, and that his boss needs to talk to another director to order a new server that is necessary for the releases.
There are 10 urgent deployments in the queue, a meeting between the directors is still pending, and the QA team is complaining that they are losing time and money.
What would you do? Or better, what would you do to prevent this from happening?
We are always hoping for that one day when we get up, go to work and are able to finish our pending tasks, such as going through new technologies with the team to improve the speed of the web app, or reading the company's detailed application manuals to gain a better, in-depth understanding of how the whole ecosystem works. But this is normally not the case.
Let’s face it: incidents and problems are a big part of our working lives, and we have a love-hate relationship with them. They are bad in the sense that they can stop the production environment, hurt the productivity of our team or, worst of all, directly affect our clients and cause damage to the company we work for.
But they are also the ones that give you the chance to see how your whole ecosystem works, where the weak points are and what can be improved. This information, which comes at a price, is a valuable asset if you really get to use it. Post-mortem and retrospective sessions following a big incident are the times when you are able to see what you (yes, firstly you, and only then your team) failed to see. Converting this information into action is the second part, where you decide with your team on the procedure (I am using the word in its lightest sense) to follow when this happens again (yes, it will happen again). Spreading this information between the teams is another important stage of resolution and prevention: lessons learned should be shared with the rest of the teams.
- Don’t just put out the fire and forget it; find out what caused the fire in the first place. Who knows, one day you may stop a big one from happening.
- Do not become the bottleneck by trying to do every little thing yourself. You are not that clever, nor do you have 25 hours in a day.
- Do delegate, and delegate for real. Do not delegate meetings; delegate power.
- Do not create puppets, create leaders and people capable of responding and taking actions without you. Then ask for explanations.
- Surround your team with people who are responsible and who know how to deal with different situations.
- Allow yourself and your team to fail. We are humans, not chatbots (yet). It is not the failure that matters; it is how you get up and what you have learned.
Now that you have all that you need, stop crying and get to work ;)