Building Azure Costs was and is a long journey implementing a scaling and growing software as a service application. The major goal of all design and architecture decisions is that it scales infinitely. Successful marketing campaigns or great new features may turn the service down. Thanks to the Microsoft Azure platform and their managed platform as a service offerings, it was possible to invent this kind of solution. This blog article series has the intention to give an inside look into this journey and highlights some learnings we had on our way.
One of the most important and earliest steps to convert a prospect to a user of a payed plan or a free trial is the Sign-Up or Log-In process. When this process is broken customers can’t check out your service and you will lose the option to convert a prospect to a customers. If you think this could never happen that a basic process like Sign-Up can be broken ever, we at Azure Costs experienced another situation. As soon the core processes are not monitored very well, it’s not the question if they will fail, it’s only the question when they fail. Huge platforms like GitHub or Azure as self will recognise that by just watching 15 minutes on the system. If no Sign-Up happens something must be wrong. When you start with a SaaS application your prospect pressure is probably not that high. There are several root causes which needs different counter actions to cover them, bellow some examples are highlighted:
Your app is supporting external identity provider:
Many SaaS applications also Azure Costs are supporting an integration in external identity providers like Azure Active Directory, Microsoft Accounts or Google Accounts. Even GitHub Accounts are very popular when you more focused on the open source world or when your product become more technically. In an optimal world you would get an error from the identity provider which can be tracked from your APM service like Stackify, NewRelic or Airbrake. But more often we was seeing the situation that the prospect stuck in the inner process of the identity provider. Because of that we invented a system based on BrowserStack to emulate at least one times every hour a couple Log-In and Sign-Up scenarios as it would be done from the prospect as self. This gives us the proof that our authorisation system works as expected.
Implement:
Automated Login based on web automation tools like BrowserStack or Sauce Labs
Your business logic throws exceptions because of breaking changes:
In the case your business logic throws exceptions normally your prospect will get an error page which does not show the internals of your application. It’s for sure a bad idea to highlight the stack trace directly at the prospects face. Beside it does not look nice, it is a security risk because an attacker can learn a lot of your service from the stack trace you would expose. Tracking exception means implementing a monitoring and an APM solution. Microsoft is offering a service called Azure Insights which should be reviewed because it comes as part of the Azure Cloud. More powerful services we are using are Stackify and Airbrake. These services ensure that our staff gets a push notification for every single exception in our code. It’s even not expensive because the simplest plans are starting by around 15$ per month. From our perspective, this couple cups of coffee are well invested money to keep your service healthy. Don’t forget covering all your components, especially background worker and WebJobs are often forgotten because an extra mile is necessary.
Implement:
Exception tracking based on APM services like Stackify and Airbrake.
Your Data-Store has performance limitations:
Another challenge is often that managed services in Microsoft Azure but also in the Amazon Web Services has technical limits. Microsoft describes every limit in this document here. There are two main counter actions to handle this and preventing your prospects for Sign-Up or Log-In. The main counter action has something to do with architecture decisions. When you design your software be aware of these limits and probably invest more in micro services which are using separated storage backends. This would decrease the pressure from a single monolithic data-store. More often modern APM systems are able to monitor performance KPIs of your used data-store and this measurement should trigger alerts when you hit a KPI.
Implement:
Invest in micro services and implement performance KPI monitoring.
When Azure Costs was broken the first time a couple years ago we realised more focus on all of these categories was necessary and since we implemented Exception Tracking for backend, workers and frontend, performance monitoring for our data stores and web automation for external login providers we never lost a prospect in the Sign-Up process anymore.
If you are interested seeing this in action just visit azure-costs.com and try to Sign-Up. We are interested of your personal experience so please use the comment option in this blog to give us more hints in which areas you are investing to increase the service quality of your Software as a Service application.