Kubernetes: Implement Graceful Docker Container Shutdown

Containers, together with Kubernetes as the container orchestrator, make it much easier to run, operate and scale complex applications. Long-running tasks should be executed in worker services that are loosely coupled to the app via reliable messaging and queues. There are several technologies on the market, but in the end the patterns are very similar.


Architecture: A process registers on a message queue via polling, web sockets or some other technology. This pattern blocks the process both while it is idle and waiting for work and while a message is being processed.

Continuous deployment, continuous integration and the immutable infrastructure approach dramatically increase the number of deployments in a production environment. This makes it very likely that Kubernetes, as the container orchestrator, will shut down a container in the middle of a long-running task by sending a SIGTERM signal. If the worker does not handle that signal, or needs longer than the termination grace period (30 seconds by default) before Kubernetes follows up with SIGKILL, this ends in a hard shutdown which simply interrupts the current work and leads to a lot of problems. It can also happen that the provider of your managed Kubernetes cluster reorganizes the infrastructure without knowing the usage pattern of your application. The result is the same hard shutdown.

What does a graceful shutdown mean?
A graceful shutdown means that the hosting infrastructure announces a stop request and gives the container the chance to interrupt or finish pending work. As long as the container has not confirmed the shutdown, the infrastructure waits before updating the container image or moving the container to another host.

How to implement a graceful shutdown?
Kubernetes does not know much about the usage pattern of the containers, but it supports a preStop hook which can be used for a graceful shutdown. The preStop hook is executed inside the container and blocks the termination process in Kubernetes until the hook returns or the grace period is over, whichever comes first. Implementing a graceful shutdown then comes down to configuring a preStop hook which blocks as long as the container is doing its work. The hook should also notify the container about the pending shutdown, because Kubernetes itself only waits for the grace period. The grace period should be configured in the pod template of the deployment (terminationGracePeriodSeconds) to a value which fits the usage pattern of your application.
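Independently of the hook, the worker itself has to be able to finish its current message once a stop is requested. The following is a minimal sketch of that idea, written in Python purely for illustration (the sample discussed in this post is .NET based). The names `fetch_message` and `handle_message` are hypothetical placeholders for your queue client and your business logic.

```python
import signal
import threading

# Set as soon as somebody (e.g. Kubernetes via SIGTERM) asks the worker to stop.
stop_requested = threading.Event()


def request_stop(signum=None, frame=None):
    """Signal handler: only remember the request, never exit right away."""
    stop_requested.set()


def install_signal_handler():
    """Route SIGTERM to request_stop; call this once from the main thread."""
    signal.signal(signal.SIGTERM, request_stop)


def run_worker(fetch_message, handle_message, idle_wait_seconds=1.0):
    """Process messages until a stop is requested.

    fetch_message() is assumed to return the next message, or None when
    the queue is empty. The stop flag is only checked between messages,
    so a message that has been started is always finished.
    Returns the number of processed messages.
    """
    processed = 0
    while not stop_requested.is_set():
        message = fetch_message()
        if message is None:
            # Idle: sleep, but wake up immediately when a stop is requested.
            stop_requested.wait(idle_wait_seconds)
            continue
        handle_message(message)
        processed += 1
    return processed
```

The important design choice is that the signal handler does no work itself. It only flips a flag, and the loop decides when it is safe to leave.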


Sample Based on the Azure Worker CoreHelpers
Based on the Azure Worker CoreHelpers, a simple worker can be implemented which establishes a file-based preStop hook. Any other IPC technology can be used instead if required.
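To make the file-based handshake more concrete, here is a rough sketch in Python. It is not the API of the Azure Worker CoreHelpers. The file names and function names are assumptions made up for this example: the hook writes a "stop requested" marker and then blocks until the worker has written a "stop confirmed" marker or a timeout expires.

```python
import time
from pathlib import Path

# Hypothetical marker file names, chosen for this sketch only.
STOP_REQUEST = "stop-requested"    # written by the preStop hook
STOP_CONFIRMED = "stop-confirmed"  # written by the worker


def stop_was_requested(directory):
    """Worker side: check between two messages whether the hook asked to stop."""
    return (Path(directory) / STOP_REQUEST).exists()


def confirm_stop(directory):
    """Worker side: tell the hook that all pending work is finished."""
    (Path(directory) / STOP_CONFIRMED).touch()


def pre_stop_hook(directory, timeout_seconds, poll_seconds=0.5):
    """Hook side: request a stop and block until the worker confirms.

    Returns True when the worker confirmed in time, False on timeout.
    """
    base = Path(directory)
    base.mkdir(parents=True, exist_ok=True)
    (base / STOP_REQUEST).touch()
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if (base / STOP_CONFIRMED).exists():
            return True
        time.sleep(poll_seconds)
    return (base / STOP_CONFIRMED).exists()
```

In a deployment, a small script calling `pre_stop_hook` would be configured as the exec command of the container's `lifecycle.preStop` handler, and `terminationGracePeriodSeconds` would have to be larger than the timeout passed to the hook.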

Visit the project

Establish soft and graceful shutdowns wherever you can to add more reliability to the part of the service your customers will never see. If you have other ideas regarding graceful shutdowns or workers in containers, just drop a comment below.


Kubernetes vs. Azure App Services WebJobs – Why I switched

When Microsoft launched Azure, the team introduced a service called Azure WebSites early on. This service was an amazing step towards making a very complex topic as easy as possible. Even today we are still looking for something similar in Amazon Web Services, without luck.


Nowadays the service is called Azure App Services and can be called the backbone of Microsoft’s strategy to make the life of Software as a Service developers easier. Azure App Services is definitely the solution for headache-free operation of web applications, because you get so many important features for free, e.g.:

  • Load Balancing
  • SSL Certificate Management
  • Patch Management and Operating System Updates
  • Deployment Slots

Besides that, an Azure App Service is fault tolerant even if you operate just a single instance, and it allows you to scale out based on many indicators like CPU, RAM or queue length. Adding WebJobs to App Services was the missing piece to move the workload of a web application into the background, e.g. the processing of uploaded images or data reports which need to be prepared.

As soon as a Software as a Service offering grows (e.g. Azure Costs), the background work is something you may start to worry about when using Azure App Services. The scale-out features look great for small apps but have their pitfalls: you can only scale out on instance level and not on job level, so when you want to increase the number of workers in App Services you need to increase the instance count. The other challenge we had is that the WebJob shares CPU and RAM with the website process, which can influence the perceived performance of your application. This challenge can be addressed by operating several App Service plans in parallel and letting the website run in a different plan than the background worker.

Once you are at this point, the cost perspective comes into play. Operating an App Service plan which just runs WebJobs works, but in reality you don’t need all the nice website features and normally you want more granular ways to scale out. We at Azure Costs started searching for an alternative which also runs in Azure, because we would like to keep the traffic within one data center and keep an eye on these costs as well. During this research we found that container technology could be very helpful, because it allows us to spin up another 20 containers just to scale out one specific area.


Microsoft offers AKS (Azure Kubernetes Service), which gives you a fully managed Kubernetes cluster running on Azure without the pain of operating the control plane. This makes it very attractive even for smaller clusters consisting of 2 to 5 servers. Let’s review whether, and if so how, AKS can cover the new requirements.

Requirement 1: Background Workers can’t influence the Web Site Host directly
We decided to keep the website on an Azure App Service plan and use all its features like SSL certificates, instance-based scale-out driven by CPU and RAM, and load balancing. So by design the Kubernetes cluster runs on different hosts than the website, and the two never influence each other directly.

Requirement 2: Scale-Out is possible on a per job level
Kubernetes relies on Docker containers, which means that what was a WebJob before now becomes a container, and it is easy to scale out the containers as part of a deployment definition. Kubernetes also has native support for scaling out the underlying servers, which are organized in node pools in Azure. So the system becomes fully flexible and can breathe as you need.

Requirement 3: Dedicated Resource Allocation for Job-Workers is possible
An Azure App Service plan is fully shared, which means the jobs fight with the web server for resources like RAM and CPU. It is not even possible to reserve a minimum share of CPU for a dedicated worker, so the CPU can easily be overloaded when a lot of background work happens. Kubernetes follows a different concept: resource requests and limits per container, together with strict Quality of Service (QoS) classes, avoid this problem and make it manageable.

Requirement 4: Patch-Management comes for free
Patch management is somewhat the downside of AKS, because you are operating virtual machines in the cloud and it is up to you to trigger updates and so on. We decided to follow the idea of immutable infrastructure, which means that when we need an update we just re-deploy the whole cluster and remove the old one. This gives us, for free, everything Microsoft invests in its virtual machine image gallery.

In the end we decided that a good structure is to move all background work into a growing Kubernetes cluster and to dockerize our whole background logic. Since the change we have full control over resource allocation and can easily scale out on a per-job level: when more people request on-demand reports, which should not block the IIS threads, the system scales up this group of workers. Overnight, when we import tons of data, the system just as easily scales up that kind of worker.


A look at the costs is positive as well. As Microsoft does not charge for cluster management, you just pay for the virtual machines. This normally means machines on premium storage for approx. 90% of the price of the corresponding App Service plan, and the machines can normally have double the RAM the App Service plan offers.

I can recommend the combination of Azure App Service plans for your website and web services and a managed Kubernetes cluster, known as AKS, for all the work behind the scenes. What do you think about this architecture? Did I miss something? Do you follow a different approach?

2018 will make cloud spending optimization more efficient and easier

Cloud Costs

What a rush, 2017 is over and Azure Costs, a growing cost management and optimization platform, delivered tons of great features and improvements.
We made our support for Cloud Service Providers available and now allow every CSP to implement complex N-tier models and billing portal capabilities. This solution gives every customer cloud-vendor-independent transparency of cloud spending and makes it possible to tap optimization potential.

We are very excited to also start the new year with a firework of great features and functional enhancements:

Amazon Web Services availability – Welcome to Cloud Costs
Focusing on just a single cloud provider is like having only one single data center without any redundancy and a big vendor lock-in. Customers are focusing more and more on a virtual data center strategy to take care of high availability, fault tolerance and disaster recovery.

We are very happy to support these activities by offering the integration of…


Azure Costs: Revised Spending Analytics Engine for Cloud Solution Providers

Today we’re very excited to announce the availability of the revised spending analytics engine for cloud solution providers. All of the enhancements focus on more accuracy and productivity, when working with your customers on a daily basis.

Since Azure Costs supports direct cloud solution providers, also called CSP Tier 1, we work together with many different CSPs around the globe. The analytics engine in Azure Costs is one of the most important components. It ensures that the spending data is aggregated well and that all processes, like scaling up a resource, are outlined and calculated correctly.


As a service in Microsoft Azure consists of many parts, we decided to focus the development of the analytics engine on understanding the different meters correctly. This gives us the possibility to invest more in predictions and recommendations based on machine learning and artificial intelligence. Complex situations in particular, like a scale-out event, can now be visualised very easily.

[Screenshot: Azure Costs portal, captured 2017-10-06]

The new engine will be rolled out to all customers in several waves, and after an opt-in phase it will become our standard engine. For now every CSP needs to switch directly in the data view to the data generated with the new engine by selecting the action shown above. In addition, the spending dashboard of every single customer can be moved to the new engine manually as shown below:

[Screenshot: Azure Costs portal, captured 2017-10-06]

This prevents us from accidentally disturbing the customers of a cloud solution provider.

Interested in the new feature?
Getting started with Azure Costs for CSPs is very easy: just visit our dedicated portal for Cloud Solution Providers and enroll in the CSP program as described above. To become part of the public preview of the CSP support, an existing enterprise plan is required.

Any questions, wishes or ideas? Try our feedback portal or drop a mail to help@azure-costs.com.

Software as a Service – Never break your sign-up process

Building Azure Costs was and is a long journey of implementing a scaling and growing Software as a Service application. The major goal of all design and architecture decisions is that it scales without limits, because otherwise a successful marketing campaign or a great new feature may bring the service down. Thanks to the Microsoft Azure platform and its managed Platform as a Service offerings, it was possible to build this kind of solution. This blog article series is intended to give an inside look into this journey and highlights some lessons we learned on our way.


One of the most important and earliest steps to convert a prospect into a user of a paid plan or a free trial is the sign-up or log-in process. When this process is broken, customers can’t check out your service and you lose the chance to convert a prospect into a customer. If you think a basic process like sign-up can never break, we at Azure Costs experienced otherwise. As soon as the core processes are not monitored very well, it is not a question of whether they will fail, only of when they will fail. Huge platforms like GitHub or Azure itself would notice this after watching the system for just 15 minutes: if no sign-up happens, something must be wrong. When you are starting out with a SaaS application, the flow of prospects is probably not that high, so silence alone is not a reliable signal. There are several root causes which need different countermeasures to cover them. Some examples are highlighted below:

Your app supports external identity providers:
Many SaaS applications, including Azure Costs, support integration with external identity providers like Azure Active Directory, Microsoft Accounts or Google Accounts. Even GitHub accounts are very popular when you are more focused on the open source world or when your product becomes more technical. In an ideal world you would get an error from the identity provider which can be tracked by your APM service like Stackify, NewRelic or Airbrake. But more often we saw the situation that the prospect got stuck in the inner process of the identity provider. Because of that we built a system based on BrowserStack which emulates a couple of log-in and sign-up scenarios at least once every hour, just as the prospects themselves would do it. This gives us proof that our authentication system works as expected.

Automated Login based on web automation tools like BrowserStack or Sauce Labs 
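The scheduling side of such a check can be sketched independently of the browser automation itself. The following Python sketch is only an illustration under assumptions: each scenario is a plain callable that would drive a real browser session (for example through BrowserStack or Sauce Labs) and return True on success, and `alert` is a hypothetical callback into your notification channel. No vendor API is shown here.

```python
def run_login_checks(scenarios, alert):
    """Run every scenario once and report the ones that failed.

    scenarios: dict mapping a scenario name to a callable which returns
               True on success (assumed to wrap the real browser steps).
    alert:     callable(name, reason), invoked for every failed scenario.
    Returns the list of failed scenario names.
    """
    failed = []
    for name, scenario in scenarios.items():
        try:
            ok = scenario()
            reason = None if ok else "scenario returned False"
        except Exception as error:
            # A prospect stuck at the identity provider typically shows up
            # here as a timeout raised by the automation tool.
            ok = False
            reason = f"{type(error).__name__}: {error}"
        if not ok:
            failed.append(name)
            alert(name, reason)
    return failed
```

A scheduler (cron, a WebJob or a Kubernetes CronJob) would call `run_login_checks` once per hour with one scenario per identity provider.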

Your business logic throws exceptions because of breaking changes:
When your business logic throws exceptions, your prospect will normally get an error page which does not show the internals of your application. It is certainly a bad idea to throw the stack trace directly in the prospect’s face. Besides not looking nice, it is a security risk, because an attacker can learn a lot about your service from an exposed stack trace. Tracking exceptions means implementing a monitoring and APM solution. Microsoft offers a service called Azure Application Insights which is worth reviewing because it comes as part of the Azure cloud. More powerful services we are using are Stackify and Airbrake. These services ensure that our staff gets a push notification for every single exception in our code. It is not even expensive, because the simplest plans start at around $15 per month. From our perspective, these couple of cups of coffee are money well invested to keep your service healthy. Don’t forget to cover all your components. Background workers and WebJobs in particular are often forgotten because they require going the extra mile.

Exception tracking based on APM services like Stackify and Airbrake. 

Your Data-Store has performance limitations:
Another challenge is often that managed services in Microsoft Azure, but also in Amazon Web Services, have technical limits. Microsoft describes every limit in this document here. There are two main countermeasures to handle this and to keep it from preventing your prospects from signing up or logging in. The first is an architecture decision: when you design your software, be aware of these limits and consider investing in microservices which use separate storage backends. This takes pressure off a single monolithic data store. The second is monitoring: modern APM systems are often able to monitor performance KPIs of your data store, and these measurements should trigger alerts when a KPI threshold is hit.

Invest in micro services and implement performance KPI monitoring.

When Azure Costs was broken for the first time a couple of years ago, we realised that more focus on all of these categories was necessary. Since we implemented exception tracking for backend, workers and frontend, performance monitoring for our data stores and web automation for external login providers, we have never lost a prospect in the sign-up process again.
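The observation from the beginning of this article, that a period without any sign-up is itself a warning sign, can be turned into a very small watchdog. This is a sketch under assumptions, not our production code: `signup_times` stands for whatever list of sign-up timestamps your data store can provide, and the window has to be tuned to your own traffic.

```python
from datetime import datetime, timedelta


def signups_are_silent(signup_times, now, window=timedelta(minutes=15)):
    """Return True when no sign-up happened within the window ending at now.

    signup_times: iterable of datetime objects (assumed to come from your
                  own sign-up log or database).
    """
    cutoff = now - window
    return not any(cutoff <= moment <= now for moment in signup_times)
```

For a young service with few prospects a window of 15 minutes would fire constantly, so a window of several hours, combined with the automated log-in checks, is probably the more useful setting.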

If you are interested in seeing this in action, just visit azure-costs.com and try to sign up. We are interested in your personal experience, so please use the comment option in this blog to tell us in which areas you are investing to increase the service quality of your Software as a Service application.

Azure Costs: Feature Update for CSP Support

Today we’re very excited to announce some great feature updates for our Cloud Solution Providers platform. All the updates focus on more accuracy and productivity, when working on a daily basis with your customers.  


The following list describes all features in detail and gives you a brief overview on how to use them:  

  • Defining Cloud Solution Provider margins allows the CSP to differentiate between the pricing Microsoft offers to CSPs and the price the CSP offers to the customer. Just define the pricing tier for each customer individually.
  • Generating Cloud Solution Provider filters allows you to hide services which are not charged to the customer. Only services targeted by the CSP filter will appear in the customer portal.
  • Offering a customer spending portal beside the reseller spending portal allows your customers to manage their spending on their own. This portal gives your customers exactly the same features, including team management, as if they had signed up to Azure Costs directly with an enterprise edition and an EA contract.

Interested in the new feature?
Getting started with Azure Costs for CSPs is very easy: just visit our new dedicated portal for Cloud Solution Providers and enroll in the CSP program as described above. To become part of the public preview of the CSP support, an existing enterprise plan is required.

Any questions, wishes or ideas? Try our feedback portal or drop a mail to help@azure-costs.com.


Hey node developers, switch to .NET Core – now!

Several years ago I started building a bigger project as a Software as a Service application. Besides all the technical requirements, being able to work directly on my MacBook Pro without starting virtual machines was a big wish. At that time a tool chain based on Node.js, Express, NPM and WebStorm was available. Over the years, building backend services with Node.js, and that means with JavaScript, felt like rapid prototyping. Getting started is very fast and lightweight, but when the project grows, compile-time features like a strong type system are missing. Year after year I reviewed how to get the beauty of C# and the powerful compiler infrastructure of .NET back.


Last month Microsoft released Visual Studio for Mac, and with it the investment in .NET Core increased. Building backend services in Visual Studio for Mac based on .NET Core and ASP.NET Core, including the out-of-the-box support in Azure App Services, is definitely what I was looking for.

Because of that I decided to give it a spin. After a couple weeks working with the framework I can say it was the right decision because of the following key reasons:

  • Fully managed and type-safe environment based on the powerful C# compiler
  • Broad ecosystem of components via NuGet, similar to NPM
  • Ability to turn runtime errors into compile-time errors
  • Hosting the results on Linux, in Docker containers or just in Azure App Services
  • ASP.NET Core comes with an easy-to-use dependency injection system out of the box
  • ASP.NET Core learned and stole the best things from the Node + Express chain
  • C# attributes and extension methods are unbeatable for beautifying your code

And last but not least, everything works well on my MacBook Pro without the need for a single virtual machine. Thanks, Microsoft, for letting me keep the platform I love!

Azure Costs: Public Preview CSP Support

Today we’re very excited to announce the start of the public preview for Cloud Solution Providers. After several weeks and months of continuous improvements to the Azure Costs platform, we now start the public preview phase of the CSP support.


The great new CSP portal gives you access to the spendings your customers are generating. There are a couple of use cases we would like to point out:  

Enroll into the CSP program:
When visiting our new CSP portal the system will require enrolling into the CSP program. You can do this with an existing Azure Costs account or in case you would like to differentiate between internal spendings and customer spendings, just use an additional account!

Register Accounts:
Microsoft requires every CSP to sign up for the CSP program separately for every geographical region. As an international reseller you will have accounts for USD, EUR or AUD and several more. Azure Costs allows you to register every single CSP account, so that you can track costs in different currencies and countries separately.

Activate Customers:
During the registration process Azure Costs imports all existing customers. This does not mean Azure Costs tracks spendings. If you would like to track spendings for a specific customer, activate this customer in the “New/Not-Activated Customers” widget. The spending information of every activated customer will be imported automatically.

Interested in the new feature?
Getting started with Azure Costs for CSPs is very easy, just visit our new portal for Cloud Solution providers and enroll into the CSP program as described above. To become part of the public preview of the CSP support an existing enterprise plan is required.

Any questions, wishes or ideas? Try our feedback portal or drop a mail to help@azure-costs.com.


Git Deployment – Shallow Clone Support in Azure App Services – The missing piece

Azure App Services and the open source project KuduSync behind this great Azure service are a huge time saver for agile teams. DevOps teams in particular will like the continuous deployment features. Personally I focus a lot on the Git-based deployment, which enables you to roll back and forward in seconds whenever it is required. Besides that, it is possible to work with standard tools available on the market to implement continuous deployment or integration.

[Screenshot: Deployments view in the Microsoft Azure portal, captured 2017-07-18]

When I started working with Azure App Services building Node.js apps, I wrote a little Node package called Azure Deploy. It allowed me to push changes directly into the Azure App Service as part of a build process. Originally, CodeShip was the service of choice for the build process, but since I needed to support Git repositories beyond GitHub, Bitbucket and GitLab, I migrated to Visual Studio Team Services (VSTS) and its integrated build platform.


After several months and hundreds of deploys, which means hundreds of commits to the local Git repository, the repository had become fairly large. This is normally not a problem, but my Azure Deploy package clones the local Git repository from the Azure App Service to a temp directory and copies the build output over it. Finally, it commits and pushes the changes back to Azure. The big repository took more than 4 minutes to clone, so I was wondering whether I could use a shallow clone to get only the latest state of the repository.

This idea works well on Unix-based Git servers, on GitHub and in Visual Studio Team Services. But when you try to clone the local Git repository of an Azure App Service with the shallow clone option (the --depth 1 flag, shown here against a public GitHub repository)

git clone --depth 1 https://github.com/jquery/jquery.git jquery

it ends up with an error. The error and its background are also documented in the GitHub project of KuduSync here. So what to do now?

Another nice feature of Azure App Services is the option to pull changes from a Git repository instantly after a commit. This works well with VSTS, based on Git hooks, but also with GitHub and a couple of other platforms. These repositories also support cloning with the shallow clone flag, which closes the loop. The final solution is to commit into a publishing repository hosted in VSTS or on GitHub, which triggers a pull deployment in Azure App Services.

In the end, this change reduced the total deployment time from between 5 and 9 minutes to approx. 90 seconds. You can find the updated Azure Deploy component in the NPM registry here.
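The publishing flow described above (shallow clone of the publishing repository, copy the build output over it, commit, push) can be sketched as follows. This is not the code of the Azure Deploy package, which is a Node module. It is a Python illustration, and the repository URL, directories and commit message are placeholders you would supply.

```python
import shutil
import subprocess


def build_deploy_commands(repo_url, checkout_dir, message):
    """Return the git commands of the publishing flow as argument lists."""
    return [
        # Fetch only the latest commit instead of the full history.
        ["git", "clone", "--depth", "1", repo_url, checkout_dir],
        ["git", "-C", checkout_dir, "add", "--all"],
        ["git", "-C", checkout_dir, "commit", "-m", message],
        ["git", "-C", checkout_dir, "push", "origin", "HEAD"],
    ]


def deploy(repo_url, checkout_dir, build_output_dir, message, run=subprocess.run):
    """Shallow clone, copy the build output over the clone, commit and push.

    `run` is injectable so the flow can be exercised without git.
    With check=True a failing git command raises an exception; note that
    `git commit` also fails when the build output did not change anything.
    """
    clone, *publish = build_deploy_commands(repo_url, checkout_dir, message)
    run(clone, check=True)
    # Requires Python 3.8+ for dirs_exist_ok. Files removed from the build
    # output are not deleted from the clone in this simple sketch.
    shutil.copytree(build_output_dir, checkout_dir, dirs_exist_ok=True)
    for command in publish:
        run(command, check=True)
```

The push into the publishing repository is what then triggers the pull deployment in Azure App Services.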

Feature Announcement: Smart Compare

We’re very excited to announce today the release of a game-changing new feature for azure costs: Smart Compare.

Smart Compare allows our customers to conveniently compare their monthly cloud costs with the costs of any previous month. By simply choosing the relevant months, azure costs now highlights cost spikes and deflections, so that our customers can focus on the costs they are really interested in – and ignore those they’re not.

These results can then be sorted and powerful filters allow our customers to limit what they see, to only what they’re interested in.
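To illustrate the idea of comparing two months, sorting by the size of the change and filtering out the noise, here is a small sketch in Python. It is not the implementation behind Smart Compare. The function name, the dictionary layout (service name to monthly cost) and the `min_change` filter are assumptions made up for this example.

```python
def compare_months(previous, current, min_change=0.0):
    """Compare two months of costs per service.

    previous, current: dicts mapping a service name to its monthly cost.
    min_change:        hide services whose absolute change is smaller.
    Returns tuples (service, before, after, delta), largest change first.
    """
    rows = []
    for service in sorted(set(previous) | set(current)):
        before = previous.get(service, 0.0)
        after = current.get(service, 0.0)
        delta = after - before
        if abs(delta) >= min_change:
            rows.append((service, before, after, delta))
    # Biggest spikes and drops first, regardless of direction.
    rows.sort(key=lambda row: abs(row[3]), reverse=True)
    return rows
```

Services that exist in only one of the two months are treated as having a cost of zero in the other month, so newly started and decommissioned services show up as spikes and drops as well.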


We are sure that this great feature will help our customers to identify the real cost drivers and make informed decisions on cost optimization strategies.

How to get started?
Comparing cloud costs is this simple: The Smart Compare and sorting functionality can be used right now as part of our Preview UI. Just select multiple months as shown above, to identify cost drivers, spikes and deflections.

Interested in the Smart Compare feature?
Try the new feature today by simply logging into your azure costs portal. Smart Compare is part of every paid plan, starting with the Professional subscription.

Any questions, wishes or ideas? Try our feedback portal or drop a mail to help@azure-costs.com.