Build your own Twitter – Part 2 – Azure Table Structures

March 26, 2015 / March 22, 2015 Dirk Eisenberg Tutorials azure, cloud, SaaS

Azure Table Store offers an important scalability feature that should be used when working with timelines: the partition key of every table allows Microsoft to place entities on different servers. Let’s recheck the limits of Azure Table Store to make the right decision (http://azure.microsoft.com/en-us/documentation/articles/azure-subscription-service-limits):

Azure Table Store returns at most 1000 entities per page. If the result contains more entities, the client needs to query several times -> A timeline service should never have to page to render the first timeline view.
Azure Table Store lists 2000 entities of 1 KB each per second as the documented throughput target for a single partition -> A timeline service should never request more data per page than that to stay performant.
Azure Table Store allows storing up to 500 TB per storage account, in one table or spread across several -> A timeline service should be able to handle several storage accounts, at least theoretically.
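To make these limits tangible, here is a small back-of-the-envelope sketch. The constants simply restate the figures from the list above and may have changed since; the function names are made up for this illustration and are not part of any Azure SDK.

```python
import math

# Figures quoted in the list above; treat them as assumptions, not live limits.
MAX_ENTITIES_PER_PAGE = 1000   # entities returned per query page
ENTITIES_PER_SECOND = 2000     # 1 KB entities per second


def round_trips(entity_count: int, page_size: int = MAX_ENTITIES_PER_PAGE) -> int:
    """Number of queries needed to read entity_count entities."""
    if not 1 <= page_size <= MAX_ENTITIES_PER_PAGE:
        raise ValueError("page_size must be between 1 and 1000")
    return math.ceil(entity_count / page_size)


def min_read_seconds(entity_count: int) -> float:
    """Lower bound for reading entity_count 1 KB entities at the quoted rate."""
    return entity_count / ENTITIES_PER_SECOND


# A first timeline view of 250 entities fits into a single query ...
first_view_queries = round_trips(250, page_size=250)
# ... while 1001 entities already force a second round trip.
large_view_queries = round_trips(1001)
first_view_seconds = min_read_seconds(250)
```

With these numbers a first view of a few hundred entities stays within one page and well below the per-second figure.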

With all these limitations in mind, it’s possible to build a table structure for the timeline service as follows:

timelines
The timelines table contains all timelines the system has registered. The partition key of this table is the timeline identifier, so every subject’s timeline can be stored on a different node. The partition key should be generated from the subject’s identification, e.g. liveid{{UID of LiveId-Token}}. This saves the system from looking up another table to get the timeline identifier when the subject wants to render its timeline. In addition, an event that is stored on several timelines can be identified by its event identifier, which is used as the row key. This also allows the system to implement removal jobs, because an event stored multiple times can be identified as one and the same event.
subjectFollowers
The subjectFollowers table contains the list of subjects following a specific subject. The partition key of this table is the identifier of the followed subject, so it’s easy to get all followers of a subject. The row key is important as well because it identifies the subject who is following. This lets the system find all followers of a specific subject with a fast single-partition query. The reverse lookup (all subjects a specific subject is following) also works by filtering on the row key, but that query has to touch every partition and is therefore slower on large tables.
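The key layout of both tables can be sketched with a tiny in-memory stand-in. Everything in this block (the `Table` class, the helper names and the sample identifiers) is hypothetical and only mimics the partition key / row key addressing; it does not talk to Azure.

```python
from collections import defaultdict


def timeline_partition_key(provider: str, uid: str) -> str:
    """Build the key from the subject's identity, e.g. 'liveid' + uid."""
    return f"{provider}{uid}"


class Table:
    """In-memory stand-in: partition key -> row key -> entity."""

    def __init__(self):
        self.partitions = defaultdict(dict)

    def upsert(self, pk, rk, entity):
        self.partitions[pk][rk] = entity

    def delete(self, pk, rk):
        self.partitions.get(pk, {}).pop(rk, None)

    def query_partition(self, pk):
        """Read one partition only (the cheap direction)."""
        return dict(self.partitions.get(pk, {}))

    def query_row_key(self, rk):
        """Filter on the row key; this has to look at every partition."""
        return {pk: rows[rk] for pk, rows in self.partitions.items() if rk in rows}


alice = timeline_partition_key("liveid", "alice-uid")
bob = timeline_partition_key("liveid", "bob-uid")
carol = timeline_partition_key("liveid", "carol-uid")

# subjectFollowers: partition key = followed subject, row key = follower.
subject_followers = Table()
subject_followers.upsert(alice, bob, {})
subject_followers.upsert(alice, carol, {})
subject_followers.upsert(bob, carol, {})

followers_of_alice = sorted(subject_followers.query_partition(alice))
followed_by_carol = sorted(subject_followers.query_row_key(carol))

# timelines: partition key = timeline owner, row key = event identifier.
timelines = Table()
for owner in (bob, carol):
    timelines.upsert(owner, "event-1", {"sender": alice, "text": "hello"})

# Removal job: the shared row key finds every copy of the same event.
for owner in timelines.query_row_key("event-1"):
    timelines.delete(owner, "event-1")
remaining_copies = timelines.query_row_key("event-1")
```

Note that only `query_partition` stays within one partition; `query_row_key` is the scan-like lookup in this sketch.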

This simple data structure allows the service to handle hundreds of different timelines and relations. In particular, the background worker can now identify the timelines on which an event needs to be stored.

Last but not least, timeline content is only returned in pages of 250 elements to keep performance healthy. Another page can be requested at any time when the user starts paging.
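The paging behaviour could look roughly like the following sketch. It works on a plain Python list, and the integer continuation value is an assumption made for illustration; the real service hands back its own continuation information.

```python
from typing import List, Optional, Tuple

PAGE_SIZE = 250  # page size named in the text


def read_page(
    events: List[str],
    continuation: Optional[int] = None,
    page_size: int = PAGE_SIZE,
) -> Tuple[List[str], Optional[int]]:
    """Return one page plus a token for the next page (None when done)."""
    start = continuation or 0
    page = events[start:start + page_size]
    next_start = start + len(page)
    next_token = next_start if next_start < len(events) else None
    return page, next_token


# A timeline with 600 events is delivered in three requests.
timeline = [f"event-{i}" for i in range(600)]
first_page, token = read_page(timeline)          # rendered immediately
second_page, token2 = read_page(timeline, token)  # only when the user pages
third_page, token3 = read_page(timeline, token2)
```

The first request never needs a token, so the initial timeline view costs exactly one call.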

Build your own Twitter – Part 1 – A timeline service with Azure Table Store

March 24, 2015 / March 22, 2015 Dirk Eisenberg Tutorials azure, coding, SaaS

The Azure Cost Monitor and many other cloud services I’m working on needed a timeline service to present aggregated events and actions, similar to an audit trail. While thinking about this, it occurred to me that my requirements were very similar to a Twitter feed, but with fewer features.

Let’s recap what is needed to build a timeline service in general:

The Timeline
The Timeline is associated with a subject and contains the events triggered by the other subjects the timeline owner follows.
The Event
The Event is the incident that happens in the real world and is stored in different timelines. This means every follower of a subject gets the event in its own timeline, so a single event can be stored in many different timelines.
The Subject
The Subject is someone or something that triggers an event. Normally it is a natural person with their own timeline, but it could also be a piece of hardware or software.
The Target(s)
The Targets are subjects with a timeline who are following other subjects. This means every subject becomes a follower, or target, as soon as it follows another subject. Posting an event then means sending the message to the timeline of every subject following the sender.

The following picture should give a good overview of the entities in the timeline service:

With these definitions in mind, it’s possible to identify the components needed to build a timeline service:

Storage for the timeline
The storage for the timeline needs to offer very fast read performance and acceptable write performance. In particular, the performance should not depend on the amount of data in the system. I chose Azure Table Store for this because I can store the information in different partitions and get a clearly specified read performance for every partition. In addition, it’s very cheap and affordable, even for startups.
Access broker to the storage
A NoSQL store used for timeline access normally needs a little helper to manage access. In general, a RESTful web service acts as an access broker and hands out pre-signed links to the timeline content. This ensures that no timeline data needs to pass through the timeline service at all; the service only generates the pre-authorised access links. It also means that the client SDK has to handle the raw timeline data, which makes the client a bit more complicated but keeps performance in a good range. Another important operation needs to be implemented in the access broker as well: posting events to a subject’s followers. This is a slow and complex operation, so the system offers it as a direct server call that is normally executed asynchronously with the help of a worker job.
Metadata storage for subject and targets
Last but not least, the combination of subject and targets needs to be stored somewhere in the backend. Azure Table Store can be used for this as well. All the operations to create a follow relationship can be implemented in the timeline service too, but not the rights and permission checks for posting or creating relations, because the timeline service is meant to be used machine-to-machine with API tokens.
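The asynchronous posting described for the access broker can be sketched as a simple fan-out on write. This is a minimal in-memory illustration under assumptions: the queue, the dictionaries and all function names are hypothetical stand-ins for the real queue, tables and worker job.

```python
import queue
from collections import defaultdict

followers = defaultdict(set)    # followed subject -> ids of its followers
timelines = defaultdict(dict)   # timeline owner -> {event id: event}
jobs = queue.Queue()            # stand-in for the worker job queue


def follow(follower: str, subject: str) -> None:
    """Create a follow relationship (metadata storage)."""
    followers[subject].add(follower)


def post_event(sender: str, event_id: str, payload: str) -> None:
    """Direct server call on the broker: only enqueue, return quickly."""
    jobs.put((sender, event_id, payload))


def run_worker() -> int:
    """Worker job: copy each queued event into every follower's timeline."""
    processed = 0
    while True:
        try:
            sender, event_id, payload = jobs.get_nowait()
        except queue.Empty:
            return processed
        for target in followers.get(sender, ()):
            timelines[target][event_id] = {"sender": sender, "payload": payload}
        processed += 1


follow("bob", "alice")
follow("carol", "alice")
post_event("alice", "event-1", "hello")
written_before_worker = len(timelines)  # nothing is stored until the worker runs
processed = run_worker()
```

Keeping `post_event` down to a single enqueue is the point of the design: the slow part scales with the number of followers and runs outside the request.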

The following graphic shows the technical architecture of a well-scaling timeline service based on Azure services:

The next part of this tutorial will focus on the correct and scalable table structure based on Microsoft Azure Table Storage.

Azure Table Store: How to backup safely

March 13, 2015 / March 11, 2015 Dirk Eisenberg Misc

Microsoft Azure Table Store is an amazing, simple, cheap and powerful service of the Microsoft Azure cloud. The service sits somewhere between a real NoSQL database and a simple key-value store. In many recent projects I used Azure Table Store either as a fast read cache or as the whole persistence backend.

As soon as the tables in Azure contain important data for an application provided to customers, it’s necessary to think about backup. Microsoft guarantees that the data cannot be corrupted on the storage (check out my article about the different storage account options), but accidental deletion, data corruption during automated processes or simple mistakes can still happen. Where people work, shit sometimes happens; nobody can change this.

Backing up table stores is not as easy as backing up blob storage, which is based on the idea of stamp copies. Every table needs to be replicated into another table or exported to a blob account. In my current project we searched for the perfect solution and finally came up with the following stack of services: we use the Azure Cloud Backup from RedGate to export all tables into a geo-redundant storage account. With this solution we get a daily backup of our tables, in parallel to the Azure SQL backups based on the built-in Microsoft features. Storing the backups in a geo-redundant storage account helps to ensure that we have access to them even when one Microsoft datacenter burns 🙂 or just loses power.

This setup in combination with the Microsoft Backup support for Azure SQL is very powerful and gives everybody the good feeling of being able to recover when chaos happens.

ngHelperAirbrake: Airbrake for AngularJS

March 12, 2015 / March 11, 2015 Dirk Eisenberg ngHelper angular, angularjs, coding, javascript, nghelper

Airbrake is a well-known exception tracker used by thousands of users. A cool thing is that the Airbrake team also supports browser-based JavaScript exceptions. Integrating this kind of JavaScript code sometimes gives AngularJS developers a headache. The newest member of the ngHelper collection, the ngHelperAirbrake component, makes it super simple to integrate Airbrake into an existing AngularJS application.

It’s a Bower component and works well with scaffolding tools like Yeoman. The component can be installed with the following command line:

bower install ng-helper-airbrake --save

After that, the component is registered in the bower.json of the project. Moving the dependency entry up to the position right after angular ensures that the Airbrake shim is loaded as early as possible during a full page reload.

"dependencies": {
  "angular": "~1.3.8",
  "ng-helper-airbrake": "~0.1.0",

ngHelperAirbrake offers the $airbrake Angular service, which allows you to configure the different Airbrake settings. The documentation on our project page describes how to set the right configuration: https://github.com/ngHelper/ngHelperAirbrake

After configuring the project, everything works as expected and Airbrake receives exceptions from the AngularJS application.

Azure: Do never use EF automigrations when working in teams

March 1, 2015 Dirk Eisenberg Misc azure, cloud, coding, SaaS

Today’s number one approach for building database access libraries in ASP.NET-based applications is the Entity Framework. Since Microsoft supports migration-based database updates, which mostly look as if they were copied from the approach Ruby on Rails has offered for ages, it’s possible to use the Entity Framework in scenarios where continuous deployment is key.

The core model that fits perfectly into a development process built around continuous delivery and deployment is a code-first approach. This means that the developer creates so-called POCOs (Plain Old CLR Objects) and the system is able to generate the needed SQL changes based on the current state of the database.

The Entity Framework has two options to generate the migration:

Auto Migrations
The Entity Framework tries to generate the database changes at runtime without any specific migrations implemented as code. Everything happens magically 😦
Explicit Migrations
The Entity Framework just runs the migration scripts written in a special .NET-based DSL for database operations. Nothing happens magically 🙂 This needs the developer’s brain but stays controllable even in large teams.

When you work in a team, Auto Migrations are a really bad idea and should be disabled from the beginning. Assume one developer changes the database manually: the magic behind Auto Migrations will then generate a different set of instructions against this database than the next developer will get. The results of auto migrations in team environments are neither reliable nor repeatable.

So it’s a really good idea to disable automatic migrations from the start:

public Configuration() {
    // Turn off auto migrations so that only explicit migrations run.
    AutomaticMigrationsEnabled = false;
}

Auto Migrations offer no benefit at all when it comes to projects with more than one developer or projects that are deployed automatically. Continuous deployment relies on a strict unit of work and as little magic as possible. This also helps with troubleshooting when a continuous deployment breaks production and a rollback is required.

Microsoft also published a nice article with more information on MSDN: http://msdn.microsoft.com/en-US/data/jj554735.aspx

Thoughts about making software

a blog of Dirk Eisenberg

Month: March 2015
