Big Data in your browser: Parallel.js

Big Data usually means analysing large amounts of data. The nature of this data often makes it possible to split it into smaller parts and have them processed by many distributed nodes. Inspired by the team at CrowdProcess, we like the idea of using the computing power of a growing web browser grid to solve data analytics problems.

The Azure Cost Monitor has no requirement to solve the big data problems of user A in the browser of user B (we would never do this for data privacy reasons), but we do have a lot of statistics jobs that need to be processed. From an architecture perspective the question came up: why not use the growing number of browser-based compute nodes connected to our system instead? Starting with this idea, we realised that WebWorkers in modern browsers act like the small and primitive compute nodes in big data networks. The SETI@Home project also gave us a hint that this option works very well for big data challenges.

A very simple picture was quickly sketched on the whiteboard to illustrate our requirements: the user should not be disturbed by the pre-calculation of statistics data in his browser, and the whole solution should prevent battery drain and unwanted fan activity:

ParallelJS-Pic01

It’s also important to understand that smaller devices, like a Raspberry Pi used for internet browsing or an older smartphone, are not able to process the job fast enough to deliver a great user experience. Because of this, the picture changed a bit and we invented a principle we call “Preemptive Task Offloading”.

ParallelJS-Pic02

“Preemptive Task Offloading” builds on the idea that the server and the browser use the same programming language and the same threading subsystem to manage tasks. Because of that, the service itself can decide whether it moves a task into the end user's browser or pre-calculates it on the server, ensuring a great user experience either way.

ParallelJS-Pic03

The illustrated solution can improve the user experience for your end users dramatically and lower the hosting costs of SaaS applications at the same time.


How it works

The first step is to find the lowest common denominator, which in our case is JavaScript. JavaScript can be executed in all modern browsers and on the server via node.js. Besides this, both node and modern web browsers have concepts to handle multi-threading and multi-tasking, e.g. WebWorkers.

The second important ingredient is a framework which abstracts the technical handling of threads and tasks, because they work differently in the backend and the frontend. We identified parallel.js as a great solution for this because it gives us a common interface to the world of parallel tasks in frontend and backend technologies.

Last but not least, the system needs to identify the capabilities of the browser. For this we use two main approaches. The first one checks whether the browser can spin up web workers and identifies the number of CPU cores; here we use the CPU Core Estimator to also support older browsers. The second step of the capability negotiation is a small fibonacci calculation to measure how fast the browser really is. A positive result means our system starts offloading tasks into the web browser; a negative result leads to a small call against our API to get the pre-processed information from our servers.
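To make this more concrete, here is a minimal sketch of such a capability negotiation. This is illustrative, not our production code: the core threshold, the benchmark budget and the helper names (calculateStatistics, renderChart, fetchPrecalculatedStatistics) are assumptions:

// Sketch of a browser capability check (illustrative thresholds).
// Older browsers without navigator.hardwareConcurrency would fall back
// to the CPU Core Estimator mentioned above.
function browserIsCapable() {
    if (typeof Worker === 'undefined') return false;  // no WebWorkers at all
    var cores = navigator.hardwareConcurrency || 1;
    if (cores < 2) return false;                      // e.g. a Raspberry Pi

    function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }
    var start = Date.now();
    fib(25);                                          // small benchmark
    return (Date.now() - start) < 50;                 // assumed budget: 50ms
}

if (browserIsCapable()) {
    // offload the statistics job into the browser via parallel.js
    new Parallel(largeDataSet)
        .map(calculateStatistics)   // runs inside WebWorkers
        .then(renderChart);
} else {
    // ask the server for the pre-calculated result instead
    fetchPrecalculatedStatistics().then(renderChart);
}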


Conclusion

After testing this idea for several weeks, I can say that this approach helps a lot to build high-performance applications with acceptable costs on the server side. Personally, I don't like giving customer-sensitive data to the browsers of other customers too much, but I think this approach works great in scientific projects. What do you think about big data approaches in the browser? What are your pitfalls or challenges in this area? Just leave a comment below or send a message on Twitter.


Azure Active Directory: Verify issued JWT in node.js

Microsoft Azure Active Directory is a steadily growing identity and access management platform which developers can use to swap out user management, authentication and authorisation. Azure Active Directory offers several endpoints and authentication protocols, e.g. SAML2, WS-FED or OAuth2. A widely adopted protocol is OAuth2, which ends with an issued JWT token. This article describes how a JWT token issued by Azure Active Directory can be verified in a node.js application.

Anatomy of a JWT
A JWT token is a non-encrypted but digitally signed JSON payload which contains different attributes (claims) to identify the user.

jwt

The header is fairly static and identifies which algorithm was used for the digital signature. This signing algorithm needs to be used later on to verify the digital signature in the node.js application. The payload contains the JSON object with all the claims and information which can be used to identify the user. Trusting this content is only possible when the digital signature of the token is valid and some standard claims, e.g. the issuer or the audience, have been verified. Otherwise someone else could have generated a JWT (man-in-the-middle attack) to get unauthorised access to your application. The signature is the last part of the JWT and is used to verify the payload; it was generated with the algorithm described in the header.
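To make the three parts visible, a token can be decoded without verification, e.g. with the jsonwebtoken module. This is just a sketch to illustrate the anatomy; decoding alone proves nothing, the signature check described below is the important part:

var jwt = require('jsonwebtoken');

// token: the raw JWT string, e.g. taken from the Authorization header
// decode only - no verification happens here!
var decoded = jwt.decode(token, { complete: true });

console.log(decoded.header);    // e.g. { typ: 'JWT', alg: 'RS256', x5t: '...' }
console.log(decoded.payload);   // the claims, e.g. iss, aud, tid
console.log(decoded.signature); // the base64 encoded signature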

How AAD issues a token
Azure Active Directory offers every developer the possibility to create applications. If such an application is a multi-tenant application, other Active Directory administrators are able to install it into their directory. At the end of the day, an Azure Active Directory application can live in many tenants. Every tenant in the AAD ecosystem has its own set of keys and certificates which are used to sign cryptographic messages. This means that when a directory with the id “DIRAAA” issues a token for an application, the issuer would be

https://sts.windows.net/DIRAAA/

If a directory with the id “DIRBBB” issues a token for the same application, the issuer would be

https://sts.windows.net/DIRBBB/

So the node.js application needs to verify that the token was issued by the directory we expect. Another side effect is that Azure Active Directory uses different keys for every tenant to issue tokens, which means the validation code needs to get the right verification key for the token. Microsoft uses RS256 for JWTs issued via OAuth2, so the right certificate needs to be downloaded from somewhere.

Download the right certificates
Microsoft publishes the certificates (the public portion of the signing keys) as part of the well-known OpenId configuration. It can be downloaded here:

https://login.windows.net/<<tenantid>>/.well-known/openid-configuration

The result is a JSON payload which contains the jwks_uri that should be used to download the certificates. Behind that URI several certificates are available, and we don't know up front which one is the right one. The simplest approach is a little brute force: verify the JWT against every certificate.
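One possible sketch of this two-step download, using the request module (error handling kept minimal for brevity):

var request = require('request');

function downloadSigningCertificates(tenantId, callback) {
    var configUrl = 'https://login.windows.net/' + tenantId + '/.well-known/openid-configuration';

    // step 1: load the OpenId configuration to find the jwks_uri
    request.get(configUrl, { json: true }, function (err, res, config) {
        if (err) { return callback(err); }

        // step 2: download the key set which contains the certificates
        request.get(config.jwks_uri, { json: true }, function (err, res, keySet) {
            callback(err, keySet ? keySet.keys : null);
        });
    });
}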

Verification Strategy
The goal is code which can verify any AAD-issued JWT without knowing whether the application is a multi-tenant or a single-tenant application. The following process describes a possible algorithm which can be implemented very easily with existing JWT libraries:

  1. Decode the token to extract the tenant id, which is part of the payload, stored as the tid claim. (!!! At this point we don't know yet if we can trust this information !!!)
  2. Download the signing certificates from the well-known OpenId configuration endpoint Microsoft provides. The endpoint URL can be generated with the help of the tenant id.
  3. Verify the JWT with RS256 against the downloaded certificates. For this, every existing JWT module can be used.
  4. After the token is validated, check that the iss claim contains the value we expect for the tenant id.

After this process the token is verified and we know that it was issued by Azure Active Directory for the given tenant. This means we can now rely on this information.
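Expressed in code, the strategy could look like the following sketch, based on the jsonwebtoken module and the downloadSigningCertificates helper from above (the x5c-to-PEM conversion is simplified):

var jwt = require('jsonwebtoken');

function verifyAadToken(token, expectedTenantId, callback) {
    // step 1: decode (unverified!) to read the tid claim
    var tenantId = jwt.decode(token).tid;

    // step 2: download the signing certificates of this tenant
    downloadSigningCertificates(tenantId, function (err, keys) {
        if (err) { return callback(err); }

        // step 3: brute force - try to verify against every certificate
        var payload = null;
        keys.forEach(function (key) {
            var pem = '-----BEGIN CERTIFICATE-----\n' + key.x5c[0] + '\n-----END CERTIFICATE-----\n';
            try {
                payload = jwt.verify(token, pem, { algorithms: ['RS256'] });
            } catch (e) { /* wrong certificate, try the next one */ }
        });

        // step 4: only now can the iss claim be trusted
        if (payload && payload.iss === 'https://sts.windows.net/' + expectedTenantId + '/') {
            return callback(null, payload);
        }
        callback(new Error('token could not be verified'));
    });
}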

Node.js integration
All the described steps are implemented in a small node package which allows you to verify a given token, as long as the node application has internet access and can download the certificates. The component can be installed via:

npm install azure-ad-jwt --save

A basic example to verify a given token could look like this:
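var aad = require('azure-ad-jwt');

var jwtToken = '<<the token from the Authorization header>>';

// verify checks the signature and the standard claims (error-first callback)
aad.verify(jwtToken, null, function (err, result) {
    if (result) {
        console.log('JWT is valid');
    } else {
        console.log('JWT is invalid: ' + err);
    }
});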

The component is currently not intended to be an express middleware, but it's easy to extend it that way; the express-jwt middleware is a good starting point. The current implementation does not cache certificates, so when your system handles a huge amount of verification requests, it makes no sense to download the certificates on every request. This could be done once when the application starts, or with a small cache implementation which also invalidates a certificate when it expires.

I hope this helps everybody in the node.js space to integrate Azure Active Directory quickly and easily. The described component is used by the Azure Cost Monitor in its production environment, so feel free to integrate the package in your real-world applications as well.

If you have any questions, feel free to leave a message on this blog.

Azure App Services: Restart your node-WebJobs during GitDeploy

With Azure App Services (aka Azure WebSites), the Microsoft Azure cloud offers a great, highly scalable and simple way to host cloud and SaaS services. Besides ASP.NET, several other platforms and languages are supported, e.g. node.js, Python or Java. I personally prefer hosting services written in node.js on this nice managed service from Microsoft.

A common requirement of web services are background jobs, e.g. sending out e-mails or calculating some sales numbers once a day. This use-case can be addressed with Azure WebJobs, which run on the same instance as the web service itself. Jamie Espinosa described the behaviour of WebJobs very well on an episode of Azure Friday. Azure Friday is, by the way, hosting a whole series about Azure WebJobs, so check it out for more information.

Normally, when deploying a web service into an Azure WebSite, the associated WebJobs are restarted out of the box. A special thing about node.js-based Azure WebJobs is that the WebJob is only restarted when its run.js file has changed. This means that when a deployment just changes another module or updates the npm dependencies, no restart is enforced.

The whole deployment is based on the Kudu project, and this project offers so-called Post-Deployment-Action-Hooks to trigger a simple script right after a successful deployment of the sources. Whenever the run.js file is touched, the system restarts the WebJob, so the solution for this deployment issue was to write a short batch script which touches all run.js files:

@echo off

echo Restarting all WebJobs
for /R ..\wwwroot\App_Data\jobs %%G IN (*run.js) DO echo Touching %%G
for /R ..\wwwroot\App_Data\jobs %%G IN (*run.js) DO touch %%G

exit 0

This script can be registered as a Post-Deployment-Action-Hook via FTP for every Azure WebSite. Just copy the file to the following location:

deployment-hooks

This works fine, but there is still one piece missing: how to get the deployment hooks themselves deployed with git? There are several options to reconfigure the deployment hook directory, but I was not able to figure this out. So if you have an idea, feel free to leave a message to discuss the options.

Build your own Twitter – Part 3 – Azure Timeline Service for Node.js

The last part of this article series described the principles of Twitter-like services based on Azure Storage Tables. This part describes the structure of a new node module which acts as a timeline service and can be used very easily in existing node projects.

To integrate this node module, just install the azure-timeline module via the node package manager. This automatically brings in everything that is required:

npm install azure-timeline --save

The module allows you to post events to a specific user's timeline and to the timelines of all followers. The following snippet illustrates it:

var azureTimelineService = require('azure-timeline');

// create a subject with its id and name (placeholders kept from the docs)
var user = azureTimelineService.createSubject("<>", "<>");

user.postEvent('login', { timestamp: new Date() }).then(function() {
    console.log("DONE");
});

Every method works asynchronously, based on promises. Following another user is as simple as posting an event to a timeline:

// user01 is another subject created via createSubject
user.follow(user01).then(function() {
    console.log("DONE");
});

Following a user means that all events this user posts to his timeline are posted to the followers' timelines as well. Last but not least, loading a timeline is important. The system currently returns all events of a timeline, which is a point that will change in the future:

user.loadTimeline().then(function(events) {
    console.log(events);
});

All samples are implemented in the sample file of the Azure Timeline project here. Any questions? Feel free to open an issue on GitHub or just stay in touch via this blog.

ngHelper-Toolbar: Now supports secondary actions & dividers

The $toolbar service is a great helper when it comes to building toolbars in AngularJS applications. The new version 0.0.3 allows you to handle new secondary actions, as shown here in the Azure Cost Monitor application:

secondary-actions

The secondary action can be defined in the addItem function, just like all other options the API supports:

$toolbar.addItem('childContract', contract, null, null, true, '/report/1234', null, 'activeContract', 'fa-trash', function () {
    $scope.removeContract(contract);
});

The menu can be made more user-friendly by adding dividers to the structure. When using the special menu title “DIVIDER”, the system renders the item as a divider in the menu structure:

$toolbar.addItem('user.divider', 'DIVIDER', null, null, true, null, null, 'user');

The new navigation infrastructure of the Azure Cost Monitor uses the $toolbar service from the ngHelper-Toolbar project. We hope this feature makes it simple to maintain your toolbars. Any questions, wishes or ideas? Use the issue button on the GitHub page or contact the author via this blog.

Build your own Twitter – Part 1 – A timeline service with Azure Table Store

The Azure Cost Monitor and many other cloud services I'm working on needed a timeline service to present aggregated events and actions, similar to an audit trail. Thinking about this, it came to my mind that my requirements were very similar to a Twitter feed, just with fewer features.

Let’s recap what is needed to build a timeline service in general:

  1. The Timeline
    The Timeline is associated with a subject and contains tons of events triggered by other subjects the timeline owner follows.
  2. The Event
    The Event is an incident that happens in the real world and is stored in different timelines. Every follower of a subject gets the event in his own timeline, so a single event can be stored in many different timelines.
  3. The Subject
    The Subject is someone or something which triggers an event. Normally it is a natural person with his own timeline, but it could also be a piece of hardware or software.
  4. The Target(s)
    The Targets are subjects with a timeline who follow other subjects. Every subject becomes a follower (target) as soon as it follows another subject. Posting an event then means sending the message to the timeline of every subject following the sender (see the sketch after this list).
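Expressed as a rough sketch in code, posting an event means copying it into many timelines; appendToTimeline is a hypothetical placeholder for whatever the storage layer offers:

// illustrative fan-out: one event is written into every follower's timeline
function postEvent(subject, type, payload) {
    var event = { sender: subject.id, type: type, data: payload, at: new Date() };

    appendToTimeline(subject.id, event);             // the sender's own timeline

    subject.followers.forEach(function (follower) {  // ... and every target's
        appendToTimeline(follower.id, event);
    });
}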

The following picture should give a good overview of the entities in the timeline service:

Screen Shot 2015-03-21 at 10.40.15

Starting with these definitions in mind, it's possible to identify the components needed to build a timeline service:

  • Storage for the timeline
    The storage for the timeline needs to offer very fast read performance and acceptable write performance. In particular, the performance should not depend on the amount of data in the system. I chose Azure Table Store for this because I can store the information in different partitions and get a clear read performance SLA per partition. In addition, it's cheap and affordable, also for startups.
  • Access broker to the storage
    NoSQL storage which is used for timeline access normally needs a little helper to manage access. In general, a RESTful web service acts as an access broker and hands out pre-signed links to the timeline content. This ensures that no timeline data needs to go through the timeline service at all; only the pre-authorised access links are generated by the system. It also means that in the end the client SDK needs to handle the raw timeline data, which makes it a bit more complicated but keeps the performance in a good range. One other important operation needs to be implemented in the access broker as well: posting events to a subject's followers is a slow and complex operation which the system normally implements as a direct server call, executed asynchronously with the help of a worker job. A sketch of a pre-signed link follows this list.
  • Metadata storage for subjects and targets
    Last but not least, the combination of subjects & targets needs to be stored somewhere in the backend; Azure Table Store can be used for this as well. All the operations to create a follow relationship can be implemented in the timeline service too, but not the rights & permission checks for posting or creating relations, because the timeline service is meant to be used machine-to-machine with API tokens.
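As a small sketch of such a pre-signed link, the azure-storage module can generate a shared access signature for a table; the table name and the policy values here are illustrative:

var azure = require('azure-storage');

// uses the AZURE_STORAGE_CONNECTION_STRING environment variable
var tableService = azure.createTableService();

// hand out a read-only, time-limited link to the timeline table
var sas = tableService.generateSharedAccessSignature('timelines', {
    AccessPolicy: {
        Permissions: azure.TableUtilities.SharedAccessPermissions.QUERY,
        Expiry: azure.date.minutesFromNow(15)
    }
});

// a client combines the table url with this sas token and reads the timeline directly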

The following graphic shows the technical architecture of a scalable timeline service based on Azure services:

arch-timeline

The next part of this tutorial will focus on the correct and scalable table structure based on Microsoft Azure Table Storage.

ngHelperAirbrake: Airbrake for AngularJS

Airbrake is a well-known exception tracker used by thousands of users. A cool thing is that the Airbrake team also supports browser-based JavaScript exceptions. Integrating this kind of JavaScript code sometimes gives AngularJS developers a headache. The newest member of the ngHelper collection, the ngHelperAirbrake component, makes it super simple to integrate Airbrake into an existing AngularJS application.

It’s a bower component and works well with scaffolding tools like Yeoman. The component can be installed with the following command line:

bower install ng-helper-airbrake --save

After that, the component is registered in the bower.json of the project. Moving the dependency entry up to the position right after the inclusion of angular ensures that the Airbrake shim is loaded as early as possible during a full page reload:

"dependencies": {
    "angular": "~1.3.8",
    "ng-helper-airbrake": "~0.1.0",

ngHelperAirbrake offers the $airbrake angular service which allows you to configure the different Airbrake settings. The documentation on our project page describes how to set the right configuration: https://github.com/ngHelper/ngHelperAirbrake
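A minimal configuration sketch could look like the following; the setProject call and the placeholders are assumptions, the linked documentation describes the exact API:

// configure the Airbrake project during the angular run phase (names are assumptions)
angular.module('myApp').run(function ($airbrake) {
    $airbrake.setProject('<<projectId>>', '<<projectKey>>');
});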

After configuring the project, everything works as expected and Airbrake receives exceptions from the AngularJS application.