Big Data in your browser: Parallel.js

Big Data often has something todo with analysing a big amount of data. The nature of this data makes it possible to split it up into smaller parts and let them be processed from many distributed nodes. Inspired from the team of CrowdProcess we like the idea to use the computing power of a growing web browser grid to solve data analytic problems.

The Azure Cost Monitor does not have the requirement to solve big data problems of user A in the browser of user B, we would never do this because of data privacy but we have a lot of statistic jobs which need to be processed. From an architecture perspective the question comes up why not to use a growing amount of browser based compute nodes connected with our system instead? Starting with this idea we identified that WebWorkers in modern browsers are acting like small and primitive compute nodes in big data networks. The team from the SETI@Home project also gave us the hint that this option works very well to solve big data challenges.

A very simple picture was painted very fast on the board to illustrate our requirements. The user should not be disturbed from the pre-calculation of statistic data in his browser and the whole solution should prevent battery drain and unwanted fan activities:

ParallelJS-Pic01

It’s also important to understand that some smaller devices like a RaspberryPI which is used for internet browsing or an older smartphone is not able to process the job in time to generate a great user experience. Because of this, the picture changed a bit and we invented a principal we call “Preemptive Task Offloading”.

ParallelJS-Pic02

“Preemptive Task Offloading” lives from the idea that the server and the browser are using the same programming language and the same threading subsystem to manage tasks. Because of that the service itself can decide whether it moves tasks in the browser on the end user or pre-calculates them on the server to ensure great user experience.

ParallelJS-Pic03

The illustrated solution is able to improve the user experience for your end users dramatically and lowers the hosting costs for SaaS applications in the same time.


How it works

The first step is to find the lowest common denominator, in our case it’s called JavaScript. Javascript can be executed in all modern browsers and in the server via node.js. Besides this node and web browser has concepts, e.g. WebWorkers to handle multi threading and multi tasking. The second important ingredient is a framework which abstracts the technical handling of  threads or tasks because they are working different in the backend or frontend. We identified parallel.js as a great solution for this because it gives us a common interface to the world of parallel tasks in frontend and backend technologies. Last but not least a system needs to identify the capabilities of the browser. For this we are using two main approaches. The first one tries to identify the capability to spin of web workers and identifies the amount of CPUs. For this we are using the CPU Core Estimator to also support older browsers. The second step of capability negotiation is a small fibonacci calculation to identify how fast the browser really is. If we come to a positive result our system starts the task offloading into the web browser, a negative result leads to a small call against our API to get the preprocessed information from our servers.


Conclusion

After testing this idea several weeks, I can say that this approach helps a lot to build high performance applications, with acceptable costs on the server side. Personally I don’t like the approach to give customer sensitive data into the browser of other customers to much, but I think this approach works great in scientific projects. What do you think about big data approaches in the browser? What are your pitfall or challenges in this area? Just leave a comment bellow or push a message on Twitter.