Big Data often has something todo with analysing a big amount of data. The nature of this data makes it possible to split it up into smaller parts and let them be processed from many distributed nodes. Inspired from the team of CrowdProcess we like the idea to use the computing power of a growing web browser grid to solve data analytic problems.
The Azure Cost Monitor does not have the requirement to solve big data problems of user A in the browser of user B, we would never do this because of data privacy but we have a lot of statistic jobs which need to be processed. From an architecture perspective the question comes up why not to use a growing amount of browser based compute nodes connected with our system instead? Starting with this idea we identified that WebWorkers in modern browsers are acting like small and primitive compute nodes in big data networks. The team from the SETI@Home project also gave us the hint that this option works very well to solve big data challenges.
A very simple picture was painted very fast on the board to illustrate our requirements. The user should not be disturbed from the pre-calculation of statistic data in his browser and the whole solution should prevent battery drain and unwanted fan activities:
It’s also important to understand that some smaller devices like a RaspberryPI which is used for internet browsing or an older smartphone is not able to process the job in time to generate a great user experience. Because of this, the picture changed a bit and we invented a principal we call “Preemptive Task Offloading”.
“Preemptive Task Offloading” lives from the idea that the server and the browser are using the same programming language and the same threading subsystem to manage tasks. Because of that the service itself can decide whether it moves tasks in the browser on the end user or pre-calculates them on the server to ensure great user experience.
The illustrated solution is able to improve the user experience for your end users dramatically and lowers the hosting costs for SaaS applications in the same time.
How it works
After testing this idea several weeks, I can say that this approach helps a lot to build high performance applications, with acceptable costs on the server side. Personally I don’t like the approach to give customer sensitive data into the browser of other customers to much, but I think this approach works great in scientific projects. What do you think about big data approaches in the browser? What are your pitfall or challenges in this area? Just leave a comment bellow or push a message on Twitter.