For the past two years, Philips Hue has run on a custom-built Internet of Things platform hosted on Google App Engine. It works great, but it’s time to push it even further toward the cutting edge. The goal is to simplify the architecture, improve performance and reduce resource usage (and therefore costs).
Hue’s current architecture runs on Google Cloud Platform, which kicks ass. It has excellent tooling and makes DevOps very easy. This ease of use is also around the corner for Node.js, with the announced Managed VMs and Deployment Manager. All the more reason to stick with this for the new platform.
With Google I/O starting tomorrow, five Q-ers are now in San Francisco. So while we’re exploring this new architecture, we thought we might as well discuss it with the guys from Joyent. They have a lot of large-scale production experience with Node and are active core contributors to Node itself. Lucky for us, TJ Fontaine himself could sit down and have a chat with us! Here are some of the things we discussed that might be useful for you.
Our platform (like many others) needs to scale horizontally to manage a large number of connections. As a Node process runs its JavaScript on a single core, we need to scale on two levels: multiple physical machines and multiple cores per machine. Google Compute Engine has nailed the first level with their excellent load balancer. Scaling for multi-core can be done in various ways:
Nginx can run as a separate process on the machine, distributing the load evenly to the multiple Node.js processes running on that single instance. Nginx is so lightweight that it can share one of the cores with a Node.js process.
Plus side: dedicated process, good performance, flexible
Down side: yet another application that we need to configure and manage. Also, we would need to manually spin up multiple Node.js processes, complicating the bootstrapping of the instance.
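To give a feel for the extra bootstrapping the Nginx option involves, here is a minimal sketch, not our actual setup. It assumes one Node.js process per port on the same machine; the helper names, the `PORT` environment variable and the upstream name `node_backend` are all made up for illustration.

```javascript
'use strict';
const { spawn } = require('child_process');

// Hypothetical helper: ports basePort, basePort + 1, ... (one per Node.js process).
function backendPorts(basePort, count) {
  return Array.from({ length: count }, (_, i) => basePort + i);
}

// Hypothetical helper: render an nginx `upstream` block listing every local backend.
function renderUpstream(name, ports) {
  const servers = ports.map((port) => `    server 127.0.0.1:${port};`).join('\n');
  return `upstream ${name} {\n${servers}\n}\n`;
}

// Hypothetical helper: start one copy of `script` per port.
// The port is handed over through a PORT environment variable (an assumption
// about how the server script reads its configuration).
function spawnBackends(script, ports) {
  return ports.map((port) =>
    spawn(process.execPath, [script], {
      env: Object.assign({}, process.env, { PORT: String(port) }),
      stdio: 'inherit',
    })
  );
}

module.exports = { backendPorts, renderUpstream, spawnBackends };
```

The rendered block would go inside the `http` context of the Nginx configuration, with a `proxy_pass http://node_backend;` directive in the relevant `location`. Keeping the port list in one place means the process launcher and the Nginx config cannot drift apart, but it is still one more moving part to manage.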
There’s one master process that listens on a TCP port and passes the workload to an available child process.
Plus side: less experimental option, it’s been in Node.js a long time
Down side: an issue in a child process could influence the load balancing performance, starving the other child processes.
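One way to picture this option is a master that accepts TCP connections itself and hands each socket to a child over the IPC channel. The sketch below assumes exactly that; it simply rotates through the children as a stand-in for picking an "available" one, and the `child` argument and `DEMO_PORT` environment variable are hypothetical names used only to make the example self-contained.

```javascript
'use strict';
const net = require('net');
const { fork } = require('child_process');

// Hypothetical helper: rotate through the children (a stand-in for "available").
function pickChild(children, counter) {
  return children[counter % children.length];
}

// Master: listen on the TCP port and pass every accepted socket to a child.
function startMaster(port, childCount) {
  const children = [];
  for (let i = 0; i < childCount; i++) {
    children.push(fork(__filename, ['child']));
  }
  let counter = 0;
  // pauseOnConnect keeps the master from reading data meant for the child.
  const server = net.createServer({ pauseOnConnect: true }, (socket) => {
    pickChild(children, counter++).send('connection', socket);
  });
  server.listen(port);
  return server;
}

// Child: receive the socket handle, resume it and answer.
function startChild() {
  process.on('message', (message, socket) => {
    // The socket can be missing if the client disconnected during the handoff.
    if (message !== 'connection' || !socket) return;
    socket.resume();
    socket.end('handled by pid ' + process.pid + '\n');
  });
}

// Only start anything when explicitly asked to (both triggers are assumptions).
if (process.argv[2] === 'child') {
  startChild();
} else if (process.env.DEMO_PORT) {
  startMaster(Number(process.env.DEMO_PORT), 2);
}
```

Because every connection goes through the master, a child that is slow to take over its sockets affects how work is spread, which is the kind of starvation mentioned above.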
Node Cluster API
Node.js version 0.12 will enhance the Cluster API with round-robin load balancing, which significantly increases the efficiency and reliability of this option.
Plus side: easy setup because of PM2, independent child processes, fast detection of killed processes
Down side: not as flexible as nginx
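For reference, a bare-bones version of the Cluster API approach could look like the sketch below. It is an illustration under assumptions rather than our production code: the helper names and the `DEMO_PORT` environment variable are made up, and it uses the `cluster` module directly instead of going through PM2.

```javascript
'use strict';
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Hypothetical helper: one worker per core unless a positive count is requested.
function workerCount(requested) {
  const cores = Math.max(1, os.cpus().length);
  return Number.isInteger(requested) && requested > 0 ? requested : cores;
}

function startCluster(port, requested) {
  // Newer Node.js versions call this isPrimary; older ones only have isMaster.
  const isPrimary =
    cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;

  if (isPrimary) {
    // Must be set before the first fork. Round-robin is already the default
    // on every platform except Windows.
    cluster.schedulingPolicy = cluster.SCHED_RR;
    const count = workerCount(requested);
    for (let i = 0; i < count; i++) {
      cluster.fork();
    }
    // Replace a worker as soon as the primary sees it exit.
    cluster.on('exit', (worker) => {
      console.log('worker ' + worker.process.pid + ' exited, starting a new one');
      cluster.fork();
    });
  } else {
    // Every worker runs this same file and shares the listening port.
    http
      .createServer((req, res) => res.end('handled by pid ' + process.pid + '\n'))
      .listen(port);
  }
}

// Workers inherit the environment, so they take the else branch above.
// DEMO_PORT is a hypothetical variable used only to trigger this demo.
if (process.env.DEMO_PORT) {
  startCluster(Number(process.env.DEMO_PORT));
}

module.exports = { workerCount, startCluster };
```

Note that the unconditional re-fork on `exit` will loop if a worker crashes right at startup, so a real setup would want some back-off, which is part of what a process manager such as PM2 takes care of.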
After Google I/O we’re going to explore the Cluster API and Nginx options further, as both seem like good solutions. At the moment the Cluster API feels like the best fit, especially because we’re expecting version 0.12 to be released quite soon.
Next to that, we got a ton of tips on running Node in production, including tools for debugging core files, configuring Linux and working with WebSockets.
Also want to exchange ideas with Joyent? They’re doing a roadshow visiting Lisbon and Amsterdam next week. Q42 will definitely be there!
And of course, if you want to get in contact with us, you can always drop me an e-mail. Or, if you’re at Google I/O this week, drop by our Cloud Platform booth on the third floor.