Performance & Load Testing: The Big Time
Man, the picture is lookin' bright from here
I'm gettin' ready for the big time
Someday I'm gonna be big time news
- Bon Jovi - "Mister Big Time"
In almost three years of answering questions from the community, I've noticed one constant thread that boils down to one thing: "Is Reaction ready for the big time?"
What is the "big time"? It means different things for different people: the ability to handle lots of products, traffic, or both, at a fast speed. Our clients are already handling large loads and datasets, and they need to know that Reaction won't crumble under the weight. We knew we needed to give our clients some hard data on performance.
It was time to really put Reaction to the test.
How big is big?
But first, we had some questions to answer. For one, how big is big?
To answer this, Vishal Wadher, our Director of Solutions Engineering, worked with cofounder Sara Hicks to create a matrix of different scenarios based on industry standards and real-world use cases. Here is a simplified version of this matrix:
|Total unique SKUs||3000||210,000||2,100,000||52,000,000|
|Concurrent sessions at peak||100||1000||3000||20,000|
|Product images per SKU||3||5||7||9|
|Acceptable 1st load - Home/category||3 seconds||3 seconds||3 seconds||3 seconds|
|Acceptable 1st load - Product detail page||2 seconds||2 seconds||2 seconds||2 seconds|
Note: We also have admin-specific metrics in our test suite, like Orders and Accounts, but since we're initially focusing on the consumer-facing side of the application, that's what I'll be discussing today.
Now that we know what sort of data we're looking at, let's see how we added that to the database. A while ago, I wrote a plugin called reaction-devtools, which allows developers to load different datasets into their code for testing purposes. Using the devtools plugin as a jumping-off point, we started creating these larger datasets, but almost immediately ran into a problem. Even though devtools uses the Mongo bulk loader, the single-threaded nature of Node meant we were only able to run one operation at a time. This was causing long load times, up to almost 8 hours for Enterprise datasets.
Fortunately, Solutions Engineer Akarshit Wal found a solution. He broke this off into a command-line version of the app, which uses multiple processors to parallelize loading, cutting load times dramatically. In the next week or so, we'll be incorporating his additions into the devtools plugin, so that other developers can test their sites with larger datasets.
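To illustrate the general idea of parallelizing a bulk load, here is a minimal Python sketch. It is not Akarshit's command-line tool: the names `chunked`, `insert_batch`, and `load_in_parallel`, the batch size, and the worker count are all assumptions made for illustration, and `insert_batch` only counts documents so that the sketch runs without a database.

```python
from concurrent.futures import ProcessPoolExecutor


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def insert_batch(batch):
    """Hypothetical stand-in for one bulk write per batch.

    It only counts the documents so the sketch runs without a database.
    """
    return len(batch)


def load_in_parallel(docs, batch_size=1000, workers=4):
    """Fan batches out to worker processes; return the number of documents handled."""
    batches = chunked(docs, batch_size)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(insert_batch, batches))


if __name__ == "__main__":
    # Generated placeholder products, not a real dataset.
    products = [{"sku": n} for n in range(10_000)]
    print(load_in_parallel(products))
```

Because each worker is a separate process, one slow batch no longer blocks the rest of the load, which is the property a single-threaded loader lacks.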
So with that problem solved, we were able to move on to the next question...
What does load look like?
In previous load testing projects, it was relatively easy to simulate clients. All I had to do was hit a REST endpoint with the sort of requests clients would use, then parse the results. With tools like siege or Tsung, it's fairly trivial to generate tremendous amounts of load from a few well-powered machines, each of which can simulate at least a few hundred clients.
But Meteor doesn't work like that. Meteor combines HTTP requests with WebSocket requests, then uses both to render the client experience. Sending only HTTP requests or only WebSocket requests wasn't going to be enough. The clients had to be smarter.
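For readers who haven't looked under the hood: Meteor's WebSocket traffic follows DDP (Distributed Data Protocol), which exchanges small JSON messages. The sketch below only builds the three message types a simulated shopper needs. The helper names and the example method name are hypothetical, and this is not the client we used in our tests.

```python
import json


def connect_message():
    """First message a DDP client sends once the WebSocket is open."""
    return json.dumps({"msg": "connect", "version": "1", "support": ["1"]})


def subscribe_message(sub_id, name, params=None):
    """Ask the server to start publishing a named record set."""
    return json.dumps(
        {"msg": "sub", "id": sub_id, "name": name, "params": params or []}
    )


def method_message(call_id, method, params=None):
    """Invoke a server-side Meteor method (a remote procedure call)."""
    return json.dumps(
        {"msg": "method", "id": call_id, "method": method, "params": params or []}
    )


if __name__ == "__main__":
    print(connect_message())
    print(subscribe_message("1", "Products"))
    # "example/addToCart" is a made-up method name used only for illustration.
    print(method_message("2", "example/addToCart", ["some-product-id", 1]))
```

A load client has to send these over the socket in addition to fetching the page over HTTP, which is why plain HTTP load tools fall short here.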
Our first thought was to use browsers. Nothing acts more like a client than an actual client, so we set up some tests using Selenium WebDriver. Though theoretically possible, scaling up a load using browsers was just impractical. Bringing up that many machines was too difficult and expensive for this sort of iterative testing. So, back to the drawing board.
After looking at different options (e.g., JMeter), we settled on a tool called Locust, which allows developers to write their own custom clients in Python. With some help from the Locust community, I was able to write a client that behaves like a browser in the sense that it combines HTTP and WebSocket requests interactively. It makes an HTTP request, subscribes to Collections, then makes Meteor calls to perform various operations simulating user behavior, such as Browse Categories and Leave, Add to Cart and Abandon, Add to Cart and Checkout, etc. After leveraging and modifying a fairly simple Meteor client for Python, as well as Locust's built-in HTTP client, we ended up with the following chunk of code, which allowed us to simulate actual customer behavior:
    import random
    import time

    # `stopwatch` is assumed to be a timing decorator defined elsewhere in the suite.
    @stopwatch
    def get_homepage(self):
        self.http.get("/")
        self.client.subscribe("Products")
        self.client.subscribe("Sessions", [None], callback=None,
                              ready_callback=self.sessions_ready)
        self.client.subscribe("Products/grid")
        # We don't use these subscriptions but they are used by the client
        # so we simulate the same
        self.client.subscribe("Packages")
        self.client.subscribe("MerchantShops")
        self.client.subscribe("shopsCount")
        self.client.subscribe("PrimaryShop")
        self.client.subscribe("Translations", ["yX4TT8uS4teJLxfQX"])
        self.client.subscribe("Templates")
        self.client.subscribe("BrandAssets")
        self.client.subscribe("Groups")
        self.client.subscribe("Tags")
        self.client.subscribe("ProductGridMedia", [["BCTMZ6HTxFSppJESk"]])
        self.client.subscribe("meteor.loginServiceConfiguration")
        self.client.subscribe("meteor_autoupdate_clientVersions")
        # once we are logged in we can subscribe to our account
        self.client.subscribe("Accounts", [self._data['account_id']],
                              callback=self.accounts_subscribed,
                              ready_callback=self.accounts_ready)
        # block until the first Catalog document has arrived
        while self.client.find_one("Catalog") is None:
            time.sleep(1)
        # pick a product from the catalog to use for product detail
        # (list() keeps random.choice working on Python 3, where keys() is a view)
        selected_product_id = random.choice(list(self._data["catalog"].keys()))
        self._data["selected_product"] = self._data["catalog"][selected_product_id]
        self._data["selected_product"]["_id"] = selected_product_id
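The `stopwatch` decorator on `get_homepage` is not shown in this post. As a rough idea of what such a decorator can look like, here is a library-free sketch that records the duration and outcome of each call. The `TIMINGS` list and the record layout are assumptions; a real Locust suite would report these samples to Locust's own statistics instead.

```python
import functools
import time

# Collected samples; a stand-in for reporting to the load tool's statistics.
TIMINGS = []


def stopwatch(func):
    """Record how long each call takes and whether it raised an exception."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        outcome = "failure"
        try:
            result = func(*args, **kwargs)
            outcome = "success"
            return result
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            TIMINGS.append(
                {"name": func.__name__, "outcome": outcome, "ms": elapsed_ms}
            )
    return wrapper


@stopwatch
def get_homepage():
    time.sleep(0.01)  # stand-in for the HTTP request and the subscriptions


if __name__ == "__main__":
    get_homepage()
    print(TIMINGS[-1]["name"], TIMINGS[-1]["outcome"])
```

Timing in a `finally` block means failed calls are still measured, so slow failures show up in the numbers rather than vanishing from them.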
When simulating load, we wanted to know how long it would take for the page to be served, as well as how long until the user could actually use it. Taurus, an open-source test automation framework, allowed us to combine our Locust swarm and our Selenium tests into one suite so that we could use the browser tests as samplers.
An example Taurus scenario might look something like this (for a tiny test load):
    execution:
    - executor: locust
      concurrency: 2
      ramp-up: 30s
      hold-for: 90s
      scenario: load
    - executor: jmeter
      concurrency: 1
      delay: 30s
      hold-for: 1m
      ramp-up: 30s
      scenario: visit_home_page
    - executor: jmeter
      concurrency: 1
      delay: 30s
      hold-for: 1m
      ramp-up: 30s
      scenario: visit_pdp
    - executor: jmeter
      concurrency: 1
      delay: 30s
      hold-for: 1m
      ramp-up: 30s
      scenario: visit_tag
While the load was running, we had a number of browsers performing the same operations as the load generators and measuring how long it took for pages to render, thus testing the whole application from front end to back end.
Where are we testing this?
Testing at this scale isn't practical on a desktop, so we built out a hosting infrastructure on AWS instead. Since hosting high-volume websites is something of a science, we decided to start with the most naive implementation, then ramp up our testing as things got more complex, letting our load define when we needed to level up. Overall, we tried to replicate a production-style deployment as best we could (within reason).
We also used Atlas, MongoDB's database-as-a-service offering, for our Mongo hosting, which has worked out well thus far. I'd recommend Atlas to any of our customers who don't have lots of experience managing databases, especially the ones who care about sleeping soundly at night.
This is a continually evolving platform. We've been leveraging AWS whenever possible to make our learnings repeatable and, hopefully, reproducible by the community. When we publish the final tests, we hope to make our tools, our methodologies, and our results public and replicable. We used CloudFormation to build our entire infrastructure in just a few moments, leveraging the other pieces of the AWS platform along the way. This is how we ended up building the entire cluster in a VPC behind an ELB.
In addition to all of this, we have also built our own custom Kadira cluster. Kadira is a project that was originally run by Arunoda Susiripala, then purchased and closed-sourced by the Meteor Development Group (MDG). It gives Meteor users deep insight into what's happening behind the scenes. Using Kadira, users can drill down into any method and see all the relevant DB requests and method calls, as well as what's making a particular method or publication take so long. Needless to say, this is invaluable. We've instrumented all of our servers with this tool.
Testing is not complete, but here are the results from our first round, which used the mid-retailer dataset above and simulated more than 1000 users browsing, checking out, and so on. Each time is the "time to render" the complete, usable web page.
Using the "mid-retailer" dataset with no load
|Task||Time to Complete (avg/seconds)|
|Visit Home Page||4.146|
|Visit Tag Page||3.613|
|Visit Product Detail Page||3.362|
Using the "mid-retailer" data with a load of 1000 simulated users
|Task||Time to Complete (avg/seconds)|
|Visit Home Page||4.101 (with a target of 3 seconds)|
|Visit Tag Page||3.670 (with a target of 2 seconds)|
|Visit Product Detail Page||3.383 (with a target of 2 seconds)|
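To read the two tables side by side, the short Python sketch below computes how much the simulated load changed each render time and how far each page is from its target. The numbers are copied from the tables above; the dictionary keys and the `summarize` helper are just illustrative names.

```python
# Average render times in seconds, copied from the two tables above.
no_load = {"home": 4.146, "tag": 3.613, "pdp": 3.362}
under_load = {"home": 4.101, "tag": 3.670, "pdp": 3.383}
targets = {"home": 3.0, "tag": 2.0, "pdp": 2.0}


def summarize(page):
    """Return (change caused by load, gap to target), both in seconds."""
    load_delta = round(under_load[page] - no_load[page], 3)
    target_gap = round(under_load[page] - targets[page], 3)
    return load_delta, target_gap


for page in targets:
    load_delta, target_gap = summarize(page)
    print(f"{page}: load changed render time by {load_delta:+.3f}s; "
          f"{target_gap:.3f}s over target")
```

In this first round, adding 1000 simulated users moved each average by less than 0.06 seconds, so the gap to the targets is already present with no load at all.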
Since we're continuing to scale up, we don't have a complete set of results yet, but we hope to have them ready at the next Reaction Action livestream. If you want to follow our progress, watch the issues marked with the
While we may not have all the metrics available, we have learned a few things so far:
The database is almost never the problem.
Watch your indexes. Other than that, Mongo is blazing fast for most queries. We probably could have reduced the number of times we called the same query (e.g., for something like getPrimaryShop), but not at the same sort of granular level as an SQL database. Most queries take less than 30 ms.
It's pretty easy to sink a single process.
When a user hits the home page, the server has a lot of work to do. It pushes all this data down to the client, so having lots of processors is important to handling load. In our initial experiments, we used pm2 to spread requests across a bunch of processors. Now, we are transitioning to Amazon's Elastic Container Service and a modified version of our Docker image. This way, each container runs one process, allowing our team to scale out automatically with load.
Using a CDN is a good idea...
It comes as no surprise to anybody, but Meteor makes it pretty easy to serve the initial built JS to the client from a CDN, which dropped that load time from around 2.5 seconds to 600 ms.
...But serving images from Mongo isn't as bad as you might think.
It's always important to test before optimizing. I had expected serving images from Mongo to be a performance dealbreaker right from the get-go, but it worked surprisingly well, even though we had overloaded our dataset products with lots of images. Nevertheless, it's still preferable to use a CDN.
The latest release was a gamechanger.
When loading more than a few hundred products on the Grid using the Products collection, the platform would start to become unusable because of all the extra work it had to perform in order to calculate things, such as how many items were in stock. The changes to the Catalog eliminated this problem. If a user is running an older version of Reaction, this would be their reason to upgrade.
We are learning a lot about what it means to host a Reaction website at scale, and so far, we've been pleased with the results: Reaction holds up pretty well under load. The next step is to continue building our test infrastructure and iterating on it.
We expect to have these test results complete within the next month or so. I'll be writing another blog post sharing our complete results, as well as how we built our infrastructure for this project. We hope to share more soon.