Making NASA's Landslides Viewer lightning-fast

Recently, one fine evening, I was exploring NASA's Open Data Portal in the hope of discovering some interesting data. I found the Landslides project, which is part of the Global Precipitation Measurement (GPM) mission at NASA. As part of the project, they've assembled a website where you can see all the recorded landslide events on a map. That website took, allow me to dramatize this, a whole 30,000 ms to load. Now, I'm a performance enthusiast (in the context of computers, that is). So I forgot about the data; all I wanted to accomplish at that moment was to make that website fast. I dug in.

A good, long 30 seconds to load

Why was the Landslides Viewer loading so slowly? Was it the server? Could be. Was it my Internet speed? Well, to some extent. But I have a 100 Mbps connection. That's fine. So what makes it so dayum slow?

The problem was that NASA's viewer loads its entire dataset at once and renders it onto the map. That is, of course, a problem, because the dataset contains about 12,000 landslide events from across the globe. Imagine rendering 12,000 elements on the map at once. It gets worse because the map has to reposition all of those elements every time you pan, zoom in or out, or otherwise interact with it.

How to go about making it fast?

We know there are no one-size-fits-all solutions to problems on the Web (or in software in general). A standard solution can help with many problems, but some problems need a unique one. "It depends," right?

So initially, with little experience with maps, I just started thinking of some common ways to fix the performance:

  • Load fewer data points.
  • Store the entire dataset in a database and query it as needed.
  • Design a better API to fetch data. GraphQL?
  • Run Brotli over the JSON data with 12,000 events.
  • Use a better, more performant map library.
  • Use Next.js for faster loading and overall app performance.

There can be hundreds of things. It almost makes you go crazy.

Down to 24 seconds

You always gotta stick to the rule: "First, make it work." So I picked the most obvious solution first, to start simple: I used the Next.js framework. And to my surprise, it worked. The loading time came down to 24 seconds. That's a 20% improvement. Nice.

However, it's still 24 seconds. So what do we do next? I somehow knew that using a database instead of a JSON file with 12,000 events couldn't be the solution. It would make things faster, but it would require some serious engineering on the backend, and it probably still wouldn't get us where we wanted to go.

Let's talk about the libraries I had used so far. I was using Next/React to build the website (as you now know). The other most important library was Leaflet, a library for rendering maps on the web. It's popular, and it's also the same library that NASA's Landslides Viewer uses.
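
Here's roughly what such a naive, pre-clustering implementation looks like: fetch everything, then add one marker per event. This is an illustrative sketch only; the endpoint and field names are assumptions, not NASA's actual API.

```ts
import * as L from "leaflet";

interface LandslideEvent {
  latitude: number;
  longitude: number;
  title: string;
}

export async function renderAllMarkers(containerId: string) {
  // Whole-world view on first load.
  const map = L.map(containerId).setView([20, 0], 2);
  L.tileLayer("https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png").addTo(map);

  // Fetch the entire dataset in one request: roughly 12,000 events.
  // "/api/landslides" and the field names are placeholders.
  const events: LandslideEvent[] = await fetch("/api/landslides").then((r) => r.json());

  // One Leaflet marker per event. With 12,000 of them, every pan or zoom
  // forces the browser to reposition 12,000 elements.
  for (const e of events) {
    L.marker([e.latitude, e.longitude]).bindPopup(e.title).addTo(map);
  }
}
```

(In a Next.js app, this would live in a client-only component, since Leaflet needs the browser's `window`.)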

Now for the interesting part: NASA's viewer uses marker clustering, and my implementation wasn't using any clustering yet. That meant there was a clear milestone in sight. But let's get to the underlying concept first.

Euclidean Clustering

Euclidean clustering is a method of grouping together points in Euclidean space that lie close to each other. The points could live in 3D space, 4D space, or, as in our case, 2D. The goal of Euclidean clustering is to partition the dataset into clusters that can each be treated as a single unit, depending on the use case. In our case, we cluster together landslide locations that fall within a certain distance of each other.

In 2-dimensional space, the distance between 2 points \((x_1, y_1)\) and \((x_2, y_2)\) is given by

\[ \text{Distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \]
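
To make that concrete, here is a minimal sketch of greedy, threshold-based Euclidean clustering in TypeScript. The greedy strategy and the fixed radius are illustrative choices, not what any particular mapping library does, and treating latitude/longitude as a flat plane is an approximation that's fine for illustration.

```ts
interface Point { x: number; y: number; }
interface Cluster { center: Point; members: Point[]; }

// Straight-line distance between two points in 2-D.
const distance = (a: Point, b: Point) =>
  Math.sqrt((b.x - a.x) ** 2 + (b.y - a.y) ** 2);

// Greedy Euclidean clustering: put each point into the first cluster
// whose center lies within `radius`; otherwise start a new cluster.
function clusterPoints(points: Point[], radius: number): Cluster[] {
  const clusters: Cluster[] = [];
  for (const p of points) {
    const home = clusters.find((c) => distance(c.center, p) <= radius);
    if (home) home.members.push(p);
    else clusters.push({ center: p, members: [p] });
  }
  return clusters;
}
```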

Now imagine this: when your map loads for the first time, zoomed out far enough to show the whole world, there is no point in rendering markers for 12,000 locations. We can cluster nearby location points into a single entity and render them individually only when the user zooms into that area of the map.
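
In the Leaflet ecosystem, one common way to get this behaviour is the leaflet.markercluster plugin. I can't say this is exactly what NASA's viewer does under the hood, but the wiring looks roughly like this (reusing the `map` and event shape from the earlier sketch):

```ts
import * as L from "leaflet";
import "leaflet.markercluster"; // adds L.markerClusterGroup()

// Instead of adding each marker straight to the map, add it to a
// cluster group; the plugin merges nearby markers at each zoom level.
function renderClustered(
  map: L.Map,
  events: { latitude: number; longitude: number; title: string }[]
) {
  const clusterGroup = L.markerClusterGroup();
  for (const e of events) {
    clusterGroup.addLayer(L.marker([e.latitude, e.longitude]).bindPopup(e.title));
  }
  map.addLayer(clusterGroup);
}
```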

With marker clustering, the webpage started loading in about 14 seconds. That is good: roughly a 40% improvement on the previous implementation.

The Golden Key

At this point, I tried a couple of things and failed to make the map experience any better. In software, when we hit such roadblocks, it's a sign that we need to acquire some domain knowledge, look around, explore, and get creative. In our case, the domain was 2-dimensional space and maps. To make any further improvements, we needed to see what solutions already existed in that domain. And that's exactly what I did: I looked around and Googled ways to make Euclidean clustering fast. That's how I got to the Golden Key for making map rendering faster and smoother: k-D trees.

k-D Tree

k-D stands for k-dimensional. A k-D tree is a variant of the binary search tree: a space-partitioning data structure for organizing points in k-dimensional space, optimized for range and nearest-neighbor queries. So by paying a modest upfront cost to construct the tree (O(n log n) with median-split construction), we get fast lookups afterwards: nearest-neighbor searches take O(log n) on average, and range queries only have to visit a small fraction of the points.
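
Here is a small, illustrative 2-D tree in TypeScript: build it by splitting at the median of the alternating axis, then answer rectangular range queries, which is exactly the operation a map viewport needs. It's a sketch of the idea, not the exact structure my implementation (or any particular library) uses.

```ts
interface Point { x: number; y: number; }

interface Node {
  point: Point;
  left: Node | null;
  right: Node | null;
  axis: 0 | 1; // 0 = split on x, 1 = split on y
}

// Build a 2-D tree by recursively splitting at the median of the
// current axis. Sorting at every level makes this O(n log^2 n);
// a proper median-selection step would bring it to O(n log n).
function build(points: Point[], depth = 0): Node | null {
  if (points.length === 0) return null;
  const axis = (depth % 2) as 0 | 1;
  const key = axis === 0 ? "x" : "y";
  const sorted = [...points].sort((a, b) => a[key] - b[key]);
  const mid = Math.floor(sorted.length / 2);
  return {
    point: sorted[mid],
    axis,
    left: build(sorted.slice(0, mid), depth + 1),
    right: build(sorted.slice(mid + 1), depth + 1),
  };
}

// Range query: collect every point inside the rectangle
// [minX, maxX] x [minY, maxY], pruning subtrees that cannot
// possibly contain matches.
function rangeQuery(
  node: Node | null,
  minX: number, minY: number, maxX: number, maxY: number,
  out: Point[] = []
): Point[] {
  if (!node) return out;
  const { point, axis } = node;
  if (point.x >= minX && point.x <= maxX && point.y >= minY && point.y <= maxY) {
    out.push(point);
  }
  const splitValue = axis === 0 ? point.x : point.y;
  const lo = axis === 0 ? minX : minY;
  const hi = axis === 0 ? maxX : maxY;
  if (lo <= splitValue) rangeQuery(node.left, minX, minY, maxX, maxY, out);
  if (hi >= splitValue) rangeQuery(node.right, minX, minY, maxX, maxY, out);
  return out;
}
```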

Now picture using this data structure. When you zoom the map in or out, the re-rendering of data points happens in log time instead of touching all 12,000 points. That is super efficient and mind-blowingly fast. With the k-D tree in place, I was able to reach an initial load time of under 3 seconds, with super smooth interactions.
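
Tying it back to the map: on every pan or zoom, you ask the tree only for the points inside the current viewport and draw (or cluster) just those. A rough usage sketch, reusing `rangeQuery` from above and treating longitude as x and latitude as y (`redrawMarkers` is a hypothetical helper, not a Leaflet API):

```ts
// Assumes `map` is a Leaflet map and `tree` was built once, up front,
// from all ~12,000 events (longitude as x, latitude as y).
map.on("moveend", () => {
  const b = map.getBounds();
  const visible = rangeQuery(
    tree,
    b.getWest(),  // minX (west longitude)
    b.getSouth(), // minY (south latitude)
    b.getEast(),  // maxX (east longitude)
    b.getNorth()  // maxY (north latitude)
  );
  redrawMarkers(visible); // hypothetical helper: render only what's on screen
});
```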

Conclusion

This was an interesting project. I was able to pick a slow website with poor UX and make it fast. We started with a 30-second initial load time and brought it down to under 3 seconds. That's a reduction of over 90%.

The important lessons here: the framework matters, optimizing for UX matters, and algorithms definitely matter. Interestingly, observation plays a crucial role in optimizing for UX. You need to decide what "better" means in your particular context; we cannot optimize what we don't define.

You can take a look at the final result here.

