A case study for low architectural complexity

Dan Luu published an article presenting Wave as a case study for a business model where a simple, boring architecture fits best. Instead of a state-of-the-art asynchronous service-based architecture, they use a synchronous monolith backed by a database and serving a unified API.

(…) for most types of applications, even at the traffic levels of the top 100 sites, computers are fast enough that high-traffic applications can be served with simple architectures, which can usually be built more cheaply and easily than complex architectures.

The author states that Wave’s architecture is based on a Python monolith serving CRUD queries on top of a Postgres database.

Its server processes block while waiting on I/O, including network requests. The company experimented with asynchronous frameworks such as Eventlet, but found them immature enough to cause significant operational overhead, and ultimately decided not to use them.
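
The blocking model described above can be sketched roughly as follows. This is a minimal illustration, not Wave's code: the handler name `fetch_balance`, the simulated I/O delay, and the use of threads as a stand-in for separate server processes are all assumptions made for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_balance(account_id):
    # Hypothetical synchronous handler. The sleep simulates a blocking
    # database or network call: the worker does nothing else meanwhile.
    time.sleep(0.01)
    return {"account_id": account_id, "balance": account_id * 100}


def max_throughput(workers, avg_request_seconds):
    # Upper bound on requests per second for a pool of blocking workers:
    # each worker completes at most 1 / avg_request_seconds requests per second.
    return workers / avg_request_seconds


def serve(account_ids, workers=4):
    # Concurrency comes from running several blocking workers side by side,
    # not from an asynchronous event loop. Threads stand in for processes here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_balance, account_ids))
```

In this model, capacity is raised by adding workers rather than by making each worker handle several requests at once, which is the trade-off between hardware cost and engineering time that the article goes on to discuss.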

The cost of CPUs sitting idle while they wait is not negligible. However, at Wave's traffic volumes, the cost of the engineering team is much higher. Given their business model and current traffic levels, the relatively low computational load per unit of revenue does not justify spending engineering time on reducing hardware costs.

Long-running tasks that do not need to return a response to the request that triggers them are sent to a queue instead. The queue is backed by RabbitMQ.
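
As a rough sketch of this pattern, the request handler below enqueues the slow work and returns immediately, while a separate worker drains the queue. Python's in-process `queue.Queue` is used here purely as a stand-in for the RabbitMQ-backed queue, and the names `handle_transfer_request` and `send_receipt` are hypothetical.

```python
import queue
import threading

# Stand-in for the RabbitMQ-backed queue (an assumption for illustration only).
task_queue = queue.Queue()
completed = []


def send_receipt(user_id):
    # Hypothetical slow task whose result the caller does not need.
    completed.append(f"receipt sent to user {user_id}")


def handle_transfer_request(user_id, amount):
    # The synchronous part of the request finishes here; the slow part is
    # deferred to the queue so the response is not held up by it.
    task_queue.put((send_receipt, (user_id,)))
    return {"status": "ok", "user_id": user_id, "amount": amount}


def worker():
    # Runs outside the request path and processes tasks one at a time.
    while True:
        item = task_queue.get()
        if item is None:  # sentinel value: stop the worker
            task_queue.task_done()
            break
        func, args = item
        func(*args)
        task_queue.task_done()


def run_demo():
    completed.clear()
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    response = handle_transfer_request(42, 1000)
    task_queue.join()     # wait until the deferred task has been processed
    task_queue.put(None)  # ask the worker to exit
    thread.join()
    return response, list(completed)
```

Calling `run_demo()` returns the immediate response together with the list of background tasks that were completed afterwards.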

At the platform level, they chose Kubernetes. The reasoning was that, as the business grew and expanded into other countries, differing regulations would eventually require them to run their systems or databases locally. A monolith is also easier to split as needed to comply with local laws and regulations than a complex service-based architecture.

Wave has adopted GraphQL for the API layer. The ability to document the API and generate code with exact return types has led to safer clients. The composability of the query language lets all Wave applications essentially share a single API, which reduces complexity and lets clients avoid unnecessary network round trips by retrieving only the data they need.
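
The idea of a client asking for exactly the fields it needs, across nested objects and in a single request, can be illustrated with a toy field selector. This is not a GraphQL implementation and does not reflect Wave's schema; the `select` function, the `profile` data, and the `selection` shape are all hypothetical.

```python
def select(data, selection):
    """Return only the fields named in `selection`, recursing into nested objects.

    `selection` maps a field name to None (take the value as is) or to a
    nested selection, loosely mirroring a GraphQL selection set.
    """
    if isinstance(data, list):
        return [select(item, selection) for item in data]
    result = {}
    for field, sub_selection in selection.items():
        value = data[field]
        result[field] = select(value, sub_selection) if sub_selection else value
    return result


# Hypothetical server-side data for one user.
profile = {
    "id": 7,
    "name": "Awa",
    "wallet": {"balance": 5000, "currency": "XOF"},
    "transactions": [
        {"id": 1, "amount": -200, "memo": "airtime"},
        {"id": 2, "amount": 1500, "memo": "deposit"},
    ],
}

# One request shape covering three related objects, naming only needed fields.
selection = {
    "name": None,
    "wallet": {"balance": None},
    "transactions": {"amount": None},
}

result = select(profile, selection)
```

Here a single selection replaces what might otherwise be three separate calls (user, wallet, transactions), and fields such as `memo` or `currency` are never sent to the client.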

Application data is transported over HTTP/3. Its underlying QUIC protocol is a better match for operational constraints in the field, such as unreliable mobile data service and low bandwidth. Wave now falls back to a custom transport protocol over USSD only in emergencies.

The author points out that choosing Kubernetes or GraphQL brought additional complexity, but their advantages outweigh the disadvantages.

By keeping our application architecture as simple as possible, we can spend our complexity (and headcount) budget where there is complexity that benefits our business.

When the company started, their preference was to buy software rather than build it, to save time for what was then a small team of engineers. When vendors cannot solve a specific problem or provide a solution tailored to their needs, it can make sense, both financially and operationally, to take on the additional complexity in-house.

An example is integration with telecommunications providers. Wave needs SMS services, but the leading SaaS SMS provider does not operate in all of its target countries, and the cost of service would be prohibitive. In this situation, the author states that “the team that provides the telecom integrations pays for itself many times over.”

In hindsight, there are some choices made during the initial design and build phases that they would not make as readily if they were building a similar system today (e.g., using RabbitMQ or Python). However, the current operational drawbacks are not significant enough to justify migrating to a different technology.

Sharon D. Cole