Reduce Socket Server Memory Footprint with uWebSockets.js

As engineering teams look to save money wherever they can, cutting infrastructure costs is clear low-hanging fruit. One area where significant gains can be made is optimizing the usage of WebSockets. In this technical exploration, we'll delve into the decision-making process behind choosing WebSockets and the journey of migrating from the default NodeJS WebSocket implementation to uWebSockets.js.

Why WebSockets?

WebSockets have become a staple in modern web applications, providing a bidirectional communication channel between clients and servers. Unlike traditional HTTP requests, WebSockets facilitate real-time data exchange without the need for constant polling. This makes them an ideal choice for applications that require instant updates, such as chat applications, live notifications, and online collaboration.
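
For context, a minimal NodeJS WebSocket server, sketched here with the popular ws package (the kind of setup we later migrated away from), looks like this; the port is arbitrary:

import { WebSocketServer } from 'ws'

// echo every message back to the client that sent it
const wss = new WebSocketServer({ port: 8080 })
wss.on('connection', (ws) => {
  ws.on('message', (data) => ws.send(data))
})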

Self-hosted vs. Cloud-based server

Before diving into the specifics of uWebSockets.js, it's essential to consider the deployment environment. Choosing between a self-hosted server and a cloud-based solution can significantly impact the performance and cost of running your WebSocket-powered application.

Self-hosted servers provide more control over infrastructure but may require additional maintenance and expertise. On the other hand, cloud-based solutions offer scalability and ease of management, but they come with associated costs. What's more, if you hit limitations in your cloud-hosted solution, your only recourse is to create hacky workarounds or abandon the effort in favor of a self-hosted solution. The decision depends on factors like the scale of your application, budget constraints, and the level of control your team requires.

In our case, we had to serve clients in air-gapped deployments, so a dependency on a cloud-based server was untenable. Even without this constraint, we likely would have opted for the self-hosted solution due to cost and flexibility. While a cloud-based solution is quick and inexpensive to start with, the price starts to hurt as you grow and achieve product market fit.

Migrating to uWebSockets.js

uWebSockets.js is a NodeJS framework, similar to Express but implemented in native C++. By electing to use it, you're sacrificing simplicity for performance, so make sure the tradeoff is worthwhile before adopting it. That said, it is a clear winner in terms of performance, and cutting-edge runtimes such as Bun use it by default, so it has significant backing.
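
As a first taste, here is a minimal HTTP app (a sketch; the port and route are arbitrary). One gotcha worth noting up front: unlike Express, uWebSockets.js passes the response object first.

import uWS from 'uWebSockets.js'

uWS.App()
  .get('/hello', (res, req) => {
    // note the (res, req) order, reversed from Express
    res.end('Hello from uWebSockets.js')
  })
  .listen(3000, (token) => {
    if (!token) console.error('Failed to listen on port 3000')
  })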

Ending the Response

One of the first challenges encountered during the migration to uWebSockets.js is understanding how a response can end. The server can send a terminating write, e.g. res.end(), or the client can hang up between asynchronous actions like a DB read. This complexity is hidden away by higher-level frameworks, but with uWebSockets, it's your responsibility to make sure each response is terminated once, and only once. That means no calling res.end() twice, and no calling it after the client hangs up. To fulfill that contract, you can either write perfect code, or you can write an abstraction that is a bit more forgiving. For example, to prevent res.end() from getting called twice, you can proxy it. Below, we set a one-way done flag on the response object that short-circuits any second call.

// cache the original, then make end() idempotent
res._end = res.end
res.end = (body) => {
  if (res.done) return res // already terminated; ignore the repeat call
  res.done = true
  return res._end(body)
}

To handle client-side hangups and support handler-specific cleanup, we can start each handler with the following:

// register the real abort handler exactly once per response
res.onAborted(() => {
  res.done = true
  if (res.abortEvents) {
    res.abortEvents.forEach((f) => f())
  }
})

// then overwrite onAborted so any number of cleanup callbacks can be added
res.onAborted = (handler) => {
  res.abortEvents = res.abortEvents || []
  res.abortEvents.push(handler)
  return res
}

// deeply nested in a handler
res.onAborted(() => {
  readStream.destroy()
})

Here, we first call the real onAborted, which uWebSockets.js requires before any asynchronous work can safely be performed. Next, we overwrite that method with one that supports multiple registrations. In doing so, a deeply nested handler can clean up a stream without worrying about code somewhere else attempting to do the same thing.
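
Putting the two patches together, a handler that does asynchronous work stays safe even if the client disconnects mid-request. A sketch, assuming app is our uWS.App() instance and fetchUser is a hypothetical data access helper:

app.get('/user/:id', (res: HttpResponse, req: HttpRequest) => {
  const id = req.getParameter(0) // read request data before any async work
  res.onAborted(() => console.log('client hung up'))
  fetchUser(id).then((user) => {
    // if the client already aborted, the patched end() is a harmless no-op
    res.end(JSON.stringify(user))
  })
})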

Corking

Corking was introduced in v16.5.0 as an optional performance strategy: it buffers several writes, such as the status, headers, and body, so they're flushed to the socket as one. In v20.30.0, it became required, and the absence of a cork resulted in a console warning. While we weren't motivated to cork every response to increase throughput, we quickly noticed that the warnings flooded our server logs, and had to devise a plan to ensure every message was corked.

The problem was that our handlers would eagerly write to the HTTP response object. For example,

// what we had
const handler = async (req: HttpRequest, res: HttpResponse) => {
  res.writeStatus('200 OK') // write #1
  const data = await getData(res)
  res.end(data) // write #2
}

// what we needed
const handler = async (req: HttpRequest, res: HttpResponse) => {
  const data = await getData(res)
  res.cork(() => {
    // corked into a single write
    res.writeStatus('200 OK').end(data)
  })
}

While it is technically possible to rewrite each route handler to comply with the new pattern, it would take a great deal of time, and any future code edit would be susceptible to the same mistake. A better approach is to remove the footgun so it isn't possible for future developers to repeat it.

To achieve this, we proxied the res object to cache all writes until the response was complete.

const patchRes = (res: HttpResponse) => {
  // Proxy `writeStatus` to only cache the status in memory
  res._writeStatus = res.writeStatus
  res.writeStatus = (status: RecognizedString) => {
    res.status = status
    return res
  }

  // Proxy `end` to retrieve the status and perform the cork and writes at once
  res._end = res.end
  res.end = (body?: RecognizedString) => {
    if (res.done) return res
    res.done = true
    return res.cork(() => {
      if (res.status) res._writeStatus(res.status)
      return res._end(body) // call the original, not the proxy
    })
  }
}

This ensures that all code, both existing and future, corks its writes correctly 🎉.
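
To apply the patch universally, each handler can be wrapped at registration time. A sketch, where withRes and myHandler are hypothetical names and app is the uWS.App() instance:

const withRes =
  (handler: (req: HttpRequest, res: HttpResponse) => void) =>
  (res: HttpResponse, req: HttpRequest) => {
    patchRes(res)
    handler(req, res)
  }

app.get('/data', withRes(myHandler))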

Streaming Buffers

Vanilla NodeJS is not good at streaming media content. While the best solution is to offload that work to a CDN, there are some cases where that may be overkill, or where using NodeJS is inescapable. For such use cases, uWebSockets.js provides primitives for handling streaming buffers and backpressure efficiently. The cleanest, most efficient function I could write creates a stream from the filesystem or a remote server and pipes it to the client as follows:

import fs from 'fs'
import { HttpResponse } from 'uWebSockets.js'

const pipeStreamOverResponse = (
  res: HttpResponse,
  readStream: fs.ReadStream,
  totalSize: number
) => {
  res.onAborted(() => {
    readStream.destroy()
    res.aborted = true
  })
  readStream
    .on('data', (chunk: Buffer) => {
      if (res.aborted) {
        readStream.destroy()
        return
      }
      // copy the chunk into an ArrayBuffer, as uWebSockets.js expects
      const ab = chunk.buffer.slice(chunk.byteOffset, chunk.byteOffset + chunk.byteLength)
      const lastOffset = res.getWriteOffset()
      const [ok, done] = res.tryEnd(ab, totalSize)
      if (done) readStream.destroy()
      if (ok) return
      // backpressure found!
      readStream.pause()
      // save the current chunk & its offset
      res.ab = ab
      res.abOffset = lastOffset

      // set up a drain handler to retry once the socket is writable again
      res.onWritable((offset) => {
        const [ok, done] = res.tryEnd(res.ab.slice(offset - res.abOffset), totalSize)
        if (done) {
          readStream.destroy()
        } else if (ok) {
          readStream.resume()
        }
        return ok
      })
    })
    .on('error', () => {
      if (!res.aborted) {
        res.cork(() => {
          res.writeStatus('500 Internal Server Error').end()
        })
      }
      readStream.destroy()
    })
}
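
Wiring this up to a route might look like the following, assuming app is the uWS.App() instance; the file path is hypothetical:

app.get('/video', (res: HttpResponse, req: HttpRequest) => {
  const path = './movie.mp4' // hypothetical media file
  const totalSize = fs.statSync(path).size
  pipeStreamOverResponse(res, fs.createReadStream(path), totalSize)
})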

Server Sent Events (SSE)

In addition to WebSockets, uWebSockets.js supports Server Sent Events (SSE), which enable servers to push updates to clients over a single HTTP connection. Integrating SSE alongside WebSockets can support clients who are stuck behind firewalls that terminate WebSockets prematurely. First, you'll have to disable Nagle's Algorithm, a TCP optimization that batches small chunks together. If your app is behind an ingress like Nginx, you may have to disable its buffering via a response header, too. Finally, tryEnd accepts the total content size as a second argument; if this value is set arbitrarily large, the response will never be considered complete, so the connection stays open. The end result is an initial response that looks like this:

import { randomUUID } from 'crypto'

globalThis.clients = {}

const SSEConnectionHandler = (req: HttpRequest, res: HttpResponse) => {
  const id = randomUUID()
  globalThis.clients[id] = res
  // forget the client if it hangs up
  res.onAborted(() => delete globalThis.clients[id])
  res
    .writeHeader('content-type', 'text/event-stream')
    .writeHeader('cache-control', 'no-cache')
    .writeHeader('connection', 'keep-alive')
    .writeHeader('x-accel-buffering', 'no')
  // tryEnd returns a tuple rather than `res`, so it can't be chained;
  // the arbitrarily large total size (1e8) keeps the connection open
  res.tryEnd(`retry: 1000\n\n`, 1e8)
  // SSE events need a data line and a blank-line terminator to be dispatched
  res.tryEnd(`event: id\ndata: ${id}\n\n`, 1e8)
}

const broadcast = async () => {
  const data = await getData()
  Object.values(globalThis.clients).forEach((res) => {
    res.tryEnd(`data: ${data}\n\n`, 1e8)
  })
}

Above, each new SSE response is kept in a global lookup table. When the server wants to push a new message to a client, it re-uses that client's existing HTTPResponse.
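
On the client side, the standard EventSource API consumes the stream; the /sse path is an assumption about where the handler is mounted:

// browser-side consumer
const source = new EventSource('/sse')
source.onmessage = (e) => {
  console.log('push from server:', e.data)
}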

The Results

Years later, we're still thrilled to be using uWebSockets.js. During our migration to Kubernetes, we scaled out to three socket servers for redundancy and, because uWebSockets.js is so efficient, reduced the memory of each to 1GiB. Even with thousands of connected clients, it performs flawlessly. The low memory footprint paired with the improved throughput was well worth the tradeoff.