Removing node_modules from your Server for Security and Size

For years the size of node_modules has been a running joke among developers. It's not uncommon for a 20MB app to be accompanied by 2GB of dependencies. As the ecosystem evolved, node_modules became a focus of DevSecOps teams: packages run scripts on installation, phone home with telemetry, and ship executable binaries. While all of this gets bundled away for client code, it is still standard practice to upload node_modules wholesale to the server. This isn't great.

At Parabol, our app is used across a wide range of infrastructures: from our public SaaS, to privately managed instances, to air-gapped government networks. Security is paramount, and delivering each release as a single, small Docker image lets us ship updates frequently. By using webpack to bundle our server, we provide our clients with a single entry point, support instance-specific CDNs, and shrink our Docker image size by 90%, which saves us time and money.

Defining your entry points

For those new to the NodeJS ecosystem, it may be surprising to learn that installing a package can execute a variety of scripts. The output of these scripts often includes an executable in /node_modules/.bin, which is delivered with the server code. Anyone with access to the machine can execute these scripts, and we as developers have to accept that there are probably a few exploitable packages in there. What's worse, our application doesn't use 90% of these scripts at runtime.

The only way to combat this is to systematically remove everything from node_modules that our application doesn't use in production. Thankfully, there is already a tool built for exactly this: webpack! It traverses every import, extracts the dependencies, and inlines them into a single .js file.
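
To make that concrete, here is a minimal sketch of a server-side config that bundles everything. The entry path and output filename are placeholders for your own project layout:

// webpack.config.js — a minimal sketch; adjust the paths for your project
const path = require('path')

module.exports = {
  target: 'node', // don't stub out Node built-ins like fs and path
  mode: 'production',
  entry: path.resolve(__dirname, 'src', 'server.js'),
  output: {
    path: path.resolve(__dirname, 'dist'),
    filename: 'server.js',
  },
  // no `externals` entry, so every require is followed and inlined
}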

What about webpack-node-externals?

Inlining node_modules may seem counterintuitive. After all, there is a package called webpack-node-externals that is designed specifically to help you avoid doing it. It flags everything in node_modules as external, keeping it out of the bundle, which decreases build time and avoids headaches such as dynamic imports and .node binaries.

// webpack.config.js
const nodeExternals = require('webpack-node-externals')

module.exports = {
  target: 'node',
  // leave everything in node_modules as a runtime require
  externals: [nodeExternals()],
}

This strategy is fantastic for development builds, where server size doesn't matter and scripts don't have access to the production environment. When security and size are important, the effort to bundle is worth it.
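
One pattern worth sketching (the mode check is our own convention, not something webpack requires) is to externalize node_modules in development for fast rebuilds while inlining everything for the production artifact:

// webpack.config.js — a sketch: fast externalized builds in development,
// fully inlined bundle in production
const nodeExternals = require('webpack-node-externals')

module.exports = (env, argv) => ({
  target: 'node',
  mode: argv.mode,
  // only development builds treat node_modules as external
  externals: argv.mode === 'production' ? [] : [nodeExternals()],
})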

Handling Missing Dependencies

Invariably, when attempting to bundle all your node_modules into your webpack build, you'll come across a missing dependency. For example, in the popular PostgreSQL driver package pg, you'll see an error that looks like this:

Module not found: Error: Can't resolve 'pg-native' in 'node_modules/pg/lib/native'

This is because pg conditionally requires the pg-native package. You probably haven't seen that error at runtime because the require is conditional, but webpack is a static analyzer. It scans the code for keywords like require and import and follows the string literal inside those statements to build its dependency tree.
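
A simplified illustration of the pattern (not pg's exact source) shows why webpack trips over it:

// At runtime the catch swallows the missing optional dependency, but
// webpack's static analysis still sees the literal 'pg-native' and
// tries to resolve it.
let native = null
try {
  native = require('pg-native')
} catch (err) {
  // pg-native isn't installed; fall back to the pure-JavaScript client
}
module.exports = { native }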

While we could mitigate the error by installing pg-native as an app dependency, that's not ideal since we aren't actually using the package. What's more, that package may require further dependencies of its own, pulling in even more unnecessary packages.

For packages that you aren't using, a simpler approach is to tell webpack to ignore the import statement entirely:

// webpack.config.js
const webpack = require('webpack')

module.exports = {
  plugins: [
    // ignore optional dependencies that we never use at runtime
    new webpack.IgnorePlugin({ resourceRegExp: /^canvas$/, contextRegExp: /jsdom$/ }),
    new webpack.IgnorePlugin({ resourceRegExp: /^pg-native$/, contextRegExp: /pg\/lib/ }),
  ],
}

Here, we ignore the canvas import coming from jsdom and the pg-native import coming from pg. The contextRegExp isn't necessary, but it's a nice bit of self-documenting code to remind other devs and your future self which package requires it. Be aware that if an ignored import is ever reached at runtime, the require will throw, so only ignore imports you're certain your code paths never hit.

Handling Dynamic Requirements

While the above strategy works for string literals, it fails for dynamic imports. The error may not be obvious until runtime, or you may see the following during the build:

Critical dependency: require function is used in a way in which dependencies cannot be statically extracted

This can be especially true for architecture-specific packages. For example, packages such as uWebSockets.js use an ARM64 binary when running on your MacBook, but require an AMD64 binary when running on Linux in production. The offending code might look like this:

require('./uws_' + process.platform + '_' + process.arch + '_' + process.versions.modules + '.node')

To overcome this, you can hardcode those variables during compilation and webpack will concatenate the string literals into a single static import:

// webpack.config.js
const webpack = require('webpack')

module.exports = {
  plugins: [
    // replace each expression with the build machine's literal value so the
    // dynamic require collapses into a single static string
    new webpack.DefinePlugin({
      'process.platform': JSON.stringify(process.platform),
      'process.arch': JSON.stringify(process.arch),
      'process.versions.modules': JSON.stringify(process.versions.modules),
    }),
  ],
}

// resulting output module
process.dlopen(
  module,
  __dirname +
    __webpack_require__(123).sep +
    __webpack_require__.p +
    'node/binaries/uws_darwin_arm64_115.node'
)

Above, we see webpack was able to reference the .node binary correctly, ignoring the dozens of other prebuilt binaries and extracting only this one into the output directory. True, this strategy means that your build architecture must match your runtime architecture, but you were already building on the same architecture that you run on, right?
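
If your build and runtime architectures ever do diverge (say, building on an arm64 Mac for an amd64 Linux host), one workaround is to hardcode the target's values instead of the build machine's. This is only a sketch: TARGET_PLATFORM and TARGET_ARCH are hypothetical environment variables you would set yourself in CI, and the Node ABI version (process.versions.modules) must still match the Node version on the target:

// webpack.config.js — a sketch for cross-compiling; TARGET_PLATFORM and
// TARGET_ARCH are hypothetical env vars set in CI, defaulting to the
// build machine's own values when unset
const webpack = require('webpack')

module.exports = {
  plugins: [
    new webpack.DefinePlugin({
      'process.platform': JSON.stringify(process.env.TARGET_PLATFORM || process.platform),
      'process.arch': JSON.stringify(process.env.TARGET_ARCH || process.arch),
      'process.versions.modules': JSON.stringify(process.versions.modules),
    }),
  ],
}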

Handling complex .node binary references

Generally, .node binaries are the most difficult part of bundling node_modules into your build. The above case of uWebSockets.js is easy because the author included prebuilt binaries in the package. But what if the binaries are built during installation?

Take, for example, the famous sharp package used for image manipulation. It has a dependency on a vendored library called libvips. When sharp is installed, it uses a file called binding.gyp to build a reference to the included libvips binary. Unfortunately, that reference maps to the node_modules folder structure, not the structure of our output directory. Since rewiring references between two binaries is outside the scope of JavaScript, we could either write a script to rewrite the bindings and reinstall, or adjust our output directory structure.

Inside binding.gyp we see sharp expects to find libvips two directories up. Therefore, all we have to do is nest the binary two directories deep in the output directory. In webpack, that looks like this:

// webpack.config.js
const path = require('path')
const CopyWebpackPlugin = require('copy-webpack-plugin')

module.exports = {
  plugins: [
    new CopyWebpackPlugin({
      patterns: [
        {
          // copy sharp's libvips to the output
          from: path.resolve(__dirname, 'node_modules', 'sharp', 'vendor'),
          to: 'vendor',
        },
      ],
    }),
  ],
  module: {
    rules: [
      {
        include: [/node_modules/],
        test: /\.node$/,
        use: [
          {
            loader: 'node-loader',
            options: {
              // place sharp 2 directories deep so it finds ../../vendor
              name: 'node/binaries/[name].[ext]',
            },
          },
        ],
      },
    ],
  },
}

Notice that the name is prefixed with two directories (node/binaries/). While this rule is applied to all .node binaries, we could easily target only sharp in the case of conflicting requirements, as sketched below.
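
For example (a hypothetical variant, not our production config), narrowing the rule's include to sharp's directory scopes the nested path to sharp alone:

// a hypothetical variant of the rule above, inside module.rules:
// only sharp's binaries get the nested path
{
  test: /\.node$/,
  include: [path.resolve(__dirname, 'node_modules', 'sharp')],
  use: [
    {
      loader: 'node-loader',
      options: { name: 'node/binaries/[name].[ext]' },
    },
  ],
},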

Substitution When Possible

While the above patterns work, the best option is often replacing node binaries with plain old JavaScript. For example, the secure hashing package bcrypt can be replaced with bcryptjs. You may sacrifice some performance, but in general the difference is on the scale of fractional milliseconds. That generally does not matter for anything except the most computationally expensive tasks (e.g. image manipulation, model inference).
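
One low-friction way to make the swap is a webpack alias, so application code keeps importing bcrypt while the bundle inlines bcryptjs. This is a sketch: bcryptjs mirrors bcrypt's common API (hash, compare, genSalt), but verify the methods you rely on before shipping it:

// webpack.config.js — a sketch: alias the native package to its pure-JS
// substitute without touching application code
module.exports = {
  resolve: {
    alias: {
      bcrypt: 'bcryptjs',
    },
  },
}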

Bundling Client Code for Server Side Rendering (SSR)

If your server does any SSR, it's likely that you include your client bundle alongside your server bundle. This can be a headache because much of that code isn't necessary, and in fact cannot be accessed in a server environment.

By using webpack, your server bundle will only include the necessary client code. If you do any code splitting (i.e. lazy imports), you may notice that this results in more than one .js file per entry point. To instruct webpack to treat those lazy imports as static and merge all chunks into one, you can set the following config:

// webpack.config.js
module.exports = {
  module: {
    parser: {
      javascript: {
        dynamicImportMode: 'eager',
      },
    },
  },
}
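
With that setting, a lazy import still resolves as a promise at runtime, but its code is merged into the main chunk instead of emitting a separate file. For example (./routes/render is a hypothetical lazy-loaded SSR module):

// the dynamic import below still returns a promise, but with
// dynamicImportMode: 'eager' its code ships inside the main bundle
const renderRoute = async (req, res) => {
  const { renderPage } = await import('./routes/render')
  res.end(await renderPage(req.url))
}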

Conclusion

By using webpack to bundle your server code, you can eliminate the need to ship the bloated node_modules and client assets with your code. From a security perspective, this is a huge win because there is only a single entry point. Attackers can only reach code paths that your application actually imports, not every script and binary that happens to live in node_modules.

Personally, I have found that this speeds up security audits as well. Common Vulnerabilities and Exposures (CVEs) flagged from lockfile dependencies can be quickly dismissed during an audit when the affected code never makes it into the bundle. This is incredibly useful because all too often the CVEs identified are for packages that are only used in building, development, or testing.

When it comes to size, we saw a 95% smaller server application. While it does take an extra 15-30 seconds to bundle node_modules, that time in CI is well worth the savings in throughput and storage costs.