tags: [testing, mocking, esm, cjs, transformers, injection, seams, unit-testing, vitest]
related:
packages/nlp/semantic-resolver.js
packages/nlp/tests/semantic-resolver.test.js
packages/testing/tests/nl-runner.test.js
status: current

026 — The Test Seam for Heavy Dependencies

The semantic resolver uses @xenova/transformers to embed UI contracts. The model is 23MB, runs in WASM, and takes several seconds to load on first call. In production, that’s fine — the model loads once and caches for the process lifetime. In a unit test suite that runs hundreds of tests in two seconds, it’s a non-starter.

The obvious fix: mock @xenova/transformers with vi.mock(). Intercept the import, return a fake pipeline, done.

Except it doesn’t work.

Why Module Mocking Fails Here

vi.mock() (and Jest’s jest.mock()) intercepts module loads at the module registry level. When you write vi.mock('@xenova/transformers', () => ...), Vitest hoists the call and installs a fake before the import for that module resolves.

That works for static imports in files the test runner itself transforms. Jest’s version also covers require(); Vitest’s documentation notes that vi.mock() only applies to modules loaded with the import keyword. It does not reliably work for dynamic import() expressions, especially when the file being tested is a CommonJS module that contains a dynamic await import(...) internally.

semantic-resolver.js is a CommonJS file (.js without "type": "module" in package.json). It loads @xenova/transformers lazily via await import('@xenova/transformers') inside its getPipeline() function. The module system sees this as a dynamic ESM import triggered at runtime, not a static dependency resolvable at collect time. vi.mock() doesn’t reach it.

The result: every test that calls buildSemanticIndex or resolveSemanticSelector hangs for five seconds while the test runner waits for the real model download to time out.

The Test Seam

The fix is one private variable and one exported function:

// semantic-resolver.js

let _pipeline = null;
let _pipelineOverride = null;

async function getPipeline() {
  if (_pipelineOverride) return _pipelineOverride;
  if (_pipeline)         return _pipeline;
  const { pipeline } = await import('@xenova/transformers');
  _pipeline = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', { quantized: true });
  return _pipeline;
}

function _setPipelineForTest(fn) {
  _pipelineOverride = fn;
}

module.exports = { buildSemanticIndex, resolveSemanticSelector, contractToText, _setPipelineForTest };

The _ prefix is load-bearing: it signals “this is test infrastructure, not production API.” A caller reading the exports knows immediately that _setPipelineForTest is not meant for application code.

In tests:

beforeAll(() => _setPipelineForTest(fakePipeline));
afterAll(()  => _setPipelineForTest(null));

fakePipeline is a deterministic function that returns embeddings based on keyword presence — no WASM, no network, no disk access, no waiting. Tests run in milliseconds.
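The set-and-reset lifecycle can be sketched without a test runner. This is a minimal illustration, not the resolver’s actual file: loadRealPipeline is a hypothetical stand-in for the dynamic import of @xenova/transformers, and demo() only mimics what beforeAll and afterAll do.

```javascript
// Hypothetical stand-in for the expensive dynamic import; it counts its calls.
let realLoads = 0;
async function loadRealPipeline() {
  realLoads += 1;
  // Placeholder "model": returns an all-zero tensor-like object.
  return async (texts) => ({ dims: [texts.length, 4], data: new Float32Array(texts.length * 4) });
}

let _pipeline = null;
let _pipelineOverride = null;

async function getPipeline() {
  if (_pipelineOverride) return _pipelineOverride; // test path: checked first
  if (_pipeline) return _pipeline;                 // production path: cached
  _pipeline = await loadRealPipeline();            // production path: first call
  return _pipeline;
}

function _setPipelineForTest(fn) {
  _pipelineOverride = fn;
}

// A trivially recognizable fake.
const fake = (texts) => ({ dims: [texts.length, 1], data: new Float32Array(texts.length) });

async function demo() {
  _setPipelineForTest(fake);                 // what beforeAll does
  const duringTests = await getPipeline();
  const loadsWhileOverridden = realLoads;    // the real loader has not run yet
  _setPipelineForTest(null);                 // what afterAll does
  const afterTests = await getPipeline();    // falls through to the real loader
  return {
    usedFake: duringTests === fake,
    loadsWhileOverridden,
    restored: afterTests !== fake,
  };
}
```

Resetting the override to null in afterAll matters because the override is module-level state: without the reset, any later test file that shares the module instance would silently keep using the fake.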

What the Fake Pipeline Needs to Do

The real pipeline takes an array of text strings and returns a tensor with shape [N, hidden_size]. The fake needs the same shape.

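// One embedding dimension per keyword.
const KEYWORDS = ['confirm', 'submit', 'approve', 'create', 'new', 'delete', 'customer'];
const DIM = KEYWORDS.length;

// Mimics the output shape of the real pipeline: { dims: [N, DIM], data: Float32Array }.
function fakePipeline(texts) {
  const data = new Float32Array(texts.length * DIM);
  texts.forEach((text, i) => {
    const v = new Float32Array(DIM); // zero-initialized
    // Substring match: set dimension j to 1 if keyword j appears in the text.
    KEYWORDS.forEach((kw, j) => { if (text.toLowerCase().includes(kw)) v[j] = 1; });
    // L2-normalize; "|| 1" avoids dividing by zero when no keyword matched.
    const mag = Math.sqrt(v.reduce((s, x) => s + x * x, 0)) || 1;
    for (let j = 0; j < DIM; j++) data[i * DIM + j] = v[j] / mag;
  });
  return { dims: [texts.length, DIM], data };
}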

This produces normalized vectors based on keyword overlap. “confirm the order” gets a high value on the confirm dimension. “delete product” gets a high value on delete. Cosine similarity between texts that share keywords is high; between texts that share none, it is zero. A text that matches no keyword at all maps to the zero vector. The test corpus has enough structure that the scorer behaves predictably.
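A small worked example makes the geometry concrete. The fake is repeated here so the block runs on its own; cosineBetweenRows is a hypothetical helper added for illustration, and the three input strings are made up.

```javascript
const KEYWORDS = ['confirm', 'submit', 'approve', 'create', 'new', 'delete', 'customer'];
const DIM = KEYWORDS.length;

function fakePipeline(texts) {
  const data = new Float32Array(texts.length * DIM);
  texts.forEach((text, i) => {
    const v = new Float32Array(DIM);
    KEYWORDS.forEach((kw, j) => { if (text.toLowerCase().includes(kw)) v[j] = 1; });
    const mag = Math.sqrt(v.reduce((s, x) => s + x * x, 0)) || 1;
    for (let j = 0; j < DIM; j++) data[i * DIM + j] = v[j] / mag;
  });
  return { dims: [texts.length, DIM], data };
}

// Hypothetical helper: rows are already unit length (or all zero),
// so the dot product of two rows is their cosine similarity.
function cosineBetweenRows(output, i, j) {
  const dim = output.dims[1];
  let dot = 0;
  for (let k = 0; k < dim; k++) dot += output.data[i * dim + k] * output.data[j * dim + k];
  return dot;
}

const out = fakePipeline(['confirm the order', 'confirm new customer', 'delete product']);

// Row 0 matches only "confirm"; row 1 matches "confirm", "new", "customer".
const shared = cosineBetweenRows(out, 0, 1);   // about 0.577, i.e. 1 / sqrt(3)
// Row 2 matches only "delete", so it shares nothing with row 0.
const disjoint = cosineBetweenRows(out, 0, 2); // 0
```

Because matching is by substring, a word like “renew” would also light up the new dimension. That is acceptable for a fixture as long as the test corpus is chosen with it in mind.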

The fake is not a simulation of the real model — it doesn’t need to be. Unit tests don’t verify model quality. They verify that the resolver’s scoring logic, ambiguity detection, and fallback chain behave correctly given some distance function. The fake provides that function in a form that’s fast, deterministic, and inspection-friendly.
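To show the kind of logic such tests exercise, here is a hedged sketch of a scorer with ambiguity detection. It is not the resolver’s actual code: pickCandidate, the minScore and margin thresholds, and the three status strings are all assumptions made for illustration.

```javascript
// Plain cosine similarity between two equal-length numeric arrays.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical resolver core: rank candidates, then classify the result.
function pickCandidate(queryVec, candidates, { minScore = 0.5, margin = 0.1 } = {}) {
  const ranked = candidates
    .map((c) => ({ id: c.id, score: cosine(queryVec, c.vector) }))
    .sort((x, y) => y.score - x.score);
  const [best, runnerUp] = ranked;
  // Nothing scores well enough: let the caller fall back to another strategy.
  if (!best || best.score < minScore) return { status: 'no-match', ranked };
  // Top two are too close to call.
  if (runnerUp && best.score - runnerUp.score < margin) return { status: 'ambiguous', ranked };
  return { status: 'match', id: best.id, ranked };
}

const candidates = [
  { id: 'confirm', vector: [1, 0, 0] },
  { id: 'delete', vector: [0, 1, 0] },
];

const clear = pickCandidate([1, 0, 0], candidates);     // status: 'match', id: 'confirm'
const tied = pickCandidate([1, 1, 0], candidates);      // status: 'ambiguous'
const unrelated = pickCandidate([0, 0, 1], candidates); // status: 'no-match'
```

With hand-written or keyword-based vectors, each branch can be hit on purpose, which is hard to arrange with real model output.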

The Broader Pattern

The test seam for @xenova/transformers is one instance of a pattern that applies to any heavy, lazy-loaded dependency.

The shape of the problem is always the same:

An expensive resource that shouldn’t be created until needed (model, database connection, file system handle, external API client)
A get*() function that acquires it on first call and caches it
A dynamic import or late require() that module-level mocking can’t intercept
Tests that would time out or fail if the real resource were used

The solution is always the same: separate “acquiring the resource” from “using the resource,” and put a seam in the acquisition function. The seam is a module-level override that the acquisition function checks first: tests set it, and production leaves it null and falls through to the real loader.

Database connections often follow this pattern in Node.js projects: a getDb() function that holds a _db variable, with a _setDbForTest(db) that takes a test pool. HTTP clients do too: a getClient() that wraps axios.create(), with a _setClientForTest(mock). The ML model case is less common, but the mechanics are identical.
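The database variant might look like the sketch below. Everything in it is illustrative: createRealPool stands in for whatever driver a project uses, countCustomers is a made-up caller, and the separate _dbOverride variable mirrors the resolver’s layout rather than any particular codebase.

```javascript
// Hypothetical real acquisition path; a real project would open connections here.
let realPoolsCreated = 0;
async function createRealPool() {
  realPoolsCreated += 1;
  return {
    query: async () => { throw new Error('no real database in this sketch'); },
  };
}

let _db = null;
let _dbOverride = null;

async function getDb() {
  if (_dbOverride) return _dbOverride; // test path
  if (_db) return _db;                 // cached production path
  _db = await createRealPool();
  return _db;
}

function _setDbForTest(db) {
  _dbOverride = db;
}

// Code under test: it only sees whatever getDb() returns.
async function countCustomers() {
  const db = await getDb();
  const rows = await db.query('SELECT id FROM customers');
  return rows.length;
}

// A fake pool with canned rows, as a test might define it.
const fakeDb = { query: async () => [{ id: 1 }, { id: 2 }] };
```

A test would call _setDbForTest(fakeDb) before exercising countCustomers() and _setDbForTest(null) afterwards, the same lifecycle as the pipeline seam.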

What makes the pattern work is that it doesn’t fight the module system. vi.mock() fights the module system — it tries to intercept imports before they happen. The injection seam doesn’t intercept anything; it just provides an alternative path that the code checks first. No hoisting, no ESM/CJS boundary issues, no timing problems.

The `_` Naming Convention

The underscore prefix on _setPipelineForTest and _pipelineOverride does two things.

First, it signals intent: these are not part of the public interface. A consumer reading module.exports sees buildSemanticIndex, resolveSemanticSelector, contractToText, and _setPipelineForTest. Three are clearly API. One is clearly infrastructure.

Second, it enables fast exclusion in production code review. If _setPipelineForTest appears in application code, it’s a bug. The prefix makes that immediately visible without reading the docstring.

The convention matters because the injection seam has to be exported to be usable in tests. Exporting it without marking it as non-production creates a false API surface. With the prefix, it’s exported but clearly not public. A language with visibility modifiers could enforce that distinction; here, the naming convention signals it.

What This Enables

With the test seam in place, semantic-resolver.test.js runs 16 tests in 13 milliseconds. Tests verify: embedding shapes, synonym resolution (“approve order” → confirm contract), ambiguity detection, the full fallback chain through resolveWithFallback. All without the model.

The model itself isn’t tested in unit tests — it doesn’t need to be. The model’s quality is the responsibility of its authors. What needs testing is whether the code that wraps the model — the scorer, the ranker, the ambiguity detector, the fallback chain — behaves correctly. That’s what the fake enables.

When the real model does run, it runs once: on the first translateNlStep() call in a live session. The cost is paid in production where it belongs, not in development where it compounds.

The Principle

A unit test that downloads a 23MB model is not a unit test. It’s an integration test with a timeout.

The purpose of a unit test is to isolate a unit of logic from its dependencies and verify the logic. An ML model isn’t the logic being tested — it’s a dependency. Dependencies get faked. The unit gets tested.

The test seam is how you make that separation clean when the dependency is loaded dynamically, lazily, or via mechanisms that module-level mocking can’t reach. It adds one variable, one check, and one small setter to the module, and zero complexity to the logic. The logic doesn’t know whether it’s running with the real model or the fake; it only knows what getPipeline() returns. That’s the seam.
