
setEncoding('utf8') on response body corrupts multi-byte UTF-8 characters at chunk boundaries #5002

@joecwu

Description

Bug Description

When using undici's Pool.request() and iterating over response.body with setEncoding('utf8'), multi-byte UTF-8 characters (specifically 3-byte CJK characters) that span chunk boundaries are replaced with U+FFFD (replacement character).

This does NOT occur with:

  • Node.js built-in https module's setEncoding('utf8') on the same endpoint
  • Collecting raw Buffers and calling Buffer.concat().toString('utf8') on the same undici response

Reproducible By

Verified on a production server (Node v24.14.1, undici 7.15.0) against an Elasticsearch endpoint returning ~40KB JSON containing Chinese text.

All three tests run in the same Node.js process, against the same endpoint, returning the same data:

// Run as an ES module so top-level await is valid; in CommonJS, wrap in an async function.
import { Pool } from 'undici';
import https from 'node:https';

const pool = new Pool('https://your-elasticsearch-host');
const requestOpts = {
  path: '/your-index/_search',
  method: 'POST',
  headers: {
    'Authorization': 'ApiKey YOUR_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    query: { match_all: {} },
    size: 10
  })
};

// ❌ BROKEN: undici + setEncoding('utf8')
const r1 = await pool.request(requestOpts);
let str = '';
r1.body.setEncoding('utf8');
for await (const chunk of r1.body) { str += chunk; }
console.log('undici setEncoding FFFD:', (str.match(/\ufffd/g) || []).length);
// Output: 10

// ✅ OK: undici + Buffer.concat
const r2 = await pool.request(requestOpts);
const bufs = [];
for await (const chunk of r2.body) {
  bufs.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
}
const txt = Buffer.concat(bufs).toString('utf8');
console.log('undici Buffer.concat FFFD:', (txt.match(/\ufffd/g) || []).length);
// Output: 0

// ✅ OK: Node.js https + setEncoding('utf8')
const httpsResult = await new Promise((resolve) => {
  const url = new URL('https://your-elasticsearch-host/your-index/_search');
  const req = https.request(url, {
    method: 'POST',
    headers: requestOpts.headers
  }, (res) => {
    let s = '';
    res.setEncoding('utf8');
    res.on('data', (c) => { s += c; });
    res.on('end', () => resolve(s));
  });
  req.write(requestOpts.body);
  req.end();
});
console.log('https setEncoding FFFD:', (httpsResult.match(/\ufffd/g) || []).length);
// Output: 0

The corrupted characters are consistently 3-byte UTF-8 CJK characters (e.g., U+50B3 = bytes e5 82 b3) that fall on chunk boundaries. The corruption is deterministic — same request always produces FFFD at the same positions.

Expected Behavior

setEncoding('utf8') on an undici response body should produce output identical to Buffer.concat().toString('utf8'). The internal StringDecoder should buffer incomplete multi-byte sequences across chunk boundaries, as Node.js's built-in https module does.

Logs & Screenshots

# Same request, same data, same Node.js process:
undici Pool + setEncoding('utf8'):     10 FFFD  ← broken
undici Pool + Buffer.concat:            0 FFFD  ← correct
Node.js https + setEncoding('utf8'):    0 FFFD  ← correct

Environment

  • OS: Alpine Linux (Docker node:24-alpine)
  • Node.js: v24.14.1
  • undici: 7.15.0
  • Upstream server: Elasticsearch 9.2.0 (chunked transfer-encoding, application/json; charset=utf-8)

Additional context

This issue was discovered through the @elastic/elasticsearch Node.js client (v9.1.1), which uses @elastic/transport (v9.1.2). The transport layer calls response.body.setEncoding('utf8') in UndiciConnection.js, causing all JSON responses containing CJK characters to be silently corrupted (~1 character per ~4KB of response body).

Downstream impact: any application using undici with setEncoding('utf8') for non-ASCII text is affected.
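Until this is fixed, affected callers can avoid setEncoding() entirely. Besides the Buffer.concat approach above, a streaming workaround is to decode incrementally with the WHATWG TextDecoder, whose { stream: true } mode carries incomplete byte sequences across chunks. A sketch (readBodyUtf8 is a hypothetical helper name, not part of undici):

```javascript
// Hypothetical helper (not part of undici): decode a response body stream
// to UTF-8 without setEncoding(). TextDecoder with { stream: true }
// retains a trailing partial multi-byte sequence between chunks.
async function readBodyUtf8(body) {
  const decoder = new TextDecoder('utf-8');
  let out = '';
  for await (const chunk of body) {
    out += decoder.decode(chunk, { stream: true });
  }
  out += decoder.decode(); // flush any buffered partial sequence
  return out;
}

// Usage sketch: const text = await readBodyUtf8(response.body);
```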
