Forum: The Computer Express

Asymmetric UTF-8 autoterm detection corrupts user charset settings

From HM Derdok@1:103/705 to GitLab issue in main/sbbs on Sat Mar 28 12:07:48 2026

open https://gitlab.synchro.net/main/sbbs/-/issues/1106

# Bug Report: Asymmetric UTF-8 autoterm detection corrupts user charset settings

**Date:** 2026-03-28
**Affects:** Synchronet BBS v3.20+ (confirmed on v3.21)
**Component:** `src/sbbs3/logon.cpp`, `src/sbbs3/answer.cpp`
**Related commit:** `aa0539b89` ("What appears to be a more complete fix for auto vs manual terminal adjustment", 2026-01-03)

---

## Summary

There are two interrelated issues that combine to corrupt registered users' terminal charset settings:

1. **One-way autoterm persistence in `logon.cpp`** — When autoterm detects UTF-8, the user record is upgraded to UTF-8, but when autoterm does NOT detect UTF-8 (i.e. CP437), the user record is never downgraded back. This means a single false positive permanently corrupts the user's charset setting.

2. **False UTF-8 detection over WebSocket proxy** — The UTF-8 BOM probe in `answer.cpp` is timing-sensitive. When the connection path includes a WebSocket intermediary (e.g. `websocketservice.js` used by fTelnet web terminals), the additional latency causes intermittent false positives in the UTF-8 cursor displacement check.

Either issue alone would be manageable, but together they create a ratchet effect: WebSocket latency occasionally triggers a false UTF-8 detection, and the one-way persistence ensures it sticks forever.

---

## How to reproduce

### Environment

- Synchronet BBS with WebSocket service enabled (`websocketservice.js`)
- fTelnet web terminal connecting via WebSocket → RLogin (or WebSocket → Telnet)
- fTelnet configured with CP437 font and `RLoginTerminalType = "ansi-bbs-cp437"`

### Steps

1. Create or use an existing registered user account (not Guest — Guest accounts get a full terminal reset on every login and are not affected)
2. Connect via fTelnet through the WebSocket proxy repeatedly (the web terminal's normal connection path)
3. After some number of logins (varies — sometimes 2-3, sometimes 10+, depends on connection latency), the user's charset will flip to UTF-8
4. Observe that extended ASCII / CP437 box-drawing characters now render incorrectly (the BBS is sending UTF-8 encoded output to a CP437 bitmap font terminal)
5. Disconnect and reconnect — the problem persists across sessions because it was written to the user record

### Why it's intermittent

The autoterm probe in `answer.cpp` works by:

1. Sending a cursor position report request (CPR: `ESC[6n`)
2. Sending a UTF-8 BOM (`0xEF 0xBB 0xBF` — 3 bytes, but only 1 character in UTF-8)
3. Sending another CPR request
4. Comparing the two cursor positions — if the cursor moved < 3 columns, the terminal consumed the BOM as a single UTF-8 character → UTF-8 detected
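
The four steps above can be sketched as a parser plus a decision rule. This is a minimal illustration, not the code in `answer.cpp`: the names `CursorPos`, `parse_cpr`, and `bom_probe_detects_utf8` are hypothetical, and the only behavior taken from the description is the "moved fewer than 3 columns" test.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Cursor position as reported by a CPR reply of the form ESC [ row ; col R.
struct CursorPos {
    int row;
    int col;
};

// Parse a single CPR reply (hypothetical helper). Returns false when the
// text is not a complete, well-formed reply.
bool parse_cpr(const std::string& reply, CursorPos& out) {
    int row = 0, col = 0;
    char final_byte = 0;
    if (std::sscanf(reply.c_str(), "\x1b[%d;%d%c", &row, &col, &final_byte) != 3)
        return false;
    if (final_byte != 'R' || row < 1 || col < 1)
        return false;
    out.row = row;
    out.col = col;
    return true;
}

// Decision rule from step 4: the BOM is 3 bytes, so a cursor that advanced
// fewer than 3 columns is taken to mean the terminal decoded it as UTF-8.
bool bom_probe_detects_utf8(const CursorPos& before, const CursorPos& after) {
    assert(before.col >= 1 && after.col >= 1); // CPR columns are 1-based
    const int bom_bytes = 3;
    return (after.col - before.col) < bom_bytes;
}
```

Note that the rule only looks at the column difference, so anything that distorts either reading (a late reply, a reply attributed to the wrong request) changes the verdict.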

Over a direct RLogin/Telnet socket, the round-trip timing is tight and the two CPR responses are almost always parsed correctly. Over the WebSocket path, the connection goes:

```
BBS ↔ local rlogin socket ↔ websocketservice.js ↔ WebSocket ↔ browser ↔ fTelnet
```

The WebSocket proxy also has connection setup overhead (opening the local socket, sending "Redirecting to server..."). This additional latency can cause the CPR responses to arrive delayed or coalesced, leading the BBS to misinterpret the cursor displacement and falsely set the UTF-8 flag.
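
One defensive approach to delayed or coalesced replies is to scan the whole receive buffer for CPR replies and refuse to decide unless exactly two are present. The sketch below is an assumption about how such a guard could look, not a description of what `answer.cpp` does; `extract_cpr_replies`, `classify_probe`, and the `Inconclusive` outcome are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct CursorPos {
    int row;
    int col;
};

// Read up to 4 decimal digits starting at pos; returns false if there are none.
static bool read_number(const std::string& buf, std::size_t& pos, int& value) {
    const std::size_t start = pos;
    value = 0;
    while (pos < buf.size() && pos - start < 4 && buf[pos] >= '0' && buf[pos] <= '9')
        value = value * 10 + (buf[pos++] - '0');
    return pos > start;
}

// Return every complete ESC[row;colR reply in the buffer, in arrival order.
std::vector<CursorPos> extract_cpr_replies(const std::string& buf) {
    std::vector<CursorPos> replies;
    std::size_t i = 0;
    while ((i = buf.find("\x1b[", i)) != std::string::npos) {
        std::size_t pos = i + 2;
        CursorPos cp{0, 0};
        const bool complete = read_number(buf, pos, cp.row)
            && pos < buf.size() && buf[pos++] == ';'
            && read_number(buf, pos, cp.col)
            && pos < buf.size() && buf[pos] == 'R';
        assert(pos <= buf.size()); // the scan never runs past the buffer
        if (complete)
            replies.push_back(cp);
        i += 2; // resume scanning after this ESC[
    }
    return replies;
}

enum class ProbeResult { Utf8, NotUtf8, Inconclusive };

// Only decide when both replies arrived; otherwise report Inconclusive
// instead of guessing from a partial reading.
ProbeResult classify_probe(const std::string& received) {
    const std::vector<CursorPos> replies = extract_cpr_replies(received);
    if (replies.size() != 2)
        return ProbeResult::Inconclusive;
    return (replies[1].col - replies[0].col) < 3 ? ProbeResult::Utf8
                                                 : ProbeResult::NotUtf8;
}
```

With a three-way result, a caller could leave the stored charset untouched on `Inconclusive` rather than writing a guess to the user record.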

---

## Expected vs. actual behavior

### Expected

If a user connects with a CP437 terminal (as declared by fTelnet's RLogin terminal type string `ansi-bbs-cp437`), and autoterm detects CP437 on this session:

- The user record should reflect CP437 (no `UTF8` flag)
- If a previous session erroneously set UTF-8, the current correct detection should clear it

### Actual

- `logon.cpp` line 176 has a condition that upgrades to UTF-8:
```cpp
|| ((autoterm & UTF8) && !(useron.misc & UTF8))
```
- There is **no symmetric condition** to downgrade when autoterm does NOT detect UTF-8 but the user record has it set
- Once a false positive sets `UTF8` on the user record, it persists indefinitely regardless of subsequent correct CP437 detections

---

## Impact / user burden

When a user's record gets stuck in UTF-8 mode:

- All CP437 box-drawing, line art, and extended ASCII renders as garbage
- ANSI art, menus, and door games display incorrectly
- The user must manually go into their Terminal Settings and change their charset back to CP437
- This fix is temporary — the next false positive will corrupt it again
- Most users don't know where this setting is or why it changed
- Sysops get support requests about "broken display" that seem random and unreproducible

**Note:** Guest/anonymous users are NOT affected because `logon.cpp` lines 83-84 do a full `useron.misc &= ~TERM_FLAGS; useron.misc |= autoterm;` reset on every guest login. The bug only affects returning registered users.

---

## Our workaround (BBS-side JavaScript)

We implemented a workaround in `mods/logon.js` that runs after the C++ `logon()` has already applied autoterm to the user record:

```javascript
(function fix_ftelnet_charset() {
    // Did this session's autoterm probe report UTF-8?
    var detected_utf8 = (console.autoterm & USER_UTF8) ? true : false;
    var is_rlogin = (bbs.sys_status & SS_RLOGIN) ? true : false;
    // Terminal type string sent by the RLogin client (empty for other protocols)
    var rterm = is_rlogin ? (bbs.rlogin_terminal || '') : '';
    var is_ftelnet_cp437 = is_rlogin && rterm.indexOf('cp437') >= 0;

    if (is_ftelnet_cp437 && detected_utf8) {
        // Clear UTF-8 for the current session
        console.autoterm &= ~USER_UTF8;
        console.autoterm |= (USER_ANSI | USER_COLOR);
        // Repair the stored record too, but only for users with auto-detect enabled
        if (user.settings & USER_AUTOTERM) {
            user.settings = (user.settings & ~USER_UTF8)
                | USER_ANSI | USER_COLOR;
        }
        log(LOG_INFO, "charset-fix: stripped UTF8 from fTelnet RLogin session"
            + " (terminal='" + rterm + "', user=" + user.alias
            + ", node=" + bbs.node_num + ")");
    }
})();
```

**Limitations of this workaround:**
- Only works for RLogin connections where we can check `bbs.rlogin_terminal` for "cp437"
- Telnet connections (guest login path) don't have a reliable terminal type identifier
- Runs after the C++ logon code has already persisted the bad value — we're patching after the fact
- Every sysop running fTelnet over WebSocket would need to implement something similar

---

## Recommended upstream fixes

### Issue 1: Asymmetric autoterm persistence (the core bug)

In `src/sbbs3/logon.cpp`, around lines 172-178, the current logic is:

```cpp
const int manual_term = ANSI | RIP | PETSCII | UTF8;
if ((useron.misc & AUTOTERM)
    || ((useron.misc & manual_term) && (useron.misc & manual_term) != (autoterm & manual_term))
    || ((autoterm & UTF8) && !(useron.misc & UTF8))) {
    useron.misc &= ~manual_term;
    useron.misc |= (AUTOTERM | autoterm);
}
```

The third condition `((autoterm & UTF8) && !(useron.misc & UTF8))` forces an upgrade to UTF-8 when detected, but there is no corresponding condition for the reverse case. A symmetric fix would be to also trigger the reset when the user record has UTF-8 but autoterm did not detect it:

```cpp
const int manual_term = ANSI | RIP | PETSCII | UTF8;
if ((useron.misc & AUTOTERM)
    || ((useron.misc & manual_term) && (useron.misc & manual_term) != (autoterm & manual_term))
    || ((autoterm & UTF8) && !(useron.misc & UTF8))
    || (!(autoterm & UTF8) && (useron.misc & UTF8))) {
    useron.misc &= ~manual_term;
    useron.misc |= (AUTOTERM | autoterm);
}
```

In practice, this might simplify — the second condition (`manual_term` mismatch) may already cover this case if evaluated correctly. The fact that the explicit one-way UTF8 condition was needed suggests the second condition wasn't catching UTF8 transitions in all cases, so a symmetric explicit condition for the downgrade path seems safest.
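
Whether the second condition already covers the downgrade path can be checked by brute force. The sketch below models both predicates and counts the inputs on which they disagree. It rests on two assumptions that should be verified against the real source: the five flags are distinct single bits (the values here are hypothetical, not those in Synchronet's headers), and the snippet quoted above is the entire condition.

```cpp
#include <cassert>

// Hypothetical bit values for illustration only.
constexpr unsigned ANSI = 1u << 0;
constexpr unsigned RIP = 1u << 1;
constexpr unsigned PETSCII = 1u << 2;
constexpr unsigned UTF8 = 1u << 3;
constexpr unsigned AUTOTERM = 1u << 4;
constexpr unsigned manual_term = ANSI | RIP | PETSCII | UTF8;

// The condition as quoted from logon.cpp in this report.
bool current_condition(unsigned misc, unsigned autoterm) {
    assert((autoterm & ~manual_term) == 0); // sketch models only these probe bits
    return (misc & AUTOTERM)
        || ((misc & manual_term) && (misc & manual_term) != (autoterm & manual_term))
        || ((autoterm & UTF8) && !(misc & UTF8));
}

// The same condition with the proposed symmetric downgrade term added.
bool proposed_condition(unsigned misc, unsigned autoterm) {
    return current_condition(misc, autoterm)
        || (!(autoterm & UTF8) && (misc & UTF8));
}

// Count every (user record, probe result) pair where the two predicates differ.
int count_disagreements() {
    int n = 0;
    for (unsigned misc = 0; misc <= (manual_term | AUTOTERM); ++misc)
        for (unsigned autoterm = 0; autoterm <= manual_term; ++autoterm)
            if (current_condition(misc, autoterm) != proposed_condition(misc, autoterm))
                ++n;
    return n;
}
```

Under these assumptions the two predicates agree on every input: a record with `UTF8` set and a probe without it already differ in the `manual_term` bits, so the mismatch condition fires. If the flag still sticks in practice, that would point at something outside this snippet, such as whether this code path runs on the affected system or when the record is written.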

### Issue 2: WebSocket proxy timing sensitivity

This is harder to fix at the C++ level since the autoterm probe timing is inherently dependent on the transport. Possible mitigations:

- Increase the timeout/retry for CPR responses in `answer.cpp` when the connection is over a WebSocket (though the BBS may not know this at probe time)
- Treat the RLogin terminal type string (`ansi-bbs-cp437`) as authoritative over the CPR-based UTF-8 detection: if the terminal explicitly declares CP437, trust that over the BOM displacement test
- The symmetric persistence fix (Issue 1) would make this self-correcting: even if a false positive occurs, the next correct detection would clear it

The fix for Issue 1 alone would largely resolve the practical impact, since false positives would no longer be permanent.
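
The second mitigation (trusting an explicit CP437 declaration over the probe) could look roughly like the following. This is a sketch under assumptions: `resolve_charset` and the `Charset` enum are hypothetical, and the substring match on "cp437" simply mirrors the check used in the JavaScript workaround above.

```cpp
#include <cassert>
#include <string>

enum class Charset { CP437, UTF8 };

// Hypothetical helper: an explicit CP437 declaration in the RLogin terminal
// type wins over the BOM displacement result; otherwise the probe decides.
Charset resolve_charset(const std::string& rlogin_terminal, bool probe_detected_utf8) {
    if (rlogin_terminal.find("cp437") != std::string::npos)
        return Charset::CP437;
    return probe_detected_utf8 ? Charset::UTF8 : Charset::CP437;
}
```

Connections that send no terminal type (an empty string here) fall through to the probe result, so this guard would only help clients that declare their charset.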

---

## References

- `src/sbbs3/logon.cpp` lines 172-178 — autoterm persistence logic
- `src/sbbs3/answer.cpp` — UTF-8 BOM autoterm probe
- Commit `aa0539b89` (2026-01-03) — introduced the one-way UTF8 upgrade condition
- Commit `ae706b805` (2025-05-19) — issue #923 fix (NO_EXASCII preservation)
- `exec/websocketservice.js` — WebSocket-to-RLogin/Telnet proxy
* Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)

From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Sat Mar 28 13:09:04 2026

https://gitlab.synchro.net/main/sbbs/-/issues/1106#note_8698

I've not been able to reproduce the "False UTF-8 detection over WebSocket proxy" issue, as described, on https://web.synchro.net, which uses fTelnet over websocketservice.js and always auto-detects "80x25 CP437 / ANSI" for clients connecting this way.
--- SBBSecho 3.37-Linux
* Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)

From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Sat Mar 28 13:32:00 2026

https://gitlab.synchro.net/main/sbbs/-/issues/1106#note_8700

Is the user in question's account configured for terminal auto-detect or not?
--- SBBSecho 3.37-Linux
* Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)

From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Sat Mar 28 13:57:38 2026

https://gitlab.synchro.net/main/sbbs/-/issues/1106#note_8701

The BBS in question is heavily modified from stock/default. Do we even know if the code in `logon.cpp` (`sbbs_t::logon()`) is being executed?

In `sbbs_t::logon()`, any user who has auto-terminal type detection enabled (and that should be almost *all* users) has their terminal-related 'misc' flags (including UTF8) overridden by whatever was auto-detected:
```
if ((useron.misc & AUTOTERM) ...
useron.misc &= ~manual_term; // these flags are forced off, Including UTF8
useron.misc |= (AUTOTERM | autoterm); // Any auto-detected term flags (including UTF8) are set
```
--- SBBSecho 3.37-Linux
* Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)

From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Sat Mar 28 18:20:58 2026

https://gitlab.synchro.net/main/sbbs/-/issues/1106#note_8702

Do you have a screenshot you can attach that demonstrates "Observe that extended ASCII / CP437 box-drawing characters now render incorrectly (the BBS is sending UTF-8 encoded output to a CP437 bitmap font terminal)"?
--- SBBSecho 3.37-Linux
* Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
