fix(baileys): prevent healthy instances from being killed after stream:error 515 #2509
Reviewer's Guide
Adds tracking for recent WhatsApp stream:error 515 events and adjusts Baileys reconnect logic so that transient 401 loggedOut events following a 515 trigger a reconnect instead of destroying the WhatsApp instance.
Sequence diagram for handling stream:error 515 followed by 401 loggedOut
sequenceDiagram
actor User
participant WhatsApp
participant BaileysClient
participant BaileysStartupService
participant EvolutionAPI
User->>WhatsApp: Connect device B
WhatsApp-->>BaileysClient: stream:error code=515
BaileysClient->>BaileysStartupService: ws event CB:stream:error
BaileysStartupService->>BaileysStartupService: _lastStream515At = Date.now()
BaileysClient->>BaileysStartupService: connection.update state=connecting
BaileysClient->>BaileysStartupService: connection.update state=open
BaileysStartupService->>EvolutionAPI: webhook connection.update state=open
WhatsApp-->>BaileysClient: 401 loggedOut (old session cleanup)
BaileysClient->>BaileysStartupService: connection.update connection=close statusCode=401
BaileysStartupService->>BaileysStartupService: recentStream515 = Date.now() - _lastStream515At < 30000
alt statusCode is loggedOut and recentStream515
BaileysStartupService->>BaileysStartupService: shouldReconnect = true
BaileysStartupService->>BaileysStartupService: connectToWhatsapp(phoneNumber)
else treated as real logout
BaileysStartupService->>EvolutionAPI: logout.instance
end
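The timeline in the sequence diagram can be replayed in isolation with a fake clock. The following is a minimal sketch, not the PR's code: the `Stream515Tracker` class, its method names, and the injectable `now` function are hypothetical and exist only to make the 30-second window testable without waiting in real time.

```typescript
// Hypothetical helper (not part of the PR): remembers when a stream:error 515 was last seen.
class Stream515Tracker {
  private lastStream515At = 0; // mirrors the PR's `_lastStream515At = 0` initial value
  private readonly graceMs: number;
  private readonly now: () => number;

  constructor(graceMs: number = 30_000, now: () => number = Date.now) {
    this.graceMs = graceMs;
    this.now = now;
  }

  // Called when a stream:error node with code 515 arrives.
  markStream515(): void {
    this.lastStream515At = this.now();
  }

  // True while the last 515 is still inside the grace window.
  isRecent(): boolean {
    return this.now() - this.lastStream515At < this.graceMs;
  }
}

// Replay of the diagram with a fake clock (all times in milliseconds).
let fakeNow = 1_000_000;
const tracker = new Stream515Tracker(30_000, () => fakeNow);

const beforeAny515 = tracker.isRecent(); // no 515 seen yet
tracker.markStream515(); // stream:error code=515 arrives
fakeNow += 2_000; // 401 loggedOut arrives 2 s later
const withinGrace = tracker.isRecent();
fakeNow += 60_000; // a 401 arriving a minute later
const afterGrace = tracker.isRecent();
```

Injecting the clock is a design choice for this sketch only; the PR reads `Date.now()` directly.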
Class diagram for updated BaileysStartupService reconnect logic
classDiagram
class ChannelStartupService
class BaileysStartupService {
- boolean endSession
- Log logBaileys
- Promise~void~ eventProcessingQueue
- number _lastStream515At
+ connectToWhatsapp(phoneNumber: string) Promise~void~
+ createClient(number: string) Promise~void~
}
ChannelStartupService <|-- BaileysStartupService
class BaileysClient {
+ ws: WebSocket
+ onConnectionUpdate()
}
class WebSocket {
+ on(event: string, handler: function)
}
BaileysStartupService --> BaileysClient : client
BaileysClient --> WebSocket : ws
class DisconnectReason {
<<enumeration>>
loggedOut
forbidden
}
class ConnectionCloseHandler {
+ handleClose(statusCode: number)
- recentStream515: boolean
- shouldReconnect: boolean
}
BaileysStartupService ..> ConnectionCloseHandler : uses
ConnectionCloseHandler ..> DisconnectReason : uses
%% Logical behavior (no notes):
%% - When CB:stream:error code 515 received, BaileysStartupService sets _lastStream515At
%% - In connection close handler, recentStream515 is true if now - _lastStream515At < 30000
%% - shouldReconnect is true if statusCode not in noReconnect list or statusCode is loggedOut and recentStream515
Hey - I've found 3 issues, and left some high level feedback:
- Consider extracting the `30_000` window into a named constant (e.g., `STREAM_515_LOGGED_OUT_GRACE_MS`) to make the intent of the timeframe clearer and easier to tweak in the future.
- In the `CB:stream:error` handler, it may be safer to handle both string and numeric `code` values (e.g., `String(node?.attrs?.code) === '515'`) to avoid relying on a specific type from the underlying library.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider extracting the `30_000` window into a named constant (e.g., `STREAM_515_LOGGED_OUT_GRACE_MS`) to make the intent of the timeframe clearer and easier to tweak in the future.
- In the `CB:stream:error` handler, it may be safer to handle both string and numeric `code` values (e.g., `String(node?.attrs?.code) === '515'`) to avoid relying on a specific type from the underlying library.
## Individual Comments
### Comment 1
<location path="src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts" line_range="430" />
<code_context>
const statusCode = (lastDisconnect?.error as Boom)?.output?.statusCode;
const codesToNotReconnect = [DisconnectReason.loggedOut, DisconnectReason.forbidden, 402, 406];
- const shouldReconnect = !codesToNotReconnect.includes(statusCode);
+ const recentStream515 = Date.now() - this._lastStream515At < 30_000;
+ const shouldReconnect =
+ !codesToNotReconnect.includes(statusCode) ||
</code_context>
<issue_to_address>
**suggestion:** Consider extracting the 30s threshold into a named constant or config for clarity and tuning.
Using the literal `30_000` here makes future tuning harder as we learn more about `515` frequency in production. A named constant or config-backed value (like the cache TTLs) would document the intent and make it easier to adjust without changing code.
Suggested implementation:
```typescript
private logBaileys = this.configService.get<Log>('LOG').BAILEYS;
private eventProcessingQueue: Promise<void> = Promise.resolve();
private _lastStream515At = 0;
// Cache TTL constants (in seconds)
private readonly MESSAGE_CACHE_TTL_SECONDS = 5 * 60; // 5 minutes - avoid duplicate message processing
// Reconnect behavior thresholds (in milliseconds)
private readonly RECENT_STREAM_515_THRESHOLD_MS = 30_000; // 30 seconds - treat 515 as recent
```
```typescript
const statusCode = (lastDisconnect?.error as Boom)?.output?.statusCode;
const codesToNotReconnect = [DisconnectReason.loggedOut, DisconnectReason.forbidden, 402, 406];
const recentStream515 = Date.now() - this._lastStream515At < this.RECENT_STREAM_515_THRESHOLD_MS;
```
</issue_to_address>
### Comment 2
<location path="src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts" line_range="722" />
<code_context>
this.sendDataWebhook(Events.CALL, payload, true, ['websocket']);
});
+ this.client.ws.on('CB:stream:error', (node: any) => {
+ if (node?.attrs?.code === '515') {
+ this._lastStream515At = Date.now();
</code_context>
<issue_to_address>
**suggestion:** Tighten the `node` type instead of using `any` to improve safety and maintainability.
Because this handler is tied to `'CB:stream:error'`, the payload shape should be predictable. Defining a minimal type for `node` (e.g. `{ attrs?: { code?: string } }`) will improve type-checking and make future payload changes safer to handle.
Suggested implementation:
```typescript
type StreamErrorNode = { attrs?: { code?: string } };
this.client.ws.on('CB:stream:error', (node: StreamErrorNode) => {
```
```typescript
if (node.attrs?.code === '515') {
```
</issue_to_address>
### Comment 3
<location path="src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts" line_range="723" />
<code_context>
});
+ this.client.ws.on('CB:stream:error', (node: any) => {
+ if (node?.attrs?.code === '515') {
+ this._lastStream515At = Date.now();
+ }
</code_context>
<issue_to_address>
**suggestion:** Avoid the magic '515' string by introducing a named constant or enum entry.
Using the literal `'515'` here obscures what the code means. Please define a named constant or enum value instead so the intent is clear and future changes to the value are safer.
Suggested implementation:
```typescript
this.sendDataWebhook(Events.CALL, payload, true, ['websocket']);
});
const STREAM_ERROR_RECONNECT_CODE = '515';
this.client.ws.on('CB:stream:error', (node: any) => {
if (node?.attrs?.code === STREAM_ERROR_RECONNECT_CODE) {
this._lastStream515At = Date.now();
}
});
```
If you already have a shared constants file or an enum for WhatsApp/stream error codes in this codebase, it would be better to:
1. Move `STREAM_ERROR_RECONNECT_CODE` into that shared location (e.g., `whatsapp.constants.ts` or similar), and
2. Import and use it here instead of defining it inline within this method.
</issue_to_address>
|
Thanks for the review @sourcery-ai! I've pushed a follow-up commit (d803c9f) that addresses all three suggestions:
|
|
Hi! As part of our PR triage, we re-ran CI on this PR and the Check Code Quality workflow is failing. Could you please:
Once CI is green I'll re-review for merge. Thanks! |
…m:error 515
When WhatsApp sends stream:error code=515 (Connection Replaced), Baileys handles the reconnect correctly and fires connection.update with state='open'. However, WhatsApp then sends a 401 (loggedOut) to clean up the old session slot, which Evolution API incorrectly treated as a real logout, killing the newly-connected healthy instance.
The fix tracks when a stream:error 515 node arrives via the CB:stream:error WebSocket event. If a loggedOut (401) close event fires within 30 seconds of a 515, it is treated as a transient reconnect rather than a real logout.
Fixes evolution-foundation#2498
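The close-handler decision described in this commit message can be sketched as a pure function. This is an illustration under stated assumptions, not the PR's code: the PR computes the value inline in the connection close handler, and the standalone `shouldReconnect` helper and the local constants standing in for Baileys' `DisconnectReason.loggedOut` and `DisconnectReason.forbidden` are hypothetical.

```typescript
// Local stand-ins for Baileys' DisconnectReason values
// (assumed here: loggedOut = 401, forbidden = 403).
const LOGGED_OUT = 401;
const FORBIDDEN = 403;

// Status codes that normally mean "do not reconnect", as listed in the PR diff.
const CODES_TO_NOT_RECONNECT: readonly number[] = [LOGGED_OUT, FORBIDDEN, 402, 406];

// Hypothetical helper: decides whether a closed connection should be reopened.
// `recentStream515` is true when a stream:error 515 was seen inside the grace window.
function shouldReconnect(statusCode: number | undefined, recentStream515: boolean): boolean {
  const isBlocked = statusCode !== undefined && CODES_TO_NOT_RECONNECT.includes(statusCode);
  // A loggedOut that closely follows a 515 is treated as transient, not as a real logout.
  return !isBlocked || (statusCode === LOGGED_OUT && recentStream515);
}

const transientLogout = shouldReconnect(LOGGED_OUT, true);
const realLogout = shouldReconnect(LOGGED_OUT, false);
const forbiddenAfter515 = shouldReconnect(FORBIDDEN, true);
```

Only the `loggedOut` code is exempted by the grace window; the other blocked codes still stop the reconnect even right after a 515.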
Addresses sourcery-ai review feedback on the previous commit:
- Extract the 30 000ms reconnect grace window into a named class constant
STREAM_515_RECONNECT_GRACE_MS so future tuning is self-documenting
rather than a literal scattered through the close handler.
- Extract the magic '515' string into STREAM_ERROR_CODE_RECONNECT.
- Replace the loose 'node: any' on the 'CB:stream:error' handler with a
minimal structural type ({ attrs?: { code?: string | number } }) so
the payload shape is documented and type-checked.
- Compare the code via String(...) so a numeric 515 from the underlying
socket library still triggers the grace window — the original literal
'515' check would have silently broken on a type change.
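The structural type and the `String(...)` comparison from this commit can be sketched on their own. The constant name `STREAM_ERROR_CODE_RECONNECT` and the type shape come from the commit message above; the `StreamErrorNode` alias name and the `isReconnectStreamError` helper are hypothetical names used only for this example.

```typescript
// Minimal structural type for the stream:error payload, as described in the commit message.
type StreamErrorNode = { attrs?: { code?: string | number } };

// Named constant replacing the magic '515' literal.
const STREAM_ERROR_CODE_RECONNECT = '515';

// Hypothetical helper: true when the node carries code 515, as a string or as a number.
function isReconnectStreamError(node: StreamErrorNode | undefined): boolean {
  // String(undefined) is 'undefined', so a missing code never matches.
  return String(node?.attrs?.code) === STREAM_ERROR_CODE_RECONNECT;
}

const fromString = isReconnectStreamError({ attrs: { code: '515' } });
const fromNumber = isReconnectStreamError({ attrs: { code: 515 } });
const otherCode = isReconnectStreamError({ attrs: { code: '401' } });
const missingAttrs = isReconnectStreamError({});
```

Normalizing through `String(...)` keeps the check stable if the underlying socket library ever changes the type of `code`.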
…e line
Check Code Quality lint failed on prettier/prettier (the recentStream515 expression fits within the project's 120-col printWidth on a single line, while shouldReconnect still needs to break across two).
Co-authored-by: Octopus <liyuan851277048@icloud.com>
d803c9f to 1118470
|
Hi @DavidsonGomes — fixed the two prettier errors in |
Fixes #2498
Problem
When WhatsApp sends `stream:error code=515` (Connection Replaced, normal multi-device protocol behavior), Baileys handles the reconnect correctly and fires `connection.update` with `state='open'`. However, WhatsApp then sends a `401 loggedOut` message to clean up the old session slot. Evolution API's close handler incorrectly treated this 401 as a real logout, killing the newly-connected healthy instance.
Sequence of events that triggered the bug:
1. `stream:error code=515`: WhatsApp notifies of connection replacement
2. `CONNECTED TO WHATSAPP ✓`: session is open, webhook fires `state='open'`
3. WhatsApp sends `401 loggedOut` to clean up old session slot
4. `logout.instance` → REMOVED ← bug
Solution
Track when a `stream:error 515` node arrives via the `CB:stream:error` WebSocket event using a class-level timestamp (`_lastStream515At`). In the connection close handler, if a `loggedOut (401)` event fires within 30 seconds of a 515, treat it as a transient reconnect side-effect rather than a real logout and call `connectToWhatsapp()` instead of destroying the instance.
Changes:
- `private _lastStream515At = 0;` class property to track when 515 last occurred
- `CB:stream:error` WebSocket handler in `createClient()` to record the timestamp when code is `'515'`
- `shouldReconnect` logic in the connection close handler to allow reconnect when a loggedOut follows within 30s of a 515
Testing
The fix can be validated by:
- Checking that when `stream:error code=515` appears followed by `CONNECTED TO WHATSAPP ✓`, the instance remains connected instead of being removed
- Checking that no `WAMonitoringService: REMOVED/LOGOUT` appears after a successful reconnect
Summary by Sourcery
Bug Fixes: