b2b_entities: fix crash calling b2b_init_request from async resume route#3822
b2b_entities: fix crash calling b2b_init_request from async resume route#3822NormB wants to merge 1 commit into
Conversation
When b2b_init_request() is called from an async resume route (e.g. after rest_post), the SIP message is a TM faked request whose parsed headers are memcpy'd SHM pointers from the transaction copy. b2b_apply_lumps() calls free_sip_msg() which invokes pkg_free() on those SHM pointers, triggering "bad pointer (out of memory block!) - aborting" and crashing the worker process. Guard b2b_apply_lumps() with a check for FL_TM_FAKE_REQ (already set by TM's fake_req()) and return early, mirroring how TM's own free_faked_req() avoids freeing the SHM header list. Fixes: OpenSIPS#3796
|
Hi @NormB , thanks for the detailed report here. Even if I fully agree on how the crash happens, the fix is not 100% correct - it avoids the crash, indeed, but it misbehave, in the sense of not pushing (into the UAC side of the b2b) any lump/change that was done prior to the async jump. |
|
@bogdan-iancu I will be able to test and validate. |
Summary
Fixes a crash (
fm_free: freeing dangling pkg pointer/qm_free_dbg: bad pointer (out of memory block\!) - aborting) whenb2b_init_request()is called from an async resume route (e.g. afterrest_post()). The worker process aborts with signal 6 (SIGABRT) and a core dump is generated, taking down the entire OpenSIPS instance.Details
When an initial INVITE is processed with an async operation (e.g.
async(rest_post(...), resume_route)), the TM module:t_newtran(), including all parsed headers and any lumps added before the async call (e.g. fromrecord_route(),fix_nated_contact(),append_hf(), etc.)t_resume_async_request()intm/async.ccallsfake_req()(tm/t_msgbuilder.h:194) to build a faked SIP message for the resume routefake_req()doesmemcpy(faked_req, shm_msg, sizeof(struct sip_msg))at line 202 — this copies the entiresip_msgstruct, meaning the faked request'smsg->headers,msg->add_rm, andmsg->body_lumpspointers all point into SHM memory (they are the same SHM pointers from the transaction's stored request)fake_req()then PKG-duplicates only specific fields (new_uri,dst_uri,path_vec,set_global_address,set_global_port) and setsfaked_req->msg_flags |= FL_TM_FAKE_REQat line 211When the resume route calls
b2b_init_request(), the script-level handlerb2bl_script_init_request()(b2b_logic/logic.c:3228) callsb2b_api.apply_lumps(msg)directly at line 3245 — before any B2B entities are created. This dispatches tob2b_apply_lumps()inb2b_entities/dlg.c, where the following crash sequence occurs:b2b_apply_lumps()checksif(\!msg->body_lumps && \!msg->add_rm) return 0;— if any module added lumps before the async call, these SHM lump pointers are non-NULL, so execution continuesbuild_req_buf_from_sip_req()processes the lumps to build a new bufferfree_sip_msg(msg)is called to free the old message structurefree_sip_msg()(parser/msg_parser.c:965), the call tofree_hdr_field_lst(msg->headers)at line 980 walks the parsed header linked list and callspkg_free()on each header nodepkg_free()on SHM pointers triggers the memory allocator's safety check and aborts the processTM's own
free_faked_req()(tm/t_msgbuilder.h:345) is designed to handle this correctly — it carefully frees only the PKG-duplicated fields and never callsfree_hdr_field_lst()on the headers. The b2b_entities module'sb2b_apply_lumps()was missing this awareness.The crash is deterministic: it occurs on every call to
b2b_init_request()from an async resume route, provided any module added lumps to the request before the async call. Common lump-producing functions includerecord_route(),fix_nated_contact(),append_hf(),remove_hf(), and many others. In production configurations, lumps are almost always present.Note that
b2b_apply_lumps()is called from 6 sites — not onlyb2bl_script_init_request()(logic.c:3245) but alsob2bl_script_bridge()(bridging.c:2290) and 4 internal call sites withindlg.citself (lines 1103, 3306, 3373, 3755). The fix in the function body protects all of them.Affected versions: The bug exists in all versions where
b2b_apply_lumps()exists alongside TM async support. Confirmed on 3.6.2 (reporter) and 4.0.0-dev (our testing).Solution
Add a guard at the top of
b2b_apply_lumps()that checks theFL_TM_FAKE_REQflag and returns early:This mirrors TM's own approach —
FL_TM_FAKE_REQ(1<<18, defined inparser/msg_parser.h:121) is already set byfake_req()attm/t_msgbuilder.h:211precisely to mark faked requests. TM'sfree_faked_req()uses this same awareness to avoid freeing SHM headers.Skipping lump application is safe for faked requests because:
b2b_apply_lumps()returning 0 simply means "no modifications were made", which all 6 callers already handleReproduction
Tested on OpenSIPS 4.0.0-dev (x86_64/linux,
DBG_MALLOCenabled) with this minimal config:A simple Python HTTP server on port 8899 with a 100ms delay ensures the async machinery activates. A SIP INVITE sent via
nc -utriggers the flow.record_route)CRITICAL: qm_free_dbg: bad pointer (out of memory block\!) - aborting, worker killed by signal 6, core dump generated, entire OpenSIPS instance terminatesrecord_route)b2b_init_requestcompletes successfully, B2B session established, process stablerecord_route)b2b_apply_lumpsreturns early at the\!msg->body_lumps && \!msg->add_rmcheck, confirming the lump-dependent triggerCompatibility
No backward compatibility issues. The fix only adds an early-return guard for a specific message type (TM faked requests) that was previously crashing. No behavior change for normal (non-faked) SIP messages. The
FL_TM_FAKE_REQflag is defined in core headers (parser/msg_parser.h) and is available to all modules.Closing issues
Closes #3796