
MIP SDK file 1.18.124 hangs indefinitely on Linux #1563

@tamir-laminar

Issue

File SDK 1.18.124 on Linux: the default HttpDelegate hangs indefinitely during the PolicyEngine fetch (dataservice.protection.outlook.com/PsorWebService). The request is submitted, but the response is never read.

Versions

MIP SDK version: mip_sdk_file_ubuntu2404_1.18.124 (also reproduces with mip_sdk_file_rhel9_1.18.124)
Environment: Ubuntu 24.04 container; reproduces on both native amd64 (Intel Xeon, Ubuntu 20.04 host) and amd64-under-qemu (Apple Silicon). Same code worked correctly on 1.16.126.

Description

Symptom

The FileProfile::AddEngineAsync future (the one that should yield the FileEngine) never resolves when MipConfiguration is created with isOfflineOnly=false. Reproduces 100% of the time on the first call after MipContext::Create / FileProfile::LoadAsync.

Last log lines before the wedge

http_director_impl.cpp:150  "Sending HTTP request: ID: {…}, Type: GET,
  Url: https://dataservice.protection.outlook.com/PsorWebService/v1/ClientSyncFile/MipPolicies,
  Headers['Authorization'] = 'UOID:…;Tenant:…;Audience:https://syncservice.o365syncservice.com/;Roles:UnifiedPolicy.Tenant.Read;…'"

Nothing is ever logged after this line. No HttpResponse, no error, no timeout. Our custom AuthDelegate returned a valid token in 710 ms (also logged by MIP at auth_request_transformer.cpp:180).

Smoking gun — kernel state while wedged

While hung, in the container:

$ cat /proc/<pid>/net/tcp        # decoded IPs
local     remote                              state
…:39476   52.102.115.192:443 (dataservice…)   CLOSE_WAIT

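CLOSE_WAIT means the server sent FIN and the local kernel acknowledged it, but the application that owns the socket (here, the SDK) has not closed it. Combined with the absence of any response log line, this indicates the SDK never read the response or closed the socket. The stall is on the application side, not in the network.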
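For anyone reproducing this, the raw hex rows in /proc/<pid>/net/tcp can be decoded with a short script. This is a minimal sketch, assuming IPv4 and a little-endian host (true for the amd64 environments listed above); the function names are illustrative and not part of any SDK.

```python
import socket
import struct

# TCP state codes as printed in the "st" column (values from the Linux
# kernel's include/net/tcp_states.h).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def decode_endpoint(hex_endpoint):
    """Turn 'C0736634:01BB' into '52.102.115.192:443'."""
    ip_hex, port_hex = hex_endpoint.split(":")
    # Assumption: little-endian host, so the 32-bit address is byte-swapped.
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return f"{ip}:{int(port_hex, 16)}"


def parse_proc_net_tcp(text):
    """Return (local, remote, state) tuples from /proc/<pid>/net/tcp content."""
    rows = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        state = TCP_STATES.get(fields[3].upper(), fields[3])
        rows.append((decode_endpoint(fields[1]), decode_endpoint(fields[2]), state))
    return rows


def close_wait_sockets(pid):
    """List CLOSE_WAIT connections in the given process's network namespace."""
    with open(f"/proc/{pid}/net/tcp") as f:
        return [row for row in parse_proc_net_tcp(f.read()) if row[2] == "CLOSE_WAIT"]
```

Calling close_wait_sockets(<pid>) while the process is hung should list the connection to port 443 shown above. Note that /proc/<pid>/net/tcp covers every socket in the process's network namespace, so in a shared namespace the rows are not limited to that one process.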

Confirmation — no MIP thread is polling sockets

$ for t in /proc/<pid>/task/*; do
    echo "$(basename "$t") $(cat "$t/comm") $(cat "$t/wchan")"
  done
…
26 OneDS Task Disp   futex_wait_queue
28 Policy Profile    futex_wait_queue
29 Policy Profile    futex_wait_queue
…   (60+ MIP threads, all in futex_wait_queue)
18 cdm_function_ap   do_epoll_wait     ← Go runtime netpoller, not MIP

Every MIP thread is sleeping on a futex. The only thread in epoll_wait belongs to the host application's Go netpoller, not to MIP. No MIP thread is waiting on the socket events that the kernel has ready, which strongly suggests that the HTTP director's internal libcurl multi-handle / event loop has wedged.
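The per-thread check above can also be scripted so that the result is a tally rather than a long listing. This is a rough sketch, assuming Linux procfs and enough privileges to read wchan. The helper names are hypothetical, and only do_epoll_wait (the value seen in the listing above) is treated as a polling wait channel by default, because kernel symbol names vary between versions.

```python
import os
from collections import Counter


def read_thread_states(pid):
    """Return (tid, comm, wchan) for every thread of the given process."""
    rows = []
    task_dir = f"/proc/{pid}/task"
    for tid in sorted(os.listdir(task_dir), key=int):
        fields = []
        for name in ("comm", "wchan"):
            try:
                with open(os.path.join(task_dir, tid, name)) as f:
                    fields.append(f.read().strip())
            except OSError:
                fields.append("?")  # thread exited or file not readable
        rows.append((int(tid), fields[0], fields[1]))
    return rows


def tally_wait_channels(rows):
    """Count how many threads are blocked in each kernel wait channel."""
    return Counter(wchan for _tid, _comm, wchan in rows)


def polling_threads(rows, poll_wchans=("do_epoll_wait",)):
    """Return (tid, comm) for threads whose wait channel is a polling call."""
    return [(tid, comm) for tid, comm, wchan in rows if wchan in poll_wchans]
```

Running tally_wait_channels(read_thread_states(<pid>)) on the hung process gives the futex versus epoll split at a glance, and polling_threads(...) shows which threads, if any, are actually waiting on socket events.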

Network is fine

From inside the same container, same libcurl.so.4, same /etc/ssl/certs:

$ curl -v --max-time 30 https://dataservice.protection.outlook.com/PsorWebService/v1/ClientSyncFile/MipPolicies
< HTTP/2 401   (in 10 ms — 401 is expected, no token)
< policysync-auth-error: No valid OAuth token or Client Certificate were found.

Endpoint is reachable, cert chain validates, response is fast.

Workaround that confirms the bug is in MIP's HTTP layer

Implementing a custom mip::HttpDelegate using curl_easy_perform (synchronous, one easy handle per request — same libcurl.so.4 MIP uses internally) and registering it via MipConfiguration::SetHttpDelegate(...) resolves the hang completely:

  • AddEngineAsync future resolves
  • HTTP requests complete in expected timeframes (~100–700 ms)
  • Both success (200) and failure (400/401/500) responses surface cleanly to MIP and propagate up as expected NetworkError / NoAuthTokenError

This pinpoints the bug to MIP's default HTTP delegate (the libcurl multi-handle / event-loop wrapper), not to libcurl, OpenSSL, the network, or the Authorization header format.
