Add iOS/iPadOS app monitoring via OpenTelemetry Swift SDK (SWIP-11)#13828

Merged
wu-sheng merged 9 commits into master from
feature/swip11-ios-monitoring
Apr 19, 2026

@wu-sheng wu-sheng commented Apr 18, 2026

Add iOS/iPadOS app monitoring via OpenTelemetry Swift SDK (SWIP-11)

  • If this is non-trivial feature, paste the links/URLs to the design doc. SWIP-11.
  • Update the documentation to include this new feature. Setup guide at docs/en/setup/backend/backend-ios-monitoring.md; SWIP-11 design doc revised.
  • Tests(including UT, IT, E2E) are added to verify the new feature. New oap-cases/ios-metrickit LAL unit tests and test/e2e-v2/cases/ios/ e2e covering HTTP OAL, MetricKit MAL, and LAL log persistence.
  • If it's UI related, attach the screenshots below.

Introduces the IOS layer with two SpanListeners:

  • IOSHTTPSpanListener: outbound HTTP (NSURLSession) client metrics via OAL (service_cpm, endpoint_cpm, etc.). Supports all three OTel Swift URLSessionInstrumentation semconv modes (.old / .stable / .httpDup) via stable-then-legacy attribute fallback (server.address → net.peer.name, http.request.method → http.method, http.response.status_code → http.status_code).
  • IOSMetricKitSpanListener — daily MetricKit stats (app launch time, hang time, CPU/GPU/memory, network transfer, exit counts split by foreground/background, OOM kills). Histogram percentiles use a finite 30 s overflow ceiling and emit per-bucket counts with defaultHistogramBucketUnit(MILLISECONDS) so MAL's percentile math produces correct values.
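The stable-then-legacy fallback in the first bullet can be illustrated with a minimal sketch. This is not the IOSHTTPSpanListener source: the class name, the helper names, and the use of a plain Map for span attributes are assumptions for illustration. Only the attribute keys come from the description above.

```java
import java.util.Map;

/** Sketch of stable-then-legacy attribute lookup; class and method names are hypothetical. */
final class SemconvFallback {

    /** Returns the first non-blank value among the given keys, or null if none is set. */
    static String firstNonBlank(Map<String, String> attributes, String... keys) {
        for (String key : keys) {
            String value = attributes.get(key);
            if (value != null && !value.isBlank()) {
                return value;
            }
        }
        return null;
    }

    /** Stable key first, legacy key second. */
    static String peerHost(Map<String, String> attributes) {
        return firstNonBlank(attributes, "server.address", "net.peer.name");
    }

    static String method(Map<String, String> attributes) {
        return firstNonBlank(attributes, "http.request.method", "http.method");
    }

    static String statusCode(Map<String, String> attributes) {
        return firstNonBlank(attributes, "http.response.status_code", "http.status_code");
    }

    public static void main(String[] args) {
        // A span from the legacy (.old) mode only carries the legacy keys.
        Map<String, String> legacy = Map.of("net.peer.name", "api.example.com", "http.method", "GET");
        // A span from the duplicated (.httpDup) mode carries both; the stable key wins.
        Map<String, String> dup = Map.of("server.address", "new.example.com",
                                         "net.peer.name", "old.example.com");
        System.out.println(peerHost(legacy)); // api.example.com
        System.out.println(peerHost(dup));    // new.example.com
    }
}
```

Checking the stable key first means a span emitted in .httpDup mode resolves to the same value as one emitted in .stable mode.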

Ships LAL rule for MetricKit crash/hang diagnostic logs, MAL rules for service- and instance-level aggregation, iOS Root/Service/Instance/Endpoint dashboards, a Mobile menu entry, user-setup docs, SWIP-11 design doc, and an e2e test registered in the CI workflow.

Bug fixes wrapped in:

  • LAL layer: auto mode dropped logs after the extractor set the layer — codegen now propagates layer "..." assignments to LogMetadata.layer so FilterSpec.doSink() sees the script-decided value.
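The failure mode behind this fix can be pictured with a toy model. Everything in it is hypothetical: the Metadata and Output classes and the sink rule are simplified stand-ins, not the LogMetadata, FilterSpec, or generated LAL code. It also assumes that, before the fix, the script's layer was recorded somewhere the sink did not look.

```java
/** Toy model of the layer: auto drop; all names and rules here are hypothetical. */
final class LayerAutoSketch {

    /** Stand-in for the per-log metadata the sink consults. */
    static final class Metadata {
        String layer; // null means no layer has been decided
    }

    /** Stand-in for the record the extractor fills in. */
    static final class Output {
        String layer;
    }

    /** Assumed sink rule in auto mode: a log with no decided layer is dropped. */
    static boolean sinkAccepts(Metadata metadata) {
        return metadata.layer != null && !metadata.layer.isEmpty();
    }

    /** Assumed pre-fix behaviour: the script's layer reaches the output only. */
    static boolean persistedBeforeFix(String scriptLayer) {
        Metadata metadata = new Metadata();
        Output output = new Output();
        output.layer = scriptLayer;
        return sinkAccepts(metadata);
    }

    /** Post-fix behaviour: the assignment is also copied onto the metadata. */
    static boolean persistedAfterFix(String scriptLayer) {
        Metadata metadata = new Metadata();
        Output output = new Output();
        output.layer = scriptLayer;
        metadata.layer = scriptLayer;
        return sinkAccepts(metadata);
    }

    public static void main(String[] args) {
        System.out.println(persistedBeforeFix("IOS")); // false
        System.out.println(persistedAfterFix("IOS"));  // true
    }
}
```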

UI submodule bump (apache/skywalking-booster-ui@a6be0e0):

  • Mobile menu icon + i18n labels (en/es/zh) for the iOS layer.
  • Fix metric label rendering in multi-expression dashboard widgets.

Screenshots

iOS Service dashboard — Overview tab (MetricKit launch/hang percentile charts, Hang Time Sum, Abnormal Exit counts split by foreground/background, OOM kills, outbound HTTP triplet, Peak Memory, Avg Network Transfer, Scroll Hitch Ratio)

iOS Service dashboard — Instance tab (App Version Breakdown table: Launch Time P50 / Hang Time Sum / Crashes / Outbound Load per service.version)

iOS Service dashboard — Log tab (LAL-persisted MetricKit diagnostic logs: crash / hang / CPU exception records from the ios-metrickit rule)

iOS Endpoint dashboard (remote domain: Outbound Load, Avg Response Time, Success Rate, Response Time Percentile)

  • If this pull request closes/resolves/fixes an existing issue, replace the issue number.
  • Update the CHANGES log.

@wu-sheng wu-sheng added the backend (OAP backend related) and feature (New feature) labels Apr 18, 2026
@wu-sheng wu-sheng added this to the 10.5.0 milestone Apr 18, 2026
wu-sheng and others added 2 commits April 18, 2026 08:14
SWIP-11 iOS-specific implementation:
- IOS(47, true) layer in Layer.java
- ios-analyzer module with two SpanListeners:
  - IOSLayerSpanListener: detect os.name=iOS, register service with Layer.IOS
  - IOSMetricKitSpanListener: extract MetricKit daily stats to MAL, skip trace
- MAL rules (otel-rules/ios/ios-metrickit.yaml): 16 metrics with device/OS labels
- LAL rule (lal/ios-metrickit.yaml): layer:auto crash/hang diagnostic logs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Introduces IOS layer with two SpanListeners:
- IOSHTTPSpanListener: outbound HTTP (NSURLSession) client metrics
  via OAL (service_cpm, endpoint_cpm, etc.). Supports OTel Swift
  .old/.stable/.httpDup semconv modes via stable-then-legacy
  attribute fallback (server.address -> net.peer.name, etc.).
- IOSMetricKitSpanListener: daily MetricKit stats (app launch time,
  hang time, CPU/GPU/memory, network transfer, exit counts split by
  foreground/background, OOM kills). Histogram percentiles use a
  finite 30s overflow ceiling and emit per-bucket counts with the
  MILLISECONDS unit override so MAL percentile math stays accurate.
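One way to picture the finite overflow ceiling on the histogram buckets is the sketch below. It is not the IOSMetricKitSpanListener code. The input shape (parallel arrays of upper bounds and counts) and the rule that any bound above 30 s, including an unbounded one, is reported at the ceiling are assumptions. Only the 30 s ceiling, the per-bucket counts, and the millisecond unit come from the commit message above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of capping an overflow bucket at a finite ceiling; names and input shape are hypothetical. */
final class HistogramCeilingSketch {

    /** Finite ceiling, in milliseconds, used in place of an unbounded upper bound. */
    static final long OVERFLOW_CEILING_MS = 30_000L;

    /**
     * Maps each bucket's upper bound (ms) to that bucket's own count, not a cumulative count.
     * Bounds that are NaN, infinite, or above the ceiling are reported at the ceiling.
     */
    static Map<Long, Long> toPerBucketCounts(double[] upperBoundsMs, long[] counts) {
        if (upperBoundsMs.length != counts.length) {
            throw new IllegalArgumentException("bounds and counts must have the same length");
        }
        Map<Long, Long> result = new LinkedHashMap<>();
        for (int i = 0; i < counts.length; i++) {
            double bound = upperBoundsMs[i];
            long le = (Double.isNaN(bound) || bound > OVERFLOW_CEILING_MS)
                    ? OVERFLOW_CEILING_MS
                    : Math.round(bound);
            // Buckets that collapse onto the same bound are summed.
            result.merge(le, counts[i], Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        double[] bounds = {100.0, 1_000.0, Double.POSITIVE_INFINITY};
        long[] counts = {5L, 3L, 2L};
        System.out.println(toPerBucketCounts(bounds, counts)); // {100=5, 1000=3, 30000=2}
    }
}
```

With an unbounded last bucket, a percentile that lands in the overflow range has no finite value to report, so a finite ceiling keeps the result plottable.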

Ships LAL rule for MetricKit crash/hang diagnostic logs, MAL rules
for service- and instance-level aggregation, iOS Root/Service/
Instance/Endpoint dashboards, Mobile menu entry, user-setup docs,
SWIP-11 design doc, and an e2e test.

Bug fixes wrapped in:
- LAL 'layer: auto' mode dropped logs after the extractor decided
  the layer; codegen now propagates 'layer "..."' assignments to
  LogMetadata.layer so FilterSpec.doSink sees the script-decided
  value.
- Stable-semconv fallback keeps IOSHTTPSpanListener working across
  all three OTel Swift URLSessionInstrumentation modes.

UI submodule bump (apache/skywalking-booster-ui@a6be0e0):
- Mobile menu icon + i18n labels (en/es/zh) for the iOS layer.
- Fix metric label rendering in multi-expression dashboard widgets.

Copilot AI left a comment

Pull request overview

This PR adds end-to-end iOS/iPadOS application monitoring support in SkyWalking OAP via the OpenTelemetry Swift SDK, introducing a new IOS layer, span listeners for outbound HTTP and MetricKit metrics/logs, plus corresponding MAL/LAL rules, UI dashboards/menu, documentation, and CI e2e coverage.

Changes:

  • Add IOS layer and ios-analyzer module with IOSHTTPSpanListener (OAL traffic metrics) and IOSMetricKitSpanListener (MetricKit → MAL via OTLP receiver’s MAL pipeline).
  • Add iOS MAL/LAL rules and UI initialized templates (root/service/instance/endpoint dashboards + Mobile menu).
  • Add iOS e2e test case and update docs/SWIP/CHANGES/security guidance.

Reviewed changes

Copilot reviewed 38 out of 38 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
test/e2e-v2/cases/ios/expected/service.yml Expected service layer registration output for iOS e2e.
test/e2e-v2/cases/ios/expected/metrics-has-value.yml Generic assertion for metrics returning non-empty values.
test/e2e-v2/cases/ios/expected/metrics-has-value-label.yml Generic assertion for metrics returning non-empty labeled values.
test/e2e-v2/cases/ios/expected/logs.yml Expected persisted MetricKit diagnostic log output for e2e.
test/e2e-v2/cases/ios/expected/endpoint.yml Expected remote-domain endpoint output for e2e.
test/e2e-v2/cases/ios/e2e.yaml New iOS e2e: sends OTLP traces/logs, verifies service/metrics/endpoints/logs.
test/e2e-v2/cases/ios/docker-compose.yml e2e compose enabling OTLP traces/logs, iOS MAL rules, and iOS LAL.
oap-server/server-starter/src/main/resources/ui-initialized-templates/menu.yaml Adds “Mobile → iOS” menu entry pointing to iOS docs.
oap-server/server-starter/src/main/resources/ui-initialized-templates/ios/ios-service.json iOS Service dashboard template (HTTP + MetricKit + logs).
oap-server/server-starter/src/main/resources/ui-initialized-templates/ios/ios-root.json iOS root dashboard template (service list + intro text).
oap-server/server-starter/src/main/resources/ui-initialized-templates/ios/ios-instance.json iOS instance (app version) dashboard template.
oap-server/server-starter/src/main/resources/ui-initialized-templates/ios/ios-endpoint.json iOS endpoint (remote domain) dashboard template.
oap-server/server-starter/src/main/resources/otel-rules/ios/ios-metrickit.yaml Service-level MetricKit MAL rules for iOS.
oap-server/server-starter/src/main/resources/otel-rules/ios/ios-metrickit-instance.yaml Instance-level (app version) MetricKit MAL rules for iOS.
oap-server/server-starter/src/main/resources/lal/ios-metrickit.yaml LAL rule to persist MetricKit diagnostic logs with layer:auto.
oap-server/server-starter/pom.xml Pull ios-analyzer into server-starter distribution.
oap-server/server-receiver-plugin/otel-receiver-plugin/src/main/java/org/apache/skywalking/oap/server/receiver/otel/otlp/OpenTelemetryMetricRequestProcessor.java Adds toMeter(...) to feed pre-built SampleFamily into MAL pipeline.
oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/management/ui/template/UITemplateInitializer.java Includes IOS layer folder for UI template initialization.
oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/Layer.java Adds Layer.IOS.
oap-server/analyzer/pom.xml Adds ios-analyzer module to build.
oap-server/analyzer/log-analyzer/src/test/resources/scripts/lal/test-lal/oap-cases/ios-metrickit.yaml Adds LAL unit test script for iOS MetricKit diagnostics.
oap-server/analyzer/log-analyzer/src/test/resources/scripts/lal/test-lal/oap-cases/ios-metrickit.data.yaml Adds LAL unit test data covering iOS/iPadOS/non-iOS cases.
oap-server/analyzer/log-analyzer/src/main/java/org/apache/skywalking/oap/log/analyzer/v2/provider/log/listener/RecordSinkListener.java Refines comments and removes unused getBuilder() accessor.
oap-server/analyzer/log-analyzer/src/main/java/org/apache/skywalking/oap/log/analyzer/v2/compiler/LALBlockCodegen.java Fix: propagate extractor-decided layer to LogMetadata.layer for layer:auto.
oap-server/analyzer/ios-analyzer/src/main/resources/META-INF/services/org.apache.skywalking.oap.server.core.trace.SpanListener Registers iOS span listeners via SPI.
oap-server/analyzer/ios-analyzer/src/main/java/org/apache/skywalking/oap/analyzer/ios/listener/IOSMetricKitSpanListener.java New listener converting MetricKit spans into MAL samples (no trace persistence).
oap-server/analyzer/ios-analyzer/src/main/java/org/apache/skywalking/oap/analyzer/ios/listener/IOSHTTPSpanListener.java New listener emitting OAL sources for outbound HTTP URLSession spans.
oap-server/analyzer/ios-analyzer/pom.xml New analyzer module definition for iOS listeners.
docs/menu.yml Adds “Mobile Monitoring → iOS” doc navigation entry.
docs/en/swip/SWIP-11.md Updates SWIP-11 to reflect final design/implementation details.
docs/en/setup/backend/backend-ios-monitoring.md Adds iOS monitoring setup guide.
docs/en/security/README.md Adds client-side monitoring security guidance for public-ingress telemetry.
docs/en/changes/changes.md Adds changelog entry for iOS monitoring + related fixes.
CLAUDE.md Adds guidance about full rebuild on cross-module changes.
.github/workflows/skywalking.yaml Registers iOS monitoring e2e case in CI.
.claude/skills/run-e2e/SKILL.md Adds e2e triage and UI-template DB-reset guidance.
.claude/skills/package/SKILL.md Adds packaging skill doc to avoid stale dist/image artifacts.

…ame in iOS listeners

- IOSHTTPSpanListener: pipe service.version through NamingControl.formatInstanceName()
  so the instance name respects length/format limits.
- IOSMetricKitSpanListener: add CoreModule dependency + NamingControl service;
  format service.name / service.version into MAL labels; early-return CONTINUE
  when service.name is missing/blank to avoid emitting empty-named service entities;
  fall back to 'unknown' instance id when service.version is missing.
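The guard logic in this commit message can be sketched as follows. The Entity class, the format helper, and the length limit are hypothetical stand-ins, and the helper only approximates what a naming formatter such as NamingControl might do. The blank-name early return and the 'unknown' fallback follow the description above.

```java
import java.util.Optional;

/** Sketch of the service/instance naming guards; all names here are hypothetical. */
final class IosEntityNamingSketch {

    static final String UNKNOWN_INSTANCE = "unknown";

    /** Resolved service and instance names for one span. */
    static final class Entity {
        final String service;
        final String instance;

        Entity(String service, String instance) {
            this.service = service;
            this.instance = instance;
        }
    }

    /** Stand-in formatter: trims the name and truncates it to maxLength characters. */
    static String format(String name, int maxLength) {
        String trimmed = name.trim();
        return trimmed.length() <= maxLength ? trimmed : trimmed.substring(0, maxLength);
    }

    /**
     * Returns empty when service.name is missing or blank, so the caller can skip the span
     * without registering an empty-named service. A missing service.version falls back to "unknown".
     */
    static Optional<Entity> resolve(String serviceName, String serviceVersion, int maxLength) {
        if (serviceName == null || serviceName.isBlank()) {
            return Optional.empty();
        }
        String instance = (serviceVersion == null || serviceVersion.isBlank())
                ? UNKNOWN_INSTANCE
                : format(serviceVersion, maxLength);
        return Optional.of(new Entity(format(serviceName, maxLength), instance));
    }

    public static void main(String[] args) {
        System.out.println(resolve(" ", "1.0", 10).isPresent());   // false
        System.out.println(resolve("MyApp", null, 10).get().instance); // unknown
    }
}
```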
The setup-phase curl loop used 'curl && break || sleep 5' which silently
succeeded even when every curl attempt got connection-refused (sleep
returns 0). In CI the OAP container wasn't ready at the 25-second mark,
so all 5 attempts failed and the log was never ingested — causing the
'logs list' verify case to fail after 20 retries with an empty result.

Replace the loop with curl --retry-connrefused --retry-all-errors -f,
and 'set -e' so the step surfaces the failure instead of swallowing it.

Copilot AI left a comment

Pull request overview

Adds first-class iOS/iPadOS application monitoring support to SkyWalking OAP using telemetry produced by the OpenTelemetry Swift SDK, including new IOS-layer dashboards, MetricKit metrics/log processing, and CI e2e coverage. Also fixes a LAL layer:auto behavior where extractor-assigned layers weren’t propagated into LogMetadata, causing logs to be dropped.

Changes:

  • Introduce Layer.IOS plus iOS SpanListeners for URLSession outbound HTTP OAL metrics and MetricKit → MAL metrics conversion.
  • Add iOS MetricKit MAL rules, MetricKit diagnostic LAL rule, and UI initialized templates + menu entry for Mobile/iOS.
  • Add new iOS e2e case and LAL unit test coverage; update docs, security notice, and changelog.

Reviewed changes

Copilot reviewed 39 out of 39 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
test/e2e-v2/cases/ios/expected/service.yml Expected service inventory assertions for IOS layer e2e.
test/e2e-v2/cases/ios/expected/metrics-has-value.yml Generic expected shape for metrics queries (non-empty values).
test/e2e-v2/cases/ios/expected/metrics-has-value-label.yml Generic expected shape for metrics queries with labels.
test/e2e-v2/cases/ios/expected/logs.yml Expected persisted MetricKit diagnostic logs assertions.
test/e2e-v2/cases/ios/expected/endpoint.yml Expected endpoint list assertions for domain-based endpoints.
test/e2e-v2/cases/ios/e2e.yaml New e2e scenario: send OTLP traces/logs and verify iOS metrics/log persistence.
test/e2e-v2/cases/ios/docker-compose.yml e2e compose config enabling OTLP handlers, iOS MAL rules, and iOS LAL file.
oap-server/server-starter/src/main/resources/ui-initialized-templates/menu.yaml Add “Mobile → iOS” menu entry in initialized UI templates.
oap-server/server-starter/src/main/resources/ui-initialized-templates/ios/ios-service.json New iOS service dashboard template.
oap-server/server-starter/src/main/resources/ui-initialized-templates/ios/ios-root.json New iOS root dashboard template.
oap-server/server-starter/src/main/resources/ui-initialized-templates/ios/ios-instance.json New iOS instance (app version) dashboard template.
oap-server/server-starter/src/main/resources/ui-initialized-templates/ios/ios-endpoint.json New iOS endpoint (remote domain) dashboard template.
oap-server/server-starter/src/main/resources/otel-rules/ios/ios-metrickit.yaml Service-level MAL rules for MetricKit-derived metrics.
oap-server/server-starter/src/main/resources/otel-rules/ios/ios-metrickit-instance.yaml Instance-level (service.version) MAL rules for MetricKit-derived metrics.
oap-server/server-starter/src/main/resources/lal/ios-metrickit.yaml LAL rule to persist MetricKit diagnostic logs with IOS layer auto-detection.
oap-server/server-starter/pom.xml Include ios-analyzer so iOS SpanListeners ship in the starter distribution.
oap-server/server-receiver-plugin/otel-receiver-plugin/src/main/java/org/apache/skywalking/oap/server/receiver/otel/otlp/OpenTelemetryMetricRequestProcessor.java Add toMeter(...) entry point to push prebuilt SampleFamily into MAL converters.
oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/management/ui/template/UITemplateInitializer.java Register IOS layer for UI template initialization.
oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/Layer.java Add IOS enum value.
oap-server/analyzer/pom.xml Add new ios-analyzer module to build.
oap-server/analyzer/log-analyzer/src/test/resources/scripts/lal/test-lal/oap-cases/ios-metrickit.yaml New LAL script test case for iOS MetricKit diagnostics.
oap-server/analyzer/log-analyzer/src/test/resources/scripts/lal/test-lal/oap-cases/ios-metrickit.data.yaml Test data + expectations for iOS MetricKit diagnostics LAL rule.
oap-server/analyzer/log-analyzer/src/main/java/org/apache/skywalking/oap/log/analyzer/v2/provider/log/listener/RecordSinkListener.java Clarify builder dispatch semantics; remove unused accessor.
oap-server/analyzer/log-analyzer/src/main/java/org/apache/skywalking/oap/log/analyzer/v2/compiler/LALBlockCodegen.java Fix: propagate extractor-assigned layer into LogMetadata for layer:auto filtering/routing.
oap-server/analyzer/ios-analyzer/src/main/resources/META-INF/services/org.apache.skywalking.oap.server.core.trace.SpanListener Register iOS SpanListeners via SPI.
oap-server/analyzer/ios-analyzer/src/main/java/org/apache/skywalking/oap/analyzer/ios/listener/IOSMetricKitSpanListener.java Convert MetricKit spans into MAL SampleFamily and bypass trace persistence.
oap-server/analyzer/ios-analyzer/src/main/java/org/apache/skywalking/oap/analyzer/ios/listener/IOSHTTPSpanListener.java Emit OAL sources for outbound URLSession client spans (service/instance/endpoint traffic).
oap-server/analyzer/ios-analyzer/pom.xml New analyzer module definition + dependencies.
docs/menu.yml Add “Mobile Monitoring → iOS” doc navigation entry.
docs/en/swip/readme.md Move SWIP-11 from Proposed to Accepted list.
docs/en/swip/SWIP-11.md Update SWIP-11 design doc to match implementation details.
docs/en/setup/backend/backend-ios-monitoring.md New setup guide for iOS monitoring using OTel Swift SDK.
docs/en/security/README.md Expand security guidance for client-side monitoring/ingestion endpoints.
docs/en/changes/changes.md Changelog entry for iOS monitoring + related fixes.
CLAUDE.md Add build guidance for cross-module changes.
.github/workflows/skywalking.yaml Register new iOS e2e case in CI workflow.
.claude/skills/run-e2e/SKILL.md Add debugging guidance for sequential e2e verify failures + UI template refresh notes.
.claude/skills/package/SKILL.md New packaging skill doc to avoid stale-jar docker image builds.

…vadoc

- application.yml: include ios/* in the default enabledOtelMetricsRules so
  MetricKit MXMetricPayload spans are actually converted to MAL metrics by
  default. Previously, with the starter's default rule list, the
  IOSMetricKitSpanListener's toMeter() was a silent no-op (converters was
  empty) and the span was neither a metric nor a persisted trace.

- OpenTelemetryMetricRequestProcessor: initialize converters to an empty
  list so processMetricsRequest()'s unconditional converters.forEach(...)
  no longer NPEs when no rules are enabled / rule loading produced none.
  toMeter() loses the now-redundant null guard.

- IOSHTTPSpanListener: treat missing HTTP status (0) as FAILURE. Client-side
  errors (DNS, TLS, timeout, connection-refused) produce no response code,
  so the previous status = (code == 0 || code < 400) inflated *_sla metrics
  on exactly the errors users care about. Also clamp latency to >= 0 to
  guard against device clock skew, and cap at Integer.MAX_VALUE.

- OTLPSpanReader: update the spanKind() javadoc to match the actual contract
  (OTLP proto enum name like 'SPAN_KIND_CLIENT', not 'CLIENT').
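Two of the fixes in the commit message above, the empty-list default for converters and the stricter success and latency handling, can be sketched together. The class, field, and method names are hypothetical and the converter type is simplified to a Consumer. The exact success predicate is an assumption consistent with "treat missing HTTP status (0) as FAILURE".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch of two follow-up fixes; names and types are hypothetical simplifications. */
final class FollowUpFixesSketch {

    /** Starts as an empty list rather than null, so forEach is safe before any rule is loaded. */
    private final List<Consumer<String>> converters = new ArrayList<>();

    void register(Consumer<String> converter) {
        converters.add(converter);
    }

    /** Unconditional forEach: a no-op with zero converters instead of a NullPointerException. */
    void process(String sample) {
        converters.forEach(converter -> converter.accept(sample));
    }

    /** Status 0 means no response arrived (DNS, TLS, timeout, refused), so it counts as a failure. */
    static boolean isSuccess(int statusCode) {
        return statusCode > 0 && statusCode < 400;
    }

    /** Clamps latency to [0, Integer.MAX_VALUE] to absorb clock skew and oversized durations. */
    static int latencyMillis(long startMillis, long endMillis) {
        long raw = endMillis - startMillis;
        if (raw < 0) {
            return 0;
        }
        return (int) Math.min(raw, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        new FollowUpFixesSketch().process("sample");   // no converters registered, no exception
        System.out.println(isSuccess(0));              // false
        System.out.println(isSuccess(200));            // true
        System.out.println(latencyMillis(1_000, 900)); // 0
    }
}
```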
Regenerated via license-eye dependency resolve after the skywalking-ui
submodule bump to a6be0e0 pulled lodash / lodash-es 4.17.23 -> 4.18.1.

Missed updating this expected file after adding ios/* to the default
enabledOtelMetricsRules in application.yml; the storage e2e compares
/debugging/config/dump output against a static expected file, so
every change to the default rule list must be mirrored here.

spring-ai-examples forwards requests to provider:9090 (mock OpenAI),
but only depended on oap. The trigger began before provider's Tomcat
was listening, producing ConnectException and a flaky 500.
@wu-sheng wu-sheng merged commit 9e57d91 into master Apr 19, 2026
420 of 423 checks passed
@wu-sheng wu-sheng deleted the feature/swip11-ios-monitoring branch April 19, 2026 02:17