[refact](udf) remove the udf cache expiration_time property#63897
zhangstar333 wants to merge 4 commits into
## Proposed changes

Issue Number: close #xxx
/review
2250be3 to a1f5113
Summary: I found two correctness issues in the new static UDF classloader cache behavior. The main blocker is a first-use race where one executor can close another executor's live URLClassLoader, preserving the NoClassDefFoundError class of failure this PR is trying to eliminate. There is also a regression for static-load UDFs loaded through the system classloader, where a null classLoader is valid but now treated as a cache miss.
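One way to close the first-use race is to make the miss/build/put step a single atomic operation, so that no executor ever builds a second loader that later has to be closed. The following is a minimal sketch of that idea, not the PR's code: the class name `StaticLoaderCacheSketch`, the string cache key, and the `BUILDS` counter are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the static UDF classloader cache.
final class StaticLoaderCacheSketch {
    // Assumed key: some stable identifier of the function (e.g. its signature).
    private static final ConcurrentHashMap<String, URLClassLoader> CACHE =
            new ConcurrentHashMap<>();

    // Counts how many loaders were actually constructed (illustration only).
    static final AtomicInteger BUILDS = new AtomicInteger();

    // computeIfAbsent runs the builder at most once per absent key, so
    // concurrent first users all receive the same loader and no "losing"
    // loader is ever created or closed.
    static URLClassLoader getOrCreate(String key, URL[] jarUrls) {
        return CACHE.computeIfAbsent(key, k -> {
            BUILDS.incrementAndGet();
            return new URLClassLoader(jarUrls,
                    StaticLoaderCacheSketch.class.getClassLoader());
        });
    }

    // Explicit invalidation (for example on DROP FUNCTION). Only the loader
    // that was actually removed from the map is closed. A real implementation
    // would still need to make sure no executor is using it at that moment.
    static void invalidate(String key) throws IOException {
        URLClassLoader removed = CACHE.remove(key);
        if (removed != null) {
            removed.close();
        }
    }
}
```

With this shape, the only path that closes a loader is explicit invalidation, which keeps the close decision separate from the lookup path that the executors share.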
Critical checkpoints:
- Goal/test: The PR aims to stop time-based UDF classloader eviction from breaking static-load Java UDFs. The goal is only partially met; concurrent first use can still close a live loader, and system-classloader static UDFs are not cached effectively. I did not find tests covering these concurrency/system-loader paths.
- Scope: The change is focused, but the cache lifecycle semantics changed from synchronized ExpiringMap operations to ConcurrentHashMap replacement without atomic construction.
- Concurrency: The modified static cache is shared by concurrent Java UDF executors and BE clean-cache tasks. The cache miss/build/put path is not atomic, which creates the live-loader close race noted inline.
- Lifecycle: UdfClassCache.classLoader may intentionally be null for system-classloader UDFs; the new validity check does not preserve that lifecycle invariant.
- Configuration/compatibility: expiration_time remains accepted and serialized but is now ignored; this is a user-visible semantic change and should be documented or removed in a coordinated way.
- Parallel paths: DROP FUNCTION cleanup still exists through FE clean-cache tasks and BE JNI cleanup; static-load lookup is the affected path.
- Testing: No new tests were included for concurrent static-load first use, DROP/reload lifecycle, or empty jarPath/system-classloader static UDFs.
- Observability/performance: No additional observability is required for the core issue, but repeated rebuilding for system-classloader static UDFs is avoidable overhead.
- Data/transaction/persistence: Not applicable to data visibility or transaction persistence.
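The lifecycle checkpoint above can be illustrated with a small sketch in which a cache hit is decided by the presence of the entry rather than by a non-null loader. This is an assumption-laden illustration, not the Doris implementation: `SystemLoaderEntrySketch`, `Entry`, `getOrLoad`, and the `LOADS` counter are hypothetical names, and `Entry` only loosely mirrors the role the review attributes to `UdfClassCache`.

```java
import java.net.URLClassLoader;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: caching a UDF class that was resolved through the
// system classloader, where no dedicated URLClassLoader exists.
final class SystemLoaderEntrySketch {
    static final class Entry {
        final Class<?> udfClass;
        // Null is a valid value here: it means "loaded by the system
        // classloader", not "entry is stale or missing".
        final URLClassLoader classLoader;

        Entry(Class<?> udfClass, URLClassLoader classLoader) {
            this.udfClass = udfClass;
            this.classLoader = classLoader;
        }
    }

    private static final ConcurrentHashMap<String, Entry> CACHE =
            new ConcurrentHashMap<>();

    // Counts real class resolutions (illustration only).
    static final AtomicInteger LOADS = new AtomicInteger();

    // The hit test is "is there an entry for this key", so entries with a
    // null classLoader are reused instead of being rebuilt on every call.
    static Entry getOrLoad(String key, String className) {
        return CACHE.computeIfAbsent(key, k -> {
            LOADS.incrementAndGet();
            try {
                Class<?> c = Class.forName(
                        className, true, ClassLoader.getSystemClassLoader());
                return new Entry(c, null);
            } catch (ClassNotFoundException e) {
                // computeIfAbsent leaves the key unmapped when the
                // function throws, so a later call can retry.
                throw new IllegalStateException(e);
            }
        });
    }
}
```

Calling `getOrLoad` twice with the same key returns the same `Entry` and resolves the class only once, which is the behavior the review says is lost when a null `classLoader` is treated as a miss.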
User focus: No additional user-provided review focus was specified.
run buildall
TPC-H: Total hot run time: 31603 ms
TPC-DS: Total hot run time: 172206 ms
FE Regression Coverage Report
Increment line coverage
What problem does this PR solve?
Problem Summary:
doc apache/doris-website#3845
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)