fix(collector): use Telemetry API event time as metric timestamp#2288
Conversation
|
|
|
Thank you for working on this @dmedovich! I will take some time tomorrow to take a closer look at your changes! |
| if t, err := time.Parse(time.RFC3339, el.Time); err == nil { | ||
| if ets := pcommon.NewTimestampFromTime(t); ets > r.lastEventTime { | ||
| r.lastEventTime = ets | ||
| } | ||
| } |
There was a problem hiding this comment.
nit: unlikely to have performance impact, but time.Parse(time.RFC3339, el.Time) is now being done here and again in createLogs at:
with different error handling (
createLogs aborts the batch on failure). Not blocking, but if you wanted to keep this aligned the parse could also move to httpHandler's existing loop and feed both recordMetrics and createLogs a parsed value. The httpHandler's loop I'm referring to:
|
@dmedovich one small nit, otherwise LGTM! |
a65afb2 to
3d62233
Compare
|
@wpessers good catch, thx. Moved the parse to httpHandler so both recordMetrics and createLogs share the pre-parsed value. Added a fallback in both methods for cases where they're called directly (like tests). Updated in the latest commit. |
There was a problem hiding this comment.
Couple of remarks.
- First and foremost I appreciate the work that has gone into this and how quick you were to update after my previous feedback.
- Second, it's quite difficult for me to now review your follow-up fix since it seems like you amended your commit and force pushed. We always squash on merge, so please to help us reviewers out just commit your fixes in a new commit. It's easier for everyone involved. And now since I have some doubts about the solution we ended up with, it's hard for me to remember what exactly the initial fix looked like. Brings me to my third point.
- I'm not entirely convinced about the current solution. What is mostly sitting wrong with me is that there is now this fallback code in
receiver.gothat is dead code in production, and is merely there to accommodate the tests. Also it seems like what I asked in my original feedback didn't fully land. Instead of moving the parsing of the timestamp to the already existing loop, you added a second loop to do so. All this considered, I should've made it more clear that the nit I brought up was not to be taken as a "requirement". I much prefer contributors engage in discussion when they disagree, which I highly suspect you did at some point given the changes you've made to accommodate my feedback.
All this to say, I propose you revert back to the original solution you submitted in the PR before my feedback. As I said in my comment back then I don't think there's any performance impact and I'd argue it was more readable than what we have now. Again if you disagree, don't hesitate to say so!
|
@wpessers yeah, i agree we misunderstood each other a bit, and the second loop turned out to be not quite what it should have been. I should have clarified the criteria and implementation details on my side before jumping into it. I reverted the changes back to the original format in 2a74570. I'll be more careful with these kinds of details next time. Thanks for the patience! |
|
@dmedovich thank you! I want to reiterate that most of all though, I really appreciate the effort you put into the fix and the quick follow ups! Merging this now 🚀 |
|
@wpessers big love big brother |
recordMetricson the receiver (lastEventTime).lastEventTimeas the data point timestamp influshMetricsLocked, falling back totime.Now()only when no event has been observed yet (in which caseAppendDataPointsexports nothing anyway).r.mu, which already protects bothrecordMetricsandflushMetricsLocked.This mirrors the behavior of the log pipeline and resolves the metric/log timestamp divergence described in the linked issue.
Testing
TestMetricTimestampMatchesEventTime, which records events with a fixed RFC3339 time, performs a delayed flush, and asserts every Sum/Histogram data point'sTimestamp()matches the event time rather than wall-clocktime.Now().go test ./...andgo vet ./...pass for the package.Fixes #2263