redo destroy pg #435
base: stable/v4.x
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -449,6 +449,10 @@ void ReplicationStateMachine::write_snapshot_obj(std::shared_ptr< homestore::sna | |
| set_snapshot_context(context); // Update the snapshot context in case apply_snapshot is not called | ||
| auto hs_pg = home_object_->get_hs_pg(m_snp_rcv_handler->get_context_pg_id()); | ||
| hs_pg->pg_state_.clear_state(PGStateMask::BASELINE_RESYNC); | ||
| // we only reset this if destroying pg happens in BR case. for other cases (on_destroy and _exit_pg), | ||
| // since this replica will leave the PG and no later logs will be received, no need to reset this. | ||
| reset_no_space_left_error_info(); | ||
| repl_dev()->reset_latch_lsn(); | ||
| return; | ||
| } | ||
|
|
||
|
|
@@ -499,7 +503,7 @@ void ReplicationStateMachine::write_snapshot_obj(std::shared_ptr< homestore::sna | |
| if (home_object_->pg_exists(pg_data->pg_id())) { | ||
| LOGI("pg already exists, clean pg resources before snapshot, pg={} {}", pg_data->pg_id(), log_suffix); | ||
| // Need to pause state machine before destroying the PG, if fail, let raft retry. | ||
|
Collaborator
The comments are out of date; also, we don't have a branch that returns false as of now.
Collaborator
Author
Let's remove these out-of-date comments after addressing the other comments for this PR.
||
| - if (!home_object_->pg_destroy(pg_data->pg_id(), true /* pause state machine */)) { | ||
| + if (!home_object_->pg_destroy(pg_data->pg_id())) { | ||
| LOGE("failed to destroy existing pg, let raft retry, pg={} {}", pg_data->pg_id(), log_suffix); | ||
| return; | ||
| } | ||
|
|
@@ -1030,7 +1034,42 @@ void ReplicationStateMachine::on_log_replay_done(const homestore::group_id_t& gr | |
| const auto pg_id = pg_id_opt.value(); | ||
| RELEASE_ASSERT(home_object_->pg_exists(pg_id), "pg={} should exist, but not! fatal error!", pg_id); | ||
|
|
||
| - const auto& shards_in_pg = (const_cast< HSHomeObject::HS_PG* >(home_object_->_get_hs_pg_unlocked(pg_id)))->shards_; | ||
| + const auto hs_pg = (const_cast< HSHomeObject::HS_PG* >(home_object_->get_hs_pg(pg_id))); | ||
| RELEASE_ASSERT(hs_pg, "Failed to get pg={} when log replay done", pg_id); | ||
| if (hs_pg->pg_sb_->state == PGState::DESTROYED) { | ||
| // if we find a pg in the destroyed state on the recovery path, we can be sure that pg_destroy was called and | ||
| // a crash occurred before it completed. | ||
|
|
||
| // pg_destroy will be called in the following scenarios: | ||
| // 1 baseline resync: when the first snapshot message is received, if the pg already exists, we will call | ||
| // pg_destroy to clean up the stale pg resources before resync. | ||
|
|
||
| // 2 exit_pg: when processing exit_pg request, we will call pg_destroy to destroy the pg resources. | ||
|
|
||
| // 3 RaftReplDev::leave() is called and thus RaftReplDev::permanent_destroy() is called from nuraft_mesg. and | ||
| // RaftReplDev::leave() will be called in the following scenarios: | ||
| // a. destroy_repl_dev: commit log entry of journal_type_t::HS_CTRL_DESTROY | ||
| // b. removed from raft group: the move_out member in replace member. | ||
|
|
||
| // for baseline resync, no need to redo destroy pg here since the first snapshot message will be received again | ||
| // and trigger pg_destroy again if the pg already exists. but for other cases, we need to redo destroy pg to | ||
| // clean up the stale pg resources since no message will be received to trigger pg_destroy again. so, we call it | ||
| // here for all the above cases to make sure the stale pg resources are cleaned up. | ||
|
|
||
| // there is also a concern that in baseline resync case, if some resource is destroyed in pg_destroy but not all | ||
| // the resources are destroyed before crash, then when recovery, log replay will hit those destroyed resources | ||
| // and cause error. for example, we have a put_blob log after cp_lsn and before dc_lsn, and the pg_index_table | ||
| // is destroyed, but a crash happens before pg_super_blk is destroyed. on recovery, log replay will hit the | ||
| // put_blob log and try to write to the index table, but since the index table is destroyed, it will cause an | ||
| // error. but actually, this is not a problem: before we start pg_destroy in baseline resync, | ||
| // m_rd_sb->last_snapshot_lsn will be persisted up to snapshot.get_last_log_idx(). then all the logs less than | ||
| // or equal to m_rd_sb->last_snapshot_lsn will not be replayed or committed after recovery. so, the concern is | ||
|
Comment on lines +1064 to +1066
Collaborator
Author
@xiaoxichen in the baseline resync case, before we start destroying the pg, we persist m_rd_sb->last_snapshot_lsn up to snapshot.get_last_log_idx(). Then raft_repl_dev#need_skip_processing will skip replaying all of those logs in the recovery path (so that we will not hit destroyed resources such as pg_index_table). So we don't need to wait for all the appended logs to be committed in pg_destroy for the BR case.
Collaborator
So basically you reverted your changes.
Collaborator
I don't get the point of this comment. You said that for BR there is no need to redo destroy PG, but we do it anyway. The concern about log replay vs. destroy is not valid here: if we reach here, the log replay has already been done. If we want to record the thinking on why waiting for log commit is not needed, better to rephrase this paragraph and move it to Similar for L1043-1052, those lines explain the situations in which a PG can be destroyed, better to move to
||
| // not valid. pls refer to raft_repl_dev#need_skip_processing for more details. | ||
| home_object_->destroy_pg_resource(pg_id); | ||
|
xiaoxichen marked this conversation as resolved.
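The skip-on-replay argument made in this thread can be illustrated with a small model. This is only a sketch under assumptions: `ReplSuperBlk` and `lsns_to_replay` are hypothetical names, and the `need_skip_processing` below is a stand-in with an assumed signature and assumed semantics (skip when the lsn is at or below the persisted `last_snapshot_lsn`), not the actual homestore implementation.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the replica superblock field discussed above
// (m_rd_sb->last_snapshot_lsn); the real homestore type is not shown in this PR.
struct ReplSuperBlk {
    int64_t last_snapshot_lsn{0};
};

// Assumed semantics: a log entry is skipped when its lsn is already covered
// by the persisted snapshot lsn.
inline bool need_skip_processing(const ReplSuperBlk& sb, int64_t lsn) {
    return lsn <= sb.last_snapshot_lsn;
}

// Returns the lsns that would still be replayed after recovery.
inline std::vector< int64_t > lsns_to_replay(const ReplSuperBlk& sb, const std::vector< int64_t >& journal) {
    std::vector< int64_t > out;
    for (const auto lsn : journal) {
        if (!need_skip_processing(sb, lsn)) { out.push_back(lsn); }
    }
    return out;
}
```

In this model, once `last_snapshot_lsn` has been persisted before the destroy starts, a put_blob entry at or below that lsn is never replayed, so it cannot touch an index table that was already removed.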
|
||
| return; | ||
| } | ||
|
|
||
| const auto& shards_in_pg = hs_pg->shards_; | ||
| auto chunk_selector = home_object_->chunk_selector(); | ||
|
|
||
| for (const auto& shard_iter : shards_in_pg) { | ||
|
|
||
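The recovery-path change in this PR (redo the destroy when a PG is found in the DESTROYED state after log replay) relies on the cleanup being safe to run twice. A minimal sketch of that idea follows, assuming a simplified in-memory model: `Store`, `PGModel`, `mark_destroyed`, and the `stop_after` crash simulation are all hypothetical and do not mirror the real HSHomeObject types. The ordering (index table before superblock) follows the example in the code comment; the rest is an assumption.

```cpp
#include <cstdint>
#include <map>

enum class PGState : uint8_t { ALIVE, DESTROYED };

// Hypothetical in-memory model of a PG's persisted pieces.
struct PGModel {
    PGState state{PGState::ALIVE};
    bool has_shards{true};
    bool has_index_table{true};
};

class Store {
public:
    void create_pg(uint16_t pg_id) { pgs_[pg_id] = PGModel{}; }
    bool pg_exists(uint16_t pg_id) const { return pgs_.count(pg_id) != 0; }

    // First step of destroy: record the DESTROYED marker before touching resources.
    void mark_destroyed(uint16_t pg_id) { pgs_.at(pg_id).state = PGState::DESTROYED; }

    // Idempotent cleanup: every step tolerates resources that are already gone.
    // stop_after simulates a crash after that many steps (negative runs to completion).
    void destroy_pg_resource(uint16_t pg_id, int stop_after = -1) {
        auto it = pgs_.find(pg_id);
        if (it == pgs_.end()) { return; }
        int step = 0;
        if (stop_after == step++) { return; }
        it->second.has_shards = false;
        if (stop_after == step++) { return; }
        it->second.has_index_table = false;
        if (stop_after == step++) { return; }
        pgs_.erase(it); // the superblock equivalent is removed last
    }

    // Recovery hook: returns true if a half-finished destroy was redone.
    bool on_log_replay_done(uint16_t pg_id) {
        auto it = pgs_.find(pg_id);
        if (it == pgs_.end() || it->second.state != PGState::DESTROYED) { return false; }
        destroy_pg_resource(pg_id);
        return true;
    }

private:
    std::map< uint16_t, PGModel > pgs_;
};
```

Because the marker is written first and the superblock equivalent is removed last, a crash at any intermediate step leaves a PG that recovery can still find and finish cleaning up.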