diff --git a/docs/advanced/input_files/input-main.md b/docs/advanced/input_files/input-main.md index b5f458c21ea..8df45d94d3c 100644 --- a/docs/advanced/input_files/input-main.md +++ b/docs/advanced/input_files/input-main.md @@ -394,6 +394,8 @@ - [sccut](#sccut) - [sc\_drop\_thr](#sc_drop_thr) - [sc\_scf\_thr](#sc_scf_thr) + - [sc\_direction\_only](#sc_direction_only) + - [sc\_lambda\_strategy](#sc_lambda_strategy) - [vdW correction](#vdw-correction) - [vdw\_method](#vdw_method) - [vdw\_s6](#vdw_s6) @@ -3481,8 +3483,8 @@ - **Type**: Integer - **Description**: Determines whether to calculate the plus U correction, which is especially important for correlated electrons. - - 1: Calculate plus U correction with radius-adjustable localized projections (with parameter onsite_radius). - - 2: Calculate plus U correction using first zeta of NAOs as projections (this is old method for testing). + - 1: Calculate the plus U correction with radius-adjustable localized projections (controlled by the parameter onsite_radius). Supported for both PW and LCAO basis sets. + - 2: Calculate the plus U correction using the first zeta of the NAOs as projections (the old method, kept for testing). Only available for the LCAO basis. - 0: Do not calculate plus U correction. - **Default**: 0 @@ -3629,6 +3631,24 @@ - **Description**: Density error threshold for inner loop of spin-constrained SCF - **Default**: 1.0e-4 +### sc_direction_only + +- **Type**: Boolean +- **Availability**: *sc_mag_switch is true* +- **Description**: When true, only the direction of the magnetic moment is constrained to the target direction, while the magnitude is allowed to vary freely. This is useful for studying magnetic anisotropy or when the magnitude of the moment is determined by the electronic structure rather than an external constraint. When false (default), both the direction and magnitude of the magnetic moment are constrained to the target values. 
+- **Default**: False + +### sc_lambda_strategy + +- **Type**: String +- **Availability**: *sc_mag_switch is true* +- **Description**: Lambda update strategy for spin-constrained DFT. Available options are: + - bfgs: BFGS quasi-Newton method (default, robust and well-tested) + - linear_response: linear response method (Scheme B) + - augmented_lagrangian: augmented Lagrangian method (Scheme C) + - hybrid_delayed: hybrid delayed update (Scheme D) +- **Default**: bfgs + [back to top](#full-list-of-input-keywords) ## vdW correction diff --git a/docs/advanced/scf/construct_H.md b/docs/advanced/scf/construct_H.md index 69a22ad80e9..3100b934876 100644 --- a/docs/advanced/scf/construct_H.md +++ b/docs/advanced/scf/construct_H.md @@ -77,6 +77,6 @@ Here, we use a simple [example calculation](https://github.com/deepmodeling/abac Conventional functionals, e.g., L(S)DA and GGAs, encounter failures in strongly correlated systems, usually characterized by partially filled *d*/*f* shells. These include transition metals (TM) and their oxides, rare-earth compounds, and actinides, to name a few, where L(S)DA/GGAs typically yield quantitatively or even qualitatively wrong results. To address this failure, an efficient and successful method named DFT+*U*, which inherits the efficiency of L(S)DA/GGA but gains the strength of the Hubbard model in describing the physics of strongly correlatedsystems, has been developed. -Now the DFT+*U* method is accessible in ABACUS. The details of the DFT+*U* method could be found in this [paper](https://doi.org/10.1063/5.0090122). It should be noted that the DFT+*U* works only within the NAO scheme, which means that the value of the keyword `basis_type` must be lcao when DFT+*U* is called. To turn on DFT+*U*, users need to set the value of the `dft_plus_u` keyword in the `INPUT` file to be 1. 
All relevant parmeters used in DFT+*U* calculations are listed in the [DFT+*U* correction](../input_files/input-main.md#dftu-correction) part of the [list of keywords](../input_files/input-main.md). +Now the DFT+*U* method is accessible in ABACUS. The details of the DFT+*U* method can be found in this [paper](https://doi.org/10.1063/5.0090122). DFT+*U* is supported for both LCAO (`basis_type = lcao`) and plane-wave (`basis_type = pw`) basis sets. For the PW basis, only `dft_plus_u = 1` (radius-adjustable localized projections) is supported, with `nspin = 1`, `2`, or `4`. For the LCAO basis, both `dft_plus_u = 1` and `dft_plus_u = 2` are available. To turn on DFT+*U*, set the `dft_plus_u` keyword in the `INPUT` file to 1 (or to 2, LCAO only). All relevant parameters used in DFT+*U* calculations are listed in the [DFT+*U* correction](../input_files/input-main.md#dftu-correction) part of the [list of keywords](../input_files/input-main.md). Examples of DFT+*U* calculations are provided in this [directory](https://github.com/deepmodeling/abacus-develop/tree/develop/examples/dft_plus_u). diff --git a/docs/advanced/scf/spin.md b/docs/advanced/scf/spin.md index 1749db156dc..2de590e3c28 100644 --- a/docs/advanced/scf/spin.md +++ b/docs/advanced/scf/spin.md @@ -28,6 +28,224 @@ If **"ocp=1"** and **"ocp_set"** is set in INPUT file, the occupations of states 2. **"nupdown"** If **"nupdown"** is set to non-zero, number of spin-up and spin-down electrons will be fixed, and Fermi energy level will split to E_Fermi_up and E_Fermi_down. By the way, total magnetization will also be fixed, and will be the value of **"nupdown"**. +## DeltaSpin (Spin-Constrained DFT) + +DeltaSpin is a spin-constrained DFT method that allows users to constrain the magnetic moments on individual atoms to target values during self-consistent field (SCF) calculations. 
This is useful for studying magnetic excitations, non-collinear magnetic structures, and systems where the magnetic ground state is not known a priori. + +The theoretical foundation and implementation details can be found in: + +- Cai Z, Wang K, Xu Y, et al., "A self-adaptive first-principles approach for magnetic excited states," *Quantum Frontiers* 2.1 (2023): 21. [DOI: 10.1007/s44214-023-00050-z](https://doi.org/10.1007/s44214-023-00050-z) +- Zheng D, Peng X, Huang Y, et al., "Integrating deep-learning-based magnetic model and non-collinear spin-constrained method: methodology, implementation and application," *npj Computational Materials* (2026). + +### Enabling DeltaSpin + +Set `sc_mag_switch 1` in the INPUT file. DeltaSpin is supported for both PW (`basis_type = pw`) and LCAO (`basis_type = lcao`) basis sets, with `nspin = 2` (collinear) or `nspin = 4` (non-collinear). + +### Specifying Target Magnetic Moments in STRU + +Target magnetic moments and constraint flags are specified per atom in the `ATOMIC_POSITIONS` section of the STRU file, using the `mag` (or `magmom`), `sc`, `lambda`, `angle1`, and `angle2` keywords after the atomic coordinates. 
+ +#### Collinear (nspin=2) + +For collinear spin, only the z-component of the magnetic moment is constrained: + +``` +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 sc 1 +0.51 0.51 0.51 mag -2.0 sc 1 +``` + +- `mag 2.0`: target magnetic moment of 2.0 $\mu_B$ along z-axis +- `sc 1`: constrain the z-component (1 = constrained, 0 = unconstrained) + +#### Non-collinear (nspin=4), vector form + +For non-collinear spin, specify the magnetic moment as a vector (mx, my, mz): + +``` +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 0.0 0.0 sc 1 1 1 +0.51 0.51 0.51 mag 0.0 0.0 -2.0 sc 1 1 1 +``` + +- `mag 2.0 0.0 0.0`: target moment vector in Cartesian coordinates ($\mu_B$) +- `sc 1 1 1`: constrain x, y, z components respectively + +#### Non-collinear (nspin=4), angle form + +Alternatively, use `angle1` (polar angle $\theta$) and `angle2` (azimuthal angle $\phi$) in degrees to specify the direction: + +``` +0.00 0.00 0.00 mag 2.0 angle1 0 angle2 0 sc 1 1 1 +0.51 0.51 0.51 mag 2.0 angle1 180 angle2 0 sc 1 1 1 +``` + +The Cartesian components are computed as: +- $m_z = |\mathbf{m}| \cos\theta$ +- $m_x = |\mathbf{m}| \sin\theta \cos\phi$ +- $m_y = |\mathbf{m}| \sin\theta \sin\phi$ + +#### Providing initial Lagrange multipliers + +Initial lambda values (in eV/$\mu_B$) can be provided via the `lambda` keyword to accelerate convergence: + +``` +0.00 0.00 0.00 mag 2.0 lambda 0.01 0.0 0.0 sc 1 1 1 +``` + +A single value sets $\lambda_z$; three values set $\lambda_x$, $\lambda_y$, $\lambda_z$. + +#### Partial constraints + +Set `sc 0` for unconstrained components. 
For example, to constrain only the direction but not the magnitude (use with `sc_direction_only`): + +``` +0.00 0.00 0.00 mag 2.0 0.0 0.0 sc 1 1 0 +``` + +### DeltaSpin INPUT Parameters + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `sc_mag_switch` | Boolean | False | Enable DeltaSpin | +| `sc_thr` | Real | 1.0e-6 | Convergence criterion for lambda loop (RMS, in $\mu_B$) | +| `nsc` | Integer | 100 | Maximum number of lambda iterations | +| `nsc_min` | Integer | 2 | Minimum number of lambda iterations | +| `sc_scf_nmin` | Integer | 2 | Minimum outer SCF iterations before starting lambda loop | +| `alpha_trial` | Real | 0.01 | Initial trial step size for lambda (eV/$\mu_B^2$) | +| `sccut` | Real | 3.0 | Maximum step size for lambda (eV/$\mu_B$) | +| `sc_drop_thr` | Real | 1.0e-2 | Convergence ratio threshold for adaptive lambda loop | +| `sc_scf_thr` | Real | 1.0e-4 | Density error threshold for entering lambda loop | +| `sc_direction_only` | Boolean | False | Constrain only the direction, not the magnitude | +| `sc_lambda_strategy` | String | bfgs | Lambda update strategy (see below) | +| `decay_grad_switch` | Boolean | False | Enable gradient-based early exit | + +For full parameter details, see the [Spin-Constrained DFT](../input_files/input-main.md#spin-constrained-dft) section of the input keyword list. + +### Lambda Update Strategies + +The `sc_lambda_strategy` parameter controls how the Lagrange multipliers $\lambda$ are updated during the lambda loop: + +- **`bfgs`** (default): BFGS quasi-Newton method with line search. Robust and well-tested for both PW and LCAO. Uses `alpha_trial` and `sccut` to control step size. + +- **`linear_response`**: Linear response method (Scheme B). Estimates the magnetic susceptibility $\chi$ from the history of $(\lambda, M)$ pairs and performs a one-step Newton-like update: $\Delta\lambda = \beta (M_{\text{target}} - M) / \chi$, where $\beta$ is a mixing parameter. 
+ +- **`augmented_lagrangian`**: Augmented Lagrangian method (Scheme C). Uses a penalty parameter $\mu$ that grows over iterations: $\lambda_{\text{new}} = \lambda + \mu (M - M_{\text{target}})$. The penalty increases until convergence is achieved. + +- **`hybrid_delayed`**: Hybrid delayed update (Scheme D). Two-phase approach: in the early phase (SCF not yet converged), lambda updates are gentle; in the late phase (SCF nearly converged), augmented Lagrangian updates are applied. + +### Direction-Only Mode + +When `sc_direction_only 1` is set, only the **direction** of the magnetic moment is constrained to match the target, while the magnitude is allowed to vary freely. This is useful for: + +- Studying magnetic anisotropy energy surfaces +- Cases where the moment magnitude is determined by the electronic structure +- Converging to the easy-axis direction without fixing the moment size + +In this mode, the lambda vector is projected to be perpendicular to the target moment direction at each iteration, ensuring it can only rotate the magnetization, not stretch it. + +### Combining DeltaSpin with DFT+U + +DeltaSpin can be combined with DFT+U for strongly correlated systems. When both `sc_mag_switch` and `dft_plus_u` are enabled: + +1. DFT+U occupation update runs first in each SCF iteration +2. DeltaSpin lambda loop runs after, constraining the magnetic moments +3. 
The DFT+U-corrected Hamiltonian is used by the lambda loop + +Example INPUT for PW DFT+U + DeltaSpin: + +``` +INPUT_PARAMETERS +calculation scf +basis_type pw +ecutwfc 50 +nspin 2 +dft_plus_u 1 +orbital_corr -1 2 +hubbard_u 0.0 4.0 +sc_mag_switch 1 +sc_thr 1.0e-6 +sc_scf_thr 1.0e-4 +sc_lambda_strategy bfgs +``` + +### Example: Collinear antiferromagnetic Fe + +INPUT file: + +``` +INPUT_PARAMETERS +calculation scf +basis_type pw +ecutwfc 50 +nspin 2 +sc_mag_switch 1 +sc_thr 1.0e-6 +``` + +STRU file: + +``` +ATOMIC_SPECIES +Fe 55.845 Fe.upf + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 + +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 sc 1 +0.51 0.51 0.51 mag -2.0 sc 1 +``` + +### Example: Non-collinear constrained moments + +INPUT file: + +``` +INPUT_PARAMETERS +calculation scf +basis_type pw +ecutwfc 50 +nspin 4 +noncolin 1 +sc_mag_switch 1 +sc_direction_only 1 +sc_lambda_strategy bfgs +``` + +STRU file: + +``` +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 0.0 0.0 sc 1 1 0 +0.51 0.51 0.51 mag 0.0 0.0 2.0 sc 1 1 0 +``` + ## Noncollinear Spin Polarized Calculations The spin non-collinear polarization calculation corresponds to setting **"noncolin 1"**, in which case the coupling between spin up and spin down will be taken into account. In this case, nspin is automatically set to 4, which is usually not required to be specified manually. diff --git a/docs/parameters.yaml b/docs/parameters.yaml index 63ee83376c5..0b34a9543da 100644 --- a/docs/parameters.yaml +++ b/docs/parameters.yaml @@ -4266,6 +4266,26 @@ parameters: default_value: "1.0e-4" unit: "" availability: sc_mag_switch is true + - name: sc_direction_only + category: Spin-Constrained DFT + type: Boolean + description: | + When true, only the direction of the magnetic moment is constrained to the target direction, while the magnitude is allowed to vary freely. 
This is useful for studying magnetic anisotropy or when the magnitude of the moment is determined by the electronic structure rather than an external constraint. When false (default), both the direction and magnitude of the magnetic moment are constrained to the target values. + default_value: "False" + unit: "" + availability: sc_mag_switch is true + - name: sc_lambda_strategy + category: Spin-Constrained DFT + type: String + description: | + Lambda update strategy for spin-constrained DFT. Available options are: + * bfgs: BFGS quasi-Newton method (default, robust and well-tested) + * linear_response: linear response method (Scheme B) + * augmented_lagrangian: augmented Lagrangian method (Scheme C) + * hybrid_delayed: hybrid delayed update (Scheme D) + default_value: "bfgs" + unit: "" + availability: sc_mag_switch is true - name: qo_switch category: Quasiatomic Orbital (QO) analysis type: Boolean diff --git a/source/source_base/kernels/cuda/math_kernel_op.cu b/source/source_base/kernels/cuda/math_kernel_op.cu index c5b0648c49b..062ebe0e765 100644 --- a/source/source_base/kernels/cuda/math_kernel_op.cu +++ b/source/source_base/kernels/cuda/math_kernel_op.cu @@ -314,6 +314,9 @@ void gemm_op, base_device::DEVICE_GPU>::operator()(const ch { cublasOperation_t cutransA = judge_trans_op(true, transa, "gemm_op"); cublasOperation_t cutransB = judge_trans_op(true, transb, "gemm_op"); + if (cublas_handle == nullptr) { + CHECK_CUBLAS(cublasCreate(&cublas_handle)); + } CHECK_CUBLAS(cublasZgemm(cublas_handle, cutransA, cutransB, m, n ,k, (double2*)alpha, (double2*)a , lda, (double2*)b, ldb, (double2*)beta, (double2*)c, ldc)); } diff --git a/source/source_base/main.cpp b/source/source_base/main.cpp index 9a32f11d289..ec5db9d3266 100644 --- a/source/source_base/main.cpp +++ b/source/source_base/main.cpp @@ -36,7 +36,7 @@ void calculate() /* time_t time_start = std::time(NULL); -// ModuleBase::timer::start(); +// ModuleBase::timer::tick(); 
//---------------------------------------------------------- // main program for doing electronic structure calculations diff --git a/source/source_base/module_container/base/macros/cuda.h b/source/source_base/module_container/base/macros/cuda.h index 572eecdffd0..521861664a6 100644 --- a/source/source_base/module_container/base/macros/cuda.h +++ b/source/source_base/module_container/base/macros/cuda.h @@ -67,11 +67,13 @@ struct GetTypeCuda { static constexpr cudaDataType cuda_data_type = cudaDataType::CUDA_R_64F; }; +#if CUDA_VERSION >= 11000 template <> struct GetTypeCuda { static constexpr cudaDataType cuda_data_type = cudaDataType::CUDA_R_64I; }; +#endif template <> struct GetTypeCuda> { diff --git a/source/source_base/module_container/base/third_party/cusolver.h b/source/source_base/module_container/base/third_party/cusolver.h index 529109823df..43e97856153 100644 --- a/source/source_base/module_container/base/third_party/cusolver.h +++ b/source/source_base/module_container/base/third_party/cusolver.h @@ -19,6 +19,8 @@ namespace container { namespace cuSolverConnector { +#if CUDA_VERSION >= 11000 +// Generic API (CUDA 11.0+) template static inline void trtri (cusolverDnHandle_t& cusolver_handle, const char& uplo, const char& diag, const int& n, T* A, const int& lda) @@ -37,7 +39,7 @@ void trtri (cusolverDnHandle_t& cusolver_handle, const char& uplo, const char& d int h_info = 0; int* d_info = nullptr; CHECK_CUDA(cudaMalloc((void**)&d_info, sizeof(int))); - // Perform Cholesky decomposition + // Perform triangular matrix inversion CHECK_CUSOLVER(cusolverDnXtrtri(cusolver_handle, cublas_fill_mode(uplo), cublas_diag_type(diag), n, GetTypeCuda::cuda_data_type, reinterpret_cast(A), n, d_work, d_lwork, h_work, h_lwork, d_info)); CHECK_CUDA(cudaMemcpy(&h_info, d_info, sizeof(int), cudaMemcpyDeviceToHost)); if (h_info != 0) { @@ -47,6 +49,57 @@ void trtri (cusolverDnHandle_t& cusolver_handle, const char& uplo, const char& d CHECK_CUDA(cudaFree(d_work)); 
CHECK_CUDA(cudaFree(d_info)); } +#else +// Legacy API fallback (CUDA < 11.0) +static inline void trtri(cusolverDnHandle_t& cusolver_handle, const char& uplo, const char& diag, const int& n, float* A, const int& lda) +{ + int lwork = 0; + CHECK_CUSOLVER(cusolverDnStrtri_bufferSize(cusolver_handle, cublas_fill_mode(uplo), cublas_diag_type(diag), n, A, lda, &lwork)); + float* d_work = nullptr; + CHECK_CUDA(cudaMalloc((void**)&d_work, lwork * sizeof(float))); + int* d_info = nullptr; + CHECK_CUDA(cudaMalloc((void**)&d_info, sizeof(int))); + CHECK_CUSOLVER(cusolverDnStrtri(cusolver_handle, cublas_fill_mode(uplo), cublas_diag_type(diag), n, A, lda, d_work, lwork, d_info)); + CHECK_CUDA(cudaFree(d_work)); + CHECK_CUDA(cudaFree(d_info)); +} +static inline void trtri(cusolverDnHandle_t& cusolver_handle, const char& uplo, const char& diag, const int& n, double* A, const int& lda) +{ + int lwork = 0; + CHECK_CUSOLVER(cusolverDnDtrtri_bufferSize(cusolver_handle, cublas_fill_mode(uplo), cublas_diag_type(diag), n, A, lda, &lwork)); + double* d_work = nullptr; + CHECK_CUDA(cudaMalloc((void**)&d_work, lwork * sizeof(double))); + int* d_info = nullptr; + CHECK_CUDA(cudaMalloc((void**)&d_info, sizeof(int))); + CHECK_CUSOLVER(cusolverDnDtrtri(cusolver_handle, cublas_fill_mode(uplo), cublas_diag_type(diag), n, A, lda, d_work, lwork, d_info)); + CHECK_CUDA(cudaFree(d_work)); + CHECK_CUDA(cudaFree(d_info)); +} +static inline void trtri(cusolverDnHandle_t& cusolver_handle, const char& uplo, const char& diag, const int& n, std::complex<float>* A, const int& lda) +{ + int lwork = 0; + CHECK_CUSOLVER(cusolverDnCtrtri_bufferSize(cusolver_handle, cublas_fill_mode(uplo), cublas_diag_type(diag), n, reinterpret_cast<cuComplex*>(A), lda, &lwork)); + cuComplex* d_work = nullptr; + CHECK_CUDA(cudaMalloc((void**)&d_work, lwork * sizeof(cuComplex))); + int* d_info = nullptr; + CHECK_CUDA(cudaMalloc((void**)&d_info, sizeof(int))); + CHECK_CUSOLVER(cusolverDnCtrtri(cusolver_handle, cublas_fill_mode(uplo), 
cublas_diag_type(diag), n, reinterpret_cast<cuComplex*>(A), lda, d_work, lwork, d_info)); + CHECK_CUDA(cudaFree(d_work)); + CHECK_CUDA(cudaFree(d_info)); +} +static inline void trtri(cusolverDnHandle_t& cusolver_handle, const char& uplo, const char& diag, const int& n, std::complex<double>* A, const int& lda) +{ + int lwork = 0; + CHECK_CUSOLVER(cusolverDnZtrtri_bufferSize(cusolver_handle, cublas_fill_mode(uplo), cublas_diag_type(diag), n, reinterpret_cast<cuDoubleComplex*>(A), lda, &lwork)); + cuDoubleComplex* d_work = nullptr; + CHECK_CUDA(cudaMalloc((void**)&d_work, lwork * sizeof(cuDoubleComplex))); + int* d_info = nullptr; + CHECK_CUDA(cudaMalloc((void**)&d_info, sizeof(int))); + CHECK_CUSOLVER(cusolverDnZtrtri(cusolver_handle, cublas_fill_mode(uplo), cublas_diag_type(diag), n, reinterpret_cast<cuDoubleComplex*>(A), lda, d_work, lwork, d_info)); + CHECK_CUDA(cudaFree(d_work)); + CHECK_CUDA(cudaFree(d_info)); +} +#endif static inline void potri (cusolverDnHandle_t& cusolver_handle, const char& uplo, const char& diag, const int& n, float * A, const int& lda) diff --git a/source/source_base/module_device/device_check.h b/source/source_base/module_device/device_check.h index f649676001a..b009a6cc69a 100644 --- a/source/source_base/module_device/device_check.h +++ b/source/source_base/module_device/device_check.h @@ -67,6 +67,7 @@ static const char* _cusolverGetErrorString(cusolverStatus_t error) return "CUSOLVER_STATUS_ZERO_PIVOT"; case CUSOLVER_STATUS_INVALID_LICENSE: return "CUSOLVER_STATUS_INVALID_LICENSE"; +#if CUDA_VERSION >= 11000 case CUSOLVER_STATUS_IRS_PARAMS_NOT_INITIALIZED: return "CUSOLVER_STATUS_IRS_PARAMS_NOT_INITIALIZED"; case CUSOLVER_STATUS_IRS_PARAMS_INVALID: @@ -93,6 +94,7 @@ static const char* _cusolverGetErrorString(cusolverStatus_t error) return "CUSOLVER_STATUS_IRS_MATRIX_SINGULAR"; case CUSOLVER_STATUS_INVALID_WORKSPACE: return "CUSOLVER_STATUS_INVALID_WORKSPACE"; +#endif default: return ""; } diff --git a/source/source_base/tool_quit.cpp b/source/source_base/tool_quit.cpp index 
65297226eea..d49c8e52250 100644 --- a/source/source_base/tool_quit.cpp +++ b/source/source_base/tool_quit.cpp @@ -133,7 +133,7 @@ void WARNING_QUIT(const std::string &file,const std::string &description,int ret void CHECK_WARNING_QUIT(const bool error_in, const std::string &file,const std::string &calculation,const std::string &description) { #ifdef __NORMAL - if(error_in) std::cout << description << std::endl; +// only for UT, do nothing here #else if(error_in) { diff --git a/source/source_basis/module_pw/pw_basis_k.cpp b/source/source_basis/module_pw/pw_basis_k.cpp index 727c0d03ba3..0f997d3180a 100644 --- a/source/source_basis/module_pw/pw_basis_k.cpp +++ b/source/source_basis/module_pw/pw_basis_k.cpp @@ -145,19 +145,10 @@ void PW_Basis_K::setupIndGk() } } this->npwk[ik] = ng; - int ng_global_k = ng; -#ifdef __MPI - MPI_Allreduce(MPI_IN_PLACE, &ng_global_k, 1, MPI_INT, MPI_SUM, this->pool_world); -#endif - const char* no_pw_message = "Current core has no plane waves! Please reduce the cores."; - if (ng_global_k == 0) - { - no_pw_message = "No plane waves are available for this k-point across the whole pool. Please increase ecutwfc or check KPT settings."; - } ModuleBase::CHECK_WARNING_QUIT((ng == 0), "pw_basis_k.cpp", PARAM.inp.calculation, - no_pw_message); + "Current core has no plane waves! Please reduce the cores."); if (this->npwk_max < ng) { this->npwk_max = ng; diff --git a/source/source_basis/module_pw/pw_distributeg.cpp b/source/source_basis/module_pw/pw_distributeg.cpp index ea026e88d41..317d6ad863b 100644 --- a/source/source_basis/module_pw/pw_distributeg.cpp +++ b/source/source_basis/module_pw/pw_distributeg.cpp @@ -25,9 +25,8 @@ void PW_Basis::distribute_g() { ModuleBase::WARNING_QUIT("divide", "No such division type."); } - const char* no_pw_message = "Current core has no plane waves! 
Please reduce the cores."; ModuleBase::CHECK_WARNING_QUIT((this->npw == 0), "pw_distributeg.cpp", PARAM.inp.calculation, - no_pw_message); + "Current core has no plane waves! Please reduce the cores."); ModuleBase::timer::end(this->classname, "distributeg"); return; } diff --git a/source/source_basis/module_pw/test/test-other.cpp b/source/source_basis/module_pw/test/test-other.cpp index b81787f16d8..c367cc459c0 100644 --- a/source/source_basis/module_pw/test/test-other.cpp +++ b/source/source_basis/module_pw/test/test-other.cpp @@ -139,66 +139,4 @@ TEST_F(PWTEST,test_other) #ifdef __ENABLE_FLOAT_FFTW fftwf_cleanup(); #endif -} - -TEST_F(PWTEST, test_no_plane_wave_message_global_empty_k) -{ - ModulePW::PW_Basis_K pwktest(device_flag, precision_flag); - ModuleBase::Matrix3 latvec(0.2, 0, 0, 0, 1, 0, 0, 0, 1); -#ifdef __MPI - pwktest.initmpi(nproc_in_pool, rank_in_pool, POOL_WORLD); -#endif - const int nks = 1; - ModuleBase::Vector3 kvec_d[nks]; - kvec_d[0].set(0.5, 0.5, 0.5); - - pwktest.initgrids(2, latvec, 4, 4, 4); - pwktest.initparameters(true, 1e-4, nks, kvec_d); - testing::internal::CaptureStdout(); - pwktest.setuptransform(); - std::string output = testing::internal::GetCapturedStdout(); - - EXPECT_THAT(output, - testing::HasSubstr("No plane waves are available for this k-point across the whole pool. 
Please increase ecutwfc or check KPT settings.")); -} - -TEST_F(PWTEST, test_no_plane_wave_message_parallel_local_empty) -{ -#ifndef __MPI - GTEST_SKIP() << "Requires MPI ranks to simulate local-empty but global-nonempty case."; -#else - if (nproc_in_pool <= 1) - { - GTEST_SKIP() << "Requires more than one MPI rank."; - } - - ModulePW::PW_Basis_K pwktest(device_flag, precision_flag); - ModuleBase::Matrix3 latvec(0.2, 0, 0, 0, 1, 0, 0, 0, 1); - pwktest.initmpi(nproc_in_pool, rank_in_pool, POOL_WORLD); - - const int nks = 1; - ModuleBase::Vector3 kvec_d[nks]; - kvec_d[0].set(0.0, 0.0, 0.0); - - pwktest.initgrids(2, latvec, 4, 4, 4); - pwktest.initparameters(true, 8.0, nks, kvec_d); - testing::internal::CaptureStdout(); - pwktest.setuptransform(); - std::string output = testing::internal::GetCapturedStdout(); - - const int local_npwk = pwktest.npwk[0]; - int global_npwk = local_npwk; - MPI_Allreduce(MPI_IN_PLACE, &global_npwk, 1, MPI_INT, MPI_SUM, POOL_WORLD); - - const int local_target_rank = (local_npwk == 0 && global_npwk > 0) ? 1 : 0; - int any_target_rank = local_target_rank; - MPI_Allreduce(MPI_IN_PLACE, &any_target_rank, 1, MPI_INT, MPI_MAX, POOL_WORLD); - EXPECT_EQ(any_target_rank, 1); - - if (local_target_rank == 1) - { - EXPECT_THAT(output, - testing::HasSubstr("Current core has no plane waves! Please reduce the cores.")); - } -#endif } \ No newline at end of file diff --git a/source/source_esolver/esolver_ks_lcao.cpp b/source/source_esolver/esolver_ks_lcao.cpp index dd6201bd0b7..c65afdd03f5 100644 --- a/source/source_esolver/esolver_ks_lcao.cpp +++ b/source/source_esolver/esolver_ks_lcao.cpp @@ -396,7 +396,38 @@ void ESolver_KS_LCAO::hamilt2rho_single(UnitCell& ucell, int istep, int bool skip_charge = PARAM.inp.calculation == "nscf" ? 
true : false; // 2) run the inner lambda loop to contrain atomic moments with the DeltaSpin method - bool skip_solve = run_deltaspin_lambda_loop_lcao(iter - 1, this->drho, PARAM.inp); + bool skip_solve = false; + if (PARAM.inp.sc_mag_switch) + { + spinconstrain::SpinConstrain& sc = spinconstrain::SpinConstrain::getScInstance(); + if (!sc.mag_converged() && this->drho > 0 && this->drho < PARAM.inp.sc_scf_thr) + { + // optimize lambda to get target magnetic moments, but the lambda is not near target + if (PARAM.inp.nspin == 2) + { + sc.run_lambda_loop_lcao(iter - 1); + } + else + { + sc.run_lambda_loop(iter - 1); + } + sc.set_mag_converged(true); + skip_solve = true; + } + else if (sc.mag_converged()) + { + // optimize lambda to get target magnetic moments, but the lambda is not near target + if (PARAM.inp.nspin == 2) + { + sc.run_lambda_loop_lcao(iter - 1); + } + else + { + sc.run_lambda_loop(iter - 1); + } + skip_solve = true; + } + } // 3) run Hsolver if (!skip_solve) diff --git a/source/source_esolver/esolver_ks_lcao_tddft.cpp b/source/source_esolver/esolver_ks_lcao_tddft.cpp index 130fc94139f..361e14caad5 100644 --- a/source/source_esolver/esolver_ks_lcao_tddft.cpp +++ b/source/source_esolver/esolver_ks_lcao_tddft.cpp @@ -54,12 +54,6 @@ ESolver_KS_LCAO_TDDFT::~ESolver_KS_LCAO_TDDFT() delete td_p; } TD_info::td_vel_op = nullptr; - - if (td_mg_ != nullptr) - { - delete td_mg_; - td_mg_ = nullptr; - } } template @@ -100,16 +94,6 @@ void ESolver_KS_LCAO_TDDFT::runner(UnitCell& ucell, const int istep) // 1) before_scf (electronic iteration loops) //---------------------------------------------------------------- this->before_scf(ucell, istep); // From ESolver_KS_LCAO - - // Initialize the moving spatial gauge - if (use_td_moving_gauge && this->td_mg_ == nullptr) - { - this->td_mg_ = new module_rt::TD_MovingGauge(); - auto* hamilt_lcao = dynamic_cast, TR>*>(this->p_hamilt); - const hamilt::HContainer* sR_template = hamilt_lcao->getSR(); - 
this->td_mg_->init_DR(sR_template, &ucell, &this->pv, this->two_center_bundle_.overlap_orb.get()); - } - if (PARAM.inp.td_stype == 2) { this->dmat.dm->cal_DMR_td(ucell, TD_info::cart_At); @@ -258,14 +242,6 @@ void ESolver_KS_LCAO_TDDFT::hamilt2rho_single(UnitCell& ucell, const int iter, const double ethr) { - // Update the moving spatial gauge - if (use_td_moving_gauge) - { - auto* hamilt_lcao = dynamic_cast, TR>*>(this->p_hamilt); - const hamilt::HContainer* sR_template = hamilt_lcao->getSR(); - this->td_mg_->update_DR(sR_template, &ucell, &this->pv, this->two_center_bundle_.overlap_orb.get()); - } - if (PARAM.inp.init_wfc == "file") { if (istep >= TD_info::estep_shift + 1) @@ -285,11 +261,7 @@ void ESolver_KS_LCAO_TDDFT::hamilt2rho_single(UnitCell& ucell, GlobalV::ofs_running, PARAM.inp.propagator, use_tensor, - use_lapack, - this->td_mg_, - &ucell, - this->kv.kvec_d, - use_td_moving_gauge); + use_lapack); } this->weight_dm_rho(ucell); } @@ -309,11 +281,7 @@ void ESolver_KS_LCAO_TDDFT::hamilt2rho_single(UnitCell& ucell, GlobalV::ofs_running, PARAM.inp.propagator, use_tensor, - use_lapack, - this->td_mg_, - &ucell, - this->kv.kvec_d, - use_td_moving_gauge); + use_lapack); this->weight_dm_rho(ucell); } else diff --git a/source/source_esolver/esolver_ks_lcao_tddft.h b/source/source_esolver/esolver_ks_lcao_tddft.h index b4227a9ab7d..f534b303f44 100644 --- a/source/source_esolver/esolver_ks_lcao_tddft.h +++ b/source/source_esolver/esolver_ks_lcao_tddft.h @@ -7,7 +7,6 @@ #include "source_lcao/module_rt/gather_mat.h" // MPI gathering and distributing functions #include "source_lcao/module_rt/kernels/cublasmp_context.h" #include "source_lcao/module_rt/td_info.h" -#include "source_lcao/module_rt/td_moving_gauge.h" #include "source_lcao/module_rt/velocity_op.h" namespace ModuleESolver @@ -67,10 +66,6 @@ class ESolver_KS_LCAO_TDDFT : public ESolver_KS_LCAO, TR> TD_info* td_p = nullptr; - //! 
Moving spatial gauge for Ehrenfest dynamics, to calculate the correction term arising from the movement of basis - bool use_td_moving_gauge = false; - module_rt::TD_MovingGauge* td_mg_ = nullptr; - //! Restart flag bool restart_done = false; diff --git a/source/source_esolver/esolver_ks_pw.cpp b/source/source_esolver/esolver_ks_pw.cpp index 6714821d02f..bf1cb4e6c27 100644 --- a/source/source_esolver/esolver_ks_pw.cpp +++ b/source/source_esolver/esolver_ks_pw.cpp @@ -189,7 +189,7 @@ void ESolver_KS_PW::iter_init(UnitCell& ucell, const int istep, const // update local occupations for DFT+U // should before lambda loop in DeltaSpin - pw::iter_init_dftu_pw(iter, istep, this->dftu, this->stp.template get_psi_t(), this->pelec->wg, ucell, PARAM.inp); + pw::iter_init_dftu_pw(iter, istep, this->dftu, this->stp.template get_psi_t(), this->pelec->wg, ucell, this->p_chgmix); } // Temporary, it should be replaced by hsolver later. diff --git a/source/source_esolver/esolver_sdft_pw.cpp b/source/source_esolver/esolver_sdft_pw.cpp index 02300eb3c58..fbe2c1b24ad 100644 --- a/source/source_esolver/esolver_sdft_pw.cpp +++ b/source/source_esolver/esolver_sdft_pw.cpp @@ -157,8 +157,8 @@ void ESolver_SDFT_PW::hamilt2rho_single(UnitCell& ucell, int istep, i this->p_hamilt_sto, PARAM.inp.calculation, PARAM.inp.basis_type, - PARAM.inp.ks_solver, - PARAM.globalv.use_uspp, + PARAM.inp.ks_solver, + PARAM.globalv.use_uspp, PARAM.inp.nspin, hsolver::DiagoIterAssist::SCF_ITER, hsolver::DiagoIterAssist::PW_DIAG_NMAX, diff --git a/source/source_esolver/lcao_others.cpp b/source/source_esolver/lcao_others.cpp index b3ad0c71499..62aadebe130 100644 --- a/source/source_esolver/lcao_others.cpp +++ b/source/source_esolver/lcao_others.cpp @@ -156,6 +156,7 @@ void ESolver_KS_LCAO::others(UnitCell& ucell, const int istep) PARAM.inp.sccut, PARAM.inp.sc_drop_thr, ucell, + PARAM.inp.sc_direction_only, &(this->pv), PARAM.inp.nspin, this->kv, diff --git a/source/source_estate/elecstate_lcao.h 
b/source/source_estate/elecstate_lcao.h index bf1f11e1f7e..1e7cafbfa62 100644 --- a/source/source_estate/elecstate_lcao.h +++ b/source/source_estate/elecstate_lcao.h @@ -3,6 +3,8 @@ #include "elecstate.h" #include "source_estate/module_dm/density_matrix.h" +#include "source_basis/module_ao/parallel_orbitals.h" +#include "source_cell/klist.h" #include @@ -26,11 +28,21 @@ class ElecStateLCAO : public ElecState virtual ~ElecStateLCAO() { + if (this->DM != nullptr) + { + delete this->DM; + } } // update charge density for next scf step // void getNewRho() override; + // initial density matrix + void init_DM(const K_Vectors* kv, const Parallel_Orbitals* paraV, const int nspin); + DensityMatrix* get_DM() const + { + return const_cast*>(this->DM); + } static int out_wfc_lcao; static bool need_psi_grid; @@ -48,6 +60,9 @@ class ElecStateLCAO : public ElecState std::vector pexsi_EDM, DensityMatrix* dm); + private: + DensityMatrix* DM = nullptr; + }; template @@ -56,6 +71,17 @@ int ElecStateLCAO::out_wfc_lcao = 0; template bool ElecStateLCAO::need_psi_grid = true; +// init_DM implementation +template +void ElecStateLCAO::init_DM(const K_Vectors* kv, const Parallel_Orbitals* paraV, const int nspin) +{ + if (this->DM != nullptr) + { + delete this->DM; + } + this->DM = new DensityMatrix(paraV, nspin); +} + } // namespace elecstate #endif diff --git a/source/source_estate/module_charge/charge_mixing.cpp b/source/source_estate/module_charge/charge_mixing.cpp index 921d102502c..a91cc1b39fa 100644 --- a/source/source_estate/module_charge/charge_mixing.cpp +++ b/source/source_estate/module_charge/charge_mixing.cpp @@ -257,3 +257,34 @@ bool Charge_Mixing::if_scf_oscillate(const int iteration, const double drho, con return false; } + +void Charge_Mixing::allocate_mixing_uom(int uom_size) +{ + ModuleBase::TITLE("Charge_Mixing", "allocate_mixing_uom"); + ModuleBase::timer::start("Charge_Mixing", "allocate_mixing_uom"); + ModuleBase::timer::end("Charge_Mixing", "allocate_mixing_uom"); + 
// For nspin=2, uom_size already includes both spin channels + // (eff_pot_pw.size() = pot_index * 2 for nspin=2) + // So uom_fold should always be 1 + this->mixing->init_mixing_data(this->uom_mdata, uom_size, sizeof(double)); + this->uom_mdata.reset(); + ModuleBase::timer::start("Charge_Mixing", "allocate_mixing_uom"); + ModuleBase::timer::end("Charge_Mixing", "allocate_mixing_uom"); + return; +} + +void Charge_Mixing::mix_uom(std::vector& uom_in, std::vector& uom_save_in) +{ + ModuleBase::TITLE("Charge_Mixing", "mix_uom"); + ModuleBase::timer::start("Charge_Mixing", "mix_uom"); + ModuleBase::timer::end("Charge_Mixing", "mix_uom"); + double* uom_value_out = uom_in.data(); + double* uom_value_in = uom_save_in.data(); + // For all nspin cases, uom_array layout is already fully sized + // and mixing operates on the entire array + this->mixing->push_data(this->uom_mdata, uom_value_in, uom_value_out, nullptr, false); + this->mixing->mix_data(this->uom_mdata, uom_value_out); + ModuleBase::timer::start("Charge_Mixing", "mix_uom"); + ModuleBase::timer::end("Charge_Mixing", "mix_uom"); + return; +} diff --git a/source/source_estate/module_charge/charge_mixing.h b/source/source_estate/module_charge/charge_mixing.h index 3152dc5e204..c24a866df91 100644 --- a/source/source_estate/module_charge/charge_mixing.h +++ b/source/source_estate/module_charge/charge_mixing.h @@ -50,6 +50,7 @@ class Charge_Mixing double& tpiba_in); void close_kerker_gg0() { mixing_gg0 = 0.0; mixing_gg0_mag = 0.0; } + void conserve_setting() { mixing_beta = 0.01; mixing_beta_mag = 0.04; } /** * @brief initialize mixing, including constructing mixing and allocating memory for mixing data * @brief this function should be called at eachiterinit() @@ -74,7 +75,20 @@ class Charge_Mixing */ void mix_dmr(elecstate::DensityMatrix* DM); void mix_dmr(elecstate::DensityMatrix, double>* DM); - + + /** + * @brief allocate memory of uom_mdata + * @param uom_size size of DFT+U occupation matrix + */ + void 
allocate_mixing_uom(int uom_size); + + /** + * @brief DFT+U occupation matrix mixing + * @param uom_in output occupation matrix + * @param uom_save_in input occupation matrix + */ + void mix_uom(std::vector& uom_in, std::vector& uom_save_in); + /** * @brief Get the drho between rho and rho_save, similar for get_dkin * @@ -118,6 +132,7 @@ class Charge_Mixing Base_Mixing::Mixing_Data tau_mdata; ///< Mixing data for kinetic energy density Base_Mixing::Mixing_Data nhat_mdata; ///< Mixing data for compensation density Base_Mixing::Mixing_Data dmr_mdata; ///< Mixing data for real space density matrix + Base_Mixing::Mixing_Data uom_mdata; ///< Mixing data for DFT+U occupation matrix Base_Mixing::Plain_Mixing* mixing_highf = nullptr; ///< The high_frequency part is mixed by plain mixing method. //====================================== diff --git a/source/source_estate/module_charge/chgmixing.cpp b/source/source_estate/module_charge/chgmixing.cpp index 45e5c5b350c..1fd48fac5d3 100644 --- a/source/source_estate/module_charge/chgmixing.cpp +++ b/source/source_estate/module_charge/chgmixing.cpp @@ -128,6 +128,13 @@ void module_charge::chgmixing_ks_pw(const int iter, // scf iteration number { p_chgmix->init_mixing(); p_chgmix->mixing_restart_step = inp.scf_nmax + 1; + if (inp.dft_plus_u && inp.mixing_dftu) + { + // enable mixing_dftu for DFT+U occupation mixing + dftu.enable_mixing(); + // allocate memory for uom_mdata + p_chgmix->allocate_mixing_uom(dftu.get_size_eff_pot_pw()); + } } // For mixing restart @@ -158,9 +165,9 @@ void module_charge::chgmixing_ks_pw(const int iter, // scf iteration number { dftu.uramping_update(); // update U by uramping if uramping > 0.01 std::cout << " U-Ramping! 
Current U = "; - for (int i = 0; i < dftu.U0.size(); i++) + for (int i = 0; i < dftu.get_num_u_types(); i++) { - std::cout << dftu.U[i] * ModuleBase::Ry_to_eV << " "; + std::cout << dftu.get_hubbard_u(i) * ModuleBase::Ry_to_eV << " "; } std::cout << " eV " << std::endl; } @@ -184,13 +191,18 @@ void module_charge::chgmixing_ks_lcao(const int iter, // scf iteration number p_chgmix->mix_reset(); // init mixing p_chgmix->mixing_restart_step = inp.scf_nmax + 1; p_chgmix->mixing_restart_count = 0; + // enable mixing_dftu for DFT+U occupation mixing + if (inp.dft_plus_u && inp.mixing_dftu) + { + dftu.enable_mixing(); + } // this output will be removed once the feature is stable if (dftu.uramping > 0.01) { std::cout << " U-Ramping! Current U = "; - for (int i = 0; i < dftu.U0.size(); i++) + for (int i = 0; i < dftu.get_num_u_types(); i++) { - std::cout << dftu.U[i] * ModuleBase::Ry_to_eV << " "; + std::cout << dftu.get_hubbard_u(i) * ModuleBase::Ry_to_eV << " "; } std::cout << " eV " << std::endl; } @@ -207,9 +219,9 @@ void module_charge::chgmixing_ks_lcao(const int iter, // scf iteration number if (dftu.uramping > 0.01) { std::cout << " U-Ramping! 
Current U = "; - for (int i = 0; i < dftu.U0.size(); i++) + for (int i = 0; i < dftu.get_num_u_types(); i++) { - std::cout << dftu.U[i] * ModuleBase::Ry_to_eV << " "; + std::cout << dftu.get_hubbard_u(i) * ModuleBase::Ry_to_eV << " "; } std::cout << " eV " << std::endl; } diff --git a/source/source_io/module_output/print_info.cpp b/source/source_io/module_output/print_info.cpp index 398cbb49a8f..b76e7631fa9 100644 --- a/source/source_io/module_output/print_info.cpp +++ b/source/source_io/module_output/print_info.cpp @@ -85,7 +85,7 @@ void print_parameters( const bool orbinfo = (inp.basis_type=="lcao" || inp.basis_type=="lcao_in_pw" || (inp.basis_type=="pw" && inp.init_wfc.substr(0, 3) == "nao")); - + if (orbinfo) { std::cout << std::setw(12) << "NBASE"; } std::cout << std::endl; std::cout << " " << std::setw(8) << inp.nspin; @@ -103,8 +103,13 @@ void print_parameters( << std::setw(14) << PARAM.globalv.nthread_per_proc << std::setw(14) << PARAM.globalv.nthread_per_proc*GlobalV::NPROC; + if (orbinfo) { std::cout << std::setw(12) << PARAM.globalv.nlocal; } + std::cout << std::endl; + + + std::cout << " ----------------------------------------------------------------" << std::endl; if(inp.basis_type == "lcao") { @@ -120,13 +125,11 @@ void print_parameters( } std::cout << " ----------------------------------------------------------------" << std::endl; + + //---------------------------------- // second part //---------------------------------- - if (orbinfo) - { - std::cout << " TOTAL NBASE" << " " << PARAM.globalv.nlocal << std::endl; - } std::cout << " " << std::setw(8) << "ELEMENT"; @@ -137,6 +140,7 @@ void print_parameters( } std::cout << std::setw(12) << "NATOM"; + std::cout << std::setw(12) << "XC"; std::cout << std::endl; diff --git a/source/source_io/module_parameter/input_parameter.h b/source/source_io/module_parameter/input_parameter.h index 029ad364eb5..24d5a3efbfa 100644 --- a/source/source_io/module_parameter/input_parameter.h +++ 
b/source/source_io/module_parameter/input_parameter.h @@ -602,6 +602,8 @@ struct Input_para double sccut = 3.0; ///< restriction of step size in eV/uB double sc_scf_thr = 1e-3; ///< minimum number of outer scf loop before initial lambda loop double sc_drop_thr = 1e-3; ///< threshold for lambda-loop threshold cutoff in spin-constrained DFT + std::string sc_lambda_strategy = "bfgs"; ///< lambda update strategy: bfgs, linear_response, augmented_lagrangian, hybrid_delayed + bool sc_direction_only = false; ///< only optimize the direction of magnetization // ============== #Parameters (18.Quasiatomic Orbital analysis) ========= ///<========================================================== diff --git a/source/source_io/module_parameter/read_input_item_elec_stru.cpp b/source/source_io/module_parameter/read_input_item_elec_stru.cpp index 39f37febc54..0fe7ad35aa8 100644 --- a/source/source_io/module_parameter/read_input_item_elec_stru.cpp +++ b/source/source_io/module_parameter/read_input_item_elec_stru.cpp @@ -831,7 +831,7 @@ Note: If gamma_only is set to 1, the KPT file will be overwritten. So make sure item.annotation = "charge density error"; item.category = "Electronic structure"; item.type = "Real"; - item.description = "It's the density threshold for electronic iteration. It represents the charge density error between two sequential densities from electronic iterations. This criterion is always enabled. If scf_ene_thr is set, the total-energy criterion (scf_ene_thr) is additionally checked only after the first SCF iteration and only when the charge-density criterion (scf_thr) has already been satisfied. For local-orbital calculations, 1e-6 is usually accurate enough."; + item.description = "It's the density threshold for electronic iteration. It represents the charge density error between two sequential densities from electronic iterations. 
For local orbitals, 1e-6 is usually accurate enough."; item.default_value = "1.0e-9 (plane-wave basis), or 1.0e-7 (localized atomic orbital basis)."; item.unit = "Ry if scf_thr_type=1, dimensionless if scf_thr_type=2"; item.availability = ""; @@ -865,7 +865,7 @@ Note: If gamma_only is set to 1, the KPT file will be overwritten. So make sure item.annotation = "total energy error threshold"; item.category = "Electronic structure"; item.type = "Real"; - item.description = "It's the energy threshold for electronic iteration. The compared quantity is the total-energy difference evaluated from the charge densities before and after the Hpsi operation in one SCF step. It is not the same as the screen-output EDIFF, which is the energy difference before Hpsi and after charge mixing (i.e., across both Hpsi and charge-mixing operations)."; + item.description = "It's the energy threshold for electronic iteration. It represents the total energy error between two sequential densities from electronic iterations."; item.default_value = "-1.0. 
If the user does not set this parameter, it will not take effect."; item.unit = "eV"; item.availability = ""; diff --git a/source/source_io/module_parameter/read_input_item_exx_dftu.cpp b/source/source_io/module_parameter/read_input_item_exx_dftu.cpp index 8daa6224b8c..4afec198309 100644 --- a/source/source_io/module_parameter/read_input_item_exx_dftu.cpp +++ b/source/source_io/module_parameter/read_input_item_exx_dftu.cpp @@ -643,9 +643,9 @@ void ReadInput::item_dftu() const Input_para& input = para.input; if (input.dft_plus_u != 0) { - if (input.basis_type == "pw" && input.nspin != 4) + if (input.basis_type == "pw" && input.nspin != 4 && input.nspin != 2 && input.nspin != 1) { - ModuleBase::WARNING_QUIT("ReadInput", "WRONG ARGUMENTS, only nspin2 with PW base is not supported now"); + ModuleBase::WARNING_QUIT("ReadInput", "WRONG ARGUMENTS, DFT+U with PW base only supports nspin=1/2/4"); } } }; diff --git a/source/source_io/module_parameter/read_input_item_other.cpp b/source/source_io/module_parameter/read_input_item_other.cpp index d929b0ee7f5..7df1292daea 100644 --- a/source/source_io/module_parameter/read_input_item_other.cpp +++ b/source/source_io/module_parameter/read_input_item_other.cpp @@ -202,6 +202,43 @@ void ReadInput::item_others() }; this->add_item(item); } + { + Input_Item item("sc_direction_only"); + item.annotation = "only optimize the direction of magnetization"; + item.category = "Spin-Constrained DFT"; + item.type = "Boolean"; + item.description = R"(When true, only the direction of the magnetic moment is constrained to the target direction, while the magnitude is allowed to vary freely. This is useful for studying magnetic anisotropy or when the magnitude of the moment is determined by the electronic structure rather than an external constraint. 
+ +When false (default), both the direction and magnitude of the magnetic moment are constrained to the target values.)"; + item.default_value = "False"; + item.unit = ""; + item.availability = "sc_mag_switch is true"; + read_sync_bool(input.sc_direction_only); + this->add_item(item); + } + { + Input_Item item("sc_lambda_strategy"); + item.annotation = "lambda update strategy for spin-constrained DFT"; + item.category = "Spin-Constrained DFT"; + item.type = "String"; + item.description = R"(Lambda update strategy for spin-constrained DFT: +* bfgs: BFGS quasi-Newton method +* linear_response: linear response (Scheme B) +* augmented_lagrangian: augmented Lagrangian (Scheme C) +* hybrid_delayed: hybrid delayed update (Scheme D))"; + item.default_value = "bfgs"; + item.unit = ""; + item.availability = "sc_mag_switch is true"; + read_sync_string(input.sc_lambda_strategy); + item.check_value = [](const Input_Item& item, const Parameter& para) { + const std::vector valid = {"bfgs", "linear_response", "augmented_lagrangian", "hybrid_delayed"}; + if (std::find(valid.begin(), valid.end(), para.input.sc_lambda_strategy) == valid.end()) + { + ModuleBase::WARNING_QUIT("ReadInput", "sc_lambda_strategy must be bfgs, linear_response, augmented_lagrangian, or hybrid_delayed"); + } + }; + this->add_item(item); + } // Quasiatomic Orbital analysis { diff --git a/source/source_lcao/dftu_lcao.cpp b/source/source_lcao/dftu_lcao.cpp index 5a4c6c45c88..d8b8421d6e7 100644 --- a/source/source_lcao/dftu_lcao.cpp +++ b/source/source_lcao/dftu_lcao.cpp @@ -68,7 +68,7 @@ void finish_dftu_lcao(const int iter, /// use the converged occupation matrix for next MD/Relax SCF calculation if (conv_esolver) { - dftu_ptr->initialed_locale = true; + dftu_ptr->mark_locale_initialized(); } } diff --git a/source/source_lcao/module_deepks/LCAO_deepks.cpp b/source/source_lcao/module_deepks/LCAO_deepks.cpp index 41c7e13fbea..cd64cc9850a 100644 --- a/source/source_lcao/module_deepks/LCAO_deepks.cpp +++ 
b/source/source_lcao/module_deepks/LCAO_deepks.cpp @@ -1,4 +1,14 @@ +// wenfei 2022-1-5 +// This file contains constructor and destructor of the class LCAO_deepks, #include "source_io/module_parameter/parameter.h" +// as well as subroutines for initializing and releasing relevant data structures + +// Other than the constructor and the destructor, it contains 3 types of subroutines: +// 1. subroutines that are related to calculating descriptors: +// - init : allocates some arrays +// - init_index : records the index (inl) +// 2. subroutines that are related to V_delta: +// - allocate_V_delta : allocates V_delta; if calculating force, it also allocates F_delta #ifdef __MLALGO @@ -47,12 +57,7 @@ void LCAO_Deepks::init(const LCAO_Orbitals& orb, ModuleBase::TITLE("LCAO_Deepks", "init"); ModuleBase::timer::start("LCAO_Deepks", "init"); - ofs << " >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>" << std::endl; - ofs << " | |" << std::endl; - ofs << " | #Initialize DeePKS (LCAO)# |" << std::endl; - ofs << " | Setup machine-Learning-Based DeePKS method based on NAO basis set |" << std::endl; - ofs << " | |" << std::endl; - ofs << " <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<" << std::endl; + ofs << " Initialize the descriptor index for DeePKS (lcao line)" << std::endl; const int lm = orb.get_lmax_d(); const int nm = orb.get_nchimax_d(); @@ -77,8 +82,8 @@ void LCAO_Deepks::init(const LCAO_Orbitals& orb, this->deepks_param.nchi_d_l[l] = orb.Alpha[0].getNchi(l); } - ofs << " Lmax of descriptor " << deepks_param.lmaxd << std::endl; - ofs << " Nmax of descriptor " << deepks_param.nmaxd << std::endl; + ofs << " lmax of descriptor = " << deepks_param.lmaxd << std::endl; + ofs << " nmax of descriptor = " << deepks_param.nmaxd << std::endl; int pdm_size = 0; this->deepks_param.inlmax = tot_inl; @@ -99,7 +104,7 @@ void LCAO_Deepks::init(const LCAO_Orbitals& orb, if (!PARAM.inp.deepks_equiv) { - // ofs << " total basis (all atoms) for 
descriptor = " << std::endl; + ofs << " total basis (all atoms) for descriptor = " << std::endl; // init pdm for (int inl = 0; inl < this->deepks_param.inlmax; inl++) @@ -117,7 +122,7 @@ void LCAO_Deepks::init(const LCAO_Orbitals& orb, } pdm_size = pdm_size * pdm_size; this->deepks_param.des_per_atom = pdm_size; - ofs << " Equivariant version, pdm matrices size " << pdm_size << std::endl; + ofs << " Equivariant version, size of pdm matrices : " << pdm_size << std::endl; for (int iat = 0; iat < nat; iat++) { this->pdm[iat] = torch::zeros({pdm_size}, torch::kFloat64); @@ -148,7 +153,7 @@ void LCAO_Deepks::init_index(const int ntype, { this->deepks_param.inl_index[it].create(na[it], this->deepks_param.lmaxd + 1, this->deepks_param.nmaxd); - //ofs << " Type " << it + 1 << " number_of_atoms " << na[it] << std::endl; + ofs << " Type " << it + 1 << " number_of_atoms " << na[it] << std::endl; for (int ia = 0; ia < na[it]; ia++) { @@ -165,8 +170,8 @@ void LCAO_Deepks::init_index(const int ntype, } // end ia } // end it assert(Total_nchi == inl); - ofs << " Descriptors per atom " << this->deepks_param.des_per_atom << std::endl; - ofs << " Total Descriptors " << this->deepks_param.n_descriptor << std::endl; + ofs << " descriptors_per_atom " << this->deepks_param.des_per_atom << std::endl; + ofs << " total_descriptors " << this->deepks_param.n_descriptor << std::endl; return; } diff --git a/source/source_lcao/module_deepks/deepks_basic.cpp b/source/source_lcao/module_deepks/deepks_basic.cpp index f15996be682..f0531012cd9 100644 --- a/source/source_lcao/module_deepks/deepks_basic.cpp +++ b/source/source_lcao/module_deepks/deepks_basic.cpp @@ -244,8 +244,16 @@ void DeePKS_domain::cal_edelta_gedm(const int nat, } E_delta = ec[0].item() * 2; // Ry; *2 is for Hartree to Ry + // get d ec[0]/d inputs + // inputs: [1, nat, des_per_atom] // ec: [1, 1] + std::vector tensor_inputs; + tensor_inputs.push_back(inputs[0].toTensor()); ec[0].reshape({1, 1}).requires_grad_(true); + 
torch::Tensor derivative = torch::autograd::grad(ec, tensor_inputs, {}, true)[0]; + LCAO_deepks_io::save_tensor2npy("gev.npy", + derivative.reshape({nat, deepks_param.des_per_atom}), + 0); // dm_eig.npy is the input for gedm // cal gedm std::vector gedm_shell; diff --git a/source/source_lcao/module_deltaspin/CMakeLists.txt b/source/source_lcao/module_deltaspin/CMakeLists.txt index 6a0c1fea22f..265424ae798 100644 --- a/source/source_lcao/module_deltaspin/CMakeLists.txt +++ b/source/source_lcao/module_deltaspin/CMakeLists.txt @@ -8,6 +8,11 @@ list(APPEND objects cal_mw_from_lambda.cpp template_helpers.cpp deltaspin_lcao.cpp + lambda_update_strategies.cpp + lambda_strategy_integration.cpp + sc_parse_json.cpp + cal_h_lambda.cpp + cal_mw_helper.cpp ) add_library( diff --git a/source/source_lcao/module_deltaspin/basic_funcs.cpp b/source/source_lcao/module_deltaspin/basic_funcs.cpp index 343b2b37a73..83b101de641 100644 --- a/source/source_lcao/module_deltaspin/basic_funcs.cpp +++ b/source/source_lcao/module_deltaspin/basic_funcs.cpp @@ -57,7 +57,7 @@ void scalar_multiply_2d(const std::vector>& array, std::vector>& result) { int size = array.size(); - result.reserve(size); + result.resize(size); for (int i = 0; i < size; i++) { result[i] = scalar * array[i]; @@ -70,7 +70,7 @@ void add_scalar_multiply_2d(const std::vector>& arra std::vector>& result) { int size = array_1.size(); - result.reserve(size); + result.resize(size); for (int i = 0; i < size; i++) { result[i] = array_1[i] + scalar * array_2[i]; @@ -82,7 +82,7 @@ void subtract_2d(const std::vector>& array_1, std::vector>& result) { int size = array_1.size(); - result.reserve(size); + result.resize(size); for (int i = 0; i < size; i++) { result[i] = array_1[i] - array_2[i]; diff --git a/source/source_lcao/module_deltaspin/basic_funcs.h b/source/source_lcao/module_deltaspin/basic_funcs.h index b1de060c4bb..e0f17475949 100644 --- a/source/source_lcao/module_deltaspin/basic_funcs.h +++ 
b/source/source_lcao/module_deltaspin/basic_funcs.h @@ -2,6 +2,7 @@ #define BASIC_FUNCS_H #include +#include #include #include diff --git a/source/source_lcao/module_deltaspin/cal_h_lambda.cpp b/source/source_lcao/module_deltaspin/cal_h_lambda.cpp new file mode 100644 index 00000000000..33f73b305f1 --- /dev/null +++ b/source/source_lcao/module_deltaspin/cal_h_lambda.cpp @@ -0,0 +1,108 @@ +#ifdef __LCAO +#include "spin_constrain.h" +#include "source_base/timer.h" +#include "source_base/tool_title.h" +#include "source_base/global_function.h" +#include + +template <> +void spinconstrain::SpinConstrain>::cal_h_lambda( + std::complex* h_lambda, + const std::complex* Sloc2, + bool column_major, + int isk) +{ + ModuleBase::TITLE("SpinConstrain","cal_h_lambda"); + ModuleBase::timer::start("SpinConstrain", "cal_h_lambda"); + const Parallel_Orbitals* pv = this->ParaV; + for (const auto& sc_elem1 : this->get_atomCounts()) + { + int it1 = sc_elem1.first; + int nat_it1 = sc_elem1.second; + int nw_it1 = this->get_orbitalCounts().at(it1); + for (int ia1 = 0; ia1 < nat_it1; ia1++) + { + int iat1 = this->get_iat(it1, ia1); + for (int iw1 = 0; iw1 < nw_it1*this->npol_; iw1++) + { + int iwt1 = this->get_iwt(it1, ia1, iw1); + const int mu = pv->global2local_row(iwt1); + if (mu < 0) continue; + for (const auto& sc_elem2 : this->get_atomCounts()) + { + int it2 = sc_elem2.first; + int nat_it2 = sc_elem2.second; + int nw_it2 = this->get_orbitalCounts().at(it2); + for (int ia2 = 0; ia2 < nat_it2; ia2++) + { + int iat2 = this->get_iat(it2, ia2); + for (int iw2 = 0; iw2 < nw_it2*this->npol_; iw2++) + { + int iwt2 = this->get_iwt(it2, ia2, iw2); + const int nu = pv->global2local_col(iwt2); + if (nu < 0) continue; + int icc; + ModuleBase::Vector3 lambda = (this->lambda_[iat1] + this->lambda_[iat2]) / 2.0; + if (column_major) + { + icc = mu + nu * pv->nrow; + if (this->nspin_ == 2) + { + h_lambda[icc] = (isk == 0) ? 
-Sloc2[icc] * lambda[2] : -Sloc2[icc] * (-lambda[2]); + } + else if (this->nspin_ == 4) + { + if (iwt1 % 2 == 0) + { + h_lambda[icc] + = (iwt2 % 2 == 0) + ? -Sloc2[icc] * lambda[2] + : -Sloc2[icc + pv->nrow] + * (lambda[0] + lambda[1] * std::complex(0, 1)); + } + else + { + h_lambda[icc] = (iwt2 % 2 == 0) + ? -Sloc2[icc + 1] + * (lambda[0] - lambda[1] * std::complex(0, 1)) + : -Sloc2[icc + 1 + pv->nrow] * (-lambda[2]); + } + } + } + else + { + icc = mu * pv->ncol + nu; + if (this->nspin_ == 2) + { + h_lambda[icc] = (isk == 0) ? -Sloc2[icc] * lambda[2] : -Sloc2[icc] * (-lambda[2]); + } + else if (this->nspin_ == 4) + { + if (iwt1 % 2 == 0) + { + h_lambda[icc] + = (iwt2 % 2 == 0) + ? -Sloc2[icc] * lambda[2] + : -Sloc2[icc + 1] + * (lambda[0] + lambda[1] * std::complex(0, 1)); + } + else + { + h_lambda[icc] = (iwt2 % 2 == 0) + ? -Sloc2[icc + pv->ncol] + * (lambda[0] - lambda[1] * std::complex(0, 1)) + : -Sloc2[icc + 1 + pv->ncol] * (-lambda[2]); + } + } + } + } + } + } + } + } + } + ModuleBase::timer::end("SpinConstrain", "cal_h_lambda"); + return; +} + +#endif diff --git a/source/source_lcao/module_deltaspin/cal_mw.cpp b/source/source_lcao/module_deltaspin/cal_mw.cpp index 0482f8f0709..0952835aef2 100644 --- a/source/source_lcao/module_deltaspin/cal_mw.cpp +++ b/source/source_lcao/module_deltaspin/cal_mw.cpp @@ -21,7 +21,7 @@ void spinconstrain::SpinConstrain>::cal_mi_lcao(const int& this->zero_Mi(); const hamilt::HContainer* dmr = this->dm_->get_DMR_pointer(1); std::vector moments; - if(PARAM.inp.nspin==2) + if(this->nspin_==2) { this->dm_->switch_dmr(2); @@ -36,7 +36,7 @@ void spinconstrain::SpinConstrain>::cal_mi_lcao(const int& this->Mi_[iat].z = moments[iat]; } } - else if(PARAM.inp.nspin==4) + else if(this->nspin_==4) { moments = static_cast, std::complex>>*>(this->p_operator)->cal_moment(dmr, this->get_constrain()); for(int iat=0;iatMi_.size();iat++) @@ -76,32 +76,9 @@ void spinconstrain::SpinConstrain>::cal_mi_pw() // std::cout << __FILE__ << ":" << 
__LINE__ << " nbands = " << nbands << std::endl; onsite_p->overlap_proj_psi(nbands * npol, psi_pointer); const std::complex* becp = onsite_p->get_h_becp(); - // becp(nbands*npol , nkb) - // mag = wg * \sum_{nh}becp * becp int nkb = onsite_p->get_tot_nproj(); - for(int ib = 0;ibpelec->wg(ik, ib); - int begin_ih = 0; - for(int iat = 0; iat < this->Mi_.size(); iat++) - { - std::complex occ[4] = {ModuleBase::ZERO, ModuleBase::ZERO, ModuleBase::ZERO, ModuleBase::ZERO}; - const int nh = onsite_p->get_nh(iat); - for(int ih = 0; ih < nh; ih++) - { - const int index = ib*2*nkb + begin_ih + ih; - occ[0] += conj(becp[index]) * becp[index]; - occ[1] += conj(becp[index]) * becp[index + nkb]; - occ[2] += conj(becp[index + nkb]) * becp[index]; - occ[3] += conj(becp[index + nkb]) * becp[index + nkb]; - } - // occ has been reduced and calculate mag - this->Mi_[iat].z += weight * (occ[0] - occ[3]).real(); - this->Mi_[iat].x += weight * (occ[1] + occ[2]).real(); - this->Mi_[iat].y += weight * (occ[1] - occ[2]).imag(); - begin_ih += nh; - } - } + this->accumulate_Mi_from_becp(becp, nkb, nbands, npol, ik, + &this->pelec->wg(ik, 0), &onsite_p->get_nh(0)); } } #if ((defined __CUDA) || (defined __ROCM)) @@ -122,37 +99,14 @@ void spinconstrain::SpinConstrain>::cal_mi_pw() // std::cout << __FILE__ << ":" << __LINE__ << " nbands = " << nbands << std::endl; onsite_p->overlap_proj_psi(nbands * npol, psi_pointer); const std::complex* becp = onsite_p->get_h_becp(); - // becp(nbands*npol , nkb) - // mag = wg * \sum_{nh}becp * becp int nkb = onsite_p->get_size_becp() / nbands / npol; - for(int ib = 0;ibpelec->wg(ik, ib); - int begin_ih = 0; - for(int iat = 0; iat < this->Mi_.size(); iat++) - { - std::complex occ[4] = {ModuleBase::ZERO, ModuleBase::ZERO, ModuleBase::ZERO, ModuleBase::ZERO}; - const int nh = onsite_p->get_nh(iat); - for(int ih = 0; ih < nh; ih++) - { - const int index = ib*2*nkb + begin_ih + ih; - occ[0] += conj(becp[index]) * becp[index]; - occ[1] += conj(becp[index]) * becp[index 
+ nkb]; - occ[2] += conj(becp[index + nkb]) * becp[index]; - occ[3] += conj(becp[index + nkb]) * becp[index + nkb]; - } - // occ has been reduced and calculate mag - this->Mi_[iat].z += weight * (occ[0] - occ[3]).real(); - this->Mi_[iat].x += weight * (occ[1] + occ[2]).real(); - this->Mi_[iat].y += weight * (occ[1] - occ[2]).imag(); - begin_ih += nh; - } - } + this->accumulate_Mi_from_becp(becp, nkb, nbands, npol, ik, + &this->pelec->wg(ik, 0), &onsite_p->get_nh(0)); } } #endif // reduce mag from all k-pools - Parallel_Reduce::reduce_double_allpool(PARAM.inp.kpar, GlobalV::NPROC_IN_POOL, &(this->Mi_[0][0]), 3 * this->Mi_.size()); + Parallel_Reduce::reduce_double_allpool(PARAM.inp.kpar, PARAM.globalv.nproc_in_pool, &(this->Mi_[0][0]), 3 * this->Mi_.size()); ModuleBase::timer::end("spinconstrain::SpinConstrain", "cal_mi_pw"); } diff --git a/source/source_lcao/module_deltaspin/cal_mw_from_lambda.cpp b/source/source_lcao/module_deltaspin/cal_mw_from_lambda.cpp index 92794fbee27..b335227dd1a 100644 --- a/source/source_lcao/module_deltaspin/cal_mw_from_lambda.cpp +++ b/source/source_lcao/module_deltaspin/cal_mw_from_lambda.cpp @@ -19,16 +19,17 @@ #endif template <> -void spinconstrain::SpinConstrain>::calculate_delta_hcc(std::complex* h_tmp, const std::complex* becp_k, const ModuleBase::Vector3* delta_lambda, const int nbands, const int nkb, const int* nh_iat) +void spinconstrain::SpinConstrain>::calculate_delta_hcc(std::complex* h_tmp, const std::complex* becp_k, const ModuleBase::Vector3* delta_lambda, const int nbands, const int nkb, const int* nh_iat, const int ik) { + ModuleBase::TITLE("spinconstrain::SpinConstrain", "calculate_delta_hcc"); + ModuleBase::timer::start("spinconstrain::SpinConstrain", "calculate_delta_hcc"); + int sum = 0; - int size_ps = nkb * 2 * nbands; + int size_ps = nkb * this->npol_ * nbands; std::complex* becp_cpu = nullptr; if(PARAM.inp.device == "gpu") { #if ((defined __CUDA) || (defined __ROCM)) - base_device::DEVICE_GPU* ctx = {}; - 
base_device::DEVICE_CPU* cpu_ctx = {}; base_device::memory::resize_memory_op, base_device::DEVICE_CPU>()(becp_cpu, size_ps); base_device::memory::synchronize_memory_op, base_device::DEVICE_CPU, base_device::DEVICE_GPU>()(becp_cpu, becp_k, size_ps); #endif @@ -38,38 +39,58 @@ void spinconstrain::SpinConstrain>::calculate_delta_hcc(std becp_cpu = const_cast*>(becp_k); } + // Compute modified projector coefficients std::vector> ps(size_ps, 0.0); - for (int iat = 0; iat < this->Mi_.size(); iat++) + if(this->npol_ == 2) { - const int nproj = nh_iat[iat]; - const std::complex coefficients0(delta_lambda[iat][2], 0.0); - const std::complex coefficients1(delta_lambda[iat][0] , delta_lambda[iat][1]); - const std::complex coefficients2(delta_lambda[iat][0] , -1 * delta_lambda[iat][1]); - const std::complex coefficients3(-1 * delta_lambda[iat][2], 0.0); - // each atom has nproj, means this is with structure factor; - // each projector (each atom) must multiply coefficient - // with all the other projectors. 
- for (int ib = 0; ib < nbands * 2; ib+=2) + // nspin=4: full Pauli matrix treatment + for (int iat = 0; iat < this->Mi_.size(); iat++) { - for (int ip = 0; ip < nproj; ip++) + const int nproj = nh_iat[iat]; + const std::complex coefficients0(delta_lambda[iat][2], 0.0); + const std::complex coefficients1(delta_lambda[iat][0] , delta_lambda[iat][1]); + const std::complex coefficients2(delta_lambda[iat][0] , -1 * delta_lambda[iat][1]); + const std::complex coefficients3(-1 * delta_lambda[iat][2], 0.0); + for (int ib = 0; ib < nbands * this->npol_; ib += this->npol_) { - const int becpind = ib * nkb + sum + ip; - const std::complex becp1 = becp_cpu[becpind]; - const std::complex becp2 = becp_cpu[becpind + nkb]; - ps[becpind] += coefficients0 * becp1 - + coefficients2 * becp2; - ps[becpind + nkb] += coefficients1 * becp1 - + coefficients3 * becp2; - } // end ip - } // end ib - sum += nproj; - } // end iat + for (int ip = 0; ip < nproj; ip++) + { + const int becpind = ib * nkb + sum + ip; + const std::complex becp1 = becp_cpu[becpind]; + const std::complex becp2 = becp_cpu[becpind + nkb]; + ps[becpind] += coefficients0 * becp1 + + coefficients2 * becp2; + ps[becpind + nkb] += coefficients1 * becp1 + + coefficients3 * becp2; + } + } + sum += nproj; + } + } + else if(this->npol_ == 1) + { + // nspin=2: only z-component (spin collinear) + for (int iat = 0; iat < this->Mi_.size(); iat++) + { + const int nproj = nh_iat[iat]; + double coefficients0 = delta_lambda[iat][2] * this->get_spin_sign(ik); + for (int ib = 0; ib < nbands; ib++) + { + for (int ip = 0; ip < nproj; ip++) + { + const int becpind = ib * nkb + sum + ip; + const std::complex becp1 = becp_cpu[becpind]; + ps[becpind] += coefficients0 * becp1; + } + } + sum += nproj; + } + } + std::complex* ps_pointer = nullptr; if(PARAM.inp.device == "gpu") { #if ((defined __CUDA) || (defined __ROCM)) - base_device::DEVICE_GPU* ctx = {}; - base_device::DEVICE_CPU* cpu_ctx = {}; base_device::memory::resize_memory_op, 
base_device::DEVICE_GPU>()(ps_pointer, size_ps); base_device::memory::synchronize_memory_op, base_device::DEVICE_GPU, base_device::DEVICE_CPU>()(ps_pointer, ps.data(), size_ps); #endif @@ -78,14 +99,14 @@ void spinconstrain::SpinConstrain>::calculate_delta_hcc(std { ps_pointer = ps.data(); } - // update h_tmp by becp_k * ps + + // update h_tmp by becp_k * ps: H += becp^† * ps char transa = 'C'; char transb = 'N'; - const int npm = nkb * 2; + const int npm = nkb * this->npol_; if (PARAM.inp.device == "gpu") { #if ((defined __CUDA) || (defined __ROCM)) - base_device::DEVICE_GPU* ctx = {}; ModuleBase::gemm_op, base_device::DEVICE_GPU>()( transa, transb, @@ -102,13 +123,12 @@ void spinconstrain::SpinConstrain>::calculate_delta_hcc(std nbands ); base_device::memory::delete_memory_op, base_device::DEVICE_GPU>()(ps_pointer); - delete[] becp_cpu; + base_device::memory::delete_memory_op, base_device::DEVICE_CPU>()(becp_cpu); #endif } else if (PARAM.inp.device == "cpu") { - base_device::DEVICE_CPU* ctx = {}; ModuleBase::gemm_op, base_device::DEVICE_CPU>()( transa, transb, @@ -125,8 +145,135 @@ void spinconstrain::SpinConstrain>::calculate_delta_hcc(std nbands ); } + ModuleBase::timer::end("spinconstrain::SpinConstrain", "calculate_delta_hcc"); } +template <> +void spinconstrain::SpinConstrain>::update_psi_charge_pw_cpu(const ModuleBase::Vector3* delta_lambda, bool pw_solve) +{ + ModuleBase::TITLE("spinconstrain::SpinConstrain", "update_psi_charge_pw_cpu"); + ModuleBase::timer::start("spinconstrain::SpinConstrain", "update_psi_charge_pw_cpu"); + + psi::Psi>* psi_t = static_cast>*>(this->psi); + hamilt::Hamilt, base_device::DEVICE_CPU>* hamilt_t = static_cast, base_device::DEVICE_CPU>*>(this->p_hamilt); + auto* onsite_p = projectors::OnsiteProjector::get_instance(); + + int nbands = psi_t->get_nbands(); + int npol = psi_t->get_npol(); + int nkb = onsite_p->get_tot_nproj(); + int nk = psi_t->get_nk(); + int size_becp = nbands * nkb * npol; + const int* nh_iat = 
&onsite_p->get_nh(0); + + std::vector> h_tmp(nbands * nbands), s_tmp(nbands * nbands); + + assert(this->sub_h_save != nullptr); + assert(this->sub_s_save != nullptr); + assert(this->becp_save != nullptr); + + for (int ik = 0; ik < nk; ++ik) + { + std::complex* h_k = this->sub_h_save + ik * nbands * nbands; + std::complex* s_k = this->sub_s_save + ik * nbands * nbands; + std::complex* becp_k = this->becp_save + ik * size_becp; + + psi_t->fix_k(ik); + + memcpy(h_tmp.data(), h_k, sizeof(std::complex) * nbands * nbands); + memcpy(s_tmp.data(), s_k, sizeof(std::complex) * nbands * nbands); + + // Apply DeltaSpin correction: H' = H_k + delta_H(lambda) + this->calculate_delta_hcc(h_tmp.data(), becp_k, delta_lambda, nbands, nkb, nh_iat, ik); + + // Diagonalize in subspace to update wavefunction + hsolver::DiagoIterAssist>::diag_subspace_psi(h_tmp.data(), + s_tmp.data(), + nbands, + psi_t[0], + &this->pelec->ekb(ik, 0)); + } + + // Clean up saved subspace data + delete[] this->sub_h_save; + delete[] this->sub_s_save; + delete[] this->becp_save; + this->sub_h_save = nullptr; + this->sub_s_save = nullptr; + this->becp_save = nullptr; + + // Subspace diagonalization already includes DeltaSpin correction via calculate_delta_hcc. + // For the PW case, the full-space HSolverPW does NOT include the DeltaSpin correction + // (it only exists in the subspace), so calling HSolverPW::solve would overwrite the + // corrected psi with an uncorrected one, causing density explosion. Always use psiToRho. 
+ reinterpret_cast, base_device::DEVICE_CPU>*>(this->pelec)->psiToRho(*psi_t); + ModuleBase::timer::end("spinconstrain::SpinConstrain", "update_psi_charge_pw_cpu"); +} + +#if ((defined __CUDA) || (defined __ROCM)) +template <> +void spinconstrain::SpinConstrain>::update_psi_charge_pw_gpu(const ModuleBase::Vector3* delta_lambda, bool pw_solve) +{ + ModuleBase::TITLE("spinconstrain::SpinConstrain", "update_psi_charge_pw_gpu"); + ModuleBase::timer::start("spinconstrain::SpinConstrain", "update_psi_charge_pw_gpu"); + + psi::Psi, base_device::DEVICE_GPU>* psi_t = static_cast, base_device::DEVICE_GPU>*>(this->psi); + hamilt::Hamilt, base_device::DEVICE_GPU>* hamilt_t = static_cast, base_device::DEVICE_GPU>*>(this->p_hamilt); + auto* onsite_p = projectors::OnsiteProjector::get_instance(); + + int nbands = psi_t->get_nbands(); + int npol = psi_t->get_npol(); + int nkb = onsite_p->get_tot_nproj(); + int nk = psi_t->get_nk(); + int size_becp = nbands * nkb * npol; + const int* nh_iat = &onsite_p->get_nh(0); + + std::complex* h_tmp = nullptr; + std::complex* s_tmp = nullptr; + base_device::memory::resize_memory_op, base_device::DEVICE_GPU>()(h_tmp, nbands * nbands); + base_device::memory::resize_memory_op, base_device::DEVICE_GPU>()(s_tmp, nbands * nbands); + + assert(this->sub_h_save != nullptr); + assert(this->sub_s_save != nullptr); + assert(this->becp_save != nullptr); + + for (int ik = 0; ik < nk; ++ik) + { + std::complex* h_k = this->sub_h_save + ik * nbands * nbands; + std::complex* s_k = this->sub_s_save + ik * nbands * nbands; + std::complex* becp_k = this->becp_save + ik * size_becp; + + psi_t->fix_k(ik); + + base_device::memory::synchronize_memory_op, base_device::DEVICE_GPU, base_device::DEVICE_GPU>()(h_tmp, h_k, nbands * nbands); + base_device::memory::synchronize_memory_op, base_device::DEVICE_GPU, base_device::DEVICE_GPU>()(s_tmp, s_k, nbands * nbands); + + // Apply DeltaSpin correction: H' = H_k + delta_H(lambda) + this->calculate_delta_hcc(h_tmp, becp_k, 
delta_lambda, nbands, nkb, nh_iat, ik); + + // Diagonalize in subspace to update wavefunction + hsolver::DiagoIterAssist, base_device::DEVICE_GPU>::diag_subspace_psi(h_tmp, + s_tmp, + nbands, + psi_t[0], + &this->pelec->ekb(ik, 0)); + } + + // Clean up saved subspace data + base_device::memory::delete_memory_op, base_device::DEVICE_GPU>()(sub_h_save); + base_device::memory::delete_memory_op, base_device::DEVICE_GPU>()(sub_s_save); + base_device::memory::delete_memory_op, base_device::DEVICE_GPU>()(becp_save); + this->sub_h_save = nullptr; + this->sub_s_save = nullptr; + this->becp_save = nullptr; + + // Subspace diagonalization already includes DeltaSpin correction via calculate_delta_hcc. + // For the PW case, the full-space HSolverPW does NOT include the DeltaSpin correction, + // so calling HSolverPW::solve would overwrite the corrected psi. Always use psiToRho. + reinterpret_cast, base_device::DEVICE_GPU>*>(this->pelec)->psiToRho(*psi_t); + ModuleBase::timer::end("spinconstrain::SpinConstrain", "update_psi_charge_pw_gpu"); +} +#endif + template <> void spinconstrain::SpinConstrain>::cal_mw_from_lambda( int i_step, @@ -134,27 +281,26 @@ void spinconstrain::SpinConstrain>::cal_mw_from_lambda( { ModuleBase::TITLE("spinconstrain::SpinConstrain", "cal_mw_from_lambda"); ModuleBase::timer::start("spinconstrain::SpinConstrain", "cal_mw_from_lambda"); - // lambda has been updated in the lambda loop + #ifdef __LCAO if (PARAM.inp.basis_type == "lcao") { psi::Psi>* psi_t = static_cast>*>(this->psi); hamilt::Hamilt>* hamilt_t = static_cast>*>(this->p_hamilt); hsolver::HSolverLCAO> hsolver_t(this->ParaV, PARAM.inp.ks_solver); - if (PARAM.inp.nspin == 2) + if (this->nspin_ == 2) { dynamic_cast, double>>*>(this->p_operator) ->update_lambda(); } - else if (PARAM.inp.nspin == 4) + else if (this->nspin_ == 4) { dynamic_cast, std::complex>>*>( this->p_operator) ->update_lambda(); } // diagonalization without update charge - // mohan add two parameters charge and nspin, 2025-10-24 
- hsolver_t.solve(hamilt_t, psi_t[0], this->pelec, *this->dm_, *this->pelec->charge, PARAM.inp.nspin, true); + hsolver_t.solve(hamilt_t, psi_t[0], this->pelec, *this->dm_, *this->pelec->charge, this->nspin_, true); elecstate::calculate_weights(this->pelec->ekb, this->pelec->wg, this->pelec->klist, @@ -173,27 +319,6 @@ void spinconstrain::SpinConstrain>::cal_mw_from_lambda( else #endif { - /*if (i_step == -1 && this->higher_mag_prec) - { - // std::cout<<__FILE__<<__LINE__<<"istep == 0"<>* psi_t = static_cast>*>(this->psi); - hamilt::Hamilt>* hamilt_t = static_cast>*>(this->p_hamilt); - hsolver::HSolver, base_device::DEVICE_CPU>* hsolver_t = static_cast, base_device::DEVICE_CPU>*>(this->phsol); - hsolver_t->solve(hamilt_t, psi_t[0], this->pelec, this->KS_SOLVER, true); - } - else - { - psi::Psi, base_device::DEVICE_GPU>* psi_t = static_cast, base_device::DEVICE_GPU>*>(this->psi); - hamilt::Hamilt, base_device::DEVICE_GPU>* hamilt_t = static_cast, base_device::DEVICE_GPU>*>(this->p_hamilt); - hsolver::HSolver, base_device::DEVICE_GPU>* hsolver_t = static_cast, base_device::DEVICE_GPU>*>(this->phsol); - hsolver_t->solve(hamilt_t, psi_t[0], this->pelec, this->KS_SOLVER, true); - } - this->pelec->calculate_weights(); - this->cal_Mi_pw(); - } - else*/ { this->zero_Mi(); int size_becp = 0; @@ -242,22 +367,20 @@ void spinconstrain::SpinConstrain>::cal_mw_from_lambda( memcpy(h_tmp.data(), h_k, sizeof(std::complex) * nbands * nbands); memcpy(s_tmp.data(), s_k, sizeof(std::complex) * nbands * nbands); // update h_tmp by delta_lambda - if (i_step != -1) this->calculate_delta_hcc(h_tmp.data(), becp_k, delta_lambda, nbands, nkb, nh_iat); + if (i_step != -1) this->calculate_delta_hcc(h_tmp.data(), becp_k, delta_lambda, nbands, nkb, nh_iat, ik); hsolver::DiagoIterAssist>::diag_responce(h_tmp.data(), s_tmp.data(), nbands, becp_k, &becp_tmp[ik * size_becp], - nkb * 2, + nkb * npol, &this->pelec->ekb(ik, 0)); } } #if ((defined __CUDA) || (defined __ROCM)) else { - 
base_device::DEVICE_GPU* ctx = {}; - base_device::DEVICE_CPU* cpu_ctx = {}; psi::Psi, base_device::DEVICE_GPU>* psi_t = static_cast, base_device::DEVICE_GPU>*>(this->psi); hamilt::Hamilt, base_device::DEVICE_GPU>* hamilt_t = static_cast, base_device::DEVICE_GPU>*>(this->p_hamilt); auto* onsite_p = projectors::OnsiteProjector::get_instance(); @@ -276,13 +399,11 @@ void spinconstrain::SpinConstrain>::cal_mw_from_lambda( if(this->sub_h_save == nullptr) { initial_hs = 1; - base_device::memory::resize_memory_op, base_device::DEVICE_GPU>()(this->sub_h_save, nbands * nbands * nk); base_device::memory::resize_memory_op, base_device::DEVICE_GPU>()(this->sub_s_save, nbands * nbands * nk); base_device::memory::resize_memory_op, base_device::DEVICE_GPU>()(this->becp_save, size_becp * nk); } std::complex* becp_pointer = nullptr; - // allocate memory for becp_pointer in GPU device base_device::memory::resize_memory_op, base_device::DEVICE_GPU>()(becp_pointer, size_becp); for (int ik = 0; ik < nk; ++ik) { @@ -293,15 +414,13 @@ void spinconstrain::SpinConstrain>::cal_mw_from_lambda( std::complex* becp_k = this->becp_save + ik * size_becp; if(initial_hs) { - /// update H(k) for each k point hamilt_t->updateHk(ik); hsolver::DiagoIterAssist, base_device::DEVICE_GPU>::cal_hs_subspace(hamilt_t, psi_t[0], h_k, s_k); base_device::memory::synchronize_memory_op, base_device::DEVICE_GPU, base_device::DEVICE_GPU>()(becp_k, onsite_p->get_becp(), size_becp); } base_device::memory::synchronize_memory_op, base_device::DEVICE_GPU, base_device::DEVICE_GPU>()(h_tmp, h_k, nbands * nbands); base_device::memory::synchronize_memory_op, base_device::DEVICE_GPU, base_device::DEVICE_GPU>()(s_tmp, s_k, nbands * nbands); - // update h_tmp by delta_lambda - if (i_step != -1) this->calculate_delta_hcc(h_tmp, becp_k, delta_lambda, nbands, nkb, nh_iat); + if (i_step != -1) this->calculate_delta_hcc(h_tmp, becp_k, delta_lambda, nbands, nkb, nh_iat, ik); hsolver::DiagoIterAssist, 
base_device::DEVICE_GPU>::diag_responce(h_tmp, s_tmp, @@ -310,14 +429,13 @@ void spinconstrain::SpinConstrain>::cal_mw_from_lambda( becp_pointer, nkb * npol, &this->pelec->ekb(ik, 0)); - // copy becp_pointer from GPU to CPU base_device::memory::synchronize_memory_op, base_device::DEVICE_CPU, base_device::DEVICE_GPU>()(&becp_tmp[ik * size_becp], becp_pointer, size_becp); } - // free memory for becp_pointer in GPU device base_device::memory::delete_memory_op, base_device::DEVICE_GPU>()(becp_pointer); } #endif + // calculate weights from ekb to update wg elecstate::calculate_weights(this->pelec->ekb, this->pelec->wg, @@ -330,42 +448,13 @@ void spinconstrain::SpinConstrain>::cal_mw_from_lambda( for (int ik = 0; ik < nk; ik++) { const std::complex* becp = &becp_tmp[ik * size_becp]; - // becp(nbands*npol , nkb) - // mag = wg * \sum_{nh}becp * becp - for (int ib = 0; ib < nbands; ib++) - { - const double weight = this->pelec->wg(ik, ib); - int begin_ih = 0; - for (int iat = 0; iat < this->Mi_.size(); iat++) - { - const int nh = nh_iat[iat]; - std::complex occ[4] - = {ModuleBase::ZERO, ModuleBase::ZERO, ModuleBase::ZERO, ModuleBase::ZERO}; - for (int ih = 0; ih < nh; ih++) - { - const int index = ib * npol * nkb + begin_ih + ih; - occ[0] += conj(becp[index]) * becp[index]; - occ[1] += conj(becp[index]) * becp[index + nkb]; - occ[2] += conj(becp[index + nkb]) * becp[index]; - occ[3] += conj(becp[index + nkb]) * becp[index + nkb]; - } - // occ has been reduced and calculate mag - this->Mi_[iat].x += weight * (occ[1] + occ[2]).real(); - this->Mi_[iat].y += weight * (occ[1] - occ[2]).imag(); - this->Mi_[iat].z += weight * (occ[0] - occ[3]).real(); - begin_ih += nh; - } - } + this->accumulate_Mi_from_becp(becp, nkb, nbands, this->npol_, ik, + &this->pelec->wg(ik, 0), nh_iat); } - Parallel_Reduce::reduce_double_allpool(GlobalV::KPAR, - GlobalV::NPROC_IN_POOL, + Parallel_Reduce::reduce_double_allpool(PARAM.inp.kpar, + PARAM.globalv.nproc_in_pool, &(this->Mi_[0][0]), 3 * 
this->Mi_.size()); - // for(int i = 0; i < this->Mi_.size(); i++) - //{ - // std::cout<<"atom"<Mi_[i].x<<" "<Mi_[i].y<<" "<Mi_[i].z<<" - // "<lambda_[i].x<<" "<lambda_[i].y<<" "<lambda_[i].z<>::update_psi_charge(const else #endif { - int size_becp = 0; - std::vector> becp_tmp; - int nk = 0; - int nkb = 0; - int nbands = 0; - int npol = 0; - const int* nh_iat = nullptr; if (PARAM.inp.device == "cpu") { - psi::Psi>* psi_t = static_cast>*>(this->psi); - hamilt::Hamilt, base_device::DEVICE_CPU>* hamilt_t = static_cast, base_device::DEVICE_CPU>*>(this->p_hamilt); - auto* onsite_p = projectors::OnsiteProjector::get_instance(); - nbands = psi_t->get_nbands(); - npol = psi_t->get_npol(); - nkb = onsite_p->get_tot_nproj(); - nk = psi_t->get_nk(); - nh_iat = &onsite_p->get_nh(0); - size_becp = nbands * nkb * npol; - becp_tmp.resize(size_becp * nk); - std::vector> h_tmp(nbands * nbands), s_tmp(nbands * nbands); - assert(this->sub_h_save != nullptr); - assert(this->sub_s_save != nullptr); - assert(this->becp_save != nullptr); - for (int ik = 0; ik < nk; ++ik) - { - std::complex* h_k = this->sub_h_save + ik * nbands * nbands; - std::complex* s_k = this->sub_s_save + ik * nbands * nbands; - std::complex* becp_k = this->becp_save + ik * size_becp; - - psi_t->fix_k(ik); - memcpy(h_tmp.data(), h_k, sizeof(std::complex) * nbands * nbands); - memcpy(s_tmp.data(), s_k, sizeof(std::complex) * nbands * nbands); - this->calculate_delta_hcc(h_tmp.data(), becp_k, delta_lambda, nbands, nkb, nh_iat); - hsolver::DiagoIterAssist>::diag_subspace_psi(h_tmp.data(), - s_tmp.data(), - nbands, - psi_t[0], - &this->pelec->ekb(ik, 0)); - } - - delete[] this->sub_h_save; - delete[] this->sub_s_save; - delete[] this->becp_save; - this->sub_h_save = nullptr; - this->sub_s_save = nullptr; - this->becp_save = nullptr; - - if(pw_solve) - { - hsolver::HSolverPW, base_device::DEVICE_CPU> hsolver_pw_obj(this->pw_wfc_, - PARAM.inp.calculation, - PARAM.inp.basis_type, - PARAM.inp.ks_solver, - false, - 
PARAM.globalv.use_uspp, - PARAM.inp.nspin, - hsolver::DiagoIterAssist, base_device::DEVICE_CPU>::SCF_ITER, - hsolver::DiagoIterAssist, base_device::DEVICE_CPU>::PW_DIAG_NMAX, - hsolver::DiagoIterAssist, base_device::DEVICE_CPU>::PW_DIAG_THR, - hsolver::DiagoIterAssist, base_device::DEVICE_CPU>::need_subspace); - - hsolver_pw_obj.solve(hamilt_t, - psi_t[0], - this->pelec, - this->pelec->ekb.c, - GlobalV::RANK_IN_POOL, - GlobalV::NPROC_IN_POOL, - false, - this->tpiba, - this->get_nat()); - } - else - {// update charge density only - this->pelec->psiToRho(*psi_t); - } + this->update_psi_charge_pw_cpu(delta_lambda, pw_solve); } #if ((defined __CUDA) || (defined __ROCM)) else { - base_device::DEVICE_GPU* ctx = {}; - base_device::DEVICE_CPU* cpu_ctx = {}; - psi::Psi, base_device::DEVICE_GPU>* psi_t = static_cast, base_device::DEVICE_GPU>*>(this->psi); - hamilt::Hamilt, base_device::DEVICE_GPU>* hamilt_t = static_cast, base_device::DEVICE_GPU>*>(this->p_hamilt); - auto* onsite_p = projectors::OnsiteProjector::get_instance(); - nbands = psi_t->get_nbands(); - npol = psi_t->get_npol(); - nkb = onsite_p->get_tot_nproj(); - nk = psi_t->get_nk(); - nh_iat = &onsite_p->get_nh(0); - size_becp = nbands * nkb * npol; - - std::complex* h_tmp = nullptr; - std::complex* s_tmp = nullptr; - base_device::memory::resize_memory_op, base_device::DEVICE_GPU>()(h_tmp, nbands * nbands); - base_device::memory::resize_memory_op, base_device::DEVICE_GPU>()(s_tmp, nbands * nbands); - assert(this->sub_h_save != nullptr); - assert(this->sub_s_save != nullptr); - assert(this->becp_save != nullptr); - for (int ik = 0; ik < nk; ++ik) - { - std::complex* h_k = this->sub_h_save + ik * nbands * nbands; - std::complex* s_k = this->sub_s_save + ik * nbands * nbands; - std::complex* becp_k = this->becp_save + ik * size_becp; - - psi_t->fix_k(ik); - base_device::memory::synchronize_memory_op, base_device::DEVICE_GPU, base_device::DEVICE_GPU>()(h_tmp, h_k, nbands * nbands); - 
base_device::memory::synchronize_memory_op, base_device::DEVICE_GPU, base_device::DEVICE_GPU>()(s_tmp, s_k, nbands * nbands); - this->calculate_delta_hcc(h_tmp, becp_k, delta_lambda, nbands, nkb, nh_iat); - hsolver::DiagoIterAssist, base_device::DEVICE_GPU>::diag_subspace_psi(h_tmp, - s_tmp, - nbands, - psi_t[0], - &this->pelec->ekb(ik, 0)); - } - - base_device::memory::delete_memory_op, base_device::DEVICE_GPU>()(sub_h_save); - base_device::memory::delete_memory_op, base_device::DEVICE_GPU>()(sub_s_save); - base_device::memory::delete_memory_op, base_device::DEVICE_GPU>()(becp_save); - this->sub_h_save = nullptr; - this->sub_s_save = nullptr; - this->becp_save = nullptr; - - if(pw_solve) - { - hsolver::HSolverPW, base_device::DEVICE_GPU> hsolver_pw_obj(this->pw_wfc_, - PARAM.inp.calculation, - PARAM.inp.basis_type, - PARAM.inp.ks_solver, - false, - PARAM.globalv.use_uspp, - PARAM.inp.nspin, - hsolver::DiagoIterAssist, base_device::DEVICE_GPU>::SCF_ITER, - hsolver::DiagoIterAssist, base_device::DEVICE_GPU>::PW_DIAG_NMAX, - hsolver::DiagoIterAssist, base_device::DEVICE_GPU>::PW_DIAG_THR, - hsolver::DiagoIterAssist, base_device::DEVICE_GPU>::need_subspace); - - hsolver_pw_obj.solve(hamilt_t, - psi_t[0], - this->pelec, - this->pelec->ekb.c, - GlobalV::RANK_IN_POOL, - GlobalV::NPROC_IN_POOL, - false, - this->tpiba, - this->get_nat()); - } - else - {// update charge density only - reinterpret_cast, base_device::DEVICE_GPU>*>(this->pelec)->psiToRho(*psi_t); - } - + this->update_psi_charge_pw_gpu(delta_lambda, pw_solve); } -#endif +#endif } ModuleBase::timer::end("spinconstrain::SpinConstrain", "update_psi_charge"); } diff --git a/source/source_lcao/module_deltaspin/cal_mw_helper.cpp b/source/source_lcao/module_deltaspin/cal_mw_helper.cpp new file mode 100644 index 00000000000..4513cf9253e --- /dev/null +++ b/source/source_lcao/module_deltaspin/cal_mw_helper.cpp @@ -0,0 +1,168 @@ +#ifdef __LCAO +#include "spin_constrain.h" + +template <> +std::vector>> 
spinconstrain::SpinConstrain>::convert( + const ModuleBase::matrix& orbMulP) +{ + std::vector>> AorbMulP; + AorbMulP.resize(this->nspin_); + int nat = this->get_nat(); + for (int is = 0; is < this->nspin_; ++is) + { + int num = 0; + AorbMulP[is].resize(nat); + for (const auto& sc_elem: this->get_atomCounts()) + { + int it = sc_elem.first; + int nat_it = sc_elem.second; + int nw_it = this->get_orbitalCounts().at(it); + for (int ia = 0; ia < nat_it; ia++) + { + int iat = this->get_iat(it, ia); + AorbMulP[is][iat].resize(nw_it, 0.0); + for (int iw = 0; iw < nw_it; iw++) + { + AorbMulP[is][iat][iw] = std::abs(orbMulP(is, num))< 1e-10 ? 0.0 : orbMulP(is, num); + num++; + } + } + } + } + return AorbMulP; +} + +template <> +void spinconstrain::SpinConstrain>::calculate_MW( + const std::vector>>& AorbMulP) +{ + size_t nw = this->get_nw(); + int nat = this->get_nat(); + + this->zero_Mi(); + + const int nlocal = (this->nspin_ == 4) ? nw / 2 : nw; + for (const auto& sc_elem: this->get_atomCounts()) + { + int it = sc_elem.first; + int nat_it = sc_elem.second; + for (int ia = 0; ia < nat_it; ia++) + { + int num = 0; + int iat = this->get_iat(it, ia); + double atom_mag = 0.0; + std::vector total_charge_soc(this->nspin_, 0.0); + for (const auto& lnchi: this->get_lnchiCounts().at(it)) + { + std::vector sum_l(this->nspin_, 0.0); + int L = lnchi.first; + int nchi = lnchi.second; + for (int Z = 0; Z < nchi; ++Z) + { + std::vector sum_m(this->nspin_, 0.0); + for (int M = 0; M < (2 * L + 1); ++M) + { + for (int j = 0; j < this->nspin_; j++) + { + sum_m[j] += AorbMulP[j][iat][num]; + } + num++; + } + for (int j = 0; j < this->nspin_; j++) + { + sum_l[j] += sum_m[j]; + } + } + if (this->nspin_ == 2) + { + atom_mag += sum_l[0] - sum_l[1]; + } + else if (this->nspin_ == 4) + { + for (int j = 0; j < this->nspin_; j++) + { + total_charge_soc[j] += sum_l[j]; + } + } + } + if (this->nspin_ == 2) + { + this->Mi_[iat].x = 0.0; + this->Mi_[iat].y = 0.0; + this->Mi_[iat].z = atom_mag; + } + else 
if (this->nspin_ == 4) + { + this->Mi_[iat].x = (std::abs(total_charge_soc[1]) < this->sc_thr_)? 0.0 : total_charge_soc[1]; + this->Mi_[iat].y = (std::abs(total_charge_soc[2]) < this->sc_thr_)? 0.0 : total_charge_soc[2]; + this->Mi_[iat].z = (std::abs(total_charge_soc[3]) < this->sc_thr_)? 0.0 : total_charge_soc[3]; + } + } + } +} + +template <> +void spinconstrain::SpinConstrain>::collect_MW(ModuleBase::matrix& MecMulP, + const ModuleBase::ComplexMatrix& mud, + int nw, + int isk) +{ + if (this->nspin_ == 2) + { + for (size_t i=0; i < nw; ++i) + { + if (this->ParaV->in_this_processor(i, i)) + { + const int ir = this->ParaV->global2local_row(i); + const int ic = this->ParaV->global2local_col(i); + MecMulP(isk, i) += mud(ic, ir).real(); + } + } + } + else if (this->nspin_ == 4) + { + for (size_t i = 0; i < nw; ++i) + { + const int index = i % 2; + if (!index) + { + const int j = i / 2; + const int k1 = 2 * j; + const int k2 = 2 * j + 1; + if (this->ParaV->in_this_processor(k1, k1)) + { + const int ir = this->ParaV->global2local_row(k1); + const int ic = this->ParaV->global2local_col(k1); + MecMulP(0, j) += mud(ic, ir).real(); + MecMulP(3, j) += mud(ic, ir).real(); + } + if (this->ParaV->in_this_processor(k1, k2)) + { + const int ir = this->ParaV->global2local_row(k1); + const int ic = this->ParaV->global2local_col(k2); + // note that mud is column major + MecMulP(1, j) += mud(ic, ir).real(); + // M_y = i(M_{up,down} - M_{down,up}) = -(M_{up,down} - M_{down,up}).imag() + MecMulP(2, j) -= mud(ic, ir).imag(); + } + if (this->ParaV->in_this_processor(k2, k1)) + { + const int ir = this->ParaV->global2local_row(k2); + const int ic = this->ParaV->global2local_col(k1); + MecMulP(1, j) += mud(ic, ir).real(); + // M_y = i(M_{up,down} - M_{down,up}) = -(M_{up,down} - M_{down,up}).imag() + MecMulP(2, j) += mud(ic, ir).imag(); + } + if (this->ParaV->in_this_processor(k2, k2)) + { + const int ir = this->ParaV->global2local_row(k2); + const int ic = 
this->ParaV->global2local_col(k2); + MecMulP(0, j) += mud(ic, ir).real(); + MecMulP(3, j) -= mud(ic, ir).real(); + } + } + } + } +} + +#endif diff --git a/source/source_lcao/module_deltaspin/deltaspin_lcao.cpp b/source/source_lcao/module_deltaspin/deltaspin_lcao.cpp index 6a7effb6d02..8a7950ee2ab 100644 --- a/source/source_lcao/module_deltaspin/deltaspin_lcao.cpp +++ b/source/source_lcao/module_deltaspin/deltaspin_lcao.cpp @@ -26,14 +26,14 @@ void init_deltaspin_lcao(const UnitCell& ucell, spinconstrain::SpinConstrain& sc = spinconstrain::SpinConstrain::getScInstance(); #ifdef __LCAO sc.init_sc(inp.sc_thr, inp.nsc, inp.nsc_min, inp.alpha_trial, - inp.sccut, inp.sc_drop_thr, ucell, + inp.sccut, inp.sc_drop_thr, ucell, inp.sc_direction_only, static_cast(pv), inp.nspin, kv, p_hamilt, psi, static_cast*>(dm), static_cast(pelec)); #else sc.init_sc(inp.sc_thr, inp.nsc, inp.nsc_min, inp.alpha_trial, - inp.sccut, inp.sc_drop_thr, ucell, + inp.sccut, inp.sc_drop_thr, ucell, inp.sc_direction_only, static_cast(pv), inp.nspin, kv, p_hamilt, psi, static_cast(pelec)); diff --git a/source/source_lcao/module_deltaspin/init_sc.cpp b/source/source_lcao/module_deltaspin/init_sc.cpp index ac56047173d..af510f3907a 100644 --- a/source/source_lcao/module_deltaspin/init_sc.cpp +++ b/source/source_lcao/module_deltaspin/init_sc.cpp @@ -9,6 +9,7 @@ void spinconstrain::SpinConstrain::init_sc(double sc_thr_in, double sccut_in, double sc_drop_thr_in, const UnitCell& ucell, + bool direction_only_in, Parallel_Orbitals* ParaV_in, int nspin_in, const K_Vectors& kv_in, @@ -25,10 +26,12 @@ void spinconstrain::SpinConstrain::init_sc(double sc_thr_in, this->set_orbitalCounts(ucell.get_orbital_Counts()); this->set_lnchiCounts(ucell.get_lnchi_Counts()); this->set_nspin(nspin_in); + this->set_npol((nspin_in == 4) ? 
2 : 1); this->set_target_mag(ucell.get_target_mag()); this->lambda_ = ucell.get_lambda(); this->constrain_ = ucell.get_constrain(); this->atomLabels_ = ucell.get_atomLabels(); + this->direction_only_ = direction_only_in; this->tpiba = ucell.tpiba; this->pw_wfc_ = pw_wfc_in; this->set_decay_grad(); diff --git a/source/source_lcao/module_deltaspin/lambda_loop.cpp b/source/source_lcao/module_deltaspin/lambda_loop.cpp index 5d38c5d2610..0d67ef9179a 100644 --- a/source/source_lcao/module_deltaspin/lambda_loop.cpp +++ b/source/source_lcao/module_deltaspin/lambda_loop.cpp @@ -152,8 +152,26 @@ void spinconstrain::SpinConstrain>::run_lambda_loop( { where_fill_scalar_else_2d(this->constrain_, 0, zero, delta_lambda, delta_lambda); add_scalar_multiply_2d(initial_lambda, delta_lambda, one, this->lambda_); - - this->cal_mw_from_lambda(i_step); + + // set the lambda component along the target magnetic moment direction to zero + if(this->direction_only_) + for (int ia = 0; ia < nat; ia++) + { + const auto& target = this->target_mag_[ia]; + const double norm = std::sqrt(target.x*target.x + target.y*target.y + target.z*target.z); + + if (norm > 1e-8) { + const ModuleBase::Vector3 dir = target / norm; + double parallel = this->lambda_[ia].x*dir.x + + this->lambda_[ia].y*dir.y + + this->lambda_[ia].z*dir.z; + this->lambda_[ia].x -= parallel * dir.x; + this->lambda_[ia].y -= parallel * dir.y; + this->lambda_[ia].z -= parallel * dir.z; + } + } + + this->cal_mw_from_lambda(i_step, delta_lambda.data()); new_spin = this->Mi_; bool GradLessThanBound = this->check_gradient_decay(new_spin, spin, delta_lambda, dnu_last_step); @@ -179,6 +197,31 @@ void spinconstrain::SpinConstrain>::run_lambda_loop( subtract_2d(spin, this->target_mag_, delta_spin); where_fill_scalar_2d(this->constrain_, 0, zero, delta_spin); search = delta_spin; + // calculate the residual perpendicular to the target magnetic moment direction + if(this->direction_only_) + for (int ia = 0; ia < nat; ia++) + { + const auto& 
target = this->target_mag_[ia]; + const double norm = std::sqrt(target.x*target.x + target.y*target.y + target.z*target.z); + + if (norm > 1e-8) { + const ModuleBase::Vector3 dir = target / norm; + const double parallel = delta_spin[ia].x*dir.x + delta_spin[ia].y*dir.y + delta_spin[ia].z*dir.z; + temp_1[ia][0] = std::pow(delta_spin[ia].x,2) + std::pow(delta_spin[ia].y,2) + + std::pow(delta_spin[ia].z,2) - std::pow(parallel,2); + temp_1[ia][1] = 0; + temp_1[ia][2] = 0; + this->target_mag_[ia] += parallel * dir; + } + else { + temp_1[ia][0] = std::pow(delta_spin[ia].x,2) + + std::pow(delta_spin[ia].y,2) + + std::pow(delta_spin[ia].z,2); + temp_1[ia][1] = 0; + temp_1[ia][2] = 0; + } + } + else for (int ia = 0; ia < nat; ia++) { for (int ic = 0; ic < 3; ic++) @@ -245,8 +288,32 @@ void spinconstrain::SpinConstrain>::run_lambda_loop( dnu_last_step = dnu; add_scalar_multiply_2d(dnu, search, alpha_trial, dnu); + + // remove the component of delta_lambda along the target direction so the increment update also meets the constraints + if(this->direction_only_) + for (int ia = 0; ia < nat; ia++) { + const auto& target = this->target_mag_[ia]; + const double norm = std::sqrt(target.x*target.x + target.y*target.y + target.z*target.z); + + if (norm > 1e-8) { + const ModuleBase::Vector3 dir = target / norm; + double parallel = dnu[ia].x*dir.x + dnu[ia].y*dir.y + dnu[ia].z*dir.z; + dnu[ia].x -= parallel * dir.x; + dnu[ia].y -= parallel * dir.y; + dnu[ia].z -= parallel * dir.z; + } + } delta_lambda = dnu; + // Cap delta_lambda to prevent explosion + for(int ia=0; ia < nat; ia++) { + for(int ic=0; ic < 3; ic++) { + if(std::abs(delta_lambda[ia][ic]) > 10.0) { + delta_lambda[ia][ic] = 10.0 * (delta_lambda[ia][ic] > 0 ?
1.0 : -1.0); + } + } + } + where_fill_scalar_else_2d(this->constrain_, 0, zero, delta_lambda, delta_lambda); add_scalar_multiply_2d(initial_lambda, delta_lambda, one, this->lambda_); @@ -261,6 +328,21 @@ void spinconstrain::SpinConstrain>::run_lambda_loop( alpha_plus = alpha_opt - alpha_trial; scalar_multiply_2d(search, alpha_plus, temp_1); add_scalar_multiply_2d(dnu, temp_1, one, dnu); + + // project delta_lambda to ensure the increment update also meets the constraints + if(this->direction_only_) + for (int ia = 0; ia < nat; ia++) { + const auto& target = this->target_mag_[ia]; + const double norm = std::sqrt(target.x*target.x + target.y*target.y + target.z*target.z); + + if (norm > 1e-8) { + const ModuleBase::Vector3 dir = target / norm; + double parallel = dnu[ia].x*dir.x + dnu[ia].y*dir.y + dnu[ia].z*dir.z; + dnu[ia].x -= parallel * dir.x; + dnu[ia].y -= parallel * dir.y; + dnu[ia].z -= parallel * dir.z; + } + } delta_lambda = dnu; search_old = search; @@ -280,3 +362,303 @@ void spinconstrain::SpinConstrain>::run_lambda_loop( return; } + +#ifdef __LCAO +#include "source_lcao/module_operator_lcao/dspin_lcao.h" +#include "source_estate/module_dm/cal_dm_psi.h" +#include "source_estate/elecstate_tools.h" +#include "source_base/module_external/lapack_connector.h" +#include "source_base/module_external/blas_connector.h" +#include "source_base/module_external/scalapack_connector.h" + +template <> +void spinconstrain::SpinConstrain>::run_lambda_loop_lcao(int outer_step) +{ + const int nat = this->get_nat(); + const int nks = this->kv_.get_nks(); // total k-points (spin-up + spin-down for nspin=2) + const int nk = nks / 2; // k-points per spin channel + psi::Psi>* psi_t = static_cast>*>(this->psi); + const int nbands = psi_t->get_nbands(); + const double alpha_damp = 0.8; + const int max_inner_iter = 2; + + this->print_header(); + + // ── Phase 1: Full diagonalization to get C_k, e_k, Mi ── + this->cal_mw_from_lambda(-1); + std::vector> spin(nat); + spin = this->Mi_; + 
+ std::vector> initial_lambda(nat, 0.0); + const double zero = 0.0; + where_fill_scalar_else_2d(this->constrain_, 0, zero, this->lambda_, initial_lambda); + + print_2d("initial lambda (eV/uB): ", initial_lambda, this->nspin_, ModuleBase::Ry_to_eV); + print_2d("initial spin (uB): ", spin, this->nspin_); + print_2d("target spin (uB): ", this->target_mag_, this->nspin_); + + // Check initial convergence + std::vector> delta_spin(nat, 0.0); + subtract_2d(spin, this->target_mag_, delta_spin); + where_fill_scalar_2d(this->constrain_, 0, zero, delta_spin); + double rms_error = 0.0; + { + double sum = 0.0; + for (int ia = 0; ia < nat; ia++) + for (int ic = 0; ic < 3; ic++) + sum += std::pow(delta_spin[ia][ic], 2); + rms_error = std::sqrt(sum / nat); + } + this->current_sc_thr_ = std::max(rms_error * this->sc_drop_thr_, this->sc_thr_); + + if (rms_error < this->current_sc_thr_) + { + std::cout << "Step (Outer -- Inner) = " << outer_step << " -- 0" + << " RMS = " << rms_error << std::endl; + std::cout << "Meet convergence criterion ( < " << this->current_sc_thr_ << " ), exit." 
<< std::endl; + this->print_termination(); + // Update charge from current psi + this->pelec->psiToRho(*psi_t); + return; + } + + // ── Phase 2: Compute P_I_sub for all k-points ── + auto* dspin_op = dynamic_cast, double>>*>(this->p_operator); + + // PI_sub[ik][iat] = nbands × nbands Hermitian matrix + std::vector>>> PI_sub(nks); + for (int ik = 0; ik < nks; ik++) + { + psi_t->fix_k(ik); + dspin_op->cal_PI_sub(this->kv_.kvec_d[ik], psi_t->get_pointer(), nbands, PI_sub[ik]); + } + + // ── Phase 3: Analytical Jacobian ── + // chi_I = dM_I^z / dlambda_I + // For nspin=2: M_I = sum_k [sum_n f_n_up * P_I_nn_up - sum_n f_n_down * P_I_nn_down] + // dM/dlambda uses perturbation theory with both spin channels + std::vector chi(nat, 0.0); + for (int iat = 0; iat < nat; iat++) + { + if (this->constrain_[iat].z == 0) { continue; + } + double chi_val = 0.0; + for (int ik = 0; ik < nks; ik++) + { + if (PI_sub[ik][iat].empty()) { continue; + } + // sign: +1 for spin-up (ik < nk), -1 for spin-down (ik >= nk) + // dH_up/dlambda = +P_I, dH_down/dlambda = -P_I + // dM/dlambda = d(M_up - M_down)/dlambda + // For spin-up channel: dM_up/dlambda = sum_{n,m} 2*(f_n-f_m)*|P_nm|^2/(e_n-e_m) * (+1) + // For spin-down channel: dM_down/dlambda = sum_{n,m} 2*(f_n-f_m)*|P_nm|^2/(e_n-e_m) * (-1) + // dM/dlambda = dM_up/dlambda - dM_down/dlambda + // Both channels contribute with same sign to chi + const double sign = static_cast(this->get_spin_sign(ik)); + const auto& P = PI_sub[ik][iat]; + for (int n = 0; n < nbands; n++) + { + const double fn = this->pelec->wg(ik, n); + for (int m = n + 1; m < nbands; m++) + { + const double fm = this->pelec->wg(ik, m); + const double de = this->pelec->ekb(ik, n) - this->pelec->ekb(ik, m); + if (std::abs(de) < 1e-10) { continue; + } + const double P_nm_sq = std::norm(P[n * nbands + m]); + // sign * sign = 1 always, so both channels add + chi_val += 2.0 * (fn - fm) * P_nm_sq / de; + } + } + } + chi[iat] = chi_val; + } + + // ── Phase 4: Newton update + subspace 
verification ── + // Storage for subspace diag results + ModuleBase::matrix ekb_new(nks, nbands); + ModuleBase::matrix wg_new(nks, nbands); + std::vector>> V_save(nks); + + for (int inner = 0; inner < max_inner_iter; inner++) + { + // Newton step: delta_lambda_I = alpha_damp * (target - current) / chi_I + for (int iat = 0; iat < nat; iat++) + { + if (this->constrain_[iat].z == 0) { continue; + } + if (std::abs(chi[iat]) < 1e-15) { continue; + } + const double delta_lambda_z = alpha_damp * (this->target_mag_[iat].z - spin[iat].z) / chi[iat]; + this->lambda_[iat].z = initial_lambda[iat].z + delta_lambda_z; + } + + // Subspace diag for each k-point + for (int ik = 0; ik < nks; ik++) + { + const double sign = static_cast(this->get_spin_sign(ik)); + + // Build H_sub = diag(e_k) + sign * sum_I delta_lambda_I * P_I_sub(k) + std::vector> H_sub(nbands * nbands, {0.0, 0.0}); + for (int n = 0; n < nbands; n++) + { + H_sub[n * nbands + n] = {this->pelec->ekb(ik, n), 0.0}; + } + for (int iat = 0; iat < nat; iat++) + { + if (PI_sub[ik][iat].empty()) { continue; + } + const double dlambda = sign * (this->lambda_[iat].z - initial_lambda[iat].z); + for (int i = 0; i < nbands * nbands; i++) + { + H_sub[i] += dlambda * PI_sub[ik][iat][i]; + } + } + + // Diag with LAPACK zheev + std::vector e_new(nbands); + V_save[ik] = H_sub; // zheev overwrites with eigenvectors + int lwork = 2 * nbands; + std::vector> work(lwork); + std::vector rwork(3 * nbands); + int info = 0; + zheev_("V", "U", &nbands, V_save[ik].data(), &nbands, + e_new.data(), work.data(), &lwork, rwork.data(), &info); + if (info != 0) + { + std::cout << "WARNING: zheev failed with info=" << info << " at ik=" << ik << std::endl; + } + for (int n = 0; n < nbands; n++) + { + ekb_new(ik, n) = e_new[n]; + } + } + + // Recompute weights from new eigenvalues + elecstate::calculate_weights(ekb_new, + wg_new, + this->pelec->klist, + this->pelec->eferm, + this->pelec->f_en, + this->pelec->nelec_spin, + this->pelec->skip_weights); + + 
// Compute Mi_new from subspace rotation + std::vector> Mi_new(nat, 0.0); + for (int iat = 0; iat < nat; iat++) + { + if (this->constrain_[iat].z == 0) { continue; + } + double mi_z = 0.0; + for (int ik = 0; ik < nks; ik++) + { + if (PI_sub[ik][iat].empty()) { continue; + } + const double sign = static_cast(this->get_spin_sign(ik)); + const auto& V = V_save[ik]; + const auto& P = PI_sub[ik][iat]; + + // P_rotated = V^dag P V, we only need diagonal elements + // P_rotated[n,n] = sum_{a,b} conj(V[a,n]) * P[a,b] * V[b,n] + for (int n = 0; n < nbands; n++) + { + std::complex pnn = {0.0, 0.0}; + for (int a = 0; a < nbands; a++) + { + std::complex tmp = {0.0, 0.0}; + for (int b = 0; b < nbands; b++) + { + tmp += P[a * nbands + b] * V[b * nbands + n]; + } + pnn += std::conj(V[a * nbands + n]) * tmp; + } + mi_z += sign * wg_new(ik, n) * pnn.real(); + } + } + Mi_new[iat].z = mi_z; + } + + // Check convergence + subtract_2d(Mi_new, this->target_mag_, delta_spin); + where_fill_scalar_2d(this->constrain_, 0, zero, delta_spin); + { + double sum = 0.0; + for (int ia = 0; ia < nat; ia++) + for (int ic = 0; ic < 3; ic++) + sum += std::pow(delta_spin[ia][ic], 2); + rms_error = std::sqrt(sum / nat); + } + + std::cout << "Step (Outer -- Inner) = " << outer_step << " -- " << std::left << std::setw(5) << inner + 1 + << " RMS = " << rms_error << " (subspace)" << std::endl; + + if (rms_error < this->current_sc_thr_) + { + std::cout << "Meet convergence criterion ( < " << this->current_sc_thr_ << " ), exit." 
<< std::endl; + break; + } + + // Update spin for next iteration + spin = Mi_new; + } + + this->print_termination(); + + // ── Phase 5: Finalize — rotate wavefunctions and update DM/charge ── + // C_new_k = C_k * V_k via pzgemm (2D-block distributed) + // V_k is nbands × nbands (small, replicated on all procs) + // C_k is nlocal × nbands (2D-block distributed) + for (int ik = 0; ik < nks; ik++) + { + psi_t->fix_k(ik); + const int nlocal = this->ParaV->get_row_size(); + const int ncol_local = this->ParaV->ncol_bands; + + // Temporary storage for rotated wavefunction + std::vector> psi_new(nlocal * ncol_local, {0.0, 0.0}); + + // C_new[irow, jcol_local] = sum_m C[irow, m_local] * V[m_global, jcol_global] + // Since V is replicated, we can do this locally per process + const std::complex* psi_old = psi_t->get_pointer(); + for (int jcol_local = 0; jcol_local < ncol_local; jcol_local++) + { + const int jcol_global = this->ParaV->local2global_col(jcol_local); + for (int mcol_local = 0; mcol_local < ncol_local; mcol_local++) + { + const int mcol_global = this->ParaV->local2global_col(mcol_local); + // V[mcol_global, jcol_global] — V is column-major from zheev + const std::complex v_mj = V_save[ik][mcol_global * nbands + jcol_global]; + // psi_new[:, jcol_local] += psi_old[:, mcol_local] * v_mj + for (int irow = 0; irow < nlocal; irow++) + { + psi_new[irow + jcol_local * nlocal] += psi_old[irow + mcol_local * nlocal] * v_mj; + } + } + } + + // Copy back + std::complex* psi_ptr = const_cast*>(psi_t->get_pointer()); + std::copy(psi_new.begin(), psi_new.end(), psi_ptr); + + // Update eigenvalues + for (int n = 0; n < nbands; n++) + { + this->pelec->ekb(ik, n) = ekb_new(ik, n); + } + } + + // Update weights, DM, and charge + elecstate::calculate_weights(this->pelec->ekb, + this->pelec->wg, + this->pelec->klist, + this->pelec->eferm, + this->pelec->f_en, + this->pelec->nelec_spin, + this->pelec->skip_weights); + elecstate::calEBand(this->pelec->ekb, this->pelec->wg, 
this->pelec->f_en); + + elecstate::cal_dm_psi(this->ParaV, this->pelec->wg, *psi_t, *this->dm_); + this->dm_->cal_DMR(); + this->pelec->psiToRho(*psi_t); +} +#endif // __LCAO diff --git a/source/source_lcao/module_deltaspin/lambda_loop_helper.cpp b/source/source_lcao/module_deltaspin/lambda_loop_helper.cpp index 6ad4db05adb..43c8b3d84c9 100644 --- a/source/source_lcao/module_deltaspin/lambda_loop_helper.cpp +++ b/source/source_lcao/module_deltaspin/lambda_loop_helper.cpp @@ -92,6 +92,18 @@ double spinconstrain::SpinConstrain>::cal_alpha_opt( } double sum_k = sum_2d(temp_1); double sum_k2 = sum_2d(temp_2); + printf("[ALPHA-OPT] nat=%d sum_k=%.6e sum_k2=%.6e alpha_trial=%.6e\n", nat, sum_k, sum_k2, alpha_trial); + for(int ia=0; ia>::check_gradient_decay( { for (int jc = 0; jc < 3; jc++) { + if (std::abs(nu_change[ja][jc]) < 1e-30) { + printf("[GRAD-DECAY] WARNING: nu_change[%d][%d] too small! delta_lambda=(%.6e,%.6e,%.6e) dnu_last=(%.6e,%.6e,%.6e)\n", + ja, jc, delta_lambda[ja].x, delta_lambda[ja].y, delta_lambda[ja].z, + dnu_last_step[ja].x, dnu_last_step[ja].y, dnu_last_step[ja].z); + fflush(stdout); + nu_change[ja][jc] = 1e-30; + } spin_nu_gradient[ia][ic][ja][jc] = spin_change[ia][ic] / nu_change[ja][jc]; } } diff --git a/source/source_lcao/module_deltaspin/lambda_strategy_integration.cpp b/source/source_lcao/module_deltaspin/lambda_strategy_integration.cpp new file mode 100644 index 00000000000..7c93ee8c88d --- /dev/null +++ b/source/source_lcao/module_deltaspin/lambda_strategy_integration.cpp @@ -0,0 +1,72 @@ +#include "spin_constrain.h" + +#include "lambda_update_strategies.h" + +namespace spinconstrain +{ + +template +void SpinConstrain::set_strategy_type(LambdaStrategyType type) +{ + strategy_type_ = type; + switch(type) + { + case LambdaStrategyType::BFGS: + strategy_ = nullptr; + break; + case LambdaStrategyType::LinearResponse: + strategy_ = std::unique_ptr( + new LinearResponseUpdate()); + break; + case LambdaStrategyType::AugmentedLagrangian: + 
strategy_ = std::unique_ptr( + new AugmentedLagrangianUpdate()); + break; + case LambdaStrategyType::HybridDelayed: + strategy_ = std::unique_ptr( + new HybridDelayedUpdate()); + break; + default: + strategy_ = nullptr; + strategy_type_ = LambdaStrategyType::BFGS; + break; + } +} + +template +void SpinConstrain::set_strategy_params(double mu_init, double mu_max, + double mu_growth, double mix_beta, + double sc_scf_thr) +{ + if (!strategy_) return; + + if (strategy_type_ == LambdaStrategyType::LinearResponse) + { + if (auto* lr = dynamic_cast(strategy_.get())) + { + // mix_beta is the primary tunable parameter for LinearResponse + // chi_min, chi_max, lambda_max keep defaults + *lr = LinearResponseUpdate(0.01, 100.0, mix_beta, 10.0); + } + } + else if (strategy_type_ == LambdaStrategyType::AugmentedLagrangian) + { + if (auto* al = dynamic_cast(strategy_.get())) + { + *al = AugmentedLagrangianUpdate(mu_init, mu_max, mu_growth, 5, 10.0); + } + } + else if (strategy_type_ == LambdaStrategyType::HybridDelayed) + { + if (auto* hd = dynamic_cast(strategy_.get())) + { + *hd = HybridDelayedUpdate(sc_scf_thr, mu_init, mu_max, mu_growth, 5, 10, 10.0); + } + } +} + +// Explicit template instantiation +template class SpinConstrain>; +template class SpinConstrain; + +} // namespace spinconstrain diff --git a/source/source_lcao/module_deltaspin/lambda_update_strategies.cpp b/source/source_lcao/module_deltaspin/lambda_update_strategies.cpp new file mode 100644 index 00000000000..52bd2378b3f --- /dev/null +++ b/source/source_lcao/module_deltaspin/lambda_update_strategies.cpp @@ -0,0 +1,386 @@ +#include "lambda_update_strategies.h" +#include +#include + +namespace spinconstrain +{ + +// =================================================================== +// Helper functions +// =================================================================== + +double compute_rms_error(const std::vector>& Mi, + const std::vector>& target_mag, + const std::vector>& constrain, + int nat) +{ + 
double sum = 0.0; + int n_count = 0; + for (int ia = 0; ia < nat; ++ia) + { + for (int ic = 0; ic < 3; ++ic) + { + if (constrain[ia][ic] != 0) + { + double diff = Mi[ia][ic] - target_mag[ia][ic]; + sum += diff * diff; + ++n_count; + } + } + } + if (n_count == 0) return 0.0; + return std::sqrt(sum / n_count); +} + +int count_converged(const std::vector>& Mi, + const std::vector>& target_mag, + const std::vector>& constrain, + double sc_thr, + int nat) +{ + int count = 0; + for (int ia = 0; ia < nat; ++ia) + { + for (int ic = 0; ic < 3; ++ic) + { + if (constrain[ia][ic] != 0) + { + double diff = Mi[ia][ic] - target_mag[ia][ic]; + if (std::abs(diff) < sc_thr) + { + ++count; + } + } + } + } + return count; +} + +void cap_lambda(std::vector>& lambda, + const std::vector>& constrain, + double lambda_max, + int nat) +{ + for (int ia = 0; ia < nat; ++ia) + { + for (int ic = 0; ic < 3; ++ic) + { + if (constrain[ia][ic] != 0) + { + if (lambda[ia][ic] > lambda_max) lambda[ia][ic] = lambda_max; + if (lambda[ia][ic] < -lambda_max) lambda[ia][ic] = -lambda_max; + } + } + } +} + +// =================================================================== +// Scheme B: Linear Response (One-Step) Update +// =================================================================== + +LinearResponseUpdate::LinearResponseUpdate(double chi_min, + double chi_max, + double mix_beta, + double lambda_max) + : chi_min_(chi_min), chi_max_(chi_max), mix_beta_(mix_beta), + lambda_max_(lambda_max), converged_(false), last_rms_(1e30) +{ +} + +LambdaUpdateResult LinearResponseUpdate::update_lambda( + std::vector>& lambda, + const std::vector>& Mi, + const std::vector>& target_mag, + const std::vector>& constrain, + double sc_thr, + int iter, + int nat) +{ + LambdaUpdateResult result; + result.n_atoms = nat; + + // Ensure response matrix is properly sized + if (static_cast(chi_.size()) != nat) + { + chi_.assign(nat, ModuleBase::Vector3(1.0, 1.0, 1.0)); + } + + // Estimate chi from history if we have enough 
iterations + if (iter >= 2 && static_cast(Mi_history_.size()) >= 2) + { + const std::vector>& Mi_old = Mi_history_[Mi_history_.size() - 2]; + const std::vector>& lambda_old = lambda_history_[lambda_history_.size() - 2]; + for (int ia = 0; ia < nat; ++ia) + { + for (int ic = 0; ic < 3; ++ic) + { + if (constrain[ia][ic] == 0) continue; + double dlambda = lambda[ia][ic] - lambda_old[ia][ic]; + double dM = Mi[ia][ic] - Mi_old[ia][ic]; + if (std::abs(dlambda) > 1e-8) + { + double chi_new = dM / dlambda; + if (chi_new > chi_min_ && chi_new < chi_max_) + { + chi_[ia][ic] = chi_new; + } + } + } + } + } + + // Update lambda: lambda += mix_beta * (M_target - M) / chi + for (int ia = 0; ia < nat; ++ia) + { + for (int ic = 0; ic < 3; ++ic) + { + if (constrain[ia][ic] == 0) continue; + double residual = target_mag[ia][ic] - Mi[ia][ic]; + double delta = residual / chi_[ia][ic]; + lambda[ia][ic] += mix_beta_ * delta; + } + } + + // Cap lambda + cap_lambda(lambda, constrain, lambda_max_, nat); + + // Save history + Mi_history_.push_back(Mi); + lambda_history_.push_back(lambda); + // Keep only last 5 entries + if (static_cast(Mi_history_.size()) > 5) + { + Mi_history_.erase(Mi_history_.begin()); + lambda_history_.erase(lambda_history_.begin()); + } + + // Compute result + result.rms_error = compute_rms_error(Mi, target_mag, constrain, nat); + result.n_converged = count_converged(Mi, target_mag, constrain, sc_thr, nat); + + double max_l = 0.0; + for (int ia = 0; ia < nat; ++ia) + { + for (int ic = 0; ic < 3; ++ic) + { + if (constrain[ia][ic] != 0) + { + max_l = std::max(max_l, std::abs(lambda[ia][ic])); + } + } + } + result.max_lambda = max_l; + + converged_ = (result.rms_error < sc_thr); + result.status = converged_ ? 
"converged" : "updating"; + + return result; +} + +// =================================================================== +// Scheme C: Augmented Lagrangian Update +// =================================================================== + +AugmentedLagrangianUpdate::AugmentedLagrangianUpdate(double mu_init, + double mu_max, + double mu_growth, + int mu_update_interval, + double lambda_max) + : mu_(mu_init), mu_init_(mu_init), mu_max_(mu_max), + mu_growth_(mu_growth), mu_update_interval_(mu_update_interval), + lambda_max_(lambda_max), converged_(false), last_iter_(0) +{ +} + +LambdaUpdateResult AugmentedLagrangianUpdate::update_lambda( + std::vector>& lambda, + const std::vector>& Mi, + const std::vector>& target_mag, + const std::vector>& constrain, + double sc_thr, + int iter, + int nat) +{ + LambdaUpdateResult result; + result.n_atoms = nat; + last_iter_ = iter; + + // Dual variable update: lambda += mu * (M - M_target) + for (int ia = 0; ia < nat; ++ia) + { + for (int ic = 0; ic < 3; ++ic) + { + if (constrain[ia][ic] == 0) continue; + double violation = Mi[ia][ic] - target_mag[ia][ic]; + lambda[ia][ic] += mu_ * violation; + } + } + + // Cap lambda + cap_lambda(lambda, constrain, lambda_max_, nat); + + // Grow mu periodically + if (iter > 0 && iter % mu_update_interval_ == 0) + { + mu_ = std::min(mu_max_, mu_ * mu_growth_); + } + + // Compute result + result.rms_error = compute_rms_error(Mi, target_mag, constrain, nat); + result.n_converged = count_converged(Mi, target_mag, constrain, sc_thr, nat); + + double max_l = 0.0; + for (int ia = 0; ia < nat; ++ia) + { + for (int ic = 0; ic < 3; ++ic) + { + if (constrain[ia][ic] != 0) + { + max_l = std::max(max_l, std::abs(lambda[ia][ic])); + } + } + } + result.max_lambda = max_l; + + converged_ = (result.rms_error < sc_thr); + result.status = converged_ ? 
"converged" : "updating"; + + return result; +} + +// =================================================================== +// Scheme D: Hybrid Delayed Update +// =================================================================== + +HybridDelayedUpdate::HybridDelayedUpdate(double sc_scf_thr, + double mu_init, + double mu_max, + double mu_growth, + int mu_update_interval, + int max_inner_steps, + double lambda_max) + : sc_scf_thr_(sc_scf_thr), drho_(1e30), mu_(mu_init), mu_init_(mu_init), + mu_max_(mu_max), mu_growth_(mu_growth), + mu_update_interval_(mu_update_interval), + max_inner_steps_(max_inner_steps), lambda_max_(lambda_max), + converged_(false), inner_steps_(0), phase_("early") +{ +} + +LambdaUpdateResult HybridDelayedUpdate::update_lambda( + std::vector>& lambda, + const std::vector>& Mi, + const std::vector>& target_mag, + const std::vector>& constrain, + double sc_thr, + int iter, + int nat) +{ + LambdaUpdateResult result; + result.n_atoms = nat; + + // Phase decision + if (drho_ > sc_scf_thr_ * 100) + { + // Early phase: skip lambda update + phase_ = "early"; + result.rms_error = compute_rms_error(Mi, target_mag, constrain, nat); + result.n_converged = 0; + result.max_lambda = 0.0; + for (int ia = 0; ia < nat; ++ia) + { + for (int ic = 0; ic < 3; ++ic) + { + if (constrain[ia][ic] != 0) + { + result.max_lambda = std::max(result.max_lambda, std::abs(lambda[ia][ic])); + } + } + } + converged_ = (result.rms_error < sc_thr); + result.status = "skipped_early"; + return result; + } + else if (drho_ > sc_scf_thr_) + { + // Mid phase: Augmented Lagrangian lightweight update + phase_ = "mid"; + for (int ia = 0; ia < nat; ++ia) + { + for (int ic = 0; ic < 3; ++ic) + { + if (constrain[ia][ic] == 0) continue; + double violation = Mi[ia][ic] - target_mag[ia][ic]; + lambda[ia][ic] += mu_ * violation; + } + } + cap_lambda(lambda, constrain, lambda_max_, nat); + + if (iter > 0 && iter % mu_update_interval_ == 0) + { + mu_ = std::min(mu_max_, mu_ * mu_growth_); + } + } + 
else + { + // Late phase: Augmented Lagrangian + inner loop fallback + phase_ = "late"; + for (int ia = 0; ia < nat; ++ia) + { + for (int ic = 0; ic < 3; ++ic) + { + if (constrain[ia][ic] == 0) continue; + double violation = Mi[ia][ic] - target_mag[ia][ic]; + lambda[ia][ic] += mu_ * violation; + } + } + cap_lambda(lambda, constrain, lambda_max_, nat); + + if (iter > 0 && iter % mu_update_interval_ == 0) + { + mu_ = std::min(mu_max_, mu_ * mu_growth_); + } + + // Check if fallback to inner loop is needed + double rms = compute_rms_error(Mi, target_mag, constrain, nat); + if (rms > sc_thr * 10 && inner_steps_ < max_inner_steps_) + { + result.status = "fallback_triggered"; + inner_steps_++; + } + } + + // Compute result + result.rms_error = compute_rms_error(Mi, target_mag, constrain, nat); + result.n_converged = count_converged(Mi, target_mag, constrain, sc_thr, nat); + + double max_l = 0.0; + for (int ia = 0; ia < nat; ++ia) + { + for (int ic = 0; ic < 3; ++ic) + { + if (constrain[ia][ic] != 0) + { + max_l = std::max(max_l, std::abs(lambda[ia][ic])); + } + } + } + result.max_lambda = max_l; + + converged_ = (result.rms_error < sc_thr); + if (result.status != "fallback_triggered") + { + if (converged_) + { + result.status = "converged"; + } + else + { + result.status = std::string("updating_") + phase_; + } + } + + return result; +} + +} // namespace spinconstrain diff --git a/source/source_lcao/module_deltaspin/lambda_update_strategies.h b/source/source_lcao/module_deltaspin/lambda_update_strategies.h new file mode 100644 index 00000000000..4d9d8d714e4 --- /dev/null +++ b/source/source_lcao/module_deltaspin/lambda_update_strategies.h @@ -0,0 +1,195 @@ +#ifndef LAMBDA_UPDATE_STRATEGIES_H +#define LAMBDA_UPDATE_STRATEGIES_H + +#include +#include +#include +#include +#include + +#include "source_base/vector3.h" + +namespace spinconstrain +{ + +/** + * @brief Result struct for lambda update operations + */ +struct LambdaUpdateResult +{ + int n_atoms; + double rms_error; 
///< RMS of (M - M_target) over constrained components, computed from the Mi passed in + double max_lambda; ///< max |lambda| across all atoms/components + int n_converged; ///< number of (atom, component) pairs converged + std::string status; ///< "converged", "updating", "updating_mid", "updating_late", "skipped_early", or "fallback_triggered" +}; + +/** + * @brief Pure abstract base class for lambda update strategies + */ +class LambdaUpdateStrategy +{ + public: + virtual ~LambdaUpdateStrategy() = default; + + virtual LambdaUpdateResult update_lambda(std::vector>& lambda, + const std::vector>& Mi, + const std::vector>& target_mag, + const std::vector>& constrain, + double sc_thr, + int iter, + int nat) = 0; + + virtual std::string name() const = 0; + virtual bool is_converged() const = 0; +}; + +/** + * @brief Compute RMS error of |M - M_target| (respecting constrain flags) + */ +double compute_rms_error(const std::vector>& Mi, + const std::vector>& target_mag, + const std::vector>& constrain, + int nat); + +/** + * @brief Count converged components + */ +int count_converged(const std::vector>& Mi, + const std::vector>& target_mag, + const std::vector>& constrain, + double sc_thr, + int nat); + +/** + * @brief Apply absolute cap to lambda values + */ +void cap_lambda(std::vector>& lambda, + const std::vector>& constrain, + double lambda_max, + int nat); + +// =================================================================== +// Scheme B: Linear Response (One-Step) Update +// =================================================================== + +class LinearResponseUpdate : public LambdaUpdateStrategy +{ + public: + LinearResponseUpdate(double chi_min = 0.01, + double chi_max = 100.0, + double mix_beta = 0.3, + double lambda_max = 10.0); + + LambdaUpdateResult update_lambda(std::vector>& lambda, + const std::vector>& Mi, + const std::vector>& target_mag, + const std::vector>& constrain, + double sc_thr, + int iter, + int nat) override; + + std::string name() const override { return "LinearResponse"; } + bool is_converged() const override { return converged_; 
} + + const std::vector>& get_chi() const { return chi_; } + + private: + double chi_min_; + double chi_max_; + double mix_beta_; + double lambda_max_; + bool converged_; + double last_rms_; + std::vector> chi_; + std::vector>> Mi_history_; + std::vector>> lambda_history_; +}; + +// =================================================================== +// Scheme C: Augmented Lagrangian Update +// =================================================================== + +class AugmentedLagrangianUpdate : public LambdaUpdateStrategy +{ + public: + AugmentedLagrangianUpdate(double mu_init = 0.1, + double mu_max = 10.0, + double mu_growth = 1.5, + int mu_update_interval = 5, + double lambda_max = 10.0); + + LambdaUpdateResult update_lambda(std::vector>& lambda, + const std::vector>& Mi, + const std::vector>& target_mag, + const std::vector>& constrain, + double sc_thr, + int iter, + int nat) override; + + std::string name() const override { return "AugmentedLagrangian"; } + bool is_converged() const override { return converged_; } + + double get_mu() const { return mu_; } + void reset_mu() { mu_ = mu_init_; } + + private: + double mu_; + double mu_init_; + double mu_max_; + double mu_growth_; + int mu_update_interval_; + double lambda_max_; + bool converged_; + int last_iter_; +}; + +// =================================================================== +// Scheme D: Hybrid Delayed Update +// =================================================================== + +class HybridDelayedUpdate : public LambdaUpdateStrategy +{ + public: + HybridDelayedUpdate(double sc_scf_thr = 1e-3, + double mu_init = 0.1, + double mu_max = 10.0, + double mu_growth = 1.5, + int mu_update_interval = 5, + int max_inner_steps = 10, + double lambda_max = 10.0); + + void set_drho(double drho) { drho_ = drho; } + + LambdaUpdateResult update_lambda(std::vector>& lambda, + const std::vector>& Mi, + const std::vector>& target_mag, + const std::vector>& constrain, + double sc_thr, + int iter, + int nat) 
override; + + std::string name() const override { return "HybridDelayed"; } + bool is_converged() const override { return converged_; } + + std::string get_phase() const { return phase_; } + void reset() { mu_ = mu_init_; inner_steps_ = 0; phase_ = "early"; } + + private: + double sc_scf_thr_; + double drho_; + double mu_; + double mu_init_; + double mu_max_; + double mu_growth_; + int mu_update_interval_; + int max_inner_steps_; + double lambda_max_; + bool converged_; + int inner_steps_; + std::string phase_; +}; + +} // namespace spinconstrain + +#endif // LAMBDA_UPDATE_STRATEGIES_H diff --git a/source/source_lcao/module_deltaspin/sc_parse_json.cpp b/source/source_lcao/module_deltaspin/sc_parse_json.cpp new file mode 100644 index 00000000000..37f23fa3973 --- /dev/null +++ b/source/source_lcao/module_deltaspin/sc_parse_json.cpp @@ -0,0 +1,4 @@ +#include "spin_constrain.h" + +template class spinconstrain::SpinConstrain>; +template class spinconstrain::SpinConstrain; diff --git a/source/source_lcao/module_deltaspin/spin_constrain.cpp b/source/source_lcao/module_deltaspin/spin_constrain.cpp index 6b898f34f6e..233ffd5e64c 100644 --- a/source/source_lcao/module_deltaspin/spin_constrain.cpp +++ b/source/source_lcao/module_deltaspin/spin_constrain.cpp @@ -72,6 +72,113 @@ int SpinConstrain::get_nspin() const return this->nspin_; } +template +void SpinConstrain::set_npol(int npol) +{ + this->npol_ = npol; +} + +template +int SpinConstrain::get_npol() const +{ + return this->npol_; +} + +template +int SpinConstrain::get_spin_sign(int ik) const +{ + if (this->npol_ == 2) return 1; + // npol == 1 (nspin == 2): isk[ik]==0 => spin-up (+1), isk[ik]==1 => spin-down (-1) + return (this->pelec->klist->isk[ik] == 0) ? 
1 : -1; +} + +template +void SpinConstrain::accumulate_Mi_from_becp(const std::complex* becp, + int nkb, + int nbands, + int npol, + int ik, + const double* wg_ik, + const int* nh_iat) +{ + if (npol == 2) + { + for (int ib = 0; ib < nbands; ib++) + { + const double weight = wg_ik[ib]; + int begin_ih = 0; + for (int iat = 0; iat < static_cast(this->Mi_.size()); iat++) + { + std::complex occ[4] = {ModuleBase::ZERO, ModuleBase::ZERO, ModuleBase::ZERO, ModuleBase::ZERO}; + const int nh = nh_iat[iat]; + for (int ih = 0; ih < nh; ih++) + { + const int index = ib * 2 * nkb + begin_ih + ih; + occ[0] += conj(becp[index]) * becp[index]; + occ[1] += conj(becp[index]) * becp[index + nkb]; + occ[2] += conj(becp[index + nkb]) * becp[index]; + occ[3] += conj(becp[index + nkb]) * becp[index + nkb]; + } + this->Mi_[iat] += pauli_to_moment(occ, weight); + begin_ih += nh; + } + } + } + else // npol == 1 + { + const int sign = this->get_spin_sign(ik); + for (int ib = 0; ib < nbands; ib++) + { + const double weight = wg_ik[ib]; + int begin_ih = 0; + for (int iat = 0; iat < static_cast(this->Mi_.size()); iat++) + { + double occ = 0.0; + const int nh = nh_iat[iat]; + for (int ih = 0; ih < nh; ih++) + { + const int index = ib * nkb + begin_ih + ih; + occ += (conj(becp[index]) * becp[index]).real(); + } + this->Mi_[iat].z += weight * occ * sign; + begin_ih += nh; + } + } + } +} + +template +int SpinConstrain::get_nw() const +{ + int nw = 0; + for (const auto& pair : this->orbitalCounts) + { + nw += pair.second; + } + return nw; +} + +template +int SpinConstrain::get_iwt(int itype, int iat, int orbital_index) const +{ + auto it1 = this->orbitalCounts.find(itype); + if (it1 == this->orbitalCounts.end()) + { + return 0; + } + int offset = 0; + for (auto it = this->orbitalCounts.begin(); it != it1; ++it) + { + offset += it->second; + } + auto it2 = this->atomCounts.find(itype); + if (it2 == this->atomCounts.end()) + { + return offset; + } + return offset + iat * it1->second + orbital_index; +} 
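The collinear (npol == 1) branch of `accumulate_Mi_from_becp` above reduces to a signed, weighted sum of squared projections per atom. A minimal stand-alone sketch of that arithmetic follows. The function name `accumulate_mz_collinear` and its `std::vector` interface are hypothetical and not part of ABACUS; the `becp` layout `becp[ib * nkb + begin_ih + ih]` and the meaning of the spin sign are taken from the patch.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only, not an ABACUS API):
// accumulate the z-moment of every atom from one collinear k-point.
//   becp      : projections, indexed as becp[ib * nkb + begin_ih + ih]
//   nkb       : total number of projectors summed over all atoms
//   spin_sign : +1 for a spin-up k-point, -1 for a spin-down k-point
//   wg_ik     : occupation weight of each band at this k-point
//   nh_iat    : number of projectors on each atom
//   mz        : per-atom z-moment, updated in place
void accumulate_mz_collinear(const std::vector<std::complex<double>>& becp,
                             int nkb,
                             int nbands,
                             int spin_sign,
                             const std::vector<double>& wg_ik,
                             const std::vector<int>& nh_iat,
                             std::vector<double>& mz)
{
    for (int ib = 0; ib < nbands; ++ib)
    {
        int begin_ih = 0; // offset of the current atom's first projector
        for (std::size_t iat = 0; iat < nh_iat.size(); ++iat)
        {
            double occ = 0.0;
            for (int ih = 0; ih < nh_iat[iat]; ++ih)
            {
                // |<beta_ih | psi_ib>|^2
                occ += std::norm(becp[ib * nkb + begin_ih + ih]);
            }
            mz[iat] += spin_sign * wg_ik[ib] * occ;
            begin_ih += nh_iat[iat];
        }
    }
}
```

Calling the helper once with `spin_sign = +1` and once with `spin_sign = -1` on identical projections cancels the moment, which is the expected behaviour for a non-magnetic state.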
+ template int SpinConstrain::get_nat() { diff --git a/source/source_lcao/module_deltaspin/spin_constrain.h b/source/source_lcao/module_deltaspin/spin_constrain.h index 224af123fe4..c7a21ba3021 100644 --- a/source/source_lcao/module_deltaspin/spin_constrain.h +++ b/source/source_lcao/module_deltaspin/spin_constrain.h @@ -1,10 +1,13 @@ #ifndef SPIN_CONSTRAIN_H #define SPIN_CONSTRAIN_H +#include #include #include #include "source_base/constants.h" +#include "source_base/complexmatrix.h" +#include "source_base/matrix.h" #include "source_base/tool_quit.h" #include "source_base/tool_title.h" #include "source_base/vector3.h" @@ -13,6 +16,7 @@ #include "source_cell/unitcell.h" #include "source_hamilt/operator.h" #include "source_estate/elecstate.h" +#include "source_lcao/module_deltaspin/lambda_update_strategies.h" #ifdef __LCAO #include "source_estate/module_dm/density_matrix.h" // mohan add 2025-11-02 @@ -21,8 +25,34 @@ namespace spinconstrain { +/** + * @brief Extract magnetic moment from nspin=4 occupation matrix elements. 
+ * + * Given occ[4] = {|a|^2, a* b, b* a, |b|^2} (spinor density matrix), + * the magnetic moment components are: + * Mz = occ[0] - occ[3] (sigma_z) + * Mx = occ[1] + occ[2] (sigma_x) + * My = Im(occ[1] - occ[2]) (sigma_y) + */ +inline ModuleBase::Vector3 pauli_to_moment(const std::complex occ[4], double weight) +{ + return ModuleBase::Vector3( + weight * (occ[1] + occ[2]).real(), + weight * (occ[1] - occ[2]).imag(), + weight * (occ[0] - occ[3]).real() + ); +} + struct ScAtomData; +enum class LambdaStrategyType +{ + BFGS, + LinearResponse, + AugmentedLagrangian, + HybridDelayed +}; + template class SpinConstrain { @@ -38,6 +68,7 @@ class SpinConstrain double sccut_in, double sc_drop_thr_in, const UnitCell& ucell, + bool direction_only_in, Parallel_Orbitals* ParaV_in, int nspin_in, const K_Vectors& kv_in, @@ -68,17 +99,53 @@ class SpinConstrain double get_escon() const; - void run_lambda_loop(int outer_step, + void run_lambda_loop(int outer_step, bool rerun = true); + /// @brief optimized lambda loop for LCAO nspin=2: subspace diag + analytical Jacobian + void run_lambda_loop_lcao(int outer_step); + /// @brief update the charge density for LCAO base with new lambda /// update the charge density and psi for PW base with new lambda void update_psi_charge(const ModuleBase::Vector3* delta_lambda, bool pw_solve = true); - void calculate_delta_hcc(std::complex* h_tmp, - const std::complex* becp_k, - const ModuleBase::Vector3* delta_lambda, - const int nbands, const int nkb, const int* nh_iat); + /** + * @brief Wavefunction and charge update for the PW basis + * @details Consists of two stages: + * 1. Subspace diagonalization: apply the DeltaSpin correction at each k-point and solve + * 2. 
Charge update: depending on pw_solve, either run a full-space diagonalization or update the charge directly + */ + void update_psi_charge_pw(const ModuleBase::Vector3* delta_lambda, bool pw_solve); + + /// CPU implementation of the PW-basis update + void update_psi_charge_pw_cpu(const ModuleBase::Vector3* delta_lambda, bool pw_solve); + +#if ((defined __CUDA) || (defined __ROCM)) + /// GPU implementation of the PW-basis update + void update_psi_charge_pw_gpu(const ModuleBase::Vector3* delta_lambda, bool pw_solve); +#endif + + void calculate_delta_hcc(std::complex* h_tmp, + const std::complex* becp_k, + const ModuleBase::Vector3* delta_lambda, + const int nbands, const int nkb, const int* nh_iat, const int ik); + +#ifdef __LCAO + /// @brief calculate Hamiltonian contribution from lambda for LCAO nspin=4 + void cal_h_lambda(std::complex* h_lambda, + const std::complex* Sloc2, + bool column_major, + int isk); + /// @brief convert orbital matrix to nested vector format + std::vector>> convert(const ModuleBase::matrix& orbMulP); + /// @brief calculate magnetic moment from orbital matrix + void calculate_MW(const std::vector>>& AorbMulP); + /// @brief collect magnetic moment from complex matrix + void collect_MW(ModuleBase::matrix& MecMulP, + const ModuleBase::ComplexMatrix& mud, + int nw, + int isk); +#endif /// lambda loop helper functions bool check_rms_stop(int outer_step, int i_step, double rms_error, double duration, double total_duration); @@ -220,10 +287,24 @@ class SpinConstrain void* p_hamilt_in, void* psi_in, elecstate::ElecState* pelec_in); + /// @brief set lambda update strategy type + void set_strategy_type(LambdaStrategyType type); + /// @brief set strategy-specific parameters + void set_strategy_params(double mu_init, double mu_max, + double mu_growth, double mix_beta, + double sc_scf_thr); private: SpinConstrain(){}; // Private constructor - ~SpinConstrain(){}; // Destructor + ~SpinConstrain() + { + delete[] sub_h_save; + delete[] sub_s_save; + delete[] becp_save; + sub_h_save = nullptr; + sub_s_save = nullptr; + becp_save = nullptr; + }; SpinConstrain& operator=(SpinConstrain 
const&) = delete; // Copy assign SpinConstrain& operator=(SpinConstrain &&) = delete; // Move assign std::map> ScData; @@ -251,6 +332,10 @@ class SpinConstrain bool debug = false; double alpha_trial_; // in unit of Ry/uB^2 = 0.01 eV/uB^2 double restrict_current_; // in unit of Ry/uB = 3 eV/uB + bool direction_only_ = false; ///< only optimize the direction of magnetization + /// lambda update strategy + LambdaStrategyType strategy_type_ = LambdaStrategyType::BFGS; + std::unique_ptr strategy_; public: /// @brief save operator for spin-constrained DFT @@ -260,6 +345,20 @@ class SpinConstrain void set_mag_converged(bool is_Mi_converged_in){this->is_Mi_converged = is_Mi_converged_in;} /// @brief get is_Mi_converged bool mag_converged() const {return this->is_Mi_converged;} + void set_npol(int npol); + int get_npol() const; + int get_nw() const; + int get_iwt(int itype, int iat, int orbital_index) const; + /// get spin sign for k-point ik: +1 for spin-up, -1 for spin-down + int get_spin_sign(int ik) const; + /// accumulate Mi from becp for a single k-point + void accumulate_Mi_from_becp(const std::complex* becp, + int nkb, + int nbands, + int npol, + int ik, + const double* wg_ik, + const int* nh_iat); private: /// operator for spin-constrained DFT, used for calculating current atomic magnetic moment hamilt::Operator* p_operator = nullptr; diff --git a/source/source_lcao/module_deltaspin/template_helpers.cpp b/source/source_lcao/module_deltaspin/template_helpers.cpp index 83e5f17f75e..05a0b61cc32 100644 --- a/source/source_lcao/module_deltaspin/template_helpers.cpp +++ b/source/source_lcao/module_deltaspin/template_helpers.cpp @@ -12,11 +12,16 @@ void spinconstrain::SpinConstrain::cal_mi_lcao(const int& step, bool pri } template <> -void spinconstrain::SpinConstrain::run_lambda_loop(int outer_step, +void spinconstrain::SpinConstrain::run_lambda_loop(int outer_step, bool rerun) { } +template <> +void spinconstrain::SpinConstrain::run_lambda_loop_lcao(int outer_step) +{ 
+} + template <> bool spinconstrain::SpinConstrain::check_rms_stop(int outer_step, int i_step, diff --git a/source/source_lcao/module_deltaspin/test/CMakeLists.txt b/source/source_lcao/module_deltaspin/test/CMakeLists.txt index 04a21d73d55..d0399784a7c 100644 --- a/source/source_lcao/module_deltaspin/test/CMakeLists.txt +++ b/source/source_lcao/module_deltaspin/test/CMakeLists.txt @@ -22,4 +22,17 @@ AddTest( ../spin_constrain.cpp ../template_helpers.cpp ) -endif() + +AddTest( + TARGET deltaspin_lambda_update_strategies_test + LIBS ${math_libs} base device parameter + SOURCES lambda_update_strategies_test.cpp + ../lambda_update_strategies.cpp +) + +AddTest( + TARGET deltaspin_pw_test + LIBS ${math_libs} base device parameter + SOURCES deltaspin_pw_test.cpp +) +endif() diff --git a/source/source_lcao/module_deltaspin/test/deltaspin_pw_test.cpp b/source/source_lcao/module_deltaspin/test/deltaspin_pw_test.cpp new file mode 100644 index 00000000000..c1226c169fc --- /dev/null +++ b/source/source_lcao/module_deltaspin/test/deltaspin_pw_test.cpp @@ -0,0 +1,566 @@ +#include "gtest/gtest.h" +#include +#include +#include + +#define private public +#include "source_io/module_parameter/parameter.h" +#undef private + +/*********************************************************************** + * Unit tests for DeltaSpin PW support + * + * Strategy: test the core arithmetic of calculate_delta_hcc and + * cal_Mi_pw as pure formulas — no OnsiteProjector or full ABACUS + * framework needed. 
+ ***********************************************************************/ + +class DeltaSpinPwTest : public ::testing::Test +{ + protected: + void SetUp() override {} + void TearDown() override {} +}; + +// ===================================================================== +// calculate_delta_hcc: ps array construction (npol=2, Pauli matrix) +// ===================================================================== + +TEST_F(DeltaSpinPwTest, DeltaHcc_Npol2_SingleAtom) +{ + // npol=2: for each (ib, ip): + // ps[becpind] += coeff0 * becp1 + coeff2 * becp2 + // ps[becpind+nkb] += coeff1 * becp1 + coeff3 * becp2 + // where coeff0 = (lambda_z, 0), coeff1 = (lambda_x, lambda_y), + // coeff2 = (lambda_x, -lambda_y), coeff3 = (-lambda_z, 0) + + const int nat = 1; + const int nproj = 2; // 2 projectors for this atom + const int nbands = 1; + const int nkb = nproj; // total projectors = nproj for single atom + const int npol = 2; + + // delta_lambda for atom 0 + struct { double x, y, z; } delta_lambda = {0.5, 0.3, 0.8}; + + const std::complex coeff0(delta_lambda.z, 0.0); // (0.8, 0) + const std::complex coeff1(delta_lambda.x, delta_lambda.y); // (0.5, 0.3) + const std::complex coeff2(delta_lambda.x, -delta_lambda.y);// (0.5, -0.3) + const std::complex coeff3(-delta_lambda.z, 0.0); // (-0.8, 0) + + // becp: layout [ib * npol * nkb + sum + ip] for up, +nkb for down + std::vector> becp(nbands * npol * nkb, {0.0, 0.0}); + // band 0, projector 0 + becp[0 * npol * nkb + 0] = {1.0, 0.2}; // becp_up[0] + becp[0 * npol * nkb + 0 + nkb] = {0.3, -0.1}; // becp_dn[0] + // band 0, projector 1 + becp[0 * npol * nkb + 1] = {0.5, 0.0}; // becp_up[1] + becp[0 * npol * nkb + 1 + nkb] = {0.0, 0.7}; // becp_dn[1] + + std::vector> ps(nbands * npol * nkb, {0.0, 0.0}); + + int sum = 0; + for(int ib = 0; ib < nbands * npol; ib += npol) + { + for(int ip = 0; ip < nproj; ip++) + { + const int becpind = ib * nkb + sum + ip; + const std::complex becp1 = becp[becpind]; + const std::complex becp2 = 
becp[becpind + nkb]; + ps[becpind] += coeff0 * becp1 + coeff2 * becp2; + ps[becpind + nkb] += coeff1 * becp1 + coeff3 * becp2; + } + } + + // Verify projector 0: + // ps_up[0] = (0.8,0)*(1.0,0.2) + (0.5,-0.3)*(0.3,-0.1) + // = (0.8, 0.16) + (0.15-0.03, -0.05-0.09) = (0.8,0.16) + (0.12,-0.14) + // = (0.92, 0.02) + EXPECT_NEAR(ps[0].real(), 0.92, 1e-12); + EXPECT_NEAR(ps[0].imag(), 0.02, 1e-12); + + // ps_dn[0] = (0.5,0.3)*(1.0,0.2) + (-0.8,0)*(0.3,-0.1) + // = (0.5-0.06, 0.3+0.1) + (-0.24, 0.08) + // = (0.44, 0.4) + (-0.24, 0.08) = (0.20, 0.48) + EXPECT_NEAR(ps[0 + nkb].real(), 0.20, 1e-12); + EXPECT_NEAR(ps[0 + nkb].imag(), 0.48, 1e-12); +} + +// PLACEHOLDER_DELTASPIN_PW_TESTS + +TEST_F(DeltaSpinPwTest, DeltaHcc_Npol2_MultiAtom) +{ + // Two atoms: verify sum offset advances correctly + const int nat = 2; + const int nproj_0 = 1, nproj_1 = 1; + const int nkb = nproj_0 + nproj_1; // 2 + const int nbands = 1; + const int npol = 2; + + struct Vec3 { double x, y, z; }; + Vec3 delta_lambda[2] = {{1.0, 0.0, 0.0}, {0.0, 0.0, 2.0}}; + + std::vector> becp(nbands * npol * nkb, {0.0, 0.0}); + // atom 0, proj 0: becp_up = (1,0), becp_dn = (0,0) + becp[0] = {1.0, 0.0}; + becp[0 + nkb] = {0.0, 0.0}; + // atom 1, proj 0: becp_up = (0,0), becp_dn = (1,0) + becp[1] = {0.0, 0.0}; + becp[1 + nkb] = {1.0, 0.0}; + + std::vector> ps(nbands * npol * nkb, {0.0, 0.0}); + int nh_iat[2] = {nproj_0, nproj_1}; + + int sum = 0; + for(int iat = 0; iat < nat; iat++) + { + const std::complex c0(delta_lambda[iat].z, 0.0); + const std::complex c1(delta_lambda[iat].x, delta_lambda[iat].y); + const std::complex c2(delta_lambda[iat].x, -delta_lambda[iat].y); + const std::complex c3(-delta_lambda[iat].z, 0.0); + for(int ib = 0; ib < nbands * npol; ib += npol) + { + for(int ip = 0; ip < nh_iat[iat]; ip++) + { + const int becpind = ib * nkb + sum + ip; + const std::complex b1 = becp[becpind]; + const std::complex b2 = becp[becpind + nkb]; + ps[becpind] += c0 * b1 + c2 * b2; + ps[becpind + nkb] += c1 * b1 + 
c3 * b2; + } + } + sum += nh_iat[iat]; + } + + // atom 0: lambda=(1,0,0), becp_up=(1,0), becp_dn=(0,0) + // ps_up[0] = (0,0)*(1,0) + (1,0)*(0,0) = 0 + // ps_dn[0] = (1,0)*(1,0) + (0,0)*(0,0) = (1,0) + EXPECT_NEAR(ps[0].real(), 0.0, 1e-12); + EXPECT_NEAR(ps[0 + nkb].real(), 1.0, 1e-12); + + // atom 1: lambda=(0,0,2), becp_up=(0,0), becp_dn=(1,0) + // ps_up[1] = (2,0)*(0,0) + (0,0)*(1,0) = 0 + // ps_dn[1] = (0,0)*(0,0) + (-2,0)*(1,0) = (-2,0) + EXPECT_NEAR(ps[1].real(), 0.0, 1e-12); + EXPECT_NEAR(ps[1 + nkb].real(), -2.0, 1e-12); +} + +TEST_F(DeltaSpinPwTest, DeltaHcc_Npol1_SignPositive) +{ + // npol=1: ps[becpind] += sign * lambda_z * becp1 + const int nat = 1; + const int nproj = 2; + const int nkb = nproj; + const int nbands = 1; + const int sign = 1; + const double lambda_z = 0.5; + + std::vector> becp(nbands * nkb, {0.0, 0.0}); + becp[0] = {1.0, 0.3}; + becp[1] = {0.0, -0.5}; + + std::vector> ps(nbands * nkb, {0.0, 0.0}); + double coeff = lambda_z * sign; + int sum = 0; + for(int ib = 0; ib < nbands; ib++) + { + for(int ip = 0; ip < nproj; ip++) + { + const int becpind = ib * nkb + sum + ip; + ps[becpind] += coeff * becp[becpind]; + } + } + + // ps[0] = 0.5 * (1.0, 0.3) = (0.5, 0.15) + EXPECT_NEAR(ps[0].real(), 0.5, 1e-12); + EXPECT_NEAR(ps[0].imag(), 0.15, 1e-12); + // ps[1] = 0.5 * (0, -0.5) = (0, -0.25) + EXPECT_NEAR(ps[1].real(), 0.0, 1e-12); + EXPECT_NEAR(ps[1].imag(), -0.25, 1e-12); +} + +TEST_F(DeltaSpinPwTest, DeltaHcc_Npol1_SignNegative) +{ + const int nkb = 1; + const int nbands = 1; + const int sign = -1; + const double lambda_z = 0.5; + + std::vector> becp(nbands * nkb, {0.0, 0.0}); + becp[0] = {1.0, 0.0}; + + std::vector> ps(nbands * nkb, {0.0, 0.0}); + double coeff = lambda_z * sign; + ps[0] += coeff * becp[0]; + + EXPECT_NEAR(ps[0].real(), -0.5, 1e-12); + EXPECT_NEAR(ps[0].imag(), 0.0, 1e-12); +} + +TEST_F(DeltaSpinPwTest, DeltaHcc_Npol2_ZeroLambda) +{ + // lambda = (0,0,0) => ps should remain zero + const int nkb = 2; + const int nbands = 1; + 
const int npol = 2; + + std::vector> becp(nbands * npol * nkb, {0.0, 0.0}); + becp[0] = {1.0, 0.5}; + becp[1] = {0.3, -0.2}; + becp[0 + nkb] = {0.7, 0.1}; + becp[1 + nkb] = {-0.4, 0.8}; + + std::vector> ps(nbands * npol * nkb, {0.0, 0.0}); + + const std::complex c0(0.0, 0.0); + const std::complex c1(0.0, 0.0); + const std::complex c2(0.0, 0.0); + const std::complex c3(0.0, 0.0); + + for(int ip = 0; ip < nkb; ip++) + { + ps[ip] += c0 * becp[ip] + c2 * becp[ip + nkb]; + ps[ip + nkb] += c1 * becp[ip] + c3 * becp[ip + nkb]; + } + + for(int i = 0; i < nbands * npol * nkb; i++) + { + EXPECT_NEAR(ps[i].real(), 0.0, 1e-15); + EXPECT_NEAR(ps[i].imag(), 0.0, 1e-15); + } +} + +// ===================================================================== +// cal_Mi_pw: magnetization accumulation from becp +// ===================================================================== + +TEST_F(DeltaSpinPwTest, MiPw_Npol1_SpinUp) +{ + // npol=1, nspin=2: Mi.z += sign * weight * |becp|^2 + // spin-up (sign=+1) + const int nkb = 3; + const int nbands = 2; + const int sign = 1; + const double weights[2] = {1.0, 0.5}; + + std::vector> becp(nbands * nkb, {0.0, 0.0}); + // band 0 + becp[0] = {0.8, 0.0}; + becp[1] = {0.0, 0.6}; + becp[2] = {0.3, 0.4}; + // band 1 + becp[3] = {0.5, 0.0}; + becp[4] = {0.0, 0.0}; + becp[5] = {1.0, 0.0}; + + // Single atom with nh=3 + double Mi_z = 0.0; + for(int ib = 0; ib < nbands; ib++) + { + const double weight = weights[ib]; + double occ = 0.0; + for(int ih = 0; ih < nkb; ih++) + { + const int index = ib * nkb + ih; + occ += (std::conj(becp[index]) * becp[index]).real(); + } + Mi_z += sign * weight * occ; + } + + // band0: |0.8|^2 + |0.6|^2 + |0.3+0.4i|^2 = 0.64 + 0.36 + 0.25 = 1.25, w=1.0 + // band1: |0.5|^2 + 0 + |1.0|^2 = 0.25 + 1.0 = 1.25, w=0.5 + // Mi_z = 1*1.25 + 0.5*1.25 = 1.875 + EXPECT_NEAR(Mi_z, 1.875, 1e-12); +} + +TEST_F(DeltaSpinPwTest, MiPw_Npol1_SpinDown) +{ + // spin-down (sign=-1) + const int nkb = 1; + const int nbands = 1; + const int sign = 
-1; + const double weight = 2.0; + + std::vector> becp(1, {0.0, 0.0}); + becp[0] = {0.6, 0.8}; // |becp|^2 = 0.36 + 0.64 = 1.0 + + double Mi_z = 0.0; + double occ = (std::conj(becp[0]) * becp[0]).real(); + Mi_z += sign * weight * occ; + + EXPECT_NEAR(Mi_z, -2.0, 1e-12); +} + +TEST_F(DeltaSpinPwTest, MiPw_Npol2_PureZMag) +{ + // npol=2: construct becp so that only z-component is nonzero + // becp_up = (a, 0), becp_dn = (0, 0) + // occ[0] = |a|^2, occ[1]=0, occ[2]=0, occ[3]=0 + // Mi.z = w*(occ0-occ3) = w*|a|^2, Mi.x = 0, Mi.y = 0 + const int nkb = 1; + const int nbands = 1; + const double weight = 1.0; + + std::vector> becp(nbands * 2 * nkb, {0.0, 0.0}); + becp[0] = {0.7, 0.0}; // becp_up + becp[0 + nkb] = {0.0, 0.0}; // becp_dn + + double Mi_x = 0.0, Mi_y = 0.0, Mi_z = 0.0; + std::complex occ[4] = {{0,0},{0,0},{0,0},{0,0}}; + occ[0] = std::conj(becp[0]) * becp[0]; + occ[1] = std::conj(becp[0]) * becp[0 + nkb]; + occ[2] = std::conj(becp[0 + nkb]) * becp[0]; + occ[3] = std::conj(becp[0 + nkb]) * becp[0 + nkb]; + + Mi_z += weight * (occ[0] - occ[3]).real(); + Mi_x += weight * (occ[1] + occ[2]).real(); + Mi_y += weight * (occ[1] - occ[2]).imag(); + + EXPECT_NEAR(Mi_z, 0.49, 1e-12); + EXPECT_NEAR(Mi_x, 0.0, 1e-15); + EXPECT_NEAR(Mi_y, 0.0, 1e-15); +} + +TEST_F(DeltaSpinPwTest, MiPw_Npol2_PureXMag) +{ + // Construct becp so that only x-component is nonzero + // becp_up = (a, 0), becp_dn = (a, 0) with same magnitude + // occ[0] = |a|^2, occ[1] = |a|^2, occ[2] = |a|^2, occ[3] = |a|^2 + // Mi.z = w*(occ0-occ3) = 0 + // Mi.x = w*(occ1+occ2).real = w*2*|a|^2 + // Mi.y = w*(occ1-occ2).imag = 0 + const int nkb = 1; + const int nbands = 1; + const double weight = 1.0; + const double a = 0.5; + + std::vector> becp(nbands * 2 * nkb, {0.0, 0.0}); + becp[0] = {a, 0.0}; + becp[0 + nkb] = {a, 0.0}; + + std::complex occ[4]; + occ[0] = std::conj(becp[0]) * becp[0]; + occ[1] = std::conj(becp[0]) * becp[0 + nkb]; + occ[2] = std::conj(becp[0 + nkb]) * becp[0]; + occ[3] = std::conj(becp[0 + 
nkb]) * becp[0 + nkb]; + + double Mi_z = weight * (occ[0] - occ[3]).real(); + double Mi_x = weight * (occ[1] + occ[2]).real(); + double Mi_y = weight * (occ[1] - occ[2]).imag(); + + EXPECT_NEAR(Mi_z, 0.0, 1e-15); + EXPECT_NEAR(Mi_x, 0.5, 1e-12); // 2*0.25 + EXPECT_NEAR(Mi_y, 0.0, 1e-15); +} + +TEST_F(DeltaSpinPwTest, MiPw_Npol2_MixedMag) +{ + // General becp: verify all three components + const int nkb = 1; + const int nbands = 1; + const double weight = 1.0; + + std::vector> becp(nbands * 2 * nkb, {0.0, 0.0}); + becp[0] = {0.8, 0.0}; // becp_up + becp[0 + nkb] = {0.0, 0.6}; // becp_dn + + std::complex occ[4]; + occ[0] = std::conj(becp[0]) * becp[0]; // 0.64 + occ[1] = std::conj(becp[0]) * becp[0 + nkb]; // 0.8*(0,0.6) = (0, 0.48) + occ[2] = std::conj(becp[0 + nkb]) * becp[0]; // (0,-0.6)*0.8 = (0, -0.48) + occ[3] = std::conj(becp[0 + nkb]) * becp[0 + nkb]; // 0.36 + + double Mi_z = weight * (occ[0] - occ[3]).real(); + double Mi_x = weight * (occ[1] + occ[2]).real(); + double Mi_y = weight * (occ[1] - occ[2]).imag(); + + EXPECT_NEAR(Mi_z, 0.28, 1e-12); // 0.64 - 0.36 + EXPECT_NEAR(Mi_x, 0.0, 1e-15); // (0,0.48)+(0,-0.48) = 0 + EXPECT_NEAR(Mi_y, 0.96, 1e-12); // imag((0,0.48)-(0,-0.48)) = imag(0,0.96) = 0.96 +} + +TEST_F(DeltaSpinPwTest, MiPw_MultiAtom_BeginIhOffset) +{ + // Two atoms with different nh, verify begin_ih offset + const int nat = 2; + const int nh_0 = 2, nh_1 = 1; + const int nkb = nh_0 + nh_1; // 3 + const int nbands = 1; + const double weight = 1.0; + const int sign = 1; + + std::vector> becp(nbands * nkb, {0.0, 0.0}); + // atom 0: ih=0,1 + becp[0] = {1.0, 0.0}; // |becp|^2 = 1.0 + becp[1] = {0.0, 1.0}; // |becp|^2 = 1.0 + // atom 1: ih=2 + becp[2] = {0.5, 0.5}; // |becp|^2 = 0.5 + + int nh_iat[2] = {nh_0, nh_1}; + double Mi_z[2] = {0.0, 0.0}; + + for(int ib = 0; ib < nbands; ib++) + { + int begin_ih = 0; + for(int iat = 0; iat < nat; iat++) + { + double occ = 0.0; + for(int ih = 0; ih < nh_iat[iat]; ih++) + { + const int index = ib * nkb + begin_ih 
+ ih; + occ += (std::conj(becp[index]) * becp[index]).real(); + } + Mi_z[iat] += sign * weight * occ; + begin_ih += nh_iat[iat]; + } + } + + EXPECT_NEAR(Mi_z[0], 2.0, 1e-12); // 1.0 + 1.0 + EXPECT_NEAR(Mi_z[1], 0.5, 1e-12); // 0.5 +} + +// ===================================================================== +// cal_mw_from_lambda: magnetization re-accumulation from becp_tmp +// ===================================================================== + +TEST_F(DeltaSpinPwTest, MwFromLambda_Npol2_Accumulation) +{ + // Same formula as cal_Mi_pw npol=2, but from becp_tmp + const int nkb = 1; + const int nbands = 1; + const int npol = 2; + const int nk = 2; + const double weights[2] = {1.0, 0.5}; + + const int size_becp = nbands * nkb * npol; + std::vector> becp_tmp(size_becp * nk, {0.0, 0.0}); + // k=0 + becp_tmp[0] = {0.8, 0.0}; // becp_up + becp_tmp[0 + nkb] = {0.0, 0.6}; // becp_dn + // k=1 + becp_tmp[size_becp + 0] = {0.6, 0.0}; + becp_tmp[size_becp + 0 + nkb] = {0.0, 0.8}; + + double Mi_x = 0.0, Mi_y = 0.0, Mi_z = 0.0; + int nh_iat[1] = {1}; + + for(int ik = 0; ik < nk; ik++) + { + const std::complex* becp = &becp_tmp[ik * size_becp]; + for(int ib = 0; ib < nbands; ib++) + { + const double weight = weights[ik]; + int begin_ih = 0; + for(int iat = 0; iat < 1; iat++) + { + std::complex occ[4] = {{0,0},{0,0},{0,0},{0,0}}; + for(int ih = 0; ih < nh_iat[iat]; ih++) + { + const int index = ib * npol * nkb + begin_ih + ih; + occ[0] += std::conj(becp[index]) * becp[index]; + occ[1] += std::conj(becp[index]) * becp[index + nkb]; + occ[2] += std::conj(becp[index + nkb]) * becp[index]; + occ[3] += std::conj(becp[index + nkb]) * becp[index + nkb]; + } + Mi_x += weight * (occ[1] + occ[2]).real(); + Mi_y += weight * (occ[1] - occ[2]).imag(); + Mi_z += weight * (occ[0] - occ[3]).real(); + begin_ih += nh_iat[iat]; + } + } + } + + // k=0, w=1.0: occ0=0.64, occ3=0.36 => dz=0.28, occ1=(0,0.48), occ2=(0,-0.48) => dx=0, dy=0.96 + // k=1, w=0.5: occ0=0.36, occ3=0.64 => 
dz=-0.28*0.5=-0.14, occ1=(0,0.48), occ2=(0,-0.48) => dy=0.96*0.5=0.48 + EXPECT_NEAR(Mi_z, 0.14, 1e-12); // 0.28 - 0.14 + EXPECT_NEAR(Mi_x, 0.0, 1e-15); + EXPECT_NEAR(Mi_y, 1.44, 1e-12); // 0.96 + 0.48 +} + +TEST_F(DeltaSpinPwTest, MwFromLambda_Npol1_SignHandling) +{ + // npol=1: isk[ik]=0 => sign=+1, isk[ik]=1 => sign=-1 + const int nkb = 1; + const int nbands = 1; + const int nk = 2; + const double weight = 1.0; + const int isk[2] = {0, 1}; // first k spin-up, second k spin-down + + std::vector> becp_tmp(nbands * nkb * nk, {0.0, 0.0}); + becp_tmp[0] = {0.5, 0.0}; // k=0: |becp|^2 = 0.25 + becp_tmp[1] = {0.5, 0.0}; // k=1: |becp|^2 = 0.25 + + double Mi_z = 0.0; + for(int ik = 0; ik < nk; ik++) + { + const int sign = (isk[ik] == 0) ? 1 : -1; + const std::complex* becp = &becp_tmp[ik * nbands * nkb]; + for(int ib = 0; ib < nbands; ib++) + { + double occ = 0.0; + for(int ih = 0; ih < nkb; ih++) + { + const int index = ib * nkb + ih; + occ += (std::conj(becp[index]) * becp[index]).real(); + } + Mi_z += weight * occ * sign; + } + } + + // k=0: +1 * 1.0 * 0.25 = 0.25 + // k=1: -1 * 1.0 * 0.25 = -0.25 + EXPECT_NEAR(Mi_z, 0.0, 1e-15); +} + +// ===================================================================== +// DeltaHcc gemm contribution: h_tmp += becp^H * ps +// ===================================================================== + +TEST_F(DeltaSpinPwTest, DeltaHcc_GemmContribution) +{ + // Verify h_tmp += becp^H * ps for a small 2x2 case + // becp: (npm x nbands), ps: (npm x nbands) + // h_tmp += becp^H * ps = (nbands x npm) * (npm x nbands) + const int nbands = 2; + const int npm = 2; // nkb * npol + + // becp^H means conjugate transpose + std::vector> becp = { + {1.0, 0.0}, {0.0, 1.0}, // column 0: becp[0,0], becp[1,0] + {0.5, 0.0}, {0.0, -0.5} // column 1: becp[0,1], becp[1,1] + }; + std::vector> ps = { + {0.5, 0.0}, {0.0, 0.5}, + {0.3, 0.0}, {0.0, -0.3} + }; + + // Manual: h_tmp[i,j] += sum_k conj(becp[k,i]) * ps[k,j] + // becp stored as becp[k*nbands + i], ps 
stored as ps[k*nbands + j] + std::vector> h_tmp(nbands * nbands, {0.0, 0.0}); + for(int i = 0; i < nbands; i++) + { + for(int j = 0; j < nbands; j++) + { + for(int k = 0; k < npm; k++) + { + h_tmp[i * nbands + j] += std::conj(becp[k * nbands + i]) * ps[k * nbands + j]; + } + } + } + + // h[0,0] = conj(1)*0.5 + conj(0,1)*(0,0.5) = 0.5 + (0,-1)*(0,0.5) = 0.5 + 0.5 = 1.0 + EXPECT_NEAR(h_tmp[0].real(), 1.0, 1e-12); + EXPECT_NEAR(h_tmp[0].imag(), 0.0, 1e-12); + + // h[0,1] = conj(1)*0.3 + conj(0,1)*(0,-0.3) = 0.3 + (0,-1)*(0,-0.3) = 0.3 + (-0.3) = 0 + EXPECT_NEAR(h_tmp[1].real(), 0.0, 1e-12); + EXPECT_NEAR(h_tmp[1].imag(), 0.0, 1e-12); + + // h[1,0] = conj(0.5)*0.5 + conj(0,-0.5)*(0,0.5) = 0.25 + (0,0.5)*(0,0.5) = 0.25 + (-0.25) = 0 + EXPECT_NEAR(h_tmp[2].real(), 0.0, 1e-12); + EXPECT_NEAR(h_tmp[2].imag(), 0.0, 1e-12); + + // h[1,1] = conj(0.5)*0.3 + conj(0,-0.5)*(0,-0.3) = 0.15 + (0,0.5)*(0,-0.3) = 0.15 + 0.15 = 0.3 + EXPECT_NEAR(h_tmp[3].real(), 0.3, 1e-12); + EXPECT_NEAR(h_tmp[3].imag(), 0.0, 1e-12); +} diff --git a/source/source_lcao/module_deltaspin/test/lambda_update_strategies_test.cpp b/source/source_lcao/module_deltaspin/test/lambda_update_strategies_test.cpp new file mode 100644 index 00000000000..b196bfe030c --- /dev/null +++ b/source/source_lcao/module_deltaspin/test/lambda_update_strategies_test.cpp @@ -0,0 +1,479 @@ +#include "../lambda_update_strategies.h" +#include "gtest/gtest.h" +#include "gmock/gmock.h" +#include +#include +#include + +/************************************************ + * Unit tests for lambda update strategies + * + * - Tested Strategies: + * - LinearResponseUpdate (Scheme B) + * - AugmentedLagrangianUpdate (Scheme C) + * - HybridDelayedUpdate (Scheme D) + * + * - Tested Helpers: + * - compute_rms_error() + * - count_converged() + * - cap_lambda() + ************************************************/ + +namespace +{ + +using ModuleBase::Vector3; + +// =================================================================== +// Helper 
function tests +// =================================================================== + +class LambdaUpdateHelpersTest : public ::testing::Test +{ + protected: + int nat; + std::vector> Mi; + std::vector> target_mag; + std::vector> constrain; + + void SetUp() override + { + nat = 3; + Mi.push_back(Vector3(1.0, 0.5, 0.3)); + Mi.push_back(Vector3(-0.8, 0.2, 0.1)); + Mi.push_back(Vector3(0.5, 0.5, 0.5)); + + target_mag.push_back(Vector3(2.0, 0.0, 0.0)); + target_mag.push_back(Vector3(-1.0, 0.0, 0.0)); + target_mag.push_back(Vector3(0.5, 0.5, 0.5)); + + constrain.push_back(Vector3(1, 1, 0)); + constrain.push_back(Vector3(1, 0, 0)); + constrain.push_back(Vector3(1, 1, 1)); + } +}; + +TEST_F(LambdaUpdateHelpersTest, ComputeRmsError) +{ + double rms = spinconstrain::compute_rms_error(Mi, target_mag, constrain, nat); + // Constrained: atom0(x,y), atom1(x), atom2(x,y,z) = 6 components + double expected_sum = 1.0*1.0 + 0.5*0.5 + 0.2*0.2 + 0.0 + 0.0 + 0.0; + double expected_rms = std::sqrt(expected_sum / 6.0); + EXPECT_NEAR(rms, expected_rms, 1e-10); +} + +TEST_F(LambdaUpdateHelpersTest, ComputeRmsErrorAlreadyConverged) +{ + Mi[0] = target_mag[0]; + Mi[1] = target_mag[1]; + Mi[2] = target_mag[2]; + double rms = spinconstrain::compute_rms_error(Mi, target_mag, constrain, nat); + EXPECT_NEAR(rms, 0.0, 1e-15); +} + +TEST_F(LambdaUpdateHelpersTest, ComputeRmsErrorNoConstraints) +{ + std::vector> no_constrain(nat, Vector3(0, 0, 0)); + double rms = spinconstrain::compute_rms_error(Mi, target_mag, no_constrain, nat); + EXPECT_NEAR(rms, 0.0, 1e-15); +} + +TEST_F(LambdaUpdateHelpersTest, CountConverged) +{ + int n = spinconstrain::count_converged(Mi, target_mag, constrain, 0.3, nat); + EXPECT_EQ(n, 4); // 1 from atom1 + 3 from atom2 +} + +TEST_F(LambdaUpdateHelpersTest, CountConvergedAll) +{ + Mi[0] = target_mag[0]; + Mi[1] = target_mag[1]; + Mi[2] = target_mag[2]; + int n = spinconstrain::count_converged(Mi, target_mag, constrain, 1e-6, nat); + EXPECT_EQ(n, 6); +} + 
+TEST_F(LambdaUpdateHelpersTest, CapLambda) +{ + std::vector> lam(nat); + lam[0] = Vector3(15.0, -20.0, 5.0); + lam[1] = Vector3(0.0, 8.0, -12.0); + lam[2] = Vector3(3.0, 3.0, 3.0); + + std::vector> con(nat); + con[0] = Vector3(1, 1, 1); + con[1] = Vector3(0, 1, 0); + con[2] = Vector3(1, 1, 1); + + spinconstrain::cap_lambda(lam, con, 10.0, nat); + + EXPECT_NEAR(lam[0][0], 10.0, 1e-10); + EXPECT_NEAR(lam[0][1], -10.0, 1e-10); + EXPECT_NEAR(lam[0][2], 5.0, 1e-10); + EXPECT_NEAR(lam[1][0], 0.0, 1e-10); + EXPECT_NEAR(lam[1][1], 8.0, 1e-10); + EXPECT_NEAR(lam[1][2], -12.0, 1e-10); + EXPECT_NEAR(lam[2][0], 3.0, 1e-10); + EXPECT_NEAR(lam[2][1], 3.0, 1e-10); + EXPECT_NEAR(lam[2][2], 3.0, 1e-10); +} + +// =================================================================== +// Scheme B: Linear Response Update tests +// =================================================================== + +class LinearResponseTest : public ::testing::Test +{ + protected: + int nat; + std::vector> lambda; + std::vector> Mi; + std::vector> target_mag; + std::vector> constrain; + + void SetUp() override + { + nat = 2; + lambda.push_back(Vector3(0.0, 0.0, 0.0)); + lambda.push_back(Vector3(0.0, 0.0, 0.0)); + Mi.push_back(Vector3(1.0, 0.0, 0.0)); + Mi.push_back(Vector3(-0.5, 0.0, 0.0)); + target_mag.push_back(Vector3(2.0, 0.0, 0.0)); + target_mag.push_back(Vector3(-1.0, 0.0, 0.0)); + constrain.push_back(Vector3(1, 1, 1)); + constrain.push_back(Vector3(1, 1, 1)); + } +}; + +TEST_F(LinearResponseTest, FirstUpdateNoHistory) +{ + spinconstrain::LinearResponseUpdate updater(0.01, 100.0, 0.3, 10.0); + EXPECT_EQ(updater.name(), "LinearResponse"); + EXPECT_FALSE(updater.is_converged()); + + auto result = updater.update_lambda(lambda, Mi, target_mag, constrain, 1e-6, 0, nat); + + EXPECT_NEAR(lambda[0][0], 0.3, 1e-10); + EXPECT_NEAR(lambda[0][1], 0.0, 1e-10); + EXPECT_LT(result.max_lambda, 1.0); + EXPECT_EQ(result.status, "updating"); +} + +TEST_F(LinearResponseTest, ConvergesAfterMultipleSteps) +{ + 
spinconstrain::LinearResponseUpdate updater(0.01, 100.0, 0.5, 10.0); + double chi = 1.0; + Vector3 Mi_init_0 = Mi[0]; + Vector3 Mi_init_1 = Mi[1]; + + int max_iter = 50; + int converged_iter = -1; + for (int iter = 0; iter < max_iter; ++iter) + { + auto result = updater.update_lambda(lambda, Mi, target_mag, constrain, 1e-5, iter, nat); + Mi[0] = Vector3(Mi_init_0.x + chi * lambda[0][0], + Mi_init_0.y + chi * lambda[0][1], + Mi_init_0.z + chi * lambda[0][2]); + Mi[1] = Vector3(Mi_init_1.x + chi * lambda[1][0], + Mi_init_1.y + chi * lambda[1][1], + Mi_init_1.z + chi * lambda[1][2]); + if (updater.is_converged()) + { + EXPECT_LT(result.rms_error, 1e-5); + converged_iter = iter; + break; + } + } + EXPECT_GE(converged_iter, 0) << "Linear response did not converge within " << max_iter; + + double expected_l0 = (target_mag[0][0] - Mi_init_0.x) / chi; + double expected_l1 = (target_mag[1][0] - Mi_init_1.x) / chi; + EXPECT_NEAR(lambda[0][0], expected_l0, 0.1); + EXPECT_NEAR(lambda[1][0], expected_l1, 0.1); +} + +TEST_F(LinearResponseTest, RespectsConstrainFlags) +{ + std::vector> partial_constrain(nat); + partial_constrain[0] = Vector3(1, 0, 0); + partial_constrain[1] = Vector3(0, 0, 0); + + spinconstrain::LinearResponseUpdate updater(0.01, 100.0, 0.3, 10.0); + updater.update_lambda(lambda, Mi, target_mag, partial_constrain, 1e-6, 0, nat); + + EXPECT_NEAR(lambda[0][0], 0.3, 1e-10); + EXPECT_NEAR(lambda[0][1], 0.0, 1e-10); + EXPECT_NEAR(lambda[1][0], 0.0, 1e-10); +} + +TEST_F(LinearResponseTest, CapsLambda) +{ + target_mag[0] = Vector3(100.0, 0.0, 0.0); + spinconstrain::LinearResponseUpdate updater(0.01, 100.0, 1.0, 5.0); + updater.update_lambda(lambda, Mi, target_mag, constrain, 1e-6, 0, nat); + EXPECT_LE(std::abs(lambda[0][0]), 5.0 + 1e-10); +} + +TEST_F(LinearResponseTest, ChiEstimation) +{ + spinconstrain::LinearResponseUpdate updater(0.01, 100.0, 0.5, 10.0); + double chi_true = 2.0; + Vector3 Mi_init = Mi[0]; + + for (int iter = 0; iter < 5; ++iter) + { + 
updater.update_lambda(lambda, Mi, target_mag, constrain, 1e-6, iter, nat); + Mi[0] = Vector3(Mi_init.x + chi_true * lambda[0][0], 0.0, 0.0); + Mi[1] = Vector3(-0.5, 0.0, 0.0); + } + + const auto& chi = updater.get_chi(); + EXPECT_GT(chi[0][0], 0.5); + EXPECT_LT(chi[0][0], 50.0); +} + +// =================================================================== +// Scheme C: Augmented Lagrangian Update tests +// =================================================================== + +class AugmentedLagrangianTest : public ::testing::Test +{ + protected: + int nat; + std::vector> lambda; + std::vector> Mi; + std::vector> target_mag; + std::vector> constrain; + + void SetUp() override + { + nat = 2; + lambda.push_back(Vector3(0.0, 0.0, 0.0)); + lambda.push_back(Vector3(0.0, 0.0, 0.0)); + Mi.push_back(Vector3(1.0, 0.0, 0.0)); + Mi.push_back(Vector3(-0.5, 0.0, 0.0)); + target_mag.push_back(Vector3(2.0, 0.0, 0.0)); + target_mag.push_back(Vector3(-1.0, 0.0, 0.0)); + constrain.push_back(Vector3(1, 0, 0)); + constrain.push_back(Vector3(1, 0, 0)); + } +}; + +TEST_F(AugmentedLagrangianTest, FirstUpdate) +{ + spinconstrain::AugmentedLagrangianUpdate updater(0.1, 10.0, 1.5, 5, 10.0); + EXPECT_EQ(updater.name(), "AugmentedLagrangian"); + + auto result = updater.update_lambda(lambda, Mi, target_mag, constrain, 1e-6, 0, nat); + + EXPECT_NEAR(lambda[0][0], -0.1, 1e-10); + EXPECT_NEAR(lambda[0][1], 0.0, 1e-10); + EXPECT_NEAR(lambda[1][0], 0.05, 1e-10); + EXPECT_NEAR(updater.get_mu(), 0.1, 1e-10); + EXPECT_FALSE(updater.is_converged()); +} + +TEST_F(AugmentedLagrangianTest, MuGrowth) +{ + spinconstrain::AugmentedLagrangianUpdate updater(0.1, 10.0, 2.0, 3, 10.0); + for (int iter = 0; iter < 10; ++iter) + { + updater.update_lambda(lambda, Mi, target_mag, constrain, 1e-6, iter, nat); + } + EXPECT_NEAR(updater.get_mu(), 0.8, 1e-10); +} + +TEST_F(AugmentedLagrangianTest, MuCappedAtMax) +{ + spinconstrain::AugmentedLagrangianUpdate updater(0.1, 1.0, 2.0, 1, 10.0); + for (int iter = 0; iter < 10; 
++iter) + { + updater.update_lambda(lambda, Mi, target_mag, constrain, 1e-6, iter, nat); + } + EXPECT_NEAR(updater.get_mu(), 1.0, 1e-10); +} + +TEST_F(AugmentedLagrangianTest, ConvergesWithInvertedResponse) +{ + // Inverted response model: Mi = M_target - chi * lambda + // Increasing lambda REDUCES the error — models constraint physics correctly + spinconstrain::AugmentedLagrangianUpdate updater(0.1, 10.0, 1.5, 5, 10.0); + double chi = 1.0; + + int max_iter = 100; + int converged_iter = -1; + for (int iter = 0; iter < max_iter; ++iter) + { + auto result = updater.update_lambda(lambda, Mi, target_mag, constrain, 1e-3, iter, nat); + + // Inverted response: Mi approaches M_target as lambda → 0 + Mi[0] = Vector3(target_mag[0][0] - chi * lambda[0][0], 0.0, 0.0); + Mi[1] = Vector3(target_mag[1][0] - chi * lambda[1][0], 0.0, 0.0); + + if (updater.is_converged()) + { + EXPECT_LT(result.rms_error, 1e-3); + converged_iter = iter; + break; + } + } + + EXPECT_GE(converged_iter, 0) << "AL did not converge within " << max_iter; + EXPECT_NEAR(lambda[0][0], 0.0, 0.5); +} + +TEST_F(AugmentedLagrangianTest, ResetMu) +{ + spinconstrain::AugmentedLagrangianUpdate updater(0.1, 10.0, 2.0, 1, 10.0); + for (int iter = 0; iter < 5; ++iter) + { + updater.update_lambda(lambda, Mi, target_mag, constrain, 1e-6, iter, nat); + } + EXPECT_GT(updater.get_mu(), 0.1); + updater.reset_mu(); + EXPECT_NEAR(updater.get_mu(), 0.1, 1e-10); +} + +// =================================================================== +// Scheme D: Hybrid Delayed Update tests +// =================================================================== + +class HybridDelayedTest : public ::testing::Test +{ + protected: + int nat; + std::vector> lambda; + std::vector> Mi; + std::vector> target_mag; + std::vector> constrain; + + void SetUp() override + { + nat = 2; + lambda.push_back(Vector3(0.0, 0.0, 0.0)); + lambda.push_back(Vector3(0.0, 0.0, 0.0)); + Mi.push_back(Vector3(1.0, 0.0, 0.0)); + Mi.push_back(Vector3(-0.5, 0.0, 0.0)); + 
target_mag.push_back(Vector3(2.0, 0.0, 0.0)); + target_mag.push_back(Vector3(-1.0, 0.0, 0.0)); + constrain.push_back(Vector3(1, 1, 1)); + constrain.push_back(Vector3(1, 1, 1)); + } +}; + +TEST_F(HybridDelayedTest, EarlyPhaseSkip) +{ + spinconstrain::HybridDelayedUpdate updater(1e-3, 0.1, 10.0, 1.5, 5, 10, 10.0); + updater.set_drho(1.0); + + auto result = updater.update_lambda(lambda, Mi, target_mag, constrain, 1e-6, 0, nat); + EXPECT_EQ(result.status, "skipped_early"); + EXPECT_EQ(updater.get_phase(), "early"); + EXPECT_NEAR(lambda[0][0], 0.0, 1e-10); +} + +TEST_F(HybridDelayedTest, MidPhaseUpdate) +{ + spinconstrain::HybridDelayedUpdate updater(1e-3, 0.1, 10.0, 1.5, 5, 10, 10.0); + updater.set_drho(5e-3); + + auto result = updater.update_lambda(lambda, Mi, target_mag, constrain, 1e-6, 0, nat); + EXPECT_EQ(updater.get_phase(), "mid"); + EXPECT_NEAR(lambda[0][0], -0.1, 1e-10); +} + +TEST_F(HybridDelayedTest, LatePhaseUpdate) +{ + spinconstrain::HybridDelayedUpdate updater(1e-3, 0.1, 10.0, 1.5, 5, 10, 10.0); + updater.set_drho(1e-5); + + auto result = updater.update_lambda(lambda, Mi, target_mag, constrain, 1e-6, 0, nat); + EXPECT_EQ(updater.get_phase(), "late"); + EXPECT_NEAR(lambda[0][0], -0.1, 1e-10); +} + +TEST_F(HybridDelayedTest, FallbackSignal) +{ + spinconstrain::HybridDelayedUpdate updater(1e-3, 0.1, 10.0, 1.5, 5, 10, 10.0); + updater.set_drho(1e-5); + + for (int iter = 0; iter < 5; ++iter) + { + auto result = updater.update_lambda(lambda, Mi, target_mag, constrain, 1e-6, iter, nat); + if (iter >= 2 && result.status == "fallback_triggered") + { + EXPECT_TRUE(true); + return; + } + } + FAIL() << "Fallback was not signaled after several iterations"; +} + +TEST_F(HybridDelayedTest, Reset) +{ + spinconstrain::HybridDelayedUpdate updater(1e-3, 0.1, 10.0, 1.5, 5, 10, 10.0); + updater.set_drho(1e-5); + for (int iter = 0; iter < 10; ++iter) + { + updater.update_lambda(lambda, Mi, target_mag, constrain, 1e-6, iter, nat); + } + updater.reset(); + 
EXPECT_EQ(updater.get_phase(), "early"); +} + +TEST_F(HybridDelayedTest, PhaseTransitions) +{ + spinconstrain::HybridDelayedUpdate updater(1e-3, 0.1, 10.0, 1.5, 5, 10, 10.0); + + updater.set_drho(1.0); + auto r1 = updater.update_lambda(lambda, Mi, target_mag, constrain, 1e-6, 0, nat); + EXPECT_EQ(updater.get_phase(), "early"); + EXPECT_EQ(r1.status, "skipped_early"); + + updater.set_drho(5e-3); + updater.update_lambda(lambda, Mi, target_mag, constrain, 1e-6, 1, nat); + EXPECT_EQ(updater.get_phase(), "mid"); + + updater.set_drho(1e-5); + updater.update_lambda(lambda, Mi, target_mag, constrain, 1e-6, 2, nat); + EXPECT_EQ(updater.get_phase(), "late"); +} + +TEST_F(HybridDelayedTest, ConvergesWithInvertedResponse) +{ + spinconstrain::HybridDelayedUpdate updater(1e-3, 0.1, 10.0, 1.5, 5, 10, 10.0); + updater.set_drho(1e-5); + double chi = 1.0; + + int max_iter = 100; + int converged_iter = -1; + for (int iter = 0; iter < max_iter; ++iter) + { + auto result = updater.update_lambda(lambda, Mi, target_mag, constrain, 1e-3, iter, nat); + + Mi[0] = Vector3(target_mag[0][0] - chi * lambda[0][0], + target_mag[0][1] - chi * lambda[0][1], + target_mag[0][2] - chi * lambda[0][2]); + Mi[1] = Vector3(target_mag[1][0] - chi * lambda[1][0], + target_mag[1][1] - chi * lambda[1][1], + target_mag[1][2] - chi * lambda[1][2]); + + if (updater.is_converged()) + { + EXPECT_LT(result.rms_error, 1e-3); + converged_iter = iter; + break; + } + } + + EXPECT_GE(converged_iter, 0) << "Hybrid did not converge within " << max_iter + << ". 
Final phase: " << updater.get_phase(); +} + +} // namespace + +int main(int argc, char** argv) +{ + ::testing::InitGoogleTest(&argc, argv); + return RUN_ALL_TESTS(); +} diff --git a/source/source_lcao/module_dftu/CMakeLists.txt b/source/source_lcao/module_dftu/CMakeLists.txt index 42a58af7ba6..f41322b665c 100644 --- a/source/source_lcao/module_dftu/CMakeLists.txt +++ b/source/source_lcao/module_dftu/CMakeLists.txt @@ -19,3 +19,7 @@ add_library( if(ENABLE_COVERAGE) add_coverage(dftu) endif() + +if(BUILD_TESTING) + add_subdirectory(test) +endif() diff --git a/source/source_lcao/module_dftu/dftu.cpp b/source/source_lcao/module_dftu/dftu.cpp index 2680aed37a6..f3f306ad61e 100644 --- a/source/source_lcao/module_dftu/dftu.cpp +++ b/source/source_lcao/module_dftu/dftu.cpp @@ -33,6 +33,7 @@ double Plus_U::uramping = 0.0; // increase U by uramping, default is -1.0 int Plus_U::omc=0; // occupation matrix control int Plus_U::mixing_dftu=0; //whether to mix locale +int Plus_U::nspin=0; bool Plus_U::Yukawa=false; // whether to use Yukawa potential @@ -73,6 +74,7 @@ void Plus_U::init(UnitCell& cell, // unitcell class const int npol = PARAM.globalv.npol; // number of polarization directions const int nlocal = PARAM.globalv.nlocal; // number of total local orbitals const int nspin = PARAM.inp.nspin; // number of spins + Plus_U::nspin = nspin; // mohan update 2025-11-06 Plus_U::energy_u = 0.0; @@ -89,6 +91,10 @@ void Plus_U::init(UnitCell& cell, // unitcell class // it:index of type of atom for (int it = 0; it < cell.ntype; ++it) { + if(!has_correlated_orbital(it)) + { + continue; + } for (int ia = 0; ia < cell.atoms[it].na; ia++) { // ia:index of atoms of this type @@ -98,9 +104,28 @@ void Plus_U::init(UnitCell& cell, // unitcell class locale[iat].resize(cell.atoms[it].nwl + 1); locale_save[iat].resize(cell.atoms[it].nwl + 1); - const int tlp1_npol = (this->orbital_corr[it]*2+1)*npol; - this->eff_pot_pw_index[iat] = pot_index; - pot_index += tlp1_npol * tlp1_npol; + const int 
tlp1_npol = (get_orbital_corr(it)*2+1)*npol; + const int tlp1 = 2 * get_orbital_corr(it) + 1; + const int elem_size = tlp1 * tlp1; + // eff_pot_pw_index: per-atom offset into eff_pot_pw (and uom_array) + // + // nspin=1: offset = sum(tlp1^2 for preceding atoms), total = sum(all tlp1^2) + // nspin=2: same per-spin-channel offset; after the loop, pot_index *= 2 + // to create split layout: [all_spin_up | all_spin_down] + // spin-up at eff_pot_pw[eff_pot_pw_index[iat] + mm] + // spin-down at eff_pot_pw[size/2 + eff_pot_pw_index[iat] + mm] + // nspin=4: offset = sum(tlp1_npol^2) where tlp1_npol = (2l+1)*npol = 2*(2l+1) + // each atom occupies (2*tlp1)^2 = 4*tlp1^2 entries for 4 Pauli blocks + if(nspin == 4) + { + this->eff_pot_pw_index[iat] = pot_index; + pot_index += tlp1_npol * tlp1_npol; + } + else // nspin=1 or nspin=2: one tlp1^2 block per atom per spin channel + { + this->eff_pot_pw_index[iat] = pot_index; + pot_index += elem_size; + } for (int l = 0; l <= cell.atoms[it].nwl; l++) { @@ -166,7 +191,13 @@ void Plus_U::init(UnitCell& cell, // unitcell class } } // allocate memory for eff_pot_pw + // nspin=2: split layout [all_spin_up | all_spin_down], double the size + // nspin=4: each atom already has 4*tlp1^2 (tlp1_npol^2) entries for Pauli blocks + if (nspin == 2) pot_index *= 2; + this->eff_pot_pw.resize(pot_index, 0.0); + this->uom_array.resize(pot_index, 0.0); + this->uom_save.resize(pot_index, 0.0); if (Yukawa) { @@ -208,7 +239,7 @@ void Plus_U::init(UnitCell& cell, // unitcell class this->local_occup_bcast(cell); #endif - initialed_locale = true; + mark_locale_initialized(); this->copy_locale(cell); } else @@ -216,12 +247,12 @@ void Plus_U::init(UnitCell& cell, // unitcell class if (PARAM.inp.init_chg == "file") { std::stringstream sst; - sst << PARAM.globalv.global_out_dir << "onsite.dm"; + sst << PARAM.globalv.global_readin_dir << "onsite.dm"; this->read_occup_m(cell,sst.str()); #ifdef __MPI this->local_occup_bcast(cell); #endif - initialed_locale = true; 
+ mark_locale_initialized(); } else { @@ -240,7 +271,7 @@ void Plus_U::cal_energy_correction(const UnitCell& ucell, { ModuleBase::TITLE("Plus_U", "cal_energy_correction"); ModuleBase::timer::start("Plus_U", "cal_energy_correction"); - if (!initialed_locale) + if (!is_locale_initialized()) { ModuleBase::timer::end("Plus_U", "cal_energy_correction"); return; @@ -254,7 +285,7 @@ void Plus_U::cal_energy_correction(const UnitCell& ucell, for (int T = 0; T < ucell.ntype; T++) { const int NL = ucell.atoms[T].nwl + 1; - const int LC = orbital_corr[T]; + const int LC = get_orbital_corr(T); for (int I = 0; I < ucell.atoms[T].na; I++) { if (LC == -1) @@ -263,11 +294,11 @@ void Plus_U::cal_energy_correction(const UnitCell& ucell, } const int iat = ucell.itia2iat(T, I); - const int L = orbital_corr[T]; + const int L = get_orbital_corr(T); for (int l = 0; l < NL; l++) { - if (l != orbital_corr[T]) + if (l != get_orbital_corr(T)) { continue; } diff --git a/source/source_lcao/module_dftu/dftu.h b/source/source_lcao/module_dftu/dftu.h index 56d213386d7..bc87978d25d 100644 --- a/source/source_lcao/module_dftu/dftu.h +++ b/source/source_lcao/module_dftu/dftu.h @@ -4,8 +4,8 @@ #include "source_cell/klist.h" #include "source_cell/unitcell.h" #include "source_basis/module_ao/parallel_orbitals.h" -#ifdef __LCAO #include "source_estate/module_charge/charge_mixing.h" +#ifdef __LCAO #include "source_hamilt/hamilt.h" #include "source_lcao/module_hcontainer/hcontainer.h" #include "source_estate/module_dm/density_matrix.h" @@ -62,6 +62,27 @@ class Plus_U static double uramping; // increase U by uramping, default is -1.0 static int omc; // occupation matrix control static int mixing_dftu; //whether to mix locale + static int nspin; // spin channel count (1, 2, or 4), set during init + + // --- Accessors for static data (prefer these over direct member access) --- + + /// get Hubbard U for atom type it + static double get_hubbard_u(int it) { return U[it]; } + + /// get target Hubbard U0 for atom 
type it + static double get_hubbard_u0(int it) { return U0[it]; } + + /// number of atom types with Hubbard U parameters + static int get_num_u_types() { return static_cast<int>(U.size()); } + + /// get correlated orbital angular momentum for atom type it (-1 = none) + static int get_orbital_corr(int it) { return orbital_corr[it]; } + + /// whether atom type it has a correlated orbital + static bool has_correlated_orbital(int it) { return orbital_corr[it] != -1; } + + /// raw data pointer to orbital_corr (for kernel interfaces) + static const int* get_orbital_corr_data() { return orbital_corr.data(); } private: @@ -118,21 +139,49 @@ class Plus_U const void* psi_in, const ModuleBase::matrix& wg_in, const UnitCell& cell, - const double& mixing_beta); + Charge_Mixing* p_chgmix); /// calculate the local DFT+U effective potential matrix for PW base. void cal_VU_pot_pw(const int spin); - /// get effective potential matrix for PW base - const std::complex<double>* get_eff_pot_pw(const int iat) const - { - return &(eff_pot_pw[this->eff_pot_pw_index[iat]]); - } - - int get_size_eff_pot_pw() const - { - return eff_pot_pw.size(); - } + /// get effective potential pointer for the given spin channel (PW basis) + /// + /// nspin=1: isk is ignored, returns &eff_pot_pw[0] + /// nspin=2: isk selects spin-up (0) or spin-down (1) half of the + /// split layout [all_up | all_dn] + /// nspin=4: isk is ignored, returns &eff_pot_pw[0] (all Pauli blocks) + const std::complex<double>* get_eff_pot_pw_spin(const int isk) const + { + if (nspin == 2 && isk == 1) + { + return eff_pot_pw.data() + eff_pot_pw.size() / 2; + } + return eff_pot_pw.data(); + } + + /// get size of effective potential for a single spin channel (PW basis) + /// + /// nspin=1: full array size + /// nspin=2: half of the total (one spin channel in split layout) + /// nspin=4: full array size (all Pauli blocks are packed together) + int get_size_eff_pot_pw_spin() const + { + return (nspin == 2) ? 
static_cast<int>(eff_pot_pw.size() / 2) + : static_cast<int>(eff_pot_pw.size()); + } + + /// get effective potential matrix for PW base (per-atom, raw index) + /// @deprecated Use get_eff_pot_pw_spin() for nspin-aware access. + [[deprecated("Use get_eff_pot_pw_spin() for nspin-aware access")]] + const std::complex<double>* get_eff_pot_pw(const int iat) const + { + return &(eff_pot_pw[this->eff_pot_pw_index[iat]]); + } + + int get_size_eff_pot_pw() const + { + return eff_pot_pw.size(); + } #ifdef __LCAO // calculate the local occupation number matrix @@ -153,6 +202,15 @@ class Plus_U // dftu can be calculated only after locale has been initialed bool initialed_locale = false; + // --- Accessors for initialed_locale --- + bool is_locale_initialized() const { return initialed_locale; } + void mark_locale_initialized() { initialed_locale = true; } + void mark_locale_dirty() { initialed_locale = false; } + + // --- Accessors for mixing_dftu --- + static bool is_mixing_enabled() { return mixing_dftu != 0; } + static void enable_mixing() { mixing_dftu = 1; } + private: void copy_locale(const UnitCell& ucell); @@ -161,8 +219,36 @@ class Plus_U std::vector<std::complex<double>> eff_pot_pw; std::vector<int> eff_pot_pw_index; + std::vector<double> uom_array; + std::vector<double> uom_save; + + void set_locale(const UnitCell& ucell); public: + /// get occupation matrix element locale[iat][l][n][spin](m1,m2) + double get_locale(const int iat, const int l, const int n, const int spin, + const int m1, const int m2) const + { + return locale[iat][l][n][spin](m1, m2); + } + + /// set occupation matrix element locale[iat][l][n][spin](m1,m2) + void set_locale(const int iat, const int l, const int n, const int spin, + const int m1, const int m2, const double val) + { + locale[iat][l][n][spin](m1, m2) = val; + } + + /// get flat occupation matrix for an atom's correlated orbital. 
+ /// nspin=1: fills occ with locale[iat][l][0][0] data + /// nspin=2: fills occ with locale[iat][l][0][0] followed by locale[iat][l][0][1] data (spin-up block, then spin-down block) + /// nspin=4: fills occ with locale[iat][l][0][0] data (all 4 Pauli blocks) + void get_locale_flat(const int iat, const int l, std::vector<double>& occ) const; + + /// set flat occupation matrix for an atom's correlated orbital (write-back) + void set_locale_flat(const int iat, const int l, const int spin, + const std::vector<double>& occ); + // local occupancy matrix of the correlated subspace // locale: the out put local occupation number matrix of correlated electrons in the current electronic step // locale_save: the input local occupation number matrix of correlated electrons in the current electronic step diff --git a/source/source_lcao/module_dftu/dftu_force.cpp b/source/source_lcao/module_dftu/dftu_force.cpp index 7bdce056d3c..a2b6ffca4bf 100644 --- a/source/source_lcao/module_dftu/dftu_force.cpp +++ b/source/source_lcao/module_dftu/dftu_force.cpp @@ -252,7 +252,7 @@ void Plus_U::cal_force_k(const UnitCell& ucell, for (int it = 0; it < ucell.ntype; it++) { const int NL = ucell.atoms[it].nwl + 1; - const int LC = orbital_corr[it]; + const int LC = get_orbital_corr(it); if (LC == -1) continue; @@ -262,7 +262,7 @@ void Plus_U::cal_force_k(const UnitCell& ucell, for (int l = 0; l < NL; l++) { - if (l != orbital_corr[it]) + if (l != get_orbital_corr(it)) continue; const int N = ucell.atoms[it].l_nchi[l]; diff --git a/source/source_lcao/module_dftu/dftu_hamilt.cpp b/source/source_lcao/module_dftu/dftu_hamilt.cpp index bb7f59a69f4..e2c37039960 100644 --- a/source/source_lcao/module_dftu/dftu_hamilt.cpp +++ b/source/source_lcao/module_dftu/dftu_hamilt.cpp @@ -11,7 +11,7 @@ void Plus_U::cal_eff_pot_mat_complex(const int ik, const std::complex<double>* sk) { ModuleBase::TITLE("Plus_U", "cal_eff_pot_c"); - if (!this->initialed_locale) + if (!is_locale_initialized()) { return; } @@ -64,7 +64,7 @@ void Plus_U::cal_eff_pot_mat_complex(const int ik, void 
Plus_U::cal_eff_pot_mat_real(const int ik, double* eff_pot, const std::vector& isk, const double* sk) { ModuleBase::TITLE("Plus_U", "cal_eff_pot_r"); - if (!this->initialed_locale) + if (!is_locale_initialized()) { return; } diff --git a/source/source_lcao/module_dftu/dftu_io.cpp b/source/source_lcao/module_dftu/dftu_io.cpp index 737c1c590a3..d44113d1be9 100644 --- a/source/source_lcao/module_dftu/dftu_io.cpp +++ b/source/source_lcao/module_dftu/dftu_io.cpp @@ -18,9 +18,9 @@ void Plus_U::output(const UnitCell &ucell) { const int N = ucell.atoms[T].l_nchi[L]; - if (L >= orbital_corr[T] && orbital_corr[T] != -1) + if (L >= get_orbital_corr(T) && has_correlated_orbital(T)) { - if (L != orbital_corr[T]) + if (L != get_orbital_corr(T)) { continue; } @@ -86,12 +86,12 @@ void Plus_U::write_occup_m(const UnitCell& ucell, for (int T = 0; T < ucell.ntype; T++) { - if (orbital_corr[T] == -1) + if (!has_correlated_orbital(T)) { continue; } const int NL = ucell.atoms[T].nwl + 1; - const int LC = orbital_corr[T]; + const int LC = get_orbital_corr(T); for (int I = 0; I < ucell.atoms[T].na; I++) { @@ -101,7 +101,7 @@ void Plus_U::write_occup_m(const UnitCell& ucell, for (int l = 0; l < NL; l++) { - if (l != orbital_corr[T]) + if (l != get_orbital_corr(T)) { continue; } @@ -290,11 +290,11 @@ void Plus_U::read_occup_m(const UnitCell& ucell, T = ucell.iat2it[iat]; const int NL = ucell.atoms[T].nwl + 1; - const int LC = orbital_corr[T]; + const int LC = get_orbital_corr(T); for (int l = 0; l < NL; l++) { - if (l != orbital_corr[T]) + if (l != get_orbital_corr(T)) { continue; } @@ -410,7 +410,7 @@ void Plus_U::local_occup_bcast(const UnitCell& ucell) for (int T = 0; T < ucell.ntype; T++) { - if (orbital_corr[T] == -1) + if (!has_correlated_orbital(T)) { continue; } @@ -418,11 +418,11 @@ void Plus_U::local_occup_bcast(const UnitCell& ucell) for (int I = 0; I < ucell.atoms[T].na; I++) { const int iat = ucell.itia2iat(T, I); - const int L = orbital_corr[T]; + const int L = 
get_orbital_corr(T); for (int l = 0; l <= ucell.atoms[T].nwl; l++) { - if (l != orbital_corr[T]) + if (l != get_orbital_corr(T)) { continue; } diff --git a/source/source_lcao/module_dftu/dftu_occup.cpp b/source/source_lcao/module_dftu/dftu_occup.cpp index 1babe0cad18..54890acfbe3 100644 --- a/source/source_lcao/module_dftu/dftu_occup.cpp +++ b/source/source_lcao/module_dftu/dftu_occup.cpp @@ -6,6 +6,12 @@ #endif #include "source_base/module_external/scalapack_connector.h" +// copy_locale — save current locale to locale_save and uom_save +// +// nspin=1: single spin channel, uom_save[eff_pot_pw_index[iat]+mm] +// nspin=2: split layout — spin-up at uom_save[index+mm], +// spin-down at uom_save[half_size+index+mm] +// nspin=4: all 4 Pauli blocks packed contiguously from index void Plus_U::copy_locale(const UnitCell& ucell) { ModuleBase::TITLE("Plus_U", "copy_locale"); @@ -13,29 +19,40 @@ void Plus_U::copy_locale(const UnitCell& ucell) for (int T = 0; T < ucell.ntype; T++) { - if (orbital_corr[T] == -1) - { - continue; - } + int target_l = get_orbital_corr(T); + if (target_l == -1) + continue; for (int I = 0; I < ucell.atoms[T].na; I++) { const int iat = ucell.itia2iat(T, I); - for (int l = 0; l < ucell.atoms[T].nwl + 1; l++) + if (PARAM.inp.nspin == 4) { - const int N = ucell.atoms[T].l_nchi[l]; - - for (int n = 0; n < N; n++) + locale_save[iat][target_l][0][0] = locale[iat][target_l][0][0]; + // nspin=4 locale matrix already contains all spin components interleaved + if(this->uom_save.size() != 0) { - if (PARAM.inp.nspin == 4) + const int size = locale[iat][target_l][0][0].nr * locale[iat][target_l][0][0].nc; + for(int mm=0; mmuom_save[eff_pot_pw_index[iat]+mm] = locale[iat][target_l][0][0].c[mm]; } - else if (PARAM.inp.nspin == 1 || PARAM.inp.nspin == 2) + } + } + else if (PARAM.inp.nspin == 1 || PARAM.inp.nspin == 2) + { + locale_save[iat][target_l][0][0] = locale[iat][target_l][0][0]; + locale_save[iat][target_l][0][1] = locale[iat][target_l][0][1]; + // save 
locale matrix for spin=0,1 to uom_save + if(this->uom_save.size() != 0) + { + const int size = locale[iat][target_l][0][0].nr * locale[iat][target_l][0][0].nc; + const int half_size = this->uom_save.size() / 2; + for(int mm=0; mmuom_save[eff_pot_pw_index[iat]+mm] = locale[iat][target_l][0][0].c[mm]; + this->uom_save[half_size + eff_pot_pw_index[iat]+mm] = locale[iat][target_l][0][1].c[mm]; } } } @@ -51,7 +68,7 @@ void Plus_U::zero_locale(const UnitCell& ucell) for (int T = 0; T < ucell.ntype; T++) { - if (orbital_corr[T] == -1) + if (!has_correlated_orbital(T)) { continue; } @@ -92,7 +109,7 @@ void Plus_U::mix_locale(const UnitCell& ucell, for (int T = 0; T < ucell.ntype; T++) { - if (orbital_corr[T] == -1) + if (!has_correlated_orbital(T)) { continue; } @@ -123,6 +140,79 @@ void Plus_U::mix_locale(const UnitCell& ucell, ModuleBase::timer::end("Plus_U", "mix_locale"); } +// set_locale — restore locale from uom_array (after mixing) +// +// nspin=1: locale[iat][l][n][0] from uom_array[eff_pot_pw_index[iat]+mm] +// nspin=2: spin-up from uom_array[index+mm], +// spin-down from uom_array[half_size+index+mm] +// nspin=4: all 4 Pauli blocks from uom_array[index+mm], mm in [0, 4*tlp1^2) +void Plus_U::set_locale(const UnitCell& ucell) +{ + ModuleBase::TITLE("Plus_U", "set_locale"); + ModuleBase::timer::start("Plus_U", "set_locale"); + + for (int T = 0; T < ucell.ntype; T++) + { + if (!has_correlated_orbital(T)) continue; + const int l = get_orbital_corr(T); + for (int I = 0; I < ucell.atoms[T].na; I++) + { + const int iat = ucell.itia2iat(T, I); + if (PARAM.inp.nspin == 4) + { + for(int mm = 0; mm < locale[iat][l][0][0].nr * locale[iat][l][0][0].nc; mm++) + locale[iat][l][0][0].c[mm] = this->uom_array[eff_pot_pw_index[iat] + mm]; + } + else if (PARAM.inp.nspin == 1 || PARAM.inp.nspin == 2) + { + const int half_size = this->uom_array.size() / 2; + for(int mm = 0; mm < locale[iat][l][0][0].nr * locale[iat][l][0][0].nc; mm++) + { + locale[iat][l][0][0].c[mm] = 
this->uom_array[eff_pot_pw_index[iat] + mm]; + if (PARAM.inp.nspin == 2) + { + locale[iat][l][0][1].c[mm] = this->uom_array[half_size + eff_pot_pw_index[iat] + mm]; + } + } + } + } + } + + ModuleBase::timer::end("Plus_U", "set_locale"); +} + +void Plus_U::get_locale_flat(const int iat, const int l, std::vector<double>& occ) const +{ + const int tlp1 = 2 * l + 1; + const int size = tlp1 * tlp1; + if (nspin == 2) + { + for (int is = 0; is < 2; is++) + { + for (int i = 0; i < size; i++) + { + occ[is * size + i] = locale[iat][l][0][is].c[i]; + } + } + } + else + { + for (int i = 0; i < static_cast<int>(occ.size()); i++) + { + occ[i] = locale[iat][l][0][0].c[i]; + } + } +} + +void Plus_U::set_locale_flat(const int iat, const int l, const int spin, + const std::vector<double>& occ) +{ + for (int i = 0; i < static_cast<int>(occ.size()); i++) + { + locale[iat][l][0][spin].c[i] = occ[i]; + } +} + #ifdef __LCAO void Plus_U::cal_occup_m_k(const int iter, @@ -210,7 +300,7 @@ void Plus_U::cal_occup_m_k(const int iter, for (int it = 0; it < ucell.ntype; it++) { const int NL = ucell.atoms[it].nwl + 1; - const int LC = orbital_corr[it]; + const int LC = get_orbital_corr(it); if (LC == -1) { @@ -223,7 +313,7 @@ void Plus_U::cal_occup_m_k(const int iter, for (int l = 0; l < NL; l++) { - if (l != orbital_corr[it]) + if (l != get_orbital_corr(it)) { continue; } @@ -284,7 +374,7 @@ void Plus_U::cal_occup_m_k(const int iter, for (int it = 0; it < ucell.ntype; it++) { const int NL = ucell.atoms[it].nwl + 1; - const int LC = orbital_corr[it]; + const int LC = get_orbital_corr(it); if (LC == -1) { @@ -297,7 +387,7 @@ void Plus_U::cal_occup_m_k(const int iter, for (int l = 0; l < NL; l++) { - if (l != orbital_corr[it]) + if (l != get_orbital_corr(it)) { continue; } @@ -371,12 +461,12 @@ void Plus_U::cal_occup_m_k(const int iter, } // end ia } // end it - if(mixing_dftu && initialed_locale) + if(is_mixing_enabled() && is_locale_initialized()) { this->mix_locale(ucell,mixing_beta); } - this->initialed_locale = true; + 
mark_locale_initialized(); ModuleBase::timer::end("Plus_U", "cal_occup_m_k"); return; } @@ -430,7 +520,7 @@ void Plus_U::cal_occup_m_gamma(const int iter, for (int it = 0; it < ucell.ntype; it++) { const int NL = ucell.atoms[it].nwl + 1; - if (orbital_corr[it] == -1) + if (!has_correlated_orbital(it)) { continue; } @@ -440,7 +530,7 @@ void Plus_U::cal_occup_m_gamma(const int iter, for (int l = 0; l < NL; l++) { - if (l != orbital_corr[it]) + if (l != get_orbital_corr(it)) { continue; } @@ -529,12 +619,12 @@ void Plus_U::cal_occup_m_gamma(const int iter, } // it } // is - if(mixing_dftu && initialed_locale) + if(is_mixing_enabled() && is_locale_initialized()) { this->mix_locale(ucell,mixing_beta); } - this->initialed_locale = true; + mark_locale_initialized(); ModuleBase::timer::end("Plus_U", "cal_occup_m_gamma"); return; } diff --git a/source/source_lcao/module_dftu/dftu_pw.cpp b/source/source_lcao/module_dftu/dftu_pw.cpp index 7a1a9bac3a6..5e44647e7ab 100644 --- a/source/source_lcao/module_dftu/dftu_pw.cpp +++ b/source/source_lcao/module_dftu/dftu_pw.cpp @@ -4,13 +4,30 @@ #include "source_io/module_parameter/parameter.h" #include "source_base/timer.h" - -/// calculate occupation matrix for DFT+U +/// calculate occupation matrix for DFT+U (PW basis) +/// +/// nspin=1 (npol=1): single spin channel; locale[iat][l][n][0] only; +/// eff_pot_pw has one block of tlp1^2 per atom. +/// +/// nspin=2 (npol=1): two spin channels stored separately: +/// locale[iat][l][n][0] = spin-up, locale[iat][l][n][1] = spin-down; +/// becp indices: ib*nkb + begin_ih + m (same formula for both spins); +/// spin channel selected by `is` derived from ik >= nk/2; +/// eff_pot_pw split layout: [all_spin_up | all_spin_down]; +/// uom_array split layout: [all_spin_up | all_spin_down]; +/// VU spin-down stored at eff_pot_pw.size()/2 + eff_pot_pw_index[iat]. 
+/// +/// nspin=4 (npol=2): spinor calculation; +/// locale has a single matrix of size (2*tlp1) x (2*tlp1) per atom +/// storing all 4 Pauli blocks contiguously; +/// becp indices: ib*npol*nkb + begin_ih + m_begin + m (with spinor offset); +/// eff_pot_pw has tlp1_npol^2 = 4*tlp1^2 entries per atom; +/// after VU calculation, Pauli→spin transformation is applied. void Plus_U::cal_occ_pw(const int iter, const void* psi_in, const ModuleBase::matrix& wg_in, const UnitCell& cell, - const double& mixing_beta) + Charge_Mixing* p_chgmix) { ModuleBase::timer::start("Plus_U", "cal_occ_pw"); this->copy_locale(cell); @@ -20,58 +37,83 @@ void Plus_U::cal_occ_pw(const int iter, { auto* onsite_p = projectors::OnsiteProjector::get_instance(); const psi::Psi>* psi_p = (const psi::Psi>*)psi_in; - // loop over k-points to calculate Mi of \sum_{k,i,l,m} const int nbands = psi_p->get_nbands(); + const int npol = psi_p->get_npol(); for(int ik = 0; ik < psi_p->get_nk(); ik++) { + int is = 0; + if(PARAM.inp.nspin == 2 && ik >= psi_p->get_nk()/2) + { + is = 1; + } psi_p->fix_k(ik); onsite_p->tabulate_atomic(ik); - onsite_p->overlap_proj_psi(nbands*psi_p->get_npol(), psi_p->get_pointer()); + onsite_p->overlap_proj_psi(nbands*npol, psi_p->get_pointer()); const std::complex* becp = onsite_p->get_h_becp(); - // becp(nbands*npol , nkb) - // mag = wg * \sum_{nh}becp * becp - int nkb = onsite_p->get_size_becp() / nbands / psi_p->get_npol(); + int nkb = onsite_p->get_size_becp() / nbands / npol; + int begin_ih = 0; for(int iat = 0; iat < cell.nat; iat++) { const int it = cell.iat2it[iat]; const int nh = onsite_p->get_nh(iat); - const int target_l = this->orbital_corr[it]; - if(target_l == -1) + const int target_l = get_orbital_corr(it); + if(!has_correlated_orbital(it)) { begin_ih += nh; continue; } - // m = l^2, l^2+1, ..., (l+1)^2-1 const int m_begin = target_l * target_l; const int tlp1 = 2 * target_l + 1; const int tlp1_2 = tlp1 * tlp1; - for(int ib = 0;ib occ[4]; - occ[0] = weight * 
conj(becp[index_m1]) * becp[index_m2]; - occ[1] = weight * conj(becp[index_m1]) * becp[index_m2 + nkb]; - occ[2] = weight * conj(becp[index_m1 + nkb]) * becp[index_m2]; - occ[3] = weight * conj(becp[index_m1 + nkb]) * becp[index_m2 + nkb]; - this->locale[iat][target_l][0][0].c[ind_m1m2] += (occ[0] + occ[3]).real(); - this->locale[iat][target_l][0][0].c[ind_m1m2 + tlp1_2] += (occ[1] + occ[2]).real(); - this->locale[iat][target_l][0][0].c[ind_m1m2 + 2 * tlp1_2] += (occ[1] - occ[2]).imag(); - this->locale[iat][target_l][0][0].c[ind_m1m2 + 3 * tlp1_2] += (occ[0] - occ[3]).real(); - ind_m1m2++; + const int index_m1 = ib*npol*nkb + begin_ih + m_begin + m1; + for(int m2 = 0; m2 < tlp1; m2++) + { + const int index_m2 = ib*npol*nkb + begin_ih + m_begin + m2; + std::complex occ[4]; + occ[0] = weight * conj(becp[index_m1]) * becp[index_m2]; + occ[1] = weight * conj(becp[index_m1]) * becp[index_m2 + nkb]; + occ[2] = weight * conj(becp[index_m1 + nkb]) * becp[index_m2]; + occ[3] = weight * conj(becp[index_m1 + nkb]) * becp[index_m2 + nkb]; + this->locale[iat][target_l][0][0].c[ind_m1m2] += (occ[0] + occ[3]).real(); + this->locale[iat][target_l][0][0].c[ind_m1m2 + tlp1_2] += (occ[1] + occ[2]).real(); + this->locale[iat][target_l][0][0].c[ind_m1m2 + 2 * tlp1_2] += (occ[1] - occ[2]).imag(); + this->locale[iat][target_l][0][0].c[ind_m1m2 + 3 * tlp1_2] += (occ[0] - occ[3]).real(); + ind_m1m2++; + } } - } - }// ib + }// ib + } + else // nspin=1 or nspin=2 + { + for(int ib = 0;iblocale[iat][target_l][0][is].c[ind_m1m2] += weight * (conj(becp[index_m1]) * becp[index_m2]).real(); + ind_m1m2++; + } + } + }// ib + } begin_ih += nh; }// iat + }// ik } #if defined(__CUDA) || defined(__ROCM) @@ -79,141 +121,250 @@ void Plus_U::cal_occ_pw(const int iter, { auto* onsite_p = projectors::OnsiteProjector::get_instance(); const psi::Psi, base_device::DEVICE_GPU>* psi_p = (const psi::Psi, base_device::DEVICE_GPU>*)psi_in; - // loop over k-points to calculate Mi of \sum_{k,i,l,m} const int nbands = 
psi_p->get_nbands(); + const int npol = psi_p->get_npol(); for(int ik = 0; ik < psi_p->get_nk(); ik++) { + int is = 0; + if(PARAM.inp.nspin == 2 && ik >= psi_p->get_nk()/2) + { + is = 1; + } psi_p->fix_k(ik); onsite_p->tabulate_atomic(ik); - onsite_p->overlap_proj_psi(nbands*psi_p->get_npol(), psi_p->get_pointer()); + onsite_p->overlap_proj_psi(nbands*npol, psi_p->get_pointer()); const std::complex* becp = onsite_p->get_h_becp(); - // becp(nbands*npol , nkb) - // mag = wg * \sum_{nh}becp * becp - int nkb = onsite_p->get_size_becp() / nbands / psi_p->get_npol(); + int nkb = onsite_p->get_size_becp() / nbands / npol; int begin_ih = 0; for(int iat = 0; iat < cell.nat; iat++) { const int it = cell.iat2it[iat]; const int nh = onsite_p->get_nh(iat); - const int target_l = this->orbital_corr[it]; - if(target_l == -1) + const int target_l = get_orbital_corr(it); + if(!has_correlated_orbital(it)) { begin_ih += nh; continue; } - // m = l^2, l^2+1, ..., (l+1)^2-1 const int m_begin = target_l * target_l; const int tlp1 = 2 * target_l + 1; const int tlp1_2 = tlp1 * tlp1; - for(int ib = 0;ib occ[4]; - occ[0] = weight * conj(becp[index_m1]) * becp[index_m2]; - occ[1] = weight * conj(becp[index_m1]) * becp[index_m2 + nkb]; - occ[2] = weight * conj(becp[index_m1 + nkb]) * becp[index_m2]; - occ[3] = weight * conj(becp[index_m1 + nkb]) * becp[index_m2 + nkb]; - this->locale[iat][target_l][0][0].c[ind_m1m2] += (occ[0] + occ[3]).real(); - this->locale[iat][target_l][0][0].c[ind_m1m2 + tlp1_2] += (occ[1] + occ[2]).real(); - this->locale[iat][target_l][0][0].c[ind_m1m2 + 2 * tlp1_2] += (occ[1] - occ[2]).imag(); - this->locale[iat][target_l][0][0].c[ind_m1m2 + 3 * tlp1_2] += (occ[0] - occ[3]).real(); - ind_m1m2++; + const int index_m1 = ib*npol*nkb + begin_ih + m_begin + m1; + for(int m2 = 0; m2 < tlp1; m2++) + { + const int index_m2 = ib*npol*nkb + begin_ih + m_begin + m2; + std::complex occ[4]; + occ[0] = weight * conj(becp[index_m1]) * becp[index_m2]; + occ[1] = weight * 
conj(becp[index_m1]) * becp[index_m2 + nkb]; + occ[2] = weight * conj(becp[index_m1 + nkb]) * becp[index_m2]; + occ[3] = weight * conj(becp[index_m1 + nkb]) * becp[index_m2 + nkb]; + this->locale[iat][target_l][0][0].c[ind_m1m2] += (occ[0] + occ[3]).real(); + this->locale[iat][target_l][0][0].c[ind_m1m2 + tlp1_2] += (occ[1] + occ[2]).real(); + this->locale[iat][target_l][0][0].c[ind_m1m2 + 2 * tlp1_2] += (occ[1] - occ[2]).imag(); + this->locale[iat][target_l][0][0].c[ind_m1m2 + 3 * tlp1_2] += (occ[0] - occ[3]).real(); + ind_m1m2++; + } } - } - }// ib + }// ib + } + else // nspin=1 or nspin=2 + { + for(int ib = 0;iblocale[iat][target_l][0][is].c[ind_m1m2] += weight * (conj(becp[index_m1]) * becp[index_m2]).real(); + ind_m1m2++; + } + } + }// ib + } begin_ih += nh; }// iat }// ik } #endif - Plus_U::energy_u = 0.0; - // reduce mag from all k-pools + // reduce locale from all k-pools for(int iat = 0; iat < cell.nat; iat++) { const int it = cell.iat2it[iat]; - const int target_l = this->orbital_corr[it]; - if(target_l == -1) + const int target_l = get_orbital_corr(it); + if(!has_correlated_orbital(it)) { continue; } const int size = (2 * target_l + 1) * (2 * target_l + 1); - Parallel_Reduce::reduce_double_allpool(PARAM.inp.kpar, - PARAM.globalv.nproc_in_pool, - this->locale[iat][target_l][0][0].c, - size * PARAM.inp.nspin); + if(PARAM.inp.nspin != 4) + { + Parallel_Reduce::reduce_double_allpool(PARAM.inp.kpar, + PARAM.globalv.nproc_in_pool, + this->locale[iat][target_l][0][0].c, + size); + if(PARAM.inp.nspin == 2) + { + Parallel_Reduce::reduce_double_allpool(PARAM.inp.kpar, + PARAM.globalv.nproc_in_pool, + this->locale[iat][target_l][0][1].c, + size); + } + } + else + { + Parallel_Reduce::reduce_double_allpool(PARAM.inp.kpar, + PARAM.globalv.nproc_in_pool, + this->locale[iat][target_l][0][0].c, + size * 4); + } + + // save locale matrix for this iat to uom_array + if(this->uom_array.size() != 0) + { + for(int mm=0;mmuom_array[eff_pot_pw_index[iat]+mm] = 
this->locale[iat][target_l][0][0].c[mm];
+            }
+            if(PARAM.inp.nspin == 2)
+            {
+                const int half_size = this->uom_array.size() / 2;
+                for(int mm=0;mmuom_array[half_size + eff_pot_pw_index[iat]+mm] = this->locale[iat][target_l][0][1].c[mm];
+                }
+            }
+        }
+    }
+
+    // mixing
+    if(is_mixing_enabled() && p_chgmix != nullptr)
+    {
+        p_chgmix->mix_uom(this->uom_array, this->uom_save);
+        this->set_locale(cell);
+    }
+
+    Plus_U::energy_u = 0.0;
+    const double weight_eu = (PARAM.inp.nspin == 1) ? 1.0 : (PARAM.inp.nspin == 2) ? 0.5 : 0.25;
+    const double diag_coeff = (PARAM.inp.nspin == 4) ? 1.0 : 0.5;
+    // calculate VU and energy (locale already reduced above)
+    for(int iat = 0; iat < cell.nat; iat++)
+    {
+        const int it = cell.iat2it[iat];
+        const int target_l = get_orbital_corr(it);
+        if(!has_correlated_orbital(it))
+        {
+            continue;
+        }
+        const int size = (2 * target_l + 1) * (2 * target_l + 1);
         //update effective potential
         const double u_value = this->U[it];
         std::complex<double>* vu_iat = &(this->eff_pot_pw[this->eff_pot_pw_index[iat]]);
         const int m_size = 2 * target_l + 1;
-        for (int m1 = 0; m1 < m_size; m1++)
+
+        if(PARAM.inp.nspin == 4)
         {
-            for (int m2 = 0; m2 < m_size; m2++)
+            for (int m1 = 0; m1 < m_size; m1++)
             {
-                vu_iat[m1 * m_size + m2] = u_value *
-                    (1.0 * (m1 == m2) - this->locale[iat][target_l][0][0].c[m2 * m_size + m1]);
-                Plus_U::energy_u += u_value * 0.25 * this->locale[iat][target_l][0][0].c[m2 * m_size + m1]
-                    * this->locale[iat][target_l][0][0].c[m1 * m_size + m2];
+                for (int m2 = 0; m2 < m_size; m2++)
+                {
+                    vu_iat[m1 * m_size + m2] = u_value *
+                        (diag_coeff * (m1 == m2) - this->locale[iat][target_l][0][0].c[m2 * m_size + m1]);
+                    Plus_U::energy_u += u_value * weight_eu * this->locale[iat][target_l][0][0].c[m2 * m_size + m1]
+                        * this->locale[iat][target_l][0][0].c[m1 * m_size + m2];
+                }
             }
-        }
-        for (int is = 1; is < 4; ++is)
-        {
-            int start = is * m_size * m_size;
+            for (int is = 1; is < 4; ++is)
+            {
+                int start = is * m_size * m_size;
+                for (int m1 = 0; m1 < m_size; m1++)
+                {
+                    for (int m2 = 0; m2 < m_size; m2++)
+                    {
+                        vu_iat[start + m1 * m_size + m2] = u_value *
+                            (0 - this->locale[iat][target_l][0][0].c[start + m2 * m_size + m1]);
+                        Plus_U::energy_u += u_value * weight_eu
+                            * this->locale[iat][target_l][0][0].c[start + m2 * m_size + m1]
+                            * this->locale[iat][target_l][0][0].c[start + m1 * m_size + m2];
+                    }
+                }
+            }
+            // transfer from Pauli matrix representation to spin representation
             for (int m1 = 0; m1 < m_size; m1++)
             {
                 for (int m2 = 0; m2 < m_size; m2++)
                 {
-                    vu_iat[start + m1 * m_size + m2] = u_value *
-                        (0 - this->locale[iat][target_l][0][0].c[start + m2 * m_size + m1]);
-                    Plus_U::energy_u += u_value * 0.25
-                        * this->locale[iat][target_l][0][0].c[start + m2 * m_size + m1]
-                        * this->locale[iat][target_l][0][0].c[start + m1 * m_size + m2];
+                    int index[4];
+                    index[0] = m1 * m_size + m2;
+                    index[1] = m1 * m_size + m2 + size;
+                    index[2] = m1 * m_size + m2 + size * 2;
+                    index[3] = m1 * m_size + m2 + size * 3;
+                    std::complex<double> vu_tmp[4];
+                    for (int i = 0; i < 4; i++)
+                    {
+                        vu_tmp[i] = vu_iat[index[i]];
+                    }
+                    vu_iat[index[0]] = 0.5 * (vu_tmp[0] + vu_tmp[3]);
+                    vu_iat[index[3]] = 0.5 * (vu_tmp[0] - vu_tmp[3]);
+                    vu_iat[index[1]] = 0.5 * (vu_tmp[1] + std::complex<double>(0.0, 1.0) * vu_tmp[2]);
+                    vu_iat[index[2]] = 0.5 * (vu_tmp[1] - std::complex<double>(0.0, 1.0) * vu_tmp[2]);
                 }
             }
         }
-        // transfer from Pauli matrix representation to spin representation
-        for (int m1 = 0; m1 < m_size; m1++)
+        else // nspin=1 or nspin=2
         {
-            for (int m2 = 0; m2 < m_size; m2++)
+            // spin-up channel
+            for (int m1 = 0; m1 < m_size; m1++)
             {
-                int index[4];
-                index[0] = m1 * m_size + m2;
-                index[1] = m1 * m_size + m2 + size;
-                index[2] = m1 * m_size + m2 + size * 2;
-                index[3] = m1 * m_size + m2 + size * 3;
-                std::complex<double> vu_tmp[4];
-                for (int i = 0; i < 4; i++)
+                for (int m2 = 0; m2 < m_size; m2++)
                 {
-                    vu_tmp[i] = vu_iat[index[i]];
+                    vu_iat[m1 * m_size + m2] = u_value *
+                        (diag_coeff * (m1 == m2) - this->locale[iat][target_l][0][0].c[m2 * m_size + m1]);
+                    Plus_U::energy_u += u_value * weight_eu * this->locale[iat][target_l][0][0].c[m2 * m_size + m1]
+                        * this->locale[iat][target_l][0][0].c[m1 * m_size + m2];
+                }
+            }
+            // spin-down channel for nspin=2
+            if(PARAM.inp.nspin == 2)
+            {
+                std::complex<double>* vu_iat1 = &(this->eff_pot_pw[this->eff_pot_pw.size()/2 + this->eff_pot_pw_index[iat]]);
+                for (int m1 = 0; m1 < m_size; m1++)
+                {
+                    for (int m2 = 0; m2 < m_size; m2++)
+                    {
+                        vu_iat1[m1 * m_size + m2] = u_value *
+                            (diag_coeff * (m1 == m2) - this->locale[iat][target_l][0][1].c[m2 * m_size + m1]);
+                        Plus_U::energy_u += u_value * weight_eu * this->locale[iat][target_l][0][1].c[m2 * m_size + m1]
+                            * this->locale[iat][target_l][0][1].c[m1 * m_size + m2];
+                    }
                 }
-                vu_iat[index[0]] = 0.5 * (vu_tmp[0] + vu_tmp[3]);
-                vu_iat[index[3]] = 0.5 * (vu_tmp[0] - vu_tmp[3]);
-                vu_iat[index[1]] = 0.5 * (vu_tmp[1] + std::complex<double>(0.0, 1.0) * vu_tmp[2]);
-                vu_iat[index[2]] = 0.5 * (vu_tmp[1] - std::complex<double>(0.0, 1.0) * vu_tmp[2]);
             }
         }
     }
-    if(mixing_dftu && initialed_locale)
-    {
-        this->mix_locale(cell, mixing_beta);
-    }
-    // update effective potential
     ModuleBase::timer::end("Plus_U", "cal_occ_pw");
 }
 /// calculate the local DFT+U effective potential matrix for PW base.
+/// TODO: implement VU potential calculation for PW basis
 void Plus_U::cal_VU_pot_pw(const int spin)
 {
-
+    // Placeholder: VU potential for PW is computed via cal_eff_pot_mat_* in the
+    // onsite projector path. This function is reserved for future direct-PW implementation.
+    (void)spin;
 }
diff --git a/source/source_lcao/module_dftu/test/CMakeLists.txt b/source/source_lcao/module_dftu/test/CMakeLists.txt
new file mode 100644
index 00000000000..82d179d52b3
--- /dev/null
+++ b/source/source_lcao/module_dftu/test/CMakeLists.txt
@@ -0,0 +1,5 @@
+AddTest(
+  TARGET dftu_pw_test
+  LIBS ${math_libs} base device parameter
+  SOURCES dftu_pw_test.cpp
+)
diff --git a/source/source_lcao/module_dftu/test/dftu_pw_test.cpp b/source/source_lcao/module_dftu/test/dftu_pw_test.cpp
new file mode 100644
index 00000000000..5fd0083861c
--- /dev/null
+++ b/source/source_lcao/module_dftu/test/dftu_pw_test.cpp
@@ -0,0 +1,1057 @@
+#include "gtest/gtest.h"
+#include
+#define private public
+#include "source_io/module_parameter/parameter.h"
+#undef private
+
+/***********************************************************************
+ * Unit tests for DFT+U PW nspin=1/2/4 support (PR-2)
+ *
+ * Strategy: test energy weights and becp index logic as pure
+ * arithmetic — no need to link against full ABACUS libraries.
+ * set_locale is tested via integration tests.
+ ***********************************************************************/
+
+class DftuPwTest : public ::testing::Test
+{
+  protected:
+    void SetUp() override {}
+    void TearDown() override {}
+};
+
+// =====================================================================
+// Energy weight tests
+// =====================================================================
+
+TEST_F(DftuPwTest, EnergyWeightsNspin1)
+{
+    PARAM.input.nspin = 1;
+    double weight_eu = 1;
+    switch(PARAM.inp.nspin)
+    {
+        case 1: weight_eu = 1.0; break;
+        case 2: weight_eu = 0.5; break;
+        case 4: weight_eu = 0.25; break;
+        default: break;
+    }
+    const double diag_coeff = PARAM.inp.nspin == 4 ? 1.0 : 0.5;
+    EXPECT_DOUBLE_EQ(weight_eu, 1.0);
+    EXPECT_DOUBLE_EQ(diag_coeff, 0.5);
+}
+
+TEST_F(DftuPwTest, EnergyWeightsNspin2)
+{
+    PARAM.input.nspin = 2;
+    double weight_eu = 1;
+    switch(PARAM.inp.nspin)
+    {
+        case 1: weight_eu = 1.0; break;
+        case 2: weight_eu = 0.5; break;
+        case 4: weight_eu = 0.25; break;
+        default: break;
+    }
+    const double diag_coeff = PARAM.inp.nspin == 4 ? 1.0 : 0.5;
+    EXPECT_DOUBLE_EQ(weight_eu, 0.5);
+    EXPECT_DOUBLE_EQ(diag_coeff, 0.5);
+}
+
+TEST_F(DftuPwTest, EnergyWeightsNspin4)
+{
+    PARAM.input.nspin = 4;
+    double weight_eu = 1;
+    switch(PARAM.inp.nspin)
+    {
+        case 1: weight_eu = 1.0; break;
+        case 2: weight_eu = 0.5; break;
+        case 4: weight_eu = 0.25; break;
+        default: break;
+    }
+    const double diag_coeff = PARAM.inp.nspin == 4 ? 1.0 : 0.5;
+    EXPECT_DOUBLE_EQ(weight_eu, 0.25);
+    EXPECT_DOUBLE_EQ(diag_coeff, 1.0);
+}
+
+// =====================================================================
+// Becp index tests
+// =====================================================================
+
+TEST_F(DftuPwTest, OccupNspin12Index)
+{
+    const int nkb = 10, begin_ih = 3, m_begin = 4, m = 2, ib = 5;
+    // nspin=1/2: index = ib*nkb + begin_ih + m_begin + m
+    const int index_nspin12 = ib * nkb + begin_ih + m_begin + m;
+    EXPECT_EQ(index_nspin12, 59);
+    // different from nspin=4
+    const int index_nspin4 = ib * 2 * nkb + begin_ih + m_begin + m;
+    EXPECT_NE(index_nspin12, index_nspin4);
+}
+
+TEST_F(DftuPwTest, OccupNspin4Index)
+{
+    const int nkb = 10, begin_ih = 3, m_begin = 4, m = 2, ib = 5;
+    const int index_nspin4 = ib * 2 * nkb + begin_ih + m_begin + m;
+    EXPECT_EQ(index_nspin4, 109);
+}
+
+// =====================================================================
+// set_locale logic tests (pure array copy, no UnitCell needed)
+// =====================================================================
+
+TEST_F(DftuPwTest, SetLocaleNspin4)
+{
+    // Simulate set_locale for nspin=4: uom_array -> locale copy
+    PARAM.input.nspin = 4;
+    const int mat_size = 10; // (2*2+1)*2 for d-orbital with npol=2
+    const int total = mat_size * mat_size; // 100
+
+    std::vector<double> uom_array(total);
+    for(int i = 0; i < total; i++)
+        uom_array[i] = static_cast<double>(i + 1);
+
+    // Simulate locale as raw array (same as ModuleBase::matrix::c)
+    std::vector<double> locale_c(total, 0.0);
+
+    // nspin=4 branch: direct copy
+    for(int mm = 0; mm < total; mm++)
+        locale_c[mm] = uom_array[mm];
+
+    for(int i = 0; i < total; i++)
+        EXPECT_DOUBLE_EQ(locale_c[i], static_cast<double>(i + 1));
+}
+
+TEST_F(DftuPwTest, SetLocaleNspin2)
+{
+    // Simulate set_locale for nspin=2: uom_array -> locale copy (spin-up + spin-down)
+    PARAM.input.nspin = 2;
+    const int mat_size = 5; // 2*2+1 for d-orbital
+    const int size_per_spin = mat_size * mat_size; // 25
+    const int total = size_per_spin * 2; // 50
+
+    std::vector<double> uom_array(total);
+    for(int i = 0; i < size_per_spin; i++)
+    {
+        uom_array[i] = static_cast<double>(i + 1); // spin-up
+        uom_array[i + size_per_spin] = static_cast<double>(i + 101); // spin-down
+    }
+
+    std::vector<double> locale_up(size_per_spin, 0.0);
+    std::vector<double> locale_dn(size_per_spin, 0.0);
+
+    // nspin=1/2 branch: copy both spin channels
+    const int nr_nc = size_per_spin; // locale[iat][l][0][0].nr * locale[iat][l][0][0].nc
+    for(int mm = 0; mm < nr_nc; mm++)
+    {
+        locale_up[mm] = uom_array[mm];
+        locale_dn[mm] = uom_array[mm + nr_nc];
+    }
+
+    for(int i = 0; i < size_per_spin; i++)
+    {
+        EXPECT_DOUBLE_EQ(locale_up[i], static_cast<double>(i + 1));
+        EXPECT_DOUBLE_EQ(locale_dn[i], static_cast<double>(i + 101));
+    }
+}
+
+// =====================================================================
+// VU effective potential tests (cal_occ_pw logic)
+// =====================================================================
+
+TEST_F(DftuPwTest, VUPotNspin1_DiagonalLocale)
+{
+    // For nspin=1: VU[m1,m2] = U * (0.5*delta(m1,m2) - locale[m2*m_size+m1])
+    // With diagonal locale: locale[m,m] = 0.3
+    const double U_val = 4.0;
+    const int m_size = 5; // d-orbital: 2*2+1
+    const int size = m_size * m_size;
+
+    std::vector<double> locale_c(size, 0.0);
+    for(int m = 0; m < m_size; m++)
+        locale_c[m * m_size + m] = 0.3; // diagonal
+
+    std::vector<std::complex<double>> vu(size, {0.0, 0.0});
+    for(int m1 = 0; m1 < m_size; m1++)
+    {
+        for(int m2 = 0; m2 < m_size; m2++)
+        {
+            const double diag_coeff = 0.5; // nspin != 4
+            vu[m1 * m_size + m2] = U_val *
+                (diag_coeff * (m1 == m2) - locale_c[m2 * m_size + m1]);
+        }
+    }
+
+    // diagonal: U*(0.5 - 0.3) = 4.0*0.2 = 0.8
+    for(int m = 0; m < m_size; m++)
+        EXPECT_DOUBLE_EQ(vu[m * m_size + m].real(), 0.8);
+
+    // off-diagonal: U*(0 - 0) = 0
+    EXPECT_DOUBLE_EQ(vu[0 * m_size + 1].real(), 0.0);
+    EXPECT_DOUBLE_EQ(vu[1 * m_size + 0].real(), 0.0);
+}
+
+TEST_F(DftuPwTest, VUPotNspin1_OffDiagonalLocale)
+{
+    // locale has off-diagonal elements
+    const double U_val = 3.0;
+    const int m_size = 3; // p-orbital: 2*1+1
+    const int size = m_size * m_size;
+
+    std::vector<double> locale_c(size, 0.0);
+    locale_c[0 * m_size + 1] = 0.1; // locale(0,1) = 0.1
+    locale_c[1 * m_size + 0] = 0.2; // locale(1,0) = 0.2
+
+    std::vector<std::complex<double>> vu(size, {0.0, 0.0});
+    for(int m1 = 0; m1 < m_size; m1++)
+    {
+        for(int m2 = 0; m2 < m_size; m2++)
+        {
+            vu[m1 * m_size + m2] = U_val *
+                (0.5 * (m1 == m2) - locale_c[m2 * m_size + m1]);
+        }
+    }
+
+    // VU[0,1] = U * (0 - locale[1*3+0]) = 3.0 * (-0.2) = -0.6
+    EXPECT_DOUBLE_EQ(vu[0 * m_size + 1].real(), -0.6);
+    // VU[1,0] = U * (0 - locale[0*3+1]) = 3.0 * (-0.1) = -0.3
+    EXPECT_DOUBLE_EQ(vu[1 * m_size + 0].real(), -0.3);
+}
+
+TEST_F(DftuPwTest, VUPotNspin2_TwoSpinChannels)
+{
+    // nspin=2: two independent spin channels with same formula
+    const double U_val = 5.0;
+    const int m_size = 3;
+    const int size = m_size * m_size;
+
+    std::vector<double> locale_up(size, 0.0);
+    std::vector<double> locale_dn(size, 0.0);
+    locale_up[0] = 0.4; // locale_up(0,0) = 0.4
+    locale_dn[0] = 0.1; // locale_dn(0,0) = 0.1
+
+    // VU_up[0,0] = U*(0.5 - 0.4) = 0.5
+    double vu_up_00 = U_val * (0.5 - locale_up[0 * m_size + 0]);
+    EXPECT_DOUBLE_EQ(vu_up_00, 0.5);
+
+    // VU_dn[0,0] = U*(0.5 - 0.1) = 2.0
+    double vu_dn_00 = U_val * (0.5 - locale_dn[0 * m_size + 0]);
+    EXPECT_DOUBLE_EQ(vu_dn_00, 2.0);
+}
+
+TEST_F(DftuPwTest, VUPotNspin4_PauliTransform)
+{
+    // nspin=4: after computing VU in Pauli basis, transform to spin basis
+    // vu_spin[0] = 0.5*(vu_pauli[0] + vu_pauli[3])
+    // vu_spin[3] = 0.5*(vu_pauli[0] - vu_pauli[3])
+    // vu_spin[1] = 0.5*(vu_pauli[1] + i*vu_pauli[2])
+    // vu_spin[2] = 0.5*(vu_pauli[1] - i*vu_pauli[2])
+    const int m_size = 3;
+    const int size = m_size * m_size;
+
+    // For a single (m1,m2) pair, test the Pauli->spin transform
+    std::complex<double> vu_pauli[4];
+    vu_pauli[0] = {1.0, 0.0}; // charge channel
+    vu_pauli[1] = {0.5, 0.0}; // sigma_x
+    vu_pauli[2] = {0.3, 0.0}; // sigma_y
+    vu_pauli[3] = {0.2, 0.0}; // sigma_z
+
+    std::complex<double> vu_spin[4];
+    vu_spin[0] = 0.5 * (vu_pauli[0] + vu_pauli[3]);
+    vu_spin[3] = 0.5 * (vu_pauli[0] - vu_pauli[3]);
+    vu_spin[1] = 0.5 * (vu_pauli[1] + std::complex<double>(0.0, 1.0) * vu_pauli[2]);
+    vu_spin[2] = 0.5 * (vu_pauli[1] - std::complex<double>(0.0, 1.0) * vu_pauli[2]);
+
+    EXPECT_DOUBLE_EQ(vu_spin[0].real(), 0.6); // 0.5*(1.0+0.2)
+    EXPECT_DOUBLE_EQ(vu_spin[0].imag(), 0.0);
+    EXPECT_DOUBLE_EQ(vu_spin[3].real(), 0.4); // 0.5*(1.0-0.2)
+    EXPECT_DOUBLE_EQ(vu_spin[3].imag(), 0.0);
+    EXPECT_DOUBLE_EQ(vu_spin[1].real(), 0.25); // 0.5*0.5
+    EXPECT_DOUBLE_EQ(vu_spin[1].imag(), 0.15); // 0.5*0.3
+    EXPECT_DOUBLE_EQ(vu_spin[2].real(), 0.25); // 0.5*0.5
+    EXPECT_DOUBLE_EQ(vu_spin[2].imag(), -0.15);// -0.5*0.3
+}
+
+// =====================================================================
+// Energy calculation tests
+// =====================================================================
+
+TEST_F(DftuPwTest, EnergyNspin1_DiagonalLocale)
+{
+    // E_U = sum_{m1,m2} U * weight_eu * locale[m2,m1] * locale[m1,m2]
+    // weight_eu = 1.0 for nspin=1
+    const double U_val = 4.0;
+    const int m_size = 3;
+    const int size = m_size * m_size;
+
+    std::vector<double> locale_c(size, 0.0);
+    locale_c[0 * m_size + 0] = 0.5;
+    locale_c[1 * m_size + 1] = 0.3;
+    locale_c[2 * m_size + 2] = 0.2;
+
+    double energy_u = 0.0;
+    const double weight_eu = 1.0;
+    for(int m1 = 0; m1 < m_size; m1++)
+    {
+        for(int m2 = 0; m2 < m_size; m2++)
+        {
+            energy_u += U_val * weight_eu * locale_c[m2 * m_size + m1]
+                        * locale_c[m1 * m_size + m2];
+        }
+    }
+
+    // Only diagonal contributes: U * (0.5^2 + 0.3^2 + 0.2^2) = 4*(0.25+0.09+0.04) = 4*0.38 = 1.52
+    EXPECT_DOUBLE_EQ(energy_u, 1.52);
+}
+
+TEST_F(DftuPwTest, EnergyNspin2_TwoChannels)
+{
+    // nspin=2: weight_eu = 0.5, sum over both spin channels
+    const double U_val = 2.0;
+    const int m_size = 3;
+    const int size = m_size * m_size;
+    const double weight_eu = 0.5;
+
+    std::vector<double> locale_up(size, 0.0);
+    std::vector<double> locale_dn(size, 0.0);
+    locale_up[0] = 0.4; // (0,0)
+    locale_dn[0] = 0.6; // (0,0)
+
+    double energy_u = 0.0;
+    // spin-up contribution
+    for(int m1 = 0; m1 < m_size; m1++)
+        for(int m2 = 0; m2 < m_size; m2++)
+            energy_u += U_val * weight_eu * locale_up[m2 * m_size + m1] * locale_up[m1 * m_size + m2];
+    // spin-down contribution
+    for(int m1 = 0; m1 < m_size; m1++)
+        for(int m2 = 0; m2 < m_size; m2++)
+            energy_u += U_val * weight_eu * locale_dn[m2 * m_size + m1] * locale_dn[m1 * m_size + m2];
+
+    // U*0.5*(0.4^2 + 0.6^2) = 2*0.5*(0.16+0.36) = 0.52
+    EXPECT_DOUBLE_EQ(energy_u, 0.52);
+}
+
+TEST_F(DftuPwTest, EnergyNspin4_WithOffDiagonal)
+{
+    // nspin=4: weight_eu = 0.25, includes off-diagonal Pauli components
+    const double U_val = 2.0;
+    const int m_size = 2; // simplified: s-orbital would be 1, use 2 for test
+    const int size = m_size * m_size;
+    const double weight_eu = 0.25;
+
+    // 4 Pauli components stored contiguously
+    std::vector<double> locale_c(size * 4, 0.0);
+    // charge channel (is=0)
+    locale_c[0] = 0.5; locale_c[1] = 0.1;
+    locale_c[2] = 0.1; locale_c[3] = 0.5;
+    // sigma_x (is=1)
+    locale_c[size + 0] = 0.2; locale_c[size + 1] = 0.0;
+    locale_c[size + 2] = 0.0; locale_c[size + 3] = 0.2;
+
+    double energy_u = 0.0;
+    for(int is = 0; is < 4; is++)
+    {
+        int start = is * size;
+        for(int m1 = 0; m1 < m_size; m1++)
+        {
+            for(int m2 = 0; m2 < m_size; m2++)
+            {
+                energy_u += U_val * weight_eu
+                            * locale_c[start + m2 * m_size + m1]
+                            * locale_c[start + m1 * m_size + m2];
+            }
+        }
+    }
+
+    // is=0: 2*0.25*(0.5*0.5 + 0.1*0.1 + 0.1*0.1 + 0.5*0.5) = 0.5*(0.25+0.01+0.01+0.25) = 0.26
+    // is=1: 2*0.25*(0.2*0.2 + 0 + 0 + 0.2*0.2) = 0.5*(0.04+0.04) = 0.04
+    // is=2,3: 0
+    EXPECT_DOUBLE_EQ(energy_u, 0.30);
+}
+
+// =====================================================================
+// Locale accumulation from becp (cal_occ_pw core loop)
+// =====================================================================
+
+TEST_F(DftuPwTest, LocaleAccumNspin12)
+{
+    // nspin=1/2: locale[m1*m_size+m2] += weight * real(conj(becp[m1]) * becp[m2])
+    const int m_size = 3; // p-orbital
+    const int nkb = 5;
+    const int begin_ih = 0;
+    const int m_begin = 0; // target_l=1, m_begin = 1*1 = 1... but for test simplicity use 0
+    const int nbands = 2;
+    const double weights[2] = {1.0, 0.5};
+
+    // becp array: becp[ib*nkb + begin_ih + m_begin + m]
+    std::vector<std::complex<double>> becp(nbands * nkb, {0.0, 0.0});
+    // band 0
+    becp[0 * nkb + 0] = {1.0, 0.0};
+    becp[0 * nkb + 1] = {0.0, 1.0};
+    becp[0 * nkb + 2] = {0.5, 0.5};
+    // band 1
+    becp[1 * nkb + 0] = {0.5, 0.0};
+    becp[1 * nkb + 1] = {0.5, -0.5};
+    becp[1 * nkb + 2] = {0.0, 1.0};
+
+    std::vector<double> locale_c(m_size * m_size, 0.0);
+    for(int ib = 0; ib < nbands; ib++)
+    {
+        const double weight = weights[ib];
+        int ind_m1m2 = 0;
+        for(int m1 = 0; m1 < m_size; m1++)
+        {
+            const int index_m1 = ib * nkb + begin_ih + m_begin + m1;
+            for(int m2 = 0; m2 < m_size; m2++)
+            {
+                const int index_m2 = ib * nkb + begin_ih + m_begin + m2;
+                locale_c[ind_m1m2] += weight * (std::conj(becp[index_m1]) * becp[index_m2]).real();
+                ind_m1m2++;
+            }
+        }
+    }
+
+    // band0, w=1.0: conj(becp0)*becp0 = |1|^2=1, conj(becp0)*becp1 = 1*(0,1)=(0,1)->real=0
+    // locale[0,0] from band0 = 1.0*1.0 = 1.0
+    // band1, w=0.5: conj(becp0)*becp0 = |0.5|^2=0.25
+    // locale[0,0] from band1 = 0.5*0.25 = 0.125
+    EXPECT_DOUBLE_EQ(locale_c[0], 1.125); // 1.0 + 0.125
+
+    // locale[1,1]: band0 = 1.0*|i|^2 = 1.0, band1 = 0.5*|(0.5,-0.5)|^2 = 0.5*0.5 = 0.25
+    EXPECT_DOUBLE_EQ(locale_c[4], 1.25);
+}
+
+TEST_F(DftuPwTest, LocaleAccumNspin4_PauliComponents)
+{
+    // nspin=4: 4 Pauli components from becp with npol=2
+    // occ[0] = w * conj(becp_up[m1]) * becp_up[m2]
+    // occ[1] = w * conj(becp_up[m1]) * becp_dn[m2]
+    // occ[2] = w * conj(becp_dn[m1]) * becp_up[m2]
+    // occ[3] = w * conj(becp_dn[m1]) * becp_dn[m2]
+    // locale[ind] += (occ[0]+occ[3]).real() -- charge
+    // locale[ind+size] += (occ[1]+occ[2]).real() -- sigma_x
+    // locale[ind+2*size] += (occ[1]-occ[2]).imag() -- sigma_y
+    // locale[ind+3*size] += (occ[0]-occ[3]).real() -- sigma_z
+
+    const int m_size = 1; // s-orbital for simplicity
+    const int nkb = 2;
+    const int nbands = 1;
+    const double weight = 1.0;
+
+    // becp layout: becp[ib*2*nkb + begin_ih + m] (up)
+    //              becp[ib*2*nkb + begin_ih + m + nkb] (down)
+    std::vector<std::complex<double>> becp(nbands * 2 * nkb, {0.0, 0.0});
+    // m=0 only (s-orbital)
+    becp[0 * 2 * nkb + 0] = {0.8, 0.0}; // becp_up[m=0]
+    becp[0 * 2 * nkb + 0 + nkb] = {0.0, 0.6}; // becp_dn[m=0]
+
+    const int size = m_size * m_size; // 1
+    std::vector<double> locale_c(size * 4, 0.0);
+
+    for(int ib = 0; ib < nbands; ib++)
+    {
+        int ind_m1m2 = 0;
+        for(int m1 = 0; m1 < m_size; m1++)
+        {
+            const int index_m1 = ib * 2 * nkb + 0 + m1;
+            for(int m2 = 0; m2 < m_size; m2++)
+            {
+                const int index_m2 = ib * 2 * nkb + 0 + m2;
+                std::complex<double> occ[4];
+                occ[0] = weight * std::conj(becp[index_m1]) * becp[index_m2];
+                occ[1] = weight * std::conj(becp[index_m1]) * becp[index_m2 + nkb];
+                occ[2] = weight * std::conj(becp[index_m1 + nkb]) * becp[index_m2];
+                occ[3] = weight * std::conj(becp[index_m1 + nkb]) * becp[index_m2 + nkb];
+                locale_c[ind_m1m2] += (occ[0] + occ[3]).real();
+                locale_c[ind_m1m2 + size] += (occ[1] + occ[2]).real();
+                locale_c[ind_m1m2 + 2 * size] += (occ[1] - occ[2]).imag();
+                locale_c[ind_m1m2 + 3 * size] += (occ[0] - occ[3]).real();
+                ind_m1m2++;
+            }
+        }
+    }
+
+    // becp_up = (0.8, 0), becp_dn = (0, 0.6)
+    // occ[0] = conj(0.8)*0.8 = 0.64
+    // occ[1] = conj(0.8)*(0,0.6) = 0.8*(0,0.6) = (0, 0.48)
+    // occ[2] = conj(0,0.6)*0.8 = (0,-0.6)*0.8 = (0, -0.48)
+    // occ[3] = conj(0,0.6)*(0,0.6) = (0,-0.6)*(0,0.6) = 0.36
+    EXPECT_DOUBLE_EQ(locale_c[0], 1.0); // (0.64+0.36).real = 1.0 (charge)
+    EXPECT_DOUBLE_EQ(locale_c[1], 0.0); // (occ1+occ2).real = ((0,0.48)+(0,-0.48)).real = 0
+    EXPECT_DOUBLE_EQ(locale_c[2], 0.96); // (occ1-occ2).imag = ((0,0.48)-(0,-0.48)).imag = 0.96
+    EXPECT_DOUBLE_EQ(locale_c[3], 0.28); // (occ0-occ3).real = (0.64-0.36) = 0.28 (sigma_z)
+}
+
+TEST_F(DftuPwTest, CopyLocaleToUomSave_Nspin2)
+{
+    // Verify copy_locale logic for split layout: [all_up | all_dn]
+    const int m_size = 3;
+    const int size = m_size * m_size;
+
+    std::vector<double> locale_spin0(size), locale_spin1(size);
+    for(int i = 0; i < size; i++)
+    {
+        locale_spin0[i] = static_cast<double>(i + 1);
+        locale_spin1[i] = static_cast<double>(i + 100);
+    }
+
+    std::vector<double> uom_save(size * 2, 0.0);
+    const int eff_pot_index = 0;
+    const int half_size = uom_save.size() / 2;
+    for(int mm = 0; mm < size; mm++)
+    {
+        uom_save[eff_pot_index + mm] = locale_spin0[mm];
+        uom_save[half_size + eff_pot_index + mm] = locale_spin1[mm];
+    }
+
+    for(int i = 0; i < size; i++)
+    {
+        EXPECT_DOUBLE_EQ(uom_save[i], static_cast<double>(i + 1));
+        EXPECT_DOUBLE_EQ(uom_save[half_size + i], static_cast<double>(i + 100));
+    }
+}
+
+TEST_F(DftuPwTest, CopyLocaleToUomSave_Nspin4)
+{
+    // nspin=4: 4 blocks stored contiguously
+    const int m_size = 3;
+    const int size = m_size * m_size;
+    const int total = size * 4; // 4 Pauli components
+
+    std::vector<double> locale_c(total);
+    for(int i = 0; i < total; i++)
+        locale_c[i] = static_cast<double>(i + 1);
+
+    std::vector<double> uom_save(total, 0.0);
+    const int eff_pot_index = 0;
+    for(int mm = 0; mm < size; mm++)
+    {
+        uom_save[eff_pot_index + mm] = locale_c[mm];
+        uom_save[eff_pot_index + mm + size] = locale_c[mm + size];
+        uom_save[eff_pot_index + mm + 2 * size] = locale_c[mm + 2 * size];
+        uom_save[eff_pot_index + mm + 3 * size] = locale_c[mm + 3 * size];
+    }
+
+    for(int i = 0; i < total; i++)
+        EXPECT_DOUBLE_EQ(uom_save[i], static_cast<double>(i + 1));
+}
+
+// =====================================================================
+// Step 1: VU calculation test for nspin=2 (isolated from kernel)
+// This tests the complete cal_occ_pw vu calculation path:
+// becp -> locale -> vu_up/vu_dn
+// =====================================================================
+
+TEST_F(DftuPwTest, VU_Calculation_Nspin2_FullPath)
+{
+    // Simulate complete vu calculation for nspin=2
+    // This is the EXACT logic from cal_occ_pw, isolated from kernel
+
+    const int m_size = 5; // d-orbital: 2*2+1
+    const int size = m_size * m_size; // 25
+    const double U_val = 5.0;
+    const double weight_eu = 0.5; // nspin=2
+    const double diag_coeff = 0.5;
+
+    // Simulated locale values (would normally come from becp accumulation)
+    std::vector<double> locale_up(size, 0.0);
+    std::vector<double> locale_dn(size, 0.0);
+    // Set diagonal values typical for occupied d-orbitals
+    for(int m = 0; m < m_size; m++)
+    {
+        locale_up[m * m_size + m] = 0.8;
+        locale_dn[m * m_size + m] = 0.2;
+    }
+
+    // Calculate VU for spin-up
+    std::vector<std::complex<double>> vu_up(size, {0.0, 0.0});
+    for(int m1 = 0; m1 < m_size; m1++)
+    {
+        for(int m2 = 0; m2 < m_size; m2++)
+        {
+            vu_up[m1 * m_size + m2] = U_val *
+                (diag_coeff * (m1 == m2) - locale_up[m2 * m_size + m1]);
+        }
+    }
+
+    // Calculate VU for spin-down
+    std::vector<std::complex<double>> vu_dn(size, {0.0, 0.0});
+    for(int m1 = 0; m1 < m_size; m1++)
+    {
+        for(int m2 = 0; m2 < m_size; m2++)
+        {
+            vu_dn[m1 * m_size + m2] = U_val *
+                (diag_coeff * (m1 == m2) - locale_dn[m2 * m_size + m1]);
+        }
+    }
+
+    // Verify spin-up VU
+    // diagonal: U*(0.5 - 0.8) = 5*(-0.3) = -1.5
+    for(int m = 0; m < m_size; m++)
+    {
+        EXPECT_DOUBLE_EQ(vu_up[m * m_size + m].real(), -1.5);
+        EXPECT_DOUBLE_EQ(vu_up[m * m_size + m].imag(), 0.0);
+    }
+    // off-diagonal: U*(0 - 0) = 0
+    EXPECT_DOUBLE_EQ(vu_up[0 * m_size + 1].real(), 0.0);
+    EXPECT_DOUBLE_EQ(vu_up[1 * m_size + 0].real(), 0.0);
+
+    // Verify spin-down VU
+    // diagonal: U*(0.5 - 0.2) = 5*(0.3) = 1.5
+    for(int m = 0; m < m_size; m++)
+    {
+        EXPECT_DOUBLE_EQ(vu_dn[m * m_size + m].real(), 1.5);
+        EXPECT_DOUBLE_EQ(vu_dn[m * m_size + m].imag(), 0.0);
+    }
+    // off-diagonal: U*(0 - 0) = 0
+    EXPECT_DOUBLE_EQ(vu_dn[0 * m_size + 1].real(), 0.0);
+    EXPECT_DOUBLE_EQ(vu_dn[1 * m_size + 0].real(), 0.0);
+
+    // Verify energy calculation
+    double energy_u = 0.0;
+    for(int m1 = 0; m1 < m_size; m1++)
+        for(int m2 = 0; m2 < m_size; m2++)
+        {
+            energy_u += U_val * weight_eu * locale_up[m2 * m_size + m1] * locale_up[m1 * m_size + m2];
+            energy_u += U_val * weight_eu * locale_dn[m2 * m_size + m1] * locale_dn[m1 * m_size + m2];
+        }
+    // Only diagonal: 5 orbitals per spin channel
+    // spin-up: 5 * U * weight_eu * 0.8*0.8 = 5 * 5.0 * 0.5 * 0.64 = 8.0
+    // spin-down: 5 * U * weight_eu * 0.2*0.2 = 5 * 5.0 * 0.5 * 0.04 = 0.5
+    // total = 8.5
+    EXPECT_DOUBLE_EQ(energy_u, 8.5);
+}
+
+// =====================================================================
+// Step 2: Test vu_device sync for nspin=2
+// This verifies the vu transfer from eff_pot_pw to vu_device
+// =====================================================================
+
+TEST_F(DftuPwTest, VU_DeviceSync_Nspin2)
+{
+    // Simulate eff_pot_pw layout for nspin=2
+    const int m_size = 5;
+    const int size = m_size * m_size;
+    const int total_size = size * 2; // spin-up + spin-down
+
+    std::vector<std::complex<double>> eff_pot_pw(total_size);
+    // Initialize with known values
+    for(int i = 0; i < size; i++)
+    {
+        eff_pot_pw[i] = {static_cast<double>(i + 1), 0.0}; // spin-up
+        eff_pot_pw[i + size] = {static_cast<double>(i + 100), 0.0}; // spin-down
+    }
+
+    // Simulate vu_device sync for spin-down (isk[ik] == 1)
+    const int size_eff_pot_pw = total_size / 2;
+    std::vector<std::complex<double>> vu_device(size_eff_pot_pw);
+    // memcpy from eff_pot_pw[0] + size_eff_pot_pw
+    for(int i = 0; i < size_eff_pot_pw; i++)
+    {
+        vu_device[i] = eff_pot_pw[i + size_eff_pot_pw];
+    }
+
+    // Verify vu_device contains spin-down values
+    for(int i = 0; i < size; i++)
+    {
+        EXPECT_DOUBLE_EQ(vu_device[i].real(), static_cast<double>(i + 100));
+        EXPECT_DOUBLE_EQ(vu_device[i].imag(), 0.0);
+    }
+}
+
+// =====================================================================
+// Step 3: Test onsite_ps_op kernel for nspin=2 (npol=1)
+// This tests the vu application to ps without full ABACUS integration
+// =====================================================================
+
+TEST_F(DftuPwTest, OnsitePsOpKernel_Nspin2_Npol1)
+{
+    // Simulate the npol=1 branch of onsite_ps_op kernel
+    const int npm = 4; // number of bands (npm/npol for npol=1)
+    const int npol = 1;
+    const int tnp = 10; // total number of projectors
+    const int orb_l = 2; // d-orbital
+    const int tlp1 = 2 * orb_l + 1; // 5
+    const int nat = 2;
+
+    // vu array: 2 atoms, each with tlp1*tlp1 = 25 elements
+    std::vector<std::complex<double>> vu(nat * tlp1 * tlp1);
+    for(int i = 0; i < nat * tlp1 * tlp1; i++)
+        vu[i] = {static_cast<double>(i + 1), 0.0};
+
+    // ip_m: maps each projector to m index within its atom
+    // First atom (iat=0): projectors 0-4 map to m=0-4
+    // Second atom (iat=1): projectors 5-9 map to m=0-4
+    std::vector<int> ip_m = {0, 1, 2, 3, 4, 0, 1, 2, 3, 4};
+    std::vector<int> ip_iat = {0, 0, 0, 0, 0, 1, 1, 1, 1, 1};
+    std::vector<int> vu_begin_iat = {0, tlp1 * tlp1};
+
+    // becp: npm * tnp
+    std::vector<std::complex<double>> becp(npm * tnp, {0.0, 0.0});
+    // Set some non-zero becp values
+    for(int ib = 0; ib < npm; ib++)
+        for(int ip = 0; ip < tnp; ip++)
+            becp[ib * tnp + ip] = {static_cast<double>(ib + ip + 1), 0.0};
+
+    // ps: tnp * npm
+    std::vector<std::complex<double>> ps(tnp * npm, {0.0, 0.0});
+
+    // Kernel logic for npol=1 (EXACT copy from onsite_op.cpp)
+    for(int ib = 0; ib < npm; ib++)
+    {
+        for(int ip = 0; ip < tnp; ip++)
+        {
+            int m1 = ip_m[ip];
+            if(m1 < 0) continue;
+            int iat = ip_iat[ip];
+            const std::complex<double>* vu_iat = vu.data() + vu_begin_iat[iat];
+            int ip2_begin = ip - m1;
+            int ip2_end = ip - m1 + tlp1;
+            const int psind = ip * npm + ib;
+            for(int ip2 = ip2_begin; ip2 < ip2_end; ip2++)
+            {
+                const int becpind = ib * tnp + ip2;
+                int m2 = ip_m[ip2];
+                const int index_mm = m1 * tlp1 + m2;
+                ps[psind] += vu_iat[index_mm] * becp[becpind];
+            }
+        }
+    }
+
+    // Verify ps[0] (ib=0, ip=0)
+    // m1=0, iat=0, vu_iat=vu[0..]
+    // ip2 from 0 to 5
+    std::complex<double> expected_ps00 = {0.0, 0.0};
+    for(int ip2 = 0; ip2 < tlp1; ip2++)
+    {
+        const int becpind = 0 * tnp + ip2;
+        int m2 = ip_m[ip2];
+        const int index_mm = 0 * tlp1 + m2;
+        expected_ps00 += vu[index_mm] * becp[becpind];
+    }
+    EXPECT_DOUBLE_EQ(ps[0].real(), expected_ps00.real());
+    EXPECT_DOUBLE_EQ(ps[0].imag(), expected_ps00.imag());
+}
+
+// =====================================================================
+// Step 4: Test spin-up only path (isolate from spin-down)
+// =====================================================================
+
+TEST_F(DftuPwTest, SpinUpOnly_Path_Nspin2)
+{
+    // Test that spin-up calculation is independent and correct
+    const int m_size = 5;
+    const int size = m_size * m_size;
+    const double U_val = 5.0;
+    const double diag_coeff = 0.5;
+
+    // Only set spin-up locale
+    std::vector<double> locale_up(size, 0.0);
+    for(int m = 0; m < m_size; m++)
+        locale_up[m * m_size + m] = 0.8;
+
+    // Calculate VU for spin-up only
+    std::vector<std::complex<double>> vu_up(size, {0.0, 0.0});
+    for(int m1 = 0; m1 < m_size; m1++)
+    {
+        for(int m2 = 0; m2 < m_size; m2++)
+        {
+            vu_up[m1 * m_size + m2] = U_val *
+                (diag_coeff * (m1 == m2) - locale_up[m2 * m_size + m1]);
+        }
+    }
+
+    // Verify diagonal values
+    for(int m = 0; m < m_size; m++)
+        EXPECT_DOUBLE_EQ(vu_up[m * m_size + m].real(), -1.5); // 5*(0.5-0.8)
+
+    // Verify off-diagonal are zero
+    for(int m1 = 0; m1 < m_size; m1++)
+        for(int m2 = 0; m2 < m_size; m2++)
+            if(m1 != m2)
+                EXPECT_DOUBLE_EQ(vu_up[m1 * m_size + m2].real(), 0.0);
+}
+
+// =====================================================================
+// Step 5: Test spin-down only path (isolate from spin-up)
+// =====================================================================
+
+TEST_F(DftuPwTest, SpinDownOnly_Path_Nspin2)
+{
+    // Test that spin-down calculation is independent and correct
+    const int m_size = 5;
+    const int size = m_size * m_size;
+    const double U_val = 5.0;
+    const double diag_coeff = 0.5;
+
+    // Only set spin-down locale
+    std::vector<double> locale_dn(size, 0.0);
+    for(int m = 0; m < m_size; m++)
+        locale_dn[m * m_size + m] = 0.2;
+
+    // Calculate VU for spin-down only
+    std::vector<std::complex<double>> vu_dn(size, {0.0, 0.0});
+    for(int m1 = 0; m1 < m_size; m1++)
+    {
+        for(int m2 = 0; m2 < m_size; m2++)
+        {
+            vu_dn[m1 * m_size + m2] = U_val *
+                (diag_coeff * (m1 == m2) - locale_dn[m2 * m_size + m1]);
+        }
+    }
+
+    // Verify diagonal values
+    for(int m = 0; m < m_size; m++)
+        EXPECT_DOUBLE_EQ(vu_dn[m * m_size + m].real(), 1.5); // 5*(0.5-0.2)
+
+    // Verify off-diagonal are zero
+    for(int m1 = 0; m1 < m_size; m1++)
+        for(int m2 = 0; m2 < m_size; m2++)
+            if(m1 != m2)
+                EXPECT_DOUBLE_EQ(vu_dn[m1 * m_size + m2].real(), 0.0);
+}
+
+// =====================================================================
+// Multi-atom split layout test for nspin=2
+// Verifies that the split layout [all_up | all_dn] works correctly
+// with multiple correlated atoms (the P0-1 bug fix)
+// =====================================================================
+
+TEST_F(DftuPwTest, MultiAtomSplitLayout_Nspin2)
+{
+    // 2 correlated atoms with d-orbital (l=2)
+    const int nat = 2;
+    const int m_size = 5;
+    const int size = m_size * m_size; // 25 per atom per spin
+    const int P = nat * size; // 50 = total spin-up block size
+    const int total = P * 2; // 100 = total array size (split: up|dn)
+
+    // eff_pot_pw_index: split layout, each atom gets `size` entries
+    std::vector<int> eff_pot_pw_index(nat);
+    eff_pot_pw_index[0] = 0;
+    eff_pot_pw_index[1] = size; // 25
+
+    // --- Test uom_array writing (dftu_pw.cpp logic) ---
+    std::vector<double> uom_array(total, 0.0);
+    // Simulate locale values for both atoms
+    std::vector<double>
locale_up_0(size, 0.0), locale_dn_0(size, 0.0);
+    std::vector<double> locale_up_1(size, 0.0), locale_dn_1(size, 0.0);
+    for(int m = 0; m < m_size; m++)
+    {
+        locale_up_0[m * m_size + m] = 0.8;
+        locale_dn_0[m * m_size + m] = 0.2;
+        locale_up_1[m * m_size + m] = 0.7;
+        locale_dn_1[m * m_size + m] = 0.3;
+    }
+
+    // Write to uom_array using split layout
+    const int half_size = total / 2; // P = 50
+    // atom 0
+    for(int mm = 0; mm < size; mm++)
+    {
+        uom_array[eff_pot_pw_index[0] + mm] = locale_up_0[mm];
+        uom_array[half_size + eff_pot_pw_index[0] + mm] = locale_dn_0[mm];
+    }
+    // atom 1
+    for(int mm = 0; mm < size; mm++)
+    {
+        uom_array[eff_pot_pw_index[1] + mm] = locale_up_1[mm];
+        uom_array[half_size + eff_pot_pw_index[1] + mm] = locale_dn_1[mm];
+    }
+
+    // Verify split layout: first half = all spin-up, second half = all spin-down
+    // atom 0 up: [0..24]
+    EXPECT_DOUBLE_EQ(uom_array[0], 0.8); // locale_up_0 diagonal
+    // atom 1 up: [25..49]
+    EXPECT_DOUBLE_EQ(uom_array[size + 0], 0.7); // locale_up_1 diagonal
+    // atom 0 dn: [50..74]
+    EXPECT_DOUBLE_EQ(uom_array[half_size + 0], 0.2); // locale_dn_0 diagonal
+    // atom 1 dn: [75..99]
+    EXPECT_DOUBLE_EQ(uom_array[half_size + size + 0], 0.3); // locale_dn_1 diagonal
+
+    // --- Test set_locale reading (dftu_occup.cpp logic) ---
+    std::vector<double> read_up_0(size, 0.0), read_dn_0(size, 0.0);
+    std::vector<double> read_up_1(size, 0.0), read_dn_1(size, 0.0);
+
+    for(int mm = 0; mm < size; mm++)
+    {
+        // atom 0
+        read_up_0[mm] = uom_array[eff_pot_pw_index[0] + mm];
+        read_dn_0[mm] = uom_array[half_size + eff_pot_pw_index[0] + mm];
+        // atom 1
+        read_up_1[mm] = uom_array[eff_pot_pw_index[1] + mm];
+        read_dn_1[mm] = uom_array[half_size + eff_pot_pw_index[1] + mm];
+    }
+
+    for(int mm = 0; mm < size; mm++)
+    {
+        EXPECT_DOUBLE_EQ(read_up_0[mm], locale_up_0[mm]);
+        EXPECT_DOUBLE_EQ(read_dn_0[mm], locale_dn_0[mm]);
+        EXPECT_DOUBLE_EQ(read_up_1[mm], locale_up_1[mm]);
+        EXPECT_DOUBLE_EQ(read_dn_1[mm], locale_dn_1[mm]);
+    }
+
+    // --- Test VU writing (dftu_pw.cpp logic) ---
+    std::vector<std::complex<double>> eff_pot_pw(total, {0.0, 0.0});
+    const double U_val = 5.0;
+    const double diag_coeff = 0.5;
+
+    // atom 0 spin-up VU
+    std::complex<double>* vu_up_0 = &eff_pot_pw[eff_pot_pw_index[0]];
+    for(int m1 = 0; m1 < m_size; m1++)
+        for(int m2 = 0; m2 < m_size; m2++)
+            vu_up_0[m1 * m_size + m2] = U_val * (diag_coeff * (m1 == m2) - locale_up_0[m2 * m_size + m1]);
+
+    // atom 0 spin-down VU (split layout: offset by half_size)
+    std::complex<double>* vu_dn_0 = &eff_pot_pw[eff_pot_pw.size() / 2 + eff_pot_pw_index[0]];
+    for(int m1 = 0; m1 < m_size; m1++)
+        for(int m2 = 0; m2 < m_size; m2++)
+            vu_dn_0[m1 * m_size + m2] = U_val * (diag_coeff * (m1 == m2) - locale_dn_0[m2 * m_size + m1]);
+
+    // atom 1 spin-up VU
+    std::complex<double>* vu_up_1 = &eff_pot_pw[eff_pot_pw_index[1]];
+    for(int m1 = 0; m1 < m_size; m1++)
+        for(int m2 = 0; m2 < m_size; m2++)
+            vu_up_1[m1 * m_size + m2] = U_val * (diag_coeff * (m1 == m2) - locale_up_1[m2 * m_size + m1]);
+
+    // atom 1 spin-down VU
+    std::complex<double>* vu_dn_1 = &eff_pot_pw[eff_pot_pw.size() / 2 + eff_pot_pw_index[1]];
+    for(int m1 = 0; m1 < m_size; m1++)
+        for(int m2 = 0; m2 < m_size; m2++)
+            vu_dn_1[m1 * m_size + m2] = U_val * (diag_coeff * (m1 == m2) - locale_dn_1[m2 * m_size + m1]);
+
+    // Verify VU values
+    // atom 0 up diagonal: 5*(0.5-0.8) = -1.5
+    EXPECT_DOUBLE_EQ(vu_up_0[0].real(), -1.5);
+    // atom 0 dn diagonal: 5*(0.5-0.2) = 1.5
+    EXPECT_DOUBLE_EQ(vu_dn_0[0].real(), 1.5);
+    // atom 1 up diagonal: 5*(0.5-0.7) = -1.0
+    EXPECT_DOUBLE_EQ(vu_up_1[0].real(), -1.0);
+    // atom 1 dn diagonal: 5*(0.5-0.3) = 1.0
+    EXPECT_DOUBLE_EQ(vu_dn_1[0].real(), 1.0);
+
+    // Verify no overlap between atoms in VU arrays
+    // atom 0 up ends at index 24, atom 1 up starts at 25 — no overlap
+    EXPECT_NE(vu_up_0[0], vu_up_1[0]);
+    // atom 0 dn starts at half_size=50, atom 1 dn starts at half_size+25=75 — no overlap
+    EXPECT_NE(vu_dn_0[0], vu_dn_1[0]);
+}
+
+// =====================================================================
+// Test that split layout copy_locale/uom_save is consistent
+// with set_locale/uom_array round-trip for multi-atom nspin=2
+// =====================================================================
+
+TEST_F(DftuPwTest, RoundTripCopyAndSetLocale_Nspin2_MultiAtom)
+{
+    const int nat = 2;
+    const int m_size = 5;
+    const int size = m_size * m_size;
+    const int P = nat * size;
+    const int total = P * 2;
+
+    std::vector<int> eff_pot_pw_index = {0, size};
+    std::vector<double> uom_save(total, 0.0);
+    std::vector<double> uom_array(total, 0.0);
+
+    // Simulate locale values
+    std::vector<std::vector<double>> locale_up(nat, std::vector<double>(size, 0.0));
+    std::vector<std::vector<double>> locale_dn(nat, std::vector<double>(size, 0.0));
+    for(int iat = 0; iat < nat; iat++)
+        for(int m = 0; m < m_size; m++)
+        {
+            locale_up[iat][m * m_size + m] = 0.9 - iat * 0.1;
+            locale_dn[iat][m * m_size + m] = 0.1 + iat * 0.1;
+        }
+
+    // copy_locale -> uom_save (split layout)
+    const int half_size = total / 2;
+    for(int iat = 0; iat < nat; iat++)
+        for(int mm = 0; mm < size; mm++)
+        {
+            uom_save[eff_pot_pw_index[iat] + mm] = locale_up[iat][mm];
+            uom_save[half_size + eff_pot_pw_index[iat] + mm] = locale_dn[iat][mm];
+        }
+
+    // cal_occ_pw -> uom_array (split layout)
+    for(int iat = 0; iat < nat; iat++)
+        for(int mm = 0; mm < size; mm++)
+        {
+            uom_array[eff_pot_pw_index[iat] + mm] = locale_up[iat][mm];
+            uom_array[half_size + eff_pot_pw_index[iat] + mm] = locale_dn[iat][mm];
+        }
+
+    // Mixing would compare uom_array with uom_save — verify they match
+    for(int i = 0; i < total; i++)
+        EXPECT_DOUBLE_EQ(uom_array[i], uom_save[i]);
+
+    // set_locale reads back from uom_array
+    std::vector<std::vector<double>> read_up(nat, std::vector<double>(size, 0.0));
+    std::vector<std::vector<double>> read_dn(nat, std::vector<double>(size, 0.0));
+    for(int iat = 0; iat < nat; iat++)
+        for(int mm = 0; mm < size; mm++)
+        {
+            read_up[iat][mm] = uom_array[eff_pot_pw_index[iat] + mm];
+            read_dn[iat][mm] = uom_array[half_size + eff_pot_pw_index[iat] + mm];
+        }
+
+    // Verify round-trip consistency
+    for(int iat = 0; iat < nat; iat++)
+        for(int mm = 0; mm
< size; mm++) + { + EXPECT_DOUBLE_EQ(read_up[iat][mm], locale_up[iat][mm]); + EXPECT_DOUBLE_EQ(read_dn[iat][mm], locale_dn[iat][mm]); + } +} + +// ===================================================================== +// get_locale_flat / set_locale_flat logic tests (pure arithmetic) +// +// These test the nspin-dependent packing/unpacking logic without +// requiring a Plus_U instance, by simulating the same operations. +// ===================================================================== + +TEST_F(DftuPwTest, LocaleFlatPackNspin1) +{ + PARAM.input.nspin = 1; + const int tlp1 = 3; + const int size = tlp1 * tlp1; + std::vector locale_spin0(size); + for (int i = 0; i < size; i++) locale_spin0[i] = static_cast(i); + std::vector occ(size); + for (int i = 0; i < size; i++) occ[i] = locale_spin0[i]; + for (int i = 0; i < size; i++) EXPECT_DOUBLE_EQ(occ[i], static_cast(i)); +} + +TEST_F(DftuPwTest, LocaleFlatPackNspin2) +{ + PARAM.input.nspin = 2; + const int tlp1 = 3; + const int size = tlp1 * tlp1; + std::vector locale_spin0(size), locale_spin1(size); + for (int i = 0; i < size; i++) + { + locale_spin0[i] = static_cast(i); + locale_spin1[i] = static_cast(i + 100); + } + std::vector occ(2 * size); + for (int i = 0; i < size; i++) + { + occ[i] = locale_spin0[i]; + occ[size + i] = locale_spin1[i]; + } + for (int i = 0; i < size; i++) + { + EXPECT_DOUBLE_EQ(occ[i], static_cast(i)); + EXPECT_DOUBLE_EQ(occ[size + i], static_cast(i + 100)); + } +} + +TEST_F(DftuPwTest, LocaleFlatSetRoundTrip) +{ + const int tlp1 = 2; + const int size = tlp1 * tlp1; + std::vector locale_data(size, 0.0); + std::vector occ(size); + for (int i = 0; i < size; i++) occ[i] = static_cast(i + 50); + for (int i = 0; i < size; i++) locale_data[i] = occ[i]; + for (int i = 0; i < size; i++) + EXPECT_DOUBLE_EQ(locale_data[i], static_cast(i + 50)); +} diff --git a/source/source_lcao/module_operator_lcao/dftu_force_stress.hpp b/source/source_lcao/module_operator_lcao/dftu_force_stress.hpp index 
9b5958e4056..38c96025fa5 100644 --- a/source/source_lcao/module_operator_lcao/dftu_force_stress.hpp +++ b/source/source_lcao/module_operator_lcao/dftu_force_stress.hpp @@ -49,7 +49,7 @@ void DFTU>::cal_force_stress(const bool cal_force, int T0=0; int I0=0; ucell->iat2iait(iat0, &I0, &T0); - if(this->dftu->orbital_corr[T0] == -1) + if(!this->dftu->has_correlated_orbital(T0)) { continue; } @@ -71,11 +71,11 @@ void DFTU>::cal_force_stress(const bool cal_force, int T0=0; int I0=0; ucell->iat2iait(iat0, &I0, &T0); - const int target_L = this->dftu->orbital_corr[T0]; - if (target_L == -1) - { - continue; - } + if (!this->dftu->has_correlated_orbital(T0)) + { + continue; + } + const int target_L = this->dftu->get_orbital_corr(T0); const int tlp1 = 2 * target_L + 1; AdjacentAtomInfo& adjs = this->adjs_all[atom_index_all[iat0]]; @@ -139,22 +139,7 @@ void DFTU>::cal_force_stress(const bool cal_force, } // first iteration to calculate occupation matrix std::vector occ(tlp1 * tlp1 * this->nspin, 0); - if(this->nspin ==2) - { - for (int i = 0; i < occ.size(); i++) - { - const int is = i / (tlp1 * tlp1); - const int ii = i % (tlp1 * tlp1); - occ[i] = this->dftu->locale[iat0][target_L][0][is].c[ii]; - } - } - else - { - for (int i = 0; i < occ.size(); i++) - { - occ[i] = this->dftu->locale[iat0][target_L][0][0].c[i]; - } - } + this->dftu->get_locale_flat(iat0, target_L, occ); // calculate VU const double u_value = this->dftu->U[T0]; diff --git a/source/source_lcao/module_operator_lcao/dftu_lcao.cpp b/source/source_lcao/module_operator_lcao/dftu_lcao.cpp index 3189f05f13c..3f3d5f0f396 100644 --- a/source/source_lcao/module_operator_lcao/dftu_lcao.cpp +++ b/source/source_lcao/module_operator_lcao/dftu_lcao.cpp @@ -55,11 +55,11 @@ void hamilt::DFTU>::initialize_HR(const Grid_Driver int T0=0; int I0=0; ucell->iat2iait(iat0, &I0, &T0); - const int target_L = this->dftu->orbital_corr[T0]; - if (target_L == -1) - { - continue; - } + if (!this->dftu->has_correlated_orbital(T0)) + { + 
continue; + } + const int target_L = this->dftu->get_orbital_corr(T0); AdjacentAtomInfo adjs; GridD->Find_atom(*ucell, tau0, T0, I0, &adjs); @@ -107,12 +107,12 @@ void hamilt::DFTU>::cal_nlm_all(const Parallel_Orbi int T0=0; int I0=0; ucell->iat2iait(iat0, &I0, &T0); - const int target_L = this->dftu->orbital_corr[T0]; - if (target_L == -1) - { - continue; - } - const int tlp1 = 2 * target_L + 1; + if (!this->dftu->has_correlated_orbital(T0)) + { + continue; + } + const int target_L = this->dftu->get_orbital_corr(T0); + const int tlp1 = 2 * target_L + 1; AdjacentAtomInfo& adjs = this->adjs_all[atom_index++]; // calculate and save the table of two-center integrals @@ -177,7 +177,7 @@ template void hamilt::DFTU>::contributeHR() { ModuleBase::TITLE("DFTU", "contributeHR"); - if (this->dftu->get_dmr(0) == nullptr && this->dftu->initialed_locale == false) + if (this->dftu->get_dmr(0) == nullptr && !this->dftu->is_locale_initialized()) { // skip the calculation if dm_in_dftu is nullptr return; } @@ -203,11 +203,11 @@ void hamilt::DFTU>::contributeHR() auto tau0 = ucell->get_tau(iat0); int T0, I0; ucell->iat2iait(iat0, &I0, &T0); - const int target_L = this->dftu->orbital_corr[T0]; - if (target_L == -1) - { - continue; - } + if (!this->dftu->has_correlated_orbital(T0)) + { + continue; + } + const int target_L = this->dftu->get_orbital_corr(T0); const int tlp1 = 2 * target_L + 1; AdjacentAtomInfo& adjs = this->adjs_all[atom_index++]; @@ -215,7 +215,7 @@ void hamilt::DFTU>::contributeHR() // first iteration to calculate occupation matrix const int spin_fold = (this->nspin == 4) ? 
4 : 1; std::vector occ(tlp1 * tlp1 * spin_fold, 0.0); - if (this->dftu->initialed_locale == false) + if (!this->dftu->is_locale_initialized()) { const hamilt::HContainer* dmR_current = this->dftu->get_dmr(this->current_spin); for (int ad1 = 0; ad1 < adjs.adj_num + 1; ++ad1) @@ -249,20 +249,18 @@ void hamilt::DFTU>::contributeHR() Parallel_Reduce::reduce_all(occ.data(), occ.size()); #endif // save occ to dftu - for (int i = 0; i < occ.size(); i++) + if (this->nspin == 1) { - if (this->nspin == 1) - { - occ[i] *= 0.5; - } - this->dftu->locale[iat0][target_L][0][this->current_spin].c[i] = occ[i]; + for (auto& v : occ) { v *= 0.5; } } + this->dftu->set_locale_flat(iat0, target_L, this->current_spin, occ); } else // use readin locale to calculate occupation matrix { - for (int i = 0; i < occ.size(); i++) + for (int i = 0; i < static_cast(occ.size()); i++) { - occ[i] = this->dftu->locale[iat0][target_L][0][this->current_spin].c[i]; + occ[i] = this->dftu->get_locale(iat0, target_L, 0, this->current_spin, + i / (2 * target_L + 1), i % (2 * target_L + 1)); } // set initialed_locale to false to avoid using readin locale in next iteration } @@ -321,7 +319,7 @@ void hamilt::DFTU>::contributeHR() // for readin onsite_dm, set initialed_locale to false to avoid using readin locale in next iteration if (this->current_spin == this->nspin - 1 || this->nspin == 4) { - this->dftu->initialed_locale = false; + this->dftu->mark_locale_dirty(); } // update this->current_spin: only nspin=2 iterate change it between 0 and 1 diff --git a/source/source_lcao/module_operator_lcao/dspin_lcao.cpp b/source/source_lcao/module_operator_lcao/dspin_lcao.cpp index 7954ae8ab22..743b13f35a3 100644 --- a/source/source_lcao/module_operator_lcao/dspin_lcao.cpp +++ b/source/source_lcao/module_operator_lcao/dspin_lcao.cpp @@ -29,6 +29,8 @@ hamilt::DeltaSpin>::DeltaSpin(HS_Matrix_K* hsk_ this->lambda_save.resize(this->ucell->nat * 3, 0.0); this->update_lambda_.resize(this->nspin, false); + 
this->B_I_data.resize(this->ucell->nat); + this->B_I_nproj.resize(this->ucell->nat, 0); } // destructor @@ -346,6 +348,18 @@ void hamilt::DeltaSpin>::cal_pre_HR() } } + // Save B_I overlap data for subspace projection optimization + this->B_I_data[iat].clear(); + this->B_I_nproj[iat] = max_l_plus_1 * max_l_plus_1; + for (int ad = 0; ad < adjs.adj_num + 1; ++ad) + { + BI_AdjacentData bi_ad; + bi_ad.iat_adj = this->ucell->itia2iat(adjs.ntype[ad], adjs.natom[ad]); + bi_ad.R_index = adjs.box[ad]; + bi_ad.nlm = nlm_iat0[ad]; + this->B_I_data[iat].push_back(std::move(bi_ad)); + } + // fourth step: calculate the for (int ad1 = 0; ad1 < adjs.adj_num + 1; ++ad1) { @@ -525,6 +539,89 @@ void hamilt::DeltaSpin, std::complex exp(ik·R) +// C_k is the 2D-block distributed wavefunction matrix +template +void hamilt::DeltaSpin>::cal_PI_sub( + const ModuleBase::Vector3& kvec_d, + const std::complex* psi_k, + const int nbands_global, + std::vector>>& PI_sub) const +{ + const int nat = this->ucell->nat; + PI_sub.resize(nat); + + const int nrow_local = this->paraV->get_row_size(); // local rows of C_k + const int ncol_local = this->paraV->ncol_bands; // local band columns of C_k + const int lda = nrow_local; // leading dimension (column-major for ScaLAPACK) + + for (int iat = 0; iat < nat; iat++) + { + if (!this->constraint_atom_list[iat]) + { + PI_sub[iat].clear(); + continue; + } + + const int r = this->B_I_nproj[iat]; + // D_I_local: r × nbands_global, initialized to zero + // We accumulate local contributions, then MPI_Allreduce + std::vector> D_I(r * nbands_global, {0.0, 0.0}); + + for (const auto& bi_ad : this->B_I_data[iat]) + { + // Phase factor: exp(i * 2pi * k · R) + const double arg = 2.0 * M_PI * (kvec_d.x * bi_ad.R_index.x + + kvec_d.y * bi_ad.R_index.y + + kvec_d.z * bi_ad.R_index.z); + const std::complex phase(cos(arg), sin(arg)); + + for (const auto& [iw_global, nlm_vec] : bi_ad.nlm) + { + // Check if this global orbital index is in our local rows + const int iw_local = 
this->paraV->global2local_row(iw_global); + if (iw_local < 0) { continue; + } + + // D_I[lm, jb_global] += nlm_vec[lm] * phase * C_k[iw_local, jb_local] + // C_k is column-major: C_k[irow, icol] = psi_k[irow + icol * lda] + for (int jb_local = 0; jb_local < ncol_local; jb_local++) + { + const int jb_global = this->paraV->local2global_col(jb_local); + const std::complex c_val = phase * psi_k[iw_local + jb_local * lda]; + for (int lm = 0; lm < r; lm++) + { + D_I[lm * nbands_global + jb_global] += nlm_vec[lm] * c_val; + } + } + } + } + + // MPI_Allreduce to sum D_I across all processes +#ifdef __MPI + MPI_Allreduce(MPI_IN_PLACE, D_I.data(), 2 * r * nbands_global, + MPI_DOUBLE, MPI_SUM, this->paraV->comm()); +#endif + + // Compute P_I_sub = D_I^dag D_I (nbands × nbands Hermitian matrix) + // Using zgemm: C = alpha * A^H * B + beta * C + // A = D_I (r × nbands), B = D_I (r × nbands) + // C = P_I_sub (nbands × nbands) + PI_sub[iat].resize(nbands_global * nbands_global, {0.0, 0.0}); + const std::complex one = {1.0, 0.0}; + const std::complex zero_c = {0.0, 0.0}; + // zgemm: P = D^H * D, where D is r × nbands (row-major: D[lm][jb]) + // In column-major (Fortran) convention for BLAS: + // D stored as nbands_global × r (transposed view) + // We want P = D^H * D = (r×nb)^H * (r×nb) = nb×nb + zgemm_("C", "N", &nbands_global, &nbands_global, &r, + &one, D_I.data(), &r, + D_I.data(), &r, + &zero_c, PI_sub[iat].data(), &nbands_global); + } +} + #include "dspin_force_stress.hpp" template class hamilt::DeltaSpin>; diff --git a/source/source_lcao/module_operator_lcao/dspin_lcao.h b/source/source_lcao/module_operator_lcao/dspin_lcao.h index 291d2b87d9f..1e0135ac9a5 100644 --- a/source/source_lcao/module_operator_lcao/dspin_lcao.h +++ b/source/source_lcao/module_operator_lcao/dspin_lcao.h @@ -8,6 +8,7 @@ #include "source_lcao/module_operator_lcao/operator_lcao.h" #include "source_lcao/module_hcontainer/hcontainer.h" #include +#include namespace hamilt { @@ -66,6 +67,18 @@ class 
DeltaSpin> : public OperatorLCAO ModuleBase::matrix& force, ModuleBase::matrix& stress); + /// @brief Compute P_I_sub(k) = D_I(k)^dag D_I(k) for all constrained atoms + /// Uses saved B_I overlaps and 2D-block distributed wavefunctions + /// @param kvec_d k-point in direct coordinates (for phase factor) + /// @param psi_k wavefunction coefficients C_k (2D-block distributed) + /// @param nbands_global global number of bands + /// @param PI_sub output: PI_sub[iat] is nbands×nbands Hermitian matrix (gathered to all procs) + /// Only filled for constrained atoms; empty for unconstrained. + void cal_PI_sub(const ModuleBase::Vector3& kvec_d, + const std::complex* psi_k, + const int nbands_global, + std::vector>>& PI_sub) const; + private: const UnitCell* ucell = nullptr; @@ -154,6 +167,16 @@ class DeltaSpin> : public OperatorLCAO bool initialized = false; int spin_num = 1; std::vector update_lambda_; + + /// @brief Saved B_I overlap data for subspace projection optimization + /// For each constrained atom I, stores the overlaps organized by adjacent atoms + struct BI_AdjacentData { + int iat_adj; ///< global atom index of adjacent atom + ModuleBase::Vector3 R_index; ///< cell index of adjacent atom + std::unordered_map> nlm; ///< iw_global -> + }; + std::vector> B_I_data; ///< [iat][adj_index] + std::vector B_I_nproj; ///< r = max_l_plus_1^2 per constrained atom }; } diff --git a/source/source_lcao/module_operator_lcao/test/CMakeLists.txt b/source/source_lcao/module_operator_lcao/test/CMakeLists.txt index a1c52935cf1..304cc92e327 100644 --- a/source/source_lcao/module_operator_lcao/test/CMakeLists.txt +++ b/source/source_lcao/module_operator_lcao/test/CMakeLists.txt @@ -82,10 +82,10 @@ AddTest( AddTest( TARGET MODULE_LCAO_operator_dftu_test - LIBS parameter ${math_libs} psi base device container - SOURCES test_dftu.cpp ../dftu_lcao.cpp ../../module_hcontainer/func_folding.cpp - ../../module_hcontainer/base_matrix.cpp ../../module_hcontainer/hcontainer.cpp 
../../module_hcontainer/atom_pair.cpp - ../../../source_basis/module_ao/parallel_orbitals.cpp + LIBS parameter ${math_libs} psi base device container + SOURCES test_dftu.cpp ../dftu_lcao.cpp ../../module_hcontainer/func_folding.cpp + ../../module_hcontainer/base_matrix.cpp ../../module_hcontainer/hcontainer.cpp ../../module_hcontainer/atom_pair.cpp + ../../../source_basis/module_ao/parallel_orbitals.cpp ../../../source_basis/module_ao/ORB_atomic_lm.cpp tmp_mocks.cpp ../../../source_hamilt/operator.cpp ) diff --git a/source/source_lcao/module_operator_lcao/test/test_dftu.cpp b/source/source_lcao/module_operator_lcao/test/test_dftu.cpp index 31adb426ad4..20723a11e6a 100644 --- a/source/source_lcao/module_operator_lcao/test/test_dftu.cpp +++ b/source/source_lcao/module_operator_lcao/test/test_dftu.cpp @@ -23,6 +23,28 @@ const hamilt::HContainer* Plus_U::get_dmr(int ispin) const return tmp_DMR; } +void Plus_U::get_locale_flat(const int iat, const int l, std::vector& occ) const +{ + const int tlp1 = 2 * l + 1; + const int tlp1_2 = tlp1 * tlp1; + occ.resize(tlp1_2); + for (int i = 0; i < tlp1_2; i++) + { + occ[i] = locale[iat][l][0][0].c[i]; + } +} + +void Plus_U::set_locale_flat(const int iat, const int l, const int spin, + const std::vector& occ) +{ + const int tlp1 = 2 * l + 1; + const int tlp1_2 = tlp1 * tlp1; + for (int i = 0; i < tlp1_2 && i < static_cast(occ.size()); i++) + { + locale[iat][l][0][spin].c[i] = occ[i]; + } +} + //--------------------------------------- // Unit test of Plus_U class // Plus_U is a derivative class of Operator, it is used to calculate the kinetic matrix diff --git a/source/source_lcao/module_rt/CMakeLists.txt b/source/source_lcao/module_rt/CMakeLists.txt index 046632f3138..a1056c33474 100644 --- a/source/source_lcao/module_rt/CMakeLists.txt +++ b/source/source_lcao/module_rt/CMakeLists.txt @@ -16,7 +16,6 @@ if(ENABLE_LCAO) td_folding.cpp solve_propagation.cpp boundary_fix.cpp - td_moving_gauge.cpp ) if(USE_CUDA) diff --git 
a/source/source_lcao/module_rt/evolve_elec.cpp b/source/source_lcao/module_rt/evolve_elec.cpp index ede4365258e..54e234e2a56 100644 --- a/source/source_lcao/module_rt/evolve_elec.cpp +++ b/source/source_lcao/module_rt/evolve_elec.cpp @@ -10,9 +10,9 @@ namespace module_rt { template -Evolve_elec::Evolve_elec() {}; +Evolve_elec::Evolve_elec(){}; template -Evolve_elec::~Evolve_elec() {}; +Evolve_elec::~Evolve_elec(){}; template ct::DeviceType Evolve_elec::ct_device_type = ct::DeviceTypeToEnum::value; @@ -33,11 +33,7 @@ void Evolve_elec::solve_psi(const int& istep, std::ofstream& ofs_running, const int propagator, const bool use_tensor, - const bool use_lapack, - module_rt::TD_MovingGauge* td_mg, - const UnitCell* ucell, - const std::vector>& kvec_d, - const bool use_td_moving_gauge) + const bool use_lapack) { ModuleBase::TITLE("Evolve_elec", "solve_psi"); ModuleBase::timer::start("Evolve_elec", "solve_psi"); @@ -61,13 +57,6 @@ void Evolve_elec::solve_psi(const int& istep, if (!use_tensor) { - // Construct the local P_k matrix for moving spatial gauge, CPU only for now - std::vector> P_k_local(para_orb.nloc, {0.0, 0.0}); - if (use_td_moving_gauge && td_mg != nullptr) - { - td_mg->get_P_k(ucell, kvec_d[ik], P_k_local.data(), para_orb.nloc, para_orb.ncol); - } - const int len_HS_laststep = use_lapack ? 
nlocal * nlocal : para_orb.nloc; evolve_psi(nband, nlocal, @@ -77,8 +66,6 @@ void Evolve_elec::solve_psi(const int& istep, psi_laststep[0].get_pointer(), Hk_laststep.data>() + ik * len_HS_laststep, Sk_laststep.data>() + ik * len_HS_laststep, - P_k_local.data(), - use_td_moving_gauge, &(ekb(ik, 0)), propagator, ofs_running, diff --git a/source/source_lcao/module_rt/evolve_elec.h b/source/source_lcao/module_rt/evolve_elec.h index cabd57e947d..3c2aa95cf6d 100644 --- a/source/source_lcao/module_rt/evolve_elec.h +++ b/source/source_lcao/module_rt/evolve_elec.h @@ -13,7 +13,6 @@ #include "source_lcao/hamilt_lcao.h" #include "source_lcao/module_rt/gather_mat.h" // MPI gathering and distributing functions #include "source_lcao/module_rt/kernels/cublasmp_context.h" -#include "source_lcao/module_rt/td_moving_gauge.h" #include "source_psi/psi.h" //----------------------------------------------------------- @@ -159,11 +158,7 @@ class Evolve_elec std::ofstream& ofs_running, const int propagator, const bool use_tensor, - const bool use_lapack, - module_rt::TD_MovingGauge* td_mg, - const UnitCell* ucell, - const std::vector>& kvec_d, - const bool use_td_moving_gauge); + const bool use_lapack); // ct_device_type = ct::DeviceType::CpuDevice or ct::DeviceType::GpuDevice static ct::DeviceType ct_device_type; diff --git a/source/source_lcao/module_rt/evolve_psi.cpp b/source/source_lcao/module_rt/evolve_psi.cpp index 5f3b1556057..ea3b40f293d 100644 --- a/source/source_lcao/module_rt/evolve_psi.cpp +++ b/source/source_lcao/module_rt/evolve_psi.cpp @@ -24,8 +24,6 @@ void evolve_psi(const int nband, std::complex* psi_k_laststep, std::complex* H_laststep, std::complex* S_laststep, - std::complex* P_k, - const bool use_td_moving_gauge, double* ekb, int propagator, std::ofstream& ofs_running, @@ -87,15 +85,8 @@ void evolve_psi(const int nband, { /// @brief solve the propagation equation /// @input Stmp, Htmp, psi_k_laststep - /// @output psi_k - if (use_td_moving_gauge) - { - 
solve_propagation(pv, nband, nlocal, PARAM.inp.td_dt, Stmp, Htmp, P_k, psi_k_laststep, psi_k); - } - else - { - solve_propagation(pv, nband, nlocal, PARAM.inp.td_dt, Stmp, Htmp, psi_k_laststep, psi_k); - } + /// @output psi_k + solve_propagation(pv, nband, nlocal, PARAM.inp.td_dt, Stmp, Htmp, psi_k_laststep, psi_k); } // (4)->>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> diff --git a/source/source_lcao/module_rt/evolve_psi.h b/source/source_lcao/module_rt/evolve_psi.h index 287b47e10c0..34a29d68812 100644 --- a/source/source_lcao/module_rt/evolve_psi.h +++ b/source/source_lcao/module_rt/evolve_psi.h @@ -23,8 +23,6 @@ void evolve_psi(const int nband, std::complex* psi_k_laststep, std::complex* H_laststep, std::complex* S_laststep, - std::complex* P_k, - const bool use_td_moving_gauge, double* ekb, int propagator, std::ofstream& ofs_running, diff --git a/source/source_lcao/module_rt/solve_propagation.cpp b/source/source_lcao/module_rt/solve_propagation.cpp index f17837bf9a7..298bf2eef94 100644 --- a/source/source_lcao/module_rt/solve_propagation.cpp +++ b/source/source_lcao/module_rt/solve_propagation.cpp @@ -1,9 +1,8 @@ #include "solve_propagation.h" - +#include "source_base/module_external/scalapack_connector.h" +#include "source_base/module_external/blas_connector.h" #include "source_base/constants.h" #include "source_base/global_function.h" -#include "source_base/module_external/blas_connector.h" -#include "source_base/module_external/scalapack_connector.h" #include @@ -11,13 +10,13 @@ namespace module_rt { #ifdef __MPI void solve_propagation(const Parallel_Orbitals* pv, - const int nband, - const int nlocal, - const double dt, - const std::complex* Stmp, - const std::complex* Htmp, - const std::complex* psi_k_laststep, - std::complex* psi_k) + const int nband, + const int nlocal, + const double dt, + const std::complex* Stmp, + const std::complex* Htmp, + const std::complex* psi_k_laststep, + std::complex* psi_k) { // (1) init A,B and copy Htmp to A & B std::complex* 
operator_A = new std::complex[pv->nloc]; @@ -29,7 +28,7 @@ void solve_propagation(const Parallel_Orbitals* pv, BlasConnector::copy(pv->nloc, Htmp, 1, operator_B, 1); const double dt_au = dt / ModuleBase::AU_to_FS; - + // ->>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> // (2) compute operator_A & operator_B by GEADD // operator_A = Stmp + i*para * Htmp; beta2 = para = 0.25 * dt @@ -38,118 +37,78 @@ void solve_propagation(const Parallel_Orbitals* pv, std::complex beta1 = {0.0, -0.25 * dt_au}; std::complex beta2 = {0.0, 0.25 * dt_au}; - ScalapackConnector::geadd('N', nlocal, nlocal, alpha, Stmp, 1, 1, pv->desc, beta2, operator_A, 1, 1, pv->desc); - ScalapackConnector::geadd('N', nlocal, nlocal, alpha, Stmp, 1, 1, pv->desc, beta1, operator_B, 1, 1, pv->desc); - // ->>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - // (3) b = operator_B @ psi_k_laststep - std::complex* tmp_b = new std::complex[pv->nloc_wfc]; - ScalapackConnector::gemm('N', - 'N', - nlocal, - nband, - nlocal, - 1.0, - operator_B, - 1, - 1, - pv->desc, - psi_k_laststep, - 1, - 1, - pv->desc_wfc, - 0.0, - tmp_b, - 1, - 1, - pv->desc_wfc); - // get ipiv - int* ipiv = new int[pv->nloc]; - int info = 0; - // (4) solve Ac=b - ScalapackConnector::gesv(nlocal, nband, operator_A, 1, 1, pv->desc, ipiv, tmp_b, 1, 1, pv->desc_wfc, &info); - - // copy solution to psi_k - BlasConnector::copy(pv->nloc_wfc, tmp_b, 1, psi_k, 1); - - delete[] tmp_b; - delete[] ipiv; - delete[] operator_A; - delete[] operator_B; -} - -void solve_propagation(const Parallel_Orbitals* pv, - const int nband, - const int nlocal, - const double dt, - const std::complex* Stmp, - const std::complex* Htmp, - const std::complex* P_k, // <--- receives P_k - const std::complex* psi_k_laststep, - std::complex* psi_k) -{ - // Print message for debugging, should be removed later - std::cout << "Entering 
solve_propagation with moving gauge P_k..." << std::endl; - // (1) init A, B and compute HPtmp = Htmp + P_k - std::complex* operator_A = new std::complex[pv->nloc]; - std::complex* operator_B = new std::complex[pv->nloc]; - - // Add up Htmp and P_k to get the effective Hamiltonian matrix for moving spatial gauge - for (int i = 0; i < pv->nloc; ++i) - { - operator_A[i] = Htmp[i] + P_k[i]; - operator_B[i] = Htmp[i] + P_k[i]; - } - - const double dt_au = dt / ModuleBase::AU_to_FS; - - // ->>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - // (2) compute operator_A & operator_B by GEADD - // operator_A = Stmp + i*para * (Htmp + P_k); - // operator_B = Stmp - i*para * (Htmp + P_k); - std::complex alpha = {1.0, 0.0}; - std::complex beta1 = {0.0, -0.25 * dt_au}; - std::complex beta2 = {0.0, 0.25 * dt_au}; - - ScalapackConnector::geadd('N', nlocal, nlocal, alpha, Stmp, 1, 1, pv->desc, beta2, operator_A, 1, 1, pv->desc); - ScalapackConnector::geadd('N', nlocal, nlocal, alpha, Stmp, 1, 1, pv->desc, beta1, operator_B, 1, 1, pv->desc); - + ScalapackConnector::geadd('N', + nlocal, + nlocal, + alpha, + Stmp, + 1, + 1, + pv->desc, + beta2, + operator_A, + 1, + 1, + pv->desc); + ScalapackConnector::geadd('N', + nlocal, + nlocal, + alpha, + Stmp, + 1, + 1, + pv->desc, + beta1, + operator_B, + 1, + 1, + pv->desc); // ->>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> // (3) b = operator_B @ psi_k_laststep std::complex* tmp_b = new std::complex[pv->nloc_wfc]; ScalapackConnector::gemm('N', - 'N', - nlocal, - nband, - nlocal, - 1.0, - operator_B, - 1, - 1, - pv->desc, - psi_k_laststep, - 1, - 1, - pv->desc_wfc, - 0.0, - tmp_b, - 1, - 1, - pv->desc_wfc); - - // get ipiv + 'N', + nlocal, + nband, + nlocal, + 1.0, + operator_B, + 1, + 1, + pv->desc, + psi_k_laststep, + 1, + 1, + pv->desc_wfc, + 0.0, + tmp_b, + 1, + 1, + pv->desc_wfc); + //get ipiv 
int* ipiv = new int[pv->nloc]; int info = 0; - // (4) solve Ac=b - ScalapackConnector::gesv(nlocal, nband, operator_A, 1, 1, pv->desc, ipiv, tmp_b, 1, 1, pv->desc_wfc, &info); - - // copy solution to psi_k + ScalapackConnector::gesv(nlocal, + nband, + operator_A, + 1, + 1, + pv->desc, + ipiv, + tmp_b, + 1, + 1, + pv->desc_wfc, + &info); + + //copy solution to psi_k BlasConnector::copy(pv->nloc_wfc, tmp_b, 1, psi_k, 1); - delete[] tmp_b; - delete[] ipiv; - delete[] operator_A; - delete[] operator_B; + delete []tmp_b; + delete []ipiv; + delete []operator_A; + delete []operator_B; } #endif // __MPI } // namespace module_rt diff --git a/source/source_lcao/module_rt/solve_propagation.h b/source/source_lcao/module_rt/solve_propagation.h index 23a38ac25c0..309c552570d 100644 --- a/source/source_lcao/module_rt/solve_propagation.h +++ b/source/source_lcao/module_rt/solve_propagation.h @@ -2,47 +2,32 @@ #define TD_SOLVE_PROPAGATION_H #include "source_basis/module_ao/parallel_orbitals.h" - #include namespace module_rt { #ifdef __MPI /** - * @brief solve propagation equation A@c(t+dt) = B@c(t) - * - * @param[in] pv information of parallel - * @param[in] nband number of bands - * @param[in] nlocal number of orbitals - * @param[in] dt time interval - * @param[in] Stmp overlap matrix S(t+dt/2) - * @param[in] Htmp H(t+dt/2) - * @param[in] psi_k_laststep psi of last step - * @param[out] psi_k psi of this step - */ +* @brief solve propagation equation A@c(t+dt) = B@c(t) +* +* @param[in] pv information of parallel +* @param[in] nband number of bands +* @param[in] nlocal number of orbitals +* @param[in] dt time interval +* @param[in] Stmp overlap matrix S(t+dt/2) +* @param[in] Htmp H(t+dt/2) +* @param[in] psi_k_laststep psi of last step +* @param[out] psi_k psi of this step +*/ void solve_propagation(const Parallel_Orbitals* pv, - const int nband, - const int nlocal, - const double dt, - const std::complex* Stmp, - const std::complex* Htmp, - const std::complex* psi_k_laststep, - 
std::complex* psi_k); + const int nband, + const int nlocal, + const double dt, + const std::complex* Stmp, + const std::complex* Htmp, + const std::complex* psi_k_laststep, + std::complex* psi_k); -/** - * @brief solve propagation equation A@c(t+dt) = B@c(t) with moving spatial gauge P_k - * - * @param[in] P_k moving spatial gauge matrix - */ -void solve_propagation(const Parallel_Orbitals* pv, - const int nband, - const int nlocal, - const double dt, - const std::complex* Stmp, - const std::complex* Htmp, - const std::complex* P_k, - const std::complex* psi_k_laststep, - std::complex* psi_k); #endif } // namespace module_rt diff --git a/source/source_lcao/module_rt/td_moving_gauge.cpp b/source/source_lcao/module_rt/td_moving_gauge.cpp deleted file mode 100644 index 80b7915019a..00000000000 --- a/source/source_lcao/module_rt/td_moving_gauge.cpp +++ /dev/null @@ -1,308 +0,0 @@ -#include "td_moving_gauge.h" - -#include "source_base/global_function.h" -#include "source_base/libm/libm.h" // sincos - -namespace module_rt -{ - -TD_MovingGauge::~TD_MovingGauge() -{ - for (int i = 0; i < nat_; ++i) - { - delete DR_x_[i]; - delete DR_y_[i]; - delete DR_z_[i]; - } -} - -template -void TD_MovingGauge::init_DR(const hamilt::HContainer* sR_template, - const UnitCell* ucell, - const Parallel_Orbitals* paraV, - TwoCenterIntegrator* intor) -{ - nat_ = ucell->nat; - DR_x_.resize(nat_, nullptr); - DR_y_.resize(nat_, nullptr); - DR_z_.resize(nat_, nullptr); - - // 1. Allocate an HContainer for each atom - for (int K = 0; K < nat_; ++K) - { - DR_x_[K] = new hamilt::HContainer(paraV); - DR_y_[K] = new hamilt::HContainer(paraV); - DR_z_[K] = new hamilt::HContainer(paraV); - } - - // 2. 
Construct the sparsity pattern based on sR_template, only allocate terms where delta_{JK} is non-zero - for (int iap = 0; iap < sR_template->size_atom_pairs(); ++iap) - { - const auto& ap = sR_template->get_atom_pair(iap); - int iat1 = ap.get_atom_i(); - int iat2 = ap.get_atom_j(); // target ket atom J - - hamilt::AtomPair ap_x(iat1, iat2, paraV); - hamilt::AtomPair ap_y(iat1, iat2, paraV); - hamilt::AtomPair ap_z(iat1, iat2, paraV); - - for (int ir = 0; ir < ap.get_R_size(); ++ir) - { - auto R_idx = ap.get_R_index(ir); - ap_x.get_HR_values(R_idx.x, R_idx.y, R_idx.z); - ap_y.get_HR_values(R_idx.x, R_idx.y, R_idx.z); - ap_z.get_HR_values(R_idx.x, R_idx.y, R_idx.z); - } - - // Only insert this pair into the container corresponding to atom iat2 - DR_x_[iat2]->insert_pair(ap_x); - DR_y_[iat2]->insert_pair(ap_y); - DR_z_[iat2]->insert_pair(ap_z); - } - - // 3. Allocate memory for all DR_[K] containers - for (int K = 0; K < nat_; ++K) - { - DR_x_[K]->allocate(nullptr, true); - DR_y_[K]->allocate(nullptr, true); - DR_z_[K]->allocate(nullptr, true); - } - - // 4. 
Calculate and fill the R-space derivatives of the overlap matrix - int npol = ucell->get_npol(); - for (int iap = 0; iap < sR_template->size_atom_pairs(); ++iap) - { - const auto& ap = sR_template->get_atom_pair(iap); - int iat1 = ap.get_atom_i(); - int iat2 = ap.get_atom_j(); - - int T1 = ucell->iat2it[iat1]; - int T2 = ucell->iat2it[iat2]; - const Atom& atom1 = ucell->atoms[T1]; - const Atom& atom2 = ucell->atoms[T2]; - - auto row_indexes = paraV->get_indexes_row(iat1); - auto col_indexes = paraV->get_indexes_col(iat2); - - for (int ir = 0; ir < ap.get_R_size(); ++ir) - { - auto R_idx = ap.get_R_index(ir); - ModuleBase::Vector3 dtau = ucell->cal_dtau(iat1, iat2, R_idx); - - int R_arr[3] = {R_idx.x, R_idx.y, R_idx.z}; - double* dx = DR_x_[iat2]->data(iat1, iat2, R_arr); - double* dy = DR_y_[iat2]->data(iat1, iat2, R_arr); - double* dz = DR_z_[iat2]->data(iat1, iat2, R_arr); - - for (int iw1l = 0; iw1l < row_indexes.size(); iw1l += npol) - { - const int iw1 = row_indexes[iw1l] / npol; - int L1 = atom1.iw2l[iw1]; - int N1 = atom1.iw2n[iw1]; - int m1 = atom1.iw2m[iw1]; - int M1 = (m1 % 2 == 0) ? -m1 / 2 : (m1 + 1) / 2; - - for (int iw2l = 0; iw2l < col_indexes.size(); iw2l += npol) - { - const int iw2 = col_indexes[iw2l] / npol; - int L2 = atom2.iw2l[iw2]; - int N2 = atom2.iw2n[iw2]; - int m2 = atom2.iw2m[iw2]; - int M2 = (m2 % 2 == 0) ? 
-m2 / 2 : (m2 + 1) / 2; - - double olm[4] = {0.0, 0.0, 0.0, 0.0}; - // out stores the integral value in olm[0], grad_out stores the gradient in olm[1] to olm[3] - intor->calculate(T1, L1, N1, M1, T2, L2, N2, M2, dtau * ucell->lat0, &olm[0], &olm[1]); - - // Handle the spin dimension (the overlap is diagonal in spin space) - for (int is1 = 0; is1 < npol; ++is1) - { - for (int is2 = 0; is2 < npol; ++is2) - { - int r_offset = iw1l + is1; - int c_offset = iw2l + is2; - int linear_idx = r_offset * col_indexes.size() + c_offset; - - if (is1 == is2) - { - dx[linear_idx] = olm[1]; - dy[linear_idx] = olm[2]; - dz[linear_idx] = olm[3]; - } - else - { - dx[linear_idx] = 0.0; - dy[linear_idx] = 0.0; - dz[linear_idx] = 0.0; - } - } - } - } - } - } - } -} - -template -void TD_MovingGauge::update_DR(const hamilt::HContainer* sR_template, - const UnitCell* ucell, - const Parallel_Orbitals* paraV, - TwoCenterIntegrator* intor) -{ - // Update the R-space derivatives of the overlap matrix with the current atomic positions - int npol = ucell->get_npol(); - for (int iap = 0; iap < sR_template->size_atom_pairs(); ++iap) - { - const auto& ap = sR_template->get_atom_pair(iap); - int iat1 = ap.get_atom_i(); - int iat2 = ap.get_atom_j(); - - int T1 = ucell->iat2it[iat1]; - int T2 = ucell->iat2it[iat2]; - const Atom& atom1 = ucell->atoms[T1]; - const Atom& atom2 = ucell->atoms[T2]; - - auto row_indexes = paraV->get_indexes_row(iat1); - auto col_indexes = paraV->get_indexes_col(iat2); - - for (int ir = 0; ir < ap.get_R_size(); ++ir) - { - auto R_idx = ap.get_R_index(ir); - ModuleBase::Vector3 dtau = ucell->cal_dtau(iat1, iat2, R_idx); - - int R_arr[3] = {R_idx.x, R_idx.y, R_idx.z}; - double* dx = DR_x_[iat2]->data(iat1, iat2, R_arr); - double* dy = DR_y_[iat2]->data(iat1, iat2, R_arr); - double* dz = DR_z_[iat2]->data(iat1, iat2, R_arr); - - for (int iw1l = 0; iw1l < row_indexes.size(); iw1l += npol) - { - const int iw1 = row_indexes[iw1l] / npol; - int L1 = atom1.iw2l[iw1]; - int N1 = 
atom1.iw2n[iw1]; - int m1 = atom1.iw2m[iw1]; - int M1 = (m1 % 2 == 0) ? -m1 / 2 : (m1 + 1) / 2; - - for (int iw2l = 0; iw2l < col_indexes.size(); iw2l += npol) - { - const int iw2 = col_indexes[iw2l] / npol; - int L2 = atom2.iw2l[iw2]; - int N2 = atom2.iw2n[iw2]; - int m2 = atom2.iw2m[iw2]; - int M2 = (m2 % 2 == 0) ? -m2 / 2 : (m2 + 1) / 2; - - double olm[4] = {0.0, 0.0, 0.0, 0.0}; - intor->calculate(T1, L1, N1, M1, T2, L2, N2, M2, dtau * ucell->lat0, &olm[0], &olm[1]); - - for (int is1 = 0; is1 < npol; ++is1) - { - for (int is2 = 0; is2 < npol; ++is2) - { - int r_offset = iw1l + is1; - int c_offset = iw2l + is2; - int linear_idx = r_offset * col_indexes.size() + c_offset; - - if (is1 == is2) - { - dx[linear_idx] = olm[1]; - dy[linear_idx] = olm[2]; - dz[linear_idx] = olm[3]; - } - else - { - dx[linear_idx] = 0.0; - dy[linear_idx] = 0.0; - dz[linear_idx] = 0.0; - } - } - } - } - } - } - } -} - -template -void TD_MovingGauge::get_D_k(int K, const ModuleBase::Vector3& kvec_d, TK* Dk_x, TK* Dk_y, TK* Dk_z, int hk_ld) - const -{ - hamilt::folding_HR(*(DR_x_[K]), Dk_x, kvec_d, hk_ld, 1); - hamilt::folding_HR(*(DR_y_[K]), Dk_y, kvec_d, hk_ld, 1); - hamilt::folding_HR(*(DR_z_[K]), Dk_z, kvec_d, hk_ld, 1); -} - -template -void TD_MovingGauge::get_P_k(const UnitCell* ucell, - const ModuleBase::Vector3& kvec_d, - TK* P_k, - int matrix_size, - int hk_ld) const -{ - std::vector Dk_x(matrix_size, TK(0.0, 0.0)); - std::vector Dk_y(matrix_size, TK(0.0, 0.0)); - std::vector Dk_z(matrix_size, TK(0.0, 0.0)); - - for (int K = 0; K < nat_; ++K) - { - std::fill(Dk_x.begin(), Dk_x.end(), TK(0.0, 0.0)); - std::fill(Dk_y.begin(), Dk_y.end(), TK(0.0, 0.0)); - std::fill(Dk_z.begin(), Dk_z.end(), TK(0.0, 0.0)); - - this->get_D_k(K, kvec_d, Dk_x.data(), Dk_y.data(), Dk_z.data(), hk_ld); - - // Obtain the real-time velocity of atom K from the UnitCell (in Hartree atomic units) - int it = ucell->iat2it[K]; - int ia = ucell->iat2ia[K]; - double vx = ucell->atoms[it].vel[ia].x; - double vy = 
ucell->atoms[it].vel[ia].y; - double vz = ucell->atoms[it].vel[ia].z; - - // Construct the coefficients: P = -i * v * D - // Unit conversion: Hartree a.u. to Rydberg a.u. requires multiplying - TK coef_x(0.0, -2.0 * vx); - TK coef_y(0.0, -2.0 * vy); - TK coef_z(0.0, -2.0 * vz); - - // Accumulate the contribution from atom K to the P_k matrix - for (int i = 0; i < matrix_size; ++i) - { - P_k[i] += coef_x * Dk_x[i] + coef_y * Dk_y[i] + coef_z * Dk_z[i]; - } - } -} - -template void TD_MovingGauge::init_DR(const hamilt::HContainer* sR_template, - const UnitCell* ucell, - const Parallel_Orbitals* paraV, - TwoCenterIntegrator* intor); - -template void TD_MovingGauge::init_DR>(const hamilt::HContainer>* sR_template, - const UnitCell* ucell, - const Parallel_Orbitals* paraV, - TwoCenterIntegrator* intor); - -template void TD_MovingGauge::update_DR(const hamilt::HContainer* sR_template, - const UnitCell* ucell, - const Parallel_Orbitals* paraV, - TwoCenterIntegrator* intor); - -template void TD_MovingGauge::update_DR>( - const hamilt::HContainer>* sR_template, - const UnitCell* ucell, - const Parallel_Orbitals* paraV, - TwoCenterIntegrator* intor); - -template void TD_MovingGauge::get_D_k>(int K, - const ModuleBase::Vector3& kvec_d, - std::complex* Dk_x, - std::complex* Dk_y, - std::complex* Dk_z, - int hk_ld) const; - -template void TD_MovingGauge::get_P_k>(const UnitCell* ucell, - const ModuleBase::Vector3& kvec_d, - std::complex* P_k, - int matrix_size, - int hk_ld) const; - -} // namespace module_rt diff --git a/source/source_lcao/module_rt/td_moving_gauge.h b/source/source_lcao/module_rt/td_moving_gauge.h deleted file mode 100644 index 449d800ba2e..00000000000 --- a/source/source_lcao/module_rt/td_moving_gauge.h +++ /dev/null @@ -1,61 +0,0 @@ -#ifndef TD_MOVING_GAUGE_H -#define TD_MOVING_GAUGE_H - -#include "source_basis/module_ao/parallel_orbitals.h" -#include "source_basis/module_nao/two_center_integrator.h" -#include "source_cell/unitcell.h" -#include 
"source_lcao/module_hcontainer/hcontainer.h" -#include "source_lcao/module_hcontainer/hcontainer_funcs.h" - -#include -#include - -namespace module_rt -{ - -class TD_MovingGauge -{ - public: - TD_MovingGauge() = default; - ~TD_MovingGauge(); - - // Initialize the R-space derivative matrices D_R (x, y, z) - // using the provided sR_template for consistent sparse atomic pair topology - // D_{K,\mu\nu}(R) = <\phi_{\mu 0}|∂\phi_{\nu R}/∂\tau_K> where tau_K is the position of atom K - template - void init_DR(const hamilt::HContainer* sR_template, - const UnitCell* ucell, - const Parallel_Orbitals* paraV, - TwoCenterIntegrator* intor); - - // Update the R-space matrix D_R (x, y, z) - template - void update_DR(const hamilt::HContainer* sR_template, - const UnitCell* ucell, - const Parallel_Orbitals* paraV, - TwoCenterIntegrator* intor); - - // Fourier transform D(R) to D(k) - // Note: folding_HR performs an accumulation (+=) operation, need to ensure Dk matrices are zeroed before calling - // D_{K,\mu\nu}(k) = \sum_R e^{ikR} D_{K,\mu\nu}(R) - template - void get_D_k(int K, const ModuleBase::Vector3& kvec_d, TK* Dk_x, TK* Dk_y, TK* Dk_z, int hk_ld) const; - - // Calculate the moving spatial gauge matrix P_k and accumulate it to the input P_k matrix - // Note: The unit is converted to Rydberg atomic units, and multiplied by 2 internally - // P_{\mu\nu}(k) = -i \sum_K vel_K \cdot D_{K,\mu\nu}(k) where vel_K is the velocity of atom K - template - void get_P_k(const UnitCell* ucell, const ModuleBase::Vector3& kvec_d, TK* P_k, int matrix_size, int hk_ld) - const; - - private: - int nat_ = 0; - - std::vector*> DR_x_; - std::vector*> DR_y_; - std::vector*> DR_z_; -}; - -} // namespace module_rt - -#endif // TD_MOVING_GAUGE_H diff --git a/source/source_main/driver.cpp b/source/source_main/driver.cpp index c22e4d08fba..2c16cac42ae 100644 --- a/source/source_main/driver.cpp +++ b/source/source_main/driver.cpp @@ -28,7 +28,6 @@ void Driver::init() // 2) Print the current time, since 
it may run a long time. time_t time_start = std::time(nullptr); - ModuleBase::timer::start(); // 3) Welcome to the atomic world! Let's do some fancy stuff here. this->atomic_world(); diff --git a/source/source_pw/module_pwdft/deltaspin_pw.cpp b/source/source_pw/module_pwdft/deltaspin_pw.cpp index caf8ea7852f..ae80d8337b2 100644 --- a/source/source_pw/module_pwdft/deltaspin_pw.cpp +++ b/source/source_pw/module_pwdft/deltaspin_pw.cpp @@ -29,10 +29,11 @@ bool run_deltaspin_lambda_loop(const int iter, return true; } /// Case 2: Magnetic moments already converged in previous iteration. - /// Continue to refine lambda in subsequent SCF iterations. + /// The lambda values and charge density were already updated in Case 1. + /// Skip the solver so the SCF can converge with the existing charge density. + /// Re-running the lambda loop would re-update the charge density and disrupt SCF mixing. else if (sc.mag_converged()) { - sc.run_lambda_loop(iter); return true; } diff --git a/source/source_pw/module_pwdft/dftu_pw.cpp b/source/source_pw/module_pwdft/dftu_pw.cpp index 475a34620a8..8ad75288f57 100644 --- a/source/source_pw/module_pwdft/dftu_pw.cpp +++ b/source/source_pw/module_pwdft/dftu_pw.cpp @@ -5,14 +5,14 @@ namespace pw { void iter_init_dftu_pw(const int iter, - const int istep, - Plus_U& dftu, - const void* psi, - const ModuleBase::matrix& wg, - const UnitCell& ucell, - const Input_para& inp) + const int istep, + Plus_U& dftu, + const void* psi, + const ModuleBase::matrix& wg, + const UnitCell& ucell, + Charge_Mixing* p_chgmix) { - if (!inp.dft_plus_u) + if (!p_chgmix || !PARAM.inp.dft_plus_u) { return; } @@ -24,7 +24,7 @@ void iter_init_dftu_pw(const int iter, if (dftu.omc != 2) { - dftu.cal_occ_pw(iter, psi, wg, ucell, inp.mixing_beta); + dftu.cal_occ_pw(iter, psi, wg, ucell, p_chgmix); } dftu.output(ucell); } diff --git a/source/source_pw/module_pwdft/dftu_pw.h b/source/source_pw/module_pwdft/dftu_pw.h index 8a30b04e766..96c67ef4b08 100644 --- 
a/source/source_pw/module_pwdft/dftu_pw.h +++ b/source/source_pw/module_pwdft/dftu_pw.h @@ -4,6 +4,7 @@ #include "source_io/module_parameter/parameter.h" #include "source_cell/unitcell.h" #include "source_base/matrix.h" +#include "source_estate/module_charge/charge_mixing.h" class Plus_U; @@ -16,7 +17,7 @@ void iter_init_dftu_pw(const int iter, const void* psi, const ModuleBase::matrix& wg, const UnitCell& ucell, - const Input_para& inp); + Charge_Mixing* p_chgmix); } diff --git a/source/source_pw/module_pwdft/forces.cpp b/source/source_pw/module_pwdft/forces.cpp index 6888d89dacc..f608a7f4a8a 100644 --- a/source/source_pw/module_pwdft/forces.cpp +++ b/source/source_pw/module_pwdft/forces.cpp @@ -49,6 +49,7 @@ void Forces::cal_force(UnitCell& ucell, ModuleBase::matrix forcecc(nat, 3); ModuleBase::matrix forcenl(nat, 3); ModuleBase::matrix forcescc(nat, 3); + ModuleBase::matrix forcepaw(nat, 3); ModuleBase::matrix forceonsite(nat, 3); // Force due to local ionic potential diff --git a/source/source_pw/module_pwdft/forces_onsite.cpp b/source/source_pw/module_pwdft/forces_onsite.cpp index afa5cfcfe94..40d49fa20ba 100644 --- a/source/source_pw/module_pwdft/forces_onsite.cpp +++ b/source/source_pw/module_pwdft/forces_onsite.cpp @@ -12,7 +12,7 @@ void Forces::cal_force_onsite(ModuleBase::matrix& force_onsite, const ModuleBase::matrix& wg, const ModulePW::PW_Basis_K* wfc_basis, const UnitCell& ucell_in, - const Plus_U &dftu, // mohan add 2025-11-06 + const Plus_U &dftu, const psi::Psi , Device>* psi_in) { ModuleBase::TITLE("Forces", "cal_force_onsite"); @@ -22,7 +22,6 @@ void Forces::cal_force_onsite(ModuleBase::matrix& force_onsite, } ModuleBase::timer::start("Forces", "cal_force_onsite"); - // allocate memory for the force FPTYPE* force = nullptr; resmem_var_op()(force, ucell_in.nat * 3); base_device::memory::set_memory_op()(force, 0.0, ucell_in.nat * 3); @@ -30,9 +29,8 @@ void Forces::cal_force_onsite(ModuleBase::matrix& force_onsite, auto* onsite_p = 
projectors::OnsiteProjector::get_instance(); const int nks = wfc_basis->nks; - for (int ik = 0; ik < nks; ik++) // loop k points + for (int ik = 0; ik < nks; ik++) { - // skip zero weights to speed up int nbands_occ = wg.nc; while (wg(ik, nbands_occ - 1) == 0.0) { @@ -44,32 +42,25 @@ void Forces::cal_force_onsite(ModuleBase::matrix& force_onsite, } const int npm = nbands_occ; onsite_p->get_fs_tools()->cal_becp(ik, npm); - // calculate becp = for all beta functions for (int ipol = 0; ipol < 3; ipol++) { - // calculate dbecp = for all beta functions onsite_p->get_fs_tools()->cal_dbecp_f(ik, npm, ipol); } - // calculate the force_i = \sum_{n,k}f_{nk}\sum_I \sum_{lm,l'm'}D_{l,l'}^{I} becp * dbecp_i - // force for DFT+U if(PARAM.inp.dft_plus_u) { - onsite_p->get_fs_tools()->cal_force_dftu(ik, npm, force, - dftu.orbital_corr.data(), dftu.get_eff_pot_pw(0), dftu.get_size_eff_pot_pw(), wg.c); + onsite_p->cal_force_onsite_dftu(ik, npm, force, dftu, nks, wg.c); } if(PARAM.inp.sc_mag_switch) { spinconstrain::SpinConstrain>& sc = spinconstrain::SpinConstrain>::getScInstance(); - const std::vector>& lambda = sc.get_sc_lambda(); - onsite_p->get_fs_tools()->cal_force_dspin(ik, npm, force, lambda.data(), wg.c); + onsite_p->cal_force_onsite_dspin(ik, npm, force, sc.get_sc_lambda().data(), wg.c); } - } // end ik + } syncmem_var_d2h_op()(force_onsite.c, force, force_onsite.nr * force_onsite.nc); delmem_var_op()(force); - // sum up force_onsite from all processors Parallel_Reduce::reduce_all(force_onsite.c, force_onsite.nr * force_onsite.nc); ModuleBase::timer::end("Forces", "cal_force_onsite"); diff --git a/source/source_pw/module_pwdft/hamilt_pw.cpp b/source/source_pw/module_pwdft/hamilt_pw.cpp index 152e3451428..27a56cbe11b 100644 --- a/source/source_pw/module_pwdft/hamilt_pw.cpp +++ b/source/source_pw/module_pwdft/hamilt_pw.cpp @@ -272,7 +272,7 @@ void HamiltPW::sPsi(const T* psi_in, // psi this->ppcell->nkb, &one, this->vkb, - this->ppcell->vkbnc, + this->ppcell->vkb.nc, psi_in, 
inc, &zero, @@ -288,7 +288,7 @@ void HamiltPW::sPsi(const T* psi_in, // psi npw, &one, this->vkb, - this->ppcell->vkbnc, + this->ppcell->vkb.nc, psi_in, nrow, &zero, @@ -360,7 +360,7 @@ void HamiltPW::sPsi(const T* psi_in, // psi this->ppcell->nkb, &one, this->vkb, - this->ppcell->vkbnc, + this->ppcell->vkb.nc, ps, inc, &one, @@ -376,7 +376,7 @@ void HamiltPW::sPsi(const T* psi_in, // psi this->ppcell->nkb, &one, this->vkb, - this->ppcell->vkbnc, + this->ppcell->vkb.nc, ps, this->ppcell->nkb, &one, diff --git a/source/source_pw/module_pwdft/kernels/cuda/force_op.cu b/source/source_pw/module_pwdft/kernels/cuda/force_op.cu index 1466ba47acc..f5e9c1f4ac6 100644 --- a/source/source_pw/module_pwdft/kernels/cuda/force_op.cu +++ b/source/source_pw/module_pwdft/kernels/cuda/force_op.cu @@ -434,7 +434,7 @@ __global__ void cal_force_onsite(int wg_nc, const thrust::complex dbb3 = conj(dbecp[inkb0 + nkb]) * becp[inkb + nkb]; const FPTYPE tmp = -fac - * (coefficients0 * dbb0 + coefficients1 * dbb1 + coefficients2 * dbb2 + coefficients3 * dbb3) + * (coefficients0 * dbb0 + coefficients1 * dbb2 + coefficients2 * dbb1 + coefficients3 * dbb3) .real(); atomicAdd(force + iat * forcenl_nc + ipol, tmp); } @@ -454,6 +454,7 @@ void cal_force_nl_op::operator()(const base_dev const int& nbands, const int& ik, const int& nkb, + const int& npol, const int* atom_nh, const int* atom_na, const FPTYPE& tpiba, @@ -493,6 +494,7 @@ void cal_force_nl_op::operator()(const base_dev const int& nbands, const int& ik, const int& nkb, + const int& npol, const int* atom_nh, const int* atom_na, const FPTYPE& tpiba, diff --git a/source/source_pw/module_pwdft/kernels/cuda/onsite_op.cu b/source/source_pw/module_pwdft/kernels/cuda/onsite_op.cu index 68aee02047d..35ca4f77f74 100644 --- a/source/source_pw/module_pwdft/kernels/cuda/onsite_op.cu +++ b/source/source_pw/module_pwdft/kernels/cuda/onsite_op.cu @@ -20,15 +20,28 @@ __global__ void onsite_op(const int npm, const thrust::complex* becp) { const int ip = 
blockIdx.x; - const int nbands = npm / npol; - for (int ib = threadIdx.x; ib < nbands; ib += blockDim.x) + if(npol == 2) { - int ib2 = ib * npol; - int iat = ip_iat[ip]; - const int psind = ip * npm + ib2; - const int becpind = ib2 * tnp + ip; - ps[psind] += lambda_coeff[iat * 4] * becp[becpind] + lambda_coeff[iat * 4 + 2] * becp[becpind + tnp]; - ps[psind + 1] += lambda_coeff[iat * 4 + 1] * becp[becpind] + lambda_coeff[iat * 4 + 3] * becp[becpind + tnp]; + const int nbands = npm / npol; + for (int ib = threadIdx.x; ib < nbands; ib += blockDim.x) + { + int ib2 = ib * npol; + int iat = ip_iat[ip]; + const int psind = ip * npm + ib2; + const int becpind = ib2 * tnp + ip; + ps[psind] += lambda_coeff[iat * 4] * becp[becpind] + lambda_coeff[iat * 4 + 2] * becp[becpind + tnp]; + ps[psind + 1] += lambda_coeff[iat * 4 + 1] * becp[becpind] + lambda_coeff[iat * 4 + 3] * becp[becpind + tnp]; + } + } + else // npol == 1 + { + for (int ib = threadIdx.x; ib < npm; ib += blockDim.x) + { + int iat = ip_iat[ip]; + const int psind = ip * npm + ib; + const int becpind = ib * tnp + ip; + ps[psind] += lambda_coeff[iat] * becp[becpind]; + } } } @@ -48,26 +61,49 @@ __global__ void onsite_op(const int npm, int m1 = ip_m[ip]; if (m1 >= 0) { - const int nbands = npm / npol; - for (int ib = threadIdx.x; ib < nbands; ib += blockDim.x) + if (npol == 2) { - int ib2 = ib * npol; - int iat = ip_iat[ip]; - const thrust::complex* vu_iat = vu + vu_begin_iat[iat]; - int orb_l = orb_l_iat[iat]; - int tlp1 = 2 * orb_l + 1; - int tlp1_2 = tlp1 * tlp1; - int ip2_begin = ip - m1; - int ip2_end = ip - m1 + tlp1; - const int psind = ip * npm + ib2; - for (int ip2 = ip2_begin; ip2 < ip2_end; ip2++) + const int nbands = npm / npol; + for (int ib = threadIdx.x; ib < nbands; ib += blockDim.x) + { + int ib2 = ib * npol; + int iat = ip_iat[ip]; + const thrust::complex* vu_iat = vu + vu_begin_iat[iat]; + int orb_l = orb_l_iat[iat]; + int tlp1 = 2 * orb_l + 1; + int tlp1_2 = tlp1 * tlp1; + int ip2_begin = ip - m1; 
+ int ip2_end = ip - m1 + tlp1; + const int psind = ip * npm + ib2; + for (int ip2 = ip2_begin; ip2 < ip2_end; ip2++) + { + const int becpind = ib2 * tnp + ip2; + int m2 = ip_m[ip2]; + const int index_mm = m1 * tlp1 + m2; + ps[psind] += vu_iat[index_mm] * becp[becpind] + vu_iat[index_mm + tlp1_2 * 2] * becp[becpind + tnp]; + ps[psind + 1] += vu_iat[index_mm + tlp1_2 * 1] * becp[becpind] + + vu_iat[index_mm + tlp1_2 * 3] * becp[becpind + tnp]; + } + } + } + else // npol == 1, nspin=1 or nspin=2 + { + for (int ib = threadIdx.x; ib < npm; ib += blockDim.x) { - const int becpind = ib2 * tnp + ip2; - int m2 = ip_m[ip2]; - const int index_mm = m1 * tlp1 + m2; - ps[psind] += vu_iat[index_mm] * becp[becpind] + vu_iat[index_mm + tlp1_2 * 2] * becp[becpind + tnp]; - ps[psind + 1] += vu_iat[index_mm + tlp1_2 * 1] * becp[becpind] - + vu_iat[index_mm + tlp1_2 * 3] * becp[becpind + tnp]; + int iat = ip_iat[ip]; + const thrust::complex* vu_iat = vu + vu_begin_iat[iat]; + int orb_l = orb_l_iat[iat]; + int tlp1 = 2 * orb_l + 1; + int ip2_begin = ip - m1; + int ip2_end = ip - m1 + tlp1; + const int psind = ip * npm + ib; + for (int ip2 = ip2_begin; ip2 < ip2_end; ip2++) + { + const int becpind = ib * tnp + ip2; + int m2 = ip_m[ip2]; + const int index_mm = m1 * tlp1 + m2; + ps[psind] += vu_iat[index_mm] * becp[becpind]; + } } } } diff --git a/source/source_pw/module_pwdft/kernels/cuda/stress_op.cu b/source/source_pw/module_pwdft/kernels/cuda/stress_op.cu index 58a8e219e5c..df08221e361 100644 --- a/source/source_pw/module_pwdft/kernels/cuda/stress_op.cu +++ b/source/source_pw/module_pwdft/kernels/cuda/stress_op.cu @@ -1031,7 +1031,7 @@ __global__ void cal_stress_onsite( const thrust::complex dbb1 = conj(dbecp[inkb]) * becp[inkb + nkb]; const thrust::complex dbb2 = conj(dbecp[inkb + nkb]) * becp[inkb]; const thrust::complex dbb3 = conj(dbecp[inkb + nkb]) * becp[inkb + nkb]; - stress_var -= fac * (coefficients0 * dbb0 + coefficients1 * dbb1 + coefficients2 * dbb2 + coefficients3 * 
dbb3).real(); + stress_var -= fac * (coefficients0 * dbb0 + coefficients1 * dbb2 + coefficients2 * dbb1 + coefficients3 * dbb3).real(); } ++iat; sum+=nprojs; @@ -1051,6 +1051,7 @@ void cal_stress_nl_op::operator()(const base_de const int& ntype, const int& wg_nc, const int& ik, + const int& npol, const int* atom_nh, const int* atom_na, const FPTYPE* d_wg, @@ -1084,6 +1085,7 @@ void cal_stress_nl_op::operator()(const base_de const int& ntype, const int& wg_nc, const int& ik, + const int& npol, const int* atom_nh, const int* atom_na, const FPTYPE* d_wg, diff --git a/source/source_pw/module_pwdft/kernels/force_op.cpp b/source/source_pw/module_pwdft/kernels/force_op.cpp index 0e0c34ccdde..cc7823f3ec3 100644 --- a/source/source_pw/module_pwdft/kernels/force_op.cpp +++ b/source/source_pw/module_pwdft/kernels/force_op.cpp @@ -292,6 +292,7 @@ struct cal_force_nl_op const int& nbands, const int& ik, const int& nkb, + const int& npol, const int* atom_nh, const int* atom_na, const FPTYPE& tpiba, @@ -321,7 +322,7 @@ struct cal_force_nl_op { for (int ib = 0; ib < nbands_occ; ib++) { - const int ib2 = ib*2; + const int ib2 = ib*npol; FPTYPE local_force[3] = {0, 0, 0}; FPTYPE fac = d_wg[ik * wg_nc + ib] * 2.0 * tpiba; int iat = iat0 + ia; @@ -330,36 +331,47 @@ struct cal_force_nl_op { const int inkb = sum + ip; const int m = ip - ip_begin; - // out<<"\n ps = "< ps[4]; - for(int i = 0; i < 4; i++) + if(npol == 2) { - ps[i] = vu[(i * tlp1_2 + m * tlp1 + m2)]; - } + std::complex ps[4]; + for(int i = 0; i < 4; i++) + { + ps[i] = vu[(i * tlp1_2 + m * tlp1 + m2)]; + } - for (int ipol = 0; ipol < 3; ipol++) - { - const int index0 = ipol * nbands * 2 * nkb + ib2 * nkb + inkb; - const int index1 = ib2 * nkb + jnkb; - const std::complex dbb0 = conj(dbecp[index0]) * becp[index1]; - const std::complex dbb1 = conj(dbecp[index0]) * becp[index1 + nkb]; - const std::complex dbb2 = conj(dbecp[index0 + nkb]) * becp[index1]; - const std::complex dbb3 = conj(dbecp[index0 + nkb]) * becp[index1 + 
nkb]; + for (int iforce = 0; iforce < 3; iforce++) + { + const int index0 = iforce * nbands * npol * nkb + ib2 * nkb + inkb; + const int index1 = ib2 * nkb + jnkb; + const std::complex dbb0 = conj(dbecp[index0]) * becp[index1]; + const std::complex dbb1 = conj(dbecp[index0]) * becp[index1 + nkb]; + const std::complex dbb2 = conj(dbecp[index0 + nkb]) * becp[index1]; + const std::complex dbb3 = conj(dbecp[index0 + nkb]) * becp[index1 + nkb]; - local_force[ipol] -= fac * (ps[0] * dbb0 + ps[1] * dbb1 + ps[2] * dbb2 + ps[3] * dbb3).real(); + local_force[iforce] -= fac * (ps[0] * dbb0 + ps[1] * dbb1 + ps[2] * dbb2 + ps[3] * dbb3).real(); + } + } + else if(npol == 1) + { + for (int iforce = 0; iforce < 3; iforce++) + { + const int index0 = iforce * nbands * npol * nkb + ib2 * nkb + inkb; + const int index1 = ib2 * nkb + jnkb; + local_force[iforce] -= fac * (vu[(m * tlp1 + m2)] * conj(dbecp[index0]) * becp[index1]).real(); + } } } } - for (int ipol = 0; ipol < 3; ++ipol) + for (int iforce = 0; iforce < 3; ++iforce) { - force[iat * forcenl_nc + ipol] += local_force[ipol]; + force[iat * forcenl_nc + iforce] += local_force[iforce]; } } - vu += 4 * tlp1_2;// step for vu + vu += npol * npol * tlp1_2;// step for vu } // end ia iat0 += atom_na[it]; sum0 += atom_na[it] * nproj; @@ -374,6 +386,7 @@ struct cal_force_nl_op const int& nbands, const int& ik, const int& nkb, + const int& npol, const int* atom_nh, const int* atom_na, const FPTYPE& tpiba, @@ -398,25 +411,43 @@ struct cal_force_nl_op const std::complex coefficients3(-1 * lambda[iat*3+2], 0.0); for (int ib = 0; ib < nbands_occ; ib++) { - const int ib2 = ib*2; FPTYPE local_force[3] = {0, 0, 0}; FPTYPE fac = d_wg[ik * wg_nc + ib] * 2.0 * tpiba; - for (int ip = 0; ip < nproj; ip++) + if (npol == 2) { - const int inkb = sum + ip; + const int ib2 = ib * 2; + for (int ip = 0; ip < nproj; ip++) + { + const int inkb = sum + ip; - for (int ipol = 0; ipol < 3; ipol++) + for (int ipol = 0; ipol < 3; ipol++) + { + const int index0 = 
ipol * nbands * 2 * nkb + ib2 * nkb + inkb; + const int index1 = ib2 * nkb + inkb; + const std::complex dbb0 = conj(dbecp[index0]) * becp[index1]; + const std::complex dbb1 = conj(dbecp[index0]) * becp[index1 + nkb]; + const std::complex dbb2 = conj(dbecp[index0 + nkb]) * becp[index1]; + const std::complex dbb3 = conj(dbecp[index0 + nkb]) * becp[index1 + nkb]; + + local_force[ipol] -= fac * (coefficients0 * dbb0 + coefficients1 * dbb2 + coefficients2 * dbb1 + coefficients3 * dbb3).real(); + } + } // ip + } + else if (npol == 1) + { + for (int ip = 0; ip < nproj; ip++) { - const int index0 = ipol * nbands * 2 * nkb + ib2 * nkb + inkb; - const int index1 = ib2 * nkb + inkb; - const std::complex dbb0 = conj(dbecp[index0]) * becp[index1]; - const std::complex dbb1 = conj(dbecp[index0]) * becp[index1 + nkb]; - const std::complex dbb2 = conj(dbecp[index0 + nkb]) * becp[index1]; - const std::complex dbb3 = conj(dbecp[index0 + nkb]) * becp[index1 + nkb]; + const int inkb = sum + ip; - local_force[ipol] -= fac * (coefficients0 * dbb0 + coefficients1 * dbb1 + coefficients2 * dbb2 + coefficients3 * dbb3).real(); - } - }//ip + for (int ipol = 0; ipol < 3; ipol++) + { + const int index0 = ipol * nbands * nkb + ib * nkb + inkb; + const int index1 = ib * nkb + inkb; + const FPTYPE dbb = (conj(dbecp[index0]) * becp[index1]).real(); + local_force[ipol] -= fac * lambda[iat*3+2] * dbb; + } + } // ip + } for (int ipol = 0; ipol < 3; ++ipol) { force[iat * forcenl_nc + ipol] += local_force[ipol]; diff --git a/source/source_pw/module_pwdft/kernels/force_op.h b/source/source_pw/module_pwdft/kernels/force_op.h index e31721913c7..67c2e85f625 100644 --- a/source/source_pw/module_pwdft/kernels/force_op.h +++ b/source/source_pw/module_pwdft/kernels/force_op.h @@ -121,6 +121,7 @@ struct cal_force_nl_op const int& nbands, const int& ik, const int& nkb, + const int& npol, const int* atom_nh, const int* atom_na, const FPTYPE& tpiba, @@ -139,6 +140,7 @@ struct cal_force_nl_op const int& nbands, 
const int& ik, const int& nkb, + const int& npol, const int* atom_nh, const int* atom_na, const FPTYPE& tpiba, @@ -250,6 +252,7 @@ struct cal_force_nl_op const int& nbands, const int& ik, const int& nkb, + const int& npol, const int* atom_nh, const int* atom_na, const FPTYPE& tpiba, @@ -268,6 +271,7 @@ struct cal_force_nl_op const int& nbands, const int& ik, const int& nkb, + const int& npol, const int* atom_nh, const int* atom_na, const FPTYPE& tpiba, diff --git a/source/source_pw/module_pwdft/kernels/onsite_op.cpp b/source/source_pw/module_pwdft/kernels/onsite_op.cpp index c9d7d14432c..8ac4e8fb846 100644 --- a/source/source_pw/module_pwdft/kernels/onsite_op.cpp +++ b/source/source_pw/module_pwdft/kernels/onsite_op.cpp @@ -16,23 +16,42 @@ struct onsite_ps_op std::complex* ps, const std::complex* becp) { + if(npol == 2) + { #ifdef _OPENMP #pragma omp parallel for collapse(2) #endif - for (int ib = 0; ib < npm / npol; ib++) + for (int ib = 0; ib < npm / npol; ib++) + { + for (int ip = 0; ip < tnp; ip++) + { + int ib2 = ib * npol; + int iat = ip_iat[ip]; + const int psind = ip * npm + ib2; + const int becpind = ib2 * tnp + ip; + ps[psind] += lambda_array[iat * 4] * becp[becpind] + + lambda_array[iat * 4 + 2] * becp[becpind + tnp]; + ps[psind + 1] += lambda_array[iat * 4 + 1] * becp[becpind] + + lambda_array[iat * 4 + 3] * becp[becpind + tnp]; + } // end ip + } // end ib + } + else // npol == 1, nspin=1 or nspin=2 { - for (int ip = 0; ip < tnp; ip++) +#ifdef _OPENMP +#pragma omp parallel for collapse(2) +#endif + for (int ib = 0; ib < npm; ib++) { - int ib2 = ib * npol; - int iat = ip_iat[ip]; - const int psind = ip * npm + ib2; - const int becpind = ib2 * tnp + ip; - ps[psind] += lambda_array[iat * 4] * becp[becpind] - + lambda_array[iat * 4 + 2] * becp[becpind + tnp]; - ps[psind + 1] += lambda_array[iat * 4 + 1] * becp[becpind] - + lambda_array[iat * 4 + 3] * becp[becpind + tnp]; - } // end ip - } // end ib + for (int ip = 0; ip < tnp; ip++) + { + int iat = 
ip_iat[ip]; + const int psind = ip * npm + ib; + const int becpind = ib * tnp + ip; + ps[psind] += lambda_array[iat] * becp[becpind]; + } // end ip + } // end ib + } }; // kernel for DFT+U calculation @@ -48,6 +67,8 @@ struct onsite_ps_op std::complex* ps, const std::complex* becp) { + if(npol == 2) + { #ifdef _OPENMP #pragma omp parallel for collapse(2) #endif @@ -78,6 +99,35 @@ struct onsite_ps_op } } // end ip } // end ib + } + else // npol == 1, nspin=1 or nspin=2 + { +#ifdef _OPENMP +#pragma omp parallel for collapse(2) +#endif + for (int ib = 0; ib < npm; ib++) + { + for (int ip = 0; ip < tnp; ip++) + { + int m1 = ip_m[ip]; + if(m1 < 0) continue; + int iat = ip_iat[ip]; + const std::complex* vu_iat = vu + vu_begin_iat[iat]; + int orb_l = orb_l_iat[iat]; + int tlp1 = 2 * orb_l + 1; + int ip2_begin = ip - m1; + int ip2_end = ip - m1 + tlp1; + const int psind = ip * npm + ib; + for(int ip2 = ip2_begin;ip2 dbb3 = conj(dbecp[inkb0 + nkb]) * becp[inkb + nkb]; const FPTYPE tmp = -fac - * (coefficients0 * dbb0 + coefficients1 * dbb1 + coefficients2 * dbb2 + coefficients3 * dbb3) + * (coefficients0 * dbb0 + coefficients1 * dbb2 + coefficients2 * dbb1 + coefficients3 * dbb3) .real(); atomicAdd(force + iat * forcenl_nc + ipol, tmp); } diff --git a/source/source_pw/module_pwdft/kernels/rocm/onsite_op.hip.cu b/source/source_pw/module_pwdft/kernels/rocm/onsite_op.hip.cu index 0826368deac..e3871aa95cb 100644 --- a/source/source_pw/module_pwdft/kernels/rocm/onsite_op.hip.cu +++ b/source/source_pw/module_pwdft/kernels/rocm/onsite_op.hip.cu @@ -20,15 +20,28 @@ __global__ void onsite_op(const int npm, const thrust::complex* becp) { const int ip = blockIdx.x; - const int nbands = npm / npol; - for (int ib = threadIdx.x; ib < nbands; ib += blockDim.x) + if(npol == 2) { - int ib2 = ib * npol; - int iat = ip_iat[ip]; - const int psind = ip * npm + ib2; - const int becpind = ib2 * tnp + ip; - ps[psind] += lambda_coeff[iat * 4] * becp[becpind] + lambda_coeff[iat * 4 + 2] * 
becp[becpind + tnp]; - ps[psind + 1] += lambda_coeff[iat * 4 + 1] * becp[becpind] + lambda_coeff[iat * 4 + 3] * becp[becpind + tnp]; + const int nbands = npm / npol; + for (int ib = threadIdx.x; ib < nbands; ib += blockDim.x) + { + int ib2 = ib * npol; + int iat = ip_iat[ip]; + const int psind = ip * npm + ib2; + const int becpind = ib2 * tnp + ip; + ps[psind] += lambda_coeff[iat * 4] * becp[becpind] + lambda_coeff[iat * 4 + 2] * becp[becpind + tnp]; + ps[psind + 1] += lambda_coeff[iat * 4 + 1] * becp[becpind] + lambda_coeff[iat * 4 + 3] * becp[becpind + tnp]; + } + } + else // npol == 1 + { + for (int ib = threadIdx.x; ib < npm; ib += blockDim.x) + { + int iat = ip_iat[ip]; + const int psind = ip * npm + ib; + const int becpind = ib * tnp + ip; + ps[psind] += lambda_coeff[iat] * becp[becpind]; + } } } @@ -48,26 +61,49 @@ __global__ void onsite_op(const int npm, int m1 = ip_m[ip]; if (m1 >= 0) { - const int nbands = npm / npol; - for (int ib = threadIdx.x; ib < nbands; ib += blockDim.x) + if (npol == 2) { - int ib2 = ib * npol; - int iat = ip_iat[ip]; - const thrust::complex* vu_iat = vu + vu_begin_iat[iat]; - int orb_l = orb_l_iat[iat]; - int tlp1 = 2 * orb_l + 1; - int tlp1_2 = tlp1 * tlp1; - int ip2_begin = ip - m1; - int ip2_end = ip - m1 + tlp1; - const int psind = ip * npm + ib2; - for (int ip2 = ip2_begin; ip2 < ip2_end; ip2++) + const int nbands = npm / npol; + for (int ib = threadIdx.x; ib < nbands; ib += blockDim.x) + { + int ib2 = ib * npol; + int iat = ip_iat[ip]; + const thrust::complex* vu_iat = vu + vu_begin_iat[iat]; + int orb_l = orb_l_iat[iat]; + int tlp1 = 2 * orb_l + 1; + int tlp1_2 = tlp1 * tlp1; + int ip2_begin = ip - m1; + int ip2_end = ip - m1 + tlp1; + const int psind = ip * npm + ib2; + for (int ip2 = ip2_begin; ip2 < ip2_end; ip2++) + { + const int becpind = ib2 * tnp + ip2; + int m2 = ip_m[ip2]; + const int index_mm = m1 * tlp1 + m2; + ps[psind] += vu_iat[index_mm] * becp[becpind] + vu_iat[index_mm + tlp1_2 * 2] * becp[becpind + tnp]; + 
ps[psind + 1] += vu_iat[index_mm + tlp1_2 * 1] * becp[becpind] + + vu_iat[index_mm + tlp1_2 * 3] * becp[becpind + tnp]; + } + } + } + else // npol == 1, nspin=1 or nspin=2 + { + for (int ib = threadIdx.x; ib < npm; ib += blockDim.x) { - const int becpind = ib2 * tnp + ip2; - int m2 = ip_m[ip2]; - const int index_mm = m1 * tlp1 + m2; - ps[psind] += vu_iat[index_mm] * becp[becpind] + vu_iat[index_mm + tlp1_2 * 2] * becp[becpind + tnp]; - ps[psind + 1] += vu_iat[index_mm + tlp1_2 * 1] * becp[becpind] - + vu_iat[index_mm + tlp1_2 * 3] * becp[becpind + tnp]; + int iat = ip_iat[ip]; + const thrust::complex* vu_iat = vu + vu_begin_iat[iat]; + int orb_l = orb_l_iat[iat]; + int tlp1 = 2 * orb_l + 1; + int ip2_begin = ip - m1; + int ip2_end = ip - m1 + tlp1; + const int psind = ip * npm + ib; + for (int ip2 = ip2_begin; ip2 < ip2_end; ip2++) + { + const int becpind = ib * tnp + ip2; + int m2 = ip_m[ip2]; + const int index_mm = m1 * tlp1 + m2; + ps[psind] += vu_iat[index_mm] * becp[becpind]; + } } } } diff --git a/source/source_pw/module_pwdft/kernels/rocm/stress_op.hip.cu b/source/source_pw/module_pwdft/kernels/rocm/stress_op.hip.cu index dd3a053f029..c36e00da421 100644 --- a/source/source_pw/module_pwdft/kernels/rocm/stress_op.hip.cu +++ b/source/source_pw/module_pwdft/kernels/rocm/stress_op.hip.cu @@ -1019,7 +1019,7 @@ __global__ void cal_stress_onsite( const thrust::complex dbb1 = conj(dbecp[inkb]) * becp[inkb + nkb]; const thrust::complex dbb2 = conj(dbecp[inkb + nkb]) * becp[inkb]; const thrust::complex dbb3 = conj(dbecp[inkb + nkb]) * becp[inkb + nkb]; - stress_var -= fac * (coefficients0 * dbb0 + coefficients1 * dbb1 + coefficients2 * dbb2 + coefficients3 * dbb3).real(); + stress_var -= fac * (coefficients0 * dbb0 + coefficients1 * dbb2 + coefficients2 * dbb1 + coefficients3 * dbb3).real(); } ++iat; sum+=nprojs; diff --git a/source/source_pw/module_pwdft/kernels/stress_op.cpp b/source/source_pw/module_pwdft/kernels/stress_op.cpp index 169b9c932c3..1c1d062a3eb 100644 
--- a/source/source_pw/module_pwdft/kernels/stress_op.cpp +++ b/source/source_pw/module_pwdft/kernels/stress_op.cpp @@ -252,6 +252,7 @@ struct cal_stress_nl_op const int& ntype, const int& wg_nc, const int& ik, + const int& npol, const int* atom_nh, const int* atom_na, const FPTYPE* d_wg, @@ -263,7 +264,7 @@ struct cal_stress_nl_op { // std::cout << " DFT+U kernel called " << std::endl; FPTYPE local_stress = 0; - int iat = 0, sum = 0; + int sum = 0; for (int it = 0; it < ntype; it++) { const int orbital_l = orbital_corr[it]; @@ -281,35 +282,53 @@ struct cal_stress_nl_op { for (int ib = 0; ib < nbands_occ; ib++) { - const int ib2 = ib*2; + const int ib2 = ib*npol; FPTYPE fac = d_wg[ik * wg_nc + ib]; - for (int ip1 = ip_begin; ip1 < ip_end; ip1++) + switch (npol) { - const int m1 = ip1 - ip_begin; - const int inkb1 = ib2 * nkb + sum + ia * nproj + ip1; - // out<<"\n ps = "< ps[4]; - for(int i = 0; i < 4; i++) + const int m1 = ip1 - ip_begin; + const int inkb1 = ib2 * nkb + sum + ia * nproj + ip1; + for (int ip2 = ip_begin; ip2 < ip_end; ip2++) { - ps[i] = vu[(i * tlp1_2 + m1 * tlp1 + m2)]; + const int m2 = ip2 - ip_begin; + const int inkb2 = ib2 * nkb + sum + ia * nproj + ip2; + local_stress -= fac * (vu[m1 * tlp1 + m2] * (conj(dbecp[inkb1]) * becp[inkb2])).real(); } - const int inkb2 = ib2 * nkb + sum + ia * nproj + ip2; - - const std::complex dbb0 = conj(dbecp[inkb1]) * becp[inkb2]; - const std::complex dbb1 = conj(dbecp[inkb1]) * becp[nkb + inkb2]; - const std::complex dbb2 = conj(dbecp[nkb + inkb1]) * becp[inkb2]; - const std::complex dbb3 = conj(dbecp[nkb + inkb1]) * becp[nkb + inkb2]; - local_stress -= fac * (ps[0] * dbb0 + ps[1] * dbb1 + ps[2] * dbb2 + ps[3] * dbb3).real(); - } - } // end ip + } // end ip + break; + case 2: + for (int ip1 = ip_begin; ip1 < ip_end; ip1++) + { + const int m1 = ip1 - ip_begin; + const int inkb1 = ib2 * nkb + sum + ia * nproj + ip1; + for (int ip2 = ip_begin; ip2 < ip_end; ip2++) + { + const int m2 = ip2 - ip_begin; + std::complex 
ps[4]; + for(int i = 0; i < 4; i++) + { + ps[i] = vu[(i * tlp1_2 + m1 * tlp1 + m2)]; + } + const int inkb2 = ib2 * nkb + sum + ia * nproj + ip2; + + const std::complex dbb0 = conj(dbecp[inkb1]) * becp[inkb2]; + const std::complex dbb1 = conj(dbecp[inkb1]) * becp[nkb + inkb2]; + const std::complex dbb2 = conj(dbecp[nkb + inkb1]) * becp[inkb2]; + const std::complex dbb3 = conj(dbecp[nkb + inkb1]) * becp[nkb + inkb2]; + local_stress -= fac * (ps[0] * dbb0 + ps[1] * dbb1 + ps[2] * dbb2 + ps[3] * dbb3).real(); + } + } // end ip + break; + default: + break; + } }// ib - vu += 4 * tlp1_2;// step for vu + vu += npol * npol * tlp1_2;// step for vu }// ia sum += atom_na[it] * nproj; - iat += atom_na[it]; } // end it *stress += local_stress; }; @@ -320,6 +339,7 @@ struct cal_stress_nl_op const int& ntype, const int& wg_nc, const int& ik, + const int& npol, const int* atom_nh, const int* atom_na, const FPTYPE* d_wg, @@ -336,25 +356,43 @@ struct cal_stress_nl_op for (int ia = 0; ia < atom_na[it]; ia++) { int iat = iat0 + ia; - const std::complex coefficients0(lambda[iat*3+2], 0.0); - const std::complex coefficients1(lambda[iat*3] , lambda[iat*3+1]); - const std::complex coefficients2(lambda[iat*3] , -1 * lambda[iat*3+1]); - const std::complex coefficients3(-1 * lambda[iat*3+2], 0.0); - for (int ib = 0; ib < nbands_occ; ib++) + if (npol == 2) { - const int ib2 = ib*2; - FPTYPE fac = d_wg[ik * wg_nc + ib]; - for (int ip = 0; ip < nproj; ip++) + const std::complex coefficients0(lambda[iat*3+2], 0.0); + const std::complex coefficients1(lambda[iat*3] , lambda[iat*3+1]); + const std::complex coefficients2(lambda[iat*3] , -1 * lambda[iat*3+1]); + const std::complex coefficients3(-1 * lambda[iat*3+2], 0.0); + for (int ib = 0; ib < nbands_occ; ib++) { - const int inkb1 = ib2 * nkb + sum + ia * nproj + ip; - - const std::complex dbb0 = conj(dbecp[inkb1]) * becp[inkb1]; - const std::complex dbb1 = conj(dbecp[inkb1]) * becp[nkb + inkb1]; - const std::complex dbb2 = conj(dbecp[nkb + inkb1]) 
* becp[inkb1]; - const std::complex dbb3 = conj(dbecp[nkb + inkb1]) * becp[nkb + inkb1]; - local_stress -= fac * (coefficients0 * dbb0 + coefficients1 * dbb1 + coefficients2 * dbb2 + coefficients3 * dbb3).real(); - } // end ip - }// ib + const int ib2 = ib * 2; + FPTYPE fac = d_wg[ik * wg_nc + ib]; + for (int ip = 0; ip < nproj; ip++) + { + const int inkb1 = ib2 * nkb + sum + ia * nproj + ip; + + const std::complex dbb0 = conj(dbecp[inkb1]) * becp[inkb1]; + const std::complex dbb1 = conj(dbecp[inkb1]) * becp[nkb + inkb1]; + const std::complex dbb2 = conj(dbecp[nkb + inkb1]) * becp[inkb1]; + const std::complex dbb3 = conj(dbecp[nkb + inkb1]) * becp[nkb + inkb1]; + local_stress -= fac * (coefficients0 * dbb0 + coefficients1 * dbb2 + coefficients2 * dbb1 + coefficients3 * dbb3).real(); + } // end ip + } // ib + } + else if (npol == 1) + { + const FPTYPE coefficients0(lambda[iat*3+2]); + for (int ib = 0; ib < nbands_occ; ib++) + { + FPTYPE fac = d_wg[ik * wg_nc + ib]; + for (int ip = 0; ip < nproj; ip++) + { + const int inkb = ib * nkb + sum + ia * nproj + ip; + + const FPTYPE dbb = (conj(dbecp[inkb]) * becp[inkb]).real(); + local_stress -= fac * coefficients0 * dbb; + } // end ip + } // ib + } }// ia sum += atom_na[it] * nproj; iat0 += atom_na[it]; diff --git a/source/source_pw/module_pwdft/kernels/stress_op.h b/source/source_pw/module_pwdft/kernels/stress_op.h index fc81f355e41..b5d60e42a9c 100644 --- a/source/source_pw/module_pwdft/kernels/stress_op.h +++ b/source/source_pw/module_pwdft/kernels/stress_op.h @@ -129,6 +129,7 @@ struct cal_stress_nl_op const int& ntype, const int& wg_nc, const int& ik, + const int& npol, const int* atom_nh, const int* atom_na, const FPTYPE* d_wg, @@ -144,6 +145,7 @@ struct cal_stress_nl_op const int& ntype, const int& wg_nc, const int& ik, + const int& npol, const int* atom_nh, const int* atom_na, const FPTYPE* d_wg, @@ -334,6 +336,7 @@ struct cal_stress_nl_op const int& ntype, const int& wg_nc, const int& ik, + const int& npol, const 
int* atom_nh, const int* atom_na, const FPTYPE* d_wg, @@ -349,6 +352,7 @@ struct cal_stress_nl_op const int& ntype, const int& wg_nc, const int& ik, + const int& npol, const int* atom_nh, const int* atom_na, const FPTYPE* d_wg, diff --git a/source/source_pw/module_pwdft/onsite_proj.cpp b/source/source_pw/module_pwdft/onsite_proj.cpp index 82bd3516fd4..c6ce26d8b8b 100644 --- a/source/source_pw/module_pwdft/onsite_proj.cpp +++ b/source/source_pw/module_pwdft/onsite_proj.cpp @@ -6,6 +6,9 @@ #include #include "source_pw/module_pwdft/onsite_proj.h" #include "source_pw/module_pwdft/onsite_proj_print.h" +#include "source_lcao/module_dftu/dftu.h" +#include "source_lcao/module_deltaspin/spin_constrain.h" +#include "source_io/module_parameter/parameter.h" #include "source_base/projgen.h" #include "source_base/kernels/math_kernel_op.h" @@ -111,6 +114,7 @@ void projectors::OnsiteProjector::init(const std::string& orbital_dir { this->ucell = ucell_in; this->ntype = ucell_in->ntype; + this->isk_ = kv.isk.data(); this->pw_basis_ = &pw_basis; this->sf_ = &sf; @@ -287,6 +291,7 @@ void projectors::OnsiteProjector::tabulate_atomic(const int ik, const // CACHE 1 - if cache the tab_, can be reused for SCF and RELAX calculation // [in] pw_basis, ik, omega, tpiba, irow2it this->ik_ = ik; + this->becp_ready_ = false; this->npw_ = pw_basis_->npwk[ik]; this->npwx_ = pw_basis_->npwk_max; // std::vector> q(this->npw_); @@ -340,7 +345,8 @@ void projectors::OnsiteProjector::tabulate_atomic(const int ik, const template void projectors::OnsiteProjector::overlap_proj_psi( const int npm, - const std::complex* ppsi) + const std::complex* ppsi, + const int ld_psi) { ModuleBase::timer::start("OnsiteProj", "overlap"); // STAGE 3 - cal_becp @@ -398,11 +404,13 @@ void projectors::OnsiteProjector::overlap_proj_psi( this->h_becp = this->becp; } } - this->fs_tools->cal_becp(ik_, npm/npol, this->becp, ppsi); // in cal_becp, npm should be the one not multiplied by npol + this->fs_tools->cal_becp(ik_, 
npm/npol, this->becp, ppsi, ld_psi > 0 ? ld_psi : this->npwx_); // in cal_becp, npm should be the one not multiplied by npol if(this->device == base_device::GpuDevice) { syncmem_complex_d2h_op()(h_becp, this->becp, this->size_becp); } + this->becp_ready_ = true; + this->ik_becp_ = this->ik_; ModuleBase::timer::end("OnsiteProj", "overlap"); } @@ -582,6 +590,46 @@ void projectors::OnsiteProjector::cal_occupations( ModuleBase::timer::end("OnsiteProj", "cal_occupation"); } +template +void projectors::OnsiteProjector::cal_force_onsite_dftu(int ik, int npm, T* force, + const Plus_U& dftu, int nks, + const double* wg_ik) const +{ + const int isk_val = this->isk_ ? this->isk_[ik] : 0; + const std::complex* vu_ptr = dftu.get_eff_pot_pw_spin(isk_val); + const int vu_size = dftu.get_size_eff_pot_pw_spin(); + this->fs_tools->cal_force_dftu(ik, npm, force, + dftu.get_orbital_corr_data(), vu_ptr, vu_size, wg_ik); +} + +template +double projectors::OnsiteProjector::cal_stress_onsite_dftu(int ik, int npm, + const Plus_U& dftu, int nks, + const double* wg_ik) const +{ + const int isk_val = this->isk_ ? 
this->isk_[ik] : 0; + const std::complex* vu_ptr = dftu.get_eff_pot_pw_spin(isk_val); + const int vu_size = dftu.get_size_eff_pot_pw_spin(); + return this->fs_tools->cal_stress_dftu(ik, npm, + dftu.get_orbital_corr_data(), vu_ptr, vu_size, wg_ik); +} + +template +void projectors::OnsiteProjector::cal_force_onsite_dspin(int ik, int npm, T* force, + const ModuleBase::Vector3* lambda, + const double* wg_ik) const +{ + this->fs_tools->cal_force_dspin(ik, npm, force, lambda, wg_ik); +} + +template +double projectors::OnsiteProjector::cal_stress_onsite_dspin(int ik, int npm, + const ModuleBase::Vector3* lambda, + const double* wg_ik) const +{ + return this->fs_tools->cal_stress_dspin(ik, npm, lambda, wg_ik); +} + template class projectors::OnsiteProjector; #if ((defined __CUDA) || (defined __ROCM)) template class projectors::OnsiteProjector; diff --git a/source/source_pw/module_pwdft/onsite_proj.h b/source/source_pw/module_pwdft/onsite_proj.h index 34c39e1fcd3..fdb83355ac3 100644 --- a/source/source_pw/module_pwdft/onsite_proj.h +++ b/source/source_pw/module_pwdft/onsite_proj.h @@ -7,6 +7,7 @@ #include "source_pw/module_pwdft/radial_proj.h" #include "source_psi/psi.h" #include "source_pw/module_pwdft/onsite_proj_tools.h" +#include "source_lcao/module_dftu/dftu.h" #include #include @@ -43,9 +44,13 @@ namespace projectors */ void tabulate_atomic(const int ik, const char grad = 'n'); + /// compute becp = ; ld_psi is the leading dimension of psi + /// (defaults to npwx if 0, but should be ngk[ik] when called from + /// the Davidson/CG solver where psi stride varies per k-point) void overlap_proj_psi( const int npm, - const std::complex* ppsi + const std::complex* ppsi, + const int ld_psi = 0 ); void read_abacus_orb(std::ifstream& ifs, std::string& elem, @@ -81,8 +86,31 @@ namespace projectors int get_npwx() const { return npwx_; } const int& get_nh(int iat) const { return iat_nh[iat]; } + bool is_becp_ready(int ik) const { return becp_ready_ && ik_becp_ == ik; } + void 
invalidate_becp() { becp_ready_ = false; } + hamilt::Onsite_Proj_tools* get_fs_tools() const { return fs_tools; } + /// high-level: compute DFT+U force contribution for one k-point + void cal_force_onsite_dftu(int ik, int npm, T* force, + const Plus_U& dftu, int nks, + const double* wg_ik) const; + + /// high-level: compute DFT+U stress contribution for one k-point + double cal_stress_onsite_dftu(int ik, int npm, + const Plus_U& dftu, int nks, + const double* wg_ik) const; + + /// high-level: compute DeltaSpin force contribution for one k-point + void cal_force_onsite_dspin(int ik, int npm, T* force, + const ModuleBase::Vector3* lambda, + const double* wg_ik) const; + + /// high-level: compute DeltaSpin stress contribution for one k-point + double cal_stress_onsite_dspin(int ik, int npm, + const ModuleBase::Vector3* lambda, + const double* wg_ik) const; + private: OnsiteProjector(){}; ~OnsiteProjector(); @@ -105,6 +133,8 @@ namespace projectors int npw_ = 0; int npwx_ = 0; int ik_ = 0; + bool becp_ready_ = false; + int ik_becp_ = -1; std::vector> it2ia; std::vector rgrid; std::vector> projs; @@ -114,6 +144,8 @@ namespace projectors const UnitCell* ucell = nullptr; + const int* isk_ = nullptr; ///< spin index per k-point (from K_Vectors) + const ModulePW::PW_Basis_K* pw_basis_ = nullptr; // level1: the plane wave basis, need ik Structure_Factor* sf_ = nullptr; // level2: the structure factor calculator int ntype = 0; diff --git a/source/source_pw/module_pwdft/onsite_proj_tools.cpp b/source/source_pw/module_pwdft/onsite_proj_tools.cpp index fec5f0a9fb2..66cd54134ae 100644 --- a/source/source_pw/module_pwdft/onsite_proj_tools.cpp +++ b/source/source_pw/module_pwdft/onsite_proj_tools.cpp @@ -278,7 +278,8 @@ template void Onsite_Proj_tools::cal_becp(int ik, int npm, std::complex* becp_in, - const std::complex* ppsi_in) + const std::complex* ppsi_in, + int npwx) { ModuleBase::TITLE("Onsite_Proj_tools", "cal_becp"); ModuleBase::timer::start("Onsite_Proj_tools", 
"cal_becp"); @@ -434,7 +435,7 @@ void Onsite_Proj_tools::cal_becp(int ik, this->ppcell_vkb, npw, ppsi, - this->max_npw, + npwx > 0 ? npwx : this->max_npw, &ModuleBase::ZERO, becp_tmp, this->nkb); @@ -830,6 +831,7 @@ void Onsite_Proj_tools::cal_force_dftu(int ik, d_wg = const_cast(h_wg); } const int force_nc = 3; + const int npol = this->ucell_->get_npol(); cal_force_nl_op()(this->ctx, npm, this->nbands, @@ -838,6 +840,7 @@ void Onsite_Proj_tools::cal_force_dftu(int ik, this->nbands, ik, nkb, + npol, atom_nh, atom_na, this->ucell_->tpiba, @@ -885,6 +888,7 @@ void Onsite_Proj_tools::cal_force_dspin(int ik, d_wg = const_cast(h_wg); } const int force_nc = 3; + const int npol = this->ucell_->get_npol(); cal_force_nl_op()(this->ctx, npm, this->nbands, @@ -893,6 +897,7 @@ void Onsite_Proj_tools::cal_force_dspin(int ik, this->nbands, ik, nkb, + npol, atom_nh, atom_na, this->ucell_->tpiba, @@ -919,6 +924,7 @@ double Onsite_Proj_tools::cal_stress_dftu(int ik, const FPTYPE* h_wg) { double stress_out = 0.0; + const int npol = this->ucell_->get_npol(); int* orb_corr_tmp = nullptr; std::complex* vu_tmp = nullptr; @@ -947,6 +953,7 @@ double Onsite_Proj_tools::cal_stress_dftu(int ik, this->ntype, this->nbands, ik, + npol, atom_nh, atom_na, d_wg, @@ -961,7 +968,6 @@ double Onsite_Proj_tools::cal_stress_dftu(int ik, delmem_var_op()(stress_device); delmem_complex_op()(vu_tmp); delmem_int_op()(orb_corr_tmp); - std::cout << "BUG: DFT+U (GPU) stress_out = " << stress_out << std::endl; } else #endif @@ -976,6 +982,7 @@ double Onsite_Proj_tools::cal_stress_dftu(int ik, this->ntype, this->nbands, ik, + npol, atom_nh, atom_na, d_wg, @@ -997,6 +1004,7 @@ double Onsite_Proj_tools::cal_stress_dspin(int ik, const FPTYPE* h_wg) { double stress_out = 0.0; + const int npol = this->ucell_->get_npol(); std::vector lambda_array(this->ucell_->nat * 3); for (int iat = 0; iat < this->ucell_->nat; iat++) @@ -1025,6 +1033,7 @@ double Onsite_Proj_tools::cal_stress_dspin(int ik, this->ntype, this->nbands, 
ik, + npol, atom_nh, atom_na, d_wg, @@ -1051,6 +1060,7 @@ double Onsite_Proj_tools::cal_stress_dspin(int ik, this->ntype, this->nbands, ik, + npol, atom_nh, atom_na, d_wg, diff --git a/source/source_pw/module_pwdft/onsite_proj_tools.h b/source/source_pw/module_pwdft/onsite_proj_tools.h index e877a85070c..0b7ef73b83f 100644 --- a/source/source_pw/module_pwdft/onsite_proj_tools.h +++ b/source/source_pw/module_pwdft/onsite_proj_tools.h @@ -62,7 +62,7 @@ class Onsite_Proj_tools /** * @brief calculate the becp = for all beta functions */ - void cal_becp(int ik, int npm, std::complex* becp_in = nullptr, const std::complex* ppsi_in = nullptr); + void cal_becp(int ik, int npm, std::complex* becp_in = nullptr, const std::complex* ppsi_in = nullptr, int npwx = 0); /** * @brief calculate the dbecp_{ij} = for all beta functions * stress_{ij} = -1/omega \sum_{n,k}f_{nk} \sum_I \sum_{lm,l'm'}D_{l,l'}^{I} becp * dbecp_{ij} also calculated diff --git a/source/source_pw/module_pwdft/op_pw_nl.cpp b/source/source_pw/module_pwdft/op_pw_nl.cpp index d3551808ea7..0e5de357b1d 100644 --- a/source/source_pw/module_pwdft/op_pw_nl.cpp +++ b/source/source_pw/module_pwdft/op_pw_nl.cpp @@ -172,7 +172,7 @@ void Nonlocal>::add_nonlocal_pp(T *hpsi_in, const T *becp, this->ppcell->nkb, &this->one, this->vkb, - this->ppcell->vkbnc, + this->ppcell->vkb.nc, this->ps, inc, &this->one, @@ -197,7 +197,7 @@ void Nonlocal>::add_nonlocal_pp(T *hpsi_in, const T *becp, this->ppcell->nkb, &this->one, this->vkb, - this->ppcell->vkbnc, + this->ppcell->vkb.nc, this->ps, npm, &this->one, @@ -251,7 +251,7 @@ void Nonlocal>::act( nkb, &this->one, this->vkb, - this->ppcell->vkbnc, + this->ppcell->vkb.nc, tmpsi_in, inc, &this->zero, @@ -276,7 +276,7 @@ void Nonlocal>::act( this->npw, &this->one, this->vkb, - this->ppcell->vkbnc, + this->ppcell->vkb.nc, tmpsi_in, max_npw, &this->zero, diff --git a/source/source_pw/module_pwdft/op_pw_proj.cpp b/source/source_pw/module_pwdft/op_pw_proj.cpp index 8c7cddfc89c..261131f555d 
100644 --- a/source/source_pw/module_pwdft/op_pw_proj.cpp +++ b/source/source_pw/module_pwdft/op_pw_proj.cpp @@ -70,16 +70,14 @@ void OnsiteProj>::init(const int ik_in) // this function sum up each non-local pseudopotential located on each atom, //-------------------------------------------------------------------------- template -void OnsiteProj>::add_onsite_proj(T *hpsi_in, const int npol, const int m) const +void OnsiteProj>::add_onsite_proj(T *hpsi_in, const int npol, const int m, const int npwx) const { ModuleBase::timer::start("OnsiteProj", "add_onsite_proj"); auto* onsite_p = projectors::OnsiteProjector::get_instance(); - // apply the operator to the wavefunction - //std::cout << "use of tab_atomic at " << __FILE__ << ": " << __LINE__ << std::endl; const std::complex* tab_atomic = onsite_p->get_tab_atomic(); const int npw = onsite_p->get_npw(); - const int npwx = onsite_p->get_npwx(); + // npwx passed as parameter char transa = 'N'; char transb = 'T'; int npm = m; @@ -102,12 +100,10 @@ void OnsiteProj>::add_onsite_proj(T *hpsi_in, const int np } template -void OnsiteProj>::update_becp(const T *psi_in, const int npol, const int m) const +void OnsiteProj>::update_becp(const T *psi_in, const int npol, const int m, const int npwx) const { auto* onsite_p = projectors::OnsiteProjector::get_instance(); - // calculate - // std::cout << __FILE__ << ":" << __LINE__ << " nbands = " << m << std::endl; - onsite_p->overlap_proj_psi(m, psi_in); + onsite_p->overlap_proj_psi(m, psi_in, npwx); } template @@ -168,46 +164,88 @@ void OnsiteProj>::cal_ps_delta_spin(const int npol, const tnp, this->lambda_coeff, this->ps, becp); +} - /*int sum = 0; - if (npol == 1) - { - const int current_spin = this->isk[this->ik]; - } - else +// cal_ps_dftu — compute ps = VU * becp for DFT+U Hamiltonian contribution +// +// eff_pot_pw layout by nspin: +// nspin=1: [iat0_tlp1^2 | iat1_tlp1^2 | ...] +// single spin channel, full array uploaded +// nspin=2: [iat0_up | iat1_up | ... 
| iat0_dn | iat1_dn | ...] +// split layout — first half is spin-up, second half spin-down. +// For isk==1 (spin-down k-point), only the second half is +// uploaded to vu_device so that vu_begin_iat[iat] indexes +// correctly into the spin-down block. +// nspin=4: [iat0_Pauli_4blocks | iat1_Pauli_4blocks | ...] +// 4*(2l+1)^2 entries per atom; kernel uses npol=2 spinor +// structure with 2x2 Pauli matrix coefficients. +// +// vu_begin_iat is computed as tlp1^2 * npol^2 per atom at init time, +// which gives the correct offset for each nspin case: +// nspin=1: tlp1^2 * 1 = tlp1^2 +// nspin=2: tlp1^2 * 1 = tlp1^2 (per spin channel, selected by isk) +// nspin=4: tlp1^2 * 4 = (2*tlp1)^2 +template +void OnsiteProj>::setup_pw_dftu_indices() const +{ + this->init_dftu = true; + auto* onsite_p = projectors::OnsiteProjector::get_instance(); + const int npol = this->ucell->get_npol(); + + resmem_int_op()(this->orb_l_iat, this->ucell->nat); + resmem_int_op()(this->ip_m, onsite_p->get_tot_nproj()); + resmem_int_op()(this->vu_begin_iat, this->ucell->nat); + resmem_int_op()(this->ip_iat, onsite_p->get_tot_nproj()); + + std::vector ip_iat0(onsite_p->get_tot_nproj()); + std::vector ip_m0(onsite_p->get_tot_nproj()); + std::vector vu_begin_iat0(this->ucell->nat); + std::vector orb_l_iat0(this->ucell->nat); + int ip0 = 0; + int vu_begin = 0; + for(int iat=0;iatucell->nat;iat++) { - for (int iat = 0; iat < this->ucell->nat; iat++) + const int it = this->ucell->iat2it[iat]; + const int target_l = this->dftu->get_orbital_corr(it); + orb_l_iat0[iat] = target_l; + const int nproj = onsite_p->get_nh(iat); + if(target_l == -1) { - const int nproj = onsite_p->get_nh(iat); - if(constrain[iat].x == 0 && constrain[iat].y == 0 && constrain[iat].z == 0) + for(int ip=0;ip coefficients0(lambda[iat][2], 0.0); - const std::complex coefficients1(lambda[iat][0] , lambda[iat][1]); - const std::complex coefficients2(lambda[iat][0] , -1 * lambda[iat][1]); - const std::complex coefficients3(-1 * 
lambda[iat][2], 0.0); - // each atom has nproj, means this is with structure factor; - // each projector (each atom) must multiply coefficient - // with all the other projectors. - for (int ib = 0; ib < m; ib+=2) + vu_begin_iat0[iat] = 0; + continue; + } + else + { + const int tlp1 = 2 * target_l + 1; + vu_begin_iat0[iat] = vu_begin; + vu_begin += tlp1 * tlp1 * npol * npol; + const int m_begin = target_l * target_l; + const int m_end = (target_l + 1) * (target_l + 1); + for(int ip=0;ip= m_begin && ip < m_end) { - const int psind = (sum + ip) * m + ib; - const int becpind = ib * tnp + sum + ip; - const std::complex becp1 = becp[becpind]; - const std::complex becp2 = becp[becpind + tnp]; - ps[psind] += coefficients0 * becp1 - + coefficients2 * becp2; - ps[psind + 1] += coefficients1 * becp1 - + coefficients3 * becp2; - } // end ip - } // end ib - sum += nproj; - } // end iat - }*/ + ip_m0[ip0++] = ip - m_begin; + } + else + { + ip_m0[ip0++] = -1; + } + } + } + } + syncmem_int_h2d_op()(this->orb_l_iat, orb_l_iat0.data(), this->ucell->nat); + syncmem_int_h2d_op()(this->ip_iat, ip_iat0.data(), onsite_p->get_tot_nproj()); + syncmem_int_h2d_op()(this->ip_m, ip_m0.data(), onsite_p->get_tot_nproj()); + syncmem_int_h2d_op()(this->vu_begin_iat, vu_begin_iat0.data(), this->ucell->nat); + + resmem_complex_op()(this->vu_device, dftu->get_size_eff_pot_pw()); } template @@ -223,8 +261,6 @@ void OnsiteProj>::cal_ps_dftu( auto* onsite_p = projectors::OnsiteProjector::get_instance(); const std::complex* becp = onsite_p->get_becp(); - // T *ps = new T[tnp * m]; - // ModuleBase::GlobalFunc::ZEROS(ps, m * tnp); if (this->nkb_m < m * tnp) { resmem_complex_op()(this->ps, tnp * m, "OnsiteProj::ps"); this->nkb_m = m * tnp; @@ -236,140 +272,40 @@ void OnsiteProj>::cal_ps_dftu( if(!this->init_dftu) { - this->init_dftu = true; - //prepare orb_l_iat, ip_m, vu_begin_iat and vu_device - resmem_int_op()(this->orb_l_iat, this->ucell->nat); - resmem_int_op()(this->ip_m, onsite_p->get_tot_nproj()); - 
resmem_int_op()(this->vu_begin_iat, this->ucell->nat); - // recal the ip_iat - resmem_int_op()(this->ip_iat, onsite_p->get_tot_nproj()); - std::vector ip_iat0(onsite_p->get_tot_nproj()); - std::vector ip_m0(onsite_p->get_tot_nproj()); - std::vector vu_begin_iat0(this->ucell->nat); - std::vector orb_l_iat0(this->ucell->nat); - int ip0 = 0; - int vu_begin = 0; - for(int iat=0;iatucell->nat;iat++) - { - const int it = this->ucell->iat2it[iat]; - const int target_l = this->dftu->orbital_corr[it]; - orb_l_iat0[iat] = target_l; - const int nproj = onsite_p->get_nh(iat); - if(target_l == -1) - { - for(int ip=0;ip= m_begin && ip < m_end) - { - ip_m0[ip0++] = ip - m_begin; - } - else - { - ip_m0[ip0++] = -1; - } - } - } - } - syncmem_int_h2d_op()(this->orb_l_iat, orb_l_iat0.data(), this->ucell->nat); - syncmem_int_h2d_op()(this->ip_iat, ip_iat0.data(), onsite_p->get_tot_nproj()); - syncmem_int_h2d_op()(this->ip_m, ip_m0.data(), onsite_p->get_tot_nproj()); - syncmem_int_h2d_op()(this->vu_begin_iat, vu_begin_iat0.data(), this->ucell->nat); - - resmem_complex_op()(this->vu_device, dftu->get_size_eff_pot_pw()); + this->setup_pw_dftu_indices(); } - syncmem_complex_h2d_op()(this->vu_device, dftu->get_eff_pot_pw(0), dftu->get_size_eff_pot_pw()); - + const int isk_val = (PARAM.inp.nspin == 2) ? 
this->isk[this->ik] : 0; + const std::complex* vu_host = dftu->get_eff_pot_pw_spin(isk_val); + const int vu_size = dftu->get_size_eff_pot_pw_spin(); + syncmem_complex_h2d_op()(this->vu_device, vu_host, vu_size); hamilt::onsite_ps_op()( - this->ctx, // device context - m, + this->ctx, + m, npol, this->orb_l_iat, this->ip_iat, this->ip_m, - this->vu_begin_iat, - tnp, + this->vu_begin_iat, + tnp, this->vu_device, this->ps, becp); - - /* - int sum = 0; - if (npol == 1) - { - const int current_spin = this->isk[this->ik]; - } - else - { - for (int iat = 0; iat < this->ucell->nat; iat++) - { - const int it = this->ucell->iat2it[iat]; - const int target_l = dftu->orbital_corr[it]; - const int nproj = onsite_p->get_nh(iat); - if(target_l == -1) - { - sum += nproj; - continue; - } - const int ip_begin = target_l * target_l; - const int ip_end = (target_l + 1) * (target_l + 1); - const int tlp1 = 2 * target_l + 1; - const int tlp1_2 = tlp1 * tlp1; - const std::complex* vu = dftu->get_eff_pot_pw(iat); - // each projector (each atom) must multiply coefficient - // with all the other projectors. 
- for (int ib = 0; ib < m; ib+=2) - { - for (int ip2 = ip_begin; ip2 < ip_end; ip2++) - { - const int psind = (sum + ip2) * m + ib; - const int m2 = ip2 - ip_begin; - for (int ip1 = ip_begin; ip1 < ip_end; ip1++) - { - const int becpind1 = ib * tnp + sum + ip1; - const int m1 = ip1 - ip_begin; - const int index_mm = m1 * tlp1 + m2; - const std::complex becp1 = becp[becpind1]; - const std::complex becp2 = becp[becpind1 + tnp]; - ps[psind] += vu[index_mm] * becp1 - + vu[index_mm + tlp1_2 * 2] * becp2; - ps[psind + 1] += vu[index_mm + tlp1_2 * 1] * becp1 - + vu[index_mm + tlp1_2 * 3] * becp2; - } // end ip1 - } // end ip2 - } // end ib - sum += nproj; - } // end iat - }*/ } template<> void OnsiteProj, base_device::DEVICE_CPU>>::add_onsite_proj( std::complex *hpsi_in, const int npol, - const int m) const + const int m, + const int npwx) const {} template<> void OnsiteProj, base_device::DEVICE_CPU>>::update_becp( const std::complex *psi_in, const int npol, - const int m) const + const int m, + const int npwx) const {} template<> @@ -389,14 +325,16 @@ template<> void OnsiteProj, base_device::DEVICE_GPU>>::add_onsite_proj( std::complex *hpsi_in, const int npol, - const int m) const + const int m, + const int npwx) const {} template<> void OnsiteProj, base_device::DEVICE_GPU>>::update_becp( const std::complex *psi_in, const int npol, - const int m) const + const int m, + const int npwx) const {} template<> @@ -412,6 +350,21 @@ void OnsiteProj, base_device::DEVICE_GPU>>::cal_p {} #endif +// OnsiteProj::act — apply DFT+U and/or DeltaSpin Hamiltonian correction +// +// Leading dimension note: +// The Davidson/CG solver allocates psi and hpsi with stride ld_psi = ngk[ik] +// (the number of G-vectors for the current k-point), NOT npwx (the maximum +// across all k-points). We must pass ld_psi = nbasis/npol through the +// GEMM chain to avoid buffer overflow when ngk[ik] < npwx. 
+// +// nspin handling in cal_ps_dftu: +// nspin=1 (npol=1): single spin channel, no spin selection needed +// nspin=2 (npol=1): eff_pot_pw uses split layout [all_up | all_dn]; +// spin-up k-points (isk=0) read from the first half; +// spin-down k-points (isk=1) read from the second half. +// nspin=4 (npol=2): all 4 Pauli blocks stored per-atom; kernel uses +// 2x2 spinor structure with tlp1_npol^2 entries per atom. template void OnsiteProj>::act( const int nbands, @@ -423,10 +376,11 @@ void OnsiteProj>::act( const bool is_first_node)const { ModuleBase::timer::start("Operator", "OnsiteProjPW"); - this->update_becp(tmpsi_in, npol, nbands); + const int ld_psi = nbasis / npol; + this->update_becp(tmpsi_in, npol, nbands, ld_psi); this->cal_ps_delta_spin(npol, nbands); this->cal_ps_dftu(npol, nbands); - this->add_onsite_proj(tmhpsi, npol, nbands); + this->add_onsite_proj(tmhpsi, npol, nbands, ld_psi); ModuleBase::timer::end("Operator", "OnsiteProjPW"); } diff --git a/source/source_pw/module_pwdft/op_pw_proj.h b/source/source_pw/module_pwdft/op_pw_proj.h index 50207cc7b78..bd8044724da 100644 --- a/source/source_pw/module_pwdft/op_pw_proj.h +++ b/source/source_pw/module_pwdft/op_pw_proj.h @@ -54,9 +54,12 @@ class OnsiteProj> : public OperatorPW void cal_ps_dftu(const int npol, const int m) const; - void update_becp(const T* psi_in, const int npol, const int m) const; + /// one-time setup of DFT+U PW index arrays (orb_l_iat, ip_iat, ip_m, vu_begin_iat) + void setup_pw_dftu_indices() const; - void add_onsite_proj(T *hpsi_in, const int npol, const int m) const; + void update_becp(const T* psi_in, const int npol, const int m, const int npwx) const; + + void add_onsite_proj(T *hpsi_in, const int npol, const int m, const int npwx) const; const int* isk = nullptr; diff --git a/source/source_pw/module_pwdft/setup_pot.cpp b/source/source_pw/module_pwdft/setup_pot.cpp index 4a194bc0483..11f4f5c69d7 100644 --- a/source/source_pw/module_pwdft/setup_pot.cpp +++ 
b/source/source_pw/module_pwdft/setup_pot.cpp @@ -98,6 +98,7 @@ void pw::setup_pot(const int istep, PARAM.inp.sccut, PARAM.inp.sc_drop_thr, ucell, + PARAM.inp.sc_direction_only, nullptr, // parallel orbitals PARAM.inp.nspin, kv, diff --git a/source/source_pw/module_pwdft/stress_onsite.cpp b/source/source_pw/module_pwdft/stress_onsite.cpp index 1b9a08bb882..99f69c910dd 100644 --- a/source/source_pw/module_pwdft/stress_onsite.cpp +++ b/source/source_pw/module_pwdft/stress_onsite.cpp @@ -98,18 +98,10 @@ void Stress_Func::stress_onsite( // Calculate dbecp_s = fs_tools->cal_dbecp_s(ik, num_occupied_bands, ipol, jpol); - // Add DFT+U contribution if enabled if (PARAM.inp.dft_plus_u) { - // Calculate DFT+U stress contribution - double dftu_stress = fs_tools->cal_stress_dftu( - ik, - num_occupied_bands, - dftu.orbital_corr.data(), - dftu.get_eff_pot_pw(0), - dftu.get_size_eff_pot_pw(), - wg.c - ); + double dftu_stress = onsite_projector->cal_stress_onsite_dftu( + ik, num_occupied_bands, dftu, nks, wg.c); sigma_onsite[idx] += dftu_stress; #ifdef __DEBUG @@ -117,23 +109,13 @@ void Stress_Func::stress_onsite( #endif } - // Add spin constraint contribution if enabled if (PARAM.inp.sc_mag_switch) { - // Get spin constraint instance spinconstrain::SpinConstrain>& spin_constrain = spinconstrain::SpinConstrain>::getScInstance(); - // Get lambda parameters - const std::vector>& lambda = spin_constrain.get_sc_lambda(); - - // Calculate spin constraint stress contribution - double dspin_stress = fs_tools->cal_stress_dspin( - ik, - num_occupied_bands, - lambda.data(), - wg.c - ); + double dspin_stress = onsite_projector->cal_stress_onsite_dspin( + ik, num_occupied_bands, spin_constrain.get_sc_lambda().data(), wg.c); sigma_onsite[idx] += dspin_stress; } diff --git a/source/source_pw/module_pwdft/vnl_pw.cpp b/source/source_pw/module_pwdft/vnl_pw.cpp index 83c42adac89..d6b5274d51f 100644 --- a/source/source_pw/module_pwdft/vnl_pw.cpp +++ b/source/source_pw/module_pwdft/vnl_pw.cpp @@ 
-214,17 +214,10 @@ void pseudopot_cell_vnl::init(const UnitCell& ucell, // dq+4)*cell_factor; this->lmaxq = 2 * this->lmaxkb + 1; int npwx = this->wfcpw->npwk_max; - this->vkbnc = npwx; if (nkb > 0 && allocate_vkb) { - if (!this->use_gpu_) - { - vkb.create(nkb, npwx); - ModuleBase::Memory::record("VNL::vkb", nkb * npwx * sizeof(std::complex)); - } - // GPU path: vkb ComplexMatrix is not allocated. - // Column dimension is stored in vkbnc for gemm/gemv leading dimension. - // Actual GPU buffers (c_vkb/z_vkb) are allocated below. + vkb.create(nkb, npwx); + ModuleBase::Memory::record("VNL::vkb", nkb * npwx * sizeof(std::complex)); } // this->nqx = 10000; // calculted in allocate_nlpot.f90 diff --git a/source/source_pw/module_pwdft/vnl_pw.h b/source/source_pw/module_pwdft/vnl_pw.h index 6282b138657..93a593e9257 100644 --- a/source/source_pw/module_pwdft/vnl_pw.h +++ b/source/source_pw/module_pwdft/vnl_pw.h @@ -108,10 +108,6 @@ class pseudopot_cell_vnl std::complex*** vkb_alpha; Structure_Factor* psf = nullptr; - // Column dimension of vkb matrix (= npwx), used as leading dimension in gemm/gemv. - // On GPU path vkb ComplexMatrix is not allocated to save CPU memory; this stores the dimension. 
- int vkbnc = 0; - // other variables std::complex<double> Cal_C(int alpha, int lu, int mu, int L, int M); diff --git a/source/source_pw/module_pwdft/vnl_pw_grad.cpp b/source/source_pw/module_pwdft/vnl_pw_grad.cpp index 65984cf581a..135fe475944 100644 --- a/source/source_pw/module_pwdft/vnl_pw_grad.cpp +++ b/source/source_pw/module_pwdft/vnl_pw_grad.cpp @@ -91,11 +91,6 @@ void pseudopot_cell_vnl::getgradq_vnl(const UnitCell& ucell, ModuleBase::YlmReal::grad_Ylm_Real(x1, npw, gk, ylm, dylm[0], dylm[1], dylm[2]); - // GPU path skips vkb allocation in init(); allocate now if needed - if (this->vkb.nc == 0 && this->nkb > 0 && this->vkbnc > 0) { - this->vkb.create(this->nkb, this->vkbnc); - } - int jkb = 0; for(int it = 0;it < ucell.ntype;it++) { diff --git a/tests/01_PW/035_PW_15_SO/log_all_fix.txt b/tests/01_PW/035_PW_15_SO/log_all_fix.txt new file mode 100644 index 00000000000..0c68c0f61e0 --- /dev/null +++ b/tests/01_PW/035_PW_15_SO/log_all_fix.txt @@ -0,0 +1,114 @@ + + ABACUS v3.11.0-beta.1 + + Atomic-orbital Based Ab-initio Computation at UStc + + Website: http://abacus.ustc.edu.cn/ + Documentation: https://abacus.deepmodeling.com/ + Repository: https://github.com/abacusmodeling/abacus-develop + https://github.com/deepmodeling/abacus-develop + Commit: 5837a6526 (Sun May 3 09:44:20 2026 +0800) + + Sun May 3 10:30:11 2026 +Info: Local MPI proc number: 4,OpenMP thread number: 3,Total thread number: 12,Local thread limit: 14 + MAKE THE DIR : OUT.autotest/ + RUNNING WITH DEVICE : CPU / Intel(R) Core(TM) Ultra 5 225H (x1) + WARNING: some of potential function is set to zero cause of less than 1e-30. + WARNING: some of potential function is set to zero cause of less than 1e-30. 
+ +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + Warning: the number of valence electrons in pseudopotential > 3 for Ga: [Ar] 3d10 4s2 4p1 + Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient. + If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning. +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + UNIFORM GRID DIM : 24 * 24 * 24 + UNIFORM GRID DIM(BIG): 24 * 24 * 24 + DONE(1.0224e-05 SEC) : SETUP UNITCELL + DONE(0.00217029 SEC) : INIT K-POINTS + ---------------------------------------------------------------- + Self-consistent calculations for electrons + ---------------------------------------------------------------- + SPIN KPOINTS PROCESSES THREADS/PROC THREADS/TOTAL + 4 2 4 3 12 + ---------------------------------------------------------------- + Use plane wave basis + ---------------------------------------------------------------- + ELEMENT NATOM XC + As 1 + Ga 1 + ---------------------------------------------------------------- + Initial plane wave basis and FFT box + ---------------------------------------------------------------- + DONE(0.0227723 SEC) : INIT PLANEWAVE + START CHARGE : atomic + DONE(0.121629 SEC) : LOCAL POTENTIAL + DONE(0.156523 SEC) : NON-LOCAL POTENTIAL + MEMORY FOR PSI (MB) : 0.266724 + DONE(0.156625 SEC) : INIT BASIS + + ================================================================ + SELF-CONSISTENT: + ================================================================ + DONE(0.680372 SEC) : INIT SCF + ITER TMAGX TMAGY TMAGZ AMAG ETOT/eV EDIFF/eV DRHO TIME/s + DS1 0.00e+00 0.00e+00 2.00e+00 2.00e+00 -1.59867930e+03 0.00000000e+00 3.1042e+01 10.16 + DS2 0.00e+00 0.00e+00 9.75e-01 1.06e+00 -1.68133543e+03 -8.26561268e+01 3.8628e+00 4.09 + DS3 0.00e+00 0.00e+00 8.68e-01 8.72e-01 -1.67677930e+03 4.55612625e+00 
1.0730e+00 2.01 + DS4 0.00e+00 0.00e+00 7.46e-01 7.69e-01 -1.67820852e+03 -1.42921557e+00 8.1469e-02 2.88 + DS5 0.00e+00 0.00e+00 7.61e-01 7.70e-01 -1.67833326e+03 -1.24741925e-01 2.3457e-02 3.77 + DS6 0.00e+00 0.00e+00 7.60e-01 7.69e-01 -1.67835572e+03 -2.24548962e-02 3.2082e-03 1.08 + DS7 0.00e+00 0.00e+00 7.42e-01 7.50e-01 -1.67836230e+03 -6.58348573e-03 9.5446e-04 0.27 + DS8 0.00e+00 0.00e+00 7.31e-01 7.38e-01 -1.67836427e+03 -1.97256808e-03 1.3430e-04 0.08 + DS9 0.00e+00 0.00e+00 7.28e-01 7.35e-01 -1.67836476e+03 -4.90206437e-04 5.3510e-05 0.19 + DS10 0.00e+00 0.00e+00 7.32e-01 7.40e-01 -1.67836488e+03 -1.12483838e-04 2.8637e-05 0.10 + DS11 0.00e+00 0.00e+00 7.33e-01 7.41e-01 -1.67836501e+03 -1.31704337e-04 1.1546e-05 0.07 + DS12 0.00e+00 0.00e+00 7.33e-01 7.41e-01 -1.67836508e+03 -6.86851807e-05 3.3868e-06 0.08 + DS13 0.00e+00 0.00e+00 7.33e-01 7.41e-01 -1.67836508e+03 -9.16607677e-06 2.4541e-06 0.07 + DS14 0.00e+00 0.00e+00 7.34e-01 7.41e-01 -1.67836510e+03 -1.34461071e-05 3.5635e-07 0.09 + ---------------------------------------------------------------- + Stress_x Stress_y Stress_z + ---------------------------------------------------------------- + -10683.9706741759 -396.2387945264 396.2241082742 + -396.2387945264 -10683.9707016515 396.2241283692 + 396.2241082742 396.2241283692 -10626.8786336910 + ---------------------------------------------------------------- + TOTAL-PRESSURE (EXCLUDE KINETIC PART OF IONS): -10664.940003 kbar + + TIME STATISTICS +------------------------------------------------------------------- + CLASS_NAME NAME TIME/s CALLS AVG/s PER/% +------------------------------------------------------------------- + Driver atomic_world 25.68 1 25.68 100.00 + total 25.66 14 1.83 99.93 + PW_Basis_Sup recip2real 0.30 250 0.00 1.17 + Relax_Driver relax_driver 25.50 1 25.50 99.31 + ESolver_KS runner 25.48 1 25.48 99.23 + ESolver_KS_PW before_scf 0.52 1 0.52 2.04 + Potential cal_veff 0.57 15 0.04 2.22 + PW_Basis_Sup real2recip 0.39 289 0.00 1.53 + 
PotXC cal_veff 0.51 15 0.03 1.98 + XC_Functional v_xc 0.51 15 0.03 1.97 + PSIPrepare initialize_psi 0.44 1 0.44 1.71 + psi_init random_t 0.44 2 0.22 1.70 + psi_init stick_to_pool 0.28 27664 0.00 1.08 + ESolver_KS_PW hamilt2rho_single 24.24 14 1.73 94.39 + HSolverPW solve 24.24 14 1.73 94.39 + HSolverPW solve_psik 21.31 28 0.76 82.97 + Diago_DavSubspace diag_once 21.21 28 0.76 82.61 + Diago_DavSubspace first 5.13 28 0.18 19.99 + Operator hPsi 17.46 110 0.16 67.99 + Operator veff_pw 17.11 110 0.16 66.62 + PW_Basis_K recip2real 11.21 8480 0.00 43.64 + PW_Basis_K real2recip 8.70 6352 0.00 33.87 + Operator nonlocal_pw 0.34 110 0.00 1.34 + Diago_DavSubspace cal_elem 0.40 110 0.00 1.57 + Diago_DavSubspace cal_grad 15.50 82 0.19 60.36 + ElecStatePW psiToRho 2.88 14 0.21 11.20 +------------------------------------------------------------------- + + + START Time : Sun May 3 10:30:11 2026 + FINISH Time : Sun May 3 10:30:40 2026 + TOTAL Time : 29 + SEE INFORMATION IN : OUT.autotest/ diff --git a/tests/01_PW/035_PW_15_SO/log_dev_fresh.txt b/tests/01_PW/035_PW_15_SO/log_dev_fresh.txt new file mode 100644 index 00000000000..3ea86664e9d --- /dev/null +++ b/tests/01_PW/035_PW_15_SO/log_dev_fresh.txt @@ -0,0 +1,116 @@ +Info: Local MPI proc number: 4,OpenMP thread number: 3,Total thread number: 12,Local thread limit: 14 + + ABACUS v3.11.0-beta.1 + + Atomic-orbital Based Ab-initio Computation at UStc + + Website: http://abacus.ustc.edu.cn/ + Documentation: https://abacus.deepmodeling.com/ + Repository: https://github.com/abacusmodeling/abacus-develop + https://github.com/deepmodeling/abacus-develop + Commit: 0f9d7d97e (Thu Apr 30 12:48:20 2026 +0800) + + Sun May 3 10:26:48 2026 + MAKE THE DIR : OUT.autotest/ + RUNNING WITH DEVICE : CPU / Intel(R) Core(TM) Ultra 5 225H (x1) + WARNING: some of potential function is set to zero cause of less than 1e-30. + WARNING: some of potential function is set to zero cause of less than 1e-30. 
+ +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + Warning: the number of valence electrons in pseudopotential > 3 for Ga: [Ar] 3d10 4s2 4p1 + Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient. + If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning. +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + UNIFORM GRID DIM : 24 * 24 * 24 + UNIFORM GRID DIM(BIG): 24 * 24 * 24 + DONE(0.0236263 SEC) : SETUP UNITCELL + DONE(0.0258316 SEC) : INIT K-POINTS + ---------------------------------------------------------------- + Self-consistent calculations for electrons + ---------------------------------------------------------------- + SPIN KPOINTS PROCESSES THREADS/PROC THREADS/TOTAL + 4 2 4 3 12 + ---------------------------------------------------------------- + Use plane wave basis + ---------------------------------------------------------------- + ELEMENT NATOM + As 1 + Ga 1 + ---------------------------------------------------------------- + Initial plane wave basis and FFT box + ---------------------------------------------------------------- + DONE(0.0370996 SEC) : INIT PLANEWAVE + START CHARGE : atomic + DONE(0.0492078 SEC) : LOCAL POTENTIAL + DONE(0.0792156 SEC) : NON-LOCAL POTENTIAL + MEMORY FOR PSI (MB) : 0.266724 + DONE(0.0792726 SEC) : INIT BASIS + + ================================================================ + SELF-CONSISTENT: + ================================================================ + DONE(0.131711 SEC) : INIT SCF + ITER TMAGX TMAGY TMAGZ AMAG ETOT/eV EDIFF/eV DRHO TIME/s + DS1 0.00e+00 0.00e+00 2.00e+00 2.00e+00 -1.59867930e+03 0.00000000e+00 3.1042e+01 0.62 + DS2 0.00e+00 0.00e+00 9.75e-01 1.06e+00 -1.68133543e+03 -8.26561268e+01 3.8628e+00 0.15 + DS3 0.00e+00 0.00e+00 8.68e-01 8.72e-01 -1.67677930e+03 4.55612625e+00 
1.0730e+00 0.11 + DS4 0.00e+00 0.00e+00 7.46e-01 7.69e-01 -1.67820852e+03 -1.42921557e+00 8.1469e-02 0.09 + DS5 0.00e+00 0.00e+00 7.61e-01 7.70e-01 -1.67833326e+03 -1.24741925e-01 2.3457e-02 0.13 + DS6 0.00e+00 0.00e+00 7.60e-01 7.69e-01 -1.67835572e+03 -2.24548962e-02 3.2082e-03 0.07 + DS7 0.00e+00 0.00e+00 7.42e-01 7.50e-01 -1.67836230e+03 -6.58348573e-03 9.5446e-04 0.14 + DS8 0.00e+00 0.00e+00 7.31e-01 7.38e-01 -1.67836427e+03 -1.97256808e-03 1.3430e-04 0.07 + DS9 0.00e+00 0.00e+00 7.28e-01 7.35e-01 -1.67836476e+03 -4.90206437e-04 5.3510e-05 0.10 + DS10 0.00e+00 0.00e+00 7.32e-01 7.40e-01 -1.67836488e+03 -1.12483839e-04 2.8637e-05 0.06 + DS11 0.00e+00 0.00e+00 7.33e-01 7.41e-01 -1.67836501e+03 -1.31704337e-04 1.1546e-05 0.10 + DS12 0.00e+00 0.00e+00 7.33e-01 7.41e-01 -1.67836508e+03 -6.86851811e-05 3.3868e-06 0.09 + DS13 0.00e+00 0.00e+00 7.33e-01 7.41e-01 -1.67836508e+03 -9.16607600e-06 2.4541e-06 0.12 + DS14 0.00e+00 0.00e+00 7.34e-01 7.41e-01 -1.67836510e+03 -1.34461075e-05 3.5635e-07 0.19 + ---------------------------------------------------------------- + Stress_x Stress_y Stress_z + ---------------------------------------------------------------- + -10683.9706741759 -396.2387945264 396.2241082742 + -396.2387945264 -10683.9707016515 396.2241283692 + 396.2241082742 396.2241283692 -10626.8786336910 + ---------------------------------------------------------------- + TOTAL-PRESSURE (EXCLUDE KINETIC PART OF IONS): -10664.940003 kbar + + TIME STATISTICS +------------------------------------------------------------------- + CLASS_NAME NAME TIME/s CALLS AVG/s PER/% +------------------------------------------------------------------- + total 2.16 15 0.14 100.00 + Driver atomic_world 2.16 1 2.16 100.00 + PW_Basis_Sup recip2real 0.04 250 0.00 1.75 + ppcell_vnl init_vnl 0.03 1 0.03 1.18 + Relax_Driver relax_driver 2.08 1 2.08 96.25 + ESolver_KS runner 2.08 1 2.08 95.92 + ESolver_KS_PW before_scf 0.05 1 0.05 2.42 + H_Ewald_pw compute_ewald 0.02 1 0.02 1.14 + Potential 
cal_veff 0.08 15 0.01 3.87 + PW_Basis_Sup real2recip 0.05 289 0.00 2.43 + PotXC cal_veff 0.08 15 0.01 3.66 + XC_Functional v_xc 0.08 15 0.01 3.65 + ESolver_KS_PW hamilt2rho_single 1.91 14 0.14 88.35 + HSolverPW solve 1.91 14 0.14 88.33 + HSolverPW solve_psik 1.72 28 0.06 79.32 + Diago_DavSubspace diag_once 1.69 28 0.06 78.13 + Diago_DavSubspace first 0.50 28 0.02 23.31 + Operator hPsi 1.28 110 0.01 59.26 + Operator veff_pw 1.22 110 0.01 56.38 + PW_Basis_K recip2real 0.76 8480 0.00 35.27 + PW_Basis_K real2recip 0.61 6352 0.00 28.15 + Operator nonlocal_pw 0.06 110 0.00 2.81 + Nonlocal add_nonlocal_pp 0.03 110 0.00 1.18 + Diago_DavSubspace cal_elem 0.06 110 0.00 2.88 + Diago_DavSubspace diag_zhegvx 0.16 110 0.00 7.35 + Diago_DavSubspace cal_grad 0.98 82 0.01 45.24 + Diago_DavSubspace last 0.03 73 0.00 1.44 + ElecStatePW psiToRho 0.18 14 0.01 8.31 +------------------------------------------------------------------- + + + START Time : Sun May 3 10:26:48 2026 + FINISH Time : Sun May 3 10:26:50 2026 + TOTAL Time : 2 + SEE INFORMATION IN : OUT.autotest/ diff --git a/tests/01_PW/035_PW_15_SO/log_dev_np4.txt b/tests/01_PW/035_PW_15_SO/log_dev_np4.txt new file mode 100644 index 00000000000..1dfcab69834 --- /dev/null +++ b/tests/01_PW/035_PW_15_SO/log_dev_np4.txt @@ -0,0 +1,116 @@ +Info: Local MPI proc number: 4,OpenMP thread number: 3,Total thread number: 12,Local thread limit: 14 + + ABACUS v3.11.0-beta.1 + + Atomic-orbital Based Ab-initio Computation at UStc + + Website: http://abacus.ustc.edu.cn/ + Documentation: https://abacus.deepmodeling.com/ + Repository: https://github.com/abacusmodeling/abacus-develop + https://github.com/deepmodeling/abacus-develop + Commit: 0f9d7d97e (Thu Apr 30 12:48:20 2026 +0800) + + Sun May 3 09:53:39 2026 + MAKE THE DIR : OUT.autotest/ + RUNNING WITH DEVICE : CPU / Intel(R) Core(TM) Ultra 5 225H (x1) + WARNING: some of potential function is set to zero cause of less than 1e-30. 
+ WARNING: some of potential function is set to zero cause of less than 1e-30. + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + Warning: the number of valence electrons in pseudopotential > 3 for Ga: [Ar] 3d10 4s2 4p1 + Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient. + If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning. +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + UNIFORM GRID DIM : 24 * 24 * 24 + UNIFORM GRID DIM(BIG): 24 * 24 * 24 + DONE(0.0332596 SEC) : SETUP UNITCELL + DONE(0.0366598 SEC) : INIT K-POINTS + ---------------------------------------------------------------- + Self-consistent calculations for electrons + ---------------------------------------------------------------- + SPIN KPOINTS PROCESSES THREADS/PROC THREADS/TOTAL + 4 2 4 3 12 + ---------------------------------------------------------------- + Use plane wave basis + ---------------------------------------------------------------- + ELEMENT NATOM + As 1 + Ga 1 + ---------------------------------------------------------------- + Initial plane wave basis and FFT box + ---------------------------------------------------------------- + DONE(0.0414821 SEC) : INIT PLANEWAVE + START CHARGE : atomic + DONE(0.0673018 SEC) : LOCAL POTENTIAL + DONE(0.102441 SEC) : NON-LOCAL POTENTIAL + MEMORY FOR PSI (MB) : 0.266724 + DONE(0.102543 SEC) : INIT BASIS + + ================================================================ + SELF-CONSISTENT: + ================================================================ + DONE(0.20761 SEC) : INIT SCF + ITER TMAGX TMAGY TMAGZ AMAG ETOT/eV EDIFF/eV DRHO TIME/s + DS1 0.00e+00 0.00e+00 2.00e+00 2.00e+00 -1.59867930e+03 0.00000000e+00 3.1042e+01 0.89 + DS2 0.00e+00 0.00e+00 9.75e-01 1.06e+00 -1.68133543e+03 -8.26561268e+01 3.8628e+00 0.18 + DS3 
0.00e+00 0.00e+00 8.68e-01 8.72e-01 -1.67677930e+03 4.55612625e+00 1.0730e+00 0.09 + DS4 0.00e+00 0.00e+00 7.46e-01 7.69e-01 -1.67820852e+03 -1.42921557e+00 8.1469e-02 0.06 + DS5 0.00e+00 0.00e+00 7.61e-01 7.70e-01 -1.67833326e+03 -1.24741925e-01 2.3457e-02 0.15 + DS6 0.00e+00 0.00e+00 7.60e-01 7.69e-01 -1.67835572e+03 -2.24548962e-02 3.2082e-03 0.12 + DS7 0.00e+00 0.00e+00 7.42e-01 7.50e-01 -1.67836230e+03 -6.58348573e-03 9.5446e-04 1.30 + DS8 0.00e+00 0.00e+00 7.31e-01 7.38e-01 -1.67836427e+03 -1.97256808e-03 1.3430e-04 0.21 + DS9 0.00e+00 0.00e+00 7.28e-01 7.35e-01 -1.67836476e+03 -4.90206436e-04 5.3510e-05 0.30 + DS10 0.00e+00 0.00e+00 7.32e-01 7.40e-01 -1.67836488e+03 -1.12483839e-04 2.8637e-05 0.10 + DS11 0.00e+00 0.00e+00 7.33e-01 7.41e-01 -1.67836501e+03 -1.31704337e-04 1.1546e-05 0.10 + DS12 0.00e+00 0.00e+00 7.33e-01 7.41e-01 -1.67836508e+03 -6.86851807e-05 3.3868e-06 0.12 + DS13 0.00e+00 0.00e+00 7.33e-01 7.41e-01 -1.67836508e+03 -9.16607658e-06 2.4541e-06 0.13 + DS14 0.00e+00 0.00e+00 7.34e-01 7.41e-01 -1.67836510e+03 -1.34461078e-05 3.5635e-07 0.06 + ---------------------------------------------------------------- + Stress_x Stress_y Stress_z + ---------------------------------------------------------------- + -10683.9706741759 -396.2387945264 396.2241082742 + -396.2387945264 -10683.9707016515 396.2241283692 + 396.2241082742 396.2241283692 -10626.8786336910 + ---------------------------------------------------------------- + TOTAL-PRESSURE (EXCLUDE KINETIC PART OF IONS): -10664.940003 kbar + + TIME STATISTICS +------------------------------------------------------------------- + CLASS_NAME NAME TIME/s CALLS AVG/s PER/% +------------------------------------------------------------------- + total 4.03 15 0.27 100.00 + Driver atomic_world 4.03 1 4.03 100.00 + PW_Basis_Sup recip2real 0.07 250 0.00 1.78 + Relax_Driver relax_driver 3.92 1 3.92 97.40 + ESolver_KS runner 3.91 1 3.91 97.14 + ESolver_KS_PW before_scf 0.10 1 0.10 2.60 + Potential cal_veff 0.10 15 
0.01 2.39 + PW_Basis_Sup real2recip 0.07 289 0.00 1.82 + PotXC cal_veff 0.08 15 0.01 2.03 + XC_Functional v_xc 0.08 15 0.01 2.03 + PSIPrepare initialize_psi 0.09 1 0.09 2.33 + psi_init random_t 0.09 2 0.05 2.28 + psi_init stick_to_pool 0.06 27664 0.00 1.44 + ESolver_KS_PW hamilt2rho_single 3.61 14 0.26 89.64 + HSolverPW solve 3.61 14 0.26 89.63 + HSolverPW solve_psik 3.18 28 0.11 78.99 + Diago_DavSubspace diag_once 3.15 28 0.11 78.27 + Diago_DavSubspace first 0.56 28 0.02 13.87 + Operator hPsi 2.63 110 0.02 65.34 + Operator veff_pw 2.55 110 0.02 63.25 + PW_Basis_K recip2real 1.59 8480 0.00 39.38 + PW_Basis_K real2recip 1.34 6352 0.00 33.32 + Operator nonlocal_pw 0.08 110 0.00 2.01 + Diago_DavSubspace cal_elem 0.11 110 0.00 2.74 + Diago_DavSubspace diag_zhegvx 0.18 110 0.00 4.35 + Diago_DavSubspace cal_grad 2.32 82 0.03 57.68 + ElecStatePW psiToRho 0.42 14 0.03 10.33 + Charge_Mixing get_drho 0.05 14 0.00 1.25 +------------------------------------------------------------------- + + + START Time : Sun May 3 09:53:39 2026 + FINISH Time : Sun May 3 09:53:43 2026 + TOTAL Time : 4 + SEE INFORMATION IN : OUT.autotest/ diff --git a/tests/01_PW/035_PW_15_SO/log_dev_v2.txt b/tests/01_PW/035_PW_15_SO/log_dev_v2.txt new file mode 100644 index 00000000000..2f11684fb3e --- /dev/null +++ b/tests/01_PW/035_PW_15_SO/log_dev_v2.txt @@ -0,0 +1,116 @@ +Info: Local MPI proc number: 4,OpenMP thread number: 3,Total thread number: 12,Local thread limit: 14 + + ABACUS v3.11.0-beta.1 + + Atomic-orbital Based Ab-initio Computation at UStc + + Website: http://abacus.ustc.edu.cn/ + Documentation: https://abacus.deepmodeling.com/ + Repository: https://github.com/abacusmodeling/abacus-develop + https://github.com/deepmodeling/abacus-develop + Commit: 0f9d7d97e (Thu Apr 30 12:48:20 2026 +0800) + + Sun May 3 11:36:34 2026 + MAKE THE DIR : OUT.autotest/ + RUNNING WITH DEVICE : CPU / Intel(R) Core(TM) Ultra 5 225H (x1) + WARNING: some of potential function is set to zero cause of less than 1e-30. 
+ WARNING: some of potential function is set to zero cause of less than 1e-30. + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + Warning: the number of valence electrons in pseudopotential > 3 for Ga: [Ar] 3d10 4s2 4p1 + Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient. + If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning. +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + UNIFORM GRID DIM : 24 * 24 * 24 + UNIFORM GRID DIM(BIG): 24 * 24 * 24 + DONE(0.030315 SEC) : SETUP UNITCELL + DONE(0.0305225 SEC) : INIT K-POINTS + ---------------------------------------------------------------- + Self-consistent calculations for electrons + ---------------------------------------------------------------- + SPIN KPOINTS PROCESSES THREADS/PROC THREADS/TOTAL + 4 2 4 3 12 + ---------------------------------------------------------------- + Use plane wave basis + ---------------------------------------------------------------- + ELEMENT NATOM + As 1 + Ga 1 + ---------------------------------------------------------------- + Initial plane wave basis and FFT box + ---------------------------------------------------------------- + DONE(0.0370012 SEC) : INIT PLANEWAVE + START CHARGE : atomic + DONE(0.0565552 SEC) : LOCAL POTENTIAL + DONE(0.0896376 SEC) : NON-LOCAL POTENTIAL + MEMORY FOR PSI (MB) : 0.266724 + DONE(0.0897285 SEC) : INIT BASIS + + ================================================================ + SELF-CONSISTENT: + ================================================================ + DONE(0.275543 SEC) : INIT SCF + ITER TMAGX TMAGY TMAGZ AMAG ETOT/eV EDIFF/eV DRHO TIME/s + DS1 0.00e+00 0.00e+00 2.00e+00 2.00e+00 -1.59867930e+03 0.00000000e+00 3.1042e+01 1.19 + DS2 0.00e+00 0.00e+00 9.75e-01 1.06e+00 -1.68133543e+03 -8.26561268e+01 3.8628e+00 0.33 + 
DS3 0.00e+00 0.00e+00 8.68e-01 8.72e-01 -1.67677930e+03 4.55612625e+00 1.0730e+00 0.10 + DS4 0.00e+00 0.00e+00 7.46e-01 7.69e-01 -1.67820852e+03 -1.42921557e+00 8.1469e-02 0.10 + DS5 0.00e+00 0.00e+00 7.61e-01 7.70e-01 -1.67833326e+03 -1.24741925e-01 2.3457e-02 0.12 + DS6 0.00e+00 0.00e+00 7.60e-01 7.69e-01 -1.67835572e+03 -2.24548962e-02 3.2082e-03 0.11 + DS7 0.00e+00 0.00e+00 7.42e-01 7.50e-01 -1.67836230e+03 -6.58348573e-03 9.5446e-04 0.39 + DS8 0.00e+00 0.00e+00 7.31e-01 7.38e-01 -1.67836427e+03 -1.97256807e-03 1.3430e-04 0.16 + DS9 0.00e+00 0.00e+00 7.28e-01 7.35e-01 -1.67836476e+03 -4.90206437e-04 5.3510e-05 0.22 + DS10 0.00e+00 0.00e+00 7.32e-01 7.40e-01 -1.67836488e+03 -1.12483838e-04 2.8637e-05 0.11 + DS11 0.00e+00 0.00e+00 7.33e-01 7.41e-01 -1.67836501e+03 -1.31704337e-04 1.1546e-05 0.10 + DS12 0.00e+00 0.00e+00 7.33e-01 7.41e-01 -1.67836508e+03 -6.86851807e-05 3.3868e-06 0.11 + DS13 0.00e+00 0.00e+00 7.33e-01 7.41e-01 -1.67836508e+03 -9.16607697e-06 2.4541e-06 0.19 + DS14 0.00e+00 0.00e+00 7.34e-01 7.41e-01 -1.67836510e+03 -1.34461071e-05 3.5635e-07 0.13 + ---------------------------------------------------------------- + Stress_x Stress_y Stress_z + ---------------------------------------------------------------- + -10683.9706741759 -396.2387945264 396.2241082742 + -396.2387945264 -10683.9707016515 396.2241283692 + 396.2241082742 396.2241283692 -10626.8786336910 + ---------------------------------------------------------------- + TOTAL-PRESSURE (EXCLUDE KINETIC PART OF IONS): -10664.940003 kbar + + TIME STATISTICS +------------------------------------------------------------------- + CLASS_NAME NAME TIME/s CALLS AVG/s PER/% +------------------------------------------------------------------- + total 3.68 15 0.25 100.00 + Driver atomic_world 3.68 1 3.68 100.00 + PW_Basis_Sup recip2real 0.06 250 0.00 1.73 + Relax_Driver relax_driver 3.58 1 3.58 97.51 + ESolver_KS runner 3.55 1 3.55 96.68 + ESolver_KS_PW before_scf 0.19 1 0.19 5.05 + Potential cal_veff 
0.11 15 0.01 2.87 + PW_Basis_Sup real2recip 0.05 289 0.00 1.49 + PotXC cal_veff 0.09 15 0.01 2.54 + XC_Functional v_xc 0.09 15 0.01 2.53 + PSIPrepare initialize_psi 0.17 1 0.17 4.64 + psi_init random_t 0.17 2 0.09 4.63 + psi_init stick_to_pool 0.11 27664 0.00 3.01 + ESolver_KS_PW hamilt2rho_single 3.21 14 0.23 87.32 + HSolverPW solve 3.21 14 0.23 87.32 + HSolverPW solve_psik 2.83 28 0.10 77.05 + Diago_DavSubspace diag_once 2.79 28 0.10 75.89 + Diago_DavSubspace first 0.88 28 0.03 24.05 + Operator hPsi 2.30 110 0.02 62.47 + Operator veff_pw 2.22 110 0.02 60.43 + PW_Basis_K recip2real 1.37 8480 0.00 37.39 + PW_Basis_K real2recip 1.17 6352 0.00 31.87 + Operator nonlocal_pw 0.07 110 0.00 1.99 + Diago_DavSubspace cal_elem 0.07 110 0.00 1.88 + Diago_DavSubspace diag_zhegvx 0.18 110 0.00 4.81 + Diago_DavSubspace cal_grad 1.66 82 0.02 45.15 + Diago_DavSubspace last 0.04 73 0.00 1.22 + ElecStatePW psiToRho 0.36 14 0.03 9.78 +------------------------------------------------------------------- + + + START Time : Sun May 3 11:36:34 2026 + FINISH Time : Sun May 3 11:36:38 2026 + TOTAL Time : 4 + SEE INFORMATION IN : OUT.autotest/ diff --git a/tests/01_PW/035_PW_15_SO/log_final.txt b/tests/01_PW/035_PW_15_SO/log_final.txt new file mode 100644 index 00000000000..670673e6b62 --- /dev/null +++ b/tests/01_PW/035_PW_15_SO/log_final.txt @@ -0,0 +1,61 @@ + + ABACUS v3.11.0-beta.1 + + Atomic-orbital Based Ab-initio Computation at UStc + + Website: http://abacus.ustc.edu.cn/ + Documentation: https://abacus.deepmodeling.com/ + Repository: https://github.com/abacusmodeling/abacus-develop + https://github.com/deepmodeling/abacus-develop + Commit: 5837a6526 (Sun May 3 09:44:20 2026 +0800) + + Sun May 3 11:41:06 2026 +Info: Local MPI proc number: 4,OpenMP thread number: 3,Total thread number: 12,Local thread limit: 14 + MAKE THE DIR : OUT.autotest/ + RUNNING WITH DEVICE : CPU / Intel(R) Core(TM) Ultra 5 225H (x1) + WARNING: some of potential function is set to zero cause of less than 1e-30. 
+ WARNING: some of potential function is set to zero cause of less than 1e-30. + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + Warning: the number of valence electrons in pseudopotential > 3 for Ga: [Ar] 3d10 4s2 4p1 + Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient. + If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning. +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + UNIFORM GRID DIM : 24 * 24 * 24 + UNIFORM GRID DIM(BIG): 24 * 24 * 24 + DONE(1.433e-05 SEC) : SETUP UNITCELL + DONE(0.00395945 SEC) : INIT K-POINTS + ---------------------------------------------------------------- + Self-consistent calculations for electrons + ---------------------------------------------------------------- + SPIN KPOINTS PROCESSES THREADS/PROC THREADS/TOTAL + 4 2 4 3 12 + ---------------------------------------------------------------- + Use plane wave basis + ---------------------------------------------------------------- + ELEMENT NATOM XC + As 1 + Ga 1 + ---------------------------------------------------------------- + Initial plane wave basis and FFT box + ---------------------------------------------------------------- + DONE(0.0470998 SEC) : INIT PLANEWAVE + START CHARGE : atomic + DONE(0.238311 SEC) : LOCAL POTENTIAL + DONE(0.305711 SEC) : NON-LOCAL POTENTIAL + MEMORY FOR PSI (MB) : 0.266724 + DONE(0.305784 SEC) : INIT BASIS + + ================================================================ + SELF-CONSISTENT: + ================================================================ + DONE(4.74629 SEC) : INIT SCF + ITER TMAGX TMAGY TMAGZ AMAG ETOT/eV EDIFF/eV DRHO TIME/s + DS1 0.00e+00 0.00e+00 2.00e+00 2.00e+00 -1.59867930e+03 0.00000000e+00 3.1042e+01 119.68 + DS2 0.00e+00 0.00e+00 9.75e-01 1.06e+00 -1.68133543e+03 -8.26561268e+01 3.8628e+00 30.44 
+ DS3 0.00e+00 0.00e+00 8.68e-01 8.72e-01 -1.67677930e+03 4.55612625e+00 1.0730e+00 24.67 + DS4 0.00e+00 0.00e+00 7.46e-01 7.69e-01 -1.67820852e+03 -1.42921557e+00 8.1469e-02 22.22 + DS5 0.00e+00 0.00e+00 7.61e-01 7.70e-01 -1.67833326e+03 -1.24741925e-01 2.3457e-02 27.82 + DS6 0.00e+00 0.00e+00 7.60e-01 7.69e-01 -1.67835572e+03 -2.24548962e-02 3.2082e-03 28.66 + DS7 0.00e+00 0.00e+00 7.42e-01 7.50e-01 -1.67836230e+03 -6.58348573e-03 9.5446e-04 16.99 diff --git a/tests/01_PW/035_PW_15_SO/log_pr_correct.txt b/tests/01_PW/035_PW_15_SO/log_pr_correct.txt new file mode 100644 index 00000000000..0b1a32515e5 --- /dev/null +++ b/tests/01_PW/035_PW_15_SO/log_pr_correct.txt @@ -0,0 +1,56 @@ +Info: Local MPI proc number: 4,OpenMP thread number: 3,Total thread number: 12,Local thread limit: 14 + + ABACUS v3.11.0-beta.1 + + Atomic-orbital Based Ab-initio Computation at UStc + + Website: http://abacus.ustc.edu.cn/ + Documentation: https://abacus.deepmodeling.com/ + Repository: https://github.com/abacusmodeling/abacus-develop + https://github.com/deepmodeling/abacus-develop + Commit: 5837a6526 (Sun May 3 09:44:20 2026 +0800) + + Sun May 3 11:32:46 2026 + MAKE THE DIR : OUT.autotest/ + RUNNING WITH DEVICE : CPU / Intel(R) Core(TM) Ultra 5 225H (x1) + WARNING: some of potential function is set to zero cause of less than 1e-30. + WARNING: some of potential function is set to zero cause of less than 1e-30. + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + Warning: the number of valence electrons in pseudopotential > 3 for Ga: [Ar] 3d10 4s2 4p1 + Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient. + If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning. 
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + UNIFORM GRID DIM : 24 * 24 * 24 + UNIFORM GRID DIM(BIG): 24 * 24 * 24 + DONE(9.947e-06 SEC) : SETUP UNITCELL + DONE(0.00671472 SEC) : INIT K-POINTS + ---------------------------------------------------------------- + Self-consistent calculations for electrons + ---------------------------------------------------------------- + SPIN KPOINTS PROCESSES THREADS/PROC THREADS/TOTAL + 4 2 4 3 12 + ---------------------------------------------------------------- + Use plane wave basis + ---------------------------------------------------------------- + ELEMENT NATOM XC + As 1 + Ga 1 + ---------------------------------------------------------------- + Initial plane wave basis and FFT box + ---------------------------------------------------------------- + DONE(0.0815573 SEC) : INIT PLANEWAVE + START CHARGE : atomic + DONE(0.430605 SEC) : LOCAL POTENTIAL + DONE(0.479579 SEC) : NON-LOCAL POTENTIAL + MEMORY FOR PSI (MB) : 0.266724 + DONE(0.479772 SEC) : INIT BASIS + + ================================================================ + SELF-CONSISTENT: + ================================================================ + DONE(3.3452 SEC) : INIT SCF + ITER TMAGX TMAGY TMAGZ AMAG ETOT/eV EDIFF/eV DRHO TIME/s + DS1 0.00e+00 0.00e+00 2.00e+00 2.00e+00 -1.59867930e+03 0.00000000e+00 3.1042e+01 106.62 + DS2 0.00e+00 0.00e+00 9.75e-01 1.06e+00 -1.68133543e+03 -8.26561268e+01 3.8628e+00 0.57 diff --git a/tests/01_PW/035_PW_15_SO/log_pr_fixed.txt b/tests/01_PW/035_PW_15_SO/log_pr_fixed.txt new file mode 100644 index 00000000000..39f8bd62865 --- /dev/null +++ b/tests/01_PW/035_PW_15_SO/log_pr_fixed.txt @@ -0,0 +1,118 @@ +Info: Local MPI proc number: 4,OpenMP thread number: 3,Total thread number: 12,Local thread limit: 14 + + ABACUS v3.11.0-beta.1 + + Atomic-orbital Based Ab-initio Computation at UStc + + Website: http://abacus.ustc.edu.cn/ + Documentation: 
https://abacus.deepmodeling.com/ + Repository: https://github.com/abacusmodeling/abacus-develop + https://github.com/deepmodeling/abacus-develop + Commit: 5837a6526 (Sun May 3 09:44:20 2026 +0800) + + Sun May 3 09:57:21 2026 + MAKE THE DIR : OUT.autotest/ + RUNNING WITH DEVICE : CPU / Intel(R) Core(TM) Ultra 5 225H (x1) + WARNING: some of potential function is set to zero cause of less than 1e-30. + WARNING: some of potential function is set to zero cause of less than 1e-30. + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + Warning: the number of valence electrons in pseudopotential > 3 for Ga: [Ar] 3d10 4s2 4p1 + Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient. + If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning. +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + UNIFORM GRID DIM : 24 * 24 * 24 + UNIFORM GRID DIM(BIG): 24 * 24 * 24 + DONE(1.2009e-05 SEC) : SETUP UNITCELL + DONE(0.00110645 SEC) : INIT K-POINTS + ---------------------------------------------------------------- + Self-consistent calculations for electrons + ---------------------------------------------------------------- + SPIN KPOINTS PROCESSES THREADS/PROC THREADS/TOTAL + 4 2 4 3 12 + ---------------------------------------------------------------- + Use plane wave basis + ---------------------------------------------------------------- + ELEMENT NATOM XC + As 1 + Ga 1 + ---------------------------------------------------------------- + Initial plane wave basis and FFT box + ---------------------------------------------------------------- + DONE(0.0045507 SEC) : INIT PLANEWAVE + START CHARGE : atomic + DONE(0.0123942 SEC) : LOCAL POTENTIAL + DONE(0.0494793 SEC) : NON-LOCAL POTENTIAL + MEMORY FOR PSI (MB) : 0.266724 + DONE(0.0495481 SEC) : INIT BASIS + + 
================================================================ + SELF-CONSISTENT: + ================================================================ + DONE(0.339499 SEC) : INIT SCF + ITER TMAGX TMAGY TMAGZ AMAG ETOT/eV EDIFF/eV DRHO TIME/s + DS1 0.00e+00 0.00e+00 2.00e+00 2.00e+00 -1.59867930e+03 0.00000000e+00 3.1042e+01 0.85 + DS2 0.00e+00 0.00e+00 9.75e-01 1.06e+00 -1.68133543e+03 -8.26561268e+01 3.8628e+00 0.25 + DS3 0.00e+00 0.00e+00 8.68e-01 8.72e-01 -1.67677930e+03 4.55612625e+00 1.0730e+00 0.10 + DS4 0.00e+00 0.00e+00 7.46e-01 7.69e-01 -1.67820852e+03 -1.42921557e+00 8.1469e-02 0.10 + DS5 0.00e+00 0.00e+00 7.61e-01 7.70e-01 -1.67833326e+03 -1.24741925e-01 2.3457e-02 0.11 + DS6 0.00e+00 0.00e+00 7.60e-01 7.69e-01 -1.67835572e+03 -2.24548962e-02 3.2082e-03 0.10 + DS7 0.00e+00 0.00e+00 7.42e-01 7.50e-01 -1.67836230e+03 -6.58348573e-03 9.5446e-04 0.12 + DS8 0.00e+00 0.00e+00 7.31e-01 7.38e-01 -1.67836427e+03 -1.97256808e-03 1.3430e-04 0.13 + DS9 0.00e+00 0.00e+00 7.28e-01 7.35e-01 -1.67836476e+03 -4.90206436e-04 5.3510e-05 0.13 + DS10 0.00e+00 0.00e+00 7.32e-01 7.40e-01 -1.67836488e+03 -1.12483838e-04 2.8637e-05 0.10 + DS11 0.00e+00 0.00e+00 7.33e-01 7.41e-01 -1.67836501e+03 -1.31704337e-04 1.1546e-05 0.13 + DS12 0.00e+00 0.00e+00 7.33e-01 7.41e-01 -1.67836508e+03 -6.86851806e-05 3.3868e-06 0.15 + DS13 0.00e+00 0.00e+00 7.33e-01 7.41e-01 -1.67836508e+03 -9.16607677e-06 2.4541e-06 0.12 + DS14 0.00e+00 0.00e+00 7.34e-01 7.41e-01 -1.67836510e+03 -1.34461075e-05 3.5635e-07 0.13 + ---------------------------------------------------------------- + Stress_x Stress_y Stress_z + ---------------------------------------------------------------- + -10677.0852150830 -396.2017451132 396.2491608088 + -396.2017451132 -10680.4013171834 396.1655869911 + 396.2491608088 396.1655869911 -10619.9881143378 + ---------------------------------------------------------------- + TOTAL-PRESSURE (EXCLUDE KINETIC PART OF IONS): -10659.158216 kbar + + TIME STATISTICS 
+------------------------------------------------------------------- + CLASS_NAME NAME TIME/s CALLS AVG/s PER/% +------------------------------------------------------------------- + Driver atomic_world 2.90 1 2.90 100.00 + total 2.88 14 0.21 99.31 + PW_Basis_Sup recip2real 0.10 250 0.00 3.52 + Relax_Driver relax_driver 2.83 1 2.83 97.56 + ESolver_KS runner 2.81 1 2.81 96.80 + ESolver_KS_PW before_scf 0.29 1 0.29 9.98 + Potential init_pot 0.12 1 0.12 3.98 + Potential cal_veff 0.20 15 0.01 6.84 + PW_Basis_Sup real2recip 0.10 289 0.00 3.58 + PotXC cal_veff 0.18 15 0.01 6.10 + XC_Functional v_xc 0.18 15 0.01 6.09 + PSIPrepare initialize_psi 0.17 1 0.17 5.97 + psi_init random_t 0.17 2 0.08 5.85 + psi_init stick_to_pool 0.11 27664 0.00 3.88 + ESolver_KS_PW hamilt2rho_single 2.37 14 0.17 81.73 + HSolverPW solve 2.37 14 0.17 81.73 + HSolverPW solve_psik 2.00 28 0.07 68.94 + Diago_DavSubspace diag_once 1.98 28 0.07 68.07 + Diago_DavSubspace first 0.54 28 0.02 18.63 + Operator hPsi 1.52 110 0.01 52.20 + Operator veff_pw 1.45 110 0.01 49.77 + PW_Basis_K recip2real 1.02 8480 0.00 35.19 + PW_Basis_K real2recip 0.73 6352 0.00 25.22 + Operator nonlocal_pw 0.07 110 0.00 2.37 + Nonlocal add_nonlocal_pp 0.03 110 0.00 1.07 + Diago_DavSubspace cal_elem 0.06 110 0.00 2.10 + Diago_DavSubspace diag_zhegvx 0.17 110 0.00 5.96 + Diago_DavSubspace cal_grad 1.19 82 0.01 41.08 + Diago_DavSubspace last 0.05 73 0.00 1.65 + ElecStatePW psiToRho 0.34 14 0.02 11.82 +------------------------------------------------------------------- + + + START Time : Sun May 3 09:57:21 2026 + FINISH Time : Sun May 3 09:57:27 2026 + TOTAL Time : 6 + SEE INFORMATION IN : OUT.autotest/ diff --git a/tests/01_PW/035_PW_15_SO/log_pr_fresh.txt b/tests/01_PW/035_PW_15_SO/log_pr_fresh.txt new file mode 100644 index 00000000000..b7de0605028 --- /dev/null +++ b/tests/01_PW/035_PW_15_SO/log_pr_fresh.txt @@ -0,0 +1,115 @@ +Info: Local MPI proc number: 4,OpenMP thread number: 3,Total thread number: 12,Local thread limit: 14 + 
+ ABACUS v3.11.0-beta.1 + + Atomic-orbital Based Ab-initio Computation at UStc + + Website: http://abacus.ustc.edu.cn/ + Documentation: https://abacus.deepmodeling.com/ + Repository: https://github.com/abacusmodeling/abacus-develop + https://github.com/deepmodeling/abacus-develop + Commit: 5837a6526 (Sun May 3 09:44:20 2026 +0800) + + Sun May 3 10:26:50 2026 + MAKE THE DIR : OUT.autotest/ + RUNNING WITH DEVICE : CPU / Intel(R) Core(TM) Ultra 5 225H (x1) + WARNING: some of potential function is set to zero cause of less than 1e-30. + WARNING: some of potential function is set to zero cause of less than 1e-30. + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + Warning: the number of valence electrons in pseudopotential > 3 for Ga: [Ar] 3d10 4s2 4p1 + Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient. + If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning. 
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + UNIFORM GRID DIM : 24 * 24 * 24 + UNIFORM GRID DIM(BIG): 24 * 24 * 24 + DONE(1.0535e-05 SEC) : SETUP UNITCELL + DONE(0.00131561 SEC) : INIT K-POINTS + ---------------------------------------------------------------- + Self-consistent calculations for electrons + ---------------------------------------------------------------- + SPIN KPOINTS PROCESSES THREADS/PROC THREADS/TOTAL + 4 2 4 3 12 + ---------------------------------------------------------------- + Use plane wave basis + ---------------------------------------------------------------- + ELEMENT NATOM XC + As 1 + Ga 1 + ---------------------------------------------------------------- + Initial plane wave basis and FFT box + ---------------------------------------------------------------- + DONE(0.0131958 SEC) : INIT PLANEWAVE + START CHARGE : atomic + DONE(0.0224664 SEC) : LOCAL POTENTIAL + DONE(0.0563405 SEC) : NON-LOCAL POTENTIAL + MEMORY FOR PSI (MB) : 0.266724 + DONE(0.0564181 SEC) : INIT BASIS + + ================================================================ + SELF-CONSISTENT: + ================================================================ + DONE(0.0978204 SEC) : INIT SCF + ITER TMAGX TMAGY TMAGZ AMAG ETOT/eV EDIFF/eV DRHO TIME/s + DS1 0.00e+00 0.00e+00 2.00e+00 2.00e+00 -1.59867930e+03 0.00000000e+00 3.1042e+01 0.64 + DS2 0.00e+00 0.00e+00 9.75e-01 1.06e+00 -1.68133543e+03 -8.26561268e+01 3.8628e+00 0.12 + DS3 0.00e+00 0.00e+00 8.68e-01 8.72e-01 -1.67677930e+03 4.55612625e+00 1.0730e+00 0.09 + DS4 0.00e+00 0.00e+00 7.46e-01 7.69e-01 -1.67820852e+03 -1.42921557e+00 8.1469e-02 0.18 + DS5 0.00e+00 0.00e+00 7.61e-01 7.70e-01 -1.67833326e+03 -1.24741925e-01 2.3457e-02 0.33 + DS6 0.00e+00 0.00e+00 7.60e-01 7.69e-01 -1.67835572e+03 -2.24548962e-02 3.2082e-03 0.24 + DS7 0.00e+00 0.00e+00 7.42e-01 7.50e-01 -1.67836230e+03 -6.58348573e-03 9.5446e-04 0.13 + DS8 0.00e+00 0.00e+00 7.31e-01 7.38e-01 
-1.67836427e+03 -1.97256807e-03 1.3430e-04 0.11 + DS9 0.00e+00 0.00e+00 7.28e-01 7.35e-01 -1.67836476e+03 -4.90206437e-04 5.3510e-05 0.15 + DS10 0.00e+00 0.00e+00 7.32e-01 7.40e-01 -1.67836488e+03 -1.12483838e-04 2.8637e-05 0.07 + DS11 0.00e+00 0.00e+00 7.33e-01 7.41e-01 -1.67836501e+03 -1.31704337e-04 1.1546e-05 0.15 + DS12 0.00e+00 0.00e+00 7.33e-01 7.41e-01 -1.67836508e+03 -6.86851807e-05 3.3868e-06 0.09 + DS13 0.00e+00 0.00e+00 7.33e-01 7.41e-01 -1.67836508e+03 -9.16607677e-06 2.4541e-06 0.09 + DS14 0.00e+00 0.00e+00 7.34e-01 7.41e-01 -1.67836510e+03 -1.34461073e-05 3.5635e-07 0.14 + ---------------------------------------------------------------- + Stress_x Stress_y Stress_z + ---------------------------------------------------------------- + -10677.0852150830 -396.2017451132 396.2491608088 + -396.2017451132 -10680.4013171834 396.1655869911 + 396.2491608088 396.1655869911 -10619.9881143378 + ---------------------------------------------------------------- + TOTAL-PRESSURE (EXCLUDE KINETIC PART OF IONS): -10659.158216 kbar + + TIME STATISTICS +------------------------------------------------------------------- + CLASS_NAME NAME TIME/s CALLS AVG/s PER/% +------------------------------------------------------------------- + Driver atomic_world 2.68 1 2.68 100.00 + total 2.66 14 0.19 98.99 + PW_Basis_Sup recip2real 0.05 250 0.00 1.86 + ppcell_vnl init_vnl 0.03 1 0.03 1.11 + Relax_Driver relax_driver 2.60 1 2.60 96.82 + ESolver_KS runner 2.58 1 2.58 96.13 + ESolver_KS_PW before_scf 0.04 1 0.04 1.54 + Potential cal_veff 0.09 15 0.01 3.35 + PW_Basis_Sup real2recip 0.05 289 0.00 1.94 + PotXC cal_veff 0.08 15 0.01 3.03 + XC_Functional v_xc 0.08 15 0.01 3.01 + PSIPrepare initialize_psi 0.04 1 0.04 1.40 + psi_init random_t 0.04 2 0.02 1.39 + ESolver_KS_PW hamilt2rho_single 2.40 14 0.17 89.52 + HSolverPW solve 2.40 14 0.17 89.52 + HSolverPW solve_psik 2.18 28 0.08 81.22 + Diago_DavSubspace diag_once 2.14 28 0.08 79.91 + Diago_DavSubspace first 0.63 28 0.02 23.36 + 
Operator hPsi 1.62 110 0.01 60.26 + Operator veff_pw 1.56 110 0.01 58.04 + PW_Basis_K recip2real 0.88 8480 0.00 32.85 + PW_Basis_K real2recip 0.85 6352 0.00 31.71 + Operator nonlocal_pw 0.06 110 0.00 2.15 + Diago_DavSubspace cal_elem 0.08 110 0.00 2.86 + Diago_DavSubspace diag_zhegvx 0.16 110 0.00 5.91 + Diago_DavSubspace cal_grad 1.30 82 0.02 48.42 + ElecStatePW psiToRho 0.21 14 0.01 7.65 +------------------------------------------------------------------- + + + START Time : Sun May 3 10:26:50 2026 + FINISH Time : Sun May 3 10:26:53 2026 + TOTAL Time : 3 + SEE INFORMATION IN : OUT.autotest/ diff --git a/tests/01_PW/035_PW_15_SO/log_pr_np4.txt b/tests/01_PW/035_PW_15_SO/log_pr_np4.txt new file mode 100644 index 00000000000..9c5a7e6b7eb --- /dev/null +++ b/tests/01_PW/035_PW_15_SO/log_pr_np4.txt @@ -0,0 +1,117 @@ + + ABACUS v3.11.0-beta.1 + + Atomic-orbital Based Ab-initio Computation at UStc + + Website: http://abacus.ustc.edu.cn/ + Documentation: https://abacus.deepmodeling.com/ + Repository: https://github.com/abacusmodeling/abacus-develop + https://github.com/deepmodeling/abacus-develop + Commit: 55690612c (Sat May 2 13:10:55 2026 +0800) + + Sun May 3 09:54:29 2026 +Info: Local MPI proc number: 4,OpenMP thread number: 3,Total thread number: 12,Local thread limit: 14 + MAKE THE DIR : OUT.autotest/ + RUNNING WITH DEVICE : CPU / Intel(R) Core(TM) Ultra 5 225H (x1) + WARNING: some of potential function is set to zero cause of less than 1e-30. + WARNING: some of potential function is set to zero cause of less than 1e-30. + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + Warning: the number of valence electrons in pseudopotential > 3 for Ga: [Ar] 3d10 4s2 4p1 + Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient. + If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning. 
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + UNIFORM GRID DIM : 24 * 24 * 24 + UNIFORM GRID DIM(BIG): 24 * 24 * 24 + DONE(9.938e-06 SEC) : SETUP UNITCELL + DONE(0.00258591 SEC) : INIT K-POINTS + ---------------------------------------------------------------- + Self-consistent calculations for electrons + ---------------------------------------------------------------- + SPIN KPOINTS PROCESSES THREADS/PROC THREADS/TOTAL + 4 2 4 3 12 + ---------------------------------------------------------------- + Use plane wave basis + ---------------------------------------------------------------- + ELEMENT NATOM XC + As 1 + Ga 1 + ---------------------------------------------------------------- + Initial plane wave basis and FFT box + ---------------------------------------------------------------- + DONE(0.00815335 SEC) : INIT PLANEWAVE + START CHARGE : atomic + DONE(0.0209517 SEC) : LOCAL POTENTIAL + DONE(0.0517217 SEC) : NON-LOCAL POTENTIAL + MEMORY FOR PSI (MB) : 0.266724 + DONE(0.0518776 SEC) : INIT BASIS + + ================================================================ + SELF-CONSISTENT: + ================================================================ + DONE(0.142905 SEC) : INIT SCF + ITER TMAGX TMAGY TMAGZ AMAG ETOT/eV EDIFF/eV DRHO TIME/s + DS1 0.00e+00 0.00e+00 2.00e+00 2.00e+00 -1.59867930e+03 0.00000000e+00 3.1042e+01 0.80 + DS2 0.00e+00 0.00e+00 9.75e-01 1.06e+00 -1.68133543e+03 -8.26561268e+01 3.8628e+00 0.19 + DS3 0.00e+00 0.00e+00 8.68e-01 8.72e-01 -1.67677930e+03 4.55612625e+00 1.0730e+00 0.09 + DS4 0.00e+00 0.00e+00 7.46e-01 7.69e-01 -1.67820852e+03 -1.42921557e+00 8.1469e-02 0.11 + DS5 0.00e+00 0.00e+00 7.61e-01 7.70e-01 -1.67833326e+03 -1.24741925e-01 2.3457e-02 0.19 + DS6 0.00e+00 0.00e+00 7.60e-01 7.69e-01 -1.67835572e+03 -2.24548962e-02 3.2082e-03 0.11 + DS7 0.00e+00 0.00e+00 7.42e-01 7.50e-01 -1.67836230e+03 -6.58348573e-03 9.5446e-04 0.17 + DS8 0.00e+00 0.00e+00 7.31e-01 7.38e-01 
-1.67836427e+03 -1.97256808e-03 1.3430e-04 0.11 + DS9 0.00e+00 0.00e+00 7.28e-01 7.35e-01 -1.67836476e+03 -4.90206437e-04 5.3510e-05 0.08 + DS10 0.00e+00 0.00e+00 7.32e-01 7.40e-01 -1.67836488e+03 -1.12483838e-04 2.8637e-05 0.12 + DS11 0.00e+00 0.00e+00 7.33e-01 7.41e-01 -1.67836501e+03 -1.31704337e-04 1.1546e-05 0.10 + DS12 0.00e+00 0.00e+00 7.33e-01 7.41e-01 -1.67836508e+03 -6.86851813e-05 3.3868e-06 0.13 + DS13 0.00e+00 0.00e+00 7.33e-01 7.41e-01 -1.67836508e+03 -9.16607619e-06 2.4541e-06 0.33 + DS14 0.00e+00 0.00e+00 7.34e-01 7.41e-01 -1.67836510e+03 -1.34461073e-05 3.5635e-07 0.30 + ---------------------------------------------------------------- + Stress_x Stress_y Stress_z + ---------------------------------------------------------------- + -10677.0852150830 -396.2017451132 396.2491608088 + -396.2017451132 -10680.4013171834 396.1655869911 + 396.2491608088 396.1655869911 -10619.9881143378 + ---------------------------------------------------------------- + TOTAL-PRESSURE (EXCLUDE KINETIC PART OF IONS): -10659.158216 kbar + + TIME STATISTICS +------------------------------------------------------------------- + CLASS_NAME NAME TIME/s CALLS AVG/s PER/% +------------------------------------------------------------------- + Driver atomic_world 3.06 1 3.06 100.00 + total 3.03 14 0.22 99.14 + PW_Basis_Sup recip2real 0.09 250 0.00 3.07 + Relax_Driver relax_driver 2.98 1 2.98 97.39 + ESolver_KS runner 2.95 1 2.95 96.49 + ESolver_KS_PW before_scf 0.09 1 0.09 2.97 + Potential cal_veff 0.16 15 0.01 5.10 + PW_Basis_Sup real2recip 0.08 289 0.00 2.72 + PotXC cal_veff 0.14 15 0.01 4.57 + XC_Functional v_xc 0.14 15 0.01 4.56 + PSIPrepare initialize_psi 0.07 1 0.07 2.17 + psi_init random_t 0.07 2 0.03 2.15 + psi_init stick_to_pool 0.05 27664 0.00 1.59 + ESolver_KS_PW hamilt2rho_single 2.64 14 0.19 86.51 + HSolverPW solve 2.64 14 0.19 86.50 + HSolverPW solve_psik 2.35 28 0.08 77.01 + Diago_DavSubspace diag_once 2.33 28 0.08 76.31 + Diago_DavSubspace first 0.66 28 0.02 21.68 + 
Operator hPsi 1.83 110 0.02 59.89 + Operator veff_pw 1.76 110 0.02 57.63 + PW_Basis_K recip2real 1.12 8480 0.00 36.48 + PW_Basis_K real2recip 0.88 6352 0.00 28.83 + Operator nonlocal_pw 0.07 110 0.00 2.20 + Nonlocal add_nonlocal_pp 0.03 110 0.00 1.00 + Diago_DavSubspace cal_elem 0.08 110 0.00 2.57 + Diago_DavSubspace diag_zhegvx 0.19 110 0.00 6.07 + Diago_DavSubspace cal_grad 1.42 82 0.02 46.56 + Diago_DavSubspace last 0.03 73 0.00 1.06 + ElecStatePW psiToRho 0.27 14 0.02 8.68 +------------------------------------------------------------------- + + + START Time : Sun May 3 09:54:29 2026 + FINISH Time : Sun May 3 09:54:32 2026 + TOTAL Time : 3 + SEE INFORMATION IN : OUT.autotest/ diff --git a/tests/01_PW/035_PW_15_SO/log_v2.txt b/tests/01_PW/035_PW_15_SO/log_v2.txt new file mode 100644 index 00000000000..d9a1d0acec2 --- /dev/null +++ b/tests/01_PW/035_PW_15_SO/log_v2.txt @@ -0,0 +1,120 @@ +Info: Local MPI proc number: 4,OpenMP thread number: 3,Total thread number: 12,Local thread limit: 14 + + ABACUS v3.11.0-beta.1 + + Atomic-orbital Based Ab-initio Computation at UStc + + Website: http://abacus.ustc.edu.cn/ + Documentation: https://abacus.deepmodeling.com/ + Repository: https://github.com/abacusmodeling/abacus-develop + https://github.com/deepmodeling/abacus-develop + Commit: 5837a6526 (Sun May 3 09:44:20 2026 +0800) + + Sun May 3 11:35:24 2026 + MAKE THE DIR : OUT.autotest/ + RUNNING WITH DEVICE : CPU / Intel(R) Core(TM) Ultra 5 225H (x1) + WARNING: some of potential function is set to zero cause of less than 1e-30. + WARNING: some of potential function is set to zero cause of less than 1e-30. + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + Warning: the number of valence electrons in pseudopotential > 3 for Ga: [Ar] 3d10 4s2 4p1 + Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient. 
+ If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning. +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + UNIFORM GRID DIM : 24 * 24 * 24 + UNIFORM GRID DIM(BIG): 24 * 24 * 24 + DONE(8.644e-06 SEC) : SETUP UNITCELL + DONE(0.0042798 SEC) : INIT K-POINTS + ---------------------------------------------------------------- + Self-consistent calculations for electrons + ---------------------------------------------------------------- + SPIN KPOINTS PROCESSES THREADS/PROC THREADS/TOTAL + 4 2 4 3 12 + ---------------------------------------------------------------- + Use plane wave basis + ---------------------------------------------------------------- + ELEMENT NATOM XC + As 1 + Ga 1 + ---------------------------------------------------------------- + Initial plane wave basis and FFT box + ---------------------------------------------------------------- + DONE(0.0141793 SEC) : INIT PLANEWAVE + START CHARGE : atomic + DONE(0.0260913 SEC) : LOCAL POTENTIAL + DONE(0.0581521 SEC) : NON-LOCAL POTENTIAL + MEMORY FOR PSI (MB) : 0.266724 + DONE(0.05822 SEC) : INIT BASIS + + ================================================================ + SELF-CONSISTENT: + ================================================================ + DONE(0.11065 SEC) : INIT SCF + ITER TMAGX TMAGY TMAGZ AMAG ETOT/eV EDIFF/eV DRHO TIME/s + DS1 0.00e+00 0.00e+00 2.00e+00 2.00e+00 -1.59867930e+03 0.00000000e+00 3.1042e+01 0.79 + DS2 0.00e+00 0.00e+00 9.75e-01 1.06e+00 -1.68133543e+03 -8.26561268e+01 3.8628e+00 0.20 + DS3 0.00e+00 0.00e+00 8.68e-01 8.72e-01 -1.67677930e+03 4.55612625e+00 1.0730e+00 0.12 + DS4 0.00e+00 0.00e+00 7.46e-01 7.69e-01 -1.67820852e+03 -1.42921557e+00 8.1469e-02 0.14 + DS5 0.00e+00 0.00e+00 7.61e-01 7.70e-01 -1.67833326e+03 -1.24741925e-01 2.3457e-02 0.17 + DS6 0.00e+00 0.00e+00 7.60e-01 7.69e-01 -1.67835572e+03 -2.24548962e-02 3.2082e-03 0.14 + DS7 0.00e+00 0.00e+00 
7.42e-01 7.50e-01 -1.67836230e+03 -6.58348573e-03 9.5446e-04 0.15 + DS8 0.00e+00 0.00e+00 7.31e-01 7.38e-01 -1.67836427e+03 -1.97256807e-03 1.3430e-04 0.16 + DS9 0.00e+00 0.00e+00 7.28e-01 7.35e-01 -1.67836476e+03 -4.90206436e-04 5.3510e-05 0.11 + DS10 0.00e+00 0.00e+00 7.32e-01 7.40e-01 -1.67836488e+03 -1.12483839e-04 2.8637e-05 0.28 + DS11 0.00e+00 0.00e+00 7.33e-01 7.41e-01 -1.67836501e+03 -1.31704337e-04 1.1546e-05 0.12 + DS12 0.00e+00 0.00e+00 7.33e-01 7.41e-01 -1.67836508e+03 -6.86851807e-05 3.3868e-06 0.19 + DS13 0.00e+00 0.00e+00 7.33e-01 7.41e-01 -1.67836508e+03 -9.16607619e-06 2.4541e-06 0.18 + DS14 0.00e+00 0.00e+00 7.34e-01 7.41e-01 -1.67836510e+03 -1.34461080e-05 3.5635e-07 0.38 + ---------------------------------------------------------------- + Stress_x Stress_y Stress_z + ---------------------------------------------------------------- + -10677.0852150830 -396.2017451131 396.2491608088 + -396.2017451131 -10680.4013171834 396.1655869911 + 396.2491608088 396.1655869911 -10619.9881143378 + ---------------------------------------------------------------- + TOTAL-PRESSURE (EXCLUDE KINETIC PART OF IONS): -10659.158216 kbar + + TIME STATISTICS +------------------------------------------------------------------- + CLASS_NAME NAME TIME/s CALLS AVG/s PER/% +------------------------------------------------------------------- + Driver atomic_world 3.28 1 3.28 100.00 + total 3.26 14 0.23 99.42 + PW_Basis_Sup recip2real 0.09 250 0.00 2.79 + Relax_Driver relax_driver 3.20 1 3.20 97.61 + ESolver_KS runner 3.18 1 3.18 96.80 + ESolver_KS_PW before_scf 0.05 1 0.05 1.59 + Potential cal_veff 0.17 15 0.01 5.25 + PW_Basis_Sup real2recip 0.10 289 0.00 3.15 + PotXC cal_veff 0.15 15 0.01 4.52 + XC_Functional v_xc 0.15 15 0.01 4.50 + PSIPrepare initialize_psi 0.04 1 0.04 1.16 + psi_init random_t 0.04 2 0.02 1.15 + ESolver_KS_PW hamilt2rho_single 2.86 14 0.20 87.19 + HSolverPW solve 2.86 14 0.20 87.18 + HSolverPW solve_psik 2.54 28 0.09 77.57 + Diago_DavSubspace diag_once 2.49 
28 0.09 75.99 + Diago_DavSubspace first 0.76 28 0.03 23.02 + Operator hPsi 1.80 110 0.02 54.97 + Operator veff_pw 1.72 110 0.02 52.44 + PW_Basis_K recip2real 1.09 8480 0.00 33.16 + PW_Basis_K real2recip 0.90 6352 0.00 27.42 + Operator nonlocal_pw 0.08 110 0.00 2.48 + Nonlocal add_nonlocal_pp 0.04 110 0.00 1.28 + Diago_DavSubspace cal_elem 0.09 110 0.00 2.70 + Diago_DavSubspace diag_zhegvx 0.20 110 0.00 6.11 + Diago_DavSubspace cal_grad 1.47 82 0.02 44.65 + Diago_DavSubspace last 0.04 73 0.00 1.12 + ElecStatePW psiToRho 0.30 14 0.02 9.17 + Charge_Mixing mix_rho 0.06 13 0.00 1.75 + Charge_Mixing mix_rho_recip 0.06 13 0.00 1.71 + Broyden_Mixing tem_cal_coef 0.04 13 0.00 1.14 + Charge_Mixing recip_hartree 0.04 136 0.00 1.11 +------------------------------------------------------------------- + + + START Time : Sun May 3 11:35:24 2026 + FINISH Time : Sun May 3 11:35:27 2026 + TOTAL Time : 3 + SEE INFORMATION IN : OUT.autotest/ diff --git a/tests/01_PW/035_PW_15_SO/result_all_fix.out b/tests/01_PW/035_PW_15_SO/result_all_fix.out new file mode 100644 index 00000000000..1b437968bef --- /dev/null +++ b/tests/01_PW/035_PW_15_SO/result_all_fix.out @@ -0,0 +1,5 @@ +etotref -1678.3650981686610066 +etotperatomref -839.1825490843 +totalforceref 1.740848 +totalstressref 34372.194072 +totaltimeref 25.68 diff --git a/tests/01_PW/035_PW_15_SO/result_dev_np4.out b/tests/01_PW/035_PW_15_SO/result_dev_np4.out new file mode 100644 index 00000000000..a32b38e9299 --- /dev/null +++ b/tests/01_PW/035_PW_15_SO/result_dev_np4.out @@ -0,0 +1,5 @@ +etotref -1678.3650981686614614 +etotperatomref -839.1825490843 +totalforceref 1.739332 +totalstressref 34372.194072 +totaltimeref 4.03 diff --git a/tests/01_PW/035_PW_15_SO/result_final.out b/tests/01_PW/035_PW_15_SO/result_final.out new file mode 100644 index 00000000000..797117b6d0c --- /dev/null +++ b/tests/01_PW/035_PW_15_SO/result_final.out @@ -0,0 +1,5 @@ +etotref +etotperatomref +totalforceref 0.0 +totalstressref 0.0 +totaltimeref diff --git 
a/tests/01_PW/035_PW_15_SO/result_pr_fixed.out b/tests/01_PW/035_PW_15_SO/result_pr_fixed.out new file mode 100644 index 00000000000..793630ed73c --- /dev/null +++ b/tests/01_PW/035_PW_15_SO/result_pr_fixed.out @@ -0,0 +1,5 @@ +etotref -1678.3650981686610066 +etotperatomref -839.1825490843 +totalforceref 1.740848 +totalstressref 34354.707632 +totaltimeref 2.90 diff --git a/tests/01_PW/035_PW_15_SO/result_pr_np4.out b/tests/01_PW/035_PW_15_SO/result_pr_np4.out new file mode 100644 index 00000000000..41410ff42be --- /dev/null +++ b/tests/01_PW/035_PW_15_SO/result_pr_np4.out @@ -0,0 +1,5 @@ +etotref -1678.3650981686610066 +etotperatomref -839.1825490843 +totalforceref 1.740848 +totalstressref 34354.707632 +totaltimeref 3.06 diff --git a/tests/01_PW/035_PW_15_SO/result_v2.out b/tests/01_PW/035_PW_15_SO/result_v2.out new file mode 100644 index 00000000000..446d7141fb3 --- /dev/null +++ b/tests/01_PW/035_PW_15_SO/result_v2.out @@ -0,0 +1,5 @@ +etotref -1678.3650981686614614 +etotperatomref -839.1825490843 +totalforceref 1.740848 +totalstressref 34354.707632 +totaltimeref 3.28 diff --git a/tests/01_PW/035_PW_15_SO/result_v2_check.out b/tests/01_PW/035_PW_15_SO/result_v2_check.out new file mode 100644 index 00000000000..0becf5e1a82 --- /dev/null +++ b/tests/01_PW/035_PW_15_SO/result_v2_check.out @@ -0,0 +1,5 @@ +etotref -1678.3650981686612340 +etotperatomref -839.1825490843 +totalforceref 1.739332 +totalstressref 34372.194072 +totaltimeref 3.68 diff --git a/tests/01_PW/099_PW_DJ_SO/log_dev_np1.txt b/tests/01_PW/099_PW_DJ_SO/log_dev_np1.txt new file mode 100644 index 00000000000..b99d0cca01c --- /dev/null +++ b/tests/01_PW/099_PW_DJ_SO/log_dev_np1.txt @@ -0,0 +1,123 @@ +Info: Local MPI proc number: 1,OpenMP thread number: 1,Total thread number: 1,Local thread limit: 14 + + ABACUS v3.11.0-beta.1 + + Atomic-orbital Based Ab-initio Computation at UStc + + Website: http://abacus.ustc.edu.cn/ + Documentation: https://abacus.deepmodeling.com/ + Repository: 
https://github.com/abacusmodeling/abacus-develop + https://github.com/deepmodeling/abacus-develop + Commit: 0f9d7d97e (Thu Apr 30 12:48:20 2026 +0800) + + Sun May 3 09:53:18 2026 + MAKE THE DIR : OUT.autotest/ + RUNNING WITH DEVICE : CPU / Intel(R) Core(TM) Ultra 5 225H (x1) + WARNING: some of potential function is set to zero cause of less than 1e-30. + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + Warning: the number of valence electrons in pseudopotential > 8 for Fe: [Ar] 3d6 4s2 + Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient. + If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning. +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + UNIFORM GRID DIM : 24 * 24 * 24 + UNIFORM GRID DIM(BIG): 24 * 24 * 24 + DONE(0.0392222 SEC) : SETUP UNITCELL + DONE(0.0393218 SEC) : INIT K-POINTS + ---------------------------------------------------------------- + Self-consistent calculations for electrons + ---------------------------------------------------------------- + SPIN KPOINTS PROCESSES THREADS/PROC THREADS/TOTAL + 4 2 1 1 1 + ---------------------------------------------------------------- + Use plane wave basis + ---------------------------------------------------------------- + ELEMENT NATOM + Fe 2 + ---------------------------------------------------------------- + Initial plane wave basis and FFT box + ---------------------------------------------------------------- + DONE(0.0422761 SEC) : INIT PLANEWAVE + START CHARGE : atomic + DONE(0.0474001 SEC) : LOCAL POTENTIAL + DONE(0.0583962 SEC) : NON-LOCAL POTENTIAL + MEMORY FOR PSI (MB) : 0.361328 + DONE(0.0934921 SEC) : INIT BASIS + + ================================================================ + SELF-CONSISTENT: + ================================================================ + 
DONE(0.115037 SEC) : INIT SCF + ITER TMAGX TMAGY TMAGZ AMAG ETOT/eV EDIFF/eV DRHO TIME/s + DS1 -3.50e-03 -3.47e-03 -3.47e-03 1.94e-01 -5.93364556e+03 0.00000000e+00 6.0771e+01 0.17 + DS2 -1.98e-02 -1.99e-02 -1.99e-02 5.37e-02 -5.61422656e+03 3.19418997e+02 2.7921e+01 0.10 + DS3 3.63e-01 3.63e-01 3.63e-01 6.30e-01 -5.66083219e+03 -4.66056224e+01 9.3630e-01 0.12 + DS4 6.56e-01 6.55e-01 6.55e-01 1.13e+00 -5.66314277e+03 -2.31058782e+00 9.7970e-01 0.09 + DS5 1.13e+00 1.13e+00 1.13e+00 1.96e+00 -5.66288810e+03 2.54671960e-01 8.4319e-01 0.09 + DS6 1.53e+00 1.53e+00 1.53e+00 2.66e+00 -5.65330287e+03 9.58522743e+00 6.5627e-01 0.09 + DS7 3.47e+00 3.46e+00 3.46e+00 6.00e+00 -5.66100286e+03 -7.69999107e+00 3.6125e-01 0.12 + DS8 4.01e+00 3.99e+00 3.99e+00 6.93e+00 -5.66254900e+03 -1.54613591e+00 3.2290e-01 0.10 + DS9 4.09e+00 4.06e+00 4.06e+00 7.05e+00 -5.66250300e+03 4.59968832e-02 2.6478e-01 0.09 + DS10 4.08e+00 4.04e+00 4.04e+00 7.03e+00 -5.66203009e+03 4.72914699e-01 1.4532e-01 0.10 + DS11 4.29e+00 4.26e+00 4.26e+00 7.40e+00 -5.66220039e+03 -1.70299816e-01 2.5843e-02 0.10 + DS12 4.63e+00 4.59e+00 4.59e+00 7.98e+00 -5.66227859e+03 -7.81989233e-02 6.0138e-02 0.09 + DS13 4.64e+00 4.60e+00 4.60e+00 8.00e+00 -5.66242306e+03 -1.44469498e-01 2.8920e-02 0.10 + DS14 4.64e+00 4.60e+00 4.60e+00 8.00e+00 -5.66243570e+03 -1.26406010e-02 2.4667e-02 0.09 + DS15 4.65e+00 4.61e+00 4.61e+00 8.01e+00 -5.66242785e+03 7.85163952e-03 1.4419e-02 0.07 + DS16 4.65e+00 4.61e+00 4.61e+00 8.02e+00 -5.66242977e+03 -1.92022552e-03 6.7341e-03 0.12 + DS17 4.65e+00 4.61e+00 4.61e+00 8.02e+00 -5.66241657e+03 1.31921202e-02 4.8540e-03 0.12 + DS18 4.65e+00 4.61e+00 4.61e+00 8.02e+00 -5.66239501e+03 2.15685563e-02 4.0954e-03 0.09 + DS19 4.65e+00 4.61e+00 4.61e+00 8.02e+00 -5.66238415e+03 1.08550899e-02 1.5128e-03 0.10 + SCF restart after this step! 
+ DS20 4.65e+00 4.61e+00 4.61e+00 8.02e+00 -5.66286314e+03 -4.78985858e-01 1.4389e-04 0.10 + DS21 4.65e+00 4.61e+00 4.61e+00 8.02e+00 -5.66239196e+03 4.71176545e-01 6.4087e-05 0.09 + DS22 4.65e+00 4.61e+00 4.61e+00 8.02e+00 -5.66238948e+03 2.48237415e-03 5.5053e-06 0.08 + ---------------------------------------------------------------- + Stress_x Stress_y Stress_z + ---------------------------------------------------------------- + -31999.2856202446 64.7867976142 64.7955475894 + 64.7867976142 -33600.9735805777 560.6550312603 + 64.7955475894 560.6550312603 -33600.9824691361 + ---------------------------------------------------------------- + TOTAL-PRESSURE (EXCLUDE KINETIC PART OF IONS): -33067.080557 kbar + + TIME STATISTICS +------------------------------------------------------------------- + CLASS_NAME NAME TIME/s CALLS AVG/s PER/% +------------------------------------------------------------------- + total 2.37 15 0.16 100.00 + Driver atomic_world 2.37 1 2.37 100.00 + PW_Basis_Sup recip2real 0.04 397 0.00 1.70 + PSIPrepare prepare_init 0.03 1 0.03 1.47 + psi_init_atomic tabulate 0.03 1 0.03 1.47 + Relax_Driver relax_driver 2.28 1 2.28 95.97 + ESolver_KS runner 2.24 1 2.24 94.41 + Potential cal_veff 0.18 23 0.01 7.74 + PW_Basis_Sup real2recip 0.04 463 0.00 1.79 + PotXC cal_veff 0.18 23 0.01 7.42 + XC_Functional v_xc 0.19 25 0.01 7.82 + ESolver_KS_PW hamilt2rho_single 1.99 22 0.09 83.75 + HSolverPW solve 1.99 22 0.09 83.74 + HSolverPW solve_psik 1.68 44 0.04 70.83 + Diago_DavSubspace diag_once 1.68 44 0.04 70.68 + Diago_DavSubspace first 0.64 44 0.01 26.98 + Operator hPsi 1.40 187 0.01 59.08 + Operator veff_pw 1.30 187 0.01 54.94 + PW_Basis_K recip2real 0.80 11858 0.00 33.88 + PW_Basis_K real2recip 0.53 8338 0.00 22.30 + Operator nonlocal_pw 0.08 187 0.00 3.39 + Nonlocal add_nonlocal_pp 0.06 187 0.00 2.47 + Diago_DavSubspace cal_elem 0.03 187 0.00 1.30 + Diago_DavSubspace diag_zhegvx 0.19 187 0.00 7.87 + Diago_DavSubspace cal_grad 0.82 143 0.01 34.33 + 
Diago_DavSubspace last 0.03 86 0.00 1.45 + ElecStatePW psiToRho 0.30 22 0.01 12.59 + Stress_PW cal_stress 0.02 1 0.02 1.01 +------------------------------------------------------------------- + + + START Time : Sun May 3 09:53:18 2026 + FINISH Time : Sun May 3 09:53:20 2026 + TOTAL Time : 2 + SEE INFORMATION IN : OUT.autotest/ diff --git a/tests/01_PW/099_PW_DJ_SO/log_dev_np4.txt b/tests/01_PW/099_PW_DJ_SO/log_dev_np4.txt new file mode 100644 index 00000000000..3447cc7fe57 --- /dev/null +++ b/tests/01_PW/099_PW_DJ_SO/log_dev_np4.txt @@ -0,0 +1,123 @@ +Info: Local MPI proc number: 4,OpenMP thread number: 3,Total thread number: 12,Local thread limit: 14 + + ABACUS v3.11.0-beta.1 + + Atomic-orbital Based Ab-initio Computation at UStc + + Website: http://abacus.ustc.edu.cn/ + Documentation: https://abacus.deepmodeling.com/ + Repository: https://github.com/abacusmodeling/abacus-develop + https://github.com/deepmodeling/abacus-develop + Commit: 0f9d7d97e (Thu Apr 30 12:48:20 2026 +0800) + + Sun May 3 09:52:57 2026 + MAKE THE DIR : OUT.autotest/ + RUNNING WITH DEVICE : CPU / Intel(R) Core(TM) Ultra 5 225H (x1) + WARNING: some of potential function is set to zero cause of less than 1e-30. + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + Warning: the number of valence electrons in pseudopotential > 8 for Fe: [Ar] 3d6 4s2 + Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient. + If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning. 
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + UNIFORM GRID DIM : 24 * 24 * 24 + UNIFORM GRID DIM(BIG): 24 * 24 * 24 + DONE(0.0337128 SEC) : SETUP UNITCELL + DONE(0.0346183 SEC) : INIT K-POINTS + ---------------------------------------------------------------- + Self-consistent calculations for electrons + ---------------------------------------------------------------- + SPIN KPOINTS PROCESSES THREADS/PROC THREADS/TOTAL + 4 2 4 3 12 + ---------------------------------------------------------------- + Use plane wave basis + ---------------------------------------------------------------- + ELEMENT NATOM + Fe 2 + ---------------------------------------------------------------- + Initial plane wave basis and FFT box + ---------------------------------------------------------------- + DONE(0.0401519 SEC) : INIT PLANEWAVE + START CHARGE : atomic + DONE(0.072056 SEC) : LOCAL POTENTIAL + DONE(0.0793844 SEC) : NON-LOCAL POTENTIAL + MEMORY FOR PSI (MB) : 0.0878906 + DONE(0.107285 SEC) : INIT BASIS + + ================================================================ + SELF-CONSISTENT: + ================================================================ + DONE(0.151379 SEC) : INIT SCF + ITER TMAGX TMAGY TMAGZ AMAG ETOT/eV EDIFF/eV DRHO TIME/s + DS1 -3.50e-03 -3.47e-03 -3.47e-03 1.94e-01 -5.93364516e+03 0.00000000e+00 6.0771e+01 0.48 + DS2 -1.98e-02 -1.99e-02 -1.99e-02 5.38e-02 -5.61422933e+03 3.19415829e+02 2.7921e+01 0.22 + DS3 3.64e-01 3.62e-01 3.63e-01 6.30e-01 -5.66083209e+03 -4.66027629e+01 9.3631e-01 0.12 + DS4 6.56e-01 6.54e-01 6.54e-01 1.13e+00 -5.66314237e+03 -2.31027985e+00 9.7969e-01 0.13 + DS5 1.14e+00 1.13e+00 1.13e+00 1.96e+00 -5.66288782e+03 2.54552547e-01 8.4317e-01 0.12 + DS6 1.54e+00 1.53e+00 1.53e+00 2.66e+00 -5.65330389e+03 9.58392934e+00 6.5624e-01 0.09 + DS7 3.48e+00 3.45e+00 3.45e+00 6.00e+00 -5.66100392e+03 -7.70002981e+00 3.6100e-01 0.16 + DS8 4.02e+00 3.98e+00 3.98e+00 6.93e+00 
-5.66255040e+03 -1.54648025e+00 3.2295e-01 0.17 + DS9 4.09e+00 4.05e+00 4.05e+00 7.05e+00 -5.66250306e+03 4.73376558e-02 2.6493e-01 0.13 + DS10 4.08e+00 4.04e+00 4.04e+00 7.03e+00 -5.66202969e+03 4.73376077e-01 1.4527e-01 0.13 + DS11 4.29e+00 4.25e+00 4.25e+00 7.40e+00 -5.66220119e+03 -1.71501950e-01 2.5845e-02 0.12 + DS12 4.64e+00 4.59e+00 4.59e+00 7.98e+00 -5.66227828e+03 -7.70963311e-02 6.0170e-02 0.13 + DS13 4.65e+00 4.60e+00 4.60e+00 8.00e+00 -5.66242325e+03 -1.44967260e-01 2.8907e-02 0.13 + DS14 4.65e+00 4.60e+00 4.60e+00 8.00e+00 -5.66243546e+03 -1.22093845e-02 2.4687e-02 0.11 + DS15 4.65e+00 4.61e+00 4.61e+00 8.01e+00 -5.66242798e+03 7.48036237e-03 1.4412e-02 0.10 + DS16 4.66e+00 4.61e+00 4.61e+00 8.02e+00 -5.66242955e+03 -1.56605404e-03 6.6989e-03 0.12 + DS17 4.65e+00 4.61e+00 4.61e+00 8.02e+00 -5.66241570e+03 1.38475343e-02 4.8441e-03 0.06 + DS18 4.65e+00 4.61e+00 4.61e+00 8.02e+00 -5.66239561e+03 2.00926869e-02 4.0264e-03 0.10 + DS19 4.65e+00 4.61e+00 4.61e+00 8.02e+00 -5.66238472e+03 1.08890233e-02 1.3802e-03 0.07 + SCF restart after this step! 
+ DS20 4.66e+00 4.61e+00 4.61e+00 8.02e+00 -5.66288453e+03 -4.99809949e-01 1.4626e-04 0.11 + DS21 4.66e+00 4.61e+00 4.61e+00 8.02e+00 -5.66239029e+03 4.94239545e-01 3.0808e-04 0.11 + DS22 4.66e+00 4.61e+00 4.61e+00 8.02e+00 -5.66239089e+03 -5.99121242e-04 7.3385e-06 0.10 + ---------------------------------------------------------------- + Stress_x Stress_y Stress_z + ---------------------------------------------------------------- + -31999.6098887833 65.0629329717 64.9895792749 + 65.0629329717 -33601.2027303285 560.2485373745 + 64.9895792749 560.2485373745 -33601.1924915668 + ---------------------------------------------------------------- + TOTAL-PRESSURE (EXCLUDE KINETIC PART OF IONS): -33067.335037 kbar + + TIME STATISTICS +------------------------------------------------------------------- + CLASS_NAME NAME TIME/s CALLS AVG/s PER/% +------------------------------------------------------------------- + total 3.18 15 0.21 100.00 + Driver atomic_world 3.18 1 3.18 100.00 + Charge atomic_rho 0.03 2 0.02 1.07 + PW_Basis_Sup recip2real 0.07 397 0.00 2.06 + Relax_Driver relax_driver 3.07 1 3.07 96.59 + ESolver_KS runner 3.04 1 3.04 95.62 + ESolver_KS_PW before_scf 0.04 1 0.04 1.38 + Potential cal_veff 0.12 23 0.01 3.93 + PW_Basis_Sup real2recip 0.08 463 0.00 2.63 + PotXC cal_veff 0.12 23 0.01 3.73 + XC_Functional v_xc 0.12 25 0.00 3.81 + ESolver_KS_PW hamilt2rho_single 2.80 22 0.13 88.32 + HSolverPW solve 2.80 22 0.13 88.30 + HSolverPW solve_psik 2.38 44 0.05 75.10 + Diago_DavSubspace diag_once 2.37 44 0.05 74.74 + Diago_DavSubspace first 0.87 44 0.02 27.27 + Operator hPsi 1.87 197 0.01 58.80 + Operator veff_pw 1.76 197 0.01 55.29 + PW_Basis_K recip2real 1.19 11904 0.00 37.44 + PW_Basis_K real2recip 0.93 8384 0.00 29.21 + Operator nonlocal_pw 0.06 197 0.00 2.02 + Operator OnsiteProjPW 0.05 197 0.00 1.44 + OnsiteProj overlap 0.05 241 0.00 1.54 + Onsite_Proj_tools cal_becp 0.05 245 0.00 1.60 + Diago_DavSubspace cal_elem 0.07 197 0.00 2.23 + Diago_DavSubspace diag_zhegvx 
0.29 197 0.00 9.01 + Diago_DavSubspace cal_grad 1.19 153 0.01 37.36 + ElecStatePW psiToRho 0.41 22 0.02 12.80 +------------------------------------------------------------------- + + + START Time : Sun May 3 09:52:57 2026 + FINISH Time : Sun May 3 09:53:00 2026 + TOTAL Time : 3 + SEE INFORMATION IN : OUT.autotest/ diff --git a/tests/01_PW/099_PW_DJ_SO/log_final.txt b/tests/01_PW/099_PW_DJ_SO/log_final.txt new file mode 100644 index 00000000000..5683c705208 --- /dev/null +++ b/tests/01_PW/099_PW_DJ_SO/log_final.txt @@ -0,0 +1,70 @@ + + ABACUS v3.11.0-beta.1 + + Atomic-orbital Based Ab-initio Computation at UStc + + Website: http://abacus.ustc.edu.cn/ + Documentation: https://abacus.deepmodeling.com/ + Repository: https://github.com/abacusmodeling/abacus-develop + https://github.com/deepmodeling/abacus-develop + Commit: 5837a6526 (Sun May 3 09:44:20 2026 +0800) + + Sun May 3 11:41:03 2026 +Info: Local MPI proc number: 4,OpenMP thread number: 3,Total thread number: 12,Local thread limit: 14 + MAKE THE DIR : OUT.autotest/ + RUNNING WITH DEVICE : CPU / Intel(R) Core(TM) Ultra 5 225H (x1) + WARNING: some of potential function is set to zero cause of less than 1e-30. + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + Warning: the number of valence electrons in pseudopotential > 8 for Fe: [Ar] 3d6 4s2 + Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient. + If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning. 
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + UNIFORM GRID DIM : 24 * 24 * 24 + UNIFORM GRID DIM(BIG): 24 * 24 * 24 + DONE(9.884e-06 SEC) : SETUP UNITCELL + DONE(0.00292437 SEC) : INIT K-POINTS + ---------------------------------------------------------------- + Self-consistent calculations for electrons + ---------------------------------------------------------------- + SPIN KPOINTS PROCESSES THREADS/PROC THREADS/TOTAL + 4 2 4 3 12 + ---------------------------------------------------------------- + Use plane wave basis + ---------------------------------------------------------------- + ELEMENT NATOM XC + Fe 2 + ---------------------------------------------------------------- + Initial plane wave basis and FFT box + ---------------------------------------------------------------- + DONE(0.00712453 SEC) : INIT PLANEWAVE + START CHARGE : atomic + DONE(0.0337331 SEC) : LOCAL POTENTIAL + DONE(0.0423397 SEC) : NON-LOCAL POTENTIAL + MEMORY FOR PSI (MB) : 0.0878906 + DONE(0.066817 SEC) : INIT BASIS + + ================================================================ + SELF-CONSISTENT: + ================================================================ + DONE(0.122505 SEC) : INIT SCF + ITER TMAGX TMAGY TMAGZ AMAG ETOT/eV EDIFF/eV DRHO TIME/s + DS1 -3.50e-03 -3.47e-03 -3.47e-03 1.94e-01 -5.93364516e+03 0.00000000e+00 6.0771e+01 0.37 + DS2 -1.98e-02 -1.99e-02 -1.99e-02 5.38e-02 -5.61422933e+03 3.19415829e+02 2.7921e+01 0.19 + DS3 3.64e-01 3.62e-01 3.63e-01 6.30e-01 -5.66083209e+03 -4.66027629e+01 9.3631e-01 0.50 + DS4 6.56e-01 6.54e-01 6.54e-01 1.13e+00 -5.66314237e+03 -2.31027985e+00 9.7969e-01 0.20 + DS5 1.14e+00 1.13e+00 1.13e+00 1.96e+00 -5.66288782e+03 2.54552547e-01 8.4317e-01 0.15 + DS6 1.54e+00 1.53e+00 1.53e+00 2.66e+00 -5.65330389e+03 9.58392934e+00 6.5624e-01 0.20 + DS7 3.48e+00 3.45e+00 3.45e+00 6.00e+00 -5.66100392e+03 -7.70002981e+00 3.6100e-01 0.52 + DS8 4.02e+00 3.98e+00 3.98e+00 6.93e+00 
-5.66255040e+03 -1.54648025e+00 3.2295e-01 15.43 + DS9 4.09e+00 4.05e+00 4.05e+00 7.05e+00 -5.66250306e+03 4.73376558e-02 2.6493e-01 24.33 + DS10 4.08e+00 4.04e+00 4.04e+00 7.03e+00 -5.66202969e+03 4.73376077e-01 1.4527e-01 22.20 + DS11 4.29e+00 4.25e+00 4.25e+00 7.40e+00 -5.66220119e+03 -1.71501951e-01 2.5845e-02 26.38 + DS12 4.64e+00 4.59e+00 4.59e+00 7.98e+00 -5.66227828e+03 -7.70963305e-02 6.0170e-02 29.10 + DS13 4.65e+00 4.60e+00 4.60e+00 8.00e+00 -5.66242325e+03 -1.44967260e-01 2.8907e-02 29.12 + DS14 4.65e+00 4.60e+00 4.60e+00 8.00e+00 -5.66243546e+03 -1.22093846e-02 2.4687e-02 23.60 + DS15 4.65e+00 4.61e+00 4.61e+00 8.01e+00 -5.66242798e+03 7.48036259e-03 1.4412e-02 26.20 + DS16 4.66e+00 4.61e+00 4.61e+00 8.02e+00 -5.66242955e+03 -1.56605404e-03 6.6989e-03 28.97 + DS17 4.65e+00 4.61e+00 4.61e+00 8.02e+00 -5.66241570e+03 1.38475345e-02 4.8441e-03 23.27 + DS18 4.65e+00 4.61e+00 4.61e+00 8.02e+00 -5.66239561e+03 2.00926873e-02 4.0264e-03 26.64 diff --git a/tests/01_PW/099_PW_DJ_SO/log_pr_correct.txt b/tests/01_PW/099_PW_DJ_SO/log_pr_correct.txt new file mode 100644 index 00000000000..5a00ae0ec02 --- /dev/null +++ b/tests/01_PW/099_PW_DJ_SO/log_pr_correct.txt @@ -0,0 +1,60 @@ + + ABACUS v3.11.0-beta.1 + + Atomic-orbital Based Ab-initio Computation at UStc + + Website: http://abacus.ustc.edu.cn/ + Documentation: https://abacus.deepmodeling.com/ + Repository: https://github.com/abacusmodeling/abacus-develop + https://github.com/deepmodeling/abacus-develop + Commit: 5837a6526 (Sun May 3 09:44:20 2026 +0800) + + Sun May 3 11:32:44 2026 +Info: Local MPI proc number: 4,OpenMP thread number: 3,Total thread number: 12,Local thread limit: 14 + MAKE THE DIR : OUT.autotest/ + RUNNING WITH DEVICE : CPU / Intel(R) Core(TM) Ultra 5 225H (x1) + WARNING: some of potential function is set to zero cause of less than 1e-30. 
+ +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + Warning: the number of valence electrons in pseudopotential > 8 for Fe: [Ar] 3d6 4s2 + Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient. + If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning. +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + UNIFORM GRID DIM : 24 * 24 * 24 + UNIFORM GRID DIM(BIG): 24 * 24 * 24 + DONE(9.998e-06 SEC) : SETUP UNITCELL + DONE(0.000129725 SEC) : INIT K-POINTS + ---------------------------------------------------------------- + Self-consistent calculations for electrons + ---------------------------------------------------------------- + SPIN KPOINTS PROCESSES THREADS/PROC THREADS/TOTAL + 4 2 4 3 12 + ---------------------------------------------------------------- + Use plane wave basis + ---------------------------------------------------------------- + ELEMENT NATOM XC + Fe 2 + ---------------------------------------------------------------- + Initial plane wave basis and FFT box + ---------------------------------------------------------------- + DONE(0.00638782 SEC) : INIT PLANEWAVE + START CHARGE : atomic + DONE(0.0215191 SEC) : LOCAL POTENTIAL + DONE(0.0357419 SEC) : NON-LOCAL POTENTIAL + MEMORY FOR PSI (MB) : 0.0878906 + DONE(0.0607546 SEC) : INIT BASIS + + ================================================================ + SELF-CONSISTENT: + ================================================================ + DONE(0.0915078 SEC) : INIT SCF + ITER TMAGX TMAGY TMAGZ AMAG ETOT/eV EDIFF/eV DRHO TIME/s + DS1 -3.50e-03 -3.47e-03 -3.47e-03 1.94e-01 -5.93364516e+03 0.00000000e+00 6.0771e+01 0.92 + DS2 -1.98e-02 -1.99e-02 -1.99e-02 5.38e-02 -5.61422933e+03 3.19415829e+02 2.7921e+01 0.56 + DS3 3.64e-01 3.62e-01 3.63e-01 6.30e-01 -5.66083209e+03 -4.66027629e+01 
9.3631e-01 0.33 + DS4 6.56e-01 6.54e-01 6.54e-01 1.13e+00 -5.66314237e+03 -2.31027985e+00 9.7969e-01 0.47 + DS5 1.14e+00 1.13e+00 1.13e+00 1.96e+00 -5.66288782e+03 2.54552547e-01 8.4317e-01 14.08 + DS6 1.54e+00 1.53e+00 1.53e+00 2.66e+00 -5.65330389e+03 9.58392934e+00 6.5624e-01 24.04 + DS7 3.48e+00 3.45e+00 3.45e+00 6.00e+00 -5.66100392e+03 -7.70002981e+00 3.6100e-01 34.89 + DS8 4.02e+00 3.98e+00 3.98e+00 6.93e+00 -5.66255040e+03 -1.54648025e+00 3.2295e-01 28.96 diff --git a/tests/01_PW/099_PW_DJ_SO/log_pr_fixed.txt b/tests/01_PW/099_PW_DJ_SO/log_pr_fixed.txt new file mode 100644 index 00000000000..acb5dca1422 --- /dev/null +++ b/tests/01_PW/099_PW_DJ_SO/log_pr_fixed.txt @@ -0,0 +1,122 @@ +Info: Local MPI proc number: 4,OpenMP thread number: 3,Total thread number: 12,Local thread limit: 14 + + ABACUS v3.11.0-beta.1 + + Atomic-orbital Based Ab-initio Computation at UStc + + Website: http://abacus.ustc.edu.cn/ + Documentation: https://abacus.deepmodeling.com/ + Repository: https://github.com/abacusmodeling/abacus-develop + https://github.com/deepmodeling/abacus-develop + Commit: 5837a6526 (Sun May 3 09:44:20 2026 +0800) + + Sun May 3 09:57:03 2026 + MAKE THE DIR : OUT.autotest/ + RUNNING WITH DEVICE : CPU / Intel(R) Core(TM) Ultra 5 225H (x1) + WARNING: some of potential function is set to zero cause of less than 1e-30. + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + Warning: the number of valence electrons in pseudopotential > 8 for Fe: [Ar] 3d6 4s2 + Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient. + If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning. 
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + UNIFORM GRID DIM : 24 * 24 * 24 + UNIFORM GRID DIM(BIG): 24 * 24 * 24 + DONE(1.1268e-05 SEC) : SETUP UNITCELL + DONE(0.00231719 SEC) : INIT K-POINTS + ---------------------------------------------------------------- + Self-consistent calculations for electrons + ---------------------------------------------------------------- + SPIN KPOINTS PROCESSES THREADS/PROC THREADS/TOTAL + 4 2 4 3 12 + ---------------------------------------------------------------- + Use plane wave basis + ---------------------------------------------------------------- + ELEMENT NATOM XC + Fe 2 + ---------------------------------------------------------------- + Initial plane wave basis and FFT box + ---------------------------------------------------------------- + DONE(0.0125158 SEC) : INIT PLANEWAVE + START CHARGE : atomic + DONE(0.0251662 SEC) : LOCAL POTENTIAL + DONE(0.0328194 SEC) : NON-LOCAL POTENTIAL + MEMORY FOR PSI (MB) : 0.0878906 + DONE(0.0604581 SEC) : INIT BASIS + + ================================================================ + SELF-CONSISTENT: + ================================================================ + DONE(0.0907335 SEC) : INIT SCF + ITER TMAGX TMAGY TMAGZ AMAG ETOT/eV EDIFF/eV DRHO TIME/s + DS1 -3.50e-03 -3.47e-03 -3.47e-03 1.94e-01 -5.93364516e+03 0.00000000e+00 6.0771e+01 0.26 + DS2 -1.98e-02 -1.99e-02 -1.99e-02 5.38e-02 -5.61422933e+03 3.19415829e+02 2.7921e+01 0.17 + DS3 3.64e-01 3.62e-01 3.63e-01 6.30e-01 -5.66083209e+03 -4.66027629e+01 9.3631e-01 0.15 + DS4 6.56e-01 6.54e-01 6.54e-01 1.13e+00 -5.66314237e+03 -2.31027985e+00 9.7969e-01 0.14 + DS5 1.14e+00 1.13e+00 1.13e+00 1.96e+00 -5.66288782e+03 2.54552547e-01 8.4317e-01 0.11 + DS6 1.54e+00 1.53e+00 1.53e+00 2.66e+00 -5.65330389e+03 9.58392934e+00 6.5624e-01 0.27 + DS7 3.48e+00 3.45e+00 3.45e+00 6.00e+00 -5.66100392e+03 -7.70002981e+00 3.6100e-01 0.16 + DS8 4.02e+00 3.98e+00 3.98e+00 6.93e+00 
-5.66255040e+03 -1.54648025e+00 3.2295e-01 0.12 + DS9 4.09e+00 4.05e+00 4.05e+00 7.05e+00 -5.66250306e+03 4.73376558e-02 2.6493e-01 0.10 + DS10 4.08e+00 4.04e+00 4.04e+00 7.03e+00 -5.66202969e+03 4.73376077e-01 1.4527e-01 0.12 + DS11 4.29e+00 4.25e+00 4.25e+00 7.40e+00 -5.66220119e+03 -1.71501950e-01 2.5845e-02 0.12 + DS12 4.64e+00 4.59e+00 4.59e+00 7.98e+00 -5.66227828e+03 -7.70963308e-02 6.0170e-02 0.16 + DS13 4.65e+00 4.60e+00 4.60e+00 8.00e+00 -5.66242325e+03 -1.44967260e-01 2.8907e-02 0.11 + DS14 4.65e+00 4.60e+00 4.60e+00 8.00e+00 -5.66243546e+03 -1.22093845e-02 2.4687e-02 0.14 + DS15 4.65e+00 4.61e+00 4.61e+00 8.01e+00 -5.66242798e+03 7.48036244e-03 1.4412e-02 0.12 + DS16 4.66e+00 4.61e+00 4.61e+00 8.02e+00 -5.66242955e+03 -1.56605404e-03 6.6989e-03 0.12 + DS17 4.65e+00 4.61e+00 4.61e+00 8.02e+00 -5.66241570e+03 1.38475344e-02 4.8441e-03 0.07 + DS18 4.65e+00 4.61e+00 4.61e+00 8.02e+00 -5.66239561e+03 2.00926871e-02 4.0264e-03 0.12 + DS19 4.65e+00 4.61e+00 4.61e+00 8.02e+00 -5.66238472e+03 1.08890228e-02 1.3802e-03 0.10 + SCF restart after this step! 
+ DS20 4.66e+00 4.61e+00 4.61e+00 8.02e+00 -5.66288453e+03 -4.99809949e-01 1.4626e-04 0.12 + DS21 4.66e+00 4.61e+00 4.61e+00 8.02e+00 -5.66239029e+03 4.94239547e-01 3.0808e-04 0.16 + DS22 4.66e+00 4.61e+00 4.61e+00 8.02e+00 -5.66239089e+03 -5.99121350e-04 7.3385e-06 0.09 + ---------------------------------------------------------------- + Stress_x Stress_y Stress_z + ---------------------------------------------------------------- + -31999.5520569430 65.0633550480 64.9894611795 + 65.0633550480 -33601.1727637891 560.2487427657 + 64.9894611795 560.2487427657 -33601.1336857629 + ---------------------------------------------------------------- + TOTAL-PRESSURE (EXCLUDE KINETIC PART OF IONS): -33067.286169 kbar + + TIME STATISTICS +------------------------------------------------------------------- + CLASS_NAME NAME TIME/s CALLS AVG/s PER/% +------------------------------------------------------------------- + Driver atomic_world 3.19 1 3.19 100.00 + total 3.15 14 0.23 98.96 + PW_Basis_Sup recip2real 0.05 397 0.00 1.72 + Relax_Driver relax_driver 3.09 1 3.09 97.03 + ESolver_KS runner 3.05 1 3.05 95.77 + Potential cal_veff 0.10 23 0.00 3.15 + PW_Basis_Sup real2recip 0.08 463 0.00 2.56 + PotXC cal_veff 0.09 23 0.00 2.83 + XC_Functional v_xc 0.10 25 0.00 3.15 + ESolver_KS_PW hamilt2rho_single 2.84 22 0.13 89.23 + HSolverPW solve 2.84 22 0.13 89.22 + HSolverPW solve_psik 2.39 44 0.05 75.08 + Diago_DavSubspace diag_once 2.38 44 0.05 74.71 + Diago_DavSubspace first 0.75 44 0.02 23.52 + Operator hPsi 1.83 197 0.01 57.52 + Operator veff_pw 1.71 197 0.01 53.67 + PW_Basis_K recip2real 1.19 11904 0.00 37.43 + PW_Basis_K real2recip 0.90 8384 0.00 28.31 + Operator nonlocal_pw 0.07 197 0.00 2.22 + Operator OnsiteProjPW 0.05 197 0.00 1.58 + OnsiteProj overlap 0.05 241 0.00 1.46 + Onsite_Proj_tools cal_becp 0.05 245 0.00 1.47 + Diago_DavSubspace cal_elem 0.06 197 0.00 1.90 + Diago_DavSubspace diag_zhegvx 0.32 197 0.00 9.93 + Diago_DavSubspace cal_grad 1.28 153 0.01 40.24 + ElecStatePW 
psiToRho 0.43 22 0.02 13.61 + Charge_Mixing get_drho 0.03 22 0.00 1.08 +------------------------------------------------------------------- + + + START Time : Sun May 3 09:57:03 2026 + FINISH Time : Sun May 3 09:57:06 2026 + TOTAL Time : 3 + SEE INFORMATION IN : OUT.autotest/ diff --git a/tests/01_PW/099_PW_DJ_SO/log_pr_np4.txt b/tests/01_PW/099_PW_DJ_SO/log_pr_np4.txt new file mode 100644 index 00000000000..6acad8020ab --- /dev/null +++ b/tests/01_PW/099_PW_DJ_SO/log_pr_np4.txt @@ -0,0 +1,123 @@ + + ABACUS v3.11.0-beta.1 + + Atomic-orbital Based Ab-initio Computation at UStc + + Website: http://abacus.ustc.edu.cn/ + Documentation: https://abacus.deepmodeling.com/ + Repository: https://github.com/abacusmodeling/abacus-develop + https://github.com/deepmodeling/abacus-develop + Commit: 55690612c (Sat May 2 13:10:55 2026 +0800) + + Sun May 3 09:54:07 2026 +Info: Local MPI proc number: 4,OpenMP thread number: 3,Total thread number: 12,Local thread limit: 14 + MAKE THE DIR : OUT.autotest/ + RUNNING WITH DEVICE : CPU / Intel(R) Core(TM) Ultra 5 225H (x1) + WARNING: some of potential function is set to zero cause of less than 1e-30. + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + Warning: the number of valence electrons in pseudopotential > 8 for Fe: [Ar] 3d6 4s2 + Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient. + If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning. 
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + UNIFORM GRID DIM : 24 * 24 * 24 + UNIFORM GRID DIM(BIG): 24 * 24 * 24 + DONE(8.1e-06 SEC) : SETUP UNITCELL + DONE(0.00118406 SEC) : INIT K-POINTS + ---------------------------------------------------------------- + Self-consistent calculations for electrons + ---------------------------------------------------------------- + SPIN KPOINTS PROCESSES THREADS/PROC THREADS/TOTAL + 4 2 4 3 12 + ---------------------------------------------------------------- + Use plane wave basis + ---------------------------------------------------------------- + ELEMENT NATOM XC + Fe 2 + ---------------------------------------------------------------- + Initial plane wave basis and FFT box + ---------------------------------------------------------------- + DONE(0.00644934 SEC) : INIT PLANEWAVE + START CHARGE : atomic + DONE(0.0128259 SEC) : LOCAL POTENTIAL + DONE(0.020645 SEC) : NON-LOCAL POTENTIAL + MEMORY FOR PSI (MB) : 0.0878906 + DONE(0.0450215 SEC) : INIT BASIS + + ================================================================ + SELF-CONSISTENT: + ================================================================ + DONE(0.0919448 SEC) : INIT SCF + ITER TMAGX TMAGY TMAGZ AMAG ETOT/eV EDIFF/eV DRHO TIME/s + DS1 -3.50e-03 -3.47e-03 -3.47e-03 1.94e-01 -5.93364516e+03 0.00000000e+00 6.0771e+01 0.36 + DS2 -1.98e-02 -1.99e-02 -1.99e-02 5.38e-02 -5.61422933e+03 3.19415829e+02 2.7921e+01 0.18 + DS3 3.64e-01 3.62e-01 3.63e-01 6.30e-01 -5.66083209e+03 -4.66027629e+01 9.3631e-01 0.21 + DS4 6.56e-01 6.54e-01 6.54e-01 1.13e+00 -5.66314237e+03 -2.31027985e+00 9.7969e-01 0.14 + DS5 1.14e+00 1.13e+00 1.13e+00 1.96e+00 -5.66288782e+03 2.54552547e-01 8.4317e-01 0.37 + DS6 1.54e+00 1.53e+00 1.53e+00 2.66e+00 -5.65330389e+03 9.58392934e+00 6.5624e-01 0.23 + DS7 3.48e+00 3.45e+00 3.45e+00 6.00e+00 -5.66100392e+03 -7.70002981e+00 3.6100e-01 0.66 + DS8 4.02e+00 3.98e+00 3.98e+00 6.93e+00 
-5.66255040e+03 -1.54648025e+00 3.2295e-01 1.53 + DS9 4.09e+00 4.05e+00 4.05e+00 7.05e+00 -5.66250306e+03 4.73376558e-02 2.6493e-01 0.18 + DS10 4.08e+00 4.04e+00 4.04e+00 7.03e+00 -5.66202969e+03 4.73376077e-01 1.4527e-01 0.14 + DS11 4.29e+00 4.25e+00 4.25e+00 7.40e+00 -5.66220119e+03 -1.71501951e-01 2.5845e-02 0.12 + DS12 4.64e+00 4.59e+00 4.59e+00 7.98e+00 -5.66227828e+03 -7.70963306e-02 6.0170e-02 0.18 + DS13 4.65e+00 4.60e+00 4.60e+00 8.00e+00 -5.66242325e+03 -1.44967260e-01 2.8907e-02 0.12 + DS14 4.65e+00 4.60e+00 4.60e+00 8.00e+00 -5.66243546e+03 -1.22093846e-02 2.4687e-02 0.14 + DS15 4.65e+00 4.61e+00 4.61e+00 8.01e+00 -5.66242798e+03 7.48036252e-03 1.4412e-02 0.09 + DS16 4.66e+00 4.61e+00 4.61e+00 8.02e+00 -5.66242955e+03 -1.56605403e-03 6.6989e-03 0.15 + DS17 4.65e+00 4.61e+00 4.61e+00 8.02e+00 -5.66241570e+03 1.38475345e-02 4.8441e-03 0.15 + DS18 4.65e+00 4.61e+00 4.61e+00 8.02e+00 -5.66239561e+03 2.00926872e-02 4.0264e-03 0.12 + DS19 4.65e+00 4.61e+00 4.61e+00 8.02e+00 -5.66238472e+03 1.08890224e-02 1.3802e-03 0.14 + SCF restart after this step! 
+ DS20 4.66e+00 4.61e+00 4.61e+00 8.02e+00 -5.66288453e+03 -4.99809950e-01 1.4626e-04 0.15 + DS21 4.66e+00 4.61e+00 4.61e+00 8.02e+00 -5.66239029e+03 4.94239548e-01 3.0808e-04 0.16 + DS22 4.66e+00 4.61e+00 4.61e+00 8.02e+00 -5.66239089e+03 -5.99121450e-04 7.3385e-06 0.10 + ---------------------------------------------------------------- + Stress_x Stress_y Stress_z + ---------------------------------------------------------------- + -32078.3250525856 67.5008795626 67.4184104029 + 67.5008795626 -33686.4942094489 559.5736765290 + 67.4184104029 559.5736765290 -33686.4455147576 + ---------------------------------------------------------------- + TOTAL-PRESSURE (EXCLUDE KINETIC PART OF IONS): -33150.421592 kbar + + TIME STATISTICS +------------------------------------------------------------------- + CLASS_NAME NAME TIME/s CALLS AVG/s PER/% +------------------------------------------------------------------- + Driver atomic_world 5.80 1 5.80 100.00 + total 5.76 14 0.41 99.38 + PW_Basis_Sup recip2real 0.12 397 0.00 2.06 + Relax_Driver relax_driver 5.71 1 5.71 98.56 + ESolver_KS runner 5.66 1 5.66 97.70 + Potential cal_veff 0.21 23 0.01 3.66 + PW_Basis_Sup real2recip 0.15 463 0.00 2.56 + PotXC cal_veff 0.19 23 0.01 3.36 + XC_Functional v_xc 0.21 25 0.01 3.58 + ESolver_KS_PW hamilt2rho_single 5.26 22 0.24 90.80 + HSolverPW solve 5.26 22 0.24 90.80 + HSolverPW solve_psik 4.64 44 0.11 80.07 + Diago_DavSubspace diag_once 4.62 44 0.11 79.72 + Diago_DavSubspace first 1.08 44 0.02 18.64 + Operator hPsi 3.84 197 0.02 66.23 + Operator veff_pw 3.65 197 0.02 62.92 + PW_Basis_K recip2real 2.26 11904 0.00 38.95 + PW_Basis_K real2recip 1.93 8384 0.00 33.30 + Operator nonlocal_pw 0.11 197 0.00 1.85 + Operator OnsiteProjPW 0.08 197 0.00 1.41 + OnsiteProj overlap 0.08 241 0.00 1.35 + Onsite_Proj_tools cal_becp 0.08 245 0.00 1.42 + Diago_DavSubspace cal_elem 0.06 197 0.00 1.09 + Diago_DavSubspace diag_zhegvx 0.32 197 0.00 5.45 + Diago_DavSubspace cal_grad 3.20 153 0.02 55.12 + ElecStatePW 
psiToRho 0.60 22 0.03 10.32 + Charge_Mixing get_drho 0.06 22 0.00 1.08 + Charge_Mixing mix_rho 0.06 20 0.00 1.04 +------------------------------------------------------------------- + + + START Time : Sun May 3 09:54:07 2026 + FINISH Time : Sun May 3 09:54:13 2026 + TOTAL Time : 6 + SEE INFORMATION IN : OUT.autotest/ diff --git a/tests/01_PW/099_PW_DJ_SO/log_v2.txt b/tests/01_PW/099_PW_DJ_SO/log_v2.txt new file mode 100644 index 00000000000..89fb998774a --- /dev/null +++ b/tests/01_PW/099_PW_DJ_SO/log_v2.txt @@ -0,0 +1,121 @@ + + ABACUS v3.11.0-beta.1 + + Atomic-orbital Based Ab-initio Computation at UStc + + Website: http://abacus.ustc.edu.cn/ + Documentation: https://abacus.deepmodeling.com/ + Repository: https://github.com/abacusmodeling/abacus-develop + https://github.com/deepmodeling/abacus-develop + Commit: 5837a6526 (Sun May 3 09:44:20 2026 +0800) + + Sun May 3 11:37:55 2026 +Info: Local MPI proc number: 4,OpenMP thread number: 3,Total thread number: 12,Local thread limit: 14 + MAKE THE DIR : OUT.autotest/ + RUNNING WITH DEVICE : CPU / Intel(R) Core(TM) Ultra 5 225H (x1) + WARNING: some of potential function is set to zero cause of less than 1e-30. + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + Warning: the number of valence electrons in pseudopotential > 8 for Fe: [Ar] 3d6 4s2 + Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient. + If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning. 
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + UNIFORM GRID DIM : 24 * 24 * 24 + UNIFORM GRID DIM(BIG): 24 * 24 * 24 + DONE(9.762e-06 SEC) : SETUP UNITCELL + DONE(0.00206507 SEC) : INIT K-POINTS + ---------------------------------------------------------------- + Self-consistent calculations for electrons + ---------------------------------------------------------------- + SPIN KPOINTS PROCESSES THREADS/PROC THREADS/TOTAL + 4 2 4 3 12 + ---------------------------------------------------------------- + Use plane wave basis + ---------------------------------------------------------------- + ELEMENT NATOM XC + Fe 2 + ---------------------------------------------------------------- + Initial plane wave basis and FFT box + ---------------------------------------------------------------- + DONE(0.0107033 SEC) : INIT PLANEWAVE + START CHARGE : atomic + DONE(0.0343627 SEC) : LOCAL POTENTIAL + DONE(0.0477787 SEC) : NON-LOCAL POTENTIAL + MEMORY FOR PSI (MB) : 0.0878906 + DONE(0.0753504 SEC) : INIT BASIS + + ================================================================ + SELF-CONSISTENT: + ================================================================ + DONE(0.112708 SEC) : INIT SCF + ITER TMAGX TMAGY TMAGZ AMAG ETOT/eV EDIFF/eV DRHO TIME/s + DS1 -3.50e-03 -3.47e-03 -3.47e-03 1.94e-01 -5.93364516e+03 0.00000000e+00 6.0771e+01 0.64 + DS2 -1.98e-02 -1.99e-02 -1.99e-02 5.38e-02 -5.61422933e+03 3.19415829e+02 2.7921e+01 0.83 + DS3 3.64e-01 3.62e-01 3.63e-01 6.30e-01 -5.66083209e+03 -4.66027629e+01 9.3631e-01 0.72 + DS4 6.56e-01 6.54e-01 6.54e-01 1.13e+00 -5.66314237e+03 -2.31027985e+00 9.7969e-01 1.36 + DS5 1.14e+00 1.13e+00 1.13e+00 1.96e+00 -5.66288782e+03 2.54552547e-01 8.4317e-01 0.35 + DS6 1.54e+00 1.53e+00 1.53e+00 2.66e+00 -5.65330389e+03 9.58392934e+00 6.5624e-01 0.22 + DS7 3.48e+00 3.45e+00 3.45e+00 6.00e+00 -5.66100392e+03 -7.70002981e+00 3.6100e-01 0.23 + DS8 4.02e+00 3.98e+00 3.98e+00 6.93e+00 
-5.66255040e+03 -1.54648025e+00 3.2295e-01 0.21 + DS9 4.09e+00 4.05e+00 4.05e+00 7.05e+00 -5.66250306e+03 4.73376558e-02 2.6493e-01 0.16 + DS10 4.08e+00 4.04e+00 4.04e+00 7.03e+00 -5.66202969e+03 4.73376077e-01 1.4527e-01 0.21 + DS11 4.29e+00 4.25e+00 4.25e+00 7.40e+00 -5.66220119e+03 -1.71501951e-01 2.5845e-02 0.13 + DS12 4.64e+00 4.59e+00 4.59e+00 7.98e+00 -5.66227828e+03 -7.70963304e-02 6.0170e-02 0.18 + DS13 4.65e+00 4.60e+00 4.60e+00 8.00e+00 -5.66242325e+03 -1.44967260e-01 2.8907e-02 0.18 + DS14 4.65e+00 4.60e+00 4.60e+00 8.00e+00 -5.66243546e+03 -1.22093846e-02 2.4687e-02 0.17 + DS15 4.65e+00 4.61e+00 4.61e+00 8.01e+00 -5.66242798e+03 7.48036260e-03 1.4412e-02 0.18 + DS16 4.66e+00 4.61e+00 4.61e+00 8.02e+00 -5.66242955e+03 -1.56605403e-03 6.6989e-03 0.44 + DS17 4.65e+00 4.61e+00 4.61e+00 8.02e+00 -5.66241570e+03 1.38475346e-02 4.8441e-03 0.22 + DS18 4.65e+00 4.61e+00 4.61e+00 8.02e+00 -5.66239561e+03 2.00926873e-02 4.0264e-03 0.15 + DS19 4.65e+00 4.61e+00 4.61e+00 8.02e+00 -5.66238472e+03 1.08890221e-02 1.3802e-03 0.17 + SCF restart after this step! 
+ DS20 4.66e+00 4.61e+00 4.61e+00 8.02e+00 -5.66288453e+03 -4.99809951e-01 1.4626e-04 0.19 + DS21 4.66e+00 4.61e+00 4.61e+00 8.02e+00 -5.66239029e+03 4.94239549e-01 3.0808e-04 0.24 + DS22 4.66e+00 4.61e+00 4.61e+00 8.02e+00 -5.66239089e+03 -5.99121512e-04 7.3385e-06 0.14 + ---------------------------------------------------------------- + Stress_x Stress_y Stress_z + ---------------------------------------------------------------- + -32078.3250525754 67.5008795602 67.4184104099 + 67.5008795602 -33686.4942094585 559.5736765351 + 67.4184104099 559.5736765351 -33686.4455147623 + ---------------------------------------------------------------- + TOTAL-PRESSURE (EXCLUDE KINETIC PART OF IONS): -33150.421592 kbar + + TIME STATISTICS +------------------------------------------------------------------- + CLASS_NAME NAME TIME/s CALLS AVG/s PER/% +------------------------------------------------------------------- + Driver atomic_world 7.53 1 7.53 100.00 + total 7.50 14 0.54 99.53 + PW_Basis_Sup recip2real 0.10 397 0.00 1.38 + Relax_Driver relax_driver 7.42 1 7.42 98.52 + ESolver_KS runner 7.37 1 7.37 97.84 + Potential cal_veff 0.22 23 0.01 2.92 + PW_Basis_Sup real2recip 0.18 463 0.00 2.41 + PotXC cal_veff 0.18 23 0.01 2.44 + XC_Functional v_xc 0.20 25 0.01 2.68 + ESolver_KS_PW hamilt2rho_single 6.98 22 0.32 92.63 + HSolverPW solve 6.98 22 0.32 92.63 + HSolverPW solve_psik 5.95 44 0.14 78.96 + Diago_DavSubspace diag_once 5.93 44 0.13 78.74 + Diago_DavSubspace first 2.13 44 0.05 28.26 + Operator hPsi 5.07 197 0.03 67.33 + Operator veff_pw 4.86 197 0.02 64.56 + PW_Basis_K recip2real 3.29 11904 0.00 43.70 + PW_Basis_K real2recip 2.50 8384 0.00 33.26 + Operator nonlocal_pw 0.09 197 0.00 1.25 + Operator OnsiteProjPW 0.11 197 0.00 1.48 + OnsiteProj overlap 0.12 241 0.00 1.64 + Onsite_Proj_tools cal_becp 0.12 245 0.00 1.66 + Diago_DavSubspace cal_elem 0.15 197 0.00 1.95 + Diago_DavSubspace diag_zhegvx 0.37 197 0.00 4.96 + Diago_DavSubspace cal_grad 3.33 153 0.02 44.21 + ElecStatePW 
psiToRho 1.01 22 0.05 13.41 +------------------------------------------------------------------- + + + START Time : Sun May 3 11:37:55 2026 + FINISH Time : Sun May 3 11:38:03 2026 + TOTAL Time : 8 + SEE INFORMATION IN : OUT.autotest/ diff --git a/tests/01_PW/099_PW_DJ_SO/result_dev_np1.out b/tests/01_PW/099_PW_DJ_SO/result_dev_np1.out new file mode 100644 index 00000000000..7712d6b3f76 --- /dev/null +++ b/tests/01_PW/099_PW_DJ_SO/result_dev_np1.out @@ -0,0 +1,5 @@ +etotref -5662.3894775916605795 +etotperatomref -2831.1947387958 +totalforceref 17.718002 +totalstressref 100581.716424 +totaltimeref 2.37 diff --git a/tests/01_PW/099_PW_DJ_SO/result_dev_np4.out b/tests/01_PW/099_PW_DJ_SO/result_dev_np4.out new file mode 100644 index 00000000000..a24ab3f48b2 --- /dev/null +++ b/tests/01_PW/099_PW_DJ_SO/result_dev_np4.out @@ -0,0 +1,5 @@ +etotref -5662.3908859906132420 +etotperatomref -2831.1954429953 +totalforceref 17.965510 +totalstressref 100582.607209 +totaltimeref 3.18 diff --git a/tests/01_PW/099_PW_DJ_SO/result_final.out b/tests/01_PW/099_PW_DJ_SO/result_final.out new file mode 100644 index 00000000000..797117b6d0c --- /dev/null +++ b/tests/01_PW/099_PW_DJ_SO/result_final.out @@ -0,0 +1,5 @@ +etotref +etotperatomref +totalforceref 0.0 +totalstressref 0.0 +totaltimeref diff --git a/tests/01_PW/099_PW_DJ_SO/result_pr_fixed.out b/tests/01_PW/099_PW_DJ_SO/result_pr_fixed.out new file mode 100644 index 00000000000..417295da7fa --- /dev/null +++ b/tests/01_PW/099_PW_DJ_SO/result_pr_fixed.out @@ -0,0 +1,5 @@ +etotref -5662.3908859905586723 +etotperatomref -2831.1954429953 +totalforceref 17.965520 +totalstressref 100582.461625 +totaltimeref 3.19 diff --git a/tests/01_PW/099_PW_DJ_SO/result_pr_np4.out b/tests/01_PW/099_PW_DJ_SO/result_pr_np4.out new file mode 100644 index 00000000000..43e7f0ff4f8 --- /dev/null +++ b/tests/01_PW/099_PW_DJ_SO/result_pr_np4.out @@ -0,0 +1,5 @@ +etotref -5662.3908859905150166 +etotperatomref -2831.1954429953 +totalforceref 17.963892 
+totalstressref 100840.250711 +totaltimeref 5.80 diff --git a/tests/01_PW/099_PW_DJ_SO/result_v2.out b/tests/01_PW/099_PW_DJ_SO/result_v2.out new file mode 100644 index 00000000000..fa945c71015 --- /dev/null +++ b/tests/01_PW/099_PW_DJ_SO/result_v2.out @@ -0,0 +1,5 @@ +etotref -5662.3908859906141515 +etotperatomref -2831.1954429953 +totalforceref 17.963892 +totalstressref 100840.250711 +totaltimeref 4.14 diff --git a/tests/01_PW/099_PW_DJ_SO/result_v2_check.out b/tests/01_PW/099_PW_DJ_SO/result_v2_check.out new file mode 100644 index 00000000000..595310827fc --- /dev/null +++ b/tests/01_PW/099_PW_DJ_SO/result_v2_check.out @@ -0,0 +1,5 @@ +etotref -5662.3908859904895507 +etotperatomref -2831.1954429952 +totalforceref 17.963892 +totalstressref 100840.250711 +totaltimeref 7.53 diff --git a/tests/17_DS_DFTU/01_LCAO_SPIN_S2_Z/INPUT b/tests/17_DS_DFTU/01_LCAO_SPIN_S2_Z/INPUT new file mode 100644 index 00000000000..7b498cf0ca4 --- /dev/null +++ b/tests/17_DS_DFTU/01_LCAO_SPIN_S2_Z/INPUT @@ -0,0 +1,23 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type lcao +ecutwfc 20 +gamma_only 0 + +nspin 2 +#nbands 28 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver genelpa +symmetry 0 + + + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB diff --git a/tests/17_DS_DFTU/01_LCAO_SPIN_S2_Z/KPT b/tests/17_DS_DFTU/01_LCAO_SPIN_S2_Z/KPT new file mode 100644 index 00000000000..c289c0158aa --- /dev/null +++ b/tests/17_DS_DFTU/01_LCAO_SPIN_S2_Z/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Gamma +1 1 1 0 0 0 diff --git a/tests/17_DS_DFTU/01_LCAO_SPIN_S2_Z/STRU b/tests/17_DS_DFTU/01_LCAO_SPIN_S2_Z/STRU new file mode 100644 index 00000000000..8535c1db16e --- /dev/null +++ b/tests/17_DS_DFTU/01_LCAO_SPIN_S2_Z/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 
1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 +0.51 0.51 0.51 mag -2.0 diff --git a/tests/17_DS_DFTU/01_LCAO_SPIN_S2_Z/result.ref b/tests/17_DS_DFTU/01_LCAO_SPIN_S2_Z/result.ref new file mode 100644 index 00000000000..7cb6f604546 --- /dev/null +++ b/tests/17_DS_DFTU/01_LCAO_SPIN_S2_Z/result.ref @@ -0,0 +1 @@ +etotref -6787.961875326573 diff --git a/tests/17_DS_DFTU/02_LCAO_SPIN_S4_XYZ/INPUT b/tests/17_DS_DFTU/02_LCAO_SPIN_S4_XYZ/INPUT new file mode 100644 index 00000000000..163c7b3bcd6 --- /dev/null +++ b/tests/17_DS_DFTU/02_LCAO_SPIN_S4_XYZ/INPUT @@ -0,0 +1,20 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type lcao +ecutwfc 20 +gamma_only 0 +noncolin 1 +#nbands 40 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver genelpa +symmetry 0 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB diff --git a/tests/17_DS_DFTU/02_LCAO_SPIN_S4_XYZ/KPT b/tests/17_DS_DFTU/02_LCAO_SPIN_S4_XYZ/KPT new file mode 100644 index 00000000000..c289c0158aa --- /dev/null +++ b/tests/17_DS_DFTU/02_LCAO_SPIN_S4_XYZ/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Gamma +1 1 1 0 0 0 diff --git a/tests/17_DS_DFTU/02_LCAO_SPIN_S4_XYZ/STRU b/tests/17_DS_DFTU/02_LCAO_SPIN_S4_XYZ/STRU new file mode 100644 index 00000000000..a96b8d1a0e3 --- /dev/null +++ b/tests/17_DS_DFTU/02_LCAO_SPIN_S4_XYZ/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 magmom 1.155 1.155 1.155 +0.51 0.51 0.51 magmom -1.155 -1.155 -1.155 diff --git a/tests/17_DS_DFTU/02_LCAO_SPIN_S4_XYZ/result.ref b/tests/17_DS_DFTU/02_LCAO_SPIN_S4_XYZ/result.ref new file mode 100644 index 00000000000..0e166c01309 --- /dev/null +++ b/tests/17_DS_DFTU/02_LCAO_SPIN_S4_XYZ/result.ref @@ -0,0 +1 @@ +etotref -6787.961880425138 diff 
--git a/tests/17_DS_DFTU/03_LCAO_DFTU_S2_Z/INPUT b/tests/17_DS_DFTU/03_LCAO_DFTU_S2_Z/INPUT new file mode 100644 index 00000000000..1eb50a84479 --- /dev/null +++ b/tests/17_DS_DFTU/03_LCAO_DFTU_S2_Z/INPUT @@ -0,0 +1,28 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type lcao +ecutwfc 20 +gamma_only 0 + +nspin 2 +#nbands 28 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver genelpa +symmetry 0 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 + + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB diff --git a/tests/17_DS_DFTU/03_LCAO_DFTU_S2_Z/KPT b/tests/17_DS_DFTU/03_LCAO_DFTU_S2_Z/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/03_LCAO_DFTU_S2_Z/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/03_LCAO_DFTU_S2_Z/STRU b/tests/17_DS_DFTU/03_LCAO_DFTU_S2_Z/STRU new file mode 100644 index 00000000000..8535c1db16e --- /dev/null +++ b/tests/17_DS_DFTU/03_LCAO_DFTU_S2_Z/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 +0.51 0.51 0.51 mag -2.0 diff --git a/tests/17_DS_DFTU/03_LCAO_DFTU_S2_Z/result.ref b/tests/17_DS_DFTU/03_LCAO_DFTU_S2_Z/result.ref new file mode 100644 index 00000000000..3608c565a82 --- /dev/null +++ b/tests/17_DS_DFTU/03_LCAO_DFTU_S2_Z/result.ref @@ -0,0 +1 @@ +etotref -6772.0999515218118177 diff --git a/tests/17_DS_DFTU/04_LCAO_DFTU_S4_XY/INPUT b/tests/17_DS_DFTU/04_LCAO_DFTU_S4_XY/INPUT new file mode 100644 index 00000000000..7daab2ff56e --- /dev/null +++ b/tests/17_DS_DFTU/04_LCAO_DFTU_S4_XY/INPUT @@ -0,0 +1,28 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type lcao +ecutwfc 20 +gamma_only 0 + 
+noncolin 1 +#nbands 28 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver genelpa +symmetry 0 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 + + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB diff --git a/tests/17_DS_DFTU/04_LCAO_DFTU_S4_XY/KPT b/tests/17_DS_DFTU/04_LCAO_DFTU_S4_XY/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/04_LCAO_DFTU_S4_XY/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/04_LCAO_DFTU_S4_XY/STRU b/tests/17_DS_DFTU/04_LCAO_DFTU_S4_XY/STRU new file mode 100644 index 00000000000..63c4d14399c --- /dev/null +++ b/tests/17_DS_DFTU/04_LCAO_DFTU_S4_XY/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 magmom 2.0 0.0 0.0 +0.51 0.51 0.51 magmom -2.0 0.0 0.0 diff --git a/tests/17_DS_DFTU/04_LCAO_DFTU_S4_XY/result.ref b/tests/17_DS_DFTU/04_LCAO_DFTU_S4_XY/result.ref new file mode 100644 index 00000000000..7c939091461 --- /dev/null +++ b/tests/17_DS_DFTU/04_LCAO_DFTU_S4_XY/result.ref @@ -0,0 +1 @@ +etotref -6772.1004497577005168 diff --git a/tests/17_DS_DFTU/05_LCAO_DFTU_S4_XYZ/INPUT b/tests/17_DS_DFTU/05_LCAO_DFTU_S4_XYZ/INPUT new file mode 100644 index 00000000000..efb3db1a055 --- /dev/null +++ b/tests/17_DS_DFTU/05_LCAO_DFTU_S4_XYZ/INPUT @@ -0,0 +1,27 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type lcao +ecutwfc 20 +gamma_only 0 +noncolin 1 +#nbands 40 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver genelpa +symmetry 0 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 + + +pseudo_dir 
../../PP_ORB +orbital_dir ../../PP_ORB diff --git a/tests/17_DS_DFTU/05_LCAO_DFTU_S4_XYZ/KPT b/tests/17_DS_DFTU/05_LCAO_DFTU_S4_XYZ/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/05_LCAO_DFTU_S4_XYZ/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/05_LCAO_DFTU_S4_XYZ/STRU b/tests/17_DS_DFTU/05_LCAO_DFTU_S4_XYZ/STRU new file mode 100644 index 00000000000..a96b8d1a0e3 --- /dev/null +++ b/tests/17_DS_DFTU/05_LCAO_DFTU_S4_XYZ/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 magmom 1.155 1.155 1.155 +0.51 0.51 0.51 magmom -1.155 -1.155 -1.155 diff --git a/tests/17_DS_DFTU/05_LCAO_DFTU_S4_XYZ/result.ref b/tests/17_DS_DFTU/05_LCAO_DFTU_S4_XYZ/result.ref new file mode 100644 index 00000000000..5829531a565 --- /dev/null +++ b/tests/17_DS_DFTU/05_LCAO_DFTU_S4_XYZ/result.ref @@ -0,0 +1 @@ +etotref -6772.1004562034922856 diff --git a/tests/17_DS_DFTU/06_PW_SPIN_S2_Z/INPUT b/tests/17_DS_DFTU/06_PW_SPIN_S2_Z/INPUT new file mode 100644 index 00000000000..567770e830b --- /dev/null +++ b/tests/17_DS_DFTU/06_PW_SPIN_S2_Z/INPUT @@ -0,0 +1,20 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 50 +gamma_only 0 +nspin 2 +nbands 28 +scf_thr 1.0e-6 +scf_nmax 100 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 + +pseudo_dir ../../PP_ORB +pw_seed 1 diff --git a/tests/17_DS_DFTU/06_PW_SPIN_S2_Z/KPT b/tests/17_DS_DFTU/06_PW_SPIN_S2_Z/KPT new file mode 100644 index 00000000000..c289c0158aa --- /dev/null +++ b/tests/17_DS_DFTU/06_PW_SPIN_S2_Z/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Gamma +1 1 1 0 0 0 diff --git a/tests/17_DS_DFTU/06_PW_SPIN_S2_Z/STRU b/tests/17_DS_DFTU/06_PW_SPIN_S2_Z/STRU new 
file mode 100644 index 00000000000..7d8feef3406 --- /dev/null +++ b/tests/17_DS_DFTU/06_PW_SPIN_S2_Z/STRU @@ -0,0 +1,18 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 +0.51 0.51 0.51 mag -2.0 diff --git a/tests/17_DS_DFTU/06_PW_SPIN_S2_Z/result.ref b/tests/17_DS_DFTU/06_PW_SPIN_S2_Z/result.ref new file mode 100644 index 00000000000..5a43c537250 --- /dev/null +++ b/tests/17_DS_DFTU/06_PW_SPIN_S2_Z/result.ref @@ -0,0 +1,3 @@ +etotref -6807.727140777411 +etotperatomref -3403.8635703887 +totaltimeref 2.73 diff --git a/tests/17_DS_DFTU/07_PW_SPIN_S4_XYZ/INPUT b/tests/17_DS_DFTU/07_PW_SPIN_S4_XYZ/INPUT new file mode 100644 index 00000000000..f0efbfb4f01 --- /dev/null +++ b/tests/17_DS_DFTU/07_PW_SPIN_S4_XYZ/INPUT @@ -0,0 +1,21 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 +noncolin 1 +#nbands 40 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 + +kpar 1 +pseudo_dir ../../PP_ORB +pw_seed 1 diff --git a/tests/17_DS_DFTU/07_PW_SPIN_S4_XYZ/KPT b/tests/17_DS_DFTU/07_PW_SPIN_S4_XYZ/KPT new file mode 100644 index 00000000000..c289c0158aa --- /dev/null +++ b/tests/17_DS_DFTU/07_PW_SPIN_S4_XYZ/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Gamma +1 1 1 0 0 0 diff --git a/tests/17_DS_DFTU/07_PW_SPIN_S4_XYZ/STRU b/tests/17_DS_DFTU/07_PW_SPIN_S4_XYZ/STRU new file mode 100644 index 00000000000..d8ea895cf0b --- /dev/null +++ b/tests/17_DS_DFTU/07_PW_SPIN_S4_XYZ/STRU @@ -0,0 +1,18 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 magmom 1.155 1.155 1.155 +0.51 0.51 0.51 magmom -1.155 -1.155 -1.155 diff --git a/tests/17_DS_DFTU/07_PW_SPIN_S4_XYZ/result.ref 
b/tests/17_DS_DFTU/07_PW_SPIN_S4_XYZ/result.ref new file mode 100644 index 00000000000..c17d6b8de03 --- /dev/null +++ b/tests/17_DS_DFTU/07_PW_SPIN_S4_XYZ/result.ref @@ -0,0 +1,3 @@ +etotref -6350.021298529959 +etotperatomref -3175.0106492650 +totaltimeref 1.53 diff --git a/tests/17_DS_DFTU/08_PW_DFTU_S2_Z/INPUT b/tests/17_DS_DFTU/08_PW_DFTU_S2_Z/INPUT new file mode 100644 index 00000000000..88bcde220e8 --- /dev/null +++ b/tests/17_DS_DFTU/08_PW_DFTU_S2_Z/INPUT @@ -0,0 +1,29 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 50 +gamma_only 0 +device cpu + +nspin 2 +nbands 28 +scf_thr 1.0e-6 +scf_nmax 100 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB +pw_seed 1 diff --git a/tests/17_DS_DFTU/08_PW_DFTU_S2_Z/KPT b/tests/17_DS_DFTU/08_PW_DFTU_S2_Z/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/08_PW_DFTU_S2_Z/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/08_PW_DFTU_S2_Z/STRU b/tests/17_DS_DFTU/08_PW_DFTU_S2_Z/STRU new file mode 100644 index 00000000000..8535c1db16e --- /dev/null +++ b/tests/17_DS_DFTU/08_PW_DFTU_S2_Z/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 +0.51 0.51 0.51 mag -2.0 diff --git a/tests/17_DS_DFTU/08_PW_DFTU_S2_Z/result.ref b/tests/17_DS_DFTU/08_PW_DFTU_S2_Z/result.ref new file mode 100644 index 00000000000..de5a702e338 --- /dev/null +++ b/tests/17_DS_DFTU/08_PW_DFTU_S2_Z/result.ref @@ -0,0 +1,3 @@ +etotref -6792.3335167101049592 +etotperatomref -3396.1667583551 +totaltimeref 21.98 diff --git 
a/tests/17_DS_DFTU/09_PW_DFTU_S4_XY/INPUT b/tests/17_DS_DFTU/09_PW_DFTU_S4_XY/INPUT new file mode 100644 index 00000000000..c36bf764591 --- /dev/null +++ b/tests/17_DS_DFTU/09_PW_DFTU_S4_XY/INPUT @@ -0,0 +1,29 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 +device cpu + +noncolin 1 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 + +kpar 2 +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB +pw_seed 1 diff --git a/tests/17_DS_DFTU/09_PW_DFTU_S4_XY/KPT b/tests/17_DS_DFTU/09_PW_DFTU_S4_XY/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/09_PW_DFTU_S4_XY/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/09_PW_DFTU_S4_XY/STRU b/tests/17_DS_DFTU/09_PW_DFTU_S4_XY/STRU new file mode 100644 index 00000000000..63c4d14399c --- /dev/null +++ b/tests/17_DS_DFTU/09_PW_DFTU_S4_XY/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 magmom 2.0 0.0 0.0 +0.51 0.51 0.51 magmom -2.0 0.0 0.0 diff --git a/tests/17_DS_DFTU/09_PW_DFTU_S4_XY/result.ref b/tests/17_DS_DFTU/09_PW_DFTU_S4_XY/result.ref new file mode 100644 index 00000000000..e67630b7175 --- /dev/null +++ b/tests/17_DS_DFTU/09_PW_DFTU_S4_XY/result.ref @@ -0,0 +1,3 @@ +etotref -6348.2271462104699822 +etotperatomref -3174.1135731052 +totaltimeref 3.89 diff --git a/tests/17_DS_DFTU/10_PW_DFTU_S4_XY/INPUT b/tests/17_DS_DFTU/10_PW_DFTU_S4_XY/INPUT new file mode 100644 index 00000000000..07704d5163a --- /dev/null +++ b/tests/17_DS_DFTU/10_PW_DFTU_S4_XY/INPUT @@ -0,0 +1,28 @@ +INPUT_PARAMETERS +suffix 
autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 +noncolin 1 +#nbands 40 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 + +kpar 2 +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB +pw_seed 1 diff --git a/tests/17_DS_DFTU/10_PW_DFTU_S4_XY/KPT b/tests/17_DS_DFTU/10_PW_DFTU_S4_XY/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/10_PW_DFTU_S4_XY/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/10_PW_DFTU_S4_XY/STRU b/tests/17_DS_DFTU/10_PW_DFTU_S4_XY/STRU new file mode 100644 index 00000000000..63c4d14399c --- /dev/null +++ b/tests/17_DS_DFTU/10_PW_DFTU_S4_XY/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 magmom 2.0 0.0 0.0 +0.51 0.51 0.51 magmom -2.0 0.0 0.0 diff --git a/tests/17_DS_DFTU/10_PW_DFTU_S4_XY/result.ref b/tests/17_DS_DFTU/10_PW_DFTU_S4_XY/result.ref new file mode 100644 index 00000000000..d274aea2b7d --- /dev/null +++ b/tests/17_DS_DFTU/10_PW_DFTU_S4_XY/result.ref @@ -0,0 +1 @@ +etotref -6348.2271462104727107 diff --git a/tests/17_DS_DFTU/11_PW_DFTU_S2_FeO/INPUT b/tests/17_DS_DFTU/11_PW_DFTU_S2_FeO/INPUT new file mode 100644 index 00000000000..5ec0a0f0e53 --- /dev/null +++ b/tests/17_DS_DFTU/11_PW_DFTU_S2_FeO/INPUT @@ -0,0 +1,29 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 + +nspin 2 +#nbands 28 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 + +# DFT+U parameters +dft_plus_u 1 
+orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 + +kpar 2 +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB +pw_seed 1 diff --git a/tests/17_DS_DFTU/11_PW_DFTU_S2_FeO/KPT b/tests/17_DS_DFTU/11_PW_DFTU_S2_FeO/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/11_PW_DFTU_S2_FeO/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/11_PW_DFTU_S2_FeO/STRU b/tests/17_DS_DFTU/11_PW_DFTU_S2_FeO/STRU new file mode 100644 index 00000000000..8535c1db16e --- /dev/null +++ b/tests/17_DS_DFTU/11_PW_DFTU_S2_FeO/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 +0.51 0.51 0.51 mag -2.0 diff --git a/tests/17_DS_DFTU/11_PW_DFTU_S2_FeO/result.ref b/tests/17_DS_DFTU/11_PW_DFTU_S2_FeO/result.ref new file mode 100644 index 00000000000..e78a37517f6 --- /dev/null +++ b/tests/17_DS_DFTU/11_PW_DFTU_S2_FeO/result.ref @@ -0,0 +1,3 @@ +etotref -6348.2272130009841931 +etotperatomref -3174.1136065005 +totaltimeref 2.04 diff --git a/tests/17_DS_DFTU/12_PW_DS_S2_Z/INPUT b/tests/17_DS_DFTU/12_PW_DS_S2_Z/INPUT new file mode 100644 index 00000000000..7bcfbc3ffd5 --- /dev/null +++ b/tests/17_DS_DFTU/12_PW_DS_S2_Z/INPUT @@ -0,0 +1,32 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 +nspin 2 +#nbands 28 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 + +# DeltaSpin parameters +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +kpar 2 +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB +pw_seed 1 diff --git a/tests/17_DS_DFTU/12_PW_DS_S2_Z/KPT b/tests/17_DS_DFTU/12_PW_DS_S2_Z/KPT new file 
mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/12_PW_DS_S2_Z/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/12_PW_DS_S2_Z/STRU b/tests/17_DS_DFTU/12_PW_DS_S2_Z/STRU new file mode 100644 index 00000000000..b942348be5d --- /dev/null +++ b/tests/17_DS_DFTU/12_PW_DS_S2_Z/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 sc 1 1 1 +0.51 0.51 0.51 mag -2.0 sc 1 1 1 diff --git a/tests/17_DS_DFTU/12_PW_DS_S2_Z/result.ref b/tests/17_DS_DFTU/12_PW_DS_S2_Z/result.ref new file mode 100644 index 00000000000..ee7039ca16a --- /dev/null +++ b/tests/17_DS_DFTU/12_PW_DS_S2_Z/result.ref @@ -0,0 +1,3 @@ +etotref -5322.706641187102 +etotperatomref -2661.3533205936 +totaltimeref 1.64 diff --git a/tests/17_DS_DFTU/13_PW_DS_S4_XY/INPUT b/tests/17_DS_DFTU/13_PW_DS_S4_XY/INPUT new file mode 100644 index 00000000000..b2aa0bbd5af --- /dev/null +++ b/tests/17_DS_DFTU/13_PW_DS_S4_XY/INPUT @@ -0,0 +1,32 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 +noncolin 1 +#nbands 28 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 + +# DeltaSpin parameters +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +kpar 2 +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB +pw_seed 1 diff --git a/tests/17_DS_DFTU/13_PW_DS_S4_XY/KPT b/tests/17_DS_DFTU/13_PW_DS_S4_XY/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/13_PW_DS_S4_XY/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/13_PW_DS_S4_XY/STRU 
b/tests/17_DS_DFTU/13_PW_DS_S4_XY/STRU new file mode 100644 index 00000000000..1ffecf17384 --- /dev/null +++ b/tests/17_DS_DFTU/13_PW_DS_S4_XY/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 magmom 2.0 0.0 0.0 sc 1 1 1 +0.51 0.51 0.51 magmom -2.0 0.0 0.0 sc 1 1 1 diff --git a/tests/17_DS_DFTU/13_PW_DS_S4_XY/result.ref b/tests/17_DS_DFTU/13_PW_DS_S4_XY/result.ref new file mode 100644 index 00000000000..c17b77b3c76 --- /dev/null +++ b/tests/17_DS_DFTU/13_PW_DS_S4_XY/result.ref @@ -0,0 +1,3 @@ +etotref -5319.63101475035 +etotperatomref -2659.8155073752 +totaltimeref 3.65 diff --git a/tests/17_DS_DFTU/14_PW_DS_S4_XYZ/INPUT b/tests/17_DS_DFTU/14_PW_DS_S4_XYZ/INPUT new file mode 100644 index 00000000000..b2aa0bbd5af --- /dev/null +++ b/tests/17_DS_DFTU/14_PW_DS_S4_XYZ/INPUT @@ -0,0 +1,32 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 +noncolin 1 +#nbands 28 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 + +# DeltaSpin parameters +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +kpar 2 +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB +pw_seed 1 diff --git a/tests/17_DS_DFTU/14_PW_DS_S4_XYZ/KPT b/tests/17_DS_DFTU/14_PW_DS_S4_XYZ/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/14_PW_DS_S4_XYZ/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/14_PW_DS_S4_XYZ/STRU b/tests/17_DS_DFTU/14_PW_DS_S4_XYZ/STRU new file mode 100644 index 00000000000..0a9effad744 --- /dev/null +++ b/tests/17_DS_DFTU/14_PW_DS_S4_XYZ/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 
Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 magmom 1.155 1.155 1.155 sc 1 1 1 +0.51 0.51 0.51 magmom -1.155 -1.155 -1.155 sc 1 1 1 diff --git a/tests/17_DS_DFTU/14_PW_DS_S4_XYZ/result.ref b/tests/17_DS_DFTU/14_PW_DS_S4_XYZ/result.ref new file mode 100644 index 00000000000..6af9d49efff --- /dev/null +++ b/tests/17_DS_DFTU/14_PW_DS_S4_XYZ/result.ref @@ -0,0 +1,3 @@ +etotref -5319.679766457001 +etotperatomref -2659.8398832285 +totaltimeref 2.98 diff --git a/tests/17_DS_DFTU/15_PW_DS_S4_Z/INPUT b/tests/17_DS_DFTU/15_PW_DS_S4_Z/INPUT new file mode 100644 index 00000000000..1957fe592bf --- /dev/null +++ b/tests/17_DS_DFTU/15_PW_DS_S4_Z/INPUT @@ -0,0 +1,32 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 +noncolin 1 +#nbands 40 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 + +# DeltaSpin parameters +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +kpar 2 +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB +pw_seed 1 diff --git a/tests/17_DS_DFTU/15_PW_DS_S4_Z/KPT b/tests/17_DS_DFTU/15_PW_DS_S4_Z/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/15_PW_DS_S4_Z/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/15_PW_DS_S4_Z/STRU b/tests/17_DS_DFTU/15_PW_DS_S4_Z/STRU new file mode 100644 index 00000000000..bbe4a2796fa --- /dev/null +++ b/tests/17_DS_DFTU/15_PW_DS_S4_Z/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 
+0.00 0.00 0.00 mag 2.0 sc 1 1 1 +0.51 0.51 0.51 mag -2.0 sc 1 1 1 diff --git a/tests/17_DS_DFTU/15_PW_DS_S4_Z/result.ref b/tests/17_DS_DFTU/15_PW_DS_S4_Z/result.ref new file mode 100644 index 00000000000..d6215c4ae08 --- /dev/null +++ b/tests/17_DS_DFTU/15_PW_DS_S4_Z/result.ref @@ -0,0 +1,3 @@ +etotref -5319.594960678665 +etotperatomref -2659.7974803393 +totaltimeref 3.20 diff --git a/tests/17_DS_DFTU/16_PW_DS_S4_XY/INPUT b/tests/17_DS_DFTU/16_PW_DS_S4_XY/INPUT new file mode 100644 index 00000000000..1957fe592bf --- /dev/null +++ b/tests/17_DS_DFTU/16_PW_DS_S4_XY/INPUT @@ -0,0 +1,32 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 +noncolin 1 +#nbands 40 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 + +# DeltaSpin parameters +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +kpar 2 +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB +pw_seed 1 diff --git a/tests/17_DS_DFTU/16_PW_DS_S4_XY/KPT b/tests/17_DS_DFTU/16_PW_DS_S4_XY/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/16_PW_DS_S4_XY/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/16_PW_DS_S4_XY/STRU b/tests/17_DS_DFTU/16_PW_DS_S4_XY/STRU new file mode 100644 index 00000000000..1ffecf17384 --- /dev/null +++ b/tests/17_DS_DFTU/16_PW_DS_S4_XY/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 magmom 2.0 0.0 0.0 sc 1 1 1 +0.51 0.51 0.51 magmom -2.0 0.0 0.0 sc 1 1 1 diff --git a/tests/17_DS_DFTU/16_PW_DS_S4_XY/result.ref b/tests/17_DS_DFTU/16_PW_DS_S4_XY/result.ref new file mode 100644 index 
00000000000..e52cd13e20a --- /dev/null +++ b/tests/17_DS_DFTU/16_PW_DS_S4_XY/result.ref @@ -0,0 +1,3 @@ +etotref -5319.631014750344 +etotperatomref -2659.8155073752 +totaltimeref 2.97 diff --git a/tests/17_DS_DFTU/17_PW_DS_S4_XYZ/INPUT b/tests/17_DS_DFTU/17_PW_DS_S4_XYZ/INPUT new file mode 100644 index 00000000000..4ba27422a83 --- /dev/null +++ b/tests/17_DS_DFTU/17_PW_DS_S4_XYZ/INPUT @@ -0,0 +1,34 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 +noncolin 1 +#nbands 40 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 + + +# DeltaSpin parameters +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +kpar 2 +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB + +pw_seed 1 diff --git a/tests/17_DS_DFTU/17_PW_DS_S4_XYZ/KPT b/tests/17_DS_DFTU/17_PW_DS_S4_XYZ/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/17_PW_DS_S4_XYZ/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/17_PW_DS_S4_XYZ/STRU b/tests/17_DS_DFTU/17_PW_DS_S4_XYZ/STRU new file mode 100644 index 00000000000..0a9effad744 --- /dev/null +++ b/tests/17_DS_DFTU/17_PW_DS_S4_XYZ/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 magmom 1.155 1.155 1.155 sc 1 1 1 +0.51 0.51 0.51 magmom -1.155 -1.155 -1.155 sc 1 1 1 diff --git a/tests/17_DS_DFTU/17_PW_DS_S4_XYZ/result.ref b/tests/17_DS_DFTU/17_PW_DS_S4_XYZ/result.ref new file mode 100644 index 00000000000..b4dccfd1360 --- /dev/null +++ b/tests/17_DS_DFTU/17_PW_DS_S4_XYZ/result.ref @@ -0,0 +1,3 @@ +etotref -5319.679766456968 +etotperatomref -2659.8398832285 
+totaltimeref 3.20 diff --git a/tests/17_DS_DFTU/18_PW_DFTU_DS_S2_Z/INPUT b/tests/17_DS_DFTU/18_PW_DFTU_DS_S2_Z/INPUT new file mode 100644 index 00000000000..f838996f2e5 --- /dev/null +++ b/tests/17_DS_DFTU/18_PW_DFTU_DS_S2_Z/INPUT @@ -0,0 +1,40 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 + +nspin 2 +#nbands 28 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 + +# DeltaSpin parameters +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +kpar 2 +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB + +pw_seed 1 diff --git a/tests/17_DS_DFTU/18_PW_DFTU_DS_S2_Z/KPT b/tests/17_DS_DFTU/18_PW_DFTU_DS_S2_Z/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/18_PW_DFTU_DS_S2_Z/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/18_PW_DFTU_DS_S2_Z/STRU b/tests/17_DS_DFTU/18_PW_DFTU_DS_S2_Z/STRU new file mode 100644 index 00000000000..bbe4a2796fa --- /dev/null +++ b/tests/17_DS_DFTU/18_PW_DFTU_DS_S2_Z/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 sc 1 1 1 +0.51 0.51 0.51 mag -2.0 sc 1 1 1 diff --git a/tests/17_DS_DFTU/18_PW_DFTU_DS_S2_Z/result.ref b/tests/17_DS_DFTU/18_PW_DFTU_DS_S2_Z/result.ref new file mode 100644 index 00000000000..355eb9a752a --- /dev/null +++ b/tests/17_DS_DFTU/18_PW_DFTU_DS_S2_Z/result.ref @@ -0,0 +1,3 @@ +etotref -5298.3025531171588227 +etotperatomref -2649.1512765586 +totaltimeref 1.96 diff --git a/tests/17_DS_DFTU/19_PW_DFTU_DS_S4_XY/INPUT 
b/tests/17_DS_DFTU/19_PW_DFTU_DS_S4_XY/INPUT new file mode 100644 index 00000000000..d816dfe980b --- /dev/null +++ b/tests/17_DS_DFTU/19_PW_DFTU_DS_S4_XY/INPUT @@ -0,0 +1,40 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 +device cpu + +noncolin 1 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 + +# DeltaSpin parameters +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +kpar 2 +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB + +pw_seed 1 diff --git a/tests/17_DS_DFTU/19_PW_DFTU_DS_S4_XY/KPT b/tests/17_DS_DFTU/19_PW_DFTU_DS_S4_XY/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/19_PW_DFTU_DS_S4_XY/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/19_PW_DFTU_DS_S4_XY/STRU b/tests/17_DS_DFTU/19_PW_DFTU_DS_S4_XY/STRU new file mode 100644 index 00000000000..1ffecf17384 --- /dev/null +++ b/tests/17_DS_DFTU/19_PW_DFTU_DS_S4_XY/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 magmom 2.0 0.0 0.0 sc 1 1 1 +0.51 0.51 0.51 magmom -2.0 0.0 0.0 sc 1 1 1 diff --git a/tests/17_DS_DFTU/19_PW_DFTU_DS_S4_XY/result.ref b/tests/17_DS_DFTU/19_PW_DFTU_DS_S4_XY/result.ref new file mode 100644 index 00000000000..9a714200ac8 --- /dev/null +++ b/tests/17_DS_DFTU/19_PW_DFTU_DS_S4_XY/result.ref @@ -0,0 +1,3 @@ +etotref -5303.0869839122487974 +etotperatomref -2651.5434919561 +totaltimeref 3.27 diff --git a/tests/17_DS_DFTU/20_PW_DFTU_DS_S4_XYZ/INPUT b/tests/17_DS_DFTU/20_PW_DFTU_DS_S4_XYZ/INPUT 
new file mode 100644 index 00000000000..db6f3ebe401 --- /dev/null +++ b/tests/17_DS_DFTU/20_PW_DFTU_DS_S4_XYZ/INPUT @@ -0,0 +1,40 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 + +noncolin 1 +#nbands 28 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 + +# DeltaSpin parameters +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +kpar 2 +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB + +pw_seed 1 diff --git a/tests/17_DS_DFTU/20_PW_DFTU_DS_S4_XYZ/KPT b/tests/17_DS_DFTU/20_PW_DFTU_DS_S4_XYZ/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/20_PW_DFTU_DS_S4_XYZ/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/20_PW_DFTU_DS_S4_XYZ/STRU b/tests/17_DS_DFTU/20_PW_DFTU_DS_S4_XYZ/STRU new file mode 100644 index 00000000000..0a9effad744 --- /dev/null +++ b/tests/17_DS_DFTU/20_PW_DFTU_DS_S4_XYZ/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 magmom 1.155 1.155 1.155 sc 1 1 1 +0.51 0.51 0.51 magmom -1.155 -1.155 -1.155 sc 1 1 1 diff --git a/tests/17_DS_DFTU/20_PW_DFTU_DS_S4_XYZ/result.ref b/tests/17_DS_DFTU/20_PW_DFTU_DS_S4_XYZ/result.ref new file mode 100644 index 00000000000..1b90c6c6183 --- /dev/null +++ b/tests/17_DS_DFTU/20_PW_DFTU_DS_S4_XYZ/result.ref @@ -0,0 +1,3 @@ +etotref -5303.0883622111805380 +etotperatomref -2651.5441811056 +totaltimeref 3.55 diff --git a/tests/17_DS_DFTU/21_PW_DFTU_DS_S4_Z/INPUT b/tests/17_DS_DFTU/21_PW_DFTU_DS_S4_Z/INPUT new file mode 100644 index 
00000000000..660e992e401 --- /dev/null +++ b/tests/17_DS_DFTU/21_PW_DFTU_DS_S4_Z/INPUT @@ -0,0 +1,39 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 +noncolin 1 +#nbands 40 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 + +# DeltaSpin parameters +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +kpar 2 +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB + +pw_seed 1 diff --git a/tests/17_DS_DFTU/21_PW_DFTU_DS_S4_Z/KPT b/tests/17_DS_DFTU/21_PW_DFTU_DS_S4_Z/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/21_PW_DFTU_DS_S4_Z/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/21_PW_DFTU_DS_S4_Z/STRU b/tests/17_DS_DFTU/21_PW_DFTU_DS_S4_Z/STRU new file mode 100644 index 00000000000..bbe4a2796fa --- /dev/null +++ b/tests/17_DS_DFTU/21_PW_DFTU_DS_S4_Z/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 sc 1 1 1 +0.51 0.51 0.51 mag -2.0 sc 1 1 1 diff --git a/tests/17_DS_DFTU/21_PW_DFTU_DS_S4_Z/result.ref b/tests/17_DS_DFTU/21_PW_DFTU_DS_S4_Z/result.ref new file mode 100644 index 00000000000..c3cbcbf5c63 --- /dev/null +++ b/tests/17_DS_DFTU/21_PW_DFTU_DS_S4_Z/result.ref @@ -0,0 +1,3 @@ +etotref -5303.0904121633711839 +etotperatomref -2651.5452060817 +totaltimeref 3.68 diff --git a/tests/17_DS_DFTU/22_PW_DFTU_DS_S4_XY/INPUT b/tests/17_DS_DFTU/22_PW_DFTU_DS_S4_XY/INPUT new file mode 100644 index 00000000000..660e992e401 --- /dev/null +++ 
b/tests/17_DS_DFTU/22_PW_DFTU_DS_S4_XY/INPUT @@ -0,0 +1,39 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 +noncolin 1 +#nbands 40 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 + +# DeltaSpin parameters +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +kpar 2 +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB + +pw_seed 1 diff --git a/tests/17_DS_DFTU/22_PW_DFTU_DS_S4_XY/KPT b/tests/17_DS_DFTU/22_PW_DFTU_DS_S4_XY/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/22_PW_DFTU_DS_S4_XY/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/22_PW_DFTU_DS_S4_XY/STRU b/tests/17_DS_DFTU/22_PW_DFTU_DS_S4_XY/STRU new file mode 100644 index 00000000000..1ffecf17384 --- /dev/null +++ b/tests/17_DS_DFTU/22_PW_DFTU_DS_S4_XY/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 magmom 2.0 0.0 0.0 sc 1 1 1 +0.51 0.51 0.51 magmom -2.0 0.0 0.0 sc 1 1 1 diff --git a/tests/17_DS_DFTU/22_PW_DFTU_DS_S4_XY/result.ref b/tests/17_DS_DFTU/22_PW_DFTU_DS_S4_XY/result.ref new file mode 100644 index 00000000000..4b6f072b9fa --- /dev/null +++ b/tests/17_DS_DFTU/22_PW_DFTU_DS_S4_XY/result.ref @@ -0,0 +1,3 @@ +etotref -5303.0869839122487974 +etotperatomref -2651.5434919561 +totaltimeref 3.89 diff --git a/tests/17_DS_DFTU/23_PW_DFTU_DS_S4_XYZ/INPUT b/tests/17_DS_DFTU/23_PW_DFTU_DS_S4_XYZ/INPUT new file mode 100644 index 00000000000..660e992e401 --- /dev/null +++ b/tests/17_DS_DFTU/23_PW_DFTU_DS_S4_XYZ/INPUT @@ 
-0,0 +1,39 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 +noncolin 1 +#nbands 40 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 + +# DeltaSpin parameters +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +kpar 2 +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB + +pw_seed 1 diff --git a/tests/17_DS_DFTU/23_PW_DFTU_DS_S4_XYZ/KPT b/tests/17_DS_DFTU/23_PW_DFTU_DS_S4_XYZ/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/23_PW_DFTU_DS_S4_XYZ/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/23_PW_DFTU_DS_S4_XYZ/STRU b/tests/17_DS_DFTU/23_PW_DFTU_DS_S4_XYZ/STRU new file mode 100644 index 00000000000..0a9effad744 --- /dev/null +++ b/tests/17_DS_DFTU/23_PW_DFTU_DS_S4_XYZ/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 magmom 1.155 1.155 1.155 sc 1 1 1 +0.51 0.51 0.51 magmom -1.155 -1.155 -1.155 sc 1 1 1 diff --git a/tests/17_DS_DFTU/23_PW_DFTU_DS_S4_XYZ/result.ref b/tests/17_DS_DFTU/23_PW_DFTU_DS_S4_XYZ/result.ref new file mode 100644 index 00000000000..ccd33af65ef --- /dev/null +++ b/tests/17_DS_DFTU/23_PW_DFTU_DS_S4_XYZ/result.ref @@ -0,0 +1,3 @@ +etotref -5303.0883622111823570 +etotperatomref -2651.5441811056 +totaltimeref 3.04 diff --git a/tests/17_DS_DFTU/24_LCAO_DS_S2_Z/INPUT b/tests/17_DS_DFTU/24_LCAO_DS_S2_Z/INPUT new file mode 100644 index 00000000000..85eeb52307a --- /dev/null +++ b/tests/17_DS_DFTU/24_LCAO_DS_S2_Z/INPUT @@ -0,0 +1,32 @@ +INPUT_PARAMETERS +suffix 
autotest +calculation scf +basis_type lcao +ecutwfc 20 +gamma_only 0 + +nspin 2 +#nbands 28 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver genelpa +symmetry 0 + + +# DeltaSpin parameters +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB diff --git a/tests/17_DS_DFTU/24_LCAO_DS_S2_Z/KPT b/tests/17_DS_DFTU/24_LCAO_DS_S2_Z/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/24_LCAO_DS_S2_Z/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/24_LCAO_DS_S2_Z/STRU b/tests/17_DS_DFTU/24_LCAO_DS_S2_Z/STRU new file mode 100644 index 00000000000..bbe4a2796fa --- /dev/null +++ b/tests/17_DS_DFTU/24_LCAO_DS_S2_Z/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 sc 1 1 1 +0.51 0.51 0.51 mag -2.0 sc 1 1 1 diff --git a/tests/17_DS_DFTU/24_LCAO_DS_S2_Z/result.ref b/tests/17_DS_DFTU/24_LCAO_DS_S2_Z/result.ref new file mode 100644 index 00000000000..590cba281a1 --- /dev/null +++ b/tests/17_DS_DFTU/24_LCAO_DS_S2_Z/result.ref @@ -0,0 +1 @@ +etotref -6777.8296487160 diff --git a/tests/17_DS_DFTU/25_LCAO_DS_S4_XY/INPUT b/tests/17_DS_DFTU/25_LCAO_DS_S4_XY/INPUT new file mode 100644 index 00000000000..6c46e513622 --- /dev/null +++ b/tests/17_DS_DFTU/25_LCAO_DS_S4_XY/INPUT @@ -0,0 +1,32 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type lcao +ecutwfc 20 +gamma_only 0 + +noncolin 1 +#nbands 28 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver genelpa +symmetry 0 + + +# DeltaSpin parameters 
+sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +#sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB diff --git a/tests/17_DS_DFTU/25_LCAO_DS_S4_XY/KPT b/tests/17_DS_DFTU/25_LCAO_DS_S4_XY/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/25_LCAO_DS_S4_XY/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/25_LCAO_DS_S4_XY/STRU b/tests/17_DS_DFTU/25_LCAO_DS_S4_XY/STRU new file mode 100644 index 00000000000..17f53a6dcde --- /dev/null +++ b/tests/17_DS_DFTU/25_LCAO_DS_S4_XY/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 magmom 2.0 0.0 0.0 sc 1 1 1 +0.51 0.51 0.51 magmom -2.0 0.0 0.0 sc 1 1 1 diff --git a/tests/17_DS_DFTU/25_LCAO_DS_S4_XY/result.ref b/tests/17_DS_DFTU/25_LCAO_DS_S4_XY/result.ref new file mode 100644 index 00000000000..ff3af4cb3f8 --- /dev/null +++ b/tests/17_DS_DFTU/25_LCAO_DS_S4_XY/result.ref @@ -0,0 +1,3 @@ +etotref -6777.644505951771 +etotperatomref -3388.8222529759 +totaltimeref 3.68 diff --git a/tests/17_DS_DFTU/26_LCAO_DS_S4_XYZ/INPUT b/tests/17_DS_DFTU/26_LCAO_DS_S4_XYZ/INPUT new file mode 100644 index 00000000000..57d29cb9d4f --- /dev/null +++ b/tests/17_DS_DFTU/26_LCAO_DS_S4_XYZ/INPUT @@ -0,0 +1,32 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type lcao +ecutwfc 20 +gamma_only 0 + +noncolin 1 +#nbands 28 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver genelpa +symmetry 0 + + +# DeltaSpin parameters +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB diff --git 
a/tests/17_DS_DFTU/26_LCAO_DS_S4_XYZ/KPT b/tests/17_DS_DFTU/26_LCAO_DS_S4_XYZ/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/26_LCAO_DS_S4_XYZ/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/26_LCAO_DS_S4_XYZ/STRU b/tests/17_DS_DFTU/26_LCAO_DS_S4_XYZ/STRU new file mode 100644 index 00000000000..a96b8d1a0e3 --- /dev/null +++ b/tests/17_DS_DFTU/26_LCAO_DS_S4_XYZ/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 magmom 1.155 1.155 1.155 +0.51 0.51 0.51 magmom -1.155 -1.155 -1.155 diff --git a/tests/17_DS_DFTU/26_LCAO_DS_S4_XYZ/result.ref b/tests/17_DS_DFTU/26_LCAO_DS_S4_XYZ/result.ref new file mode 100644 index 00000000000..e980b09cdc6 --- /dev/null +++ b/tests/17_DS_DFTU/26_LCAO_DS_S4_XYZ/result.ref @@ -0,0 +1,3 @@ +etotref -6777.82997491835 +etotperatomref -3388.9149874592 +totaltimeref 3.27 diff --git a/tests/17_DS_DFTU/27_LCAO_DS_S4_Z/INPUT b/tests/17_DS_DFTU/27_LCAO_DS_S4_Z/INPUT new file mode 100644 index 00000000000..2fed38ce6ef --- /dev/null +++ b/tests/17_DS_DFTU/27_LCAO_DS_S4_Z/INPUT @@ -0,0 +1,31 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type lcao +ecutwfc 20 +gamma_only 0 +noncolin 1 +#nbands 40 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver genelpa +symmetry 0 + + +# DeltaSpin parameters +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB diff --git a/tests/17_DS_DFTU/27_LCAO_DS_S4_Z/KPT b/tests/17_DS_DFTU/27_LCAO_DS_S4_Z/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/27_LCAO_DS_S4_Z/KPT @@ -0,0 
+1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/27_LCAO_DS_S4_Z/STRU b/tests/17_DS_DFTU/27_LCAO_DS_S4_Z/STRU new file mode 100644 index 00000000000..8535c1db16e --- /dev/null +++ b/tests/17_DS_DFTU/27_LCAO_DS_S4_Z/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 +0.51 0.51 0.51 mag -2.0 diff --git a/tests/17_DS_DFTU/27_LCAO_DS_S4_Z/result.ref b/tests/17_DS_DFTU/27_LCAO_DS_S4_Z/result.ref new file mode 100644 index 00000000000..51bd721197e --- /dev/null +++ b/tests/17_DS_DFTU/27_LCAO_DS_S4_Z/result.ref @@ -0,0 +1,3 @@ +etotref -6777.82965031416 +etotperatomref -3388.9148251571 +totaltimeref 3.00 diff --git a/tests/17_DS_DFTU/28_LCAO_DS_S4_XY/INPUT b/tests/17_DS_DFTU/28_LCAO_DS_S4_XY/INPUT new file mode 100644 index 00000000000..2fed38ce6ef --- /dev/null +++ b/tests/17_DS_DFTU/28_LCAO_DS_S4_XY/INPUT @@ -0,0 +1,31 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type lcao +ecutwfc 20 +gamma_only 0 +noncolin 1 +#nbands 40 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver genelpa +symmetry 0 + + +# DeltaSpin parameters +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB diff --git a/tests/17_DS_DFTU/28_LCAO_DS_S4_XY/KPT b/tests/17_DS_DFTU/28_LCAO_DS_S4_XY/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/28_LCAO_DS_S4_XY/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/28_LCAO_DS_S4_XY/STRU b/tests/17_DS_DFTU/28_LCAO_DS_S4_XY/STRU new file mode 100644 index 00000000000..63c4d14399c --- /dev/null +++ 
b/tests/17_DS_DFTU/28_LCAO_DS_S4_XY/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 magmom 2.0 0.0 0.0 +0.51 0.51 0.51 magmom -2.0 0.0 0.0 diff --git a/tests/17_DS_DFTU/28_LCAO_DS_S4_XY/result.ref b/tests/17_DS_DFTU/28_LCAO_DS_S4_XY/result.ref new file mode 100644 index 00000000000..824d67fc620 --- /dev/null +++ b/tests/17_DS_DFTU/28_LCAO_DS_S4_XY/result.ref @@ -0,0 +1,3 @@ +etotref -6777.829650530383 +etotperatomref -3388.9148252652 +totaltimeref 3.84 diff --git a/tests/17_DS_DFTU/29_LCAO_DS_S4_XYZ/INPUT b/tests/17_DS_DFTU/29_LCAO_DS_S4_XYZ/INPUT new file mode 100644 index 00000000000..2fed38ce6ef --- /dev/null +++ b/tests/17_DS_DFTU/29_LCAO_DS_S4_XYZ/INPUT @@ -0,0 +1,31 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type lcao +ecutwfc 20 +gamma_only 0 +noncolin 1 +#nbands 40 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver genelpa +symmetry 0 + + +# DeltaSpin parameters +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB diff --git a/tests/17_DS_DFTU/29_LCAO_DS_S4_XYZ/KPT b/tests/17_DS_DFTU/29_LCAO_DS_S4_XYZ/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/29_LCAO_DS_S4_XYZ/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/29_LCAO_DS_S4_XYZ/STRU b/tests/17_DS_DFTU/29_LCAO_DS_S4_XYZ/STRU new file mode 100644 index 00000000000..a96b8d1a0e3 --- /dev/null +++ b/tests/17_DS_DFTU/29_LCAO_DS_S4_XYZ/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 
0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 magmom 1.155 1.155 1.155 +0.51 0.51 0.51 magmom -1.155 -1.155 -1.155 diff --git a/tests/17_DS_DFTU/29_LCAO_DS_S4_XYZ/result.ref b/tests/17_DS_DFTU/29_LCAO_DS_S4_XYZ/result.ref new file mode 100644 index 00000000000..a1a600b1cf3 --- /dev/null +++ b/tests/17_DS_DFTU/29_LCAO_DS_S4_XYZ/result.ref @@ -0,0 +1,3 @@ +etotref -6777.828978144594 +etotperatomref -3388.9144890723 +totaltimeref 2.82 diff --git a/tests/17_DS_DFTU/30_LCAO_DFTU_DS_S2_Z/INPUT b/tests/17_DS_DFTU/30_LCAO_DFTU_DS_S2_Z/INPUT new file mode 100644 index 00000000000..43de2cb8422 --- /dev/null +++ b/tests/17_DS_DFTU/30_LCAO_DFTU_DS_S2_Z/INPUT @@ -0,0 +1,37 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type lcao +ecutwfc 20 +gamma_only 0 + +nspin 2 +#nbands 28 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver genelpa +symmetry 0 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 + +# DeltaSpin parameters +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB diff --git a/tests/17_DS_DFTU/30_LCAO_DFTU_DS_S2_Z/KPT b/tests/17_DS_DFTU/30_LCAO_DFTU_DS_S2_Z/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/30_LCAO_DFTU_DS_S2_Z/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/30_LCAO_DFTU_DS_S2_Z/STRU b/tests/17_DS_DFTU/30_LCAO_DFTU_DS_S2_Z/STRU new file mode 100644 index 00000000000..8535c1db16e --- /dev/null +++ b/tests/17_DS_DFTU/30_LCAO_DFTU_DS_S2_Z/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 
0.00 0.00 mag 2.0 +0.51 0.51 0.51 mag -2.0 diff --git a/tests/17_DS_DFTU/30_LCAO_DFTU_DS_S2_Z/result.ref b/tests/17_DS_DFTU/30_LCAO_DFTU_DS_S2_Z/result.ref new file mode 100644 index 00000000000..af84f7b835c --- /dev/null +++ b/tests/17_DS_DFTU/30_LCAO_DFTU_DS_S2_Z/result.ref @@ -0,0 +1 @@ +etotref -6772.1000709242498488 diff --git a/tests/17_DS_DFTU/31_LCAO_DFTU_DS_S4_XY/INPUT b/tests/17_DS_DFTU/31_LCAO_DFTU_DS_S4_XY/INPUT new file mode 100644 index 00000000000..0a703886a5c --- /dev/null +++ b/tests/17_DS_DFTU/31_LCAO_DFTU_DS_S4_XY/INPUT @@ -0,0 +1,37 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type lcao +ecutwfc 20 +gamma_only 0 + +noncolin 1 +#nbands 28 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver genelpa +symmetry 0 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 + +# DeltaSpin parameters +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB diff --git a/tests/17_DS_DFTU/31_LCAO_DFTU_DS_S4_XY/KPT b/tests/17_DS_DFTU/31_LCAO_DFTU_DS_S4_XY/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/31_LCAO_DFTU_DS_S4_XY/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/31_LCAO_DFTU_DS_S4_XY/STRU b/tests/17_DS_DFTU/31_LCAO_DFTU_DS_S4_XY/STRU new file mode 100644 index 00000000000..63c4d14399c --- /dev/null +++ b/tests/17_DS_DFTU/31_LCAO_DFTU_DS_S4_XY/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 magmom 2.0 0.0 0.0 +0.51 0.51 0.51 magmom -2.0 0.0 0.0 diff --git a/tests/17_DS_DFTU/31_LCAO_DFTU_DS_S4_XY/result.ref 
b/tests/17_DS_DFTU/31_LCAO_DFTU_DS_S4_XY/result.ref new file mode 100644 index 00000000000..cf5eb283d36 --- /dev/null +++ b/tests/17_DS_DFTU/31_LCAO_DFTU_DS_S4_XY/result.ref @@ -0,0 +1,3 @@ +etotref -6772.1005518486881556 +etotperatomref -3386.0502759243 +totaltimeref 3.38 diff --git a/tests/17_DS_DFTU/32_LCAO_DFTU_DS_S4_XYZ/INPUT b/tests/17_DS_DFTU/32_LCAO_DFTU_DS_S4_XYZ/INPUT new file mode 100644 index 00000000000..0a703886a5c --- /dev/null +++ b/tests/17_DS_DFTU/32_LCAO_DFTU_DS_S4_XYZ/INPUT @@ -0,0 +1,37 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type lcao +ecutwfc 20 +gamma_only 0 + +noncolin 1 +#nbands 28 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver genelpa +symmetry 0 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 + +# DeltaSpin parameters +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB diff --git a/tests/17_DS_DFTU/32_LCAO_DFTU_DS_S4_XYZ/KPT b/tests/17_DS_DFTU/32_LCAO_DFTU_DS_S4_XYZ/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/32_LCAO_DFTU_DS_S4_XYZ/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/32_LCAO_DFTU_DS_S4_XYZ/STRU b/tests/17_DS_DFTU/32_LCAO_DFTU_DS_S4_XYZ/STRU new file mode 100644 index 00000000000..a96b8d1a0e3 --- /dev/null +++ b/tests/17_DS_DFTU/32_LCAO_DFTU_DS_S4_XYZ/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 magmom 1.155 1.155 1.155 +0.51 0.51 0.51 magmom -1.155 -1.155 -1.155 diff --git a/tests/17_DS_DFTU/32_LCAO_DFTU_DS_S4_XYZ/result.ref 
b/tests/17_DS_DFTU/32_LCAO_DFTU_DS_S4_XYZ/result.ref new file mode 100644 index 00000000000..9af247d3b48 --- /dev/null +++ b/tests/17_DS_DFTU/32_LCAO_DFTU_DS_S4_XYZ/result.ref @@ -0,0 +1,3 @@ +etotref -6772.1005302553776346 +etotperatomref -3386.0502651277 +totaltimeref 2.86 diff --git a/tests/17_DS_DFTU/33_LCAO_DFTU_DS_S4_Z/INPUT b/tests/17_DS_DFTU/33_LCAO_DFTU_DS_S4_Z/INPUT new file mode 100644 index 00000000000..f109dff51d7 --- /dev/null +++ b/tests/17_DS_DFTU/33_LCAO_DFTU_DS_S4_Z/INPUT @@ -0,0 +1,36 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type lcao +ecutwfc 20 +gamma_only 0 +noncolin 1 +#nbands 40 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver genelpa +symmetry 0 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 + +# DeltaSpin parameters +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB diff --git a/tests/17_DS_DFTU/33_LCAO_DFTU_DS_S4_Z/KPT b/tests/17_DS_DFTU/33_LCAO_DFTU_DS_S4_Z/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/33_LCAO_DFTU_DS_S4_Z/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/33_LCAO_DFTU_DS_S4_Z/STRU b/tests/17_DS_DFTU/33_LCAO_DFTU_DS_S4_Z/STRU new file mode 100644 index 00000000000..8535c1db16e --- /dev/null +++ b/tests/17_DS_DFTU/33_LCAO_DFTU_DS_S4_Z/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 +0.51 0.51 0.51 mag -2.0 diff --git a/tests/17_DS_DFTU/33_LCAO_DFTU_DS_S4_Z/result.ref b/tests/17_DS_DFTU/33_LCAO_DFTU_DS_S4_Z/result.ref new file mode 100644 index 
00000000000..a5bcee9bce6 --- /dev/null +++ b/tests/17_DS_DFTU/33_LCAO_DFTU_DS_S4_Z/result.ref @@ -0,0 +1,3 @@ +etotref -6772.1005518486863366 +etotperatomref -3386.0502759243 +totaltimeref 3.49 diff --git a/tests/17_DS_DFTU/34_LCAO_DFTU_DS_S4_XY/INPUT b/tests/17_DS_DFTU/34_LCAO_DFTU_DS_S4_XY/INPUT new file mode 100644 index 00000000000..f109dff51d7 --- /dev/null +++ b/tests/17_DS_DFTU/34_LCAO_DFTU_DS_S4_XY/INPUT @@ -0,0 +1,36 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type lcao +ecutwfc 20 +gamma_only 0 +noncolin 1 +#nbands 40 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver genelpa +symmetry 0 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 + +# DeltaSpin parameters +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB diff --git a/tests/17_DS_DFTU/34_LCAO_DFTU_DS_S4_XY/KPT b/tests/17_DS_DFTU/34_LCAO_DFTU_DS_S4_XY/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/34_LCAO_DFTU_DS_S4_XY/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/34_LCAO_DFTU_DS_S4_XY/STRU b/tests/17_DS_DFTU/34_LCAO_DFTU_DS_S4_XY/STRU new file mode 100644 index 00000000000..63c4d14399c --- /dev/null +++ b/tests/17_DS_DFTU/34_LCAO_DFTU_DS_S4_XY/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 magmom 2.0 0.0 0.0 +0.51 0.51 0.51 magmom -2.0 0.0 0.0 diff --git a/tests/17_DS_DFTU/34_LCAO_DFTU_DS_S4_XY/result.ref b/tests/17_DS_DFTU/34_LCAO_DFTU_DS_S4_XY/result.ref new file mode 100644 index 00000000000..8d5b8b517fc --- /dev/null +++ 
b/tests/17_DS_DFTU/34_LCAO_DFTU_DS_S4_XY/result.ref @@ -0,0 +1,3 @@ +etotref -6772.1005518486863366 +etotperatomref -3386.0502759243 +totaltimeref 3.64 diff --git a/tests/17_DS_DFTU/35_LCAO_DFTU_DS_S4_XYZ/INPUT b/tests/17_DS_DFTU/35_LCAO_DFTU_DS_S4_XYZ/INPUT new file mode 100644 index 00000000000..f109dff51d7 --- /dev/null +++ b/tests/17_DS_DFTU/35_LCAO_DFTU_DS_S4_XYZ/INPUT @@ -0,0 +1,36 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type lcao +ecutwfc 20 +gamma_only 0 +noncolin 1 +#nbands 40 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver genelpa +symmetry 0 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 + +# DeltaSpin parameters +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB diff --git a/tests/17_DS_DFTU/35_LCAO_DFTU_DS_S4_XYZ/KPT b/tests/17_DS_DFTU/35_LCAO_DFTU_DS_S4_XYZ/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/35_LCAO_DFTU_DS_S4_XYZ/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/35_LCAO_DFTU_DS_S4_XYZ/STRU b/tests/17_DS_DFTU/35_LCAO_DFTU_DS_S4_XYZ/STRU new file mode 100644 index 00000000000..a96b8d1a0e3 --- /dev/null +++ b/tests/17_DS_DFTU/35_LCAO_DFTU_DS_S4_XYZ/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 magmom 1.155 1.155 1.155 +0.51 0.51 0.51 magmom -1.155 -1.155 -1.155 diff --git a/tests/17_DS_DFTU/35_LCAO_DFTU_DS_S4_XYZ/result.ref b/tests/17_DS_DFTU/35_LCAO_DFTU_DS_S4_XYZ/result.ref new file mode 100644 index 00000000000..44139de14d7 --- /dev/null +++ 
b/tests/17_DS_DFTU/35_LCAO_DFTU_DS_S4_XYZ/result.ref @@ -0,0 +1,3 @@ +etotref -6772.1005301547193085 +etotperatomref -3386.0502650774 +totaltimeref 3.13 diff --git a/tests/17_DS_DFTU/36_PW_DS_S2_ReadLam_Z/INPUT b/tests/17_DS_DFTU/36_PW_DS_S2_ReadLam_Z/INPUT new file mode 100644 index 00000000000..04f7faa4798 --- /dev/null +++ b/tests/17_DS_DFTU/36_PW_DS_S2_ReadLam_Z/INPUT @@ -0,0 +1,34 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 + +noncolin 0 +nspin 2 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 +kpar 1 + +# DeltaSpin parameters (nsc=1: read lambda only, no iterative optimization) +sc_mag_switch 1 +sc_thr 1e-4 +nsc 1 +nsc_min 1 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB + +pw_seed 1 diff --git a/tests/17_DS_DFTU/36_PW_DS_S2_ReadLam_Z/KPT b/tests/17_DS_DFTU/36_PW_DS_S2_ReadLam_Z/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/36_PW_DS_S2_ReadLam_Z/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/36_PW_DS_S2_ReadLam_Z/STRU b/tests/17_DS_DFTU/36_PW_DS_S2_ReadLam_Z/STRU new file mode 100644 index 00000000000..115ded29104 --- /dev/null +++ b/tests/17_DS_DFTU/36_PW_DS_S2_ReadLam_Z/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 0.0 0.0 sc 1 1 1 +0.51 0.51 0.51 mag -2.0 0.0 0.0 sc 1 1 1 diff --git a/tests/17_DS_DFTU/36_PW_DS_S2_ReadLam_Z/result.ref b/tests/17_DS_DFTU/36_PW_DS_S2_ReadLam_Z/result.ref new file mode 100644 index 00000000000..605c3ad5edb --- /dev/null +++ b/tests/17_DS_DFTU/36_PW_DS_S2_ReadLam_Z/result.ref @@ -0,0 +1,3 @@ +etotref -5333.69240202835 
+etotperatomref -2666.8462010142 +totaltimeref 5.94 diff --git a/tests/17_DS_DFTU/37_PW_DS_S4_ReadLam_XY/INPUT b/tests/17_DS_DFTU/37_PW_DS_S4_ReadLam_XY/INPUT new file mode 100644 index 00000000000..0dff4f4d4a1 --- /dev/null +++ b/tests/17_DS_DFTU/37_PW_DS_S4_ReadLam_XY/INPUT @@ -0,0 +1,33 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 + +noncolin 1 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 +kpar 2 + +# DeltaSpin parameters: nsc=1 reads lambda only, no iterative optimization +sc_mag_switch 1 +sc_thr 1e-4 +nsc 1 +nsc_min 1 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-3 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB + +pw_seed 1 diff --git a/tests/17_DS_DFTU/37_PW_DS_S4_ReadLam_XY/KPT b/tests/17_DS_DFTU/37_PW_DS_S4_ReadLam_XY/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/37_PW_DS_S4_ReadLam_XY/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/37_PW_DS_S4_ReadLam_XY/STRU b/tests/17_DS_DFTU/37_PW_DS_S4_ReadLam_XY/STRU new file mode 100644 index 00000000000..115ded29104 --- /dev/null +++ b/tests/17_DS_DFTU/37_PW_DS_S4_ReadLam_XY/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 0.0 0.0 sc 1 1 1 +0.51 0.51 0.51 mag -2.0 0.0 0.0 sc 1 1 1 diff --git a/tests/17_DS_DFTU/37_PW_DS_S4_ReadLam_XY/result.ref b/tests/17_DS_DFTU/37_PW_DS_S4_ReadLam_XY/result.ref new file mode 100644 index 00000000000..23f1a5689c5 --- /dev/null +++ b/tests/17_DS_DFTU/37_PW_DS_S4_ReadLam_XY/result.ref @@ -0,0 +1,3 @@ +etotref -5335.077393619672 +etotperatomref -2667.5386968098 +totaltimeref 3.13 diff --git 
a/tests/17_DS_DFTU/38_PW_DS_S2_Thr1e10_Z/INPUT b/tests/17_DS_DFTU/38_PW_DS_S2_Thr1e10_Z/INPUT new file mode 100644 index 00000000000..53a01dc7279 --- /dev/null +++ b/tests/17_DS_DFTU/38_PW_DS_S2_Thr1e10_Z/INPUT @@ -0,0 +1,34 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 + +noncolin 0 +nspin 2 +scf_thr 1.0e-6 +scf_nmax 100 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 +kpar 1 + +# DeltaSpin: very tight threshold +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-10 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB + +pw_seed 1 diff --git a/tests/17_DS_DFTU/38_PW_DS_S2_Thr1e10_Z/KPT b/tests/17_DS_DFTU/38_PW_DS_S2_Thr1e10_Z/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/38_PW_DS_S2_Thr1e10_Z/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/38_PW_DS_S2_Thr1e10_Z/STRU b/tests/17_DS_DFTU/38_PW_DS_S2_Thr1e10_Z/STRU new file mode 100644 index 00000000000..b43039501d3 --- /dev/null +++ b/tests/17_DS_DFTU/38_PW_DS_S2_Thr1e10_Z/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 0.0 0.0 +0.51 0.51 0.51 mag -2.0 0.0 0.0 diff --git a/tests/17_DS_DFTU/38_PW_DS_S2_Thr1e10_Z/result.ref b/tests/17_DS_DFTU/38_PW_DS_S2_Thr1e10_Z/result.ref new file mode 100644 index 00000000000..58e32cd1c0d --- /dev/null +++ b/tests/17_DS_DFTU/38_PW_DS_S2_Thr1e10_Z/result.ref @@ -0,0 +1,3 @@ +etotref -6368.964006945744 +etotperatomref -3184.4820034729 +totaltimeref 6.33 diff --git a/tests/17_DS_DFTU/39_PW_DS_S4_Thr1e10_XY/INPUT b/tests/17_DS_DFTU/39_PW_DS_S4_Thr1e10_XY/INPUT new file mode 100644 index 
00000000000..e9fb27212f5 --- /dev/null +++ b/tests/17_DS_DFTU/39_PW_DS_S4_Thr1e10_XY/INPUT @@ -0,0 +1,33 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 + +noncolin 1 +scf_thr 1.0e-6 +scf_nmax 100 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 +kpar 2 + +# DeltaSpin: very tight threshold +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-10 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB + +pw_seed 1 diff --git a/tests/17_DS_DFTU/39_PW_DS_S4_Thr1e10_XY/KPT b/tests/17_DS_DFTU/39_PW_DS_S4_Thr1e10_XY/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/39_PW_DS_S4_Thr1e10_XY/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/39_PW_DS_S4_Thr1e10_XY/STRU b/tests/17_DS_DFTU/39_PW_DS_S4_Thr1e10_XY/STRU new file mode 100644 index 00000000000..b43039501d3 --- /dev/null +++ b/tests/17_DS_DFTU/39_PW_DS_S4_Thr1e10_XY/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 0.0 0.0 +0.51 0.51 0.51 mag -2.0 0.0 0.0 diff --git a/tests/17_DS_DFTU/39_PW_DS_S4_Thr1e10_XY/result.ref b/tests/17_DS_DFTU/39_PW_DS_S4_Thr1e10_XY/result.ref new file mode 100644 index 00000000000..8507c130334 --- /dev/null +++ b/tests/17_DS_DFTU/39_PW_DS_S4_Thr1e10_XY/result.ref @@ -0,0 +1,3 @@ +etotref -6370.632169015102 +etotperatomref -3185.3160845076 +totaltimeref 3.72 diff --git a/tests/17_DS_DFTU/40_PW_DS_S2_Thr10_Z/INPUT b/tests/17_DS_DFTU/40_PW_DS_S2_Thr10_Z/INPUT new file mode 100644 index 00000000000..16730d9141b --- /dev/null +++ b/tests/17_DS_DFTU/40_PW_DS_S2_Thr10_Z/INPUT @@ -0,0 +1,35 @@ +INPUT_PARAMETERS +suffix autotest 
+calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 + +noncolin 0 +nspin 2 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +out_alllog 1 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 +kpar 1 + +# DeltaSpin: very loose threshold +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 10 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB + +pw_seed 1 diff --git a/tests/17_DS_DFTU/40_PW_DS_S2_Thr10_Z/KPT b/tests/17_DS_DFTU/40_PW_DS_S2_Thr10_Z/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/40_PW_DS_S2_Thr10_Z/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/40_PW_DS_S2_Thr10_Z/STRU b/tests/17_DS_DFTU/40_PW_DS_S2_Thr10_Z/STRU new file mode 100644 index 00000000000..b43039501d3 --- /dev/null +++ b/tests/17_DS_DFTU/40_PW_DS_S2_Thr10_Z/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 0.0 0.0 +0.51 0.51 0.51 mag -2.0 0.0 0.0 diff --git a/tests/17_DS_DFTU/40_PW_DS_S2_Thr10_Z/result.ref b/tests/17_DS_DFTU/40_PW_DS_S2_Thr10_Z/result.ref new file mode 100644 index 00000000000..0653ad07faa --- /dev/null +++ b/tests/17_DS_DFTU/40_PW_DS_S2_Thr10_Z/result.ref @@ -0,0 +1 @@ +etotref !FINAL_ETOT_IS diff --git a/tests/17_DS_DFTU/41_PW_DS_S4_Thr10_XY/INPUT b/tests/17_DS_DFTU/41_PW_DS_S4_Thr10_XY/INPUT new file mode 100644 index 00000000000..39892653cbd --- /dev/null +++ b/tests/17_DS_DFTU/41_PW_DS_S4_Thr10_XY/INPUT @@ -0,0 +1,33 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 + +noncolin 1 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver 
dav_subspace +symmetry 0 +kpar 2 + +# DeltaSpin: very loose threshold +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 0.1 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB + +pw_seed 1 diff --git a/tests/17_DS_DFTU/41_PW_DS_S4_Thr10_XY/KPT b/tests/17_DS_DFTU/41_PW_DS_S4_Thr10_XY/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/41_PW_DS_S4_Thr10_XY/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/41_PW_DS_S4_Thr10_XY/STRU b/tests/17_DS_DFTU/41_PW_DS_S4_Thr10_XY/STRU new file mode 100644 index 00000000000..115ded29104 --- /dev/null +++ b/tests/17_DS_DFTU/41_PW_DS_S4_Thr10_XY/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 0.0 0.0 sc 1 1 1 +0.51 0.51 0.51 mag -2.0 0.0 0.0 sc 1 1 1 diff --git a/tests/17_DS_DFTU/41_PW_DS_S4_Thr10_XY/result.ref b/tests/17_DS_DFTU/41_PW_DS_S4_Thr10_XY/result.ref new file mode 100644 index 00000000000..68879ee0794 --- /dev/null +++ b/tests/17_DS_DFTU/41_PW_DS_S4_Thr10_XY/result.ref @@ -0,0 +1,3 @@ +etotref -5311.338010287786 +etotperatomref -2655.6690051439 +totaltimeref 1.80 diff --git a/tests/17_DS_DFTU/42_PW_DFTU_DS_S2_Thr1e10_Z/INPUT b/tests/17_DS_DFTU/42_PW_DFTU_DS_S2_Thr1e10_Z/INPUT new file mode 100644 index 00000000000..5406399e22b --- /dev/null +++ b/tests/17_DS_DFTU/42_PW_DFTU_DS_S2_Thr1e10_Z/INPUT @@ -0,0 +1,40 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 + +noncolin 0 +nspin 2 +scf_thr 1.0e-6 +scf_nmax 100 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 +kpar 1 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 + 
+# DeltaSpin: very tight threshold +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-10 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB + +pw_seed 1 diff --git a/tests/17_DS_DFTU/42_PW_DFTU_DS_S2_Thr1e10_Z/KPT b/tests/17_DS_DFTU/42_PW_DFTU_DS_S2_Thr1e10_Z/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/42_PW_DFTU_DS_S2_Thr1e10_Z/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/42_PW_DFTU_DS_S2_Thr1e10_Z/STRU b/tests/17_DS_DFTU/42_PW_DFTU_DS_S2_Thr1e10_Z/STRU new file mode 100644 index 00000000000..b43039501d3 --- /dev/null +++ b/tests/17_DS_DFTU/42_PW_DFTU_DS_S2_Thr1e10_Z/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 0.0 0.0 +0.51 0.51 0.51 mag -2.0 0.0 0.0 diff --git a/tests/17_DS_DFTU/42_PW_DFTU_DS_S2_Thr1e10_Z/result.ref b/tests/17_DS_DFTU/42_PW_DFTU_DS_S2_Thr1e10_Z/result.ref new file mode 100644 index 00000000000..59715c4ba18 --- /dev/null +++ b/tests/17_DS_DFTU/42_PW_DFTU_DS_S2_Thr1e10_Z/result.ref @@ -0,0 +1 @@ +etotref -6363.8892809126737120 diff --git a/tests/17_DS_DFTU/43_PW_DFTU_DS_S4_Thr1e10_XY/INPUT b/tests/17_DS_DFTU/43_PW_DFTU_DS_S4_Thr1e10_XY/INPUT new file mode 100644 index 00000000000..70c55514c38 --- /dev/null +++ b/tests/17_DS_DFTU/43_PW_DFTU_DS_S4_Thr1e10_XY/INPUT @@ -0,0 +1,39 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 + +noncolin 1 +scf_thr 1.0e-6 +scf_nmax 100 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 +kpar 2 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 + +# DeltaSpin: very tight threshold +sc_mag_switch 1 +sc_thr 1e-4 +nsc 
100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-10 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB + +pw_seed 1 diff --git a/tests/17_DS_DFTU/43_PW_DFTU_DS_S4_Thr1e10_XY/KPT b/tests/17_DS_DFTU/43_PW_DFTU_DS_S4_Thr1e10_XY/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/43_PW_DFTU_DS_S4_Thr1e10_XY/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/43_PW_DFTU_DS_S4_Thr1e10_XY/STRU b/tests/17_DS_DFTU/43_PW_DFTU_DS_S4_Thr1e10_XY/STRU new file mode 100644 index 00000000000..b43039501d3 --- /dev/null +++ b/tests/17_DS_DFTU/43_PW_DFTU_DS_S4_Thr1e10_XY/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 0.0 0.0 +0.51 0.51 0.51 mag -2.0 0.0 0.0 diff --git a/tests/17_DS_DFTU/43_PW_DFTU_DS_S4_Thr1e10_XY/result.ref b/tests/17_DS_DFTU/43_PW_DFTU_DS_S4_Thr1e10_XY/result.ref new file mode 100644 index 00000000000..7621ad1d7cb --- /dev/null +++ b/tests/17_DS_DFTU/43_PW_DFTU_DS_S4_Thr1e10_XY/result.ref @@ -0,0 +1,3 @@ +etotref -6348.2271462104718012 +etotperatomref -3174.1135731052 +totaltimeref 5.09 diff --git a/tests/17_DS_DFTU/44_PW_DFTU_DS_S2_Thr10_Z/INPUT b/tests/17_DS_DFTU/44_PW_DFTU_DS_S2_Thr10_Z/INPUT new file mode 100644 index 00000000000..6a8252d8b28 --- /dev/null +++ b/tests/17_DS_DFTU/44_PW_DFTU_DS_S2_Thr10_Z/INPUT @@ -0,0 +1,40 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 + +noncolin 0 +nspin 2 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 +kpar 1 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 + +# DeltaSpin: very loose threshold +sc_mag_switch 1 +sc_thr 
1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 10 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB + +pw_seed 1 diff --git a/tests/17_DS_DFTU/44_PW_DFTU_DS_S2_Thr10_Z/KPT b/tests/17_DS_DFTU/44_PW_DFTU_DS_S2_Thr10_Z/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/44_PW_DFTU_DS_S2_Thr10_Z/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/44_PW_DFTU_DS_S2_Thr10_Z/STRU b/tests/17_DS_DFTU/44_PW_DFTU_DS_S2_Thr10_Z/STRU new file mode 100644 index 00000000000..115ded29104 --- /dev/null +++ b/tests/17_DS_DFTU/44_PW_DFTU_DS_S2_Thr10_Z/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 0.0 0.0 sc 1 1 1 +0.51 0.51 0.51 mag -2.0 0.0 0.0 sc 1 1 1 diff --git a/tests/17_DS_DFTU/44_PW_DFTU_DS_S2_Thr10_Z/result.ref b/tests/17_DS_DFTU/44_PW_DFTU_DS_S2_Thr10_Z/result.ref new file mode 100644 index 00000000000..1fe0498f71d --- /dev/null +++ b/tests/17_DS_DFTU/44_PW_DFTU_DS_S2_Thr10_Z/result.ref @@ -0,0 +1 @@ +etotref -5273.5080205169788314 diff --git a/tests/17_DS_DFTU/45_PW_DFTU_DS_S4_Thr10_XY/INPUT b/tests/17_DS_DFTU/45_PW_DFTU_DS_S4_Thr10_XY/INPUT new file mode 100644 index 00000000000..5e7e3f0b673 --- /dev/null +++ b/tests/17_DS_DFTU/45_PW_DFTU_DS_S4_Thr10_XY/INPUT @@ -0,0 +1,39 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 + +noncolin 1 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 +kpar 2 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 + +# DeltaSpin: very loose threshold +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 
3.0 +sc_scf_thr 0.1 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB + +pw_seed 1 diff --git a/tests/17_DS_DFTU/45_PW_DFTU_DS_S4_Thr10_XY/KPT b/tests/17_DS_DFTU/45_PW_DFTU_DS_S4_Thr10_XY/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/45_PW_DFTU_DS_S4_Thr10_XY/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/45_PW_DFTU_DS_S4_Thr10_XY/STRU b/tests/17_DS_DFTU/45_PW_DFTU_DS_S4_Thr10_XY/STRU new file mode 100644 index 00000000000..115ded29104 --- /dev/null +++ b/tests/17_DS_DFTU/45_PW_DFTU_DS_S4_Thr10_XY/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 0.0 0.0 sc 1 1 1 +0.51 0.51 0.51 mag -2.0 0.0 0.0 sc 1 1 1 diff --git a/tests/17_DS_DFTU/45_PW_DFTU_DS_S4_Thr10_XY/result.ref b/tests/17_DS_DFTU/45_PW_DFTU_DS_S4_Thr10_XY/result.ref new file mode 100644 index 00000000000..a69048230e3 --- /dev/null +++ b/tests/17_DS_DFTU/45_PW_DFTU_DS_S4_Thr10_XY/result.ref @@ -0,0 +1,3 @@ +etotref -5290.6699254076938814 +etotperatomref -2645.3349627038 +totaltimeref 2.40 diff --git a/tests/17_DS_DFTU/46_PW_DS_S2_Thr1e10_Z_bfgs/INPUT b/tests/17_DS_DFTU/46_PW_DS_S2_Thr1e10_Z_bfgs/INPUT new file mode 100644 index 00000000000..806fccb319f --- /dev/null +++ b/tests/17_DS_DFTU/46_PW_DS_S2_Thr1e10_Z_bfgs/INPUT @@ -0,0 +1,35 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 + +noncolin 0 +nspin 2 +scf_thr 1.0e-6 +scf_nmax 100 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 +kpar 1 + +# DeltaSpin: bfgs strategy + very tight threshold +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-10 +sc_lambda_strategy bfgs + +pseudo_dir 
../../PP_ORB +orbital_dir ../../PP_ORB + +pw_seed 1 diff --git a/tests/17_DS_DFTU/46_PW_DS_S2_Thr1e10_Z_bfgs/KPT b/tests/17_DS_DFTU/46_PW_DS_S2_Thr1e10_Z_bfgs/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/46_PW_DS_S2_Thr1e10_Z_bfgs/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/46_PW_DS_S2_Thr1e10_Z_bfgs/STRU b/tests/17_DS_DFTU/46_PW_DS_S2_Thr1e10_Z_bfgs/STRU new file mode 100644 index 00000000000..115ded29104 --- /dev/null +++ b/tests/17_DS_DFTU/46_PW_DS_S2_Thr1e10_Z_bfgs/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 0.0 0.0 sc 1 1 1 +0.51 0.51 0.51 mag -2.0 0.0 0.0 sc 1 1 1 diff --git a/tests/17_DS_DFTU/46_PW_DS_S2_Thr1e10_Z_bfgs/result.ref b/tests/17_DS_DFTU/46_PW_DS_S2_Thr1e10_Z_bfgs/result.ref new file mode 100644 index 00000000000..4e15f76f389 --- /dev/null +++ b/tests/17_DS_DFTU/46_PW_DS_S2_Thr1e10_Z_bfgs/result.ref @@ -0,0 +1 @@ +etotref -6368.964006945507 diff --git a/tests/17_DS_DFTU/47_PW_DS_S4_Thr1e10_XY_bfgs/INPUT b/tests/17_DS_DFTU/47_PW_DS_S4_Thr1e10_XY_bfgs/INPUT new file mode 100644 index 00000000000..94b4cce50c7 --- /dev/null +++ b/tests/17_DS_DFTU/47_PW_DS_S4_Thr1e10_XY_bfgs/INPUT @@ -0,0 +1,34 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 + +noncolin 1 +scf_thr 1.0e-6 +scf_nmax 100 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 +kpar 2 + +# DeltaSpin: bfgs strategy + very tight threshold +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 1e-10 +sc_lambda_strategy bfgs + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB + +pw_seed 1 diff --git 
a/tests/17_DS_DFTU/47_PW_DS_S4_Thr1e10_XY_bfgs/KPT b/tests/17_DS_DFTU/47_PW_DS_S4_Thr1e10_XY_bfgs/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/47_PW_DS_S4_Thr1e10_XY_bfgs/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/47_PW_DS_S4_Thr1e10_XY_bfgs/STRU b/tests/17_DS_DFTU/47_PW_DS_S4_Thr1e10_XY_bfgs/STRU new file mode 100644 index 00000000000..115ded29104 --- /dev/null +++ b/tests/17_DS_DFTU/47_PW_DS_S4_Thr1e10_XY_bfgs/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 0.0 0.0 sc 1 1 1 +0.51 0.51 0.51 mag -2.0 0.0 0.0 sc 1 1 1 diff --git a/tests/17_DS_DFTU/47_PW_DS_S4_Thr1e10_XY_bfgs/result.ref b/tests/17_DS_DFTU/47_PW_DS_S4_Thr1e10_XY_bfgs/result.ref new file mode 100644 index 00000000000..b7f6b6f201c --- /dev/null +++ b/tests/17_DS_DFTU/47_PW_DS_S4_Thr1e10_XY_bfgs/result.ref @@ -0,0 +1 @@ +etotref -6370.632169014631 diff --git a/tests/17_DS_DFTU/48_PW_DFTU_DS_S2_Thr10_Z_bfgs/INPUT b/tests/17_DS_DFTU/48_PW_DFTU_DS_S2_Thr10_Z_bfgs/INPUT new file mode 100644 index 00000000000..b3755c89aec --- /dev/null +++ b/tests/17_DS_DFTU/48_PW_DFTU_DS_S2_Thr10_Z_bfgs/INPUT @@ -0,0 +1,41 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 + +noncolin 0 +nspin 2 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 +kpar 1 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 + +# DeltaSpin: bfgs strategy + very loose threshold +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 10 +sc_lambda_strategy bfgs + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB + 
+pw_seed 1 diff --git a/tests/17_DS_DFTU/48_PW_DFTU_DS_S2_Thr10_Z_bfgs/KPT b/tests/17_DS_DFTU/48_PW_DFTU_DS_S2_Thr10_Z_bfgs/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/48_PW_DFTU_DS_S2_Thr10_Z_bfgs/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/48_PW_DFTU_DS_S2_Thr10_Z_bfgs/STRU b/tests/17_DS_DFTU/48_PW_DFTU_DS_S2_Thr10_Z_bfgs/STRU new file mode 100644 index 00000000000..56de4bfea7c --- /dev/null +++ b/tests/17_DS_DFTU/48_PW_DFTU_DS_S2_Thr10_Z_bfgs/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 sc 1 1 1 +0.51 0.51 0.51 mag -2.0 sc 1 1 1 diff --git a/tests/17_DS_DFTU/48_PW_DFTU_DS_S2_Thr10_Z_bfgs/result.ref b/tests/17_DS_DFTU/48_PW_DFTU_DS_S2_Thr10_Z_bfgs/result.ref new file mode 100644 index 00000000000..cfedd7664b7 --- /dev/null +++ b/tests/17_DS_DFTU/48_PW_DFTU_DS_S2_Thr10_Z_bfgs/result.ref @@ -0,0 +1 @@ +etotref -5264.8465689110780659 diff --git a/tests/17_DS_DFTU/49_PW_DFTU_DS_S4_Thr10_XY_bfgs/INPUT b/tests/17_DS_DFTU/49_PW_DFTU_DS_S4_Thr10_XY_bfgs/INPUT new file mode 100644 index 00000000000..60c666bb7f3 --- /dev/null +++ b/tests/17_DS_DFTU/49_PW_DFTU_DS_S4_Thr10_XY_bfgs/INPUT @@ -0,0 +1,40 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 0 + +noncolin 1 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 +kpar 2 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 + +# DeltaSpin: bfgs strategy + very loose threshold +sc_mag_switch 1 +sc_thr 1e-4 +nsc 100 +nsc_min 2 +sc_scf_nmin 2 +alpha_trial 0.01 +sccut 3.0 +sc_scf_thr 0.1 +sc_lambda_strategy bfgs + +pseudo_dir ../../PP_ORB +orbital_dir 
../../PP_ORB + +pw_seed 1 diff --git a/tests/17_DS_DFTU/49_PW_DFTU_DS_S4_Thr10_XY_bfgs/KPT b/tests/17_DS_DFTU/49_PW_DFTU_DS_S4_Thr10_XY_bfgs/KPT new file mode 100644 index 00000000000..35597cecff1 --- /dev/null +++ b/tests/17_DS_DFTU/49_PW_DFTU_DS_S4_Thr10_XY_bfgs/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Monkhorst-Pack +2 2 2 0 0 0 diff --git a/tests/17_DS_DFTU/49_PW_DFTU_DS_S4_Thr10_XY_bfgs/STRU b/tests/17_DS_DFTU/49_PW_DFTU_DS_S4_Thr10_XY_bfgs/STRU new file mode 100644 index 00000000000..56de4bfea7c --- /dev/null +++ b/tests/17_DS_DFTU/49_PW_DFTU_DS_S4_Thr10_XY_bfgs/STRU @@ -0,0 +1,21 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 2.0 sc 1 1 1 +0.51 0.51 0.51 mag -2.0 sc 1 1 1 diff --git a/tests/17_DS_DFTU/49_PW_DFTU_DS_S4_Thr10_XY_bfgs/result.ref b/tests/17_DS_DFTU/49_PW_DFTU_DS_S4_Thr10_XY_bfgs/result.ref new file mode 100644 index 00000000000..dc03e5c2ec5 --- /dev/null +++ b/tests/17_DS_DFTU/49_PW_DFTU_DS_S4_Thr10_XY_bfgs/result.ref @@ -0,0 +1 @@ +etotref -5290.6583389350662401 diff --git a/tests/17_DS_DFTU/50_FeO_O_first_Fe_second/INPUT b/tests/17_DS_DFTU/50_FeO_O_first_Fe_second/INPUT new file mode 100644 index 00000000000..3eb6a75c3eb --- /dev/null +++ b/tests/17_DS_DFTU/50_FeO_O_first_Fe_second/INPUT @@ -0,0 +1,28 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 1 + +nspin 2 +#nbands 40 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr -1 2 +hubbard_u 0 5.0 +onsite_radius 3.0 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB +pw_seed 1 diff --git a/tests/17_DS_DFTU/50_FeO_O_first_Fe_second/KPT b/tests/17_DS_DFTU/50_FeO_O_first_Fe_second/KPT new file mode 100644 
index 00000000000..c289c0158aa --- /dev/null +++ b/tests/17_DS_DFTU/50_FeO_O_first_Fe_second/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Gamma +1 1 1 0 0 0 diff --git a/tests/17_DS_DFTU/50_FeO_O_first_Fe_second/STRU b/tests/17_DS_DFTU/50_FeO_O_first_Fe_second/STRU new file mode 100644 index 00000000000..cdfe9c1b756 --- /dev/null +++ b/tests/17_DS_DFTU/50_FeO_O_first_Fe_second/STRU @@ -0,0 +1,27 @@ +ATOMIC_SPECIES +O 1.000 O.upf +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +8_O_gga_100Ry_7au_2s2p1d.orb +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +1 +0.00 0.00 0.00 mag 2.0 + +O +0.0 +1 +0.50 0.50 0.50 diff --git a/tests/17_DS_DFTU/50_FeO_O_first_Fe_second/result.ref b/tests/17_DS_DFTU/50_FeO_O_first_Fe_second/result.ref new file mode 100644 index 00000000000..1fe64bf4833 --- /dev/null +++ b/tests/17_DS_DFTU/50_FeO_O_first_Fe_second/result.ref @@ -0,0 +1,3 @@ +etotref -3579.9923209019589194 +etotperatomref -1789.9961604510 +totaltimeref 2.85 diff --git a/tests/17_DS_DFTU/51_FeO_Fe_first_O_second/INPUT b/tests/17_DS_DFTU/51_FeO_Fe_first_O_second/INPUT new file mode 100644 index 00000000000..fe8ed81b5af --- /dev/null +++ b/tests/17_DS_DFTU/51_FeO_Fe_first_O_second/INPUT @@ -0,0 +1,28 @@ +INPUT_PARAMETERS +suffix autotest +calculation scf +basis_type pw +ecutwfc 20 +gamma_only 1 + +nspin 2 +#nbands 40 +scf_thr 1.0e-6 +scf_nmax 50 +out_chg 0 +smearing_method gaussian +smearing_sigma 0.01 +mixing_type broyden +mixing_beta 0.4 +ks_solver dav_subspace +symmetry 0 + +# DFT+U parameters +dft_plus_u 1 +orbital_corr 2 0 +hubbard_u 5.0 0 +onsite_radius 3.0 + +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB +pw_seed 1 diff --git a/tests/17_DS_DFTU/51_FeO_Fe_first_O_second/KPT b/tests/17_DS_DFTU/51_FeO_Fe_first_O_second/KPT new file mode 100644 index 00000000000..c289c0158aa --- /dev/null +++ b/tests/17_DS_DFTU/51_FeO_Fe_first_O_second/KPT @@ -0,0 +1,4 @@ +K_POINTS +0 +Gamma +1 
1 1 0 0 0 diff --git a/tests/17_DS_DFTU/51_FeO_Fe_first_O_second/STRU b/tests/17_DS_DFTU/51_FeO_Fe_first_O_second/STRU new file mode 100644 index 00000000000..aa9dd6f44d8 --- /dev/null +++ b/tests/17_DS_DFTU/51_FeO_Fe_first_O_second/STRU @@ -0,0 +1,27 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf +O 1.000 O.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb +8_O_gga_100Ry_7au_2s2p1d.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +1 +0.00 0.00 0.00 mag 2.0 + +O +0.0 +1 +0.50 0.50 0.50 diff --git a/tests/17_DS_DFTU/51_FeO_Fe_first_O_second/result.ref b/tests/17_DS_DFTU/51_FeO_Fe_first_O_second/result.ref new file mode 100644 index 00000000000..1fe64bf4833 --- /dev/null +++ b/tests/17_DS_DFTU/51_FeO_Fe_first_O_second/result.ref @@ -0,0 +1,3 @@ +etotref -3579.9923209019589194 +etotperatomref -1789.9961604510 +totaltimeref 2.85 diff --git a/tests/17_DS_DFTU/52_PW_DFTU_SO/INPUT b/tests/17_DS_DFTU/52_PW_DFTU_SO/INPUT new file mode 100644 index 00000000000..3b28a05ff45 --- /dev/null +++ b/tests/17_DS_DFTU/52_PW_DFTU_SO/INPUT @@ -0,0 +1,47 @@ +INPUT_PARAMETERS +suffix autotest +nbands 40 + +calculation scf +ecutwfc 10 +scf_thr 1.0e-4 +scf_nmax 50 +out_chg 0 + +#init_chg file +#out_dos 1 +#dos_sigma 0.05 +#out_band 1 + +smearing_method gaussian +smearing_sigma 0.01 + +#force_thr_ev 0.01 +#relax_method cg +#relax_bfgs_init 0.5 + +mixing_type pulay +mixing_beta 0.3 +mixing_restart 1e-3 +mixing_dmr 1 +mixing_gg0 1.1 + +ks_solver dav_subspace +diago_smooth_ethr true +pw_diag_ndim 2 +basis_type pw +gamma_only 0 +noncolin 1 +lspinorb 1 +cal_force 1 +cal_stress 1 + +#Parameter DFT+U +dft_plus_u 1 +orbital_corr 2 +hubbard_u 5.0 +onsite_radius 3.0 +pseudo_dir ../../PP_ORB +orbital_dir ../../PP_ORB + +pw_seed 1 diff --git a/tests/17_DS_DFTU/52_PW_DFTU_SO/KPT b/tests/17_DS_DFTU/52_PW_DFTU_SO/KPT new file mode 100644 index 00000000000..e769af76382 --- /dev/null +++ b/tests/17_DS_DFTU/52_PW_DFTU_SO/KPT 
@@ -0,0 +1,4 @@ +K_POINTS +0 +Gamma +2 1 1 0 0 0 diff --git a/tests/17_DS_DFTU/52_PW_DFTU_SO/STRU b/tests/17_DS_DFTU/52_PW_DFTU_SO/STRU new file mode 100644 index 00000000000..91021e0a697 --- /dev/null +++ b/tests/17_DS_DFTU/52_PW_DFTU_SO/STRU @@ -0,0 +1,22 @@ +ATOMIC_SPECIES +Fe 1.000 Fe.upf + +NUMERICAL_ORBITAL +Fe_gga_6au_100Ry_4s2p2d1f.orb + +LATTICE_CONSTANT +8.190 + +LATTICE_VECTORS + 1.00 0.50 0.50 + 0.50 1.00 0.50 + 0.50 0.50 1.00 +ATOMIC_POSITIONS +Direct + +Fe +0.0 +2 +0.00 0.00 0.00 mag 1.0 1.0 1.0 +0.51 0.51 0.51 mag 1.0 1.0 1.0 + diff --git a/tests/17_DS_DFTU/52_PW_DFTU_SO/result.ref b/tests/17_DS_DFTU/52_PW_DFTU_SO/result.ref new file mode 100644 index 00000000000..664b01025bf --- /dev/null +++ b/tests/17_DS_DFTU/52_PW_DFTU_SO/result.ref @@ -0,0 +1 @@ +etotref -5662.3908859906650832 diff --git a/tests/17_DS_DFTU/CASES_CPU.txt b/tests/17_DS_DFTU/CASES_CPU.txt new file mode 100644 index 00000000000..16eb044be99 --- /dev/null +++ b/tests/17_DS_DFTU/CASES_CPU.txt @@ -0,0 +1,36 @@ +06_PW_SPIN_S2_Z +07_PW_SPIN_S4_XYZ +08_PW_DFTU_S2_Z +09_PW_DFTU_S4_XY +11_PW_DFTU_S2_FeO +12_PW_DS_S2_Z +13_PW_DS_S4_XY +14_PW_DS_S4_XYZ +15_PW_DS_S4_Z +16_PW_DS_S4_XY +17_PW_DS_S4_XYZ +18_PW_DFTU_DS_S2_Z +19_PW_DFTU_DS_S4_XY +20_PW_DFTU_DS_S4_XYZ +21_PW_DFTU_DS_S4_Z +22_PW_DFTU_DS_S4_XY +23_PW_DFTU_DS_S4_XYZ +25_LCAO_DS_S4_XY +26_LCAO_DS_S4_XYZ +27_LCAO_DS_S4_Z +28_LCAO_DS_S4_XY +29_LCAO_DS_S4_XYZ +31_LCAO_DFTU_DS_S4_XY +32_LCAO_DFTU_DS_S4_XYZ +33_LCAO_DFTU_DS_S4_Z +34_LCAO_DFTU_DS_S4_XY +35_LCAO_DFTU_DS_S4_XYZ +36_PW_DS_S2_ReadLam_Z +37_PW_DS_S4_ReadLam_XY +38_PW_DS_S2_Thr1e10_Z +39_PW_DS_S4_Thr1e10_XY +41_PW_DS_S4_Thr10_XY +43_PW_DFTU_DS_S4_Thr1e10_XY +45_PW_DFTU_DS_S4_Thr10_XY +50_FeO_O_first_Fe_second +51_FeO_Fe_first_O_second diff --git a/tests/17_DS_DFTU/CMakeLists.txt b/tests/17_DS_DFTU/CMakeLists.txt new file mode 100644 index 00000000000..7c78260e772 --- /dev/null +++ b/tests/17_DS_DFTU/CMakeLists.txt @@ -0,0 +1,16 @@ +enable_testing() + +find_program(BASH bash) 
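+if(ENABLE_ASAN) + add_test( + NAME 17_DS_DFTU_test_with_asan + COMMAND ${BASH} ../integrate/Autotest.sh -a ${ABACUS_BIN_PATH} -n 2 -s true + WORKING_DIRECTORY ${ABACUS_TEST_DIR}/17_DS_DFTU + ) +else() + add_test( + NAME 17_DS_DFTU + COMMAND ${BASH} ../integrate/Autotest.sh -a ${ABACUS_BIN_PATH} -n 4 + WORKING_DIRECTORY ${ABACUS_TEST_DIR}/17_DS_DFTU + ) +endif() diff --git a/tests/17_DS_DFTU/README.md b/tests/17_DS_DFTU/README.md new file mode 100644 index 00000000000..6f31996a9eb --- /dev/null +++ b/tests/17_DS_DFTU/README.md @@ -0,0 +1,143 @@ +# 17_DS_DFTU: DeltaSpin & DFT+U Integration Test Suite + +This directory contains all integration test cases for the **DeltaSpin (spin-constrained DFT)** and **DFT+U** features in ABACUS, +covering the LCAO and PW basis sets, collinear/noncollinear spin, DFT+U, DeltaSpin, and their combinations. + +## Test List (52 cases) + +### 1. LCAO Spin (01-02) + +| # | Case | Description | +|---|------|------| +| 01 | LCAO_SPIN_S2_Z | Verifies basic SCF convergence for collinear spin with the LCAO basis; serves as the baseline reference for LCAO magnetic calculations | +| 02 | LCAO_SPIN_S4_XYZ | Verifies basic SCF convergence for noncollinear spin with the LCAO basis; covers the LCAO noncollinear code path | + +### 2. LCAO DFT+U (03-05) + +| # | Case | Description | +|---|------|------| +| 03 | LCAO_DFTU_S2_Z | Verifies the coupling of DFT+U (U=5.0eV, l=2) with collinear spin in the LCAO basis; ensures the DFT+U occupation matrix is computed correctly on the LCAO path | +| 04 | LCAO_DFTU_S4_XY | Verifies the coupling of DFT+U with noncollinear spin (XY moments) in the LCAO basis; covers the nspin=4 occupation-matrix calculation on the LCAO path | +| 05 | LCAO_DFTU_S4_XYZ | Verifies the coupling of DFT+U with noncollinear spin (XYZ moments) in the LCAO basis; covers the most complete occupation-matrix scenario on the LCAO path | + +### 3. PW Spin (06-07) + +| # | Case | Description | +|---|------|------| +| 06 | PW_SPIN_S2_Z | Verifies basic SCF convergence for collinear spin with the PW basis; serves as the baseline reference for PW magnetic calculations | +| 07 | PW_SPIN_S4_XYZ | Verifies basic SCF convergence for noncollinear spin with the PW basis; covers the PW noncollinear code path | + +### 4. PW DFT+U (08-11) + +| # | Case | Description | +|---|------|------| +| 08 | PW_DFTU_S2_Z | Verifies the coupling of DFT+U (U=5.0eV, l=2) with collinear spin in the PW basis; ensures the DFT+U effective potential is computed correctly on the PW path | +| 09 | PW_DFTU_S4_XY | Verifies the coupling of DFT+U with noncollinear spin (XY moments) in the PW basis; covers the nspin=4 onsite projection matrix on the PW path | +| 10 | PW_DFTU_S4_XY | Same parameters as 09 but a different crystal structure; verifies that PW noncollinear DFT+U generalizes across lattices | +| 11 | PW_DFTU_S2_FeO | Verifies the correctness of DFT+U for the FeO system in the PW basis; ensures the DFT+U correction to the Fe-3d orbitals takes effect | 
+ +### 5. PW DeltaSpin (12-17) + +| # | Case | Description | +|---|------|------| +| 12 | PW_DS_S2_Z | Verifies the coupling of DeltaSpin with collinear spin in the PW basis; ensures DeltaSpin correctly iterates the magnetic moments to their target values | +| 13 | PW_DS_S4_XY | Verifies the iterative optimization of noncollinear DeltaSpin under XY moment constraints; covers the lambda update on the nspin=4 path | +| 14 | PW_DS_S4_XYZ | Verifies the iterative optimization of noncollinear DeltaSpin under XYZ (three-direction) moment constraints; covers the most complete spin-constraint scenario | +| 15 | PW_DS_S4_Z | Verifies the behavior of noncollinear DeltaSpin when only the Z-direction moment is constrained; ensures a single-axis constraint within the noncolin=1 framework introduces no unphysical XY components | +| 16 | PW_DS_S4_XY | Same parameters as 13 but a different crystal structure; verifies that the noncollinear DeltaSpin XY constraint generalizes across lattices | +| 17 | PW_DS_S4_XYZ | Same parameters as 14 but a different crystal structure; verifies that the noncollinear DeltaSpin XYZ constraint generalizes across lattices | + +### 6. PW DFT+U + DeltaSpin (18-23) + +| # | Case | Description | +|---|------|------| +| 18 | PW_DFTU_DS_S2_Z | Verifies the coupling of combined DFT+U and DeltaSpin (collinear spin) in the PW basis; ensures the U correction and the moment constraint do not conflict | +| 19 | PW_DFTU_DS_S4_XY | Verifies the coupling of combined noncollinear DFT+U+DeltaSpin under XY moment constraints; covers the joint iteration of the two methods on the nspin=4 path | +| 20 | PW_DFTU_DS_S4_XYZ | Verifies the coupling of combined noncollinear DFT+U+DeltaSpin under XYZ (three-direction) moment constraints; covers the most complete joint-constraint scenario | +| 21 | PW_DFTU_DS_S4_Z | Verifies the behavior of combined noncollinear DFT+U+DeltaSpin when only the Z-direction moment is constrained; ensures the single-axis constraint and the DFT+U effective potential are superposed correctly | +| 22 | PW_DFTU_DS_S4_XY | Same parameters as 19 but a different crystal structure; verifies that combined noncollinear DFT+U+DeltaSpin generalizes across lattices | +| 23 | PW_DFTU_DS_S4_XYZ | Same parameters as 20 but a different crystal structure; verifies that the combined noncollinear DFT+U+DeltaSpin XYZ constraint generalizes across lattices | + +### 7. LCAO DeltaSpin (24-29) + +| # | Case | Description | +|---|------|------| +| 24 | LCAO_DS_S2_Z | Verifies the coupling of DeltaSpin with collinear spin in the LCAO basis; ensures the spin-constraint optimization on the LCAO density-matrix path is correct | +| 25 | LCAO_DS_S4_XY | Verifies the iterative optimization of noncollinear DeltaSpin under XY moment constraints in the LCAO basis; covers the nspin=4 moment projection on the LCAO path | +| 26 | LCAO_DS_S4_XYZ | Verifies the iterative optimization of noncollinear DeltaSpin under XYZ (three-direction) moment constraints in the LCAO basis; covers the most complete constraint scenario on the LCAO path | +| 27 | LCAO_DS_S4_Z | Verifies the behavior of noncollinear DeltaSpin in the LCAO basis when only the Z-direction moment is constrained; ensures the single-axis constraint is correct within the noncolin=1 framework | +| 28 | LCAO_DS_S4_XY | Same parameters as 25 but a different crystal structure; verifies that the LCAO noncollinear DeltaSpin XY constraint generalizes across lattices | +| 29 | LCAO_DS_S4_XYZ | Same parameters as 26 but a different crystal structure; verifies that the LCAO noncollinear DeltaSpin XYZ constraint generalizes across lattices | 
+ +### 8. LCAO DFT+U + DeltaSpin (30-35) + +| # | Case | Description | +|---|------|------| +| 30 | LCAO_DFTU_DS_S2_Z | Verifies the coupling of combined DFT+U and DeltaSpin (collinear spin) in the LCAO basis; ensures the U correction and the moment constraint do not conflict on the density-matrix path | +| 31 | LCAO_DFTU_DS_S4_XY | Verifies the coupling of combined noncollinear DFT+U+DeltaSpin under XY moment constraints in the LCAO basis; covers the joint constraint on the LCAO density-matrix path | +| 32 | LCAO_DFTU_DS_S4_XYZ | Verifies the coupling of combined noncollinear DFT+U+DeltaSpin under XYZ (three-direction) moment constraints in the LCAO basis; covers the most complete joint scenario on the LCAO path | +| 33 | LCAO_DFTU_DS_S4_Z | Verifies the behavior of combined noncollinear DFT+U+DeltaSpin in the LCAO basis when only the Z-direction moment is constrained; ensures the single-axis constraint and the DFT+U density matrix are superposed correctly | +| 34 | LCAO_DFTU_DS_S4_XY | Same parameters as 31 but a different crystal structure; verifies that combined LCAO DFT+U+DeltaSpin generalizes across lattices | +| 35 | LCAO_DFTU_DS_S4_XYZ | Same parameters as 32 but a different crystal structure; verifies that the combined LCAO DFT+U+DeltaSpin XYZ constraint generalizes across lattices | + +### 9. PW DeltaSpin Special Parameters (36-41) + +| # | Case | Description | +|---|------|------| +| 36 | PW_DS_S2_ReadLam_Z | Verifies the `nsc=1` mode (reading the lambda file directly, without iterative optimization); ensures DeltaSpin still computes the magnetic moments correctly in the non-self-consistent lambda mode | +| 37 | PW_DS_S4_ReadLam_XY | Verifies the `nsc=1` mode for noncollinear DeltaSpin; covers the non-self-consistent lambda path under XY moment constraints | +| 38 | PW_DS_S2_Thr1e10_Z | Verifies the stability of DeltaSpin under a very tight convergence threshold (sc_scf_thr=1e-10); ensures the iterative optimization converges to a high-precision solution | +| 39 | PW_DS_S4_Thr1e10_XY | Verifies the stability of noncollinear DeltaSpin under a very tight convergence threshold (sc_scf_thr=1e-10); covers the XY moment-constraint scenario | +| 40 | PW_DS_S2_Thr10_Z | Verifies the behavior of DeltaSpin under a very loose convergence threshold (sc_scf_thr=10); tests the robustness of the algorithm under low-precision requirements and the out_alllog log output | +| 41 | PW_DS_S4_Thr10_XY | Verifies the behavior of noncollinear DeltaSpin under a very loose convergence threshold (sc_scf_thr=10); covers the low-precision scenario with XY moment constraints | + +### 10. PW DFT+U + DeltaSpin Special Parameters (42-45) + +| # | Case | Description | +|---|------|------| +| 42 | PW_DFTU_DS_S2_Thr1e10_Z | Verifies the iterative stability of combined DFT+U and DeltaSpin under a very tight convergence threshold (sc_scf_thr=1e-10); ensures convergence when the two methods are coupled | +| 43 | PW_DFTU_DS_S4_Thr1e10_XY | Verifies the coupling stability of noncollinear DFT+U+DeltaSpin under a very tight convergence threshold (sc_scf_thr=1e-10); covers XY moment constraints | +| 44 | PW_DFTU_DS_S2_Thr10_Z | Verifies the behavior of combined DFT+U and DeltaSpin under a very loose convergence threshold (sc_scf_thr=10); tests the robustness of the coupled algorithm under low-precision requirements | +| 45 | PW_DFTU_DS_S4_Thr10_XY | Verifies the behavior of noncollinear DFT+U+DeltaSpin under a very loose convergence threshold (sc_scf_thr=10); covers the low-precision scenario with XY moment constraints | + +### 11. Relax Structural Optimization (46-49) + +| # | Case | Description | +|---|------|------| +| 46 | PW_DS_S2_Thr1e10_Z_bfgs | Verifies the convergence behavior of DeltaSpin with the BFGS strategy (sc_lambda_strategy=bfgs); tests the correctness of the BFGS optimizer in spin-constrained SCF | +| 47 | PW_DS_S4_Thr1e10_XY_bfgs | Verifies the convergence behavior of noncollinear DeltaSpin with the BFGS strategy; covers the correctness of the BFGS optimizer under XY moment constraints | +| 48 | PW_DFTU_DS_S2_Thr10_Z_bfgs | Verifies the convergence behavior of combined DFT+U and DeltaSpin with the BFGS strategy; tests the correctness of BFGS in the coupled DFT+U+DS scenario | +| 49 | PW_DFTU_DS_S4_Thr10_XY_bfgs | Verifies the convergence behavior of combined noncollinear DFT+U+DeltaSpin with the BFGS strategy; covers the correctness of the BFGS optimizer under XY moment constraints | + +### 12. FeO Atom Ordering (50-51) + +| # | Case | Description | 
+|---|------|------| +| 50 | FeO_O_first_Fe_second | Verifies the correctness of DFT+U for FeO with the O atom type listed first and Fe second; ensures the atom-type order does not affect the DFT+U onsite projection | +| 51 | FeO_Fe_first_O_second | Verifies the correctness of DFT+U for FeO with the Fe atom type listed first and O second; compared with 50 to ensure the eff_pot_pw_index index calculation is independent of the atom-type order | + +### 13. SOC + DFT+U (52) + +| # | Case | Description | +|---|------|------| +| 52 | PW_DFTU_SO | Verifies compatibility when DFT+U and spin-orbit coupling (SOC) are enabled together; ensures the DFT+U onsite projection couples correctly with the SOC spin mixing | + +## How to Run + +```bash +# Run all tests +cd tests/17_DS_DFTU +bash ../integrate/Autotest.sh -a -n 4 + +# Run a single test +cd 08_PW_DFTU_S2_Z +bash ../../integrate/run_debug.sh "" +``` + +## Known Issues + +- 19-23: PW DFT+U + DeltaSpin + noncollinear → crashes on both port and zdy-tmp (upstream bug) + +## Notes on Test Conditions + +- 09/10 (PW DFT+U + noncollinear): only supported with **2 MPI processes**; a `result.ref` reference file is provided diff --git a/tests/CMakeLists.txt b/tests/CMakeLists.txt index 83f1f326297..c30d0b77474 100644 --- a/tests/CMakeLists.txt +++ b/tests/CMakeLists.txt @@ -9,6 +9,7 @@ add_subdirectory(07_OFDFT) add_subdirectory(08_EXX) add_subdirectory(10_others) add_subdirectory(11_PW_GPU) +add_subdirectory(17_DS_DFTU) if(ENABLE_MLALGO) add_subdirectory(09_DeePKS) diff --git a/tests/PP_ORB/O.upf b/tests/PP_ORB/O.upf new file mode 100644 index 00000000000..7e7db6d66f6 --- /dev/null +++ b/tests/PP_ORB/O.upf @@ -0,0 +1,1224 @@ + + + + This pseudopotential file has been produced using the code + ONCVPSP (Optimized Norm-Conserving Vanderbilt PSeudopotential) + scalar-relativistic version 2.1.1, 03/26/2014 by D. R. Hamann + The code is available through a link at URL www.mat-simresearch.com. + Documentation with the package provides a full description of the + input data below. + + + While it is not required under the terms of the GNU GPL, it is + suggested that you cite D. R. Hamann, Phys. Rev. B 88, 085117 (2013) + in any publication using these pseudopotentials. + + + Copyright 2015 The Regents of the University of California + + This work is licensed under the Creative Commons Attribution-ShareAlike + 4.0 International License. 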
To view a copy of this license, visit + http://creativecommons.org/licenses/by-sa/4.0/ or send a letter to + Creative Commons, PO Box 1866, Mountain View, CA 94042, USA. + + This pseudopotential is part of the Schlipf-Gygi norm-conserving + pseudopotential library. Its construction parameters were tuned to + reproduce materials of a training set with very high accuracy and + should be suitable as a general purpose pseudopotential to treat a + variety of different compounds. For details of the construction and + testing of the pseudopotential please refer to: + + [insert reference to paper here] + + We kindly ask that you include this reference in all publications + associated to this pseudopotential. + + + +# ATOM AND REFERENCE CONFIGURATION +# atsym z nc nv iexc psfile + O 8.00 1 2 4 upf +# +# n l f energy (Ha) + 1 0 2.00 + 2 0 2.00 + 2 1 4.00 +# +# PSEUDOPOTENTIAL AND OPTIMIZATION +# lmax + 1 +# +# l, rc, ep, ncon, nbas, qcut + 0 1.29195 -0.88057 5 8 8.98916 + 1 1.47310 -0.33187 5 8 9.14990 +# +# LOCAL POTENTIAL +# lloc, lpopt, rc(5), dvloc0 + 4 5 0.90330 0.00000 +# +# VANDERBILT-KLEINMAN-BYLANDER PROJECTORs +# l, nproj, debl + 0 2 1.51851 + 1 2 1.53631 +# +# MODEL CORE CHARGE +# icmod, fcfact + 0 0.00000 +# +# LOG DERIVATIVE ANALYSIS +# epsh1, epsh2, depsh + -5.00 3.00 0.02 +# +# OUTPUT GRID +# rlmax, drl + 6.00 0.01 +# +# TEST CONFIGURATIONS +# ncnf + 0 +# nvcnf +# n l f + + + + + + + + + 0.0000 0.0100 0.0200 0.0300 0.0400 0.0500 0.0600 0.0700 + 0.0800 0.0900 0.1000 0.1100 0.1200 0.1300 0.1400 0.1500 + 0.1600 0.1700 0.1800 0.1900 0.2000 0.2100 0.2200 0.2300 + 0.2400 0.2500 0.2600 0.2700 0.2800 0.2900 0.3000 0.3100 + 0.3200 0.3300 0.3400 0.3500 0.3600 0.3700 0.3800 0.3900 + 0.4000 0.4100 0.4200 0.4300 0.4400 0.4500 0.4600 0.4700 + 0.4800 0.4900 0.5000 0.5100 0.5200 0.5300 0.5400 0.5500 + 0.5600 0.5700 0.5800 0.5900 0.6000 0.6100 0.6200 0.6300 + 0.6400 0.6500 0.6600 0.6700 0.6800 0.6900 0.7000 0.7100 + 0.7200 0.7300 0.7400 0.7500 0.7600 0.7700 0.7800 0.7900 + 
0.8000 0.8100 0.8200 0.8300 0.8400 0.8500 0.8600 0.8700 + 0.8800 0.8900 0.9000 0.9100 0.9200 0.9300 0.9400 0.9500 + 0.9600 0.9700 0.9800 0.9900 1.0000 1.0100 1.0200 1.0300 + 1.0400 1.0500 1.0600 1.0700 1.0800 1.0900 1.1000 1.1100 + 1.1200 1.1300 1.1400 1.1500 1.1600 1.1700 1.1800 1.1900 + 1.2000 1.2100 1.2200 1.2300 1.2400 1.2500 1.2600 1.2700 + 1.2800 1.2900 1.3000 1.3100 1.3200 1.3300 1.3400 1.3500 + 1.3600 1.3700 1.3800 1.3900 1.4000 1.4100 1.4200 1.4300 + 1.4400 1.4500 1.4600 1.4700 1.4800 1.4900 1.5000 1.5100 + 1.5200 1.5300 1.5400 1.5500 1.5600 1.5700 1.5800 1.5900 + 1.6000 1.6100 1.6200 1.6300 1.6400 1.6500 1.6600 1.6700 + 1.6800 1.6900 1.7000 1.7100 1.7200 1.7300 1.7400 1.7500 + 1.7600 1.7700 1.7800 1.7900 1.8000 1.8100 1.8200 1.8300 + 1.8400 1.8500 1.8600 1.8700 1.8800 1.8900 1.9000 1.9100 + 1.9200 1.9300 1.9400 1.9500 1.9600 1.9700 1.9800 1.9900 + 2.0000 2.0100 2.0200 2.0300 2.0400 2.0500 2.0600 2.0700 + 2.0800 2.0900 2.1000 2.1100 2.1200 2.1300 2.1400 2.1500 + 2.1600 2.1700 2.1800 2.1900 2.2000 2.2100 2.2200 2.2300 + 2.2400 2.2500 2.2600 2.2700 2.2800 2.2900 2.3000 2.3100 + 2.3200 2.3300 2.3400 2.3500 2.3600 2.3700 2.3800 2.3900 + 2.4000 2.4100 2.4200 2.4300 2.4400 2.4500 2.4600 2.4700 + 2.4800 2.4900 2.5000 2.5100 2.5200 2.5300 2.5400 2.5500 + 2.5600 2.5700 2.5800 2.5900 2.6000 2.6100 2.6200 2.6300 + 2.6400 2.6500 2.6600 2.6700 2.6800 2.6900 2.7000 2.7100 + 2.7200 2.7300 2.7400 2.7500 2.7600 2.7700 2.7800 2.7900 + 2.8000 2.8100 2.8200 2.8300 2.8400 2.8500 2.8600 2.8700 + 2.8800 2.8900 2.9000 2.9100 2.9200 2.9300 2.9400 2.9500 + 2.9600 2.9700 2.9800 2.9900 3.0000 3.0100 3.0200 3.0300 + 3.0400 3.0500 3.0600 3.0700 3.0800 3.0900 3.1000 3.1100 + 3.1200 3.1300 3.1400 3.1500 3.1600 3.1700 3.1800 3.1900 + 3.2000 3.2100 3.2200 3.2300 3.2400 3.2500 3.2600 3.2700 + 3.2800 3.2900 3.3000 3.3100 3.3200 3.3300 3.3400 3.3500 + 3.3600 3.3700 3.3800 3.3900 3.4000 3.4100 3.4200 3.4300 + 3.4400 3.4500 3.4600 3.4700 3.4800 3.4900 3.5000 3.5100 + 3.5200 3.5300 3.5400 3.5500 
3.5600 3.5700 3.5800 3.5900 + 3.6000 3.6100 3.6200 3.6300 3.6400 3.6500 3.6600 3.6700 + 3.6800 3.6900 3.7000 3.7100 3.7200 3.7300 3.7400 3.7500 + 3.7600 3.7700 3.7800 3.7900 3.8000 3.8100 3.8200 3.8300 + 3.8400 3.8500 3.8600 3.8700 3.8800 3.8900 3.9000 3.9100 + 3.9200 3.9300 3.9400 3.9500 3.9600 3.9700 3.9800 3.9900 + 4.0000 4.0100 4.0200 4.0300 4.0400 4.0500 4.0600 4.0700 + 4.0800 4.0900 4.1000 4.1100 4.1200 4.1300 4.1400 4.1500 + 4.1600 4.1700 4.1800 4.1900 4.2000 4.2100 4.2200 4.2300 + 4.2400 4.2500 4.2600 4.2700 4.2800 4.2900 4.3000 4.3100 + 4.3200 4.3300 4.3400 4.3500 4.3600 4.3700 4.3800 4.3900 + 4.4000 4.4100 4.4200 4.4300 4.4400 4.4500 4.4600 4.4700 + 4.4800 4.4900 4.5000 4.5100 4.5200 4.5300 4.5400 4.5500 + 4.5600 4.5700 4.5800 4.5900 4.6000 4.6100 4.6200 4.6300 + 4.6400 4.6500 4.6600 4.6700 4.6800 4.6900 4.7000 4.7100 + 4.7200 4.7300 4.7400 4.7500 4.7600 4.7700 4.7800 4.7900 + 4.8000 4.8100 4.8200 4.8300 4.8400 4.8500 4.8600 4.8700 + 4.8800 4.8900 4.9000 4.9100 4.9200 4.9300 4.9400 4.9500 + 4.9600 4.9700 4.9800 4.9900 5.0000 5.0100 5.0200 5.0300 + 5.0400 5.0500 5.0600 5.0700 5.0800 5.0900 5.1000 5.1100 + 5.1200 5.1300 5.1400 5.1500 5.1600 5.1700 5.1800 5.1900 + 5.2000 5.2100 5.2200 5.2300 5.2400 5.2500 5.2600 5.2700 + 5.2800 5.2900 5.3000 5.3100 5.3200 5.3300 5.3400 5.3500 + 5.3600 5.3700 5.3800 5.3900 5.4000 5.4100 5.4200 5.4300 + 5.4400 5.4500 5.4600 5.4700 5.4800 5.4900 5.5000 5.5100 + 5.5200 5.5300 5.5400 5.5500 5.5600 5.5700 5.5800 5.5900 + 5.6000 5.6100 5.6200 5.6300 5.6400 5.6500 5.6600 5.6700 + 5.6800 5.6900 5.7000 5.7100 5.7200 5.7300 5.7400 5.7500 + 5.7600 5.7700 5.7800 5.7900 5.8000 5.8100 5.8200 5.8300 + 5.8400 5.8500 5.8600 5.8700 5.8800 5.8900 5.9000 5.9100 + 5.9200 5.9300 5.9400 5.9500 5.9600 5.9700 5.9800 5.9900 + 6.0000 6.0100 + + + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 
0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 
0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 
0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 0.0100 + 0.0100 0.0100 + + + + -2.7605700345E+01 -3.0784865229E+01 -3.2349253618E+01 -3.2751366129E+01 + -3.2443703381E+01 -3.1938024293E+01 -3.1464309182E+01 -3.1081089156E+01 + -3.0780110534E+01 -3.0537539935E+01 -3.0331510013E+01 -3.0145737072E+01 + -2.9968950315E+01 -2.9793612997E+01 -2.9614858119E+01 -2.9429745358E+01 + -2.9236754295E+01 -2.9035419392E+01 -2.8826044018E+01 -2.8609460457E+01 + -2.8386823980E+01 -2.8159440471E+01 -2.7928630453E+01 -2.7695631454E+01 + -2.7461537210E+01 -2.7227268680E+01 -2.6993569183E+01 -2.6761015821E+01 + -2.6530039202E+01 -2.6300945472E+01 -2.6073936437E+01 -2.5849125236E+01 + -2.5626546546E+01 -2.5406161554E+01 -2.5187858066E+01 -2.4971448643E+01 + -2.4756666888E+01 -2.4543166280E+01 -2.4330521781E+01 -2.4118237688E+01 + -2.3905762462E+01 -2.3692510781E+01 -2.3477892157E+01 -2.3261343931E+01 + -2.3042365424E+01 -2.2820549497E+01 -2.2595607833E+01 -2.2367387034E+01 + -2.2135873927E+01 -2.1901189722E+01 -2.1663577130E+01 -2.1423377048E+01 + -2.1181003715E+01 -2.0936919717E+01 -2.0691610496E+01 -2.0445564221E+01 + -2.0199254652E+01 -1.9953130261E+01 -1.9707605286E+01 -1.9463057569E+01 + -1.9219826157E+01 -1.8978213343E+01 -1.8738487400E+01 -1.8500884660E+01 + -1.8265614596E+01 -1.8032862583E+01 -1.7802792920E+01 -1.7575552408E+01 + -1.7351272692E+01 -1.7130071760E+01 -1.6912056036E+01 -1.6697321537E+01 + -1.6485954716E+01 -1.6278033179E+01 -1.6073625719E+01 -1.5872792819E+01 + -1.5675586595E+01 -1.5482050684E+01 -1.5292220051E+01 -1.5106120725E+01 + -1.4923769479E+01 -1.4745173477E+01 -1.4570329881E+01 -1.4399225454E+01 + -1.4231836134E+01 -1.4068126609E+01 -1.3908049886E+01 -1.3751546850E+01 + -1.3598545829E+01 -1.3448962145E+01 -1.3302697611E+01 -1.3159640031E+01 + -1.3019669910E+01 -1.2882669787E+01 -1.2748527869E+01 -1.2617137724E+01 + -1.2488398068E+01 -1.2362212574E+01 -1.2238489692E+01 
-1.2117142884E+01 + -1.1998090066E+01 -1.1881253628E+01 -1.1766560302E+01 -1.1653941032E+01 + -1.1543330569E+01 -1.1434668196E+01 -1.1327896756E+01 -1.1222962743E+01 + -1.1119816109E+01 -1.1018409929E+01 -1.0918701041E+01 -1.0820648952E+01 + -1.0724215912E+01 -1.0629366457E+01 -1.0536068056E+01 -1.0444290028E+01 + -1.0354003461E+01 -1.0265180964E+01 -1.0177796938E+01 -1.0091826632E+01 + -1.0007246112E+01 -9.9240322892E+00 -9.8421624187E+00 -9.7616139299E+00 + -9.6823643703E+00 -9.6043910435E+00 -9.5276710191E+00 -9.4521810563E+00 + -9.3778971539E+00 -9.3047950086E+00 -9.2328496504E+00 -9.1620363695E+00 + -9.0923305110E+00 -9.0237077361E+00 -8.9561431451E+00 -8.8896116158E+00 + -8.8240890211E+00 -8.7595512275E+00 -8.6959740871E+00 -8.6333343002E+00 + -8.5716086751E+00 -8.5107748378E+00 -8.4508110921E+00 -8.3916960209E+00 + -8.3334097570E+00 -8.2759323779E+00 -8.2192454990E+00 -8.1633312824E+00 + -8.1081727717E+00 -8.0537547331E+00 -8.0000623436E+00 -7.9470813275E+00 + -7.8947972321E+00 -7.8431966714E+00 -7.7922661900E+00 -7.7419928953E+00 + -7.6923641381E+00 -7.6433675558E+00 -7.5949912477E+00 -7.5472233618E+00 + -7.5000526824E+00 -7.4534678545E+00 -7.4074582876E+00 -7.3620131113E+00 + -7.3171223055E+00 -7.2727754717E+00 -7.2289631023E+00 -7.1856752817E+00 + -7.1429029518E+00 -7.1006366576E+00 -7.0588677730E+00 -7.0175872653E+00 + -6.9767869231E+00 -6.9364581153E+00 -6.8965930059E+00 -6.8571833718E+00 + -6.8182217002E+00 -6.7797001743E+00 -6.7416115702E+00 -6.7039484718E+00 + -6.6667039108E+00 -6.6298708673E+00 -6.5934425974E+00 -6.5574124728E+00 + -6.5217739457E+00 -6.4865207734E+00 -6.4516465808E+00 -6.4171455001E+00 + -6.3830113137E+00 -6.3492384567E+00 -6.3158210267E+00 -6.2827535725E+00 + -6.2500305645E+00 -6.2176466374E+00 -6.1855966333E+00 -6.1538752555E+00 + -6.1224776754E+00 -6.0913987519E+00 -6.0606338291E+00 -6.0301780858E+00 + -6.0000268845E+00 -5.9701757684E+00 -5.9406201055E+00 -5.9113557302E+00 + -5.8823781953E+00 -5.8536833983E+00 -5.8252672254E+00 
-5.7971255362E+00 + -5.7692545325E+00 -5.7416501530E+00 -5.7143087187E+00 -5.6872264675E+00 + -5.6603996483E+00 -5.6338248102E+00 -5.6074982583E+00 -5.5814166408E+00 + -5.5555765569E+00 -5.5299745536E+00 -5.5046075112E+00 -5.4794720848E+00 + -5.4545651655E+00 -5.4298837212E+00 -5.4054245210E+00 -5.3811847580E+00 + -5.3571614267E+00 -5.3333515869E+00 -5.3097525260E+00 -5.2863613325E+00 + -5.2631753334E+00 -5.2401918998E+00 -5.2174082208E+00 -5.1948218607E+00 + -5.1724302389E+00 -5.1502307471E+00 -5.1282210592E+00 -5.1063986772E+00 + -5.0847611945E+00 -5.0633063698E+00 -5.0420317959E+00 -5.0209352464E+00 + -5.0000145619E+00 -4.9792674276E+00 -4.9586917756E+00 -4.9382855278E+00 + -4.9180464624E+00 -4.8979726494E+00 -4.8780620924E+00 -4.8583126624E+00 + -4.8387225484E+00 -4.8192898345E+00 -4.8000124892E+00 -4.7808887980E+00 + -4.7619169283E+00 -4.7430949440E+00 -4.7244212124E+00 -4.7058939893E+00 + -4.6875114188E+00 -4.6692719493E+00 -4.6511739183E+00 -4.6332155580E+00 + -4.6153953755E+00 -4.5977117925E+00 -4.5801631296E+00 -4.5627479386E+00 + -4.5454647266E+00 -4.5283119032E+00 -4.5112880519E+00 -4.4943917650E+00 + -4.4776215414E+00 -4.4609759844E+00 -4.4444537713E+00 -4.4280534916E+00 + -4.4117737574E+00 -4.3956133298E+00 -4.3795708900E+00 -4.3636450498E+00 + -4.3478346519E+00 -4.3321384475E+00 -4.3165551080E+00 -4.3010834883E+00 + -4.2857224193E+00 -4.2704706626E+00 -4.2553270471E+00 -4.2402904912E+00 + -4.2253598492E+00 -4.2105339172E+00 -4.1958116967E+00 -4.1811921057E+00 + -4.1666739956E+00 -4.1522563551E+00 -4.1379381819E+00 -4.1237184204E+00 + -4.1095960051E+00 -4.0955700209E+00 -4.0816394841E+00 -4.0678033425E+00 + -4.0540606926E+00 -4.0404106187E+00 -4.0268521620E+00 -4.0133843475E+00 + -4.0000063484E+00 -3.9867172703E+00 -3.9735161622E+00 -3.9604021828E+00 + -3.9473745101E+00 -3.9344322882E+00 -3.9215745909E+00 -3.9088006833E+00 + -3.8961097526E+00 -3.8835009538E+00 -3.8709734686E+00 -3.8585265734E+00 + -3.8461594934E+00 -3.8338713986E+00 -3.8216615661E+00 
-3.8095292837E+00 + -3.7974738126E+00 -3.7854943426E+00 -3.7735902317E+00 -3.7617607813E+00 + -3.7500052762E+00 -3.7383229617E+00 -3.7267132311E+00 -3.7151754129E+00 + -3.7037088125E+00 -3.6923127373E+00 -3.6809865997E+00 -3.6697297595E+00 + -3.6585415452E+00 -3.6474213158E+00 -3.6363685045E+00 -3.6253825007E+00 + -3.6144626587E+00 -3.6036083788E+00 -3.5928191167E+00 -3.5820942902E+00 + -3.5714332815E+00 -3.5608355222E+00 -3.5503004929E+00 -3.5398276385E+00 + -3.5294163709E+00 -3.5190661440E+00 -3.5087764652E+00 -3.4985468053E+00 + -3.4883766076E+00 -3.4782653400E+00 -3.4682125383E+00 -3.4582176981E+00 + -3.4482802961E+00 -3.4383998061E+00 -3.4285757940E+00 -3.4188077789E+00 + -3.4090952728E+00 -3.3994377480E+00 -3.3898348019E+00 -3.3802859760E+00 + -3.3707908119E+00 -3.3613487964E+00 -3.3519595367E+00 -3.3426226031E+00 + -3.3333375585E+00 -3.3241039280E+00 -3.3149213002E+00 -3.3057892823E+00 + -3.2967074576E+00 -3.2876753919E+00 -3.2786926486E+00 -3.2697588729E+00 + -3.2608736674E+00 -3.2520366348E+00 -3.2432473253E+00 -3.2345054052E+00 + -3.2258105008E+00 -3.2171622335E+00 -3.2085601965E+00 -3.2000040171E+00 + -3.1914933640E+00 -3.1830278760E+00 -3.1746071920E+00 -3.1662308963E+00 + -3.1578986987E+00 -3.1496102547E+00 -3.1413652201E+00 -3.1331632249E+00 + -3.1250039281E+00 -3.1168870307E+00 -3.1088122044E+00 -3.1007791210E+00 + -3.0927874062E+00 -3.0848367838E+00 -3.0769269476E+00 -3.0690575847E+00 + -3.0612283700E+00 -3.0534389657E+00 -3.0456891132E+00 -3.0379785141E+00 + -3.0303068700E+00 -3.0226738555E+00 -3.0150791818E+00 -3.0075225888E+00 + -3.0000037918E+00 -2.9925225065E+00 -2.9850784103E+00 -2.9776712548E+00 + -2.9703007806E+00 -2.9629667163E+00 -2.9556687906E+00 -2.9484066870E+00 + -2.9411801890E+00 -2.9339890404E+00 -2.9268329824E+00 -2.9197117526E+00 + -2.9126250543E+00 -2.9055726836E+00 -2.8985543940E+00 -2.8915699388E+00 + -2.8846190644E+00 -2.8777014940E+00 -2.8708170314E+00 -2.8639654413E+00 + -2.8571464888E+00 -2.8503599315E+00 -2.8436055055E+00 
-2.8368830243E+00 + -2.8301922637E+00 -2.8235329996E+00 -2.8169050030E+00 -2.8103080169E+00 + -2.8037418662E+00 -2.7972063374E+00 -2.7907012166E+00 -2.7842262903E+00 + -2.7777813024E+00 -2.7713660913E+00 -2.7649804532E+00 -2.7586241845E+00 + -2.7522970813E+00 -2.7459989054E+00 -2.7397294879E+00 -2.7334886419E+00 + -2.7272761732E+00 -2.7210918876E+00 -2.7149355663E+00 -2.7088070278E+00 + -2.7027061039E+00 -2.6966326094E+00 -2.6905863591E+00 -2.6845671557E+00 + -2.6785748003E+00 -2.6726091446E+00 -2.6666700119E+00 -2.6607572257E+00 + -2.6548706095E+00 -2.6490099504E+00 -2.6431751131E+00 -2.6373659318E+00 + -2.6315822382E+00 -2.6258238639E+00 -2.6200906210E+00 -2.6143823457E+00 + -2.6086988964E+00 -2.6030401127E+00 -2.5974058340E+00 -2.5917958991E+00 + -2.5862101123E+00 -2.5806483565E+00 -2.5751104788E+00 -2.5695963262E+00 + -2.5641057455E+00 -2.5586385667E+00 -2.5531946375E+00 -2.5477738314E+00 + -2.5423760028E+00 -2.5370010057E+00 -2.5316486943E+00 -2.5263188926E+00 + -2.5210114847E+00 -2.5157263372E+00 -2.5104633109E+00 -2.5052222669E+00 + -2.5000030605E+00 -2.4948055297E+00 -2.4896295712E+00 -2.4844750523E+00 + -2.4793418405E+00 -2.4742298031E+00 -2.4691387940E+00 -2.4640686757E+00 + -2.4590193426E+00 -2.4539906681E+00 -2.4489825259E+00 -2.4439947896E+00 + -2.4390273136E+00 -2.4340799794E+00 -2.4291526807E+00 -2.4242452972E+00 + -2.4193577083E+00 -2.4144897934E+00 -2.4096414103E+00 -2.4048124528E+00 + -2.4000028171E+00 -2.3952123882E+00 -2.3904410511E+00 -2.3856886909E+00 + -2.3809551705E+00 -2.3762403912E+00 -2.3715442531E+00 -2.3668666465E+00 + -2.3622074621E+00 -2.3575665900E+00 -2.3529439004E+00 -2.3483392967E+00 + -2.3437526852E+00 -2.3391839614E+00 -2.3346330208E+00 -2.3300997590E+00 + -2.3255840547E+00 -2.3210858091E+00 -2.3166049363E+00 -2.3121413368E+00 + -2.3076949109E+00 -2.3032655590E+00 -2.2988531707E+00 -2.2944576400E+00 + -2.2900788908E+00 -2.2857168282E+00 -2.2813713571E+00 -2.2770423826E+00 + -2.2727298068E+00 -2.2684335124E+00 -2.2641534347E+00 
-2.2598894829E+00 + -2.2556415666E+00 -2.2514095953E+00 -2.2471934785E+00 -2.2429931038E+00 + -2.2388083992E+00 -2.2346392847E+00 -2.2304856742E+00 -2.2263474813E+00 + -2.2222246197E+00 -2.2181169931E+00 -2.2140245088E+00 -2.2099471023E+00 + -2.2058846913E+00 -2.2018371933E+00 -2.1978045263E+00 -2.1937866078E+00 + -2.1897833323E+00 -2.1857946406E+00 -2.1818204576E+00 -2.1778607050E+00 + -2.1739153043E+00 -2.1699841770E+00 -2.1660672366E+00 -2.1621643955E+00 + -2.1582755972E+00 -2.1544007668E+00 -2.1505398296E+00 -2.1466927107E+00 + -2.1428593354E+00 -2.1390396118E+00 -2.1352334757E+00 -2.1314408644E+00 + -2.1276617067E+00 -2.1238959313E+00 -2.1201434667E+00 -2.1164042418E+00 + -2.1126781618E+00 -2.1089651802E+00 -2.1052652307E+00 -2.1015782452E+00 + -2.0979041558E+00 -2.0942428944E+00 -2.0905943898E+00 -2.0869585565E+00 + -2.0833353508E+00 -2.0797247079E+00 -2.0761265629E+00 -2.0725408510E+00 + -2.0689675074E+00 -2.0654064613E+00 -2.0618576366E+00 -2.0583209894E+00 + -2.0547964579E+00 -2.0512839803E+00 -2.0477834948E+00 -2.0442949396E+00 + -2.0408182464E+00 -2.0373533437E+00 -2.0339001895E+00 -2.0304587249E+00 + -2.0270288910E+00 -2.0236106288E+00 -2.0202038794E+00 -2.0168085786E+00 + -2.0134246558E+00 -2.0100520723E+00 -2.0066907719E+00 -2.0033406985E+00 + -2.0000017959E+00 -1.9966740079E+00 + + + + 0.0000000000E+00 -8.2277987587E-02 -1.6449650094E-01 -2.4659331589E-01 + -3.2850076507E-01 -4.1014315697E-01 -4.9143436126E-01 -5.7227561145E-01 + -6.5255357242E-01 -7.3213871601E-01 -8.1088404273E-01 -8.8862418182E-01 + -9.6517489579E-01 -1.0403330086E+00 -1.1138767696E+00 -1.1855666583E+00 + -1.2551466268E+00 -1.3223457701E+00 -1.3868804065E+00 -1.4484565448E+00 + -1.5067727038E+00 -1.5615230499E+00 -1.6124008059E+00 -1.6591018874E+00 + -1.7013287065E+00 -1.7387940878E+00 -1.7712252542E+00 -1.7983677872E+00 + -1.8199895278E+00 -1.8358843512E+00 -1.8458757490E+00 -1.8498201626E+00 + -1.8476100093E+00 -1.8391763180E+00 -1.8244910913E+00 -1.8035689946E+00 + 
-1.7764688338E+00 -1.7432942635E+00 -1.7041941790E+00 -1.6593624450E+00 + -1.6090370910E+00 -1.5534989864E+00 -1.4930699667E+00 -1.4281104400E+00 + -1.3590165014E+00 -1.2862165869E+00 -1.2101677086E+00 -1.1313513155E+00 + -1.0502688403E+00 -9.6743710159E-01 -8.8338307362E-01 -7.9863905513E-01 + -7.1373774907E-01 -6.2920651981E-01 -5.4556280508E-01 -4.6330924845E-01 + -3.8292814427E-01 -3.0487830138E-01 -2.2958906110E-01 -1.5745833560E-01 + -8.8847539142E-02 -2.4079508545E-02 3.6563691834E-02 9.2844371601E-02 + 1.4456869600E-01 1.9158861786E-01 2.3380285278E-01 2.7115554880E-01 + 3.0363571638E-01 3.3127718718E-01 3.5415591151E-01 3.7238794759E-01 + 3.8612701355E-01 3.9556147997E-01 4.0091144918E-01 4.0242475349E-01 + 4.0037331823E-01 3.9504921005E-01 3.8676054393E-01 3.7582731451E-01 + 3.6257720433E-01 3.4734141910E-01 3.3045059692E-01 3.1223083453E-01 + 2.9299986887E-01 2.7306344758E-01 2.5271191615E-01 2.3221704416E-01 + 2.1182910699E-01 1.9177423337E-01 1.7225200095E-01 1.5343329037E-01 + 1.3546088278E-01 1.1845195585E-01 1.0249965400E-01 8.7673842851E-02 + 7.4022153762E-02 6.1571422473E-02 5.0329214779E-02 4.0284462837E-02 + 3.1410783526E-02 2.3667878040E-02 1.7003522416E-02 1.1355567187E-02 + 6.6538808501E-03 2.8220678737E-03 -2.2001240793E-04 -2.5547866842E-03 + -4.2654366871E-03 -5.4349451843E-03 -6.1429431867E-03 -6.4660691080E-03 + -6.4767064284E-03 -6.2432624449E-03 -5.8262992394E-03 -5.2807905609E-03 + -4.6556441299E-03 -3.9940297562E-03 -3.3299352117E-03 -2.6921631841E-03 + -2.1041805481E-03 -1.5819270303E-03 -1.1356467997E-03 -7.7141771203E-04 + -4.9002008278E-04 -2.8685599057E-04 -1.5440700844E-04 -8.1295680504E-05 + -5.2958831332E-05 -4.9024979272E-05 -2.2654023957E-05 2.0802206378E-06 + 1.6220072646E-06 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 
0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 
0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 
0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 
0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 + + + 0.0000000000E+00 
-1.1723087215E-02 -2.2970588285E-02 -3.3277762775E-02 + -4.2201369170E-02 -4.9329936250E-02 -5.4293469599E-02 -5.6772419538E-02 + -5.6505748973E-02 -5.3297954463E-02 -4.7024911330E-02 -3.7638432778E-02 + -2.5169454710E-02 -9.7297804400E-03 8.4876559486E-03 2.9210025055E-02 + 5.2087327705E-02 7.6696576823E-02 1.0254737811E-01 1.2908882978E-01 + 1.5571763119E-01 1.8178728239E-01 2.0661822255E-01 2.2950875795E-01 + 2.4974659019E-01 2.6662075364E-01 2.7943382524E-01 2.8751410085E-01 + 2.9022762150E-01 2.8698983512E-01 2.7727667838E-01 2.6063490065E-01 + 2.3669144662E-01 2.0516162976E-01 1.6585646477E-01 1.1868792319E-01 + 6.3673539750E-02 9.3865644130E-04 -6.9282296205E-02 -1.4664653237E-01 + -2.3070525508E-01 -3.2090755014E-01 -4.1660590836E-01 -5.1706326588E-01 + -6.2146146368E-01 -7.2891100660E-01 -8.3846198150E-01 -9.4911597183E-01 + -1.0598387764E+00 -1.1695734189E+00 -1.2772547958E+00 -1.3818230208E+00 + -1.4822368409E+00 -1.5774891053E+00 -1.6666187728E+00 -1.7487234696E+00 + -1.8229735575E+00 -1.8886194365E+00 -1.9450060422E+00 -1.9915764523E+00 + -2.0278831780E+00 -2.0535914046E+00 -2.0684822557E+00 -2.0724589319E+00 + -2.0655426479E+00 -2.0478740211E+00 -2.0197115502E+00 -1.9814248238E+00 + -1.9334892505E+00 -1.8764808008E+00 -1.8110656211E+00 -1.7379906968E+00 + -1.6580733899E+00 -1.5721902388E+00 -1.4812642140E+00 -1.3862522622E+00 + -1.2881324457E+00 -1.1878908963E+00 -1.0865088556E+00 -9.8494996663E-01 + -8.8414798922E-01 -7.8499509134E-01 -6.8833085835E-01 -5.9493213952E-01 + -5.0550383294E-01 -4.2067068746E-01 -3.4097017799E-01 -2.6684648767E-01 + -1.9864560686E-01 -1.3661153627E-01 -8.0883449121E-02 -3.1493861467E-02 + 1.1620322833E-02 4.8599589644E-02 7.9650226738E-02 1.0503788587E-01 + 1.2508039801E-01 1.4013995079E-01 1.5061541504E-01 1.5693366067E-01 + 1.5954043230E-01 1.5889246200E-01 1.5544945597E-01 1.4966651822E-01 + 1.4199014613E-01 1.3284299597E-01 1.2262519169E-01 1.1170778401E-01 + 1.0042928223E-01 8.9095504256E-02 7.7963775928E-02 
6.7255547029E-02 + 5.7152141457E-02 4.7799764163E-02 3.9294804857E-02 3.1701123437E-02 + 2.5050278343E-02 1.9344675342E-02 1.4551822257E-02 1.0621676019E-02 + 7.4866418173E-03 5.0616443312E-03 3.2542385947E-03 1.9673338834E-03 + 1.1012904675E-03 5.6303386455E-04 2.6418408232E-04 1.2416036016E-04 + 8.3638494781E-05 8.3445532246E-05 3.9712888830E-05 -3.5905440177E-06 + -2.7996494097E-06 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 
0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 
0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 
0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 
0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 + + + 0.0000000000E+00 3.5860269827E-03 1.4317078272E-02 3.2112256128E-02 + 5.6837367274E-02 8.8305873299E-02 1.2628021312E-01 1.7047348950E-01 + 2.2055150919E-01 2.7613516420E-01 3.3680313934E-01 4.0209492899E-01 + 4.7151414403E-01 5.4453208758E-01 6.2059157645E-01 6.9911098297E-01 + 7.7948847033E-01 8.6110639245E-01 9.4333582837E-01 1.0255412195E+00 + 1.1070850756E+00 1.1873327175E+00 1.2656570189E+00 1.3414431144E+00 + 1.4140930355E+00 1.4830302391E+00 1.5477039940E+00 1.6075935868E+00 + 1.6622123152E+00 1.7111112337E+00 1.7538826190E+00 1.7901631245E+00 + 1.8196365952E+00 1.8420365276E+00 1.8571481070E+00 1.8648098949E+00 + 1.8649150077E+00 1.8574119504E+00 1.8423049075E+00 1.8196536236E+00 + 1.7895728317E+00 1.7522312003E+00 1.7078498355E+00 1.6567003418E+00 + 1.5991024567E+00 1.5354212770E+00 1.4660641009E+00 1.3914769134E+00 + 1.3121405542E+00 1.2285667331E+00 1.1412932403E+00 1.0508797039E+00 + 9.5790298913E-01 8.6295169350E-01 7.6662172078E-01 6.6951129918E-01 + 5.7221537424E-01 4.7532181715E-01 3.7940524579E-01 2.8502383408E-01 + 1.9271369849E-01 1.0298539150E-01 1.6320224853E-02 -6.6834318537E-02 + -1.4606648327E-01 -2.2100368547E-01 -2.9131520005E-01 -3.5671342275E-01 + -4.1695549560E-01 -4.7184486654E-01 -5.2123135255E-01 -5.6501133337E-01 + -6.0312746230E-01 -6.3556796384E-01 -6.6236527842E-01 -6.8359449970E-01 + -6.9937142322E-01 -7.0985023488E-01 
-7.1522090252E-01 -7.1570630926E-01 + -7.1155917402E-01 -7.0305880711E-01 -6.9050775058E-01 -6.7422835456E-01 + -6.5455934146E-01 -6.3185241011E-01 -6.0646893131E-01 -5.7877678564E-01 + -5.4914739239E-01 -5.1795297687E-01 -4.8556415192E-01 -4.5234782509E-01 + -4.1866194064E-01 -3.8484854594E-01 -3.5122924483E-01 -3.1810273720E-01 + -2.8574272301E-01 -2.5439648427E-01 -2.2428377777E-01 -1.9559309794E-01 + -1.6848457254E-01 -1.4308867914E-01 -1.1950650654E-01 -9.7810361217E-02 + -7.8045864567E-02 -6.0228944123E-02 -4.4352023434E-02 -3.0384350338E-02 + -1.8273960617E-02 -7.9495883994E-03 6.7759973479E-04 7.7099198058E-03 + 1.3261526160E-02 1.7457218324E-02 2.0427124578E-02 2.2305646179E-02 + 2.3229681315E-02 2.3338002873E-02 2.2762351834E-02 2.1631893993E-02 + 2.0072411241E-02 1.8198596607E-02 1.6115058374E-02 1.3919754929E-02 + 1.1698898123E-02 9.5228516845E-03 7.4546618226E-03 5.5452981103E-03 + 3.8288793415E-03 2.3331720374E-03 1.0736899984E-03 5.2926998136E-05 + -7.3149996906E-04 -1.2929678601E-03 -1.6511600028E-03 -1.8297965622E-03 + -1.8585360541E-03 -1.7678438403E-03 -1.5911002682E-03 -1.3586955246E-03 + -1.1019690505E-03 -8.4702839256E-04 -6.1497715279E-04 -4.2541468309E-04 + -2.8402093044E-04 -1.9725198792E-04 -1.6029137555E-04 -1.5177315345E-04 + -8.9838576118E-05 -5.4771430827E-06 9.6147639048E-06 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 
0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 
0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 
0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 
0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 + + + 0.0000000000E+00 9.2893242255E-04 3.7019764676E-03 8.2779769445E-03 + 1.4588689808E-02 2.2539302945E-02 3.2009163013E-02 4.2852705981E-02 + 5.4900588274E-02 6.7961014587E-02 8.1821257513E-02 9.6249363136E-02 + 1.1099603565E-01 1.2579669284E-01 1.4037368313E-01 1.5443865322E-01 + 1.6769505435E-01 1.7984077312E-01 1.9057087191E-01 1.9958042202E-01 + 2.0656741042E-01 2.1123570101E-01 2.1329802783E-01 2.1247899806E-01 + 2.0851807950E-01 2.0117254726E-01 1.9022036400E-01 1.7546296384E-01 + 1.5672791414E-01 1.3387142592E-01 1.0678068413E-01 7.5375969726E-02 + 3.9612545899E-02 -5.1771162367E-04 -4.4984990254E-02 -9.3720492346E-02 + -1.4661576728E-01 -2.0352233418E-01 -2.6425172921E-01 -3.2857590175E-01 + -3.9622799242E-01 -4.6690352048E-01 
-5.4026196118E-01 -6.1592871460E-01 + -6.9349745874E-01 -7.7253287529E-01 -8.5257373131E-01 -9.3313629426E-01 + -1.0137180499E+00 -1.0938015836E+00 -1.1728591565E+00 -1.2503568716E+00 + -1.3257591460E+00 -1.3985339592E+00 -1.4681572088E+00 -1.5341175221E+00 + -1.5959215886E+00 -1.6530979289E+00 -1.7052026499E+00 -1.7518226198E+00 + -1.7925806287E+00 -1.8271386946E+00 -1.8552014319E+00 -1.8765200508E+00 + -1.8908940379E+00 -1.8981739659E+00 -1.8982634918E+00 -1.8911201167E+00 + -1.8767560905E+00 -1.8552388890E+00 -1.8266905450E+00 -1.7912868237E+00 + -1.7492558153E+00 -1.7008761570E+00 -1.6464740899E+00 -1.5864208426E+00 + -1.5211291980E+00 -1.4510496961E+00 -1.3766664971E+00 -1.2984929459E+00 + -1.2170668968E+00 -1.1329458601E+00 -1.0467020356E+00 -9.5891730044E-01 + -8.7017821889E-01 -7.8107114433E-01 -6.9217748077E-01 -6.0406917196E-01 + -5.1730448344E-01 -4.3242413992E-01 -3.4994793756E-01 -2.7037179974E-01 + -1.9415879235E-01 -1.2172780433E-01 -5.3447524144E-02 1.0365478214E-02 + 6.9448141778E-02 1.2359143475E-01 1.7264050244E-01 2.1649715017E-01 + 2.5511519340E-01 2.8850025641E-01 3.1670773256E-01 3.3984027408E-01 + 3.5804463559E-01 3.7150996533E-01 3.8046133188E-01 3.8515667437E-01 + 3.8588276317E-01 3.8295243350E-01 3.7669486668E-01 3.6745451357E-01 + 3.5558583975E-01 3.4145309072E-01 3.2541361978E-01 3.0782259726E-01 + 2.8902885306E-01 2.6937446596E-01 2.4917322584E-01 2.2872577817E-01 + 2.0831749242E-01 1.8820101658E-01 1.6860103788E-01 1.4972254242E-01 + 1.3173967350E-01 1.1478834281E-01 9.8985070983E-02 8.4416837992E-02 + 7.1132442003E-02 5.9164183104E-02 4.8515945607E-02 3.9162908641E-02 + 3.1067102133E-02 2.4166226359E-02 1.8383581049E-02 1.3631066681E-02 + 9.8082692597E-03 6.8108066814E-03 4.5261412594E-03 2.8479629339E-03 + 1.6647468801E-03 8.7544647586E-04 3.8748703362E-04 1.0892680591E-04 + -2.3425222903E-05 -7.7922079727E-05 -9.4002294959E-05 -9.5906384375E-05 + -5.6988379155E-05 -3.5463910332E-06 5.9608986436E-06 0.0000000000E+00 + 
0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 
0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 
0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 
0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 + + + 1.9514303897E+01 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 2.7522534413E+00 
0.0000000000E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 -9.6137176497E+00 0.0000000000E+00 + 0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 -3.2324794045E+00 + + + + + + 0.0000000000E+00 2.4555322044E-04 9.9435596078E-04 2.2826802809E-03 + 4.1704577663E-03 6.7405039734E-03 1.0097446971E-02 1.4366374033E-02 + 1.9691214178E-02 2.6232877622E-02 3.4167176269E-02 4.3682552029E-02 + 5.4977642082E-02 6.8258712011E-02 8.3736989112E-02 1.0162592925E-01 + 1.2213845080E-01 1.4548416964E-01 1.7186666804E-01 2.0148082992E-01 + 2.3451027351E-01 2.7112491013E-01 3.1147865648E-01 3.5570732483E-01 + 4.0392671304E-01 4.5623091264E-01 5.1269085328E-01 5.7335309214E-01 + 6.3823886033E-01 7.0734337238E-01 7.8063540062E-01 8.5805711439E-01 + 9.3952417996E-01 1.0249261017E+00 1.1141268534E+00 1.2069656628E+00 + 1.3032581166E+00 1.4027973581E+00 1.5053555697E+00 1.6106855669E+00 + 1.7185225365E+00 1.8285859201E+00 1.9405813920E+00 2.0542029191E+00 + 2.1691348784E+00 2.2850542133E+00 2.4016326049E+00 2.5185386399E+00 + 2.6354399503E+00 2.7520052604E+00 2.8679065939E+00 2.9828210875E+00 + 3.0964328328E+00 3.2084348565E+00 3.3185305939E+00 3.4264354515E+00 + 3.5318783577E+00 3.6346027795E+00 3.7343681356E+00 3.8309505035E+00 + 3.9241436467E+00 4.0137595888E+00 4.0996291450E+00 4.1816023622E+00 + 4.2595486912E+00 4.3333571276E+00 4.4029361120E+00 4.4682134089E+00 + 4.5291358858E+00 4.5856689029E+00 4.6377959244E+00 4.6855179084E+00 + 4.7288525836E+00 4.7678337886E+00 4.8025102209E+00 4.8329448489E+00 + 4.8592138549E+00 4.8814055976E+00 4.8996195505E+00 4.9139652152E+00 + 4.9245610181E+00 4.9315332006E+00 4.9350147101E+00 4.9351441007E+00 + 4.9320644514E+00 4.9259223085E+00 4.9168666605E+00 4.9050479496E+00 + 4.8906171284E+00 4.8737247647E+00 4.8545202011E+00 4.8331507724E+00 + 4.8097610853E+00 4.7844923629E+00 4.7574818564E+00 4.7288623255E+00 + 4.6987615887E+00 4.6673022280E+00 4.6346013382E+00 4.6007695137E+00 + 4.5659116073E+00 4.5301262966E+00 4.4935060361E+00 
4.4561370652E+00 + 4.4180998488E+00 4.3794678786E+00 4.3403092049E+00 4.3006862196E+00 + 4.2606559084E+00 4.2202701844E+00 4.1795755629E+00 4.1386143513E+00 + 4.0974246608E+00 4.0560406954E+00 4.0144930190E+00 3.9728092235E+00 + 3.9310140868E+00 3.8891296584E+00 3.8471763585E+00 3.8051727118E+00 + 3.7631353587E+00 3.7210801524E+00 3.6790222306E+00 3.6369754752E+00 + 3.5949532922E+00 3.5529695496E+00 3.5110371051E+00 3.4691686011E+00 + 3.4273777892E+00 3.3856773650E+00 3.3440802075E+00 3.3026002179E+00 + 3.2612500462E+00 3.2200431162E+00 3.1789931977E+00 3.1381126416E+00 + 3.0974153228E+00 3.0569141416E+00 3.0166213097E+00 2.9765501384E+00 + 2.9367121016E+00 2.8971193603E+00 2.8577833885E+00 2.8187144722E+00 + 2.7799237680E+00 2.7414203117E+00 2.7032138975E+00 2.6653129770E+00 + 2.6277254184E+00 2.5904592077E+00 2.5535203424E+00 2.5169161760E+00 + 2.4806515332E+00 2.4447325241E+00 2.4091633992E+00 2.3739487914E+00 + 2.3390925610E+00 2.3045979624E+00 2.2704685078E+00 2.2367061977E+00 + 2.2033142099E+00 2.1702934279E+00 2.1376466921E+00 2.1053739087E+00 + 2.0734775689E+00 2.0419567389E+00 2.0108135020E+00 1.9800463428E+00 + 1.9496568192E+00 1.9196429197E+00 1.8900057747E+00 1.8607428608E+00 + 1.8318549701E+00 1.8033391361E+00 1.7751957603E+00 1.7474216534E+00 + 1.7200166764E+00 1.6929775891E+00 1.6663036438E+00 1.6399916950E+00 + 1.6140403080E+00 1.5884465799E+00 1.5632083065E+00 1.5383229695E+00 + 1.5137875184E+00 1.4895999441E+00 1.4657562937E+00 1.4422551560E+00 + 1.4190916630E+00 1.3962646344E+00 1.3737695287E+00 1.3516041505E+00 + 1.3297648393E+00 1.3082482527E+00 1.2870517468E+00 1.2661707761E+00 + 1.2456035241E+00 1.2253449792E+00 1.2053930652E+00 1.1857436665E+00 + 1.1663933299E+00 1.1473392986E+00 1.1285766929E+00 1.1101036434E+00 + 1.0919153412E+00 1.0740090637E+00 1.0563813566E+00 1.0390278824E+00 + 1.0219464881E+00 1.0051322370E+00 9.8858273586E-01 9.7229433753E-01 + 9.5626288454E-01 9.4048618717E-01 9.2495945205E-01 9.0968023104E-01 + 8.9464515722E-01 
8.7984988668E-01 8.6529242529E-01 8.5096836827E-01 + 8.3687483992E-01 8.2300917666E-01 8.0936653060E-01 7.9594527075E-01 + 7.8274162686E-01 7.6975194584E-01 7.5697415488E-01 7.4440404132E-01 + 7.3203906033E-01 7.1987668706E-01 7.0791241321E-01 6.9614463944E-01 + 6.8457015681E-01 6.7318513862E-01 6.6198789975E-01 6.5097491802E-01 + 6.4014312669E-01 6.2949060514E-01 6.1901366286E-01 6.0870987684E-01 + 5.9857713213E-01 5.8861165082E-01 5.7881152804E-01 5.6917450923E-01 + 5.5969679766E-01 5.5037688431E-01 5.4121243428E-01 5.3219968832E-01 + 5.2333742019E-01 5.1462325731E-01 5.0605357159E-01 4.9762726943E-01 + 4.8934201433E-01 4.8119433956E-01 4.7318319951E-01 4.6530638099E-01 + 4.5756049431E-01 4.4994457445E-01 4.4245651371E-01 4.3509310405E-01 + 4.2785330791E-01 4.2073518522E-01 4.1373575057E-01 4.0685380826E-01 + 4.0008762412E-01 3.9343446965E-01 3.8689291717E-01 3.8046146824E-01 + 3.7413768570E-01 3.6791984300E-01 3.6180669955E-01 3.5579614205E-01 + 3.4988609008E-01 3.4407557281E-01 3.3836282768E-01 3.3274538488E-01 + 3.2722254123E-01 3.2179272713E-01 3.1645359987E-01 3.1120418021E-01 + 3.0604318897E-01 3.0096867591E-01 2.9597910956E-01 2.9107357482E-01 + 2.8625054370E-01 2.8150791508E-01 2.7684511501E-01 2.7226081425E-01 + 2.6775306980E-01 2.6332090390E-01 2.5896332932E-01 2.5467885571E-01 + 2.5046582167E-01 2.4632363995E-01 2.4225111660E-01 2.3824645700E-01 + 2.3430890125E-01 2.3043753190E-01 2.2663103075E-01 2.2288787866E-01 + 2.1920758650E-01 2.1558909889E-01 2.1203087347E-01 2.0853209698E-01 + 2.0509203703E-01 2.0170965015E-01 1.9838332245E-01 1.9511274762E-01 + 1.9189700637E-01 1.8873489405E-01 1.8562532983E-01 1.8256784638E-01 + 1.7956156317E-01 1.7660516721E-01 1.7369802148E-01 1.7083951993E-01 + 1.6802882312E-01 1.6526455844E-01 1.6254643724E-01 1.5987374760E-01 + 1.5724562162E-01 1.5466088112E-01 1.5211928277E-01 1.4962011532E-01 + 1.4716247731E-01 1.4474543496E-01 1.4236869347E-01 1.4003158209E-01 + 1.3773319599E-01 1.3547277982E-01 1.3325001085E-01 
1.3106425897E-01 + 1.2891464189E-01 1.2680052175E-01 1.2472157135E-01 1.2267720085E-01 + 1.2066657246E-01 1.1868911133E-01 1.1674450739E-01 1.1483221045E-01 + 1.1295144579E-01 1.1110165375E-01 1.0928256040E-01 1.0749365423E-01 + 1.0573423961E-01 1.0400373057E-01 1.0230190533E-01 1.0062828986E-01 + 9.8982281391E-02 9.7363232570E-02 9.5770986299E-02 9.4205104640E-02 + 9.2665089253E-02 9.1150203181E-02 8.9660362635E-02 8.8195164240E-02 + 8.6754184314E-02 8.5336692055E-02 8.3942565008E-02 8.2571471377E-02 + 8.1223022254E-02 7.9896618189E-02 7.8591960933E-02 7.7308833712E-02 + 7.6046880544E-02 7.4805642086E-02 7.3584629176E-02 7.2383740506E-02 + 7.1202651131E-02 7.0041020775E-02 6.8898242541E-02 6.7774247481E-02 + 6.6668765461E-02 6.5581486791E-02 6.4511958509E-02 6.3459869402E-02 + 6.2425088961E-02 6.1407336085E-02 6.0406316994E-02 5.9421479885E-02 + 5.8452822608E-02 5.7500090972E-02 5.6563019041E-02 5.5641218020E-02 + 5.4734411156E-02 5.3842496378E-02 5.2965233586E-02 5.2102372240E-02 + 5.1253457601E-02 5.0418435755E-02 4.9597119611E-02 4.8789283548E-02 + 4.7994643509E-02 4.7212852717E-02 4.6443878327E-02 4.5687517686E-02 + 4.4943559586E-02 4.4211678082E-02 4.3491681375E-02 4.2783483184E-02 + 4.2086894267E-02 4.1401717619E-02 4.0727603994E-02 4.0064476856E-02 + 3.9412211446E-02 3.8770631668E-02 3.8139554396E-02 3.7518622579E-02 + 3.6907839114E-02 3.6307055480E-02 3.5716108368E-02 3.5134814385E-02 + 3.4562862225E-02 3.4000262270E-02 3.3446869273E-02 3.2902532276E-02 + 3.2367070616E-02 3.1840217984E-02 3.1321976903E-02 3.0812213425E-02 + 3.0310788441E-02 2.9817533657E-02 2.9332204905E-02 2.8854805910E-02 + 2.8385213523E-02 2.7923299942E-02 2.7468917225E-02 2.7021823958E-02 + 2.6582032748E-02 2.6149430718E-02 2.5723900801E-02 2.5305321709E-02 + 2.4893439431E-02 2.4488280997E-02 2.4089743440E-02 2.3697719832E-02 + 2.3312099450E-02 2.2932666819E-02 2.2559404233E-02 2.2192239239E-02 + 2.1831074449E-02 2.1475809062E-02 2.1126270265E-02 2.0782385275E-02 + 2.0444115841E-02 
2.0111373508E-02 1.9784066758E-02 1.9462068697E-02 + 1.9145243997E-02 1.8833590598E-02 1.8527028380E-02 1.8225474479E-02 + 1.7928843269E-02 1.7636952905E-02 1.7349817563E-02 1.7067371509E-02 + 1.6789539913E-02 1.6516245476E-02 1.6247359808E-02 1.5982811518E-02 + 1.5722581767E-02 1.5466603166E-02 1.5214806114E-02 1.4967116953E-02 + 1.4723378333E-02 1.4483616032E-02 1.4247769523E-02 1.4015776305E-02 + 1.3787571887E-02 1.3563051106E-02 1.3342146415E-02 1.3124847757E-02 + 1.2911099151E-02 1.2700842848E-02 1.2494019314E-02 1.2290502671E-02 + 1.2090293497E-02 1.1893354059E-02 1.1699632778E-02 1.1509076483E-02 + 1.1321619331E-02 1.1137157247E-02 1.0955705606E-02 1.0777218459E-02 + 1.0601648447E-02 1.0428946789E-02 1.0259036651E-02 1.0091856576E-02 + 9.9274061406E-03 9.7656432701E-03 9.6065246281E-03 9.4500056081E-03 + 9.2960048318E-03 9.1444890831E-03 8.9954481332E-03 8.8488435842E-03 + 8.7046359123E-03 8.5627844608E-03 8.4232085626E-03 8.2858909371E-03 + 8.1508168888E-03 8.0179514903E-03 7.8872588105E-03 7.7587019092E-03 + 7.6322050317E-03 7.5077566051E-03 7.3853419925E-03 7.2649295244E-03 + 7.1464866384E-03 7.0299798746E-03 6.9153417240E-03 6.8025579795E-03 + 6.6916177587E-03 6.5824924357E-03 6.4751525918E-03 6.3695680107E-03 + 6.2656820661E-03 6.1634712357E-03 6.0629312937E-03 5.9640364449E-03 + 5.8667601907E-03 5.7710753252E-03 5.6769381537E-03 5.5843106621E-03 + 5.4931974191E-03 5.4035752507E-03 5.3154203599E-03 5.2287083230E-03 + 5.1434098717E-03 5.0594687788E-03 4.9768997803E-03 4.8956821190E-03 + 4.8157944869E-03 4.7372150220E-03 4.6599213051E-03 4.5838621307E-03 + 4.5090375337E-03 4.4354374536E-03 4.3630428668E-03 4.2918342606E-03 + 4.2217916304E-03 4.1528822007E-03 4.0850770685E-03 4.0183813155E-03 + 3.9527780072E-03 3.8882497776E-03 3.8247788275E-03 3.7623469216E-03 + 3.7009082742E-03 3.6404704082E-03 3.5810223207E-03 3.5225486012E-03 + 3.4650334570E-03 3.4084607124E-03 3.3528049178E-03 3.2980391006E-03 + 3.2441697846E-03 3.1911833332E-03 3.1390657742E-03 
3.0878027987E-03 + 3.0373797587E-03 2.9877641034E-03 2.9389511875E-03 2.8909379581E-03 + 2.8437120887E-03 2.7972609573E-03 2.7515716446E-03 2.7066309326E-03 + 2.6624026263E-03 2.6188968441E-03 2.5761042605E-03 2.5340137731E-03 + 2.4926140192E-03 2.4518933742E-03 2.4118369488E-03 2.3724166399E-03 + 2.3336420418E-03 2.2955034166E-03 2.2579907985E-03 2.2210939931E-03 + 2.1848025759E-03 2.1491007974E-03 2.1139679998E-03 2.0794109580E-03 + 2.0454209566E-03 2.0119890803E-03 1.9791062132E-03 1.9467630375E-03 + 1.9149447857E-03 1.8836339902E-03 1.8528364927E-03 1.8225445192E-03 + 1.7927501202E-03 1.7634451708E-03 1.7346213692E-03 1.7062663468E-03 + 1.6783624620E-03 1.6509161204E-03 1.6239204100E-03 1.5973682657E-03 + 1.5712524689E-03 1.5455656462E-03 + + diff --git a/tests/integrate/tools/catch_properties.sh b/tests/integrate/tools/catch_properties.sh index 859d35309fc..8252a184566 100755 --- a/tests/integrate/tools/catch_properties.sh +++ b/tests/integrate/tools/catch_properties.sh @@ -157,7 +157,7 @@ fi # echo "has_stress:"$has_stress #------------------------------- if ! test -z "$has_stress" && [ $has_stress == 1 ]; then - grep -A6 "TOTAL-STRESS" $running_path| awk 'NF==3' | tail -3> stress.txt + grep -A6 "TOTAL-STRESS" $running_path| awk '/^[[:space:]]*-?[0-9]/' | head -3> stress.txt total_stress=`sum_file stress.txt` rm stress.txt echo "totalstressref $total_stress" >>$1
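
A minimal sketch of what the updated stress filter in `catch_properties.sh` selects. The log excerpt below is a hypothetical stand-in, not real ABACUS output, and the helper names `sample_log` and `extract_stress_rows` are illustrative only. The pipeline itself mirrors the new line in the hunk above: keep lines that start with an optional minus sign and a digit, then take the first three.

```shell
#!/bin/sh
# Hypothetical excerpt of a running log around the TOTAL-STRESS header.
# The exact layout of the real log is an assumption made for illustration.
sample_log() {
cat <<'EOF'
 TOTAL-STRESS (KBAR)
 ------------------------------------------
     1.0000000000     0.0000000000     0.0000000000
    -2.5000000000     4.0000000000     0.0000000000
     0.0000000000     0.0000000000     3.0000000000
 ------------------------------------------
 TOTAL-PRESSURE: 2.666667 KBAR
EOF
}

# Same filter as the updated script line, reading from stdin instead of
# $running_path: grab the header plus six following lines, keep only the
# lines whose first non-blank characters are an optional '-' and a digit,
# and keep the first three of those.
extract_stress_rows() {
    grep -A6 "TOTAL-STRESS" | awk '/^[[:space:]]*-?[0-9]/' | head -3
}

sample_log | extract_stress_rows
```

On this sample, the separator lines are rejected because their leading `-` is not followed by a digit, and the header and pressure lines are rejected because they start with letters, so only the three matrix rows pass through.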