Ml#3019
…ations
- HVectorBase: all hot functions (saxpy, tight, clear, norm2, copy) inline in header
- Saxpy: split-loop prefetch at distance 8 for L1/L2 cache
- HPreFetch.h: platform-independent prefetch + likely/unlikely macros
- HFactor: prefetch in ftranL/btranL inner loops, insertion sort, likely/unlikely
- computeDot/collectAj: inline in header (eliminate pricing call overhead)
- HighsSeparation: forced 6 separation rounds at tree nodes
- HighsTableauSeparator: numTries=0 reset + 50x pool limit
- HighsSeparator: virtual resetTries() for tree-node reset
- HEkkDualRHS: pdqsort -> partial_sort (O(n log n) -> O(n log k))
- HSimplexNlaProductForm: kProductFormMaxUpdates 50->100 (2x fewer rebuilds)
- HighsSearch: redundant propagate skip when no bound changes
- HighsDomain: insertion sort for tiny resolveBuffer arrays
- Search: redundant propagate skip after reduced cost fixing
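For readers unfamiliar with the `pdqsort -> partial_sort` item, here is a minimal sketch of the general idea, not the code from this PR. When only the best `k` of `n` candidates are needed, `std::partial_sort` orders just those `k`, which takes roughly O(n log k) comparisons instead of O(n log n). The `Candidate` type and the `topCandidates` helper are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical candidate record: (weight, row index). Not a HiGHS type.
using Candidate = std::pair<double, int>;

// Return the k largest candidates in descending order.
// Only the first k positions are fully ordered, so the cost is roughly
// O(n log k) comparisons rather than O(n log n) for a full sort.
std::vector<Candidate> topCandidates(std::vector<Candidate> candidates,
                                     std::size_t k) {
  k = std::min(k, candidates.size());
  const auto middle = candidates.begin() + static_cast<std::ptrdiff_t>(k);
  std::partial_sort(candidates.begin(), middle, candidates.end(),
                    std::greater<Candidate>());
  // Debug-only check that the leading k entries are in descending order.
  assert(std::is_sorted(candidates.begin(), middle,
                        std::greater<Candidate>()));
  candidates.resize(k);  // drop the unordered tail
  return candidates;
}
```

The saving only matters when `k` is much smaller than `n`; for `k` close to `n` a full sort is usually the better choice.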
…l option to LP relaxation
|
@yunusemrecagliyan Have you noticed any improvements from computing parallel cuts? I don't see any changes here on suppressing global information that could be derived during separation, e.g., cliques. |
This was an accidental PR; I was just running some experiments. Honestly, it triggered a bunch of RAM access issues, and in most cases performance was either barely better or actually worse. Making this work properly would require significant time and effort. Also, the parallel improvements in this version differ from the version I actually intended to merge. While parallel processing can speed up certain problems almost proportionally to the thread count, it is far from a universal solution that works well across all problem types. The main gain actually came from using insertion sort instead of pdqsort, especially in the early stages of the solve. However, even that only improved early-stage performance on some specific problems, nothing more. |
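As a rough illustration of the insertion-sort point above, the sketch below shows the usual small-array cutoff pattern. It is not the code from this PR. The cutoff value `kSmallSortCutoff` and the function names are hypothetical, and `std::sort` stands in for pdqsort as the general-purpose fallback.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical cutoff; the PR does not state which threshold it used.
constexpr std::size_t kSmallSortCutoff = 16;

// Plain insertion sort: O(n^2) in the worst case, but with very low
// constant overhead, which tends to win on tiny arrays.
template <typename T>
void insertionSort(std::vector<T>& a) {
  for (std::size_t i = 1; i < a.size(); ++i) {
    T key = std::move(a[i]);
    std::size_t j = i;
    // Shift larger elements one slot to the right.
    while (j > 0 && key < a[j - 1]) {
      a[j] = std::move(a[j - 1]);
      --j;
    }
    a[j] = std::move(key);
  }
}

// Use insertion sort below the cutoff, a general-purpose sort otherwise.
// std::sort is a stand-in here for pdqsort.
template <typename T>
void sortSmallAware(std::vector<T>& a) {
  if (a.size() <= kSmallSortCutoff) {
    insertionSort(a);
  } else {
    std::sort(a.begin(), a.end());
  }
}
```

Whether this helps depends on how often the sorted arrays are actually tiny, which matches the observation that the gain showed up only on some problems.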
|
@yunusemrecagliyan is the parallel performance comment specifically related to cut generation, or something more general? Thanks for the interesting note on |
|
@Opt-Mucca It was more of a general parallelization attempt. I didn't specifically benchmark or track the performance gains of each individual task implementation across different workloads. |
|
There's a general parallel attempt in #2886 |