Ml #3019

Closed
yunusemrecagliyan wants to merge 2 commits into ERGO-Code:master from yunusemrecagliyan:ml

Conversation

@yunusemrecagliyan

No description provided.

…ations

- HVectorBase: all hot functions (saxpy, tight, clear, norm2, copy) inline in header
- Saxpy: split-loop prefetch at distance 8 for L1/L2 cache
- HPreFetch.h: platform-independent prefetch + likely/unlikely macros
- HFactor: prefetch in ftranL/btranL inner loops, insertion sort, likely/unlikely
- computeDot/collectAj: inline in header (eliminate pricing call overhead)
- HighsSeparation: forced 6 separation rounds at tree nodes
- HighsTableauSeparator: numTries=0 reset + 50x pool limit
- HighsSeparator: virtual resetTries() for tree-node reset
- HEkkDualRHS: pdqsort -> partial_sort (O(n log n) -> O(n log k))
- HSimplexNlaProductForm: kProductFormMaxUpdates 50->100 (2x fewer rebuilds)
- HighsSearch: redundant propagate skip when no bound changes
- HighsDomain: insertion sort for tiny resolveBuffer arrays
- Search: redundant propagate skip after reduced cost fixing
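
To make the "split-loop prefetch at distance 8" item concrete, here is a minimal sketch of the idea, not the code from this PR. The macro names `SKETCH_PREFETCH_WRITE` and `SKETCH_LIKELY` and the function `sparseSaxpy` are hypothetical, and the actual `HPreFetch.h` and `HVectorBase::saxpy` may look quite different. The loop is split so that the main part can prefetch eight entries ahead without a bounds check, and a short tail handles the last entries.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical portable wrappers (assumed names, not taken from HPreFetch.h).
// On compilers without the GCC/Clang builtins they compile to no-ops.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1, 3)
#define SKETCH_LIKELY(x) __builtin_expect(!!(x), 1)
#else
#define SKETCH_PREFETCH_WRITE(addr) ((void)0)
#define SKETCH_LIKELY(x) (x)
#endif

// Computes y[index[k]] += alpha * value[k] for a sparse vector stored as
// parallel (index, value) arrays. Indices are assumed to be valid for y.
inline void sparseSaxpy(std::vector<double>& y, double alpha,
                        const std::vector<int>& index,
                        const std::vector<double>& value) {
  constexpr std::size_t kDistance = 8;  // prefetch distance from the PR notes
  const std::size_t n = index.size();
  const std::size_t main_end = n > kDistance ? n - kDistance : 0;
  std::size_t k = 0;
  // Main loop: index[k + kDistance] is always in range, so no check is needed.
  for (; k < main_end; ++k) {
    SKETCH_PREFETCH_WRITE(&y[index[k + kDistance]]);
    y[index[k]] += alpha * value[k];
  }
  // Tail loop: the remaining entries have nothing left to prefetch.
  for (; k < n; ++k) y[index[k]] += alpha * value[k];
}
```

Whether a prefetch like this helps depends heavily on the access pattern and the hardware, so it would need benchmarking on representative instances.
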
@Opt-Mucca
Collaborator

@yunusemrecagliyan Have you noticed any improvements from computing parallel cuts? I don't see any changes here on suppressing global information that could be derived during separation, e.g., cliques.

@yunusemrecagliyan
Author

> @yunusemrecagliyan Have you noticed any improvements from computing parallel cuts? I don't see any changes here on suppressing global information that could be derived during separation, e.g., cliques.

This was an accidental PR; I was just doing some experiments. Honestly, it triggered a bunch of RAM access issues, and the performance gains were either negligible or even worse in most cases. Making this work properly would require significant time and effort.

Also, the parallel improvements in this version were different from the version I actually intended to merge. While parallel processing can speed up certain problems almost proportionally to the thread count, it is far from being a universal solution that works well across all problem types.

The main gain actually came from using insertion sort instead of pdqsort, especially for speeding up the early stages. However, even that only improved early-stage performance for some specific problems, nothing more.
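
For readers unfamiliar with the trick, a minimal sketch of switching to insertion sort for tiny arrays is shown below. It is an illustration under stated assumptions, not the code from this PR: `sortSmallAware`, `insertionSort`, and the threshold `kSmallSortLimit` are hypothetical names and values, and `std::sort` stands in for pdqsort.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical cut-off; a real value would have to be tuned by benchmarking.
constexpr std::size_t kSmallSortLimit = 16;

// Plain insertion sort: O(n^2) in the worst case, but with very little
// overhead, which is why it can win on arrays of only a few elements.
template <typename T>
void insertionSort(std::vector<T>& a) {
  for (std::size_t i = 1; i < a.size(); ++i) {
    T key = std::move(a[i]);
    std::size_t j = i;
    // Shift larger elements one slot to the right to make room for key.
    while (j > 0 && key < a[j - 1]) {
      a[j] = std::move(a[j - 1]);
      --j;
    }
    a[j] = std::move(key);
  }
}

// Uses insertion sort for tiny inputs and a general-purpose sort otherwise.
// std::sort is a stand-in here for the pdqsort mentioned in the thread.
template <typename T>
void sortSmallAware(std::vector<T>& a) {
  if (a.size() <= kSmallSortLimit) {
    insertionSort(a);
  } else {
    std::sort(a.begin(), a.end());
  }
}
```

This matches the observation above that the benefit is limited: the gain only shows up where many very short arrays are sorted.
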

@Opt-Mucca
Collaborator

@yunusemrecagliyan is the parallel performance comment specifically related to cut generation, or something more general?

Thanks for the interesting note on pdqsort!

@yunusemrecagliyan
Author

@Opt-Mucca It was more of a general parallelization attempt. I didn't specifically benchmark or track the performance gains of each individual task implementation across different workloads.

@Opt-Mucca
Collaborator

There's a general parallel attempt in #2886
Any feedback is welcome!
