Attempt to add ExecuteDFS to main (Generic depth first search) #737


Open · wants to merge 41 commits into base: main

Conversation

saffronmciver
Contributor

This pull request will build on #505 and #499 and attempt to add ExecuteDFS to the main branch again.

My aim is to add back the fixed versions of functions that use the generic DFS, add more tests to ensure they are correct, and look at improving the performance of these functions.

I will also look at improving the performance of ExecuteDFS itself.

@saffronmciver
Contributor Author

So far I have refactored the iterative implementation of ExecuteDFS into a recursive one. The iterative stack can grow very large, since all successors are pushed to the stack if they have not been visited. If current is the node being visited and successors := [1, 2] is the list of successors of current, and both are unvisited, vertex 2 is pushed to the stack even if it would be visited in the branch where 1 is visited. The iterative solution deals with this by checking whether the current node has already been visited when it is popped, and continuing the loop if so.
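An illustrative Python sketch (not the PR's C code) of the iterative scheme described above: every unvisited successor is pushed, so a vertex may sit on the stack more than once, and the pop-time visited check is what skips the duplicates.

```python
def iterative_dfs(adj, start):
    visited = set()
    order = []       # order in which vertices are first visited
    max_stack = 0    # how large the stack grows
    stack = [start]
    while stack:
        max_stack = max(max_stack, len(stack))
        current = stack.pop()
        if current in visited:  # duplicate entry on the stack: skip it
            continue
        visited.add(current)
        order.append(current)
        for succ in reversed(adj[current]):
            if succ not in visited:
                stack.append(succ)  # may already be on the stack elsewhere
    return order, max_stack
```

On dense inputs such as complete digraphs, most of the stack traffic is exactly these duplicate entries.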

However, this has implications for AncestorFunc and CrossFunc. If (current, 2) is a back edge or cross edge, the relevant function is called before any calls to PreorderFunc, PostorderFunc, etc. are made while visiting the branch of 1. With recursion, AncestorFunc or CrossFunc is naturally called after the recursive call exploring 1 is complete.

I believe removing this behaviour and using recursion would also allow forward edges to be recognised during the DFS, if that is useful at all.
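A hedged sketch of how recursion makes the full edge classification, including forward edges, fall out naturally (callback names such as AncestorFunc are from the discussion above; the implementation details are illustrative): a successor already in preorder but not yet in postorder is an ancestor, and a finished successor is a forward or cross edge depending on preorder times.

```python
def classify_edges(adj, start):
    preorder, postorder = {}, {}
    clock = [0]
    edges = {"tree": [], "back": [], "forward": [], "cross": []}

    def visit(u):
        preorder[u] = clock[0]; clock[0] += 1
        for v in adj[u]:
            if v not in preorder:
                edges["tree"].append((u, v))
                visit(v)                       # explore the branch first
            elif v not in postorder:
                edges["back"].append((u, v))   # ancestor: AncestorFunc here
            elif preorder[v] > preorder[u]:
                edges["forward"].append((u, v))
            else:
                edges["cross"].append((u, v))  # CrossFunc here
        postorder[u] = clock[0]; clock[0] += 1

    visit(start)
    return edges
```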

@saffronmciver
Contributor Author

I have found that using the HashMap from the datastructures package does not improve performance (and in many cases worsens it). Below are my findings from timing various functions across three different implementations:

  1. Using HashMaps without reserving memory
  2. Using plain lists with a BitArray to store bits for visited / backtracked nodes
  3. Using plain lists without the BitArray (using preorder and postorder to determine visited / backtracked nodes)
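Implementation 3 can be sketched as follows (illustrative Python, not the PR's code): the preorder and postorder lists double as the visited and backtracked flags, so no separate BitArray is needed. A vertex is "visited" iff its preorder entry is set, and "backtracked" iff its postorder entry is set.

```python
UNSET = -1  # sentinel for "not yet assigned"

def dfs_no_bitarray(adj, start, n):
    preorder = [UNSET] * (n + 1)   # 1-indexed, like GAP plain lists
    postorder = [UNSET] * (n + 1)
    clock = [0]

    def visit(u):
        preorder[u] = clock[0]; clock[0] += 1
        for v in adj[u]:
            if preorder[v] == UNSET:   # the "visited" check
                visit(v)
        postorder[u] = clock[0]; clock[0] += 1   # marks "backtracked"

    visit(start)
    return preorder, postorder
```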

ArticulationPoints for CompleteDigraph(10000)

| Implementation      | Min Time (ms) | Memory Allocated (Bytes) |
| ------------------- | ------------- | ------------------------ |
| HashMaps            | 13,597        | 925,563,519              |
| PList with BitArray | 10,255        | 923,069,485              |
| PList no BitArray   | 10,205        | 923,070,766              |

There is not much difference in memory, but this is because the current implementation uses much more memory in the non-DFS part of the procedure. The HashMap implementation uses 2,099,704 bytes for regular DFS on the same graph, while the list implementation uses 323,142 bytes. The worse time for HashMaps appears to be caused mainly by lookups: when I reduced the number of lookups I noticed significant improvements.

DigraphTopologicalSort for BinaryTree(21)

| Implementation      | Min Time (ms) | Memory Allocated (Bytes) |
| ------------------- | ------------- | ------------------------ |
| HashMaps            | 1365          | 469,969,923              |
| PList with BitArray | 363           | 107,881,074              |
| PList no BitArray   | 365           | 107,882,284              |

All connected components are explored, so every node has information stored in the hashmaps. I think the large memory increase is due to how each HashMap keeps plain lists for both keys and values, and there are four of these HashMaps in the DFS record. DigraphTopologicalSort uses an AncestorFunc, meaning extra lookups are needed to check whether a neighbour has been backtracked, to determine whether to call the AncestorFunc. The time increase for HashMaps is worse for this function as a result.
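To illustrate the extra lookups, here is a hedged sketch that counts hash lookups during a DFS with and without an AncestorFunc-style callback (CountingDict and the overall structure are purely illustrative): the callback requires an extra "backtracked?" lookup for every already-visited neighbour.

```python
class CountingDict(dict):
    def __init__(self):
        super().__init__()
        self.lookups = 0

    def __contains__(self, key):  # every `in` / `not in` counts as a lookup
        self.lookups += 1
        return super().__contains__(key)

def dfs_lookup_count(adj, start, with_ancestor_func):
    visited, backtracked = CountingDict(), CountingDict()

    def visit(u):
        visited[u] = True
        for v in adj[u]:
            if v not in visited:
                visit(v)
            elif with_ancestor_func and v not in backtracked:
                pass  # AncestorFunc would fire here (back edge)
        backtracked[u] = True

    visit(start)
    return visited.lookups + backtracked.lookups

complete4 = {1: [2, 3, 4], 2: [1, 3, 4], 3: [1, 2, 4], 4: [1, 2, 3]}
```

On this small complete digraph the callback adds one extra lookup per already-visited neighbour on top of the per-edge visited checks.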

DominatorTree(d, 1) for CompleteDigraph(5000)

| Implementation      | Min Time (ms) | Memory Allocated (Bytes) |
| ------------------- | ------------- | ------------------------ |
| HashMaps            | 23,642        | 1,463,632,881            |
| PList with BitArray | 12,203        | 1,461,462,329            |
| PList no BitArray   | 12,005        | 1,461,462,065            |

This is another case of the DFS memory being dominated by the memory used by the procedure itself (which I may look into to see whether it can be improved).

ExecuteDFS (including the record creation)

I also tested only running ExecuteDFS to remove the interference of the work each function does itself.

On BinaryTree(21)

| Implementation      | Min Time (ms) | Memory Allocated (Bytes) |
| ------------------- | ------------- | ------------------------ |
| HashMaps            | 0             | 1692                     |
| PList with BitArray | 122           | 67,111,802               |
| PList no BitArray   | 122           | 67,111,802               |

This example shows the advantage of HashMaps. Since only one node is explored in the DFS, the time taken is dominated by the initialisation of the data structures rather than by the recursive procedure. The whole plain lists are initialised up front, which is not the case for the key and value lists of the HashMap.
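The trade-off can be sketched like this (illustrative Python): the list-based record pays an O(n) initialisation no matter how little of the graph is explored, while the hash map only ever holds entries for vertices the search actually reaches.

```python
def search_counts(n, start, adj):
    visited_map = {}              # grows with the explored region only
    visited_list = [False] * n    # allocated and zero-filled up front
    stack = [start]
    while stack:
        u = stack.pop()
        if u in visited_map:
            continue
        visited_map[u] = True
        visited_list[u] = True
        stack.extend(adj.get(u, []))
    # entries actually stored vs. slots allocated regardless
    return len(visited_map), len(visited_list)
```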

On ChainDigraph(20000)

| Implementation      | Memory Allocated (Bytes) |
| ------------------- | ------------------------ |
| HashMaps            | 4,197,140                |
| PList with BitArray | 642,987                  |

The memory usage is worse for the HashMaps due to the two plain lists each HashMap stores. I have not figured out why it is so much more than 2x worse, though.

On BinaryTree(21) exploring all connected components (record.config.forest := true)

| Implementation      | Min Time (ms) | Memory Allocated (Bytes) |
| ------------------- | ------------- | ------------------------ |
| HashMaps            | 1198          | 402,655,646              |
| PList with BitArray | 200           | 67,111,959               |

As opposed to the case without forest := true, this does force all nodes to be explored (the worst case for HashMaps).

Thoughts on which data structure to choose

I think that, generally, HashMaps are not preferable to plain lists for this algorithm. The worst case for plain lists, where the search starts from a node with no children, is probably not a common one, and the worst case for a HashMap seems much worse. There was only a 122 ms difference for the setup of the PList versions on a graph with 2,097,151 vertices (ExecuteDFS for BinaryTree(21)). The memory usage was much higher than for HashMaps, but it only grows with the number of vertices with no further multiplier (the memory usage for HashMaps grows faster due to the two PLists per HashMap, though it seems to be more than 2x worse).

I will keep the BitArrays for now, in case I remove the use of some record fields under certain configuration options to reduce memory usage when they are not needed. The memory usage is barely different, since only 1 bit is stored per vertex for each of the backtracking and visited arrays.
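For reference, the 1-bit-per-vertex idea can be sketched as follows (illustrative Python, not the PR's C BitArray): a bytearray packs 8 flags per byte, so visited/backtracked flags for n vertices cost roughly n/8 bytes each instead of a full list entry per vertex.

```python
class BitArray:
    def __init__(self, n):
        # one bit per vertex, rounded up to whole bytes
        self.bits = bytearray((n + 7) // 8)

    def set(self, i):
        self.bits[i // 8] |= 1 << (i % 8)

    def get(self, i):
        return bool(self.bits[i // 8] & (1 << (i % 8)))
```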

@james-d-mitchell james-d-mitchell added the performance A label for issues or PR related to performance label Apr 7, 2025
…ingForest, add options to turn off specific record fields when not necessary
@saffronmciver saffronmciver marked this pull request as ready for review April 30, 2025 12:50
@saffronmciver
Contributor Author

saffronmciver commented Apr 30, 2025

Below is an overview of the current state of generic DFS performance for selected examples. All benchmarking was done as an average of 100 runs.

General ExecuteDFS Performance

Performance of ExecuteDFS on CompleteDigraphs

[figure: AverageExecuteDFS]

The above graph shows the performance of ExecuteDFS on CompleteDigraphs of increasing size. Benchmarks include the creation of the DFSRecord, since this is always necessary in practice (and is what speeds up most when removing fields from the record).

The "all fields" lines refer to a DFSRecord with all of use_preorder, use_postorder, use_edge and use_parents set to true, and "minimum fields" refers to as few of these being enabled as possible (for iterative, none; for recursive, only use_parents and use_edge). Iterative with all fields enabled clearly performs worst, and the penalty of the "all fields" configurations appears to grow for larger graphs. It is not completely clear whether iterative or recursive is better for "min fields". The recursive procedure is limited by needing to store the edge and parents fields, although recursive DFS should have a smaller memory footprint (the stack for iterative DFS is unbounded).

Examples for CompleteDigraphs

A CompleteDigraph is a particularly bad case for both procedures, since in iterative DFS all successors that have not been visited are pushed to the stack, and recursive DFS calls its function for each successor. DominatorTree and VerticesReachableFrom appear to show a consistent slowdown as the graph grows (more runs may be needed to investigate DominatorTree on larger graphs, given the jump at size 1500). The slowdown of DominatorTree and VerticesReachableFrom compared to the currently implemented non-generic DFS algorithms is shown below.

Slowdown of DominatorTree and VerticesReachableFrom for CompleteDigraphs

[figures: DominatorTree, VerticesReachableFrom]

UndirectedSpanningForest has a high slowdown factor for CompleteDigraphs. I am not sure whether replacing this algorithm with generic DFS is a good idea, but I would like to investigate further why this occurs, since it was not the case for DominatorTree or VerticesReachableFrom on CompleteDigraph inputs. I should also test with larger CompleteDigraphs, but the runtimes were becoming impractically large.

Slowdown of UndirectedSpanningForest for CompleteDigraphs

[figure: UndirectedSpanningForestComplete]

Slowdown of UndirectedSpanningForest for BinaryTrees

[figure: BinaryTree]

The slowdown is much less significant for BinaryTrees, which exhibit less of the behaviour where a very large number of calls are made trying to visit vertices; for iterative DFS (currently used for this procedure), there are fewer occurrences of large quantities of repeated nodes being pushed to the stack.

Examples for BinaryTrees

The slowdown factors for all of DigraphTopologicalSort, IsAntiSymmetricGraph and IsAcyclicGraph are not significant when called with BinaryTree, as shown below.

Slowdown of UndirectedSpanningForest, DigraphTopologicalSort, IsAntiSymmetricGraph and IsAcyclicGraph for BinaryTrees

[figure: BinaryTree]

Examples for Modified ChainGraphs

For the following examples, I modified a ChainDigraph by concatenating 10 copies of it into a single graph (so each graph consists of 10 ChainDigraphs, all connected to a root node 1). This gives an acyclic example that is not completely trivial.
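A hypothetical construction of such a graph, assuming "concatenate" means ten disjoint chains all attached to a root vertex 1 (the labelling scheme here is illustrative, not the PR's):

```python
def ten_chains(chain_len):
    adj = {1: []}
    next_label = 2
    for _ in range(10):
        adj[1].append(next_label)        # root reaches each chain head
        for _ in range(chain_len - 1):
            adj[next_label] = [next_label + 1]
            next_label += 1
        adj[next_label] = []             # tail of this chain
        next_label += 1
    return adj
```

Each graph has 1 + 10 × chain_len vertices and no cycles.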

Slowdown of DigraphPath and LongestDistanceFromVertex for modified ChainDigraphs

[figure: PathLongestDistance]

The slowdown seems relatively consistent for DigraphPath and LongestDistanceFromVertex as the size of this particular graph type increases.

@saffronmciver
Contributor Author

saffronmciver commented Apr 30, 2025

Some other notes on the state of this procedure

  • Further investigation into whether the recursive or iterative procedure is better might be useful (I think this can simply depend on the type of graph, but I would like to understand why CompleteDigraphs are so much worse for UndirectedSpanningForest).
  • I would like to add more tests to some implementations. I have checked through each one and tried to think of ways it could be incorrect, but I think some further tests would be beneficial.
  • More tests should be added to ensure the behaviour of DFSRecord configurations is correct (especially combinations of them, and to check that ExecuteDFS behaviour is consistent across them where it should be). These options were added recently, so I have not yet got round to writing these tests.
  • Make sure no other existing procedures currently use a non-generic DFS where generic DFS could be substituted without performing too much worse.
  • Pushing backtrack nodes to the iterative stack is not necessary if no PostOrderFunc exists and record.config.use_postorder is false.
  • If the code can be made tidier while keeping the configuration options, that would also be good to change so it is more maintainable.
  • StrongOrientation is currently not implemented for symmetric digraphs (it uses generic DFS; the version in main is also not implemented for these graphs).
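The backtrack-push point above can be sketched as follows (illustrative Python, not the PR's code): a vertex is normally pushed a second time as a backtrack marker so a postorder callback can fire when that marker is popped; when no postorder callback is wanted, that second push can be skipped entirely.

```python
def iterative_dfs(adj, start, postorder_func=None):
    visited = set()
    pushes = 0                       # count pushes after the initial one
    stack = [(start, False)]
    while stack:
        u, backtracking = stack.pop()
        if backtracking:
            postorder_func(u)        # fires once u's subtree is done
            continue
        if u in visited:
            continue
        visited.add(u)
        if postorder_func is not None:
            stack.append((u, True))  # backtrack marker, skipped otherwise
            pushes += 1
        for v in adj[u]:
            if v not in visited:
                stack.append((v, False))
                pushes += 1
    return pushes
```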

Overview of what has been implemented

  • Refactored the original dfs.c file with both an iterative and a recursive procedure. The recursive procedure uses tail call optimisation so that it does not overflow the stack when doing a deep graph search (e.g. ChainDigraph(2000000)).
  • The DFSRecord still uses plain lists, since I found that HashMaps had too much hashing overhead for lookups and assignments to justify the reduction in memory usage for searches where the graph is only partially traversed.
  • Added back each implementation, refactored for the new procedure. Added a DFS-specific version of VerticesReachableFrom for a digraph and a list of vertices (in addition to the version for a single vertex) that uses record.config.forest_specific.
  • Added the config field to DFSRecord to specify which record fields to use, and therefore assign, when calling NewDFSRecord with a specific configuration. Options also exist to visit all components (forest), and to make sure specified vertices are visited (forest_specific). Iterative or recursive can be chosen via config.iterative, and config.revisit means vertices can be revisited. The forest configuration options reduce the overhead of calling ExecuteDFS multiple times.
  • Each implementation uses a DFSRecord configuration (record.config) that ensures the fewest record fields are assigned when they are not needed.
  • Documentation was updated to reflect the new usage of ExecuteDFS and its related functions (NewDFSFlags and NewDFSRecord).
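The deep-search point in the first bullet can be illustrated in Python, which has no tail-call optimisation, by comparing naive recursion with an explicit stack (the explicit stack stands in for the tail calls the C code relies on): the naive version overflows on a deep chain, the stack-based one does not.

```python
def recursive_depth(adj, start):
    # naive recursion: one frame per vertex, so a deep chain overflows
    def visit(u, d):
        best = d
        for v in adj[u]:
            best = max(best, visit(v, d + 1))
        return best
    return visit(start, 0)

def stack_depth(adj, start):
    # explicit stack: constant Python-stack usage regardless of depth
    best, stack = 0, [(start, 0)]
    while stack:
        u, d = stack.pop()
        best = max(best, d)
        for v in adj[u]:
            stack.append((v, d + 1))
    return best

n = 100_000
chain = {i: [i + 1] for i in range(1, n)}
chain[n] = []
```

Calling recursive_depth(chain, 1) raises RecursionError at Python's default recursion limit, while stack_depth(chain, 1) completes.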
