
Find code duplication, standard libraries and inlined functions #2

@7i


To find reused code, such as inlined functions or partially duplicated functions, we could normalize each basic block and store an anonymous representation of it, so that we can later identify matching basic blocks in a control flow graph. This representation can be a list of instruction names, e.g. "add phi div icmp".
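A minimal Python sketch of the matching step. It assumes each basic block has already been reduced to a list of opcode names; the toy input format and the helper names `fingerprint` and `find_matching_blocks` are hypothetical. Nothing is normalized yet, so it only matches blocks whose instructions appear in exactly the same order.

```python
from collections import defaultdict

# Hypothetical toy input: each basic block is just the list of its
# instruction opcode names in program order. Operands are dropped,
# which is what makes the representation "anonymous".
def fingerprint(block):
    """Return the anonymous representation of a basic block."""
    return " ".join(block)

def find_matching_blocks(functions):
    """Group (function name, block index) pairs by identical fingerprint."""
    index = defaultdict(list)
    for func_name, blocks in functions.items():
        for i, block in enumerate(blocks):
            index[fingerprint(block)].append((func_name, i))
    # Only fingerprints seen more than once are candidate duplicates.
    return {fp: locs for fp, locs in index.items() if len(locs) > 1}

functions = {
    "foo": [["add", "phi", "div", "icmp"], ["ret"]],
    "bar": [["mul", "br"], ["add", "phi", "div", "icmp"]],
}
print(find_matching_blocks(functions))
# {'add phi div icmp': [('foo', 0), ('bar', 1)]}
```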

First, we could build a data-dependence graph (recording which instructions depend on which other instructions) to find the possible orderings inside a basic block.
We also have to identify the possible orderings of SubBlocks ("SubBlock", for lack of a better word, refers to a group of instructions connected by a dependency chain).
Perhaps we can borrow something from existing instruction scheduling algorithms?
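A minimal sketch of building the data-dependence graph for one basic block, assuming an SSA-style toy instruction format where each instruction defines at most one named value and reads named operands. The `Instr` type and `build_ddg` function are hypothetical, and only def-use dependencies are tracked; memory and side-effect ordering (stores, calls) is ignored, which a real implementation could not do.

```python
from typing import NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    # Hypothetical SSA-style instruction: defines `result` (or None)
    # and reads the values named in `operands`.
    result: Optional[str]
    opcode: str
    operands: Tuple[str, ...]

def build_ddg(block):
    """Map each instruction index to the set of indices it depends on.

    Only def-use dependencies inside the block are recorded; operands
    defined outside the block (e.g. function arguments) add no edge.
    """
    defined_at = {}  # value name -> index of the defining instruction
    deps = {}
    for i, ins in enumerate(block):
        deps[i] = {defined_at[op] for op in ins.operands if op in defined_at}
        if ins.result is not None:
            defined_at[ins.result] = i
    return deps

block = [
    Instr("a", "load", ("p",)),
    Instr("b", "load", ("q",)),
    Instr("c", "add", ("a", "b")),
    Instr("d", "mul", ("x", "y")),
    Instr("e", "sub", ("c", "d")),
]
print(build_ddg(block))
# {0: set(), 1: set(), 2: {0, 1}, 3: set(), 4: {2, 3}}
```

This toy block has the same shape as instructions 1 to 5 in the example: the third instruction needs the first two, and the fifth needs the third and fourth.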

Example of SubBlocks:
Basic block:

1,2,3,4,5,6,7,8,9 // 1-9 represent 9 instructions

Found dependencies:

[3 [1][2]    ]  // to run 3 we need to have executed 1 and 2 before
[5 [3][4]    ]  // to run 5 we need to have executed 3 and 4 before
[9 [6][7][8] ]  // to run 9 we need to have executed 6, 7 and 8 before

Note that 1 and 2 can be executed in any order; the same is true for:
3 and 4
6, 7 and 8
In addition, the SubBlock 1 to 5 can be executed before or after the SubBlock 6 to 9.
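Finding the SubBlocks amounts to computing the connected components of the dependence graph when its edges are treated as undirected. A minimal Python sketch using the dependencies from the example above; the function name `find_subblocks` is hypothetical.

```python
def find_subblocks(instructions, deps):
    """Split a basic block into SubBlocks: groups of instructions joined
    by dependency chains (weakly connected components of the graph)."""
    # Build an undirected adjacency list from the "node needs pred" edges.
    adj = {i: set() for i in instructions}
    for node, preds in deps.items():
        for p in preds:
            adj[node].add(p)
            adj[p].add(node)
    seen, subblocks = set(), []
    for start in instructions:  # walk in program order for stable output
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:  # iterative depth-first search
            n = stack.pop()
            component.append(n)
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        subblocks.append(sorted(component))
    return subblocks

# Dependencies from the example: 3 needs 1 and 2, 5 needs 3 and 4, 9 needs 6, 7, 8.
deps = {3: [1, 2], 5: [3, 4], 9: [6, 7, 8]}
print(find_subblocks(range(1, 10), deps))
# [[1, 2, 3, 4, 5], [6, 7, 8, 9]]
```

An instruction with no dependencies in either direction would end up as a SubBlock of size one.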

To normalize this, we apply a fixed set of predefined rules.
Example:

  • SubBlocks with the fewest instructions come first.
  • If two SubBlocks have the same number of instructions, compare the names of their first non-matching instructions alphabetically: 'a' first and 'z' last.
  • Instructions that may be executed in any order are sorted alphabetically by name: 'a' first and 'z' last.
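A minimal Python sketch of these rules, under one possible interpretation: inside each SubBlock, a topological sort always emits the ready instruction with the alphabetically smallest name (rule 3), and the SubBlocks are then sorted by length and by their name lists (rules 1 and 2). The opcode names assigned to instructions 1 to 9 are hypothetical, as are the function names.

```python
import heapq

def order_subblock(members, deps, names):
    """Topologically sort one SubBlock, always emitting the ready
    instruction whose name is alphabetically smallest (rule 3)."""
    members = set(members)
    preds = {i: {p for p in deps.get(i, ()) if p in members} for i in members}
    succs = {i: [] for i in members}
    for i, ps in preds.items():
        for p in ps:
            succs[p].append(i)
    remaining = {i: len(ps) for i, ps in preds.items()}
    ready = [(names[i], i) for i in members if remaining[i] == 0]
    heapq.heapify(ready)  # ties on equal names fall back to original index
    order = []
    while ready:
        name, i = heapq.heappop(ready)
        order.append(name)
        for s in succs[i]:
            remaining[s] -= 1
            if remaining[s] == 0:
                heapq.heappush(ready, (names[s], s))
    return order

def normalize(subblocks, deps, names):
    ordered = [order_subblock(sb, deps, names) for sb in subblocks]
    # Rules 1 and 2: fewest instructions first; equal lengths are compared
    # name by name, so the first non-matching name decides.
    ordered.sort(key=lambda seq: (len(seq), seq))
    return " ".join(name for seq in ordered for name in seq)

deps = {3: [1, 2], 5: [3, 4], 9: [6, 7, 8]}
names = {1: "load", 2: "load", 3: "add", 4: "mul", 5: "sub",
         6: "icmp", 7: "add", 8: "zext", 9: "select"}
subblocks = [[1, 2, 3, 4, 5], [6, 7, 8, 9]]
print(normalize(subblocks, deps, names))
# add icmp zext select load load add mul sub
```

One limitation of this sketch: when two identically named instructions are ready at the same time, the tie is broken by original position, so blocks that differ only in the order of such instructions may still produce different strings.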
