
potential for further performance improvements #9

@ltalirz

While pycodcif is the fastest tool tested in https://github.com/ltalirz/cif-parsing-benchmark, there might still be low-hanging fruit for further optimization:

[Screenshot, 2019-06-10 12:16:20: profiler output for pycodcif]

Only about a third of the time is spent in parse_cif itself:

  1. More time is spent in decode_utf8_frame than in parse_cif.
  2. Significant time is spent in extract_precision.
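For anyone who wants to reproduce this kind of per-function breakdown, Python's built-in cProfile, with pstats sorted by cumulative time, is one option. The sketch below profiles a toy reader; parse, decode_values and read are hypothetical stand-ins, not pycodcif functions.

```python
import cProfile
import io
import pstats


def parse(raw):
    # Hypothetical stand-in for the parsing step: split on ASCII whitespace.
    return raw.split()


def decode_values(values):
    # Hypothetical stand-in for per-value decoding: one decode call per value.
    return [v.decode("utf-8") for v in values]


def read(raw):
    return decode_values(parse(raw))


raw = b"_atom_site_fract_x 0.1234(5)\n" * 10000

profiler = cProfile.Profile()
profiler.enable()
read(raw)
profiler.disable()

# Collect the ten entries with the largest cumulative time into a string.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
```

Printing report lists each function with its call count and cumulative time, which is the same kind of information as in the screenshot above.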

My questions are:

Re 1.: I don't know the details of what this function does, but if it really is about decoding UTF-8, could this perhaps be done once per file rather than once per element? (For example, decode_utf8_typed_values is called 1.7M times on the test set.)
Even if not, this function could probably be sped up significantly by moving it to C.
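A toy illustration of the difference, assuming a reader that tokenizes raw bytes: both functions below return the same tokens for this input, but the first makes one decode call per token while the second decodes the whole buffer once. The function names and sample CIF text are hypothetical, and pycodcif decodes its already-parsed output rather than raw tokens, so this only sketches the idea.

```python
from typing import List

# Hypothetical sample input containing a non-ASCII character.
RAW = "data_test\n_cell_length_a 5.43(2)\n_chemical_name_common 'Å-test'\n".encode("utf-8")


def tokens_decode_per_element(raw: bytes) -> List[str]:
    # Split the raw bytes on ASCII whitespace, then decode each token:
    # one decode call per token.
    return [tok.decode("utf-8") for tok in raw.split()]


def tokens_decode_once(raw: bytes) -> List[str]:
    # Decode the whole buffer in a single call, then split the str.
    # Note: str.split() also splits on non-ASCII whitespace (e.g. U+00A0),
    # so the two functions only agree when such characters are absent.
    return raw.decode("utf-8").split()
```

Timing both with timeit on a larger input (e.g. RAW * 10000) is a quick way to check how much the per-call overhead costs in practice.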

Re 2.: How about making this optional, i.e. adding a flag that allows disabling precision extraction?
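A minimal sketch of what such a flag could look like, assuming "precision" means the standard uncertainty written in parentheses after a CIF number (e.g. 1.234(5) meaning 1.234 ± 0.005). process_values and its extract_precision keyword are hypothetical, not part of pycodcif's current API, and exponent notation is not handled.

```python
import re
from typing import List, Optional, Tuple, Union

# Matches plain decimal numbers followed by a standard uncertainty in
# parentheses, e.g. "1.234(5)" or "123(4)". Exponents are not handled.
_NUM_WITH_SU = re.compile(r"^([+-]?(?:\d+\.?\d*|\.\d+))\((\d+)\)$")

Value = Union[float, str]


def process_values(
    tokens: List[str], extract_precision: bool = True
) -> Tuple[List[Value], Optional[List[Optional[float]]]]:
    """Return (values, precisions); precisions is None if extraction is off."""
    if not extract_precision:
        # Fast path: skip all regex matching and keep the raw strings.
        return list(tokens), None

    values: List[Value] = []
    precisions: List[Optional[float]] = []
    for tok in tokens:
        m = _NUM_WITH_SU.match(tok)
        if m is None:
            # Not a number with an uncertainty: keep it unchanged.
            values.append(tok)
            precisions.append(None)
            continue
        mantissa, su_digits = m.group(1), m.group(2)
        # The uncertainty refers to the last digits of the mantissa,
        # so scale it by the number of decimal places.
        decimals = len(mantissa.split(".", 1)[1]) if "." in mantissa else 0
        values.append(float(mantissa))
        precisions.append(int(su_digits) / 10**decimals)
    return values, precisions
```

For example, process_values(["1.234(5)", "12.3(12)", "C1"]) returns ([1.234, 12.3, "C1"], [0.005, 1.2, None]). With extract_precision=False the regex is never run, so callers that never look at uncertainties would not pay for them.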
