json.dumps takes significant CPU time in data heavy workloads like ETL or streaming #689

@widmogrod

Hi Team,

I use the crate-python driver in a CDC process that writes data to CrateDB.
While benchmarking and optimizing the process, I recorded CPU flamegraphs showing that a lot of CPU time is spent on JSON serialization, in this call:

return json.dumps(data, cls=CrateJsonEncoder)

There are publicly available benchmarks reporting that json.dumps is very slow, and that switching to ujson or orjson can make a huge difference. I can confirm that swapping json for ujson reduces the time spent on serialization by around 30-40%.
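For anyone who wants to reproduce a comparison like this, here is a minimal timing sketch. It is not the benchmark used above: the payload shape, batch size, and the helper names `make_rows` and `bench` are assumptions made for illustration, and orjson is treated as an optional dependency that is skipped when not installed.

```python
import json
import timeit

try:
    import orjson  # optional third-party library; skipped if missing
except ImportError:
    orjson = None


def make_rows(n=1000):
    # Synthetic payload (assumption); a real CDC batch will look different.
    return [
        {"id": i, "name": f"row-{i}", "tags": ["a", "b"], "value": i * 0.5}
        for i in range(n)
    ]


def bench(dumps, rows, repeat=5, number=20):
    # Best-of-`repeat` wall time for one dumps(rows) call, in seconds.
    timings = timeit.repeat(lambda: dumps(rows), repeat=repeat, number=number)
    return min(timings) / number


rows = make_rows()
results = {"json": bench(json.dumps, rows)}
if orjson is not None:
    results["orjson"] = bench(orjson.dumps, rows)

for name, seconds in results.items():
    print(f"{name}: {seconds * 1e3:.3f} ms per batch")
```

Numbers from a synthetic payload like this are only indicative; the flamegraph of the real workload remains the better guide.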

I haven't checked how this affects correctness, since the current implementation relies on json.dumps(..., cls=CrateJsonEncoder) to apply custom transformation logic.
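To illustrate the correctness concern, here is a rough sketch of how encoder-style logic could be moved into a plain `default` callable that works with both the standard library and orjson. The function `encode_extra` is a hypothetical stand-in, not the actual CrateJsonEncoder, and the conversions it performs (Decimal to string, datetime to epoch milliseconds) are assumptions chosen for the example. One thing to watch: orjson serializes datetime objects natively, so the hook is only consulted for them when `OPT_PASSTHROUGH_DATETIME` is set.

```python
import json
from datetime import datetime, timezone
from decimal import Decimal

try:
    import orjson  # optional third-party library
except ImportError:
    orjson = None


def encode_extra(obj):
    """Hypothetical stand-in for a custom encoder's default() logic."""
    if isinstance(obj, Decimal):
        return str(obj)
    if isinstance(obj, datetime):
        # Assumption: naive datetimes are interpreted as UTC.
        if obj.tzinfo is None:
            obj = obj.replace(tzinfo=timezone.utc)
        return int(obj.timestamp() * 1000)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def serialize(data):
    if orjson is not None:
        # orjson returns bytes; PASSTHROUGH_DATETIME routes datetimes to the hook
        # instead of orjson's built-in RFC 3339 formatting.
        raw = orjson.dumps(
            data, default=encode_extra, option=orjson.OPT_PASSTHROUGH_DATETIME
        )
        return raw.decode("utf-8")
    return json.dumps(data, default=encode_extra)


row = {"price": Decimal("1.50"), "ts": datetime(2020, 1, 1, tzinfo=timezone.utc)}
print(serialize(row))
```

The two backends differ in whitespace, so comparing outputs is easiest after parsing them back with `json.loads` rather than comparing strings.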

Cheers,
Gabriel
