Eats all RAM and crashes. #383

@basejn

I have a CSV file with 15,000 text documents, which I load into an SFrame.
I then count the words from a predefined vocabulary with .apply and create a new column holding a word-count vector.
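
A minimal sketch of the kind of counting function described above. The actual code is not shown in this report, so the tiny vocabulary, the whitespace tokenization, and the name `count_vector` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical vocabulary; the real one has tens of thousands of entries.
VOCAB = ["ram", "crash", "sframe", "row"]
VOCAB_INDEX = {word: i for i, word in enumerate(VOCAB)}


def count_vector(text):
    """Return a dense list of counts, one slot per vocabulary word."""
    # Assumed tokenization: lowercase, split on whitespace.
    counts = Counter(text.lower().split())
    vector = [0] * len(VOCAB)
    for word, n in counts.items():
        idx = VOCAB_INDEX.get(word)
        if idx is not None:  # words outside the vocabulary are ignored
            vector[idx] = n
    return vector


example = count_vector("SFrame row read eats ram then ram runs out")
print(example)
```

With an SFrame, a function like this would presumably be passed to the text column's `.apply`, producing one dense vector per document.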

The vocabulary size, and therefore the vector size, is 50,000, so each apply call generates a 50,000-element array.
If the vocabulary is 10,000, for example, there is no problem, but with bigger sizes the problem shows up.
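
As a rough back-of-the-envelope check, the sketch below estimates how much memory the whole column would need if every vector were held fully dense in RAM at once. The 8 bytes per value is an assumption (a 64-bit number per slot); SFrame's actual in-memory and on-disk representation may differ, so this is only an order-of-magnitude guide.

```python
def dense_bytes(n_docs, vocab_size, bytes_per_value=8):
    """Bytes needed to hold n_docs dense vectors of length vocab_size."""
    # bytes_per_value=8 is an assumption (one 64-bit value per slot).
    return n_docs * vocab_size * bytes_per_value


N_DOCS = 15_000  # number of documents from the report

for vocab_size in (10_000, 50_000):
    total = dense_bytes(N_DOCS, vocab_size)
    print(f"vocabulary {vocab_size}: about {total / 1e9:.1f} GB if fully dense")
```

Under that assumption the 10,000-word case comes to about 1.2 GB and the 50,000-word case to about 6 GB, which would be consistent with the smaller vocabulary working and the larger one exhausting RAM if the full column is materialized.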

Generating the vectors is fine: it takes some time (about 1 minute), but RAM usage stays within reasonable bounds, with peaks of 2-3 GB for the main Python process and 1.5 GB for each worker.
The problem comes when I try to read a row from the SFrame.

It does not matter whether I use indexing (mySframe[0]) or iteration (for row in mySframe: ...).
RAM usage then starts to grow and finally the process crashes. (Windows even shows a message asking me to close programs to prevent data loss.)
The problem happens on the very first read of the first row (mySframe[0]), not later in the iteration.

My final goal is to use these vectors to train a model with SGD. I will only need small batches of data at a time, so I will have to iterate over the dataset a couple of times.
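
The access pattern this implies can be sketched as below: several passes over the data, pulling only a small batch of rows into memory at a time. The helper names `minibatches` and `run_epochs` are hypothetical, and a plain `range` stands in for the SFrame; the point is only that nothing here requires more than one batch to be resident at once.

```python
from itertools import islice


def minibatches(rows, batch_size):
    """Yield consecutive lists of at most batch_size rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def run_epochs(make_rows, n_epochs, batch_size):
    """Iterate the dataset n_epochs times; return the size of each batch seen."""
    sizes = []
    for _ in range(n_epochs):
        # make_rows() returns a fresh iterable for each pass over the data.
        for batch in minibatches(make_rows(), batch_size):
            sizes.append(len(batch))  # placeholder for an SGD update step
    return sizes


print(run_epochs(lambda: range(5), n_epochs=2, batch_size=2))
```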
