
Memory issue when repeatedly creating large DataFrames #2902

Open
@eirikbrandsaas

Description

Creating a large panel (a DataFrame) and then "overwriting" it in a loop causes the process to use more and more memory.

MWE

Note that this example will use a significant amount of memory on your computer.

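using DataFrames

# Build one Nrow × Ncol block of random data.
function inner_df(Nrow, Ncol)
    return DataFrame(rand(Nrow, Ncol), :auto)
end

# Build a panel by appending N blocks to an initially empty DataFrame.
function outer_df(N)
    Nrow = 76
    Ncol = 21
    df = DataFrame(rand(0, Ncol), :auto)

    for i = 1:N
        append!(df, inner_df(Nrow, Ncol))
    end
    return df
end

# Rebuild the panel Niter times. `pan` goes out of scope after each pass,
# so the previous panel becomes unreachable.
function iterate(Niter)
    for i = 1:Niter
        println(i)
        # 100_000 blocks × 76 rows × 21 Float64 columns ≈ 1.28 GB per panel
        @time pan = outer_df(1000*10*10)
    end
end

iterate(10)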

Expected Behavior

I don't know anything about the internal workings of this, but I assumed that memory usage would not increase from the second iteration of iterate onward, since the previous panel is no longer referenced. On my system, virtual memory, resident memory, and the share of RAM in use (all as reported by top) increase on every iteration. Moreover, the memory is not fully released after the function returns. Running it once more increases memory usage even further. Forcing garbage collection with GC.gc() does not release the memory either.
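The growth described above can also be recorded from inside Julia instead of reading it off top. The following is only a minimal sketch: `memory_after_each_run` is a hypothetical helper that is not part of the original report, `Base.gc_live_bytes` is an unexported internal function whose behavior may differ between Julia versions, and `Sys.maxrss` reports the peak resident set size, so it can show growth but never a decrease.

```julia
# Hypothetical helper (not from the original report): call `f` several times
# and record memory figures after forcing a garbage collection each time.
function memory_after_each_run(f, nruns)
    return map(1:nruns) do run
        f()        # the result is discarded, so it becomes garbage
        GC.gc()    # force a full collection before measuring
        (run = run,
         live_bytes = Base.gc_live_bytes(),   # bytes the GC considers live (internal API)
         peak_rss_bytes = Sys.maxrss())       # peak resident set size of the process
    end
end

# Stand-in workload that only needs Base; replace with the real workload.
for s in memory_after_each_run(() -> rand(1_000, 1_000), 3)
    println(s)
end
```

With the MWE loaded, the workload would be `() -> outer_df(1000*10*10)`. If `live_bytes` stays roughly flat across runs while `peak_rss_bytes` keeps climbing, that would suggest the memory is being retained below the level of live Julia objects rather than by references to old DataFrames, though this sketch alone cannot confirm the cause.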

Versions:

[a93c6f00] DataFrames v1.2.2
Julia Version 1.5.3
Commit 788b2c77c1 (2020-11-09 13:37 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake-avx512)
Environment:
  JULIA_DEPOT_PATH = /sftwr/user-pkg/m1eeb00/julia
  JULIA_PKG_SERVER = pkg.julialang.org
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 6
  JULIA_BINDIR = /apps/julia/current/bin/frb
