Multiprocessing and memory management
On 03/07/2019 18.37, Israel Brewster wrote:
> I have a script that benefits greatly from multiprocessing (it's generating a bunch of images from data). Of course, as expected each process uses a chunk of memory, and the more processes there are, the more memory used. The amount used per process can vary from around 3 GB (yes, gigabytes) to over 40 or 50 GB, depending on the amount of data being processed (usually closer to 10GB, the 40/50 is fairly rare). This puts me in a position of needing to balance the number of processes with memory usage, such that I maximize resource utilization (running one process at a time would simply take WAY too long) while not overloading RAM (which at best would slow things down due to swap).
> Obviously this process will be run on a machine with lots of RAM, but as I don't know how large the datasets that will be fed to it are, I wanted to see if I could build some intelligence into the program such that it doesn't overload the memory. A couple of approaches I thought of:
> 1) Determine the total amount of RAM in the machine (how?), assume an average of 10GB per process, and only launch as many processes as calculated to fit. Easy, but would run the risk of under-utilizing the processing capabilities and taking longer to run if most of the processes were using significantly less than 10GB
> 2) Somehow monitor the memory usage of the various processes, and if one process needs a lot, pause the others until that one is complete. Of course, I'm not sure if this is even possible.
> 3) Other approaches?
Are you familiar with Dask? <https://docs.dask.org/en/latest/>
I don't know it myself other than through hearsay, but I have a feeling
it may have a ready-to-go solution to your problem. You'd have to look
into dask in more detail than I have...
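
As for the "(how?)" in approach 1: below is a rough sketch using only the
standard library, not a tested solution. It assumes a Linux-like system
where os.sysconf exposes SC_PAGE_SIZE and SC_PHYS_PAGES. The helper names
(total_ram_bytes, max_workers) and the 4 GB reserve for the OS are made up
for illustration; the 10 GB per-process average is the figure from your
message.

```python
import os

GB = 1024 ** 3


def total_ram_bytes():
    """Total physical RAM in bytes (assumes a POSIX system such as Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def max_workers(total_bytes, per_process_bytes, cpu_count, reserve_bytes=0):
    """How many workers fit in RAM, capped by CPU count, and never below 1."""
    usable = max(total_bytes - reserve_bytes, 0)
    by_memory = usable // per_process_bytes
    return max(1, min(cpu_count, by_memory))


if __name__ == "__main__":
    # Assumed numbers: 10 GB average per process, 4 GB kept back for the OS.
    n = max_workers(
        total_ram_bytes(),
        per_process_bytes=10 * GB,
        cpu_count=os.cpu_count() or 1,
        reserve_bytes=4 * GB,
    )
    print(n)  # could be passed to multiprocessing.Pool(processes=n)
```

This only sizes the pool up front from a fixed average, so it has exactly
the under-utilization risk you describe, and a rare 40-50 GB job could
still push the machine into swap.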