More CPUs doesn't equal more speed


Just got a one-liner working with GNU parallel. Super! All I ended up doing is:

    parallel mma {} ::: *mma

which whizzed through my files in less than 1/4 of the time of my
one-at-a-time script. (In case anyone is wondering, or cares, this is a
bunch of Musical Midi Accompaniment files:
https://mellowood.ca/mma/index.html).
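
For anyone who would rather stay in Python, here is a rough sketch of the same idea using the concurrent.futures module that Rob mentions below. It is only a sketch under a few assumptions: run_one() and run_all() are made-up helper names, the child's output is thrown away, and the worker count defaults to one job per CPU. Threads are good enough here because each one spends its time blocked waiting on a child process, not running Python code.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_one(cmd):
    """Run one command to completion and return its exit code."""
    done = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return done.returncode


def run_all(cmds, max_workers=None):
    """Run every command, at most max_workers at a time.

    Returns the exit codes in the same order as cmds.
    """
    if max_workers is None:
        # Assumption: one job per CPU (may be None, which lets the pool decide).
        max_workers = os.cpu_count()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map() preserves input order even though jobs finish out of order.
        return list(pool.map(run_one, cmds))


# Hypothetical usage, mirroring the shell one-liner:
#   import glob
#   codes = run_all([["mma", name] for name in glob.glob("*mma")])
```

The pool does the capacity limiting, so there is no need to batch the 1200 files by hand.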

On Fri, May 24, 2019 at 9:28 AM Rob Gaddi <rgaddi at highlandtechnology.invalid>
wrote:

> On 5/23/19 6:32 PM, Cameron Simpson wrote:
> > On 23May2019 17:04, bvdp <bob at mellowood.ca> wrote:
> > >> Anyway, yes the problem is that I was naively using command.getoutput()
> > >> which blocks until the command is finished. So, of course, only one
> > >> process was being run at one time! Bad me!
> >>
> > >> I guess I should be looking at subprocess.Popen(). Now, a more relevant
> > >> question ... if I do it this way I then need to poll through a list of
> > >> saved process IDs to see which have finished? Right? My initial thought
> > >> is to batch them up in small groups (say CPU_COUNT-1) and wait for that
> > >> batch to finish, etc. Would it be foolish to send a large number (1200
> > >> in this case since this is the number of files) and let the OS worry
> > >> about scheduling and have my program poll 1200 IDs?
> >>
> >> Someone mentioned the GIL. If I launch separate processes then I don't
> >> encounter this issue? Right?
> >
> > > Yes, but it becomes more painful to manage. If you're issuing distinct
> > separate commands anyway, dispatch many or all and then wait for them as
> > a distinct step.  If the commands start thrashing the rest of the OS
> > resources (such as the disc) then you may want to do some capacity
> > limitation, such as a counter or semaphore to limit how many go at once.
> >
> > Now, waiting for a subcommand can be done in a few ways.
> >
> > > If you're the parent of all the processes you can keep a set() of the
> > issued process ids and then call os.wait() repeatedly, which returns the
> > pid of a completed child process. Check it against your set. If you need
> > to act on the specific process, use a dict to map pids to some record of
> > the subprocess.
> >
> > Alternatively, you can spawn a Python Thread for each subcommand, have
> > the Thread dispatch the subcommand _and_ wait for it (i.e. keep your
> > command.getoutput() method, but in a Thread). Main programme waits for
> > the Threads by join()ing them.
> >
>
> I'll just note, because no one else has brought it up yet, that rather
> than manually creating threads and/or process pools for all these
> things, this is exactly what the standard concurrent.futures module is
> for.  It's a fairly brilliant wrapper around all this stuff, and I feel
> like it often doesn't get enough love.
>
>
> --
> Rob Gaddi, Highland Technology -- www.highlandtechnology.com
> Email address domain is currently out of order.  See above to fix.
> --
> https://mail.python.org/mailman/listinfo/python-list
>
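
For the record, the os.wait() approach Cameron describes above might look something like the sketch below. It is Unix only, run_limited() is a made-up name, and it assumes the program has no other child processes of its own. A dict maps each pid to its job, and a simple count of running children serves as the capacity limit.

```python
import os
import subprocess


def run_limited(cmds, limit):
    """Run each command as a child process, at most `limit` at once.

    Returns a dict mapping each command's index in cmds to its exit code
    (negative signal number if the child was killed by a signal).
    """
    pending = list(enumerate(cmds))
    running = {}   # pid -> (index, Popen object)
    results = {}
    while pending or running:
        # Top up to the limit before waiting again.
        while pending and len(running) < limit:
            index, cmd = pending.pop(0)
            proc = subprocess.Popen(cmd)
            running[proc.pid] = (index, proc)
        # Block until any child exits; returns its pid and raw wait status.
        pid, status = os.wait()
        if pid not in running:
            continue   # not one of ours
        index, proc = running.pop(pid)
        if os.WIFEXITED(status):
            code = os.WEXITSTATUS(status)
        else:
            code = -os.WTERMSIG(status)
        # The child is already reaped, so record the result on the Popen
        # object to keep it from trying to wait for the process again.
        proc.returncode = code
        results[index] = code
    return results
```

Because os.wait() blocks until some child finishes, there is no polling loop over 1200 IDs; the OS reports each completion as it happens.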


-- 

**** Listen to my FREE CD at http://www.mellowood.ca/music/cedars ****
Bob van der Poel ** Wynndel, British Columbia, CANADA **
EMAIL: bob at mellowood.ca
WWW:   http://www.mellowood.ca