Re: [DISCUSS] CloudStack graceful shutdown


I may be remembering this incorrectly, but from what I recall, if a resource is owned by one MS and a request related to that resource comes in to another MS, the MS that received the request forwards it to the MS that owns the resource.

> On Apr 4, 2018, at 2:36 PM, Rafael Weingärtner <rafaelweingartner@xxxxxxxxx> wrote:
> 
> Big +1 for this feature; I only have a few doubts.
> 
> * Regarding the tasks/jobs that management servers (MSs) execute: do these
> tasks always originate from requests that come to that MS, or is it possible
> for a request received by one management server to be executed by another?
> I mean, if I execute a request against MS1, will this request always be
> executed/treated by MS1, or is it possible that this request is executed
> by another MS (e.g. MS2)?
> 
> * I would suggest that after we block traffic on ports 8080/8443/8250 (we
> will need to block this as well, right?), we log the tasks still executing.
> I mean, something saying: there are XXX tasks (enumerate tasks) still being
> executed; we will wait for them to finish before shutting down.
> 
> * The timeout (60 minutes suggested) could be a global setting that we
> load before executing the graceful shutdown.
> 
> On Wed, Apr 4, 2018 at 5:15 PM, ilya musayev <ilya.mailing.lists@xxxxxxxxx>
> wrote:
> 
>> Use case:
>> In any environment, from time to time, an administrator needs to perform
>> maintenance. The current stop sequence of the CloudStack management server
>> ignores the fact that there may be long-running async jobs and terminates
>> the process. This in turn can create a poor user experience and occasional
>> inconsistency in the CloudStack DB.
>> 
>> This is especially painful in large environments where the user has
>> thousands of nodes and continuous patching happens around the clock,
>> which requires migrating workloads from one node to another.
>> 
>> With that said, I've created a script that monitors the async job queue
>> for a given MS and waits for it to complete all jobs. More details are
>> posted below.
>> 
>> I'd like to introduce "graceful-shutdown" into the systemctl/service
>> handling of the cloudstack-management service.
>> 
>> The details of how it will work are below:
>> 
>> Workflow for graceful shutdown:
>>  Using iptables/firewalld, block any connection attempts on 8080/8443 (we
>> can identify the ports dynamically)
>>  Identify the MSID for the node; using the proper MSID, query the async_job
>> table for
>> 1) any jobs that are still running (i.e. job_status="0")
>> 2) job_dispatcher not like "pseudoJobDispatcher"
>> 3) job_init_msid=$my_ms_id
>> 
>> Monitor the async_job table for up to 60 minutes, until all async jobs for
>> the MSID are done, then proceed with shutdown
>>    If this fails for any reason or is terminated, catch the exit via the
>> trap command and unblock 8080/8443
>> 
>> Comments are welcome
>> 
>> Regards,
>> ilya
>> 
> 
> 
> 
> -- 
> Rafael Weingärtner

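For illustration, the "block the ports, and unblock them again if anything goes wrong" part of the quoted workflow could look roughly like the sketch below. This is not ilya's script (which is not shown in the thread); the names `iptables_rule` and `ports_blocked` are made up here, the REJECT rule on the INPUT chain is an assumption about how the block would be done, and real code would need root privileges. The try/except stands in for the shell `trap` mentioned above.

```python
import contextlib
import subprocess
from typing import Callable, Iterator, List, Sequence

Runner = Callable[[Sequence[str]], None]


def _run(cmd: Sequence[str]) -> None:
    # Default runner: execute the command and raise if it fails.
    subprocess.run(cmd, check=True)


def iptables_rule(action: str, port: int) -> List[str]:
    # action is "-I" to insert the rule or "-D" to delete it again.
    # Assumption: rejecting inbound TCP on the INPUT chain is the desired block.
    return ["iptables", action, "INPUT", "-p", "tcp",
            "--dport", str(port), "-j", "REJECT"]


@contextlib.contextmanager
def ports_blocked(ports: Sequence[int] = (8080, 8443),
                  run: Runner = _run) -> Iterator[None]:
    """Block the ports; remove the rules again only if the body fails."""
    blocked: List[int] = []
    try:
        for port in ports:
            run(iptables_rule("-I", port))
            blocked.append(port)
        yield
    except BaseException:
        # Covers ordinary errors, Ctrl-C and sys.exit(). A plain SIGTERM would
        # additionally need a signal handler that raises SystemExit.
        for port in reversed(blocked):
            run(iptables_rule("-D", port))
        raise
```

On success the rules are left in place, since the next step is the shutdown itself. Passing a different `run` callable (for example `print`) gives a dry run.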

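The monitoring step, together with Rafael's suggestions to log the remaining tasks and make the timeout configurable, might be sketched as below. The table and column names are the ones quoted in the thread; everything else (the function name `wait_for_jobs`, the `%s` placeholder style, the 10-second poll interval, and how `count_pending` talks to the database) is an assumption made for the example.

```python
import time
from typing import Callable

# The three conditions from the proposal, for one management server ID.
# Assumption: the DB driver uses "%s" placeholders for parameters.
PENDING_JOBS_SQL = (
    "SELECT COUNT(*) FROM async_job "
    "WHERE job_status = 0 "
    "AND job_dispatcher NOT LIKE 'pseudoJobDispatcher' "
    "AND job_init_msid = %s"
)


def wait_for_jobs(count_pending: Callable[[], int],
                  timeout_s: float = 60 * 60,
                  poll_s: float = 10.0,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True once no jobs are pending, False if the timeout expires first.

    count_pending is a caller-supplied function that runs PENDING_JOBS_SQL
    (or an equivalent query) and returns the number of matching rows.
    """
    deadline = clock() + timeout_s
    while True:
        pending = count_pending()
        if pending == 0:
            return True
        if clock() >= deadline:
            return False
        print(f"{pending} async job(s) still running; waiting before shutdown")
        sleep(poll_s)
```

The timeout is a plain parameter so it can be loaded from a global setting before the wait starts, and the caller decides what to do when the function returns False (abort and unblock the ports, or shut down anyway).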