
Re: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT


Thanks



----- Original Message -----
> From: "Rohit Yadav" <rohit.yadav@xxxxxxxxxxxxx>
> To: "dev" <dev@xxxxxxxxxxxxxxxxxxxxx>, "dev" <dev@xxxxxxxxxxxxxxxxxxxxx>
> Sent: Friday, 20 April, 2018 10:35:55
> Subject: Re: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT

> Hi Andrei,
> 
> I've fixed this recently, please see
> https://github.com/apache/cloudstack/pull/2579
> 
> As a workaround, you can add routing rules manually. The PR links to a comment
> that explains the issue and suggests a manual workaround. Let me know if that
> works for you.
> 
> Regards.
> 
> 
> From: Andrei Mikhailovsky
> Sent: Friday, 20 April, 2:21 PM
> Subject: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT
> To: dev
> 
> 
> Hello,
>
> I have been posting to the users thread about this issue. Here is a quick
> summary in case people contributing to the source NAT code on the VPC side
> would like to fix it.
>
> Problem summary: no connectivity between virtual machines behind two Static
> NAT networks.
>
> Problem case: when one virtual machine sends a packet to the external address
> of another virtual machine, where both are handled by the same router and both
> are behind Static NAT, the traffic does not get through.
>
>   10.1.10.100   10.1.10.1:eth2            eth3:10.1.20.1   10.1.20.100
>     virt1 --------------------- router --------------------- virt2
>                      178.248.108.77:eth1:178.248.108.113
>
> A single packet is sent from virt1 to virt2.
>
> Stage 1: it arrives at the router on eth2 and enters "nat_PREROUTING"
> (IN=eth2 OUT= SRC=10.1.10.100 DST=178.248.108.113), goes through the rule
>
>   10  1K  DNAT  all  --  *  *  0.0.0.0/0  178.248.108.113  to:10.1.20.100
>
> and has its DST DNATed to the internal IP of virt2.
>
> Stage 2: it enters the FORWARD chain and is DROPPED by the default policy:
>
>   DROPPED:IN=eth2 OUT=eth1 SRC=10.1.10.100 DST=10.1.20.100
>
> The reason is that the OUT interface is not changed from eth1 to eth3 during
> nat_PREROUTING, so the packet is not matched by the following FORWARD rule and
> thus is not accepted:
>
>   24  14K  ACL_INBOUND_eth3  all  --  *  eth3  0.0.0.0/0  10.1.20.0/24
>
> Stage 3: with a manually inserted rule accepting this packet for forwarding,
> the packet enters the "nat_POSTROUTING" chain (IN= OUT=eth1 SRC=10.1.10.100
> DST=10.1.20.100), has its SRC changed to the external IP by
>
>   16  1320  SNAT  all  --  *  eth1  10.1.10.100  0.0.0.0/0  to:178.248.108.77
>
> and is sent to the external network on eth1.
>
>   13:37:44.834341 IP 178.248.108.77 > 10.1.20.100: ICMP echo request, id 2644, seq 2, length 64
>
> For some reason, during the nat_PREROUTING stage the DST IP is changed, but the
> OUT interface still reflects the interface associated with the old DST IP.
> Here is the routing table:
>
>   # ip route list
>   default via 178.248.108.1 dev eth1
>   10.1.10.0/24 dev eth2 proto kernel scope link src 10.1.10.1
>   10.1.20.0/24 dev eth3 proto kernel scope link src 10.1.20.1
>   169.254.0.0/16 dev eth0 proto kernel scope link src 169.254.0.5
>   178.248.108.0/25 dev eth1 proto kernel scope link src 178.248.108.101
>
>   # ip rule list
>   0:      from all lookup local
>   32761:  from all fwmark 0x3 lookup Table_eth3
>   32762:  from all fwmark 0x2 lookup Table_eth2
>   32763:  from all fwmark 0x1 lookup Table_eth1
>   32764:  from 10.1.0.0/16 lookup static_route_back
>   32765:  from 10.1.0.0/16 lookup static_route
>   32766:  from all lookup main
>   32767:  from all lookup default
>
> Further investigation pinned the problem down to these rules. All traffic from
> an internal IP on a static NATed connection was forced out of the outside
> interface (eth1) by setting mark 0x1 and then using the matching ip rule
> (32763: from all fwmark 0x1 lookup Table_eth1) to direct it:
>
>   # iptables -t mangle -L PREROUTING -vn
>   Chain PREROUTING (policy ACCEPT 97 packets, 11395 bytes)
>    pkts bytes target    prot opt in  out  source        destination
>      49  3644 CONNMARK  all  --  *   *    10.1.10.100   0.0.0.0/0    state NEW CONNMARK save
>      37  2720 MARK      all  --  *   *    10.1.20.100   0.0.0.0/0    state NEW MARK set 0x1
>      37  2720 CONNMARK  all  --  *   *    10.1.20.100   0.0.0.0/0    state NEW CONNMARK save
>     114  8472 MARK      all  --  *   *    10.1.10.100   0.0.0.0/0    state NEW MARK set 0x1
>     114  8472 CONNMARK  all  --  *   *    10.1.10.100   0.0.0.0/0    state NEW CONNMARK save
>
> An acceptable solution is to delete those rules altogether. The problem with
> that approach is that traffic between the VPC tiers will then use the internal
> IP addresses, so packets going from 178.248.108.77 to 178.248.108.113 would be
> seen as communication between 10.1.10.100 and 10.1.20.100. Thus we need to
> apply two further rules:
>
>   # iptables -t nat -I POSTROUTING -o eth3 -s 10.1.10.0/24 -d 10.1.20.0/24 -j SNAT --to-source 178.248.108.77
>   # iptables -t nat -I POSTROUTING -o eth2 -s 10.1.20.0/24 -d 10.1.10.0/24 -j SNAT --to-source 178.248.108.113
>
> to make sure that packets leaving the router have the correct source IP. This
> way it is possible to have static NAT on all of the IPs within the VPC and
> still have them communicate successfully.
>
> So, as a quick and dirty fix, we ran this command on the VR:
>
>   for i in $(iptables -t mangle -L PREROUTING -vn | awk '/0x1/ && !/eth1/ {print $8}'); do
>     iptables -t mangle -D PREROUTING -s "$i" -m state --state NEW -j MARK --set-mark 0x1
>   done
>
> The issue was introduced around the early 4.9.x releases, I believe.
>
> Thanks
> Andrei
> rohit.yadav@xxxxxxxxxxxxx
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London WC2N 4HS, UK
> @shapeblue
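The manual workaround from the 20 April message can be scripted so that the commands are reviewed before they touch the VR. This is a minimal sketch, not part of CloudStack: the function names mark_rules_to_delete and hairpin_snat_rules are hypothetical, and it assumes that "iptables -t mangle -S PREROUTING" prints each "MARK set 0x1" rule as "-j MARK --set-xmark 0x1/0xffffffff". Reading the -S (rule specification) output instead of the -L table means each selected rule can be handed back to -D unchanged. Both functions only print commands; nothing is executed.

```shell
# Hypothetical helpers (not part of CloudStack): they only print commands,
# so the output can be reviewed before anything is changed on the VR.

# Read "iptables -t mangle -S PREROUTING" output on stdin and print a delete
# command for each rule that sets mark 0x1 and does not mention eth1 (roughly
# the same selection as the awk filter in the one-liner above). Assumes -S
# shows "MARK set 0x1" as "-j MARK --set-xmark 0x1/0xffffffff".
mark_rules_to_delete() {
  while IFS= read -r rule; do
    case "$rule" in
      *eth1*) : ;;  # leave rules tied to the public interface alone
      "-A PREROUTING "*"-j MARK --set-xmark 0x1/"*)
        printf 'iptables -t mangle -D %s\n' "${rule#-A }" ;;
    esac
  done
}

# Print the two POSTROUTING SNAT rules for a pair of tiers so that traffic
# between them leaves the router with the static NAT (public) source address.
# Arguments: subnet_a iface_a public_ip_a subnet_b iface_b public_ip_b
hairpin_snat_rules() {
  printf 'iptables -t nat -I POSTROUTING -o %s -s %s -d %s -j SNAT --to-source %s\n' \
    "$5" "$1" "$4" "$3"
  printf 'iptables -t nat -I POSTROUTING -o %s -s %s -d %s -j SNAT --to-source %s\n' \
    "$2" "$4" "$1" "$6"
}
```

On the VR, "iptables -t mangle -S PREROUTING | mark_rules_to_delete" followed by "hairpin_snat_rules 10.1.10.0/24 eth2 178.248.108.77 10.1.20.0/24 eth3 178.248.108.113" prints the commands for the two tiers in the example; piping the reviewed output to sh applies them. Like the rules in the message, the SNAT rules cover whole tier subnets and map each to a single static NAT address.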
>  
> 
> 
> ----- Original Message -----
> > From: "Andrei Mikhailovsky"
> > To: "users"
> > Sent: Monday, 16 April, 2018 22:32:25
> > Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
>
> > Hello,
> >
> > I have done some more testing with the VPC network tiers, and it seems that
> > Static NAT is indeed causing connectivity issues. Here is what I've done:
> >
> > Setup 1. I created two test network tiers with one guest VM in each tier.
> > Static NAT is NOT enabled. Each VM has a port forwarding rule (port 22) from
> > its dedicated public IP address. ACLs have been set up to allow traffic on
> > port 22 from the private IP addresses on each network tier.
> >
> > 1. ACLs seem to work just fine. Traffic between the networks flows according
> >    to the rules. Both VMs can see each other's private IPs and can
> >    ping/ssh/etc.
> > 2. Hosts on the Internet can access the VMs on port 22.
> > 4. The VMs can also access each other, and themselves, on their public IPs.
> >    I don't think this worked before, but I could be wrong.
> >
> > Setup 2. Everything is the same as in Setup 1, but one public IP address has
> > been set up as Static NAT to one guest VM. The second guest VM and the second
> > public IP remained unchanged.
> >
> > 1. ACLs stopped working correctly (see below).
> > 2. Hosts on the Internet can access the VMs on port 22, including the Static
> >    NAT VM.
> > 3. Other guest VMs can access the Static NAT VM using private and public IP
> >    addresses.
> > 4. The Static NAT VM can NOT access other VMs using either public or private
> >    IPs.
> > 5. The Static NAT VM can access Internet hosts (apart from the public IP range
> >    belonging to the CloudStack setup).
> >
> > The above behaviour in the Setup 2 scenarios is very strange, especially
> > points 4 and 5.
> >
> > Any thoughts, anyone?
> >
> > Cheers
> >
> > ----- Original Message -----
> >> From: "Rohit Yadav"
> >> To: "users"
> >> Sent: Thursday, 12 April, 2018 12:06:54
> >> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
> >
> >> Hi Andrei,
> >>
> >> Thanks for sharing. Yes, the egress thing is a known issue, caused by a
> >> failure during VR setup to create the egress table. By restarting the
> >> network (without the cleanup option selected), the egress table gets
> >> created and the rules are applied successfully.
> >>
> >> The issue has been fixed in the VR downtime PR:
> >>
> >> https://github.com/apache/cloudstack/pull/2508
> >>
> >> - Rohit
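Restarting each affected network, as suggested in the 12 April reply, can also be done from the command line. A minimal sketch, assuming CloudMonkey is installed and configured against the management server and that its "restart network" command maps to the restartNetwork API call with its cleanup parameter; the restart_networks function and the DRY_RUN and CLEANUP switches are hypothetical. Cleanup is a parameter because the 3 April message reports that, in that case, the networks had to be restarted with the Clean up option ticked.

```shell
# Hypothetical helper (not part of CloudStack): restart every network whose
# ID (UUID) is passed as an argument, using CloudMonkey's "restart network"
# command (restartNetwork API). CLEANUP defaults to false, matching the advice
# in the 12 April reply; set CLEANUP=true to request a cleanup restart instead.
# With DRY_RUN=1 the commands are only printed, not run.
restart_networks() {
  cleanup=${CLEANUP:-false}
  for id in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf 'cloudmonkey restart network id=%s cleanup=%s\n' "$id" "$cleanup"
    else
      cloudmonkey restart network id="$id" cleanup="$cleanup"
    fi
  done
}
```

For example, "DRY_RUN=1 restart_networks <network-uuid> ..." lists the calls first; dropping DRY_RUN performs them.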
> >> ________________________________
> >> From: Andrei Mikhailovsky
> >> Sent: Tuesday, April 3, 2018 3:33:43 PM
> >> To: users
> >> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
> >>
> >> Rohit,
> >>
> >> Following the update from 4.9.3 to 4.11.0, I would like to comment on a few
> >> things:
> >>
> >> 1. The upgrade went well, apart from the cloudstack-management server
> >>    startup issue that I described in my previous email.
> >> 2. There was an issue with the virtual router template upgrade. The issue
> >>    is described below.
> >>
> >> VR template upgrade issue:
> >>
> >> After updating the systemvm template, I went to Infrastructure > Virtual
> >> Routers and selected the Update template option for each virtual router.
> >> The virtual routers were updated successfully using the new templates.
> >> However, this has broken ALL Egress rules on all networks and none of the
> >> guest vms. Port forwarding / incoming rules were working just fine. Removing
> >> and re-adding Egress rules did not fix the issue. To fix it, I had to
> >> restart each of the networks with the Clean up option ticked.
> >>
> >> Cheers
> >>
> >> Andrei
> >>
> >> ----- Original Message -----
> >>> From: "Andrei Mikhailovsky"
> >>> To: "users"
> >>> Sent: Monday, 2 April, 2018 21:44:27
> >>> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
> >>
> >>> Hi Rohit,
> >>>
> >>> Following some further investigation, it seems that the installation
> >>> packages replaced the following file:
> >>>
> >>> /etc/default/cloudstack-management
> >>>
> >>> with
> >>>
> >>> /etc/default/cloudstack-management.dpkg-dist
> >>>
> >>> Thus, the management server couldn't load the environment variables and
> >>> was unable to start.
> >>>
> >>> I've put the file back, and the management server is able to start.
> >>>
> >>> I will let you know if there are any other issues/problems.
> >>>
> >>> Cheers
> >>>
> >>> Andrei
> >>>
> >>> ----- Original Message -----
> >>>> From: "Andrei Mikhailovsky"
> >>>> To: "users"
> >>>> Sent: Monday, 2 April, 2018 20:58:59
> >>>> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
> >>>
> >>>> Hi Rohit,
> >>>>
> >>>> I have just upgraded and am having issues starting the service, with the
> >>>> following error:
> >>>>
> >>>> Apr 02 20:56:37 ais-cloudhost13 systemd[1]: cloudstack-management.service: Failed to load environment files: No such file or directory
> >>>> Apr 02 20:56:37 ais-cloudhost13 systemd[1]: cloudstack-management.service: Failed to run 'start-pre' task: No such file or directory
> >>>> Apr 02 20:56:37 ais-cloudhost13 systemd[1]: Failed to start CloudStack Management Server.
> >>>> -- Subject: Unit cloudstack-management.service has failed
> >>>> -- Defined-By: systemd
> >>>>
> >>>> Cheers
> >>>>
> >>>> Andrei
> >>>>
> >>>> ----- Original Message -----
> >>>>> From: "Rohit Yadav"
> >>>>> To: "users"
> >>>>> Sent: Friday, 30 March, 2018 19:17:48
> >>>>> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
> >>>>
> >>>>> Some of the upgrade and minor issues have been fixed and will make their
> >>>>> way into 4.11.1.0. You're welcome to upgrade and share your feedback, but
> >>>>> bear in mind that, due to some changes, a new/updated systemvmtemplate
> >>>>> needs to be issued for 4.11.1.0 (it will be compatible with both the
> >>>>> 4.11.0.0 and 4.11.1.0 releases, but 4.11.0.0 users will have to register
> >>>>> that new template).
> >>>>>
> >>>>> - Rohit
> >>>>> ________________________________
> >>>>> From: Andrei Mikhailovsky
> >>>>> Sent: Friday, March 30, 2018 11:00:34 PM
> >>>>> To: users
> >>>>> Subject: Upgrade from ACS 4.9.3 to 4.11.0
> >>>>>
> >>>>> Hello,
> >>>>>
> >>>>> My current infrastructure is ACS 4.9.3 with KVM, with Ubuntu 16.04
> >>>>> servers for the KVM hosts and the management server.
> >>>>>
> >>>>> I am planning to perform an upgrade from ACS 4.9.3 to 4.11.0 and was
> >>>>> wondering if anyone had any issues during the upgrade. Anything to watch
> >>>>> out for?
> >>>>>
> >>>>> I have previously seen issues with upgrading to 4.10, which required some
> >>>>> manual DB updates, from what I recall. Has this issue been fixed in the
> >>>>> 4.11 upgrade process?
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>> Andrei
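The startup failure in the 2 April messages came down to a missing /etc/default/cloudstack-management, with a cloudstack-management.dpkg-dist file left next to it. A small pre-start check along those lines is sketched below. The check_env_file name is hypothetical, and the default path is taken from the thread rather than read from the unit file ("systemctl cat cloudstack-management" shows which EnvironmentFile the unit actually uses). The function only reports; whether the .dpkg-dist copy is the right file to restore is left to the operator.

```shell
# Hypothetical pre-start check (not part of CloudStack): report whether the
# management server's environment file exists, and point at a leftover
# *.dpkg-dist copy if it does not. Returns non-zero when the file is missing.
check_env_file() {
  env_file=${1:-/etc/default/cloudstack-management}
  if [ -e "$env_file" ]; then
    echo "ok: $env_file"
  elif [ -e "$env_file.dpkg-dist" ]; then
    echo "missing: $env_file (found $env_file.dpkg-dist; review it before copying it into place)"
    return 1
  else
    echo "missing: $env_file"
    return 1
  fi
}
```

Running "check_env_file" (or "check_env_file /path/to/file") before "systemctl start cloudstack-management" surfaces the problem before systemd reports "Failed to load environment files".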