
Re: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT


Andrei, if I'm not mistaken, I saw the same behavior even on 4.8. In our
case, from what I vaguely remember, we configured Port Forwarding instead
of Static NAT. It solved our use case (for some customer), but maybe it's
not acceptable for you...
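
For reference, the rough shape of that workaround via CloudMonkey would be
something like the following (a sketch only: the UUIDs are placeholders,
and for a VPC tier you may also need to pass the tier's networkid):

  create portforwardingrule ipaddressid=<public-ip-uuid> protocol=tcp \
    publicport=22 privateport=22 virtualmachineid=<vm-uuid>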

Cheers

On Mon, 9 Jul 2018 at 18:27, Andrei Mikhailovsky <andrei@xxxxxxxxxx.invalid>
wrote:

> Hi Rohit,
>
> I would like to send you a quick update on this issue. I have recently
> upgraded to 4.11.1.0 with the new system vm templates, and the issue that
> I've described is still present in the latest release. Has the fix not
> been included in the latest 4.11 maintenance release? I thought it would
> be, as this breaks a major function of the VPC.
>
> Cheers.
>
> Andrei
>
> ----- Original Message -----
> > From: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx.INVALID>
> > To: "dev" <dev@xxxxxxxxxxxxxxxxxxxxx>
> > Sent: Friday, 20 April, 2018 11:52:30
> > Subject: Re: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT
>
> > Thanks
> >
> >
> >
> > ----- Original Message -----
> >> From: "Rohit Yadav" <rohit.yadav@xxxxxxxxxxxxx>
> >> To: "dev" <dev@xxxxxxxxxxxxxxxxxxxxx>, "dev" <dev@xxxxxxxxxxxxxxxxxxxxx
> >
> >> Sent: Friday, 20 April, 2018 10:35:55
> >> Subject: Re: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT
> >
> >> Hi Andrei,
> >>
> >> I've fixed this recently, please see
> >> https://github.com/apache/cloudstack/pull/2579
> >>
> >> As a workaround you can add routing rules manually. On the PR, there
> >> is a link to a comment that explains the issue and suggests a manual
> >> workaround. Let me know if that works for you.
> >>
> >> Regards.
> >>
> >>
> >> From: Andrei Mikhailovsky
> >> Sent: Friday, 20 April, 2:21 PM
> >> Subject: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT
> >> To: dev
> >>
> >>
> >> Hello, I have been posting to the users thread about this issue. Here
> >> is a quick summary in case people contributing to the source NAT code
> >> on the VPC side would like to fix this issue.
> >>
> >> Problem summary: no connectivity between virtual machines behind two
> >> Static NAT networks.
> >>
> >> Problem case: when one virtual machine sends a packet to the external
> >> address of another virtual machine, both handled by the same router and
> >> both behind Static NAT, the traffic does not work.
> >>
> >>  10.1.10.100    10.1.10.1:eth2    eth3:10.1.20.1    10.1.20.100
> >>    virt1   --------------- router ---------------     virt2
> >>               178.248.108.77:eth1:178.248.108.113
> >>
> >> A single packet is sent from virt1 to virt2.
> >> Stage 1: the packet arrives at the router on eth2 and enters the
> >> nat_PREROUTING chain (IN=eth2 OUT= SRC=10.1.10.100
> >> DST=178.248.108.113). It goes through the rule
> >>
> >>   10 1K DNAT all -- * * 0.0.0.0/0 178.248.108.113 to:10.1.20.100
> >>
> >> and has the DST DNATed to the internal IP of virt2.
> >>
> >> Stage 2: the packet enters the FORWARD chain and is DROPPED by the
> >> default policy: DROPPED: IN=eth2 OUT=eth1 SRC=10.1.10.100
> >> DST=10.1.20.100. The reason is that the OUT interface is not correctly
> >> changed from eth1 to eth3 during nat_PREROUTING, so the packet is not
> >> intercepted by the FORWARD rule below and thus not accepted:
> >>
> >>   24 14K ACL_INBOUND_eth3 all -- * eth3 0.0.0.0/0 10.1.20.0/24
> >> Stage 3: after manually inserting a rule to accept this packet in
> >> FORWARD, the packet enters the nat_POSTROUTING chain (IN= OUT=eth1
> >> SRC=10.1.10.100 DST=10.1.20.100), has its SRC changed to the external
> >> IP by the rule
> >>
> >>   16 1320 SNAT all -- * eth1 10.1.10.100 0.0.0.0/0 to:178.248.108.77
> >>
> >> and is sent to the external network on eth1:
> >>
> >>   13:37:44.834341 IP 178.248.108.77 > 10.1.20.100: ICMP echo request,
> >>   id 2644, seq 2, length 64
> >>
> >> For some reason, during the nat_PREROUTING stage the DST IP is changed,
> >> but the OUT interface still reflects the interface associated with the
> >> old DST IP.
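> >>
> >> For reference, the kind of rule inserted manually in stage 3 would look
> >> roughly like this (the exact rule isn't shown above, so treat it as a
> >> sketch; subnets as in this example):
> >>
> >>   # iptables -I FORWARD -s 10.1.10.0/24 -d 10.1.20.0/24 -j ACCEPT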
> >>
> >> Here is the routing table:
> >>
> >> # ip route list
> >> default via 178.248.108.1 dev eth1
> >> 10.1.10.0/24 dev eth2 proto kernel scope link src 10.1.10.1
> >> 10.1.20.0/24 dev eth3 proto kernel scope link src 10.1.20.1
> >> 169.254.0.0/16 dev eth0 proto kernel scope link src 169.254.0.5
> >> 178.248.108.0/25 dev eth1 proto kernel scope link src 178.248.108.101
> >>
> >> # ip rule list
> >> 0:      from all lookup local
> >> 32761:  from all fwmark 0x3 lookup Table_eth3
> >> 32762:  from all fwmark 0x2 lookup Table_eth2
> >> 32763:  from all fwmark 0x1 lookup Table_eth1
> >> 32764:  from 10.1.0.0/16 lookup static_route_back
> >> 32765:  from 10.1.0.0/16 lookup static_route
> >> 32766:  from all lookup main
> >> 32767:  from all lookup default
> >> Further into the investigation, the problem was pinned down to these
> >> rules. All the traffic from the internal IPs behind Static NAT was
> >> forced out of the outside interface (eth1) by setting mark 0x1 and then
> >> using the matching ip rule to direct it:
> >>
> >> # iptables -t mangle -L PREROUTING -vn
> >> Chain PREROUTING (policy ACCEPT 97 packets, 11395 bytes)
> >>  pkts bytes target   prot opt in out source      destination
> >>    49  3644 CONNMARK all  --  *  *  10.1.10.100 0.0.0.0/0  state NEW CONNMARK save
> >>    37  2720 MARK     all  --  *  *  10.1.20.100 0.0.0.0/0  state NEW MARK set 0x1
> >>    37  2720 CONNMARK all  --  *  *  10.1.20.100 0.0.0.0/0  state NEW CONNMARK save
> >>   114  8472 MARK     all  --  *  *  10.1.10.100 0.0.0.0/0  state NEW MARK set 0x1
> >>   114  8472 CONNMARK all  --  *  *  10.1.10.100 0.0.0.0/0  state NEW CONNMARK save
> >>
> >> # ip rule
> >> 0:      from all lookup local
> >> 32761:  from all fwmark 0x3 lookup Table_eth3
> >> 32762:  from all fwmark 0x2 lookup Table_eth2
> >> 32763:  from all fwmark 0x1 lookup Table_eth1
> >> 32764:  from 10.1.0.0/16 lookup static_route_back
> >> 32765:  from 10.1.0.0/16 lookup static_route
> >> 32766:  from all lookup main
> >> 32767:  from all lookup default
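> >>
> >> The effect of that 0x1 mark on routing can be checked directly; ip
> >> route get accepts a mark argument, so the following should show the
> >> marked flow resolving via eth1's table instead of eth3 (a diagnostic
> >> sketch using the addresses from this example):
> >>
> >> # ip route get 10.1.20.100 mark 0x1
> >> # ip route get 10.1.20.100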
> >> The acceptable solution is to delete those rules altogether. The
> >> problem with such an approach is that the traffic between the VPC tiers
> >> will use the internal IP addresses, so packets going from
> >> 178.248.108.77 to 178.248.108.113 would be seen as communication
> >> between 10.1.10.100 and 10.1.20.100. Thus we need to apply two further
> >> rules:
> >>
> >> # iptables -t nat -I POSTROUTING -o eth3 -s 10.1.10.0/24 -d 10.1.20.0/24 -j SNAT --to-source 178.248.108.77
> >> # iptables -t nat -I POSTROUTING -o eth2 -s 10.1.20.0/24 -d 10.1.10.0/24 -j SNAT --to-source 178.248.108.113
> >>
> >> in order to make sure that the packets leaving the router have the
> >> correct source IP. This way it is possible to have Static NAT on all of
> >> the IPs within the VPC and ensure successful communication between
> >> them.
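> >>
> >> (To confirm the rules took effect, list the POSTROUTING chain with
> >> counters:
> >>
> >> # iptables -t nat -L POSTROUTING -vn | head
> >>
> >> The two SNAT entries should appear at the top, since -I inserts at the
> >> head of the chain.)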
> >> So, for a quick and dirty fix, we ran this command on the VR:
> >>
> >> for i in $(iptables -t mangle -L PREROUTING -vn | awk '/0x1/ && !/eth1/ {print $8}'); do
> >>     iptables -t mangle -D PREROUTING -s $i -m state --state NEW -j MARK --set-mark "0x1"
> >> done
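> >>
> >> A caveat on this quick fix: CloudStack reprograms the VR's iptables
> >> rules whenever the router is restarted or recreated, so the manual
> >> changes above have to be re-applied each time until the fix from the PR
> >> referenced at the top of this thread
> >> (https://github.com/apache/cloudstack/pull/2579) is deployed. A sketch
> >> of the whole workaround in one script, using the IPs and interfaces
> >> from this example (adjust per VPC):
> >>
> >> #!/bin/sh
> >> # drop the fwmark 0x1 rules that force inter-tier traffic out of eth1
> >> for i in $(iptables -t mangle -L PREROUTING -vn | awk '/0x1/ && !/eth1/ {print $8}'); do
> >>     iptables -t mangle -D PREROUTING -s "$i" -m state --state NEW -j MARK --set-mark "0x1"
> >> done
> >> # SNAT inter-tier traffic to the owning tier's public IP
> >> iptables -t nat -I POSTROUTING -o eth3 -s 10.1.10.0/24 -d 10.1.20.0/24 -j SNAT --to-source 178.248.108.77
> >> iptables -t nat -I POSTROUTING -o eth2 -s 10.1.20.0/24 -d 10.1.10.0/24 -j SNAT --to-source 178.248.108.113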
> >> The issue was introduced around the early 4.9.x releases, I believe.
> >>
> >> Thanks
> >> Andrei
> >> rohit.yadav@xxxxxxxxxxxxx
> >> www.shapeblue.com
> >> 53 Chandos Place, Covent Garden, London WC2N 4HS, UK
> >> @shapeblue
> >>
> >>
> >>
> >> ----- Original Message -----
> >> > From: "Andrei Mikhailovsky"
> >> > To: "users"
> >> > Sent: Monday, 16 April, 2018 22:32:25
> >> > Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
> >> >
> >> > Hello,
> >> >
> >> > I have done some more testing with the VPC network tiers, and it
> >> > seems that the Static NAT is indeed causing connectivity issues. Here
> >> > is what I've done:
> >> >
> >> > Setup 1. I have created two test network tiers with one guest vm in
> >> > each tier. Static NAT is NOT enabled. Each VM has a port forwarding
> >> > rule (port 22) from its dedicated public IP address. ACLs have been
> >> > set up to allow traffic on port 22 from the private IP addresses on
> >> > each network tier.
> >> >
> >> > 1. ACLs seem to work just fine. Traffic between the networks flows
> >> > according to the rules; both vms can see each other's private IPs and
> >> > can ping/ssh/etc.
> >> >
> >> > 2. From the Internet, hosts can access the vms on port 22.
> >> >
> >> > 4. The vms can also access each other and themselves on their public
> >> > IPs. I don't think this worked before, but I could be wrong.
> >> >
> >> > Setup 2. Everything the same as Setup 1, but one public IP address
> >> > has been set up as Static NAT to one guest vm. The second guest vm
> >> > and second public IP remained unchanged.
> >> >
> >> > 1. ACLs stopped working correctly (see below)
> >> >
> >> > 2. From the Internet, hosts can access the vms on port 22, including
> >> > the Static NAT vm
> >> >
> >> > 3. Other guest vms can access the Static NAT vm using private &
> >> > public IP addresses
> >> >
> >> > 4. The Static NAT vm can NOT access other vms, using either public or
> >> > private IPs
> >> >
> >> > 5. The Static NAT vm can access Internet hosts (apart from the public
> >> > IP range belonging to the cloudstack setup)
> >> >
> >> > The above behaviour of the Setup 2 scenarios is very strange,
> >> > especially points 4 & 5.
> >> >
> >> > Any thoughts anyone?
> >> >
> >> > Cheers
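> >> >
> >> > For anyone trying to reproduce this: a quick check on the tier's VR
> >> > for the static NAT mark rules discussed above would be something
> >> > like:
> >> >
> >> > # iptables -t mangle -L PREROUTING -vn | grep 0x1
> >> > # ip rule list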
> >> > ----- Original Message -----
> >> >> From: "Rohit Yadav"
> >> >> To: "users"
> >> >> Sent: Thursday, 12 April, 2018 12:06:54
> >> >> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
> >> >>
> >> >> Hi Andrei,
> >> >>
> >> >> Thanks for sharing. Yes, the egress thing is a known issue, caused
> >> >> by a failure to create the egress table during VR setup. By
> >> >> performing a restart of the network (without the cleanup option
> >> >> selected), the egress table gets created and the rules are
> >> >> successfully applied.
> >> >>
> >> >> The issue has been fixed in the VR downtime PR:
> >> >>
> >> >> https://github.com/apache/cloudstack/pull/2508
> >> >>
> >> >> - Rohit
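> >> >>
> >> >> If you prefer to drive that restart through the API, a rough
> >> >> CloudMonkey sketch (the network uuid is a placeholder) would be:
> >> >>
> >> >> restart network id=<network-uuid> cleanup=false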
> >> >> ________________________________
> >> >> From: Andrei Mikhailovsky
> >> >> Sent: Tuesday, April 3, 2018 3:33:43 PM
> >> >> To: users
> >> >> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
> >> >>
> >> >> Rohit,
> >> >>
> >> >> Following the update from 4.9.3 to 4.11.0, I would like to comment
> >> >> on a few things:
> >> >>
> >> >> 1. The upgrade went well, apart from the cloudstack-management
> >> >> server startup issue that I've described in my previous email.
> >> >>
> >> >> 2. There was an issue with the virtual router template upgrade,
> >> >> described below.
> >> >>
> >> >> VR template upgrade issue:
> >> >>
> >> >> After updating the systemvm template I went to Infrastructure >
> >> >> Virtual Routers and selected the Update template option for each
> >> >> virtual router. The virtual routers were updated successfully using
> >> >> the new templates. However, this broke ALL Egress rules on all
> >> >> networks, and none of the guest vms could pass outbound traffic.
> >> >> Port forwarding / incoming rules were working just fine. Removal and
> >> >> addition of Egress rules did not fix the issue. To fix it I had to
> >> >> restart each of the networks with the Clean up option ticked.
> >> >>
> >> >> Cheers
> >> >>
> >> >> Andrei
> >> >> ----- Original Message -----
> >> >>> From: "Andrei Mikhailovsky"
> >> >>> To: "users"
> >> >>> Sent: Monday, 2 April, 2018 21:44:27
> >> >>> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
> >> >>>
> >> >>> Hi Rohit,
> >> >>>
> >> >>> Following some further investigation, it seems that the
> >> >>> installation packages replaced the following file:
> >> >>>
> >> >>> /etc/default/cloudstack-management
> >> >>>
> >> >>> with
> >> >>>
> >> >>> /etc/default/cloudstack-management.dpkg-dist
> >> >>>
> >> >>> Thus the management server couldn't load the env variables and was
> >> >>> unable to start.
> >> >>>
> >> >>> I've put the file back and the management server is able to start.
> >> >>>
> >> >>> I will let you know if there are any other issues/problems.
> >> >>>
> >> >>> Cheers
> >> >>>
> >> >>> Andrei
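> >> >>>
> >> >>> For anyone hitting the same thing: "putting the file back" amounts
> >> >>> to restoring a file at the path the unit expects, roughly as
> >> >>> follows (the exact commands weren't captured, so treat this as a
> >> >>> sketch):
> >> >>>
> >> >>> # cp /etc/default/cloudstack-management.dpkg-dist /etc/default/cloudstack-management
> >> >>> # systemctl restart cloudstack-management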
> >> >>> ----- Original Message -----
> >> >>>> From: "Andrei Mikhailovsky"
> >> >>>> To: "users"
> >> >>>> Sent: Monday, 2 April, 2018 20:58:59
> >> >>>> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
> >> >>>>
> >> >>>> Hi Rohit,
> >> >>>>
> >> >>>> I have just upgraded and am having issues starting the service,
> >> >>>> with the following error:
> >> >>>>
> >> >>>> Apr 02 20:56:37 ais-cloudhost13 systemd[1]: cloudstack-management.service:
> >> >>>> Failed to load environment files: No such file or directory
> >> >>>> Apr 02 20:56:37 ais-cloudhost13 systemd[1]: cloudstack-management.service:
> >> >>>> Failed to run 'start-pre' task: No such file or directory
> >> >>>> Apr 02 20:56:37 ais-cloudhost13 systemd[1]: Failed to start
> >> >>>> CloudStack Management Server.
> >> >>>> -- Subject: Unit cloudstack-management.service has failed
> >> >>>> -- Defined-By: systemd
> >> >>>>
> >> >>>> Cheers
> >> >>>>
> >> >>>> Andrei
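> >> >>>>
> >> >>>> The "Failed to load environment files" error points at the unit's
> >> >>>> EnvironmentFile= path; standard systemd tooling shows which file
> >> >>>> it expects (a generic check, not specific to this thread):
> >> >>>>
> >> >>>> # systemctl cat cloudstack-management
> >> >>>> # ls -l /etc/default/cloudstack-management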
> >> >>>> ----- Original Message -----
> >> >>>>> From: "Rohit Yadav"
> >> >>>>> To: "users"
> >> >>>>> Sent: Friday, 30 March, 2018 19:17:48
> >> >>>>> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
> >> >>>>>
> >> >>>>> Some of the upgrade and minor issues have been fixed and will
> >> >>>>> make their way into 4.11.1.0. You're welcome to upgrade and share
> >> >>>>> your feedback, but bear in mind that due to some changes a
> >> >>>>> new/updated systemvmtemplate needs to be issued for 4.11.1.0 (it
> >> >>>>> will be compatible with both the 4.11.0.0 and 4.11.1.0 releases,
> >> >>>>> but 4.11.0.0 users will have to register that new template).
> >> >>>>>
> >> >>>>> - Rohit
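> >> >>>>>
> >> >>>>> Registering the new systemvm template is an ordinary template
> >> >>>>> registration; a rough CloudMonkey sketch (name, url and ids are
> >> >>>>> placeholders; the exact values come from the 4.11.1.0 release
> >> >>>>> notes):
> >> >>>>>
> >> >>>>> register template name=systemvm-kvm-4.11.1 \
> >> >>>>>   displaytext=systemvm-kvm-4.11.1 url=<template-url> \
> >> >>>>>   zoneid=<zone-uuid> hypervisor=KVM format=QCOW2 \
> >> >>>>>   ostypeid=<os-type-uuid>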
> >> >>>>> ________________________________
> >> >>>>> From: Andrei Mikhailovsky
> >> >>>>> Sent: Friday, March 30, 2018 11:00:34 PM
> >> >>>>> To: users
> >> >>>>> Subject: Upgrade from ACS 4.9.3 to 4.11.0
> >> >>>>>
> >> >>>>> Hello,
> >> >>>>>
> >> >>>>> My current infrastructure is ACS 4.9.3 with KVM, based on Ubuntu
> >> >>>>> 16.04 servers for the KVM hosts and the management server.
> >> >>>>>
> >> >>>>> I am planning to perform an upgrade from ACS 4.9.3 to 4.11.0 and
> >> >>>>> was wondering if anyone had any issues during the upgrades?
> >> >>>>> Anything to watch out for?
> >> >>>>>
> >> >>>>> I have previously seen issues with upgrading to 4.10, which
> >> >>>>> required some manual db updates from what I recall. Has this
> >> >>>>> issue been fixed in the 4.11 upgrade process?
> >> >>>>>
> >> >>>>> thanks
> >> >>>>>
> >> >>>>> Andrei


-- 

Andrija Panić