Debugging OVN external connectivity – Part 1

In OVN deployments (with or without OpenStack), I have often faced issues with external (north/south) connectivity to and from the VMs, and most of the time the cause was a misconfiguration in the OVN databases. That is why I decided to write this post.

I assume that the reader is familiar with the basic OVN architecture. The end of this post has links to some tutorials and blog posts on OVN.

OVN provides external connectivity in two ways:

  • Creating a logical gateway router. I recommend reading this excellent blog to know more about it – http://blog.spinhirne.com/2016/09/the-ovn-gateway-router.html
  • Adding a logical gateway router port to a logical router.
    • This, in turn, can be configured with or without HA. If HA is enabled, the gateway router port is scheduled on multiple chassis with one acting as master. If that chassis fails for some reason, another chassis takes over.

In this blog post I will concentrate on the logical gateway router port with no HA. In the next blog post, I intend to cover the logical gateway router port with HA.

What does scheduling mean here? It means that the chassis selected to host the gateway router port provides centralized external connectivity. North-south tenant traffic is first redirected to this chassis, which acts as the gateway.

I will take OpenStack as an example here. Let’s say you have a private network “private” with subnet 172.168.0.0/24, and a VM port is created with IP 172.168.0.12. The private network is attached to a neutron router “r1” and a gateway is added to it.

$ openstack network create private
$ openstack subnet create --network private --subnet-range 172.168.0.0/24 private-subnet
$ openstack router create r1
$ openstack router add subnet r1 private-subnet
$ openstack network create public --external --provider-network-type vlan --provider-segment 10 --provider-physical-network datacentre
$ openstack subnet create --network public --subnet-range 10.0.0.0/24 --allocation-pool start=10.0.0.100,end=10.0.0.200 --no-dhcp public-subnet
$ openstack router set --external-gateway public r1
$ openstack port create --network private vm1
$ openstack floating ip create --port vm1 public

When we run “ovn-nbctl show”, we see the output below. In my case the OVN databases run on a node with IP 172.16.2.11, with port 6641 for the OVN northbound DB and port 6642 for the OVN southbound DB.

# ovn-nbctl --db=tcp:172.16.2.11:6641 show
switch 9315d386-d612-49b2-8e90-0e69a29af331 (neutron-3e992fef-0d27-4a51-b85a-fa0d1aac48d4) (aka public)
    port 3e29a3ce-8113-47c4-909d-99586f510be6
        type: localport
        addresses: ["fa:16:3e:7d:1e:60"]
    port 4bbf7c7d-fdf0-444d-af04-68abf0cdb9c9
        type: router
        router-port: lrp-4bbf7c7d-fdf0-444d-af04-68abf0cdb9c9
    port provnet-3e992fef-0d27-4a51-b85a-fa0d1aac48d4
        type: localnet
        tag: 10
        addresses: ["unknown"]
switch 78cb4087-716f-4674-9279-9f5a8a4e251a (neutron-a004a804-436a-4175-b764-17c69e016247) (aka private)
    port 31b9cd7f-573b-4b4b-9d6a-8c9ebe968e80 (aka vm1)
        addresses: ["fa:16:3e:48:15:ef 172.168.0.12"]
    port 2a36a3e8-6490-479f-af38-e6b86d4800a1
        type: localport
        addresses: ["fa:16:3e:c8:06:ce 172.168.0.2"]
    port 16ec2d28-a9b1-492a-885e-ba6a18f731a0
        type: router
        router-port: lrp-16ec2d28-a9b1-492a-885e-ba6a18f731a0
router 207b380c-4f66-412b-979f-f0696ed1832b (neutron-82572bd3-f790-413b-b83b-942b2d23f9d2) (aka r1)
    port lrp-4bbf7c7d-fdf0-444d-af04-68abf0cdb9c9
        mac: "fa:16:3e:93:7f:f0"
        networks: ["10.0.0.102/24"]
    port lrp-16ec2d28-a9b1-492a-885e-ba6a18f731a0
        mac: "fa:16:3e:97:e5:c6"
        networks: ["172.168.0.1/24"]
    nat 343cfa84-9ef2-4fa6-996f-3c2eb97eaafa
        external ip: "10.0.0.106"
        logical ip: "172.168.0.12"
        type: "dnat_and_snat"
    nat 8e43b8a4-9a04-4fd2-adc1-57bf70dd062c
        external ip: "10.0.0.102"
        logical ip: "172.168.0.0/24"
        type: "snat"

Step 1: Get the list of chassis in your deployment

In OVN terminology, a chassis is simply a node where the ovn-controller service is running. The ovn-controller service on each chassis connects to the southbound database, and an entry is created for each chassis in the southbound DB.

Run “ovn-sbctl show”

In my case, I get the below output

# ovn-sbctl --db=tcp:172.16.2.11:6642 show
Chassis "771bfd23-8a81-4685-b759-bb4d7d542282"
    hostname: "overcloud-novacompute-1.novalocal"
    Encap geneve
        ip: "172.16.0.14"
        options: {csum="true"}
Chassis "116e3e4f-3ae1-4788-a300-b902b019530b"
    hostname: "overcloud-controller-0.novalocal"
    Encap geneve
        ip: "172.16.0.17"
        options: {csum="true"}
Chassis "58e05e13-bc58-4afc-b975-88b13c9b38cf"
    hostname: "overcloud-controller-1.novalocal"
    Encap geneve
        ip: "172.16.0.10"
        options: {csum="true"}
    Port_Binding "cr-lrp-4bbf7c7d-fdf0-444d-af04-68abf0cdb9c9"
Chassis "3c3f0f21-8cc9-4668-8a11-c3aebe5bbda3"
    hostname: "overcloud-novacompute-2.novalocal"
    Encap geneve
        ip: "172.16.0.7"
        options: {csum="true"}
Chassis "f7479467-cfea-49a2-a662-8c87bf69380e"
    hostname: "overcloud-controller-2.novalocal"
    Encap geneve
        ip: "172.16.0.12"
        options: {csum="true"}
Chassis "b8d08aa0-0486-403a-851b-366e45416c51"
    hostname: "overcloud-novacompute-0.novalocal"
    Encap geneve
        ip: "172.16.0.16"
        options: {csum="true"}
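With more than a handful of chassis it gets tedious to scan this output by eye. As a rough sketch, the small helper below reads “ovn-sbctl show” output on stdin and prints the chassis ID and hostname that host a given Port_Binding. The function name find_gateway_chassis is my own invention and not part of OVN, and the parsing assumes the output layout shown above.

```shell
# find_gateway_chassis PORT_BINDING
# Reads "ovn-sbctl show" output on stdin and prints "<chassis-id> <hostname>"
# for the chassis that lists PORT_BINDING. Hypothetical helper, not an OVN command.
find_gateway_chassis() {
    awk -v port="$1" '
        # Remember the chassis we are currently inside.
        $1 == "Chassis"      { gsub(/"/, "", $2); chassis = $2; host = "" }
        $1 == "hostname:"    { gsub(/"/, "", $2); host = $2 }
        # A Port_Binding line belongs to the most recent Chassis line.
        $1 == "Port_Binding" { gsub(/"/, "", $2); if ($2 == port) print chassis, host }
    '
}

# Example (needs access to the southbound DB):
#   ovn-sbctl --db=tcp:172.16.2.11:6642 show | \
#       find_gateway_chassis cr-lrp-4bbf7c7d-fdf0-444d-af04-68abf0cdb9c9
```

If the helper prints nothing, no chassis currently lists that port binding.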

Step 2: Verify ovn-bridge-mappings on all your chassis

In order for a chassis to provide external connectivity, ovn-controller expects “ovn-bridge-mappings” to be configured on it. You can verify the ovn-bridge-mappings setting by running the command below on the chassis.

# ovs-vsctl get open . external_ids:ovn-bridge-mappings
"datacentre:br-ex"

In my case it returns “datacentre:br-ex”. Please see http://openvswitch.org/support/dist-docs/ovn-controller.8.html and search for ovn-bridge-mappings for more information about it. If the above command returns an error and you want that chassis to provide external connectivity, configure ovn-bridge-mappings by running

# ovs-vsctl set open . external_ids:ovn-bridge-mappings="datacentre:br-ex"

“datacentre:br-ex” is just an example. Also create the OVS bridge “br-ex” if it is not present.
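The mapping value is a comma-separated list of “physnet:bridge” pairs, so a chassis may map several physical networks. As a minimal sketch, the helper below checks whether a given physical network (“datacentre” in my example) appears in such a value and prints the bridge it maps to. The name bridge_for_physnet is hypothetical; it only parses the string and does not talk to OVS.

```shell
# bridge_for_physnet MAPPINGS PHYSNET
# Prints the bridge that PHYSNET maps to in an ovn-bridge-mappings value
# (comma-separated "physnet:bridge" pairs) and returns 1 if PHYSNET is absent.
# Hypothetical helper, not part of OVS or OVN.
bridge_for_physnet() {
    # Drop the quotes and spaces that "ovs-vsctl get" may print, then
    # put one "physnet:bridge" pair on each line.
    printf '%s\n' "$1" | tr -d '" ' | tr ',' '\n' |
        awk -F: -v net="$2" '$1 == net { print $2; found = 1 } END { exit !found }'
}

# Example (run on the chassis):
#   bridge_for_physnet "$(ovs-vsctl get open . external_ids:ovn-bridge-mappings)" datacentre
```

If the bridge it prints does not exist yet, “ovs-vsctl --may-exist add-br br-ex” creates it without failing when it is already there.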

Step 3: Get the scheduled chassis of the gateway router port

Next step is to figure out where the gateway router port is scheduled. The chassis on which the gateway router port is scheduled acts as the gateway for the tenant traffic.

First get the name of the logical router gateway port by running the command below. 10.0.0.102 is the gateway IP attached to the router in my case. You can find yours by running “openstack router show r1”.

[root@overcloud-controller-0 heat-admin]# ovn-nbctl --db=tcp:172.16.2.11:6641 show | grep 10.0.0.102/24 -B3
router 207b380c-4f66-412b-979f-f0696ed1832b (neutron-82572bd3-f790-413b-b83b-942b2d23f9d2) (aka r1)
    port lrp-4bbf7c7d-fdf0-444d-af04-68abf0cdb9c9
        mac: "fa:16:3e:93:7f:f0"
        networks: ["10.0.0.102/24"]
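The grep above depends on the port line sitting within three lines of the match, and the dots in the pattern match any character. As a rough alternative, the sketch below reads “ovn-nbctl show” output on stdin and prints only the name of the router port whose networks line contains the exact address. The name router_port_for_network is hypothetical, and the parsing assumes the output layout shown above.

```shell
# router_port_for_network CIDR
# Reads "ovn-nbctl show" output on stdin and prints the name of the router
# port whose "networks" line contains CIDR (matched literally, in quotes).
# Hypothetical helper, not an OVN command.
router_port_for_network() {
    awk -v cidr="$1" '
        # Remember the most recent "port <name>" line.
        $1 == "port" { port = $2 }
        # Only router ports carry a "networks:" line.
        $1 == "networks:" && index($0, "\"" cidr "\"") { print port }
    '
}

# Example (needs access to the northbound DB):
#   ovn-nbctl --db=tcp:172.16.2.11:6641 show | router_port_for_network 10.0.0.102/24
```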

If you look at the “options” column of this logical router port, you will see that the gateway port is scheduled on the chassis “116e3e4f-3ae1-4788-a300-b902b019530b”, which is “overcloud-controller-0.novalocal” in my case. You will also see a column named “gateway_chassis”. If that is set, the gateway port is scheduled on multiple chassis with HA configured. Let’s assume the “gateway_chassis” column is empty for now.

If the “options” column is empty, the gateway router port is not scheduled. In the case of OpenStack this should not happen. With other CMSs (cloud management systems), the CMS is expected to set this column. You can also schedule the port manually, see step 4.
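To read just that column, the generic “get” command of ovn-nbctl can be used on the port name from the previous command. The sketch below pulls the redirect-chassis value out of the printed options map. The name redirect_chassis_of is my own, and I am assuming the map is printed in the usual OVSDB form, for example {redirect-chassis="116e3e4f-..."}.

```shell
# redirect_chassis_of OPTIONS
# Prints the redirect-chassis value found in an "options" map as printed by
# ovn-nbctl and returns 1 if the key is not set. Hypothetical helper.
redirect_chassis_of() {
    # Capture everything after redirect-chassis= (an optional quote is
    # skipped) up to the next quote, comma or closing brace.
    printf '%s\n' "$1" |
        sed -n 's/.*redirect-chassis="\{0,1\}\([^",}]*\).*/\1/p' |
        grep .
}

# Example (needs access to the northbound DB):
#   redirect_chassis_of "$(ovn-nbctl --db=tcp:172.16.2.11:6641 get \
#       Logical_Router_Port lrp-4bbf7c7d-fdf0-444d-af04-68abf0cdb9c9 options)"
```

A non-zero exit status means the port has no redirect-chassis, which is the “not scheduled” case described above.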

Step 4: Schedule the gateway router port if required

This step is required if the “options” column was empty in step 3, or if the gateway router port was scheduled on a chassis that does not provide external connectivity and you want to reschedule it to one that does. Select the chassis where you want to schedule the port and make sure that it has ovn-bridge-mappings configured. If your tenant traffic has external connectivity issues, this is the most likely cause, and this is where you fix it.

Let’s say you want to select the chassis 58e05e13-bc58-4afc-b975-88b13c9b38cf (overcloud-controller-1.novalocal).

[root@overcloud-controller-0 heat-admin]# ovn-nbctl --db=tcp:172.16.2.11:6641 set logical_router_port 528f0224-c016-4560-a122-3bb12bbdef1c options:redirect-chassis=58e05e13-bc58-4afc-b975-88b13c9b38cf

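Run the command from step 2 on the selected chassis to confirm that ovn-bridge-mappings is configured there. Afterwards, “ovn-sbctl show” (step 1) should list the Port_Binding “cr-lrp-4bbf7c7d-fdf0-444d-af04-68abf0cdb9c9” under that chassis.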
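Since the chassis ID is stored exactly as typed, a stray character (a trailing period, for example) silently points the port at a chassis that does not exist. As a small sketch, the helper below checks the ID and prints the command instead of running it, so it can be reviewed first. The name redirect_chassis_cmd is hypothetical, and the UUID check assumes chassis IDs are UUIDs as they are in my deployment, which OVN itself does not require.

```shell
# redirect_chassis_cmd DB PORT CHASSIS
# Prints (does not run) the ovn-nbctl command that schedules PORT on CHASSIS.
# Returns 1 if CHASSIS does not look like a UUID. Hypothetical helper.
redirect_chassis_cmd() {
    # Assumption: chassis IDs are lowercase UUIDs, as in this deployment.
    if ! printf '%s\n' "$3" |
        grep -Eq '^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$'; then
        echo "chassis ID '$3' does not look like a UUID" >&2
        return 1
    fi
    echo "ovn-nbctl --db=$1 set logical_router_port $2 options:redirect-chassis=$3"
}

# Example:
#   redirect_chassis_cmd tcp:172.16.2.11:6641 \
#       lrp-4bbf7c7d-fdf0-444d-af04-68abf0cdb9c9 \
#       58e05e13-bc58-4afc-b975-88b13c9b38cf
```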

Following the above steps should provide external connectivity to your tenant traffic. If it still doesn’t work, it is most likely a bug in OVN. Please report it to the OVS mailing list <dev@openvswitch.org>.

Conclusion

In this blog post we saw how to inspect the OVN databases to find the problem when external connectivity is broken for your tenant traffic. In the next blog post we will see how to fix issues in the HA scenario.

Links to OVN blogs and tutorials
