---
date: 2014-03-31
title: OpenStack Neutron Performance problems
category: devops
featured_image: https://i.imgur.com/fSMzOUE.jpg
---

For the last few weeks I have been consulting on a private cloud project
for a local company. Unsurprisingly, this has been based around the
typical OpenStack setup:

-   Nova - KVM
-   Neutron - Open vSwitch
-   Cinder - LVM
-   Glance - local files

My architecture is nothing out of the ordinary: a pair of hosts, each
with two networks.

All this is configured using Red Hat RDO. I had done all this before
under Grizzly and, using RDO, it took 30 minutes to set up.

Given how common and simple the setup is, imagine my surprise when it
did not work. What do I mean by "did not work"? From the outset I was
worried about Neutron. While I am fairly up to date with SDN in theory,
I am fairly green in practice. Fortunately, while RDO does not automate
its configuration, there is at least an [accurate
document](https://openstack.redhat.com/Neutron_with_existing_external_network)
on how to configure it.

Now, if I were just using small images, that would probably have been
fine; however, this project required Windows images. As a result, some
problems quickly surfaced. Each time I deployed a new Windows image,
everything would lock up:

-   no network access to VMs
-   Open vSwitch going mad (800-1000% CPU)
-   SSH access via eth0 completely dead

It has to be said that I initially barked up the wrong tree, pointing
the finger at disk access (usually the problem with shared systems).
However, it turned out I was wrong.

A brief Server Fault/Twitter exchange with \@martenhauville brought up a
few suggestions, one of which caught my eye:

> <https://ask.openstack.org/en/question/25947/openstack-neutron-stability-problems-with-openvswitch/>
> there are known Neutron configuration challenges to overcome with GRE
> and MTU settings

This is where my problem lay: the external switch had an MTU of 1500,
and so did Open vSwitch. Finally, `ip link` in a VM would give you:

    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br-ex state UP mode DEFAULT qlen 1000

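When chasing this kind of problem it can help to list the MTU of every
interface along the path rather than reading `ip link` output by eye.
The following is a minimal sketch: the helper name `mtu_of` is made up
for illustration, and it assumes the one-line-per-interface layout of
the sample above.

```shell
#!/bin/sh
# Hypothetical helper: print "<interface>: <mtu>" for each line of
# `ip link`-style output read from stdin.
mtu_of() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "mtu") print $2, $(i + 1)
    }'
}

# Feed it the sample line from the VM shown above.
sample='2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br-ex state UP mode DEFAULT qlen 1000'
echo "$sample" | mtu_of   # prints "eth0: 1500"
```

On a real host the same function can be fed from `ip -o link show`,
which prints one interface per line.
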
Everything matches. However, I was using GRE tunnels, which add a header
to each frame. This was pushing them over 1500 bytes on entry to
`br-tun`, causing massive network fragmentation, which basically
destroyed Open vSwitch every time I performed a large transfer. It
showed up when deploying an image because that hits the Glance API over
HTTP.

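To make the arithmetic concrete, here is a rough sketch of how much room
GRE leaves for the guest. The header sizes are assumptions about a
typical setup (an IPv4 outer header without options, GRE with a key
field, and headroom for a VLAN tag), not measurements from this
deployment, and the function name `gre_guest_mtu` is hypothetical.

```shell
#!/bin/sh
# Sketch: estimate the largest guest MTU that still fits in one physical
# frame once GRE encapsulation is added. All header sizes are assumptions.
gre_guest_mtu() {
    physical_mtu=$1
    outer_ip=20    # outer IPv4 header, no options
    gre=8          # GRE base header (4 bytes) plus key field (4 bytes)
    inner_eth=14   # Ethernet header of the encapsulated guest frame
    vlan=4         # headroom for an 802.1Q tag (assumption)
    echo $(( physical_mtu - outer_ip - gre - inner_eth - vlan ))
}

gre_guest_mtu 1500   # prints 1454
```

Under those assumptions, a guest that keeps sending 1500-byte packets
overshoots the physical MTU by 46 bytes on every full-size packet.
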
Once armed with this knowledge, the fix is trivial. Add the following to
`/etc/neutron/dhcp_agent.ini`:

    dnsmasq_config_file=/etc/neutron/dnsmasq-neutron.conf

Now create the file `/etc/neutron/dnsmasq-neutron.conf`, which contains
the following (DHCP option 26 is the interface MTU handed to the guest):

    dhcp-option-force=26,1454

Now you can restart the DHCP agent and all will be well:

    service neutron-dhcp-agent restart

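Guests only pick up the new MTU when they next obtain or renew a DHCP
lease. One way to confirm the change took effect is to send a ping that
exactly fills the new MTU with fragmentation prohibited. The sketch
below only computes the payload size and prints the commands to run
inside a Linux guest; `<host>` is a placeholder and the function name
`icmp_payload_for_mtu` is hypothetical.

```shell
#!/bin/sh
# Sketch: largest ICMP echo payload that fits in a given MTU without
# fragmenting, i.e. the MTU minus the IPv4 header (20) and ICMP header (8).
icmp_payload_for_mtu() {
    mtu=$1
    echo $(( mtu - 20 - 8 ))
}

payload=$(icmp_payload_for_mtu 1454)

# Commands to run inside a guest after it has renewed its lease.
# They are printed rather than executed, since <host> is a placeholder.
echo "ip link show eth0"
echo "ping -M do -c 3 -s $payload <host>"
```

If the ping with that payload succeeds and one a byte larger fails, the
guest is using the reduced MTU. On the Windows guests this project
needed, the equivalent check would be `ping -f -l 1426 <host>`.
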
I've gone on a bit in this post, as I feel the background is important.
By far the hardest part was diagnosing the problem. Without knowing my
background, it would be much harder to tell whether your problem is the
same as mine.
