---
date: 2014-03-31
title: OpenStack Neutron performance problems
category: devops
featured_image: https://i.imgur.com/fSMzOUE.jpg
---

For the last few weeks I have been consulting on a private cloud project
for a local company. Unsurprisingly, this has been based around the
typical OpenStack setup:

- Nova - KVM
- Neutron - Open vSwitch
- Cinder - LVM
- Glance - local files

My architecture is nothing out of the ordinary: a pair of hosts, each
with two networks, looking something like this:

![Network topology](images/net-topo.jpg)

All this is configured using Red Hat RDO. I had done all this before
under Grizzly and, using RDO, it took 30 minutes to set up.

Given how common and simple the setup is, imagine my surprise when it
did not work. What do I mean by "did not work"? From the outset I was
worried about Neutron. While I am fairly up to date with SDN in theory,
I am fairly green in practice. Fortunately, while RDO does not automate
its configuration, there is at least an [accurate
document](https://openstack.redhat.com/Neutron_with_existing_external_network)
on how to configure it.

Now, if I were just using small images that would probably be fine;
however, this project required Windows images. As a result, some
problems quickly surfaced. Each time I deployed a new Windows image,
everything would lock up:

- no network access to VMs
- Open vSwitch going mad (800-1000% CPU; see the check below)
- SSH access via eth0 completely dead

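If you are trying to match these symptoms on your own hosts, the CPU
burn is the easiest one to spot from the hypervisor. A quick sketch,
assuming a stock install where the userspace daemon is named
`ovs-vswitchd`:

    # On the compute/network host: snapshot CPU usage of ovs-vswitchd
    top -b -n 1 -p "$(pgrep -d, ovs-vswitchd)"
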
It has to be said that I initially barked up the wrong tree, pointing
the finger at disk access (usually the culprit on shared systems).
However, it turned out I was wrong.

A brief Serverfault/Twitter exchange with @martenhauville brought up a
few suggestions, one of which caught my eye:

> <https://ask.openstack.org/en/question/25947/openstack-neutron-stability-problems-with-openvswitch/>
>
> there are known Neutron configuration challenges to overcome with GRE
> and MTU settings

This is where my problem lay: the external switch had an MTU of 1500,
and so did Open vSwitch. Finally, `ip link` in a VM would give you:

    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br-ex state UP mode DEFAULT qlen 1000

Everything matches. However, I was using GRE tunnels, which add a
header to each frame. This was pushing frames over 1500 bytes on entry
to `br-tun`, causing massive fragmentation, which basically destroyed
Open vSwitch every time I performed a large transfer. It showed up when
deploying an image, because that hits the Glance API over HTTP.

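If you want to confirm the diagnosis before changing anything, a
don't-fragment ping from inside a VM is a quick test. A minimal sketch;
the target address is a placeholder for another host on the tenant
network:

    # 1472 bytes of ICMP payload + 28 bytes of headers = a 1500-byte
    # packet. With GRE overhead on the tunnel this should fail, while
    # the smaller payload gets replies.
    ping -M do -s 1472 -c 3 10.0.0.1
    ping -M do -s 1400 -c 3 10.0.0.1
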
Once armed with this knowledge, the fix is trivial. Add the following to
`/etc/neutron/dhcp_agent.ini`:

    dnsmasq_config_file=/etc/neutron/dnsmasq-neutron.conf

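(If `dhcp_agent.ini` already has content, this line belongs in its
`[DEFAULT]` section.)
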
Now create the file `/etc/neutron/dnsmasq-neutron.conf`, containing
the following:

    dhcp-option-force=26,1454

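DHCP option 26 is the standard "interface MTU" option, so dnsmasq will
now push an MTU of 1454 to every instance that takes a lease. GRE over
IPv4 adds at least 24 bytes of encapsulation (more with the optional
key field), so 1454 leaves comfortable headroom under the physical MTU
of 1500. To watch it take effect on a Linux guest, renew the lease by
hand; the interface name here is an assumption:

    # Inside a Linux guest (interface name may differ)
    sudo dhclient -r eth0 && sudo dhclient eth0
    ip link show eth0   # should now report "mtu 1454"
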
Now you can restart the DHCP agent and all will be well:

    service neutron-dhcp-agent restart

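The Windows images that triggered all this pick up option 26 the same
way. After a reboot (or a lease renewal) you can check the result from
an elevated Command Prompt; a quick sketch:

    :: From an elevated Command Prompt inside the Windows guest
    ipconfig /renew
    :: The MTU column for the tenant NIC should now read 1454
    netsh interface ipv4 show subinterfaces
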
I've gone on a bit in this post, as I feel the background is important.
By far the hardest part was diagnosing the problem; without knowing the
background, it would be much harder to narrow your own problem down to
being the same as mine.