Michael Stang
Hi everyone,
We have a small problem with our OpenStack installation, which uses shared storage with OCFS2 over iSCSI with multipath. We are running OpenStack Mitaka on Ubuntu 16.04.3.
We have 10 compute nodes in total and 2 shared storage volumes: one is used by compute nodes 1-5 and the other by compute nodes 6-10, as ephemeral storage.
When we start a new instance from a new image that is not yet in the image cache, it works fine, and the copy from Glance to the image cache is as fast as it should be over 10 Gbit/s. If we do the same but start several instances at once (2 or more), and these instances land on more than one compute node within group 1-5 or 6-10, then the instances take very long (30 minutes) to start, or they even end up in an error state.
We see that the copy of the image to the image cache is then VERY slow, around 1-2 MByte per second. It seems that more than one compute node is trying to copy the image into the cache, and that in doing so they block the shared filesystem. All compute nodes that attempt this are marked as down in the Horizon dashboard. On the hosts themselves the libvirt-bin and nova-compute services are still running, but nova-compute has stopped writing to its log files and seems to be frozen as well.
After the copy of the image into the cache has finished, all nodes recover and come back up.
Has anyone seen this behaviour, or is it a known issue? Is this a bug or a configuration problem, and can it be solved somehow?
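In case it helps to separate the filesystem from nova: a rough sketch of a write test that could be run at the same time on two compute nodes of the same group, against a directory on the shared OCFS2 volume. If the rate also drops to a few MByte/s here, the contention would seem to be in the shared storage rather than in nova-compute itself. The function name, file name and sizes are made up for illustration, and the script only writes zeros and syncs them, so it is not the same I/O pattern as the real image copy.

```python
import os
import time
import uuid


def measure_write_throughput(directory, total_mb=256, chunk_mb=4):
    """Write about total_mb MiB of zeros into `directory`, fsync, and return MiB/s.

    Hypothetical helper for illustration only; it is not part of nova.
    """
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    # Unique file name so several nodes can write into the same directory.
    path = os.path.join(directory, "throughput-test-%s.tmp" % uuid.uuid4().hex)
    written = 0
    start = time.monotonic()
    try:
        with open(path, "wb") as f:
            while written < total_mb:
                f.write(chunk)
                written += chunk_mb
            # Push the data to the storage so the page cache does not hide the real rate.
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.monotonic() - start
    finally:
        # Always remove the test file, even if the write failed.
        if os.path.exists(path):
            os.remove(path)
    return written / max(elapsed, 1e-9)
```

For example, calling measure_write_throughput("/var/lib/nova/instances/_base", total_mb=1024) on nodes 1 and 2 simultaneously, and then on one node alone, would give two numbers to compare. The path is an assumption based on nova's default instances_path and image cache directory name; the actual mount point of the shared volume may differ.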
