MessagingTimeout in block live-migration due to long image fetch operation

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

MessagingTimeout in block live-migration due to long image fetch operation

Gustavo Randich
Hi everyone,

(running Mitaka)

When doing block live-migration, if the image / backing file is not present at destination host, sometimes pre-live migration fails after 60 seconds as shown below. Retrying the migration to the same destination host succeeds.

It seems that an rpc_response_timeout of 60 seconds is not enough for this scenario, in which fetching the image involves 90 seconds. We don't like to increase rpc_response_timeout  to say, 120 seconds, only for this reason ('cause in other kind of errors we prefer to fail fast).

Given that migrations are usually long, shouldn't this operation be under the scope of a configurable timeout such as live_migration_progress_timeout or live_migration_completion_timeout which overrides the default rpc timeout?

Thanks!



SOURCE HOST

Nov 27 09:22:22 source check data is LibvirtLiveMigrateData(bdms=<?>,block_migration=True,disk_available_mb=400384,disk_over_commit=False,filename='tmpLBvgop',graphics_listen_addr_spice=<?>,graphics_listen_addr_vnc=<?>,image_type='default',instance_relative_path='a3dc8ffe-5fd1-475e-a016-3d14128aa84c',is_shared_block_storage=False,is_shared_instance_path=False,is_volume_backed=False,migration=<?>,serial_listen_addr=<?>,target_connect_addr=<?>)
Nov 27 09:23:22 [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] Pre live migration failed at r65-host18
2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] Traceback (most recent call last):
2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 5275, in _do_live_migration
2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c]     block_migration, disk, dest, migrate_data)
2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c]   File "/usr/lib/python2.7/dist-packages/nova/compute/rpcapi.py", line 692, in pre_live_migration
2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c]     disk=disk, migrate_data=migrate_data)
2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c]   File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c]     retry=self.retry)
2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c]   File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send
2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c]     timeout=timeout, retry=retry)
2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c]   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 470, in send
2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c]     retry=retry)
2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c]   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 459, in _send
2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c]     result = self._waiter.wait(msg_id, timeout)
2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c]   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 342, in wait
2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c]     message = self.waiters.get(msg_id, timeout=timeout)
2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c]   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 244, in get
2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c]     'to message ID %s' % msg_id)
2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] MessagingTimeout: Timed out waiting for a reply to message ID 32be13b876094c47b293f89b8d445652
2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c]



DESTINATION HOST

Nov 27 09:22:23 [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] migrate_data in pre_live_migration: LibvirtLiveMigrateData(bdms=<?>,block_migration=True,disk_available_mb=400384,disk_over_commit=False,filename='tmpLBvgop',graphics_listen_...
Nov 27 09:22:23 [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] Creating instance directory: /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c
Nov 27 09:22:23 [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] Creating disk.info with the contents: {u'/var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk': u'qcow2'}
Nov 27 09:22:23 [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] Checking to make sure images and backing files are present before live migration.
Nov 27 09:22:23 Lock "bb045b7affecd2cd3976771e1809996277954a98" acquired by "nova.virt.libvirt.imagebackend.fetch_func_sync" :: waited 0.000s
Nov 27 09:23:23 Acquired semaphore "refresh_cache-a3dc8ffe-5fd1-475e-a016-3d14128aa84c"
Nov 27 09:23:23 [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] _get_instance_nw_info()
Nov 27 09:23:23 REQ: curl -g -i -X GET http://prod-controller-01:9696/v2.0/ports.json?tenant_id=840871e744c8406a908838a6abe42a32&device_id=a3dc8ffe-5fd1-475e-a016-3d14128aa84c -H "User-Agent: python-neutronclient" -H "Accept: application/j...
Nov 27 09:23:23 RESP: [200] Date: Mon, 27 Nov 2017 14:23:23 GMT Connection: keep-alive Content-Type: application/json; charset=UTF-8 Content-Length: 878 X-Openstack-Request-Id: req-1b61ebb8-3aab-4f5f-9d9f-da4675dcaa96 #012RESP BODY: {"port...
...
...
Nov 27 09:23:23 Releasing semaphore "refresh_cache-a3dc8ffe-5fd1-475e-a016-3d14128aa84c"
Nov 27 09:23:23 [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] During wait destroy, instance disappeared.
...
...
Nov 27 09:23:23 mv /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c_del
Nov 27 09:23:24 CMD "mv /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c_del" returned: 0 in 0.037s
Nov 27 09:23:24 [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] Deleting instance files /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c_del
Nov 27 09:23:24 [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] Deletion of /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c_del complete
Nov 27 09:23:50 qemu-img info /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98.part
Nov 27 09:23:50 qemu-img info /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98.part" returned: 0 in 0.048s
Nov 27 09:23:50 4531211f-5252-403d-a90f-56ddcaa00928 was qcow2, converting to raw
Nov 27 09:23:50 qemu-img convert -O raw /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98.part /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98.converted -f qcow2
Nov 27 09:23:53 CMD "qemu-img convert -O raw /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98.part /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98.converted -f qcow2" returned: 0 in 3.400s
Nov 27 09:23:54 qemu-img info /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98.converted
Nov 27 09:23:54 qemu-img info /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98.converted" returned: 0 in 0.047s
Nov 27 09:23:54 Lock "bb045b7affecd2cd3976771e1809996277954a98" released by "nova.virt.libvirt.imagebackend.fetch_func_sync" :: held 90.662s
Nov 27 09:23:54 touch -c /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98
Nov 27 09:23:54 touch -c /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98" returned: 0 in 0.113s
Nov 27 09:23:54 qemu-img info /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98
Nov 27 09:23:54 qemu-img info /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98" returned: 0 in 0.048s
Nov 27 09:23:54 Lock "bb045b7affecd2cd3976771e1809996277954a98" acquired by "nova.virt.libvirt.imagebackend.copy_qcow2_image" :: waited 0.000s
Nov 27 09:23:54 qemu-img info /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98
Nov 27 09:23:54 qemu-img info /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98" returned: 0 in 0.046s
Nov 27 09:23:54 qemu-img create -f qcow2 -o backing_file=/var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98 /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk
Nov 27 09:23:54 CMD "qemu-img create -f qcow2 -o backing_file=/var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98 /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk" returned: 1 in 0.042s
Nov 27 09:23:54 u'qemu-img create -f qcow2 -o backing_file=/var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98 /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk' failed. Not Retrying.
Nov 27 09:23:54 Lock "bb045b7affecd2cd3976771e1809996277954a98" released by "nova.virt.libvirt.imagebackend.copy_qcow2_image" :: held 0.092s
Nov 27 09:23:54 Exception during message handling: Unexpected error while running command. Command: qemu-img create -f qcow2 -o backing_file=/var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98 /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk Exit code: 1
Stdout: u"Formatting '/var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk', fmt=qcow2 size=7516192768 backing_file=/var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16\n"
Stderr: u'qemu-img: /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk: Could not create file: No such file or directory\n'
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 138, in _dispatch_and_reply
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher     incoming.message))
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 185, in _dispatch
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher     return self._do_dispatch(endpoint, method, ctxt, args)
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 127, in _do_dispatch
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher     result = func(ctxt, **new_args)
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 110, in wrapped
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher     payload)
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher     self.force_reraise()
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 89, in wrapped
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher     return f(self, context, *args, **kw)
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 409, in decorated_function
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 387, in decorated_function
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher     kwargs['instance'], e, sys.exc_info())
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher     self.force_reraise()
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 375, in decorated_function
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 5221, in pre_live_migration
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher     migrate_data)
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 6695, in pre_live_migration
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher     fallback_from_host=instance.host)
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 6864, in _create_images_and_backing
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher     fallback_from_host)
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 6791, in _try_fetch_image_cache
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher     size=size)
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 251, in cache
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher     *args, **kwargs)
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 626, in create_image
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher     copy_qcow2_image(base, self.path, size)
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 274, in inner
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher     return f(*args, **kwargs)
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 584, in copy_qcow2_image
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher     libvirt_utils.create_cow_image(base, target)
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/utils.py", line 107, in create_cow_image
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher     execute(*cmd)
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/utils.py", line 56, in execute
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher     return utils.execute(*args, **kwargs)
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 389, in execute
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher     return processutils.execute(*cmd, **kwargs)
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 389, in execute
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher     cmd=sanitized_cmd)
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher ProcessExecutionError: Unexpected error while running command.
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher Command: qemu-img create -f qcow2 -o backing_file=/var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98 /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher Exit code: 1
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher Stdout: u"Formatting '/var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk', fmt=qcow2 size=7516192768 backing_file=/var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16\n"
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher Stderr: u'qemu-img: /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk: Could not create file: No such file or directory\n'
2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher
Nov 27 09:23:54 Returning exception Unexpected error while running command.
Command: qemu-img create -f qcow2 -o backing_file=/var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98 /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk
Exit code: 1
Stdout: u"Formatting '/var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk', fmt=qcow2 size=7516192768 backing_file=/var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16\n"
Stderr: u'qemu-img: /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk: Could not create file: No such file or directory\n' to caller



_______________________________________________
OpenStack-operators mailing list
[hidden email]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators