[nova] Cinder cross_az_attach=False changes/fixes


[nova] Cinder cross_az_attach=False changes/fixes

Matt Riedemann-3
This is a request for any operators out there that configure nova to set:

[cinder]
cross_az_attach=False

To check out these two bug fixes:

1. https://review.openstack.org/#/c/366724/

This is a case where nova is creating the volume during boot from volume
and providing an AZ to cinder during the volume create request. Today we
just pass the instance.availability_zone which is None if the instance
was created without an AZ set. It's unclear to me if that causes the
volume creation to fail (someone in IRC was showing the volume going
into ERROR state while Nova was waiting for it to be available), but I
think it will cause the later attach to fail here [1] because the
instance AZ (defaults to None) and volume AZ (defaults to nova) may not
match. I'm still looking for more details on the actual failure in that
one though.

The proposed fix in this case is to pass the AZ associated with the host
aggregate that the instance is in.
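To make the fix concrete, here is a hypothetical sketch of that logic (the function and helper names are illustrative, not nova's real API): with cross_az_attach=False, derive the AZ for the volume create request from the instance's host aggregate instead of blindly passing instance.availability_zone, which is None when the user didn't request an AZ.

```python
def az_for_volume_create(instance, get_host_az, default_az="nova"):
    """Return the AZ to pass to cinder when nova creates the volume.

    get_host_az is a stand-in for an aggregate lookup: given a host name,
    it returns the AZ of the aggregate the host belongs to, or None.
    """
    if instance.availability_zone is not None:
        # The user explicitly requested an AZ at boot; honor it.
        return instance.availability_zone
    if instance.host is not None:
        # Fall back to the AZ of the aggregate the host belongs to.
        host_az = get_host_az(instance.host)
        if host_az is not None:
            return host_az
    # Nothing better to go on; use the configured default.
    return default_az
```

The key change from today's behavior is the aggregate fallback: None is never sent to cinder for a scheduled instance.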

2. https://review.openstack.org/#/c/469675/

This is similar, but rather than checking the AZ when we're on the
compute and the instance has a host, we're in the API and doing a boot
from volume where an existing volume is provided during server create.
By default, the volume's AZ is going to be 'nova'. The code doing the
check here is getting the AZ for the instance, and since the instance
isn't on a host yet, it's not in any aggregate, so the only AZ we can
get is from the server create request itself. If an AZ isn't provided
during the server create request, then we're comparing
instance.availability_zone (None) to volume['availability_zone']
("nova") and that results in a 400.

My proposed fix is that, in the case of BFV checks from the API, we
default the AZ if one wasn't requested when comparing against the
volume. By default this compares "nova" for nova against "nova" for
cinder, since CONF.default_availability_zone defaults to "nova" in both
projects.
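As an illustrative sketch (not nova's actual code): when an existing volume is supplied at server create time and no AZ was requested, the proposed fix defaults the instance side of the comparison to CONF.default_availability_zone rather than comparing None to the volume's AZ. The function name below is hypothetical.

```python
DEFAULT_AVAILABILITY_ZONE = "nova"  # CONF.default_availability_zone default

def check_attach_az(requested_az, volume_az, cross_az_attach=False):
    """Raise ValueError when the AZs cannot match and cross-AZ attach is off."""
    if cross_az_attach:
        return  # cross-AZ attach is allowed; nothing to check
    # The fix: fall back to the configured default instead of comparing None.
    instance_az = (requested_az if requested_az is not None
                   else DEFAULT_AVAILABILITY_ZONE)
    if instance_az != volume_az:
        raise ValueError("Instance AZ %s and volume AZ %s do not match"
                         % (instance_az, volume_az))
```

With both projects at their defaults, the no-AZ-requested case now compares "nova" to "nova" and passes instead of returning a 400.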

--

I'm requesting help from any operators that are setting
cross_az_attach=False because I have to imagine your users have run into
this and you're patching around it somehow, so I'd like input on how you
or your users are dealing with this.

I'm also trying to recreate these in upstream CI [2] which I was already
able to do with the 2nd bug.

Having said all of this, I really hate cross_az_attach as it's
config-driven API behavior which is not interoperable across clouds.
Long-term I'd really love to deprecate this option but we need a
replacement first, and I'm hoping placement with compute/volume resource
providers in a shared aggregate can maybe make that happen.

[1]
https://github.com/openstack/nova/blob/f278784ccb06e16ee12a42a585c5615abe65edfe/nova/virt/block_device.py#L368
[2] https://review.openstack.org/#/c/467674/

--

Thanks,

Matt

_______________________________________________
OpenStack-operators mailing list
[hidden email]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

Re: [nova] Cinder cross_az_attach=False changes/fixes

Sam Morrison
Hi Matt,

Just looking into this,

> On 1 Jun 2017, at 9:08 am, Matt Riedemann <[hidden email]> wrote:
>
> This is a request for any operators out there that configure nova to set:
>
> [cinder]
> cross_az_attach=False
>
> To check out these two bug fixes:
>
> 1. https://review.openstack.org/#/c/366724/
>
> This is a case where nova is creating the volume during boot from volume and providing an AZ to cinder during the volume create request. Today we just pass the instance.availability_zone which is None if the instance was created without an AZ set. It's unclear to me if that causes the volume creation to fail (someone in IRC was showing the volume going into ERROR state while Nova was waiting for it to be available), but I think it will cause the later attach to fail here [1] because the instance AZ (defaults to None) and volume AZ (defaults to nova) may not match. I'm still looking for more details on the actual failure in that one though.
>
> The proposed fix in this case is pass the AZ associated with any host aggregate that the instance is in.

If cross_az_attach is false, won’t it always result in the instance AZ being None, since the instance won’t be on a host yet?
I haven’t traced back the code fully, so I'm not sure whether an instance gets scheduled onto a host before the volume create call happens, or whether they happen in parallel, etc. (in the case of boot from volume)


When cross_az_attach is false:
If a user does a boot from volume (create new volume) and specifies an AZ then I would expect the instance and the volume to be created in the specified AZ.
If the AZ doesn’t exist in cinder or nova I would expect it to fail.

If a user doesn’t specify an AZ I would expect that the instance and the volume are in the same AZ.
If there isn’t a common AZ between cinder and nova I would expect it to fail.
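The expectations above could be sketched like this (my reading of them, not nova's actual behaviour; the function name is made up): with cross_az_attach=False and boot from volume creating a new volume, the instance and volume should land in the same AZ, failing when no AZ satisfies both services.

```python
def pick_common_az(requested_az, nova_azs, cinder_azs):
    """Return the AZ to use for both the instance and the volume, or raise."""
    if requested_az is not None:
        # An explicitly requested AZ must exist on both sides.
        if requested_az in nova_azs and requested_az in cinder_azs:
            return requested_az
        raise ValueError("AZ %s not available in both nova and cinder"
                         % requested_az)
    # No AZ requested: any AZ common to both services will do.
    common = sorted(set(nova_azs) & set(cinder_azs))
    if common:
        return common[0]
    raise ValueError("no common AZ between nova and cinder")
```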



>
> 2. https://review.openstack.org/#/c/469675/
>
> This is similar, but rather than checking the AZ when we're on the compute and the instance has a host, we're in the API and doing a boot from volume where an existing volume is provided during server create. By default, the volume's AZ is going to be 'nova'. The code doing the check here is getting the AZ for the instance, and since the instance isn't on a host yet, it's not in any aggregate, so the only AZ we can get is from the server create request itself. If an AZ isn't provided during the server create request, then we're comparing instance.availability_zone (None) to volume['availability_zone'] ("nova") and that results in a 400.
>
> My proposed fix is in the case of BFV checks from the API, we default the AZ if one wasn't requested when comparing against the volume. By default this is going to compare "nova" for nova and "nova" for cinder, since CONF.default_availability_zone is "nova" by default in both projects.
>

Is this an alternative approach? Just trying to get my head around this all.

Thanks,
Sam





Re: [nova] Cinder cross_az_attach=False changes/fixes

Sylvain Bauza-5


On Tue, Jun 6, 2017 at 9:45 PM, Sam Morrison <[hidden email]> wrote:
> Hi Matt,
>
> Just looking into this,

> On 1 Jun 2017, at 9:08 am, Matt Riedemann <[hidden email]> wrote:
>
> This is a request for any operators out there that configure nova to set:
>
> [cinder]
> cross_az_attach=False
>
> To check out these two bug fixes:
>
> 1. https://review.openstack.org/#/c/366724/
>
> This is a case where nova is creating the volume during boot from volume and providing an AZ to cinder during the volume create request. Today we just pass the instance.availability_zone which is None if the instance was created without an AZ set. It's unclear to me if that causes the volume creation to fail (someone in IRC was showing the volume going into ERROR state while Nova was waiting for it to be available), but I think it will cause the later attach to fail here [1] because the instance AZ (defaults to None) and volume AZ (defaults to nova) may not match. I'm still looking for more details on the actual failure in that one though.
>
> The proposed fix in this case is pass the AZ associated with any host aggregate that the instance is in.

> If cross_az_attach is false, won’t it always result in the instance AZ being None, since the instance won’t be on a host yet?
> I haven’t traced back the code fully, so I'm not sure whether an instance gets scheduled onto a host before the volume create call happens, or whether they happen in parallel, etc. (in the case of boot from volume)


Sorry for resurrecting an old thread, but we recently discussed the AZ relationship between Nova and Cinder at the PTG and I wanted to clarify a couple of things.


> When cross_az_attach is false:
> If a user does a boot from volume (create new volume) and specifies an AZ then I would expect the instance and the volume to be created in the specified AZ.

I agree, that looks like the right behaviour to me.
I'd also add that if Nova is configured to assign an AZ by default (via the default_schedule_zone option), then that behaviour has to be enforced too.


> If the AZ doesn’t exist in cinder or nova I would expect it to fail.


I agree.

> If a user doesn’t specify an AZ I would expect that the instance and the volume are in the same AZ.

That's where I disagree. If no AZ was specified when the instance was created, or if Nova wasn't configured to assign an AZ by default to each instance, then Nova will pick any AZ and honestly won't care which one the instance lands in. In other words, the instance will transitively have an AZ, because it will be hosted on a compute node that is part of an AZ (or it will report the value of the default_availability_zone option), but that doesn't mean the instance will stay in that AZ forever, since there was no formal contract tying it to a specific AZ. Consequently, when an instance is migrated, it could land on a host that is not in the same AZ.

In that case, if the instance is not specifically tied to an AZ, I don't see a reason why we should ask Cinder to honor that AZ, since the volume and the instance could end up with different AZ names after a move operation.


> If there isn’t a common AZ between cinder and nova I would expect it to fail.



I'd prefer the same behaviour as if cross_az_attach were set to True, i.e. not providing an AZ in our call to Cinder.


>
> 2. https://review.openstack.org/#/c/469675/
>
> This is similar, but rather than checking the AZ when we're on the compute and the instance has a host, we're in the API and doing a boot from volume where an existing volume is provided during server create. By default, the volume's AZ is going to be 'nova'. The code doing the check here is getting the AZ for the instance, and since the instance isn't on a host yet, it's not in any aggregate, so the only AZ we can get is from the server create request itself. If an AZ isn't provided during the server create request, then we're comparing instance.availability_zone (None) to volume['availability_zone'] ("nova") and that results in a 400.
>
> My proposed fix is in the case of BFV checks from the API, we default the AZ if one wasn't requested when comparing against the volume. By default this is going to compare "nova" for nova and "nova" for cinder, since CONF.default_availability_zone is "nova" by default in both projects.
>

> Is this an alternative approach? Just trying to get my head around this all.


Same as what I wrote above. If the user didn't specify an AZ and Nova isn't configured to assign a default AZ, then Nova shouldn't care which Cinder AZ the instance's volume can be attached to. Since the instance's AZ can change, that contract would be broken by a move operation.



> --
>
> I'm requesting help from any operators that are setting cross_az_attach=False because I have to imagine your users have run into this and you're patching around it somehow, so I'd like input on how you or your users are dealing with this.
>
> I'm also trying to recreate these in upstream CI [2] which I was already able to do with the 2nd bug.
>
> Having said all of this, I really hate cross_az_attach as it's config-driven API behavior which is not interoperable across clouds. Long-term I'd really love to deprecate this option but we need a replacement first, and I'm hoping placement with compute/volume resource providers in a shared aggregate can maybe make that happen.
>


I do hate that configuration option, both for its non-interoperability and for the fact that it promises things it can't honor.
I'd rather be super-explicit in the option's description: even if you set that flag to False, that doesn't mean your instances will share the same AZ as their Cinder volumes; that only holds if the instance is tied to a specific AZ (either via a boot flag or by config).

-Sylvain

> [1] https://github.com/openstack/nova/blob/f278784ccb06e16ee12a42a585c5615abe65edfe/nova/virt/block_device.py#L368
> [2] https://review.openstack.org/#/c/467674/

