[tripleo][ironic] Hardware provisioning testing for Ocata

[tripleo][ironic] Hardware provisioning testing for Ocata

Justin Kilpatrick
Morning everyone,

I've been working on a performance testing tool for TripleO hardware
provisioning operations off and on for about a year now, and I've been
using it to collect more detailed data about how TripleO performs in
scale and production use cases. Perhaps more importantly, YODA (Yet
Openstack Deployment Tool, Another) automates the task enough that
days of deployment testing become a set-it-and-forget-it operation.

You can find my testing tool here [0], and the test report [1] has
links to raw data and visualization. Just scroll down, click the
CAPTCHA and click "go to kibana". I still need to port that machine
from my own solution over to Search Guard.

If you have too much email to consider clicking links, I'll copy the
results summary here.

TripleO inspection workflows have seen massive improvements since
Newton, with the failure rate for 50 nodes with the default workflow
falling from 100% to <15%. With patches slated for Pike, that spurious
failure rate reaches zero.

Overcloud deployments show a significant improvement of deployment
speed in HA and stack update tests.

Ironic deployments in the overcloud allow the use of Ironic for bare
metal scale out alongside more traditional VM compute. Considering that
a single conductor starts to struggle around 300 nodes, it will be
difficult to push a multi-conductor setup to its limits.

Finally, Ironic node cleaning shows a similar failure rate to
inspection and will require similar attention in TripleO workflows to
become painless.

[0] https://review.openstack.org/#/c/384530/
[1] https://docs.google.com/document/d/194ww0Pi2J-dRG3-X75mphzwUZVPC2S1Gsy1V0K0PqBo/

Thanks for your time!

- Justin

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [hidden email]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [tripleo][ironic] Hardware provisioning testing for Ocata

Emilien Macchi-4
Hey Justin,

All of this is really cool. I was wondering if you had a list of bugs
that you've faced or reported yourself regarding performance issues in
TripleO.
As you might have seen in a separate thread on openstack-dev, we're
planning a sprint on June 21-22 to improve performance in TripleO.
We would love your participation, or that of someone from your team,
and if you have time beforehand, please add the deployment-time tag to
the Launchpad bugs that you know are related to performance.

Thanks a lot,

--
Emilien Macchi

Re: [tripleo][ironic] Hardware provisioning testing for Ocata

Justin Kilpatrick
Hi Emilien,

I'll try to get a list of the Perf & Scale team's TripleO deployment
bugs and bring them to the deployment hackfest.

I look forward to participating!

- Justin

Re: [tripleo][ironic] Hardware provisioning testing for Ocata

Dmitry Tantsur
In reply to this post by Justin Kilpatrick
On 06/08/2017 02:21 PM, Justin Kilpatrick wrote:

> TripleO inspection workflows have seen massive improvements from
> Newton with a failure rate for 50 nodes with the default workflow
> falling from 100% to <15%. Using patches slated for Pike that spurious
> failure rate reaches zero.

\o/

> Ironic deployments in the overcloud allow the use of Ironic for bare
> metal scale out alongside more traditional VM compute. Considering a
> single conductor starts to struggle around 300 nodes it will be
> difficult to push a multi conductor setup to its limits.

This number of "300", does it come from your testing or from other sources? If
the former, which driver were you using? What exactly were the problems you saw
approaching this number?

>
> Finally Ironic node cleaning, shows a similar failure rate to
> inspection and will require similar attention in TripleO workflows to
> become painless.

Could you please elaborate? (a bug could also help). What exactly were you doing?

>
> [0] https://review.openstack.org/#/c/384530/
> [1] https://docs.google.com/document/d/194ww0Pi2J-dRG3-X75mphzwUZVPC2S1Gsy1V0K0PqBo/
>
> Thanks for your time!

Thanks for YOUR time, this work is extremely valuable!

Re: [tripleo][ironic] Hardware provisioning testing for Ocata

Justin Kilpatrick
On Fri, Jun 9, 2017 at 5:25 AM, Dmitry Tantsur <[hidden email]> wrote:
> This number of "300", does it come from your testing or from other sources?
> If the former, which driver were you using? What exactly problems have you
> seen approaching this number?

I haven't encountered this issue personally, but from talking to Joe
Talerico and some operators at the summit, around this number a single
conductor begins to fall behind on polling all of the out-of-band
interfaces for the machines that it's responsible for. You start to
see what you would expect from polling running behind, like incorrect
power states listed for machines and a general inability to perform
machine operations in a timely manner.

Having spent some time at the Ironic operators forum, I can say this is
pretty normal, and the correct response is just to scale out conductors.
This is a problem with TripleO because we don't really have a scale-out
option with a single-machine design. Fortunately, just increasing the
time between interface polling acts as a pretty good stopgap for this
and lets Ironic catch up.
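
To make the polling argument concrete, here is a rough back-of-the-envelope
sketch. It assumes a very simple model that is not taken from Ironic's code:
every out-of-band query takes the same time, and a pass over all nodes must
finish before the next one is due. The function names and the 0.5 second
per-node figure are hypothetical, not measured values. As far as I know, the
interval in question corresponds to the sync_power_state_interval option in
the [conductor] section of ironic.conf.

```python
import math


def power_sync_pass_seconds(node_count, seconds_per_node, workers=1):
    """Estimate how long one full power-state polling pass takes.

    Hypothetical model: each out-of-band query costs `seconds_per_node`
    and the nodes are split evenly across `workers` parallel pollers.
    """
    if node_count < 0 or seconds_per_node < 0 or workers < 1:
        raise ValueError("invalid polling parameters")
    return math.ceil(node_count / workers) * seconds_per_node


def keeps_up(node_count, seconds_per_node, interval_seconds, workers=1):
    """Return True if a polling pass finishes before the next one is due."""
    pass_seconds = power_sync_pass_seconds(node_count, seconds_per_node, workers)
    return pass_seconds <= interval_seconds


def minimum_interval(node_count, seconds_per_node, workers=1, headroom=1.5):
    """Smallest interval that leaves `headroom` times the pass duration."""
    return power_sync_pass_seconds(node_count, seconds_per_node, workers) * headroom


if __name__ == "__main__":
    # Assumed numbers for illustration only: 0.5 s per query, 60 s interval.
    for nodes in (100, 300):
        print(nodes, "nodes keep up:", keeps_up(nodes, 0.5, 60))
    print("suggested interval for 300 nodes:", minimum_interval(300, 0.5))
```

Under these assumed numbers a single serial poller falls behind somewhere
past 120 nodes, and raising the interval is what lets it catch up again,
which matches the stopgap described above in spirit if not in exact figures.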

I may get some time on a cloud of that scale in the future, at which
point I will have hard numbers to give you. One of the reasons I made
YODA was the frustrating prevalence of anecdotes instead of hard data
when it came to one of the most important parts of the user
experience. If it doesn't deploy people don't use it, full stop.

> Could you please elaborate? (a bug could also help). What exactly were you
> doing?

https://bugs.launchpad.net/ironic/+bug/1680725

Describes exactly what I'm experiencing. Essentially the problem is
that nodes can and do fail to PXE boot, then cleaning fails and you
just lose the nodes. Users have to spend time going back and
babysitting these nodes, and there are no good instructions on what to
do with failed nodes anyway. The answer is to move them to manageable
and then to available, at which point they go back into cleaning until
it finally works.
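
The retry cycle described above can be sketched as a small simulation. This
is only an illustration under stated assumptions: the node is a plain dict,
`run_cleaning` is a hypothetical callback standing in for a real cleaning
run, and nothing here talks to Ironic. On a real deployment the two
transitions would be the "manage" and "provide" actions (for example
`openstack baremetal node manage` followed by `openstack baremetal node
provide`).

```python
# Provision-state names as used by Ironic's state machine.
CLEAN_FAILED = "clean failed"
MANAGEABLE = "manageable"
AVAILABLE = "available"


def recover_clean_failed(node, run_cleaning, max_attempts=3):
    """Cycle a node through manageable and cleaning until it is available.

    `node` is a dict with "name" and "provision_state" keys (hypothetical
    shape). `run_cleaning(node)` returns True when cleaning succeeds.
    Returns the number of cleaning attempts that were needed.
    """
    attempts = 0
    while node["provision_state"] != AVAILABLE:
        if attempts >= max_attempts:
            raise RuntimeError(
                f"{node['name']} still failing after {attempts} attempts"
            )
        attempts += 1
        # "manage": clean failed -> manageable
        node["provision_state"] = MANAGEABLE
        # "provide": manageable -> cleaning -> available or clean failed
        succeeded = run_cleaning(node)
        node["provision_state"] = AVAILABLE if succeeded else CLEAN_FAILED
    return attempts


def flaky_cleaning(failures_before_success):
    """Build a fake cleaning callback that fails a fixed number of times."""
    remaining = [failures_before_success]

    def run(node):
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        return True

    return run


if __name__ == "__main__":
    node = {"name": "node-0", "provision_state": CLEAN_FAILED}
    used = recover_clean_failed(node, flaky_cleaning(2))
    print(node["name"], node["provision_state"], "after", used, "attempts")
```

Capping the number of attempts is a design choice of this sketch: a node
that never manages to PXE boot would otherwise loop forever, which is the
babysitting problem in automated form.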

Like introspection a year ago, this is a cavalcade of documentation
problems and software issues. I mean, really everything *works*
technically, but the documentation acts like cleaning will work all the
time and so does the software, leaving the user to figure out how to
accommodate the realities of the situation without so much as a
warning that it might happen.

This comes out as more of a UX issue than a software one, but we can't
just ignore these.

- Justin

Re: [tripleo][ironic] Hardware provisioning testing for Ocata

Mark Goddard-2
This is great information Justin, thanks for sharing. It will prove useful as we scale up our ironic deployments.

It seems to me that a reference configuration of ironic would be a useful resource for many people. Some key decisions may at first seem arbitrary but have an impact on performance and scalability, such as:

- BIOS vs. UEFI
- PXE vs. iPXE bootloader
- TFTP vs. HTTP for kernel/ramdisk transfer
- iSCSI vs. Swift (or one day standalone HTTP?) for image transfer
- Hardware specific drivers vs. IPMI
- Local boot vs. netboot
- Fat images vs. slim + post-configuration
- Any particularly useful configuration tunables (power state polling interval, nova build concurrency, others?)

I personally use kolla + kolla-ansible, which by default uses PXE + TFTP + iSCSI, arguably not the best combination.

Cheers,
Mark

Re: [tripleo][ironic] Hardware provisioning testing for Ocata

Sai Sindhur Malleni
In reply to this post by Emilien Macchi-4


On Thu, Jun 8, 2017 at 11:10 AM, Emilien Macchi <[hidden email]> wrote:
> As you might have seen in a separate thread on openstack-dev, we're
> planning a sprint on June 21-22 to improve performance in TripleO.

Is this an IRC thing, or a video call? I work on the OpenStack Performance and Scale team and would love to participate.

--
Sai Sindhur Malleni

Software Engineer
Red Hat Inc.

Re: [tripleo][ironic] Hardware provisioning testing for Ocata

Joe Talerico
In reply to this post by Dmitry Tantsur
On Fri, Jun 9, 2017 at 5:25 AM, Dmitry Tantsur <[hidden email]> wrote:

> This number of "300", does it come from your testing or from other sources?

Dmitry - The "300" comes from my testing on different environments.

Most recently, here is what I saw at CNCF -
https://snapshot.raintank.io/dashboard/snapshot/Sp2wuk2M5adTpqfXMJenMXcSlCav2PiZ

The undercloud was "idle" during this period.

> If the former, which driver were you using?

pxe_ipmitool.

> What exactly problems have you seen approaching this number?

I would have to restart ironic-conductor before every scale-up. Here
is what ironic-conductor looks like after a restart:
https://snapshot.raintank.io/dashboard/snapshot/Im3AxP6qUfMnTeB97kryUcQV6otY0bHP
Without restarting Ironic, the scale-up would fail because of Ironic (I
do not have the exact error we would encounter documented).

Re: [tripleo][ironic] Hardware provisioning testing for Ocata

Joe Talerico
In reply to this post by Justin Kilpatrick
On Fri, Jun 9, 2017 at 7:28 AM, Justin Kilpatrick <[hidden email]> wrote:

> On Fri, Jun 9, 2017 at 5:25 AM, Dmitry Tantsur <[hidden email]> wrote:
>> Could you please elaborate? (a bug could also help). What exactly were you
>> doing?
>
> https://bugs.launchpad.net/ironic/+bug/1680725

Additionally, I would like to see more verbose output from
cleaning: https://bugs.launchpad.net/ironic/+bug/1670893
