[neutron] Functional job failure rate at 100%

[neutron] Functional job failure rate at 100%

Jakub Libosvar-2
Hi all,

as per grafana [1], the functional job is broken. Looking at logstash [2],
it has been failing consistently since 2017-08-03 16:27. I didn't find
any particular patch in Neutron that could have caused it.

The culprit is that ovsdb starts misbehaving [3] and then we retry calls
indefinitely. We still use openvswitch 2.5.2, as we did before. I opened
a bug [4] and started investigating; I'll update my findings there.

I think at this point there is no reason to run "recheck" on your patches.

Thanks,
Jakub

[1]
http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=7&fullscreen
[2] http://bit.ly/2vdKMwy
[3]
http://logs.openstack.org/14/488914/8/check/gate-neutron-dsvm-functional-ubuntu-xenial/75d7482/logs/openvswitch/ovsdb-server.txt.gz
[4] https://bugs.launchpad.net/neutron/+bug/1709032

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [hidden email]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [neutron] Functional job failure rate at 100%

Ihar Hrachyshka
Considering all the instability of the job we have seen lately (this
bug being the latest hit, but we also have bug
https://bugs.launchpad.net/neutron/+bug/1707933), the closeness of the
release, and the lack of significant resources to dig into the issue,
I propose to temporarily disable the job:
https://review.openstack.org/#/c/491548/. I also suggest our mighty
leadership raise awareness of the issue and rally the troops to get it
solved.

(To reply to Kevin's request in IRC.) To recap what happened with the
timeout bug (https://bugs.launchpad.net/neutron/+bug/1707933): it
popped up about a month ago in master, but it hits the Ocata branch too
(so it's either a recent backport or some external dependency). The way
it happens is that one of the test workers (almost always one running a
FirewallTestCase test) dies in the middle of the run (you can see a
'Killed' message in the console log, and most of the time you can also
see the job taking ~2h and the last test worker dying in the
'inprogress' state). The first hypothesis was that some (other?) test
case calls execute(['kill', ...]) with the worker PID. To check that,
Jakub proposed https://review.openstack.org/#/c/487065/ and rechecked
it for a while until the bug was triggered in the gate. The collected
log suggested that kill was NOT called with the PID. The next step
could be catching all os.kill() calls in all functional tests and
logging their arguments somewhere (with call stacks). We were thinking
of mocking os.kill, replacing it with a function that logs the call and
passes it through to the original implementation, but we haven't had
time for that so far.
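The os.kill interception we had in mind could be sketched roughly like
this (an illustration only; the names here are made up, not from any
actual patch):

```python
import os
import traceback

_original_kill = os.kill
kill_calls = []  # collected (pid, sig, formatted call stack) tuples

def _logging_kill(pid, sig):
    """Record the arguments and call stack of every os.kill(), then delegate."""
    kill_calls.append((pid, sig, "".join(traceback.format_stack(limit=10))))
    return _original_kill(pid, sig)

os.kill = _logging_kill
```

In the functional tests this would be installed in a fixture's setUp
(and os.kill restored in cleanup); the recorded stacks would show
whether any test ever kills a worker PID directly.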

Regards,
Ihar


Re: [neutron][infra] Functional job failure rate at 100%

Jakub Libosvar-2
Daniel Alvarez and I spent some time looking at it, and the culprit was
finally found.

tl;dr

The kernel on the test machines was updated to one containing a bug in
conntrack entry creation, which makes the functional tests hang. More
info at [4].

For now, I sent a patch [5] to disable the jobs that create conntrack
entries manually; it still needs an updated commit message. Once it
merges, we can re-enable the functional job as voting to avoid
regressions.

Is it possible to switch the image used for the Jenkins machines back
to the older kernel version? Any other ideas on how to deal with the
kernel bug?

Thanks
Jakub

[5] https://review.openstack.org/#/c/492068/1




Re: [neutron][infra] Functional job failure rate at 100%

Daniel Alvarez Sanchez
Some more info added to Jakub's excellent report :)

A new kernel, Ubuntu-4.4.0-89.112, was tagged 9 days ago
(07/31/2017) [0].

From a quick look, the only commit around this function is [1].

[0] http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/commit/?id=64de31ed97a03ec1b86fd4f76e445506dce55b02
[1] http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/commit/?id=2ad4caea651e1cc0fc86111ece9f9d74de825b78




Re: [neutron][infra] Functional job failure rate at 100%

Thierry Carrez
Thanks for this nice detective work, Jakub and Daniel! This disabled
test job was making me nervous wrt. the Pike release.

Now I suspect that this kernel regression is affecting Ubuntu Xenial
users for previous releases of OpenStack, too, so I hope this will get
fixed in Ubuntu soon enough.

Daniel Alvarez Sanchez wrote:

> Some more info added to Jakub's excellent report :)
>
>
> A new kernel, Ubuntu-4.4.0-89.112, was tagged 9 days ago
> (07/31/2017) [0].
>
> From a quick look, the only commit around this function is [1].
>
> [0] http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/commit/?id=64de31ed97a03ec1b86fd4f76e445506dce55b02
> [1] http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/commit/?id=2ad4caea651e1cc0fc86111ece9f9d74de825b78


--
Thierry Carrez (ttx)


Re: [neutron][infra] Functional job failure rate at 100%

Jeremy Stanley
On 2017-08-09 15:29:04 +0200 (+0200), Jakub Libosvar wrote:
[...]
> Is it possible to switch the image used for the Jenkins machines
> back to the older kernel version? Any other ideas on how to deal
> with the kernel bug?

Making our images use non-current kernel packages isn't trivial, but
as Thierry points out in his reply this is not just a problem for
our CI system. Basically Ubuntu has broken OpenStack (and probably a
variety of other uses of conntrack) for a lot of people following
kernel updates in 16.04 LTS so the fix needs to happen there
regardless. Right now, basically, Ubuntu Xenial is not a good
platform to be running OpenStack on until they get the kernel
regression addressed.
--
Jeremy Stanley


Re: [neutron][infra] Functional job failure rate at 100%

Jakub Libosvar-2
On 09/08/2017 18:23, Jeremy Stanley wrote:

> On 2017-08-09 15:29:04 +0200 (+0200), Jakub Libosvar wrote:
> [...]
>> Is it possible to switch the image used for the Jenkins machines
>> back to the older kernel version? Any other ideas on how to deal
>> with the kernel bug?
>
> Making our images use non-current kernel packages isn't trivial, but
> as Thierry points out in his reply this is not just a problem for
> our CI system. Basically Ubuntu has broken OpenStack (and probably a
> variety of other uses of conntrack) for a lot of people following
> kernel updates in 16.04 LTS so the fix needs to happen there
> regardless. Right now, basically, Ubuntu Xenial is not a good
> platform to be running OpenStack on until they get the kernel
> regression addressed.

True. Fortunately, the impact is not as catastrophic for Neutron as it
might seem at first glance. Not sure about the other projects, though.
Neutron doesn't create conntrack entries in production code, only in
testing. That said, agents should work just fine even with the kernel bug.




Re: [neutron][infra] Functional job failure rate at 100%

Kevin Benton-3
This is just the code simulating the conntrack entries that would be created by real traffic in a production system, right?

On Wed, Aug 9, 2017 at 11:46 AM, Jakub Libosvar <[hidden email]> wrote:
True. Fortunately, the impact is not as catastrophic for Neutron as it
might seem at first glance. Not sure about the other projects, though.
Neutron doesn't create conntrack entries in production code, only in
testing. That said, agents should work just fine even with the kernel bug.


Re: [neutron][infra] Functional job failure rate at 100%

Thierry Carrez
Oh, that's good for us. Should still be fixed, if only so that we can
test properly :)

Kevin Benton wrote:

> This is just the code simulating the conntrack entries that would be
> created by real traffic in a production system, right?

--
Thierry Carrez (ttx)


Re: [neutron][infra] Functional job failure rate at 100%

Miguel Angel Ajo Pelayo-2
Good (amazing) job folks. :)



Re: [neutron][infra] Functional job failure rate at 100%

Mooney, Sean K

 

From: Miguel Angel Ajo Pelayo [mailto:[hidden email]]
Sent: Thursday, August 10, 2017 8:55 AM
To: OpenStack Development Mailing List (not for usage questions) <[hidden email]>
Subject: Re: [openstack-dev] [neutron][infra] Functional job failure rate at 100%

 

Good (amazing) job folks. :)

 

>     > Making our images use non-current kernel packages isn't trivial, but

[Mooney, Sean K] So on that, it would be quite trivial to have
diskimage-builder install the linux-image-virtual-hwe-16.04 or
linux-image-virtual-hwe-16.04-edge package to pull in a 4.10 or 4.11
kernel respectively, if the default 4.4 is broken. We just need a new
dib element to install the package and modify the nodepool config to
include it when it rebuilds the image every night. Alternatively, you
can pull a vanilla kernel from
http://kernel.ubuntu.com/~kernel-ppa/mainline/ following the process
documented at https://wiki.ubuntu.com/Kernel/MainlineBuilds if you want
to maintain testing with 4.4.x.




Re: [neutron][infra] Functional job failure rate at 100%

Jeremy Stanley
On 2017-08-10 17:13:58 +0000 (+0000), Mooney, Sean K wrote:
[...]

> So on that, it would be quite trivial to have diskimage-builder
> install the linux-image-virtual-hwe-16.04 or
> linux-image-virtual-hwe-16.04-edge package to pull in a 4.10 or
> 4.11 kernel respectively, if the default 4.4 is broken. We just
> need a new dib element to install the package and modify the
> nodepool config to include it when it rebuilds the image every
> night. Alternatively, you can pull a vanilla kernel from
> http://kernel.ubuntu.com/~kernel-ppa/mainline/ following the
> process documented at https://wiki.ubuntu.com/Kernel/MainlineBuilds
> if you want to maintain testing with 4.4.x.
[...]

Sure, your definition of "quite trivial" just differs a lot from
mine. Given that the bug in question seems to have no further impact
on the pace of development with the discussed test temporarily
disabled, that strikes me as a lot more maintainable in the long run.
The problem started getting urgent attention from the distro as soon
as it was reported to them and will likely be resolved over the next
few days, at which point we'll automatically update to the fixed
version and you can try re-enabling that test again.
--
Jeremy Stanley


Re: [neutron][infra] Functional job failure rate at 100%

Jakub Libosvar-2
On 10/08/2017 21:16, Jeremy Stanley wrote:

> On 2017-08-10 17:13:58 +0000 (+0000), Mooney, Sean K wrote:
> [...]
>> So on that, it would be quite trivial to have diskimage-builder
>> install the linux-image-virtual-hwe-16.04 or
>> linux-image-virtual-hwe-16.04-edge package to pull in a 4.10 or
>> 4.11 kernel respectively, if the default 4.4 is broken. We just
>> need a new dib element to install the package and modify the
>> nodepool config to include it when it rebuilds the image every
>> night. Alternatively, you can pull a vanilla kernel from
>> http://kernel.ubuntu.com/~kernel-ppa/mainline/ following the
>> process documented at https://wiki.ubuntu.com/Kernel/MainlineBuilds
>> if you want to maintain testing with 4.4.x.
> [...]
>
> Sure, your definition of "quite trivial" just differs a lot from
> mine. Given that the bug in question seems to have no further impact
> on the pace of development with the discussed test temporarily
> disabled, that strikes me as a lot more maintainable in the long run.
> The problem started getting urgent attention from the distro as soon
> as it was reported to them and will likely be resolved over the next
> few days, at which point we'll automatically update to the fixed
> version and you can try re-enabling that test again.

Yeah, I totally agree with Jeremy. The Ubuntu folks are very active :)
Thanks for that. My best hope is that they will backport the fix to
4.4.0 and tag a new kernel so we can start running the tests again.

