Sprint at Pycon: Port OpenStack to Python 3

victor stinner
Hi,

I will organize a four-day sprint to port OpenStack to Python 3 in Montreal (Canada) during PyCon Montreal 2014, from April 14 (Monday) to April 17 (Thursday).

The goal of the sprint is to port OpenStack components and OpenStack dependencies to Python 3, and to send patches porting as much code as possible. If you don't know OpenStack, you may prefer to focus on OpenStack dependencies (MySQL-python, python-ldap, websockify, ...). This is also a good opportunity to try replacing eventlet with trollius (asyncio)! Wiki page of the sprint:

   https://wiki.openstack.org/wiki/Python3/SprintPycon2014

I'm writing to openstack-dev to:

 * find out whether you are interested in joining the sprint
 * find out which OpenStack components are most interested in receiving and *reviewing* Python 3 patches during the sprint
 * find out which OpenStack components are interested in replacing eventlet with trollius for the Juno release, or at least in having optional support for trollius
 * more generally, get your feedback :-)

I know that the sprint doesn't fit well with the Icehouse release (scheduled for the last day of the sprint!). I hope that patches can be reviewed during the sprint, so that developers can update them if needed while it is still running. I guess it's fine if patches are only merged after the sprint.

Victor

_______________________________________________
OpenStack-dev mailing list
[hidden email]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: Sprint at Pycon: Port OpenStack to Python 3

Doug Hellmann
I won't be attending PyCon this year, but count me in as a remote reviewer.

Will you be setting up an IRC channel we should join?


Re: Sprint at Pycon: Port OpenStack to Python 3

John Dennis
In reply to this post by victor stinner
On 04/01/2014 04:40 AM, victor stinner wrote:

> Hi,
>
> I will organize a sprint to Port OpenStack to Python 3 during 4 days
> at Montreal (Canada) during Pycon Montreal 2014, between April, 14
> (Monday) and April, 17 (Thursday).
>
> The goal of the sprint is to port OpenStack components and OpenStack
> dependencies to Python 3, send patches to port as much code as
> possible. If you don't know OpenStack, you may focus more on
> OpenStack dependencies (MySQL-python, python-ldap,  websockify, ...).

What are the plans for python-ldap? Only a small part of python-ldap is
pure Python; are you also planning on tackling the C code? The
biggest change in Py3 is unicode/str. The biggest pain point in the 2.x
version of python-ldap is unicode <--> utf-8 at the API. Currently with
python-ldap we have to encode almost every parameter to utf-8 before
calling python-ldap and then decode the result back from utf-8 to
unicode. I always thought this should have been done inside the
python-ldap binding, and that it was a design failure that it didn't
correctly handle Python's unicode objects. FWIW the binding relied on
CPython's automatic encoding conversion, which applies the default
encoding of ASCII and so causes encoding exceptions. The C binding
just never used the argument processing in PyArg_ParseTuple() and
PyArg_ParseTupleAndKeywords() that allows you to specify the desired
encoding (the C API for LDAP specifies UTF-8, as do the RFCs).
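To make the encode-before / decode-after pattern described above concrete, here is a minimal Python 3 sketch. It assumes a made-up bytes-only function, `bytes_only_search`, standing in for a binding that accepts only UTF-8 bytes; it is not the real python-ldap API, and the wrapper name `search` is equally hypothetical.

```python
# Hypothetical stand-in for a bytes-only binding; NOT the real python-ldap API.
def bytes_only_search(base_dn, filter_value):
    if not isinstance(base_dn, bytes) or not isinstance(filter_value, bytes):
        raise TypeError("this binding only accepts UTF-8 encoded bytes")
    # Echo back a fake entry so the round trip can be observed.
    return [(base_dn, {b"cn": [filter_value]})]


def search(base_dn, filter_value):
    """Accept text, encode to UTF-8 at the boundary, decode the results."""
    raw = bytes_only_search(base_dn.encode("utf-8"),
                            filter_value.encode("utf-8"))
    return [
        (dn.decode("utf-8"),
         {key.decode("utf-8"): [value.decode("utf-8") for value in values]
          for key, values in attrs.items()})
        for dn, attrs in raw
    ]


results = search("ou=Montréal,dc=example,dc=org", "José")
```

Callers of `search` only ever see text; the UTF-8 conversion happens in one place instead of at every call site.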

The Py3 porting work for python-ldap is probably going to have to
address the unicode changes in Py3. If the Py3 port of python-ldap
brings sanity to the unicode <--> utf-8 conversion then that makes a
significant API change between the Py2 and Py3 versions of python-ldap
making calling the python-ldap API significantly different between Py2
and Py3. Does that mean you're also planning on backporting the Py3
changes in python-ldap to Py2 to keep the API more or less consistent?

FWIW I just spent a long time fixing unicode handling for LDAP and the
patches were just merged. I've also dealt with the unicode issue in
python-ldap in other projects (IPA) and have a lot of familiarity with
the problem. Also, unfortunately for the purpose of this discussion, I will
be off-line for several weeks starting at the end of the day.

--
John


Re: Sprint at Pycon: Port OpenStack to Python 3

John Dennis
In reply to this post by victor stinner
On 04/01/2014 04:40 AM, victor stinner wrote:
> Hi,
>
> I will organize a sprint to Port OpenStack to Python 3 during 4 days
> at Montreal (Canada) during Pycon Montreal 2014, between April, 14
> (Monday) and April, 17 (Thursday).
>
> The goal of the sprint is to port OpenStack components and OpenStack
> dependencies to Python 3,

This is a great goal, thank you! But I'm concerned it might be premature.

My concern is this. The single biggest change in Py2 -> Py3 is string
handling, especially with regard to str vs. unicode. We have a
significant number of bugs in the current code base with regard to
encoding exceptions. I just got done fixing a number of them, and I know
there are others. While I was fixing them I searched the OpenStack
coding guidelines to find out what coding practices we were supposed to
be enforcing with regard to non-ASCII strings, and discovered there
isn't much; what exists seems incomplete. Some of it seems based more on
speculation than actual knowledge of defined Python behavior. I'm not
sure, but given we do not have clear guidelines for unicode in Py2,
never mind guidelines that will allow running under both Py2 and Py3, I'm
willing to guess we have little in the gate testing that enforces any
string handling guidelines.

I'm just in the process of finishing up a document to address these
concerns. Unfortunately I'm going to be off-line for several weeks and I
didn't want to start a discussion I couldn't participate in (plus there
are some Py3 issues in the document I need to clean up) so I was going
to wait to post it.

My concern is we need to get our Py2 house in order *before* tackling
Py3 porting. Doing Py3 porting before we have clear guidelines on
unicode, str, bytes, encoding, etc. along with gate tests that enforce
these guidelines is putting the cart before the horse. Whatever patches
come out of a Py3 porting sprint might have to be completely redone.

FWIW projects that deal with web services, wire protocols, external
datastores, etc. who have already started porting to Py3 have
encountered significant pain points with Py3, some of which is just
being resolved and which have caused on-going changes in Py3. We deal
with a lot of these same issues in OpenStack. Before we just start
hacking away I think it would behoove us to first have a very clear and
explicit document on how we're going to address these issues *before* we
start changing code.



--
John


Re: Sprint at Pycon: Port OpenStack to Python 3

Doug Hellmann
On Tue, Apr 1, 2014 at 9:44 AM, John Dennis <[hidden email]> wrote:

> On 04/01/2014 04:40 AM, victor stinner wrote:
>> Hi,
>>
>> I will organize a sprint to Port OpenStack to Python 3 during 4 days
>> at Montreal (Canada) during Pycon Montreal 2014, between April, 14
>> (Monday) and April, 17 (Thursday).
>>
>> The goal of the sprint is to port OpenStack components and OpenStack
>> dependencies to Python 3,
>
> This is a great goal, thank you! But I'm concerned it might be premature.
>
> My concern is this. The singled biggest change in Py2 -> Py3 is string
> handling, especially with regards to str vs. unicode. We have a
> significant number of bugs in the current code base with regards to
> encoding exceptions, I just got done fixing a number of them, I know
> there are others. While I was fixing them I searched the OpenStack
> coding guidelines to find out coding practices were supposed to be
> enforcing with regards to non-ASCII strings and discovered there is
> isn't much, it seems incomplete. Some of it seems based more on
> speculation than actual knowledge of defined Python behavior. I'm not
> sure, but given we do not have clear guidelines for unicode in Py2,
> never mind guidelines that will allow running under both Py2 and Py3 I'm
> willing to guess we have little in the gate testing that enforces any
> string handling guidelines.
>
> I'm just in the process of finishing up a document to address these
> concerns. Unfortunately I'm going to be off-line for several weeks and I
> didn't want to start a discussion I couldn't participate in (plus there
> are some Py3 issues in the document I need to clean up) so I was going
> to wait to post it.
>
> My concern is we need to get our Py2 house in order *before* tackling
> Py3 porting. Doing Py3 porting before we have clear guidelines on
> unicode, str, bytes, encoding, etc. along with gate tests that enforce
> these guidelines is putting the cart before the horse. Whatever patches
> come out of a Py3 porting sprint might have to be completely redone.
>
> FWIW projects that deal with web services, wire protocols, external
> datastores, etc. who have already started porting to Py3 have
> encountered significant pain points with Py3, some of which is just
> being resolved and which have caused on-going changes in Py3. We deal
> with a lot of these same issues in OpenStack. Before we just start
> hacking away I think it would behoove us to first have a very clear and
> explicit document on how we're going to address these issues *before* we
> start changing code.

It sounds like the documentation you are working on will be really
helpful for planning.

We fully expect the porting process to take time and to have multiple
phases. Having code bases that can be tested under multiple versions
of the interpreter will be a good milestone, and help us uncover
issues similar to what you've identified in some places already. My
understanding is that the work so far has been on lower-level and
client libraries where the code base is smaller and the work can be
more focused for a similar reason.

Doug


Re: Sprint at Pycon: Port OpenStack to Python 3

John Dennis
In reply to this post by John Dennis
On 04/01/2014 09:44 AM, John Dennis wrote:
> FWIW projects that deal with web services, wire protocols, external
> datastores, etc. who have already started porting to Py3 have
> encountered significant pain points with Py3, some of which is just
> being resolved and which have caused on-going changes in Py3. We deal
> with a lot of these same issues in OpenStack.

Oh, almost forgot. One of the significant issues in Py3 string handling
occurs when dealing with the underlying OS, specifically Posix: the
interaction with Posix "objects" such as pathnames, hostnames,
environment values, etc. This covers virtually any place where in C you
would pass a pointer to char in the Posix API with the intention of
passing a character string. Unfortunately Posix does not enforce the
concept of a character or a character string; the pointer to char ends
up being a pointer to octets (i.e. binary data), which means you can end
up with strings that can't be decoded.

Py3 has attempted to deal with this by introducing something called
"surrogate escapes" which attempts to preserve non-encodable binary data
in what is supposed to be a character string so as not to corrupt data
as it transitions between Py3 and a host OS.

OpenStack deals a lot with Posix API's, thus this is another area where
we need to be careful and have clear guidelines. We're going to have to
deal with the whole problem of encoding/decoding in the presence of
surrogates.
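A small Python 3 sketch of what "surrogate escapes" do may help here. The byte string below is an invented example of a name that is not valid UTF-8; the codec and error handler names are the standard ones.

```python
# A name as the OS might hand it over: Latin-1 bytes, not valid UTF-8.
raw_name = b"caf\xe9.conf"

# Decoding with "surrogateescape" never fails: the undecodable byte 0xE9
# is smuggled into the text as the lone surrogate U+DCE9.
text_name = raw_name.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text_name.encode("utf-8", "surrogateescape")
```

The round trip is lossless, which is the point of the mechanism, but `text_name` now contains a code point that is not a real character.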


--
John


Re: Sprint at Pycon: Port OpenStack to Python 3

victor stinner
In reply to this post by Doug Hellmann
Hi,

On Tuesday, 1 April 2014, 08:12:20, Doug Hellmann wrote:
> I won't be attending PyCon this year, but count me in as a remote reviewer.

Cool :-)

> Will you be setting up an IRC channel we should join?

Hmm, let's say #openstack-pycon on the Freenode server.

Victor


Re: Sprint at Pycon: Port OpenStack to Python 3

victor stinner
In reply to this post by John Dennis
Hi,

On Tuesday, 1 April 2014, 09:11:52, John Dennis wrote:
> What are the plans for python-ldap? Only a small part of python-ldap is
> pure python, are you also planning on tackling the CPython code?

Oh, python-ldap was just an example; I don't have a concrete plan for each
dependency. We have been porting dependencies for some weeks, and many
already have a pending patch or pull request:
https://wiki.openstack.org/wiki/Python3#Dependencies

I know the Python C API and I know all the Unicode issues well, so I'm not
afraid of having to hack python-ldap if it's written in C ;-)

For your information, I am a Python core developer and I have been fixing
Unicode issues in Python for 4 years or more :-) I also wrote a free ebook,
"Programming with Unicode":

   http://unicodebook.readthedocs.org/

> The biggest change in Py3 is unicode/str. The biggest pain point in the 2.x
> version of python-ldap is unicode <--> utf-8 at the API. Currently with
> python-ldap we have to encode most every parameter to utf-8 before
> calling python-ldap and then decode the result back from utf-8 to
> unicode.

According to RFC 4511, LDAP has spoken UTF-8 since version 3. If the
encoding is always UTF-8, it's much easier :-)

> I always thought this should have been done inside the
> python-ldap binding and it was a design failure it didn't correctly
> handle Python's unicode objects. FWIW the binding relied in CPython's
> automatic encoding conversion which applied the default encoding of
> ASCII which causes encoding encoding exceptions, the CPython binding
> just never used the correct argument processing in Py_ParseTuple() and
> PyParseTupleWithKeywords() which allows you to specify the desired
> encoding (the C API for LDAP specifies UTF-8 as does the RFC's).

python-ldap can be modified to handle the unicode type in Python 2 and use
UTF-8.

> The Py3 porting work for python-ldap is probably going to have to
> address the unicode changes in Py3. If the Py3 port of python-ldap
> brings sanity to the unicode <--> utf-8 conversion then that makes a
> significant API change between the Py2 and Py3 versions of python-ldap
> making calling the python-ldap API significantly different between Py2
> and Py3. Does that mean you're also planning on backporting the Py3
> changes in python-ldap to Py2 to keep the API more or less consistent?

When I port an application to Python 3, I now try to keep the same code
base for both Python versions and to have the same API.

I don't know python-ldap; I will have to take a look at its source code to see
its current status and what should be done.

Victor


Re: Sprint at Pycon: Port OpenStack to Python 3

victor stinner
In reply to this post by John Dennis
On Tuesday, 1 April 2014, 09:44:11, John Dennis wrote:
> > The goal of the sprint is to port OpenStack components and OpenStack
> > dependencies to Python 3,
>
> This is a great goal, thank you! But I'm concerned it might be premature.

The porting is already in progress. There are many components (8 clients)
where the py33 (Python 3.3) gate is voting. We try to keep this page up to
date:

   https://wiki.openstack.org/wiki/Python3

Many dependencies are already Python 3 compatible, and the porting of
OpenStack "server" components has already started.

> My concern is this. The singled biggest change in Py2 -> Py3 is string
> handling, especially with regards to str vs. unicode. We have a
> significant number of bugs in the current code base with regards to
> encoding exceptions, I just got done fixing a number of them, I know
> there are others.

In which OpenStack component?

> While I was fixing them I searched the OpenStack
> coding guidelines to find out coding practices were supposed to be
> enforcing with regards to non-ASCII strings and discovered there is
> isn't much, it seems incomplete. Some of it seems based more on
> speculation than actual knowledge of defined Python behavior. I'm not
> sure, but given we do not have clear guidelines for unicode in Py2,
> never mind guidelines that will allow running under both Py2 and Py3 I'm
> willing to guess we have little in the gate testing that enforces any
> string handling guidelines.

It's an ongoing effort. We are slowly porting code but also adding new tests
for non-ASCII data. For example, one of my recent patches for swiftclient adds
new tests for non-ASCII URLs, whereas the existing tests only test ASCII
(which is irrelevant for this specific test):

   https://review.openstack.org/#/c/84102/

To be honest, we spend more time fixing Python 3 issues than adding new
tests to check for regressions. The problem is that currently, you cannot
even import the Python module on Python 3, so it's hard to run tests and
harder to add new tests.

I hope that it will become easier to run tests on Python 2 and Python 3, and
to add more tests for non-ASCII data.

> I'm just in the process of finishing up a document to address these
> concerns. Unfortunately I'm going to be off-line for several weeks and I
> didn't want to start a discussion I couldn't participate in (plus there
> are some Py3 issues in the document I need to clean up) so I was going
> to wait to post it.

There are some "guidelines" for porting Python 2 code to Python 3 on the wiki
page. Could you add to them?

   https://wiki.openstack.org/wiki/Python3

> My concern is we need to get our Py2 house in order *before* tackling
> Py3 porting. Doing Py3 porting before we have clear guidelines on
> unicode, str, bytes, encoding, etc. along with gate tests that enforce
> these guidelines is putting the cart before the horse. Whatever patches
> come out of a Py3 porting sprint might have to be completely redone.

It's not easy to detect Unicode issues using Python 2, since most setups are in
English, no test uses non-ASCII data right now, and Python 2 uses
implicit conversion between bytes and Unicode strings.

It's much easier to detect Unicode issues using Python 3. I don't want to drop
Python 2 support, just *add* Python 3 support. The code will work on 2.6-3.3.

> FWIW projects that deal with web services, wire protocols, external
> datastores, etc. who have already started porting to Py3 have
> encountered significant pain points with Py3, some of which is just
> being resolved and which have caused on-going changes in Py3. We deal
> with a lot of these same issues in OpenStack. Before we just start
> hacking away I think it would behoove us to first have a very clear and
> explicit document on how we're going to address these issues *before* we
> start changing code.

There are many web servers and clients already running on Python 3. For
example, Django has supported Python 3 since version 1.5 (released in February
2013). I ported Paste and PasteScript to Python 3, and there are also web
modules that are already Python 3 compatible (ex: WebOb).

Victor


Re: Sprint at Pycon: Port OpenStack to Python 3

victor stinner
In reply to this post by John Dennis
Hi,

On Tuesday, 1 April 2014, 10:48:21, John Dennis wrote:
> Oh, almost forgot. One of the significant issues in Py3 string handling
> occurs when dealing with the underlying OS, specifically Posix, the
> interaction with Posix "objects" such as pathnames, hostnames,
> environment values, etc. Virtually any place where in C you would pass a
> pointer to char in the Posix API where the intention is you're passing a
> character string. Unfortunately Posix does not enforce the concept of a
> character or a character string, the pointer to char ends up being a
> pointer to octets (e.g. binary data) which means you can end up with
> strings that can't be encoded.

I know these things well because I worked directly on Python to get the best
Unicode support for filenames on UNIX and Windows. The summary is "Python 3
just works with Unicode filenames". You don't have to do anything.

For example, you don't have to worry about the encoding *of the filename* in
code like this:

    import os

    for filename in os.listdir("conf"):
        with open(os.path.join("conf", filename), encoding="utf-8") as fp: ...

os.listdir() and open() use the same encoding: the filesystem encoding
(sys.getfilesystemencoding()) with the "surrogateescape" error handler.

If you prefer bytes filenames, they are also supported on UNIX, but deprecated
on Windows.

> Py3 has attempted to deal with this by introducing something called
> "surrogate escapes" which attempts to preserve non-encodable binary data
> in what is supposed to be a character string so as not to corrupt data
> as it transitions between Py3 and a host OS.
>
> OpenStack deals a lot with Posix API's, thus this is another area where
> we need to be careful and have clear guidelines. We're going to have to
> deal with the whole problem of encoding/decoding in the presence of
> surrogates.

My policy is to always store filenames as Unicode. It's easy to follow this
policy since Python 3 returns Unicode filenames (ex: os.listdir("directory")).

You should not have to worry about invalid filenames / surrogate characters, since
this case should be very rare.

---

Surrogates are only used if a filename contains a non-ASCII character and the
locale encoding is unable to decode it. This case should be very rare.
File content often contains non-ASCII data, but it's rarer for file names,
at least on a Linux server running OpenStack; it is common for user
documents on Windows, for example.

You only have to worry about surrogates if you want to display a filename or write
filenames to a file (ex: configuration file). Another major change in Python 3 is
that the UTF-8 encoder became strict: surrogate characters cannot be encoded
anymore (unless you use the "surrogatepass" error handler). So if your
locale encoding is UTF-8, print(filename) or open("test", "w").write(filename)
will fail if the filename contains a surrogate character.
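The strictness described above can be checked in a few lines. This is only an illustration under the assumption of a made-up filename containing the lone surrogate U+DCE9; the codec and error handler names are the standard Python 3 ones.

```python
# A filename carrying a lone surrogate, as "surrogateescape" decoding
# would produce for an undecodable byte 0xE9.
filename = "caf\udce9.conf"

# The strict UTF-8 encoder of Python 3 refuses lone surrogates.
try:
    filename.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# "surrogatepass" encodes the surrogate code point itself as three bytes.
passed = filename.encode("utf-8", "surrogatepass")
```

Note that the "surrogatepass" output is not the original OS byte; it is the UTF-8-style encoding of U+DCE9, which strict UTF-8 decoders reject.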

If you want to display filenames containing surrogate characters, use
repr(filename) or an error handler: "replace", "backslashreplace" or
"surrogateescape", depending on your use case.

If you want to write filenames containing surrogate characters into a file, you
might use the "surrogateescape" error handler, but it's probably a bad idea
because you may get an error when *reading* this file again. It may be better to
raise an error if a filename is invalid (contains surrogate characters).
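As a rough sketch of the two options discussed above (showing a name safely, or rejecting it), assuming Python 3 and three hypothetical helper names (`has_surrogates`, `display_name`, `checked_name`) that do not come from any OpenStack module:

```python
def has_surrogates(filename):
    """Return True if the strict UTF-8 encoder rejects the name."""
    try:
        filename.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False


def display_name(filename):
    """Return a printable form; lone surrogates are shown as escapes."""
    return filename.encode("utf-8", "backslashreplace").decode("utf-8")


def checked_name(filename):
    """Refuse names that would not survive being written as UTF-8."""
    if has_surrogates(filename):
        raise ValueError("invalid filename: " + display_name(filename))
    return filename


bad = "caf\udce9.conf"
shown = display_name(bad)
```

`display_name` is safe to pass to print() or a log, while `checked_name` is the "raise an error" policy applied before a name is written to a configuration file.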

If your system uses the ASCII locale encoding, another option is to fix
your configuration to use a locale with the UTF-8 encoding.

Victor


Re: Sprint at Pycon: Port OpenStack to Python 3

John Dennis
In reply to this post by victor stinner
On 04/01/2014 12:15 PM, Victor Stinner wrote:

> Hi,
>
> On Tuesday, 1 April 2014, 09:11:52, John Dennis wrote:
>> What are the plans for python-ldap? Only a small part of python-ldap is
>> pure python, are you also planning on tackling the CPython code?
>
> Oh, python-ldap was just an example, I don't have concrete plan for each
> dependency. We are porting dependencies since some weeks, and many have
> already a pending patch or pull request:
> https://wiki.openstack.org/wiki/Python3#Dependencies
>
> I know the Python C API and I know well all the Unicode issues, so I'm not
> afraid of having to hack python-ldap if it's written in C ;-)
>
> For your information, I am a Python core developer and I'm fixing Unicode
> issues in Python since 4 years or more :-) I also wrote a free ebook
> "Programming with Unicode":
>
>    http://unicodebook.readthedocs.org/

Great! It's wonderful to have someone steering the effort that actually
understands the issues. FWIW, I too have been fixing Python unicode
issues for years as well as using CPython and knowing it intimately.

Your book is good, I've seen it.

My general observation is i18n is a lot like security, 95% of developers
don't understand it, don't want to deal with it and have the mistaken
belief they can postpone addressing it until after the "coding is done"
instead of building it in from the very beginning. Security and
internationalization can't be bolted onto the side as an afterthought,
it has to be designed in from the beginning.

Since developers are not going to learn the issues, what I think is
needed is a small set of do's and don'ts. Follow some simple rules and
you'll be mostly O.K.

My simple rules go like this (I think you would concur):

* Every text string *internal* to your code is unicode.

* You encode/decode at the boundaries. Either an API boundary or an I/O
boundary. You must know and understand which encoding will be used at
the boundary and what the boundary requirements are.

* The use of str() should be banned, it's evil. Use six.text_type instead.

O.K. that might be a bit simplistic but it covers a large percentage.
The downside is the existing OpenStack code is nowhere near close to
following even these simple rules.
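The rules above might look like this in practice. This is a Python 3 only sketch with invented function names; on a code base that must also run on Python 2, `str` in the type check would be replaced by six.text_type, which is `unicode` on Python 2 and `str` on Python 3.

```python
def read_boundary(raw, encoding="utf-8"):
    """Input boundary: bytes come in, text goes to the rest of the code."""
    return raw.decode(encoding)


def write_boundary(text, encoding="utf-8"):
    """Output boundary: text leaves the program as bytes."""
    return text.encode(encoding)


def greet(name):
    """Internal code: only ever handles text, never encoded bytes."""
    if not isinstance(name, str):
        raise TypeError("internal strings must be text, not bytes")
    return "Bonjour, " + name + "!"


incoming = "Zoë".encode("utf-8")  # pretend this arrived from the network
outgoing = write_boundary(greet(read_boundary(incoming)))
```

The encoding is named in exactly two places, the two boundary functions, and nothing in between needs to know about it.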



--
John


Re: Sprint at Pycon: Port OpenStack to Python 3

John Dennis
In reply to this post by victor stinner
On 04/01/2014 12:28 PM, Victor Stinner wrote:

> On Tuesday, 1 April 2014, 09:44:11, John Dennis wrote:
>>> The goal of the sprint is to port OpenStack components and
>>> OpenStack dependencies to Python 3,
>>
>> This is a great goal, thank you! But I'm concerned it might be
>> premature.
>
> The portage is already in progress. There are many components (8
> clients) where the py33 (Python 3.3) gate is voting. We try to keep
> this page up to date:
>
> https://wiki.openstack.org/wiki/Python3
>
> There are already a lot of dependencies which are already Python 3
> compatible, and the portage of OpenStack "server" components already
> started.
>
>> My concern is this. The singled biggest change in Py2 -> Py3 is
>> string handling, especially with regards to str vs. unicode. We
>> have a significant number of bugs in the current code base with
>> regards to encoding exceptions, I just got done fixing a number of
>> them, I know there are others.
>
> In which OpenStack component?

For one:

https://bugs.launchpad.net/keystone/+bug/1292311

But just looking at a lot of the OpenStack code it's easy to see things
are going to blow up once you start passing around non-ASCII characters.

> To be honest, we invest more time on fixing Python 3 issues than on
> adding new tests to check for non-regression. The problem is that
> currently, you cannot even import the Python module, so it's hard to
> run tests and harder to add new tests.


> I hope that it will become easier to run tests on Python 2 and Python
> 3, and to add more tests for non-ASCII data.

Yes, the fact the vast majority of the unit tests only pass ASCII values
is a significant problem.

Most of the problems are data driven. If you don't test with the data
that causes the problems you're not fully testing and that allows a lot
of problems to sneak through.

IMHO code reviews should not permit the inclusion of unit tests which do
not utilize test data containing non-ASCII characters. All existing unit
tests should be updated to use non-ASCII strings.

FWIW my last patch to one of the keystone LDAP unit tests converted almost
all the strings to contain non-ASCII characters. We should be doing this
for a lot of the test code.
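For illustration only, a unit test fed with non-ASCII data could look like the sketch below. `normalize_username` is a made-up function, not keystone code; the test uses only the standard unittest module.

```python
import io
import unittest


def normalize_username(name):
    """Hypothetical function under test: trim and lowercase a user name."""
    return name.strip().lower()


class TestNormalizeUsername(unittest.TestCase):
    def test_non_ascii_name(self):
        # Non-ASCII test data exercises code paths that "bob" never reaches.
        self.assertEqual(normalize_username("  ÉLODIE "), "élodie")


# Run the test case programmatically and keep the result for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNormalizeUsername)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```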

> It's not easy to detect Unicode issues using Python 2 since most
> setup are in english, only no test using non-ASCII data right now,
> and Python 2 uses implicit conversion between bytes and Unicode
> strings.

See above.

It's not quite fair to say Py2 implicit conversions mask problems, in
many instances Py2's implicit conversions are the root cause of problems
(mainly because ASCII is the default encoding applied during the
implicit conversion).
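The implicit conversion itself only exists on Python 2, but its effect can be imitated on Python 3 for illustration. The sketch below assumes that the Python 2 behaviour amounts to an ASCII decode of the byte string, and contrasts it with Python 3, which refuses to mix the two types at all.

```python
encoded = "Montréal".encode("utf-8")

# Roughly what Python 2 did behind the scenes when a byte string met a
# unicode string: decode the bytes with the default ASCII codec.
try:
    encoded.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Python 3 has no implicit conversion: mixing text and bytes is a
# TypeError regardless of the data, so ASCII-only tests catch it too.
try:
    "ville: " + encoded
    mix_ok = True
except TypeError:
    mix_ok = False
```

On Python 2 the failure depends on the data, which is why ASCII-only tests pass; on Python 3 it depends on the types, so it shows up immediately.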

>
> It's much easier to detect Unicode issues using Python 3. I don't
> want to drop Python 2 support, just *add* Python 3 support. The code
> will work on 2.6-3.3.


--
John


Re: Sprint at Pycon: Port OpenStack to Python 3

John Dennis
On 04/01/2014 02:08 PM, John Dennis wrote:

>>> My concern is this. The singled biggest change in Py2 -> Py3 is
>>> string handling, especially with regards to str vs. unicode. We
>>> have a significant number of bugs in the current code base with
>>> regards to encoding exceptions, I just got done fixing a number of
>>> them, I know there are others.
>>
>> In which OpenStack component?
>
> For one:
>
> https://bugs.launchpad.net/keystone/+bug/1292311
>
> But just looking at a lot of the OpenStack code it's easy to see things
> are going to blow up once you start passing around non-ASCII characters.

Oh almost forgot ...

The openstack log module blows up if you pass a UTF-8 encoded string.

For the LDAP code that limitation meant any logging had to be performed
before encoding and if any logging had to be done after encoding it
meant one had to be sure values were decoded again before logging.
That's very error prone.

I think the logging module needs to be fixed so that if it receives a
str or bytes object it will assume the encoding is UTF-8 (not default
ASCII) and properly decode it prior to forming the final message.
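One possible shape for such a fix, sketched with the standard library logging module on Python 3 rather than the OpenStack log module. The names `to_text` and `DecodeBytesFilter` are invented, and using the "replace" error handler for bytes that are not valid UTF-8 is an assumption, not something proposed in this thread.

```python
import io
import logging


def to_text(value):
    """Decode bytes as UTF-8 (never raising); leave other values alone."""
    if isinstance(value, bytes):
        return value.decode("utf-8", "replace")
    return value


class DecodeBytesFilter(logging.Filter):
    """Decode byte strings in the message and its arguments before formatting."""

    def filter(self, record):
        record.msg = to_text(record.msg)
        if isinstance(record.args, tuple):
            record.args = tuple(to_text(arg) for arg in record.args)
        return True


# Log into an in-memory stream so the result can be inspected.
stream = io.StringIO()
logger = logging.getLogger("sketch.utf8")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))
logger.addFilter(DecodeBytesFilter())

logger.info("user %s logged in", "José".encode("utf-8"))
logged = stream.getvalue()
```

With the filter in place, callers no longer need to remember whether a value has already been encoded before they log it.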

Not being able to log UTF-8 encoded strings is an accident waiting to
happen (nothing worse than an important message not getting seen because it
got trapped in an encode/decode exception handled, or not handled,
elsewhere).

--
John
