October 15, 2008

Python 3.0 makes a big break

Author: Joab Jackson

Typically, each new version of the Python programming language has been gentle on users, more or less maintaining backward compatibility with previous versions. But in 2000, when Python creator Guido van Rossum announced that he was embarking on a new version of Python, he did not sugarcoat his plan: Version 3.0 would not be backward-compatible. Now that the first release candidate of Python 3.0 is out, with the final release planned for later this month, developers must decide whether to maintain their older code or modify it to run on the new interpreter.

Developers hate it when a new version of a language doesn't work with the code written for older versions of that language, but for van Rossum, the radical upgrade was necessary. The language was becoming ever more weighed down by multiple ways of doing the same task, and ways of doing tasks no one ever actually did.

"The motivation for 3.0 was to have one specific event where we did as much of the backward incompatibility all at once," van Rossum says. The idea is to "give the language a better foundation for going forward."

Naturally, some stirrings of discontent can be felt across the Python community.

"Python is pretty much determined to remove itself from consideration from various kinds of international projects like the one I work on. We're already catching flak from people due to a few things that were valid in 2.2 that are not valid in 2.3," bemoaned one developer in the comp.lang.python newsgroup.

"For an operating system distributor, Python 3.0 represents a large potential change in their repository of packages for relatively little benefit in terms of resulting functionality," says UK Python developer Paul Boddie.

What changes?

In a way, Python has been a victim of its own success. "The original idea for the language had a much smaller scope. I really hadn't expected it to be so successful and being used in a wide variety of applications, from Web applications to scientific calculations, and everything in between," van Rossum says.

Van Rossum first created Python in 1990 as an open source, extensible, high-level language to help him with some system administration duties. Today, Python is one of the most popular languages used worldwide. In March 2008, Austrian researcher Anton Ertl ranked programming languages by popularity, as gauged by the number of postings in Usenet newsgroups. Python proved to be the third most-discussed language on Usenet, right after C and Java, and ahead of such stalwarts as C++ and Perl.

When it comes time to teach someone how to program, often the easiest programming language to use is Python. It is the Basic of today, though more elegant to work with than Basic ever was.

Yet Python's simplicity was being threatened by the unchecked growth of the language, van Rossum says. Throughout the '90s, new functions and features were bolted onto the language, and inconsistencies started popping up across the platform. "We were slowly losing the advantage" of simplicity. "We had to break backward compatibility. The alternative was unchecked bloat of the language definition, which happens very slowly and almost unnoticeable."

Python.org has a list of changes to the language. Some are small, and will go unnoticed by most programmers. Others can be relearned quickly.

"Most of the differences are in the details; the general gist of the language, how people think about the language and the capabilities are pretty much unchanged," van Rossum says.

For instance, the print statement has been turned into a print function; you must now put parentheses around whatever you want to print to the screen. The change lets developers work with print in a more flexible and uniform way. If someone needs to replace printing with some other action, it can be done with a universal search and replace, rather than by rewriting each print statement by hand.
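Because print is now an ordinary function, it also accepts keyword arguments and can be handed around like any other callable. The Python 3 sketch below illustrates both points; the `report` helper is a hypothetical example, not part of the standard library.

```python
import io

# print is a plain built-in function in Python 3, so options that once
# needed special statement syntax are now keyword arguments.
buf = io.StringIO()
print("a", "b", "c", sep="-", end="!\n", file=buf)  # Python 2 redirected with: print >>buf, "a"

# Hypothetical helper: because print is a function, it can be passed
# in as a parameter and swapped for some other action.
def report(items, out=print):
    for item in items:
        out(item)

captured = []
report(["x", "y"], out=captured.append)  # collect instead of printing
report(["z"])                            # default behavior: prints z
```

Here `buf.getvalue()` ends up as `"a-b-c!\n"` and `captured` holds `["x", "y"]`.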

Another change: the language now has only one integer type, instead of the former split between plain (machine-sized) int and arbitrary-precision long integers, a distinction van Rossum characterizes as worthless.
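A quick Python 3 check shows the single integer type at work: small and very large values share the same `int` type, and going past the machine word size does not switch to a separate long type. The specific numbers are arbitrary.

```python
import sys

small = 42
big = 2 ** 100                  # far beyond any machine word
past_word = sys.maxsize + 1     # one past the largest native index value

# All three are the same built-in int type in Python 3.
print(type(small) is int, type(big) is int, type(past_word) is int)  # True True True
print(big)  # 1267650600228229401496703205376
```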

Another tangle of cruft that has been pruned back is something called "classic classes." Python 2 has two kinds of classes, classic and new-style, each implemented in its own way. "There was a lot of machinery in the Python virtual machine that was either special-casing classic classes or double implementations with version for classic classes and another version for new-style classes. This was implementation bloat," van Rossum says. So, after six years of campaigning to get people to move to the new-style classes, the developers of Python 3 have put their collective foot down and are doing away with classic classes.
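In Python 3, a bare class statement produces what Python 2 called a new-style class: it implicitly inherits from `object`. The minimal sketch below (class names are arbitrary) shows that the old and new spellings now give the same result.

```python
class Bare:               # in Python 2, this would have been a classic class
    pass

class Explicit(object):   # the Python 2 spelling for a new-style class
    pass

# Both classes have object as their only base, and both are instances of type.
print(Bare.__bases__)       # (<class 'object'>,)
print(Explicit.__bases__)   # (<class 'object'>,)
print(type(Bare) is type)   # True
```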

Perhaps the biggest change in Python 3.0, and the one that will require the most rewriting of existing code, is the new way Python deals with bytes and strings. Originally, Python represented all input and output as strings. When Python was used in more casual settings, most strings that passed through the interpreter could easily be represented in the standard ASCII character set. But as the language's use grew to global proportions, more and more users turned to Unicode to support a wider array of language characters. To Python, Unicode-encoded text looked a lot like 8-bit binary byte strings, which could be passed to the interpreter as part of the output from another program. In some cases, the interpreter would confuse binary data with Unicode-encoded strings, and it would choke, big-time.

The answer? Define a new object class for handling bytes, a first for Python. Also, redefine strings as Unicode. And then keep the two clear of one another. In other words, the bytes type and the string type are not compatible in Python 3.0.
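A minimal Python 3 sketch of that separation follows. The sample text is arbitrary and UTF-8 is simply a common choice of encoding; the exact wording of the error message varies between Python 3 releases, so it is not shown here.

```python
text = "na\u00efve caf\u00e9"   # str: a sequence of Unicode code points
data = text.encode("utf-8")       # bytes: the UTF-8 encoding of that text

print(type(text).__name__, len(text))  # str 10
print(type(data).__name__, len(data))  # bytes 12 (each accented letter takes 2 bytes)

# Converting between the two types is always an explicit step.
assert data.decode("utf-8") == text

# Mixing the two fails immediately instead of quietly producing garbage.
try:
    "Hello, " + data
except TypeError as exc:
    print("TypeError:", exc)
```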

"If you ever make the mistake of passing bytes around where you think they are text, your code will raise an exception almost immediately," van Rossum said.

Conversion and converts

Van Rossum admits now that he didn't think much of the transition difficulties when he first started thinking about Python 3.0. "In 2000, the community was small enough that I thought that I would just release a new version and people would fix their code and that is that."

As the core developers gathered feedback from the user community, though, they heard growing calls for a smooth transition process. They needed tools.

This is the role of the recently released version 2.6 of Python, which will serve as a transition version of the language. Users can easily upgrade their code from earlier versions of Python to version 2.6. When run with its -3 command-line option, the 2.6 interpreter will issue warnings about aspects of a program that will no longer fly with version 3.0.

"We're encouraging people to upgrade to Python 2.6," van Rossum says. "2.6 can help you find the anachronisms in your code that you will need to change to be prepared for 3.0."

The development team also created a transition tool called 2to3 that converts Python 2.6 code into Python 3.0 code. The intended workflow is to run older code under 2.6, revising it until all the warning messages are gone, and then use 2to3 to convert it to Python 3.0.

Of course, since Python is a dynamic language, where types are not explicitly declared, there are a lot of cases where a translation tool will not help much, but it should help with the mundane tasks, such as changing print statements into print functions.
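The real 2to3 tool works on a full parse tree of the program. The function below is a hypothetical, deliberately naive regex sketch (it is not part of 2to3) that only shows why the print change is the kind of mechanical, line-by-line rewrite a tool can automate.

```python
import re

# Matches a simple Python 2 print statement such as:  print "hi", x
# Deliberately naive: it ignores the trailing-comma form (print x,),
# the redirection form (print >>f, x), a bare print, and multi-line statements.
_PRINT_STMT = re.compile(r'^(\s*)print\s+(?!\()(.+?)\s*$')

def convert_print_statements(source):
    """Hypothetical toy: rewrite simple 'print x' lines as 'print(x)' calls."""
    out = []
    for line in source.splitlines():
        m = _PRINT_STMT.match(line)
        if m:
            indent, args = m.groups()
            line = f"{indent}print({args})"
        out.append(line)
    return "\n".join(out)

old = 'for name in names:\n    print "hello", name\nprint("already a call")'
print(convert_print_statements(old))
```

Only the second line changes, to `print("hello", name)`. The toy cannot know what a variable holds at run time, which is exactly the limit that dynamic typing places on any translation tool.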

Even with these tools in place, van Rossum admits that the migration of the user community will be a slow process, and not all Python shops will make the transition.

For instance, Page DNA, the print-preparation company where Aahz Maruch works, relies on 200,000 lines of Python code in its core revenue-generating operations. "It would be a huge job" to translate this code into Python 3.0, Maruch says. He says the company will wait a few years for the automatic translation tools to improve. "We haven't even talked about 3.0 -- it's at least two or three years away."

Others are more skeptical about the necessity of the upgrade.

The "implementation of Python 3.0 gives relatively little attention to [current] issues [modern programming languages face], such as performance, pervasive concurrency, or the provision of a renewed, coherent bundled library," Boddie says.

He says there is a danger that most developers may not see Python 3.0 as a necessary upgrade and, as a result, it may never become the de facto Python, in much the same way that Windows Vista has not displaced its predecessor, Windows XP, as the de facto Windows.

Today, the chief implementation of Python is CPython, a Python interpreter written in C. However, Boddie notes that other implementations exist, such as Jython (formerly JPython; Python on the Java platform), IronPython (Python for Microsoft .NET's Common Language Runtime), and PyPy (a Python interpreter written in Python).

"I think that Python 3.0 may actually focus attention on other Python implementations, particularly if these do not pursue Python 3.0 compatibility as a priority," Boddie suggests.

Nonetheless, the core development team is confident the tide will turn their way, eventually. "I expect that most people will be using 2.6 a year from now. Only bleeding edge people will be using 3.0," van Rossum says. However, "if you're starting a brand new thing, you should use 3.0."
