Python for .NET: Lessons learned

Mark Hammond, ActiveState Tool Corporation

November 2000.

Introduction

Python[1] is a dynamic, object-oriented language. Now over 10 years old, it has slowly grown in popularity to the point where it often ranks highly in polls of the most popular programming languages, and it has many books and companies dedicated to its support. Although Python has a Unix heritage, it is very much at home on the Windows platform, with support for many Windows-specific services, including COM, Active Scripting, NT Services and so forth.

The .NET[2] framework is a set of facilities and technologies developed by Microsoft to provide new levels of functionality for applications and development tools. These technologies include a runtime system with features such as garbage collection, just-in-time compilation, and a type system available for use by many languages.

This paper describes an exploratory implementation of the Python language for the .NET framework. It describes the scope of the project, the experiences and problems in implementing a language such as Python for .NET, and a brief discussion of future directions both Python and .NET could take to address these issues.

About the implementation

Mark Hammond and Greg Stein created the Python for .NET implementation between early 1999 and July 2000. The work was performed under contract to Microsoft, and all work described here is subject to the following copyright:

Portions Copyright 1999-2000 Microsoft Corporation.

Portions Copyright 1997-1999 Greg Stein and Bill Tutt.

Also refer to the copyright information for the specific version of Python you are using.

The terms of the contract allow for unlimited use and copying of Python for .NET, including all source code, provided all copyright information remains in place.

The implementation can be found at http://www.ActiveState.com/.NET - this includes all source code, documentation and the latest version of this document.

Terminology

The term Python will often be ambiguous in this document. Therefore, the following conventions are used.

Whenever the term Python is used alone, it will refer to the Python language specification[4] rather than a specific implementation.

Status

The current state of the compiler and the runtime systems shows that it is possible to have compiled Python programs fully supported within the .NET framework. Most of the basic infrastructure is in place, with full bi-directional cross-language support (i.e., Python can inherit or call objects created in other languages, and other languages can inherit from or call objects created in Python).

There are three categories of problems which preclude Python for .NET from being truly useful to a large number of people:

· There is no support for some of the features of .NET that other frameworks will require, such as custom attributes, PInvoke or ASP.NET. This means that Python for .NET users are not currently able to write classes that interact with tools requiring those features. Related issues are the mismatches between the class/instance semantics, module/package semantics and exception systems of Python and .NET.

· The speed of the current system is so low as to render the current implementation useless for anything beyond demonstration purposes. This speed problem applies to both the compiler itself, and the code generated by the compiler. Given that part of the appeal of Python programming is a quick edit-compile-run cycle, the speed issues severely limit the utility of Python on this platform. Some of the blame for this slow performance lies in the domain of .NET internals and Reflection::Emit, but some of it is due to the simple implementation of the Python for .NET compiler.

· There is no support for some Python features that some programs will require. Most of these are fairly obscure, but are a limitation. Examples include:

- String formatting

- Core language features, such as long integers, complex numbers, builtin object methods and so forth.

- The standard Python library.

If these issues are addressed, we feel that Python for .NET would become a viable, interesting technology to use within .NET.

Architecture

The Python for .NET implementation consists of 3 semi-discrete areas – the compiler, the runtime, and the library. Although there is obviously significant interaction between these parts, each has a quite distinct architecture.

Compiler

The Python for .NET compiler is written using CPython. It compiles Python source code, and uses the .NET Reflection::Emit library to generate a .NET assembly. The COM Interoperability features of .NET are used to access the Reflection::Emit library.

This particular strategy was chosen to minimize the implementation time. Python’s parser is built into the CPython runtime, and the existing Python2C project[5] had infrastructure we could borrow. Greg Stein was involved in the Python2C project, and so provided the expertise to get running quickly. The Python2C heritage is reflected in the copyright statements at the start of this document.

The key benefits to this approach were the rapid implementation of a simple compiler, and the rapid development obtained by coding the compiler in CPython.

The primary drawback to this approach is the speed of the compiler. Much of the abstract syntax tree (AST) manipulation code is also written in Python code, and as this is one of the most CPU intensive areas of the compiler, we suffer a significant speed penalty.

Further, the use of Reflection::Emit via COM is also causing us some performance problems. Some of these problems are due to the speed of the Python COM bindings, but Reflection::Emit itself and/or the COM interoperability layers are also costing us significant time.

Runtime

Due to the dynamic nature of Python, the compiler will often generate code that references the Python for .NET runtime. Even for a simple Python expression such as “a + b”, if the types of the variables are not known by the compiler, it will generate code to ensure the Python for .NET Runtime determines the correct semantics at runtime.
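The following is a minimal sketch, in ordinary Python, of the kind of decision the runtime has to make for “a + b” when the compiler knows nothing about the operands. The names runtime_add and ADD_TABLE are hypothetical and chosen only for illustration; they are not taken from the Python for .NET runtime, which is written in C#.

```python
# Hypothetical sketch: none of these names come from the real runtime.

def add_ints(a, b):
    return a + b

def concat_strings(a, b):
    return a + b

# Table mapping a pair of operand types to the routine implementing "+".
ADD_TABLE = {
    (int, int): add_ints,
    (str, str): concat_strings,
}

def runtime_add(a, b):
    """Pick the implementation of "+" from the operand types at run time."""
    handler = ADD_TABLE.get((type(a), type(b)))
    if handler is None:
        raise TypeError(
            "unsupported operand types for +: "
            f"{type(a).__name__} and {type(b).__name__}"
        )
    return handler(a, b)

# Conceptually, "a + b" with unknown types becomes a call like these:
result_numbers = runtime_add(2, 3)
result_text = runtime_add("py", "thon")
```

Every such call pays for a table lookup and an indirect call before any real work is done, which is the cost the compiler could avoid only if it knew the operand types in advance.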

One of the most important jobs of the Runtime is to ensure that the Python language semantics are faithfully implemented. For this reason, the design and implementation of the Python for .NET runtime borrows heavily from the CPython implementation. The Python for .NET runtime is written in C#.

The Python for .NET runtime defines a .NET interface (IPyType) that captures Python’s semantics. The definition of this interface is almost identical to the existing CPython type object, which is the object primarily responsible for object semantics in CPython.

The IPyType interface defines the semantics for a Python type (i.e., one or more .NET types) independent of an object instance. For example, the type definition for a string or integer defines the behaviour of strings and integers without reference to a specific string or specific integer. Thus, to operate on any .NET object, the Python runtime needs two components: an instance of an IPyType interface that describes the Python semantics (such as defining string or integer semantics), and the .NET object itself (i.e., a reference to the specific string or integer being operated on). For this reason, the Python for .NET runtime defines a PyObject value-type (i.e., a C struct), which consists of a reference to an interface, and a .NET object. Almost all runtime functions work with these PyObjects.

The Python for .NET runtime also exposes an API for use by the compiler, working almost exclusively with the PyObject structures. The runtime also provides a function for creating a new PyObject at runtime, given nothing but an anonymous .NET object reference. The compiler will frequently generate calls to create these PyObject structures (often storing the result in a variable), and also pass these PyObject structures back into the runtime as needed.
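The pairing of a type interface with an object reference can be sketched as follows. This is an illustration in Python under stated assumptions, not the C# definitions: only the names IPyType and PyObject come from the text above, while the methods, the make_pyobject and py_add helpers and the lookup table are hypothetical.

```python
from typing import Any, NamedTuple

class IPyType:
    """Stand-in for the type interface: semantics independent of any instance."""
    name = "object"

    def add(self, left: Any, right: Any) -> Any:
        raise TypeError(f"{self.name} does not support +")

class IntType(IPyType):
    name = "int"

    def add(self, left, right):
        return left + right

class StrType(IPyType):
    name = "str"

    def add(self, left, right):
        return left + right

class PyObject(NamedTuple):
    """Pair of (semantics, payload), mirroring the value type described above."""
    pytype: IPyType
    value: Any

# Hypothetical lookup table from a host type to its Python semantics.
_TYPE_TABLE = {int: IntType(), str: StrType()}
_DEFAULT_TYPE = IPyType()

def make_pyobject(value: Any) -> PyObject:
    """Wrap an anonymous object reference, looking up its semantics once."""
    return PyObject(_TYPE_TABLE.get(type(value), _DEFAULT_TYPE), value)

def py_add(left: PyObject, right: PyObject) -> PyObject:
    """Runtime entry point working purely on PyObject pairs."""
    return make_pyobject(left.pytype.add(left.value, right.value))

total = py_add(make_pyobject(2), make_pyobject(3))
```

Because the pair travels together, a function such as py_add never has to rediscover which semantics apply to its arguments.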

Library

The role of the Python for .NET library is to implement the standard Python library – the modules Python guarantees will be available to a program at runtime. Python programs reference library modules with the import statement.

This standard library is defined by what is provided (and documented) by CPython. CPython allows these library modules to be written in Python, or in C/C++ where they are compiled as DLLs.

The Python for .NET library is very thin at the moment – only a handful of modules have been implemented. However, those that have are implemented mainly in Python for .NET, and built using the Python for .NET compiler. The few remaining builtin functions are implemented in C# in the Python for .NET runtime.

Discussion

This section discusses various points relating to the Python for .NET implementation. The topics range from alternative implementation strategies for Python on .NET down to specific implementation details of the current scheme.

We also discuss the identified limitations in some detail, and discuss strategies for overcoming them, then finish with a discussion on future directions and possibilities.

Alternative Implementation Strategies

Python for .NET itself

An alternative implementation strategy for Python for .NET would be to leverage CPython, and attempt to implement .NET support as a regular Python extension module. This was not considered at the outset of the project, as a primary motivation for Microsoft’s involvement was to prove that the .NET runtime and Intermediate Language were capable of supporting the language. The existing implementation demonstrates that this is indeed true, so now that the focus must switch to providing the best compatibility and/or interoperability with CPython, this idea warrants consideration.

A full analysis of this option is beyond the scope of this document. However, there are a number of obvious limitations with this approach:

· Python code would not be verifiable. Thus, it would be impossible to write trusted content in Python.

· Other .NET languages would be unable to inherit from (or otherwise treat as a class) Python objects. It is quite likely that the reverse is possible – CPython programs would be able to inherit from .NET classes. This may be a trade-off worth making. Additionally, enhancements to .NET could assist in this goal.

· Any other .NET tools that require .NET support, such as possibly ASP.NET or the WinForms designer tool would not support Python.

Note that to all intents and purposes, this option is supported today using existing Python and .NET facilities. The .NET COM interoperability features mean that many .NET features are available today – indeed, the compiler itself depends on this to be able to use Reflection::Emit.

The compiler

A number of alternative strategies were considered for the compiler.

Use a different intermediate language

This solution would involve compiling the Python source code, and generating some other .NET compatible language rather than .NET Intermediate Language (IL). The most likely target language for this scenario would be C#.

It was decided that the Python compiler should target .NET IL if at all possible. It was felt that this would provide the tightest integration with the .NET framework, particularly in areas such as debugging and diagnostics. As there appeared to be no significant impediment to using IL directly, this option was rejected.

Use the unmanaged emit API

The “unmanaged emit API” is a traditional, C style DLL in the standard Windows tradition. The Reflection::Emit library is a set of .NET classes. Both APIs are designed to take .NET IL and create .NET assemblies.

This solution would involve using the “unmanaged” .NET API instead of the .NET classes. However, there were 2 reasons why this option was rejected:

- The advice of the Microsoft staff was that the Reflection::Emit APIs are the new official way of creating assemblies. The unmanaged API existed only to provide a solution to the boot-strapping problem; you can’t create a managed .NET API for compilers until you have compilers to write the API in!

- Accessing these unmanaged APIs from Python code was more difficult than accessing a COM API.

Using the Reflection::Emit classes appeared to offer the shortest implementation time, so was chosen over the unmanaged option.

The runtime

There were no identified alternatives to the basic scheme of having a .NET runtime implement many of the Python semantics. Although improvements to the compiler would mean less reliance on the runtime, it is believed that the compiler will always require the runtime for some operations.

There are, however, a number of different implementation strategies possible. The first version of the compiler always worked with .NET object references, rather than the PyObject structure as defined above. However, this meant that any given Object had its IPyType interface looked up many times throughout its life, and this had a large performance impact. Performance testing at the time showed that using a value type (instead of a .NET class) provided the best performance.
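The difference between the two strategies can be illustrated by counting lookups rather than measuring time. The sketch below is plain Python with hypothetical names (lookup_semantics, repeat_add_bare, repeat_add_paired); it says nothing about actual .NET timings, only about how often the type lookup happens under each scheme.

```python
class IntSemantics:
    """Stand-in for the semantics object found by the type lookup."""

    @staticmethod
    def add(left, right):
        return left + right

SEMANTICS_BY_TYPE = {int: IntSemantics}
lookups = {"count": 0}

def lookup_semantics(value):
    """Find the semantics for a bare object reference, counting each call."""
    lookups["count"] += 1
    return SEMANTICS_BY_TYPE[type(value)]

def repeat_add_bare(value, times):
    """Bare references: the semantics are looked up on every operation."""
    total = 0
    for _ in range(times):
        total = lookup_semantics(value).add(total, value)
    return total

def repeat_add_paired(value, times):
    """Paired structure: look up once, then carry the pair around."""
    semantics, payload = lookup_semantics(value), value
    total = 0
    for _ in range(times):
        total = semantics.add(total, payload)
    return total

lookups["count"] = 0
bare_result = repeat_add_bare(7, 1000)
bare_lookups = lookups["count"]

lookups["count"] = 0
paired_result = repeat_add_paired(7, 1000)
paired_lookups = lookups["count"]
```

Both functions compute the same total, but the first performs one lookup per operation while the second performs a single lookup for the lifetime of the pair.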

The Microsoft support staff used their internal testing tool to determine that the IPyType interface lookup for a given .NET object is the biggest hot spot in the runtime. However, as these performance tools are not available to the Python for .NET project, meaningful tuning work is not possible. Hopefully, commercial performance analysis tools will appear allowing further analysis of the runtime performance.

The library

The Python for .NET compiler and runtime have been designed to allow arbitrary .NET assemblies to be used as Python modules. Thus, it should be possible to write the standard library in any language. The modules written in Python and compiled by the .NET compiler could have been written in C#. However, it was felt that writing the library in Python wherever possible would provide significant benefits, as witnessed in CPython.

Another strategy that has not been investigated would be to port the existing C modules to managed C++ or C#. This has not been explored, partly because the managed C++ compiler was previously not stable enough for serious use, but primarily because it seemed a huge and improbable task. It should be investigated, as there is a huge body of existing C code that is otherwise useless in the Python for .NET environment.

The only other identified strategy would be to attempt to have the compiler inline all of the standard library calls. Although this is indeed done for some common built-in functions, it was not considered a viable option for the entire standard library, and would place too much burden on compiler maintenance.

Limitations and possible enhancements

In this section, we devote some space to the limitations on the Python for .NET system.

Performance

Probably the biggest single issue with Python for .NET is the performance of both the compiler and the runtime. The speed of the runtime must be the more critical issue, as the fastest compiler in the world would not be used if the generated code is too slow to be useful.

Only a small amount of effort has gone into analysing the performance of the runtime, mainly due to the lack of performance analysis tools available for .NET. Without such tools, making performance related changes is fruitless, as the effectiveness is difficult to measure.

Notwithstanding the tuning of the runtime system, the simple existence of the runtime accounts for much of our performance problem. When simple arithmetic expressions take hundreds or thousands of Intermediate Language instructions (via the Python runtime) to complete, performance will always be a struggle.

We discuss type related performance in more detail in later sections.

Closed World syndrome

Competing for the title of biggest single issue would have to be interoperability with existing Python code. Python itself is a very simple language, deriving much of its power from its library. Although the standard library (the modules provided with Python) is very rich, it is the vast array of extension modules provided by 3rd parties that really provides the power. Indeed, most Python books and Internet queries relate to using various modules, rather than the language itself.

The existing Python for .NET system does not allow any leverage of existing Python code. Although much of the standard library can be ported to .NET, it would not be reasonable to attempt to cover every Python module available.

Although .NET has its own rich class library, it does not solve the problem. There is not 100% overlap between the .NET and Python libraries, and a large body of existing Python code references these various external modules - module dependencies often run quite deep.

Class and instance semantics

Although support for basic classes works fine, there are a number of areas where Python and .NET semantics collide – the most obvious being multiple inheritance (supported by Python, but not by .NET). Other more subtle examples include the ability of a Python subclass to avoid calling a base-class constructor, or to reference self (the moral equivalent of this in C#) before calling any constructors at all.

Unfortunately, many of these semantics are used regularly in Python programs, so simply not supporting them in Python for .NET would raise a significant compatibility hurdle.
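The constructs in question are all legal, everyday Python. The short example below shows them together; the class names are made up for illustration.

```python
class Base:
    def __init__(self):
        self.base_ready = True

class SkipsBaseInit(Base):
    def __init__(self):
        # self is used before, and without ever, calling Base.__init__.
        self.label = "set before any base constructor ran"

class Loggable:
    def describe(self):
        return f"<{type(self).__name__}>"

class Both(Base, Loggable):
    """Multiple inheritance: two unrelated base classes."""

obj = SkipsBaseInit()
base_init_skipped = not hasattr(obj, "base_ready")

both = Both()
description = both.describe()
```

A .NET class must chain to a base constructor and has a single base class, so neither SkipsBaseInit nor Both maps directly onto a .NET class definition.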

It was apparent that the existing simple design would not support the required semantics, so no attempt was made to extend it as far as needed. However, with a clever design of the Python for .NET class system, and borrowing influence from JPython, it should be possible to get a very close match to the defined Python semantics.

Type declarations or inference for speed

Python code does not have type declarations. However, all Python objects have a distinct type, so Python is not a typeless language. For example, if we consider the two Python statements:

a = "hello"
a = 7

Although the Python variable a has not been declared, after the first statement it references a Python string object, and after the second it references a Python integer object. At any point in time, the variable has a specific type.

As a result of this, the compiler is rarely able to generate efficient code. The compiler makes no attempt to track variable assignments and types, so always generates code for the general case. Thus, simple arithmetic operations take many orders of magnitude more Intermediate Language instructions (including calls into the Python for .NET runtime) than would be required if the types of the variables were known.

There are two general approaches to this: type inference and type declarations. Type inference would involve the compiler tracking assignments and the types of objects. Although good results would be possible using local analysis, the dynamic nature of Python and the modular compilation unit would prevent this from working completely effectively. Type declarations would involve explicit declarations or other hints being added by the user. However, Python does not define syntax for these features.
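As a rough illustration of what local inference could look like, the sketch below records the literal types assigned to each simple variable name. It is an assumption-laden toy: it uses the ast module of present-day CPython rather than the parser facilities the compiler was built on, it ignores control flow entirely, and the function name infer_local_types is hypothetical.

```python
import ast

def infer_local_types(source):
    """Map each simple variable name to the set of literal types assigned to it."""
    inferred = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        if isinstance(node.value, ast.Constant):
            # A literal on the right-hand side has a type known at compile time.
            type_name = type(node.value.value).__name__
        else:
            # Calls, names and expressions would need deeper analysis.
            type_name = "unknown"
        for target in node.targets:
            if isinstance(target, ast.Name):
                inferred.setdefault(target.id, set()).add(type_name)
    return inferred

types = infer_local_types('a = "hello"\na = 7\nb = 1.5\nc = f()\n')
```

A name whose set holds exactly one known type, such as b here, is a candidate for specialised code. A name like a, which is rebound to a different type, or c, whose value comes from a call, still needs the general case unless the analysis becomes flow-sensitive or crosses compilation units.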

Type declarations for semantics

There are certain situations where type declarations are required to capture certain .NET semantics rather than for speed. The most obvious example is method overloading – the ability to define a class with multiple methods of the same name, each differing by number or types of parameters. As Python does not support type declarations, it has no way to express such constructs. Therefore, some syntax[6] was invented to allow expression of overloaded methods.

Although this is an adequate workaround for Python implementing overloaded methods and functions, it does not provide the facility to nominate the overloaded method when actually making a call. For example, given the Python statement:

myobject.Foo(a)

and that Foo is an overloaded method, the compiler has no way to determine the correct function to call, nor does the user have the ability to nominate it. In this situation we rely on .NET Reflection to select the appropriate function at runtime, but this has performance implications, and simply does not work when attempting to call base class methods or constructors.
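The idea of choosing an overload from the run-time types of the arguments can be sketched in plain Python. This is only a stand-in for the behaviour described above: it does not reproduce the binding rules of .NET Reflection, and the OVERLOADS table and call_overloaded helper are hypothetical.

```python
def foo_int(value):
    return f"Foo(int) got {value}"

def foo_str(value):
    return f"Foo(str) got {value}"

# Hypothetical overload table: method name -> list of (parameter types, function).
OVERLOADS = {
    "Foo": [((int,), foo_int), ((str,), foo_str)],
}

def call_overloaded(name, *args):
    """Select the first overload whose parameter types match the arguments."""
    for param_types, implementation in OVERLOADS[name]:
        if len(param_types) == len(args) and all(
            isinstance(arg, expected) for arg, expected in zip(args, param_types)
        ):
            return implementation(*args)
    raise TypeError(f"no overload of {name} matches the given arguments")

int_call = call_overloaded("Foo", 3)
str_call = call_overloaded("Foo", "x")
```

The search over candidate signatures happens on every call, which hints at why resolving overloads at run time is slower than binding to one method at compile time.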

Possible .NET or Python enhancements

This section describes a few changes that could happen to Python and/or .NET that we could take advantage of.

As can be seen from the previous section, there is plenty of work still to be done before we take full advantage of .NET given the current state of both .NET and Python. Therefore, we limit this section to a brief discussion of the possibilities, and save further analysis for when the existing implementation could be considered of usable quality.

Type declarations

There has been discussion in the Python community about adding optional type declarations to the next major version of Python. Although consensus has not been reached, the proposals being discussed would support all of our type declaration requirements – declaring the signature when implementing a method, and selecting which signature you wish to call.

However, once we have syntactic support, the compiler will obviously need to be enhanced to take advantage of the declarations. Indeed, this is probably the biggest issue preventing Python from moving forward with concrete syntax proposals – it is still not clear anyone has the time or inclination to enhance the CPython runtime to take full advantage, and without that the syntax enhancements would be pointless. If a commitment from the JPython or Python for .NET projects was made to support these enhancements, it may help accelerate the acceptance of these proposals into the Python language specification.

Dynamic language support

Due to Python’s dynamic nature, there are some Python features that are difficult to map into .NET semantics. A simple example is the ability for a Python object to add attributes at runtime – although no declaration or other reference to the attribute can be seen by source code analysis, reference to the attribute will succeed at runtime. Python provides many other ways to change object behaviour at runtime that are not captured by .NET.
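A small example makes the point concrete. Nothing in the class definition below mentions the attributes or the method that the later code relies on; the names Record, colour and shout are made up for illustration.

```python
class Record:
    """An empty class: no attributes are declared anywhere in its source."""

record = Record()

# Attributes appear at run time; the name could just as easily come from input.
setattr(record, "colour", "red")
record.size = 3

def shout(self):
    return self.colour.upper()

# Behaviour can be attached to the class after instances already exist.
Record.shout = shout

shouted = record.shout()
fresh = Record()
```

Source analysis of Record alone cannot tell a compiler that record.colour or record.shout() will succeed, nor that fresh.colour will fail, so these references have to be resolved at run time.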

To support this capability, the compiler will often emit special symbols or code specific to Python. At runtime, if these features are found they are used, and thus Python can take advantage of these features. This allows Python code compiled in a separate compilation unit (that is, it exists in a separate assembly) to still provide these dynamic Python semantics when the caller is Python.

This dynamic capability is analogous to IDispatch support in COM – the ability for a language to dynamically determine or expose an object model at runtime. .NET is focussed much more towards compile time determination of these attributes, in a clear drive for speed. However, the very nature of Python and scripting languages in general is that their users have made a conscious decision to trade execution speed for these runtime features – although possibly not as much execution speed as Python for .NET is currently costing them.

It is clear such dynamic features may preclude certain other .NET features - for example, the performance penalty associated with allowing dynamically obtained methods to be used as virtual methods may mean they are not supported as virtual. However, there is still enough utility in the feature overlap to make this a useful, if optional, .NET addition.

It should also be noted that there are many languages with dynamic features comparable to Python. However, with Python and every other such language needing to invent its own dynamic solution, these languages are not able to share such features, even when it would make sense to be able to do so. Formalization of these features in .NET would allow multiple dynamic languages to interoperate in a natural manner.

Conclusion

The .NET framework is an exciting architecture from Microsoft that provides many features in a language neutral manner. This offers unprecedented capabilities for working with multiple languages in a single project, while also offering type safety, security, just-in-time compilation and other features across all these languages.

Python is a very popular dynamic language often categorised as a scripting language. It has a number of dynamic features that make it very attractive in certain problem domains, but these very dynamic features are the ones that cause the most friction with .NET.

The intention of this project was to establish that the Microsoft .NET framework was capable of supporting the Python language, and to suggest ways that Python and .NET can be more tightly integrated in the future. On this basis, the project can be considered a complete success; a working version of the language has been delivered, and as this document shows, some real insights into the future possibilities have been gained.

There is, however, still significant work required to make Python truly useful on the .NET platform. We look forward to working with Microsoft to allow even tighter integration, and expect that future versions of Python will become an excellent choice for many developers working with .NET.



[1] The official internet site for Python is at http://www.python.org

[2] The official Microsoft internet site for .NET is at http://www.microsoft.com/net

[3] Our preferred name for this project is simply Python.NET, but the issue of using the trademarked .NET in this manner, plus the existence of the python.net internet domain makes this issue murky. Therefore, we have adopted the safe Python for .NET.

[4] http://www.python.org/doc/current/ref/

[5] http://lima.mudlib.org/~rassilon/p2c/

[6] Technically it is not a syntax change – the compiler recognises assignment to a specially named variable, and reacts accordingly.
