SWIG 3 Chinese Manual: 4. Scripting Languages

4 Scripting Languages

This chapter provides a brief overview of scripting language extension programming and the mechanisms by which scripting language interpreters access C and C++ code.

4.1 The two language view of the world

When a scripting language is used to control a C program, the resulting system tends to look as follows:

[Figure: Scripting language input - C/C++ functions output]

In this programming model, the scripting language interpreter is used for high-level control, whereas the underlying functionality of the C/C++ program is accessed through special scripting language "commands." If you have ever tried to write your own simple command interpreter, you might view the scripting language approach as a highly advanced implementation of that. Likewise, if you have ever used a package such as MATLAB or IDL, it is a very similar model--the interpreter executes user commands and scripts. However, most of the underlying functionality is written in a low-level language like C or Fortran.

The two-language model of computing is extremely powerful because it exploits the strengths of each language. C/C++ can be used for maximal performance and complicated systems programming tasks. Scripting languages can be used for rapid prototyping, interactive debugging, scripting, and access to high-level data structures such as associative arrays.

4.2 How does a scripting language talk to C?

Scripting languages are built around a parser that knows how to execute commands and scripts. Within this parser, there is a mechanism for executing commands and accessing variables. Normally, this is used to implement the built-in features of the language. However, by extending the interpreter, it is usually possible to add new commands and variables. To do this, most languages define a special API for adding new commands. Furthermore, a special foreign function interface defines how these new commands are supposed to hook into the interpreter.

Typically, when you add a new command to a scripting interpreter you need to do two things. First, you need to write a special "wrapper" function that serves as the glue between the interpreter and the underlying C function. Then you need to give the interpreter information about the wrapper by providing details about the name of the function, arguments, and so forth. The next few sections illustrate the process.

4.2.1 Wrapper functions

Suppose you have an ordinary C function like this:

int fact(int n) {
  if (n <= 1)
    return 1;
  else
    return n*fact(n-1);
}

In order to access this function from a scripting language, it is necessary to write a special "wrapper" function that serves as the glue between the scripting language and the underlying C function. A wrapper function must do three things:

  • Gather function arguments and make sure they are valid.
  • Call the C function.
  • Convert the return value into a form recognized by the scripting language.

As an example, the Tcl wrapper function for the fact() function above might look like the following:

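#include <tcl.h>

int fact(int n);  /* the C function being wrapped */

int wrap_fact(ClientData clientData, Tcl_Interp *interp, int argc, const char *argv[]) {
  int result;
  int arg0;
  (void) clientData;  /* unused */
  /* 1. Gather the arguments and make sure they are valid. */
  if (argc != 2) {
    Tcl_SetObjResult(interp, Tcl_NewStringObj("wrong # args", -1));
    return TCL_ERROR;
  }
  /* Tcl_GetInt leaves an error message in interp if argv[1] is not an integer. */
  if (Tcl_GetInt(interp, argv[1], &arg0) != TCL_OK) {
    return TCL_ERROR;
  }
  /* 2. Call the C function. */
  result = fact(arg0);
  /* 3. Convert the return value into a Tcl result. */
  Tcl_SetObjResult(interp, Tcl_NewIntObj(result));
  return TCL_OK;
}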

Once you have created a wrapper function, the final step is to tell the scripting language about the new function. This is usually done in an initialization function called by the language when the module is loaded. For example, adding the above function to the Tcl interpreter requires code like the following:

int Wrap_Init(Tcl_Interp *interp) {
  Tcl_CreateCommand(interp, "fact", wrap_fact, (ClientData) NULL,
                    (Tcl_CmdDeleteProc *) NULL);
  return TCL_OK;
}

Once this initialization function has run, Tcl has a new command called "fact" that you can use like any other Tcl command.

Although the process of adding a new function to Tcl has been illustrated, the procedure is almost identical for Perl and Python. Both require special wrappers to be written and both need additional initialization code. Only the specific details are different.
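
The wrapper and registration steps do not depend on any one interpreter. As a minimal sketch, assuming a toy interpreter rather than Tcl, Perl, or Python, the following C code shows the same pattern: a command table, a registration call, and a wrapper that validates its arguments, calls fact(), and converts the result to a string. Every name beginning with toy_ or TOY_ is hypothetical and is not part of any real interpreter API.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* The C function being exposed (same behavior as fact() in the text). */
int fact(int n) {
  return (n <= 1) ? 1 : n * fact(n - 1);
}

/* Hypothetical command signature: arguments arrive as strings and the
   result is written back into a caller-supplied string buffer. */
typedef int (*ToyCmdProc)(int argc, const char *argv[], char *result, size_t size);

enum { TOY_OK = 0, TOY_ERROR = 1, TOY_MAX_CMDS = 16 };

static struct { const char *name; ToyCmdProc proc; } toy_cmds[TOY_MAX_CMDS];
static int toy_ncmds = 0;

/* Registration step: tell the toy interpreter about a new command. */
int toy_create_command(const char *name, ToyCmdProc proc) {
  if (toy_ncmds >= TOY_MAX_CMDS) return TOY_ERROR;
  toy_cmds[toy_ncmds].name = name;
  toy_cmds[toy_ncmds].proc = proc;
  toy_ncmds++;
  return TOY_OK;
}

/* Dispatch step: look up argv[0] in the table and invoke its wrapper. */
int toy_eval(int argc, const char *argv[], char *result, size_t size) {
  for (int i = 0; i < toy_ncmds; i++) {
    if (strcmp(toy_cmds[i].name, argv[0]) == 0)
      return toy_cmds[i].proc(argc, argv, result, size);
  }
  snprintf(result, size, "unknown command \"%s\"", argv[0]);
  return TOY_ERROR;
}

/* Wrapper: gather and validate arguments, call fact(), convert the result. */
int toy_wrap_fact(int argc, const char *argv[], char *result, size_t size) {
  char *end;
  long n;
  if (argc != 2) {
    snprintf(result, size, "wrong # args");
    return TOY_ERROR;
  }
  n = strtol(argv[1], &end, 10);
  if (end == argv[1] || *end != '\0') {
    snprintf(result, size, "expected integer but got \"%s\"", argv[1]);
    return TOY_ERROR;
  }
  snprintf(result, size, "%d", fact((int) n));
  return TOY_OK;
}
```

With this in place, calling toy_create_command("fact", toy_wrap_fact) and then passing the argument list {"fact", "4"} to toy_eval leaves the string 24 in the result buffer, while a non-numeric argument makes the wrapper return TOY_ERROR.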

4.2.2 Variable linking

Variable linking refers to the problem of mapping a C/C++ global variable to a variable in the scripting language interpreter. For example, suppose you had the following variable:

double Foo = 3.5;

It might be nice to access it from a script as follows (shown for Perl):

$a = $Foo * 2.3;   # Evaluation
$Foo = $a + 2.0;   # Assignment

To provide such access, variables are commonly manipulated using a pair of get/set functions. For example, whenever the value of a variable is read, a "get" function is invoked. Similarly, whenever the value of a variable is changed, a "set" function is called.

In many languages, calls to the get/set functions can be attached to evaluation and assignment operators. Therefore, evaluating a variable such as $Foo might implicitly call the get function. Similarly, typing $Foo = 4 would call the underlying set function to change the value.
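
The get/set pairing can be sketched in C as follows. This is only an illustration under stated assumptions: Foo_get and Foo_set are the accessor pair for the global above, while the ToyVarLink table and the toy_read_var / toy_write_var functions are hypothetical stand-ins for whatever hook a real interpreter attaches to variable evaluation and assignment.

```c
#include <stddef.h>
#include <string.h>

double Foo = 3.5;  /* the C global from the text */

/* Accessor pair: all script-side access to Foo goes through these. */
double Foo_get(void) { return Foo; }
void Foo_set(double value) { Foo = value; }

/* Hypothetical link record mapping a script variable name to its accessors. */
typedef struct {
  const char *name;
  double (*get)(void);
  void (*set)(double);
} ToyVarLink;

static const ToyVarLink toy_links[] = {
  { "Foo", Foo_get, Foo_set },
};

/* Find the link record for a script variable, or NULL if it is not linked. */
static const ToyVarLink *toy_find_link(const char *name) {
  size_t n = sizeof toy_links / sizeof toy_links[0];
  for (size_t i = 0; i < n; i++) {
    if (strcmp(toy_links[i].name, name) == 0) return &toy_links[i];
  }
  return NULL;
}

/* What a toy interpreter would do when a script evaluates $name. */
int toy_read_var(const char *name, double *out) {
  const ToyVarLink *link = toy_find_link(name);
  if (link == NULL) return -1;
  *out = link->get();
  return 0;
}

/* What a toy interpreter would do when a script assigns to $name. */
int toy_write_var(const char *name, double value) {
  const ToyVarLink *link = toy_find_link(name);
  if (link == NULL) return -1;
  link->set(value);
  return 0;
}
```

In this sketch, evaluating $Foo corresponds to toy_read_var("Foo", &value), and $Foo = 4 corresponds to toy_write_var("Foo", 4.0), which updates the C global through Foo_set.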

4.2.3 Constants

In many cases, a C program or library may define a large collection of constants. For example:

#define RED   0xff0000
#define BLUE  0x0000ff
#define GREEN 0x00ff00

To make constants available, their values can be stored in scripting language variables such as $RED, $BLUE, and $GREEN. Virtually all scripting languages provide C functions for creating variables so installing constants is usually a trivial exercise.
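
As a rough sketch of what installing constants involves, the C code below copies the three #define values into a small variable table. The table, toy_set_var, toy_get_var, and install_constants are all hypothetical; a real interpreter would supply its own C function for creating variables.

```c
#include <stddef.h>
#include <string.h>

#define RED   0xff0000
#define BLUE  0x0000ff
#define GREEN 0x00ff00

/* Hypothetical variable table standing in for an interpreter's globals. */
typedef struct { const char *name; long value; } ToyVar;

enum { TOY_MAX_VARS = 32 };
static ToyVar toy_vars[TOY_MAX_VARS];
static size_t toy_nvars = 0;

/* Create the variable, or overwrite it if it already exists. */
int toy_set_var(const char *name, long value) {
  for (size_t i = 0; i < toy_nvars; i++) {
    if (strcmp(toy_vars[i].name, name) == 0) {
      toy_vars[i].value = value;
      return 0;
    }
  }
  if (toy_nvars >= TOY_MAX_VARS) return -1;
  toy_vars[toy_nvars].name = name;
  toy_vars[toy_nvars].value = value;
  toy_nvars++;
  return 0;
}

/* Look a variable up by name; returns -1 if it does not exist. */
int toy_get_var(const char *name, long *out) {
  for (size_t i = 0; i < toy_nvars; i++) {
    if (strcmp(toy_vars[i].name, name) == 0) {
      *out = toy_vars[i].value;
      return 0;
    }
  }
  return -1;
}

/* Table-driven installation: one entry per constant to expose. */
int install_constants(void) {
  static const ToyVar constants[] = {
    { "RED", RED }, { "BLUE", BLUE }, { "GREEN", GREEN },
  };
  size_t n = sizeof constants / sizeof constants[0];
  for (size_t i = 0; i < n; i++) {
    if (toy_set_var(constants[i].name, constants[i].value) != 0) return -1;
  }
  return 0;
}
```

The table-driven form keeps the list of constants in one place, so the initialization function stays the same as more constants are added.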

4.2.4 Structures and classes

Although scripting languages have no trouble accessing simple functions and variables, accessing C/C++ structures and classes presents a different problem. This is because the implementation of structures is largely related to the problem of data representation and layout. Furthermore, certain language features are difficult to map to an interpreter. For instance, what does C++ inheritance mean in a Perl interface?

The most straightforward technique for handling structures is to implement a collection of accessor functions that hide the underlying representation of a structure. For example,

struct Vector {
  Vector();
  ~Vector();
  double x, y, z;
};

can be transformed into the following set of functions:

Vector *new_Vector();
void delete_Vector(Vector *v);
double Vector_x_get(Vector *v);
double Vector_y_get(Vector *v);
double Vector_z_get(Vector *v);
void Vector_x_set(Vector *v, double x);
void Vector_y_set(Vector *v, double y);
void Vector_z_set(Vector *v, double z);

Now, from an interpreter these functions might be used as follows:

% set v [new_Vector]
% Vector_x_set $v 3.5
% Vector_y_get $v
% delete_Vector $v
% ...

Since accessor functions provide a mechanism for accessing the internals of an object, the interpreter does not need to know anything about the actual representation of a Vector.
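
For illustration, one possible C implementation of these accessors is sketched below. It assumes a plain C struct, so the C++ constructor and destructor from the declaration above are replaced by malloc and free; that substitution is an assumption of this sketch, not something the text prescribes.

```c
#include <stdlib.h>

/* Plain C stand-in for the Vector structure in the text. */
typedef struct Vector {
  double x, y, z;
} Vector;

/* "Constructor": allocate a Vector with all components set to zero.
   Returns NULL if the allocation fails. */
Vector *new_Vector(void) {
  Vector *v = malloc(sizeof *v);
  if (v != NULL) {
    v->x = 0.0;
    v->y = 0.0;
    v->z = 0.0;
  }
  return v;
}

/* "Destructor": release a Vector created by new_Vector(). */
void delete_Vector(Vector *v) {
  free(v);
}

/* Getters: the only way callers read a component. */
double Vector_x_get(Vector *v) { return v->x; }
double Vector_y_get(Vector *v) { return v->y; }
double Vector_z_get(Vector *v) { return v->z; }

/* Setters: the only way callers change a component. */
void Vector_x_set(Vector *v, double x) { v->x = x; }
void Vector_y_set(Vector *v, double y) { v->y = y; }
void Vector_z_set(Vector *v, double z) { v->z = z; }
```

Because callers only ever hold a Vector pointer and go through these functions, the member layout could change without affecting any code on the interpreter side.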

4.2.5 Proxy classes

In certain cases, it is possible to use the low-level accessor functions to create a proxy class, also known as a shadow class. A proxy class is a special kind of object that gets created in a scripting language to access a C/C++ class (or struct) in a way that looks like the original structure (that is, it proxies the real C++ class). For example, if you have the following C++ definition:

class Vector {
public:
  Vector();
  ~Vector();
  double x, y, z;
};

A proxy class mechanism would allow you to access the structure in a more natural manner from the interpreter. For example, in Python, you might want to do this:

>>> v = Vector()
>>> v.x = 3
>>> v.y = 4
>>> v.z = -13
>>> ...
>>> del v

Similarly, in Perl5 you may want the interface to work like this:

$v = new Vector;
$v->{x} = 3;
$v->{y} = 4;
$v->{z} = -13;

Finally, in Tcl:

Vector v
v configure -x 3 -y 4 -z -13

When proxy classes are used, two objects are really at work--one in the scripting language, and an underlying C/C++ object. Operations affect both objects equally, and for all practical purposes it appears as if you are simply manipulating a C/C++ object.

4.3 Building scripting language extensions

The final step in using a scripting language with your C/C++ application is adding your extensions to the scripting language itself. There are two primary approaches for doing this. The preferred technique is to build a dynamically loadable extension in the form of a shared library. Alternatively, you can recompile the scripting language interpreter with your extensions added to it.

4.3.1 Shared libraries and dynamic loading

To create a shared library or DLL, you often need to look at the manual pages for your compiler and linker. However, the procedure for a few common platforms is shown below:

# Build a shared library for Solaris
gcc -fpic -c example.c example_wrap.c -I/usr/local/include
ld -G example.o example_wrap.o -o example.so

# Build a shared library for Linux
gcc -fpic -c example.c example_wrap.c -I/usr/local/include
gcc -shared example.o example_wrap.o -o example.so

To use your shared library, you simply use the corresponding command in the scripting language (load, import, use, etc.). This will import your module and allow you to start using it. For example:

% load ./example.so
% fact 4
24
%

When working with C++ code, the process of building shared libraries may be more complicated--primarily because C++ modules may need additional code in order to operate correctly. On many machines, you can build a shared C++ module by following the above procedures, but changing the link line to the following:

c++ -shared example.o example_wrap.o -o example.so

4.3.2 Linking with shared libraries

When building extensions as shared libraries, it is not uncommon for your extension to rely upon other shared libraries on your machine. In order for the extension to work, it needs to be able to find all of these libraries at run-time. Otherwise, you may get an error such as the following:

>>> import graph
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "/home/sci/data1/beazley/graph/graph.py", line 2, in ?
    import graphc
ImportError:  1101:/home/sci/data1/beazley/bin/python: rld: Fatal Error: cannot
successfully map soname 'libgraph.so' under any of the filenames /usr/lib/libgraph.so:/
lib/libgraph.so:/lib/cmplrs/cc/libgraph.so:/usr/lib/cmplrs/cc/libgraph.so:
>>>

What this error means is that the extension module created by SWIG depends upon a shared library called "libgraph.so" that the system was unable to locate. To fix this problem, there are a few approaches you can take.

  • Link your extension and explicitly tell the linker where the required libraries are located. Often, this can be done with a special linker flag such as -R, -rpath, etc. This is not implemented in a standard manner, so read the man pages for your linker to find out more about how to set the search path for shared libraries.
  • Put shared libraries in the same directory as the executable. This technique is sometimes required for correct operation on non-Unix platforms.
  • Set the UNIX environment variable LD_LIBRARY_PATH to the directory where shared libraries are located before running Python. Although this is an easy solution, it is not recommended. Consider setting the path using linker options instead.

4.3.3 Static linking

With static linking, you rebuild the scripting language interpreter with extensions. The process usually involves compiling a short main program that adds your customized commands to the language and starts the interpreter. You then link your program with a library to produce a new scripting language executable.

Although static linking is supported on all platforms, this is not the preferred technique for building scripting language extensions. In fact, there are very few practical reasons for doing this--consider using shared libraries instead.
