The Importance of Perl

Source: PerlChina (Foundation of PerlChina, FPC)
Translated by: klaus
Original title: The Importance of Perl
By Tim O'Reilly (O'Reilly & Associates, Inc.) and Ben Smith (Ronin House)
Original article: http://www.perl.com/pub/a/oreilly/perl/news/importance_0498.html
Please respect the author's copyright and the work that went into this translation.
Despite all the press attention to Java and ActiveX, the real job of "activating the Internet" belongs to Perl, a language that is all but invisible to the world of professional technology analysts but looms large in the mind of anyone -- webmaster, system administrator or programmer -- whose daily work involves building custom web applications or gluing together programs for purposes their designers had not quite foreseen. As Hassan Schroeder, Sun's first webmaster, remarked: "Perl is the duct tape of the Internet."
Perl was originally developed by Larry Wall as a scripting language for UNIX, aiming to blend the ease of use of the UNIX shell with the power and flexibility of a system programming language like C. Perl quickly became the language of choice for UNIX system administrators.
With the advent of the World Wide Web, Perl usage exploded. The Common Gateway Interface (CGI) provided a simple mechanism for passing data from a web server to another program, and returning the result of that program interaction as a web page. Perl quickly became the dominant language for CGI programming.
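To make the mechanism concrete, here is a minimal sketch of the kind of CGI program Perl made easy. The form field name and the hand-rolled query-string decoder are illustrative only; real scripts of the era more often used cgi-lib.pl or the CGI.pm module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decode an application/x-www-form-urlencoded query string into a hash.
sub parse_query {
    my ($qs) = @_;
    my %param;
    foreach my $pair (split /&/, $qs) {
        my ($k, $v) = split /=/, $pair, 2;
        $v = '' unless defined $v;
        for ($k, $v) {
            tr/+/ /;                                # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex $1)/ge;     # %XX escapes
        }
        $param{$k} = $v;
    }
    return %param;
}

# A CGI script reads the query from the environment, then prints an
# HTTP header followed by the HTML page body.
my %param = parse_query($ENV{QUERY_STRING} || 'name=world');
print "Content-type: text/html\n\n";
print "<html><body>Hello, $param{name}!</body></html>\n";
```

Run under a web server, the script's standard output becomes the page the browser displays; that is all CGI requires.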
With the development of a powerful Win32 port, Perl has also made significant inroads as a scripting language for NT, especially in the areas of system administration and web site management and programming.
For a while, the prevailing wisdom among analysts was that CGI programs -- and Perl along with them -- would soon be replaced by Java, ActiveX and other new technologies designed specifically for the Internet. Surprisingly, though, Perl has continued to gain ground, with frameworks such as Microsoft's Active Server Pages (ASP) and the Apache web server's mod_perl allowing Perl programs to be run directly from the server, and interfaces such as DBI, the Perl DataBase Interface, providing a stable API for integration of back-end databases.
This paper explores some of the reasons why Perl will become increasingly important, not just for the web but as a general purpose computer language. These reasons include:
• fundamental differences in the tasks best performed by scripting languages like Perl versus traditional system programming languages like Java, C++ or C.
• Perl's ability to "glue together" other programs, or transform the output of one program so it can be used as input to another.
• Perl's unparalleled ability to process text, using powerful features like regular expressions. This is especially important because of the re-emergence via the web of text files (HTML) as a lingua franca across all applications and systems.
• The ability of a distributed development community to keep up with rapidly changing demands, in an organic, evolutionary manner.
A good scripting language is a high-level software development language that allows for quick and easy development of trivial tools while having the process flow and data organization necessary to also develop complex applications. It must be fast while executing. It must be efficient when calling system resources such as file operations, interprocess communications, and process control. A great scripting language runs on every popular operating system, is tuned for information processing (free-form text) and yet is excellent at data processing (numbers and raw, binary data). It is embeddable, and extensible. Perl fits all of these criteria.
When and Why a Scripting Language?
As John Ousterhout has elegantly argued in his paper, Scripting: Higher Level Programming for the 21st Century, "Scripting languages such as Perl and Tcl represent a very different style of programming than system programming languages such as C or Java. Scripting languages are designed for 'gluing' applications; they use typeless approaches to achieve a higher level of programming and more rapid application development than system programming languages. Increases in computer speed and changes in the application mix are making scripting languages more and more important for applications of the future."
Ousterhout goes on: "As we near the end of the 20th century a fundamental change is occurring in the way people write computer programs. The change is a transition from system programming languages such as C or C++ to scripting languages such as Perl or Tcl. Although many people are participating in the change, few people realize that it is occurring and even fewer people know why it is happening....
Scripting languages are designed for different tasks than system programming languages, and this leads to fundamental differences in the languages. System programming languages were designed for building data structures and algorithms from scratch, starting from the most primitive computer elements such as words of memory. In contrast, scripting languages are designed for gluing: they assume the existence of a set of powerful components and are intended primarily for connecting components together. System programming languages are strongly typed to help manage complexity, while scripting languages are typeless to simplify connections between components and provide rapid application development.
Scripting languages and system programming languages are complementary, and most major computing platforms since the 1960's have provided both kinds of languages. However, several recent trends, such as faster machines, better scripting languages, the increasing importance of graphical user interfaces and component architectures, and the growth of the Internet, have greatly increased the applicability of scripting languages. These trends will continue over the next decade, with scripting languages used for more and more applications and system programming languages used primarily for creating components."
System administrators were among the first to capitalize on the power of scripting languages. The problems are everywhere, on every operating system. They usually appear as the requirement to automate repetitive tasks. Even Macintosh operating systems need some user-definable automation. It might be as simple as an automated backup and recovery system, or as complex as a periodic inventory of all the files on a disk, or of all the system configuration changes in the last 24 hours. Many times, there are existing utilities that do part of the work, but automation requires a more general framework for running programs, capturing or transforming their output, and coordinating the work of multiple applications.
Most systems have included some form of scripting language. VMS's DCL, MS-DOS's .BAT files, UNIX's shell scripts, IBM's Rexx, Windows' Visual Basic and Visual Basic for Applications, and AppleScript are good examples of scripting languages that are specific to a single operating system. Perl is fairly unique in that it has broken the tight association with a single operating system and become widely used as a scripting language on multiple platforms.
Some scripting languages, most notably Perl and Visual Basic, and to a lesser extent Tcl and Python, have gained wide use as general purpose programming languages. Successful scripting languages distinguish themselves by the ease with which they call and execute operating system utilities and services. To reach the next level, and function as general purpose languages, they must be robust enough that you can build entire complex application programs. The scripting language is used to prototype, model, and test. If the scripting language is robust and fast enough, the prototype evolves directly into the application.
So why not use a general purpose programming language like C, C++ or Java instead of a scripting language? The answer is simple: cost. Development time is more expensive than fast hardware and memory. Scripting languages are easy to learn, and simple to use.
As Ousterhout points out, scripting languages typically lack data types. They don't distinguish between integer and floating point numbers. Variables are typeless. This is one of the ways that scripting languages speed up development. The concept is to "leave the details for later." Since scripting languages are generally good at calling system utilities to do the dirty work, for instance copying files and building directories or file folders, the details can be handled by some small utility that, if it doesn't exist and is necessary, will be easy to write in a compiled language.
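A two-line illustration of that typelessness in Perl: the same scalar serves as a number or a string, and it is the operator, not a declaration, that decides which.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A Perl scalar carries no declared type: the operator applied to it
# decides whether the value acts as a number or a string.
my $x = "42";            # read from a file or a form, say
print $x + 8,   "\n";    # numeric context: prints 50
print $x . "8", "\n";    # string context:  prints 428
```

Nothing had to be converted or cast; "leave the details for later" in miniature.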
What do those data types do for compiled languages? They make memory management easier for the system, but harder for the programmer. Think about this: How much did a programmer make an hour when FORTRAN was on the ascendant? How much did memory cost then? How about now? Times have changed. Memory is cheap; programmers are expensive!
System languages need to have everything spelled out. This makes compilation of complex data structures easier, but programming harder. Scripting languages make as many assumptions as they can. As little as possible needs to be spelled out. This makes the scripting language easier to learn and faster to write in. The price to be paid is difficulty in developing complex data structures and algorithms. Perl, however, is good at both complex data structures and algorithms, without sacrificing ease of use for simple applications.
Interpreted vs. Compiled Languages
Most scripting languages are interpreted languages, which contributes to the perception that they may be inappropriate for large-scale programming projects. This perception needs to be addressed.
With the exception of language-specific hardware, it is true that interpreted programs are slower than compiled ones. The advantage of interpreted languages is that programs written in them are portable to any system the interpreter runs on. The system-specific details are handled by the interpreter, not by the application program. (There are always exceptions to this rule. For example, the application program may explicitly use a non-portable system resource.)
Operating system command interpreters such as MS-DOS's command.com and early versions of the UNIX C shell are good examples of how such interpreters work: each command line is fed to the interpreter as it occurs in the script. The worst blow to efficiency is in any looping; each line in the loop is reinterpreted every time it is run. Some people think that all scripting languages work like this... slowly, inefficiently, a line at a time. This is not true.
However, there are middle languages, languages that are compiled to some intermediate code which is loaded and run by an interpreter at run time. Java is an example of this model; this is what will make Java a valuable cross-platform application language. All the Java interpreters on different hardware will be able to communicate and share data and process resources. This is perfect for embedded systems, where each device is actually a different kind of special-purpose hardware. Java is not a scripting language, however. It requires data declarations. It is compiled ahead of time (unless you count Just-In-Time compilation -- really just code generation -- as part of the process).
Perl is also a middle language. Blocks of Perl are compiled as needed, but the executable image is held in memory instead of written to a file. The compilation only happens once for any block of the Perl script. The advantages of Perl's design make all this optimization work worthwhile. Perl maintains the portability of an interpreted language while achieving nearly the speed of a compiled language. Perl, nearly a decade old, with hundreds of thousands of developers, and now in its fifth incarnation, runs lean and fast. There is some amount of startup latency, as the script is initially compiled, but this is typically small relative to the overall running time of the script. In addition, techniques such as "fast CGI", which keeps the image of a frequently accessed CGI script in memory for repetitive re-execution, avoid this startup latency, except on the very first execution of a script.
In any event, Perl 5.005 will include a compiler, created by Malcolm Beattie of Oxford University. The compiler eliminates the startup latency of in-process compilation, and adds some other small speed-ups as well. It also addresses the psychological barrier programmers of commercial applications sometimes experience with respect to interpreted languages. (With a compiled language, the source code is no longer available for inspection by outside parties.)
Information Processing versus Data Processing
The World Wide Web is only one instance of a fundamental change in how we interact with computers. This change is visible in the very name we now give the industry. It used to be called "Data Processing," as in "I'll have to submit my job to the data processing center at 4 AM so that I can pick up my output before noon." Now we call it "Information Services," as in "the Director of Information Services is working with our planning committee." The interest and emphasis is now on "information," not "data." It is clear there is more interest in information, which typically includes a mix of text and numeric data, rather than just data. Perl excels at handling information.
An important part of Perl's information-handling power comes from a special syntax called regular expressions. Regular expressions give Perl enormous power to perform actions based on patterns that it recognizes in a body of free-form text. Other languages support regular expressions as well (there is even a freeware regular expression library for Java), but no other language integrates them as well as Perl.
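As a small, hedged illustration -- the log line below is a made-up example in the common web-server log format -- a few lines of Perl suffice to pull structured fields out of free-form text:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pick apart one line of a web server's access log (common log format).
sub parse_log_line {
    my ($line) = @_;
    return unless $line =~ m{
        ^(\S+)\ \S+\ \S+        # client host, identd, user
        \ \[([^\]]+)\]          # timestamp
        \ "(\w+)\ (\S+)[^"]*"   # request method and path
        \ (\d{3})\ (\d+|-)      # status code and bytes sent
    }x;
    return (host => $1, time => $2, method => $3,
            path => $4, status => $5, bytes => $6);
}

my %hit = parse_log_line(
    '192.168.1.9 - - [10/Oct/1997:13:55:36 -0700] ' .
    '"GET /perl/index.html HTTP/1.0" 200 2326');
print "$hit{host} fetched $hit{path} (status $hit{status})\n";
```

The pattern is written once, with the /x modifier allowing comments inside it; the capturing parentheses deliver the fields directly into named hash entries.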
For many years, the trend was to embed text in specialized application file formats. Except for UNIX, which explicitly specified ASCII text as a universal file format for exchange between cooperating programs, most systems allowed incompatible formats to proliferate. This trend was reversed sharply by the World Wide Web, whose HTML data format consists of ASCII text with embedded markup tags. Because of the importance of the web, HTML -- and ASCII text with it -- is now center stage as an interchange format, exported by virtually all applications. There are even plans by Microsoft to provide an HTML view of the desktop. A successor to HTML, XML (eXtensible Markup Language), is widely expected to become a standard way of exchanging data in a mixed environment.
The increasing prominence of HTML plays directly to Perl's strengths. It is an ideal language for validating user input in HTML forms, for manipulating the contents of large collections of HTML files, or for extracting and analyzing data from voluminous log files.
That is only one side of the text processing power of Perl. Perl not only gives you several ways to pick data apart, but also several ways to glue data back together. Perl is thus ideal for taking apart an information stream and reconfiguring it. This can be done on the fly as a way of transforming information into input to other programs or for analysis and reporting.
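For instance, Perl's split and join operators take a record apart and glue the pieces back together in one step each. The colon-delimited record below is a hypothetical /etc/passwd-style line, re-emitted as a comma-separated row for another program:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Take a colon-delimited record apart with split, keep the fields of
# interest, and glue them back together with join.
sub passwd_to_csv {
    my ($record) = @_;
    my ($user, undef, $uid, $gid, $gecos, $home, $shell)
        = split /:/, $record;
    return join ',', $user, $uid, $gecos, $home;
}

print passwd_to_csv('larry:x:100:50:Larry Wall:/home/larry:/bin/sh'), "\n";
# prints: larry,100,Larry Wall,/home/larry
```

One line of picking apart, one line of gluing: the essence of a Perl filter.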
One can argue that the next generation of computer applications will not be traditional software applications but "information applications," in which text forms a large percentage of the user interface. Consider the classic "intranet" web application: a human resources system through which employees can choose which mutual funds to invest their retirement savings in, track the performance of their account, and access information that helps them to make better investment decisions. The interface to such a system consists of a series of informational documents (typically presented as HTML), a few simple forms-based CGI scripts, and links to back-end systems (which may be outside services accessed via the Internet) for real-time stock quotes.
To build an application like this using traditional software techniques would be impractical. Each company's mix of available investments is unique; the application would not justify the amount of traditional programming required for such a localized project. Using the web as a front end, and Perl scripts as a link to back-end databases, you are essentially able to create a custom application in a matter of hours.
Or consider Amazon.com, perhaps the most visibly successful new web business. Amazon provides an information front end to a back-end database and order-entry system, with -- you guessed it -- Perl as a major component tying the two together.
Perl access to databases is supported by a powerful set of database-independent interfaces called DBI. Perl + fast-cgi + DBI is probably the most widely used "database connector" on the web. ODBC modules are also available.
Put together Perl's power to handle text on the front end and connect to databases on the back end, and you begin to understand why it will play an increasingly important role in the new generation of information applications.
Other applications of Perl's ability to recognize and manipulate text patterns include biomedical research and data mining. Any large text database, from the gene sequences analyzed by the Human Genome Project to the log files collected by any large web site, can be studied and manipulated by Perl. Finally, Perl is increasingly being used for applications such as network-enabled research and specialized Internet search applications. Its strength with regular expressions and facility with sockets, the communications building block of the Internet, have made it the language of choice for building Web robots, those programs that search the Internet for information.
Perl for Application Development
Developers are increasingly coming to realize Perl's value as an application development language. Perl makes it possible to realistically propose projects that would be unaffordable in the traditional system programming languages. Not only is it fast to build applications with Perl, but they can be very complex, even incorporating the best attributes of object-oriented programming if necessary.
It is easier to build socket-based client-server applications with Perl than with C or C++. It is more efficient to build free-text parsing applications in Perl than in any other language. Perl has a sophisticated debugger (written in Perl), and many options for building secure applications. There are publicly available Perl modules for every sort of application, and these can be dynamically loaded as needed.
Perl can be easily extended with compiled functions written in C/C++ or even Java. This means that it is easy to include system services and functions that may not already be native to Perl. This is particularly valuable when working on non-UNIX platforms, since the special attributes of that operating system can be included in the Perl language.
Perl can also be called from compiled applications, or embedded into applications written in other languages. Efforts are underway, for instance, to create a standard way to incorporate Perl into Java, such that Java classes could be created with Perl implementations. Currently, such applications must embed the Perl interpreter. A new compiler back end, to be available in the fourth quarter of 1997 in O'Reilly & Associates' Perl Resource Kit, will remove this obstacle, allowing some Perl applications to be compiled to Java byte-code.
Graphical Interfaces
Because it was originally developed for the UNIX environment, where the ASCII terminal was the primary input/output device (and even windowing systems such as X preserved the terminal model within individual windows), Perl doesn't define a native GUI interface. (But in today's fragmented GUI world this can be construed as a feature.) Instead, there are Perl extension modules for creating applications with graphical interfaces. The most widely used is Tk, which was originally developed as a graphical toolkit for the Tcl scripting language, but which was soon ported to Perl. Tcl is still specific to the X Window System, though it is currently being ported to Microsoft Windows.
However, as noted earlier, the development of native windowing applications is becoming less important as the web becomes the standard GUI for many applications. The "webtop" is fast replacing the "desktop" as the universal cross-platform application target. Write one web interface and it works on UNIX, Mac, Windows/NT, Windows/95... anything that has a web browser.
In fact, more and more sites are using Perl and the web to build simpler interfaces to traditional programs. For example, the Purdue University Network Computing Hub has built a web interface to thirty circuit-simulation tools, using Perl to extract data from the forms users fill in, convert it into command lines, and pass it to the programs attached to the Hub.
Multithreading
Threads are a good solution for parallel processing, especially when you are writing bidirectional communications or event-driven programs. A multithreading patch for Perl appeared in early 1997, and it will be integrated into the standard distribution with the release of Perl 5.005 in the fourth quarter of 1997.
The multitasking model Perl has always supported is "fork" and "wait." The unit of scheduling is the process, which works well on UNIX. Windows/NT's threading model is quite different, which currently limits Perl's portability in this area. The problem can be solved by building an abstraction layer between process control and the rest of the application, and work to reconcile the process-control interfaces of Perl on UNIX and Win32 systems is under way, with completion expected in the fourth quarter of 1997.
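A minimal sketch of that fork-and-wait model on UNIX follows; the numeric exit codes standing in for real results are purely illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The classic UNIX multitasking model: fork one child process per
# task, then wait for each child and collect its exit status.
sub run_children {
    my (@codes) = @_;
    my @pids;
    foreach my $code (@codes) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        exit $code if $pid == 0;   # child: pretend to work, then exit
        push @pids, $pid;          # parent: remember the child
    }
    my @statuses;
    foreach my $pid (@pids) {
        waitpid($pid, 0);          # reap this specific child
        push @statuses, $? >> 8;   # high byte of $? is the exit code
    }
    return @statuses;
}

print join(' ', run_children(0, 1, 2)), "\n";   # prints: 0 1 2
```

Each task gets a whole process rather than a thread, which is exactly why the model ports poorly to Windows/NT.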
Perl on Win32 Systems
Microsoft commissioned ActiveWare Internet Corp. (now ActiveState) to create a port of Perl to Win32 systems for the NT Resource Kit. That port is now widely available on the net, and reportedly close to half of all Perl source downloads are for the Win32 platform.
There are many reasons for Perl's spread to Win32 platforms such as NT. Despite the existence of Visual Basic and Visual Basic for Applications, scripting support on Win32 platforms remains relatively weak. VB, though an interpreted scripting language, is still a typed language and is more cumbersome to use; nor does it offer string handling as powerful as Perl's. And when building large NT sites, system administrators quickly run into the limits of graphical user interfaces: to manage hundreds of machines, a scripting language is a necessity.
It is also often the case that experienced system administrators are called on to manage sites that do not run UNIX; Perl is a good way to bring the strengths of UNIX to other systems.
Nor should you underestimate the influence of the web. With thousands of CGI programs and site-management tools written in Perl available on the net, Perl support is a must for any server platform, whether the NT web server comes from Microsoft, O'Reilly, or Netscape. ActiveState's PerlScript lets Perl run as the active scripting engine on NT web servers that support ASP, such as Microsoft's IIS and O'Reilly's WebSite.
Beyond the core Perl language interpreter, ActiveState's Win32 port includes modules aimed specifically at the Win32 environment. For example, it provides full support for automation objects. As more Windows system resources and components acquire Perl interfaces, the Win32 version of Perl will gain access to more and more of the system's facilities.
Extending the Power of Perl
Unlike Microsoft's Visual Basic or Sun's Java, Perl has no giant corporation behind it. Perl was originally developed by Larry Wall and released as free software. Larry's further development of Perl has been carried out with the help of roughly two hundred collaborators on a mailing list called perl5-porters. The list was originally created to support porting Perl to other platforms, but it has become the gathering place for the contributors who develop the core Perl code.
Perl 5 added an extension mechanism by which independent modules can be dynamically loaded into a Perl program. This has led to the development of hundreds of add-on modules, many of the most important of which are now part of the standard Perl distribution. Add-on modules are available from the Comprehensive Perl Archive Network (CPAN). Perhaps the best interface to CPAN is www.perl.com, which also carries book reviews, articles, and other information of interest to Perl programmers and users.
Old prejudices against using free software have been shattered by the recognition that many of the most significant advances in computing over the past years have come out of the free software community. The Internet itself is in large part a collaborative free software project, and its development has been guided by self-organizing groups of visionary developers. Similarly, Apache, which holds a large share of the web server market, is a free software project, created, extended, and maintained by a large community of cooperating developers.
In addition to ongoing development, the Perl community provides active technical support through newsgroups and mailing lists, and countless consulting and paid support services exist as well. Numerous books provide excellent documentation, most famously Programming Perl, by Larry Wall, Randal Schwartz and Tom Christiansen. The Perl Journal and www.perl.com carry news of the latest developments.
In short, thanks to its huge developer community and the cooperative traditions of free software, Perl has development and support resources to rival those of the largest corporations.
Case Studies
The sections that follow present examples of actual Perl use, from the kind of quick "Perl saved the day" stories that many system administrators will find familiar, to larger production applications. Some of the stories were collected at the first annual Perl Conference, held August 19-21, 1997, in San Jose, CA; descriptions taken from the conference proceedings are marked with the author's name.
Case 1 - The Language That Saved Netscape Technical Support
Dav Amann (dove@netscape.com)
OK, picture the situation. Your brand-new Internet company has taken off. You've sold more browsers, servers, and web applications than you ever imagined, your company is growing by leaps and bounds, and the latest market research shows that within a single year you have more than 300,000 customers.
The only nagging problem is that those 300,000 people who bought your browser may run into trouble. They may not know where to find the sites they want, they may want help, and they may want *you* to give them technical support.
When that happens, you think, "Fine, I'll write some technical articles and put them on the web." But once you start on the project you discover that you need a content-management system, a publishing system, log analysis, and a way to collect and report user feedback from your site -- and that you needed all of it yesterday.
Fortunately you know Perl, and with it you put the whole thing together in three months, using nothing but the spare time of four extremely busy technical support engineers.
Case 2 - A Quick Conversion for the BYTE Web Site
BYTE magazine was about to update its information network and conferencing system, BIX, through which editors and readers exchange all kinds of information. The conferencing system was quite unlike Usenet news and closer to a mailing list, but many BYTE editors were used to Usenet, which they had long subscribed to. So BYTE built a gateway that presented BYTE's internal discussion groups as Usenet newsgroups. The language used was Perl; the job took a few days and less than a hundred lines of code.
Case 3 - Routing Customer Inquiries to the Right Expert
The benchmarking group at a leading computer manufacturer wanted to automate the routing of customer inquiries. They hoped to solve the problem with an intranet-based design, but had no budget for it. Two engineers with only a few weeks of Perl experience solved the problem. A Perl script matches keywords in a query and directs the customer to the web page of the appropriate expert. The CGI program not only points the customer to the right expert's page and e-mail address, but automatically forwards the inquiry to that expert. The solution took only a few weeks to build and saved a substantial budget.
Case 4 - Collecting and Analyzing E-mail Survey Results
An Internet market-research company that conducts surveys by e-mail wanted to automate the analysis of the ten thousand responses it received. Perl again came to the rescue: a Perl script generated the input for SPSS -- though in fact Perl itself could have done the statistics, had the statistician known Perl.
Case 5 - A Cross-Platform Benchmarking System
When SPEC (the Standard Performance Evaluation Corporation), an industry consortium that benchmarks computer systems, upgraded its benchmark suite from SPEC92 to SPEC95, it overhauled the main harness. They wanted an economical way to make the suite run on platforms other than UNIX. The SPEC92 suite was managed by UNIX shell scripts: unportable and hard to extend. The SPEC95 suite instead uses a portable, extensible management engine written in Perl. The program makes full use of Perl's object-oriented features, Perl's extensibility in C, and Perl's dynamic module loading. Porting SPEC95 to Windows/NT was easy; the main difficulty in porting to VMS is that VMS lacks a user-level fork.
Case 6 – A Business Consultant Working in Perl
Although I worked in C for many years, I find no reason to keep using it. Most of my work for the past decade has involved acquiring, managing, and transforming information, not just data. The applications I have helped develop are really information acquisition, management, and transformation systems with graphical front ends. Perl is now better suited to this work than any other language, scripting or system programming. Though I first used Perl only as a glue and prototyping language, I now use it for everything. It has replaced my C and UNIX shell programming. There may still be occasions when I need C, but I expect Java will eventually fill those needs.
Cross-platform GUIs are now well served by HTML, whether running locally, on an intranet, or over the Internet.
Perl gives me convenient data structures and interface modules to commercial databases. It gives me system-level tools for process control, file management, and interprocess communication (wherever sockets are available). It lets me build my programs from libraries, modules, packages, and subroutines. It even lets me write programs that modify themselves -- strange as that may seem, sometimes that is exactly what is needed.
The biggest benefit Perl gives me is that I can now finish a complex task in a fifth of the time it used to take. That appeals to managers and clients alike, but most of all to the people paying the bills.

Case 7 – Perl as a Rapid-Prototyping Language for Flight Data Analysis
Phil Brown, Center for Advanced Aviation System Development (CAASD), The MITRE Corporation (philsie@crete.mitre.org)
Because of its robustness and flexibility, Perl has become the tool many programmers at CAASD use to build rapid prototypes of conceptual models. The Traffic Flow Management Lab (T-Lab) already uses hundreds of Perl programs, from simple data extraction and plotting to measuring the complexity of airspace sectors and computing the transit times of aircraft passing through them. The programs range in size from 10 lines to 1,200 lines. Because many of them are I/O-intensive, Perl's rich parsing and searching features make it the natural choice for these tasks.
Case 8 – Professional Printing Online
iPrint, the discount printing and web stationery store (http://www.iprint.com), offers a WYSIWYG web desktop-publishing application connected directly to back-end printers and built on a sophisticated real-time, multi-function product and pricing database. Customers come to the site to create, proof, and order custom printed items online: business cards, stationery, logos, stamps, and other items, especially advertising materials.
The iPrint system comprises a front end (the web site) and back-end processes that eliminate all the manual prepress work needed to drive the printers and that feed the iPrint accounting system all the information it needs. Of the system's nearly 80,000 lines of code, 95% is written in Perl v5.003, running on Windows NT 4.0. iPrint relies heavily on an RDBMS (SQL Server), and all interaction with the server is done through Perl and ODBC. iPrint uses many CPAN modules, including MIME and Win32::ODBC.
Case 9 – Amazon.com's Editorial Production System
Amazon.com used Perl to develop a CGI-based editorial production system that integrates the entire workflow: writing (in Microsoft Word or Emacs), maintenance (CVS version control, with searching via glimpse), and output (using standard SGML tools).
An author first uses a CGI program to create an SGML document: filling out a short form produces a partially completed SGML document in the user's home directory, which can also be mounted under Microsoft Windows. The author then completes the document in the editor of his or her choice. Through CGI programs, users can view changes to a document ('cvs diff') and preview their SGML documents as HTML before committing them ('cvs commit'). Authors can search the SGML repository by keyword (via glimpse) and track revisions ('cvs log'). Editors also use CGI programs to build schedules.
Amazon.com built a base SGML-distillation class, then subclassed it to render different parts of the site in different modes (HTML with or without graphics, and in the future perhaps PointCast, XML, braille, and so on).
All of the code is written in Perl, using the CGI and HTML::Parser modules.
Case 10 – A Custom Print Server for a New England Hospital
A New England hospital system runs twelve different operating systems, from mainframes to personal computers, with seven different network protocols. There are nearly 12,000 PCs, 2,000 printers of a single standard model, and 1,000 specialized printers. The network spans the metropolitan area over microwave, T1, T3, and fiber links. The job was to implement network printing. The specialized printers, used to print patient registration and account information on each proprietary network, were attached to IBM mainframes over those proprietary networks; the goal was to print these documents on standard printers using standard protocols.
After surveying the available extensible print-service systems, they found that MIT Project Athena's Palladium could serve as a good starting point for development. But Palladium is a standalone print server, which did not meet the requirement: the hospital needed a distributed server. They spent two months trying to port Palladium to the hospital's platforms and modify it, but finally concluded that this was uneconomical. In the end they decided to build their own system, with Perl for the core and Tcl/Tk for the administrative GUI. Palladium is 30,000 lines of source code; their more sophisticated distributed server system took only 5,000 lines of Perl and four person-months to reach its first version. The Perl program ran fast enough on a 60 MHz Pentium machine running UNIX that there was no need to rewrite any of the code in C.
Case 11 – The Purdue University Network Computing Hub
In the future, computing may well be delivered as a network-based service, much like today's electric power and telephone systems. Such a model requires an underlying mechanism for accessing software and hardware resources over the network. To provide this capability, we developed a network-based virtual laboratory ("The Hub"), which lets users access and run software on our servers from a web browser such as Netscape.
The Hub is a web-accessible collection of simulation tools and related information. It is a highly modular system of nearly 12,000 lines of Perl code. It comprises several components: (a) a user interface accessed through the web; (b) access control (security and privacy) and job control (run, abort, and program-status functions); and (c) support for organizing and managing logical (virtual) resources. Through the Hub, users can (a) upload and manipulate input files, (b) run programs, and (c) view and download output files -- all from a web browser. Internally, it is a distributed collection of specialized servers (written in Perl 5) that control local and remote software and hardware resources. A hardware resource can be any hardware platform; the software resources are the programs on that platform. (The current version does not yet support interactive or GUI-based programs.)
The Hub allows tools to be organized by domain and cross-referenced. Resources are added to the system incrementally, using a language designed specifically for describing tool and hardware characteristics. For example, a new machine can be added to the Hub simply by describing its model, run modes, operating system, and so on. Similarly, a new software tool is integrated by "telling" the Hub its location, how to invoke it (its command-line syntax), what machines it can run on (a Sparc5, say), and how it fits into the Hub (as a circuit-simulation program, for example). This can usually be done in half an hour.
To make this work, the Hub resolves URLs differently from a standard document-oriented web server. The structure of a URL is decoupled from the underlying file system and interpreted in a context-sensitive way (based on detailed per-user state kept on the server), providing virtual accounts and flexible access control. Lab engines make their high-performance computing resources available to the Hub on demand. When a user asks to run a program, the lab engine uses the user's input files to decide (through an AI subsystem, also written in Perl) what resources to use, selects an appropriate platform (a workstation for a 2-D problem, a supercomputer for a 3-D one), transfers the relevant input files to that platform, and launches the program through a remote server. When the computation finishes, the remote server notifies the lab engine, which retrieves the output files and delivers them to the user.
The initial prototype, the Semiconductor Simulation Hub, contains thirteen semiconductor technology tools from four universities. In less than a year, more than 250 users have performed over 13,000 simulation runs. Hubs for VLSI design, computer architecture, and parallel computing have been added in recent months, and together they now host about fourteen programs. These Hubs are used in several undergraduate and graduate courses at Purdue, and also support collaborative research; regular users include Purdue students and researchers from Europe and various parts of the United States.

 

---------------------------------------------------------------------------------------------------------------------------------------------------原文:git

   http://www.oreillynet.com/pub/a/oreilly/perl/news/importance_0498.html程序员

      

The Importance of Perl

by Tim O'Reilly, O'Reilly & Associates, Inc. and Ben Smith, Ronin House
web

Despite all the press attention to Java and ActiveX, the real job of "activating the Internet" belongs to Perl, a language that is all but invisible to the world of professional technology analysts but looms large in the mind of anyone -- webmaster, system administrator or programmer -- whose daily work involves building custom web applications or gluing together programs for purposes their designers had not quite foreseen. As Hassan Schroeder, Sun's first webmaster, remarked: "Perl is the duct tape of the Internet."正则表达式

Perl was originally developed by Larry Wall as a scripting language for UNIX, aiming to blend the ease of use of the UNIX shell with the power and flexibility of a system programming language like C. Perl quickly became the language of choice for UNIX system administrators.算法

With the advent of the World Wide Web, Perl usage exploded. The Common Gateway Interface (CGI) provided a simple mechanism for passing data from a web server to another program, and returning the result of that program interaction as a web page. Perl quickly became the dominant language for CGI programming. shell

With the development of a powerful Win32 port, Perl has also made significant inroads as a scripting language for NT, especially in the areas of system administration and web site management and programming. 数据库

For a while, the prevailing wisdom among analysts was that CGI programs--and Perl along with them--would soon be replaced by Java, ActiveX and other new technologies designed specifically for the Internet. Surprisingly, though, Perl has continued to gain ground, with frameworks such as Microsoft's Active Server Pages (ASP) and the Apache web server's mod_perl allowing Perl programs to be run directly from the server, and interfaces such as DBI, the Perl DataBase Interface, providing a stable API for integration of back-end databases. express

This paper explores some of the reasons why Perl will become increasingly important, not just for the web but as a general purpose computer language. These reasons include:

  • fundamental differences in the tasks best performed by scripting languages like Perl versus traditional system programming languages like Java, C++ or C.
  • Perl's ability to "glue together" other programs, or transform the output of one program so it can be used as input to another.
  • Perl's unparalleled ability to process text, using powerful features like regular expressions. This is especially important because of the re-emergence via the web of text files (HTML) as a lingua-franca across all applications and systems.
  • The ability of a distributed development community to keep up with rapidly changing demands, in an organic, evolutionary manner.

A good scripting language is a high-level software development language that allows for quick and easy development of trivial tools while having the process flow and data organization necessary to also develop complex applications. It must be fast while executing. It must be efficient when calling system resources such as file operations, interprocess communications, and process control. A great scripting language runs on every popular operating system, is tuned for information processing (free form text) and yet is excellent at data processing (numbers and raw, binary data). It is embeddable, and extensible. Perl fits all of these criteria.

When and Why a Scripting Language?
As John Ousterhout has elegantly argued in his paper, Scripting: Higher Level Programming for the 21st Century, "Scripting languages such as Perl and Tcl represent a very different style of programming than system programming languages such as C or Java. Scripting languages are designed for 'gluing' applications; they use typeless approaches to achieve a higher level of programming and more rapid application development than system programming languages. Increases in computer speed and changes in the application mix are making scripting languages more and more important for applications of the future."

Ousterhout goes on:

As we near the end of the 20th century a fundamental change is occurring in the way people write computer programs. The change is a transition from system programming languages such as C or C++ to scripting languages such as Perl or Tcl. Although many people are participating in the change, few people realize that it is occurring and even fewer people know why it is happening....

Scripting languages are designed for different tasks than system programming languages, and this leads to fundamental differences in the languages. System programming languages were designed for building data structures and algorithms from scratch, starting from the most primitive computer elements such as words of memory. In contrast, scripting languages are designed for gluing: they assume the existence of a set of powerful components and are intended primarily for connecting components together. System programming languages are strongly typed to help manage complexity, while scripting languages are typeless to simplify connections between components and provide rapid application development.

Scripting languages and system programming languages are complementary, and most major computing platforms since the 1960's have provided both kinds of languages. However, several recent trends, such as faster machines, better scripting languages, the increasing importance of graphical user interfaces and component architectures, and the growth of the Internet, have greatly increased the applicability of scripting languages. These trends will continue over the next decade, with scripting languages used for more and more applications and system programming languages used primarily for creating components.

System administrators were among the first to capitalize on the power of scripting languages. The problems are everywhere, on every operating system. They usually appear as the requirement to automate repetitive tasks. Even Macintosh operating systems need some user definable automation. It might be as simple as an automated backup and recovery system, or as complex as a periodic inventory of all the files on a disk, or all the system configuration changes in the last 24 hours. Many times, there are existing utilities that do part of the work, but automation requires a more general framework for running programs, capturing or transforming their output, and coordinating the work of multiple applications.

Most systems have included some form of scripting language. VMS's DCL, MS-DOS's .BAT files, UNIX's shell scripts, IBM's Rexx, Windows' Visual Basic and Visual Basic for Applications, and Applescript are good examples of scripting languages that are specific to a single operating system. Perl is fairly unique in that it has broken the tight association with a single operating system and become widely used as a scripting language on multiple platforms.

Some scripting languages, most notably Perl and Visual Basic, and to a lesser extent Tcl and Python, have gained wide use as general purpose programming languages. Successful scripting languages distinguish themselves by the ease with which they call and execute operating system utilities and services. To reach the next level, and function as general purpose languages, they must be robust enough that you can build entire complex application programs. The scripting language is used to prototype, model, and test. If the scripting language is robust and fast enough, the prototype evolves directly into the application.

So why not use a general purpose programming language like C, C++ or Java instead of a scripting language? The answer is simple: Cost. Development time is more expensive than fast hardware and memory. Scripting languages are easy to learn, and simple to use.

As Ousterhout points out, scripting languages typically lack data types. They don't distinguish between integer and floating point numbers. Variables are typeless. This is one of the ways that scripting languages speed up development. The concept is to "leave the details for later." Since scripting languages are generally good at calling system utilities to do the dirty work, for instance, copying files and building directories or file folders, the details can be handled by some small utility that, if it doesn't exist and is necessary, will be easy to write in a compiled language.

What do those data types do for compiled languages? They make memory management easier for the system, but harder for the programmer. Think about this: How much did a programmer make an hour when FORTRAN was on the ascendant? How much did memory cost then? How about now? Times have changed. Memory is cheap; programmers are expensive!

System languages need to have everything spelled out. This makes compilation of complex data structures easier, but programming harder. Scripting languages make as many assumptions as they can. As little as possible needs to be spelled out. This makes the scripting language easier to learn and faster to write in. The price to be paid is difficulty in developing complex data structures and algorithms. Perl, however, is good at both complex data structures and algorithms, without sacrificing ease of use for simple applications.

Interpreted vs. Compiled Languages

Most scripting languages are interpreted languages, which contributes to the perception that they may be inappropriate for large scale programming projects. This perception needs to be addressed.

With the exception of language specific hardware, it is true that interpreted programs are slower than compiled languages. The advantage of interpreted languages is that programs written in that language are portable to any system that the interpreter will run on. The system-specific details are handled by the interpreter, not by the application program. (There are always exceptions to this rule. For example, the application program may explicitly use a non-portable system resource.)

Operating system command interpreters such as MS-DOS's command.com and early versions of the UNIX C shell are good examples of how interpreters work: each command line is fed to the interpreter as it occurs in the script. The worst blow to efficiency is in any looping; each line in the loop is reinterpreted every time it is run. Some people think that all scripting languages work like this... slowly, inefficiently, a line at a time. This is not true.

However, there are middle languages, languages that are compiled to some intermediate code which is loaded and run by an interpreter at run time. Java is an example of this model; this is what will make Java a valuable a cross platform application language. All the Java interpreters on different hardware will be able to communicate and share data and process resources. This is perfect for embedded systems, where each device is actually a different kind of special purpose hardware. Java is not a scripting language, however. It requires data declarations. It is compiled ahead of time (unless you count Just-In-Time compilation -- really just code generation -- as part of the process).

Perl is also a middle language. Blocks of perl are compiled as needed, but the executable image is held in memory instead of written to a file. The compilation only happens once for any block of the perl script. The advantages of Perl's design make all this optimization work worth while. Perl maintains the portability of an interpreted language while achieving nearly the speed of a compiled language. Perl, nearly a decade old, with hundreds of thousands of developers, and now in its fifth incarnation, runs lean and fast. There is some amount of startup latency, as the script is initially compiled, but this is typically small relative to the overall performance of the script. In addition, techniques such as "fast CGI", which keeps the image of a frequently accessed CGI script in memory for repetitive re-execution, avoids this startup latency, except on the very first execution of a script.

In any event, Perl 5.005 will include a compiler, created by Malcolm Beattie of Oxford University. The compiler eliminates the startup latency of in-process compilation, and adds some other small speed-ups as well. It also addresses the psychological barrier programmers of commercial applications sometimes experience with respect to interpreted languages. (With a compiled language, the source code is no longer available for inspection by outside parties.)

 

Information Processing versus Data Processing

The World Wide Web is only one instance of a fundamental change in how we interact with computers. This change is visible in the very name we now give the industry. It used to be called "Data Processing," as in "I'll have to submit my job to the data processing center at 4 AM so that I can pick up my output before noon." Now we call it "Information Services" as in "the Director of Information Services is working with our planning committee." The interest and emphasis is now on "information" not "data." It is clear there is more interest in information, which typically includes a mix of text and numeric data, rather than just data. Perl excels at handling information.

An important part of Perl's information-handling power comes from a special syntax called regular expressions. Regular expressions give Perl enormous power to perform actions based on patterns that it recognizes in a body of free form text. Other languages support regular expressions as well (there is even a freeware regular expression library for Java), but no other language integrates them as well as Perl.

For many years, the trend was to embed text in specialized application file formats. Except for UNIX, which explicitly specified ASCII text as a universal file format for exchange between cooperating programs, most systems allowed incompatible formats to proliferate. This trend was reversed sharply by the World Wide Web, whose HTML data format consists of ASCII text with embedded markup tags. Because of the importance of the web, HTML -- and ASCII text with it -- is now center stage as an interchange format, exported by virtually all applications. There are even plans by Microsoft to provide an HTML view of the desktop. A successor to HTML, XML (eXtensible Markup Language) is widely expected to become a standard way of exchanging data in a mixed environment.

The increasing prominence of HTML plays directly to Perl's strengths. It is an ideal language for validating user input in HTML forms, for manipulating the contents of large collections of HTML files, or for extracting and analyzing data from voluminous log files.

That is only one side of the text processing power of Perl. Perl not only gives you several ways to pick data apart, but also several ways to glue data back together. Perl is thus ideal for taking apart an information stream and reconfiguring it. This can be done on the fly as a way of transforming information into input to other programs or for analysis and reporting.

One can argue that the next generation of computer applications will not be traditional software applications but "information applications", in which text forms a large percentage of the user interface. Consider the classic "Intranet" web application: a human resources system through which employees can choose which mutual funds in which to invest their retirement savings, track the performance of their account, and access information that helps them to make better investment decisions. The interface to such a system consists of a series of informational documents (typically presented as HTML), a few simple forms-based CGI scripts, and links to back-end systems (which may be outside services accessed via the Internet) for real-time stock quotes.

To build an application like this using traditional software techniques would be impractical. Each company's mix of available investments is unique; the application would not justify the amount of traditional programming required for such a localized application. Using the web as a front end, and perl scripts as a link to back end databases, you are essentially able to create a custom application in a matter of hours.

Or consider Amazon.com, perhaps the most visibly successful new web business. Amazon provides an information front-end to a back-end database and order-entry system, with, you guessed it, Perl, as a major component tying the two together.

Perl access to databases is supported by a powerful set of database-independent interfaces called DBI. Perl + fast-cgi + DBI is probably the most widely used "database connector" on the web. ODBC modules are also available.

Put together Perl's power to handle text on the front end, and connect to databases on the back end, and you begin to understand why it will play an increasingly important role in the new generation of information applications.

Other applications of Perl's ability to recognize and manipulate text patterns include biomedical research and data mining. Any large text database, from the gene sequences analyzed by the Human Genome Project to the log files collected by any large web site, can be studied and manipulated by Perl. Finally, Perl is increasingly being used for applications such as network-enabled research and specialized Internet search applications. Its strength with regular expressions and facility with sockets, the communications building block of the Internet, have made the language of choice for building Web robots, those programs that search the Internet for information.

 

Perl for Application Development

Developers are increasingly coming to realize Perl's value as an application development language. Perl makes it possible to realistically propose projects that would be unaffordable in the traditional system programming languages. Not only is it fast to build applications with Perl, but they can be very complex, even incorporating the best attributes of object-oriented programming if necessary.

It is easier to build socket-based client-server applications with Perl than with C or C++. It is more efficient to build free-text parsing applications in Perl than in any other language. Perl has a sophisticated debugger (written in Perl) and many options for building secure applications. There are publicly available Perl modules for every sort of application, and these can be dynamically loaded as needed.
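As a rough illustration of how little code a socket-based client-server exchange takes in Perl, this self-contained sketch forks a toy echo server on the loopback interface and talks to it; the one-line protocol is invented for the example:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# A toy echo service on the loopback interface; LocalPort 0 asks the
# kernel for any free port, so the script needs no configuration.
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Listen    => 1,
    Proto     => 'tcp',
) or die "listen: $!";
my $port = $server->sockport;

my $pid = fork;
die "fork: $!" unless defined $pid;

if ($pid == 0) {                 # child: the server side
    my $conn = $server->accept;
    my $line = <$conn>;
    print $conn "echo: $line";   # send the request straight back
    exit 0;
}

# parent: the client side
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $port,
    Proto    => 'tcp',
) or die "connect: $!";
print $client "hello\n";
my $reply = <$client>;
waitpid $pid, 0;
print $reply;                    # "echo: hello"
```

The C equivalent, with its explicit `sockaddr_in` setup and error handling, runs several times longer.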

Perl can be easily extended with compiled functions written in C/C++ or even Java. This means that it is easy to include system services and functions that may not already be native to Perl. This is particularly valuable when working on non-UNIX platforms since the special attributes of that operating system can be included in the Perl language.

Perl can also be called from compiled applications, or embedded into applications written in other languages. Efforts are underway, for instance, to create a standard way to incorporate Perl into Java, such that Java classes could be created with Perl implementations. Currently, such applications must embed the Perl interpreter. A new compiler back-end, to be available in fourth quarter 1997 in O'Reilly & Associates' Perl Resource Kit, will remove this obstacle, allowing some Perl applications to be compiled to Java byte-code.

 

Graphical Interfaces

Because it was originally developed for the UNIX environment, where the ASCII terminal was the primary input/output device (and even windowing systems such as X preserved the terminal model within individual windows), Perl doesn't define a native GUI interface. (But in today's fragmented GUI world this can be construed as a feature.) Instead, there are Perl extension modules for creating applications with graphical interfaces. The most widely used is Tk, which was originally developed as a graphical toolkit for the Tcl scripting language but was soon ported to Perl. Tk is still specific to the X Window System, though it is currently being ported to Microsoft Windows.

However, as noted earlier, the development of native windowing applications is becoming less important as the web becomes the standard GUI for many applications. The "webtop" is fast replacing the "desktop" as the universal cross-platform application target. Write one Web interface and it works on UNIX, Mac, Windows/NT, Windows/95...anything that has a Web browser.

In fact, an increasing number of sites use Perl and the Web to create new easier-to-use interfaces to legacy applications. For example, the Purdue University Network Computing Hub provides a web-based front-end to more than thirty different circuit simulation tools, using Perl to interpret user input into web forms and transform it into command sequences for programs connected to the hub.

 

Multithreading

Threads are a desirable abstraction for concurrent processing, particularly if you are programming for duplex communications or event-driven applications. A multi-threading "patch" to Perl has been available since early 1997; it will be integrated into the standard distribution as of Perl version 5.005, in the fourth quarter.

The multitasking model that Perl has historically supported is "fork" and "wait." The granularity is the process. The flavor is UNIX. Unfortunately, the Windows/NT equivalent isn't quite the same. This is where the portability of Perl breaks down, at least for now. By building cross-platform multi-process Perl applications with a layer of abstraction between the process control and the rest of the application, the problems can be avoided. Furthermore, work is underway, to be completed in the fourth quarter of 1997, to reconcile the process-control code in the UNIX and Win32 ports of Perl.
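The abstraction layer suggested above can be as thin as a single wrapper that hides fork-and-wait from the rest of the application. In this sketch, `run_in_child` is an invented name, and the fork semantics assumed are those of UNIX:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A thin wrapper that isolates process control from application logic.
# Only this routine would need porting when the platform's
# process model differs (as on Windows/NT).
sub run_in_child {
    my ($code) = @_;
    my $pid = fork;
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {        # child: run the work, report via exit status
        my $ok = eval { $code->(); 1 };
        exit($ok ? 0 : 1);
    }
    waitpid $pid, 0;        # parent: block until the child finishes
    return $? >> 8;         # child's exit status
}

my $status_ok  = run_in_child(sub { my $x = 2 + 2 });
my $status_bad = run_in_child(sub { die "boom" });
print "ok=$status_ok bad=$status_bad\n";   # ok=0 bad=1
```

Application code calls `run_in_child` and never touches `fork` or `waitpid` directly, which is exactly the separation the text recommends.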

 

Perl on Win32 Systems

In 1996, Microsoft commissioned ActiveWare Internet Corporation (now ActiveState Tool Corp) to create a port of Perl to Win32 for inclusion in the NT Resource Kit. That port has since become widely available on the net, and reportedly, nearly half of all downloads of the Perl source code are for the Win32 platform.

Perl has taken off on Win32 platforms such as NT for several reasons. Despite the presence of Visual Basic and Visual Basic for Applications, native scripting support on Win32 is relatively weak. While VB is an interpreted scripting language, it is still a typed language, which makes it somewhat more cumbersome to use. It also lacks the advanced string-handling capabilities that are so powerful in Perl. As efforts are underway to create larger-scale NT sites, the limitations of Graphical User Interfaces quickly become evident to administrators; scripting is essential for managing hundreds or thousands of machines.

It is not insignificant that many of the experienced administrators being called on to manage those sites cut their teeth on UNIX. Using Perl is a good way to bring the best of UNIX with you to other platforms.

Nor can you underestimate the drawing power of the web. With thousands of Perl-based CGI programs and site management tools now available, Perl support is essential for any web server platform, and all the more so as NT-based web servers from Microsoft, O'Reilly and Netscape become a more important part of the web. In particular, ActiveState's PerlScript(tm) implementation allows Perl to be used as an active scripting engine on NT web servers, such as Microsoft's IIS and O'Reilly's WebSite, that support the Active Server Pages (ASP) technology.

In addition to the core Perl language interpreter, the ActiveState Perl for Win32(tm) port includes modules specifically targeted to the Win32 environment. For example, it provides full access to Automation objects. As more and more system resources and components support that interface under Windows, more aspects of the operating system become directly accessible to Perl for Win32.

 

Extending the Power of Perl

Unlike languages such as Microsoft's Visual Basic or Sun's Java, Perl does not have a large corporation behind it. Perl was originally developed by Larry Wall and made available as freeware. Larry is assisted in the further development of Perl by a group of about 200 regular contributors who collaborate via a mailing list called perl5-porters. The list was originally focused on porting Perl to additional platforms, but gradually became the center for those adding to the core language.

In addition, Perl 5 includes an extension mechanism, by which independent modules can be dynamically loaded into a Perl program. This has led to the development of hundreds of add-in modules. Many of the most important modules have become part of the standard Perl distribution; additional modules are available via the Comprehensive Perl Archive Network (CPAN). The best entry point to the CPAN is probably the www.perl.com site, which also includes book reviews, articles, and other information of interest to Perl programmers and users.
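The dynamic-loading mechanism mentioned above is visible at the language level: a module chosen at run time can be pulled in with `require`. The sketch below uses the core POSIX module as a stand-in for any CPAN add-in:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Load a module whose name is not known until run time; here the
# core POSIX module stands in for any independently distributed add-in.
my $module = 'POSIX';
(my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
require $file;                            # compiled and loaded only when reached

my $floor = POSIX::floor(3.7);
print "$floor\n";                         # 3
```

Because `require` is an ordinary run-time statement, a program can decide from configuration or user input which extensions to load, paying nothing for the ones it never uses.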

While there has been a historical bias against using freeware for mission critical applications, this bias is crumbling rapidly, as it becomes widely recognized that many of the most significant computing advances of the past few decades have been developed by the freeware community. The Internet itself was largely developed as a collaborative freeware project, and its further development is still guided by a self-organizing group of visionary developers. Similarly, the leading web server platform in terms of market share, by a large margin, is Apache--again, a free software project created, extended and managed by a large collaborative developer community.

In addition to ongoing development, the Perl community provides active support via newsgroups and mailing lists. There are also numerous consultancies and paid support organizations. Excellent documentation is provided by numerous books, including most notably Programming Perl, by Larry Wall, Randal Schwartz and Tom Christiansen. The Perl Journal and www.perl.com provide information about the latest developments.

In short, because of the large developer base and the cooperative history of the freeware community, Perl has access to development and support resources matching those available to the largest corporations.

 

Application Stories

The following section includes a selection of user application stories, ranging from the quick and dirty "Perl saves the day" applications familiar to so many system administrators, to larger custom applications. Some of these application stories are taken from presentations at the first annual Perl Conference, held in San Jose, CA from August 19-21, 1997. The application descriptions from the conference proceedings are labeled with the names of their authors.

Case 1 - The Programming Language that Saved Netscape Technical Support
Dav Amann (dove@netscape.com)

OK, so here's the situation. Your brand-new, exciting Internet company has taken off and you're selling more browsers, servers, and web applications than you ever hoped for; your company is growing by leaps and bounds; and the latest market information says that your customer base has just passed the 30 million mark in less than a year.

And the only downside is that these 30 million folks might have a few problems with their browser; they might not know exactly what the Internet is; they might want to call someone for support. They might want to call *you* for technical support.

So, when this happens, you might think, "That's ok I'll just put some technical articles out on the web." But when you first look at the project, you realize that you're going to need some sort of Content Management System, some sort of Distribution system, some logging analysis, and gathering and reporting of feedback of your customers on your site. And you're going to want it yesterday.

Lucky for you, you know Perl. And with Perl you're able to get all of this built in 3 months in the spare time of 4 very busy technical support engineers.

Case 2 - A Quick and Dirty Conversion at BYTE

BYTE Magazine used to maintain its own information network and conferencing system, BIX, that both editors and readers used for exchanging ideas. The conferencing model was quite different from Usenet, somewhat closer to a mail-list. Since several of the BYTE editors were regular Usenet subscribers and preferred that model, BYTE built a gateway that translated and maintained the BIX editorial discussion groups as a private Usenet news group. The language was Perl. It took little more than a hundred lines of code and a few days of work.
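A gateway of this kind largely boils down to a short text-translation loop. The sketch below is purely illustrative -- the field layout and group names are invented, not BYTE's or BIX's actual formats:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Translate an invented conference-post format into Usenet-style
# article headers; both formats here are illustrative only.
my $post = <<'END';
topic: editorial/layout
author: jdoe
text: The new grid looks good.
END

# Pull "name: value" fields into a hash with one regular expression.
my %field;
for my $line (split /\n/, $post) {
    $field{$1} = $2 if $line =~ /^(\w+):\s*(.*)$/;
}

# Map the conference topic onto a dotted newsgroup name.
(my $group = "bix.$field{topic}") =~ s{/}{.}g;

my $article = join "\n",
    "From: $field{author}",
    "Newsgroups: $group",
    "",
    $field{text};
print "$article\n";
```

A real gateway would add message IDs, threading, and the reverse translation, but the core of the job is exactly this kind of parse-and-reassemble, which is why a hundred lines sufficed.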

Case 3 - Routing customer inquiries to appropriate experts

The performance testing group at one of the world's leading computer companies needed to automate query routing. They were directed to use the world-wide corporate Intranet but were given no budget for the project. Two engineers with only a few weeks of Perl experience created a solution. The Perl scripts responded to each query by matching its key elements against the people with the relevant expertise. The CGI programs not only pointed the client to the experts' Web pages and E-mail addresses, but also passed the query on to all the appropriate experts by E-mail. The solution took no more than a few man-weeks and so could be absorbed into other budgets.

Case 4 - Collection and analysis of email survey data

An Internet market research firm that does its research through E-mail surveys wanted to automate and generalize the handling of an anticipated ten thousand responses. Perl was used to automate the process. The Perl script generated input for SPSS, but would have been capable of doing the statistical analysis itself had the statistician known Perl.

Case 5 - A Cross-Platform Harness for Running Benchmarks

SPEC (the Standard Performance Evaluation Corporation), an industry consortium for benchmarking computer systems, radically changed its governing program when the SPEC92 benchmarks evolved into SPEC95. SPEC wanted to make it possible for the benchmarks to run on operating systems other than UNIX without a major effort. The SPEC92 benchmarks were managed by UNIX shell scripts, which were unportable and inflexible. The SPEC95 benchmarks are managed by a portable, extensible engine written in Perl. The scripts take advantage of Perl's object-oriented capabilities, Perl's extensibility with C, and Perl's dynamic module loading. Porting SPEC95 to Windows/NT was simple. The major problem with porting to VMS is its lack of user-level forks.

Case 6 - Consultant working with Perl

Despite the years that I have spent developing in C, I have found little reason to continue to do so. Most of my work in the last ten years has been developing code that retrieves, manages, and converts information, not just data. The application programs I am involved in are merely graphical controls front-ending information retrieval, management, and conversion engines. Perl now fills the need for this kind of development better than any other language--scripting or system programming language. Even though I started using Perl merely as a glue scripting language and prototyping language, I now use it for everything. It has replaced both C and my UNIX shell programs. There will be times, I am sure, that I will have to write, or at least patch, a program in C. I expect that Java will eventually fill those requirements for me.

Cross-platform GUI interfaces are now done in HTML and run locally, in an Intranet, or as part of the Web.

Perl provides me with fast indexing to simple data structures and modules for talking to commercial databases. It provides me with system level tools for process management, file management, and interprocess communications wherever sockets are understood. It allows me to design my applications using libraries, modules, packages, and subroutines. It allows me to write applications that modify themselves; scary as that may seem, it is sometimes necessary.
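The "fast indexing to simple data structures" mentioned above usually means nothing more exotic than a hash. A minimal sketch, with invented data, of indexing records by one of their fields:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Index a list of records by a key field using a hash of arrays.
# The records and field names are invented for illustration.
my @records = (
    { host => 'www1', status => 'up'   },
    { host => 'www2', status => 'down' },
    { host => 'www3', status => 'up'   },
);

my %by_status;
push @{ $by_status{ $_->{status} } }, $_->{host} for @records;

# Constant-time lookup by status, no matter how many records there are.
print "up: @{ $by_status{up} }\n";     # up: www1 www3
print "down: @{ $by_status{down} }\n"; # down: www2
```

Building and querying such indexes takes one line each, which is much of what makes Perl feel so productive for information management.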

The greatest benefit of Perl to me is that I can build solutions to complex problems in a fifth of the time. This appeals to managers and clients, but particularly to the people paying the bills.

Case 7 - Perl as a Rapid-Prototyping Language for Flight Data Analysis
Phil Brown, Mitre Corporation Center for Advanced Aviation System Development (CAASD) (philsie@crete.mitre.org)

Because of its robustness and flexibility, Perl has become the language of choice for many programmers in CAASD who develop rapid prototypes of the concepts being explored. The Traffic Flow Management Lab (T-Lab) has implemented hundreds of Perl programs, ranging from simple data parsing and plot generation to measuring the complexity of regions of airspace and calculating the transit times of aircraft over those regions. These applications range in size from about 10 lines to over 1200. Because many of them are very I/O intensive, Perl, with its many parsing and searching features, was the natural choice.

Case 8 - Online Specialty Printing
Dave Hodson (dave@iprint.com)

The iPrint Discount Printing & CyberStationery Shop (http://www.iPrint.com) is powered by a WYSIWYG desktop-publishing application on the Internet, connected directly to a back-end printer and sitting on top of a sophisticated, real-time, multi-attribute product and pricing database. Customers come to our site to create, proof, and order customized printed items online--business cards, stationery, labels, stamps, specialty advertising items, and so on.

The iPrint system includes both a front end (the website) and a back-end process that eliminates nearly all of the manual pre-flight work that printers perform, and it also provides all pertinent information to iPrint's accounting system. 95% of the approximately 80,000 lines of code that perform this work are written in Perl 5.003 on the Windows NT 4.0 operating system. iPrint relies heavily on an RDBMS (SQL Server), with all database interaction performed by Perl and ODBC. iPrint uses many modules from the CPAN archives, including MIME and Win32::ODBC.

Case 9 - The Amazon.com Editorial Production System
Chris Mealy (mookie@amazon.com)

Amazon.com used Perl to develop a CGI-based editorial production system that integrates authoring (with Microsoft Word or Emacs), maintenance (version control with CVS and searching with glimpse), and output (with in-house SGML tools).

Writers use the CGI application to start an SGML document. They fill out a short form, which generates a partially completed SGML document in the user's home directory; that directory may be mounted on their Microsoft Windows PC. The writer then uses their favorite editor to finish the document. With the CGI application, users see changes ('cvs diff') and their SGML rendered as HTML before submitting their document ('cvs commit'). Writers can do keyword searches of the SGML repository (by way of glimpse) and track changes ('cvs log'). Editors can also schedule content with the CGI application.

Amazon.com created a base SGML renderer class that is sub-classed to render different sections of the web site in different modes (html with graphics and html without graphics, and in the future, PointCast, XML, braille, etc).
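The base-class-plus-subclass pattern described above can be sketched in a few lines of Perl 5 OO code. The class and method names below are invented for illustration, not Amazon.com's actual code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A generic renderer providing default behavior.
package Renderer;
sub new     { my ($class) = @_; return bless {}, $class }
sub heading { my ($self, $text) = @_; return $text }            # plain-text default

# A subclass that overrides one method to render for a different mode.
package Renderer::HTML;
our @ISA = ('Renderer');                                        # inherit from Renderer
sub heading { my ($self, $text) = @_; return "<h1>$text</h1>" }

package main;
my $plain = Renderer->new;
my $html  = Renderer::HTML->new;
print $plain->heading('New Titles'), "\n";   # New Titles
print $html->heading('New Titles'), "\n";    # <h1>New Titles</h1>
```

Each output mode (graphics, no-graphics, and later PointCast or XML) becomes one small subclass that overrides only the methods whose rendering differs.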

All of the code is in Perl. It uses the CGI and HTML::Parser modules.

Case 10 - Specialty Print Servers at a New England Hospital

A major New England hospital uses twelve operating systems, from mainframes to desktop PCs, and seven different network protocols. There are roughly twenty thousand PC workstations, two thousand printers of one type, and one thousand specialty printers. The network is spread over an entire city using microwave, T1, T3, and private optical fiber. The problem is network printing. Specialty printers are required because the patient registration and billing system runs on IBM and Digital mainframes, with the output going through their proprietary networks. The goal is to have all of the operating systems able to print to a standard printer through a standard protocol.

A search for appropriate scalable print servers uncovered MIT Project Athena's Palladium as a good starting point. However, its model of standalone print servers didn't fit; the hospital needed a distributed server model. When a two-month effort to port Palladium to the hospital's platform, so that the necessary changes could be made, proved uneconomical, the team decided to build exactly what it wanted in fast prototyping languages: Perl for the core application and Tcl/Tk for the GUI administrative interface. Palladium represents 30,000 lines of C. The more complex distributed server model required only 5,000 lines of Perl and only four man-months to reach a first release. The Perl code proved sufficiently fast on a 60MHz Pentium running a UNIX variant that none of it required rewriting in C.

Case 11 - The Purdue University Network-Computing Hub
(Nirav H. Kapadia, Mark S. Lundstrom, Jose' A. B. Fortes)

In the future, computing may operate on a network-based and service-oriented model much like today's electricity and telecommunications infrastructures. This vision requires an underlying infrastructure capable of accessing and using network-accessible software and hardware resources as and when required. To address this need, we have developed a network-based virtual laboratory ("The Hub") that allows users to access and run existing software tools via standard world-wide web (WWW) browsers such as Netscape.

The Hub, a WWW-accessible collection of simulation tools and related information, is a highly modular software system that consists of approximately 12,000 lines of Perl5 code. It has been designed to: a) have a universally-accessible user-interface (via WWW browsers), b) provide access-control (security and privacy) and job-control (run, abort, and program status functions), and c) support logical (virtual) resource-organization and management. The Hub allows users to: a) upload and manipulate input-files, b) run programs, and c) view and download output - all via standard WWW browsers. The infrastructure is a distributed entity that consists of a set of specialized servers (written in Perl5) which access and control local and remote hardware and software resources. Hardware resources include arbitrary platforms, and software resources include any program (the current implementation does not support interactive and GUI-based programs).

The Hub allows tools to be organized and cross-referenced according to their domain. Resources can be added incrementally using a resource-description language specifically designed to facilitate the specification of tool and machine characteristics. For example, a new machine can be incorporated into the Hub simply by specifying its architecture (make, model, operating system, etc.) and starting a server on the machine. Similarly, a new tool can be added by "telling" the Hub the tool's location, its input behavior (e.g., command-line arguments), what kinds of machines it can run on (e.g., Sparc5), and how it fits into the logical organization of the Hub (e.g.,circuit simulation tool). Each of these tasks is typically accomplished in less than thirty minutes.

To facilitate this functionality, the Hub interprets the URLs differently from the standard document-oriented web servers. The structure of the URL is decoupled from that of the underlying filesystem and interpreted in a context-sensitive manner (based on user-specific state stored by the server), thus allowing virtual accounting and arbitrary access-control. The lab-engine provides the Hub with its on-demand high-performance computing capabilities. When a user requests the execution of a program, the lab-engine uses information in the user-specified input file to predict (via an artificial intelligence sub-system - also written in Perl5) the resources required for the run, selects an appropriate platform (e.g., workstation for a 2-D problem, supercomputer for a 3-D problem), transfers relevant input files to the selected machine, and initiates the program (via the remote server). When the run is completed, the remote server notifies the lab-engine, which retrieves the output files and informs the user.

The initial prototype, the Semiconductor Simulation Hub, currently contains thirteen semiconductor technology tools from four universities. In less than one year, over 250 users have performed more than 13,000 simulations. New Hubs for VLSI design, computer architectures, and parallel programming have been added in recent months; they currently contain a modest complement of fourteen tools. These Hubs are currently being used in several undergraduate and graduate courses at Purdue as well as to facilitate collaborative research. Regular users include students at Purdue University and researchers at several locations in the U.S. and Europe.