Electronic Data Interchange (EDI) refers to the structured transmission of data between organizations by electronic means. It is used to transfer electronic documents from one computer system to another, i.e. from one trading partner to another. It is more than mere e-mail; for instance, organizations might replace bills of lading and even checks with appropriate EDI messages. It also refers specifically to a family of standards, including the X12 series. However, EDI also exhibits its pre-Internet roots, and the standards tend to focus on ASCII (American Standard Code for Information Interchange)-formatted single messages rather than the whole sequence of conditions and exchanges that make up an inter-organization business process.
In 1992, a survey of Canadian businesses found at least 140 that had adopted some form of EDI, but that many (in the sample) "[had] not benefited from implementing EDI, and that they [had] in fact been disadvantaged by it." [1]
The National Institute of Standards and Technology in a 1996 publication [1] defines Electronic Data Interchange as "the computer-to-computer interchange of strictly formatted messages that represent documents other than monetary instruments. EDI implies a sequence of messages between two parties, either of whom may serve as originator or recipient. The formatted data representing the documents may be transmitted from originator to recipient via telecommunications or physically transported on electronic storage media." It goes on to say that "In EDI, the usual processing of received messages is by computer only. Human intervention in the processing of a received message is typically intended only for error conditions, for quality review, and for special situations. For example, the transmission of binary or textual data is not EDI as defined here unless the data are treated as one or more data elements of an EDI message and are not normally intended for human interpretation as part of online data processing." [2]
EDI can be formally defined as 'The transfer of structured data, by agreed message standards, from one computer system to another without human intervention'. Most other definitions used are variations on this theme. Even in this era of technologies such as XML web services, the Internet and the World Wide Web, EDI is still the data format used by the vast majority of electronic commerce transactions in the world.
Standards
Generally speaking, EDI is considered to be a technical representation of a business conversation between two entities, either internal or external. Note that there is a perception that "EDI" constitutes the entire electronic data interchange paradigm, including the transmission, message flow, document format, and software used to interpret the documents; strictly, however, EDI describes only the rigorously standardized format of the electronic documents.
The EDI standards were designed to be independent of communication and software technologies. EDI can be transmitted using any methodology agreed to by the sender and recipient, including a variety of technologies such as modem (asynchronous and bisynchronous), FTP, e-mail, HTTP, AS1, AS2, etc. It is important to differentiate between the EDI documents and the methods for transmitting them. When comparing the bisynchronous 2400 bit/s modems, CLEO devices, and value-added networks used to transmit EDI documents with transmission via the Internet, some people equated the non-Internet technologies with EDI and predicted erroneously that EDI itself would be replaced along with them. Those non-Internet transmission methods are being replaced by Internet protocols such as FTP, telnet, and e-mail, but the EDI documents themselves remain.
As more trading partners use the Internet for transmission, standards have emerged. In 2002, the IETF published RFC 3335, offering a standardized, secure method of transferring EDI data via e-mail. On July 12, 2005, an IETF working group ratified RFC 4130 for MIME-based HTTP EDIINT (a.k.a. AS2) transfers, and it is preparing a similar RFC for FTP transfers (a.k.a. AS3). While some EDI transmission has moved to these newer protocols, the providers of the value-added networks remain active.
EDI documents generally contain the same information that would normally be found in a paper document used for the same organizational function. For example, an EDI 940 ship-from-warehouse order is used by a manufacturer to tell a warehouse to ship product to a retailer. It typically has a ship-to address, a bill-to address, and a list of product numbers (usually UPC codes) and quantities. It may have other information if the parties agree to include it. However, EDI is not confined to business data related to trade but encompasses all fields such as medicine (e.g., patient records and laboratory results), transport (e.g., container and modal information), engineering and construction, etc. In some cases, EDI will be used to create a new business information flow (one that was not a paper flow before). This is the case with the Advanced Shipment Notification (856), which was designed to inform the receiver of a shipment of the goods to be received and how they are packaged.
There are four major sets of EDI standards:
The UN-recommended UN/EDIFACT is the only international standard and is predominant outside of North America.
The US standard ANSI ASC X12 (X12) is predominant in North America.
The TRADACOMS standard developed by the ANA (Article Numbering Association) is predominant in the UK retail industry.
The ODETTE standard used within the European automotive industry.
All of these standards first appeared in the early to mid 1980s. The standards prescribe the formats, character sets, and data elements used in the exchange of business documents and forms. The complete X12 Document List includes all major business documents, including purchase orders (called "ORDERS" in UN/EDIFACT and an "850" in X12) and invoices (called "INVOIC" in UN/EDIFACT and an "810" in X12).
The EDI standard says which pieces of information are mandatory for a particular document, which pieces are optional, and gives the rules for the structure of the document. The standards are like building codes. Just as two kitchens can be built "to code" but look completely different, two EDI documents can follow the same standard and contain different sets of information. For example, a food company may indicate a product's expiration date while a clothing manufacturer would choose to send color and size information.
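To make the segment-and-element structure these standards prescribe more concrete, here is a minimal Python sketch of splitting a simplified X12-style purchase order into segments and elements. The sample 850 fragment and the use of "*" and "~" as separators are illustrative assumptions; real interchanges declare their delimiters in the ISA envelope and follow the partners' implementation guideline.

```python
# A minimal sketch of how an X12-style document is split into segments and
# elements. The separators ("*" for elements, "~" for segments) are common
# defaults; in a real interchange they are declared in the ISA envelope.
SAMPLE_850 = (
    "ST*850*0001~"
    "BEG*00*SA*PO123456**20240101~"
    "PO1*1*48*CA*12.50**UP*012345678905~"   # illustrative line item
    "SE*4*0001~"
)

def parse_segments(document, element_sep="*", segment_term="~"):
    """Return a list of (segment_id, [elements]) tuples."""
    segments = []
    for raw in document.split(segment_term):
        raw = raw.strip()
        if not raw:
            continue
        elements = raw.split(element_sep)
        segments.append((elements[0], elements[1:]))
    return segments

for seg_id, elements in parse_segments(SAMPLE_850):
    print(seg_id, elements)
```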
Standards are generally updated each year.
Specifications
Organizations that send or receive documents between each other are referred to as "trading partners" in EDI terminology. The trading partners agree on the specific information to be transmitted and how it should be used. This is done in human-readable specifications (also called Message Implementation Guidelines). While the standards are analogous to building codes, the specifications are analogous to blueprints. (The specification may also be called a mapping, but the term mapping is typically reserved for specific machine-readable instructions given to the translation software.) Larger trading "hubs" have existing Message Implementation Guidelines which mirror their business processes for processing EDI, and they are usually unwilling to modify their EDI business practices to meet the needs of their trading partners. Often in a large company these EDI guidelines will be written to be generic enough to be used by different branches or divisions and will therefore contain information not needed for a particular business document exchange. Other large companies may create separate EDI guidelines for each branch or division.
Transmission
Trading partners are free to use any method for the transmission of documents. In the past, one of the more popular methods was to use a bisync modem to communicate through a value-added network (VAN). Some organizations have used direct modem-to-modem connections and bulletin board systems (BBS), and recently there has been a move towards using some of the many Internet protocols for transmission, but most EDI is still transmitted using a VAN. In the healthcare industry, a VAN is referred to as a "clearinghouse".
Value Added Networks
In its most basic form, a VAN (value-added network) acts as a regional post office. It receives transactions, examines the 'from' and the 'to' information, and routes the transaction to the final recipient. VANs provide a number of additional services, e.g. retransmitting documents, providing third-party audit information, acting as a gateway for different transmission methods, and handling telecommunications support. Because of these and other services VANs provide, businesses frequently use a VAN even when both trading partners are using Internet-based protocols. Healthcare clearinghouses perform many of the same functions as a VAN, but have additional legal restrictions that govern protected healthcare information.
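As an illustration of this "post office" role, the following minimal Python sketch routes submitted documents into per-partner mailboxes and records an audit entry for each one. The SimpleVAN class and the partner identifiers are hypothetical; a real VAN adds far more, such as retransmission and protocol gateways.

```python
# A minimal sketch of the "post office" role of a VAN: examine the sender and
# receiver identifiers on each transaction, queue it in the recipient's
# mailbox, and keep an audit record. All names here are hypothetical.
from collections import defaultdict
from datetime import datetime, timezone

class SimpleVAN:
    def __init__(self):
        self.mailboxes = defaultdict(list)   # receiver id -> queued documents
        self.audit_log = []                  # who sent what to whom, and when

    def submit(self, sender_id, receiver_id, document):
        self.mailboxes[receiver_id].append((sender_id, document))
        self.audit_log.append({
            "from": sender_id,
            "to": receiver_id,
            "received_at": datetime.now(timezone.utc).isoformat(),
        })

    def collect(self, receiver_id):
        """Deliver and clear everything queued for a trading partner."""
        documents, self.mailboxes[receiver_id] = self.mailboxes[receiver_id], []
        return documents

van = SimpleVAN()
van.submit("RETAILER01", "SUPPLIER09", "ST*850*0001~...")
print(van.collect("SUPPLIER09"))
```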
VANs also provide an advantage with certificate replacement in AS2 transmissions. Because each endpoint in an AS2 exchange typically requires its own security certificate, routing a large number of partners through a VAN can make certificate replacement much easier.
Internet/AS2
Until recently, Internet transmission was handled by nonstandard methods agreed between trading partners, usually involving FTP or e-mail attachments. There are also standards for embedding EDI documents into XML. Many organizations are migrating to AS2 to reduce costs; for example, Wal-Mart now requires its trading partners to use the AS2 protocol (Wal-Mart EDI Requirement).
AS2 (Applicability Statement 2) is the draft specification standard by which vendor applications communicate EDI or other business-to-business data (such as XML) over the Internet using HTTP, a standard used by the World Wide Web. AS2 provides security for the transport payload through digital signatures and data encryption, and ensures reliable, non-repudiable delivery through the use of receipts.
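The sketch below shows roughly what an AS2-style transfer looks like at the HTTP level, using headers defined for AS2 (RFC 4130) and the Python requests library. It deliberately omits the S/MIME signing and encryption, and the verification of the returned MDN receipt, that give AS2 its security and non-repudiation; the URL and partner identifiers are hypothetical.

```python
# A minimal sketch of posting an EDI payload over HTTP with AS2-style headers.
# Real AS2 wraps the payload in signed/encrypted S/MIME and processes the
# returned MDN receipt; those steps are omitted here.
import uuid
import requests

def send_as2(payload: bytes, url="https://edi.example.com/as2"):
    headers = {
        "AS2-Version": "1.2",
        "AS2-From": "MYCOMPANY",                 # hypothetical AS2 identifiers
        "AS2-To": "PARTNER",
        "Message-ID": f"<{uuid.uuid4()}@mycompany.example>",
        "Content-Type": "application/edi-x12",
        "Disposition-Notification-To": "edi-ops@mycompany.example",
    }
    response = requests.post(url, data=payload, headers=headers, timeout=30)
    response.raise_for_status()
    return response   # a signed MDN would normally be verified at this point

# Example usage (hypothetical file name):
# send_as2(open("po_850.edi", "rb").read())
```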
EDI via the Internet (Web EDI)
As with VAN providers, EDI over the Internet relies on communications protocols that ensure documents are transmitted securely. The most popular protocols are File Transfer Protocol Secure (FTPS), Hypertext Transfer Protocol Secure (HTTPS), and AS2.
The Internet has provided a means for any company, no matter how small or where it is located in the world, to become part of a major supply chain initiative hosted by a global retailer or manufacturing company. Many companies around the world have shifted production of labour-intensive parts to low-cost, emerging regions such as China and Eastern Europe. Web-based EDI, or webEDI, allows a company to interact with its suppliers in these regions without having to implement a complex EDI infrastructure.
In its simplest form, webEDI enables small and medium-sized businesses to receive, turn around, create and manage electronic documents using just a web browser. The service transforms the data entered into EDI format and transmits it to the trading partner. Simple pre-populated forms enable businesses to communicate and comply with their trading partners' requirements using built-in business rules. Using a web-based interface, EDI transactions can be received, edited and sent as easily as e-mail, and EDI invoices and shipping documents can be exchanged with no software to install; all that is required is an Internet connection. WebEDI has the added advantages of being accessible anywhere in the world and of not requiring a dedicated IT person to manage software installation.
Even though VANs offer a very secure and reliable service to companies wishing to trade electronically, the Internet is making EDI more available to all. This is especially important in emerging markets where IT awareness and infrastructure are very limited. WebEDI is traditionally based around the "hub and spoke" model, with major trading partners or Application Service Providers (ASPs) being the hubs and smaller partners being the spokes.
Hubs or ASPs implement EDI using email or virtual mailboxes
Trading partners can send EDI messages directly to a web-enabled EDI messaging site, via the hub. EDI messages are simply sent using a web browser
Systems that are currently being developed will enable EDI messages to be displayed in a web browser and directed via open standard XML, directly into the user's accounts system
WebEDI-based users can interact with VANs without incurring the costs of setting up a dedicated VAN connection
EDI Outsourcing
Operating an EDI program is becoming increasingly complex as companies face a wider range of requirements from trading partners. Many companies have chosen to outsource their EDI needs to an integration service provider that offers the people, processes, and technology to operate a robust EDI program.
People - Skilled people with both technical and business expertise who can support and deliver a B2B program that meets current and future business objectives
Processes - Best-practice processes for implementing or extending the use of B2B e-commerce in an organization, managing a B2B program on an ongoing basis, and quickly and easily bringing new trading partners onto a B2B network
Technology - The comprehensive infrastructure needed to exchange EDI transactions with partners, translate business documents between any of the many EDI e-commerce standards now in use, and provide reporting and visibility into B2B processes and networks
Interpreting data
Often missing from the EDI specifications (referred to as EDI Implementation Guidelines) are real world descriptions of how the information should be interpreted by the business receiving it. For example, suppose candy is packaged in a large box that contains 5 display boxes and each display box contains 24 boxes of candy packaged for the consumer. If an EDI document says to ship 10 boxes of candy it may not be clear whether to ship 10 consumer packaged boxes, 240 consumer packaged boxes or 1200 consumer packaged boxes. It is not enough for two parties to agree to use a particular qualifier indicating case, pack, box or each; they must also agree on what that particular qualifier means.
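A small sketch of the agreement the candy example requires: both partners must map each unit-of-measure qualifier to a number of consumer-packaged boxes. The qualifiers shown are common X12 codes, but the multipliers are assumptions that would come from the partners' agreement for this particular product.

```python
# Both parties must agree on what each unit-of-measure qualifier means for
# this product. Multipliers below follow the candy example in the text.
CONSUMER_BOXES_PER_UNIT = {
    "EA": 1,     # each: one consumer-packaged box
    "BX": 24,    # box: one display box holding 24 consumer boxes
    "CA": 120,   # case: 5 display boxes of 24 consumer boxes
}

def consumer_boxes(quantity, uom_qualifier):
    return quantity * CONSUMER_BOXES_PER_UNIT[uom_qualifier]

print(consumer_boxes(10, "EA"))   # 10
print(consumer_boxes(10, "BX"))   # 240
print(consumer_boxes(10, "CA"))   # 1200
```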
EDI translation software provides the interface between internal systems and the EDI format sent and received. For an "inbound" document the EDI solution will receive the file (either via a value-added network or directly using protocols such as FTP or AS2), take the received EDI file (commonly referred to as a "mailbag"), and validate that the trading partner sending the file is a valid trading partner, that the structure of the file meets the EDI standards, and that the individual fields of information conform to the agreed-upon standards. Typically the translator will either create a file in fixed-length, variable-length or XML-tagged format, or "print" the received EDI document (for non-integrated EDI environments). The next step is to convert/transform the file that the translator creates into a format that can be imported into the company's back-end business systems or ERP. This can be accomplished with a custom program, an integrated proprietary "mapper", or an integrated standards-based graphical "mapper" using a standard data transformation language such as XSLT. The final step is to import the transformed file (or database records) into the company's back-end enterprise resource planning (ERP) system.
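The following is a minimal sketch of those inbound steps, with the partner list, file layout and output path invented for illustration: it checks the sender against known trading partners, performs a very light structural check, and writes an XML file that a back-end import job could pick up. A production translator would validate against the full standard and the partner's implementation guideline.

```python
# A minimal inbound sketch: confirm the sender is a known trading partner,
# do a light structural check, then emit XML for a hypothetical back-end
# import job. Partner IDs and file names are illustrative.
import xml.etree.ElementTree as ET

KNOWN_PARTNERS = {"RETAILER01", "RETAILER02"}   # hypothetical partner IDs

def translate_inbound(edi_text, sender_id, out_path="inbound_order.xml"):
    if sender_id not in KNOWN_PARTNERS:
        raise ValueError(f"unknown trading partner: {sender_id}")

    segments = [s.split("*") for s in edi_text.split("~") if s.strip()]
    if not segments or segments[0][0] != "ST":
        raise ValueError("document does not start with an ST segment")

    root = ET.Element("Document", sender=sender_id)
    for elements in segments:
        seg = ET.SubElement(root, elements[0])
        for i, value in enumerate(elements[1:], start=1):
            ET.SubElement(seg, f"E{i:02d}").text = value
    ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)
    return out_path
```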
For an "outbound" document the process for integrated EDI is to export a file (or read a database) from a company's back-end ERP, transform the file to the appropriate format for the translator. The translation software will then "validate" the EDI file sent to ensure that it meets the standard agreed upon by the trading partners, convert the file into "EDI" format (adding in the appropriate identifiers and control structures) and send the file to the trading partner (using the appropriate communications protocol).
Another critical component of any EDI translation software is a complete "audit" of all the steps used to move business documents between trading partners. The audit ensures that any transaction (which in reality is a business document) can be tracked so that it is not lost. In the case of a retailer sending a purchase order to a supplier, if the purchase order is "lost" anywhere in the business process the effect is devastating to both businesses: the supplier does not fulfill the order, because it never received it, losing business and damaging the relationship with its retail client; the retailer has a stock outage, with lost sales, reduced customer service and ultimately lower profits.
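A small sketch of such an audit trail: every document accumulates a status history, so a purchase order that never reaches its final status can be traced to the step where it stalled. The class, statuses and document identifiers are hypothetical.

```python
# Each document gets a status history, so a "lost" document can be traced
# to the step where it stalled. Statuses and identifiers are hypothetical.
from datetime import datetime, timezone

class DocumentAudit:
    def __init__(self):
        self.history = {}   # document id -> list of (status, timestamp)

    def record(self, doc_id, status):
        self.history.setdefault(doc_id, []).append(
            (status, datetime.now(timezone.utc))
        )

    def stalled(self, terminal_status="acknowledged"):
        """Documents that never reached the terminal status."""
        return [doc_id for doc_id, events in self.history.items()
                if events[-1][0] != terminal_status]

audit = DocumentAudit()
audit.record("PO123456", "received")
audit.record("PO123456", "translated")
print(audit.stalled())   # ['PO123456'] until an acknowledgement is recorded
```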
In EDI terminology "inbound" and "outbound" refer to the direction of transmission of an EDI document in relation to a particular system, not the direction of merchandise, money or other things represented by the document. For example, an EDI document that tells a warehouse to perform an outbound shipment is an inbound document in relation to the warehouse computer system. It is an outbound document in relation to the manufacturer or dealer that transmitted the document.
Advantages of using EDI over paper systems
EDI and other similar technologies save a company money by providing an alternative to, or replacing, information flows that require a great deal of human interaction and materials such as paper documents, meetings, faxes, etc. Even when paper documents are maintained in parallel with EDI exchange, e.g. printed shipping manifests, electronic exchange and the use of data from that exchange reduce the handling costs of sorting, distributing, organizing, and searching paper documents. EDI and similar technologies allow a company to take advantage of the benefits of storing and manipulating data electronically without the cost of manual entry. Another advantage of EDI is reduced errors, such as shipping and billing errors, because EDI eliminates the need to rekey documents on the destination side. One very important advantage of EDI over paper documents is the speed at which the trading partner receives and incorporates the information into its system, greatly reducing cycle times. For this reason, EDI can be an important component of just-in-time production systems.
According to the 2008 Aberdeen report "A Comparison of Supplier Enablement around the World", only 34% of purchase orders are transmitted electronically in North America. In EMEA, 36% of orders are transmitted electronically and in APAC, 41% of orders are transmitted electronically. The report also states that the average paper requisition-to-order costs a company $37.45 in North America, $42.90 in EMEA and $23.90 in APAC. With EDI, requisition-to-order costs are reduced to $23.83 in North America, $34.05 in EMEA and $14.78 in APAC.
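Taken at face value, those figures imply a saving of roughly $13.62 per order in North America ($37.45 - $23.83), $8.85 in EMEA and $9.12 in APAC.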
Barriers to implementation
There are a few barriers to adopting electronic data interchange. One of the most significant barriers is the accompanying business process change. Existing business processes built around slow paper handling may not be suited to EDI and would require changes to accommodate automated processing of business documents. For example, a business may receive the bulk of its goods by one- or two-day shipping and all of its invoices by mail. The existing process may therefore assume that goods are typically received before the invoice. With EDI, the invoice will typically be sent when the goods ship and will therefore require a process that handles large numbers of invoices whose corresponding goods have not yet been received.
Another significant barrier is the cost in time and money of the initial set-up. The preliminary expenses and time that arise from implementation, customization and training can be costly and may therefore discourage some businesses. The key is to determine which method of integration is right for the company, since that choice determines the cost of implementation. For a business that only receives one P.O. per year from a client, fully integrated EDI may not make economic sense. In this case, businesses may implement inexpensive "rip and read" solutions or use outsourced EDI solutions provided by EDI "Service Bureaus". For other businesses, the implementation of an integrated EDI solution may be necessary as increases in trading volumes brought on by EDI force them to re-implement their order processing business processes.
The key hindrance to a successful implementation of EDI is the perception many businesses have of the nature of EDI. Many view EDI from the technical perspective that EDI is a data format; it would be more accurate to take the business view that EDI is a system for exchanging business documents with external entities, and integrating the data from those documents into the company's internal systems. Successful implementations of EDI take into account the effect externally generated information will have on their internal systems and validate the business information received. For example, allowing a supplier to update a retailer's Accounts Payables system without appropriate checks and balances would be a recipe for disaster. Businesses new to the implementation of EDI should take pains to avoid such pitfalls.
Increased efficiency and cost savings drive the adoption of EDI for most trading partners. But even if a company would not choose to use EDI on its own, pressure from larger trading partners (called hubs) often forces smaller trading partners to use EDI. An example of this is Wal-Mart's insistence on using EDI with all of its trading partners; any partner not willing to use EDI with Wal-Mart will not be able to do business with the company.
Examples of Disadvantages of EDI
United States Health Care Systems
The United States health care system consists of thousands of different companies and other entities. In 1996, the Health Insurance Portability and Accountability Act (HIPAA) was enacted. In short, it set down standard transaction sets for specific EDI transactions and mandated electronic support for every insurance company in the United States for these transactions. While the benefits of EDI are numerous and only increase with increased volume, the drawbacks, though not directly related to EDI itself, include managerial problems in the support, maintenance and implementation of EDI transactions.
Though an EDI standard exists for health care transactions, the standard allows for variation between implementations, which has given rise to Companion Guides detailing each company's variations.[3]
Each entity may have a different method of delivery, ranging from dial-up BBS systems[4] to mailing hard media such as a CD-ROM or tape backup, or FTP[5]. Some entities may elect not to support certain methods of delivery depending on a trading partner's expected volume.
Due to varying implementations of nearly all points of EDI, including contact, registration, submission and testing of transactions between different entities in US health care, EDI clearinghouses have sprung up. An EDI clearinghouse is an entity that agrees to act as a middleman between multiple entities and their end-clients, such as between medical providers and the insurance companies they accept coverage from. They may act as a value-added network and attempt to conform their different supported entities to one submission standard. One such example is Emdeon. An EDI clearinghouse will not cover all health care entities, though it may cover a large portion, and it may not cover all HIPAA-mandated transactions for all of its supported entities.
Because of the above points, one single computer application cannot handle all health care entities. Though this may not be necessary, it can lead to an obvious management headache as a company attempts to register itself with various EDI partners.
This all comes at a massive cost in time and management as a company may attempt to support a broad range of transactions with a broad range of entities. This example is an extension of the lack of strict standards across implementations, transactions and methods.
Wide area network
A wide area network (WAN) is a computer network that covers a broad area (i.e., any network whose communications links cross metropolitan, regional, or national boundaries [1]). This is in contrast with personal area networks (PANs), local area networks (LANs), campus area networks (CANs), and metropolitan area networks (MANs), which are usually limited to a room, building, campus or specific metropolitan area (e.g., a city) respectively. The largest and most well-known example of a WAN is the Internet.
WANs are used to connect LANs and other types of networks together, so that users and computers in one location can communicate with users and computers in other locations. Many WANs are built for one particular organization and are private. Others, built by Internet service providers, provide connections from an organization's LAN to the Internet. WANs are often built using leased lines. At each end of the leased line, a router connects to the LAN on one side and a hub within the WAN on the other. Leased lines can be very expensive. Instead of using leased lines, WANs can also be built using less costly circuit switching or packet switching methods. Network protocols including TCP/IP deliver transport and addressing functions. Protocols including Packet over SONET/SDH, MPLS, ATM and Frame relay are often used by service providers to deliver the links that are used in WANs. X.25 was an important early WAN protocol, and is often considered to be the "grandfather" of Frame Relay as many of the underlying protocols and functions of X.25 are still in use today (with upgrades) by Frame Relay.
Academic research into wide area networks can be broken down into three areas: Mathematical models, network emulation and network simulation.
Performance improvements are sometimes delivered via WAFS or WAN optimization.
Several options are available for WAN connectivity: [2]
Leased line: Point-to-point connection between two computers or Local Area Networks (LANs). Advantages: most secure. Disadvantages: expensive. Sample protocols: PPP, HDLC, SDLC, HNAS.
Circuit switching: A dedicated circuit path is created between end points; the best example is dial-up connections. Advantages: less expensive. Disadvantages: call setup. Bandwidth range: 28 - 144 kbps. Sample protocols: PPP, ISDN.
Packet switching: Devices transport packets via a shared single point-to-point or point-to-multipoint link across a carrier internetwork. Variable-length packets are transmitted over Permanent Virtual Circuits (PVC) or Switched Virtual Circuits (SVC). Advantages: shared media across link. Sample protocols: X.25, Frame Relay.
Cell relay: Similar to packet switching, but uses fixed-length cells instead of variable-length packets. Data is divided into fixed-length cells and then transported across virtual circuits. Advantages: best for simultaneous use of voice and data. Disadvantages: overhead can be considerable. Sample protocols: ATM.
Transmission rates usually range from 1200 bps to 6 Mbps, although some connections such as ATM and leased lines can reach speeds greater than 156 Mbps. Typical communication links used in WANs are telephone lines, microwave links and satellite channels.
Recently, with the proliferation of low-cost Internet connectivity, many companies and organizations have turned to VPNs to interconnect their networks, creating a WAN in that way. Companies such as Cisco, New Edge Networks and Check Point offer solutions to create VPN networks.
Value-added network
A Value-added Network (VAN) is a hosted service offering that acts as an intermediary between business partners sharing standards-based or proprietary data via shared business processes. The offered service is referred to as a "Value-added Network Service".
Technical Definition
VANs traditionally transmitted data formatted as Electronic Data Interchange but increasingly they also transmit data formatted as XML or in more specific "binary" formats. VANs usually service a given vertical or industry and provide "Value Added Network Services" ("VAN Services" or VANSs) such as data transformation between formats (EDI-to-XML, EDI-to-EDI, etc.).
At one extreme, a VAN hosts only horizontal Business-to-business application integration services, hosting general-purpose integration services for any process or industry. At the other extreme a VAN also hosts process-specific or industry-specific integration, for example supply chain ordering or data synchronization services.
A VAN not only transports (receives, stores and forwards) messages but also adds audit information to them and modifies the data in the course of automatic error detection and correction, or of conversion between communications protocols.
History
1960s: Timesharing and network services
Following in the wake of Timesharing providers, provision of leased lines between terminals and datacenters proved a sustainable business which led to the establishment of dedicated business units and companies specialized in the management and marketing of such network services. See Tymshare for an example of a timeshare services company that spun off Tymnet as a data communications specialist with a complex product portfolio.
1970s: Marketisation of telecommunication
The large-scale allocation of network services by private companies was in conflict with the state-controlled telecommunications sector. To gain a license for providing telecommunication services to customers, a private business had to "add value" to the communications line in order to offer a distinguishable service. The notion of "Value-added Network Services" was therefore established to allow such private businesses to operate as an exemption from state control.
The telco sector was marketised in the USA in 1982 (see Modification of Final Judgment) and in the United Kingdom starting in the early 1980s (mainly due to the privatization of British Telecom under P.M. Margaret Thatcher). In the later 1980s, running a "Value-Added Network Service" required licensing in the U.K., while "VAN" had merely become a functional description of a specific subset of networked data communication in the USA.[1]
Since the 1980s: International competition and standardisation efforts
On a multinational scale, due to the heterogeneous telecommunication economy and infrastructure before the market penetration of the Internet, managing a VAN Service proved a complicated task, leading to the idea of "user defined networks",[2] a concept preceding today's ubiquitous availability of Internet service. Standardization efforts for data networking were made by the ITU-T (formerly CCITT) and included X.25 (packet-switched networks) and X.400 (Message Handling Systems), specifically motivated by an emerging transatlantic competition[3] in the early 1990s.
Perspective
In the absence of a state-operated telecommunications sector, "Value-added Network Service" is still used, mainly as a functional description, in conjunction with dedicated leased lines for B2B communications (especially for EDIFACT data transfer).
Governments like South Africa still maintain explicit regulation[4] while others address specific services with licensing.[5]
Traditionally, most VANs primarily supported only general-purpose B2B integration capabilities focused on EDI, but service providers are evolving to become more process- and industry-specific over time, particularly in industries such as retail and hi-tech manufacturing. Some sources[6] suggest that modern VANs should be called "Trading Grids" due to commonalities with grid computing. Others[7] distinguish between "Internet Service Providers" (ISPs) and "International Value-Added Network Services" (IVANS) operators.
Compiler
A compiler is a computer program (or set of programs) that transforms source code written in a computer language (the source language) into another computer language (the target language, often having a binary form known as object code). The most common reason for wanting to transform source code is to create an executable program.
The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a lower level language (e.g., assembly language or machine code). A program that translates from a low level language to a higher level one is a decompiler. A program that translates between high-level languages is usually called a language translator, source to source translator, or language converter. A language rewriter is usually a program that translates the form of expressions without a change of language.
A compiler is likely to perform many or all of the following operations: lexical analysis, preprocessing, parsing, semantic analysis, code generation, and code optimization.
Program faults caused by incorrect compiler behavior can be very difficult to track down and work around; for this reason, compiler implementors invest significant effort in ensuring the correctness of their software.
The term compiler-compiler is sometimes used to refer to a parser generator, a tool often used to help create the lexer and parser.
History
Main article: History of compiler writing
Software for early computers was primarily written in assembly language for many years. Higher level programming languages were not invented until the benefits of being able to reuse software on different kinds of CPUs started to become significantly greater than the cost of writing a compiler. The very limited memory capacity of early computers also created many technical problems when implementing a compiler.
Towards the end of the 1950s, machine-independent programming languages were first proposed. Subsequently, several experimental compilers were developed. The first compiler was written by Grace Hopper, in 1952, for the A-0 programming language. The FORTRAN team led by John Backus at IBM is generally credited as having introduced the first complete compiler, in 1957. COBOL was an early language to be compiled on multiple architectures, in 1960.[1]
In many application domains the idea of using a higher level language quickly caught on. Because of the expanding functionality supported by newer programming languages and the increasing complexity of computer architectures, compilers have become more and more complex.
Early compilers were written in assembly language. The first self-hosting compiler, capable of compiling its own source code in a high-level language, was created for Lisp by Tim Hart and Mike Levin at MIT in 1962.[2] Since the 1970s it has become common practice to implement a compiler in the language it compiles; both Pascal and C have been popular choices of implementation language. Building a self-hosting compiler is a bootstrapping problem: the first such compiler for a language must be compiled either by a compiler written in a different language or, as in Hart and Levin's Lisp compiler, by running the compiler in an interpreter.
Compilers in education
Compiler construction and compiler optimization are taught at universities and schools as part of the computer science curriculum. Such courses are usually supplemented with the implementation of a compiler for an educational programming language. A well-documented example is Niklaus Wirth's PL/0 compiler, which Wirth used to teach compiler construction in the 1970s.[3] In spite of its simplicity, the PL/0 compiler introduced several influential concepts to the field:
Program development by stepwise refinement (also the title of a 1971 paper by Wirth[4])
The use of a recursive descent parser
The use of EBNF to specify the syntax of a language
A code generator producing portable P-code
The use of T-diagrams[5] in the formal description of the bootstrapping problem
Compiler output
One classification of compilers is by the platform on which their generated code executes. This is known as the target platform.
A native or hosted compiler is one whose output is intended to directly run on the same type of computer and operating system that the compiler itself runs on. The output of a cross compiler is designed to run on a different platform. Cross compilers are often used when developing software for embedded systems that are not intended to support a software development environment.
The output of a compiler that produces code for a virtual machine (VM) may or may not be executed on the same platform as the compiler that produced it. For this reason such compilers are not usually classified as native or cross compilers.
Compiled versus interpreted languages
Higher-level programming languages are generally divided for convenience into compiled languages and interpreted languages. However, in practice there is rarely anything about a language that requires it to be exclusively compiled or exclusively interpreted, although it is possible to design languages that are inherently interpretive. The categorization usually reflects the most popular or widespread implementations of a language; for instance, BASIC is sometimes called an interpreted language, and C a compiled one, despite the existence of BASIC compilers and C interpreters.
Modern trends toward just-in-time compilation and bytecode interpretation at times blur the traditional categorizations of compilers and interpreters.
Some language specifications spell out that implementations must include a compilation facility; for example, Common Lisp. However, there is nothing inherent in the definition of Common Lisp that stops it from being interpreted. Other languages have features that are very easy to implement in an interpreter, but make writing a compiler much harder; for example, APL, SNOBOL4, and many scripting languages allow programs to construct arbitrary source code at runtime with regular string operations, and then execute that code by passing it to a special evaluation function. To implement these features in a compiled language, programs must usually be shipped with a runtime library that includes a version of the compiler itself.
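A small illustration of this feature in a scripting language: the program assembles source code as a string at run time and passes it to the language's evaluation function, which is why compiling such languages generally means shipping a compiler or interpreter inside the runtime.

```python
# Source code is constructed as a string at run time and handed to eval();
# a purely ahead-of-time compiler cannot know this code in advance.
operation = "+"                        # chosen at run time, e.g. from user input
source = f"lambda a, b: a {operation} b"
combine = eval(source)                 # compile-and-run the generated code
print(combine(2, 3))                   # prints 5
```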
Hardware compilation
The output of some compilers may target hardware at a very low level, for example a Field Programmable Gate Array (FPGA) or structured Application-Specific Integrated Circuit (ASIC). Such compilers are said to be hardware compilers or synthesis tools because the programs they compile effectively control the final configuration of the hardware and how it operates; the output of the compilation is not a sequence of instructions to be executed but an interconnection of transistors or lookup tables. For example, XST is the Xilinx Synthesis Tool used for configuring FPGAs. Similar tools are available from Altera, Synplicity, Synopsys and other vendors.
Compiler design
In the early days, the approach taken to compiler design was directly affected by the complexity of the processing, the experience of the person(s) designing it, and the resources available.
A compiler for a relatively simple language written by one person might be a single, monolithic piece of software. When the source language is large and complex, and high-quality output is required, the design may be split into a number of relatively independent phases. Having separate phases means development can be parceled up into small parts and given to different people. It also becomes much easier to replace a single phase by an improved one, or to insert new phases later (e.g., additional optimizations).
The division of the compilation processes into phases was championed by the Production Quality Compiler-Compiler Project (PQCC) at Carnegie Mellon University. This project introduced the terms front end, middle end, and back end.
All but the smallest of compilers have more than two phases. However, these phases are usually regarded as being part of the front end or the back end. The point where these two ends meet is open to debate. The front end is generally considered to be where syntactic and semantic processing takes place, along with translation to a lower level of representation (than source code).
The middle end is usually designed to perform optimizations on a form other than the source code or machine code. This source code/machine code independence is intended to enable generic optimizations to be shared between versions of the compiler supporting different languages and target processors.
The back end takes the output from the middle end. It may perform more analysis, transformations and optimizations that are specific to a particular computer. Then, it generates code for a particular processor and OS.
This front-end/middle/back-end approach makes it possible to combine front ends for different languages with back ends for different CPUs. Practical examples of this approach are the GNU Compiler Collection, LLVM, and the Amsterdam Compiler Kit, which have multiple front-ends, shared analysis and multiple back-ends.
One-pass versus multi-pass compilers
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it) performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
In some cases the design of a language feature may require a compiler to perform more than one pass over the source. For instance, consider a declaration appearing on line 20 of the source which affects the translation of a statement appearing on line 10. In this case, the first pass needs to gather information about declarations appearing after statements that they affect, with the actual translation happening during a subsequent pass.
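The toy two-pass translator below illustrates the point: a "call" early in the program refers to a label that is only defined later, so the first pass records where each label is defined and the second pass resolves the calls. The instruction format is invented for illustration.

```python
# A forward reference forces a second pass: pass one collects label
# definitions, pass two translates the calls using those addresses.
program = [
    "call helper",      # line 1 refers to a name defined on line 3
    "halt",
    "label helper",
    "return",
]

# Pass 1: collect label definitions.
labels = {}
for address, line in enumerate(program):
    op, *args = line.split()
    if op == "label":
        labels[args[0]] = address

# Pass 2: translate, now that every label's address is known.
translated = []
for line in program:
    op, *args = line.split()
    if op == "call":
        translated.append(("CALL", labels[args[0]]))
    elif op in ("halt", "return"):
        translated.append((op.upper(),))

print(translated)   # [('CALL', 2), ('HALT',), ('RETURN',)]
```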
The disadvantage of compiling in a single pass is that it is not possible to perform many of the sophisticated optimizations needed to generate high quality code. It can be difficult to count exactly how many passes an optimizing compiler makes. For instance, different phases of optimization may analyse one expression many times but only analyse another expression once.
Splitting a compiler up into small programs is a technique used by researchers interested in producing provably correct compilers. Proving the correctness of a set of small programs often requires less effort than proving the correctness of a larger, single, equivalent program.
While the typical multi-pass compiler outputs machine code from its final pass, there are several other types:
A "source-to-source compiler" is a type of compiler that takes a high level language as its input and outputs a high level language. For example, an automatic parallelizing compiler will frequently take in a high level language program as an input and then transform the code and annotate it with parallel code annotations (e.g. OpenMP) or language constructs (e.g. Fortran's DOALL statements).
A stage compiler that compiles to the assembly language of a theoretical machine, like some Prolog implementations.
This Prolog machine is also known as the Warren Abstract Machine (or WAM). Bytecode compilers for Java, Python, and many more are also a subtype of this.
Just-in-time compiler, used by Smalltalk and Java systems, and also by Microsoft .Net's Common Intermediate Language (CIL)
Applications are delivered in bytecode, which is compiled to native machine code just prior to execution.
Front end
The front end analyzes the source code to build an internal representation of the program, called the intermediate representation or IR. It also manages the symbol table, a data structure mapping each symbol in the source code to associated information such as location, type and scope. This is done over several phases, which include some of the following (a small sketch of the lexing and parsing phases appears after this list):
Line reconstruction. Languages which strop their keywords or allow arbitrary spaces within identifiers require a phase before parsing, which converts the input character sequence to a canonical form ready for the parser. The top-down, recursive-descent, table-driven parsers used in the 1960s typically read the source one character at a time and did not require a separate tokenizing phase. Atlas Autocode, and Imp (and some implementations of Algol and Coral66) are examples of stropped languages whose compilers would have a Line Reconstruction phase.
Lexical analysis breaks the source code text into small pieces called tokens. Each token is a single atomic unit of the language, for instance a keyword, identifier or symbol name. The token syntax is typically a regular language, so a finite state automaton constructed from a regular expression can be used to recognize it. This phase is also called lexing or scanning, and the software doing lexical analysis is called a lexical analyzer or scanner.
Preprocessing. Some languages, e.g., C, require a preprocessing phase which supports macro substitution and conditional compilation. Typically the preprocessing phase occurs before syntactic or semantic analysis; e.g. in the case of C, the preprocessor manipulates lexical tokens rather than syntactic forms. However, some languages such as Scheme support macro substitutions based on syntactic forms.
Syntax analysis involves parsing the token sequence to identify the syntactic structure of the program. This phase typically builds a parse tree, which replaces the linear sequence of tokens with a tree structure built according to the rules of a formal grammar which define the language's syntax. The parse tree is often analyzed, augmented, and transformed by later phases in the compiler.
Semantic analysis is the phase in which the compiler adds semantic information to the parse tree and builds the symbol table. This phase performs semantic checks such as type checking (checking for type errors), or object binding (associating variable and function references with their definitions), or definite assignment (requiring all local variables to be initialized before use), rejecting incorrect programs or issuing warnings. Semantic analysis usually requires a complete parse tree, meaning that this phase logically follows the parsing phase, and logically precedes the code generation phase, though it is often possible to fold multiple phases into one pass over the code in a compiler implementation.
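The following minimal sketch combines the lexical analysis and syntax analysis phases described above for a toy arithmetic language: a regular-expression tokenizer feeds a recursive descent parser that builds a small parse tree. The grammar, token names and tree shape are invented for illustration, and '*' is treated as right-associative for brevity.

```python
import re

def tokenize(text):
    """Lexical analysis: yield ('NUM', value) or ('OP', symbol) tokens."""
    for lexeme in re.findall(r"\d+|[+\-*/]", text):
        if lexeme.isdigit():
            yield ("NUM", int(lexeme))
        else:
            yield ("OP", lexeme)

class Parser:
    """Recursive descent parser for:  expr := term (('+'|'-') term)*
                                      term := NUM ('*' term)?"""
    def __init__(self, text):
        self.tokens = list(tokenize(text))
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def parse_expr(self):
        node = self.parse_term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            op = self.advance()[1]
            node = (op, node, self.parse_term())
        return node

    def parse_term(self):
        kind, value = self.advance()
        assert kind == "NUM", "expected a number"
        if self.peek() == ("OP", "*"):
            self.advance()
            return ("*", value, self.parse_term())
        return value

print(Parser("1 + 2 * 3 - 4").parse_expr())   # ('-', ('+', 1, ('*', 2, 3)), 4)
```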
Back end
The term back end is sometimes confused with code generator because of the overlapped functionality of generating assembly code. Some literature uses middle end to distinguish the generic analysis and optimization phases in the back end from the machine-dependent code generators.
The main phases of the back end include the following:
Analysis: This is the gathering of program information from the intermediate representation derived from the input. Typical analyses are data flow analysis to build use-define chains, dependence analysis, alias analysis, pointer analysis, escape analysis etc. Accurate analysis is the basis for any compiler optimization. The call graph and control flow graph are usually also built during the analysis phase.
Optimization: the intermediate language representation is transformed into functionally equivalent but faster (or smaller) forms. Popular optimizations are inline expansion, dead code elimination, constant propagation, loop transformation, register allocation or even automatic parallelization. A small sketch of constant folding and dead code elimination follows this list.
Code generation: the transformed intermediate language is translated into the output language, usually the native machine language of the system. This involves resource and storage decisions, such as deciding which variables to fit into registers and memory and the selection and scheduling of appropriate machine instructions along with their associated addressing modes (see also Sethi-Ullman algorithm).
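The sketch below illustrates two of the optimizations named above, constant folding and dead code elimination, on an invented three-address-style intermediate representation; a real optimizer would iterate these passes and handle far more cases.

```python
def fold_constants(instructions):
    """Fold ('add', dest, a, b) into ('const', dest, a + b) when both operands
    are literal numbers, then drop assignments whose destination is never read."""
    folded = []
    for instr in instructions:
        if instr[0] == "add" and all(isinstance(x, int) for x in instr[2:]):
            folded.append(("const", instr[1], instr[2] + instr[3]))
        else:
            folded.append(instr)

    # Names that are read anywhere (as operands or by a return).
    used = {operand for instr in folded for operand in instr[2:]
            if isinstance(operand, str)}
    used |= {instr[1] for instr in folded if instr[0] == "return"}
    return [i for i in folded
            if not (i[0] in ("const", "add") and i[1] not in used)]

ir = [
    ("const", "x", 2),
    ("const", "y", 3),
    ("add", "t1", 2, 3),      # constant operands: folded to ('const', 't1', 5)
    ("add", "t2", "x", "y"),  # t2 is never read: dead code, removed
    ("return", "t1"),
]
print(fold_constants(ir))
# [('const', 'x', 2), ('const', 'y', 3), ('const', 't1', 5), ('return', 't1')]
```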
Compiler analysis is the prerequisite for any compiler optimization, and the two work tightly together. For example, dependence analysis is crucial for loop transformation.
In addition, the scope of compiler analysis and optimizations varies greatly, from as small as a basic block to the procedure/function level, or even the whole program (interprocedural optimization). Obviously, a compiler can potentially do a better job using a broader view. But that broad view is not free: large-scope analysis and optimizations are very costly in terms of compilation time and memory space; this is especially true for interprocedural analysis and optimizations.
Interprocedural analysis and optimizations are common in modern commercial compilers from HP, IBM, SGI, Intel, Microsoft, and Sun Microsystems. The open source GCC was criticized for a long time for lacking powerful interprocedural optimizations, but it is changing in this respect. Another open source compiler with full analysis and optimization infrastructure is Open64, which is used by many organizations for research and commercial purposes.
Due to the extra time and space needed for compiler analysis and optimizations, some compilers skip them by default. Users have to use compilation options to explicitly tell the compiler which optimizations should be enabled.