Overview of programming languages

Narrative

Early computers (~1950-1957) were set up (programmed) using binary machine codes specific to that exact computer. Multiplication, or even adding numbers that took more than one word, was done in several steps. Registers had different abilities (r1 can have r2-r4 added to it, r3-r6 can do bit-shifting, ...) which you needed to know. It was obvious that the set-up for one computer wouldn't work on another.

An early improvement was the assembler. It was a program translating "add r1, $10" into the appropriate binary codes. The computer operator was doing the same painful assembly code writing as before, for this exact machine. This just automated the very last step (and the reverse process made it much easier to read and fix code.)

Many computer programmers in the late 50's and 60's were trained on two US government projects. The Navy's Whirlwind project started in 1944 as a way to build better and more general-purpose flight simulators. After hearing of ENIAC's success, they decided to build their own electronic computer. The hardware was a challenge, but "setting it up" (the programming) was more difficult than expected.

Later, the US Air Force started work on a project to link radars all over the country, listening for Soviet Union bombers (the Cold War was well underway.) It was named SAGE -- Semi-Automatic Ground Environment. By 1956 the hardware was sort-of running and the quasi-government agency overseeing it, the RAND (Research ANd Development) corporation, found there weren't nearly enough programmers. They began a huge campaign of recruiting and training. Magazine ads looked for chess players and "people who are good at solving puzzles." They eventually found music teachers made good programmers. By 1963 SAGE was still going, but at least 6,000 SAGE-trained programmers had left for private work.

During this time at least one "formula only" programming language was written, and assembly tricks were learned. A common one was the subroutine. For example, suppose you write the 50 lines of code to multiply R0 by R1, using R2 as a "scratch" register. The first line might copy R2 into memory and the second-to-last line would load it back into R2. The last line would "jump" to another line.

To use this subroutine to multiply R5 by R7, you first copy R5 to R0 and R7 to R1. You then change the place where the subroutine will jump when finished, and jump to the start of the subroutine (if you don't have anything important in R2, you might jump past the part where it saves it.) The subroutine will jump back to here, and you'll copy R0 back to R5.
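To get a feel for what those 50 lines actually computed, here is a rough sketch in C++ (my own illustration, not actual 1950s code) of multiplying using only shifts and adds, which is all such a machine could do in one step:

// Multiply two non-negative integers using only shifts and adds,
// roughly the job the 50-line subroutine did with R0, R1 and scratch register R2.
unsigned multiply(unsigned a, unsigned b) {
    unsigned result = 0;
    while (b != 0) {
        if (b & 1) result += a;   // low bit of b set: add a into the running total
        a <<= 1;                  // shift a left one bit
        b >>= 1;                  // shift b right one bit
    }
    return result;
}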

From 1952-1955(?) former Navy officer Grace Hopper (working on a UNIVAC at Remington Rand) was working on an easier way to set up computers, and proposed a general computer "language" which she named FLOW-MATIC. Obvious benefits would be less training required for programmers, faster code writing and programs that would run on different machines. From ~1954 to 1957 IBM's John Backus and a team of about 10 (including ISU math major Harlan Herrick) were working on what would be the first practical programming language, FORTRAN. The important thing was that the compiler could turn lines of FORTRAN into assembly code that was as good as, or better than, what "real" assembly programmers could write.

Fortran was good at what it did, which was complex math. It was not terribly good with names and addresses or large files. By the mid '50s computers could obviously do useful work and were being bought by larger businesses for billing, payroll, ... . Several companies (including IBM) were working on a "business" language. Why couldn't words and files be added to FORTRAN? One reason was that it was regarded as the "scientific language." A technical reason was that adding those extra features would slow it down so much it wouldn't be good at anything.

Worried about rivals controlling the language programs were written in (a similar situation to the Java/.NET battle going on now), a consortium of businesses and the US Government wrote COBOL, which was based on FLOW-MATIC (after the 12-person "short range committee" went nowhere, 6 of them met and actually wrote COBOL, including Jean Sammet, who worked for Grace Hopper.) COBOL would become and remain one of the more popular languages for banks, insurance companies, etc... .

These came to be known as high-level languages, with assembly known as a low-level language. The idea being, the machine (the bare metal) is at the bottom and the higher you are, the less you can see how it actually works.

A vast number of programming languages were proposed after that. Most of them didn't catch on, weren't practical, or were only experiments in compiler design. LISP was for list processing (and is popular in artificial intelligence.) Snobol was for string processing. Algol, written in Europe, had great ideas but no one could write a good compiler for it. IBM's PL/1 had so many features no one could learn them all or write a compiler handling them all. JOVIAL was one of many (hundreds) of early military languages. Smalltalk had all variables be "objects" which sent "messages" to each other. It wasn't practical, but later inspired "objects" in C++ and Java.

BASIC was written at Dartmouth in 1964 as something easy to use, for regular students. Dartmouth wanted engineers, chemists, etc... to be able to use a computer to solve equations from those disciplines. Fortran was too much to learn, they felt. They designed a time-sharing system on a big computer (a GE-225) and the computer language BASIC. In the late 70's, when micro-computer makers were looking for a small, simple language, they picked BASIC. My first programs, around 1979 on an Apple-II, were written in AppleBasic, possibly written by the tiny company micro-soft.

The UNIX operating system was designed at AT&T Bell Labs for the computers running the phones. It was written in PDP assembly language. In 1972 it needed a rewrite. Assembly would have taken too long and FORTRAN didn't have the ability to move around bits of memory. So, they wrote a new language, C, to rewrite UNIX. Of course, C benefitted from the 15 years of experience gained since FORTRAN was introduced, and is perfectly good at math.

In the 80's Bjarne Stroustrup, also at AT&T Bell Labs, worked on an improved "object oriented" version of C, named C++ (a programming joke.) It became "official" in 1998, but various versions were used long before that. ISU taught programming in C++ from (at least) 1997 to 2004. MS-Windows is written in assembly and C++.

By 1994 the internet had become popular and "distributed" languages (I just made that term up) were becoming popular. The original idea was that toasters, refrigerators, etc... would need a programming language, and that they would all communicate over a network using some common language, regardless of the exact chip. A language like Fortran didn't have "network" stuff in it, but beyond that, it was generally compiled into assembly language for one particular brand of computer chip. It would be impractical to send bits of FORTRAN code to and fro, compiling it each time, and too slow to interpret it.

Java was written at SUN in 1992 to solve this problem....

Flow-Matic, High-Level Languages, Compilers

A snippet (from www.objectz.com/columnists/denise/featurepart2.html):

INPUT INVENTORY FILE A;
PRICE FILE B;
OUTPUT PRICED INVENTORY FILE C.
COMPARE PRODUCT #A WITH PRODUCT #B.
IF GREATER, GO TO OPERATION 10;
IF EQUAL, GO TO OPERATION 5;
OTHERWISE GO TO OPERATION 2.
TRANSFER A TO D;
WRITE ITEM D;
JUMP TO OPERATION 8.

Even today, computer chips only really understand assembly codes (load or store a register, ...), which we now call a low-level language. The sort of language that computers can't understand directly, but which can be translated into assembly, is known as a high-level language.

The program to turn a Flow-Matic program (known as the source code) into assembly would come to be called a compiler. You write the source code, compile it into an executable (a new file, full of assembly code) and run that. By 1958, Flow-Matic was being used to write actual programs on a UNIVAC. It is not used today, but its improved descendant, COBOL, is.

Fortran

In 1957, John Backus at IBM finished Fortran (FORmula TRANslator) together with an efficient compiler. It could turn lines of Fortran into machine code that was almost as good as if a "real" computer programmer had written it. This made Fortran the first real computer language. It was free, for what seemed like obvious business reasons at the time: IBM viewed itself primarily as selling computers, not programs or languages. Those were just things it did to support its computer business or as part of the service contract -- imagine Purina giving away free puppies.

Not only did Fortran let more people write programs faster, it also made it much easier to adapt an old program and recompile, or run the same program on multiple types of computers (you just needed someone to write a Fortran compiler, hopefully a good one, for that machine.)

Cobol

In the 50's, physical computers had been divided into scientific and business models. For example, scientific computers (ex: the IBM 701, 1953) used decimal numbers, computed logs and cosines and only printed numeric answers. Business computers (IBM 650, 1954) had to read names and addresses, and manipulate records (Bob Smith, 101 OakDale, owes $50, due Apr 3rd) but not much math beyond two decimal places. Fortran was known as the "scientific" computer language. By 1960, a consortium of the US government and businesses developed a business computer language, COBOL. It was based on FLOW-MATIC.

Many computer scientists regard Cobol as the worst existing computer language, ever. Here's a pseudo-snippet. It is a loop (lines 110-120) that reads and adds all employee salaries.

00050 TOTAL-SALARY PIC 9999.
00100 READ EMPLOYEE-RECORD.
00103 MOVE 0 TO TOTAL-SALARY.
00105
00110 PERFORM ACCUMULATE-SALARY
00120   UNTIL NO-MORE-EMPLOYEES EQUAL-TO "YES".
00130
00200 ACCUMULATE-SALARY.
00210   ADD EMP-SALARY TO TOTAL-SALARY GIVING TOTAL-SALARY.
00220   READ EMPLOYEE-RECORD
00230     AT END
00240       MOVE "YES" TO NO-MORE-EMPLOYEES.

Cobol was made for punch cards, where numbering lines was important (in case you dropped them.) Positions 1-6 are actually reserved for line numbers. It wasn't made to read from the keyboard, since there weren't any. The thing it does best is read a large file where each line is laid out in exactly the same way. Of course, the longest a line can be is 80 characters (including the first 6) since punch cards were that wide. Also note the "function" ACCUMULATE-SALARY can't take any inputs -- they didn't have any back then.

Even though Cobol looks and works like something from 1960, it's in use today. There are many old programs used by banks and insurance companies written in COBOL. It's easier to update a program than to write a new one from scratch, and a lot safer. The "Y2K" problem was caused partially by a lot of old Cobol programs using two digits for the year (COBOL wants you to say how many digits are in a number -- the top line says TOTAL-SALARY will be four digits.) No one ever imagined programs from 1970 would still be in use by the year 2000. We were going to have rocket packs and live on the moon by then, after all.
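The arithmetic problem is easy to see. A tiny made-up fragment (in C++, not real bank code):

int loanYear    = 70;   // stored as two digits, meaning 1970
int currentYear = 0;    // two digits again, meaning 2000
int age = currentYear - loanYear;   // -70: the loan now looks like it starts 70 years in the future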

If you work in a bank, updating COBOL all day, and you need to write something new, COBOL is the obvious choice because you know it. ISU's ADP (applied data processing) writes new programs in COBOL for ISU financial, ummm, stuff. Of course, they use a modern language (Java) for web stuff.

Imperative vs. Functional

Nowadays, COBOL and Fortran look remarkably similar. They are both imperative languages: variables, a=b+c*7, if-else, while(x<10), dothing(a,b). Do one line at a time until you get to the end. Almost all of the languages here are imperative. C++ and Java are sometimes called "object oriented" languages, but they are imperative languages which also have object orientedness.

The other major paradigm for computer languages is functional as in, using functions for everything. LISP, scheme and ML fall into this category. They've been around for a while, and work pretty well, but are generally not used for production programming, so most people haven't heard of them. At ISU, we use scheme in (required course) cs342.

Here's a snippet of LISP (in Scheme style) that counts the number of a's in a list:

(define countas
  (lambda (alist)
    (if (null? alist) 0
        (if (eq? (car alist) 'a)
            (+ 1 (countas (cdr alist)))
            (countas (cdr alist))))))

>(countas '(h a y a a b a))
4

It says the answer is 0 if alist is empty, otherwise check if the first thing is an a. If it is, the answer is 1 plus a's in the rest of the list (restart with that shorter list) otherwise the answer is the a's in the rest of the list. Here is the same thing in an imperative language:

numa=0; index=0
while(index < listsize)
 { if(alist[index]=='a') numa=numa+1
   index=index+1
 }

Basic

As colleges got the "new" mini-computers, students started writing programs (mostly grad students, at first, for their research.) Fortran was easier to learn than assembly, but still not exactly easy. BASIC was written at Dartmouth College, in 1964, as something students could learn quickly. For example, x ** 4 computes x to the fourth power (compare to pow(x,4), which calls the power function with inputs x and 4.)

In 1975, when the Altair personal computer kit needed a very simple programming language, maker Ed Roberts decided on a stripped-down version of Basic (the first people who answered his ad were Paul Allen and Bill Gates, then at Harvard.) The Apple-II, in 1977, came with Basic, written by Apple creator Steve Wozniak. You could turn it on and start typing in a Basic program, then save it on your 5-1/4" floppy disk with a command. Books with games, written in Basic, became popular. Other early PCs, the PET and TRS-80, also used Basic. People with PCs in the 80's, the author included, learned Basic as a first language.

Here's sort of a fake Basic craps program:

10 for ST = 1 to 80
20 print "*"
30 endfor
40 printline
45 MY=100
50 printline "     Welcome to Craps. Enter bet:"
55 read BT
70 RL=rand(6)+rand(6)
80 if RL=7 or RL=11
85 printline "You win"
90 let MY = MY + BT
100 goto 1000
105 endif
110 print "Point is " RL " Press space to roll again"
....
1000 print "Play again?"
1010 read AG
1020 if AG = "Y"
1030 goto 50

Microsoft's VisualBasic is an imperative language, but isn't Basic.

Interpreted languages

As a practical matter, a program could be written, saved on magnetic tape (or paper tape), compiled and saved on a different magnetic tape. On a personal computer, that wasn't practical. A compiler was a fairly large program, home computers didn't have lots of fast off-line storage and, well, it would be nice to just type it in and say RUN. The solution was an interpreter.

An interpreter is a program that reads lines in a computer language and runs them. You write your interpreter in assembly (or in some other language and compile it, but the first were in assembly) and put it in the chip (that way it is there when you turn on the computer -- it "knows" Basic.) They are slower than compiled programs for the obvious reason: the computer reads the next line of the interpreter, which says to read the next line of your program. If it is x=3 the interpreter jumps to the part of itself that knows about equals, which looks up x in a table (instead of having precomputed that x is location 3AF6.)
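As a concrete (and very toy) illustration -- my own sketch, not any real Basic interpreter -- here is the heart of an interpreter in C++: read a line, decide what it asks for, and look variables up in a table every single time.

#include <iostream>
#include <map>
#include <string>

int main() {
    std::map<std::string, int> variables;   // the interpreter's variable table
    std::string line;
    while (std::getline(std::cin, line)) {
        if (line.rfind("print ", 0) == 0) {             // a line like "print x"
            std::cout << variables[line.substr(6)] << "\n";
        } else {                                        // a line like "x=3"
            std::string::size_type eq = line.find('=');
            if (eq == std::string::npos) continue;      // ignore anything else
            variables[line.substr(0, eq)] = std::stoi(line.substr(eq + 1));
        }
    }
}

Every time "x=3" runs, the parsing and the table lookup happen again, which is exactly why interpreted programs are slower than compiled ones.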

For a PC, interpreters were fine. They were small, the programs took up less space than if you compiled them, and home users didn't care that much about a few hundredths of a second if they were playing hangman or entering recipes. Of course, games on early PC's were all written in assembly, for the speed.

Shell scripts (sh, csh, bash) are interpreted. This way the same program that reads your keyboard commands can also run scripts. It is formally known as the command interpreter. The slowness isn't much of a drawback, since most of the things they are doing (cat, tr, cut, ...) are compiled programs.

Other languages

There were hundreds(?) of imperative languages written, with various features. Many are still in limited use. ALGOL (~'60) was the first major rewrite of FORTRAN's ideas. It never caught on, but the ideas were copied by many other languages. PL/1 (~'64, programming language 1) was written by IBM and used for part of the operating system for their 360 (OS/360). It also never caught on, since it was difficult to write a compiler for, and it was known as a "kitchen sink" language for having way too many features. But, since it was written by IBM, some PL/1 is still in use. Samples (from Wikipedia):

Algol:
procedure Absmax(a) Size:(n, m) Result:(y) Subscripts:(i, k);
    value n, m; array a; integer n, m, i, k; real y;
comment The absolute greatest element of the matrix a,
        of size n by m is transferred to y, and the subscripts of
        this element to i and k;
begin integer p, q;
    y := 0; i := k := 1;
    for p:=1 step 1 until n do
    for q:=1 step 1 until m do
        if abs(a[p, q]) > y then
            begin y := abs(a[p, q]);
            i := p; k := q
            end
end Absmax

PL/I:
PROCEDURE(ARRAY,N); /* BUBBLE SORT*/
DECLARE (I,J) FIXED BIN(15);
DECLARE S BIT(1);        /* SWITCH */
DECLARE Y FIXED BIN(15); /* TEMPO */
DO I = N-1 BY -1 TO 1;
  S = '1'B;
  DO J = 1 TO I;
    IF X(J)>X(J+1) THEN DO;
      S = '0'B;
      Y = X(J);
      X(J) = X(J+1);
      X(J+1) = Y;
      END;
    END;
  IF S THEN RETURN;
  END;
RETURN;
END SRT;

Pascal was written as a clean, modern (for its time) language, without odd special cases or "cool" features that no one really understands and that cause errors. It became popular as a teaching language -- ISU used it in the 80's. Sample (Wikipedia):

procedure stupidsort (var a: array of integer);
 var i: integer;
 begin
   i := 0;
   repeat
     inc (i);
     if (a[i+1] < a[i]) then begin
       exchange (a[i], a[i+1]);
       i:=0;
     end;
   until i = length(A) - 2;
 end;

The military used dozens, or more, programming languages. One of the earliest was named JOVIAL. In the 80's they commissioned Ada (named after Countess Ada Lovelace) as the military language. It was designed to be good at real-time processing without crashing.

Snobol (~'65) was made for string processing, and has been replaced by Perl, etc... .

C/C++

UNIX was written at AT&T Bell Labs in PDP assembly code. When it came time to rewrite it, in 1972, the programming language C was written (yes, there is a B, but it never amounted to anything.) C was designed to let you easily manipulate individual bits and bytes, look at specific memory locations and in general be very close to the computer (a low-level high-level language.) These aren't things a normal programmer cares about. In 1991, when Linus Torvalds wanted to write his own OS (now named Linux), C was the obvious choice. Most Unix/Linux utilities (cat, tr, bash) are written in C.
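As a small illustration (my own example, nothing from the actual UNIX source), here is the kind of bit-and-byte fiddling C-style code makes easy, shown in C++:

#include <cstdint>
#include <cstdio>

int main() {
    std::uint8_t flags = 0;
    flags |= 0x04;                 // turn bit 2 on with a bitwise OR
    flags &= ~0x01;                // force bit 0 off
    int bit2 = (flags >> 2) & 1;   // pull out a single bit

    std::uint8_t buffer[4] = {0x12, 0x34, 0x56, 0x78};
    std::uint8_t *p = buffer;      // a pointer is, roughly, a raw memory address
    p[2] = (std::uint8_t)(p[0] << 4);   // shift bits around and poke one specific byte

    std::printf("flags=%02x bit2=%d buffer[2]=%02x\n", flags, bit2, p[2]);
    return 0;
}

FORTRAN of the time had no clean way to say any of this, which is a big part of why C was written.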

As campuses bought mini-computers running UNIX, computer science students started learning and preferring C, the language of UNIX.

In the 80's Bjarne Stroustrup at AT&T Bell Labs worked on an improved "object oriented" version of C, named C++. It became "official" in 1998, but various versions were used long before that. ISU taught programming in C++ from (at least) 1997 to 2004. MS-Windows is written in assembly and in C++.

Java

SUN was using C++ to develop applications for chips in cell phones, cable TV boxes and such. C++ can be a tricky language to test, and a bad or unlucky programmer can make mistakes that are very difficult to find. Also, C++ was written for speed and efficiency; while it can produce fine graphics, doing so can be tricky.

In 1992 James Gosling, at SUN, had written Oak (later renamed Java, after someone in a focus group said it made you feel energetic, like coffee.) It was intended to run on "things like VCRs, telephones, games, automobiles, dishwashers, thermostats...". The idea was it would compile down to fake assembly (bytecode) and each kind of chip would have a small program that knew how to run bytecode, making it appear to run the same, anywhere. It didn't catch on at first in the consumer electronics market, but the sudden growth of the internet around 1994 was a lucky break.
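The bytecode idea can be sketched in a few lines of C++. The opcodes here are invented for illustration (real Java bytecode is much richer), but the shape is the same: a compiler emits the numbers once, and each kind of chip only needs this little loop.

#include <cstdio>
#include <vector>

enum Op { PUSH, ADD, MUL, PRINT, HALT };    // made-up "fake assembly"

int main() {
    // the "compiled" program: compute 6 * 7 + 1 and print it
    std::vector<int> code = {PUSH, 6, PUSH, 7, MUL, PUSH, 1, ADD, PRINT, HALT};
    std::vector<int> stack;
    int pc = 0;                              // program counter
    while (code[pc] != HALT) {
        switch (code[pc]) {
            case PUSH:  stack.push_back(code[pc + 1]); pc += 2; break;
            case ADD:   { int b = stack.back(); stack.pop_back(); stack.back() += b; pc += 1; break; }
            case MUL:   { int b = stack.back(); stack.pop_back(); stack.back() *= b; pc += 1; break; }
            case PRINT: std::printf("%d\n", stack.back()); pc += 1; break;
        }
    }
    return 0;
}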

SUN contacted the leading browser maker, NetScape, about adding a Java bytecode interpreter (a virtual machine, as opposed to a real one) to their browser. This allowed web page makers to program in Java, where it would run in a browser on a Mac, UNIX, MS-Windows, ... . One important feature of Java is that it can run in a "sand-box": an amount of memory is allocated which it can't go outside of, greatly limiting the damage a virus or other malicious program can do.

As the web became popular, Java became the preferred way to write the "front-end" of applications. For example, it might be used to bring up a pretty web-like page you fill in, with help from the Java program. Once it has all the data, it sends it to a COBOL program to update the database.

Java is still owned and controlled by SUN. For a computer to be labelled "Java compliant" SUN needs to approve the compiler/interpreter. In theory, this means a Java program will run without changes on any computer.

Scripting languages

The sh shell for Unix was written in 1974. It was known as a scripting language and its programs were known as scripts. There's no official definition of a scripting language, but anything interpreted that runs other programs (such as unix commands or database queries) probably counts.

Awk was designed as an easy way to do line-by-line file processing on Unix systems. In 1987 Larry Wall wrote an improved version named Perl. It can be thought of as a very high level language and is popular for server-side web applications. A snippet that will print the part before the dot of any files ending in .prl:

opendir(indir, ".");
@a = readdir(indir);
closedir(indir);
foreach (@a) {
  if ($_ =~ /(.*)\.prl$/) { print "-- $1\n"; }
}

foreach goes through everything in the list @a, making $_ be each in turn. =~ does a grep-style pattern match. $1 refers to whatever was in the (.*).

PHP is a perl-like language embedded inside HTML. Ruby is an updated version of Perl. Python is another, sort of.

javascript is mostly not a scripting language, since it can't run other programs.

Object Oriented languages

High-level languages are a little slower than assembly code, but allow much faster programming with fewer errors. For most programs assembly is completely impractical -- it would take too long and have too many errors. Linux does use assembly for small snippets of frequently used code. As computers' speed and memory increased, we hit the same problem with imperative high-level languages -- programs took too long to write and had too many errors. A solution to this was object-oriented languages.

Two problems with really big programs involve the fact that we tend to have large data structures (say, a list of employees, each with name, address, etc... ) and lots of functions that do something to them (compute total weekly salary, highest salary, add an employee, ... .) When we need to split overtime into time-and-a-half and double-overtime, nearly everything needs to be rewritten to account for it. Even worse, we may have a list of free-lance employees that we made by copying all of the employee stuff and making a few changes. All of that needs to be updated as well. With enough of these (suddenly employee addresses need to include country) it can be a huge mess.

An object-oriented language forces you to limit which parts of the program can look directly at the data structure; everyone else has to go through a small set of functions. For example, a function salary(x) would compute the salary of employee x. The totalSalary and highSalary functions would use it. This makes them slightly slower, and takes a little longer to program, but will make it much easier when you need to change the way salary is computed. This is called information hiding.

For free-lance employees, object-oriented languages allow you to say they are an employee, with these few changes. You don't need to copy any of the employee code. This is called inheritance (free-lance employees inherit the data structure and functions of employees.) Now, if there is a change in how taxes are deducted, you can change it for employees and it will be automatically changed for free-lance ones.
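Here is a minimal C++ sketch of both ideas (the class names and pay rules are invented for illustration, not taken from any real payroll system):

#include <algorithm>
#include <string>

// Information hiding: salary() is the only way to compute pay,
// so the overtime rule lives in exactly one place.
class Employee {
public:
    Employee(const std::string& name, double hourlyRate, double hours)
        : name(name), hourlyRate(hourlyRate), hours(hours) {}
    virtual ~Employee() {}

    virtual double salary() const {
        double regular  = std::min(hours, 40.0);
        double overtime = hours - regular;
        return regular * hourlyRate + overtime * hourlyRate * 1.5;
    }

protected:
    std::string name;
    double hourlyRate;
    double hours;
};

// Inheritance: a free-lance employee "is an" Employee with a few changes.
// None of the Employee code is copied, so a fix there shows up here for free.
class FreelanceEmployee : public Employee {
public:
    FreelanceEmployee(const std::string& name, double rate, double hours)
        : Employee(name, rate, hours) {}

    double salary() const override { return hours * hourlyRate; }   // flat rate, no overtime
};

A totalSalary function would just loop over Employee objects calling salary(), never caring which kind of employee it has.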

C++ and Java are object-oriented. Perl also has it (but Perl programs are often small enough that there is no point.)

Niche Web languages

MS visual basic

Forms and javascript are now probably the most popular way to make pretty Graphical User Interfaces (over the web.) Before they became standardized, Microsoft, in 1991, introduced its "GUI-language" visual basic (no relationship to Basic -- it could have been called visual fortran, or pascal, etc... .) It has been revised 5 times (to VB6, in 1998) as it had to handle more common graphics features, interact with different versions of MS-Windows, etc... .

Like javascript, most of VB is about how to make various types of boxes, looking various ways and how to get the data from them.

VBA (visual basic for applications) is similar to VB, and is for writing snippets of code that can be run from Excel, MS-Word, etc... as a sort of "macro."

javascript

During the browser fight between NetScape and InternetExplorer/Microsoft, in 1995, NetScape introduced a small computer language, built into its browser, that could run little programs to "cool up" a web page (or make it incredibly annoying.) This was named LiveScript, but renamed JavaScript for marketing reasons. Even though MS put Netscape out of business, JavaScript stayed popular.

 

  1. IEEE's history site: http://www.computer.org/history/development/1952.htm
  2. Go To: the story of the math majors, bridge players, engineers, chess wizards, maverick scientists and iconoclasts -- the programmers who created the software revolution; Steve Lohr, Basic Books, 2001