Saturday, August 23, 2014

E.F. Codd and the Relational Database Model

E. F. Codd (1923-2003)
On August 23, 1923, English computer scientist Edgar Frank "Ted" Codd was born. Among his many contributions to computer science, his main achievement was the invention of the relational model for database management, the theoretical basis for relational databases.

When you talk about databases today, you are usually referring to relational databases, which store their data in tables that are interconnected via so-called keys. Of course, there are also modern alternatives such as graph-based databases, but relational databases remain widespread. This is in large part thanks to E. F. Codd and his relational algebra.
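To make the idea of tables interconnected via keys a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (author, paper, author_id) and the helper functions are made up for illustration; they only assume that a paper row points to its author through a key value rather than through any physical link.

```python
import sqlite3


def build_library() -> sqlite3.Connection:
    """Create a tiny in-memory database with two tables linked by a key."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE author ("
        " author_id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE paper ("
        " paper_id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " year INTEGER NOT NULL,"
        # The foreign key is an ordinary value that matches author.author_id.
        " author_id INTEGER NOT NULL REFERENCES author(author_id))"
    )
    conn.execute("INSERT INTO author VALUES (1, 'E. F. Codd')")
    conn.execute(
        "INSERT INTO paper VALUES (10,"
        " 'A Relational Model of Data for Large Shared Data Banks', 1970, 1)"
    )
    return conn


def papers_with_authors(conn: sqlite3.Connection) -> list:
    """Combine both tables by comparing key values in a join."""
    query = (
        "SELECT author.name, paper.title, paper.year"
        " FROM paper JOIN author ON paper.author_id = author.author_id"
        " ORDER BY paper.year"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for name, title, year in papers_with_authors(build_library()):
        print(f"{name}: {title} ({year})")
```

The connection between the two tables exists only because the value 1 appears in both of them; the join recovers it by comparing values.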

Edgar Frank Codd was born the youngest of seven children on the Isle of Portland in Dorset, England, in 1923. His father was a leather manufacturer, his mother a schoolteacher. After attending Poole Grammar School, he studied mathematics and chemistry at Exeter College, Oxford, before serving as a pilot in the Royal Air Force during the Second World War. In 1948, at age 25, he moved to New York to work for IBM as a mathematical programmer. In 1953, angered by Senator Joseph McCarthy, Codd moved to Ottawa, Canada, where he established a computing center for the Canadian guided missile program. He later returned to the U.S. and in 1965 received his doctorate in computer science from the University of Michigan in Ann Arbor. His thesis dealt with self-replication in cellular automata, extending the work of von Neumann and showing that a set of eight states was sufficient for universal computation and construction.

Two years later he moved to San Jose, California, to work at IBM's San Jose Research Laboratory, where he continued to work until the 1980s. There he found existing data management systems “seat-of-the-pants, with no theory at all,” he recalled in one interview. “I began reading documentation,” Codd said, “and I was disgusted.” [2] Subsequently, Codd worked out his theories of data arrangement, publishing his paper "A Relational Model of Data for Large Shared Data Banks" in 1970, one year after an internal IBM paper on the subject. The 1970 paper became one of the most important research papers in the history of computing. Codd believed that all the information in a database should be represented as values in the rows and columns of tables, and that no information should be represented by pointers or connections among records [2]. To his frustration, IBM largely ignored his work, as the company was investing heavily at the time in commercializing a different type of database system, IMS/DB [1].

Then IBM included a System R subproject in its Future Systems project, but put developers in charge of it who were not thoroughly familiar with Codd's ideas, and isolated the team from Codd. As a result, they did not use Codd's own Alpha language but created their own, SEQUEL, which did not adhere strictly to the relational model. Because the name SEQUEL was already a proprietary trademark, the language was renamed SQL. Even so, it was so superior to pre-relational systems that in 1979 Larry Ellison of Relational Software, Inc. copied it, based on pre-launch papers presented at conferences, for his Oracle Database, which actually reached the market before IBM's own product. System R was a success, and in 1981 IBM announced its first relational database product, SQL/DS. DB2, initially for large mainframe machines, was announced in 1983 [3].

Codd continued to develop and extend his relational model, sometimes in collaboration with Chris Date. One of the normal forms, the Boyce–Codd normal form, is named after him. Codd's theorem, a result proven in his seminal work on the relational model, equates the expressive power of relational algebra and relational calculus (neither of which, lacking recursion, can express queries such as the transitive closure of a relation). As the relational model started to become fashionable in the early 1980s, Codd fought a sometimes bitter campaign to prevent the term from being misused by database vendors who had merely added a relational veneer to older technology. As part of this campaign, he published his 12 rules to define what constituted a relational database. This made his position at IBM increasingly difficult, so he left to form his own consulting company with Chris Date and others.
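The equivalence of relational algebra and relational calculus mentioned above can be illustrated on a single query. The following sketch is only an illustration, not a proof of Codd's theorem: the relations, their contents, and the helper functions select, join and project are hypothetical, and relations are simply modelled as Python sets of named tuples. The algebra version says how to compute the result step by step, while the calculus-style version only states which condition the result must satisfy.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    name: str
    dept_id: int


class Department(NamedTuple):
    dept_id: int
    city: str


# Hypothetical example relations, modelled as sets of tuples.
EMPLOYEES = {Employee("Ada", 1), Employee("Grace", 2), Employee("Alan", 1)}
DEPARTMENTS = {Department(1, "San Jose"), Department(2, "Ottawa")}


def select(relation, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return {row for row in relation if predicate(row)}


def join(left, right, condition):
    """Join: pair up rows from both relations that satisfy the condition."""
    return {(l, r) for l in left for r in right if condition(l, r)}


def project(relation, pick):
    """Projection: keep only the chosen part of every row."""
    return {pick(row) for row in relation}


def algebra_query():
    """Procedural style: select, then join, then project."""
    local = select(DEPARTMENTS, lambda d: d.city == "San Jose")
    paired = join(EMPLOYEES, local, lambda e, d: e.dept_id == d.dept_id)
    return project(paired, lambda pair: pair[0].name)


def calculus_query():
    """Declarative style: names e.name such that a matching department exists."""
    return {
        e.name
        for e in EMPLOYEES
        if any(d.dept_id == e.dept_id and d.city == "San Jose" for d in DEPARTMENTS)
    }


if __name__ == "__main__":
    print(sorted(algebra_query()))
    print(sorted(calculus_query()))
```

Both functions return the same set of names for this data. Neither style can follow a chain of references of unknown length, which is the kind of recursive query that lies outside both formalisms.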

Nevertheless, Codd was appointed an IBM Fellow in 1976. In 1981, he was honoured with the Turing Award, the most prestigious award in computer science, comparable to the Fields Medal in mathematics. During the 1990s, his health deteriorated and he ceased work. Codd died of heart failure at his home in Williams Island, Florida, at the age of 79 on April 18, 2003.

At yovisto you can watch a lecture on 'Dataspaces' by Dr. Jens-Peter Dittrich of ETH Zürich, in which he discusses Codd's relational model.

References and Further Reading:
