Document Content Architecture

The Document Content Architecture, or DCA for short, is a standard developed by IBM for text documents in the early 1980s. DCA was used on mainframe and iSeries systems, and formed the basis of DisplayWrite's file format. DCA was later extended as MO:DCA (Mixed Object Document Content Architecture), which added embedded data files, like graphics.

DCA
Developed by: IBM
Type of format: Document file format
Extended to: MO:DCA

The original purpose of DCA was to provide a common document format that could be used across multiple IBM word processing platforms, such as the IBM PC, IBM mainframes, the Displaywriter dedicated word processor, and the IBM 5520 Administrative System.[1]

DCA defines two types of documents:[2]

  • Revisable-form Text (DCA/RFT), which is editable.
  • Final-form Text (DCA/FFT), which is "formatted for a particular output device and cannot be changed."

Description

DCA defines a data stream representing a document.

Documents may contain fonts, overlays and other resource objects required at presentation time to present the data properly. Documents may also contain resource objects, such as a document index and tagging elements, that support searching and navigating document data for a variety of application purposes.[3]:2

MO:DCA is the wrapper or container for various objects that can make up the document. Each object is defined by its own subordinate architecture. The architectures are:[3]:4

  • Presentation Text Object Content Architecture (PTOCA) describes formatted text, including text attributes such as font or color.
  • Image Object Content Architecture (IOCA) describes resolution-independent images.
  • Graphics Object Content Architecture (GOCA) describes vector graphic images. A variation of GOCA, AFP GOCA, is used in Advanced Function Presentation environments.
  • Bar Code Object Content Architecture (BCOCA) describes bar codes in a number of different formats.
  • Font Object Content Architecture (FOCA) describes fonts to be used in the document.
  • Color Management Object Content Architecture (CMOCA) describes required color management information.

Each architecture uses a series of binary structured fields to describe its corresponding object.
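
To make the idea of a stream of binary structured fields concrete, here is a minimal Python sketch of walking such a stream. The layout is an assumption made for illustration and is not taken from this article: each field is taken to start with an 8-byte introducer (a 2-byte big-endian length that counts the introducer itself, a 3-byte identifier, 1 flag byte, and 2 reserved bytes) followed by the field data. The names StructuredField, build_field and iter_structured_fields are hypothetical, and the identifier values in the usage lines are made up. The MO:DCA reference[3] defines the actual layout and identifiers.

```python
import struct
from typing import Iterator, NamedTuple

# Assumed introducer layout (illustrative, not taken from the article):
# 2-byte length + 3-byte identifier + 1 flag byte + 2 reserved bytes.
INTRODUCER_LEN = 8


class StructuredField(NamedTuple):
    identifier: bytes  # 3-byte code naming the field type
    flags: int         # single flag byte
    data: bytes        # payload following the introducer


def build_field(identifier: bytes, data: bytes = b"", flags: int = 0) -> bytes:
    """Serialize one field using the assumed layout (hypothetical helper)."""
    if len(identifier) != 3:
        raise ValueError("identifier must be exactly 3 bytes")
    length = INTRODUCER_LEN + len(data)
    return struct.pack(">H3sBH", length, identifier, flags, 0) + data


def iter_structured_fields(stream: bytes) -> Iterator[StructuredField]:
    """Yield each field in turn, using the length prefix to find the next one."""
    offset = 0
    while offset < len(stream):
        if len(stream) - offset < INTRODUCER_LEN:
            raise ValueError(f"truncated introducer at offset {offset}")
        (length,) = struct.unpack_from(">H", stream, offset)
        if length < INTRODUCER_LEN or offset + length > len(stream):
            raise ValueError(f"bad field length {length} at offset {offset}")
        identifier = stream[offset + 2:offset + 5]
        flags = stream[offset + 5]
        data = stream[offset + INTRODUCER_LEN:offset + length]
        yield StructuredField(identifier, flags, data)
        offset += length


# Usage: three fields with made-up identifiers standing in for a
# "begin", a "text data" and an "end" field.
stream = (
    build_field(b"\x01\x02\x03")
    + build_field(b"\x0a\x0b\x0c", b"hello")
    + build_field(b"\x01\x02\x04")
)
fields = list(iter_structured_fields(stream))
```

Because every field carries its own length, a reader can skip fields it does not understand, which is one reason a container format can host objects defined by separate subordinate architectures.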

History

The effort to create international standards for the DCAs began in 1980 at the IBM Rochester facility. A team of two MO:DCA architects, an RTOCA architect, and a PTOCA architect was assembled. These architects, as they were called, were responsible for building consensus within IBM on the design of the data streams and for taking the work into the international standards arena. There was a concerted effort to bring the international community into the development. This decision was based in part on the experience of getting GML accepted as the international SGML standard: SGML standardization[4] had taken many years and man-hours, and the team wanted to avoid a similar delay by involving everyone early. IBM's work on document content had been driven by the needs of mainframe computers, where GML and DCA were in use, and that experience pointed to a need for standardized component architectures, for revisable and non-revisable text in particular.

In 1981, shortly after its inception, the group was moved along with the IBM 5280 Distributed Data System to IBM Austin near Round Rock, TX, where the work continued with mixed success. As the architectures became more firmly established on the international stage, the team was moved again in 1987 to the IBM Dallas Programming Center near Roanoke, Texas (Westlake). There, in 1998, the team was disbanded and work on the DCA architectures was discontinued, mainly because the PC community had, of necessity, gone in a different direction. After 18 years the DCA architectures were complete, but their details were not fully agreed upon, and there were no active implementations in sight.

The PC world had settled on HTML (believed to be an application of the SGML international standard) and used portions of it for its own purposes. Microsoft Word eventually used a similar data stream as its internal working format for storing editable content. Even though the SGML standard was available, a full SGML parser implementation was impractical, so a subset of it became the de facto standard for revisable text used today in the PC arena.

At about the same time, Adobe Systems designed and produced the printable document encoding called PDF, which has become the standard for PC-produced printable documents. The international standard was set in 2008 with no input other than that of the users, who chose the products on offer in greater numbers than the managers of the data stream architects had ever dreamed possible. The decision was driven by the need for the product, and the solution found was far more acceptable than anything the standards committees could have designed in the time frame in which a decision was needed. Over 10 years of work had not produced an acceptable method, and the PC computing community created what it needed in less time.

The attempt to achieve a consensus document data stream was quickly outflanked by usable products from companies that did not attempt to share with others but instead created a workable solution and sold it to users, who liked it. As a result, the output of word processing software is 'printed' into the PDF format provided by the most used presentation product. For example, Microsoft Word provides the printer selection 'Microsoft Print to PDF' to produce a PDF document, a very acceptable solution for most users. A similar method could have been used to produce the international standard format had one eventually arrived.

When IBM disbanded its Dallas Programming Center in 1998, the entire staff of architects retired and left the company, except the manager, who was moved to another location and position. This ended the DCA architecture project at IBM for the foreseeable future.

References

  1. Henkel, Tom (21 May 1984), "IBM taking the standardization route to DPP", Computerworld, IDG Enterprise, 18 (21), p. 7, ISSN 0010-4841
  2. "PC Magazine Encyclopedia". Retrieved July 25, 2012.
  3. IBM Corporation (May 2006). Mixed Object Document Content Architecture Reference (PDF). Retrieved February 7, 2020.
  4. http://www.sgmlsource.com/


This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.