Piece table
A Piece Table is a data structure typically used to represent a series of edits on a text document. An initial reference (or 'span') to the whole of the original file is created, with subsequent inserts and deletes being created as combinations of one, two, or three references to sections of either the original document or of the spans associated with earlier inserts.[1]
Typically the text of the original document is held in one immutable block, and the text of each subsequent insert is stored in new immutable blocks. Because even deleted text is still included in the piece table, this makes multi-level or unlimited undo easier to implement with a piece table than with alternative data structures such as a gap buffer.
This data structure was invented by J Strother Moore.[2]
Description
For this description, we use buffer as the immutable block to hold the contents.
A piece table consists of three columns:[1]
- Which buffer
- Start index in the buffer
- Length in the buffer
In addition to the table, two buffers are used to handle edits:
- "Original buffer": A buffer to the original text document. This buffer is read-only.
- "Add buffer": A buffer to a temporary file. This buffer is append-only.
Operations
Index
Definition:
Index(i)
: return the character at position i
To retrieve the i-th character, the appropriate entry in a piece table is read.
Example
Given the following buffers and piece table:
Buffer | Content |
---|---|
Original file | ipsum sit amet |
Add file | Lorem deletedtext dolor |
Which | Start Index | Length |
---|---|---|
Add | 0 | 6 |
Original | 0 | 6 |
Add | 18 | 5 |
Original | 6 | 9 |
To access the i-th character, the appropriate entry in the piece table is looked up.
For instance, to get the value of Index(15)
, the 3rd entry of piece table is retrieved. This is because the 3rd entry describes the characters from index 12 to 17 (the first entry describes characters in index 0 to 5, the next one is 6 to 11). The piece table entry instructs the program to look for the characters in the "add file" buffer, starting at index 18 in that buffer. The relative index in that entry is 3 (15 - 12), making the value of Index(15)
the character "o".
For the buffers and piece table given above, the following text is shown:
Lorem ipsum dolor sit amet
The buffer implies such a sequence of action:
start: ipsum sit amet (piece table is now Original[0,9]) insert: Lorem ipsum sit amet (piece table is now Add[0,6], Original[0,9]) insertion of " deletedtext " somewhere "deletedtext" is removed (but it stays in the Add buffer) insert: Lorem ipsum dolor sit amet (piece table ends at the final state)
Insert
Inserting characters to the text consists of:
- Appending characters to the "add file" buffer, and
- Updating the entry in piece table (breaking an entry into two or three)
Delete
Deletion involves only modifying the appropriate entry in piece table.
Time complexity
Although a piece table itself is a data structure, it does not state how it should be implemented. The time complexity of operations depends on how the table is being implemented. One possible implementation of a piece table is in a splay tree, because it provides quick access to recently-accessed elements.[3]
Usage
Several text editors use an in-RAM piece table internally, including Bravo,[1]Abiword[4][5][6], Atom[3] and Visual Studio Code.[7]
The "fast save" feature in some versions of Microsoft Word uses a piece table for the on-disk file format.[2]
The on-disk representation of text files in the Oberon System uses a piece chain technique that allows pieces of one document to point to text stored in some other document, similar to transclusion. [8]
See also
- Rope (computer science)
- Gap buffer, a data structure commonly used in text editors that allows efficient insertion and deletion operations clustered near the same location
References
- Crowley, Charles (1998-06-10). "Data Structures for Text Sequences - 6.4 The piece table method" (PDF). Archived from the original (PDF) on 2018-02-23. Cite journal requires
|journal=
(help) - David Lu. "What's been wrought using the Piece Table?". (discussion)
- "Atom's new concurrency-friendly buffer implementation"
- "AbiWord Development: Piece Table Background".
- James Brown. "Piece Chains: Design & Implementation of a Win32 Text Editor".
- Joaquin Cuenca Abela. "Improving the AbiWord's Piece Table".
- "VS Code 1.21 Release Notes (source code)
- Niklaus Wirth, Jürg Gutknecht. "Project Oberon: The Design of an Operating System and Compiler" Archived 2013-04-12 at the Wayback Machine. 2005. p. 90.