Polish orthography

Polish orthography is the system of writing the Polish language. The language is written using the Polish alphabet, which derives from the Latin alphabet, but includes some additional letters with diacritics.[1]:6 The orthography is mostly phonetic, or rather phonemic – the written letters (or combinations of them) correspond in a consistent manner to the sounds, or rather the phonemes, of spoken Polish. For detailed information about the system of phonemes, see Polish phonology.

Polish alphabet

The diacritics used in the Polish alphabet are the kreska (graphically similar to the acute accent) in the letters ć, ń, ó, ś, ź; the kreska ukośna (stroke) in the letter ł; the kropka (overdot) in the letter ż; and the ogonek ("little tail") in the letters ą, ę. There are 35 letters[1]:4[2] in the Polish alphabet: 9 vowels and 26 consonants.

The Polish alphabet (q, v and x are not used in native words)
Majuscule forms (also called uppercase or capital letters)
A Ą B C Ć D E Ę F G H I J K L Ł M N Ń O Ó P Q R S Ś T U V W X Y Z Ź Ż
Minuscule forms (also called lowercase or small letters)
a ą b c ć d e ę f g h i j k l ł m n ń o ó p q r s ś t u v w x y z ź ż
Names of letters
a, ą, be, ce, cie, de, e, ę, ef, gie, ha, i, jot, ka, el, eł, em, en, eń, o, o kreskowane, pe, ku, er, es, eś, te, u, fau, wu, iks, igrek, zet, ziet, żet

The letters q (named ku), v (named fau or, rarely, we[3]), and x (named iks) are used in some foreign words and commercial names. In loanwords they are often replaced by kw, w, and ks or gz, respectively (as in kwarc "quartz", weranda "veranda", ekstra "extra", egzosfera "exosphere").

When spelling words aloud, certain letters may be named more emphatically to distinguish them from identically pronounced letters or digraphs. For example, H may be referred to as samo h ("h alone") to distinguish it from CH (ce ha). The letter Ż may be called żet (or zet) z kropką ("ż with a dot") to distinguish it from RZ (er zet). The letter U may be called u otwarte ("open u", a reference to its graphical form) or u zwykłe ("regular u") to distinguish it from Ó, which is sometimes called ó zamknięte ("closed ó"), ó kreskowane or ó z kreską ("ó with a stroke accent"), or alternatively o kreskowane or o z kreską ("o with a stroke accent").

The letter ó is a relic of a period several centuries ago when Polish, like Czech, distinguished vowel length, and á and é were also in common use. The length distinction subsequently disappeared and á and é were abolished, but ó was retained and came to be pronounced the same as u.

Note that Polish letters with diacritics are treated as fully independent letters in alphabetical ordering (unlike in languages such as French and Spanish). For example, być comes after bycie. The diacritic letters also have their own sections in dictionaries (words beginning with ć are not usually listed under c). However, there are no regular words that begin with ą or ń.

Digraphs

Polish additionally uses the digraphs ch, cz, dz, dź, dż, rz, and sz. Combinations of certain consonants with the letter i before a vowel can also be considered digraphs: ci as a positional variant of ć, si of ś, zi of ź, and ni of ń (but see the special remark on ni below); there is also one trigraph, dzi, as a positional variant of dź. These receive no special treatment in alphabetical ordering. For example, ch is treated simply as c followed by h, and not as a single letter as in Czech or Slovak (e.g. Chojnice has only its first letter capitalized and is sorted before Cybina).
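The ordering described above (być after bycie, Chojnice before Cybina) can be sketched as a sort key that ranks each letter by its position in the 35-letter alphabet. This is a minimal illustration assuming plain letter-by-letter comparison; real collation libraries such as ICU also handle case, punctuation and tie-breaking, and the names `POLISH_ALPHABET` and `polish_sort_key` are hypothetical.

```python
# Hypothetical helper: rank letters by their position in the Polish alphabet.
POLISH_ALPHABET = "aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż"
RANK = {letter: i for i, letter in enumerate(POLISH_ALPHABET)}


def polish_sort_key(word: str) -> list[int]:
    """Return a key that compares words letter by letter.

    Diacritic letters are separate letters (c < ć < d), and digraphs such
    as ch or rz are just sequences of ordinary letters. Characters outside
    the alphabet are ranked after it, by code point.
    """
    return [RANK.get(ch, len(POLISH_ALPHABET) + ord(ch)) for ch in word.lower()]


words = ["być", "Cybina", "ćma", "bycie", "Chojnice", "dom"]
print(sorted(words, key=polish_sort_key))
```

This prints `['bycie', 'być', 'Chojnice', 'Cybina', 'ćma', 'dom']`: c sorts before ć, and the h of Chojnice is compared directly with the y of Cybina.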

Spelling rules

Vowels
Grapheme Usual value Other values
a /a/
ą /ɔ̃/ [ɔn], [ɔŋ], [ɔm]; merges with /ɔ/ before /w/ (see below)
e /ɛ/
ę /ɛ̃/ [ɛn], [ɛŋ], [ɛm]; merges with /ɛ/ before /w/ and often word-finally (see below)
i /i/ [j] before a vowel; marks palatalization of the preceding consonant before a vowel (see below)
o /ɔ/
ó /u/
u /u/ [w] after a or e in some loanwords (see below)
y /ɨ/
Consonants
Grapheme Usual value Value when voiced or devoiced
b /b/ [p] if devoiced
c¹ /t͡s/ [d͡z] if voiced
ć¹ /t͡ɕ/ [d͡ʑ] if voiced
cz /t͡ʂ/ [d͡ʐ] if voiced
d /d/ [t] if devoiced
dz¹ /d͡z/ [t͡s] if devoiced
dź¹ /d͡ʑ/ [t͡ɕ] if devoiced
dż /d͡ʐ/ [t͡ʂ] if devoiced
f /f/ [v] if voiced
g /ɡ/ [k] if devoiced
h², ch /x/ [ɣ] if voiced
j /j/
k /k/ [ɡ] if voiced
l /l/
ł /w/
m /m/
n¹ /n/
ń¹ /ɲ/
p /p/ [b] if voiced
r /r/
s¹ /s/ [z] if voiced
ś¹ /ɕ/ [ʑ] if voiced
sz /ʂ/ [ʐ] if voiced
t /t/ [d] if voiced
w /v/ [f] if devoiced
z¹ /z/ [s] if devoiced
ź¹ /ʑ/ [ɕ] if devoiced
ż, rz³ /ʐ/ [ʂ] if devoiced

^1 See below for rules regarding spelling of alveolo-palatal consonants.

^2 H may be glottal [ɦ] in a small number of dialects.

^3 Rarely, rz is not a digraph and represents two separate sounds:

  • in various forms of the verb zamarzać - "to freeze"
  • in various forms of the verb mierzić - "to disgust"
  • in the place name Murzasichle
  • in borrowings, for example erzac (from German Ersatz), Tarzan

Voicing and devoicing

Voiced consonant letters frequently come to represent voiceless sounds (as shown in the above tables). This is due to the neutralization that occurs at the end of words and in certain consonant clusters; for example, the b in klub ("club") is pronounced like a p, and the rz in prze- sounds like sz. Less frequently, voiceless consonant letters can represent voiced sounds; for example, the k in także ("also") is pronounced like a g. The conditions for this neutralization are described under Voicing and devoicing in the article on Polish phonology.
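Word-final devoicing, the simplest of these cases, can be sketched as a respelling function. This assumes that only the final obstruent cluster of an isolated word is affected; assimilation inside words (as in także or prze-) is not modelled. The letter pairs come from the consonant table above, and `devoice_final`, `DEVOICE` and `VOICELESS` are hypothetical names.

```python
# Voiced letters and the voiceless letters they sound like when devoiced
# (pairs from the consonant table). Two-letter graphemes are checked first.
DEVOICE = {
    "dź": "ć", "dż": "cz", "dz": "c", "rz": "sz",
    "b": "p", "d": "t", "g": "k", "w": "f",
    "z": "s", "ź": "ś", "ż": "sz",
}
# Voiceless obstruents stay as they are but still belong to the cluster.
VOICELESS = {"ch", "cz", "sz", "p", "t", "k", "f", "s", "ś", "c", "ć", "h"}


def devoice_final(word: str) -> str:
    """Respell the word-final obstruent cluster as it is pronounced."""
    rest, tail = word, []
    while rest:
        for size in (2, 1):  # try digraphs first, then single letters
            grapheme = rest[-size:]
            if len(grapheme) == size and (grapheme in DEVOICE or grapheme in VOICELESS):
                break
        else:
            break  # a vowel or sonorant ends the cluster
        tail.append(DEVOICE.get(grapheme, grapheme))
        rest = rest[:-size]
    return rest + "".join(reversed(tail))


for word in ["klub", "krew", "mózg", "twarz", "kot"]:
    print(word, "->", devoice_final(word))
```

Checking sz and rz before single letters matters: without it, the z of kosz would wrongly be treated as a voiced letter.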

Palatal and palatalized consonants

The spelling rule for the alveolo-palatal sounds /ɕ/, /ʑ/, /t͡ɕ/, /d͡ʑ/ and /ɲ/ is as follows: before the vowel i the plain letters s z c dz n are used; before other vowels the combinations si zi ci dzi ni are used; when not followed by a vowel the diacritic forms ś ź ć dź ń are used. For example, the s in siwy ("grey-haired"), the si in siarka ("sulphur") and the ś in święty ("holy") all represent the sound /ɕ/.

Sound Word-finally or before a consonant Before a vowel other than i Before i
/t͡ɕ/ ć ci c
/d͡ʑ/ dź dzi dz
/ɕ/ ś si s
/ʑ/ ź zi z
/ɲ/ ń ni n
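The table above amounts to a three-way choice based on the next letter. A minimal Python sketch of that choice, going from sound to spelling only; it assumes native words and ignores the foreign-word cases with n described next. `spell_palatal` and `SPELLINGS` are hypothetical names.

```python
# Spelling of each alveolo-palatal sound in the three positions of the table:
# (word-finally or before a consonant, before a vowel other than i, before i)
SPELLINGS = {
    "t͡ɕ": ("ć", "ci", "c"),
    "d͡ʑ": ("dź", "dzi", "dz"),
    "ɕ": ("ś", "si", "s"),
    "ʑ": ("ź", "zi", "z"),
    "ɲ": ("ń", "ni", "n"),
}
VOWELS = set("aąeęioóuy")


def spell_palatal(sound: str, next_letter: str) -> str:
    """Choose the spelling of an alveolo-palatal sound from the next letter.

    `next_letter` is "" at the end of a word.
    """
    final_or_consonant, before_vowel, before_i = SPELLINGS[sound]
    if next_letter == "i":
        return before_i
    if next_letter in VOWELS:
        return before_vowel
    return final_or_consonant


# /ɕ/ in siwy, siarka and święty
print(spell_palatal("ɕ", "i"), spell_palatal("ɕ", "a"), spell_palatal("ɕ", "w"))
```

For the three example words this gives s, si and ś, matching the spellings siwy, siarka and święty.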

Special attention should be paid to n before i plus a vowel. In words of foreign origin, the i palatalizes the preceding n to /ɲ/ and is itself pronounced /j/. This occurs when the corresponding genitive form ends in -nii, pronounced /ɲji/, rather than in -ni, pronounced /ɲi/ (which is typical of words of Polish origin). For examples, see the table in the next section.

Similar principles apply to the palatalized consonants /kʲ/, /ɡʲ/ and /xʲ/, except that these can only occur before vowels. The spellings are thus k, g, (c)h before i, and ki, gi, (c)hi otherwise. For example, the k in kim ("whom", instr.) and the ki in kiedy ("when") both represent /kʲ/.

Other issues with i and j

Except in the cases mentioned in the previous paragraph, the letter i if followed by another vowel in the same word usually represents /j/, but it also has the palatalizing effect on the previous consonant. For example, pies ("dog") is pronounced [pʲjɛs]. Some words with n before i plus a vowel also follow this pattern (see below). In fact i is the usual spelling of /j/ between a preceding consonant and a following vowel. The letter j normally appears in this position only after c, s and z if the palatalization effect described above has to be avoided (as in presja "pressure", Azja "Asia", lekcja "lesson", and the common suffixes -cja "-tion", -zja "-sion": stacja "station", wizja "vision"). The letter j after consonants is also used in concatenation of two words if the second word in the pair starts with j, e.g. wjazd "entrance" originates from w + jazd(a). The pronunciation of the sequence wja (in wjazd) is the same as the pronunciation of wia (in wiadro "bucket").

The ending -ii, which appears in the inflected forms of some nouns of foreign origin that have -ia in the nominative case (always after g, k, l, and r; sometimes after m, n, and other consonants), is pronounced as [ji], with palatalization of the preceding consonant. For example: dalii (genitive of dalia "dahlia"), Bułgarii (genitive of Bułgaria "Bulgaria"), chemii (genitive of chemia "chemistry"), religii (genitive of religia "religion"), amfibii (genitive of amfibia "amphibia"). In everyday speech, however, the ending is commonly pronounced [i]. This is why children often misspell such forms with a single -i (armi instead of armii, Dani instead of Danii), or hypercorrectly write ziemii instead of ziemi (words of Polish origin do not have the ending -ii but simply -i, e.g. ziemi, genitive of ziemia).

In some rare cases, however, when the consonant in question is preceded by another consonant, -ii may be pronounced as [i] while the preceding consonant is still palatalized; for example, Anglii (genitive etc. of Anglia "England") is pronounced [anɡlʲi]. (The spelling Angli, very frequently met with on the Internet, is simply an orthographic error caused by this pronunciation.)

A special situation applies to n: it is fully palatalized to [ɲ] before -ii pronounced as [ji], and this occurs only when the corresponding nominative form in -nia is pronounced as [ɲja], not as [ɲa].

For example (note the upper- and lower-case letters):

Case Word Pronunciation Meaning Word Pronunciation Meaning
Nominative dania /daɲa/ dishes (plural) Dania /daɲja/ Denmark
Genitive (dań) (/daɲ/) (of dishes) Danii /daɲji/ of Denmark
Nominative Mania /maɲa/ Mary (diminutive of "Maria") mania /maɲja/ mania
Genitive (Mani) (/maɲi/) (of Mary) manii /maɲji/ of mania

The ending -ji is always pronounced /ji/. It appears only after c, s and z, and pronouncing it as a simple /i/ is considered an error. For example, presji (genitive etc. of presja "pressure") is /prɛsji/; poezji (genitive etc. of poezja "poetry") is /pɔɛzji/; racji (genitive etc. of racja "reason") is /rat͡sji/.

Nasal vowels

The letters ą and ę, when followed by plosives and affricates, represent an oral vowel followed by a nasal consonant, rather than a nasal vowel. For example, ą in dąb ("oak") is pronounced /ɔm/, and ę in tęcza ("rainbow") is pronounced /ɛn/ (the nasal assimilates with the following consonant). When followed by l or ł (and in the case of ę, often at the end of words) these letters are pronounced as just /ɔ/ or /ɛ/.
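The assimilation described above can be sketched as a lookup from the following grapheme to the nasal consonant it calls for. This is a simplification under stated assumptions: `pronounce_nasal` is a hypothetical name, the ɲ group (before ć and dź, as in pięć) extends the three variants listed in the vowel table on the same assimilation principle, and regional and stylistic variation is ignored.

```python
ORAL = {"ą": "ɔ", "ę": "ɛ"}
# Following plosives and affricates (as graphemes), grouped by the nasal
# consonant they call for: the nasal takes the place of articulation of
# the consonant after it.
NASAL_BEFORE = {
    "m": {"p", "b"},
    "n": {"t", "d", "c", "dz", "cz", "dż"},
    "ɲ": {"ć", "dź"},  # assumption: extends the vowel table's three variants
    "ŋ": {"k", "g"},
}


def pronounce_nasal(letter: str, following: str) -> str:
    """Approximate IPA for ą or ę, given the next grapheme ("" word-finally)."""
    vowel = ORAL[letter]
    for nasal, graphemes in NASAL_BEFORE.items():
        if following in graphemes:
            return vowel + nasal
    if following in {"l", "ł"}:
        return vowel  # plain oral vowel before l and ł
    if letter == "ę" and following == "":
        return vowel  # word-final ę is often pronounced as plain /ɛ/
    return vowel + "\u0303"  # nasal vowel otherwise, e.g. before fricatives


for letter, following in [("ą", "b"), ("ę", "cz"), ("ę", "k"), ("ę", "ł"), ("ą", "s")]:
    print(letter, following, pronounce_nasal(letter, following))
```

The first two cases reproduce the dąb (/ɔm/) and tęcza (/ɛn/) examples from the text.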

Homophonic spellings[4]

Apart from the cases in the sections above, there are three sounds in Polish that can be spelt in two different ways, depending on the word. These result from historical sound changes. The correct spelling can often be deduced from the spelling of other morphological forms of the word or cognates in Polish or in other Slavic languages.

  • /x/ can be spelt either h or ch.
    • h only occurs in loanwords; however, many of them have been nativized and are not perceived as loanwords. h is used:
      • when cognate words have the letter g, ż or z, e.g.:
        wahadło - waga
        druh - drużyna
        błahy - błazen
      • when the same letter is used in the language from which the word was borrowed, e.g. Greek prefixes hekto-, hetero-, homo-, hipo-, hiper-, hydro-, also honor, historia, herbata, etc.
    • ch is used:
      • in all native words, e.g. chyba, chrust, chrapać, chować, chcieć
      • when the same digraph is used in the language from which the word was borrowed, e.g. chór, echo, charakter, chronologia, etc.
  • /u/ can be spelt u or ó; the spelling ó indicates that the sound developed from the historical long /oː/.
    • u is used:
      • usually at the beginning of a word (except for ósemka, ósmy, ów, ówczesny, ówdzie)
      • always at the end of a word
      • in the endings -uch, -ucha, -uchna, -uchny, -uga, -ula, -ulec, -ulek, -uleńka, -ulka, -ulo, -un, -unek, -uni, -unia, -unio, -ur, -us, -usi, -usieńki, -usia, -uszek, -uszka, -uszko, -uś, -utki
    • ó is used:
      • when cognate words or other morphological forms have the letter o, e or a, e.g.:
        mróz - mrozu
        wiózł - wieźć
        skrócić - skracać
      • in the endings -ów, -ówka, -ówna (except for zasuwka, skuwka, wsuwka)
  • /ʐ/ can be spelt either ż or rz; the spelling rz indicates that the sound developed from /r̝/ (cf. Czech ř).
    • ż is used:
      • when cognate words or other morphological forms have the letter/digraph g, dz, h, z, ź, s, e.g.:
        może - mogę
        mosiężny - mosiądz
        drużyna - druh
        każe - kazać
        wożę - woźnica
        bliżej - blisko
      • in the particle że, e.g. skądże, tenże, także
      • after l, ł, r, e.g.:
        lżej
        łże
        rżysko
      • in loanwords, especially from French, e.g.:
        rewanż
        żakiet
        garaż
      • when cognates in other Slavic languages contain the sound /ʐ/ or /ʒ/, e.g. żuraw - Russian журавль
    • rz is used:
      • when cognate words or other morphological forms have the letter r, e.g. morze - morski, karze - kara
      • usually after p, b, t, d, k, g, ch, j, w, e.g.:
        przygoda
        brzeg
        trzy
        drzewo
        krzywy
        grzywa
        chrzest
        ujrzeć
        wrzeć
      • when cognates in other Slavic languages contain the sound /r/ or /r̝/, e.g. rzeka - Russian река

Other points

The letter u represents /w/ in the digraphs au and eu in loanwords, for example autor, Europa; but not in native words, like nauka, pronounced [naˈu.ka].

There are certain clusters where a written consonant would not normally be pronounced. For example, the ł in the words mógł ("could") and jabłko ("apple") is omitted in ordinary speech.

Capitalization

Names are generally capitalized in Polish as in English. Polish does not capitalize the months and days of the week, nor adjectives and other forms derived from proper nouns (for example, angielski "English").

Titles such as pan ("Mr"), pani ("Mrs/Ms"), lekarz ("doctor"), etc. and their abbreviations are not capitalized, except in written polite address. Second-person pronouns are traditionally capitalized in formal writing (e.g. letters or official emails); so may be other words used to refer to someone directly in a formal setting, like Czytelnik ("reader", in newspapers or books). Third-person pronouns are capitalized to show reverence, most often in a sacred context.

Punctuation

Polish punctuation is similar to that of English. However, there are more rigid rules concerning the use of commas: subordinate clauses are almost always set off with a comma, while it is normally considered incorrect to use a comma before a coordinating conjunction meaning "and" (i, a or oraz).

Abbreviations (but not acronyms or initialisms) are followed by a period when they end with a letter other than the last letter of the full word. For example, dr has no period when it stands for doktor, but takes one when it stands for an inflected form such as doktora; prof. takes a period because it stands for profesor ("professor").
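The rule compares only the last letters of the abbreviation and the full word, so it can be stated as a one-line check. A minimal Python sketch, assuming the full (possibly inflected) word is known; `abbreviate` is a hypothetical helper, and it should not be applied to acronyms or initialisms.

```python
def abbreviate(abbreviation: str, full_word: str) -> str:
    """Add a period unless the abbreviation ends with the full word's last letter.

    Applies to abbreviations only, not to acronyms or initialisms.
    """
    if abbreviation[-1].lower() == full_word[-1].lower():
        return abbreviation
    return abbreviation + "."


print(abbreviate("dr", "doktor"), abbreviate("dr", "doktora"), abbreviate("prof", "profesor"))
```

This prints `dr dr. prof.`, matching the three examples above.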

Apostrophes are used to mark the elision of the final sound of foreign words not pronounced before Polish inflectional endings, as in Harry'ego ([xaˈrɛɡɔ], genitive of Harry [ˈxarɨ] - the final [ɨ] is elided in the genitive). However, it is often erroneously used to separate a loanword stem from any inflectional ending, for example, *John'a, which should be Johna (genitive of John; no sound is elided).

Quotation marks are used in different ways: either „ordinary Polish quotes” or «French quotes» (without space) for first level, and ‚single Polish quotes’ or «French quotes» for second level, which gives three styles of nested quotes:

  1. „Quote ‚inside’ quote”
  2. „Quote «inside» quote”
  3. «Quote ‚inside’ quote»

Some older prints have used „such Polish quotes“.

History

Poles adopted the Latin alphabet in the 12th century. However, that alphabet was ill-equipped to represent certain Polish sounds, such as the palatal consonants and nasal vowels. Consequently, Polish spelling in the Middle Ages was highly inconsistent, as different writers used different systems to represent these sounds. For example, in early documents the letter c could signify the sounds now written c, cz and k, while the letter z was used for the sounds now written z, ż, ś and ź. Writers soon began to experiment with digraphs (combinations of letters), new letters (φ and ſ, no longer used), and eventually diacritics.

The Polish alphabet was one of two major forms of Latin-based orthography developed for Slavic languages, the other being Czech orthography, characterized by carons (háčeks), as in the letter č. The other major Slavic languages now written in Latin-based alphabets (Slovak, Slovene, and Serbo-Croatian) use systems similar to the Czech one. However, a Polish-based orthography is used for Kashubian and usually for Silesian, while the Sorbian languages use elements of both systems.

Computer encoding

There are several different systems for encoding the Polish alphabet for computers. All letters of the Polish alphabet are included in Unicode, and thus Unicode-based encodings such as UTF-8 and UTF-16 can be used. The Polish alphabet is completely included in the Basic Multilingual Plane of Unicode. ISO 8859-2 (Latin-2), ISO 8859-13 (Latin-7), ISO 8859-16 (Latin-10) and Windows-1250 are popular 8-bit encodings that support the Polish alphabet.

The Polish letters which are not present in the English alphabet use the following HTML character entities[5] and Unicode codepoints:[6][7]

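Upper case Ą Ć Ę Ł Ń Ó Ś Ź Ż
HTML entity (named) &Aogon; &Cacute; &Eogon; &Lstrok; &Nacute; &Oacute; &Sacute; &Zacute; &Zdot;
HTML entity (numeric) &#260; &#262; &#280; &#321; &#323; &#211; &#346; &#377; &#379;
Unicode U+0104 U+0106 U+0118 U+0141 U+0143 U+00D3 U+015A U+0179 U+017B
Lower case ą ć ę ł ń ó ś ź ż
HTML entity (named) &aogon; &cacute; &eogon; &lstrok; &nacute; &oacute; &sacute; &zacute; &zdot;
HTML entity (numeric) &#261; &#263; &#281; &#322; &#324; &#243; &#347; &#378; &#380;
Unicode U+0105 U+0107 U+0119 U+0142 U+0144 U+00F3 U+015B U+017A U+017C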
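The code points and decimal numeric references in the table above can be generated rather than typed by hand. A short Python sketch using only the standard library: `ord` gives the code point, and `html.unescape` decodes both numeric references and the HTML5 named references. The helpers `code_point` and `numeric_reference` are hypothetical names.

```python
import html

UPPER = "ĄĆĘŁŃÓŚŹŻ"
LOWER = "ąćęłńóśźż"


def code_point(ch: str) -> str:
    """Format a character's Unicode code point, e.g. U+0104."""
    return f"U+{ord(ch):04X}"


def numeric_reference(ch: str) -> str:
    """Decimal HTML numeric character reference, e.g. &#260;."""
    return f"&#{ord(ch)};"


for ch in UPPER + LOWER:
    print(ch, code_point(ch), numeric_reference(ch))

# html.unescape understands both named and numeric references.
print(html.unescape("&Zdot;&oacute;&lstrok;&cacute;"), html.unescape("&#260;"))
```

The last line prints `Żółć Ą`.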

For other encodings, see the following table. Numbers in the table are hexadecimal.

Other encodings
Character set Ą Ć Ę Ł Ń Ó Ś Ź Ż ą ć ę ł ń ó ś ź ż
ISO 8859-2 A1 C6 CA A3 D1 D3 A6 AC AF B1 E6 EA B3 F1 F3 B6 BC BF
Windows-1250 A5 C6 CA A3 D1 D3 8C 8F AF B9 E6 EA B3 F1 F3 9C 9F BF
IBM 852 A4 8F A8 9D E3 E0 97 8D BD A5 86 A9 88 E4 A2 98 AB BE
Mazovia 8F 95 90 9C A5 A3 98 A0 A1 86 8D 91 92 A4 A2 9E A6 A7
Mac 84 8C A2 FC C1 EE E5 8F FB 88 8D AB B8 C4 97 E6 90 FD
ISO 8859-13 and Windows-1257 C0 C3 C6 D9 D1 D3 DA CA DD E0 E3 E6 F9 F1 F3 FA EA FD
ISO 8859-16 A1 C5 DD A3 D1 D3 D7 AC AF A2 E5 FD B3 F1 F3 F7 AE BF
IBM 775 B5 80 B7 AD E0 E3 97 8D A3 D0 87 D3 88 E7 A2 98 A5 A4
CSK 80 81 82 83 84 85 86 88 87 A0 A1 A2 A3 A4 A5 A6 A8 A7
Cyfromat 80 81 82 83 84 85 86 88 87 90 91 92 93 94 95 96 98 97
DHN 80 81 82 83 84 85 86 88 87 89 8A 8B 8C 8D 8E 8F 91 90
IINTE-ISIS 80 81 82 83 84 85 86 87 88 90 91 92 93 94 95 96 97 98
IEA-Swierk 8F 80 90 9C A5 99 EB 9D 92 A0 9B 82 9F A4 A2 87 A8 91
Logic 80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F 90 91
Microvex 8F 80 90 9C A5 93 98 9D 92 A0 9B 82 9F A4 A2 87 A8 91
Ventura 97 99 A5 A6 92 8F 8E 90 80 96 94 A4 A7 91 A2 84 82 87
ELWRO-Junior C1 C3 C5 CC CE CF D3 DA D9 E1 E3 E5 EC EE EF F3 FA F9
AmigaPL C2 CA CB CE CF D3 D4 DA DB E2 EA EB EE EF F3 F4 FA FB
TeXPL 81 82 86 8A 8B D3 91 99 9B A1 A2 A6 AA AB F3 B1 B9 BB
Atari Club (Atari ST) C1 C2 C3 C4 C5 C6 C7 C8 C9 D1 D2 D3 D4 D5 D6 D7 D8 D9
CorelDraw! C5 F2 C9 A3 D1 D3 FF E1 ED E5 EC E6 C6 F1 F3 A5 AA BA
ATM C4 C7 CB D0 D1 D3 D6 DA DC E4 E7 EB F0 F1 F3 F6 FA FC

A common test sentence containing all the Polish diacritic letters is the nonsensical "Zażółć gęślą jaźń".
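Some rows of the encoding table can be checked by letting Python's built-in codecs produce the bytes, and the test sentence can be checked for completeness at the same time. The sketch assumes Python 3 and the standard codec names `iso8859_2`, `cp1250`, `cp852` and `iso8859_13`, corresponding to ISO 8859-2, Windows-1250, IBM 852 and ISO 8859-13; the helper `byte_row` is a hypothetical name.

```python
LETTERS = "ĄĆĘŁŃÓŚŹŻąćęłńóśźż"
SENTENCE = "Zażółć gęślą jaźń"

# Python codec names for some of the character sets in the table.
CODECS = {
    "ISO 8859-2": "iso8859_2",
    "Windows-1250": "cp1250",
    "IBM 852": "cp852",
    "ISO 8859-13": "iso8859_13",
}


def byte_row(codec: str) -> str:
    """Encode the Polish diacritic letters and format them like a table row."""
    return " ".join(f"{b:02X}" for b in LETTERS.encode(codec))


for name, codec in CODECS.items():
    print(f"{name:<13} {byte_row(codec)}")

# The test sentence contains every lowercase diacritic letter...
print(set(LETTERS.lower()) <= set(SENTENCE.lower()))
# ...and survives a round trip through an 8-bit encoding that supports Polish.
print(SENTENCE.encode("cp1250").decode("cp1250") == SENTENCE)
```

Both checks at the end print `True`; encoding the sentence with a codec that lacks these letters (for example `ascii`) would instead raise `UnicodeEncodeError`.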


Further reading

  • Sadowska, Iwona (2012). Polish: A Comprehensive Grammar. Oxford; New York City: Routledge. ISBN 978-0-415-47541-9.

References

  1. The Polish Language (PDF). Polish Language Council. ISBN 978-83-916268-2-5. Retrieved 5 November 2018.
  2. https://sjp.pwn.pl/poradnia/haslo/Q-V-X;10937.html
  3. "nazwa litery v". Poradnia Językowa PWN. Retrieved 5 September 2018.
  4. Słownik ortograficzny języka polskiego (XVI ed.). Warszawa: Wydawnictwo Naukowe PWN. 1993. pp. 17–21, 27–29.
  5. "HTML 5.1 2nd Edition: 8. The HTML syntax: §8.5: Named character references". www.w3.org. Retrieved 5 November 2018.
  6. "Latin Extended-A: Range: 0100–017F" (PDF). Retrieved 5 November 2018.
  7. "C1 Controls and Latin-1 Supplement: Range: 0080–00FF" (PDF). Retrieved 5 November 2018.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.