Most common words in Spanish

Below are two estimates of the most common words in Modern Spanish. Each estimate comes from an analysis of a different text corpus. A text corpus is a large collection of samples of written and/or spoken language that has been carefully prepared for linguistic analysis. To determine which words are the most common, researchers create a database of all the words found in the corpus and categorise them based on the context in which they are used.

The first table lists the 100 most common word forms from the Corpus de Referencia del Español Actual (CREA), a text corpus compiled by the Real Academia Española (RAE). The RAE is Spain's official institution for documenting, planning, and standardising the Spanish language. A word form is any of the grammatical variations of a word.

The second table lists the 100 most common lemmas found in a text corpus compiled by Mark Davies and other language researchers at Brigham Young University in the United States. A lemma is the primary form of a word, the one that would appear in a dictionary. The Spanish infinitive tener ("to have") is a lemma, while tiene ("has"), a conjugation of tener, is a word form.

Real Academia Española

The list below comes from "1000 formas más frecuentes" (transl. "1000 most frequent word forms"), a list published by the Real Academia Española (RAE) from analysis of more than 160 million word forms found in the Corpus de Referencia del Español Actual (transl. Reference Corpus of Current Spanish), or CREA. CREA is a computerised corpus of texts written in Spanish, and of transcripts of spoken Spanish. It includes books, magazines, and newspapers with a wide variety of content, as well as transcripts of spoken language from radio and television broadcasts and other sources. All the works in the collection are from 1975 to 2004. CREA includes samples from all Spanish-speaking countries.[1]

The list of "1000 most frequent word forms" comes from an analysis of CREA version 3.2.[2] Plurals, verb conjugations, and other inflections are ranked separately. Homonyms, however, are not distinguished from one another. CREA 3.2 was published in June 2008.[1]
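The counting rule described above (inflections ranked separately, homonyms merged) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample text and the function name `count_word_forms` are hypothetical, and the RAE's actual tooling is not described in the source.

```python
import re
from collections import Counter

# Hypothetical miniature "corpus"; a real corpus such as CREA holds millions of words.
SAMPLE_TEXT = (
    "La casa es grande. Las casas son grandes. "
    "Él tiene una casa y ella tiene dos casas."
)

def count_word_forms(text: str) -> Counter:
    """Count every distinct word form after lowercasing.

    Inflections ("casa" / "casas") stay separate, and homonyms are
    not told apart because only the spelling is compared.
    """
    # \w+ matches runs of letters (accented ones included), digits and underscores.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)

counts = count_word_forms(SAMPLE_TEXT)
for form, occurrences in counts.most_common(5):
    print(form, occurrences)
```

Because only spelling is compared, a form such as la is counted once whether it is used as an article or as a pronoun, which matches the way the list treats homonyms.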

Most frequent word forms out of ~160 million words
(RAE 2008)
Rank | Word form | Occurrences | Part of speech | Translation
1 | de | 9,999,518 | preposition | of; from
2 | la | 6,277,560 | article, pronoun | the; third person feminine singular pronoun
3 | que | 4,681,839 | conjunction | that, which
4 | el | 4,569,652 | article | the
5 | en | 4,234,281 | preposition | in, on
6 | y | 4,180,279 | conjunction | and
7 | a | 3,260,939 | preposition | to, at
8 | los | 2,618,657 | article, pronoun | the; third person masculine direct object
9 | se | 2,022,514 | pronoun | -self, oneself (reflexive)
10 | del | 1,857,225 | preposition | from the
11 | las | 1,686,741 | article, pronoun | the; third person feminine direct object
12 | un | 1,659,827 | article | a, an
13 | por | 1,561,904 | preposition | by, for, through
14 | con | 1,481,607 | preposition | with
15 | no | 1,465,503 | adverb | no; not
16 | una | 1,347,603 | article | a, an, one
17 | su | 1,103,617 | possessive | his/her/its/your
18 | para | 1,062,152 | preposition | for, to, in order to
19 | es | 1,019,669 | verb | is
20 | al | 951,054 | preposition | to the
21 | lo | 866,955 | article, pronoun | the; third person masculine direct object
22 | como | 773,465 | conjunction | like, as
23 | más | 661,696 | adjective | more
24 | o | 542,284 | conjunction | or
25 | pero | 450,512 | conjunction | but
26 | sus | 449,870 | possessive | his/her/its/their/your (before plural nouns)
27 | le | 413,241 | pronoun | third person indirect object
28 | ha | 380,339 | verb | he/she/it has [done something]; you (formal) have [done something]
29 | me | 374,368 | pronoun | me
30 | si | 327,480 | conjunction | if, whether
31 | sin | 298,383 | preposition | without
32 | sobre | 289,704 | preposition | on top of, over, about
33 | este | 285,461 | adjective | this
34 | ya | 274,177 | adverb | already; still
35 | entre | 267,493 | preposition | between
36 | cuando | 257,272 | conjunction | when
37 | todo | 247,340 | adjective | all, every
38 | esta | 238,841 | adjective | this
39 | ser | 232,924 | verb | to be
40 | son | 232,415 | verb | they are, you (pl.) are
41 | dos | 228,439 | number | two
42 | también | 227,411 | adverb | too, also, as well
43 | fue | 223,791 | verb | was
44 | había | 223,430 | verb | I/he/she/it/there was (or used to be)
45 | era | 219,933 | verb | was
46 | muy | 208,540 | adverb | very
47 | años | 203,027 | noun (masculine) | years
48 | hasta | 202,935 | preposition | until
49 | desde | 198,647 | preposition | from; since
50 | está | 194,168 | verb | is
51 | mi | 186,360 | possessive | my
52 | porque | 185,700 | conjunction | because
53 | qué | 184,956 | pronoun | what?; which?; how [adjective]
54 | sólo | 170,552 | adverb | only, solely
55 | han | 169,718 | verb | they/you (pl.) have [done something]
56 | yo | 167,684 | pronoun | I
57 | hay | 164,940 | verb | there is/are
58 | vez | 163,538 | noun (feminine) | time, instance
59 | puede | 161,219 | verb | can
60 | todos | 158,168 | adjective | all; every
61 | así | 155,645 | adverb | like that
62 | nos | 154,412 | pronoun | us
63 | ni | 153,451 | conjunction, adverb | neither; nor; not even
64 | parte | 148,750 | noun (masculine / feminine) | part; message
65 | tiene | 147,274 | verb | has
66 | él | 139,080 | pronoun (masculine) | he, it
67 | uno | 136,020 | number | one
68 | donde | 132,077 | preposition | where
69 | bien | 130,957 | adjective | fine, well
70 | tiempo | 130,896 | noun (masculine) | time; weather
71 | mismo | 130,746 | adjective | same
72 | ese | 127,976 | pronoun | that
73 | ahora | 125,661 | adverb | now
74 | cada | 124,558 | determiner | each; every
75 | e | 123,729 | conjunction | and
76 | vida | 123,491 | noun (feminine) | life
77 | otro | 121,983 | adjective | other, another
78 | después | 121,746 | preposition | after
79 | te | 120,052 | pronoun | to you, for you; yourself
80 | otros | 119,500 | pronoun | others
81 | aunque | 115,556 | conjunction | though, although, even though
82 | esa | 115,377 | adjective | that
83 | eso | 114,523 | pronoun | that
84 | hace | 114,507 | verb | he/she/it does/makes
85 | otra | 113,982 | adjective, pronoun | other; another
86 | gobierno | 113,011 | noun (masculine) | government
87 | tan | 112,471 | adverb | so
88 | durante | 112,020 | preposition | during
89 | siempre | 111,557 | adverb | always
90 | día | 110,921 | noun (masculine) | day
91 | tanto | 110,679 | adjective, adverb | so much
92 | ella | 110,620 | pronoun | she, her; it
93 | tres | 109,542 | number | three
94 | sí | 108,631 | noun, pronoun | yes; reflexive pronoun
95 | dijo | 108,471 | verb | said; told
96 | sido | 107,352 | past participle | been
97 | gran | 106,991 | adjective | large, great, big
98 | país | 104,568 | noun (masculine) | country
99 | según | 104,204 | preposition | as; according to
100 | menos | 103,498 | adjective | less; fewer

Mark Davies

In 2006, Mark Davies, an associate professor of linguistics at Brigham Young University, published his estimate of the 5000 most common words in Modern Spanish. To make this list, he compiled samples only from 20th-century sources—especially from the years 1970 to 2000. Most of the sources are from the 1990s. Of the 20 million words in the corpus, about one-third (~6,750,000 words) come from transcripts of spoken Spanish: conversations, interviews, lectures, sermons, press conferences, sports broadcasts, and so on. Among the written sources are novels, plays, short stories, letters, essays, newspapers, and the encyclopedia Encarta. The samples, written and spoken, come from Spain and at least 10 Latin American countries. Most of the samples were previously compiled for the Corpus del Español (2001), a 100 million-word corpus that includes works from the 13th century through the 20th.[3][4]
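Because this corpus (about 20 million words) is much smaller than CREA (more than 160 million word forms), raw occurrence counts from the two sources are not directly comparable. One common way to put them on the same scale is to express each count per million words. The sketch below assumes rounded corpus sizes of exactly 160,000,000 and 20,000,000, and the helper name `per_million` is hypothetical. The counts for de are taken from the two lists in this article. The comparison is only rough, since one list counts word forms and the other counts lemmas.

```python
# Rounded corpus sizes; the source gives only "more than 160 million"
# and "20 million", so these exact values are assumptions.
CREA_SIZE = 160_000_000
DAVIES_SIZE = 20_000_000

def per_million(occurrences: int, corpus_size: int) -> float:
    """Convert a raw count into occurrences per million words."""
    return occurrences / corpus_size * 1_000_000

# Counts for "de" as reported in the RAE and Davies lists.
de_crea = per_million(9_999_518, CREA_SIZE)
de_davies = per_million(1_319_834, DAVIES_SIZE)

print(f"de (CREA):   {de_crea:,.0f} per million")
print(f"de (Davies): {de_davies:,.0f} per million")
```

On this scale the two figures for de come out close to each other, even though the raw counts differ by a factor of more than seven.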

The 5000 words in Davies' list are lemmas.[5] A lemma is the form of the word as it would appear in a dictionary.[6] Singular nouns and plurals, for example, are treated as the same word, as are infinitives and verb conjugations. The table below includes the top 100 words from Davies' list of 5000.[7][8] This list distinguishes between the definite articles lo and la and the pronouns lo and la; all are ranked individually. The adjectives ese and esa are ranked together (as are este and esta), but the pronoun eso is separate. All conjugations of a verb are ranked together.
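The grouping of inflected forms under a single lemma can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the counts are invented, and the small lookup table `FORM_TO_LEMMA` stands in for a real lemmatiser, which the source does not describe.

```python
from collections import Counter

# Invented word-form counts, for illustration only.
form_counts = Counter({"tener": 3, "tiene": 5, "tienen": 2, "año": 4, "años": 6})

# Toy mapping from inflected form to lemma (hypothetical; a real
# lemmatiser would cover the whole vocabulary).
FORM_TO_LEMMA = {"tiene": "tener", "tienen": "tener", "años": "año"}

def count_lemmas(form_counts: Counter, form_to_lemma: dict) -> Counter:
    """Sum the counts of all word forms that share a lemma."""
    lemma_counts = Counter()
    for form, occurrences in form_counts.items():
        # Forms missing from the mapping are treated as their own lemma.
        lemma = form_to_lemma.get(form, form)
        lemma_counts[lemma] += occurrences
    return lemma_counts

lemma_counts = count_lemmas(form_counts, FORM_TO_LEMMA)
print(lemma_counts.most_common())
```

This sketch groups by spelling alone. To keep the article lo apart from the pronoun lo, as the list does, the key would need to be a (lemma, part of speech) pair, which in turn requires part-of-speech tagging of the corpus.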

A highlighted row indicates that the word was found to occur especially frequently in samples of spoken Spanish.[9]

Most frequent lemmas out of ~20 million words
(Davies 2006)
Rank | Lemma | Occurrences | Part of speech | Translation
1 | el / la | 2,037,803 | article | the
2 | de | 1,319,834 | preposition | of, from
3 | que | 662,653 | conjunction | that, which
4 | y | 562,162 | conjunction | and
5 | a | 529,899 | preposition | to, at
6 | en | 507,233 | preposition | in, on
7 | un | 434,022 | article | a, an
8 | ser | 374,194 | verb | to be
9 | se | 329,012 | pronoun | -self, oneself (reflexive)
10 | no | 257,365 | adverb | no
11 | haber | 196,962 | verb | to have
12 | por | 190,975 | preposition | by, for, through
13 | con | 184,597 | preposition | with
14 | su | 187,810 | adjective | his, her, their, your
15 | para | 126,061 | preposition | for, to, in order to
16 | como | 106,840 | conjunction | like, as
17 | estar | 106,429 | verb | to be
18 | tener | 106,642 | verb | to have
19 | le | 98,211 | pronoun | third person indirect object
20 | lo | 91,035 | article | the
21 | lo | 92,519 | pronoun | third person masculine direct object
22 | todo | 88,057 | adjective | all, every
23 | pero | 82,435 | conjunction | but, yet, except
24 | más | 92,352 | adjective | more
25 | hacer | 81,619 | verb | to do; to make
26 | o | 82,444 | conjunction | or
27 | poder | 76,738 | verb | to be able to, can
28 | decir | 79,343 | verb | to tell, say
29 | este / esta | 80,544 | adjective | this
30 | ir | 70,352 | verb | to go
31 | otro | 61,726 | adjective | other, another
32 | ese / esa | 60,989 | adjective | that
33 | la | 55,523 | pronoun | third person feminine direct object
34 | si | 53,608 | conjunction | if, whether
35 | me | 95,577 | pronoun | me
36 | ya | 46,778 | adverb | already, still
37 | ver | 45,854 | verb | to see
38 | porque | 44,500 | conjunction | because
39 | dar | 40,233 | verb | to give
40 | cuando | 39,726 | conjunction | when
41 | él | 38,597 | pronoun | he
42 | muy | 39,558 | adverb | very, really
43 | sin | 40,432 | preposition | without
44 | vez | 35,286 | noun (feminine) | time, occurrence
45 | mucho | 36,391 | adjective | much, many, a lot
46 | saber | 37,092 | verb | to know
47 | qué | 42,000 | pronoun | what?; which?; how [adjective]
48 | sobre | 35,038 | preposition | on top of, over, about
49 | mi | 45,636 | adjective | my
50 | alguno | 30,485 | adjective / pronoun | some; someone
51 | mismo | 29,569 | adjective | same
52 | yo | 54,635 | pronoun | I
53 | también | 33,348 | adverb | also
54 | hasta | 29,506 | preposition / adverb | until, up to; even
55 | año | 33,053 | noun (masculine) | year
56 | dos | 27,733 | number | two
57 | querer | 28,696 | verb | to want, love
58 | entre | 30,756 | preposition | between
59 | así | 24,832 | adverb | like that
60 | primero | 26,553 | adjective | first
61 | desde | 25,288 | preposition | from, since
62 | grande | 25,963 | adjective | large, great, big
63 | eso | 31,636 | pronoun (neuter gender) | that
64 | ni | 24,261 | conjunction | not even, neither, nor
65 | nos | 26,349 | pronoun | us
66 | llegar | 22,878 | verb | to arrive
67 | pasar | 22,466 | verb | to pass; to happen; to spend time
68 | tiempo | 22,432 | noun (masculine) | time, weather
69 | ella(s) | 24,770 | pronoun | she; (plural) them
70 | sí | 33,828 | adverb | yes
71 | día | 24,715 | noun (masculine) | day
72 | uno | 21,407 | number | one
73 | bien | 21,589 | adverb | well
74 | poco | 20,986 | adjective / adverb | little, few; a little bit
75 | deber | 22,232 | verb | should, ought to; to owe
76 | entonces | 23,548 | adverb | so, then
77 | poner | 20,330 | verb | to put (on); to get [adjective]
78 | cosa | 23,943 | noun (feminine) | thing
79 | tanto | 20,531 | adjective | much
80 | hombre | 20,292 | noun (masculine) | man, mankind, husband
81 | parecer | 19,964 | verb | to seem, to look like
82 | nuestro | 20,666 | adjective | our
83 | tan | 19,002 | adverb | such a, too, so
84 | donde | 18,852 | conjunction | where
85 | ahora | 21,030 | adverb | now
86 | parte | 20,319 | noun (feminine) | part, portion
87 | después | 20,229 | adverb | after
88 | vida | 18,045 | noun (feminine) | life
89 | quedar | 18,152 | verb | to remain, to stay
90 | siempre | 17,689 | adverb | always
91 | creer | 21,257 | verb | to believe
92 | hablar | 19,006 | verb | to speak, to talk
93 | llevar | 17,062 | verb | to take, to carry
94 | dejar | 18,185 | verb | to let, to leave
95 | nada | 19,365 | pronoun | nothing
96 | cada | 17,155 | adjective | each, every
97 | seguir | 16,104 | verb | to follow
98 | menos | 15,527 | adjective | less, fewer
99 | nuevo | 17,381 | adjective | new
100 | encontrar | 15,556 | verb | to find

Notes

  1. "CREA". RAE.es (in Spanish). Real Academia Española. Retrieved 2017-07-13.
  2. "Corpus de Referencia del Español Actual (CREA) — Listado de frecuencias". RAE.es (in Spanish). Real Academia Española. Retrieved 2017-07-13.
  3. Davies (2006), pp. 2–3
  4. "El Corpus del Español". corpusdelespanol.org. Retrieved 2017-07-13.
  5. Davies (2006), pp. 4–6
  6. Davies (2006), p. 4
  7. Davies (2006), pp. 12–14
  8. "Top Spanish Vocabulary". Vistawide World Languages & Cultures. Retrieved 2017-07-13.
  9. Davies (2006), p. 9

References

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.