Content analysis

Content analysis is the study of documents and communication artifacts, which might be texts of various formats, pictures, audio or video. Social scientists use content analysis to examine patterns in communication in a replicable and systematic manner.[1] One of the key advantages of using content analysis to analyse social phenomena is its non-invasive nature, in contrast to simulating social experiences or collecting survey answers.

Practices and philosophies of content analysis vary between academic disciplines. They all involve systematically reading or observing texts or artifacts and assigning labels (sometimes called codes) to indicate the presence of interesting, meaningful pieces of content.[2][3] By systematically labeling the content of a set of texts, researchers can analyse patterns of content quantitatively using statistical methods, or use qualitative methods to analyse meanings of content within texts.

Computers are increasingly used in content analysis to automate the labeling (or coding) of documents. Simple computational techniques can provide descriptive data such as word frequencies and document lengths. Machine learning classifiers can greatly increase the number of texts that can be labeled, but the scientific utility of doing so is a matter of debate. Further, numerous computer-aided text analysis (CATA) programs are available that analyze text for predetermined linguistic, semantic, and psychological characteristics.[4]
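As a rough illustration of such descriptive data, the Python sketch below computes the length (in words) and the word frequencies of each document in a small corpus. It assumes simple lowercase, regex-based tokenization; the function names `tokenize` and `describe` and the sample texts are hypothetical, and real projects typically use a dedicated tokenizer.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def describe(documents: dict[str, str]) -> dict[str, dict]:
    """Return the length in tokens and the word frequencies of each document."""
    summary = {}
    for name, text in documents.items():
        tokens = tokenize(text)
        summary[name] = {
            "length": len(tokens),
            "frequencies": Counter(tokens),
        }
    return summary


# Hypothetical sample corpus.
docs = {
    "a": "The council approved the budget. The mayor praised the council.",
    "b": "Protesters criticised the budget.",
}

for name, stats in describe(docs).items():
    print(name, stats["length"], stats["frequencies"].most_common(2))
```

Such counts are purely descriptive; as the following sections note, they say nothing about what a word means in context.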

Goals

Content analysis is best understood as a broad family of techniques. Effective researchers choose techniques that best help them answer their substantive questions. That said, according to Klaus Krippendorff, six questions must be addressed in every content analysis:[5]

Which data are analyzed?
How are the data defined?
From what population are data drawn?
What is the relevant context?
What are the boundaries of the analysis?
What is to be measured?

The simplest and most objective form of content analysis considers unambiguous characteristics of the text such as word frequencies, the page area taken by a newspaper column, or the duration of a radio or television program. Analysis of simple word frequencies is limited because the meaning of a word depends on surrounding text. Key Word In Context (KWIC) routines address this by placing words in their textual context. This helps resolve ambiguities such as those introduced by synonyms and homonyms.
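A minimal KWIC routine can be sketched in Python as follows, assuming whitespace tokenization and a fixed window of words on either side of each match. The function name `kwic`, its `window` parameter, and the sample sentence are illustrative, not taken from any particular CATA program.

```python
import re


def kwic(text: str, keyword: str, window: int = 3) -> list[str]:
    """Return each occurrence of keyword with `window` words of context on either side."""
    words = text.split()
    lines = []
    for i, word in enumerate(words):
        # Compare after stripping punctuation and case, so "Bank," matches "bank".
        if re.sub(r"\W", "", word).lower() == keyword.lower():
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            # Right-align the left context so the keywords line up in one column.
            lines.append(f"{left:>30} [{word}] {right}")
    return lines


# Hypothetical sample containing the homonym "bank".
sample = ("She sat on the bank of the river. Later the bank "
          "raised interest rates again.")

for line in kwic(sample, "bank"):
    print(line)
```

The two concordance lines place the same word beside "river" and beside "interest rates", which is the kind of context a coder needs to tell the two senses apart.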

A further step in analysis is the distinction between dictionary-based (quantitative) approaches and qualitative approaches. Dictionary-based approaches set up a list of categories derived from the frequency list of words and track the distribution of words and their respective categories across the texts. While quantitative content analysis in this way transforms observations of found categories into quantitative statistical data, qualitative content analysis focuses more on intentionality and its implications. There are strong parallels between qualitative content analysis and thematic analysis.[6]
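The dictionary-based step can be sketched in Python as below. In practice the categories would be derived from the corpus's frequency list; here the `DICTIONARY` of categories and indicator words is hard-coded and entirely hypothetical, matching is exact (no stemming or multi-word phrases), and `code_document` is an illustrative name.

```python
import re
from collections import Counter

# Hypothetical coding dictionary: each category maps to words that indicate it.
DICTIONARY = {
    "economy": {"budget", "tax", "taxes", "jobs", "inflation"},
    "security": {"police", "crime", "border", "defence"},
}


def code_document(text: str, dictionary: dict[str, set[str]]) -> Counter:
    """Count how many tokens of the text fall into each dictionary category."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, words in dictionary.items():
            if token in words:
                counts[category] += 1
    return counts


# Hypothetical excerpts from two party manifestos.
manifestos = {
    "party_a": "We will cut taxes, create jobs and balance the budget.",
    "party_b": "More police on the streets and a secure border will reduce crime.",
}

for party, text in manifestos.items():
    counts = code_document(text, DICTIONARY)
    total = sum(counts.values())
    # Share of dictionary hits per category: the category distribution for this text.
    shares = {cat: counts[cat] / total for cat in DICTIONARY} if total else {}
    print(party, dict(counts), shares)
```

Note that "secure" in the second text is not counted, because it is not listed under any category; this is one way a fixed dictionary can miss meaning that a qualitative reading would catch.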

Qualitative and quantitative content analysis

Quantitative content analysis emphasizes frequency counts and the objective analysis of these coded frequencies.[7] It begins with a framed hypothesis, and the coding categories, which are strictly relevant to the researcher's hypothesis, are decided on before the analysis begins. In this sense, quantitative analysis takes a deductive approach.[8]

Siegfried Kracauer criticizes quantitative analysis, asserting that it oversimplifies complex communications in order to be more reliable. In his view, qualitative analysis deals with the intricacies of latent interpretations, whereas quantitative analysis focuses on manifest meanings. He also acknowledges an "overlap" between qualitative and quantitative content analysis.[7] Qualitative analysis examines patterns more closely, and the course of the research may change based on the latent meanings the researcher finds. It is inductive and begins with open research questions rather than a hypothesis.[8]

Computational tools

More generally, content analysis is research using the categorization and classification of speech, written text, interviews, images, or other forms of communication. In its early days, when it was applied to newspapers at the end of the 19th century, analysis was done manually by measuring the number of columns given to a subject. The approach can also be traced back to a university student studying patterns in Shakespeare's literature in 1893.[9] With the spread of personal computers, computer-based methods of analysis have grown in popularity.[10][11][12] Answers to open-ended questions, newspaper articles, political party manifestos, medical records, and systematic observations in experiments can all be subject to systematic analysis of textual data.

When the contents of communication are available as machine-readable text, the input can be analyzed for frequencies and coded into categories from which inferences are built.

Computer-assisted analysis can help with large electronic data sets by saving time and eliminating the need for multiple human coders to establish inter-coder reliability. However, human coders can still be employed for content analysis, as they are often better able to pick out nuanced and latent meanings in text. A study found that human coders were able to evaluate a broader range of content and to make inferences based on latent meanings.[13]

Reliability

Robert Weber notes: "To make valid inferences from the text, it is important that the classification procedure be reliable in the sense of being consistent: Different people should code the same text in the same way".[14] Validity, inter-coder reliability, and intra-coder reliability have been the subject of intense methodological research for many years.[5] Neuendorf suggests that when human coders are used in content analysis, at least two independent coders should be used. Reliability of human coding is often measured using a statistical measure of inter-coder reliability, or "the amount of agreement or correspondence among two or more coders".[4] Lacy and Riffe identify the measurement of inter-coder reliability as a strength of quantitative content analysis, arguing that, if content analysts do not measure inter-coder reliability, their data are no more reliable than the subjective impressions of a single reader.[15]
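The text does not name a specific reliability statistic. One common choice for two coders assigning nominal codes is Cohen's kappa, which corrects raw percent agreement for the agreement expected by chance; Krippendorff's alpha is another widely used option in content analysis. The Python sketch below computes both percent agreement and kappa; the function names and the coded units are hypothetical.

```python
from collections import Counter


def percent_agreement(coder1: list[str], coder2: list[str]) -> float:
    """Share of units to which both coders assigned the same code."""
    if len(coder1) != len(coder2) or not coder1:
        raise ValueError("both coders must code the same non-empty set of units")
    matches = sum(a == b for a, b in zip(coder1, coder2))
    return matches / len(coder1)


def cohens_kappa(coder1: list[str], coder2: list[str]) -> float:
    """Cohen's kappa for two coders: observed agreement corrected for chance agreement."""
    n = len(coder1)
    p_observed = percent_agreement(coder1, coder2)  # also validates the inputs
    freq1, freq2 = Counter(coder1), Counter(coder2)
    # Expected agreement if each coder assigned codes independently at their own rates.
    p_expected = sum(freq1[c] * freq2[c] for c in freq1.keys() | freq2.keys()) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical tone codes assigned by two independent coders to the same ten articles.
coder_a = ["pos", "pos", "neg", "neu", "pos", "neg", "neg", "pos", "neu", "pos"]
coder_b = ["pos", "neg", "neg", "neu", "pos", "neg", "pos", "pos", "neu", "pos"]

print(f"agreement={percent_agreement(coder_a, coder_b):.2f}, "
      f"kappa={cohens_kappa(coder_a, coder_b):.2f}")
# agreement=0.80, kappa=0.68
```

The gap between the two figures shows why raw agreement alone can overstate reliability: with only three categories, some matches would occur even if both coders guessed.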

Kinds of text

There are five types of texts in content analysis:

written text, such as books and papers
oral text, such as speech and theatrical performance
iconic text, such as drawings, paintings, and icons
audio-visual text, such as TV programs, movies, and videos
hypertexts, which are texts found on the Internet

History

Over the years, content analysis has been applied in a variety of fields. Hermeneutics and philology have long used content analysis to interpret sacred and profane texts and, in many cases, to attribute texts' authorship and authenticity.[3][5]

In recent times, particularly with the advent of mass communication, content analysis has been increasingly used to analyze and understand media content and media logic in depth. The political scientist Harold Lasswell formulated the core questions of content analysis in its early- to mid-20th-century mainstream version: "Who says what, to whom, why, to what extent and with what effect?".[16] The strong emphasis on a quantitative approach initiated by Lasswell was carried further by another "father" of content analysis, Bernard Berelson, whose definition of content analysis is emblematic of this view: "a research technique for the objective, systematic and quantitative description of the manifest content of communication".[17]

Quantitative content analysis has enjoyed a renewed popularity in recent years thanks to technological advances and fruitful applications in mass communication and personal communication research. Content analysis of textual big data produced by new media, particularly social media and mobile devices, has become popular. These approaches take a simplified view of language that ignores the complexity of semiosis, the process by which meaning is formed out of language. Quantitative content analysts have been criticized for limiting the scope of content analysis to simple counting, and for applying the measurement methodologies of the natural sciences without reflecting critically on their appropriateness to social science.[18] Conversely, qualitative content analysts have been criticized for being insufficiently systematic and too impressionistic.[18] Krippendorff argues that quantitative and qualitative approaches to content analysis tend to overlap, and that there can be no generalisable conclusion as to which approach is superior.[18]

Content analysis can also be described as studying traces, which are documents from past times, and artifacts, which are non-linguistic documents. Texts are understood to be produced by communication processes in a broad sense of that phrase, often gaining meaning through abduction.[3][19]

Latent and manifest content

Manifest content is readily understandable at face value; its meaning is direct. Latent content is less overt and requires interpretation to uncover its meaning or implication.[20]

Uses

Holsti groups fifteen uses of content analysis into three basic categories:[21]

make inferences about the antecedents of a communication
describe and make inferences about characteristics of a communication
make inferences about the effects of a communication

He also places these uses into the context of the basic communication paradigm.

The following table shows fifteen uses of content analysis in terms of their general purpose, element of the communication paradigm to which they apply, and the general question they are intended to answer.

Uses of Content Analysis by Purpose, Communication Element, and Question
Purpose	Communication element	Question	Uses
Make inferences about the antecedents of communications	Source	Who?	Answer questions of disputed authorship (authorship analysis)
Make inferences about the antecedents of communications	Encoding process	Why?	Secure political & military intelligence; analyse traits of individuals; infer cultural aspects & change; provide legal & evaluative evidence
Describe & make inferences about the characteristics of communications	Channel	How?	Analyse techniques of persuasion; analyse style
Describe & make inferences about the characteristics of communications	Message	What?	Describe trends in communication content; relate known characteristics of sources to messages they produce; compare communication content to standards
Describe & make inferences about the characteristics of communications	Recipient	To whom?	Relate known characteristics of audiences to messages produced for them; describe patterns of communication
Make inferences about the consequences of communications	Decoding process	With what effect?	Measure readability; analyse the flow of information; assess responses to communications
Note. Purpose, communication element, & question from Holsti.[21] Uses primarily from Berelson[22] as adapted by Holsti.[21]

The development of the initial coding scheme

How the initial coding scheme is developed depends on the particular content analysis approach selected. In a directed content analysis, scholars draft a preliminary coding scheme from pre-existing theory or assumptions, whereas in the conventional content analysis approach the initial coding scheme is developed from the data.

The conventional process of coding

With either approach, researchers are advised to immerse themselves in the data to obtain an overall picture. Identifying a clear and consistent unit of coding is also vital; researchers' choices range from a single word to several paragraphs, and from texts to iconic symbols. Finally, researchers construct the relationships between codes by sorting them into specific categories or themes.[23]
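The last step, sorting codes into categories or themes, can be pictured with the Python sketch below. It assumes the unit of coding is one interview sentence and that the researcher has already assigned codes and drafted a code-to-theme mapping; all segments, codes, themes, and the name `group_by_theme` are hypothetical.

```python
from collections import defaultdict

# Hypothetical coded segments: each unit of coding is one interview sentence,
# paired with the code(s) a researcher assigned to it.
segments = [
    ("I could not afford my medication last winter.", ["cost barrier"]),
    ("The clinic is two bus rides away.", ["travel distance"]),
    ("The nurse explained everything patiently.", ["staff empathy"]),
    ("Appointments are always cancelled at short notice.", ["scheduling problems"]),
]

# Hypothetical mapping of codes to broader categories (themes).
code_to_theme = {
    "cost barrier": "access to care",
    "travel distance": "access to care",
    "scheduling problems": "access to care",
    "staff empathy": "quality of interaction",
}


def group_by_theme(segments, code_to_theme):
    """Sort coded segments under the theme that each of their codes belongs to."""
    themes = defaultdict(list)
    for text, codes in segments:
        for code in codes:
            # Codes not yet placed in a theme are kept visible rather than dropped.
            themes[code_to_theme.get(code, "uncategorised")].append((code, text))
    return dict(themes)


for theme, items in group_by_theme(segments, code_to_theme).items():
    print(theme)
    for code, text in items:
        print(f"  [{code}] {text}")
```

Keeping an "uncategorised" bucket makes it easy to see which codes still need a place in the scheme as it is revised.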

References

Bryman, Alan; Bell, Emma (2011). Business Research Methods (3rd ed.). Cambridge: Oxford University Press. ISBN 9780199583409. OCLC 746155102.
Hodder, I. (1994). The Interpretation of Documents and Material Culture. Thousand Oaks, CA: Sage. p. 155. ISBN 978-0761926870.
Tipaldo, G. (2014). L'analisi del contenuto e i mass media. Bologna, IT: Il Mulino. p. 42. ISBN 978-88-15-24832-9.
Neuendorf, Kimberly A. (30 May 2016). The Content Analysis Guidebook. SAGE. ISBN 978-1-4129-7947-4.
Krippendorff, Klaus (2004). Content Analysis: An Introduction to Its Methodology (2nd ed.). Thousand Oaks, CA: Sage. p. 413. ISBN 9780761915454.
Vaismoradi, Mojtaba; Turunen, Hannele; Bondas, Terese (2013-09-01). "Content analysis and thematic analysis: Implications for conducting a qualitative descriptive study". Nursing & Health Sciences. 15 (3): 398–405. doi:10.1111/nhs.12048. ISSN 1442-2018. PMID 23480423.
Kracauer, Siegfried (1952). "The Challenge of Qualitative Content Analysis". Public Opinion Quarterly. 16 (4, Special Issue on International Communications Research): 631. doi:10.1086/266427. ISSN 0033-362X.
White, Marilyn Domas; Marsh, Emily E. (2006). "Content Analysis: A Flexible Methodology". Library Trends. 55 (1): 22–45. doi:10.1353/lib.2006.0053. hdl:2142/3670. ISSN 1559-0682.
Sumpter, Randall S. (July 2001). "News about News". Journalism History. 27 (2): 64–72. doi:10.1080/00947679.2001.12062572. ISSN 0094-7679.
Pfeiffer, Silvia; Fischer, Stefan; Effelsberg, Wolfgang (1996). "Automatic Audio Content Analysis". Technical Reports. 96.
Grimmer, Justin; Stewart, Brandon M. (2013). "Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts". Political Analysis. 21 (3): 267–297.
Nasukawa, Tetsuya; Yi, Jeonghee (2003). "Sentiment Analysis: Capturing Favorability Using Natural Language Processing". Proceedings of the 2nd International Conference on Knowledge Capture. ACM.
Conway, Mike (March 2006). "The Subjective Precision of Computers: A Methodological Comparison with Human Coding in Content Analysis". Journalism & Mass Communication Quarterly. 83 (1): 186–200. doi:10.1177/107769900608300112. ISSN 1077-6990.
Weber, Robert Philip (1990). Basic Content Analysis (2nd ed.). Newbury Park, CA: Sage. p. 12. ISBN 9780803938632.
Lacy, Stephen R; Riffe, Daniel (1993). "Sins of Omission and Commission in Mass Communication Quantitative Research". Journalism & Mass Communication Quarterly. 70 (1): 126–132. doi:10.1177/107769909307000114.
Lasswell, Harold Dwight (1948). Power and Personality. New York, NY.
Berelson, B. (1952). Content Analysis in Communication Research. Glencoe: Free Press. p. 18.
Krippendorff, Klaus (2004). Content Analysis: An Introduction to Its Methodology. California: Sage. pp. 87–89. ISBN 978-0-7619-1544-7.
Timmermans, Stefan; Tavory, Iddo (2012). "Theory Construction in Qualitative Research" (PDF). Sociological Theory. 30 (3): 167–186. doi:10.1177/0735275112457914.
Lee, Jang-Hwan; Kim, Young-Gul; Yu, Sung-Ho (2001). "Stage Model for Knowledge Management". Proceedings of the 34th Annual Hawaii International Conference on System Sciences. IEEE Comput. Soc. doi:10.1109/hicss.2001.927103. ISBN 0-7695-0981-9.
Holsti, Ole R. (1969). Content Analysis for the Social Sciences and Humanities. Reading, MA: Addison-Wesley.
Berelson, Bernard (1952). Content Analysis in Communication Research. Glencoe, Ill: Free Press.
"Content Analysis". Sage. Retrieved December 16, 2019.