Index term

An index term, subject term, subject heading, or descriptor, in information retrieval, is a term that captures the essence of the topic of a document. Index terms make up a controlled vocabulary for use in bibliographic records. They are an integral part of bibliographic control, the function by which libraries collect, organize, and disseminate documents. They are used as keywords to retrieve documents in an information system, for instance a catalog or a search engine. A popular form of keyword on the web is the tag, which is directly visible and can be assigned by non-experts. An index term can consist of a word, a phrase, or an alphanumerical term. Index terms are created by analyzing the document, either manually through subject indexing or automatically through automatic indexing or more sophisticated methods of keyword extraction. They can either come from a controlled vocabulary or be freely assigned.

Keywords are stored in a search index. Common words such as articles (a, an, the) and conjunctions (and, or, but) are often not treated as keywords, because indexing them is inefficient: almost every English-language site on the Internet contains the article "the", so searching for it does little to narrow the results. Google, the most popular search engine, removed stop words such as "the" and "a" from its indexes for several years but later reintroduced them, making certain types of precise search possible again.
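As a rough illustration of the idea above, the following Python sketch builds a small search index that skips stop words. The stop-word list, the tokenizer, and the sample documents are assumptions made for the example; they are not taken from any particular search engine.

```python
import re
from collections import defaultdict

# Assumed, deliberately short stop-word list (articles and conjunctions only).
STOP_WORDS = {"a", "an", "the", "and", "or", "but"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents, stop_words=STOP_WORDS):
    """Map each keyword (non-stop word) to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            if word not in stop_words:
                index[word].add(doc_id)
    return dict(index)


# Hypothetical sample documents.
documents = {
    1: "The cat and the hat",
    2: "A history of the library catalog",
}

index = build_index(documents)
full_index = build_index(documents, stop_words=set())

print(sorted(index))       # keywords kept when stop words are skipped
print(full_index["the"])   # without filtering, "the" matches every document
```

Passing an empty stop-word set shows why such words carry little value as keywords: "the" points to every document, so it cannot narrow a search.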

The term "descriptor" was coined by Calvin Mooers in 1948. It is used in particular for a preferred term from a thesaurus.

The Simple Knowledge Organization System (SKOS) language provides a way to express index terms using the Resource Description Framework (RDF) for use in the context of the Semantic Web.[1]
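To make this concrete, the sketch below generates an RDF description of one index term in Turtle syntax, using the SKOS properties skos:prefLabel (the preferred term), skos:altLabel (a non-preferred synonym), and skos:broader (a more general term). The helper function and the example.org URIs are hypothetical, and for brevity the function does not escape quotation marks inside labels.

```python
# Namespace declaration for the SKOS core vocabulary.
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def skos_concept(uri, pref_label, alt_labels=(), broader=None, lang="en"):
    """Return a Turtle description of one index term as a skos:Concept.

    Hypothetical helper for illustration; labels are assumed to contain
    no double quotes or backslashes.
    """
    lines = [
        f"<{uri}> a skos:Concept ;",
        f'    skos:prefLabel "{pref_label}"@{lang}',
    ]
    for label in alt_labels:
        lines[-1] += " ;"
        lines.append(f'    skos:altLabel "{label}"@{lang}')
    if broader:
        lines[-1] += " ;"
        lines.append(f"    skos:broader <{broader}>")
    lines[-1] += " ."
    return "\n".join(lines)


# Example URIs under example.org are placeholders, not a real vocabulary.
turtle = skos_concept(
    "http://example.org/terms/cats",
    "Cats",
    alt_labels=["Felines"],
    broader="http://example.org/terms/mammals",
)
print(SKOS_PREFIX)
print(turtle)
```

Here "Cats" plays the role of the descriptor, while "Felines" is recorded as an alternative entry term that leads to the same concept.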

In web search engines

Most web search engines are designed to search for words anywhere in a document, including the title and the body. As a result, a keyword can be any term that exists within the document. However, priority is given to words that occur in the title, words that recur numerous times, and words that are explicitly assigned as keywords within the coding.[2] Searches on index terms can be further refined using Boolean operators such as "AND", "OR", and "NOT". "AND" is normally unnecessary, as most search engines infer it. "OR" returns results that contain one search term or the other, or both. "NOT" excludes a word or phrase from the search, removing any results that include it. Multiple words can also be enclosed in quotation marks to turn the individual index terms into a specific index phrase. These modifiers and methods all help to refine search terms and improve the accuracy of search results.[3]
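The Boolean operators and phrase search described above can be sketched with set operations over a small keyword index. This is a minimal illustration, not how any particular search engine is implemented; the function names and sample documents are assumptions made for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(doc_id)
    return index


def search_and(index, *terms):
    """Documents that contain every term."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index, *terms):
    """Documents that contain at least one of the terms."""
    result = set()
    for term in terms:
        result |= index.get(term.lower(), set())
    return result


def search_not(index, all_ids, term):
    """Documents that do not contain the term."""
    return set(all_ids) - index.get(term.lower(), set())


def search_phrase(documents, phrase):
    """Documents in which the words of the phrase appear consecutively."""
    wanted = tokenize(phrase)
    if not wanted:
        return set()
    n = len(wanted)
    hits = set()
    for doc_id, text in documents.items():
        tokens = tokenize(text)
        if any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1)):
            hits.add(doc_id)
    return hits


# Hypothetical sample documents.
documents = {
    1: "Library catalog search",
    2: "Search engine index",
    3: "Library index cards",
}
index = build_index(documents)

print(search_and(index, "library", "index"))
print(search_or(index, "catalog", "engine"))
print(search_and(index, "search") & search_not(index, documents, "engine"))
print(search_phrase(documents, "search engine"))
```

The last two calls show the difference between combining terms and quoting them: "search" NOT "engine" works on individual keywords, whereas the quoted phrase "search engine" only matches documents where the two words are adjacent and in that order.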

Author keywords

Author keywords are an integral part of the literature.[1] Many journals and databases provide access to index terms assigned by the authors of the respective articles. The quality of both indexer-provided and author-provided index terms depends on how qualified the provider is. The relative quality of these two types of index terms is of research interest, particularly in relation to information retrieval. In general, an author will have difficulty providing index terms that characterize his or her document relative to the other documents in the database.

Examples

gollark: > There is burgeoning interest in designing AI-based systems to assist humans in designing computing systems, including tools that automatically generate computer code. The most notable of these comes in the form of the first self-described ‘AI pair programmer’, GitHub Copilot, a language model trained over open-source GitHub code. However, code often contains bugs—and so, given the vast quantity of unvetted code that Copilot has processed, it is certain that the language model will have learned from exploitable, buggy code. This raises concerns on the security of Copilot’s code contributions. In this work, we systematically investigate the prevalence and conditions that can cause GitHub Copilot to recommend insecure code. To perform this analysis we prompt Copilot to generate code in scenarios relevant to high-risk CWEs (e.g. those from MITRE’s “Top 25” list). We explore Copilot’s performance on three distinct code generation axes—examining how it performs given diversity of weaknesses, diversity of prompts, and diversity of domains. In total, we produce 89 different scenarios for Copilot to complete, producing 1,692 programs. Of these, we found approximately 40 % to be vulnerable. Index Terms—Cybersecurity, AI, code generation, CWE

References

  1. Svenonius, Elaine (2009). The intellectual foundation of information organization (1st MIT Press pbk. ed.). Cambridge, Massachusetts: MIT Press. ISBN 9780262512619.
  2. Cutts, Matt (2010, March 4). How search works. Retrieved from https://www.youtube.com/watch?v=BNHR6IQJGZs
  3. CLIO. Keyword search. Columbia University Libraries. Retrieved from http://www.columbia.edu/cu/lweb/help/clio/keyword.html

Further reading

  • Ferris, Anna M. (2018). "Birth of a Subject Heading". Library Resources & Technical Services. 62 (1): 16–27.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.