Reserved word

In a computer language, a reserved word (also known as a reserved identifier) is a word that cannot be used as an identifier, such as the name of a variable, function, or label – it is "reserved from use". This is a syntactic definition, and a reserved word may have no user-defined meaning.

A closely related and often conflated notion is a keyword, which is a word with special meaning in a particular context. This is a semantic definition. By contrast, names in a standard library but not built into a language are not considered reserved words or keywords. The terms "reserved word" and "keyword" are often used interchangeably – one may say that a reserved word is "reserved for use as a keyword" – and formal use varies from language to language. For this article, we distinguish as above.

In general reserved words and keywords need not coincide, but in most modern languages keywords are a subset of reserved words, as this makes parsing easier, since keywords cannot be confused with identifiers. In some languages, like C or Python, reserved words and keywords coincide, while in other languages, like Java, all keywords are reserved words, but some reserved words are not keywords, being reserved for future use. In yet other languages, such as the older languages ALGOL, FORTRAN, and PL/I, there are keywords but no reserved words, with keywords being distinguished from identifiers by other means.

Distinction

The sets of reserved words and keywords in a language often coincide or are almost equal, and the distinction is subtle, so the terms are often used interchangeably. However, in careful use they are distinguished.

Making keywords be reserved words makes lexing easier, as a string of characters will unambiguously be either a keyword or an identifier, without depending on context; thus keywords are usually a subset of reserved words. However, reserved words need not be keywords. For example, in Java, goto is a reserved word, but has no meaning and does not appear in any production rules in the grammar. This is usually done for forward compatibility, so a reserved word may become a keyword in a future version without breaking existing programs.

Conversely, keywords need not be reserved words: their role may be understood from context, or they may be distinguished in another manner, such as by stropping. For example, the phrase if = 1 is unambiguous in most grammars, since a conditional statement that begins with the keyword if cannot continue with an =, and thus it is allowed in some languages, such as FORTRAN. Alternatively, in ALGOL 68, keywords must be stropped – marked in some way to distinguish them – which in the strict language is done by setting them in bold, and thus they are not reserved words. Thus in the strict language the following expression is legal, as the bold keyword if does not conflict with the ordinary identifier if:

if if eq 0 then 1 fi

However, in ALGOL 68 there is also a stropping regime in which keywords are reserved words, an example of how these distinct concepts often coincide; this is followed in many modern languages.
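Python offers a small illustration of keywords whose role is understood from context. Since version 3.10 it has had a few "soft keywords", such as match, which act as keywords only in particular positions and otherwise remain ordinary identifiers. The sketch below is illustrative only; the helper names compiles and soft_keywords are made up for this example, and the contents of the soft keyword list depend on the Python version in use.

```python
import keyword


def compiles(source: str) -> bool:
    """Return True if `source` is accepted by the Python parser."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


def soft_keywords() -> list:
    """Context-dependent keywords known to this interpreter (may be empty)."""
    # keyword.softkwlist is missing on older interpreters, hence getattr.
    return list(getattr(keyword, "softkwlist", []))


if __name__ == "__main__":
    # `match` is not reserved, so it can still name a variable.
    print("match = 1 compiles:", compiles("match = 1"))
    # `if` is both a keyword and a reserved word, so this is rejected.
    print("if = 1 compiles:", compiles("if = 1"))
    print("soft keywords:", soft_keywords())
```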

Syntax

A reserved word is one that "looks like" a normal word, but is not allowed to be used as a normal word. Formally this means that it satisfies the usual lexical syntax (syntax of words) of identifiers – for example, being a sequence of letters – but cannot be used where identifiers are used. For example, the word if is commonly a reserved word, while x generally is not, so x = 1 is a valid assignment, but if = 1 is not.
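This distinction can be observed directly in Python, used here purely as a convenient example language. The sketch relies on the standard str.isidentifier method and the keyword module; the helper assignment_compiles is a made-up name for this illustration.

```python
import keyword


def assignment_compiles(name: str) -> bool:
    """Return True if `<name> = 1` is accepted by the Python parser."""
    try:
        compile(f"{name} = 1", "<example>", "exec")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    for word in ("x", "if"):
        print(
            word,
            "| looks like an identifier:", word.isidentifier(),
            "| reserved:", keyword.iskeyword(word),
            "| usable in an assignment:", assignment_compiles(word),
        )
```

Both words satisfy the lexical syntax of identifiers, but only x may be used where an identifier is expected.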

Keywords have varied uses, but mainly fall into a few classes: parts of the phrase grammar (specifically a production rule with nonterminal symbols), with various meanings, often used for control flow, such as the word if in most procedural languages, which indicates a conditional and takes clauses (the nonterminal symbols); names of primitive types in a language that supports a type system, such as int; primitive literal values, such as true for Boolean true; and sometimes special commands, such as exit. Other uses of keywords in phrases are for input/output, such as print.

The distinct definitions are clear when a language is analyzed by a combination of a lexer and a parser, and the syntax of the language is generated by a lexical grammar for the words, and a context-free grammar of production rules for the phrases. This is common in analyzing modern languages, and in this case keywords are a subset of reserved words, as they must be distinguished from identifiers at the word level (hence reserved words) to be syntactically analyzed differently at the phrase level (as keywords).

In this case reserved words are defined as part of the lexical grammar, and are each tokenized as a separate type, distinct from identifiers. In conventional notation, the reserved words if and then for example are tokenized as types IF and THEN, respectively, while x and y are both tokenized as type Identifier.
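A minimal sketch of such a lexer follows, for a hypothetical toy language whose only reserved words are if and then. The token type names mirror those used above; the RESERVED table, the regular expression, and the function name tokenize are assumptions made for this illustration and are not taken from any real implementation.

```python
import re

# Hypothetical toy language: each reserved word gets its own token type.
RESERVED = {"if": "IF", "then": "THEN"}

TOKEN_RE = re.compile(
    r"\s*(?:(?P<word>[A-Za-z_][A-Za-z0-9_]*)"
    r"|(?P<number>\d+)"
    r"|(?P<op>[=<>+\-*/]))"
)


def tokenize(source: str) -> list:
    """Split `source` into (token_type, text) pairs."""
    source = source.rstrip()
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        pos = m.end()
        if m.group("word") is not None:
            word = m.group("word")
            # A word is an Identifier unless it appears in the reserved table.
            tokens.append((RESERVED.get(word, "Identifier"), word))
        elif m.group("number") is not None:
            tokens.append(("Number", m.group("number")))
        else:
            tokens.append(("Operator", m.group("op")))
    return tokens


if __name__ == "__main__":
    print(tokenize("if x then y"))
```

Because the whole word is matched before the table lookup, a word such as iffy is tokenized as an Identifier rather than as IF followed by something else.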

Keywords, by contrast, syntactically appear in the phrase grammar, as terminal symbols. For example, the production rule for a conditional expression may be IF Expression THEN Expression. In this case IF and THEN are terminal symbols, meaning "a token of type IF or THEN, respectively" – and due to the lexical grammar, this means the string if or then in the original source. As an example of a primitive constant value, true may be a keyword representing the Boolean value "true", in which case it should appear in the grammar as a possible expansion of the production BinaryExpression, for instance.
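The role of keywords as terminal symbols can be sketched with a tiny recursive-descent parser for the rule IF Expression THEN Expression. This is only an illustration under simplifying assumptions: the input is a list of (token_type, text) pairs, the token type TRUE and the tree shapes are invented for the example, and for brevity true is treated as a direct expansion of Expression.

```python
def parse_expression(tokens, pos=0):
    """Parse one Expression starting at `pos`; return (tree, next_pos).

    Assumed toy grammar:
        Expression -> IF Expression THEN Expression
                    | TRUE
                    | Identifier
    """
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, text = tokens[pos]
    if kind == "IF":
        condition, pos = parse_expression(tokens, pos + 1)
        # The terminal THEN must appear here; an Identifier will not do.
        if pos >= len(tokens) or tokens[pos][0] != "THEN":
            raise SyntaxError("expected THEN")
        result, pos = parse_expression(tokens, pos + 1)
        return ("conditional", condition, result), pos
    if kind == "TRUE":
        return ("literal", True), pos + 1
    if kind == "Identifier":
        return ("name", text), pos + 1
    raise SyntaxError(f"unexpected token {kind}")


if __name__ == "__main__":
    example = [("IF", "if"), ("TRUE", "true"), ("THEN", "then"), ("Identifier", "x")]
    print(parse_expression(example))
```

The parser never inspects the spelling of if or then; it relies only on the token types assigned at the word level.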

Reserved ranges

Beyond reserving specific lists of words, some languages reserve entire ranges of words, for use as private spaces for future language versions, different dialects, compiler vendor-specific extensions, or for internal use by a compiler, notably in name mangling.

This is most often done by using a prefix, often one or more underscores. C and C++ are notable in this respect: C99 reserves identifiers that start with two underscores or an underscore followed by an uppercase letter, and further reserves identifiers that start with a single underscore (in the ordinary and tag spaces) for use in file scope;^[1] C++03 further reserves identifiers that contain a double underscore anywhere^[2] – this allows the use of a double underscore as a separator (to connect user identifiers), for instance.
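The prefix rules described above are simple enough to express as a small checker. The sketch below encodes only the rules as stated in this section, not the full text of either standard; the function names and the returned labels are invented for the illustration.

```python
import re


def c99_reservation(identifier: str):
    """Classify an identifier under the C99 prefix rules described above.

    Returns "always" for names reserved for any use, "file scope" for names
    reserved at file scope, and None for names outside both ranges.
    """
    if re.match(r"__|_[A-Z]", identifier):
        return "always"
    if identifier.startswith("_"):
        return "file scope"
    return None


def cpp03_double_underscore(identifier: str) -> bool:
    """True if the name contains a double underscore anywhere (C++03 rule)."""
    return "__" in identifier


if __name__ == "__main__":
    for name in ("__x", "_Thread_local", "_helper", "my__name", "total"):
        print(name, c99_reservation(name), cpp03_double_underscore(name))
```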

The frequent use of double underscores in internal identifiers in Python gave rise to the abbreviation dunder; this was coined by Mark Jackson^[3] and independently by Tim Hochberg,^[4] within minutes of each other, both in reply to the same question in 2002.^[5]^[6]

Specification

The lists of reserved words and keywords in a language are defined when the language is developed, and both form part of the language's formal specification. Generally one wishes to minimize the number of reserved words, to avoid restricting valid identifier names. Further, introducing a new reserved word breaks existing programs that use that word as an identifier (the change is not backwards compatible), so this is avoided. To prevent this and provide forward compatibility, sometimes words are reserved without having a current use (a reserved word that is not a keyword), as this allows the word to be used in future without breaking existing programs. Alternatively, new language features can be implemented as predefineds, which can be overridden, thus not breaking existing programs.

Reasons for flexibility include allowing compiler vendors to extend the specification by including non-standard features, different standard dialects of a language to extend it, or future versions of the language to include additional features. For example, a procedural language may anticipate adding object-oriented capabilities in a future version or some dialect, at which point one might add keywords like class or object. To accommodate this possibility, the current specification may make these reserved words, even if they are not currently used.
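The cost of reserving a word after the fact can be estimated by scanning existing source code for identifiers that would collide with it. The following sketch does this for Python source using the standard tokenize module; the function name identifiers_hit_by and the sample program are hypothetical, and object is used only because it is the example word mentioned above.

```python
import io
import tokenize


def identifiers_hit_by(source: str, proposed_words) -> list:
    """List (word, line_number) for each use of a proposed reserved word."""
    proposed = set(proposed_words)
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # NAME tokens cover identifiers (and existing keywords) in the source.
        if tok.type == tokenize.NAME and tok.string in proposed:
            hits.append((tok.string, tok.start[0]))
    return hits


if __name__ == "__main__":
    old_program = "object = load()\nshow(object)\n"
    print(identifiers_hit_by(old_program, ["object"]))
```

Every hit is a place where an existing program would have to be edited if the word became reserved.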

A notable example is in Java, where const and goto are reserved words: they have no meaning in Java, but they also cannot be used as identifiers. Because the terms are reserved, they can be implemented in future versions of Java, if desired, without breaking older Java source code. For example, there was a proposal in 1999 to add C++-like const to the language, which was possible using the const word, since it was reserved but currently unused; however, this proposal was rejected – notably because even though adding the feature would not break any existing programs, using it in the standard library (notably in collections) would break compatibility.^[7] JavaScript also contains a number of reserved words without special functionality; the exact list varies by version and mode.^[8]

Languages differ significantly in how frequently they introduce new reserved words or keywords and how they name them, with some languages being very conservative and introducing new keywords rarely or never, to avoid breaking existing programs, while other languages introduce new keywords more freely, requiring existing programs to change existing identifiers that conflict. A case study is given by new keywords in C11 compared with C++11, both from 2011 – recall that in C and C++, identifiers that begin with an underscore followed by an uppercase letter are reserved:^[9]

The C committee prefers not to create new keywords in the user name space, as it is generally expected that each revision of C will avoid breaking older C programs. By comparison, the C++ committee (WG21) prefers to make new keywords as normal‐looking as the old keywords. For example, C++11 defines a new thread_local keyword to designate static storage local to one thread. C11 defines the new keyword as _Thread_local. In the new C11 header <threads.h>, there is a macro definition to provide the normal‐looking name:^[10]
#define thread_local _Thread_local

That is, C11 introduced the keyword _Thread_local within an existing set of reserved words (those with a certain prefix), and then used a separate facility (macro processing) to allow its use as if it were a new keyword without any prefixing, while C++11 introduced the keyword thread_local despite this not being an existing reserved word, breaking any programs that used it as an identifier, but without requiring macro processing.

Predefined names

A notion related to reserved words is that of predefined functions, methods, subroutines, types, or variables, particularly library routines from the standard library. These are similar in that they are part of the basic language, and may be used for similar purposes. However, they differ in that the name of one of these entities is typically categorized as an identifier instead of a reserved word, and is not treated specially in the syntactic analysis. Further, reserved words may not be redefined by the programmer, but predefineds can often be overridden for the extent of some scope.
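Overriding a predefined name for the extent of a scope can be shown with Python's built-in len, chosen here only as a familiar example. The function shout and its replacement behaviour are invented for the illustration.

```python
import builtins


def shout(items):
    """Rebind the predefined name `len` inside this function only."""
    len = lambda seq: "many"  # shadows the built-in within this scope
    return len(items)


if __name__ == "__main__":
    data = [1, 2, 3]
    print(shout(data))              # uses the local override
    print(len(data))                # the built-in is untouched elsewhere
    print(builtins.len is len)      # module-level name still refers to it
```

No comparable rebinding is possible for a reserved word such as if, because the parser rejects the assignment before any scope rules apply.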

Languages vary as to what is provided as a keyword and what is a predefined. Some languages, for instance, provide keywords for input/output operations whereas in others these are library routines. In Python (versions earlier than 3.0) and many BASIC dialects, print is a keyword. In contrast, the C, Lisp, and Python 3.0 equivalents printf, format, and print are functions in the standard library. Similarly, in Python prior to 3.0, None, True, and False were predefined variables, but not reserved words, but in Python 3.0 they were made into reserved words.^[11]
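In Python 3 the two categories can be told apart at run time with the standard keyword and builtins modules. The classifier below is a rough sketch; the function name classify and its three labels are assumptions made for this example, and the results reflect whichever Python 3 interpreter runs it.

```python
import builtins
import keyword


def classify(name: str) -> str:
    """Label a name as reserved, predefined, or ordinary (Python 3 view)."""
    if keyword.iskeyword(name):
        return "reserved word"
    if hasattr(builtins, name):
        return "predefined name"
    return "ordinary identifier"


if __name__ == "__main__":
    for name in ("print", "None", "True", "False", "if", "spam"):
        print(name, "->", classify(name))
```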