HELP ITEM_CHARTYPE                              Updated A.Sloman July 1986

    item_chartype(<ascii_code>) -> <integer>
    <integer> -> item_chartype(<ascii_code>)

To find the class number currently associated with a character with a
given *ASCII code, do:

    item_chartype(<ascii_code>) -> <class_number>

To assign a new class number to the character with the given ASCII code,
do:

    <class_number> -> item_chartype(<ascii_code>)

ITEM_CHARTYPE is used to access or update the 'type' information
associated with a character. This information affects the way the POP-11
itemiser breaks up a stream of characters into text items. For example,
to turn Z into a character which does not combine with anything else, do:

    5 -> item_chartype(`Z`);

(5 is the class number of the separators).

ITEM_CHARTYPE can optionally be given an item repeater as its last
argument (i.e. a procedure created by *INCHARITEM, or *ITEMREAD), in
which case only that item repeater is affected by the change in class
number. For example

    5 -> item_chartype(`Z`, itemread);

only affects the type of Z for the current call of *COMPILE.

There are 12 character classes, as follows:

    Class   Description
    -----   -----------
      1     Alphabetic - the letters a-z, A-Z.

      2     Numeric - the numerals 0-9.

      3     Signs - characters like "+", "-", "#", "$", "&" etc.
            A character in class 10 or 11 ("bracketed comment" 1 & 2)
            is treated as belonging to this class when it does not
            occur in the context of such a comment.

      4     Underscore, i.e. "_"

      5     Separators - the characters ".", ",", ";", """, "%" and
            the brackets "[", "]", "{", "}". Control characters are
            also included in this class, except for those in class 6.
            Also included are the 8-bit characters (codes 128 to 255).

      6     Spaces - the space, tab and newline characters.

      7     String quote - the character "'" (See HELP * STRINGS)

      8     Character quote - the character "`" (See HELP * ASCII)

      9     End-of-line comment character - the character ";"
            In the case of ";" only, three successive characters
            (i.e. ";;;") are needed for a comment (See REF *ITEMISE).
     10     Bracketed comment or sign, 1st character - default "/"
            e.g. for comments of form /* ...... */

     11     Bracketed comment or sign, 2nd character - default "*"

     12     Alphabeticiser - this is a special class that forces the
            next character in the input stream to be of class
            alphabetic, i.e. class 1 - see notes below.

New character classes other than these can be defined with the procedure
-item_newtype- (see REF * ITEMISE/item_newtype).

Notes:

1.  A character of class 12 forces the character immediately following
    it to be treated as alphabetic, regardless of its normal class. The
    next character is also interpreted like one following "\" in strings
    and character constants (e.g. "n" represents newline, "t" tab, and
    "^A" Ctrl-A, etc). As yet, no character has class 12 as standard,
    but assuming "\" did, then the following would be examples of legal
    input words for the itemiser:

        \+ABC\^A        (characters +, A, B, C, CTRL-A)
        \1234           (    "      1, 2, 3, 4)
        \n              (    "      <newline> i.e. ASCII 10)
        XY\t\n          (    "      X, Y, <tab>, <newline>)

2.  There are two classes for UNIX-style 'bracketed' comments (classes
    10 and 11), allowing comments like

        1 -> x;     /* this is a comment */     2 -> y;

    where in this example "/" has class BC1 (Bracketed Comment 1) and
    "*" class BC2 (Bracketed Comment 2). In other words, BC1 followed by
    BC2 starts the comment and BC2 followed by BC1 ends it. Nested
    comments are allowed as in:

        /*
        1 -> x;     /* this is a comment */     2 -> y;
        */

    All other occurrences of BC1 or BC2 are taken as of class SIGN, so
    that while "/" and "*" have these classes as standard, the
    arithmetic operators that use these as sign characters remain
    unchanged.

See also
    REF  *ITEMISE        - for further details of itemisation procedures
    HELP *ASCII          - on character codes in POP-11
    HELP *COMMENT        - on commenting POP-11 programs
    HELP *ALPHABETICISER - turning arbitrary characters into alphabetic

-----<Copyright University of Sussex 1986.  All rights reserved.>-------