The Naming of Cats is a difficult matter,
It isn't just one of your holiday games;
You may think at first I'm as mad as a hatter
When I tell you, a cat must have THREE DIFFERENT NAMES.
— T. S. Eliot (1888–1965), The Naming of Cats
Suppose we have a tomato defined with
    name 'fried' 'green' 'tomato',
but which is going to redden later and need to be referred to as “red tomato”. The name property holds an array of dictionary words, so that
    (tomato.#name)/2 == 3
    tomato.&name-->0 == 'fried'
    tomato.&name-->1 == 'green'
    tomato.&name-->2 == 'tomato'
(Recall that X.#Y tells you the number of -> entries in such a property array, in this case six, so that X.#Y/2 tells you the number of --> entries, in this case three.) You are quite free to alter this array during play:
    tomato.&name-->1 = 'red';
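Where the same substitution might be wanted for several objects, the assignment can be wrapped in a small routine. The following is only a sketch: ReplaceName and its argument names are hypothetical, not part of the library.

```inform6
! Hypothetical helper, not a library routine: replace every occurrence
! of old_word in obj's name array by new_word.
[ ReplaceName obj old_word new_word  i n;
    n = (obj.#name)/2;               ! number of --> entries
    for (i = 0: i < n: i++)
        if (obj.&name-->i == old_word)
            obj.&name-->i = new_word;
];
```

With this in place, ReplaceName(tomato, 'green', 'red'); has the same effect as the assignment above, without the designer needing to know which entry holds 'green'.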
The down side of this technique is that it's clumsy, when all's said and done, and not so very flexible, because you can't change the length of the tomato.&name array during play. Of course you could define the tomato
    with name 'fried' 'green' 'tomato' 'blank.' 'blank.' 'blank.' 'blank.' 'blank.' 'blank.' 'blank.' 'blank.' 'blank.' 'blank.' 'blank.' 'blank.' 'blank.' 'blank.' 'blank.',
or something similar, giving yourself another (say) fifteen “slots” to put new names into, but this is inelegant even by Inform standards. Instead, an object like the tomato can be given a parse_name routine, allowing complete flexibility for the designer to specify just what names it does and doesn't match.
It is time to begin looking into the parser and how it works.
The Inform parser has two cardinal principles: firstly, it is designed to be as “open-access” as possible, because a parser cannot ever be general enough for every game without being highly modifiable. This means that there are many levels on which you can augment or override what it does. Secondly, it tries to be generous in what it accepts from the player, understanding the broadest possible range of commands and making no effort to be strict in rejecting ungrammatical requests. For instance, given a shallow pool nearby, “examine shallow” has an adjective without a noun: but it's clear what the player means. In general, all sensible commands should be accepted but it is not important whether or not nonsensical ones are rejected.
The first thing the parser does is to read in text from the keyboard and break it up into a stream of words: so the text “wizened man, eat the grey bread” becomes
wizened / man / , / eat / the / grey / bread
and these words are numbered from 1. At all times the parser keeps a “word number” marker to keep its place along this line, and this is held in the variable wn. The routine NextWord() returns the word at the current position of the marker, and moves it forward, i.e., adds 1 to wn. For instance, the parser may find itself at word 6 and trying to match “grey bread” as the name of an object. Calling NextWord() returns the value 'grey' and calling it again gives 'bread'.
Note that if the player had mistyped “grye bread”, “grye” being a word which isn't mentioned anywhere in the program or created by the library, then NextWord() returns 0 for ‘not in the dictionary’. Inform creates the dictionary of a story file by taking all the name words of objects, all the verbs and prepositions from grammar lines, and all the words used in constants like 'frog' written in the source code, and then sorting these into alphabetical order.
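Because NextWord() always moves the marker forward, a routine which wants to look at the next word without using it up has to save and restore wn for itself. A minimal sketch, in which PeekWord is a hypothetical name rather than a library routine:

```inform6
! Hypothetical helper: return the word at the marker without moving it.
[ PeekWord  saved_wn w;
    saved_wn = wn;      ! remember the current position
    w = NextWord();     ! dictionary word, or 0 if not in the dictionary
    wn = saved_wn;      ! put the marker back where it was
    return w;
];
```

In the example text, with wn at 6, PeekWord() would give 'grey' and leave wn at 6, so that a following NextWord() would give 'grey' again.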
▲
However, the story file's dictionary only has 9-character resolution. (And only 6 if Inform has been told to compile an early-model story file: see §45.) Thus the values of 'polyunsaturate' and 'polyunsaturated' are equal. Also, upper case and lower case letters are considered the same. Although dictionary words are permitted to contain numerals or typewriter symbols like -, : or /, these cost as much as two ordinary letters, so 'catch-22' looks the same as 'catch-2' or 'catch-207'.
▲▲
A dictionary word can even contain spaces, full stops or commas, but if so it is ‘untypeable’. For instance, 'in,out' is an untypeable word because if the player were to type something like “go in,out”, the text would be broken up into four words, go / in / , / out. Thus 'in,out' may be in the story file's dictionary but it will never match against any word of what the player typed. Surprisingly, this can be useful, as it was at the end of §18.
Since the story file's dictionary isn't always perfect, there is sometimes no alternative but to actually look at the player's text one character at a time: for instance, to check that a 12-digit phone number has been typed correctly and in full.
The routine WordAddress(wordnum) returns a byte array of the characters in the word, and WordLength(wordnum) tells you how many characters there are in it. Given the above example text of “wizened man, eat the grey bread”:
    WordLength(4) == 3
    WordAddress(4)->0 == 'e'
    WordAddress(4)->1 == 'a'
    WordAddress(4)->2 == 't'
because word number 4 is “eat”. (Recall that the comma is considered as a word in its own right.)
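For the phone number case mentioned above, these two routines are enough to test a word one character at a time. The following is a sketch only: IsDigitString is a hypothetical name, and the required length is passed in as an argument.

```inform6
! Hypothetical helper: true if word number wordnum consists of exactly
! len characters, every one of them a digit from '0' to '9'.
[ IsDigitString wordnum len  addr i c;
    if (WordLength(wordnum) ~= len) rfalse;
    addr = WordAddress(wordnum);
    for (i = 0: i < len: i++) {
        c = addr->i;                      ! one character of the text
        if (c < '0' || c > '9') rfalse;
    }
    rtrue;
];
```

A call such as IsDigitString(wn, 12) would then check the word at the current marker position, without consulting the dictionary and so without being limited by its resolution.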
▲
The parser provides a basic routine for comparing a word against the texts '0', '1', '2', …, '9999', '10000' or, in other words, against small numbers. This is the library routine TryNumber(wordnum), which tries to parse the word at wordnum as a number and returns that number, if it finds a match. Besides numbers written out in digits, it also recognises the texts 'one', 'two', 'three', …, 'twenty'. If it fails to recognise the text as a number, it returns −1,000; if it finds a number greater than 10,000, it rounds down and returns 10,000.
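Since TryNumber looks at a given word number rather than at the marker, a caller reading along the line has to advance wn itself. A sketch of one way to do so, in which ReadSmallNumber is a hypothetical name:

```inform6
! Hypothetical helper: read a small number at the current word marker.
! Returns the number and steps past the word, or returns -1000 and
! leaves wn alone if the word is not recognised as a number.
[ ReadSmallNumber  n;
    n = TryNumber(wn);
    if (n == -1000) return -1000;
    wn++;
    return n;
];
```

The failure value is passed straight through here, so the caller tests for −1,000 exactly as it would after calling TryNumber directly.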
To return to the naming of objects, the parser normally recognises any arrangement of some or all of the name words of an object as a noun which refers to it: and the more words, the better the match is considered to be. Thus “fried green tomato” is a better match than “fried tomato” or “green tomato” but all three are considered to match. On the other hand, so is “fried green”, and “green green tomato green fried green” is considered a very good match indeed. The method is quick and good at understanding a wide variety of sensible texts, though poor at throwing out foolish ones. (An example of the parser's strategy of being generous rather than strict.) To be more precise, here is what happens when the parser wants to match some text against an object:
1. If the object has a parse_name routine, ask this routine to determine how good a match there is.
2. If there was no parse_name routine, or if there was but it returned −1, ask the entry point routine ParseNoun, if the game has one, to make the decision.
3. If there was no ParseNoun entry point, or if there was but it returned −1, look at the name of the object and match the longest possible sequence of words given in the name.
So: a parse_name routine, if provided, is expected to try to match as many words as possible starting from the current position of wn and reading them in one at a time using the NextWord() routine. Thus it must not stop just because the first word makes sense, but must keep reading and find out how many words in a row make sense. It should return:
    0     if the text didn't make any sense at all,
    k     if k words in a row of the text seem to refer to the object, or
    −1    to tell the parser it doesn't want to decide after all.
The word marker wn can be left anywhere afterwards. For example, here is the fried tomato with which this section started:
    parse_name [ n colour;
        if (self.ripe) colour = 'red'; else colour = 'green';
        while (NextWord() == 'tomato' or 'fried' or colour) n++;
        return n;
    ],
The effect of this is that if tomato.ripe is true then the tomato responds to the names “tomato”, “fried” and “red”, and otherwise to “tomato”, “fried” and “green”.
As a second example of how parse_name can be useful, suppose you define:
    Object -> "fly in amber"
      with name 'fly' 'in' 'amber';
If the player then types “put fly in amber in hole”, the parser will be thrown, because it will think “fly in amber in” is all just naming the object and then it won't know what the word “hole” is doing at the end. However:
    Object -> "fly in amber"
      with parse_name [;
               if (NextWord() ~= 'fly' or 'amber') return 0;
               if (NextWord() == 'in' && NextWord() == 'amber') return 3;
               return 1;
           ];
Now the word “in” is only recognised as part of the fly's name if it is followed by the word “amber”, and the ambiguity goes away. (“amber in amber” is also recognised, but then it's not worth the bother of excluding.)
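Neither of these routines uses the return value −1. As a sketch of where it might be used, assume a hypothetical coin with a hypothetical polished property: until the coin is polished, the routine declines to decide, and matching falls through to ParseNoun (if the game has one) and then to the ordinary name words.

```inform6
Object coin "coin"
  with name 'coin' 'dull',
       polished false,                  ! hypothetical property
       parse_name [ n;
           ! Until polished, decline to decide: the parser goes on to
           ! ParseNoun (if the game has one) and then to the name words.
           if (~~(self.polished)) return -1;
           while (NextWord() == 'coin' or 'shiny') n++;
           return n;
       ];
```

Under these assumptions “dull coin” is understood only while polished is false, and “shiny coin” only once it is true.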
▲
parse_name is also used to spot plurals: see §29.
•
EXERCISE 71
Rewrite the tomato's parse_name to insist that the adjectives must come before the noun, which must be present.
•
EXERCISE 72
Create a musician called Princess who, when kissed, is transformed
into “/?%?/ (the artiste formerly known as Princess)”.
•
EXERCISE 73
Construct a drinks machine capable of serving cola, coffee or tea,
using only one object for the buttons and one for the possible drinks.
•
EXERCISE 74
Write a parse_name routine which looks through name in just the way that the parser would have done anyway if there hadn't been a parse_name in the first place.
•▲
EXERCISE 75
Some adventure game parsers split object names into ‘adjectives’
and ‘nouns’, so that only the pattern ‹0 or more adjectives›
‹1 or more nouns› is recognised. Implement this.
•
EXERCISE 76
During debugging it sometimes helps to be able to refer to objects by
their internal numbers, so that “put object 31 on object 5”
would work. Implement this.
•▲
EXERCISE 77
How could the word “#” be made a wild-card,
meaning “match any single object”?
•▲▲
EXERCISE 78
And how could “*” be a wild-card for “match
any collection of objects”? (Note: you need to have read
§29 to answer this.)
•
REFERENCES
Straightforward parse_name examples are the chess pieces object and the kittens class of ‘Alice Through the Looking-Glass’. Lengthier ones are found in ‘Balances’, especially in the white cubes class.
•Miron Schmidt's library extension "calyx_adjectives.h", based on earlier work by Andrew Clover, provides for objects to have “adnames” as well as “names”: “adnames” are usually adjectives, and are regarded as being less good matches for an object than “names”. In this system “get string” would take either a string bag or a ball of string, but if both were present would take the ball of string, because “string” is in that case a noun rather than an adjective.