The information in this document is subject to change without notice
and should not be construed as a commitment by Digital Equipment
Corporation. Digital Equipment Corporation assumes no responsibility
for any errors that may appear in this document.
The software described in this document is furnished under a license
and may be used or copied only in accordance with the terms of such
license.
Digital Equipment Corporation assumes no responsibility for the use or
reliability of its software on equipment that is not supplied by
DIGITAL.
Copyright (C) 1974,1979 by Digital Equipment Corporation
The following are trademarks of Digital Equipment Corporation:
DIGITAL DECsystem-10 MASSBUS
DEC DECtape OMNIBUS
PDP DIBOL OS/8
DECUS EDUSYSTEM PHA
UNIBUS FLIP CHIP RSTS
COMPUTER LABS FOCAL RSX
COMTEX INDAC TYPESET-8
DDT LAB-8 TYPESET-10
DECCOMM DECsystem-20 TYPESET-11
Page 2
table of contents
1.0 overview 3
1.1 precepts and document organization 3
1.2 arrays, strings, and string operations 4
1.3 usage of "strlib" 6
2.0 declarative conventions and the string data-types 8
2.1 storage allocation 8
2.2 data-typing of strings 8
2.3 string pointers 10
2.4 bounds checking 11
3.0 the level-1 routines 12
3.1 the comparative routines 14
3.2 the copying routines 15
3.3 routines which return substrings 16
3.4 the routines which search strings 18
3.5 miscellaneous routines 20
4.0 level-2 related terminology 24
4.1 subsidiary values 24
4.2 data-directed routines -- "mode" values 24
4.3 character search terminology 25
4.4 comparing strings of different lengths 26
5.0 the level-2 routines 27
5.1 the data-directed comparative routine 27
5.2 the data-directed copying routines 29
5.3 the data-directed string searching routine 32
5.4 character-searching 35
5.5 conversions and mappings 39
6.0 error conditions 44
6.1 the defined conditions 42
7.0 implementation characteristics 46
7.1 "strlib" configurations 46
8.0 a programming example 48
Page 3
1.0 overview
this document describes the character string manipulation facilities
provided for the fortran-10 user by the string manipulation package,
"strlib". this initial section is devoted to outlining the interface
between the fortran user and "strlib" and to developing the
operational primitives upon which most string usage is based.
1.1 precepts and document organization
historically, fortran has been word-oriented. but whereas the line
between "word-machines" and "character-machines" has softened under
the pressure of user needs, fortran has lagged behind.
consequently the bulk of the string
manipulating capabilities one would like
to have must be grafted onto fortran in
a manner which is essentially
transparent to the source language
syntax.
in other words, one must use subprograms in lieu of string
manipulating statements, and one must establish conventions
by which the existing data-typing and storage allocating
mechanisms of fortran (e.g. "integer" statement,
"dimension" statement) can be used to describe strings to a
string manipulation package. the declarative conventions
employed by "strlib", and also the role of literals, are
discussed in section two of this document.
the routines constituting the string manipulation package
are divided into two groups. the two groups will
respectively be labeled the level-1 routines and the level-2
routines. the distinction is made purely for expositional
clarity. the level-1 routines are intended to provide the
"basic" string manipulation capabilities, and the level-2
routines provide either more specialized routines or more
efficient mechanisms for performing certain operations.
however the added capability of level-2 is achieved at the
cost of a more complicated user interface and additional
terminology. section three is devoted to developing the
level-1 routines and sections four and five will
respectively introduce the additional terminology and
describe the level-2 routines.
section six is important as a reference since it contains a
description of each of the run-time warnings which "strlib"
can generate. it also tells how to control the amount of
error checking which is done. section seven can be skipped
by most readers since it is used to delve into some of the
internal workings of string and to suggest how "strlib" can
be used in other than the fortran environment.
section eight contains two commented programming examples
which illustrate many of the capabilities within "strlib".
Page 4
1.2 arrays, strings, and string operations
a string is to a character approximately what an array is to
a word. and in fact, even though the character-referencing
routines have not been given names that reflect it, one can
think of a string as an array of characters. for
expositional purposes this analogy will be taken advantage
of -- i.e. the familiar subscript notation will be used to
denote the characters of a string and to introduce the basic
string operations.
(note: within this document a string constant (i.e.
literal) will be represented as it is within a fortran
program -- enclosed in single quotes. for example, 'zzz'
is a string constant in the same sense that 3 is a numeric
constant. also the term, "length of a string", will be
used interchangeably with the term, "the number of
characters in a string" -- e.g. the length of 'abcde' is
five.)
1.2.1 concatenation
def. 1.1 concatenation is the string operation for
combining a group of strings together into a single string.
(!!) will be used to denote the infix concatenation
operator.
in terms of arrays, "c = a !! b" means the following:
dimension a(5),b(6),c(11)
do 100 i=1,5
100 c(i)=a(i)
do 200 i=6,11
200 c(i)=b(i-5)
thus b(1) is made to immediately follow a(5) within (c).
similarly " 'aaabbbccc' = 'aaa' !! 'bbb' !! 'ccc' ".
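the array analogy above can also be sketched in a modern language. the following python fragment is only an illustration and is not part of "strlib"; the function name "concat" is hypothetical.

```python
def concat(*strings):
    """model of the infix (!!) operator: copy each operand, in order, into one result."""
    result = []
    for s in strings:
        for ch in s:          # character-by-character copy, as in the do-loops above
            result.append(ch)
    return "".join(result)

print(concat("aaa", "bbb", "ccc"))
```

as in the array version, the first character of each operand is placed immediately after the last character of the operand before it.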
1.2.2 lexical comparisons
just as it is useful to compare numbers, it is useful to
compare strings. however the mechanism of comparison is
slightly different in that string comparison is character by
character and left justified. on the other hand there are
six basic lexical relational operators just as there are six
numeric relational operators. that is, one string can be --
equal to, not equal to, greater than, less than, greater
than or equal to, or less than or equal to -- a second
string.
in the descriptions which follow, it is always assumed that
the two strings are of equal length.
def. 1.2. two strings are considered equal if each
character in the first string is equal to the corresponding
character in the second string (e.g. 'abd' = 'abd', but
'abc' not = 'abd').
def. 1.3 thru 1.7. the comparative rule -- in terms of
arrays -- for ".op." equaling one of .ne., .lt., .gt., .ge.,
or .le. is as follows:
dimension a(n),b(n)
do 100 i=1,n
if (a(i) .eq. b(i)) goto 100
if (a(i) .op. b(i)) goto success
goto failure
100 continue
in other words "a .op. b" succeeds if and only if the first
encountered unequal pair of characters is related by ".op.".
if no unequal pair is encountered the strings are equal, so
that only .ge. and .le. succeed.
for example, 'abd' is greater than 'abcc' because the first
unequal character pair -- respectively 'd' and 'c' -- is
such that the character in the first string is greater than
the character in the second string.
(note: for strings, each a(i) and each b(i) is constrained
to be in the range, zero to 2**n-1, where (n) is the number
of bits in a character (i.e. for ascii, the range is
0-127)).
(note: all comparative routines use the full character set.
for instance, capital "a" is in no way considered equal to
little "a").
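the comparative rule of defs. 1.3 thru 1.7 can be sketched as follows. this is a python model written for this discussion, not "strlib" code; the name "compare" and the table "OPS" are hypothetical, and the sketch assumes (as the text does) that both strings have the same length.

```python
import operator

# the five relational operators covered by defs. 1.3 thru 1.7
OPS = {".ne.": operator.ne, ".lt.": operator.lt, ".gt.": operator.gt,
       ".ge.": operator.ge, ".le.": operator.le}

def compare(a, b, op):
    """apply the first-unequal-pair rule to two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes strings of equal length")
    for ca, cb in zip(a, b):
        if ca != cb:
            # the first unequal pair of characters decides the outcome
            return OPS[op](ord(ca), ord(cb))
    # no unequal pair was found, so the strings are equal
    return op in (".ge.", ".le.")

print(compare("abd", "abc", ".gt."))
```

the comparison uses the full character code of each character, so upper and lower case letters are never treated as equal.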
1.2.3 parts of strings
the converse of the concatenation operation is the ability
to deal with parts of a string.
def. 1.8. a substring of a string is any contiguous group
of characters within the containing string.
for instance, 'bbb' is a substring of 'aaabbbccc'. in the
array example following, (b) is caused to equal the 2nd thru
5th elements of (a).
dimension a(11),b(4)
do 100 i=2,5
100 b(i-1)=a(i)
1.2.4 input/output
although input/output is not strictly a string operation,
the word orientation of fortran does make it necessary to
make some provision of word-orienting strings for output and
"un-word-orienting" them for input. the concept in "strlib"
central to this issue is that of "storage block", and
storage blocks are discussed in sections 2.1 and 2.2.
additionally the routines which most closely relate to this
issue are "bldstr" for input (see section 3.5) and "alnstr"
for output (see section 3.2).
Page 6
1.2.5 the null string
def. 1.9. the null string is any string with length of
zero.
the null string can be used to locate a point within a string,
and the usefulness of this will be seen in the examples of
sections three and five. a user can create a null string in
several ways. the three direct ways -- in the sense that
one explicitly sets string length to zero -- are to use
"bldstr", "setstr", or "vecstr" which are described in
section three.
1.3 usage of "strlib"
the capabilities of the string manipulation package "strlib"
are accessible to the fortran programmer as user functions
and/or user subroutines, and the package exists as a library
file named "string.rel". for example, to use "strlib" from
"ccl" one could type:
.load user.prg,string/lib
and to use it from "link" one could type:
.r link
*user.prg,string/sea/go
the location of "strlib" is obviously installation
dependent, but normally one would expect it to be either
"sys:" or "new:".
within "strlib" a naming convention is upheld for the
routine names. each routine name consists of three
descriptive letters followed by either "str" or "chr",
whichever is applicable. for example, there is "copstr" and
"fndchr".
an alphabetical list of the level-1 routines in "strlib" is
as follows:
aftstr allstr alnstr befstr
bldstr bndstr catstr copstr
eqlstr geqstr gtrstr lenstr
leqstr lesstr relstr repstr
revstr setstr trcstr vecstr
whistr appstr
the list of level-2 routines is:
aftchr allchr befchr chkstr
cmbstr cmpstr cnvstr fndchr
fndstr mapstr tabstr taostr
tazstr tofstr tonstr whichr
copchr
Page 7
2.0 declarative conventions and the string data-types
just as there are several numeric data types (e.g. real,
complex), it is useful to define more than one string data
type. but as noted in section one, it is necessary to use
the existing data types of fortran in conjunction with
conventions while achieving this. similarly it is necessary
to use the existing storage allocation mechanisms of
fortran, and these are outlined next.
2.1 storage allocation
within fortran one can allocate storage in essentially three
ways:
1) in a data-typing statement (i.e. integer, real)
2) in an array dimensioning statement (e.g. dimension
statement)
3) by using a constant (or literal) in an executable
statement.
aside from obvious length restrictions as regards
unsubscripted scalars, all of these mechanisms can be used
to allocate strings. however it is necessary to recall the
word orientation of fortran; if one uses the statement
"dimension a(6)" to allocate space for a string, one has
actually allocated space for thirty (ascii) characters since
there are five characters per word.
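the arithmetic behind "five characters per word" can be sketched in python. the layout used here (7-bit ascii characters packed left-justified in a 36-bit word, leaving the low-order bit unused) is background knowledge about the decsystem-10 rather than something stated in this document, and the names "words_needed", "pack", and "unpack" are hypothetical.

```python
CHARS_PER_WORD = 5     # five 7-bit ascii characters fit in one 36-bit word
BITS_PER_CHAR = 7
WORD_BITS = 36

def words_needed(n_chars):
    """number of words that must be dimensioned to hold n_chars characters."""
    return -(-n_chars // CHARS_PER_WORD)      # ceiling division

def pack(text):
    """pack text into 36-bit words, blank padding the last word."""
    padded = text.ljust(words_needed(len(text)) * CHARS_PER_WORD)
    words = []
    for i in range(0, len(padded), CHARS_PER_WORD):
        word = 0
        for k, ch in enumerate(padded[i:i + CHARS_PER_WORD]):
            shift = WORD_BITS - BITS_PER_CHAR * (k + 1)   # 29, 22, 15, 8, 1
            word |= (ord(ch) & 0x7F) << shift
        words.append(word)
    return words

def unpack(words):
    """recover the characters held in a list of packed words."""
    chars = []
    for word in words:
        for k in range(CHARS_PER_WORD):
            shift = WORD_BITS - BITS_PER_CHAR * (k + 1)
            chars.append(chr((word >> shift) & 0x7F))
    return "".join(chars)

print(words_needed(30))          # words required for thirty characters
print(repr(unpack(pack("days"))))
```

the second print shows the blank padding that occurs when a short string is stored in a whole word.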
2.2 data-typing of strings
before enumerating the string data-types, argument passing
must be discussed since it is via this mechanism that all
communication between the fortran programmer and "strlib"
occurs. there are four classes of arguments that one will
have occasion to pass to "strlib".
1) fixed-point numbers.
2) storage blocks. these can be created by any of the
mechanisms discussed in section 2.1 and can be declared
with any data type.
3) bit masks. these are one word quantities in which each
bit position carries independent information -- only a
user of level-2 routines need be cognizant of this sort
of argument.
4) strings. again note that it is only for arguments that
are expected to be strings that "strlib" will do any
data-type checking.
the data-typing conventions are as follows:
1) a string argument which has been typed either "real" or
"integer" will be treated as a string of length five
(irrespective of any dimensioning). for example:
integer istr
real rstr
rstr='happy'
istr='days'
call a-string-routine (rstr)
call a-string-routine (istr)
the first call passes the string 'happy', and the second
call passes the string 'days '. note that fortran will
blank pad in the assignment " istr='days' ".
similarly " istr='' " will set "istr" equal to five
blanks.
2) a string argument which has been typed "double precision"
will be treated as a string of length ten (irrespective
of any dimensioning).
3) a string argument which has been typed "logical" will be
treated as a data-varying string. a data-varying string
has the property that the length of the string is stored
in the word preceding the string.
(note: the maximum possible length that can be specified
for a data-varying string is 2**18-1 characters.)
storage is normally allocated for a data-varying string
by a dimensioning statement of some sort. also one must
be careful to allocate room for the character count as
well.
consider an example:
dimension l(7)
logical l
100 l(1)=9
200 call a-string-routine( l(2) )
300 l(1)=5
400 call a-string-routine( l(2) )
the above program excerpt allocates a word for the
character count (i.e. l(1)) and space for a string of
thirty characters starting at l(2). the statement at 100
causes the call at 200 to treat l(2) as the starting
point of a string which currently contains nine
characters, and the statement at 300 causes the
invocation of "a-string-routine" at 400 to treat l(2) as
a string of length five.
4) a string argument which has been typed "complex" will be
treated as a "string pointer". the idea of a string
pointer is very important to the internal workings of the
string manipulation package, but it is possible to give a
casual user a "black box" description of what string
pointers make possible without going into the nature of a
string pointer. this will be done here and a more
detailed description will be delayed until the next
section, section 2.3.
the power of string pointers is inherent in the way
fortran-10 currently defines the concept of "function".
within fortran, a function is a subprogram which
(computes and) returns a value in hardware registers zero
(and one). consequently to avoid the necessity of having
to say something like:
call copy-string(tstring,any-string,2,13)
iyesno=equal-string(tstring,string2)
as opposed to:
iyesno=equal-string(sub-string(any-string,2,13),string2)
one must be able to pass the information communicated by
an arbitrary length string within an actual argument
containing one or two words. and this is exactly what a
string pointer does allow. also string pointers allow
references to (sub)strings to be made without copying the
substring into a user variable or temporary area.
there are several routines in "strlib" which return
string pointers (see sections three and five). to use
these routines as "black boxes" (i.e. as though they
return strings), one need only do two things.
a) declare these routines as "complex".
b) use these routines only as arguments to other "strlib"
routines.
5) a fortran literal (i.e. hollerith constant) will be
treated by each of the routines of "strlib" as a string
constant. however the word orientation of fortran is
such that all literals are given a length which is a
multiple of five characters. for example 'aaabbbc' will
be right padded with three blanks so that its length will
be ten. to circumvent the unavailability of exact length
string constants, there is a routine in "strlib" to
truncate (strip) trailing blanks from a string, and it is
called "trcstr". note that "trcstr" is one of the string
pointer returning routines.
(note: a literal may appear anywhere any other string can.
consequently "strlib" will attempt to overwrite a literal
which is used as the destination of one of the string
modifying routines.)
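the length conventions listed above can be summarized in a short python model. it is a sketch for illustration only: the names "pad_literal" and "string_length" are hypothetical, "trcstr" here is a stand-in that returns a plain string rather than a string pointer, and arguments typed "complex" (string pointers) are left out.

```python
def pad_literal(text):
    """a literal is blank padded on the right to a multiple of five characters."""
    return text.ljust(-(-len(text) // 5) * 5)

def string_length(declared_type, words=None):
    """length in characters assumed for a string argument of the given type."""
    if declared_type in ("integer", "real"):
        return 5
    if declared_type == "double precision":
        return 10
    if declared_type == "logical":
        # data-varying string: words[0] models the count word that
        # precedes the first word of the string
        return words[0]
    raise ValueError("type not covered by this sketch: " + declared_type)

def trcstr(text):
    """stand-in for "trcstr": strip trailing blanks."""
    return text.rstrip(" ")

literal = pad_literal("aaabbbc")
print(repr(literal), len(literal))
print(repr(trcstr(literal)))
```

for a data-varying string, changing the count word changes the length the routines see without touching the characters themselves, which is what statements 100 and 300 in the example above rely on.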
2.3 string pointers
what information precisely describes a string? there are
essentially two things one must know. first of all, one
must know where the string starts -- i.e. the address of
its first character. secondly since a string can be an
arbitrary number of characters, one has to know its length.
consequently a string pointer contains a byte pointer (as
its first word) since this is the decsystem-10's mechanism
of dealing with characters (i.e. bytes), and its second word
contains the number of characters in the string pointed at.
what does a string pointer point at? it points at a user
allocated storage block. in other words, the processes of
declaring a string pointer and allocating storage for the
string pointed at are independent of one another. in order
to transform a storage block and a desired initial length
into a string pointer, one must use the routine called
"bldstr". this routine is described in detail in section
three. for example:
dimension block(11)
complex bldstr,strptr
data initl/26/
strptr=bldstr(block,initl,0)
this example builds a string pointer which points at "block"
and has initial length of twenty-six. note that there is
room for 11*5 = 55 characters however.
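the two-part nature of a string pointer (where the string starts, and how long it is) can be modelled in python as below. this is an assumed illustration, not the "strlib" representation: the class "StringPointer" and the function "build" are hypothetical, a character offset stands in for the hardware byte pointer, and the third argument of "bldstr" is not modelled.

```python
from dataclasses import dataclass

@dataclass
class StringPointer:
    block: bytearray   # the user allocated storage block
    start: int         # offset of the first character (stands in for the byte pointer)
    length: int        # number of characters in the string pointed at

    def text(self):
        """return the characters currently described by the pointer."""
        return self.block[self.start:self.start + self.length].decode("ascii")

def build(block, initial_length):
    """model of building a pointer at the start of a storage block."""
    return StringPointer(block, 0, initial_length)

block = bytearray(b" " * 55)      # dimension block(11): 11 words * 5 characters
strptr = build(block, 26)

block[0:5] = b"hello"             # storage is changed after the pointer was built
print(repr(strptr.text()[:5]), strptr.length)
```

because the pointer refers to the storage block rather than holding a copy, the change made to "block" is visible through "strptr".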
2.4 bounds checking
just as it is possible to reference an array element which
is out of bounds of the array's storage allocation, it is
possible to reference or attempt to modify a character which
is outside of the allocated length of a string. a user of
"strlib" can cause bounds checking of string usage to occur,
when string pointers or data-varying strings are used, by
specifying a maximum length for the string when a call to
either "bldstr" or "setstr" is made -- see section 3.5 for
details. also for string variables typed "integer", "real"
and "double precision", bounds checking is always done as a
side effect of the fixed-length nature of these sorts of
string variables.
whenever a routine of "strlib" detects an out-of-bounds
reference it will print a warning message on the user's
console -- see section six for a description of the warning
messages.
Page 11
3.0 the level-1 routines
the routines in "strlib" communicate information to their
callers in one of four ways. the rest of this section will
be used to introduce these methods of communication and
enumerate the routines which fall into each class (including
level-2 routines). then in sections 3.1 thru 3.5 the
level-1 routines will be grouped by function and described
individually.
3.01 string-modifying routines
since arbitrary length strings can not be returned by
functions, text movement has to occur by modification of one
of the arguments specified in the invocation of a
string-modifying routine. for each of the string modifying
routines, the string to be modified is the first argument of
the routine.
the second point about the string-modifying routines is that
each is (potentially) aware of an attempt to overflow the
destination string. when a call to one of these routines does
overflow the destination string, the in-bounds portion of the
string movement still occurs, and the
routine will return a zero to indicate that string movement
was not completed. completion of string movement is
signalled by returning -1 rather than 0. for example, let
s1 be a string whose maximum is 12 characters and let s2 be
a string whose maximum is 20 characters:
integer modif-routine
i1=modif-routine(s1,'is 15 characs  ')
i2=modif-routine(s2,'is 15 characs  ')
the first call would overflow (s1), leave it equal to
'is 15 charac', and set i1 equal to 0. the second call would run
to completion and leave i2 equal to -1 and s2 equal to
'is 15 characs  '.
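the overflow behavior just described can be sketched in python. the function "modify" is a hypothetical stand-in for "modif-routine"; it models a simple copy into a destination with a known maximum and returns both the resulting contents and the completion value. the source literal is written with two trailing blanks so that it is fifteen characters long.

```python
def modify(maximum, source):
    """return (new destination contents, completion value)."""
    moved = source[:maximum]                          # the in-bounds part is still moved
    completed = -1 if len(source) <= maximum else 0   # -1 = completed, 0 = overflowed
    return moved, completed

s1, i1 = modify(12, "is 15 characs  ")
s2, i2 = modify(20, "is 15 characs  ")
print(repr(s1), i1)
print(repr(s2), i2)
```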
the string modifying routines are:
alnstr appstr catstr chkstr
cmbstr cnvstr copchr copstr
mapstr repstr revstr
(note: "alnstr" is a special case in that the destination
is a storage block rather than a string.
note: "copstr" and "copchr" are special in that they
return no completion value.
note: "revstr" returns a string pointer rather than a
completion value.
note: "repstr" leaves the destination string unchanged if
the replace cannot succeed.
note: if one has no need to worry about out-of-bounds
references, one can invoke a string-modifying routine as a
subroutine rather than as a function if that is desired.)
3.02 routines which return a truth value or integer
these routines should be declared as integer functions;
they return information rather than strings. the simplest
example is "lenstr" which simply returns the number of
characters in its argument. consider some examples:
integer eqlstr,lenstr
if (eqlstr(string1,string2)) go to are-equal
i=eqlstr(string1,string2)
if (i.eq.-1) go to are-equal
j=lenstr(string1)
if (j.gt.lenstr(string2)) goto s1-longer
the information returning routines are:
cmpstr eqlstr fndchr fndstr
geqstr gtrstr lenstr leqstr
lesstr neqstr
(note: some of these routines return subsidiary information
-- e.g. a second integer. the mechanism by which this is
done and the description of each type of subsidiary
information will be presented in section five. for the
purposes of section three, each level-1 routine returns
either an integer or a truth-value).
3.03 routines which return string pointers
this class of routines was introduced in sections 2.2 and
2.3. the following are the routines which return string
pointers:
aftchr aftstr allchr allstr
befchr befstr bldstr bndstr
relstr revstr trcstr vecstr
whichr whistr
(note: allchr and allstr are special cases in that each
actually sets up three string pointers -- one of which is a
return value and two of which are arguments passed to them
by the user.
note: some of these routines can "fail". when one does,
double zero is returned rather than a string pointer.)
3.04 routines invoked as subroutines rather than functions
these routines are somewhat miscellaneous. their
commonality is that each communicates with its caller only
by modifying one of its arguments. the routines in this
class are:
setstr tabstr taostr tazstr
tofstr tonstr
(note: "copstr", and "cnvstr" under certain circumstances,
can be viewed as belonging to this class.)
3.1 the comparative routines
these six routines implement the relational operators
discussed in section 1.2.2. each of them returns ".true."
(ie. -1) if a comparison succeeds, and ".false." (ie. 0)
if it fails.
when the length of the two strings differs, the shorter
string is padded with blanks until it is equal in length
with the longer string. the comparison then proceeds by the
rules outlined in defs. 1.2 thru 1.7.
eqlstr
usage: i = eqlstr(string1,string2,0)
(i) will be set to "true" if string1 is lexically equal
to string2 and "false" otherwise.
geqstr
usage: i = geqstr(string1,string2,0)
(i) will be set to "true" if string1 is lexically
greater than or equal to string2 and "false" otherwise.
gtrstr
usage: i = gtrstr(string1,string2,0)
(i) will be set to "true" if string1 is lexically
greater than string2 and "false" otherwise.
leqstr
usage: i = leqstr(string1,string2,0)
(i) will be set to "true" if string1 is lexically less
than or equal to string2 and "false" otherwise.
lesstr
usage: i = lesstr(string1,string2,0)
(i) will be set to "true" if string1 is lexically less
than string2 and "false" otherwise.
neqstr
usage: i = neqstr(string1,string2,0)
(i) will be set to "true" if string1 is lexically not
equal to string2 and "false" otherwise.
examples:
integer eqlstr, lesstr, geqstr
real*8 dstr
data dstr/'aaa '/
istr = 'aaa'
astr = 'bbb'
if (eqlstr(istr,astr,0)) goto succes
if (lesstr(istr,astr,0)) goto succes
if (geqstr(istr,dstr,0)) goto succes
the first "goto" will not be taken and the second will be
taken. the third will be taken because "istr" will be
padded with blanks to a length of ten.
(note: the non-zero values of the third argument of each of
these routines are discussed in section 5.1.)
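the blank padding rule and the -1/0 truth values can be modelled in python as follows. these functions are stand-ins named after the routines they imitate; they take plain python strings, omit the third argument, and rely on python's built-in string comparison, which for ascii text follows the same first-unequal-character rule. the helper "_pad" is hypothetical.

```python
TRUE, FALSE = -1, 0      # the values used for ".true." and ".false."

def _pad(s1, s2):
    """blank pad the shorter string to the length of the longer one."""
    width = max(len(s1), len(s2))
    return s1.ljust(width), s2.ljust(width)

def eqlstr(s1, s2):
    a, b = _pad(s1, s2)
    return TRUE if a == b else FALSE

def lesstr(s1, s2):
    a, b = _pad(s1, s2)
    return TRUE if a < b else FALSE

def geqstr(s1, s2):
    a, b = _pad(s1, s2)
    return TRUE if a >= b else FALSE

istr = "aaa  "           # a length-five string, as for an "integer" variable
astr = "bbb  "
dstr = "aaa       "      # a length-ten string, as for a "real*8" variable
print(eqlstr(istr, astr), lesstr(istr, astr), geqstr(istr, dstr))
```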
3.2 the copying routines
these routines will be described in terms of the
concatenation operation defined in section 1.2.1.
copstr
usage: call copstr(dest-string,s1)
this is the simplest copying routine. it implements
the assignment statement, " dest-string = s1 ".
appstr
usage: i = appstr(dest-string,s1)
this routine implements efficiently the assignment
statement,
" dest-string = dest-string !! s1 ".
catstr
usage: i = catstr(dest-string,n,s1,s2,...,sn)
this routine implements the assignment,
" dest-string = s1 !! s2 !! ... !! sn ",
where the second argument (n) is a count of the number
of strings to be concatenated.
alnstr
usage: i = alnstr(storage-block,word-cnt,n,s1,...,sn)
this routine is designed to facilitate the process of
using fortran formatted i/o to output arbitrary
strings. it implements the same function as "catstr"
with two differences. first, "storage-block" and
"word-cnt" are used in place of "dest-string" to
identify a destination string. this destination string
starts at "storage-block", is word aligned, and has
length in characters of "word-cnt * 5". secondly, if
the combined length of the source strings is less than
"word-cnt * 5", the string created at "storage-block"
will be blank padded until its length is "word-cnt *
5".
the utility of "alnstr" hinges upon two facts. the
fortran format code "a" always assumes that a sequence
of characters starts at the left end of a word.
strings that start on an arbitrary character boundary
are thus difficult to deal with. it is also difficult
to dynamically specify the length of a string -- hence
the need to blank pad to a defined length.
examples:
logical l1(11)
real*8 d1
complex c1
dimension storag(20),outblk(10)
complex bldstr
integer alnstr,appstr,catstr
data l1/22,'a data-varying string '/
data d1/'a d-p one'/
call copstr (istring,' abc')
c1 = bldstr(storag,0,0)
i = appstr(l1(2),d1)
if (.not. i) goto fail
i = catstr(c1, 3, l1(2), istring,'more')
if (.not. i) goto fail
i = alnstr(outblk, 10, 1, l1(2))
write (1,101) outblk
101 format(1h ,10a5)
after executing the above program excerpt, (l1) would equal
'a data-varying string a d-p one', (c1) would point at a
string equal to 'a data-varying string a d-p one abc more ',
and the write statement would have generated a record equal
to:
' a data-varying string a d-p one '.
also note the spaces following 'abc' and 'more' in the
literal above; they are caused by the fact that "integer"
string variables have length five and literals are padded to
a length which is a multiple of five. also note the blank
padding exhibited in the execution of the write statement
because of "alnstr". lastly, note that the "fail" checks are
included to show their form, even though they were not strictly
necessary since no bounds checking was set up for any of the
destination strings.
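the padding performed by "alnstr" can be sketched in python. the function below is a stand-in that returns the contents of the storage block together with a completion value; the behavior on overflow (keep the part that fits and return 0) is an assumption carried over from the general description of string-modifying routines in section 3.01.

```python
def alnstr(word_cnt, *sources):
    """return (storage block contents, completion value)."""
    size = word_cnt * 5                        # block length in characters
    joined = "".join(sources)                  # same concatenation as "catstr"
    block = joined[:size].ljust(size)          # blank pad to exactly word-cnt * 5
    completed = -1 if len(joined) <= size else 0
    return block, completed

block, ok = alnstr(10, "a data-varying string ", "a d-p one ")
fields = [block[i:i + 5] for i in range(0, len(block), 5)]   # what a 10a5 format would see
print(len(block), ok, len(fields))
```

the block always has the same length, so a fixed format such as 10a5 can be used no matter how long the source strings are.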
3.3 routines which return substrings
these routines (and the level-2 routines) are predicated on
the notion that string length has a geometrical basis --
that a character can be viewed as having a left side and a
right side. in other words, the length of a string can be
viewed as being the distance from the left side of the first
character to the right side of the last character in the
string. what this means is that one must substitute the
concept of "position" for the concept of "subscript" when
discussing substrings. (note however that in many
circumstances the two concepts are equivalent.)
def 3.1. position -- a position is an index which locates a
particular point in a string. the index assigned to the
point immediately preceding the first character in a string
is one. in general, the (i)th position in a string is the
point to the right of the (i-1)th character and to the left
of the (i)th character in a string.
def 3.2. starting position -- the starting position of a
substring is the point to the left of the first character in
the substring. for example, within "example", the starting
position of "xamp" is two.
def 3.3. ending position -- the ending position of a
substring is the point to the right of the last character in
the substring. obviously this is equivalent to the point to
the left of the (last + 1)th character, so that the ending
position of "xamp" (see def 3.2) is (5+1) = 6.
(note the identity: string-length = ending-position -
starting-position.
note: henceforth "pos1" will be used as shorthand for
starting-position, and "pos2" will be used as shorthand for
ending-position.)
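the definitions of position can be checked with a small python sketch. the function "positions" is hypothetical; it locates a substring with python's own "find" and converts the zero-based index into the one-based positions used in this document.

```python
def positions(host, sub):
    """return (pos1, pos2) of the first occurrence of sub in host, or None."""
    index = host.find(sub)          # zero-based index, or -1 if sub is absent
    if index < 0:
        return None
    pos1 = index + 1                # the point to the left of the first character
    pos2 = pos1 + len(sub)          # the point to the right of the last character
    return pos1, pos2

pos1, pos2 = positions("example", "xamp")
print(pos1, pos2, pos2 - pos1)
```

a substring of length zero has equal starting and ending positions, which is how the null string can locate a point within a string.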
relstr
usage: string-ptr = relstr(string1, addrel)
this routine returns a string pointer which points at a
string whose starting-position within string1 equals
"addrel + 1" and whose length is "string1-length -
addrel" and whose maximum is equal to:
"string1-maximum - addrel".
vecstr
usage: string-ptr = vecstr(string1, pos1, length)
this routine returns a string pointer which points at a
string whose starting position within string1 is
"pos1", whose length is "length", and whose maximum is:
"maximum of string1 - pos1 + 1".
bndstr
usage: string-ptr = bndstr(string1, pos1, pos2)
this routine returns a string pointer which points at
the substring whose starting-position within string1 is
"pos1", whose length is "pos2 - pos1", and whose
maximum is "string1-length - pos1 + 1". however if
"pos2" is zero, "bndstr" will default "pos2" to equal
the ending position of "string1" -- ie. string1-length
+ 1.
examples:
complex relstr, vecstr, bndstr
complex sp1,sp2,sp3,sp4
sp1 = relstr('1122334455',2)
sp2 = vecstr('1122334455',1,6)
sp3 = bndstr('1122334455',7,9)
sp4 = bndstr('1122334455',7,0)
this excerpt causes sp1 to point at '22334455' and have
length eight. sp2 will point at '112233' and have length
six. sp3 will point at '44' and have length two. lastly
sp4 will point at '4455' and have length four since the zero
third argument will be defaulted to eleven -- the ending
position of '1122334455'.
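the three routines can be modelled on plain python strings as shown below. the functions are stand-ins named after the routines they imitate; they return the substring itself rather than a string pointer, and the "maximum" attribute is not modelled.

```python
def vecstr(s, pos1, length):
    """substring starting at position pos1 with the given length."""
    return s[pos1 - 1:pos1 - 1 + length]

def relstr(s, addrel):
    """substring starting addrel characters into s and running to its end."""
    return vecstr(s, addrel + 1, len(s) - addrel)

def bndstr(s, pos1, pos2):
    """substring bounded by positions pos1 and pos2 (pos2 = 0 means the end of s)."""
    if pos2 == 0:
        pos2 = len(s) + 1           # the ending position of s
    return vecstr(s, pos1, pos2 - pos1)

host = "1122334455"
print(relstr(host, 2), vecstr(host, 1, 6), bndstr(host, 7, 9), bndstr(host, 7, 0))
```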
3.4 the routines which search strings
each of these routines will perform the identical search if
passed the same group of search-related arguments. the way
they differ from one another is in the value of the string
pointer(s) they return.
the form of the search-related arguments of this class of
routine is:
search-routine(host-string,n,s1,s2,...,sn)
where host-string is the string being searched, (n) is the
count of the search strings, and s1 thru sn are the search
strings. the search can be described as follows:
do 100 i=1,length-of-host
do 100 j=1,num-of-search-strings
if (host(i) .eq. s(j,1)) goto compar
100 continue
goto search-failed
compar: if (eqlstr(vecstr(host,i,lenstr(s(j))),
s(j))) goto search-succeeded
goto 100
s(j) is informal notation for the (j)th search string, and
s(j,1) is informal notation for the first character of the
(j)th search string, and host(i) is notation for the (i)th
character of the host. the program excerpt illustrates that
the search works as a parallel search, finding the search
string which occurs earliest in the host-string. for
example:
search-routine('0123456789', 2, '56789','01234')
will find '01234' within the host string.
Page 18
(note: if none of the search-strings are found within the
host string, a search routine will return double zero
rather than a valid string pointer value).
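the parallel search can be sketched in python as follows. the function "search" is hypothetical; it returns a zero-based index into the host and the index of the search string that matched, or None on failure. null search strings are not modelled here and are rejected.

```python
def search(host, *searches):
    """return (index in host, index of matching search string), or None."""
    if any(len(s) == 0 for s in searches):
        raise ValueError("null search strings are not modelled in this sketch")
    for i in range(len(host)):              # each starting point in the host
        for j, s in enumerate(searches):    # each search string, in argument order
            if host.startswith(s, i):
                return i, j
    return None

print(search("0123456789", "56789", "01234"))
```

the outer loop runs over the host, so the search string that occurs earliest in the host wins; argument order only matters when two search strings match at the same starting point.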
befstr
usage: string-ptr = befstr(host, n, s1,s2,...,sn)
the arguments are as described above. this routine
returns a string pointer which points at the string
within the host which precedes the found substring
within the host string.
whistr
usage: string-ptr = whistr(host, n, s1,s2,...,sn)
the arguments are as described above. this routine
returns a string pointer to the string which was found
in the host string.
(note: it is assumed that this routine would be used
only when there is more than one search string and one
wants to know "which" search string was found).
aftstr
usage: string-ptr = aftstr(host, n, s1,s2,...,sn)
the arguments are as described above. this routine
returns a string pointer to the string within the host
string which is after the matched string.
allstr
usage: string-ptr = allstr(host,bef-ptr,aft-ptr,
n,s1,...,sn)
besides bef-ptr and aft-ptr, the arguments are as
above. this routine combines the functions of the
three preceding routines. it returns the same string
pointer as "whistr", sets up bef-ptr to be the value
that "befstr" would have returned, and sets up aft-ptr
to be the value that "aftstr" would have returned.
(note: if "allstr" fails, bef-ptr and aft-ptr are left
unchanged).
usage: string-ptr = allstr(host,bef-ptr,0,n,s1,...,sn)
this usage changes the success behavior of "allstr".
setting the "aft-ptr" argument to a fixed-point zero
causes "allstr" to return just two string pointer
values:
1) string-ptr is set to found-string !! rest-of-string
2) bef-ptr is set as in the first usage.
usage: string-ptr = allstr(host,0,aft-ptr,n,s1,...,sn)
this usage is similar to usage-two. this time,
however:
1) string-ptr is set to beginning-of-string !!
found-string
2) aft-ptr is set as in the first usage.
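the three usages of "allstr" differ only in how the three pieces of the host are handed back. the python sketch below is a model under stated assumptions: all_parts is a hypothetical helper, ordinary python strings stand in for string pointers, and the host 'abcdefgh' is chosen purely for illustration.

```python
def all_parts(host, patterns):
    """Split host into (before, found, after) around the earliest match.

    Returns None when no search string occurs in the host.
    """
    for i in range(len(host)):
        for pattern in patterns:
            if host.startswith(pattern, i):
                end = i + len(pattern)
                return host[:i], host[i:end], host[end:]
    return None


before, found, after = all_parts("abcdefgh", ["de", "xyz"])

# usage one: string-ptr, bef-ptr and aft-ptr are all set
usage_one = (found, before, after)

# usage two (aft-ptr passed as 0): string-ptr = found-string !! rest-of-string
usage_two = (found + after, before)

# usage three (bef-ptr passed as 0): string-ptr = beginning-of-string !! found-string
usage_three = (before + found, after)
```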
examples:
complex sp1,sp2,sp3,sp4,sp5,sp6,sp7,sp8,sp9
complex allstr,befstr,aftstr,whistr
logical l1(2),l2(2),l3(2)
real*8 digits
data digits/'0123456789'/
data l1/0,0/ !the null string
data l2/2,'34'/
data l3/2,'23'/
sp1 = befstr('0123456789', 1, '34567')
sp2 = aftstr('0123456789', 1, '34567')
sp3 = whistr('0123456789', 1, '34567')
sp4 = allstr('0123456789', sp5, sp6, 1, '34567')
******* end of part one ********
sp1 = befstr(digits, 3, '56789', l2(2), l3(2))
sp2 = befstr(digits, 3, l3(2), l2(2), '56789')
sp3 = aftstr(digits, 1, 'abcde')
sp4 = aftstr(digits, 1, '012')
sp5 = whistr(digits, 2, l2(2), '34567')
sp6 = whistr(digits, 2, '34567', l2(2))
****** end of part 2 ********
sp1 = allstr(digits, sp2, sp3, 1, l1(2))
sp4 = allstr(digits, sp5, sp6, 2, '01234', l1(2))
sp7 = allstr(digits, sp8, sp9, 2, '23456', l1(2))
********** end of part 3 *********
sp1 = allstr(digits,sp2,0, 1,l2(2))
sp3 = allstr(digits,0,sp4, 1,l3(2))
after executing part one, sp1 would point at '012'; sp2
would point at '89'; sp3 would point at '34567'; sp4 would
equal sp3; sp5 would equal sp1; and sp6 would equal sp2.
after executing part two, sp1 and sp2 would point at '01'.
in both cases, l3 would be the matched string because
search-string-order only comes into play when more than one
search string starts at the same place in the host. in
particular, sp5 would point at '34' and sp6 would point at
'34567' after the execution of the two "whistr"s. after
execution of the two "aftstr"s, sp3 would equal zero since
'abcde' is not within "digits", and similarly sp4 would
equal zero because the search string is actually equal to
'012  ' -- the literal padded with two trailing blanks.
the calls of part three deal with the null string. since
the null string matches anything, it would match the point
immediately preceding the first character in "digits", and
consequently both sp1 and sp2 would be set to a null string
pointing to that position. conversely sp3 would point at
'0123456789' -- the entirety of "digits". however in the
next call to "allstr", sp4 would be caused to point at
'01234' since search order would cause the '01234' to be
encountered before the null string l1. this call would also
set sp5 and sp6 to the null string and '56789'
respectively. the result of the third call to "allstr"
would be the same as the result of the first call to
"allstr" in the sense that sp7 would equal sp1 and so on.
this is the case because '23456' does not start at the
beginning of "digits", and hence the parallel search would
encounter the null string, l1, first.
the calls of part four show the effect of setting either the
bef-ptr or aft-ptr argument to zero. in the first call, sp1
is set to '3456789'; and sp2 is set to '012'.
in the second call, sp3 is set to '0123'; and sp4 is set to
'456789'.
3.5 miscellaneous routines
bldstr
usage: str-ptr = bldstr(storage-blk, length, maximum)
this routine returns a string pointer which points at
the beginning of storage-blk. the string pointed at is
given a length of "length" and a maximum of "maximum".
specifying a maximum of zero is the mechanism for
specifying no maximum.
lenstr
usage: i = lenstr(string1)
(i) is set to the length, in characters, of string1.
(note again that lenstr('literal') will return 10
rather than 7).
repstr
usage: i = repstr(string1, string2, string3)
this routine causes string1 to be modified such that
string2, which is a substring within string1, is
replaced by string3. if one makes the simplifying
assumption that the value of string2 occurs within
string1 in only one place, one can describe "repstr"
as follows -- letting s1, s2, s3 be short for string1,
string2, string3:
s1 = befstr(s1,1,s2) !! s3 !! aftstr(s1,1,s2)
if the replacement of string2 with string3 would cause
the maximum of string1 to be exceeded, string1 will not
be modified at all and (i) will be set to zero rather
than -1.
(note: repstr('12345', '23', 'bc') is meaningless
because '23' is not a substring of '12345'; for even
though the value '23' is within '12345', the literal
'23' is totally distinct from literal '12345' and has
a totally distinct starting address.)
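because string2 is identified by its position inside string1 rather than by its value, "repstr" can be modelled as a positional splice. the python sketch below assumes that view; the function name, the (start, length) arguments and the returned tuple are inventions of the example, not the library interface.

```python
def repstr_model(text, start, length, replacement, maximum=0):
    """Replace text[start:start+length] by replacement.

    start is a 0-based offset.  A maximum of 0 means "no maximum".
    Returns (status, new_text): status is -1 on success, or 0 (with the
    text left unmodified) when the result would exceed the maximum.
    """
    result = text[:start] + replacement + text[start + length:]
    if maximum and len(result) > maximum:
        return 0, text
    return -1, result


# Replacing a substring by the null string removes it.
removed = repstr_model("4321", 1, 2, "")

# Inserting into a string that is already at its maximum fails.
full = "x" * 20
failed = repstr_model(full, 1, 0, "abc  ", maximum=20)

# After the maximum is raised, the same insertion succeeds.
grown = repstr_model(full, 1, 0, "abc  ", maximum=25)
```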
revstr
usage: string-ptr = revstr(string1, string2 or 0)
this routine will reverse the source string in the
sense that the first character of the source string
will be made the last character of the destination
string and vice versa, the second -- the next to last,
and so on.
if the second argument is 0, string1 will be treated as
both the source string and destination string. if
argument two is non-zero, string2 will be treated as
the source string and string1 will be treated as the
destination string.
(note: differences in length between string1 and
string2 are ignored since the returned string pointer
will correctly identify the length of the string
reversed. however if the maximum of string1 is less
than the length of string2, the reversal will not
occur and the string pointer will be set to double
zero.)
setstr
usage: call setstr(string1, length, maximum)
this routine provides the suggested mechanism for
(initializing and) setting either the length or maximum
length of a string. if "length" is non-negative,
string1 will be given a length of "length"; and if
"maximum" is non-negative, string1 will be given a
maximum of "maximum". however if the data-type of
string1 is not "complex" or "logical", this routine
will act as a no-op.
(note: specifying a maximum of 0 is the mechanism for
specifying no maximum at all.)
(note: a 4th argument can be specified when string1 is
a string pointer. this argument is used to create
non-ascii strings. in particular it causes "setstr"
to set the byte size of string1 to the 4th argument.
accordingly one would usually expect it to be 6 -- for
sixbit strings.)
trcstr
usage: string-ptr = trcstr(string1)
this routine returns a string pointer to a substring of
string1 such that string1 and the substring start at
the same character and the substring has no trailing
blanks. because this is such a basic routine, a short
non-standard name is provided in addition to "trcstr".
one can invoke this routine as "np" -- no padding.
examples:
integer repstr
complex revstr,bldstr, trcstr
complex sp1,sp2,sp3
logical l1(2),l2(3),anynul(2)
dimension inblk(5)
data l1,l2/4, '1234', 7, 'abcdefg'/
data anynul/0,0/
read (1,101) inblk
101 format(4a5)
sp1 = bldstr (inblk,20, 20)
i = lenstr(sp1)
sp2 = revstr(l2(2), l1(2))
sp3 = revstr(l1(2), 0)
call setstr (l2(2), 6, 0)
i = repstr(sp1, vecstr(sp1,2,0), 'abc')
if (i) goto succes
call setstr(sp1, -1, 25)
300 i = repstr(sp1, vecstr(sp1,2,0), 'abc')
301 i = repstr(sp1, vecstr(sp1,2,0), trcstr('abc'))
i = repstr(l1(2), vecstr(l1(2),2,2), anynul(2))
the call to "bldstr" would point sp1 at "inblk" and give it
a length and maximum of twenty characters. the succeeding
statement would set (i) to 20.
the first "revstr" would set l2 to '4321efg', but would
point sp2 at '4321'. the second "revstr" would reverse l1
in place and leave it equal to '4321'. lastly the call to
"setstr" has the effect of truncating l2 by one and leaving
it equal to '4321ef'.
the group of calls to "repstr" attempts to illustrate the
role of the null string as well as the nature of "repstr".
the first call will fail because it is an attempt to replace
a zero-length string with a string of length five when the
destination string is already at its maximum. this is
"gotten around" by using "setstr" to increase sp1's maximum,
while leaving its current length untouched, and repeating
the call to "repstr". after that call, 'abc  ' would be the
2nd thru 6th characters of the string pointed at by sp1, and
that string would have length of 25. on the other hand, if
the statement at 301 had been executed rather than the one
at 300, only the 'abc' would be inserted in the destination
string, and its new length would be 23. the last "repstr"
truncates a destination string, removing its second and
third characters and leaving it equal to '41'.
4.0 level-2 related terminology
4.1 subsidiary values
most of the information-returning routines make available
more than one piece of information to their caller. the
primary piece of information can always be "gotten at" by
declaring the routine as an integer function as noted in
section three. when one wishes to get at the subsidiary
information, one must do something analogous to the
following:
complex c1
dimension ic (2)
equivalence (c1,ic)
c1=eqlstr(string1, string2, 0)
after execution of "eqlstr", ic(1) would contain either
"true" or "false", and ic(2) would contain one of -1, 0, 1
depending upon whether lenstr(string1) was less than,
equal to, or greater than lenstr(string2).
(note: henceforth the primary value of an information
returning routine will be referred to as r0, and the
subsidiary value as r1).
(note: each of the routines which can potentially return a
subsidiary value has been given a second name of the form
<id>(sts or chs) where the trailing "s" stands for
subsidiary. for instance, "fndchr" can also be invoked as
"fndchs").
4.2 data-directed routines -- "mode" values
"mode" is an argument common to several of the level-2
routines. it is a bit mask which controls the direction of
processing within a particular routine. in all cases,
"mode" consists of some number of 1-bit switches which can
be set independently (either by or-ing or adding switches
together). a number of the switches are antonym pairs--for
example if the "append" bit is on, string combination is
"append" mode; if the same bit is off, string combination
is "copy" (or overwrite) mode. when a particular bit
defines an antonym pair, the off condition will be noted in
parentheses.
defined switches (by routine):
for cmbstr and chkstr
    append (copy)          =  1  (i.e. bit 35 is on)
    numeric (character)    =  4  (bit 33)
    pad                    =  8  (bit 32)
for cmpstr
    ignore                 =  1  (bit 35)
    exact                  =  2  (bit 34)
        if neither "ignore" nor "exact" is set,
        "padded" is implied.
    translate              =  4  (bit 33)
    trace                  =  8  (bit 32)
for fndstr
    idxend (idxbegin)      =  1  (bit 35)
    anchor                 =  2  (bit 34)
    partial                =  4  (bit 33)
    multiple               =  8  (bit 32)
    which (length)         = 16  (bit 31)
for fndchr
    idxend (idxbegin)      =  1  (bit 35)
    anchor                 =  2  (bit 34)
    partial                =  4  (bit 33)
    backwards (forwards)   =  8  (bit 32)
for mapstr
    toascii (tosixbit)     =  1  (bit 35)
    bounds                 =  2  (bit 34)
    translate              =  4  (bit 33)
    yesbound (nobound)     =  8  (bit 32)
for cnvstr
    toascii (tonumeric)    =  1  (bit 35)
    zeropad (blankpad)     =  2  (bit 34)
    nofill                 =  4  (bit 33)
    always                 =  8  (bit 32)
(note: the fortran "include" file "string.for" contains a
parameter statement defining a symbolic value for each of
the numeric mode values defined above. the symbols in
"string.for" are the symbols used above, or if these are
too long, the first six characters thereof).
(note: in section five, the pseudo-mode "others" will be
used to indicate that the switch setting being described is
not affected by other unrelated switches being turned on).
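the relation between a bit number and its switch value follows from the 36-bit word, in which bit 35 is the rightmost bit. the python sketch below shows that relation and how switches combine; switch_value is a hypothetical helper and the constant names simply repeat the symbols listed for "cmbstr".

```python
WORD_BITS = 36


def switch_value(bit_index):
    """Numeric value of a one-bit switch, bits numbered 0 (sign) to 35 (rightmost)."""
    return 1 << (WORD_BITS - 1 - bit_index)


APPEND = switch_value(35)
NUMERIC = switch_value(33)
PAD = switch_value(32)

# Distinct switches may be or-ed or added; the result is the same.
mode = APPEND | PAD
same_mode = APPEND + PAD

# A switch is tested by and-ing it with the mode.
pad_is_on = bool(mode & PAD)
numeric_is_on = bool(mode & NUMERIC)
```

adding is only safe when each switch appears once; or-ing the same switch twice is harmless, adding it twice is not.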
4.3 character search terminology
def. 4.1. bit index -- there are 36 bits in a word and
each is assigned an index; bit 0 of a word is the sign bit
of a word, and bit 35 of a word is the far right bit in a
word.
def. 4.2. a bit-vector of length (n) is the sequence of
bits consisting of the (i)th bit of (n) consecutive words.
clearly there are 36 bit-vectors in any storage block -- one
corresponding to each bit index.
def. 4.3. a boolean character table (bct) is a bit vector of
length (128) whose purpose is to encode an arbitrary group
of (distinct) characters in such a way as to make the
execution speed of the analogue of:
search-routine(host, n, char1, c2,..., cn)
independent of the value of (n).
def. 4.4 let bct(storage-block, bit-index) denote the
specific boolean character table starting at storage-block
and consisting of the (bit-index)th bit of each word in the
storage-block.
def. 4.5 let the notation:
bct(block,bit-index) = c1 !! c2 !! ... cn
indicate that bct(block,bit-index) is the encoded analogue
of search string list, (n, c1, c2,...,cn).
(note: the routines which manipulate boolean character
tables and the character searching routines are described
in section 5.4).
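definitions 4.2 thru 4.5 can be made concrete with a small model. the python sketch below assumes that word (k) of the storage block corresponds to the character whose code is (k); that correspondence, and the helper names, are assumptions of the example rather than statements about the library.

```python
WORD_BITS = 36


def bit_of(bit_index):
    """The word value which has only the (bit_index)th bit turned on."""
    return 1 << (WORD_BITS - 1 - bit_index)


def set_bct(block, bit_index, chars):
    """Encode chars into bct(block, bit_index)."""
    for ch in chars:
        block[ord(ch)] |= bit_of(bit_index)


def in_bct(block, bit_index, ch):
    """One table lookup, however many characters the table encodes."""
    return bool(block[ord(ch)] & bit_of(bit_index))


def first_match(host, block, bit_index):
    """1-based position of the first host character in the table, or 0."""
    for position, ch in enumerate(host, start=1):
        if in_bct(block, bit_index, ch):
            return position
    return 0


block = [0] * 128                      # one word per 7-bit character code
set_bct(block, 35, "aeiou")            # bct(block,35) = the vowels
set_bct(block, 34, "0123456789")       # bct(block,34) = the digits
```

the cost of first_match depends on the length of the host only, which is the point of definition 4.3.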
4.4 comparing strings of different lengths
as noted earlier, the only "difficult" situation that can
occur in comparing two strings is that they are not the same
length but are equal for the extent of the shorter string.
in section 3.1, one method of reacting to inequality of
length was introduced, namely padding the shorter string
with blanks. at this point, two other reactions will be
defined.
def. 4.6 an "exact-style" comparison will consider two
strings equal only if they are identical, i.e. contain the
same characters and have the same length. additionally, if
the two strings are lexically equal for the extent of the
shorter, the longer string will be treated as lexically
greater than the shorter string.
def. 4.7 in an "ignore-style" comparison, the mechanism of
obtaining equality of string lengths is to truncate the
longer string to the length of the shorter string rather
than to pad the shorter string.
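the three styles differ only in what is done about unequal lengths before the characters are compared. the python sketch below models them; the function compare and its -1/0/1 result are conventions of the example and are not the return values of "cmpstr".

```python
def sign(a, b):
    """-1, 0 or 1 as a is less than, equal to, or greater than b."""
    return (a > b) - (a < b)


def compare(s1, s2, style="padded"):
    """Lexical comparison of s1 with s2 in one of the three styles."""
    if style == "padded":
        # pad the shorter string with blanks
        width = max(len(s1), len(s2))
        return sign(s1.ljust(width), s2.ljust(width))
    if style == "ignore":
        # truncate the longer string
        width = min(len(s1), len(s2))
        return sign(s1[:width], s2[:width])
    if style == "exact":
        # no adjustment: equal for the extent of the shorter
        # means the longer string is the greater
        return sign(s1, s2)
    raise ValueError("unknown style: " + style)
```

for example, 'abc' and 'abc  ' are equal in a padded-style or ignore-style comparison, but 'abc' is less than 'abc  ' in an exact-style comparison.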
5.0 the level-2 routines
the level-2 routines are grouped approximately in the same
manner as the level-1 routines were. however, if
applicable, several usages will be presented for a routine,
and features common to all usages for a particular routine
will be described before its list of "usage:" paragraphs.
5.1 the data-directed comparative routine
each usage of "cmpstr", the data-directed comparative
routine, contains an argument, "code", which determines the
relational operator which is to be applied to the strings
being compared. "code" is an integer from 0 to 5 such that
code equaling:
0 ==> the operator is "equal"
1 ==> the operator is "not equal"
2 ==> the operator is "greater than or equal"
3 ==> the operator is "less than or equal"
4 ==> the operator is "greater than"
5 ==> the operator is "less than"
(note: "string.for" also contains parameter statements for
each of the "code"s defined above).
"cmpstr" returns a subsidiary value in r1 which indicates
the relative lengths of the two strings it compared. in r1,
"cmpstr" will return:
-1 if string1 is shorter than string2
0 if they are equal in length
1 if string1 is greater in length
the routine usages follow:
for mode = not exact and not ignore.
usage: cmpstr(string1, string2, code, not exact and
not ignore)
if the relationship between string1 and string2 in a
"padded-style" comparison is the relationship denoted
by "code", the comparison will be considered
successful (i.e. -1 will be returned); otherwise the
comparison will be considered to have failed and will
return 0 in r0. for instance, "cmpstr(s1, s2, 2, 0)"
is completely equivalent to "geqstr(s1, s2, 0)".
for mode = ignore.
usage: cmpstr(string1, string2, code, ignore)
if the relationship between string1 and string2 in an
"ignore-style" comparison is the relationship denoted
by "code", the comparison will be considered
successful (i.e. -1 will be returned); otherwise the
comparison will be considered to have failed and will
return 0 in r0.
for mode = exact.
usage: cmpstr(string1,string2, code, exact)
if the relationship between string1 and string2 in an
"exact-style" comparison is the relationship denoted by
"code", the comparison will be considered successful
(i.e. -1 will be returned); otherwise the comparison
will be considered to have failed and will return 0 in
r0.
for mode = trace.
usage: cmpstr(string1, string2, code, trace + others)
if the relationship between string1 and string2, using
the specified style of comparison, is the relationship
denoted by "code", the comparison will be considered
successful (i.e. -1 will be returned).
otherwise the comparison will be considered to have
failed and will return in r1 the position of the
character which caused the comparison to fail.
for mode = translate.
usage: cmpstr(string1,string2, code, translate +
others, translation)
this usage will cause each character in string2, for
the extent of the shorter string, to be translated by
the numeric value specified in the 5th argument,
"translation". the "translate" mode can be used to
compare numbers to letters, for instance; or it can be
used to compare ascii to sixbit strings when the
translation factor is octal 40, and so on.
examples:
complex cmpstr,cc
integer ic(2)
equivalence (ic,cc)
100 cc=cmpstr('abcde','xyz',lss,0)
200 cc=cmpstr('xyz','abcde',lss,0)
300 cc=cmpstr('abcde','abc',eql,ignore)
400 cc=cmpstr('abcde','abc',gtr,ignore)
500 cc=cmpstr('abcde',np('abc'),eql,ignore)
600 cc=cmpstr('abcde',np('abc'),gtr,ignore)
700 cc=cmpstr(np('abc'),'abc',eql,exact)
800 cc=cmpstr(np('abc'),'abc',lss,exact)
900 cc=cmpstr(np('abc'),'abc',eql,trace)
1000 cc=cmpstr(np('abc'),'abc',eql,trace+exact)
1100 cc=cmpstr('abcde','12345',eql,transl,"20)
1200 cc=cmpstr('12345','abcde',eql,transl,-"20)
1300 cc=cmpstr('abcd0','1234 ',eql,transl,"20)
1400 cc=cmpstr('abcd0',np('1234 '),eql,transl,"20)
the call at 100 returns "true" in r0 (ie. ic(1)) and zero
(to indicate lengths are equal) in r1. on the other hand
the call at 200 returns "false" in r0.
the call at 300 returns "false" in spite of the "ignore"
mode because the actual length of 'abc' is five. conversely
the call at 500 does return "true" in r0 and 1 in r1. the
call at 400 returns "true" in r0 because "d" is lexically
greater than " ". however the call at 600 returns "false"
since 'abc' equals 'abc' and the "d" is irrelevant.
correspondingly 1 is returned in r1 for 600.
the call at 700 returns "false" because the two strings have
different lengths. correspondingly -1 is returned in r1.
the call at 800 does return "true" because the two strings
are equal for the extent of the shorter (i.e. three
characters), and the first string is the shorter.
the call at 900 simply (re)pads string1 and returns "true"
in r0 and -1 in r1. the call at 1000 notes the failure also
detected at 700 by returning 4 in r0 to indicate that the
failure was detected during the attempt to compare the
fourth characters of the two strings.
the call at 1100 returns "true" in r0 since the octal code
of "a" is 101 and the octal code of "1" is 61, etc. the
call at 1200 notes the equivalence of simultaneously
inverting the strings and the sign of the translation. the
next two calls, 1300 and 1400, show the difference between a
blank trailing character and blank padding when "translate"
is set. the call at 1300 returns "true" since the codes of
blank and zero are octal 40 and 60, but the call at 1400
fails because string2 has no fifth character and the padding
blank is not translated.
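the behavior of the calls at 1100 thru 1400 can be checked with a small model. the python sketch below assumes a padded-style test for equality in which the translation is added to each character code of string2 for the extent of the shorter string; translate_equal is a hypothetical name. upper-case letters are used because the octal codes quoted above (101 for the first letter) are those of upper-case ascii.

```python
def translate_equal(s1, s2, translation):
    """Padded-style equality after translating the characters of s2.

    Only the characters of s2 which lie within the extent of the shorter
    string are translated; padding blanks are never translated.
    """
    shorter = min(len(s1), len(s2))
    shifted = "".join(chr(ord(c) + translation) for c in s2[:shorter])
    shifted += s2[shorter:]
    width = max(len(s1), len(shifted))
    return s1.ljust(width) == shifted.ljust(width)


call_1100 = translate_equal("ABCDE", "12345", 0o20)
call_1200 = translate_equal("12345", "ABCDE", -0o20)
call_1300 = translate_equal("ABCD0", "1234 ", 0o20)   # trailing blank is translated
call_1400 = translate_equal("ABCD0", "1234", 0o20)    # padding blank is not
```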
5.2 the data-directed copying routines
"cmbstr" and "chkstr" are essentially generalizations of
"catstr" -- generalizations in the sense that they have more
modes of operation. but since the exact same modes apply to
the two routines, the modes are described only for "cmbstr".
the only difference between "cmbstr" and "chkstr" is how
they react to an attempt to extend a string beyond its
maximum. both will return 0 rather than -1, but "chkstr"
will checkpoint the copying operation by modifying two of
its arguments as well.
chkstr
usage: i = chkstr(dest-string, others, start-ptr,
n-left, n, s1,...,sn)
start-ptr and n-left provide the information to do the
checkpointing. for each call, start-ptr should point
at the character at which one wants string movement to
start (resume), and n-left should identify which source
string that character is in. if s1 is that source
string, n-left should equal (n); if s2, then n-left
should equal (n - 1), et cetera. under normal
circumstances, one would continue to call "chkstr" with
the same arguments only so long as it continued to
fail. and after each failure, "chkstr" would itself
set start-ptr to point at the next character to move
and n-left to equal the index of the current source
string.
conversely, on the first attempt to concatenate the
source strings, (n) and s1 are the "resume" values.
consequently it is not necessary that n-left and
start-ptr be explicitly set up since "chkstr" can get
the appropriate values from (n) and s1. one tells
"chkstr" this is the first time through by setting
n-left to zero.
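the checkpointing can be pictured as a resumable copy. the python sketch below is a simplified model in copy mode: it returns the checkpoint instead of modifying its arguments, it records the current source string as a 0-based index rather than as the countdown kept in n-left, and it requires a positive maximum. the name chkstr_model is hypothetical.

```python
def chkstr_model(maximum, sources, checkpoint=None):
    """Copy as much of the source strings as fits into `maximum` characters.

    checkpoint is (source_index, offset) of the next character to move, or
    None on the first call.  Returns (status, dest, checkpoint): status is
    -1 when everything has been copied and 0 when dest filled up first.
    """
    index, offset = checkpoint or (0, 0)
    dest = ""
    while index < len(sources):
        room = maximum - len(dest)
        piece = sources[index][offset:offset + room]
        dest += piece
        offset += len(piece)
        if offset < len(sources[index]):
            return 0, dest, (index, offset)   # dest is full: checkpoint
        index, offset = index + 1, 0          # go on to the next source
    return -1, dest, None


# Four 5-character words copied thru a 7-character destination.
words = ["1122 ", "3344 ", "5566 ", "7788 "]
chunks, state, status = [], None, 0
while status == 0:
    status, dest, state = chkstr_model(7, words, state)
    chunks.append(dest)
```

the loop keeps calling with the same arguments for as long as the routine reports failure, which is the calling pattern described above.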
cmbstr
for mode = append.
usage: i = cmbstr(dest, append + others, n, s1,...,sn)
this routine usage implements the assignment:
dest = dest !! s1 !! ... !! sn
for mode = not append.
usage: i = cmbstr(dest, not append + others, n,
s1,...,sn)
this routine usage implements the assignment:
dest = s1 !! ... !! sn
for mode = pad.
usage: i = cmbstr(dest, pad + others, n, s1,...,sn)
if the length of "dest" before the call is greater than
the combined lengths of the source strings, "dest" is
blank padded to that length after (sn) has been copied
into (appended to) "dest". if the length of "dest" before
the call is less than the combined lengths of the
source strings, it is adjusted upwards to the larger
value.
for mode = numeric.
usage: i = cmbstr(dest, numeric + others, n,
source-array)
source-array contains a list of characters encoded as
fixed point numbers, and (n) is the number of items in
the list. this usage will cause the items in
source-array to be decoded, concatenated and copied
into (appended to) "dest". for instance, 3 is the
encodement of control-c and octal 40 is the encodement
of "blank".
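the four "cmbstr" usages can be summarized in one model. the python sketch below returns the new value of the destination instead of modifying it, and ignores the maximum of the destination; both simplifications, and the name cmbstr_model, are assumptions of the example.

```python
APPEND, NUMERIC, PAD = 1, 4, 8


def cmbstr_model(dest, mode, sources):
    """New value of dest after combining the sources under `mode`."""
    if mode & NUMERIC:
        # the sources are character codes held as fixed-point numbers
        sources = [chr(code) for code in sources]
    combined = "".join(sources)
    result = dest + combined if mode & APPEND else combined
    if mode & PAD:
        # never shorter than dest was before the call
        result = result.ljust(len(dest))
    return result


appended = cmbstr_model("start", APPEND, ["more1", "more2"])
copied = cmbstr_model("start", 0, ["more1", "more2"])
padded = cmbstr_model("x" * 10, PAD, ["ab", "cd"])
decoded = cmbstr_model("", NUMERIC, [0o101, 0o103, 0o105])
```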
examples:
complex sp1,relstr
integer chkstr
logical l1(3)
data s1,s2,s3,s4/'1122','3344','5566','7788'/
data left/0/
call setstr(l1(2), 0, 7)
100 if (chkstr(l1(2),0, sp1,left, 4,s1,s2,s3,s4)) return
write (1,101) l1(2),l1(3)
goto 100
101 format(1h ,a5,a2)
******* part 2 ******
complex sp1,sp2
logical l1(9)
integer onelet,sevlet(6)
data sevlet/"101,"103,"105,"15,"12, 0/
data onelet/"10/
data l1(1),l1(2)/5, 'start'/
call cmbstr(l1(2), append,2,'more1','more2')
call setstr(l1(2),40,-1)
call cmbstr(l1(2),pad,2,'first half','second half')
call cmbstr(sp1,numeric,1,onelet)
call cmbstr(sp2,numeric, 5,sevlet)
part one shows how to use "chkstr". the three-statement
loop starting at 100 is keyed on the return value of
"chkstr". also note that "left" is initialized to zero in a
data statement.
the result of executing part one is to write out seven
characters three times. in particular '1122 33' is written
out the first time; '44 5566' is written out the second
time; and ' 7788 ' is written out the third time.
after the execution of part two, sp1 will equal "backspace"
and sp2 will equal 'ace<crlf>'. as regards the other two
calls to "cmbstr", the first will set the length of l1 to 15
and its value to 'startmore1more2', and the second will set
its length to 40 and its value to:
'first halfsecond half '.
there is another level-2 copying routine, and it is called
"copchr". its sole purpose is to deal efficiently with
single bytes of arbitrary size.
copchr
usage: call copchr(str-ptr1,index1,str-ptr2,index2)
this routine implements the assignment, string1(index1)
= string2(index2), where str-ptr1 points at the
string starting at string1 (i.e. the first character
of string1 is denoted by string1(1)) -- and similarly
for str-ptr2 and string2.
if index1 is less than or equal to 1, 1 is assumed.
if index2 is less than zero, the potential for
"negative bytes" exists. in other words, if index2
were -3, the third byte of string2 would be picked up
and its leftmost bit would be extended to the left -- treated as
a sign bit.
if index2 is zero, 1 is assumed.
"copchr" makes no attempt to detect if either index1 or
index2 is too large -- out-of-bounds.
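the "negative byte" rule amounts to sign extension of the byte picked up. the python sketch below assumes 7-bit bytes held as non-negative integers in a list; get_byte is a hypothetical helper which models only the fetching half of "copchr".

```python
def get_byte(string_bytes, index, byte_size=7):
    """Pick up byte number `index` (1-based; 0 is taken as 1).

    A negative index selects byte abs(index) and treats the leftmost
    bit of that byte as a sign bit.
    """
    value = string_bytes[abs(index or 1) - 1]
    if index < 0 and value & (1 << (byte_size - 1)):
        value -= 1 << byte_size
    return value


# -5 "compressed" into seven bits is the bit pattern 1111011.
compressed = -5 & 0x7F
packed = [48, 49, 50, 51, compressed]

as_stored = get_byte(packed, 5)      # the raw 7-bit value
restored = get_byte(packed, -5)      # sign-extended on the way out
```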
examples:
complex bldstr
complex sp1,sp2
sp1=bldstr(i,1,0)
call setstr(sp1,1,0,36)
sp2=bldstr('abcde',5,0)
call copchr(sp1,1,sp2,4)
call copchr(sp2,2,sp1,1)
i=-5
call copchr(sp2,5,sp1,1)
i=0
call copchr(sp1,1,sp2,-5)
one of the primary (potential) uses of "copchr" is to deal
with "compressed" numbers. this can be done by copying such
a number into a byte whose byte size is thirty-six, i.e. a
full word.
the first pair of "copchr"s copies a right-justified 'd'
into "i", and then modifies the ascii string to equal
'adcde'. the second pair of "copchr"s places a compressed
-5 in the fifth byte of sp2 and then restores that -5 to "i"
after clobbering "i" in the assignment, i=0.
5.3 the data-directed string searching routine
the discussion of string searching at the beginning of
section 3.4 also applies to "fndstr", the data-directed
searching routine.
fndstr
for mode = not idxend.
usage: fndstr(host, not idxend, s1)
this routine usage causes "fndstr" to search the host
string for s1. if it is found, the starting-position
of the matched substring is returned in r0, otherwise 0
is returned in r0.
for mode = idxend.
usage: fndstr (host, idxend, s1)
this routine usage causes "fndstr" to search the host
string for s1. if it is found, the ending-position of
the matched substring is returned in r0, otherwise 0 is
returned in r0.
for mode = partial.
usage: fndstr(host, partial, pos1, pos2, s1)
this routine usage causes "fndstr" to search only part
of the host string for s1, and "bndstr(host, pos1,
pos2)" is the substring searched. if s1 is found, the
starting-position of the matched substring within
"host" is returned in r0, otherwise 0 is returned in
r0.
(note: as before, pos2 equal to zero means assume the
ending-position of "host").
for mode = partial + anchor.
usage: fndstr(host, partial + anchor, pos1, pos2, s1)
processing is as with "partial and not anchor" except
that it is now only necessary that the first character
of s1 be within the bounds specified by pos1 and pos2
-- rather than all of s1. in other words, this usage
is the generalized solution of the problem posed by,
"it's a long string, but it's known to start between
the (pos1)th and (pos2)th characters of the editing
buffer".
for mode = anchor + not partia.
usage: fndstr(host,anchor + not partia,s1)
this usage exists as a convenience. it is equivalent
to specifying "partia" and "anchor" together and
setting pos1 to (1) and pos2 to (2). in other words,
the first character of the host must match the first
character of (one of) the search string(s).
for mode = multiple.
usage: fndstr(host, multiple + others, n, s1,...,sn)
specifying the "multiple" switch tells "fndstr" to
expect a count (n) and (n) search strings as the last
arguments in its argument list. all searching is as
specified in the earlier usages except that there are
now (n) search strings rather than one search string.
for mode = partial + multiple.
usage: fndstr(host, multiple + partial + others, pos1,
pos2, n, s1,...,sn)
this usage is shown explicitly only to show the form of
the argument list when both partial and multiple are
specified.
for mode = multiple and which.
usage: fndstr(host, which + multiple + others, n,
s1,...,sn)
if one of the search strings is found within the host,
say the (i)th search string, (i) will be returned in
r1. otherwise zero will be returned in r1.
for mode = not which.
usage: fndstr(host,not which + others, s1)
if (one of) the search string(s) is found within the
host, its length will be returned in r1. otherwise
zero will be returned in r1.
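several of the usages above can be modelled together. the python sketch below covers "multiple", "partial" (without "anchor") and "which"; it assumes that pos1 and pos2 are inclusive 1-based positions, and it deliberately leaves out "idxend" and "anchor". the name fndstr_model and the returned (r0, r1) pair are conventions of the example.

```python
def fndstr_model(host, patterns, pos1=1, pos2=0, which=False):
    """Return (r0, r1) for the earliest match lying within pos1..pos2.

    r0 is the 1-based starting position, or 0 on failure.  r1 is the number
    of the search string found when `which` is set, otherwise its length.
    A pos2 of 0 means the ending position of the host.
    """
    last = pos2 or len(host)
    for start in range(pos1, last + 1):
        for number, pattern in enumerate(patterns, start=1):
            end = start + len(pattern) - 1
            if end <= last and host[start - 1:end] == pattern:
                return start, (number if which else len(pattern))
    return 0, 0


digits = "0123456789"
with_which = fndstr_model(digits, ["345", "123"], which=True)
with_length = fndstr_model(digits, ["345", "123"])
extension = fndstr_model("file.ext", [".", "["], 2, 8, which=True)
```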
examples:
complex cc,ccext
integer ic(2),icext(2)
equivalence (cc,ic),(ccext,icext)
complex fndstr,np
logical l1(2)
real*8 digits,filnam
data l1/3,'123'/
data filnam/'file.ext'/
data digits/'0123456789'/
cc=fndstr(digits,multip+which,2,'345',l1(2))
cc=fndstr(digits,multip,2,'345',l1(2))
mode=multip+which+partia
ccext=fndstr(filnam,mode,2,8, 2,np('.'),np('['))
if (icext(2).eq.2) goto noext
******* part two *******
integer hasdev,hasppn
real*8 filspc
complex np
integer fndstr
data filspc/'d:f.x[1,2]'/
mode=partia+anchor+idxend
hasdev=fndstr(filspc, mode,2,8,np(':'))
hasppn=fndstr(filspc,partia+anchor,hasdev,0,np('['))
part one, among other things, shows a potential use of a
subsidiary value. the if-statement after the search of
"filnam" checks to see whether the filename was ended by a
directory or an extension. in this case it is ended by an
extension. note also that an index of 5 is returned in r0.
the first search of "digits" returns 2 in r0 and 2 in r1
also. the second search of "digits" is identical except for
the subsidiary information returned. this time the length
of l1, 3, is returned in r1.
the two searches in part two illustrate how a string can be
"stepped" thru. the first search of "filspc" returns 3 in
r0 (ie. the starting-position of the file name). note that
the minimum number of characters is searched -- assuming no
extraneous blanks. the second search takes advantage of the
restricted choice provided from the first search by using
"hasdev" as its pos1. and as noted above, the pos2 of 0
causes the rest of "filspc" to be in the search path. the
result of this search is to set "hasppn" to 6, and the
combined information of "hasdev" and "hasppn" provides the
starting and ending position of the file name, 'f.x'.
5.4 character-searching
before describing the actual searching routines, the
routines which manipulate boolean character tables will be
delineated.
5.4.1 manipulating boolean character tables
as described in section 4.3, one aspect of identifying a
particular boolean character table is its bit-index. in
order to make it possible to specify more than one (bct)
simultaneously, all of the character-search related routines
accept the bit-index information in an encoded form. in
particular, to identify the (i)th boolean character table of
a storage-block, one passes a bit mask which has its (i)th
bit turned on. for example to pass a bit index of 35, one
would set the mask equal to 1; and to pass a bit index of
zero, one would set the mask to the octal quantity "400000
000000"; and to pass both simultaneously, one would set the
mask to the octal quantity "400000 000001".
tazstr
usage: call tazstr(storage-blk, mask)
this routine will remove all characters from the
specified table(s).
taostr
usage: call taostr(storage-block, mask)
this routine will place all characters in the specified
table(s).
tonstr
usage: call tonstr(storage-block, mask, string1)
this routine will place (add) each of the characters in
string1 into the specified table(s).
tofstr
usage: call tofstr(storage-block, mask,string1)
this routine will remove each of the characters in
string1 from the specified table(s).
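the four routines above reduce to or-ing the mask into words of the storage block or and-ing it out of them. the python sketch below models them on a list of 128 integers; it assumes that the word for a character is selected by its character code, and the python functions merely borrow the routine names.

```python
ALL_BITS = (1 << 36) - 1               # a full 36-bit word


def tazstr(block, mask):
    """Remove all characters from the table(s) selected by mask."""
    for code in range(len(block)):
        block[code] &= ~mask & ALL_BITS


def taostr(block, mask):
    """Place all characters in the selected table(s)."""
    for code in range(len(block)):
        block[code] |= mask


def tonstr(block, mask, chars):
    """Add each character of chars to the selected table(s)."""
    for ch in chars:
        block[ord(ch)] |= mask


def tofstr(block, mask, chars):
    """Remove each character of chars from the selected table(s)."""
    for ch in chars:
        block[ord(ch)] &= ~mask & ALL_BITS


block = [0] * 128
BOTH = 0o400000000001                  # bit index 0 and bit index 35 together
tonstr(block, BOTH, "ab")              # 'a' and 'b' enter both tables
tofstr(block, 1, "a")                  # 'a' leaves the bit-35 table only
```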
tabstr
usage: call tabstr(storage-block, mask, string1)
this routine combines most of the above functions.
calling "tabstr" with a mask in which exactly one bit
is off will cause "tabstr" to do the equivalent of:
call taostr(storage-block, .not. mask)
call tofstr(storage-block, .not. mask, string1)
calling "tabstr" with a mask in which at least two bits
are off will cause "tabstr" to do the equivalent of:
"call tonstr(storage-block, mask, string1)".
5.4.2 character-searching routines
the names of the character searching routines are patterned
after the names of the string searching routines. for each
string searching routine, xxxstr, there is a corresponding
character searching routine, xxxchr.
the power which derives from the ability to simultaneously
pass more than one (bct) to a character searching routine
(or table manipulating routine) is not plainly apparent.
what it allows one to do is set-operations with groups of
characters (i.e. one group = one (bct)). for example, if
bct(block1,1) = "the vowels" and bct(block1,2) = "the
digits", passing a mask set to 3 will cause the character
searching routine to match either the vowels or the digits.
the power inherent in the ability to easily invert a boolean
character table (i.e. with "tofstr") is also not apparent. for
instance, if one wished to find the first arbitrary length
sequence of blanks, tabs, carriage returns, and line feeds,
i.e. "span(these 4 characters)", one could set a (bct) to
these 4 characters and find the first such character with
one of the character searching routines. then one could set
a second (bct) to all characters but these four characters
and find the 1st occurrence of a character from this second
table. the string between these two points would be the
"span".
for each of the descriptions below, let:
bct(block1, unencoded-mask) = c1 !! c2 !! ... !! cn
where (ci) is an arbitrary character.
befchr
usage: string-ptr = befchr(host, block1, mask)
the output behavior of this routine is completely
equivalent to the output behavior of:
"befstr(host, n, c1, c2, ..., cn)".
whichr
usage: string-ptr = whichr(host, block1, mask)
the output behavior of this routine is completely
equivalent to the output behavior of:
"whistr(host, n, c1, c2, ..., cn)".
aftchr
usage: string-ptr = aftchr(host, block1, mask)
the output behavior of this routine is completely
equivalent to the output behavior of:
"aftstr(host, n, c1, c2, ..., cn)".
allchr
usage: string-ptr = allchr(host, block1, mask,
bef-ptr, aft-ptr)
the output behavior of this routine is completely
equivalent to the output behavior of:
"allstr(host, bef-ptr, aft-ptr, n, c1, c2, ..., cn)".
usage: string-ptr = allchr(host, block1, mask,
bef-ptr,0)
this usage is analogous to the "allstr" usage in which
the aft-ptr argument is zero. string-ptr is set to the
latter part of the host starting with the matched
character, and bef-ptr is set to the part of the host
before the matched character.
usage: string-ptr = allchr(host, block1, mask,
0,aft-ptr)
this usage is of course analogous to the similar
"allstr" usage; string-ptr is set to the beginning of
the host thru the matched character, and aft-ptr is set
to the remainder of the host after the matched character.
fndchr
the data-directed character searching routine is called
"fndchr". its possible modes are similar to those of "fndstr",
but there are differences which will be outlined below. also,
"fndchr" returns a different piece of subsidiary information
in r1: if the character search is successful, "fndchr" will
return in r1 the numeric code of the character which matched
a character in the host; otherwise it will return zero in
r1.
for mode = not idxend.
usage: fndchr(host, not idxend, block1, mask)
the output behavior of this routine is completely
equivalent to the output behavior of "fndstr(host, not
idxend + multiple, n, c1, c2, ..., cn)" except for the
subsidiary information in r1.
for mode = idxend.
usage: fndchr(host, idxend, block1, mask)
the output behavior of this routine is completely
equivalent to the output behavior of "fndstr(host,
idxend + multiple, n, c1, c2, ..., cn)" except for the
subsidiary information in r1.
for mode = partial.
usage: fndchr(host, partial, block1, mask, pos1, pos2)
the output behavior of this routine is completely
equivalent to the output behavior of "fndstr(host,
partial + multiple, pos1, pos2, n, c1, c2, ..., cn)"
except for the subsidiary information in r1.
(note: as before, pos2 equaling zero means assume the
ending-position of the host).
for mode = anchor + not partia.
usage: fndchr(host,anchor + not partia, block1, mask)
this usage exists solely as a convenience. it is
equivalent to specifying "anchor" and "partia" together
and setting pos1 to (1) and pos2 to (2).
for mode = backwards.
usage: fndchr(host, backwards + others, block1, mask)
this switch setting will cause "fndchr" to find the
last occurrence within "host" of a character in
bct(block1, unencoded-mask), rather than the first.
as the switch name suggests, "fndchr" does this by
searching the host string from its last character to
its first character rather than vice versa.
examples:
integer fndchr,innrp,innlp,mask
complex sp1,sp2,sp3
complex bndstr,aftchr,befchr
dimension table(128)
real*8 numlst, expres
data numlst /' +73 24'/
data expres /'(i+(j+k)-)'/
mask="1
call tabstr(table,mask,'aeiou')
mask=.not. 2
call tabstr(table,mask,' ')
call tazstr(table,1)
call tonstr(table,1,'1234567890')
call tonstr(table,1,'.-+')
call tofstr(table,2,' ') !a tab
call taostr(table,4)
call tofstr(table,4,'+-1234567890')
sp1=aftchr(numlst,table,2)
sp2=befchr(sp1,table,4)
call tabstr(table,8,np('('))
call tabstr(table,16,np( ')' ))
innlp=fndchr(expres,backwa, table,8)
innrp=fndchr(expres,partia+idxend, table, 16)
sp3 = bndstr(expres, innlp,innrp)
sp1 = allchr(expres,table,8,sp2,0)
innlp = fndchr(expres,anchor,table,8)
the first "tabstr" will set up a table which will match any
of "a", "e", "i", "o", "u", and the second "tabstr" will set
up a (bct) which will match the first non-blank character
encountered. note that both (bct)'s are in the same
128-word block.
the "tazstr" and "tonstr" respectively zero the table
containing the vowels and replace it with a table containing
the digits. the second "tonstr" illustrates that a (bct)
can be added to, by including the signs within the digit
table. the first "tofstr" shows the same principle in
reverse: removing the tab from the non-blank table narrows
it to characters which are neither blank nor tab.
lastly the "taostr" and "tofstr" create a new table which
contains everything but the digits.
the calls to "aftchr" and "befchr" are used to set up sp2 to
point at '+73'. they use tables 34 (mask=2) and 33 (mask=4)
to respectively strip off leading blanks and then find the
first non-numeric character in the left-truncated string
pointed at by sp1.
the calls dealing with "expres" show how to find the
innermost parenthesized expression in a string. in
particular sp3 will be caused to point at '(j+k)'. the
technique used to set up sp3 is to find the rightmost left
parenthesis by searching backwards thru "expres", and then,
using that as a context, to search forward until the first
right parenthesis is found. note the use of "idxend" to
ensure that the positions actually returned are before the
left parenthesis and after the right parenthesis.
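the same two-step technique (rightmost left parenthesis, then the first right parenthesis after it) can be sketched in python. the sketch uses python's built-in string searches rather than "fndchr" and "bndstr", and the function name innermost_parens is hypothetical.

```python
def innermost_parens(expres):
    """Return the substring from the rightmost '(' through the first ')'
    that follows it, or None if either parenthesis is missing."""
    left = expres.rfind("(")          # backwards search for the left parenthesis
    if left < 0:
        return None
    right = expres.find(")", left)    # forward search, starting at that point
    if right < 0:
        return None
    return expres[left:right + 1]


print(innermost_parens("(i+(j+k)-)"))
```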
the last two calls cause sp1 to be set to 'i+(j+k)-)', sp2
to '(', and innlp to 0. note that the second of these calls
is intuitively equivalent to saying, "is the character i am
looking at any of those i am interested in".
5.5 conversions and mappings
the routines, "cnvstr" and "mapstr", respectively implement
string <---> numeric transformations and string <---> string
transformations. "cnvstr" will be described first.
cnvstr
for mode = not toascii.
usage: i = cnvstr(integer1, string1, base, not
toascii)
this routine usage will convert string1 into a
fixed-point number and copy it into integer1, where
string1 is the string representation of a number in
base, "base". as regards string1 format, it may
contain leading blanks and may optionally have a minus
sign immediately preceding the high order digit of the
number. if string1 is the representation of a legal
number in base "base", (i) will be set to -1. if it is
not, (i) will be set to zero and integer1 will be left
unchanged.
(note: usually one would expect "base" to be 10 or 8).
for mode = always + not toascii.
usage: i = cnvstr(integer1, string1, base, always +
not toascii)
same as before except that even if string1 is not the
representation of a legal number, integer1 is set to
the "converted" string. as before, (i) is set to -1
for a "good" number and 0 for a "bad" number.
for mode = toascii + not zeropad.
usage: i = cnvstr(string1, integer1, base, toascii +
not zeropad)
this routine will convert the fixed-point number,
integer1, into the string which represents integer1 in
base, "base". if the number of characters needed to
represent integer1 in base "base" is greater than the
length of string1 at the time of the call, "cnvstr"
will return 0 -- signalling the failure of the
conversion. otherwise it will return -1 and right
justify the string representation of integer1 within
string1 with respect to the length of string1 at the
time of the call (i.e. the low-order digit of integer1
will be located at:
"vecstr(string1,1,lenstr(string1))").
if there is room, string1 will be padded with leading
blanks. if integer1 is negative, the minus sign will
be to the right of the leading blanks, if any.
for mode = nofill + toascii.
usage: i = cnvstr(string1, integer1, base, nofill +
toascii)
the conversion is as with the previous usage. what
"nofill" causes is the left justification of the
converted integer within string1. additionally the
length of string1 will be adjusted so that there are no
trailing characters after the low order digit of the
converted number. failure will be signalled only if
the converted number requires more characters than the
maximum of string1.
for mode = zeropad + toascii.
usage: i = cnvstr(string1,integer1, base, zeropad +
toascii)
this usage is as with "zeropad" turned off except that
"cnvstr" will generate leading zeroes rather than
leading blanks when there is room. note also that if a
minus sign is needed it will be to the left of the
leading zeroes rather than to their right.
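the "cnvstr" usages above can be modelled in python. this is a sketch of the behavior as described, not the "strlib" code: the function names, the keyword arguments, and the explicit "width" parameter (standing in for the length of string1 at the time of the call) are all assumptions, and the "always" mode is not modelled. each function returns the completion value (-1 or 0) together with the result.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def string_to_int(string1, base):
    """String to number: leading blanks and one minus sign are allowed.
    Returns (-1, value) for a legal number, (0, None) otherwise."""
    body = string1.lstrip(" ")
    negative = body.startswith("-")
    if negative:
        body = body[1:]
    if not body:
        return 0, None
    value = 0
    for ch in body:
        digit = DIGITS.find(ch.lower())
        if digit < 0 or digit >= base:   # includes a trailing blank
            return 0, None
        value = value * base + digit
    return -1, (-value if negative else value)


def int_to_string(integer1, base, width, zeropad=False, nofill=False):
    """Number to string within `width` characters.
    Returns (-1, text) on success, (0, None) if the number does not fit."""
    magnitude = abs(integer1)
    digits = ""
    while True:
        magnitude, remainder = divmod(magnitude, base)
        digits = DIGITS[remainder] + digits
        if magnitude == 0:
            break
    sign = "-" if integer1 < 0 else ""
    if len(sign) + len(digits) > width:
        return 0, None
    if nofill:                            # left justified, length shortened
        return -1, sign + digits
    if zeropad:                           # sign to the left of the zeroes
        return -1, sign + digits.rjust(width - len(sign), "0")
    return -1, (sign + digits).rjust(width)   # sign to the right of the blanks


print(string_to_int(" -12", 8))
print(int_to_string(-20, 8, 5, zeropad=True))
```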
mapstr
the usages of "mapstr" will be described now. note that
"mapstr" conforms to the rules described in section 3.2 for
bounds checking and setting up a completion value as the
return variable.
for mode = translate.
usage: i = mapstr(string1, string2, translation,
translate)
with this switch setting, "mapstr" will, while copying
string2 to string1, translate each of the characters in
string2 by the fixed-point number, "translation". for
instance one could use this routine to convert a string
of lower-case letters to a string of upper-case letters
or vice versa. in particular, one would respectively
set "translation" to -32 and 32.
for mode = translate + bounds + not yesbound.
usage: i = mapstr(string1, string2, translation,
translate + bounds + not yesbound, bounding)
this setting is a generalization of the previous switch
setting. a character in string2 will be translated
only if it is not between the bounds specified by
"bounding". "bounding"s left half contains the lower
bound and its right half contains the upper bound. a
character is considered outside of the bounds only if
it is less than the lower bound or greater than the
upper bound. in other words the bounds are inclusive.
this sort of call can be used to convert a mixed group
of upper and lower case characters to either all upper
or all lower case.
for mode = translate + bounds + yesbound.
usage: i = mapstr(string1, string2, translation,
translate + bounds + yesbound, bounding)
this switch setting is identical to the previous
setting except that translation only occurs if the
character is in the range specified by "bounding"
rather than outside of it.
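the three "translate" settings can be sketched in python. this models only the character arithmetic described above; the names pack_bounds and translate and the keyword arguments are hypothetical, and the copy into string1 (with its bounds checking) is not modelled.

```python
def pack_bounds(lower, upper):
    """Pack the bounds into one word: lower bound in the left half (upper
    18 bits), upper bound in the right half (lower 18 bits)."""
    return (ord(lower) << 18) | ord(upper)


def translate(string2, translation, bounding=None, yesbound=True):
    """Shift character codes by `translation`.
    bounding=None  -> every character is translated.
    yesbound=True  -> only characters inside the (inclusive) bounds.
    yesbound=False -> only characters outside the bounds."""
    if bounding is not None:
        lower, upper = bounding >> 18, bounding & 0o777777
    result = []
    for ch in string2:
        code = ord(ch)
        if bounding is None:
            shift = True
        else:
            inside = lower <= code <= upper
            shift = inside if yesbound else not inside
        result.append(chr(code + translation) if shift else ch)
    return "".join(result)


# fold mixed case to upper case: only 'a'..'z' are shifted by -32
print(translate("Hello, World", -32, pack_bounds("a", "z")))
```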
for mode = toascii.
usage: i = mapstr(string1, string2, 0, toascii)
this switch setting will cause the character (byte)
size of string2 to be forced to 6 and the byte size of
string1 to be forced to 7. and then sixbit to ascii
conversion will be done.
(note: the result of attempting to convert ascii
characters in the two ranges below octal 40 and above
octal 140 is undefined).
for mode = not toascii.
usage: i = mapstr(string1, string2, 0, not toascii)
this switch setting will force byte sizes to 6 and 7
respectively and then do ascii to sixbit conversion of
string2 to string1.
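the two conversions can be sketched in python. the sketch assumes the usual sixbit encoding, in which the sixbit code of a character is its ascii code minus octal 40 and only ascii codes octal 40 thru 137 are representable. that mapping is background knowledge rather than something stated in this document, and the function names are hypothetical.

```python
def ascii_to_sixbit(text):
    """Return the list of 6-bit codes for text; characters outside the
    representable range are rejected here rather than left undefined."""
    codes = []
    for ch in text:
        code = ord(ch)
        if not 0o40 <= code <= 0o137:
            raise ValueError("character has no sixbit code: %r" % ch)
        codes.append(code - 0o40)
    return codes


def sixbit_to_ascii(codes):
    """Return the ascii text for a list of 6-bit codes (0..63)."""
    return "".join(chr(code + 0o40) for code in codes)


codes = ascii_to_sixbit("123456")
print([format(code, "02o") for code in codes])
print(sixbit_to_ascii(codes))
```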
examples:
dimension sparea(2)
complex sp1
integer cnvstr,mapstr
integer istr,inum
i=cnvstr(inum,' -12',10,0)
i=cnvstr(inum,' -12',8,0)
i=cnvstr(inum,' -12 ',10,0)
i=cnvstr(istr,-20,10, toasci)
i=cnvstr(istr,-20,8,toasci)
i=cnvstr(istr,-20,10,zeropa+toasci)
i=cnvstr(istr,-20,10,nofill+toasci)
i=mapstr(sp1,'12325',"40,transl)
mode = transl+bounds+yesbou
i=mapstr(istr,'abc45',"20,mode,"000061 000071)
i=mapstr(istr,'abc45',-"20,mode,"000101 000111)
call setstr(sp1,0,12,6)
i=mapstr(istr,'123456',0,0)
the first "cnvstr" sets "inum" to -12 while the second
"cnvstr" sets it to -10 since "cnvstr" was told to treat
the '-12' as an octal number. the third "cnvstr" will fail
(i.e. set (i) to 0) since the trailing blank is spurious;
"inum" is left with its old value.
the first time "istr" is set, it will be set to '  -20'.
the second time it will be represented octally and set to
'  -24'. the third string-setting "cnvstr" converts in base
10 again, with zero padding, and will set "istr" to '-0020';
the last one will set it to '-20' followed by two unknown
characters, since an integer string variable cannot have its
length set to other than five. if sp1 had been the
destination string, it would have pointed at '-20' and had a
length of 3.
the first "mapstr" will set "istr" to 'abcde', as will the
second. note that the octal code of "1" is 61 and "9", 71.
the third "mapstr" will translate in the other direction
and set "istr" to '12345'. finally the last "mapstr" will
create a sixbit string corresponding to the ascii '123456'.
note that the byte size of sp1 was previously set to 6 by
the "setstr".
6.0 error conditions
it is possible to control the error detection and error
message facilities of "strlib" in two ways.
1) assembly parameters -- if the symbol, "check" is given a
non-zero value, no error checking will be done by any of
the entry points of "strlib".
if the symbol, "messag", is given a non-zero value,
checks will be made and overflows corrected -- but no
error messages will be generated.
2) load-time parameter -- if the global symbol, "str.nw" is
appropriately used in a link-10 "define" switch, message
generation can be turned off:
.r link
*exampl,string/search/define:str.nw:-1
note that setting str.nw to -1 does not affect the
process of overflow detection and correction.
6.1 the defined conditions
all messages generated by "strlib" are warnings in the sense
that they are "%" messages. each message is associated with
a standard six character mnemonic of the form, "str<code>".
the three-letter codes are organized such that they can be
quite useful in debugging. corresponding to each code is a
global symbol of the form <code>$ such that this is the
location branched to when the condition is encountered.
the messages:
1) %strllz. length less than zero
an entry point was passed a string length which was less
than zero; the string length is set to zero.
2) %strlem. length exceeds maximum
an entry point was passed a string length which exceeded
the maximum for that string. the string length is set
equal to the maximum.
3) %strnss. no source strings (count under 1)
the count passed to one of the concatenative or
string-search routines was less than one. in this event
the failure path is taken by the called routine.
4) %strciv. code invalid value (not 0-5)
"cmpstr" was passed an illegal code. this causes the
failure path to be taken.
5) %strspe. 2nd position past end of string
in either "fndchr" or "fndstr", pos2 was greater than is
sensible. it is reduced to the largest meaningful value.
6) %strsli. 1st position such that string length increased
one of the string relative functions was used to generate
a superstring rather than a substring. the change is
simply allowed to occur.
7) %strfes. 1st position exceeds second
in effect a zero length string was being used as a host
string; the failure path is taken.
8) %struof. under or overflow of string pointer length or
maximum
one of the string relative functions was caused to create
an illegal value for maximum or length. less than zero
values are set to zero, and greater than 2**18-1 values
are set to 2**18-1.
9) %strmli. maximum and length inconsistent
"bldstr" was passed inconsistent values; it will
increase the maximum to match the length passed it.
10) %strrpu. replacement unsuccessful:
a second message is printed after this message (e.g.
%strlem). in any event, the failure path is taken.
11) %streps. end of substring past end of string.
one of the string relative functions was used to create a
string pointer which was partially out-of-bounds of the
original string. the user is simply allowed to do this.
12) %stridt. string argument has illegal data type - null
string assumed
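the length corrections attached to %strllz, %strlem and %struof can be summarized in a short python sketch. the function names are hypothetical, and treating a maximum of zero as "no maximum specified" is an assumption taken from the string pointer format described in section 7.0.

```python
HALF_WORD_MAX = 2**18 - 1   # largest value that fits in half of a 36-bit word


def correct_length(length, maximum=0):
    """Apply the %strllz and %strlem corrections.
    Returns the corrected length and the list of warnings raised."""
    warnings = []
    if length < 0:
        warnings.append("%strllz")
        length = 0
    elif maximum and length > maximum:
        warnings.append("%strlem")
        length = maximum
    return length, warnings


def clamp_half_word(value):
    """Apply the %struof correction to a length or maximum."""
    return max(0, min(value, HALF_WORD_MAX))


print(correct_length(12, 10))
print(clamp_half_word(300000))
```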
7.0 implementation characteristics
each routine in "strlib" can be invoked by any program which
makes use of the standard calling sequence (e.g. fortran-10
and cobol). additionally, note that any routine can be
called (if one does not need the return value) or invoked as
a function. in any event though, all routines always
preserve registers two and up.
the internal format of a string pointer is as follows:
word 1: byte pointer to first character in the string
such that an "ildb" would load it.
word 2: left side is 0 or the maximum allowed length
right is the current length
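as an illustration, the two words can be pulled apart in python. the sample values are the two octal words printed by the programming example in section 8.0. the field layout of the first word (position in the top six bits, byte size in the next six, address in the right half) is the usual pdp-10 byte pointer layout and is an assumption here, since this document does not spell it out; the function name is hypothetical.

```python
def decode_string_pointer(word1, word2):
    """Split a two-word string pointer into its fields."""
    return {
        "position": (word1 >> 30) & 0o77,     # bits to the right of the byte
        "byte_size": (word1 >> 24) & 0o77,    # bits per character
        "address": word1 & 0o777777,          # word address of the string
        "maximum": (word2 >> 18) & 0o777777,  # 0 means no maximum specified
        "length": word2 & 0o777777,           # current length
    }


fields = decode_string_pointer(0o440700000143, 0o000000000031)
for name, value in fields.items():
    print(name, value)
```

a position of 36 with a byte size of 7 means the pointer sits just before the first character of the word, which is why an "ildb" would load that character.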
fortran literals are actually a special case of asciz
strings -- strings that are terminated by a nul character.
if one is programming in macro, for instance, "lenstr" and
"setstr" can be used to access and set the length of an
asciz string. data-type information, including type asciz,
is communicated in an argument list. for a description of
the mechanism, see an appendix of the fortran language
manual. note also that if no type code is specified for a
string argument, it is assumed to be a string pointer.
the full argument type table is: (bits 9-12)
0/ string pointer
1/ data-varying string
2/ integer string (ie. length always 5)
3/ illegal
4/ real string (ie. length always 5)
5/ illegal
6/ illegal
7/ illegal
10/ double string (ie. length always 10)
11/ same as 10.
12/ illegal
13/ illegal
14/ string ptr
15/ string ptr
16/ illegal
17/ asciz string (ie. a literal)
7.1 "strlib" configurations
normally the routines of "strlib" will load into the low
segment, but by setting the assembly parameter "high" to
zero, the routines of "strlib" will reside exclusively in
the high segment. in point of fact, one could build a .shr
file containing all of string if one wished.
with field image "strlib", one has complete byte size
generality in the sense that all routines will work
correctly with all valid byte sizes (1-36). if one wishes
to deal solely with ascii strings, one can set the assembly
switch "anysiz" to 1.
although bounds checking will have no effect if one never
passes a maximum length to either "setstr" or "bldstr",
there must still be checks to see if a maximum has been
specified. if one wishes to eliminate the concept of bounds
checking, so to speak, from "strlib", one sets the assembly
switch "bnd.ch" to 1.
8.0 a programming example
the following has been abstracted from the running of an
actual control file.
.r fortra
**exastr,tty:=exastr
00001 complex sp1,trcstr,bldstr
00002 integer allrep
00003 dimension l1(5)
00004 data l1/'aaaaaaaaaa'/
00005
00006 sp1=bldstr(l1,10,0)
00007 i=allrep(sp1, trcstr('aa'),bldstr(
'1111',5,0))
00008 if (.not. i) type 88
00009 type 101,l1,sp1
00010 88 format(' bombed')
00011 101 format(1h ,5a5,2o12)
00012 end
subprograms called
allrep
bldstr trcstr
%ftnwrn main. no fatal errors and 1 warnings
00001 integer function allrep(sp1,sp2,sp3)
00002 complex sp1,sp2,sp3,tp
00003 integer pos1,len2,len3
00004 integer fndstr, repstr,lenstr,newpos
00005 complex vecstr, bndstr
00006
00007 len2 = lenstr(sp2)
00008 len3 = lenstr(sp3)
00009 pos1 = 1
00010
00011 allrep=-1
00012 10 tp=bndstr(sp1, pos1, 0)
00013 newpos=fndstr(tp,0,sp2)
00014 if (newpos.eq.0) return
00015 if (.not. repstr(sp1,
vecstr(tp,newpos,len2), sp3)) go to 88
00016 pos1 = pos1 + newpos - 1 + len3
00017 go to 10
00018
00019 88 allrep=0
00020 89 format('?arpfai. aborted')
00021 end
subprograms called
lenstr
bndstr repstr vecstr fndstr
allrep no errors detected
.ex exastr,string/lib
link: loading
[lnkxct exastr execution]
1111 1111 1111 1111 1111 440700000143000000000031
end of execution
cpu time: 0.05 elapsed time: 0.08
exit
this example consists of a "testing" main program and a
"user-written" string manipulation program which will
replace all occurrences of a given string with a second
string. it is important to note that "allrep" expects its
three arguments to be string pointers. this is the case
because a fortran subprogram cannot do data-type checking as
"string" can. similarly, generic library routines like "sin"
can do numeric data type checking but fortran subprograms
had better receive exclusively real values or exclusively
double precision values.
the call to "allrep" in the main program asks "allrep" to
set every occurrence of 'aa' in sp1 to '1111 '. in overview,
the technique "allrep" uses to accomplish this is to search
a shorter and shorter substring of sp1 until all occurrences
of the second argument have been found. note also that sp1,
rather than a temporary, must appear in the call to "repstr"
so that the length of sp1 will be appropriately adjusted
(i.e. by 3) each time 'aa' is found.
the statements of the loop:
00012) sets up the host string (for the "fndstr" which
follows) so that its first character is immediately
after the last character of the previously found
substring and its last character is the last
character of sp1 (ie. the string pointed at by sp1).
00013) searches the constructed substring of sp1 for an
occurrence of sp2 (i.e. 'aa').
00014) if "newpos" is set to zero, no occurrence of 'aa' was
found this time thru the loop, and we are finished.
00015) the "repstr" should not fail, but if it does the
branch to "88" will be taken. the "vecstr" will
return the substring of sp1 which is the current
occurrence of 'aa', and sp3 points at '1111 '.
00016) adjusts pos1 past the inserted '1111 ' by adding its
length (ie. len3) and its offset within "tp" (ie.
newpos - 1).
the output shown from executing "exastr" is the new value of
sp1 followed by the octal representation of the string
pointer, sp1. the "31" is a decimal 25 -- the length of
sp1.
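for comparison, the loop of "allrep" can be sketched in python. this is a model of the algorithm only: python strings are immutable, so the sketch returns the new string instead of adjusting a string pointer in place, and it uses a 0-based search position where the fortran uses pos1.

```python
def allrep(sp1, sp2, sp3):
    """Replace every occurrence of sp2 in sp1 with sp3, scanning left to
    right and resuming just past each inserted replacement."""
    pos = 0
    while True:
        found = sp1.find(sp2, pos)            # search the remaining substring
        if found < 0:
            return sp1                        # no further occurrence: finished
        sp1 = sp1[:found] + sp3 + sp1[found + len(sp2):]
        pos = found + len(sp3)                # skip past the inserted text


result = allrep("aaaaaaaaaa", "aa", "1111 ")
print(repr(result), len(result))
```

because the search resumes after the inserted text, a replacement string which itself contains the search string is never rescanned.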