Perl description
----------------
[written by mike slattery - jun 1996]

Perl is a language created by Larry Wall which is especially good at
scanning files, string matching, and various Unix system administration
tasks. It's very useful for simple database applications (e.g. class
grades) as well as pattern-based tasks such as simple translation or
reformatting. Perl is an interpreted language. We're discussing
version 4, but there is a version 5 available these days.

Perl incorporates a wide variety of features and capabilities. We'll
just be talking about a subset. In particular, there are lots of
magical features designed to interact with the operating system
(execute system commands, list files, etc.) which we are not talking
about. These notes are based on the man pages written by Larry Wall.

Data types: scalars (both numeric and string), arrays of scalars, and
associative arrays of scalars. Normal arrays are indexed by numbers and
associative arrays by strings. A scalar is interpreted as TRUE if it is
not the null string or 0.

Contexts: string, numeric (these two are also referred to as a scalar
context), and array.

Variables: References to scalars always begin with $, even if the
scalar is part of an array. References to arrays (or slices) begin with
@, and references to associative arrays begin with %. Assignment to a
scalar evaluates the RHS in a scalar context; assignment to an array or
array slice evaluates the RHS in an array context. Multidimensional
arrays are not supported.

Ordinary names start with a letter and may contain digits and
underscores. Names are case-sensitive. In addition to ordinary
variables, names are also used for labels, subroutine names, and
filehandles.

Literals: Numeric literals as usual: 123, 43.5, .43E2 as well as hex
(0x3fe5) and octal (0377). String literals are delimited by either
single or double quotes. Double-quoted strings are subject to backslash
and variable substitution.
Array literals are denoted by separating individual values by commas
and enclosing the list in parentheses. An associative array literal
contains pairs of values to be interpreted as a key and a value.

Special variables: Here are a few of perl's many built-in variables.

    $[  Index of the first element in an array. This is 0 by default.
    $"  Separator between array values when an array is substituted
        into a double-quoted string. This is space by default.
    $_  Default input and pattern-searching variable.
    $+  Substring matching last selected piece (via parentheses) of
        latest pattern match.
    $&  Substring matching pattern in latest pattern match.
    $`  Substring coming before matched pattern in latest pattern match.
    $'  Substring coming after matched pattern in latest pattern match.

I/O: Evaluating a filehandle in angle brackets yields the next line
from that file (newline included). If the input symbol is the only
thing inside the conditional of a while loop, the next line is assigned
to the special variable $_ . The filehandles STDIN, STDOUT, and STDERR
are predefined. The null filehandle <> reads from standard input or
from files listed on the command line.

Programs: A perl script consists of declarations and commands. The
sequence of commands is executed once. A declaration can be put
anywhere in the file, but all declarations are read before any
statements are executed. Declarations are only needed for subroutines.
The executable portion of the input file can be limited with a line
containing just the symbol __END__ .

Comments: Are indicated by # and extend to the end of the line.

Statements: A BLOCK is a sequence of commands enclosed in curly braces.

    if (EXPR) BLOCK
    if (EXPR) BLOCK else BLOCK
    if (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK

The following statements begin with an optional label (which is a name
followed by a colon).
    LABEL while (EXPR) BLOCK
    LABEL while (EXPR) BLOCK continue BLOCK
    LABEL for (EXPR; EXPR; EXPR) BLOCK
    LABEL foreach VAR (ARRAY) BLOCK
    LABEL BLOCK continue BLOCK

Simple statements consist of an expression evaluated for its
side-effects. A simple statement can be modified by if EXPR,
unless EXPR, while EXPR, and until EXPR.

Expressions: Expressions in perl are very much like those in C. Perl
also includes an exponentiation operator (**) and a number of special
string operators. These include . for concatenation and eq, ne, lt, gt,
le, ge for string comparisons.

One difference from C expressions which is often used in perl programs
is the || (or) operator. In perl, this is evaluated lazily (the right
operand is evaluated only if the left one is false) and returns the
last expression evaluated. This means that it returns the first true
(non-zero, non-null) operand, or the last operand if none is true.

Operators: $#name gives the subscript of the last element of @name.

Built-in functions: Here is a partial list of functions in perl. Most
of these allow an alternate syntax in which the parentheses are
omitted. For example, chop VARIABLE is legal as well as chop(VARIABLE).

chop(LIST)
chop(VARIABLE)
        Chops off the last character of a string (or every string in
        LIST). If (VARIABLE) is omitted, the last character of $_ is
        chopped. Usually used to remove the newline from input.

close(FILEHANDLE)
        Closes the file or pipe associated with the file handle.

delete $ASSOC{KEY}
        Deletes the specified value from the specified associative
        array. Returns the deleted value or undefined if nothing was
        deleted.

die(LIST)
        Prints the value of LIST to STDERR and exits.

each (ASSOC_ARRAY)
        Returns a 2-element array consisting of the key and value for
        the next entry of an associative array (so you can iterate over
        it). A null array (FALSE) is returned when the entire array has
        been read.

grep(EXPR,LIST)
        Evaluates EXPR for each element of LIST and returns an array
        value consisting of those elements for which the expression is
        true.

index(STR,SUBSTR,POSITION)
        Returns the position of the first occurrence of SUBSTR in STR
        at or after POSITION (if POSITION is omitted, starts searching
        from the beginning of STR).

keys (ASSOC_ARRAY)
        Returns a normal array consisting of all the keys of the named
        associative array.

last LABEL
        Like the break command in C; immediately exits the loop called
        LABEL. If LABEL is omitted, exits the innermost enclosing loop.

length(EXPR)
        Returns the length in characters of EXPR.

next LABEL
        Like the continue command in C; starts the next iteration of
        the loop called LABEL. If LABEL is omitted, the next refers to
        the innermost enclosing loop.

open(FILEHANDLE,EXPR)
        Opens the file whose filename is given by EXPR and associates
        it with FILEHANDLE. If the filename begins with "<" or nothing,
        the file is opened for input. If it begins with ">" it is
        opened for output, and if it begins with ">>" it is opened for
        appending. A "+" in front of the "<" or ">" requests read/write
        access.

pop(ARRAY)
        Pops and returns the last value of the array.

print(FILEHANDLE LIST)
        Prints a string or comma-separated list of strings to the file
        FILEHANDLE. If FILEHANDLE is omitted, prints to STDOUT.

push(ARRAY,LIST)
        Pushes the values of LIST onto the end of ARRAY, extending
        ARRAY by the length of LIST.

reverse(LIST)
        In an array context, returns an array consisting of the
        elements of LIST in the opposite order. In a scalar context,
        LIST should be a single string, and this returns a string with
        the bytes in the opposite order.

shift(ARRAY)
        Shifts the first value of the array off and returns it.

sort BLOCK LIST
        Sorts LIST and returns the sorted list. If the optional code
        block BLOCK is present, it should be code which compares the
        variables $a and $b and returns a negative value if $a comes
        first (in the desired order), 0 if they're the same, and a
        positive value if $b comes first. If BLOCK is omitted, the
        entries of LIST are sorted by standard string comparison order.

split(/PATTERN/,EXPR)
        Splits a string into an array of strings and returns it.
        Anything matching PATTERN is taken to be a delimiter separating
        fields. If PATTERN is omitted, splits on whitespace (spaces,
        tabs, and newlines).

sprintf(FORMAT,LIST)
        Returns a string formatted by the usual printf conventions.

substr(EXPR,OFFSET,LEN)
        Extracts a substring from EXPR and returns it. The substring
        starts with character OFFSET and includes LEN characters. If
        LEN is omitted, returns everything to the end of the string.

tr/SEARCHLIST/REPLACELIST/
        Replaces (translates) all occurrences of characters in the
        SEARCHLIST with the corresponding character of REPLACELIST. By
        default, tr translates the string $_ ; however, =~ can be used
        to specify a different string (see Pattern matching below).

unshift(ARRAY,LIST)
        Attaches LIST to the front of the array and returns the number
        of elements in the new array.

Pattern matching and substitution:

/PATTERN/
        Searches a string for a pattern and returns true or false. The
        default string to search is $_ . You can specify another string
        to search with the =~ or !~ operators. The syntax for searching
        a string $demo would be

            $demo =~ /PATTERN/
        or
            $demo !~ /PATTERN/

        The second form does the same search, but returns true if the
        search fails.

s/PATTERN/REPLACEMENT/
        Searches a string for a pattern, and if found, replaces that
        pattern with the replacement text and returns the number of
        substitutions made. Again, the default string is $_ , but you
        can use =~ to specify another string.

Patterns are specified by regular expressions based on those used in
the Unix routine regexp (see the regexp man page for details; you'll
need to use the command man -s5 regexp on studsys). These are similar
to regular expressions used by a variety of Unix tools.

When a pattern match succeeds, perl sets some built-in variables to
record what happened. The variable $& gets set to the substring which
matched the pattern, $` gets set to the substring preceding whatever
was matched, and $' gets set to the substring following whatever was
matched.
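
To make the three match variables concrete, here is a minimal sketch.
The string in $demo and the names $before, $matched, $after, and $count
are made up for illustration; the code sticks to constructs described
in these notes and should also run unchanged under version 5.

```perl
# Hypothetical sample string, chosen only for illustration.
$demo = "grade: 87 points";

if ($demo =~ /[0-9]+/) {
    $before  = $`;    # text preceding the match: "grade: "
    $matched = $&;    # text that matched the pattern: "87"
    $after   = $';    # text following the match: " points"
    print "before=[$before] matched=[$matched] after=[$after]\n";
}

# s/// returns the number of substitutions it made (1 here).
$count = ($demo =~ s/points/pts/);
print "$count substitution(s): $demo\n";
```

Copying the variables right after the match is a deliberate choice:
the next successful match overwrites them.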

In addition, within the patterns, you can select important pieces by
enclosing them in parentheses. After the match, these pieces can be
referred to by the variables $1 (for the first piece in parentheses),
$2, ..., etc. If you use a pattern match containing selected pieces (by
parens) in an array context, the match returns an array containing all
the selected pieces (and does not set the various built-in variables).

Subroutines: A subroutine is declared as follows:

    sub NAME BLOCK

Any arguments passed to the routine come in as array @_ . The return
value of the routine is the last expression evaluated, and can be
either an array value or a scalar. The local operator can be used to
name the arguments and create other local variables for the subroutine.
A subroutine is called using the do or & operator. Subroutines may be
called recursively.

Passing in the arguments through the @_ array and then copying them
into local vars with the local operator effectively creates a
call-by-value. It is possible to get call-by-reference as well (but I
won't say how here). Non-local references are handled using dynamic
scope rules.

local(LIST)
        Declares the listed vars to be local. Actually pushes the
        current values and restores them on exit from the subroutine.

return LIST
        Returns from a subroutine with the value LIST. It's actually
        preferred that you just use the automatic return of the last
        expression evaluated.

do SUBROUTINE (LIST)
        Executes a SUBROUTINE declared by sub and returns the value of
        the last expression evaluated in SUBROUTINE. Alternate syntax:
        &SUBROUTINE (LIST) .

Examples: Here are some very brief examples of a few of the unusual
language constructs.

Look up a value and use the or operator to provide a default:

    $a = $list{$i} || 'null';

Double each element of the array @elements:

    foreach $elem (@elements) {
        $elem *= 2;
    }

Print each entry in the associative array %ENV:

    while (($key, $value) = each %ENV) {
        print "$key=$value\n";
    }

Copy all lines from @bar to @foo except those that start with #:

    @foo = grep(!/^#/, @bar);
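
One more, slightly longer, sketch ties several of the pieces above
together: a subroutine that copies its arguments with local, a pattern
match used in an array context, an associative array, and sort. The
data and every name here (@lines, %grades, @scores, @names, average)
are hypothetical, invented for this example. It uses only constructs
from these notes and should also run under version 5.

```perl
# Hypothetical class-grade records in "name:score" form.
@lines = ("alice:90", "bob:72", "# a comment line", "carol:85");

# average(LIST): mean of the arguments, or 0 for an empty list.
sub average {
    local(@nums) = @_;    # copy the arguments (call-by-value)
    local($sum) = 0;
    local($x);
    foreach $x (@nums) { $sum += $x; }
    # The last expression evaluated is the return value.
    @nums ? $sum / @nums : 0;
}

foreach $line (@lines) {
    next if $line =~ /^#/;    # skip comment lines
    # In an array context the match returns the parenthesized pieces.
    ($name, $score) = ($line =~ /^(\w+):(\d+)$/);
    $grades{$name} = $score;
    push(@scores, $score);
}

@names = sort(keys(%grades));    # standard string comparison order
foreach $name (@names) {
    print "$name=$grades{$name}\n";
}

$avg = &average(@scores);
print "average=", sprintf("%.1f", $avg), "\n";
```

Assigning the match result to ($name, $score), rather than reading $1
and $2 afterwards, follows the note above that an array-context match
returns the selected pieces directly.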