The following variables define the standard channels.
stdin : InChannel
The standard input channel, open for reading.
stdout : OutChannel
The standard output channel, open for writing.
stderr : OutChannel
The standard error channel, open for writing.
The fopen function opens a file for reading or writing.
$(fopen file, mode) : Channel
   file : File
   mode : String
The file is the name of the file to be opened. The mode is a combination of the following characters.
Binary mode is not significant on Unix systems, where text and binary modes are equivalent.
$(close channel...) channel : Channel
The close function closes a file that was previously opened with fopen.
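For example, a minimal sketch of opening a file for writing and closing it again (the file name tmp.txt is purely illustrative):

```
# Open a file for writing, print one line to it, and close it.
chan = $(fopen tmp.txt, w)
fprintln($(chan), hello world)
close($(chan))
```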
$(read channel, amount) : String
   channel : InChannel
   amount : Int
raises RuntimeException
The read function reads up to amount bytes from an input channel, and returns the data that was read. If an end-of-file condition is reached, the function raises RuntimeException.
$(write channel, buffer, offset, amount) : String
   channel : OutChannel
   buffer : String
   offset : Int
   amount : Int
$(write channel, buffer) : String
   channel : OutChannel
   buffer : String
raises RuntimeException
In the 4-argument form, the write function writes bytes to the output channel channel from the buffer, starting at position offset. Up to amount bytes are written. The function returns the number of bytes that were written.
The 3-argument form is similar, but the offset is 0.
In the 2-argument form, the offset is 0, and the amount is the length of the buffer.
If an end-of-file condition is reached, the function raises a RuntimeException exception.
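As a sketch, read and write can be combined to copy a file in fixed-size blocks. The file names and block size are illustrative, and the loop relies on read raising RuntimeException at end-of-file:

```
# Copy src.txt to dst.txt in 1024-byte blocks (names illustrative).
src = $(fopen src.txt, r)
dst = $(fopen dst.txt, w)
try
   while true
      write($(dst), $(read $(src), 1024))
catch RuntimeException(e)
   # read raised RuntimeException: end-of-file was reached
   close($(src))
   close($(dst))
```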
$(lseek channel, offset, whence) : Int
   channel : Channel
   offset : Int
   whence : String
raises RuntimeException
The lseek function repositions the offset of the channel channel according to the whence directive, as follows:
The lseek function returns the new position in the file.
rewind(channel...)
   channel : Channel
The rewind function sets the current file position to the beginning of the file.
$(tell channel...) : Int...
   channel : Channel
raises RuntimeException
The tell function returns the current position of the channel.
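A sketch combining lseek, tell, and rewind; the whence value SEEK_END is an assumption about the directive names, and data.txt is illustrative:

```
chan = $(fopen data.txt, r)
# Seek to the end; the result is the size of the file in bytes.
size = $(lseek $(chan), 0, SEEK_END)
rewind($(chan))
# After rewind, the position reported by tell is 0.
println($(tell $(chan)))
close($(chan))
```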
$(flush channel...)
   channel : OutChannel
The flush function can be used only on files that are open for writing. It flushes all pending data to the file.
$(dup channel) : Channel
   channel : Channel
raises RuntimeException
The dup function returns a new channel referencing the same file as the argument.
dup2(channel1, channel2)
   channel1 : Channel
   channel2 : Channel
raises RuntimeException
The dup2 function causes channel2 to refer to the same file as channel1.
set-nonblock-mode(mode, channel...)
   channel : Channel
   mode : String
The set-nonblock-mode function sets the nonblocking flag on the given channel. When IO is performed on the channel and the operation cannot be completed immediately, the operation raises a RuntimeException.
set-close-on-exec-mode(mode, channel...)
   channel : Channel
   mode : String
raises RuntimeException
The set-close-on-exec-mode function sets the close-on-exec flags for the given channels. If the close-on-exec flag is set, the channel is not inherited by child processes. Otherwise it is.
$(pipe) : Pipe
raises RuntimeException
The pipe function creates a Pipe object, which has two fields. The read field is a channel that is opened for reading, and the write field is a channel that is opened for writing.
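For example, a sketch that passes a single line through a pipe:

```
# Create a pipe, write one line into it, and read it back.
p = $(pipe)
fprintln($(p.write), ping)
close($(p.write))
println($(gets $(p.read)))
close($(p.read))
```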
mkfifo(mode, node...)
   mode : Int
   node : Node
The mkfifo function creates a named pipe.
$(select rfd..., wfd..., efd..., timeout) : Select
   rfd : InChannel
   wfd : OutChannel
   efd : Channel
   timeout : Float
raises RuntimeException
The select function polls for possible IO on a set of channels. The rfd are a sequence of channels for reading, wfd are a sequence of channels for writing, and efd are a sequence of channels to poll for error conditions. The timeout specifies the maximum amount of time to wait for events.
On successful return, select returns a Select object, which has the following fields:
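As a sketch, polling stdin for up to one second might look like the following; passing empty sequences for the unused write and error positions is an assumption:

```
# Wait at most 1.0 seconds for input to become available on stdin.
ready = $(select $(stdin), $(EMPTY), $(EMPTY), 1.0)
```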
lockf(channel, command, len)
   channel : Channel
   command : String
   len : Int
raises RuntimeException
The lockf function places a lock on a region of the channel. The region starts at the current position and extends for len bytes.
The possible values for command are the following.
The InetAddr object describes an Internet address. It contains the following fields.
A Host object contains the following fields.
$(gethostbyname host...) : Host...
   host : String
raises RuntimeException
The gethostbyname function returns a Host object for the specified host. The host may specify a domain name or an Internet address.
The Protocol object represents a protocol entry. It has the following fields.
$(getprotobyname name...) : Protocol...
   name : Int or String
raises RuntimeException
The getprotobyname function returns a Protocol object for the specified protocol. The name may be a protocol name, or a protocol number.
The Service object represents a network service. It has the following fields.
$(getservbyname service...) : Service...
   service : String or Int
raises RuntimeException
The getservbyname function gets the information for a network service. The service may be specified as a service name or number.
$(socket domain, type, protocol) : Channel
   domain : String
   type : String
   protocol : String
raises RuntimeException
The socket function creates an unbound socket.
The possible values for the arguments are as follows.
The domain may have the following values.
The type may have the following values.
The protocol is an Int or String that specifies a protocol in the protocols database.
bind(socket, host, port)
   socket : InOutChannel
   host : String
   port : Int
bind(socket, file)
   socket : InOutChannel
   file : File
raises RuntimeException
The bind function binds a socket to an address.
The 3-argument form specifies an Internet connection, the host specifies a host name or IP address, and the port is a port number.
The 2-argument form is for Unix sockets. The file specifies the filename for the address.
listen(socket, requests)
   socket : InOutChannel
   requests : Int
raises RuntimeException
The listen function sets up the socket for receiving up to requests number of pending connection requests.
$(accept socket) : InOutChannel
   socket : InOutChannel
raises RuntimeException
The accept function accepts a connection on a socket.
connect(socket, addr, port)
   socket : InOutChannel
   addr : String
   port : Int
connect(socket, name)
   socket : InOutChannel
   name : File
raises RuntimeException
The connect function connects a socket to a remote address.
The 3-argument form specifies an Internet connection. The addr argument is the Internet address of the remote host, specified as a domain name or IP address. The port argument is the port number.
The 2-argument form is for Unix sockets. The name argument is the filename of the socket.
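Putting the socket functions together, a minimal Unix-domain server sketch might look like this; the socket path and the domain and type names unix and stream are assumptions:

```
# Create, bind, and listen on a Unix-domain socket (path illustrative).
sock = $(socket unix, stream, 0)
bind($(sock), /tmp/demo.sock)
listen($(sock), 1)

# Accept one connection, echo its first line, and shut down.
client = $(accept $(sock))
println($(gets $(client)))
close($(client))
close($(sock))
```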
$(getc) : String
$(getc file) : String
   file : InChannel or File
raises RuntimeException
The getc function returns the next character of a file. If the argument is not specified, stdin is used as input. If the end of file has been reached, the function returns false.
$(gets) : String
$(gets channel) : String
   channel : InChannel or File
raises RuntimeException
The gets function returns the next line from a file. The function returns the empty string if the end of file has been reached. The line terminator is removed.
$(fgets) : String
$(fgets channel) : String
   channel : InChannel or File
raises RuntimeException
The fgets function returns the next line from a file that has been opened for reading with fopen. The function returns the empty string if the end of file has been reached. The string is returned as literal data. The line terminator is not removed.
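For example, a sketch that reads the first two lines of a file with gets (the file name input.txt is illustrative):

```
chan = $(fopen input.txt, r)
println($(gets $(chan)))   # first line, line terminator removed
println($(gets $(chan)))   # second line
close($(chan))
```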
Output is printed with the print and println functions. The println function adds a terminating newline to the value being printed, the print function does not.
fprint(<file>, <string>)
print(<string>)
eprint(<string>)
fprintln(<file>, <string>)
println(<string>)
eprintln(<string>)
The fprint functions print to a file that has been previously opened with fopen. The print functions print to the standard output channel, and the eprint functions print to the standard error channel.
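For example (the file name out.txt is illustrative):

```
println(hello)                  # standard output, trailing newline added
eprintln(something went wrong)  # standard error
chan = $(fopen out.txt, w)
fprintln($(chan), logged line)  # to the opened file
close($(chan))
```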
Values can be printed with the printv and printvln functions. The printvln function adds a terminating newline to the value being printed, the printv function does not.
fprintv(<file>, <string>)
printv(<string>)
eprintv(<string>)
fprintvln(<file>, <string>)
printvln(<string>)
eprintvln(<string>)
The fprintv functions print to a file that has been previously opened with fopen. The printv functions print to the standard output channel, and the eprintv functions print to the standard error channel.
Many of the higher-level functions use regular expressions. Regular expressions are defined by strings with syntax nearly identical to awk(1).
Strings may contain the following character constants.
Regular expressions are defined using the special characters .\^$[(){}*?+.
Character classes can be used to specify character sequences abstractly. Some of these sequences can change depending on your LOCALE.
cat(files) : Sequence
   files : File or InChannel Sequence
The cat function concatenates the output from multiple files and returns it as a string.
grep(pattern) : String          # input from stdin, default options
   pattern : String
grep(pattern, files) : String   # default options
   pattern : String
   files : File Sequence
grep(options, pattern, files) : String
   options : String
   pattern : String
   files : File Sequence
The grep function searches for occurrences of a regular expression pattern in a set of files, and prints lines that match. This is like a highly-simplified version of grep(1).
The options are:
The pattern is a regular expression.
If successful (grep found a match), the function returns true. Otherwise, it returns false.
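For example, a sketch that searches two source files for the string TODO (the file names are illustrative):

```
# grep prints matching lines and returns true if any line matched.
found = $(grep TODO, foo.ml bar.ml)
if $(found)
   println(at least one TODO remains)
```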
awk(input-files)
case pattern1:
   body1
case pattern2:
   body2
...
default:
   bodyd
The awk function provides input processing similar to awk(1), but more limited. The function takes filename arguments. If called with no arguments, the input is taken from stdin. If arguments are provided, each specifies an InChannel, or the name of a file for input. Output is always to stdout.
The variables RS and FS define record and field separators as regular expressions. The default value of RS is the regular expression \r\n|\r|\n. The default value of FS is the regular expression [ \t]+.
The awk function operates by reading the input one record at a time, and processing it according to the following algorithm.
For each line, the record is first split into fields using the field separator FS, and the fields are bound to the variables $1, $2, .... The variable $0 is defined to be the entire line, and $* is an array of all the field values. The $(NF) variable is defined to be the number of fields.
Next, the cases are evaluated in order. For each case, if the regular expression pattern_i matches the record $0, then body_i is evaluated. If the body ends in an export, the state is passed to the next clause. Otherwise the value is discarded. If the regular expression contains \(r\) expressions (groups), those groups override the fields $1, $2, ....
For example, here is an awk function to print the text between two delimiters \begin{<name>} and \end{<name>}, where the <name> must belong to a set passed as an argument to the filter function.
filter(names) =
   print = false
   awk(Awk.in)
   case $"^\\end\{\([:alpha:]+\)\}"
      if $(mem $1, $(names))
         print = false
         export
      export
   default
      if $(print)
         println($0)
   case $"^\\begin\{\([:alpha:]+\)\}"
      print = $(mem $1, $(names))
      export
Note, if you want to redirect the output to a file, the easiest way is to redefine the stdout variable. The stdout variable is scoped the same way as other variables, so this definition does not affect the meaning of stdout outside the filter function.
filter(names) =
   stdout = $(fopen file.out, w)
   awk(Awk.in)
   ...
   close(stdout)
fsubst(files)
case pattern1 [options]
   body1
case pattern2 [options]
   body2
...
default
   bodyd
The fsubst function provides a sed(1)-like substitution function. Similar to awk, if fsubst is called with no arguments, the input is taken from stdin. If arguments are provided, each specifies an InChannel, or the name of a file for input.
The RS variable defines a regular expression that determines a record separator. The default value of RS is the regular expression \r\n|\r|\n.
The fsubst function reads the file one record at a time.
For each record, the cases are evaluated in order. Each case defines a substitution from a substring matching the pattern to replacement text defined by the body.
Currently, there is only one option: g. If specified, each clause specifies a global replacement, and all instances of the pattern define a substitution. Otherwise, the substitution is applied only once.
Output can be redirected by redefining the stdout variable.
For example, the following program replaces all occurrences of a word followed by a period with its capitalized form.
section
   stdout = $(fopen Subst.out, w)
   fsubst(Subst.in)
   case $"\<\([[:alnum:]]+\)\." g
      value $(capitalize $1).
   close(stdout)
The Lexer object defines a facility for lexical analysis, similar to the lex(1) and flex(1) programs.
In omake, lexical analyzers can be constructed dynamically by extending the Lexer class. A lexer definition consists of a set of directives specified with method calls, and set of clauses specified as rules.
For example, consider the following lexer definition, which is intended for lexical analysis of simple arithmetic expressions for a desktop calculator.
lexer1. =
   extends $(Lexer)

   other: .
      eprintln(Illegal character: $* )
      lex()

   white: $"[[:space:]]+"
      lex()

   op: $"[-+*/()]"
      switch $*
      case +
         Token.unit($(loc), plus)
      case -
         Token.unit($(loc), minus)
      case *
         Token.unit($(loc), mul)
      case /
         Token.unit($(loc), div)
      case $"("
         Token.unit($(loc), lparen)
      case $")"
         Token.unit($(loc), rparen)

   number: $"[[:digit:]]+"
      Token.pair($(loc), exp, $(int $* ))

   eof: $"\'"
      Token.unit($(loc), eof)
This program defines an object lexer1 that extends the Lexer object, which defines a lexing environment.
The remainder of the definition consists of a set of clauses, each with a method name before the colon; a regular expression after the colon; and in this case, a body. The body is optional; if it is not specified, the method with the given name should already exist in the lexer definition.
NB The clause that matches the longest prefix of the input is selected. If two clauses match the same input prefix, then the last one is selected. This is unlike most standard lexers, but makes more sense for extensible grammars.
The first clause matches any input that is not matched by the other clauses. In this case, an error message is printed for any unknown character, and the input is skipped. Note that this clause is selected only if no other clause matches.
The second clause is responsible for ignoring white space. If whitespace is found, it is ignored, and the lexer is called recursively.
The third clause is responsible for the arithmetic operators. It makes use of the Token object, which defines three fields: a loc field that represents the source location; a name; and a value.
The lexer defines the loc variable to be the location of the current lexeme in each of the method bodies, so we can use that value to create the tokens.
The Token.unit($(loc), name) method constructs a new Token object with the given name, and a default value.
The number clause matches nonnegative integer constants. The Token.pair($(loc), name, value) constructs a token with the given name and value.
Lexer objects operate on InChannel objects. The method lexer1.lex-channel(channel) reads the next token from the channel argument.
During lexical analysis, clauses are selected by longest match. That is, the clause that matches the longest sequence of input characters is chosen for evaluation. If no clause matches, the lexer raises a RuntimeException. If more than one clause matches the same amount of input, the first one is chosen for evaluation.
Suppose we wish to augment the lexer example so that it ignores comments. We will define comments as any text that begins with the string (*, ends with *), and comments may be nested.
One convenient way to do this is to define a separate lexer just to skip comments.
lex-comment. =
   extends $(Lexer)

   level = 0

   other: .
      lex()

   term: $"[*][)]"
      if $(not $(eq $(level), 0))
         level = $(sub $(level), 1)
         lex()

   next: $"[(][*]"
      level = $(add $(level), 1)
      lex()

   eof: $"\'"
      eprintln(Unterminated comment)
This lexer contains a field level that keeps track of the nesting level. On encountering a (* string, it increments the level, and for *), it decrements the level if nonzero, and continues.
Next, we need to modify our previous lexer to skip comments. We can do this by extending the lexer object lexer1 that we just created.
lexer1. +=
   comment: $"[(][*]"
      lex-comment.lex-channel($(channel))
      lex()
The body for the comment clause calls the lex-comment lexer when a comment is encountered, and continues lexing when that lexer returns.
Clause bodies may also end with an export directive. In this case the lexer object itself is used as the returned token. If used with the Parser object below, the lexer should define the loc, name and value fields in each export clause. Each time the Parser calls the lexer, it calls it with the lexer returned from the previous lex invocation.
The Parser object provides a facility for syntactic analysis based on context-free grammars.
Parser objects are specified as a sequence of directives, specified with method calls; and productions, specified as rules.
For example, let's finish building the desktop calculator started in the Lexer example.
parser1. =
   extends $(Parser)

   #
   # Use the main lexer
   #
   lexer = $(lexer1)

   #
   # Precedences, in ascending order
   #
   left(plus minus)
   left(mul div)
   right(uminus)

   #
   # A program
   #
   start(prog)

   prog: exp eof
      return $1

   #
   # Simple arithmetic expressions
   #
   exp: minus exp :prec: uminus
      neg($2)
   exp: exp plus exp
      add($1, $3)
   exp: exp minus exp
      sub($1, $3)
   exp: exp mul exp
      mul($1, $3)
   exp: exp div exp
      div($1, $3)
   exp: lparen exp rparen
      return $2
Parsers are defined as extensions of the Parser class. A Parser object must have a lexer field. The lexer is not required to be a Lexer object, but it must provide a lexer.lex() method that returns a token object with name and value fields. For this example, we use the lexer1 object that we defined previously.
The next step is to define precedences for the terminal symbols. The precedences are defined with the left, right, and nonassoc methods in order of increasing precedence.
The grammar must have at least one start symbol, declared with the start method.
Next, the productions in the grammar are listed as rules. The name of the production is listed before the colon, and a sequence of variables is listed to the right of the colon. The body is a semantic action to be evaluated when the production is recognized as part of the input.
In this example, these are the productions for the arithmetic expressions recognized by the desktop calculator. The semantic action performs the calculation. The variables $1, $2, ... correspond to the values associated with each of the variables on the right-hand-side of the production.
The parser is called with the $(parser1.parse-channel start, channel) or $(parser1.parse-file start, file) functions. The start argument is the start symbol, and the channel or file is the input to the parser.
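For example, a sketch of invoking the calculator parser on a file (the file name expr.txt is illustrative):

```
# Parse expr.txt starting from the prog symbol defined above.
result = $(parser1.parse-file prog, expr.txt)
println($(result))
```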
The parser generator generates a pushdown automaton based on LALR(1) tables. As usual, if the grammar is ambiguous, this may generate shift/reduce or reduce/reduce conflicts. These conflicts are printed to standard output when the automaton is generated.
By default, the automaton is not constructed until the parser is first used.
The build(debug) method forces the construction of the automaton. While not required, it is wise to finish each complete parser with a call to the build(debug) method. If the debug variable is set, this also prints the parser table together with any conflicts.
The loc variable is defined within action bodies, and represents the input range for all tokens on the right-hand-side of the production.
Parsers may also be extended by inheritance. For example, let's extend the grammar so that it also recognizes the << and >> shift operations.
First, we extend the lexer so that it recognizes these tokens. This time, we choose to leave lexer1 intact, instead of using the += operator.
lexer2. =
   extends $(lexer1)

   lsl: $"<<"
      Token.unit($(loc), lsl)

   asr: $">>"
      Token.unit($(loc), asr)
Next, we extend the parser to handle these new operators. We intend that the bitwise operators have lower precedence than the other arithmetic operators. The two-argument form of the left method accomplishes this.
parser2. =
   extends $(parser1)

   left(plus, lsl lsr asr)

   lexer = $(lexer2)

   exp: exp lsl exp
      lsl($1, $3)
   exp: exp asr exp
      asr($1, $3)
In this case, we use the new lexer lexer2, and we add productions for the new shift operations.
$(gettimeofday) : Float
The gettimeofday function returns the time of day in seconds since January 1, 1970.
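For example, a sketch that measures elapsed time; this assumes the usual arithmetic functions accept Float arguments:

```
start = $(gettimeofday)
# ... perform some work here ...
elapsed = $(sub $(gettimeofday), $(start))
```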
The echo function prints a string.
$(echo <args>)
echo <args>
The jobs function prints a list of jobs.
jobs
The cd function changes the current directory.
cd(dir)
   dir : Dir
The cd function also supports a 2-argument form:
$(cd dir, e)
   dir : Dir
   e : expression
In the two-argument form, expression e is evaluated in the directory dir. The current directory is not changed otherwise.
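For example, a sketch that evaluates a glob in another directory without changing the current one:

```
# List the entries of /tmp; the current directory is unchanged afterward.
files = $(cd /tmp, $(glob *))
```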
The behavior of the cd function can be changed with the CDPATH variable, which specifies a search path for directories. This is normally useful only in the osh command interpreter.
CDPATH : Dir Sequence
For example, the following will change directory to the first directory among ./foo, ~/dir1/foo, and ~/dir2/foo that exists.
CDPATH[] =
   .
   $(HOME)/dir1
   $(HOME)/dir2
cd foo
The bg function places a job in the background.
bg <pid...>
The fg function brings a job to the foreground.
fg <pid...>
The stop function suspends a job.
stop <pid...>
The wait function waits for a job to finish. If no process identifiers are given, the shell waits for all jobs to complete.
wait <pid...>
The kill function signals a job.
kill [signal] <pid...>
$(history-index) : Int
$(history) : String Sequence
history-file : File
history-length : Int
The history variables manage the command-line history in osh. They have no effect in omake.
The history-index variable is the current index into the command-line history. The history variable is the current command-line history.
The history-file variable can be redefined if you want the command-line history to be saved. The default value is ~/.omake/osh_history.
The history-length variable can be redefined to specify the maximum number of lines in the history that you want saved. The default value is 100.
omake(1), omake-quickstart(1), omake-options(1), omake-root(1), omake-language(1), omake-shell(1), omake-rules(1), omake-base(1), omake-system(1), omake-pervasives(1), osh(1), make(1)
Version: 0.9.6.9 of April 11, 2006.
©2003-2006, Mojave Group, Caltech
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
Jason Hickey et al.
Caltech 256-80
Pasadena, CA 91125, USA
Email: omake-devel@metaprl.org
WWW: http://www.cs.caltech.edu/~jyh