Using DataScript
DataScript is used to manipulate data in the expressions part of a transport and in pre and post processing. (See Transforming Data). It is not intended to be a complete programming language. For quick reference see DataScript Reference
Value Expressions
In the transport window the Source and Target Columns actually represent an object called a tuple. A tuple is like a dictionary or a hash that holds key, value pairs. Each key (variable name) is associated with a variable in the tuple. Expressions often operate on the variables in the source tuple and assign the results to variables in the destination tuple. A 'Value Expression' field is associated with the 'Target Column' field in the same row. The simplest and most common operation is a direct assignment from source to destination. Put the name of the source variable in the expressions row corresponding to the destination variable. The values assigned to the Destination tuple are ultimately sent to the database or output file.
In the expressions column you assign values to the destination variables. They can be values from a source column with or without modifications or they can be derived in some other way.
Source Columns |
name |
monthlySal |
|
Value Expression | Target Columns |
name | Name |
monthlySal * 12 | yearlySal |
"Schema Research Corp." | employer |
|
The most important functions are available by selecting 'Functions' from the Transport menu. To use one, select a target column name and select the desired function from the menu. The function will appear in the 'Value Expression' for the selected column. Fill in the required values in the parenthesis. (See the function explanations under 'Pre and Post Processing' for more information on what parameters and return types functions take.)
Pre and Post Processing
In the center of the main 'Transport' window is a button marked with an ellipsis. If you click on it the Transport Customization window will be displayed. This window has a pop-up menu that contains a number of places to change the way SRTransport handles the rows as they move through from source to destination. If you are unfamiliar with the sequence of events read SRTransport Processing Architecture. What concerns us here are the four pre and post processing areas.
In pre and post processing, the language is DataScript. You can set a variable to a constant value, to some other variable value or to the result of a function call that returns a value. (Procedures that do not return a value are used with the keyword call).
You can try these examples in the Interactive DataScript window ( select 'Interactive DataScript' from the Transport menu).
Working With Variables
There are 4 types of variables that are important for processing - integer, double, string and text. There are several other types that SRTransport uses that are not needed for processing but you might see mentioned. They include 'tuple', and 'list'. As explained above a tuple is basically a dictionary, and a 'list' is basically an array.
All variables are global and affect SRTransport's processing space.
All variables must be 'declared' and should be 'removed' when no longer needed.
Variables that have been 'declared' can be used in the 'Value Expression' column of the transport.
The fields of the source tuple are available as constants. They should not be 'declared', 'removed' or changed.
If you need to manipulate them before using them in the Expressions, assign them to an intermediate variable and change that variable.
The '#' sign precedes comments.
The use of declare, set, print and remove when working with variables.
The output of print is directed to the Log output in the Transporter Room or to the console in the Log window.
declare i integer
set i = 100
print i
remove i
Variables may be initialized with a constant when they are declared.
More than one variable may be declared on the same line.
Multiple variables may be removed in one statement.
declare j integer = 100, d double
set d = 99.99
print = j + d
remove j, d
If you need a variable for processing rows, declare it in Run preprocessing and remove it in Run postprocessing.
Manipulating Strings
Strings, like all other variables must be declared.
You can assign a constant value to a string in the declaration, or use set.
Constant strings must appear in quotes.They may be single or double quoted.
If a string needs to contain one use the other to quote it with.
declare aString string = "some text"
print aString
set aString = 'some "other" text'
declare anotherString string
set anotherString = aString
If you must have both kinds of quotes in a string, double quote the string and escape double quotes with another double quote.
set aString = """A very 'confused' quote"", he said."
Strings can be concatenated using the '+' operator.
set aString = aString + " with yet more text added"
print aString
There are a number of utility functions for working with strings.
To remove white space (spaces, tabs, CR's) from either or both ends of a string use
trim(string) (both sides), ltrim(string) (left side) or rtrim(string) (right side).
set aString = " empty spaces on either end "
set aString = trim(aString)
or
set aString = ltrim(aString)
or
set aString = rtrim(aString)
To remove characters from a string use strip(string, 'characters').
set aString = '"text that includes quotes and $dollar signs"'
set aString = strip(aString, '"$')
You can set strings to all upper or lower case with upper(string) and lower(string).
set aString = 'lowercase text'
set aString = upper(aString)
set aString = lower(aString)
You can find out how long a string is with length(string).
You can find a substring within a string with position(substring, string) (returns 0 if not found).
You can select a substring from a string with substr(string, position, length).
set aString = "This long string contains a 'short string'"
declare shortString string = "short string"
set i = position(shortString, aString)
declare anotherString string = substr(aString, i, length(shortString))
print anotherString
Convert an expression to a string with convert(type, expression).
set aString = convert(string, i*12)
Substitutions can be made with three different substitute functions.
substitute1(key-string, substitute-column, key-column, table)
Look for a given key-string in the key-column of the table. If the key-string is found, return the associated value found in the substitute-column of the same table. If the key-string is not found in the key-column of the table, then the empty string is returned.
substitute2 (key-string, default-string {,value, substitute}+)
Look for a given key-string in the various values. If the key-string matches one of the values, then return the corresponding substitute value. If the key-string does not match any of the values, return the default-string. Example: substitute2 (aString, "Unknown Color", "R", "Red", "G", "Green", "B", "Blue")
substitute3 (key-string, filename)
Look for the key-string in the file and returns its substitute. If the key-string is not found, return the empty string. The file is formatted with a pipe (|) separating the key-value pairs, each pair on its own line. Example: given a file "color_map" with this data, R|Red G|Green B|Blue You would write substitute3 (aString, "color_map" )
That's it for strings - Don't forget to clean up.
remove aString, shortString, anotherString, i,
Working with the 'text' type
'text' differs from 'string' in that it can contain binary data. It can also contain ascii characters (and therefore be printable). 'text' can also be longer( megabytes - gigabytes ) whereas a 'string' is limited to 1000 characters. Some functions return 'text' types. Blobs are 'text'. To assign data to a 'text' variable use the print to keywords.
declare someText text
print to someText "some very interesting text"
print someText
remove someText
Working With Files
During processing you can read from and write to files.
To output a 'text' type to a file use redirect [text] to [filename] with print to [text].
To read the contents of a file into a 'text' use filecontents(path).
One example of this is removing blobs from a database and writing their contents to the file system.
Then store an ID and the file name in the database.
declare F text
redirect F to file "/tmp/photo.data"
print to F photo
reset F
remove F
Going in the opposite direction - Suppose you had a file, 'paths.txt',
containing the file path names for blobs you wanted to insert into a table.
Use 'paths.txt' as a source and configure one field called 'path' with a type 'string'.
In 'Run preprocessing' declare a text variable -
declare bigdata text
In 'Row preprocessing' set bigdata's value to the contents of the file in the variable 'path'-
set bigdata = filecontents(path)
then in 'Value Expression' you can use 'bigdata' as the value of the destination blob.
Important Functions
execsql(sqlString) is a powerful tool in pre- and post- processing.
It allows you to send arbitrary sql at any time during processing.
Warning - execsql() sends sql during "trial" runs of transports!!
Suppose you have a source, sTable, with these columns fname, lname, etc.
but the customer_id is in another table, aTable, with columns name, customer_id etc.
You want to combine the two in the destination table.
In 'Run preprocessing' declare some variables -
declare sql string, c_id integer
If in 'Row preprocessing' one wrote something like this -
(We want our sql to look something like this, 'SELECT customer_id FROM aTable WHERE name = "Smith, Joe"')
(one line)
set sql = 'SELECT customer_id FROM aTable
WHERE name = "' + lname + ', ' + fname + '"'
set c_id = execsql(sql)
You can now use c_id in the Value Expressions column
In 'Run postprocessing' clean up.
remove sql, c_id
sys(cmdString) (not available on "Windows")
The command is executed (by /bin/sh) and the stdout is returned as a 'text' type (can be converted to a string). This allows you to use many unix functions for processing source data. You can execute shell scripts.
For example. Suppose you have a shell script called /usr/home/myscript.sh that takes a string, myString, as an argument and returns some output that you want to assign to a string variable.
declare result string, cmd string
set cmd = "sh /usr/home/myscript " + myString
set result = convert( string, sys(cmd) )
Other Functions
isnull(a,b) If a is null, substitute b.
isnotnull(a,b) If a is not null, substitute b.
hex(string) Convert the hex codes to a string or binary value. The result of function hex() is TText.
Control Statements
DataScript has a number of control structures that are similar to C.
if - else if - else
while - break - continue
Notice that parenthesis are not required around the condition, that brackets are only required when there is more than one statement as part of the block and that you use '=' for equality (not '==' as you would in C, C++ or Java).
Break causes an exit from the while loop. Continue causes the next iteration of the loop to begin, bypassing any subsequent statements in the loop.
while [some condition]
{
if 1 < 2
print "one is less than two"
else if 1 = 2
{
print "there is something wrong with this math"
continue # goes to the top of the loop
}
else # 1 > 2
{
print "No way!"
break # exits the loop
}
print "end of loop" # prints only when 1 < 2
}
DataScript does not have a for loop but you can use a while loop to accomplish the same thing.
declare j integer = 100
declare i integer = 0
while i < j {
// do stuff
set i = i+1
}
remove i, j
Function declarations use the keyword 'procedure'. To return a value, use the return statement. If you don't explicitly return a value, then NULL is returned implicitly.
Here is a trivial example that shows the basic syntax.
procedure MyAdder ( a integer, b integer )
{
declare answer integer
set answer = a + b
return answer
}
declare i integer
set i = MyAdder( 1, 2 )
print i
SQL in DataScript
|