S* Online Course
S* Tutorial
Perl in Bioinformatics
Version 1.1
Last Revised: 19th Feb 2004
Perl Basics
1. Preface
This is a short tutorial to help you get started programming in Perl. We will go through
several examples of developing simple applications used in bioinformatics. If
you would like to learn more, you can pick up books or visit websites about
Perl (see the Appendix). There are many well-written tutorials on Perl, and we suggest
that you work through them for a thorough understanding of the language.
In this tutorial, we focus on how to use Perl and BioPerl and how to apply them to
bioinformatics. The tutorial assumes some background knowledge and
familiarity with using a computer.
If you find any errors in this tutorial, please email us at course@s-star.org. We welcome
any feedback or comments that help improve this tutorial. Thanks to Alan Wardroper for his
fabulous feedback.
Thank you.
Sincerely,
S* Team
2. Motivation
Before you begin this tutorial, you may want to read an interesting article, “How Perl
Saved The Human Genome Project” by Lincoln Stein
(http://www.bioperl.org/GetStarted/tpj_ls_bio.html).
Bioinformatics is the application of information technology to manage and analyse
biological data. As more and more researchers use computers in their everyday
tasks, learning a simple, easy-to-use programming language to help with storing,
organizing, and analysing data will put you at a great advantage. We have chosen
to teach Perl to participants of S* because of its practicality and ease of use. Many
tasks that you encounter can be solved easily using a subset of Perl, and usually "there's
more than one way to do it" (TMTOWTDI, pronounced "tim toady").
3. Getting Perl
Perl (Practical Extraction and Report Language) is a language that offers you the
capability and flexibility to get your job done easily. It is easy to learn and is
installed on most UNIX and Linux machines. Its main strength is text processing,
but it has grown into a rich and sophisticated programming language.
Perl is freely available and freely distributable. To get a copy of Perl to install on
your computer, go to http://www.cpan.org/. Click on “Perl binary distributions ("ports")”
and find the installation package that corresponds to the system installed on
your computer.
For installation instructions, please follow the readme file contained in the installation
package. The readme is well written. However, should you encounter a problem, please
post to the IVLE forum.
If you are using Unix/Linux, you probably already have Perl installed. Type
perl -v
at the shell prompt to find out the version installed.
For Windows users, please download “ActivePerl” from this link to install Perl on your
computer:
http://www.cpan.org/ports/index.html#win32
4. Language Basics
Like other programming languages, Perl has its own way of expressing variables. In this
tutorial, we will learn the following types:
• Scalar: represented by $. Stores a single value only (a string or a number).
  o e.g. $dna_sequence
• Array: represented by @. Stores a list of values in order; each value is accessed
  by specifying its numeric position in the list. The index starts from 0.
  o e.g. @taxonomy
• Hash: represented by %. Stores key/value pairs in no particular order; each value is
  accessed by specifying its key (a string).
  o e.g. %dna_sequences
TIPS / NOTE:
I. Every statement in the code should end with “;”. See the examples in the
following sections.
II. Good indentation makes code easier to read and to debug (i.e. to locate and
correct errors in the code).
III. Extra white space between the elements of a statement is ignored. Practise good
spacing to make code readable.
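The three variable types and the tips above can be sketched as follows. This is only a minimal illustration: the hash keys "seq1" and "seq2" and their values are hypothetical sample data, and the foreach loop with sort keys is just one assumed way to list a hash in a stable order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Scalar: holds a single value (a string or a number).
my $dna_sequence = "ATGCCAGGATCGCCC";

# Array: an ordered list; the first element has index 0.
my @taxonomy = ("archaea", "eubacteria", "eukaryota");

# Hash: key/value pairs looked up by key (hypothetical sample data).
my %dna_sequences = (
    "seq1" => "ATGCCA",
    "seq2" => "GGATCG",
);

print "Scalar: $dna_sequence\n";
print "First array element: $taxonomy[0]\n";
print "Number of array elements: ", scalar(@taxonomy), "\n";
print "Hash value for seq2: $dna_sequences{seq2}\n";

# Hash keys come back in no guaranteed order, so sort them for stable output.
foreach my $id (sort keys %dna_sequences) {
    print "$id => $dna_sequences{$id}\n";
}
```

Note that every statement ends with “;” and that the body of the loop is indented, as recommended in the tips.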
4.1 Strings
Strings are simply sequences of characters. In Perl, a string is written between either
single quotes or double quotes, e.g. 'hello world' or "hello Perl".
#!/usr/bin/perl
$course_no = "4th";
print "Welcome to S* $course_no Course !\n";
So, you can set a variable using either single quotes or double quotes:
$course_no = "4th";
OR
$course_no = '4th';
However, when printing, single quotes print the variable name literally, whereas double
quotes interpolate the content of the variable.
print 'Welcome to S* $course_no Course !\n';
merely prints out
Welcome to S* $course_no Course !\n
But replacing the single quotes with double quotes gives us
Welcome to S* 4th Course !
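The difference between the two kinds of quotes can be seen side by side in a small sketch. The variable names $double and $single are introduced here only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $course_no = "4th";

# Double quotes: $course_no is interpolated and \n becomes a newline.
my $double = "Welcome to S* $course_no Course !\n";

# Single quotes: the text is kept literally, including $course_no and \n.
my $single = 'Welcome to S* $course_no Course !\n';

print $double;
print $single, "\n";   # add a real newline after the literal text
```

The first print shows the value 4th in the sentence, while the second shows the characters $course_no and \n exactly as typed.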
In the simple code above, you will notice the special character “\n” at the end of
the line. Including this special character forces a new line in the output. There are
several other commonly used special characters.
Character    Description
\n           new line
\t           tab
\r           carriage return (CR)
\xhh         character with hexadecimal code hh
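As a small sketch of how these special characters behave inside double-quoted strings (the column names and the sample row are made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# \t inserts a tab between columns; \n ends the line.
my $header = "ID\tSequence\n";
my $row    = "seq1\tATGC\n";

# \x41 is the character with hexadecimal code 41, the letter A.
my $letter = "\x41";

print $header, $row;
print "Character with hex code 41: $letter\n";
```

Remember that these special characters are only interpreted inside double quotes; in single quotes they are printed as typed.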
4.2 Comments
A comment is denoted by the symbol ‘#’ and is frequently used in programs to allow the
author/programmer to document the code. The first line of a file is a special case: when
it begins with “#!”, it is a directive that tells the machine where to find the perl
program. For Windows users, it is not required to write this directive; however, it is
usually good practice to do so, since it is needed on machines running Unix/Linux.
Example Program 4.2.1: Copy and paste the code into a new file. Then type perl
<filename> to execute the program.
The line “#!/usr/bin/perl” tells the machine to run Perl from that installed location.
The “-w” indicates that any warnings should be displayed. “use strict” asserts that all
the variables in the program should be properly declared.
#!/usr/bin/perl -w
use strict;
# Initialize a scalar and then print the value
my $sequence = "ATGCCAGGATCGCCC";
print $sequence, "\n";
# Initialize an array of strings and then print the 1st value
my @taxonomy = ("archaea", "eubacteria", "eukaryota", "virus");
print $taxonomy[0], "\n";
# Initialize a hash with key/value pairs and then print the value
my %dna_sequences = (
"AY258503" => "E.coli hemolysin plasmid pO113",
"AY131333" => "E.coli 16S ribosomal RNA gene",
"AY319289" => "E.coli strain K12 transposon Tn10.10"
);
print $dna_sequences{"AY258503"}, "\n";
exit;
4.3 Operators
4.3.1 Arithmetic Operators
Operator    Description
+           Addition
-           Subtraction
/           Division
*           Multiplication
%           Modulus
**          Exponentiation
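A short sketch of these operators applied to simple sequence statistics. The numbers are counted by hand from the sequence ATGCCAGGATCGCCC used in Example Program 4.2.1, and the variable names are chosen for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $length   = 15;   # number of bases in ATGCCAGGATCGCCC
my $gc_count = 10;   # number of G and C bases in that sequence

my $at_count   = $length - $gc_count;          # subtraction: 5
my $total      = $at_count + $gc_count;        # addition: 15
my $gc_percent = $gc_count / $length * 100;    # division and multiplication
my $leftover   = $length % 3;                  # modulus: bases left after whole codons
my $codons     = ($length - $leftover) / 3;    # number of complete codons: 5
my $possible   = 4 ** 3;                       # exponentiation: 64 possible codons

print "A/T bases: $at_count\n";
printf "GC content: %.1f%%\n", $gc_percent;
print "Complete codons: $codons, leftover bases: $leftover\n";
print "Possible codons: $possible\n";
```

The printf call is used here only to round the GC percentage to one decimal place.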
4.3.2 String Operators
Given two strings, Perl allows concatenation of the two using the dot operator ‘.’.
$s1 = 'Homo';
$s2 = 'sapiens';
print $s1 . " " . $s2;
# prints out Homo sapiens
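The same idea can be written as a complete program under use strict. This sketch also shows the “.=” operator, which is not covered in the text above: it is standard Perl shorthand for appending to an existing string. The variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $genus   = 'Homo';
my $species = 'sapiens';

# Join the two strings with a space in between using the dot operator.
my $name = $genus . " " . $species;

# Build up a sequence piece by piece; .= appends to the existing string.
my $sequence = "ATG";
$sequence .= "CCA";

print $name, "\n";
print $sequence, "\n";
```

Appending with “.=” is convenient when a sequence is read in several pieces and needs to be assembled into one string.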