Python exercises for beginners

Average values in a file

  • Please download file1.txt and file2.txt

  • Create an avg1.py script that computes the average of the numbers in the 3rd column of file1.txt

 $ avg1.py 3 file1.txt 
0.33608439
  • Modify the avg1.py script into an avg2.py script that computes the average of the numbers in any column, reading the data either from a file or from standard input

Hint: use sys.argv to read the arguments and store the column number and the file name.

 $ avg2.py 3 file1.txt 
0.33608439
 $ cat file1.txt | avg2.py 3
0.33608439

Hint: use an if statement to distinguish whether a second argument (sys.argv[2], the name of the file) was given. To check for it, test the length of sys.argv (len(sys.argv)).
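
One possible way to put the two hints together is sketched below. The function names column_average and open_input are made up for this illustration, and the sketch assumes the data are whitespace-separated with a number in the requested column on every line that is long enough.

```python
import sys

def column_average(lines, column):
    """Average the values in the given column (counted from 1, as on the command line)."""
    values = []
    for line in lines:
        fields = line.split()
        # Assumption: lines that are too short are skipped
        if len(fields) >= column:
            values.append(float(fields[column - 1]))
    return sum(values) / len(values)

def open_input(argv):
    """Return the file named in argv[2] if it was given, otherwise standard input."""
    if len(argv) > 2:
        return open(argv[2])
    return sys.stdin
```

In a script, these could be combined as print(column_average(open_input(sys.argv), int(sys.argv[1]))).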

  • Modify the avg2.py script into an avg3.py script that first filters the lines of the input, keeping only those that contain a given pattern
 $ cat file2.txt | avg3.py Energy 3
0.33608439
 $ avg3.py Energy 3 file2.txt 
0.33608439

Hint: use the in operator on strings:

>>> s = 'abcde'
>>> 'bc' in s
True
>>> 'ac' in s
False
>>> if 'bc' in s:
...   print("YES")
... else:
...   print("Too bad")
... 
YES
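
The same test can be applied to every line of the input. The sketch below is only an illustration: filter_lines is a made-up name, and the sample lines are invented, since the actual layout of file2.txt is not shown here.

```python
def filter_lines(lines, pattern):
    """Keep only the lines that contain pattern as a substring."""
    return [line for line in lines if pattern in line]

# Invented sample data, for illustration only
sample = ["Energy 1 0.25", "Step 1 7.00", "Energy 2 0.75"]
kept = filter_lines(sample, "Energy")
```

The filtered lines can then be averaged exactly as in avg2.py.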

Create passwords

  • Create a makepassword.py script that creates a random password of the desired length
 $ makepassword.py 10
4Bu>3MgeEA

Hints:

  • create a string containing all possible characters you want to use (strings and lists are very similar in Python)
  • use the random module
  • and the join method of strings, which takes a list:
>>> a = ['a', 'b', 'c', 'd']
>>> "".join(a)
'abcd'
>>> " ".join(a)
'a b c d'
>>> " and ".join(a)
'a and b and c and d'
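
The three hints can be combined roughly as follows. This is a minimal sketch: the names ALPHABET and make_password are made up, and the choice of characters (letters, digits and punctuation from the string module) is an assumption.

```python
import random
import string

# Assumption: any letter, digit or punctuation character is acceptable
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def make_password(length):
    """Pick `length` characters at random and join them into one string."""
    return "".join(random.choice(ALPHABET) for _ in range(length))
```

Note that the random module is fine for an exercise, but it is not designed for security; for real passwords the standard secrets module is the better choice.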

Extract primary sequence in a PDB file

  • PDB file: 1EJG

  • Create a getpdbseq.py script that lists the amino acid residues that are present in a PDB file

 $ cat 1EJG.pdb | getpdbseq.py 
   1   THR |    2   THR |    3   CYS |    4   CYS |    5   PRO |    6   SER |    7   ILE |    8   VAL |    9   ALA |   10   ARG | 
  11   SER |   12   ASN |   13   PHE |   14   ASN |   15   VAL |   16   CYS |   17   ARG |   18   LEU |   19   PRO |   20   GLY | 
  21   THR |   22   PRO |   23   GLU |   24   ALA |   25   LEU |   26   CYS |   27   ALA |   28   THR |   29   TYR |   30   THR | 
  31   GLY |   32   CYS |   33   ILE |   34   ILE |   35   ILE |   36   PRO |   37   GLY |   38   ALA |   39   THR |   40   CYS | 
  41   PRO |   42   GLY |   43   ASP |   44   TYR |   45   ALA |   46   ASN 
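
One way to approach this exercise is sketched below. It relies on the fixed-column layout of ATOM records in the PDB format (residue name in columns 18-20, residue number in columns 23-26). The function names are made up, and the sketch assumes a single chain and keeps the first residue name seen for each residue number.

```python
def residues_from_pdb(lines):
    """Return (residue number, residue name) pairs in order of first appearance."""
    seen = set()
    residues = []
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        number = int(line[22:26])      # columns 23-26: residue sequence number
        name = line[17:20].strip()     # columns 18-20: residue name
        if number not in seen:
            seen.add(number)
            residues.append((number, name))
    return residues

def format_residues(residues, per_row=10):
    """Lay the residues out in rows, separated by ' | '."""
    rows = []
    for start in range(0, len(residues), per_row):
        row = residues[start:start + per_row]
        rows.append(" | ".join("%4d   %s" % pair for pair in row))
    return "\n".join(rows)
```

In a script, print(format_residues(residues_from_pdb(sys.stdin))) would read the file from standard input, after importing sys.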

Extract a subsystem from a PDB file

  • Create a pdbsphere.py script that outputs all atoms of the amino acids that are within a given distance of a given amino acid. For example:
 $ pdbsphere.py 35 6.5 1EJG.pdb

extracts all atoms (residue-based) that are within 6.5 angstroms of residue #35 (ILE).
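
A possible starting point is sketched below. It reads "residue-based" as: a residue is selected, with all of its atoms, as soon as any one of its atoms lies within the cutoff of any atom of the central residue. That reading is an assumption, as are the function names; only ATOM records are considered, with coordinates taken from columns 31-54 of the PDB format.

```python
import math

def parse_atoms(lines):
    """Return (residue number, (x, y, z), original line) for every ATOM record."""
    atoms = []
    for line in lines:
        if line.startswith("ATOM"):
            number = int(line[22:26])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((number, xyz, line))
    return atoms

def distance(a, b):
    """Euclidean distance between two points given as (x, y, z)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def sphere(lines, center, cutoff):
    """Return the ATOM lines of every residue close enough to residue `center`."""
    atoms = parse_atoms(lines)
    center_xyz = [xyz for number, xyz, _ in atoms if number == center]
    selected = set()
    for number, xyz, _ in atoms:
        if any(distance(xyz, c) <= cutoff for c in center_xyz):
            selected.add(number)
    return [line for number, _, line in atoms if number in selected]
```

The central residue is always part of the result, since its own atoms are at distance zero from themselves.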