Genome BLAT by Jim Kent

BLAT News

Welcome to the first in an occasional series of newsletters about BLAT, Jim Kent's genome alignment software.

Note: All software mentioned in this newsletter is copyrighted. BLAT requires a license for commercial users.

BLAT Upgrade Blasts BLAST

If you've been wanting to switch from BLAST to BLAT but dreaded reprogramming your IT structure to deal with BLAT's different output format, here's some great news.

Jim has added options to the -out switch to let you generate output in BLAST, wuBLAST, BLASTz-associated axt, or MULTIz-associated maf formats. The switch works for both BLAT and gfClient. The syntax is as follows:

-out=type   Controls output file format.  Type is one of:

     psl - Default.  Tab separated format without actual sequence
     pslx - Tab separated format with sequence
     axt - blastz-associated axt format
     maf - multiz-associated maf format
     wublast - similar to wublast format
     blast - similar to NCBI blast format
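For readers who drive BLAT from scripts, here is a minimal Python sketch of assembling a command line with the -out switch. It only builds the argument list and does not run anything. The file names are hypothetical, and the positional order (database, query, output) is an assumption taken from BLAT's standard usage message rather than from this newsletter. gfClient takes different positional arguments and is not covered here.

```python
# Output types listed above for the -out switch.
OUT_TYPES = ("psl", "pslx", "axt", "maf", "wublast", "blast")


def blat_command(database, query, output, out_type="psl", program="blat"):
    """Return an argument list for a BLAT run with the given -out type.

    Assumes the positional order: database, query, output.
    """
    if out_type not in OUT_TYPES:
        raise ValueError(f"unknown -out type: {out_type!r}")
    return [program, database, query, f"-out={out_type}", output]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(blat_command("genome.fa", "rna.fa", "hits.blast", "blast")))
```

The list form can be handed directly to subprocess.run once the file names point at real data.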

Pretty Day: psl File Format Utilities

If you are using BLAT's native output format, psl, you might have noticed that its output is more machine-readable than human-readable. A new utility, pslPretty, lets you convert psl to a format that is easier on the eye.

Here's a sample piece of BLAT output in psl format:

psLayout version 3

match   mis-    rep.    N's     Q gap   Q gap   T gap   T gap   strand  Q               Q       Q       Q       T               T       T       T       block   blockSizes      qStarts  tStarts
        match   match           count   bases   count   bases           name            size    start   end     name            size    start   end     count
---------------------------------------------------------------------------------------------------------------------------------------------------------------
420     0       0       0       0       0       3       786     +       RNA1    420     0       420     GENOMIC1        7854    3436    4642    4       70,96,222,32,   0,70,166,388,   3436,3673,4264,4610,
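If you would rather read psl from a script, the record above can be split into its 21 columns. The following Python sketch does that for the sample line. The field names are labels chosen here to mirror the column headings and are not defined in this newsletter. The sketch splits on any whitespace so it works on the sample as printed as well as on tab-separated files.

```python
# Column labels chosen to mirror the psl header above (an assumption).
PSL_FIELDS = (
    "matches", "misMatches", "repMatches", "nCount",
    "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert",
    "strand", "qName", "qSize", "qStart", "qEnd",
    "tName", "tSize", "tStart", "tEnd",
    "blockCount", "blockSizes", "qStarts", "tStarts",
)
LIST_FIELDS = {"blockSizes", "qStarts", "tStarts"}
TEXT_FIELDS = {"strand", "qName", "tName"}


def parse_psl_line(line):
    """Parse one psl record into a dict keyed by column label."""
    values = line.split()
    if len(values) != len(PSL_FIELDS):
        raise ValueError(f"expected {len(PSL_FIELDS)} columns, got {len(values)}")
    record = {}
    for name, value in zip(PSL_FIELDS, values):
        if name in LIST_FIELDS:
            # Comma-separated lists carry a trailing comma.
            record[name] = [int(x) for x in value.rstrip(",").split(",")]
        elif name in TEXT_FIELDS:
            record[name] = value
        else:
            record[name] = int(value)
    return record


# The sample record from this newsletter, joined onto one line.
SAMPLE = ("420 0 0 0 0 0 3 786 + RNA1 420 0 420 GENOMIC1 7854 3436 4642 4 "
          "70,96,222,32, 0,70,166,388, 3436,3673,4264,4610,")

if __name__ == "__main__":
    rec = parse_psl_line(SAMPLE)
    print(rec["qName"], rec["tName"], rec["blockSizes"])
```

In the sample, the four block sizes add up to the 420 matching bases, which is a handy sanity check on any parser.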

Now here's the same piece of output after running through the pslPretty formatter:

>RNA1:0+420 GENOMIC1:3436+4642
agtggacaaccctggccaccccttcatcaagactgtgggcatggtggctggagatgagga
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
agtggacaaccctggccaccccttcatcaagactgtgggcatggtggctggagatgagga

gacctatgag------167------gtatttgctgaactgtttgaccctgtgatccaaga
||||||||||               |||||||||||||||||||||||||||||||||||
gacctatgaggtaggg...tttcaggtatttgctgaactgtttgaccctgtgatccaaga

gcggcataatggatatgaccccagaacaatgaagcacaccactgaccttgatgccagtaa
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
gcggcataatggatatgaccccagaacaatgaagcacaccactgaccttgatgccagtaa

a------495------attcgttctggctactttgatgagaggtatgtattgtcttcaag
|               ||||||||||||||||||||||||||||||||||||||||||||
agtgagc...cctcagattcgttctggctactttgatgagaggtatgtattgtcttcaag

agtcagaactggccgaagtatcaggggactcagtctccctccagcctgcactcgggcaga
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
agtcagaactggccgaagtatcaggggactcagtctccctccagcctgcactcgggcaga

gcgaagagaggtagaacgtgttgtggtggatgctctgagtggcctgaagggtgacctggc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
gcgaagagaggtagaacgtgttgtggtggatgctctgagtggcctgaagggtgacctggc

tggacggtactataggctcagtgagatgacggaggccgaacagcagcagcttattgat--
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
tggacggtactataggctcagtgagatgacggaggccgaacagcagcagcttattgatgt

----124------gaccattttctgtttgataaacctgtgtcccc
             ||||||||||||||||||||||||||||||||
gagg...cctcaggaccattttctgtttgataaacctgtgtcccc

OK, pretty may be a relative thing. This output looks best in a constant-width font, of course.
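The numbers shown inside the abbreviated inserts (167, 495, and 124) can be recovered from the psl record itself: each one is the distance between the end of one aligned block and the start of the next on the target. Here is a small Python sketch of that arithmetic using the sample's blockSizes, qStarts, and tStarts. The function name is made up for illustration, and the sketch assumes a plus-strand alignment like the one in the sample.

```python
def block_gaps(block_sizes, starts):
    """Return the unaligned bases between each pair of consecutive blocks."""
    gaps = []
    for i in range(len(starts) - 1):
        end_of_block = starts[i] + block_sizes[i]
        gaps.append(starts[i + 1] - end_of_block)
    return gaps


# Values copied from the sample psl record in this newsletter.
block_sizes = [70, 96, 222, 32]
q_starts = [0, 70, 166, 388]
t_starts = [3436, 3673, 4264, 4610]

if __name__ == "__main__":
    print(block_gaps(block_sizes, t_starts))  # [167, 495, 124]
    print(block_gaps(block_sizes, q_starts))  # [0, 0, 0]
```

The target gaps sum to 786, matching the "T gap bases" column, while the query has no gaps at all.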

Here is the usage information for pslPretty:

pslPretty - Convert PSL to human readable output
usage:
   pslPretty in.psl target.lst query.lst pretty.out
options:
   -axt - save in Scott Schwartz's axt format
   -dot=N Put out a dot every N records
   -long - Don't abbreviate long inserts

If the psl file contains multiple targets, it's a really good idea to sort it by target first. Otherwise pslPretty will be very, very slow. The target and query lists can each be a fasta file, a nib file, or a list of fasta and/or nib files, one per line. Currently pslPretty handles only nucleotide-based psl files.
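As a rough illustration of what "sorted by target" means, here is a Python sketch that reorders psl records by target name and then by target start. It is a stand-in for small files, not a description of how any of Jim's tools sort. It assumes tab-separated records with the target name in column 14 and the target start in column 16 (as in the header shown earlier), and that any header lines sit at the top of the file.

```python
def is_psl_record(line):
    """True for data lines: at least 21 tab-separated columns, first one numeric."""
    fields = line.rstrip("\n").split("\t")
    return len(fields) >= 21 and fields[0].isdigit()


def target_key(line):
    """Sort key: target name (column 14), then target start (column 16)."""
    fields = line.rstrip("\n").split("\t")
    return (fields[13], int(fields[15]))


def sort_psl_by_target(lines):
    """Return header lines unchanged, followed by records sorted by target."""
    header = [ln for ln in lines if not is_psl_record(ln)]
    records = sorted((ln for ln in lines if is_psl_record(ln)), key=target_key)
    return header + records


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file names): python sort_psl.py < in.psl > sorted.psl
    sys.stdout.writelines(sort_psl_by_target(sys.stdin.readlines()))
```

This reads the whole file into memory, so for whole-genome runs a disk-based sort is the better choice.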

In addition to pslPretty, BLAT users now have a concatenation tool called pslCat. Here's the usage information:

pslCat - concatenate psl files
usage:
   pslCat file(s)
options:
   -check parses input. Detects more errors but slower
   -nohead omit psl header
   -dir   files are directories (concatenate all in dirs)
   -out=file put output to file rather than stdout
   -ext=.xxx   limit files in directories to those with extension

Both pslPretty and pslCat are considered upgrades to BLAT, and as such they are automatically included in all current licenses!

In addition to pslPretty and pslCat, Jim has written three other utilities for working with psl files: pslSort, pslSortAcc, and pslReps. These copyrighted programs are available free of charge for your convenience. However, they are not included in the BLAT license and are therefore not formally supported. Following is the usage information for these handy utilities:

pslSort - merge and sort psCluster .psl output files
usage:
   pslSort dirs[1|2] outFile tempDir inDir(s)
This will sort all of the .psl files in the directories 
inDirs in two stages - first into temporary files in 
tempDir and second into outFile. The device on tempDir 
needs to have enough space (typically 15-20 gigabytes 
if processing whole genome)
   pslSort g2g[1|2] outFile tempDir inDir(s)
This will sort a genome to genome alignment, reflecting 
the alignments across the diagonal. Adding 1 or 2 after 
the dirs or g2g will limit the program to only the first 
or second pass respectively of the sort.

pslSortAcc - sort pslSort .psl output file by accession.
Make one output .psl file per accession.
usage:
   pslSortAcc how outDir tempDir inFile(s)
This will sort the inFiles by accession in two steps. 
Intermediate results will be put in tempDir. The final 
result (one .psl file per target) will be put in outDir. 
Both outDir and tempDir will be created if they do not 
already exist. The 'how' parameter should be either 
'head' or 'nohead'.

pslReps - analyse repeats and generate genome wide best 
alignments from a sorted set of local alignments
usage:
      pslReps in.psl out.psl out.psr
where in.psl is an alignment file generated by psLayout 
and sorted by pslSort, out.psl is the best alignment output 
and out.psr contains repeat info
options:
      -ignoreSize Will not weigh in favor of larger alignments so much
      -singleHit   Takes single best hit, not splitting into parts
      -minCover=0.N minimum coverage to output. Default is 0.
      -minAli=0.N minimum alignment ratio
                       default is 0.93
      -nearTop=0.N how much can deviate from top and be taken
                       default is 0.01
      -minNearTopSize=N   Minimum size of alignment that is near top
                       for alignment to be kept. Default 20.
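Since pslReps expects input that has already been through pslSort, the two are usually run back to back. The Python sketch below assembles both command lines from the usage messages above. It builds argument lists only and runs nothing. The function names and file names are hypothetical, and placing the options before the positional arguments is an assumption.

```python
SORT_MODES = ("dirs", "dirs1", "dirs2", "g2g", "g2g1", "g2g2")


def psl_sort_command(out_file, temp_dir, in_dirs, mode="dirs"):
    """Build: pslSort dirs[1|2] outFile tempDir inDir(s)."""
    if mode not in SORT_MODES:
        raise ValueError(f"unknown pslSort mode: {mode!r}")
    return ["pslSort", mode, out_file, temp_dir, *in_dirs]


def psl_reps_command(in_psl, out_psl, out_psr,
                     min_ali=None, near_top=None, single_hit=False):
    """Build: pslReps [options] in.psl out.psl out.psr."""
    cmd = ["pslReps"]
    if single_hit:
        cmd.append("-singleHit")
    if min_ali is not None:
        cmd.append(f"-minAli={min_ali}")
    if near_top is not None:
        cmd.append(f"-nearTop={near_top}")
    cmd += [in_psl, out_psl, out_psr]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(" ".join(psl_sort_command("all.psl", "tmp", ["run1", "run2"])))
    print(" ".join(psl_reps_command("all.psl", "best.psl", "best.psr", min_ali=0.95)))
```

Options left as None are simply omitted, so the programs fall back to the defaults listed in the usage message.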

Parasol: Ease and Comfort under the Sun (or any other cluster)

Parasol is the cluster management system for the University of California, Santa Cruz (UCSC) kilocluster, which runs the UCSC Genome Browser and most of the University's bioinformatics jobs, such as the human/mouse alignment project. Jim Kent wrote Parasol when he couldn't find a commercial program robust enough (and cost-effective enough) to support the needs of the cluster.

Parasol is available free of charge to any user. For information, go to the Parasol documentation web site.

Jim Kent: Upcoming Speaking Engagements and Conference Schedule

If you're hoping to run into Jim to chat about BLAT and other hot topics in bioinformatics, here is his upcoming speaking and conference schedule:

Want more info on BLAT?

Please fill out the form below to receive BLAT licensing information via email.


This web site maintained by Heidi Brumbaugh, Business Manager for BLAT.
We respect your privacy! Information submitted to this site will be held in confidence.