BLAT Genome Alignment Software

The World’s Fastest Alignment Tool

BLAT performs extremely fast mRNA/DNA alignments and cross-species protein alignments.

BLAT is hundreds of times faster—and more accurate—than other popular tools for mRNA/DNA alignment, and dozens of times faster for protein alignment at settings typical for vertebrates (see BLAT—The BLAST-Like Alignment Tool for details).

How BLAT Works

On DNA, BLAT works by keeping an index of an entire genome in memory. Thus, the target database of BLAT is not a set of GenBank sequences, but instead an index derived from the assembly of the entire genome. By default, the index consists of all non-overlapping 11-mers except for those heavily involved in repeats, and it uses less than a gigabyte of RAM. This small size means that BLAT runs efficiently on commodity hardware and is easily mirrored.
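
To make the indexing idea concrete, here is a minimal Python sketch of a non-overlapping k-mer index and a seed lookup. It is an illustration only, not BLAT's actual implementation: the function names `build_index` and `find_seeds` and the `max_hits` cutoff (a crude stand-in for excluding tiles heavily involved in repeats) are assumptions made for this example.

```python
from collections import defaultdict


def build_index(genome, k=11, max_hits=1000):
    """Map each non-overlapping k-mer (tile) of the genome to its start positions."""
    index = defaultdict(list)
    # Step through the target in strides of k so that tiles do not overlap.
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    # Drop tiles that occur too often (hypothetical repeat filter).
    return {kmer: hits for kmer, hits in index.items() if len(hits) <= max_hits}


def find_seeds(index, query, k=11):
    """Look up every overlapping k-mer of the query.

    Returns a list of (query_position, genome_position) seed pairs.
    """
    seeds = []
    for qpos in range(len(query) - k + 1):
        for tpos in index.get(query[qpos:qpos + k], ()):
            seeds.append((qpos, tpos))
    return seeds
```

Because only the target is tiled without overlap, the index holds roughly one entry per k bases of genome, while the query is scanned at every offset so that a matching tile is still found regardless of how the query lines up against the tile boundaries.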

On proteins, BLAT uses 4-mers rather than 11-mers. The protein index requires slightly more than 2 gigabytes of RAM. In practice—due to sequence divergence rates over evolutionary time—DNA BLAT works well within humans and primates, while protein BLAT continues to find good matches within terrestrial vertebrates and earlier organisms for conserved proteins.

BLAT is commonly used to look up the location of a sequence in the genome or to determine the exon structure of an mRNA molecule. Expert users can also run large batch jobs and adjust internal sensitivity parameters by installing command-line BLAT on their own Linux server. A license is required for commercial use.

Still using BLAST?

Is your BLAST farm thrashing and crashing? Nodes down and out of sync? Server hard disk full again? System administrators frazzled and grumpy? Still waiting for those ESTs to align, and now there's a new genome assembly?

BLAT is not BLAST.

BLAT will do in a day what BLAST can do in a month, in many cases on a single CPU. Other advantages include:

No queues—get a response in seconds

Submit a list of simultaneous queries in FASTA format

Launch the alignment later as part of a custom track

Easy-to-parse native output format

Also provides BLAST-compatible format

Five convenient output sort options

Alignment block details in natural genomic order

Splice sites are found without exon bleed-over

Link into the UCSC Genome Browser (licensed separately from UCSC)

Give your staff—and your server—a rest and switch to BLAT today!

Start now!

Use Cases for BLAT

DNA BLAT is designed to quickly find sequences of at least 25 bases with 95% or greater similarity. Protein BLAT finds sequences of at least 20 amino acids with 80% or greater similarity.

In practice, DNA BLAT works well on primates, and protein BLAT on land vertebrates. BLAT is particularly good at aligning mRNA and ESTs to the genome, and translated alignments between species.

Find the genomic coordinates of mRNA or protein within a given assembly

Determine the exon structure of a gene

Display a coding region within a full-length gene

Isolate an EST of special interest as its own track

Search for gene family members

Find human homologs of a query from another species

Find homologs of a query in all species

Align reads against the genome to find SNPs and other polymorphisms

Cluster together redundant protein, mRNA, or EST records from GenBank

Map annotations from one version of the human genome to another

Look for duplication in the genome that may cause cross-hybridization problems in experiments

Programs Included with BLAT

Source code and executables always included!

gfServer

A server that maintains an index of the genome in memory and uses the index to quickly find regions with high levels of sequence similarity to a query sequence.

gfClient

A program that queries gfServer over the network and does a detailed alignment of the query sequence with regions found by gfServer.

blat

Client and server combined into a single program, first building the index, then using the index, and then exiting.

webBlat

A web-based version of gfClient that presents the alignments in an interactive fashion.

pslSort

Combines and sorts the output of multiple BLAT runs. PSL is the default output format of blat.

pslReps

Selects the best alignments for a particular query sequence, using a 'near best in genome' approach.

pslPretty

Converts alignments from PSL format, which is a tab-delimited format that does not include the bases themselves, to a more readable alignment format.

pslCat

Concatenates PSL files.

faToTwoBit

Converts FASTA format sequence files to the dense, randomly accessible .2bit format that gfClient can use.

twoBitToFa

Converts from .2bit format back to FASTA format.

faToNib

Converts from FASTA to the somewhat less dense, randomly accessible .nib format that predated .2bit format. Note that each .nib file can contain only a single sequence.

nibFrag

Converts portions of a .nib file back to FASTA format.
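
Since several of the programs above read or write PSL, a short Python sketch of handling it may be useful. It assumes the standard 21-column, tab-delimited PSL layout and reads only a subset of the columns. The `PslRecord` class, the function names, and the `ratio` cutoff are hypothetical names chosen for this example; in particular, `near_best` is a deliberately simplified stand-in for the "near best in genome" idea and does not reproduce the scoring used by pslReps.

```python
from dataclasses import dataclass


@dataclass
class PslRecord:
    matches: int
    mismatches: int
    rep_matches: int
    strand: str
    q_name: str
    q_size: int
    q_start: int
    q_end: int
    t_name: str
    t_start: int
    t_end: int


def parse_psl(lines):
    """Parse PSL data rows, skipping the optional header and blank lines."""
    records = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Data rows have 21 columns and begin with the integer match count.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        records.append(PslRecord(
            matches=int(fields[0]),
            mismatches=int(fields[1]),
            rep_matches=int(fields[2]),
            strand=fields[8],
            q_name=fields[9],
            q_size=int(fields[10]),
            q_start=int(fields[11]),
            q_end=int(fields[12]),
            t_name=fields[13],
            t_start=int(fields[15]),
            t_end=int(fields[16]),
        ))
    return records


def near_best(records, ratio=0.99):
    """Keep, per query, the alignments scoring within `ratio` of that query's best.

    The score here is simply matches + rep_matches (an assumption for this sketch).
    """
    best = {}
    for rec in records:
        score = rec.matches + rec.rep_matches
        best[rec.q_name] = max(best.get(rec.q_name, 0), score)
    return [
        rec for rec in records
        if rec.matches + rec.rep_matches >= ratio * best[rec.q_name]
    ]
```

A typical use would be `near_best(parse_psl(open("out.psl")))`, where `out.psl` is a placeholder file name.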

In-Silico PCR

This bioinformatics tool searches a sequence database with a pair of PCR primers. It uses an indexing strategy to do this quickly. When the search is successful, the output is a FASTA format sequence file containing all the regions in the database that lie between the primer pair.
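
The core idea can be sketched in a few lines of Python. This is a simplified illustration, not the tool's implementation: it uses exact string search instead of an index, requires perfect primer matches, searches only the forward strand of the target, and returns products that include the primers themselves. The function names and the `max_size` limit are assumptions made for the example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def in_silico_pcr(target, forward, reverse, max_size=4000):
    """Find products bounded by the forward primer and the reverse primer.

    The reverse primer anneals to the opposite strand, so on the forward
    strand we look for its reverse complement downstream of the forward primer.
    """
    target_u = target.upper()
    fwd = forward.upper()
    rev_rc = reverse_complement(reverse.upper())
    products = []
    start = target_u.find(fwd)
    while start != -1:
        end = target_u.find(rev_rc, start + len(fwd))
        while end != -1:
            stop = end + len(rev_rc)
            if stop - start > max_size:
                break  # later matches would only give longer products
            products.append(target[start:stop])
            end = target_u.find(rev_rc, end + 1)
        start = target_u.find(fwd, start + 1)
    return products
```

Each returned string could then be written out as one FASTA record, matching the output described above.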

This tool includes webPCR, an interface similar to webBlat. You can try out webPCR here.

Genome Browser

On June 22, 2000, UCSC and the other members of the International Human Genome Project consortium completed the first working draft of the human genome assembly, forever ensuring free public access to the genome and the information it contains. A few weeks later, on July 7, the newly assembled genome was released on the web, along with the initial prototype of a graphical viewing tool, the UCSC Genome Browser. The Genome Browser displays annotations as a series of tracks on top of the genome.

In the ensuing years, the website has grown to include a broad collection of vertebrate and model organism assemblies and annotations, along with a large suite of tools for viewing, analyzing and downloading data. You can learn about the history of the Genome Browser on its history page and by watching this video.

Kent Informatics does not own the Genome Browser. This popular research tool is available for commercial license directly from UCSC.