A flood of data means that many of the challenges in biology are now challenges in computing. Bioinformatics, the application of computational techniques to analyse the information associated with biomolecules on a large scale, has now firmly established itself as a discipline in molecular biology, and encompasses a wide range of subject areas, from structural biology and genomics to gene expression studies.
The Bioinformatics Management System can be divided into the following modules.
A biological database is a large, organized body of persistent data, usually associated with computerized software designed to update, query, and retrieve components of the data stored within the system. A simple database might be a single file containing many records, each of which includes the same set of information. For example, a record associated with a nucleotide sequence database typically contains information such as contact name; the input sequence with a description of the type of molecule; the scientific name of the source organism from which it was isolated; and, often, literature citations associated with the sequence.
Coding of Sequence:-
Coding of a sequence means that when a sequence is entered in the database, it is assigned a unique ID. A sequence submitted to our site is tested in our laboratory for uniqueness; if it has never appeared in our database before, it is published on our site and given an accession number. Users can later use this accession number to retrieve the sequence for analysis.
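The accession scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the site's actual implementation: the class name `SequenceRegistry`, the `BIMS` prefix, and the in-memory dictionaries are hypothetical, and the laboratory uniqueness test is reduced to an exact-match lookup.

```python
class SequenceRegistry:
    """Toy in-memory store that gives each new sequence an accession number."""

    def __init__(self, prefix="BIMS"):  # hypothetical prefix, for illustration only
        self.prefix = prefix
        self._by_accession = {}
        self._by_sequence = {}

    def submit(self, sequence):
        """Return (accession, is_new); a known sequence keeps its accession."""
        # Normalise: drop whitespace and ignore case before comparing.
        key = "".join(sequence.split()).upper()
        if key in self._by_sequence:
            return self._by_sequence[key], False
        accession = f"{self.prefix}{len(self._by_accession) + 1:06d}"
        self._by_accession[accession] = key
        self._by_sequence[key] = accession
        return accession, True

    def fetch(self, accession):
        """Look a sequence up again by its accession number."""
        return self._by_accession[accession]
```

Submitting the same sequence twice returns the accession number issued the first time, so every stored sequence keeps exactly one identifier.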
Classification of Data:-
Nucleotide sequences
Genome Sequences
Protein Sequence (primary)
Protein Sequence (composite)
Protein Sequence (secondary)
Macromolecular Structures
Integrated Database
Design Algorithm:-
The analysis of biological sequence data relies on many algorithms, which form the basis of the analysis procedures.
Interchange Between Different Formats:- Different formats are used to represent a genetic sequence in the computer.
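As one illustration of format interchange, the sketch below reads records in the widely used FASTA text format (a header line starting with ">" followed by sequence lines) and rewrites them as one tab-separated line per record. The function names and the tab-separated target format are assumptions chosen for this example; the text does not say which formats the system converts between.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_tab(text):
    """Convert FASTA text to one 'header<TAB>sequence' line per record."""
    return "\n".join(f"{h}\t{s}" for h, s in parse_fasta(text))
```

Sequence lines that are wrapped across several lines in the input are joined back into a single string per record.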
Making Searching Algorithm:-
Searching for a sequence is a very common part of sequence analysis. There is a strong need for search algorithms that can match a query sequence against a database in less time.
Analysis on Data:-
Analysis of biological data includes alignment of sequences, searching for a sequence, mapping, protein analysis, and prediction of protein structure. The following analyses are performed on a sequence.
Alignment:- Sequence alignment is the procedure of comparing two or more sequences by searching for a series of individual characters or character patterns that are in the same order in the sequences. There are two ways in which we can match two sequences.
Models of Alignment:- The following are the models of alignment.
Pairwise Alignment:- In pairwise sequence alignment we take two sequences at a time and then align them by one of two methods.
Global Alignment:- In global alignment an attempt is made to align the entire sequences, using as many characters as possible, up to both ends of each sequence. Sequences that are quite similar and approximately the same length are suitable candidates for global alignment.
Local Alignment:- In local alignment the alignment stops at the ends of regions of identity or strong similarity.
End Free Space Alignment:- Two sequences of different lengths are taken for alignment, and the substrings of these sequences that have maximum similarity are considered; gaps at the ends of the sequences are not penalised.
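The difference between global and local alignment can be made concrete with a small dynamic-programming sketch. The scoring values (match +1, mismatch -1, gap -1) and the function name `alignment_score` are assumptions chosen for illustration; the global branch follows the Needleman-Wunsch recurrence and the local branch the Smith-Waterman recurrence. End free space alignment is not covered here.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-1):
    """Best alignment score of a and b by dynamic programming.

    local=False: global alignment, both sequences aligned end to end.
    local=True:  local alignment, best-scoring pair of substrings.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # In a global alignment, leading gaps are charged in full.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                       score[i - 1][j] + gap,       # gap in b
                       score[i][j - 1] + gap)       # gap in a
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]
```

With these scores, `alignment_score("AGTTAC", "ACTTAG")` gives 2 for the global alignment (four matches, two mismatches), while `local=True` gives 3, the score of the shared region TTA.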
Methods of Alignment:- The following are the methods of alignment.
Dot Matrix
Brute Force
Dynamic Programming
Heuristic Methods
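Of the methods listed above, the dot matrix is the simplest to illustrate. The sketch below marks every position where a character of the first sequence equals a character of the second; runs of marks along a diagonal indicate regions that align without gaps. The function name `dot_matrix` and the use of "*" and "." as marks are assumptions for this example.

```python
def dot_matrix(a, b):
    """Return one string per character of a: '*' where it equals b[j], else '.'."""
    return ["".join("*" if x == y else "." for y in b) for x in a]


if __name__ == "__main__":
    # Rows follow the first sequence, columns the second.
    for base, row in zip("AGTTAC", dot_matrix("AGTTAC", "ACTTAG")):
        print(base, row)
```

For the pair AGTTAC and ACTTAG, the three consecutive marks on the main diagonal (rows 3 to 5) correspond to the shared word TTA.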
Multiple Alignments:- In multiple alignment, three or more sequences are aligned. A multiple alignment of a set of sequences provides information on the most similar regions in the set. If the structure of one or more members of the alignment is known, it may be possible to predict which amino acids occupy the same spatial relationship in the other proteins in the alignment. The BLOSUM62 substitution matrix is commonly used to score protein alignments.
Database Sequence Searching:- We can retrieve any sequence by its accession number and use it in analysis. We can also search for sequences that are similar to a query sequence. Sequences can be searched with the following algorithms/methods.
BLAST Tool:-Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and identify library sequences that resemble the query sequence above a certain threshold.
BLAST searches for high scoring sequence alignments between the query sequence and sequences in the database using a heuristic approach that approximates the Smith-Waterman algorithm. The exhaustive Smith-Waterman approach is too slow for searching large genomic databases such as GenBank. Therefore, the BLAST algorithm uses a heuristic approach that is slightly less accurate than Smith-Waterman but over 50 times faster. The speed and relatively good accuracy of BLAST are the key technical innovation of the BLAST programs and arguably why the tool is the most popular bioinformatics search tool.
The BLAST algorithm can be conceptually divided into three stages.
In the first stage, BLAST searches for exact matches of a small fixed length W between the query and sequences in the database. For example, given the sequences AGTTAC and ACTTAG and a word length W = 3, BLAST would identify the matching substring TTA that is common to both sequences. By default, W = 11 for nucleic seeds.
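The first stage can be sketched as a word-index lookup. This is a simplified illustration of exact word matching as described above, not the BLAST implementation itself; the function name `find_seeds` and the tuple layout of the result are assumptions.

```python
from collections import defaultdict


def find_seeds(query, subject, w=3):
    """Return (query_pos, subject_pos, word) for each exact word match of length w."""
    # Index every word of length w in the query by its start position.
    index = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    # Scan the subject and report each word that also occurs in the query.
    seeds = []
    for j in range(len(subject) - w + 1):
        word = subject[j:j + w]
        for i in index.get(word, ()):
            seeds.append((i, j, word))
    return seeds
```

For the example in the text, `find_seeds("AGTTAC", "ACTTAG", w=3)` returns `[(2, 2, "TTA")]`: the word TTA starts at offset 2 (counting from 0) in both sequences.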
In the second stage, BLAST tries to extend the match in both directions, starting at the seed. The ungapped alignment process extends the initial seed match of length W in each direction in an attempt to boost the alignment score. Insertions and deletions are not considered during this stage. For our example, the ungapped alignment between the sequences AGTTAC and ACTTAG centered around the common word TTA would be:
..AGTTAC..
  | |||
..ACTTAG..
If a high-scoring ungapped alignment is found, the database sequence is passed on to the third stage.
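The ungapped extension of the second stage can be sketched as follows. The scores (match +1, mismatch -1), the drop-off threshold `x_drop`, and the function names are assumptions for illustration; the sketch stops extending in a direction once the running score falls more than `x_drop` below the best score seen, and it keeps only the part of the extension that improved the score.

```python
def _extend(query, subject, qi, si, step, match, mismatch, x_drop):
    """Walk in one direction; return (best score gain, positions kept)."""
    best = gain = 0
    kept = moved = 0
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        gain += match if query[qi] == subject[si] else mismatch
        moved += 1
        if gain > best:
            best, kept = gain, moved
        elif best - gain > x_drop:
            break  # score dropped too far below the best; give up
        qi += step
        si += step
    return best, kept


def extend_ungapped(query, subject, q_pos, s_pos, w,
                    match=1, mismatch=-1, x_drop=2):
    """Extend an exact seed of length w; return (score, q_start, q_end)."""
    seed_score = w * match  # the seed is assumed to be an exact match
    right, r_len = _extend(query, subject, q_pos + w, s_pos + w, 1,
                           match, mismatch, x_drop)
    left, l_len = _extend(query, subject, q_pos - 1, s_pos - 1, -1,
                          match, mismatch, x_drop)
    return seed_score + left + right, q_pos - l_len, q_pos + w + r_len
```

For AGTTAC and ACTTAG with the seed TTA at offset 2, the call `extend_ungapped("AGTTAC", "ACTTAG", 2, 2, 3)` returns `(3, 2, 5)`: under these scores the flanking match A is cancelled by the mismatch next to the seed, so the extension adds nothing to the seed score.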
In the third stage, BLAST performs a gapped alignment between the query sequence and the database sequence using a variation of the Smith-Waterman algorithm. Statistically significant alignments are then displayed to the user.
FASTA:- FASTA extended the earlier FASTP protein-search program, adding the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye", and stands for "FAST-All", because it works with any alphabet; it is an extension of the "FAST-P" (protein) and "FAST-N" (nucleotide) alignment programs.
FASTA takes a given nucleotide or amino-acid sequence and searches a corresponding sequence database by using local sequence alignment to find matches of similar database sequences.
The FASTA program follows a largely heuristic method which contributes to the high speed of its execution. It initially observes the pattern of word hits, word-to-word matches of a given length, and marks potential matches before performing a more time-consuming optimized search using a Smith-Waterman type of algorithm. The size taken for a word, given by the parameter ktup, controls the sensitivity and speed of the program.
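The word-hit step can be illustrated by counting ktup-length matches per diagonal, where a diagonal is the offset between the position of a word in the database sequence and its position in the query. Diagonals with many hits mark candidate regions for the slower optimized alignment. This is a simplified sketch, not the FASTA program's code; the function name `diagonal_hits` is an assumption.

```python
from collections import Counter


def diagonal_hits(query, subject, ktup=2):
    """Count exact ktup-word matches per diagonal (subject_pos - query_pos)."""
    # Record where each word of length ktup occurs in the query.
    positions = {}
    for i in range(len(query) - ktup + 1):
        positions.setdefault(query[i:i + ktup], []).append(i)
    # Each shared word adds one hit to the diagonal it lies on.
    hits = Counter()
    for j in range(len(subject) - ktup + 1):
        for i in positions.get(subject[j:j + ktup], ()):
            hits[j - i] += 1
    return hits
```

A smaller ktup finds more word hits and is therefore more sensitive but slower; a larger ktup is faster but can miss weaker similarities. For AGTTAC and ACTTAG with ktup = 2, diagonal 0 collects two hits (TT and TA), while diagonals -4 and 4 collect one each.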