The program MatInd constructs a description for a consensus (e.g. of a transcription factor binding site) which consists of
MatInd employs an alignment algorithm based on the method described by Frech
et al. and creates the nucleotide distribution matrix by counting the
bases at each position of the alignment.
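The counting step can be illustrated with a minimal sketch. It assumes the sites are already aligned to equal length and that gaps are written as `-`. The function name `count_matrix` and the example sites are hypothetical, and the alignment algorithm of Frech et al. is not reproduced here.

```python
from collections import Counter

ALPHABET = "ACGT-"  # four nucleotides plus the gap symbol (assumed notation)


def count_matrix(aligned_sites):
    """Return one {symbol: count} dict per alignment column."""
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all aligned sites must have the same length")
    matrix = []
    for column in zip(*aligned_sites):
        counts = Counter(column)
        matrix.append({symbol: counts.get(symbol, 0) for symbol in ALPHABET})
    return matrix


# Hypothetical aligned binding sites, for illustration only.
sites = ["TGACTCA", "TGAGTCA", "TGACTAA", "TGA-TCA"]
matrix = count_matrix(sites)
for position, column in enumerate(matrix):
    print(position, column)
```

Each column of the result holds the absolute counts; dividing by the number of sites gives the relative frequencies used in the Ci calculation.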
The Ci-vector is constructed by calculating the Ci-value for each position
i of the matrix:
[1] Ci(i) = (100 / ln5) * ( sum(P(i,b) * ln P(i,b)) + ln5)
where
This Ci-vector represents the conservation of the individual nucleotide positions in the matrix in numerical values and is used by MatInspector:
Ci value | Meaning |
---|---|
Ci=100 | a position with total conservation of one nucleotide |
Ci=0 | a position with equal distribution of all four nucleotides and gaps |
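Formula [1] and its two boundary cases can be checked with a short sketch. It assumes that P(i,b) is the relative frequency of symbol b (one of A, C, G, T or gap) at position i, which is consistent with the ln5 term, and that a term with P(i,b) = 0 contributes nothing to the sum. The function name `ci_value` is hypothetical.

```python
import math


def ci_value(counts):
    """Ci for one matrix position, given counts for A, C, G, T and gap."""
    total = sum(counts.values())
    weighted_log_sum = 0.0
    for count in counts.values():
        if count > 0:  # assumption: 0 * ln 0 is treated as 0
            p = count / total
            weighted_log_sum += p * math.log(p)
    return (100.0 / math.log(5)) * (weighted_log_sum + math.log(5))


conserved = {"A": 0, "C": 0, "G": 0, "T": 4, "-": 0}
uniform = {"A": 1, "C": 1, "G": 1, "T": 1, "-": 1}

print(round(ci_value(conserved), 6))  # total conservation, close to 100
print(round(ci_value(uniform), 6))    # equal distribution, close to 0
```

Applying `ci_value` to every position of the matrix yields the Ci-vector.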
MatInd also defines a core region within the matrix which is represented by the four consecutive nucleotide positions with the highest Ci-sum. This core region of the matrix is used by MatInspector to preselect potential matches.
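Selecting the core region amounts to a sliding-window maximum over the Ci-vector. The sketch below is one possible way to do this; the function name `find_core_start`, the example Ci values, and the tie-breaking rule (the leftmost window wins) are assumptions, not documented MatInd behavior.

```python
def find_core_start(ci_vector, core_length=4):
    """Return the start index of the window with the highest Ci-sum."""
    if len(ci_vector) < core_length:
        raise ValueError("matrix is shorter than the core length")
    best_start = 0
    best_sum = sum(ci_vector[:core_length])
    for start in range(1, len(ci_vector) - core_length + 1):
        window_sum = sum(ci_vector[start:start + core_length])
        if window_sum > best_sum:  # strict comparison keeps the leftmost window on ties
            best_start, best_sum = start, window_sum
    return best_start


# Hypothetical Ci-vector for a seven-position matrix.
ci_vector = [40, 100, 100, 65, 100, 30, 100]
start = find_core_start(ci_vector)
print(start, ci_vector[start:start + 4])
```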
MatInspector's large library (>600) of transcription factor binding site
matrices was created with MatInd and has been compiled on the basis of published
matrices with emphasis on sequences with experimentally verified binding capacity.
The MatInspector library also includes information on
MatInspector uses
to scan sequences of unlimited length for matches to the consensus matrix description.
The core similarity is calculated for each position of the sequence:
[2] core_sim = (sum( score(b,j))) / (sum(max_score(j)))
where
[3] mat_sim = (sum(Ci(j)*score(b,j)))/(sum(Ci(j)*max_score(j)))
where
The matrix similarity reaches 1 only if the candidate sequence corresponds to the most conserved nucleotide at each position of the matrix.
Multiplying each score by the Ci-value reflects the fact that mismatches at less conserved positions are more easily tolerated than mismatches at highly conserved positions.
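Formulas [2] and [3] can be sketched as follows. The sketch assumes that score(b,j) is the matrix value for nucleotide b at position j, that max_score(j) is the largest matrix value at position j, that the sum in [2] runs over the core positions, and that the sum in [3] runs over all matrix positions. The toy matrix, the rounded Ci values, and the function names are hypothetical.

```python
def core_similarity(matrix, candidate, core_start, core_length=4):
    """Formula [2], summed over the core positions only (assumed)."""
    positions = range(core_start, core_start + core_length)
    achieved = sum(matrix[j].get(candidate[j], 0) for j in positions)
    best = sum(max(matrix[j].values()) for j in positions)
    return achieved / best


def matrix_similarity(matrix, ci_vector, candidate):
    """Formula [3], summed over all matrix positions (assumed)."""
    achieved = sum(ci * column.get(base, 0)
                   for ci, column, base in zip(ci_vector, matrix, candidate))
    best = sum(ci * max(column.values())
               for ci, column in zip(ci_vector, matrix))
    return achieved / best


# Hypothetical five-position count matrix with illustrative Ci values.
matrix = [
    {"A": 2, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 4},
    {"A": 0, "C": 0, "G": 4, "T": 0},
    {"A": 4, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 3, "G": 1, "T": 0},
]
ci_vector = [35, 100, 100, 100, 65]

perfect = matrix_similarity(matrix, ci_vector, "ATGAC")
weak_mismatch = matrix_similarity(matrix, ci_vector, "TTGAC")    # mismatch at position 0
strong_mismatch = matrix_similarity(matrix, ci_vector, "AAGAC")  # mismatch at position 1
print(perfect, weak_mismatch, strong_mismatch)
print(core_similarity(matrix, "ATGAC", core_start=1))
```

In this toy example the mismatch at the weakly conserved first position lowers the matrix similarity less than the mismatch at the fully conserved second position.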
The output of MatInspector consists of those matches that reach the user-defined minimum core similarity and minimum matrix similarity. Optionally, the optimized matrix threshold for each matrix can be used as the cut-off criterion.
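The two-stage filtering can be sketched as a simple scan: the core similarity acts as a preselection, and the matrix similarity is computed only for windows that pass it. This is an illustration under the same assumptions about score(b,j) and max_score(j) as stated for formulas [2] and [3]; it scans one strand only, and the function name `scan`, the toy matrix, and the threshold values are hypothetical rather than MatInspector defaults.

```python
def scan(sequence, matrix, ci_vector, core_start,
         min_core_sim, min_matrix_sim, core_length=4):
    """Return (offset, window, core_sim, mat_sim) for each reported match."""
    width = len(matrix)
    max_scores = [max(column.values()) for column in matrix]
    core = range(core_start, core_start + core_length)
    core_best = sum(max_scores[j] for j in core)
    matrix_best = sum(ci * m for ci, m in zip(ci_vector, max_scores))
    matches = []
    for offset in range(len(sequence) - width + 1):
        window = sequence[offset:offset + width]
        # Stage 1: preselect on the core region.
        core_sim = sum(matrix[j].get(window[j], 0) for j in core) / core_best
        if core_sim < min_core_sim:
            continue
        # Stage 2: Ci-weighted similarity over the whole matrix.
        mat_sim = sum(ci * column.get(base, 0)
                      for ci, column, base in zip(ci_vector, matrix, window)) / matrix_best
        if mat_sim >= min_matrix_sim:
            matches.append((offset, window, core_sim, mat_sim))
    return matches


# Hypothetical five-position matrix, Ci values, and test sequence.
matrix = [
    {"A": 2, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 4},
    {"A": 0, "C": 0, "G": 4, "T": 0},
    {"A": 4, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 3, "G": 1, "T": 0},
]
ci_vector = [35, 100, 100, 100, 65]
sequence = "CCATGACGGTTGACAA"

matches = scan(sequence, matrix, ci_vector, core_start=1,
               min_core_sim=0.75, min_matrix_sim=0.80)
for offset, window, core_sim, mat_sim in matches:
    print(offset, window, round(core_sim, 3), round(mat_sim, 3))
```

Raising `min_matrix_sim` drops the weaker of the two matches in this example, which mirrors how a stricter matrix threshold reduces the number of reported hits.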
For further reading please refer to the MatInspector publications.
© 2021 Precigen Bioinformatics Germany GmbH - All rights reserved