www.cab.zju.edu.cn/cab/xueyuanxiashubumen/nx/bioinplant.htm 《生物信息学札记》 (Notes on Bioinformatics) 樊龙江 (Fan Longjiang)

See Expressed Sequence Tag.

Expect value (E) (E值)
E value. The number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance. The lower the E value, the more significant the score. In a database similarity search, the probability that an alignment score as good as the one found between a query sequence and a database sequence would be found in as many comparisons between random sequences as was done to find the matching sequence. In other types of sequence analysis, E has a similar meaning.

Expectation maximization (sequence analysis)
An algorithm for locating similar sequence patterns in a set of sequences.
A guessed alignment of the sequences is first used to generate an expected scoring matrix representing the distribution of sequence characters in each column of the alignment. This pattern is then matched to each sequence, and the scoring matrix values are updated to maximize the alignment of the matrix to the sequences. The procedure is repeated until there is no further improvement.

Exon (外显子)
Coding region of DNA. See CDS.

Expressed Sequence Tag (EST) (表达序列标签)
Randomly selected, partial cDNA sequence; represents its corresponding mRNA. dbEST is a large database of ESTs at GenBank, NCBI.

FASTA (一种主要数据库搜索程序)
The first widely used algorithm for database similarity searching. The program looks for optimal local alignments by scanning the sequence for small matches called "words". Initially, the scores of segments in which there are multiple word hits are calculated ("init1"). Later, the scores of several segments may be summed to generate an "initn" score. An optimized alignment that includes gaps is shown in the output as "opt". The sensitivity and speed of the search are inversely related and are controlled by the "k-tup" variable, which specifies the size of a "word" (Pearson and Lipman).

Extreme value distribution (极值分布)
Some measurements are found to follow a distribution that has a long tail which decays at high values much more slowly than that found in a normal distribution. This slow-falling type is called the extreme value distribution. The alignment scores between unrelated or random sequences are an example. These scores can reach very high values, particularly when a large number of comparisons are made, as in a database similarity search. The probability of a particular score may be accurately predicted by the extreme value distribution, which follows a double negative exponential function, after Gumbel.

False negative (假阴性)
A negative result in a data set that was reported incorrectly, because the test failed to detect a case that is actually positive.
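
As a rough illustration of the Expect value and Extreme value distribution entries, the sketch below uses the Karlin-Altschul form E = K * m * n * exp(-lambda * S), with the matching extreme value tail P = 1 - exp(-E). This formula comes from BLAST-style statistics and is not stated in the glossary itself. The function names and the default values of `lam` and `k` are assumptions chosen for illustration; real values depend on the scoring matrix and gap penalties in use.

```python
import math


def expect_value(score, query_len, db_len, lam=0.267, k=0.041):
    """Expected number of chance alignments scoring at least `score`.

    lam and k are illustrative defaults (assumed), not fitted values.
    """
    return k * query_len * db_len * math.exp(-lam * score)


def p_value(e_value):
    """Probability of at least one chance alignment, given the E value.

    This is the double (negative) exponential tail of the extreme value
    distribution: P = 1 - exp(-E). expm1 keeps precision for small E.
    """
    return -math.expm1(-e_value)


if __name__ == "__main__":
    # Hypothetical search: 250-residue query against 10 million residues.
    for s in (40, 60, 80):
        e = expect_value(s, 250, 10_000_000)
        print(s, e, p_value(e))
```

Under these assumptions, E falls exponentially as the score rises and grows in proportion to the database size, which is why the same score is less significant in a larger search. For small E, the P value and the E value are nearly equal.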
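
The word-matching step in the FASTA entry can be sketched in a similar way. The code below is a simplified illustration, not the FASTA program: it indexes every word of length `ktup` in the query, scans the subject sequence for the same words, and counts hits per diagonal (query position minus subject position). Diagonals with many hits correspond to the segments that FASTA would go on to score. The function names `word_positions` and `diagonal_hits` are hypothetical.

```python
from collections import Counter, defaultdict


def word_positions(seq, ktup):
    """Map each word of length ktup to the positions where it starts."""
    table = defaultdict(list)
    for i in range(len(seq) - ktup + 1):
        table[seq[i:i + ktup]].append(i)
    return table


def diagonal_hits(query, subject, ktup=2):
    """Count shared words on each diagonal (query pos - subject pos)."""
    table = word_positions(query, ktup)
    hits = Counter()
    for j in range(len(subject) - ktup + 1):
        # table.get avoids adding empty entries for unseen words.
        for i in table.get(subject[j:j + ktup], ()):
            hits[i - j] += 1
    return hits


if __name__ == "__main__":
    # Toy sequences (assumed); the subject has a 2-residue prefix.
    print(diagonal_hits("ABCDE", "XXABCDE", ktup=2))
    print(diagonal_hits("ABCDE", "XXABCDE", ktup=4))
```

A larger `ktup` yields fewer word hits and a faster scan but can miss weaker similarities, which is the inverse relation between speed and sensitivity that the entry describes.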