
                               UPDATE (part 2)
                               ~~~~~~~~~~~~~~~
                               
         The main changes since publication of the manual for PEPI 
         version 2 are described below.  These apply to CASECONT, 
         COMBINE, CONFINT, DIFFER, KAPPA, MANNWHIT, MANTELX, MATCHED, 
         PAIRS, PVALUE, RANDOM, RATES1, RATES2, SCRN, SEASONAL and WHATS. 
         
         LOGISTIK and LOGX are described in part 1 (UPDATE1.TXT)
         
         References are listed in part 3 (UPDATE3.TXT).

         ----------------------------------------------------------------

         CASECONT
         ~~~~~~~~ 
         Options for the DerSimonian-Laird procedure and for computing the 
         exact power of exact tests have been added. 
         
         Version 2.01 adds the DerSimonian-Laird procedure (DerSimonian 
         and Laird 1986) for computing an overall odds ratio based on 
         stratified data, using a random-effects model.  This may be 
         helpful in meta-analyses based on studies with heterogeneous 
         results (Fleiss and Gross 1991, Petitti 1994, Whitehead and 
         Whitehead 1991).  Unlike the usual fixed-effect model (Mantel-
         Haenszel procedure, etc.), which assumes that the studies provide 
         estimates of the same true effect, the random-effects model 
         assumes that they estimate randomly differing effects.  The 
         variation between studies is taken into account, resulting in 
         wider confidence intervals and a more conservative significance 
         test.  The procedure may be inappropriate if sample sizes are 
         very small.  
         
         Version 2.04 adds an option for computing the exact power of 
         exact Fisher and mid-P tests for comparing two independent 
         binomial distributions.  It computes conditional and expected 
         power for these tests.  Conditional power is dependent on all 
         four marginal totals, and may be appropriate when these are 
         known.  Expected power (Casagrande, Pike and Smith 1978), also 
         termed overall, average or unconditional power (Bennett and Hsu 
         1960, Hirji et al. 1994), is appropriate when a study is being 
         designed; it requires knowledge of the sizes of the two samples, 
         but allows for the probability of different numbers with the 
         attribute under study.  The program can also be used to determine 
         the sample sizes required for an exact test, by entering 
         different sample sizes in a series of trial-and-error estimations 
         of expected power (Hirji et al. 1994). 
                                                  
         When computing power, the program uses the 2x2 table that is 
         entered (see Manual, p. 33) to determine the sample sizes and (if 
         conditional power is required) the other marginal totals. For 
         expected power, the table must provide correct sample sizes; for 
         conditional power, it must also provide correct totals with and 
         without the attribute; the entries in the body of the table, 
         however, can be fictional.  The odds ratio that it is wished to 
         detect (which is ad/bc) can be entered or (optionally) determined 
         from the entries in the body of the table.             
         
         Power computation is based on the sizes of the samples, the 
         significance level (alpha) and the odds ratio that it is wished 
         to detect.  Power is computed for a two-tailed test, where P = 
         twice the smaller one-tailed P; to compute the power of a one-
         tailed test with alpha = 0.05, an alpha of 0.1 must be entered.  
         
         Conditional power is dependent on the sample sizes, alpha, the 
         odds ratio, and the total number with the attribute ("exposed") 
         under study.  The odds ratio may be more or less than 1.  The 
         program reports the actual alpha to which each power calculation 
         pertains; this may be considerably less than the nominal alpha, 
         because the distribution is stepwise (Bennett and Hsu 1960, 
         Casagrande et al. 1978a).  The critical number of "exposed" cases 
         or controls is reported; if this is decreased by 1, P exceeds the 
         nominal alpha.  
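
          As an illustration of the conditional-power calculation 
          described above, the following Python sketch (not CASECONT's 
          own code; all names are assumptions) enumerates the 
          conditional distribution of the number of "exposed" cases, 
          takes two-tailed P as twice the smaller one-tailed P, and 
          sums the noncentral probabilities of the rejected outcomes: 

```python
from math import comb

def conditional_power(n1, n2, m, psi, alpha=0.05, midp=False):
    """Conditional power of the two-tailed Fisher (or mid-P) test for
    comparing two binomials, given the sample sizes n1 and n2 and the
    total number 'exposed' m; psi is the odds ratio to be detected.
    Two-tailed P is taken as twice the smaller one-tailed P."""
    support = range(max(0, m - n2), min(n1, m) + 1)
    w = {a: comb(n1, a) * comb(n2, m - a) for a in support}  # null weights
    total = sum(w.values())
    norm = sum(w[a] * psi ** a for a in support)  # noncentral normalizer
    power = 0.0
    for a0 in support:
        upper = sum(w[a] for a in support if a >= a0)  # one-tailed sums
        lower = sum(w[a] for a in support if a <= a0)
        if midp:
            upper -= w[a0] / 2.0
            lower -= w[a0] / 2.0
        if 2.0 * min(upper, lower) / total <= alpha:   # rejected outcome
            power += w[a0] * psi ** a0 / norm
    return power
```

          Setting psi to 1 returns the actual ("true") type I error, 
          and, as noted above, entering an alpha of 0.1 gives the power 
          of a one-tailed test at the 0.05 level. 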

         Expected power is computed from the sample sizes, alpha, the  
         odds ratio to be detected, and the assumed proportion with the 
         attribute ("exposed") in the population represented by the second 
         sample ("controls").  The latter proportion must be entered.  The 
         program cannot compute expected power if the odds ratio is <1, 
         i.e., if the percentage of "exposed" is assumed to be higher in 
         the "controls" than in the "cases". It may be necessary to 
         transpose and re-label the "cases" and "controls" for this 
         purpose. 
                                         
         If an odds ratio of 1 is entered, the computed power is the 
         "true" type I error (Casagrande et al. 1978b). 
         
         The computation of exact probabilities and exact confidence 
         intervals for odds ratios for stratified data is now much faster, 
         because of the employment (Version 2.02) of a more efficient 
         algorithm for calculating the coefficients of the conditional 
         distribution (Martin and Austin 1991, 1996), using code from 
         David O. Martin's public-domain EXACTBB program. 

         An error in the display of the single-stratum odds ratio on the 
         screen showing exact intervals was corrected in Version 2.06. 
         
         Formulae: 
         
          The DerSimonian-Laird procedure described by Fleiss and Gross 
         (1991) is used.  For this purpose the log odds ratio for each 
         stratum, and its standard error, are computed after adding 0.5 to 
         each cell in the 2x2 table (Fleiss 1981: 165-166); the Q 
         statistic, which plays a central role in the analysis, is based 
          on these results and (as suggested by Petitti (1994: 111-113)) 
          the Mantel-Haenszel estimate of the common odds ratio; 90, 95 and 99% 
         confidence intervals are displayed; chi-square is sqr(DL/se), 
         where DL is the DerSimonian-Laird estimator and se is its 
         standard error.  If Q + 1 is less than the number of strata the 
         random-effects and fixed-effect models yield identical results. 
         
         Formulae for conditional power are provided by Casagrande, Pike 
         and Smith (1978) and Andres and Tejedor (1995, formula 6), and 
         for expected power by Casagrande, Pike and Smith (1978) and 
         Bennett and Hsu (1960).  The accuracy of expected power 
         computations has been checked against a program (Hirji et al. 
         1994) kindly provided by Prof. S.E. Vollset. 

         COMBINE
         ~~~~~~~
         Version 2.07 incorporates POOLING, previously a separate program 
         for combining probabilities from independent tests (see the 
         manual). 

         Version 2.01 provides an option for the entry of 90, 95 or 99% 
         confidence limits of measures to be combined, instead of the 
         standard errors of the measure or its log. The standard error is 
         computed from the confidence interval. 
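
          The back-calculation is simple: a symmetric 90, 95 or 99% 
          interval spans 2z standard errors, so the standard error is 
          the width of the interval divided by 2z (computed on the log 
          scale for ratio measures).  A minimal Python sketch, not the 
          program's own code (the names and z values are assumptions): 

```python
import math

# z values for the three interval widths the program accepts
Z = {90: 1.6449, 95: 1.9600, 99: 2.5758}

def se_from_ci(lower, upper, level=95, ratio=False):
    """Recover a standard error from a symmetric confidence interval.
    For ratio measures the interval is symmetric on the log scale,
    so the SE of the log is returned."""
    if ratio:
        lower, upper = math.log(lower), math.log(upper)
    return (upper - lower) / (2.0 * Z[level])
```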
         
         Version 2.01 also adds the DerSimonian-Laird procedure 
         (DerSimonian and Laird 1986) for computing an overall measure of 
         association, using a random-effects model.  The measure may be a 
         difference between rates, proportions or means (or some other 
         measure with an approximately normal distribution and a value of 
         zero when there is no association) or an odds or rate ratio (or 
         some other measure whose log has an approximately normal 
         distribution and a value of zero when there is no association).  
         The standard errors of the measures or their logs, or the 
         confidence intervals of the measures, must be entered.  
         
         See the description of CASECONT for a note on the use of the 
         DerSimonian-Laird procedure.
         
         An option for the computation and combination of effect sizes 
         (standardized differences between two means) was added in version 
         2.01.  The effect size is defined as the difference between the 
         means, divided by the pooled standard deviation (which the 
         program computes from the sample sizes and standard deviations).  
         Being "unitless", this measure of association has been used for 
         combining the results of studies that use different measurement 
         scales; but, as pointed out by Greenland (1987), Petitti (1994: 
         123) and others, the measure may be misleading.  
         
         As a rough guide to the meaning of the overall effect size, the 
         program displays the approximate percentage of members of one 
         group who have values below the mean of the other group (Glass et 
         al. 1981: 29), based on the assumption that the population 
         variances in the two groups are equal.  An odds ratio (added in 
         Version 2.02) is computed as an additional aid to the 
         interpretation of the effect size.  This expresses the accuracy 
         with which individuals might be allocated to the two groups on 
         the basis of likelihood ratios derived from a comparison of the 
         distributions, using an arbitrary cutting point.  It is the ratio 
         of the odds in favour of correctly classifying members of either 
         one of the groups to the odds in favour of incorrectly 
         classifying members of the other group.  
                                                          
         Formulae: 
         
          The random-effects procedure of DerSimonian and Laird (1986) is 
         used.  Zero odds and rate ratios are changed to 0.00001. The Q 
         statistic, which plays a central role in the analysis, is based 
         on the data for separate strata and the precision-based estimate 
         of the common measure.  If Q + 1 is less than the number of 
         strata the random-effects and fixed-effect models yield identical 
         results.  Results may differ slightly from those provided by 
         CASECONT, RATES1 and RATES2, which use the Mantel-Haenszel 
         estimate of the common measure, not the precision-based estimate. 
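
          As a rough illustration of the procedure (not COMBINE's own 
          code), the following Python sketch pools estimates (e.g. log 
          odds ratios) and their standard errors, computing Q from the 
          precision-based pooled estimate as described above; all names 
          are assumptions: 

```python
import math

def dersimonian_laird(estimates, ses, z=1.96):
    """Pool estimates (on the log scale for ratio measures) with their
    standard errors, using the DerSimonian-Laird random-effects model."""
    w = [1.0 / se ** 2 for se in ses]          # fixed-effect weights
    k = len(estimates)
    ybar = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, estimates))
    # Between-study variance; zero when Q < k - 1, so the random- and
    # fixed-effect results then coincide, as noted above.
    tau2 = max(0.0, (q - (k - 1)) /
               (sum(w) - sum(wi ** 2 for wi in w) / sum(w)))
    wstar = [1.0 / (se ** 2 + tau2) for se in ses]
    dl = sum(wi * yi for wi, yi in zip(wstar, estimates)) / sum(wstar)
    se_dl = math.sqrt(1.0 / sum(wstar))
    return dl, dl - z * se_dl, dl + z * se_dl
```

          For ratio measures the inputs are logs, and exponentiating 
          the three results gives the pooled ratio and its interval. 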
        
         The analysis of effect sizes follows the lines described by 
          Petitti (1994: 119-123).  The pooled standard deviations 
         (assuming equal population variances) and the approximate 
         variances of the effect sizes are computed by formulae from 
         Hedges and Olkin (1985: 79, 80).  The formula for the odds ratio 
         (based on Tritchler 1995) is: 
                        Odds ratio = sqr[a / (1 - a)]
         where a = 1 - P
         P = one-tailed probability for Z (standardized normal deviate)
         Z = |Effect size| / 2
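
          Under these definitions (sqr denoting the square), the odds 
          ratio can be computed with the standard normal CDF alone.  A 
          Python sketch, with assumed names: 

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def effect_size_odds_ratio(effect_size):
    """Tritchler (1995): OR = [a / (1 - a)]^2, where a = 1 - P and
    P is the one-tailed probability for Z = |effect size| / 2."""
    a = phi(abs(effect_size) / 2.0)   # a = 1 - P
    return (a / (1.0 - a)) ** 2
```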

         CONFINT
         ~~~~~~~
         The confidence intervals computed by Zar's formulae (Manual, pp. 
         60-61) based on the F distribution (Brownlee 1965) may now be 
         regarded as exact (Fisher's) intervals, because of the use in 
         Version 2.01 of a more accurate inverse F distribution function.  
         These formulae are now used for denominators exceeding 30,000 in 
         Options A and C and (if the sum of the two numbers entered 
         exceeds 50) in Options E and G; these options can now handle very 
         large numbers.  Zar's formulae can also be selected as an option 
         for smaller denominators, by pressing 'F' after choosing Option A 
         or C, and (in Version 2.06) for confidence intervals for ratios 
         of person-time rates, by pressing 'F' after choosing option M. 
         
         Approximate mid-P intervals for proportions and rates (Options A 
         and C) are now computed (Version 2.03) in those instances where 
          exact mid-P intervals are not provided - i.e., for denominators of 
         over 30,000 or (if the numerator is zero) over 15,000. 

         The accuracy of exact Fisher's and mid-P intervals computed for 
         very large proportions (in options A and C) or ratios of person-
         time rates (in option M) in very large samples, using procedures 
         from XLIM (by A. Ray Simons), was enhanced in version 2.06, by 
         incorporating improvements made in version SP 2.5 of XLIM.  
         
         Formulae:
         
         The program uses Vollset's formulae for "a closed form 
         approximation to the mid-P exact interval" (Vollset 1993). The 
         formulae for proportion x/N are: 
                  Lower limit[x] = (LF[x] + LF[x+1]) / 2 
                  Upper limit[x] = (UF[x] + UF[x-1]) / 2 
         where LF and UF are the lower and upper Fisher's limits. Vollset 
         found that this method, "proposed to provide an easily computed 
         alternative to the mid-P interval, has a level of 
         conservativeness in between the mid-P and uncorrected score 
         method".  For large denominators the intervals are almost 
         identical to the true mid-P values. 
         
         For Vollset's procedure, LF[x] and UF[x] are computed by Zar's 
         formulae, and LF[x+1] and UF[x-1] either by Zar's formulae or by 
         Pratt's approximation to the exact method (Blyth 1986).  The 
         Pratt method is suggested by Vollset, who refers to these 
         approximate mid-P intervals as "mean Pratt" intervals. The 
         program uses Zar's formulae for proportions with a numerator less 
         than 50, rates with a base of 10 or 100 and a numerator less than 
         50, rates with a base of 1,000 and a numerator less than 100, 
         rates with a base of 10,000 and a numerator less than 500, and 
          rates with a base of 100,000 or more and a numerator less than 
         700. Pratt's faster method is used in other instances, when it 
         provides identical results to Zar's method, at the level of 
         precision with which the program displays results. 
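
          The averaging step can be sketched in Python as follows (an 
          illustration only: the exact Fisher's limits are obtained 
          here by bisection on the binomial tail rather than by Zar's 
          or Pratt's formulae, and 0 < x < N is assumed): 

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))

def fisher_limits(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) limits for a proportion x/n, by bisection."""
    def bisect(f, rising):
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2.0
            if (f(mid) < alpha / 2.0) == rising:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0
    lower = 0.0 if x == 0 else bisect(lambda p: 1 - binom_cdf(x - 1, n, p), True)
    upper = 1.0 if x == n else bisect(lambda p: binom_cdf(x, n, p), False)
    return lower, upper

def mean_fisher_midp(x, n, alpha=0.05):
    """Vollset's 'mean Pratt' approximation (0 < x < n assumed here):
    average the Fisher's limits of adjacent numerators."""
    lf_x, uf_x = fisher_limits(x, n, alpha)
    lf_next, _ = fisher_limits(x + 1, n, alpha)
    _, uf_prev = fisher_limits(x - 1, n, alpha)
    return (lf_x + lf_next) / 2.0, (uf_x + uf_prev) / 2.0
```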

         DIFFER
         ~~~~~~
         Version 2.05 provides an option for estimating the effect of 
         regression to the mean (RTM), for use in a before-after 
         comparison of two measurements in a sample selected because of 
         its extremely high or extremely low initial values (i.e., above 
         or below a given cut-off point, according to the "before" 
         measurement or the mean of two or more "before" measurements).  
         In a trial, the RTM effect may be confused with a treatment or 
         placebo effect.  The computation assumes a bivariate normal 
         distribution in the population from which the sample is selected; 
         it requires the mean, standard deviation, and correlation between 
         measurements on the two ("before" and "after") occasions in this 
         whole population.  Allowance can be made for changes in the 
         population mean or standard deviation between the two occasions, 
         e.g. as a result of aging. 

         Formulae:

         The methods are described by Yudkin and Stratton (1996); the 
         adjustments to allow for changes in the population mean or 
         standard deviation are described by Chinn and Heller (1981). 
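
          A minimal sketch of the underlying bivariate-normal 
          regression formula (a standard textbook formulation, not 
          necessarily DIFFER's exact computation, which works from the 
          cut-off point; all names are assumptions): 

```python
def expected_rtm_effect(sel_mean, mu1, sd1, rho, mu2=None, sd2=None):
    """Expected 'after' mean for a group selected on its 'before'
    values, under a bivariate normal model.  mu2 and sd2 allow for a
    change in the population mean or SD between the two occasions
    (default: unchanged)."""
    mu2 = mu1 if mu2 is None else mu2
    sd2 = sd1 if sd2 is None else sd2
    expected_after = mu2 + rho * (sd2 / sd1) * (sel_mean - mu1)
    # Regression-to-the-mean component: the apparent change beyond
    # the population shift (mu2 - mu1), attributable to selection.
    rtm = expected_after - sel_mean - (mu2 - mu1)
    return expected_after, rtm
```

          For a group selected for extremely high values, the RTM 
          component is negative (the values are expected to fall even 
          without any treatment effect). 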

         KAPPA 
         ~~~~~ 
         For comparisons of two sets of ratings, Version 2.04 provides 
         kappa estimates that are adjusted for bias between raters and for 
         differences between the prevalences of the categories; these are 
         BAK (bias-adjusted kappa) and PABAK (prevalence-adjusted bias-
         adjusted kappa) (Byrt et al. 1993). BAK is the value that kappa 
         would take if there were no systematic one-sided variation 
         between the ratings; it is equivalent to Scott's pi coefficient 
         of agreement (Scott 1955).  Low kappa values may be affected by 
         bias.  PABAK is the value that kappa would take if, in addition, 
         the prevalence of each category (as expressed by the mean of the 
         two raters' totals for the category) was equal.  PABAK may be 
         useful in appraising agreement when the percentage agreement is 
         high and kappa is paradoxically low; it approximates to the 
         highest possible kappa if the percentage agreement is above about 
         50% (Lantz and Nebenzahl 1996).  PABAK is called kappa-nor by 
         Lantz and Nebenzahl (1996), and is equivalent to Maxwell's RE 
         (random error) coefficient of agreement (Maxwell 1977) and 
         Bennett's S coefficient (Bennett et al. 1954).  The adjusted 
         values are conditional on the observed percentage agreement. 
         
         For comparisons of two ratings using a dichotomous scale, Version 
         2.04 also provides the overall percentage agreement and the 
         percentage agreement for each category (Cicchetti and Feinstein 
         1990); Version 2.06 provides 95% confidence intervals for the 
         latter percentages (Samsa 1996).  These measures are uncorrected 
         for chance agreement.  For "yes"/"no" ratings in clinical 
         practice, the percentage of agreement for a positive rating (the 
         proportion of positive agreement) represents the probability 
         that, if a subject has been given a positive rating by a typical 
         observer, another observer will concur; similarly, the proportion 
         of negative agreement expresses the probability of concurrence 
         with a negative rating (Samsa 1996).  This requires the 
         assumption that the two observers have a similar tendency to rate 
         subjects as "yes" or "no";  the program does not estimate 
         confidence intervals if there is a significant difference in this 
         respect between the two sets of ratings entered (if P < 0.05 by 
          McNemar test for bias). 
         
         Version 2.06 also provides indices of bias and skewed prevalence.  
          The bias index and skewed-prevalence index are the indices of 
         symmetry in disagreement and agreement proposed by Lantz and 
         Nebenzahl (1996). 
                            
         For comparisons of two dichotomous-scale ratings in multiple 
         samples or strata, Version 2.06 provides additional results, 
         using the methods of Donner and Klar (1996) as well as the 
         precision-based method and adjustment by sample size. Computation 
         of the overall kappa value is based on the common correlation 
         model (in which the expected responses for each pair of 
         observations are based on the overall prevalence of the two 
         possible responses), and the associated heterogeneity test (which 
         appraises compatibility of the stratum-specific estimates with 
         the overall kappa) and estimation of confidence intervals are 
         based on a goodness-of-fit approach, which has been shown to 
         provide satisfactory confidence intervals for combined samples 
         with as few as 50 subjects (Donner and Eliasziw 1992). 

         Formulae:

         For a dichotomous scale:

            Percentage agreement = PA x 100, where PA = (a+d)/n
           Percentage agreement for category 1 = 200a / (2a + b + c)
                                for category 2 = 200d / (2d + b + c)
             Confidence intervals are estimated by formulae 1.26 and
             1.27 of Fleiss (1981), using the smaller of (a + b) and
             (a + c), and the smaller of (d + b) and (d + c), as the 
             respective denominators (Samsa 1996)
           For BAK, b and c are equalized before computing kappa: 
             modified b = modified c = (b+c) / 2;
           PABAK = 2.PA - 1; 
           Bias index = 100(|b - c|) / (b + c) 
           Skewed-prevalence index = 100(|a - d|) / (a + d) 

           where a and d are the numbers of concordant ("Yes-Yes" and 
                         "No-No") pairs  of ratings
                 b and c are the numbers of discordant pairs 
                 n = a+b+c+d
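
          The dichotomous-scale formulae above can be sketched in 
          Python as follows (an illustration only; the function name 
          is an assumption): 

```python
def agreement_indices(a, b, c, d):
    """Kappa, BAK and PABAK for a 2x2 table of paired ratings, where
    a and d are the concordant counts and b and c the discordant
    counts."""
    n = a + b + c + d

    def kappa(a, b, c, d):
        po = (a + d) / n
        pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
        return (po - pe) / (1 - pe)

    pa = (a + d) / n
    return {
        "percent_agreement": 100 * pa,
        "kappa": kappa(a, b, c, d),
        "bak": kappa(a, (b + c) / 2, (b + c) / 2, d),  # equalize b and c
        "pabak": 2 * pa - 1,
        "bias_index": 100 * abs(b - c) / (b + c),
        "prevalence_index": 100 * abs(a - d) / (a + d),
    }
```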

         For a polychotomous scale:

           For BAK, all "mirror-image" pairs are equalized: if f is the 
             number rated as X by one rater and Y by the other, and g is 
             the number rated as Y by the first rater and X by the other, 
             modified f = modified g = (f+g) / 2. 
           For PABAK, all the row totals and columns totals are also 
             equalized (making their value n/k, where n = the total 
             number of pairs and k = the number of categories); this is 
             done by modifying the concordant values, whose total remains 
             unchanged.  A warning is displayed if a negative concordant 
             value is needed.  A short-cut formula is used: 
                      PABAK = ([PA.k] - 1) / (k - 1)

         MANNWHIT
         ~~~~~~~~
          An error in the handling of zero and negative values entered in 
          the Wilcoxon test has been corrected (Version 2.04).  Kendall's 
          tau and gamma are no longer displayed in the numerical ("N") 
          options (Version 2.07). 

          
         MANTELX
         ~~~~~~~         
         Version 2.06 provides a heterogeneity test, examining the 
         uniformity of the linear trends in the various strata. 
         
          If one variable is a dichotomy (i.e., for comparisons of two 
          samples) and the other has 3-10 ordered categories, Version 2.06 
         provides the cumulative odds ratio and its approximate 95% 
         confidence interval (Liu and Agresti 1996).  This is the odds 
         ratio based on a proportional odds model, which assumes that when 
         the 2 x k table is converted to a 2 x 2 table by collapsing 
         categories the odds ratio is the same whatever cut-point is used.  
         A Mantel-Haenszel-type procedure is used to combine stratified 
         data and obtain a common cumulative odds ratio.  The procedure is 
         appropriate even when data are very sparse.  The common odds 
         ratio is a weighted average of the stratum-specific cumulative 
         odds ratios, and provides a useful summary of the association 
         even if the common cumulative odds ratio assumption does not hold 
         (provided that heterogeneity is not too severe and the directions 
         of the odds ratios are the same).  The confidence intervals may 
         be inaccurate if the true odds ratios are heterogeneous within or 
         between strata. 
         
         An improved user interface is provided in version 2.06.
         
         Formulae
         
         Formulae for the cumulative odds ratio and its variance are 
          provided by Liu and Agresti (1996, formulae 2 and 3).  The program 
         uses an adaptation of Fortran code provided by Liu and Agresti. 
         
         The heterogeneity chi-square is the sum of the Mantel-Haenszel 
         chi-squares for trend in each table, minus the overall Mantel-
         Haenszel chi-square (Rothman and Boice 1982: 9). 

         MATCHED
         ~~~~~~~
         Exact probabilities and exact confidence intervals for odds 
         ratios are now computed (Version 2.03).  The computation uses an 
         efficient algorithm for calculating the coefficients of the 
         conditional distribution (Martin and Austin 1991, 1996), using 
         code from David O. Martin's public-domain EXACTBB program. 
         
         Version 2.07 computes the power of Walter's test for binary data 
         (comparing cases with multiple matched controls), conditional on 
         the matching ratio(s) entered and the proportion of concordant 
         pairs among all pairs of matched controls entered (this 
         proportion expresses the effect of matching and the balance of 
         '+' and '-' findings in the controls in the various matched 
         sets). 

         An error in Walter's test for a fixed number of controls was 
         corrected in Version 2.06. 
         
         Formula
         
         The power (1 - beta) of Walter's test is computed from formula 3 
         of Walter (1980); this is an approximation, appropriate if delta 
         is small: 
         
               Zbeta = {[delta * (rn)] / [p(r(r+1)N/2)]} - Zalpha
              
         where delta = the difference between the probabilities of '+' 
                       findings in cases and controls 
                 
               r = number of controls in a case-control(s) set
               
                n = number of sets containing r controls
               
               p = (number of discrepant pairs of matched controls)
                   / (total number of pairs of matched controls)
               
               alpha = level of significance
               
         PAIRS
         ~~~~~
         An option for the comparison of paired values, assuming a 
         lognormal distribution, was added in Version 2.07.  
         
         Exact Fisher's and mid-P probabilities and confidence intervals 
         for odds ratios are now computed (Version 2.03) for single 
         samples of matched pairs (option D) and for the pooled data of a 
         set of samples (option X).  The computation uses an efficient 
         algorithm for calculating the coefficients of the conditional 
         distribution (Martin and Austin 1991, 1996), using code from 
         David O. Martin's public-domain EXACTBB program. 

         Matched observations: numerical data, log-normality assumed
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         This procedure is offered both as a stand-alone option and as a 
         sequel (without re-entry of data) to an analysis assuming a 
         normal distribution. 
         
         It is appropriate for the study of log-normally distributed 
         measures, such as bronchial responsiveness, recovery times after 
          drug administrations, survival times, and domestic house-dust 
          allergen levels (Peat et al. 1994).  It may be used to study 
         change, to compare methods of measurement, or to appraise 
         repeatability.  The observations may be measurements of a 
         variable in paired individuals (matched "cases" and "controls") 
         or replicated measurements in the same individuals.  Each pair of 
         values, which are labelled "Value 1" ("X") and "Value 2" ("Y"), 
         can be entered separately, or pairs with the same values can be 
         entered together. 
         
         Using the logarithms of the values (logs to base 10), the program 
         provides Student's paired t test, Pitman's test for the equality 
         of the two variances, the Bradley-Blackwood test, the correlation 
         coefficient and intraclass correlation coefficient (with 95% 
         confidence intervals and significance), the (geometric) mean 
         ratio of paired values (value 2:value 1), with 90, 95 and 99% 
         confidence intervals, the mean proportional difference (value 2 
         minus value 1, as a percentage of value 1), with 90, 95 and 99% 
         confidence intervals, 95% limits of agreement (with their 95% 
         confidence intervals) for the ratio of the values, the S.E. of 
         measurement (in logarithmic units), confidence intervals for 
         "true values" based on single, two or three measurements, and the 
         coefficient of repeatability. 
           
         The paired t test, Pitman's test, the Bradley-Blackwood 
         procedure, and the intraclass correlation coefficient are 
         described in the manual.

         The "limits of agreement" (Bland and Altman 1995a and 1995b, 
         Altman 1991: 397-400) answer the question, "given a measurement 
         by one method, how far might this be from a measurement by the 
         other method?"  These demarcate the bounds of the range that, 
         with a 95% probability, includes the ratio of measurements of the 
         same subject by the two methods.  The 95% confidence intervals of 
         the limits of agreement are estimated; the limits of agreement 
         may be very imprecise if the sample is small. 

         The standard error of measurement (Fleiss 1986:11) - also called 
         the "technical error" (Kahn and Sempos 1989:239-242) or "the SE 
         of an obtained score" (Guilford and Fruchter 1986: 413) - is an 
         index of reliability that expresses variation between observers 
         and other causes of differences between repeated observations. 

         The coefficient of repeatability, defined as the confidence 
         interval for the difference in two repeat measurements (Bland and 
         Altman 1986, Chinn 1990), is the expected maximum below which 95% 
         of ratios between paired values (the higher divided by the lower) 
         may be expected to fall.  The assumption is made that the mean 
         ratio is 1 (no systematic difference between the measurements). 

         To answer the question "How accurate is this value?", approximate 
         confidence intervals are computed for the "true values" that can 
         be inferred from one, two or three observations, based on the 
         standard deviation of a single measurement (Peat et al. 1994). 
         These are appropriate if there is no systematic difference 
         between the two sets of values (mean ratio = 1). 
         
         Matched observations: numerical data, normality assumed
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         Computational changes were made in option T in Version 2.07.  
         The estimation of confidence intervals for "true values" uses a 
         standard error computed solely from the SD of the differences
         (instead of the SE of measurement) and is based on the 
         assumption that there is no systematic difference between the 
         two sets of ratings (mean difference = zero).  Logarithmic 
         transformations are based on logs to base 10, not natural logs, 
         and an error in the computation of the proportional difference 
         was corrected. 

         Formulae
         
         The computations are based on log-transformed values; zero 
         values are first changed to 0.0000001.  Log-transformed values 
          (and their differences) are used in the formulae (e.g. for the 
         coefficient of repeatability) provided in the manual. 

         Formulae for confidence intervals of the geometric mean ratio are 
         provided by Peat et al. (1994). 

         Mean proportional difference = (geometric mean ratio - 1) x 100.
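
          A Python sketch of the geometric mean ratio and the mean 
          proportional difference (an illustration only; a normal z 
          value is used here where the program would use a t value): 

```python
import math
import statistics

def gmr_analysis(x, y, z=1.96):
    """Geometric mean ratio (value 2 : value 1) of paired log-normal
    values, with an approximate confidence interval and the mean
    proportional difference, using logs to base 10."""
    diffs = [math.log10(b) - math.log10(a) for a, b in zip(x, y)]
    mean_d = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    gmr = 10 ** mean_d
    ci = (10 ** (mean_d - z * se), 10 ** (mean_d + z * se))
    prop_diff = (gmr - 1) * 100      # mean proportional difference, %
    return gmr, ci, prop_diff
```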

         Approximate 95% confidence limits for the true value (Peat et 
         al. 1994, Fleiss 1986: 12) are estimated as (for untransformed 
         values) Obs plus or minus t.S/sqrt(M) or (for log-transformed 
         values) Obs divided or multiplied by antilog[t.S/sqrt(M)], where 

           Obs =  the mean of M observed values (for a single
                  observation, M = 1)
           t   =  the value in the t distribution corresponding to a
                  two-tailed P of 0.05, with N - 1 degrees of freedom
           S   =  SD of a single measurement = SD(diff) / sqrt(2);
                  this is the square root of the residual within-subject 
                  mean square in an analysis of variance, after removal of 
                  the between-ratings component (which is not removed when 
                  the SE of measurement is computed) 
           SD(diff) = standard deviation of the differences between pairs 
                  of log-transformed values
           N   =  number of pairs
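         As an illustration, the limits above can be computed directly. 
         This is a minimal sketch (the function name and argument layout 
         are our own, not part of PEPI); the t value must be supplied 
         from a table, and logs are taken to base 10 as in the program.

```python
import math

def true_value_limits(obs_mean, sd_diff, m, t, log_scale=False):
    """Approximate 95% confidence limits for the 'true value'
    (Peat et al. 1994).  `t` is the critical value of the t
    distribution for a two-tailed P of 0.05 with N-1 df, taken
    from a table.  `sd_diff` is the SD of the differences between
    pairs (of log-transformed values when log_scale is True)."""
    s = sd_diff / math.sqrt(2)           # SD of a single measurement
    half_width = t * s / math.sqrt(m)    # t.S/sqrt(M)
    if log_scale:
        # obs_mean is on the original scale (antilog of the mean log);
        # limits are Obs divided or multiplied by antilog[t.S/sqrt(M)],
        # with logs to base 10
        factor = 10 ** half_width
        return obs_mean / factor, obs_mean * factor
    return obs_mean - half_width, obs_mean + half_width
```

         On the untransformed scale the limits are symmetric about the 
         observed mean; on the log scale they are symmetric only as 
         ratios.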

         In Version 2.07, confidence intervals of the "true value" are 
         estimated from the SD of the differences (between crude or log-
         transformed values, respectively) by the method described by Peat 
         et al.; a value from the t-distribution is used in the 
         computation. 
                       
         PVALUE
         ~~~~~~
         Since the inverse F distribution function used by PEPI is less 
         accurate than the F distribution function, its accuracy was 
         enhanced (Version 2.01) by adapting its results to those of the 
         latter function.  After initial estimation of F from P (in 
         Option V), the corresponding P value is back-estimated from F, 
         and the F value is increased or decreased until its 
         corresponding P coincides with the entered P value.  If the 
         numerator degrees of freedom = 1, an accurate F value is 
         calculated from the t distribution by the formula 
                              F = (t[P/2,DF2])^2, 
         where DF2 = denominator degrees of freedom (Diem 1970: 167).

         The modified function is also now used in COMBINE and PAIRS.
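         The back-estimation step amounts to root-solving on the F value 
         until its back-computed P coincides with the entered P.  The 
         sketch below (a hypothetical function, not PEPI code) uses 
         bisection, with a toy monotone tail function standing in for 
         the F-distribution function.

```python
import math

def invert_p(p_target, p_from_f, lo=0.0, hi=1e6, tol=1e-10):
    """Back-estimation sketch: refine F until the P value computed
    from F coincides with the entered P.  `p_from_f` stands in for
    the (more accurate) F-distribution tail function and must be
    monotone decreasing in F."""
    for _ in range(200):                 # bisection on the F value
        mid = (lo + hi) / 2
        if p_from_f(mid) > p_target:     # P too large -> F too small
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2

# illustration with a toy tail function p(f) = exp(-f);
# the solution satisfies exp(-f) = 0.05, i.e. f = -ln(0.05)
f = invert_p(0.05, lambda x: math.exp(-x))
```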
                         
         RANDOM
         ~~~~~~
         For convenience, Version 2.06 uses a compressed format (six 
         results per line) for saving or printing random samples and 
         sequences. 

         RATES1
         ~~~~~~
         Version 2.01 adds the DerSimonian-Laird procedure (DerSimonian 
         and Laird 1986) for computing an overall rate ratio based on 
         stratified data, using a random-effects model.  See the 
         description of CASECONT for a note on the use of the 
         DerSimonian-Laird procedure.                    
         
         Formulae: 
         
         The random-effects procedure of DerSimonian and Laird (1986) is 
         used.  For this purpose the log rate ratio for each stratum, and 
         its standard error, are computed after changing any zero rate to 
         0.0001/Base (where Base = 1,000 or whatever other base is used 
         for the rates); the formula used for the SE of the log rate ratio 
         is 
             sqrt[(1 - RateA) / (A x RateA) + (1 - RateB) / (B x RateB)].
         The Q statistic, which plays a central role in the analysis, is 
         based on the data for separate strata and the Mantel-Haenszel 
         estimate of the common rate ratio.  This option is not offered if 
         the attributable-fraction option has been selected.  If Q + 1 is 
         less than the number of strata, the random-effects and fixed-
         effect models yield identical results (the estimate of the 
         between-stratum variance is then zero).  
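         The random-effects computation can be sketched as follows.  This 
         is our own illustrative function, not PEPI code; note that for 
         simplicity it derives Q from the inverse-variance fixed-effect 
         estimate, whereas the program bases Q on the Mantel-Haenszel 
         estimate of the common rate ratio.

```python
import math

def dersimonian_laird(log_ratios, ses):
    """Random-effects pooling sketch (DerSimonian and Laird 1986).
    Inputs: per-stratum log rate ratios and their standard errors.
    Returns the pooled rate ratio, its SE (log scale), Q, and the
    between-stratum variance tau^2."""
    w = [1 / se**2 for se in ses]                  # fixed-effect weights
    fixed = sum(wi * y for wi, y in zip(w, log_ratios)) / sum(w)
    q = sum(wi * (y - fixed)**2 for wi, y in zip(w, log_ratios))
    k = len(log_ratios)
    # tau^2 is zero whenever Q < k - 1, i.e. Q + 1 < number of strata,
    # in which case the random- and fixed-effect results coincide
    tau2 = max(0.0, (q - (k - 1)) /
               (sum(w) - sum(wi**2 for wi in w) / sum(w)))
    w_star = [1 / (se**2 + tau2) for se in ses]    # random-effect weights
    pooled = sum(wi * y for wi, y in zip(w_star, log_ratios)) / sum(w_star)
    return math.exp(pooled), 1 / math.sqrt(sum(w_star)), q, tau2
```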
        
         RATES2
         ~~~~~~
         Version 2.01 adds the DerSimonian-Laird procedure (DerSimonian 
         and Laird 1986) for computing an overall rate ratio based on 
         stratified data, using a random-effects model.  See the 
         description of CASECONT for a note on the use of the 
         DerSimonian-Laird procedure. 
         
         In Version 2.06, exact confidence intervals for rate ratios in 
         single strata are available for larger numbers than previously, 
         using an improved procedure from Version SP 2.5 of XLIM (by A. 
         Ray Simons); this is based on an F-distribution algorithm, 
         supplemented by bisection and regula falsi root-solvers when 
         there is a marked imbalance between the two numerators and the 
         sample is large. If the bisection solver (which is relatively 
         slow) is required and the sum of the numerators is 3,000 or less, 
         a fast algorithm from David O. Martin's public-domain EXACTBB 
         program (Martin and Austin 1991, 1996) is used instead. 

         Exact tests in single strata, which were inadvertently dropped in 
         Version 2.04, are restored in Version 2.06. 
         
         The computation of exact probabilities and exact confidence 
         intervals for rate ratios for stratified data is now much faster, 
         because of the employment (Version 2.02) of a more efficient 
         algorithm for calculating the coefficients of the conditional 
         distribution (Martin and Austin 1991, 1996), using code from 
         Martin's public domain EXACTBB program.  
         
         Formulae: 
         
         The random-effects procedure of DerSimonian and Laird (1986) is 
         used.  For this purpose the log rate ratio for each stratum, and 
         its standard error, are computed after changing any zero rate to 
         0.0001/Base (where Base = 1,000 or whatever other base is used 
         for the rates). The Q statistic, which plays a central role in 
         the analysis, is based on the data for separate strata and the 
         Mantel-Haenszel estimate of the common rate ratio. This option is 
         not offered if the attributable-fraction option is selected.  If 
         Q + 1 is less than the number of strata, the random-effects and 
         fixed-effect models yield identical results. 

         SCRN
         ~~~~
         For a test that provides a range of values, Version 2.01 displays 
         a ROC (receiver operating or relative operating characteristic) 
         curve with its confidence bounds, and Version 2.06 computes the 
         area under the curve, with its standard error and 95% confidence 
         interval, and reports alternative values for the optimal cutting-
         point; it displays Youden's index for each cutting-point. 
         
         Version 2.06 also provides an option for the entry of pretest 
         probabilities and likelihood ratios in order to compute post-test 
         probabilities. 
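         The post-test computation follows Bayes' theorem in odds form: 
         post-test odds = pretest odds x likelihood ratio.  A minimal 
         sketch (the function name is ours):

```python
def posttest_probability(pretest_p, likelihood_ratio):
    """Post-test probability from a pretest probability and a
    likelihood ratio: convert the probability to odds, multiply
    by the likelihood ratio, and convert back."""
    pre_odds = pretest_p / (1 - pretest_p)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)
```

         For example, a pretest probability of 0.2 combined with a 
         likelihood ratio of 4 gives pretest odds of 0.25, post-test 
         odds of 1, and hence a post-test probability of 0.5.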
         
         The ROC curve shows the association between sensitivity and the 
         false positive rate (100% minus specificity %).  The point on the 
         curve closest to the top left corner of the graph is the point 
         where the sum of sensitivity and specificity is highest. The 
         closer this point is to the top left corner, the better the test 
         (assuming that false negatives and false positives are equally 
         important) (Sackett et al. 1985: 106-107). 
         
         The confidence bounds for the ROC curve (Schaefer 1994) are 
         pointwise 95% bounds, computed for each of the cutting-points for 
         which data are entered; they take account of the variability in 
         both samples (diseased and nondiseased individuals).  Strictly 
         speaking, they are valid only if the specificity of interest has 
         been stipulated in advance, and may be misleading if a search for 
         an appropriate cutting-point is performed along the whole ROC 
         curve; however, "in practice, in most clinical decision 
         situations a range of relevant specificities (or sensitivities) 
         will at least be restricted to a region of interest which can be 
         defined before inspection of the data" (Schaefer 1994). The 
         confidence bounds may be inaccurate if the samples are small; the 
         program computes them only if the size of each sample is between 
         20 and 2,000. 
         
         The area under the ROC curve expresses the probability that the 
         test will correctly rank a randomly chosen person with the 
         disease and a randomly chosen person without the disease (Hanley 
         and McNeil 1982 and 1983, Beck and Shultz 1986, Zweig and 
         Campbell 1993); its value is 50% if the test does not 
         discriminate.  Alternative values are provided for the best 
         cutting-point (i.e., the one that minimizes errors), conditional 
         on the weights allocated to false negatives and false positives 
         to express the relative undesirability or cost of the two kinds 
         of error; more weight may be given to false negatives if missing 
         the disease is very undesirable, or to false positives if 
         misdiagnosis as a case is harmful. The choice of a cutting-point 
         also depends on whether the effect of the prevalence of the 
         disease is taken into consideration, and on what this prevalence 
         is. If prevalence is taken into account as well as the relative 
         weights of false negatives and false positives, the computed 
         cutting-point minimizes the expected costs (deaths, financial, 
         etc.) in the group or population in which the test is to be used. 
                                      
         Formulae:
         
         Confidence bounds for the ROC curve are estimated by a method 
         described by Schaefer (1994), based on a statistical test 
         introduced by Greenhouse and Mantel (1950).  The computation is 
         based on source code written by Prof. H. Schaefer, to whom we are 
         grateful for his consent to its use. 
                                                     
         Formulae for computing the area under the ROC curve and its 
         standard error are provided by Hanley and McNeil (1982); its 95% 
         confidence limits are estimated by adding or subtracting 
         1.96(SE). 
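         A sketch of this computation is given below: the area by the 
         trapezoidal rule over the entered ROC points, and the standard 
         error by the Hanley-McNeil formula.  The function names are 
         ours, and PEPI's implementation may differ in detail.

```python
import math

def auc_trapezoid(points):
    """Area under a ROC curve by the trapezoidal rule.
    `points` are (false positive rate, sensitivity) pairs as
    proportions; (0, 0) and (1, 1) should be included."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

def hanley_mcneil_se(auc, n_diseased, n_nondiseased):
    """Standard error of the area (Hanley and McNeil 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_diseased - 1) * (q1 - auc**2)
           + (n_nondiseased - 1) * (q2 - auc**2)) / (n_diseased * n_nondiseased)
    return math.sqrt(var)

# 95% confidence limits: auc minus/plus 1.96 * se
```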
         
         The best cutting-point occurs where a line with the slope 
                             [FP(1-P)]/[FN.P]               
         touches the ROC curve (McNeil et al. 1975, Linnet 1988,
         Zweig and Campbell 1993); FP and FN are the weighted sums of 
         (respectively) false positives and false negatives and P is 
         disease prevalence.  This point is determined by identifying the 
         cutting-point that minimizes the value 
                            1 - Sp + (1 - Se)/m
         where Sp = specificity, 
               Se = sensitivity, 
                m = [FP(1-P)]/[FN.P]
         Cutting-points that ignore the effect of prevalence are computed 
         by setting the value of P as 0.5.       
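         The minimisation above can be sketched directly.  This is our 
         own illustrative function (not PEPI code), taking the observed 
         sensitivity and specificity at each entered cutting-point; with 
         equal weights and P = 0.5, m = 1 and the chosen point is the 
         one that maximises Youden's index (Se + Sp - 1).

```python
def best_cutting_point(points, fn_weight=1.0, fp_weight=1.0, prevalence=0.5):
    """Choose the cutting-point minimising 1 - Sp + (1 - Se)/m,
    where m = [FP(1-P)]/[FN.P].  `points` maps each cutting-point
    to its (sensitivity, specificity) pair, as proportions.
    Setting prevalence to 0.5 ignores the effect of prevalence."""
    m = (fp_weight * (1 - prevalence)) / (fn_weight * prevalence)
    def cost(item):
        se, sp = item[1]
        return 1 - sp + (1 - se) / m
    return min(points.items(), key=cost)[0]
```

         Giving false negatives a heavier weight shifts the chosen 
         cutting-point toward higher sensitivity, as the text describes.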
                

         SEASONAL
         ~~~~~~~~
         Hewitt's rank-sum test for a 6-month peak (Hewitt et al. 1971), 
         extended to 4- and 5-month peaks (Rogerson 1996), was added in 
         Version 2.05. This is a conservative test based on the ranks of 
         the monthly numbers of events. The length of the hypothesized 
         peak period should be specified in advance. 

         Formulae:

         Hewitt's statistic is the highest of all possible rank sums based 
         on consecutive periods of the designated length. Tied ranks are 
         reduced by first adjusting the monthly frequencies according to 
         the length of the month and any correction factors entered; if 
         ties occur, an average rank is used (Walter 1980).  The program 
         uses exact significance levels provided by Walter (1980) and 
         Rogerson (1996), with interpolation for non-integer values 
         (Walter 1980). 
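         The statistic itself can be sketched as follows (an 
         illustrative function of our own, applied to frequencies 
         already adjusted for month length and any correction factors): 
         rank the 12 monthly values, averaging ranks for ties, and take 
         the highest rank sum over all consecutive wrap-around periods 
         of the designated length.

```python
def hewitt_rank_sum(monthly_counts, peak_length=6):
    """Hewitt's statistic: the highest rank sum over all possible
    consecutive (wrap-around) periods of the designated length,
    using average ranks for tied monthly frequencies."""
    n = len(monthly_counts)
    order = sorted(range(n), key=lambda i: monthly_counts[i])
    ranks = [0.0] * n
    i = 0
    while i < n:                         # assign average ranks to ties
        j = i
        while j + 1 < n and monthly_counts[order[j + 1]] == monthly_counts[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1            # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return max(sum(ranks[(s + t) % n] for t in range(peak_length))
               for s in range(n))
```

         For a 6-month peak the maximum possible value is 
         12 + 11 + ... + 7 = 57, attained when the six highest months 
         are consecutive.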

         WHATS
         ~~~~~
         The factorial function was extended in Version 2.01.  Factorials 
         are now computed for positive numbers up to 1,754.  Also, they
         are displayed for fractional numbers; this may be helpful if 
         gamma functions are required, since the factorial of any 
         positive number X may be taken as the gamma function at point 
         (X + 1) (Hoel 1984: 88; Abramowitz and Stegun 1970: 255).  
         
         Options for the computation of permutations and combinations were 
         added in Version 2.03.  The program can compute these if the 
         total number of items in the set is up to 1,754.  
         
         The factorial n! is the number of possible arrangements of n 
         items; e.g., if there are three items (a, b and c), 3! = 6 
         arrangements are possible (abc, acb, bac, bca, cab and cba). The 
         number of possible subsets of r items (ignoring their 
         arrangement) drawn from a set of n items is comb(n,r); for 
         example, if n=3 and r=2, there are comb(3,2) = 3 possible subsets 
         (a and b, a and c, and b and c); comb(n,r) is the binomial 
         coefficient ('n over r' or 'n binomial r'). The number of 
         possible arrangements of a sub-set of r items drawn from a set of 
         n items is perm(n,r); for example, if n = 3 and r = 2, there are 
         perm(3,2) = 6 possible arrangements (ab, ac, ba, bc, ca and cb). 

         Formulae:
         
         The program now uses Brenner's algorithm (Ball 1978:215) to 
         compute factorials for numbers up to 275 and Stirling's 
         approximation (Rothman and Boice 1982: 26) for larger numbers. 
         (We are grateful to Ray Simons for bringing Brenner's procedure 
         to our notice). 
         
         The formulae for permutations and combinations are:
                   perm(n,r) = n! / (n-r)!
                   comb(n,r) = perm(n,r) / r! 
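         These definitions translate directly into code; the sketch 
         below (our own functions, not PEPI code) uses exact integer 
         factorials, which is practical for small n.

```python
import math

def perm(n, r):
    """perm(n,r) = n! / (n-r)!  -- arrangements of r items from n."""
    return math.factorial(n) // math.factorial(n - r)

def comb(n, r):
    """comb(n,r) = perm(n,r) / r!  -- subsets, ignoring arrangement."""
    return perm(n, r) // math.factorial(r)

# For large n, working with log-factorials avoids overflow of the
# intermediate values: math.lgamma(n + 1) gives ln(n!), consistent
# with the gamma-function relation n! = gamma(n + 1).
```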
                   
         ----------------------------------------------------------------

