### Technical notes

The method described here was designed on a laptop with Intel Core Duo 2.2 GHz processor, 4 GB RAM, running Windows Seven 64 bit and Microsoft Office Excel 2007. The spreadsheets were later tested on Excel 2003, with no differences found in either the calculations or graphs.

The outcome of meta-analyses is the effect summary. However, some reviews may only aim in combining rates or prevalences; technically these cannot be called "effects", since there is nothing "causing" it, and the correct term would be single group summary. We will refer to both these estimates simply as "outcome" in order to avoid confusion, and maintain only the abbreviation as *es* to follow textbooks standard.

Since we have established that the limitation of the existing software packages is handling descriptive data, we will be using rates in our example so that the difference in the final forest plot is more overt. The data could be the prevalence of smoking in a country or the incidence of myocardial infarction in high risk patients. We chose to use theoretical numbers so we could openly distribute the spreadsheets, test particular formulas and compare results obtained with other software. All formulas are presented in traditional equations and also in excel format.

Steps 1 and 2 always require adjustments according to study type and outcome. Columns in light grey in spreadsheet 1 are the ones to be adapted, while columns in dark grey do not require any modification regardless of study type (this includes all further steps of the guide). The necessary adjustments can be easily found on methodological books [8–10].

Cell B14 should be filled with the number of studies being analyzed. There are annotations on the spreadsheet that pop up when the mouse pointer is upon selected cells, so the downloaded file can be used without constant consultation of the full article. The explanation for the formulas and detailing of steps are not present on the spreadsheet though. A recently published paper by Schriger et al [11] reviewed over 300 systematic reviews and highlighted important aspects of producing forest plots, which were considered in developing this approach.

### Steps in analyzing data and producing a forest plot

#### Spreadsheet 1-analysis (Figure 1)

1. Calculating the outcome (effect size, es)

In our example we have the number of events and the number of subjects in columns B and C, so we can simply compute the rate in column D as \frac{{n}_{events}}{{n}_{total}} or *D* 3 = *B* 3/*C* 3 in Excel. It is the same from D3 to D12, and copy and paste will automatically adjust the cell numbers. This copying and pasting should be done for steps 1 through 6 and in step 9 B.1.

2. Calculating Standard Error (SE)

All SE can be derived from the formula \mathsf{\text{SE}}=\frac{\sum {\left(\stackrel{\u0304}{\mathsf{\text{x}}}-\mu \right)}^{2}}{\sqrt{\mathsf{\text{n}}}}, but there are simplified derived equations for different types of studies. Since we are using rates, we can use \mathsf{\text{SE}}=\frac{\mathsf{\text{es}}}{\sqrt{\mathsf{\text{es*n}}}} or \mathsf{\text{SE}}=\frac{\sqrt{\mathsf{\text{events}}}}{\mathsf{\text{n}}}, the same formula used in CMA. In excel this will be *E* 3 = *D* 3/SQRT(*D* 3**C* 3).

3. Computing variance (Var)

This formula is simple: Var = SE^{2}. In Excel, *F* 3 = *E* 3^2.

4. Computing individual study weights (w)

We must weight each study with the inverse of its variance, so \mathsf{\text{w}}=\frac{1}{\mathsf{\text{S}}{\mathsf{\text{E}}}^{2}} or *G* 3 = 1/*F* 3 in Excel.

5. Computing each *weighted effect size* (w*es)

This is computed multiplying each effect size by the study weight. If we are not using any corrections on the weight (meaning, single effect model) this equation will result again in the study size for some types of studies. In excel, this will be *H* 3 = *G* 3**D* 3

6. Other necessary variables (w*es^{2} and w^{2})

We will need two other variables in order to calculate the Q statistics (columns I and J of spreadsheet 1). In excel this will be *I* 3 = *G* 3*(*D* 3 ^ 2) and *J* 3 = *G* 3 ^ 2.

Now we need to sum all values of each variable. In our spreadsheet they are in line 14, labeled "Sums": *G* 14 = *SUM* (*G* 3:*G* 12), *H* 14 = *SUM* (*H* 3:*H* 12), *I* 14 = *SUM* (*I* 3:*I* 12), *J* 14 = *SUM* (*J* 3:*J* 12)

7. Calculating Q

The Q test measures heterogeneity among studies, and works like a t test. It is calculated as the weighted sum of squared differences between individual study effects and the pooled effect across studies, with the weights being those used in the pooling method. Q is distributed as a chi-square statistic with k (number of studies) minus 1 degrees of freedom. Our null hypothesis is that all studies are equal. To test that, we need to calculate Q and compare it against a table of critical values. If our calculated Q is lower than that of the table's, than we fail to reject the null hypothesis (and hence the studies are similar).

The formula is \text{Q}={}^{{\displaystyle \sum}}({\text{w*ES}}^{2})-\frac{{[{}^{{\displaystyle \sum}}(\text{w*ES})]}^{2}}{{{\displaystyle \sum}}^{}\text{w}}, but in our spreadsheet it will be simply *B* 17 = *I* 14 - ((*H* 14 ^ 2)/*G* 14) since we already have all the sums.

8. Calculating I^{2}

The I^{2} was proposed as a method to quantify heterogeneity, and it is expressed in percentage of the total variability in a set of effect sizes due to true heterogeneity, that is, to between-studies variability. The formula is {\mathsf{\text{I}}}^{2}=\frac{\left(\mathsf{\text{Q}}-\mathsf{\text{df}}\right)}{\mathsf{\text{Q}}}\mathsf{\text{*}}100, where "df" stands for "degrees of freedom", simply the total number of studies (k) minus 1. In excel, *B* 18 = ((*B* 17 - *B* 15)/*B* 17)*100.

9. Deciding on effect summary \left(\overline{\mathsf{\text{e}}}\overline{\mathsf{\text{s}}}\right) model.

If heterogeneity is low, we can use a fixed effect model, that assumes the effect size is the same in our parameter population, and differences in studies are just from sampling error. However, if we think our sample populations may differ from each other, we can use a random effects model. Many researchers will choose this model even if heterogeneity is low. In our example, Q is higher than 16.919, the critical value for 9 degrees of freedom found in a chi-square distribution, and I^{2} is 49%, so we have moderate heterogeneity [12]. We must decide whether the data is possible to meta-analyze, and if so we may choose to proceed to a random effects models.

- A.
Fixed effects Model

Our effect summary is \overline{\mathsf{\text{e}}}\overline{\mathsf{\text{s}}}=\frac{\sum \left(\mathsf{\text{w*es}}\right)}{\sum \mathsf{\text{w}}}, or *B* 20 = (*H* 14/*G* 14). The standard error is \mathsf{\text{S}}{\mathsf{\text{E}}}_{\overline{\mathsf{\text{e}}}\overline{\mathsf{\text{s}}}}=\sqrt{\frac{1}{\sum \mathsf{\text{w}}}}, or *B* 21 = *RAIZ* (1/*G* 14). With the \mathsf{\text{S}}{\mathsf{\text{E}}}_{\overline{\mathsf{\text{e}}}\overline{\mathsf{\text{s}}}} we calculate the 95% Confidence Interval, as CI\left(\overline{e}\overline{s}\right)=\overline{e}\overline{s}\mp 1,96*SE. In Excel, *B* 22 = *B* 20 - (1.96**B* 21) and *C* 22 = *B* 20 - (1.96**B* 21). In our example we will not use these results.

- B.
Random effects model

Since we are assuming that variability is not only due to sampling error, but also to variability in the population of effects, in this model the weight of each study will be adjusted with a constant (*v*) that represents this.

B1. The formula is \mathsf{\text{v}}=\frac{\mathsf{\text{Q}}-\left(\mathsf{\text{k}}-1\right)}{\sum \mathsf{\text{w}}-\left(\frac{{\sum \mathsf{\text{w}}}^{2}}{\sum \mathsf{\text{w}}}\right)}. We have all these information, except for \sum {w}^{2}. We can compute w^{2} in column J with *J* 3 = *G* 3 ^ 2, and then its sum with J14 = *SOMA* (*J* 3: *J* 12). Now, applying the formula, *M* 16 = (*B* 17 - *B* 15)/(*G* 14 - (*J* 14/*G* 14)).

B2. Once we have the constant, we can calculate new weight for each study, using {\mathsf{\text{w}}}_{\mathsf{\text{v}}}=\frac{1}{\left(\mathsf{\text{S}}{\mathsf{\text{E}}}^{2}+\mathsf{\text{v}}\right)}. In excel, *L* 3 = 1/((*E* 3 ^ 2)+$*M* $16). We need the $ to fix cell M16, or else it will change when we copy the equation to cells L4 to L12.

B3. Now we repeat steps 5 to 8, but using our new weight W_{v}. The results are in columns M, N and O. Applying the Q and I^{2} formulas we have now an acceptable Q and low heterogeneity. We calculate our effect summary as \overline{e}{\overline{s}}_{\mathsf{\text{v}}}=\frac{\sum \left({\mathsf{\text{w}}}_{\mathsf{\text{v}}}\mathsf{\text{*ES}}\right)}{{\sum \mathsf{\text{w}}}_{\mathsf{\text{v}}}}, and standard error as \mathsf{\text{S}}{\mathsf{\text{E}}}_{\overline{\mathsf{\text{e}}}{\overline{\mathsf{\text{s}}}}_{\mathsf{\text{v}}}}=\sqrt{\frac{1}{{\sum \mathsf{\text{w}}}_{\mathsf{\text{v}}}}}.

In excel: *F* 20 = *M* 14/*L* 14, *F* 21 = *SQRT* (1/*L* 14), *F* 22 = *F* 20 - (1.96**F* 21) and *G* 22 = *F* 20+(1.96**F* 21). The confidence intervals are broader than the ones calculated with fixed effect model, however, little change in the effect summary is expected.

Analyzing these numbers in CMA we achieved exactly the same results. - [Additional files 2 and 3].

#### Spreadsheet 2-forest plot (Figure 2)

Columns A-G have the studies information. The user can insert each study effect size and confidence interval directly into columns D, F and G if he has the data. In our example we copied the calculations from spreadsheet 1, and also the values of the random effects model effect summary.

1. Make sure the information is the way we want it displayed. In our example, we wanted the rates in percentages, so column I = column D*100.

2. We usually read the lower and upper confidence interval as a value, but excel understands it as a difference to the mean. This is key to obtain a proper forest plot. These values are *J* 2 = *I* 2 - (100**F* 2) and *K* 2 = *I* 2 + (100* *F* 2). Again, we multiply by 100 to have it in percentage.

3. In order to have each study in a different line, we will assign ordinal numbers to the studies. Our effect summary must be number 1 if we want it in the bottom of the graph. This is done manually in column H of our spreadsheet.

4. We are ready to build the graph. Insert > Graph > Scatter Plot. X values will be column I, lines 2-12, and Y values column H, lines 2-12.

5. We must now add the error bars. In Excel 2007 this is done in the Layout tab, clicking the "Error Bar" button on the right side. In Excel 2003 we must right click on the data series (points on the graph) and click "format data series", then chose the "X error bar" tab. In this window we mark the option "personalized values", and then assign columns J and K, lines 2 to 12, to the lower and upper value.

6. To insert the line marking the summary effect value we will add another data series. First we manually build this data set in the spreadsheet. Then right click on the graph > Select Data. Click on "add", and chose X values as column C, lines 15 to 26, and Y values as columns B, lines 15 to 26. A new set of points will appear on the graph. Right-click on any of the new dots and select "format data series". Then we will choose "no marker" and "solid line" on the Marker Options and Line Color tabs.

7. We can now format the X axis, right-clicking on it. In our example we want it to begin on 10 and end on 28, interval of 2 units. It is not our case, but if the researcher is dealing with relative data, then "logarithmic scale" must be marked.

8. The graph is ready. The user can format colors, outlines, shadows and sizes. In our example we changed the summary effect to a diamond shape. This is done by selecting only one dot (double click) and then right clicking it.

9. For presentation we recommend copying and pasting the graph over a table with study information (Figure 3).

## FAQs

### Can I do a meta-analysis in Excel? ›

Conclusions: **It is possible to conduct a meta-analysis using only Microsoft Excel**. More important, to our knowledge this is the first description of a method for producing a statistically adequate but graphically appealing forest plot summarizing descriptive data, using widely available software.

### What is forest plot in meta-analysis? ›

A forest plot is **a useful graphical display of findings from a meta-analysis**. It provides essential information to inform our interpretation of the results. Typically, a forest plot contains 6 basic “columns”, though additional columns can be added to provide more information.

### How do you analyze data from an Excel spreadsheet? ›

Simply **select a cell in a data range > select the Analyze Data button on the Home tab**. Analyze Data in Excel will analyze your data, and return interesting visuals about it in a task pane.