randomize using excel

29
Chapter 3-7. Randomize Using Excel In the last chapter, we practiced taking a random sample. In this chapter, we practice randomizing subjects into two treatment arms. This is easier to set up in Excel than in Stata. Simon (1999) presents a simple approach to randomizing subjects to two or more experimental groups, which is the approach we will use. To give a quick overview of what we are going with this topic, here is a suggestion for describing it to a colleague. Suggestion for Advising Someone on Randomization You should use a method called “random permuted blocks”. If you have two groups to assign subjects to, for example, you want the first two subjects to go into group A or group B in a random order. For the next two subjects, this is repeated, by the random order is selected independently of the previous two subjects. This is called a block of size 2. It might look something like: AB, BA, BA, AB, …where each block of size 2 will contain a possible permutation of the group assignments in a random order (thus, “random permuted blocks”). In this way, you achieve balance between the groups as the study progresses, with equal numbers in each group, which insures the groups remain balanced on any possible “learning effect”, such as the study team gets better at the study procedure as the study progresses. This also insures that the sample sizes are equal if the study has to be stopped early. A randomization schedule is created. One column contains the numbers 1 to N, which is the order that the subjects are enrolled. This number is placed on the outside of an envelope. Inside the envelope is the group assignment. When the subject is enrolled, the envelope is opened to discover the group assignment. An improvement over this is called “random permuted blocks with random block size”. In the approach described Chapter 3-7 (revision 14 Jul 2011) p. 1

Upload: nguyennga

Post on 03-Jan-2017

316 views

Category:

Documents


4 download

TRANSCRIPT

Chapter 3-7. Randomize Using Excel

In the last chapter, we practiced taking a random sample.

In this chapter, we practice randomizing subjects into two treatment arms. This is easier to set up in Excel than in Stata.

Simon (1999) presents a simple approach to randomizing subjects to two or more experimental groups, which is the approach we will use.

To give a quick overview of what we are going with this topic, here is a suggestion for describing it to a colleague.

Suggestion for Advising Someone on Randomization

You should use a method called “random permuted blocks”. If you have two groups to assign subjects to, for example, you want the first two subjects to go into group A or group B in a random order. For the next two subjects, this is repeated, by the random order is selected independently of the previous two subjects. This is called a block of size 2. It might look something like: AB, BA, BA, AB, …where each block of size 2 will contain a possible permutation of the group assignments in a random order (thus, “random permuted blocks”). In this way, you achieve balance between the groups as the study progresses, with equal numbers in each group, which insures the groups remain balanced on any possible “learning effect”, such as the study team gets better at the study procedure as the study progresses. This also insures that the sample sizes are equal if the study has to be stopped early.

A randomization schedule is created. One column contains the numbers 1 to N, which is the order that the subjects are enrolled. This number is placed on the outside of an envelope. Inside the envelope is the group assignment. When the subject is enrolled, the envelope is opened to discover the group assignment.

An improvement over this is called “random permuted blocks with random block size”. In the approach described above, the study coordinator would be able to guess the group assignment of subjects 2, 4, etc., since the subject would be assigned to the opposite group of the previous subject, subject 1, 3, etc. To avoid this, a block size of 2, 4, and perhaps 6, is used, with the order of the block size randomized. This makes it impossible to guess the next subject’s assignment, because the coordinator will never know what block size is currently being used.

_____________________

Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual [unpublished manuscript] University of Utah School of Medicine, 2011. http://www.ccts.utah.edu/biostats/?pageId=5385

Chapter 3-7 (revision 14 Jul 2011) p. 1

Suggested Citation

The random permuted blocks approach, with or without random block sizes, is a well-establish procedure, so there is no need to give a citation for it, particularly in a research paper. However, if you are nervous that a grant reviewer will be unfamiliar with it, a good citation for both approaches is:

Friedman LM, Furberg CD, DeMets DL. Fundamentals of Clinical Trials, 3rd ed.,New York, Springer, 1998, pp.64-66.

There is no need to cite the Simon (1999) paper, since that just a paper describing specifically how to do it in Excel, rather than an authoritative description and justification of these randomization methods.

Example: Here is an example of researchers using random permuted blocks for their randomization into study groups (Kuhn et al., N Engl J Med, 2008). In their Randomization subsection of their Methods section, they report,

“At 1 month post partum, participants who were still breast-feeding their infants (whose HIV status had not yet been determined) and were willing to continue with the study were randomly assigned to a study group with the use of a computer algorithm that was designed by the study statistician with a randomized permuted-block design within each site….”

Example: Here is an example of researchers using random permuted blocks with random block size for their randomization into study groups (Kirkley et al., N Engl J Med, 2008). In their Study Treatment subsection of their Methods section, they report,

“The patients were randomly assigned, with the use of a computer-generated schedule, to receive optimized physical and medical therapy alone (control group) or to receive both optimized physical and medical therapy and arthroscopic treatment….To minimize the risk of predicting the treatment assignment of the next eligible patient, randomization was performed in permuted blocks of two or four with random variation of the blocking number.”

Chapter 3-7 (revision 14 Jul 2011) p. 2

Simple Randomization

Suppose we want to randomize 10 subjects to 2 groups (intervention & control), with 5 in each group. The first step is to prepare a randomization list.

Enter the column headings Sequence and Group into cells A1 and B1, respectively. To enter the sequence number, enter “1” into cell A2, click on the lower right corner of the cell, hold down the Ctrl-key and drag downward for 10 rows. This will fill up the first column with the numbers 1 to 10. Enter “intervention” into cell B2, click on the cell, and drag down 5 rows (when you do not hold down the Ctrl-key, it simply duplicates the cell contents.) Enter “control” into cell B7, click on the cell, and drag down 5 rows. The spreadsheet should now look like this:

A B1 Sequence Group2 1 intervention

3 2 intervention

4 3 intervention

5 4 intervention

6 5 intervention

7 6 control

8 7 control

9 8 control

10 9 control

11 10 control

This has all of the group assignments we need (5 in each group). It simply needs to be put into a random order. To do that, we add a column of random numbers (uniform random numbers, ranging between 0 and 1). Add the heading RandNum into cell C1. Enter =rand() into cell C2. When you hit the Enter key, a random number replaces what you typed. Click on the number in cell C2. Drag down to row 11, which will fill the column with random numbers.

Chapter 3-7 (revision 14 Jul 2011) p. 3

Your spreadsheet will now look like the following (except your random numbers will be different):

A B C1 Sequence Group RandNum2 1 intervention 0.1317913 2 intervention 0.5352694 3 intervention 0.084445 4 intervention 0.4953986 5 intervention 0.5141967 6 control 0.1323038 7 control 0.5581679 8 control 0.96852510 9 control 0.09807611 10 control 0.41867

The problem with what we have so far, is that these random numbers will automatically change each time the spreadsheet is updated (saving the file and opening it again, for example, will change all the random numbers). To fix the numbers so they are just values that cannot change, rather than calls to the function rand(), highlight the entire column of numbers, from cell C2 to C11. Click on Edit - Copy, then Click on Edit - Paste Special, select Paste - Values, and click OK. This makes the numbers permanent.

The best thing to do next is to copy all of these cells and paste them a few columns to the right, in the spreadsheet. That way, you have two tables to compare to make sure the randomization step (discussed next) worked correctly. It also gives you a way to recover easily if you make a mistake or simply want to start over.

Now we will randomize the group assignments. Highlight the the Group and RandNum columns of your second (duplicated) table. [It does not matter whether or not you include the entire column, or include the variable names (Group and RandNum), when you highlight the columns—Excel is clever enough to consider the first row as column headings, and so the first row does not get sorted along with the other cells.] Click on Data - Sort. Enter RandNum in the “Sort by” box, and then click OK.

Chapter 3-7 (revision 14 Jul 2011) p. 4

Your table will now be sorted by the random numbers and the group column will be scrambled (Chaos created from Order).

A B C1 Sequence Group RandNum2 1 intervention 0.084443 2 control 0.0980764 3 intervention 0.1317915 4 control 0.1323036 5 control 0.418677 6 intervention 0.4953988 7 intervention 0.5141969 8 intervention 0.53526910 9 control 0.55816711 10 control 0.968525

Random Permuted Blocks

The simple randomization approach will work for situations such as randomizing hospital units.

For the clinical trial situation, where subjects enter the study across time, a different approach is required. The problem with the simple randomization approach in this case is that an unequal sample size will accumulate in the two study arms as the study progresses, becoming balanced only at the end.

To assure balance as the data accumulate, which will help to avoid a “learning curve” bias, or any bias introduced by the time of entry into the study, a block randomization is used. We will use a block size of 2, so balance is achieved after every two subjects enter the study. [Any block size, that is a multiple of the number of groups, could be used.]

Beginning with the following columns (our randomization list from above before we sorted),

A B C1 Sequence Group RandNum2 1 intervention 0.1317913 2 intervention 0.5352694 3 intervention 0.084445 4 intervention 0.4953986 5 intervention 0.5141967 6 control 0.1323038 7 control 0.5581679 8 control 0.96852510 9 control 0.09807611 10 control 0.41867

Chapter 3-7 (revision 14 Jul 2011) p. 5

, highlight and copy that into another location on your spreadsheet. Then, add the following Block column to the left of the RandNum column. [To insert a column, highlight the RandNum column, right click the mouse, and click on Insert.]

E F G H1 Sequence Group Block RandNum2 1 intervention 1 0.1317913 2 intervention 2 0.5352694 3 intervention 3 0.084445 4 intervention 4 0.4953986 5 intervention 5 0.5141967 6 control 1 0.1323038 7 control 2 0.5581679 8 control 3 0.96852510 9 control 4 0.09807611 10 control 5 0.41867

Next, highlight the columns F through H, using the mouse.

E F G H1 Sequence Group Block RandNum2 1 intervention 1 0.1317913 2 intervention 2 0.5352694 3 intervention 3 0.084445 4 intervention 4 0.4953986 5 intervention 5 0.5141967 6 control 1 0.1323038 7 control 2 0.5581679 8 control 3 0.96852510 9 control 4 0.09807611 10 control 5 0.41867

Click on the Data icon on the toolbar. Select “Sort”, thenSort by: BlockThen by: RandNumOK

Chapter 3-7 (revision 14 Jul 2011) p. 6

which produces,

A B C D1 Sequence Group Block RandNum2 1 intervention 1 0.1317913 2 control 1 0.1323034 3 intervention 2 0.5352695 4 control 2 0.5581676 5 intervention 3 0.084447 6 control 3 0.9685258 7 control 4 0.0980769 8 intervention 4 0.49539810 9 control 5 0.4186711 10 intervention 5 0.514196

Notice that an intervention and a control are used every two rows of the randomization list, creating sample size balance every two study subjects.

This is called “random permuted blocks” because the permutations “AB” and “BA”, or “Intervention-Control” and “Control-Intervention”, are randomized, rather than just A and B.

Random Permuted Blocks With Randomized Block Sizes

If an investigator is attempting to guess what group the next patient will be randomized to, and has noticed that balance appears to be accomplished every two patients in an un-blinded study, the investigator can introduce bias by decided whether or not to enroll the next patient.

One solution is to use a large block size, say 20, instead of 2. It is also a good idea to not inform the investigator that blocks are used in the randomization.

Another approach is to use multiple block sizes. For a two-arm study, you might use block sizes of 2, 4, and 6 in a random order. This would make it particularly difficult for the investigator to keep track.

To illustrate, let’s say the sample size will be 12 in each group, so we can use two each of block sizes 2, 4, and 6.

Chapter 3-7 (revision 14 Jul 2011) p. 7

1) Start by creating the Sequence column, which establishes the sample size and provides an ID number for the order in which patients will enter the study.

ASequence

123456789

101112131415161718192021222324

Chapter 3-7 (revision 14 Jul 2011) p. 8

2) Copy that column and add two new columns, creating one each of the desired block sizes:

C D ESequence Group BlockSize

1Intervention 2

2 Control 2

3Intervention 4

4Intervention 4

5 Control 46 Control 4

7Intervention 6

8Intervention 6

9Intervention 6

10 Control 611 Control 612 Control 6131415161718192021222324

Chapter 3-7 (revision 14 Jul 2011) p. 9

3) Copy those columns and add one new column, providing a random number in the first cell of each block, using “=rand()”.

G H I JSequence Group BlockSize RandNumBlock

1Intervention 2 0.615181

2 Control 2

3Intervention 4 0.175673

4Intervention 4

5 Control 46 Control 4

7Intervention 6 0.427291

8Intervention 6

9Intervention 6

10 Control 611 Control 612 Control 6131415161718192021222324

Chapter 3-7 (revision 14 Jul 2011) p. 10

4) Copy this to new columns. The, copy the cells in the last three columns and paste just below them. Notice that the random numbers changed, which is fine (they update every time a change is made to the spreadsheet).

L M N OSequence Group BlockSize RandNumBlock

1Intervention 2 0.36642

2 Control 2

3Intervention 4 0.51709

4Intervention 4

5 Control 46 Control 4

7Intervention 6 0.73186

8Intervention 6

9Intervention 6

10 Control 611 Control 612 Control 6

13Intervention 2 0.38613

14 Control 2

15Intervention 4 0.05518

16Intervention 4

17 Control 418 Control 4

19Intervention 6 0.391385

20Intervention 6

21Intervention 6

22 Control 623 Control 624 Control 6

5) We are now ready to fix the random numbers so they do not change. Copy and paste the RandNum column onto itself, using “Paste Special”, as described on page 2.

Chapter 3-7 (revision 14 Jul 2011) p. 11

6) Copy these into new columns. Then, highlight each RandNum number and drag it into the empty cells just below it, so that each row of a block has an identical random number.

Q R S TSequence Group BlockSize RandNumBlock

1Intervention 2 0.36642

2 Control 2 0.36642

3Intervention 4 0.51709

4Intervention 4 0.51709

5 Control 4 0.517096 Control 4 0.51709

7Intervention 6 0.73186

8Intervention 6 0.73186

9Intervention 6 0.73186

10 Control 6 0.7318611 Control 6 0.7318612 Control 6 0.73186

13Intervention 2 0.38613

14 Control 2 0.38613

15Intervention 4 0.05518

16Intervention 4 0.05518

17 Control 4 0.0551818 Control 4 0.05518

19Intervention 6 0.391385

20Intervention 6 0.391385

21Intervention 6 0.391385

22 Control 6 0.39138523 Control 6 0.39138524 Control 6 0.391385

This RandNumBlock column will be used later to randomize the order of the blocks.

Chapter 3-7 (revision 14 Jul 2011) p. 12

7) Copy these into new columns. Then, add a second column of random numbers in the first cell and drag it all the way to the bottom, filling in the entire column.

V W X Y ZSequence Group BlockSize RandNumBlock RandNumGroup

1Intervention 2 0.36642 0.2407044

2 Control 2 0.36642 0.1607768

3Intervention 4 0.51709 0.4609201

4Intervention 4 0.51709 0.1547385

5 Control 4 0.51709 0.87254646 Control 4 0.51709 0.1919791

7Intervention 6 0.73186 0.4904566

8Intervention 6 0.73186 0.8843401

9Intervention 6 0.73186 0.8812662

10 Control 6 0.73186 0.775659411 Control 6 0.73186 0.650367612 Control 6 0.73186 0.7629637

13Intervention 2 0.38613 0.6910306

14 Control 2 0.38613 0.7458984

15Intervention 4 0.05518 0.8386934

16Intervention 4 0.05518 0.807224

17 Control 4 0.05518 0.47776118 Control 4 0.05518 0.4581632

19Intervention 6 0.391385 0.8722347

20Intervention 6 0.391385 0.8057507

21Intervention 6 0.391385 0.8545892

22 Control 6 0.391385 0.633963223 Control 6 0.391385 0.111462224 Control 6 0.391385 0.7572725

8) Copy and Paste Special Values this second column of random numbers, making them fixed

Chapter 3-7 (revision 14 Jul 2011) p. 13

9) Copy these into new columns. Highlight all columns but the Sequence column. Then on the menu bar, click on Data, click on Sort, Sort by RandNumBlock, Then by RandNumGroup, OK.

AB AC AD AE AFSequence Group BlockSize RandNumBlock RandNumGroup

1Intervention 2 0.36642 0.2407044

2 Control 2 0.36642 0.1607768

3Intervention 4 0.51709 0.4609201

4Intervention 4 0.51709 0.1547385

5 Control 4 0.51709 0.87254646 Control 4 0.51709 0.1919791

7Intervention 6 0.73186 0.4904566

8Intervention 6 0.73186 0.8843401

9Intervention 6 0.73186 0.8812662

10 Control 6 0.73186 0.775659411 Control 6 0.73186 0.650367612 Control 6 0.73186 0.7629637

13Intervention 2 0.38613 0.6910306

14 Control 2 0.38613 0.7458984

15Intervention 4 0.05518 0.8386934

16Intervention 4 0.05518 0.807224

17 Control 4 0.05518 0.47776118 Control 4 0.05518 0.4581632

19Intervention 6 0.391385 0.8722347

20Intervention 6 0.391385 0.8057507

21Intervention 6 0.391385 0.8545892

22 Control 6 0.391385 0.633963223 Control 6 0.391385 0.111462224 Control 6 0.391385 0.7572725

Chapter 3-7 (revision 14 Jul 2011) p. 14

10) It now looks like this. Notice that the block sizes are jumbled up, as well as the group assignments within each block. This is what we were after.

Sequence Group BlockSize RandNumBlock RandNumGroup1 Control 4 0.05518 0.4581632 Control 4 0.05518 0.477761

3Intervention 4 0.05518 0.807224

4Intervention 4 0.05518 0.838693

5 Control 2 0.36642 0.160777

6Intervention 2 0.36642 0.240704

7Intervention 2 0.38613 0.691031

8 Control 2 0.38613 0.7458989 Control 6 0.391385 0.757272

10 Control 6 0.391385 0.11146211 Control 6 0.391385 0.633963

12Intervention 6 0.391385 0.805751

13Intervention 6 0.391385 0.854589

14Intervention 6 0.391385 0.872235

15Intervention 4 0.51709 0.154739

16 Control 4 0.51709 0.191979

17Intervention 4 0.51709 0.46092

18 Control 4 0.51709 0.872546

19Intervention 6 0.73186 0.490457

20 Control 6 0.73186 0.65036821 Control 6 0.73186 0.76296422 Control 6 0.73186 0.775659

23Intervention 6 0.73186 0.881266

24Intervention 6 0.73186 0.88434

Chapter 3-7 (revision 14 Jul 2011) p. 15

11) Finally, copy and paste the Sequence and Group columns onto a new worksheet. Do this by clicking on the Sheet2 tab, and then pasting it there.

A BSequence Group

1 Control2 Control

3Intervention

4Intervention

5 Control

6Intervention

7Intervention

8 Control9 Control

10 Control11 Control

12Intervention

13Intervention

14Intervention

15Intervention

16 Control

17Intervention

18 Control

19Intervention

20 Control21 Control22 Control

23Intervention

24Intervention

Right click on the Sheet2 tab and rename it to “Randomization List”. Right click on the Sheet1 tab and rename it “Setting up Randomization”. You keep this Excel File.

After you have saved your file, which has both sheets, you right click on the “Setting up Randomization” tab, and select “Delete”. You then give a spreadsheet with just the Randomization List sheet to the person responsible for the randomization.

One way the randomization can be carried out is having a piece of paper that contains just “Intervention” or “Control” placed inside envelopes. On the outside of the envelopes put the

Chapter 3-7 (revision 14 Jul 2011) p. 16

sequence number. After a patient has been enrolled into the study and consented, the next envelope on the stack is opened and that is the group the study subject goes into.

Not giving anyone else the worksheets that contain the block size guards against the possibility that someone will include the Blocksize column in the envelopes.

Chapter 3-7 (revision 14 Jul 2011) p. 17

Part way into the study, and at the end, you can audit the randomization process by making sure the group assignment order that goes with the enrollment date and time that is recorded on the data collection form (case report form) or in the database, corresponds with your randomization list.

References

Friedman LM, Furberg CD, DeMets DL. (1998). Fundamentals of Clinical Trials, 3rd ed.,New York, Springer, pp.64-66.

Kirkley A, Birmingham TB, Litchfield RB, et al. (2008). A randomized trial of arthroscopicsurgery for osteoarthritis of the knee. N Engl J Med 359(11):1097-107.

Kuhn L, Aldrovandi GM, Sinkala M., et al. (2008). Effect of early, abrupt weaning on HIV-free survival of children in Zambia. N Engl J Med 359(2):130-41.

Simon, S. (1999). A simple approach for randomisation. [Electronic Rapid Response] 17 Sep 1999. Response to Altman DG, Bland JM. Statistics notes: How to randomise. BMJ 319:703-704.

This paper is not printed in BMJ, but it can be obtained from BMJ’s website. To get it, follow these steps:1) Go to the Internet website: www.bmj.com2) click on the Advanced Search link3) Fill in the following with Year: 1999, Volume: 319, First Page: 703

Specify Citation  Year Volume First page/elocator

4) Under Limit Results, select “Articles and electronic communications”

Limit Results 

From   through

Include:

Articles only Articles and electronic communications (eg, rapid responses, Q&A)

5) That finds the paper. Click on Full text.

EDUCATION AND DEBATE:Douglas G Altman and J Martin BlandStatistics notes: How to randomiseBMJ Sep 1999; 319: 703 - 704

Journal Home

Extract

Full text

PDF

Chapter 3-7 (revision 14 Jul 2011) p. 18

why?

6) When the article appears, click on the Read responses to this article link, which brings up Simon’s response.

Chapter 3-7 (revision 14 Jul 2011) p. 19