STA 225 2.0 Design and Analysis of ExperimentsLecture 7 - 9Dr Thiyanga S. Talagala2021-11-16/ 2021-11-19/ 2021-11-261 / 52

Completely randomized design

Advantages of CRD

Easy to design
Analysis of data is simple and straight forward
Even if some values are missing the analysis can be done

Disadvantages of CRD

The design requires homogeneous set of experimental units
If the experimental units are not homogeneous, error component will be large and this will make the treatment comparison less efficient.

2 / 52

Randomized Complete Block Design (RCBD)

Suppose we want to compare $r$ treatments
To run a CRD, we need to find $n_{1} + n_{2} + . . . . + n_{r}$ homogeneous experimental units to apply treatment.
Often it is difficult to find enough experimental homomogeneous units.
However, it may be possible to find several blocks of experimental units, with enough homogeneous experimental units to apply $r$ treatments.
RCBD can be used to compare treatment means in such situations even though there are differences between blocks.

3 / 52

Nuisance factor

A factor that has an effect on the response, but we are not interested in that effect.

Unknown and uncontrollable: Randomization to balance out its effect

Known and uncontrollable but measurable: Analysis of covariance (ANCOVA) (at the time of the analysis)
Nuisance source of variability is Known and controllable: use Blocking to systematically eliminate its effect on the statistical comparison among treatment means.

4 / 52

Example

We want to test the difference between different fertilizers (A, B, C, D, E, F) on strawberry yield. The experimenter has decided to obtain four observations for each fertilizer type.

5 / 52

Example

We want to test the difference between different fertilizers (A, B, C, D, E, F) on strawberry yield. The experimenter has decided to obtain four observations for each fertilizer type. How many experimental units are required to compare the treatments?

5 / 52

6 / 52

Example (cont.)

The experiment is run at 4 different locations having 6 different plots of land each. Hence, a block is given by a location and an experimental unit by a plot of land.

7 / 52

Example (cont.)

randomized complete block design (RCBD)
the experimental units are homogeneous within a block
within each block, the 6 treatments are randomly assigned to 6 experimental units
each block (farms) contains all the treatments (fertilizer)
the design is called complete because we see the complete set of treatments within every block

8 / 52

RCBD

In-class diagram

9 / 52

RCBD

Randomization:

Number the $a$ treatments $1, 2, \dots, a$ .
First form the homogeneous blocks of the experimental units. Then allocate each treatment randomly in each block.
- Number the units in each block as $1, 2, . . ., a$ .
- Randomly allocate the $a$ treatments to $a$ experimental units in each block

10 / 52

RCBD

Replication

Since each block contains all the treatments, so every treatment will appear in all the blocks. So each treatment can be considered as if replicated the number of times as the number of blocks. Hence, in RCBD, the number of blocks and the number of replications are same.

11 / 52

Your turn

Question: How the 4 diets (A, B, C, and D) affect the coagulation of rabbits?

Treatment: Diet

Factor levels: A, B, C, D

Response: Time in seconds that it takes for a cut to stop bleeding (coagulation rate).

Experimental unit: 16 rabits

Replicates: 4

There are 16 rabbits (same age, weight, height)

12 / 52

Your turn

Which approach do you use to analyze the data? CRD or RCBD

13 / 52

Have a jar with the letters A, B, C, D written on separate slips. Catch a rabbit, pick a slip at random from the bowl and assign the rabbit to the diet letter on the slip. Do not replace the slip. Catch the second rabbit and select another slip from the remaining three slips. Assign that treatment to the second rabbit. Continue until the first four rabbits are assigned one of the four diets. Replace the slips and repeat the procedure until all rabbits are assigned to a diet.

Which approach do you use to analyze the data? CRD or RCBD

source: here

14 / 52

Notations

Block 1	Block 2	...	Block b
$y_{11}$	$y_{12}$	...	$y_{1 b}$
$y_{21}$	$y_{22}$	...	$y_{2 b}$
.
.
.
$y_{a 1}$	$y_{a 2}$	...	$y_{a b}$

Number of treatments: $a$

Number of blocks: $b$

15 / 52

"There is one observation per treatment in each block, and the order in which the treatments are run within each block is determined randomly. Because the only, randomization of treatment is within the blocks, we often say that the blocks represent a restriction on randomization". (Montgomery, Design and Analysis of Experiments, 2001)

16 / 52

Statistical Model for the RCBD

1. Mean model

$Y_{i j} = μ_{i j} + ϵ_{i j} {\begin{cases} i = 1, 2, . . ., a \\ j = 1, 2, . . ., b \end{cases}$

$a$ - number of treatments

$b$ - number of blocks

$μ_{i j}$ - mean of the $i$ th factor level or treatment and $j$ th block

$ϵ_{i j}$ - random error (It is assumed $ϵ_{i j}$ 's are independent and $N (0, σ^{2})$ )

17 / 52

Let

$μ$ - overall mean,

$τ_{i}$ - effect of the $i$ th treatment,

$β_{j}$ - effect of the $j$ th block,

Then, $μ_{i j} = μ + τ_{i} + β_{j}$ .

18 / 52

Statistical Model for the RCBD

2. Effects model

$Y_{i j} = μ + τ_{i} + β_{j} + ϵ_{i j} {\begin{cases} i = 1, 2, . . ., a \\ j = 1, 2, . . ., b \end{cases}$

$a$ - number of treatments

$b$ - number of blocks

$μ$ - overall mean

$τ_{i}$ - effect of the $i$ th treatment

$β_{j}$ - effect of the $j$ th block

$ϵ_{i j}$ - random error with $N I D (0, σ^{2})$

19 / 52

Two constraints

Treatment effect and block effects are the deviations from the overall mean. Hence

$\sum_{i = 1}^{a} τ_{i} = 0$ and

$\sum_{j = 1}^{b} β_{j} = 0$

20 / 52

In-class

In CRD

$Y_{i j} = μ + τ_{i} + ϵ_{i j} {\begin{cases} i = 1, 2, . . ., a \\ j = 1, 2, . . ., n \end{cases}$

$τ_{i} = μ + μ_{i}, i = 1, 2, . . . a$

$\frac{\sum_{i = 1}^{a} μ_{i}}{a} = μ$

This definition implies

$\sum_{i = 1}^{a} τ_{i} = 0$

21 / 52

In-class22 / 52

Hypothesis - RCBD23 / 52

We want to test the equality of the treatment means

$H_{0} : μ_{1} = μ_{2} = . . . = μ_{a}$ $H_{1} : at least one μ_{i} \neq μ_{j}$

The $i$ th treatment mean can be written as

$μ_{i} = \frac{\sum_{j = 1}^{b} (μ + τ_{i} + β_{j})}{b} = μ + τ_{i} .$ Hence, an equivalent way of writing the above hypothesis

$H_{0} : τ_{1} = τ_{2} = . . . = τ_{a} = 0$ $H_{1} : τ_{i} \neq 0 at least one i$

24 / 52

Notations

$y_{i .}$ - total of all observations taken under treatment $i$

Write the mathematical equation

$y_{. j}$ - total of all observations in block $j$

Write the mathematical equation

$y_{. .}$ - the grand total of all observations

Write the mathematical equation

$N = a b$ be the total number of observations

25 / 52

Notations

${\bar{y}}_{i .}$ - the average of the observations taken under treatment $i$

${\bar{y}}_{. j}$ - the average of the observations in block $j$

${\bar{y}}_{. .}$ - the grand average of all observations

${\bar{y}}_{i .} = \frac{y_{i .}}{b}$

${\bar{y}}_{. j} = \frac{y_{. j}}{a}$

${\bar{y}}_{. .} = \frac{y_{. .}}{N}$

26 / 52

We express the total corrected sum of squares

$\sum_{i = 1}^{a} \sum_{j = 1}^{b} (y_{i j} - {\bar{y}}_{. .})^{2} = \sum_{i = 1}^{a} \sum_{j = 1}^{b} [({\bar{y}}_{i .} - {\bar{y}}_{. .}) + ({\bar{y}}_{. j} - {\bar{y}}_{. .}) + (y_{i j} - {\bar{y}}_{i .} - {\bar{y}}_{. j} + {\bar{y}}_{. .})]^{2}$

Note:

The sum of squares is the sum of the squared values of a variable.
For example the total corrected sum of squares is the sum of the squared values after subtracting (i.e. correcting for) their mean.

27 / 52

Your turn

Show that the above can be simplified into

$\sum_{i = 1}^{a} \sum_{j = 1}^{b} (y_{i j} - {\bar{y}}_{. .})^{2} = b \sum_{i = 1}^{a} ({\bar{y}}_{i .} - {\bar{y}}_{. .})^{2} + a \sum_{j = 1}^{b} ({\bar{y}}_{. j} - {\bar{y}}_{. .})^{2} + \sum_{i = 1}^{a} \sum_{j = 1}^{b} (y_{i j} - {\bar{y}}_{. j} - {\bar{y}}_{i .} + {\bar{y}}_{. .})$

$S S_{T} = S S_{T r e a t m e n t} + S S_{B l o c k} + S S_{E}$

28 / 52

ANOVA Table

Source of variation
Sum of squares (SS)
DF
Mean Square (MS)
F
p-value


Treatments
SSTreatmentsSSTreatments
a−1a−1
MSTreatmentsMSTreatments
F0=MSTreatmentsMSEF0=MSTreatmentsMSE
P(F≥F0)P(F≥F0)


Blocks
SSBlocksSSBlocks
b−1b−1
MSBlocksMSBlocks




Error
SSESSE
(a−1)(b−1)(a−1)(b−1)
MSEMSE



Total
SSTSST
N−1N−1





29 / 52

Source of variation	Sum of squares (SS)	DF	Mean Square (MS)	F	p-value
Treatments	$S S_{T r e a t m e n t s}$	$a - 1$	$M S_{T r e a t m e n t s}$	$F_{0} = \frac{M S_{T r e a t m e n t s}}{M S_{E}}$	$P (F \geq F_{0})$
Blocks	$S S_{B l o c k s}$	$b - 1$	$M S_{B l o c k s}$
Error	$S S_{E}$	$(a - 1) (b - 1)$	$M S_{E}$
Total	$S S_{T}$	$N - 1$

Example

Section 4.1, Montgomery, D. C. (2017). Design and analysis of experiments. John wiley & sons.

Experiment: Hardness testing experiment

We wish to determine whether 4 different tips produce different (mean) hardness reading on a hardness tester.
A hardness testing machine operates by pressing a tip into a metal test “coupon.”
The hardness of the coupon is measured from the depth of the resulting depression. - Response variable
Four tip types are being tested to see if they produce significantly different readings. - Treatment

30 / 52

Example (cont.)

The coupons might differ slightly in their hardness (for example, if they are taken from ingots produced in different heats). - Block
Within each coupon (block) the order in which the four tips were tested was randomly determined.

References: https://web.ma.utexas.edu/users/mks/384Esp08/rcbdexample.pdf

31 / 52

Example (cont.)

tip <- c(1, 2, 3, 4)
coupon1 <- c(9.3, 9.4, 9.2, 9.7)
coupon2 <- c(9.4, 9.3, 9.4, 9.6)
coupon3 <- c(9.6, 9.8, 9.5, 10.0)
coupon4 <- c(10.0, 9.9, 9.7, 10.2)
df <- data.frame(tip=tip, coupon1=coupon1, coupon2=coupon2, coupon3=coupon3, coupon4=coupon4)
df

  tip coupon1 coupon2 coupon3 coupon4
1   1     9.3     9.4     9.6    10.0
2   2     9.4     9.3     9.8     9.9
3   3     9.2     9.4     9.5     9.7
4   4     9.7     9.6    10.0    10.2

32 / 52

summary(df[, 2:5])

    coupon1         coupon2         coupon3          coupon4     
 Min.   :9.200   Min.   :9.300   Min.   : 9.500   Min.   : 9.70  
 1st Qu.:9.275   1st Qu.:9.375   1st Qu.: 9.575   1st Qu.: 9.85  
 Median :9.350   Median :9.400   Median : 9.700   Median : 9.95  
 Mean   :9.400   Mean   :9.425   Mean   : 9.725   Mean   : 9.95  
 3rd Qu.:9.475   3rd Qu.:9.450   3rd Qu.: 9.850   3rd Qu.:10.05  
 Max.   :9.700   Max.   :9.600   Max.   :10.000   Max.   :10.20

33 / 52

library(tidyverse)
df.pl <- df %>% pivot_longer(2:5, "Block", "value")
df.pl

# A tibble: 16 × 3
     tip Block   value
   <dbl> <chr>   <dbl>
 1     1 coupon1   9.3
 2     1 coupon2   9.4
 3     1 coupon3   9.6
 4     1 coupon4  10  
 5     2 coupon1   9.4
 6     2 coupon2   9.3
 7     2 coupon3   9.8
 8     2 coupon4   9.9
 9     3 coupon1   9.2
10     3 coupon2   9.4
11     3 coupon3   9.5
12     3 coupon4   9.7
13     4 coupon1   9.7
14     4 coupon2   9.6
15     4 coupon3  10  
16     4 coupon4  10.2

34 / 52

library(tidyverse)
df.pl$tip <- as.factor(df.pl$tip)
df.block <- df.pl %>% group_by(Block) %>% summarize(mean = mean(value))
df.block

# A tibble: 4 × 2
  Block    mean
  <chr>   <dbl>
1 coupon1  9.4 
2 coupon2  9.43
3 coupon3  9.72
4 coupon4  9.95

df.treatment <- df.pl %>% group_by(tip) %>% summarize(mean = mean(value))
df.treatment

# A tibble: 4 × 2
  tip    mean
  <fct> <dbl>
1 1      9.57
2 2      9.6 
3 3      9.45
4 4      9.88

35 / 52

Block

36 / 52

Treatment

37 / 52

ANOVA

two.way <- aov(value~ tip + Block, data = df.pl)
summary(two.way)

            Df Sum Sq Mean Sq F value   Pr(>F)    
tip          3  0.385 0.12833   14.44 0.000871 ***
Block        3  0.825 0.27500   30.94 4.52e-05 ***
Residuals    9  0.080 0.00889                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

38 / 52

Model Adequacy Checking39 / 52

Residuals

library(broom)
residdf <- augment(two.way)

Warning: Tidiers for objects of class aov are not maintained by the broom team,
and are only supported through the lm tidier method. Please be cautious in
interpreting and reporting broom output.

residdf

# A tibble: 16 × 9
   value tip   Block   .fitted  .resid  .hat .sigma  .cooksd .std.resid
   <dbl> <fct> <chr>     <dbl>   <dbl> <dbl>  <dbl>    <dbl>      <dbl>
 1   9.3 1     coupon1    9.35 -0.0500 0.438 0.0972 5.56e- 2     -0.707
 2   9.4 1     coupon2    9.38  0.0250 0.438 0.0993 1.39e- 2      0.354
 3   9.6 1     coupon3    9.67 -0.0750 0.438 0.0935 1.25e- 1     -1.06 
 4  10   1     coupon4    9.9   0.100  0.438 0.0882 2.22e- 1      1.41 
 5   9.4 2     coupon1    9.38  0.0250 0.437 0.0993 1.39e- 2      0.354
 6   9.3 2     coupon2    9.4  -0.100  0.438 0.0882 2.22e- 1     -1.41 
 7   9.8 2     coupon3    9.7   0.100  0.437 0.0882 2.22e- 1      1.41 
 8   9.9 2     coupon4    9.93 -0.0250 0.437 0.0993 1.39e- 2     -0.354
 9   9.2 3     coupon1    9.22 -0.0250 0.438 0.0993 1.39e- 2     -0.354
10   9.4 3     coupon2    9.25  0.150  0.438 0.0707 5.00e- 1      2.12 
11   9.5 3     coupon3    9.55 -0.0500 0.438 0.0972 5.56e- 2     -0.707
12   9.7 3     coupon4    9.78 -0.0750 0.438 0.0935 1.25e- 1     -1.06 
13   9.7 4     coupon1    9.65  0.0500 0.438 0.0972 5.56e- 2      0.707
14   9.6 4     coupon2    9.68 -0.0750 0.438 0.0935 1.25e- 1     -1.06 
15  10   4     coupon3    9.98  0.0250 0.437 0.0993 1.39e- 2      0.354
16  10.2 4     coupon4   10.2   0      0.438 0.100  2.81e-30      0

40 / 52

The normality assumption

ggplot(residdf, 
       aes(x=.resid))+
  geom_histogram(colour="white")+ggtitle("Distribution of Residuals")

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(residdf, 
       aes(sample=.resid))+
  stat_qq() + stat_qq_line()+labs(x="Theoretical Quantiles", y="Sample Quantiles")

41 / 52

The normality assumption (cont.)

shapiro.test(residdf$.resid)


    Shapiro-Wilk normality test
data:  residdf$.resid
W = 0.93957, p-value = 0.3438

42 / 52

Plot of the residuals vs type

ggplot(data=residdf, aes(x=tip, y=.resid)) + geom_point()

43 / 52

Plot of the residuals vs block

ggplot(data=residdf, aes(x=Block, y=.resid)) + geom_point()

44 / 52

Plot of residuals versus fitted values

To check the assumption of constant variance
Residuals should be structureless. Residuals should not contain any obvious patterns

ggplot(data=residdf, aes(x=.fitted, y=.resid)) + geom_point()

45 / 52

ANOVA - Computing formulas

$S S_{T} = \sum_{i = 1}^{a} \sum_{j = 1}^{b} y_{i j}^{2} - \frac{y_{. .}^{2}}{N}$ $S S_{T r e a t m e n t s} = \frac{1}{b} \sum_{i = 1}^{a} y_{i .}^{2} - \frac{y_{. .}^{2}}{N}$

$S S_{B l o c k s} = \frac{1}{a} \sum_{i = j = 1}^{b} y_{. j}^{2} - \frac{y_{. .}^{2}}{N}$

$S S_{E} = S S_{T} - S S_{T r e a t m e n t s} - S S_{B l o c k s}$

46 / 52

Expected value of mean squares, if treatments and blocks are fixed

$E (M S_{T r e a t m e n t s}) = σ^{2} + \frac{b \sum_{i = 1}^{a} τ_{i}^{2}}{a - 1}$

$E (M S_{B l o c k s}) = σ^{2} + \frac{a \sum_{j = 1}^{b} β_{j}^{2}}{b - 1}$

$E (M S_{E}) = σ^{2}$

47 / 52

Test the equality of treatment means

$F_{0} = \frac{M S_{T r e a t m e n t s}}{M S_{E}}$ which is distributed as $F_{a - 1, (a - 1) (b - 1)}$

Comparing block treatment means

$F_{0} = \frac{M S_{B l o c k s}}{M S_{E}}$ which is distributed as $F_{b - 1, (a - 1) (b - 1)}$

48 / 52

RCBD - Multiple Comparisons

The rejection of the null hypothesis indicates a significant difference in treatment means.
Any of the multiple comparison methods can be used for detecting which means are significantly different. In this course we use, Tukey's method can be used.
In general we are not interested to compare block means.

49 / 52

TukeyHSD(two.way,  "tip")

  Tukey multiple comparisons of means
    95% family-wise confidence level
Fit: aov(formula = value ~ tip + Block, data = df.pl)
$tip
      diff         lwr        upr     p adj
2-1  0.025 -0.18311992 0.23311992 0.9809005
3-1 -0.125 -0.33311992 0.08311992 0.3027563
4-1  0.300  0.09188008 0.50811992 0.0066583
3-2 -0.150 -0.35811992 0.05811992 0.1815907
4-2  0.275  0.06688008 0.48311992 0.0113284
4-3  0.425  0.21688008 0.63311992 0.0006061

plot(TukeyHSD(two.way,  "tip"))

50 / 52

Randomized Incomplete Block Designs

Every treatment is not present in every block

Balanced Incomplete Block Design (BIBD)

All pairs of treatments occur together within a block an equal number of times

51 / 52

Acknowledgement

Some of the slide content is based on

Montgomery, D. C. (2017). Design and analysis of experiments. John wiley & sons.

↑, ←, Pg Up, k	Go to previous slide
↓, →, Pg Dn, Space, j	Go to next slide
Home	Go to first slide
End	Go to last slide
Number + Return	Go to specific slide
b / m / f	Toggle blackout / mirrored / fullscreen mode
c	Clone slideshow
p	Toggle presenter mode
t	Restart the presentation timer
?, h	Toggle this help

STA 225 2.0 Design and Analysis of Experiments

Lecture 7 - 9

Dr Thiyanga S. Talagala

2021-11-16/ 2021-11-19/ 2021-11-26

Completely randomized design

Advantages of CRD

Disadvantages of CRD

Randomized Complete Block Design (RCBD)

Nuisance factor

Example

Example

Example (cont.)

Example (cont.)

RCBD

RCBD

Randomization:

RCBD

Replication

Your turn

Your turn

Which approach do you use to analyze the data? CRD or RCBD

Notations

Statistical Model for the RCBD

Statistical Model for the RCBD

Two constraints

In-class

In-class

Hypothesis - RCBD

Notations

Notations

Your turn

ANOVA Table

Example

Example (cont.)

Example (cont.)

Block

Treatment

ANOVA

Model Adequacy Checking

Residuals

The normality assumption

The normality assumption (cont.)

Plot of the residuals vs type

Plot of the residuals vs block

Plot of residuals versus fitted values

ANOVA - Computing formulas

Expected value of mean squares, if treatments and blocks are fixed

Test the equality of treatment means

Comparing block treatment means

RCBD - Multiple Comparisons

Randomized Incomplete Block Designs

Balanced Incomplete Block Design (BIBD)

Acknowledgement

Completely randomized design

Advantages of CRD

Disadvantages of CRD

Help