
STA 225 2.0 Design and Analysis of Experiments

Lecture 7 - 9

Dr Thiyanga S. Talagala

2021-11-16/ 2021-11-19/ 2021-11-26

1 / 52

Completely randomized design

Advantages of CRD

  • Easy to design

  • Analysis of data is simple and straightforward

  • Even if some values are missing the analysis can be done

Disadvantages of CRD

  • The design requires a homogeneous set of experimental units

  • If the experimental units are not homogeneous, the error component will be large, which makes the treatment comparison less efficient.

2 / 52

Randomized Complete Block Design (RCBD)

  • Suppose we want to compare r treatments

  • To run a CRD, we need to find $n_1 + n_2 + \dots + n_r$ homogeneous experimental units to which the treatments can be applied.

  • Often it is difficult to find enough homogeneous experimental units.

  • However, it may be possible to find several blocks of experimental units, each with enough homogeneous experimental units to apply all r treatments.

  • RCBD can be used to compare treatment means in such situations even though there are differences between blocks.

3 / 52

Nuisance factor

A factor that affects the response, but whose effect is not of interest to the experimenter.

  • Unknown and uncontrollable: use randomization to balance out its effect.
  • Known and uncontrollable, but measurable: use analysis of covariance (ANCOVA) to adjust for it at the time of the analysis.

  • Known and controllable: use blocking to systematically eliminate its effect on the statistical comparison among treatment means.

4 / 52


Example

We want to test the difference between different fertilizers (A, B, C, D, E, F) on strawberry yield. The experimenter has decided to obtain four observations for each fertilizer type. How many experimental units are required to compare the treatments?

5 / 52

6 / 52

Example (cont.)

The experiment is run at 4 different locations having 6 different plots of land each. Hence, a block is given by a location and an experimental unit by a plot of land.

7 / 52

Example (cont.)

  • randomized complete block design (RCBD)

  • the experimental units are homogeneous within a block

  • within each block, the 6 treatments are randomly assigned to 6 experimental units

  • each block (location) contains all the treatments (fertilizers)

  • the design is called complete because we see the complete set of treatments within every block

8 / 52

RCBD

In-class diagram

9 / 52

RCBD

Randomization:

  • Number the a treatments 1, 2, ..., a.

  • First form the homogeneous blocks of the experimental units. Then allocate each treatment randomly in each block.

    • Number the units in each block as 1,2,...,a.
    • Randomly allocate the a treatments to a experimental units in each block
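
The randomization steps above can be sketched in R. This is only a minimal illustration using the fertilizer example (6 treatments, 4 locations); the object names and the seed are assumptions made for the sketch, not part of the lecture.

```r
# Minimal sketch: randomize a treatments within each of b blocks (RCBD).
# Treatment labels follow the fertilizer example; the seed is arbitrary.
set.seed(225)
treatments <- c("A", "B", "C", "D", "E", "F")  # a = 6 treatments
blocks <- paste0("Location", 1:4)              # b = 4 blocks

# For each block, draw an independent random permutation of the treatments.
# Row k of a column is the treatment applied to plot k in that block.
layout <- sapply(blocks, function(bl) sample(treatments))
rownames(layout) <- paste0("Plot", seq_along(treatments))
layout
```

Each column of `layout` is one block, and every column contains each treatment exactly once, which is what makes the design complete.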
10 / 52

RCBD

Replication

Since each block contains all the treatments, every treatment appears once in every block. Each treatment is therefore replicated as many times as there are blocks. Hence, in an RCBD, the number of blocks and the number of replications are the same.

11 / 52

Your turn

Question: How do the 4 diets (A, B, C, and D) affect blood coagulation in rabbits?

Treatment: Diet

Factor levels: A, B, C, D

Response: Time in seconds that it takes for a cut to stop bleeding (coagulation time).

Experimental units: 16 rabbits

Replicates: 4

The 16 rabbits are similar in age, weight, and height.

12 / 52

Your turn

Which approach do you use to analyze the data? CRD or RCBD

13 / 52

Have a jar with the letters A, B, C, D written on separate slips. Catch a rabbit, pick a slip at random from the jar, and assign the rabbit to the diet letter on the slip. Do not replace the slip. Catch the second rabbit and select another slip from the remaining three slips. Assign that treatment to the second rabbit. Continue until the first four rabbits are each assigned one of the four diets. Replace the slips and repeat the procedure until all rabbits are assigned to a diet.

Which approach do you use to analyze the data? CRD or RCBD

source: here
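
To see what the jar procedure does to the allocation, it can be simulated and set beside a fully unrestricted shuffle of the 16 diet labels. This is a rough sketch only: the catch order 1 to 16, the seed, and the object names are assumptions made for illustration.

```r
# Minimal sketch contrasting two ways of assigning 4 diets to 16 rabbits.
# Rabbit numbers 1..16 stand for the order in which rabbits are caught.
set.seed(1)
diets <- c("A", "B", "C", "D")

# (1) Unrestricted randomization: shuffle all 16 diet labels at once.
unrestricted <- sample(rep(diets, times = 4))

# (2) Jar procedure: draw the 4 slips without replacement, then refill,
#     so every consecutive group of 4 rabbits receives all 4 diets.
jar <- unlist(lapply(1:4, function(g) sample(diets)))

data.frame(rabbit = 1:16,
           group = rep(1:4, each = 4),
           unrestricted = unrestricted,
           jar = jar)
```

Under the jar procedure each group of four consecutive rabbits always contains all four diets, whereas the unrestricted shuffle only guarantees four rabbits per diet overall.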

14 / 52

Notations

              Block 1    Block 2    ...   Block b
Treatment 1   $y_{11}$   $y_{12}$   ...   $y_{1b}$
Treatment 2   $y_{21}$   $y_{22}$   ...   $y_{2b}$
...
Treatment a   $y_{a1}$   $y_{a2}$   ...   $y_{ab}$

Number of treatments: a

Number of blocks: b

15 / 52

"There is one observation per treatment in each block, and the order in which the treatments are run within each block is determined randomly. Because the only randomization of treatment is within the blocks, we often say that the blocks represent a restriction on randomization." (Montgomery, Design and Analysis of Experiments, 2001)

16 / 52

Statistical Model for the RCBD

1. Mean model

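$$Y_{ij} = \mu_{ij} + \epsilon_{ij}, \quad \begin{cases} i = 1, 2, \dots, a \\ j = 1, 2, \dots, b \end{cases}$$

$a$ - number of treatments

$b$ - number of blocks

$\mu_{ij}$ - mean of the $i$th factor level (treatment) in the $j$th block

$\epsilon_{ij}$ - random error (the $\epsilon_{ij}$ are assumed to be independent and $N(0, \sigma^2)$)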

17 / 52

Let

$\mu$ - overall mean,

$\tau_i$ - effect of the $i$th treatment,

$\beta_j$ - effect of the $j$th block.

Then, $\mu_{ij} = \mu + \tau_i + \beta_j$.

18 / 52

Statistical Model for the RCBD

2. Effects model

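$$Y_{ij} = \mu + \tau_i + \beta_j + \epsilon_{ij}, \quad \begin{cases} i = 1, 2, \dots, a \\ j = 1, 2, \dots, b \end{cases}$$

$a$ - number of treatments

$b$ - number of blocks

$\mu$ - overall mean

$\tau_i$ - effect of the $i$th treatment

$\beta_j$ - effect of the $j$th block

$\epsilon_{ij}$ - random error, normally and independently distributed, $NID(0, \sigma^2)$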
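
To make the effects model concrete, one data set can be simulated from it. The sketch below is illustrative only: the values of the overall mean, the treatment and block effects, and the error standard deviation are all made up, and the effects are chosen to sum to zero.

```r
# Minimal sketch: simulate one data set from the RCBD effects model
# Y_ij = mu + tau_i + beta_j + eps_ij. All numeric values are made up.
set.seed(42)
a <- 4; b <- 5
mu <- 10
tau <- c(-1, 0, 0.5, 0.5)     # treatment effects, sum to zero
beta <- c(-2, -1, 0, 1, 2)    # block effects, sum to zero
sigma <- 0.5                  # error standard deviation

# One row per (treatment, block) combination: one observation per cell.
sim <- expand.grid(treatment = factor(1:a), block = factor(1:b))
sim$y <- mu + tau[as.integer(sim$treatment)] + beta[as.integer(sim$block)] +
  rnorm(a * b, mean = 0, sd = sigma)
head(sim)
```

A data frame in this long format can be passed directly to `aov(y ~ treatment + block, data = sim)`.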

19 / 52

Two constraints

Treatment effects and block effects are deviations from the overall mean. Hence

$\sum_{i=1}^{a} \tau_i = 0$ and

$\sum_{j=1}^{b} \beta_j = 0$

20 / 52

In-class

In CRD

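$$Y_{ij} = \mu + \tau_i + \epsilon_{ij}, \quad \begin{cases} i = 1, 2, \dots, a \\ j = 1, 2, \dots, n \end{cases}$$

$\tau_i = \mu_i - \mu$, $i = 1, 2, \dots, a$

$\frac{1}{a}\sum_{i=1}^{a} \mu_i = \mu$

This definition implies

$\sum_{i=1}^{a} \tau_i = 0$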
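
The implication can be written out in one line. The following is a short sketch of the argument, assuming each treatment effect is defined as the deviation of its treatment mean from the overall mean, $\tau_i = \mu_i - \mu$, and that $\mu$ is the average of the $\mu_i$:

```latex
\sum_{i=1}^{a} \tau_i
  = \sum_{i=1}^{a} (\mu_i - \mu)
  = \sum_{i=1}^{a} \mu_i - a\mu
  = a\mu - a\mu
  = 0
```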

21 / 52

In-class

22 / 52

Hypothesis - RCBD

23 / 52

We want to test the equality of the treatment means

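$$H_0: \mu_1 = \mu_2 = \dots = \mu_a \qquad H_1: \text{at least one } \mu_i \neq \mu_j$$

The $i$th treatment mean can be written as

$$\mu_i = \frac{1}{b}\sum_{j=1}^{b} (\mu + \tau_i + \beta_j) = \mu + \tau_i,$$

because $\sum_{j=1}^{b} \beta_j = 0$. Hence, an equivalent way of writing the above hypotheses is

$$H_0: \tau_1 = \tau_2 = \dots = \tau_a = 0 \qquad H_1: \tau_i \neq 0 \text{ for at least one } i$$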

24 / 52

Notations

$y_{i.}$ - total of all observations taken under treatment $i$

Write the mathematical equation

$y_{.j}$ - total of all observations in block $j$

Write the mathematical equation

$y_{..}$ - the grand total of all observations

Write the mathematical equation

$N = ab$ - the total number of observations
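
One way to write these totals in summation notation, offered here as a sketch for checking your own answers, is:

```latex
y_{i.} = \sum_{j=1}^{b} y_{ij}, \qquad
y_{.j} = \sum_{i=1}^{a} y_{ij}, \qquad
y_{..} = \sum_{i=1}^{a} \sum_{j=1}^{b} y_{ij}
```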

25 / 52

Notations

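$\bar{y}_{i.}$ - the average of the observations taken under treatment $i$

$\bar{y}_{.j}$ - the average of the observations in block $j$

$\bar{y}_{..}$ - the grand average of all observations

$$\bar{y}_{i.} = \frac{y_{i.}}{b}$$

$$\bar{y}_{.j} = \frac{y_{.j}}{a}$$

$$\bar{y}_{..} = \frac{y_{..}}{N}$$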
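
In R, these totals and averages correspond to row and column sums of the data laid out with treatments in rows and blocks in columns. The sketch below uses a small made-up matrix purely for illustration; the values and object names are assumptions.

```r
# Minimal sketch: totals and averages for a small made-up RCBD layout.
# Rows are treatments (i = 1..a), columns are blocks (j = 1..b).
y <- matrix(c(5, 7, 6,
              8, 9, 10),
            nrow = 2, byrow = TRUE,
            dimnames = list(paste0("trt", 1:2), paste0("block", 1:3)))
a <- nrow(y); b <- ncol(y); N <- a * b

trt_totals   <- rowSums(y)   # y_i.
block_totals <- colSums(y)   # y_.j
grand_total  <- sum(y)       # y_..

trt_means   <- trt_totals / b     # ybar_i.
block_means <- block_totals / a   # ybar_.j
grand_mean  <- grand_total / N    # ybar_..
```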

26 / 52

We express the total corrected sum of squares

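$$\sum_{i=1}^{a}\sum_{j=1}^{b}(y_{ij}-\bar{y}_{..})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b}\left[(\bar{y}_{i.}-\bar{y}_{..}) + (\bar{y}_{.j}-\bar{y}_{..}) + (y_{ij}-\bar{y}_{i.}-\bar{y}_{.j}+\bar{y}_{..})\right]^2$$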

Note:

  • The sum of squares is the sum of the squared values of a variable.

  • For example the total corrected sum of squares is the sum of the squared values after subtracting (i.e. correcting for) their mean.

27 / 52

Your turn

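$$\sum_{i=1}^{a}\sum_{j=1}^{b}(y_{ij}-\bar{y}_{..})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b}\left[(\bar{y}_{i.}-\bar{y}_{..}) + (\bar{y}_{.j}-\bar{y}_{..}) + (y_{ij}-\bar{y}_{i.}-\bar{y}_{.j}+\bar{y}_{..})\right]^2$$

Show that the above can be simplified into

$$\sum_{i=1}^{a}\sum_{j=1}^{b}(y_{ij}-\bar{y}_{..})^2 = b\sum_{i=1}^{a}(\bar{y}_{i.}-\bar{y}_{..})^2 + a\sum_{j=1}^{b}(\bar{y}_{.j}-\bar{y}_{..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{b}(y_{ij}-\bar{y}_{i.}-\bar{y}_{.j}+\bar{y}_{..})^2$$

$$SS_T = SS_{\text{Treatments}} + SS_{\text{Blocks}} + SS_E$$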
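
Before proving the identity algebraically, it can be checked numerically. The sketch below uses arbitrary simulated data (3 treatments, 4 blocks); the data and object names are assumptions made for illustration and are not from the lecture.

```r
# Minimal sketch: numerical check of SS_T = SS_Treatments + SS_Blocks + SS_E
# on arbitrary made-up data (a = 3 treatments, b = 4 blocks).
set.seed(7)
a <- 3; b <- 4
y <- matrix(rnorm(a * b, mean = 20, sd = 3), nrow = a)  # rows = treatments

ybar_i <- rowMeans(y)   # treatment averages
ybar_j <- colMeans(y)   # block averages
ybar   <- mean(y)       # grand average

ss_total <- sum((y - ybar)^2)
ss_trt   <- b * sum((ybar_i - ybar)^2)
ss_block <- a * sum((ybar_j - ybar)^2)

# Residual term: y_ij - ybar_i. - ybar_.j + ybar_..
resid    <- y - outer(ybar_i, ybar_j, "+") + ybar
ss_error <- sum(resid^2)

all.equal(ss_total, ss_trt + ss_block + ss_error)
```

The last line compares the two sides up to floating-point tolerance. A numerical check like this does not replace the algebraic proof, in which the cross-product terms vanish.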

28 / 52

ANOVA Table

Source of variation   Sum of squares (SS)       DF             Mean Square (MS)          F                                          p-value
Treatments            $SS_{\text{Treatments}}$  $a-1$          $MS_{\text{Treatments}}$  $F_0 = MS_{\text{Treatments}} / MS_E$      $P(F > F_0)$
Blocks                $SS_{\text{Blocks}}$      $b-1$          $MS_{\text{Blocks}}$
Error                 $SS_E$                    $(a-1)(b-1)$   $MS_E$
Total                 $SS_T$                    $N-1$
29 / 52

Example

Section 4.1, Montgomery, D. C. (2017). Design and analysis of experiments. John Wiley & Sons.

Experiment: Hardness testing experiment

  • We wish to determine whether 4 different tips produce different (mean) hardness readings on a hardness tester.

  • A hardness testing machine operates by pressing a tip into a metal test “coupon.”

  • The hardness of the coupon is measured from the depth of the resulting depression (the response variable).

  • Four tip types are being tested to see if they produce significantly different readings (the treatments).

30 / 52

Example (cont.)

  • The coupons might differ slightly in their hardness (for example, if they are taken from ingots produced in different heats). - Block

  • Within each coupon (block) the order in which the four tips were tested was randomly determined.

References: https://web.ma.utexas.edu/users/mks/384Esp08/rcbdexample.pdf

31 / 52

Example (cont.)

tip <- c(1, 2, 3, 4)
coupon1 <- c(9.3, 9.4, 9.2, 9.7)
coupon2 <- c(9.4, 9.3, 9.4, 9.6)
coupon3 <- c(9.6, 9.8, 9.5, 10.0)
coupon4 <- c(10.0, 9.9, 9.7, 10.2)
df <- data.frame(tip=tip, coupon1=coupon1, coupon2=coupon2, coupon3=coupon3, coupon4=coupon4)
df
  tip coupon1 coupon2 coupon3 coupon4
1   1     9.3     9.4     9.6    10.0
2   2     9.4     9.3     9.8     9.9
3   3     9.2     9.4     9.5     9.7
4   4     9.7     9.6    10.0    10.2
32 / 52
summary(df[, 2:5])
    coupon1         coupon2         coupon3          coupon4
 Min.   :9.200   Min.   :9.300   Min.   : 9.500   Min.   : 9.70
 1st Qu.:9.275   1st Qu.:9.375   1st Qu.: 9.575   1st Qu.: 9.85
 Median :9.350   Median :9.400   Median : 9.700   Median : 9.95
 Mean   :9.400   Mean   :9.425   Mean   : 9.725   Mean   : 9.95
 3rd Qu.:9.475   3rd Qu.:9.450   3rd Qu.: 9.850   3rd Qu.:10.05
 Max.   :9.700   Max.   :9.600   Max.   :10.000   Max.   :10.20
33 / 52
library(tidyverse)
df.pl <- df %>% pivot_longer(cols = 2:5, names_to = "Block", values_to = "value")
df.pl
# A tibble: 16 × 3
tip Block value
<dbl> <chr> <dbl>
1 1 coupon1 9.3
2 1 coupon2 9.4
3 1 coupon3 9.6
4 1 coupon4 10
5 2 coupon1 9.4
6 2 coupon2 9.3
7 2 coupon3 9.8
8 2 coupon4 9.9
9 3 coupon1 9.2
10 3 coupon2 9.4
11 3 coupon3 9.5
12 3 coupon4 9.7
13 4 coupon1 9.7
14 4 coupon2 9.6
15 4 coupon3 10
16 4 coupon4 10.2
34 / 52
library(tidyverse)
df.pl$tip <- as.factor(df.pl$tip)
df.block <- df.pl %>% group_by(Block) %>% summarize(mean = mean(value))
df.block
# A tibble: 4 × 2
Block mean
<chr> <dbl>
1 coupon1 9.4
2 coupon2 9.43
3 coupon3 9.72
4 coupon4 9.95
df.treatment <- df.pl %>% group_by(tip) %>% summarize(mean = mean(value))
df.treatment
# A tibble: 4 × 2
tip mean
<fct> <dbl>
1 1 9.57
2 2 9.6
3 3 9.45
4 4 9.88
35 / 52

Block

36 / 52

Treatment

37 / 52

ANOVA

two.way <- aov(value~ tip + Block, data = df.pl)
summary(two.way)
            Df Sum Sq Mean Sq F value   Pr(>F)
tip          3  0.385 0.12833   14.44 0.000871 ***
Block        3  0.825 0.27500   30.94 4.52e-05 ***
Residuals    9  0.080 0.00889
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
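
To see what blocking contributes here, the same data can be refitted without the block term, as if the experiment had been run as a CRD. This comparison is a sketch added for illustration and is not part of the original slides; the data frame `hard` simply re-enters the hardness values so the block runs on its own.

```r
# Minimal sketch: refit the hardness data without the block term to see
# how much of the variation the coupons (blocks) absorb.
hard <- data.frame(
  tip   = factor(rep(1:4, each = 4)),
  Block = factor(rep(paste0("coupon", 1:4), times = 4)),
  value = c(9.3, 9.4, 9.6, 10.0,
            9.4, 9.3, 9.8, 9.9,
            9.2, 9.4, 9.5, 9.7,
            9.7, 9.6, 10.0, 10.2)
)

with_block    <- aov(value ~ tip + Block, data = hard)
without_block <- aov(value ~ tip, data = hard)   # treats the data as a CRD

summary(with_block)
summary(without_block)
```

Without the block term, the block sum of squares is pooled into the residuals (0.825 + 0.080 = 0.905 on 12 degrees of freedom), so the F statistic for tip drops to roughly 1.70 and the tip effect is no longer significant at the 5% level.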
38 / 52

Model Adequacy Checking

39 / 52

Residuals

library(broom)
residdf <- augment(two.way)
Warning: Tidiers for objects of class aov are not maintained by the broom team,
and are only supported through the lm tidier method. Please be cautious in
interpreting and reporting broom output.
residdf
# A tibble: 16 × 9
value tip Block .fitted .resid .hat .sigma .cooksd .std.resid
<dbl> <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 9.3 1 coupon1 9.35 -0.0500 0.438 0.0972 5.56e- 2 -0.707
2 9.4 1 coupon2 9.38 0.0250 0.438 0.0993 1.39e- 2 0.354
3 9.6 1 coupon3 9.67 -0.0750 0.438 0.0935 1.25e- 1 -1.06
4 10 1 coupon4 9.9 0.100 0.438 0.0882 2.22e- 1 1.41
5 9.4 2 coupon1 9.38 0.0250 0.437 0.0993 1.39e- 2 0.354
6 9.3 2 coupon2 9.4 -0.100 0.438 0.0882 2.22e- 1 -1.41
7 9.8 2 coupon3 9.7 0.100 0.437 0.0882 2.22e- 1 1.41
8 9.9 2 coupon4 9.93 -0.0250 0.437 0.0993 1.39e- 2 -0.354
9 9.2 3 coupon1 9.22 -0.0250 0.438 0.0993 1.39e- 2 -0.354
10 9.4 3 coupon2 9.25 0.150 0.438 0.0707 5.00e- 1 2.12
11 9.5 3 coupon3 9.55 -0.0500 0.438 0.0972 5.56e- 2 -0.707
12 9.7 3 coupon4 9.78 -0.0750 0.438 0.0935 1.25e- 1 -1.06
13 9.7 4 coupon1 9.65 0.0500 0.438 0.0972 5.56e- 2 0.707
14 9.6 4 coupon2 9.68 -0.0750 0.438 0.0935 1.25e- 1 -1.06
15 10 4 coupon3 9.98 0.0250 0.437 0.0993 1.39e- 2 0.354
16 10.2 4 coupon4 10.2 0 0.438 0.100 2.81e-30 0
40 / 52

The normality assumption

ggplot(residdf,
aes(x=.resid))+
geom_histogram(colour="white")+ggtitle("Distribution of Residuals")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(residdf,
aes(sample=.resid))+
stat_qq() + stat_qq_line()+labs(x="Theoretical Quantiles", y="Sample Quantiles")

41 / 52

The normality assumption (cont.)

shapiro.test(residdf$.resid)
Shapiro-Wilk normality test
data: residdf$.resid
W = 0.93957, p-value = 0.3438
42 / 52

Plot of the residuals vs tip type

ggplot(data=residdf, aes(x=tip, y=.resid)) + geom_point()

43 / 52

Plot of the residuals vs block

ggplot(data=residdf, aes(x=Block, y=.resid)) + geom_point()

44 / 52

Plot of residuals versus fitted values

  • To check the assumption of constant variance

  • Residuals should be structureless. Residuals should not contain any obvious patterns

ggplot(data=residdf, aes(x=.fitted, y=.resid)) + geom_point()

45 / 52

ANOVA - Computing formulas

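$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij}^2 - \frac{y_{..}^2}{N}$$

$$SS_{\text{Treatments}} = \frac{1}{b}\sum_{i=1}^{a} y_{i.}^2 - \frac{y_{..}^2}{N}$$

$$SS_{\text{Blocks}} = \frac{1}{a}\sum_{j=1}^{b} y_{.j}^2 - \frac{y_{..}^2}{N}$$

$$SS_E = SS_T - SS_{\text{Treatments}} - SS_{\text{Blocks}}$$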
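
The computing formulas can be applied directly to the hardness data from the example. The following is a minimal sketch; the matrix layout (tips in rows, coupons in columns) and the object names are choices made for this illustration.

```r
# Minimal sketch: apply the computing formulas to the hardness data.
# Rows are tips (treatments), columns are coupons (blocks).
y <- rbind(c(9.3, 9.4, 9.6, 10.0),
           c(9.4, 9.3, 9.8, 9.9),
           c(9.2, 9.4, 9.5, 9.7),
           c(9.7, 9.6, 10.0, 10.2))
a <- nrow(y); b <- ncol(y); N <- a * b
correction <- sum(y)^2 / N                    # y..^2 / N

ss_total <- sum(y^2) - correction
ss_trt   <- sum(rowSums(y)^2) / b - correction
ss_block <- sum(colSums(y)^2) / a - correction
ss_error <- ss_total - ss_trt - ss_block

round(c(SST = ss_total, SSTreatments = ss_trt,
        SSBlocks = ss_block, SSE = ss_error), 3)
```

The treatment, block, and error sums of squares agree with the Sum Sq column of the `aov` output for this example (0.385, 0.825, and 0.080).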

46 / 52

Expected value of mean squares, if treatments and blocks are fixed

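$$E(MS_{\text{Treatments}}) = \sigma^2 + \frac{b\sum_{i=1}^{a}\tau_i^2}{a-1}$$

$$E(MS_{\text{Blocks}}) = \sigma^2 + \frac{a\sum_{j=1}^{b}\beta_j^2}{b-1}$$

$$E(MS_E) = \sigma^2$$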

47 / 52

Test the equality of treatment means

$F_0 = \frac{MS_{\text{Treatments}}}{MS_E}$, which is distributed as $F_{a-1,\,(a-1)(b-1)}$ if the null hypothesis is true.

Comparing block means

$F_0 = \frac{MS_{\text{Blocks}}}{MS_E}$, which is compared with $F_{b-1,\,(a-1)(b-1)}$.
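
As a worked instance of these test statistics, the F ratios and p-values for the hardness example can be recomputed from the sums of squares in its ANOVA table. This is a minimal sketch; the object names are illustrative.

```r
# Minimal sketch: F statistics and p-values from the sums of squares
# reported in the hardness ANOVA table (a = 4 tips, b = 4 coupons).
a <- 4; b <- 4
ss_trt <- 0.385; ss_block <- 0.825; ss_error <- 0.080

ms_trt   <- ss_trt / (a - 1)
ms_block <- ss_block / (b - 1)
ms_error <- ss_error / ((a - 1) * (b - 1))

f_trt   <- ms_trt / ms_error
f_block <- ms_block / ms_error

# Upper-tail probabilities P(F > F0) under the null hypothesis.
p_trt   <- pf(f_trt, a - 1, (a - 1) * (b - 1), lower.tail = FALSE)
p_block <- pf(f_block, b - 1, (a - 1) * (b - 1), lower.tail = FALSE)
c(F_tip = f_trt, p_tip = p_trt, F_block = f_block, p_block = p_block)
```

The results should match the F value and Pr(>F) columns printed by `summary(two.way)` for the same data.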

48 / 52

RCBD - Multiple Comparisons

  • Rejecting the null hypothesis indicates that at least one pair of treatment means differs significantly.

  • Any of the multiple comparison methods can be used to detect which means are significantly different. In this course we use Tukey's method.

  • In general, we are not interested in comparing block means.

49 / 52
TukeyHSD(two.way, "tip")
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = value ~ tip + Block, data = df.pl)
$tip
      diff         lwr        upr     p adj
2-1  0.025 -0.18311992 0.23311992 0.9809005
3-1 -0.125 -0.33311992 0.08311992 0.3027563
4-1  0.300  0.09188008 0.50811992 0.0066583
3-2 -0.150 -0.35811992 0.05811992 0.1815907
4-2  0.275  0.06688008 0.48311992 0.0113284
4-3  0.425  0.21688008 0.63311992 0.0006061
plot(TukeyHSD(two.way, "tip"))
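
The half-width of these intervals can be reproduced by hand from the studentized range distribution. The sketch below assumes the mean square error from the ANOVA table for this example (0.080 on 9 degrees of freedom) and uses base R's `qtukey()`; the object names are illustrative.

```r
# Minimal sketch: reproduce the half-width of the Tukey intervals by hand.
a <- 4; b <- 4
df_error <- (a - 1) * (b - 1)
ms_error <- 0.080 / df_error             # MSE from the ANOVA table

# Studentized range quantile for a means and df_error degrees of freedom.
q_crit <- qtukey(0.95, nmeans = a, df = df_error)
hsd <- q_crit * sqrt(ms_error / b)       # each tip mean averages b values
hsd
```

Two tip means differ significantly at the 5% family-wise level when their absolute difference exceeds this value, which is about 0.208 here and equals `upr - diff` in the output above.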

50 / 52

Randomized Incomplete Block Designs

Not every treatment is present in every block.

Balanced Incomplete Block Design (BIBD)

Every pair of treatments occurs together within a block the same number of times.
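
The balance property can be checked by counting how often each pair of treatments shares a block. The design below (4 treatments in 4 blocks of size 3) is a small example chosen for this sketch and does not come from the lecture.

```r
# Minimal sketch: check the balance property of a small incomplete block
# design. The design (4 treatments, 4 blocks of size 3) is illustrative.
blocks <- list(c(1, 2, 3), c(1, 2, 4), c(1, 3, 4), c(2, 3, 4))
n_trt <- 4

# Incidence matrix: rows = treatments, columns = blocks (1 if present).
incidence <- sapply(blocks, function(bl) as.integer(seq_len(n_trt) %in% bl))

# Entry (i, k) of the product counts the blocks containing both i and k;
# the diagonal counts how many blocks contain each treatment.
concurrence <- incidence %*% t(incidence)
concurrence

# The design is balanced if every off-diagonal count is the same.
lambda <- concurrence[upper.tri(concurrence)]
length(unique(lambda)) == 1
```

In this design each treatment appears in 3 blocks and every pair of treatments appears together in 2 blocks, so the design is balanced even though no block contains all 4 treatments.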

51 / 52

Acknowledgement

Some of the slide content is based on

Montgomery, D. C. (2017). Design and analysis of experiments. John Wiley & Sons.

52 / 52
