galtonfamiliesall.csv

galtonfamiliesmain.csv

galtonfamiliessub.csv

galtonparentheights.csv

galtonfamiliesnotebook.csv

See also Pearson Height Dataset and Anthropometric Dataset

Description

Francis Galton, a cousin of Charles Darwin, studied the relationship between parent heights and the heights of their offspring. From his original article on regression, cited below: “My data consisted of the heights of 930 [sic] adult children and of their respective parentages, 205 in number. In every case I transmuted the female statures to their corresponding male equivalents and used them in their transmuted form… The factor I used was 1.08, which is equivalent to adding a little less than one-twelfth to each female height. It differs a very little from the factors employed by other anthropologists…”
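The transmutation Galton describes is a single multiplication, which a short sketch can make concrete. The code below is an illustration only, not code from Galton or from the dataset's authors: the function name `transmute`, the constant name, and the `is_female` flag are chosen here for the example. The factor 1.08 comes from the quotation, and the sample heights are those of the first family in the dataset.

```python
GALTON_FACTOR = 1.08  # factor Galton quotes for female statures


def transmute(height, is_female):
    """Return the male-equivalent height in inches.

    Female heights are multiplied by 1.08; male heights are returned unchanged.
    (Hypothetical helper written for this illustration.)
    """
    return height * GALTON_FACTOR if is_female else height


# (Child, Height) pairs for family 1, used as an illustrative sample.
family_1 = [("Son", 73.2), ("Daughter", 69.2), ("Daughter", 69.0), ("Daughter", 69.0)]

transmuted = [round(transmute(h, child == "Daughter"), 2) for child, h in family_1]
print(transmuted)
```

Multiplying by 1.08 adds 8% to a height, slightly less than one-twelfth (about 8.3%), which matches Galton's description. A daughter recorded at 69.0 inches becomes 74.52 inches in transmuted form.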

The galtonfamiliesmain dataset was created under the direction of Dr. James A. Hanley from Galton’s original paper notebooks. The “female statures” in it are in their raw (untransmuted) form. Eight families were left out of it for illustrative purposes; their records (36 rows) are in the galtonfamiliessub dataset. The galtonfamiliesall dataset combines all of the families (934 rows, against 898 in galtonfamiliesmain), and the galtonparentheights dataset contains only the parents’ heights.

Variables—Full and Main Dataset

galtonfamiliesall

Rows: 934
Columns: 6
$ FamilyID <chr> "1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "4", "4", "…
$ Children <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6…
$ Father   <dbl> 78.5, 78.5, 78.5, 78.5, 75.5, 75.5, 75.5, 75.5, 75.0, 75.0, 7…
$ Mother   <dbl> 67.0, 67.0, 67.0, 67.0, 66.5, 66.5, 66.5, 66.5, 64.0, 64.0, 6…
$ Child    <chr> "Son", "Daughter", "Daughter", "Daughter", "Son", "Son", "Dau…
$ Height   <dbl> 73.2, 69.2, 69.0, 69.0, 73.5, 72.5, 65.5, 65.5, 71.0, 68.0, 7…
# A tibble: 6 × 6
  FamilyID Children Father Mother Child    Height
  <chr>       <dbl>  <dbl>  <dbl> <chr>     <dbl>
1 1               4   78.5   67   Son        73.2
2 1               4   78.5   67   Daughter   69.2
3 1               4   78.5   67   Daughter   69  
4 1               4   78.5   67   Daughter   69  
5 2               4   75.5   66.5 Son        73.5
6 2               4   75.5   66.5 Son        72.5

galtonfamiliesmain

Rows: 898
Columns: 6
$ FamilyID <chr> "1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "4", "4", "…
$ Children <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6…
$ Father   <dbl> 78.5, 78.5, 78.5, 78.5, 75.5, 75.5, 75.5, 75.5, 75.0, 75.0, 7…
$ Mother   <dbl> 67.0, 67.0, 67.0, 67.0, 66.5, 66.5, 66.5, 66.5, 64.0, 64.0, 6…
$ Child    <chr> "Son", "Daughter", "Daughter", "Daughter", "Son", "Son", "Dau…
$ Height   <dbl> 73.2, 69.2, 69.0, 69.0, 73.5, 72.5, 65.5, 65.5, 71.0, 68.0, 7…
# A tibble: 6 × 6
  FamilyID Children Father Mother Child    Height
  <chr>       <dbl>  <dbl>  <dbl> <chr>     <dbl>
1 1               4   78.5   67   Son        73.2
2 1               4   78.5   67   Daughter   69.2
3 1               4   78.5   67   Daughter   69  
4 1               4   78.5   67   Daughter   69  
5 2               4   75.5   66.5 Son        73.5
6 2               4   75.5   66.5 Son        72.5
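
For readers working outside R, the following is a minimal sketch of reading rows with this layout using Python's standard `csv` module. It assumes the CSV file is comma-separated with a header row matching the column names shown above; that has not been checked against the actual files. The helper name `read_families` is hypothetical, and the inline sample reproduces the six rows printed above in place of the real file.

```python
import csv
import io

# The six rows printed above, typed in as a stand-in for the real file.
SAMPLE_CSV = """FamilyID,Children,Father,Mother,Child,Height
1,4,78.5,67,Son,73.2
1,4,78.5,67,Daughter,69.2
1,4,78.5,67,Daughter,69
1,4,78.5,67,Daughter,69
2,4,75.5,66.5,Son,73.5
2,4,75.5,66.5,Son,72.5
"""

# Columns shown as <dbl> above; FamilyID and Child stay as text (<chr>).
NUMERIC = ("Children", "Father", "Mother", "Height")


def read_families(handle):
    """Read child-level rows from an open CSV handle, converting numeric columns."""
    rows = []
    for rec in csv.DictReader(handle):
        for col in NUMERIC:
            rec[col] = float(rec[col])
        rows.append(rec)
    return rows


rows = read_families(io.StringIO(SAMPLE_CSV))
son_heights = [r["Height"] for r in rows if r["Child"] == "Son"]
print(len(rows), sum(son_heights) / len(son_heights))
```

To read the real file instead of the sample, pass an open file handle, for example one obtained from `open("galtonfamiliesmain.csv", newline="")`, provided the file matches the assumed layout.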

Variables—Subset Dataset

Rows: 36
Columns: 6
$ FamilyID <dbl> 13, 13, 50, 50, 84, 84, 84, 84, 84, 111, 120, 120, 120, 120, …
$ Children <dbl> 2, 2, 2, 2, 4, 4, 4, 4, 4, 1, 11, 11, 11, 11, 11, 11, 11, 11,…
$ FatherR  <dbl> 13.0, 13.0, 11.0, 11.0, 10.5, 10.5, 10.5, 10.5, 10.5, 9.0, 9.…
$ MotherR  <dbl> 7.0, 7.0, 5.4, 5.4, 3.0, 3.0, 3.0, 3.0, 3.0, 3.5, 2.0, 2.0, 2…
$ Child    <chr> "Son", "Daughter", "Son", "Daughter", "Son", "Son", "Son", "D…
$ HeightR  <dbl> 11.0, 2.0, 13.0, 2.0, 10.0, 8.5, 5.5, 5.5, 3.5, 5.5, 12.0, 10…
# A tibble: 6 × 6
  FamilyID Children FatherR MotherR Child    HeightR
     <dbl>    <dbl>   <dbl>   <dbl> <chr>      <dbl>
1       13        2    13       7   Son         11  
2       13        2    13       7   Daughter     2  
3       50        2    11       5.4 Son         13  
4       50        2    11       5.4 Daughter     2  
5       84        4    10.5     3   Son         10  
6       84        4    10.5     3   Son          8.5

Variables—Main Parents Only

Rows: 205
Columns: 3
$ FamilyID <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
$ Father   <dbl> 78.5, 75.5, 75.0, 75.0, 75.0, 74.0, 74.0, 74.0, 74.5, 74.0, 7…
$ Mother   <dbl> 67.0, 66.5, 64.0, 64.0, 58.5, 68.0, 68.0, 66.5, 66.0, 65.5, 6…
# A tibble: 6 × 3
  FamilyID Father Mother
     <dbl>  <dbl>  <dbl>
1        1   78.5   67  
2        2   75.5   66.5
3        3   75     64  
4        4   75     64  
5        5   75     58.5
6        6   74     68  
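
A parents-only table like this one can be derived from child-level rows by keeping one record per family. The sketch below shows one plausible way to do that; how galtonparentheights was actually constructed is not stated here, and the function name `parents_only` and the sample rows are illustrative.

```python
def parents_only(rows):
    """Collapse child-level rows to one (FamilyID, Father, Mother) tuple per family."""
    seen = {}
    for r in rows:
        # dicts preserve insertion order, so families stay in first-seen order
        seen.setdefault(r["FamilyID"], (r["FamilyID"], r["Father"], r["Mother"]))
    return list(seen.values())


# Child-level sample rows for families 1 and 2 (illustrative).
children = [
    {"FamilyID": "1", "Father": 78.5, "Mother": 67.0, "Child": "Son", "Height": 73.2},
    {"FamilyID": "1", "Father": 78.5, "Mother": 67.0, "Child": "Daughter", "Height": 69.2},
    {"FamilyID": "2", "Father": 75.5, "Mother": 66.5, "Child": "Son", "Height": 73.5},
    {"FamilyID": "2", "Father": 75.5, "Mother": 66.5, "Child": "Son", "Height": 72.5},
]

parents = parents_only(children)
print(parents)
```

Each family contributes exactly one row regardless of how many children it has, so the four sample rows reduce to two parent records.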

References

Galton’s family data on human stature

Galton, Francis. (1886). Regression towards mediocrity in hereditary stature. The Journal of the Anthropological Institute of Great Britain and Ireland, 15, pp. 246-263.