Reads and does basic cleaning on the Health Survey for England 2007.

read_2007(
  root = c("X:/", "/Volumes/Shared/")[1],
  file =
    "HAR_PR/PR/Consumption_TA/HSE/Health Survey for England (HSE)/HSE 2007/UKDA-6112-tab/tab/hse07ai.tab",
  select_cols = c("tobalc", "all")[1]
)

Arguments

root

Character string - the root directory. This is the part of the file path that may vary depending on how the network drive is accessed. The default, "X:/", corresponds to the University of Sheffield's X drive in the School of Health and Related Research; the usage above also lists "/Volumes/Shared/" as an alternative. Within the function, root is pasted onto the front of the file path given in the 'file' argument. Thus, if root = NULL, the complete file path must be given in the 'file' argument.
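The way root and file combine can be sketched in base R. This is a minimal illustration, assuming the two arguments are joined with paste0() with no separator added; build_path() is a hypothetical helper, not a function exported by hseclean. Because paste0() drops zero-length arguments, root = NULL leaves file unchanged.

```r
# The default root: c("X:/", "/Volumes/Shared/")[1] evaluates to "X:/"
default_root <- c("X:/", "/Volumes/Shared/")[1]

# Hypothetical helper (not part of hseclean): paste root onto the front of file.
# paste0(NULL, file) returns file unchanged, so root = NULL expects a full path.
build_path <- function(root, file) {
  paste0(root, file)
}

build_path(default_root, "HAR_PR/PR/Consumption_TA/HSE/example.tab")
# "X:/HAR_PR/PR/Consumption_TA/HSE/example.tab"
build_path(NULL, "/full/path/to/hse07ai.tab")
# "/full/path/to/hse07ai.tab"
```

Note that root should end with a path separator ("X:/"), since nothing is inserted between the two strings.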

file

Character string - the file path, including the name and extension of the file. The function has been designed and tested to work with tab-delimited '.tab' files. Files are read by the function [data.table::fread].

select_cols

Character string - select either: "all" - keep all variables in the survey data; or "tobalc" - keep a reduced set of variables associated with tobacco and alcohol consumption, plus a selected set of survey design and socio-demographic variables that are needed for the functions within the hseclean package to work. The default, c("tobalc", "all")[1], evaluates to "tobalc".
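The choice between "all" and "tobalc" amounts to either returning every column or subsetting to a fixed variable list. The sketch below assumes that behaviour; select_vars() and keep_vars_example are hypothetical, and the variable names are invented stand-ins, since the real "tobalc" list is defined inside hseclean and is not reproduced here.

```r
# Hypothetical stand-in for the reduced "tobalc" variable list
keep_vars_example <- c("psu", "cluster", "wt_int", "age", "sex", "smoke_status")

# Hypothetical helper (not part of hseclean): keep all columns or a reduced set
select_vars <- function(data, select_cols = c("tobalc", "all"), keep_vars) {
  select_cols <- match.arg(select_cols)  # defaults to "tobalc"
  if (select_cols == "all") {
    return(data)
  }
  # intersect() keeps listed variables that exist in the data, in list order
  data[, intersect(keep_vars, names(data)), drop = FALSE]
}

# Invented toy data for illustration
toy <- data.frame(psu = 1:2, age = c(40, 55), sex = c(1, 2),
                  smoke_status = c(0, 1), region = c(3, 7))

names(select_vars(toy, "tobalc", keep_vars_example))
# "psu" "age" "sex" "smoke_status"
ncol(select_vars(toy, "all"))
# 5
```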

Value

Returns a data table.

Survey details

The Health Survey for England 2007 was designed to provide data at both national and regional level about the population living in private households in England. The HSE 2007 sample consisted of two components: the core (general population) sample and a boost sample of children aged 2-15. The core sample was designed to be representative of the population living in private households in England and should be used for analyses at the national level. For some modules of the 2007 survey the core sample was split in two; further details are given in Appendix A.

A random sample of 720 PSUs (Primary Sampling Units) was selected for the core and boost samples, and an additional 180 PSUs were used to supplement the child boost sample. The PSUs were selected with probability proportional to the total number of addresses within them. Once selected, the PSUs were randomly allocated to the 12 months of the year (60 per month in the core sample, 15 per month in the additional child boost sample) so that each quarter provided a nationally representative sample.

Within each of the 720 core PSUs, a sample of 36 addresses was selected. The selected addresses were randomly allocated to either the core or the child boost sample: 10 addresses to the core sample and 26 to the child boost sample. In total, therefore, there were 7,200 core addresses (720 x 10) and 18,720 child boost addresses (720 x 26).

For the 180 additional child boost PSUs, a random sample of 41 addresses was selected in each PSU, giving a total sample of 7,380 addresses (180 x 41) for the additional child boost sample. The total child boost sample was thus 26,100 addresses (18,720 from the child boost sample in core points and 7,380 from the additional child boost sample).
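The sample-size arithmetic in the paragraphs above can be checked directly in R. This sketch only restates the figures given in this section; the variable names are illustrative.

```r
# Figures from the sampling description above
core_psus <- 720         # core PSUs (also hold part of the child boost sample)
extra_boost_psus <- 180  # additional child boost PSUs

core_per_psu <- 10         # core addresses per core PSU
boost_per_core_psu <- 26   # child boost addresses per core PSU
boost_per_extra_psu <- 41  # addresses per additional child boost PSU

# Each core PSU had 36 addresses split 10 / 26
stopifnot(core_per_psu + boost_per_core_psu == 36)

core_addresses <- core_psus * core_per_psu                     # 7,200
boost_in_core_psus <- core_psus * boost_per_core_psu           # 18,720
boost_in_extra_psus <- extra_boost_psus * boost_per_extra_psu  # 7,380
total_boost <- boost_in_core_psus + boost_in_extra_psus        # 26,100

# Monthly allocation of PSUs across 12 months
core_psus / 12         # 60
extra_boost_psus / 12  # 15
```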

For the HSE core sample, all adults aged 16 years or older in each household were selected for interview (up to a maximum of ten adults). However, a limit of two was placed on the number of interviews carried out with children aged 0-15: in households with three or more children, interviewers selected two children at random.

At boost addresses interviewers screened for households containing at least one child aged 2-15 years. For households which included eligible children, up to two were selected by the interviewer for inclusion in the survey.

An interview with each eligible person was followed by a nurse visit; both used computer-assisted personal interviewing (CAPI). The 2007 survey for adults focused on lifestyle behaviour, knowledge and attitudes. Adults were asked modules of questions on general health, alcohol consumption, smoking, and fruit and vegetable consumption. Knowledge and attitudes were covered in self-completion questionnaires.

Children aged 13-15 were interviewed themselves, and parents of children aged 0-12 were asked about their children. The child interview included questions on eating habits (fat and sugar consumption) and fruit and vegetable consumption. Only children in the boost sample were asked about physical activity.

Weighting

Individual weight

For analyses at the individual level, the weighting variable to use is (wt_int). These weights are generated separately for adults and children:

  • for adults (aged 16 or more), the interview weights are a combination of the household weight and a component which adjusts the sample to reduce bias from individual nonresponse within households;

  • for children (aged 0 to 15), the weights are generated from the household weights and the child selection weights; the selection weights correct for including a maximum of two children per household. The combined household and child selection weights were adjusted to ensure that the weighted age/sex distribution matched that of all children in co-operating households.

For analysis of children aged 0-15 in both the Core and the Boost sample, taking into account child selection only and not adjusting for non-response, the (wt_child) variable can be used. For analysis of children aged 2-15 in the Boost sample only, the (wt_childb) variable can be used.
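As a minimal sketch of applying the individual weight, the example below computes a weighted proportion with wt_int using base R's weighted.mean(). The outcome variable drinks_weekly and all values are invented for illustration; only the weight name wt_int comes from this documentation.

```r
# Invented toy data: a hypothetical 0/1 outcome and the individual weight wt_int
adults <- data.frame(
  drinks_weekly = c(1, 0, 1, NA),
  wt_int = c(0.5, 1.5, 1.0, 2.0)
)

# Weighted proportion; na.rm = TRUE drops respondents with a missing
# outcome together with their weights
prop_weighted <- weighted.mean(adults$drinks_weekly, adults$wt_int, na.rm = TRUE)
prop_weighted  # 0.5

# Unweighted proportion for comparison
prop_unweighted <- mean(adults$drinks_weekly, na.rm = TRUE)
prop_unweighted  # 0.6666667
```

This gives a point estimate only. Standard errors would also need to account for the clustered design (the PSU and cluster variables), for example with svydesign() from the survey package.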

Missing values

  • -1 Not applicable: used to signify that a particular variable did not apply to a given respondent, usually because of internal routing. For example, men are coded -1 for questions asked only of women.

  • -2 Schedule not applicable: used mainly for self-completion variables when the respondent was outside the relevant age range; also used for children without legal guardians in the home, who could not participate in the nurse schedule.

  • -8 Don't know, Can't say.

  • -9 No answer/Refused.
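Because read_2007() converts all of these codes to NA, the distinction between them is lost after reading. If that distinction matters, it can be tallied on the raw values first. The sketch below is a hypothetical helper, not part of hseclean, and the example variable is invented.

```r
# Missing-value codes as documented above
missing_codes <- c(
  "-1" = "Not applicable",
  "-2" = "Schedule not applicable",
  "-8" = "Don't know, Can't say",
  "-9" = "No answer/Refused"
)

# Hypothetical helper: count each missing code in a raw (uncleaned) variable
tally_missing_codes <- function(x) {
  coded <- as.character(x[as.character(x) %in% names(missing_codes)])
  table(factor(coded, levels = names(missing_codes),
               labels = unname(missing_codes)))
}

raw_var <- c(2, -1, -9, 4, -1, -8, 0)  # invented raw values
counts <- tally_missing_codes(raw_var)
counts
```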

How the data is read and processed

The data is read by the function [data.table::fread]. The 'root' and 'file' arguments are pasted together to form the file path. The following values are converted to NA: c("NA", "", "-1", "-2", "-6", "-7", "-8", "-9", "-90", "-90.0", "-99", "N/A"). All variable names are converted to lower case. The cluster and primary sampling unit (PSU) variables have the year appended to them. Some variables are renamed for consistency with other years.
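The reading steps above can be sketched with base R's read.delim() as a stand-in for data.table::fread (both accept an na.strings argument). This is an illustration under stated assumptions: read_sketch() is hypothetical, the tab-delimited extract and its column names are invented, and the "_2007" suffix format for cluster and PSU is assumed, since the exact format used by read_2007() is not documented here. The variable renaming step is not reproduced.

```r
# Strings converted to NA, as listed above
na_strings <- c("NA", "", "-1", "-2", "-6", "-7", "-8", "-9",
                "-90", "-90.0", "-99", "N/A")

# Invented tab-delimited extract standing in for hse07ai.tab
raw_text <- paste(
  "Cluster\tPSU\tAge\tUnits",
  "101\t5\t34\t12.5",
  "101\t5\t-9\t-1",
  "102\t7\t61\t-90.0",
  sep = "\n"
)

# Hypothetical helper (not part of hseclean)
read_sketch <- function(text, year = 2007) {
  data <- read.delim(text = text, na.strings = na_strings)
  names(data) <- tolower(names(data))  # lower-case variable names
  # Assumed suffix format: append the year so IDs stay unique across years
  data$cluster <- paste0(data$cluster, "_", year)
  data$psu <- paste0(data$psu, "_", year)
  data
}

hse_toy <- read_sketch(raw_text)
hse_toy
```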

Examples


if (FALSE) {

data_2007 <- read_2007("X:/", "ScHARR/PR_Consumption_TA/HSE/HSE 2007/UKDA-6112-tab/tab/hse07ai.tab")

}