# Data

To start with, here is some terminology you need to be familiar with.

A **population** is the whole set of items that are of interest.

A **sampling unit** is an individual unit of a population.

A **sampling frame** is a named or numbered list of the sampling units in the population.

A **quantitative variable** is one associated with numerical observations.

A **qualitative variable** is one associated with non-numerical observations.

A **continuous variable** can take any value within a given range, including decimals.

A **discrete variable** can take only specific values within a given range, e.g. integers.

A **census** measures every member of a population.

A **sample** is a selection of observations taken from a subset of the population.

There are advantages and disadvantages to each type of statistical investigation:

A **census** is entirely accurate (because it measures every sampling unit), but it is time-consuming, cannot be used when testing is destructive (i.e. when an item is destroyed by being tested), and produces a vast amount of data to be processed.

A **sample** is less time-consuming, because fewer items have to be tested and less data is produced. However, it may not be as accurate, and the sample may not be representative of the population.

## Sampling

Broadly speaking, there are two types of sampling: **random** and **non-random**. Each has its own subtypes.

### Random Sampling

In random sampling, each member of a population has an equal chance of being chosen. This means the sample should be unbiased and is more likely to be representative of the population.

**Simple Random Sampling**

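For a simple random sample, a sampling frame is created in which each member of the population is given a number. A random number generator or a lottery is then used to select which numbered members are included, so that every possible sample of the required size has an equal chance of being chosen.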
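
The procedure above can be sketched in Python as follows. This is a minimal illustration, not a prescribed method: the function name `simple_random_sample` and the optional `seed` parameter (used only to make a run repeatable) are hypothetical. The sampling frame is a list, each member is numbered by its position from 1 to N, and Python's `random` module plays the role of the random number generator, drawing distinct numbers without replacement.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw a simple random sample of size n (without replacement).

    frame: the sampling frame, a list of sampling units in a fixed order.
    Members are numbered 1..N by their position in the frame, and n distinct
    numbers are chosen at random, so every sample of size n is equally likely.
    """
    if not 0 <= n <= len(frame):
        raise ValueError("sample size must be between 0 and the population size")
    rng = random.Random(seed)  # hypothetical seed, only for repeatable runs
    # Pick n distinct numbers from 1..N, like drawing numbered tickets in a lottery.
    chosen_numbers = rng.sample(range(1, len(frame) + 1), n)
    # Map each chosen number back to its member of the sampling frame.
    return [frame[k - 1] for k in sorted(chosen_numbers)]


# Usage: a numbered frame of 50 students, sample of 10.
frame = [f"Student {i}" for i in range(1, 51)]
print(simple_random_sample(frame, 10, seed=1))
```

Sorting the chosen numbers is purely cosmetic; it lists the sample in sampling-frame order and does not affect which members are selected.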

Advantages are that it is free from bias, easy and cheap to use on small populations and samples, and the probability of being selected is known.

Disadvantages are that a sampling frame needs to be constructed, and it is impractical when the population or sample is large.

**Systematic Sampling**

For a systematic sample, the required elements are chosen at regular intervals from an ordered list. For example, with a population of 50 and a required sample of 10, the interval is 50 / 10 = 5: use a random number generator to pick a number between 1 and 5 to find the first person, then choose every fifth person after that.
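
The worked example above (population 50, sample 10, interval 5) can be sketched in Python. This is a minimal sketch under a stated assumption: the population size N is taken to be an exact multiple of the sample size n, so the interval k = N / n is a whole number. The function name `systematic_sample` and the `seed` parameter are hypothetical.

```python
import random


def systematic_sample(ordered_list, n, seed=None):
    """Take a systematic sample of size n from an ordered list.

    Assumes the population size N is a positive multiple of n, so the
    sampling interval k = N / n is a whole number.
    """
    N = len(ordered_list)
    if n <= 0 or N < n or N % n != 0:
        raise ValueError("this sketch assumes N is a positive multiple of n")
    k = N // n  # sampling interval, e.g. 50 // 10 = 5
    rng = random.Random(seed)  # hypothetical seed, only for repeatable runs
    start = rng.randint(1, k)  # random starting position between 1 and k
    # Positions start, start + k, start + 2k, ... (1-based): exactly n of them.
    positions = range(start, N + 1, k)
    return [ordered_list[p - 1] for p in positions]


# Usage: population of 50 people numbered 1..50, sample of 10.
population = list(range(1, 51))
print(systematic_sample(population, 10, seed=2))
```

Because the start lies between 1 and k, the last chosen position is at most k + (n - 1)k = N, so the sample always contains exactly n members.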

Advantages include that it is simple and quick to use, and that it is suitable for large samples and populations.