Wald's Rules about Coding and Data Management
- Code everything. It's easier than adding information later.
- Computers are smarter than you. Let them do all calculations. If you want age but have asked for year of birth, it's easy to have the computer subtract year of birth from the current year.
- Code all the information you get from a respondent. For example, if you give respondents a laundry list of 10-15 items and ask them to check each that applies, convert each response to a separate variable and code it as checked or not. Never condense by coding only the first three, or any other subset.
- Use different numeric codes for different forms of missing data: a separate code for not applicable, refused, don't know, etc. If you don't, you'll regret it later on.
- Run complete frequencies when you first enter/access the data and check for wild codes.
- Whenever you compute a new variable, run frequencies to make sure it makes sense.
- Be cautious about using zero as a value code. Use it when you have a scale or index with a real zero point or when you have a dummy variable. Otherwise, try to avoid it as a non-missing value.
- Label every printout. By the time you're done with a project, you won't remember what each job did.
- Whenever you change the file (new missing values, new variables, recoded variables), save it under a new name. I find it useful to increment files by number.
- By the same logic, it's a good idea to increment any "command files" you use.
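
Several of the rules above can be sketched in a few lines of Python. This is only an illustration, not a prescribed implementation: the variable names, the checklist items, the negative missing-data codes (-7, -8, -9), the survey year, and the `_vNN` file-naming scheme are all assumptions made up for the example. It shows one variable per checklist item, a computed age, distinct missing codes, frequency runs with a wild-code check, and numbered file names.

```python
import re
from collections import Counter

# Hypothetical codebook: one distinct code per kind of missing data.
# Negative codes are an assumption; they cannot collide with a real age or year.
NOT_APPLICABLE, REFUSED, DONT_KNOW = -7, -8, -9
MISSING_CODES = {NOT_APPLICABLE, REFUSED, DONT_KNOW}

# Hypothetical items from a "check each that applies" question.
CHECKLIST_ITEMS = ["newspaper", "radio", "television"]


def code_record(raw, survey_year):
    """Return a coded copy of one raw questionnaire record."""
    coded = {"id": raw["id"], "sex": raw["sex"], "birth_year": raw["birth_year"]}
    # One variable per checklist item (a dummy variable: 1 = checked, 0 = not).
    for item in CHECKLIST_ITEMS:
        coded[f"src_{item}"] = int(item in raw["checked"])
    # Let the computer derive age; carry missing codes through unchanged.
    if raw["birth_year"] in MISSING_CODES:
        coded["age"] = raw["birth_year"]
    else:
        coded["age"] = survey_year - raw["birth_year"]
    return coded


def frequencies(records, variable):
    """Count how often each code occurs for one variable."""
    return Counter(record[variable] for record in records)


def wild_codes(records, variable, valid):
    """List codes that occur in the data but are not in the codebook."""
    return sorted({record[variable] for record in records} - set(valid))


def next_version(filename):
    """Increment a trailing version number, e.g. survey_v01.csv -> survey_v02.csv."""
    match = re.fullmatch(r"(.*_v)(\d+)(\.\w+)", filename)
    if match is None:
        raise ValueError(f"no version number in {filename!r}")
    stem, number, ext = match.groups()
    return f"{stem}{int(number) + 1:0{len(number)}d}{ext}"


# Made-up raw records; sex is coded 1 or 2, so the 7 below is a wild code.
raw_records = [
    {"id": 1, "birth_year": 1960, "sex": 1, "checked": {"radio", "television"}},
    {"id": 2, "birth_year": REFUSED, "sex": 2, "checked": set()},
    {"id": 3, "birth_year": 1985, "sex": 7, "checked": {"newspaper"}},
]

coded = [code_record(raw, survey_year=2000) for raw in raw_records]

# Run frequencies on the entered data and on every newly computed variable.
for variable in ("sex", "age", "src_radio"):
    print(variable, dict(frequencies(coded, variable)))

print("wild codes for sex:", wild_codes(coded, "sex", valid={1, 2} | MISSING_CODES))
print("save changed file as:", next_version("survey_v01.csv"))
```

The same `next_version` helper could be applied to command files, so that each saved data file can be traced back to the numbered script that produced it.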