Wald's Rules about Coding and Data Management

 

  1. Code everything. It's easier than adding information later.

  2. Computers are smarter than you. Let them do all calculations. If you want age but have asked for year of birth, it's easy to have the computer subtract year of birth from the current year.

  3. Code all the information you get from a respondent. For example, if you give respondents a laundry list of 10-15 items and ask them to check each that applies, convert each item to a separate variable and code it as checked or not. Never condense by coding only the first three items checked, or any other subset.

  4. Use different numeric codes for different forms of missing data: a separate code for not applicable, refused, don't know, etc. If you don't, you'll regret it later on.

  5. Run complete frequencies when you first enter or access the data and check for wild codes (values that are not legal codes for the variable).

  6. Whenever you compute a new variable, run frequencies to make sure it makes sense.

  7. Be cautious about using zero as a value code. Use it when you have a scale or index with a real zero point or when you have a dummy variable. Otherwise, try to avoid it as a non‑missing value.

  8. Label every printout. By the time you're done with a project, you won't remember what each job did.

  9. Whenever you change the file (new missing values, new variables, recoded variables), save it under a new name. I find it useful to increment files by number.

  10. By the same logic, it's a good idea to increment any "command files" you use.
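
Rules 2 through 6 can be sketched in a few lines of Python. This is only an illustration, not a prescribed workflow: the missing-data codes (-7, -8, -9), the checklist items, the variable names, and the survey year are all hypothetical, and the sketch subtracts year of birth from an assumed survey year rather than from today's date. It uses only the standard library.

```python
from collections import Counter

# Hypothetical missing-data codes (rule 4): one distinct code per reason.
NOT_APPLICABLE = -7
REFUSED = -8
DONT_KNOW = -9
MISSING_CODES = {NOT_APPLICABLE, REFUSED, DONT_KNOW}

# Hypothetical checklist items (rule 3).
CHECKLIST_ITEMS = ["tv", "radio", "newspaper", "internet"]


def compute_age(birth_year, survey_year):
    """Rule 2: let the computer derive age; pass missing codes through unchanged."""
    if birth_year in MISSING_CODES:
        return birth_year
    return survey_year - birth_year


def expand_checklist(checked):
    """Rule 3: one variable per item, coded 1 = checked, 0 = not checked."""
    return {f"src_{item}": int(item in checked) for item in CHECKLIST_ITEMS}


def frequencies(records, variable):
    """Rules 5 and 6: count how often each code of a variable occurs."""
    return Counter(record[variable] for record in records)


def wild_codes(freqs, valid_codes):
    """Return the codes in a frequency table that are not legal values."""
    return sorted(code for code in freqs if code not in valid_codes)


# Made-up raw responses; the 7 in the third row stands in for a keying error.
raw = [
    {"id": 1, "birth_year": 1980, "sex": 1, "checked": {"tv", "radio"}},
    {"id": 2, "birth_year": REFUSED, "sex": 2, "checked": set()},
    {"id": 3, "birth_year": 1995, "sex": 7, "checked": {"internet"}},
]

SURVEY_YEAR = 2020  # hypothetical year the data were collected

records = []
for row in raw:
    record = {"id": row["id"], "sex": row["sex"], "birth_year": row["birth_year"]}
    record["age"] = compute_age(row["birth_year"], SURVEY_YEAR)
    record.update(expand_checklist(row["checked"]))
    records.append(record)

# Rule 5: frequencies on the raw variable, then look for wild codes.
sex_freqs = frequencies(records, "sex")
print("sex:", dict(sex_freqs))
print("wild codes for sex:", wild_codes(sex_freqs, {1, 2} | MISSING_CODES))

# Rule 6: frequencies on the newly computed variable.
age_freqs = frequencies(records, "age")
print("age:", dict(age_freqs))
```

Because the refused year of birth keeps its own code, it shows up in the age frequencies as a separate category instead of being turned into a nonsensical age. The checklist variables use 0 and 1, which is the dummy-variable case that rule 7 allows.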