The directory contains the following files
The census-income dataset contains census information for 48,842 people. Each person is described by 14 attributes (age, workclass, fnlwgt, education, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, and native-country) plus a binary class attribute that assigns the person to one of two income categories: >50K or <=50K.
Of these 48,842 instances (records), 45,222 have no missing values, meaning that every attribute listed above has a value. The remaining 3,620 instances have at least one missing value.
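A minimal Python sketch of reading the records and keeping only the complete ones. It assumes that each line of the data files is comma-separated, with the 14 attributes in the order listed above followed by the class, and that a missing value is written as `?` (as in the UCI Adult distribution). The file format, the helper names `parse_records` and `is_complete`, and the sample rows are all illustrative assumptions.

```python
# Assumed column order: the 14 attributes as listed, then the class label.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "class",
]

MISSING = "?"  # assumed missing-value marker (as in the UCI Adult files)


def parse_records(lines):
    """Parse comma-separated lines into dicts keyed by column name.

    Lines with the wrong number of fields (blank lines, headers) are skipped.
    """
    records = []
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != len(COLUMNS):
            continue
        record = dict(zip(COLUMNS, fields))
        # Assumed: some copies of the test file write labels as ">50K."
        record["class"] = record["class"].rstrip(".")
        records.append(record)
    return records


def is_complete(record):
    """True if no attribute holds the missing-value marker."""
    return all(value != MISSING for value in record.values())


# Illustrative rows in the assumed format; the second lacks workclass and occupation.
sample = [
    "37, Private, 120000, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Female, 0, 0, 40, United-States, <=50K",
    "54, ?, 200000, Some-college, 10, Married-civ-spouse, ?, Husband, "
    "White, Male, 0, 0, 60, Canada, >50K.",
]
records = parse_records(sample)
complete = [r for r in records if is_complete(r)]
print(len(records), len(complete))  # → 2 1
```

Run over the full data, the length of `complete` would correspond to the 45,222 instances without missing values, provided the files really follow the assumed format.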
The dataset has been split at random into two parts: 2/3 of the records for training (in the file census-income.data) and 1/3 for testing (in the file census-income.test).
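The files are already split, so nothing needs to be done here. The sketch below only illustrates what a random 2/3 / 1/3 split means. The function name `random_split` and the fixed seed are illustrative choices, not the procedure that actually produced census-income.data and census-income.test.

```python
import random


def random_split(records, train_fraction=2 / 3, seed=0):
    """Shuffle a copy of `records` and cut it into (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Record indices stand in for the actual records here.
train, test = random_split(range(48842))
print(len(train), len(test))  # → 32561 16281
```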
You should pre-process the attributes to maximize the accuracy of your decision tree. Pre-processing options include disregarding an attribute that does not appear to have any predictive value, and "discretizing" continuous attributes, that is, dividing a continuous attribute (e.g., age) into a few intervals (e.g., ages 10-20, 21-30, 31-40, ...).
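One possible way to act on both suggestions is sketched below. `discretize` maps a numeric value to an interval label using sorted upper cut points, so the cuts 20, 30, 40 give the intervals <=20, 21-30, 31-40, and so on, matching the age example above. `information_gain` computes the entropy-based information gain used by ID3-style decision trees, which can serve as a rough check of whether an attribute has any predictive value. The cut points, helper names, and toy data are hypothetical. Any such measurement should use the training file only, so that the test set stays unseen.

```python
import math
from bisect import bisect_left
from collections import Counter


def discretize(value, cuts):
    """Map an integer value to an interval label.

    `cuts` are sorted upper bounds: [20, 30] yields "<=20", "21-30", ">30".
    """
    i = bisect_left(cuts, value)  # index of the first cut >= value
    if i == 0:
        return f"<={cuts[0]}"
    if i == len(cuts):
        return f">{cuts[-1]}"
    return f"{cuts[i - 1] + 1}-{cuts[i]}"


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on values."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data, made up for illustration.
ages = [19, 25, 33, 47, 52, 68]
labels = ["<=50K", "<=50K", "<=50K", ">50K", ">50K", ">50K"]

cuts = [20, 30, 40, 50, 60, 70]  # hypothetical cut points
print([discretize(a, cuts) for a in ages])
# → ['<=20', '21-30', '31-40', '41-50', '51-60', '61-70']

coarse = [discretize(a, [40]) for a in ages]  # "<=40" vs ">40"
print(information_gain(coarse, labels))  # → 1.0 (the split separates the classes)
```

Information gain tends to favor attributes with many distinct values, so a raw continuous attribute can look misleadingly predictive. Coarser intervals reduce that effect. Comparing tree accuracy on a validation subset of the training data is the more reliable way to decide which attributes to drop and which intervals to use.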