|
Data Analysis
Description
This project is about analyzing a specific data set and answering various questions about it. The data file is a
flat-file database with various information about people, and can be seen here.
The program must read this file ONCE - line by line - without storing the raw lines for future reference. While
reading the file, the data must be put into an appropriate data structure of your own devising.
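The single-pass requirement could be met along the following lines. This is only a sketch: the file layout is not described here, so the semicolon-separated field order (id;gender;birth year;father id;mother id), the `Person` class and the `load_people` function are all hypothetical and must be adapted to the real file. Children that appear in the file before their parents are parked in a `pending` table, so no line ever has to be read twice.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class Person:
    # Hypothetical record layout; the real file's fields may differ.
    pid: str
    gender: str
    birth_year: int
    father_id: Optional[str]
    mother_id: Optional[str]
    children: List[str] = field(default_factory=list)


def load_people(lines: Iterable[str]) -> Dict[str, Person]:
    """Consume the lines exactly once, keeping only the parsed records."""
    people: Dict[str, Person] = {}
    # parent id -> ids of children seen before the parent's own line
    pending: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        pid, gender, year, father, mother = line.split(";")
        person = Person(pid, gender, int(year), father or None, mother or None)
        # Attach children that appeared earlier in the file.
        person.children = pending.pop(pid, [])
        people[pid] = person
        for parent_id in (person.father_id, person.mother_id):
            if parent_id is None:
                continue
            if parent_id in people:
                people[parent_id].children.append(pid)
            else:
                pending.setdefault(parent_id, []).append(pid)
    return people


# In-memory stand-in for the data file (made-up records).
sample = ["1;M;1950;;", "3;F;1980;1;2", "2;F;1952;;"]
people = load_people(sample)
```

With a real file, the same function can be fed an open file object directly, since iterating over it yields one line at a time.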
The program must now answer the following questions:
- Is the age and gender distribution "normal" in the database? A yes/no answer is not good enough.
- At what age do men become fathers for the first time (max age, min age, average age)?
- Is the distribution of first-time fatherhood age normal? A yes/no answer is not good enough.
- At what age do women become mothers for the first time (max age, min age, average age)?
- Is the distribution of first-time motherhood age normal? A yes/no answer is not good enough.
- How many men and women do not have children (in percent)?
- What is the average age difference between the parents (with a child in common obviously)?
- What percentage of people have at least one grandparent who is still alive?
A person is living if he/she is in the database.
- For those who have cousins, what is the average number of cousins?
- Is the firstborn likely to be male or female?
- How many men/women (percentage) have children with more than one woman/man?
- What percentage of parents share a family name? Does the woman take the man's family name, or vice versa, or both?
- Do tall people marry (or at least have children together)? To answer that, calculate the percentages of
tall/tall, tall/normal, tall/short, normal/normal, normal/short, and short/short couples. Decide your own limits for tall, normal and short.
- Do tall parents have tall children?
- Do fat people marry (or at least have children together)? To answer that, calculate the percentages of
fat/fat, fat/normal, fat/thin, normal/normal, normal/thin, and thin/thin couples. Decide your own limits for fat, normal and thin.
Calculate the BMI, and let that be the fatness indicator.
- Do fat parents have fat children?
- Using the knowledge of blood group type inheritance,
are there any children in the database where you can safely say that at least one of the parents is not the real parent? If such children
exist, make a list of them. In the report you must discuss how you determine that the parent(s) of the child are not the "true" parents.
- Make a list of fathers who can donate blood to their sons. The list must identify the father and the son(s) and their blood types.
- Make a list of persons who can donate blood to their grandparents. The list must identify the person, the grandparent(s) and their blood types.
- Is the distribution of blood types in the database "normal", i.e. does it roughly correspond to the distribution of an ethnic group or country?
A yes/no answer is not good enough.
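The blood group questions above could be approached with the standard simplified model of ABO and Rhesus inheritance (A and B co-dominant, O recessive, Rh+ dominant over Rh-) and the usual red-cell donation rule. The following sketch assumes blood types are stored as strings such as "A+" or "AB-"; that format and all function names are assumptions for illustration, not part of the assignment. The model ignores rare exceptions, which is worth discussing in the report.

```python
from itertools import product

# Possible genotypes (allele pairs) behind each ABO phenotype.
ABO_GENOTYPES = {
    "A": [("A", "A"), ("A", "O")],
    "B": [("B", "B"), ("B", "O")],
    "AB": [("A", "B")],
    "O": [("O", "O")],
}


def split_type(blood):
    """Split e.g. 'AB-' into ('AB', '-'). Assumes this string format."""
    return blood[:-1], blood[-1]


def abo_phenotype(allele1, allele2):
    """Phenotype from two alleles; O only shows when no A or B is present."""
    alleles = {allele1, allele2} - {"O"}
    return "".join(sorted(alleles)) or "O"


def possible_child_types(father, mother):
    """All blood types a child of these two parents could have."""
    f_abo, f_rh = split_type(father)
    m_abo, m_rh = split_type(mother)
    abo = set()
    for f_geno, m_geno in product(ABO_GENOTYPES[f_abo], ABO_GENOTYPES[m_abo]):
        # The child gets one allele from each parent.
        for f_allele, m_allele in product(f_geno, m_geno):
            abo.add(abo_phenotype(f_allele, m_allele))
    # Two Rh- parents can only have Rh- children; otherwise both are possible.
    rh = {"-"} if f_rh == "-" and m_rh == "-" else {"+", "-"}
    return {a + r for a in abo for r in rh}


def parentage_excluded(child, father, mother):
    """True if the child's type cannot come from this pair of parents."""
    return child not in possible_child_types(father, mother)


def can_donate(donor, recipient):
    """Red-cell compatibility: the donor must not add an antigen the recipient lacks."""
    d_abo, d_rh = split_type(donor)
    r_abo, r_rh = split_type(recipient)
    abo_ok = d_abo == "O" or set(d_abo) <= set(r_abo)
    rh_ok = d_rh == "-" or r_rh == "+"
    return abo_ok and rh_ok
```

Note that an exclusion only says that at least one of the two registered parents cannot be the biological parent; it does not say which one.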
All questions have to be answered in one run of the program, but not necessarily in that order. You are welcome to answer
other interesting questions that can be posed from the data. Many questions are about distributions and whether the distributions are "normal".
The program can calculate the distributions, but the analysis of the result (evaluating normalcy) is to be in the report.
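As an illustration of what "calculating a distribution" might look like, the sketch below computes min, max, average and a simple binned histogram for a list of ages. The `summarize` function, the bin width of 5 years and the sample ages are made up for the example; judging whether the resulting shape is "normal" is still left to the report.

```python
from collections import Counter
from statistics import mean


def summarize(values, bin_width=5):
    """Return min, max, mean and a histogram keyed by the lower bin edge."""
    values = list(values)
    histogram = Counter((v // bin_width) * bin_width for v in values)
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "histogram": dict(sorted(histogram.items())),
    }


# Made-up ages at first fatherhood, for illustration only.
ages = [19, 24, 27, 27, 31, 33, 42]
width = 5
stats = summarize(ages, width)

print(f"min={stats['min']} max={stats['max']} mean={stats['mean']}")
for lower, count in stats["histogram"].items():
    # One '#' per person in the bin gives a quick text histogram.
    print(f"{lower:>3}-{lower + width - 1:<3} {'#' * count}")
```

A text histogram like this is often enough to discuss skewness or multiple peaks in the report without any plotting library.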
|