Dating Patterns Among Young Adults Who Would Get Married — Graphs

Collins Agubuike
Dec 15, 2020
# import the libraries used below
import pandas
import numpy
import seaborn
import matplotlib.pyplot as plt
# load the data into a data frame
data = pandas.read_csv('addhealth_pds.csv', low_memory=False)
# upper-case all DataFrame column names (place after the code for loading data above)
data.columns = map(str.upper, data.columns)
# bug fix for display formats to avoid run-time errors (put after the code for loading data above)
pandas.set_option('display.float_format', lambda x: '%f' % x)
# set the variables you will be working with to numeric
data['H1ID1Q'] = pandas.to_numeric(data['H1ID1Q'])
data['H1ID1A'] = pandas.to_numeric(data['H1ID1A'])
data['H1RI21F1'] = pandas.to_numeric(data['H1RI21F1'], errors='coerce')
data['H1RI2_1'] = pandas.to_numeric(data['H1RI2_1'], errors='coerce')
data['AGE'] = pandas.to_numeric(data['AGE'], errors='coerce')
# subset data to young adults age 18 to 22 who would get married
subset = data[(data['AGE'] >= 18) & (data['AGE'] <= 22) & (data['H1ID1Q'] == 1)]
# make a copy of the new subsetted data
would_marry = subset.copy()
# SETTING MISSING DATA
# collapse codes 96, 97 and 98 into 99, then recode 99 to Python missing (NaN)
recode1 = {88: 88, 89: 89, 90: 90, 92: 92, 93: 93, 94: 94, 95: 95, 96: 99, 97: 99, 98: 99}
would_marry['H1RI2_1'] = would_marry['H1RI2_1'].map(recode1)
would_marry['Relationship Year'] = would_marry['H1RI2_1'].replace(99, numpy.nan)
# recode H1ID1A to 0/1 (original code 2 becomes 0)
recode2 = {1: 1, 2: 0}
would_marry['Group Date'] = would_marry['H1ID1A'].map(recode2)
# collapse codes 6, 7 and 8 into 8, then recode 8 to NaN
recode3 = {0: 0, 1: 1, 6: 8, 7: 8, 8: 8}
would_marry['H1RI21F1'] = would_marry['H1RI21F1'].map(recode3)
would_marry['Hold Hands'] = would_marry['H1RI21F1'].replace(8, numpy.nan)
# univariate bar graph for categorical variables
# first change format from numeric to categorical
would_marry['Group Date'] = would_marry['Group Date'].astype('category')
seaborn.countplot(x="Group Date", data=would_marry)
plt.xlabel('Group Dating')
plt.title('Dating Preference')
# univariate histogram for quantitative variable:
seaborn.displot(data=would_marry, x="Relationship Year")
# the x-axis holds the year; the bar heights are the frequencies
plt.xlabel('Relationship Year')
plt.ylabel('Frequency')
plt.title('Number of Relationships that began each year')
# basic scatterplot: Q->Q
scat1 = seaborn.regplot(x="Relationship Year", y="AGE", fit_reg=False, data=would_marry)
plt.xlabel('Relationship Year')
plt.ylabel('Age')
plt.title('Scatterplot for the Association Between Relationship Year and Age')
# second, relabel the categories of Group Date with descriptive value labels
would_marry['Group Date'] = would_marry['Group Date'].cat.rename_categories(["Single Dating", "Group Dating"])
# bivariate bar graph C->C
seaborn.catplot(x='Group Date', y='Hold Hands', data=would_marry, kind="bar", ci=None)
plt.xlabel('Dating Pattern')
plt.ylabel('Proportion Hold Hands')

The univariate graph of dating preference:

This graph is unimodal, with its highest peak at the second category, group dating. It appears to be skewed to the left, as the higher frequencies fall in the group dating category.
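The bar heights described above can also be read off as numbers. Here is a minimal sketch, assuming a small made-up series in place of would_marry['Group Date'] (the values are hypothetical, not Add Health results), that tabulates the counts and shares per category:

```python
import pandas as pd

# Hypothetical toy data standing in for would_marry['Group Date'];
# the values are made up for illustration only.
group_date = pd.Series(
    ["Single Dating", "Group Dating", "Group Dating", "Group Dating", "Single Dating"],
    name="Group Date",
).astype("category")

# Absolute counts per category (what the bars of a count plot show)
counts = group_date.value_counts()

# Share of respondents in each category
shares = group_date.value_counts(normalize=True)

print(counts)
print(shares)
```

On the real data, the same two calls on would_marry['Group Date'] would give the exact frequencies behind the bar graph.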

The univariate graph of the year the relationship started:

This graph is unimodal, with its highest peak between 1994 and 1995. It seems to be skewed to the left as there are higher frequencies in the later years.
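The skew can be checked numerically as well as by eye. The sketch below assumes made-up two-digit years in place of would_marry['Relationship Year'] and uses the pandas `skew` method; a negative value points to a left skew, as does a mean that sits below the median.

```python
import pandas as pd

# Hypothetical two-digit years standing in for would_marry['Relationship Year'];
# made up for illustration, with most values bunched in the later years.
relationship_year = pd.Series(
    [90, 92, 93, 94, 94, 94, 95, 95, 95, 95], name="Relationship Year"
)

# Sample skewness: negative means the longer tail is on the left
skewness = relationship_year.skew()

# In a left-skewed distribution the mean is typically pulled below the median
mean_year = relationship_year.mean()
median_year = relationship_year.median()

print(skewness, mean_year, median_year)
```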

The graph above plots the current age of the young adults who would get married against the year their relationship began. The scatterplot shows a positive trend between the two variables.
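A correlation coefficient is one way to put a number on the direction of that trend. This is a minimal sketch on made-up values (the toy data frame is an assumption, not the Add Health subset), using the pandas `corr` method, which computes the Pearson correlation by default:

```python
import pandas as pd

# Hypothetical toy data standing in for the would_marry subset;
# both columns are made up for illustration.
toy = pd.DataFrame({
    "Relationship Year": [90, 91, 92, 93, 94, 95],
    "AGE": [18, 18, 19, 20, 21, 22],
})

# Pearson correlation: a value above 0 indicates a positive linear trend
r = toy["Relationship Year"].corr(toy["AGE"])

print(r)
```

On the real data, rows with a missing Relationship Year or AGE are left out of the calculation, since `corr` only uses pairs where both values are present.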

This graph is unimodal, with its highest peak at the second category, group dating. It appears to be skewed to the left, as the higher frequencies fall in the group dating category.
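In the bivariate bar graph, each bar is the mean of the 0/1 Hold Hands variable within a dating pattern, which is the proportion who held hands. A minimal sketch of tabulating those proportions directly, assuming a made-up data frame with the same two column names:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the would_marry subset;
# the values are made up for illustration only.
toy = pd.DataFrame({
    "Group Date": ["Single Dating", "Single Dating", "Single Dating",
                   "Group Dating", "Group Dating", "Group Dating"],
    "Hold Hands": [1, 0, np.nan, 1, 1, 0],
})

# Mean of a 0/1 variable per group = proportion of 1s in that group.
# pandas skips NaN by default, so missing answers do not count as 0.
proportions = toy.groupby("Group Date")["Hold Hands"].mean()

print(proportions)
```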
