[Kaggle Competition] Titanic Prediction

Date: 2024-06-14 17:58:45

1. Data overview

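1. PassengerId: passenger ID
2. Survived: whether the passenger survived (0 = no, 1 = yes)
3. Pclass: ticket class (1, 2, or 3)
4. Name: passenger name
5. Sex: passenger sex
6. Age: passenger age
7. SibSp: number of siblings and spouses aboard
8. Parch: number of parents and children aboard
9. Ticket: ticket number
10. Fare: ticket fare
11. Cabin: cabin number
12. Embarked: port of embarkation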

2. Data preprocessing

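import pandas as pd

# Load the training set and preview the first rows
titanic = pd.read_csv('titanic_train.csv')
print(titanic.head())

# Summary statistics for the numeric columns
print(titanic.describe())

The describe() output includes, among other rows:
1. count: number of non-missing values
2. mean: arithmetic mean
3. std: standard deviation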
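The count row is the quickest way to spot columns with missing values, because it counts only non-missing entries. The sketch below illustrates this on a small made-up DataFrame (the name toy and its values are assumptions for illustration, not rows from titanic_train.csv).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for titanic_train.csv
toy = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0, np.nan],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05],
})

summary = toy.describe()

# count excludes NaN, so total rows minus count = missing values per column
missing_per_column = len(toy) - summary.loc["count"]
print(missing_per_column)
```

A column whose count is lower than the number of rows, as with Age here, needs the missing-value handling described in the next section.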

2.1 Handling missing values

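import numpy as np

# Three alternative ways to fill missing ages; pick one
# (once the first has run, no NaN is left for the others to fill)
titanic["Age"] = titanic["Age"].fillna(titanic["Age"].median())   # fill with the median
titanic["Age"] = titanic["Age"].fillna(titanic["Age"].mean())     # fill with the mean
titanic["Age"] = titanic["Age"].fillna(titanic["Age"].mode()[0])  # fill with the mode; mode() returns a Series, so take its first value

# General patterns, where df and df2 stand for arbitrary DataFrames
df2["one"].fillna("missing")                 # fill with a fixed value (returns a new Series)
df.replace(r"\s*\.\s*", np.nan, regex=True)  # replace cells matching the regex with NaN (returns a new DataFrame)
df.fillna(12345, inplace=True)               # fill every remaining NaN in place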
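To make the difference between the fill strategies concrete, here is a minimal sketch on a hypothetical six-value Series (the numbers are invented for illustration). It also shows why passing the result of mode() directly to fillna() does not fill anything useful: mode() returns a Series, and fillna() aligns a Series argument by index label rather than broadcasting it.

```python
import numpy as np
import pandas as pd

# Hypothetical ages with two missing entries (index labels 3 and 5)
ages = pd.Series([20.0, 30.0, 30.0, np.nan, 70.0, np.nan], name="Age")

filled_median = ages.fillna(ages.median())   # median of 20, 30, 30, 70
filled_mean = ages.fillna(ages.mean())       # mean of 20, 30, 30, 70
filled_mode = ages.fillna(ages.mode()[0])    # most frequent value

# Pitfall: mode() is a Series indexed from 0, so fillna() only looks for
# replacement values at matching index labels and leaves labels 3 and 5 as NaN
naive_mode = ages.fillna(ages.mode())

print(filled_median.tolist())
print(filled_mean.tolist())
print(filled_mode.tolist())
print(naive_mode.isna().sum())
```

The median is usually the safer default for a skewed column such as age or fare, since a few extreme values pull the mean but not the median.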

2.2 Converting str to int/float

print(titanic["Sex"].unique())  # unique() lists the distinct values

# Replace all occurrences of male/female with 0/1
titanic.loc[titanic["Sex"] == "male", "Sex"] = 0
titanic.loc[titanic["Sex"] == "female", "Sex"] = 1

# Fill missing ports with "S", then encode S/C/Q as 0/1/2
titanic["Embarked"] = titanic["Embarked"].fillna("S")
titanic.loc[titanic["Embarked"] == "S", "Embarked"] = 0
titanic.loc[titanic["Embarked"] == "C", "Embarked"] = 1
titanic.loc[titanic["Embarked"] == "Q", "Embarked"] = 2

# Alternative to the .loc assignments for Sex: a single dictionary-based replace.
# A regex pattern such as r"male" is unsuitable here because it also matches "female".
# titanic["Sex"] = titanic["Sex"].replace({"male": 0, "female": 1})
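The same encoding can be written more compactly with Series.map and a dictionary per column. The following is a sketch on a hypothetical four-row DataFrame (toy, sex_codes, and embarked_codes are illustrative names, not part of the original code).

```python
import pandas as pd

# Hypothetical rows with the same categories as the Titanic columns
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", None, "Q"],
})

sex_codes = {"male": 0, "female": 1}
embarked_codes = {"S": 0, "C": 1, "Q": 2}

# map() looks up each value in the dictionary; unmapped values would become NaN
toy["Sex"] = toy["Sex"].map(sex_codes)

# Fill the missing port with "S" first, then encode
toy["Embarked"] = toy["Embarked"].fillna("S").map(embarked_codes)

print(toy)
```

Keeping the code tables in dictionaries also makes it easy to apply exactly the same mapping to the test set later.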

2.3 Cross-validation

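from sklearn.model_selection import KFold  # sklearn.cross_validation was removed in scikit-learn 0.20

# Assumes alg (a scikit-learn estimator) and predictors (a list of feature column names) are already defined
kf = KFold(n_splits=3)  # three consecutive folds; random_state only applies when shuffle=True
predictions = []
for train, test in kf.split(titanic):
    # Features and target for the training folds
    train_predictors = titanic[predictors].iloc[train, :]
    train_target = titanic["Survived"].iloc[train]
    alg.fit(train_predictors, train_target)
    # Predict on the held-out fold
    test_predictions = alg.predict(titanic[predictors].iloc[test, :])
    predictions.append(test_predictions)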
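Once the per-fold predictions are collected, they can be concatenated and compared with the true labels to get an overall accuracy. The sketch below runs the whole loop end to end on synthetic data; the LogisticRegression model, the random toy DataFrame, and the choice of three predictors are all assumptions made for illustration, since the article does not say which estimator alg refers to at this point.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Synthetic stand-in for the preprocessed Titanic data (not real passengers)
rng = np.random.default_rng(1)
n = 90
toy = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),
    "Sex": rng.integers(0, 2, size=n),
    "Fare": rng.uniform(5, 100, size=n),
})
toy["Survived"] = (toy["Sex"] == 1).astype(int)  # artificial target

predictors = ["Pclass", "Sex", "Fare"]
kf = KFold(n_splits=3)  # consecutive folds, so fold order matches row order

fold_predictions = []
for train_idx, test_idx in kf.split(toy):
    alg = LogisticRegression(max_iter=1000)  # hypothetical choice of model
    alg.fit(toy[predictors].iloc[train_idx], toy["Survived"].iloc[train_idx])
    fold_predictions.append(alg.predict(toy[predictors].iloc[test_idx]))

# Stitch the folds back together and score against the true labels
predictions = np.concatenate(fold_predictions)
accuracy = (predictions == toy["Survived"].to_numpy()).mean()
print(accuracy)
```

Because the folds are not shuffled, concatenating them restores the original row order, which is what makes the element-wise comparison valid.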

3. Feature selection

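import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif

# Score each candidate feature against the target with an ANOVA F-test
selector = SelectKBest(f_classif, k=5)
selector.fit(titanic[predictors], titanic["Survived"])
scores = -np.log10(selector.pvalues_)  # larger score = smaller p-value

# Plot one bar per feature
plt.bar(range(len(predictors)), scores)
plt.xticks(range(len(predictors)), predictors, rotation="vertical")
plt.show()

# Keep the strongest features. "Title" is not among the columns listed in section 1,
# so it has to be engineered (for example from Name) before this step.
predictors = ["Pclass", "Sex", "Fare", "Title"]
alg = RandomForestClassifier(random_state=1, n_estimators=50, min_samples_split=8, min_samples_leaf=4)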
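To see what the -log10(p-value) scores look like in practice, here is a small sketch on synthetic data with one informative column and one pure-noise column (the column names Informative and Noise and all values are made up for illustration). It skips the plot and reads the selected columns from get_support() instead.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
n = 200
survived = rng.integers(0, 2, size=n)  # synthetic binary target

toy = pd.DataFrame({
    # Shifts with the target, so the F-test should rate it highly
    "Informative": survived + rng.normal(0, 0.3, size=n),
    # Independent of the target
    "Noise": rng.normal(0, 1, size=n),
})

selector = SelectKBest(f_classif, k=1)
selector.fit(toy, survived)

# Fitted attributes carry a trailing underscore: pvalues_, scores_
scores = -np.log10(selector.pvalues_)
selected = toy.columns[selector.get_support()].tolist()

print(dict(zip(toy.columns, scores)))
print(selected)
```

Note that f_classif tests each feature on its own, so it can miss features that only matter in combination with others; a tree-based model's feature importances are a common complementary check.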
