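With home prices soaring, buying a home has become increasingly difficult for young people, so renting is now their first choice. This analysis brings together current rental prices in Beijing, the capital where many young people hope to live, to help them choose with more confidence.

1. Listing information is collected from a rental site with a very large user base (the source chosen here is 58同城 rentals, https://bj.58.com/zufang/).

III. Data analysis steps

1. Fetch the rental listing data (source: https://bj.58.com/zufang/)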

 

    # Import the required libraries

    # Data manipulation
    import numpy as np
    import pandas as pd
    # Visualization
    import matplotlib.pyplot as plt
    import missingno
    import seaborn as sns
    from pandas.plotting import scatter_matrix  # formerly pandas.tools.plotting
    from mpl_toolkits.mplot3d import Axes3D
    # Feature selection and encoding
    from sklearn.feature_selection import RFE, RFECV
    from sklearn.svm import SVR
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import OneHotEncoder, LabelEncoder, label_binarize
    # Machine learning
    import sklearn.ensemble as ske
    from sklearn import datasets, model_selection, tree, preprocessing, metrics, linear_model
    from sklearn.svm import LinearSVC
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.naive_bayes import GaussianNB
    import requests, json, csv

    # The data endpoint was found by inspecting network traffic. The pageNum
    # parameter selects the page (54 pages in total) and the response is JSON.
    for page in range(54):
        url = 'https://gongyu.58.com/guide/api_for_renting?displayLimitNum=15&basequery=room:j|cityId:1|areaId:1|cateId:8&cookie=e87rZl4Z2EYLG6ynBNNEAg==&pageNum={0}&_=1603797279731'.format(page)
        response = requests.get(url)
        response.encoding = 'utf-8'
        data = json.loads(response.text)['data']
        # Merge the four listing blocks, skipping the first entry of each
        data = (data['position1']['list'][1:] + data['position2']['list'][1:]
                + data['position3']['list'][1:] + data['position4']['list'][1:])
        for i in data:
            name = i['title'].replace(' ', '')
            layout = i['layout']
            area = i['rentRoomArea']
            infor = i['dispLocal']
            money = i['price']
            print(name)
            print(layout)
            print(area)
            print(infor)
            print(money)
            result = [name, layout, area, infor, money]
            # Append the row to a CSV file for later use
            with open('租房数据.csv', 'a+', newline='', encoding='gb18030') as f:
                f_csv = csv.writer(f)
                f_csv.writerow(result)
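The parsing step of the loop above can be separated from the network request so that it can be checked offline. What follows is only a minimal sketch, not the code used for this post: the helper name `extract_rows` and the sample payload are made up for illustration, while the keys (`position1` to `position4`, `list`, `title`, `layout`, `rentRoomArea`, `dispLocal`, `price`) are the ones used in the script. Unlike the script, the sketch tolerates a missing block or field instead of raising an error.

```python
import json

# Keys taken from the scraping script above.
POSITION_KEYS = ("position1", "position2", "position3", "position4")
FIELDS = ("title", "layout", "rentRoomArea", "dispLocal", "price")


def extract_rows(response_text):
    """Turn one page of the JSON response into a list of CSV rows.

    Hypothetical helper: it mirrors the parsing in the loop above,
    including skipping the first entry of every position list.
    """
    data = json.loads(response_text)["data"]
    rows = []
    for key in POSITION_KEYS:
        # .get() tolerates a page where one of the blocks is missing.
        for item in data.get(key, {}).get("list", [])[1:]:
            row = [item.get(field, "") for field in FIELDS]
            row[0] = str(row[0]).replace(" ", "")  # strip spaces from the title
            rows.append(row)
    return rows


# Made-up payload with the same shape, for illustration only.
sample = json.dumps({
    "data": {
        "position1": {"list": [
            {"title": "header entry"},
            {"title": "整租 两居", "layout": "2室1厅1卫",
             "rentRoomArea": "60", "dispLocal": "朝阳区", "price": "5000"},
        ]},
        "position2": {"list": []},
    }
}, ensure_ascii=False)

rows = extract_rows(sample)
print(rows)
```

Keeping the parsing in its own function also makes it easy to write all rows of a page with a single `csv.writer(...).writerows(rows)` call.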

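With listing information for Beijing's districts collected, the data is analyzed and visualized to give a more intuitive picture of the city's current rental market.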

    # Analyze the number of listings and the unit price per district
    import pandas as pd
    from pyecharts import options as opts
    from pyecharts.charts import Bar

    beijing_daname = ['朝阳区', '丰台区', '海淀区', '大兴区', '通州区', '昌平区', '东城区', '西城区', '顺义区']
    # The file is expected to contain the columns 行政区 (district),
    # 价格 (price), 面积 (area) and 户型 (layout), which are used below
    data = pd.read_csv('北京租房数据.csv', encoding='gbk')
    # Count the listings in each district
    area_sums = data['行政区'].value_counts().to_dict()
    hotel_num = [int(area_sums[i]) for i in beijing_daname]
    bar = (
        Bar()
        .add_xaxis(beijing_daname)
        .add_yaxis("", hotel_num)
        .set_global_opts(title_opts=opts.TitleOpts(title="北京各区出租房屋数量"))
        .set_series_opts(
            label_opts=opts.LabelOpts(is_show=True),
            # Mark the minimum, maximum and average on the chart
            markline_opts=opts.MarkLineOpts(
                data=[
                    opts.MarkLineItem(type_="min", name="最小值"),
                    opts.MarkLineItem(type_="max", name="最大值"),
                    opts.MarkLineItem(type_="average", name="平均值"),
                ]
            ),
        )
    )
    bar.render_notebook()

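The resulting bar chart gives the following picture.

Within Beijing, 朝阳区 (Chaoyang) has the most listings, 1,877, and 顺义区 (Shunyi) has the fewest, only 272. Across the 9 districts, the average is 611 listings per district.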
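The three figures quoted here (maximum, minimum, average) are simple statistics over the per-district counts. As a rough sketch of that arithmetic using only the standard library, with made-up district labels standing in for the 行政区 column:

```python
from collections import Counter
from statistics import mean

# Made-up labels for illustration; the real data has one label per listing.
districts = ["朝阳区", "朝阳区", "朝阳区", "海淀区", "海淀区", "顺义区"]

counts = Counter(districts)                 # listings per district
most = max(counts, key=counts.get)          # district with the most listings
least = min(counts, key=counts.get)         # district with the fewest listings
average = mean(counts.values())             # average listings per district

print(most, counts[most])
print(least, counts[least])
print(average)
```

The average is taken over districts, not over listings, which is also what the "average" mark line on the bar chart represents.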

 

    # Rent per district, scaled to a 50 m² flat
    unit_price = {}
    for district, group in data.groupby('行政区'):
        if district in beijing_daname:
            # total rent / total area = price per m²; scale to 50 m² first,
            # then truncate, so the per-m² price is not rounded down early
            unit_price[district] = int(group['价格'].sum() / group['面积'].sum() * 50)
    unit_price
    bar = (
        Bar()
        .add_xaxis(list(unit_price.keys()))
        .add_yaxis("", list(unit_price.values()))
        .set_global_opts(title_opts=opts.TitleOpts(title="北京市各行政区出租房屋单价(每平米单价*50平米为例)"))
        .set_series_opts(
            label_opts=opts.LabelOpts(is_show=True),
            markline_opts=opts.MarkLineOpts(
                data=[
                    opts.MarkLineItem(type_="min", name="最小值"),
                    opts.MarkLineItem(type_="max", name="最大值"),
                    opts.MarkLineItem(type_="average", name="平均值"),
                ]
            ),
        )
    )
    bar.render_notebook()

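The result is as follows.

Taking a 50 m² rental as the example, 西城区 (Xicheng) has the highest rent at 4,350 yuan per listing and 通州区 (Tongzhou) the lowest at 1,620 yuan per listing, so rents differ widely by location.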
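The figure for each district is an area-weighted price: total rent divided by total area, scaled to 50 m². The order of truncation and scaling matters, as this small sketch with made-up numbers shows. The function names are hypothetical, and integer arithmetic is used so the results are exact.

```python
def rent_for_area(prices, areas, target_area=50):
    """Area-weighted rent for a flat of target_area m², in whole yuan.

    Scales first and truncates last, so no precision is lost early.
    """
    return sum(prices) * target_area // sum(areas)


def rent_truncate_first(prices, areas, target_area=50):
    """Same quantity, but truncating the per-m² price before scaling."""
    return (sum(prices) // sum(areas)) * target_area


# Made-up listings: rents in yuan and areas in m².
prices = [1500, 1740]
areas = [45, 55]

print(rent_for_area(prices, areas))        # 3240 * 50 // 100
print(rent_truncate_first(prices, areas))  # (3240 // 100) * 50
```

Truncating the per-m² price first can only produce multiples of 50, so scaling before truncating gives the finer-grained result.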

    # Analyze the share of each layout
    # The ten layouts covered are ['1室1厅1卫', '2室1厅1卫', '1室0厅1卫', '3室2厅2卫', '2室2厅1卫',
    # '3室1厅1卫', '2室2厅2卫', '2室1厅2卫', '3室1厅2卫', '1室2厅2卫']
    from pyecharts import options as opts
    from pyecharts.charts import Pie

    # Listings per layout, most common first
    layout = data.loc[:, '户型'].value_counts()
    print(list(layout.index)[:10])
    values = [int(i) for i in list(layout.values)[:10]]
    pie = (
        Pie()
        .add(
            "",
            [(i, j) for i, j in zip(list(layout.index)[:10], values)],
            radius=["30%", "75%"],
            center=["40%", "50%"],
            rosetype="radius",
            label_opts=opts.LabelOpts(is_show=False),
        )
        .set_global_opts(
            title_opts=opts.TitleOpts(title="北京市各区出租房屋户型占比"),
            legend_opts=opts.LegendOpts(type_="scroll", pos_left="85%", orient="vertical"),
        )
        # Label format: name, count, percentage
        .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}: {c},{d}%"))
    )
    pie.render_notebook()

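The analysis gives the following result.

The statistics show that the dominant layout is 1室1厅1卫 (1 bedroom, 1 living room, 1 bathroom) at 41.86%, followed by 2室1厅1卫 and 1室0厅1卫 at 30.58% and 11.02% respectively. Small layouts therefore make up most of the rental supply and appear to be more popular with renters.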
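One detail worth keeping in mind when reading these percentages: the pie chart is built from the ten most common layouts only, and as far as I understand the `{d}` placeholder in the label is the share among the slices drawn, not among all listings. The sketch below, with made-up layout labels and a top 3 instead of a top 10, shows how the two kinds of share differ.

```python
from collections import Counter

# Made-up layout labels for illustration only.
layouts = (["1室1厅1卫"] * 4 + ["2室1厅1卫"] * 3
           + ["1室0厅1卫"] * 2 + ["3室2厅2卫"])

counts = Counter(layouts)
top = counts.most_common(3)              # the layouts that would be charted
total_all = sum(counts.values())         # all listings
total_top = sum(n for _, n in top)       # listings in the charted layouts

share_of_all = [(name, round(100 * n / total_all, 2)) for name, n in top]
share_of_top = [(name, round(100 * n / total_top, 2)) for name, n in top]

print(share_of_all)
print(share_of_top)
```

When the layouts left out of the chart are rare, the two figures are close, but the share among charted slices is always at least as large.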

    # Analyze the distribution of rental prices
    # Bin edges every 1,000 yuan: 0, 1000, ..., 11000
    cut_n = list(range(0, 12000, 1000))
    income = pd.cut(data["价格"], cut_n)
    # Count listings per price bin; observed=False keeps empty bins
    price_cut = data['价格'].groupby(income, observed=False).count()
    index = [str(i) for i in list(price_cut.index)]
    values = [int(i) for i in list(price_cut.values)]
    pie = (
        Pie()
        .add(
            "",
            [(i, j) for i, j in zip(index, values)],
            radius=["30%", "75%"],
            center=["40%", "50%"],
            rosetype="radius",
            label_opts=opts.LabelOpts(is_show=False),
        )
        .set_global_opts(
            # The title now names the price distribution rather than repeating the layout chart's title
            title_opts=opts.TitleOpts(title="北京市出租房屋价格分布"),
            legend_opts=opts.LegendOpts(type_="scroll", pos_left="85%", orient="vertical"),
        )
        .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}: {c}, {d}%"))
    )
    pie.render_notebook()

 

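At present, most Beijing renters pay between 3,000 and 6,000 yuan, which accounts for about 50% of listings. The cheapest listings are under 1,000 yuan, but they tend to be in remote locations and under 20 m². The most expensive exceed 10,000 yuan; these are generally over 100 m² and located in the central districts, but very few renters choose them. Most people still opt for small layouts that offer better value.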
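The price ranges come from `pd.cut`, which by default builds right-closed intervals such as (3000, 4000] and leaves values outside the edges unassigned. With edges from 0 to 11,000, any listing priced above 11,000 yuan would therefore not appear in the chart. The following sketch reproduces that binning rule with the standard library only; `bin_label` is a hypothetical helper and the prices are made up.

```python
from bisect import bisect_left
from collections import Counter

# Same edges as cut_n in the analysis: 0, 1000, ..., 11000.
edges = list(range(0, 12000, 1000))


def bin_label(price):
    """Return the right-closed interval containing price, or None."""
    i = bisect_left(edges, price)
    if i == 0 or i == len(edges):
        return None  # price <= 0 or price > 11000: outside every bin
    return f"({edges[i - 1]}, {edges[i]}]"


# Made-up prices for illustration only.
prices = [800, 3000, 3001, 4500, 6000, 11000, 15000]
labels = [bin_label(p) for p in prices]
counts = Counter(label for label in labels if label is not None)

print(labels)
print(counts)
```

A price of exactly 3,000 falls in (2000, 3000], not in (3000, 4000], which matters when quoting a range such as "3,000 to 6,000 yuan".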


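As the analysis shows, Beijing has not only high home prices but also very high rents, and today's young people carry a heavy burden.

Collecting and analyzing data gives a very direct feel for what would otherwise be a confusing mass of numbers. The data collected here is only a small part of the market, though: the collection scope is not broad enough and the sample is not comprehensive. I will keep studying to improve on this.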




 

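Copyright notice: this is an original article by takanoPO, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Article link: https://www.cnblogs.com/takanoPO/p/14909404.html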