网站用户行为分析

  1. 步骤

1.1 本地数据集上传到数据仓库Hive

  1. 数据集下载与查看
  2. 数据集预处理
  3. 把数据集导入HDFS中
  4. 在Hive上创建数据库

1.2 Hive数据分析

  1. 给出数据分析需求
  2. 用select语句实现数据分析
  3. 数据分析结果查看与保存 

1.3 Hive、MySQL、HBase数据互导

 

  1. 操作过程

2.1 数据准备

第一步,通过samba服务共享文件

第二步,数据下载预处理

 

 

 第三步,将数据上传至hdfs

 

第四步,在hive上创建数据库和表

 

 

第五步,查看创建表的数据类型与信息

 

2.2 Hive数据分析

(1)用户行为分析需求:2014-12-11~12号有多少条购买商品的记录

    分析步骤

  • 语句:select count(*) from bigdata_user where visit_date >\’2014-12-10\’ and visit_date <\’2014-12-13\’ and behavior_type=\’4\’ limit 10;
  • 结果截图:运行或存为表格后的查询显示
  • (2)用户行为分析需求:分析每月1-31号购买情况
    • 语句:select day(visit_date) from bigdata_user limit 10;
    • 行为日期

 

 

 

  • 购买行为的记录数、不同用户数
  • 语句:select count(distinct uid) from bigdata_user where behavior_type=\’4’;

 

 

 

  • 按日期统计记录数、用户数
  • 语句:select count(distinct uid),day(visit_date) from bigdata_user where behavior_type=\’4\’ group by day(visit_date) limit 10;

 

 语句:select count(*),day(visit_date) from bigdata_user where behavior_type=\’4\’ group by day(visit_date) limit 10;

 

    • 保存为表格
    • 语句:create table day_count as select count(*),day(visit_date) from bigdata_user where behavior_type=\’4\’ group by day(visit_date);

 

 

 语句:create table day_uid as select count(distinct uid),day(visit_date) from bigdata_user where behavior_type=\’4\’ group by day(visit_date);

 

  • 12号+购买行为
  • 语句:select * from bigdata_user where behavior_type=\’4\’and visit_date=\’2014-12-12\’ limit 10;
  •  

     

 

  • 按用户编号分组
  • 语句:select uid from bigdata_user where behavior_type=\’4\’and visit_date=\’2014-12-12\’ group by uid limit 10;

 

 

  • 按用户分组统计
  • 语句:select uid,count(*) from bigdata_user where behavior_type=\’4\’and visit_date=\’2014-12-12\’ group by uid limit 10;
    • 12号,购买,4项以上
    • 语句:select uid,count(*) from bigdata_user where behavior_type=\’4\’and visit_date=\’2014-12-12\’ group by uid having count(*)>4 limit 10;

 

  • 语句:select uid,count(*) from bigdata_user where behavior_type=\’4\’and visit_date=\’2014-12-12\’ group by uid having count(behavior_type=\’4\’)>4 limit 10;

 

 

  • 2014-12-12号当天广东购买商品数
  • 语句:select count(*)from bigdata_user where visit_date=\’2014-12-12\’ and province=\’广东\’;

 

  • 按省份统计购买数量

  • 语句:select count(*)from bigdata_user group by province;

 

 

  • 2014-12-12号当天的商品购买与浏览比例
  • 语句:select c.*,c.c4/c.c1 c41 from (select uid,count(*)countall, sum(case when  behavior_type=\’4\’ then 1 else 0 end)c4, sum(case when behavior_type=\’1\’ then 1 else 0 end)c1 from bigdata_user where visit_date=\’2014-12-12\’ group by uid)c order by c41 desc limit 10;

 

 

  • 用户10001082在2014-12-12号当天活跃度:该用户点击行为占该天所有点击行为的比例

  • 语句:

    select \’10001082\’ uid,a.users,b.user_number,b.user_number/a.users rate from ( select count(*) users from bigdata_user where visit_date=\’2014-12-12\’ and behavior_type=\’1\’ ) a ,( select count(*) user_number  from bigdata_user where visit_date=\’2014-12-12\’ and behavior_type=\’1\’ and uid=10001082  ) b;

 

 

  • 2014-12-12号当天购买4件商品以上的用户
  • 语句:select uid,count(*)from bigdata_user where behavior_type=\’4\’ and visit_date=\’2014-12-12\’ group by uid having count(*)>4;

 

 

(3)自定义需求:

12月10号买了超过四种商品的用户id

语句:select uid from bigdata_user where behavior_type=\’4\’ and visit_date=\’2014-12-10\’ group by uid having count(behavior_type=\’4\’)>5;

 

 

 

10号,购买,3项以上

 

 语句:select uid,count(*) from bigdata_user where behavior_type=\’4\’and visit_date=\’2014-12-10\’ group by uid having count(*)>3 limit 10;

 

 

通国际当天购买商品钟类为3的天数,并保存到表中

语句:create table day_count_3 as select count(*),day(visit_date) from bigdata_user where behavior_type=\’3\’ group by day(visit_date);

 

 

 

版权声明:本文为DDCWJ原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附上原文出处链接和本声明。
本文链接:https://www.cnblogs.com/DDCWJ/p/14249115.html