Finding Common Friends - Data Mining - Scala Version
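Hi everyone. There are plenty of implementations of the "find common friends" algorithm online, in various languages. I had some free time today, so I worked out how to write it in Scala.
The complete code is available on GitHub: https://github.com/benben7466/SparkDemo/blob/master/spark-test/src/main/scala/testCommendFriend.scala
Input data (each line is a person, a colon, then that person's comma-separated friend list):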
A:B,C,D,F,E,O
B:A,C,E,K
C:F,A,D,I
D:A,E,F,L
E:B,C,D,M,L
F:A,B,C,D,E,O,M
G:A,C,D,E,F
H:A,C,D,E,O
I:A,O
J:B,O
K:A,C,D
L:D,E,F
M:E,F,G
O:A,H,I,J
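To make the goal concrete: the common friends of two people are the intersection of their friend lists. Below is a minimal sketch using plain Scala collections and only the first two input lines. The object name CommonFriendsByIntersection and the helper parseLine are made up for illustration and are not part of the original project.

```scala
object CommonFriendsByIntersection {
  // Hypothetical helper: parse a line such as "A:B,C,D" into (person, set of friends).
  def parseLine(line: String): (String, Set[String]) = {
    val fields = line.split(":")
    (fields(0), fields(1).split(",").toSet)
  }

  def main(args: Array[String]): Unit = {
    // First two lines of the input data above.
    val lines = List("A:B,C,D,F,E,O", "B:A,C,E,K")
    val friendsOf = lines.map(parseLine).toMap

    // Common friends of A and B = intersection of their friend sets.
    val common = friendsOf("A") intersect friendsOf("B")
    println(common.toList.sorted.mkString(",")) // prints "C,E"
  }
}
```

Comparing every pair of people this way is fine for a small file, but it needs all the friend lists in one place. That is why distributed versions usually invert the relation first.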
Core algorithm:
package chunbo.recommend

import org.apache.spark.SparkContext

// Common-friends statistics problem
// Reference: http://www.cnblogs.com/charlesblc/p/6126346.html
object testCommendFriend {
  def index(_spark_sc: SparkContext): Unit = {

    // Load the data (Config.HDFS_HOSH is not defined in this snippet)
    val friendRDD = _spark_sc.textFile(Config.HDFS_HOSH + "test/common_friend")

    // map: parse each line "person:friend1,friend2,..." into (person, friends)
    val friendKV = friendRDD.map(x => {
      val fields = x.split(":")
      val person = fields(0)
      val friends = fields(1).split(",").toList
      (person, friends)
    })

    // Invert the relation: emit (friend, person) for every friend in the list
    val mapRDD = friendKV.flatMap { case (person, friends) =>
      friends.map(friend => (friend, person))
    }

    // reduce: for each friend, join everyone who lists that friend with "::"
    val reduceRDD = mapRDD.reduceByKey(_ + "::" + _)

    // Print
    reduceRDD.foreach(println)

  }

}
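The listing above stops once each friend is mapped to the people who list that friend. To get common friends per pair of people, a second pass is usually added: every two people who list the same friend share that friend. The sketch below shows both passes with plain Scala collections instead of Spark RDDs, so it runs without a cluster. The object name CommonFriendsSketch and the functions invert and commonFriends are hypothetical, and the second pass is an assumption about how the job would be completed, not code from the original repository.

```scala
object CommonFriendsSketch {
  // Pass 1 (same idea as the Spark job above): turn "person -> friends"
  // into "friend -> sorted list of people who list that friend".
  def invert(lines: Seq[String]): Map[String, List[String]] =
    lines.toList
      .flatMap { line =>
        val fields = line.split(":")
        val person = fields(0)
        fields(1).split(",").toList.map(friend => (friend, person))
      }
      .groupBy(_._1)
      .map { case (friend, pairs) => friend -> pairs.map(_._2).sorted }

  // Pass 2 (assumed, not in the original listing): for each friend, emit
  // every pair of people who list that friend, then group by pair.
  def commonFriends(lines: Seq[String]): Map[(String, String), List[String]] =
    invert(lines).toList
      .flatMap { case (friend, people) =>
        // people is sorted, so each pair (a, b) comes out with a < b.
        people.combinations(2).map { case List(a, b) => ((a, b), friend) }
      }
      .groupBy(_._1)
      .map { case (pair, hits) => pair -> hits.map(_._2).sorted }

  def main(args: Array[String]): Unit = {
    // First three lines of the input data.
    val lines = Seq("A:B,C,D,F,E,O", "B:A,C,E,K", "C:F,A,D,I")
    commonFriends(lines).toList.sortBy(_._1).foreach { case ((a, b), friends) =>
      println(s"$a-$b: ${friends.mkString(",")}")
    }
    // prints:
    // A-B: C,E
    // A-C: D,F
    // B-C: A
  }
}
```

In Spark the same shape would probably be a flatMap over the output of the first pass followed by another reduceByKey or groupByKey. Sorting the people before pairing keeps (A,B) and (B,A) from ending up under different keys.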
Reference: http://www.cnblogs.com/charlesblc/p/6126346.html