Mining of Massive Datasets

Jure Leskovec
Stanford Univ.

Anand Rajaraman
Milliway Labs

Jeffrey D. Ullman
Stanford Univ.

Copyright © 2010, 2011, 2012, 2013, 2014 Anand Rajaraman, Jure Leskovec, and Jeffrey D. Ullman

Preface

This book evolved from material developed over several years by Anand Rajaraman and Jeff Ullman for a one-quarter course at Stanford. The course CS345A, titled "Web Mining," was designed as an advanced graduate course, although it has become accessible and interesting to advanced undergraduates. When Jure Leskovec joined the Stanford faculty, we reorganized the material considerably. He introduced a new course, CS224W, on network analysis and added material to CS345A, which was renumbered CS246. The three authors also introduced a large-scale data-mining project course, CS341. The book now contains material taught in all three courses.

What the Book Is About

At the highest level of description, this book is about data mining. However, it focuses on data mining of very large amounts of data, that is, data so large it does not fit in main memory. Because of the emphasis on size, many of our examples are about the Web or data derived from the Web. Further, the book takes an algorithmic point of view: data mining is about applying algorithms to data, rather than using data to "train" a machine-learning engine of some sort.

The principal topics covered are:

1. Distributed file systems and map-reduce as a tool for creating parallel algorithms that succeed on very large amounts of data.
2. Similarity search, including the key techniques of minhashing and locality-sensitive hashing.
3. Data-stream processing and specialized algorithms for dealing with data that arrives so fast it must be processed immediately or lost.
4. The technology of search engines, including Google's PageRank, link-spam detection, and the hubs-and-authorities approach.
5. Frequent-itemset mining, including association rules, market-baskets, the A-Priori Algorithm and its improvements.
6. Algorithms for clustering very large, high-dimensional datasets.
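To give a rough feel for the map-reduce idea in the first topic, here is a minimal single-process sketch of the programming model applied to the classic word-count task. The function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical and do not come from any particular framework. A real system would run the map and reduce calls in parallel on many machines, reading input from a distributed file system. Here they simply run one after another in memory.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(document: str) -> Iterator[Tuple[str, int]]:
    """Map function: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce function: combine all values that share one key."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    # Each map call depends only on its own document, so a distributed system
    # could run them in parallel; this sketch runs them sequentially.
    intermediate = chain.from_iterable(map_words(doc) for doc in documents)
    grouped = shuffle(intermediate)
    # Each reduce call handles exactly one key and is independent of the others.
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the cat sat", "the dog sat down"]
    print(word_count(docs))
```

Running it prints `{'the': 2, 'cat': 1, 'sat': 2, 'dog': 1, 'down': 1}`. The key design point is that the map and reduce functions never share state. That independence is what allows such a computation to scale beyond one machine's main memory.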