ALGORITHMS FOR BIG DATA
Programma Provvisorio
PART A: Probability and Computing: Randomized Algorithms for Big Data
oBasic Notions of Probability (Chapt 1 of [2])
oVerifying Polynomial Identities (Chapt 1.1 of [2],
oVerifying Matrix Multiplications (Chapt. 1.3 of [2],
oContention Resolutions (Chapt 13 of [3])
oRandomized Quick-Sort (Chapt. 2.5 of [2])
oRandomized Algorithms for Computing the Median ( Chapt 3.4 of [2])
oRandomized Hash Functions, and Hash Tables*, (Chapt 13.6 of [3] vedi anche Ssection 2.3 in [4]*)
PART B: Data Mining
•Data Mining: A Short Overview (Chapt 1 of [1]):
oModeling from a Data Set
oStatistical Modeling
oMachine Learning
oComputational Approach
oTypical Problems in Data Mining: Examples
oBasic Data Structures: Hash Functions and Indexes (see Part A)*
•Finding Similar Items (Chapt 3 of [1] + )
oApplications of Set Similarity
oShingling of Documents
oSimilarity-preserving Summaries of Sets
oLocality Sensing Hashing for Documents
•Mining Data Streams (Chapt 4 of [1])
oThe Stream Data Model
oPattern Matching: Karp-Rabin’s Algorithm (Ssection 4.1 in [4])
oSampling Data in a Stream
oFiltering Stream
oCounting Distinct Elements in a Stream
oEstimating Moments of a Stream
•Random Walks & Page Rank (forse...)
oRandom Walks (Section 4 of [5])
oPage Rank Algorithms for the WEB (Sec 5 of [1]]
Books:
[1] Leskovec, Rajaraman, and Ullamann: Mining of Massive Data Sets (+ slides in preparazione)
[2] Mitzenmacher and Upfal: Probability and Computing (Cambridge)
[3] Kleinberg and Tardos: Algorithm Design
[4] Nicola Prezza: Algorithms for Massive Data (Notes)
[5] Foundations of Data Science, Avrim Blum, John Hopcroft, Ravindran Kannan
|