This course is an introduction to large-scale data analytics. It covers cluster computing software tools (e.g., Hadoop MapReduce and Apache Spark), programming techniques used by data scientists, and the mathematical and statistical models used to learn from big data.