The enormous size of today’s data sets and the specific requirements of modern applications have necessitated the development of a new generation of data management systems, in which the emphasis is placed on distributed and fault-tolerant processing. New programming paradigms have evolved, novel systems and tools have been developed, and an abundance of startups offering data management and analysis solutions has appeared. Part of this course will cover MapReduce and NoSQL systems. Topics include: MapReduce programming; Hadoop, Pig, and Hive; developing applications in Amazon’s EC2 environment; key-value stores such as Redis; document stores such as MongoDB; and graph databases such as Neo4j.

In addition, engineering software that can efficiently handle large data sets requires specialized skills and familiarity with sophisticated tools. Part of the course will give an overview of general-purpose tools and describe how cloud infrastructures can be configured and used for large-scale data processing. We will then present a systematic method for locating and addressing performance issues. For cases where specialized processing is required, we will examine low-level techniques such as memory mapping and copy-on-write. Finally, we will see how visualization of big data can be performed and automated.
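As a taste of the MapReduce programming model covered in the course, here is a minimal sketch of the classic word-count example in plain Python, simulating the map, shuffle, and reduce phases that a framework such as Hadoop would normally run in a distributed fashion (the input lines and function names are illustrative, not part of any assignment):

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle phase: group emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reducer(word, counts):
    """Reduce phase: sum the counts emitted for each word."""
    return word, sum(counts)

lines = ["the quick brown fox", "the lazy dog"]
pairs = [pair for line in lines for pair in mapper(line)]
counts = dict(reducer(word, values) for word, values in shuffle(pairs))
print(counts["the"])  # 2
```

In a real deployment, the mapper and reducer run on many machines in parallel and the shuffle is performed by the framework over the network; the program logic, however, stays exactly this simple.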
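The low-level techniques mentioned above can be illustrated with Python’s standard mmap module. The sketch below memory-maps a small sample file (a stand-in for a large data set) and uses ACCESS_COPY to obtain copy-on-write semantics, so writes modify private in-memory pages rather than the file on disk:

```python
import mmap
import os
import tempfile

# Create a sample file to stand in for a large data set.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"0123456789" * 1000)

with open(path, "r+b") as f:
    # ACCESS_COPY gives copy-on-write semantics: writes go to private
    # in-memory pages and never reach the underlying file.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as mm:
        # Random access: pages are faulted in on demand instead of
        # reading the whole file up front.
        slice_before = mm[9995:10000]
        mm[0:5] = b"XXXXX"          # triggers a private copy of the page
        slice_after = mm[0:5]

# The file on disk is unchanged by the copy-on-write writes.
with open(path, "rb") as f:
    on_disk = f.read(5)

os.remove(path)
print(slice_before, slice_after, on_disk)  # b'56789' b'XXXXX' b'01234'
```

The same mechanism is what lets processes share read-only pages of a mapped file cheaply, duplicating a page only at the moment one of them writes to it.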