With every smartphone and computer now boasting multiple processors, the use of functional ideas to facilitate parallel programming is becoming increasingly widespread. In this course, you’ll learn the fundamentals of parallel programming, from task parallelism to data parallelism. In particular, you’ll see how many familiar ideas from functional programming map perfectly to the data parallel paradigm. We’ll start with the nuts and bolts of how to effectively parallelize familiar collections operations, and we’ll build up to parallel collections, a production-ready data parallel collections library available in the Scala standard library. Throughout, we’ll apply these concepts through several hands-on examples that analyze real-world data and implement popular algorithms such as k-means clustering.
Learning Outcomes. By the end of this course you will be able to:
– reason about task and data parallel programs,
– express common algorithms in a functional style and solve them in parallel,
– competently microbenchmark parallel code,
– write programs that effectively use parallel collections to achieve better performance.
We motivate parallel programming and introduce the basic constructs for building parallel programs on the JVM in Scala. Examples such as array norm and Monte Carlo computations illustrate these concepts. We show how to estimate the work and depth of parallel programs, as well as how to benchmark their implementations.
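To make the array norm example concrete, here is a minimal sketch of a task-parallel p-norm. It is an illustration under stated assumptions, not the course's own code: the names `ParallelNorm`, `sumSegment`, `SegmentTask`, and the `threshold` value are hypothetical, and the sketch uses the JVM's `ForkJoinPool` directly rather than any course-provided construct.

```scala
import java.util.concurrent.{ForkJoinPool, RecursiveTask}

object ParallelNorm {
  // Hypothetical cutoff: segments at or below this size are summed sequentially.
  val threshold = 1000

  // Sequential building block: sum of |a(i)|^p for i in [from, until).
  def sumSegment(a: Array[Double], p: Double, from: Int, until: Int): Double = {
    var i = from
    var sum = 0.0
    while (i < until) {
      sum += math.pow(math.abs(a(i)), p)
      i += 1
    }
    sum
  }

  // Fork/join task that halves the segment until it is small enough.
  private class SegmentTask(a: Array[Double], p: Double, from: Int, until: Int)
      extends RecursiveTask[Double] {
    override def compute(): Double =
      if (until - from <= threshold) sumSegment(a, p, from, until)
      else {
        val mid = from + (until - from) / 2
        val left = new SegmentTask(a, p, from, mid)
        val right = new SegmentTask(a, p, mid, until)
        left.fork()                // run the left half asynchronously
        val r = right.compute()    // compute the right half on this thread
        left.join() + r            // wait for the left half and combine
      }
  }

  // p-norm: (sum of |a(i)|^p)^(1/p)
  def pNorm(a: Array[Double], p: Double): Double = {
    val total = ForkJoinPool.commonPool().invoke(new SegmentTask(a, p, 0, a.length))
    math.pow(total, 1.0 / p)
  }
}
```

For an array of n elements, this sketch does work proportional to n, while its depth grows with the height of the recursion tree (roughly log n) plus the cost of one sequential segment. Calling `ParallelNorm.pNorm(Array(3.0, 4.0), 2.0)` computes the Euclidean norm of the vector (3, 4).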
Basic Task Parallel Algorithms
We continue with examples of parallel algorithms by presenting a parallel merge sort. We then explain how operations such as map, reduce, and scan can be computed in parallel. We present associativity as the key condition enabling parallel implementation of reduce and scan.
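The role of associativity can be illustrated with a small sketch of a parallel reduce. This is an assumption-laden illustration rather than the course's implementation: `ParallelReduce`, `ReduceTask`, and the tiny `threshold` are hypothetical names and values chosen so that even short inputs are split.

```scala
import java.util.concurrent.{ForkJoinPool, RecursiveTask}

object ParallelReduce {
  // Hypothetical cutoff, kept small so that short inputs are still split.
  val threshold = 4

  private class ReduceTask[A](xs: IndexedSeq[A], from: Int, until: Int, f: (A, A) => A)
      extends RecursiveTask[A] {
    override def compute(): A =
      if (until - from <= threshold) {
        // Small segment: fold left to right sequentially.
        var acc = xs(from)
        var i = from + 1
        while (i < until) { acc = f(acc, xs(i)); i += 1 }
        acc
      } else {
        val mid = from + (until - from) / 2
        val left = new ReduceTask(xs, from, mid, f)
        val right = new ReduceTask(xs, mid, until, f)
        left.fork()
        val r = right.compute()
        // Combining partial results regroups the operands, which is why
        // f must be associative for the answer to match a sequential reduce.
        f(left.join(), r)
      }
  }

  def reduce[A](xs: IndexedSeq[A])(f: (A, A) => A): A = {
    require(xs.nonEmpty, "reduce needs at least one element")
    ForkJoinPool.commonPool().invoke(new ReduceTask(xs, 0, xs.length, f))
  }
}
```

With an associative operation such as `_ + _` on integers or string concatenation, `ParallelReduce.reduce` agrees with the sequential `reduceLeft`, because the left-to-right order of elements is preserved and only the grouping changes. With a non-associative operation such as `_ - _`, the two can disagree, since the tree-shaped grouping is not equivalent to the left-nested one.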
We show how data parallel operations enable the development of elegant data-parallel code in Scala. We give an overview of the parallel collections hierarchy, including the traits of splitters and combiners that complement iterators and builders from the sequential case.
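The interplay of splitters and combiners can be sketched with deliberately simplified stand-ins. Everything below is hypothetical and is not the parallel collections API: `SimpleSplitter`, `ArraySplitter`, `VectorCombiner`, and `parFilter` are illustrative names, and the actual library traits are more general.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object SplitCombineSketch {
  // Simplified splitter: an iterator that can divide its remaining elements in two.
  trait SimpleSplitter[A] extends Iterator[A] {
    def remaining: Int
    def split: (SimpleSplitter[A], SimpleSplitter[A])
  }

  class ArraySplitter[A](a: Array[A], private var i: Int, until: Int)
      extends SimpleSplitter[A] {
    def hasNext: Boolean = i < until
    def next(): A = { val x = a(i); i += 1; x }
    def remaining: Int = until - i
    def split: (SimpleSplitter[A], SimpleSplitter[A]) = {
      val mid = i + remaining / 2
      (new ArraySplitter(a, i, mid), new ArraySplitter(a, mid, until))
    }
  }

  // Simplified combiner: a builder that can also merge with another builder.
  final class VectorCombiner[A](private var buf: Vector[A] = Vector.empty[A]) {
    def +=(elem: A): this.type = { buf = buf :+ elem; this }
    def combine(that: VectorCombiner[A]): VectorCombiner[A] =
      new VectorCombiner(buf ++ that.buf)
    def result(): Vector[A] = buf
  }

  // Split until segments are small, filter each segment into its own
  // combiner, then merge the combiners pairwise in left-to-right order.
  private def filterAsync[A](s: SimpleSplitter[A], threshold: Int)(
      p: A => Boolean): Future[VectorCombiner[A]] =
    if (s.remaining <= threshold) Future {
      val c = new VectorCombiner[A]()
      while (s.hasNext) { val x = s.next(); if (p(x)) { c += x } }
      c
    } else {
      val (l, r) = s.split
      val lf = filterAsync(l, threshold)(p)
      val rf = filterAsync(r, threshold)(p)
      lf.zipWith(rf)((x, y) => x.combine(y))
    }

  def parFilter[A](a: Array[A], threshold: Int)(p: A => Boolean): Vector[A] = {
    val root = new ArraySplitter(a, 0, a.length)
    Await.result(filterAsync(root, threshold)(p), Duration.Inf).result()
  }
}
```

The design point the sketch tries to show is the symmetry with the sequential case: a splitter is an iterator with an extra `split`, and a combiner is a builder with an extra `combine`. A realistic combiner would need a much cheaper `combine` than the vector concatenation used here for brevity.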
Data Structures for Parallel Computing
We give a glimpse of the internals of data structures for parallel computing, which helps us understand what is happening under the hood of parallel collections.