Key info |
Offered by | École Polytechnique Fédérale de Lausanne |
Description | In this course, you'll see how the data parallel paradigm can be extended to the distributed case, using Spark throughout. We'll cover Spark's programming model in detail, being careful to understand how and when it differs from familiar programming models, like shared-memory parallel collections or sequential Scala collections. Through hands-on examples in Spark and Scala, we'll learn when issues specific to distribution, such as latency and network communication, need to be considered, and how to address them effectively for better performance.
WHAT YOU WILL LEARN:
- Read data from persistent storage and load it into Apache Spark.
- Manipulate data with Spark and Scala.
- Express algorithms for data analysis in a functional style.
- Recognize how to avoid shuffles and recomputation in Spark.
SKILLS YOU WILL GAIN:
- Scala Programming
- Big Data
- Apache Spark
- SQL
|
Accredited by | Coursera |
URL |
https://www.coursera.org/learn/scala-spark-big-data?specialization=scala
|
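The description notes that Spark's programming model mirrors familiar Scala collections. As a rough taste of that functional style (this is an illustrative sketch, not course material): a word count written with ordinary sequential Scala collections uses the same transformation shape (`flatMap`, `groupBy`/`map`) that Spark applies to distributed RDDs.

```scala
// Word count in a functional style on a local, sequential Scala collection.
// In Spark, analogous transformations (flatMap, map, reduceByKey) run on
// distributed RDDs; this local version is a hypothetical sketch for flavor.
object WordCount {
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))   // split each line into words
      .filter(_.nonEmpty)         // drop empty tokens
      .groupBy(identity)          // group identical words together
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    val counts = count(Seq("spark and scala", "spark rocks"))
    println(counts("spark")) // 2
  }
}
```

In Spark the grouping step would typically be expressed as `reduceByKey(_ + _)` rather than `groupBy`, precisely because of the shuffle-avoidance concerns the course highlights: `reduceByKey` combines values locally on each node before any data crosses the network.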