What Is Resilient Distributed Dataset. Rdd (resilient distributed dataset) is the fundamental data structure of apache spark which are an immutable collection of objects which computes. Rdd (resilient distributed dataset) is a core building block of pyspark.
Resilient distributed datasets (rdd) is a fundamental data structure of spark. Resilient distributed datasets (rdds) •restricted form of distributed shared memory •immutable, partitioned collections of records •can only be built through coarse.