pyspark rdd method

Josue
15 Aug 2018
>>> # df is assumed to be an existing DataFrame with "age" and "name" columns
>>> # Repartition into an explicit number of partitions
>>> df.repartition(10).rdd.getNumPartitions()
10
>>> # Repartition by column: rows with the same "age" share a partition
>>> data = df.union(df).repartition("age")
>>> data.show()
+---+-----+
|age| name|
+---+-----+
|  5|  Bob|
|  5|  Bob|
|  2|Alice|
|  2|Alice|
+---+-----+
>>> # Repartition by column into exactly 7 partitions
>>> data = data.repartition(7, "age")
>>> data.show()
+---+-----+
|age| name|
+---+-----+
|  2|Alice|
|  5|  Bob|
|  2|Alice|
|  5|  Bob|
+---+-----+
>>> data.rdd.getNumPartitions()
7
>>> # Repartition by several columns at once
>>> data = data.repartition("name", "age")
>>> data.show()
+---+-----+
|age| name|
+---+-----+
|  5|  Bob|
|  5|  Bob|
|  2|Alice|
|  2|Alice|
+---+-----+
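To make the column-based calls above a little more concrete, here is a minimal pure-Python sketch of hash partitioning, the general idea behind repartitioning by column. It is not Spark code: `partition_index` and `hash_repartition` are hypothetical helper names, and `zlib.crc32` is only a stand-in for whatever hash Spark uses internally. The rows mirror the `df.union(df)` data shown in the session.

```python
import zlib
from collections import defaultdict


def partition_index(key_values, num_partitions):
    """Map a tuple of key values to a partition index.

    zlib.crc32 is a stand-in hash chosen for this sketch; it is an
    assumption, not the hash function Spark applies.
    """
    encoded = repr(key_values).encode("utf-8")
    return zlib.crc32(encoded) % num_partitions


def hash_repartition(rows, key_columns, num_partitions):
    """Group rows into num_partitions lists by hashing the key columns."""
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        partitions[partition_index(key, num_partitions)].append(row)
    # Return every partition, including the empty ones.
    return [partitions.get(i, []) for i in range(num_partitions)]


# Same rows as df.union(df) in the session above.
rows = [
    {"age": 5, "name": "Bob"},
    {"age": 2, "name": "Alice"},
    {"age": 5, "name": "Bob"},
    {"age": 2, "name": "Alice"},
]

parts = hash_repartition(rows, ["age"], 7)
for index, part in enumerate(parts):
    print(index, part)
```

Rows that share a key always hash to the same index, so both "Bob" rows end up together, as do both "Alice" rows. With only two distinct ages and seven partitions, most partitions stay empty, which is consistent with `getNumPartitions()` still reporting 7 in the session.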