PySpark: take the total sum of a column and use the value to divide another column
I have a dataframe `df`:

```
>>> df = spark.createDataFrame([[1, 0], [2, 1], [3, 1], [4, 0], [5, 1]], ['a', 'b'])
>>> df.show()
+---+---+
|  a|  b|
+---+---+
|  1|  0|
|  2|  1|
|  3|  1|
|  4|  0|
|  5|  1|
+---+---+
```

and its row count:

```
>>> nrows = df.count()
```

Using `df`, I created a new dataframe `a` that is an aggregate of `df`:

```
>>> a = df.groupby('b').count()
>>> a.show()
+---+-----+
|  b|count|
+---+-----+
|  0|    2|
|  1|    3|
+---+-----+
```

I need to create a new column in `a` called `ev`. The value of `ev` on the i-th row is given by a formula whose image did not survive in this post; from the expected output it works out to `nrows / (2 * count)` on each row.

This is the output I'm expecting:

```
+---+-----+------------------+
|  b|count|           ev_norm|
+---+-----+------------------+
|  0|    2|              1.25|
|  1|    3|0.8333333333333334|
+---+-----+------------------+
```