spark
3fcbdb8f - [MINOR][DOC] Fix python variance() documentation

Comment changes are shownComment changes are hidden
Commit
5 years ago
[MINOR][DOC] Fix python variance() documentation ## What changes were proposed in this pull request? The Python documentation incorrectly says that `variance()` acts as `var_pop` whereas it acts like `var_samp` here: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.variance It was not the case in Spark 1.6 doc but it is in Spark 2.0 doc: https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/sql/functions.html https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/functions.html The Scala documentation is correct: https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/functions.html#variance-org.apache.spark.sql.Column- The alias is set on this line: https://github.com/apache/spark/blob/v2.4.3/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L786 ## How was this patch tested? Using variance() in pyspark 2.4.3 returns: ``` >>> spark.createDataFrame([(1, ), (2, ), (3, )], "a: int").select(variance("a")).show() +-----------+ |var_samp(a)| +-----------+ | 1.0| +-----------+ ``` Closes #24895 from tools4origins/patch-1. Authored-by: tools4origins <tools4origins@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com> (cherry picked from commit 25c5d5788311ba74f4aac2f97054b4e9ef7e295c) Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Author
Committer
Parents
  • python/pyspark/sql
    • File
      functions.py