I have a parquet file compressed with zstd. Is it possible to decompress it somehow? I tried the zstd command, but without any luck:

[x@xyz tmp]# zstd -d part-00016-303a375a-e443-4f86-a59e-b5d82d15bd26.c000.zstd.parquet -o test.parquet
zstd: part-00016-303a375a-e443-4f86-a59e-b5d82d15bd26.c000.zstd.parquet: unsupported format
  • Where exactly did you get this file from? If the file format is anything like what is described at https://parquet.apache.org/documentation/latest/ the outer file format is some kind of parquet archive, and the inner fields are compressed: "Parquet allows compression schemes to be specified on a per-column level" – John Mahowald Sep 22 '20 at 14:14
  • It is parquet from Spark job, but you are right. I thought that whole parquet is compressed by zstd, but it is only its content. I finally used spark-shell for conversion from zstd to snappy. Thanks for directing me – Jacfal Sep 24 '20 at 07:05
  • Consider answering your own question. This database does not appear much on Server Fault, so documenting some practical experience would be useful. – John Mahowald Sep 25 '20 at 00:56
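As the comments explain, the `.parquet` file is a container whose column chunks are zstd-compressed internally, so the file itself does not begin with a zstd frame. A minimal Python sketch illustrating why `zstd -d` reports `unsupported format` (the `classify` helper is hypothetical, for illustration only; the magic numbers are from the zstd and parquet format specifications):

```python
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # little-endian magic number of a zstd frame
PARQUET_MAGIC = b"PAR1"           # parquet files begin (and end) with this marker

def classify(path):
    """Return 'zstd', 'parquet', or 'unknown' based on the file's first 4 bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head == ZSTD_MAGIC:
        return "zstd"
    if head == PARQUET_MAGIC:
        return "parquet"
    return "unknown"
```

Because the file starts with `PAR1` rather than the zstd frame magic, the standalone `zstd` tool correctly refuses it.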

1 Answer


It is possible via spark-shell, on a machine where reading zstd-compressed parquet is supported:

spark.read.option("compression", "zstd").parquet("/tmp/parquet-folder").write.option("compression", "none").mode("overwrite").parquet("/tmp/parquet-folder-no-compression")