I have a parquet file compressed with zstd. Is it possible to decompress it somehow? I tried the zstd command, but without any luck:

[x@xyz tmp]# zstd -d part-00016-303a375a-e443-4f86-a59e-b5d82d15bd26.c000.zstd.parquet -o test.parquet
zstd: part-00016-303a375a-e443-4f86-a59e-b5d82d15bd26.c000.zstd.parquet: unsupported format
  • Where exactly did you get this file from? If the file format is anything like what is described at https://parquet.apache.org/documentation/latest/ the outer file format is some kind of parquet archive, and the inner fields are compressed: "Parquet allows compression schemes to be specified on a per-column level" – John Mahowald Sep 22 '20 at 14:14
  • It is parquet from Spark job, but you are right. I thought that whole parquet is compressed by zstd, but it is only its content. I finally used spark-shell for conversion from zstd to snappy. Thanks for directing me – Jacfal Sep 24 '20 at 07:05
  • Consider answering your own question. This database does not appear much on Server Fault, so documenting some practical experience would be useful. – John Mahowald Sep 25 '20 at 00:56
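As the comments explain, the `.parquet` file is a container whose column chunks are zstd-compressed internally, so the file itself does not begin with a zstd frame. A minimal Python sketch illustrating why `zstd -d` reports `unsupported format` (the `classify` helper is hypothetical, for illustration only; the magic numbers are from the zstd and parquet format specifications):

```python
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # little-endian magic number of a zstd frame
PARQUET_MAGIC = b"PAR1"           # parquet files begin (and end) with this marker

def classify(path):
    """Return 'zstd', 'parquet', or 'unknown' based on the file's first 4 bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head == ZSTD_MAGIC:
        return "zstd"
    if head == PARQUET_MAGIC:
        return "parquet"
    return "unknown"
```

Because the file starts with `PAR1` rather than the zstd frame magic, the standalone `zstd` tool correctly refuses it.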

1 Answer


It is possible via spark-shell, on a machine where reading zstd-compressed parquet is supported:

spark.read.option("compression", "zstd").parquet("/tmp/parquet-folder").write.option("compression", "none").mode("overwrite").parquet("/tmp/parquet-folder-no-compression")