How to select columns from a nested Dataset/DataFrame in Spark Java
Let’s assume we have nested data that looks like this
Let’s say we have the data stored and we first load it into a DataFrame:
Dataset<Row> images = ImageSchema.readImages("hdfs://coda2:9000/datasets/original/imagenet/VOCdevkit/VOC2012/JPEGImages");
We can now get a DataFrame containing only one of the nested columns with commands like the following:
Dataset<Row> images_height = images.select("image.height");
Dataset<Row> images_mode = images.select("image.mode");
And so on. You just have to use “.” as the separator between the parent column and the nested field to select any nested column.
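To try the dot notation without access to the image files, here is a minimal sketch that builds a small nested DataFrame in memory and selects from it. The class name NestedSelectExample, the helper buildImages, the field list (origin, height, width, mode) and the sample values are all made up for illustration, and the local master setting is an assumption for running it on a single machine.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;

public class NestedSelectExample {

    // Hypothetical helper: builds a two-row DataFrame with a single struct
    // column named "image". Fields and values are made up for this sketch.
    static Dataset<Row> buildImages(SparkSession spark) {
        StructType imageType = new StructType()
            .add("origin", DataTypes.StringType)
            .add("height", DataTypes.IntegerType)
            .add("width", DataTypes.IntegerType)
            .add("mode", DataTypes.IntegerType);
        StructType schema = new StructType().add("image", imageType);

        // Each outer Row holds one inner Row, which becomes the struct value.
        List<Row> rows = Arrays.asList(
            RowFactory.create(RowFactory.create("a.jpg", 375, 500, 16)),
            RowFactory.create(RowFactory.create("b.jpg", 333, 500, 16)));
        return spark.createDataFrame(rows, schema);
    }

    public static void main(String[] args) {
        // Assumption: a local session is enough for this small example.
        SparkSession spark = SparkSession.builder()
            .appName("NestedSelectExample")
            .master("local[*]")
            .getOrCreate();

        Dataset<Row> images = buildImages(spark);

        // Several nested fields in one select; the resulting columns are
        // named after the leaf fields (height, width).
        Dataset<Row> size = images.select("image.height", "image.width");
        size.printSchema();

        // col(...) gives a Column, so the nested field can be renamed.
        Dataset<Row> mode = images.select(col("image.mode").alias("image_mode"));
        mode.printSchema();

        // "image.*" expands every field of the struct into top-level columns.
        Dataset<Row> flat = images.select("image.*");
        flat.printSchema();

        spark.stop();
    }
}
```

The string form of select is the shortest way to pick nested fields. The col(...) form is handy when the selected field should be renamed, since the default column name is only the leaf name and can clash when two structs share a field name.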