Transfer binary data (image etc) in a CSV file

2020-04-08 13:29发布

问题:

I have some binary data to transfer in addition to some other metadata information stored as string, number etc. The binary data I have is image files stored as a blob column in the database, I want to include the blob column in the csv file and store the csv file in filesystem or sftp server, I guess it does not matter where it is stored.

How can the binary data be stored as another column in the csv file? Is this a good practice to transfer the binary data this way?

回答1:

Base64

The usual (and proper) way is to encode the binary data in Base64. This will make the data larger by a factor 4:3.

While CSV files are generally considered to be text files, you could write raw binary data to the file.

That data should then be enclosed in duoble quotes, and all existing double quotes inside the data have to be escaped with another double quote. Quoting the fields will also take care of any newlines in the binary data, but the reader has to support that. There could also be null bytes inside which should not be a problem if the reader knows it's reading binary data (i.e. if you provide the reader yourself).

Problems could arise however if your data has to be in some unicode form..

So, in general it is not good practice to write raw binary data to a csv file, it's better to use base64 encoding.



回答2:

As the correct Answer by Danny_ds discusses, yes indeed you can include binary data in a CSV but you must encode that data in a text-and-CSV-friendly manner. The obvious choice for such text-and-CSV-friendly is Base64, as mentioned.

Example app

Here is the full source code for an example app.

This app downloads a couple images from over the Internet. Namely, the logos from the sister sites StackOverflow.com and StackExchange.com. After downloading, these images are encoded as Base64 strings.

Then a 2-column CSV file is written. The columns are a name and the image data. For this example, we have 2 rows, one for each site listed above. Notice that Base64 encoding does not involve commas or quotes, so there is no need to enclose a Base64 with quotes in standard CSV format.

To prove this worked, the CSV file is then read. The image data is decoded from Base64 back into binary data. The binary data is written to storage. You can open the PNG image files yourself. They should look like this & this.

The writing and reading of the CSV is accomplished using the Apache Commons CSV library.

package work.basil.example;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.*;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class CsvImager
{
    // Write a CSV file, of two columns: name of the image, image in Base64 data.
    private void writeCsv ( final Map < String, BufferedImage > map , final Path path )
    {
        Objects.requireNonNull( map );

        CSVFormat format = CSVFormat.RFC4180.withHeader( "Name" , "Image" );
        try (
                BufferedWriter writer = Files.newBufferedWriter( path , StandardCharsets.UTF_8 ) ;
                CSVPrinter printer = new CSVPrinter( writer , format ) ;
        )
        {
            // Print rows.
            for ( String key : map.keySet() )
            {
                printer.print( key );
                BufferedImage image = Objects.requireNonNull( map.get( key ) );
                ByteArrayOutputStream stream = new ByteArrayOutputStream();
                ImageIO.write( image , "PNG" , stream );
                String imageData = Base64.getEncoder().encodeToString( stream.toByteArray() );
                printer.print( imageData );
                printer.println();
            }
        } catch ( IOException e )
        {
            e.printStackTrace();
        }
    }

    // Read images from a CSV file in storage.
    public Map < String, BufferedImage > readCsv ( final Path path )
    {
        Objects.requireNonNull( path );
        Map < String, BufferedImage > map = Map.of();
        try ( BufferedReader reader = Files.newBufferedReader( path ) )
        {
            map = new HashMap <>();
            CSVFormat format = CSVFormat.RFC4180.withHeader( "Name" , "Image" ).withFirstRecordAsHeader();
            Iterable < CSVRecord > records = format.parse( reader );
            for ( CSVRecord record : records )
            {
                String name = record.get( "Name" );
                String imageBase64 = record.get( "Image" );
//                System.out.println("imageBase64:\n" + imageBase64 + "\n");
                byte[] bytes = Base64.getDecoder().decode( imageBase64 );
                ByteArrayInputStream stream = new ByteArrayInputStream( bytes );
                BufferedImage image = ImageIO.read( stream );
                map.put( name , image );
            }

        } catch ( IOException e )
        {
            e.printStackTrace();
        }
        return map;
    }

    // Download images from the Internet.
    private Map < String, BufferedImage > fetchImages ()
    {
        Map < String, BufferedImage > map = Map.of(); // Initialize to empty map.
        try
        {
            URL urlStackOverflow = null, urlStackExchange = null;
            urlStackOverflow = new URL( "https://cdn.sstatic.net/Sites/stackoverflow/company/img/logos/so/so-logo.png" );
            urlStackExchange = new URL( "https://cdn.sstatic.net/Sites/stackoverflow/company/img/logos/se/se-logo.png" );

            BufferedImage imageStackOverflow = ImageIO.read( urlStackOverflow );
            BufferedImage imageStackExchange = ImageIO.read( urlStackExchange );

            System.out.println( "imageStackOverflow: " + imageStackOverflow );
            System.out.println( "imageStackExchange: " + imageStackExchange );

            map = Map.of( "logoStackOverflow.png" , imageStackOverflow , "logoStackExchange.png" , imageStackExchange );
        } catch ( MalformedURLException e ) // `new URL` fail.
        {
            e.printStackTrace();
        } catch ( IOException e ) // `ImageIO.read` fail.
        {
            e.printStackTrace();
        }
        ;
        return map;
    }

    // Produce individual image files on disk, to manually verify that the downloaded images were successfully Base64 endcoded, written to CSV, read from CSV, and decoded back to images.
    public void writeImages ( final Map < String, BufferedImage > map , final Path pathToFolder )
    {
        Objects.requireNonNull( map );
        Objects.requireNonNull( pathToFolder );
        if ( map.isEmpty() )
        {
            throw new IllegalArgumentException( "The Map should have elements but is empty. Message # 77063b5a-4398-49f0-b1a4-442255a13b77." );
        }
        if ( ! Files.isDirectory( pathToFolder ) )
        {
            throw new IllegalArgumentException( "The specified path must lead to an existing folder. Message # 6a19313d-b8a9-4a53-9b82-7672172923f9." );
        }
        for ( String key : map.keySet() )
        {
            Path pathToFile = pathToFolder.resolve( key );
            try (
                    OutputStream stream = Files.newOutputStream( pathToFile ) ;
            )
            {
                BufferedImage image = Objects.requireNonNull( map.get( key ) );
                ImageIO.write( image , "PNG" , stream );
            } catch ( IOException e )
            {
                e.printStackTrace();
            }
        }
    }

    // --------| Demo  |-----------------------

    public void demo ()
    {
        Map < String, BufferedImage > map = this.fetchImages(); // Pairs of name & image.
        Path path = Paths.get( "/Users/basilbourque/images.csv" );
        this.writeCsv( map , path );
        Map < String, BufferedImage > mapOut = this.readCsv( path );
        Path pathOut = path.getParent();
        this.writeImages( mapOut , pathOut );
    }

    public static void main ( String[] args )
    {
        CsvImager app = new CsvImager();
        app.demo();

        System.out.println( "Done." );
    }
}

Tip: The scattered use of magic strings Name & Image for the column names in the CSV should be replaced with the use of an enum as supported by Apache Commons CSV. I leave that as an exercise for the reader.