How to read and transfer large files with Apache Camel in any encoding?
Encoding. If you ask API developers these days which encoding they accept when they produce and consume data, the common answer is UTF-8, which has become the standard for moving text-based data over REST calls.
But if you're still stuck in the 20th century, working with file-system APIs, and you need to transfer a huge amount of data to another system as a file that is not encoded in the de facto standard UTF-8, you have your work cut out for you: try it with Apache Camel and you'll likely run into the dreaded out-of-memory errors (java.lang.OutOfMemoryError).
In this post, we'll help you close out that Jira ticket with your name on it by building a Spring Boot service that uses the Apache Camel integration framework to transfer multi-gigabyte files to a destination in any encoding you like.
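We'll use Maven to build the project. Add the plugins below to your pom.xml. The camel-package-maven-plugin generates, at build time, a loader for the custom type converter we write next (which turns the file payload into a stream-based payload), so Camel registers the converter in its type converter registry automatically. The build-helper-maven-plugin adds the generated sources and resources under src/generated to the build.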
<plugin>
    <groupId>org.apache.camel</groupId>
    <artifactId>camel-package-maven-plugin</artifactId>
    <version>3.4.0</version> <!--your version-->
    <executions>
        <execution>
            <id>generate</id>
            <goals>
                <goal>generate-component</goal>
            </goals>
            <phase>process-classes</phase>
        </execution>
    </executions>
</plugin>
<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>build-helper-maven-plugin</artifactId>
    <executions>
        <execution>
            <phase>initialize</phase>
            <goals>
                <goal>add-source</goal>
                <goal>add-resource</goal>
            </goals>
            <configuration>
                <sources>
                    <source>src/generated/java</source>
                </sources>
                <resources>
                    <resource>
                        <directory>src/generated/resources</directory>
                    </resource>
                </resources>
            </configuration>
        </execution>
    </executions>
</plugin>
Now we need a class that implements the type converter. This is where the payload of the exchange actually gets converted. You can use any encoding instead of Shift_JIS; I was experimenting with Japanese files, so that's what I used. For the common encodings (US-ASCII, ISO-8859-1, UTF-8 and the UTF-16 variants), java.nio.charset.StandardCharsets provides ready-made constants; for anything else, such as Shift_JIS, look the charset up with Charset.forName.
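import java.io.*;
import java.nio.charset.Charset;
import org.apache.camel.Converter;
import org.apache.camel.TypeConverters;
import org.apache.camel.component.file.GenericFile;

@Converter(generateLoader = true)
public class CustomTypeConvertor implements TypeConverters {
    // Opens the polled file and decodes it as Shift_JIS (swap in your own charset) into a buffered character stream
    @Converter
    public static BufferedReader genericFileToBufferedReader(GenericFile<File> file) throws IOException {
        return new BufferedReader(new InputStreamReader(new FileInputStream(file.getFile()), Charset.forName("Shift_JIS")));
    }
}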
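To see why the converter has to name the charset explicitly, here is a small plain-JDK sketch (no Camel involved) that writes Japanese text as Shift_JIS bytes and reads it back through the same kind of reader the converter builds: an InputStreamReader with an explicit charset, wrapped in a BufferedReader. The class name CharsetDecodeDemo and the readAll helper are hypothetical, and the sketch assumes your JVM supports Shift_JIS (full JDK installations normally do).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Collectors;

public class CharsetDecodeDemo {

    // Shift_JIS is not one of the StandardCharsets constants, so it is looked up by name
    static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    // "konnichiwa", written with Unicode escapes so the source file's own encoding doesn't matter
    static final String GREETING = "\u3053\u3093\u306b\u3061\u306f";

    // Decodes a file the way the converter does: InputStreamReader with an explicit charset,
    // wrapped in a BufferedReader
    static String readAll(Path file, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), charset))) {
            return reader.lines().collect(Collectors.joining("\n"));
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sjis-demo", ".txt");
        try {
            // Store the greeting as Shift_JIS bytes, like a legacy Japanese file
            Files.write(file, GREETING.getBytes(SHIFT_JIS));
            // Right charset: the original text comes back
            System.out.println(readAll(file, SHIFT_JIS).equals(GREETING)); // prints true
            // Wrong charset: malformed bytes are replaced with U+FFFD, so the text is garbled
            System.out.println(readAll(file, StandardCharsets.UTF_8).contains("\uFFFD")); // prints true
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Note that reading with the wrong charset doesn't fail loudly: InputStreamReader replaces malformed bytes with U+FFFD, so a misconfigured encoding shows up as garbled text at the destination rather than as an exception.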
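That's it. Don't set the charset option on the endpoint, though: when it's set, Camel loads the entire content of the file into memory to convert it to the requested charset before transferring it. Your route also needs a processor like the sample below, which replaces the file payload with a plain java.io.File.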
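// Needs org.apache.camel.Exchange, org.apache.camel.Processor and java.io.File
.process(new Processor() {
    @Override
    public void process(Exchange exchange) throws Exception {
        // Unwrap the GenericFile body into the underlying java.io.File
        File f = exchange.getIn().getBody(File.class);
        // Set the plain File back as the body so the producer can read it as an InputStream
        exchange.getIn().setBody(f);
    }
})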
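The reason to leave the charset option unset is memory: a conversion that materializes the whole file as a String needs the whole file on the heap. For intuition about the streaming alternative, here is a plain-JDK sketch, independent of Camel and not a description of how Camel works internally, that transcodes a file from one charset to another through a fixed-size char buffer. Unlike the converter above, which only decodes, it also re-encodes into a target charset; the StreamingTranscoder class, the transcode method and the 8192-char buffer are hypothetical choices.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StreamingTranscoder {

    // Hypothetical chunk size: only this many chars are held at once, whatever the file size
    private static final int BUFFER_CHARS = 8192;

    // Decodes source with charset 'from' and re-encodes it into target with charset 'to', chunk by chunk
    static void transcode(Path source, Charset from, Path target, Charset to) throws IOException {
        try (Reader in = new InputStreamReader(Files.newInputStream(source), from);
             Writer out = new OutputStreamWriter(Files.newOutputStream(target), to)) {
            char[] buffer = new char[BUFFER_CHARS];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } // closing the writer flushes any pending encoded bytes
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 4) {
            System.err.println("usage: java StreamingTranscoder <in-file> <in-charset> <out-file> <out-charset>");
            System.exit(1);
        }
        transcode(Paths.get(args[0]), Charset.forName(args[1]), Paths.get(args[2]), Charset.forName(args[3]));
    }
}
```

For example, java StreamingTranscoder big.txt Shift_JIS big-utf8.txt UTF-8. Heap use stays roughly constant regardless of file size, because only one buffer's worth of characters (plus the small internal buffers of the reader and writer) is held at a time. Characters that don't exist in the target charset are silently replaced, typically with '?'.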
You can see what happens behind the scenes by enabling TRACE (or DEBUG) logging, for example in your Spring Boot application.properties:
logging.level.org.apache.camel.component.file.remote=TRACE
logging.level.org.apache.camel.component.file=TRACE
Look for a log line like the following, which confirms that the file is being stored from a stream rather than from an in-memory copy:
o.a.c.c.file.remote.SftpOperations : About to store file: New Text Document.txt using stream: java.io.BufferedInputStream@7d7218fc
Cheers! If you found this useful, do consider leaving feedback. Thanks, and welcome to a new year, hopefully one that takes us out of this pandemic!