After many attempts, I have concluded that the best way to transfer data from an AS400 (non-Unicode) to SQL Server with SSIS is:
1. Use the native transfer utility to dump the data to TSV (tab-delimited) files
2. Convert the files from UTF-8 to Unicode (UTF-16)
3. Use BULK INSERT to load them into SQL Server
For step 2, I found some ready-made code that does this:
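    // Requires: using System.IO; using System.Text;
    string from = @"\\appsrv02\c$\bg_f0101.tsv";
    string to = @"\\appsrv02\c$\bg_f0101.txt";

    // Read the dump as UTF-8 and rewrite it as UTF-16 LE (Encoding.Unicode),
    // using buffers of 1,000,000 characters for both streams.
    using (StreamReader reader = new StreamReader(from, Encoding.UTF8, false, 1000000))
    using (StreamWriter writer = new StreamWriter(to, false, Encoding.Unicode, 1000000))
    {
        while (!reader.EndOfStream)
        {
            var line = reader.ReadLine();
            // Copy only non-empty lines; empty lines are dropped.
            if (line.Length > 0)
                writer.WriteLine(line);
        }
    }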
I need to fully understand what is happening here with the encoding and why this is necessary.
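To make the question concrete, here is my current understanding of what the conversion changes at the byte level. It is a minimal sketch and not part of the SSIS package. `EncodingDemo` and `ToHex` are hypothetical names I made up for illustration. As far as I can tell, the characters stay the same and only their byte representation changes:

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: returns the bytes of `text` in the given
    // encoding as a dash-separated hex string.
    public static string ToHex(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        string sample = "A\u00E9"; // "A" followed by "é"

        // UTF-8: 1 byte for "A", 2 bytes for "é".
        Console.WriteLine(ToHex(sample, Encoding.UTF8));    // 41-C3-A9

        // Encoding.Unicode is UTF-16 little-endian: 2 bytes per character here.
        Console.WriteLine(ToHex(sample, Encoding.Unicode)); // 41-00-E9-00
    }
}
```

If that is right, the script task only re-encodes the text (and `StreamWriter` with `Encoding.Unicode` also writes an `FF FE` byte order mark at the start of a new file), so presumably it is needed only because of the format my bulk insert expects.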
How can I replace this script task with a more elegant solution?