ReceiveAsync interrupting/breaking message passing

2019-08-06 09:38发布

This problem raised its head while trying to implement the suggested solution to this problem.

Problem summary

Performing a ReceiveAsync() call from a TransformBlock to a WriteOnceBlock is causing the TransformBlock to essentially remove itself from the flow. It ceases to propagate any kind of message, whether it's data or a completion signal.

System design

The system is intended for parsing large CSV files through a series of steps.

The problematic part of the flow can be (inexpertly) visualized as follows:

Partial data flow

The parallelogram is a BufferBlock, diamonds are BroadcastBlocks, triangles are WriteOnceBlocks and arrows are TransformBlocks. Solid lines denote a link created with LinkTo(), and the dotted line represents a ReceiveAsync() call from the ParsedHeaderAndRecordJoiner to the ParsedHeaderContainer block. I'm aware that this flow is somewhat suboptimal but that's not the primary reason for the question.

Code

Application root

Here is a part of the class that creates the necessary blocks and links them together using PropagateCompletion

using (var cancellationSource = new CancellationTokenSource())
{
    var cancellationToken = cancellationSource.Token;
    var temporaryEntityInstance = new Card(); // Just as an example

    var producerQueue = queueFactory.CreateQueue<string>(new DataflowBlockOptions{CancellationToken = cancellationToken});
    var recordDistributor = distributorFactory.CreateDistributor<string>(s => (string)s.Clone(), 
        new DataflowBlockOptions { CancellationToken = cancellationToken });
    var headerRowContainer = containerFactory.CreateContainer<string>(s => (string)s.Clone(), 
        new DataflowBlockOptions { CancellationToken = cancellationToken });
    var headerRowParser = new HeaderRowParserFactory().CreateHeaderRowParser(temporaryEntityInstance.GetType(), ';', 
        new ExecutionDataflowBlockOptions { CancellationToken = cancellationToken });
    var parsedHeaderContainer = containerFactory.CreateContainer<HeaderParsingResult>(HeaderParsingResult.Clone, 
        new DataflowBlockOptions { CancellationToken = cancellationToken});
    var parsedHeaderAndRecordJoiner = new ParsedHeaderAndRecordJoinerFactory().CreateParsedHeaderAndRecordJoiner(parsedHeaderContainer, 
        new ExecutionDataflowBlockOptions { CancellationToken = cancellationToken });
    var entityParser = new entityParserFactory().CreateEntityParser(temporaryEntityInstance.GetType(), ';',
        dataflowBlockOptions: new ExecutionDataflowBlockOptions { CancellationToken = cancellationToken });
    var entityDistributor = distributorFactory.CreateDistributor<EntityParsingResult>(EntityParsingResult.Clone, 
        new DataflowBlockOptions{CancellationToken = cancellationToken});

    var linkOptions = new DataflowLinkOptions {PropagateCompletion = true};

    // Producer subprocess
    producerQueue.LinkTo(recordDistributor, linkOptions);

    // Header subprocess
    recordDistributor.LinkTo(headerRowContainer, linkOptions);
    headerRowContainer.LinkTo(headerRowParser, linkOptions);
    headerRowParser.LinkTo(parsedHeaderContainer, linkOptions);
    parsedHeaderContainer.LinkTo(errorQueue, new DataflowLinkOptions{MaxMessages = 1, PropagateCompletion = true}, dataflowResult => !dataflowResult.WasSuccessful);

    // Parsing subprocess
    recordDistributor.LinkTo(parsedHeaderAndRecordJoiner, linkOptions);
    parsedHeaderAndRecordJoiner.LinkTo(entityParser, linkOptions, joiningResult => joiningResult.WasSuccessful);
    entityParser.LinkTo(entityDistributor, linkOptions);
    entityDistributor.LinkTo(errorQueue, linkOptions, dataflowResult => !dataflowResult.WasSuccessful);
}

HeaderRowParser

This block parses a header row from a CSV file and does some validation.

public class HeaderRowParserFactory
{
    public TransformBlock<string, HeaderParsingResult> CreateHeaderRowParser(Type entityType,
        char delimiter,
        ExecutionDataflowBlockOptions dataflowBlockOptions = null)
    {
        return new TransformBlock<string, HeaderParsingResult>(headerRow =>
        {
            // Set up some containers
            var result = new HeaderParsingResult(identifier: "N/A", wasSuccessful: true);
            var fieldIndexesByPropertyName = new Dictionary<string, int>();

            // Get all serializable properties on the chosen entity type
            var serializableProperties = entityType.GetProperties()
                .Where(prop => prop.IsDefined(typeof(CsvFieldNameAttribute), false))
                .ToList();

            // Add their CSV fieldnames to the result
            var entityFieldNames = serializableProperties.Select(prop => prop.GetCustomAttribute<CsvFieldNameAttribute>().FieldName);
            result.SetEntityFieldNames(entityFieldNames);

            // Create the dictionary of properties by field name
            var serializablePropertiesByFieldName = serializableProperties.ToDictionary(prop => prop.GetCustomAttribute<CsvFieldNameAttribute>().FieldName, prop => prop, StringComparer.OrdinalIgnoreCase);

            var fields = headerRow.Split(delimiter);

            for (var i = 0; i < fields.Length; i++)
            {
                // If any field in the CSV is unknown as a serializable property, we return a failed result
                if (!serializablePropertiesByFieldName.TryGetValue(fields[i], out var foundProperty))
                {
                    result.Invalidate($"The header row contains a field that does not match any of the serializable properties - {fields[i]}.",
                        DataflowErrorSeverity.Critical);
                    return result;
                }

                // Perform a bunch more validation

                fieldIndexesByPropertyName.Add(foundProperty.Name, i);
            }

            result.SetFieldIndexesByName(fieldIndexesByPropertyName);
            return result;
        }, dataflowBlockOptions ?? new ExecutionDataflowBlockOptions());
    }
}

ParsedHeaderAndRecordJoiner

For each subsequent record that comes through the pipe, this block is intended to retrieve the parsed header data and add it to the record.

public class ParsedHeaderAndRecordJoinerFactory
{
    public TransformBlock<string, HeaderAndRecordJoiningResult> CreateParsedHeaderAndRecordJoiner(WriteOnceBlock<HeaderParsingResult> parsedHeaderContainer, 
        ExecutionDataflowBlockOptions dataflowBlockOptions = null)
    {
        return new TransformBlock<string, HeaderAndRecordJoiningResult>(async csvRecord =>
            {
                var headerParsingResult = await parsedHeaderContainer.ReceiveAsync();

                // If the header couldn't be parsed, a critical error is already on its way to the failure logger so we don't need to continue
                if (!headerParsingResult.WasSuccessful) return new HeaderAndRecordJoiningResult(identifier: "N.A.", wasSuccessful: false, null, null);

                // The entity parser can't do anything with the header record, so we send a message with wasSuccessful false
                var isHeaderRecord = true;
                foreach (var entityFieldName in headerParsingResult.EntityFieldNames)
                {
                    isHeaderRecord &= csvRecord.Contains(entityFieldName);
                }
                if (isHeaderRecord) return new HeaderAndRecordJoiningResult(identifier: "N.A.", wasSuccessful: false, null, null);

                return new HeaderAndRecordJoiningResult(identifier: "N.A.", wasSuccessful: true, headerParsingResult, csvRecord);
            }, dataflowBlockOptions ?? new ExecutionDataflowBlockOptions());
    }
}

Problem detail

With the current implementation, the ParsedHeaderAndRecordJoiner receives the data correctly from the ReceiveAsync() call to ParsedHeaderContainer and returns as expected, however no message arrives at the EntityParser.

Also, when a Complete signal is sent to the front of the flow (the ProducerQueue), it propagates to the RecordDistributor but then stops at the ParsedHeaderAndRecordJoiner (it does continue from the HeaderRowContainer onwards, so the RecordDistributor is passing it on).

If I remove the ReceiveAsync() call and mock the data myself, the block behaves as expected.

1条回答
▲ chillily
2楼-- · 2019-08-06 10:00

I think this part is key

however no message arrives at the EntityParser.

Based on the sample the only way EntityParser doesn't receive a message output by the ParsedHeaderAndRecordJoiner is when WasSuccessful returns false. The predicate used in your link excludes failed messages but those messages have no where to go, so they build up in the ParsedHeaderAndRecordJoiner output buffer and would also prevent Completion from propagating. You'd need to link a null target to dump the failed messages.

parsedHeaderAndRecordJoiner.LinkTo(DataflowBlock.NullTarget<HeaderParsingResult>());

Further if your mock data always is coming back with WasSuccessful true, then that might be pointing you to the await ...ReceiveAsync()

Not necessarily the smoking gun but a good place to start. Can you confirm the state of all messages in the output buffer of ParsedHeaderAndRecordJoiner when the pipeline gets stuck.

查看更多
登录 后发表回答