Is there a technical name for this concept of pars

Posted 2019-05-21 13:58

Question:

I came up with a technique a while back that I've been using in multiple projects. It uses a single string to store a list of values. Each value is prefixed with its size, followed by a delimiter (after the size) and then the data, and this repeats for every value. Because the size says exactly how many characters to read, a value can contain literally any character, including the delimiter itself, so nothing has to be escaped or excluded.

Here's a sample of such a string:

23|This is the first value13|Another value5|third

That translates to a list of these values:

  • This is the first value
  • Another value
  • third
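
For illustration, the round trip could be sketched in Python as follows. This is not the original implementation (which is not shown here); the function names `encode` and `decode` are hypothetical, and the sketch assumes sizes are counted in characters rather than bytes and written as decimal digits.

```python
def encode(values, sep="|"):
    """Join strings as <size><sep><data>, repeated for every value."""
    return "".join(f"{len(v)}{sep}{v}" for v in values)


def decode(packed, sep="|"):
    """Split a length-prefixed string back into its list of values."""
    values = []
    pos = 0
    while pos < len(packed):
        # The size field runs from pos up to the next separator.
        sep_at = packed.index(sep, pos)
        size = int(packed[pos:sep_at])
        start = sep_at + len(sep)
        end = start + size
        if end > len(packed):
            raise ValueError("declared size runs past the end of the input")
        # The data is taken by count, so it may itself contain the separator.
        values.append(packed[start:end])
        pos = end
    return values


packed = encode(["This is the first value", "Another value", "third"])
print(packed)
print(decode(packed))
print(decode(encode(["a|b", "", "12|x"])))
```

The last line shows why no character has to be excluded: values that contain `|`, or that are empty, still round-trip because the decoder only looks for the separator while reading a size field.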

I've learned by testing that this method (along with my functions to convert between this string and either an array or a string list) is very fast and uses little memory. It's also very useful for sending data packets (which is where I first came up with it).

Is there a technical name for this? "Parsing" is too broad a word in this case; there must be a more specific term.

Answer 1:

Of the standard, established kinds of serialization, the closest one I'm familiar with is type-length-value (TLV) encoding. It differs from your scheme in that every field also carries a type tag, so field types need not be fixed, whereas your scheme requires the type of each field to be known in advance (and indeed, you seem to use only strings in all fields).
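
To make the comparison concrete, here is a minimal TLV sketch in Python. The layout (a 1-byte type, a 2-byte big-endian length, then the value) and the type tags `TYPE_TEXT` and `TYPE_UINT32` are assumptions chosen for this illustration; real TLV formats each define their own field widths and tag values.

```python
import struct

# Hypothetical type tags, chosen only for this illustration.
TYPE_TEXT = 1
TYPE_UINT32 = 2

HEADER = ">BH"  # 1-byte type, 2-byte big-endian length (3 bytes in total)


def tlv_encode(fields):
    """Pack (type, payload bytes) pairs as type, length, value records."""
    out = bytearray()
    for ftype, payload in fields:
        out += struct.pack(HEADER, ftype, len(payload))
        out += payload
    return bytes(out)


def tlv_decode(data):
    """Unpack TLV records back into a list of (type, payload bytes) pairs."""
    fields = []
    pos = 0
    while pos < len(data):
        ftype, length = struct.unpack_from(HEADER, data, pos)
        pos += struct.calcsize(HEADER)
        payload = data[pos:pos + length]
        if len(payload) != length:
            raise ValueError("truncated value")
        fields.append((ftype, payload))
        pos += length
    return fields


message = tlv_encode([
    (TYPE_TEXT, "third".encode("utf-8")),
    (TYPE_UINT32, struct.pack(">I", 2019)),
])
print(tlv_decode(message))
```

Dropping the type byte and writing the length as decimal text followed by `|` gives essentially the scheme from the question, which is why it is sometimes described simply as length-prefixed.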



Answer 2:

The FORTRAN language's syntax had Hollerith constants, sometimes called Hollerith strings. They are identical to your example apart from using the letter H instead of |; for instance, your last value would be written 5Hthird.



Answer 3:

This is called marshalling.



Answer 4:

This generic kind of data handling has been around as long as computer science itself. It goes by different names, but the underlying idea is to handle larger amounts of data at once for the sake of efficiency (usually gaining speed through less disk I/O). Notable examples within Delphi include TMemo.Text and, even before Delphi, the TEXT or TEXTFILE type in Turbo/Borland Pascal. Behind the scenes, this type reads back "text data" and then parses it out much as you describe. Read as a stream (the way files are processed anyway), standard Windows text files use #13#10 (CR LF) as the delimiter, which can be parsed to determine where the line breaks occur.
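
As a rough sketch of the delimiter-based approach described here, the Python snippet below joins and splits values on CR LF. The helper names `join_lines` and `split_lines` are hypothetical. It also shows the limitation the length-prefixed scheme from the question avoids: a value that contains the delimiter does not survive the round trip.

```python
CRLF = "\r\n"  # #13#10 in Pascal notation


def join_lines(values):
    """Store a list of strings as one delimiter-separated string."""
    return CRLF.join(values)


def split_lines(text):
    """Recover the list by cutting at every delimiter."""
    return text.split(CRLF)


plain = ["This is the first value", "Another value", "third"]
print(split_lines(join_lines(plain)) == plain)

# A value with an embedded line break is cut in two when split again.
tricky = ["a value with\r\nan embedded break", "third"]
print(split_lines(join_lines(tricky)))
```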