How can I parse a string into letters, digits, etc.?

Posted 2020-06-27 02:29

I have a string of characters like this '12hjb42&34ni3&(*&' in MATLAB.

I want to separate the digits and letters and everything else through regex or some other easier way. How can I do this?

2 answers
chillily
#2 · 2020-06-27 03:12

Instead of using regular expressions, I think it would be easier to use the function ISSTRPROP:

str = '12hjb42&34ni3&(*&';                   %# Your sample string
alphaStr = str(isstrprop(str,'alpha'));      %# Get the alphabetic characters
digitStr = str(isstrprop(str,'digit'));      %# Get the numeric characters
otherStr = str(~isstrprop(str,'alphanum'));  %# Get everything that isn't an
                                             %#   alphanumeric character

Which would give you these results:

alphaStr = 'hjbni'
digitStr = '1242343'
otherStr = '&&(*&'

If you really wanted to use REGEXP, this is how you could do it:

matches = regexp(str,{'[a-zA-Z]','\d','[^a-zA-Z\d]'},'match');
alphaStr = [matches{1}{:}];
digitStr = [matches{2}{:}];
otherStr = [matches{3}{:}];
再贱就再见
#3 · 2020-06-27 03:14

I don't think a single pattern with a fixed number of capture groups can handle this unless you know ahead of time how many number/letter/other chunks you have. For example, in 'st34*' there are 3 chunks, so this would work:

regexprep('st34*', '([A-Za-z]+|\d+|\W+)([A-Za-z]+|\d+|\W+)([A-Za-z]+|\d+|\W+)', ...
 '$1 $2 $3')
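
As an alternative sketch that does not depend on the chunk count: regexp with the 'match' option returns every non-overlapping match in order, so an alternation of the three character classes should split a string into however many runs it contains. The variable names chunks and spaced are just illustrative, and the "other" class here is written as [^A-Za-z\d] rather than \W so that an underscore is also treated as "other".

```matlab
% Sketch: split a string into runs of letters, digits, and everything else.
str = 'st34*';

% Alternation of three character classes; 'match' returns each run in order.
chunks = regexp(str, '[A-Za-z]+|\d+|[^A-Za-z\d]+', 'match');

% Join the runs with spaces, mirroring the '$1 $2 $3' replacement above.
spaced = strjoin(chunks, ' ');
```

For 'st34*' this gives chunks = {'st', '34', '*'} and spaced = 'st 34 *'.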

If you don't know the number of chunks, you can cast to int and bucket into your 3 categories, then see where the category changes to find your break point.

n = int32('st34a');                           % character codes of the string
idx = zeros(size(n));                         % 0 = everything else
idx(ismember(n, int32('0'):int32('9'))) = 1;  % 1 = digits
idx(ismember(n, int32('a'):int32('z'))) = 2;  % 2 = lowercase letters
idx(ismember(n, int32('A'):int32('Z'))) = 2;  % 2 = uppercase letters
idx = diff(idx) ~= 0;  % true at k where characters k and k+1 differ in type

I haven't tested this, but something like this should work.
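
To go from breakpoints to the actual chunks, one possible sketch is below. It is an assumption about how the idea could be finished, not part of the answer as posted: it uses isstrprop for the categories instead of the ismember ranges, and the names kind, breaks, starts, stops and chunks are made up for illustration.

```matlab
% Sketch: cut a string wherever the character category changes.
str = 'st34a';

% Category per character: 0 = other, 1 = digit, 2 = letter.
kind = zeros(size(str));
kind(isstrprop(str, 'digit')) = 1;
kind(isstrprop(str, 'alpha')) = 2;

% breaks(k) is the index of the last character before a category change.
breaks = find(diff(kind) ~= 0);

% Each chunk runs from one breakpoint + 1 up to the next breakpoint.
starts = [1, breaks + 1];
stops  = [breaks, numel(str)];
chunks = arrayfun(@(a, b) str(a:b), starts, stops, 'UniformOutput', false);
```

For 'st34a' the categories are [2 2 1 1 2], the breakpoints fall after positions 2 and 4, and chunks comes out as {'st', '34', 'a'}.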
