I have this long string (it's one long, continuous string):
Home address H.NO- 12 SECTOR- 12 GAUTAM BUDH NAGAR NOIDA- 121212, UTTAR PRADESH INDIA +911112121212 Last Updated: 12-JUN-12 Semester/Term-time Accommodation Type: Hall of residence (private provider) Semester/Term-time address A121A SOME APPARTMENT SOME LANE CITY COUNTY OX3 7FJ +91 1212121212 Last Updated: 12-SEP-12 Mobile Telephone Number : 01212121212
If you look at the string above, it follows this pattern:
<home_address_text><space><the_address><space><last_updated_text><last_updated_date><space><accommodation_type_text><accommodation_type><space><semester_time_address_text><semester_time_address><space><last_updated_text><last_updated_date><space><mobile_number_text><mobile_number>
I want to extract specific parts of this string, like:
1. H.NO- 12 SECTOR- 12 GAUTAM BUDH NAGAR NOIDA- 121212, UTTAR PRADESH INDIA
2. Hall of residence (private provider)
3. A121A SOME APPARTMENT SOME LANE CITY COUNTY OX3 7FJ
4. 01212121212
This information varies from person to person, so I can't just compute lengths and use substring: both the length of the whole string and the length of each part I want to extract are variable.
How can I extract specific parts of the string, as explained above, using Java? I've been looking for a long time but couldn't find a way. Any help would be very much appreciated.
This worked for me, based on your (single) example. Learn to use reluctant quantifiers in regular expressions; they help a lot in situations like this.
For example, to capture the first part: "Home address (.+?) \+\d+ Last Updated:"
Here the group "(.+?)" is reluctant (not greedy): it matches as few characters as possible, so it stops before the + sign, the digits, and the "Last Updated" label that we don't want, leaving them to be matched by the rest of the expression.
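To make the greedy/reluctant difference concrete, here is a minimal sketch. The class name `QuantifierDemo`, the helper `firstGroup`, and the shortened sample string are made up for illustration; only the two regexes follow the pattern discussed above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns capture group 1 of the first match, or null if nothing matches.
    // (Hypothetical helper, not part of the original answer.)
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Shortened, made-up sample containing two "Last Updated:" labels.
        String input = "Home address 1 MAIN ST +911 Last Updated: 12-JUN-12 "
                     + "Term address 2 SIDE ST +912 Last Updated: 12-SEP-12";

        // Greedy (.+): grabs as much as it can, so the group runs on
        // until the LAST " +digits Last Updated:" in the input.
        System.out.println(firstGroup("Home address (.+) \\+\\d+ Last Updated:", input));

        // Reluctant (.+?): grabs as little as it can, so the group stops
        // before the FIRST " +digits Last Updated:".
        System.out.println(firstGroup("Home address (.+?) \\+\\d+ Last Updated:", input));
    }
}
```

With this sample, the greedy version captures everything up to "2 SIDE ST", while the reluctant version captures only "1 MAIN ST".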
You can use this to match substrings that are surrounded by static text. Here I'm using capturing groups to locate the text I want. (Capturing groups are the parts in parentheses.)
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Goofy
{
    public static void main( String[] args )
    {
        final String input
            = "Home address H.NO- 12 SECTOR- 12 GAUTAM BUDH NAGAR " +
              "NOIDA- 121212, UTTAR PRADESH INDIA +911112121212 " +
              "Last Updated: 12-JUN-12 Semester/Term-time " +
              "Accommodation Type: Hall of residence (private " +
              "provider) Semester/Term-time address A121A SOME " +
              "APPARTMENT SOME LANE CITY COUNTY OX3 7FJ +91 " +
              "1212121212 Last Updated: 12-SEP-12 Mobile Telephone " +
              "Number : 01212121212";
        // The static labels anchor the match; each reluctant group (.+?)
        // captures the variable text between two labels.
        final String regex = "Home address (.+?) \\+\\d+ Last Updated: " +
              "\\S+ Semester/Term-time Accommodation Type: (.+?) " +
              "Semester/Term-time address (.+?) \\+\\d\\d \\d+ " +
              "Last Updated.+ Number : (\\d+)";
        Pattern pattern = Pattern.compile( regex );
        Matcher matcher = pattern.matcher( input );
        if( matcher.find() ) {
            // group() is the whole match; groups 1..groupCount() are the captures.
            System.out.println("Found: "+matcher.group() );
            for( int i = 1; i <= matcher.groupCount(); i++ ) {
                System.out.println( "  Match " + i + ": " + matcher.group( i ));
            }
        }
    }
}
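As a possible variation on the same regex, Java 7 and later also support named capturing groups, which can make the extracted fields easier to tell apart than numeric indexes. The sketch below is only one way to do it: the class name `NamedGroupsSketch`, the method `parse`, and the group names `home`, `type`, `term`, and `mobile` are invented for illustration, and the pattern assumes every record has the same labels in the same order as the single example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupsSketch {
    // The example string from the question.
    static final String SAMPLE =
        "Home address H.NO- 12 SECTOR- 12 GAUTAM BUDH NAGAR NOIDA- 121212, "
      + "UTTAR PRADESH INDIA +911112121212 Last Updated: 12-JUN-12 "
      + "Semester/Term-time Accommodation Type: Hall of residence (private provider) "
      + "Semester/Term-time address A121A SOME APPARTMENT SOME LANE CITY COUNTY OX3 7FJ "
      + "+91 1212121212 Last Updated: 12-SEP-12 Mobile Telephone Number : 01212121212";

    // Same structure as the regex above, with (?<name>...) instead of plain groups.
    // Java group names may contain only letters and digits.
    private static final Pattern RECORD = Pattern.compile(
        "Home address (?<home>.+?) \\+\\d+ Last Updated: \\S+ "
      + "Semester/Term-time Accommodation Type: (?<type>.+?) "
      + "Semester/Term-time address (?<term>.+?) \\+\\d\\d \\d+ "
      + "Last Updated.+ Number : (?<mobile>\\d+)");

    // Returns the four fields keyed by group name, or an empty map if
    // the input does not follow the assumed layout.
    static Map<String, String> parse(String input) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = RECORD.matcher(input);
        if (m.find()) {
            for (String name : new String[] {"home", "type", "term", "mobile"}) {
                fields.put(name, m.group(name));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : parse(SAMPLE).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Returning an empty map on a failed match is just one design choice; a caller that needs to distinguish "no match" more loudly might prefer to throw an exception instead.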
I think you'll want to use a regular expression. Something like the following, adapted from an example at http://www.tutorialspoint.com/java/java_regular_expressions.htm:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    public static void main( String args[] ){
        // String to be scanned to find the pattern.
        String line = "Home address H.NO- 12 SECTOR- 12 GAUTAM BUDH NAGAR NOIDA- 121212, "
            + "UTTAR PRADESH INDIA +911112121212 Last Updated: 12-JUN-12 "
            + "Semester/Term-time Accommodation Type: Hall of residence (private provider) "
            + "Semester/Term-time address A121A SOME APPARTMENT SOME LANE CITY COUNTY OX3 7FJ "
            + "+91 1212121212 Last Updated: 12-SEP-12 Mobile Telephone Number : 01212121212";
        // Reluctant (.*?) stops at the first "Last Updated:"; a greedy (.*)
        // would run on to the last one. The \+\d+ keeps the phone number
        // out of the captured address.
        String pattern = "Home address (.*?) \\+\\d+ Last Updated:";
        // Create a Pattern object
        Pattern r = Pattern.compile(pattern);
        // Now create matcher object.
        Matcher m = r.matcher(line);
        if (m.find( )) {
            // group(1) is the captured address; group(0) would be the whole match.
            System.out.println("Found value: " + m.group(1) );
        } else {
            System.out.println("NO MATCH");
        }
    }
}
Home\s+address\s+(.*?)Last\s+Updated(.*?)Accommodation\s+Type(.*?)Semester\/Term-time(.*?)Last\s+Updated(.*)Mobile\s+Telephone\s+Number\s*:\s*(\d+)
Try this and grab the captures. See the demo:
http://regex101.com/r/jI8lV7/7
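For use from Java, the backslashes in that pattern have to be doubled inside a string literal. The following is a rough sketch of how that might look; the class name `CaptureSketch` and the method `captures` are made up, and the `trim()` call is an addition of mine because several of the groups pick up surrounding spaces. Note that with this pattern some captures still carry extra text (for example, the first group includes the phone number and the third starts with ": "), so further cleanup may be needed depending on what you want.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureSketch {
    // The example string from the question.
    static final String SAMPLE =
        "Home address H.NO- 12 SECTOR- 12 GAUTAM BUDH NAGAR NOIDA- 121212, "
      + "UTTAR PRADESH INDIA +911112121212 Last Updated: 12-JUN-12 "
      + "Semester/Term-time Accommodation Type: Hall of residence (private provider) "
      + "Semester/Term-time address A121A SOME APPARTMENT SOME LANE CITY COUNTY OX3 7FJ "
      + "+91 1212121212 Last Updated: 12-SEP-12 Mobile Telephone Number : 01212121212";

    // The pattern from the answer, with each backslash doubled for Java.
    static final Pattern P = Pattern.compile(
        "Home\\s+address\\s+(.*?)Last\\s+Updated(.*?)Accommodation\\s+Type(.*?)"
      + "Semester\\/Term-time(.*?)Last\\s+Updated(.*)"
      + "Mobile\\s+Telephone\\s+Number\\s*:\\s*(\\d+)");

    // Returns the six captures (trimmed), or an empty array if there is no match.
    static String[] captures(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            out[i - 1] = m.group(i).trim();
        }
        return out;
    }

    public static void main(String[] args) {
        String[] groups = captures(SAMPLE);
        for (int i = 0; i < groups.length; i++) {
            System.out.println("Group " + (i + 1) + ": " + groups[i]);
        }
    }
}
```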