Hi,
I have the following code:
import java.io.*;
import java.util.ArrayList;
import java.util.Scanner;
import java.util.regex.*;

public class RegexSimple4
{
    public static void main(String[] args) {
        try {
            Scanner myfis = new Scanner(new File("D:\\myfis32.txt"));
            ArrayList<String> foundaz = new ArrayList<String>();
            ArrayList<String> noduplicates = new ArrayList<String>();
            // Compile once; "[aA-zZ]*" also matched the empty string and punctuation
            Pattern pi = Pattern.compile("[a-zA-Z]+");
            while (myfis.hasNextLine()) {
                String line = myfis.nextLine();
                String delim = " ";
                String[] words = line.split(delim);
                for (String s : words) {
                    // Check for null before calling a method on s
                    if (s != null && !s.isEmpty()) {
                        Matcher ma = pi.matcher(s);
                        if (ma.find()) {
                            foundaz.add(s);
                        }
                    }
                }
            }
            myfis.close();
            if (foundaz.isEmpty()) {
                System.out.println("No words have been found");
            }
            if (!foundaz.isEmpty()) {
                int n = foundaz.size();
                String plus = foundaz.get(0);
                noduplicates.add(plus);
                for (int i = 1; i < n; i++) {
                    // Compare with the previous word in foundaz; noduplicates.get(i-1)
                    // goes out of bounds once a duplicate has been skipped
                    if (!foundaz.get(i - 1).equalsIgnoreCase(foundaz.get(i))) {
                        noduplicates.add(foundaz.get(i));
                    }
                }
                //System.out.print("The word/words \n"+i);
            }
            if (!foundaz.isEmpty()) {
                System.out.print("Original text \n");
                for (String s : foundaz) {
                    System.out.println(s);
                }
            }
            if (!noduplicates.isEmpty()) {
                System.out.print("Remove duplicates\n");
                for (String s : noduplicates) {
                    System.out.println(s);
                }
            }
        } catch (Exception ex) {
            System.out.println(ex);
        }
    }
}
The code is meant to remove consecutive duplicate words from phrases, but it works only for a column of strings, not for full-length phrases.
For example, the input
Blah blah dog cat mice. Cat mice dog dog.
should give the output
Blah dog cat mice. Cat mice dog.
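To make the goal concrete, here is a minimal sketch of one possible approach that works on a whole phrase at once: a regular expression with a case-insensitive backreference. This is a different technique from the list comparison in the code above. The class name ConsecutiveDuplicates and the method removeConsecutiveDuplicates are hypothetical names chosen for illustration, and the sketch assumes that words consist of ASCII letters and that repeated words are separated only by whitespace.

```java
import java.util.regex.Pattern;

final class ConsecutiveDuplicates {
    // A word, followed one or more times by whitespace and the same word again.
    // \b keeps "dog dogs" or "this is" from being treated as repeats.
    private static final Pattern REPEATED_WORD =
            Pattern.compile("\\b([a-zA-Z]+)(?:\\s+\\1\\b)+", Pattern.CASE_INSENSITIVE);

    // Keeps the first occurrence of each run of repeated words (assumption:
    // the casing of the first occurrence is the one to keep).
    public static String removeConsecutiveDuplicates(String phrase) {
        return REPEATED_WORD.matcher(phrase).replaceAll("$1");
    }

    public static void main(String[] args) {
        String input = "Blah blah dog cat mice. Cat mice dog dog.";
        System.out.println(removeConsecutiveDuplicates(input));
        // prints: Blah dog cat mice. Cat mice dog.
    }
}
```

Because the replacement works on the phrase itself, punctuation and spacing around the kept words stay as they were, which the word-by-word list approach does not preserve.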
Sincerely,