Converting a sentence string to a string array of

2019-01-13 05:48发布

I need my Java program to take a string like:

"This is a sample sentence."

and turn it into a string array like:

{"this","is","a","sample","sentence"}

No periods, or punctuation (preferably). By the way, the string input is always one sentence.

Is there an easy way to do this that I'm not seeing? Or do we really have to search for spaces a lot and create new strings from the areas between the spaces (which are words)?

14条回答
放我归山
2楼-- · 2019-01-13 06:04

string.replaceAll() doesn't correctly work with locale different from predefined. At least in jdk7u10.

This example creates a word dictionary from textfile with windows cyrillic charset CP1251

    public static void main (String[] args) {
    String fileName = "Tolstoy_VoinaMir.txt";
    try {
        List<String> lines = Files.readAllLines(Paths.get(fileName),
                                                Charset.forName("CP1251"));
        Set<String> words = new TreeSet<>();
        for (String s: lines ) {
            for (String w : s.split("\\s+")) {
                w = w.replaceAll("\\p{Punct}","");
                words.add(w);
            }
        }
        for (String w: words) {
            System.out.println(w);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
查看更多
贼婆χ
3楼-- · 2019-01-13 06:04

I already did post this answer somewhere, i will do it here again. This version doesn't use any major inbuilt method. You got the char array, convert it into a String. Hope it helps!

import java.util.Scanner;

public class SentenceToWord 
{
    public static int getNumberOfWords(String sentence)
    {
        int counter=0;
        for(int i=0;i<sentence.length();i++)
        {
            if(sentence.charAt(i)==' ')
            counter++;
        }
        return counter+1;
    }

    public static char[] getSubString(String sentence,int start,int end) //method to give substring, replacement of String.substring() 
    {
        int counter=0;
        char charArrayToReturn[]=new char[end-start];
        for(int i=start;i<end;i++)
        {
            charArrayToReturn[counter++]=sentence.charAt(i);
        }
        return charArrayToReturn;
    }

    public static char[][] getWordsFromString(String sentence)
    {
        int wordsCounter=0;
        int spaceIndex=0;
        int length=sentence.length();
        char wordsArray[][]=new char[getNumberOfWords(sentence)][]; 
        for(int i=0;i<length;i++)
        {
            if(sentence.charAt(i)==' ' || i+1==length)
            {
            wordsArray[wordsCounter++]=getSubString(sentence, spaceIndex,i+1); //get each word as substring
            spaceIndex=i+1; //increment space index
            }
        }
        return  wordsArray; //return the 2 dimensional char array
    }


    public static void main(String[] args) 
    {
    System.out.println("Please enter the String");
    Scanner input=new Scanner(System.in);
    String userInput=input.nextLine().trim();
    int numOfWords=getNumberOfWords(userInput);
    char words[][]=new char[numOfWords+1][];
    words=getWordsFromString(userInput);
    System.out.println("Total number of words found in the String is "+(numOfWords));
    for(int i=0;i<numOfWords;i++)
    {
        System.out.println(" ");
        for(int j=0;j<words[i].length;j++)
        {
        System.out.print(words[i][j]);//print out each char one by one
        }
    }
    }

}
查看更多
Melony?
4楼-- · 2019-01-13 06:07

You can just split your string like that using this regular expression

String l = "sofia, malgré tout aimait : la laitue et le choux !" <br/>
l.split("[[ ]*|[,]*|[\\.]*|[:]*|[/]*|[!]*|[?]*|[+]*]+");
查看更多
爷、活的狠高调
5楼-- · 2019-01-13 06:07

Use string.replace(".", "").replace(",", "").replace("?", "").replace("!","").split(' ') to split your code into an array with no periods, commas, question marks, or exclamation marks. You can add/remove as many replace calls as you want.

查看更多
forever°为你锁心
6楼-- · 2019-01-13 06:11
爷、活的狠高调
7楼-- · 2019-01-13 06:13

Try this:

String[] stringArray = Pattern.compile("ian").split(
"This is a sample sentence"
.replaceAll("[^\\p{Alnum}]+", "") //this will remove all non alpha numeric chars
);

for (int j=0; i<stringArray .length; j++) {
  System.out.println(i + " \"" + stringArray [j] + "\"");
}
查看更多
登录 后发表回答