How to set a Java String variable equal to “htp://

Posted 2020-02-16 01:07

I have a large list of websites and I want to put them all in a String variable. I know I could go through the links one by one and escape the // in each, but there are a few hundred of them. Is there a way to do a "block escape", so that everything inside the "block" is escaped? This is an example of what I want to save in the variable:

String links = "http://website http://website http://website http://website http://website http://website";

Also, can anyone think of any other problems I might run into while doing this?

I wrote htp instead of http in the title because Stack Overflow does not allow me to post "hyperlinks" at my current level :p

Thanks so much

Edit: I am making a program because I have about 50 pages of a Word document filled with both emails and other text. I want to filter out just the emails. I wrote the program to do this, which was very simple; now I just need to figure out a way to store the pages in a String variable that the program will run on.

Tags: java string
4 answers
家丑人穷心不美
Post #2 · 2020-02-16 01:44

I suggest that you save your Word document as plain text. Then you can read the text with a class such as java.util.Scanner or java.io.BufferedReader.

To avoid overwriting the String variable each time you read a line, you can use an array or an ArrayList. This is better than holding all the web addresses in a single String because you can easily access each address individually whenever you like.
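
A minimal sketch of that approach, assuming the exported text separates the addresses with whitespace. The class name LinkReader and the method readTokens are made up for illustration, and the StringReader stands in for a FileReader on the exported file.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class LinkReader {
    // Reads whitespace-separated tokens (one per address) into a list.
    public static List<String> readTokens(Reader source) {
        List<String> tokens = new ArrayList<>();
        // Closing the Scanner also closes the underlying reader.
        try (Scanner scanner = new Scanner(source)) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical input; replace the StringReader with a FileReader for a real file.
        List<String> links = readTokens(
                new StringReader("http://a.example http://b.example\nhttp://c.example"));
        System.out.println(links.size() + " links, second one: " + links.get(1));
    }
}
```

Each address is then available by index, for example links.get(1).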

放荡不羁爱自由
Post #3 · 2020-02-16 01:52

Your question is not well-written. Improve it, please. In its current format it will be closed as "too vague".

Do you want to filter e-mails or websites? Your example is about websites, but your text is about e-mails. Since I don't know and want to help anyway, I did both.

Here is the code:

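// These imports go at the top of the file; the members below belong inside a class.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Local part, "@", domain labels, then a top-level domain of at least two letters.
// (?:...) is a non-capturing group.
private static final Pattern EMAIL_REGEX =
        Pattern.compile("[A-Za-z0-9](?:(?:[_\\.\\-]?[a-zA-Z0-9]+)*)@(?:[A-Za-z0-9]+)(?:(?:[\\.\\-]?[a-zA-Z0-9]+)*)\\.(?:[A-Za-z]{2,})");

// "http" or "https", "://", then any run of the listed URL characters.
private static final Pattern WEBSITE_REGEX =
        Pattern.compile("http(?:s?)://[_#\\.\\-/\\?&=a-zA-Z0-9]*");

// Reads the whole file and decodes it as UTF-8.
public static String readFileAsString(String fileName) throws IOException {
    byte[] b = Files.readAllBytes(Paths.get(fileName));
    return new String(b, StandardCharsets.UTF_8);
}

// Returns every e-mail address found in the text, in order of appearance.
public static List<String> filterEmails(String everything) {
    List<String> list = new ArrayList<String>(8192);
    Matcher m = EMAIL_REGEX.matcher(everything);
    while (m.find()) {
        list.add(m.group());
    }
    return list;
}

// Returns every http/https address found in the text, in order of appearance.
public static List<String> filterWebsites(String everything) {
    List<String> list = new ArrayList<String>(8192);
    Matcher m = WEBSITE_REGEX.matcher(everything);
    while (m.find()) {
        list.add(m.group());
    }
    return list;
}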

To make sure that it works, let's first test the filterEmails and filterWebsites methods:

public static void main(String[] args) {
    System.out.println(filterEmails("Orange, pizza whatever else joe@somewhere.com a lot of text here. Blahblah blah with Luke Skywalker (luke@starwars.com) hfkjdsh fhdsjf jdhf Paulo <aaa.aaa@bgf-ret.com.br>"));
    System.out.println(filterWebsites("Orange, pizza whatever else joe@somewhere.com a lot of text here. Blahblah blah with Luke Skywalker (http://luke.starwars.com/force) hfkjdsh fhdsjf jdhf Paulo <https://darth.vader/blackside?sith=true&midclorians> And the http://www.somewhere.com as x."));
}

It outputs:

[joe@somewhere.com, luke@starwars.com, aaa.aaa@bgf-ret.com.br]
[http://luke.starwars.com/force, https://darth.vader/blackside?sith=true&midclorians, http://www.somewhere.com]

To test the readFileAsString method:

public static void main(String[] args) throws IOException {
    System.out.println(readFileAsString("C:\\The_Path_To_Your_File\\SomeFile.txt"));
}

If that file exists, its content will be printed.

If you don't like the fact that it returns List<String> instead of a String with items divided by spaces, this is simple to solve:

public static String collapse(List<String> list) {
    StringBuilder sb = new StringBuilder(50 * list.size());
    for (String s : list) {
        sb.append(" ").append(s);
    }
    sb.delete(0, 1);
    return sb.toString();
}
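
On Java 8 or later, the standard String.join method should give the same result with less code. A minimal sketch, with the class name JoinDemo chosen only for illustration:

```java
import java.util.Arrays;
import java.util.List;

public class JoinDemo {
    // Joins the items with single spaces; an empty list gives an empty string.
    public static String collapse(List<String> list) {
        return String.join(" ", list);
    }

    public static void main(String[] args) {
        // Hypothetical addresses, used only to show the call.
        System.out.println(collapse(Arrays.asList("http://a.example", "http://b.example")));
    }
}
```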

Putting it all together:

String fileName = ...;
String everything = readFileAsString(fileName); // read the file only once
String webSites = collapse(filterWebsites(everything));
String emails = collapse(filterEmails(everything));
Bombasti
Post #4 · 2020-02-16 01:52

I'm not sure what kind of 'list of websites' you're referring to, but for, e.g., a comma-separated file of websites you could read the entire file and use the String split function to get an array, or you could use a BufferedReader to read the file line by line and add each entry to an ArrayList.
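
A minimal sketch combining both ideas, assuming entries are separated by commas and possibly spread over several lines. The names SiteListReader and readSites are hypothetical, and the StringReader stands in for a FileReader on the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class SiteListReader {
    // Reads line by line and splits each line on commas, skipping empty entries.
    public static List<String> readSites(Reader source) {
        List<String> sites = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String part : line.split(",")) {
                    String site = part.trim();
                    if (!site.isEmpty()) {
                        sites.add(site);
                    }
                }
            }
        } catch (IOException e) {
            // Rethrown unchecked so callers do not need a throws clause.
            throw new UncheckedIOException(e);
        }
        return sites;
    }

    public static void main(String[] args) {
        // Hypothetical input; use a FileReader for a real file.
        System.out.println(readSites(
                new StringReader("http://a.example, http://b.example\nhttp://c.example")));
    }
}
```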

From there you can simply loop over the array and append to a String, or if you need to:

do a "block escape", so everything in between the "block" is escaped

you can use a regular expression to extract parts of each String according to a pattern:

String oldString = "<someTag>I only want this part</someTag>";
String regExp = "(?i)(<someTag.*?>)(.+?)(</someTag>)";
String newString = oldString.replaceAll(regExp, "$2");

The above expression removes the XML tags because of the "$2" replacement, which keeps only the second group of the expression; groups are delimited by round brackets ( ). Using "$1$3" instead should give you only the surrounding XML tags.
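
Both replacements side by side, as a small sketch that reuses the pattern above. The class and method names are made up for illustration.

```java
public class GroupDemo {
    private static final String TAG_REGEX = "(?i)(<someTag.*?>)(.+?)(</someTag>)";

    // Keeps only group 2: the text between the tags.
    public static String inner(String s) {
        return s.replaceAll(TAG_REGEX, "$2");
    }

    // Keeps groups 1 and 3: the tags without their content.
    public static String tagsOnly(String s) {
        return s.replaceAll(TAG_REGEX, "$1$3");
    }

    public static void main(String[] args) {
        String oldString = "<someTag>I only want this part</someTag>";
        System.out.println(inner(oldString));    // prints: I only want this part
        System.out.println(tagsOnly(oldString)); // prints: <someTag></someTag>
    }
}
```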

Another much simpler approach to removing certain "blocks" from a String is the String replace function, where to remove the block you could simply pass in an empty string as the new value.
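
For instance, a sketch assuming the "block" to remove is a fixed piece of text (String.replace matches it literally, not as a regular expression). The class and method names are hypothetical.

```java
public class ReplaceDemo {
    // Removes every literal occurrence of block from text.
    public static String removeBlock(String text, String block) {
        return text.replace(block, "");
    }

    public static void main(String[] args) {
        // prints: a.example b.example
        System.out.println(removeBlock("http://a.example http://b.example", "http://"));
    }
}
```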

I hope some of this helps; otherwise you could provide a full example with your input "list of websites" and the output you want.

倾城 Initia
Post #5 · 2020-02-16 02:03

For your first problem, take all the text out of Word, put it in something that supports regular expressions, and use regular expressions to quote each line and end each line with +. Now edit the last line and change + to ;. Above the first line write String links =. Copy this new file into your Java source. Here's an example using regexr.
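
The same quoting step can also be sketched in Java itself rather than in a regex tool. This is only an illustration: LiteralBuilder and toJavaSource are made-up names, and the sketch assumes the lines contain no control characters. Note that // has no special meaning inside a Java string literal, so the links themselves need no escaping; only backslashes and double quotes do.

```java
import java.util.Arrays;
import java.util.List;

public class LiteralBuilder {
    // Turns each line into a quoted string literal, joined with + and ended with ;.
    public static String toJavaSource(List<String> lines) {
        if (lines.isEmpty()) {
            return "String links = \"\";\n";
        }
        StringBuilder sb = new StringBuilder("String links =\n");
        for (int i = 0; i < lines.size(); i++) {
            // Only backslashes and double quotes need escaping; "//" does not.
            String escaped = lines.get(i).replace("\\", "\\\\").replace("\"", "\\\"");
            boolean last = (i == lines.size() - 1);
            // A trailing space inside the quotes keeps neighbouring links apart.
            sb.append("    \"").append(escaped).append(last ? "\";\n" : " \" +\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical addresses; real input would be the lines exported from Word.
        System.out.print(toJavaSource(Arrays.asList("http://a.example", "http://b.example")));
    }
}
```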

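To answer your second question (thinking of problems): there is an upper limit on the length of a Java string literal. If I recall correctly it is about 2^16; the class file format stores a string constant in at most 65,535 bytes of modified UTF-8.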
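
That limit concerns literals compiled into the class file; a string built or read at run time can be much longer. A small sketch to illustrate, with a made-up class name and a made-up repeat count:

```java
public class LongStringDemo {
    // Builds a long string at run time by repeating one link.
    public static String repeatLink(String link, int times) {
        StringBuilder sb = new StringBuilder(link.length() * times);
        for (int i = 0; i < times; i++) {
            sb.append(link);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 15 characters repeated 10000 times, well past the literal limit.
        String links = repeatLink("http://website ", 10000);
        System.out.println(links.length()); // prints: 150000
    }
}
```

This is one more reason to read the exported text from a file instead of pasting it into the source.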

Oh and Perl was basically written for you to do this kind of thing (take 50 pages of text and separate out what is a url and what is an email)... not to mention grep.
