org.htmlparser.util
Class ParserUtils
public class ParserUtils extends Object
Method Summary |
static Parser | createParserParsingAnInputString(String input)
Create a Parser that reads from a String (instead of a URL or a string naming a URL location). |
static Node[] | findTypeInNode(Node node, Class type)
Search given node and pick up any objects of given type. |
static String | removeChars(String s, char occur) |
static String | removeEscapeCharacters(String inputString) |
static String | removeTrailingBlanks(String text) |
static String[] | splitButChars(String input, String charsDoNotBeRemoved)
Split the input string, treating every character as a separator except those listed in the charsDoNotBeRemoved parameter. |
static String[] | splitButDigits(String input, String charsDoNotBeRemoved)
Split the input string, treating every non-numeric character as a separator except those listed in the charsDoNotBeRemoved parameter. |
static String[] | splitChars(String input, String charsToBeRemoved)
Split the input string, using the characters listed in the charsToBeRemoved parameter as separators. |
static String[] | splitSpaces(String input, String charsToBeRemoved)
Split the input string, using whitespace characters (spaces, tabs, and the like) and the characters listed in the charsToBeRemoved parameter as separators. |
static String[] | splitTags(String input, String[] tags)
Split the input string into a string array, using the tags as delimiters. |
static String[] | splitTags(String input, String[] tags, boolean recursive, boolean insideTag)
Split the input string into a string array, using the tags as delimiters. |
static String[] | splitTags(String input, Class nodeType)
Split the input string into a string array, using the tags as delimiters. |
static String[] | splitTags(String input, Class nodeType, boolean recursive, boolean insideTag)
Split the input string into a string array, using the tags as delimiters. |
static String[] | splitTags(String input, NodeFilter filter)
Split the input string into a string array, using the tags as delimiters. |
static String[] | splitTags(String input, NodeFilter filter, boolean recursive, boolean insideTag)
Split the input string into a string array, using the tags as delimiters. |
static String | trimAllTags(String input, boolean inside)
Remove all the tags from the input string. |
static String | trimButChars(String input, String charsDoNotBeRemoved)
Remove every character from the input string except those listed in the charsDoNotBeRemoved parameter. |
static String | trimButCharsBeginEnd(String input, String charsDoNotBeRemoved)
Remove every character from the beginning and the end of the input string except those listed in the charsDoNotBeRemoved parameter. |
static String | trimButDigits(String input, String charsDoNotBeRemoved)
Remove every non-numeric character from the input string except those listed in the charsDoNotBeRemoved parameter. |
static String | trimButDigitsBeginEnd(String input, String charsDoNotBeRemoved)
Remove every non-numeric character from the beginning and the end of the input string except those listed in the charsDoNotBeRemoved parameter. |
static String | trimChars(String input, String charsToBeRemoved)
Remove from the input string all the characters listed in the charsToBeRemoved parameter. |
static String | trimCharsBeginEnd(String input, String charsToBeRemoved)
Remove from the beginning and the end of the input string all the characters listed in the charsToBeRemoved parameter. |
static String | trimSpaces(String input, String charsToBeRemoved)
Remove from the input string all whitespace characters (spaces, tabs, and the like). |
static String | trimSpacesBeginEnd(String input, String charsToBeRemoved)
Remove from the beginning and the end of the input string all whitespace characters (spaces, tabs, and the like). |
static String | trimTags(String input, String[] tags)
Remove the given tags and their content from the input string and return the result. |
static String | trimTags(String input, String[] tags, boolean recursive, boolean insideTag)
Remove the given tags, and optionally their content, from the input string and return the result. |
static String | trimTags(String input, Class nodeType)
Remove the given tags and their content from the input string and return the result. |
static String | trimTags(String input, Class nodeType, boolean recursive, boolean insideTag)
Remove the given tags, and optionally their content, from the input string and return the result. |
static String | trimTags(String input, NodeFilter filter)
Remove the given tags and their content from the input string and return the result. |
static String | trimTags(String input, NodeFilter filter, boolean recursive, boolean insideTag)
Remove the given tags, and optionally their content, from the input string and return the result. |
public static Parser createParserParsingAnInputString(String input)
Create a Parser that reads from a String (instead of a URL or a string naming a URL location).
The string is parsed as if it were the contents of a file.
Parameters: input - The input string.
Returns: The Parser object with the string as its input stream.
public static Node[] findTypeInNode(Node node, Class type)
Search the given node and collect every object of the given type.
Parameters: node - The node to search. type - The class to search for.
Returns: An array of the matching nodes.
public static String removeChars(String s, char occur)
public static String removeEscapeCharacters(String inputString)
public static String removeTrailingBlanks(String text)
public static String[] splitButChars(String input, String charsDoNotBeRemoved)
Split the input string, treating every character as a separator
except those listed in the charsDoNotBeRemoved parameter.
For example, calling splitButChars("<DIV> +12.5, +3.4 </DIV>", "+.1234567890")
returns the string array {"+12.5", "+3.4"} (+, ., and the digits 0 to 9 are the characters that are not removed).
Parameters: input - The input string. charsDoNotBeRemoved - The characters that must not be removed.
Returns: The resulting array of strings.
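The splitting rule above can be illustrated with a minimal standalone sketch. It is not the library's implementation; the class name SplitButCharsSketch is hypothetical, and the code assumes that consecutive separators never produce empty strings, as the documented example suggests.

```java
import java.util.ArrayList;
import java.util.List;

class SplitButCharsSketch {
    /** Collects maximal runs of characters found in keep; any other character ends a run. */
    static String[] splitButChars(String input, String keep) {
        List<String> parts = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (keep.indexOf(c) >= 0) {
                run.append(c);             // kept character: extend the current run
            } else if (run.length() > 0) {
                parts.add(run.toString()); // separator: close the current run
                run.setLength(0);
            }
        }
        if (run.length() > 0) {
            parts.add(run.toString());     // flush a run that reaches the end of the input
        }
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] out = splitButChars("<DIV> +12.5, +3.4 </DIV>", "+.1234567890");
        System.out.println(String.join("|", out)); // prints +12.5|+3.4
    }
}
```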
public static String[] splitButDigits(String input, String charsDoNotBeRemoved)
Split the input string, treating every non-numeric character as a separator
except those listed in the charsDoNotBeRemoved parameter.
For example, calling splitButDigits("<DIV> +12.5, +3.4 </DIV>", "+.")
returns the string array {"+12.5", "+3.4"} (1, 2, 3, 4, and 5 are digits; + and . are the extra characters that are not removed).
Parameters: input - The input string. charsDoNotBeRemoved - The characters that must not be removed.
Returns: The resulting array of strings.
public static String[] splitChars(String input, String charsToBeRemoved)
Split the input string, using the characters listed in the charsToBeRemoved parameter as separators.
For example, calling splitChars("<DIV> +12.5, +3.4 </DIV>", " <>DIV/,")
returns the string array {"+12.5", "+3.4"} (the space, <, >, D, I, V, /, and the comma are the characters to be removed).
Parameters: input - The input string. charsToBeRemoved - The characters to be removed.
Returns: The resulting array of strings.
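Splitting on a set of delimiter characters is what java.util.StringTokenizer does, so the documented example can be reproduced with the short sketch below. This is only an illustration under that assumption, not the library's code; the class name SplitCharsSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class SplitCharsSketch {
    static String[] splitChars(String input, String charsToBeRemoved) {
        // StringTokenizer treats each character of its second argument as a delimiter
        // and never returns empty tokens.
        StringTokenizer tokens = new StringTokenizer(input, charsToBeRemoved);
        List<String> parts = new ArrayList<>();
        while (tokens.hasMoreTokens()) {
            parts.add(tokens.nextToken());
        }
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] out = splitChars("<DIV> +12.5, +3.4 </DIV>", " <>DIV/,");
        System.out.println(String.join("|", out)); // prints +12.5|+3.4
    }
}
```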
public static String[] splitSpaces(String input, String charsToBeRemoved)
Split the input string, using as separators
all whitespace characters (spaces, tabs, and the like) and
the characters listed in the charsToBeRemoved parameter.
For example, calling splitSpaces("<DIV> +12.5, +3.4 </DIV>", "<>DIV/,")
returns the string array {"+12.5", "+3.4"} (whitespace, <, >, D, I, V, /, and the comma are the characters to be removed).
Parameters: input - The input string. charsToBeRemoved - The characters to be removed.
Returns: The resulting array of strings.
public static String[] splitTags(String input, String[] tags)
public static String[] splitTags(String input, String[] tags, boolean recursive, boolean insideTag)
Split the input string into a string array,
using the tags as delimiters.
For example, calling splitTags("Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"})
returns the string array {"Begin ", " ALL OK"} (the <DIV> tags and their content are split out, recursively).
For example, calling splitTags("Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, false, false)
returns the string array {"Begin ", "<DIV> +12.5 </DIV>", " ALL OK"} (only the outer <DIV> tags are split out; their content is kept and nested tags are not processed).
For example, calling splitTags("Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, true, false)
returns the string array {"Begin ", " +12.5 ", " ALL OK"} (the <DIV> tags are split out recursively, but their content is kept).
For example, calling splitTags("Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, false, true)
returns the string array {"Begin ", " ALL OK"} (the <DIV> tags and their content are split out).
Parameters: input - The input string. tags - The tags to use as splitting delimiters. recursive - Optional (true if omitted); if true, the tags are processed recursively. insideTag - Optional (true if omitted); if true, the content of the tags is removed as well.
Returns: The string array containing the strings delimited by the tags.
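The default case (recursive and insideTag both true) can be approximated without a parser by counting nesting depth. The sketch below is a simplified illustration, not the library's algorithm: the class name SplitTagsSketch and the regular expression are assumptions, and self-closing or unclosed tags are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitTagsSketch {
    // Group 1 is "/" for a closing tag, group 2 is the tag name.
    private static final Pattern TAG = Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>");

    static String[] splitTags(String input, String[] tags) {
        List<String> names = new ArrayList<>();
        for (String t : tags) {
            names.add(t.toUpperCase(Locale.ROOT));
        }
        List<String> parts = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        int depth = 0; // nesting level inside the listed tags
        int start = 0; // where the current outside-the-tags segment begins
        while (m.find()) {
            if (!names.contains(m.group(2).toUpperCase(Locale.ROOT))) {
                continue; // unlisted tags are treated as ordinary text
            }
            if (m.group(1).isEmpty()) { // opening tag
                if (depth == 0 && m.start() > start) {
                    parts.add(input.substring(start, m.start()));
                }
                depth++;
            } else if (depth > 0 && --depth == 0) { // closing tag of the outermost element
                start = m.end();
            }
        }
        if (depth == 0 && start < input.length()) {
            parts.add(input.substring(start)); // trailing text after the last listed tag
        }
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] out = splitTags("Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"});
        System.out.println(String.join("|", out)); // prints Begin | ALL OK
    }
}
```

Tracking depth rather than matching the first closing tag is what makes the nested <DIV> in the example disappear together with its parent.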
public static String[] splitTags(String input, Class nodeType)
public static String[] splitTags(String input, Class nodeType, boolean recursive, boolean insideTag)
public static String[] splitTags(String input, NodeFilter filter)
public static String[] splitTags(String input, NodeFilter filter, boolean recursive, boolean insideTag)
public static String trimAllTags(String input, boolean inside)
Remove all the tags from the input string.
The method removes every substring of the input string that has the form
"<XXX>", where XXX can be any string.
If the inside parameter is true, the method also deletes the YYY string in an input such as
"<XXX>YYY<ZZZ>"; note that ZZZ is not necessarily the closing tag of XXX.
Parameters: input - The input string. inside - If true, the method also deletes what is between the tags.
Returns: The string without tags.
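For the inside == false case, removing every "<XXX>" substring amounts to a single pass with one flag. The sketch below shows that case only and is not taken from the library; the class and method names are hypothetical, and the inside == true case is left out because the description does not fully specify it.

```java
class TrimAllTagsSketch {
    /** Drops every "<...>" run and keeps the text between tags (the inside == false case). */
    static String stripTags(String input) {
        StringBuilder out = new StringBuilder(input.length());
        boolean insideTag = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '<') {
                insideTag = true;   // start skipping at the opening bracket
            } else if (c == '>' && insideTag) {
                insideTag = false;  // stop skipping after the closing bracket
            } else if (!insideTag) {
                out.append(c);      // ordinary text, including a stray '>'
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<XXX>YYY<ZZZ>")); // prints YYY
    }
}
```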
public static String trimButChars(String input, String charsDoNotBeRemoved)
Remove every character from the input string
except those listed in the charsDoNotBeRemoved parameter.
For example, calling trimButChars("<DIV> +12.5 </DIV>", "+.1234567890")
returns the string "+12.5" (+, ., and the digits 0 to 9 are the characters that are not removed).
For example, calling trimButChars("<DIV> +1 2 . 5 </DIV>", "+.1234567890")
returns the string "+12.5" (the spaces between 1 and 2, 2 and ., and . and 5 are removed).
Parameters: input - The input string. charsDoNotBeRemoved - The characters that must not be removed.
Returns: The resulting string.
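The behavior described above is a plain character filter, which can be sketched as follows. This is an illustration rather than the library's source; the class name TrimButCharsSketch is hypothetical.

```java
class TrimButCharsSketch {
    /** Keeps only the characters of input that also occur in keep, in their original order. */
    static String trimButChars(String input, String keep) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (keep.indexOf(c) >= 0) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(trimButChars("<DIV> +1 2 . 5 </DIV>", "+.1234567890")); // prints +12.5
    }
}
```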
public static String trimButCharsBeginEnd(String input, String charsDoNotBeRemoved)
Remove every character from the beginning and the end of the input string
except those listed in the charsDoNotBeRemoved parameter.
Only characters at the beginning and at the end of the string are removed.
For example, calling trimButCharsBeginEnd("<DIV> +12.5 </DIV>", "+.1234567890")
returns the string "+12.5" (+, ., and the digits 0 to 9 are the characters that are not removed).
For example, calling trimButCharsBeginEnd("<DIV> +1 2 . 5 </DIV>", "+.1234567890")
returns the string "+1 2 . 5" (the spaces inside the string are not removed).
Parameters: input - The input string. charsDoNotBeRemoved - The characters that must not be removed.
Returns: The resulting string.
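Trimming from both ends only can be sketched with two indices that move inward until each reaches a kept character. The code below is a hedged illustration, not the library's implementation; the class name TrimButCharsBeginEndSketch is hypothetical, and it assumes an input with no kept characters yields the empty string.

```java
class TrimButCharsBeginEndSketch {
    static String trimButCharsBeginEnd(String input, String keep) {
        int start = 0;
        int end = input.length();
        // Advance past leading characters that are not in the keep set.
        while (start < end && keep.indexOf(input.charAt(start)) < 0) {
            start++;
        }
        // Retreat past trailing characters that are not in the keep set.
        while (end > start && keep.indexOf(input.charAt(end - 1)) < 0) {
            end--;
        }
        return input.substring(start, end); // everything in between is left untouched
    }

    public static void main(String[] args) {
        System.out.println(trimButCharsBeginEnd("<DIV> +1 2 . 5 </DIV>", "+.1234567890")); // prints +1 2 . 5
    }
}
```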
public static String trimButDigits(String input, String charsDoNotBeRemoved)
Remove every non-numeric character from the input string
except those listed in the charsDoNotBeRemoved parameter.
For example, calling trimButDigits("<DIV> +12.5 </DIV>", "+.")
returns the string "+12.5" (1, 2, and 5 are digits; + and . are the extra characters that are not removed).
For example, calling trimButDigits("<DIV> +1 2 . 5 </DIV>", "+.")
returns the string "+12.5" (the spaces between 1 and 2, 2 and ., and . and 5 are removed).
Parameters: input - The input string. charsDoNotBeRemoved - The characters that must not be removed.
Returns: The resulting string.
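The digit-preserving variant can be sketched by adding a digit test to the character filter. This is an illustration only; the class name TrimButDigitsSketch is hypothetical, and the use of Character.isDigit (which also accepts non-ASCII Unicode digits) is an assumption about what "numerical character" means here.

```java
class TrimButDigitsSketch {
    /** Keeps digits plus any character listed in extraKeep; drops everything else. */
    static String trimButDigits(String input, String extraKeep) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // Assumption: "numerical" is interpreted as Character.isDigit.
            if (Character.isDigit(c) || extraKeep.indexOf(c) >= 0) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(trimButDigits("<DIV> +1 2 . 5 </DIV>", "+.")); // prints +12.5
    }
}
```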
public static String trimButDigitsBeginEnd(String input, String charsDoNotBeRemoved)
Remove every non-numeric character from the beginning and the end of the input string
except those listed in the charsDoNotBeRemoved parameter.
Only characters at the beginning and at the end of the string are removed.
For example, calling trimButDigitsBeginEnd("<DIV> +12.5 </DIV>", "+.")
returns the string "+12.5" (1, 2, and 5 are digits; + and . are the extra characters that are not removed).
For example, calling trimButDigitsBeginEnd("<DIV> +1 2 . 5 </DIV>", "+.")
returns the string "+1 2 . 5" (the spaces inside the string are not removed).
Parameters: input - The input string. charsDoNotBeRemoved - The characters that must not be removed.
Returns: The resulting string.
public static String trimChars(String input, String charsToBeRemoved)
Remove from the input string all the characters listed in the charsToBeRemoved parameter.
For example, calling trimChars("<DIV> +12.5 </DIV>", "<>DIV/ ")
returns the string "+12.5" (<, >, D, I, V, /, and the space are the characters to be removed).
For example, calling trimChars("<DIV> Trim All Chars Also The Ones Inside The String </DIV>", "<>DIV/ ")
returns the string "TrimAllCharsAlsoTheOnesInsideTheString" (all the spaces inside the string are removed).
Parameters: input - The input string. charsToBeRemoved - The characters to be removed.
Returns: The resulting string.
public static String trimCharsBeginEnd(String input, String charsToBeRemoved)
Remove from the beginning and the end of the input string all the characters listed in the charsToBeRemoved parameter.
Only characters at the beginning and at the end of the string are removed.
For example, calling trimCharsBeginEnd("<DIV> +12.5 </DIV>", "<>DIV/ ")
returns the string "+12.5" (the space, <, >, D, I, V, and / are the characters to be removed).
For example, calling trimCharsBeginEnd("<DIV> Trim all spaces but not the ones inside the string </DIV>", "<>DIV/ ")
returns the string "Trim all spaces but not the ones inside the string" (all the spaces inside the string are preserved).
Parameters: input - The input string. charsToBeRemoved - The characters to be removed.
Returns: The resulting string.
public static String trimSpaces(String input, String charsToBeRemoved)
Remove from the input string all whitespace characters (spaces, tabs, and the like).
Also remove the characters listed in the charsToBeRemoved parameter.
For example, calling trimSpaces("<DIV> +12.5 </DIV>", "<>DIV/")
returns the string "+12.5" (whitespace, <, >, D, I, V, and / are the characters to be removed).
For example, calling trimSpaces("<DIV> Trim All Spaces Also The Ones Inside The String </DIV>", "<>DIV/")
returns the string "TrimAllSpacesAlsoTheOnesInsideTheString" (all the spaces inside the string are removed).
Parameters: input - The input string. charsToBeRemoved - The characters to be removed.
Returns: The resulting string.
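A literal reading of the description above gives the following standalone sketch. It is not the library's code; the class name TrimSpacesSketch is hypothetical, and Character.isWhitespace is an assumed interpretation of the whitespace characters that are removed.

```java
class TrimSpacesSketch {
    /** Drops whitespace and every character listed in charsToBeRemoved. */
    static String trimSpaces(String input, String charsToBeRemoved) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // Assumption: whitespace means whatever Character.isWhitespace accepts.
            boolean drop = Character.isWhitespace(c) || charsToBeRemoved.indexOf(c) >= 0;
            if (!drop) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(trimSpaces("<DIV> +12.5 </DIV>", "<>DIV/")); // prints +12.5
    }
}
```

Under this literal per-character reading, any D, I, or V inside the text is dropped as well, so the capital I of "Inside" in the second example would not survive in this sketch; whether the library behaves the same way is not confirmed here.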
public static String trimSpacesBeginEnd(String input, String charsToBeRemoved)
Remove from the beginning and the end of the input string all whitespace characters (spaces, tabs, and the like).
Also remove the characters listed in the charsToBeRemoved parameter.
Only characters at the beginning and at the end of the string are removed.
For example, calling trimSpacesBeginEnd("<DIV> +12.5 </DIV>", "<>DIV/")
returns the string "+12.5" (whitespace, <, >, D, I, V, and / are the characters to be removed).
For example, calling trimSpacesBeginEnd("<DIV> Trim all spaces but not the ones inside the string </DIV>", "<>DIV/")
returns the string "Trim all spaces but not the ones inside the string" (all the spaces inside the string are preserved).
Parameters: input - The input string. charsToBeRemoved - The characters to be removed.
Returns: The resulting string.
public static String trimTags(String input, String[] tags)
public static String trimTags(String input, String[] tags, boolean recursive, boolean insideTag)
Remove the given tags from the input string and
return a string like the input one
without the tags and, optionally, without their content.
For example, calling trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"})
returns the string " ALL OK" (the <DIV> tags and their content are removed, recursively).
For example, calling trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, false, false)
returns the string "<DIV> +12.5 </DIV> ALL OK" (only the outer <DIV> tags are removed; their content is kept and nested tags are not processed).
For example, calling trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, true, false)
returns the string " +12.5 ALL OK" (the <DIV> tags are removed recursively, but their content is kept).
For example, calling trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, false, true)
returns the string " ALL OK" (the <DIV> tags and their content are removed).
Parameters: input - The input string. tags - The tags to be removed. recursive - Optional (true if omitted); if true, the tags are removed recursively. insideTag - Optional (true if omitted); if true, the content of the tags is removed as well.
Returns: The string without tags.
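The first example (both optional parameters left at true) can be reproduced with a depth counter that copies text only while outside every listed tag. The sketch below covers that default case only and is an assumption-laden illustration, not the library's parser-based implementation: the class name TrimTagsSketch and the regular expression are hypothetical, and self-closing or unclosed tags are not handled.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrimTagsSketch {
    // Group 1 is "/" for a closing tag, group 2 is the tag name.
    private static final Pattern TAG = Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>");

    static String trimTags(String input, String[] tags) {
        StringBuilder out = new StringBuilder();
        Matcher m = TAG.matcher(input);
        int depth = 0;    // nesting level inside the listed tags
        int copyFrom = 0; // start of the next stretch of text to keep
        while (m.find()) {
            String name = m.group(2);
            if (Arrays.stream(tags).noneMatch(name::equalsIgnoreCase)) {
                continue; // unlisted tags are kept as ordinary text
            }
            if (m.group(1).isEmpty()) { // opening tag
                if (depth == 0) {
                    out.append(input, copyFrom, m.start());
                }
                depth++;
            } else if (depth > 0 && --depth == 0) { // closing tag of the outermost element
                copyFrom = m.end();
            }
        }
        if (depth == 0) {
            out.append(input, copyFrom, input.length()); // text after the last listed tag
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String out = trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"});
        System.out.println("[" + out + "]"); // prints [ ALL OK]
    }
}
```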
public static String trimTags(String input, Class nodeType)
public static String trimTags(String input, Class nodeType, boolean recursive, boolean insideTag)
public static String trimTags(String input, NodeFilter filter)
public static String trimTags(String input, NodeFilter filter, boolean recursive, boolean insideTag)
HTML Parser is an open source library released under LGPL.