The Java method String.split(String regex) is a convenient way to split a string into parts: you just specify the separator. Be aware, however, that the parameter of split is a regular expression, so the pipe symbol needs a special treatment called escaping.
Failure: Forgetting the regular expression
Here is the straightforward way to split at “|”. It does not work as intended:
public static void split1(String input) {
    // wrong way to split at pipe character
    String[] split = input.split("|");
    int len = split.length;
    System.out.println("#elements: " + len + " => " + Arrays.toString(split));
}
For the input "Apples|Oranges|Lemons", the output is not what you wanted to see:
#elements: 21 => [A, p, p, l, e, s, |, O, r, a, n, g, e, s, |, L, e, m, o, n, s]
The problem of the pipe character explained
It is easy to forget that the parameter of String.split is a regular expression. Regular expressions offer a lot of possibilities, but it is important to keep in mind that the pipe character (“|”) has a special meaning within them: it stands for “or” and can be used to specify different separators that are equally valid. A lone pipe therefore means “nothing or nothing”, which matches the empty string between any two characters. That is why the input was cut into single characters. At the end of this article, we will give the “or” a try, too.
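The effect of the unescaped pipe can be checked with a small sketch. It assumes Java 8 or later, and the class and method names are made up for this illustration. If a lone pipe really means "empty or empty", it should behave exactly like splitting with an empty regex:

```java
import java.util.Arrays;

class EmptyAlternationDemo {

    // Splits with the unescaped pipe, i.e. an alternation of two empty patterns.
    static String[] splitAtUnescapedPipe(String input) {
        return input.split("|");
    }

    // Splits with an explicitly empty regex, for comparison.
    static String[] splitAtEmptyRegex(String input) {
        return input.split("");
    }

    public static void main(String[] args) {
        String input = "A|B";
        // On Java 8+ both lines print [A, |, B]
        System.out.println(Arrays.toString(splitAtUnescapedPipe(input)));
        System.out.println(Arrays.toString(splitAtEmptyRegex(input)));
    }
}
```

Both calls return the same array, with the pipe itself showing up as an ordinary element.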
Escaping within a regular expression
In order to tell split() that you literally want to split at the pipe symbol, you put a backslash (“\”) in front of it. Unfortunately, the backslash itself has a special meaning inside a Java string literal. So you need to write a double backslash in the source code in order to pass a single backslash to the regular expression:
public static void split2(String input) {
    // correct way to split at pipe character
    String[] split = input.split("\\|");
    int len = split.length;
    System.out.println("#elements: " + len + " => " + Arrays.toString(split));
}
Now we get this result:
#elements: 3 => [Apples, Oranges, Lemons]
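To see the two levels of escaping at work, the following sketch prints the string that the regex engine actually receives. The class name, the constant and the helper method are hypothetical and only serve this illustration:

```java
import java.util.Arrays;

class DoubleBackslashDemo {

    // Written with two backslashes in the source, but the string itself
    // holds only two characters: one backslash and the pipe.
    static final String ESCAPED_PIPE = "\\|";

    static String[] splitAtPipe(String input) {
        return input.split(ESCAPED_PIPE);
    }

    public static void main(String[] args) {
        System.out.println(ESCAPED_PIPE);          // prints \|
        System.out.println(ESCAPED_PIPE.length()); // prints 2
        System.out.println(Arrays.toString(splitAtPipe("Apples|Oranges|Lemons")));
        // prints [Apples, Oranges, Lemons]
    }
}
```

The Java compiler consumes one backslash, and the regex engine consumes the other.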
Another way to escape the pipe symbol
There is another way to escape the pipe symbol: the Pattern class of Java (java.util.regex.Pattern) can do the quoting for you. Here is the example:
public static void split3(String input) {
    // another correct way to split at pipe character
    String[] split = input.split(Pattern.quote("|"));
    int len = split.length;
    System.out.println("#elements: " + len + " => " + Arrays.toString(split));
}
Pattern.quote("|") returns a string in which the pipe is also correctly escaped. It uses different escape markers than the backslash, namely \Q and \E, but they work equally well:
\Q|\E
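Quoting is particularly handy when the separator arrives in a variable or consists of several characters. The following is a minimal sketch. The helper splitLiteral and the sample separator "||" are assumptions made for this example:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuoteDemo {

    // Splits the input at a separator that is taken literally,
    // whatever regex characters it may contain.
    static String[] splitLiteral(String input, String separator) {
        return input.split(Pattern.quote(separator));
    }

    public static void main(String[] args) {
        System.out.println(Pattern.quote("|")); // prints \Q|\E
        System.out.println(Arrays.toString(splitLiteral("Apples|Oranges|Lemons", "|")));
        // prints [Apples, Oranges, Lemons]
        System.out.println(Arrays.toString(splitLiteral("Apples||Oranges||Lemons", "||")));
        // prints [Apples, Oranges, Lemons]
    }
}
```

With this approach you do not have to know which characters of the separator are special.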
Splitting a string at several characters at the same time
Finally, let’s have a look at what the pipe symbol means if it is not escaped. Let’s take a string with several words that are separated by a comma, a semicolon and a dot. The goal is now to split the string at any of these.
This is how to do it:
public static void split4() {
    String input = "Apples,Oranges;Bananas.Cherries";
    String[] split = input.split("\\.|,|;");
    int len = split.length;
    System.out.println("#elements: " + len + " => " + Arrays.toString(split));
}
The result now is:
#elements: 4 => [Apples, Oranges, Bananas, Cherries]
All the different split characters are listed, separated by the pipe. The pipe signifies “or”, as in “split at dot or at comma or at semicolon”. The dot itself is a special character within a regex (it matches any character). So you need to escape the dot with \\.
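When all separators are single characters, a character class is a common alternative to the pipe. This is not the approach used above, just a sketch for comparison, and the class and method names are made up. Inside square brackets the dot is taken literally, so it needs no backslash there:

```java
import java.util.Arrays;

class CharacterClassDemo {

    // The alternation from the example above: dot or comma or semicolon.
    static String[] splitWithAlternation(String input) {
        return input.split("\\.|,|;");
    }

    // The same separators written as a character class.
    static String[] splitWithCharacterClass(String input) {
        return input.split("[.,;]");
    }

    public static void main(String[] args) {
        String input = "Apples,Oranges;Bananas.Cherries";
        // Both lines print [Apples, Oranges, Bananas, Cherries]
        System.out.println(Arrays.toString(splitWithAlternation(input)));
        System.out.println(Arrays.toString(splitWithCharacterClass(input)));
    }
}
```

The pipe version is still needed as soon as one of the separators is longer than a single character.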
BTW: here is a complete list of characters that have a special meaning within a Java regular expression, some of them only in certain contexts. If you want to use them literally, you need to escape them.
\.[]{}()<>*+-=!?^$|
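As a quick check of the list, the following sketch escapes each of these characters with a backslash and splits a sample string at it. The sample string "left…right" (with the character in between), the class name and the helper are hypothetical:

```java
import java.util.Arrays;

class EscapeEachSpecialCharacterDemo {

    // The characters listed above, written as a Java string literal
    // (the leading backslash has to be doubled in the source).
    static final String SPECIAL = "\\.[]{}()<>*+-=!?^$|";

    // Splits "left" + c + "right" at the character c, escaped for the regex.
    static String[] splitAtEscaped(char c) {
        String input = "left" + c + "right";
        String regex = "\\" + c; // one regex backslash followed by the character
        return input.split(regex);
    }

    public static void main(String[] args) {
        // Every line ends with [left, right]
        for (char c : SPECIAL.toCharArray()) {
            System.out.println(c + " => " + Arrays.toString(splitAtEscaped(c)));
        }
    }
}
```

Java allows a backslash in front of any non-alphabetic character, so escaping a character that would not strictly need it does no harm.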