Pattern matching using java.util.regex
The java.util.regex package has been available since Java 1.4. It is used to match character sequences against patterns specified by regular expressions, and it has two main classes, Pattern and Matcher. An instance of the Pattern class represents a regular expression that is specified in string form in a syntax similar to that used by Perl. Instances of the Matcher class are used to match character sequences against a given pattern. Input is provided to matchers via the CharSequence interface in order to support matching against characters from a wide variety of input sources.
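A minimal sketch of how the two classes work together. A Pattern is compiled once and can be reused to create any number of Matcher instances. Because matcher() accepts any CharSequence, the same pattern can run against a String and a StringBuilder alike. The class name PatternReuseDemo and the helper countWords are hypothetical, chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternReuseDemo {

    // Compile once: Pattern instances are immutable and safe to share between threads.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Hypothetical helper: counts how many times WORD occurs in any CharSequence.
    static int countWords(CharSequence input) {
        Matcher matcher = WORD.matcher(input); // Matcher is stateful, so create one per input
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Hello Yogesh";
        StringBuilder buffer = new StringBuilder("Welcome to Hyderabad");
        System.out.println(countWords(text));   // 2
        System.out.println(countWords(buffer)); // 3
    }
}
```

Keeping the compiled Pattern in a static field avoids recompiling the expression on every call. Matcher objects, by contrast, carry match state and are not safe to share across threads.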
Here is a simple example. Suppose you need to extract parts of a string that follows a fixed pattern, such as this greeting:
“Hi Yogesh, Welcome to Hyderabad.”
From this sentence we want to extract two pieces of information, the name and the location (Yogesh and Hyderabad). First we need to define a pattern that captures them. Java supports regular expressions for pattern matching; see the Java Pattern documentation for the full syntax.
The figure above shows how the pattern is built from the given text in order to pull specific parts out of it. Here we use the grouping parentheses ‘(’ and ‘)’ to capture exactly the parts we want. The sample code is below:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestPattern {
    public static void main(String[] args) {
        // Each (.*) is a capturing group; the trailing "\\." matches a literal period.
        String patternString = "Hi (.*), Welcome to (.*)\\.";
        String sampleText = "Hi Yogesh, Welcome to Hyderabad(India).";
        // Create Pattern from pattern string.
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(sampleText);
        // find() searches the input for the next subsequence that matches the pattern.
        if (matcher.find()) {
            System.out.println("Name = " + matcher.group(1));
            System.out.println("Location = " + matcher.group(2));
        } else {
            System.out.println("Didn't find anything");
        }
    }
}
Output:
Name = Yogesh
Location = Hyderabad(India)
Group 0 always refers to the entire match, and the capturing groups are numbered from 1, from left to right by their opening parenthesis. As a result, the program prints “Yogesh” for group 1 and “Hyderabad(India)” for group 2.
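The numbering rule can be checked directly with a small sketch. It prints group 0 and every capturing group (Matcher.groupCount() gives the number of capturing groups), using a pattern with one group nested inside another. Note that group 0 is only the matched portion, not the whole input string. The class name GroupNumberingDemo and the helper groups are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumberingDemo {

    // Hypothetical helper: returns group 0 followed by every capturing group, or null if nothing matches.
    static String[] groups(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.find()) {
            return null;
        }
        String[] result = new String[matcher.groupCount() + 1];
        for (int i = 0; i <= matcher.groupCount(); i++) {
            result[i] = matcher.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Group 1 is the outer (...) around Hyderabad(India); group 2 is the inner (India).
        // The literal parentheses in the text are escaped as \\( and \\).
        String[] g = groups("Welcome to (Hyderabad\\((India)\\))",
                "Hi Yogesh, Welcome to Hyderabad(India).");
        for (int i = 0; i < g.length; i++) {
            System.out.println("group " + i + " = " + g[i]);
        }
        // group 0 = Welcome to Hyderabad(India)
        // group 1 = Hyderabad(India)
        // group 2 = India
    }
}
```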
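Escaping literals: some text contains characters that are regular expression metacharacters, such as ‘.’, ‘(’, ‘)’ and ‘+’. To match them literally, you can escape each one with a backslash (for example "\\." in a Java string literal for a period), or use the utility method Pattern.quote(), which returns a pattern string that matches the given text literally.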
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestPattern {
    public static void main(String[] args) {
        // Pattern.quote() makes "(Hello)" match literally instead of acting as a group.
        String patternString = Pattern.quote("(Hello)") + " (.*)";
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher("(Hello) Yogesh");
        if (matcher.find()) {
            System.out.println("Name = " + matcher.group(1)); // prints: Name = Yogesh
        } else {
            System.out.println("Didn't find anything");
        }
    }
}
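To see why quoting matters, the sketch below searches for the same text with and without Pattern.quote(). Unquoted, "Hyderabad(India)" is read as a regex in which (India) is a capturing group, so it only matches "HyderabadIndia" and fails on the sample sentence. Quoted, the parentheses are matched literally. The class name QuoteDemo and the helper containsLiteral are hypothetical.

```java
import java.util.regex.Pattern;

public class QuoteDemo {

    // Hypothetical helper: searches input for text, optionally quoting it first.
    static boolean containsLiteral(String text, String input, boolean quote) {
        String regex = quote ? Pattern.quote(text) : text;
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "Hi Yogesh, Welcome to Hyderabad(India).";
        // Unquoted: "(India)" is a group, so the regex looks for "HyderabadIndia".
        System.out.println(containsLiteral("Hyderabad(India)", input, false)); // false
        // Quoted: the parentheses are matched as ordinary characters.
        System.out.println(containsLiteral("Hyderabad(India)", input, true));  // true
        // Unquoted "." matches any character; quoted "." matches only a period.
        System.out.println(containsLiteral(".", "abc", false)); // true
        System.out.println(containsLiteral(".", "abc", true));  // false
    }
}
```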