# android
Hi, I am trying to create an Android application that uses ML Kit to extract text from the camera and detect article and book identifiers using regular expressions. The identifiers I am interested in are DOI, ISBN, arXiv ID, ISSN, etc. I managed to get DOI detection working fine using this code:
```kotlin
private const val doiRegex = "10.\\d{4,9}/[-._;()/:A-Z0-9]+" +
    "|10.1002/[^\\s]+" +
    "|10.\\d{4,9}/[-._;()/:A-Z0-9]+\$" +
    "|10.\\d{4}/\\d+-\\d+X?(\\d+)\\d+<[\\d\\w]+:[\\d\\w]*>\\d+.\\d+.\\w+;\\d" +
    "|10.1021/\\w\\w\\d++" +
    "|10.1207/[\\w\\d]+\\&\\d+_\\d+"

private val digitalObjectIdentifierPattern: Pattern = Pattern.compile(doiRegex, Pattern.CASE_INSENSITIVE)

textRecognizer.process(inputImage)
    .addOnSuccessListener { visionText: Text ->
        val detectedText: String = visionText.text
        if (detectedText.isNotBlank()) {
            val digitalObjectIdentifierMatcher: Matcher = digitalObjectIdentifierPattern.matcher(detectedText)
            when {
                digitalObjectIdentifierMatcher.find() -> {
                    val doi = digitalObjectIdentifierMatcher.group()
                    if (doi.isNotEmpty()) onDetectedTextUpdated(doi)
                }
            }
        }
    }
    .addOnCompleteListener { continuation.resume(Unit) }
```
However, I cannot get ISBN detection to work using this regex and a similar approach:
```kotlin
private const val isbnRegex = "^(?:ISBN(?:-1[03])?:? )?(?=[0-9X]{10}$|(?=(?:[0-9]+[- ]){3})[- 0-9X]{13}$|97[89][0-9]{10}$|(?=(?:[0-9]+[- ]){4})[- 0-9]{17}$)(?:97[89][- ]?)?[0-9]{1,5}[- ]?[0-9]+[- ]?[0-9]+[- ]?[0-9X]$"

private val isbnPattern: Pattern = Pattern.compile(isbnRegex, Pattern.CASE_INSENSITIVE)
```
The visionText.text can be multiline, containing newline characters. The regex very rarely matches anything, and when it does work it misses the trailing digits. What I would like to achieve is to detect ISBNs that have the ISBN-1X(:) prefix, but extract only the actual ISBN "number". Is this possible in one step? What is wrong with my approach?
Your question is still not Kotlin related; it is more of a “how do regular expressions work” question. A small hint for getting the “number”: look up “capturing groups in regular expressions”.
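
To make the hint concrete, here is a minimal sketch of a capturing group, not a definitive ISBN matcher. One likely reason the posted pattern rarely matches is that its `^` and `$` anchors, compiled without `Pattern.MULTILINE`, only match at the very start and end of the whole recognized text, so `find()` succeeds only when the ISBN is the entire input. The sketch drops the anchors, requires the prefix without capturing it, and puts the number in group 1. The names `isbnWithPrefix` and `extractIsbn` are made up for illustration, the pattern is a simplified assumption rather than the one from the question, and it does not validate the ISBN check digit.

```kotlin
// Hypothetical pattern: a required "ISBN", "ISBN-10" or "ISBN-13" prefix (non-capturing),
// followed by ONE capturing group that holds the digits and their separators.
// No ^ or $ anchors, so a match can be found anywhere in multiline OCR text.
val isbnWithPrefix = Regex(
    """\bISBN(?:-1[03])?:?[ \t]*([0-9][0-9 -]{8,15}[0-9X])(?![0-9])""",
    RegexOption.IGNORE_CASE
)

// Returns the first prefixed ISBN with spaces and hyphens removed,
// or null if no candidate has exactly 10 or 13 characters.
fun extractIsbn(text: String): String? =
    isbnWithPrefix.findAll(text)
        .map { match ->
            // groupValues[0] is the whole match (prefix included); [1] is only the number.
            match.groupValues[1].filterNot { it == ' ' || it == '-' }.uppercase()
        }
        .firstOrNull { it.length == 10 || it.length == 13 }

fun main() {
    val ocrText = "Some title\nISBN-13: 978-0-306-40615-7\nPublisher"
    println(extractIsbn(ocrText)) // prints 9780306406157
}
```

With `java.util.regex.Matcher`, as used in the question, the equivalent of `groupValues[1]` is `matcher.group(1)` after a successful `find()`.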