best way to extract from a long `String` (~15k cha...
# announcements
e
best way to extract from a long `String` (~15k chars) the integer on the right of the following sentence?
.. ,"number":12,"..
I'd `indexOf`, then extract the next 4 chars, drop at the first comma and convert
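A minimal sketch of that `indexOf` idea, assuming the key is rendered exactly as `"number":` and the value is a non-negative integer. The function name `extractNumber` and its `key` parameter are hypothetical, made up for illustration. It reads digits until the first non-digit rather than taking a fixed 4 chars, so longer numbers still work.

```kotlin
// Hypothetical helper; the name and signature are illustrative only.
// Assumes the key appears once, rendered exactly as "number": with no sign on the value.
fun extractNumber(json: String, key: String = "\"number\":"): Int? {
    val start = json.indexOf(key)
    if (start < 0) return null                 // key not present

    // Skip any whitespace between the colon and the value.
    var from = start + key.length
    while (from < json.length && json[from].isWhitespace()) from++

    // Advance to the first non-digit (the comma, in the example above).
    var end = from
    while (end < json.length && json[end] in '0'..'9') end++

    // Empty or overflowing digit runs yield null instead of throwing.
    return json.substring(from, end).toIntOrNull()
}
```

For example, `extractNumber("""{"id":1,"number":12,"title":"x"}""")` gives `12`, and a string without the key gives `null`.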
y
Do you have any indication of where the number would be? Knowing whether it's at the start or end (or even better, within a specific range) would be helpful for performance. But otherwise, `indexOf` is possibly the best approach here
e
it's JSON coming from the GitHub API, I have no guarantee the number will always be at the same position, I'd say sticking to something dynamic is the safe side here
y
I meant that if you know, for example, that it (most likely) won't be in the last quarter of the text, then you can optimise your algorithm that way, falling back to a plain linear search if your optimisations fail
n
If it's JSON, then the only sane answer is to parse the JSON and then use property of the resulting data structure.
y
I have to disagree in this case since it is 15,000 characters long and so JSON parsing will take too long. If you know 100% that there'll only be one correct instance of that `number` field, then using normal string indexing should suffice.
n
that seems like one of those assumptions that is just bound to break tbh
p
also, what is “too long” and “best”? in some cases, parsing 15k char JSON strings is absolutely fine, in others, it’s prohibitively expensive. so the final algorithm choice probably has to involve a tradeoff between (well-defined) performance requirements and algorithm safety requirements. it could be fine with an algorithm that checks what’s at index 18 (to take a random number) and if it matches expectations, uses that, and otherwise reports a failure, halting the process until devs have had a chance to fix the algorithm. it’s all context-dependent.
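A rough sketch of that fixed-index variant, assuming the key is rendered as `"number":` and the value is a non-negative integer. The name `numberAtFixedOffset` and its parameters are hypothetical. It only looks at one position and throws if the layout is not what was expected, instead of searching.

```kotlin
// Hypothetical helper; checks a single expected position and fails loudly otherwise.
fun numberAtFixedOffset(s: String, offset: Int, token: String = "\"number\":"): Int {
    // No search at all: either the token sits at the expected offset or we stop.
    if (!s.startsWith(token, offset)) {
        throw IllegalStateException("expected $token at index $offset; layout changed?")
    }
    val from = offset + token.length
    var end = from
    while (end < s.length && s[end] in '0'..'9') end++
    return s.substring(from, end).toIntOrNull()
        ?: throw IllegalStateException("no integer after $token at index $offset")
}
```

With `"""{"number":12,"x":1}"""` and `offset = 1` this returns `12`; any other offset throws, which is the "report a failure and let devs fix it" behaviour.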
e
@Nir I'm willing to take my chances 😛
m
JSON objects are unordered. It would be safer to just parse the String and live with the performance impact.
n
if all you want is maximum performance and you can live with the obvious problems like "someone added a data structure in another place that is rendered the same way", then I would:
1. remember the index where you found this leading token the previous time and start looking there.
2. if not found there, keep track of the smallest index where you found it and start looking from there
3. otherwise, start at index 0
4. once you found the leading token, update "latest" and "min" and then collect the number using
`var i = 0; while (idx < size) { val d = s[idx++].digitToIntOrNull() ?: break; i = 10 * i + d }`
Note: this assumes no leading `[-+]`
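The steps above could be sketched roughly as follows. This is one interpretation, not a definitive implementation: "start looking there" is read as a forward `indexOf` from the remembered position, and the class name `NumberFinder` and its fields are hypothetical. It also assumes the number fits in an `Int` and has no sign.

```kotlin
// Hypothetical class; keeps the "latest" and "min" positions between calls.
class NumberFinder(private val token: String = "\"number\":") {
    private var latest = 0              // where the token was found last time
    private var min = Int.MAX_VALUE     // smallest index where it was ever found

    fun find(s: String): Int? {
        var pos = s.indexOf(token, latest)                        // step 1
        if (pos < 0 && min < latest) pos = s.indexOf(token, min)  // step 2
        if (pos < 0) pos = s.indexOf(token)                       // step 3
        if (pos < 0) return null

        // Step 4: update the bookkeeping, then collect the digits.
        latest = pos
        if (pos < min) min = pos
        var idx = pos + token.length
        val first = idx
        var i = 0
        while (idx < s.length) {
            val d = s[idx].digitToIntOrNull() ?: break
            i = 10 * i + d              // no overflow check in this sketch
            idx++
        }
        return if (idx > first) i else null   // null when no digit follows the token
    }
}
```

A single `NumberFinder` instance would be reused across responses so the remembered positions can pay off; the result is the same as a plain search, only the starting point changes.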
e
that's a quite nice idea, @nkiesel