Dominick
02/26/2021, 7:20 AM
```kotlin
private const val WORD_PATTERN = """[A-Za-z][A-Za-z']*"""
private val WORD_REGEX = Regex(WORD_PATTERN)

fun top3(s: String): List<String> {
    val words = WORD_REGEX.findAll(s).map { it.groupValues[0].toLowerCase() }
    val occurrences = mutableMapOf<String, Int>()
    for (word in words) {
        occurrences[word] = (occurrences[word] ?: 0) + 1
    }
    return occurrences.toList()
        .sortedByDescending { it.second }
        .map { it.first }
        .take(3)
}
```
Zach Klippenstein (he/him) [MOD]
02/26/2021, 7:48 AM
```kotlin
private const val WORD_PATTERN = """[A-Za-z][A-Za-z']*"""
private val WORD_REGEX = Regex(WORD_PATTERN)
```
This uses a triple-quoted string to hold a regular expression. Triple quotes are usually reserved for strings that contain a lot of backslashes, double quotes, or newlines, since none of those need to be escaped inside them. This pattern contains none of those, so an ordinary string literal would work just as well. A more idiomatic way to build the regex in Kotlin would be to simply call `WORD_PATTERN.toRegex()`.
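A minimal sketch of that style, using only the standard library: the pattern is an ordinary string literal and is turned into a `Regex` with `toRegex()`. `firstWord` is a hypothetical helper added only to exercise the regex, and `const` is omitted so the snippet also runs as a script.

```kotlin
// No backslashes or double quotes in the pattern, so a regular string literal is enough
val WORD_PATTERN = "[A-Za-z][A-Za-z']*"

// toRegex() is the standard-library extension equivalent to Regex(WORD_PATTERN)
val WORD_REGEX = WORD_PATTERN.toRegex()

// Hypothetical helper: returns the first word in s, or null if there is none
fun firstWord(s: String): String? = WORD_REGEX.find(s)?.value
```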
```kotlin
val words = WORD_REGEX.findAll(s).map { it.groupValues[0].toLowerCase() }
```
`findAll` returns a lazy `Sequence<MatchResult>`, where each `MatchResult` represents one distinct match of the regular expression in the string. A single regex match can contain multiple “groups”. If the regex uses parentheses, each pair forms a group (e.g. in `(a)b`, the `(a)` forms a group). However, every match has at least one group: the entire match itself. So `groupValues[0]` just means the whole matched substring (the same thing as `it.value`). The code converts it to lowercase so that the same word with different capitalization is counted as one word.
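To illustrate the group numbering, here is a small sketch with a hypothetical two-group pattern (`greeting`) and helpers (`describeMatch`, `lowercaseWords`), none of which come from the thread. It collects the lazy `findAll` sequence into a list with `toList()`, and uses `lowercase()`, which replaced `toLowerCase()` starting in Kotlin 1.5.

```kotlin
// Hypothetical pattern with two capture groups; the triple-quoted string avoids escaping \w
val greeting = Regex("""(\w+), (\w+)""")

// Returns all group values of the first match: index 0 is the whole match,
// indexes 1 and 2 are the parenthesized groups
fun describeMatch(input: String): List<String> {
    val match = greeting.find(input) ?: return emptyList()
    return match.groupValues
}

// findAll is lazy; toList() forces the sequence so the matches can be inspected
fun lowercaseWords(s: String): List<String> =
    Regex("[A-Za-z][A-Za-z']*").findAll(s).map { it.value.lowercase() }.toList()
```

For `"Hello, Kotlin!"`, `describeMatch` yields the whole match `"Hello, Kotlin"` followed by the two groups `"Hello"` and `"Kotlin"`.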
```kotlin
val occurrences = mutableMapOf<String, Int>()
```
This creates a read-only variable of type `MutableMap<String, Int>`. The variable can't be reassigned to point at a different map, but the map itself can be modified. It's typical to use `val` with `Mutable*` collections.
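A small sketch of the distinction between a read-only variable and a mutable collection; `valWithMutableMap` is a hypothetical name used only for illustration, and the commented-out line would not compile.

```kotlin
fun valWithMutableMap(): Map<String, Int> {
    val counts = mutableMapOf<String, Int>()
    // Allowed: the map object referenced by `counts` is mutable
    counts["kotlin"] = 1
    counts["kotlin"] = counts.getValue("kotlin") + 1
    // Not allowed: `val` means `counts` can't be pointed at another map
    // counts = mutableMapOf()  // compile error: Val cannot be reassigned
    return counts
}
```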
```kotlin
for (word in words) {
    occurrences[word] = (occurrences[word] ?: 0) + 1
}
```
This iterates over all the words and counts occurrences. For each word, it looks up the current count for that word in the map, adds 1, and puts the new count back in the map. If the word is not in the map yet, `occurrences[word]` returns `null`, so the elvis operator `?:` supplies 0 as the initial value. A slightly more concise way to write this would be to use the `fold` operator over the words, although there might be an even more concise way to do it using one of the grouping operators.
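Both alternatives can be sketched as follows, assuming the input is the `Sequence<String>` produced by `findAll(...).map { ... }`; the function names are hypothetical. The `fold` version threads a mutable map through as the accumulator (concise, though not purely functional), while `groupingBy { it }.eachCount()` is the standard-library grouping operation for exactly this kind of count.

```kotlin
// Counting with fold: the accumulator is the map being built up
fun countWithFold(words: Sequence<String>): Map<String, Int> =
    words.fold(mutableMapOf<String, Int>()) { acc, word ->
        acc[word] = (acc[word] ?: 0) + 1
        acc // return the same map so the next step keeps adding to it
    }

// Counting with a grouping operator: group each word by itself, then count each group
fun countWithGrouping(words: Sequence<String>): Map<String, Int> =
    words.groupingBy { it }.eachCount()
```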
```kotlin
return occurrences.toList()
```
Since `occurrences` is a `Map<String, Int>`, this converts it into a `List<Pair<String, Int>>`. For every key/value pair in the map, the returned list contains a `Pair`.
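A quick illustration of `toList()` on a map, with made-up counts: each entry becomes a `Pair(key, value)`, in the map's iteration order (insertion order for maps created with `mapOf` or `mutableMapOf`). `countsAsPairs` is a hypothetical name.

```kotlin
fun countsAsPairs(): List<Pair<String, Int>> {
    // Made-up counts, just for illustration
    val occurrences = mapOf("the" to 3, "cat" to 1)
    // Each entry becomes Pair(key, value): [(the, 3), (cat, 1)]
    return occurrences.toList()
}
```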
```kotlin
.sortedByDescending { it.second }
```
This takes the list of pairs and returns a new list of pairs, sorted by the second value in each pair (the `Int` word count). It's sorted in descending order, so the largest count comes first.
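A sketch of the sorting step with made-up data. `sortedByDescending` is stable, so words with equal counts keep their relative order. If a deterministic tie-break is wanted (an assumption, not something the thread asks for), `compareByDescending` combined with `thenBy` can sort ties alphabetically; both function names below are hypothetical.

```kotlin
// Highest count first; ties keep their original relative order (the sort is stable)
fun sortByCount(pairs: List<Pair<String, Int>>): List<Pair<String, Int>> =
    pairs.sortedByDescending { it.second }

// Highest count first; ties broken alphabetically by word
fun sortByCountThenWord(pairs: List<Pair<String, Int>>): List<Pair<String, Int>> =
    pairs.sortedWith(compareByDescending<Pair<String, Int>> { it.second }.thenBy { it.first })
```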
```kotlin
.map { it.first }
```
This transforms the `List<Pair<String, Int>>` into a `List<String>`: the new list contains only the first value of each pair, which is the lowercase word.
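The projection step on its own, with made-up pairs; `wordsOnly` and `wordsOnlyDestructured` are hypothetical names. Destructuring the pair, as in the second function, is an equivalent way to say the same thing that some readers find clearer.

```kotlin
// Keep only the word from each (word, count) pair
fun wordsOnly(sorted: List<Pair<String, Int>>): List<String> =
    sorted.map { it.first }

// Same result using destructuring; `_` ignores the count
fun wordsOnlyDestructured(sorted: List<Pair<String, Int>>): List<String> =
    sorted.map { (word, _) -> word }
```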
```kotlin
.take(3)
```
This takes the incoming list and returns a new list containing at most its first 3 elements (fewer if the original list is shorter). Since the list is sorted, these are the three most frequent words.
nkiesel
02/26/2021, 8:36 AM
```kotlin
fun String.topNwords(n: Int = 3) = Regex("""[a-zA-Z][a-zA-Z']*""")
    .findAll(this)
    .map { it.value.toLowerCase() }
    .groupingBy { it }
    .eachCount()
    .entries
    .sortedByDescending { it.value }
    .take(n)
    .map { it.key }
```
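A sketch of the same extension adapted for newer Kotlin versions, where `lowercase()` replaces the deprecated `toLowerCase()` (Kotlin 1.5+). Moving the `Regex` into a top-level `val` so it is compiled once instead of on every call is an assumption about what's worth optimizing, not something from the thread, and the explicit `List<String>` return type is added only for readability.

```kotlin
// Compiled once and reused by every call (assumed optimization, not from the thread)
val WORD_REGEX = Regex("[A-Za-z][A-Za-z']*")

fun String.topNwords(n: Int = 3): List<String> =
    WORD_REGEX.findAll(this)
        .map { it.value.lowercase() }    // whole match, case-folded
        .groupingBy { it }
        .eachCount()                     // Map<String, Int> of word -> count
        .entries
        .sortedByDescending { it.value } // stable: ties keep first-seen order
        .take(n)
        .map { it.key }
```

For example, `"a a a b b c".topNwords()` should give `[a, b, c]`.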
Zach Klippenstein (he/him) [MOD]
02/26/2021, 3:18 PM
`eachCount`, but it was too late and I was too lazy to google it last night 😂
Dominick
02/26/2021, 3:23 PM