Hampus Londögård
08/28/2022, 6:53 PM
TokenClassificationPipeline.create("optimum/bert-base-NER")
◦ Where optimum/bert-base-NER is a model on the HuggingFace Hub
• ✅ Load both PyTorch (TorchScript) & ONNX models through a local path
• ClassificationPipeline and TokenClassificationPipeline exist
◦ See the following test for some examples of how to use it
A 1.2.0-BETA release has been cut!
ayodele
08/30/2022, 5:11 PM
Hampus Londögård
08/30/2022, 5:15 PM
ayodele
08/30/2022, 5:30 PM
@OptIn(ExperimentalTime::class)
fun logisticTest() {
    val labelsMap = mapOf(
        0 to "Bank Charges",
        1 to "Betting",
        2 to "Card fees",
        3 to "Food",
        4 to "Lifestyle",
        5 to "Loan",
        6 to "Reversal",
        7 to "Salary",
        8 to "Unknown",
        9 to "Utilities & Bills",
        10 to "Withdrawal"
    )
    val data = listOf(
        BankT("Vat amount charges", "Bank Charges"),
        BankT("Loan payment credit", "Loan"),
        BankT("Salary for Aug", "Salary"),
        BankT("Payment from betking", "Betting"),
        BankT("Purchase from Shoprite", "Food"),
    )
    val simpleTok = SimpleTokenizer()
    val xData = data.map { it.narration }.map(simpleTok::split)
    val yData = data.map { it.category!! }.mapToIndex()
    val y = mk.ndarray(yData, yData.size, 1)
    val tfidf = TfIdfVectorizer<Float>()
    val lr = com.londogard.nlp.meachinelearning.predictors.classifiers.LogisticRegression()
    val transformedData = tfidf.fitTransform(xData)
    val time = measureTime {
        lr.fit(transformedData, y)
    }
    println("Fitting: $time")
    val nar = xData[2]
    val list = listOf(nar)
    val mx = tfidf.transform(list)
    val prediction = lr.predict(mx).first()
    println("Predicted label is: $prediction. This corresponds to class. ${labelsMap[prediction]}")
}
ayodele
08/30/2022, 5:32 PM
Hampus Londögård
08/30/2022, 5:35 PM
What does mapToIndex do?
ayodele
08/30/2022, 5:37 PM
If the BankT category is Bank Charges, it replaces it with 0
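The real mapToIndex helper is not shown in this thread, so the following is only a sketch of the behaviour described above. The name mapToIndexSketch and the explicit category list are assumptions made for illustration; the ordering is chosen to match the labelsMap in the snippet.

```kotlin
// Hypothetical stand-in for the mapToIndex helper discussed above.
// Assumption: a category's index is its position in the given category list.
fun List<String>.mapToIndexSketch(categories: List<String>): List<Int> {
    val indexOf = categories.withIndex().associate { (i, name) -> name to i }
    // getValue throws on an unknown category instead of silently returning 0.
    return map { indexOf.getValue(it) }
}

fun main() {
    val categories = listOf(
        "Bank Charges", "Betting", "Card fees", "Food", "Lifestyle", "Loan",
        "Reversal", "Salary", "Unknown", "Utilities & Bills", "Withdrawal"
    )
    val labels = listOf("Bank Charges", "Loan", "Salary", "Betting", "Food")
    println(labels.mapToIndexSketch(categories)) // [0, 5, 7, 1, 3]
}
```

Failing loudly on an unknown category is a deliberate choice in this sketch: defaulting to 0 would quietly label unseen categories as "Bank Charges".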
Hampus Londögård
08/30/2022, 5:42 PM
Fitting: 785.423920ms
Predicted label is: 0. This corresponds to class. Bank Charges
ayodele
08/30/2022, 5:44 PM
Hampus Londögård
08/30/2022, 5:51 PM
Hampus Londögård
08/30/2022, 5:55 PM
ayodele
08/30/2022, 5:56 PM
ayodele
08/30/2022, 5:56 PM
ayodele
08/30/2022, 5:57 PM
Hampus Londögård
08/30/2022, 5:58 PM
Fitting: 651.062949ms
Predicted label is: [[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]]
Would you prefer a one-hot encoding or a simple class index?
E.g. should the input be one-hot encoded, or the true class (1, 2, 3)?
ayodele
08/30/2022, 5:59 PM
Hampus Londögård
08/30/2022, 7:17 PM
ayodele
08/30/2022, 7:58 PM
LogisticRegression and NaiveBayes compared to a neural network? What are your opinions?
Hampus Londögård
08/31/2022, 4:07 AM
Hampus Londögård
08/31/2022, 4:42 AM
A 1.2.0-BETA2 cut has been drafted; it should be live within a few hours.
This is a beta and the API could change. You can see the current implementation at https://github.com/londogard/londogard-nlp-toolkit/blob/main/src/test/kotlin/com/londogard/nlp/machinelearning/ClassifierTest.kt#L48
Hampus Londögård
08/31/2022, 4:42 AM
val lr = LogisticRegression().asAutoOneHotClassifier()
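For readers following the one-hot discussion, here is a rough sketch of the two conversions an "auto one-hot" wrapper would presumably need: class indices to one-hot rows before fitting, and rows back to class indices after predicting. It is plain Kotlin and not the library's implementation; oneHot and argmaxRows are hypothetical names.

```kotlin
// Hypothetical helpers for illustration, not part of londogard-nlp-toolkit.

// Turn class indices (e.g. [0, 5, 7]) into one-hot rows of width numClasses.
fun oneHot(labels: List<Int>, numClasses: Int): List<IntArray> =
    labels.map { label ->
        require(label in 0 until numClasses) { "label $label out of range" }
        IntArray(numClasses).also { it[label] = 1 }
    }

// Turn rows of scores (or one-hot rows) back into class indices
// by picking the position of the largest value in each row.
fun argmaxRows(rows: List<IntArray>): List<Int> =
    rows.map { row -> row.indices.maxByOrNull { row[it] } ?: error("empty row") }

fun main() {
    val labels = listOf(0, 5, 7, 1, 3)
    val encoded = oneHot(labels, numClasses = 11)
    encoded.forEach { println(it.joinToString(prefix = "[", postfix = "]")) }
    println(argmaxRows(encoded)) // [0, 5, 7, 1, 3]
}
```

With the five training labels from the snippet, the encoded rows correspond to the matrix printed earlier in the thread, and argmaxRows recovers the simple class indices.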
ayodele
08/31/2022, 8:12 AM
ayodele
09/01/2022, 7:16 AM
ayodele
09/01/2022, 7:17 AM
ayodele
09/01/2022, 8:12 AM
Also when you want to predict using tfidf.transform() then predictSimple() the output is always zero
Hampus Londögård
09/01/2022, 8:33 AM
implementation("com.londogard:nlp:1.2.0-BETA2")?
Hampus Londögård
09/01/2022, 8:33 AM
"Also when you want to predict using tfidf.transform() then predictSimple() the output is always zero"
Did you look at the test I created? What's the difference between your use-case and the one I did?
ayodele
09/01/2022, 8:37 AM
Hampus Londögård
09/01/2022, 8:37 AM
ayodele
09/01/2022, 8:40 AM
tfidf.fitTransform() but I'll be predicting using the result from tfidf.transform()
ayodele
09/01/2022, 8:41 AM
0
Hampus Londögård
09/01/2022, 8:41 AM
ayodele
09/01/2022, 9:00 AM
Hampus Londögård
09/02/2022, 5:49 AM
Hampus Londögård
09/02/2022, 7:15 PM
val labelsMap = mapOf(
    0 to "Bank Charges",
    1 to "Betting",
    2 to "Card fees",
    3 to "Food",
    4 to "Lifestyle",
    5 to "Loan",
    6 to "Reversal",
    7 to "Salary",
    8 to "Unknown",
    9 to "Utilities & Bills",
    10 to "Withdrawal"
)
val reversedLabelMap = labelsMap.asSequence().map { it.value to it.key }.toMap()
val (data, categories) = listOf(
    "Vat amount charges" to "Bank Charges",
    "Loan payment credit" to "Loan",
    "Salary for Aug" to "Salary",
    "Payment from betking" to "Betting",
    "Purchase from Shoprite" to "Food",
).unzip()
val simpleTok = SimpleTokenizer()
val xData = data.map(simpleTok::split)
val yList = categories.map { category -> reversedLabelMap.getOrDefault(category, 0) }
val y = mk.ndarray(yList)
val tfidf = TfIdfVectorizer<Float>()
val lr = LogisticRegression().asAutoOneHotClassifier()
val transformedData = tfidf.fitTransform(xData)
lr.fit(transformedData, y)
lr.predictSimple(tfidf.transform(xData)) shouldBeEqualTo lr.predictSimple(transformedData)
lr.predictSimple(transformedData) shouldBeEqualTo y
works as expected
Hampus Londögård
09/02/2022, 7:22 PM