xexiz

09/13/2021, 11:46 AM
Hi!! First of all, thanks @Christian Dräger for that pretty library, it really makes skraping much more fun hehe 🙂 However, I've been trying it with one site and I'm always getting timeouts (even if I override the timeout param to 30 sec). Are there any logs we can activate to get more info on what's going on and why it always times out? Here's what I'm trying for now (I did manage to get a successful call a few times, to make sure my selectors were OK):
private suspend fun getTotalPages(): Int =
    withContext(Dispatchers.IO) {
        skrape(AsyncFetcher) {
            request {
                url = "https://www.capfriendly.com/browse/active"
            }
            response {
                htmlDocument {
                    div {
                        withClass = "pagination"
                        findFirst {
                            div {
                                withClass = "r"
                                val paginationText = findByIndex(1).text
                                paginationText.substringAfter(" of ").toInt()
                            }
                        }
                    }
                }
            }
        }
    }
• kotlin 1.5.30
• AGP 7.1.0-alpha11
• API 31
• skrapeit:1.1.5
Thanks

Christian Dräger

09/14/2021, 6:37 PM
Glad you like it. I will try out tomorrow and see if I can investigate something :)
i did a quick check by putting your code in a junit test and it's working fine for me
@Test
    fun `can get total pages`() {
        runBlocking {
            withContext(Dispatchers.IO) {
                val totalPages = skrape(AsyncFetcher) {
                    request {
                        url = "https://www.capfriendly.com/browse/active"
                    }
                    response {
                        htmlDocument {
                            div {
                                withClass = "pagination"
                                findFirst {
                                    div {
                                        withClass = "r"
                                        val paginationText = findByIndex(1).text
                                        paginationText.substringAfter(" of ").toInt()
                                    }
                                }
                            }
                        }
                    }
                }

                println(totalPages)
            }
        }
    }
to avoid the string parsing, since the link's text will probably change more frequently than its attributes, i would maybe do something like this:
@Test
    fun `can get total pages`() = runBlocking {
        withContext(Dispatchers.IO) {
            val totalPages = skrape(AsyncFetcher) {
                request {
                    url = "https://www.capfriendly.com/browse/active"
                }
                response {
                    htmlDocument {
                        div {
                            withClass = "pagination"
                            findFirst {
                                div {
                                    withClass = "r"
                                    a {
                                        findAll { find { it.text == "Last" }?.attribute("data-val") }
                                    }
                                }
                            }
                        }
                    }
                }
            }

            println(totalPages)
        }
    }
👍 1
just to make sure, have you checked the troubleshooting section of the android example? https://github.com/skrapeit/skrape.it/tree/master/examples/android#troubleshooting i will try to build a little android app to check. it seems to be something android specific, since the example runs fine on the jvm server side
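If the text-based approach from the first snippet is kept anyway, a more defensive version of the parsing step could look like the sketch below. It uses only the Kotlin standard library. The helper name `parseTotalPages` and the "Page 1 of 30" wording are assumptions for illustration; neither comes from skrape{it} or from the site.

```kotlin
// Hypothetical helper, not part of skrape{it}.
// Assumes the pagination text looks like "Page 1 of 30"; the real wording may differ.
// Returns null instead of throwing when the text has no " of " part or no number after it.
fun parseTotalPages(paginationText: String): Int? =
    paginationText
        .substringAfter(" of ", missingDelimiterValue = "") // empty string if " of " is absent
        .trim()
        .takeWhile { it.isDigit() }                          // stop at the first non-digit
        .toIntOrNull()                                       // null for an empty result
```

With this, `parseTotalPages("Page 1 of 30")` yields `30`, while a text without " of " yields `null`, so the caller can decide how to handle a changed page layout instead of getting a `NumberFormatException` from `toInt()`.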

xexiz

09/15/2021, 2:23 PM
Thanks for the suggestion. I did reuse most of the android example and adapted it a bit to be able to fetch the total number of pages and then fetch all 30 pages of this list. I also bumped all library versions.
It seems it might be the website that’s causing problems. I just tried with imdb to display list of movies and it’s working on my project, but not with capfriendly. No idea why though.
private suspend fun fetchImdb(): List<User> =
    withContext(Dispatchers.IO) {
        skrape(AsyncFetcher) {
            request {
                url = "https://www.imdb.com/chart/top/"
                sslRelaxed = true
            }.also { println("call ${it.preparedRequest.url}") }
            response {
                htmlDocument {
                    table {
                        tbody {
                            withClass = "lister-list"
                            tr {
                                findAll {
                                    map {
                                        val title = it.td { findSecond { text } }
                                        User(name = title, "", "")
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
Yeah, the website really has something skrape{it} doesn’t like. I tried the whole loop I did but with another website and I can fetch all 8005 players
private suspend fun fetchDB(): List<User> {
    var players = listOf<User>()
    withContext(Dispatchers.IO) {
        val deferred = ('a'..'z').filterNot { it == 'x' }.map { async { getHockeyDb(it) } }
        players = deferred.awaitAll().flatten()
    }
    println("players total: ${players.size}")
    println("players 5: ${players[5]}")
    println("players 100: ${players[100]}")
    println("players 250: ${players[250]}")
    println("players 500: ${players[500]}")
    println("players 5000: ${players[5000]}")
    return players
}

private suspend fun getHockeyDb(letter: Char): List<User> {
    return withContext(Dispatchers.IO) {
        skrape(AsyncFetcher) {
            request {
                url = "https://www.hockeydb.com/ihdb/players/player_ind_$letter.html"
                sslRelaxed = true
            }.also { println("call ${it.preparedRequest.url}") }
            response {
                htmlDocument {
                    table {
                        tbody {
                            tr {
                                findAll {
                                    map {
                                        val name = it.td { it.a { findFirst { text } } }
                                        val team = it.td { findByIndex(1) { text } }
                                        val salary = it.td { findLast { text } }
                                        User(name, team, salary)
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
Have you had time to take a look yet @Christian Dräger?

Christian Dräger

09/17/2021, 6:36 PM
Sorry, I haven't managed to have a deeper look so far. But from your examples it really looks like it has something to do with that particular url. From that point of view it would be interesting to find the reason, since it could be a bug in skrapeit. There is currently another open issue that is related to some https url and Android. The author of the issue assumes it has something to do with self-signed or invalid tls certificates https://github.com/skrapeit/skrape.it/issues/162 But I couldn't reproduce that one either. And as with your error, I didn't find the time to really test on Android. Maybe it is the same error or at least related 🤷‍♂️ but I can't say much so far
👍 1

xexiz

09/17/2021, 6:50 PM
All right, thanks. I also noticed while playing with the previously mentioned example (using www.hockeydb.com) that it seems to only work on API 29+. I'll try to isolate the problem and open a bug if I find more info, but the http client seems to be responsible: it's not working on API 25, 26, 27, 28 so far. I'll try to bump ktor and all the other dependencies of your lib, since using the latest versions of 3rd party libraries is always a good first guess.
Interestingly enough, I just rebuilt your library and swapped the Apache client used in the AsyncFetcher with the OkHttp client (from ktor-client-okhttp, https://ktor.io/docs/http-client-engines.html#okhttp) and all my problems are fixed. I can now fetch all 1465 players from capfriendly.com, and it works on all the Android versions I previously mentioned, not only on API 29+ 🙂
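For readers who want to see what selecting the OkHttp engine looks like on the Ktor side, here is a minimal sketch. It is not the code from the rebuilt library (that change is not shown in this thread); it only shows how a Ktor `HttpClient` can be created with the `OkHttp` engine from `ktor-client-okhttp`. The function name `createOkHttpBackedClient` and the 30-second default are assumptions for illustration, and the `ktor-client-okhttp` artifact has to be on the classpath.

```kotlin
import io.ktor.client.HttpClient
import io.ktor.client.engine.okhttp.OkHttp
import java.util.concurrent.TimeUnit

// Hypothetical factory, not part of skrape{it}: builds a Ktor client backed by OkHttp.
// The timeout default mirrors the 30 sec mentioned earlier in the thread.
fun createOkHttpBackedClient(timeoutMillis: Long = 30_000): HttpClient =
    HttpClient(OkHttp) {
        engine {
            // `config` exposes the underlying OkHttpClient.Builder
            config {
                connectTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                readTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                followRedirects(true)
            }
        }
    }
```

How such a client would be wired into AsyncFetcher or HttpFetcher depends on the library internals, so that part is left out here.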

Christian Dräger

09/24/2021, 5:24 PM
woooho
thx for investigating mate. ok i think it would make sense to change the default implementations of HttpFetcher and AsyncFetcher to use ktor-client-okhttp instead of apache 🙂
👍 1
since i am running short on time these days would you be open to send a PR (since you already have the code anyway)? 🙂 thereby we can fix it upstream for everyone. would be super awesome

xexiz

09/24/2021, 6:00 PM
Yes, I could. I had removed everything authentication or proxy related though, so I'll need to figure out what to do with this. I would need to get more familiar with the project to make sure swapping it doesn't break other stuff. So are you more in favor of totally replacing the Apache implem. with the OkHttp one, or of supporting both so that the user decides which one they want? And what about the BrowserFetcher? In the Android world, OkHttp is pretty much the standard and best supported client, so it would make sense to make it the default, except that originally I think your lib was mostly for unit tests, right?

Christian Dräger

09/25/2021, 7:36 PM
It started out being for unit testing, but why not support Android as well as possible. Since I am not into android development (building backends and web frontends is my daily bread and butter ^^) it's sometimes hard for me to keep up with such android support topics :D Since okhttp works perfectly fine on the server or in unit tests, I think it's a good idea to just swap the implementation. In general everything crucial should already be covered by tests, so to verify that the new implementation works it should be enough to execute these tests. The BrowserFetcher is somewhat special because it sticks to htmlunit (and also has to in the future) to support the js rendering thingy. Not sure if it causes trouble on Android as well, and furthermore what to do to support BrowserFetcher on Android. If you don't feel comfortable with the authentication (which is more like beta currently anyway) and the proxy, you could just leave it open and i could add it to the PR :)