So how do I use this package to get all the img src urls off kotlinlang #skrape-it

Gunslingor

04/11/2020, 8:26 AM

So how do I use this package to get all the img src urls off a page assuming I don't need the client? This is confusing:

val reportPage = client.get<String>(report.link).toString()
var images: List<String> = emptyList()
htmlDocument(reportPage) {
    images = img() { findAll { withAttribute }}
}
println(images)

Gunslingor

04/11/2020, 8:32 AM

val reportPage = client.get<String>(report.link).toString()
var images: List<String> = emptyList()
htmlDocument(reportPage) {
    images = img() { findAll { eachAttribute("src") }}
}
println(images)

Gunslingor

04/11/2020, 8:37 AM

Better start... now trying to figure out how to get all the pictures or, even better, just the main picture from a news article on the common outlets: CNN, Fox, NBC, etc.

Christian Dräger

04/20/2020, 7:31 AM

hey, sorry for the late response. to get all links you could do the following (using version 1.0.0-alpha6, which i highly recommend. it’s already very close to the final version that will be published soon):

Christian Dräger

04/20/2020, 7:50 AM

sorry, i need to make a screenshot. somehow slack won’t let me paste the code properly

Christian Dräger

04/20/2020, 7:54 AM

here are more examples: https://docs.skrape.it/docs/examples/grab-all-links-from-a-website

Christian Dräger

04/20/2020, 7:57 AM

in your case you would do the same but for img tags and their corresponding src attribute, like this:

Christian Dräger

04/20/2020, 7:58 AM

message has been deleted
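
The snippet from this message is not preserved in the log. As a stand-in, here is a minimal sketch of the same idea (select every img tag and read its src attribute). It uses jsoup directly, the HTML parser that skrape{it} builds on, rather than skrape{it}'s own DSL, so it makes no claim about what the deleted message contained. The function name `imageUrls`, the sample HTML, and the base URL are assumptions for illustration only.

```kotlin
import org.jsoup.Jsoup

// Hypothetical helper (not part of skrape{it}): returns the src of every <img>
// in the given HTML, resolved against baseUri so relative paths become absolute.
fun imageUrls(html: String, baseUri: String): List<String> =
    Jsoup.parse(html, baseUri)
        .select("img[src]")            // only <img> tags that carry a src attribute
        .map { it.absUrl("src") }      // resolve relative URLs against baseUri
        .filter { it.isNotBlank() }    // absUrl yields "" when a URL cannot be resolved

fun main() {
    // Made-up sample page; in the thread this string would be the fetched reportPage.
    val html = """
        <html><body>
          <img src="/logo.png">
          <img alt="no source here">
          <img src="https://cdn.example.com/photo.jpg">
        </body></html>
    """.trimIndent()
    println(imageUrls(html, "https://example.com/news/"))
}
```

Passing the page's own URL as `baseUri` matters on real sites, since many of them use relative image paths.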

Gunslingor

04/22/2020, 1:39 PM

Thanks man... I'm starting to figure it out. The New York Times site keeps causing problems when I try to scrape imgs: I see the appropriate meta tags on news articles, but skrape{it} isn't finding them. Might be a security thing or something... but regardless, my other test sites are working well, so I'm getting the hang of this. Thanks man.

👍 1
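
For the "just the main picture" case raised earlier in the thread, one common approach is to read the Open Graph `og:image` meta tag, which many news sites set to the article's lead image. The sketch below assumes the page declares that tag and again uses jsoup directly, not skrape{it}'s DSL. The function name `mainImageUrl` and the sample HTML are hypothetical.

```kotlin
import org.jsoup.Jsoup

// Hypothetical helper: returns the URL in the page's og:image meta tag,
// or null when the tag is missing or its content attribute is empty.
fun mainImageUrl(html: String): String? =
    Jsoup.parse(html)
        .select("meta[property=og:image]")   // Open Graph lead-image tag
        .map { it.attr("content") }          // the URL lives in the content attribute
        .firstOrNull { it.isNotBlank() }

fun main() {
    // Made-up sample markup standing in for a fetched article page.
    val html = """
        <html><head>
          <meta property="og:title" content="Some headline">
          <meta property="og:image" content="https://cdn.example.com/lead.jpg">
        </head><body></body></html>
    """.trimIndent()
    println(mainImageUrl(html))
}
```

The function only sees tags that are present in the HTML string it receives. If a site returns different markup to a non-browser client, the tag will be absent and the result is null, which may be worth checking before suspecting the selector.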
