#skrape-it

Manuel Lorenzo

09/02/2020, 1:16 PM
How do you deal with cookies? For example, if I try to parse https://www.huffpost.com/entry/patriot-prayer-portland_n_5f4ea8f0c5b69eb5c0358c26, the first thing I get is the GDPR cookie wall.

Christian Dräger

09/06/2020, 7:40 PM
Hey, sorry for the late response. Oh, this one is hard. I don't know how the HuffPost page knows that you have already accepted the cookie banner. Usually this is tracked with a cookie or a localStorage entry. It is possible to set cookies on the request with skrape{it}, but you would need to find out which one. From what I saw, they set a huge number of cookies, so it's hard to find the right one by trial and error. But since I guess they just place the cookie banner over the article with JavaScript, everything should already be in the DOM and can be scraped. Overlays and the like can usually just be ignored.
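To illustrate what "setting cookies in the request" amounts to, here is a minimal sketch that uses only the JDK's `java.net.http` classes rather than skrape{it}'s own request DSL. The cookie name `example_consent` and its value are made up for illustration; the real consent cookie would have to be copied from a browser session after accepting the banner. The helper names `cookieHeader` and `requestWithCookies` are likewise hypothetical.

```kotlin
import java.net.URI
import java.net.http.HttpRequest

// Hypothetical consent cookie: the real name and value are unknown and
// would have to be looked up in the browser's dev tools.
val consentCookies = mapOf("example_consent" to "accepted")

// Serialize a name/value map into a single Cookie header value.
fun cookieHeader(cookies: Map<String, String>): String =
    cookies.entries.joinToString("; ") { (name, value) -> "$name=$value" }

// Build (but do not send) a GET request that carries the given cookies.
fun requestWithCookies(url: String, cookies: Map<String, String>): HttpRequest =
    HttpRequest.newBuilder(URI.create(url))
        .header("Cookie", cookieHeader(cookies))
        .GET()
        .build()

fun main() {
    val request = requestWithCookies(
        "https://www.huffpost.com/entry/patriot-prayer-portland_n_5f4ea8f0c5b69eb5c0358c26",
        consentCookies
    )
    // Show the header that would be sent along with the request.
    println(request.headers().firstValue("Cookie").orElse(""))
}
```

Whether a single cookie is enough to skip the wall depends on the site; if the banner is only a JavaScript overlay, the article markup may already be present in the fetched HTML without any cookie at all.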