# datascience
r
That's exactly my issue. And I do have HTML input, so I have some presentation information. That's the other problem with a lot of ML algorithms -- they throw away that presentational info. The best one can do with most algos seems to be to tokenize the HTML tags in the hope that those tokens become useful features.
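
For what it's worth, here is a rough sketch of what I mean by tokenizing the tags. It uses Python's standard-library `html.parser` and keeps each opening and closing tag as its own token in the same stream as the words. The names `TagTokenizer` and `tokenize_html` are made up for illustration, and lowercasing plus whitespace splitting is just an assumed stand-in for a real text tokenizer.

```python
from collections import Counter
from html.parser import HTMLParser


class TagTokenizer(HTMLParser):
    """Collects words and tag markers into one flat token stream."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Keep the opening tag as a token; attributes are ignored here.
        self.tokens.append(f"<{tag}>")

    def handle_endtag(self, tag):
        # Keep the closing tag as a separate token.
        self.tokens.append(f"</{tag}>")

    def handle_data(self, data):
        # Assumed text tokenization: lowercase and split on whitespace.
        self.tokens.extend(data.lower().split())


def tokenize_html(html):
    """Return the words and tag tokens of an HTML string, in document order."""
    parser = TagTokenizer()
    parser.feed(html)
    parser.close()
    return parser.tokens


tokens = tokenize_html("<h1>Big News</h1><p>Some <b>bold</b> text</p>")
features = Counter(tokens)  # simple bag-of-tokens counts, tags included
```

The resulting counts can go into whatever bag-of-words style model you already use. Whether tokens like `<h1>` or `<b>` end up being informative is still up to the learner, which is the "in the hope that" part.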