web scraper


Thank you very much for your valuable guidance on my Python experiment.
The problems I reported earlier have since been solved.

This is part of a program that extracts specific information from bank
transaction records, and one feature I still need to implement is a *web
scraper*.

The Edureka guide "Web Scraping with Python"
<https://www.edureka.co/blog/web-scraping-with-python/> provides some
insight, but its code is tied to one specific website. By matching the
"inspected" web page against the code shown on the Edureka page, one can
reconstruct the algorithm, but can it be generalized?
Not all pages use tags of the form <a class ...> or <div class ...>, so when
processing a different website, is it enough to just replace those tags in
the code?
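
One way to look at the generalization question is to keep the tag name and
class out of the scraping logic and pass them in as parameters, so that each
website only needs its own selector values. Below is a minimal sketch under
that assumption. It uses the standard library's html.parser instead of the
tools from the guide, and the names TagTextExtractor and extract_text are
hypothetical, introduced here only for illustration.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text of every <tag class="... css_class ..."> element."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0        # > 0 while inside a matching element
        self.results = []     # one string per matching element
        self._buffer = []     # text pieces of the element being read

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Already inside a match: count nested elements of the same tag
            # so that the right closing tag ends the match.
            if tag == self.tag:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self._buffer.append(data)


def extract_text(html, tag, css_class):
    """Return the text of all elements matching the given tag and class."""
    parser = TagTextExtractor(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.results
```

With this shape, switching to another website means changing only the
arguments, for example extract_text(page, "div", "price") for one site and
extract_text(page, "span", "name") for another. The sketch assumes the target
elements have a closing tag and that the data is present in the HTML as
delivered; it does not cover content filled in later by JavaScript.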

In addition, the Flipkart page
<https://www.flipkart.com/laptops/~buyback-guarantee-on-laptops-/pr?sid=6bo%2Cb5g&uniqBStoreParam1=val1&wid=11.productCard.PMU_V2>
contains all the needed data, while my case requires looking up one item
at a time, namely:
(i) loop over a set of ISIN codes
(ii) access a specific website (Morningstar, perhaps?) that does the
ISIN-to-fund-name translation
(iii) "inspect" the page containing the result and grab the fund name.
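
Steps (i) to (iii) could be sketched roughly as follows. Everything specific
here is an assumption: LOOKUP_URL is a placeholder and not the address of any
real service, the fund name is assumed to sit in the first <h1> of the result
page, and the function names are hypothetical. The real URL pattern and the
real tag have to be found by inspecting the chosen website, whose terms of
use should also be checked before scraping it.

```python
import time
import urllib.request
from html.parser import HTMLParser

# Placeholder URL pattern (assumption): replace with the real lookup page.
LOOKUP_URL = "https://example.com/funds?isin={isin}"


class FirstHeadingParser(HTMLParser):
    """Collect the text of the first <h1> (assumed to hold the fund name)."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self.done:
            self.in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1" and self.in_h1:
            self.in_h1 = False
            self.done = True

    def handle_data(self, data):
        if self.in_h1:
            self.parts.append(data)


def fetch_page(isin):
    """Step (ii): download the result page for one ISIN."""
    url = LOOKUP_URL.format(isin=isin)
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_fund_name(html):
    """Step (iii): pull the fund name out of the page, or None if absent."""
    parser = FirstHeadingParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() or None


def lookup_fund_names(isins, fetch=fetch_page, delay=1.0):
    """Step (i): loop over the ISIN codes, one request per code."""
    names = {}
    for index, isin in enumerate(isins):
        if index:
            time.sleep(delay)  # pause between requests to spare the server
        try:
            names[isin] = parse_fund_name(fetch(isin))
        except OSError as error:  # urllib's URLError is a subclass of OSError
            print(f"{isin}: lookup failed ({error})")
            names[isin] = None
    return names
```

The fetch argument is there so the loop can be tried offline by passing a
function that returns saved HTML instead of downloading it. If the result
page is built by JavaScript after loading, a plain download will not contain
the fund name, and a browser-driving tool would be needed instead.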

I would appreciate any advice on how to program all this. Thanks.
-- 
Regards,
Joseph Pareti - Artificial Intelligence consultant
Joseph Pareti's AI Consulting Services
https://www.joepareti54-ai.com/
cell +49 1520 1600 209
cell +39 339 797 0644