

Exporting HTML to CSV: help needed




Posted by KenCoble, 02-28-2010, 01:57 PM
OK, so I'm scraping content on my site (it's from NHL 10 for Xbox, nothing devious). I want our stats scraped and then exported to CSV twice a day. If I can figure out how to do the export, I'm sure a cron job could be set up to run it twice a day. What I have thus far is this: the output can be seen on my personal server at http://74.117.63.249/test.php. If anyone could help me create a script that takes this output and exports it to a CSV, I would greatly appreciate it. I want to keep a historical record of our stats, so the CSV will be appended to each time the cron job runs.
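A minimal sketch of the append-with-history step, in Python (the language the thread eventually used). It assumes the scraper has already produced a header list and a list of rows. The function name `append_stats`, the file name, and the column names are hypothetical. Each run adds its rows with a timestamp column, and the header is written only when the file is new, so the CSV accumulates a history instead of being overwritten.

```python
import csv
import os
from datetime import datetime, timezone


def append_stats(csv_path, header, rows, scraped_at=None):
    """Append scraped stat rows to csv_path, writing the header only once.

    A leading timestamp column records which run each row came from,
    so repeated runs build up a history instead of overwriting it.
    """
    if scraped_at is None:
        scraped_at = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    # Only a missing or empty file needs the header row.
    write_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    # Mode "a" appends; newline="" is what the csv module expects for output files.
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(["scraped_at", *header])
        for row in rows:
            writer.writerow([scraped_at, *row])
```

Calling `append_stats("team_stats.csv", ["player", "goals"], rows)` from a script, and scheduling that script with a crontab entry such as `0 6,18 * * * python3 /path/to/scrape_stats.py` (hypothetical path), would run it at 06:00 and 18:00 every day.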

Posted by Host Ahead, 02-28-2010, 05:35 PM
You will have to parse the HTML further. Walk down the HTML tree to extract the individual values, then join them into comma-separated strings. To do this you will have to examine the HTML more closely and work out the layout patterns and any exceptions to them. If the page is valid XHTML, you could try parsing it with an XML parser. Once you have those strings, you can write them out to a CSV.
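The tree-walking approach can be sketched with Python's standard-library `html.parser`, which, unlike an XML parser, does not require the page to be valid XHTML. This assumes the stats sit in an ordinary HTML `<table>` of `<tr>` rows with `<td>`/`<th>` cells; the real markup of the page is not shown in the thread, so `TableRowParser` and `html_table_to_csv` are hypothetical names for illustration. Rather than concatenating strings by hand, it hands the rows to the `csv` module, which quotes any value that itself contains a comma.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so "&amp;" arrives as "&"
        self.rows = []
        self._row = None   # cells of the row currently open, or None
        self._cell = None  # text fragments of the cell currently open, or None

    def _close_cell(self):
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row:  # skip rows with no cells
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        # A new <tr> or <td> also closes one left open (HTML allows omitting </td>, </tr>).
        if tag == "tr":
            self._close_row()
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._close_row()  # flush a row left open at the end of the input


def html_table_to_csv(html):
    """Return the table rows found in html as CSV text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    # csv.writer quotes values containing commas, quotes, or newlines.
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()
```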

Posted by KenCoble, 03-01-2010, 01:17 PM
I don't suppose anyone could look at the source and help me along? I'm horrible at this, and even with the regex snippet it took me a while to parse and scrape what I have.

Posted by KenCoble, 03-02-2010, 05:41 AM
OK, this is sorted, using Python. Mods, you can close this out if you'd like.

Posted by Capricorn, 03-02-2010, 11:15 AM
I second this. It's exactly how I would approach it, though I wouldn't mess around with XHTML unless I had to. You might also want to add error checks so that you are alerted if the page format changes. I got burned when the Yahoo stock options pages changed their format last month after staying the same for years and years.
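One way to add such a check, sketched in Python under the assumption that the scraped table's first row is a header: compare it against the header you expect and confirm every data row has the same number of cells before anything is appended. `EXPECTED_HEADER`, `PageFormatChanged`, `check_format`, and `validate_or_exit` are hypothetical names, and the column names are placeholders to replace with the real ones.

```python
import sys

# Hypothetical column names: replace with the real headers from the stats page.
EXPECTED_HEADER = ["Player", "GP", "G", "A", "PTS"]


class PageFormatChanged(Exception):
    """Raised when the scraped table no longer has the expected layout."""


def check_format(rows, expected_header=EXPECTED_HEADER):
    """Return the data rows (header excluded) if the layout looks unchanged."""
    if not rows:
        raise PageFormatChanged("no table rows found")
    header = rows[0]
    if header != list(expected_header):
        raise PageFormatChanged(f"header changed: {header!r}")
    for i, row in enumerate(rows[1:], start=1):
        if len(row) != len(header):
            raise PageFormatChanged(
                f"row {i} has {len(row)} cells, expected {len(header)}"
            )
    return rows[1:]


def validate_or_exit(rows):
    """Stop the run with a readable message instead of appending bad data."""
    try:
        return check_format(rows)
    except PageFormatChanged as err:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(f"scrape aborted: {err}")
```

Exiting with a non-zero status and a message on stderr is a simple alerting hook for a cron job, since cron usually mails a job's output to its owner when mail delivery is configured on the server.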


