toadie78
(Thread starter)
Joined: 11 December 2013
Posts: 7
|
By SQL file I mean the database export, e.g. from phpMyAdmin, which by default is written to an SQL file. I want to extract all images and links from it. This concerns the whole website, because the extensions also contain links that I need.
Here is an excerpt:
(3, 89, 'Titel', 'alias', 'kategorie', '<div align="center">\r\n<table style="background-color: #ffffff; font-family: tahoma,geneva,arial,helvetica,sans-serif; font-size: 11px; vertical-align: top; font-weight: bold; padding-left: 5px; padding-top: 2px;" border="0" cellpadding="0" cellspacing="10">\r\n<tbody>\r\n<tr>\r\n<td align="center" height="135" valign="top" width="200"><a href="index.php?option=com_content&view=article&id=118&Itemid=112" title="Sensorik"><img src="images/stories/auto_sensor.jpg" alt="Sensorik" title="Sensorik" border="0" height="100" hspace="6" width="110" /></a><br />Sensorik</td>\r\n<td align="center" height="135" valign="top" width="200"><a href="index.php?option=com_content&view=article&id=120&Itemid=114" title="Bildverarbeitung"><img src="images/stories/auto_bildver.jpg" alt="Bildverarbeitung" title="Bildverarbeitung" border="0" height="100" hspace="6" width="110" /></a><br /> Bildverarbeitung</td>\r\n<td align="center" height="135" valign="top" width="200"><a href="index.php?option=com_content&view=article&id=178&Itemid=179" title="Sicherheitstechnik"><img src="images/stories//auto_sicher.jpg" alt="Sicherheitstechnik" title="Sicherheitstechnik" border="0" height="100" hspace="6" width="110" /></a><br /> Sicherheitstechnik</td>\r\n<td align="center" height="135" valign="top" width="200"><a href="index.php?option=com_content&view=article&id=209&Itemid=275" title="Kontrollkomponenten"><img src="images/stories/omron/auto_kontroll.jpg" alt="Kontrollkomponenten" title="Kontrollkomponenten" border="0" height="100" hspace="6" width="110" /></a><br /> Kontrollkomponenten</td>\r\n</tr>\r\n</tbody>\r\n</table>\r\n<strong>Text</strong>\r\n</strong>. 
</p>\r\n</div>\r\n<p align="justify"> </p>\r\n<p align="justify"> </p>\r\n<p align="justify"> </p>\r\n<p align="justify"> </p>\r\n<p align="justify"> </p>\r\n<p align="justify"> </p>', '', 1, 0, 0, 3, '2006-01-18 11:32:15', 62, '', '2012-10-29 09:49:53', 94, 94, '2012-10-29 09:49:53', '2006-01-18 01:00:00', '0000-00-00 00:00:00', '/auto_sensor.jpg||Sensorik|0||bottom||\r\/auto_bildver.jpg||Bildverarbeitung|0||bottom||\r\/auto_sicher.jpg||Sicherheitstechnik|0||bottom||\r\/auto_kontroll.jpg||Kontrollkomponenten|0||bottom||', '', '{"show_title":"","link_titles":"","show_intro":"","show_category":"","link_category":"","show_parent_category":"","link_parent_category":"","show_author":"","link_author":"","show_create_date":"","show_modify_date":"","show_publish_date":"","show_item_navigation":"","show_icons":"","show_print_icon":"","show_email_icon":"","show_vote":"","show_hits":"","show_noauth":"","urls_position":"","alternative_readmore":"","article_layout":"","show_publishing_options":"","show_article_options":"","show_urls_images_backend":"","show_urls_images_frontend":""}', 66, 0, 2, 'metakey', 'metadesc ', 1, 47581, '{"robots":"","author":"","rights":"","xreference":""}', 0, '*', ''),
|
|
track
Joined: 26 June 2008
Posts: 7174
Location: Wolfen (S-A)
|
Oh dear, a dump like this is about the worst thing you could have picked! Strictly speaking you would have to parse it properly, but for that you would need to know the exact syntax and then build a parser for it.
A huge job! So, as an absolute last resort, let's try brute force and cut everything into separate lines at each tag. Maybe that sheds some light on the pile of spaghetti, because then every tag sits at the start of a line:
track@lucid:~$ sed 's/</\n</g' << EOL
> (3, 89, 'Titel', 'alias', 'kategorie', '<div align="center">\r\n<table style="background-color: #ffffff; font-family: tahoma,geneva,arial,helvetica,sans-serif; font-size: 11px; vertical-align: top; font-weight: bold; padding-left: 5px; padding-top: 2px;" border="0" cellpadding="0" cellspacing="10">\r\n<tbody>\r\n<tr>\r\n<td align="center" height="135" valign="top" width="200"><a href="index.php?option=com_content&view=article&id=118&Itemid=112" title="Sensorik"><img src="images/stories/auto_sensor.jpg" alt="Sensorik" title="Sensorik" border="0" height="100" hspace="6" width="110" /></a><br />Sensorik</td>\r\n<td align="center" height="135" valign="top" width="200"><a href="index.php?option=com_content&view=article&id=120&Itemid=114" title="Bildverarbeitung"><img src="images/stories/auto_bildver.jpg" alt="Bildverarbeitung" title="Bildverarbeitung" border="0" height="100" hspace="6" width="110" /></a><br /> Bildverarbeitung</td>\r\n<td align="center" height="135" valign="top" width="200"><a href="index.php?option=com_content&view=article&id=178&Itemid=179" title="Sicherheitstechnik"><img src="images/stories//auto_sicher.jpg" alt="Sicherheitstechnik" title="Sicherheitstechnik" border="0" height="100" hspace="6" width="110" /></a><br /> Sicherheitstechnik</td>\r\n<td align="center" height="135" valign="top" width="200"><a href="index.php?option=com_content&view=article&id=209&Itemid=275" title="Kontrollkomponenten"><img src="images/stories/omron/auto_kontroll.jpg" alt="Kontrollkomponenten" title="Kontrollkomponenten" border="0" height="100" hspace="6" width="110" /></a><br /> Kontrollkomponenten</td>\r\n</tr>\r\n</tbody>\r\n</table>\r\n<strong>Text</strong>\r\n</strong>. 
</p>\r\n</div>\r\n<p align="justify"> </p>\r\n<p align="justify"> </p>\r\n<p align="justify"> </p>\r\n<p align="justify"> </p>\r\n<p align="justify"> </p>\r\n<p align="justify"> </p>', '', 1, 0, 0, 3, '2006-01-18 11:32:15', 62, '', '2012-10-29 09:49:53', 94, 94, '2012-10-29 09:49:53', '2006-01-18 01:00:00', '0000-00-00 00:00:00', '/auto_sensor.jpg||Sensorik|0||bottom||\r\/auto_bildver.jpg||Bildverarbeitung|0||bottom||\r\/auto_sicher.jpg||Sicherheitstechnik|0||bottom||\r\/auto_kontroll.jpg||Kontrollkomponenten|0||bottom||', '', '{"show_title":"","link_titles":"","show_intro":"","show_category":"","link_category":"","show_parent_category":"","link_parent_category":"","show_author":"","link_author":"","show_create_date":"","show_modify_date":"","show_publish_date":"","show_item_navigation":"","show_icons":"","show_print_icon":"","show_email_icon":"","show_vote":"","show_hits":"","show_noauth":"","urls_position":"","alternative_readmore":"","article_layout":"","show_publishing_options":"","show_article_options":"","show_urls_images_backend":"","show_urls_images_frontend":""}', 66, 0, 2, 'metakey', 'metadesc ', 1, 47581, '{"robots":"","author":"","rights":"","xreference":""}', 0, '*', ''),
> EOL
(3, 89, 'Titel', 'alias', 'kategorie', '
<div align="center">\r\n
<table style="background-color: #ffffff; font-family: tahoma,geneva,arial,helvetica,sans-serif; font-size: 11px; vertical-align: top; font-weight: bold; padding-left: 5px; padding-top: 2px;" border="0" cellpadding="0" cellspacing="10">\r\n
<tbody>\r\n
<tr>\r\n
<td align="center" height="135" valign="top" width="200">
<a href="index.php?option=com_content&view=article&id=118&Itemid=112" title="Sensorik">
<img src="images/stories/auto_sensor.jpg" alt="Sensorik" title="Sensorik" border="0" height="100" hspace="6" width="110" />
</a>
<br />Sensorik
</td>\r\n
<td align="center" height="135" valign="top" width="200">
<a href="index.php?option=com_content&view=article&id=120&Itemid=114" title="Bildverarbeitung">
<img src="images/stories/auto_bildver.jpg" alt="Bildverarbeitung" title="Bildverarbeitung" border="0" height="100" hspace="6" width="110" />
</a>
<br /> Bildverarbeitung
</td>\r\n
<td align="center" height="135" valign="top" width="200">
<a href="index.php?option=com_content&view=article&id=178&Itemid=179" title="Sicherheitstechnik">
<img src="images/stories//auto_sicher.jpg" alt="Sicherheitstechnik" title="Sicherheitstechnik" border="0" height="100" hspace="6" width="110" />
</a>
<br /> Sicherheitstechnik
</td>\r\n
<td align="center" height="135" valign="top" width="200">
<a href="index.php?option=com_content&view=article&id=209&Itemid=275" title="Kontrollkomponenten">
<img src="images/stories/omron/auto_kontroll.jpg" alt="Kontrollkomponenten" title="Kontrollkomponenten" border="0" height="100" hspace="6" width="110" />
</a>
<br /> Kontrollkomponenten
</td>\r\n
</tr>\r\n
</tbody>\r\n
</table>\r\n
<strong>Text
</strong>\r\n
</strong>.
</p>\r\n
</div>\r\n
<p align="justify">
</p>\r\n
<p align="justify">
</p>\r\n
<p align="justify">
</p>\r\n
<p align="justify">
</p>\r\n
<p align="justify">
</p>\r\n
<p align="justify">
</p>', '', 1, 0, 0, 3, '2006-01-18 11:32:15', 62, '', '2012-10-29 09:49:53', 94, 94, '2012-10-29 09:49:53', '2006-01-18 01:00:00', '0000-00-00 00:00:00', '/auto_sensor.jpg||Sensorik|0||bottom||\r\/auto_bildver.jpg||Bildverarbeitung|0||bottom||\r\/auto_sicher.jpg||Sicherheitstechnik|0||bottom||\r\/auto_kontroll.jpg||Kontrollkomponenten|0||bottom||', '', '{"show_title":"","link_titles":"","show_intro":"","show_category":"","link_category":"","show_parent_category":"","link_parent_category":"","show_author":"","link_author":"","show_create_date":"","show_modify_date":"","show_publish_date":"","show_item_navigation":"","show_icons":"","show_print_icon":"","show_email_icon":"","show_vote":"","show_hits":"","show_noauth":"","urls_position":"","alternative_readmore":"","article_layout":"","show_publishing_options":"","show_article_options":"","show_urls_images_backend":"","show_urls_images_frontend":""}', 66, 0, 2, 'metakey', 'metadesc ', 1, 47581, '{"robots":"","author":"","rights":"","xreference":""}', 0, '*', ''),
It looks as if there are various <a href=...> and <img src=...> tags in there. Those can of course be filtered out:
track@lucid:~$ sed 's/</\n</g' << EOL | sed -n '/<a href=/ {s/[^=]*="//; p}'
> (3, 89, 'Titel', 'alias', 'kategorie', '<div align="center">\r\n<table style="background-color: #ffffff; font-family: tahoma,geneva,arial,helvetica,sans-serif; font-size: 11px; vertical-align: top; font-weight: bold; padding-left: 5px; padding-top: 2px;" border="0" cellpadding="0" cellspacing="10">\r\n<tbody>\r\n<tr>\r\n<td align="center" height="135" valign="top" width="200"><a href="index.php?option=com_content&view=article&id=118&Itemid=112" title="Sensorik"><img src="images/stories/auto_sensor.jpg" alt="Sensorik" title="Sensorik" border="0" height="100" hspace="6" width="110" /></a><br />Sensorik</td>\r\n<td align="center" height="135" valign="top" width="200"><a href="index.php?option=com_content&view=article&id=120&Itemid=114" title="Bildverarbeitung"><img src="images/stories/auto_bildver.jpg" alt="Bildverarbeitung" title="Bildverarbeitung" border="0" height="100" hspace="6" width="110" /></a><br /> Bildverarbeitung</td>\r\n<td align="center" height="135" valign="top" width="200"><a href="index.php?option=com_content&view=article&id=178&Itemid=179" title="Sicherheitstechnik"><img src="images/stories//auto_sicher.jpg" alt="Sicherheitstechnik" title="Sicherheitstechnik" border="0" height="100" hspace="6" width="110" /></a><br /> Sicherheitstechnik</td>\r\n<td align="center" height="135" valign="top" width="200"><a href="index.php?option=com_content&view=article&id=209&Itemid=275" title="Kontrollkomponenten"><img src="images/stories/omron/auto_kontroll.jpg" alt="Kontrollkomponenten" title="Kontrollkomponenten" border="0" height="100" hspace="6" width="110" /></a><br /> Kontrollkomponenten</td>\r\n</tr>\r\n</tbody>\r\n</table>\r\n<strong>Text</strong>\r\n</strong>. 
</p>\r\n</div>\r\n<p align="justify"> </p>\r\n<p align="justify"> </p>\r\n<p align="justify"> </p>\r\n<p align="justify"> </p>\r\n<p align="justify"> </p>\r\n<p align="justify"> </p>', '', 1, 0, 0, 3, '2006-01-18 11:32:15', 62, '', '2012-10-29 09:49:53', 94, 94, '2012-10-29 09:49:53', '2006-01-18 01:00:00', '0000-00-00 00:00:00', '/auto_sensor.jpg||Sensorik|0||bottom||\r\/auto_bildver.jpg||Bildverarbeitung|0||bottom||\r\/auto_sicher.jpg||Sicherheitstechnik|0||bottom||\r\/auto_kontroll.jpg||Kontrollkomponenten|0||bottom||', '', '{"show_title":"","link_titles":"","show_intro":"","show_category":"","link_category":"","show_parent_category":"","link_parent_category":"","show_author":"","link_author":"","show_create_date":"","show_modify_date":"","show_publish_date":"","show_item_navigation":"","show_icons":"","show_print_icon":"","show_email_icon":"","show_vote":"","show_hits":"","show_noauth":"","urls_position":"","alternative_readmore":"","article_layout":"","show_publishing_options":"","show_article_options":"","show_urls_images_backend":"","show_urls_images_frontend":""}', 66, 0, 2, 'metakey', 'metadesc ', 1, 47581, '{"robots":"","author":"","rights":"","xreference":""}', 0, '*', ''),
> EOL
index.php?option=com_content&view=article&id=118&Itemid=112" title="Sensorik">
index.php?option=com_content&view=article&id=120&Itemid=114" title="Bildverarbeitung">
index.php?option=com_content&view=article&id=178&Itemid=179" title="Sicherheitstechnik">
index.php?option=com_content&view=article&id=209&Itemid=275" title="Kontrollkomponenten">
If you cut off everything after the closing quotation mark with a second s command, you would have the links for a start. You will of course have to do the same for every other kind of tag that contains links. Here is a test for both kinds of tag:
track@lucid:~$ sed 's/</\n</g' << EOL | sed -n 's/" .*//; s/<a href="//p; s/<img src="//p'
> (3, 89, 'Titel', 'alias', 'kategorie', '<div align="center">\r\n<table style="background-color: #ffffff; font-family: tahoma,geneva,arial,helvetica,sans-serif; font-size: 11px; vertical-align: top; font-weight: bold; padding-left: 5px; padding-top: 2px;" border="0" cellpadding="0" cellspacing="10">\r\n<tbody>\r\n<tr>\r\n<td align="center" height="135" valign="top" width="200"><a href="index.php?option=com_content&view=article&id=118&Itemid=112" title="Sensorik"><img src="images/stories/auto_sensor.jpg" alt="Sensorik" title="Sensorik" border="0" height="100" hspace="6" width="110" /></a><br />Sensorik</td>\r\n<td align="center" height="135" valign="top" width="200"><a href="index.php?option=com_content&view=article&id=120&Itemid=114" title="Bildverarbeitung"><img src="images/stories/auto_bildver.jpg" alt="Bildverarbeitung" title="Bildverarbeitung" border="0" height="100" hspace="6" width="110" /></a><br /> Bildverarbeitung</td>\r\n<td align="center" height="135" valign="top" width="200"><a href="index.php?option=com_content&view=article&id=178&Itemid=179" title="Sicherheitstechnik"><img src="images/stories//auto_sicher.jpg" alt="Sicherheitstechnik" title="Sicherheitstechnik" border="0" height="100" hspace="6" width="110" /></a><br /> Sicherheitstechnik</td>\r\n<td align="center" height="135" valign="top" width="200"><a href="index.php?option=com_content&view=article&id=209&Itemid=275" title="Kontrollkomponenten"><img src="images/stories/omron/auto_kontroll.jpg" alt="Kontrollkomponenten" title="Kontrollkomponenten" border="0" height="100" hspace="6" width="110" /></a><br /> Kontrollkomponenten</td>\r\n</tr>\r\n</tbody>\r\n</table>\r\n<strong>Text</strong>\r\n</strong>. 
</p>\r\n</div>\r\n<p align="justify"> </p>\r\n<p align="justify"> </p>\r\n<p align="justify"> </p>\r\n<p align="justify"> </p>\r\n<p align="justify"> </p>\r\n<p align="justify"> </p>', '', 1, 0, 0, 3, '2006-01-18 11:32:15', 62, '', '2012-10-29 09:49:53', 94, 94, '2012-10-29 09:49:53', '2006-01-18 01:00:00', '0000-00-00 00:00:00', '/auto_sensor.jpg||Sensorik|0||bottom||\r\/auto_bildver.jpg||Bildverarbeitung|0||bottom||\r\/auto_sicher.jpg||Sicherheitstechnik|0||bottom||\r\/auto_kontroll.jpg||Kontrollkomponenten|0||bottom||', '', '{"show_title":"","link_titles":"","show_intro":"","show_category":"","link_category":"","show_parent_category":"","link_parent_category":"","show_author":"","link_author":"","show_create_date":"","show_modify_date":"","show_publish_date":"","show_item_navigation":"","show_icons":"","show_print_icon":"","show_email_icon":"","show_vote":"","show_hits":"","show_noauth":"","urls_position":"","alternative_readmore":"","article_layout":"","show_publishing_options":"","show_article_options":"","show_urls_images_backend":"","show_urls_images_frontend":""}', 66, 0, 2, 'metakey', 'metadesc ', 1, 47581, '{"robots":"","author":"","rights":"","xreference":""}', 0, '*', ''),
> EOL
index.php?option=com_content&view=article&id=118&Itemid=112
images/stories/auto_sensor.jpg
index.php?option=com_content&view=article&id=120&Itemid=114
images/stories/auto_bildver.jpg
index.php?option=com_content&view=article&id=178&Itemid=179
images/stories//auto_sicher.jpg
index.php?option=com_content&view=article&id=209&Itemid=275
images/stories/omron/auto_kontroll.jpg
Regards, track
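To run the same two sed steps over the complete export instead of a pasted excerpt, something like the following might serve as a starting point. It is only a sketch: the function name extract_links and the file names dump.sql and links.txt are placeholders that do not appear in the thread, sort -u is an addition to drop repeated entries, and the \n in the replacement assumes GNU sed.

```shell
#!/bin/sh
# Hypothetical helper (name invented for illustration): reads an SQL dump
# on stdin and prints every href/src value once.
extract_links() {
    # Step 1: start a new line in front of every "<", so each tag begins a line
    # (GNU sed understands \n in the replacement).
    sed 's/</\n</g' |
    # Step 2: cut everything from the first '" ' onwards, then print only the
    # lines that began with <a href=" or <img src=", minus that prefix.
    sed -n 's/" .*//; s/<a href="//p; s/<img src="//p' |
    # Step 3 (addition): sort the results and remove duplicates.
    sort -u
}

# Usage, assuming the export is called dump.sql (placeholder name):
# extract_links < dump.sql > links.txt
```

As with the test above, this is still the crowbar and not a parser: a tag whose href or src is the last attribute (so the closing quote is followed directly by ">") keeps that trailing `">`, and attributes written in another order or with single quotes are not found.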
|