Parser - Regex to result
complete
Cahyo
Added a new parser, "Regex match keep", to keep only the matching content from an extraction. For example, keep only the first email, or all emails, from a paragraph of text.
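To make the "keep" behaviour concrete, here is a rough sketch in plain Python rather than in MrScraper's own configuration. The function name `regex_match_keep`, the `keep_all` flag, and the simplified email pattern are assumptions made for illustration; they are not the product's actual API.

```python
import re

# Simplified email pattern, for illustration only (not RFC-complete).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def regex_match_keep(text, pattern=EMAIL_RE, keep_all=False):
    """Hypothetical helper: keep only the parts of `text` that match `pattern`.

    Returns a list of all matches when keep_all is True, otherwise the
    first match, or None when nothing matches.
    """
    matches = [m.group(0) for m in pattern.finditer(text)]
    if keep_all:
        return matches
    return matches[0] if matches else None


paragraph = "Contact sales@example.com or support@example.org for help."
first_email = regex_match_keep(paragraph)
all_emails = regex_match_keep(paragraph, keep_all=True)
```

Everything in the paragraph that does not match the pattern is discarded, which is the opposite of a replace-style parser that removes the matched part.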
Cahyo
The idea behind MrScraper is that the extractors retrieve only the information you want, and the parsers then clean or modify each property.
Can you give an example scenario where you would select an element and then discard part of it in the parsing phase? I may be able to add what you need.
Thanks
Holger
Yes, I need:
"ratingValue": 7.6,
"ratingCount": 3,
<!DOCTYPE html>
<html lang="......ttp://www.w3.org/1999/xhtml">
<head id="head">
<title> ....
<link rel= ...
<link rel= ...
<link rel= ...
<meta name=
<meta name=
<script type="application/ld+json">
[ {
"@context": "https://www.schema.org",
"@type": "Product",
....
"aggregateRating": {
"@type": "AggregateRating",
"ratingValue": 7.6,
"bestRating": 10,
"ratingCount": 3,
},
...}]
</script>
</head>
<body
My idea was:
- select the head tag with an XPath selector,
  or maybe head->script (but with no specific markers this is not stable enough if further script tags are added)
- grab the full "aggregateRating": { } JSON part with a regex parser
- return this as the result and extract the needed values in my script
- or extract the values with two regex parsers and return only the numeric values
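The steps above could be sketched roughly like this in plain Python. This is only an illustration of the regex approach, not MrScraper's parser syntax: the function name `extract_rating` and both patterns are assumptions, and the first pattern assumes the "aggregateRating" object contains no nested braces.

```python
import re

# Assumption: the aggregateRating object has no nested { } inside it.
AGGREGATE_RE = re.compile(r'"aggregateRating"\s*:\s*\{[^{}]*\}')


def extract_rating(head_html):
    """Hypothetical helper: pull ratingValue and ratingCount out of the head HTML."""
    block = AGGREGATE_RE.search(head_html)
    if block is None:
        return None
    result = {}
    for key in ("ratingValue", "ratingCount"):
        # One small regex per value, applied only to the matched block.
        m = re.search(r'"%s"\s*:\s*(-?\d+(?:\.\d+)?)' % key, block.group(0))
        if m is None:
            result[key] = None
        else:
            num = m.group(1)
            result[key] = float(num) if "." in num else int(num)
    return result


sample_head = '''
<script type="application/ld+json">
[ {
"@type": "Product",
"aggregateRating": {
"@type": "AggregateRating",
"ratingValue": 7.6,
"bestRating": 10,
"ratingCount": 3,
},
}]
</script>
'''
rating = extract_rating(sample_head)
```

Matching on the "aggregateRating" key rather than on the position of the script tag keeps the extraction stable if further script tags are added to the head. A regex also tolerates the trailing comma shown in the snippet, which a strict JSON parser would reject.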
I have no experience with whether the values can be extracted using elaborate XPath expressions;
I thought the parsers were meant to do the fine-tuning on the result.
Thanks for the help :-)
(sorry for the formatting, tried hard, but the form added unintended newlines)