Changelog

Version 1.2.0 (2021-09-07)

  • Supported platforms: macOS 11, Windows 10, Linux (.deb, tested on Ubuntu 20.x, Debian GNU Linux 10, Linux Mint 20)
  • New:
    • added the possibility to execute multi-URL scrapes using the same scraping recipe, with all the data written into one common output file
    • in Scraping recipe, added the optional parameter Regex for entering regular expressions to be used with commands that return textual output
    • added option to save output as XLSX only (please note that while scraping, CSV is still saved as a backup and only removed after XLSX is saved at the end)
    • added command ‘reload/refresh the original URL’; this may be useful, e.g., in combination with command click that may change the site’s content
    • added command ‘current URL’; this may be useful, e.g., for multi-URL scrapes, where a column containing the current URL helps to distinguish which scraped data belong to which URL
    • added commands ‘page source: extract all the data matching regex’ and ‘element’s HTML: extract all the data matching regex’. If no regex is specified, these commands will extract and output the whole page source or the whole element’s HTML, respectively; therefore, in most cases, it is desirable to specify a regex to only extract the relevant content
    • added ‘Stop scraper’ to the menu and the tray menu; especially useful for intensive scrapes that open many new tabs in rapid succession, which brings the browser in front of OsiScraper’s main window
  • Changed:
    • enhanced the code to limit the occurrence of stale main elements while scraping
  • Fix / macOS:
    • fixed a macOS bug that caused a slow start on some Macs under some circumstances
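
Illustration for the regex commands above: a minimal Python sketch of what ‘extract all the data matching regex’ can look like, not OsiScraper’s actual code. The function name extract_all and the sample page source are hypothetical, and returning the first capturing group when one is present is an assumption; this changelog does not state how OsiScraper treats groups.

```python
import re
from typing import List, Optional


def extract_all(text: str, pattern: Optional[str] = None) -> List[str]:
    """Return every regex match found in text.

    With no pattern, the whole text is returned as a single item,
    mirroring the 'no regex specified' behaviour described above.
    """
    if not pattern:
        return [text]
    results = []
    for match in re.finditer(pattern, text):
        # Assumption: if the pattern has a capturing group, keep only group 1;
        # otherwise keep the whole match.
        results.append(match.group(1) if match.re.groups else match.group(0))
    return results


# Hypothetical page source used only for this example.
page_source = '<a href="/item/1">One</a><a href="/item/2">Two</a>'
print(extract_all(page_source, r'href="([^"]+)"'))  # ['/item/1', '/item/2']
```

Without a pattern the call returns the complete input, which is why specifying a regex is usually preferable.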

Version 1.1.0 (2021-07-24)

  • Supported platforms: macOS 11, Windows 10, Linux (.deb, tested on Ubuntu 20.x, Debian GNU Linux 10, Linux Mint 20)
  • New:
    • added (optional) output in XLSX format along with the default CSV
    • added command ‘download base64-encoded image from attribute {arg}’ (for image data included directly in an attribute instead of in a file; downloads the data and saves it as PNG)
    • added command ‘download file from the link given in {arg}, filename incl. URL + .jpg’ (for image files without a suffix, or with a complex or too long URL; adds suffix .jpg to the filename)
    • added command ‘take screenshot of the element’
    • added command ‘scroll the element into view, align to the bottom’; while the already existing command ‘scroll the element into view’ aligns the upper edge of the element to the upper edge of the scrollable ancestor (which can be partially hidden behind a header), this new command aligns the lower edge of the element to the bottom; this may be useful, e.g., in combination with hover to extract more information that would otherwise be inaccessible
    • added command ‘scroll down by {arg} pixels’; scrolls down for positive values / up for negative values of the argument
    • added command ‘hover away from the element’; this is a complement to ‘hover over the element’. Example: ‘hover over the element’ can be used to show extra information in a popup window, followed by ‘hover away from the element’ to let the popup disappear to make other elements accessible.
    • command ‘does the element have class {arg}?’ now also works with ‘’ (empty string / no argument given), meaning ‘does the element have no classes at all?’
    • for load-more-buttons (i.e., load-more button, next-page button), added another type of click to deal with JavaScript-driven buttons (automatically recognised)
  • Changed:
    • the scrape recipe is now restored after each scrape; this reverts any changes made automatically when scraping both text and URL for some elements, so that the scrape can be repeated with different preferences (text / URL / both); only relevant for elements with no specified command
    • while scraping, the elements are styled after being scraped successfully
    • in About (Ctrl+Shift+A), added link to Acknowledgements
  • Fix:
    • caught exception when missing main element; now shows a warning
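
Illustration for the ‘download base64-encoded image from attribute {arg}’ command above: a minimal Python sketch of decoding image data held directly in an attribute value, not OsiScraper’s actual code. The function names decode_image_attribute and save_as_png are hypothetical. The sketch assumes the attribute holds either a data: URI or a bare base64 string, and it writes the decoded bytes unchanged, so the result is a valid PNG only if the embedded data was PNG to begin with.

```python
import base64
from pathlib import Path


def decode_image_attribute(value: str) -> bytes:
    """Decode base64 image data taken from an attribute value."""
    if value.startswith("data:"):
        # A data: URI looks like 'data:image/png;base64,<payload>'.
        header, _, payload = value.partition(",")
        if ";base64" not in header:
            raise ValueError("attribute does not hold base64-encoded data")
    else:
        # Assumption: anything else is a bare base64 string.
        payload = value
    return base64.b64decode(payload)


def save_as_png(value: str, path: str) -> None:
    """Write the decoded bytes to path (no image re-encoding is done)."""
    Path(path).write_bytes(decode_image_attribute(value))


# Build a tiny sample attribute value from the 8-byte PNG signature.
png_signature = b"\x89PNG\r\n\x1a\n"
sample = "data:image/png;base64," + base64.b64encode(png_signature).decode("ascii")
print(decode_image_attribute(sample)[:4])  # b'\x89PNG'
```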

Version 1.0.1 (2021-06-26)

  • Supported platforms: macOS 11, Windows 10, Linux (.deb, tested on Ubuntu 20.x, Debian GNU Linux 10, Linux Mint 20)
  • New:
    • added menu item File -> Open Logfile… for quick access to logfiles
    • while scraping, main elements that contain no value to scrape, or whose content merely duplicates a previous one, are visualised differently; this helps to detect sub-optimally set main-element selectors
    • added checkbox Don’t scroll: an option to not scroll at all while scraping (except for any scrolls explicitly given in the recipes). This option is mutually exclusive with both Scroll bottom-to-top and Scroll to the end before scraping
    • (experimental) added command pair ‘open the link given in attribute {arg} in a new tab’ / ‘close tab’
    • added command ‘hover over the element’ to be able to extract more data
    • added keyboard shortcuts Ctrl+Q to Quit, Ctrl+Shift+A for OsiScraper -> About OsiScraper
  • Changed:
    • suppressed empty rows in the output file (e.g., in case of a too broad definition of the main element)
    • for Scroll to the end before scraping, the default value (True) is now also used when opening saved scrapes that do not specify this value
  • Fix:
    • corrected default command value in the before-scrape recipe (this bug also resulted in empty saved scrapes when using that default value)
    • corrected visibility of input fields for load-more-content after Open saved scrape (in case of no load-more/next-page button, the visibility of the respective input fields got mixed up)
    • corrected behaviour after scraping has finished while having an extra window open (if scraping finished while the main window was not active, the event was lost and OsiScraper could not finish that scrape properly)
    • corrected bug where ‘scroll_first=false’ was ignored when reading a saved scrape
    • changed keyboard shortcut for Visit Community from Ctrl+C (reserved for Copy) to Ctrl+Y
    • suppressed duplicate rows in the output file
  • macOS:
    • menu text color for light appearance is now darker to improve readability
  • Windows:
    • added the OsiScraper icon to the taskbar
  • Windows/Linux:
    • it is now possible to switch to OsiScraper using Alt+Tab
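
Illustration for the suppression of empty and duplicate output rows listed above: a minimal Python sketch of one way to filter such rows before writing them, not OsiScraper’s actual code. The function name clean_rows is hypothetical, and treating whitespace-only cells as empty and comparing rows for exact equality are assumptions.

```python
from typing import Iterable, List, Sequence


def clean_rows(rows: Iterable[Sequence[str]]) -> List[Sequence[str]]:
    """Drop rows with no content and rows identical to an earlier one."""
    seen = set()
    result = []
    for row in rows:
        # Assumption: a row is empty when every cell is blank or whitespace.
        if not any(cell.strip() for cell in row):
            continue
        key = tuple(row)
        if key in seen:
            continue  # exact duplicate of a row already kept
        seen.add(key)
        result.append(row)
    return result


rows = [
    ["Widget", "9.99"],
    ["", ""],            # empty row, e.g. from a too broad main element
    ["Widget", "9.99"],  # duplicate of the first row
    ["Gadget", "4.50"],
]
print(clean_rows(rows))  # [['Widget', '9.99'], ['Gadget', '4.50']]
```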

Version 1.0.0 (2021-06-07) – First public release

  • Supported platforms: macOS 11, Windows 10, Linux (.deb, tested on Ubuntu 20.x, Debian GNU Linux 10)