Package: sparkwarc 0.1.6

Edgar Ruiz

sparkwarc: Load WARC Files into Apache Spark

Load WARC (Web ARChive) files into Apache Spark using 'sparklyr'. This makes it possible to read files from the Common Crawl project <http://commoncrawl.org/>.

Authors: Javier Luraschi [aut], Yitao Li [aut], Edgar Ruiz [aut, cre]

sparkwarc_0.1.6.tar.gz
sparkwarc_0.1.6.zip (r-4.5) | sparkwarc_0.1.6.zip (r-4.4) | sparkwarc_0.1.6.zip (r-4.3)
sparkwarc_0.1.6.tgz (r-4.4-x86_64) | sparkwarc_0.1.6.tgz (r-4.4-arm64) | sparkwarc_0.1.6.tgz (r-4.3-x86_64) | sparkwarc_0.1.6.tgz (r-4.3-arm64)
sparkwarc_0.1.6.tar.gz (r-4.5-noble) | sparkwarc_0.1.6.tar.gz (r-4.4-noble)
sparkwarc_0.1.6.tgz (r-4.4-emscripten) | sparkwarc_0.1.6.tgz (r-4.3-emscripten)
sparkwarc.pdf | sparkwarc.html
sparkwarc/json (API)

# Install 'sparkwarc' in R:
install.packages('sparkwarc', repos = c('https://r-spark.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/r-spark/sparkwarc/issues

Uses libs:
  • zlib: Compression library
  • c++: GNU Standard C++ Library v3

On CRAN:

3.89 score | 13 stars | 12 scripts | 162 downloads | 6 exports | 40 dependencies

Last updated 3 years ago from: c3e8975ad7. Checks: OK: 1, NOTE: 8. Indexed: yes.

Target | Result | Date
Doc / Vignettes | OK | Nov 12 2024
R-4.5-win-x86_64 | NOTE | Nov 12 2024
R-4.5-linux-x86_64 | NOTE | Nov 12 2024
R-4.4-win-x86_64 | NOTE | Nov 12 2024
R-4.4-mac-x86_64 | NOTE | Nov 12 2024
R-4.4-mac-aarch64 | NOTE | Nov 12 2024
R-4.3-win-x86_64 | NOTE | Nov 12 2024
R-4.3-mac-x86_64 | NOTE | Nov 12 2024
R-4.3-mac-aarch64 | NOTE | Nov 12 2024

Exports: cc_warc, rcpp_read_warc_sample, spark_rcpp_read_warc, spark_read_warc, spark_read_warc_sample, spark_warc_sample_path

Dependencies: askpass, blob, cli, codetools, config, cpp11, curl, DBI, dbplyr, dplyr, fansi, generics, globals, glue, httr, jsonlite, lifecycle, magrittr, mime, openssl, pillar, pkgconfig, purrr, R6, Rcpp, rlang, rstudioapi, sparklyr, stringi, stringr, sys, tibble, tidyr, tidyselect, utf8, uuid, vctrs, withr, xml2, yaml