Commit 5ba9d889 authored by Nicolas Peifer

readme updated

parent 49dfbd33
GutenbergUtils is a command-line tool which extracts book information from [Project Gutenberg](http://www.gutenberg.org/) and generates static HTML pages which list the available books in a clear way. The HTML pages link to a mirror where you can download the books.
- Example: [Bücherbucht.de](http://www.bücherbucht.de)
- Tested on Java 8 and 11 JVM
GutenbergUtils is a command-line tool which extracts book information from [Project Gutenberg](http://www.gutenberg.org/) and generates static HTML pages which list the available books in a clear way. The HTML pages link to a mirror where you can download the books. Example: [Bücherbucht.de](http://www.bücherbucht.de/A.html)
## Command-line interface
**--parse-rdf RDF_DIR DATABASE_FILE**
......@@ -10,11 +7,11 @@ Parses all RDF files in the given directory and its subdirectories and stores th
**--create-html TEMPLATE_DIR DATABASE_FILE OUTPUT_DIR [LANGUAGE]**
Creates HTML pages which list the books alphabetically by author (including an index navigation bar). TEMPLATE_DIR contains HTML skeletons. The DATABASE_FILE contains all book information and serves as input source (**Note**: Do NOT specify the file extension .mv.db). The resulting HTML pages are stored in the folder OUTPUT_DIR. If a LANGUAGE (e. g. 'en', 'de') is specified, the set of books is reduced to the given language. **Note**: Please use absolute paths.
Furthermore, in order to avoid copyright issues, the books are filtered automatically according to EU copyright laws so that the resulting HTML pages do NOT contain any copyrighted books.
Furthermore, in order to avoid copyright issues, the books are filtered automatically according to EU copyright rules (70 years p. m. a., i. e. after the author's death) so that the resulting HTML pages should NOT contain any copyrighted books.
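The 70 years p. m. a. rule mentioned above can be illustrated with a minimal sketch. It is not taken from the GutenbergUtils sources; the class name `PublicDomainCheck` and the method `isPublicDomain` are hypothetical. It assumes the author's year of death is known and that protection runs until the end of the 70th calendar year after that year.

```java
public class PublicDomainCheck {

    // Protection term in years after the author's death (70 years p. m. a.).
    static final int PROTECTION_YEARS = 70;

    /**
     * Returns true if a work is no longer protected in the given year.
     * Assumption: protection lasts through December 31 of
     * (deathYear + 70), so the work is free from the following January 1.
     */
    static boolean isPublicDomain(int deathYear, int currentYear) {
        return currentYear > deathYear + PROTECTION_YEARS;
    }

    public static void main(String[] args) {
        // Author died in 1950: still protected throughout 2020 ...
        System.out.println(isPublicDomain(1950, 2020)); // prints "false"
        // ... and free from January 1, 2021.
        System.out.println(isPublicDomain(1950, 2021)); // prints "true"
    }
}
```

A real filter would also have to decide what to do with books whose author has no recorded year of death, or that have several authors; this sketch does not cover those cases.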
## How to use GutenbergUtils
### Prerequisites
1. Install Java (e. g. Java 8 or 11 will work).
1. Install Java (e. g. Java 8 or higher should work).
2. Download the archive which contains one RDF/XML file for each book that is listed on Project Gutenberg. Go to [The Complete Project Gutenberg Catalog](https://www.gutenberg.org/wiki/Gutenberg:Feeds#The_Complete_Project_Gutenberg_Catalog) and choose one file (BZip2 or Zip). The BZip2 file is approx. 45 MB in size. If your IP is blocked, download the file from one of the mirrors instead, e. g. [http://gutenberg.readingroo.ms/cache/generated/feeds/rdf-files.tar.bz2](http://gutenberg.readingroo.ms/cache/generated/feeds/rdf-files.tar.bz2).
3. Extract the downloaded archive, e. g. with **tar xf rdf-files.tar.bz2**. **Note**: If you try to open the folder with the extracted RDF/XML files your file manager will probably hang or crash because the directory contains a lot of files.
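
To check the extracted archive without opening it in a file manager, the directory tree can be walked programmatically. The following is a minimal sketch and not part of GutenbergUtils; the class name `RdfFileCounter` is hypothetical, and it assumes that the extracted files carry the extension `.rdf`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class RdfFileCounter {

    /** Counts regular files ending in ".rdf" below root, including subdirectories. */
    static long countRdfFiles(Path root) throws IOException {
        // Files.walk streams entries lazily; try-with-resources closes the directory handles.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".rdf"))
                    .count();
        }
    }

    public static void main(String[] args) throws IOException {
        // First argument: directory with the extracted files (defaults to the current directory).
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(countRdfFiles(root) + " RDF files found under " + root.toAbsolutePath());
    }
}
```

The sketch uses only `java.nio.file` calls that are available since Java 8, so it should run on the same JVM as the tool itself.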
......