Adding the text extraction of the PDF files found in data
parent 641f85f17c
commit 583c4aac0b
4
Makefile
@@ -1,2 +1,6 @@
 help:
 	@cat doc/help.txt
+
+pdftotext:
+	@find ./data -iname '*.pdf' -execdir pdftotext {} \;
+	@find ./data -not \( -path ./data/text -prune \) -iname '*.txt' -exec mv {} './data/text/' ';'
@@ -1,2 +1,6 @@
 With this command, you can easily manage the extraction of
 URLs from books scanned by KBR
+
+Here is the list of commands and what they do:
+* make pdftotext: this command extracts a text version of the PDF files
+and moves the resulting text files to the data/text/ directory
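For readers who want to try the pdftotext step outside of make, here is a minimal shell sketch of the same two find passes. It is not part of this commit: the function name extract_pdf_text and its optional directory argument are hypothetical, and the mkdir -p call is an added assumption, since the Makefile target appears to expect data/text/ to exist already.

```shell
#!/bin/sh
# Sketch only: mirrors the two find commands of the pdftotext target.
# extract_pdf_text and its argument are illustrative names, not from the commit.
extract_pdf_text() {
    data_dir="${1:-./data}"      # root directory holding the PDF files
    text_dir="$data_dir/text"    # destination for the extracted text

    # Assumption: create the destination first so that mv cannot fail on it.
    mkdir -p "$text_dir"

    # Pass 1: convert every PDF in place; pdftotext writes NAME.txt next to NAME.pdf.
    find "$data_dir" -iname '*.pdf' -execdir pdftotext {} \;

    # Pass 2: move every .txt found outside text_dir into text_dir.
    find "$data_dir" -path "$text_dir" -prune -o \
        -type f -iname '*.txt' -exec mv {} "$text_dir/" \;
}
```

Calling extract_pdf_text with no argument works on ./data, like the Makefile target. Note that, as in the original target, two PDF files with the same base name in different subdirectories would end up overwriting each other in data/text/.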