HOWTO Manage Fulltext Files
Usage: /opt/invenio/bin/bibdocfile [options]

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -D, --debug
  -H, --human-readable  print sizes in human readable format
                        (e.g., 1KB 234MB 2GB)

  Query options:
    -r RECIDS, --recids=RECIDS
                        matches records by recids, e.g.: --recids=1-3,5-7
    -d DOCIDS, --docids=DOCIDS
                        matches documents by docids, e.g.: --docids=1-3,5-7
    -a, --all           Select all the records
    --with-deleted-recs=yes/no/only
                        'Yes' to also match deleted records, 'no' to exclude
                        them, 'only' to match only deleted ones
    --with-deleted-docs=yes/no/only
                        'Yes' to also match deleted documents, 'no' to exclude
                        them, 'only' to match only deleted ones (e.g. for
                        undeletion)
    --with-empty-recs=yes/no/only
                        'Yes' to also match records without attached
                        documents, 'no' to exclude them, 'only' to consider
                        only such records (e.g. for statistics)
    --with-empty-docs=yes/no/only
                        'Yes' to also match documents without attached files,
                        'no' to exclude them, 'only' to consider only such
                        documents (e.g. for sanity checking)
    --with-record-modification-date=date1,date2
                        matches records modified between date1 and date2;
                        dates can be expressed relatively, e.g.:
                        "-5m,2030-2-23 04:40" # matches records modified
                        since 5 minutes ago until the 2030...
    --with-record-creation-date=date1,date2
                        matches records created between date1 and date2;
                        dates can be expressed relatively
    --with-document-modification-date=date1,date2
                        matches documents modified between date1 and date2;
                        dates can be expressed relatively
    --with-document-creation-date=date1,date2
                        matches documents created between date1 and date2;
                        dates can be expressed relatively
    --url=URL           matches the document referred to by the URL, e.g.
                        "http://pcsk.cern.ch/record/1/files/foobar.pdf?version=2"
    --path=PATH         matches the document referred to by the internal
                        filesystem path, e.g.
                        /opt/cds-dev/var/data/files/g0/1/foobar.pdf\;1
    --with-docname=DOCNAME
                        matches documents with the given docname (accepts
                        wildcards)
    --with-doctype=DOCTYPE
                        matches documents with the given doctype
    -p PATTERN, --pattern=PATTERN
                        matches records by pattern
    -c COLLECTION, --collection=COLLECTION
                        matches records by collection
    --force             force an action even when it's not necessary, e.g.
                        textify on an already textified bibdoc.

  Actions for getting information:
    --get-info          print all the information about the matched
                        records/documents
    --get-disk-usage    print disk usage statistics of the matched documents
    --get-history       print the matched documents history

  Actions for setting information:
    --set-doctype=doctype
                        specify the new doctype
    --set-description=description
                        specify a description
    --set-comment=comment
                        specify a comment
    --set-restriction=restriction
                        specify a restriction tag
    --set-docname=docname
                        specifies a new docname for renaming
    --unset-comment     remove any comment
    --unset-descriptions
                        remove any description
    --unset-restrictions
                        remove any restriction
    --hide              hides matched documents and revisions
    --unhide            unhides matched documents and revisions

  Action for revising content:
    --append=PATH/URL   specify the URL/path of the file that will be
                        appended to the bibdoc
    --revise=PATH/URL   specify the URL/path of the file that will revise the
                        bibdoc
    --revert            reverts a document to the specified version
    --delete            soft-delete the matched documents (applies to all
                        revisions and formats)
    --hard-delete       hard-delete the matched documents (applies to matched
                        revisions and formats)
    --undelete          undelete previously soft-deleted documents (applies
                        to all revisions and formats)
    --purge             purge (i.e. hard-delete previous versions) the
                        matched documents
    --expunge           expunge (i.e. hard-delete any version and formats)
                        the matched documents
    --with-versions=VERSION
                        specifies the version(s) to be used with hard-delete,
                        hide, revert, e.g.: 1-2,3 or all
    --with-format=FORMAT
                        to specify a format when
                        appending/revising/deleting/reverting a document,
                        e.g. "pdf"
    --with-hide-previous
                        when revising, hides previous versions

  Actions for housekeeping:
    --check-md5         check md5 checksum validity of files
    --check-format      check if any format-related inconsistencies exist
    --check-duplicate-docnames
                        check for duplicate docnames associated with the same
                        record
    --update-md5        update md5 checksum of files
    --fix-all           fix inconsistencies in filesystem vs database vs MARC
    --fix-marc          synchronize MARC after filesystem/database
    --fix-format        fix format-related inconsistencies
    --fix-duplicate-docnames
                        fix duplicate docnames associated with the same record

  Experimental options (do not expect to find them in the next release):
    --set-icon=URL/PATH
                        attach the specified icon to the matched documents
    --unset-icon        remove any icon on the matched documents
    --textify           extract text from matched documents and store it for
                        later indexing
    --with-ocr=yes/no/always
                        when used with --textify, whether to perform OCR (yes
                        will perform it only if necessary, based on a
                        heuristic)

Examples:
  $ bibdocfile --append foo.tar.gz --recid=1
  $ bibdocfile --revise http://foo.com?search=123 --with-docname='sam' \
        --format=pdf --recid=3 --set-docname='pippo' # revise for record 3
  $ bibdocfile --delete *sam --all # delete all documents starting ending
  $ bibdocfile --undelete -c "Test Collection" # undelete documents for
  $ bibdocfile --get-info --recids=1-4,6-8 # obtain information
  $ bibdocfile -r 1 --with-docname=foo --set-docname=bar # Rename a document
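The --recids, --docids and --with-versions options above take comma-separated ranges such as 1-3,5-7. The following Python sketch shows one way such a specification could be expanded into individual identifiers, for instance to preview which records a query would touch. It assumes that ranges are inclusive at both ends; parse_id_ranges is a hypothetical helper written for this illustration and is not bibdocfile's own parser.

```python
def parse_id_ranges(spec):
    """Expand a range spec such as "1-3,5-7" into a sorted list of ints.

    Assumption: ranges are inclusive on both ends.
    """
    ids = set()
    for chunk in spec.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate stray commas
        if "-" in chunk:
            low, high = chunk.split("-", 1)
            low, high = int(low), int(high)
            if low > high:
                raise ValueError("descending range: %r" % chunk)
            ids.update(range(low, high + 1))
        else:
            ids.add(int(chunk))
    return sorted(ids)


print(parse_id_ranges("1-3,5-7"))  # [1, 2, 3, 5, 6, 7]
```

The keyword "all" accepted by --with-versions is deliberately not handled here; a caller would need to treat it separately.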
Usage: python /opt/invenio/lib/python/invenio/websubmit_file_converter.py [options]

Options:
  -h, --help            show this help message and exit
  -c FILE, --convert=FILE
                        convert the specified FILE
  -d, --debug           Enable debug information
  --special-pdf2hocr2pdf=FILE
                        convert the given scanned PDF into a PDF with OCRed
                        text
  -f FORMAT, --format=FORMAT
                        the desired output format
  -o OUTPUT_NAME, --output=OUTPUT_NAME
                        the desired output FILE (if not specified a new file
                        will be generated with the desired output format)
  --without-pdfa        don't force creation of PDF/A PDFs
  --without-pdfopt      don't force optimization of PDF files
  --without-ocr         don't force OCR
  --can-convert=FORMAT  display all the formats that can be generated from
                        the given format
  --is-ocr-needed=FILE  check if OCR is needed for the FILE specified
  -t TITLE, --title=TITLE
                        specify the title (used when creating PDFs)
  -l LN, --language=LN  specify the language (used when performing OCR, e.g.
                        en, it, fr...)

Examples:
  python /opt/invenio/lib/python/invenio/websubmit_file_converter.py \
      --convert=foo.docx -f pdf
  python /opt/invenio/lib/python/invenio/websubmit_file_converter.py \
      --special-pdf2hocr2pdf=scanned-foo.pdf --output=ocred-foo.pdf
This module is used internally by Invenio to convert files from one format to another. It can also be invoked manually by a system administrator to obtain the same level of conversion quality offered by Invenio.
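When the converter is invoked manually for many files, it may be convenient to assemble the command line from a script. The sketch below only builds the argument list from the options documented above and prints it; build_convert_command is a hypothetical helper name, and the script path is the one shown in the usage text, which may differ on other installations.

```python
# Path as shown in the usage text above; adjust for the local installation.
CONVERTER = "/opt/invenio/lib/python/invenio/websubmit_file_converter.py"


def build_convert_command(input_file, output_format, output_file=None):
    """Return the argument list for one conversion (hypothetical helper)."""
    cmd = [
        "python",
        CONVERTER,
        "--convert=" + input_file,
        "--format=" + output_format,
    ]
    if output_file is not None:
        # Without --output the converter picks a new file name itself.
        cmd.append("--output=" + output_file)
    return cmd


print(" ".join(build_convert_command("foo.docx", "pdf")))
```

On a host where Invenio is installed, the returned list could be passed to subprocess.call() to run the conversion.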
Usage: python /opt/invenio/lib/python/invenio/websubmit_icon_creator.py \
           [options] input-file.jpg

websubmit_icon_creator.py is used to create an icon for an image.

Options:
  -h, --help            Print this help.
  -V, --version         Print version information.
  -v, --verbose=LEVEL   Verbose level (0=min, 1=default, 9=max).
                        [NOT IMPLEMENTED]
  -s, --icon-scale
                        Scaling information for the icon that is to be
                        created. Must be an integer. Defaults to 180.
  -m, --multipage-icon
                        A flag to indicate that the icon should consist of
                        multiple pages. Will only be respected if the
                        requested icon type is GIF and the input file is a
                        PS or PDF consisting of several pages.
  -d, --multipage-icon-delay=VAL
                        If the icon consists of several pages and is an
                        animated GIF, a delay between frames can be
                        specified. Must be an integer. Defaults to 100.
  -f, --icon-file-format=FORMAT
                        The file format of the icon to be created. Must be
                        one of: [pdf, gif, jpg, jpeg, ps, png, bmp]
                        Defaults to gif.
  -o, --icon-name=XYZ
                        The optional name to be given to the created icon
                        file. If this is omitted, the icon file will be
                        given the same name as the input file, but will be
                        prefixed by "icon-";

Examples:
  python /opt/invenio/lib/python/invenio/websubmit_icon_creator.py \
      --icon-scale=200 \
      --icon-name=test-icon \
      --icon-file-format=jpg \
      test-image.jpg

  python /opt/invenio/lib/python/invenio/websubmit_icon_creator.py \
      --icon-scale=200 \
      --icon-name=test-icon2 \
      --icon-file-format=gif \
      --multipage-icon \
      --multipage-icon-delay=50 \
      test-image2.pdf
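The constraints stated in the help text above (allowed formats, integer scale and delay, multipage icons only for GIF output from PS or PDF input) can be checked before the tool is called. The following Python sketch does so; icon_creator_args is a hypothetical helper, and raising an error for an unsupported multipage request is a choice made for this illustration, since the tool itself is only documented as not respecting the flag in that case.

```python
import os

# Formats listed in the help text for --icon-file-format.
ICON_FORMATS = ("pdf", "gif", "jpg", "jpeg", "ps", "png", "bmp")


def icon_creator_args(input_file, scale=180, file_format="gif",
                      multipage=False, delay=100):
    """Validate the options and return the argument list (hypothetical helper)."""
    if file_format not in ICON_FORMATS:
        raise ValueError("unsupported icon format: %r" % file_format)
    if not isinstance(scale, int) or not isinstance(delay, int):
        raise ValueError("scale and delay must be integers")
    args = ["--icon-scale=%d" % scale, "--icon-file-format=" + file_format]
    if multipage:
        extension = os.path.splitext(input_file)[1].lower()
        # Assumption of this sketch: fail loudly instead of letting the
        # tool ignore the flag.
        if file_format != "gif" or extension not in (".ps", ".pdf"):
            raise ValueError("multipage icons need GIF output and a PS/PDF input")
        args += ["--multipage-icon", "--multipage-icon-delay=%d" % delay]
    args.append(input_file)
    return args


print(icon_creator_args("test-image2.pdf", scale=200, multipage=True, delay=50))
```

The returned list is meant to be appended to the python and websubmit_icon_creator.py path elements of the command line.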
Usage: python /opt/invenio/lib/python/invenio/websubmit_file_stamper.py \
           [options] input-file.pdf

websubmit_file_stamper.py is used to add a "stamp" to a PDF file.
A LaTeX template is used to create the stamp and this stamp is then
concatenated with the original PDF file.
The stamp can take the form of either a separate "cover page" that is
appended to the document; or a "mark" that is applied somewhere either
on the document's first page or on all of its pages.

Options:
  -h, --help            Print this help.
  -V, --version         Print version information.
  -v, --verbose=LEVEL   Verbose level (0=min, 1=default, 9=max).
                        [NOT IMPLEMENTED]
  -t, --latex-template=PATH
                        Path to the LaTeX template file that should be used
                        for the creation of the PDF stamp. (Note, if it's
                        just a basename, it will be sought first in the
                        current working directory, and then in the invenio
                        file-stamper templates directory; If there is a
                        qualifying path to the template name, it will be
                        sought only in that location);
  -c, --latex-template-var='VARNAME=VALUE'
                        A variable that should be replaced in the LaTeX
                        template file with its corresponding value. Of the
                        following format:
                            VARNAME=VALUE
                        This option is repeatable - one for each template
                        variable;
  -s, --stamp=STAMP-TYPE
                        The type of stamp to be applied to the subject
                        file. Must be one of 3 values:
                         + "first" - stamp only the first page;
                         + "all" - stamp all pages;
                         + "coverpage" - add a cover page to the document;
                        The default value is "first";
  -o, --output-file=XYZ
                        The optional name to be given to the finished
                        (stamped) file. If this is omitted, the stamped
                        file will be given the same name as the input file,
                        but will be prefixed by "stamped-";

Example:
  python /opt/invenio/lib/python/invenio/websubmit_file_stamper.py \
      --latex-template=demo-stamp-left.tex \
      --latex-template-var='REPORTNUMBER=TEST-THESIS-2008-019' \
      --latex-template-var='DATE=27/02/2008' \
      --stamp='first' \
      --output-file=testfile_stamped.pdf \
      testfile.pdf
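Because --latex-template-var is repeated once per template variable, a script that stamps many files may prefer to keep the variables in a mapping and expand them into options. The sketch below illustrates this under the option names documented above; stamper_args is a hypothetical helper, and it assumes Python 3.7 or later so that the dictionary keeps its insertion order.

```python
STAMP_TYPES = ("first", "all", "coverpage")


def stamper_args(template, variables, input_file, stamp="first",
                 output_file=None):
    """Return the stamper options as an argument list (hypothetical helper)."""
    if stamp not in STAMP_TYPES:
        raise ValueError("stamp must be one of %s" % ", ".join(STAMP_TYPES))
    args = ["--latex-template=" + template]
    for name, value in variables.items():
        if "=" in name:
            # The option format is VARNAME=VALUE, so a name cannot hold "=".
            raise ValueError("invalid variable name: %r" % name)
        args.append("--latex-template-var=%s=%s" % (name, value))
    args.append("--stamp=" + stamp)
    if output_file is not None:
        args.append("--output-file=" + output_file)
    args.append(input_file)
    return args


for arg in stamper_args(
        "demo-stamp-left.tex",
        {"REPORTNUMBER": "TEST-THESIS-2008-019", "DATE": "27/02/2008"},
        "testfile.pdf",
        output_file="testfile_stamped.pdf"):
    print(arg)
```

Passing the options as a list (for example to subprocess.call()) avoids the shell quoting that the example above needs around each VARNAME=VALUE pair.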