Websubmit_file_metadata APIs and Plugin Development

The websubmit_file_metadata library enables extraction and update of file metadata.
It can be called from Python sources or run from the command line.
The library can be extended to support additional file formats through plugins, which must be placed in the /opt/invenio/lib/python/invenio/websubmit_file_metadata_plugins/ directory.

1. APIs

Two main functions can be imported from websubmit_file_metadata:

def read_metadata(inputfile, force=None, remote=False, loginpw=None, verbose=0):
Returns metadata extracted from given file as dictionary.

Availability depends on the input file format and the installed plugins
(raises TypeError if the file format is unsupported).

Parameters:

        * inputfile (string) - path to a file
        * force (string) - name of plugin to use, to skip plugin auto-discovery
        * remote (boolean) - whether the file is accessed remotely
        * loginpw (string) - credentials to access secure servers (username:password)
        * verbose (int) - verbosity

Returns: dict
    dictionary of metadata tags as keys, and (interpreted) value as value

Raises:
        * TypeError - if file format is not supported.
        * RuntimeError - if required library to process file is missing.
        * InvenioWebSubmitFileMetadataRuntimeError - when metadata cannot be read.
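
As an illustration of the call shape and of the first two documented exceptions, here is a minimal sketch. The helper read_metadata_or_none and the stand-in fake_reader are hypothetical names introduced only so that the snippet runs without an Invenio installation; in real code you would pass the read_metadata function imported from websubmit_file_metadata. The third documented exception is not handled here because its import location is not given in this document.

```python
def read_metadata_or_none(path, reader, plugin=None):
    """Call reader(path, force=plugin) and return (metadata, problem).

    Exactly one of the two returned values is None.
    """
    try:
        return reader(path, force=plugin), None
    except TypeError:
        # Documented: no installed plugin supports this file format.
        return None, "unsupported format"
    except RuntimeError:
        # Documented: a library required to process the file is missing.
        return None, "missing library"


# Hypothetical stand-in with the documented signature of read_metadata.
def fake_reader(inputfile, force=None, remote=False, loginpw=None, verbose=0):
    if not inputfile.endswith(".jpg"):
        raise TypeError("unsupported file format")
    return {"Title": "Example"}  # the tag name is made up for the demo


print(read_metadata_or_none("photo.jpg", fake_reader))
print(read_metadata_or_none("notes.xyz", fake_reader))
```

Passing the reader as an argument is only a convenience for trying the sketch outside Invenio; it is not something the library requires.
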

def write_metadata(inputfile, outputfile, metadata_dictionary, force=None, verbose=0):
Writes metadata to given file.

Availability depends on the input file format and the installed plugins
(raises TypeError if the file format is unsupported).

Parameters:

        * inputfile (string) - path to a file
        * outputfile (string) - path to the resulting file.
        * metadata_dictionary (dict) - keys and values of metadata to update.
        * force (string) - name of plugin to use, to skip plugin auto-discovery
        * verbose (int) - verbosity

Returns: string
    output of the plugin

Raises:
        * TypeError - if file format is not supported.
        * RuntimeError - if required library to process file is missing.
        * InvenioWebSubmitFileMetadataRuntimeError - when metadata cannot be updated.
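
The parameter order of write_metadata (input path, output path, then the dictionary of changes) can be illustrated with the following sketch. The helper write_copy_with_metadata, the ".updated" suffix, the "Title" key and the stand-in fake_writer are all assumptions made for the example; in real code the writer argument would be the write_metadata function imported from websubmit_file_metadata, and the accepted keys depend on the plugin that handles the file.

```python
import os


def write_copy_with_metadata(inputfile, changes, writer, suffix=".updated"):
    """Write `changes` to a sibling copy of inputfile; return the copy's path."""
    base, ext = os.path.splitext(inputfile)
    outputfile = base + suffix + ext
    # Documented order: inputfile, outputfile, metadata_dictionary.
    writer(inputfile, outputfile, changes)
    return outputfile


# Hypothetical stand-in with the documented signature of write_metadata.
calls = []


def fake_writer(inputfile, outputfile, metadata_dictionary, force=None, verbose=0):
    calls.append((inputfile, outputfile, dict(metadata_dictionary)))
    return "ok"


result = write_copy_with_metadata("/tmp/report.pdf", {"Title": "Annual report"}, fake_writer)
print(result)
```

Writing to a separate output path, as done here, leaves the original file untouched in case the update fails.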

2. Plugin development

You can develop new plugins to extend the compatibility of the library with additional file formats.

2.1 Specifications

Your plugin file name must start with "wsm_" and end with ".py", for example wsm_myplugin.py.
Once ready, it must be placed in the /opt/invenio/lib/python/invenio/websubmit_file_metadata_plugins/ directory.
Your plugin can define the interface described below.

The functions can_read_local(..), can_read_remote(..), and can_write_local(..) are called at runtime by the library on all installed plugins to check which ones can process the given file for the given action. If one of these functions returns True, your plugin will be selected to process the file. You can omit one or several of these functions (for example, if you don't support reading from a remote server, simply omit can_read_remote(..)).

If your plugin returned True for a given action, the corresponding function read_metadata_local(..), read_metadata_remote(..) or write_metadata_local(..) is then called. You must therefore implement the corresponding function (for example, if you return True for some file with can_write_local(..), then you must implement write_metadata_local(..)).

Your plugin code should also define the __plugin_version__ variable, which declares the interface version your plugin is compatible with. For example, set __plugin_version__ = "WebSubmit File Metadata Plugin API 1.0" (this is the variable used in the example of section 2.2).

def can_read_local(inputfile):
Returns True if file can be processed by this plugin.

Parameters:

        * inputfile (string) -  path to a file to read metadata from

Returns: boolean
    True if file can be processed

def can_read_remote(inputfile):
Returns True if file at remote location can be processed by this plugin.

Parameters:

        * inputfile (string) -  URL to a file to read metadata from

Returns: boolean
    True if file can be processed

def can_write_local(inputfile):
Returns True if file can be processed by this plugin for writing.

Parameters:

        * inputfile (string) -  path to a file to update metadata

Returns: boolean
    True if file can be processed

def read_metadata_local(inputfile, verbose):
Returns a dictionary of metadata read from inputfile.

Parameters:

        * inputfile (string) - path to file to read from
        * verbose (int) - verbosity

Returns: dict
    dictionary with metadata

def read_metadata_remote(inputfile, verbose):
Returns a dictionary of metadata read from remote inputfile.

Parameters:

        * inputfile (string) - URL to file to read from
        * verbose (int) - verbosity

Returns: dict
    dictionary with metadata

def write_metadata_local(inputfile, verbose):
Updates the metadata of the given inputfile.

Parameters:

        * inputfile (string) - path to file to update
        * verbose (int) - verbosity

Returns: dict
    dictionary with metadata
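
To make the interface concrete, here is a minimal sketch of a read-only plugin. It is not one of the plugins shipped with the library: the ".kv" extension and the "Key: value" header format are invented for the illustration, and the file would be saved under a name such as wsm_kv_example.py (also hypothetical). The version variable follows the form used in the example of section 2.2. Since the plugin defines neither can_read_remote(..) nor can_write_local(..), it would only be considered for local reads.

```python
"""
WebSubmit Metadata Plugin - hypothetical reader for ".kv" text files

A ".kv" file (format invented for this example) starts with "Key: value"
lines; the header ends at the first blank line.
"""

__plugin_version__ = "WebSubmit File Metadata Plugin API 1.0"


def can_read_local(inputfile):
    """Return True if the file looks like something this plugin can read."""
    return inputfile.lower().endswith(".kv")


def read_metadata_local(inputfile, verbose):
    """Return a dictionary built from the "Key: value" header lines."""
    metadata = {}
    with open(inputfile) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                break  # a blank line ends the header
            if ":" in line:
                key, value = line.split(":", 1)
                metadata[key.strip()] = value.strip()
    if verbose > 0:
        print("Read %d tag(s) from %s" % (len(metadata), inputfile))
    return metadata
```

Deciding by file extension keeps can_read_local(..) cheap, which matters because the library calls it on every installed plugin; a real plugin might inspect the file content instead.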

2.2 Dependencies on External Libraries

If your plugin depends on an external library, it should check at load time (that is, in the main scope of the plugin module) that this library is installed. If the library is missing, the plugin should raise an ImportError exception. For example:

"""
WebSubmit Metadata Plugin - My custom plugin

Dependencies: extractor
"""

__plugin_version__ = "WebSubmit File Metadata Plugin API 1.0"

import extractor

def can_read_local(inputfile):
[...]

The "import extractor" statement raises an ImportError exception if extractor is missing.

2.3 Conflict With Other Plugins

If your plugin can read the same file type as other installed plugins, the system combines the information returned by all compatible plugins into a single dictionary, so that there is no conflict.
The behaviour is different when writing to a file: in that case the first compatible plugin found is used to update the metadata of the file. There is no way for the plugin developer to prioritize plugins. Only the user can select a given plugin, by specifying the --force option.
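
The difference between the two behaviours can be sketched as follows. This is not the library's actual code: plugins are modelled here as plain dictionaries, the names read_from_all and pick_writer and the two demo plugins are hypothetical, and the rule for duplicate keys (the later plugin wins) is an assumption, since this document does not say how clashing tags are resolved.

```python
def read_from_all(plugins, inputfile, verbose=0):
    """Merge the metadata returned by every plugin that accepts the file."""
    merged = {}
    matched = False
    for plugin in plugins:
        can_read = plugin.get("can_read_local")
        if can_read is not None and can_read(inputfile):
            matched = True
            # Assumption: on duplicate keys the later plugin wins.
            merged.update(plugin["read_metadata_local"](inputfile, verbose))
    if not matched:
        raise TypeError("unsupported file format: %s" % inputfile)
    return merged


def pick_writer(plugins, inputfile, force=None):
    """Return the forced plugin, or else the first one that can write."""
    for plugin in plugins:
        if force is not None:
            if plugin["name"] == force:
                return plugin
        else:
            can_write = plugin.get("can_write_local")
            if can_write is not None and can_write(inputfile):
                return plugin
    raise TypeError("no plugin can write: %s" % inputfile)


# Two hypothetical plugins that both accept ".jpg" files.
exif_demo = {
    "name": "wsm_exif_demo",
    "can_read_local": lambda f: f.endswith(".jpg"),
    "read_metadata_local": lambda f, verbose: {"Camera": "X100"},
    "can_write_local": lambda f: f.endswith(".jpg"),
}
iptc_demo = {
    "name": "wsm_iptc_demo",
    "can_read_local": lambda f: f.endswith(".jpg"),
    "read_metadata_local": lambda f, verbose: {"Caption": "Sunset"},
    "can_write_local": lambda f: f.endswith(".jpg"),
}
plugins = [exif_demo, iptc_demo]

print(read_from_all(plugins, "a.jpg"))
print(pick_writer(plugins, "a.jpg")["name"])
print(pick_writer(plugins, "a.jpg", force="wsm_iptc_demo")["name"])
```

In this sketch the order of the plugins list decides which writer is chosen, which mirrors the statement above that the developer cannot set a priority and only a forced plugin name overrides the default choice.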