Directory Organization

Please find some notes below on how the source (as well as the target)
directory structure is organized, where the sources get installed to,
and how the visible URLs are organized.

1. Invenio generally installs into the directory given by the
   --with-prefix configuration variable.  This general install
   directory and its web subdirectory (prefix/var/www) are discussed
   in points 2 and 3 below, respectively.

2. The first directory (--with-prefix) specifies the general Invenio
   install directory, where we'll put CLI binaries, Python and PHP
   libraries, manpages, log and cache directories for the running
   installation, and any other dirs as needed.  They will all live
   under one common root.

   For example, run configure with --with-prefix=/opt/invenio and
   you'll obtain the following principal directories:

      /opt/invenio/
      /opt/invenio/bin
      /opt/invenio/lib
      /opt/invenio/lib/python
      /opt/invenio/lib/wml
      /opt/invenio/var
      /opt/invenio/var/cache
      /opt/invenio/var/log

   with the obvious meaning:

      - bin : for command-line executable binaries and scripts

      - lib/python : for our own Python libraries, see below

      - lib/wml : for our own WML libraries, see below

      - var : for installation-specific runtime stuff

      - var/log : for all sorts of runtime logging, e.g. search.log

      - var/cache : for all sorts of runtime caching, e.g. OAI
                    retention harvesting, collection cache, etc

   This scheme follows, to some extent, the usual Unix filesystem
   convention, so it can easily be extended later according to our
   future needs.
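
   As a rough illustration of the layout above, the following sketch
   derives the principal directories from a given prefix.  The helper
   name principal_directories is hypothetical and not part of Invenio;
   only the subdirectory names come from the listing above.

```python
import posixpath

# Subdirectory names copied from the listing above.  The helper itself
# is a hypothetical illustration, not an Invenio function.
PRINCIPAL_SUBDIRS = (
    "bin",
    "lib",
    "lib/python",
    "lib/wml",
    "var",
    "var/cache",
    "var/log",
)


def principal_directories(prefix):
    """Map each principal subdirectory name to its full path under prefix."""
    return {sub: posixpath.join(prefix, sub) for sub in PRINCIPAL_SUBDIRS}


if __name__ == "__main__":
    for name, path in principal_directories("/opt/invenio").items():
        print(f"{name:12s} {path}")
```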

3. The second directory (prefix/var/www) contains Web scripts (PHP,
   mod_python), HTML documents and images, and so on.  This is where
   the files seen by Web users are located.  Basically, the files
   there contain only the interface to the functionality that is
   provided by the libraries stored under the library directory.

   The prefix/var/www directory is further structured according to
   the audience it serves.  We distinguish user-level, admin-level
   and hacker-level access to the site, as reflected by the visible
   URL structure.

     a) The user-level access point is provided by the main CFG_SITE_URL
        address and its subdirs.  All the user-level documentation is
        available under CFG_SITE_URL/help/.  The module-specific user-level
        documentation is available under CFG_SITE_URL/help/<module>/.

     b) The admin-level access is provided by the CFG_SITE_URL/admin/
        entry point.  The admin-level documentation is accessible from
        the same place.  The admin-level module-specific functionality
        and help are available under CFG_SITE_URL/admin/<module>/.
        (If it's written in mod_python, it usually points to
        CFG_SITE_URL/<module>admin.py/ since we configure the server
        to have all mod_python scripts under the prefix/var/www root
        directory.)

     c) The hacker-level documentation is provided by the
        CFG_SITE_URL/hacking/ entry point.  There is no hacker-level
        functionality available via the Web, of course, so unlike the
        admin-level entry point, the hacker-level entry point provides
        only common access to the available hacking documentation,
        etc.  The module-specific information is available under
        CFG_SITE_URL/hacking/<module>/.
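
   The URL scheme in (a) to (c) can be sketched as a small helper.
   The function name doc_url and the example site address are
   assumptions made for illustration; only the help/, admin/ and
   hacking/ entry points come from the text above.

```python
# Entry points per access level, as described in (a), (b) and (c).
LEVEL_ENTRY_POINTS = {
    "user": "help",
    "admin": "admin",
    "hacker": "hacking",
}


def doc_url(site_url, level, module=None):
    """Build the documentation URL for an access level and optional module.

    site_url stands in for the value of CFG_SITE_URL.
    """
    url = site_url.rstrip("/") + "/" + LEVEL_ENTRY_POINTS[level] + "/"
    if module:
        url += module + "/"
    return url


if __name__ == "__main__":
    site = "http://localhost"  # hypothetical CFG_SITE_URL value
    print(doc_url(site, "user"))
    print(doc_url(site, "admin", "websearch"))
    print(doc_url(site, "hacker", "websearch"))
```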

4. Let's now look more closely at the role of the Python library
   directory that lives outside of the Apache tree:

      /opt/invenio/lib/python

   Here we put not only (a) libraries that may be reused across CDS
   Invenio modules, but also (b) all the "core" functionality of CDS
   Invenio that is not directly callable by the end users.  The
   "callable" functionality is put under "prefix/var/www" in case of
   web scripts and documents, and under "bindir" in case of CLI
   executables.

   As for (a), for example in the PHP Invenio library you'll
   currently find the common PHP error handling code that is shared
   between BibFormat and WebSubmit; in the Python Invenio library (in
   fact, an Invenio Pythonic 'module', but we are reserving the word
   'module' to denote 'Invenio module' in this text) you'll find
   config.py containing WML-supplied site parameters, dbquery.py
   containing the persistent DB query module, or webpage.py with
   templates and functions to produce mod_python web pages with a
   common look and feel.  These could and should be reused across all
   our modules.  Note that I created only a small number of "broad"
   libraries at the moment.  If we want to reuse more code parts,
   we'll refactor the code further, as needed.

   As for (b), for example the existing search engine was split into
   search.py, which contains only three "callable" functions and goes
   into prefix/var/www, while the search engine itself is composed of
   search_engine.py and search_engine_config.py living under LIBDIR.
   In this way we can easily create a "real" CLI search that will
   depend only on the search libraries in LIBDIR and that will get
   installed into BINDIR.

   To recap:

      - For each Invenio module, I'm differentiating between
        "callable" and "core" parts.  The former go into
        prefix/var/www or BINDIR, the latter into LIBDIR.

      - Our PHP/Pythonic libraries contain several sorts of things:

          - the implementation of the "callable" functions

          - non-callable internal "core" or "library" code parts, as
            stated above.  These are not shared across Invenio
            modules.

          - utility code meant for reuse across Invenio modules, such
            as dbquery.py

          - Pythonic config files generated from user-supplied WML
            (non-MySQL) configuration parameters (see
            e.g. search_engine_config.py)
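
   The "callable" versus "core" split can be illustrated with a toy
   example.  This is only a sketch of the idea: all three function
   names are hypothetical, the search logic is a trivial stand-in, and
   in a real installation the three parts would live in separate files
   under LIBDIR, prefix/var/www and BINDIR respectively.

```python
import html


# "Core" part: would live under LIBDIR.  Toy stand-in, not the real
# Invenio search engine.
def perform_search(records, pattern):
    """Return the records containing pattern, ignoring case."""
    needle = pattern.lower()
    return [rec for rec in records if needle in rec.lower()]


# "Callable" Web part: would live under prefix/var/www.  It only
# wraps the core function in an interface-specific presentation.
def web_search(records, pattern):
    """Return the hits as an HTML list."""
    hits = perform_search(records, pattern)
    items = "".join(f"<li>{html.escape(hit)}</li>" for hit in hits)
    return f"<ul>{items}</ul>"


# "Callable" CLI part: would get installed into BINDIR.  It reuses
# the same core function and depends on nothing from the Web tree.
def cli_search(records, pattern):
    """Return the hits as plain text, one per line."""
    return "\n".join(perform_search(records, pattern))


if __name__ == "__main__":
    demo_records = ["Higgs boson", "Muon decay"]
    print(web_search(demo_records, "higgs"))
    print(cli_search(demo_records, "decay"))
```

   Because both wrappers depend only on perform_search, a new
   interface can be added without touching the core code.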

5. The same strategy is reflected in the organization of source
   directories inside the Invenio CVS repository.  Each Invenio module
   lives in a separate directory located under the "modules" directory
   of the sources.  Further on, each module usually contains several
   subdirectories that reflect the above-mentioned packaging choice.
   For example, in case of WebSearch you'll find:

      ./modules/websearch
      ./modules/websearch/bin
      ./modules/websearch/doc
      ./modules/websearch/doc/hacking
      ./modules/websearch/doc/admin
      ./modules/websearch/lib
      ./modules/websearch/web
      ./modules/websearch/web/admin

   with the following straightforward meaning:

      - bin : for callable CLI binaries and scripts

      - doc : for documentation.  The user-level documentation is
              located in this directory.  The admin-level
              documentation is located in the "admin" subdir.  The
              programmer-level documentation is located in the
              "hacking" subdir.

      - lib : for non-callable "core" functionality, see the comments
              above

      - web : for callable web scripts and pages.  The user-level and
              admin-level parts are separated in the same way as in
              the "doc" directory (see above).

   This structure is respected throughout all the Invenio modules, a
   notable exception being the MiscUtil module, which contains subdirs
   like "sql" (for the table creating/dropping SQL commands, etc) or
   "demo" (for creating the Atlantis Institute of Science, our demo
   site).
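
   As a small illustration of the per-module source layout, the
   sketch below lists the directories one would expect for a module
   that follows the standard structure.  The helper name
   module_source_dirs is hypothetical; the subdirectory names are
   those of the WebSearch example above, and modules such as MiscUtil
   will differ.

```python
import posixpath

# Standard per-module subdirectories, as in the WebSearch example.
MODULE_SUBDIRS = (
    "bin",
    "doc",
    "doc/hacking",
    "doc/admin",
    "lib",
    "web",
    "web/admin",
)


def module_source_dirs(module, root="./modules"):
    """List the expected source directories of a standard module."""
    base = posixpath.join(root, module)
    return [base] + [posixpath.join(base, sub) for sub in MODULE_SUBDIRS]


if __name__ == "__main__":
    for path in module_source_dirs("websearch"):
        print(path)
```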

- end of file -