Create knowledgebase files (bibrankgkb)


   1. Read default configuration file or the one specified by the user

   2. Read each create_ line from the cfg file. For each line, read the
      source(s) from a database, a file, or the www by calling
      get_from_source(). Convert between naming conventions if a source
      for the conversion data is given.

   3. Merge the results into one file; repeat step 2 until the last source
      has been read.

   4. Save the file if requested with --output.
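
   The merge and save steps above can be sketched roughly as follows. This
   is a minimal sketch, not the actual bibrankgkb code: merge_sources and
   format_kb are hypothetical helper names, each source is assumed to be
   already loaded into a dict, later sources are assumed to override earlier
   ones, and the second journal entry is invented for illustration.

```python
def merge_sources(sources):
    """Merge several key -> value mappings into one.

    Assumption: when a key appears in more than one source,
    the value from the later source wins.
    """
    merged = {}
    for source in sources:
        merged.update(source)
    return merged


def format_kb(mapping):
    """Render a mapping as .kb lines of the form KEY---VALUE, sorted by key."""
    return "\n".join(f"{key}---{value}" for key, value in sorted(mapping.items()))


first = {"COLLOID SURFACE A": "1.98"}
second = {"EXAMPLE JOURNAL B": "0.50"}  # invented entry, for illustration only
kb_text = format_kb(merge_sources([first, second]))
print(kb_text)
```

   Writing kb_text to the path given with --output would complete step 4.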

   Configuration:

   How to specify a source:
       create_x = filter, source
   where x is a number from 0 upward. The source and the filter are read,
   and each line in the source is checked against the filter so that it can
   be converted into the correct naming standard. If no filter is given, the
   source is translated directly into a .kb file.
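
   The conversion described above can be illustrated with a small sketch.
   It assumes that both the filter and the source consist of KEY---VALUE
   lines, and that the filter maps a source key to its converted name.
   parse_kb and apply_filter are hypothetical names, and leaving unmatched
   keys unchanged is an assumption about the behaviour.

```python
def parse_kb(text):
    """Parse lines of the form KEY---VALUE into a dict, skipping other lines."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "---" not in line:
            continue
        key, value = line.split("---", 1)
        result[key.strip()] = value.strip()
    return result


def apply_filter(source, name_filter):
    """Rename source keys using the filter.

    Assumption: keys that are missing from the filter are kept unchanged.
    """
    return {name_filter.get(key, key): value for key, value in source.items()}


name_filter = parse_kb("COLLOID SURFACE A---Colloids Surf., A")
source = parse_kb("COLLOID SURFACE A---1.98")
converted = apply_filter(source, name_filter)
print(converted)  # {'Colloids Surf., A': '1.98'}
```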

   Read filter from:

   File:
     [bibrankgkb]
     #give path to file containing lines like: COLLOID SURFACE A---Colloids Surf., A
     kb_1_filter = /bibrank/bibrankgkb_jif_conv.kb
     #replace filter with the line below (switch kb_1_filter with the variable names you used)
     create_0 = file,,%(kb_1_filter)s

   Read source from:

   Database:
     [bibrankgkb]
     #Specify sql statements
     kb_2 = SELECT id_bibrec,value FROM bib93x,bibrec_bib93x WHERE tag='938__f' AND id_bibxxx=id
     kb_3 = SELECT id_bibrec,value FROM bib21x,bibrec_bib21x WHERE tag='210__a' AND id_bibxxx=id
     #replace source with the line below (switch kb_2 and kb_3 with the variable names you used)
     db,,%(kb_2)s,,%(kb_3)s

   File:
     [bibrankgkb]
     #give path to file containing lines like: COLLOID SURFACE A---1.98
     kb_1 = /bibrank/bibrankgkb_jif_example.kb
     #replace source with the line below (switch kb_1 with the variable names you used)
     create_0 = file,,%(kb_1)s
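
   The %(kb_1)s reference follows the interpolation syntax of Python's
   configparser module. Assuming the cfg file is read that way, the sketch
   below shows how such a create_ line expands and how it could be split on
   the ",," separator. The in-memory cfg text and the variable names are
   illustrative only.

```python
import configparser

# Same options as the File example, held in a string instead of a cfg file.
CFG_TEXT = """
[bibrankgkb]
kb_1 = /bibrank/bibrankgkb_jif_example.kb
create_0 = file,,%(kb_1)s
"""

parser = configparser.ConfigParser()
parser.read_string(CFG_TEXT)

# get() expands %(kb_1)s using the option of that name in the same section.
create_0 = parser.get("bibrankgkb", "create_0")

# Split the expanded line on the ",," separator: source type first, then arguments.
fields = create_0.split(",,")
source_type, arguments = fields[0], fields[1:]
print(source_type, arguments)
```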

   Internet:
     [bibrankgkb]
     #specify the URLs of the files containing JIF data
     url_0 = http://www.sciencegateway.org/impact/if03a.htm
     url_1 = http://www.sciencegateway.org/impact/if03bc.htm
     url_2 = http://www.sciencegateway.org/impact/if03df.htm
     url_3 = http://www.sciencegateway.org/impact/if03gi.htm
     url_4 = http://www.sciencegateway.org/impact/if03j.htm
     url_5 = http://www.sciencegateway.org/impact/if03ko.htm
     url_6 = http://www.sciencegateway.org/impact/if03pr.htm
     url_7 = http://www.sciencegateway.org/impact/if03sz.htm
     #give the regular expression necessary to extract the key and value from the file

     url_regexp = (TR bgColor=\#ffffff>\s*?\n\s*?(?P.*?)\s*?\n\s*?.*?\s*?\n\s*?.*?\s*?\n\s*?\s*?\n\s*?(?P[\w|,]+))

     #replace source with the line below (switch kb_4 and url_regexp with the variable names you used)
     create_0 = www,,%(kb_4)s,,%(url_regexp)s
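
   How a regular expression with two named groups can pull key and value
   pairs out of a page is sketched below. Everything here is hypothetical:
   the HTML fragment, the pattern, and the group names "key" and "value"
   are assumptions made for illustration, not the url_regexp above and not
   the layout of the real pages.

```python
import re

# Hypothetical HTML fragment; the real pages may be laid out differently.
SAMPLE_HTML = """<TR>
<TD>COLLOID SURFACE A</TD>
<TD>1.98</TD>
</TR>"""

# Hypothetical pattern: the first cell is captured as "key", the next as "value".
PATTERN = re.compile(r"<TD>(?P<key>.*?)</TD>\s*<TD>(?P<value>[\d.]+)</TD>")

# Collect every match into a key -> value mapping.
pairs = {m.group("key"): m.group("value") for m in PATTERN.finditer(SAMPLE_HTML)}
print(pairs)  # {'COLLOID SURFACE A': '1.98'}
```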