Remove COMTRADE references

stephenjboyle 1 year ago
parent
commit
a2d6ab5323

+ 6 - 3
DEVELOPING.md

@@ -1,8 +1,11 @@
-# Converting the data
+# Converting new data
+
+To add a new file {filename}.csv for conversion, copy the file to the `data` folder, create a file {filename}\_defs.dlog containing appropriate meta-data (specifying the prefixes to use in the converted RDF data) and modify `dodo.py` as appropriate.
+ 
+If the data format is different to that of the examples, a new file load\_data\_{type}.rdfox will need to be created to specify the mapping between the columns of the input csv file and the columns of the RDFOX tuple table used to store input data. A file map_{type}.dlog will also need to be created to specify conversion rules. These files should be copied to the ```scripts``` folder.
 
 To convert the data run `doit run convert_data`.
 
 The results will be in the `outputs/` folder.
 
-To test the expected values are present, run `pytest`.
-
+To test the expected values are present, run `pytest`.
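
Note on the DEVELOPING.md text above: the files needed for a new conversion can be summarised with a small Python sketch. The helper `required_files` is hypothetical and not part of this repository. Placing the `_defs.dlog` file next to the csv in `data/` is an assumption, since the text does not say where that file lives; the `scripts/` location of the `.rdfox` and `map_*.dlog` files follows the text.

```python
from pathlib import Path


def required_files(filename, data_type, new_format=False):
    """List the files described in DEVELOPING.md for converting {filename}.csv.

    Hypothetical helper for illustration only; not part of the repository.
    """
    files = [
        # The input file is copied to the data folder.
        Path("data") / f"{filename}.csv",
        # Assumption: the defs file sits alongside the csv in data/.
        Path("data") / f"{filename}_defs.dlog",
    ]
    if new_format:
        # Only needed when the format differs from the examples.
        files += [
            Path("scripts") / f"load_data_{data_type}.rdfox",
            Path("scripts") / f"map_{data_type}.dlog",
        ]
    return files
```

Calling `required_files("NEWDATA", "newtype", new_format=True)` returns four paths that could be checked with `Path.exists()` before editing `dodo.py`; the names `NEWDATA` and `newtype` are placeholders.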

+ 22 - 15
README.md

@@ -6,22 +6,20 @@ See [DEVELOPING.md](DEVELOPING.md) for more information about using this reposit
 
 ## Dataset structure
 
+- Repository is a datalad dataset
 - Input data files needing preprocessing are located in `raw_data/`.
 - Preprocessed data files ready for conversion are located in `data/`. 
 - All custom code is located in `scripts/`.
 - Converted data is saved to `outputs/`.
 
-## Converted data
-
-TODO: add link to converted data
 ## Installation
 
 ### Getting the code
 
-To clone the git repository, in a shell/command window (e.g. git-bash) type:
+To clone the datalad dataset, in a shell/command window (e.g. git-bash) type:
 
 ```shell
-git clone https://github.com/probs-lab/prodcom-data.git
+datalad clone https://github.com/probs-lab/prodcom-data.git
 ```
 ### Setting up the virtual environment and installing dependencies:
 
@@ -34,16 +32,28 @@ conda env create
 
 ## Running the code
 
+After installation:
+
+- Open a terminal / git-bash window
+- Navigate to ```prodcom-data``` folder, e.g. ```cd prodcom-data```
+- Activate environment using ```conda activate prodcom-data```
+
+To download the example output data files from the server use:
+
+```shell
+datalad get outputs
+```
+
 To preprocess input data files run the script:
 
 ```shell
-doit preprocess
+doit run preprocess
 ```
 
 To convert the preprocessed data in the `data` folder run:
 
 ```shell
-doit convert_data
+doit run convert_data
 ```
 
 To run all necessary tasks (i.e. preprocessing and conversion) simply run:
@@ -55,18 +65,14 @@ doit
 Individual files can be converted by running the `convert_data.py` script with appropriate parameters specifying the file type and the input and output filenames:
 
 ```shell
-scripts/convert_data.py prodcom data/PRODCOM2017DATA.csv outputs/PRODCOM2017DATA.nt.gz
+scripts/convert_data.py prodcom data/PRODCOM2016DATA.csv outputs/PRODCOM2016DATA.nt.gz
 ```
 
-Valid file types are:
-- prodcom
-- comtrade
-- prodcom_list
-- prodcom_correspondence
-- comtrade_class
+For conversion of the example PRODCOM data files the type `prodcom` should be specified. Types prodcom_list and prodcom_correspondence are also defined.
 
+# Converting new data
 
-To add a new file {filename}.csv for conversion, copy the file to the `data` folder, create a file {filename}_defs.dlog containing appropriate meta-data (specifying the prefixes to use in the converted RDF data) and modify `dodo.py` as appropriate.
+For conversion of new data files (possibly in a different format from the examples) see the [DEVELOPING.md](DEVELOPING.md) file.
 
 ## Testing the code
 
@@ -80,3 +86,4 @@ pytest
 
 
 
+

+ 2 - 2
scripts/convert_data.py

@@ -19,10 +19,10 @@ CODE_DIR = Path(__file__).parent
 
 def parse_arguments():
     parser = argparse.ArgumentParser(description=__doc__)
-    parser.add_argument("type", help="data type of convert")
+    parser.add_argument("type", help="data type of file to convert")
     parser.add_argument('input_file',
                         type=Path,
-                        help='PRODCOM or COMTRADE csv file to convert')
+                        help='PRODCOM csv file to convert')
     parser.add_argument('output_file',
                         type=Path,
                         help='path to save output RDF')
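
For reference, the parser as it reads after this change can be exercised with a self-contained sketch. The `argv` parameter, the description string, and the `return` statement are assumptions added so the snippet runs on its own; the hunk above does not show the rest of the function.

```python
import argparse
from pathlib import Path


def parse_arguments(argv=None):
    # Assumption: the real script uses description=__doc__; a literal is used here.
    parser = argparse.ArgumentParser(description="Convert a csv file to RDF")
    parser.add_argument("type", help="data type of file to convert")
    parser.add_argument("input_file",
                        type=Path,
                        help="PRODCOM csv file to convert")
    parser.add_argument("output_file",
                        type=Path,
                        help="path to save output RDF")
    # Assumption: argv is passed through so the parser can be tested without sys.argv.
    return parser.parse_args(argv)


# Mirrors the README example invocation.
args = parse_arguments([
    "prodcom",
    "data/PRODCOM2016DATA.csv",
    "outputs/PRODCOM2016DATA.nt.gz",
])
print(args.type, args.input_file, args.output_file)
```

The three positional arguments match the README command line: the type first, then the input csv, then the output path.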

+ 0 - 25
scripts/load_data_comtrade.rdfox

@@ -1,25 +0,0 @@
-######################################################
-###                  COMTRADE DATA                 ###
-######################################################
-
-dsource register "COMTRADE_DATA"                                \
-    type    delimitedFile                                       \
-    file    "$(dir.datasource)/data.csv"              \
-    header  true                                                \
-    quote   '"'
-
-tupletable create ufrd:COMTRADE_DATA                            \
-    dataSourceName  "COMTRADE_DATA"                             \
-    "columns"       3                                           \
-    "1"             "{ID}"                                      \
-    "1.datatype"    "string"                                    \
-    "2"             "{Commodity Code}"                          \
-    "2.datatype"    "string"                                    \
-    "3"             "{Netweight (kg)}"                          \
-    "3.datatype"    "xsd:decimal"                               \
-    "3.if-empty"    "absent"
-
-    # 1 ID generated for unique observations
-    # 2 This is the ClassificationCode and related to 1. by `objectDefinedBy`
-    # 3 MeasurementValue (of measurement of observation) - the measurement is implicit here. If blank then the value was witheld (but is not 0 necessarily)
-

+ 0 - 27
scripts/load_data_comtrade_class.rdfox

@@ -1,27 +0,0 @@
-######################################################
-###             COMTRADE CLASSIFICATION            ###
-######################################################
-
-dsource register "COMTRADE_CLASSIFICATION"                      \
-    type    delimitedFile                                       \
-    file    "$(dir.datasource)/data.csv"                        \
-    header  true                                                \
-    quote   '"'
-
-tupletable create ufrd:COMTRADE_CLASSIFICATION                  \
-    dataSourceName  "COMTRADE_CLASSIFICATION"                   \
-    "columns"       4                                           \
-    "1"             "{Code}"                                    \
-    "1.datatype"    "string"                                    \
-    "2"             "{Description}"                             \
-    "2.datatype"    "string"                                    \
-    "3"             "{Parent Code}"                             \
-    "3.datatype"    "string"                                    \
-    "3.if-empty"    "absent"                                    \
-    "4"             "{Level}"                                   \
-    "4.datatype"    "string"
-
-# 1 This is the classification code, with dots removed in advance
-# 2 This is the classification description
-# 3 This is the parent code for the classification code, with dots removed in advance
-# 4 This is the level of the classification code in the hierarchy

+ 0 - 26
scripts/map_comtrade.dlog

@@ -1,26 +0,0 @@
-:DirectObservation[?ID] ,
-[?ID, :objectDirectlyDefinedBy, ?CNCode] ,
-[?ID, :hasRegion, gnd:2635167] ,
-[?ID, :hasTimePeriod, ?TimePeriod] ,
-[?ID, :hasRole, ?Role] ,
-[?ID, :partOfDataset, ?Dataset] ,
-[?ID, :bound, :ExactBound] ,
-[?ID, :metric, quantitykind:Mass] ,
-ufu:NG(?ID, ufu:unit, ?Unit)
-        :- ufrd:COMTRADE_DATA(?IDstring, ?CNCodestring, ?ImQNT),
-
-        [ufu:CurrentImport, :hasTimePeriod, ?TimePeriod],
-        [ufu:CurrentImport, :partOfDataset, ?Dataset],
-        [ufu:CurrentImport, :hasRole, ?Role],
-        [ufu:CurrentImport, :useDataPrefix, ?DataPrefix],
-        [ufu:CurrentImport, :useObjectPrefix, ?ObjectPrefix],
-
-        BIND(IRI(CONCAT(STR(?DataPrefix), "Observation-", SHA256(?IDstring))) AS ?ID) ,
-        BIND(IRI(CONCAT(STR(?ObjectPrefix), "Object-", SHA256(?CNCodestring))) AS ?CNCode) ,
-        BIND(IRI(CONCAT(STR(:), "Unit-", SHA256("Weight in kilograms"))) as ?Unit) .
-# if ?ImQNT is not "absent"
-ufu:NG(?ID, ufu:measurementUnit, ?ImQNT)
-        :- ufrd:COMTRADE_DATA(?IDstring, ?CNCodestring, ?ImQNT), FILTER(BOUND(?ImQNT)),
-
-        [ufu:CurrentImport, :useDataPrefix, ?DataPrefix],
-        BIND(IRI(CONCAT(STR(?DataPrefix), "Observation-", SHA256(?IDstring))) AS ?ID) .

+ 0 - 22
scripts/map_comtrade_class.dlog

@@ -1,22 +0,0 @@
-ufct:ClassificationCode[?CCID] ,
-[?ObjectID, :hasClassificationCode, ?CCID] ,
-[?ObjectID, :objectName, ?objectName] ,
-[?CCID, :codeName, ?Code] ,
-[?CCID, :codeDescription, ?Description] ,
-[?CCID, :belongsToList, ?List]
-        :- ufrd:COMTRADE_CLASSIFICATION(?Code, ?Description, ?Parent, ?Level) ,
-
-        [ufu:CurrentImport, :belongsToList, ?List],
-        [ufu:CurrentImport, :useDataPrefix, ?DataPrefix],
-
-        BIND(IRI(CONCAT(STR(?DataPrefix), "Object-", SHA256(?Code))) AS ?ObjectID) ,
-        BIND(IRI(CONCAT(STR(?DataPrefix), "ClassificationCode-", SHA256(?Code))) AS ?CCID) ,
-        BIND(CONCAT("COMTRADE Object from Code ", ?Code) AS ?objectName) .
-# if ?Parent is not "absent"
-[?ParentID, :objectComposedOf, ?ObjectID]
-        :- ufrd:COMTRADE_CLASSIFICATION(?Code, ?Description, ?Parent, ?Level) ,
-
-        [ufu:CurrentImport, :useDataPrefix, ?DataPrefix],
-
-        BIND(IRI(CONCAT(STR(?DataPrefix), "Object-", SHA256(?Code))) AS ?ObjectID) ,
-        BIND(IRI(CONCAT(STR(?DataPrefix), "Object-", SHA256(?Parent))) AS ?ParentID) .