Commit 5589df6b authored by Bronger, Torsten

Get rid of “items” collection, simplifying the database model

parent 8e8ad663
......@@ -24,15 +24,6 @@ findings
i.e. Zenodo, Pangaea, datapub, and all text journals. It is ever growing,
but existing documents may be updated.
items
This contains metadata of research data publications. The only mandatory
field is the PID. It contains publications from our institutional Dataverse.
It is ever growing, but existing documents may be updated.
aliases
This maps PID aliases to their prime PID. The MongoDB documents in “findings”
and “items” contain only prime PIDs.
crawlings
This contains the timestamp of last crawling for each crawler.
......@@ -84,23 +75,16 @@ them to the “findings” collection. They store their latest run in the
not every time the whole repository is harvested.
Dataverse crawler
.................
This is a cron job.
This crawler is special because it adds records to the “items” collection.
Steward
.......
This is a cron job.
The steward takes all PIDs from “findings” that are not in “items” and does
something about it, e.g.:
The steward takes all PIDs from “findings” that are ``dirty`` and not
``false_positive`` and does something about it, e.g.:
- adds it to Dataverse
- adds it to Dataverse if it does not exist there yet (and sets ``dirty`` to
  false); if it already exists, updates the record there
- sends an email to the authoring scientist
- sends an email to the FDM team
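The selection rule in the newer wording (``dirty`` and not ``false_positive``) can be sketched as follows. This is only an illustration on plain dictionaries: the names ``STEWARD_FILTER``, ``needs_steward`` and ``select_for_steward`` as well as the example URIs are hypothetical and not taken from the project. The filter dictionary shows how the same condition might be written as a MongoDB query, assuming that documents without a ``false_positive`` field should also be processed.

```python
# Hypothetical sketch; all names and URIs here are illustrative assumptions.

# The same condition written as a MongoDB-style query filter.  "$ne: True"
# also matches documents that lack the field (an assumption about intent).
STEWARD_FILTER = {"dirty": True, "false_positive": {"$ne": True}}


def needs_steward(finding):
    """Return True if the finding is dirty and not marked as false positive."""
    return bool(finding.get("dirty")) and not finding.get("false_positive", False)


def select_for_steward(findings):
    """Return the findings the steward would have to act on."""
    return [finding for finding in findings if needs_steward(finding)]


findings = [
    {"uris": ["https://example.org/a"], "dirty": True, "false_positive": False},
    {"uris": ["https://example.org/b"], "dirty": True, "false_positive": True},
    {"uris": ["https://example.org/c"], "dirty": False, "false_positive": False},
]
selected = select_for_steward(findings)
```

Only the first example record is selected: the second is a false positive and the third is not dirty.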
......@@ -108,15 +92,28 @@ something about it, e.g.:
Documents
---------
The documents in the collections “findings” and “items” must have a “``pid``”
field. This is the prime PID. The rest is optional:
The documents in the collection “findings” must have the following fields:
``uris``
This is a list of URIs/PIDs the data publication was found under on the
Internet.
``dirty``
A boolean which is true at the start. If true, this record needs to be added
to or updated in the data repository.
``false_positive``
If true, the steward ignores that entry.
Other fields are optional:
- ``institute``
- ``pof4_topic``
- ``aliases`` (alias PIDs)
- ``email`` (of contact person)
- ``false_positive`` (only in “findings”; if true, the steward ignores that
entry)
- …
However, some of those fields might be required when adding a record to the
data repository.
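A minimal sketch of how the mandatory fields of a “findings” document could be checked. The helper names ``new_finding`` and ``missing_fields`` are hypothetical, and starting ``false_positive`` at false is an assumption; the text above only states that ``dirty`` is true at the start.

```python
# Hypothetical helpers; not part of the project's code base.

MANDATORY_FIELDS = ("uris", "dirty", "false_positive")


def new_finding(uris, **optional_fields):
    """Build a fresh "findings" document.

    ``dirty`` starts as True, as described above.  Starting
    ``false_positive`` as False is an assumption of this sketch.
    """
    document = {"uris": list(uris), "dirty": True, "false_positive": False}
    # Optional fields such as "institute" or "email" are passed through as-is.
    document.update(optional_fields)
    return document


def missing_fields(document):
    """Return the names of mandatory fields absent from the document."""
    return [name for name in MANDATORY_FIELDS if name not in document]


finding = new_finding(["https://example.org/dataset/1"], institute="Example Institute")
```

A crawler could call ``missing_fields`` before inserting a document and skip or log any record for which the returned list is not empty.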
.. LocalWords: FindRD