Publishing/sharing models

Writing your own macsy-model package

The package structure and the corresponding files are described in the section Structure of a macsy-model package. A complete package is made of six types of files:

a metadata.yml file (mandatory)
a README.md file (mandatory)
a LICENSE file (optional but HIGHLY recommended)
a model_conf.xml file (optional)
macsy-models definition(s) within a definitions folder (mandatory)
HMM profiles within a profiles folder (mandatory)
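
As a quick sanity check before running the real validation, the layout listed above can be sketched as a small shell function. This is only an illustration: check_layout is a hypothetical helper name, not part of msf_data, and it looks at file and folder names only, not at their content.

```shell
# Hypothetical helper (not part of msf_data): report which entries from the
# list above are missing in a package folder.
# Returns 0 if all mandatory entries exist, 1 otherwise.
check_layout() {
    pkg_dir="$1"
    status=0
    # mandatory files
    for f in metadata.yml README.md; do
        if [ ! -f "$pkg_dir/$f" ]; then
            echo "missing mandatory file: $f"
            status=1
        fi
    done
    # mandatory folders
    for d in definitions profiles; do
        if [ ! -d "$pkg_dir/$d" ]; then
            echo "missing mandatory folder: $d"
            status=1
        fi
    done
    # optional but highly recommended: only a note, the status is unchanged
    if [ ! -f "$pkg_dir/LICENSE" ]; then
        echo "note: no LICENSE file (optional but highly recommended)"
    fi
    return "$status"
}
```

For example, check_layout . run from the package folder prints one line per missing entry. It does not replace msf_data check, which remains the reference validation.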

You can create a template for your package by using msf_data init. It will create for you:

a git repository holding the data package with the right structure.
a template of metadata.yml.
a template of README.md file.
a generic model_conf.xml file.
a LICENSE file if the --license option is set.
a COPYRIGHT file if the --holders option is set.
a directory definitions with an example of a model definition (model_example.xml, to be removed before publishing).
a directory profiles where to put the HMM profiles corresponding to the genes of the models.

Sharing your models

If you want to share your models, you can create a macsy-model package in your GitHub repository. Several steps are needed to publish your models:

1. Check the validity of your package with the msf_data check command. You have to run it from within the folder containing your package files. It will report one of the following:
- everything is clear: msf_data displays the next step to take to publish the package.
- warning: the package could be improved. It is better to fix the warnings if you can, but you can also proceed to Step 2.
- error: the package is not ready to be published as is. You have to fix the errors before you go to Step 2.

2. Create a tag, and submit a pull request to the https://github.com/macsy-models organization. This step is very important: without a tag, there is no package. msf_data only considers tagged packages. It is mandatory to follow a versioning scheme described here:

https://www.python.org/dev/peps/pep-0440/#public-version-identifiers

https://the-hitchhikers-guide-to-packaging.readthedocs.io/en/latest/specification.html#standard-versioning-schemes

Important

If your package is in version 2.0.1, the tag must be 2.0.1. The version or tag must NOT start with a letter or any other prefix, as in v2.0.1 or my_package-2.0.1.
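
The tag rule above can be checked before tagging with a short shell sketch. The regular expression follows the canonical public version form given in PEP 440 ([N!]N(.N)*[{a|b|rc}N][.postN][.devN]). The function names is_pep440_public and tag_is_acceptable are hypothetical, and the version is passed as an argument rather than read from metadata.yml, so nothing is assumed here about the layout of that file.

```shell
# Hypothetical helpers, not part of msf_data.

# Succeeds if $1 is a canonical PEP 440 public version identifier:
#   [N!]N(.N)*[{a|b|rc}N][.postN][.devN]
is_pep440_public() {
    printf '%s\n' "$1" | grep -Eq \
        '^([0-9]+!)?[0-9]+(\.[0-9]+)*((a|b|rc)[0-9]+)?(\.post[0-9]+)?(\.dev[0-9]+)?$'
}

# Succeeds if the tag ($1) is a valid version AND is identical to the
# package version ($2), so v2.0.1 or my_package-2.0.1 are rejected.
tag_is_acceptable() {
    tag="$1"
    version="$2"
    is_pep440_public "$tag" && [ "$tag" = "$version" ]
}
```

With these definitions, tag_is_acceptable 2.0.1 2.0.1 succeeds, while tag_is_acceptable v2.0.1 2.0.1 fails because the tag starts with a letter.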

Warning

To avoid making an inconsistent model visible to msf_data install/search (by pushing a tag), the msf_data init command sets up a pre-push hook in the git repository. If you did not use msf_data init to create the package, it is a good idea to set up the hook yourself.

Check that the hook is named pre-push and that it is executable (chmod 755 .git/hooks/pre-push). This script runs msf_data check when you push a tag, and it prevents the push if errors are found.

#!/bin/bash

# An example hook script to verify what is about to be pushed.  Called by "git
# push" after it has checked the remote status, but before anything has been
# pushed.  If this script exits with a non-zero status nothing will be pushed.
#
# This hook is called with the following parameters:
#
# $1 -- Name of the remote to which the push is being done
# $2 -- URL to which the push is being done
#
# If pushing without using a named remote those arguments will be equal.
#
# Information about the commits which are being pushed is supplied as lines to
# the standard input in the form:
#
#   <local ref> <local oid> <remote ref> <remote oid>
#
# This script checks whether a tag is being pushed.
# If so, it checks that the tag matches the version declared in metadata.yml
# and blocks the push until the tag and the version match.
#
# This script is widely inspired from https://gist.github.com/farseerfc/0729c08cd7c82b07000f20105f733b17

remote="$1"
url="$2"

tagref=$(grep -Po 'refs/tags/([^ ]*) ' </dev/stdin | head -n1 | cut -c11- | tr -d '[:space:]')

if [[ "$tagref" == ""  ]]; then
    ## NOT pushing tag , exit normally
    exit 0
fi

msf_data check
returncode=$?

if [ $returncode -ne 0 ]; then
    Red=$'\e[1;31m'
    Green=$'\e[1;32m'
    Yello=$'\e[1;33m'
    Clear=$'\e[0m'
    echo "${Green}To fix errors:${Clear}"
    echo "${Red}  1. remove tag:${Clear} git tag -d ${tagref}"
    echo "${Yello}  2. fix errors above ${Clear}"
    echo "${Yello}  3. run 'msf_data check' until everything is fixed ${Clear}"
    echo "${Green}  4. commit your fix:${Clear} git add / git commit "
    echo "${Green}  5. tag again:${Clear} git tag -a ${tagref}"
    echo "${Green}  6. and push:${Clear} git push ${remote} ${tagref}"
fi

exit $returncode
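
The tag detection in the hook relies on grep -P, which is a GNU grep extension and may not be available on every system. A portable alternative can be sketched with a plain read loop over the lines git sends on standard input. This is only a sketch, not the hook installed by msf_data init: extract_tag is a hypothetical name, and unlike the grep pipeline it looks only at the local ref (the first field), so a tag deletion is not detected.

```shell
# Hypothetical portable replacement for the grep -Po pipeline of the hook.
# Reads "<local ref> <local oid> <remote ref> <remote oid>" lines on stdin
# and prints the name of the first tag being pushed, or nothing at all.
extract_tag() {
    while read -r local_ref local_oid remote_ref remote_oid; do
        case "$local_ref" in
            refs/tags/*)
                # strip the leading "refs/tags/" to keep the tag name only
                printf '%s\n' "${local_ref#refs/tags/}"
                return 0
                ;;
        esac
    done
    return 0
}
```

Inside the hook it could be used as tagref=$(extract_tag), since the command substitution inherits the standard input of the script.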


When your pull request (PR) is accepted, the model package becomes automatically available to the community through the msf_data tool.

If you don’t want to submit a PR you can provide the tag release tarball (tar.gz) as is to your collaborators. This archive will also be usable with the msf_data tool.

Note

msf_data check checks the syntax of the package, but it does not publish anything. It just warns you if something is wrong with the package. Every model provider should check its own package before publishing it. The package publication is done by the git push and the pull request.

Examples of msf_data check outputs:

Your package is syntactically correct:

msf_data check tests/data/models/test_model_package/
Checking 'test_model_package' package structure
Checking 'test_model_package' metadata_path
Checking 'test_model_package' Model definitions
Models Parsing
Definitions are consistent
Checking 'test_model_package' model configuration
There is no model configuration for package test_model_package.
If everyone were like you, I'd be out of business
To push the models in organization:
        cd tests/data/models/test_model_package
Transform the models into a git repository
        git init .
        git add .
        git commit -m 'initial commit'
add a remote repository to host the models
for instance if you want to add the models to 'macsy-models'
        git remote add origin https://github.com/macsy-models/
        git tag 1.0b2
        git push --tags

You received some warnings:

msf_data check tests/data/models/Model_w_conf/
Checking 'Model_w_conf' package structure
Checking 'Model_w_conf' metadata_path
Checking 'Model_w_conf' Model definitions
Models Parsing
Definitions are consistent
Checking 'Model_w_conf' model configuration
The package 'Model_w_conf' have not any LICENSE file. May be you have not right to use it.
The package 'Model_w_conf' have not any README file.
msf_data says: You're only giving me a partial QA payment?
I'll take it this time, but I'm not happy.
I'll be really happy, if you fix warnings above, before to publish these models.

You received some errors:

msf_data check tests/data/models/TFF-SF/
Checking 'TFF-SF' package structure
The package 'TFF-SF' have no 'metadata.yml'.
Please fix issues above, before publishing these models.
ValueError