by Marcos Vanetta / @malev / Continuum Analytics
PyData NYC 2015
Powered by tacos
Sponsored by Continuum Analytics
Reproducibility is the ability of an entire experiment or study to be duplicated, either by the same researcher or by someone else working independently.
A development environment considers infrastructure in terms of both hardware and software.
Automate
| pip | conda |
|---|---|
| Lots of packages | Mostly data packages |
| Mostly multi-platform | Multi-platform |
| Not so fast | Fast |
| Included in Anaconda | Included in Anaconda |
$ pip freeze > requirements.txt
$ cat requirements.txt
requests==2.8.1
virtualenv==13.0.1
wheel==0.26.0
$ virtualenv .my-env
$ source .my-env/bin/activate
(.my-env)$ pip install -r requirements.txt
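A frozen requirements file only helps reproducibility if every entry carries an exact pin. As a minimal sketch, the helper below lists the entries that lack `==`. The name `unpinned_requirements` is hypothetical (it is not part of pip), and the demo file is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pip): print requirement lines without an exact '==' pin.
unpinned_requirements() {
    # Drop blank lines and comments, then keep anything that has no '=='.
    grep -Ev '^[[:space:]]*(#|$)' "$1" | grep -v '==' || true
}

# Demo on a throwaway file so the sketch runs anywhere.
demo_file="$(mktemp)"
printf 'requests==2.8.1\nwheel==0.26.0\nflask\n' > "$demo_file"
unpinned_requirements "$demo_file"
```

Running it in CI before `pip install -r requirements.txt` is one way to catch an unpinned dependency early.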
Keep it simple
$ conda env export -n my-project -f environment.yml
$ cat environment.yml
name: my-project
dependencies:
- bokeh=0.8.0=np19py27_0
- colorama=0.3.3=py27_0
- pip:
  - flask
$ conda env create
...
$ source activate my-project
discarding /Users/mvanetta/miniconda/bin from PATH
prepending /Users/mvanetta/miniconda/envs/my-project/bin to PATH
(my-project)$
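The environment name to activate comes from the `name:` field of `environment.yml`. As a rough sketch, a script can read it from the file instead of hard-coding it. `env_name` is a hypothetical helper, not a conda command, and it assumes `name:` sits at the start of a line as in the export above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a conda command): print the value of the top-level 'name:' key.
env_name() {
    # Strip the 'name:' prefix and print only the first matching line.
    sed -n 's/^name:[[:space:]]*//p' "$1" | head -n 1
}

# Demo on a throwaway file that mirrors the export shown above.
demo_yml="$(mktemp)"
cat > "$demo_yml" <<'EOF'
name: my-project
dependencies:
- bokeh=0.8.0=np19py27_0
EOF
env_name "$demo_yml"
```

With this in place, `source activate "$(env_name environment.yml)"` keeps working if the project is renamed.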
Keep it simple
$ conda install pill deps -c malev
$ pill init
$ source pill in
$ deps install
$ source pill out
$ rm -rf .pill
$ conda create -n iris -y bokeh pandas jupyter
$ source activate iris
$ ipython notebook iris.ipynb
$ conda env attach -n iris iris.ipynb
$ anaconda notebook upload iris.ipynb
$ anaconda notebook download malev/iris
$ conda env create iris.ipynb
$ source activate iris
$ ipython notebook iris.ipynb
Attach the environment to the repo, document it, use tools like autoenv, and automate.
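One possible way to wire up autoenv, sketched under assumptions: autoenv is installed and uses its default behaviour of sourcing a file named `.env` when you `cd` into a directory, and the environment is called `my-project`. The temporary directory stands in for a real repository.

```shell
#!/usr/bin/env bash
# Stand-in for a real project checkout (assumption for this demo).
project_dir="$(mktemp -d)"

# Write the file autoenv is assumed to source on `cd` into the project.
cat > "$project_dir/.env" <<'EOF'
# Sourced by autoenv when entering this directory.
source activate my-project
EOF

cat "$project_dir/.env"
```

Committing the `.env` file next to `environment.yml` keeps the activation step in the repo alongside the environment definition.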