As part of a data analytics course project at IIITB, a team was formed and a group project was our goal. The team was spread out (3 in Bangalore and 1 in the US), so I thought the best option for a collaborative project was to use notebooks. Though I had worked with Jupyter Notebooks in a limited way, I had never tried them with a public-facing endpoint where any user with the right credentials can log in and work.
Below are the steps to enable Jupyter Notebooks for a shared, collaborative analytics experience.
Note: We did not use JupyterHub for a multi-user server. Though it would be ideal, we wanted something simple and fast to set up for our project.
Create a virtual machine using the Azure Portal (or any public cloud, or a machine with a public endpoint)
4 Core, 16 GB RAM, Linux OR
4 Core, 16 GB RAM, Windows 2012
Though base R and RStudio are not required for this, we installed them nevertheless for local server testing and validation.
Installed RStudio.
Conda is a package and environment management system.
Using Conda, multiple versions of R / Python can run side by side without impacting each other's environments.
Conda can be installed using either Anaconda or Miniconda.
We installed Conda with Miniconda.
As the default Conda package set does not include an R environment, we created a new R environment using R Essentials
After installation is complete, open a command prompt and run the command below to create the R environment
conda install -c r r-essentials=1.5.2
This command takes some time to install the R packages.
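Once the install finishes, one way to confirm that an R kernel was registered with Jupyter is to look at the output of `jupyter kernelspec list --json`. The sketch below is only an illustration: the function names are made up, and it assumes the R kernel is registered under the name `ir` (the usual IRkernel name), which may differ on your setup.

```python
import json
import shutil
import subprocess


def r_kernel_installed(kernelspec_json, kernel_name="ir"):
    """Return True if the kernelspec listing (JSON text) contains kernel_name.

    Assumption: the R kernel is registered as "ir"; pass another name if not.
    """
    specs = json.loads(kernelspec_json).get("kernelspecs", {})
    return kernel_name in specs


def check_local_install():
    """Query the local Jupyter install; returns None if jupyter is not on PATH."""
    if shutil.which("jupyter") is None:
        return None
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return r_kernel_installed(result.stdout)
```

Calling `check_local_install()` from the same command prompt where conda is active should return True when the R kernel is visible to Jupyter.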
Test it by running the command below to see if Jupyter Notebook is properly installed
jupyter notebook
It should open a browser at http://localhost:8888/tree where we will be able to create an R notebook.
The next steps enable the notebook server for public access.
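On a VM without a desktop browser, the check above may not be practical. A minimal sketch of an alternative, assuming the server is listening on the default port 8888: try a plain TCP connection and see whether anything answers. The helper name `port_open` is made up for this post.

```python
import socket


def port_open(host="localhost", port=8888, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

This only tells us that something is listening on the port, not that it is Jupyter. The same helper can be pointed at the VM's public address from a teammate's machine once public access is configured.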
- Open a command prompt and type the command below to create a new configuration file
- jupyter notebook --generate-config
- It prints the path where the configuration file is stored.
- Run the command below to create a password
- jupyter notebook password
- As with the previous command, it returns the path where the password hash is stored, which is in the same directory as the configuration file.
- In the configuration file, make the changes below:
- Search for c.NotebookApp.password, remove the # (uncomment) from the front and set it to the hash from the password file created above.
- Search for c.NotebookApp.ip, remove the # from the front and set it to '*' if users can connect from anywhere. As we are opening the server for public access with only a password, the recommendation is to use a very strong password.
- Search for c.NotebookApp.allow_origin, remove the # and set it to '*'.
- Search for c.NotebookApp.port, remove the # and set it to 8888
- On the local server, enable firewall exceptions so the notebook web layer and kernel servers can communicate. We opened all ports for 127.0.0.1, both incoming and outgoing. On a public cloud, the Network Security Group settings also have to be configured to allow traffic on port 8888.
- Shut down and restart Jupyter Notebook, and we are set to collaborate on the data science project.
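A quick way to double-check the configuration edits above is a small script that reports which of the four settings are still commented out. This is only a sketch: the function names and the sample text are made up for illustration, and the hash shown is a placeholder rather than a real value.

```python
import re

# Settings the steps above ask us to uncomment and set.
REQUIRED = ("password", "ip", "allow_origin", "port")


def active_settings(config_text):
    """Return {name: raw value} for uncommented c.NotebookApp.<name> lines."""
    pattern = re.compile(r"^\s*c\.NotebookApp\.(\w+)\s*=\s*(.+?)\s*$")
    found = {}
    for line in config_text.splitlines():
        match = pattern.match(line)
        if match:
            found[match.group(1)] = match.group(2)
    return found


def missing_settings(config_text):
    """List the required settings that are still commented out or absent."""
    found = active_settings(config_text)
    return [name for name in REQUIRED if name not in found]


# Hypothetical excerpt of an edited config file; the hash is a placeholder.
SAMPLE = """
# c.NotebookApp.open_browser = True
c.NotebookApp.password = '<hash copied from the password file>'
c.NotebookApp.ip = '*'
c.NotebookApp.allow_origin = '*'
#c.NotebookApp.port = 8888
"""

if __name__ == "__main__":
    # The port line in SAMPLE still starts with '#', so it is reported.
    print(missing_settings(SAMPLE))
```

To run it against the real file, read the text from the path printed by the generate-config step and pass it to `missing_settings`; an empty list means all four lines are active.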
While configuring, we realized that there is a lot more to Conda (Anaconda / Miniconda), and that we would have to delve much deeper into how these things work internally and the best practices to follow for a large-scale deployment. But for now we are quite happy to have gotten this started in a matter of 30 minutes.