Commit f8e9be04 authored by Philip Monaco

add synthetic data generator, refactor project modules

parent 8607c26f
Pipeline #1619 passed
Showing
with 425 additions and 4 deletions
Metadata-Version: 2.1
Name: WHY
Version: 0.0.1.dev0+5059977
Summary: Explainable AI system
Home-page: https://gitlab.cci.drexel.edu/pjm363/why-senior-project
Author: Philip Monaco, Abdullah Shah, Ibrahim Elsaid, Jashanpreet Singh, William Lu, Songheng Li
License: LICENSE.md
Description: # WHY Senior Project
## Installing WHY
There are two ways to install the `WHY` package.
First, follow the [prerequisite](#prerequisites) instructions.
If you do not need to rebuild the documentation or modify the library, follow the instructions under [User Installation](#user-installation).
Otherwise, follow the instructions under [Developer Installation](#developer-installation).
### Prerequisites
The `WHY` package requires Python 3.8 or later and makes use of a number of third-party libraries. The minimum required packages are installed automatically when you install `WHY` with `pip`. Additional dependencies for developers are listed in the `requirements.txt` file. See the
It's recommended that you use `WHY` within a Python virtual environment.
Virtual environments are now included natively with Python 3 using `venv`. Instructions to create a virtual environment can be found [here](https://docs.python.org/3/library/venv.html).
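As an alternative to the command line, a virtual environment can be created from Python with the standard-library `venv` module. The sketch below is only an illustration: the helper name `create_virtualenv` and the target directory are assumptions, not part of `WHY`.

```python
import venv
from pathlib import Path


def create_virtualenv(env_dir, with_pip=True):
    """Create a virtual environment at env_dir and return its path.

    Hypothetical helper for illustration; it only wraps venv.create().
    """
    env_path = Path(env_dir)
    # with_pip=True bootstraps pip so that `pip install .` works afterwards.
    venv.create(env_path, with_pip=with_pip)
    return env_path
```

For example, `create_virtualenv(".venv")` would create the environment in a `.venv` folder, which then still has to be activated in your shell before installing `WHY`.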
Building the API documentation requires `sphinx`, which in turn requires installing `CMake`. Installation instructions can be found [here](https://cmake.org/install/).
### User Installation
First, clone the repository using the following bash command.
```bash
git clone git@gitlab.cci.drexel.edu:pjm363/why-senior-project.git
```
From the root of the repository (the folder where `setup.py` is located), install `WHY` with `pip` using the following command.
```bash
pip install .
```
### Developer Installation
Developers need an additional tool, `clang-format`, in order to run the precommit script.
Install on Ubuntu.
```bash
apt-get install clang-format
```
Install on macOS with Homebrew.
```bash
brew install clang-format
```
Install on Windows via the [installer](https://llvm.org/builds/) or with Chocolatey.
```bash
choco install llvm
```
## Building API Documentation
To build the API Documentation in HTML format for local browsing, execute the following from the root of the repository.
```
cd docs/
make html
```
This will also cause any examples contained in `examples` to be generated in the example gallery.
All documentation is also built automatically when the `./precommit.sh` script is run.
Platform: UNKNOWN
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Provides-Extra: lint
Provides-Extra: docs
Provides-Extra: all
README.md
setup.py
src/WHY.egg-info/PKG-INFO
src/WHY.egg-info/SOURCES.txt
src/WHY.egg-info/dependency_links.txt
src/WHY.egg-info/not-zip-safe
src/WHY.egg-info/requires.txt
src/WHY.egg-info/top_level.txt
\ No newline at end of file
numpy>=1.21
pandas>=1.3.5
bokeh>=2.4.2
matplotlib>=3.5.0
scikit-learn>=1.0.2
[all]
numpy>=1.21
pandas>=1.3.5
bokeh>=2.4.2
matplotlib>=3.5.0
scikit-learn>=1.0.2
black==21.12b0
isort==5.10.1
flake8==4.0.1
mypy
Sphinx
sphinx-gallery
sphinx-rtd-theme
m2r2
[docs]
Sphinx
sphinx-gallery
sphinx-rtd-theme
m2r2
[lint]
black==21.12b0
isort==5.10.1
flake8==4.0.1
mypy
File added
File added
File added
File added
File added
"""
Shared state and Bokeh input widgets for the synthetic data generator.
"""
from bokeh.models import Select, Slider, Row, Column
from bokeh.layouts import column, row

# Placeholders for the module-level state shared with the callbacks and plots.
x = 0
y = 0
spectral = 0
source = 0

datasets_names = [
    "Make Classification",
    "Multilabel Classification",
    "Blobs",
    "Noisy Moons",
    "Noisy Circles",
]

dataset_select = Select(value='Make Classification',
                        title='Select Dataset',
                        width=200,
                        options=datasets_names)

samples_slider = Slider(title="Number of samples",
                        value=1500.0,
                        start=200.0,
                        end=3000.0,
                        step=100,
                        width=400)

classes_slider = Slider(title="Number of Classes",
                        value=3,
                        start=2,
                        end=20,
                        step=1,
                        width=400)

features_slider = Slider(title="Number of Features",
                         value=3,
                         start=2,
                         end=1000,
                         step=1,
                         width=400)

# Controls n_informative, the number of informative features.
inf_slider = Slider(title='Informative Features',
                    value=3,
                    start=2,
                    end=100,
                    step=1,
                    width=400)

selects = Row(dataset_select, width=420)
inputs = Column(selects, samples_slider, classes_slider, inf_slider, features_slider)
"""
Visualization helpers for the synthetic data generator.
"""
from bokeh.models import ColumnDataSource, Select, Slider, Plot, Scatter, Row, Column
from bokeh.plotting import figure

import config as config


def vis_synthetic():
    """Build the scatter plot for the current synthetic dataset.

    Returns:
        A 400x400 Bokeh figure with a scatter glyph bound to ``config.source``.
    """
    b = figure(title="Dataset", width=400, height=400, min_border=0)
    # The glyph reads the "x", "y" and "colors" columns of the data source.
    glyph = Scatter(x="x", y="y", size=5, fill_color="colors")
    b.add_glyph(config.source, glyph)
    return b
\ No newline at end of file
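The scatter glyph takes its fill color from a `colors` column, which the callbacks in `synthetic.py` build by indexing `config.spectral` with each class label. Since the classes slider goes up to 20, a palette with fewer entries than classes would raise an `IndexError` under that lookup. A minimal sketch of one way to avoid this, assuming a six-color palette, is to wrap the index around the palette length. `PALETTE` and `labels_to_colors` are hypothetical names, and the hex values are stand-ins, not necessarily the palette the project uses.

```python
# Stand-in six-color palette; the actual contents of config.spectral
# are not shown in this commit, so these values are assumptions.
PALETTE = ["#3288bd", "#99d594", "#e6f598", "#fee08b", "#fc8d59", "#d53e4f"]


def labels_to_colors(labels, palette=PALETTE):
    """Map integer class labels to colors, wrapping around the palette.

    Labels beyond the palette length reuse colors instead of raising
    IndexError, at the cost of some classes sharing a color.
    """
    return [palette[int(label) % len(palette)] for label in labels]
```

With this helper, a label of 6 maps to the same color as label 0, so the plot stays renderable for any class count, although classes that share a color can no longer be told apart visually.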
"""
Synthetic data generation class, with callbacks and visualizations
"""
import numpy as np
from sklearn import datasets
from bokeh.models import Select, Slider, Row, Column
from bokeh.io import curdoc
from bokeh.layouts import column, row
import math
from bokeh.palettes import Spectral6
import config as config
from data_vis import vis_synthetic
class SyntheticData:
    """Class for creating a synthetic data object with default parameters."""

    def __init__(self,
                 dataset='Make Classification',
                 n_samples=1500,
                 n_features=4,
                 n_classes=3,
                 n_inf=2):
        self.dataset = dataset
        self.n_samples = n_samples
        self.n_features = n_features
        self.n_classes = n_classes
        self.n_inf = n_inf

    def generator(self):
        """Engine that creates synthetic data.

        Takes advantage of the synthetic data generators provided by sklearn.
        The five dataset types offered in the selector are supported, plus an
        unlabeled "No Structure" option.

        Returns:
            X: ndarray of shape (n_samples, n_features). The generated samples.
                Circles, moons and "No Structure" always have 2 features.
            y: ndarray of shape (n_samples,). The integer labels for class
                membership of each sample. For "Multilabel Classification"
                this is a (n_samples, n_classes) indicator matrix, and for
                "No Structure" it is None.
        """
        if self.dataset == 'Blobs':
            # sliders: samples, classes, features
            return datasets.make_blobs(n_samples=self.n_samples,
                                       centers=self.n_classes,
                                       n_features=self.n_features,
                                       random_state=8)
        elif self.dataset == 'Make Classification':
            # sliders: samples, features, informative features, classes
            return datasets.make_classification(n_samples=self.n_samples,
                                                n_features=self.n_features,
                                                n_informative=self.n_inf,
                                                n_redundant=0,
                                                n_clusters_per_class=1,
                                                n_classes=self.n_classes,
                                                random_state=8)
        elif self.dataset == 'Noisy Circles':
            # sliders: samples
            return datasets.make_circles(n_samples=self.n_samples,
                                         factor=0.5,
                                         noise=0.05)
        elif self.dataset == 'Noisy Moons':
            # sliders: samples
            return datasets.make_moons(n_samples=self.n_samples,
                                       noise=0.05)
        elif self.dataset == 'Multilabel Classification':
            # sliders: samples, features, classes
            return datasets.make_multilabel_classification(n_samples=self.n_samples,
                                                           n_features=self.n_features,
                                                           n_classes=self.n_classes,
                                                           random_state=8)
        elif self.dataset == "No Structure":
            return np.random.rand(self.n_samples, 2), None


def update_samples_or_dataset(attrname, old, new):
    """Callback function that updates samples as values are scrubbed with sliders.

    Args:
        attrname (str): Name of the widget property that changed.
        old: Previous value of the property.
        new: New value of the property.
    """
    if config.dataset_select.value == 'Blobs':
        dataset = config.dataset_select.value
        n_samples = int(config.samples_slider.value)
        n_classes = int(config.classes_slider.value)
        n_features = int(config.features_slider.value)
        data = SyntheticData(dataset, n_samples, n_features, n_classes)
        config.x, config.y = data.generator()
        colors = [config.spectral[i] for i in config.y]
        config.source.data = dict(colors=colors, x=config.x[:, 0], y=config.x[:, 1])
    elif config.dataset_select.value == 'Make Classification':
        dataset = config.dataset_select.value
        n_samples = int(config.samples_slider.value)
        n_classes = int(config.classes_slider.value)
        n_features = int(config.features_slider.value)
        n_inf = int(config.inf_slider.value)
        # Informative features cannot outnumber the total number of features.
        if n_inf > n_features:
            n_features = n_inf
            config.features_slider.update(value=n_inf)
        # With one cluster per class, sklearn requires n_classes <= 2**n_informative.
        if n_classes > 2**n_inf:
            n_inf = math.ceil(math.log2(n_classes))
            n_features = n_inf
            config.inf_slider.update(value=n_inf)
            config.features_slider.update(value=n_features)
        data = SyntheticData(dataset, n_samples, n_features, n_classes, n_inf)
        config.x, config.y = data.generator()
        colors = [config.spectral[i] for i in config.y]
        config.source.data = dict(colors=colors, x=config.x[:, 0], y=config.x[:, 1])
    elif config.dataset_select.value == 'Noisy Circles':
        dataset = config.dataset_select.value
        n_samples = int(config.samples_slider.value)
        data = SyntheticData(dataset, n_samples)
        config.x, config.y = data.generator()
        colors = [config.spectral[i] for i in config.y]
        config.source.data = dict(colors=colors, x=config.x[:, 0], y=config.x[:, 1])
    elif config.dataset_select.value == 'Noisy Moons':
        dataset = config.dataset_select.value
        n_samples = int(config.samples_slider.value)
        data = SyntheticData(dataset, n_samples)
        config.x, config.y = data.generator()
        colors = [config.spectral[i] for i in config.y]
        config.source.data = dict(colors=colors, x=config.x[:, 0], y=config.x[:, 1])
    elif config.dataset_select.value == 'Multilabel Classification':
        dataset = config.dataset_select.value
        n_samples = int(config.samples_slider.value)
        n_features = int(config.features_slider.value)
        n_classes = int(config.classes_slider.value)
        data = SyntheticData(dataset, n_samples, n_features, n_classes)
        # y here is a (n_samples, n_classes) indicator matrix, not a 1-D label vector.
        config.x, config.y = data.generator()
        colors = [config.spectral[i] for i in config.y]
        config.source.data = dict(colors=colors, x=config.x[:, 0], y=config.x[:, 1])


def update_layout(attrname, old, new):
    """Callback function that updates the sliders layout as datasets change.

    Args:
        attrname (str): Name of the widget property that changed.
        old: Previous value of the property.
        new: New value of the property.
    """
    if config.dataset_select.value in ('Blobs', 'Multilabel Classification'):
        inputs = Column(config.selects, config.samples_slider, config.classes_slider, config.features_slider)
        b = vis_synthetic()
        curdoc().clear()
        curdoc().add_root(Row(inputs, b))
    elif config.dataset_select.value == 'Make Classification':
        inputs = Column(config.selects, config.samples_slider, config.classes_slider, config.features_slider, config.inf_slider)
        b = vis_synthetic()
        curdoc().clear()
        curdoc().add_root(Row(inputs, b))
    elif config.dataset_select.value in ('Noisy Circles', 'Noisy Moons'):
        inputs = Column(config.selects, config.samples_slider)
        b = vis_synthetic()
        curdoc().clear()
        curdoc().add_root(Row(inputs, b))


# Regenerate the data whenever the dataset or a slider value changes.
config.dataset_select.on_change('value', update_samples_or_dataset)
config.samples_slider.on_change('value_throttled', update_samples_or_dataset)
config.classes_slider.on_change('value_throttled', update_samples_or_dataset)
config.features_slider.on_change('value', update_samples_or_dataset)
config.inf_slider.on_change('value', update_samples_or_dataset)
# Rebuild the widget layout when a different dataset is selected.
config.dataset_select.on_change('value', update_layout)
\ No newline at end of file
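The `Make Classification` branch of `update_samples_or_dataset` adjusts the slider values before calling the generator, because `make_classification` rejects combinations where the informative features outnumber the total features, or where `n_classes * n_clusters_per_class` exceeds `2 ** n_informative`. The following sketch isolates that adjustment as a pure function so it can be checked without a running Bokeh server. The function name `reconcile_classification_params` is hypothetical, and it mirrors the logic in the callback rather than adding new behavior.

```python
import math


def reconcile_classification_params(n_features, n_inf, n_classes):
    """Return (n_features, n_inf) adjusted the way the callback adjusts them.

    Hypothetical helper; assumes n_clusters_per_class=1 and n_redundant=0,
    as in the generator's make_classification call.
    """
    # Informative features cannot outnumber the total number of features.
    if n_inf > n_features:
        n_features = n_inf
    # make_classification needs n_classes <= 2 ** n_informative here, so
    # raise n_inf to the smallest value that satisfies the constraint.
    if n_classes > 2 ** n_inf:
        n_inf = math.ceil(math.log2(n_classes))
        # As in the callback, n_features is reset to n_inf in this case.
        n_features = n_inf
    return n_features, n_inf
```

Note that, like the callback, the second rule resets `n_features` to the new `n_inf` even when the user had selected more features. Keeping the larger of the two values would be an alternative design, but it is not what the commit does.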
@@ -3,10 +3,10 @@
 # You can set these variables from the command line, and also
 # from the environment for the first two.
-SPHINXOPTS = 
+SPHINXOPTS ?=
-SPHINXBUILD = sphinx-build
+SPHINXBUILD ?= sphinx-build
-SOURCEDIR = source
+SOURCEDIR = .
-BUILDDIR = build
+BUILDDIR = _build
 # Put it first so that "make" without argument is like "make help".
 help:
...
WHY package
===========

Submodules
----------

WHY.config module
-----------------

.. automodule:: WHY.config
   :members:
   :undoc-members:
   :show-inheritance:

WHY.data\_vis module
--------------------

.. automodule:: WHY.data_vis
   :members:
   :undoc-members:
   :show-inheritance:

WHY.synthetic module
--------------------

.. automodule:: WHY.synthetic
   :members:
   :undoc-members:
   :show-inheritance:

Module contents
---------------

.. automodule:: WHY
   :members:
   :undoc-members:
   :show-inheritance:
File added
File added
File added