fastgplearn package

Subpackages

Submodules

fastgplearn.gp module

fastgplearn.gp.choice(a, size=None, replace=True, p=None)

Generates a random sample from a given 1-D array

New in NumPy version 1.7.0.

Note

New code should use the choice method of a default_rng() instance instead; see the NumPy random quick-start guide.

Parameters
  • a (1-D array-like or int) – If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated as if it were np.arange(a)

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.

  • replace (boolean, optional) – Whether the sample is with or without replacement. Default is True, meaning that a value of a can be selected multiple times.

  • p (1-D array-like, optional) – The probabilities associated with each entry in a. If not given, the sample assumes a uniform distribution over all entries in a.

Returns

samples – The generated random samples

Return type

single item or ndarray

Raises

ValueError – If a is an int and less than zero, if a or p are not 1-dimensional, if a is an array-like of size 0, if p is not a vector of probabilities, if a and p have different lengths, or if replace=False and the sample size is greater than the population size

See also

randint, shuffle, permutation

Generator.choice

which should be used in new code

Notes

Setting user-specified probabilities through p uses a more general but less efficient sampler than the default. The general sampler produces a different sample than the optimized sampler even if each element of p is 1 / len(a).

Sampling random rows from a 2-D array is not possible with this function, but is possible with Generator.choice through its axis keyword.
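
As a minimal sketch of the row sampling mentioned above, assuming NumPy 1.17 or later is installed, Generator.choice can draw whole rows through its axis keyword. The array name population is only illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded Generator for reproducibility

population = np.arange(12).reshape(4, 3)  # 4 rows, 3 columns

# Draw 2 distinct rows; axis=0 selects whole rows rather than single elements.
rows = rng.choice(population, size=2, replace=False, axis=0)
print(rows.shape)
```

Each sampled row is one of the original rows of population, and replace=False keeps the two rows distinct.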

Examples

Generate a uniform random sample from np.arange(5) of size 3:

>>> np.random.choice(5, 3)
array([0, 3, 4]) # random
>>> #This is equivalent to np.random.randint(0,5,3)

Generate a non-uniform random sample from np.arange(5) of size 3:

>>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
array([3, 3, 0]) # random

Generate a uniform random sample from np.arange(5) of size 3 without replacement:

>>> np.random.choice(5, 3, replace=False)
array([3,1,0]) # random
>>> #This is equivalent to np.random.permutation(np.arange(5))[:3]

Generate a non-uniform random sample from np.arange(5) of size 3 without replacement:

>>> np.random.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0])
array([2, 3, 0]) # random

Any of the above can be repeated with an arbitrary array-like instead of just integers. For instance:

>>> aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
>>> np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'], # random
      dtype='<U11')

fastgplearn.gp.crossover(pop_np, p_crossover=0.5)

Crossover.

Parameters
  • pop_np (np.ndarray) – population

  • p_crossover (float) – probability for crossover.

Returns

population with shape (n_pop,2**depth_max).

Return type

pop (np.ndarray)
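
The documentation does not describe how crossover recombines individuals. The following is a generic one-point crossover on flat lists, given only to illustrate the role of p_crossover. It is not the library's implementation, and the function name one_point_crossover is hypothetical.

```python
import random

def one_point_crossover(pop, p_crossover=0.5, rng=None):
    """Return a new population; neighbouring pairs swap tails with probability p_crossover."""
    rng = rng or random.Random()
    new_pop = [list(ind) for ind in pop]  # copy so the input population is untouched
    for i in range(0, len(new_pop) - 1, 2):
        if rng.random() < p_crossover:
            a, b = new_pop[i], new_pop[i + 1]
            cut = rng.randrange(1, len(a))  # cut point, never at position 0
            a[cut:], b[cut:] = b[cut:], a[cut:]  # swap the tails of the pair
    return new_pop

pop = [[0] * 8, [1] * 8, [2] * 8, [3] * 8]
children = one_point_crossover(pop, p_crossover=1.0, rng=random.Random(0))
```

The output keeps the shape of the input, which matches the documented return shape (n_pop, 2**depth_max).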

fastgplearn.gp.csub_science(pop, sci_template)

Warning: this modifies the initial population in place.

pyx version of the science-template substitution (see sub_science).

fastgplearn.gp.generate_random(func_num, xs_num, pop_size=10, depth_min=1, depth_max=5, p=None, func_p=None, xs_p=None)

Generate the first population. Each individual is an ordered array such as [mark,1,2,3,4,5,6,7, 103,102,100,102,102,103,102,100], made of three parts:

  1. First part: the mark, i.e. the index of the root.
  2. Second part: the indices of the x genes and f (function) genes.
  3. Third part: the protected indices of the x genes.

Parameters
  • func_num (int) – func number.

  • xs_num (int) – x number (n_fea).

  • pop_size (int) – population size.

  • depth_min (int) – min depth of expression.

  • depth_max (int) – max depth of expression.

  • p (None) – for testing only.

  • func_p (np.ndarray) – with shape (n_func,), probability of each function.

  • xs_p (np.ndarray) – with shape (n_fea,), probability of each x.

Returns

population with shape (n_pop,2**depth_max).

Return type

pop (np.ndarray)
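
The three-part layout described for each individual can be sketched as follows. This assumes, based only on the 16-element example in the description, that the first slot is the mark, the rest of the first half holds the x/f genes, and the second half holds the protected x genes. The helper split_individual is hypothetical.

```python
def split_individual(ind):
    """Split a flat individual into (mark, genes, protected_x)."""
    n = len(ind)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two, e.g. 2**depth_max")
    half = n // 2
    mark = ind[0]             # first part: mark (index of root)
    genes = ind[1:half]       # second part: x genes and f genes
    protected_x = ind[half:]  # third part: protected x genes
    return mark, genes, protected_x

# 0 stands in for the mark slot of the documented example.
ind = [0, 1, 2, 3, 4, 5, 6, 7, 103, 102, 100, 102, 102, 103, 102, 100]
mark, genes, protected_x = split_individual(ind)
```

For this example the individual has 2**4 = 16 entries: one mark, seven genes, and eight protected x genes.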

fastgplearn.gp.mutate(mutate_pop, func_num, xs_num, depth_min=1, depth_max=5, p_mutate=0.8, p=None, func_p=None, xs_p=None)

Mutate. Each individual is an ordered array such as [mark,1,2,3,4,5,6,7, 103,102,100,102,102,103,102,100], made of three parts:

  1. First part: the mark, i.e. the index of the root.
  2. Second part: the indices of the x genes and f (function) genes.
  3. Third part: the protected indices of the x genes.

Parameters
  • func_num (int) – func number.

  • mutate_pop (np.ndarray) – with shape (n_pop,2**depth_max),population.

  • xs_num (int) – x number (n_fea).

  • depth_min (int) – min depth of expression.

  • depth_max (int) – max depth of expression.

  • p (None) – for testing only.

  • func_p (np.ndarray) – with shape (n_func,), probability of each function.

  • xs_p (np.ndarray) – with shape (n_fea,), probability of each x.

  • p_mutate (float) – probability of mutation.

Returns

population with shape (n_pop,2**depth_max).

Return type

pop (np.ndarray)
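
The func_p and xs_p arguments weight how often each function or feature is drawn. A minimal sketch of such weighted sampling with the standard library is shown below. It assumes that x genes are encoded as 100 plus the feature index, which is inferred from the example individual and not confirmed by the documentation. The names X_OFFSET and sample_genes are hypothetical.

```python
import random

X_OFFSET = 100  # assumption: x genes are stored as 100 + feature index

def sample_genes(n, func_num, xs_num, func_p=None, xs_p=None, rng=None):
    """Draw n function genes and n x genes, optionally weighted by func_p / xs_p."""
    rng = rng or random.Random()
    # weights=None means a uniform draw, mirroring func_p=None / xs_p=None.
    funcs = rng.choices(range(func_num), weights=func_p, k=n)
    xs = rng.choices(range(X_OFFSET, X_OFFSET + xs_num), weights=xs_p, k=n)
    return funcs, xs

funcs, xs = sample_genes(5, func_num=3, xs_num=4,
                         func_p=[0.0, 1.0, 0.0], rng=random.Random(0))
```

With func_p=[0.0, 1.0, 0.0] only function index 1 can be drawn, while the x genes stay uniform over 100 to 103.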

fastgplearn.gp.mutate_random(pop_np, func_num, xs_num, pop_size=10, depth_min=1, depth_max=5, p_mutate=0.8, p=None, func_p=None, xs_p=None)

Mutate. Each individual is an ordered array such as [mark,1,2,3,4,5,6,7, 103,102,100,102,102,103,102,100], made of three parts:

  1. First part: the mark, i.e. the index of the root.
  2. Second part: the indices of the x genes and f (function) genes.
  3. Third part: the protected indices of the x genes.

Parameters
  • func_num (int) – func number.

  • pop_size (int) – population size.

  • pop_np (np.ndarray) – with shape (n_pop,2**depth_max),population.

  • xs_num (int) – x number (n_fea).

  • depth_min (int) – min depth of expression.

  • depth_max (int) – max depth of expression.

  • p (None) – for testing only.

  • func_p (np.ndarray) – with shape (n_func,), probability of each function.

  • xs_p (np.ndarray) – with shape (n_fea,), probability of each x.

  • p_mutate (float) – probability of mutation.

Returns

population with shape (n_pop,2**depth_max).

Return type

pop (np.ndarray)

fastgplearn.gp.mutate_sci(func_num, xs_num, pop_size=10, depth_min=1, depth_max=5, p=None, func_p=None, xs_p=None, sci_template=None)

Mutate. Each individual is an ordered array such as [mark,1,2,3,4,5,6,7, 103,102,100,102,102,103,102,100], made of three parts:

  1. First part: the mark, i.e. the index of the root.
  2. Second part: the indices of the x genes and f (function) genes.
  3. Third part: the protected indices of the x genes.

Parameters
  • func_num (int) – func number.

  • pop_size (int) – population size.

  • xs_num (int) – x number (n_fea).

  • depth_min (int) – min depth of expression.

  • depth_max (int) – max depth of expression.

  • p (None) – for testing only.

  • func_p (np.ndarray) – with shape (n_func,), probability of each function.

  • xs_p (np.ndarray) – with shape (n_fea,), probability of each x.

  • sci_template (list of list) – the science expression templates.

Returns

population with shape (n_pop,2**depth_max).

Return type

pop (np.ndarray)

fastgplearn.gp.select_index(score, num_percent=0.3, method='tournament', tour_num=3)

Selection.

Parameters
  • score (np.ndarray) – scores with shape (n_res,).

  • num_percent (int,float) – number (int) or fraction (float) of the population to select.

  • method (str) – "tournament" or "k_best".

  • tour_num (int) – tournament size.

Returns

index of selection to population.

Return type

index (np.ndarray)
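
The two selection methods can be sketched in plain Python as below. This is an illustration under stated assumptions, not the library's code: a higher score is taken to be better, a float num_percent is read as a fraction of the population, and an int as an absolute count. The name select_index_sketch is hypothetical.

```python
import random

def select_index_sketch(score, num_percent=0.3, method="tournament", tour_num=3, rng=None):
    """Return a list of selected indices into the population."""
    rng = rng or random.Random()
    n = len(score)
    # A float is read as a fraction of the population, an int as an absolute count.
    if isinstance(num_percent, float):
        k = max(1, round(n * num_percent))
    else:
        k = int(num_percent)
    if method == "k_best":
        # Keep the k highest-scoring individuals.
        return sorted(range(n), key=lambda i: score[i], reverse=True)[:k]
    if method == "tournament":
        picks = []
        for _ in range(k):
            # Draw tour_num random contenders and keep the best of them.
            contenders = rng.sample(range(n), min(tour_num, n))
            picks.append(max(contenders, key=lambda i: score[i]))
        return picks
    raise ValueError("method must be 'tournament' or 'k_best'")

score = [0.1, 0.9, 0.5, 0.7, 0.3]
best_two = select_index_sketch(score, num_percent=2, method="k_best")
```

Tournament selection may return the same index more than once, which keeps some diversity pressure while still favouring high scores. k_best is deterministic.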

fastgplearn.gp.set_seed(seed)

Set random seed.

fastgplearn.gp.sub_re_hall99(inds, func_num, xs_num)

Substitute the 99 entries in the hall individuals.

fastgplearn.gp.sub_science(pop, sci_template)

Warning: this modifies the initial population in place.

Science-template substitution.

fastgplearn.sci_formula module

fastgplearn.skflow module

class fastgplearn.skflow.SymbolicClassifier(population_size=10000, generations=20, stopping_criteria=0.95, store_of_fame=50, hall_of_fame=3, store=False, p_mutate=0.2, p_crossover=0.5, select_method='tournament', tournament_size=5, device='cpu', sci_template=None, constant_range=None, constants=None, depth=(2, 4), function_set=('add', 'sub', 'mul', 'div', 'pow2', 'pow3'), n_jobs=1, verbose=0, random_state=None, method_backend='p_numpy', func_p=None)

Bases: fastgplearn.skflow.SymbolicEstimator

A Genetic Programming symbolic classifier.

A symbolic classifier is an estimator that begins by building a population of naive random formulas to represent a relationship. The formulas are represented as tree-like structures with mathematical functions being recursively applied to variables and constants. Each successive generation of programs is then evolved from the one that came before it by selecting the fittest individuals from the population to undergo genetic operations such as crossover, mutation or reproduction.

The default score used to find expressions is accuracy.

Examples:

>>> from fastgplearn.skflow import SymbolicClassifier
>>> est_gp = SymbolicClassifier(population_size=5000,
...                     generations=20, stopping_criteria=0.95,
...                     p_crossover=0.7, p_mutate=0.1,
...                     verbose=1, random_state=0)
>>> est_gp.fit(X_train, y_train)
>>> est_gp.top_n()
>>> test_score = est_gp.score(X_test, y_test)

Parameters
  • population_size (int) – population size, default 10000.

  • generations (int) – number of generations, default 20.

  • tournament_size (int) – tournament size for selection.

  • stopping_criteria (float) – stopping threshold for the correlation score, max 1.0.

  • constant_range (tuple) – tuple of floats giving the range of constants, e.g. constant_range=(0, 1.0).

  • constants (tuple) – tuple of floats, e.g. constants=(-1, 1, 2, 10). If given, constant_range is ignored.

  • depth (tuple) – default (2, 4). The maximum depth must not exceed 8.

  • function_set (tuple) – tuple of str. Options: ("add", "sub", "mul", "div", "max", "min", "ln", "exp", "pow2", "pow3", "rec", "sin", "cos").

  • n_jobs (int) – number of parallel jobs.

  • verbose (bool) – whether to print messages.

  • p_mutate (float) – mutation probability.

  • p_crossover (float) – crossover probability.

  • random_state (int) – random state.

  • hall_of_fame (int) – number of hall-of-fame individuals added to the next generation.

  • store_of_fame (int) – number of hall-of-fame individuals returned as results.

  • method_backend (str) – one of "p_numpy", "c_numpy", "p_torch", "c_torch".

  • device (str) – "cpu" (default) or e.g. "cuda:0"; only used by the torch backends.

  • func_p (np.ndarray,tuple) – with shape (n_function,), probability values of each function.

  • sci_template (str,list) – None, "default", or a user-defined list of templates. Default None.

best_expression(scoring='accuracy')

Print the best expression.

static cla(pre_y)

Classification tool.

fit(X: numpy.ndarray, y: numpy.ndarray, xs_p: Optional[numpy.ndarray] = None, x_label=None)

Fitting.

Parameters
  • X (np.ndarray) – with shape (n_sample,n_fea).

  • y (np.ndarray) – with shape (n_sample,).

  • xs_p (np.ndarray) – with shape (n_fea,), probability values of each xi.

  • x_label (np.ndarray) – with shape (n_fea,), names of xi.

predict(X, y=None, n=0)

Return the real predicted y.

Parameters
  • X (np.ndarray) – array-like of shape (n_samples, n_features). Input vectors, where n_samples is the number of samples and n_features is the number of features.

  • y (np.ndarray) – array-like of shape (n_samples,).

  • n (int) – use the n-th expression (index 0 is the best).

Returns

array-like of shape (n_samples,).

Return type

y (np.ndarray)

score(X, y, scoring='accuracy', n=0)

Return the mean accuracy on the given test data and labels.

Parameters
  • X (np.ndarray) – array-like of shape (n_samples, n_features).

  • y (np.ndarray) – array-like of shape (n_samples,).

  • scoring (str) – see also sklearn.metrics.

  • n (int) – use the n-th expression (index 0 is the best).

Returns

Mean accuracy of self.predict(X) with respect to y.

Return type

score (float)

single_coef_logistic(X, y)

Fitting by sklearn.linear_model.LogisticRegression.

top_n(n=0, scoring='accuracy')

Print the top n results. The best one is at index 0.

Parameters
  • scoring (str) – see also sklearn.metrics.

  • n (int) – use the n-th expression (index 0 is the best).

class fastgplearn.skflow.SymbolicEstimator(population_size=10000, generations=20, stopping_criteria=0.95, store_of_fame=50, hall_of_fame=3, store=False, p_mutate=0.2, p_crossover=0.5, select_method='tournament', tournament_size=5, device='cpu', sci_template=None, constant_range=None, constants=None, depth=(2, 5), function_set=('add', 'sub', 'mul', 'div', 'pow2', 'pow3'), n_jobs=1, verbose=0, random_state=None, method_backend='p_numpy', func_p=None)

Bases: sklearn.base.BaseEstimator, abc.ABC

Parameters
  • population_size (int) – population size, default 10000.

  • generations (int) – number of generations, default 20.

  • tournament_size (int) – tournament size for selection.

  • stopping_criteria (float) – stopping threshold for the correlation score, max 1.0.

  • constant_range (tuple) – tuple of floats giving the range of constants, e.g. constant_range=(0, 1.0).

  • constants (tuple) – tuple of floats, e.g. constants=(-1, 1, 2, 10). If given, constant_range is ignored.

  • depth (tuple) – default (2, 5). The maximum depth must not exceed 8.

  • function_set (tuple) – tuple of str. Options: ("add", "sub", "mul", "div", "max", "min", "ln", "exp", "pow2", "pow3", "rec", "sin", "cos").

  • n_jobs (int) – number of parallel jobs.

  • verbose (bool) – whether to print messages.

  • p_mutate (float) – mutation probability.

  • p_crossover (float) – crossover probability.

  • random_state (int) – random state.

  • hall_of_fame (int) – number of hall-of-fame individuals added to the next generation.

  • store_of_fame (int) – number of hall-of-fame individuals returned as results.

  • method_backend (str) – one of "p_numpy", "c_numpy", "p_torch", "c_torch".

  • device (str) – "cpu" (default) or e.g. "cuda:0"; only used by the torch backends.

  • func_p (np.ndarray,tuple) – with shape (n_function,), probability values of each function.

  • sci_template (str,list) – None, "default", or a user-defined list of templates. Default None.

filter_sci_perset(sci_template)

Get the available science templates.

fit(X: numpy.ndarray, y: numpy.ndarray, xs_p: numpy.ndarray = None, x_label=None)

Fitting.

Parameters
  • X (np.ndarray) – with shape (n_sample,n_fea).

  • y (np.ndarray) – with shape (n_sample,).

  • xs_p (np.ndarray) – with shape (n_fea,), probability values of each xi.

  • x_label (np.ndarray) – with shape (n_fea,), names of xi.

abstract predict(X, y=None, n=0)

Return the real predicted y.

refresh_xcs()

Refresh X and constant for each generation.

refresh_xcs_more()

Refresh X and constant for each generation for torch.

run_gp()

Run the GP processing.

score(X, y, scoring, n=0)

Score.

single_cal(n, new_x=None, with_coef=True)

Get the temporary predicted y of the n-th expression (without coef and intercept). This is not the final result!

single_name(n)

Get the name of the n-th expression.

class fastgplearn.skflow.SymbolicRegressor(population_size=10000, generations=20, stopping_criteria=0.95, store_of_fame=50, hall_of_fame=3, store=False, p_mutate=0.2, p_crossover=0.5, select_method='tournament', tournament_size=5, constant_range=None, constants=None, depth=(2, 4), function_set=('add', 'sub', 'mul', 'div', 'pow2', 'pow3'), sci_template=None, device='cpu', n_jobs=1, verbose=0, random_state=None, method_backend='p_numpy', func_p=None)

Bases: fastgplearn.skflow.SymbolicEstimator

A Genetic Programming symbolic regressor.

A symbolic regressor is an estimator that begins by building a population of naive random formulas to represent a relationship. The formulas are represented as tree-like structures with mathematical functions being recursively applied to variables and constants. Each successive generation of programs is then evolved from the one that came before it by selecting the fittest individuals from the population to undergo genetic operations such as crossover, mutation or reproduction.

The default score used to find expressions is R (the correlation coefficient), so the final score needs to be calculated separately.
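
Pearson's R is unchanged by a positive linear rescaling of the prediction, so an expression can reach R close to 1 while its raw output is still far from y. The sketch below illustrates, in plain Python, why a follow-up linear fit (compare single_coef_linear) is then needed before computing a score such as R2. The helper names pearson_r, fit_line and r2_score are hypothetical and not part of fastgplearn.

```python
from math import sqrt

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def fit_line(t, y):
    """Least-squares coef and intercept so that y is approximated by coef * t + intercept."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    coef = sum((a - mt) * (b - my) for a, b in zip(t, y)) / sum((a - mt) ** 2 for a in t)
    return coef, my - coef * mt

def r2_score(y, pred):
    """Coefficient of determination."""
    my = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, pred))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1 - ss_res / ss_tot

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0 * v * v + 2.0 for v in x]  # target: 3*x**2 + 2
raw = [v * v for v in x]            # raw output of the expression x**2

r = pearson_r(raw, y)               # correlation of the raw expression with y
r2_raw = r2_score(y, raw)           # R2 of the raw output, before any linear fit
coef, intercept = fit_line(raw, y)
r2_fit = r2_score(y, [coef * t + intercept for t in raw])
```

Here the raw expression correlates perfectly with the target, yet its R2 is negative until the coefficient and intercept are fitted.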

Examples:

>>> from fastgplearn.skflow import SymbolicRegressor
>>> est_gp = SymbolicRegressor(population_size=5000,
...                     generations=20, stopping_criteria=0.95,
...                     p_crossover=0.7, p_mutate=0.1,
...                     verbose=1, random_state=0)
>>> est_gp.fit(X_train, y_train)
>>> est_gp.top_n()
>>> test_score = est_gp.score(X_test, y_test)

Parameters
  • population_size (int) – population size, default 10000.

  • generations (int) – number of generations, default 20.

  • tournament_size (int) – tournament size for selection.

  • stopping_criteria (float) – stopping threshold for the correlation score, max 1.0.

  • constant_range (tuple) – tuple of floats giving the range of constants, e.g. constant_range=(0, 1.0).

  • constants (tuple) – tuple of floats, e.g. constants=(-1, 1, 2, 10). If given, constant_range is ignored.

  • depth (tuple) – default (2, 4). The maximum depth must not exceed 8.

  • function_set (tuple) – tuple of str. Options: ("add", "sub", "mul", "div", "max", "min", "ln", "exp", "pow2", "pow3", "rec", "sin", "cos").

  • n_jobs (int) – number of parallel jobs.

  • verbose (bool) – whether to print messages.

  • p_mutate (float) – mutation probability.

  • p_crossover (float) – crossover probability.

  • random_state (int) – random state.

  • hall_of_fame (int) – number of hall-of-fame individuals added to the next generation.

  • store_of_fame (int) – number of hall-of-fame individuals returned as results.

  • method_backend (str) – one of "p_numpy", "c_numpy", "p_torch", "c_torch".

  • device (str) – "cpu" (default) or e.g. "cuda:0"; only used by the torch backends.

  • func_p (np.ndarray) – with shape (n_function,), probability values of each function.

  • sci_template (str,list) – None, "default", or a user-defined list of templates. Default None.

best_expression(scoring='r2')

Print the best expression.

predict(X, y=None, n=0)

Return the real predicted y.

Parameters
  • X (np.ndarray) – array-like of shape (n_samples, n_features). Input vectors, where n_samples is the number of samples and n_features is the number of features.

  • y (np.ndarray) – array-like of shape (n_samples,).

  • n (int) – use the n-th expression (index 0 is the best).

Returns

array-like of shape (n_samples,).

Return type

y (np.ndarray)

score(X, y, scoring='r2', n=0)

Return the r2 score (default) on the given test data and labels.

Parameters
  • X (np.ndarray) – array-like of shape (n_samples, n_features).

  • y (np.ndarray) – array-like of shape (n_samples,).

  • scoring (str) – see also sklearn.metrics.

  • n (int) – use the n-th expression (index 0 is the best).

Returns

R2 score (by default) of self.predict(X) with respect to y.

Return type

score (float)

static single_coef_linear(X, y)

Fitting by sklearn.linear_model.LinearRegression.

top_n(n=0, scoring='r2')

Print the top n results. The best one is at index 0.

Parameters
  • scoring (str) – see also sklearn.metrics.

  • n (int) – use the n-th expression (index 0 is the best).

fastgplearn.skflow.randint(low, high=None, size=None, dtype=int)

Return random integers from low (inclusive) to high (exclusive).

Return random integers from the “discrete uniform” distribution of the specified dtype in the “half-open” interval [low, high). If high is None (the default), then results are from [0, low).

Note

New code should use the integers method of a default_rng() instance instead; see the NumPy random quick-start guide.

Parameters
  • low (int or array-like of ints) – Lowest (signed) integers to be drawn from the distribution (unless high=None, in which case this parameter is one above the highest such integer).

  • high (int or array-like of ints, optional) – If provided, one above the largest (signed) integer to be drawn from the distribution (see above for behavior if high=None). If array-like, must contain integer values

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.

  • dtype (dtype, optional) –

    Desired dtype of the result. Byteorder must be native. The default value is int.

    New in NumPy version 1.11.0.

Returns

out – size-shaped array of random integers from the appropriate distribution, or a single such random int if size not provided.

Return type

int or ndarray of ints

See also

random_integers

similar to randint, only for the closed interval [low, high], and 1 is the lowest value if high is omitted.

Generator.integers

which should be used for new code.

Examples

>>> np.random.randint(2, size=10)
array([1, 0, 0, 0, 1, 1, 0, 0, 1, 0]) # random
>>> np.random.randint(1, size=10)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

Generate a 2 x 4 array of ints between 0 and 4, inclusive:

>>> np.random.randint(5, size=(2, 4))
array([[4, 0, 2, 1], # random
       [3, 2, 2, 0]])

Generate a 1 x 3 array with 3 different upper bounds

>>> np.random.randint(1, [3, 5, 10])
array([2, 2, 9]) # random

Generate a 1 by 3 array with 3 different lower bounds

>>> np.random.randint([1, 5, 7], 10)
array([9, 8, 7]) # random

Generate a 2 by 4 array using broadcasting with dtype of uint8

>>> np.random.randint([1, 3, 5, 7], [[10], [20]], dtype=np.uint8)
array([[ 8,  6,  9,  7], # random
       [ 1, 16,  9, 12]], dtype=uint8)
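
For new code, the note above recommends Generator.integers. A minimal sketch, assuming NumPy 1.17 or later is installed:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded Generator for reproducibility

# Half-open interval [0, 5), the same convention as randint.
half_open = rng.integers(0, 5, size=(2, 4))

# endpoint=True makes the interval closed: [1, 6].
closed = rng.integers(1, 6, size=10, endpoint=True)
```

The endpoint keyword covers the closed-interval case listed under random_integers in See also.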

fastgplearn.tools module

class fastgplearn.tools.Hall(size=10)

Bases: object

Hall of Fame.

Examples:

>>> hall = Hall(size=50)
>>> hall.update(inds, gen_i, score, consts)
>>> hall[i]

best_constant()

Return the best individual’s constant for next generation.

change0()

Change the unused constants to 0.

get_share_parameter(x_num, single_start)

sort_and_hash()

Remove repeated results. The guarantee is imperfect, because different individuals can encode the same expression.

top_n(n)

Return the top n result.

update(inds, gen_i, score, consts)

Add individual.
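
The idea of a bounded hall of fame with duplicate removal can be sketched as follows. This is a simplified illustration, not the Hall class itself: it ignores gen_i and consts, assumes a higher score is better, and the class name MiniHall is hypothetical.

```python
class MiniHall:
    """Keep the best `size` distinct individuals, best first."""

    def __init__(self, size=10):
        self.size = size
        self.items = []  # list of (score, individual), sorted best first

    def update(self, inds, scores):
        # Hash individuals as tuples so exact duplicates are stored only once.
        seen = {tuple(ind) for _, ind in self.items}
        for ind, score in zip(inds, scores):
            key = tuple(ind)
            if key not in seen:
                seen.add(key)
                self.items.append((score, list(ind)))
        self.items.sort(key=lambda item: item[0], reverse=True)
        del self.items[self.size:]  # drop everything beyond the capacity

    def top_n(self, n):
        return self.items[:n]

hall = MiniHall(size=2)
hall.update([[1, 2], [1, 2], [3, 4], [5, 6]], [0.5, 0.5, 0.9, 0.1])
```

Hashing the raw individual only removes exact duplicates. Two different arrays that encode the same expression would both be kept, which is the imperfect guarantee noted for sort_and_hash.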

class fastgplearn.tools.Logs(head_msg='')

Bases: object

Log the message.

Examples:

>>> log = Logs()
>>> log.record("score:0.9")
>>> log.print()
score:0.9

print(head=False, row=True)

prints(row=True)

record(msg)

record_and_print(msg, row=False)

records(msg)

fastgplearn.tools.find_add_mask_all_merge(pop, single_start=6)