Data Science

Numpy’s Random Module

Understanding how to create Random Data

Himani Gulati
5 min read · Jun 8, 2022
Table of Contents:
The Random Module
Simple Random Functions
Random Distributions
Random Generator
Random Permutations

Random is a module in the NumPy library that provides random numerical data in any required shape. It contains simple functions/methods to generate random numbers, permutations, and samples from probability distributions.

In this tutorial, we will understand how to use these functions and create random data as per our needs.

These features are based on PRNG (Pseudo-Random Number Generation) algorithms. In short, a PRN generator uses mathematical formulas to produce a sequence of random-looking numbers from an arbitrary seed state. Because the sequence is fully determined by the seed, the same numbers can be reproduced at one’s convenience.

Importing Numpy

import numpy as np

After importing NumPy, we can directly call the random module and then the function we want to use. Let’s get into these functions:

1) numpy.random.rand():

  • This function generates random numbers in the desired shape, which is given by its arguments.
  • Syntax: numpy.random.rand(d0, d1, ..., dn).
  • It returns values from a uniform distribution over the half-open interval [0, 1). If no argument is provided, the function returns a single float value.

Uniform Distribution: A distribution in which every value is equally likely, for example, the outcome of rolling a fair die.
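
To make the shape arguments concrete, here is a minimal sketch of rand() with and without dimensions. The variable names are only illustrative, and the printed values differ on every run because no seed is set.

```python
import numpy as np

single = np.random.rand()        # no arguments: a single float in [0, 1)
vector = np.random.rand(5)       # 1-D array with 5 values
matrix = np.random.rand(2, 3)    # 2-D array with 2 rows and 3 columns

print(single)
print(vector.shape)   # (5,)
print(matrix.shape)   # (2, 3)
```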


2) numpy.random.randn():

  • randn() is similar to rand() and returns an array of the given dimensions, except that the random floats are sampled from a Standard Normal Distribution.
  • The Standard Normal Distribution is the Gaussian (Normal) Distribution with a mean of 0 and a variance of 1.
  • The dimensions should be integers. Older NumPy documentation notes that float arguments are first converted to int by truncation, but recent versions may reject floats, so it is safer to pass ints explicitly.
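
A short sketch of randn() follows. The large sample is there only to check that the mean and standard deviation land near 0 and 1; the sample size of 100,000 is an arbitrary choice for illustration.

```python
import numpy as np

samples = np.random.randn(3, 4)     # 3x4 array of standard normal floats
large = np.random.randn(100_000)    # many draws, to inspect the statistics

print(samples.shape)                # (3, 4)
print(large.mean(), large.std())    # close to 0 and 1, varies slightly per run
```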

3) numpy.random.randint():

  • This function takes a low and a high argument and returns random integers from the half-open interval [low, high), i.e. low is inclusive and high is exclusive.
  • You can specify the shape of the required array in the size argument.
  • If high is None, the results are drawn from [0, low).
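
As a quick illustration of the half-open interval, the sketch below simulates die rolls and also shows the single-argument form. The names die_rolls and below_five are made up for this example.

```python
import numpy as np

# Integers from 1 to 6: the upper bound 7 is excluded
die_rolls = np.random.randint(1, 7, size=10)

# Only one bound given, so it is treated as the exclusive upper limit: [0, 5)
below_five = np.random.randint(5, size=(2, 3))

print(die_rolls)
print(below_five)
```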

4) numpy.random.random() & random_sample():

  • The two functions above have the same functionality and can be used interchangeably.
  • They return random floats from the continuous uniform distribution over the half-open interval [0.0, 1.0).
  • You can specify the optional size argument to return an array of a specific size.

Note: Along with these two, numpy.random.sample() and numpy.random.ranf() return the same results.
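
Here is a minimal sketch of both functions. The rescaling step at the end is a common pattern for moving samples from [0.0, 1.0) to another interval; the bounds 5.0 and 10.0 are arbitrary values chosen for illustration.

```python
import numpy as np

a = np.random.random(size=3)               # 3 floats in [0.0, 1.0)
b = np.random.random_sample(size=(2, 2))   # same behaviour, different name

# Rescale to [low, high) with: low + (high - low) * sample
low, high = 5.0, 10.0
scaled = low + (high - low) * np.random.random(size=4)

print(a)
print(b)
print(scaled)
```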

5) numpy.random.choice():

  • The choice function generates a random sample from a given 1-D array, passed as the argument a.
  • You can use the size argument to specify the output shape.
  • If a is an array, the output values are drawn from that array. If a is an integer, the values are drawn from numpy.arange(a).

You can check out the functionality of np.arange() here: https://numpy.org/doc/stable/reference/generated/numpy.arange.html?highlight=arange#numpy-arange

  • This function also provides the boolean replace argument, which is True by default. This means that a value of the input array a can be selected multiple times.
  • The p argument holds the probabilities associated with each entry in a, i.e. you can state the probability of selecting each item in a. If p is not given, all entries are equally likely.

Note: Make sure that the probabilities you provide for each entry in a have the same size as a and sum up to 1.
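
The arguments of choice() can be sketched as follows. The colors array and the probabilities are invented purely for illustration.

```python
import numpy as np

colors = np.array(["red", "green", "blue"])

# Default replace=True: the same item may appear more than once
picks = np.random.choice(colors, size=5)

# replace=False: each item is selected at most once
unique_picks = np.random.choice(colors, size=3, replace=False)

# Integer input: values are drawn from np.arange(10)
from_range = np.random.choice(10, size=4)

# p gives one probability per entry of colors and sums to 1
weighted = np.random.choice(colors, size=1000, p=[0.7, 0.2, 0.1])

print(picks)
print(unique_picks)
print(from_range)
print((weighted == "red").mean())   # roughly 0.7
```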

6) numpy.random.bytes():

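  • This function simply returns random bytes of the length given in the argument.
  • Syntax: numpy.random.bytes(length)
  • This is handy for generating random binary test data. Note that NumPy’s generator is not cryptographically secure, so for API keys and passwords prefer Python’s secrets module.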
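
A small sketch of bytes() is shown below. The length of 16 is an arbitrary choice, and the hex() call is just one convenient way to print the result.

```python
import numpy as np

token = np.random.bytes(16)   # 16 random bytes as a Python bytes object

print(type(token))
print(len(token))             # 16
print(token.hex())            # hexadecimal view, differs on every run
```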

Random Distributions

Random provides a myriad of methods to access probability distributions. You can access the whole list of methods provided by random here.

Let’s look at a few examples:

Normal Distribution

  • numpy.random.normal(loc, scale, size)
  • As the name suggests, this function will return random samples from a normal (Gaussian) Distribution.
  • loc: Mean, scale: Std Deviation, size: Output shape
  • Read more about Normal Distributions
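
As a rough sketch, the parameters map onto a call like the one below. The mean of 170 and standard deviation of 10 are hypothetical values (think of heights in centimetres), not figures from the original notebook.

```python
import numpy as np

# Hypothetical example: loc is the mean, scale is the standard deviation
heights = np.random.normal(loc=170, scale=10, size=100_000)

print(heights.shape)    # (100000,)
print(heights.mean())   # close to 170
print(heights.std())    # close to 10
```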

Binomial Distribution

  • numpy.random.binomial(n,p,size)
  • n: Number of Trials at a time
  • p: Probability of Success
  • Read more about Binomial Distribution

Let’s simulate the number of heads obtained when a fair coin (probability = 0.5) is tossed 100 times.
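
One way to sketch this coin-toss experiment is shown here. Repeating the experiment 1000 times via size is an assumption made for illustration; each entry of the result is the number of heads in one run of 100 tosses.

```python
import numpy as np

# n=100 tosses per experiment, p=0.5 chance of heads,
# size=1000 repeats the whole experiment 1000 times
heads = np.random.binomial(n=100, p=0.5, size=1000)

print(heads[:10])      # head counts of the first 10 experiments
print(heads.mean())    # close to n * p = 50
```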

Uniform Distribution

  • numpy.random.uniform(low, high, size)
  • This function draws samples from the uniform distribution over the half-open interval [low, high), bounded by the low and high arguments.
  • Hence, any value between low and high is equally likely to be chosen.
  • By default, low is 0 and high is 1.
  • Read more about Uniform Distribution.
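
A brief sketch of uniform() with its defaults and with explicit bounds follows. The bounds of -5.0 and 5.0 are arbitrary illustration values.

```python
import numpy as np

default_draw = np.random.uniform()    # defaults: low=0.0, high=1.0
temps = np.random.uniform(low=-5.0, high=5.0, size=(2, 3))

print(default_draw)
print(temps)
```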

Random Generator

RandomState()

  • numpy.random.RandomState(seed)
  • RandomState is a class that acts as a container, providing access to a wide variety of probability distributions using the Mersenne Twister PRNG.
  • In simple words, RandomState uses the Mersenne Twister Pseudo-Random Number Generator just like Python’s built-in random module; the main difference is that RandomState is NumPy-aware and provides a much larger number of probability distributions to choose from.

You can access the entire list of Methods provided by RandomState here: https://numpy.org/doc/1.16/reference/generated/numpy.random.RandomState.html#numpy-random-randomstate
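
To illustrate, the sketch below creates two independent RandomState objects with the same seed; the seed value 42 is arbitrary. Each object keeps its own state, so the two produce identical sequences without touching the global generator.

```python
import numpy as np

rng_a = np.random.RandomState(42)
rng_b = np.random.RandomState(42)

a = rng_a.normal(loc=0.0, scale=1.0, size=3)
b = rng_b.normal(loc=0.0, scale=1.0, size=3)

print(np.array_equal(a, b))   # True: same seed, same sequence

# The object exposes the same methods as the module-level functions
ints = rng_a.randint(0, 10, size=5)
print(ints)
```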

Seed

  • np.random.seed(seed)
  • The random number generator needs a number to start with, which we fix using seed.
  • This method is used when we want to reproduce the same set of numbers.
  • We can set a seed before generating random numbers, and with the same seed value, exactly the same set of numbers will be generated every time.

Let’s look at an example:
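
A minimal sketch, using 0 as an arbitrary seed value: re-seeding restarts the sequence, while consecutive calls after a single seed continue it.

```python
import numpy as np

np.random.seed(0)
first = np.random.rand(3)
second = np.random.rand(3)    # continues the sequence, so it differs from first

np.random.seed(0)             # reset to the same starting state
repeat = np.random.rand(3)

print(np.array_equal(first, repeat))   # True
print(np.array_equal(first, second))   # False
```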

Typically, a seed is set once at the beginning of a notebook. The random calls that follow still produce different sequences from one another, yet each call produces the same sequence every time you re-run the notebook.

Permutation

The Random module provides the following functions for permutations:

Shuffle

  • numpy.random.shuffle(x), where x: Array
  • This function shuffles the array provided in the argument in place, i.e. it modifies x itself and returns None.

Permutation

  • numpy.random.permutation(x), where x: int or array.
  • This function returns a shuffled copy of the array and leaves the original unchanged. If x is an integer, the function first creates np.arange(x) and then shuffles that array.

Note: In the case of a multidimensional array, the arrays are shuffled along the first axis. (For both permutation functions)
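
The difference between the two functions, and the first-axis behaviour, can be sketched like this. The array contents are arbitrary illustration values.

```python
import numpy as np

arr = np.arange(10)
np.random.shuffle(arr)            # shuffles arr in place and returns None
print(arr)

original = np.arange(10)
shuffled_copy = np.random.permutation(original)   # original stays unchanged
from_int = np.random.permutation(5)               # a shuffled np.arange(5)
print(original)
print(shuffled_copy)
print(from_int)

grid = np.arange(9).reshape(3, 3)
np.random.shuffle(grid)           # rows change order; each row keeps its contents
print(grid)
```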

Conclusion

Numpy’s Random module is suitable when there is a need for a huge number of random numbers especially when the same sequence is needed repeatedly. In this tutorial, we learned examples of numpy.random’s:

  • rand()
  • randn()
  • randint()
  • random(), sample()
  • choice()
  • bytes()

You can find all the necessary code in this notebook: https://jovian.ai/himani007/numpy-random-module

Written by Himani Gulati

Here to share some views and gather some insights. Find me here: https://www.linkedin.com/in/himani-gulati-958b3119a/