Tuesday, February 7, 2023

A Gentle Introduction to the tensorflow.data API

Last Updated on August 6, 2022

When you build and train a Keras deep learning model, you can provide the training data in several different ways. Presenting the data as a NumPy array or a TensorFlow tensor is common. Another way is to write a Python generator function and let the training loop read data from it. Yet another way of providing data is to use a tf.data dataset.

In this tutorial, you will see how you can use the tf.data dataset for a Keras model. After finishing this tutorial, you will learn:

  • How to create and use the tf.data dataset
  • The benefit of doing so compared to a generator function

Let’s get started.

A gentle introduction to the tensorflow.data API
Photo by Monika MG. Some rights reserved.


This article is divided into four sections; they are:

  • Training a Keras Model with NumPy Array and Generator Function
  • Creating a Dataset Using tf.data
  • Creating a Dataset from Generator Function
  • Dataset with Prefetch

Training a Keras Model with NumPy Array and Generator Function

Before you see how the tf.data API works, let’s review how you might usually train a Keras model.

First, you need a dataset. One example is the Fashion-MNIST dataset that comes with the Keras API. It has 60,000 training samples and 10,000 test samples, each a 28×28-pixel grayscale image. The corresponding classification labels are encoded as integers 0 to 9.

The dataset is a NumPy array. You can then build a Keras model for classification and pass the NumPy array as data to the model’s fit() function.

The complete code is as follows:
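Here is a minimal sketch of such a program, assuming TensorFlow 2.x with its bundled Keras. The LeNet-style network, the helper names build_model(), train_numpy() and main(), and the plotting details are illustrative assumptions, not the author’s exact listing. The labels are plain integers, so the model uses the "sparse" variants of the cross-entropy loss and accuracy metric.

```python
import numpy as np
import tensorflow as tf

layers = tf.keras.layers


def build_model():
    # LeNet-style CNN; the exact architecture is an assumption for illustration
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        layers.Reshape((28, 28, 1)),      # add the channel axis Conv2D expects
        layers.Conv2D(6, (5, 5), padding="same", activation="tanh"),
        layers.MaxPool2D(),
        layers.Conv2D(16, (5, 5), activation="tanh"),
        layers.MaxPool2D(),
        layers.Flatten(),
        layers.Dense(84, activation="tanh"),
        layers.Dense(10, activation="softmax"),  # one probability per class 0..9
    ])
    # Integer labels, hence the "sparse" loss and metric
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam",
                  metrics=["sparse_categorical_accuracy"])
    return model


def train_numpy(model, train_image, train_label, test_image, test_label, epochs=50):
    # Keras slices the NumPy arrays into mini-batches of 32 by itself
    return model.fit(train_image, train_label, batch_size=32, epochs=epochs,
                     validation_data=(test_image, test_label), verbose=0)


def main():
    import matplotlib.pyplot as plt

    (train_image, train_label), (test_image, test_label) = \
        tf.keras.datasets.fashion_mnist.load_data()
    print(train_image.shape, train_label.shape)  # (60000, 28, 28) (60000,)
    print(np.unique(train_label))                # [0 1 2 3 4 5 6 7 8 9]
    model = build_model()
    history = train_numpy(model, train_image, train_label, test_image, test_label)
    print(model.evaluate(test_image, test_label, verbose=0))
    plt.plot(history.history["val_sparse_categorical_accuracy"])
    plt.xlabel("epoch")
    plt.ylabel("validation accuracy")
    plt.show()
```

Calling main() downloads Fashion-MNIST on first use and trains for 50 epochs. The helpers also accept any arrays of the same shape, which keeps them easy to try on small stand-in data.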

Running this code will print out the following:

It will also create the following plot of validation accuracy over the 50 epochs you trained your model:

The other way of training the same network is to provide the data from a Python generator function instead of a NumPy array. A generator function is one that uses a yield statement to emit data. It is suspended after each value it emits and resumes only when the data consumer asks for the next one. A generator for the Fashion-MNIST dataset can be created as follows:
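A minimal sketch of such a generator follows, matching the call batch_generator(train_image, train_label, 32) described in the next paragraph. Skipping a trailing partial batch is a design choice made here, not something the text specifies. The toy arrays at the bottom are hypothetical stand-ins.

```python
import numpy as np


def batch_generator(image, label, batchsize):
    """Yield (images, labels) batches forever, wrapping around to the start.

    A trailing partial batch is skipped: once fewer than `batchsize`
    samples remain, the index restarts at 0.
    """
    N = len(image)
    i = 0
    while True:
        # Execution pauses here until the consumer asks for the next batch
        yield image[i:i + batchsize], label[i:i + batchsize]
        i = i + batchsize
        if i + batchsize > N:
            i = 0


# Toy demonstration with 10 tiny "images" and a batch size of 4
images = np.arange(10 * 2 * 2).reshape(10, 2, 2)
labels = np.arange(10)
gen = batch_generator(images, labels, 4)
for _ in range(3):
    _, y = next(gen)
    print(y)  # [0 1 2 3], then [4 5 6 7], then [0 1 2 3] again
```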

This function is meant to be called as batch_generator(train_image, train_label, 32). It scans the input arrays in batches indefinitely: once it reaches the end of the arrays, it restarts from the beginning.

Training a Keras model with a generator is similar to training with NumPy arrays; you still call the fit() function:
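A minimal sketch of that call, assuming TensorFlow 2.x, is shown below. The generator is a wrap-around batch generator matching the description of batch_generator() above. The small dense model and the helper names build_model() and train_with_generator() are hypothetical, chosen to keep the example short. Validation data is still passed as NumPy arrays.

```python
import numpy as np
import tensorflow as tf


def batch_generator(image, label, batchsize):
    # Wrap-around batch generator: yields full batches and never stops
    N = len(image)
    i = 0
    while True:
        yield image[i:i + batchsize], label[i:i + batchsize]
        i = i + batchsize
        if i + batchsize > N:
            i = 0


def build_model():
    # Small dense classifier (an assumption, chosen to keep the sketch short)
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam",
                  metrics=["sparse_categorical_accuracy"])
    return model


def train_with_generator(model, train_image, train_label, test_image, test_label,
                         batch_size=32, epochs=50):
    # The generator is endless, so define one epoch as one pass of full batches
    steps = len(train_image) // batch_size
    return model.fit(batch_generator(train_image, train_label, batch_size),
                     steps_per_epoch=steps, epochs=epochs,
                     validation_data=(test_image, test_label), verbose=0)
```

No data or labels are passed separately to fit(), because each item the generator yields already carries both.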

Instead of providing the data and the labels, you just provide the generator, since it gives out both. When data are presented as a NumPy array, Keras can tell how many samples there are from the length of the array, and one epoch is complete when the entire dataset has been used once. However, your generator function emits batches indefinitely, so you need to tell Keras when an epoch ends using the steps_per_epoch argument of the fit() function.

In the above code, the validation data was provided as a NumPy array, but you can use a generator instead and specify the validation_steps argument.

The following is the complete code using a generator function, in which the output is the same as in the previous example:
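Here is a self-contained sketch that uses generators for both training and validation, assuming TensorFlow 2.x. It uses a small dense model (a hypothetical stand-in for the network above, so its numbers will not match exactly). Both generators are endless, so both steps_per_epoch and validation_steps are set.

```python
import numpy as np
import tensorflow as tf


def batch_generator(image, label, batchsize):
    # Wrap-around batch generator as described above: yields full batches forever
    N = len(image)
    i = 0
    while True:
        yield image[i:i + batchsize], label[i:i + batchsize]
        i = i + batchsize
        if i + batchsize > N:
            i = 0


def build_model():
    # Small dense classifier (an assumption, chosen to keep the listing short)
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam",
                  metrics=["sparse_categorical_accuracy"])
    return model


def train(model, train_image, train_label, test_image, test_label,
          batch_size=32, epochs=50):
    # Both generators are endless, so both need an explicit number of steps
    return model.fit(
        batch_generator(train_image, train_label, batch_size),
        steps_per_epoch=len(train_image) // batch_size,
        validation_data=batch_generator(test_image, test_label, batch_size),
        validation_steps=len(test_image) // batch_size,
        epochs=epochs, verbose=0)


def main():
    import matplotlib.pyplot as plt

    (train_image, train_label), (test_image, test_label) = \
        tf.keras.datasets.fashion_mnist.load_data()
    model = build_model()
    history = train(model, train_image, train_label, test_image, test_label)
    plt.plot(history.history["val_sparse_categorical_accuracy"])
    plt.xlabel("epoch")
    plt.ylabel("validation accuracy")
    plt.show()
```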

Creating a Dataset Using tf.data

Given that you have the Fashion-MNIST data loaded, you can convert it into a tf.data dataset like the following:
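A minimal sketch of the conversion is shown below, assuming TensorFlow 2.x. The zero-filled arrays are hypothetical stand-ins with the same shapes and dtype as Fashion-MNIST. With the real data, you would use the arrays returned by fashion_mnist.load_data().

```python
import numpy as np
import tensorflow as tf

# Stand-in arrays shaped like Fashion-MNIST (assumption: 100 samples instead of 60,000)
train_image = np.zeros((100, 28, 28), dtype=np.uint8)
train_label = np.zeros((100,), dtype=np.uint8)

# Slicing a tuple of arrays along the first axis yields one (image, label) pair per element
dataset = tf.data.Dataset.from_tensor_slices((train_image, train_label))
print(dataset.element_spec)
```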

This prints the dataset’s spec as follows:

You can see that each element is a tuple (because a tuple was passed as the argument to the from_tensor_slices() function). Its first component has the shape (28,28), and its second component is a scalar. Both components are stored as 8-bit unsigned integers.

If you do not present the data as a tuple of two NumPy arrays when you create the dataset, you can also combine them later. The following creates the same dataset, but it first creates separate datasets for the image data and the labels and then combines them:
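A minimal sketch of this two-step construction follows, assuming TensorFlow 2.x. The three constant-valued images and the labels 5, 6 and 7 are hypothetical stand-ins for the Fashion-MNIST arrays.

```python
import numpy as np
import tensorflow as tf

# Stand-in data (assumption): image k is filled with the value k
train_image = np.stack([np.full((28, 28), k, dtype=np.uint8) for k in range(3)])
train_label = np.array([5, 6, 7], dtype=np.uint8)

# Build one dataset per array first...
train_image_data = tf.data.Dataset.from_tensor_slices(train_image)
train_label_data = tf.data.Dataset.from_tensor_slices(train_label)

# ...then pair them element by element, like Python's built-in zip()
dataset = tf.data.Dataset.zip((train_image_data, train_label_data))
print(dataset.element_spec)
```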

This will print the same spec:

The zip() function of the dataset works like Python’s built-in zip(): it matches elements one by one from multiple datasets into tuples.

One benefit of using the tf.data dataset is the flexibility in handling the data. Below is the complete code showing how you can train a Keras model using a dataset, in which the batch size is set on the dataset itself:
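Here is a minimal sketch, assuming TensorFlow 2.x. The small dense model and the helper names are hypothetical. The point to notice is that fit() receives neither a batch_size nor a steps_per_epoch argument: batching is done by dataset.batch(32), and the dataset is finite, so Keras knows when an epoch ends.

```python
import numpy as np
import tensorflow as tf


def build_model():
    # Small dense classifier (an assumption, chosen to keep the sketch short)
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam",
                  metrics=["sparse_categorical_accuracy"])
    return model


def train_with_dataset(model, train_image, train_label, test_image, test_label,
                       batch_size=32, epochs=50):
    dataset = tf.data.Dataset.from_tensor_slices((train_image, train_label))
    # Batching is a property of the dataset now, so fit() gets no batch_size
    return model.fit(dataset.batch(batch_size), epochs=epochs,
                     validation_data=(test_image, test_label), verbose=0)


def main():
    (train_image, train_label), (test_image, test_label) = \
        tf.keras.datasets.fashion_mnist.load_data()
    model = build_model()
    train_with_dataset(model, train_image, train_label, test_image, test_label)
    print(model.evaluate(test_image, test_label, verbose=0))
```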

This is the simplest use case of a dataset. If you dive deeper, you can see that a dataset is just an iterable. Therefore, you can print out each sample in a dataset using the following:
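A minimal sketch of iterating over a dataset follows, assuming TensorFlow 2.x with eager execution. The three zero images and the labels 9, 0 and 3 are hypothetical stand-ins.

```python
import numpy as np
import tensorflow as tf

# Three stand-in samples (assumption) instead of the full Fashion-MNIST arrays
images = np.zeros((3, 28, 28), dtype=np.uint8)
labels = np.array([9, 0, 3], dtype=np.uint8)
dataset = tf.data.Dataset.from_tensor_slices((images, labels))

# A plain for-loop pulls one (image, label) pair of eager tensors at a time
for image, label in dataset:
    print(image.shape, label.numpy())
```

On the real 60,000-sample dataset, iterating over dataset.take(5) instead of dataset keeps the printout short.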

The dataset has many functions built in. The batch() function used before is one of them. If you create batches from a dataset and print them, you have the following:
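Here is a small sketch of iterating over a batched dataset, assuming TensorFlow 2.x. The 100 zero-filled samples are hypothetical stand-ins, chosen so that the last batch is a partial one.

```python
import numpy as np
import tensorflow as tf

# 100 stand-in samples (assumption), so the last batch of 32 is incomplete
images = np.zeros((100, 28, 28), dtype=np.uint8)
labels = np.zeros((100,), dtype=np.uint8)
dataset = tf.data.Dataset.from_tensor_slices((images, labels))

for image, label in dataset.batch(32):
    # Each item stacks up to 32 samples along a new leading axis
    print(image.shape, label.shape)

sizes = [int(label.shape[0]) for _, label in dataset.batch(32)]
print(sizes)  # [32, 32, 32, 4]
```

Passing drop_remainder=True to batch() discards the final partial batch, which can help when a model needs a fixed batch size.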

Here, each item you get from the batched dataset is not a single sample but a batch of samples. The dataset also has functions such as map(), filter(), and reduce() for transforming the sequence, and concatenate() and interleave() for combining it with another dataset. There are also repeat(), take(), take_while(), and skip(), which behave like their familiar counterparts in Python’s itertools module. A full list of the functions can be found in the API documentation.
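A few of these functions are demonstrated below on small integer datasets, assuming TensorFlow 2.x. The number ranges are arbitrary illustrative choices. interleave() and take_while() are left out to keep the sketch short.

```python
import tensorflow as tf


def to_list(ds):
    # Convert a dataset of integer scalars into a plain Python list
    return [int(x) for x in ds.as_numpy_iterator()]


numbers = tf.data.Dataset.range(10)  # 0, 1, ..., 9 as int64 scalars

# filter() keeps matching elements; map() transforms each element
evens_squared = numbers.filter(lambda x: tf.equal(x % 2, 0)).map(lambda x: x * x)
print(to_list(evens_squared))  # [0, 4, 16, 36, 64]

# reduce() folds the whole dataset into a single value, here a running sum
total = numbers.reduce(tf.constant(0, dtype=tf.int64), lambda acc, x: acc + x)
print(int(total))  # 45

# concatenate() appends another dataset with a compatible element spec
joined = tf.data.Dataset.range(3).concatenate(tf.data.Dataset.range(10, 13))
print(to_list(joined))  # [0, 1, 2, 10, 11, 12]

# skip() and take() slice the sequence; repeat() replays it
print(to_list(numbers.skip(2).take(3)))             # [2, 3, 4]
print(to_list(tf.data.Dataset.range(3).repeat(2)))  # [0, 1, 2, 0, 1, 2]
```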

Creating a Dataset from Generator Function

So far, you have seen how a dataset can be used in place of a NumPy array to train a Keras model. A dataset can also be created from a generator function. However, instead of a generator function that produces a batch, as in one of the examples above, you now write a generator function that produces one sample at a time. The following is the function:
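Here is a minimal sketch consistent with the description in the next paragraph. The name shuffle_generator, the seed argument, and the use of NumPy’s default_rng are assumptions. The seed is wrapped in int() so that the same function also works when it arrives as a NumPy scalar.

```python
import numpy as np


def shuffle_generator(image, label, seed):
    # Shuffle an index vector instead of copying the (possibly large) arrays
    idx = np.arange(len(image))
    np.random.default_rng(int(seed)).shuffle(idx)
    # Emit one (image, label) sample at a time; the generator ends after one pass
    for i in idx:
        yield image[i], label[i]


# Toy demonstration: every label appears exactly once, in shuffled order
toy_images = np.zeros((10, 28, 28), dtype=np.uint8)
toy_labels = np.arange(10)
order = [int(lbl) for _, lbl in shuffle_generator(toy_images, toy_labels, 42)]
print(order)
```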

This function randomizes the input order by shuffling an index vector. It then yields one sample at a time. Unlike the previous example, this generator ends when the samples in the array are exhausted.

You can create a dataset from this function using from_generator(). You need to provide the generator function itself (a callable, rather than an instantiated generator) and also the output signature of the dataset. The signature is required because the tf.data.Dataset API cannot infer the dataset spec before the generator is consumed.
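A minimal sketch follows, assuming TensorFlow 2.x. It uses a shuffling generator like the one described above, hypothetical stand-in arrays, and an assumed seed of 42. The arrays and the seed are passed through args. from_generator() converts them to tensors and hands them back to the function as NumPy values each time the dataset is iterated.

```python
import numpy as np
import tensorflow as tf


def shuffle_generator(image, label, seed):
    # One-sample-at-a-time shuffling generator, as described above
    idx = np.arange(len(image))
    np.random.default_rng(int(seed)).shuffle(idx)
    for i in idx:
        yield image[i], label[i]


# Stand-in arrays (assumption) with Fashion-MNIST shapes and dtype
train_image = np.zeros((10, 28, 28), dtype=np.uint8)
train_label = np.arange(10, dtype=np.uint8)

dataset = tf.data.Dataset.from_generator(
    shuffle_generator,                      # the function, not shuffle_generator(...)
    args=(train_image, train_label, 42),    # passed to the function on each iteration
    output_signature=(
        tf.TensorSpec(shape=(28, 28), dtype=tf.uint8),
        tf.TensorSpec(shape=(), dtype=tf.uint8),
    ))
print(dataset.element_spec)
```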

Running the above code will print the same spec as before:

Such a dataset is functionally equivalent to the dataset you created previously, so you can use it for training as before. The following is the complete code:
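Here is a self-contained sketch, assuming TensorFlow 2.x. It uses a shuffling generator like the one described above and a small dense model (a hypothetical stand-in). The seed of 42 and the helper names are also assumptions.

```python
import numpy as np
import tensorflow as tf


def shuffle_generator(image, label, seed):
    # One sample per step in shuffled order; ends after one pass over the data
    idx = np.arange(len(image))
    np.random.default_rng(int(seed)).shuffle(idx)
    for i in idx:
        yield image[i], label[i]


def build_model():
    # Small dense classifier (an assumption, chosen to keep the sketch short)
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam",
                  metrics=["sparse_categorical_accuracy"])
    return model


def train_from_generator(model, train_image, train_label, test_image, test_label,
                         epochs=50, seed=42):
    dataset = tf.data.Dataset.from_generator(
        shuffle_generator,
        args=(train_image, train_label, seed),
        output_signature=(
            tf.TensorSpec(shape=(28, 28), dtype=tf.uint8),
            tf.TensorSpec(shape=(), dtype=tf.uint8),
        ))
    # The dataset is finite, so each epoch ends when the generator is exhausted
    return model.fit(dataset.batch(32), epochs=epochs,
                     validation_data=(test_image, test_label), verbose=0)


def main():
    (train_image, train_label), (test_image, test_label) = \
        tf.keras.datasets.fashion_mnist.load_data()
    model = build_model()
    train_from_generator(model, train_image, train_label, test_image, test_label)
    print(model.evaluate(test_image, test_label, verbose=0))
```

from_generator() calls the function again with the same args each time the dataset is iterated. With a fixed seed, every epoch therefore sees the same shuffled order.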

Dataset with Prefetch

The real benefit of using a dataset is the ability to use prefetch().

Training from a NumPy array probably gives the best performance. However, it means you need to load all the data into memory. A generator function lets you prepare one batch at a time, for example by loading data from disk on demand. However, when a Keras model is trained from a generator function, only one of the two is running at any moment: either the training loop or the generator function. It is not easy to make the generator function and Keras’s training loop run in parallel.

Dataset is the API that allows the generator and the training loop to run in parallel. If you have a generator that is computationally expensive (e.g., one that performs image augmentation in real time), you can create a dataset from that generator function and then use it with prefetch(), as follows:
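A minimal sketch follows, assuming TensorFlow 2.x. It uses a shuffling generator like the one described in the previous section, where any expensive per-sample work such as augmentation would go. The helper name make_prefetched_dataset, the seed of 42 and the stand-in arrays are assumptions.

```python
import numpy as np
import tensorflow as tf


def shuffle_generator(image, label, seed):
    # Per-sample preparation (e.g. augmentation) would go inside this loop
    idx = np.arange(len(image))
    np.random.default_rng(int(seed)).shuffle(idx)
    for i in idx:
        yield image[i], label[i]


def make_prefetched_dataset(image, label, batch_size=32, buffer_size=3, seed=42):
    dataset = tf.data.Dataset.from_generator(
        shuffle_generator,
        args=(image, label, seed),
        output_signature=(
            tf.TensorSpec(shape=(28, 28), dtype=tf.uint8),
            tf.TensorSpec(shape=(), dtype=tf.uint8),
        ))
    # prefetch() comes after batch(), so each buffered element is a whole batch;
    # tf.data fills the buffer in the background while the consumer works
    return dataset.batch(batch_size).prefetch(buffer_size)


# Stand-in data (assumption): 100 samples give batches of 32, 32, 32 and 4
train_image = np.zeros((100, 28, 28), dtype=np.uint8)
train_label = (np.arange(100) % 10).astype(np.uint8)
for images, labels in make_prefetched_dataset(train_image, train_label):
    print(images.shape)
```

The resulting dataset can be passed straight to fit(), e.g. model.fit(make_prefetched_dataset(train_image, train_label), epochs=50). Recent TensorFlow 2.x releases also accept tf.data.AUTOTUNE as the buffer size, which lets tf.data tune it at run time.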

The number passed to prefetch() is the size of its buffer. Here, the dataset is asked to keep three batches in memory, ready for the training loop to consume. Whenever a batch is consumed, the dataset API resumes the generator function to refill the buffer asynchronously in the background. As a result, the training loop and the data preparation inside the generator function can run in parallel.

It is worth mentioning that, in the previous section, you created a shuffling generator for the dataset API. The dataset API also has a shuffle() function that does the same, but you may not want to use it unless the dataset is small enough to fit in memory.

Like prefetch(), the shuffle() function takes a buffer-size argument. The shuffle algorithm fills the buffer with elements from the dataset and draws one element randomly from it. The consumed element is then replaced with the next element from the dataset. Hence, you need a buffer as large as the dataset itself to get a truly random shuffle. This limitation is demonstrated with the following snippet:
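Here is a small sketch of such a demonstration, assuming TensorFlow 2.x. The 10,000 consecutive integers, the buffer size of 10 and the 20 drawn elements are arbitrary choices. Because only elements 0 to i+9 can have entered the buffer by the i-th draw (counting from 0), no output can exceed i+9.

```python
import numpy as np
import tensorflow as tf

n_dataset = tf.data.Dataset.from_tensor_slices(np.arange(10000))

# A buffer of only 10 elements: each output comes from a small sliding window
shuffled = [int(n) for n in n_dataset.shuffle(10).take(20)]
print(shuffled)

# Check the bound described above: the i-th output is at most i + 9
print(all(v <= i + 9 for i, v in enumerate(shuffled)))  # True
```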

The output from the above looks like the following:

Here you can see that each number is shuffled only within its neighborhood, and large numbers never appear in the output.

Further Reading

More about the tf.data dataset can be found in its API documentation:


In this post, you have seen how you can use the tf.data dataset and how it can be used in training a Keras model.

Specifically, you learned:

  • How to train a model using data from a NumPy array, a generator, and a dataset
  • How to create a dataset using a NumPy array or a generator function
  • How to use prefetch with a dataset to make the generator and the training loop run in parallel



