First impressions of CNTK and a comparison with Google's TensorFlow – Microsoft Faculty Connection

My first impressions of CNTK and a comparison with Google's TensorFlow, by a Microsoft Student Partner at University College London.


About me

My name is Warren Park, and I am a first-year computer science student at UCL. My particular interest is in Artificial Intelligence (AI), across various applications and approaches. One of the areas of AI I focus on these days is machine learning, and I usually use TensorFlow (when I need GPGPU acceleration) or plain NumPy to construct an Artificial Neural Network (ANN).

Since I like interdisciplinary research, in January I carried out a personal research project in astronomy combined with machine learning and wrote a draft paper. During this summer, I am planning either to improve the quality of that research or to start a new project in machine learning. My previous research used Python with TensorFlow to construct a fully connected neural network.

This blog post…

In this blog post, I would like to show how easily a student with limited experience can start machine learning experiments using the Microsoft Cognitive Toolkit (CNTK). I will also describe some aspects in which CNTK and TensorFlow differ and give some reasons to prefer CNTK.

Microsoft Cognitive Toolkit (CNTK)

CNTK is one of the most widely known machine learning frameworks on the market. It is developed by Microsoft and features great compatibility and effective use of computational resources.

A machine learning framework is a library for a programming language that provides the ability to construct a machine learning model and to train, test, and evaluate it. Although machine learning can be done without frameworks, they are generally used because they offer better-optimised execution of machine learning tasks. Furthermore, using a framework saves time, since frameworks are tested and carefully designed to let developers use machine learning easily.

To enable beginner developers to use well-known machine learning models, Microsoft provides the CNTK model gallery [1], from which developers can download predefined models for certain tasks in case they are unfamiliar with machine learning. There are also some predefined models in the CNTK package that are ready to use.

How can a student use CNTK?

To start using CNTK, it is good to follow the CNTK lab [2]. The lab instructs a student to train the predefined CNTK model on the MNIST dataset and to test the trained model both with the given test dataset and with a dataset the student creates.

The MNIST dataset [3] is a set of 60,000 training images and 10,000 test images of handwritten digits from zero to nine. Since all the images are greyscale with 28×28 dimensions, and since the images are simple, the MNIST dataset is commonly used as the "Hello World!" project of many machine learning frameworks.

Using the predefined model, a student can easily gain CNTK experience with real-life datasets and can reuse the model for other tasks in the future.

To show how easy it is to start using CNTK, I will now explain how I completed the lab exercises.

Prerequisites for the lab exercises

All that a student needs to know to complete the lab exercises is some basic Linux or Windows command-line usage. Almost no knowledge of machine learning or coding is required to complete the exercises, although some machine learning knowledge helps in appreciating the meaning of the tasks. A computer with either 64-bit Windows or 64-bit Linux is also required.

Step 1: Installation of CNTK

I have a Windows PC, so I followed the instructions given for Windows. Installing CNTK on a Windows PC is only slightly harder than installing other software on the market: instead of double-clicking an .exe file, what I needed to do was double-click install.bat.

The installation batch file had to be downloaded from


By clicking "I accept", I was able to download a compressed .zip file containing all the resources. Then I executed the install.bat batch file, which was found in the cntk->Scripts->install->windows folder. When I double-clicked the file, the window below was shown.


I just had to press 1 and Enter to proceed with the installation. It installed various software, including Anaconda, that makes CNTK run on a computer.

During the installation, the command prompt occasionally showed "Do you want to continue? (y/n)". In this case, I typed "y" followed by the Enter key.

After the installation, the setup page showed a command needed to activate CNTK, depending on the installed Python version, for example:

To activate the CNTK Python environment and set the PATH to include CNTK, start a command shell and run


This command needed to be copied, and every time I wanted to use CNTK, it had to be pasted into the command prompt and executed.

Step 2: Download, flatten and label the MNIST data

This step also uses a program in the downloaded cntk folder. In the command prompt, I had to go to the directory where the cntk folder exists. Then,

cd Examples\Image\DataSets\MNIST

needed to be executed.


After doing this, there were two files generated in the MNIST folder. One file was the dataset for training, and the other was the test dataset. Both were sets of text lines representing images, flattened and labelled. For each image, a line of data was added to the training or test dataset text file, depending on which file (uncompressed folder) the image came from, and the line of data was:

|labels <one-hot encoded row vector> |features <flattened image data>

One-hot encoding means converting a number into a form of data (in this case, a row vector) that has all elements zero except the position that corresponds to the number, which is 1. For example, with positions counted from the right, 2 would be converted to 0000000100 and zero to 0000000001, in case there are only 10 possible digits.
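
As an illustration (plain Python, not part of the lab materials), a one-hot string following the right-to-left position convention of the examples above could be produced like this; note that the real CNTK text file stores the vector as space-separated values rather than a digit string:

```python
def one_hot(digit, num_classes=10):
    """Return a one-hot string with position 0 at the right, so that
    2 -> '0000000100' and 0 -> '0000000001' as in the examples above."""
    bits = ["0"] * num_classes
    bits[num_classes - 1 - digit] = "1"
    return "".join(bits)

print(one_hot(2))  # 0000000100
print(one_hot(0))  # 0000000001
```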

Image flattening works by appending each next row of pixels to the end of the previous row's data. Therefore, per image, a very long row vector (1×784) is produced after flattening. Flattening is useful because it enables batch training: many row vectors can be stacked into a matrix, so the batch training steps can be done with a simple matrix multiplication.
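
For illustration (not from the lab), flattening and batching in NumPy:

```python
import numpy as np

# A toy 28x28 greyscale image (values 0-255); a real one would come from MNIST.
image = np.arange(28 * 28, dtype=np.uint8).reshape(28, 28)

# Flattening appends each row of pixels after the previous one,
# producing a single 1x784 row vector per image.
flat = image.reshape(1, -1)
print(flat.shape)  # (1, 784)

# Many flattened images stacked into a matrix allow batch processing
# with a single matrix multiplication.
batch = np.vstack([flat, flat, flat])
print(batch.shape)  # (3, 784)
```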

Step 3: Collect some test data (I used my own handwriting)

The MNIST dataset has a separate test dataset, but it would also be interesting to use my own handwriting. In this step, I will explain how I collected, labelled, and flattened the test data.

First, a Python program with a sample test dataset had to be downloaded from: https://a4r.blob.core.windows.net/public/

This had to be uncompressed.

Second, handwriting data needed to be collected. The data had to have dimensions of 28×28 px and could be prepared in various ways. In my case, I used the "Paint" application on Windows.


Each image had to be saved in the input-images folder of the uncompressed download. Two images per digit were recommended, and each image had to have the file name <digit>-01.png, <digit>-02.png, or <digit>-03.png, e.g. 3-02.

From the uncompressed download, I executed the Python program. I did this by typing python followed by the script name at the command prompt, from the folder where the Python file was saved.

As a result, the labelled, flattened data was saved as the Custom-Test-28x28_cntk_text.txt file.
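
The downloaded program performs this conversion; as a rough sketch of the idea (not the actual script, and with a synthetic array standing in for the PNG, since loading the image would need an extra library), one line of the CNTK text format could be produced like this, using the conventional leftmost-position-first one-hot ordering:

```python
import numpy as np

def to_cntk_line(label, pixels):
    """Format one 28x28 greyscale image as a CNTK text-format line:
    |labels <one-hot vector> |features <784 pixel values>."""
    one_hot = ["0"] * 10
    one_hot[label] = "1"
    features = pixels.reshape(-1)  # flatten 28x28 to 784 values
    return ("|labels " + " ".join(one_hot)
            + " |features " + " ".join(str(int(v)) for v in features))

# A synthetic blank image standing in for e.g. 3-02.png; the real script
# would first load the PNG and convert it to a 28x28 greyscale array.
img = np.zeros((28, 28), dtype=np.uint8)
line = to_cntk_line(3, img)
print(line[:40])
```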

Step 4: Fully connected neural network with one hidden layer

Now I will explain how I trained and tested the machine learning model that has one hidden layer. Before I explain how I did the task, I would like to mention how Microsoft has predefined the model in 01_OneHidden.cntk:

01_OneHidden.cntk had a typical fully connected neural network configured. As an optimiser, it used Stochastic Gradient Descent (SGD) with learning rate 0.01 for the first 5 epochs and 0.005 for the other 5 epochs. The hidden layer had 200 nodes, ReLU was used as the activation function, and Softmax was applied at the output layer.
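
As a sketch of what this architecture computes, here is the forward pass in plain NumPy with random, untrained weights (CNTK would learn the weights with the SGD settings above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from 01_OneHidden: 784 inputs, 200 hidden ReLU nodes, 10 outputs.
W1, b1 = rng.normal(0, 0.01, (784, 200)), np.zeros(200)
W2, b2 = rng.normal(0, 0.01, (200, 10)), np.zeros(10)

def forward(x):
    h = np.maximum(0, x @ W1 + b1)           # hidden layer with ReLU
    z = h @ W2 + b2
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)  # softmax over the 10 digits

x = rng.random((1, 784))        # one flattened 28x28 image
p = forward(x)
print(p.shape, float(p.sum()))  # (1, 10), probabilities summing to 1
```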

This means the network is configured like this:


In reality, images are flattened, so the information about the pixels above and below would be ignored. Each output will also be a row vector instead of a column vector.

This information might be a little bit tricky for students with limited experience. Therefore, CNTK provides a predefined model, which means students do not even have to think about the underlying structure of the machine learning model.

For me to use a machine learning model, I only had to 1) activate CNTK at the command prompt by typing the command given at installation time, e.g. C:\Users\username\Desktop\cntk\scripts\cntkpy35.bat, 2) move to the cntk\Examples\Image\GettingStarted folder at the command prompt, and 3) type cntk configFile=01_OneHidden.cntk.

Then the machine learning started.

If I wanted to use the data collected in Step 3, all I needed to do was change the file variable value under #TEST CONFIG in the 01_OneHidden.cntk file found in the cntk\Examples\Image\GettingStarted folder.


As a result of the tasks, I obtained:

For the MNIST dataset test data:

Minibatch[1-10]: errs = 1.760% * 10000; ce = 0.05873108 * 10000

Final Results: Minibatch[1-10]: errs = 1.760% * 10000; ce = 0.05873108 * 10000; perplexity = 1.06049001

This means the percentage of incorrect results was 1.76%. For the test data that I made:

Minibatch[1-1]: errs = 30.000% * 30; ce = 1.68806547 * 30

Final Results: Minibatch[1-1]: errs = 30.000% * 30; ce = 1.68806547 * 30; perplexity = 5.40900667

This means that 30.00% of the data was misclassified.

Although the percentage error was quite high, it is reasonable, since the machine learning model was just a fully connected neural network with only one hidden layer.

Step 5: Neural network with one convolutional layer

A Convolutional Neural Network (CNN) generally shows better performance than a conventional fully connected neural network, since it preserves dimensions, e.g. colours (channels), and the relationships between pixels above and below. For this reason, the CNN is one of the most popular neural networks, used in many applications including image recognition and speech recognition.

A CNN is a neural network formed by having convolutional layers[1]. Pooling layers can also be used, in a sequence of convolutional layer, activation function, pooling layer. At the end of all the layers, typical affine layers (dense layers) may be added.
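
To make the convolution operation concrete, here is a minimal NumPy sketch of applying a single filter to a greyscale image (stride 1, no padding; deep learning frameworks actually compute cross-correlation, as below, and also learn the filter values during training):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (strictly, cross-correlation, as in most
    deep learning frameworks): slide the kernel over the image and
    take a weighted sum at each position."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
edge = np.array([[1.0, -1.0]])      # a tiny horizontal-difference filter
print(conv2d(img, edge).shape)      # (4, 3)
```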

As mentioned, each convolutional layer has a pooling layer next to it (after the activation function is applied), which is usually a max pool. Each convolutional layer applies a filter to the input data, and the pooling layer reduces the dimensions of the input. The pooling layer reduces the dimensions using a requested function: for example, if max pooling is used, then within a certain part of the input matrix (e.g. a 2×2 block in the upper-left corner), the maximum element becomes the representative of that part and is the only one preserved, while all the smaller elements within the window being examined are lost. The window moves by a certain distance to cover all parts of the input, and that distance is called the stride. Strides must be defined before training begins.
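
A minimal NumPy sketch of 2×2 max pooling with stride 2, matching the description above:

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """2x2 max pooling: each non-overlapping 2x2 block is replaced by
    its maximum element; the smaller values in the block are discarded."""
    H, W = x.shape
    out = np.empty((H // stride, W // stride))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            block = x[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = block.max()
    return out

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 1, 2, 3],
              [5, 6, 4, 0]], dtype=float)
print(max_pool(x))
# [[4. 8.]
#  [9. 4.]]
```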

The CNN with one convolutional layer predefined in 02_OneConv looks like this:


02_OneConv.cntk used the configuration of layers illustrated above. Like 01_OneHidden, it used SGD as the optimiser, but with a learning rate of 0.001 for the first 5 epochs and 0.0005 for the remaining 10 epochs. Similar to 01_OneHidden, the #TEST CONFIG test file path could be changed to make CNTK test the trained model with the data I produced.

From the training, I obtained:

Using the MNIST test sets:

Minibatch[1-10]: errs = 1.010% * 10000; ce = 0.03236526 * 10000

Final Results: Minibatch[1-10]: errs = 1.010% * 10000; ce = 0.03236526 * 10000; perplexity = 1.03289471

This means there was only a 1.01% error. Using my data:

Minibatch[1-1]: errs = 23.333% * 30; ce = 0.57137871 * 30

Final Results: Minibatch[1-1]: errs = 23.333% * 30; ce = 0.57137871 * 30; perplexity = 1.77070666

This means 23.33% of the images were misclassified.

Compared to the result from 01_OneHidden (see the table below), it was clear that a CNN can perform better at image classification.




Test data          01_OneHidden (Step 4)   02_OneConv (Step 5)
MNIST test data    1.76% error             1.01% error
My test data       30.00% error            23.33% error

Lab summary

The CNTK lab was so intuitively designed that I think its resources can be used by individuals studying any subject area. One of the most impressive things for me was that the CNTK lab does not require any prerequisite knowledge to complete. In the case of other machine learning frameworks, predefined models can be found, but in order to train them they require some extra coding, unless the author of the model also distributes software that uses the model. However, I have seen that in CNTK's case, if there is a predefined model given as a .cntk file, training and testing can take place without any coding.

Since the model is not entirely restrictive about the data, I think a student in another disciplinary area can also do a machine learning classification project very easily, provided that the input data has dimensions of 28×28 or 1×784 with 10 classification classes. If a student can do some linear algebra, which is a subject people commonly study, changing the dimensions of the model is also possible, so I think CNTK's predefined models could serve as machine learning models for numerous projects in various fields.

Comparisons with TensorFlow

I coded a model with the same layer configuration in Python using TensorFlow to do some comparisons, and I discovered several things. Below are some screenshots of the program that I developed:


One of the interesting things I found from this experiment was that if I used even the smallest learning rate used during the training of the corresponding CNTK model (i.e. 0.05 in the above example) on the model I developed using TensorFlow, a severe overfitting problem occurred (all classifications resulted in 1). Due to that problem, I reduced the learning rate to 0.0001 and was able to get an 8.77% percentage error for the one-hidden-layer fully connected neural network with the MNIST test dataset.

Although the percentage is high, I do not think this means TensorFlow is bad in terms of accuracy. If a better configuration of layers or another implementation technique had been used, the result could have been different. In particular, I used a different mini-batch size (I used 100, but the predefined model used 64), which could have resulted in a difference.

Anyway, during the coding experience, I discovered some aspects that show how well CNTK itself is designed. I have some reasons to suggest that.

Firstly, mini-batch training can be defined within the optimiser definition in CNTK, by just providing parameters to the functions, whereas in TensorFlow it is done with for loops or by using a separate function.

For example, to code the above program, I used for loops to set up the mini-batch training:


(I have removed the part that stochastically picks a mini-batch, to keep the implementation simple.)
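
Since my original TensorFlow screenshot is not reproduced here, the following plain-NumPy sketch shows the kind of hand-written mini-batch loop I mean, on a toy linear model (in the real TensorFlow 1.x program, the loop body would run a session step with a feed_dict instead):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((600, 4))                     # toy dataset
y = X @ np.array([1.0, -2.0, 0.5, 3.0])      # targets from a known weight vector

w = np.zeros(4)
batch_size, lr = 100, 0.1

# One epoch of mini-batch SGD: step through the data in fixed-size slices.
for start in range(0, len(X), batch_size):
    xb, yb = X[start:start+batch_size], y[start:start+batch_size]
    grad = 2 * xb.T @ (xb @ w - yb) / len(xb)   # gradient of mean squared error
    w -= lr * grad

print(w.round(2))
```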

Or I could have used a separate input function:


Whereas in CNTK:


This means the mini-batches are defined inside the optimiser definition.

In many cases, the above example shows that a student has to learn more in order to get started with TensorFlow, which can delay the development process. CNTK programming, on the other hand, can be started sooner because less learning is required.

Secondly, I was impressed by the way CNTK defines a layer. In order to define a layer in CNTK, the dimensions of the layer (i.e. the number of nodes), an activation function, and the previous layer must be given. In TensorFlow, all three things mentioned must be defined separately, and all the calculations between the layers must be defined to connect them to each other.

For example, for the program described above, I had to use:


Since this is commonly difficult, there is an additional library called Keras [1] which enables TensorFlow to be used easily. Keras handles the addition of layers via the model.add() function. Although this is easy, if the neural network gets complicated, it becomes harder to manage the layers. CNTK, on the other hand, makes layers easy to add while making explicit the previous layer that each layer is connected to.

In 01_OneHidden.cntk, the construction of layers is done by:


which is much shorter but can still be managed easily, since all the previous layers are described, e.g. (h1).

Therefore, it makes programming easy and also makes layer management efficient.

In a CNN's case, the difference becomes a little less obvious, since a CNN has different kinds of layers, but it is still apparent:

In TensorFlow:


So both the convolutional layers and the pooling layers must be defined separately, whereas in CNTK:


This also needs each layer to be defined, but they can be defined under the same model, which means the code becomes more manageable.

Thirdly, CNTK provides faster computational speed. According to research conducted in 2015 [2], CNTK had the fastest computational speed among five different machine learning frameworks when 4 GPUs were used. Although I could not carry out a test at present, since I do not have access to a machine with 4 GPUs, provided that the research is correct and the efficiency of CNTK is improving, it would be reasonable to recommend CNTK for large-scale deep learning projects. Below is a graph showing the performance comparison between the five machine learning frameworks [2]:


Therefore, I can say that if I had used CNTK to complete my personal project, I could have finished it earlier, saving time on both learning and machine learning model training. This suggests that CNTK is preferable for computer scientists.


My experience with the CNTK lab has shown how easily machine learning tasks can be done by students with limited experience. Since it provides easy interfaces for machine learning, I am quite sure that students from many disciplines, including me, can utilise CNTK to complete their projects using machine learning without much extra knowledge.

Furthermore, in this blog post, I have mentioned the advantages of using CNTK instead of TensorFlow. As CNTK provides an intuitive way to build a model while also providing facilities to manage the structure of the model, I think CNTK can be a great alternative to TensorFlow.

Therefore, I would recommend any computer scientist to use CNTK.

Thank you for reading this blog post.



Keras, "Keras: The Python Deep Learning library." [Online]. Available: [Accessed 4 June 2018].

X. Huang, "Microsoft Computational Network Toolkit offers most efficient distributed deep learning computational performance," 7 December 2015. [Online]. Available: [Accessed 4 June 2018].

Microsoft, "Azure Labs." [Online]. Available: [Accessed 4 June 2018].

Microsoft, "Cognitive Toolkit Model Gallery." [Online]. Available: [Accessed 4 June 2018].

Yann LeCun et al., "THE MNIST DATABASE." [Online]. Available: [Accessed 4 June 2018].

[1] A convolutional layer performs the convolution arithmetic, which corresponds to the filter arithmetic in image processing.
