Artificial intelligence (AI) is an area of computer science that emphasizes the creation of intelligent machines that work and react like humans. And AI (and machine learning) is all the rage these days. However, unless you have a PhD in Mathematics and Computer Science, AI programming remains an elusive field for many developers. But this shouldn't be the case. What most developers want is the ability to integrate AI capabilities into their applications without knowing the nuts and bolts of how AI works.
The Microsoft Cognitive Services is a suite of AI services and APIs to help developers integrate AI capabilities into their applications without building the AI services from scratch. Developers simply need to choose the relevant APIs from the Cognitive Services for their needs and requirements.
The various services in the Microsoft Cognitive Services allow you to:
- Analyze images and provide a description of their content
- Perform language translations
- Use machine vision to recognize users and emotions
- Perform image recognition
- And much more
In this article, I'll walk you through some of the capabilities of the Microsoft Cognitive Services. You'll learn how to use the various APIs and services and integrate them into your Python, iOS, and Android applications. You'll also learn how to build a custom learning model to recognize the thorny (and to some people, smelly) fruit known as durians!
Getting Started
To get started with the Microsoft Cognitive Services, use a Web browser and go to https://azure.microsoft.com/en-us/services/cognitive-services/. Click the Start Free button. You'll now be able to select the Cognitive Services that you want to use (see Figure 1).
Click on the first GET API Key button. This allows you to get the Computer Vision API key. You should see the options shown in Figure 2.
For readers without an Azure account, click the Get started button under the Guest section. In the next screen, accept the terms and conditions and click Next. You need to select one of the ways to log in (see Figure 3).
Once you've successfully logged in, you'll be able to get your trial API keys for the Computer Vision API (see Figure 4).
It's important that you take note of the following for each API:
- The endpoints. This is the URL you use to access the APIs. For the Computer Vision API, the two endpoints are https://westcentralus.api.cognitive.microsoft.com/vision/v1.0 and https://westcentralus.api.cognitive.microsoft.com/vision/v2.0. The westcentralus domain is usually used for trial accounts.
- The API keys. Also known as the subscription keys, these are the keys that allow your app to access the services. Make sure you save them in a secure location. Remember that these trial keys only last for seven days, so you need to reapply for them when they expire.
The endpoint of each service uniquely identifies the type of service you are using, such as the Face API, the Vision API, etc.
You can repeat the process to apply for trial API keys for other services, such as the Face API.
Testing the APIs Using CURL
The best way to test out the various services provided by the Microsoft Cognitive Services is to use the cURL utility.
For macOS users, cURL is already installed right out of the box. For Windows users, you can obtain and install cURL from https://curl.haxx.se/download.html.
Using the Vision API
Let's get started with the Vision API, which is a lot of fun to test out. The Vision API includes the following services:
- Analyze: The Analyze service helps to detect and provide insight about the visual features and characteristics of your images.
- Describe: The Describe service analyzes the content in an image and the description summarizes what is found in the image.
- Generate Thumbnail: The Generate Thumbnail service generates a high-quality thumbnail and then analyzes the objects within the image to identify the region of interest (ROI).
- Tag: The Tag service returns tags based on more than 2000 recognizable objects, living beings, scenery, and actions.
Let's see how each of these services work with examples.
Analyze Service
The Analyze service helps to detect and provide insight about the visual features and characteristics of your images. For this feature, I'm going to use the image (see Figure 5) located at https://bit.ly/2CI9I53.
In Terminal, type the following command (all on a single line; formatted here to aid readability):
$ curl -H 'Ocp-Apim-Subscription-Key: <Subscription_Key>'
-H 'Content-Type: application/json'
-d '{"url":"https://bit.ly/2CI9I53"}'
'<a href="https://westcentralus.api.cognitive.microsoft.com/">https://westcentralus.api.cognitive.microsoft.com/</a>
vision/v1.0/analyze?visualFeatures=Categories,Tags,Description,Faces,ImageType,Color,Adult&
details=Celebrities,Landmarks&language=en'
In the preceding command:
- Replace <Subscription_Key> with your own subscription key.
- Specify the image you want to analyze using a JSON string containing the URL of the image and submit it through the -d option.
- The base URL of the service is that of your endpoint plus the service you are accessing: https://westcentralus.api.cognitive.microsoft.com/vision/v1.0/analyze. The options for the service are then passed as key-value pairs after the base URL. Here, vision indicates the Computer Vision API, and analyze is the service within the Vision API.
- You want the result to contain the following information: Categories, Tags, Description, Faces, ImageType, Color, and Adult, and you pass all these options as key-value pairs in the URL. The details of the image include Celebrities and Landmarks.
- You want the result to be returned in English (language=en).
The result returned from the service looks like Listing 1 (formatted for readability).
Listing 1: The result returned from the Analyze service
{
"categories": [{
"name": "people_",
"score": 0.5234375,
"detail": {
"celebrities": [{
"name": "Steve Jobs",
"confidence": 0.99999821186065674,
"faceRectangle": {
"left": 397,
"top": 100,
"width": 170,
"height": 170
}
}]
}
}, {
"name": "people_portrait",
"score": 0.3984375,
"detail": {
"celebrities": [{
"name": "Steve Jobs",
"confidence": 0.99999821186065674,
"faceRectangle": {
"left": 397,
"top": 100,
"width": 170,
"height": 170
}
}]
}
}],
"adult": {
"isAdultContent": false,
"isRacyContent": false,
"adultScore": 0.1824040561914444,
"racyScore": 0.18311712145805359
},
"color": {
"dominantColorForeground": "Black",
"dominantColorBackground": "Black",
"dominantColors": ["Black"],
"accentColor": "2D949E",
"isBwImg": false,
"isBWImg": false
},
"imageType": {
"clipArtType": 0,
"lineDrawingType": 0
},
"tags": [{
"name": "person",
"confidence": 0.99992716312408447
}, {
"name": "man",
"confidence": 0.99678182601928711
}, {
"name": "necktie",
"confidence": 0.28407412767410278
}, {
"name": "concert",
"confidence": 0.030748603664156535
}, {
"name": "music",
"confidence": 0.023170323008648616
}],
"description": {
"tags": ["person", "man", "holding", "phone", "outdoor", "cellphone", "wearing", "looking", "suit", "standing", "camera", "hand", "photo", "front", "posing", "glasses", "talking", "black", "face", "mirror", "sunglasses", "street", "hat", "court", "blue"],
"captions": [{
"text": "Steve Jobs wearing a suit and tie talking on a cell phone",
"confidence": 0.76812406805206646
}]
},
"faces": [{
"age": 48,
"gender": "Male",
"faceRectangle": {
"left": 397,
"top": 100,
"width": 170,
"height": 170
}
}],
"requestId":
"604422d8-2ed8-414f-8691-d9dafd7d78b0",
"metadata": {
"width": 854,
"height": 570,
"format": "Jpeg"
}
}
Observe that:
- The categories key contains the main result for the image.
- It identified the person in the image (Steve Jobs) with a confidence level of 99.99%.
- It detected the subject's age (48, via the age attribute).
- It located the position of the face (the faceRectangle attribute).
- It also identified a number of other features in the image: man, person, necktie, concert, and so on.
- It returned a caption that says, “Steve Jobs wearing a suit and tie talking on a cell phone”.
Not all of the results were accurate, but it's quite impressive that the service got a number of things correct.
If you want the result to be returned in other languages, such as Chinese, replace the value of the language key with the language code, such as “zh”:
$ curl -H 'Ocp-Apim-Subscription-Key: <Subscription_Key>'
-H 'Content-Type: application/json'
-d '{"url":"https://bit.ly/2CI9I53"}'
'https://westcentralus.api.
<a href="http://cognitive.microsoft.com/vision/v1.0/">cognitive.microsoft.com/vision/v1.0/</a>
analyze?visualFeatures=Categories,Tags,Description,Faces,ImageType,Color,Adult&
details=Celebrities,Landmarks&language=zh'
I'll let you try this out and see the result for yourself.
Sometimes, the image you want to analyze is not on the Web, but local, on your computer. In this case, how do you pass the image to the Vision API? You can use the following command:
$ curl -H 'Ocp-Apim-Subscription-Key: <Subscription_Key>'
-H 'Content-Type: application/octet-stream'
--data-binary @stevejobs.jpg
'https://westcentralus.api.
<a href="http://cognitive.microsoft.com/vision/v1.0/">cognitive.microsoft.com/vision/v1.0/</a>
analyze?visualFeatures=Categories,Tags,Description,Faces,ImageType,Color,Adult&
details=Celebrities,Landmarks&language=en'
In this case, you're uploading the binary image (named stevejobs.jpg and saved in the current directory) instead of specifying the URL for the image. Specifically, note the --data-binary option and the value of Content-Type in the header.
The Analyze service of the Vision API not only recognizes people; it also recognizes landmarks. Let's try it using the image (see Figure 6) located at https://bit.ly/2Q8qy4n.
Instead of viewing the entire JSON result, you can extract only the result of interest to you using the jq processor.
The following command sends the image to the Vision API's Analyze service, obtains the result, and extracts the value of the categories key:
$ curl -H 'Ocp-Apim-Subscription-Key: <Subscription_Key>'
-H 'Content-Type: application/json'
-d '{"url":"https://bit.ly/2Q8qy4n"}'
'https://westcentralus.api.
<a href="http://cognitive.microsoft.com/vision/v1.0/">cognitive.microsoft.com/vision/v1.0/</a>
analyze?visualFeatures=Categories,Tags,Description,Faces,ImageType,Color,Adult&
details=Celebrities,Landmarks&language=en' |
jq '.categories'
The result is as follows:
[
{
"name": "building_",
"score": 0.6171875,
"detail": {
"landmarks": [
{
"name": "Notre Dame de Paris",
"confidence": 0.966315507888794
}
]
}
},
{
"name": "outdoor_",
"score": 0.0078125,
"detail": {
"landmarks": [
{
"name": "Notre Dame de Paris",
"confidence": 0.966315507888794
}
]
}
}
]
The result shows that the service has recognized the image as that of “Notre Dame de Paris”.
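If you'd rather do this kind of extraction in code instead of with jq, here's a minimal Python sketch of the same call using the requests library (it assumes the same trial endpoint and that you substitute your own subscription key):
import requests
subscription_key = "<Subscription_Key>"   # replace with your trial key
analyze_url = ("https://westcentralus.api.cognitive.microsoft.com/"
               "vision/v1.0/analyze")
headers = {"Ocp-Apim-Subscription-Key": subscription_key}
params = {"visualFeatures": "Categories", "details": "Landmarks"}
data = {"url": "https://bit.ly/2Q8qy4n"}   # the Notre Dame image
response = requests.post(analyze_url, headers=headers, params=params, json=data)
response.raise_for_status()
# Walk the categories and print any landmarks found, with their confidence.
for category in response.json().get("categories", []):
    for landmark in category.get("detail", {}).get("landmarks", []):
        print(landmark["name"], landmark["confidence"])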
Describe Service
The next service in the Vision API is the Describe service. The Describe service analyzes the content of an image and returns a description that summarizes what is found in it. Each candidate description is given a confidence score, and the list is returned ordered from highest confidence score to lowest.
For this example, you will make use of the image (see Figure 7) located at https://bit.ly/2ELp9MD.
To get a description for this image, type the following command in Terminal:
$ curl -H 'Ocp-Apim-Subscription-Key: <Subscription_Key>'
-H 'Content-Type: application/json'
-d '{"url":"https://read.bi/2ELp9MD"}'
'https://westcentralus.api.
<a href="http://cognitive.microsoft.com/">cognitive.microsoft.com/</a>
vision/v1.0/describe'
You should get the following result:
{
"description": {
"tags": ["person", "indoor", "man", "photo", "posing", "standing", "couple",
"people", "sitting", "table", "front", "holding", "group", "woman", "food", "cake", "large", "suit", "white", "shirt", "kitchen"],
"captions": [{
"text": "Bill Gates et al. posing for the camera",
"confidence": 0.98106487648114582
}]
},
"requestId":
"1a8d818b-41fd-4d5f-8ddc-e3bbd883c5c6",
"metadata": {
"width": 480,
"height": 360,
"format": "Jpeg"
}
}
Interestingly, the service could recognize Bill Gates but not his daughter. Also, the tags key contains irrelevant items like “cake” and “kitchen,” which aren't found in the image.
Generate Thumbnail Service
The next service of the Vision API is the Generate Thumbnail service. The Generate Thumbnail service generates a high-quality thumbnail and then analyzes the objects within the image to identify the region of interest (ROI). It can optionally crop the image to fit the requirements of the ROI.
For this section, you'll make use of the following image (see Figure 8) located at https://bit.ly/2AxcAjR.
To use the Generate Thumbnail service, you pass in the URL of the image and use the --output option to specify a filename to save the thumbnail returned by the API:
$ curl -H 'Ocp-Apim-Subscription-Key:
<Subscription_Key>'
-H 'Content-Type: application/json'
-d '{"url":"https://bit.ly/2AxcAjR"}'
'https://westcentralus.api.
<a href="http://cognitive.microsoft.com/vision/v1.0/">cognitive.microsoft.com/vision/v1.0/</a>
generateThumbnail?width=303&height=303'
--output billgates.jpg
Figure 9 shows the result of the output, saved in the file named billgates.jpg, and located in the current directory.
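If you'd like to generate the thumbnail from Python rather than cURL, a minimal sketch might look like the following (same trial endpoint and key; the width, height, smartCropping setting, and output filename are just illustrative choices):
import requests
subscription_key = "<Subscription_Key>"    # replace with your trial key
thumbnail_url = ("https://westcentralus.api.cognitive.microsoft.com/"
                 "vision/v1.0/generateThumbnail")
headers = {"Ocp-Apim-Subscription-Key": subscription_key}
params = {"width": 303, "height": 303, "smartCropping": "true"}
data = {"url": "https://bit.ly/2AxcAjR"}   # the Bill Gates image
response = requests.post(thumbnail_url, headers=headers, params=params, json=data)
response.raise_for_status()
# The service returns the thumbnail as raw image bytes rather than JSON.
with open("billgates.jpg", "wb") as f:
    f.write(response.content)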
Tag Service
The Tag service returns tags based on more than 2000 recognizable objects, living beings, scenery, and actions. For this feature, you're going to use the image (Figure 10) located at https://bit.ly/2Obj9vW.
Type the following command in Terminal:
$ curl -H 'Ocp-Apim-Subscription-Key: <Subscription_Key>'
-H 'Content-Type: application/json'
-d '{"url":"https://bit.ly/2Obj9vW"}'
'https://westcentralus.api.
<a href="http://cognitive.microsoft.com/vision/v1.0/tag'">cognitive.microsoft.com/vision/v1.0/tag'</a>
The result is as shown in Listing 2.
Listing 2: The result returned from the Tag service
{
"tags": [{
"name": "train",
"confidence": 0.99972409009933472
}, {
"name": "track",
"confidence": 0.99791616201400757,
"hint": "sport"
}, {
"name": "outdoor",
"confidence": 0.99774926900863647
}, {
"name": "transport",
"confidence": 0.98162382841110229
}, {
"name": "yellow",
"confidence": 0.81481581926345825
}, {
"name": "platform",
"confidence": 0.66632789373397827
}, {
"name": "traveling",
"confidence": 0.48101967573165894
}, {
"name": "pulling",
"confidence": 0.388392835855484
}, {
"name": "railroad",
"confidence": 0.33330151438713074
}, {
"name": "day",
"confidence": 0.18677715957164764
}, {
"name": "rail",
"confidence": 0.09175156561009834
}, {
"name": "snow",
"confidence": 0.030824975966616926
}, {
"name": "station",
"confidence": 0.026948144704649526
}],
"requestId": "ef3bc18d-dc4b-4e40-8768-979808d7cbba",
"metadata": {
"width": 630,
"height": 461,
"format": "Jpeg"
}
}
The result contains a series of tags describing the image.
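As a rough Python equivalent of the cURL call above (same endpoint and key; the 0.5 confidence cutoff is an arbitrary choice of mine), you could filter the tags like this:
import requests
subscription_key = "<Subscription_Key>"   # replace with your trial key
tag_url = ("https://westcentralus.api.cognitive.microsoft.com/"
           "vision/v1.0/tag")
headers = {"Ocp-Apim-Subscription-Key": subscription_key}
data = {"url": "https://bit.ly/2Obj9vW"}   # the train image
response = requests.post(tag_url, headers=headers, json=data)
response.raise_for_status()
# Keep only reasonably confident tags; 0.5 is an arbitrary cutoff.
for tag in response.json()["tags"]:
    if tag["confidence"] > 0.5:
        print("{}: {:.2f}".format(tag["name"], tag["confidence"]))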
OCR
One very interesting service in the Vision API is the OCR service. The OCR service in the Vision API allows you to detect printed text in images, as well as handwritten text.
Let's first try to detect printed text in the image shown in Figure 11.
Type the following command in Terminal:
$ curl -H 'Ocp-Apim-Subscription-Key: <Subscription_Key>'
-H 'Content-Type: application/json'
-d '{"url":"https://bit.ly/2SD9u4u"}'
'https://westcentralus.api.
<a href="http://cognitive.microsoft.com/vision/v1.0/ocr">cognitive.microsoft.com/vision/v1.0/ocr</a>?language=en&detectOrientation=true'
You should get the result shown in Listing 3.
Listing 3: The result returned from the OCR service
{
"language": "en",
"textAngle": 0.0,
"orientation": "Up",
"regions": [{
"boundingBox": "48,109,321,220",
"lines": [{
"boundingBox": "74,109,275,64",
"words": [{
"boundingBox": "74,109,275,64",
"text": "CAUTION"
}]
}, {
"boundingBox": "88,195,247,39",
"words": [{
"boundingBox": "88,195,247,39",
"text": "HAZARDOUS"
}]
}, {
"boundingBox": "82,247,260,40",
"words": [{
"boundingBox": "82,247,138,40",
"text": "WASTE"
}, {
"boundingBox": "231,248,111,38",
"text": "AREA"
}]
}, {
"boundingBox": "48,299,321,30",
"words": [{
"boundingBox": "48,299,132,30",
"text": "LABELED"
}, {
"boundingBox": "190,299,97,28",
"text": "DRUMS"
}, {
"boundingBox": "297,299,72,28",
"text": "ONLY"
}]
}]
}]
}
You can also use the OCR service to recognize handwritten text. However, unlike printed text, where you use the ocr method in the URL (https://westcentralus.api.cognitive.microsoft.com/vision/v1.0/ocr?...), you now need to use the recognizeText method. Also, this method runs asynchronously and doesn't return the result directly. Instead, it returns a URI in the Operation-Location response header field. Using this value, you need to query the API again to extract the result.
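To make this two-step flow concrete, here's a minimal Python sketch (assuming the same trial endpoint; the polling interval and retry limit are arbitrary) that submits the image, reads the operation URL from the response header, and polls it until the result is ready:
import time
import requests
subscription_key = "<Subscription_Key>"   # replace with your trial key
base_url = "https://westcentralus.api.cognitive.microsoft.com/vision/v1.0/"
headers = {"Ocp-Apim-Subscription-Key": subscription_key}
params = {"mode": "handwritten"}
data = {"url": "https://bit.ly/2EGZ40Y"}  # the handwriting image
# Step 1: submit the image. The result isn't in the body; the response
# header contains the URL to poll for the outcome.
submit = requests.post(base_url + "recognizeText",
                       headers=headers, params=params, json=data)
submit.raise_for_status()
operation_url = submit.headers["Operation-Location"]
# Step 2: poll the operation URL until the recognition has finished.
for _ in range(10):                       # arbitrary retry limit
    result = requests.get(operation_url, headers=headers).json()
    if result.get("status") in ("Succeeded", "Failed"):
        break
    time.sleep(1)                         # arbitrary polling interval
print(result)
The steps that follow show the same flow carried out manually with cURL.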
Let's see how this is done using the image (see Figure 12) located at https://bit.ly/2EGZ40Y.
As usual, use cURL to query the recognizeText method. This time around, you need to use the -v option (for verbose) so that you can see the response headers that come back:
$ curl -H 'Ocp-Apim-Subscription-Key: <Subscription_Key>'
-H 'Content-Type: application/json'
-d '{"url":"https://bit.ly/2EGZ40Y"}'
'https://westcentralus.api.
<a href="http://cognitive.microsoft.com/vision/v1.0/">cognitive.microsoft.com/vision/v1.0/</a>
recognizeText?mode=handwritten' -v
You'll see a whole bunch of text upon success. In particular, look for the value of the Operation-Location field (see Figure 13). Copy the URL.
Then issue another cURL command, this time around to extract the result (using the URL that you have just copied):
$ curl -H 'Ocp-Apim-Subscription-Key: <Subscription_Key>'
https://westcentralus.api.
<a href="http://cognitive.microsoft.com/vision/v1.0/">cognitive.microsoft.com/vision/v1.0/</a>
textOperations/054a5cd6-964d-45f7-bc2a-622ded019d94
You'll now get the result shown in Listing 4.
Listing 4: The result returned from the OCR Handwriting service
{
"status": "Succeeded",
"recognitionResult": {
"lines": [{
"boundingBox": [54, 12, 637, 18, 636,
59, 53, 54],
"text": "quite often , I suspect that the",
"words": [{
"boundingBox": [33, 12, 132, 13, 133,
55, 34, 54],
"text": "quite"
}, {
"boundingBox": [132, 13, 244, 14, 245,
56, 133, 55],
"text": "often"
}, {
"boundingBox": [238, 14, 270, 14, 271,
56, 239, 56],
"text": ","
}, {
"boundingBox": [267, 14, 299, 14, 300,
56, 268, 56],
"text": "I"
}, {
"boundingBox": [299, 14, 459, 15, 460,
57, 300, 56],
"text": "suspect"
}, {
"boundingBox": [455, 15, 558, 16, 559,
58, 457, 57],
"text": "that"
}, {
"boundingBox": [558, 16, 635, 16, 636,
58, 559, 58],
"text": "the"
}]
}, {
"boundingBox": [28, 60, 650, 63, 649, 109,
27, 106],
"text": "quality of my handwriting s in",
"words": [{
"boundingBox": [15, 59, 165, 59, 165,
107, 15, 107],
"text": "quality"
}, {
"boundingBox": [162, 59, 216, 60, 216,
108, 162, 107],
"text": "of"
}, {
"boundingBox": [216, 60, 300, 60, 299,
108, 216, 108],
"text": "my"
}, {
"boundingBox": [293, 60, 549, 62, 549,
110, 293, 108],
"text": "handwriting"
}, {
"boundingBox": [552, 62, 584, 62, 584,
110, 552, 110],
"text": "s"
}, {
"boundingBox": [597, 62, 642, 62, 642,
110, 597, 110],
"text": "in"
}]
}, {
"boundingBox": [26, 106, 627, 112, 626,
155, 25, 150],
"text": "decline , but is this perception",
"words": [{
"boundingBox": [22, 106, 164, 107, 166,
151, 23, 150],
"text": "decline"
}, {
"boundingBox": [154, 107, 188, 107, 189,
151, 155, 151],
"text": ","
}, {
"boundingBox": [181, 107, 266, 108, 268,
152, 183, 151],
"text": "but"
}, {
"boundingBox": [266, 108, 317, 108, 319,
152, 268, 152],
"text": "is"
}, {
"boundingBox": [317, 108, 409, 109, 410,
153, 319, 152],
"text": "this"
}, {
"boundingBox": [402, 109, 617, 111, 618,
155, 404, 153],
"text": "perception"
}]
}, {
"boundingBox": [31, 163, 173, 162, 174,
183, 32, 185],
"text": "correct",
"words": [{
"boundingBox": [22, 166, 171, 161, 173,
184, 23, 189],
"text": "correct"
}]
}]
}
}
Besides returning the words, you also get the coordinates of each recognized word in the image. Something fun to do: write a Python application to display the image and overlay it with rectangles showing each of the words that have been found by the OCR service.
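Here's a minimal sketch of that idea, assuming you've saved the JSON from Listing 4 into a file named ocr_result.json (a made-up filename); each word's boundingBox is a list of eight numbers giving the four (x, y) corner points:
import json
import requests
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image
from io import BytesIO
image_url = "https://bit.ly/2EGZ40Y"      # the handwriting image
# Load the JSON from Listing 4, saved beforehand into this made-up file.
with open("ocr_result.json") as f:
    result = json.load(f)
# Download and display the original image.
image = Image.open(BytesIO(requests.get(image_url).content))
fig, ax = plt.subplots()
ax.imshow(image)
ax.axis("off")
# Draw a polygon and a label for each recognized word.
for line in result["recognitionResult"]["lines"]:
    for word in line["words"]:
        box = word["boundingBox"]
        corners = [(box[i], box[i + 1]) for i in range(0, 8, 2)]
        ax.add_patch(patches.Polygon(corners, closed=True,
                                     fill=False, edgecolor="red"))
        ax.text(corners[0][0], corners[0][1] - 5, word["text"],
                color="red", fontsize=8)
plt.show()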
Using the Face API
So far, you've been looking at the Vision API. Let's also take a look at the Face API. The Face API allows you to:
- Detect and compare human faces
- Organize images into groups
- Identify previously tagged people in images
For this example, let's use the image (see Figure 14) located at https://bit.ly/2VyAnIv.
You can use the following command (remember that for this you have to use the key for the Face API, and not the Vision API):
$ curl -H 'Ocp-Apim-Subscription-Key: <Subscription_Key>'
-H 'Content-Type: application/json'
-d '{"url":"https://bit.ly/2VyAnIv" }'
'https://westcentralus.api.
<a href="http://cognitive.microsoft.com/face/v1.0/detect">cognitive.microsoft.com/face/v1.0/detect</a>?
returnFaceLandmarks=true&returnFaceAttributes'
It returns the result shown in Listing 5.
Listing 5: The result returned from the Face API
[{
"faceId": "0289d1e4-a211-4dc1-9027-ecadf4f70afd",
"faceRectangle": { "top": 246, "left": 364,
"width": 250, "height": 250 },
"faceLandmarks": {
"pupilLeft": { "x": 433.1, "y": 316.6 },
"pupilRight": { "x": 546.5, "y": 314.5 },
"noseTip": { "x": 486.3, "y": 392.6 },
"mouthLeft": { "x": 429.0, "y": 422.2 },
"mouthRight": { "x": 548.1, "y": 423.1 },
"eyebrowLeftOuter": { "x": 396.5, "y": 300.7 },
"eyebrowLeftInner": { "x": 472.8, "y": 302.8 },
"eyeLeftOuter": { "x": 417.5, "y": 318.0 },
"eyeLeftTop": { "x": 433.9, "y": 310.2 },
"eyeLeftBottom": { "x": 433.6, "y": 322.5 },
"eyeLeftInner": { "x": 450.5, "y": 318.2 },
"eyebrowRightInner": { "x": 505.9, "y": 299.8 },
"eyebrowRightOuter": { "x": 595.9, "y": 303.1 },
"eyeRightInner": { "x": 527.4, "y": 315.7 },
"eyeRightTop": { "x": 543.4, "y": 309.8 },
"eyeRightBottom": { "x": 542.7, "y": 319.5 },
"eyeRightOuter": { "x": 560.9, "y": 314.6 },
"noseRootLeft": { "x": 475.5, "y": 318.7 },
"noseRootRight": { "x": 504.8, "y": 319.6 },
"noseLeftAlarTop": { "x": 462.4, "y": 363.6 },
"noseRightAlarTop": { "x": 512.3, "y": 363.8 },
"noseLeftAlarOutTip": { "x": 452.3, "y": 385.2 },
"noseRightAlarOutTip": { "x": 529.2, "y": 382.5 },
"upperLipTop": { "x": 494.2, "y": 423.9 },
"upperLipBottom": { "x": 495.1, "y": 430.6 },
"underLipTop": { "x": 489.8, "y": 448.5 },
"underLipBottom": { "x": 488.9, "y": 458.9 }
}
}]
Using this result, you know the location of each feature of the face, such as the eyes, nose, lips, and eyebrows.
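As a quick illustration, here's a minimal Python sketch (assuming your Face API trial key and the same trial endpoint) that plots every returned landmark on top of the image:
import requests
import matplotlib.pyplot as plt
from PIL import Image
from io import BytesIO
face_key = "<Face_API_Subscription_Key>"  # the Face API key, not the Vision one
detect_url = ("https://westcentralus.api.cognitive.microsoft.com/"
              "face/v1.0/detect")
image_url = "https://bit.ly/2VyAnIv"
headers = {"Ocp-Apim-Subscription-Key": face_key}
params = {"returnFaceLandmarks": "true"}
data = {"url": image_url}
response = requests.post(detect_url, headers=headers, params=params, json=data)
response.raise_for_status()
faces = response.json()
# Show the image and mark every landmark the service returned.
image = Image.open(BytesIO(requests.get(image_url).content))
fig, ax = plt.subplots()
ax.imshow(image)
ax.axis("off")
for face in faces:
    for name, point in face["faceLandmarks"].items():
        ax.plot(point["x"], point["y"], "r+")  # a small red cross per landmark
plt.show()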
Using the Vision API in Python
Up until this point, you've been playing with the Vision and Face APIs through the command line using cURL. Of course, cURL is a development tool and not something meant for end users. So in the following sections, I'm going to show you how to use the various APIs on different platforms: Python, iOS, and Android.
Let's get started with Python first. Create a file named visionapi.py and populate it with the code shown in Listing 6.
Listing 6: Python code to access the Vision API's Analyze service
import requests
import matplotlib.pyplot as plt
import json
from PIL import Image
from io import BytesIO
subscription_key = "<Subscription_Key>"
vision_api_url = "<a href="https://westcentralus.api.cognitive.microsoft.com/vision/v1.0/">https://westcentralus.api.cognitive.microsoft.com/vision/v1.0/</a>"
analyze_url = vision_api_url + "analyze"
image_url = "https://bit.ly/2apR8in"
headers = { 'Ocp-Apim-Subscription-Key': subscription_key }
params = { 'visualFeatures': 'Categories,Description,Color' }
data = { 'url': image_url }
response = requests.post(analyze_url, headers=headers, params=params, json=data)
response.raise_for_status() # raise exception for error codes
analysis = response.json()
print(json.dumps(response.json()))
image_caption = analysis["description"]["captions"][0]["text"].capitalize()
image_subcaption = "Keywords: " + ",".join(analysis["description"]["tags"])
#---download the image---
image = Image.open(BytesIO(requests.get(image_url).content))
#---display the image---
plt.imshow(image)
#---don't show the x and y-axis---
plt.axis("off")
#---show the description---
plt.suptitle(image_caption, fontsize = 18)
#---show the tags---
plt.title(image_subcaption, fontsize = 8)
plt.show()
Listing 6 shows how you can use the Vision API's Analyze service to get a description of an online image (located at https://bit.ly/2apR8in) containing a hot air balloon. Once the description is obtained, you'll use the matplotlib library to display the image, as well as the description and tags associated with the image.
To run the Python code, type the following command in Terminal:
$ python visionapi.py
Figure 15 shows the output of the Python code.
What if the image is located locally and not online? Listing 7 shows the modified Python code with a few additions.
Listing 7: Modified Python code for an image that is located locally
import requests
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import json
subscription_key = "<Subscription_Key>"
vision_api_url = "<a href="https://westcentralus.api.cognitive.microsoft.com/vision/v1.0/">https://westcentralus.api.cognitive.microsoft.com/vision/v1.0/</a>"
analyze_url = vision_api_url + "analyze"
#---name of image---
filename = "balloon.jpg"
#---read the image---
image = open(filename,'rb').read()
#---specify the type of data you are sending---
headers = { 'Content-Type': 'application/octet-stream', 'Ocp-Apim-Subscription-Key': subscription_key }
params = { 'visualFeatures': 'Categories,Description,Color' }
response = requests.post(analyze_url, headers=headers, params=params, data=image)
response.raise_for_status() # raise exception for error codes
analysis = response.json()
print(json.dumps(response.json()))
image_caption = analysis["description"]["captions"][0]["text"].capitalize()
image_subcaption = "Keywords: " + ",".join(analysis["description"]["tags"])
#---read the image locally---
plt.imshow(mpimg.imread(filename))
plt.axis("off") # don't show the x and y-axis
plt.suptitle(image_caption, fontsize = 18)
plt.title(image_subcaption, fontsize = 8)
plt.show()
You'll now be able to load the image locally, send it to the Vision API and get the response.
Using the Vision API in iOS
In the previous section, you saw how to use the Vision API in Python. Let's now use the Vision API in a mobile app. Let's start with iOS first.
Using Xcode, create a new Single View Application project and name it VisionAPI. In Main.Storyboard, add the following views to the View window (see Figure 16):
- Label
- ImageView
- TextView
Create outlets for the ImageView and TextView views in ViewController.swift (be sure to visually connect them to the views on the View window):
import UIKit
class ViewController: UIViewController {
@IBOutlet weak var lblResult: UITextView!
@IBOutlet weak var img1: UIImageView!
@IBOutlet weak var img2: UIImageView!
Add the statements in Listing 8 to the ViewController.swift file.
Listing 8: Adding the code to access the Vision API in iOS
import UIKit
class ViewController: UIViewController, URLSessionDelegate {
@IBOutlet weak var lblResult: UITextView!
@IBOutlet weak var img1: UIImageView!
@IBOutlet weak var img2: UIImageView!
var subscriptionKey = "<Subscription_Key>"
var baseURL = "<a href="https://westcentralus.api.cognitive.microsoft.com/">https://westcentralus.api.cognitive.microsoft.com/</a>"
+ "vision/v1.0/describe?maxCandidates=1"
//---billgates---
var image1URL = "https://bit.ly/2EHM3o8"
//---steve jobs and mark zuckerberg---
var image2URL = "https://bit.ly/2JeKfBB"
let activityView = UIActivityIndicatorView(activityIndicatorStyle: .whiteLarge)
func connectREST(method:String, apiUrl:String,
imageURL:String){
let config = URLSessionConfiguration.default
let url = URL(string:apiUrl)
let request = NSMutableURLRequest(url: url!)
//---method---
request.httpMethod = method
//---headers---
request.addValue("application/json", forHTTPHeaderField: "Content-Type")
request.addValue("\(subscriptionKey)", forHTTPHeaderField: "Ocp-Apim-Subscription-Key")
//---data---
let bodyData = "{'url':'\(imageURL)'}"
request.httpBody = bodyData.data(using: String.Encoding.utf8);
let session = Foundation.URLSession(
configuration: config,
delegate: self,
delegateQueue: nil)
session.dataTask(with: request as URLRequest,
completionHandler: {
(data, response, error) -> Void in
if error == nil {
let result = NSString(data: data!, encoding: String.Encoding.utf8.rawValue)
let statusCode =
(response as! HTTPURLResponse).statusCode
print("Status Code: \(statusCode) - " + (result! as String))
DispatchQueue.main.async {
self.lblResult.text = (result! as String)
self.activityView.stopAnimating()
}
} else {
print(error.debugDescription)
}
}).resume()
}
//---fired when user taps on an image---
@objc func imageTapped(tapGestureRecognizer: UITapGestureRecognizer)
{
let tappedImage = tapGestureRecognizer.view as! UIImageView
activityView.center = tappedImage.center
self.view.addSubview(activityView)
activityView.startAnimating()
if (tappedImage.tag == 0 ) {
connectREST(method: "POST", apiUrl: baseURL, imageURL: image1URL)
} else {
connectREST(method: "POST", apiUrl: baseURL, imageURL: image2URL)
}
}
In the Main.Storyboard file, set the Tag attribute of the second ImageView to 1 (see Figure 17). This allows the code to differentiate between the two ImageView views.
Add the following statements to the viewDidLoad() function in ViewController.swift:
override func viewDidLoad() {
super.viewDidLoad()
var url = URL(string: image1URL)
var data = try? Data(contentsOf: url!)
img1.image = UIImage(data: data!)
url = URL(string: image2URL)
data = try? Data(contentsOf: url!)
img2.image = UIImage(data: data!)
//---make the images tappable---
let tapGestureRecognizer1 =
UITapGestureRecognizer(
target: self,
action: #selector(
imageTapped(tapGestureRecognizer:)))
img1.isUserInteractionEnabled = true
img1.addGestureRecognizer(
tapGestureRecognizer1)
let tapGestureRecognizer2 =
UITapGestureRecognizer(
target: self,
action: #selector(
imageTapped(tapGestureRecognizer:)))
img2.isUserInteractionEnabled = true
img2.addGestureRecognizer(
tapGestureRecognizer2)
}
Press Cmd-R to deploy the application onto the iPhone Simulator. Tap one of the images and observe the message displayed on the screen (see Figure 18).
Using the Vision API in Android
Let's now work on a similar app, but this time for Android.
Using Android Studio, create a new project and name it UseMSCognitive. Add the uses-permission statement shown below to the AndroidManifest.xml file:
<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
package= "com.example.weimenglee.usemscognitive">
<uses-permission android:name="android.permission.INTERNET"/>
<application android:allowBackup="true"
android:icon="@mipmap/ic_launcher
android:label="@string/app_name"
android:roundIcon="@mipmap/ic_launcher_round"
android:supportsRtl="true"
android:theme="@style/AppTheme">
<activity android:name=".MainActivity">
<intent-filter>
<action android:name="android.intent.action.MAIN" />
<category android:name="android.intent.category.LAUNCHER" />
</intent-filter>
</activity>
</application>
</manifest>
This addition is needed because the app requires Internet access to reach the Vision API service. Next, update the activity_main.xml file as shown in Listing 9. The additions build the UI for the Android app.
Listing 9: Creating the UI for the Android app
<?xml version="1.0" encoding="utf-8"?>
<android.support.constraint.ConstraintLayout
xmlns:android="<a href="http://schemas.android.com/apk/res/android">http://schemas.android.com/apk/res/android</a>"
xmlns:app="<a href="http://schemas.android.com/apk/res-auto">http://schemas.android.com/apk/res-auto</a>"
xmlns:tools="<a href="http://schemas.android.com/tools">http://schemas.android.com/tools</a>"
android:layout_width="match_parent"
android:layout_height="match_parent"
tools:context="com.example.wml.usemscognitive.MainActivity">
<LinearLayout
android:layout_width="fill_parent"
android:layout_height="fill_parent"
android:layout_marginEnd="8dp"
android:layout_marginStart="8dp"
android:layout_marginTop="16dp"
android:orientation="vertical"
app:layout_constraintEnd_toEndOf="parent"
app:layout_constraintStart_toStartOf="parent"
app:layout_constraintTop_toTopOf="parent">
<ImageView
android:id="@+id/image1"
android:layout_width="fill_parent"
android:layout_height="200dp"
android:layout_marginBottom="8dp"
android:layout_marginEnd="8dp"
android:layout_marginStart="8dp"
android:layout_marginTop="8dp"
android:scaleType="centerCrop" />
<ImageView
android:id="@+id/image2"
android:layout_width="fill_parent"
android:layout_height="200dp"
android:layout_marginBottom="8dp"
android:layout_marginEnd="8dp"
android:layout_marginStart="8dp"
android:layout_marginTop="8dp"
android:scaleType="centerCrop" />
<ProgressBar
style="?android:attr/progressBarStyle"
android:id="@+id/progressBar"
android:layout_width="match_parent"
android:layout_height="wrap_content" />
<TextView
android:id="@+id/result"
android:layout_width="match_parent"
android:layout_height="wrap_content"
android:textSize="20sp"
android:text="" />
</LinearLayout>
</android.support.constraint.ConstraintLayout>
Finally, add the statements to the MainActivity.java file, as shown in Listing 10.
Listing 10: Additions to the MainActivity.java file
package com.example.weimenglee.usemscognitive;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.os.AsyncTask;
import android.os.Handler;
import android.os.Looper;
import android.support.v7.app.AppCompatActivity;
import android.os.Bundle;
import android.util.Log;
import android.view.View;
import android.widget.ImageView;
import android.widget.ProgressBar;
import android.widget.TextView;
import android.widget.Toast;
import org.json.JSONObject;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import javax.net.ssl.HttpsURLConnection;
public class MainActivity extends AppCompatActivity {
VisionAPI api;
String baseURL =
"<a href="https://westcentralus.api.cognitive.microsoft.com/">https://westcentralus.api.cognitive.microsoft.com/</a>" + "vision/v1.0/describe?" + "maxCandidates=1";
String subscriptionKey = "<Subscription_Key>";
String imageURL;
String image1URL = "https://bit.ly/2JgzZbQ";
String image2URL = "https://cnb.cx/2RbEOWY";
ImageView imageView1;
ImageView imageView2;
TextView txtResult;
ProgressBar progressBar;
//---download the photos from the web and display them on screen---
private class DownloadImageTask extends
AsyncTask<String, Void, Bitmap> {
ImageView bmImage;
public DownloadImageTask(ImageView bmImage) {
this.bmImage = bmImage;
}
protected Bitmap doInBackground(String... urls) {
String urldisplay = urls[0];
Bitmap mIcon11 = null;
try {
InputStream in = new
java.net.URL(urldisplay).openStream();
mIcon11 = BitmapFactory.decodeStream(in);
} catch (Exception e) {
Log.e("Error", e.getMessage());
e.printStackTrace();
}
return mIcon11;
}
protected void onPostExecute(Bitmap result) {
bmImage.setImageBitmap(result);
}
}
//---access the Vision API---
private class VisionAPI extends AsyncTask
<Void, String, String> {
public String url;
public String method;
public int response = -1;
@Override
protected String doInBackground(Void... params) {
String urlString = this.url;
int BUFFER_SIZE = 2000;
InputStream in;
try {
URL url = new URL(urlString);
URLConnection conn = url.openConnection();
try {
HttpsURLConnection httpConn =
(HttpsURLConnection) conn;
httpConn.setAllowUserInteraction(false);
httpConn.setInstanceFollowRedirects(true);
httpConn.setRequestMethod(this.method);
httpConn.setRequestProperty("Content-Type",
"application/json");
httpConn.setRequestProperty(
"Ocp-Apim-Subscription-Key",
subscriptionKey);
if (this.method == "POST") {
httpConn.setDoOutput(true);
JSONObject jsonObj = new JSONObject(
"{'url':'" + imageURL + "'}");
BufferedWriter wr =
new BufferedWriter(new
OutputStreamWriter(
httpConn.getOutputStream()));
wr.write(jsonObj.toString());
wr.flush();
}
httpConn.connect();
this.response = httpConn.getResponseCode();
in = httpConn.getInputStream();
} catch (Exception ex) {
throw new IOException("Error connecting");
}
} catch (IOException e) {
Log.d("REST Service", e.getLocalizedMessage());
return "";
}
InputStreamReader isr = new InputStreamReader(in);
int charRead;
String str = "";
char[] inputBuffer = new char[BUFFER_SIZE];
try {
while ((charRead = isr.read(inputBuffer)) > 0) {
//---convert the chars to a String---
String readString =
String.copyValueOf(inputBuffer, 0,
charRead);
str += readString;
inputBuffer = new char[BUFFER_SIZE];
}
in.close();
} catch (IOException e) {
Log.d("REST Service", e.getLocalizedMessage());
return "";
}
return str;
}
protected void onPostExecute(String result) {
if (result != "") {
txtResult.setText(result);
progressBar.setVisibility(View.GONE);
}
}
}
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
imageView1 = findViewById(R.id.image1);
imageView2 = findViewById(R.id.image2);
txtResult = findViewById(R.id.result);
progressBar = findViewById(R.id.progressBar);
new DownloadImageTask(imageView1).execute(image1URL);
new DownloadImageTask(imageView2).execute(image2URL);
progressBar.setVisibility(View.GONE);
imageView1.setOnClickListener(new View.OnClickListener()
{
public void onClick(View v) {
api = new VisionAPI();
api.url = baseURL;
api.method = "POST";
imageURL = image1URL;
progressBar.setVisibility(View.VISIBLE);
api.execute();
}
});
imageView2.setOnClickListener(new View.OnClickListener()
{
public void onClick(View v) {
api = new VisionAPI();
api.url = baseURL;
api.method = "POST";
imageURL = image2URL;
progressBar.setVisibility(View.VISIBLE);
api.execute();
}
});
}
}
Run the application on the Android emulator. Clicking on either image will display the result shown in Figure 19.
Custom Vision
Phew! You've seen quite a lot in action. But I saved the coolest for last. One of my favorite features in the Microsoft Cognitive Services is the Custom Vision Service. The Custom Vision Service allows you to train learning models to recognize your own custom images. For example, you can use the Custom Vision Service to build a learning model to recognize images of durians. Best of all, the learning model that you've trained can be exported as a CoreML or TensorFlow model for use in mobile applications. The best way to learn more about this is to try it out. So let the fun begin!
Using your Web browser, go to https://www.customvision.ai/ and sign in. Click NEW PROJECT (see Figure 20).
Enter the following details and then click Create project (see Figure 21).
Click Add images (see Figure 22). Select the series of durian images that you have prepared beforehand (you can always use Google to find a series of durian images).
Assign the tag of “durian” and click the Upload 19 images button (see Figure 23).
Once the images are uploaded, click the Train button (see Figure 24).
Once the training is done, click the Quick Test button (see Figure 25).
Click the Browse local files (see Figure 26) button and select a test image.
Figure 27 shows the result of the test. I uploaded an image containing a soursop, and the model is only 3.3% confident that it's a durian.
Figure 28 shows another test image. This time, the model is 95% confident that it's a durian!
Interestingly, when I used a test image of a jackfruit (see Figure 29), the model was 97.6% confident that it was a durian. This is because I used only 19 images for training. With more images showing different parts of the durian, I'd get a more accurate prediction.
Obtaining the Prediction API
Once the model is trained, you want to publish it so that it's available programmatically. To do so, click the Performance tab (see Figure 30) and then click the Prediction URL link.
You will see the information as shown in Figure 31.
Take note of the two URLs and the Prediction-Key.
Using cURL to Access the Prediction API
Now that the learning model is published publicly, you can test it using cURL. For this, you'll make use of the image (see Figure 32) located at: https://bit.ly/2yKVhdE.
In Terminal, type the following command (the <Prediction_Image_URL> refers to the first URL that you saw in Figure 31):
$ curl -H 'Prediction-Key:
<Prediction_Key>'
-H 'Content-Type: application/json'
-d '{"url":"https://bit.ly/2yKVhdE"}'
'<Prediction_Image_URL>'
You should see the following output:
{
"id": "380fa20b-e483-442f-ae16-f5b12e484858",
"project": "0104d173-00b2-453d-926c-0d78b74edf42",
"iteration": "048cde7d-ceee-4dc8-9687-0aefb5fce986",
"created": "2018-10-31T04:17:06.8759444Z",
"predictions": [{
"probability": 0.9991205,
"tagId": "78439faa-4e37-461c-9ed7-e9a0f7a60501",
"tagName": "durian"
}]
}
The result indicates that the model is 99.91% confident that the image is a durian.
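If you want to call the Prediction API from Python instead of cURL, a minimal sketch looks like this (the <Prediction_Key> and <Prediction_Image_URL> placeholders stand for the values you noted from Figure 31):
import requests
prediction_key = "<Prediction_Key>"        # the Prediction-Key from Figure 31
prediction_url = "<Prediction_Image_URL>"  # the image-URL endpoint from Figure 31
headers = {"Prediction-Key": prediction_key,
           "Content-Type": "application/json"}
data = {"url": "https://bit.ly/2yKVhdE"}   # the durian test image
response = requests.post(prediction_url, headers=headers, json=data)
response.raise_for_status()
# Print every tag and its probability, most confident first.
for p in sorted(response.json()["predictions"],
                key=lambda x: x["probability"], reverse=True):
    print("{}: {:.2%}".format(p["tagName"], p["probability"]))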
Exporting the Model to CoreML (iOS)
Now that the model has been tested and works correctly, let's export it so that you can use it in your iOS and Android apps.
Using your Web browser, go to: https://www.customvision.ai/projects. Select the Durians project (see Figure 33).
Click on the gear icon, as shown in Figure 34.
Scroll down the page and select General (compact) and then click Save Changes (see Figure 35).
Click the Train button located at the top of the page. Then, click the Performance tab and then click Export (see Figure 36).
Select the iOS platform (see Figure 37). You will download the TensorFlow (TF) model later to use on the Android platform.
Click Export and then Download. A file named something like 0bbfed18d17a43258828d3a710b8b9e8.mlmodel will now be downloaded onto your computer. Rename it Durians.mlmodel.
To make it really easy to test your iOS app, Microsoft has a sample app already written for you. Download the sample code for iOS at: https://github.com/Azure-Samples/cognitive-services-ios-customvision-sample.
Launch the sample code using Xcode. Drag and drop the Durians.mlmodel file that you have downloaded earlier into the project (see Figure 38).
In the ViewController.swift file, locate the following block of code and update the model name to Durians:
let targetImageSize = CGSize(width: 227, height: 227)
// must match model data input
lazy var classificationRequest: [VNRequest] =
{
do {
// Load the Custom Vision model.
// To add a new model, drag it to the
// Xcode project browser making sure
// that the "Target Membership" is
// checked.
// Then update the following line with
// the name of your new model.
let model = try VNCoreMLModel(for:
Durians().model)
let classificationRequest =
VNCoreMLRequest(model: model,
completionHandler: self.handleClassification)
return [ classificationRequest ]
} catch {
fatalError(
"Can't load Vision ML model: \(error)")
}
}()
Deploy the application onto an iOS device. Point the camera at the picture of a durian and observe that the app displays a pop-up with the word “durian” (see Figure 39).
Exporting the Model to TensorFlow (Android)
Finally, let's see how to use the trained model on Android. Back on the Export page, click TF to download the model for TensorFlow. Click Export and then Download.
Once the zip file is downloaded, unzip it; it should contain two files:
- labels.txt
- model.pb
Like its iOS counterpart, Microsoft has also written a sample Android app for you to quickly test your model. Download the sample Android app from: https://github.com/Azure-Samples/cognitive-services-android-customvision-sample. Once the project is downloaded, open it in Android Studio.
Replace the two files in the assets folder with the ones that you have just downloaded (see Figure 40).
Deploy the application on an Android device. Point the camera at the picture of a durian and observe that the app displays a pop-up with the word “durian” (see Figure 41).
Summary
In this article, I've walked you through some of the services in the Face and Vision APIs available in the Microsoft Cognitive Services, and I've shown you how to integrate them into your Python, iOS, and Android apps. The final section covered the Custom Vision Service, which is particularly interesting and hopefully has given you some good ideas about the types of applications you can build using the Microsoft Cognitive Services.