
Hitesh Baldaniya

Hitesh Baldaniya is a Technical Architect. He has six years of experience in backend development, designing scalable system architectures, debugging, and problem solving. His interests include learning new technologies and listening to music.

Posts by Hitesh Baldaniya

Jul 02, 2021

Principles of Effective RESTful API Design

Highlights

You'll learn how to:

- Use JSON for communication: APIs should accept and respond with JSON for efficient data exchange.
- Use nouns in endpoint paths: Avoid verbs to maintain clarity and simplicity.
- Name collections with plurals: Enhances readability and understanding.
- Adhere to widely accepted standards: Ensures compatibility and interoperability.
- Secure your APIs: Protecting sensitive data and maintaining user trust is crucial.

Keep reading to take your API design to the next level!

The RESTful style of API has been around for more than 20 years. It is one of the most common ways for clients, including browser applications, mobile apps, and IoT devices, to talk to servers. If you are developing an app and have reached a stage where you are ready to create public APIs, it is worth pausing to ensure that you are on the right track. It's difficult to make drastic changes to your APIs once they are out, so getting as much right as possible from the beginning makes sense. And since these APIs would form the core of your application, they should be:

- Secure, fast, and flexible for easy scaling
- Built using common and widely accepted standards
- Easily understandable and consumable, enabling quick integration
- Supported with good documentation that explains semantics and syntax

Contentstack serves billions of API requests every month without glitches, and it's flexible enough to scale to 10x the current volume in a flash. This blog post shares some of the best practices we follow while developing our APIs. These tips should be helpful to anyone starting to build REST APIs.

Make Your APIs Secure

When developing APIs, security should come first. This is a must. Since API calls travel over the Internet, requests can come from anywhere. When it comes to API security, there is no reason not to use encryption: use SSL/TLS, always. TLS is a cryptographic protocol that provides secure, encrypted communication over a computer network, and it is a straightforward and inexpensive way to encrypt requests and responses. It encrypts data between an API client and an API server, preventing the data from being read if it is intercepted between point A and point B. In short, TLS ensures the encryption of data in transit.

Another potential security threat is long-lived authentication or authorization tokens. A best practice is to make them short-lived. You can do this with custom API key management built on low-overhead standards such as OAuth or JWT. Short-lived API tokens are much easier to use and significantly more secure.
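As a minimal illustration of these two points, every call should be served over HTTPS only and should carry a short-lived credential rather than a permanent key. The host name and token below are placeholders used purely for the sketch:

GET /v1/products HTTP/1.1
Host: api.example.com
Authorization: Bearer <short-lived OAuth/JWT access token>
Accept: application/json

The idea is simply that the endpoint is reachable over https:// only and that the token expires after minutes or hours, not months.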
Define Requests Clearly

It all starts with defining your requests in a way that makes your API easy to use, reduces ambiguity, and brings consistency. Let's look at some ways to achieve that.

Follow request path conventions

Make use of resource names: Your request path should contain the name of the resource the API interacts with. For example, if your app provides "products," use "/products" as a noun in the API request. Avoid using verbs with resource names in the request (e.g., /api/create-products), since the request's HTTP method should convey the verb, as explained below.

Use HTTP methods: Most developers are familiar with HTTP methods such as GET, POST, PUT, and DELETE. These are the common types of API requests developers make on resources. So, something like "GET /products" indicates that the request is to fetch products.

Use plural forms: Use plural forms for all resource names. For example, "/products" fetches all products, whereas "/products/20" fetches the single product with an ID of "20." Plural forms make your requests more consistent and intuitive.

Use a nested hierarchy: Provide a structured hierarchy for nested or related resources, so it's easier to work with a specific item or sub-item, but keep the nesting depth to a minimum. For example, for products that have reviews and ratings associated with them, you can define the relationship as follows:

GET /products/:product_id/reviews
POST /products/:product_id/reviews
PUT /products/:product_id/reviews/:review_id

Most cases map a resource to a path. Still, certain exceptions are relevant to functionality and do not have a resource directly associated with them, such as /search or /bulk-actions. You can define such requests by their associated actions.

Use a Standard Exchange Format

REST can use many exchange formats, such as plain text, XML, and CSV, but go with JSON. JSON is a lightweight data format that allows for fast encoding and decoding on the server side. It is easily consumed by different channels such as browsers, mobile devices, and IoT devices, it is available in most technologies, and it is now the de facto standard for most developers. To ensure that your API uses JSON, set the "Content-Type: application/json" header on requests and responses. Use consistent casing, such as lowercase (recommended), for request body fields.

For non-textual formats, use "multipart/form-data," which lets you send files over HTTP. While it also allows sending textual or numerical data, restrict its usage to sending files and use JSON for textual data. For this format, set "Content-Type: multipart/form-data" in the request header. The response headers may vary based on the type of file served (e.g., images, applications, PDFs, and documents).

Provide Standard Headers

Request headers are used to transfer additional information from clients. Follow the standard headers to share information such as content type, basic authentication, content length, accept, accept-encoding, and user agent. We recommend transferring security and authentication parameters, such as user tokens or secret tokens, in the headers.

Version Your APIs

As you introduce more features and update existing ones, you may have to make minor or major changes to your APIs. Changes are inevitable as you grow, and the best way to manage them is through versioning. Versioning your APIs ensures that you continue to support older APIs while releasing newer ones systematically. It also prevents users from hitting invalid URLs. Each change should be versioned, and your APIs should be backward compatible. While doing this, make sure users have a way to migrate to the latest version.

For significant changes, the version should be in the request URL. Examples:

GET /v1/products - Version 1 of products
GET /v2/products - Version 2 of products

For minor changes or fixes, pass the version in a header, where you can use a date to track the changes, as shown in the following example:

GET /v1/products
Header: Accept-Version: "2019-11-01"
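Putting the format, header, and versioning guidance together, a product-creation request and its response might look like the following. The host, fields, and token are hypothetical and only for illustration:

POST /v1/products HTTP/1.1
Host: api.example.com
Content-Type: application/json
Accept: application/json
Accept-Version: "2019-11-01"
Authorization: Bearer <short-lived token>

{ "title": "Blue T-shirt", "price": 20, "colors": ["blue"] }

HTTP/1.1 201 Created
Content-Type: application/json

{ "product": { "id": 21, "title": "Blue T-shirt", "price": 20, "colors": ["blue"], "created_at": "2021-07-02T10:00:00Z" } }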
Offer Ways to Filter, Paginate, Sort, and Search

Filtering: Design your APIs so that stored data can be filtered or queried via specific parameters. For example, users may want to filter products by tags, categories, or price range. You can provide filters like this: /v1/products?query={tags: ["tag1", "tag2"]}.

Pagination: Provide the ability to paginate responses so that users can request just what they need. For example, a mobile user may want only the first five entries for the homepage, whereas web users may want 25 entries in a single request. There are various ways to provide pagination, and the more flexibility you offer, the better it is for users. "skip," "limit," and "per_page" are some of the most common parameters and provide enough flexibility for pagination. It is, however, also important to define certain restrictions, for example a maximum number of items that can be fetched per page, so that users do not exploit your services.

/v1/products?skip=100&limit=10&per_page=10

Sorting: Sorting provides the ability to get the list of items in a desired order. You can provide options to sort in ascending or descending order based on the values of certain fields, such as updated_at or price, and users can combine these options for more flexibility. For example: /v1/products?sort=-price,updated_at.

Searching: Searching is different from filtering or passing basic queries. It should provide a full-text search, helping users find relevant results faster. For example, when the user runs /v1/products?text={{search}}, your API should check for relevant terms in the values of all the fields.

Field Selections

Your APIs should give users the flexibility to fetch only the details they need instead of everything: no over-fetching or under-fetching. Field selection is especially helpful for low-computing devices (such as mobile or IoT), as it saves bandwidth and improves overall network transmission speed. You can do this by offering ways within the request to include or exclude the data of specific fields. For example:

/v1/products?skip_fields[]=created_at,updated_at
/v1/products?select_fields[]=title,price,color

You can extend this further by offering ways to include the details of other related items in a single request to save on network traffic. For example:

/v1/products?include=categories,colors,brands&select_fields[categories]=title,price,color

Structure Responses the Right Way

Your API requests need appropriate responses. You can achieve this by following certain standard practices. Here are some of the important ones.

Response Status Codes

Use the standard HTTP status codes in API responses to inform users about success or failure. While there are broad categories for these codes, each code has a specific meaning and should be used to indicate it precisely. Here are the categories of codes that you can use:

- Informational responses (100–199)
- Successful responses (200–299)
- Redirects (300–399)
- Client errors (400–499)
- Server errors (500–599)

You can find more details on status codes here: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status

Response Body

Request errors should deliver proper error messages so that the user is informed about the exact issue. Attach to each resource its system-provided metadata, such as created_at, updated_at, created_by, and updated_by. Each successful response should be wrapped in a proper wrapper according to the path, so that additional meta details about the response can be included. Example:

GET /v1/users?limit=10

{
  "users": [
    { "first_name": "Amar", "last_name": "Akbar", ... },
    { "first_name": "Alpha", "last_name": "Beta", ... }
  ],
  "skip": 0,
  "limit": 10
}
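Error responses deserve the same care as successful ones. The exact shape is up to you; as a hypothetical sketch, a validation failure could pair a 4xx status code with a body that names the problem precisely:

HTTP/1.1 422 Unprocessable Entity
Content-Type: application/json

{
  "error_message": "The product could not be created.",
  "error_code": 422,
  "errors": {
    "price": ["is required and must be a positive number"]
  }
}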
Headers

Response headers play an essential role in telling clients how the server handled the request. Make sure your headers support the following before you make your APIs public.

RequestID: This response header helps debug a specific request without asking the customer different questions, such as the request type, time, and so on. It lets you trace the entire request lifecycle on the server. You can use the "X-Request-Id: {{TRACE_ID}}" header in the response.

CORS: Your APIs should support CORS if they are available for public consumption. CORS (cross-origin resource sharing) is an HTTP-header-based mechanism that allows a server to indicate origins other than its own from which a browser should permit the loading of resources. If your APIs are not publicly consumable or are restricted, it is best to keep them closed by providing restricted CORS values. You can find more information about CORS here: https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS

Rate Limit: To protect your system from unwanted attacks and abuse, add rate limiting to your APIs. Rate limiting restricts users from making more than the defined number of requests in a given time period. While implementing this, a good practice is to let users know about the limits through the following headers:

- X-RateLimit: Maximum requests allowed in a given time period
- X-RateLimit-Remaining: Requests remaining in the current time period
- X-RateLimit-Reset: Time remaining in the current period

Compression: Your API should support compression of responses. Compression saves a lot of time in the network transmission of the data. Some of the popular compression types are gzip and brotli.

Cache Headers: Cache headers are mostly provided on GET calls to give information about the behavior of the request, more specifically to determine whether the request was served from the cache or the origin server. You can use the following header for this: X-Cache: "HIT/MISS". Let's look at some of the other headers you can use:

Cache-Control: This header allows the client to keep the resource stored on their end and not make requests until the cache is cleared or expires.

Etags: When generating a response, include an ETag header containing a hash or checksum of the representation. If an inbound HTTP request contains an If-None-Match header with a matching ETag value, the API should return a 304 Not Modified status code instead of the full representation of the resource (see the sketch at the end of this post).

Have Adequate Documentation and Tools in Place

Most developers prefer reading the documentation before trying out your APIs or attempting any integration. It is, therefore, crucial to make your docs publicly available and easy to understand. Tools like Swagger help you generate API documentation during the development phase. Ensure that your docs record change logs, version details, and deprecation notices whenever required. Provide examples of requests and responses so users know what to expect. To make things even easier, you can also provide Postman collections that allow users to configure the environment and try out API requests through their own accounts.

Since APIs are consumed by developers, following good design practices is essential to delivering a well-designed and effective developer experience.
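Finally, to make the ETag flow described in the Cache Headers section concrete, here is a hypothetical exchange; the path and ETag value are placeholders:

First request:

GET /v1/products/20 HTTP/1.1
Host: api.example.com

HTTP/1.1 200 OK
ETag: "33a64df551425fcc55e4d42a148795d9f25f89d4"
Cache-Control: max-age=60

(product representation in the body)

Subsequent request, sending the cached ETag back:

GET /v1/products/20 HTTP/1.1
Host: api.example.com
If-None-Match: "33a64df551425fcc55e4d42a148795d9f25f89d4"

HTTP/1.1 304 Not Modified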

Oct 19, 2020

The Benefits of Contentstack’s CDN

For most web properties, delivering content fast is of the utmost importance. That's why Content Delivery Networks (CDNs) exist. A CDN is a network of caching servers scattered across the globe that caches content regularly and delivers the cached content to nearby requests. This improves content delivery and page load time dramatically, since requests do not need to travel to the origin server. Read more about CDNs.

Most website owners consider setting up a CDN. But with Contentstack as your CMS, you don't need to worry about setting up any caching mechanism, since Contentstack comes with a CDN. Lightning-fast content delivery comes by default. Let's look at some of the other CDN benefits that you get with Contentstack.

Leverage our CDN to serve high traffic

We have partnered with a modern CDN provider with more than 60 cache servers worldwide, capable of delivering cached content in a fraction of a second. Additionally, there is no upper limit on the number of GET requests per second you can make to the CDN. This means you do not need any additional caching mechanism or CDN, even if your website gets incredibly high traffic every day. In the rare event that Contentstack is down, the CDN servers will continue to deliver cached content to visitors.

Keep your content fresh with Cache Purging

Purging refers to the removal of cached content from the CDN servers. Your cached content remains on the Contentstack CDN servers for up to one year, after which it is cleared automatically. However, whenever any content is published, unpublished, or deleted, the changed content (and some related content, such as referenced entries and assets) is purged from the CDN servers instantly. Read about how cache purging works with Contentstack.

After the cache is purged, for subsequent page requests, the CDN fetches the new content from the origin server, delivers it to the requester, and saves the updated cache. This process ensures that your website visitors always get fresh, updated content.

If an asset or image is updated, the cache is not purged. The updated asset or image gets a new URL, and the old version remains available at the previous URL for one year.

Near Realtime Cache Purging

Cache purging happens in near realtime. When any content is published, unpublished, or deleted, the cache is purged instantly, so subsequent requests are served updated content from the origin server.

Cache Optimized for Fewer Requests to the Origin Server

Contentstack's intelligent caching mechanism purges the cache of only the content that has changed (published, unpublished, or deleted) and related content (e.g., referenced entries and assets) from the CDN servers. Also, only the cache of the specific locale and environments involved is purged. Purging happens only for the cached items that changed; the cache of other, unchanged content remains intact. This translates into fewer requests to the origin server. You can refer to our Cache Purging Scenarios doc for more details.

Maintenance-free and Economical

Having a CDN within your CMS means you don't have to choose a CDN from the hundreds of available options, set it up, and maintain it forever. Nor do you have to worry about the CDN being compatible with the CMS, defining cache purging rules, and so on. It's all set up and ready to use. Additionally, this is much more cost-effective than a separate third-party CDN, even if you use and pay for over-usage with Contentstack.
From the above points, it is clear that you do not need a separate CDN for your web properties if you use Contentstack as your CMS. However, if you still want to set up a separate CDN (or if you already have one), there are certain things to keep in mind when using Contentstack. We will cover using other CDNs with Contentstack in a follow-up blog post.

Oct 06, 2020

Elasticsearch: Working With Dynamic Schemas the Right Way

Elasticsearch is an incredibly powerful search engine. However, to fully utilize its strength, it's important to get the mapping of documents right. In Elasticsearch, mapping refers to the process of defining how documents, along with their fields, are stored and indexed. This article dives into the two types of schemas (strict and dynamic) that you usually encounter when dealing with different types of documents. Additionally, we look at some common but useful best practices for working with dynamic schemas so that you get accurate results for even the most complex queries. If you are new to Elasticsearch, we recommend reading and understanding the related terms and concepts before starting.

Schema Types, Their Mapping, and Best Practices

Depending on the type of application you are using Elasticsearch for, the documents could have a strict schema or a dynamic schema. Let's look at the definition and examples of each, and learn more about their mapping.

Strict Schema - The Simple Way

A strict schema follows a rigid format, with a predefined set of fields and their respective data types. For example, systems like logs, analytics, and application performance monitoring (APM) tools have strict schema formats. With such schemas, you know that all the indexed documents have a known data structure, which makes it easier to load the data into Elasticsearch and get accurate results for queries. Let's look at an example to understand this better. The following snippet shows the data of an Nginx log entry:

{ "date": "2019-01-01T12:10:30Z", "method": "POST", "user_agent": "Postman", "status": 201, "client_ip": "0.0.0.0", "url": "/api/users" }

All log entries within Nginx use the same data structure. The fields and data types are known, so it becomes easy to define these fields in the Elasticsearch mapping, as shown below:

{ "mappings": { "properties": { "date": { "type": "date" }, "method": { "type": "keyword" }, "user_agent": { "type": "text" }, "status": { "type": "long" }, "client_ip": { "type": "ip" }, "url": { "type": "text" } } } }

Defining the fields, as shown above, makes it easy for Elasticsearch to return the relevant results for any query.

Non-Strict Schema Challenges and How to Overcome Them

There are several applications where the schema of the documents is not fixed and varies a lot. An apt example is the various structures that you define in a content management system (CMS). Different types of pages (for example, navigation, home page, products) may have different fields and data types. In such cases, if you don't provide any mapping specifications, Elasticsearch can identify new fields and generate the mapping dynamically. While this, in general, is a great ability, it may often lead to unexpected results. Here's why: when documents have a nested JSON structure, Elasticsearch's dynamic mapping does not identify the inner objects. It flattens the hierarchical objects into a single list of field and value pairs. So, for example, if the document has the following data:

{ "group" : "participants", "user" : [ { "first" : "John", "last" : "Doe" }, { "first" : "Rosy", "last" : "Woods" } ] }

the relation between "Rosy" and "Woods" is lost. A query that asks for "Rosy AND Woods" will actually return this document as a match, even though no such user exists.
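Under the hood, the inner objects are collapsed into flat arrays of values, so the document above is effectively indexed like this, which is why the cross-object match succeeds:

{
  "group": "participants",
  "user.first": [ "John", "Rosy" ],
  "user.last": [ "Doe", "Woods" ]
}

With the association between first and last names gone, any combination of the values matches.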
So, What's the Solution to This?

The best way to avoid such flat storage and inaccurate query results is to use the nested data type for such fields. The nested type is a specialized version of the object data type that allows arrays of objects to be indexed in a way that lets them be queried independently of each other. This makes sure that the relation between the objects, if any, is maintained, and that queries return accurate results.

The following example shows how you can add a generic schema for all pages of a CMS application:

{
  "mappings": {
    "properties": {
      "doc_type": { "type": "keyword" },
      "doc_id": { "type": "long" },
      "fields": {
        "type": "nested",
        "properties": {
          "field_uid": { "type": "keyword" },
          "value": {
            "type": "text",
            "fields": { "raw": { "type": "keyword" } }
          }
        }
      }
    }
  }
}

Note that "fields" uses the nested data type, which is the important part. Now let's look at a couple of examples where different types of input objects can be ingested into a single index.

Example data 1:

{ "first_name": "ABC", "last_name": "BCD", "city": "XYZ", "address": "Flat no 1, Dummy Apartment, Nearest landmark", "country": "India" }

You can convert this data into the Elasticsearch document shown below:

{ "doc_type": "user", "doc_id": 500001, "fields": [{ "field_uid": "first_name", "value": "ABC" }, { "field_uid": "last_name", "value": "BCD" }, { "field_uid": "city", "value": "XYZ" }, { "field_uid": "address", "value": "Flat no 1, Dummy Apartment, Nearest landmark" }, { "field_uid": "country", "value": "India" }] }

Example data 2:

{ "title": "ABC Product", "product_code": "PRODUC_001", "description": "Above product description colors, sizes and prices", "SKU": "123123123123", "colors": ["a", "b", "c"], "category": "travel" }

{ "doc_type": "product", "doc_id": 100001, "fields": [{ "field_uid": "title", "value": "ABC Product" }, { "field_uid": "product_code", "value": "PRODUC_001" }, { "field_uid": "description", "value": "Above product description colors, sizes and prices" }, { "field_uid": "SKU", "value": "123123123123" }, { "field_uid": "colors", "value": ["a", "b", "c"] }, { "field_uid": "category", "value": "travel" }] }

This type of mapping makes it easier to perform a search across multiple types of documents within one index. For example, let's search for users where "country" is set to "India" and for products where "category" is set to "travel":

GET /{{INDEX_NAME}}/_search
{ "query": { "nested": { "path": "fields", "query": { "bool": { "should": [ { "bool": { "must": [ { "match": { "fields.field_uid": "country" } }, { "match": { "fields.value": "India" } } ] } }, { "bool": { "must": [ { "match": { "fields.field_uid": "category" } }, { "match": { "fields.value": "travel" } } ] } } ] } } } } }

In Conclusion

If you are certain that your documents follow a strict schema, you don't need to structure your data in the nested data type format; follow the pattern shown in the "Strict Schema" section to load your data into Elasticsearch. If, however, your documents are not likely to follow a strict schema, we highly recommend that you store the data in the nested format, which helps you consolidate all types of documents under a single index with a uniform mapping.

May 28, 2020

Why and When to Use GraphQL

Highlights

You'll learn the following:

- Efficient data retrieval: GraphQL fetches only the specific data requested, eliminating unnecessary network requests and reducing over-fetching.
- No versioning required: Unlike REST, GraphQL eliminates the need to maintain multiple versions because the resource URL stays constant.
- Schema stitching: GraphQL can combine multiple schemas into one, which is ideal for a microservices architecture.
- Field resolution: GraphQL provides the flexibility to define aliases for fields and resolve them into different values.
- Adaptable: GraphQL shines where nested data needs to be retrieved, bandwidth is limited, or in composite and proxy pattern applications.

Keep reading to learn more!

REST is an API design architecture that has become the norm for implementing web services over the last few years. It uses HTTP to get data and perform operations (POST, GET, PUT, and DELETE) in JSON format, allowing better and faster data parsing. However, like all technologies, REST APIs come with some limitations. Here are some of the most common ones:

- They fetch all data, whether required or not (a.k.a. "over-fetching").
- They make multiple network requests to get multiple resources.
- Sometimes resources depend on one another, which causes waterfall network requests.

To overcome these, Facebook developed GraphQL, an open-source data query and manipulation language for APIs. Since then, GraphQL has gradually entered the mainstream and become a new standard for API development.

GraphQL is a syntax for requesting data; it's a query language for APIs. The beauty, however, lies in its simplicity. It lets you specify precisely what is needed, and then it fetches just that: nothing more, nothing less. And it provides numerous other advantages. The following covers some of the most compelling reasons to use GraphQL and looks at some common scenarios where GraphQL is useful.

Why Use GraphQL?

Strongly-Typed Schema

All the data types (such as Boolean, String, Int, Float, ID, and custom scalars) supported by the API are specified in the schema using the GraphQL Schema Definition Language (SDL), which helps determine what data is available and the form in which it exists. This strongly typed schema makes GraphQL less error-prone and provides additional validation. GraphQL also enables auto-completion in supported IDEs and code editors.

Fetch Only the Requested Data (No Over- or Under-Fetching)

With GraphQL, developers can fetch exactly what is required: nothing less, nothing more. The ability to deliver only the requested data solves the issues arising from over-fetching and under-fetching.

Over-fetching happens when the response returns more data than is required. Consider the example of a blog home page. It displays the list of all blog posts (just the titles and URLs). However, to present this list with a typical REST API, you must fetch all the blog posts (along with body data, images, etc.) and then show only what is required, usually through UI code. Over-fetching impacts your app's performance and consumes more data, which is expensive for the user. With GraphQL, you define the fields you want to fetch (i.e., title and URL, in this case), and it fetches the data of only these fields.

Under-fetching, on the other hand, happens when a single API request does not return adequate data, so you must make additional requests to get related or referenced data. For instance, while displaying an individual blog post, you must also fetch the referenced author's profile entry to display the author's name and bio. GraphQL handles this well: it lets you fetch all relevant data in a single query.
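For instance, assuming a hypothetical blog schema that exposes posts with an author field, the list view and the detail view above could each be served by a query that names exactly the fields it needs:

# List view: only titles and URLs are returned, nothing else
query {
  posts {
    title
    url
  }
}

# Detail view: the post and its referenced author in one round trip
query {
  post(id: "123") {
    title
    body
    author {
      name
      bio
    }
  }
}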
Saves Time and Bandwidth

GraphQL allows multiple resource requests to be combined in a single query call, which saves time and bandwidth by reducing the number of network round trips to the server. It also helps prevent waterfall network requests, where dependent resources must be resolved after previous requests complete. For example, consider a blog's homepage where you must display multiple widgets, such as recent posts, the most popular posts, categories, and featured posts. With a REST architecture, displaying these would take at least five requests, while the same scenario using GraphQL requires just a single request.

Schema Stitching for Combining Schemas

Schema stitching allows multiple schemas to be combined into a single schema. This is very useful in a microservices architecture, where each microservice handles the business logic and data for a specific domain. Each microservice can define its own GraphQL schema, after which you use schema stitching to weave them into one schema accessible to the client.

Versioning is Not Required

In a REST architecture, developers create new versions (e.g., api.domain.com/v1/, api.domain.com/v2/) as resources or their request/response structures change over time, so maintaining versions is a common practice. With GraphQL, there is no need to maintain versions: the resource URL or address remains the same. You can add new fields and deprecate older fields. This approach is intuitive, as the client receives a deprecation warning when querying a deprecated field (see the schema sketch before the conclusion below).

Transform Fields and Resolve with the Required Shape

A user can define an alias for a field, and each alias can be resolved into a different value. Consider an image transformation API where a user wants to transform multiple types of images using GraphQL. The query looks like this:

query {
  images {
    title
    thumbnail: url(transformation: {width: 50, height: 50})
    original: url
    low_quality: url(transformation: {quality: 50})
    file_size
    content_type
  }
}

Apart from the advantages listed above, there are a few other reasons why GraphQL works well for developers:

- Tools such as GraphQL Editor and GraphQL Playground can present the entire API design from the provided schema.
- The GraphQL ecosystem is growing fast. For example, Gatsby, the popular static site generator, uses GraphQL along with React.
- It's easy to learn and implement GraphQL.
- GraphQL is not limited to the server side; it can be used on the frontend as well.

When to Use GraphQL?

GraphQL works best in the following scenarios:

- Apps for devices such as mobile phones, smartwatches, and IoT devices, where bandwidth usage matters.
- Applications where nested data needs to be fetched in a single call. For example, a blog or social networking platform where posts need to be fetched along with nested comments and details about the people commenting.
- A composite pattern, where an application retrieves data from multiple, different storage APIs. For example, a dashboard that fetches data from multiple sources, such as logging services, backends for consumption stats, and third-party analytics tools that capture end-user interactions.
- Proxy patterns on the client side, where GraphQL is added as an abstraction over an existing API so that each end user can specify the response structure based on their needs. For example, clients can create a GraphQL specification according to their needs on top of a common API provided by Firebase as a backend service.
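To illustrate the deprecation-instead-of-versioning point from earlier, here is a minimal SDL sketch with a hypothetical Product type. Clients querying the old field keep working but see a deprecation notice in their tooling:

type Product {
  title: String
  price: Float
  cost: Float @deprecated(reason: "Use `price` instead.")
}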
In this article, we examined how GraphQL is transforming the way apps are built and managed, and why it looks like the technology of the future. Here are some helpful resources to get started with GraphQL:

https://graphql.org/
https://github.com/graphql/graphql-spec