Friday, October 13, 2017

Asp.Net Core MVC Custom OutputFormatter to send Word Document as a response

I tried to find a way to serve a Word document from a memory stream as a byte array, the way we do in our Web API to send documents in the response. But I could not find any method on the ASP.NET Core Web API Response object that supports this. We need to save the file from memory to disk and use that path to serve the file.

Our requirement is to open a Word document template in memory and modify the document on the fly based on the user request. We were able to achieve this in ASP.NET Web API using the code below.

  public HttpResponseMessage Get()
  {
       GenerateMyDocument gd = new GenerateMyDocument();
       byte[] documentContent = gd.GetDocumentParameters(3, "ISQ839343", "Venkat", "sept 19 2019");
       HttpResponseMessage response = new HttpResponseMessage(HttpStatusCode.OK);
       response.Content = new ByteArrayContent(documentContent);
       response.Content.Headers.ContentType = new MediaTypeHeaderValue("application/msword");
       response.Content.Headers.ContentDisposition = new ContentDispositionHeaderValue("inline")
       {
         FileName = "some.docx"
       };
       return response;
  }


The default format used by ASP.NET Core MVC is JSON. To send other content types as a response, we need to implement our own OutputFormatter. To deliver a Word document as a response, we followed the steps below.

Create a new class WordOutputFormatter inheriting from OutputFormatter. We need to override the two methods below to return our Word document as the response content.

public class WordOutputFormatter : OutputFormatter
{
        public string ContentType { get; }

        public WordOutputFormatter()
        {
            ContentType = "application/ms-word";
        }

        // We need to check whether the context object is what we need to format.
        // We must override this method to keep the normal execution flow for other action methods.
        public override bool CanWriteResult(OutputFormatterCanWriteContext context)
        {
            return context.Object is Jurer;
        }

        public async override Task WriteResponseBodyAsync(OutputFormatterWriteContext context)
        {
            var response = context.HttpContext.Response;

            Jurer jurer = context.Object as Jurer;

            response.Headers.Add("Content-Disposition", "inline;filename=jurer.docx");
            response.ContentType = "application/ms-word";

            // The helper fills the Word template, saves it to disk and returns the file path
            string documentFilePath = DocumentHelper.GetDocumentParameters(jurer.JurisdictionID, jurer.JurorID, jurer.JurorName, jurer.Eventdate);

            await response.SendFileAsync(documentFilePath);
        }
}


Now add the action method below to your controller.

     public IActionResult Get(int juridictionid, string jurerid, string jurername, string date)
     {
       Jurer jurer = new Jurer()
       {
         JurisdictionID = juridictionid,
         JurorID = jurerid,
         JurorName = jurername,
         Eventdate = date,
       };
       return Ok(jurer);
     }


We use a [DeleteFileAttribute], inherited from ActionFilterAttribute, to delete the Word document that we just created from the Word template and saved to disk.

    public class DeleteFileAttribute : ActionFilterAttribute
    {
        public override void OnResultExecuted(ResultExecutedContext filterContext)
        {
          //write your cleanup logic to delete temp files created in the previous step.
        }
    }

To invoke our custom WordOutputFormatter, we need to alter the ConfigureServices method in Startup.cs as shown below.

        public void ConfigureServices(IServiceCollection services)
        {
            // Add framework services.
            services.AddMvc(options =>
            {
                options.RespectBrowserAcceptHeader = true; // false by default
                options.OutputFormatters.Insert(0, new WordOutputFormatter());
                options.FormatterMappings.SetMediaTypeMappingForFormat(
                  "docx", MediaTypeHeaderValue.Parse("application/ms-word"));
            });
        }


Thursday, September 14, 2017

D3 charts using React JS and Node JS error --element type is invalid: expected a string (for built-in components) or a class/function

For the last two months I have been working on Big Data technologies like Hadoop, Python, Spark, Kafka
and web front end frameworks like Angular JS, React JS and D3 charts.

Our requirement is to display data from Hadoop as charts. PySpark loads the data into a DataFrame and converts the DataFrame into JSON data; React with D3 charts then uses this JSON data to display charts on the portal.

I used the two commands below to create the sample React JS app provided by Facebook.

npm install -g create-react-app
create-react-app AppName

I found one solution in git and modified it to make it a Node.js solution by adding import and require statements.

I used npm install statements to install the packages.

I used npm run build to run the build and serve -s build to start the web application on localhost.

As I am learning React JS, I struggled for two days to make things work. The silly mistake that I made is this:

To import a component that is a named export, we need to wrap its name in {} brackets. If we omit the brackets, there is no compilation error, but at run time we get a JavaScript error saying

"element type is invalid: expected a string (for built-in components) or a class/function (for composite components) but got: undefined. check the render method of  'ComponentName'"

Below is the working version of the import statements in App.jsx

import React, {Component} from 'react'
import ReactDOM from 'react-dom';
import ReactD3 from 'react-d3-components';
import {Waveform} from 'react-d3-components'
import {BarChart} from 'react-d3-components'
import {PieChart} from 'react-d3-components'
import {AreaChart} from 'react-d3-components'
import {ScatterPlot} from 'react-d3-components'
import {LineChart} from 'react-d3-components'
import {Brush} from 'react-d3-components'

const d3 = require('d3');
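
The difference between the two import forms can be sketched in plain JavaScript. This is a simplified model rather than what a bundler literally does: `chartLibrary` is a hypothetical stand-in for a module that has only named exports, and real bundlers add extra interop rules for CommonJS packages.

```javascript
// Stand-in for a library module that has only named exports
// (hypothetical object, not the real react-d3-components package).
const chartLibrary = {
  BarChart: function BarChart() { return 'bar chart'; },
  LineChart: function LineChart() { return 'line chart'; },
};

// import {BarChart} from '...' picks the export that is called BarChart.
const { BarChart } = chartLibrary;

// import BarChart from '...' (no braces) asks for the default export instead.
// This module has none, so the binding ends up undefined.
const BarChartWithoutBraces = chartLibrary.default;

console.log(typeof BarChart);              // function
console.log(typeof BarChartWithoutBraces); // undefined
```

Rendering a component whose binding is undefined is what produces the "element type is invalid" message at run time.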

index.js file contents

import ReactDOM from 'react-dom';
import React from 'react';
import App from './App';

const  sw = require('./registerServiceWorker');

//ReactDOM.render(<App />, document.getElementById('root'));

Below is the code to render D3 chart

render: function() {
  return (
    <div>
    <LineChart data={}
                           margin={{top: 10, bottom: 50, left: 50, right: 20}}
                           xAxis={{tickValues: this.state.xScale.ticks(, 2), tickFormat: d3.time.format("%m/%d")}}
    />
    <div className="brush" style={{float: 'none' }}>
    <Brush width={400}
                       margin={{top: 0, bottom: 30, left: 50, right: 20}}
                       extent={[new Date(2015, 2, 10), new Date(2015, 2, 12)]}
                       xAxis={{tickValues: this.state.xScaleBrush.ticks(, 2), tickFormat: d3.time.format("%m/%d")}}
    />
    </div>
    </div>
  );
},

_onChange: function(extent) {
  this.setState({xScale: d3.time.scale().domain([extent[0], extent[1]]).range([0, 400 - 70])});
},
 <SomeComponent />,

Wednesday, August 16, 2017

Spark Programming

What is RDD:
The main abstraction Spark provides is a Resilient Distributed Dataset (RDD), which is a fault-tolerant collection of elements partitioned across the nodes of the cluster that can be operated on in parallel. RDDs are created by starting with a file in the Hadoop file system (or any other Hadoop-supported file system), or an existing collection in the driver program, and transforming it. Users may also ask Spark to persist an RDD in memory, allowing it to be reused efficiently across parallel operations. Finally, RDDs automatically recover from node failures.

RDD Types:

Parallelized collections, which take an existing Scala collection and run functions on it in parallel.
Parallelized collections are created by calling SparkContext’s parallelize method on an existing Scala collection (a Seq object). The elements of the collection are copied to form a distributed dataset that can be operated on in parallel
One important parameter for parallel collections is the number of slices to cut the dataset into. Spark will run one task for each slice of the cluster. Typically, you want 2-4 slices for each CPU in your cluster. Normally, Spark tries to set the number of slices automatically based on your cluster. However, you can also set it manually by passing it as a second parameter to parallelize (e.g. sc.parallelize(data, 10)).

Hadoop datasets, which run functions on each record of a file in Hadoop distributed file system or any other storage system supported by Hadoop.
Spark can create distributed datasets from any file stored in the Hadoop distributed file system (HDFS) or other storage systems supported by Hadoop (including your local file system, Amazon S3, Hypertable, HBase). Spark supports text files, SequenceFiles, and any other Hadoop InputFormat.
For other Hadoop InputFormats, you can use the SparkContext.hadoopRDD method, which takes an arbitrary JobConf and input format class, key class and value class. Set these the same way you would for a Hadoop job with your input source.

RDD Operations
RDDs support two types of operations: transformations, which create a new dataset from an existing one, and actions, which return a value to the driver program after running a computation on the dataset.

For example, map is a transformation that passes each dataset element through a function and returns a new distributed dataset representing the results. On the other hand, reduce is an action that aggregates all the elements of the dataset using some function and returns the final result to the driver program.

All transformations in Spark are lazy, in that they do not compute their results right away. Instead, they just remember the transformations applied to some base dataset. The transformations are only computed when an action requires a result to be returned to the driver program. This design enables Spark to run more efficiently – for example, we can realize that a dataset created through map will be used in a reduce and return only the result of the reduce to the driver, rather than the larger mapped dataset.
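
The idea of a lazy transformation followed by an action can be illustrated with a small analogy in plain JavaScript. This is not the Spark API: `lazyMap`, `reduce` and `mapCalls` are hypothetical names used only for this sketch, and a generator stands in for the deferred computation.

```javascript
// Plain JavaScript analogy, not the Spark API.
let mapCalls = 0;

// "Transformation": returns a lazy sequence and does no work yet.
function* lazyMap(source, fn) {
  for (const item of source) {
    mapCalls += 1;
    yield fn(item);
  }
}

// "Action": consumes the sequence and returns a single value.
function reduce(source, fn, initial) {
  let acc = initial;
  for (const item of source) acc = fn(acc, item);
  return acc;
}

const lengths = lazyMap(['spark', 'rdd', 'dataset'], (s) => s.length);
const callsBeforeAction = mapCalls; // nothing has been computed so far

const total = reduce(lengths, (a, b) => a + b, 0);

console.log(callsBeforeAction); // 0
console.log(total);             // 15
console.log(mapCalls);          // 3
```

The mapping function only runs once the reduce step asks for values, and only the single reduced number is handed back, which mirrors the behaviour described above.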

By default, each transformed RDD is recomputed each time you run an action on it. However, you may also persist an RDD in memory using the persist (or cache) method, in which case Spark will keep the elements around on the cluster for much faster access the next time you query it. There is also support for persisting datasets on disk.

How to share state between nodes:
A second abstraction in Spark is shared variables that can be used in parallel operations. By default, when Spark runs a function in parallel as a set of tasks on different nodes, it ships a copy of each variable used in the function to each task. Sometimes, a variable needs to be shared across tasks, or between tasks and the driver program. Spark supports two types of shared variables:

Broadcast variables, which can be used to cache a value in memory on all nodes.
Accumulators, which are variables that are only “added” to, such as counters and sums.

Dataset (New Abstraction of Spark)
For a long time, RDD was the standard abstraction of Spark. But from Spark 2.0, Dataset becomes the new abstraction layer for Spark. Though the RDD API will remain available, it will become a low-level API, used mostly for runtime and library development. All userland code will be written against the Dataset abstraction and its subset, the Dataframe API.

Dataset is a superset of the Dataframe API, which was released in Spark 1.3. Dataset together with the Dataframe API brings better performance and flexibility to the platform compared to the RDD API. Dataset will also replace RDD as an abstraction for streaming in future releases.

SparkSession (New entry point of Spark)
In earlier versions of Spark, the Spark context was the entry point for Spark. As RDD was the main API, it was created and manipulated using the context APIs. For every other API, we needed a different context: StreamingContext for streaming, SQLContext for SQL and HiveContext for Hive. But as the Dataset and Dataframe APIs are becoming the new standard APIs, we need an entry point built for them. So in Spark 2.0, we have a new entry point for the Dataset and Dataframe APIs called SparkSession.

SparkSession is essentially a combination of SQLContext and HiveContext; StreamingContext remains a separate entry point. All the APIs available on those contexts are also available on the Spark session. The Spark session internally has a Spark context for the actual computation.
Creating SparkSession
val sparkSession = SparkSession.builder
      .master("local")
      .appName("spark session example")
      .getOrCreate()
The above is similar to creating a SparkContext with local and creating a SQLContext wrapping it.
The Spark Session encapsulates the existing Spark Context; therefore, existing functionality should not be affected and developers may continue using the Spark Context as desired. However, the new Spark Session abstraction is preferred by the Spark community in Spark 2.0.0 and beyond.

Read data using Spark Session

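The code below reads data from a CSV file using the Spark session.

// csvPath is a placeholder for the location of your CSV file
val df = sparkSession.read.option("header","true").csv(csvPath)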

Wednesday, June 21, 2017

Spiral Tree traversal using Stack and Queue

I was asked to traverse a binary tree level by level, but in spiral order.

                                       A
                                   /       \
                                  B       C
                                /    \      /  \
                               D    E   F   G

should be traversed as


The interviewer asked me to do it using a stack and a queue. Here is the solution.

using System;
using System.Collections.Generic;

class Program
{
    static char[] inputArray = { 'A', 'B', 'C', 'D', 'E', 'F', 'G' };

    static void Main(string[] args)
    {
        TreeTraversalBFS treeTraversal = new TreeTraversalBFS();
        Console.WriteLine("*****************With Queue and Stack *******************");
        Node root = treeTraversal.BuildBinaryTree(inputArray);
        treeTraversal.DisplayNodesByCrisCross(root);
    }
}

public class Node
{
    public Node left { get; set; }
    public Node right { get; set; }
    public char data;

    public Node(char data)
    {
        this.data = data;
    }
}

class TreeTraversalBFS
{
    /// <summary>
    /// Build the binary tree level by level from the input array
    /// </summary>
    public Node BuildBinaryTree(char[] inputArray)
    {
        //to hold the nodes that are still waiting for children
        Queue<Node> queue = new Queue<Node>();
        Node root = new Node(inputArray[0]);
        queue.Enqueue(root);
        for (int i = 1; i < inputArray.Length;)
        {
            Node node = queue.Dequeue();
            Node left = new Node(inputArray[i++]);
            node.left = left;
            queue.Enqueue(left);
            if (i < inputArray.Length)
            {
                Node right = new Node(inputArray[i++]);
                node.right = right;
                queue.Enqueue(right);
            }
        }
        return root;
    }

    /// <summary>
    /// breadth-first using a queue and stack
    /// </summary>
    public void DisplayNodesByCrisCross(Node root)
    {
        if (root == null)
            return;
        Queue<Node> queue = new Queue<Node>();
        Stack<Node> stack = new Stack<Node>();
        queue.Enqueue(root);
        stack.Push(root);
        int level = 0;
        while (true)
        {
            if (level % 2 == 1)
            {
                int queuNodeCount = queue.Count;
                if (queuNodeCount == 0)
                    break;
                while (queuNodeCount > 0)
                {
                    Node queueNode = queue.Dequeue();
                    Console.Write(queueNode.data + " ");
                    if (queueNode.left != null)
                    {
                        stack.Push(queueNode.left);
                        //insert into queue as well to display next level left to right
                        queue.Enqueue(queueNode.left);
                    }
                    if (queueNode.right != null)
                    {
                        stack.Push(queueNode.right);
                        //insert into queue as well to display next level left to right
                        queue.Enqueue(queueNode.right);
                    }
                    queuNodeCount--;
                }
            }
            else
            {
                int stackNodeCount = stack.Count;
                if (stackNodeCount == 0)
                    break;
                while (stackNodeCount > 0)
                {
                    Node stackNode = stack.Pop();
                    Node queueNode = queue.Dequeue();
                    //display data from stack
                    Console.Write(stackNode.data + " ");
                    //add nodes from Queue and not from Stack to display next level nodes left to right
                    if (queueNode.left != null)
                        queue.Enqueue(queueNode.left);
                    if (queueNode.right != null)
                        queue.Enqueue(queueNode.right);
                    stackNodeCount--;
                }
            }
            level++;
        }
    }
}
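
For comparison, here is a shorter sketch of a spiral traversal in JavaScript. It is a simplified variant, not a translation of the C# program: a queue drives the level-order walk and a stack reverses every second level. The names `buildTree` and `spiralOrder` are made up for this sketch, and the direction (even levels right to left, odd levels left to right) is an assumption.

```javascript
// Build a complete binary tree from an array given in level order.
function buildTree(values) {
  if (values.length === 0) return null;
  const nodes = values.map((data) => ({ data, left: null, right: null }));
  for (let i = 0; i < nodes.length; i++) {
    nodes[i].left = nodes[2 * i + 1] || null;
    nodes[i].right = nodes[2 * i + 2] || null;
  }
  return nodes[0];
}

// Level-order walk with a queue; even levels pass through a stack
// so that they come out right to left.
function spiralOrder(root) {
  const result = [];
  if (root === null) return result;
  const queue = [root];
  let level = 0;
  while (queue.length > 0) {
    const levelSize = queue.length;
    const stack = [];
    for (let i = 0; i < levelSize; i++) {
      const node = queue.shift();
      if (level % 2 === 0) stack.push(node.data); // reversed when popped
      else result.push(node.data);                // already left to right
      if (node.left) queue.push(node.left);
      if (node.right) queue.push(node.right);
    }
    while (stack.length > 0) result.push(stack.pop());
    level += 1;
  }
  return result;
}

const root = buildTree(['A', 'B', 'C', 'D', 'E', 'F', 'G']);
console.log(spiralOrder(root).join(' ')); // A B C G F E D
```

Children are always enqueued left to right, so the queue keeps the natural level order and only the output of the even levels is flipped.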

Saturday, April 08, 2017

Flow vs Logic Apps vs Functions vs Webjobs

Azure Functions or WebJobs
All these services are useful when "gluing" together disparate systems. They can all define:
  • input
  • actions
  • conditions
  • output
You can run each of them on a schedule or trigger. However, each service has unique advantages, and comparing them is not a question of "Which service is the best?" but one of "Which service is best suited for this situation?" Often, a combination of these services is the best way to rapidly build a scalable, full-featured integration solution.

Flow vs. Logic Apps
Microsoft Flow and Azure Logic Apps are both configuration-first integration services. They make it easy to build processes and workflows and integrate with various SaaS and enterprise applications.
  • Flow is built on top of Logic Apps
  • They have the same workflow designer
  • Connectors that work in one can also work in the other
  • Flow empowers any office worker to perform simple integrations (for example, get SMS for important emails) without going through developers or IT. Flow is a self-service tool for office workers and business users.
On the other hand, Logic Apps can enable advanced or mission-critical integrations (for example, B2B processes) where enterprise-level DevOps and security practices are required. It is typical for a business workflow to grow in complexity over time. Accordingly, you can start with a flow at first, then convert it to a logic app as needed. Logic Apps is for IT pros and developers and is used for mission-critical operations.

Functions vs. WebJobs
Azure Functions and Azure App Service WebJobs are both code-first integration services and designed for developers. They enable you to run a script or a piece of code in response to various events, such as new Storage Blobs or a WebHook request. Here are their similarities:

  • Both are built on Azure App Service and enjoy features such as source control, authentication, and monitoring.
  • Both are developer-focused services.
  • Both support standard scripting and programming languages.
  • Both have NuGet and NPM support.
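
To make "a piece of code that runs in response to an event" concrete, here is a minimal sketch of an HTTP-triggered function in the callback style of the Azure Functions Node.js programming model. It assumes that the HTTP trigger binding is configured separately (not shown); the handler name `httpTrigger` and the `fakeContext` object are made up for this example so that it can be run locally without the Functions runtime.

```javascript
// Sketch of an HTTP-triggered function: the runtime passes a context object
// and the incoming request; the function sets context.res and calls context.done().
const httpTrigger = function (context, req) {
  const name = (req.query && req.query.name) || (req.body && req.body.name);
  context.log('HTTP trigger received a request');
  context.res = name
    ? { status: 200, body: 'Hello ' + name }
    : { status: 400, body: 'Please pass a name in the query string or request body' };
  context.done();
};

// The Functions runtime looks for the exported function.
if (typeof module !== 'undefined') module.exports = httpTrigger;

// Local run with a hand-written stand-in for the context the runtime supplies.
const fakeContext = { log: () => {}, done: () => {}, res: null };
httpTrigger(fakeContext, { query: { name: 'Azure' } });
console.log(fakeContext.res.body); // Hello Azure
```

A WebJob handling the same event would typically be a console program or script that you host yourself in App Service, whereas here the platform invokes the exported function for each request.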

Functions is the natural evolution of WebJobs in that it takes the best things about WebJobs and improves upon them.

The improvements include:
  • Streamlined dev, test, and run of code, directly in the browser.
  • Built-in integration with more Azure services and 3rd-party services like GitHub WebHooks.
  • Pay-per-use, no need to pay for an App Service plan.
  • Automatic, dynamic scaling.
  • For existing customers of App Service, running on App Service plan still possible (to take advantage of under-utilized resources).
  • Integration with Logic Apps.
Functions supports in-browser editing, whereas WebJobs does not.

Both can be invoked by event triggers and timer-based scheduling. WebJobs can also be run continuously.
Functions supports more event triggers than WebJobs. Functions supports:
  • Timer
  • Azure Cosmos DB
  • Azure Event Hubs
  • HTTP/WebHook (GitHub, Slack)
  • Azure App Service Mobile Apps
  • Azure Notification Hubs
  • Azure Service Bus
  • Azure Storage
WebJobs supports:
  • Azure Storage
  • Azure Service Bus

Flow vs. Logic Apps vs. Functions
Which service is best suited to you depends on your situation. For simple business optimization, use Flow. If your integration scenario is too advanced for Flow, or you need DevOps capabilities and security compliance, then use Logic Apps.
If a step in your integration scenario requires highly custom transformation or specialized code, then write a function and trigger the function as an action in your logic app.

You can call a logic app in a flow. You can also call a function in a logic app, and a logic app in a function. The integration between Flow, Logic Apps, and Functions continues to improve over time. You can build something in one service and use it in the other services. Therefore, any investment you make in these three technologies is worthwhile.

Saturday, April 01, 2017

DirSync vs Azure AD Sync vs Azure AD Connect

DirSync, Azure AD Sync and Azure AD Connect are used to sync an on-premises Active Directory to a cloud-based directory service such as an Azure AD instance, Office 365, Dynamics Online and other Microsoft cloud services.

All of them are used for single sign-on (SSO): because user accounts and their passwords are synchronized, a user can use a single user account and password to access their cloud-based applications on Office 365, Dynamics Online and Azure AD.

Single sign-on (SSO)
Single sign-on (SSO) is a session and user authentication service that permits a user to use one set of login credentials (e.g., name and password) to access multiple applications. The service authenticates the end user for all the applications the user has been given rights to and eliminates further prompts when the user switches applications during the same session. On the back end, SSO is helpful for logging user activities as well as monitoring user accounts.
Example: Microsoft and Google (if you sign in to email, you can also access other applications without entering the username and password again). To make it more secure, service providers use multi-factor authentication where required.

DirSync
DirSync synchronizes your local on-premises Active Directory with cloud-based services. DirSync doesn’t support multi-forest environments.

Azure AD Sync
Azure AD Sync is an advanced version of DirSync. It supports most of the functions of traditional DirSync and adds extra functionality such as multi-forest support and password write-back. It’s more flexible than DirSync.

Azure AD Connect
Azure AD Connect will integrate your on-premises directories with Azure Active Directory. This allows you to provide a common identity for your users for Office 365, Azure, and SaaS applications integrated with Azure AD.
Azure AD Connect has many of the same features as DirSync and Azure AD Sync. It is going to replace both, and there are plans for many other features such as non-AD LDAP support.
Azure AD Connect is recommended for larger organizations that have greater flexibility requirements. It provides a consistent experience in hybrid environments that may or may not entirely use Microsoft on-premises solutions.

Why use Azure AD Connect?
Azure AD Connect is the best way to connect your on-premises directory with Azure AD and Office 365. This is a great time to upgrade to Azure AD Connect from Windows Azure Active Directory Sync (DirSync) or Azure AD Sync as these tools are now deprecated and will reach end of support on April 13, 2017.
Integrating an on-premises directory service with Azure AD makes your users more productive by providing a common identity for accessing both cloud-based and on-premises services and applications.

  • Companies can provide users with a common hybrid identity across on-premises and cloud-based services by using Windows Server Active Directory and then connecting it to Azure Active Directory.
  • Administrators can use multi-factor authentication to provide conditional access based on application, device and user identity, network location and more.
  • Users can use their user accounts in Azure AD to access Office 365, Microsoft Intune, SaaS apps and any other third-party applications.
  • Developers can build applications on a common identity model, integrating them with on-premises Active Directory or with Azure for cloud-based applications.

Azure Active Directory Connect Sync
The Azure Active Directory Connect synchronization services (Azure AD Connect sync) is a main component of Azure AD Connect. It takes care of all the operations that are related to synchronizing identity data between your on-premises environment and Azure AD. Azure AD Connect sync is the successor of DirSync, Azure AD Sync, and Forefront Identity Manager with the Azure Active Directory Connector configured.

Azure AD Connect and federation
Azure Active Directory (Azure AD) Connect lets you configure federation with on-premises Active Directory Federation Services (AD FS) and Azure AD. With federation sign-in, you can enable users to sign in to Azure AD-based services with their on-premises passwords--and, while on the corporate network, without having to enter their passwords again. By using the federation option with AD FS, you can deploy a new installation of AD FS, or you can specify an existing installation in a Windows Server 2012 R2 farm.