Deploying the best E-Resources for Software Engineering Students

We at IT Engg Portal provide the Computer and IT Engineering students of Pune University with well-compiled, easy-to-learn notes and other e-resources based on the curriculum

PowerPoint Presentations and Video Lectures for Download

We provide highly recommended PowerPoint presentations and video lectures from prominent universities for the most difficult subjects, to ease your learning process

Bundling Code for your Lab Practicals

Deploying the best of available E-Resources for Tech Preparation (Campus Placements)

The Complete Placement Guide

Our team has worked hard to compile this e-book for all students heading for campus placements. It is a complete solution for technical preparation.

Pune University's most viewed website for Computer and IT Engineering

With more than 4,00,000 pageviews from 114 countries across the globe, we are now the most viewed website for e-books and other e-resources in Computer and IT Engineering

Friday, April 19, 2013

A General Introduction to Big Data

Today, before we get deeper into Data Mining, Analytics, and other industry concepts, let us pin down a few common terms we have all heard but perhaps never stopped to define. Sounds interesting? I believe it should. Words like Big Data, Data Mining, and Analytics are not new to most of us. Almost everyone who keeps up with tech news has come across the term 'Big Data'; it has been a much-talked-about term recently on Facebook, Twitter, and TechCrunch.

Let's begin with 'Big Data'.


So what exactly do we understand when we say Big Data?

The simplest conclusion anyone could draw is that it deals with data that is really BIG in size. And in a way, it does.

In layman's terms, I would say that data is managed by special software called a DBMS (database management system), which runs on a machine (a desktop or a server). This software has limits on the size of the data it can handle. When data grows beyond what any traditional database system can handle, we call that kind of data Big Data.

More precisely, Big Data is a collection of data sets so large and complex that they become difficult to process using on-hand database management tools or traditional data-processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.



Now, where did this Big Data come from?

Nowadays, almost all organizations capture and store data from every transaction in their business, so that they can better understand how the business is doing. This data could be anything, logs or bills, and varies widely from one organization to another. Take an example: a grocery shop may keep a database of all the bills from its daily business, and on a monthly basis the owner can work out simple statistics such as total revenue, total profit, and number of customers.

Larger organizations capture far more data from the business they do. They are typically interested in how the business is doing year on year (YoY) or quarter on quarter (QoQ). They may go even further, forecasting how much business they can do in the future and where they need to improve. The analysis doesn't end there: what happened to the business, how it happened, and why it happened can all be figured out by digging deeper into the data. Now that should catch your attention! You never realized you could learn so much from your own data, right?
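As a minimal sketch of the grocery-shop example above (the record layout and the numbers are our own, purely illustrative), here is how those monthly statistics might be computed in Python:

    from collections import namedtuple

    # Hypothetical bill records for one month (illustrative data only)
    Bill = namedtuple("Bill", ["customer_id", "revenue", "cost"])
    bills = [
        Bill("C001", 250.0, 180.0),
        Bill("C002", 120.0, 95.0),
        Bill("C001", 310.0, 240.0),
    ]

    # The simple monthly statistics the shop owner might want
    total_revenue = sum(b.revenue for b in bills)
    total_profit = sum(b.revenue - b.cost for b in bills)
    num_customers = len({b.customer_id for b in bills})

    print(f"Revenue: {total_revenue}, Profit: {total_profit}, Customers: {num_customers}")

Big Data begins where this kind of straightforward aggregation stops scaling: when the bills number in the billions, a single machine and a single loop no longer suffice.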





Yes, many organizations gradually realized that they could learn more about their business, and do even better, if they really knew what happened, how it happened, and what will happen. So they started capturing more and more data along every possible dimension. This process continued, and over time data sizes grew from a few gigabytes to a few terabytes. When the size of the data climbed higher than expected, the industry ran into the limitations of traditional database systems and concluded that more powerful, more sophisticated tools needed to be developed to handle the growing data. That was when Big Data was first talked about.

Later, data sizes started increasing exponentially. All of a sudden, many organizations understood the importance of their own data and started capturing all the information they could from the business they do. And this growth now seems to be getting even faster. Maybe we will eventually need a replacement for the word 'Big' in Big Data, something like 'Gigantic Data'. Sounds funny? Trust me, it is very much possible!



So, what did people do to handle Big Data?

Since it was obvious that a normal machine could not handle these huge data sets, the industry proposed the need for tools powerful enough to handle large data sets and to support all the necessary operations on them, so that data could be processed, cleaned, filtered, merged, and analyzed. These tools also need to process large data sets in parallel to cut down the processing time required. A few big players stepped up and released sophisticated tools that could handle really large data: Oracle launched Exadata, Teradata launched a tool under its own name, 'Teradata', Google came up with its own model, MapReduce, and so on. Teradata, true to its name, was built to handle databases on the scale of terabytes: 10^12 bytes!

Lately, the biggest innovations in software for handling Big Data have come from Apache and Google. Apache developed a tool named 'Hadoop', which copes with exponential growth in data size and allows faster processing, with more sophisticated functionality for handling data. Google came up with MapReduce, a programming model typically used to process large data sets over a distributed environment. Apache also launched Hive, another layer on top of Hadoop that provides data-warehousing functionality. Many more tools from other players handle huge data sets; they are not discussed here.
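To give a feel for the MapReduce programming model, here is a minimal single-machine sketch of the classic word-count example in Python. Real MapReduce (and Hadoop) runs the map and reduce phases in parallel across a cluster; the function names below are our own illustration, not any framework's API:

    from collections import defaultdict

    def map_phase(document):
        """Map: emit a (word, 1) pair for every word in the document."""
        for word in document.split():
            yield (word.lower(), 1)

    def reduce_phase(pairs):
        """Reduce: sum the counts emitted for each distinct word."""
        counts = defaultdict(int)
        for word, count in pairs:
            counts[word] += count
        return dict(counts)

    documents = ["big data is big", "data about data"]
    all_pairs = [pair for doc in documents for pair in map_phase(doc)]
    print(reduce_phase(all_pairs))  # {'big': 2, 'data': 3, 'is': 1, 'about': 1}

The appeal of the model is that the map calls are independent of each other, so they can be spread over many machines, and the reduce step only ever needs the pairs that share a key.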

But one thing to notice is that when you have Big Data, your demands are also bigger. You will no longer settle for simple results like aggregations and roll-ups on a large data set. There is always a need to present this complex data in a simple, more structured format. Hence we need better, more sophisticated tools to visualize data. A few tools that help with reporting and data handling are Pentaho, Jasper Reports, DAS (Datameer Analytics Solution), Tableau, Platfora, and many more. Apart from these, many other tools and packages help us visualize data effectively.

-x-x-x-x-x-x-x-x-x-x-x

So that's all I could do to introduce, in a simple way, topics like Big Data and Hadoop. I know we didn't talk much about Hadoop, but eventually we will. There will be one more article on Big Data, covering a few other important topics. For now, this brief introduction lets me introduce you to the basics of Data Analytics. In upcoming articles we will give a brief introduction to Data Analysis and Business Intelligence.

Stay tuned :)

Monday, April 15, 2013

A new Beginning!

 Hi all, 
         We sincerely apologize to all our readers: our team was inactive for the past two months. We are happy to inform you that we are back, and this time with something better! We were initially publishing articles related only to the Computer and IT Engineering syllabus, but with a growing number of diverse readers and contributors we are moving one step ahead. We will now also post articles on new concepts that will help shape our fellow engineers for the industrial race.



      We have identified Data Mining/Analytics as one of the most important and hottest job areas of the near future for engineers, and we will therefore concentrate more on Data Analytics. With a group of data scientists, business analysts, and software engineers working at some of the most prestigious organizations in the world as our contributors, we will now publish articles on a few upcoming and important technologies that help engineers blend their technological and logical skills in a better way.

    Initially, we have decided to focus on a few new technologies like 
  • SQL
  • Teradata
  • Hadoop/Hive
  • Statistics
  • SAS
  • R Programming
  • Excel
  • Decision Science
  • HTML5, and so on
    Gradually we will also add a few more sections, such as Business Intelligence, JSP, PHP, UNIX, and other domains, based on user demand and new contributors. Simultaneously, we will keep posting on IT and Computer Engineering academic topics as usual. We look forward to your continued support. Feel free to write back with any feedback or opinions, and help us help you better :)

Regards,
 Team - IT Engg Portal

Tuesday, December 11, 2012

Neural Network and Expert System : [BE - IT]

 
  The subject "Neural Network and Expert System" is introduced as an elective in the final semester of BE-IT. The number of students who opt for this subject is very small compared to those opting for GIS. The major reason this subject is not chosen as an elective is that it requires Artificial Intelligence as a prerequisite. Moreover, the subject is a bit confusing and time-consuming, unlike 'Artificial Intelligence'. Students have managed to score only average marks in it. There have also been rumors about a local-author book from Nirali Publications for this subject, but no confirmed news, as I have not seen a copy myself!


Let's have a look at the syllabus for the subject:


Unit I
Introduction to Artificial Neural Networks
Biological Neural Networks, Pattern analysis tasks: Classification and Clustering, Computational models of neurons, Basic structures and properties of Artificial Neural Networks, Structures of Neural Networks, Learning principles

Unit II 
Feedforward Neural Networks
Perceptron, its learning law, Pattern classification using perceptron, Single-layer and Multilayer Feedforward Neural Networks (MLFFNNs), Pattern classification and regression using MLFFNNs, ADALINE: the Adaptive Linear Element, its structure and learning laws, Error backpropagation learning, Fast learning methods: Conjugate gradient method, Autoassociative Neural Networks, Bayesian Neural Networks

Unit III 
Radial Basis Function Networks and Pattern Analysis
Regularization theory, RBF networks for function approximation, RBF networks for pattern classification
Kernel methods for pattern analysis: Statistical learning theory, Support vector machines for pattern classification, Relevance vector machines for classification

Unit IV 
Self organizing maps and feedback networks
Pattern clustering, Topological mapping, Kohonen's self-organizing map; Feedback Neural Networks: Pattern storage and retrieval, Hopfield model, Boltzmann machine, Recurrent Neural Networks

Unit V 
Expert Systems Architectures
Introduction, Rule Based System Architecture, Non-Production System Architecture, Dealing with uncertainty, Knowledge Acquisition and Validation

Unit VI 
Shells and Case Studies
Expert System Shells, Knowledge System Building Tools for Expert Systems, Expert System tools case study: MYCIN, EMYCIN, ELIZA; Knowledge Management (Wiki Web case study)





Download E-Books for
 Neural Networks & Expert System


An Introduction to Neural Networks
James Anderson




File Type : DJVU
File Size : 5MB


---------------------------------------------------------------------------

Artificial Intelligence & Expert Systems for Engineers
C. S. Krishnamoorthy, S. Rajeev



File Type :  PDF
File Size : 3.5 MB



---------------------------------------------------------------------------

Pattern Recognition & Machine Learning
C. M. Bishop



File Type : PDF
File Size : 4.2 MB



---------------------------------------------------------------------------

Artificial Neural Network
Colin Fyfe



File Type : DJVU
File Size : 1.17MB



---------------------------------------------------------------------------

Artificial Neural Networks
An Introduction to ANN Theory & Practice
P. J. Braspenning



File Type : DJVU
File Size : 2.1 MB



---------------------------------------------------------------------------


Monday, November 5, 2012

Cell Splitting - Mobile Computing : [BE - IT/Comp]



Article contributed by :

manoj@itportal.in
----------------------------------------------------------


We have already studied in detail the concepts of a cell, a cluster, the MS, and the BS, so we will not discuss them all over again. Let us move straight into the topic.

The concept of cell splitting is quite self-explanatory. Cell splitting means splitting cells up into smaller cells. It is used to expand the capacity (number of channels) of a mobile communication system. As a network grows, a large number of mobile users appear in an area. Consider the following scenario.

There are 100 people in a specific area. All of them own a mobile phone (MS) and are quite comfortable communicating with each other, so a provision for all of them to communicate must be made. As there are only 100 users, a single base station (BS) is built in the middle of the area, and all these users' MSs are connected to it. All 100 users now fall under the coverage area of a single base station. This coverage area is called a cell. This is shown in Fig 2-1.



       
                 Fig 2-1. A single BS for 100 MS users.



But as time passes, the number of mobile users in the same area grows from 100 to 700. If the same BS has to connect all 700 users' MSs, it will obviously be overloaded: a single BS that served 100 users is forced to serve 700, which is impractical. To reduce the load on this BS, we can use cell splitting. That is, we divide the single cell above into 7 separate adjacent cells, each with its own BS. This is shown in Fig 2-2.



            
Fig 2-2. Single cell split up into 7 cells
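The arithmetic behind this is easy to sketch. Here is a minimal toy model in Python; the channel capacity of a BS is an assumption made purely for illustration, not a figure from any standard:

    # Toy model of BS load before and after cell splitting (illustrative numbers only).
    USERS = 700
    CHANNELS_PER_BS = 100  # assumed capacity of a single base station

    def load_factor(users, num_cells, channels_per_bs=CHANNELS_PER_BS):
        """Average load on each BS, as a multiple of its capacity."""
        users_per_cell = users / num_cells
        return users_per_cell / channels_per_bs

    print(load_factor(USERS, 1))  # 7.0 -> one BS carries 7x its capacity: overloaded
    print(load_factor(USERS, 7))  # 1.0 -> after splitting into 7 cells, each BS is at capacity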

Now, let us look at the bigger picture. Until now we have discussed cell splitting in a small area; the same concept is used to deal with large networks. In a large network, it is not necessary to split all the cells in all the clusters. Certain BSs can handle their traffic well only if their cells (coverage areas) are split up, and only those cells are candidates for cell splitting. Fig 2-3 shows a network architecture in which a few cells are split into smaller cells without affecting the other cells in the network.


             Fig 2-3. Cell Splitting.



   The concept of cell splitting can be applied to the split cells as well. That is, the split cells can themselves be split into a number of smaller cells to improve the efficiency of the BSs even further. Fig 2-4 shows a hierarchy of cell splitting.

   Here, the master cells that have been split into smaller cells are known as macrocells. The split cells are known as microcells. The innermost cells, obtained by splitting the microcells, are known as picocells.





Article contributed by :

manoj@itportal.in
----------------------------------------------------------



Frequency Reuse - Mobile Computing : [BE : IT/Comp]


Article contributed by :

manoj@itportal.in
----------------------------------------------------------



Most of you might be familiar with the concept of frequency reuse. We often come across this term in Mobile Computing. It is quite a straightforward and simple concept, but it still deserves a detailed explanation. This is one of the most common terms used in the world of cellular telephony (wireless communication). Most cellular systems use a frequency reuse scheme to improve capacity and coverage. Let us understand what exactly a cell means and how cells are related to frequencies.

In a cellular system, each mobile station (MS) is connected to its base station (BS) via a radio link. The BS is responsible for carrying calls to and from the MSs that lie in its coverage area. The coverage area of a base station, or of a sector of a base station, is known as a cell. Each BS has a number of frequency channels, which serve as links between the MS and the BS. Each call propagates through a channel that is currently idle and receiving the best signal. Since the coverage area of a BS is a cell, we can also say that a cell uses frequency channels for call forwarding. These cells are usually drawn as hexagons (the reason for this is beyond the scope of our discussion here). Fig 1-1 shows the typical structure of a cell.

        Fig 1-1. A cell.
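The channel-selection rule described above, "currently idle and receiving the best signal", is easy to sketch. A toy illustration in Python, with made-up channel data:

    # Toy model of how a BS might pick a channel for a new call (our own illustration).
    channels = [
        {"id": 1, "idle": True,  "signal_db": -70},
        {"id": 2, "idle": False, "signal_db": -60},  # strongest, but busy
        {"id": 3, "idle": True,  "signal_db": -65},
    ]

    # Among the currently idle channels, choose the one with the best signal.
    idle = [c for c in channels if c["idle"]]
    best = max(idle, key=lambda c: c["signal_db"])
    print("Call assigned to channel", best["id"])  # channel 3: idle, strongest signal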


A PCS (Personal Communication System) is a combination of many such cells, so a cell may be surrounded by a large number of adjacent cells. This is shown in Fig 1-2.


       Fig 1-2. Cells adjacent to each other (Cluster).


Now, let us look at a more general term for the structure above: a cluster. A number of cells are grouped to form a cluster; a cluster is thus a collection of cells. Having understood cells and clusters, let us move on to the actual concept of frequency reuse.


As we have seen, cells use frequencies. But imagine two or more cells in a single cluster using the same frequency: obviously, there is wide scope for interference. So it is always better to avoid two cells in a cluster using the same frequency; that is, inside a cluster, all the cells must use different frequencies. A 3-cell cluster with all adjacent cells using different frequencies (F1, F2, and F3) is shown in Fig 1-3.

Fig 1-3. Cells in a cluster using different frequencies.



But this will definitely lead to a new problem. As the network grows, if every cell in the system uses a different frequency, the frequency spectrum will be heavily utilized: a large number of frequencies will be consumed by these cells. The solution to this problem is frequency reuse.

All the cells in a cluster must still use different frequencies, but those frequencies can be reused by the cells in other clusters. This is the concept of frequency reuse. That is, if frequencies A, B, C, D, E, F, and G are used by the cells of a 7-cell cluster, the same frequencies A through G can be used by the cells of other clusters. See Fig 1-4.


   Fig 1-4. Frequency Reuse.


In the figure above, three different clusters are shown in three different colors. The 7 cells in each cluster use different frequencies (A through G), but the same frequencies (A through G) are reused by the seven cells of every other cluster. Thus, the problems of interfering frequencies and over-utilization of the spectrum are both overcome using frequency reuse.
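A minimal sketch of this 7-cell reuse plan in Python (a toy model of our own, not a real frequency planner) shows how spectrum usage stays constant no matter how many clusters the network adds:

    # Each cluster assigns the same 7 frequencies to its 7 cells, so the whole
    # network needs only 7 frequencies regardless of how many clusters it has.
    FREQUENCIES = ["A", "B", "C", "D", "E", "F", "G"]

    def assign_frequencies(num_clusters, cells_per_cluster=7):
        plan = {}
        for cluster in range(num_clusters):
            for cell in range(cells_per_cluster):
                # Cell i in every cluster reuses the same frequency.
                plan[(cluster, cell)] = FREQUENCIES[cell % len(FREQUENCIES)]
        return plan

    plan = assign_frequencies(num_clusters=3)
    distinct = set(plan.values())
    print(len(plan), "cells served with only", len(distinct), "frequencies")  # 21 cells, 7 frequencies

In a real deployment the planner would also keep co-channel cells geographically far apart to limit interference, which is exactly what the cluster layout in Fig 1-4 achieves.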






Article contributed by :

manoj@itportal.in
----------------------------------------------------------