Minor in Scientific Computation & Informatics | Centre for Informatics

Vision, Scope & Outlook:

Computer literacy is no longer a luxury in today's world. Computation and data mining are supplementing experiment and theory in many fields of science. Familiarity with coding/programming and the ability to handle large amounts of data are valuable assets to students in a range of scientific disciplines. The Minor in Scientific Computation & Informatics will give students a basic familiarity with computation and data mining, essential programming/coding skills, exposure to multiple applications, and an integrated vision of the applications of informatics and computation across scientific disciplines.

Key Information

Centre for Informatics
School of Natural Sciences (SNS)
N. Sukumar

Target audience:

Undergraduate students of Physics, Chemistry and Life Sciences with an interest in computation, and students of Mathematics and Engineering interested in scientific applications of computation and machine learning.

Requirements for the Minor in Scientific Computation & Informatics (24 credits):

  • 16 Core credits not part of the student's Major Core requirements;
  • 8 Elective credits whether or not part of the student's Major Core requirements.
Any 16 Core credits not part of the student's Major Core requirements.
Fundamentals of Computers

Course Summary

Information technology has changed the way biologists think. Revolutionary gains in processing speed and advances in data storage and mining methods have transformed biotechnology, and new branches such as the "-omics" technologies have emerged. The class is hands-on and project-oriented, to give students a better understanding of IT applications in biology. The course is designed to introduce the most important basic computer concepts. It also involves case studies and applications in which bioinformatics tools and algorithms can be used. The course will introduce students to an altogether new world of biology, with many new terms and concepts in bioinformatics. It will also make students aware of programming methods in Perl and Linux that they can use in bioinformatics analysis.

Detailed Contents

Introduction to computers: Overview and functions of a computer system, Computer generations with characteristic features, computer organization, CPU, ALU, memory hierarchy, registers, I/O devices, storage devices. Types of Processing: Batch, Real-Time, Online, Offline. Introduction to operating systems: Operating System concept, Variants of Unix, Linux operating system and command line applications. Computer Networking: Introduction to networking, Associated hardware devices and gadgets (Router, Switch, etc.), Network Topologies and Protocols, LAN, WAN and MAN, World Wide Web (WWW), Network security: firewalls. Concepts in text-based searching: Medline, bibliographic databases.

Algorithms, Flowcharts & Programming concepts: Algorithms: Concepts & definitions, Converting algorithms to flowcharts, Comparing algorithms, flowcharts & programs, Algorithms solving Biological problems, Basic Perl Programming. Computers in Biology: Nature of Biological data, Biological Databases, PubMed, Overview of Bioinformatics, sequence alignment, Major Bioinformatics Resources: NCBI, EBI & ExPASy.
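The following is a small illustration of an algorithm that solves a biological problem. This minimal sketch computes the GC content and the reverse complement of a DNA sequence. The course itself uses Perl; Python is used here only for illustration. The function names are hypothetical, and the sketch assumes the input contains only the bases A, C, G and T.

```python
# Base-pairing rules for DNA (assumes only A, C, G, T appear in the input).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A<->T, G<->C, read backwards)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


print(gc_content("ATGCGC"))          # prints 0.6666666666666666
print(reverse_complement("ATGCGC"))  # prints GCGCAT
```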


  1. Introduction to Bioinformatics – Attwood
  2. Bioinformatics – David Mount
  3. Developing Bioinformatics Computer Skills – Cynthia Gibas
  4. Introduction to Bioinformatics – Arthur M. Lesk
  5. Fundamentals of Computers – V. Rajaraman, PHI
  6. Introduction to Computers – Peter Norton
  7. Computer Fundamentals – P. K. Sinha



Core course for B.Sc. (Research) Mathematics. Optional for B.Sc. (Research) Economics. Not available for B.Tech. students. Others may credit it as UWE with permission from the Department of Mathematics. Does not count towards Minor in Mathematics.

Credits (Lec:Tut:Lab)= 3:0:1 (Three lecture hours and two lab hours weekly)

Prerequisites: Class XII Mathematics

Overview: This course aims to empower the students in data abstraction, algorithm design and performance estimation. In the process they shall learn the art of programming – a pretty useful skill to have! Programming in C and Matlab will be taught.

Detailed Syllabus:

  1. Basic programming constructs: conditional statements, functions, loops, arrays, structures, pointers.
  2. Linear data structures: Linked list, queue, stack
  3. Trees and Graphs: basic operations
  4. Searching and Hashing: Linear search, Binary search, tree search, hash tables
  5. Sorting: Insertion sort, bubble sort, merge sort, heap sort
  6. Introduction to MATLAB programming.
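Two of the searching and sorting topics listed above can be illustrated briefly. This is a minimal sketch of insertion sort followed by iterative binary search on the sorted result. The course teaches C and MATLAB; Python is used here only for brevity, and the function names are hypothetical.

```python
def insertion_sort(items):
    """Return a new list with the elements of items in ascending order."""
    result = list(items)
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        # Shift larger elements one slot to the right to make room for key.
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key
    return result


def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2      # middle of the current search interval
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1          # target can only be in the right half
        else:
            hi = mid - 1          # target can only be in the left half
    return -1


primes = insertion_sort([13, 2, 7, 5, 11, 3])
print(primes)                     # prints [2, 3, 5, 7, 11, 13]
print(binary_search(primes, 11))  # prints 4
print(binary_search(primes, 4))   # prints -1
```

Binary search needs only O(log n) comparisons, but it requires its input to be sorted first. That is why the sketch sorts before searching.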


  1. B. Kolman, R. Busby, and S. Ross, Discrete Mathematical Structures, PHI, 2012
  2. Jeri R. Hanly, Elliot B. Koffman, Problem Solving and Program Design in C, Pearson, 2009
  3. A. Aho, J. Hopcroft, D. Ullman, Data structures and Algorithms, Addison-Wesley, 1983
  4. A. Aho, and D. Ullman, Foundations of Computer Science, Comp. Sci. Press, 1992
  5. T. Cormen, C. Leiserson, R. Rivest and C. Stein, Introduction to Algorithms, MIT Press, 2009
  6. N. Kalicharan, Data Structures in C, CreateSpace Independent Publishing, 2008
  7. A. Tenenbaum, Data Structures using C, PHI, 2003

Past Instructors: Charu Sharma, Niteesh Sahni

Introduction to Computing and Programming

This course introduces computer structure, the algorithmic approach to solving a problem, and basic computer concepts. Programming concepts are also covered, using the C programming language.


Introduction to Bioinformatics, Review of the Biological Databases concept: Primary, secondary and composite databases, Nucleotide Sequence databases (EMBL, GenBank, DDBJ), Protein Databases (UniProt, PIR, TrEMBL), Protein family/domain databases (PROSITE, PRINTS, Pfam), Metabolic & Pathway databases (KEGG), Structural databases (PDB).

Structural Bioinformatics: Classification of protein structures, Primary, Secondary and Tertiary structures, Quaternary structure, Protein folding concept, Potential energy map and Ramachandran plot. Secondary structure prediction methods, Classification of Three Dimensional Structures of Proteins, Motifs, Folds and Domains, Classification of Three Dimensional Structures in PDB (HSSP, SCOP, FSSP, CATH).  Structural Alignment Methods, Homology Modeling, fold recognition and ab initio methods. Computer aided drug design (CADD), Molecular Docking.

Genomics: The Human Genome, Comparative Genomics (Comparative genomics of Model organisms), gene identification methods, primary gene expression analysis. Primary Sequence Analysis: Sequence alignment, Homology concept, pairwise sequence alignment, multiple sequence alignment, Phylogenetic Analysis, concept of SNPs and SNP analysis.
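Pairwise sequence alignment, listed above, can be illustrated with a minimal sketch of the Needleman-Wunsch dynamic-programming algorithm for global alignment. The scoring scheme (match +1, mismatch -1, gap -1) is an assumption chosen for simplicity. Practical tools use substitution matrices such as BLOSUM and affine gap penalties. The sketch returns only the optimal score; recovering the alignment itself needs a traceback step, which is omitted here.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch: optimal global alignment score of sequences a and b."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap            # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap            # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # a[i-1] against a gap
                score[i][j - 1] + gap,       # b[j-1] against a gap
            )
    return score[n][m]


print(global_alignment_score("ACGT", "ACGT"))  # prints 4
print(global_alignment_score("ACGT", "AGT"))   # prints 2 (A-GT: three matches, one gap)
```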


  • Bioinformatics–Sequence, Structure and Databanks, Higgins, D., Taylor, W., Pub: Oxford University Press, Incorporated.
  • Bioinformatics: A practical guide to the analysis of genes and proteins, Baxevanis, A. D., Ouellette, B.F.F., Pub: John Wiley and Sons Inc.
  • Bioinformatics: Sequence and Genome Analysis, Mount, D.W., Pub: Cold Spring Harbor Laboratory Press.
  • Structural Bioinformatics, Ed: Bourne, P. E., Weissig, H., Pub: Wiley-Blackwell.
Introduction to Computational Physics I

Introduction to Python: General information, Operators, Functions, Modules, Arrays, Formatting, Printing output, Writing a program
Approximation of a function: Interpolation, Least-squares Approximation
Roots of Equations: Method of Bisection, Method based on Linear Interpolation, Newton-Raphson Method
Numerical Differentiation: Finite Difference Approximation
Numerical Integration: Trapezoidal Rule, Simpson's Rule
Ordinary Differential Equations: Taylor Series Method, Runge-Kutta Methods, Shooting Method
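Several of the methods listed above can be sketched in plain Python, the language this course uses. The sketch below implements bisection, Newton-Raphson, composite Simpson's rule and a classical fourth-order Runge-Kutta integrator. It then applies them to test problems whose exact answers are known. The function names, tolerances and step counts are illustrative choices, not course specifications.

```python
import math


def bisection(f, a, b, tol=1e-10):
    """Find a root of f in [a, b], assuming f(a) and f(b) have opposite signs."""
    fa = f(a)
    if fa * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        m = 0.5 * (a + b)
        fm = f(m)
        if fa * fm <= 0:
            b = m              # sign change lies in [a, m]
        else:
            a, fa = m, fm      # sign change lies in [m, b]
    return 0.5 * (a + b)


def newton_raphson(f, df, x0, tol=1e-12, max_iter=50):
    """Iterate x <- x - f(x)/f'(x) until the step is smaller than tol."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")


def simpson(f, a, b, n=100):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)  # weights 4, 2, 4, ..., 4
    return total * h / 3


def rk4(f, t0, y0, t_end, n=100):
    """Integrate y' = f(t, y) from t0 to t_end with n classical RK4 steps."""
    h = (t_end - t0) / n
    t, y = t0, y0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


# Root of x^2 - 2 (that is, sqrt(2)) with both root finders
print(bisection(lambda x: x * x - 2, 0.0, 2.0))
print(newton_raphson(lambda x: x * x - 2, lambda x: 2 * x, 1.0))
# Integral of sin(x) over [0, pi]; the exact value is 2
print(simpson(math.sin, 0.0, math.pi))
# y' = y with y(0) = 1, so y(1) = e
print(rk4(lambda t, y: y, 0.0, 1.0, 1.0))
```

Bisection halves the bracketing interval at every step, so it is slow but reliable. Newton-Raphson converges much faster near a simple root, but it needs the derivative and a reasonable starting guess.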

Informatics & Molecular Modelling

This course and the associated computer lab deal with Molecular Modelling and Cheminformatics, applied to the search for new drugs or materials with specific properties or specific physiological effects (in silico Drug Discovery). Students will learn the general principles of structure-activity relationship modelling, docking & scoring, homology modelling, statistical learning methods and advanced data analysis. They will gain familiarity with software for structure-based and ligand-based drug discovery. Some coding and scripting will be required.


  1. Introduction:
    • Drug Discovery in the Information-rich age
    • Introduction to Pattern recognition and Machine Learning
    • Supervised and unsupervised learning paradigms and examples
    • Application potential of Machine learning in Cheminformatics & Bioinformatics
    • Introduction to Classification and Regression methods
  2. Representation of Chemical Structure and Similarity:
    • Sequence Descriptors
    • Text mining
    • Representations of 2D Molecular Structures: SMILES
    • Chemical File Formats, 3D Structure
    • Descriptors and Molecular Fingerprints
    • Graph Theory and Topological Indices
    • Progressive incorporation of chemically relevant information into molecular graphs
    • Substructural Descriptors
    • Physicochemical Descriptors
    • Descriptors from Biological Assays
    • Representation and characterization of 3D Molecular Structures
    • Pharmacophores
    • Molecular Interaction Field Based Models
    • Local Molecular Surface Property Descriptors
    • Quantum Chemical Descriptors
    • Shape Descriptors
    • Protein Shape Comparisons, Motif Models
    • Molecular Similarity Measures
    • Chemical Space and Network graphs
    • Semantic technologies and Linked Data
  3. Mapping Structure to Response: Predictive Modelling:
    • Linear Free Energy Relationships
    • Quantitative Structure-Activity/Property Relationships (QSAR/QSPR) Modeling
    • Ligand-Based and Structure-Based Virtual High Throughput Screening
    • 3D Methods - Pharmacophore Modeling and alignment
    • ADMET Models
    • Activity Cliffs
    • Structure Based Methods, docking and scoring
    • Model Domain of Applicability
  4. Data Mining and Statistical Methods:
    • Linear and Non-Linear Models
    • Data preprocessing and performance measures in Classification & Regression
    • Feature selection
    • Principal Component analysis
    • Partial Least-Squares Regression
    • kNN, Classification trees and Random forests
    • Cluster and Diversity analysis
    • Introduction to kernel methods
    • Support vector machines classification and regression
    • Introduction to Neural Nets
    • Self-Organized Maps
    • Deep Neural Networks
    • Introduction to evolutionary computing
    • Genetic Algorithms
    • Data Fusion
    • Model Validation
    • Best Practices in Predictive Cheminformatics
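Molecular similarity measures, listed above, are often computed on binary fingerprints. This minimal sketch computes the Tanimoto (Jaccard) coefficient |A ∩ B| / |A ∪ B|. It assumes each fingerprint is represented as a Python set of "on" bit positions. The bit positions below are made up for illustration and are not derived from real molecules; in practice a cheminformatics toolkit generates the fingerprints.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 1.0  # convention assumed here: two empty fingerprints are identical
    shared = len(fp_a & fp_b)
    # |A u B| = |A| + |B| - |A n B|
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical fingerprints (sets of set-bit indices)
fp1 = {1, 4, 7, 12, 20}
fp2 = {1, 4, 9, 12, 21, 30}
print(tanimoto(fp1, fp2))  # prints 0.375 (3 shared bits / 8 distinct bits)
```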


  1. Johann Gasteiger, Thomas Engel, Chemoinformatics: A Textbook (Wiley-VCH, 2003)
  2. Jürgen Bajorath (Editor), Chemoinformatics and Computational Chemical Biology (Methods in Molecular Biology) (Humana Press, 2004)
  3. Leach & Gillet, An Introduction to Chemoinformatics

Prerequisites: Basic Organic chemistry/Biochemistry, Basic Statistics, Computer Programming.

Basic Probability and Statistics

Core course for B.Sc. (Research) Biotechnology. Only available as UWE with prior permission of Department of Mathematics. Does not count towards Minor in Mathematics.

Credits (Lec:Tut:Lab)= 3:1:0 (3 lectures +1 tutorial weekly)

Prerequisites: Class XII Mathematics or MAT 020 (Elementary Calculus) or MAT 101 (Calculus I)

Overview: Probability is the means by which we model the inherent randomness of natural phenomena. This course provides an introduction to a range of techniques for understanding randomness and variability, and for understanding relationships between quantities. The concluding portions on Statistics take up the problem of testing our theoretical models against actual data, as well as applying the models to data in order to make decisions. This course will act as an introduction to probability and statistics for students from natural sciences, social sciences and humanities.

Detailed Syllabus:

  1. Describing data: scales of measurement, frequency tables and graphs, grouped data, stem and leaf plots, histograms, frequency polygons and ogives, percentiles and box plots, graphs for two characteristics
  2. Summarizing data: Measures of the middle: mean, median, mode; Measures of spread: variance, standard deviation, coefficient of variation, percentiles, interquartile range; Chebyshev’s inequality, normal data sets, Measures for relationship between two characteristics; Relative risk and Odds ratio
  3. Elements of Probability: Sample space and events, basic definitions and rules of probability, conditional probability, Bayes’ theorem, independent events
  4. Sampling: Population and samples, reasons for sampling, methods of sampling, standard error, Population parameter and sample statistic
  5. Special random variables and their distributions: Bernoulli, Binomial, Poisson, Uniform, Normal, Exponential, Gamma, distributions arising from the Normal: Chi-square, t, F
  6. Distributions of Sampling statistics: Sampling distribution of the mean, The central limit theorem, Determination of sample size, standard deviation versus standard error, the sample variance, sampling distributions from a normal population, sampling from a finite population
  7. Estimation: Maximum likelihood estimator; Interval estimates; Estimating the confidence interval for population mean, variance and proportions; Confidence intervals for the difference between independent means
  8. Hypothesis testing: Null and alternate hypothesis; Significance levels; Type I and Type II errors; Tests based on Normal, t, F and Chi-square distributions for testing of mean, variance and proportions, Tests for independence of attributes, Goodness of fit; Non-parametric tests: the sign test, the Signed Rank test, Wilcoxon Rank-Sum Test.
  9. Analysis of variance: Comparing three or more means: One-way analysis of variance, Two-factor analysis of variance, Two-way analysis of variance with interaction
  10. Correlation and Regression: Correlation, calculating correlation coefficient, coefficient of determination, Spearman’s rank correlation; Linear regression, Least square estimation of regression parameters, distribution of the estimators, assumptions and inferences in regression; analysis of residuals: assessing the model; transforming to linearity; weighted least squares; polynomial regression
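Several of the summary measures above can be computed with Python's standard `statistics` module (Python 3.8 or later). This minimal sketch uses a small made-up data set. The 95% confidence interval uses the large-sample normal approximation (mean ± 1.96 s/√n), which is only a rough guide for a sample this small. For small samples, a t-based interval (item 7) would be more appropriate.

```python
import math
import statistics

# Hypothetical measurements, made up for illustration
data = [4.1, 5.3, 5.0, 6.2, 4.8, 5.5, 5.9, 4.6]

mean = statistics.mean(data)
median = statistics.median(data)
s = statistics.stdev(data)                    # sample standard deviation (divisor n - 1)
cv = s / mean                                 # coefficient of variation
q1, _, q3 = statistics.quantiles(data, n=4)   # first and third quartiles
iqr = q3 - q1                                 # interquartile range

# Approximate 95% CI for the population mean (normal approximation)
z = statistics.NormalDist().inv_cdf(0.975)    # about 1.96
half_width = z * s / math.sqrt(len(data))
ci = (mean - half_width, mean + half_width)

print(f"mean={mean:.3f} median={median:.3f} s={s:.3f} CV={cv:.3f} IQR={iqr:.3f}")
print(f"approximate 95% CI for the mean: ({ci[0]:.3f}, {ci[1]:.3f})")
```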

Main References:

  • Introduction to Probability and Statistics for Engineers and Scientists by Sheldon Ross, 2nd edition, Harcourt Academic Press.

Other References:

  • Basic and Clinical Biostatistics by Beth Dawson-Saunders and Robert G. Trapp, 2nd edition, Appleton and Lange.
  • John E. Freund’s Mathematical Statistics with Applications by I. Miller & M. Miller, 7th edition, Pearson, 2011.

Past Instructors: Sneh Lata, Suma Ghosh

Probability and Statistics

Core course for BSc (Research) Economics. Students of BSc (Research) Mathematics or any B.Tech. program are not allowed to credit this course.

Prerequisites: Calculus I (MAT 101)

Overview: Probability is the means by which we model the inherent randomness of natural phenomena. This course introduces you to a range of techniques for understanding randomness and variability, and for understanding relationships between quantities. The concluding portions on Statistics take up the problem of testing our theoretical models against actual data, as well as applying the models to data in order to make decisions. This course is a prerequisite for later courses in Advanced Statistics, Stochastic Processes and Mathematical Finance.

Detailed Syllabus:

  1. Probability: Classical probability, axiomatic approach, conditional probability, independent events, addition and multiplication theorems with applications, Bayes’ theorem.
  2. Random Variables: Probability mass function, probability density function, cumulative distribution function, expectation, variance, standard deviation, mode, median, moment generating function.
  3. Some Distributions and their Applications: Uniform (discrete and continuous), Bernoulli, Binomial, Poisson, Exponential, Normal.
  4. Sequences of Random Variables: Chebyshev’s Inequality, Law of Large Numbers, Central Limit Theorem, random walks.
  5. Joint Distributions: Joint and marginal distributions, covariance, correlation, independent random variables, least squares method, linear regression.
  6. Sampling: Sample mean and variance, standard error, sample correlation, chi square distribution, t distribution, F distribution, point estimation, confidence intervals.
  7. Hypothesis Testing: Null and alternate hypothesis, Type I and Type II errors, large sample tests, small sample tests, power of a test, goodness of fit, chi square test.

Main References:
  • A First Course in Probability by Sheldon Ross, 6th edition, Pearson.
  • John E. Freund’s Mathematical Statistics with Applications by I. Miller & M. Miller, 7th edition, Pearson, 2011.

Other References:
  • Elementary Probability Theory: With Stochastic Processes and an Introduction to Mathematical Finance by Kai Lai Chung and Farid Aitsahlia, 4th edition, Springer International Edition, 2004.
  • Introduction to the Theory of Statistics by Alexander M. Mood, Franklin A. Graybill and Duane C. Boes, 3rd edition, Tata McGraw-Hill, 2001.

Machine Learning through R

Course description not available.

Any 8 Elective credits whether or not part of the student's Major Core requirements. Other courses may be considered for inclusion in the Electives from time to time.
Graph Theory

A Major Elective for B.Sc. (Research) Mathematics.

Credits (Lec:Tut:Lab)= 3:1:0 (3 lectures and 1 tutorial weekly)

Prerequisites: MAT 160 Linear Algebra I

Overview: Graphs are fundamental objects in combinatorics. In addition to their theoretical value, results in graph theory are increasingly being applied to understand and analyze systems across a broad domain of enquiry, including the natural sciences, social sciences and engineering. The course assumes no prior background in graph theory. The emphasis will be on the axiomatic foundations and formal definitions, together with the proofs of some of the central theorems. A few applications of these results to other disciplines will be discussed.

Detailed Syllabus:

Unit 1

  • Definitions of Graph, Digraph, Finite and Infinite Graph, Degree of a Vertex, Degree Sequence, Walk, Path, Cycle, Clique.
  • Operations on graphs, Complement of a graph, Subgraph, Connectedness, Components, Isomorphism.
  • Regular graph, Complete graph, Bipartite graph, Cyclic graph, Euler graph, Hamiltonian path and circuit, Tree, Cut set, Spanning tree.

Unit 2

  • Planar graph, Colouring, Covering, Matching, Factorization, Independent sets.

Unit 3

  • Graphs and relations, Adjacency matrix, Incidence matrix, Laplacian matrix, Spectral properties of graphs, Matrix tree theorem, Automorphism group of a graph.

Unit 4

  • DFS and BFS, spanning trees; Kruskal's and Prim's algorithms for minimum spanning trees; Dijkstra's shortest-path algorithm.
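Dijkstra's algorithm from Unit 4 can be sketched with a binary heap from Python's `heapq` module. The graph below is a small made-up example stored as an adjacency dictionary with directed edges. For an undirected graph, list each edge in both directions. Edge weights must be non-negative for the algorithm to be correct.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distances from source in a graph with non-negative weights.

    graph maps each vertex to a list of (neighbour, weight) pairs.
    Vertices unreachable from source are absent from the result.
    """
    dist = {source: 0}
    heap = [(0, source)]                 # priority queue of (distance, vertex)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                     # stale entry: u already has a shorter distance
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd             # relax edge (u, v)
                heapq.heappush(heap, (nd, v))
    return dist


graph = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 5)],
    "B": [("D", 1)],
}
print(dijkstra(graph, "A"))  # prints {'A': 0, 'B': 3, 'C': 1, 'D': 4}
```

Instead of decreasing keys inside the heap, the sketch pushes duplicate entries and skips stale ones when they are popped. This is a common, simple choice with `heapq`.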


  • D. West, Introduction to Graph Theory, 2nd ed., PHI Learning, New Delhi, 2009.
  • N. Deo, Graph Theory: With Application to Engineering and Computer Science, PHI Learning, New Delhi, 2012.
  • C. D. Godsil and G. Royle, Algebraic Graph Theory, Springer, New Delhi, 2013.
  • B. Kolman, R.C. Busby, S.C. Ross, Discrete Mathematical Structures, 6th ed., PHI Learning, New Delhi, 2012.
  • F. Harary, Graph Theory, Narosa, New Delhi, 2012.
  • J.A. Bondy and U.S.R. Murty, Graph Theory, Springer, New Delhi, 2013.
  • R.J. Wilson, Introduction to Graph Theory, 4th ed., Pearson Education, New Delhi, 2003.

Past Instructors: Sudeepto Bhattacharya

Linear Algebra

Core course for B.Sc. (Research) Mathematics.

Credits (Lec:Tut:Lab)= 3:1:0 (3 lectures and 1 tutorial weekly)

Prerequisites: Class XII Mathematics

Overview: Linear Algebra provides the means for studying several quantities simultaneously. A good understanding of Linear Algebra is essential in almost every area of higher mathematics, and especially in applied mathematics. A CAS such as Maxima/Matlab will be used throughout the course for computational purposes.

Detailed Syllabus:

  1. Matrices and Linear Systems
  2. Vector Spaces and Linear Transformations
  3. Inner Product Spaces
  4. Determinant
  5. Eigenvalues and Eigenvectors, Diagonalization
  6. Quadratic Forms and Positive Definite Matrices
  7. Applications chosen from: Numerical aspects, Difference equations, Markov matrices, Least squares.
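The least-squares application in item 7 can be illustrated by fitting a line y ≈ c0 + c1·x. With the design matrix A = [1 x], the normal equations AᵀA c = Aᵀy reduce to a 2×2 linear system, which the sketch below solves with Cramer's rule. This minimal pure-Python sketch uses made-up data; in the course itself a CAS such as Maxima or MATLAB would be used. The function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line y = c0 + c1*x via the 2x2 normal equations."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations A^T A c = A^T y with A = [1 x]:
    #   [ n   sx  ] [c0]   [ sy  ]
    #   [ sx  sxx ] [c1] = [ sxy ]
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("need at least two distinct x values")
    c0 = (sy * sxx - sx * sxy) / det   # Cramer's rule, first unknown
    c1 = (n * sxy - sx * sy) / det     # Cramer's rule, second unknown
    return c0, c1


xs = [0, 1, 2, 3]
ys = [1, 2, 2, 4]         # made-up data
print(fit_line(xs, ys))   # prints (0.9, 0.9)
```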


  1. Linear Algebra by Jim Hefferon
  2. Linear Algebra and its Applications by Gilbert Strang, 4th edition, Cengage.
  3. Linear Algebra and its Applications by David C. Lay, 3rd edition, Pearson.
  4. Linear Algebra: A Geometric Approach by S. Kumaresan, PHI, 2011.
  5. Elementary Linear Algebra by Howard Anton and Chris Rorres, 9th edition, Wiley.
  6. Linear Algebra: An Introductory Approach by Charles Curtis, Springer.
  7. Matrix Analysis and Applied Linear Algebra by Carl D Meyer, SIAM.
  8. Videos of lectures by Prof Gilbert Strang: 18.06 Linear Algebra, Spring 2010. (Massachusetts Institute of Technology: MIT OpenCourseWare), http://ocw.mit.edu


Introduction to Computational Physics II

  1. Systems of Linear Algebraic Equations: Gauss Elimination Method, LU Decomposition, Choleski’s Decomposition Method, Symmetric and Banded Coefficient Matrices, Pivoting, Matrix Inversion, Iterative Methods
  2. Symmetric Matrix Eigenvalue Problems: Jacobi Method, Power and Inverse Power Methods, Eigenvalues of Symmetric Tridiagonal Matrices, Computation of Eigenvectors
  3. Two-Point Boundary Value Problems: Shooting Method
  4. Solution of Partial Differential Equations: Separation of Variables, Finite Difference Method, The Relaxation Method, The Matrix Method for Difference Equations
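Two of the methods listed above, Gauss elimination with partial pivoting and the power method for the dominant eigenvalue, can be sketched in plain Python using lists of lists and no external libraries. This is an illustrative sketch on small made-up matrices, not production code; real work would rely on a tested linear-algebra library. The power method also assumes that the starting vector is not orthogonal to the dominant eigenvector.

```python
def gauss_solve(a, b):
    """Solve a x = b by Gauss elimination with partial pivoting (inputs are copied)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix [a | b]
    for k in range(n):
        # Partial pivoting: move the row with the largest |entry| in column k to row k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]         # eliminate entry below the pivot
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def power_method(a, iters=200):
    """Estimate the dominant eigenvalue and eigenvector of a by power iteration."""
    n = len(a)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]  # w = a v
        lam = max(w, key=abs)          # component of largest magnitude
        v = [wi / lam for wi in w]     # rescale so that component equals 1
    return lam, v


x = gauss_solve([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], [8, -11, -3])
print(x)         # approximately [2, 3, -1]
lam, v = power_method([[2, 1], [1, 3]])
print(lam, v)    # lam is approximately (5 + sqrt(5)) / 2, about 3.618
```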

Chemical Binding

Quantum mechanics provides the microscopic basis for a fundamental understanding of chemistry, molecular structure, bonding, and reactivity. This course and the associated computer lab provide a comprehensive treatment of valence bond and molecular orbital theories, post-Hartree-Fock wave function methods and density functional methods. Students will learn to compute molecular structures, spectra, and thermochemical parameters for molecules in the gas phase and for condensed-phase systems.


  • Postulates of Quantum Mechanics
  • Atomic Orbitals and Basis Sets
  • The Born-Oppenheimer approximation and the molecular Hamiltonian
  • The Concept of the Potential Energy Surface
  • Geometry Optimization and Frequency Analysis
  • Semi-empirical and ab initio Quantum Mechanics
  • Variation and Perturbation Theory
  • Valence Bond and Molecular Orbital theories
  • Independent-Particle Models: the Hartree method
  • Spin, statistics and the Pauli principle
  • The Hartree-Fock Self-Consistent Field equations
  • Electron Correlation, Density Matrices and Natural Orbitals
  • Density Functional Theory
  • Periodic systems
  • Implicit and explicit solvent methods
  • QM/MM and ONIOM


  1. Frank Jensen: Introduction to Computational Chemistry (Wiley)
  2. Henry Eyring, John Walter and George E. Kimball: Elementary Quantum Chemistry (John Wiley)
  3. J. N. Murrell, S. F. A. Kettle, J. M. Tedder: Valence Theory (ELBS & John Wiley)
  4. Richard P. Feynman, Robert B. Leighton & Matthew Sands: The Feynman Lectures on Physics, Vol. III (Addison Wesley Longman)
  5. James B. Foresman, Æleen Frisch: Exploring Chemistry With Electronic Structure Methods: A Guide to Using Gaussian (Gaussian, Inc.)
  6. Errol G. Lewars, Computational Chemistry: Introduction to the Theory and Applications of Molecular and Quantum Mechanics (Kluwer Academic Publishers, 2003)
  7. N. Sukumar, ed. A Matter of Density: Exploring the Electron Density Concept in the Chemical, Biological, and Materials Sciences (John Wiley, Hoboken, NJ, 2013)

Prerequisites: Chemical Principles, Calculus, Linear Algebra, Physics, CS.

Co-requisite: Molecular Spectroscopy.

Topics in Nanotechnology

The next few years will see dramatic advances in atomic-scale technology. Molecular machines, nanocircuits and the like will transform all aspects of modern life: medicine, energy, computing, electronics and defense are all areas that will be radically reshaped by nanotechnology. These technologies all involve the manipulation of structures at the atomic level; what used to be the stuff of fantasy is now reality. The economic impact of these developments has been estimated to be in the trillions of dollars. But, as with all new technologies, ethical and legal challenges will arise in their implementation and further development. This course will examine the science of nanotechnology and place it in the larger social context of how this technology may be, and already is, applied. The underlying physical science principles will be covered in lectures, and students will read articles from current news sources and the scientific literature. Students will give presentations on the scientific literature, examining the science and applications of a well-defined aspect of nanotechnology of their choosing. Lecture material will focus on the principles behind modern materials such as semiconductors (organic and inorganic) and novel nanostructures.


  • Introduction
  • Bulk Vs. Nano
  • Quantum confinement effect
  • Surface area to volume ratio
  • Effect on Properties: Material (electrical, magnetic, mechanical etc.) and structural properties
  • Carbon nano-architectures: Fullerene, SWNT, MWNT, Graphite etc., Classification of structure
  • Q-Dots
  • Bonding parameters
  • Methods of preparation
  • Nanomaterials synthesis: Top-down and Bottom-up approaches, Physical and chemical methods
  • Applications (Nano-machines, solar cells, coatings, MEMS, nano-medicine, sensors, miscellaneous)
  • Characterization Techniques and Instruments: Microscopy SEM, TEM, AFM, X-Ray diffraction, UV-vis, Photoluminescence, Raman, FTIR, ESR, XPS, BET, DLS, Zeta potential
Applied Machine Learning
Data Mining & Data Warehousing
In this course, we will explore the fundamental data mining methodology, OLTP and OLAP, data pre-processing, association rule mining, clustering, classification, and other advanced topics in the field such as the social impact of data mining, recent trends in data mining research, challenges and future scope, and the need for security and privacy preservation in data mining.
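Association rule mining rests on two measures. The support of an itemset is the fraction of transactions that contain it. The confidence of a rule X → Y is the support of X ∪ Y divided by the support of X. This minimal sketch computes both on made-up market-basket transactions. A full algorithm such as Apriori would also enumerate candidate itemsets above a minimum-support threshold; that step is omitted here.

```python
# Hypothetical market-basket transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def count(itemset, transactions):
    """Number of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent.

    Uses a ratio of counts, which equals the ratio of supports but avoids
    an extra floating-point rounding step.
    """
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)


print(support({"bread", "milk"}, transactions))       # prints 0.6
print(confidence({"bread"}, {"milk"}, transactions))  # prints 0.75
```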
Data Mining
Big Data and Cloud Computing
Big Data management and analytics has become increasingly important for deriving valuable, actionable insights in several important and diverse domains such as smart cities, transportation, healthcare and financial services. Cloud computing platforms, together with frameworks such as Hadoop, provide the capabilities to process, manage and analyze such Big Data in a highly scalable manner. This course is designed to equip students with the fundamentals of Big Data management and analytics (including data mining and machine learning techniques) and to help them understand how Big Data can be processed efficiently on Cloud computing platforms. The course also has a significant hands-on lab component, in which students will gain experience processing and analyzing Big Data on Hadoop.

Unit 1: Introduction to Big Data and its applications. This unit introduces the concept of Big Data and explains its four dimensions (volume, velocity, variety and veracity). It then details several applications of Big Data analytics to motivate the ever-increasing importance of Big Data in today's world, covering a wide range of domains from transportation services to finance to social media. It also describes how Big Data can represent a high-value proposition to businesses, as a source of competitive advantage in improving key performance metrics such as market share and profit margins.

Unit 2: Issues associated with Big Data management. This unit discusses the key issues that arise in the processing of Big Data. Many of these issues also arise when processing data that do not fall into the Big Data category, but they are significantly exacerbated by the very large volumes and typically high complexity of Big Data. Issues include (but are not limited to) data cleaning, data heterogeneity, data integration, replication, caching, maintenance of data consistency and scalability. The unit also covers the inherent trade-offs associated with each of these issues.

Unit 3: Concepts of Cloud computing. This unit discusses the key concepts and principles of Cloud computing, including Cloud-related terminology. Topics include (but are not limited to) the pros and cons of Cloud computing, Cloud architecture, Cloud service models (IaaS, PaaS, SaaS), Cloud platforms (Azure, AWS, etc.), effective resource allocation and cost efficiency in Cloud computing, and multitenancy.

Unit 4: Hadoop and MapReduce. This unit covers the key concepts of Hadoop and MapReduce for solving real-world analytics problems associated with Big Data. Topics include (but are not limited to) the Hadoop Distributed File System and several key Hadoop-related modules and software packages such as Hive, Pig, HBase, Spark, Flume, Sqoop and Oozie. Students will not only learn the concepts behind these packages but also do hands-on development work with them to gain a deeper level of expertise.

Unit 5: Data models and NoSQL. This unit discusses four key data models that are important for handling Big Data: key-value, column-family, document and graph databases. For each data model, the unit covers important real-world technologies from both a theoretical and a practical, hands-on perspective. Examples include HBase, Cassandra, Hypertable, Bigtable, DynamoDB, MongoDB, Neo4j and Redis. The unit also presents the trade-offs involved in selecting an appropriate data model, based on factors such as the requirements of the application, the properties of the underlying data, the complexity of performing analytics, and scalability.

Unit 6: Big Data strategy and implementation. This unit examines the business and strategic perspective of Big Data. Topics include (but are not limited to) a brief overview of fundamental concepts of business strategy and business intelligence, understanding the key requirements of the relevant stakeholders, defining a Big Data strategy and creating plans to implement it, selecting appropriate Big Data tools and technologies based on stakeholder requirements and cost-benefit trade-offs, maximizing the benefits obtained by analyzing Big Data, and maintaining a sustainable competitive advantage in the market.
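The data flow of the MapReduce model in Unit 4 can be imitated in a few lines of plain Python. A map phase emits (key, value) pairs, a shuffle phase groups the values by key, and a reduce phase aggregates each group. This single-process sketch of the classic word-count example only illustrates that flow. It does not use Hadoop's actual APIs, and real jobs distribute each phase across a cluster. The function names are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Emit (word, 1) for every whitespace-separated word in a document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine the values for one key; for word count, sum them."""
    return key, sum(values)


documents = ["big data on the cloud", "the cloud scales", "big big data"]
pairs = (pair for doc in documents for pair in map_phase(doc))
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(pairs).items())
print(counts)
```

Because each map call depends only on its own document, and each reduce call depends only on the values for its own key, both phases can run in parallel. The shuffle is the step that moves data between machines.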
Intro. to Machine Learning
Course Summary

The course introduces the basic concepts, techniques and tools for designing programs that learn from data.

Course Aims

  a) Understand different types of data.
  b) Learn how to construct models that can predict from data (supervised learning) and organize data into coherent categories (unsupervised learning).
  c) Understand where and how machine learning can go wrong.

Learning Outcomes

On successful completion of the course, students will be able to:

  • Build models for prediction and data organization from data.
  • Use basic ML libraries.
  • Understand the basic theories and concepts that underlie machine learning.

Curriculum Content

Topics: The learning problem. Types of learning. Training, validation, testing, generalization, overfitting. Features and feature engineering, dimensionality reduction. Bayesian decision theory. Parametric methods. Tree models. Linear models. SVMs and kernel-based models. Nearest neighbour models. Markov models. Neural network models. Ensemble methods: boosting, bagging, voting schemes. Distance metrics and cluster-based models.

The topics in the course will not be covered in linear order. They will be intertwined to make machine learning easier to understand, and the progression should be fairly logical.

Teaching and Learning Strategy

Lectures, demonstrations, targeted assignments on conceptual material, and a term project integrating the various parts of the course.

Assessment Strategy

Midsem (20%), Endsem (35%), programming assignments (15%), term project (30%).

Mapping of Learning Outcomes to Assessment Strategy

The midsem and endsem exams will test grasp of theoretical concepts. The assignments will test the use of libraries and tools to build and test models. The term project will test the ability to build an end-to-end system that starts from possibly noisy data and constructs a high-performance model.
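Nearest-neighbour models, one of the topics listed above, are among the simplest learners to implement. This minimal sketch of a k-nearest-neighbour classifier uses Euclidean distance (`math.dist`, Python 3.8+) and majority voting on a tiny made-up two-class data set. The choice k = 3 is arbitrary. In practice one would typically use a library such as scikit-learn and choose k by validation.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict the label of query from its k nearest training points.

    train is a list of (feature_tuple, label) pairs.
    """
    # Sort training points by Euclidean distance to the query and keep the k closest.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    # Majority label; ties go to the label first seen among the sorted neighbours.
    return votes.most_common(1)[0][0]


# Hypothetical two-class training data
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 7.5), "B"), ((6.5, 5.0), "B"),
]
print(knn_predict(train, (1.2, 1.4)))  # prints A
print(knn_predict(train, (6.8, 6.1)))  # prints B
```

kNN has no training phase beyond storing the data. All the work happens at prediction time, which is why it becomes slow on large training sets without indexing structures.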
References

  a) Ethem Alpaydin, Introduction to Machine Learning, 3rd Ed., MIT Press, 2014.
  b) Peter Flach, Machine Learning: The Art and Science of Algorithms that Make Sense of Data, CUP, 2012.
  c) Kevin Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.
  d) S. Kulkarni, G. Harman, An Elementary Introduction to Statistical Learning Theory, Wiley, 2011.
Mathematical Modelling

Mathematical modeling is the science and art of addressing real-world problems arising in many scientific disciplines. The practice of mathematical modeling inherently captures the interdisciplinary nature of real-world phenomena, and is thus appropriate for students from all disciplines. This course is designed to introduce students to fundamental concepts and methods of mathematical modeling through a hands-on, project-oriented approach. In contrast to traditional mathematics courses, the applications studied will motivate the mathematics covered.

Core course for B.Sc. (Research) programs in Mathematics.

Credits (Lec:Tut:Lab)= 3:1:0 (3 lectures and 1 tutorial weekly)

Prerequisites: Class XII mathematics