About AI/ML Tools and Pipelines


What Do I Need and How Will It Fit?

Which AI/ML tools should my organization adopt, and how should advanced analytics be integrated into the data architecture? Implementations are driven by business cases with varying technological requirements. There are plenty of options on the market, and different SaaS products have characteristic strengths and weaknesses. The existing architecture is a significant factor in decision making as well.

Limited Use of AI/ML within Reporting Tools

Although reporting tools offer programming language support and AI/ML capabilities, these come with certain limitations and hindrances. For example, writing R scripts in Tableau requires one to adopt a product-specific workflow and programming logic.

Reporting software can still be used to produce small-scale solutions. One common use case for advanced analytics in reporting is key figure forecasting. URL-based integration also makes it possible to embed AI/ML applications in reports. For example, interactive R Shiny dashboards can be included in Tableau reports. However, these are minimal implementations.
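The URL-based integration mentioned above can be sketched roughly as follows. This is only an illustration of building a parameterized URL that a report could point at; the host `shiny.example.com`, the helper name `build_embed_url` and the query parameters `region` and `horizon` are hypothetical and not taken from any specific product.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_embed_url(base_url, **params):
    """Return base_url with the given parameters merged into its query string."""
    parts = urlsplit(base_url)
    # Keep any query parameters already present on the base URL.
    query = dict(parse_qsl(parts.query))
    query.update({key: str(value) for key, value in params.items()})
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment)
    )


# Hypothetical app URL; a reporting tool would load this in a web page object.
url = build_embed_url(
    "https://shiny.example.com/forecast/", region="North Europe", horizon=12
)
print(url)
```

The embedded application would then read the parameters from the URL and render the matching view, so the report itself only has to supply the link.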

Shortcomings of Graphical AI/ML Tools

Graphical AI/ML utilities, such as Azure ML Studio and RapidMiner, are a step up from reporting tools, but they still lack the flexibility needed to fulfil large-scale production requirements. Although they support R and Python, scripting is not the standard way to use graphical AI/ML tools, and this is reflected in their usability.

When it comes to training workloads, adding a powerful computation engine on top of other features has not been sufficient for RapidMiner to remain relevant. This is partly because the industry has been taken over by end-to-end design and seamlessly concatenated cloud products from source to consumption.

Finally, mere REST API model deployment without scalability is often not good enough for real-time implementations. On the other hand, IaaS-based scaling solutions are too tricky for many organizations to maintain. Such solutions also require extra DevOps programming work compared to the standardized cloud products built for the purpose.

Microsoft Azure Cloud Platform for Scalability, Power and End-To-End Features

Cloud-based programming environments have invaded the AI/ML scene. These products provide calculation power, scaling features and end-to-end readiness. Model training may necessitate a true computation cannon to be swift enough. Furthermore, models sometimes need to be consumed by thousands of users with minimal response time. Reasons to prefer such SaaS or MLaaS (machine learning as a service) solutions over custom applications include cloud platform compatibility, ease of maintenance and standardization.

AI tools

Note: Model training in Spark is available for Azure ML Service. However, Databricks is a more comprehensive tool for AI/ML development work.

Azure Databricks – Where Data Science Meets Data Engineering

Demand for large-scale training computation can be met by employing a Spark-driven tool called Azure Databricks. It supports role-based access control and allows data scientists, data engineers and other people involved to collaborate on advanced analytics projects. Developers can write R, Scala, Python and SQL in Databricks notebooks. The resulting AI/ML modelling pipelines can be scheduled with Azure Data Factory. Version control is typically managed through Azure DevOps.

Note: Databricks is available on AWS too.

Real-Time Scenario

Scalability requirements for a large user base and stable real-time performance can be addressed by adding Azure ML Service to the chain of sequentially employed cloud products. A typical way to do this is to deploy the solution to Azure Container Service. Kubernetes clusters are often employed in this scenario, but other deployment targets are supported too. Additional custom features can be built inside the web service.
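To make the idea of a scoring web service more concrete, here is a minimal sketch of the logic such a service might wrap. It loosely mirrors the `init()` / `run()` entry-script pattern commonly used with Azure ML web services, but everything inside is an assumption made for illustration: the model is a hard-coded linear function rather than a trained artifact, and the request format `{"data": [[x1, x2], ...]}` is hypothetical.

```python
import json

# Placeholder for the loaded model; populated once by init().
MODEL = None


def init():
    """Load the model once when the service starts."""
    global MODEL
    # Assumption: a real service would load a trained model from storage here.
    MODEL = {"intercept": 1.0, "weights": [0.5, -0.25]}


def run(raw_data):
    """Score a JSON request and return a JSON response string."""
    try:
        rows = json.loads(raw_data)["data"]
        predictions = [
            MODEL["intercept"]
            + sum(w * x for w, x in zip(MODEL["weights"], row))
            for row in rows
        ]
        return json.dumps({"predictions": predictions})
    except (KeyError, TypeError, ValueError) as exc:
        # Malformed requests get an error payload instead of crashing the service.
        return json.dumps({"error": str(exc)})


init()
print(run(json.dumps({"data": [[2.0, 4.0], [0.0, 0.0]]})))
```

Loading the model in `init()` rather than on every request keeps response times low. Custom features, such as input validation or business rules, would be added inside `run()`.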

Batch Scenario

If the AI/ML business case does not require real-time responses, Azure building blocks can be used to build a batch forecasting pipeline. In this common scenario, Azure Databricks trains the AI/ML model and writes a batch of forecasts to a pre-defined table. Once again, Databricks workloads can be scheduled with Data Factory. The forecast table is consumed by a reporting tool, such as Microsoft Power BI.
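The batch pattern described above can be sketched in a few lines. This is a simplified stand-in, not a Databricks implementation: an in-memory SQLite database plays the role of the pre-defined forecast table, a least-squares linear trend plays the role of the trained model, and the table name `forecast` and the helper functions are hypothetical.

```python
import sqlite3


def fit_linear_trend(values):
    """Least-squares fit of y = intercept + slope * t for t = 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


def write_forecasts(conn, history, horizon):
    """Train on history and overwrite the forecast table with new predictions."""
    intercept, slope = fit_linear_trend(history)
    n = len(history)
    rows = [(n + h, intercept + slope * (n + h)) for h in range(horizon)]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS forecast (period INTEGER PRIMARY KEY, value REAL)"
    )
    conn.execute("DELETE FROM forecast")  # each batch run replaces the previous one
    conn.executemany("INSERT INTO forecast (period, value) VALUES (?, ?)", rows)
    conn.commit()
    return rows


conn = sqlite3.connect(":memory:")
write_forecasts(conn, [10.0, 12.0, 14.0, 16.0], horizon=3)
for period, value in conn.execute("SELECT period, value FROM forecast ORDER BY period"):
    print(period, round(value, 2))
```

A scheduler would call something like `write_forecasts` at fixed intervals, and the reporting tool would simply read the table, so no model needs to be online when the report is opened.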

Concluding Remarks

Although AI/ML development is business case driven, a cloud environment for POCs and production solutions is also a strategic asset. Modern cloud-hosted solutions provide the means to build production-ready advanced analytics pipelines. They can also be extended with additional features stemming from business needs and the technological landscape. Some early AI/ML adopters may have to shift from custom IaaS solutions towards end-to-end cloud platforms to ensure architecture viability in the long term.

Contact Person

Blog writer

Pekka Tiusanen

Bilot Alumnus
