Consulting services in algorithms, computer vision and data-driven systems
I am a specialist in Machine Learning, Deep Learning and numerically optimized solutions for video analytics and computer vision. With experience as a senior computer vision engineer, technical lead and entrepreneur, I have the knowledge and ability to help your team build robust, reusable and effective algorithms with a focus on business value.
I provide services at all levels, from hands-on PoC implementation and system design to strategic planning. If you work in machine learning, edge analytics, AI for embedded, video analytics or computer vision, my consulting services are for you. Whether the outcome is a toolset, a software deployment unit or a process, it will be tailored to solve your specific needs. As a specialist in machine learning, I have extensive field experience and the know-how to navigate the minefields of machine learning and MLOps. By leveraging this experience and knowledge I deliver fast results with major impact on your business problems.
Developer education and system design
Are you getting into a new domain within machine learning, computer vision, MLOps or camera analytics? I will help you design the system, find the best tools and APIs to use, and make sure that your developers get a flying start.
Are you already in the game and experiencing growing pains, or just looking for ways to make your machine learning scale better? I help organizations and departments steer their efforts to increase capability, agility and robustness, for example by introducing MLOps and DevOps best practices.
Are you a start-up with a critical PoC demonstrator or committed to deliver on a tight deadline? As a hired gun on small but critical projects I can handle algorithm implementation, benchmarking and optimization, or prototyping.
Where data science meets software engineering
A machine learning solution requires an extensive data collection effort; data and algorithm exploration to avoid bias or cross-domain issues; significant annotation work; and organization of datasets, algorithm versions and validation results. Knowing what tools and libraries exist, how custom tools should be implemented, and the best design practices and patterns will:
Create shorter development cycles and reduce time to market
Improve robustness and reliability
Streamline the operations and governance process
Reduce developer and QA efforts by reusing flexible modules and components
Where business meets tomorrow's technology
Transforming a problem into an opportunity demands detailed knowledge of technology strengths and weaknesses, what infrastructure and data are needed, and what resources need to be involved. Business value comes from a solution that can scale sufficiently, perform well enough where it matters, and be predictable or explainable in critical decisions. A solution that does not integrate into the correct ecosystem can lose its value, as seen in the many deep learning models that never reach production. This understanding affects the system that should be built and, among many others, the following processes:
Targeting the data that should be collected
Choosing a domain for data annotation
Defining test KPIs and validation parameters
Prioritizing requirements, system designs and deployment domains
Data for a better product
Much of the recent research shows that in deep learning, data has a higher impact on model performance than architectural code changes. I have designed systems and tools to visualize and slice datasets, mine hard examples or infrequent outliers, and automate data collection loops where edge devices can be instructed to sample data according to centrally decided strategies set by a cloud-hosted rule engine. Data collection strategies can be as simple as histogram equalization to sample balanced datasets from unbalanced processes; they can use edge predictions combined with heuristics, e.g. flickering predictions, to sample data with a high probability of prediction error; or they can be based on the model validation error rate to focus the improvement on scenarios where the model is weak. The collection strategy can be deployed and exposed to the devices as an AWS Lambda function behind an API Gateway proxy, and the data samples can be uploaded to the AWS S3 storage service using presigned URLs.
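To make the first strategy concrete, here is a minimal sketch of balanced sampling from an unbalanced stream, using per-class reservoir sampling. The function name and the (label, sample) stream format are illustrative assumptions, not a fixed API.

```python
import random
from collections import defaultdict

def balanced_sample(stream, per_class, seed=0):
    """Keep at most `per_class` uniformly chosen samples per class label,
    turning an unbalanced stream into a balanced dataset.

    `stream` yields (label, sample) pairs; this is a hypothetical
    interface for illustration."""
    rng = random.Random(seed)
    reservoirs = defaultdict(list)   # label -> kept samples
    seen = defaultdict(int)          # label -> items observed so far
    for label, sample in stream:
        seen[label] += 1
        if len(reservoirs[label]) < per_class:
            reservoirs[label].append(sample)
        else:
            # classic reservoir step: replace with decreasing probability
            # so every observed item has equal chance of being kept
            j = rng.randrange(seen[label])
            if j < per_class:
                reservoirs[label][j] = sample
    return reservoirs
```

In a production loop, a rule like this would run centrally and only the sampling decision would be pushed to the edge devices.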
Rapid prototyping on streaming data
Algorithm development and experimentation is done in batch mode on image datasets stored in a data lake, network storage or cloud-based object storage. In camera analytics, the deployment environment is very different: images are streamed sequentially in real time, some target platforms have multiple camera streams connected to the same inference environment, and some targets do the inference on the camera device itself. These considerations can cause the inference environment to be implemented in C or C++ while the algorithm development is performed in Python. In a proof of concept, where it is extra important that the iteration cadence is high and changes can be made in any environment and integrated back into the development code, running Python as an integrated component in a C or C++ system can be a viable option. Python can be integrated with C/C++ libraries, e.g. for video capture and decoding, by using extensions; alternatively, a Python interpreter can be embedded into a C/C++ application. If needed, Python can be cross-compiled for foreign target architectures.
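As a sketch of the Python side of such a setup, the module below is what a C/C++ host embedding the interpreter (via the standard Python/C API) could import and call once per decoded frame. The function name, frame layout and return type are illustrative assumptions; the point is that the per-frame logic stays in Python and can be iterated on without rebuilding the C host.

```python
# analytics.py - Python side of an embedded-interpreter prototype.
# A hypothetical C/C++ host would import this module and call
# process_frame() for every decoded frame, passing the raw pixel buffer.

def process_frame(frame_bytes, width, height):
    """Toy per-frame analytic: mean intensity of a grayscale frame.

    Iterating over a bytes object yields integer pixel values, so this
    needs no third-party dependencies. The result is a single float,
    which is cheap to marshal back to the C side."""
    assert len(frame_bytes) == width * height, "unexpected buffer size"
    return sum(frame_bytes) / (width * height)
```

During development the same function can be exercised directly from Python test code, which is what makes the iteration cadence high.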
Reusable and reproducible ML
A machine learning model is developed for a specific use case, e.g. detecting the full-body pose of a human so that a system can react to certain actions. In another project, a person should be tracked using the coordinates of the head (since this reduces problems with occlusion), but the position of the person's feet should be used as the anchor coordinate. From a deep learning perspective the problems are very similar: the same architecture can be modified with the correct number of output layers for the joints to be predicted, and the preprocessing of the images is the same. The post-processing of the joints is however different, and so is the dataset. A monolithic script can make reuse so hard that it is faster to develop the new experiment from scratch than to reuse the prior one. By using best practices from software engineering and frameworks for pipeline design, the experiments can instead be built in a modular and flexible way. Datasets can be abstracted behind defined interfaces, making them easy to swap. Model architectures can be designed so that they are easy to inherit and extend. Projects can be built using standardized scaffolding and file structures, making it easy to move between projects. A tool that encourages designs like this is Kedro.
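A minimal sketch of the dataset abstraction described above, under the assumption of a pose use case: the interface and class names are invented for illustration, but the pattern is the standard one of coding the pipeline against an abstract base class so datasets become interchangeable.

```python
from abc import ABC, abstractmethod

class PoseDataset(ABC):
    """Hypothetical dataset interface: anything implementing it can be
    swapped into the training pipeline without touching model code."""

    @abstractmethod
    def __len__(self):
        """Number of samples."""

    @abstractmethod
    def __getitem__(self, idx):
        """Return (image, keypoints) for sample `idx`."""

class FullBodyDataset(PoseDataset):
    """Full-body pose: all annotated joints are returned unchanged."""
    def __init__(self, samples):
        self._samples = samples          # list of (image, joint dict)
    def __len__(self):
        return len(self._samples)
    def __getitem__(self, idx):
        return self._samples[idx]

class HeadFeetDataset(PoseDataset):
    """Same interface, different annotation scheme: only the head
    (for tracking) and the feet (as anchor coordinate) are exposed."""
    def __init__(self, samples):
        self._samples = samples
    def __len__(self):
        return len(self._samples)
    def __getitem__(self, idx):
        image, joints = self._samples[idx]
        return image, {"head": joints["head"], "feet": joints["feet"]}
```

A training loop written against `PoseDataset` then runs unmodified on either dataset, which is exactly the reuse a monolithic script prevents.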
Datasets and data versioning
You can't navigate if you don't know where you are. Measuring metrics from your data and models is thus fundamental to creating good products. If metrics are to be comparable, each parameter needs to be controlled independently: if the model is changed, the metrics need to be compared using the same data; if the data is changed, the metrics need to be compared using the same model. This requires well-specified datasets where each sample is unmodified and each dataset contains the exact same samples. Dataset versioning is an absolute necessity to achieve this. Individual samples can be modified, e.g. image crops can be updated or bounding boxes corrected, and datasets can change in definition, either by adding new interesting data or by removing bad samples. Tracking and versioning the data allows fair comparisons despite the volatility of the datasets. By also versioning and tracking the output data of each step in a pipeline, the refactoring and optimization of algorithms can be verified simply by rerunning the pipeline and asserting that the output data is bit-exact compared to the result prior to the code change. Data versioning can be done with anything from simple tools like DVC, which tracks hashes, to tools that visualize differences in the image domain.
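The hash-tracking idea can be sketched in a few lines: a content hash per sample, and a dataset version derived from the sorted manifest of sample hashes. The function names are illustrative; real tools like DVC apply the same principle with file-level hashes.

```python
import hashlib
import json

def sample_hash(data: bytes) -> str:
    """Content hash of one sample; any change to pixels or annotations
    changes the hash, so modified samples are detected automatically."""
    return hashlib.sha256(data).hexdigest()

def dataset_version(samples: dict) -> str:
    """Version id for a whole dataset: hash of the sorted
    (name, sample-hash) manifest. Adding, removing or modifying any
    sample yields a new version id; insertion order does not matter."""
    manifest = sorted((name, sample_hash(data))
                      for name, data in samples.items())
    return hashlib.sha256(json.dumps(manifest).encode()).hexdigest()
```

Two training runs can then claim comparability only if they record the same dataset version id, and a pipeline step is bit-exact after refactoring exactly when its output version is unchanged.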
Focus on empowering clients
My work often starts with spearheading the team to create more performant, more predictable or more scalable solutions. As the solution proves itself, I shift my focus to capacity-building: empowering the rest of the team with the processes, the theoretical background and the resources needed so that they can apply the same principles to maintain, improve and further develop their solutions. Every solution, every product and every feature needs a driving force; therefore my end goal is always to get someone to claim full ownership of the results.
I work for long-term relationships and make sure that each engagement is a two-way street. Sounds interesting? Let's talk.