<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Aleksandar Ivanovski's Blog]]></title><description><![CDATA[Engineer with a diverse skill set. Focused on delivering significant results through innovative and comprehensive approach.]]></description><link>https://blog.ivanovski.me</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1644956388353/YX04F0YISp.png</url><title>Aleksandar Ivanovski&apos;s Blog</title><link>https://blog.ivanovski.me</link></image><generator>RSS for Node</generator><lastBuildDate>Sat, 06 Jun 2026 16:41:56 GMT</lastBuildDate><atom:link href="https://blog.ivanovski.me/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[But it works on my machine?! ONNX 101]]></title><description><![CDATA[PyData Skopje Chapter
After a significant pause of this chapter, it's finally revived. So if you are located in close geographic proximity or are interested to remotely follow the events, join the official Meetup Group.
Reflecting on the exhilarating...]]></description><link>https://blog.ivanovski.me/pydata-skp-noe-23</link><guid isPermaLink="true">https://blog.ivanovski.me/pydata-skp-noe-23</guid><category><![CDATA[Python]]></category><category><![CDATA[PyData]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[conference]]></category><dc:creator><![CDATA[Aleksandar Ivanovski]]></dc:creator><pubDate>Fri, 01 Dec 2023 06:32:22 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1701340770894/ee092aeb-30ed-447c-922a-f8e01fdf35d5.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-pydata-skopje-chapter">PyData Skopje Chapter</h2>
<p>After a long pause, this chapter has finally been revived. So if you live nearby or are interested in following the events remotely, join the official <a target="_blank" href="https://www.meetup.com/pydata-skopje/">Meetup Group</a>.</p>
<p>I had the exhilarating experience of spearheading the community's revival at the first event after the hiatus, and it was an honor to help reignite the community spirit. Our exploration of interoperability and operationalizing ML models with <a target="_blank" href="https://onnx.ai/">ONNX</a> was truly rewarding, and delving into the deployment capabilities offered by <a target="_blank" href="https://onnxruntime.ai/">ONNX Runtime</a> made it even more exciting. A heartfelt thanks to everyone who joined the discussion, and let's look forward to many more insightful conversations!</p>
<h2 id="heading-synopsis-of-my-talk">Synopsis of my talk</h2>
<h3 id="heading-challenges-taming-the-complexity"><strong>Challenges: Taming the Complexity</strong></h3>
<p>First off, let's acknowledge the challenges we face in the ever-expanding universe of deep learning. Multiple frameworks like TensorFlow and PyTorch, coupled with varied hardware accelerators (think T4 GPUs and VPUs), make the landscape intricate. This is where ONNX steps in. Initiated by Meta (then Facebook) and Microsoft, with AWS joining shortly afterwards, it acts as a unifying force, providing an abstraction layer over frameworks and hardware and reducing the complexity of integrating these different components.</p>
<h3 id="heading-design-principles-flexibility-and-standardization"><strong>Design Principles: Flexibility and Standardization</strong></h3>
<p>ONNX isn't just another acronym; it's a set of design principles that support both deep learning and traditional machine learning. These principles aim to be flexible enough to keep up with the rapid advances in AI while providing a standardized, cross-platform representation for serialization. Imagine it as the Common Language Runtime for programming languages, minimizing the number of moving parts in the deep learning landscape.</p>
<h3 id="heading-understanding-the-onnx-specification"><strong>Understanding the ONNX Specification</strong></h3>
<p>So, what's under the hood of ONNX? The ONNX specification describes a structured format in which each computational graph is a directed acyclic graph. Each node is a call to an operator with named inputs and outputs, and metadata documents crucial details about the model and its production environment. Data types, including tensor types and the non-tensor types in ONNX-ML, showcase the format's versatility.</p>
<h3 id="heading-operators-the-essence-of-onnx"><strong>Operators: The Essence of ONNX</strong></h3>
<p>Now, let's talk about operators – the heart and soul of ONNX. These are defined by name, domain, and version, and they encapsulate the essence of various operations. Take, for instance, the Relu and Abs operators, which I'll be diving into during the presentation. These examples demonstrate input-output relationships and type constraints, showcasing the elegance and power of ONNX.</p>
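<p>As a rough illustration of what these two operators compute, here is a minimal C++ sketch of their element-wise semantics: Relu maps each element x to max(0, x), and Abs maps it to |x|. The function names and the flat <code>std::vector&lt;float&gt;</code> standing in for a tensor are assumptions made for this example; this is not ONNX or ONNX Runtime code.</p>
<pre><code class="lang-cpp">#include &lt;algorithm&gt;
#include &lt;cmath&gt;
#include &lt;vector&gt;

// Hypothetical helper: element-wise Relu, y = max(0, x).
std::vector&lt;float&gt; relu(const std::vector&lt;float&gt; &amp;x)
{
    std::vector&lt;float&gt; y(x.size());
    std::transform(x.begin(), x.end(), y.begin(),
                   [](float v) { return std::max(0.0f, v); });
    return y;
}

// Hypothetical helper: element-wise Abs, y = |x|.
std::vector&lt;float&gt; absolute(const std::vector&lt;float&gt; &amp;x)
{
    std::vector&lt;float&gt; y(x.size());
    std::transform(x.begin(), x.end(), y.begin(),
                   [](float v) { return std::fabs(v); });
    return y;
}
</code></pre>
<p>Both take one input tensor and produce one output tensor of the same shape and element type, which is the kind of input-output relationship and type constraint the specification spells out for every operator.</p>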
<h3 id="heading-practical-demos-bridging-theory-and-application"><strong>Practical Demos: Bridging Theory and Application</strong></h3>
<p>Enough theory; let's get hands-on! In <a target="_blank" href="https://github.com/Aleksandar1932/onnx-101/blob/master/onnx_workshop/mnist.py">Demo 1</a>, we'll walk through defining a PyTorch model, training it, exporting it to ONNX, and visualizing the resulting graph. The practical application comes to life as we tackle a real-world problem – training a classifier for the MNIST dataset using a convolutional neural network in PyTorch.</p>
<h3 id="heading-onnx-runtime-bridging-the-gap"><strong>ONNX Runtime: Bridging the Gap</strong></h3>
<p>What about deployment? Enter ONNX Runtime, the bridge between the ONNX file format and deploying the graph on different hardware. Microsoft takes the lead here, maintaining ONNX Runtime as a distinct project from the ONNX specification governed by the Linux Foundation AI.</p>
<p><img src="https://lh7-us.googleusercontent.com/itmyqmU7sL1uzxaTcVQ1faj1gET1ZvsRPkflAtsTwia1c5XRcgtz7wZ2P-JCqmA1LftUw8hsS5hXbluyy6Z40Ams3bJfFaISWwS9g6DgS_jZlX9JR6zh9DUO8NKjGUmF9s0yA0jpyE7fEgrg_OrofpKHKw=s2048" alt class="image--center mx-auto" /></p>
<h3 id="heading-interoperability-onnx-in-action"><strong>Interoperability: ONNX in Action</strong></h3>
<p>But ONNX isn't limited to theoretical discussions. We'll explore interoperability, touching on the ONNX Model Zoo, Azure Cognitive Services, and methods to convert existing models or train from scratch. Practical tools like Netron and VisualDL will be your companions in understanding and visualizing ONNX models.</p>
<h3 id="heading-demo-2-onnx-in-the-real-world"><strong>Demo 2: ONNX in the Real World</strong></h3>
<p>The finale? <a target="_blank" href="https://github.com/Aleksandar1932/onnx-101/tree/master/ort-web-api">Demo 2</a>, where we take the model from Demo 1, integrate ONNX Runtime in JavaScript, and build an inference REST API with Express.js. This real-world application underscores ONNX's cross-language applicability, bringing everything full circle from theory to practice.</p>
<h3 id="heading-join-the-onnx-adventure"><strong>Join the ONNX Adventure</strong></h3>
<p>I'm beyond excited to share this ONNX adventure with you. Whether you're a seasoned deep learning enthusiast or just dipping your toes into the data science world, this journey promises to offer insights, practical knowledge, and a newfound appreciation for the unifying force that ONNX brings to the table.</p>
]]></content:encoded></item><item><title><![CDATA[Unlocking Bonsai Brilliance]]></title><description><![CDATA[Background
In the realm of software development, the pursuit of observability and data-driven decision-making has become a fundamental aspect of building and maintaining robust microservices architectures. Drawing inspiration from this ethos, I sough...]]></description><link>https://blog.ivanovski.me/unlocking-bonsai-brilliance</link><guid isPermaLink="true">https://blog.ivanovski.me/unlocking-bonsai-brilliance</guid><category><![CDATA[arduino]]></category><category><![CDATA[smart home]]></category><category><![CDATA[iot]]></category><dc:creator><![CDATA[Aleksandar Ivanovski]]></dc:creator><pubDate>Mon, 03 Jul 2023 18:39:58 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1688300105659/b539bcf0-0ddf-434a-8d8c-a26deaa51641.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-background">Background</h2>
<p>In the realm of software development, the pursuit of observability and data-driven decision-making has become a fundamental aspect of building and maintaining robust microservices architectures. Drawing inspiration from this ethos, I sought to apply these principles to a rather unexpected domain - the care and cultivation of my cherished Bonsai tree.</p>
<p>Similar to the complex systems we engineer, the art of growing a Bonsai tree involves a multitude of parameters that must be carefully balanced for optimal results. However, fine-tuning these variables can be a time-consuming process. Fortunately, certain rules of thumb exist within this horticultural art form that provide valuable guidance in our quest for nurturing these miniature living masterpieces.</p>
<blockquote>
<p>Be sure not to <strong><em>water</em></strong> your <strong><em>tree</em></strong> if the soil is still wet, but don't let the <strong><em>tree</em></strong> dry out either.</p>
<p>As a beginner, use your fingers at about one centimeter deep, (0.4") to check the soil moisture. If it's slightly dry, go ahead and water your tree.</p>
</blockquote>
<p>followed by:</p>
<blockquote>
<p>Avoid watering all of your trees on a daily routine, until you know exactly what you are doing.</p>
</blockquote>
<p>That is not exact enough for me, given that my objective is to have a good-looking tree on my desk with healthy, green leaves.</p>
<h2 id="heading-requirements">Requirements</h2>
<p>To get started you'll need:</p>
<ul>
<li><p>Arduino-like Board</p>
</li>
<li><p>Soil-Moisture Sensor</p>
</li>
<li><p>Wi-Fi module (optional)</p>
</li>
</ul>
<p>In this project, a <strong>NodeMCU V2 ESP8266</strong> board paired with a <strong>Soil Moisture Hygrometer Detection Humidity Sensor</strong> was used.</p>
<h2 id="heading-diagram">Diagram</h2>
<p>The schematic is straightforward:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1688292144995/483ecf27-37af-4026-aa4c-761981ee0f80.webp" alt class="image--center mx-auto" /></p>
<p>I've settled on analog mode for the sensor.</p>
<h2 id="heading-code">Code</h2>
<p>The prerequisites of this step are to connect your development board to a serial monitor and decide on a Baud Rate (in this example 9600). The source code is available on <a target="_blank" href="https://github.com/Aleksandar1932/overkill-bonsai">Aleksandar1932/overkill-bonsai</a>.</p>
<p>The next step is to write a lib that provides an API to the soil moisture sensor.</p>
<p>Start by defining a few constants in <code>lib/Constants/Constants.cpp</code>:</p>
<pre><code class="lang-cpp"><span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> sensorPower 0</span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> sensorPin A0</span>

<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> WET_THRESHOLD 500     <span class="hljs-comment">// Define max value we consider soil 'wet'</span></span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> DRY_TREHSHOLD 750     <span class="hljs-comment">// Define min value we consider soil 'dry'</span></span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> MEASURE_INTERVAL 1000 <span class="hljs-comment">// Define how often we check soil moisture (milliseconds)</span></span>
</code></pre>
<p>Next is the <code>lib/Moisture /Moisture.cpp</code>:</p>
<pre><code class="lang-cpp"><span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;Arduino.h&gt;</span></span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;Constants.h&gt;</span></span>

<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">readSensor</span><span class="hljs-params">()</span>
</span>{
    digitalWrite(sensorPower, HIGH);
    delay(<span class="hljs-number">10</span>);
    <span class="hljs-keyword">int</span> val = analogRead(sensorPin);
    digitalWrite(sensorPower, LOW);
    <span class="hljs-keyword">return</span> val;
}

<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">setupSoilMoistureSensor</span><span class="hljs-params">()</span>
</span>{
  pinMode(sensorPower, OUTPUT);
  digitalWrite(sensorPower, LOW);
}

<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">logMoisture</span><span class="hljs-params">(<span class="hljs-keyword">int</span> moisture)</span>
</span>{
  Serial.print(<span class="hljs-string">"Analog Output: "</span>);
  Serial.println(moisture);

  <span class="hljs-comment">// Determine status of our soil</span>
  <span class="hljs-keyword">if</span> (moisture &lt; WET_THRESHOLD)
  {
    Serial.println(<span class="hljs-string">"Status: Soil is too wet"</span>);
  }
  <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (moisture &gt;= WET_THRESHOLD &amp;&amp; moisture &lt; DRY_TREHSHOLD)
  {
    Serial.println(<span class="hljs-string">"Status: Soil moisture is perfect"</span>);
  }
  <span class="hljs-keyword">else</span>
  {
    Serial.println(<span class="hljs-string">"Status: Soil is too dry - time to water!"</span>);
  }
}
</code></pre>
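<p>The same thresholds can also be wrapped in a small pure function, which makes the status logic easy to reuse and to check off-device. This is a minimal sketch, not code from the repository: <code>classifyMoisture</code> is a hypothetical name, and the two constants are re-declared (with hypothetical <code>k</code>-prefixed names) only so the snippet compiles on its own.</p>
<pre><code class="lang-cpp">// Assumption: same values as WET_THRESHOLD and DRY_TREHSHOLD above,
// re-declared so this snippet is self-contained.
constexpr int kWetThreshold = 500;
constexpr int kDryThreshold = 750;

// Hypothetical helper: maps a raw analog reading to a status label.
// Lower readings mean wetter soil, as in logMoisture().
const char *classifyMoisture(int moisture)
{
    if (moisture &lt; kWetThreshold)
    {
        return "wet";
    }
    if (moisture &lt; kDryThreshold)
    {
        return "perfect";
    }
    return "dry";
}
</code></pre>
<p>With these values, a reading of 400 maps to <code>wet</code>, 600 to <code>perfect</code>, and 750 or more to <code>dry</code>.</p>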
<p>At this point, the core API is defined, and the rest is implementing the presentation layer that will allow us to interact with the sensor. For this I've used <a target="_blank" href="https://github.com/esp8266/Arduino/blob/master/libraries/ESP8266WebServer/src/ESP8266WebServer.h">ESP8266WebServer</a> for the web server and <a target="_blank" href="https://github.com/esp8266/Arduino/blob/master/libraries/ESP8266WiFi/src/ESP8266WiFiMulti.h">ESP8266WiFiMulti</a> for Wi-Fi connectivity.</p>
<p>The web server implements two handlers:</p>
<ul>
<li><p><code>/</code> returns a JSON response containing the soil-moisture reading alongside the status (<code>wet</code>, <code>perfect</code> or <code>dry</code>) determined by the thresholds.</p>
</li>
<li><p><code>/display</code> returns an HTML page showing the moisture reading with a slightly "prettier" UI.</p>
</li>
</ul>
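<p>To make the shape of the <code>/</code> response concrete, here is a minimal sketch of assembling that payload in plain C++. It is an illustration under assumptions: <code>buildMeasurementJson</code> is a hypothetical helper that uses <code>std::string</code> instead of the Arduino <code>String</code> class so it can be compiled off-device, and it assumes the status label contains no characters that need JSON escaping.</p>
<pre><code class="lang-cpp">#include &lt;string&gt;

// Hypothetical helper: builds the JSON body returned on "/".
// Assumes `status` is one of "wet", "perfect" or "dry" (no escaping needed).
std::string buildMeasurementJson(int moisture, const std::string &amp;status)
{
    return "{\"moisture\": " + std::to_string(moisture) +
           ", \"status\": \"" + status + "\"}";
}
</code></pre>
<p>For a reading of 612 classified as <code>perfect</code>, this yields <code>{"moisture": 612, "status": "perfect"}</code>.</p>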
<p>The presentation layer, alongside the rest of the logic, is as follows:</p>
<pre><code class="lang-cpp"><span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;Arduino.h&gt;</span></span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;ESP8266WiFi.h&gt;</span></span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;WiFiClient.h&gt;</span></span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;ESP8266WebServer.h&gt;</span></span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;ESP8266WiFiMulti.h&gt;</span></span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;ESP8266mDNS.h&gt;</span></span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;Moisture.h&gt;</span></span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;Constants.h&gt;</span></span>

<span class="hljs-function">ESP8266WebServer <span class="hljs-title">server</span><span class="hljs-params">(<span class="hljs-number">80</span>)</span></span>;
ESP8266WiFiMulti wifiMulti;

<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">handleMeasurement</span><span class="hljs-params">()</span></span>;
<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">handleDisplayHTML</span><span class="hljs-params">()</span></span>;
<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">connectToWifi</span><span class="hljs-params">()</span></span>;
<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">handleDisplayPrettyHTML</span><span class="hljs-params">()</span></span>;

<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">setup</span><span class="hljs-params">()</span>
</span>{
  Serial.begin(<span class="hljs-number">9600</span>);
  connectToWifi();
  setupSoilMoistureSensor();
  server.on(<span class="hljs-string">"/"</span>, handleMeasurement);
  server.on(<span class="hljs-string">"/display"</span>, handleDisplayHTML);
  server.begin();
}

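<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">loop</span><span class="hljs-params">()</span>
</span>{
  <span class="hljs-comment">// Serve pending requests on every pass so the web server stays responsive</span>
  server.handleClient();
  MDNS.update();

  <span class="hljs-comment">// Log a reading once per MEASURE_INTERVAL without blocking the loop</span>
  <span class="hljs-keyword">static</span> <span class="hljs-keyword">unsigned</span> <span class="hljs-keyword">long</span> lastMeasurement = <span class="hljs-number">0</span>;
  <span class="hljs-keyword">if</span> (millis() - lastMeasurement &gt;= MEASURE_INTERVAL)
  {
    lastMeasurement = millis();
    logMoisture(readSensor());
  }
}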

<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">handleMeasurement</span><span class="hljs-params">()</span>
</span>{
  <span class="hljs-keyword">int</span> moisture = readSensor();
  server.send(<span class="hljs-number">200</span>, <span class="hljs-string">"application/json"</span>, <span class="hljs-string">"{\"moisture\": "</span> + String(moisture) + <span class="hljs-string">", \"status\": \""</span> + (moisture &lt; WET_THRESHOLD ? <span class="hljs-string">"wet"</span> : (moisture &gt;= WET_THRESHOLD &amp;&amp; moisture &lt; DRY_TREHSHOLD ? <span class="hljs-string">"perfect"</span> : <span class="hljs-string">"dry"</span>)) + <span class="hljs-string">"\"}"</span>);
}

<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">handleDisplayHTML</span><span class="hljs-params">()</span>
</span>{
  server.send(<span class="hljs-number">200</span>, <span class="hljs-string">"text/html"</span>, <span class="hljs-string">"&lt;html&gt;&lt;head&gt;&lt;title&gt;ESP8266 Soil Moisture Sensor&lt;/title&gt;&lt;/head&gt;&lt;body&gt;&lt;h1&gt;Aleksandar's Bonsai&lt;/h1&gt;&lt;p&gt;Moisture: "</span> + String(readSensor()) + <span class="hljs-string">"&lt;/p&gt;&lt;/body&gt;&lt;/html&gt;"</span>);
}

<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">connectToWifi</span><span class="hljs-params">()</span>
</span>{
  wifiMulti.addAP(getenv(<span class="hljs-string">"WIFI_SSID"</span>), getenv(<span class="hljs-string">"WIFI_PASSWORD"</span>));
  Serial.println(<span class="hljs-string">"Connecting ..."</span>);

  <span class="hljs-keyword">while</span> (wifiMulti.run() != WL_CONNECTED)
  {
    delay(<span class="hljs-number">250</span>);
    Serial.print(<span class="hljs-string">'.'</span>);
  }
  Serial.println(<span class="hljs-string">'\n'</span>);
  Serial.print(<span class="hljs-string">"Connected to "</span>);
  Serial.println(WiFi.SSID());
  Serial.print(<span class="hljs-string">"IP address:\t"</span>);
  Serial.println(WiFi.localIP());

  <span class="hljs-keyword">if</span> (MDNS.begin(getenv(<span class="hljs-string">"MDNS_HOSTNAME"</span>)))
  {
    Serial.println(<span class="hljs-string">"mDNS responder started"</span>);
  }
  <span class="hljs-keyword">else</span>
  {
    Serial.println(<span class="hljs-string">"Error setting up MDNS responder!"</span>);
  }
}
</code></pre>
<p>Additionally, I've used <a target="_blank" href="https://www.ionos.com/digitalguide/server/know-how/multicast-dns/">mDNS</a> for development purposes.</p>
<h2 id="heading-determining-optimal-thresholds">Determining optimal thresholds</h2>
<p>To determine the thresholds I've used this <a target="_blank" href="https://bonsairesourcecenter.com/moisture-meter-chart/">chart</a> as a reference, but the freedom is yours to fine-tune to achieve optimal results (whatever your objective is).</p>
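<p>One way to make the raw readings easier to reason about while tuning is to map them onto a 0-100% scale between two calibration points. The sketch below assumes you have measured the sensor once in dry air and once submerged in water; <code>moisturePercent</code> and both calibration parameters are hypothetical and not part of the project.</p>
<pre><code class="lang-cpp">#include &lt;algorithm&gt;

// Hypothetical helper: linearly maps a raw reading to 0-100 % moisture.
// dryRaw: reading in dry air (high value), wetRaw: reading in water (low value).
// Both are calibration values you measure yourself; dryRaw must exceed wetRaw.
int moisturePercent(int raw, int dryRaw, int wetRaw)
{
    // Clamp the reading into the calibrated range before scaling
    int clamped = std::min(std::max(raw, wetRaw), dryRaw);
    return (dryRaw - clamped) * 100 / (dryRaw - wetRaw);
}
</code></pre>
<p>With hypothetical calibration points of 1000 (dry) and 300 (wet), a reading of 650 sits halfway and maps to 50%.</p>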
<h2 id="heading-final-thoughts">Final thoughts</h2>
<p>The integrated solution ended up looking like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1688293822498/42be81d2-e37e-444a-8e00-b8069531d4c0.jpeg" alt class="image--center mx-auto" /></p>
<p>This implementation serves as an initial foundation for an IoT project, with future plans to expand its capabilities. I intend to enhance the system by integrating a water pump and canister, enabling automated watering when the reading crosses the dry threshold. Moreover, I plan to incorporate a feedback loop using a photo-sensor directed at the leaves, providing insights into the tree's performance and enabling the determination of an optimal threshold.</p>
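<p>As a sketch of how that automated watering could be driven, the snippet below uses simple hysteresis: the pump switches on once the reading reaches a "start" level and stays on until the reading falls below a lower "stop" level, so it does not flicker around a single value. Everything here is hypothetical (the <code>PumpController</code> name, both levels, and the idea that the pump is a plain on/off output); no pump code exists in the project yet.</p>
<pre><code class="lang-cpp">// Hypothetical controller: decides whether the pump should run.
// Higher readings mean drier soil, as with the thresholds in the Code section.
struct PumpController
{
    int startAt;   // start watering at or above this reading (e.g. the dry threshold)
    int stopBelow; // stop watering once the reading drops below this value
    bool pumping;

    PumpController(int startAtReading, int stopBelowReading)
        : startAt(startAtReading), stopBelow(stopBelowReading), pumping(false) {}

    // Feed it each new raw reading; returns the desired pump state.
    bool update(int moisture)
    {
        if (moisture &gt;= startAt)
        {
            pumping = true; // soil too dry: start (or keep) watering
        }
        else if (moisture &lt; stopBelow)
        {
            pumping = false; // soil moist enough: stop watering
        }
        // In between the two levels, keep the previous state (hysteresis band).
        return pumping;
    }
};
</code></pre>
<p>With assumed levels of 750 and 600, a reading of 760 turns the pump on, 650 keeps it running, and 590 turns it off again. A real build would probably also want a maximum run time as a safety cut-off.</p>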
<p>In addition to these advancements, I aim to integrate the board into the Tuya platform, replacing the existing presentation layer. This integration will seamlessly incorporate the Bonsai care system into my smart-home ecosystem, enhancing its accessibility and integration with other connected devices.</p>
]]></content:encoded></item><item><title><![CDATA[MLOps - a new hot word or a necessity?]]></title><description><![CDATA[Background
Historically speaking, people have always craved to discover patterns within their environments. The simplest form of pattern recognition can be identified in the earliest days of mankind. Centuries later, starting in the third industrial ...]]></description><link>https://blog.ivanovski.me/mlops-a-new-hot-word-or-a-necessity</link><guid isPermaLink="true">https://blog.ivanovski.me/mlops-a-new-hot-word-or-a-necessity</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[ML]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[statistics]]></category><category><![CDATA[Computer Science]]></category><dc:creator><![CDATA[Aleksandar Ivanovski]]></dc:creator><pubDate>Tue, 15 Feb 2022 20:42:52 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1642656100719/mDoInNg9-V.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-background">Background</h1>
<p>Historically speaking, people have always craved to discover patterns within their environments. The simplest form of pattern recognition can be identified in the earliest days of mankind. Centuries later, starting in the third industrial revolution, data began to manifest in digital form, and with it came the task of recognizing patterns within that digital data.</p>
<p>Today, we are witnesses of machine learning models embedded in nearly every application we use, starting from entertainment games and going all the way up to the most industry-grade applications. Academia and industry have joined forces and put enormous amounts of resources into the research and development of state-of-the-art (SOTA) models, many of them available as open source. Those models, combined with transfer learning and a vast amount of open data, allow one to develop machine learning applications within an extremely short time frame after the initial idea is defined. Those ideas, framed into applications, spawn new startups at a rate never seen before.</p>
<p>Every model (from failed to SOTA) starts in a research environment, commonly in a notebook. As the project becomes more serious, utility libraries are generalized for reuse; scripts for training, deployment, evaluation, and visualization are written; logging is added at several levels; and, once the model is in production, a performance monitoring setup is built. Tasks such as retraining and preventing performance degradation are often part of those monitoring setups, which analyze input drift or evaluate predictions against ground-truth data.</p>
<p>After the research phase, much of the focus and effort is pure software engineering. Historically, processes and technologies commonly recognized in the industry have revolutionized and modernized software development. They appear in different flavors of agile methodologies, review and release processes, version control, conventions, CI/CD... Big players in the industry frequently share their experiences with adopting these processes through various forms of media. One such example is <a target="_blank" href="https://research.google/pubs/pub45424/">Why Google Stores Billions of Lines of Code in a Single Repository</a>.</p>
<blockquote>
<p>Data Scientists should focus on exploring and developing models. All the other stuff should be uniform and standardized. Developing models should be put beyond Jupyter Notebooks, and custom scripts.</p>
</blockquote>
<p>Data should be versioned, results visualized, training monitored, and real-time reports made available to all stakeholders. All in all, a step forward should be taken towards the explainability and interoperability of results. This can be done (and currently is done) from scratch within every organization and team, and it works perfectly well: data is versioned following conventions and rules, scripts for model evaluation and reports are developed, and monitoring dashboards and reports for stakeholders are drafted. Now, let's shift scope and imagine that every software company developed its own version control system and container technology: the industry would be a total mess.</p>
<h1 id="heading-mlops">MLOps</h1>
<h2 id="heading-what-is-it">What is it?</h2>
<p>A formal definition states, MLOps is a set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1641674221413/B3Jpth_Hs.png" alt="image.png" /></p>
<h2 id="heading-who-does-it">Who does it?</h2>
<p>In software development, there are many different roles, each of them carrying responsibilities and perceiving the same products from different aspects. In a data science project the following ones can be differentiated:</p>
<ul>
<li>Business Analyst or Domain Expert</li>
<li>Data Analyst</li>
<li>Software Engineers and Developers</li>
<li>Data and ML Architects</li>
<li>Data Engineers</li>
<li>MLOps Engineers</li>
<li>Optimization engineers</li>
<li>Data Scientists</li>
</ul>
<h2 id="heading-what-are-the-common-tasks">What are the common tasks?</h2>
<p>The first attempt to identify the problems in current ML applications workflow was done by a group of researchers at Google in their paper <a target="_blank" href="https://proceedings.neurips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf">Hidden Technical Debt in Machine Learning Systems</a>.</p>
<p>This table summarizes some of the core MLOps principles. Full table and text are available at <a target="_blank" href="https://ml-ops.org/content/mlops-principles#summary-of-mlops-principles-and-best-practices">ml-ops.org</a>.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>MLOps Principles</td><td>Data</td><td>ML Model</td><td>Code</td></tr>
</thead>
<tbody>
<tr>
<td>Versioning</td><td>1) Data preparation pipelines<br />2) Features store<br />3) Datasets<br />4) Metadata</td><td>1) ML model training pipeline<br />2) ML model (object)<br />3) Hyperparameters<br />4) Experiment tracking</td><td>1) Application code<br />2) Configurations</td></tr>
<tr>
<td>Testing</td><td>1) Data Validation (error detection)<br />2) Feature creation unit testing</td><td>1) Model specification is unit tested<br />2) ML model training pipeline is integration tested<br />3) ML model is validated before being operationalized<br />4) ML model staleness test (in production)<br />5) Testing ML model relevance and correctness<br />6) Testing non-functional requirements (security, fairness, interpretability)</td><td>1) Unit testing<br />2) Integration testing for the end-to-end pipeline</td></tr>
<tr>
<td>Automation</td><td>1) Data transformation<br />2) Feature creation and manipulation</td><td>1) Data engineering pipeline<br />2) ML model training pipeline<br />3) Hyperparameter/Parameter selection</td><td>1) ML model deployment with CI/CD<br />2) Application build</td></tr>
<tr>
<td>Reproducibility</td><td>1) Backup data<br />2) Data versioning<br />3) Extract metadata<br />4) Versioning of feature engineering</td><td>1) Hyperparameter tuning is identical between dev and prod<br />2) The order of features is the same<br />3) Ensemble learning: the combination of ML models is the same<br />4) The model pseudo-code is documented</td><td>1) Versions of all dependencies in dev and prod are identical<br />2) Same technical stack for dev and production environments<br />3) Reproducing results by providing container images or virtual machines</td></tr>
<tr>
<td>Deployment</td><td>1) Feature store is used in dev and prod environments</td><td>1) Containerization of the ML stack<br />2) REST API<br />3) On-premise, cloud, or edge</td><td>1) On-premise, cloud, or edge</td></tr>
<tr>
<td>Monitoring</td><td>1) Data distribution changes (training vs. serving data)<br />2) Training vs serving features</td><td>1) ML model decay<br />2) Numerical stability<br />3) Computational performance of the ML model</td><td>1) Predictive quality of the application on serving data</td></tr>
</tbody>
</table>
</div><h1 id="heading-where-are-we-know-and-where-should-we-seek-to">Where are we now and where should we be heading?</h1>
<p>A clear set of tasks and rules cannot be defined due to the nature of software development in general but from my point of view, there are several things worth paying attention to. </p>
<p>Since a large share of an ML application's codebase is not ML code, good practices for writing clean code should be applied wherever possible. Effort should go into drawing abstractions, and common practices should evolve into patterns. Teams and organizations should be encouraged to share their experiences and knowledge obtained during research and development. Data versioning, experiment management and reproducibility, sanity checks of data, pipelines, dependencies, and performance monitoring should be embodied in every ML project, regardless of its size, industry, or impact.</p>
<p>But those processes and tasks won't define themselves or evolve on their own; the participants in the process have to start practicing them.</p>
<p>Starting with Data Scientists, more effort should be put into practicing clean code, design patterns, and knowledge sharing. Educators (formal or informal) should focus more on creating resources that cover the topics discussed in this post. We are witnesses of Computer Science graduates with extraordinary Data Science skills who still write messy code and lack fundamental knowledge about software engineering in general. There are extreme cases in which CS graduates are unfamiliar with version control systems and the HTTP protocol. Because after all:</p>
<blockquote>
<p>Every Data Scientist should be an engineer first. And every ML model is just another microservice within the boundaries of its system.</p>
</blockquote>
<h1 id="heading-where-should-you-start-from">Where should you start from</h1>
<p>In my opinion, every data scientist should have a concrete and solid knowledge of the fundamental concepts discussed in this post. I strongly recommend reading some books, starting with <a target="_blank" href="https://www.amazon.com/Introducing-MLOps-Machine-Learning-Enterprise/dp/1492083291"><em>Introducing MLOps: How to Scale Machine Learning in the Enterprise</em></a>, <a target="_blank" href="https://www.amazon.com/Practical-MLOps-Operationalizing-Machine-Learning/dp/1098103017/ref=pd_sbs_2/135-1376267-2729532?pd_rd_w=JEPQN&amp;pf_rd_p=3676f086-9496-4fd7-8490-77cf7f43f846&amp;pf_rd_r=DDKD574JFRX3H9AJYH49&amp;pd_rd_r=8c1d4264-aeeb-4536-b765-fa98145af58a&amp;pd_rd_wg=3zrNd&amp;pd_rd_i=1098103017&amp;psc=1"><em>Practical MLOps: Operationalizing Machine Learning Models</em></a> and related textbooks.</p>
<p>Exploring tools like <a target="_blank" href="https://wandb.ai/site">Weights &amp; Biases</a>, <a target="_blank" href="https://dvc.org/">Data Version Control</a>, <a target="_blank" href="https://www.tensorflow.org/tensorboard">TensorBoard</a> and others will give you a great start.</p>
]]></content:encoded></item><item><title><![CDATA[Hyperparameter Optimization with Weights & Biases]]></title><description><![CDATA[What is Hyperparameter Optimization?
In machine learning, hyperparameter optimization or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the l...]]></description><link>https://blog.ivanovski.me/hyperparameter-optimization-with-wandb</link><guid isPermaLink="true">https://blog.ivanovski.me/hyperparameter-optimization-with-wandb</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Tutorial]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[optimization]]></category><dc:creator><![CDATA[Aleksandar Ivanovski]]></dc:creator><pubDate>Thu, 20 Jan 2022 20:08:14 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1642631188756/mT0Fy-s9g.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-what-is-hyperparameter-optimization">What is Hyperparameter Optimization?</h1>
<p>In machine learning, hyperparameter optimization or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process. By contrast, the values of other parameters (typically node weights) are learned.</p>
<h1 id="heading-strategies">Strategies</h1>
<p>Several strategies can be used for performing optimization. The simplest one is manual tuning. One such example is using the <a target="_blank" href="https://en.wikipedia.org/wiki/Elbow_method_(clustering%29">Elbow Method</a> for determining the number of clusters in <a target="_blank" href="https://en.wikipedia.org/wiki/K-means_clustering"><em>k</em>-means clustering</a>. Complex models, on the other hand, have dozens of hyperparameters, some of them continuous, so the size of the search space explodes, and so does the manual effort. To tackle this issue, several "smarter" approaches exist. Some of them are:</p>
<ul>
<li>Grid search</li>
<li>Random search</li>
<li>Bayesian optimization</li>
<li>Gradient-based optimization</li>
<li>Evolutionary optimization</li>
<li>Population-based</li>
</ul>
<p>To get more familiar with how these approaches work and how they differ, I encourage you to read these articles: <a target="_blank" href="https://towardsdatascience.com/7-hyperparameter-optimization-techniques-every-data-scientist-should-know-12cdebe713da">7 Hyperparameter Optimization Techniques Every Data Scientist Should Know</a>, <a target="_blank" href="https://nanonets.com/blog/hyperparameter-optimization/">How To Make Deep Learning Models That Don’t Suck</a> or <a target="_blank" href="https://en.wikipedia.org/wiki/Hyperparameter_optimization#Approaches">Hyperparameter Optimization Approaches</a>.</p>
<h1 id="heading-tooling">Tooling</h1>
<p>Several frameworks provide implementations of the approaches mentioned above. In this tutorial, we are going to explore <a target="_blank" href="https://docs.wandb.ai/sweeps">Weights &amp; Biases - Sweeps</a>, <em>(WANDB for short)</em>.</p>
<h1 id="heading-setup">Setup</h1>
<p>For this tutorial, we are going to build a classifier for the <a target="_blank" href="https://www.kaggle.com/ronitf/heart-disease-uci">Heart Disease UCI</a> dataset. We will use <a target="_blank" href="https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html"><code>RandomForestClassifier</code></a> from sklearn to predict the presence of heart disease. </p>
<p>This tutorial does not focus on data pre-processing, so we'll dive straight into splitting the data into train and test sets and training the model once with the default hyperparameter values.</p>
<pre><code class="lang-py"><span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split

df = pd.read_csv(<span class="hljs-string">'data/heart.csv'</span>)

X = df.drop([<span class="hljs-string">'target'</span>], axis=<span class="hljs-number">1</span>)
y = df[<span class="hljs-string">'target'</span>]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=<span class="hljs-number">0.2</span>)
</code></pre>
<p>Train the model</p>
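<pre><code class="lang-py"><span class="hljs-keyword">from</span> sklearn.ensemble <span class="hljs-keyword">import</span> RandomForestClassifier

rfc = RandomForestClassifier()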
rfc.fit(X_train, y_train)
</code></pre>
<p>And that's it: we have successfully trained a random forest classifier with default values for its hyperparameters. Among others, this classifier has the following hyperparameters:</p>
<ul>
<li>bootstrap</li>
<li>max_depth</li>
<li>max_features</li>
<li>min_samples_leaf</li>
<li>min_samples_split</li>
<li>n_estimators</li>
</ul>
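<p>To see which default values the classifier actually uses for these hyperparameters, scikit-learn's <code>get_params()</code> can be queried. The snippet below is a minimal sketch; the <code>TUNED</code> list and the variable names are introduced here only for illustration, and the printed defaults depend on the installed scikit-learn version.</p>
<pre><code class="lang-py">from sklearn.ensemble import RandomForestClassifier

# Hyperparameters tuned later in this tutorial (illustrative helper list)
TUNED = ["bootstrap", "max_depth", "max_features",
         "min_samples_leaf", "min_samples_split", "n_estimators"]

# get_params() returns every constructor argument with its current value
defaults = RandomForestClassifier().get_params()
tuned_defaults = {name: defaults[name] for name in TUNED}

for name, value in tuned_defaults.items():
    print(f"{name}: {value}")
</code></pre>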
<p>Now let's get into setting up the optimization:</p>
<h1 id="heading-optimization">Optimization</h1>
<h2 id="heading-step-1-define-the-training-script">Step 1: Define the training script</h2>
<p>For more details, see the <a target="_blank" href="https://docs.wandb.ai/guides/sweeps/quickstart#set-up-your-python-training-script">docs</a>.</p>
<p>This script serves as the main entry point for the optimization. It performs one training and evaluation of the model, with fixed values for all of the hyperparameters injected from outside (through <code>wandb.config</code>).</p>
<p>The file name is arbitrary; in this example it is <code>train.py</code>.</p>
<pre><code class="lang-py"><span class="hljs-keyword">import</span> wandb
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">from</span> sklearn.ensemble <span class="hljs-keyword">import</span> RandomForestClassifier
<span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> accuracy_score
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split

WANDB_PROJECT_NAME = <span class="hljs-string">"hyperparameter-optimization"</span>

<span class="hljs-keyword">with</span> wandb.init(project=WANDB_PROJECT_NAME):
    df = pd.read_csv(<span class="hljs-string">'data/heart.csv'</span>)
    X = df.drop([<span class="hljs-string">'target'</span>], axis=<span class="hljs-number">1</span>)
    y = df[<span class="hljs-string">'target'</span>]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=<span class="hljs-number">0.2</span>)

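    config = wandb.config
    <span class="hljs-comment"># YAML parses a bare None as the string "None", so map it back to Python's None</span>
    max_depth = <span class="hljs-literal">None</span> <span class="hljs-keyword">if</span> config.max_depth == <span class="hljs-string">"None"</span> <span class="hljs-keyword">else</span> config.max_depth
    rfc = RandomForestClassifier(
        bootstrap=config.bootstrap,
        max_depth=max_depth,
        max_features=config.max_features,
        min_samples_leaf=config.min_samples_leaf,
        min_samples_split=config.min_samples_split,
        n_estimators=config.n_estimators,
    )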

    rfc.fit(X_train, y_train)
    y_pred = rfc.predict(X_test)

    wandb.log({<span class="hljs-string">'accuracy'</span>: accuracy_score(y_test, y_pred)})
</code></pre>
<h2 id="heading-step-2-define-the-optimization-strategy-and-configuration">Step 2: Define the optimization strategy and configuration</h2>
<p>To determine the optimization strategy, i.e. which approach is used for optimization, which values or ranges are tried for every hyperparameter, and which objective is optimized, a configuration file <code>sweep.yml</code> needs to be defined.</p>
<p>This file also contains some additional configuration, such as the Python path and the path of the training script (from Step 1). To get more familiar with it, see the <a target="_blank" href="https://docs.wandb.ai/guides/sweeps/quickstart#2.-configure-your-sweep">docs</a>.</p>
<p>For this example, the <code>sweep.yml</code> file is:</p>
<pre><code class="lang-yml"><span class="hljs-attr">program:</span> <span class="hljs-string">train.py</span>
<span class="hljs-attr">method:</span> <span class="hljs-string">bayes</span>
<span class="hljs-attr">project:</span> <span class="hljs-string">hyperparameter-optimization</span>
<span class="hljs-attr">command:</span>
<span class="hljs-bullet">-</span> <span class="hljs-string">${env}</span> 
<span class="hljs-bullet">-</span> <span class="hljs-string">~/envs/hyperopt/bin/python</span>
<span class="hljs-bullet">-</span> <span class="hljs-string">${program}</span>
<span class="hljs-bullet">-</span> <span class="hljs-string">${args}</span>

<span class="hljs-attr">metric:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">accuracy</span>
  <span class="hljs-attr">goal:</span> <span class="hljs-string">maximize</span>
<span class="hljs-attr">parameters:</span>
  <span class="hljs-attr">bootstrap:</span>
    <span class="hljs-attr">values:</span> [<span class="hljs-literal">True</span>, <span class="hljs-literal">False</span>]
  <span class="hljs-attr">max_depth:</span>
    <span class="hljs-attr">values:</span> [<span class="hljs-number">2</span>, <span class="hljs-number">3</span>, <span class="hljs-number">4</span>, <span class="hljs-number">5</span>, <span class="hljs-number">6</span>, <span class="hljs-number">7</span>, <span class="hljs-number">8</span>, <span class="hljs-number">9</span>, <span class="hljs-number">10</span>, <span class="hljs-number">20</span>, <span class="hljs-number">30</span>, <span class="hljs-number">40</span>, <span class="hljs-number">50</span>, <span class="hljs-number">60</span>, <span class="hljs-number">70</span>, <span class="hljs-number">80</span>, <span class="hljs-number">90</span>, <span class="hljs-number">100</span>, <span class="hljs-string">None</span>]
  <span class="hljs-attr">max_features:</span>
    <span class="hljs-attr">values:</span> [<span class="hljs-string">'auto'</span>, <span class="hljs-string">'sqrt'</span>]
  <span class="hljs-attr">min_samples_leaf:</span>
    <span class="hljs-attr">values:</span> [<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>, <span class="hljs-number">4</span>, <span class="hljs-number">5</span>, <span class="hljs-number">6</span>, <span class="hljs-number">7</span>, <span class="hljs-number">8</span>, <span class="hljs-number">9</span>, <span class="hljs-number">10</span>]
  <span class="hljs-attr">min_samples_split:</span>
    <span class="hljs-attr">values:</span> [<span class="hljs-number">2</span>, <span class="hljs-number">3</span>, <span class="hljs-number">4</span>, <span class="hljs-number">5</span>, <span class="hljs-number">6</span>, <span class="hljs-number">7</span>, <span class="hljs-number">8</span>, <span class="hljs-number">9</span>, <span class="hljs-number">10</span>, <span class="hljs-number">11</span>]
  <span class="hljs-attr">n_estimators:</span>
    <span class="hljs-attr">values:</span> [<span class="hljs-number">10</span>, <span class="hljs-number">20</span>, <span class="hljs-number">30</span>, <span class="hljs-number">40</span>, <span class="hljs-number">50</span>, <span class="hljs-number">60</span>, <span class="hljs-number">70</span>, <span class="hljs-number">80</span>, <span class="hljs-number">90</span>, <span class="hljs-number">100</span>, <span class="hljs-number">200</span>, <span class="hljs-number">300</span>, <span class="hljs-number">500</span>]
</code></pre>
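<p>The interpretation is as follows: the training script is located at <code>train.py</code>, Bayesian optimization is used, and for <code>bootstrap</code> the values <code>True</code> and <code>False</code> are tried, for <code>max_depth</code> the values [2, 3, 4, ...], and similarly for the rest of the hyperparameters. The objective is to maximize accuracy. In other words, <strong>we want to assign values to the hyperparameters such that the accuracy is maximized</strong>. Note that <code>'auto'</code> for <code>max_features</code> was an alias of <code>'sqrt'</code> for classifiers; it was deprecated in scikit-learn 1.1 and later removed, so on newer versions it should be dropped from the list.</p>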
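<p>To get a feeling for why an exhaustive grid search is impractical here, the number of combinations in the <code>sweep.yml</code> above can be counted. This is only an illustrative sketch: the <code>search_space</code> dictionary is a hand-made copy of the value lists from the configuration file, not something WANDB produces.</p>
<pre><code class="lang-py">from math import prod

# Candidate values copied by hand from sweep.yml
search_space = {
    "bootstrap": [True, False],
    "max_depth": [2, 3, 4, 5, 6, 7, 8, 9, 10,
                  20, 30, 40, 50, 60, 70, 80, 90, 100, None],
    "max_features": ["auto", "sqrt"],
    "min_samples_leaf": list(range(1, 11)),
    "min_samples_split": list(range(2, 12)),
    "n_estimators": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 500],
}

# A grid search would have to train one model per combination
total_combinations = prod(len(values) for values in search_space.values())
print(total_combinations)
</code></pre>
<p>For the lists above this comes to 98,800 combinations, which is why a strategy that samples the space selectively, such as Bayesian optimization, is attractive.</p>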
<h2 id="heading-step-3-initializing-and-running-the-optimization">Step 3: Initializing and Running the optimization</h2>
<p>To start optimization, open a shell (with your favorite terminal emulator).</p>
<ol>
<li>Activate the Python virtual environment (on UNIX, <code>source ~/envs/hyperopt/bin/activate</code>; for other platforms, see the official Python <a target="_blank" href="https://docs.python.org/3/tutorial/venv.html">guide</a>).</li>
<li><p>Initialize the sweep</p>
<pre><code class="lang-bash">wandb sweep ./sweep.yml
</code></pre>
<p>Expected Output:
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1642629532869/jimG2SkAm.png" alt="image.png" /></p>
</li>
<li><p>Run the sweep, using the sweep path printed by the previous command</p>
<pre><code class="lang-bash">wandb agent aleksandar1932/hyperparameter-optimization/r3s5xf4d
</code></pre>
<p>Expected Output:
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1642629483385/dzQEAeMp_.png" alt="image.png" /></p>
</li>
</ol>
<p>And that's it, now the running sweep can be observed at WANDB.</p>
<h2 id="heading-step-4-monitoring">Step 4: Monitoring</h2>
<p>Go to the sweep URL from your shell output. For this example, the output is available <a target="_blank" href="https://wandb.ai/aleksandar1932/hyperparameter-optimization/sweeps/m07mk186?workspace=user-aleksandar1932">here</a>, and below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1642630342124/MI8hrVFqr.png" alt="image.png" /></p>
<p>As the model is trained for different combinations of hyperparameter values, the results are updated in real time. We can either wait for the chosen approach to find the combination of values that yields the best model, or stop the optimization by terminating the running shell or through WANDB.</p>
<h1 id="heading-conclusion">Conclusion</h1>
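<p>This sweep performed 79 runs, and the best model scored an accuracy of <code>0.9016</code> on the randomly sampled test set of that particular run. <em>(In <code>train.py</code>, <code>train_test_split</code> is called in every run, so each run is evaluated on a different test set and the scores are not strictly comparable.)</em></p>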
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1642630029788/bdFs2Jgwl.png" alt="image.png" /></p>
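<p>One way to make the runs comparable, if that is desired, is to fix the seed of the split so that every run is evaluated on the same test set. The sketch below uses synthetic data as a stand-in for the heart disease dataset, and the seed <code>42</code> is an arbitrary choice; it is an assumption for illustration, not what the sweep above did.</p>
<pre><code class="lang-py">import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the heart disease features and labels (assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 2, size=100)

# Fixing random_state makes every call return the same partition
split_a = train_test_split(X, y, test_size=0.2, random_state=42)
split_b = train_test_split(X, y, test_size=0.2, random_state=42)

# Index 1 of the returned list is X_test
same_test_set = np.array_equal(split_a[1], split_b[1])
print(same_test_set)
</code></pre>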
<p>So, based on this sweep, the best-performing <code>RandomForestClassifier</code> is instantiated with the following hyperparameters:</p>
<pre><code class="lang-py"><span class="hljs-keyword">from</span> sklearn.ensemble <span class="hljs-keyword">import</span> RandomForestClassifier

model = RandomForestClassifier(
    bootstrap=<span class="hljs-literal">False</span>,
    max_depth=<span class="hljs-number">4</span>,
    max_features=<span class="hljs-string">'sqrt'</span>,
    min_samples_leaf=<span class="hljs-number">3</span>,
    min_samples_split=<span class="hljs-number">4</span>,
    n_estimators=<span class="hljs-number">200</span>,
)
</code></pre>
<p>The code from this tutorial is available on <a target="_blank" href="https://github.com/Aleksandar1932/hyperparameter-optimization">GitHub</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Research in every aspect of your life]]></title><description><![CDATA[Introduction
Recently, at the international convention MIPRO 2021, I presented the full research paper titled "Leveraging Smartphones for Distributed Global Navigation Satellite System Post Processing". The research span for 6 months and I was the ma...]]></description><link>https://blog.ivanovski.me/research-in-every-aspect-of-your-life</link><guid isPermaLink="true">https://blog.ivanovski.me/research-in-every-aspect-of-your-life</guid><category><![CDATA[research]]></category><category><![CDATA[big data]]></category><dc:creator><![CDATA[Aleksandar Ivanovski]]></dc:creator><pubDate>Wed, 13 Oct 2021 14:16:55 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1634133761919/458B8meI3z.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="introduction">Introduction</h1>
<p>Recently, at the international convention MIPRO 2021, I presented the full research paper titled "Leveraging Smartphones for Distributed Global Navigation Satellite System Post Processing". The research spanned 6 months, and I was the main author of the paper, working together with my esteemed co-authors.</p>
<h1 id="brief-walkthrough-of-the-paper">Brief walkthrough of the paper</h1>
<p>A published version of the paper will be available once the conference proceedings are out, so here I will give only a brief walkthrough and introduction.</p>
<p>The research grew out of the final course project for Parallel and Distributed Computing, taught by Vladimir Zdraveski at FCSE. Its starting point was Statista's <a target="_blank" href="https://www.statista.com/statistics/1045353/mobile-device-daily-usage-time-in-the-us/#:~:text=Daily%20time%20spent%20on%20mobile%20phones%20in%20the%20U.S.%202019%2D2023&amp;text=The%20average%20time%20spent%20daily,and%2035%20minutes%20by%202023.">analysis</a> of the daily usage of smartphones, which is 13% of the time, meaning the devices sit idle for most of the day. The next step was to find an applicable use case, since we didn't want this research to solve an imagined problem.</p>
<p>The above-mentioned process defined our research hypothesis:</p>
<blockquote>
<p>Can we take advantage of smartphones while they are idle and use them for GNSS data post-processing?</p>
</blockquote>
<p>To test the hypothesis, we started architecting a system. At its core we have:</p>
<ul>
<li><strong>Clients</strong> - the producers of raw data,</li>
<li><strong>Post-processing provider</strong> - the entity which receives the raw data produced by the clients. Responsible for orchestrating the post-processing, and managing the pool of workers,</li>
<li><strong>Smartphone workers</strong> - managed devices that process batches of the raw data.</li>
</ul>
<p><center>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1633639901609/qRM9wlYeY.png" alt="Screenshot_2.png" />
</center>
The client produces raw GNSS data and sends it to the post-processing provider. When the provider receives the data, it is registered in the system and is ready for processing. The provider uses its reducer to split the data into smaller batches, and each batch is sent to a worker from the pool of available workers. Upon receiving the data, each worker starts the processing algorithms, which are embedded in the software provided by the post-processing provider. As the workers complete their tasks, they send the processed data back to the provider, which keeps track of the overall process and appends each completed batch to the data for the particular job. When the workers finish, the data is sent to the appropriate client.</p>
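<p>The split, dispatch and reassemble flow described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation from the paper: threads stand in for the smartphone workers, and the function names, batch size and the doubling in <code>post_process</code> are all hypothetical placeholders.</p>
<pre><code class="lang-py">from concurrent.futures import ThreadPoolExecutor


def split_into_batches(samples, batch_size):
    """Cut the raw samples into consecutive batches of at most batch_size."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


def post_process(batch):
    """Hypothetical stand-in for the GNSS algorithms a worker would run."""
    return [value * 2 for value in batch]


def run_job(samples, batch_size=4, workers=3):
    """Dispatch batches to a pool of workers and reassemble the results in order."""
    batches = split_into_batches(samples, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so the batches line up again
        processed = list(pool.map(post_process, batches))
    return [value for batch in processed for value in batch]


raw = list(range(10))
result = run_job(raw)
print(result)
</code></pre>
<p>Because the provider only needs the batches back in their original order, the workers can finish at different times without affecting the reassembled result.</p>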
<p>After implementing a proof of concept, we used it to show that the post-processing of GNSS data can be significantly improved. In this case, the improvement is 5317% compared with traditional post-processing providers, and 1595% compared with in-house data post-processing.</p>
<h1 id="taking-a-step-back">Taking a step back</h1>
<p>This blog post is not primarily intended to showcase the research (the paper serves that purpose), but rather to emphasize the word, term, and process called <strong>research</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1634058718324/H828eQzwC.png" alt="Custom_Preset_12_Copy_1753x.png" /></p>
<p>In my case, this started as a course project into which I put all of my focus, knowledge and resources available at the time, and it resulted in one of my greatest achievements. This principle applies not only to academia but to every aspect of our lives.</p>
<p>Tireless work, perseverance, and digging into the smallest and most delicate detail or edge case are necessary to achieve top results in any field, regardless of whether and what kind of recognition will be received for it. That way of working will make us leaders: not leaders of a company, organization or community, but leaders of our own lives and skill sets.</p>
<h1 id="how-codechem-fits-into-this-story">How does CodeChem fit into this story?</h1>
<p>At the beginning of the research I had never heard of <a target="_blank" href="https://codechem.com/">CodeChem</a>, but during my research I participated in <a target="_blank" href="https://blog.codechem.com/we-mentored-over-150-students">Open Day 2021</a>, and shortly after I started my internship. Alongside the learning process and the number of new concepts I was introduced to, which formed an ever-growing list, one of the things that fascinated me was the amount of research and effort the team puts into deeply understanding every new challenge that comes along the way. Regardless of the project or the task, every member of the team goes through an enormous thinking and research process, whether for a task estimated at a couple of story points or for a complex system being developed by several teams. That's why I am honored to be part of CodeChem.</p>
]]></content:encoded></item></channel></rss>