Information extraction from websites (focus on product details)
2 September 2019
Last Tuesday, a crowd of software developers and data specialists gathered at our office to hear the insights and tips of Timo Schulz. Timo is a former employee of Picalike and now a consultant at ITGAIN Consulting. As a specialist in artificial intelligence, in particular machine learning, deep learning and data processing, Timo advises companies on advanced analytics and AI.
The topic of the workshop, which was attended by 20 participants from different industry sectors, was “Information extraction from websites with a focus on product details”. In other words, how do you get structured data from unstructured texts?
The first part of the workshop dealt with the theory. Working his way from RegEx to neural networks, Timo introduced the tech professionals to text analysis and text mining and showed which problems product texts in e-commerce can cause. After a short break, it was time to get down to business: the participants worked through many hands-on examples on their laptops, and a lively exchange developed about the techniques they had learned and new use cases, with plenty of tips and tricks.
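To give a flavor of the RegEx end of that spectrum, here is a minimal sketch in Python. It is not taken from Timo’s workshop material: the patterns, the function name extract_product_details and the sample text are assumptions made up for illustration, and real product texts would need many more variants.

```python
import re

# Hypothetical patterns for illustration only; not from the workshop.
PATTERNS = {
    # e.g. "80% leather" -> ("80", "leather")
    "material": re.compile(r"(\d{1,3})\s*%\s*([A-Za-z]+)"),
    # e.g. "Heel height: 5,5 cm" -> "5,5"
    "heel_height_cm": re.compile(
        r"heel height[:\s]+(\d+(?:[.,]\d+)?)\s*cm", re.IGNORECASE
    ),
}


def extract_product_details(text):
    """Turn a free-text product description into a small structured dict."""
    details = {}

    # Collect every "<percent>% <material>" pair found in the text.
    materials = {
        name.lower(): int(pct)
        for pct, name in PATTERNS["material"].findall(text)
    }
    if materials:
        details["material"] = materials

    # Accept both "." and "," as decimal separator.
    match = PATTERNS["heel_height_cm"].search(text)
    if match:
        details["heel_height_cm"] = float(match.group(1).replace(",", "."))

    return details


sample = "Elegant ankle boot, 80% leather, 20% textile. Heel height: 5,5 cm."
print(extract_product_details(sample))
```

Rules like these are quick to write but brittle: every new phrasing needs a new pattern, which is one reason to look at learned models once the texts get messier.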
Afterwards we had a cold beer and some delicious pizza, and I had the chance to ask the workshop participants and Timo a few questions.
Interview with Timo Schulz
Why did you decide to work in artificial intelligence?
I started working with data back in 2005, during my computer science studies. I’ve always wanted to get more out of data and did a lot of research in this area. But then I wanted to leave research and put my knowledge and the technology into practice. That’s how I came to Picalike.
Why did you move into consulting later?
At some point I wanted to get out of the e-commerce business. It was very exhausting and nerve-wracking to get companies to adopt the technology. Often the companies were convinced of the product and the technology and saw that it worked, but then it failed because of political decisions within the company or because there was no deeper understanding of it. That, of course, makes it difficult to stay highly motivated. In consulting, I can now advance AI in all areas and show companies, without pressure, what is possible and how they can implement AI.
What challenges do you see for e-commerce in terms of AI?
The biggest challenge is actually to correctly recognize and assess the potential of AI. And then there is acceptance: the company has to recognize for itself what AI can do for it. You often have to do a lot of convincing.
Has there ever been a case where you advised a company not to use AI?
No, not really, because AI is so versatile. But sometimes you have to be careful that AI is not just seen as a trend, along the lines of “We absolutely have to do something with AI now”. In such cases it is often enough to simply structure the company’s existing data better and to see what can already be gained from it.
As a consultant you should always stay up to date. How and where do you find out about the industry, about new developments in the field?
As far as possible, I dedicate a whole day to research. I read a lot about the topic, follow blogs, listen to talks by people I follow and then try to implement a use case of my own as a prototype. That way I can decide whether the approach makes sense in my eyes and whether the topic should be pursued further or not.
And which trends are exciting at the moment? Where is the journey going?
I think everything around NLU and NLP (Natural Language Understanding and Natural Language Processing, editor’s note) is very interesting, and a lot will happen there.
Speaking of language comprehension: I recently read that it has not yet been possible to teach artificial intelligence humor. Is that right?
Yes, it’s really not that easy. Take a customer review in an online shop that says: “The shoe is huge, like a VW van.” We understand: “Okay, the shoe is most likely quite big, and it was just more fun to put it that way.” But the AI would actually compare the shoe with the size of a VW van. It just doesn’t think any further. Another example: Jan goes into his bedroom and gets his ball. Then he goes into the garden and puts the ball on the ground. Where is the ball? For the AI it is not clear that the ball is now in the garden.
I heard from a reliable source that you used to be a Picalike beer ambassador. What is your favorite beer and why?
Clearly Sierra Nevada Torpedo. Ken Grossman is a hero! He revolutionized the art of brewing beer. In the 80s he went to Germany and bought a copper brewhouse there, which he then took back to California. And from then on the beer was simply unbeatable. They use whole hop cones for the beer, not just hop extract as others do, and generate part of their energy themselves with solar power. After the big Camp Fire wildfire in California, Sierra Nevada brewed a special beer and donated all the proceeds to the victims of the fire.
Interviews with the workshop participants
Interview with Lennart from Shopping24.com
What is your position at Shopping24?
I am a Search Engine Linguistic Manager.
And what exactly do you do in your job?
I help with the processing of search queries. I look at what users enter as search terms and at what has to be covered linguistically around those terms in order to return the best possible search results.
Why are you in this workshop?
Since I also deal with product texts in my job, I find it interesting to see how information can be extracted there.
Which topics for further workshops would be interesting for you?
In general, I am interested in product search challenges. For example, insights from other website operators who are also involved in product search would be interesting. What challenges do they have and how do they solve certain problems?
Interview with Sarah from AdSoul
What’s your position at AdSoul?
I am a linguist.
And what exactly do you do in your job?
I break down keywords and try to cluster them. Grammatical processing of keywords, so to speak.
Why are you here? What interests you about the workshop?
First I have to explain what AdSoul does. AdSoul is active in the field of SEM and takes care of automated search engine marketing. I was already involved in text mining and the preparation of data and texts at university. The goal of AdSoul is basically to generate text ads automatically at some point. That’s why data extraction is so interesting for me.
Interview with Marc-Olaf from OGDS
What’s your position at OGDS?
I am a software developer.
And what exactly do you do in your job?
OGDS is a company builder. We identify new and attractive business ideas and build prototypes for them. We provide the operations, the infrastructure and the architecture for these prototypes, and I develop the software for them. So basically we provide technical solutions in the area of e-commerce.
Why are you here? What interests you about the workshop?
I’m interested in extracting information from texts, in what other people are doing in this area and in what new ideas are out there.
Did you like the workshop and if so, what exactly?
I was primarily here for the exchange, not so much for professional training, because I already know this topic very well. But I think Timo explained the subject very well and captured its breadth. I was able to take away interesting ideas and, in some cases, new perspectives.
Which topics for further workshops would be interesting for you?
I am always very project-driven. At the moment I am very interested in the extraction of data from images, so I would also be happy to exchange ideas with Picalike on this topic.
Interview with Erwin from Shopping24.com
What is your position at Shopping24?
I am a Java developer.
And what exactly do you do in your job?
I prepare product data in e-commerce. I take care of the product search at Shopping24 and the support of the back-end systems.
Why are you here? What interests you about the workshop?
On the one hand, I’m here to expand my own knowledge. On the other hand, at Shopping24 we use product feeds. The aim here could be to extract text from external websites without feeds.
Editor’s note: The interviews were recorded in the form of written notes.